mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-01 23:07:55 +01:00

Author	SHA1	Message	Date
Junio C Hamano	ad2d777604	Merge branch 'nd/pack-ofs-4gb-limit' "git pack-objects" and "git index-pack" mostly operate with off_t when talking about the offset of objects in a packfile, but there were a handful of places that used "unsigned long" to hold that value, leading to an unintended truncation. * nd/pack-ofs-4gb-limit: fsck: use streaming interface for large blobs in pack pack-objects: do not truncate result in-pack object size on 32-bit systems index-pack: correct "offset" type in unpack_entry_data() index-pack: report correct bad object offsets even if they are large index-pack: correct "len" type in unpack_data() sha1_file.c: use type off_t* for object_info->disk_sizep pack-objects: pass length to check_pack_crc() without truncation	2016-07-28 10:34:42 -07:00
Nguyễn Thái Ngọc Duy	166df26f28	sha1_file.c: use type off_t* for object_info->disk_sizep This field, filled by sha1_object_info(), contains the on-disk size of an object, which could exceed the 4GB limit of unsigned long on 32-bit systems. Use off_t for it instead and update all callers. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-13 09:14:20 -07:00
Junio C Hamano	628991391d	Merge branch 'jk/cat-file-buffered-batch-all' "git cat-file --batch-all" has been sped up, by taking advantage of the fact that it does not have to read a list of objects, in two ways. * jk/cat-file-buffered-batch-all: cat-file: default to --buffer when --batch-all-objects is used cat-file: avoid noop calls to sha1_object_info_extended	2016-05-31 12:40:54 -07:00
Jeff King	6a36e1e7bb	cat-file: default to --buffer when --batch-all-objects is used Traditionally cat-file's batch-mode does not do any output buffering. The reason is that a caller may have pipes connected to its input and output, and would want to use cat-file interactively, getting output immediately for each input it sends. This may involve a lot of small write() calls, which can be slow. So we introduced --buffer to improve this, but we can't turn it on by default, as it would break the interactive case above. However, when --batch-all-objects is used, we do not read stdin at all. We generate the output ourselves as quickly as possible, and then exit. In this case buffering is a strict win, and it is simply a hassle for the user to have to remember to specify --buffer. This patch makes --buffer the default when --batch-all-objects is used. Specifying "--buffer" manually is still OK, and you can even override it with "--no-buffer" if you're a masochist (or debugging). For some real numbers, running: git cat-file --batch-all-objects --batch-check='%(objectname)' on torvalds/linux goes from: real 0m1.464s user 0m1.208s sys 0m0.252s to: real 0m1.230s user 0m1.172s sys 0m0.056s for a 16% speedup. Suggested-by: Charles Bailey <charles@hashpling.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-18 14:17:39 -07:00
Jeff King	845de33a5b	cat-file: avoid noop calls to sha1_object_info_extended It is not unreasonable to ask cat-file for a batch-check format of simply "%(objectname)". At first glance this seems like a noop (you are generally already feeding the object names on stdin!), but it has a few uses: 1. With --batch-all-objects, you can generate a listing of the sha1s present in the repository, without any input. 2. You do not have to feed sha1s; you can feed arbitrary sha1 expressions and have git resolve them en masse. 3. You can even feed a raw sha1, with the result that git will tell you whether we actually have the object or not. In case 3, the call to sha1_object_info is useful; it tells us whether the object exists or not (technically we could swap this out for has_sha1_file, but the cost is roughly the same). In case 2, the existence check is of debatable value. A mass-resolution might prefer performance to safety (against outputting a value for a corrupted ref, for example). However, the object lookup cost is likely not as noticeable compared to the resolution cost. And since we have provided that safety in the past, the conservative choice is to keep it. In case 1, though, the object lookup is a definite noop; we know about the object because we found it in the object database. There is no new information gained by making the call. This patch detects that case and optimizes out the call. Here are best-of-five timings for linux.git: [before] $ time git cat-file --buffer --batch-all-objects --batch-check='%(objectname)' real 0m2.117s user 0m2.044s sys 0m0.072s [after] $ time git cat-file --buffer --batch-all-objects --batch-check='%(objectname)' real 0m1.230s user 0m1.176s sys 0m0.052s There are two implementation details to note here. One is that we detect the noop case by seeing that "struct object_info" does not request any information.
But besides object existence, there is one other piece of information which sha1_object_info may fill in: whether the object is cached, loose, or packed. We don't currently provide that information in the output, but if we were to do so later, we'd need to take note and disable the optimization in that case. And that leads to the second note. If we were to output that information, a better implementation would be to remember where we saw the object in --batch-all-objects in the first place, and avoid looking it up again by sha1. In fact, we could probably squeeze out some extra performance for less-trivial cases, too, by remembering the pack location where we saw the object, and going directly there to find its information (like type, size, etc). That would in theory make this optimization unnecessary. I didn't pursue that path here for two reasons: 1. It's non-trivial to implement, and has memory implications. Because we sort and de-dup the list of output sha1s, we'd have to record the pack information for each object, too. 2. It doesn't save as much as you might hope. It saves the find_pack_entry() call, but getting the size and type for deltified objects requires walking down the delta chain (for the real type) or reading the delta data header (for the size). These costs tend to dominate the non-trivial cases. By contrast, this optimization is easy and self-contained, and speeds up a real-world case I've used. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-18 14:17:38 -07:00
Junio C Hamano	b42ca3dd0f	cat-file: read batch stream with strbuf_getline() It is possible to prepare a text file with a DOS editor and feed it as a batch command stream to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-01-15 10:35:06 -08:00
Junio C Hamano	8f309aeb82	strbuf: introduce strbuf_getline_{lf,nul}() The strbuf_getline() interface allows a byte other than LF or NUL as the line terminator, but this is only because I wrote these codepaths anticipating that there might be a value other than NUL and LF that could be useful when I introduced line_termination a long time ago. No useful caller that uses other value has emerged. By now, it is clear that the interface is overly broad without a good reason. Many codepaths have hardcoded preference to read either LF terminated or NUL terminated records from their input, and then call strbuf_getline() with LF or NUL as the third parameter. This step introduces two thin wrappers around strbuf_getline(), namely, strbuf_getline_lf() and strbuf_getline_nul(), and mechanically rewrites these call sites to call either one of them. The changes contained in this patch are: * introduction of these two functions in strbuf.[ch] * mechanical conversion of all callers to strbuf_getline() with either '\n' or '\0' as the third parameter to instead call the respective thin wrapper. After this step, output from "git grep 'strbuf_getline('" would become a lot smaller. An interim goal of this series is to make this an empty set, so that we can have strbuf_getline_crlf() take over the shorter name strbuf_getline(). Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-01-15 10:12:51 -08:00
Junio C Hamano	33e8fc8740	usage: do not insist that standard input must come from a file The synopsis text and the usage string of subcommands that read a list of things from the standard input are often shown like this: git gostak [--distim] < <list-of-doshes> This is problematic in a number of ways: * The way to use these commands is more often to feed them the output from another command, not feed them from a file. * Manual pages outside Git for commands that operate on data read from the standard input, e.g. "sort", "grep", "sed", etc., do not show such a "< redirection-from-file" in their synopsis text. Our doing so introduces inconsistency. * We do not insist on where the output should go, by saying git gostak [--distim] < <list-of-doshes> > <output> * As it is our convention to enclose placeholders inside <bracket>, the redirection operator followed by a placeholder filename becomes very hard to read, both in the documentation and in the help text. Let's clean them all up, after making sure that the documentation clearly describes the modes that take information from the standard input and what kind of things are expected on the input. [jc: stole example for fmt-merge-msg from Jonathan] Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-10-16 15:27:52 -07:00
Jeff King	3115ee45c8	cat-file: sort and de-dup output of --batch-all-objects The sorting we could probably live without, but printing duplicates is just a hassle for the user, who must then de-dup themselves (or risk a wrong answer if they are doing something like counting objects with a particular property). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-26 09:24:42 -07:00
Jeff King	6a951937ae	cat-file: add --batch-all-objects option It can sometimes be useful to examine all objects in the repository. Normally this is done with "git rev-list --all --objects", but: 1. That shows only reachable objects. You may want to look at all available objects. 2. It's slow. We actually open each object to walk the graph. If your operation is OK with seeing unreachable objects, it's an order of magnitude faster to just enumerate the loose directories and pack indices. You can do this yourself using "ls" and "git show-index", but it's non-obvious. This patch adds an option to "cat-file --batch-check" to operate on all available objects (rather than reading names from stdin). This is based on a proposal by Charles Bailey to provide a separate "git list-all-objects" command. That is more orthogonal, as it splits enumerating the objects from getting information about them. However, in practice you will either: a. Feed the list of objects directly into cat-file anyway, so you can find out information about them. Keeping it in a single process is more efficient. b. Ask the listing process to start telling you more information about the objects, in which case you will reinvent cat-file's batch-check formatter. Adding a cat-file option is simple and efficient. And if you really do want just the object names, you can always do: git cat-file --batch-check='%(objectname)' --batch-all-objects Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Jeff King	44b877e9bc	cat-file: split batch_one_object into two stages There are really two things going on in this function: 1. We convert the name we got on stdin to a sha1. 2. We look up and print information on the sha1. Let's split out the second half so that we can call it separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Jeff King	82330950d9	cat-file: stop returning value from batch_one_object If batch_one_object returns an error code, we stop reading input. However, it will only do so if we feed it NULL, which cannot happen; we give it the "buf" member of a strbuf, which is always non-NULL. We did originally stop on other errors (like a missing object), but this was changed in `3c076db` (cat-file --batch / --batch-check: do not exit if hashes are missing, 2008-06-09). These days we keep going for any per-object error (and print "missing" when necessary). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Jeff King	fc4937c372	cat-file: add --buffer option We use a direct write() to output the results of --batch and --batch-check. This is good for processes feeding the input and reading the output interactively, but it introduces measurable overhead if you do not want this feature. For example, on linux.git: $ git rev-list --objects --all | cut -d' ' -f1 >objects $ time git cat-file --batch-check='%(objectsize)' <objects >/dev/null real 0m5.440s user 0m5.060s sys 0m0.384s This patch adds an option to use regular stdio buffering: $ time git cat-file --batch-check='%(objectsize)' --buffer <objects >/dev/null real 0m4.975s user 0m4.888s sys 0m0.092s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Jeff King	bfd155943e	cat-file: move batch_options definition to top of file That way all of the functions can make use of it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Jeff King	ad42f28d0c	cat-file: minor style fix in options list We do not put extra whitespace before the first macro argument. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Junio C Hamano	67f0b6f3b2	Merge branch 'dt/cat-file-follow-symlinks' "git cat-file --batch(-check)" learned the "--follow-symlinks" option that follows an in-tree symbolic link when asked about an object via extended SHA-1 syntax, e.g. HEAD:RelNotes that points at Documentation/RelNotes/2.5.0.txt. With the new option, the command behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as input instead. * dt/cat-file-follow-symlinks: cat-file: add --follow-symlinks to --batch sha1_name: get_sha1_with_context learns to follow symlinks tree-walk: learn get_tree_entry_follow_symlinks	2015-06-01 12:45:16 -07:00
David Turner	122d53464b	cat-file: add --follow-symlinks to --batch This wires the in-repo-symlink following code through to the cat-file builtin. In the event of an out-of-repo link, cat-file will print the link in a new format. Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-20 13:46:21 -07:00
Karthik Nayak	39e4ae3880	cat-file: teach cat-file a '--allow-unknown-type' option 'git cat-file' throws an error while trying to print the type or size of a broken/corrupt object. This is because these objects are usually of unknown types. Teach git cat-file a '--allow-unknown-type' option where it prints the type or size of a broken/corrupt object without throwing an error. Modify '-t' and '-s' options to call sha1_object_info_extended() directly to support the '--allow-unknown-type' option. Add documentation for 'cat-file --allow-unknown-type'. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> cat-file: add documentation for '--allow-unknown-type' option. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-06 13:35:48 -07:00
Karthik Nayak	b48158ac94	cat-file: make the options mutually exclusive We only parse the options if 2 or 3 arguments are specified. Update 'struct option options[]' to use OPT_CMDMODE rather than OPT_SET_INT to allow only one mutually exclusive option and avoid the need for checking number of arguments. This was written by Junio C Hamano, tested by me. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-06 13:35:48 -07:00
Junio C Hamano	bb831db677	Merge branch 'ah/usage-strings' * ah/usage-strings: standardize usage info string format	2015-02-11 13:44:20 -08:00
Junio C Hamano	67b5440d0d	Merge branch 'ak/cat-file-clean-up' * ak/cat-file-clean-up: cat-file: use "type" and "size" from outer scope	2015-01-22 13:46:38 -08:00
Alex Henrie	9c9b4f2f8b	standardize usage info string format This patch puts the usage info strings that were not already in docopt-like format into docopt-like format, which will be a little easier for end users and a lot easier for translators. Changes include: - Placing angle brackets around fill-in-the-blank parameters - Putting dashes in multiword parameter names - Adding spaces to [-f|--foobar] to make [-f | --foobar] - Replacing <foobar>* with [<foobar>...] Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-01-14 09:32:04 -08:00
Alexander Kuleshov	331004836b	cat-file: use "type" and "size" from outer scope In cat_one_file(), "type" and "size" variables are defined in the function scope, and then two variables of the same name are defined in a block in one of the if/else statements, hiding the definitions in the outer scope. Because the values the outer variables hold before control enters this block do not have to be preserved, however, we can safely remove the redundant definitions from the inner scope without breaking anything. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-01-13 12:36:04 -08:00
Alexander Kuleshov	10aff315f6	cat-file: remove unused includes - "exec_cmd.h" became unnecessary at `b931aa5a` (Call builtin ls-tree in git-cat-file -p, 2006-05-26), when it changed an earlier code that delegated tree display to "ls-tree" via the run_command() API (hence needing "exec_cmd.h") to call cmd_ls_tree() directly. We should have removed the include in the same commit, but we forgot to do so. - "diff.h" was added at `e5fba602` (textconv: support for cat_file, 2010-06-15), together with "userdiff.h", but "userdiff.h" can be included without including "diff.h"; the header was unnecessary from the beginning. - "tag.h" and "tree.h" were necessary since `8e440259` (Use blob_, commit_, tag_, and tree_type throughout., 2006-04-02) to check the type of object by comparing typename with tree_type and tag_type (pointers to extern strings). `21666f1a` (convert object type handling from a string to a number, 2007-02-26) made these <type>_type strings unnecessary, and it could have switched to include "object.h", which is necessary to use typename(), but it forgot to do so. Because "tag.h" and "tree.h" include "object.h", it did not need to explicitly include "object.h" in order to start using typename() itself. We do not even have to include "object.h" after removing these two #includes, because "builtin.h" includes "commit.h" which in turn includes "object.h" these days. This happened at `7b9c0a69` (git-commit-tree: make it usable from other builtins, 2008-07-01). Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-01-09 16:18:35 -08:00
René Scharfe	e3f1da982e	use skip_prefix() to avoid more magic numbers Continue where `ae021d87` (use skip_prefix to avoid magic numbers) left off and use skip_prefix() in more places for determining the lengths of prefix strings to avoid using dependent constants and other indirect methods. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-07 11:09:16 -07:00
Junio C Hamano	d4c6e9fb6f	Merge branch 'jk/warn-on-object-refname-ambiguity' * jk/warn-on-object-refname-ambiguity: rev-list: disable object/refname ambiguity check with --stdin cat-file: restore warn_on_object_refname_ambiguity flag cat-file: fix a minor memory leak in batch_objects cat-file: refactor error handling of batch_objects	2014-03-25 11:07:36 -07:00
Jeff King	a42fcd15d8	cat-file: restore warn_on_object_refname_ambiguity flag Commit `25fba78` turned off the object/refname ambiguity check during "git cat-file --batch" operations. However, this is a global flag, so let's restore it when we are done. This shouldn't make any practical difference, as cat-file exits immediately afterwards, but is good code hygiene and would prevent an unnecessary surprise if somebody starts to call cmd_cat_file later. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-13 11:56:17 -07:00
Junio C Hamano	b2132068c6	Merge branch 'jk/oi-delta-base' Teach "cat-file --batch" to show delta-base object name for a packed object that is represented as a delta. * jk/oi-delta-base: cat-file: provide %(deltabase) batch format sha1_object_info_extended: provide delta base sha1s	2014-01-10 10:33:11 -08:00
Junio C Hamano	b0504a9519	Merge branch 'cc/replace-object-info' read_sha1_file() that is the workhorse to read the contents given an object name honoured object replacements, but there is no corresponding mechanism in sha1_object_info() that is used to obtain the metainfo (e.g. type & size) about the object, leading callers to weird inconsistencies. * cc/replace-object-info: replace info: rename 'full' to 'long' and clarify in-code symbols Documentation/git-replace: describe --format option builtin/replace: unset read_replace_refs t6050: add tests for listing with --format builtin/replace: teach listing using short, medium or full formats sha1_file: perform object replacement in sha1_object_info_extended() t6050: show that git cat-file --batch fails with replace objects sha1_object_info_extended(): add an "unsigned flags" parameter sha1_file.c: add lookup_replace_object_extended() to pass flags replace_object: don't check read_replace_refs twice rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT	2014-01-10 10:32:10 -08:00
Jeff King	648027c4c8	cat-file: fix a minor memory leak in batch_objects We should always have been freeing our strbuf, but doing so consistently was annoying until the refactoring in the previous patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-07 14:31:52 -08:00
Jeff King	07e2383945	cat-file: refactor error handling of batch_objects This just pulls the return value for the function out of the inner loop, so we can break out of the loop rather than do an early return. This will make it easier to put any cleanup for the function in one place. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-07 14:31:10 -08:00
Jeff King	65ea9c3c3d	cat-file: provide %(deltabase) batch format It can be useful for debugging or analysis to see which objects are stored as delta bases on top of others. This information is available by running `git verify-pack`, but that is extremely expensive (and is harder than necessary to parse). Instead, let's make it available as a cat-file query format, which makes it fast and simple to get the bases for a subset of the objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-26 11:54:26 -08:00
Christian Couder	de7b5d6218	sha1_object_info_extended(): add an "unsigned flags" parameter This parameter is not used yet, but it will be used to tell sha1_object_info_extended() if it should perform object replacement or not. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-12 11:53:48 -08:00
Jeff King	6554dfa97a	cat-file: handle --batch format with missing type/size Commit `98e2092` taught cat-file to stream blobs with --batch, which requires that we look up the object type before loading it into memory. As a result, we now print the object header from information in sha1_object_info, and the actual contents from the read_sha1_file. We double-check that the information we printed in the header matches the content we are about to show. Later, commit `93d2a60` allowed custom header lines for --batch, and commit `5b08640` made type lookups optional. As a result, specifying a header line without the type or size means that we will not look up those items at all. This causes our double-checking to erroneously die with an error; we think the type or size has changed, when in fact it was simply left at "0". For the size, we can fix this by only doing the consistency double-check when we have retrieved the size via sha1_object_info. In the case that we have not retrieved the value, that means we also did not print it, so there is nothing for us to check that we are consistent with. We could do the same for the type. However, besides our consistency check, we also care about the type in deciding whether to stream or not. So instead of handling the case where we do not know the type, this patch instead makes sure that we always trigger a type lookup when we are printing, so that even a format without the type will stream as we would in the normal case. Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-12 11:31:25 -08:00
Jeff King	370c9268d1	cat-file: pass expand_data to print_object_or_die We currently individually pass the sha1, type, and size fields calculated by sha1_object_info. However, if we pass the whole struct, the called function can make more intelligent decisions about which fields were actually filled by sha1_object_info. This patch takes that first refactoring step, passing the whole struct, so further patches can make those decisions with less noise in their diffs. There should be no functional change to this patch (aside from a minor typo fix in the error message). As a side effect, we can rename the local variables in the function to "type" and "size", since the names are no longer taken. Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-12 11:27:21 -08:00
Junio C Hamano	4197361e39	Merge branch 'mg/more-textconv' Make "git grep" and "git show" pay attention to --textconv when dealing with blob objects. * mg/more-textconv: grep: honor --textconv for the case rev:path grep: allow to use textconv filters t7008: demonstrate behavior of grep with textconv cat-file: do not die on --textconv without textconv filters show: honor --textconv for blobs diff_opt: track whether flags have been set explicitly t4030: demonstrate behavior of show with textconv	2013-10-23 13:21:31 -07:00
Jeff King	97be04077f	cat-file: only split on whitespace when %(rest) is used Commit `c334b87b` (cat-file: split --batch input lines on whitespace, 2013-07-11) taught `cat-file --batch-check` to split input lines on the first whitespace, and stash everything after the first token into the %(rest) output format element. It claimed: Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. But that is not correct. Refs, object sha1s, and various peeling suffixes cannot contain spaces, but some object names can. In particular: 1. Tree paths like "[<tree>]:path with whitespace" 2. Reflog specifications like "@{2 days ago}" 3. Commit searches like "rev^{/grep me}" or ":/grep me" To remain backwards compatible, we cannot split on whitespace by default, hence we will ship 1.8.4 with the commit reverted. Resurrect its attempt but in a weaker form; only do the splitting when "%(rest)" is used in the output format. Since that element did not exist at all before `c334b87`, old scripts cannot be affected. The existence of object names with spaces does mean that you cannot reliably do: echo ":path with space and other data" | git cat-file --batch-check="%(objectname) %(rest)" as it would split the path and feed only ":path" to get_sha1. But that command is nonsensical. If you wanted to see "and other data" in "%(rest)", git cannot possibly know where the filename ends and the "rest" begins. It might be more robust to have something like "-z" to separate the input elements. But this patch is still a reasonable step before having that. It makes the easy cases easy; people who do not care about %(rest) do not have to consider it, and the %(rest) code handles the spaces and newlines of "rev-list --objects" correctly. Hard cases remain hard but possible (if you might get whitespace in your input, you do not get to use %(rest) and must split and join the output yourself using more flexible tools).
And most importantly, it does not preclude us from having different splitting rules later if a "-z" (or similar) option is added. So we can make the hard cases easier later, if we choose to. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-05 09:30:48 -07:00
Junio C Hamano	062aeee8aa	Revert "cat-file: split --batch input lines on whitespace" This reverts commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac; the update assumed that people only used the command to read from "rev-list --objects" output, whose lines begin with a 40-hex object name followed by a whitespace, but it turns out that scripts feed random extended SHA-1 expressions (e.g. "HEAD:$pathname") in which a whitespace has to be kept.	2013-08-02 09:29:30 -07:00
Jeff King	5b0864070e	sha1_object_info_extended: make type calculation optional Each caller of sha1_object_info_extended sets up an object_info struct to tell the function which elements of the object it wants to get. Until now, getting the type of the object has always been required (and it is returned via the return type rather than a pointer in object_info). This can involve actually opening a loose object file to determine its type, or following delta chains to determine a packed file's base type. These effects produce a measurable slow-down when doing a "cat-file --batch-check" that does not include %(objecttype). This patch adds a "typep" query to struct object_info, so that it can be optionally queried just like size and disk_size. As a result, the return type of the function is no longer the object type, but rather 0/-1 for success/error. As there are only three callers total, we just fix up each caller rather than keep a compatibility wrapper: 1. The simpler sha1_object_info wrapper continues to always ask for and return the type field. 2. The istream_source function wants to know the type, and so always asks for it. 3. The cat-file batch code asks for the type only when %(objecttype) is part of the format string. On linux.git, the best-of-five for running: $ git rev-list --objects --all >objects $ time git cat-file --batch-check='%(objectsize:disk)' on a fully packed repository goes from: real 0m8.680s user 0m8.160s sys 0m0.512s to: real 0m7.205s user 0m6.580s sys 0m0.608s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:16:36 -07:00
Jeff King	25fba78d36	cat-file: disable object/refname ambiguity check for batch mode A common use of "cat-file --batch-check" is to feed a list of objects from "rev-list --objects" or a similar command. In this instance, all of our input objects are 40-byte sha1 ids. However, cat-file has always allowed arbitrary revision specifiers, and feeds the result to get_sha1(). Fortunately, get_sha1() recognizes a 40-byte sha1 before doing any hard work trying to look up refs, meaning this scenario should end up spending very little time converting the input into an object sha1. However, since `798c35f` (get_sha1: warn about full or short object names that look like refs, 2013-05-29), when we encounter this case, we spend the extra effort to do a refname lookup anyway, just to print a warning. This is further exacerbated by `ca91993` (get_packed_ref_cache: reload packed-refs file when it changes, 2013-06-20), which makes individual ref lookup more expensive by requiring a stat() of the packed-refs file for each missing ref. With no patches, this is the time it takes to run: $ git rev-list --objects --all >objects $ time git cat-file --batch-check='%(objectname)' <objects on the linux.git repository: real 1m13.494s user 0m25.924s sys 0m47.532s If we revert `ca91993`, the packed-refs up-to-date check, it gets a little better: real 0m54.697s user 0m21.692s sys 0m32.916s but we are still spending quite a bit of time on ref lookup (and we would not want to revert that patch, anyway, which has correctness issues). If we revert `798c35f`, disabling the warning entirely, we get a much more reasonable time: real 0m7.452s user 0m6.836s sys 0m0.608s This patch does the moral equivalent of this final case (and gets similar speedups). We introduce a global flag that callers of get_sha1() can use to avoid paying the price for the warning. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:09:56 -07:00
Jeff King	c334b87b30	cat-file: split --batch input lines on whitespace If we get an input line to --batch or --batch-check that looks like "HEAD foo bar", we will currently feed the whole thing to get_sha1(). This means that to use --batch-check with `rev-list --objects`, one must pre-process the input, like: git rev-list --objects HEAD | cut -d' ' -f1 | git cat-file --batch-check Besides being more typing and slightly less efficient to invoke `cut`, the result loses information: we no longer know which path each object was found at. This patch teaches cat-file to split input lines at the first whitespace. Everything to the left of the whitespace is considered an object name, and everything to the right is made available as the %(rest) atom. So you can now do: git rev-list --objects HEAD | git cat-file --batch-check='%(objectsize) %(rest)' to collect object sizes at particular paths. Even if %(rest) is not used, we always do the whitespace split (which means you can simply eliminate the `cut` command from the first example above). This whitespace split is backwards compatible for any reasonable input. Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. The only input hurt is if somebody really expected input of the form "HEAD is a fine-looking ref!" to fail; it will now parse HEAD, and make "is a fine-looking ref!" available as %(rest). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:18:42 -07:00
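The split this entry describes can be sketched in C. The object name is everything before the first space or tab, and the remainder (after skipping the separating whitespace) becomes the text exposed as %(rest). `split_batch_line` is a hypothetical helper, not git's actual parser. It modifies the line in place.

```c
#include <string.h>

/*
 * Split "line" in place at the first run of spaces/tabs: the object name
 * stays in "line", and *rest points at the text after the separator (or at
 * the terminating NUL when there is no whitespace). Sketch only.
 */
static void split_batch_line(char *line, const char **rest)
{
	size_t name_len = strcspn(line, " \t");  /* length of the object name */
	char *p = line + name_len;

	if (*p) {
		*p++ = '\0';                    /* terminate the object name */
		p += strspn(p, " \t");          /* skip any further separators */
	}
	*rest = p;
}
```

For a `rev-list --objects` line such as `<sha1> path/to/file.c`, the name part is looked up as before, and the path is kept for output instead of being discarded by `cut`.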
Jeff King	a4ac106178	cat-file: add %(objectsize:disk) format atom This atom is just like %(objectsize), except that it shows the on-disk size of the object rather than the object's true size. In other words, it makes the "disk_size" query of sha1_object_info_extended available via the command-line. This can be used for rough attribution of disk usage to particular refs, though see the caveats in the documentation. This patch does not include any tests, as the exact numbers returned are volatile and subject to zlib and packing decisions. We cannot even reliably guarantee that the on-disk size is smaller than the object content (though in general this should be the case for non-trivial objects). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:18:42 -07:00
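One plausible way to get a packed object's on-disk size, and the approach this sketch assumes, is to sort the start offsets of all objects in the pack. The size is then the distance to the next object's start. For the last object, the distance runs to where the pack's trailing checksum begins. The code is not quoted from git's `sha1_file.c`, and `packed_disk_size` is a hypothetical name. It uses a 64-bit type so that offsets beyond 4 GB are not truncated on 32-bit systems.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * offsets: start offset of every object in the pack, sorted ascending.
 * end:     offset where object data stops (start of the pack trailer).
 * Returns the number of bytes the object starting at "off" occupies,
 * or 0 if "off" is not the start of any object.
 */
static uint64_t packed_disk_size(const uint64_t *offsets, size_t n,
				 uint64_t end, uint64_t off)
{
	const uint64_t *hit = bsearch(&off, offsets, n, sizeof(*offsets), cmp_u64);
	size_t i;

	if (!hit)
		return 0;
	i = (size_t)(hit - offsets);
	return (i + 1 < n ? offsets[i + 1] : end) - off;
}
```

This also shows why the numbers are volatile. They depend entirely on how the packer happened to compress and delta each object.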
Jeff King	93d2a607ba	cat-file: add --batch-check=<format> The `cat-file --batch-check` command can be used to quickly get information about a large number of objects. However, it provides a fixed set of information. This patch adds an optional <format> option to --batch-check to allow a caller to specify which items they are interested in, and in which order to output them. This is not very exciting for now, since we provide the same limited set that you could already get. However, it opens the door to adding new format items in the future without breaking backwards compatibility (or forcing callers to pay the cost to calculate uninteresting items). Since the --batch option shares code with --batch-check, it receives the same feature, though it is less likely to be of interest there. The format atom names are chosen to match their counterparts in for-each-ref. Though we do not (yet) share any code with for-each-ref's formatter, this keeps the interface as consistent as possible, and may help later on if the implementations are unified. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:18:12 -07:00
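A minimal sketch of %(atom) expansion for a --batch-check format string. Besides the atoms this entry covers, it includes %(objectsize:disk) and %(rest), which other entries in this log add. `struct batch_data`, `atom_is`, and `expand_format` are hypothetical; this is not git's formatter. Rejecting unknown atoms is a choice made for the sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-object values a batch line might report. */
struct batch_data {
	const char *name;
	const char *type;
	unsigned long size;
	unsigned long disk_size;
	const char *rest;
};

/* True if the len-byte atom is exactly "want" (so "objectsize" != "objectsize:disk"). */
static int atom_is(const char *atom, size_t len, const char *want)
{
	return strlen(want) == len && !strncmp(atom, want, len);
}

/*
 * Expand %(atom) placeholders in fmt into out[outsz]; other text is copied
 * literally. Returns 0 on success, -1 on an unknown atom or if the result
 * does not fit.
 */
static int expand_format(char *out, size_t outsz, const char *fmt,
			 const struct batch_data *d)
{
	size_t used = 0;

	if (!outsz)
		return -1;
	out[0] = '\0';
	while (*fmt) {
		const char *end = NULL;
		int n;

		if (fmt[0] == '%' && fmt[1] == '(')
			end = strchr(fmt + 2, ')');
		if (end) {
			const char *atom = fmt + 2;
			size_t len = (size_t)(end - atom);
			char *dst = out + used;
			size_t room = outsz - used;

			if (atom_is(atom, len, "objectname"))
				n = snprintf(dst, room, "%s", d->name);
			else if (atom_is(atom, len, "objecttype"))
				n = snprintf(dst, room, "%s", d->type);
			else if (atom_is(atom, len, "objectsize"))
				n = snprintf(dst, room, "%lu", d->size);
			else if (atom_is(atom, len, "objectsize:disk"))
				n = snprintf(dst, room, "%lu", d->disk_size);
			else if (atom_is(atom, len, "rest"))
				n = snprintf(dst, room, "%s", d->rest);
			else
				return -1;  /* unknown atom */
			fmt = end + 1;
		} else {
			n = snprintf(out + used, outsz - used, "%c", *fmt);
			fmt++;
		}
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;  /* output would be truncated */
		used += (size_t)n;
	}
	return 0;
}
```

A real implementation could also record which atoms appear in the format. It would then compute only those values, which is the "forcing callers to pay the cost" concern the entry mentions.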
Jeff King	b71bd48017	cat-file: refactor --batch option parsing We currently use an int to tell us whether --batch parsing is on, and if so, whether we should print the full object contents. Let's instead factor this into a struct, filled in by callback, which will make further batch-related options easy to add. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-11 10:39:13 -07:00
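The refactor replaces a single int with a struct filled in by an option callback. It can be illustrated with a hypothetical struct and a hand-rolled parser. `struct batch_opts` and `parse_batch_arg` are assumptions for illustration. They do not use git's parse-options API. The struct carries the enabled flag, the "print contents" choice, and an optional `=<format>` value, so later batch options have somewhere to go.

```c
#include <string.h>

/* Hypothetical options struct; not git's actual definition. */
struct batch_opts {
	int enabled;         /* --batch or --batch-check was given */
	int print_contents;  /* 1 for --batch, 0 for --batch-check */
	const char *format;  /* optional "=<format>" argument, or NULL */
};

/*
 * Callback-style parser for one argument. Returns 1 if the argument was a
 * batch option (and fills *opts), 0 if it was something else, and -1 if a
 * second batch option conflicts with an earlier one.
 */
static int parse_batch_arg(const char *arg, struct batch_opts *opts)
{
	static const char check[] = "--batch-check";
	static const char batch[] = "--batch";
	const char *val;
	int print_contents;

	/* Test the longer name first: "--batch" is a prefix of "--batch-check". */
	if (!strncmp(arg, check, strlen(check))) {
		print_contents = 0;
		val = arg + strlen(check);
	} else if (!strncmp(arg, batch, strlen(batch))) {
		print_contents = 1;
		val = arg + strlen(batch);
	} else {
		return 0;
	}
	if (*val && *val != '=')
		return 0;        /* e.g. "--batch-all-objects" is not ours */
	if (opts->enabled)
		return -1;       /* only one of --batch/--batch-check */
	opts->enabled = 1;
	opts->print_contents = print_contents;
	opts->format = *val ? val + 1 : NULL;
	return 1;
}
```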
Jeff King	98e2092b50	cat-file: teach --batch to stream blob objects The regular "git cat-file -p" and "git cat-file blob" code paths already learned to stream large blobs. Let's do the same here. Note that this means we look up the type and size before making a decision of whether to load the object into memory or stream (just like the "-p" code path does). That can lead to extra work, but it should be dwarfed by the cost of actually accessing the object itself. In my measurements, there was a 1-2% slowdown when using "--batch" on a large number of objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-11 10:37:16 -07:00
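The "look up type and size first, then decide" step can be sketched as a small predicate plus a chunked copy. The copy writes the object body through a fixed buffer instead of loading it whole. `STREAM_THRESHOLD`, `should_stream`, and `stream_copy` are assumptions for illustration; the threshold value is arbitrary and not taken from git's source.

```c
#include <stdio.h>

/* Hypothetical threshold above which a blob is streamed, not loaded whole. */
#define STREAM_THRESHOLD (512UL * 1024UL * 1024UL)

/* Decide from type and size alone, before touching the object contents. */
static int should_stream(int is_blob, unsigned long size)
{
	/* In this sketch only blobs are streamed; other types are parsed in memory. */
	return is_blob && size > STREAM_THRESHOLD;
}

/* Copy the object body in fixed-size chunks; returns bytes copied or -1. */
static long stream_copy(FILE *in, FILE *out)
{
	char buf[8192];
	size_t n;
	long total = 0;

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n)
			return -1;
		total += (long)n;
	}
	return ferror(in) ? -1 : total;
}
```

The extra metadata lookup is the "extra work" the entry mentions. It is cheap next to reading the object itself, which matches the reported 1-2% slowdown.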
Michael J Gruber	3ac21617b0	cat-file: do not die on --textconv without textconv filters When a command is supposed to use textconv filters (by default or with "--textconv") and none are configured then the blob is output without conversion; the only exception to this rule is "cat-file --textconv". Make it behave like the rest of textconv aware commands. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-05-10 10:27:16 -07:00
Jeff King	9cfa5126a0	cat-file: print tags raw for "cat-file -p" When "cat-file -p" prints commits, it shows them in their raw format, since git's format is already human-readable. For tags, however, we print the whole thing raw except for one thing: we convert the timestamp on the tagger line into a human-readable date. This dates all the way back to `a0f15fa` (Pretty-print tagger dates, 2006-03-01). At that time there was no other way to pretty-print a tag. These days, however, neither of those matters much. The normal way to pretty-print a tag is with "git show", which is much more flexible than "cat-file -p". Commit `a0f15fa` also built "verify-tag --verbose" (and subsequently "tag -v") around the "cat-file -p" output. However, that behavior was lost in commit `62e09ce` (Make git tag a builtin, 2007-07-20), and we went back to printing the raw tag contents. Nobody seems to have noticed the bug since then (and it is arguably a saner behavior anyway, as it shows the actual bytes for which we verified the signature). Let's drop the tagger-date formatting for "cat-file -p". It makes us more consistent with cat-file's commit pretty-printer, and as a bonus, we can drop the hand-rolled tag parsing code in cat-file (which happened to behave inconsistently with the tag pretty-printing code elsewhere). This is a change of output format, so it's possible that some callers could consider this a regression. However, the original behavior was arguably a bug (due to the inconsistency with commits), likely nobody was relying on it (even we do not use it ourselves these days), and anyone relying on the "-p" pretty-printer should be able to expect a change in the output format (i.e., while "cat-file" is plumbing, the output format of "-p" was never guaranteed to be stable). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-04-17 14:48:45 -07:00
Ramsay Jones	803a777942	cat-file: Fix a gcc -Wuninitialized warning After commit `cbfd5e1c` ("drop some obsolete "x = x" compiler warning hacks", 21-03-2013) removed a gcc-specific hack, older versions of gcc now issue a "'contents' might be used uninitialized" warning. In order to suppress the warning, we simply initialize the variable to NULL in its declaration. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-29 23:47:00 -07:00
Jeff King	cbfd5e1cbb	drop some obsolete "x = x" compiler warning hacks In cases where the setting and access of a variable are protected by the same conditional flag, older versions of gcc would generate a "might be used uninitialized" warning. We silence the warning by initializing the variable to itself, a hack that gcc recognizes. Modern versions of gcc are smart enough to get this right, going back to at least version 4.3.5. gcc 4.1 does get it wrong in both cases, but is sufficiently old that we probably don't need to care about it anymore. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-21 14:06:38 -07:00
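The pattern behind both compiler-warning entries is a variable that is set and read only under the same flag. It looks roughly like this. The function `copy_len_if_wanted` is hypothetical and not cat-file's code. The NULL initializer is the approach the Ramsay Jones entry takes for `contents`, in place of the old `x = x` self-initialization hack.

```c
#include <stdlib.h>
#include <string.h>

/*
 * "contents" is assigned only when want_contents is true and read only
 * under the same condition, so it is never used uninitialized. Older gcc
 * could not prove that and warned.
 */
static size_t copy_len_if_wanted(int want_contents, const char *src)
{
	char *contents = NULL;  /* old hack: char *contents = contents; */
	size_t len = 0;

	if (want_contents) {
		size_t n = strlen(src) + 1;

		contents = malloc(n);
		if (!contents)
			return 0;
		memcpy(contents, src, n);
	}
	/* ...unrelated work between the two guarded blocks... */
	if (want_contents) {
		len = strlen(contents);
		free(contents);
	}
	return len;
}
```

Initializing to NULL costs nothing measurable. Unlike `x = x`, it is well-defined C that every compiler accepts without a special case.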
Junio C Hamano	096bbd6537	Merge branch 'nd/i18n-parseopt-help' A lot of i18n mark-up for the help text from "git <cmd> -h". * nd/i18n-parseopt-help: (66 commits) Use imperative form in help usage to describe an action Reduce translations by using same terminologies i18n: write-tree: mark parseopt strings for translation i18n: verify-tag: mark parseopt strings for translation i18n: verify-pack: mark parseopt strings for translation i18n: update-server-info: mark parseopt strings for translation i18n: update-ref: mark parseopt strings for translation i18n: update-index: mark parseopt strings for translation i18n: tag: mark parseopt strings for translation i18n: symbolic-ref: mark parseopt strings for translation i18n: show-ref: mark parseopt strings for translation i18n: show-branch: mark parseopt strings for translation i18n: shortlog: mark parseopt strings for translation i18n: rm: mark parseopt strings for translation i18n: revert, cherry-pick: mark parseopt strings for translation i18n: rev-parse: mark parseopt strings for translation i18n: reset: mark parseopt strings for translation i18n: rerere: mark parseopt strings for translation i18n: status: mark parseopt strings for translation i18n: replace: mark parseopt strings for translation ...	2012-09-07 11:09:09 -07:00
