1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-18 15:04:49 +01:00
Commit graph

75 commits

Author SHA1 Message Date
Raphael Zimmerer
83caecca2f git grep: Add "-z/--null" option as in GNU's grep.
Here's a trivial patch that adds "-z" and "--null" options to "git
grep". It was discussed on the mailing-list that git's "-z"
convention should be used instead of GNU grep's "-Z".
So things like 'git grep -l -z "$FOO" | xargs -0 sed -i "s/$FOO/$BOO/"'
do work now.

Signed-off-by: Raphael Zimmerer <killekulla@rdrz.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-01 09:14:54 -07:00
Junio C Hamano
6a42cfe86c Merge branch 'nd/worktree' into maint
* nd/worktree:
  setup_git_directory(): fix move to worktree toplevel directory
  update-index: fix worktree setup
  read-tree: setup worktree if merge is required
  grep: fix worktree setup
  diff*: fix worktree setup
2008-09-03 15:35:37 -07:00
Junio C Hamano
7e44c93558 'git foo' program identifies itself without dash in die() messages
This is a mechanical conversion of all '*.c' files with:

	s/((?:die|error|warning)\("git)-(\S+:)/$1 $2/;

The result was manually inspected and no false positive was found.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-31 09:39:19 -07:00
Nguyễn Thái Ngọc Duy
6577f542b3 grep: fix worktree setup
Unless used with --cached or grepping on a tree, "git grep" will
search on working directory, so set up worktree properly

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-28 22:46:16 -07:00
Junio C Hamano
fa4946b553 Merge branch 'maint'
* maint:
  fix usage string for git grep
  refresh-index: fix bitmask assignment

Conflicts:
	builtin-grep.c
2008-07-20 17:16:29 -07:00
Jonathan Nieder
2d9c572578 fix usage string for git grep
Without this patch, git-grep gives confusing usage information:

	$ git grep --confused
	usage: git grep <option>* <rev>* [-e] <pattern> [<path>...]
	$ git grep HEAD pattern
	fatal: ambiguous argument 'pattern': unknown revision or path no
	t in the working tree.
	Use '--' to separate paths from revisions

So put <pattern> before the <rev>s, in accordance with actual correct
usage.  While we're changing the usage string, we might as well include
the "--" separating revisions and paths, too.

Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-20 13:01:26 -07:00
Junio C Hamano
588c038ac6 Merge branch 'sb/dashless'
* sb/dashless:
  Make usage strings dash-less
  t/: Use "test_must_fail git" instead of "! git"
  t/test-lib.sh: exit with small negagive int is ok with test_must_fail

Conflicts:
	builtin-blame.c
	builtin-mailinfo.c
	builtin-mailsplit.c
	builtin-shortlog.c
	git-am.sh
	t/t4150-am.sh
	t/t4200-rerere.sh
2008-07-16 17:22:50 -07:00
Dmitry Potapov
620e2bb937 Fix buffer overflow in git-grep
If PATH_MAX on your system is smaller than any path stored in the git
repository, that can cause memory corruption inside of the grep_tree
function used by git-grep.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-16 13:30:34 -07:00
Stephan Beyer
1b1dd23f2d Make usage strings dash-less
When you misuse a git command, you are shown the usage string.
But this is currently shown in the dashed form.  So if you just
copy what you see, it will not work, when the dashed form
is no longer supported.

This patch makes git commands show the dash-less version.

For shell scripts that do not specify OPTIONS_SPEC, git-sh-setup.sh
generates a dash-less usage string now.

Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-13 14:12:48 -07:00
Jeff King
5f7c643afe add NO_EXTERNAL_GREP build option
Previously, we just chose whether to allow external grep
based on the __unix__ define. However, there are systems
which define this macro but which have an inferior group
(e.g., one that does not support all options used by t7002).
This allows users to accept the potential speed penalty to
get a more consistent grep experience (and to pass the
testsuite).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-13 00:57:53 -07:00
Shawn O. Pearce
2cd5dfd240 Teach git-grep --name-only as synonym for -l
I expected git grep --name-only to give me only the file names,
much as git diff --name-only only generates filenames.  Alas the
option is -l, which matches common external greps but doesn't match
other parts of the git UI.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-20 20:36:20 -08:00
Linus Torvalds
7a51ed66f6 Make on-disk index representation separate from in-core one
This converts the index explicitly on read and write to its on-disk
format, allowing the in-core format to contain more flags, and be
simpler.

In particular, the in-core format is now host-endian (as opposed to the
on-disk one that is network endian in order to be able to be shared
across machines) and as a result we can dispense with all the
htonl/ntohl on accesses to the cache_entry fields.

This will make it easier to make use of various temporary flags that do
not exist in the on-disk format.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-21 12:44:31 -08:00
Jim Meyering
872c930dcb Don't access line[-1] for a zero-length "line" from fgets.
A NUL byte at beginning of file, or just after a newline
would provoke an invalid buf[-1] access in a few places.

* builtin-grep.c (cmd_grep): Don't access buf[-1].
* builtin-pack-objects.c (get_object_list): Likewise.
* builtin-rev-list.c (read_revisions_from_stdin): Likewise.
* bundle.c (read_bundle_header): Likewise.
* server-info.c (read_pack_info_file): Likewise.
* transport.c (insert_packed_refs): Likewise.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-04 12:28:58 -08:00
Junio C Hamano
6326cee51b git grep shows the same hit repeatedly for unmerged paths
When the index is unmerged, e.g.

	$ git ls-files -u
        100644 faf413748eb6ccb15161a212156c5e348302b1b6 1	setup.c
        100644 145eca50f4 2	setup.c
        100644 cb9558c49b6027bf225ba2a6154c4d2a52bcdbe2 3	setup.c

running "git grep" for work tree files repeats hits for each unmerged
stage.

	$ git grep -n -e setup_work_tree -- '*.[ch]'
        setup.c:209:void setup_work_tree(void)
        setup.c:209:void setup_work_tree(void)
        setup.c:209:void setup_work_tree(void)

This should fix it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-05 16:16:40 -08:00
Junio C Hamano
4b87474bc9 grep -An -Bm: fix invocation of external grep command
When building command line to invoke external grep, the
arguments to -A/-B/-C options were placd in randarg[] buffer,
but the code forgot that snprintf() does not count terminating
NUL in its return value.  This caused "git grep -A1 -B2" to
invoke external grep with "-B21 -A1".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-17 21:19:55 -08:00
Junio C Hamano
b67a43bb8f grep with unmerged index
We called flush_grep() every time we saw an unmerged entry in
the index.  If we happen to find an unmerged entry before we saw
more than two paths, we incorrectly declared that the user had
too many non-paths options in front.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 18:57:58 -08:00
Junio C Hamano
d99ebf0817 Split grep arguments in a way that does not requires to add /dev/null.
In order to (almost) always show the name of the file without
relying on "-H" option of GNU grep, we used to add /dev/null to
the argument list unless we are doing -l or -L.  This caused
"/dev/null:0" to show up when -c is given in the output.

It is not enough to add -c to the set of options we do not pass
/dev/null for.  When we have too many files, we invoke grep
multiple times and we need to avoid giving a widow filename to
the last invocation -- otherwise we will not see the name.

This keeps two filenames when the argv[] buffer is about to
overflow and we have not finished iterating over the index, so
that the last round will always have at least two paths to work
with (and not require /dev/null).

An obvious and the only exception is when there is only 1 file
that is given to the underlying grep, and in that case we avoid
passing /dev/null and let the external "grep -c" report only the
number of matches.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-14 15:16:43 -07:00
Jim Meyering
6aead43db3 sscanf/strtoul: parse integers robustly
* builtin-grep.c (strtoul_ui): Move function definition from here, to...
* git-compat-util.h (strtoul_ui): ...here, with an added "base" parameter.
* builtin-grep.c (cmd_grep): Update use of strtoul_ui to include base, "10".
* builtin-update-index.c (read_index_info): Diagnose an invalid mode integer
that is out of range or merely larger than INT_MAX.
(cmd_update_index): Use strtoul_ui, not sscanf.
* convert-objects.c (write_subdirectory): Likewise.

Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-11 19:13:55 -07:00
Linus Torvalds
6fda5e5180 Initialize tree descriptors with a helper function rather than by hand.
This removes slightly more lines than it adds, but the real reason for
doing this is that future optimizations will require more setup of the
tree descriptor, and so we want to do it in one place.

Also renamed the "desc.buf" field to "desc.buffer" just to trigger
compiler errors for old-style manual initializations, making sure I
didn't miss anything.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21 10:21:57 -07:00
Linus Torvalds
a8c40471ab Remove "pathlen" from "struct name_entry"
Since we have the "tree_entry_len()" helper function these days, and
don't need to do a full strlen(), there's no point in saving the path
length - it's just redundant information.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21 10:21:56 -07:00
Jim Meyering
09f2825147 git-grep: don't use sscanf
If you use scanf or sscanf to parse integers, your code probably
accepts bogus inputs.  For example, builtin-grep (aka git-grep) uses
sscanf(scan, "%u", &num) to parse the integer argument to -A, -B, -C.
Currently, "-C 1,000" and "-C 4294967297" are both treated just like
"-C 1":

    $ git-grep -h -C 4294967297 juggle
    out and you may find it easier to switch back and forth if you
    juggle multiple lines of development simultaneously. Of
    course, you will pay the price of more disk usage to hold

The obvious fix is to use strtoul instead.  But using a bare strtoul is
too messy, at least when done properly, so I've added a wrapper function.

The new function in the patch below belongs elsewhere if it would be
useful in replacing any of the four remaining uses of sscanf.

One final note:  With this change, I get a slightly different
diagnostic depending on the context size:

  $ ./git-grep -h -C 4294967296 juggle
  fatal: 4294967296: invalid context length argument
  [Exit 128]
  $ ./git-grep -h -C 4294967295 juggle
  grep: 4294967295: invalid context length argument

  [Exit 1]

A common convention that makes it easy to identify the source
of a diagnostic is to include the program name before the first ":".
Whether that should be "git" or "git-grep" is another question.
Using "grep" or "fatal" is misleading.

Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-14 01:37:50 -07:00
Shawn O. Pearce
dc49cd769b Cast 64 bit off_t to 32 bit size_t
Some systems have sizeof(off_t) == 8 while sizeof(size_t) == 4.
This implies that we are able to access and work on files whose
maximum length is around 2^63-1 bytes, but we can only malloc or
mmap somewhat less than 2^32-1 bytes of memory.

On such a system an implicit conversion of off_t to size_t can cause
the size_t to wrap, resulting in unexpected and exciting behavior.
Right now we are working around all gcc warnings generated by the
-Wshorten-64-to-32 option by passing the off_t through xsize_t().

In the future we should make xsize_t on such problematic platforms
detect the wrapping and die if such a file is accessed.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-07 11:15:26 -08:00
Shawn O. Pearce
ff1f99453f Don't build external_grep if its not used
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-07 10:42:07 -08:00
Nicolas Pitre
21666f1aae convert object type handling from a string to a number
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value.  One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.

This is an initial step for the removal of the version using a char array
found in object reading code paths.  The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27 01:34:21 -08:00
Junio C Hamano
599065a3bb prefixcmp(): fix-up mechanical conversion.
Previous step converted use of strncmp() with literal string
mechanically even when the result is only used as a boolean:

    if (!strncmp("foo", arg, 3)) ==> if (!(-prefixcmp(arg, "foo")))

This step manually cleans them up to read:

    if (!prefixcmp(arg, "foo"))

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-20 22:03:15 -08:00
Junio C Hamano
cc44c7655f Mechanical conversion to use prefixcmp()
This mechanically converts strncmp() to use prefixcmp(), but only when
the parameters match specific patterns, so that they can be verified
easily.  Leftover from this will be fixed in a separate step, including
idiotic conversions like

    if (!strncmp("foo", arg, 3))

  =>

    if (!(-prefixcmp(arg, "foo")))

This was done by using this script in px.perl

   #!/usr/bin/perl -i.bak -p
   if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) {
           s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|;
   }
   if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) {
           s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|;
   }

and running:

   $ git grep -l strncmp -- '*.c' | xargs perl px.perl

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-20 22:03:15 -08:00
Andy Whitcroft
93d26e4cb9 short i/o: fix calls to read to use xread or read_in_full
We have a number of badly checked read() calls.  Often we are
expecting read() to read exactly the size we requested or fail, this
fails to handle interrupts or short reads.  Add a read_in_full()
providing those semantics.  Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xread().

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-08 15:44:47 -08:00
Junio C Hamano
85023577a8 simplify inclusion of system header files.
This is a mechanical clean-up of the way *.c files include
system header files.

 (1) sources under compat/, platform sha-1 implementations, and
     xdelta code are exempt from the following rules;

 (2) the first #include must be "git-compat-util.h" or one of
     our own header file that includes it first (e.g. config.h,
     builtin.h, pkt-line.h);

 (3) system headers that are included in "git-compat-util.h"
     need not be included in individual C source files.

 (4) "git-compat-util.h" does not have to include subsystem
     specific header files (e.g. expat.h).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20 09:51:35 -08:00
Junio C Hamano
36f2587ffb grep: do not skip unmerged entries when grepping in the working tree.
We used to skip unmerged entries, which made sense for grepping
in the cached copies, but not for grepping in the working tree.

Noticed by Johannes Sixt.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-26 14:22:01 -08:00
Junio C Hamano
0ab7befa31 grep --all-match
This lets you say:

	git grep --all-match -e A -e B -e C

to find lines that match A or B or C but limit the matches from
the files that have all of A, B and C.

This is different from

	git grep -e A --and -e B --and -e C

in that the latter looks for a single line that has all of these
at the same time.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27 23:59:09 -07:00
Junio C Hamano
b48fb5b6a9 grep: free expressions and patterns when done.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27 16:27:10 -07:00
Junio C Hamano
83b5d2f5b0 builtin-grep: make pieces of it available as library.
This makes three functions and associated option structures from
builtin-grep available from other parts of the system.

 * options to drive built-in grep engine is stored in struct
   grep_opt;

 * pattern strings and extended grep expressions are added to
   struct grep_opt with append_grep_pattern();

 * when finished calling append_grep_pattern(), call
   compile_grep_patterns() to prepare for execution;

 * call grep_buffer() to find matches in the in-core buffer.

This also adds an internal option "status_only" to grep_opt,
which suppresses any output from grep_buffer().  Callers of the
function as library can use it to check if there is a match
without producing any output.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20 11:14:38 -07:00
Linus Torvalds
7977f0ea53 Add "-h/-H" parsing to "git grep"
It turns out that I actually wanted to avoid the filenames (because I
didn't care - I just wanted to see the context in which something was
used) when doing a grep. But since "git grep" didn't take the "-h"
parameter, I ended up having to do "grep -5 -h *.c" instead.

So here's a trivial patch that adds "-h" (and thus has to enable -H too)
to "git grep" parsing.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-14 11:46:11 -07:00
Shawn Pearce
9befac470b Replace uses of strdup with xstrdup.
Like xmalloc and xrealloc xstrdup dies with a useful message if
the native strdup() implementation returns NULL rather than a
valid pointer.

I just tried to use xstrdup in new code and found it to be missing.
However I expected it to be present as xmalloc and xrealloc are
already commonly used throughout the code.

[jc: removed the part that deals with last_XXX, which I am
 finding more and more dubious these days.]

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-02 03:24:37 -07:00
Junio C Hamano
599f8d6314 builtin-grep.c: remove unused debugging piece.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23 18:47:39 -07:00
Junio C Hamano
6493cc09c2 builtin-grep: remove unused debugging cruft.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-16 13:58:32 -07:00
David Rientjes
b756776d19 builtin-grep.c cleanup
Removes conditional return.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-14 18:38:07 -07:00
Junio C Hamano
0d042fecf2 git-grep: show pathnames relative to the current directory
By default, the command shows pathnames relative to the current
directory.  Use --full-name (the same flag to do so in ls-files)
if you want to see the full pathname relative to the project root.

This makes it very pleasant to run in Emacs compilation (or
"grep-find") buffer.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-11 19:08:10 -07:00
Junio C Hamano
f25b79397c Fix "grep -w"
We used to find the first match of the pattern and then if the
match is not for the entire word, declared that the whole line
does not match.

But that is wrong.  The command "git grep -w -e mmap" should
find that a line "foo_mmap bar mmap baz" matches, by tring the
second instance of pattern "mmap" on the same line.

Problems an earlier round of "fix" had were pointed out by Morten
Welinder, which have been incorporated in the t7002 tests.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-06 01:37:08 -07:00
Linus Torvalds
a633fca0c0 Call setup_git_directory() much earlier
This changes the calling convention of built-in commands and
passes the "prefix" (i.e. pathname of $PWD relative to the
project root level) down to them.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-29 01:34:07 -07:00
Linus Torvalds
1974632c66 Remove TYPE_* constant macros and use object_type enums consistently.
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.

Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumeration.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-12 23:18:03 -07:00
Pavel Roskin
82e5a82fd7 Fix more typos, primarily in the code
The only visible change is that git-blame doesn't understand
"--compability" anymore, but it does accept "--compatibility" instead,
which is already documented.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-10 00:36:44 -07:00
Junio C Hamano
79d3696cfb git-grep: boolean expression on pattern matching.
This extends the behaviour of git-grep when multiple -e options
are given.  So far, we allowed multiple -e to behave just like
regular grep with multiple -e, i.e. the patterns are OR'ed
together.

With this change, you can also have multiple patterns AND'ed
together, or form boolean expressions, like this (the
parentheses are quoted from the shell in this example):

	$ git grep -e _PATTERN --and \( -e atom -e token \)

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-05 16:41:23 -07:00
Junio C Hamano
088b084bbb git-grep: use a bit more specific error messages.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-04 03:15:46 -07:00
Junio C Hamano
fcfe34b5ac git-grep: fix exit code when we use external grep.
Upon hit, we should exit with status 0.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-04 03:15:46 -07:00
Junio C Hamano
5390590f6d git-grep: fix parsing of pathspec separator '--'
We used to misparse

	git grep -e foo -- '*.sh'

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-04 03:15:46 -07:00
Junio C Hamano
3bec0da08d Merge branch 'jc/upload-corrupt' into next
* jc/upload-corrupt:
  upload-pack/fetch-pack: support side-band communication
  Retire git-clone-pack
  upload-pack: prepare for sideband message support.
  upload-pack: avoid sending an incomplete pack upon failure
  Fix possible out-of-bounds array access
2006-06-21 02:50:59 -07:00
Uwe Zeisberger
bb9e15a83c Fix possible out-of-bounds array access
If match is "", match[-1] is accessed.  Let pathspec_matches return 1 in that
case indicating that "" matches everything.

Incidently this fixes git-grep'ing in ".".

Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-21 02:30:46 -07:00
Linus Torvalds
1f1e895fcc Add "named object array" concept
We've had this notion of a "object_list" for a long time, which eventually
grew a "name" member because some users (notably git-rev-list) wanted to
name each object as it is generated.

That object_list is great for some things, but it isn't all that wonderful
for others, and the "name" member is generally not used by everybody.

This patch splits the users of the object_list array up into two: the
traditional list users, who want the list-like format, and who don't
actually use or want the name. And another class of users that really used
the list as an extensible array, and generally wanted to name the objects.

The patch is fairly straightforward, but it's also biggish. Most of it
really just cleans things up: switching the revision parsing and listing
over to the array makes things like the builtin-diff usage much simpler
(we now see exactly how many members the array has, and we don't get the
objects reversed from the order they were on the command line).

One of the main reasons for doing this at all is that the malloc overhead
of the simple object list was actually pretty high, and the array is just
a lot denser. So this patch brings down memory usage by git-rev-list by
just under 3% (on top of all the other memory use optimizations) on the
mozilla archive.

It does add more lines than it removes, and more importantly, it adds a
whole new infrastructure for maintaining lists of objects, but on the
other hand, the new dynamic array code is pretty obvious. The change to
builtin-diff-tree.c shows a fairly good example of why an array interface
is sometimes more natural, and just much simpler for everybody.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 18:45:48 -07:00
Linus Torvalds
885a86abe2 Shrink "struct object" a bit
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.

In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.

Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.

This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.

There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.

Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-17 18:49:18 -07:00