This creates a simple specialized object allocator for basic
objects.
This avoids wasting space with malloc overhead (metadata and
extra alignment), since the specialized allocator knows the
alignment, and that objects, once allocated, are never freed.
It also allows us to track some basic statistics about object
allocations. For example, for the mozilla import, it shows
object usage as follows:
blobs: 627629 (14710 kB)
trees: 1119035 (34969 kB)
commits: 196423 (8440 kB)
tags: 1336 (46 kB)
and the simpler allocator shaves off about 2.5% off the memory
footprint off a "git-rev-list --all --objects", and is a bit
faster too.
[ Side note: this concludes the series of "save memory in object storage".
The thing is, there simply isn't much more to be saved on the objects.
Doing "git-rev-list --all --objects" on the mozilla archive has a final
total RSS of 131498 pages for me: that's about 513MB. Of that, the
object overhead is now just 56MB, the rest is going somewhere else (put
another way: the fact that this patch shaves off 2.5% of the total
memory overhead, considering that objects are now not much more than 10%
of the total shows how big the wasted space really was: this makes
object allocations much more memory- and time-efficient).
I haven't looked at where the rest is, but I suspect the bulk of it is
just the pack-file loading. It may be that we should pack the tree
objects separately from the blob objects: for git-rev-list --objects, we
don't actually ever need to even look at the blobs, but since trees and
blobs are interspersed in the pack-file, we end up not being dense in
the tree accesses, so we end up looking at more pages than we strictly
need to.
So with a 535MB pack-file, it's entirely possible - even likely - that
most of the remaining RSS is just the mmap of the pack-file itself. We
don't need to map in _all_ of it, but we do end up mapping a fair
amount. ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Every single user actually wanted this only for commit objects, and we
have no reason to waste space on it for other object types. So just move
the structure member from the low-level "struct object" into the "struct
commit".
This leaves the commit object the same size, and removes one unnecessary
pointer from all other object allocations.
This shrinks memory usage (still at a fairly hefty half-gig, admittedly)
of "git-rev-list --all --objects" on the mozilla repo by another 5% in my
tests.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.
In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.
This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.
There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.
Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When --attach is not used, usually we do not say Content-Type:
and fluff, but if the commit message is not 7-bit ASCII, mark
it as "text/plain; charset=UTF-8". This unclutters output
somewhat.
Signed-off-by: Junio C Hamano <junkio@cox.net>
By convention, the commit message and the author/committer names
in the commit objects are UTF-8 encoded. When formatting for
e-mails, Q-encode them according to RFC 2047.
While we are at it, generate the content-type and
content-transfer-encoding headers as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch touches a couple of files, because it adds options to print a
custom text just after the subject of a commit, and just after the
diffstat.
[jc: made "many dashes" used as the boundary leader into a single
variable, to reduce the possibility of later tweaks to miscount the
number of dashes to break it.]
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* master: (119 commits)
diff family: add --check option
Document that "git add" only adds non-ignored files.
Add a conversion tool to migrate remote information into the config
fetch, pull: ask config for remote information
Fix build procedure for builtin-init-db
read-tree -m -u: do not overwrite or remove untracked working tree files.
apply --cached: do not check newly added file in the working tree
Implement a --dry-run option to git-quiltimport
Implement git-quiltimport
Revert "builtin-grep: workaround for non GNU grep."
builtin-grep: workaround for non GNU grep.
builtin-grep: workaround for non GNU grep.
git-am: use apply --cached
apply --cached: apply a patch without using working tree.
apply --numstat: show new name, not old name.
Documentation/Makefile: create tarballs for the man pages and html files
Allow pickaxe and diff-filter options to be used by git log.
Libify the index refresh logic
Builtin git-init-db
Remove unnecessary local in get_ref_sha1.
...
Still Work-in-progress git fmt-patch (should it be known as
format-patch-ng?) is matched with the fix made by Huw Davies
in 262a6ef76a commit to use
RFC2822 date format.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Updating "subject" variable without changing the hardcoded
number of bytes to memcpy from it would not help much.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This only does --stdout right now. To write into separate files
with pretty-printed filenames like the real thing does, it needs
a bit mroe work.
Signed-off-by: Junio C Hamano <junkio@cox.net>
In addition to the existing comment support, that just allows the user
to use a convention that works pretty much everywhere else.
Signed-off-by: Yann Dirson <ydirson@altern.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Partly because we've messed up and now have some commits with trailing
whitespace, but partly because this also just simplifies the code, let's
remove trailing whitespace from the end when pretty-printing commits.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds --date-order to rev-list; it is similar to topo order
in the sense that no parent comes before all of its children,
but otherwise things are still ordered in the commit timestamp
order.
The same flag is also added to show-branch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Earlier, f2d4227530 commit broke Merge:
lines for unabbreviated case. Do not emit extra dots if we do not
abbreviate.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When displaying Merge: lines, we used to take the real commit
parents from the commit objects. Use the parsed parents from
the commit object instead, so that we honor fake parent
information from info/grafts.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When describing more than one, we need to clear the commit marks
before handling the next one, but most of the time we are
running it for only one commit, and in such a case this clearing
phase is totally unnecessary.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The main loop was prepared to take more than one revs, but the actual
naming logic wad not (it used pop_most_recent_commit while forgetting
that the commit marks stay after it's done).
Signed-off-by: Junio C Hamano <junkio@cox.net>
ISO C99 (and GCC 3.x or later) lets you write a flexible array
at the end of a structure, like this:
struct frotz {
int xyzzy;
char nitfol[]; /* more */
};
GCC 2.95 and 2.96 let you to do this with "char nitfol[0]";
unfortunately this is not allowed by ISO C90.
This declares such construct like this:
struct frotz {
int xyzzy;
char nitfol[FLEX_ARRAY]; /* more */
};
and git-compat-util.h defines FLEX_ARRAY to 0 for gcc 2.95 and
empty for others.
If you are using a C90 C compiler, you should be able
to override this with CFLAGS=-DFLEX_ARRAY=1 from the
command line of "make".
Signed-off-by: Junio C Hamano <junkio@cox.net>
dietlibc versions of malloc, calloc and realloc all return NULL if
they're told to allocate 0 bytes, causes the x* wrappers to die().
There are several more places where these calls could end up asking
for 0 bytes, too...
Maybe simply not die()-ing in the x* wrappers if 0/NULL is returned
when the requested size is zero is a safer and easier way to go.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This fixes git-rev-list so that when there are multiple branches, we still
sort the heads in proper approximate date order even when sorting the
output topologically.
This makes things like
gitk --all -d
work sanely and show the branches in date order (where "date order" is
obviously modified by the paren-child dependency requirements of the
topological sort).
The trivial fix is to just build the "work" list in date order rather than
inserting the new work entries at the beginning.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
git log without --pretty showed author and author-date, while
with --pretty=full showed author and committer but no dates.
The new formatting option, --pretty=fuller, shows both name and
timestamp for author and committer.
Signed-off-by: Junio C Hamano <junkio@cox.net>
One caller of deref_tag() was not careful enough to make sure
what deref_tag() returned was not NULL (i.e. we found a tag
object that points at an object we do not have). Fix it, and
warn about refs that point at such an incomplete tag where
needed.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Do our own ctype.h, just to get the sane semantics: we want
locale-independence, _and_ we want the right signed behaviour. Plus we
only use a very small subset of ctype.h anyway (isspace, isalpha,
isdigit and isalnum).
Signed-off-by: Junio C Hamano <junkio@cox.net>
As pointed out on the list, git-rev-list can use a lot of memory.
One low-hanging fruit is to free the commit buffer for commits that we
parse. By default, parse_commit() will save away the buffer, since a lot
of cases do want it, and re-reading it continually would be unnecessary.
However, in many cases the buffer isn't actually necessary and saving it
just wastes memory.
We could just free the buffer ourselves, but especially in git-rev-list,
we actually end up using the helper functions that automatically add
parent commits to the commit lists, so we don't actually control the
commit parsing directly.
Instead, just make this behaviour of "parse_commit()" a global flag.
Maybe this is a bit tasteless, but it's very simple, and it makes a
noticable difference in memory usage.
Before the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.02system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3714minor)pagefaults 0swaps
after the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.00system 0:00.27elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2433minor)pagefaults 0swaps
note how the minor faults have decreased from 3714 pages to 2433 pages.
That's all due to the fewer anonymous pages allocated to hold the comment
buffers and their metadata.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This reverts 6c5f9baa3b commit, whose
change breaks gcc-2.95.
Not that I ignore portability to compilers that are properly C99, but
keeping compilation with GCC working is more important, at least for
now. We would probably end up declaring with "name[1]" and teach the
allocator to subtract one if we really aimed for portability, but that
is left for later rounds.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The 'git show-branches' command turns out to be reasonably useful,
but painfully slow. So rewrite it in C, using ideas from merge-base
while enhancing it a bit more.
- Unlike show-branches, it can take --heads (show me all my
heads), --tags (show me all my tags), or --all (both).
- It can take --more=<number> to show beyond the merge-base.
- It shows the short name for each commit in the extended SHA1
syntax.
- It can find merge-base for more than two heads.
Examples:
$ git show-branch --more=6 HEAD
is almost the same as "git log --pretty=oneline --max-count=6".
$ git show-branch --merge-base master mhf misc
finds the merge base of the three given heads.
$ git show-branch master mhf misc
shows logs from the top of these three branch heads, up to their
common ancestor commit is shown.
$ git show-branch --all --more=10
is poor-man's gitk, showing all the tags and heads, and
going back 10 commits beyond the merge base of those refs.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This introduces --pretty=oneline to git-rev-tree and
git-rev-list commands to show only the first line of the commit
message, without frills.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Again I left the v2.6.11-tree tag behind. My bad.
This commit makes sure that we do not barf when pushing a ref
that is a non-commitish tag. You can update a remote ref under
the following conditions:
* You can always use --force.
* Creating a brand new ref is OK.
* If the remote ref is exactly the same as what you are
pushing, it is OK (nothing is pushed).
* You can replace a commitish with another commitish which is a
descendant of it, if you can verify the ancestry between them;
this and the above means you have to have what you are replacing.
* Otherwise you cannot update; you need to use --force.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Introduce a new file $GIT_DIR/info/grafts (or $GIT_GRAFT_FILE)
which is a list of "fake commit parent records". Each line of
this file is a commit ID, followed by parent commit IDs, all
40-byte hex SHA1 separated by a single SP in between. The
records override the parent information we would normally read
from the commit objects, allowing both adding "fake" parents
(i.e. grafting), and pretending as if a commit is not a child of
some of its real parents (i.e. cauterizing).
Signed-off-by: Junio C Hamano <junkio@cox.net>
This was brought on by a bad tree of Thomas Gleixner, where some bogus
commit objects weren't warned about properly
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we allow a tag object in place of a commit object, we only
dereferenced the given tag once, which causes a tag that points at a tag
that points at a commit to be rejected. Instead, dereference tag
repeatedly until we get a non-tag.
This patch makes change to two functions:
- commit.c::lookup_commit_reference() is used by merge-base,
rev-tree and rev-parse to convert user supplied SHA1 to that of
a commit.
- rev-list uses its own get_commit_reference() to do the same.
Dereferencing tags this way helps both of these uses.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces an in-place topological sort procedure to commit.c.
Given a list of commits, sort_in_topological_order() will perform an in-place
topological sort of that list.
The invariant that applies to the resulting list is:
a reachable from b => ord(b) < ord(a)
This invariant is weaker than the --merge-order invariant, but is cheaper
to calculate (assuming the list has been identified) and will serve any
purpose where only a minimal topological order guarantee is required.
Signed-off-by: Jon Seymour <jon.seymour@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Otherwise the "git log" information doesn't tell enough to make sense of
a merge.
I'll need to add some parent information for regular entries too, I
think, but the merge is more important.
Make 'sha1' parameters const where possible
Signed-off-by: Jason McMullan <jason.mcmullan@timesys.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch linearises the GIT commit history graph into merge order
which is defined by invariants specified in Documentation/git-rev-list.txt.
The linearisation produced by this patch is superior in an objective sense
to that produced by the existing git-rev-list implementation in that
the linearisation produced is guaranteed to have the minimum number of
discontinuities, where a discontinuity is defined as an adjacent pair of
commits in the output list which are not related in a direct child-parent
relationship.
With this patch a graph like this:
a4 ---
| \ \
| b4 |
|/ | |
a3 | |
| | |
a2 | |
| | c3
| | |
| | c2
| b3 |
| | /|
| b2 |
| | c1
| | /
| b1
a1 |
| |
a0 |
| /
root
Sorts like this:
= a4
| c3
| c2
| c1
^ b4
| b3
| b2
| b1
^ a3
| a2
| a1
| a0
= root
Instead of this:
= a4
| c3
^ b4
| a3
^ c2
^ b3
^ a2
^ b2
^ c1
^ a1
^ b1
^ a0
= root
A test script, t/t6000-rev-list.sh, includes a test which demonstrates
that the linearisation produced by --merge-order has less discontinuities
than the linearisation produced by git-rev-list without the --merge-order
flag specified. To see this, do the following:
cd t
./t6000-rev-list.sh
cd trash
cat actual-default-order
cat actual-merge-order
The existing behaviour of git-rev-list is preserved, by default. To obtain
the modified behaviour, specify --merge-order or --merge-order --show-breaks
on the command line.
This version of the patch has been tested on the git repository and also on the linux-2.6
repository and has reasonable performance on both - ~50-100% slower than the original algorithm.
This version of the patch has incorporated a functional equivalent of the Linus' output limiting
algorithm into the merge-order algorithm itself. This operates per the notes associated
with Linus' commit 337cb3fb8d.
This version has incorporated Linus' feedback regarding proposed changes to rev-list.c.
(see: [PATCH] Factor out filtering in rev-list.c)
This version has improved the way sort_first_epoch marks commits as uninteresting.
For more details about this change, refer to Documentation/git-rev-list.txt
and http://blackcubes.dyndns.org/epoch/.
Signed-off-by: Jon Seymour <jon.seymour@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
You can ask to print out "raw" format (full headers, full body),
"medium" format (author and date, full body) or "short" format
(author only, condensed body).
Use "git-rev-list --pretty=short HEAD | less -S" for an example.
object.
A fair number of the users potentially want to look at the
commit objects more closely, and if you worry about memory
leaking in certain applications, you can always do a
free(commit->buffer);
commit->buffer = NULL;
by hand after parsing them.
Add <limits.h> to the include files handled by "cache.h", and remove
extraneous #include directives from various .c files. The rule is that
"cache.h" gets all the basic stuff, so that we'll have as few system
dependencies as possible.
This adds knowledge of delta objects to fsck-cache and various object
parsing code. A new switch to git-fsck-cache is provided to display the
maximum delta depth found in a repository.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It turns out that parse_object() is loading and decompressing given
object to free it just before calling the specific object parsing
function which does mmap and decompress the same object again. This
patch introduces the ability to parse specific objects directly from a
memory buffer.
Without this patch, running git-fsck-cache on the kernel repositorytake:
real 0m13.006s
user 0m11.421s
sys 0m1.218s
With this patch applied:
real 0m8.060s
user 0m7.071s
sys 0m0.710s
The performance increase is significant, and this is kind of a
prerequisite for sane delta object support with fsck.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes memory leaks in parse_object() and related functions;
these leaks were very noticeable when running git-fsck-cache.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce xmalloc and xrealloc to die gracefully with a descriptive
message when out of memory, rather than taking a SIGSEGV.
Signed-off-by: Christopher Li<chrislgit@chrisli.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make pop_most_recent_commit() return the same objects multiple times, but only
if called with different bits to mark.
This is necessary to make merge-base work again.
Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds a function for inserting an item in a commit list, a function
for sorting a commit list by date, and a function for progressively
scanning a commit history from most recent to least recent.
Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>