We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
With this patch,
$ git show -s \
--pretty=format:' Ze komit %h woss%n dunn buy ze great %an'
shows something like
Ze komit 04c5c88 woss
dunn buy ze great Junio C Hamano
The supported placeholders are:
'%H': commit hash
'%h': abbreviated commit hash
'%T': tree hash
'%t': abbreviated tree hash
'%P': parent hashes
'%p': abbreviated parent hashes
'%an': author name
'%ae': author email
'%ad': author date
'%aD': author date, RFC2822 style
'%ar': author date, relative
'%at': author date, UNIX timestamp
'%cn': committer name
'%ce': committer email
'%cd': committer date
'%cD': committer date, RFC2822 style
'%cr': committer date, relative
'%ct': committer date, UNIX timestamp
'%e': encoding
'%s': subject
'%b': body
'%Cred': switch color to red
'%Cgreen': switch color to green
'%Cblue': switch color to blue
'%Creset': reset color
'%n': newline
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
--pretty=o is a valid abbreviation, --pretty=omfg is not
Noticed by: Nicolas Vilz
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The internal function in_merge_bases(A, B) is used to make sure
that commit A is an ancestor of commit B. This changes the
signature of it to take an array of B's and updates its current
callers.
Signed-off-by: Junio C Hamano <junkio@cox.net>
We have a number of badly checked write() calls. Often we are
expecting write() to write exactly the size we requested or fail,
this fails to handle interrupts or short writes. Switch to using
the new write_in_full(). Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xwrite().
Note, the changes to config handling are much larger and handled
in the next patch in the sequence.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This modifies pretty_print_commit() to make the output of git-rev-list and
friends a bit more predictable.
A commit body starting with blank lines might be unheard-of, but still possible
to create using git-commit-tree (so is bound to appear somewhere, sometime).
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We special case the case where encoding recorded in the commit
and the output encoding are the same and do not call iconv().
But we should drop 'encoding' header for this case as well for
consistency.
Signed-off-by: Junio C Hamano <junkio@cox.net>
After re-coding the commit message into the encoding the user
specified (either with core.logoutputencidng or --encoding
option), this drops the "encoding" header altogether. The
output is after re-coding as the user asked (either with the
config or --encoding=<encoding> option), and the extra header
becomes redundant information.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Telling the git-log family not to do any character conversion is
done with --encoding=none, which sets log_output_encoding to an
empty string. However, logmsg_reencode() confused this with
log_output_encoding and commit_encoding set to NULL. The latter
means we should use the default encoding (i.e. utf-8).
Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/utf8:
t3900: test conversion to non UTF-8 as well
Rename t3900 test vector file
UTF-8: introduce i18n.logoutputencoding.
Teach log family --encoding
i18n.logToUTF8: convert commit log message to UTF-8
Move encoding conversion routine out of mailinfo to utf8.c
Conflicts:
commit.c
It is plausible for somebody to want to view the commit log in a
different encoding from i18n.commitencoding -- the project's
policy may be UTF-8 and the user may be using a commit message
hook to run iconv to conform to that policy (and either not have
i18n.commitencoding to default to UTF-8 or have it explicitly
set to UTF-8). Even then, Latin-1 may be more convenient for
the usual pager and the terminal the user uses.
The new variable i18n.logoutputencoding is used in preference to
i18n.commitencoding to decide what encoding to recode the log
output in when git-log and friends formats the commit log message.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is to adjust to:
count-objects -v: show number of packs as well.
which will break a test in this series.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Updated commit objects record the encoding used in their
encoding header. This updates the log family to reencode it
into the encoding specified in i18n.commitencoding (or the
default, which is "utf-8") upon output.
To force a specific encoding that is different, log family takes
command line flag --encoding=<encoding>; giving --encoding=none
entirely disables the reencoding and lets you view log messges
in their original encoding.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The output from "symmetric diff", i.e. A...B, does not
distinguish between commits that are reachable from A and the
ones that are reachable from B. In this picture, such a
symmetric diff includes commits marked with a and b.
x---b---b branch B
/ \ /
/ .
/ / \
o---x---a---a branch A
However, you cannot tell which ones are 'a' and which ones are
'b' from the output. Sometimes this is frustrating. This adds
an output option, --left-right, to rev-list.
rev-list --left-right A...B
would show ones reachable from A prefixed with '<' and the ones
reachable from B prefixed with '>'.
When combined with --boundary, boundary commits (the ones marked
with 'x' in the above picture) are shown with prefix '-', so you
would see list that looks like this:
git rev-list --left-right --boundary --pretty=oneline A...B
>bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 3rd on b
>bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 2nd on b
<aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 3rd on a
<aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 2nd on a
-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1st on b
-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1st on a
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now, by saying "git fetch -depth <n> <repo>" you can deepen
a shallow repository.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
A shallow commit is a commit which has parents, which in turn are
"grafted away", i.e. the commit appears as if it were a root.
Since these shallow commits should not be edited by the user, but
only by core git, they are recorded in the file $GIT_DIR/shallow.
A repository containing shallow commits is called shallow.
The advantage of a shallow repository is that even if the upstream
contains lots of history, your local (shallow) repository needs not
occupy much disk space.
The disadvantage is that you might miss a merge base when pulling
some remote branch.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
I was running git show on various commits found by fsck-objects
when I found this bug. Since find_unique_abbrev() cannot find
an abbreviation for an object not in the database, it will
return NULL, which is bad to run strlen() on. So instead, we'll
just display the unabbreviated sha1 that we referenced in the
commit.
I'm not sure that this is the best 'fix' for it because the
commit I was trying to show was broken, but I don't think a
program should segfault even if the user tries to do something
stupid.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
I noticed that I was looking at the kernel gitweb output at some point
rather than just do "git log", simply because I liked seeing the
simplified date-format, ie the "5 days ago" rather than a full date.
This adds infrastructure to do that for "git log" too. It does NOT add the
actual flag to enable it, though, so right now this patch is a no-op, but
it should now be easy to add a command line flag (and possibly a config
file option) to just turn on the "relative" date format.
The exact cut-off points when it switches from days to weeks etc are
totally arbitrary, but are picked somewhat to avoid the "1 weeks ago"
thing (by making it show "10 days ago" rather than "1 week", or "70
minutes ago" rather than "1 hour ago").
[jc: with minor fix and tweak around "month" and "week" area.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Introduces global inline:
hashcmp(const unsigned char *sha1, const unsigned char *sha2)
Uses memcmp for comparison and returns the result based on the length of
the hash name (a future runtime decision).
Acked-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
[jc: I needed to hand merge the changes to the updated codebase,
so the result needs to be checked.]
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
format-patch previously didn't generate a newline after a subject. This
caused the diffstat to not be displayed in messages with only one line
for the commit message.
This patch fixes this by adding a newline after the headers if a body
hasn't been added.
Signed-off-by: Robert Shearman <rob@codeweavers.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.
Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumeration.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This removes the "contaminate the well even more" approach
taken in the current merge-base postprosessing code. Instead,
when there are more than one merge-base results, we compute the
merge-base between them and see if one is a fast-forward of the
other, in which case the ancestor is removed from the result.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Fix clear_commit_marks() enough to be usable in
get_merge_bases(), and retire now unused clear_object_marks().
Signed-off-by: Junio C Hamano <junkio@cox.net>
Earlier change broke "git describe A B" among other things.
Revert it for now, and clean the commits smudged by
get_merge_bases using clear_object_marks() function. For
complex commit ancestry graph, this is way cheaper as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Actually in this case we would have traversed a lot of commits, so cleaning
things up is even more important.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Change get_merge_bases() to be able to clean up after itself if
needed by adding a cleanup parameter.
We don't need to save the flags and restore them afterwards anymore;
that was a leftover from before the flags were moved out of the
range used in revision.c. clear_commit_marks() sets them to zero,
which is enough.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Don't care if objects have been parsed or not and don't stop when we
reach a commit that is already clean -- its parents could be dirty.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add get_merge_bases_clean(), a wrapper for get_merge_bases() that cleans
up after doing its work and make get_merge_bases() NOT clean up.
Single-shot programs like git-merge-base can use the dirty and fast
version.
Also move the object flags used in get_merge_bases() out of the range
defined in revision.h. This fixes the "66ae0c77...ced9456a
89719209...262a6ef7" test of the ... operator which is introduced with
the next patch.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This creates a simple specialized object allocator for basic
objects.
This avoids wasting space with malloc overhead (metadata and
extra alignment), since the specialized allocator knows the
alignment, and that objects, once allocated, are never freed.
It also allows us to track some basic statistics about object
allocations. For example, for the mozilla import, it shows
object usage as follows:
blobs: 627629 (14710 kB)
trees: 1119035 (34969 kB)
commits: 196423 (8440 kB)
tags: 1336 (46 kB)
and the simpler allocator shaves off about 2.5% off the memory
footprint off a "git-rev-list --all --objects", and is a bit
faster too.
[ Side note: this concludes the series of "save memory in object storage".
The thing is, there simply isn't much more to be saved on the objects.
Doing "git-rev-list --all --objects" on the mozilla archive has a final
total RSS of 131498 pages for me: that's about 513MB. Of that, the
object overhead is now just 56MB, the rest is going somewhere else (put
another way: the fact that this patch shaves off 2.5% of the total
memory overhead, considering that objects are now not much more than 10%
of the total shows how big the wasted space really was: this makes
object allocations much more memory- and time-efficient).
I haven't looked at where the rest is, but I suspect the bulk of it is
just the pack-file loading. It may be that we should pack the tree
objects separately from the blob objects: for git-rev-list --objects, we
don't actually ever need to even look at the blobs, but since trees and
blobs are interspersed in the pack-file, we end up not being dense in
the tree accesses, so we end up looking at more pages than we strictly
need to.
So with a 535MB pack-file, it's entirely possible - even likely - that
most of the remaining RSS is just the mmap of the pack-file itself. We
don't need to map in _all_ of it, but we do end up mapping a fair
amount. ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Every single user actually wanted this only for commit objects, and we
have no reason to waste space on it for other object types. So just move
the structure member from the low-level "struct object" into the "struct
commit".
This leaves the commit object the same size, and removes one unnecessary
pointer from all other object allocations.
This shrinks memory usage (still at a fairly hefty half-gig, admittedly)
of "git-rev-list --all --objects" on the mozilla repo by another 5% in my
tests.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.
In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.
This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.
There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.
Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When --attach is not used, usually we do not say Content-Type:
and fluff, but if the commit message is not 7-bit ASCII, mark
it as "text/plain; charset=UTF-8". This unclutters output
somewhat.
Signed-off-by: Junio C Hamano <junkio@cox.net>
By convention, the commit message and the author/committer names
in the commit objects are UTF-8 encoded. When formatting for
e-mails, Q-encode them according to RFC 2047.
While we are at it, generate the content-type and
content-transfer-encoding headers as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch touches a couple of files, because it adds options to print a
custom text just after the subject of a commit, and just after the
diffstat.
[jc: made "many dashes" used as the boundary leader into a single
variable, to reduce the possibility of later tweaks to miscount the
number of dashes to break it.]
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* master: (119 commits)
diff family: add --check option
Document that "git add" only adds non-ignored files.
Add a conversion tool to migrate remote information into the config
fetch, pull: ask config for remote information
Fix build procedure for builtin-init-db
read-tree -m -u: do not overwrite or remove untracked working tree files.
apply --cached: do not check newly added file in the working tree
Implement a --dry-run option to git-quiltimport
Implement git-quiltimport
Revert "builtin-grep: workaround for non GNU grep."
builtin-grep: workaround for non GNU grep.
builtin-grep: workaround for non GNU grep.
git-am: use apply --cached
apply --cached: apply a patch without using working tree.
apply --numstat: show new name, not old name.
Documentation/Makefile: create tarballs for the man pages and html files
Allow pickaxe and diff-filter options to be used by git log.
Libify the index refresh logic
Builtin git-init-db
Remove unnecessary local in get_ref_sha1.
...