"git diff --find-copies-harder" sometimes pretended as if the mode
bits have changed for paths that are marked with assume-unchanged
bit.
* jk/diff-files-assume-unchanged:
run_diff_files: do not look at uninitialized stat data
"git commit --allow-empty-message -C $commit" did not work when the
commit did not have any log message.
* jk/commit-C-pick-empty:
commit: do not complain of empty messages from -C
"git blame" assigned the blame to the copy in the working-tree if
the repository is set to core.autocrlf=input and the file used CRLF
line endings.
* bc/blame-crlf-test:
blame: correctly handle files regardless of autocrlf
"git blame" miscounted number of columns needed to show localized
timestamps, resulting in jaggy left-side-edge of the source code
lines in its output.
* jx/blame-align-relative-time:
blame: dynamic blame_date_width for different locales
blame: fix broken time_buf paddings in relative timestamp
"--ignore-space-change" option of "git apply" ignored the spaces
at the beginning of line too aggressively, which is inconsistent
with the option of the same name "diff" and "git diff" have.
* jc/apply-ignore-whitespace:
apply --ignore-space-change: lines with and without leading whitespaces do not match
The completion scripts (in contrib/) did not know about quite a few
options that are common between "git merge" and "git pull", and a
couple of options unique to "git merge".
* jk/complete-merge-pull:
completion: add missing options for git-merge
completion: add a note that merge options are shared
The "mailmap.file" configuration option did not support the tilde
expansion (i.e. ~user/path and ~/path).
* ow/config-mailmap-pathname:
config: respect '~' and '~user' in mailmap.file
The "%<(10,trunc)%s" pretty format specifier in the log family of
commands is used to truncate the string to a given length (e.g. 10
in the example) with padding to column-align the output, but did
not take into account that number of bytes and number of display
columns are different.
* as/pretty-truncate:
pretty.c: format string with truncate respects logOutputEncoding
t4205, t6006: add tests that fail with i18n.logOutputEncoding set
t4205 (log-pretty-format): use `tformat` rather than `format`
t4041, t4205, t6006, t7102: don't hardcode tested encoding value
t4205 (log-pretty-formats): don't hardcode SHA-1 in expected outputs
"git log -2master" is a common typo that shows two commits starting
from whichever random branch that is not 'master' that happens to
be checked out currently.
* jc/revision-dash-count-parsing:
revision: parse "git log -<count>" more carefully
Reworded the error message given upon a failure to open an existing
loose object file due to e.g. permission issues; it was reported as
the object being corrupt, but that is not quite true.
* jk/report-fail-to-read-objects-better:
open_sha1_file: report "most interesting" errno
Tools that read diagnostic output in our standard error stream do
not want to see terminal control sequence (e.g. erase-to-eol).
Detect them by checking if the standard error stream is connected
to a tty.
* mn/sideband-no-ansi:
sideband.c: do not use ANSI control sequence on non-terminal
We used to unconditionally disable the pager in the pager process
we spawn to feed out output, but that prevented people who want to
run "less" within "less" from doing so.
* je/pager-do-not-recurse:
pager: do allow spawning pager recursively
31b808a0 (clone --single: limit the fetch refspec to fetched branch,
2012-09-20) tried to see if the given "branch" to follow is actually
a tag at the remote repository by checking with "refs/tags/" but it
incorrectly used strstr(3); it is actively wrong to treat a "branch"
"refs/heads/refs/tags/foo" and use the logic for the "refs/tags/"
ref hierarchy. What the code really wanted to do is to see if it
starts with "refs/tags/".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the user asks for --format=%G with nothing else, we
correctly realize that "%G" is not a valid placeholder (it
should be "%G?", "%GK", etc). But we still tell the
strbuf_expand code that we consumed 2 characters, causing it
to jump over the trailing NUL and output garbage.
This also fixes the case where "%GX" would be consumed (and
produce no output). In other cases, we pass unrecognized
placeholders through to the final string.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not check these along with the other pretty-format
placeholders in t6006, because we need signed commits to
make them interesting. t7510 has such commits, and can
easily exercise them in addition to the regular
--show-signature code path.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We tested both good and bad signatures, but not ones made
correctly but with a key for which we have no trust.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check multiple commits in a loop. Because we want to
break out of the loop if any single iteration fails, we use
a subshell/exit like:
(
for i in $stuff
do
do-something $i || exit 1
done
)
However, we are inconsistent in our loop body. Some commands
get their own "|| exit 1", and others try to chain to the
next command with "&&", like:
X &&
Y || exit 1
Z || exit 1
This is a little hard to read and follow, because X and Y
are treated differently for no good reason. But much worse,
the second loop follows a similar pattern and gets it wrong.
"Y" is expected to fail, so we use "&& exit 1", giving us:
X &&
Y && exit 1
Z || exit 1
That gets the test for X wrong (we do not exit unless both X
fails and Y unexpectedly succeeds, but we would want to exit
if _either_ is wrong). We can write this clearly and
correctly by consistently using "&&", followed by a single
"|| exit 1", and negating Y with "!" (as we would in a
normal &&-chain). Like:
X &&
! Y &&
Z || exit 1
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our setup creates a sequence of commits, each with its own
tag. However, we sometimes refer to "seventh-signed" as
"master". This works, since it is at the tip of the created
branch, but is brittle if new tests need to add more
commits. Let's use its tag name to be unambiguous.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If git rebase --merge encountered a conflict, --skip would not work if the
next commit also conflicted. The msgnum file would never be updated with
the new patch number, so no patch would actually be skipped, resulting in an
inescapable loop.
Update the msgnum file's value as the first thing in call_merge. This also
avoids an "Already applied" message when skipping a commit. There is no
visible change for the other contexts in which call_merge is invoked, as the
msgnum file's value remains unchanged in those situations.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git submodule sync" command supports the --recursive flag, but
the documentation does not mention this. That flag is useful, for
example when a remote is changed in a submodule of a submodule.
Signed-off-by: Matthew Chen <charlesmchen@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The original used to pass with /bin/dash but not with /bin/bash set
to $SHELL_PATH. The former turns "\\" into "\", but the latter does
not.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call show_signature or show_mergetag, we read the
commit object fresh via read_sha1_file and reparse its
headers. However, in most cases we already have the object
data available, attached to the "struct commit". This is
partially laziness in dealing with the memory allocation
issues, but partially defensive programming, in that we
would always want to verify a clean version of the buffer
(not one that might have been munged by other users of the
commit).
However, we do not currently ever munge the commit buffer,
and not using the already-available buffer carries a fairly
big performance penalty when we are looking at a large
number of commits. Here are timings on linux.git:
[baseline, no signatures]
$ time git log >/dev/null
real 0m4.902s
user 0m4.784s
sys 0m0.120s
[before]
$ time git log --show-signature >/dev/null
real 0m14.735s
user 0m9.964s
sys 0m0.944s
[after]
$ time git log --show-signature >/dev/null
real 0m9.981s
user 0m5.260s
sys 0m0.936s
Note that our user CPU time drops almost in half, close to
the non-signature case, but we do still spend more
wall-clock and system time, presumably from dealing with
gpg.
An alternative to this is to note that most commits do not
have signatures (less than 1% in this repo), yet we pay the
re-parsing cost for every commit just to find out if it has
a mergetag or signature. If we checked that when parsing the
commit initially, we could avoid re-examining most commits
later on. Even if we did pursue that direction, however,
this would still speed up the cases where we _do_ have
signatures. So it's probably worth doing either way.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.
For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.
This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will make it easier to manage the buffer cache
independently of the "struct commit" objects. It also
shrinks "struct commit" by one pointer, which may be
helpful.
Unfortunately it does not reduce the max memory size of
something like "rev-list", because rev-list uses
get_cached_commit_buffer() to decide not to show each
commit's output (and due to the design of slab_at, accessing
the slab requires us to extend it, allocating exactly the
same number of buffer pointers we dropped from the commit
structs).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callers currently must use init_foo_slab() at runtime before
accessing a slab. For global slabs, it's much nicer if we
can initialize them in BSS, so that each user does not have
to add code to check-and-initialize.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these sites assumes that commit->buffer is valid.
Since they would segfault if this was not the case, they are
likely to be correct in practice. However, we can
future-proof them by using get_commit_buffer.
And as a side effect, we abstract away the final bare uses
of commit->buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like the callsites in the previous commit, logmsg_reencode
already falls back to read_sha1_file when necessary.
However, I split its conversion out into its own commit
because it's a bit more complex.
We return either:
1. The original commit->buffer
2. A newly allocated buffer from read_sha1_file
3. A reencoded buffer (based on either 1 or 2 above).
while trying to do as few extra reads/allocations as
possible. Callers currently free the result with
logmsg_free, but we can simplify this by pointing them
straight to unuse_commit_buffer. This is a slight layering
violation, in that we may be passing a buffer from (3).
However, since the end result is to free() anything except
(1), which is unlikely to change, and because this makes the
interface much simpler, it's a reasonable bending of the
rules.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For both of these sites, we already do the "fallback to
read_sha1_file" trick. But we can shorten the code by just
using get_commit_buffer.
Note that the error cases are slightly different when
read_sha1_file fails. get_commit_buffer will die() if the
object cannot be loaded, or is a non-commit.
For get_sha1_oneline, this will almost certainly never
happen, as we will have just called parse_object (and if it
does, it's probably worth complaining about).
For record_author_date, the new behavior is probably better;
we notify the user of the error instead of silently ignoring
it. And because it's used only for sorting by author-date,
somebody examining a corrupt repo can fallback to the
regular traversal order.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some call sites check commit->buffer to see whether we have
a cached buffer, and if so, do some work with it. In the
long run we may want to switch these code paths to make
their decision on a different boolean flag (because checking
the cache may get a little more expensive in the future).
But for now, we can easily support them by converting the
calls to use get_cached_commit_buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many sites look at commit->buffer to get more detailed
information than what is in the parsed commit struct.
However, we sometimes drop commit->buffer to save memory,
in which case the caller would need to read the object
afresh. Some callers do this (leading to duplicated code),
and others do not (which opens the possibility of a segfault
if somebody else frees the buffer).
Let's provide a pair of helpers, "get" and "unuse", that let
callers easily get the buffer. They will use the cached
buffer when possible, and otherwise load from disk using
read_sha1_file.
Note that we also need to add a "get_cached" variant which
returns NULL when we do not have a cached buffer. At first
glance this seems to defeat the purpose of "get", which is
to always provide a return value. However, some log code
paths actually use the NULL-ness of commit->buffer as a
boolean flag to decide whether to try printing the
commit. At least for now, we want to continue supporting
that use.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now this is just a one-liner, but abstracting it will
make it easier to change later.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is not C. The code would break under mksh when 'pull.ff' is set:
$ git pull
/usr/lib/git-core/git-pull[67]: break: can't break
Already up-to-date.
Signed-off-by: Jacek Konieczny <jajcus@jajcus.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This simplifies the code, as logmsg_reencode handles the
reencoding for us in a single call. It also means we learn
logmsg_reencode's trick of pulling the buffer from disk when
commit->buffer is NULL (we currently just silently return!).
It is doubtful this matters in practice, though, as
sequencer operations would not generally turn off
save_commit_buffer.
Note that we may be fixing a bug here. The existing code
does:
if (same_encoding(to, from))
reencode_string(buf, to, from);
That probably should have been "!same_encoding".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return value from logmsg_reencode may be either a newly
allocated buffer or a pointer to the existing commit->buffer.
We would not want the caller to accidentally free() or
modify the latter, so let's mark it as const. We can cast
away the constness in logmsg_free, but only once we have
determined that it is a free-able buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In both blame and merge-recursive, we sometimes create a
"fake" commit struct for convenience (e.g., to represent the
HEAD state as if we would commit it). By allocating
ourselves rather than using alloc_commit_node, we do not
properly set the "index" field of the commit. This can
produce subtle bugs if we then use commit-slab on the
resulting commit, as we will share the "0" index with
another commit.
We can fix this by using alloc_commit_node() to allocate.
Note that we cannot free the result, as it is part of our
commit allocator. However, both cases were already leaking
the allocated commit anyway, so there's nothing to fix up.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever we create a commit object via lookup_commit, we
give it a unique index to be used with the commit-slab API.
The theory is that any "struct commit" we create would
follow this code path, so any such struct would get an
index. However, callers could use alloc_commit_node()
directly (and get multiple commits with index 0).
Let's push the indexing into alloc_commit_node so that it's
hard for callers to get it wrong.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 2c1cbec (Use proper object allocators for unknown
object nodes too, 2007-04-16), added a special "any_object"
allocator, it never taught alloc_report to report on it. To
do so we need to add an extra type argument to the REPORT
macro, as that commit did for DEFINE_ALLOCATOR.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is not a good idea to strbuf_attach an arbitrary pointer
just because a function you are calling wants a strbuf.
Attaching implies a transfer of memory ownership; if anyone
were to modify or release the resulting strbuf, we would
free() the pointer, leading to possible problems:
1. Other users of the original pointer might access freed
memory.
2. The pointer might not be the start of a malloc'd
area, so calling free() on it in the first place would
be wrong.
In the two cases modified here, we are fortunate that nobody
touches the strbuf once it is attached, but it is an
accident waiting to happen. Since the previous commit,
commit_tree and friends take a pointer/buf pair, so we can
just do away with the strbufs entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While strbufs are pretty common throughout our code, it is
more flexible for functions to take a pointer/len pair than
a strbuf. It's easy to turn a strbuf into such a pair (by
dereferencing its members), but less easy to go the other
way (you can strbuf_attach, but that has implications about
memory ownership).
This patch teaches commit_tree (and its associated callers
and sub-functions) to take such a pair for the commit
message rather than a strbuf. This makes passing the buffer
around slightly more verbose, but means we can get rid of
some dangerous strbuf_attach calls in the next patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config name is "writeBitmaps", so the internal variable
missing the plural is unnecessarily confusing to write.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config option to turn on bitmaps is read all the way
down in the plumbing of pack-objects. This makes it hard for
other options in the porcelain of repack to make decisions
based on the bitmap setting. For example,
repack.packKeptObjects tries to kick in by default only when
bitmaps are turned on. But it can't do so reliably because
it doesn't yet know whether we are using bitmaps.
This patch teaches repack to respect pack.writebitmaps. It
means we pass a redundant command-line flag to pack-objects,
but that's OK; it shouldn't affect the outcome.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit ee34a2b (repack: add `repack.packKeptObjects` config
var, 2014-03-03) added a flag which could duplicate kept
objects, but did not mean to turn it on by default. Instead,
the option is tied by default to the decision to write
bitmaps, like:
if (pack_kept_objects < 0)
pack_kept_objects = write_bitmap;
after which we expect pack_kept_objects to be a boolean 0 or
1. However, that assignment neglects that write_bitmap is
_also_ a tri-state with "-1" as the default, and with
neither option given, we accidentally turn the option on.
This patch is the minimal fix to restore the desired
behavior for the default state. Further patches will fix the
more complicated cases.
Note the update to t7700. It failed to turn on bitmaps,
meaning we were actually confirming the wrong behavior!
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixed some minor typos in api-strbuf.txt: 'A' instead of 'An', 'have'
instead of 'has', a overlong line, and 'another' instead of 'an other'.
Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This mistyped command line simply ignores "master" and ends up
showing two commits from the current HEAD:
$ git log -2master
because we feed "2master" to atoi() without making sure that the
whole string is parsed as an integer.
Use the strtol_i() helper function instead.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t/t7810-grep.sh had its own test_config() function which served the
same purpose as the one in t/test-lib-functions.sh. Removed, all tests
pass.
Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These two commands are supposed to be equivalent:
$ git log --exclude=refs/notes/\* --all --no-merges --since=2.days |
git shortlog
$ git shortlog --exclude=refs/notes/\* --all --no-merges --since=2.days
However, the latter does not understand the ref-exclusion command
line option, even though other options understood by "log", such as
"--all" and "--no-merges", are understood.
This was because e7b432c5 (revision: introduce --exclude=<glob> to
tame wildcards, 2013-08-30) did not wire the new option fully to the
machinery. A new option understood by handle_revision_pseudo_opt()
must be told to handle_revision_opt() as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
test_cmp() is primarily meant to compare text files (and display the
difference for debug purposes).
Raw "cmp" is better suited to compare binary files (tar, zip, etc.).
On MinGW, test_cmp is a shell function mingw_test_cmp that tries to
read both files into environment, stripping CR characters (introduced
in commit 4d715ac0).
This function usually speeds things up, as fork is extremly slow on
Windows. But no wonder that this function is extremely slow and
sometimes even crashes when comparing large tar or zip files.
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>