* jl/nor-or-nand-and:
code and test: fix misuses of "nor"
comments: fix misuses of "nor"
contrib: fix misuses of "nor"
Documentation: fix misuses of "nor"
* cn/fetch-prune-overlapping-destination:
fetch: handle overlaping refspecs on --prune
fetch: add a failing test for prunning with overlapping refspecs
Unicode 6.3 defines more code points as combining or accents. For
example, the character "ö" could be expressed as an "o" followed by
U+0308 COMBINING DIARESIS (aka umlaut, double-dot-above). We should
consider that such a sequence of two codepoints occupies one display
column for the alignment purposes, and for that, git_wcwidth()
should return 0 for them. Affected codepoints are:
U+0358..U+035C
U+0487
U+05A2, U+05BA, U+05C5, U+05C7
U+0604, U+0616..U+061A, U+0659..U+065F
Earlier unicode standards had defined these as "reserved".
Only the range 0..U+07FF has been checked to see which codepoints
need to be marked as 0-width while preparing for this commit; more
updates may be needed.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both MSVC and MINGW have alloca(3) definitions in malloc.h, so by moving
win32-compat alloca.h from compat/vcbuild/include/ to compat/win32/ ,
which is included by both MSVC and MINGW CFLAGS, we can make alloca()
work on both those Windows environments.
In MINGW, malloc.h has explicit check for GNUC and if it is so, defines
alloca to __builtin_alloca, so it looks like we don't need to add any
code to here-shipped alloca.h to get optimum performance.
Compile-tested on Windows in MSysGit.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Acked-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bp/commit-p-editor:
run-command: mark run_hook_with_custom_index as deprecated
merge hook tests: fix and update tests
merge: fix GIT_EDITOR override for commit hook
commit: fix patch hunk editing with "commit -p -m"
test patch hunk editing with "commit -p -m"
merge hook tests: use 'test_must_fail' instead of '!'
merge hook tests: fix missing '&&' in test
Eradicate mistaken use of "nor" (that is, essentially "nor" used
not in "neither A nor B" ;-)) from in-code comments, command output
strings, and documentations.
* jl/nor-or-nand-and:
code and test: fix misuses of "nor"
comments: fix misuses of "nor"
contrib: fix misuses of "nor"
Documentation: fix misuses of "nor"
OPT_SET_PTR() implementation was broken on IL32P64 platforms;
it turns out that the macro is not used by any real user.
* mr/opt-set-ptr:
parse-options: remove unused OPT_SET_PTR
parse-options: add cast to correct pointer type to OPT_SET_PTR
MSVC: fix t0040-parse-options crash
Make sure that the help text given to describe the "<param>" part
of the "git cmd --option=<param>" does not contain SP or _,
e.g. "--gpg-sign=<key-id>" option for "git commit" is not spelled
as "--gpg-sign=<key id>".
* jc/rev-parse-argh-dashed-multi-words:
parse-options: make sure argh string does not have SP or _
update-index: teach --cacheinfo a new syntax "mode,sha1,path"
parse-options: multi-word argh should use dash to separate words
'git p4 rebase' fails with the following message if there is a file
named HEAD in the current directory:
fatal: ambiguous argument 'HEAD': both revision and filename
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
Take the suggestion above and explicitly state that HEAD should be
treated as a revision.
Signed-off-by: Vlad Dogaru <vdogaru@ixiacom.com>
Acked-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As was recently shown in "combine-diff: optimize
combine_diff_path sets intersection", combine-diff runs very slowly. In
that commit we optimized paths sets intersection, but that accounted
only for ~ 25% of the slowness, and as my tracing showed, for linux.git
v3.10..v3.11, for merges a lot of time is spent computing
diff(commit,commit^2) just to only then intersect that huge diff to
almost small set of files from diff(commit,commit^1).
In previous commit, we described the problem in more details, and
reworked the diff tree-walker to be general one - i.e. to work in
multiple parent case too. Now is the time to take advantage of it for
finding paths for combine diff.
The implementation is straightforward - if we know, we can get generated
diff paths directly, and at present that means no diff filtering or
rename/copy detection was requested(*), we can call multiparent tree-walker
directly and get ready paths.
(*) because e.g. at present, all diffcore transformations work on
diff_filepair queues, but in the future, that limitation can be
lifted, if filters would operate directly on combine_diff_paths.
Timings for `git log --raw --no-abbrev --no-renames` without `-c` ("git log")
and with `-c` ("git log -c") and with `-c --merges` ("git log -c --merges")
before and after the patch are as follows:
linux.git v3.10..v3.11
log log -c log -c --merges
before 1.9s 16.4s 15.2s
after 1.9s 2.4s 1.1s
The result stayed the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.
In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.
For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .
There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .
That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in "combine-diff: optimize combine_diff_path
sets intersection" I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.
That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).
That's because at present, to compute combine-diff, for first finding
paths, that "every parent touches", we use the following combine-diff
property/definition:
D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn) (w.r.t. paths)
where
D(A,P1...Pn) is combined diff between commit A, and parents Pi
and
D(A,Pi) is usual two-tree diff Pi..A
So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.
And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.
The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.
Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:
D(T,P1...Pn) calculation scheme
-------------------------------
D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn) (regarding resulting paths set)
D(T,Pj) - diff between T..Pj
D(T,P1...Pn) - combined diff from T to parents P1,...,Pn
We start from all trees, which are sorted, and compare their entries in
lock-step:
T P1 Pn
- - -
|t| |p1| |pn|
|-| |--| ... |--| imin = argmin(p1...pn)
| | | | | |
|-| |--| |--|
|.| |. | |. |
. . .
. . .
at any time there could be 3 cases:
1) t < p[imin];
2) t > p[imin];
3) t = p[imin].
Schematic deduction of what every case means, and what to do, follows:
1) t < p[imin] -> ∀j t ∉ Pj -> "+t" ∈ D(T,Pj) -> D += "+t"; t↓
2) t > p[imin]
2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓
2.2) ∀i pi = p[imin] -> pi ∉ T -> "-pi" ∈ D(T,Pi) -> D += "-p[imin]"; ∀i pi↓
3) t = p[imin]
3.1) ∃j: pj > p[imin] -> "+t" ∈ D(T,Pj) -> only pi=p[imin] remains to investigate
3.2) pi = p[imin] -> investigate δ(t,pi)
|
|
v
3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø ->
⎧δ(t,pi) - if pi=p[imin]
-> D += ⎨
⎩"+t" - if pi>p[imin]
in any case t↓ ∀ pi=p[imin] pi↓
~
For comparison, here is how diff_tree() works:
D(A,B) calculation scheme
-------------------------
A B
- -
|a| |b| a < b -> a ∉ B -> D(A,B) += +a a↓
|-| |-| a > b -> b ∉ A -> D(A,B) += -b b↓
| | | | a = b -> investigate δ(a,b) a↓ b↓
|-| |-|
|.| |.|
. .
. .
~~~~~~~~
This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization builds on
the fact that usual diff
D(A,B)
is by definition the same as combined diff
D(A,[B]),
so if we could rework the code for common case and make it be not slower
for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
multiparent diff tree-walker would greatly benefit generating
combine-diff.
What we do is as follows:
1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
a paths generator (new name diff_tree_paths()), with each generated path
being `struct combine_diff_path` with info for path, new sha1,mode and for
every parent which sha1,mode it was in it.
2) From that info, we can still generate usual diff queue with
struct diff_filepairs, via "exporting" generated
combine_diff_path, if we know we run for nparent=1 case.
(see emit_diff() which is now named emit_diff_first_parent_only())
3) In order for diff_can_quit_early(), which checks
DIFF_OPT_TST(opt, HAS_CHANGES))
to work, that exporting have to be happening not in bulk, but
incrementally, one diff path at a time.
For such consumers, there is a new callback in diff_options
introduced:
->pathchange(opt, struct combine_diff_path *)
which, if set to !NULL, is called for every generated path.
(see new compat ll_diff_tree_sha1() wrapper around new paths
generator for setup)
4) The paths generation itself, is reworked from previous
ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
scheme" provided above:
On the start we allocate [nparent] arrays in place what was
earlier just for one parent tree.
then we just generalize loops, and comparison according to the
algorithm.
Some notes(*):
1) alloca(), for small arrays, is used for "runs not slower for
nparent=1 case than before" goal - if we change it to xmalloc()/free()
the timings get ~1% worse. For alloca() we use just-introduced
xalloca/xalloca_free compatibility wrappers, so it should not be a
portability problem.
2) For every parent tree, we need to keep a tag, whether entry from that
parent equals to entry from minimal parent. For performance reasons I'm
keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
Not doing so, we'd need to alloca another [nparent] array, which hurts
performance.
3) For emitted paths, memory could be reused, if we know the path was
processed via callback and will not be needed later. We use efficient
hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
of potential additional slowdown.
4) goto(s) are used in several places, as the code executes a little bit
faster with lowered register pressure.
Also
- we should now check for FIND_COPIES_HARDER not only when two entries
names are the same, and their hashes are equal, but also for a case,
when a path was removed from some of all parents having it.
The reason is, if we don't, that path won't be emitted at all (see
"a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
all paths - with diff or without - to be emitted, to be later analyzed
for being copies sources.
The new check is only necessary for nparent >1, as for nparent=1 case
xmin_eqtotal always =1 =nparent, and a path is always added to diff as
removal.
~~~~~~~~
Timings for
# without -c, i.e. testing only nparent=1 case
`git log --raw --no-abbrev --no-renames`
before and after the patch are as follows:
navy.git linux.git v3.10..v3.11
before 0.611s 1.889s
after 0.619s 1.907s
slowdown 1.3% 0.9%
This timings show we did no harm to usual diff(tree1,tree2) generation.
From the table we can see that we actually did ~1% slowdown, but I think
I've "earned" that 1% in the previous patch ("tree-diff: reuse base
str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case,
net timings stays approximately the same.
The output also stayed the same.
(*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
"do no harm for nparent=1 case" rule.
For linux.git, combined diff will run an order of magnitude faster and
appropriate timings will be provided in the next commit, as we'll be
taking advantage of the new diff tree-walker for combined-diff
generation there.
P.S. and combined diff is not some exotic/for-play-only stuff - for
example for a program I write to represent Git archives as readonly
filesystem, there is initial scan with
`git log --reverse --raw --no-abbrev --no-renames -c`
to extract log of what was created/changed when, as a result building a
map
{} sha1 -> in which commit (and date) a content was added
that `-c` means also show combined diff for merges, and without them, if
a merge is non-trivial (merges changes from two parents with both having
separate changes to a file), or an evil one, the map will not be full,
i.e. some valid sha1 would be absent from it.
That case was my initial motivation for combined diffs speedup.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we free the transaction when we are done, there is no need to
make a copy of transaction->updates before working with it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It used to be that ref_transaction_commit() allocated a temporary
array to hold the types of references while it is working. Instead,
add a type field to ref_update that ref_transaction_commit() can use
as its scratch space.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we manage ref_update objects internally, we can use them to
hold some of the scratch space we need when actually carrying out the
updates. Store the (struct ref_lock *) there.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use temporary variables in the for-loop blocks to simplify expressions
in the rest of the loop.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is consistent with the usual nomenclature.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It has been superseded by reference transactions. This also means
that struct ref_update can become private.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This change is mostly clerical: the parse_cmd_*() functions need to
use local variables rather than a struct ref_update to collect the
arguments needed for each update, and then call ref_transaction_*() to
queue the change rather than building up the list of changes at the
caller side.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Build out the API for dealing with a bunch of reference checks and
changes within a transaction. Define an opaque ref_transaction type
that is managed entirely within refs.c. Introduce functions for
beginning a transaction, adding updates to a transaction, and
committing/rolling back a transaction.
This API will soon replace update_refs().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make (most of) the error messages for invalid input have the same
format [1]:
$COMMAND [SP $REFNAME]: $MESSAGE
Update the tests accordingly.
[1] A few error messages are left with their old form, because
$COMMAND and $REFNAME aren't passed all the way down the call
stack. Maybe those sites should be changed some day, too.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Distinguish this error from the error that an argument is missing for
another reason. Update the tests accordingly.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This case wants to test passing a bad refname to the "update" command.
But it also passes too few arguments to "update", which muddles the
situation: which error should be diagnosed? So split this test into
two:
* One that passes too few arguments to update
* One that passes all three arguments to "update", but with a bad
refname.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the original version of this command, for the single case of the
"update" command's <newvalue>, the empty string was interpreted as
being equivalent to 40 "0"s. This shorthand is unnecessary (binary
input will usually be generated programmatically anyway), and it
complicates the parser and the documentation.
So gently deprecate this usage: remove its description from the
documentation and emit a warning if it is found. But for reasons of
backwards compatibility, continue to accept it.
Helped-by: Brad King <brad.king@kitware.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace three functions, update_store_new_sha1(),
update_store_old_sha1(), and parse_next_arg(), with a single function,
parse_next_sha1(). The new function takes care of a whole argument,
including checking whether it is there, converting it to an SHA-1, and
emitting errors on EOF or for invalid values. The return value
indicates whether the argument was present or absent, which requires
a bit of intelligence because absent values are represented
differently depending on whether "-z" was used.
The new interface means that the calling functions, parse_cmd_*(),
don't have to interpret the result differently based on the
line_termination mode that is in effect. It also means that
parse_cmd_create() can distinguish unambiguously between an empty new
value and a zeros new value, which fixes a failure in t1400.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the (slightly inconsistent) status quo; make sure it doesn't
change by accident.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of, for example,
fatal: update refs/heads/master missing [<oldvalue>] NUL
emit
fatal: update refs/heads/master missing <oldvalue>
Update the tests accordingly.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old error messages emitted for invalid input sometimes said
"<oldvalue>"/"<newvalue>" and sometimes said "old value"/"new value".
Convert them all to the former. Update the tests accordingly.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If an invalid value is passed to "update-ref --stdin" as <oldvalue> or
<newvalue>, include the command and the name of the reference at the
beginning of the error message. Update the tests accordingly.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no reason to obscure the fact that parse_first_arg() always
parses refnames. Form the new function by combining parse_first_arg()
and update_store_ref_name().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Aside from avoiding a tiny bit of work, this makes it transparently
obvious that old_sha1 and new_sha1 are identical. It is arguably a
bit silly to have to set new_sha1 in order to verify old_sha1, but
that is a problem for another day.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Read the whole input into a strbuf at once, and then parse it from
there. This might also be a tad faster, but that is not the point.
The point is to decouple the parsing code from the input source (the
old parsing code had to read new data even in the middle of commands).
Add docstrings for the parsing functions.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old signature of update_refs() required a
(const struct ref_update **) for its updates_orig argument. The
"const" is presumably there to promise that the function will not
modify the contents of the structures.
But this declaration does not permit the function to be called with a
(struct ref_update **), which is perfectly legitimate. C's type
system is not powerful enough to express what we'd like. So remove
the first "const" from the declaration.
On the other hand, the function *can* promise not to modify the
pointers within the array that is passed to it without inconveniencing
its callers. So add a "const" that has that effect, making the final
declaration
(struct ref_update * const *).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Given that these constants are only being used when updating
references, it is inappropriate to give them such generic names as
"DIE_ON_ERR". So prefix their names with "UPDATE_REFS_".
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously there were no good tests of C-quoted arguments.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old parse_arg(), when fed an argument
"refs/heads/a"master
parsed 'refs/heads/a' off of the front of the argument and considered
itself successful. It was only when parse_next_arg() tried to parse
the *next* argument that a problem was noticed. But in fact, the
definition of the input format requires arguments to be terminated by
SP or NUL, so *this* argument is already erroneous and parse_arg()
should diagnose the problem.
So teach parse_arg() to verify that C-quoted arguments are terminated
correctly. If not, emit a more specific error message.
There is no corresponding error case of a non-C-quoted argument that
is not terminated correctly, because the end of a non-quoted argument
is *by definition* a space or NUL, so there is no way to insert other
junk between the "end" of the argument and the argument terminator.
Adjust the tests to expect the new error message. Add a docstring to
the function, incorporating the comments that were formerly within the
function plus some added information.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>