* Now, those functions take an "out" strbuf argument, where they store their
result if any. In that case, it also returns 1, else it returns 0.
* those functions support "in place" editing, in the sense that it's OK to
call them this way:
convert_to_git(path, sb->buf, sb->len, sb);
When doable, conversions are done in place for real, else the strbuf
content is just replaced with the new one, transparentely for the caller.
If you want to create a new filter working this way, being the accumulation
of filter1, filter2, ... filtern, then your meta_filter would be:
int meta_filter(..., const char *src, size_t len, struct strbuf *sb)
{
int ret = 0;
ret |= filter1(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
ret |= filter2(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
....
return ret | filtern(..., src, len, sb);
}
That's why subfilters the convert_to_* functions called were also rewritten
to work this way.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function make_cache_entry() is too useful to be hidden away in
merge-recursive. So move it to libgit.a (exposing it via cache.h).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This brings builtin-stripspace, builtin-tag and mktag to use strbufs.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
convert_sha1_file() became unused by the previous patch -- remove it.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The warning message to suggest "Consider running git-status" from
"git-diff" that we experimented with during the 1.5.3 cycle turns
out to be a bad idea. It robbed cache-dirty information from people
who valued it, while still asking users to run "update-index --refresh".
It was hoped that the new behaviour would at least have some educational
value, but not showing the cache-dirty paths like before meant that the
user would not even know easily which paths were cache-dirty, and it
made the need to refresh the index look like even more unnecessary chore.
This commit reinstates the traditional behaviour, but with a twist.
By default, the empty "diff --git" output is totally squelched out
from "git diff" output. At the end of the command, it automatically
runs "update-index --refresh" as needed, without even bothering the
user. In other words, people who do not care about the cache-dirtyness
do not even have to see the warning.
The traditional behaviour to see the stat-dirty output and to bypassing
the overhead of content comparison can be specified by setting the
configuration variable diff.autorefreshindex to false.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows to refresh only a subset of the project files, based on
the specified pathspecs.
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cr/tag:
Teach "git stripspace" the --strip-comments option
Make verify-tag a builtin.
builtin-tag.c: Fix two memory leaks and minor notation changes.
launch_editor(): Heed GIT_EDITOR and core.editor settings
Make git tag a builtin.
The read_tree() function is called only from the call chain to
run "git diff --cached" (this includes the internal call made by
git-runstatus to run_diff_index()). The function vacates stage
without any funky "merge" magic. The caller then goes and
compares stage #1 entries from the tree with stage #0 entries
from the original index.
When adding the cache entries this way, it used the general
purpose add_cache_entry(). This function looks for an existing
entry to replace or if there is none to find where to insert the
new entry, resolves D/F conflict and all the other things.
For the purpose of reading entries into an empty stage, none of
that processing is needed. We can instead append everything and
then sort the result at the end.
This commit changes read_tree() to first make sure that there is
no existing cache entries at specified stage, and if that is the
case, it runs add_cache_entry() with ADD_CACHE_JUST_APPEND flag
(new), and then sort the resulting cache using qsort().
This new flag tells add_cache_entry() to omit all the checks
such as "Does this path already exist? Does adding this path
remove other existing entries because it turns a directory to a
file?" and instead append the given cache entry straight at the
end of the active cache. The caller of course is expected to
sort the resulting cache at the end before using the result.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old version of work-tree support was an unholy mess, barely readable,
and not to the point.
For example, why do you have to provide a worktree, when it is not used?
As in "git status". Now it works.
Another riddle was: if you can have work trees inside the git dir, why
are some programs complaining that they need a work tree?
IOW it is allowed to call
$ git --git-dir=../ --work-tree=. bla
when you really want to. In this case, you are both in the git directory
and in the working tree. So, programs have to actually test for the right
thing, namely if they are inside a working tree, and not if they are
inside a git directory.
Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was
specified, unless there is a repository in the current working directory.
It does now.
The logic to determine if a repository is bare, or has a work tree
(tertium non datur), is this:
--work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true,
which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR
ends in /.git, which overrides the directory in which .git/ was found.
In related news, a long standing bug was fixed: when in .git/bla/x.git/,
which is a bare repository, git formerly assumed ../.. to be the
appropriate git dir. This problem was reported by Shawn Pearce to have
caused much pain, where a colleague mistakenly ran "git init" in "/" a
long time ago, and bare repositories just would not work.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the function set_git_dir() you can reset the path that will
be used for git_path(), git_dir() and friends.
The responsibility to close files and throw away information from the
old git_dir lies with the caller.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch adds convenience functions to work with absolute paths.
The function is_absolute_path() should help the efforts to integrate
the MinGW fork.
Note that make_absolute_path() returns a pointer to a static buffer.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the commit 'Add GIT_EDITOR environment and core.editor
configuration variables', this was done for the shell scripts.
Port it over to builtin-tag's version of launch_editor(), which
is just about to be refactored into editor.c.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new name is closer to the purpose of the function.
A NUL-terminated buffer makes things easier when callers need that.
Since the function returns only the memory written with data,
almost always allocating more space than needed because final
size is unknown, an extra NUL terminating the buffer is harmless.
It is not included in the returned size, so the function
remains working as before.
Also, now the function allows the buffer passed to be NULL at first,
and alloc_nr is now used for growing the buffer, instead size=*2.
Signed-off-by: Carlos Rica <jasampler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These days, show_date() takes a date_mode parameter to specify
the output format, and a separate specialized function for dates
in E-mails does not make much sense anymore.
This retires show_rfc2822_date() function and make it just
another date output format.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Support output of full ISO 8601 style dates in e.g. git log
and other places that use interpolation for formatting.
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split out the nnn{k,m,g} parsing code from git_config_int into
git_parse_long, so command-line parameters can enjoy the same
functionality. Also add get_parse_ulong for unsigned values.
Make git_config_int use git_parse_long, and add get_config_ulong
as well.
Signed-off-by: Brian Downing <bdowning@lavos.net>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a configuration variable that performs the same function as,
but is overridden by, GIT_PAGER.
Signed-off-by: Brian Gernhardt <benji@silverinsanity.com>
Acked-by: Johannes E. Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ei/worktree+filter:
filter-branch: always export GIT_DIR if it is set
setup_git_directory: fix segfault if repository is found in cwd
test GIT_WORK_TREE
extend rev-parse test for --is-inside-work-tree
Use new semantics of is_bare/inside_git_dir/inside_work_tree
introduce GIT_WORK_TREE to specify the work tree
test git rev-parse
rev-parse: introduce --is-bare-repository
rev-parse: document --is-inside-git-dir
This patch arose from a discussion started by Jim Meyering's patch
whose intention was to provide better diagnostics for failed writes.
Linus proposed a better way to do things, which also had the added
benefit that adding a fflush() to git-log-* operations and incremental
git-blame operations could improve interactive respose time feel, at
the cost of making things a bit slower when we aren't piping the
output to a downstream program.
This patch skips the fflush() calls when stdout is a regular file, or
if the environment variable GIT_FLUSH is set to "0". This latter can
speed up a command such as:
GIT_FLUSH=0 strace -c -f -e write time git-rev-list HEAD | wc -l
a tiny amount.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We always quote "unusual" byte values in a pathname using
C-string style, to make it safer for parsing scripts that do not
handle NUL separated records well (or just too lazy to bother).
The absolute minimum bytes that need to be quoted for this
purpose are TAB, LF (and other control characters), double quote
and backslash.
However, we have also always quoted the bytes in high 8-bit
range; this was partly because we were lazy and partly because
we were being cautious.
This introduces an internal "quote_path_fully" variable, and
core.quotepath configuration variable to control it. When set
to false, it does not quote bytes in high 8-bit range anymore
but passes them intact.
The variable defaults to "true" to retain the traditional
behaviour for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ALLOC_GROW macro will never let us fill the array completely,
instead allocating an extra chunk if that would be the case. This is
because the 'nr' argument was originally treated as "how much we do have
now" instead of "how much do we want". The latter makes much more
sense because you can grow by more than one item.
This off-by-one never resulted in an error because it meant we were
overly conservative about when to allocate. Any callers which passed
"how much we have now" need to be updated, or they will fail to allocate
enough.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
so that an ugly commit message like this can be
handled sanely.
Currently, --pretty=oneline and --pretty=email (hence
format-patch) take and use only the first line of the commit log
message. This changes them to:
- Take the first paragraph, where the definition of the first
paragraph is "skip all blank lines from the beginning, and
then grab everything up to the next empty line".
- Replace all line breaks with a whitespace.
This change would not affect a well-behaved commit message that
adheres to the convention of "single line summary, a blank line,
and then body of message", as its first paragraph always
consists of a single line. Commit messages from different
culture, such as the ones imported from CVS/SVN, can however get
chomped with the existing behaviour at the first linebreak in
the middle of sentence right now, which would become much easier
to see with this change.
The Subject: and --pretty=oneline output would become very long
and unsightly for non-conforming commits, but their messages are
already ugly anyway, and thischange at least avoids the loss of
information.
The Subject: line from a multi-line paragraph is folded using
RFC2822 line folding rules at the places where line breaks were
in the original.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is in preparation for keeping two entry lists in the
dir object.
This patch adds and uses the ALLOC_GROW() macro, which
implements the commonly used idiom of growing a dynamic
array using the alloc_nr function (not just in dir.c, but
everywhere).
We also move creation of a dir_entry to dir_entry_new.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time. There are a few files that need
to have trailing whitespaces (most notably, test vectors). The results
still passes the test, and build result in Documentation/ area is unchanged.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup_gdg is used as abbreviation for setup_git_directory_gently.
The work tree can be specified using the environment variable
GIT_WORK_TREE and the config option core.worktree (the environment
variable has precendence over the config option). Additionally
there is a command line option --work-tree which sets the
environment variable.
setup_gdg does the following now:
GIT_DIR unspecified
repository in .git directory
parent directory of the .git directory is used as work tree,
GIT_WORK_TREE is ignored
GIT_DIR unspecified
repository in cwd
GIT_DIR is set to cwd
see the cases with GIT_DIR specified what happens next and
also see the note below
GIT_DIR specified
GIT_WORK_TREE/core.worktree unspecified
cwd is used as work tree
GIT_DIR specified
GIT_WORK_TREE/core.worktree specified
the specified work tree is used
Note on the case where GIT_DIR is unspecified and repository is in cwd:
GIT_WORK_TREE is used but is_inside_git_dir is always true.
I did it this way because setup_gdg might be called multiple
times (e.g. when doing alias expansion) and in successive calls
setup_gdg should do the same thing every time.
Meaning of is_bare/is_inside_work_tree/is_inside_git_dir:
(1) is_bare_repository
A repository is bare if core.bare is true or core.bare is
unspecified and the name suggests it is bare (directory not
named .git). The bare option disables a few protective
checks which are useful with a working tree. Currently
this changes if a repository is bare:
updates of HEAD are allowed
git gc packs the refs
the reflog is disabled by default
(2) is_inside_work_tree
True if the cwd is inside the associated working tree (if there
is one), false otherwise.
(3) is_inside_git_dir
True if the cwd is inside the git directory, false otherwise.
Before this patch is_inside_git_dir was always true for bare
repositories.
When setup_gdg finds a repository git_config(git_default_config) is
always called. This ensure that is_bare_repository makes use of
core.bare and does not guess even though core.bare is specified.
inside_work_tree and inside_git_dir are set if setup_gdg finds a
repository. The is_inside_work_tree and is_inside_git_dir functions
will die if they are called before a successful call to setup_gdg.
Signed-off-by: Matthias Lederhofer <matled@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/pack:
Style nit - don't put space after function names
Ensure the pack index is opened before access
Simplify index access condition in count-objects, pack-redundant
Test for recent rev-parse $abbrev_sha1 regression
rev-parse: Identify short sha1 sums correctly.
Attempt to delay prepare_alt_odb during get_sha1
Micro-optimize prepare_alt_odb
Lazily open pack index files on demand
* maint:
git-config: Improve documentation of git-config file handling
git-config: Various small fixes to asciidoc documentation
decode_85(): fix missing return.
fix signed range problems with hex conversions
* maint-1.5.1:
git-config: Improve documentation of git-config file handling
git-config: Various small fixes to asciidoc documentation
decode_85(): fix missing return.
fix signed range problems with hex conversions
Make hexval_table[] "const". Also make sure that the accessor
function hexval() does not access the table with out-of-range
values by declaring its parameter "unsigned char", instead of
"unsigned int".
With this, gcc can just generate:
movzbl (%rdi), %eax
movsbl hexval_table(%rax),%edx
movzbl 1(%rdi), %eax
movsbl hexval_table(%rax),%eax
sall $4, %edx
orl %eax, %edx
for the code to generate a byte from two hex characters.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* db/remote:
Move refspec pattern matching to match_refs().
Update local tracking refs when pushing
Add handlers for fetch-side configuration of remotes.
Move refspec parser from connect.c and cache.h to remote.{c,h}
Move remote parsing into a library file out of builtin-push.
In some repository configurations the user may have many packfiles,
but all of the recent commits/trees/tags/blobs are likely to
be in the most recent packfile (the one with the newest mtime).
It is therefore common to be able to complete an entire operation
by accessing only one packfile, even if there are 25 packfiles
available to the repository.
Rather than opening and mmaping the corresponding .idx file for
every pack found, we now only open and map the .idx when we suspect
there might be an object of interest in there.
Of course we cannot known in advance which packfile contains an
object, so we still need to scan the entire packed_git list to
locate anything. But odds are users want to access objects in the
most recently created packfiles first, and that may be all they
ever need for the current operation.
Junio observed in b867092f that placing recent packfiles before
older ones can slightly improve access times for recent objects,
without degrading it for historical object access.
This change improves upon Junio's observations by trying even harder
to avoid the .idx files that we won't need.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* np/pack:
deprecate the new loose object header format
make "repack -f" imply "pack-objects --no-reuse-object"
allow for undeltified objects not to be reused
As noted by Johan Herland, git-archive is a kind of checkout and needs
to apply any checkout filters that might be configured.
This patch adds the convenience function convert_sha1_file which returns
a buffer containing the object's contents, after converting, if necessary
(i.e. it's a combination of read_sha1_file and convert_to_working_tree).
Direct calls to read_sha1_file in git-archive are then replaced by calls
to convert_sha1_file.
Since convert_sha1_file expects its path argument to be NUL-terminated --
a convention it inherits from convert_to_working_tree -- the patch also
changes the path handling in archive-tar.c to always NUL-terminate the
string. It used to solely rely on the len field of struct strbuf before.
archive-zip.c already NUL-terminates the path and thus needs no such
change.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Make git notify the user about host resolution/connection attempts.
This is useful both as a progress indicator on slow links, and helps
reassure the user there are no firewall problems.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When we are applying a patch that creates a blob at a path, or
when we are switching from a branch that does not have a blob at
the path to another branch that has one, we need to make sure
that there is nothing at the path in the working tree, as such a
file is a local modification made by the user that would be lost
by the operation.
Normally, lstat() on the path and making sure ENOENT is returned
is good enough for that purpose. However there is a twist. We
may be creating a regular file arch/x86_64/boot/Makefile, while
removing an existing symbolic link at arch/x86_64/boot that
points at existing ../i386/boot directory that has Makefile in
it. We always first check without touching filesystem and then
perform the actual operation, so when we verify the new file,
arch/x86_64/boot/Makefile, does not exist, we haven't removed
the symbolic link arc/x86_64/boot symbolic link yet. lstat() on
the file sees through the symbolic link and reports the file is
there, which is not what we want.
The function has_symlink_leading_path() function takes a path,
and sees if any of the leading directory component is a symbolic
link.
When files in a new directory are created, we tend to process
them together because both index and tree are sorted. The
function takes advantage of this and allows the caller to cache
and reuse which symbolic link on the filesystem caused the
function to return true.
The calling sequence would be:
char last_symlink[PATH_MAX];
*last_symlink = '\0';
for each index entry {
if (!lose)
continue;
if (lstat(it))
if (errno == ENOENT)
; /* happy */
else
error;
else if (has_symlink_leading_path(it, last_symlink))
; /* happy */
else
error; /* would lose local changes */
unlink_entry(it, last_symlink);
}
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add config variables pack.compression and core.loosecompression ,
and switch --compression=level to pack-objects.
Loose objects will be compressed using core.loosecompression if set,
else core.compression if set, else Z_BEST_SPEED.
Packed objects will be compressed using --compression=level if seen,
else pack.compression if set, else core.compression if set,
else Z_DEFAULT_COMPRESSION. This is the "pack compression level".
Loose objects added to a pack undeltified will be recompressed
to the pack compression level if it is unequal to the current
loose compression level by the preceding rules, or if the loose
object was written while core.legacyheaders = true. Newly
deltified loose objects are always compressed to the current
pack compression level.
Previously packed objects added to a pack are recompressed
to the current pack compression level exactly when their
deltification status changes, since the previous pack data
cannot be reused.
In either case, the --no-reuse-object switch from the first
patch below will always force recompression to the current pack
compression level, instead of assuming the pack compression level
hasn't changed and pack data can be reused when possible.
This applies on top of the following patches from Nicolas Pitre:
[PATCH] allow for undeltified objects not to be reused
[PATCH] make "repack -f" imply "pack-objects --no-reuse-object"
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now that we encourage and actively preserve objects in a packed form
more agressively than we did at the time the new loose object format and
core.legacyheaders were introduced, that extra loose object format
doesn't appear to be worth it anymore.
Because the packing of loose objects has to go through the delta match
loop anyway, and since most of them should end up being deltified in
most cases, there is really little advantage to have this parallel loose
object format as the CPU savings it might provide is rather lost in the
noise in the end.
This patch gets rid of core.legacyheaders, preserve the legacy format as
the only writable loose object format and deprecate the other one to
keep things simpler.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds --date={local,relative,default} option to log family of commands,
to allow displaying timestamps in user's local timezone, relative time, or
the default format.
Existing --relative-date option is a synonym of --date=relative; we could
probably deprecate it in the long run.
Signed-off-by: Junio C Hamano <junkio@cox.net>
get_sha1_with_mode basically behaves as get_sha1. It has an additional
parameter for storing the mode of the object.
If the mode can not be determined, it stores S_IFINVALID.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
S_IFINVALID is used to signal, that no mode information is available.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes all low-level functions defined in read-cache.c to
take an explicit index_state structure as their first parameter,
to specify which index to work on. These functions
traditionally operated on "the_index" and were named foo_cache();
the counterparts this patch introduces are called foo_index().
The traditional foo_cache() functions are made into macros that
give "the_index" to their corresponding foo_index() functions.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This defines a index_state structure and moves index-related
global variables into it. Currently there is one instance of
it, the_index, and everybody accesses it, so there is no code
change.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* 'jc/attr': (28 commits)
lockfile: record the primary process.
convert.c: restructure the attribute checking part.
Fix bogus linked-list management for user defined merge drivers.
Simplify calling of CR/LF conversion routines
Document gitattributes(5)
Update 'crlf' attribute semantics.
Documentation: support manual section (5) - file formats.
Simplify code to find recursive merge driver.
Counto-fix in merge-recursive
Fix funny types used in attribute value representation
Allow low-level driver to specify different behaviour during internal merge.
Custom low-level merge driver: change the configuration scheme.
Allow the default low-level merge driver to be configured.
Custom low-level merge driver support.
Add a demonstration/test of customized merge.
Allow specifying specialized merge-backend per path.
merge-recursive: separate out xdl_merge() interface.
Allow more than true/false to attributes.
Document git-check-attr
Change attribute negation marker from '!' to '-'.
...
* lt/gitlink:
Tests for core subproject support
Expose subprojects as special files to "git diff" machinery
Fix some "git ls-files -o" fallout from gitlinks
Teach "git-read-tree -u" to check out submodules as a directory
Teach git list-objects logic to not follow gitlinks
Fix gitlink index entry filesystem matching
Teach "git-read-tree -u" to check out submodules as a directory
Teach git list-objects logic not to follow gitlinks
Don't show gitlink directories when we want "other" files
Teach git-update-index about gitlinks
Teach directory traversal about subprojects
Fix thinko in subproject entry sorting
Teach core object handling functions about gitlinks
Teach "fsck" not to follow subproject links
Add "S_IFDIRLNK" file mode infrastructure for git links
Add 'resolve_gitlink_ref()' helper function
Avoid overflowing name buffer in deep directory structures
diff-lib: use ce_mode_from_stat() rather than messing with modes manually
* np/pack: (27 commits)
document --index-version for index-pack and pack-objects
pack-objects: remove obsolete comments
pack-objects: better check_object() performances
add get_size_from_delta()
pack-objects: make in_pack_header_size a variable of its own
pack-objects: get rid of create_final_object_list()
pack-objects: get rid of reuse_cached_pack
pack-objects: clean up list sorting
pack-objects: rework check_delta_limit usage
pack-objects: equal objects in size should delta against newer objects
pack-objects: optimize preferred base handling a bit
clean up add_object_entry()
tests for various pack index features
use test-genrandom in tests instead of /dev/urandom
simple random data generator for tests
validate reused pack data with CRC when possible
allow forcing index v2 and 64-bit offset treshold
pack-redundant.c: learn about index v2
show-index.c: learn about index v2
sha1_file.c: learn about index version 2
...
The usual process flow is the main process opens and holds the lock to
the index, does its thing, perhaps spawning children during the course,
and then writes the resulting index out by releaseing the lock.
However, the lockfile interface uses atexit(3) to clean it up, without
regard to who actually created the lock. This typically leads to a
confusing behaviour of lock being released too early when the child
exits, and then the parent process when it calls commit_lockfile()
finds that it cannot unlock it.
This fixes the problem by recording who created and holds the lock, and
upon atexit(3) handler, child simply ignores the lockfile the parent
created.
Signed-off-by: Junio C Hamano <junkio@cox.net>
delete_ref function does not change the 'sha1' parameter. Non-const pointer
causes a compiler warning if you call to the function using a const argument.
Signed-off-by: Carlos Rica <jasampler@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This replaces the fairly odd "created_object()" function that did _most_
of the object setup with a more complete "create_object()" function that
also has a more natural calling convention.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We used to use a different allocator scheme for when we didn't know the
object type. That meant that objects that were created without any
up-front knowledge of the type would not go through the same allocation
paths as normal object allocations, and would miss out on the statistics.
But perhaps more importantly than the statistics (that are useful when
looking at memory usage but not much else), if we want to make the
object hash tables use a denser object pointer representation, we need
to make sure that they all go through the same blocking allocator.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
... which consists of existing code split out of packed_delta_info()
for other callers to use it as well.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds "attribute macros" (for lack of better name). So far,
we have low-level attributes such as crlf and diff, which are
defined in operational terms --- setting or unsetting them on a
particular path directly affects what is done to the path. For
example, in order to decline diffs or crlf conversions on a
binary blob, no diffs on PostScript files, and treat all other
files normally, you would have something like these:
* diff crlf
*.ps !diff
proprietary.o !diff !crlf
That is fine as the operation goes, but gets unwieldy rather
rapidly, when we start adding more low-level attributes that are
defined in operational terms. A near-term example of such an
attribute would be 'merge-3way' which would control if git
should attempt the usual 3-way file-level merge internally, or
leave merging to a specialized external program of user's
choice. When it is added, we do _not_ want to force the users
to update the above to:
* diff crlf merge-3way
*.ps !diff
proprietary.o !diff !crlf !merge-3way
The way this patch solves this issue is to realize that the
attributes the user is assigning to paths are not defined in
terms of operations but in terms of what they are.
All of the three low-level attributes usually make sense for
most of the files that sane SCM users have git operate on (these
files are typically called "text'). Only a few cases, such as
binary blob, need exception to decline the "usual treatment
given to text files" -- and people mark them as "binary".
So this allows the $GIT_DIR/info/alternates and .gitattributes
at the toplevel of the project to also specify attributes that
assigns other attributes. The syntax is '[attr]' followed by an
attribute name followed by a list of attribute names:
[attr] binary !diff !crlf !merge-3way
When "binary" attribute is set to a path, if the path has not
got diff/crlf/merge-3way attribute set or unset by other rules,
this rule unsets the three low-level attributes.
It is expected that the user level .gitattributes will be
expressed mostly in terms of attributes based on what the files
are, and the above sample would become like this:
(built-in attribute configuration)
[attr] binary !diff !crlf !merge-3way
* diff crlf merge-3way
(project specific .gitattributes)
proprietary.o binary
(user preference $GIT_DIR/info/attributes)
*.ps !diff
There are a few caveats.
* As described above, you can define these macros only in
$GIT_DIR/info/attributes and toplevel .gitattributes.
* There is no attempt to detect circular definition of macro
attributes, and definitions are evaluated from bottom to top
as usual to fill in other attributes that have not yet got
values. The following would work as expected:
[attr] text diff crlf
[attr] ps text !diff
*.ps ps
while this would most likely not (I haven't tried):
[attr] ps text !diff
[attr] text diff crlf
*.ps ps
* When a macro says "[attr] A B !C", saying that a path does
not have attribute A does not let you tell anything about
attributes B or C. That is, given this:
[attr] text diff crlf
[attr] ps text !diff
*.txt !ps
path hello.txt, which would match "*.txt" pattern, would have
"ps" attribute set to zero, but that does not make text
attribute of hello.txt set to false (nor diff attribute set to
true).
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds the basic infrastructure to assign attributes to
paths, in a way similar to what the exclusion mechanism does
based on $GIT_DIR/info/exclude and .gitignore files.
An attribute is just a simple string that does not contain any
whitespace. They can be specified in $GIT_DIR/info/attributes
file, and .gitattributes file in each directory.
Each line in these files defines a pattern matching rule.
Similar to the exclusion mechanism, a later match overrides an
earlier match in the same file, and entries from .gitattributes
file in the same directory takes precedence over the ones from
parent directories. Lines in $GIT_DIR/info/attributes file are
used as the lowest precedence default rules.
A line is either a comment (an empty line, or a line that begins
with a '#'), or a rule, which is a whitespace separated list of
tokens. The first token on the line is a shell glob pattern.
The rest are names of attributes, each of which can optionally
be prefixed with '!'. Such a line means "if a path matches this
glob, this attribute is set (or unset -- if the attribute name
is prefixed with '!'). For glob matching, the same "if the
pattern does not have a slash in it, the basename of the path is
matched with fnmatch(3) against the pattern, otherwise, the path
is matched with the pattern with FNM_PATHNAME" rule as the
exclusion mechanism is used.
This does not define what an attribute means. Tying an
attribute to various effects it has on git operation for paths
that have it will be specified separately.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
GIT 1.5.1.1
cvsserver: Fix handling of diappeared files on update
fsck: do not complain on detached HEAD.
(encode_85, decode_85): Mark source buffer pointer as "const".
This just adds the basic helper functions to recognize and work with git
tree entries that are links to other git repositories ("subprojects").
They still aren't actually connected up to any of the code-paths, but
now all the infrastructure is in place.
The next commit will start actually adding actual subproject support.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The coming index format change doesn't allow for the number of objects
to be determined from the size of the index file directly. Instead, Let's
initialize a field in the packed_git structure with the object count when
the index is validated since the count is always known at that point.
While at it let's reorder some struct packed_git fields to avoid padding
due to needed 64-bit alignment for some of them.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This merge strategy largely piggy-backs on git-merge-recursive.
When merging trees A and B, if B corresponds to a subtree of A,
B is first adjusted to match the tree structure of A, instead of
reading the trees at the same level. This adjustment is also
done to the common ancestor tree.
If you are pulling updates from git-gui repository into git.git
repository, the root level of the former corresponds to git-gui/
subdirectory of the latter. The tree object of git-gui's toplevel
is wrapped in a fake tree object, whose sole entry has name 'git-gui'
and records object name of the true tree, before being used by
the 3-way merge code.
If you are merging the other way, only the git-gui/ subtree of
git.git is extracted and merged into git-gui's toplevel.
The detection of corresponding subtree is done by comparing the
pathnames and types in the toplevel of the tree.
Heuristics galore! That's the git way ;-).
Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/index-output:
git-read-tree --index-output=<file>
_GIT_INDEX_OUTPUT: allow plumbing to output to an alternative index file.
Conflicts:
builtin-apply.c
This function was not called "add_file_to_cache()" only because
an ancient program, update-cache, used that name as an internal
function name that does something slightly different. Now that
is gone, we can take over the better name.
The plan is to name all functions that operate on the default
index xxx_cache(). Later patches create a variant of them that
take an explicit parameter xxx_index(), and then turn
xxx_cache() functions into macros that use "the_index".
Signed-off-by: Junio C Hamano <junkio@cox.net>
This error message should not usually trigger, but the function
make_cache_entry() called by add_cacheinfo() can return early
without calling into refresh_cache_entry() that sets cache_errno.
Also the error message had a wrong function name reported, and
it did not say anything about which path failed either.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Let's avoid the open coded pack index reference in pack-object and use
nth_packed_object_sha1() instead. This will help encapsulating index
format differences in one place.
And while at it there is no reason to copy SHA1's over and over while a
direct pointer to it in the index will do just fine.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This corrects the interface mistake of the previous one, and
gives a command line parameter to the only plumbing command that
currently needs it: "git-read-tree".
We can add the calls to set_alternate_index_output() to other
plumbing commands that update the index if/when needed.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When defined, this allows plumbing commands that update the
index (add, apply, checkout-index, merge-recursive, mv,
read-tree, rm, update-index, and write-tree) to write their
resulting index to an alternative index file while holding a
lock to the original index file. With this, git-commit that
jumps the index does not have to make an extra copy of the index
file, and more importantly, it can do the update while holding
the lock on the index.
However, I think the interface to let an environment variable
specify the output is a mistake, as shown in the documentation.
If a curious user has the environment variable set to something
other than the file GIT_INDEX_FILE points at, almost everything
will break. This should instead be a command line parameter to
tell these plumbing commands to write the result in the named
file, to prevent stupid mistakes.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Use hash_sha1_file() instead of duplicating code to compute object SHA1.
While at it make it accept a const pointer.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The new configuration variable core.deltaBaseCacheLimit allows the
user to control how much memory they are willing to give to Git for
caching base objects of deltas. This is not normally meant to be
a user tweakable knob; the "out of the box" settings are meant to
be suitable for almost all workloads.
We default to 16 MiB under the assumption that the cache is not
meant to consume all of the user's available memory, and that the
cache's main purpose was to cache trees, for faster path limiters
during revision traversal. Since trees tend to be relatively small
objects, this relatively small limit should still allow a large
number of objects.
On the other hand we don't want the cache to start storing 200
different versions of a 200 MiB blob, as this could easily blow
the entire address space of a 32 bit process.
We evict OBJ_BLOB from the cache first (credit goes to Junio) as
we want to favor OBJ_TREE within the cache. These are the objects
that have the highest inflate() startup penalty, as they tend to
be small and thus don't have that much of a chance to ammortize
that penalty over the entire data.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* sp/run-command:
Use run_command within send-pack
Use run_command within receive-pack to invoke index-pack
Use run_command within merge-index
Use run_command for proxy connections
Use RUN_GIT_CMD to run push backends
Correct new compiler warnings in builtin-revert
Replace fork_with_pipe in bundle with run_command
Teach run-command to redirect stdout to /dev/null
Teach run-command about stdout redirection
Especially with the new index format to come, it is more appropriate
to encapsulate more into check_packed_git_idx() and assume less of the
index format in struct packed_git.
To that effect, the index_base is renamed to index_data with void * type
so it is not used directly but other pointers initialized with it. This
allows for a couple pointer cast removal, as well as providing a better
generic name to grep for when adding support for new index versions or
formats.
And index_data is declared const too while at it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The new builtin-revert code introduces a few new compiler errors
when I'm building with my stricter set of checks enabled in CFLAGS.
These all just stem from trying to store a constant string into
a non-const char*. Simple fix, make the variables const char*.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When accessing objects, we first look for them in packs that
are linked together in the reverse order of discovery.
Since younger packs tend to contain more recent objects, which
are more likely to be accessed often, and local packs tend to
contain objects more relevant to our specific projects, sort the
list of packs before starting to access them. In addition,
favoring local packs over the ones borrowed from alternates can
be a win when alternates are mounted on network file systems.
Signed-off-by: Junio C Hamano <junkio@cox.net>
In order to track and build on top of a branch 'topic' you track from
your upstream repository, you often would end up doing this sequence:
git checkout -b mytopic origin/topic
git config --add branch.mytopic.remote origin
git config --add branch.mytopic.merge refs/heads/topic
This would first fork your own 'mytopic' branch from the 'topic'
branch you track from the 'origin' repository; then it would set up two
configuration variables so that 'git pull' without parameters does the
right thing while you are on your own 'mytopic' branch.
This commit adds a --track option to git-branch, so that "git
branch --track mytopic origin/topic" performs the latter two actions
when creating your 'mytopic' branch.
If the configuration variable branch.autosetupmerge is set to true, you
do not have to pass the --track option explicitly; further patches in
this series allow setting the variable with a "git remote add" option.
The configuration variable is off by default, and there is a --no-track
option to countermand it even if the variable is set.
Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Not all platforms have declared 'unsigned long' to be a 64 bit value,
but we want to support a 64 bit packfile (or close enough anyway)
in the near future as some projects are getting large enough that
their packed size exceeds 4 GiB.
By using off_t, the POSIX type that is declared to mean an offset
within a file, we support whatever maximum file size the underlying
operating system will handle. For most modern systems this is up
around 2^60 or higher.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
As we permit up to 2^32-1 objects in a single packfile we cannot
use a signed int to represent the object offset within a packfile,
after 2^31-1 objects we will start seeing negative indexes and
error out or compute bad addresses within the mmap'd index.
This is a minor cleanup that does not introduce any significant
logic changes. It is roach free.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We shouldn't attempt to assign constant strings into char*, as the
string is not writable at runtime. Likewise we should always be
treating unsigned values as unsigned values, not as signed values.
Most of these are very straightforward. The only exception is the
(unnecessary) xstrdup/free in builtin-branch.c for the detached
head case. Since this is a user-level interactive type program
and that particular code path is executed no more than once, I feel
that the extra xstrdup call is well worth the easy elimination of
this warning.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We did not detect broken loose object files, either when
underlying inflate() signalled the breakage, nor inflate()
finished and we had garbage trailing at the end. We do better
now.
We also make unpack_sha1_file() a static function to
sha1_file.c, since it is not used by anybody outside.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some file systems that can host git repositories and their working copies
do not support symbolic links. But then if the repository contains a symbolic
link, it is impossible to check out the working copy.
This patch enables partial support of symbolic links so that it is possible
to check out a working copy on such a file system. A new flag
core.symlinks (which is true by default) can be set to false to indicate
that the filesystem does not support symbolic links. In this case, symbolic
links that exist in the trees are checked out as small plain files, and
checking in modifications of these files preserve the symlink property in
the database (as long as an entry exists in the index).
Of course, this does not magically make symbolic links work on such defective
file systems; hence, this solution does not help if the working copy relies
on that an entry is a real symbolic link.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This allows for keeping the common idiom which consists of using
negative values to signal error conditions by ensuring that the enum
will be a signed type.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now, show_date() can print three different kinds of dates: normal,
relative and short (%Y-%m-%s) dates.
To achieve this, the "int relative" was changed to "enum date_mode
mode", which has three states: DATE_NORMAL, DATE_RELATIVE and
DATE_SHORT.
Since existing users of show_date() only call it with relative_date
being either 0 or 1, and DATE_NORMAL and DATE_RELATIVE having these
values, no behaviour is changed.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* lt/crlf:
Teach core.autocrlf to 'git apply'
t0020: add test for auto-crlf
Make AutoCRLF ternary variable.
Lazy man's auto-CRLF
* jc/apply-config:
t4119: test autocomputing -p<n> for traditional diff input.
git-apply: guess correct -p<n> value for non-git patches.
git-apply: notice "diff --git" patch again
Fix botched "leak fix"
t4119: add test for traditional patch and different p_value
apply: fix memory leak in prefix_one()
git-apply: require -p<n> when working in a subdirectory.
git-apply: do not lose cwd when run from a subdirectory.
Teach 'git apply' to look at $HOME/.gitconfig even outside of a repository
Teach 'git apply' to look at $GIT_DIR/config
When we do not trust executable bit from lstat(2), we copied
existing ce_mode bits without checking if the filesystem object
is a regular file (which is the only thing we apply the "trust
executable bit" business) nor if the blob in the index is a
regular file (otherwise, we should do the same as registering a
new regular file, which is to default non-executable).
Noticed by Johannes Sixt.
Signed-off-by: Junio C Hamano <junkio@cox.net>
It currently does NOT know about file attributes, so it does its
conversion purely based on content. Maybe that is more in the "git
philosophy" anyway, since content is king, but I think we should try to do
the file attributes to turn it off on demand.
Anyway, BY DEFAULT it is off regardless, because it requires a
[core]
AutoCRLF = true
in your config file to be enabled. We could make that the default for
Windows, of course, the same way we do some other things (filemode etc).
But you can actually enable it on UNIX, and it will cause:
- "git update-index" will write blobs without CRLF
- "git diff" will diff working tree files without CRLF
- "git checkout" will write files to the working tree _with_ CRLF
and things work fine.
Funnily, it actually shows an odd file in git itself:
git clone -n git test-crlf
cd test-crlf
git config core.autocrlf true
git checkout
git diff
shows a diff for "Documentation/docbook-xsl.css". Why? Because we have
actually checked in that file *with* CRLF! So when "core.autocrlf" is
true, we'll always generate a *different* hash for it in the index,
because the index hash will be for the content _without_ CRLF.
Is this complete? I dunno. It seems to work for me. It doesn't use the
filename at all right now, and that's probably a deficiency (we could
certainly make the "is_binary()" heuristics also take standard filename
heuristics into account).
I don't pass in the filename at all for the "index_fd()" case
(git-update-index), so that would need to be passed around, but this
actually works fine.
NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours
truly. I will not guarantee that they work at all reasonable. Caveat
emptor. But it _is_ simple, and it _is_ safe, since it's all off by
default.
The patch is pretty simple - the biggest part is the new "convert.c" file,
but even that is really just basic stuff that anybody can write in
"Teaching C 101" as a final project for their first class in programming.
Not to say that it's bug-free, of course - but at least we're not talking
about rocket surgery here.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Since "git log origin/master" uses dwim_log() to match
"refs/remotes/origin/master", it makes sense to do that for
"git log --reflog", too.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>