When running a required clean filter, we do not have to mmap the
original before feeding the filter. Instead, stream the file
contents directly to the filter and process its output.
* sp/stream-clean-filter:
sha1_file: don't convert off_t to size_t too early to avoid potential die()
convert: stream from fd to required clean filter to reduce used address space
copy_fd(): do not close the input file descriptor
mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size
memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT
config.c: add git_env_ulong() to parse environment variable
convert: drop arguments other than 'path' from would_convert_to_git()
Optimize the code path to write out the packed-refs file, which
especially matters in a repository with a large number of refs.
* jk/write-packed-refs-via-stdio:
refs: write packed_refs file using stdio
"git config --add section.var val" used to lose existing
section.var whose value was an empty string.
* ta/config-add-to-empty-or-true-fix:
config: avoid a funny sentinel value "a^"
make config --add behave correctly for empty and NULL values
Code clean-up.
* jk/commit-author-parsing:
determine_author_info(): copy getenv output
determine_author_info(): reuse parsing functions
date: use strbufs in date-formatting functions
record_author_date(): use find_commit_header()
record_author_date(): fix memory leak on malformed commit
commit: provide a function to find a header in a buffer
"log --date=iso" uses a slight variant of ISO 8601 format that is
made more human readable. A new "--date=iso-strict" option gives
datetime output that is more strictly conformant.
* bb/date-iso-strict:
pretty: provide a strict ISO 8601 date format
Introduce CONFIG_REGEX_NONE as a more explicit sentinel value to say
"we do not want to replace any existing entry" and use it in the
implementation of "git config --add".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the new caching config-set API in git_config() calls.
* ta/config-set-1:
add tests for `git_config_get_string_const()`
add a test for semantic errors in config files
rewrite git_config() to use the config-set API
config: add `git_die_config()` to the config-set API
change `git_config()` return value to void
add line number and file name info to `config_set`
config.c: fix accuracy of line number in errors
config.c: mark error and warnings strings for translation
We write each line of a new packed-refs file individually
using a write() syscall (and sometimes 2, if the ref is
peeled). Since each line is only about 50-100 bytes long,
this creates a lot of system call overhead.
We can instead open a stdio handle around our descriptor and
use fprintf to write to it. The extra buffering is not a
problem for us, because nobody will read our new packed-refs
file until we call commit_lock_file (by which point we have
flushed everything).
On a pathological repository with 8.5 million refs, this
dropped the time to run `git pack-refs` from 20s to 6s.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add in-core caching layer to let us avoid reading the same
configuration files number of times.
* ta/config-set:
test-config: add tests for the config_set API
add `config_set` API for caching config-like files
Start "git config --edit --global" from a skeletal per-user
configuration file contents, instead of a total blank, when the
user does not already have any. This immediately reduces the need
for a later "Have you forgotten setting core.user?" and we can add
more to the template as we gain more experience.
* mm/config-edit-global:
commit: advertise config --global --edit on guessed identity
home_config_paths(): let the caller ignore xdg path
config --global --edit: create a template file if needed
There are cases where you lock and open to write a file, close it to
show the updated contents to external processes, and then have to
update the file again while still holding the lock, but the lockfile
API lacked support for such an access pattern.
* jc/reopen-lock-file:
lockfile: allow reopening a closed but still locked file
Git's "ISO" date format does not really conform to the ISO 8601
standard due to small differences, and it cannot be parsed by ISO
8601-only parsers, e.g. those of XML toolchains.
The output from "--date=iso" deviates from ISO 8601 in these ways:
- a space instead of the `T` date/time delimiter
- a space between time and time zone
- no colon between hours and minutes of the time zone
Add a strict ISO 8601 date format for displaying committer and
author dates. Use the '%aI' and '%cI' format specifiers and add
'--date=iso-strict' or '--date=iso8601-strict' date format names.
See http://thread.gmane.org/gmane.comp.version-control.git/255879 and
http://thread.gmane.org/gmane.comp.version-control.git/52414/focus=52585
for discussion.
Signed-off-by: Beat Bolli <bbolli@ewanet.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new function parses an integeral value that fits in unsigned
long in human readable form, i.e. possibly with unit suffix, e.g.
10k = 10240, etc., from an environment variable. Parsing of
GIT_MMAP_LIMIT and GIT_ALLOC_LIMIT will use it in later patches.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of the date functions write into fixed-size buffers.
This is a minor pain, as we have to take special
precautions, and frequently end up copying the result into a
strbuf or heap-allocated buffer anyway (for which we
sometimes use strcpy!).
Let's instead teach parse_date, datestamp, etc to write to a
strbuf. The obvious downside is that we might need to
perform a heap allocation where we otherwise would not need
to. However, it turns out that the only two new allocations
required are:
1. In test-date.c, where we don't care about efficiency.
2. In determine_author_info, which is not performance
critical (and where the use of a strbuf will help later
refactoring).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Of all the functions in `git_config*()` family, `git_config()` has the
most invocations in the whole code base. Each `git_config()` invocation
causes config file rereads which can be avoided using the config-set API.
Use the config-set API to rewrite `git_config()` to use the config caching
layer to avoid config file rereads on each invocation during a git process
lifetime. First invocation constructs the cache, and after that for each
successive invocation, `git_config()` feeds values from the config cache
instead of rereading the configuration files.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add `git_die_config` that dies printing the line number and the file name
of the highest priority value for the configuration variable `key`. A custom
error message is also printed before dying, specified by the caller, which can
be skipped if `err` argument is set to NULL.
It has usage in non-callback based config value retrieval where we can
raise an error and die if there is a semantic error.
For example,
if (!git_config_get_value(key, &value)){
if (!strcmp(value, "foo"))
git_config_die(key, "value: `%s` is illegal", value);
else
/* do work */
}
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently `git_config()` returns an integer signifying an error code.
During rewrites of the function most of the code was shifted to
`git_config_with_options()`. `git_config_with_options()` normally
returns positive values if its `config_source` parameter is set as NULL,
as most errors are fatal, and non-fatal potential errors are guarded
by "if" statements that are entered only when no error is possible.
Still a negative value can be returned in case of race condition between
`access_or_die()` & `git_config_from_file()`. Also, all callers of
`git_config()` ignore the return value except for one case in branch.c.
Change `git_config()` return value to void and make it die if it receives
a negative value from `git_config_with_options()`.
Original-patch-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Store file name and line number for each key-value pair in the cache
during parsing of the configuration files.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently `git_config()` uses a callback mechanism and file rereads for
config values. Due to this approach, it is not uncommon for the config
files to be parsed several times during the run of a git program, with
different callbacks picking out different variables useful to themselves.
Add a `config_set`, that can be used to construct an in-memory cache for
config-like files that the caller specifies (i.e., files like `.gitmodules`,
`~/.gitconfig` etc.). Add two external functions `git_configset_get_value`
and `git_configset_get_value_multi` for querying from the config sets.
`git_configset_get_value` follows `last one wins` semantic (i.e. if there
are multiple matches for the queried key in the files of the configset the
value returned will be the last entry in `value_list`).
`git_configset_get_value_multi` returns a list of values sorted in order of
increasing priority (i.e. last match will be at the end of the list). Add
type specific query functions like `git_configset_get_bool` and similar.
Add a default `config_set`, `the_config_set` to cache all key-value pairs
read from usual config files (repo specific .git/config, user wide
~/.gitconfig, XDG config and the global /etc/gitconfig). `the_config_set`
is populated using `git_config()`.
Add two external functions `git_config_get_value` and
`git_config_get_value_multi` for querying in a non-callback manner from
`the_config_set`. Also, add type specific query functions that are
implemented as a thin wrapper around the `config_set` API.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.
While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the user has no ~/.gitconfig file, git config --global --edit used
to launch an editor on an nonexistant file name.
Instead, create a file with a default content before launching the
editor. The template contains only commented-out entries, to save a few
keystrokes for the user. If the values are guessed properly, the user
will only have to uncomment the entries.
Advanced users teaching newbies can create a minimalistic configuration
faster for newbies. Beginners reading a tutorial advising to run "git
config --global --edit" as a first step will be slightly more guided for
their first contact with Git.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make sure all in-core commit objects are assigned a unique number
so that they can be annotated using the commit-slab API.
* jk/alloc-commit-id:
diff-tree: avoid lookup_unknown_object
object_as_type: set commit index
alloc: factor out commit index
add object_as_type helper for casting objects
parse_object_buffer: do not set object type
move setting of object->type to alloc_* functions
alloc: write out allocator definitions
alloc.c: remove the alloc_raw_commit_node() function
* kb/perf-trace:
api-trace.txt: add trace API documentation
progress: simplify performance measurement by using getnanotime()
wt-status: simplify performance measurement by using getnanotime()
git: add performance tracing for git's main() function to debug scripts
trace: add trace_performance facility to debug performance issues
trace: add high resolution timer function to debug performance issues
trace: add 'file:line' to all trace output
trace: move code around, in preparation to file:line output
trace: add current timestamp to all trace output
trace: disable additional trace output for unit tests
trace: add infrastructure to augment trace output with additional info
sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API
Documentation/git.txt: improve documentation of 'GIT_TRACE*' variables
trace: improve trace performance
trace: remove redundant printf format attribute
trace: consistently name the format parameter
trace: move trace declarations from cache.h to new trace.h
Early part of the "ref transaction" topic.
* rs/ref-transaction-0:
refs.c: change ref_transaction_update() to do error checking and return status
refs.c: remove the onerr argument to ref_transaction_commit
update-ref: use err argument to get error from ref_transaction_commit
refs.c: make update_ref_write update a strbuf on failure
refs.c: make ref_update_reject_duplicates take a strbuf argument for errors
refs.c: log_ref_write should try to return meaningful errno
refs.c: make resolve_ref_unsafe set errno to something meaningful on error
refs.c: commit_packed_refs to return a meaningful errno on failure
refs.c: make remove_empty_directories always set errno to something sane
refs.c: verify_lock should set errno to something meaningful
refs.c: make sure log_ref_setup returns a meaningful errno
refs.c: add an err argument to repack_without_refs
lockfile.c: make lock_file return a meaningful errno on failurei
lockfile.c: add a new public function unable_to_lock_message
refs.c: add a strbuf argument to ref_transaction_commit for error logging
refs.c: allow passing NULL to ref_transaction_free
refs.c: constify the sha arguments for ref_transaction_create|delete|update
refs.c: ref_transaction_commit should not free the transaction
refs.c: remove ref_transaction_rollback
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.
* nd/split-index: (32 commits)
t1700: new tests for split-index mode
t2104: make sure split index mode is off for the version test
read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
read-tree: note about dropping split-index mode or index version
read-tree: force split-index mode off on --index-output
rev-parse: add --shared-index-path to get shared index path
update-index --split-index: do not split if $GIT_DIR is read only
update-index: new options to enable/disable split index mode
split-index: strip pathname of on-disk replaced entries
split-index: do not invalidate cache-tree at read time
split-index: the reading part
split-index: the writing part
read-cache: mark updated entries for split index
read-cache: save deleted entries in split index
read-cache: mark new entries for split index
read-cache: split-index mode
read-cache: save index SHA-1 after reading
entry.c: update cache_changed if refresh_cache is set in checkout_entry()
cache-tree: mark istate->cache_changed on prime_cache_tree()
cache-tree: mark istate->cache_changed on cache tree update
...
In some code paths (e.g. giving "add -i" to prepare the contents to
be committed interactively inside "commit -p") where a caller takes
a lock, writes the new content, give chance for others to use it
while still holding the lock, and then releases the lock when all is
done. As an extension, allow the caller to re-update an already
closed file while still holding the lock (i.e. not yet committed) by
re-opening the file, to be followed by updating the contents and
then by the usual close_lock_file() or commit_lock_file().
This is necessary if we want to add code to rebuild the cache-tree
and write the resulting index out after "add -i" returns the control
to "commit -p", for example.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Making errno when returning from resolve_ref_unsafe() meaningful,
which should fix
* a bug in lock_ref_sha1_basic, where it assumes EISDIR
means it failed due to a directory being in the way
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Introducing a new unable_to_lock_message helper, which has nicer
semantics than unable_to_lock_error and cleans up lockfile.c a little.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.
While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename cache_def_free to cache_def_clear as it doesn't free the struct
cache_def, but just clears its content.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git checkout' fails if a directory is longer than PATH_MAX, because the
lstat_cache in symlinks.c checks if the leading directory exists using
PATH_MAX-bounded string operations.
Remove the limitation by using strbuf instead.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git status", even though it is a read-only operation, tries to
update the index with refreshed lstat(2) info to optimize future
accesses to the working tree opportunistically, but this could
race with a "read-write" operation that modify the index while it
is running. Detect such a race and avoid overwriting the index.
* ym/fix-opportunistic-index-update-race:
read-cache.c: verify index file before we opportunistically update it
wrapper.c: add xpread() similar to xread()
We often represent our strings as a counted string, i.e. a pair of
the pointer to the beginning of the string and its length, and the
string may not be NUL terminated to that length.
To compare a pair of such counted strings, unpack-trees.c and
read-cache.c implement their own name_compare() functions
identically. In addition, the cache_name_compare() function in
read-cache.c is nearly identical. The only difference is when one
string is the prefix of the other string, in which case
name_compare() returns -1/+1 to show which one is longer, and
cache_name_compare() returns the difference of the lengths to show
the same information.
Unify these three functions by using the implementation from
cache_name_compare(). This does not make any difference to the
existing and future callers, as they must be paying attention only
to the sign of the returned value (and not the magnitude) because
the original implementations of these two functions return values
returned by memcmp(3) when the one string is not a prefix of the
other string, and the only thing memcmp(3) guarantees its callers is
the sign of the returned value, not the magnitude.
Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also include direct dependencies (strbuf.h and git-compat-util.h for
__attribute__) so that trace.h can be used independently of cache.h, e.g.
in test programs.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you have a large work tree but only make changes in a subset, then
$GIT_DIR/index's size should be stable after a while. If you change
branches that touch something else, $GIT_DIR/index's size may grow
large that it becomes as slow as the unified index. Do --split-index
again occasionally to force all changes back to the shared index and
keep $GIT_DIR/index small.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We know the positions of replaced entries via the replace bitmap in
"link" extension, so the "name" path does not have to be stored (it's
still in the shared index). With this, we also have a way to
distinguish additions vs replacements at load time and can catch
broken "link" extensions.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are sure that after merge_base_index() is done. cache-tree can
still be used with the final index. So don't destroy cache tree.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The large part of this patch just follows CE_ENTRY_CHANGED
marks. replace_index_entry() is updated to update
split_index->base->cache[] as well so base->cache[] does not reference
to a freed entry.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This split-index mode is designed to keep write cost proportional to
the number of changes the user has made, not the size of the work
tree. (Read cost is another matter, to be dealt separately.)
This mode stores index info in a pair of $GIT_DIR/index and
$GIT_DIR/sharedindex.<SHA-1>. sharedindex is large and unchanged over
time while "index" is smaller and updated often. Format details are in
index-format.txt, although not everything is implemented in this
patch.
Shared indexes are not automatically removed, because it's unclear if
the shared index is needed by any (even temporary) indexes by just
looking at it. After a while you'll collect stale shared indexes. The
good news is one shared index is useable for long, until
$GIT_DIR/index becomes too big and sluggish that the new shared index
must be created.
The safest way to clean shared indexes is to turn off split index
mode, so shared files are all garbage, delete them all, then turn on
split index mode again.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also update SHA-1 after writing. If we do not do that, the second
read_index() will see "initialized" variable already set and not read
.git/index again, which is fine, except istate->sha1 now has a stale
value.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Other fill_stat_cache_info() is on new entries, which should set
CE_ENTRY_ADDED in cache_changed, so we're safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache entry additions, removals and modifications are separated
out. The rest of changes are still in the catch-all flag
SOMETHING_CHANGED, which would be more specific later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>