Teach the codepaths that read .gitignore and .gitattributes files
that these files encoded in UTF-8 may have UTF-8 BOM marker at the
beginning; this makes it in line with what we do for configuration
files already.
* cn/bom-in-gitignore:
attr: skip UTF8 BOM at the beginning of the input file
config: use utf8_bom[] from utf.[ch] in git_parse_source()
utf8-bom: introduce skip_utf8_bom() helper
add_excludes_from_file: clarify the bom skipping logic
dir: allow a BOM at the beginning of exclude files
Because the function reads one character at the time, unfortunately
we cannot use the easier skip_utf8_bom() helper, but at least we do
not have to duplicate the constant string this way.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reading configuration from a blob object, when it ends with a lone
CR, use to confuse the configuration parser.
* jk/config-no-ungetc-eof:
config_buf_ungetc: warn when pushing back a random character
config: do not ungetc EOF
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".
* jk/blame-commit-label:
blame.c: fix garbled error message
use xstrdup_or_null to replace ternary conditionals
builtin/commit.c: use xstrdup_or_null instead of envdup
builtin/apply.c: use xstrdup_or_null instead of null_strdup
git-compat-util: add xstrdup_or_null helper
Our config code simulates a stdio stream around a buffer,
but our fake ungetc() does not behave quite like the real
one. In particular, we only rewind the position by one
character, but do _not_ actually put the character from the
caller into position.
It turns out that this does not matter, because we only ever
push back the character we just read. In other words, such
an assignment would be a noop. But because the function is
called ungetc, and because it takes a character parameter,
it is a mistake waiting to happen.
Actually assigning the character into the buffer would be
ideal, but our pointer is actually a "const" copy of the
buffer. We do not know who the real owner of the buffer is
in this code, and would not want to munge their contents.
Instead, we can simply add an assertion that matches what
the current caller does, and will let us know if new callers
are added that violate the contract.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are parsing a config value, if we see a carriage
return, we fgetc the next character to see if it is a
line feed (in which case we silently drop the CR). If it
isn't, we then ungetc the character, and take the literal
CR.
But we never check whether we in fact got a character at
all. If the config file ends in CR, we will get EOF here,
and try to ungetc EOF. This works OK for a real stdio
stream. The ungetc returns an error, and the next fgetc will
then return EOF again.
However, our custom buffer-based stream is not so fortunate.
It happily rewinds the position of the stream by one
character, ignoring the fact that we fed it EOF. The next
fgetc call returns the final CR again, over and over, and we
end up in an infinite loop.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This replaces "x ? xstrdup(x) : NULL" with xstrdup_or_null(x).
The change is fairly mechanical, with the exception of
resolve_refdup, which can eliminate a temporary variable.
There are still a few hits grepping for "?.*xstrdup", but
these are of slightly different forms and cannot be
converted (e.g., "x ? xstrdup(x->foo) : NULL").
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for NTFS
and FAT32; let's use it in verify_path().
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on NTFS nor FAT32.
In practice this probably doesn't matter, though, as
the restricted names are rather obscure and almost
certainly would never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectNTFS config
option. Though this is expected to be most useful on Windows,
we allow it to be set everywhere, as NTFS may be mounted on
other platforms. The variable does default to on for Windows,
though.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for HFS+;
let's use it in verify_path.
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on HFS+. In practice
this probably doesn't matter, though, as the restricted
names are rather obscure and almost certainly would
never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectHFS config
option. Though this is expected to be most useful on OS X,
we allow it to be set everywhere, as HFS+ may be mounted on
other platforms. The variable does default to on for OS X,
though.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using abs() on long values can cause truncation, so use labs() instead.
Reported by Clang 3.5 (-Wabsolute-value, enabled by -Wall).
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lockfile API and its users have been cleaned up.
* mh/lockfile: (38 commits)
lockfile.h: extract new header file for the functions in lockfile.c
hold_locked_index(): move from lockfile.c to read-cache.c
hold_lock_file_for_append(): restore errno before returning
get_locked_file_path(): new function
lockfile.c: rename static functions
lockfile: rename LOCK_NODEREF to LOCK_NO_DEREF
commit_lock_file_to(): refactor a helper out of commit_lock_file()
trim_last_path_component(): replace last_path_elm()
resolve_symlink(): take a strbuf parameter
resolve_symlink(): use a strbuf for internal scratch space
lockfile: change lock_file::filename into a strbuf
commit_lock_file(): use a strbuf to manage temporary space
try_merge_strategy(): use a statically-allocated lock_file object
try_merge_strategy(): remove redundant lock_file allocation
struct lock_file: declare some fields volatile
lockfile: avoid transitory invalid states
git_config_set_multivar_in_file(): avoid call to rollback_lock_file()
dump_marks(): remove a redundant call to rollback_lock_file()
api-lockfile: document edge cases
commit_lock_file(): rollback lock file on failure to rename
...
When running a required clean filter, we do not have to mmap the
original before feeding the filter. Instead, stream the file
contents directly to the filter and process its output.
* sp/stream-clean-filter:
sha1_file: don't convert off_t to size_t too early to avoid potential die()
convert: stream from fd to required clean filter to reduce used address space
copy_fd(): do not close the input file descriptor
mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size
memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT
config.c: add git_env_ulong() to parse environment variable
convert: drop arguments other than 'path' from would_convert_to_git()
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).
Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now, we still make sure to allocate at least PATH_MAX characters
for the strbuf because resolve_symlink() doesn't know how to expand
the space for its return value. (That will be fixed in a moment.)
Another alternative would be to just use a strbuf as scratch space in
lock_file() but then store a pointer to the naked string in struct
lock_file. But lock_file objects are often reused. By reusing the
same strbuf, we can avoid having to reallocate the string most times
when a lock_file object is reused.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After commit_lock_file() is called, then the lock_file object is
necessarily either committed or rolled back. So there is no need to
call rollback_lock_file() again in either of these cases.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git config --add section.var val" used to lose existing
section.var whose value was an empty string.
* ta/config-add-to-empty-or-true-fix:
config: avoid a funny sentinel value "a^"
make config --add behave correctly for empty and NULL values
"git config --add section.var val" used to lose existing
section.var whose value was an empty string.
* ta/config-add-to-empty-or-true-fix:
config: avoid a funny sentinel value "a^"
make config --add behave correctly for empty and NULL values
Introduce CONFIG_REGEX_NONE as a more explicit sentinel value to say
"we do not want to replace any existing entry" and use it in the
implementation of "git config --add".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the new caching config-set API in git_config() calls.
* ta/config-set-1:
add tests for `git_config_get_string_const()`
add a test for semantic errors in config files
rewrite git_config() to use the config-set API
config: add `git_die_config()` to the config-set API
change `git_config()` return value to void
add line number and file name info to `config_set`
config.c: fix accuracy of line number in errors
config.c: mark error and warnings strings for translation
"git -c section.var command" and "git -c section.var= command"
should pass the configuration differently (the former should be
a boolean true, the latter should be an empty string).
* jk/command-line-config-empty-string:
config: teach "git -c" to recognize an empty string
Add in-core caching layer to let us avoid reading the same
configuration files number of times.
* ta/config-set:
test-config: add tests for the config_set API
add `config_set` API for caching config-like files
Instead of using skip_prefix() to check the first part of the string
and then strcmp() to check the rest, simply use strcmp() to check the
whole string.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new function parses an integeral value that fits in unsigned
long in human readable form, i.e. possibly with unit suffix, e.g.
10k = 10240, etc., from an environment variable. Parsing of
GIT_MMAP_LIMIT and GIT_ALLOC_LIMIT will use it in later patches.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently if we have a config file like,
[foo]
baz
bar =
and we try something like, "git config --add foo.baz roll", Git will
segfault. Moreover, for "git config --add foo.bar roll", it will
overwrite the original value instead of appending after the existing
empty value.
The problem lies with the regexp used for simulating --add in
`git_config_set_multivar_in_file()`, "^$", which in ideal case should
not match with any string but is true for empty strings. Instead use a
regexp like "a^" which can not be true for any string, empty or not.
For removing the segfault add a check for NULL values in `matches()` in
config.c.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Of all the functions in `git_config*()` family, `git_config()` has the
most invocations in the whole code base. Each `git_config()` invocation
causes config file rereads which can be avoided using the config-set API.
Use the config-set API to rewrite `git_config()` to use the config caching
layer to avoid config file rereads on each invocation during a git process
lifetime. First invocation constructs the cache, and after that for each
successive invocation, `git_config()` feeds values from the config cache
instead of rereading the configuration files.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add `git_die_config` that dies printing the line number and the file name
of the highest priority value for the configuration variable `key`. A custom
error message is also printed before dying, specified by the caller, which can
be skipped if `err` argument is set to NULL.
It has usage in non-callback based config value retrieval where we can
raise an error and die if there is a semantic error.
For example,
if (!git_config_get_value(key, &value)){
if (!strcmp(value, "foo"))
git_config_die(key, "value: `%s` is illegal", value);
else
/* do work */
}
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently `git_config()` returns an integer signifying an error code.
During rewrites of the function most of the code was shifted to
`git_config_with_options()`. `git_config_with_options()` normally
returns positive values if its `config_source` parameter is set as NULL,
as most errors are fatal, and non-fatal potential errors are guarded
by "if" statements that are entered only when no error is possible.
Still a negative value can be returned in case of race condition between
`access_or_die()` & `git_config_from_file()`. Also, all callers of
`git_config()` ignore the return value except for one case in branch.c.
Change `git_config()` return value to void and make it die if it receives
a negative value from `git_config_with_options()`.
Original-patch-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Store file name and line number for each key-value pair in the cache
during parsing of the configuration files.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a callback returns a negative value to `git_config*()` family,
they call `die()` while printing the line number and the file name.
Currently the printed line number is off by one, thus printing the
wrong line number.
Make `linenr` point to the line we just parsed during the call
to callback to get accurate line number in error messages.
Commit-message-by: Tanay Abhra <tanayabh@gmail.com>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a config file, you can do:
[foo]
bar
to turn the "foo.bar" boolean flag on, and you can do:
[foo]
bar=
to set "foo.bar" to the empty string. However, git's "-c"
parameter treats both:
git -c foo.bar
and
git -c foo.bar=
as the boolean flag, and there is no way to set a variable
to the empty string. This patch enables the latter form to
do that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently `git_config()` uses a callback mechanism and file rereads for
config values. Due to this approach, it is not uncommon for the config
files to be parsed several times during the run of a git program, with
different callbacks picking out different variables useful to themselves.
Add a `config_set`, that can be used to construct an in-memory cache for
config-like files that the caller specifies (i.e., files like `.gitmodules`,
`~/.gitconfig` etc.). Add two external functions `git_configset_get_value`
and `git_configset_get_value_multi` for querying from the config sets.
`git_configset_get_value` follows `last one wins` semantic (i.e. if there
are multiple matches for the queried key in the files of the configset the
value returned will be the last entry in `value_list`).
`git_configset_get_value_multi` returns a list of values sorted in order of
increasing priority (i.e. last match will be at the end of the list). Add
type specific query functions like `git_configset_get_bool` and similar.
Add a default `config_set`, `the_config_set` to cache all key-value pairs
read from usual config files (repo specific .git/config, user wide
~/.gitconfig, XDG config and the global /etc/gitconfig). `the_config_set`
is populated using `git_config()`.
Add two external functions `git_config_get_value` and
`git_config_get_value_multi` for querying in a non-callback manner from
`the_config_set`. Also, add type specific query functions that are
implemented as a thin wrapper around the `config_set` API.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we see the core.commentchar config option, we extract
the string with git_config_string, which does two things:
1. It complains via config_error_nonbool if there is no
string value.
2. It makes a copy of the string.
Since we immediately parse the string into its
single-character value, we only care about (1). And in fact
(2) is a detriment, as it means we leak the copy. Instead,
let's just check the pointer value ourselves, and parse
directly from the const string we already have.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replaces the only two uses of fchmod() with chmod() because the
former does not work on Windows port and because luckily we can.
* kb/avoid-fchmod-for-now:
config: use chmod() instead of fchmod()
There is no fchmod() on native Windows platforms (MinGW and MSVC), and the
equivalent Win32 API (SetFileInformationByHandle) requires Windows Vista.
Use chmod() instead.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "mailmap.file" configuration option did not support the tilde
expansion (i.e. ~user/path and ~/path).
* ow/config-mailmap-pathname:
config: respect '~' and '~user' in mailmap.file
The skip_prefix() function returns a pointer to the content
past the prefix, or NULL if the prefix was not found. While
this is nice and simple, in practice it makes it hard to use
for two reasons:
1. When you want to conditionally skip or keep the string
as-is, you have to introduce a temporary variable.
For example:
tmp = skip_prefix(buf, "foo");
if (tmp)
buf = tmp;
2. It is verbose to check the outcome in a conditional, as
you need extra parentheses to silence compiler
warnings. For example:
if ((cp = skip_prefix(buf, "foo"))
/* do something with cp */
Both of these make it harder to use for long if-chains, and
we tend to use starts_with() instead. However, the first line
of "do something" is often to then skip forward in buf past
the prefix, either using a magic constant or with an extra
strlen(3) (which is generally computed at compile time, but
means we are repeating ourselves).
This patch refactors skip_prefix() to return a simple boolean,
and to provide the pointer value as an out-parameter. If the
prefix is not found, the out-parameter is untouched. This
lets you write:
if (skip_prefix(arg, "foo ", &arg))
do_foo(arg);
else if (skip_prefix(arg, "bar ", &arg))
do_bar(arg);
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
mailmap.file configuration names a pathname, hence should honor
~/path and ~user/path as its value.
* ow/config-mailmap-pathname:
config: respect '~' and '~user' in mailmap.file