While git doesn't track empty directories, git archive can be tricked
into putting some into archives. One way is to construct an empty tree
object, as t5004 does. While that is supported by the object database,
it can't be represented in the index and thus it's unlikely to occur in
the wild.
Another way is using the literal name of a directory in an exclude
pathspec -- its contents are are excluded, but the directory stub is
included. That's inconsistent: exclude pathspecs containing wildcards
don't leave empty directories in the archive.
Yet another way is have a few levels of nested subdirectories (e.g.
d1/d2/d3/file1) and ignoring the entries at the leaves (e.g. file1).
The directories with the ignored content are ignored as well (e.g. d3),
but their empty parents are included (e.g. d2).
As empty directories are not supported by git, they should also not be
written into archives. If an empty directory is really needed then it
can be tracked and archived by placing an empty .gitignore file in it.
There already is a mechanism in place for suppressing empty directories.
When read_tree_recursive() encounters a directory excluded by a pathspec
then it enters it anyway because it might contain included entries. It
calls the callback function before it is able to decide if the directory
is actually needed. For that reason git archive adds directories to a
queue and writes entries for them only when it encounters the first
child item -- but currently only if pathspecs with wildcards are used.
Queue *all* directories, no matter if there even are pathspecs present.
This prevents git archive from writing entries for empty directories in
all cases.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reject directories with the attribute export-ignore already while
queuing them. This prevents read_tree_recursive() from descending into
them and this avoids write_archive_entry() rejecting them later on,
which queue_or_write_archive_entry() is not prepared for.
Borrow the existing strbuf to build the full path to avoid string
copies and extra allocations; just make sure we restore the original
value before moving on.
Keep checking any other attributes in write_archive_entry() as before,
but avoid checking them twice.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add helpers for accessing attributes that encapsulate the details of how
to retrieve their values.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert parse_tree_indirect to take a pointer to struct object_id.
Update all the callers. This transformation was achieved using the
following semantic patch and manual updates to the declaration and
definition. Update builtin/checkout.c manually as well, since it uses a
ternary expression not handled by the semantic patch.
@@
expression E1;
@@
- parse_tree_indirect(E1.hash)
+ parse_tree_indirect(&E1)
@@
expression E1;
@@
- parse_tree_indirect(E1->hash)
+ parse_tree_indirect(E1)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert lookup_commit, lookup_commit_or_die,
lookup_commit_reference, and lookup_commit_reference_gently to take
struct object_id arguments.
Introduce a temporary in parse_object buffer in order to convert this
function. This is required since in order to convert parse_object and
parse_object_buffer, lookup_commit_reference_gently and
lookup_commit_or_die would need to be converted. Not introducing a
temporary would therefore require that lookup_commit_or_die take a
struct object_id *, but lookup_commit would take unsigned char *,
leaving a confusing and hard-to-use interface.
parse_object_buffer will lose this temporary in a later patch.
This commit was created with manual changes to commit.c, commit.h, and
object.c, plus the following semantic patch:
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1.hash, E2)
+ lookup_commit_reference_gently(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1->hash, E2)
+ lookup_commit_reference_gently(E1, E2)
@@
expression E1;
@@
- lookup_commit_reference(E1.hash)
+ lookup_commit_reference(&E1)
@@
expression E1;
@@
- lookup_commit_reference(E1->hash)
+ lookup_commit_reference(E1)
@@
expression E1;
@@
- lookup_commit(E1.hash)
+ lookup_commit(&E1)
@@
expression E1;
@@
- lookup_commit(E1->hash)
+ lookup_commit(E1)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1.hash, E2)
+ lookup_commit_or_die(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1->hash, E2)
+ lookup_commit_or_die(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The remaining callers are all simple "I have N attributes I am
interested in. I'll ask about them with various paths one by one".
After this step, no caller to git_check_attrs() remains. After
removing it, we can extend "struct attr_check" struct with data
that can be used in optimizing the query for the specific N
attributes it contains.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The traditional API to check attributes is to prepare an N-element
array of "struct git_attr_check" and pass N and the array to the
function "git_check_attr()" as arguments.
In preparation to revamp the API to pass a single structure, in
which these N elements are held, rename the type used for these
individual array elements to "struct attr_check_item" and rename
the function to "git_check_attrs()".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since b9605bc4f2 ("config: only read .git/config from configured
repos", 2016-09-12), we do not read from ".git/config" unless we
know we are in a repository. "git archive" however didn't do the
repository discovery and instead relied on the old behaviour.
Teach the command to run a "gentle" version of repository discovery
so that local configuration variables are honoured.
[jc: stole tests from peff]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function takes a pointer to a pathspec structure, and releases
the resources held by it, but does not free() the structure itself.
Such a function should be called "clear", not "free".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If our size computation overflows size_t, we may allocate a
much smaller buffer than we expected and overflow it. It's
probably impossible to trigger an overflow in most of these
sites in practice, but it is easy enough convert their
additions and multiplications into overflow-checking
variants. This may be fixing real bugs, and it makes
auditing the code easier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object. This provides no
functional change, as it is essentially a macro substitution.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash. Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
When we are allocating a struct with a FLEX_ARRAY member, we
generally compute the size of the array and then sprintf or
strcpy into it. Normally we could improve a dynamic allocation
like this by using xstrfmt, but it doesn't work here; we
have to account for the size of the rest of the struct.
But we can improve things a bit by storing the length that
we use for the allocation, and then feeding it to xsnprintf
or memcpy, which makes it more obvious that we are not
writing more than the allocated number of bytes.
It would be nice if we had some kind of helper for
allocating generic flex arrays, but it doesn't work that
well:
- the call signature is a little bit unwieldy:
d = flex_struct(sizeof(*d), offsetof(d, path), fmt, ...);
You need offsetof here instead of just writing to the
end of the base size, because we don't know how the
struct is packed (partially this is because FLEX_ARRAY
might not be zero, though we can account for that; but
the size of the struct may actually be rounded up for
alignment, and we can't know that).
- some sites do clever things, like over-allocating because
they know they will write larger things into the buffer
later (e.g., struct packed_git here).
So we're better off to just write out each allocation (or
add type-specific helpers, though many of these are one-off
allocations anyway).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "git log" and friends a new "--date=format:..." option to
format timestamps using system's strftime(3).
* jk/date-mode-format:
strbuf: make strbuf_addftime more robust
introduce "format" date-mode
convert "enum date_mode" into a struct
show-branch: use DATE_RELATIVE instead of magic number
In preparation for adding date modes that may carry extra
information beyond the mode itself, this patch converts the
date_mode enum into a struct.
Most of the conversion is fairly straightforward; we pass
the struct as a pointer and dereference the type field where
necessary. Locations that declare a date_mode can use a "{}"
constructor. However, the tricky case is where we use the
enum labels as constants, like:
show_date(t, tz, DATE_NORMAL);
Ideally we could say:
show_date(t, tz, &{ DATE_NORMAL });
but of course C does not allow that. Likewise, we cannot
cast the constant to a struct, because we need to pass an
actual address. Our options are basically:
1. Manually add a "struct date_mode d = { DATE_NORMAL }"
definition to each caller, and pass "&d". This makes
the callers uglier, because they sometimes do not even
have their own scope (e.g., they are inside a switch
statement).
2. Provide a pre-made global "date_normal" struct that can
be passed by address. We'd also need "date_rfc2822",
"date_iso8601", and so forth. But at least the ugliness
is defined in one place.
3. Provide a wrapper that generates the correct struct on
the fly. The big downside is that we end up pointing to
a single global, which makes our wrapper non-reentrant.
But show_date is already not reentrant, so it does not
matter.
This patch implements 3, along with a minor macro to keep
the size of the callers sane.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some functions from the refs module were still declared in cache.h.
Move them to refs.h.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Identify parts of the code that knows that we use SHA-1 hash to
name our objects too much, and use (1) symbolic constants instead
of hardcoded 20 as byte count and/or (2) use struct object_id
instead of unsigned char [20] for object names.
* bc/object-id:
apply: convert threeway_stage to object_id
patch-id: convert to use struct object_id
commit: convert parts to struct object_id
diff: convert struct combine_diff_path to object_id
bulk-checkin.c: convert to use struct object_id
zip: use GIT_SHA1_HEXSZ for trailers
archive.c: convert to use struct object_id
bisect.c: convert leaf functions to use struct object_id
define utility functions for object IDs
define a structure for object IDs
This patch puts the usage info strings that were not already in docopt-
like format into docopt-like format, which will be a litle easier for
end users and a lot easier for translators. Changes include:
- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows the callback to use 'base' as a temporary buffer to
quickly assemble full path "without" extra allocation. The callback
has to restore it afterwards of course.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch fixes two problems with using :(glob) (or even "*.c"
without ":(glob)").
The first one is we forgot to turn on the 'recursive' flag in struct
pathspec. Without that, tree_entry_interesting() will not mark
potential directories "interesting" so that it can confirm whether
those directories have anything matching the pathspec.
The marking directories interesting has a side effect that we need to
walk inside a directory to realize that there's nothing interested in
there. By that time, 'archive' code has already written the (empty)
directory down. That means lots of empty directories in the result
archive.
This problem is fixed by lazily writing directories down when we know
they are actually needed. There is a theoretical bug in this
implementation: we can't write empty trees/directories that match that
pathspec.
path_exists() is also made stricter in order to detect non-matching
pathspec because when this 'recursive' flag is on, we most likely
match some directories. The easiest way is not consider any
directories "matched".
Noticed-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `git_config_get_bool()` family instead of `git_config()` to take advantage of
the config-set API which provides a cleaner control flow.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid scanning strings twice, once with strchr() and then with
strlen(), by using strchrnul().
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rohit Mani <rohit.mani@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit ee27ca4, we started restricting remote git-archive
invocations to only accessing reachable commits. This
matches what upload-pack allows, but does restrict some
useful cases (e.g., HEAD:foo). We loosened this in 0f544ee,
which allows `foo:bar` as long as `foo` is a ref tip.
However, that still doesn't allow many useful things, like:
1. Commits accessible from a ref, like `foo^:bar`, which
are reachable
2. Arbitrary sha1s, even if they are reachable.
We can do a full object-reachability check for these cases,
but it can be quite expensive if the client has sent us the
sha1 of a tree; we have to visit every sub-tree of every
commit in the worst case.
Let's instead give site admins an escape hatch, in case they
prefer the more liberal behavior. For many sites, the full
object database is public anyway (e.g., if you allow dumb
walker access), or the site admin may simply decide the
security/convenience tradeoff is not worth it.
This patch adds a new config option to disable the
restrictions added in ee27ca4. It defaults to off, meaning
there is no change in behavior by default.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-archive relies on get_pathspec to convert its argv into
a list of pathspecs. When get_pathspec is given an empty
argv list, it returns a single pathspec, the empty string,
to indicate that everything matches. When we feed this to
our path_exists function, we typically see that the pathspec
turns up at least one item in the tree, and we are happy.
But when our tree is empty, we erroneously think it is
because the pathspec is too limited, when in fact it is
simply that there is nothing to be found in the tree. This
is a weird corner case, but the correct behavior is almost
certainly to produce an empty archive, not to exit with an
error.
This patch teaches git-archive to create empty archives when
there is no pathspec given (we continue to complain if a
pathspec is given, since it obviously is not matched). It
also confirms that the tar and zip writers produce sane
output in this instance.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The manpage of gitattributes says: "The rules how the pattern
matches paths are the same as in .gitignore files" and the gitignore
pattern matching has a pattern ending with / for directory matching.
This rule is specifically relevant for the 'export-ignore' rule used
for git archive.
Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Somewhere in help usage, we use both "message" and "msg", "command"
and "cmd", "key id" and "key-id". This patch makes all help text from
parseopt use the first form. Clearer and 3 fewer strings for
translators.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no need to build a copy of the relevant part of the string just
to make sure we have a NUL-terminated string. We can simply pass the
length of the interesting part to dwim_ref() instead.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
archive-tar.c and archive-zip.c now perform conversion check, with
help of sha1_file_to_archive() from archive.c
This gives backends more freedom in dealing with (streaming) large
blobs.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tightening done in (ee27ca4a: archive: don't let remote clients
get unreachable commits, 2011-11-17) went too far and disallowed
HEAD:Documentation as it would try to find "HEAD:Documentation" as a
ref.
Only DWIM the "HEAD" part to see if it exists as a ref. Once we're
sure that we've been given a valid ref, we follow the normal code
path. This still disallows attempts to access commits which are not
branch tips.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usually git is careful not to allow clients to fetch
arbitrary objects from the database; for example, objects
received via upload-pack must be reachable from a ref.
Upload-archive breaks this by feeding the client's tree-ish
directly to get_sha1, which will accept arbitrary hex sha1s,
reflogs, etc.
This is not a problem if all of your objects are publicly
reachable anyway (or at least public to anybody who can run
upload-archive). Or if you are making the repo available by
dumb protocols like http or rsync (in which case the client
can read your whole object db directly).
But for sites which allow access only through smart
protocols, clients may be able to fetch trees from commits
that exist in the server's object database but are not
referenced (e.g., because history was rewound).
This patch tightens upload-archive's lookup to use dwim_ref
rather than get_sha1. This means a remote client can only
fetch the tip of a named ref, not an arbitrary sha1 or
reflog entry.
This also restricts some legitimate requests, too:
1. Reachable non-tip commits, like:
git archive --remote=$url v1.0~5
2. Sub-trees of reachable commits, like:
git archive --remote=$url v1.7.7:Documentation
Local requests continue to use get_sha1, and are not
restricted at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The list variable (which is OPT_BOOLEAN) is initialized to 0 and only
checked against 0 in the code, so it is safe to use OPT_BOOL().
The worktree_attributes variable (which is OPT_BOOLEAN) is initialized to
0 and later assigned to a field with the same name in struct archive_args,
which is a bitfield of width 1. It is safe and even more correct to use
OPT_BOOL() here; the new test in 5001 demonstrates why using OPT_COUNTUP
is wrong.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Suggested by: Junio Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tar filters may be very expensive to run, so sites do
not want to expose them via upload-archive. This patch lets
users configure tar.<filter>.remote to turn them off.
By default, gzip filters are left on, as they are about as
expensive as creating zip archives.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git-archive will guess a format from the output filename if
no format is explicitly given. The current function just
hardcodes "zip" to the zip format, and leaves everything
else NULL (which will default to tar). Since we are about
to add user-specified formats, we need to be more flexible.
The new rule is "if a filename ends with a dot and the name
of a format, it matches that format". For the existing "tar"
and "zip" formats, this is identical to the current
behavior. For new user-specified formats, this will do what
the user expects if they name their formats appropriately.
Because we will eventually start matching arbitrary
user-specified extensions that may include dots, the strrchr
search for the final dot is not sufficient. We need to do an
actual suffix match with each extension.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The process for guessing an archive output format based on
the filename is something like this:
a. parse --output in cmd_archive; check the filename
against a static set of mapping heuristics (right now
it just matches ".zip" for zip files).
b. if found, stick a fake "--format=zip" at the beginning
of the arguments list (if the user did specify a
--format manually, the later option will override our
fake one)
c. if it's a remote call, ship the arguments to the remote
(including the fake), which will call write_archive on
their end
d. if it's local, ship the arguments to write_archive
locally
There are two problems:
1. The set of mappings is static and at too high a level.
The write_archive level is going to check config for
user-defined formats, some of which will specify
extensions. We need to delay lookup until those are
parsed, so we can match against them.
2. For a remote archive call, our set of mappings (or
formats) may not match the remote side's. This is OK in
practice right now, because all versions of git
understand "zip" and "tar". But as new formats are
added, there is going to be a mismatch between what the
client can do and what the remote server can do.
To fix (1), this patch refactors the location guessing to
happen at the write_archive level, instead of the
cmd_archive level. So instead of sticking a fake --format
field in the argv list, we actually pass a "name hint" down
the callchain; this hint is used at the appropriate time to
guess the format (if one hasn't been given already).
This patch leaves (2) unfixed. The name_hint is converted to
a "--format" option as before, and passed to the remote.
This means the local side's idea of how extensions map to
formats will take precedence.
Another option would be to pass the name hint to the remote
side and let the remote choose. This isn't a good idea for
two reasons:
1. There's no room in the protocol for passing that
information. We can pass a new argument, but older
versions of git on the server will choke on it.
2. Letting the remote side decide creates a silent
inconsistency in user experience. Consider the case
that the locally installed git knows about the "tar.gz"
format, but a remote server doesn't.
Running "git archive -o foo.tar.gz" will use the tar.gz
format. If we use --remote, and the local side chooses
the format, then we send "--format=tar.gz" to the
remote, which will complain about the unknown format.
But if we let the remote side choose the format, then
it will realize that it doesn't know about "tar.gz" and
output uncompressed tar without even issuing a warning.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current archivers are very static; when you are in the
write_tar_archive function, you know you are writing a tar.
However, to facilitate runtime-configurable archivers
that will share a common write function we need to tell the
function which archiver was used.
As a convenience, we also provide an opaque data pointer in
the archiver struct so that individual archivers can put
something useful there when they register themselves.
Technically they could just use the "name" field to look in
an internal map of names to data, but this is much simpler.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>