The "core.logAllRefUpdates" that used to be boolean has been
enhanced to take 'always' as well, to record ref updates to refs
other than the ones that are expected to be updated (i.e. branches,
remote-tracking branches and notes).
* cw/log-updates-for-all-refs-really:
doc: add note about ignoring '--no-create-reflog'
update-ref: add test cases for bare repository
refs: add option core.logAllRefUpdates = always
config: add markup to core.logAllRefUpdates doc
When core.logallrefupdates is true, we only create a new reflog for refs
that are under certain well-known hierarchies. The reason is that we
know that some hierarchies (like refs/tags) are not meant to change, and
that unknown hierarchies might not want reflogs at all (e.g., a
hypothetical refs/foo might be meant to change often and drop old
history immediately).
However, sometimes it is useful to override this decision and simply log
for all refs, because the safety and audit trail is more important than
the performance implications of keeping the log around.
This patch introduces a new "always" mode for the core.logallrefupdates
option which will log updates to everything under refs/, regardless
where in the hierarchy it is (we still will not log things like
ORIG_HEAD and FETCH_HEAD, which are known to be transient).
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Cornelius Weig <cornelius.weig@tngtech.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Patch generated by Coccinelle and contrib/coccinelle/object_id.cocci.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When deleting/pruning references, remove any directories that are made
empty by the deletion of loose references or of reflogs. Otherwise such
empty directories can survive forever and accumulate over time. (Even
'pack-refs', which is smart enough to remove the parent directories of
loose references that it prunes, leaves directories that were already
empty.)
And now that files_transaction_commit() takes care of deleting the
parent directories of loose references that it prunes, we don't have to
do that in prune_ref() anymore.
This change would be unwise if the *creation* of these directories could
race with our deletion of them. But the earlier changes in this patch
series made the creation paths robust against races, so now it is safe
to tidy them up more aggressively.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new "flags" parameter that tells the function whether to remove
empty parent directories of the loose reference file, of the reflog
file, or both. The new functionality is not yet used.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's bad manners and surprising and therefore error-prone.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the standard nomenclature.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was hardly doing anything anymore, and had only one caller.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is simpler to derive the path to the file that must be deleted from
"lock->ref_name" than from the lock_file object.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now files_log_ref_write() doesn't do anything beyond call
log_ref_write_1(), so inline the latter into the former.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of writing the name of the reflog file into a strbuf that is
supplied by the caller but not needed there, write it into a local
temporary buffer and remove the strbuf parameter entirely.
And while we're adjusting the function signature, reorder the arguments
to move the input parameters before the output parameters.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's unnecessary to pass a strbuf holding the reflog path up and down
the call stack now that it is hardly needed by the callers. Remove the
places where log_ref_write_1() uses it, in preparation for making it
internal to log_ref_setup().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function will most often be called by log_ref_write_1(), which
wants to append to the reflog file. In that case, it is silly to close
the file only for the caller to reopen it immediately. So, in the case
that the file was opened, pass the open file descriptor back to the
caller.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change log_ref_setup() to use raceproof_create_file() to create the new
logfile. This makes it more robust against a race against another
process that might be trying to clean up empty directories while we are
trying to create a new logfile.
This also means that it will only call create_leading_directories() if
open() fails, which should be a net win. Even in the cases where we are
willing to create a new logfile, it will usually be the case that the
logfile already exists, or if not then that the directory containing the
logfile already exists. In such cases, we will save some work that was
previously done unconditionally.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The behavior of this function (especially how it handles errors) is
quite different depending on whether we are willing to create the reflog
vs. whether we are only trying to open an existing reflog. So separate
the code paths.
This also simplifies the next steps.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function doesn't do anything beyond call files_log_ref_write(), so
replace it with the latter at its call sites.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Don't capitalize error strings
* Report true paths of affected files
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Besides shortening the code, this saves an unnecessary call to
safe_create_leading_directories_const() in almost all cases.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of coding the retry loop inline, use raceproof_create_file() to
make lock acquisition safe against directory creation/deletion races.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`lflags` is set a single time then never changed, so just inline it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A stray symbolic link in $GIT_DIR/refs/ directory could make name
resolution loop forever, which has been corrected.
* jk/ref-symlink-loop:
files_read_raw_ref: prevent infinite retry loops in general
files_read_raw_ref: avoid infinite loop on broken symlinks
Limit the number of retries to 3. That should be adequate to
prevent any races, while preventing the possibility of
infinite loops if the logic fails to handle any other
possible error modes correctly.
After the fix in the previous commit, there's no known way
to trigger an infinite loop, but I did manually verify that
this fixes the test in that commit even when the code change
is not applied.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our ref resolution first runs lstat() on any path we try to
look up, because we want to treat symlinks specially (by
resolving them manually and considering them symrefs). But
if the results of `readlink` do _not_ look like a ref, we
fall through to treating it like a normal file, and just
read the contents of the linked path.
Since fcb7c76 (resolve_ref_unsafe(): close race condition
reading loose refs, 2013-06-19), that "normal file" code
path will stat() the file and if we see ENOENT, will jump
back to the lstat(), thinking we've seen inconsistent
results between the two calls. But for a symbolic ref, this
isn't a race: the lstat() found the symlink, and the stat()
is looking at the path it points to. We end up in an
infinite loop calling lstat() and stat().
We can fix this by avoiding the retry-on-inconsistent jump
when we know that we found a symlink. While we're at it,
let's add a comment explaining why the symlink case gets to
this code in the first place; without that, it is not
obvious that the correct solution isn't to avoid the stat()
code path entirely.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT. The resulting code is
shorter and supports empty arrays with NULL pointers.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref-store abstraction was introduced to the refs API so that we
can plug in different backends to store references.
* mh/ref-store: (38 commits)
refs: implement iteration over only per-worktree refs
refs: make lock generic
refs: add method to rename refs
refs: add methods to init refs db
refs: make delete_refs() virtual
refs: add method for initial ref transaction commit
refs: add methods for reflog
refs: add method iterator_begin
files_ref_iterator_begin(): take a ref_store argument
split_symref_update(): add a files_ref_store argument
lock_ref_sha1_basic(): add a files_ref_store argument
lock_ref_for_update(): add a files_ref_store argument
commit_ref_update(): add a files_ref_store argument
lock_raw_ref(): add a files_ref_store argument
repack_without_refs(): add a files_ref_store argument
refs: make peel_ref() virtual
refs: make create_symref() virtual
refs: make pack_refs() virtual
refs: make verify_refname_available() virtual
refs: make read_raw_ref() virtual
...
Alternate refs backends might still use files to store per-worktree
refs. So provide a way to iterate over only the per-worktree references
in a ref_store. The other backend can set up a files ref_store and
iterate using the new DO_FOR_EACH_PER_WORKTREE_ONLY flag when iterating.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of including a files-backend-specific struct ref_lock, change
the generic ref_update struct to include a void pointer that backends
can use for their own arbitrary data.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This removes the last caller of function get_files_ref_store(), so
remove it.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Alternate refs backends might not need the refs/heads directory and so
on, so we make ref db initialization part of the backend.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the file-based backend, delete_refs has some special optimization
to deal with packed refs. In other backends, we might be able to make
ref deletion faster by putting all deletions into a single
transaction. So we need a special backend function for this.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ronnie Sahlberg <rsahlberg@google.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the file-based backend, the reflog piggybacks on the ref lock.
Since other backends won't have the same sort of ref lock, ref backends
must also handle reflogs.
Signed-off-by: Ronnie Sahlberg <rsahlberg@google.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now it only supports the main reference store.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reference backends will be able to customize this function to implement
reference reading.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
resolve_ref_recursively() can handle references in arbitrary files
reference stores, so use it to resolve "gitlink" (i.e., submodule)
references. Aside from removing redundant code, this allows submodule
lookups to benefit from the much more robust code that we use for
reading non-submodule references. And, since the code is now agnostic
about reference backends, it will work for any future references
backend (so move its definition to refs.c).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that resolve_packed_ref() can work with an arbitrary
files_ref_store, there is no need to have a separate
resolve_gitlink_packed_ref() function.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move resolve_gitlink_ref() and related functions lower in the file to
avoid the need for forward declarations in the next step.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These functions currently only work in the main repository, so add an
assert_main_repository() check to each function.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want ref_stores to be polymorphic, so invent a base class of which
files_ref_store is a derived class. For now there is exactly one
ref_store for the main repository and one for any submodules whose
references have been accessed.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a `struct ref_storage_be` to represent types of reference stores. In
OO notation, this is the class, and will soon hold some class
methods (e.g., a factory to create new ref_store instances) and will
also serve as the vtable for ref_store instances of that type.
As yet, the backends cannot do anything.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The greater goal of this patch series is to develop the concept of a
reference store, which is a place that references, their values, and
their reflogs are stored, and to virtualize the reference interface so
that different types of ref_stores can be implemented. We will then, for
example, use ref_store instances to access submodule references and
worktree references.
Currently, we keep a ref_cache for each submodule that has had its
references iterated over. It is a far cry from a ref_store, but they are
stored the way we will want to store ref_stores, and ref_stores will
eventually have to hold the reference caches. So let's treat ref_caches
as embryo ref_stores, and build them out from there.
As the first step, simply rename `ref_cache` to `files_ref_store`, and
rename some functions and attributes correspondingly.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, do_submodule_path will attempt locating the .git directory by
using read_gitfile on <path>/.git. If this fails it just assumes the
<path>/.git is actually a git directory.
This is good because it allows for handling submodules which were cloned
in a regular manner first before being added to the superproject.
Unfortunately this fails if the <path> is not actually checked out any
longer, such as by removing the directory.
Fix this by checking if the directory we found is actually a gitdir. In
the case it is not, attempt to lookup the submodule configuration and
find the name of where it is stored in the .git/modules/ directory of
the superproject.
If we can't locate the submodule configuration, this might occur because
for example a submodule gitlink was added but the corresponding
.gitmodules file was not properly updated. A die() here would not be
pleasant to the users of submodule diff formats, so instead, modify
do_submodule_path() to return an error code:
- git_pathdup_submodule() returns NULL when we fail to find a path.
- strbuf_git_path_submodule() propagates the error code to the caller.
Modify the callers of these functions to check the error code and fail
properly. This ensures we don't attempt to use a bad path that doesn't
match the corresponding submodule.
Because this change fixes add_submodule_odb() to work even if the
submodule is not checked out, update the wording of the submodule log
diff format to correctly display that the submodule is "not initialized"
instead of "not checked out"
Add tests to ensure this change works as expected.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The API to iterate over all the refs (i.e. for_each_ref(), etc.)
has been revamped.
* mh/ref-iterators:
for_each_reflog(): reimplement using iterators
dir_iterator: new API for iterating over a directory tree
for_each_reflog(): don't abort for bad references
do_for_each_ref(): reimplement using reference iteration
refs: introduce an iterator interface
ref_resolves_to_object(): new function
entry_resolves_to_object(): rename function from ref_resolves_to_object()
get_ref_cache(): only create an instance if there is a submodule
remote rm: handle symbolic refs correctly
delete_refs(): add a flags argument
refs: use name "prefix" consistently
do_for_each_ref(): move docstring to the header file
refs: remove unnecessary "extern" keywords
Error handling in the codepaths that updates refs has been
improved.
* mh/update-ref-errors:
lock_ref_for_update(): avoid a symref resolution
lock_ref_for_update(): make error handling more uniform
t1404: add more tests of update-ref error handling
t1404: document function test_update_rejected
t1404: remove "prefix" argument to test_update_rejected
t1404: rename file to t1404-update-ref-errors.sh
Further preparatory work on the refs API before the pluggable
backend series can land.
* mh/split-under-lock: (33 commits)
lock_ref_sha1_basic(): only handle REF_NODEREF mode
commit_ref_update(): remove the flags parameter
lock_ref_for_update(): don't resolve symrefs
lock_ref_for_update(): don't re-read non-symbolic references
refs: resolve symbolic refs first
ref_transaction_update(): check refname_is_safe() at a minimum
unlock_ref(): move definition higher in the file
lock_ref_for_update(): new function
add_update(): initialize the whole ref_update
verify_refname_available(): adjust constness in declaration
refs: don't dereference on rename
refs: allow log-only updates
delete_branches(): use resolve_refdup()
ref_transaction_commit(): correctly report close_ref() failure
ref_transaction_create(): disallow recursive pruning
refs: make error messages more consistent
lock_ref_sha1_basic(): remove unneeded local variable
read_raw_ref(): move docstring to header file
read_raw_ref(): improve docstring
read_raw_ref(): rename symref argument to referent
...
Apply the set of semantic patches from contrib/coccinelle to convert
some leftover places using struct object_id's hash member to instead
use the wrapper functions that take struct object_id natively.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we're overwriting a symref with a SHA-1, we need to resolve the value
of the symref (1) to check against update->old_sha1 and (2) to write to
its reflog. However, we've already read the symref itself and know its
referent. So there is no need to read the symref's value through the
symref; we can read the referent directly.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To aid the effort, extract a new function, check_old_oid(), and use it
in the two places where the read value of the reference has to be
checked against update->old_sha1.
Update tests to reflect the improvements.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow references with reflogs to be iterated over using a ref_iterator.
The latter is implemented as a files_reflog_iterator, which in turn uses
dir_iterator to read the "logs" directory.
Note that reflog iteration doesn't correctly handle per-worktree
reflogs (either before or after this patch).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If there is a file under "$GIT_DIR/logs" with no corresponding
reference, the old code was emitting an error message, aborting the
reflog iteration, and returning -1. But
* None of the callers was checking the exit value
* The callers all want to find all legitimate reflogs (sometimes for the
purpose of determining object reachability!) and wouldn't benefit from
a truncated iteration anyway.
So instead, emit an error message and skip the "broken" reflog, but
continue with the iteration.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the reference iterator interface to implement do_for_each_ref().
Delete a bunch of code supporting the old for_each_ref() implementation.
And now that do_for_each_ref() is generic code (it is no longer tied to
the files backend), move it to refs.c.
The implementation is via a new function, do_for_each_ref_iterator(),
which takes a reference iterator as argument and calls a callback
function for each of the references in the iterator.
This change requires the current_ref performance hack for peel_ref() to
be implemented via ref_iterator_peel() rather than peel_entry() because
we don't have a ref_entry handy (it is hidden under three layers:
file_ref_iterator, merge_ref_iterator, and cache_ref_iterator). So:
* do_for_each_ref_iterator() records the active iterator in
current_ref_iter while it is running.
* peel_ref() checks whether current_ref_iter is pointing at the
requested reference. If so, it asks the iterator to peel the
reference (which it can do efficiently via its "peel" virtual
function). For extra safety, we do the optimization only if the
refname *addresses* are the same, not only if the refname *strings*
are the same, to forestall possible mixups between refnames that come
from different ref_iterators.
Please note that this optimization of peel_ref() is only available when
iterating via do_for_each_ref_iterator() (including all of the
for_each_ref() functions, which call it indirectly). It would be
complicated to implement a similar optimization when iterating directly
using a reference iterator, because multiple reference iterators can be
in use at the same time, with interleaved calls to
ref_iterator_advance(). (In fact we do exactly that in
merge_ref_iterator.)
But that is not necessary. peel_ref() is only called while iterating
over references. Callers who iterate using the for_each_ref() functions
benefit from the optimization described above. Callers who iterate using
reference iterators directly have access to the ref_iterator, so they
can call ref_iterator_peel() themselves to get an analogous optimization
in a more straightforward manner.
If we rewrite all callers to use the reference iteration API, then we
can remove the current_ref_iter hack permanently.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the API for iterating over references is via a family of
for_each_ref()-type functions that invoke a callback function for each
selected reference. All of these eventually call do_for_each_ref(),
which knows how to do one thing: iterate in parallel through two
ref_caches, one for loose and one for packed refs, giving loose
references precedence over packed refs. This is rather complicated code,
and is quite specialized to the files backend. It also requires callers
to encapsulate their work into a callback function, which often means
that they have to define and use a "cb_data" struct to manage their
context.
The current design is already bursting at the seams, and will become
even more awkward in the upcoming world of multiple reference storage
backends:
* Per-worktree vs. shared references are currently handled via a kludge
in git_path() rather than iterating over each part of the reference
namespace separately and merging the results. This kludge will cease
to work when we have multiple reference storage backends.
* The current scheme is inflexible. What if we sometimes want to bypass
the ref_cache, or use it only for packed or only for loose refs? What
if we want to store symbolic refs in one type of storage backend and
non-symbolic ones in another?
In the future, each reference backend will need to define its own way of
iterating over references. The crux of the problem with the current
design is that it is impossible to compose for_each_ref()-style
iterations, because the flow of control is owned by the for_each_ref()
function. There is nothing that a caller can do but iterate through all
references in a single burst, so there is no way for it to interleave
references from multiple backends and present the result to the rest of
the world as a single compound backend.
This commit introduces a new iteration primitive for references: a
ref_iterator. A ref_iterator is a polymorphic object that a reference
storage backend can be asked to instantiate. There are three functions
that can be applied to a ref_iterator:
* ref_iterator_advance(): move to the next reference in the iteration
* ref_iterator_abort(): end the iteration before it is exhausted
* ref_iterator_peel(): peel the reference currently being looked at
Iterating using a ref_iterator leaves the flow of control in the hands
of the caller, which means that ref_iterators from multiple
sources (e.g., loose and packed refs) can be composed and presented to
the world as a single compound ref_iterator.
It also means that the backend code for implementing reference iteration
will sometimes be more complicated. For example, the
cache_ref_iterator (which iterates over a ref_cache) can't use the C
stack to recurse; instead, it must manage its own stack internally as
explicit data structures. There is also a lot of boilerplate connected
with object-oriented programming in C.
Eventually, end-user callers will be able to be written in a more
natural way—managing their own flow of control rather than having to
work via callbacks. Since there will only be a few reference backends
but there are many consumers of this API, this is a good tradeoff.
More importantly, we gain composability, and especially the possibility
of writing interchangeable parts that can work with any ref_iterator.
For example, merge_ref_iterator implements a generic way of merging the
contents of any two ref_iterators. It is used to merge loose + packed
refs as part of the implementation of the files_ref_iterator. But it
will also be possible to use it to merge other pairs of reference
sources (e.g., per-worktree vs. shared refs).
Another example is prefix_ref_iterator, which can be used to trim a
prefix off the front of reference names before presenting them to the
caller (e.g., "refs/heads/master" -> "master").
In this patch, we introduce the iterator abstraction and many utilities,
and implement a reference iterator for the files ref storage backend.
(I've written several other obvious utilities, for example a generic way
to filter references being iterated over. These will probably be useful
in the future. But they are not needed for this patch series, so I am
not including them at this time.)
In a moment we will rewrite do_for_each_ref() to work via reference
iterators (allowing some special-purpose code to be discarded), and do
something similar for reflogs. In future patch series, we will expose
the ref_iterator abstraction in the public refs API so that callers can
use it directly.
Implementation note: I tried abstracting this a layer further to allow
generic iterators (over arbitrary types of objects) and generic
utilities like a generic merge_iterator. But the implementation in C was
very cumbersome, involving (in my opinion) too much boilerplate and too
much unsafe casting, some of which would have had to be done on the
caller side. However, I did put a few iterator-related constants in a
top-level header file, iterator.h, as they will be useful in a moment to
implement iteration over directory trees and possibly other types of
iterators in the future.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract new function ref_resolves_to_object() from
entry_resolves_to_object(). It can be used even if there is no ref_entry
at hand.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Free up the old name for a more general purpose.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If there is not a nonbare repository where a submodule is supposedly
located, then don't instantiate a ref_cache for it.
The analogous check can be removed from resolve_gitlink_ref().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will be useful for passing REF_NODEREF through.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the context of the for_each_ref() functions, call the prefix that
references must start with "prefix". (In some places it was called
"base".) This is clearer, and also prevents confusion with another
planned use of the word "base".
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now lock_ref_sha1_basic() is only called with flags==REF_NODEREF. So we
don't have to handle other cases anymore.
This enables several simplifications, the most interesting of which come
from the fact that ref_lock::orig_ref_name is now always the same as
ref_lock::ref_name:
* Remove ref_lock::orig_ref_name
* Remove local variable orig_refname from lock_ref_sha1_basic()
* ref_name can be initialize once and its value reused
* commit_ref_update() never has to write to the reflog for
lock->orig_ref_name
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
If a transaction includes a non-NODEREF update to a symbolic reference,
we don't have to look it up in lock_ref_for_update(). The reference will
be dereferenced anyway when the split-off update is processed.
This change requires that we store a backpointer from the split-off
update to its parent update, for two reasons:
* We still want to report the original reference name in error messages.
So if an error occurs when checking the split-off update's old_sha1,
walk the parent_update pointers back to find the original reference
name, and report that one.
* We still need to write the old_sha1 of the symref to its reflog. So
after we read the split-off update's reference value, walk the
parent_update pointers back and fill in their old_sha1 fields.
Aside from eliminating unnecessary reads, this change fixes a
subtle (though not very serious) race condition: in the old code, the
old_sha1 of the symref was resolved before the reference that it pointed
at was locked. So it was possible that the old_sha1 value logged to the
symref's reflog could be wrong if another process changed the downstream
reference before it was locked.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Before the previous patch, our first read of the reference happened
before the reference was locked, so we couldn't trust its value and had
to read it again. But now that our first read of the reference happens
after acquiring the lock, there is no need to read it a second time. So
move the read_ref_full() call into the (update->type & REF_ISSYMREF)
block.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Before committing ref updates, split symbolic ref updates into two
parts: an update to the underlying ref, and a log-only update to the
symbolic ref. This ensures that both references are locked correctly
during the transaction, including while their reflogs are updated.
Similarly, if the reference pointed to by HEAD is modified directly, add
a separate log-only update to HEAD, rather than leaving the job of
updating HEAD's reflog to commit_ref_update(). This change ensures that
HEAD is locked correctly while its reflog is being modified, as well as
being cheaper (HEAD only needs to be resolved once).
This makes use of a new function, lock_raw_ref(), which is analogous to
read_raw_ref(), but acquires a lock on the reference before reading it.
This change still has two problems:
* There are redundant read_ref_full() reference lookups.
* It is still possible to get incorrect reflogs for symbolic references
if there is a concurrent update by another process, since the old_oid
of a symref is determined before the lock on the pointed-to ref is
held.
Both problems will soon be fixed.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
WIP
When renaming refs, don't dereference either the origin or the destination
before renaming.
The origin does not need to be dereferenced because it is presently
forbidden to rename symbolic refs.
Not dereferencing the destination fixes a bug where renaming on top of
a broken symref would use the pointed-to ref name for the moved
reflog.
Add a test for the reflog bug.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
The refs infrastructure learns about log-only ref updates, which only
update the reflog. Later, we will use this to separate symbolic
reference resolution from ref updating.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
It is nonsensical (and a little bit dangerous) to use REF_ISPRUNING
without REF_NODEREF. Forbid it explicitly. Change the one REF_ISPRUNING
caller to pass REF_NODEREF too.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
* Always start error messages with a lower-case letter.
* Always enclose reference names in single quotes.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
resolve_ref_unsafe() can cope with being called with NULL passed to its
flags argument. So lock_ref_sha1_basic() can just hand its own type
parameter through.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Among other things, document the (important!) requirement that input
refname be checked for safety before calling this function.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
This will hopefully reduce confusion with the "flags" arguments that are
used in many functions in this module as an input parameter to choose
how the function should operate.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
These microoptimizations don't make a significant difference in speed.
And they cause problems if somebody ever wants to modify the function to
add updates to a transaction as part of processing it, as will happen
shortly.
Make the same changes in initial_ref_transaction_commit().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Even if there is an empty directory where we look for the loose version
of a reference, check for a packed reference before giving up. This
fixes the failing test that was introduced two commits ago.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>