1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-10-31 06:17:56 +01:00
Commit graph

933 commits

Author SHA1 Message Date
Stefan Beller
90c62155d6 repository: introduce raw object store field
The raw object store field will contain any objects needed for access
to objects in a given repository.

This patch introduces the raw object store and populates it with the
`objectdir`, which used to be part of the repository struct.

As the struct gains members, we'll also populate the function to clear
the memory for these members.

In a later step, we'll introduce a struct object_parser, that will
complement the object parsing in a repository struct: The raw object
parser is the layer that will provide access to raw object content,
while the higher level object parser code will parse raw objects and
keeps track of parenthood and other object relationships using 'struct
object'.  For now only add the lower level to the repository struct.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-23 11:06:01 -07:00
Nguyễn Thái Ngọc Duy
7bc0dcaa61 sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
getenv() is supposed to work on the main repository only. This delayed
getenv() code in sha1_file.c makes it more difficult to convert
sha1_file.c to a generic object store that could be used by both
submodule and main repositories.

Move the getenv() back in setup_git_env() where other env vars are
also fetched.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05 11:14:03 -08:00
Junio C Hamano
9238941618 Merge branch 'cc/sha1-file-name'
Code clean-up.

* cc/sha1-file-name:
  sha1_file: improve sha1_file_name() perfs
  sha1_file: remove static strbuf from sha1_file_name()
2018-02-13 13:39:10 -08:00
Junio C Hamano
9bc89b17e3 Merge branch 'tb/crlf-conv-flags'
Code clean-up.

* tb/crlf-conv-flags:
  convert_to_git(): safe_crlf/checksafe becomes int conv_flags
2018-02-13 13:39:08 -08:00
Junio C Hamano
867622398f Merge branch 'gs/retire-mru'
Retire mru API as it does not give enough abstraction over
underlying list API to be worth it.

* gs/retire-mru:
  mru: Replace mru.[ch] with list.h implementation
2018-02-13 13:39:06 -08:00
Junio C Hamano
f3d618d2bf Merge branch 'jh/fsck-promisors'
In preparation for implementing narrow/partial clone, the machinery
for checking object connectivity used by gc and fsck has been
taught that a missing object is OK when it is referenced by a
packfile specially marked as coming from trusted repository that
promises to make them available on-demand and lazily.

* jh/fsck-promisors:
  gc: do not repack promisor packfiles
  rev-list: support termination at promisor objects
  sha1_file: support lazily fetching missing objects
  introduce fetch-object: fetch one promisor object
  index-pack: refactor writing of .keep files
  fsck: support promisor objects as CLI argument
  fsck: support referenced promisor objects
  fsck: support refs pointing to promisor objects
  fsck: introduce partialclone extension
  extension.partialclone: introduce partial clone extension
2018-02-13 13:39:03 -08:00
Gargi Sharma
ec2dd32c70 mru: Replace mru.[ch] with list.h implementation
Replace the custom calls to mru.[ch] with calls to list.h. This patch is
the final step in removing the mru API completely and inlining the logic.
This patch leads to significant code reduction and the mru API hence, is
not a useful abstraction anymore.

Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-24 09:52:16 -08:00
Christian Couder
3449847168 sha1_file: improve sha1_file_name() perfs
As sha1_file_name() could be performance sensitive, let's
make it faster by using strbuf_addstr() and strbuf_addc()
instead of strbuf_addf().

Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19 13:21:49 -08:00
Christian Couder
ea6577303f sha1_file: remove static strbuf from sha1_file_name()
Using a static buffer in sha1_file_name() is error prone
and the performance improvements it gives are not needed
in many of the callers.

So let's get rid of this static buffer and, if necessary
or helpful, let's use one in the caller.

Suggested-by: Jeff Hostetler <git@jeffhostetler.com>
Helped-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-17 12:21:32 -08:00
Torsten Bögershausen
8462ff43e4 convert_to_git(): safe_crlf/checksafe becomes int conv_flags
When calling convert_to_git(), the checksafe parameter defined what
should happen if the EOL conversion (CRLF --> LF --> CRLF) does not
roundtrip cleanly. In addition, it also defined if line endings should
be renormalized (CRLF --> LF) or kept as they are.

checksafe was an safe_crlf enum with these values:
SAFE_CRLF_FALSE:       do nothing in case of EOL roundtrip errors
SAFE_CRLF_FAIL:        die in case of EOL roundtrip errors
SAFE_CRLF_WARN:        print a warning in case of EOL roundtrip errors
SAFE_CRLF_RENORMALIZE: change CRLF to LF
SAFE_CRLF_KEEP_CRLF:   keep all line endings as they are

In some cases the integer value 0 was passed as checksafe parameter
instead of the correct enum value SAFE_CRLF_FALSE. That was no problem
because SAFE_CRLF_FALSE is defined as 0.

FALSE/FAIL/WARN are different from RENORMALIZE and KEEP_CRLF. Therefore,
an enum is not ideal. Let's use a integer bit pattern instead and rename
the parameter to conv_flags to make it more generically usable. This
allows us to extend the bit pattern in a subsequent commit.

Reported-By: Randall S. Becker <rsbecker@nexbridge.com>
Helped-By: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-16 12:35:56 -08:00
Junio C Hamano
97e1f857fc Merge branch 'ds/for-each-file-in-obj-micro-optim'
The code to iterate over loose object files got optimized.

* ds/for-each-file-in-obj-micro-optim:
  sha1_file: use strbuf_add() instead of strbuf_addf()
2017-12-13 13:28:57 -08:00
Junio C Hamano
721cc4314c Merge branch 'bc/hash-algo'
An infrastructure to define what hash function is used in Git is
introduced, and an effort to plumb that throughout various
codepaths has been started.

* bc/hash-algo:
  repository: fix a sparse 'using integer as NULL pointer' warning
  Switch empty tree and blob lookups to use hash abstraction
  Integrate hash algorithm support with repo setup
  Add structure representing hash algorithm
  setup: expose enumerated repo info
2017-12-13 13:28:54 -08:00
Jonathan Tan
8b4c0103a9 sha1_file: support lazily fetching missing objects
Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
     regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
     about the loose/packed distinction and operate on both differently,
     and ensure that they can handle the concept of objects that are
     neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
     - this searches a single pack that is provided as an argument; the
       caller already knows (through other means) that the sought object
       is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
     it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
     object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
     not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:52:42 -08:00
Junio C Hamano
79bafd23a8 Merge branch 'jk/fewer-pack-rescan'
Internaly we use 0{40} as a placeholder object name to signal the
codepath that there is no such object (e.g. the fast-forward check
while "git fetch" stores a new remote-tracking ref says "we know
there is no 'old' thing pointed at by the ref, as we are creating
it anew" by passing 0{40} for the 'old' side), and expect that a
codepath to locate an in-core object to return NULL as a sign that
the object does not exist.  A look-up for an object that does not
exist however is quite costly with a repository with large number
of packfiles.  This access pattern has been optimized.

* jk/fewer-pack-rescan:
  sha1_file: fast-path null sha1 as a missing object
  everything_local: use "quick" object existence check
  p5551: add a script to test fetch pack-dir rescans
  t/perf/lib-pack: use fast-import checkpoint to create packs
  p5550: factor out nonsense-pack creation
2017-12-06 09:23:42 -08:00
Derrick Stolee
163ee5e635 sha1_file: use strbuf_add() instead of strbuf_addf()
Replace use of strbuf_addf() with strbuf_add() when enumerating
loose objects in for_each_file_in_obj_subdir(). Since we already
check the length and hex-values of the string before consuming
the path, we can prevent extra computation by using the lower-
level method.

One consumer of for_each_file_in_obj_subdir() is the abbreviation
code. OID abbreviations use a cached list of loose objects (per
object subdirectory) to make repeated queries fast, but there is
significant cache load time when there are many loose objects.

Most repositories do not have many loose objects before repacking,
but in the GVFS case the repos can grow to have millions of loose
objects. Profiling 'git log' performance in GitForWindows on a
GVFS-enabled repo with ~2.5 million loose objects revealed 12% of
the CPU time was spent in strbuf_addf().

Add a new performance test to p4211-line-log.sh that is more
sensitive to this cache-loading. By limiting to 1000 commits, we
more closely resemble user wait time when reading history into a
pager.

For a copy of the Linux repo with two ~512 MB packfiles and ~572K
loose objects, running 'git log --oneline --parents --raw -1000'
had the following performance:

 HEAD~1            HEAD
----------------------------------------
 7.70(7.15+0.54)   7.44(7.09+0.29) -3.4%

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-04 10:38:55 -08:00
Junio C Hamano
af6e0fe3a5 Merge branch 'tb/add-renormalize'
"git add --renormalize ." is a new and safer way to record the fact
that you are correcting the end-of-line convention and other
"convert_to_git()" glitches in the in-repository data.

* tb/add-renormalize:
  add: introduce "--renormalize"
2017-11-27 11:06:37 +09:00
Jeff King
87b5e236a1 sha1_file: fast-path null sha1 as a missing object
In theory nobody should ever ask the low-level object code
for a null sha1. It's used as a sentinel for "no such
object" in lots of places, so leaking through to this level
is a sign that the higher-level code is not being careful
about its error-checking.  In practice, though, quite a few
code paths seem to rely on the null sha1 lookup failing as a
way to quietly propagate non-existence (e.g., by feeding it
to lookup_commit_reference_gently(), which then returns
NULL).

When this happens, we do two inefficient things:

  1. We actually search for the null sha1 in packs and in
     the loose object directory.

  2. When we fail to find it, we re-scan the pack directory
     in case a simultaneous repack happened to move it from
     loose to packed. This can be very expensive if you have
     a large number of packs.

Only the second one actually causes noticeable performance
problems, so we could treat them independently. But for the
sake of simplicity (both of code and of reasoning about it),
it makes sense to just declare that the null sha1 cannot be
a real on-disk object, and looking it up will always return
"no such object".

There's no real loss of functionality to do so Its use as a
sentinel value means that anybody who is unlucky enough to
hit the 2^-160th chance of generating an object with that
sha1 is already going to find the object largely unusable.

In an ideal world, we'd simply fix all of the callers to
notice the null sha1 and avoid passing it to us. But a
simple experiment to catch this with a BUG() shows that
there are a large number of code paths that do so.

So in the meantime, let's fix the performance problem by
taking a fast exit from the object lookup when we see a null
sha1. p5551 shows off the improvement (when a fetched ref is
new, the "old" sha1 is 0{40}, which ends up being passed for
fast-forward checks, the status table abbreviations, etc):

  Test            HEAD^             HEAD
  --------------------------------------------------------
  5551.4: fetch   5.51(5.03+0.48)   0.17(0.10+0.06) -96.9%

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22 10:50:11 +09:00
Junio C Hamano
5a80d1dd9c Merge branch 'jk/info-alternates-fix' into maint
We used to add an empty alternate object database to the system
that does not help anything; it has been corrected.

* jk/info-alternates-fix:
  link_alt_odb_entries: make empty input a noop
2017-11-21 14:05:31 +09:00
Torsten Bögershausen
9472935d81 add: introduce "--renormalize"
Make it safer to normalize the line endings in a repository.
Files that had been commited with CRLF will be commited with LF.

The old way to normalize a repo was like this:

 # Make sure that there are not untracked files
 $ echo "* text=auto" >.gitattributes
 $ git read-tree --empty
 $ git add .
 $ git commit -m "Introduce end-of-line normalization"

The user must make sure that there are no untracked files,
otherwise they would have been added and tracked from now on.

The new "add --renormalize" does not add untracked files:

 $ echo "* text=auto" >.gitattributes
 $ git add --renormalize .
 $ git commit -m "Introduce end-of-line normalization"

Note that "git add --renormalize <pathspec>" is the short form for
"git add -u --renormalize <pathspec>".

While at it, document that the same renormalization may be needed,
whenever a clean filter is added or changed.

Helped-By: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-17 10:31:05 +09:00
Junio C Hamano
26a45eac80 Merge branch 'jk/info-alternates-fix'
We used to add an empty alternate object database to the system
that does not help anything; it has been corrected.

* jk/info-alternates-fix:
  link_alt_odb_entries: make empty input a noop
2017-11-15 12:14:36 +09:00
Jeff King
f28e36686a link_alt_odb_entries: make empty input a noop
If an empty string is passed to link_alt_odb_entries(), our
loop finds no entries and we link nothing. But we still do
some preparatory work to normalize the object directory
path, even though we'll never look at the result. This
triggers in basically every git process, since we feed the
usually-empty ALTERNATE_DB_ENVIRONMENT to the function.

Let's detect early that there's nothing to do and return.
While we're at it, let's treat NULL the same as an empty
string as a favor to our callers. That saves
prepare_alt_odb() from having to cover this case.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-13 14:05:27 +09:00
brian m. carlson
f50e766b7b Add structure representing hash algorithm
Since in the future we want to support an additional hash algorithm, add
a structure that represents a hash algorithm and all the data that must
go along with it.  Add a constant to allow easy enumeration of hash
algorithms.  Implement function typedefs to create an abstract API that
can be used by any hash algorithm, and wrappers for the existing SHA1
functions that conform to this API.

Expose a value for hex size as well as binary size.  While one will
always be twice the other, the two values are both used extremely
commonly throughout the codebase and providing both leads to improved
readability.

Don't include an entry in the hash algorithm structure for the null
object ID.  As this value is all zeros, any suitably sized all-zero
object ID can be used, and there's no need to store a given one on a
per-hash basis.

The current hash function transition plan envisions a time when we will
accept input from the user that might be in SHA-1 or in the NewHash
format.  Since we cannot know which the user has provided, add a
constant representing the unknown algorithm to allow us to indicate that
we must look the correct value up.  Provide dummy API functions that die
in this case.

Finally, include git-compat-util.h in hash.h so that the required types
are available.  This aids people using automated tools their editors.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-13 13:20:44 +09:00
Junio C Hamano
bde1370010 Merge branch 'rs/hex-to-bytes-cleanup'
Code cleanup.

* rs/hex-to-bytes-cleanup:
  sha1_file: use hex_to_bytes()
  http-push: use hex_to_bytes()
  notes: move hex_to_bytes() to hex.c and export it
2017-11-09 14:31:27 +09:00
Junio C Hamano
e7e456f500 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (25 commits)
  refs/files-backend: convert static functions to object_id
  refs: convert read_raw_ref backends to struct object_id
  refs: convert peel_object to struct object_id
  refs: convert resolve_ref_unsafe to struct object_id
  worktree: convert struct worktree to object_id
  refs: convert resolve_gitlink_ref to struct object_id
  Convert remaining callers of resolve_gitlink_ref to object_id
  sha1_file: convert index_path and index_fd to struct object_id
  refs: convert reflog_expire parameter to struct object_id
  refs: convert read_ref_at to struct object_id
  refs: convert peel_ref to struct object_id
  builtin/pack-objects: convert to struct object_id
  pack-bitmap: convert traverse_bitmap_commit_list to object_id
  refs: convert dwim_log to struct object_id
  builtin/reflog: convert remaining unsigned char uses to object_id
  refs: convert dwim_ref and expand_ref to struct object_id
  refs: convert read_ref and read_ref_full to object_id
  refs: convert resolve_refdup and refs_resolve_refdup to struct object_id
  Convert check_connected to use struct object_id
  refs: update ref transactions to use struct object_id
  ...
2017-11-06 14:24:27 +09:00
Junio C Hamano
0b646bcac9 Merge branch 'ma/lockfile-fixes'
An earlier update made it possible to use an on-stack in-core
lockfile structure (as opposed to having to deliberately leak an
on-heap one).  Many codepaths have been updated to take advantage
of this new facility.

* ma/lockfile-fixes:
  read_cache: roll back lock in `update_index_if_able()`
  read-cache: leave lock in right state in `write_locked_index()`
  read-cache: drop explicit `CLOSE_LOCK`-flag
  cache.h: document `write_locked_index()`
  apply: remove `newfd` from `struct apply_state`
  apply: move lockfile into `apply_state`
  cache-tree: simplify locking logic
  checkout-index: simplify locking logic
  tempfile: fix documentation on `delete_tempfile()`
  lockfile: fix documentation on `close_lock_file_gently()`
  treewide: prefer lockfiles on the stack
  sha1_file: do not leak `lock_file`
2017-11-06 13:11:21 +09:00
René Scharfe
62a24c8923 sha1_file: use hex_to_bytes()
The path of a loose object contains its hash value encoded into two
substrings of 2 and 38 hexadecimal digits separated by a slash.  The
first part is handed to for_each_file_in_obj_subdir() in decoded form as
subdir_nr.  The current code builds a full hexadecimal representation of
the hash in a temporary buffer, then uses get_oid_hex() to decode it.

Avoid the intermediate step by taking subdir_nr as-is and using
hex_to_bytes() directly on the second substring.  That's shorter and
easier.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 10:35:40 +09:00
Junio C Hamano
95c1a79630 Merge branch 'jk/info-alternates-fix' into maint
A regression fix for 2.11 that made the code to read the list of
alternate object stores overrun the end of the string.

* jk/info-alternates-fix:
  read_info_alternates: warn on non-trivial errors
  read_info_alternates: read contents into strbuf
2017-10-23 14:40:00 +09:00
Junio C Hamano
96c6bb566e Merge branch 'jk/write-in-full-fix' into maint
Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.

* jk/write-in-full-fix:
  read_pack_header: handle signed/unsigned comparison in read result
  config: flip return value of store_write_*()
  notes-merge: use ssize_t for write_in_full() return value
  pkt-line: check write_in_full() errors against "< 0"
  convert less-trivial versions of "write_in_full() != len"
  avoid "write_in_full(fd, buf, len) != len" pattern
  get-tar-commit-id: check write_in_full() return against 0
  config: avoid "write_in_full(fd, buf, len) < len" pattern
2017-10-23 14:37:22 +09:00
Junio C Hamano
eeed979e6a Merge branch 'jk/sha1-loose-object-info-fix' into maint
Leakfix and futureproofing.

* jk/sha1-loose-object-info-fix:
  sha1_loose_object_info: handle errors from unpack_sha1_rest
2017-10-18 14:19:14 +09:00
Junio C Hamano
7c9375db0e Merge branch 'jk/drop-sha1-entry-pos' into maint
Code clean-up.

* jk/drop-sha1-entry-pos:
  sha1-lookup: remove sha1_entry_pos() from header file
  sha1_file: drop experimental GIT_USE_LOOKUP search
2017-10-18 14:19:06 +09:00
brian m. carlson
a98e6101f0 refs: convert resolve_gitlink_ref to struct object_id
Convert the declaration and definition of resolve_gitlink_ref to use
struct object_id and apply the following semantic patch:

@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3.hash)
+ resolve_gitlink_ref(E1, E2, &E3)

@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3->hash)
+ resolve_gitlink_ref(E1, E2, E3)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16 11:05:51 +09:00
brian m. carlson
bcd2986473 sha1_file: convert index_path and index_fd to struct object_id
Convert these two functions and the functions that underlie them to take
pointers to struct object_id.  This is a prerequisite to convert
resolve_gitlink_ref.  Fix a stray tab in the middle of the index_mem
call in index_pipe by converting it to a space.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16 11:05:51 +09:00
Junio C Hamano
40abbe4306 Merge branch 'jk/sha1-loose-object-info-fix'
Leakfix and futureproofing.

* jk/sha1-loose-object-info-fix:
  sha1_loose_object_info: handle errors from unpack_sha1_rest
2017-10-11 14:52:22 +09:00
Jeff King
b3ea7dd32d sha1_loose_object_info: handle errors from unpack_sha1_rest
When a caller of sha1_object_info_extended() sets the
"contentp" field in object_info, we call unpack_sha1_rest()
but do not check whether it signaled an error.

This causes two problems:

  1. We pass back NULL to the caller via the contentp field,
     but the function returns "0" for success. A caller
     might reasonably expect after a successful return that
     it can access contentp without a NULL check and
     segfault.

     As it happens, this is impossible to trigger in the
     current code. There is exactly one caller which uses
     contentp, read_object(). And the only thing it does
     after a successful call is to return the content
     pointer to its caller, using NULL as a sentinel for
     errors. So in effect it converts the success code from
     sha1_object_info_extended() back into an error!

     But this is still worth addressing avoid problems for
     future users of "contentp".

  2. Callers of unpack_sha1_rest() are expected to close the
     zlib stream themselves on error. Which means that we're
     leaking the stream.

The problem in (1) comes from from c84a1f3ed4 (sha1_file:
refactor read_object, 2017-06-21), which added the contentp
field.  Before that, we called unpack_sha1_rest() via
unpack_sha1_file(), which directly used the NULL to signal
an error.

But note that the leak in (2) is actually older than that.
The original unpack_sha1_file() directly returned the result
of unpack_sha1_rest() to its caller, when it should have
been closing the zlib stream itself on error.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 13:04:41 +09:00
Martin Ågren
f132a127ee sha1_file: do not leak lock_file
There is no longer any need to allocate and leak a `struct lock_file`.
Initialize it on the stack instead.

Before this patch, we set `lock = NULL` to signal that we have already
rolled back, and that we should not do any more work. We need to take
another approach now that we cannot assign NULL. We could, e.g., use
`is_lock_file_locked()`. But we already have another variable that we
could use instead, `found`. Its scope is only too small.

Bump `found` to the scope of the whole function and rearrange the "roll
back or write?"-checks to a straightforward if-else on `found`. This
also future-proves the code by making it obvious that we intend to take
exactly one of these paths.

Improved-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 10:07:10 +09:00
Junio C Hamano
cb1083ca23 Merge branch 'jk/read-in-full'
Code clean-up to prevent future mistakes by copying and pasting
code that checks the result of read_in_full() function.

* jk/read-in-full:
  worktree: check the result of read_in_full()
  worktree: use xsize_t to access file size
  distinguish error versus short read from read_in_full()
  avoid looking at errno for short read_in_full() returns
  prefer "!=" when checking read_in_full() result
  notes-merge: drop dead zero-write code
  files-backend: prefer "0" for write_in_full() error check
2017-10-03 15:42:49 +09:00
Jeff King
90dca6710e avoid looking at errno for short read_in_full() returns
When a caller tries to read a particular set of bytes via
read_in_full(), there are three possible outcomes:

  1. An error, in which case -1 is returned and errno is
     set.

  2. A short read, in which fewer bytes are returned and
     errno is unspecified (we never saw a read error, so we
     may have some random value from whatever syscall failed
     last).

  3. The full read completed successfully.

Many callers handle cases 1 and 2 together by just checking
the result against the requested size. If their combined
error path looks at errno (e.g., by calling die_errno), they
may report a nonsense value.

Let's fix these sites by having them distinguish between the
two error cases. That avoids the random errno confusion, and
lets us give more detailed error messages.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-27 15:45:24 +09:00
Junio C Hamano
f759c873a3 Merge branch 'jk/info-alternates-fix'
A regression fix for 2.11 that made the code to read the list of
alternate object stores overrun the end of the string.

* jk/info-alternates-fix:
  read_info_alternates: warn on non-trivial errors
  read_info_alternates: read contents into strbuf
2017-09-25 15:24:09 +09:00
Junio C Hamano
c50424a6f0 Merge branch 'jk/write-in-full-fix'
Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.

* jk/write-in-full-fix:
  read_pack_header: handle signed/unsigned comparison in read result
  config: flip return value of store_write_*()
  notes-merge: use ssize_t for write_in_full() return value
  pkt-line: check write_in_full() errors against "< 0"
  convert less-trivial versions of "write_in_full() != len"
  avoid "write_in_full(fd, buf, len) != len" pattern
  get-tar-commit-id: check write_in_full() return against 0
  config: avoid "write_in_full(fd, buf, len) < len" pattern
2017-09-25 15:24:06 +09:00
Jeff King
f0f7bebef7 read_info_alternates: warn on non-trivial errors
When we fail to open $GIT_DIR/info/alternates, we silently
assume there are no alternates. This is the right thing to
do for ENOENT, but not for other errors.

A hard error is probably overkill here. If we fail to read
an alternates file then either we'll complete our operation
anyway, or we'll fail to find some needed object. Either
way, a warning is good idea. And we already have a helper
function to handle this pattern; let's just call
warn_on_fopen_error().

Note that technically the errno from strbuf_read_file()
might be from a read() error, not open(). But since read()
would never return ENOENT or ENOTDIR, and since it produces
a generic "unable to access" error, it's suitable for
handling errors from either.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-20 11:33:29 +09:00
Junio C Hamano
0db625f5d6 Merge branch 'jk/info-alternates-fix-2.11' into jk/info-alternates-fix
* jk/info-alternates-fix-2.11:
  read_info_alternates: read contents into strbuf
2017-09-20 11:33:06 +09:00
Jeff King
dc732bd5cb read_info_alternates: read contents into strbuf
This patch fixes a regression in v2.11.1 where we might read
past the end of an mmap'd buffer. It was introduced in
cf3c635210.

The link_alt_odb_entries() function has always taken a
ptr/len pair as input. Until cf3c635210 (alternates: accept
double-quoted paths, 2016-12-12), we made a copy of those
bytes in a string. But after that commit, we switched to
parsing the input left-to-right, and we ignore "len"
totally, instead reading until we hit a NUL.

This has mostly gone unnoticed for a few reasons:

  1. All but one caller passes a NUL-terminated string, with
     "len" pointing to the NUL.

  2. The remaining caller, read_info_alternates(), passes in
     an mmap'd file. Unless the file is an exact multiple of
     the page size, it will generally be followed by NUL
     padding to the end of the page, which just works.

The easiest way to demonstrate the problem is to build with:

  make SANITIZE=address NO_MMAP=Nope test

Any test which involves $GIT_DIR/info/alternates will fail,
as the mmap emulation (correctly) does not add an extra NUL,
and ASAN complains about reading past the end of the buffer.

One solution would be to teach link_alt_odb_entries() to
respect "len". But it's actually a bit tricky, since we
depend on unquote_c_style() under the hood, and it has no
ptr/len variant.

We could also just make a NUL-terminated copy of the input
bytes and operate on that. But since all but one caller
already is passing a string, instead let's just fix that
caller to provide NUL-terminated input in the first place,
by swapping out mmap for strbuf_read_file().

There's no advantage to using mmap on the alternates file.
It's not expected to be large (and anyway, we're copying its
contents into an in-memory linked list). Nor is using
git_open() buying us anything here, since we don't keep the
descriptor open for a long period of time.

Let's also drop the "len" parameter entirely from
link_alt_odb_entries(), since it's completely ignored. That
will avoid any new callers re-introducing a similar bug.

Reported-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-20 11:32:04 +09:00
Junio C Hamano
d811ba1897 Merge branch 'rs/strbuf-leakfix'
Many leaks of strbuf have been fixed.

* rs/strbuf-leakfix: (34 commits)
  wt-status: release strbuf after use in wt_longstatus_print_tracking()
  wt-status: release strbuf after use in read_rebase_todolist()
  vcs-svn: release strbuf after use in end_revision()
  utf8: release strbuf on error return in strbuf_utf8_replace()
  userdiff: release strbuf after use in userdiff_get_textconv()
  transport-helper: release strbuf after use in process_connect_service()
  sequencer: release strbuf after use in save_head()
  shortlog: release strbuf after use in insert_one_record()
  sha1_file: release strbuf on error return in index_path()
  send-pack: release strbuf on error return in send_pack()
  remote: release strbuf after use in set_url()
  remote: release strbuf after use in migrate_file()
  remote: release strbuf after use in read_remote_branches()
  refs: release strbuf on error return in write_pseudoref()
  notes: release strbuf after use in notes_copy_from_stdin()
  merge: release strbuf after use in write_merge_heads()
  merge: release strbuf after use in save_state()
  mailinfo: release strbuf on error return in handle_boundary()
  mailinfo: release strbuf after use in handle_from()
  help: release strbuf on error return in exec_woman_emacs()
  ...
2017-09-19 10:47:57 +09:00
Jeff King
f48ecd38cb read_pack_header: handle signed/unsigned comparison in read result
The result of read_in_full() may be -1 if we saw an error.
But in comparing it to a sizeof() result, that "-1" will be
promoted to size_t. In fact, the largest possible size_t
which is much bigger than our struct size. This means that
our "< sizeof(header)" error check won't trigger.

In practice, we'd go on to read uninitialized memory and
compare it to the PACK signature, which is likely to fail.
But we shouldn't get there.

We can fix this by making a direct "!=" comparison to the
requested size, rather than "<". This means that errors get
lumped in with short reads, but that's sufficient for our
purposes here. There's no PH_ERROR tp represent our case.
And anyway, this function reads from pipes and network
sockets. A network error may racily appear as EOF to us
anyway if there's data left in the socket buffers.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-14 15:18:00 +09:00
Junio C Hamano
f04f860dfa Merge branch 'sb/sha1-file-cleanup' into maint
Code clean-up.

* sb/sha1-file-cleanup:
  sha1_file: make read_info_alternates static
2017-09-10 17:03:04 +09:00
Junio C Hamano
c580ce194f Merge branch 'rs/find-pack-entry-bisection' into maint
Code clean-up.

* rs/find-pack-entry-bisection:
  sha1_file: avoid comparison if no packed hash matches the first byte
2017-09-10 17:03:02 +09:00
Junio C Hamano
438776e3d4 Merge branch 'rs/unpack-entry-leakfix' into maint
Memory leak in an error codepath has been plugged.

* rs/unpack-entry-leakfix:
  sha1_file: release delta_stack on error in unpack_entry()
2017-09-10 17:02:53 +09:00
Rene Scharfe
ea8e029785 sha1_file: release strbuf on error return in index_path()
strbuf_readlink() already frees the buffer for us on error.  Clean up
if write_sha1_file() fails as well instead of returning early.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-07 08:49:28 +09:00
Junio C Hamano
8b36f0b196 Merge branch 'po/read-graft-line'
Conversion from uchar[20] to struct object_id continues; this is to
ensure that we do not assume sizeof(struct object_id) is the same
as the length of SHA-1 hash (or length of longest hash we support).

* po/read-graft-line:
  commit: rewrite read_graft_line
  commit: allocate array using object_id size
  commit: replace the raw buffer with strbuf in read_graft_line
  sha1_file: fix definition of null_sha1
2017-09-06 13:11:25 +09:00
Junio C Hamano
eabdcd4ab4 Merge branch 'jt/packmigrate'
Code movement to make it easier to hack later.

* jt/packmigrate: (23 commits)
  pack: move for_each_packed_object()
  pack: move has_pack_index()
  pack: move has_sha1_pack()
  pack: move find_pack_entry() and make it global
  pack: move find_sha1_pack()
  pack: move find_pack_entry_one(), is_pack_valid()
  pack: move check_pack_index_ptr(), nth_packed_object_offset()
  pack: move nth_packed_object_{sha1,oid}
  pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry()
  pack: move unpack_object_header()
  pack: move get_size_from_delta()
  pack: move unpack_object_header_buffer()
  pack: move {,re}prepare_packed_git and approximate_object_count
  pack: move install_packed_git()
  pack: move add_packed_git()
  pack: move unuse_pack()
  pack: move use_pack()
  pack: move pack-closing functions
  pack: move release_pack_memory()
  pack: move open_pack_index(), parse_pack_index()
  ...
2017-08-26 22:55:09 -07:00