mirror of https://github.com/git/git.git synced 2024-11-17 22:44:49 +01:00

23 commits

Author SHA1 Message Date
Nicolas Pitre
abeb40e5aa improve reliability of fixup_pack_header_footer()
Currently, this function has the potential to read corrupted pack data
from disk and give it a valid SHA1 checksum.  Let's add the ability to
validate SHA1 checksum of existing data along the way, including before
and after any arbitrary point in the pack.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-29 21:51:27 -07:00
Nicolas Pitre
c41a4a9468 verify-pack: check packed object CRC when using index version 2
To do so, check_pack_crc() moved from builtin-pack-objects.c to
pack-check.c where it is more logical to share.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 23:58:57 -07:00
Nicolas Pitre
77d3ecee85 move show_pack_info() where it belongs
This is called when verify_pack() has its verbose argument set, and
verbose in this context makes sense only for the actual 'git verify-pack'
command.  Therefore let's move show_pack_info() to builtin-verify-pack.c
instead and remove useless verbose argument from verify_pack().

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 23:58:57 -07:00
Junio C Hamano
265ae18826 Merge branch 'np/progress'
* np/progress:
  Show total transferred as part of throughput progress
  make sure throughput display gets updated even if progress doesn't move
  return the prune-packed progress display to the inner loop
  add throughput display to git-push
  add some copyright notice to the progress display code
  add throughput display to index-pack
  add throughput to progress display
  relax usage of the progress API
  make struct progress an opaque type
  prune-packed: don't call display_progress() for every file
  Stop displaying "Pack pack-$ID created." during git-gc
  Teach prune-packed to use the standard progress meter
  Change 'Deltifying objects' to 'Compressing objects'
  fix for more minor memory leaks
  fix const issues with some functions
  pack-objects.c: fix some global variable abuse and memory leaks
  pack-objects: no delta possible with only one object in the list
  cope with multiple line breaks within sideband progress messages
  more compact progress display
2007-11-02 16:27:37 -07:00
Nicolas Pitre
4049b9cfc0 fix const issues with some functions
Two functions, namely write_idx_file() and open_pack_file(), currently
return a const pointer.  However that pointer is either a copy of the
first argument, or set to a malloc'd buffer when that first argument
is null.  In the latter case it is wrong to qualify that pointer as const
since ownership of the buffer is transferred to the caller to dispose of,
and obviously the free() function is not meant to be passed const
pointers.

Making the return pointer not const causes a warning when the first
argument is returned since that argument is also marked const.

The correct thing to do is therefore to remove the const qualifiers,
avoiding the need for ugly casts only to silence some warnings.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-17 02:54:57 -04:00
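
The ownership problem described in 4049b9cfc0 can be illustrated with a minimal sketch. The function name pick_idx_name() and the fallback file name are hypothetical and chosen only for illustration; this is not the code of write_idx_file() or open_pack_file(). It shows why the return type cannot sensibly be const when the result may be a malloc'd buffer the caller has to free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not git's write_idx_file()): returns the name the
 * caller supplied, or a freshly malloc'd default name when given NULL.
 * The return type is plain char * because the caller may have to free() it.
 */
char *pick_idx_name(char *name)
{
    static const char fallback[] = "pack-sketch.idx";
    char *buf;

    if (name)
        return name;            /* caller keeps ownership of its own buffer */

    buf = malloc(sizeof(fallback));
    assert(buf != NULL);        /* sketch only: real code would report the error */
    memcpy(buf, fallback, sizeof(fallback));
    return buf;                 /* ownership passes to the caller */
}
```

A caller that passed NULL is expected to free() the result; a caller that passed its own buffer gets the same pointer back and must not free it twice.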
Shawn O. Pearce
106764e651 Refactor index-pack "keep $sha1" handling for reuse
There is a subtle (but important) linkage between receive-pack and
index-pack that allows index-pack to create a packfile but protect
it from being deleted by a concurrent `git repack -a -d` operation.
The linkage works by having index-pack mark the newly created pack
with a ".keep" file and then it passes the SHA-1 name of that new
packfile to receive-pack along its stdout channel.

The receive-pack process must unkeep the packfile by deleting the
.keep file, but it can only do so after all eligible refs have
been updated in the receiving repository.  This ensures that the
packfile is either kept or its objects are reachable, preventing
a concurrent repacker from deleting the packfile before it can
determine that its objects are actually needed by the repository.

The new builtin-fetch code needs to perform the same actions if
it chooses to run index-pack rather than unpack-objects, so I am
moving this code out to its own function where both receive-pack
and fetch-pack are able to invoke it when necessary.  The caller
is responsible for deleting the returned ".keep" and freeing the
path if the returned path is not NULL.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-19 03:22:30 -07:00
Geert Bosch
aa7e44bf57 Unify write_index_file functions
This patch unifies the write_index_file functions in
builtin-pack-objects.c and index-pack.c.  As the name
"index" is overloaded in git, move in the direction of
using "idx" and "pack idx" when referring to the pack index.
There should be no change in functionality.

Signed-off-by: Geert Bosch <bosch@gnat.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-06-02 13:14:18 -07:00
Dana L. How
8b0eca7c7b Create pack-write.c for common pack writing code
Include a generalized fixup_pack_header_footer() in this new file.
Needed by git-repack --max-pack-size feature in a later patchset.

[sp: Moved close(pack_fd) to callers, to support index-pack, and
     changed name to better indicate it is for packfiles.]

Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-05-02 13:24:18 -04:00
Nicolas Pitre
4287307833 [PATCH] clean up pack index handling a bit
Especially with the new index format to come, it is more appropriate
to encapsulate more into check_packed_git_idx() and assume less of the
index format in struct packed_git.

To that effect, the index_base is renamed to index_data with void * type
so it is not used directly but other pointers initialized with it. This
allows for a couple pointer cast removal, as well as providing a better
generic name to grep for when adding support for new index versions or
formats.

And index_data is declared const too while at it.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-16 21:27:36 -07:00
Junio C Hamano
a69e542989 Refactor the pack header reading function out of receive-pack.c
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-24 18:08:02 -08:00
Simon 'corecode' Schubert
bb79103194 Use fixed-size integers for the on-disk pack structure.
Plain integer types without a fixed size can vary between platforms.  Even
though all common platforms use 32-bit ints, there is no guarantee that
this won't change at some point.  Furthermore, specifying an integer type
with explicit size makes the definition of structures more obvious.

Signed-off-by: Simon 'corecode' Schubert <corecode@fs.ei.tum.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-18 14:11:50 -08:00
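
As a rough illustration of what bb79103194 is about, the sketch below declares a pack header with fixed-width fields and parses it from a byte buffer. It assumes the commonly documented layout (a "PACK" signature, then a version and an object count, each a 32-bit value in network byte order). The names pack_header_sketch, read_be32() and parse_pack_header() are invented for this example and are not git's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* On-disk pack header: three 32-bit fields, independent of the host's int size. */
struct pack_header_sketch {
    uint32_t signature;   /* the four bytes "PACK" */
    uint32_t version;     /* pack format version */
    uint32_t entries;     /* number of objects in the pack */
};

/* Read a big-endian 32-bit value without relying on host byte order. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or lacks the signature. */
int parse_pack_header(const unsigned char *buf, size_t len,
                      struct pack_header_sketch *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 12 || memcmp(buf, "PACK", 4) != 0)
        return -1;
    out->signature = read_be32(buf);
    out->version = read_be32(buf + 4);
    out->entries = read_be32(buf + 8);
    return 0;
}
```

Parsing byte by byte, rather than casting the buffer to the struct, also sidesteps alignment and endianness assumptions; that is a design choice of this sketch, not something the commit message states.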
Shawn O. Pearce
df1b059d8d Document pack .idx file format upgrade strategy.
Way back when Junio developed the 64 bit index topic he came up
with a means of changing the .idx file format so that older Git
clients would recognize that they don't understand the file and
refuse to read it, while newer clients could tell the difference
between the old-style and new-style .idx files.  Unfortunately
this wasn't recorded anywhere.

This change documents how we might go about changing the .idx
file format by using a special signature in the first four bytes.
Credit (and possible blame) goes completely to Junio for thinking
up this technique.

The change also modifies the error message of the current Git code
so that users get a recommendation to upgrade their Git software
should this version or later encounter a new-style .idx which it
cannot process.  We already do this for the .pack files, but since
we usually process the .idx files first it's important that these
files are recognized and encourage an upgrade.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-17 20:51:45 -08:00
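
The signature technique from df1b059d8d can be sketched as follows. The sketch assumes the magic bytes later used by version 2 index files ("\377tOc", followed by a 32-bit big-endian version number). An old-style .idx begins directly with its fan-out table, so a file that lacks the magic is treated as legacy. The enum and the function classify_idx() are hypothetical names for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum idx_kind { IDX_LEGACY, IDX_VERSIONED, IDX_TOO_SHORT };

/*
 * Classify an .idx buffer by its first four bytes.  When the signature is
 * present, the following 32-bit big-endian word is stored in *version so the
 * caller can refuse versions it does not understand.
 */
enum idx_kind classify_idx(const unsigned char *buf, size_t len,
                           uint32_t *version)
{
    static const unsigned char magic[4] = { 0xff, 't', 'O', 'c' };

    assert(buf != NULL && version != NULL);
    if (len < 8)
        return IDX_TOO_SHORT;
    if (memcmp(buf, magic, 4) != 0)
        return IDX_LEGACY;      /* starts with the fan-out table instead */
    *version = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
               ((uint32_t)buf[6] << 8) | (uint32_t)buf[7];
    return IDX_VERSIONED;
}
```

A reader that gets IDX_VERSIONED with an unknown version would be the natural place to print the "please upgrade" recommendation the commit message mentions.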
Junio C Hamano
05eb811aa1 Merge branch 'np/pack'
* np/pack:
  add the capability for index-pack to read from a stream
  index-pack: compare only the first 20-bytes of the key.
  git-repack: repo.usedeltabaseoffset
  pack-objects: document --delta-base-offset option
  allow delta data reuse even if base object is a preferred base
  zap a debug remnant
  let the GIT native protocol use offsets to delta base when possible
  make pack data reuse compatible with both delta types
  make git-pack-objects able to create deltas with offset to base
  teach git-index-pack about deltas with offset to base
  teach git-unpack-objects about deltas with offset to base
  introduce delta objects with offset to base
2006-10-22 22:51:42 -07:00
Junio C Hamano
29f049a0c2 Revert "move pack creation to version 3"
This reverts commit 16854571aa.
Git versions as recent as v1.1.6 do not understand version 3 delta.

v1.2.0 is Ok and I personally would say it is old enough, but
the improvement between version 2 and version 3 delta is not
big enough to justify breaking older clients.

We should resurrect this later, but when we do so we should
make it conditional.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-14 23:38:01 -07:00
Nicolas Pitre
780e6e735b make pack data reuse compatible with both delta types
This is the missing part to git-pack-objects allowing it to reuse delta
data to/from any of the two delta types.  It can reuse delta from any
type, and it outputs base offsets when --allow-delta-base-offset is
provided and the base is also included in the pack.  Otherwise it
outputs base sha1 references just like it always did.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27 00:12:00 -07:00
Nicolas Pitre
16854571aa move pack creation to version 3
It's been quite a while now that GIT is able to read version 3 packs.
Let's create them at last.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-22 19:24:52 -07:00
Linus Torvalds
1974632c66 Remove TYPE_* constant macros and use object_type enums consistently.
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.

Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumeration.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-12 23:18:03 -07:00
Junio C Hamano
a49dd05fd0 pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs.  If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.

Also, notice if what we are going to write out exactly matches
what is already in an existing pack (either deltified or just
compressed).  In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.

Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:

    $ git-rev-list --objects v2.6.16-rc3 >RL
    $ wc -l RL
    184141 RL
    $ time git-pack-objects p <RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2

    real    12m4.323s
    user    11m2.560s
    sys     0m55.950s

With this patch, the same input:

    $ time ../git.junio/git-pack-objects q <RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects.....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
    Total 184141, written 184141, reused 182441

    real    1m2.608s
    user    0m55.090s
    sys     0m1.830s

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 02:11:38 -08:00
Nicolas Pitre
d60fc1c864 remove delta-against-self bit
After experimenting with code to add the ability to encode a delta
against part of the deltified file, it turns out that resulting packs
are _bigger_ than when this ability is not used.  The raw delta output
might be smaller, but it doesn't compress as well using gzip with a
negative net saving on average.

Said bit would in fact be more useful to allow for encoding the copying
of chunks larger than 64KB providing more savings with large files.
This will correspond to packs version 3.

While the current code still produces packs version 2, it is made future
proof so pack versions 2 and 3 are accepted.  All version 2 packs are
compatible with version 3 since the redefined bit was never used before.
When enough time has passed, code to use that bit to produce version 3
packs could be added.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-09 21:06:38 -08:00
Junio C Hamano
f3bf922409 [PATCH] verify-pack updates.
Nico pointed out that having verify_pack.c and verify-pack.c was
confusing.  Rename verify_pack.c to pack-check.c as suggested,
and enhance the verification done quite a bit.

 - Built-in sha1_file unpacking knows that a base object of a
   deltified object _must_ be in the same pack, and takes
   advantage of that fact.

 - Earlier verify-pack command only checked the SHA1 sum for the
   entire pack file and did not look into its contents.  It now
   checks that everything the idx file claims to have unpacks correctly.

 - It now has a hook to give more detailed information for
   objects contained in the pack under -v flag.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-30 22:33:47 -07:00
Junio C Hamano
f9253394a2 [PATCH] Add git-verify-pack command.
Given a list of <pack>.idx files, this command validates the
index file and the corresponding .pack file for consistency.

This patch also uses the same validation mechanism in fsck-cache
when the --full flag is used.

During normal operation, sha1_file.c verifies that a given .idx
file matches the .pack file by comparing the SHA1 checksum
stored in .idx file and .pack file as a minimum sanity check.
We may further want to check the pack signature and version when
we map the pack, but that would be a separate patch.

Earlier, errors mapping a pack file were not flagged fatal but led
to a random fatal error later.  This version explicitly die()s
when such an error is detected.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-29 09:11:39 -07:00
Linus Torvalds
01247d8742 Make git pack files use little-endian size encoding
This makes it match the new delta encoding, and admittedly makes the
code easier to follow.

This also updates the PACK file version to 2, since this (and the delta
encoding change in the previous commit) are incompatible with the old
format.
2005-06-28 22:15:57 -07:00
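
A minimal sketch of the little-endian size encoding mentioned in 01247d8742, assuming the per-object header layout as it is commonly documented today: the first byte carries a continuation flag in bit 7, the object type in bits 6-4 and the lowest four size bits in bits 3-0; each following byte carries a continuation flag and the next seven size bits, least significant group first. The function name decode_object_header() is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a pack object header from buf.  Returns the number of bytes
 * consumed, or 0 if the buffer ends early or the size would overflow 64 bits.
 */
size_t decode_object_header(const unsigned char *buf, size_t len,
                            unsigned *type, uint64_t *size)
{
    size_t used = 0;
    unsigned shift = 4;         /* the first byte already supplied 4 size bits */
    unsigned char c;

    assert(buf != NULL && type != NULL && size != NULL);
    if (len == 0)
        return 0;
    c = buf[used++];
    *type = (c >> 4) & 7;
    *size = c & 0x0f;
    while (c & 0x80) {          /* bit 7 set: another size byte follows */
        if (used >= len || shift > 57)
            return 0;
        c = buf[used++];
        *size |= (uint64_t)(c & 0x7f) << shift;
        shift += 7;
    }
    return used;
}
```

For example, the two bytes 0x95 0x0a decode to type 1 with size 165: the low nibble 5 from the first byte, plus 0x0a shifted left by four bits.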
Linus Torvalds
a733cb606f Change pack file format. Hopefully for the last time.
This also adds a header with a signature, version info, and the number
of objects to the pack file.  It also encodes the file length and type
more efficiently.
2005-06-28 14:21:02 -07:00