This also adds a header with a signature, version info, and the number
of objects to the pack file. It also encodes the file length and type
more efficiently.
This lets us eliminate one use of map_sha1_file() outside
sha1_file.c, to bring us one step closer to the packed GIT.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Packed delta files created by git-pack-objects seems to be the
way to go, and existing "delta" object handling code has exposed
the object representation details to too many places. Remove it
while we refactor code to come up with a proper interface in
sha1_file.c.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Also, make the writing of the SHA1 as a end-header be conditional: not
every user will necessarily want to write the SHA1 to the file itself,
even though current users do (but we migh end up using the same helper
functions for the object files themselves, that don't do this).
This also makes the packed index file contain the SHA1 of the packed
data file at the end (just before its own SHA1). That way you can
validate the pairing of the two if you want to.
We want to be able to check their integrity later, and putting the
sha1-sum of the contents at the end is a good thing. The writing
routines are generic, so we could try to re-use them for the index file,
instead of having the same logic duplicated.
Update unpack-objects to know about the extra 20 bytes at the end
of the index.
Starting from big objects and going backwards means that we end up
picking a delta that goes from a bigger object to a smaller one. That's
advantageous for two reasons: the bigger object is likely the newer one
(since things tend to grow, rather than shrink), and doing a delete
tends to be smaller than doing an add.
So the deltas don't tend to be top-of-tree, and the packed end result is
just slightly smaller.
This actually successfully packed and unpacked a git archive down to
1.3MB (17MB unpacked).
Right now unpacking is way too noisy, lots of debug messages left.
This finishes the initial round of git-pack-object /
git-unpack-object pair. They are now good enough to be used as
a transport medium:
- Fix delta direction in pack-objects; the original was
computing delta to create the base object from the object to
be squashed, which was quite unfriendly for unpacker ;-).
- Add a script to test the very basics.
- Implement unpacker for both regular and deltified objects.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A zero disables delta generation (like before), but we make the window
be one bigger than specified, since we use one entry for the one to be
tested (it used to be that "--window=1" was meaningless, since we'd have
used up the single-entry window with the entry to be tested, and had no
chance of actually ever finding a delta).
The default window remains at 10, but now it really means "test the 10
closest objects", not "test the 9 closest objects".
Anything that generates a delta to see if two objects are close usually
isn't interested in the delta ends up being bigger than some specified
size, and this allows us to stop delta generation early when that
happens.
When Junio fixed the lack of a successful error code from try_delta(),
that uncovered an off-by-one error in the caller.
Also, some testing made it clear that we now find a lot more deltas,
because we used to (incorrectly) break early on bogus "failure"
cases.
Return value of try_delta is checked for negativeness, but the
success path does not return anything, letting compiler warn and
presumably return garbage.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is kind of like a tar-ball for a set of objects, ready to be
shipped off to another end. Alternatively, you could use is as a packed
representation of the object database directly, if you changed
"read_sha1_file()" to read these kinds of packs.
The latter is partiularly useful to generate a "packed history", ie you
could pack up your old history efficiently, but still have it available
(at a performance hit, of course).
I haven't actually written an unpacker yet, so the end result has not
been verified in any way yet. I obviously always write bug-free code,
so it just has to work, no?