@@ -10,7 +10,8 @@ bookmarks into a packfile.

There are two versions of the packfile index - version one, which is the default
in versions of Git earlier than 1.6, and version two, which is the default
- from 1.6 forward, but which can be read by Git versions going back to 1.5.2.
+ from 1.6 forward, but which can be read by Git versions going back to 1.5.2, and
+ has been further backported to 1.4.4.5 if you are still on the 1.4 series.

Version 2 also includes a CRC checksum of each object so compressed data
can be copied directly from pack to pack during repacking without
@@ -20,8 +21,15 @@ larger than 4 Gb.
[ fig: packfile-index ]

In both formats, the fanout table is simply a way to find the offset of a
- particular sha faster within the index file. In version 1, the offsets and
- shas are in the same space, where in version two, there are seperate tables
+ particular sha faster within the index file. The offset/sha1[] tables are
+ sorted by sha1[] values (this allows a binary search of the table), and the
+ fanout[] table points into the offset/sha1[] table in a specific way: entry N
+ records how many objects have a sha whose first byte is at most N, so the
+ part of the table that covers all hashes starting with a given byte can be
+ found directly, avoiding 8 iterations of the binary search.
+
+ In version 1, the offsets and shas are interleaved in a single table, whereas
+ in version 2 there are separate tables
for the shas, crc checksums and offsets. At the end of both files are
checksum shas for both the index file and the packfile it references.
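The fanout-plus-binary-search lookup described in this section can be sketched in Ruby. This is a minimal sketch, not Git's own code: it assumes the whole version 2 `.idx` file has been loaded as a binary String (for example with `File.binread`), and the names `IDX_V2_MAGIC`, `parse_idx_v2` and `find_offset` are hypothetical. The layout it assumes (a `\377tOc` magic number and version 2, the 256-entry fanout, then the sha, CRC and 4-byte offset tables, with a separate table of 8-byte offsets for entries whose 4-byte offset has its top bit set) follows Git's `pack-format` documentation.

```ruby
# Hypothetical helpers; +data+ must be a binary String (e.g. File.binread).
IDX_V2_MAGIC = "\xFFtOc".b

# Split a version 2 index into its tables (trailing checksums are ignored here).
def parse_idx_v2(data)
  magic, version = data.unpack("a4N")
  raise ArgumentError, "not a version 2 index" unless magic == IDX_V2_MAGIC && version == 2

  fanout = data.byteslice(8, 256 * 4).unpack("N256")
  n = fanout[255]                       # last fanout entry = total object count
  pos = 8 + 256 * 4

  shas = Array.new(n) { |i| data.byteslice(pos + 20 * i, 20) }
  pos += 20 * n
  crcs = data.byteslice(pos, 4 * n).unpack("N*")
  pos += 4 * n
  small = data.byteslice(pos, 4 * n).unpack("N*")
  pos += 4 * n

  # 4-byte entries with the top bit set index into a table of 8-byte offsets.
  n_large = small.count { |o| (o & 0x8000_0000) != 0 }
  large = data.byteslice(pos, 8 * n_large).unpack("Q>*")
  offsets = small.map { |o| (o & 0x8000_0000) != 0 ? large[o & 0x7fff_ffff] : o }

  { fanout: fanout, shas: shas, crcs: crcs, offsets: offsets }
end

# Narrow the search to shas sharing the first byte via the fanout table,
# then binary-search that slice. Returns the pack offset, or nil if absent.
def find_offset(idx, sha)
  first = sha.getbyte(0)
  lo = first.zero? ? 0 : idx[:fanout][first - 1]
  hi = idx[:fanout][first]              # exclusive upper bound
  while lo < hi
    mid = (lo + hi) / 2
    case idx[:shas][mid] <=> sha
    when 0 then return idx[:offsets][mid]
    when -1 then lo = mid + 1
    else hi = mid
    end
  end
  nil
end
```

Because `fanout[255]` is the total object count, the same table also tells a reader how many entries to expect in each of the tables that follow it.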
@@ -33,12 +41,25 @@ a pack. The packfile format is used in upload-pack and receive-pack programs

### The Packfile Format ###

- The packfile itself is a very simple format. The first four bytes is the
- string 'PACK', which is sort of used to make sure you're getting the start
- of the packfile correctly. After that, you get a series of packed objects,
+ The packfile itself is a very simple format. There is a header, a series of
+ packed objects (each with its own header and body) and then a checksum trailer.
+ The first four bytes are the string 'PACK', which serves as a signature so you
+ can be sure you are reading from the start of the packfile. This is followed
+ by a 4-byte packfile version number and then a 4-byte count of the entries in
+ the file, both stored in network (big-endian) byte order. In Ruby, you might
+ read the header data like this:
+
+ ```ruby
+ def read_pack_header
+   sig     = @session.recv(4)                 # should be "PACK"
+   ver     = @session.recv(4).unpack("N")[0]  # 32-bit big-endian version
+   entries = @session.recv(4).unpack("N")[0]  # 32-bit big-endian object count
+   [sig, ver, entries]
+ end
+ ```
+
+ After that, you get a series of packed objects,
which each consist of an object header and object contents. At the end
- of the packfile is a SHA1 sum of all the shas (in sorted order) in that
- packfile.
+ of the packfile is a 20-byte SHA-1 checksum of all of the preceding packfile
+ data.

[ fig: packfile-format ]
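A slightly more defensive variant of the header reader, plus a check of the trailer, might look like the following. It is a sketch under stated assumptions: `parse_pack_header`, `pack_trailer_valid?` and the `PackHeader` struct are hypothetical names; the reader takes any IO-like object (such as a `StringIO` or a `File` opened in binary mode) rather than the `@session` object; and the accepted versions (2 and 3) and the trailer definition (SHA-1 over every byte that precedes it) come from Git's `pack-format` documentation.

```ruby
require "digest/sha1"
require "stringio"

# Hypothetical container for the two numbers that follow the signature.
PackHeader = Struct.new(:version, :entries)

# Read and validate the 12-byte packfile header from an IO-like object.
def parse_pack_header(io)
  sig = io.read(4)
  raise ArgumentError, "missing PACK signature" unless sig == "PACK"
  rest = io.read(8)
  raise ArgumentError, "truncated header" unless rest && rest.bytesize == 8
  version, entries = rest.unpack("NN")          # both 32-bit big-endian
  raise ArgumentError, "unsupported version #{version}" unless [2, 3].include?(version)
  PackHeader.new(version, entries)
end

# The last 20 bytes of a packfile are the SHA-1 of every byte before them.
# +data+ must be a binary String (e.g. File.binread).
def pack_trailer_valid?(data)
  return false if data.bytesize < 12 + 20       # header plus trailer at minimum
  body = data.byteslice(0, data.bytesize - 20)
  Digest::SHA1.digest(body) == data.byteslice(-20, 20)
end
```

For a file on disk, `parse_pack_header(File.open(path, "rb"))` would be one way to call it. Reading with `IO#read` rather than `recv` matters for sockets: `recv(4)` may return fewer than four bytes, whereas `read(4)` waits until four bytes arrive or the stream ends.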
@@ -64,4 +85,19 @@ It is important to note that the size specified in the header data is not
the size of the data that actually follows, but the size of that data *when
expanded*. This is why the offsets in the packfile index are so useful;
otherwise you have to expand every object just to tell when the next header
- starts.
+ starts.
+
+ The data part is just a zlib stream for non-delta object types; for the two
+ delta object representations, the data portion contains something that
+ identifies which base object this delta representation depends on, followed
+ by the delta to apply to the base object to reconstruct this object.
+ <code>ref-delta</code> stores the 20-byte hash of the base object at the
+ beginning of the data, while <code>ofs-delta</code> stores an offset within
+ the same packfile to identify the base object. In either case, a
+ reimplementor must adhere to two important constraints:
+
+ * the delta representation must be based on some other object within the
+   same packfile;
+
+ * the base object must be of the same underlying type (blob, tree, commit
+   or tag).
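The object header, the inflate step that finds where an object ends, and the <code>ofs-delta</code> base offset can be sketched as follows. This is a minimal sketch that assumes the packfile is held in a binary String; `OBJ_TYPES`, `read_object_header`, `read_ofs_delta_offset` and `inflate_at` are hypothetical names, and the code relies on `Zlib::Inflate#total_in` reporting how many compressed bytes were consumed. The encodings follow Git's `pack-format` documentation: the first header byte carries a continuation bit, a 3-bit type and the low 4 bits of the expanded size, and each further byte adds 7 more size bits; the <code>ofs-delta</code> offset is a big-endian 7-bit varint whose accumulated value is incremented before each shift, and the base object starts that many bytes *before* the delta's own header.

```ruby
require "zlib"

# Object type numbers from Git's pack-format documentation (5 is unused).
OBJ_TYPES = { 1 => :commit, 2 => :tree, 3 => :blob, 4 => :tag,
              6 => :ofs_delta, 7 => :ref_delta }.freeze

# Parse the variable-length object header starting at byte +pos+.
# Returns [type, expanded_size, position_of_data].
def read_object_header(pack, pos)
  byte = pack.getbyte(pos)
  pos += 1
  type = OBJ_TYPES.fetch((byte >> 4) & 0x7)   # bits 6-4: object type
  size = byte & 0x0f                          # bits 3-0: low 4 size bits
  shift = 4
  while (byte & 0x80) != 0                    # MSB set: another size byte follows
    byte = pack.getbyte(pos)
    pos += 1
    size |= (byte & 0x7f) << shift
    shift += 7
  end
  [type, size, pos]
end

# Decode an ofs-delta base offset starting at +pos+.
# The base object begins this many bytes before the delta's header.
def read_ofs_delta_offset(pack, pos)
  byte = pack.getbyte(pos)
  pos += 1
  offset = byte & 0x7f
  while (byte & 0x80) != 0
    byte = pack.getbyte(pos)
    pos += 1
    offset = ((offset + 1) << 7) | (byte & 0x7f)
  end
  [offset, pos]
end

# Inflate the zlib stream starting at +pos+. The header only records the
# expanded size, so inflating is how we learn where the stream ends.
def inflate_at(pack, pos)
  zi = Zlib::Inflate.new
  data = zi.inflate(pack.byteslice(pos, pack.bytesize - pos))
  consumed = zi.total_in                      # compressed bytes actually used
  zi.close
  [data, pos + consumed]
end
```

For a <code>ref-delta</code>, a reader would take the 20 bytes right after the header instead of calling `read_ofs_delta_offset`; in both delta cases, the zlib stream that follows holds the delta, not the full object.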