| 1 | +## The Packfile ##
| 2 | + |
| 3 | +This chapter explains in detail, down to the bits, how the packfile and |
| 4 | +pack index files are formatted. |
| 5 | + |
| 6 | +### The Packfile Index ### |
| 7 | + |
| 8 | +First off, we have the packfile index, which is basically just a series of |
| 9 | +bookmarks into a packfile. |
| 10 | + |
| 11 | +There are two versions of the packfile index: version 1, which is the default
| 12 | +in versions of Git earlier than 1.6, and version 2, which is the default
| 13 | +from 1.6 forward but which can be read by Git versions going back to 1.5.2.
| 14 | + |
| 15 | +Version 2 also includes a CRC32 checksum of each object, so compressed data
| 16 | +can be copied directly from pack to pack during repacking without letting
| 17 | +data corruption go undetected. Version 2 indexes can also handle packfiles
| 18 | +larger than 4 GB.
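As a rough illustration of why the per-object CRC matters, here is a minimal Python sketch that checks an object's raw packed bytes against a stored CRC32 before reusing them, without decompressing anything. It assumes, as Git's `index-pack` does, that the checksum covers the object's entire packed entry (header plus compressed data), and that the entry's end offset is already known, for example from the next object's offset. `copy_packed_object` is a hypothetical helper, not a Git API.

```python
import zlib

def copy_packed_object(pack: bytes, start: int, end: int, expected_crc: int) -> bytes:
    """Return one object's raw packed bytes, verified against its index CRC32.

    Hypothetical helper: the bytes are never inflated. A matching CRC is what
    makes it reasonably safe to copy them verbatim into another pack.
    """
    raw = pack[start:end]  # object header + compressed data, exactly as stored
    if zlib.crc32(raw) != expected_crc:
        raise ValueError(f"CRC mismatch at offset {start}: refusing to copy corrupt data")
    return raw
```

Without the CRC, the only way to detect corruption would be to inflate the object and recompute its SHA-1, which is exactly the work that direct pack-to-pack copying tries to avoid.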
| 19 | + |
| 20 | +[fig:packfile-index] |
| 21 | + |
| 22 | +In both formats, the fanout table is simply a way to find the offset of a
| 23 | +particular SHA-1 faster within the index file, by narrowing the search to the
| 24 | +entries that share its first byte. In version 1, offsets and SHA-1s are stored
| 25 | +together in one table, whereas version 2 has separate tables for the SHA-1s, CRC
| 26 | +checksums and offsets. Both formats end with a SHA-1 of the packfile the index references, followed by a SHA-1 of the index file itself.
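To make the fanout lookup concrete, here is a minimal Python sketch of finding an object's pack offset in a version 2 index. It assumes the layout from Git's pack-format documentation: the 4-byte magic `\377tOc` and a 4-byte version, 256 big-endian fanout entries, the sorted 20-byte SHA-1s, the CRC32 table, the 4-byte offset table, and then an 8-byte table used when an offset's most significant bit is set. The name `find_offset_v2` is hypothetical.

```python
import struct

V2_MAGIC = b"\xfftOc"  # "\377tOc" marks a version 2 index

def find_offset_v2(idx: bytes, sha: bytes):
    """Look up a 20-byte binary SHA-1 in a version 2 index; return its offset or None."""
    if idx[:4] != V2_MAGIC or struct.unpack_from(">I", idx, 4)[0] != 2:
        raise ValueError("not a version 2 pack index")
    fanout = struct.unpack_from(">256I", idx, 8)
    n = fanout[255]                                 # last fanout entry = object count
    names_start = 8 + 256 * 4                       # sorted SHA-1s follow the fanout
    offsets_start = names_start + 20 * n + 4 * n    # skip SHA-1 and CRC32 tables
    large_start = offsets_start + 4 * n             # 8-byte offsets for large packs

    # fanout[b] counts objects whose first byte is <= b, so the objects that
    # start with sha[0] occupy the half-open range [lo, hi) of the sorted tables.
    lo = fanout[sha[0] - 1] if sha[0] > 0 else 0
    hi = fanout[sha[0]]
    while lo < hi:                                  # binary search inside that range
        mid = (lo + hi) // 2
        name = idx[names_start + 20 * mid : names_start + 20 * mid + 20]
        if name < sha:
            lo = mid + 1
        elif name > sha:
            hi = mid
        else:
            (off,) = struct.unpack_from(">I", idx, offsets_start + 4 * mid)
            if off & 0x80000000:                    # MSB set: index into 8-byte table
                (off,) = struct.unpack_from(">Q", idx, large_start + 8 * (off & 0x7FFFFFFF))
            return off
    return None
```

The fanout table turns a search over every object into a search over roughly 1/256 of them, and the binary search then finishes the job in a handful of comparisons.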
| 27 | + |
| 28 | +Importantly, packfile indexes are *not* necessary to extract objects from
| 29 | +a packfile; they are simply used to *quickly* retrieve individual objects from
| 30 | +a pack. The packfile format is used by the upload-pack and receive-pack programs
| 31 | +(the fetch and push protocols, respectively) to transfer objects, and no index is
| 32 | +used then; one can be built after the fact by scanning the packfile.
| 33 | + |
| 34 | +### The Packfile Format ### |
| 35 | + |
| 36 | +The packfile itself is a very simple format. The first four bytes are the
| 37 | +string 'PACK', a signature that confirms you are reading the start of a
| 38 | +packfile. It is followed by a 4-byte version number and a 4-byte count of
| 39 | +the objects in the pack. After that, you get a series of packed objects,
| 40 | +each consisting of an object header and object contents. At the end of the
| 41 | +packfile is a SHA-1 checksum of all of the pack data that precedes it.
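Here is a minimal Python sketch of reading the start and end of a packfile. It assumes the layout given in Git's pack-format documentation: the 'PACK' signature, a 4-byte big-endian version (Git accepts 2 or 3), a 4-byte big-endian object count, and a 20-byte SHA-1 trailer computed over all of the preceding pack data. `read_pack_header` is a hypothetical name.

```python
import hashlib
import struct

def read_pack_header(pack: bytes) -> tuple[int, int]:
    """Parse the 12-byte pack header, verify the trailer, return (version, count)."""
    if len(pack) < 12 + 20:
        raise ValueError("too short to be a packfile")
    signature, version, count = struct.unpack_from(">4sII", pack, 0)
    if signature != b"PACK":
        raise ValueError("missing 'PACK' signature")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    # The last 20 bytes should be the SHA-1 of everything before them.
    body, trailer = pack[:-20], pack[-20:]
    if hashlib.sha1(body).digest() != trailer:
        raise ValueError("pack checksum mismatch")
    return version, count
```

For example, the smallest possible pack is just the 12-byte header for zero objects followed by its own SHA-1, and `read_pack_header` returns `(2, 0)` for it.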
| 42 | + |
| 43 | +[fig:packfile-format] |
| 44 | + |
| 45 | +The object header is a series of one or more 1-byte (8-bit) hunks that
| 46 | +specify the type of object the following data is, and the size of the data
| 47 | +when expanded. The first (most significant) bit of each byte says whether
| 48 | +that hunk is the last one before the data starts: if it is a 1, you will read
| 49 | +another byte, otherwise the data starts next. In the first byte, the next 3
| 50 | +bits specify the type of data, according to the table below, and the low 4
| 51 | +bits hold the lowest bits of the size; each following byte adds 7 higher bits.
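The bit layout above can be sketched as a small Python decoder. The type codes in the dictionary (1 commit, 2 tree, 3 blob, 4 tag, 6 ofs-delta, 7 ref-delta) follow Git's pack-format documentation; the function name and the returned tuple shape are hypothetical choices for this sketch.

```python
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag", 6: "ofs_delta", 7: "ref_delta"}

def decode_object_header(data: bytes, pos: int = 0):
    """Decode a packed object header at data[pos].

    Returns (type_name, expanded_size, position_of_first_data_byte).
    """
    byte = data[pos]
    pos += 1
    type_code = (byte >> 4) & 0x07   # the 3 bits after the continuation bit
    size = byte & 0x0F               # low 4 bits: lowest bits of the size
    shift = 4
    while byte & 0x80:               # continuation bit set: another byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift   # each later byte supplies 7 higher bits
        shift += 7
    if type_code not in OBJ_TYPES:
        raise ValueError(f"invalid object type {type_code}")
    return OBJ_TYPES[type_code], size, pos
```

A single byte such as `0x35` (binary `0 011 0101`) decodes to a 5-byte blob, with the data starting immediately after it.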
| 52 | + |
| 53 | +(Currently, of the 8 values that can be expressed |
| 54 | +with 3 bits (0-7), 0 (000) is 'undefined' and 5 (101) is not yet used.) |
| 55 | + |
| 56 | +Here, we can see an example of a two-byte header, where the type bits of the
| 57 | +first byte specify that the following data is a commit, and the remaining 4 bits
| 58 | +of the first byte plus the low 7 bits of the second specify that the data will be
| 59 | +144 bytes when expanded.
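Working the example through: 144 is `1001 0000` in binary. Its low 4 bits (`0000`) go into the first byte after the continuation bit (1) and the commit type (`001`), giving `1001 0000` = `0x90`. The remaining bits, 144 >> 4 = 9, go into the second byte with the continuation bit clear, giving `0000 1001` = `0x09`. A minimal Python encoder, assuming that same bit layout (`encode_object_header` is hypothetical and covers only the four base object types), reproduces those two bytes:

```python
TYPE_CODES = {"commit": 1, "tree": 2, "blob": 3, "tag": 4}

def encode_object_header(type_name: str, size: int) -> bytes:
    """Encode a packed object header for one of the four base object types."""
    out = bytearray()
    byte = (TYPE_CODES[type_name] << 4) | (size & 0x0F)  # type bits + low 4 size bits
    size >>= 4
    while size:                      # more size bits remain, so set the continuation bit
        out.append(byte | 0x80)
        byte = size & 0x7F           # next 7 size bits
        size >>= 7
    out.append(byte)                 # final byte: continuation bit clear
    return bytes(out)

print(encode_object_header("commit", 144).hex())  # prints "9009"
```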
| 60 | + |
| 61 | +[fig:packfile-logic] |
| 62 | + |
| 63 | +It is important to note that the size specified in the header is not
| 64 | +the size of the (zlib-compressed) data that actually follows, but the size
| 65 | +of that data *when expanded*. This is why the offsets in the packfile index
| 66 | +are so useful: without them, you would have to expand every object just to
| 67 | +tell where the next header starts.
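To see the cost concretely, here is a minimal Python sketch of walking a packfile with no index, which is roughly what rebuilding an index by scanning has to do. It assumes, per Git's pack-format documentation, that the delta types 6 (ofs-delta) and 7 (ref-delta) carry a variable-length base offset or a 20-byte base object name, respectively, between the header and the compressed data. `walk_pack` and `inflate_at` are hypothetical helpers; the sketch recovers offsets only and does not resolve deltas or compute object SHA-1s.

```python
import struct
import zlib

def inflate_at(data: bytes, pos: int):
    """Inflate the zlib stream starting at data[pos]; return (expanded, end_pos)."""
    d = zlib.decompressobj()
    out = bytearray()
    while not d.eof:
        chunk = data[pos:pos + 4096]
        if not chunk:
            raise ValueError("truncated zlib stream")
        out += d.decompress(chunk)
        pos += len(chunk)
    # Bytes fed past the end of the stream come back as unused_data; step back over them.
    return bytes(out), pos - len(d.unused_data)

def walk_pack(pack: bytes):
    """Yield (offset, type_code, expanded_size) for every object in the pack."""
    signature, _version, count = struct.unpack_from(">4sII", pack, 0)
    if signature != b"PACK":
        raise ValueError("not a packfile")
    pos = 12
    for _ in range(count):
        offset = pos
        byte = pack[pos]
        pos += 1
        type_code = (byte >> 4) & 0x07
        size = byte & 0x0F
        shift = 4
        while byte & 0x80:
            byte = pack[pos]
            pos += 1
            size |= (byte & 0x7F) << shift
            shift += 7
        if type_code == 6:      # ofs-delta: skip the variable-length base offset
            while pack[pos] & 0x80:
                pos += 1
            pos += 1
        elif type_code == 7:    # ref-delta: skip the 20-byte base object name
            pos += 20
        # The header only gives the expanded size, so the only way to find the
        # next header is to inflate this object and see where its stream ends.
        expanded, pos = inflate_at(pack, pos)
        if len(expanded) != size:
            raise ValueError(f"size mismatch for object at offset {offset}")
        yield offset, type_code, size
```

Feeding zlib fixed-size chunks keeps each step bounded, and the `unused_data` attribute is what reveals where the compressed stream really ended. Every object still has to be fully inflated, which is exactly the work a stored index offset lets you skip.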