coderChenMing
diff --git a/assets/images/figure/packfile-format.png b/assets/images/figure/packfile-format.png
62.6 KB
diff --git a/assets/images/figure/packfile-index.png b/assets/images/figure/packfile-index.png
105 KB
diff --git a/assets/images/figure/packfile-logic.png b/assets/images/figure/packfile-logic.png
39.3 KB
diff --git a/script/html.rb b/script/html.rb
Lines changed: 4 additions & 5 deletions
diff --git a/text/48_How_Git_Stores_Objects/0_ How_Git_Stores_Objects.markdown b/text/48_How_Git_Stores_Objects/0_ How_Git_Stores_Objects.markdown
Lines changed: 75 additions & 0 deletions
diff --git a/text/52_The_Packfile/0_The_Packfile.markdown b/text/52_The_Packfile/0_The_Packfile.markdown
Lines changed: 67 additions & 0 deletions
diff --git a/text/52_Working_With_Packfiles/0_ Working_With_Packfiles.markdown b/text/52_Working_With_Packfiles/0_ Working_With_Packfiles.markdown
Lines changed: 0 additions & 1 deletion
@@ -8,11 +8,10 @@
 def do_replacements(html, type = :html)
 
   # highlight code
-  #html = html.gsub /<pre><code>.*?<\/code><\/pre>/m do |code|
-  #  code = code.gsub('<pre><code>', '').gsub('</code></pre>', '').gsub('&lt;', '<').gsub('&gt;', '>').gsub('&amp;', '&')
-  #  Uv.parse(code, "xhtml", "ruby", false, "mac_classic")
-  #end
-  
+  html = html.gsub /<pre><code>ruby.*?<\/code><\/pre>/m do |code|
+    code = code.gsub('<pre><code>ruby', '').gsub('</code></pre>', '').gsub('&lt;', '<').gsub('&gt;', '>').gsub('&amp;', '&')
+    Uv.parse(code, "xhtml", "ruby", false, "mac_classic")
+  end
 
   # replace gitlinks
   html.gsub! /linkgit:(.*?)\[\d\]/ do |code, waa|
 
@@ -1,2 +1,77 @@
 ## How Git Stores Objects ##
 
+This chapter goes into detail about how Git physically stores objects.
+
+All objects are stored compressed and are named by their sha values. Each
+stored object contains the object type, size and contents, compressed with zlib.
+
+There are two formats that Git keeps objects in: loose objects and packed
+objects.
+
+### Loose Objects ###
+
+Loose objects are the simpler format: each object is compressed and stored in
+its own file on disk.
+
+If the sha of your object is <code>ab04d884140f7b0cf8bbf86d6883869f16a46f65</code>,
+then the file will be stored in the following path:
+
+	GIT_DIR/objects/ab/04d884140f7b0cf8bbf86d6883869f16a46f65
+
+Git uses the first two characters of the sha as the name of a subdirectory, so
+that there are never too many objects in one directory. The actual file name is
+the remaining 38 characters.
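The path rule above can be sketched in Ruby. The helper name `loose_object_path` is hypothetical and is not part of Git or any library API. The sketch assumes the sha is given as 40 hex characters.

```ruby
# Hypothetical helper: map a 40-character hex sha to its loose object path.
def loose_object_path(git_dir, sha)
  raise ArgumentError, "expected a 40-character hex sha" unless sha =~ /\A\h{40}\z/
  # the first two hex characters name the subdirectory, the other 38 the file
  File.join(git_dir, "objects", sha[0, 2], sha[2, 38])
end

puts loose_object_path(".git", "ab04d884140f7b0cf8bbf86d6883869f16a46f65")
# prints .git/objects/ab/04d884140f7b0cf8bbf86d6883869f16a46f65
```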
+
+The easiest way to describe exactly how the object data is stored is this Ruby
+implementation of object storage:
+
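+	ruby
+	require 'digest/sha1'
+	require 'fileutils'
+	require 'zlib'
+
+	# @directory is assumed to hold the objects directory (GIT_DIR/objects)
+	def put_raw_object(content, type)
+	  # the header is the type, a space, the size in bytes and a NUL byte
+	  size = content.bytesize.to_s
+	  header = "#{type} #{size}\0"
+	  store = header + content
+
+	  # the sha is computed over the uncompressed header plus contents
+	  sha1 = Digest::SHA1.hexdigest(store)
+	  path = @directory + '/' + sha1[0...2] + '/' + sha1[2..40]
+
+	  # if the file already exists, this object is already stored
+	  if !File.exist?(path)
+	    content = Zlib::Deflate.deflate(store)
+
+	    FileUtils.mkdir_p(@directory + '/' + sha1[0...2])
+	    File.open(path, 'wb') do |f|
+	      f.write content
+	    end
+	  end
+	  return sha1
+	end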
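Reading a loose object reverses these steps. You inflate the file, split the header off at the first NUL byte, and optionally check the sha. The following is a minimal sketch under that assumption. `parse_loose_object` is a hypothetical name, not part of Git or of any Ruby library, and it expects the raw file contents (for example from `File.binread`).

```ruby
require 'digest/sha1'
require 'zlib'

# Hypothetical inverse of put_raw_object: parse a loose object's file contents.
# Returns [type, content]; raises if the header or the optional sha check fails.
def parse_loose_object(compressed, expected_sha = nil)
  store = Zlib::Inflate.inflate(compressed)
  # the header ends at the first NUL byte: "<type> <size>\0"
  nul = store.index("\0") or raise "missing header terminator"
  type, size = store[0...nul].split(" ", 2)
  content = store[(nul + 1)..-1]
  raise "size mismatch" unless content.bytesize == Integer(size)
  if expected_sha && Digest::SHA1.hexdigest(store) != expected_sha
    raise "sha mismatch"
  end
  [type, content]
end
```

The sha covers the header as well as the contents. A blob containing `hello\n` therefore hashes the string `"blob 6\0hello\n"`, which gives `ce013625030ba8dba906f756967f9e9ca394464a`. This is the same id `git hash-object` reports for that content.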
+
+### Packed Objects ###
+
+The other format for object storage is the packfile. Since Git stores each
+version of each file as a separate object, storing everything loose can be
+very inefficient. Imagine having a file several thousand lines long and
+changing a single line: Git will store the second version in its entirety,
+which wastes a great deal of space.
+
+In order to save that space, Git uses the packfile. In this format, Git can
+store the second version as a delta: only the parts that changed, along with
+a reference to the similar object it is based on.
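To make the idea of a delta concrete, here is a sketch of how a delta can be applied to its base object. It assumes Git's delta encoding as described in Git's pack-format documentation. A delta starts with two variable-length sizes (base and result), followed by instructions that either copy a byte range from the base or insert literal bytes. The method names are hypothetical, and error handling is minimal.

```ruby
# Read a size stored 7 bits per byte, least significant group first;
# a set high bit means another byte follows.
def read_delta_size(delta, pos)
  size = 0
  shift = 0
  loop do
    byte = delta.getbyte(pos)
    pos += 1
    size |= (byte & 0x7f) << shift
    shift += 7
    break if (byte & 0x80) == 0
  end
  [size, pos]
end

# Hypothetical helper: rebuild the target object from its base and a delta.
def apply_delta(base, delta)
  base = base.b
  delta = delta.b
  base_size, pos = read_delta_size(delta, 0)
  result_size, pos = read_delta_size(delta, pos)
  raise "base size mismatch" unless base.bytesize == base_size

  result = "".b
  while pos < delta.bytesize
    op = delta.getbyte(pos)
    pos += 1
    if (op & 0x80) != 0
      # copy: bits 0-3 say which offset bytes follow, bits 4-6 which size bytes
      offset = 0
      size = 0
      4.times do |i|
        next if (op & (1 << i)) == 0
        offset |= delta.getbyte(pos) << (8 * i)
        pos += 1
      end
      3.times do |i|
        next if (op & (0x10 << i)) == 0
        size |= delta.getbyte(pos) << (8 * i)
        pos += 1
      end
      size = 0x10000 if size == 0   # a size of zero means 64 KiB
      result << base.byteslice(offset, size)
    elsif op != 0
      # insert: the low 7 bits give the number of literal bytes that follow
      result << delta.byteslice(pos, op)
      pos += op
    else
      raise "reserved delta opcode 0"
    end
  end
  raise "result size mismatch" unless result.bytesize == result_size
  result
end
```

For example, a delta whose bytes are `0x0b 0x09 0x90 0x06 0x03` followed by `Git` turns the base `hello world` into `hello Git`. It does this by copying the first 6 bytes of the base and then inserting 3 literal bytes.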
+
+When objects are written to disk, it is often in the loose format, since
+that format is simpler and cheaper to write. However, eventually you'll want
+to save the space by packing up the objects; this is done with the
+linkgit:git-gc[1] command. It uses a rather complicated heuristic to
+determine which files are likely most similar and bases the deltas on that
+analysis. There can be multiple packfiles, and they can be repacked if
+necessary (linkgit:git-repack[1]) or unpacked back into loose files
+(linkgit:git-unpack-objects[1]) relatively easily.
+
+Git also writes out an index file for each packfile. The index is much smaller
+than the packfile and contains the offset of each object within it, so that a
+specific object can be found quickly by its sha.
+
+The actual details of the packfile implementation are found in the Packfile
+chapter a little later on.
+
+
@@ -0,0 +1,67 @@
+## The Packfile ##
+
+This chapter explains in detail, down to the bits, how the packfile and 
+pack index files are formatted.
+
+### The Packfile Index ###
+
+First off, we have the packfile index, which is basically just a series of 
+bookmarks into a packfile. 
+
+There are two versions of the packfile index: version 1, which is the default
+in versions of Git earlier than 1.6, and version 2, which is the default
+from 1.6 onward and can be read by Git versions going back to 1.5.2.
+
+Version 2 also includes a CRC32 checksum of each object, so compressed data
+can be copied directly from pack to pack during repacking without risking
+undetected data corruption. Version 2 indexes can also handle packfiles
+larger than 4 GB.
+
+[fig:packfile-index]
+
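+In both formats, the fanout table is simply a way to find a particular sha
+faster within the index file. Its 256 entries record, for each possible first
+byte of a sha, how many objects have a sha whose first byte is less than or
+equal to that value. A lookup therefore only has to search the small range of
+sorted shas that share the first byte. In version 1, each offset is stored
+alongside its sha in a single table, whereas version 2 has separate tables for
+the shas, CRC checksums and offsets (plus a table of 8-byte offsets for large
+packfiles). At the end of both files are the checksum of the packfile the index
+references and a checksum of the index file itself.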
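The lookup described above can be sketched for a version 2 index. This sketch assumes the version 2 layout, all big-endian:

* the magic bytes `\377tOc`
* a 4-byte version
* the 256-entry fanout table
* the sorted 20-byte shas
* the CRC32 table
* the 4-byte offset table
* the table of 8-byte offsets

`idx_v2_lookup` is a hypothetical name. It expects the whole `.idx` file as a binary string (for example from `File.binread`) and a 20-byte binary sha.

```ruby
IDX_V2_MAGIC = "\xFFtOc".b

# Hypothetical helper: return the pack offset of sha, or nil if it is absent.
def idx_v2_lookup(idx, sha)
  unless idx.byteslice(0, 4) == IDX_V2_MAGIC && idx.byteslice(4, 4).unpack1("N") == 2
    raise "not a version 2 index"
  end
  fanout = idx.byteslice(8, 256 * 4).unpack("N256")
  count = fanout[255]                       # last entry = total object count
  first = sha.getbyte(0)
  # the fanout table narrows the search to shas sharing the first byte
  lo = first.zero? ? 0 : fanout[first - 1]
  hi = fanout[first]
  shas_at = 8 + 256 * 4
  while lo < hi                             # binary search in the sorted shas
    mid = (lo + hi) / 2
    cmp = idx.byteslice(shas_at + mid * 20, 20) <=> sha
    if cmp == 0
      offsets_at = shas_at + count * 20 + count * 4   # skip sha and CRC tables
      offset = idx.byteslice(offsets_at + mid * 4, 4).unpack1("N")
      return offset if (offset & 0x8000_0000) == 0
      # high bit set: the low 31 bits index the table of 8-byte offsets
      large_at = offsets_at + count * 4
      return idx.byteslice(large_at + (offset & 0x7fff_ffff) * 8, 8).unpack1("Q>")
    elsif cmp < 0
      lo = mid + 1
    else
      hi = mid
    end
  end
  nil
end
```

For a hex sha, `[hex].pack("H*")` produces the 20-byte form that this sketch expects.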
+
+Importantly, packfile indexes are *not* necessary to extract objects from a
+packfile; they are simply used to *quickly* retrieve individual objects from a
+pack. The packfile format is used by the upload-pack and receive-pack programs
+(the fetch and push protocols) to transfer objects, and no index is sent then.
+The receiving side can build one after the fact by scanning the packfile (see
+linkgit:git-index-pack[1]).
+
+### The Packfile Format ###
+
+The packfile itself is a very simple format. It begins with a 12-byte header:
+the four bytes of the string 'PACK', which let a reader confirm it is at the
+start of a packfile, followed by a 4-byte version number and a 4-byte count of
+the objects in the pack. After that comes a series of packed objects, each
+consisting of an object header and the object contents. At the end of the
+packfile is a SHA1 checksum of all of the preceding packfile data.
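A reader can check the header and trailer before looking at any objects. This minimal sketch assumes the 12-byte header layout (signature, big-endian version, big-endian object count) and a trailing SHA1 over everything before it. `read_pack_header` is a hypothetical helper, not a Git or library API.

```ruby
require 'digest/sha1'

# Hypothetical helper: validate a packfile's signature and trailing checksum,
# returning [version, object_count].
def read_pack_header(pack)
  pack = pack.b
  raise "too short to be a packfile" if pack.bytesize < 32
  raise "not a packfile" unless pack.byteslice(0, 4) == "PACK"
  version, count = pack.byteslice(4, 8).unpack("NN")
  body = pack.byteslice(0, pack.bytesize - 20)      # everything but the trailer
  trailer = pack.byteslice(pack.bytesize - 20, 20)
  raise "checksum mismatch" unless Digest::SHA1.digest(body) == trailer
  [version, count]
end
```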
+
+[fig:packfile-format]
+
+The object header is a series of one or more bytes that together specify the
+type of the object that follows and the size of its data when expanded. In
+each byte, the most significant bit says whether another header byte follows:
+if it is 1, you read another byte; otherwise the data starts next. In the first
+byte, the next 3 bits give the object type and the low 4 bits are the lowest
+bits of the size. Every following byte contributes 7 more size bits, each group
+more significant than the one before. The type values are 1 (commit), 2 (tree),
+3 (blob), 4 (tag), 6 (a delta against a base at an offset within the pack) and
+7 (a delta against a base named by its sha).
+
+(Currently, of the 8 values that can be expressed with 3 bits (0-7), 0 (000) is
+'undefined' and 5 (101) is not yet used.)
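Decoding such a header takes only a few lines. The sketch below is a hypothetical helper, not Git's own code. The names `ofs_delta` and `ref_delta` are labels chosen here for types 6 and 7.

```ruby
TYPE_NAMES = { 1 => "commit", 2 => "tree", 3 => "blob", 4 => "tag",
               6 => "ofs_delta", 7 => "ref_delta" }

# Hypothetical helper: decode the object header starting at byte pos.
# Returns [type_name, expanded_size, position_where_the_data_starts].
def parse_object_header(pack, pos)
  byte = pack.getbyte(pos)
  pos += 1
  type = (byte >> 4) & 0x7        # the 3 bits after the continuation bit
  size = byte & 0x0f              # the low 4 bits are the lowest size bits
  shift = 4
  while (byte & 0x80) != 0        # continuation bit set: another byte follows
    byte = pack.getbyte(pos)
    pos += 1
    size |= (byte & 0x7f) << shift
    shift += 7
  end
  [TYPE_NAMES.fetch(type) { raise "invalid object type #{type}" }, size, pos]
end
```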
+
+Here, we can see an example of a two-byte header, where the first byte
+specifies that the following data is a commit, and the remaining 4 bits of the
+first byte together with the last 7 bits of the second specify that the data
+will be 144 bytes when expanded.
+
+[fig:packfile-logic]
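Going the other way, a hypothetical `encode_object_header` helper sketches how the two-byte commit header from this example could be produced. The low 4 bits of the size go into the first byte. Each further group of 7 bits goes into a following byte, with the high bit marking that more bytes follow.

```ruby
# Hypothetical helper: build an object header for a type number and size.
def encode_object_header(type, size)
  bytes = [(type << 4) | (size & 0x0f)]
  size >>= 4
  while size > 0
    bytes[-1] |= 0x80              # mark the previous byte as "more follows"
    bytes << (size & 0x7f)
    size >>= 7
  end
  bytes.pack("C*")
end

p encode_object_header(1, 144).unpack("C*")  # prints [144, 9], i.e. 0x90 0x09
```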
+
+It is important to note that the size in the header is not the size of the
+data that actually follows, which is zlib-compressed, but the size of that
+data *when expanded*. This is why the offsets in the packfile index are so
+useful: without them, you have to expand every object just to find where the
+next header starts.