Zfs Internals Uli Graef


ZFS Internal Structure

Ulrich Gräf, Senior SE, Sun Microsystems

ZFS Filesystem of a New Generation


- Integrated Volume Manager
- Transactions for every change on the disk
- Checksums for everything
- Self Healing
- Simplified Administration
  - Also accelerated
  - Changes online

Performance through Control of Datapath

Everything new? No! But new in this combination!

Another explanation of why to use ZFS


Current Trends in Datacenters
- Larger filesystems
- Data lives longer on disks
- Backup devices are sufficient
  - Enough devices for restore: expensive
- Backups are complemented by copies on disk
- Copies on disks are more vulnerable to failures

ZFS and failures


ZFS can correct structural errors caused by
- Bit errors (1 sector in 10^16 reads)
- Errors caused by mis-positioning
  - Phantom writes
  - Misdirected reads
  - Misdirected writes
- DMA parity errors
- Bugs in software and firmware
- Administration errors

ZFS Self Healing

Elements:
- Integrated Volume Manager
- (Large!) Checksums inside of Block Pointer

How does it work?
- Read a block determined by Block Pointer
- Create a checksum
- Compare it with checksum in Block Pointer
- On error: use/compute block (redundancy)

Structural Integrity (remember: Star Trek)
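
The read path above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two "disks" are in-memory byte arrays, SHA-256 stands in for whatever checksum the pool uses, and the names checksum and self_healing_read are hypothetical, not ZFS functions.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in checksum (assumption: SHA-256; the pool may use another one).
    return hashlib.sha256(data).digest()


def self_healing_read(copies: list, expected: bytes) -> bytes:
    """Return verified data from a mirror and repair bad copies.

    copies   -- the same block as stored on each side of the mirror
    expected -- checksum stored in the parent block pointer
    """
    good = None
    for copy in copies:
        if checksum(bytes(copy)) == expected:
            good = bytes(copy)
            break
    if good is None:
        raise IOError("no copy matches the checksum")
    # Overwrite every copy that failed verification with the good data.
    for copy in copies:
        if checksum(bytes(copy)) != expected:
            copy[:] = good
    return good


block = b"hello zfs"
mirror = [bytearray(b"hellx zfs"), bytearray(block)]  # first side is corrupt
data = self_healing_read(mirror, checksum(block))
```

The key point is that the checksum lives in the parent (the block pointer), not next to the data, so a copy that is internally consistent but wrong is still detected.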

ZFS Self Healing

- Is different from other filesystems
- Is a quality not available from other filesystems
- Is only possible when combining
  - Integrated Volume Manager
  - Redundant Setup
  - Large Checksums
- Is not available on Reiser*, ext3/ext4, WAFL, xfs
- Will be available on btrfs, when it is finished (but not all other ZFS features)

ZFS Self Healing

[Figure: three panels, each showing an Application above a ZFS mirror]

ZFS Structure
ZFS Structure:
- Uberblock
- Tree with Block Pointers
- Data only in leaves

ZFS Structure: vdev

A ZFS pool (zpool) is built from
- Whole disks
- Disk partitions
- Files

Each of these is called a physical vdev.

ZFS Structure: Configuration

Configuration can be
- Single device
- Mirrored (mirror)
- RAID-5/RAID-6 (raidz, raidz2)
- Recently: raidz3 (raidzn is in planning)

ZFS: physical vdev

Each physical vdev contains
- 4 vdev labels (256 KB each)
  - 2 labels at the beginning
  - 2 labels at the end
- A 3.5 MB hole for boot code
- 128 KB blocks for data of the zpool

[Figure: layout of a physical vdev, with two labels (L) at each end]
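
The layout above can be turned into offsets with a little arithmetic. The sketch below assumes that the boot hole follows the two leading labels directly and that the device size is a multiple of the label size; vdev_layout is a hypothetical helper, not part of any ZFS tool.

```python
KIB = 1024
MIB = 1024 * KIB

LABEL_SIZE = 256 * KIB             # from the slide: 256 KB per label
BOOT_HOLE = 3 * MIB + 512 * KIB    # from the slide: 3.5 MB hole for boot code


def vdev_layout(device_size: int) -> dict:
    """Compute label offsets and the data area of a physical vdev."""
    # Assumption: the device size is a multiple of the label size, so the
    # two trailing labels end exactly at the end of the device.
    if device_size % LABEL_SIZE != 0:
        raise ValueError("device size must be a multiple of 256 KB")
    labels = [
        0,
        LABEL_SIZE,
        device_size - 2 * LABEL_SIZE,
        device_size - LABEL_SIZE,
    ]
    # Assumption: data starts right after the two leading labels and the hole.
    data_start = 2 * LABEL_SIZE + BOOT_HOLE
    data_end = device_size - 2 * LABEL_SIZE
    return {"labels": labels, "data_start": data_start, "data_end": data_end}


layout = vdev_layout(1024 * MIB)
```

With these numbers the data area starts at 2 x 256 KB + 3.5 MB = 4 MB from the start of the device.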

ZFS: vdev label

A vdev label contains 3 parts
- gap (avoid conflicts with disk labels)
- nvlist (name-value pair list) (128 KB)
  - Attributes of the zpool
  - Including the configuration of the zpool
- uberblock array (128 entries, each 1 KB)

Configuration also defines logical vdevs
- mirror or raidz, log and cache devices

ZFS: nvlist in a vdev label (1)

$ zdb -v -v data
    version=4
    name='data'
    state=0
    txg=162882
    pool_guid=1442865571463645041
    hostid=13464466
    hostname='nunzio'
    vdev_tree
    ...

ZFS: nvlist in a vdev label (2)

vdev_tree
    type='root'
    id=0
    guid=1442865571463645041
    children[0]
        type='disk'
        id=0
        guid=15247716718277951357
        path='/dev/dsk/c1t0d0s7'
        devid='id1,sd@SATA_____SAMSUNG_HM251JJ_______S1J...
        phys_path='/pci@0,0/pci1179,1@1f,2/disk@0,0:h'
        whole_disk=0
        metaslab_array=14
        metaslab_shift=27
        ashift=9
        asize=25707413504
        is_log=0
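
A short worked example for three of the numbers in this listing. It assumes that ashift and metaslab_shift are base-2 logarithms of the sector size and the metaslab size; the slide does not say so explicitly.

```python
# Values copied from the vdev_tree listing.
ashift = 9
metaslab_shift = 27
asize = 25707413504

# Assumption: both *_shift values are log2 of a size in bytes.
sector_size = 1 << ashift               # 512 bytes
metaslab_size = 1 << metaslab_shift     # 134217728 bytes = 128 MB
metaslab_count = asize >> metaslab_shift  # whole metaslabs that fit in asize

print(sector_size, metaslab_size, metaslab_count)  # prints "512 134217728 191"
```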

ZFS: uberblock

Verification:
- Magic number (0x00bab10c) for endianness
- Version
- Transaction group number
- Time stamp
- Checksum

Content:
- Pointer to the root of the zpool tree
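
How the magic number reveals the byte order, and how one uberblock is picked from the array, can be sketched as follows. Assumptions: the first five fields are 64-bit integers in the order magic, version, txg, guid_sum, timestamp; the entry with the highest transaction group number wins; checksum verification is left out. The names Uberblock, parse_uberblock and active_uberblock are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional

UBERBLOCK_MAGIC = 0x00BAB10C


@dataclass
class Uberblock:
    magic: int
    version: int
    txg: int
    guid_sum: int
    timestamp: int


def parse_uberblock(raw: bytes) -> Optional[Uberblock]:
    """Decode five 64-bit fields, trying little- and big-endian order."""
    for order in ("<", ">"):
        fields = struct.unpack(order + "5Q", raw[:40])
        if fields[0] == UBERBLOCK_MAGIC:
            return Uberblock(*fields)
    return None  # empty or damaged slot


def active_uberblock(slots) -> Optional[Uberblock]:
    """Pick the valid entry with the highest txg (timestamp breaks ties)."""
    valid = [ub for ub in map(parse_uberblock, slots) if ub is not None]
    return max(valid, key=lambda ub: (ub.txg, ub.timestamp), default=None)


slots = [
    struct.pack("<5Q", UBERBLOCK_MAGIC, 4, 10, 0, 100),  # little-endian writer
    struct.pack(">5Q", UBERBLOCK_MAGIC, 4, 12, 0, 200),  # big-endian writer
    bytes(1024),                                         # never-written slot
]
best = active_uberblock(slots)
```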

ZFS: uberblock: Example

$ zdb -v -v data
...
Uberblock
    magic = 0000000000bab10c
    version = 4
    txg = 262711
    guid_sum = 16690582289741596398
    timestamp = 1256864671 UTC = Fri Oct 23 12:04:31 2009
    rootbp = ...

ZFS: block pointer

- Data virtual address (1, 2 or 3 DVAs)
  - Points to other block
  - References a vdev number defined in configuration
  - Contains number of block in vdev
  - Grid information (for raidz)
  - Gang bit ("gang chaining" of smaller blocks)
- Type and size of block (logical, allocated)
- Compression information (type, size)
- Transaction group number
- Checksum of block (DVA points to this block)

ZFS: block pointer: Example

rootbp = [L0 DMU objset] 400L/200P
    DVA[0]=<0:5c8087800:200>
    DVA[1]=<0:4c81a2a00:200>
    DVA[2]=<0:3d002ca00:200>
    fletcher4 lzjb LE Contiguous birth=262711 Fill=324
    cksum=914be711d:3ab1cae4571:c07d93434c9b:1ab1618a08eccd
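
To read such a line mechanically, a small parser helps. The sketch assumes that each DVA is printed as vdev:offset:size with a decimal vdev number and hexadecimal offset and size, and that the L and P sizes are hexadecimal too. parse_block_pointer is a hypothetical helper, not a zdb feature.

```python
import re

EXAMPLE = (
    "[L0 DMU objset] 400L/200P "
    "DVA[0]=<0:5c8087800:200> DVA[1]=<0:4c81a2a00:200> "
    "DVA[2]=<0:3d002ca00:200> fletcher4 lzjb LE Contiguous "
    "birth=262711 Fill=324"
)

DVA_RE = re.compile(r"DVA\[(\d+)\]=<(\d+):([0-9a-f]+):([0-9a-f]+)>")
SIZE_RE = re.compile(r"([0-9a-f]+)L/([0-9a-f]+)P")
BIRTH_RE = re.compile(r"birth=(\d+)")


def parse_block_pointer(text: str) -> dict:
    """Extract sizes, birth txg and DVAs from one printed block pointer."""
    size = SIZE_RE.search(text)
    birth = BIRTH_RE.search(text)
    return {
        "logical_size": int(size.group(1), 16),   # bytes before compression
        "physical_size": int(size.group(2), 16),  # bytes after compression
        "birth_txg": int(birth.group(1)),
        "dvas": [
            {"vdev": int(vdev), "offset": int(offset, 16), "size": int(length, 16)}
            for _index, vdev, offset, length in DVA_RE.findall(text)
        ],
    }


bp = parse_block_pointer(EXAMPLE)
```

For the example this gives three DVAs on vdev 0, a logical size of 0x400 = 1024 bytes and a physical size of 0x200 = 512 bytes; birth matches the txg of the uberblock example.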

ZFS: some block pointers in a zpool

[Figure: block pointers in a zpool]

ZFS: Transactions
1. Starting at a consistent structure
2. Blocks may be changed by programs
   - Only prepared in main memory
   - Blocks are never overwritten on disk
3. Transaction is prepared
   - Structure is completed up to the root block
   - Blocks are written to vdevs
   - Only free blocks are used
4. Transaction is committed
   - The next uberblock slot is written
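
The four steps can be modelled with a toy copy-on-write store. This is a sketch, not ZFS code: the "disk" is a dictionary, the tree has only a root and leaves, and choosing the slot as txg modulo 128 is an assumption about what "next slot" means. ToyPool and its methods are hypothetical names.

```python
class ToyPool:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    SLOTS = 128  # size of the uberblock array in a vdev label

    def __init__(self, leaves):
        self.disk = {}        # address -> block content
        self.next_free = 0    # next unused address
        self.slots = [None] * self.SLOTS
        self.txg = 0
        root = self._alloc(tuple(self._alloc(data) for data in leaves))
        self._commit(root)

    def _alloc(self, content):
        # Only free addresses are used, so old blocks stay intact.
        addr = self.next_free
        self.disk[addr] = content
        self.next_free += 1
        return addr

    def _commit(self, root_addr):
        # Writing one uberblock slot makes the new tree the active one.
        self.txg += 1
        self.slots[self.txg % self.SLOTS] = (self.txg, root_addr)

    def active_root(self):
        _txg, root = max(slot for slot in self.slots if slot is not None)
        return root

    def read(self, index):
        return self.disk[self.disk[self.active_root()][index]]

    def transaction(self, changes):
        # Prepare: write changed leaves and a new root to free blocks.
        pointers = list(self.disk[self.active_root()])
        for index, data in changes.items():
            pointers[index] = self._alloc(data)
        new_root = self._alloc(tuple(pointers))
        # Commit: a single uberblock write switches to the new structure.
        self._commit(new_root)


pool = ToyPool([b"a", b"b"])
old_root = pool.active_root()
pool.transaction({1: b"B"})
```

After the transaction the old root and its leaves are still on the "disk" untouched, which is what keeps the structure consistent if the commit never happens.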

ZFS: Transaction

ZFS DMU Objects

All data in a zpool is structured in objects
- dnode defines an object
  - Type and size, indirection depth
  - List of block pointers
  - Bonus buffer (e.g. for standard file attributes)

DMU object set
- Object that contains an array of dnodes
- Uberblock: points to the Meta Object Set

ZFS: Object Structure

ZFS: Intent Log

- Stores all synchronously written data
- Uses unallocated blocks
- Is rooted in the Object Set

ZFS: Dataset and Snapshot Layer

DSL: Dataset and Snapshot Layer
- Filesystems
- Snapshots, clones
- ZFS volumes

Meta Object Set contains Object Set and
- Number of DSL directory (ZAP object)
- Copy of the vdev configuration
- Block pointers to be freed

ZFS: DSL Structure

ZFS hierarchical names
- Child Dataset Entries in the DSL Directory
- Each Child has own DSL Directory

DSL Dataset
- Implemented by a DMU dnode

Snapshots and Clones
- Linked List rooted at the DSL Dataset

ZFS: DSL Structure

ZFS Attribute Processor

ZAP: ZFS Attribute Processor
- Name/value pairs
- Hash table with overflow lists
- Used for
  - Directories
  - ZFS hierarchical names
  - ZFS attributes

ZFS microZAP / FatZAP

microZAP
- One block (up to 128 KB)
- Simple attributes (64-bit number)
- Name length limited (50 bytes)

FatZAP
- Object
- Hash into Pointer Table
- Pointers go to name/value storage
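
A hash table with overflow lists, plus the microZAP limits, can be sketched like this. Assumptions: CRC-32 from zlib stands in for the real hash function, the 50-byte limit is applied to the UTF-8 encoded name, and the per-block size limit is not modelled. ToyZap and fits_microzap are hypothetical names.

```python
import zlib

MICROZAP_MAX_NAME = 50  # from the slide: name length limited to 50 bytes


class ToyZap:
    """Name/value store: hash table whose buckets are overflow lists."""

    def __init__(self, nbuckets=4):
        self.buckets = [[] for _ in range(nbuckets)]

    def _chain(self, name):
        # Stand-in hash (assumption): CRC-32 of the encoded name.
        index = zlib.crc32(name.encode("utf-8")) % len(self.buckets)
        return self.buckets[index]

    def put(self, name, value):
        chain = self._chain(name)
        for i, (key, _old) in enumerate(chain):
            if key == name:
                chain[i] = (name, value)  # replace existing entry
                return
        chain.append((name, value))       # colliding names share one list

    def get(self, name):
        for key, value in self._chain(name):
            if key == name:
                return value
        raise KeyError(name)

    def items(self):
        return [pair for chain in self.buckets for pair in chain]


def fits_microzap(zap):
    """True if every entry is a short name with a 64-bit number as value."""
    for name, value in zap.items():
        if len(name.encode("utf-8")) > MICROZAP_MAX_NAME:
            return False
        if not isinstance(value, int) or not 0 <= value < 2 ** 64:
            return False
    return True


zap = ToyZap(nbuckets=1)      # one bucket forces every name to collide
zap.put("passwd", 17)
zap.put("hosts", 42)
small = fits_microzap(zap)
zap.put("x" * 60, 99)         # name too long for a microZAP
large = fits_microzap(zap)
```

In this model a directory whose entries all pass the check could stay in the one-block form; anything else would need the FatZAP form.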

ZFS Posix Layer / Volume

ZFS Posix Layer
- Implements a Posix filesystem with objects
- Directories are ZAP objects
- Files are DMU objects
- Additional: Delete Queue

ZFS Volume
- Only one object in DSL Object set: the Volume

ZFS: Misc
- Data is compressed when specified
- Metadata is compressed by default
  - All internal nodes
  - ZAP
  - DSL Directories, DSL Datasets

Copies are implemented with DVA in BP
- Zpool data is stored in 3 copies
- ZFS data is stored in 2 copies
- Data can be stored in up to 3 copies

ZFS Internal Structure

Questions?
