Digital Investigation 26 (2018) S21–S29

Contents lists available at ScienceDirect

Digital Investigation
journal homepage: www.elsevier.com/locate/diin

DFRWS 2018 USA - Proceedings of the Eighteenth Annual DFRWS USA

Forensic analysis of multiple device BTRFS configurations using The Sleuth Kit
Jan-Niclas Hilgert a, *, Martin Lambertz a, Shujian Yang b
a Fraunhofer FKIE, Bonn, Germany
b Cap Barbell, Houston, TX, USA

Abstract

The analysis of file systems is a fundamental step in every forensic investigation. Long-known file systems such as FAT, NTFS, or the ext family are well supported by commercial and open source forensics tools. When it comes to more recent file systems with technologically advanced features, however, most tools fall short of being able to provide an investigator with means to perform a proper forensic analysis. BTRFS is such a file system which has not received the attention it should have. Although introduced in 2007, marked as stable in 2014, and being the default file system in certain Linux distributions, there is virtually no research available in the area of digital forensics when it comes to BTRFS; nor are there any software tools capable of analyzing a BTRFS file system in a way required for a forensic analysis.
In this paper we add support for BTRFS (including support for multiple device configurations) to The Sleuth Kit, a widely used toolkit when it comes to open source file system forensics. Moreover, we provide an analysis of forensically important features of BTRFS and show how our implementation can be used to utilize these during a forensic analysis.

Keywords: File systems; Pooled storage; Forensic analysis; BTRFS; The Sleuth Kit

© 2018 The Author(s). Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under
the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

* Corresponding author.
E-mail addresses: jan-niclas.hilgert@fkie.fraunhofer.de (J.-N. Hilgert), martin.lambertz@fkie.fraunhofer.de (M. Lambertz), yang_shujian@hotmail.com (S. Yang).

https://doi.org/10.1016/j.diin.2018.04.020
1742-2876/© 2018 The Author(s). Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

In 2005, Brian Carrier published his book “File System Forensic Analysis” Carrier (2005), in which he analyzed and explained storage devices and file systems in unprecedented depth. Furthermore, he proposed a model for analyzing storage media, from the physical media up to the analysis of extracted files. His work quickly became the foundation for any analysis conducted in this area. Moreover, he provided an implementation of his theoretical model, known as The Sleuth Kit (TSK) Carrier (2017). TSK is a forensic toolkit providing multiple commands that enable an investigator to perform a forensic analysis of file systems, independent of the actual file system at hand. Thus, no extensive background knowledge about the internal structures of a file system is required in order to create a file listing, recover deleted files, or search for unallocated sections. Along with the fact that it is open source and can be used or extended by anyone, TSK became a commonly used tool for many analysts and researchers next to commercial products.
TSK provides support for a variety of file systems including ext4 on Linux, Microsoft's NTFS and FAT, and Apple's HFS+. Although these file systems are still widely used on today's computers, other file systems have been introduced since the publication of Carrier's book and TSK. While FAT for instance is still often used on thumb drives or memory cards due to its simplicity, the demand for reliability, security, and maintainability has sparked progress in the world of file systems. The copy-on-write principle is used to keep file systems in a stable state, even after a crash has caused a write operation to fail. Encryption on a file system level increases the protection of personal data in such a way that it is available out of the box and transparent to the user. Furthermore, modern file systems decrease the overhead for administrative tasks like volume management or partitioning. With built-in multiple device support, as in ZFS or BTRFS, volumes can be added to or removed from existing file systems straightforwardly. Additionally, snapshots are used to effortlessly create complete backups of a file system.
In this paper, we implement one of these modern file systems into TSK in order to close the gap between them and the forensic world. For this purpose, we are taking an in-depth look at BTRFS as one of the most prominent examples in this area. BTRFS supports several of the aforementioned features, including copy-on-write, snapshots, and multiple device support. Despite the fact that it was implemented into the Linux Kernel more than eight years ago, it has not received an adequate amount of attention in the academic or practical forensic area. Therefore, we also provide the first multiple device analysis of BTRFS from a forensic point of view.

2. Related work

In this section we present related work for two main aspects: forensic analyses of BTRFS and extensions of TSK with a focus on modern file systems with multiple device support.

2.1. BTRFS forensics

As already mentioned in the Introduction, there is virtually no academic work dealing with BTRFS in the context of digital forensics. While there are a few papers introducing BTRFS and some of its structures Bacik (2012); Rodeh et al. (2013), to the best of our knowledge there is no prior work investigating which structures are of particular relevance to perform a forensic analysis of BTRFS.
Looking at the non-academic world, the situation is similar. At the time of this writing the well known forensic suites like EnCase Forensic, FTK, or X-Ways Forensics do not list BTRFS in their lists of supported file systems. X-Ways only mentions the “ability to identify BTRFS file systems” in their changelog of X-Ways Forensics Fleischmann and Stefan, 2012. Although there is an open pull request for BTRFS support for TSK on GitHub Pöschel and Stefan, 2015, the code changes have not been merged since 2015. Moreover, the code is not able to handle multiple device configurations which mirror or stripe data to their devices, making it applicable to a small fraction of BTRFS configurations only. What is more, during our experiments the implementation failed for large test pools (≈ 1 TB in size).

2.2. Multiple device file systems in The Sleuth Kit

In their work “Extending The Sleuth Kit and its Underlying Model for Pooled Storage File System Forensic Analysis” Hilgert et al. (2017), Hilgert et al. use the term “pooled storage file systems” to refer to modern multiple device file systems like ZFS and BTRFS. These file systems are characterized by the fact that all available space is combined to a pool and then shared between the file systems created on this pool. Thus, none of the file systems needs to be assigned a fixed size as they can grow and shrink dynamically. In the same transparent way, storage can be added to and removed from the storage pool. These advantages of pooled storage file systems are possible because they provide their own type of volume management functionality, keeping track of the pool members and the mapping between the logical file system addresses and the actual physical offsets on the members.
In the same paper, Hilgert et al. assess the applicability of the model behind TSK for such modern pooled storage file systems. They found that the steps of the original model are still required, but that the class of pooled storage file systems needs an additional step to be performed between the volume analysis and the file system analysis. The authors call this step “pool analysis” and Fig. 1 depicts where it has been added to the original model. Furthermore, they define five key aspects this step has to implement. An obvious aspect is the capability to detect pooled storage file systems. Since pooled file systems play to their strengths when spread over multiple disks, support for such multiple device configurations is also an important requirement for this step. Hilgert et al. state that it should be possible to determine the pool membership of disks and afterwards analyze the resulting storage pools, which are potentially comprised of more than one disk. Furthermore, the authors highlight that a forensic analysis should not rely on an assembled pool, but that a forensic tool should be able to parse all of the important data structures on its own in order to allow for the adequate level of detail for a forensic analysis. Finally, the authors demand that the pool analysis should be able to deal with missing pool members where possible. That is, it should be possible to perform a forensic analysis of a RAID or mirror pool if there are still enough pool members present, for example.

Fig. 1. Extended model for a file system forensic analysis of pooled storage file systems Hilgert et al. (2017).

As a proof of concept Hilgert et al. implemented support for ZFS into TSK to show that their extended model enables a forensic analysis of modern pooled storage file systems. However, even though the authors mention BTRFS as a pooled storage file system, they do not provide a detailed investigation of this particular file system. Neither do they prove that BTRFS is in fact covered by their model.

3. BTRFS fundamentals

BTRFS is a modern copy-on-write file system primarily for the Linux operating system. It supports advanced features like checksums, deduplication, and SSD awareness btrfs Wiki (2018a). Moreover, BTRFS allows the creation of subvolumes which can be considered as “independently mountable POSIX filetree[s]” btrfs Wiki (2017e). These subvolumes can be used to divide the complete file system into smaller units. Typically, such a unit contains areas of the file system which are cohesive in some way. The subvolumes of a BTRFS file system can be mounted independently of each other and with different mount options.
Furthermore, BTRFS supports snapshots, which utilize the copy-on-write principle to save and restore (parts of) a file system. Snapshots are created per subvolume and technically a snapshot is a subvolume itself. A snapshot of a subvolume represents the state of the original subvolume at the time the snapshot was created. Since snapshots are subvolumes, they can be mounted and modified. This concept gives users a comfortable option to create backups of their data without any additional soft- or hardware. This
renders snapshots a highly interesting feature when it comes to the forensic analysis of a BTRFS file system.
As already mentioned before, BTRFS is a file system with built-in multiple device support. That is, it has its own volume manager implemented responsible for storing data on and reading it back from the underlying volumes of a file system. BTRFS supports different configurations in such a multiple device setup. At the time of this writing the BTRFS status page btrfs Wiki (2017d) lists RAID0, RAID1, and RAID10 as stable implementations and RAID5 and RAID6 as flawed implementations. In line with what Hilgert et al. did in their paper Hilgert et al. (2017) for ZFS, we will use the terms pool and BTRFS file system interchangeably to refer to the complete file system including subvolumes from now on; even though the term pool is not part of the BTRFS terminology.

3.1. General overview

Similar to the ext file systems, BTRFS starts with a superblock, which stores the most basic metadata about the file system. Apart from that, the rest of the data is stored in different B-trees. The addresses of the roots of these trees can be found in the root tree. The address of the root tree in turn is stored in the superblock.
A main characteristic of a B-tree is that all information is stored in its leaf nodes. The non-leaf nodes, known as internal nodes in BTRFS, are only used as references to leaf nodes. Due to this, the internal nodes of different tree types are very similar as they only contain pointers to other nodes. The leaf nodes on the other hand have different types of records called items. Their exact structure and content depends on the type of the tree at hand. Listed below is an overview of the most important types of trees in BTRFS:

• Chunk tree: The chunk tree is used to perform the mapping from logical to physical addresses in BTRFS. All addresses used in BTRFS are logical addresses, which translate to one or more physical addresses depending on the pool configuration. Since also the chunk tree is referenced by its logical address, the superblock contains a part of the chunk tree, the system chunk items, for the initial mapping. This is required to build the chunk tree in the first place. A detailed description of the mapping performed by the chunk tree in BTRFS is given in Section 3.2. Besides, the chunk tree also contains information about the devices used in the pool.
• Root tree: The root tree stores the addresses of the roots of the trees used by BTRFS. This includes the extent tree, checksum tree, and device tree as well as all available file system trees. The root address of the chunk tree on the other hand is not stored in the root tree, but in the superblock.
• File system tree: This type of tree stores information about the file and directory hierarchy in file systems, subvolumes, and snapshots. This includes the metadata of files and directories as well as extent data items referencing the actual data.
• Extent tree: Allocation records can be found in the extent tree. This includes block group items, defining regions in the logical address space of BTRFS as well as metadata and extent items allocating space within these regions. The number of references to these items as well as a back reference for each reference is also stored.
• Checksum tree: This tree simply contains checksums for the data stored in the BTRFS file system.
• Device tree: The device tree is used for the reversed address mapping, from physical to logical addresses. This becomes necessary, when physical devices are for instance removed from the pool.

The general approach to perform a BTRFS file walk from the superblock to the contents of a file is depicted in Fig. 2 and includes the following main steps:

1. Locate the superblock at the default physical address 0x10000.
2. Extract the system chunk items stored in the superblock for the initial logical to physical address mapping.
3. Find the logical address of the chunk tree in the superblock, translate it to its physical counterpart, and build the chunk tree. From now on, this tree will be used to perform the mapping from logical to physical addresses.
4. Find the logical address of the root tree in the superblock, translate it to its physical address and build the root tree.
5. The root tree stores the logical addresses of the roots of the other trees including the file system trees. Find the address of the corresponding root of the file system tree, translate it and build the tree.
6. Traverse the file system tree to find the file of interest. Its name is stored in a directory item.
7. Read the corresponding inode item of the file in the file system tree, referenced by the directory item, to retrieve its ID and metadata.
8. Use the ID as a key to find its extent data items in the file system tree.
9. Extract the data described by all extents corresponding to the file by mapping their logical to physical addresses.

In summary, the analysis of BTRFS starts with reading the superblock and extracting the roots of the trees. Once the tree roots are available, the rest of the analysis is all about expanding, referencing, and reading the child nodes of these tree roots. More detailed information about the on-disk format and data structures of BTRFS can be found in the official Wiki btrfs Wiki (2018b, 2017a).

3.2. Multiple device support

An integral feature of BTRFS is the support for multiple devices, whose available space is combined and shared by the subvolumes. In order to accomplish this, BTRFS adds another layer of abstraction between the logical addresses used by the file system and the corresponding physical addresses referring to the actual devices. This abstraction is implemented by a mapping, which translates a logical address to the correct combination of physical device and corresponding physical offset. Depending on the configuration, a logical address can also map to multiple physical offsets and devices in order to increase the redundancy of the data.
For keeping track of its devices and performing the logical-to-physical mapping, BTRFS uses special structures stored in the chunk tree. For each device, a device item is added to the chunk tree, containing information such as a unique identifier for the device, another device identifier used to index the available devices, and its total available space. In addition to device items, the chunk tree contains multiple chunk items defining logical chunks. In BTRFS, the complete logical address space is split into these non-overlapping logical chunks. Thus, one logical address can be uniquely associated with one logical chunk. These logical chunks also correspond to the regions defined by the block group items found in the extent tree. Each chunk item contains the logical start address of the chunk it describes as well as its length, the type of data it stores, and the RAID configuration used to store it. Different types of chunk items are used to map different types of data btrfs Wiki (2017b):

• System: System chunk items are used for the translation of logical addresses of the chunk tree itself. For this reason, all
available system chunk items are also already stored in the superblock as described previously.
• Metadata: Metadata chunk items are used for the translation of logical addresses of file system internal data structures like root items, inode items or directory items. Thus, tree structures like the root tree, extent tree, device tree, and file system trees are built using this type of chunk items. In BTRFS, small amounts of data can be stored inside of metadata structures, for example in extent data items. In this case, this chunk type is implicitly used to map the addresses of the embedded raw data.
• Data: These chunk items are only used for the translation of logical addresses of data blocks.

Fig. 2. Overview of the most important BTRFS structures used for a file walk.

Each chunk is further divided into a number of stripes defined in the chunk. The device corresponding to a stripe can be identified by the given device identifier. The physical offset of each stripe indicates the beginning of the data on a device. Each stripe in a chunk item is in turn divided into equally sized units with a stripe length defined in the chunk item. In addition to the type of data stored within the chunk, its type also defines the RAID configuration used to store data.
In RAID0, all data is striped across the available stripes of the logical chunk. After a unit in a stripe is filled, the data is written to the next stripe. This configuration leads to data loss, if one of the stripes fails. RAID1 mirrors the data to all stripes in the chunk resulting in redundancy. That is, the units of each stripe are the same. As far as we know, RAID1 always uses a pair of all available devices as its stripes for each chunk item, while RAID0 always uses all of the available devices. The exact number of stripes used by each chunk item is always specified in the chunk item itself. RAID10 combines the aforementioned concepts in such a way, that all of the available stripes in a chunk are split into RAID1 configurations across which the data is then striped. Each of these RAID1 configurations in turn mirrors the data across all of their corresponding stripes. The exact number of stripes used per RAID1 configuration is defined in the chunk and referred to as sub stripes.

Listing 1. Chunk item example.
$ btrfs-debug-tree /dev/sda
[...]
item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 299892736)
itemoff 15265 itemsize 176
  chunk length 262144000 owner 2 stripe_len 65536
  type DATA|RAID10 num_stripes 4 sub_stripes 2
    stripe 0 devid 2 offset 9437184
    dev uuid: 66aaeb1a-8cbb-4979-89cf-56fb0c6c958a
    stripe 1 devid 1 offset 152043520
    dev uuid: b3b74185-13b0-4d2a-8300-ca740c384f4b
    stripe 2 devid 5 offset 140509184
    dev uuid: c7099e88-5597-4776-9ee0-3d6b662e53b3
    stripe 3 devid 4 offset 140509184
    dev uuid: e84da2d2-d5fe-4226-a8aa-52d1ad8988b5
[...]

As an example, the chunk item depicted in Listing 1 defines the chunk starting at the logical address 299892736 spanning to address 562036736. It is used to store data using a RAID10 configuration with four stripes and two sub stripes. As described earlier, this means that these stripes are split into two RAID1 configurations, each consisting of two stripes. In this case, stripe 0 and 1 as well as stripe 2 and 3 are used as a RAID1 configuration and store the same data. For each stripe, the corresponding device identifier is given indicating the physical device on which the data of the stripe is stored. In this example, the stripes of the chunk are located on devices 1, 2, 4 and 5. The exact location of the data on each stripe (and therefore on the devices) can be determined using the given offset for each stripe.

4. Integrating BTRFS into TSK

In order to integrate BTRFS into TSK, it is indispensable to evaluate the applicability of its underlying model to multiple device file systems which BTRFS is an instance of. Hilgert et al. already
discussed this and presented an extended model for TSK, which enables a forensic analysis of multiple device file systems. For this reason, we will first assess the applicability of the revised model for BTRFS followed by a detailed overview of our implementation.

4.1. Theoretical model

Hilgert et al. adopted the first and last step, the physical media analysis and the application analysis, as they stood because they do not need to be changed in order to be applied to pooled storage file systems. The first step only processes the available data on the devices (the pool members in our case) as a sequence of bytes and does not interpret the data at all. The last step on the other hand, interprets the extracted data as files. This does not require any file system specific information, because at that time of the analysis, the files have already been extracted from the file system. Since these two steps are independent of the file system, they can also be applied unchanged to BTRFS.
The original model was extended by adding a pool analysis step. Hilgert et al. added this step to address the integrated volume management capability pooled storage file systems are equipped with. The potentially multiple devices spanned by a BTRFS pool, however, are not necessarily raw hard disks. Instead, they can also be partitions, RAIDs or other multiple disk volumes. Therefore, also for BTRFS it is still required to perform a volume analysis in order to detect the volumes involved.
Furthermore, mounting a BTRFS file system also results in access to the data (i.e. files and directories) stored on the most recent version of the file system. Apart from that, no access to file system internal data structures is possible. Accessing older versions of files as well as file system data and metadata directly requires direct access to the BTRFS pool, which is obtained during the pool analysis step. Taken all together, BTRFS fits into the model presented by Hilgert et al. without any need for further modification.
As Hilgert et al. pointed out, the pool analysis is a highly file system dependent step, which needs to be implemented for each new file system. This is similar to the file system analysis functionality in TSK that differs from file system to file system. The next section describes in detail how the pool analysis for BTRFS is implemented.

4.2. Pool analysis

The tasks of the pool analysis can be divided into two major steps. First, the given volumes need to be searched for a pooled storage file system. Furthermore, the corresponding pool and its members need to be identified. Second, after the members and file system type are known, the mapping from logical to physical addresses needs to be performed. This results in direct access to the pool, which is required to perform a complete file system analysis of BTRFS.

4.2.1. Pool membership detection

As an input, the pool analysis receives the volumes found during the volume analysis and detects the underlying pooled storage file system, if there is any. Each device in BTRFS stores a superblock at the physical offset 0x10000, containing the most essential file system information. It does not only identify the volume as part of a BTRFS pool, but it also contains the file system UUID. This ID is global for the whole BTRFS pool and can be used to identify other members of the multiple device configuration. Unlike ZFS which requires a name for its pools, BTRFS does not demand a label to be set for a file system or a pool. The superblock also includes a device item for the current device containing its unique identifier enabling us to rule out duplicate volumes.
Another essential part of this step is the detection of missing devices. Although the superblock contains the total number of devices used in a BTRFS pool, it provides information only about the device it is stored on and not about any of the other devices of the pool. Some information can be obtained by looking at the system chunk items stored in each superblock. These chunks contain the IDs and the UUIDs of the devices used for its stripes. However, in configurations like RAID1 or RAID10, not all available devices may be used for the available system chunks. In that case, this method will not provide a complete listing of all devices. Another possibility to obtain more information about the available devices opens up, when all devices storing the chunk tree are available. In this case, the complete chunk tree can be built containing device items for all devices used in the BTRFS pool.

4.2.2. Mapping of logical to physical addresses

After the available volumes of a pool have been detected, we need to gain direct access to data at the correct offsets stored on the pool members. For this, we need to be able to perform the mapping from logical to physical addresses. In BTRFS, this mapping is done by utilizing the chunk tree as described in Section 3.2. Fig. 3 illustrates the following steps describing how to map a logical address to a physical address (i.e. the physical offset on the disk) for a RAID0 configuration:

1. Locate the chunk item containing the given logical target address (t_log) in the chunk tree. This gives us the logical start address of the chunk (c_log).
2. Calculate the difference (D) between the logical target address and the logical start address of the chunk.

Fig. 3. Distribution of data in a RAID0 chunk item using three stripes.
   This difference represents the offset of the target address within the chunk item.
3. Use D and the stripe length (stripeLen) to compute the total number of stripe units preceding our target address (preStripeUnits):

   preStripeUnits = ⌊D / stripeLen⌋

4. Find out on which stripe (targetStripe) our logical address (and thus the start of the data) lies by calculating the total number of preceding units modulo the number of stripes (nStripes).

   targetStripe = preStripeUnits mod nStripes

5. Knowing the corresponding stripe gives us the physical start offset (phyStripeOff) of the data on the device specified in the chunk item.
6. Calculate the number of units (nStripeUnits) that have already been allocated on our stripe by dividing the total number of units already filled by the number of available stripes.

   nStripeUnits = ⌊preStripeUnits / nStripes⌋

7. Calculate the offset within the unit (unitOff) on our stripe.

   unitOff = D mod stripeLen

8. Adding the calculated values results in the final physical offset (phyOff)

   phyOff = phyStripeOff + nStripeUnits · stripeLen + unitOff

For a single disk configuration, the logical address space described by the chunk starts at the physical offset of the one and only stripe and continues without any interruption. For this reason, the physical offset can simply be calculated by:

   phyOff = phyStripeOff + D

Since each stripe in a RAID1 configuration stores the same data, it is possible to choose any stripe of the chunk and calculate the physical offset in a similar way to a single disk configuration. For RAID10, it is necessary to choose one stripe out of each used RAID1 configuration. Afterwards, these stripes are nothing but a RAID0 configuration, whose mapping can be calculated following the aforementioned steps 1 to 8.
BTRFS also supports RAID5 and RAID6. However, due to bugs in the implementations and the consequent risk of data loss, it is officially recommended not to use these configurations btrfs Wiki (2017c). Therefore, we do not cover RAID5 and RAID6 in our implementation for now.

5. Forensic artifacts in BTRFS

The following sections are used to highlight features of BTRFS which are of particular interest for a forensic examiner when presented with a BTRFS file system. We extended the implementation by Hilgert et al. to enable a forensic analysis of BTRFS Hilgert et al. (2018). In the same way as their support for ZFS, our implementation does not alter the functionality of the original Sleuth Kit, so that it can still deal with any previously supported file systems.

5.1. Forensic analysis of a BTRFS pool

As already described in Section 4.2.1, a main aspect during a forensic investigation of a pooled storage file system is the detection of its members followed by the detection of the pool configuration. For this purpose, we extended the pls command introduced by Hilgert et al. to enable support for BTRFS. This command is used to perform and display the results of the pool analysis. As shown in Listing 2, the output gives an investigator insight into the most important information found in the superblock stored on a device. This information includes the file system as well as the device UUID. For further analysis, it also displays information about the pool including its label, if one was given, and its total number of devices.

Listing 2. Using pls for a pool membership detection of a single disk.
$ pls /BTRFS/raid10_5disks/disk1
Part of BTRFS pool:
Label: RAID10Pool
File system UUID: D369B8F5-53EA-4DA9-A020-F6E585AA67D4
Root tree root address: 45711360
Chunk tree root address: 20987904
Generation: 42
Chunk root generation: 39
Total bytes: 5242880000
Number of devices: 5
Device UUID: B3B74185-13B0-4D2A-8300-CA740C384F4B
Device ID: 1
Device total bytes: 1048576000
Device total bytes used: 1004535808
[...]

After detecting the single members of a BTRFS pool, pls can be used to analyze the pool configuration. For this, it provides the -P parameter, indicating that the input volumes are now analyzed as a pool. Listing 3 shows that all of the five devices of the BTRFS pool have been successfully detected. It also gives information about the RAID levels used for each type of chunk items in the pool as well as the available and total number of these chunk items. In a case of missing pool members, this provides information about the availability of metadata and thus the chances of recovering data.

Listing 3. Initial analysis of acquired volumes using pls.
$ pls -P /BTRFS/raid10_5disks/
Detected BTRFS Pool
Label: RAID10Pool
File system UUID: D369B8F5-53EA-4DA9-A020-F6E585AA67D4
Number of devices: 5 (5 detected)
–
Device ID: 1 (B3B74185-13B0-4D2A-8300-CA740C384F4B)
Device ID: 2 (66AAEB1A-8CBB-4979-89CF-56FB0C6C958A)
Device ID: 3 (71D7CC24-BBE3-4E31-B532-EDF15C5AC527)
Device ID: 4 (E84DA2D2-D5FE-4226-A8AA-52D1AD8988B5)
Device ID: 5 (C7099E88-5597-4776-9EE0-3D6B662E53B3)
System chunks: RAID10 (1/1)
Metadata chunks: RAID10 (1/1)
Data chunks: RAID10 (6/6)

After a pool has successfully been detected, the other tools provided by our implementation can be used for a forensic analysis
including file listings, timeline generation, or data extraction. In line with the implementation of Hilgert et al., we have implemented support for BTRFS in the following tools of TSK:

• fsstat: Shows general information about the BTRFS file system including its snapshots and subvolumes.
• fls: Lists all files and directories of a BTRFS file system, snapshot, or subvolume.
• istat: Shows metadata information about an object, which is uniquely identified by its object ID shown in fls and its parent file system, subvolume, or snapshot.
• icat: Extracts the data associated with a metadata structure.

5.2. Snapshots

As mentioned earlier, BTRFS offers the possibility to create snapshots of existing file systems. Remember that a snapshot saves the current state of the file system and can afterwards be used to revert the file system to the point in time when the snapshot was taken. What is more, snapshots are part of the file system and thus always in a consistent state. Hence, they represent an outstanding source for the recovery of deleted files. Enabling the detection and analysis of snapshots is therefore an important analysis technique during the forensic examination of a BTRFS file system.

Listing 4. Listing all available snapshots and subvolumes using fsstat.
$ fsstat -P /BTRFS/raid10_5disks/
File system UUID: D369B8F5-53EA-4DA9-A020-F6E585AA67D4
[...]
The following subvolumes or snapshots were found:
259 snapshot_2017-12-06
260 snapshot_2017-12-13
261 snapshot_2017-12-20

Since snapshots are subvolumes in BTRFS, the following description applies not only to snapshots but also to subvolumes in general. For each snapshot, a separate file system tree is created. These file system trees can be analyzed similarly to the default “top-level” file system. Each of these file system trees is referenced by a ROOT_REF in the root tree containing, for example, the ID of the file system tree or the name of the snapshot. Furthermore, a root item is added to the root tree storing a reference to the root node of the tree and additional information like the number of the generation that created the snapshot. These generation numbers are always updated whenever a transaction is written to the BTRFS pool.

Using fsstat, we are able to list all subvolumes and snapshots for a particular BTRFS file system as shown in Listing 4. Afterwards, the corresponding name can be used to list, extract, or recover files from snapshots. This is done by passing the snapshot as an argument to the other file system analysis tools like fls. Listing 5 shows an example in which snapshot_2017-12-06 contains multiple files which have been deleted in the most recent version of the file system tree. These deleted files, still available in the snapshot, are located in the /home/user/ directory and can be restored using icat.

Listing 5. Recovering files using snapshots.
$ fls -P /BTRFS/raid10_5disks/
r/r 265: 043349.ppt
d/d 266: data
+ r/r 267: 018367.docx
+ r/r 268: 018370.docx
+ r/r 269: 018371.docx
d/d 270: home
+ d/d 271: user
++ r/r 272: 516411.docx
$ fls -P /BTRFS/raid10_5disks/snapshot_2017-12-06
r/r 265: 043349.ppt
d/d 266: data
+ r/r 267: 018367.docx
+ r/r 268: 018370.docx
+ r/r 269: 018371.docx
d/d 270: home
+ d/d 271: user
++ r/r 272: 516411.docx
++ r/r 275: 043083.html
++ r/r 276: 043084.html
++ r/r 277: 043088.txt

5.3. Metadata-based file recovery

BTRFS only stores allocated metadata for files and directories in its trees. For this reason, searching for unallocated metadata structures for file recovery in the most recent tree is not an option. Nevertheless, it is possible to look at still existing metadata structures of older trees. Due to the copy-on-write principle used by BTRFS, each transaction creates a new root tree and results in a new generation number. Thus, accessing an old root tree makes it possible to jump back in time, analyze a previous version of the file system, and extract deleted files.

Unfortunately, there are two issues when trying to perform file recovery in this manner. First, we are dealing with possibly inconsistent metadata. The analysis is performed on artifacts of the file system and chances are high that parts of them have already been overwritten. If this happens to metadata, it will not be possible to continue the analysis.

Second, the location of an older root tree needs to be determined. Apart from scanning the complete set of volumes for these root structures, file systems sometimes keep track of these locations. ZFS, for example, stores the last 128 versions of its root structure (called überblock) in an array. In BTRFS, unfortunately, only four versions of a structure referred to as btrfs_root_backup are stored in an array in each superblock.

Listing 6. Backup root addresses stored in the superblock shown by pls.
$ pls /BTRFS/raid10_5disks/disk1
[...]
Backup Roots:
1. tree root at 45711360 (generation: 42)
   chunk tree root at 20987904 (generation: 39)
2. tree root at 44646400 (generation: 39)
   chunk tree root at 20987904 (generation: 39)
3. tree root at 45285376 (generation: 40)
   chunk tree root at 20987904 (generation: 39)
4. tree root at 45629440 (generation: 41)
   chunk tree root at 20987904 (generation: 39)

As shown in Listing 6, these backup structures can be listed using pls. Though the output only shows the logical addresses of the root and chunk trees from previous generation numbers, the backup structure also contains the logical addresses of the roots of other important trees, like the extent or device tree. Furthermore, it also stores the generation number corresponding to each tree and its logical address. These generation numbers are not necessarily the same for each tree in a backup structure, since not every transaction modifies, for example, the chunk or device tree. In Listing 6, the chunk tree at generation 39 is still used for the
mapping of the most recent root tree. In our example, it can be seen that the most recent generation of the pool found in the superblock is 42. The root tree address for that generation is 45711360 and can already be found in the backup roots. The corresponding chunk tree is stored at address 20987904.

Listing 7. File listings using the most recent version and an older version of the tree root of the BTRFS file system.
$ fls -P /BTRFS/raid10_5disks/
r/r 265: 043349.ppt
d/d 266: data
+ r/r 267: 018367.docx
+ r/r 268: 018370.docx
+ r/r 269: 018371.docx
$ fls -P /BTRFS/raid10_5disks/ -T 41
using rootTree at logical address: 45629440 (generation 41)
r/r 265: 043349.ppt
d/d 266: data
+ r/r 267: 018367.docx
+ r/r 268: 018370.docx
+ r/r 269: 018371.docx
r/r 279: IMG00561.jpg

The ZFS extension of TSK by Hilgert et al. provides a parameter to specify an older transaction group number for the recovery of deleted files by using the corresponding überblock stored in the array. In a similar manner, our tool expects the generation number as a parameter, detects the corresponding backup structure, and uses the provided addresses for the reconstruction of the BTRFS file system. Listing 7 shows two file listings, one performed with the tree root referenced by the superblock (the most recent version), followed by one performed by specifying the previous generation 41. By comparing the outputs, an image which is not available in the most recent version can be found. Similar to the usage of fls, we were able to successfully recover the file by using icat with the same parameter. However, as already mentioned, this type of file recovery does not always yield sufficient results.

5.4. Missing disk

During a forensic examination, an investigator should be in a position to perform an analysis of a BTRFS file system even if there are disks missing. There are various reasons for missing disks: e.g., a disk might have been destroyed or formatted before it could be acquired. Similar to what Hilgert et al. observed for ZFS (Hilgert et al., 2017), missing disks render the normal file system tools useless. That is, a BTRFS file system spanning its data over multiple devices cannot be successfully accessed if there is a single device missing. This even holds for scenarios in which at least some of the data is still recoverable.

As a test scenario, we created a BTRFS file system consisting of three disks with the metadata profile set to RAID1 and the data profile set to RAID0. This means that with a single missing disk all metadata should still be completely accessible whereas on average

one third of the actual file data is expected to be missing. As shown in Listing 8, btrfs filesystem show recognizes the missing disk, but cannot provide any additional information about it.

Listing 8. BTRFS pool with missing disk2.
$ btrfs filesystem show
warning, device 2 is missing
warning devid 2 not found already
Label: none uuid: 18f4475c-0b32-47c8-8827-739c6b8328d0
Total devices 3 FS bytes used 89.91MiB
devid 1 size 500.00MiB used 139.00MiB path /dev/sda
devid 3 size 500.00MiB used 147.00MiB path /dev/sdc
*** Some devices missing

Trying to mount the file system in a degraded state using the mount option -o degraded fails with the message: BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed. After mounting the file system read-only, it is possible to browse the directories and files. This is because the metadata containing this information is still available since it was mirrored to two independent stripes. Nevertheless, trying to access any of the files whose content is not stored inline in metadata fails with cp: error reading /mnt/missing_disk/IMG00158.bmp: Input/output error.

In line with what Hilgert et al. did to deal with missing disks in ZFS (Hilgert et al., 2017), we also implemented direct access to the file system's internal structures instead of relying on tools provided by the file system, as described in Section 4.2. This enables us to replace any data of missing devices with zeros so that we are able to extract the data which is still available and store it at the right offsets in the file.

Listing 9. BTRFS pool with missing disk2.
$ pls /BTRFS/missing_disk/
[...]
Number of devices: 3 (2 detected)
-----
Device ID: 2 (2A756EDA-87F4-44CB-9745-361026DC91C8)
Device ID: 3 (AA193982-4C41-4E44-A2A5-350730E35E9B)
System chunks: RAID1 (1/1)
Metadata chunks: RAID1 (1/1)
Data chunks: RAID0 (0/2)

Using pls on the test scenario gives us additional information about the chunk items of the detected pool as depicted in Listing 9. As we can see, all of the data chunks are incomplete. However, all metadata chunks are completely available due to the RAID1 configuration. This enables us to perform a recovery by filling the missing parts of the data.

An example for this recovery is depicted in Fig. 4. In this scenario, the common BTRFS tools, even though they provide support for degraded pools, would not provide any of the data, though roughly two thirds of it are still available. To take this even a step further, we have removed a second disk from our test scenario. Since the metadata is mirrored in such a way that it is still available on the one and only remaining disk, our implementation is able to successfully detect, pad, and extract the remaining data of the image. As shown in Fig. 4c, this is sufficient to obtain an identifiable image in a case in which former tools and methods returned nothing at all.

Fig. 4. Extracting data from a degraded BTRFS pool missing disks.

6. Conclusion and future research

Just like Hilgert et al., we are convinced that pooled storage file systems will become common in forensic investigations any time soon. At the time of writing, we hold the opinion that the forensic community is not well enough prepared for file systems of this class: there are virtually no research papers, and the tools, both commercial ones and their open source counterparts, do not support them.

In this paper we tie in with the efforts of Hilgert et al. to close this serious gap. We confirmed that their proposed model is indeed applicable to BTRFS. Subsequently, we followed their model to implement BTRFS support in TSK. This implementation enables practitioners to perform forensic analyses of BTRFS file systems. Moreover, it can be used by the academic community for further research regarding BTRFS. While there have already been approaches to add BTRFS support before, to the best of our knowledge we provide the first implementation able to handle multiple device configurations correctly and efficiently.

In addition to the implementation, which is publicly available and open source (Hilgert et al., 2018), we also show how to perform a forensic analysis of a BTRFS file system using our extended TSK version. Furthermore, we highlight features of BTRFS of particular interest during a forensic investigation. These include snapshots and means to deal with missing or corrupted disks. Again, we also show how our TSK extension can be used to utilize these features during an analysis.

References

Bacik, J., 2012. Btrfs: the Swiss army knife of storage. USENIX Login 37, 7–15.
btrfs Wiki, 2017a. Data Structures - Btrfs Wiki. https://btrfs.wiki.kernel.org/index.php/Data_Structures.
btrfs Wiki, 2017b. Manpage/mkfs.btrfs - Btrfs Wiki. https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs.
btrfs Wiki, 2017c. RAID56 - btrfs Wiki. https://btrfs.wiki.kernel.org/index.php/RAID56.
btrfs Wiki, 2017d. Status - Btrfs Wiki. https://btrfs.wiki.kernel.org/index.php/Status.
btrfs Wiki, 2017e. SysadminGuide - Btrfs Wiki. https://btrfs.wiki.kernel.org/index.php/SysadminGuide.
btrfs Wiki, 2018a. Btrfs Wiki. https://btrfs.wiki.kernel.org/index.php/Main_Page.
btrfs Wiki, 2018b. On-disk Format - Btrfs Wiki. https://btrfs.wiki.kernel.org/index.php/On-disk_Format.
Carrier, B., 2005. File System Forensic Analysis. Addison-Wesley Professional.
Carrier, B., 2017. The Sleuth Kit. https://www.sleuthkit.org/sleuthkit/.
Fleischmann, Stefan, 2012. X-ways Forum: X64-ways Forensics 16.4. http://www.x-ways.net/winhex/forum/messages/1/3685.html?1359801502.
Hilgert, J.N., Lambertz, M., Plohmann, D., 2017. Extending the Sleuth Kit and its underlying model for pooled storage file system forensic analysis. Digit. Invest. 22, S76–S85.
Hilgert, J.N., Lambertz, M., Yang, S., 2018. The Sleuth Kit with Support for BTRFS. https://github.com/fkie-cad/sleuthkit.
Pöschel, Stefan, 2015. Btrfs Support by Basicmaster · Pull Request #413 · Sleuthkit/sleuthkit. https://github.com/sleuthkit/sleuthkit/pull/413.
Rodeh, O., Bacik, J., Mason, C., 2013. BTRFS: the Linux B-Tree filesystem. Transact. Storage (TOS) 9 (9), 1–9, 32.
