Types of Storage Virtualization: Block vs. File
Storage virtualization is the pooling of physical storage from multiple storage devices into what appears
to be a single storage device -- or pool of available storage capacity -- that is managed from a central
console. The technology relies on software to identify available storage capacity from physical devices
and to then aggregate that capacity as a pool of storage that can be used by traditional architecture
servers or in a virtual environment by virtual machines (VMs).
The virtual storage software intercepts input/output (I/O) requests from physical or virtual machines and
sends those requests to the appropriate physical location of the storage devices that are part of the overall
pool of storage in the virtualized environment. To the user, the various storage resources that make up
the pool are unseen, so the virtual storage appears like a single physical drive, share or logical unit
number (LUN) that can accept standard reads and writes.
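The redirection described above can be sketched as a lookup table that translates a virtual block address into a device and a physical block. This is a minimal illustration only, not any vendor's implementation; the names Extent, VirtualVolume, "array-a" and "array-b" are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    virtual_start: int   # first virtual block covered by this extent
    length: int          # number of blocks in the extent
    device: str          # hypothetical name of the backing device
    physical_start: int  # first block of the extent on that device


class VirtualVolume:
    """Presents several physical extents as one contiguous block range."""

    def __init__(self, extents):
        self.extents = sorted(extents, key=lambda e: e.virtual_start)
        self.starts = [e.virtual_start for e in self.extents]

    def resolve(self, virtual_block):
        """Return (device, physical_block) for a virtual block address."""
        i = bisect_right(self.starts, virtual_block) - 1
        if i < 0:
            raise ValueError("block is below the start of the volume")
        extent = self.extents[i]
        offset = virtual_block - extent.virtual_start
        if offset >= extent.length:
            raise ValueError("block is not mapped to any device")
        return extent.device, extent.physical_start + offset


# Two extents on different (hypothetical) arrays form one 1,500-block volume.
volume = VirtualVolume([
    Extent(virtual_start=0, length=1000, device="array-a", physical_start=5000),
    Extent(virtual_start=1000, length=500, device="array-b", physical_start=0),
])
print(volume.resolve(10))    # ('array-a', 5010)
print(volume.resolve(1200))  # ('array-b', 200)
```

The host only ever sees block numbers 0 through 1499; which array serves a given block is decided entirely by the mapping.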
A very basic form of storage virtualization is represented by a software virtualization layer between the
hardware of a storage resource and a host -- a personal computer (PC), a server or any device accessing
the storage -- that makes it possible for operating systems (OSes) and applications to access and use the
storage. Even a RAID array can sometimes be considered a type of storage virtualization. Multiple
physical drives in the array are presented to the user as a single storage device that, in the background,
stripes and replicates data to multiple disks to improve I/O performance and to protect data in case a
single drive fails.
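As a rough illustration of how an array can present several drives as one device, the sketch below maps a logical block to a drive using simple striping and, optionally, to a mirrored pair of drives. It assumes a plain round-robin layout; real RAID controllers differ in layout, parity handling and stripe sizes, and the function names are hypothetical.

```python
def locate_block(logical_block, num_disks, stripe_blocks=1):
    """Map a logical block to (disk index, block offset on that disk).

    Stripes of `stripe_blocks` blocks are laid out round-robin across disks.
    """
    stripe_index = logical_block // stripe_blocks
    disk = stripe_index % num_disks
    row = stripe_index // num_disks
    return disk, row * stripe_blocks + logical_block % stripe_blocks


def mirrored_targets(logical_block, num_pairs, stripe_blocks=1):
    """Stripe across mirrored pairs: each write lands on both disks of a pair."""
    pair, offset = locate_block(logical_block, num_pairs, stripe_blocks)
    return [(2 * pair, offset), (2 * pair + 1, offset)]


print(locate_block(5, num_disks=4))       # (1, 1)
print(mirrored_targets(5, num_pairs=2))   # [(2, 2), (3, 2)]
```

Because consecutive blocks land on different drives, sequential I/O can be served by several drives at once, and the mirrored copy means a single drive failure does not lose data.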
Block-based or block access storage -- storage resources typically accessed via a Fibre Channel (FC) or
Internet Small Computer System Interface (iSCSI) storage area network (SAN) -- is more frequently
virtualized than file-based storage systems. Block-based systems abstract the logical storage, such as a
drive partition, from the actual physical blocks in a storage device, such as a hard disk drive
(HDD) or solid-state memory device. Because block-based virtualization operates in a similar fashion to
the native drive software, there's less overhead for read and write processes, so block storage systems
typically perform better than file-based systems.
The block-based operation enables the virtualization management software to collect the capacity of the
available blocks of storage space across all virtualized arrays and pool them into a shared resource to be
assigned to any number of VMs, bare-metal servers or containers. Storage virtualization is particularly
beneficial for block storage. Managing SANs, unlike managing NAS systems, can be a time-consuming
process; consolidating a number of block storage systems under a single management interface, one that
often shields users from tedious steps such as LUN configuration, can be a significant timesaver.
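The pooling step can be sketched as follows: free capacity from several arrays is tracked in one place, and a volume is carved out of the pool without the requester choosing an array. This is a simplified model under assumed names (StoragePool, "array-a" and so on) and an assumed placement policy (fill from the array with the most free space first); real products use their own policies.

```python
class StoragePool:
    """Aggregates free capacity (in GB) from several arrays into one pool."""

    def __init__(self, array_capacities_gb):
        self.free_gb = dict(array_capacities_gb)
        self.volumes = {}

    @property
    def total_free_gb(self):
        return sum(self.free_gb.values())

    def allocate(self, volume_name, size_gb):
        """Carve a volume out of the pool, spanning arrays if necessary."""
        if volume_name in self.volumes:
            raise ValueError("volume name already in use")
        if size_gb > self.total_free_gb:
            raise ValueError("not enough free capacity in the pool")
        placement = {}
        remaining = size_gb
        # Assumed policy: draw from the array with the most free space first.
        for array in sorted(self.free_gb, key=self.free_gb.get, reverse=True):
            if remaining == 0:
                break
            take = min(remaining, self.free_gb[array])
            if take:
                placement[array] = take
                self.free_gb[array] -= take
                remaining -= take
        self.volumes[volume_name] = placement
        return placement


pool = StoragePool({"array-a": 500, "array-b": 300, "array-c": 200})
print(pool.allocate("vm-datastore", 600))  # {'array-a': 500, 'array-b': 100}
print(pool.total_free_gb)                  # 400
```

The 600 GB request is larger than any single array, yet it succeeds because the pool treats the three arrays as one resource.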
One early storage virtualization product was Hitachi Data Systems' TagmaStore Universal Storage
Platform, now known as Hitachi Virtual Storage Platform (VSP). Hitachi's array-based storage
virtualization enabled customers to create a single pool of storage across separate arrays, even those
from other leading storage vendors.
In-band virtualization -- also called symmetric virtualization -- handles the data that's being read or
saved and the control information (e.g., I/O instructions, metadata) in the same channel or layer. Because
the virtualization layer sits directly in the data path, it can provide more advanced operational and
management functions such as data caching and replication services.
Out-of-band virtualization -- or asymmetric virtualization -- splits the data and control paths. Since
the virtualization facility only sees the control instructions, advanced storage features are usually
unavailable.
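The difference between the two approaches can be sketched with two small classes. In this hypothetical model, the in-band appliance handles every read and write itself, so it can cache data, while the out-of-band metadata server only answers the question of where a block lives and the host then talks to the array directly. All class and function names here are illustrative assumptions.

```python
class BackendArray:
    """Stand-in for a physical array: a dictionary of block -> bytes."""

    def __init__(self):
        self.blocks = {}

    def read(self, block):
        return self.blocks.get(block, b"")

    def write(self, block, data):
        self.blocks[block] = data


class InBandAppliance:
    """Sits in the data path: every read and write passes through it."""

    def __init__(self, arrays, mapping):
        self.arrays = arrays
        self.mapping = mapping  # virtual block -> (array name, physical block)
        self.cache = {}

    def read(self, vblock):
        if vblock in self.cache:  # data is visible here, so it can be cached
            return self.cache[vblock]
        array_name, pblock = self.mapping[vblock]
        data = self.arrays[array_name].read(pblock)
        self.cache[vblock] = data
        return data

    def write(self, vblock, data):
        array_name, pblock = self.mapping[vblock]
        self.arrays[array_name].write(pblock, data)
        self.cache[vblock] = data


class OutOfBandMetadataServer:
    """Handles control requests only; it never sees the data itself."""

    def __init__(self, mapping):
        self.mapping = mapping

    def locate(self, vblock):
        return self.mapping[vblock]


def host_read_out_of_band(metadata_server, arrays, vblock):
    """The host asks where the block is, then reads from the array directly."""
    array_name, pblock = metadata_server.locate(vblock)
    return arrays[array_name].read(pblock)


arrays = {"array-a": BackendArray()}
mapping = {0: ("array-a", 42)}
appliance = InBandAppliance(arrays, mapping)
appliance.write(0, b"hello")
print(host_read_out_of_band(OutOfBandMetadataServer(mapping), arrays, 0))  # b'hello'
```

Since the out-of-band server never touches the payload, it has nothing to cache or replicate, which is why such features tend to be missing in that design.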
Virtualization methods
Storage virtualization today usually refers to capacity that is accumulated from multiple physical devices
and then made available to be reallocated in a virtualized environment. Modern IT methodologies, such
as hyper-converged infrastructure (HCI) and containerization, take advantage of virtual storage, in
addition to virtual compute power and often virtual network capacity.
Although waning as a backup target medium, tape storage is still widely used for archiving infrequently
accessed data. Archival data tends to be voluminous; storage virtualization can be employed for tape
media to make it easier to manage large data stores. Linear Tape File System (LTFS) is a form of tape
virtualization that makes a tape look like a typical NAS file storage device and makes it much easier to
find and restore data from tape using a file-level directory of the tape's contents.
Host-based storage virtualization is software-based and most often seen in HCI systems and cloud
storage. In this type of virtualization, the host, or a hyper-converged system made up of multiple hosts,
presents virtual drives of varying capacity to the guest machines, whether they are VMs in an enterprise
environment, physical servers or PCs accessing file shares or cloud storage. All of the virtualization and
management are done at the host level via software, and the physical storage can be almost any device or
array. Some server OSes have virtualization capabilities built in, such as Windows Server Storage
Spaces.
Array-based storage virtualization most commonly refers to the method in which a storage array acts as
the primary storage controller and runs virtualization software, enabling it to pool the storage resources
of other arrays and to present different types of physical storage for use as storage tiers. A storage tier
may comprise solid-state drives (SSDs) or HDDs on the various virtualized storage arrays; the physical
location and specific array are hidden from the servers or users accessing the storage.
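One way to picture storage tiers is a placement rule that puts the most frequently accessed extents on SSD and everything else on HDD. The sketch below assumes a very simple policy based on access counts; the function name, extent names and the fixed number of SSD slots are hypothetical, and real arrays use more elaborate heuristics.

```python
def assign_tiers(access_counts, ssd_slots):
    """Place the most frequently accessed extents on SSD, the rest on HDD.

    access_counts maps an extent name to its recent access count;
    ssd_slots is how many extents fit on the SSD tier.
    """
    # Rank by access count (highest first); break ties by extent name.
    ranked = sorted(access_counts, key=lambda e: (-access_counts[e], e))
    hot = set(ranked[:ssd_slots])
    return {extent: ("ssd" if extent in hot else "hdd") for extent in access_counts}


counts = {"ext-1": 900, "ext-2": 15, "ext-3": 430, "ext-4": 2}
print(assign_tiers(counts, ssd_slots=2))
# {'ext-1': 'ssd', 'ext-2': 'hdd', 'ext-3': 'ssd', 'ext-4': 'hdd'}
```

The servers using the storage see only the volume; which tier or array holds a given extent remains an internal decision of the virtualizing array.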
Network-based storage virtualization is the most common form used in enterprises today. A network
device, such as a smart switch or purpose-built server, connects to all storage devices in an FC or iSCSI
SAN and presents the storage in the storage network as a single, virtual pool.
Further development of virtualization software, along with standards such as Storage Management
Initiative Specification (SMI-S), allowed virtualization products to work with a wider variety of storage
systems, making storage virtualization a much more attractive option for enterprises struggling with
spiraling storage capacities.
Easier management. A single management console to monitor and maintain multiple virtualized
storage arrays cuts down on the time and effort necessary to manage the physical systems. This is
particularly beneficial when storage systems from multiple vendors are in the virtualization pool.
Better storage utilization. Pooling storage capacity across multiple systems makes it easier to allocate
and use that capacity efficiently. With unconnected, disparate systems, it's likely
some systems will end up operating at or near capacity, while others are barely used.
Extend the life of older storage systems. Virtualization offers a great way to extend the usefulness of
older storage gear by including it in the pool as a tier to handle archival or less critical data.
Add advanced features universally. Some more advanced storage features like tiering, caching and
replication can be implemented at the virtualization level. This helps standardize these practices
across all member systems and can deliver these advanced functions to systems that may be lacking
them.
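The utilization point above can be made concrete with a small calculation. The figures below are invented purely for illustration: three arrays of equal size, one nearly full and two lightly used.

```python
# Hypothetical (used GB, capacity GB) figures for three separate arrays.
arrays = {
    "array-a": (950, 1000),
    "array-b": (120, 1000),
    "array-c": (300, 1000),
}

# Utilization of each array when managed on its own.
per_array = {name: used / cap for name, (used, cap) in arrays.items()}

# Utilization of the same capacity when treated as one pool.
total_used = sum(used for used, _ in arrays.values())
total_capacity = sum(cap for _, cap in arrays.values())
pooled = total_used / total_capacity

for name, ratio in per_array.items():
    print(f"{name}: {ratio:.0%} used")
print(f"pooled: {pooled:.0%} used, {total_capacity - total_used} GB free")
```

Managed separately, the first array is at 95% and close to needing an upgrade; viewed as a pool, the same hardware is under half full, with 1,630 GB still available to any workload.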