Redbooks
IBM Redbooks
November 2021
SG24-8516-00
Note: Before using this information and the product it supports, read the information in “Notices” on page v.
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .v
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Chapter 1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 IBM Spectrum Scale RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Product history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Distinguishing features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 IBM Elastic Storage System (ESS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 IBM Elastic Storage System 3200. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Value added . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 License considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
This information was developed for products and services offered in the US. This material might be available
from IBM in other languages. However, you may be required to own a copy of the product or product version in
that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are
provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use
of the sample programs.
The following terms are trademarks or registered trademarks of International Business Machines Corporation,
and might also be trademarks or registered trademarks in other countries.
AIX®, IBM®, IBM Cloud®, IBM Elastic Storage®, IBM FlashSystem®, IBM Spectrum®, POWER®, POWER7®, POWER8®, POWER9™, Redbooks®, and Redbooks (logo)®.
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive
licensee of Linus Torvalds, owner of the mark on a worldwide basis.
Ansible and Red Hat are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the United
States and other countries.
Other company, product, or service names may be trademarks or service marks of others.
This IBM® Redbooks® publication introduces and describes the IBM Elastic Storage®
Server 3200 (ESS 3200) as a scalable, high-performance data and file management solution.
The solution is built on proven IBM Spectrum® Scale technology, formerly IBM General
Parallel File System (IBM GPFS).
IBM Elastic Storage System 3200 is an all-flash array platform. This storage platform uses
NVMe-attached drives to provide significant performance improvements compared to
SAS-attached flash drives.
This book provides a technical overview of the ESS 3200 solution and helps you to plan the
installation of the environment. We also explain the use cases where we believe it fits best.
Our goal is to position this book as the starting-point document for customers who plan to use
the ESS 3200 as part of their IBM Spectrum Scale setups.
This book is targeted toward technical professionals (consultants, technical support staff, IT
Architects, and IT Specialists) who are responsible for delivering cost-effective storage
solutions with ESS 3200.
Authors
This book was produced by a team of specialists from around the world working at IBM
Redbooks, Tucson Center.
Luis Bolinches has worked with IBM Power Systems servers
for over 15 years, and has been with IBM Spectrum Scale for
over 10 years. He works 20% for IBM Systems Lab Services in
the Nordic region, and the other 80% as part of the IBM
Spectrum Scale development team.
Wesley Jones serves as the test-team lead for IBM Spectrum
Scale Native RAID. He also serves as one of the principal
deployment architects for IBM ESS. His focus areas include
IBM Power servers, IBM Spectrum Scale (GPFS), cluster
software (xCAT), Red Hat Linux, Networking (especially
InfiniBand and Gigabit Ethernet), storage solutions,
automation, and Python.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
Mail your comments to:
IBM Corporation, IBM Redbooks
Dept. HYTD Mail Station P099
Chapter 1. Introduction
This chapter introduces the IBM Elastic Storage System 3200 (ESS 3200) solution and the
characteristics of the IBM Spectrum Scale RAID (Redundant Array of Independent Disks)
software that runs on the ESS 3200, and provides an overview of the system.
ESS 3200 is a high-performance, NVMe flash-storage member of the IBM Spectrum Scale
and Elastic Storage System family of storage solutions for high-performance, highly scalable
Data and AI applications. For an overview of how ESS 3200 fits into this overall family, see the
companion IBM Redpaper Introduction Guide to the Elastic Storage System, REDP-5253.
The IBM Spectrum Scale RAID software in ESS 3200 uses local NVMe drives. Because
RAID functions are handled by the software, ESS 3200 does not require an external RAID
controller or acceleration hardware.
IBM Spectrum Scale RAID in ESS 3200 supports two and three fault-tolerant RAID codes.
The two-fault tolerant codes include 8-data plus 2-parity, 4-data plus 2-parity, and 3-way
replication. The three-fault tolerant codes include 8-data plus 3-parity, 4-data plus 3-parity,
and 4-way replication. Figure 1-1 shows example RAID tracks consisting of data and parity
strips.
In 2007, IBM released the first market product based on IBM Spectrum Scale RAID, the P7IH.
The system is based on the IBM POWER7® system and SAS disks, and delivered tens of
gigabytes per second of storage throughput at that time.
Although the P7IH was, and still is, a fantastic engineering machine, in 2012 IBM released the
GSS platform, which ran what is known today as IBM Spectrum Scale RAID on commodity
hardware.
In 2014, IBM superseded the GSS with the first ESS. It was based on the IBM POWER8 system
and used commercially available servers and disk enclosures, while still running the same IBM
Spectrum Scale RAID that was designed in 2003.
The third generation of IBM Elastic Storage Server was announced starting in October 2019
with the ESS 3000 NVMe flash storage system. This announcement was followed by the
ESS 5000 and, in the second quarter of 2021, the ESS 3200 that is described in this book.
IBM Spectrum Scale RAID implements end-to-end checksums and data versions to detect
and correct the data integrity problems of traditional RAID. Data is checked from the PDisk
blocks on the ESS 3200 to the memory on the clients that connect over the network. The
same checksum is used throughout, rather than layered or serialized checksums that
terminate partway along the chain, so it is truly an end-to-end checksum.
Figure 1-2 shows a simple example of declustered RAID. The left side shows a traditional
RAID layout that consists of three 2-way mirrored RAID volumes and a dedicated spare disk
that uses seven drives. The right side shows the equivalent declustered layout, which still
uses seven drives. Here, the blocks of the three RAID volumes and the spare capacity are
scattered over the seven disks.
The declustered RAID layout provides the following advantages over the traditional RAID
layout:
Figure 1-3 shows a significant advantage of the declustered RAID layout over the
traditional RAID layout after a drive failure. With the traditional RAID layout on the left side
of Figure 1-3, the system must copy the surviving replica of the failed drive to the spare
drive, reading only from one drive and writing only to one drive.
However, with the declustered layout that is shown on the right of Figure 1-3, the affected
replicas and the spares are distributed across all six surviving disks. This configuration
rebuilds reads from all surviving disks and writes to all surviving disks, which greatly
increases rebuild parallelism.
Another advantage of the declustered RAID technology that is used by ESS 3200 (and
other IBM systems) is that it minimizes the worst-case number of critical RAID tracks in the
presence of multiple disk failures. ESS 3200 can then handle restoring protection to
critical RAID tracks as a high priority, while giving lower priority to RAID tracks that are not
considered critical.
For example, consider an 8+3p RAID code on an array of 100 PDisks. In both the traditional
layout and the declustered layout, the probability that a specific RAID track is critical is 11/100
× 10/99 × 9/98 (about 0.1%). However, when a track is critical in the traditional RAID array, all
tracks in the volume are critical, whereas with declustered RAID, only 0.1% of the tracks
are critical. By prioritizing the rebuild of more critical tracks over less critical tracks, ESS
3200 quickly gets out of critical rebuild and then can tolerate another failure.
ESS 3200 adapts these priorities dynamically; if further drive failures affect a non-critical
RAID track, that RAID track's rebuild priority can be escalated to critical.
A third advantage of declustered RAID is that it makes it possible to support any number
of drives in the array and to dynamically add and remove drives from the array. Adding a
drive in a traditional RAID layout (except in the case of adding a spare) requires significant
data reorganization and restriping. In a declustered array, however, only targeted data
movement is needed to rebalance the array and include the added drive.
https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=planning
You can also see the following IBM Redpaper: Introduction Guide to the IBM Elastic Storage
System, REDP-5253.
IBM Elastic Storage System 3200 can contain up to 24 NVMe-attached SSD drives, and is
available with 12 drives (half-populated) or 24 drives (fully populated).
For details on the ESS 3200, see the following IBM Documentation link:
https://www.ibm.com/docs/en/ess/6.1.1_cd
The third-generation ESS 3200 addresses the challenges of managing today's data. ESS
3200 delivers a new generation of high-performance, software-defined flash storage. It builds
on years of experience and couples proven IBM Spectrum Scale software with lightning-fast
NVMe storage technology to offer industry-leading file management capabilities. The ESS
3200 builds on and extends a track record of meeting the needs of the smartest, most
demanding organizations. The ESS 3200 is up to 100% faster than the previous generation of
ESS NVMe storage.
Figure 1-4 shows the ESS 3200 NVMe storage building block.
Note: IBM ESS performance figures are available upon request from your IBM or IBM Business
Partner representative. They use the IBM File and Object Solution Design Engine to estimate
performance based on your workload and network environment.
Optimum IBM ESS performance is derived from an unconstrained IOR benchmark for 100%
sequential reads over unconstrained InfiniBand networks. Other networks
(such as 100 GbE, 40 GbE, and 10 GbE) have more overhead than InfiniBand and typically
result in lower aggregate bandwidth.
For more information, contact your IBM or IBM Business Partner representative.
An implementation by a storage specialist from IBM Systems Lab Services is not required if you
have an ESS and IBM Spectrum Scale system and you are comfortable with ESS implementations.
The ESS 3200 is much easier to install, and maintenance can be performed by your IT staff.
If you are unfamiliar with IBM Spectrum Scale and ESS, IBM recommends that IBM
Systems Lab Services be engaged to ensure high satisfaction with your initial ESS 3200
installation.
Fast time-to-value
ESS 3200 combines IBM Spectrum Scale file management software with NVMe flash storage
for the ultimate in scale-out performance and simplicity, delivering 80 GB/s of data throughput
per 2U system.
Operational efficiency
The demands on IT staff time and expertise are minimized by the containerized software
install and a powerful management GUI. Dense storage within a 2U package means a small
data center footprint.
Reliability
The software-defined erasure coding ensures data recovery while using less space than data
replication. Restores can take minutes, rather than hours or days, and can be run without
disrupting operations.
Deployment flexibility
ESS 3200 is available in a wide range of capacities, from tens to hundreds of terabytes per
2U. Deploy it as a standalone edge system, or scale out with additional ESS 3200 systems or
with other IBM Elastic Storage System models.
ESS uses capacity-based licensing, which means that a customer can connect as many
clients as desired without extra license costs. For other types of configurations, contact IBM
or your IBM Business Partner for license details.
For more information about licensing on IBM Spectrum Scale and ESS, see the following IBM
Documentation links:
https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=overview-capacity-based-licensing
https://www.ibm.com/docs/en/spectrum-scale?topic=STXKQY/gpfsclustersfaq.html#moreinfo
CPU
The ESS 3200 system uses a single socket AMD EPYC Rome processor per I/O canister
node for a total of two CPUs per enclosure. Figure 2-1 shows a CPU in a canister.
Memory
The memory of each canister and enclosure is depicted in Table 2-1.
DIMMs per canister: 8 (64 GB DIMMs only)
Memory per canister: 512 GB
DIMMs per enclosure: 16
Memory per enclosure: 1 TB
Networking
The ESS 3200 includes two adapters per server canister that consist of the following features:
InfiniBand - EDR 100 Gb / HDR100 100 Gb / HDR200 200 Gb
Ethernet - 100 GbE
The systemctl command can be run on the EMS to start or stop the GUI. Table 2-2 shows the
systemctl command options.
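As a minimal sketch of the options that Table 2-2 summarizes, and assuming that the GUI runs as the gpfsgui service on the EMS, the following commands can be used:
systemctl status gpfsgui     # check whether the GUI is running
systemctl start gpfsgui      # start the GUI
systemctl stop gpfsgui       # stop the GUI
systemctl restart gpfsgui    # restart the GUI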
To access the GUI, enter the IP address or host name of the EMS in a web browser using the
secure https mode (https://<IP or hostname of EMS>).
When the GUI is used for the first time, an initial user must be created:
After the initial user is created, a user can log in to the GUI with the newly created user and
create more users on the Services → GUI → Users page. By default, users are stored in an
internal user repository. Alternatively, an external user repository can be used. An
external user repository can be configured on the Services → GUI → External
Authentication page.
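GUI users can also be created from the EMS command line. The following is a sketch only, assuming the default IBM Spectrum Scale GUI CLI location and a hypothetical user name:
/usr/lpp/mmfs/gui/cli/mkuser admin -g SecurityAdmin    # create a GUI user in the SecurityAdmin group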
2. After all checks pass, the user can define where the ESS 3200 systems are installed on
the Racks page. The user can either choose a predefined rack type or choose Add new
specification if none of the rack types match the available rack. It is important that the
selected rack type has the same number of height units as the physical rack. A meaningful
name can be specified for the racks to be created. See Figure 2-5.
4. The ESS 3200 systems are assigned to the rack locations in which they are mounted. See
Figure 2-7.
The header area of the GUI provides a quick view of the current health problems and tips for
improvement, if applicable. Additionally, links exist to some help resources.
Some menus, such as Protocols, are displayed only when the related features, such as NFS,
SMB, or AFM, are enabled.
The table values can be sorted by clicking one of the column headers. A small arrow in the
table header indicates the sort order.
Double-click a table row to open a more detailed view of the selected item.
The Replace Broken Disks action launches a guided procedure to replace broken disks if
there are any.
Click the ESS 3200 in the virtual rack to see more information about the ESS 3200, including
the physical disks and the two canisters (Figure 2-12). Move the mouse over the components,
such as drives and power supplies, to see more information. Clicking components moves to a
page with more detailed information. Broken disks are indicated with the color red, and a
context menu (right-click) enables the user to replace the selected broken disk.
If there is more than one rack, click the arrows that are displayed on the left and the right side
of the rack to switch to another rack.
This page allows the user to search for components by text and filter the results to display
only unhealthy hardware.
Click the > icon on a tree node to expand it and display its children. For example, the user can
click to display all Current Sensors of the canister in Figure 2-13.
The procedure can be launched from different places. A good place to look for broken disks is
the Storage → Physical Disks page that is shown in Figure 2-15.
The user can mark events of type Notices as read to change the status of the event in the
Events view. The status icons become gray if an error or warning is fixed, or if it is marked as
read.
Some issues can be resolved by using the Run Fix Procedure action, which is available on
select events. Right-click an event in the events table to see this option.
The emails can be sent to multiple email recipients, which are defined on the Monitoring →
Event Notifications → Email Recipients page. For each recipient, the user can select the
components for which to receive emails and the minimum severity level (Tip, Info,
Warning, or Error). Instead of receiving a separate email per event, a daily
summary email can optionally be sent. Another option is to receive a daily quota report. See
Figure 2-19.
2.2.9 Dashboards
The Monitoring → Dashboard page provides an easy-to-read, single-page, real-time user
interface that provides a quick overview of the system performance.
Some default dashboards are included with the product. Users can further modify or delete
the default dashboards to suit their requirements and can create additional new dashboards.
The same dashboards are available to all GUI users, so modifications are visible to all users.
A dashboard consists of several dashboard widgets that can be displayed within a chosen
layout.
Widgets are available to display the following items, as shown in Figure 2-20:
– Performance metrics
– System health events
– File system capacity by file set
– File sets with the largest growth rate in the last week
– Time lines that correlate performance charts with health events
IBM preinstalls this complete integrated, tested ESS solution stack on the ESS servers in IBM
Manufacturing.
The ESS solution-stack levels are released as a version, release, modification, and fixpack
level.
For more information about the release levels of the ESS software solution and the levels of
the software components for that ESS release level, see IBM Documentation at:
https://www.ibm.com/docs/en/ess-p8?topic=SSYSP8/gnrfaq.html#GNRfaqSept2016-gen4supportmatrixq
https://www.ibm.com/docs/en/ess/6.1.1_cd?topic=quick-deployment-guide
The ESS solution-stack components are periodically up-leveled, tested, and released as a
new level of ESS solution software. IBM recommends that clients plan to upgrade their ESS
to the current level of ESS solution software stack at least once a year.
Figure 2-21 shows the tree of the ansible directory included in the container.
The roles directory contains a set of folders, each of which contains the tasks for a specific
purpose. For example, configureenv contains the set of tasks behind essrun config load.
If you want to import or use the roles within your own Ansible playbook, import the roles and
the vars.yml file, because vars.yml contains several variables that are used within every role.
Example 2-1 shows how to import an ESS role into your own Ansible Playbook:
- hosts: all  # target hosts from your inventory
  vars_files:
    - /opt/ibm/ess/deploy/ansible/vars.yml
  # importing roles
  tasks:
    - include_role:
        name: /opt/ibm/ess/deploy/ansible/roles/configureenv
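A hypothetical invocation of such a playbook from inside the deployment container might look like the following (the inventory and playbook names are placeholders):
ansible-playbook -i inventory my_ess_playbook.yml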
The mmvdisk command is an integrated command suite for IBM Spectrum Scale RAID. It
greatly simplifies IBM Spectrum Scale RAID administration, and encourages and enforces
consistent best practices regarding server, recovery group, VDisk NSD, and file system
configuration.
The mmvdisk command can be used to manage new IBM Spectrum Scale RAID installations.
If you are integrating ESS 3200 with a setup that already has other ESS systems that use
non-mmvdisk recovery groups, those systems must be online-converted into mmvdisk recovery
groups before you add the ESS 3200 into the same cluster.
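For illustration, the following mmvdisk queries show the main objects that the command suite manages (a sketch; the output depends on your system):
mmvdisk nodeclass list       # list mmvdisk node classes
mmvdisk recoverygroup list   # list recovery groups and their servers
mmvdisk vdiskset list        # list vdisk sets and their attributes
mmvdisk filesystem list      # list file systems built from vdisk sets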
For more information about the mmvdisk command, see the following IBM Documentation:
https://www.ibm.com/docs/en/ess-p8/6.1.1?topic=commands-mmvdisk-command
For any type of cluster, the holistic status includes the status of the General Parallel File System
(GPFS) daemons, the NODE software status, tracking of EVENTS that happened to the
cluster, and the FILESYSTEM health status.
The depth of the details for one NODE depends on a few factors:
Whether the node is a software-only node, where IBM Spectrum Scale only formats external
block devices for the cluster.
Whether the system is one that uses IBM Spectrum Scale RAID, such as the ESS 3200. In the
ESS 3200 case, mmhealth monitors and reports the following non-exhaustive list:
– Hardware specific, the same as other ESS hardware solutions:
• Temperature of different sensors of the enclosure
• Power supply hardware status
• Fan speeds and status
The mmhealth command provides IBM Spectrum Scale software-related checks across all
node and device types present in the cluster. Software RAID checks are present across all
GPFS Native RAID (GNR) offerings (such as ESS 5000, ESS 3000, ESS 3200, and ECE). For
devices (such as ESS 3200) that are integrated with IBM Spectrum Scale hardware, you also
get the hardware checks and monitoring.
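As a brief sketch of typical mmhealth usage on an ESS 3200 I/O node (component names can vary by release; NATIVE_RAID is the IBM Spectrum Scale RAID component):
mmhealth cluster show                 # overall cluster health by component
mmhealth node show                    # health of all components on this node
mmhealth node show NATIVE_RAID -v     # detailed IBM Spectrum Scale RAID health
mmhealth node eventlog                # recent health events on this node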
For details about how to operate with the mmhealth command, see IBM Documentation at:
https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=reference-mmhealth-command
The mmhealth command changes to support ESS 3000 and ESS 3200
The mmhealth command, as described in “The mmhealth command” on page 27, has a
specific component monitoring when an IBM Spectrum Scale Native RAID (GNR)
environment is in use. As of the June 2021 release of ESS 3200 (version 6.1.1.x), mmhealth
supports GNR health monitoring on the 5141-FN1 solution (x86-based NVMe platform).
The mmhealth command was extended to support the additional hardware components of the
ESS 3000 and ESS 3200, and to address the needs of users who want to monitor the
environment. This section initially references much of the current mmhealth information
available through IBM Redbooks, command references, administration documents, and other
publicly available resources. The second section describes the specific additional changes
that are made in mmhealth to support ESS 3000 and ESS 3200.
The following link shows all of the current Reliability, Availability and Serviceability (RAS)
events supported by mmhealth. These include all events supported by IBM Spectrum Scale,
with a subset specific to GNR:
https://www.ibm.com/docs/en/ess-p8/6.1.1?topic=references-events
Canister events are new to ESS 3000 and ESS 3200, and are described in “Canister events”
on page 29. The rest of the events are applicable to ESS (legacy), ESS 3000, and ESS 3200.
https://www.redbooks.ibm.com/abstracts/redp5557.html?Open
The following link is the main page for mmhealth. It shows the complete command usage,
features, and examples:
https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=reference-mmhealth-command
The mmhealth command includes several changes to support ESS 3000 and ESS 3200.
The Canister events category was included to support many of the differences between
legacy ESS and ESS 3200.
The Server category was also adjusted.
Both of these adjustments and other changes to mmhealth are included in the following
sections.
Canister events
These events are new and specifically added to support the new Canister-based
building-block configuration of the ESS 3000 and ESS 3200. For information such as events
related to the boot drive, temperature, CPU, and memory, see:
https://www.ibm.com/docs/en/ess-p8/6.1.1?topic=events-canister
A new command (ess3kplt) was created by GNR to provide CPU and memory health
information to mmhealth:
/opt/ibm/gss/tools/bin/ess3kplt
Command usage:
ess3kplt -h
usage: ess3kplt [-h] [-t SELECTION] [-Y] [-v] [--local]
Optional arguments:
-h, --help show this help message and exit
-t SELECTION Provide selection keyword: [memory|cpu|all]
-Y Select report listing
-v Enable additional output
--local Select localhost option
CPU and DIMM-related events that mmhealth reports rely on the ess3kplt command in the
ESS 3000 and ESS 3200 environments.
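Based on the usage that is shown above, the following is a sketch of how ess3kplt might be invoked to report CPU health for the local canister in colon-delimited form:
/opt/ibm/gss/tools/bin/ess3kplt -t cpu -Y --local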
ESS 3200 is targeted at delivering the following key traits of an appliance customer experience:
Easy to order
Easy to install
Easy to upgrade
Easy to use
Easy to service
ESS 3200 offers Samsung-only NVMe drives with the following capacity options at its initial
GA in 2Q 2021, in either 12-drive or 24-drive configurations:
3.8 TB
7.6 TB
15.3 TB
ESS 3200 uses a mirrored set of 960 GB M.2 NVMe SSD drives as the boot disks. The M.2
SSD includes the Power Loss Protection (PLP) feature. It offers only one
memory-configuration per canister, as shown in Table 2-3.
Given the IBM Spectrum Scale GNR design and the M.2 PLP feature to ensure the data
persistency for GNR log files maintained in the boot disks, ESS 3200 does not require a
Battery Backup Unit (BBU).
When you plan to install a 100G adapter, the following adapters are available:
AJZL: CX-6 InfiniBand/VPI in PCIe form factor
– InfiniBand - HDR200 200 Gb / HDR100 100 Gb / EDR 100 Gb
– Ethernet - 100 GbE
AJZN: CX-6 DX in PCIe form factor (Ethernet - 100 GbE)
Figure 2-22 shows the front view of the ESS 3200 enclosure.
Figure 2-23 shows the rear view of the ESS 3200 enclosure.
ESS 3200 also offers same-day service upgrade options, and optional priced services that
include lab-based services (LBS) installation.
The following are key hardware components that fall into the FRU category:
Canister
Dual in-line memory module (DIMM)
Adapter
M.2 boot drive
The following are key hardware components that fall into the CRU category:
NVMe drive
Drive filler
Power module
https://www.ibm.com/docs/en/ess/6.1.1_cd?topic=service-guide
Because IBM Documentation is a single point of reference that provides information about IBM
systems hardware, operating systems, and server software, it is recommended that the user
search for ESS 3200 to get the latest updates, because the online source is actively
maintained. A Service Guide option is listed that brings up the appropriate service procedure.
If preferred, the ESS Service Guide can be downloaded by using the link in the navigation.
Concurrent maintenance repair and update are available on the ESS 3200 for servicing and
replacing of FRUs and CRUs in the system. This allows for maintenance to be performed on
the system while it is used in normal operations. In addition, CRUs are hot-swappable
components in the ESS 3200 that allow for replacement without powering down the server.
Components such as fans and power modules are supported as concurrently removable and
replaceable parts of the ESS 3200.
https://www.ibm.com/docs/en/spectrum-scale-ece/5.1.1?topic=commands-mmlsenclosure-command
Enclosure components
Several component statuses are reported by the enclosure:
canister: Failure of the left or right canister (for example, the server nodes)
cpu: The failed CPUs that are associated with a canister
dimm: The memory modules that are associated with a canister
fan: The status of fans in the enclosure
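A sketch of how these statuses can be queried from the command line (details are shown only for components that are not healthy):
mmlsenclosure all -L --not-ok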
Call-home events are also generated for other hardware-related events including boot drives,
fan modules, and power modules. Software call home is a supported call-home configuration
in the ESS 3200. In this case, the EMS node acts as the software call-home server. The
software call-home feature collects files, logs, traces, and details of certain system health
events from different nodes and services in an IBM Spectrum Scale cluster.
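A minimal sketch of configuring software call home with the mmcallhome command (the customer details are placeholders; your values come from your IBM account information):
mmcallhome info change --customer-name "Example Co" --customer-id "1234567" --email "admin@example.com" --country-code "US"
mmcallhome capability enable        # accept the call home terms and enable the feature
mmcallhome group auto               # automatically create call home groups
mmcallhome status list              # verify the call home configuration and recent uploads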
For more detailed information about how call home works, see the Elastic Storage Server
Version 6.1.1 Quick Deployment Guide:
https://www.ibm.com/docs/en/SSZL24_5K_6.1.1/pdf/ess_sg.pdf
https://www.ibm.com/docs/en/linux-on-systems?topic=tools-electronic-service-agent
Note: The performance measurements referenced here were made using standard
benchmarks in a controlled environment. The actual performance might vary depending on
several factors such as the interconnection network, the configuration of client nodes, and
the workload characteristics.
Some factors related to ESS 3200 performance are listed in the following sections.
2.5.1 Network
Network components play a key role in the overall performance of IO operations. This section
provides details of types of network hardware, associated configuration, and utilities in
assessing the network device throughput.
RDMA is available on standard Ethernet-based networks by using the RDMA over Converged
Ethernet (RoCE) interface. For more information on how to set up RoCE, see Highly Efficient
Data Access with RoCE on IBM Elastic Storage Systems and IBM Spectrum Scale,
REDP-5658.
After ESS 3200 installation, if required, the network type can be changed by using the mmchnode
command. For more information, see the mmchnode command in IBM Documentation.
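For example, the daemon interface of a node can be switched to a different network with a command of the following form (a sketch only; the node and interface names are hypothetical, and IBM Spectrum Scale must be stopped on the affected node first):
mmchnode --daemon-interface=essio1-hs.example.com -N essio1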
When RDMA is used, verify the following settings by using the mmlsconfig command. These settings
can be modified to the desired values by using the mmchconfig command (see the sketch after this list).
The verbsRdma option controls whether RDMA (instead of TCP) is used for NSD data
transfers. Valid values are enable and disable.
The verbsRdmaSend option controls whether RDMA (instead of TCP) and is also used for
most non-data IBM Spectrum Scale daemon-to-daemon communication. Valid values are
yes and no.
The verbsPorts option specifies the device names and port numbers that are used for
RDMA transfers between IBM Spectrum Scale client and server nodes. You must enable
verbsRdma to enable verbsPorts.
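A minimal sketch of verifying and changing these settings (the verbsPorts device names and the node class name are hypothetical and depend on your adapters):
mmlsconfig verbsRdma verbsRdmaSend verbsPorts
mmchconfig verbsRdma=enable,verbsRdmaSend=yes -N ess3200_clients
mmchconfig verbsPorts="mlx5_0/1 mlx5_1/1" -N ess3200_clients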
Link Aggregation
Bonding, or link aggregation, can be an important factor for Ethernet TCP/IP performance.
Most deployments use the Link Aggregation Control Protocol (LACP), or 802.3ad, as the
aggregation mode. LACP determines the interface to use based on a hash
of the packet's source and destination information.
With multiple connections over TCP (MCOT), multiple TCP port numbers are used. By using the
LACP xmit_hash_policy=1 or layer3+4 setting (the hash is generated by using the IP and port
information of the source and destination), a better chance exists of using multiple interfaces
between a given pair of nodes. The load-balancing algorithm on the switch is also important to
ensure better balancing across the links from the switch to the destination.
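As an illustration, the transmit hash policy can be set on a Linux bond that is managed by NetworkManager with commands such as the following (a sketch; the connection name bond-hs is hypothetical):
nmcli connection modify bond-hs bond.options "mode=802.3ad,miimon=100,xmit_hash_policy=layer3+4"
nmcli connection up bond-hs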
Note: A NAND (short for NOT AND) flash NVMe drive has the potential for improved
performance over a NAND flash SATA drive because of its more efficient bus connection
and protocol improvements. For example, NVMe allows for longer command queues.
Example 2-4 192 (512 byte) sectors were read in 125 microseconds
I/O start time RW Buf type disk:sectorNum nSec time ms tag1 tag2 Disk UID typ NSD node context thread […]
--------------- -- ----------- ----------------- ----- ------- --------- ------------ ------------------ --- --------------- --------- ---------[…]
19:07:22.558042 R data 19:7219193624 192 0.125 83733675 103 C0A85216:61002B87 pd Pdisk NSDThread […]
To look at the I/O latencies of requests at the NSD layer on the ESS 3200 server, look for srv
layer I/O times. These times show I/O latencies that account for disk I/O times and NSD
processing on the server.
On the ESS 3200, disk I/Os are generally faster than on the non-NVMe based ESS models.
This means that the ratio of time spent in remote procedure call (RPC) overhead tends to be
higher relative to the actual disk I/O times. For this reason, systems that support RDMA
should enable the verbsRdmaSend option, so that RPCs can be handled through low latency
RDMA operations.
Example 2-5 shows a 195-microsecond network shared disk (NSD) server (srv) layer I/O on
an ESS 3200, which corresponds to the previously shown 125-microsecond pDisk I/O.
Example 2-5 shows 128 sectors rather than the 192 sectors that are shown in Example 2-4,
because the pDisk layer I/O accounts for additional sectors that are read for checksum validation.
Example 2-5 195-microsecond NSD server (srv) layer I/O on an Elastic Storage System 3200
I/O start time RW Buf type disk:sectorNum nSec time ms tag1 tag2 Disk UID typ NSD node context thread […]
--------------- -- ----------- ----------------- ----- ------- --------- ------------ ------------------ --- --------------- --------- --------- […]
19:07:22.558003 R data 2:207962112 128 0.195 83733675 103 C0A85216:61002E71 srv 100.168.85.111 NSDWorker NSDThread […]
If an I/O is satisfied from the GNR cache, there is no corresponding pdisk-level I/O as
seen in Example 2-4. Even though drive accesses on the ESS 3200 are more efficient
than comparable drive accesses on non-NVMe devices, the relative benefit of data residing in
the GNR disk cache is lower. However, improved performance is still expected because
elements can be efficiently swapped in and out of the GNR cache.
To see the latency of I/O requests from the client's perspective, look for 'cli' I/O times on the
client-node in the output of mmdiag --iohist. (These times include network processing time
and the time that requests wait for processing on the server.) Example 2-6 shows that for the
previously shown 128 sector I/O, it took about 575 microseconds from the client's perspective.
For more information, see Managing TRIM support for storage space reclamation.
Optimal performance is achieved when both servers access all the drives in parallel,
because a single server does not have sufficient PCIe bandwidth to drive all 24 disks at full
bandwidth.
2.5.4 Tuning
ESS 3200 configuration parameters are set automatically for optimal performance during the
installation procedures. This section describes the key configuration parameters on the I/O
servers and client nodes that might need to be modified further to achieve optimal
performance. Modifications are based on the nature of the application I/O activities.
IBM Spectrum Scale version 5 introduced variable subblock sizes, making space allocation
for smaller files more efficient with larger block sizes, and improving file creation and block
allocation times. With variable subblock sizes, it is advised to avoid using different block sizes
for data and metadata within the same file system. Setting a metadata block size that is smaller
than the data block size results in a larger subblock size for user storage pools, which makes
block allocation slower than when the same block size is used for both metadata and data.
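For illustration, the block size is chosen when a vdisk set is defined with mmvdisk. The following is a sketch only; the vdisk set name, recovery group name, and sizing values are hypothetical:
mmvdisk vdiskset define --vdisk-set vs1 --recovery-group rg1 --code 8+2p --block-size 4m --set-size 80%
mmvdisk vdiskset create --vdisk-set vs1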
Example 2-7 The mmvdisk server configure command with the --verify option
# mmvdisk server configure --nc ess3200_x86_64_mmvdisk_78E4005 --verify
mmvdisk: Checking resources for specified nodes.
mmvdisk: Node class 'ess3200_x86_64_mmvdisk_78E4005' has 503 GiB total real memory
per server.
mmvdisk: Node class 'ess3200_x86_64_mmvdisk_78E4005' has a shared recovery group
disk topology.
mmvdisk: Node class 'ess3200_x86_64_mmvdisk_78E4005' has server disk topology 'ESS
3200 FN1 12 NVMe'.
mmvdisk: Node class 'ess3200_x86_64_mmvdisk_78E4005' uses 'ess3200.shared'
recovery group configuration.
The tuned profile should be set to “scale” automatically after ESS deployment. Check the
current active profile by using the tuned-adm active command, as shown in Example 2-8.
If the tuned profile is not set to “scale”, modify it by using the tuned-adm profile command,
as shown in Example 2-9.
The system settings can be verified against current profile by using the tuned-adm verify
command, as shown in Example 2-10.
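The following is a minimal sketch of the commands that Examples 2-8 through 2-10 illustrate:
tuned-adm active          # show the currently active profile
tuned-adm profile scale   # switch to the ESS "scale" profile if necessary
tuned-adm verify          # verify system settings against the active profile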
Client Tuning
After the client cluster is created in the installation phase, extra networking and performance
settings can be applied. The gssClientConfig.sh script can be used to apply basic best
practice settings for your client NSD cluster. Running this script with the -D option shows the
configuration settings that it intends to set without setting them.
Additionally, this script attempts to configure the client nodes for RDMA access by setting the
following mmchconfig parameter values if applicable:
verbsRdma enable
verbsRdmaSend yes
pagepool
Defines the amount of memory to be used for caching file system data and metadata.
It is also used in a few non-caching operations, such as buffers that are allocated for
encryption and DMA transfers for direct I/O (DIO) data.
pagepool is a pinned memory region that cannot be swapped out to disk; that is, IBM
Spectrum Scale always consumes at least the value of the pagepool attribute in
system memory. Users need to consider the memory requirements of other applications that
are running on the node when determining a value for the pagepool attribute.
For best sequential performance, tuning the pagepool attribute is required.
Increasing pagepool beyond the default value is most beneficial for workloads (non-direct I/O) that
re-read the same data, because more data can be cached in the pagepool.
Use the -P option of the gssClientConfig.sh script to set the pagepool value as follows:
# gssClientConfig.sh -P <size in MiB> <node names>
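Alternatively, pagepool can be adjusted directly with mmchconfig. The following is a sketch; the value and the node class name ess_clients are hypothetical and should be sized for your nodes:
mmchconfig pagepool=16G -N ess_clients
mmlsconfig pagepool                      # confirm the configured value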
ESS 3200 storage building blocks are part of an overall IBM Spectrum Scale solution. Other
elements of this solution might include (but are not limited to) other ESS models, Spectrum
Scale client or server or protocol nodes, integration with the customer network, physical
installation, racking, and cabling.
Planning for these broader factors is described in the IBM Redpaper Introduction Guide to the
Elastic Storage System, REDP-5253, Chapter 3, “IBM Elastic Storage System planning and
integration”.
A TDA is an internal IBM process that includes a technical inspection of a completed solution
design. This process helps assure customer satisfaction and ensures a smooth and timely
installation. Technical Subject Matter Experts (SMEs) who were not involved in the solution
design participate to determine:
• Will the ESS 3200 solution work?
• Is the implementation plan sound?
• Will it meet customer requirements and expectations?
The two TDA processes include assessment questions and also baseline benchmarks that
must be performed before the order can be fulfilled. Those tools are driven by IBM sales or
resellers, so they can help and direct you regarding this process.
Rack solution
The ESS 3200 comes with at least one rack from IBM; it can come with more if multiple
building blocks (BB) are ordered. The rack also holds the ESS Management Server and the
management switch. If the HS switches are ordered from IBM, those are also included in the
rack.
Although the preferred option is the rack version of the ESS 3200 solution, it is possible to
order ESS 3200 without the rack. If you choose to follow this path, you must verify that the
rack can hold the weight of the solution, and that the power distribution units (PDUs) on the
rack are the right ones for the ESS 3200 solution. In addition, you must contact IBM to
configure the management switch that comes with the solution.
Introduced with ESS 3200 is a new deployment configuration in which the 1 GbE management
switches have a specific configuration where ports 1 - 12 are “ESS 3200" ports, as shown in
Figure 3-1.
Figure 3-1 is an example of the 1GbE Management Switch deployment. Deployment specifics
can change from release to release. Always check the ESS Quick Deployment Guide for the
information for your ESS implementation. At the time of this writing, the most recent ESS
Quick Deployment Guide can be found at:
https://www.ibm.com/docs/en/ess/6.1.1?topic=quick-deployment-guide
Figure 3-1 note: the switch management port connects to port 1 of the switch.
High-speed network
As with any other IBM Spectrum Scale/ESS configuration, the ESS 3200 requires a
high-speed (HS) network to be used as the data-storage cluster network. In some product
documentation, this network is referred to as a Clustering network. The hardware for the HS
network can be provided by IBM or the customer. If the hardware is provided by the customer,
it must be compatible with the network interfaces that the ESS 3200 supports. See section
2.1.1, “Canisters and servers” on page 10 to see the available network options on ESS 3200.
IBM does not support re-purposing an existing server or a VM or LPAR to be used as the
EMS.
For more details about the EMS server, see the following IBM Documentation link:
https://www.ibm.com/docs/POWER9/p9hdx/5105_22e_landing.htm
The IBM or IBM Business Partner team uses the FOSDE tool and IBM eConfig for Cloud to
configure the EMS. eConfig configures the EMS with the appropriate network cards such that
the EMS can participate in the same HS networks that are configured on the ESS 3200.
The default IBM ESS Management Server memory size is enough for most IBM ESS
installations. If many IBM ESS systems are used in your IBM Spectrum Scale configuration,
check with your IBM representative to see whether a larger IBM ESS Management Server
memory size might be required for your installation. More EMS memory can be specified at
order time or added later as a field Miscellaneous Equipment Specification (MES).
Although an ideal shop would update its systems at least once a year, there are other shops
where that is simply not possible, whether for legal certification, operational, or other
reasons. In some cases, updates might end up happening only once every three or more
years.
IBM strongly recommends the following key points when maintaining software currency on
ESS-related environments:
Never jump more than three releases (N-3) in a single ESS software update. Do intermediate
jumps if needed to maintain this rule.
Always update the EMS first.
Prefer offline over online updates. If an online update is a requirement, explore the -serial
option to limit the risk exposure in case some nodes experience problems during the
update.
If you encounter a problem, involve IBM Service/Support. Although you do have root access,
changing some things might fix your problem today but create future issues, because the
automation expects the configuration to be a certain way. So, stabilize the environment,
involve support, and continue on another day.
Always keep the ESS cluster at the same level. You can update different systems at different
times, but the goal level should be the same. If that is not possible, consider partitioning your
backend cluster to achieve this rule.
Use defaults, unless you have an empirical reason backed up with data to not do so.
Management network
The management network is a non-routable private network. It connects the EMS PCI card
slot 11 (C11) port 4 (T1), which acts as a DHCP server to all I/O nodes on C11 T1.
Through this network, EMS, and containers on the EMS, manage the OS of the I/O nodes.
This network cannot use VLAN tagging of any kind, so it must be configured as an access
VLAN on the switch. You can choose any netblock that fits your needs, but as a best practice
use a /24 block. If you have no preference, use the 192.168.45.0/24 block because it is the
one that is used in this publication and in most of the documentation examples.
The EMS and the containers running on the EMS use this network to perform FSP and baseboard
management controller (BMC) operations on the physical servers, which include powering the
servers on and off and many other operations. This network uses VLAN tagging at
the ESS 3200 ports and no tagging on the rest of the ports. You can choose any netblock that fits
your needs.
High-speed network
The HS data network is where the IBM Spectrum Scale daemon and admin networks should
be configured. It is a customer-provided and managed network.
Network design for a parallel file system is not a simple topic. The HS network design and
implementation is usually the deciding factor in the overall performance that your system
delivers. Unfortunately, there is no silver-bullet design that fits every use case.
In cases where the HS data network is Ethernet based, both the daemon and admin network
should be on the HS Ethernet network (unless you have good reasons not to do so).
If you have InfiniBand networks, you can use Ethernet adapters on the HS network if they are
available, or you can use an IP over InfiniBand encapsulation.
The Management or FSP network should not be part of the IBM Spectrum Scale cluster as a
management or daemon network.
With the networking information described in this section, perform the following sizing
exercise:
Expected client-required throughput performance and number of client-ports with their
aggregated performance versus the number of ESS 3200 or other ESS I/O nodes or
canister aggregated performance.
Include any inter-switch links (ISLs) that are in place, as well as the PCIe speeds and feeds for
each system. As an example, consider a simple scenario with two HS InfiniBand 200 Gb ports
per client, where each ESS 3200 has eight ports connected. Assuming PCIe Gen4 x16 lanes on
the clients and 200 Gb InfiniBand (HDR) ports, it is an ideal
scenario to have up to eight of those clients and four ISLs between the switches. Beyond that,
some network oversubscription occurs that might affect your workload.
The ESS canister ports do not have Ethernet connectivity to connect to the canister.
Therefore, the IBM SSR accesses the device through a serial cable on each canister.
IBM and IBM Business Partners can provide education courses and services to teach these
skills.
Customers and IBM Business Partners can engage IBM Systems Lab Services, which are
available and recommended to provide help in integrating ESS 3200 into your client
environment.
A standalone ESS 3200 unit, which is known as a building block, must minimally consist of
the following components:
One EMS node in a 2U form factor
One ESS 3200 node in 2U form factor
1 GbE Network switch for management network (1U)
100/200 Gb high speed IB or Ethernet network for internode communication (1U)
The EMS node acts as the administrative end point for your ESS 3200 environment. It
performs the following functions:
Hosts the Spectrum Scale GUI
Hosts Call-Home services
Hosts system health and monitoring tools.
Manages cluster configuration, file system creation, and software updates
Acts as a cluster quorum node
The ESS 3200 features a brand-new container-based deployment model that focuses on
ease of use. The container runs on the EMS node. All of the configuration tasks that were
performed by the gssutils utility in legacy ESS are now implemented as Ansible playbooks
that are run inside the container. These playbooks are accessed by using the essrun
command.
The essrun tool handles almost the entire deployment process, and is used to install software,
apply updates, and deploy the cluster and file system. Only minimal initial user input is
required, and most of that is covered by the TDA process before the system is set up. The
essrun tool automatically configures system tunables to get the most out of a single ESS 3200
building block.
For more information about deployment customization, see the ESS 3200 Quick Deployment
Guide:
https://www.ibm.com/docs/en/ess/6.1.1_cd?topic=quick-deployment-guide
Create bonds in ESS 3200 building block within ESS 3200 container that is running in the
POWER9 EMS. See Example 3-2.
Add ESS 3200 I/O nodes to the existing cluster from ESS5000Node1. See Example 3-3.
Add the ESS 3200 EMS node to the existing cluster from ESS3200Node1. See Example 3-4. A
sketch of the form these commands take follows.
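The following is a hedged sketch of the commands that Examples 3-2 and 3-3 illustrate; the node names are hypothetical and the exact syntax can vary by ESS release:
essrun -N ESS3200Node1,ESS3200Node2 network --suffix=-hs                              # create high-speed bonds
essrun -N ESS5000Node1 cluster --add-nodes ESS3200Node1,ESS3200Node2 --suffix=-hs     # add the I/O nodes to the cluster
The EMS node in Example 3-4 is added with a similar cluster subcommand that is run against an existing cluster node.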
Create bonds in ESS 3200 building block within ESS 3200 container that is running in the
POWER9 EMS. See Example 3-6.
Add ESS 3200 I/O nodes to the existing ESS 5000 cluster from within ESS 3200 container
that is running in the POWER9 EMS. See Example 3-7.
Create bonds in ESS 3200 building block within ESS 3200 container that is running in the
POWER9 EMS. See Example 3-9.
Add ESS 3200 I/O nodes to the existing ESS 3000 cluster from within the ESS 5000 container
that is running in the POWER9 EMS. See Example 3-10.
Add ESS 3200 EMS node to existing ESS 3000 cluster from within ESS 5000 container that is
running in the POWER9 EMS. See Example 3-11.
Adding ESS 3200 to mixed ESS Legacy + ESS 3000 cluster + ESS 5000
cluster
Run the config load command within ESS 3200 container that is running in the POWER9
EMS to fix the SSH keys across all of the nodes. See Example 3-12.
Example 3-12 Run config load with ESS Legacy + ESS 3000 + ESS 5000
ESS 3200 CONTAINER [root@cems0 /]# essrun -N
ESS3200Node1,ESS3200Node2,ESS3200EMSNode,
Create bonds in ESS 5000 building block within ESS 5000 container that is running in the
POWER9 EMS. See Example 3-13.
Add ESS 3200 I/O nodes to existing ESS Legacy + ESS 3000 cluster + ESS 5000 from within
ESS 3200 container that is running in the POWER9 EMS. See Example 3-14.
Example 3-14 Add ESS 3200 nodes to ESS Legacy + ESS 3000 cluster
ESS 3200 CONTAINER [root@cems0 /]# essrun -N ESS3000Node1 cluster --add-nodes
ESS3200Node1,ESS3200Node2 --suffix=Suffix
3.3.2 Scenario-1: Using ESS 3200 for metadata network shared disks
This section describes how to use ESS 3200 for metadata network shared disks (NSDs) for
the existing file system.
This scenario starts with an existing ESS 5000 cluster and file system deployed from a
POWER9 EMS.
The following steps provide guidance to set up the ESS 3200 for metadata NSDs for the
existing file system:
1. Deploy the ESS 3200 container into the POWER9 EMS.
Customer logs in to the POWER9 EMS and completes the “Common Installation
Instructions” from the Quick Deployment Guide at:
https://www.ibm.com/docs/en/ess/6.1.1_cd?topic=guide-ess-common-installation
-instructions
2. Add the ESS 3200 Building Block to the cluster.
From step 1, within the container, run the essrun Ansible command to add the new ESS 3200
to the ESS 5000 cluster (essio1 represents an existing ESS I/O node in the cluster):
root@cems0:/ # essrun -N essio1 cluster --add-nodes CommaSeparatedNodesList --suffix=-hs
3. Create the ESS 3200 VDisk set as metadataOnly.
Within the container, run the essrun Ansible command to create the new ESS VDisk set (by
using 16M as the block size to match the existing ESS 5000 file system):
root@cems0:/ # essrun -N ess32001a,ess32001b vdisk --name newVdisk --bs 16M
--suffix=-hs --extra-vars "--nsd-usage metadataOnly --storage-pool system"
4. Add the ESS 3200 VDisk set to the existing ESS 5000 file system.
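Step 4 can be performed with the mmvdisk command. The following is a sketch only; the file system name is hypothetical, and the vdisk set name matches the one created in step 3:
mmvdisk filesystem add --file-system fs5000 --vdisk-set newVdisk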
After the container is running and the cluster and recovery groups are created, the user can
create the file system by running the essrun command:
$ essrun -N essio1,essio2 filesystem --suffix=-hs
Note: This command creates vdisk sets, NSDs, and file systems by using mmvdisk. The
defaults are 4M blocksize, 80% set size, and 8+2p RAID code. These values can be
customized by using additional flags.
For CES deployment, the IBM ESS 3200 system should have a CES file system. To create
the CES file system, run the following command:
$ essrun -N essio1,essio2 filesystem --suffix=-hs --name cesSharedRoot --ces
Note: A CES and other file systems can coexist on the same IBM ESS cluster.
Data and AI use cases require not only high-performance storage, but also:
An ecosystem of dynamic, scalable, reliable, high-performance storage
A performance-storage tier that delivers GBps to TBps performance to drive GPUs and
modern compute
Performance storage that must also seamlessly integrate as part of an enterprise data
fabric that also has capacity tiers for:
• Enterprise data repositories
• Scalable flexible Hybrid Cloud tiers
• Cost-effective deep archive Tape and Object tiers
The following sections describe how ESS 3200 as performance storage solves essential Data
and AI application use cases for AI model training, inference, metadata, indexes, and
databases. ESS 3200 is also described as a seamless, integrated data component in a larger
Storage for Data and AI ecosystem based on IBM Spectrum Scale.
4.1.1 ESS 3200 as part of a larger Storage for Data and AI ecosystem
As we look further at the performance-storage use case and its storage ecosystem, ESS 3200 is
positioned as a high-performance storage system within the performance tier.
More importantly, ESS 3200 is part of a larger family of IBM Storage solutions that
comprehensively covers all aspects of the Storage for Data and AI ecosystem, as shown in
Figure 4-2.
Because ESS 3200 is part of a larger storage ecosystem, you can start small, and then grow
your Data and AI ecosystem to enterprise levels, all non-disruptively. With ESS 3200, you
can:
- Start small with a standalone, single ESS 3200 high-performance system.
- Add more ESS 3200 systems to expand the performance tier.
- Add one or multiple IBM ESS 5000 systems as an HDD high-capacity tier.
- Add other IBM and non-IBM storage components to the storage ecosystem as needed, including
hybrid cloud capacity and archive capacity to tape or object storage.
All this can be done because ESS 3200 is an IBM Spectrum Scale storage system. IBM
Spectrum Scale provides a global namespace across all the physical storage and data under
its control. IBM Spectrum Scale provides the ability to non-disruptively add, expand, grow,
and modify the Data and AI storage ecosystem as needed.
ESS 3200 is part of a larger set of Data and AI use cases that provide an end-to-end
enterprise data-fabric and data-management storage solution. With ESS 3200 and other IBM
Storage for Data and AI solutions, you can start small, seamlessly expand, and grow your
Storage for Data and AI ecosystem in many flexible ways, to enterprise levels. This is all
powered by the IBM Spectrum Scale single global namespace which spans all storage tiers.
Typical use cases for ESS 3200 include performance-tier High-Performance Computing (HPC),
AI, analytics, and other high-performance workloads with demanding requirements such as:
- AI applications that require high-performance data access to effectively exploit GPU technology
at high resource utilization
- Acceleration of scale-out applications with dense NVMe Flash technology
- Information Lifecycle Management and data-tiering management of data in new or existing
IBM Spectrum Scale environments
- Metadata acceleration, indexes, and database acceleration
- High-performance storage at the edge
The following sections of this chapter explore some of these use cases.
Metadata generally refers to data about data; in the context of IBM Spectrum Scale, metadata
refers to the various on-disk data structures that are necessary to manage user data.
Directory entries and inodes are defined as metadata, but at times the distinction between
data and metadata might not be obvious.
For example, in the case of a 4 KB inode, although the inode might contain user data, the
inode is still classified as IBM Spectrum Scale metadata. The inode is placed in a metadata
pool if data and metadata are separated. Another example is the case of directory blocks,
which are classified as metadata but also contain user file and directory names.
This approach to metadata tiering can be adopted when trying to optimize the performance of
metadata operations, such as listing directories and making stat() calls on files. For more
information, see IBM Documentation for IBM Spectrum Scale on User Storage Pools.
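As an illustrative check, which is not part of the deployment steps above, you can confirm how data
and metadata are separated across NSDs and pools with standard IBM Spectrum Scale commands;
the file system name fs5000 is hypothetical:
$ mmlsfs fs5000 -i      # show the configured inode size
$ mmlsdisk fs5000 -L    # the "holds metadata" and "holds data" columns show each NSD's usage
$ mmdf fs5000           # per-pool capacity, with metadata usage reported separately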
An alternative tiering approach is to use the IBM Spectrum Scale File Heat function to migrate data
between storage pools based on how frequently the data is accessed, instead of tiering data based on
a data or metadata classification. For more details on this approach, see IBM Documentation for IBM
Spectrum Scale File Heat: Tracking File Access Temperature.
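The following is a minimal sketch of the File Heat approach, assuming two storage pools with the
hypothetical names ess3200pool and ess5000pool and a file system named fs5000; the exact rules for
your environment should follow the IBM Spectrum Scale ILM documentation:
$ mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10   # turn on file access temperature tracking
$ cat > /tmp/heat.policy <<'EOF'
/* Move the hottest files into the NVMe pool, capped at 90% occupancy of the target pool */
RULE 'promote-hot' MIGRATE FROM POOL 'ess5000pool'
     WEIGHT(FILE_HEAT) TO POOL 'ess3200pool' LIMIT(90)
EOF
$ mmapplypolicy fs5000 -P /tmp/heat.policy -I test   # preview which files would move before applying the policy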
https://www.ibm.com/downloads/cas/MJLMALGL
https://www.ibm.com/downloads/cas/MNEQGQVP
These reference architectures provide a proven blueprint for enterprise leaders, solution
architects, and other readers who are interested in learning how the IBM Spectrum Storage
for AI with NVIDIA DGX systems simplifies and accelerates AI. The scalable infrastructure
solution integrates the NVIDIA DGX systems with IBM Spectrum Scale GPU Direct file
storage software, which powers the IBM ESS family of storage systems that includes the new
IBM ESS 3200.
The reference architectures describe the linear growth of the AI or ML system that is driven by the
GPU workloads on the NVIDIA DGX data compute acceleration systems. IBM Spectrum Scale is used to
deliver this linear growth in throughput, scaling linearly up to the maximum of 80 GBps of read
throughput for each ESS 3200.
Any GPU or data acceleration, AI, or ML workload can benefit from the outstanding performance
capabilities of the ESS 3200 system. For more information about using ESS 3200 with
high-performance GPUs, see:
https://community.ibm.com/community/user/storage/blogs/douglas-oflaherty1/2021/06/
22/ibm-nvidia-team-on-supercomputing-scalability
For more information on the IBM Spectrum Scale GPUDirect Storage (GDS) Technical
Preview, see:
https://www.ibm.com/support/pages/node/6444075
4.4.1 IBM Spectrum Scale with big data and analytics solutions
IBM Spectrum Scale is flexible and scalable software-defined file storage for analytics
workloads. Enterprises around the globe deploy IBM Spectrum Scale to form large data lakes
and content repositories to perform high-performance computing (HPC) and analytics
workloads. IBM Spectrum Scale is known to scale performance and capacity without
bottlenecks.
Cloudera is a leader in Hadoop and Spark distributions. Cloudera addresses the needs of
data-at-rest, powers real-time customer applications, and delivers robust analytics that
accelerate decision-making and innovation. IBM Spectrum Scale solves the challenge of
explosive growth of unstructured data against a flat IT budget. IBM Spectrum Scale provides
unified file and object software-defined storage for high-performance, large-scale workloads,
and it can be deployed on-premises or in the cloud. Refer to Cloudera Data Platform Private
Cloud Base with IBM Spectrum Scale, REDP-5608.
Hadoop and Spark services can use a storage system to save IT costs because
special-purpose storage is not required to perform the analytics. IBM Spectrum Scale
features a rich set of enterprise-level data management and protection features. These
features include snapshots, information lifecycle management (ILM), compression, and
encryption, all of which provide more value than traditional analytic systems do. For more
information, see IBM Spectrum Scale: Big Data and Analytics Solution Brief, REDP-5397.
Figure 4-4 IBM Spectrum Scale Active File Management connected to Cloud Object Storage
Using this function, IBM Spectrum Scale filesets and COS buckets become extensions of
each other. Files and objects required for applications such as AI and big data analytics can
be shared, downloaded, worked upon, and uploaded between ESS 3200, IBM Spectrum
Scale, and the COS. These use cases are shown in Figure 4-5.
Figure 4-5 IBM Spectrum Scale to cloud object storage and external NFS data sources
The workloads and workflows that might benefit from these use cases include (but are not
limited to) mobile applications, backup and restore, enterprise applications, big data analytics,
and file servers.
The AFM-to-COS feature also allows data center administrators to free ESS 3200 and IBM
Spectrum Scale storage capacity through policy management. Data is moved to lower-cost or
off-premises cloud storage, which reduces capital and operational expenditures. The data
movement can be done automatically through the AFM-based cache eviction feature or through
policy, and it can be used to automate and optimize data placement between ESS 3200 and other
storage within the IBM Spectrum Scale storage ecosystem.
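As an illustrative sketch only, establishing an AFM-to-COS relationship and manually evicting cached
data might look like the following commands. The bucket, endpoint, file system, and fileset names are
hypothetical, and the exact options should be verified against the mmafmcoskeys, mmafmcosconfig, and
mmafmctl documentation for your release:
$ mmafmcoskeys mybucket set <access-key> <secret-key>             # register the object store credentials
$ mmafmcosconfig fs5000 cosfileset --endpoint https://s3.example.com --bucket mybucket
$ mmafmctl fs5000 evict -j cosfileset                             # free local capacity; data remains in COS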
This appendix also provides the data points to consider if you are not using the IBM-provided and
supported switches for the management network. It describes what to do on your network switch to
achieve the same functionality with your own network devices.
As shown in 3.1.2, “Hardware planning” on page 44, the management switch includes ports
1 to 12 as “ESS 3200” ports. Those ports are different from version 1 because both
management FSP networks are configured on the same port.
The process to configure the switch has not changed from version 1. The content of the
configuration file is used to set up the switch. The same two VLANs that were used in version 1
are used in version 2. No new VLANs are added compared to version 1.
Note: If you are converting a switch where devices other than ESS 3200 are already using any
port in the range 1-12, you must evacuate those ports one by one. If no ports in the range 1-12
are in use, you just need to apply the process above.
Evacuating means moving the cables on the upper ports 1-12 to any free upper port outside that
range. Likewise, any cable that is plugged into a lower port in the range 1-12 must be moved to
any lower port outside the range 1-12.
Move one cable at a time and wait until the link LED on the destination port comes up. After all
ports in the range 1-12 are no longer cabled, you can apply the procedure explained here.
The file with the configuration must contain the data shown in Example A-1.
# Bridge setup
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports glob swp1-48
bridge-pvid 101
bridge-pvid 102
bridge-stp off
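The stanzas shown above follow the Cumulus Linux ifupdown2 syntax, so on such a switch the
configuration is typically placed in /etc/network/interfaces and then reloaded. The following is a
minimal sketch under that assumption; verification commands can vary by switch operating system
release:
$ sudo ifreload -a        # apply the updated bridge configuration
$ sudo bridge vlan show   # confirm that VLANs 101 and 102 are present on the bridge ports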
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topic in this
document. Note that some publications referenced in this list might be available in softcopy
only.
Monitoring and Managing the IBM Elastic Storage Server Using the GUI, REDP-5471
Introduction Guide to the IBM Elastic Storage Server, REDP-5253
Implementation Guide for IBM Elastic Storage System 5000, SG24-8498
Implementation Guide for IBM Elastic Storage System 3000, SG24-8443
Highly Efficient Data Access with RoCE on IBM Elastic Storage Systems and IBM
Spectrum Scale, REDP-5658
You can search for, view, download or order these documents and other Redbooks,
Redpapers, Web Docs, draft and additional materials, at the following website:
ibm.com/redbooks
Online resources
These websites are also relevant as further information sources:
IBM Documentation - IBM Elastic Storage System 3200:
https://www.ibm.com/docs/en/ess/6.1.1_cd
IBM Spectrum Scale V 5.1.1 Planning:
https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=planning
Licensing on IBM Spectrum Scale:
https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=overview-capacity-based-
licensing
Using IBM Cloud Object Storage with IBM Spectrum Scale:
https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=WUS12361USEN
mmvdisk Command Reference:
https://www.ibm.com/docs/en/spectrum-scale-ece/5.1.1?topic=commands-mmvdisk-com
mand
SG24-8516-00
ISBN 0738460176
Printed in U.S.A.
ibm.com/redbooks