James Cruickshank
Yongsheng Li (Victor)
Armin Röll
September 2018
Note: Before using this information and the product it supports, read the information in "Notices"
on page vii.
This edition applies to the IBM Power System E980 (9080-M9S) server.
Important: At the time of publication, this book is based on a pre-GA version of a product. For the most
up-to-date information regarding this product, consult the product documentation or subsequent updates of
this book.
Preface
Authors
This publication is for professionals who want to acquire a better understanding of IBM Power
Systems™ products. The intended audience includes the following roles:
Sales and marketing professionals
Technical support professionals
IBM Business Partners
Independent software vendors (ISVs)
This paper expands the current set of IBM Power Systems documentation by providing a
desktop reference that offers a detailed technical description of the Power E980 server.
This paper does not replace the current marketing materials and configuration tools. It is
intended as an extra source of information that, together with existing sources, can be used to
enhance your knowledge of IBM server solutions.
This paper was produced by a team of specialists from around the world working at the
International Technical Support Organization, Austin Center.
James Cruickshank works in the Power Systems Client Technical Specialist team for IBM in
the UK. He holds an honors degree in Mathematics from the University of Leeds. James has
over 17 years of experience working with IBM RS/6000®, IBM pSeries, IBM System p, and
Power Systems products. James supports customers in the financial services sector in the
Yongsheng Li (Victor) works in the Power Systems Level 2 Support team for IBM China. He
holds a master’s degree in Computer Application Technology from Graduate University of
Chinese Academy of Sciences. Victor has 15 years of experience working on RS/6000, IBM
System Storage®, AIX, Hardware Management Console (HMC), System p, and Power
Systems products.
Scott Vetter
PMP, IBM Austin, US
Brian Allison, Ron Arroyo, John Banchy, Rich Bireta, Gareth M Coates, Arnold Flores,
Nigel Griffiths, Daniel Henderson, Dan Hurlimann, Jeff Jajowka, Roxette Johnson,
Vic Mahaney, Charles Marino, Michael Mueller, Kaveh Naderi, Todd Rosedahl,
David Sheffield, Steve Sipocz, Alan Standridge, Bill Starke, Jeff Stuecheli,
The Power E980 server is the most powerful and scalable server in the Power Systems
portfolio because of the following features:
Includes massive throughput, performance, and scalability.
Enables large-scale consolidation of older, underutilized servers.
Improves infrastructure resilience.
Enables rapid service delivery.
The Power E980 server provides the following hardware and components and software
Up to 192 POWER9 processor cores.
Up to 64 TB memory.
Up to 32 Peripheral Component Interconnect Express (PCIe) Gen4 x16 slots in system
Initially, up to 48 PCIe Gen3 slots with four expansion drawers. This increases to 192 slots
with the support of 16 I/O drawers.
Over 4,000 directly attached SAS disks or solid-state drives (SSDs).
Up to 1,000 virtual machines (VMs) (logical partitions (LPARs)) per system.
A system control unit (SCU), which provides a redundant system master Flexible Service
Processor (FSP).
Support for IBM AIX, IBM i, and Linux environments.
Capacity on Demand (CoD) processor and memory options.
Model upgrades from IBM POWER8® processor-based IBM Power System E870, IBM
Power System E870C, IBM Power System E880, and IBM Power System E880C systems.
IBM Power Enterprise Pools, which support unsurpassed enterprise flexibility for workload
balancing and system.
The machine type model of the Power E980 server is 9080-M9S. At initial availability in
September 2018, up to two system nodes are supported. In November 2018, up to four
system nodes are supported.
Table 1-1 shows a comparison of the Power E880C server and the Power E980 server.
Table 1-1 Comparison between the Power E880C server and the Power E980 server
Features            Power E880C server                Power E980 server
Maximum memory      32 TB                             64 TB
PCIe slots          Eight PCIe Gen3 slots / drawer    Eight PCIe Gen4 slots / drawer
Acceleration ports  Yes (Coherent Accelerator         Yes (CAPI 2.0 + IBM Open
                    Processor Interface (CAPI) 1.0)   Coherent Accelerator
                                                      Processor Interface
Figure 1-2 shows a picture of a 4-node Power E980 server in a rack with a disk expansion
The SCU is powered from the system nodes. Two Universal Power Interconnect Cable (UPIC)
cables provide redundant power to the SCU. In a single-drawer system, both UPIC
cables originate from drawer 1. For a system with two or more drawers, one UPIC cable
originates from drawer 1 and the other UPIC cable originates from drawer 2. A single UPIC
cable is enough to power the SCU; the other is in place for redundancy.
Figure 1-3 shows the front and rear view of an SCU. The locations of the connectors and
features are indicated.
Figure 1-3 callouts: operator panel, three fans, USB connection, and power input from the system nodes.
Each SCM has dual memory controllers to support up to 128 MB off-chip embedded DRAM
(eDRAM) L4 cache to deliver up to 230 GBps of sustained memory bandwidth or 920 GBps
per node. Up to 410 GBps of peak memory bandwidth from the L4 cache to memory dual
inline memory modules (DIMMs) is provided per SCM or up to 1640 GBps per node. Using
tri-PCIe Gen4 I/O controllers, which are also integrated onto each SCM to further reduce
latency, up to 545.25 GBps I/O bandwidth is available per node. Thus, a Power E980 system
can help deliver over twice the performance per core of competitors, enabling applications to
run faster and be more responsive.
Each system node has 32 custom DIMM (CDIMM) slots and can support up to 16 TB of
DDR4 memory. Thus, a four-node server can have up to 64 TB of memory. Each system node
has eight PCIe slots, which are all PCIe Gen4 x16, low-profile. Thus, a four-node server can
have up to 32 PCIe slots. PCIe expansion units can optionally expand the number of PCIe
slots on the server.
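The per-node maximums quoted above scale linearly with the number of system nodes. The following minimal sketch tabulates them; the function and constant names are illustrative only and are not part of any IBM tool.

```python
# Per-node and per-SCM maximums as stated in the text (illustrative names).
SCMS_PER_NODE = 4               # each processor feature delivers four SCMs
MAX_MEMORY_TB_PER_NODE = 16     # DDR4 memory per system node
PCIE_SLOTS_PER_NODE = 8         # PCIe Gen4 x16 low-profile slots
SUSTAINED_GBPS_PER_SCM = 230    # sustained memory bandwidth per SCM
PEAK_GBPS_PER_SCM = 410         # peak L4-cache-to-DIMM bandwidth per SCM


def system_totals(nodes: int) -> dict:
    """Return the maximum totals for a server with one to four system nodes."""
    if not 1 <= nodes <= 4:
        raise ValueError("a Power E980 server has one to four system nodes")
    return {
        "memory_tb": nodes * MAX_MEMORY_TB_PER_NODE,
        "pcie_slots": nodes * PCIE_SLOTS_PER_NODE,
        "sustained_gbps_per_node": SCMS_PER_NODE * SUSTAINED_GBPS_PER_SCM,
        "peak_gbps_per_node": SCMS_PER_NODE * PEAK_GBPS_PER_SCM,
    }


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, system_totals(n))
```

For a four-node server, the sketch reproduces the 64 TB and 32-slot figures given in the text, along with 920 GBps sustained and 1640 GBps peak memory bandwidth per node.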
A system node is ordered by using a processor feature. Each processor feature delivers a set
of four identical SCMs in one system node. All processor features in the system must be
identical. Cable features are required to connect system node drawers to the SCU and to
other system nodes.
Figure 1-4 shows the front view of a system node. Fans and Power Supply Units (PSUs) are
redundant and concurrently maintainable. Fans are n+1 redundant, so the system continues
to function when any one fan fails. Power supplies are n+2 redundant, so the system
continues to function even if any two power supplies fail.
Figure 1-5 shows the rear view of a system node. The locations of the connectors and
features are indicated.
Figure 1-5 callouts: 8 x low-profile PCIe adapter slots, 4 x NVMe slots, SMP cable ports, FSP power, 2 + 2 power cords, 2 x clock and control cards, 3 x USB3, and FSP connection.
Figure 1-6 shows the top view of a system node with the top lid removed. Voltage regulator
modules (VRMs) provide clean power to the various internal components.
10 - 35°C (50.0- 95.0°F)
Environmental assessment: The IBM Systems Energy Estimator tool can provide more
accurate information about power consumption and the thermal output of systems based
on a specific configuration, including adapters and I/O expansion drawers.
The Power E980 server must be installed in a rack with a rear door and side panels for EMC
compliance. The native Hardware Management Console (HMC) Ethernet ports must use
shielded Ethernet cables.
The actual sound pressure levels in your installation depend upon various factors, including
the number of racks in the installation, the size, materials, and configuration of the room
where you designate the racks to be installed, the noise levels from other equipment, the
room ambient temperature, and employees' location in relation to the equipment. Compliance
with such government regulations also depends on various extra factors, including the
duration of employees' exposure and whether employees wear hearing protection.
IBM recommends that you consult with qualified experts in this field to determine whether you
are in compliance with the applicable regulations.
The SCU requires 2U and each system node requires 5U. Thus, a single-enclosure system
requires 7U, a two-enclosure system requires 12U, a three-enclosure system requires 17U,
and a four-enclosure system requires 22U.
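The rack-space figures above follow from one SCU (2U) plus 5U for each system node. A minimal sketch of that arithmetic, with illustrative names, is shown here:

```python
SCU_HEIGHT_U = 2    # the system control unit requires 2U
NODE_HEIGHT_U = 5   # each system node requires 5U


def rack_units(nodes: int) -> int:
    """Return the rack space (in U) for one SCU plus the given number of nodes."""
    if not 1 <= nodes <= 4:
        raise ValueError("a Power E980 server has one to four system nodes")
    return SCU_HEIGHT_U + nodes * NODE_HEIGHT_U


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} node(s): {rack_units(n)}U")
```

The sketch covers only the SCU and system nodes. I/O expansion drawers, storage drawers, and horizontally mounted PDUs need additional rack space.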
Table 1-3 lists the physical dimensions of the system node, SCU, and the PCIe Gen3 I/O
Expansion Drawer.
Table 1-3 Physical dimensions of the system node, SCU, and the PCIe Gen3 I/O Expansion Drawer
Dimension  Power E980 system node         Power E980 SCU                PCIe I/O expansion drawer
Width      445.5 mm (17.54 in.)           445.6 mm (17.54 in.)          447.3 mm (17.61 in.)
Depth      867 mm (34.13 in.)             779.7 mm (30.7 in.)           737 mm (29.0 in.)
Height     218 mm (8.58 in.) 5 EIA units  86 mm (3.39 in.) 2 EIA units  173 mm (6.8 in.) 4 EIA units
Weight     86.2 kg (190 lb)               22.7 kg (50 lb)               54.4 kg (120 lb)
You can order system features during the initial system order. You can also add or replace
features later as a miscellaneous equipment specification (MES) order. An MES is a hardware
change that involves adding, removing, or changing features.
For more information about the available features, see Chapter 2, “Architecture and technical
overview” on page 49.
Note: At initial availability in September 2018, zero, one, or two PCIe I/O Expansion
Drawers are supported per system node. In November 2018, zero, one, two, three, or four
expansion drawers are supported per system node.
Processors (8-core)
Power supply 4
Language Group
CoD processor core activation features are available on a per-core basis. Each Power E980
system requires a minimum number of permanent processor core activations by using either
static activations or Linux on Power activations that match the number of processor cores per
processor feature. This minimum is per system, not per node. The rest of the cores can be
permanently or temporarily activated or remain inactive (dark) until needed.
Regardless of the core count per socket (8 or 12), the minimum number of active cores is 8 for the entire
The activations are not specific to hardware cores, SCMs, or nodes. They are known to the
system as a total number of activations of different types, and used or assigned by the Power
Various activations fit different usage and pricing options. Static activations are permanent
and support any type of application environment on this server. Mobile activations are
ordered against a specific server, but can be moved to any server within the IBM Power
Enterprise Pool and support any type of application. Mobile-enabled activations are
technically static, but can be converted to mobile at no charge when logistically or
administratively eligible. Linux on Power activations can run only Linux workloads. Temporary
activations are used for Elastic Capacity on Demand (Temporary) (Elastic CoD), Utility
Capacity on Demand (Utility CoD), and Trial Capacity on Demand (Trial CoD).
Figure 1-7 O-Bus and X-Bus fabric for drawer-to-drawer interconnect or accelerator
To assist with the quad-plugging rules above, four CDIMMs are ordered by using one memory
All CDIMMs must be identical when on the same SCM. If you use eight CDIMMs, both
memory FCs on an SCM must be identical. A different SCM in the same system node can
use a different memory FC. For example, one system node can technically use 128 GB,
256 GB, 512 GB, 1024 GB, and 2048 GB memory FCs. DDR3 and DDR4 memory cannot be
mixed on the same system node. DDR3 memory is available only when transferred from a
Power E870, Power E870C, Power E880, or Power E880C system as part of an upgrade.
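The plugging rules above can be expressed as a small validation routine. The following is a sketch under stated assumptions: the data layout (a list of SCMs, each a list of CDIMM tuples) and the function name are hypothetical, the CDIMM capacities in the example are placeholder values, and the check that each SCM holds four or eight CDIMMs is inferred from the quad-plugging rule rather than quoted from a configuration tool.

```python
from typing import List, Tuple

# Hypothetical representation: (memory generation, capacity in GB) per CDIMM.
Cdimm = Tuple[str, int]


def check_node_memory(scms: List[List[Cdimm]]) -> List[str]:
    """Return a list of rule violations for one system node (empty if none)."""
    problems = []
    generations = set()
    for index, cdimms in enumerate(scms):
        # Assumption: CDIMMs are plugged in quads, so 4 or 8 per SCM.
        if len(cdimms) not in (4, 8):
            problems.append(f"SCM {index}: {len(cdimms)} CDIMMs (expected 4 or 8)")
        # All CDIMMs on the same SCM must be identical.
        if len(set(cdimms)) > 1:
            problems.append(f"SCM {index}: CDIMMs are not identical")
        generations.update(generation for generation, _ in cdimms)
    # DDR3 and DDR4 cannot be mixed on the same system node.
    if len(generations) > 1:
        problems.append("node mixes memory generations: " + ", ".join(sorted(generations)))
    return problems


if __name__ == "__main__":
    good = [[("DDR4", 32)] * 4, [("DDR4", 64)] * 8,
            [("DDR4", 32)] * 4, [("DDR4", 128)] * 8]
    bad = [[("DDR4", 32)] * 4, [("DDR3", 32)] * 4,
           [("DDR4", 32)] * 3 + [("DDR4", 64)], [("DDR4", 32)] * 8]
    print(check_node_memory(good))
    print(check_node_memory(bad))
```

The first configuration passes because different SCMs may use different memory FCs. The second is rejected twice: one SCM holds non-identical CDIMMs, and the node mixes DDR3 with DDR4.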
To provide more flexible pricing, memory activations are ordered separately from the physical
memory and can be permanent or temporary. Activation features can be used with DDR4
memory FCs of any size. Activations are not specific to a CDIMM, but
are known as a total quantity to the server. The Power Hypervisor determines what physical
memory to use.
A minimum of 50% of the total physical memory capacity of a server must have permanent
memory activations that are ordered for that server. For example, a server with a total of 8 TB
of physical memory must have at least 4 TB of permanent memory activations that are
ordered for that server.
These activations can be static, mobile-enabled, mobile, or Linux on Power. At least 25%
must be static activations or Linux on Power activations. For example, a server with a total of
8 TB physical memory must have at least 2 TB of static activations or Linux on Power
activations. The 50% minimum cannot be fulfilled by using mobile activations that are ordered
on a different server.
The minimum activations that are ordered with MES orders of extra physical memory features
depend on the existing total installed physical memory capacity and the existing total installed
memory activation features. If you already installed more than 50% activations for your
existing system, then you can order fewer than 50% activations for the MES ordered memory.
The resulting configuration after the MES order of physical memory and any MES activations
must meet the same 50% and 25% minimum rules.
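The 50% and 25% minimums can be checked mechanically against a planned configuration. The sketch below is illustrative only; the function and parameter names are hypothetical, and each quantity is the amount of activations ordered for this server, in TB.

```python
def check_memory_activations(physical_tb: float,
                             static_tb: float = 0.0,
                             linux_tb: float = 0.0,
                             mobile_enabled_tb: float = 0.0,
                             mobile_tb: float = 0.0) -> list:
    """Return the violated minimum-activation rules (empty list if none).

    mobile_tb counts only mobile activations ordered on this server;
    mobile activations ordered on a different server do not count.
    """
    problems = []
    permanent_tb = static_tb + linux_tb + mobile_enabled_tb + mobile_tb
    # Rule 1: at least 50% of physical memory needs permanent activations.
    if permanent_tb < 0.5 * physical_tb:
        problems.append("permanent activations below 50% of physical memory")
    # Rule 2: at least 25% must be static or Linux on Power activations.
    if static_tb + linux_tb < 0.25 * physical_tb:
        problems.append("static plus Linux on Power activations below 25%")
    return problems


if __name__ == "__main__":
    # 8 TB installed: 2 TB static + 2 TB mobile meets both minimums.
    print(check_memory_activations(8, static_tb=2, mobile_tb=2))
    # 8 TB installed: 1 TB static + 3 TB mobile meets 50% but not 25%.
    print(check_memory_activations(8, static_tb=1, mobile_tb=3))
```

For an MES order, the same check applies to the resulting configuration, that is, to the sum of the existing and newly ordered physical memory and activations.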
For the best possible performance, install memory evenly across all system node drawers
and all SCMs in the system. Balancing memory across the installed system board cards
enables memory access in a consistent manner and typically results in better performance for
your configuration.
Though maximum memory bandwidth is achieved by filling all the memory slots, plans for
future memory additions should be accounted for when deciding which memory FC to use at
the time of initial system order.
AME is an option that can increase the effective memory capacity of the system. For more
information about AME, see 3.2, “Active Memory Expansion” on page 125.
Figure 1-9 Memory buffer on a CDIMM
A blind-swap cassette (BSC) is used to house the low-profile adapters that go into these
slots. The server includes a full set of BSCs, even if the BSCs are empty. An FC to order
more low-profile BSCs is not required or announced. BSCs enable hot (system running)
replace, removal, and addition for an adapter without having to place a server in a service
position or open the drawer. The server BSCs are not the same as the ones that are used
in the I/O drawers.
If more PCIe slots beyond the system node slots are required, a system node x16 slot is
used to attach a six-slot expansion module in the I/O drawer. An I/O drawer holds two
expansion modules that are attached to any two x16 PCIe slots in the same system node
or in different system nodes.
PCIe Gen1, Gen2, and Gen3 adapters are supported in these Gen4 slots. The set of PCIe
adapters that is supported is described in 2.5, “PCIe adapters” on page 88.
Concurrent repair and add/removal of PCIe adapters is done by HMC-guided menus or by
OS support utilities.
The system nodes sense which IBM PCIe adapters are installed in the PCIe slots. If an
adapter requires higher levels of cooling, fans automatically speed up to increase airflow
across these PCIe adapters.
Each system node supports CAPI 2.0 adapters in all slots. At initial availability in
September 2018, no IBM developed or IBM Part Number CAPI adapters are supported.
The ports have been tested and adapters are available from IBM Business Partners, such as:
– Nallatech 250SP
– Flyslice FX609
– Semptian NSA241
– ReflexCES XpressVUP LP9P
1.4.7 USB
Each system node in the Power E980 server provides three USB 3.0 ports for general use by
the LPAR or Virtual I/O Server (VIOS) to which they are assigned. On the first system node,
one of the USB ports is rerouted to a port on the front of the SCU. This port is primarily
intended for use when attaching a USB DVD drive.
Table 1-8 on page 24 lists the options for USB-attached media that are available for the
Power E980 server.
#EUA5 63BD Stand-alone USB DVD drive w/cable 1 AIX and Linux Both
#EUA4 RDX USB External Docking Station (b) 6 AIX and Linux Both
a. For more information about order types, see 1.4.8, “Disk and media features” on page 18.
b. #EUA4 is available to purchase only in the United States.
More direct attached storage is supported in external disk and media drawers.
Table 1-6 lists the available disk and SSD features for the Power E980 server.
ESD2 59CD 1.1 TB 10K RPM SAS SFF-2 Disk Drive (IBM i) 4032 IBM i Supported
ESF2 59DA 1.1 TB 10K RPM SAS SFF-2 Disk Drive 4 K Block - 4032 IBM i Both
ESD3 59CD 1.2 TB 10K RPM SAS SFF-2 Disk Drive (AIX/Linux) 4032 AIX and Supported
ESF3 59DA 1.2 TB 10K RPM SAS SFF-2 Disk Drive 4 K Block - 4032 AIX and Both
4096 Linux
ESGP 5B12 1.55 TB Enterprise SAS 4 K SFF-2 SSD for AIX/Linux 2016 AIX and Both
ESGQ 5B12 1.55 TB Enterprise SAS 4 K SFF-2 SSD for IBM i 2016 AIX and Both
ES8F 5B12 1.55 TB SFF-2 SSD 4 K eMLC4 for AIX/Linux 2016 AIX and Supported
ES8G 5B12 1.55 TB SFF-2 SSD 4 K eMLC4 for IBM i 2016 IBM i Supported
ESFS 59DD 1.7 TB 10K RPM SAS SFF-2 Disk Drive 4 K Block - 4032 IBM i Both
ESFT 59DD 1.8 TB 10K RPM SAS SFF-2 Disk Drive 4 K Block - 4032 AIX and Both
4096 Linux
ES96 5B21 1.86 TB Mainstream SAS 4 K SFF-2 SSD for 2016 Linux Both
ESHL 5B21 1.86 TB Mainstream SAS 4 K SFF-2 SSD for 2016 AIX and Both
AIX/Linux Linux
ES97 5B21 1.86 TB Mainstream SAS 4 K SFF-2 SSD for IBM i 2016 IBM i Both
ESHM 5B21 1.86 TB Mainstream SAS 4 K SFF-2 SSD for IBM i 2016 AIX and Both
ESE7 5B2D 3.72 TB Mainstream SAS 4 K SFF-2 SSD for 2016 AIX and Both
AIX/Linux Linux
ESM8 5B2D 3.72 TB Mainstream SAS 4 K SFF-2 SSD for 2016 Linux Both
ESE8 5B2D 3.72 TB Mainstream SAS 4 K SFF-2 SSD for IBM i 2016 AIX, IBM Both
i, and
ESM9 5B2D 3.72 TB Mainstream SAS 4 K SFF-2 SSD for IBM i 2016 IBM i Both
ES62 5B1D 3.86 - 4.0 TB 7200 RPM 4 K SAS LFF-1 Nearline Disk 2016 AIX and Both
Drive (AIX/Linux) Linux
ESHN 5B2F 7.45 TB Mainstream SAS 4 K SFF-2 SSD for 2016 AIX and Both
AIX/Linux Linux
ES64 5B1F 7.72 - 8.0 TB 7200 RPM 4 K SAS LFF-1 Nearline Disk 2016 AIX and Both
Drive (AIX/Linux) Linux
ESEY 59C9 283 GB 15 K RPM SAS SFF-2 4 K Block - 4224 Disk 4032 IBM i Both
ESNL 5B43 283 GB 15 K RPM SAS SFF-2 4 K Block Cached Disk 4032 IBM i Both
Drive (IBM i)
1948 19B1 283 GB 15 K RPM SAS SFF-2 Disk Drive (IBM i) 4032 IBM i Supported
ESEZ 59C9 300 GB 15 K RPM SAS SFF-2 4 K Block - 4096 Disk 4032 AIX and Both
Drive Linux
ESNM 5B43 300 GB 15 K RPM SAS SFF-2 4 K Block Cached Disk 4032 Linux Both
Drive (AIX/Linux)
1953 19B1 300 GB 15 K RPM SAS SFF-2 Disk Drive (AIX/Linux) 4032 AIX and Both
ESGB 5B10 387 GB Enterprise SAS 4 K SFF-2 SSD for AIX/Linux 2016 AIX and Both
ESGC 5B10 387 GB Enterprise SAS 4 K SFF-2 SSD for IBM i 2016 AIX and Both
ESG5 5B16 387 GB Enterprise SAS 5xx SFF-2 SSD for AIX/Linux 2016 AIX and Both
ESG6 5B16 387 GB Enterprise SAS 5xx SFF-2 SSD for IBM i 2016 AIX and Supported
ES85 5B10 387 GB SFF-2 SSD 4 K eMLC4 for AIX/Linux 2016 AIX and Supported
ES86 5B10 387 GB SFF-2 SSD 4 K eMLC4 for IBM i 2016 IBM i Supported
ES78 5B16 387 GB SFF-2 SSD 5xx eMLC4 for AIX/Linux 2016 AIX and Supported
ES79 5B16 387 GB SFF-2 SSD 5xx eMLC4 for IBM i 2016 IBM i Supported
1962 19B3 571 GB 10 K RPM SAS SFF-2 Disk Drive (IBM i) 4032 IBM i Supported
ESEU 59D2 571 GB 10K RPM SAS SFF-2 Disk Drive 4 K Block - 4032 IBM i Both
ESFN 59CC 571 GB 15 K RPM SAS SFF-2 4 K Block - 4224 Disk 4032 IBM i Both
ESNQ 5B47 571 GB 15 K RPM SAS SFF-2 4 K Block Cached Disk 4032 IBM i Both
Drive (IBM i)
ESDN 59CF 571 GB 15 K RPM SAS SFF-2 Disk Drive - 528 Block 4032 IBM i Supported
(IBM i)
1964 19B3 600 GB 10 K RPM SAS SFF-2 Disk Drive (AIX/Linux) 4032 AIX and Both
ESEV 59D2 600 GB 10K RPM SAS SFF-2 Disk Drive 4 K Block - 4032 AIX and Both
4096 Linux
ESFP 59CC 600 GB 15 K RPM SAS SFF-2 4 K Block - 4096 Disk 4032 AIX and Both
Drive Linux
ESNR 5B47 600 GB 15 K RPM SAS SFF-2 4 K Block Cached Disk 4032 Linux Both
Drive (AIX/Linux)
ESDP 59CF 600 GB 15 K RPM SAS SFF-2 Disk Drive - 5xx Block 4032 AIX and Supported
(AIX/Linux) Linux
ESGK 5B11 775 GB Enterprise SAS 4 K SFF-2 SSD for AIX/Linux 2016 AIX and Both
ESGL 5B11 775 GB Enterprise SAS 4 K SFF-2 SSD for IBM i 2016 AIX and Both
ESGF 5B17 775 GB Enterprise SAS 5xx SFF-2 SSD for AIX/Linux 2016 AIX and Both
ESGG 5B17 775 GB Enterprise SAS 5xx SFF-2 SSD for IBM i 2016 AIX and Supported
ES8C 5B11 775 GB SFF-2 SSD 4 K eMLC4 for AIX/Linux 2016 AIX and Supported
ES8D 5B11 775 GB SFF-2 SSD 4 K eMLC4 for IBM i 2016 IBM i Supported
ES7E 5B17 775 GB SFF-2 SSD 5xx eMLC4 for AIX/Linux 2016 AIX and Supported
ES7F 5B17 775 GB SFF-2 SSD 5xx eMLC4 for IBM i 2016 IBM i Supported
ES8Y 5B29 931 GB Mainstream SAS 4 K SFF-2 SSD for 2016 Linux Both
ESHJ 5B29 931 GB Mainstream SAS 4 K SFF-2 SSD for 2016 AIX and Both
AIX/Linux Linux
ES8Z 5B29 931 GB Mainstream SAS 4 K SFF-2 SSD for IBM i 2016 IBM i Both
ESHK 5B29 931 GB Mainstream SAS 4 K SFF-2 SSD for IBM i 2016 AIX and Both
a. For more information about order types, see 1.4.8, “Disk and media features” on page 18.
Table 1-7 shows the disk and SSD features that are available for bulk ordering.
EQ62 5B1D Quantity 150 of ES62 3.86 - 4.0 TB 7200 RPM 4 K 13 AIX and Both
LFF-1 Disk Linux
EQ64 5B1F Quantity 150 of ES64 7.72 - 8.0 TB 7200 RPM 4 K 13 AIX and Both
LFF-1 Disk Linux
EQ78 5B16 Quantity 150 of ES78 387 GB SFF-2 SSD 5xx 13 AIX and Supported
EQ79 5B16 Quantity 150 of ES79 387 GB SFF-2 SSD 5xx 13 IBM i Supported
EQ7E 5B17 Quantity 150 of ES7E 775 GB SFF-2 SSD 5xx 13 AIX and Supported
EQ7F 5B17 Quantity 150 of ES7F 775 GB SFF-2 SSD 5xx 13 IBM i Supported
EQ8G 5B12 Quantity 150 of ES8G 1.55 TB SFF-2 SSD 4 K 13 IBM i Supported
EQ85 5B10 Quantity 150 of ES85 387 GB SFF-2 SSD 4 K 13 AIX and Supported
EQ86 5B10 Quantity 150 of ES86 387 GB SFF-2 SSD 4 K 13 IBM i Supported
EQ8C 5B11 Quantity 150 of ES8C 775 GB SFF-2 SSD 4 K 13 AIX and Supported
EQ8D 5B11 Quantity 150 of ES8D 775 GB SFF-2 SSD 4 K 13 IBM i Supported
EQ8F 5B12 Quantity 150 of ES8F 1.55 TB SFF-2 SSD 4 K 13 AIX and Supported
EQ8Y 5B29 Quantity 150 of ES8Y 931 GB SFF-2 SSD 4 K 13 Linux Both
EQ8Z 5B29 Quantity 150 of ES8Z 931 GB SFF-2 SSD 4 K 13 IBM i Both
EQ96 5B21 Quantity 150 of ES96 1.86 TB SFF-2 SSD 4 K 13 Linux Both
EQ97 5B21 Quantity 150 of ES97 1.86 TB SFF-2 SSD 4 K 13 IBM i Both
EQD3 59CD Quantity 150 of ESD3 (1.2 TB 10 K SFF-2) 26 AIX and Supported
EQDN 59CF Quantity 150 of ESDN (571 GB 15 K RPM SAS SFF-2 26 IBM i Supported
for IBM i)
EQDP 59CF Quantity 150 of ESDP (600 GB 15 K RPM SAS SFF-2 26 AIX and Supported
for AIX/LINUX) Linux
EQE7 5B2D Quantity 150 of ESE7 3.72 TB SFF-2 SSD 4 K 13 AIX and Both
EQE8 5B2D Quantity 150 of ESE8 3.72 TB SFF-2 SSD 4 K 13 AIX, IBM Both
i, and
EQEV 59D2 Quantity 150 of ESEV (600 GB 10 K SFF-2) 26 AIX and MES
EQEZ 59C9 Quantity 150 of ESEZ (300 GB SFF-2) 26 AIX and MES
EQF3 59DA Quantity 150 of ESF3 (1.2 TB 10 K SFF-2) 26 AIX and MES
EQFP 59CC Quantity 150 of ESFP (600 GB SFF-2) 26 AIX and MES
EQFT 59DD Quantity 150 of ESFT (1.8 TB 10 K SFF-2) 26 AIX and MES
EQG5 5B16 Quantity 150 of ESG5 (387 GB SAS 5xx) 13 AIX and Both
EQG6 5B16 Quantity 150 of ESG6 (387 GB SAS 5xx) 13 AIX and Supported
EQGB 5B10 Quantity 150 of ESGB (387 GB SAS 4 K) 13 AIX and Both
EQGC 5B10 Quantity 150 of ESGC (387 GB SAS 4 K) 13 AIX and Both
EQGF 5B17 Quantity 150 of ESGF (775 GB SAS 5xx) 13 AIX and Both
EQGG 5B17 Quantity 150 of ESGG (775 GB SAS 5xx) 13 AIX and Supported
EQGK 5B11 Quantity 150 of ESGK (775 GB SAS 4 K) 13 AIX and Both
EQGL 5B11 Quantity 150 of ESGL (775 GB SAS 4 K) 13 AIX and Both
EQGP 5B12 Quantity 150 of ESGP (1.55 TB SAS 4 K) 13 AIX and Both
EQGQ 5B12 Quantity 150 of ESGQ (1.55 TB SAS 4 K) 13 AIX and Both
ERHJ 5B29 Quantity 150 of ESHJ 931 GB SSD 4 K SFF-2 13 AIX and Both
ERHK 5B29 Quantity 150 of ESHK 931 GB SSD 4 K SFF-2 13 AIX and Both
ERHL 5B21 Quantity 150 of ESHL 1.86 TB SSD 4 K SFF-2 13 AIX and Both
ERHM 5B21 Quantity 150 of ESHM 1.86 TB SSD 4 K SFF-2 13 AIX and Both
ERHN 5B2F Quantity 150 of ESHN 7.45 TB SSD 4 K SFF-2 13 AIX and Both
ERM8 5B2D Quantity 150 of ESM8 3.72 TB SSD 4 K SFF-2 13 Linux Both
ERM9 5B2D Quantity 150 of ESM9 3.72 TB SSD 4 K SFF-2 13 IBM i Both
Table 1-8 lists the options for USB-attached media that are available for the Power E980
server.
EUA5 63BD Stand-alone USB DVD drive w/cable 1 AIX and Linux Both
At initial availability in September 2018, the Power E980 server supports the attachment of
zero, one, or two PCIe Gen3 I/O Expansion Drawers to each system node.
In November 2018, zero, one, two, three, or four PCIe Gen3 I/O Expansion Drawers per
system node will be supported. To connect an I/O expansion drawer, a PCIe slot is used to
attach a 6-slot expansion module in the I/O drawer. A PCIe Gen3 I/O Expansion Drawer
(#EMX0) holds two expansion modules that are attached to any two PCIe slots in the same
system node or in different system nodes.
For the connection of SAS disks, disk-only I/O drawers are available. The EXP24S,
EXP24SX, and EXP12SX drawers are supported.
A BSC is used to house the full-height adapters that go into these slots. The BSC is the same
BSC that is used with the previous generation server's 12X attached I/O drawers (#5802,
#5803, #5877, and #5873). The drawer is shipped with a full set of BSCs, even if the BSCs
are empty.
Concurrent repair and add/removal of PCIe adapters is done through HMC-guided menus or
by OS support utilities.
A PCIe CXP converter adapter and Active Optical Cables (AOCs) connect the system node to
a PCIe FanOut module in the I/O expansion drawer. Each PCIe Gen3 I/O Expansion Drawer
has two power supplies.
Drawers can be added to the server later, but system downtime must be scheduled for adding
a PCIe3 Optical Cable Adapter or a PCIe Gen3 I/O drawer (#EMX0) or fan-out module.
Figure 1-12 Rear view of a PCIe Gen3 I/O Expansion Drawer with PCIe slots location codes
Table 1-9 provides details about the PCIe slots in the PCIe Gen3 I/O Expansion Drawer.
Table 1-9 PCIe slot locations and descriptions for the PCIe Gen3 I/O Expansion Drawer
Slot Location code Description
In Table 1-9:
All slots support full-length, regular-height adapters or short (low-profile) adapters with a
regular-height tailstock in single-wide, Gen3 BSCs.
Slots C1 and C4 in each PCIe3 6-slot fan-out module are x16 PCIe3 buses, and slots C2,
C3, C5, and C6 are x8 PCIe buses.
All slots support enhanced error handling (EEH).
All PCIe slots are hot-swappable and support concurrent maintenance.
Table 1-10 summarizes the maximum number of I/O drawers that is supported and the total
number of PCI slots that is available when the expansion drawer consists of a single drawer
Table 1-10 Maximum number of I/O drawers that are supported and the total number of PCI slots
System nodes Maximum #EMX0 Total number of slots
PCIe3, x16 PCIe3, x8 Total PCIe3
Availability: At initial availability in September 2018, a maximum of two I/O drawers are
supported per system node, and a maximum of two system nodes is supported. In
November 2018, four system nodes and four I/O drawers per node will be supported.
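The slot totals follow from the drawer layout described earlier: each I/O drawer holds two 6-slot fan-out modules, each module provides two x16 slots (C1 and C4) and four x8 slots (C2, C3, C5, and C6), and each module attaches to one PCIe slot in a system node. The following sketch, with illustrative names, derives the totals for a given configuration:

```python
MODULES_PER_DRAWER = 2      # fan-out modules per PCIe Gen3 I/O Expansion Drawer
X16_SLOTS_PER_MODULE = 2    # slots C1 and C4
X8_SLOTS_PER_MODULE = 4     # slots C2, C3, C5, and C6


def drawer_slots(nodes: int, drawers_per_node: int) -> dict:
    """Return drawer and slot totals for the given configuration."""
    if not 1 <= nodes <= 4:
        raise ValueError("a Power E980 server has one to four system nodes")
    if not 0 <= drawers_per_node <= 4:
        raise ValueError("zero to four I/O drawers are supported per system node")
    drawers = nodes * drawers_per_node
    modules = drawers * MODULES_PER_DRAWER
    x16 = modules * X16_SLOTS_PER_MODULE
    x8 = modules * X8_SLOTS_PER_MODULE
    return {
        "drawers": drawers,
        "x16": x16,
        "x8": x8,
        "total": x16 + x8,
        # Each fan-out module occupies one PCIe slot in a system node.
        "node_slots_used": modules,
    }


if __name__ == "__main__":
    print(drawer_slots(2, 2))   # initial availability maximum
    print(drawer_slots(4, 4))   # November 2018 maximum
```

With four nodes and four drawers per node, the result is 16 drawers and 192 expansion slots, which matches the maximum quoted in the system overview. That configuration also uses all 32 PCIe slots in the system nodes for the optical cable adapters.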
With AIX, Linux, and VIOS, you can order the EXP24S drawer with four sets of six bays, two
sets of 12 bays, or one set of 24 bays (mode 4, 2, or 1). With IBM i, you can order the
EXP24S drawer as one set of 24 bays (mode 1). Mode setting is done by IBM Manufacturing.
If you need to change the mode after installation, ask your IBM System Services
Representative (IBM SSR) to refer to:
The EXP24S SAS ports are attached to a SAS PCIe adapter or pair of adapters by using SAS
YO or X cables (provided by manufacturing).
Figure 1-13 EXP24S SFF Gen2-bay Drawer
The EXP24SX drawer is a storage expansion enclosure with twenty-four 2.5-inch SFF SAS
bays. It supports up to 24 hot-plug HDDs or SSDs in only 2 EIA of space in a 19-inch rack.
The EXP24SX SFF bays use SFF Gen2 (SFF-2) carriers or trays.
The EXP12SX drawer is a storage expansion enclosure with twelve 3.5-inch large form factor
(LFF) SAS bays. It supports up to 12 hot-plug HDDs in only 2 EIA of space in a 19-inch rack.
The EXP12SX LFF bays use LFF Gen1 (LFF-1) carriers or trays. The 4 KB sector drives (#4096
or #4224) are supported. SSDs are not supported.
With AIX, Linux, and VIOS, the EXP24SX and EXP12SX drawers can be ordered with four
sets of six bays (mode 4), two sets of 12 bays (mode 2), or one set of 24 bays (mode 1).
IBM i supports the EXP24SX drawer with one set of 24 bays (mode 1). It is possible to
change the mode setting in the field by using software commands along with a specifically
documented procedure.
Important: When changing modes, a skilled, technically qualified person should follow the
special documented procedures. Improperly changing modes can potentially destroy
existing RAID sets, prevent access to existing data, or allow other partitions to access
another partition's existing data.
The attachment between the EXP24SX and EXP12SX drawers and the PCIe3 SAS adapters
or integrated SAS controllers is through SAS YO12 or X12 cables. All ends of the YO12 and
X12 cables have mini-SAS HD narrow connectors. The PCIe Gen3 SAS adapters support 6
Gb throughput. The EXP24SX and EXP12SX drawers may support up to 12 Gb throughput if
future SAS adapters support that capability.
The EXP24SX and EXP12SX drawers include redundant AC power supplies and two power
For the EXP24SX drawer, a maximum of twenty-four 2.5-inch SSDs or 2.5-inch HDDs
are supported in the #ESLS 24 SAS bays. There can be no mixing of HDDs and SSDs
in the same mode-1 drawer. HDDs and SSDs can be mixed in a mode-2 or mode-4
drawer, but they cannot be mixed within a logical split of the drawer. For example, in a
mode-2 drawer with two sets of 12 bays, one set can hold SSDs and one set can hold
HDDs, but you cannot mix SSDs and HDDs in the same set of 12 bays.
The EXP24S, EXP24SX, and EXP12SX drawers can be mixed on the same server and
on the same PCIe3 adapters.
The EXP12SX drawer does not support SSD.
IBM i does not support the EXP12SX drawer.
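The HDD and SSD mixing rule for the EXP24SX drawer depends on the drawer mode, because drive types may differ between logical sets but not within one. The sketch below illustrates the rule. It assumes that each set covers a contiguous range of bay numbers (for example, bays 1 - 12 and 13 - 24 in mode 2), which the text does not state; the function names are hypothetical.

```python
def bay_sets(mode: int, bays: int = 24) -> list:
    """Split the drawer bays into equal sets: mode 1, 2, or 4."""
    if mode not in (1, 2, 4):
        raise ValueError("mode must be 1, 2, or 4")
    size = bays // mode
    # Assumption: sets are contiguous bay ranges, numbered from 1.
    return [range(i * size + 1, (i + 1) * size + 1) for i in range(mode)]


def mixing_allowed(mode: int, drive_types: dict) -> bool:
    """Check that no logical set mixes HDDs and SSDs.

    drive_types maps a bay number to "HDD" or "SSD"; empty bays are omitted.
    """
    for bays in bay_sets(mode):
        kinds = {drive_types[bay] for bay in bays if bay in drive_types}
        if len(kinds) > 1:
            return False
    return True


if __name__ == "__main__":
    # SSDs in bays 1 - 12 and HDDs in bays 13 - 24.
    layout = {bay: "SSD" for bay in range(1, 13)}
    layout.update({bay: "HDD" for bay in range(13, 25)})
    print(mixing_allowed(2, layout))   # one type per set of 12 bays
    print(mixing_allowed(1, layout))   # a mode-1 drawer cannot mix types
```

The same layout is acceptable in a mode-2 drawer but not in a mode-1 drawer, where all 24 bays form a single set.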
Order information: It is highly recommended that you order the Power E980 server with
an IBM 42U enterprise rack (7014-T42, 7965-S42, or #0553). This rack provides a more
complete and higher quality environment for IBM Manufacturing system assembly and
testing, and provides a complete package.
If a system is installed in a rack or cabinet that is not from IBM, ensure that the rack meets the
requirements that are described in 1.6.11, “Original equipment manufacturer racks” on
page 40.
Responsibility: The client is responsible for ensuring that the installation of the drawer in
the preferred rack or cabinet results in a configuration that is stable, serviceable, safe, and
compatible with the drawer requirements for power, cooling, cable management, weight,
and rail security.
Vertical PDUs: All PDUs that are installed in a rack containing an E980 server must be
installed horizontally to allow for cable routing in the sides of the rack.
Some of the available door options for the T42 rack are shown in Figure 1-16.
The #ERG7 provides an attractive black full-height rack door. The door is steel, with a
perforated flat front surface. The perforation pattern extends from the bottom to the top of
the door to enhance ventilation and provide some visibility into the rack. The non-acoustic
door has a depth of about 134 mm (5.3 in.).
For more information about planning for the installation of the IBM Rear Door Heat
Exchanger, see IBM Knowledge Center.
Compared to the 7965-94Y Slim Rack, the Enterprise Slim Rack provides extra strength and
shipping/installation flexibility.
The 7965-S42 rack has space for up to four PDUs in side pockets. Extra PDUs beyond four
are mounted horizontally and each uses 1U of rack space.
Vertical PDUs: All PDUs that are installed in a rack containing a Power E980 server must
be installed horizontally to enable cable routing in the sides of the rack.
The Enterprise Slim Rack front door, which can be Basic Black/Flat (#ECRM), High-End
appearance (#ECRF), or original equipment manufacturer (OEM) black (#ECRE), has
perforated steel, which provides ventilation, physical security, and visibility of indicator lights in
the installed equipment within. It comes standard with a lock that is identical to the locks in the
rear doors. The door (#ECRG and #ECRE only) can be hinged on either the left or right side.
Orientation: #ECRF should not be flipped because the IBM logo would be upside down.
Order availability: The #0551 rack is available only when ordered as an MES order. It is
not available as an initial order.
PDUs on the rack are optional. Each #7196 and #7189 PDU uses one of six vertical mounting
bays. Each PDU beyond four uses 1U of rack space.
If ordering Power Systems equipment in an MES order, use the equivalent rack #ER05
instead of 7965-94Y so that IBM Manufacturing can include the hardware in the rack.
Two possible PDU ratings are supported: 60A/63A (orderable in most countries) and 30A/32A:
The 60A/63A PDU supports four system node power supplies and one I/O expansion
drawer or eight I/O expansion drawers.
The 30A/32A PDU supports two system node power supplies and one I/O expansion
drawer or four I/O expansion drawers.
Rack-integrated system orders require at least two of either #7109, #7188, or #7196:
Intelligent PDU (iPDU) with Universal UTG0247 Connector (#7109) is for an intelligent AC
PDU that enables users to monitor the amount of power that is used by the devices that
are plugged into this PDU. This PDU provides 12 C13 power outlets. It receives power
through a UTG0247 connector. It can be used for many different countries and
applications by varying the PDU to Wall Power Cord, which must be ordered separately.
Each iPDU requires one PDU to Wall Power Cord. Supported power cords include #6489,
#6491, #6492, #6653, #6654, #6655, #6656, #6657, and #6658.
Power Distribution Unit (#7188) mounts in a 19-inch rack and provides 12 C13 power
outlets. The PDU has six 16 A circuit breakers, with two power outlets per circuit breaker.
System units and expansion units must use a power cord with a C14 plug to connect to
#7188. One of the following power cords must be used to distribute power from a wall
outlet to #7188: #6489, #6491, #6492, #6653, #6654, #6655, #6656, #6657, or #6658.
The Three-phase Power Distribution Unit (#7196) provides six C19 power outlets and is
rated up to 48 A. It has a 4.3 m (14 ft) fixed power cord to attach to the power source
(IEC309 60A plug (3P+G)). A separate “to-the-wall” power cord is not required or
orderable. Use the Power Cord 2.8 m (9.2 ft), Drawer to Wall/IBM PDU (250V/10A)
(#6665) to connect devices to this PDU. These power cords are different than the ones
that are used for the #7188 and #7109 PDUs. Supported countries for the #7196 PDU are
Antigua and Barbuda, Aruba, Bahamas, Barbados, Belize, Bermuda, Bolivia, Brazil,
Canada, Cayman Islands, Colombia, Costa Rica, Dominican Republic, Ecuador, El
Salvador, Guam, Guatemala, Haiti, Honduras, Indonesia, Jamaica, Japan, Mexico,
Netherlands Antilles, Nicaragua, Panama, Peru, Puerto Rico, Surinam, Taiwan, Trinidad
and Tobago, United States, and Venezuela.
Four PDUs can be mounted vertically in the back of the T00 and T42 racks.
Figure 1-17 shows the placement of the four vertically mounted PDUs.
In the rear of the rack, two more PDUs can be installed horizontally in the T00 rack and three
more PDUs in the T42 rack. The four vertical mounting locations are filled first in the T00 and
T42 racks. Mounting PDUs horizontally uses 1U per PDU and reduces the space that is
available for other racked components. When mounting PDUs horizontally, the preferred
approach is to use fillers in the EIA units that are occupied by these PDUs to facilitate proper
air-flow and ventilation in the rack.
The PDU receives power through a UTG0247 power-line connector. Each PDU requires one
PDU-to-wall power cord. Various power cord features are available for various countries and
applications by varying the PDU-to-wall power cord, which must be ordered separately. Each
power cord provides the unique design characteristics for the specific power requirements. To
match new power requirements and save previous investments, these power cords can be
requested with an initial order of the rack or with a later upgrade of the rack features.
Table 1-11 shows the available wall power cord options for the PDU and iPDU features, which
must be ordered separately.
Table 1-11 Wall power cord options for the PDU and iPDU features
Feature code  Wall plug            Rated voltage (V AC)  Phase  Rated amperage  Geography
6654          NEMA L6-30           200 - 208, 240        1      24 amps         US, Canada, LA, and Japan
6655          RS 3750DP            200 - 208, 240        1      24 amps         US, Canada, LA, and Japan
6492          IEC 309, 2P+G, 60 A  200 - 208, 240        1      48 amps         US, Canada, LA, and Japan
Notes: Ensure that the appropriate power cord feature is configured to support the power
that is being supplied. Based on the power cord that is used, the PDU can supply
4.8 - 19.2 kVA. The power of all the drawers plugged into the PDU must not exceed the
power cord limitation.
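The power cord limit in the preceding note can be illustrated with a short calculation. The following is a minimal sketch, not a sizing tool: it assumes that the single-phase apparent power of a cord is the rated voltage multiplied by the rated amperage from Table 1-11, and it uses the lowest rated voltage (200 V) as the default. The function names and the drawer load values in the usage note are hypothetical.

```python
from typing import Sequence

# Rated amperage per wall power cord feature, taken from Table 1-11.
CORD_RATED_AMPS = {"6654": 24, "6655": 24, "6492": 48}


def cord_capacity_kva(feature: str, volts: float = 200.0) -> float:
    """Apparent power (kVA) that a single-phase cord can carry.

    Assumption: capacity = rated voltage x rated amperage.
    """
    return volts * CORD_RATED_AMPS[feature] / 1000.0


def load_fits(feature: str, drawer_loads_kva: Sequence[float],
              volts: float = 200.0) -> bool:
    """Return True if the combined drawer load stays within the cord limit."""
    return sum(drawer_loads_kva) <= cord_capacity_kva(feature, volts)
```

Under these assumptions, a 24 A cord at 200 V corresponds to the 4.8 kVA lower bound that is quoted in the note. A call such as `load_fits("6654", [2.0, 2.5])` checks two hypothetical drawer loads against that limit.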
To better enable electrical redundancy, each server has four power supplies that must be
connected to separate PDUs, which are not included in the base order.
For maximum availability, a preferred approach is to connect power cords from the same
system to two separate PDUs in the rack, and to connect each PDU to independent power
sources.
For detailed power requirements and power cord details about the 7014 racks, see
IBM Knowledge Center.
For detailed power requirements and power cord details about the 7965-94Y rack, see
IBM Knowledge Center.
Order information: The racking approach for the initial order must be either a 7014-T42 or
7965-S42. If an extra rack is required for I/O expansion drawers as an MES to an existing
system, either an #0551, #0553, or #ER05 rack must be ordered.
If you install the Power E980 server into a T42 rack, 2U of space must be left for cable
routing. For a bottom cable exit, the 2U must be left at the bottom of the rack. For a top
cable exit, the 2U must be left at the top of the rack.
If you install the Power E980 server into an S42 rack, no space is required for cable
routing.
The IBM System Storage 7226 Multi-Media Enclosure supports LTO Ultrium and DAT160
Tape technology, DVD-RAM, and RDX removable storage requirements on the following IBM
systems:
IBM POWER6® processor-based systems
IBM POWER7® processor-based systems
IBM POWER8 processor-based systems
IBM POWER9 processor-based systems
The IBM System Storage 7226 Multi-Media Enclosure offers an expansive list of drive feature
options, as shown in Table 1-13.
5763 DVD Front USB Port Sled with DVD-RAM USB Drive Available
Removable RDX drives are in a rugged cartridge that inserts into an RDX removable (USB)
disk docking station (#1103 or #EU03). RDX drives are compatible with docking stations,
which are installed internally in IBM POWER6, IBM POWER6+™, POWER7,
IBM POWER7+™, POWER8, and POWER9 processor-based servers, where applicable.
Media that is used in the 7226 DAT160 SAS and USB tape drive features is compatible with
DAT160 tape drives that are installed internally in IBM POWER6, POWER6+, POWER7,
POWER7+, POWER8, and POWER9 processor-based servers.
Media that is used in LTO Ultrium 5 Half High 1.5 TB tape drives is compatible with Half
High LTO5 tape drives that are installed in the IBM TS2250 and TS2350 external tape drives,
IBM LTO5 tape libraries, and half-high LTO5 tape drives that are installed internally in IBM
POWER6, POWER6+, POWER7, POWER7+, POWER8, and POWER9 processor-based
servers.
Figure 1-18 shows the IBM System Storage 7226 Multi-Media Enclosure.
The IBM System Storage 7226 Multi-Media Enclosure offers a customer-replaceable unit
(CRU) maintenance service to help make the installation or replacement of new drives
efficient. Other 7226 components are also designed for CRU maintenance.
The IBM System Storage 7226 Multi-Media Enclosure is compatible with most POWER6,
POWER6+, POWER7, POWER7+, POWER8, and POWER9 processor-based systems that
offer current level AIX, IBM i, and Linux OSes.
For a complete list of host software versions and release levels that support the IBM System
Storage 7226 Multi-Media Enclosure, see System Storage Interoperation Center (SSIC).
Note: Any of the existing 7216-1U2, 7216-1U3, and 7214-1U2 multimedia drawers are
also supported.
The Model TF4 is a follow-on product to the Model TF3 and offers the following features:
A slim, sleek, and lightweight monitor design that occupies only 1U (1.75 in.) in a 19-inch
standard rack.
An 18.5-inch (409.8 mm x 230.4 mm) flat panel TFT monitor with truly accurate images and
virtually no distortion.
The ability to mount the IBM Travel Keyboard in the 7316-TF4 rack keyboard tray.
Support for the IBM 1x8 Rack Console Switch (#4283) IBM Keyboard/Video/Mouse (KVM)
switches.
#4283 is a 1x8 Console Switch that fits in the 1U space behind the TF4. It is a CAT5-based
switch containing eight analog rack interface (ARI) ports for connecting either PS/2 or USB
console switch cables. It supports chaining of servers that use an IBM Conversion Options
switch cable (#4269). This feature provides four cables that connect a KVM switch to a
system, or can be used in a daisy-chain scenario to connect up to 128 systems to a single
KVM switch. It also supports server-side USB attachments.
Figure 1-19 Top view of the rack specification dimensions (not specific to IBM)
The vertical distance between the mounting holes must consist of sets of three holes
spaced (from bottom to top) 15.9 mm (0.625 in.), 15.9 mm (0.625 in.), and 12.7 mm (0.5
in.) on-center, making each three-hole set of vertical hole spacing 44.45 mm (1.75 in.)
apart on center. Rail-mounting holes must be 7.1 mm ± 0.1 mm (0.28 in. ± 0.004 in.) in
diameter.
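The hole-spacing requirement can be checked with simple arithmetic. The following sketch uses the inch values from the text (0.625 in., 0.625 in., and 0.5 in., where 0.5 in. equals 12.7 mm) and confirms that one three-hole set spans 1.75 in. (44.45 mm). The function names are hypothetical, and the diameter check assumes that the stated ± 0.1 mm tolerance is inclusive.

```python
MM_PER_INCH = 25.4

# On-center hole spacing within one three-hole set, bottom to top (inches).
HOLE_SPACING_IN = (0.625, 0.625, 0.5)


def hole_set_height_mm(spacing_in=HOLE_SPACING_IN) -> float:
    """Height of one three-hole set in millimeters."""
    return sum(spacing_in) * MM_PER_INCH


def hole_diameter_ok(diameter_mm: float, nominal_mm: float = 7.1,
                     tolerance_mm: float = 0.1) -> bool:
    """True if a rail-mounting hole is within 7.1 mm +/- 0.1 mm.

    A small epsilon absorbs floating-point rounding at the tolerance edge.
    """
    return abs(diameter_mm - nominal_mm) <= tolerance_mm + 1e-9
```

A rack from another manufacturer could be screened by measuring a few holes and passing the values to `hole_diameter_ok`, for example `hole_diameter_ok(7.15)`.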
Figure 1-20 shows the top front specification dimensions.
The HMC also supports advanced service functions, including guided repair and verification,
concurrent firmware updates for managed systems, and around-the-clock error reporting
through IBM Electronic Service Agent™ (ESA) for faster support.
The HMC management features help improve server usage, simplify systems management,
and accelerate provisioning of server resources by using IBM PowerVM® virtualization.
Hardware support for CRUs comes as standard with the HMC. In addition, users can upgrade
this support level to IBM onsite support to be consistent with other Power Systems servers.
An HMC or vHMC is required for the Power E980 server.
Integrated Virtualization Manager (IVM) is no longer supported.
For more information about vHMC, see Virtual HMC Appliance (vHMC) Overview.
Multiple Power Systems servers can be managed by a single HMC. Each server can be
connected to multiple HMC consoles to build extra resiliency into the management platform.
If you are attaching an HMC to a new server or adding a function to an existing server that
requires a firmware update, the HMC machine code might need to be updated to support the
firmware level of the server. In a dual-HMC configuration, both HMCs must be at the same
version and release of the HMC code.
To determine the HMC machine code level that is required for the firmware level on any
server, go to Fix Level Recommendation Tool (FLRT) on or after the planned availability date
for this product.
FLRT identifies the correct HMC machine code for the selected system firmware level.
Access to firmware and machine code updates is conditional on entitlement and license
validation in accordance with IBM policy and practices. IBM might verify entitlement
through customer number, serial number electronic restrictions, or any other means or
methods that are employed by IBM at its discretion.
HMC V9 supports only the Enhanced+ version of the GUI. The Classic version is no
longer available.
HMC V9R1.911.0 added support for managing IBM OpenPOWER systems. The same
HMC that is used to manage FSP-based enterprise systems can now manage the
baseboard management controller (BMC)-based AC/LC servers. This provides a
consistent and consolidated hardware management solution.
HMC V9 supports connections to IBM servers that are based on POWER9, POWER8,
and POWER7 processors. There is no support in this release
for servers that are based on POWER6 processors or earlier.
You may use either architecture to manage the servers. You also may use one Intel-based
HMC and one POWER8 processor-based HMC if the software is at the same level.
As a preferred practice, use the new POWER8 processor-based consoles for server
management.
Intel-based HMCs
HMCs that are based on Intel processors that support V9 code are:
7042-CR6 and earlier HMCs are not supported for use with the Power E980 server.
All future HMC development will be done for the POWER8 processor-based 7063-CR1 and
its successors.
Note: System administrators can remotely start or stop a 7063-CR1 HMC by using the
ipmitool command or the WebUI.
For the HMC to communicate properly with the managed server, eth0 of the HMC must be
connected to either the HMC1 or HMC2 ports of the managed server, although other network
configurations are possible. You may attach a second HMC to the remaining HMC port of the
server for redundancy. The two HMC ports must be addressed by two separate subnets.
Figure 1-23 shows a simple network configuration to enable the connection from the HMC to
the server and to allow for dynamic LPAR operations. For more information about HMC and
the possible network connections, see IBM Power Systems HMC Implementation and Usage
Guide, SG24-7491.
Figure 1-23 Network connections from the HMC to service processor and LPARs
By default, the SP HMC ports are configured for dynamic IP address allocation. The HMC can
be configured as a DHCP server, providing an IP address at the time that the managed server
is powered on. In this case, the FSP is allocated an IP address from a set of address ranges
that is predefined in the HMC software.
If the SP of the managed server does not receive a DHCP reply before timeout, predefined IP
addresses are set up on both ports. Static IP address allocation is also an option and can be
configured by using the Advanced System Management Interface (ASMI) menus.
To achieve HMC redundancy for a POWER9 processor-based server, the server must be
connected to two HMCs:
The HMCs must be running the same level of HMC code.
The HMCs must use different subnets to connect to the SP.
The HMCs must be able to communicate with the server’s partitions over a public network
to allow for full synchronization and function.
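The first two redundancy rules in the preceding list can be expressed as a simple pre-check. The following is a minimal sketch that uses only the Python standard library. The dictionary layout, the function name, and the example addresses are hypothetical and are not part of any HMC interface. The third rule (communication with the partitions over a public network) is not covered here.

```python
import ipaddress


def hmc_pair_is_valid(hmc_a: dict, hmc_b: dict) -> bool:
    """Check two HMC descriptions against the first two redundancy rules.

    Each dictionary is a hypothetical record with:
      code_level   - HMC code version and release as a string
      sp_interface - address/prefix of the HMC port that connects to the SP
    """
    # Rule 1: both HMCs run the same level of HMC code.
    same_level = hmc_a["code_level"] == hmc_b["code_level"]

    # Rule 2: the HMCs use different subnets to connect to the SP.
    net_a = ipaddress.ip_interface(hmc_a["sp_interface"]).network
    net_b = ipaddress.ip_interface(hmc_b["sp_interface"]).network
    different_subnets = not net_a.overlaps(net_b)

    return same_level and different_subnets
```

For example, two HMCs at the same code level with the hypothetical interfaces `192.168.10.1/24` and `192.168.20.1/24` pass the check, and two HMCs on `192.168.10.0/24` do not.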
Figure 1-24 shows one possible highly available HMC configuration that manages two
servers. Each HMC is connected to one FSP port of each managed server.
For simplicity, only the hardware management networks (LAN1 and LAN2) are highly
available. However, the open network (LAN3) can be made highly available by using a similar
concept and adding a second network between the partitions and HMCs.
For more information about redundant HMCs, see IBM Power Systems HMC Implementation
and Usage Guide, SG24-7491.
The speeds that are shown are at an individual component level. Multiple components and
application implementation are key to achieving the best performance.
Always do performance sizing at the application workload environment level and evaluate
performance by using real-world performance measurements and production workloads.
Figure 2-1 shows the logical system architecture of the Power E980 server.
Figure 2-1 Logical system architecture of the Power E980 server
Figure 2-2 shows the symmetric multiprocessing (SMP) connections between nodes for 2-,
3-, and 4-drawer configurations.
Figure 2-2 SMP connections between nodes for 2-, 3-, and 4-drawer configurations
In the diagram, each node contains processors P0 - P3, and each processor has O-bus ports
O0 - O3. Each connection that is shown in the diagram represents a pair of SMP cables. The
SMP cable lengths are 880 mm, 1005 mm, 1225 mm, 1425 mm, and 1625 mm.
As shown in Figure 2-3, the chip contains 12 cores, two memory controllers, Peripheral
Component Interconnect Express (PCIe) Gen4 I/O controllers, and an interconnection system
that connects all components within the chip at 7 TBps. Each core has 512 KB of level 2
cache, and 10 MB of level 3 embedded DRAM (eDRAM) cache. The interconnect also
extends through module and system board technology to other POWER9 processors in
addition to memory and various I/O devices.
The Power E980 server uses memory buffer chips to interface between the POWER9
processor and the DDR3 or DDR4 memory. Each buffer chip also includes an L4 cache to
reduce the latency of local memory accesses.
The POWER9 chip provides an embedded algorithm for the following features:
External Interrupt Virtualization Engine. Reduces the code impact/path length and
improves performance compared to the previous architecture.
Compression and decompression.
PCIe Gen4 support.
Two memory controllers that support buffered connection to DDR3 or DDR4 memory.
Cryptography: Advanced encryption standard (AES) engine.
Random number generator (RNG).
Secure Hash Algorithm (SHA) engine: SHA-1, SHA-256, and SHA-512, and Message
Digest 5 (MD5)
IBM Data Mover Tool
Note: The total values represent the maximum of 12 cores for the POWER9
processor-based architecture. The Power E980 server has options for 32, 40, 44, and 48
cores per node.
Enhanced branch prediction that uses both local and global prediction tables with a
selector table to choose the best predictor
Improved out-of-order execution
Two symmetric fixed-point execution units
Two symmetric load/store units and two load units, all four of which can also run simple
fixed-point instructions
An integrated, multi-pipeline vector-scalar floating point unit for running both scalar and
SIMD-type instructions, including the Vector Multimedia eXtension (VMX) instruction set
and the improved Vector Scalar eXtension (VSX) instruction set, which is capable of up to
16 floating point operations per cycle (eight double precision or 16 single precision)
In-core AES encryption capability
Hardware data prefetching with 16 independent data streams and software control
Hardware decimal floating point (DFP) capability
For more information about Power ISA Version 3.0, see OpenPOWER: IBM Power ISA
Version 3.0B.
Figure 2-4 shows a picture of the POWER9 core, with some of the functional units
highlighted.
SMT enables a single physical processor core to simultaneously dispatch instructions from
more than one hardware thread context. With SMT, each POWER9 core can present eight
hardware threads. Because there are multiple hardware threads per physical processor core,
more instructions can run at the same time.
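The effect of SMT on the number of hardware threads can be shown with a small calculation. The following sketch assumes eight hardware threads per core, the cores-per-node options that are listed for the Power E980 server (32, 40, 44, and 48), and one to four system nodes. The function name is hypothetical.

```python
SMT_THREADS_PER_CORE = 8                   # each POWER9 core presents eight threads
CORES_PER_NODE_OPTIONS = (32, 40, 44, 48)  # options named for the Power E980
MAX_NODES = 4


def hardware_threads(cores_per_node: int, nodes: int) -> int:
    """Total hardware threads when all cores are active and run eight threads."""
    if cores_per_node not in CORES_PER_NODE_OPTIONS:
        raise ValueError("unsupported cores-per-node option")
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError("node count must be between 1 and 4")
    return cores_per_node * nodes * SMT_THREADS_PER_CORE
```

Under these assumptions, `hardware_threads(48, 4)` gives the thread count of a fully configured 4-node system with every core active.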
Table 2-2 shows a comparison between the different POWER processors in terms of SMT
capabilities that are supported by each processor architecture.
Table 2-3 shows the processor feature codes (FCs) that are available for the Power E980
server.
EFB1 5C35 CBU for Power Enterprise Systems 3.20 GHz, 32-core POWER9
EFB2 5C36 CBU for Power Enterprise Systems 2.80 GHz, 40-core POWER9
EFB3 5C39 CBU for Power Enterprise Systems 2.90 GHz, 48-core POWER9
EFB4 5C46 CBU for Power Enterprise Systems 3.0 GHz, 44-core POWER9
EHC6 5C36 Solution Edition for Healthcare 3.15 GHz, 40-core Processor
Processors in the Power E980 system support Capacity on Demand (CoD). For more
information about CoD, see 2.3, “Capacity on Demand” on page 78.
CoD features that are independent of the processor feature are shown in Table 2-5.
With a new system order, the DDR4 technology-based CDIMMs are available with 32 GB,
64 GB, 128 GB, 256 GB, and 512 GB capacity. Also, the POWER9 memory channels support
the same electrical signaling, transport layer characteristics, and high-level, neutral read/write
protocol as the POWER8 counterparts on Power E870, Power E870C, Power E880, and
Power E880C servers. This enables the option to reuse DDR3 and DDR4 CDIMMs when
transferred as part of a model upgrade from the named POWER8 high-end servers to a
Power E980 system.
The maximum supported memory capacity per processor module is 4 TB, which requires the
use of 512 GB CDIMMs in all eight available CDIMM slots of a module. A maximum of 16 TB
of main memory can be provided by the four processor modules that are available in one
system node. A 4-node Power E980 system makes up to 64 TB of system memory accessible
to configured logical partitions (LPARs).
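The capacity figures in the preceding paragraph follow from the slot counts. The following sketch reproduces the arithmetic under two assumptions: every CDIMM slot holds a CDIMM of the same capacity, and 1 TB equals 1024 GB. The constant and function names are hypothetical, and the actual memory placement rules are described in 2.2, "Memory subsystem".

```python
CDIMM_SIZES_GB = (32, 64, 128, 256, 512)  # capacities available with a new order
CDIMM_SLOTS_PER_MODULE = 8                # CDIMM slots per processor module
MODULES_PER_NODE = 4                      # processor modules per system node


def memory_tb(cdimm_gb: int, nodes: int = 1, modules: int = MODULES_PER_NODE) -> float:
    """Memory in TB when every slot holds a CDIMM of the given capacity."""
    if cdimm_gb not in CDIMM_SIZES_GB:
        raise ValueError("unsupported CDIMM capacity")
    total_gb = cdimm_gb * CDIMM_SLOTS_PER_MODULE * modules * nodes
    return total_gb / 1024
```

With 512 GB CDIMMs, `memory_tb(512, modules=1)` returns the per-module maximum, `memory_tb(512)` the per-node maximum, and `memory_tb(512, nodes=4)` the maximum of a 4-node system.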
Figure 2-5 shows the POWER9 hierarchical memory subsystem of a Power E980 system.
Figure 2-5 POWER9 hierarchical memory subsystem that uses memory buffers
For more information about the CDIMM technology, memory placement rules, memory
bandwidth, and other topics that are related to the memory subsystem of the Power E980
server, see 2.2, “Memory subsystem” on page 68.
Like the POWER8 processor, the POWER9 processor supports the same L3 non-uniform
cache access (NUCA) architecture that provides mechanisms to distribute and share cache
footprints across the chip. The on-chip L3 cache is organized into separate areas with
differing latency characteristics. Each processor core is associated with a fast 10 MB local
region of L3 cache (FLR-L3), but also has access to other L3 cache regions as a shared L3
cache. Additionally, each core can negotiate to use the FLR-L3 cache that is associated with
another core, depending on the reference patterns. Data can also be cloned and stored in
more than one core’s FLR-L3 cache, again depending on the reference patterns.
This intelligent cache management enables the POWER9 processor to optimize the access to
L3 cache lines and minimize overall cache latencies. Compared to the POWER8 L3
implementation, the POWER9 L3 introduces an enhanced replacement algorithm with data
type and reuse awareness that uses information from the core and L2 cache to manage
cache replacement states. The L3 cache supports an array of prefetch requests from the
core, including both instruction and data, and for different levels of urgency. Prefetch requests
for POWER9 cache include more information exchange between the core, cache, and the
memory controller to manage memory bandwidth and to mitigate the prefetch-based cache
pollution.
The following list provides an overview of the features that are offered by the POWER9 L3
cache:
Private 10 MB L3.0 cache/shared L3.1:
– Victim cache for local L2 cache (L3.0)
– Victim cache for other on-chip L3 caches (L3.1)
20-way set associative.
128-byte cache lines with 64-byte sector support.
Ten eDRAM banks (interleaved for access overlapping).
64-byte wide data bus to L2 for reads.
64-byte wide data bus from L2 for L2 castouts.
Eighty 1 Mb eDRAM macros that are configured in 10 banks, with each bank having a
64-byte wide data bus.
All cache accesses have the same latency.
20-way directory that is organized as four banks, with up to four reads or two reads and
two writes every two processor clock cycles to differing banks.
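Several of the figures in the preceding list are related by simple arithmetic. The following sketch shows that eighty 1 Mb eDRAM macros add up to the 10 MB L3 region of a core, and it derives the number of sets from the 20-way associativity and the 128-byte line size. The set count is a derived value that is not stated in this paper, and the calculation assumes binary units (1 Mb = 1,048,576 bits). The function names are hypothetical.

```python
BITS_PER_MBIT = 1024 * 1024
BYTES_PER_MB = 1024 * 1024


def l3_region_bytes(macros: int = 80, macro_mbit: int = 1) -> int:
    """Capacity of one L3 region that is built from eDRAM macros."""
    return macros * macro_mbit * BITS_PER_MBIT // 8


def l3_sets(capacity_bytes: int, ways: int = 20, line_bytes: int = 128) -> int:
    """Number of sets in a set-associative cache (derived, not from the paper)."""
    return capacity_bytes // (ways * line_bytes)


def chip_l3_mb(cores: int = 12, per_core_mb: int = 10) -> int:
    """Total on-chip L3 capacity for a given number of cores."""
    return cores * per_core_mb
```

Under these assumptions, `l3_region_bytes() // BYTES_PER_MB` confirms the 10 MB region size, and `chip_l3_mb()` gives the total L3 capacity of the 12-core chip.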
The L3 cache architecture of the 12-core POWER9 processor is identical to the 24-core
POWER9 implementation. For more information about the L3 cache technology, see
POWER9 Processor User’s Manual.
For more information about the L3 cache in the context of the POWER9 core architecture, see
H. Le, et al., IBM POWER9 processor core, IBM Journal of Research & Development Volume
62 Number 4/5, July/September 2018, which you can search for at IBM Journal of Research
& Development.
The CAPI protocol uses the PCIe Gen4 bus, which is natively supported on the POWER9
processor die. CAPI-capable accelerators are implemented as adapters that are placed in a
CAPI-enabled PCIe Gen3 or Gen4 slot. The maximum bandwidth of a CAPI accelerator is
limited by the PCIe bandwidth, which is 64 GBps for a x16 PCIe Gen4 adapter slot in a
POWER9 processor-based system.
All eight x16 PCIe Gen4 slots in a Power E980 system node are enabled for CAPI support,
which yields a maximum of 32 CAPI-attached accelerators per 4-node system. The CAPI
protocol has been developed and standardized since 2013 by the OpenPOWER Foundation.
For more information about the CAPI protocol, see the OpenPOWER Foundation.
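The 64 GBps figure and the count of 32 accelerators can be reproduced with a short calculation. The following sketch assumes that the 64 GBps value is the raw bidirectional rate of an x16 link at the PCIe Gen4 transfer rate of 16 GT/s per lane, without accounting for encoding or protocol overhead. This interpretation is an assumption and is not stated in the text. The function names are hypothetical.

```python
PCIE_GEN4_GT_PER_LANE = 16   # giga-transfers per second, per lane, per direction
CAPI_SLOTS_PER_NODE = 8      # x16 PCIe Gen4 slots per Power E980 system node


def pcie_raw_gbytes_per_s(lanes: int = 16,
                          gt_per_lane: int = PCIE_GEN4_GT_PER_LANE,
                          bidirectional: bool = True) -> float:
    """Raw link rate in GBps, ignoring encoding and protocol overhead."""
    one_way = lanes * gt_per_lane / 8   # one bit per transfer, 8 bits per byte
    return one_way * 2 if bidirectional else one_way


def max_capi_accelerators(nodes: int) -> int:
    """CAPI-capable slots across the given number of system nodes."""
    return nodes * CAPI_SLOTS_PER_NODE
```

Under this assumption, each direction of an x16 slot carries half of the quoted value, which `pcie_raw_gbytes_per_s(bidirectional=False)` returns.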
On Power E980 systems, OpenCAPI-attached accelerators and devices and NVLink graphics
processing unit (GPU) connections are supported by two buses per POWER9 processor.
These buses use the same interconnect technology that facilitates the SMP communication
between Power E980 nodes and provides a combined transfer capacity of 48 lanes running
with a signaling rate of 25.78 Gbps. Each system node has four interconnect buses, which
are referred to as O-buses O0, O1, O2, and O3, and which are designed to support the SMP,
OpenCAPI, or NVLink protocol.
On Power E980 system nodes, the buses are configured to support the following protocols:
O0: SMP, NVLink, or OpenCAPI
O1: SMP only
O2: SMP only
O3: SMP, NVLink, or OpenCAPI
The OpenCAPI technology is developed and standardized by the OpenCAPI Consortium. For
more information about the consortium’s mission and the OpenCAPI protocol specification,
see OpenCAPI Consortium.
The NVLink 2.0 protocol is natively supported by dedicated logic on the POWER9 processor
die. By using it, you can coherently attach NVIDIA GPUs through a maximum of two O-buses
per processor. Each NVLink O-bus is composed of two bricks, and each brick provides eight
data lanes in NVLink mode running at a 25 Gbps signaling rate. The maximum bandwidth of
one O-bus that is used to attach an NVLink GPU is 103.12 GBps, as calculated by the
following formula, which uses the 25.78 Gbps O-bus lane signaling rate and counts both
transfer directions:
2 bricks x 8 lanes x 25.78 Gbps x 2 directions / 8 bits per byte = 103.12 GBps
The NVLink technology is developed by the NVIDIA Corporation. For more information about
the NVLink protocol, see NVIDIA NVLink.
Note: The Power E980 system supports the OpenCAPI and NVLink protocols, but at the
time of writing, these technologies, including OpenCAPI-attached accelerators, are included
for future reference only.
One of the key benefits of CAPI is that the devices gain coherent shared memory access with
the processors in the server and share full virtual address translation with these processors
by using the standard PCIe interconnect logic. In CAPI 1.0, the address translation logic of
the attached devices or accelerators is implemented as POWER Service Layer (PSL) on the
FPGA or ASIC. To ensure cache coherency, the PSL exchanges the relevant address
information with the coherent accelerator processor proxy (CAPP) unit that is on the
POWER8 processor chip.
Applications can have customized functions in FPGAs or ASICs and queue work requests
directly into shared memory queues to the accelerator logic. Applications can also have
customized functions by using the same effective addresses that they use for any threads
running on a host processor. From a practical perspective, CAPI enables a specialized
hardware accelerator to be seen as a dedicated processor (hollow core) in the system with
access to the main system memory and coherent communication with other processors in the
system.
CAPI 2.0 was introduced with the POWER9 processor-based technology and represents the
next step in the evolutionary development to enhance the architecture and the protocol for the
attachment of accelerators. CAPI 2.0 uses the standard PCIe Gen4 interface of the POWER9
processor, which provides twice the bandwidth compared to the previous PCIe Gen3
interconnect generation.
A key difference between CAPI 1.0 and CAPI 2.0 lies in the introduction of the Nest
Memory Management Unit (NMMU) with POWER9. The NMMU replaces the address
translation and page fault logic inside the PSL. CAPI 2.0 on POWER9 still requires a PSL to
control the PCIe bus, provide the memory mapped I/O (MMIO) support, and generate
AFU-specific interrupts. However, by taking the address translation and page fault logic out of
the POWER9 PSL, it has become a “lighter” version of the POWER8 PSL, which potentially
reduces the complexity of accelerator development.
Figure 2-6 shows a block diagram of the CAPI 2.0 POWER9 hardware architecture.
The POWER9 processor modules of the Power E980 server provide fault handling for the
The benefits of using CAPI include the ability to access shared memory blocks directly from
the accelerator, perform memory transfers directly between the accelerator and processor
cache, and reduce the code path length between the adapter and the processors. This
reduction in the code path length might occur because the adapter is not operating as a
traditional I/O device, and there is no device driver layer to perform processing. CAPI also
presents a simpler programming model.
Figure 2-7 compares the traditional model, where the accelerator must go
through the processor to access memory, with CAPI.
As mentioned before with CAPI 1.0 on POWER8, the PSL on the accelerator adapter
provides address translation and system memory cache for the accelerator functions. The
custom processors on the adapter board, consisting of an FPGA or an ASIC, use this layer to
access shared memory regions and cache areas as though they were a processor in the
system. This ability enhances the performance of the data access for the device and
simplifies the programming effort to use the device. Instead of treating the hardware
accelerator as an I/O device, it is treated as a processor, which eliminates the requirement of
a device driver to perform communication. It also eliminates the need for direct memory
access (DMA) that requires system calls to the OS kernel. By removing these layers, the data
transfer operation requires fewer clock cycles in the processor, improving the I/O
performance.
With CAPI 2.0 on POWER9, the address translation and page fault logic are moved to the
native NMMU on the POWER9 processor module, but because the accelerator has direct
access to this functional unit, the benefit of reduced path length in the programming model
stays the same, and the cache coherency control of the unified address space is significantly
Existing system interfaces are insufficient to address these disruptive forces. Traditional I/O
architecture results in high processor impact when applications communicate with I/O or
accelerator devices at the necessary performance levels. Also, they cannot integrate multiple
memory technologies with different access methods and performance attributes.
These challenges are addressed by the OpenCAPI architecture in a way that allows full
industry participation. Embracing an open architecture is fundamental to establish sufficient
volume base to lower costs and ensure the support of a broad infrastructure of software
products and attached devices.
OpenCAPI is an open interface architecture that allows any microprocessor to attach to the
following items:
Coherent user-level accelerators and I/O devices
Advanced memories accessible through read/write or user-level DMA semantics
OpenCAPI is neutral to processor architecture and exhibits the following key attributes:
High-bandwidth, low latency interface that is optimized to enable streamlined
implementations of attached devices.
A 25 Gbps signaling and protocol that enables a low latency interface on processors and
attached devices.
Complexities of coherence and virtual addressing that are implemented on a host
microprocessor to simplify attached devices and facilitate interoperability across multiple
CPU architectures.
Attached devices operate natively within an application’s user space and coherently with
processors, enabling attached devices to fully participate in applications without kernel
Supports a wide range of use cases and access semantics:
– Hardware accelerators
– High-performance I/O devices
– Advanced memories
Every POWER9 processor-based scale-out or scale-up system has either DPM or MPM
enabled by default. Both modes dynamically adjust processor frequency to maximize
performance and enable a much higher processor frequency range in comparison to
POWER8 servers. Each of the new power saver modes delivers consistent system
performance without any variation if the nominal operating environment limits are met.
For POWER9 processor-based systems that are under control of the PowerVM hypervisor,
the DPM and MPM are a system-wide configuration setting, but each processor module
frequency is optimized separately.
Several factors determine the maximum frequency that a processor module can run at:
Processor utilization: Lighter workloads run at higher frequencies.
Number of active cores: Fewer active cores run at higher frequencies.
Environmental conditions: At lower ambient temperatures, cores are enabled to run at
higher frequencies.
Figure 2-8 shows the frequency ranges for the POWER9 static nominal mode (all modes
disabled), the DPM, and the MPM. The frequency adjustments for different workload
characteristics, ambient conditions, and idle states are also indicated.
[Figure 2-8 panels: static nominal mode, dynamic performance mode, and maximum performance mode. Annotations: light workloads or many idle cores; light/medium workloads or a few idle cores; system idle (if IPS on); idle power.]
Figure 2-8 POWER9 power management modes and related frequency ranges
Table 2-6 shows the static nominal and the static power saver mode frequencies, and the
frequency ranges of the DPM and the MPM for all four processor module types that are
available for the Power E980 system.
Table 2-6 Characteristic frequencies and frequency ranges for Power E980 server
Feature code | Cores per single-chip module | Static nominal frequency [GHz] | Static power saver mode frequency [GHz] | Dynamic performance mode frequency range [GHz] | Maximum performance mode frequency range [GHz]
Figure 2-9 shows the POWER9 processor frequency as a function of power management
mode and system utilization.
Figure 2-9 POWER9 processor frequency as a function of power management mode and system load
The default performance mode depends on the POWER9 processor-based server model. For
Power E980 systems, the MPM is enabled by default.
The controls for all power saver modes are available on the Advanced System Management
Interface (ASMI) and can be dynamically modified. This includes enabling or disabling the
IPS function and changing the EnergyScale tunable parameters. A system administrator may
also use the Hardware Management Console (HMC) to disable all power saver modes or to
enable one of the three available power and performance modes: static power saver mode,
DPM, or MPM.
Figure 2-10 shows the ASMI menu for Power and Performance Mode Setup on a Power E980
server.
Figure 2-10 Power E980 ASMI menu for power and performance mode setup
For more information about the POWER9 EnergyScale technology, see POWER9
EnergyScale Introduction.
Table 2-7 shows key features and characteristics in comparison between the POWER9
scale-up, POWER9 scale-out, POWER8, and POWER7+ processor implementations.
Table 2-7 Comparison of technology for the POWER9 processor and prior processor generations
Characteristics POWER9 scale-up POWER9 scale-out POWER8 POWER7+
Technology 14 nm 14 nm 22 nm 32 nm
Die size 68.5 mm x 68.5 mm 68.5 mm x 68.5 mm 649 mm2 567 mm2
Maximum cores 12 24 12 8
Maximum SMT threads per core Eight threads Four threads Eight threads Four threads
Maximum frequency 3.9 - 4.0 GHz 3.8 - 4.0 GHz 4.15 GHz 4.4 GHz
L2 Cache 512 KB per core 256 KB per core 512 KB per core 256 KB per core
Memory support DDR4 and DDR3a DDR4 DDR3 and DDR4 DDR3
The Power E980 system supports both DDR3 and DDR4 versions of the CDIMM. Mixing
DDR3 and DDR4 CDIMMs in the same system node is not supported, but a Power E980
server can have DDR3 CDIMMs in one system node and DDR4 CDIMMs in another
system node.
New orders of Power E980 servers can be configured only with DDR4 CDIMMs of 32 GB,
64 GB, 128 GB, 256 GB, and 512 GB capacities.
To provide significant investment protection, the 16 GB, 32 GB, 64 GB, 128 GB, and 256 GB
DDR4 CDIMMs and the 16 GB, 32 GB, 64 GB, and 128 GB DDR3 CDIMMs of the Power E870,
Power E870C, Power E880, and Power E880C servers are supported in the context of model
upgrades to Power E980 systems.
The memory subsystem of the Power E980 server enables a maximum system memory of
16 TB per system node. A 4-node system can support up to 64 TB of system memory.
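The capacity limits above follow from the slot count. As a rough check, the arithmetic can be sketched as follows, assuming 32 CDIMM slots per system node (eight memory quads of four CDIMMs each) and the largest 512 GB CDIMM in every slot. The constant and function names are illustrative only.

```python
# Sketch: maximum memory of a Power E980 derived from slot count and CDIMM size.
# Assumption: 32 CDIMM slots per node (8 memory quads x 4 CDIMMs per quad).
SLOTS_PER_NODE = 8 * 4
LARGEST_CDIMM_GB = 512


def max_memory_tb(nodes: int) -> float:
    """Return the maximum installable memory in TB for 1 to 4 system nodes."""
    if not 1 <= nodes <= 4:
        raise ValueError("a Power E980 server has one to four system nodes")
    return nodes * SLOTS_PER_NODE * LARGEST_CDIMM_GB / 1024


print(max_memory_tb(1))  # 16.0
print(max_memory_tb(4))  # 64.0
```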
The memory of Power E980 systems is CoD-capable, allowing for the purchase of extra
physical memory capacity that can then be dynamically activated when needed. At least 50% of the
installed memory capacity must be active.
The Power E980 server supports an optional feature called Active Memory Expansion (AME)
(#EM89). This allows the effective maximum memory capacity to be much larger than the true
physical memory. This feature uses a dedicated coprocessor on the POWER9 processor to
compress memory pages as they are written to and decompress them as they are read from
memory. This can deliver memory expansion of up to 125%, depending on the workload type
and its memory usage.
By adopting this architecture for the memory DIMMs, several decisions and processes
regarding memory optimizations are run internally in the CDIMM, which saves bandwidth and
allows for faster processor-to-memory communications. This also allows for a more robust
RAS. For more information, see Chapter 4, “Reliability, availability, serviceability, and
manageability” on page 133.
Depending on the memory capacity, the CDIMMs are manufactured in a Tall CDIMM or a
Short CDIMM form factor. The 16 GB, 32 GB, and 64 GB CDIMMs are Short CDIMMs and
the 128 GB, 256 GB, and 512 GB CDIMMs are the Tall CDIMMs. Each design is composed of
a varying number of 4 Gb or 8 Gb SDRAM chips depending on the total capacity of the
CDIMM. The large capacity 256 GB and 512 GB CDIMMs are based on two-high (2H) and
four-high (4H) 3D-stacked (3DS) DRAM technology.
The CDIMM slots for the Power E980 server are Tall CDIMM slots. A filler is added to a Short
CDIMM allowing it to properly latch into the same physical location of a Tall CDIMM and
ensure proper airflow and ease of handling. Tall CDIMM slots allow for larger DIMM sizes
and potentially a more seamless adoption of future technologies.
A detailed diagram of the CDIMMs that are available for the Power E980 server is shown in
Figure 2-11.
Figure 2-11 labels (recovered from the diagram): each CDIMM combines a DDR interface, a scheduler and management unit, and 16 MB of buffer memory.
Tall CDIMM types:
128 GB = 152 DDR4 8Gb SDRAM chips
256 GB = 152 2H 3DS DDR4 8Gb SDRAM chips
512 GB = 152 4H 3DS DDR4 8Gb SDRAM chips
Short form factor (SFF) CDIMM types, including filler:
32 GB = 40 DDR4 8Gb SDRAM chips
64 GB = 80 DDR4 8Gb SDRAM chips
The memory buffer is an L4 cache and is built on eDRAM technology (same as the L3 cache),
which has a lower latency than regular SRAM. Each CDIMM has 16 MB of L4 cache, and a
fully populated Power E980 server has 2 GB of L4 cache. The L4 cache performs several
functions that have direct impact on performance and bring a series of benefits to the
Power E980 server:
Reduces energy consumption by reducing the number of memory requests.
Increases memory write performance by acting as a cache and by grouping several
random writes into larger transactions.
Partial write operations that target the same cache block are gathered within the L4 cache
before being written to memory, becoming a single write operation.
Reduces latency on memory access. Memory access for cached blocks has up to 55%
lower latency than non-cached blocks.
All the memory CDIMMs are capable of Capacity Upgrade on Demand (CUoD) and must
have a minimum of 50% of their physical capacity activated. For example, the minimum
installed memory for a Power E980 server is 512 GB, which requires a minimum of 256 GB
memory activations.
For the Power E980 server, the following 1600 MHz DDR4 DRAM memory options are
available when placing an initial order:
128 GB (4 x 32 GB) (#EF20)
256 GB (4 x 64 GB) (#EF21)
512 GB (4 x 128 GB) (#EF22)
1024 GB (4 x 256 GB) (#EF23)
2048 GB (4 x 512 GB) (#EF24)
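As a sketch, the initial-order memory features above can be tabulated to total an order and to derive the 50% minimum activation that is stated earlier. Only the feature codes and capacities come from the list above; the dictionary and function names are illustrative.

```python
# Initial-order DDR4 memory features: feature code -> (CDIMMs per feature, GB per CDIMM).
MEMORY_FEATURES = {
    "EF20": (4, 32),
    "EF21": (4, 64),
    "EF22": (4, 128),
    "EF23": (4, 256),
    "EF24": (4, 512),
}


def installed_memory_gb(order: dict[str, int]) -> int:
    """Total installed memory in GB for an order given as {feature code: quantity}."""
    total = 0
    for feature, quantity in order.items():
        cdimms, size_gb = MEMORY_FEATURES[feature]
        total += quantity * cdimms * size_gb
    return total


def min_activation_gb(installed_gb: int) -> int:
    """At least 50% of the installed capacity must be active (rounded up to a whole GB)."""
    return (installed_gb + 1) // 2


order = {"EF20": 4}  # four quads of 32 GB CDIMMs
installed = installed_memory_gb(order)
print(installed, min_activation_gb(installed))  # 512 256
```

Four #EF20 features reproduce the 512 GB minimum installed memory and the 256 GB minimum activation that are given as the example in the text.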
Each processor module has two memory controllers. Each memory controller must have at
least a pair of CDIMMs attached to it. This set of four mandatory CDIMMs is called a
memory quad.
A logical diagram of a POWER9 processor with its two memory quads attached to the
memory controllers MC0 and MC1 is shown in Figure 2-12.
Figure 2-12 Logical diagram of a POWER9 processor and its two memory quads
The suggested approach is to install memory evenly across all processors and across all
system nodes in the server, and to use the same CDIMM size in all memory
slots. Balancing memory across the installed processors allows memory access in a
consistent manner and typically results in the best possible performance for your
configuration. You should account for any plans for future memory upgrades when you decide
which memory feature size to use at the time of the initial system order.
A physical diagram with the location codes of the memory CDIMMs of a system node and their
grouping as memory quads is shown in Figure 2-13. Each system node has eight memory
quads that are attached to the memory controllers MC0 and MC1 of the respective processor
modules. The quads are identified by individually assigned color codes in Figure 2-13.
Processor P1: P1-C22, P1-C23, P1-C28, P1-C29 (Quad 4); P1-C24, P1-C25, P1-C26, P1-C27 (Quad 8)
Processor P3: P1-C30, P1-C31, P1-C36, P1-C37 (Quad 3); P1-C32, P1-C33, P1-C34, P1-C35 (Quad 7)
Processor P2: P1-C38, P1-C39, P1-C44, P1-C45 (Quad 2); P1-C40, P1-C41, P1-C42, P1-C43 (Quad 6)
Processor P0: P1-C46, P1-C47, P1-C52, P1-C53 (Quad 1); P1-C48, P1-C49, P1-C50, P1-C51 (Quad 5)
Figure 2-13 System node physical diagram with location codes for CDIMMs
Each Power E980 system node requires four memory quads to populate the required
minimum of 16 CDIMM slots. The location codes of the slots for the memory quads 1 - 4 are
shown in the following list. There is no specific ordering sequence that is implied because all
four quads must be present in a valid configuration.
Quad 1: P1-C46, P1-C47, P1-C52, and P1-C53 (slots connected to Processor P0)
Quad 4: P1-C22, P1-C23, P1-C28, and P1-C29 (slots connected to Processor P1)
Quad 2: P1-C38, P1-C39, P1-C44, and P1-C45 (slots connected to Processor P2)
Quad 3: P1-C30, P1-C31, P1-C36, and P1-C37 (slots connected to Processor P3)
No mandatory plugging sequence must be adhered to for the population of any of the
remaining open CDIMM slots. The location codes for memory quads 5 - 8 are shown in the
following list:
Quad 5: P1-C48, P1-C49, P1-C50, and P1-C51 (slots connected to Processor P0)
Quad 8: P1-C24, P1-C25, P1-C26, and P1-C27 (slots connected to Processor P1)
Quad 6: P1-C40, P1-C41, P1-C42, and P1-C43 (slots connected to Processor P2)
Quad 7: P1-C32, P1-C33, P1-C34, and P1-C35 (slots connected to Processor P3)
The numbering of quads 5 - 8 does not indicate any ordinal sequence, and any quad can be
assigned to any processor module. The solution designer has the flexibility to assign the extra
memory quads to any processor module if the minimum memory configuration is established.
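The population rules above can be expressed as a small check. The following sketch encodes the quad-to-slot mapping from the two lists and reports missing mandatory quads. It also assumes that an optional quad must be populated with all four CDIMMs or left empty, which is inferred from the quad being described as a set of four CDIMMs. The function name is hypothetical.

```python
# Slot location codes per memory quad on one system node (taken from the lists above).
QUADS = {
    1: ("P0", ["P1-C46", "P1-C47", "P1-C52", "P1-C53"]),
    2: ("P2", ["P1-C38", "P1-C39", "P1-C44", "P1-C45"]),
    3: ("P3", ["P1-C30", "P1-C31", "P1-C36", "P1-C37"]),
    4: ("P1", ["P1-C22", "P1-C23", "P1-C28", "P1-C29"]),
    5: ("P0", ["P1-C48", "P1-C49", "P1-C50", "P1-C51"]),
    6: ("P2", ["P1-C40", "P1-C41", "P1-C42", "P1-C43"]),
    7: ("P3", ["P1-C32", "P1-C33", "P1-C34", "P1-C35"]),
    8: ("P1", ["P1-C24", "P1-C25", "P1-C26", "P1-C27"]),
}
REQUIRED_QUADS = {1, 2, 3, 4}  # minimum configuration: 16 CDIMM slots


def check_node_population(populated_slots: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the population looks valid."""
    problems = []
    for quad, (processor, slots) in QUADS.items():
        filled = [slot for slot in slots if slot in populated_slots]
        if quad in REQUIRED_QUADS and len(filled) < 4:
            problems.append(f"Quad {quad} ({processor}) is mandatory but incomplete")
        elif 0 < len(filled) < 4:
            # Assumption: optional quads are installed as complete sets of four.
            problems.append(f"Quad {quad} ({processor}) is only partially populated")
    return problems


minimum = {slot for quad in REQUIRED_QUADS for slot in QUADS[quad][1]}
print(check_node_population(minimum))  # []
print(check_node_population(minimum - {"P1-C46"}))
```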
Furthermore, in multi-system node Power E980 servers, the solution designer can either fully
populate one drawer and have the other drawers partially populated or have all the drawers
symmetrically populated. For example, consider a 2-node Power E980 server with six extra
quads of memory beyond the eight quads that are needed to fulfill the minimum memory
configuration requirement. One option is to install four quads in one node and two quads in
the other node. In an alternative configuration, both system nodes can be expanded by three
quads each.
There are two activation types that can be used to accomplish this:
Static memory activations: Memory activations that are exclusive for a single server.
Mobile memory activations: Memory activations that can be moved from server to server
in an IBM Power Enterprise Pool.
Both types of memory activations can be in the same system if at least 25% of the memory
activations are static. This leads to a maximum of 75% of the memory activations as mobile.
Figure 2-14 shows an example of the minimum required activations for a system with 1 TB of
installed memory.
Figure 2-14 Example of the minimum required activations for a system with 1 TB of installed memory
The granularity for static memory activation is 1 GB, and for mobile memory activation the
granularity is 100 GB.
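To illustrate how the 25% static minimum and the two granularities interact, the following sketch splits a given number of active gigabytes into the largest permissible mobile share and the remaining static share. It reads the 25% rule as a share of the total activations, as stated above, and the function name is illustrative only.

```python
MOBILE_GRANULARITY_GB = 100  # mobile activations come in 100 GB units; static ones in 1 GB units


def split_activations(total_active_gb: int) -> tuple[int, int]:
    """Return (static_gb, mobile_gb) with the largest allowed mobile share.

    Assumption: the 25% static minimum applies to the total memory activations.
    """
    max_mobile = (total_active_gb * 3) // 4  # at most 75% may be mobile
    mobile = (max_mobile // MOBILE_GRANULARITY_GB) * MOBILE_GRANULARITY_GB
    static = total_active_gb - mobile  # the remainder is static, in whole GB
    return static, mobile


print(split_activations(512))  # (212, 300)
```

For 512 GB of activations, 75% would be 384 GB, but the 100 GB granularity limits the mobile share to 300 GB, leaving 212 GB static.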
Specific FCs support memory activations for DDR3 or DDR4 memory CDIMMs that were
transferred from Power E880 or Power E880C systems in the context of a model upgrade to a
Power E980 server.
Table 2-8 lists the FCs that can be used to achieve the wanted number of activations.
Static memory activations can be converted to mobile memory activations after system
installation. To enable mobile memory activations, the systems must be part of an IBM Power
Enterprise Pool and have #EB35 configured. For more information about IBM Power
Enterprise Pools, see 2.3.5, “IBM Power Enterprise Pools and Mobile Capacity on Demand”
on page 83.
The total maximum theoretical memory bandwidth per Power E980 system node is
921.6 GBps, and the total maximum theoretical memory bandwidth per 4-node Power E980
system is 3686.4 GBps.
As data flows from main memory towards the execution units of the POWER9 processor, it
passes through the 512 KB L2 and the 64 KB L1 cache. In many cases, the 10 MB L3 victim
cache may also provide the data that is needed for the instruction execution.
Table 2-9 shows the maximum cache bandwidth for a single core as defined by the width of
the relevant channels and the related transaction rates on the Power E980 system.
Table 2-9 Power E980 single core architectural maximum cache bandwidth
Cache level of the     Power E980 cache bandwidth (a) [GBps]
POWER9 core            3.9 - 4.0 GHz core   3.7 - 3.9 GHz core   3.55 - 3.9 GHz core   3.58 - 3.9 GHz core
                       (#EFP1)              (#EFP2)              (#EFP3)               (#EFP4)
L1 64 KB data cache    374 - 384            355 - 374            341 - 374             344 - 374
L2 512 KB cache        374 - 384            355 - 374            341 - 374             344 - 374
L3 cache: With two clock cycles, one 64-byte read operation to the L2 cache and one
64-byte store operation from the L2 cache can be accomplished. The values vary
depending on the core frequency and are computed as follows:
– Core running at 3.55 GHz: (1 x 64 B + 1 x 64 B) x 3.55 GHz / 2 = 227.20 GBps
– Core running at 3.58 GHz: (1 x 64 B + 1 x 64 B) x 3.58 GHz / 2 = 229.12 GBps
– Core running at 3.70 GHz: (1 x 64 B + 1 x 64 B) x 3.70 GHz / 2 = 236.80 GBps
– Core running at 3.90 GHz: (1 x 64 B + 1 x 64 B) x 3.90 GHz / 2 = 249.60 GBps
– Core running at 4.00 GHz: (1 x 64 B + 1 x 64 B) x 4.00 GHz / 2 = 256.00 GBps
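The L3 computation above can be reproduced with a few lines of code. This is a minimal sketch of the stated formula (one 64-byte read and one 64-byte store every two clock cycles); the function name is illustrative.

```python
def l3_bandwidth_gbps(core_ghz: float) -> float:
    """L3 bandwidth per core: one 64-byte read plus one 64-byte store every two cycles."""
    return (1 * 64 + 1 * 64) * core_ghz / 2


for ghz in (3.55, 3.58, 3.70, 3.90, 4.00):
    print(f"{ghz:.2f} GHz: {l3_bandwidth_gbps(ghz):.2f} GBps")
# The last line printed is "4.00 GHz: 256.00 GBps"
```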
For each system node of a Power E980 server that is populated with four processor modules
and all its memory CDIMMs filled, the overall bandwidths as defined by the width of the
relevant channels and the related transaction rates are shown in Table 2-10.
Table 2-10 Power E980 system node architectural maximum cache and memory bandwidth
Memory architecture    Power E980 cache and system memory bandwidth per node (a) [GBps]
entity                 32 cores (#EFP1)   40 cores (#EFP2)   44 cores (#EFP4)   48 cores (#EFP3)
                       @ 3.9 - 4.0 GHz    @ 3.7 - 3.9 GHz    @ 3.58 - 3.9 GHz   @ 3.55 - 3.9 GHz
L1 64 KB data cache    11,981 - 12,288    14,208 - 14,976    15,122 - 16,474    16,358 - 17,971
L2 512 KB cache        11,981 - 12,288    14,208 - 14,976    15,122 - 16,474    16,358 - 17,971
For the entire Power E980 system configured with four system nodes, the accumulated
bandwidth values are shown in Table 2-11.
Table 2-11 Power E980 4-node server total architectural maximum cache and memory bandwidth
Memory architecture    Power E980 cache and system memory bandwidth for 4-node system (a) [GBps]
entity                 128 cores (#EFP1)  160 cores (#EFP2)  176 cores (#EFP4)  192 cores (#EFP3)
                       @ 3.9 - 4.0 GHz    @ 3.7 - 3.9 GHz    @ 3.58 - 3.9 GHz   @ 3.55 - 3.9 GHz
L1 64 KB data cache    47,923 - 49,152    56,832 - 59,904    60,488 - 65,894    65,434 - 71,885
L2 512 KB cache        47,923 - 49,152    56,832 - 59,904    60,488 - 65,894    65,434 - 71,885
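The L1 and L2 rows in the per-core, per-node, and 4-node tables are all consistent with 96 bytes moved per core per clock cycle, scaled by the number of cores. That per-cycle figure is an inference from the tabulated numbers (for example, 384 GBps at 4.0 GHz), not a value that is stated in the text, so the following sketch is a consistency check only. The names are illustrative.

```python
BYTES_PER_CYCLE = 96  # assumption inferred from the tables, e.g. 384 GBps / 4.0 GHz

# Feature code -> (cores per node, minimum GHz, maximum GHz), as listed in the node table.
MODULES = {
    "EFP1": (32, 3.9, 4.0),
    "EFP2": (40, 3.7, 3.9),
    "EFP4": (44, 3.58, 3.9),
    "EFP3": (48, 3.55, 3.9),
}


def aggregate_gbps(cores: int, ghz: float, nodes: int = 1) -> int:
    """Aggregate L1 or L2 bandwidth in GBps, rounded to the nearest integer."""
    return round(nodes * cores * BYTES_PER_CYCLE * ghz)


for feature, (cores, low, high) in MODULES.items():
    per_node = (aggregate_gbps(cores, low), aggregate_gbps(cores, high))
    four_nodes = (aggregate_gbps(cores, low, 4), aggregate_gbps(cores, high, 4))
    print(feature, per_node, four_nodes)
# First line printed: EFP1 (11981, 12288) (47923, 49152)
```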
Active Memory Mirroring (AMM) is included with all Power E980 systems at no extra charge.
It can be enabled, disabled, or reenabled depending on the user’s requirements.
The hypervisor code logical memory blocks are mirrored on distinct CDIMMs to allow for
more usable memory. There is no specific CDIMM that hosts the Hypervisor memory blocks,
so the mirroring is done at the logical memory block level, not at the CDIMM level. To enable
the AMM feature, the server must have enough free memory to accommodate the mirrored
memory blocks.
Besides the hypervisor code itself, other components that are vital to the server operation are
also mirrored:
Hardware page tables (HPTs), which are responsible for tracking the state of the memory
pages that are assigned to partitions
Translation control entities (TCEs), which are responsible for providing I/O buffers for the
partition’s communications
Memory that is used by the hypervisor to maintain partition configuration, I/O states,
virtual I/O information, and the partition state
It is possible to check whether the AMM option is enabled and change its status through the
classical GUI of the HMC by clicking the Advanced tab of the CEC Properties panel. If you
are using the enhanced GUI of the HMC, you find the relevant information and controls in the
Memory Mirroring section of the General Settings panel of the selected Power E980 system
(Figure 2-15).
Figure 2-15 Memory Mirroring section in the General Settings panel on the HMC enhanced GUI
After a failure on one of the CDIMMs containing hypervisor data occurs, all the server
operations remain active and the Flexible Service Processor (FSP) isolates the failing
CDIMMs. Systems stay in the partially mirrored state until the failing CDIMMs are replaced.
There are components that are not mirrored because they are not vital to the regular server
operations and require a larger amount of memory to accommodate their data:
Advanced Memory Sharing Pool
Memory that is used to hold the contents of platform memory dumps
Partition data: AMM will not mirror partition data. It mirrors only the hypervisor code and
its components, allowing this data to be protected against a DIMM failure.
With AMM, uncorrectable errors in data that is owned by a partition or application are handled
by the existing Special Uncorrectable Error (SUE) handling methods in the hardware,
firmware, and OS.
In addition, a spare DRAM per rank on each memory port provides for dynamic DRAM device
replacement during runtime operation. Also, dynamic lane sparing on the memory channel’s
DMI link allows for repair of a faulty data lane.
Other memory protection features include retry capabilities for certain faults that are detected
at both the memory controller and the memory buffer.
Memory is also periodically scrubbed so that soft errors can be corrected and solid single-cell
errors can be reported to the hypervisor, which supports OS deallocation of a page that is
associated with a hard single-cell fault.
For more information about memory RAS, see 4.4, “Memory RAS details” on page 143.
The following convention is used in the Order type column in all tables in this section:
Initial Only available when ordered as part of a new system
MES Only available as a Miscellaneous Equipment Specification (MES)
Both Available with a new system or as part of an upgrade
Supported Unavailable as a new purchase, but supported when migrated from
another system or as part of a model conversion
With the CUoD offering, you can purchase more static processor or memory capacity and
dynamically activate it when needed, without restarting your server or interrupting your business.
All the static processor or memory activations are restricted to a single server.
CUoD has several benefits that enable a more flexible environment. One of its benefits is
reducing the initial investment in a system. In traditional projects that use other technologies,
a system must be acquired with all the resources available to support the whole
lifecycle of the project. As a result, you pay up front for capacity that you do not need until the
later stages of the project, or possibly not at all, which impacts software licensing costs and
software maintenance.
By using CUoD, a company starts with a system with enough installed resources to support
the whole project lifecycle, but activates only the resources that are necessary for the initial
project phases. More resources can be added as the project proceeds by activating
resources as they are needed. Therefore, a company can reduce the initial investment in
hardware and acquire software licenses only when they are needed for each project phase,
which reduces the total cost of ownership (TCO) and total cost of acquisition (TCA) of the
Figure 2-16 shows a comparison between two scenarios: a fully activated system versus a
system with CUoD resources being activated along the project timeline.
[Figure 2-16: comparison of core activations along the project timeline without CUoD and with CUoD]
Table 2-12 lists the static processor activation features that are available for initial order on the
Power E980 server.
Table 2-13 lists the static memory activation features that are available for initial order on the
Power E980 server.
Change of name: Some websites or documents still refer to Elastic CoD as On/Off
Capacity on Demand (On/Off CoD).
With the Elastic CoD offering, you can temporarily activate and deactivate processor cores
and memory units to help meet the demands of business peaks, such as seasonal activity,
period-end, or special promotions. Elastic CoD was previously called On/Off CoD. When you
order an Elastic CoD feature, you receive an enablement code that a system operator uses to
make requests for more processor and memory capacity in increments of one processor day
or 1 GB memory day. The system monitors the amount and duration of the activations. Both
prepaid and post-pay options are available.
Charges are based on usage reporting that is collected monthly. Processors and memory
may be activated and turned off an unlimited number of times when more processing
resources are needed.
This offering provides a system administrator an interface at the HMC to manage the
activation and deactivation of resources. A monitor that is on the server records the usage
activity. This usage data must be sent to IBM monthly. A bill is then generated that is based on
the total amount of processor and memory resources that are used, in increments of
processor days and memory (1 GB) days.
The Power E980 server supports the 90-day temporary Elastic CoD processor and memory
enablement features. These features allow the system to activate processor days and GB
days equal to the number of inactive resources multiplied by 90 days. Thus, if all resources
are activated by using Elastic CoD, a new enablement code must be ordered every 90 days. If
only half of the inactive resources are activated by using Elastic CoD, a new enablement code
must be ordered every 180 days.
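The relationship between the enablement budget and the renewal interval can be sketched as follows. The function name is illustrative, and the sketch assumes that the same number of resources stays activated continuously.

```python
def days_until_new_enablement(inactive_resources: int, activated: int) -> float:
    """Days of continuous use before the 90-day enablement budget is used up."""
    if not 0 < activated <= inactive_resources:
        raise ValueError("activated must be between 1 and the number of inactive resources")
    budget = inactive_resources * 90  # processor days (or GB days) granted by the feature
    return budget / activated


print(days_until_new_enablement(16, 16))  # 90.0
print(days_until_new_enablement(16, 8))   # 180.0
```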
Before using temporary capacity on your server, you must enable your server. To enable your
server, an enablement feature (MES only) must be ordered and the required contracts must
be in place. The 90-day enablement feature for the Power E980 processors is #EP9T. For
memory, the enablement feature is #EM9V.
If a Power E980 server uses the IBM i OS in addition to any other supported OS on the same
server, the client must inform IBM which OS used the temporary Elastic CoD processors so
that the correct feature can be used for billing.
The Elastic CoD process consists of three steps: enablement, activation, and billing.
Enablement
Before requesting temporary capacity on a server, you must enable it for Elastic CoD. To
do this, order an enablement feature and sign the required contracts. IBM generates an
enablement code, mails it to you, and posts it on the web for you to retrieve and enter into
the target server.
Activation requests
When Elastic CoD temporary capacity is needed, use the HMC menu for On/Off CoD.
Specify how many inactive processors or gigabytes of memory must be temporarily
activated for a specific number of days. You are billed for the days that are requested,
whether the capacity is assigned to partitions or remains in the shared processor pool
(SPP).
At the end of the temporary period (days that were requested), you must ensure that the
temporarily activated capacity is available to be reclaimed by the server (not assigned to
partitions), or you are billed for any unreturned processor days.
Billing
The contract, signed by the client before receiving the enablement code, requires the
Elastic CoD user to report billing data at least once a month (whether or not activity
occurs). This data is used to determine the proper amount to bill at the end of each billing
period (calendar quarter). Failure to report billing data for use of temporary processor or
memory capacity during a billing quarter can result in default billing that is equivalent to 90
processor days of temporary capacity.
For more information about registration, enablement, and usage of Elastic CoD, see Power
Systems Capacity on Demand.
Table 2-14 lists the Elastic CoD features that are available for the Power E980 server.
IBM Power Enterprise Pools is a technology for dynamically sharing processor and memory
activations among a group (or pool) of IBM Power Systems servers. By using Mobile Capacity
on Demand (CoD) activation codes, the systems administrator can perform tasks without
contacting IBM.
With this capability, you can move resources between Power E980 systems, and between
Power E980 and Power E870, Power E870C, Power E880, and Power E880C systems, and have
unsurpassed flexibility for workload balancing and system maintenance.
A pool can support systems with different clock speeds or processor generations.
The basic rules for Mobile Capacity on Demand (Mobile CoD) are as follows:
The Power E980 server requires a minimum of eight static processor activations.
The Power E870, Power E870C, Power E880, and Power E880C servers require a minimum of
eight static processor activations.
For all systems, 25% of the installed memory capacity must have static activations.
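The basic rules above lend themselves to a simple pre-check before a system is added to a pool. The following is a minimal sketch with hypothetical names; it encodes only the two thresholds that are listed above and is not a substitute for the compliance checks that the HMC performs.

```python
MIN_STATIC_CORE_ACTIVATIONS = 8      # minimum static processor activations per server
MIN_STATIC_MEMORY_FRACTION = 0.25    # share of installed memory that needs static activations


def mobile_cod_rule_violations(static_cores: int,
                               installed_memory_gb: int,
                               static_memory_gb: int) -> list[str]:
    """Return the basic Mobile CoD rules that a system does not meet."""
    violations = []
    if static_cores < MIN_STATIC_CORE_ACTIVATIONS:
        violations.append("fewer than eight static processor activations")
    if static_memory_gb < MIN_STATIC_MEMORY_FRACTION * installed_memory_gb:
        violations.append("less than 25% of installed memory has static activations")
    return violations


print(mobile_cod_rule_violations(8, 1024, 256))        # []
print(len(mobile_cod_rule_violations(4, 1024, 128)))   # 2
```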
All the systems in a pool must be managed by the same HMC or by the same pair of
redundant HMCs. If redundant HMCs are used, the HMCs must be connected to a network so
that they can communicate with each other. The HMCs must have at least 2 GB of memory.
An HMC can manage multiple IBM Power Enterprise Pools and systems that are not part of
an IBM Power Enterprise Pool. Systems can belong to only one IBM Power Enterprise Pool at
a time. Powering down an HMC does not limit the assigned resources of participating
systems in a pool, but does limit the ability to perform pool change operations.
After an IBM Power Enterprise Pool is created, the HMC can be used to perform the following
Mobile CoD processor and memory resources can be assigned to systems with inactive
resources. Mobile CoD resources remain on the system to which they are assigned until
they are removed from the system.
New systems can be added to the pool and existing systems can be removed from the
pool.
New resources can be added to the pool or existing resources can be removed from the
pool.
Pool information can be viewed, including pool resource assignments, compliance, and
history logs.
In order for the mobile activation features to be configured, an IBM Power Enterprise Pool and
the systems that are going to be included as members of the pool must be registered with
IBM. Also, the systems must have #EB35 for mobile enablement configured, and the required
contracts must be in place.
Table 2-15 lists the mobile processor and memory activation features that are available for the
Power E980 server.
For more information about IBM Power Enterprise Pools, see Power Enterprise Pools on IBM
Power Systems, REDP-5101.
With Utility CoD, you can place a quantity of inactive processors into the server’s SPP, which
then become available to the pool's resource manager. When the server recognizes that the
combined processor utilization within the SPP exceeds 100% of the level of base (purchased
and active) processors that are assigned across uncapped partitions, then a Utility CoD
processor minute is charged and this level of performance is available for the next minute of
If an extra workload requires a higher level of performance, the system automatically allows
more Utility CoD processors to be used, and the system automatically and continuously
monitors and charges for the performance that is needed above the base (permanent) level.
Registration and usage reporting for utility CoD is made by using a website and payment is
based on reported usage. Utility CoD requires PowerVM Standard Edition or PowerVM
Enterprise Edition to be active.
If a Power E980 server uses the IBM i OS in addition to any other supported OS on the same
server, the client must inform IBM which OS caused the temporary Utility CoD processor
usage so that the correct feature can be used for billing.
For more information regarding registration, enablement, and use of Utility CoD, see IBM
Support Planning.
A standard request activates two processors or 64 GB of memory (or eight processor cores
and 64 GB of memory) for 30 days. Subsequent standard requests can be made after each
purchase of a permanent processor activation. An HMC is required to manage Trial CoD
activations.
An exception request for Trial CoD requires you to complete a form that includes contact
information and VPD from your Power E980 system with inactive CoD resources. An
exception request activates all inactive processors or all inactive memory (or all inactive
processor and memory) for 30 days. An exception request can be made only one time over
the life of the machine. An HMC is required to manage Trial CoD activations.
To request either a Standard or an Exception Trial, see Power Systems Capacity on Demand:
Trial Capacity on Demand.
The system nodes allow for eight PCIe Gen4 x16 slots. More slots can be added by attaching
PCIe Expansion Drawers, and SAS disks can be attached to EXP24S SFF Gen2 Expansion
Drawers. The PCIe Expansion Drawer is connected by using an #EJ07 adapter. The EXP24S
drawer can be either attached to SAS adapters on the system nodes or on the PCIe
Expansion Drawer.
For a list of adapters and their supported slots, see 2.5, “PCIe adapters” on page 88.
Disk support: There is no support for SAS disks that are directly installed on the system
nodes and PCIe Expansion Drawers. If directly attached SAS disks are required, they must
be installed in a SAS disk drawer and connected to a supported SAS controller in one of
the PCIe slots.
For more information about PCIe Expansion Drawers, see 2.7.1, “PCIe Gen3 I/O Expansion
Drawer” on page 96.
Similar to the previous generation Power E870 and Power E880 systems, the redundant SPs
are housed in the system control unit (SCU). However, the SCU no longer hosts the system
clock. Each system node hosts its own redundant clocks. The cables that are used to provide
communications between the control units and system nodes depend on the number of
system nodes that are installed. When a system node is added, a new set of cables is also
added.
The cables that are necessary for each system node are grouped under a single FC, allowing
for an easier configuration. Each cable set includes a pair of FSP cables and, when applicable,
SMP cables and Universal Power Interconnect Cable (UPIC) cables.
Cable set FCs are incremental and depend on the number of installed system nodes as follows:
One system node: #EFCA
Two system nodes: #EFCA and #EFCB
Three system nodes: #EFCA, #EFCB, and #EFCC
Four system nodes: #EFCA, #EFCB, #EFCC, and #EFCD
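The incremental rule in the list above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical and are not part of any IBM configuration tool.

```python
# Cable-set feature codes, in the order they are added as system nodes are installed.
CABLE_SET_FCS = ["#EFCA", "#EFCB", "#EFCC", "#EFCD"]


def cable_sets_for(system_nodes: int) -> list[str]:
    """Return the cable-set FCs that a server with `system_nodes` nodes needs."""
    if not 1 <= system_nodes <= len(CABLE_SET_FCS):
        raise ValueError("a Power E980 server has one to four system nodes")
    return CABLE_SET_FCS[:system_nodes]


def cable_sets_to_add(current_nodes: int, target_nodes: int) -> list[str]:
    """Return the extra cable-set FCs needed to grow from current to target nodes."""
    if target_nodes < current_nodes:
        raise ValueError("target_nodes must not be smaller than current_nodes")
    return cable_sets_for(target_nodes)[current_nodes:]
```

For example, growing a two-node server to four nodes adds the #EFCC and #EFCD cable sets.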
The following convention is used in the Order type column in all tables in this section:
Initial Only available when ordered as part of a new system
MES Only available as an MES upgrade
Both Available with a new system or as part of an upgrade
Supported Unavailable as a new purchase, but supported when migrated from
another system or as part of a model conversion
The PCIe interfaces that are supported on the system nodes are PCIe Gen4, and are capable
of 32 GBps simplex (64 GBps duplex) speeds on a single x16 interface. PCIe Gen4 slots also
support previous generations (Gen3, Gen2, and Gen1) adapters, which operate at lower
speeds, according to the following rules:
Place x1, x4, x8, and x16 speed adapters in same connector size slots first before mixing
adapter speeds with connector slot sizes.
Adapters with smaller speeds are allowed in larger-sized PCIe connectors, but larger
speed adapters are not compatible in smaller connector sizes (for example, a x16 adapter
cannot go in an x8 PCIe slot connector).
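The two placement rules above can be expressed as a short check. The following Python sketch is an illustration under simple assumptions (adapters and slots are described only by their lane width, and the function names are hypothetical); it does not replace the IBM System Planning Tool.

```python
VALID_WIDTHS = {1, 4, 8, 16}  # x1, x4, x8, and x16 connectors


def adapter_fits(adapter_lanes: int, slot_lanes: int) -> bool:
    """An adapter fits when its connector is no wider than the slot connector."""
    if adapter_lanes not in VALID_WIDTHS or slot_lanes not in VALID_WIDTHS:
        raise ValueError("lane widths must be 1, 4, 8, or 16")
    return adapter_lanes <= slot_lanes


def preferred_slot(adapter_lanes: int, free_slot_lanes: list[int]):
    """Pick a same-size slot first, otherwise the smallest larger slot, or None."""
    candidates = sorted(s for s in free_slot_lanes if adapter_fits(adapter_lanes, s))
    return candidates[0] if candidates else None
```

With free slots of widths 16, 8, and 16, an x8 adapter is placed in the x8 slot first, and an x16 adapter has no valid placement if only x8 and x4 slots remain.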
The Power E980 server also supports expansion beyond the slots that are available in the
system nodes by attaching one or more PCIe Gen3 I/O Expansion Drawers (#EMX0).
IBM POWER9 processor-based servers can support two different form factors of PCIe
adapters:
PCIe low-profile (LP) cards, which are used with system node PCIe slots.
PCIe full-height and full-high cards, which are used in the PCIe Gen3 I/O Expansion Drawer.
Low-profile PCIe adapters are supported only in low-profile PCIe slots, and full-height and
full-high cards are supported only in full-high slots. Adapters that are low-profile have “LP” in
the adapter description.
Before adding or rearranging adapters, use the IBM System Planning Tool (SPT) to validate
the new adapter configuration.
If you are installing a new feature, ensure that you have the software that is required to
support the new feature and determine whether there are any existing update prerequisites to
install. To do this, see IBM Prerequisites.
The following sections describe the supported adapters and provide tables of orderable
feature numbers. The tables indicate OS support (AIX, IBM i, and Linux) for each of the
adapters.
Feature code | CCIN | Description | OS support | Order type (a)
EN0W | 2CC4 | PCIe2 2-port 10/1 GbE BaseT RJ45 Adapter | AIX, IBM i, and Linux | Both
EN0U | 2CC3 | PCIe2 4-port (10 Gb+1 GbE) Copper SFP+RJ45 Adapter | AIX, IBM i, and Linux | Both
EN0S | 2CC3 | PCIe2 4-Port (10 Gb+1 GbE) SR+RJ45 Adapter | AIX, IBM i, and Linux | Both
EN0X | 2CC4 | PCIe2 LP 2-port 10/1 GbE BaseT RJ45 Adapter | AIX, IBM i, and Linux | Both
EN0V | 2CC3 | PCIe2 LP 4-port (10 Gb+1 GbE) Copper SFP+RJ45 Adapter | AIX, IBM i, and Linux | Both
EN0T | 2CC3 | PCIe2 LP 4-Port (10 Gb+1 GbE) SR+RJ45 Adapter | AIX, IBM i, and Linux | Both
EC2S | 58FA | PCIe3 2-Port 10 Gb NIC & RoCE SR/Cu Adapter | IBM i and | Both
EC2U | 58FB | PCIe3 2-Port 25/10 Gb NIC & RoCE SR/Cu Adapter | IBM i and Linux | Both
EN0K | 2CC1 | PCIe3 4-port (10 Gb FCoE & 1 GbE) SFP+Copper & RJ45 | AIX, IBM i, and Linux | Both
EN0H | 2B93 | PCIe3 4-port (10 Gb FCoE & 1 GbE) SR & RJ45 | AIX, IBM i, and Linux | Both
EN17 | 2CE4 | PCIe3 4-port 10 GbE SFP+ Copper Adapter | AIX, IBM i, and Linux | Both
EC2R | 58FA | PCIe3 LP 2-Port 10 Gb NIC & RoCE SR/Cu Adapter | IBM i and Linux | Both
EC3L | 2CEC | PCIe3 LP 2-port 100 GbE (NIC & RoCE) QSFP28 Adapter x16 | AIX and IBM i | Both
EC2T | 58FB | PCIe3 LP 2-Port 25/10 Gb NIC & RoCE SR/Cu Adapter | IBM i and Linux | Both
EN0L | 2CC1 | PCIe3 LP 4-port (10 Gb FCoE & 1 GbE) SFP+Copper & RJ45 | AIX, IBM i, and Linux | Both
EN0J | 2B93 | PCIe3 LP 4-port (10 Gb FCoE & 1 GbE) SR & RJ45 | AIX, IBM i, and Linux | Both
EN18 | 2CE4 | PCIe3 LPX 4-port 10 GbE SFP+ Copper Adapter | AIX, IBM i, and Linux | Both
EN16 | 2CE3 | PCIe3 LPX 4-port 10 GbE SR Adapter | AIX, IBM i, and Linux | Both
EC67 | 2CF3 | PCIe4 LP 2-port 100 Gb RoCE EN LP adapter | IBM i and | Both
a. For more information about order types, see 2.5, “PCIe adapters” on page 88.
Feature code | CCIN | Description | OS support | Order type (a)
EJ0J | 57B4 | PCIe3 RAID SAS Adapter Quad-port 6 Gb x8 | AIX, IBM i, and Linux | Both
EJ0M | 57B4 | PCIe3 LP RAID SAS Adapter Quad-Port 6 Gb x8 | AIX, IBM i, and Linux | Both
EJ10 | 57B4 | PCIe3 SAS Tape/DVD Adapter Quad-port 6 Gb x8 | AIX, IBM i, and Linux | Both
EJ11 | 57B4 | PCIe3 LP SAS Tape/DVD Adapter Quad-port 6 Gb x8 | AIX, IBM i, and Linux | Both
EJ14 | 57B1 | PCIe3 12 GB Cache RAID PLUS SAS Adapter Quad-port 6 Gb x8 | AIX, IBM i, and Linux | Both
a. For more information about order types, see 2.5, “PCIe adapters” on page 88.
Feature code | CCIN | Description | OS support | Order type (a)
5273 | 577D | PCIe LP 8 Gb 2-Port Fibre Channel Adapter | AIX, IBM i, and Linux | Both
5729 | 5729 | PCIe2 8 Gb 4-port Fibre Channel Adapter | AIX, IBM i, and Linux | Both
5735 | 577D | 8 Gb PCI Express Dual Port Fibre Channel Adapter | AIX, IBM i, and Linux | Both
EN0A | 577F | PCIe3 16 Gb 2-port Fibre Channel Adapter | AIX, IBM i, and Linux | Both
EN0B | 577F | PCIe3 LP 16 Gb 2-port Fibre Channel Adapter | AIX, IBM i, and Linux | Both
EN1A | 578F | PCIe3 32 Gb 2-port Fibre Channel Adapter | IBM i and | Both
EN1B | 578F | PCIe3 LP 32 Gb 2-port Fibre Channel Adapter | IBM i and | Both
EN1C | 578E | PCIe3 16 Gb 4-port Fibre Channel Adapter | IBM i and | Both
EN1D | 578E | PCIe3 LP 16 Gb 4-port Fibre Channel Adapter | IBM i and | Both
a. For more information about order types, see 2.5, “PCIe adapters” on page 88.
The ports are found at the rear of the system enclosures, as shown in Figure 2-18.
Figure 2-18 The rear of the Power E980 server with the USB location highlighted (3 x USB3)
One of the ports on the first system node is connected to a port on the rear of the SCU, which
is then routed to a front-accessible USB port.
All USB ports on the system nodes and on the front of the SCU can function with any USB
device that is supported by the client OS to which the adapter is assigned.
There are no USB PCIe adapters that are supported in the Power E980 server.
The IBM PCIe Cryptographic Coprocessor adapter has the following features:
Integrated Dual processors that operate in parallel for higher reliability
Supports IBM Common Cryptographic Architecture or PKCS#11 standards
Ability to configure an adapter as a coprocessor or accelerator
Support for smart card applications by using Europay, MasterCard, and Visa
Cryptographic key generation and random number generation
PIN processing: Generation, verification, and translation
Encrypt and decrypt by using AES and DES keys
For the most recent firmware and software updates, see IBM CryptoCards.
Table 2-21 lists the cryptographic adapter that is available for the server.
Feature code | CCIN | Description | OS support | Order type (a)
EJ33 | 4767 | PCIe3 Crypto Coprocessor BSC-Gen3 4767 | AIX, IBM i, and Linux | Both
a. For more information about order types, see 2.5, “PCIe adapters” on page 88.
Adapter height: The #EJ33 adapter is a full-height adapter, so it is supported only in the
PCIe Gen3 I/O Expansion Drawer.
Figure 2-19 Power E980 system node with SSD location highlighted (4 x NVMe)
The internal SSDs are intended for boot purposes only and not as general-purpose
drives.
EC5J 59B4 Mainstream 800 GB SSD NVMe U.2 module AIX and Both
a. For more information about order types, see 2.5, “PCIe adapters” on page 88.
At the initial availability date of 21 September 2018, the Power E980 system supports two
PCIe Gen3 I/O drawers per system node, which yields a maximum of four I/O drawers per
2-node Power E980 system configuration. One I/O drawer supports two fan-out modules that
offer six PCIe Gen3 adapter slots each. This delivers 24 extra PCIe Gen3 slots per
system node and a maximum of 48 PCIe Gen3 slots per 2-node server.
Eight slots in the system nodes are used to cable the four I/O drawers, for a total of 56
available slots for a 2-node system.
With the availability of 3-node and 4-node Power E980 configurations on 16 November 2018,
the number of supported PCIe Gen3 I/O drawers is raised to four per system node with a
maximum of 16 per 4-node Power E980 system. A maximum of 48 PCIe Gen3 slots per
system node and a maximum of 192 PCIe Gen3 slots per 4-node Power E980 server are
available at that date.
Each fan-out module is attached by one optical cable adapter, which occupies one
x16 PCIe Gen4 slot of a system node. Therefore, at the initial availability date, a 1-node
Power E980 system configuration with two attached I/O drawers provides a maximum
of 28 available PCIe slots. With the enhanced configuration options that are available in
November 2018, these numbers increase to a maximum of 48 for a 1-node configuration,
because all eight node slots are used to cable the four drawers, and to a maximum of 192
available PCIe slots for a 4-node Power E980 server with 16 I/O drawers.
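The slot counts that are quoted above follow from simple arithmetic. The Python sketch below reproduces them, assuming that every attached drawer is fully populated with two fan-out modules; the function name is hypothetical.

```python
NODE_SLOTS = 8           # PCIe Gen4 x16 slots per system node
FANOUTS_PER_DRAWER = 2   # fan-out modules per I/O drawer
SLOTS_PER_FANOUT = 6     # PCIe Gen3 slots per fan-out module


def available_slots(nodes: int, drawers: int) -> int:
    """Adapter slots that remain usable after the drawers are cabled."""
    adapters_needed = drawers * FANOUTS_PER_DRAWER  # one #EJ07 per fan-out module
    node_slots = nodes * NODE_SLOTS
    if adapters_needed > node_slots:
        raise ValueError("not enough system node slots to cable the drawers")
    drawer_slots = drawers * FANOUTS_PER_DRAWER * SLOTS_PER_FANOUT
    return (node_slots - adapters_needed) + drawer_slots
```

A 1-node system with two drawers gives (8 - 4) + 24 = 28 slots, a 2-node system with four drawers gives (16 - 8) + 48 = 56 slots, and a 4-node system with 16 drawers gives (32 - 32) + 192 = 192 slots.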
For the dimensions of the drawer, see 1.3, “Physical package” on page 10.
PCIe3 x16 to optical cable adapter (#EJ07) and 2.0 m (#ECC6), 10.0 m (#ECC8), or 20.0 m
(#ECC9) CXP 16X Active Optical Cables (AOCs) connect the system node to a PCIe FanOut
Module in the I/O expansion drawer. One #ECC6, one #ECC8, or one #ECC9 includes two
AOC cables.
A blind-swap cassette (BSC) is used to house the full-high adapters that go into these slots.
The BSC is the same BSC that was used with the previous generation server's #5802, #5803,
#5877, and #5873 12X attached I/O drawers.
Figure 2-20 shows the back view of the PCIe Gen3 I/O Expansion Drawer.
Figure 2-20 Rear view of the PCIe Gen3 I/O Expansion Drawer
Cable lengths: Use the 2.0 m cables for intra-rack installations. Use the 10.0 m or 20.0 m
cables for inter-rack installations.
A minimum of one PCIe3 Optical Cable Adapter for PCIe3 Expansion Drawer (#EJ07) is
required to connect to the PCIe3 6-slot fan-out module in the I/O expansion drawer. The top
port of the fan-out module must be cabled to the top port of the #EJ07 adapter. Also, the bottom
two ports must be cabled together. These cabling rules and considerations apply to both
supported fan-out modules, #EMXG and #EMXF.
Figure 2-21 shows the connector locations for the PCIe Gen3 I/O Expansion Drawer.
Figure 2-21 Connector locations for the PCIe Gen3 I/O Expansion Drawer
General rules for the PCIe Gen3 I/O Expansion Drawer configuration
The PCIe3 Optical Cable Adapter (#EJ07) can be in any of the PCIe adapter slots in a
Power E980 system node. However, we advise that you first populate the PCIe adapter slots
with odd location codes (P1-C1, P1-C3, P1-C5, and P1-C7) and then populate the adapter
slots with even location codes (P1-C2, P1-C4, P1-C6, and P1-C8).
Each processor module drives two PCIe Gen4 slots, and all slots are equal regarding their
bandwidth characteristics. If you first use the slots with odd location codes, you ensure that
one PCIe Gen4 slot per processor module is populated before you use the second PCIe
Gen4 slot of the processor modules. There is no preference for the order that you use to
populate the odd or even sequence locations.
Table 2-23 shows the PCIe adapter slot priorities in the Power E980 system. If the sequence
within the odd location codes and the sequence within the even location codes are chosen as
shown in the slot priority column, the adapters are assigned to the SCMs in alignment with the
internal enumeration order: SCM0, SCM1, SCM2, and SCM3.
Feature code | Description | Slot priorities
EJ07 | PCIe3 Optical Cable Adapter for PCIe3 Expansion Drawer | 1, 7, 3, 5, 2, 8, 4, and 6
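The following Python sketch illustrates how the slot priority sequence spreads #EJ07 adapters across the processor modules. It rests on two assumptions that are labeled in the code: the last slot in the sequence is taken to be 6 (the only one of the eight node slots that the list does not otherwise name), and each pass through the odd or even sequence is taken to walk SCM0 through SCM3 in order. The function name is hypothetical.

```python
# Slot priority for the #EJ07 adapter. The last entry (6) is an assumption:
# it is the only one of the eight node slots not otherwise named in the list.
EJ07_SLOT_PRIORITY = [1, 7, 3, 5, 2, 8, 4, 6]
SCM_COUNT = 4


def ej07_placement(adapter_count: int) -> list[tuple[str, str]]:
    """Return (location code, SCM) pairs for the first `adapter_count` adapters."""
    if not 0 <= adapter_count <= len(EJ07_SLOT_PRIORITY):
        raise ValueError("a system node has eight PCIe slots")
    placement = []
    for position, slot in enumerate(EJ07_SLOT_PRIORITY[:adapter_count]):
        # Assumed: each pass through the odd or even sequence walks SCM0..SCM3.
        placement.append((f"P1-C{slot}", f"SCM{position % SCM_COUNT}"))
    return placement
```

Under these assumptions, the first four adapters land on four different processor modules before any module receives a second adapter, which matches the intent of populating the odd location codes first.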
The following figures show several examples of supported configurations. For simplification,
we have not shown every possible combination of the I/O expansion drawer to server
attachment.
Figure 2-23 shows an example of a single system node and two PCIe Gen3 I/O Expansion
Drawers.
Figure 2-23 Example of a single system node and two I/O drawers
Figure 2-24 shows an example of two system nodes and two PCIe Gen3 I/O Expansion
Drawers.
Figure 2-24 Example of two system nodes and two I/O drawers
Figure 2-25 shows an example of two system nodes and four PCIe Gen3 I/O Expansion
Drawers.
Figure 2-25 Example of two system nodes and four I/O drawers
To maximize configuration flexibility and space utilization, the system node of a Power E980
system does not have integrated SAS bays or integrated SAS controllers. PCIe SAS adapters
and the EXP24S drawer can be used to provide direct-access storage.
To further reduce possible single points of failure, the EXP24S drawer configuration rules that
are consistent with previous Power Systems servers are used. IBM i configurations require
the drives to be protected (RAID or mirroring). Protecting the drives is highly advised, but not
required for other OSes. All Power OS environments that are using SAS adapters with write
cache require the cache to be protected by using pairs of adapters.
With AIX, Linux, and Virtual I/O Server (VIOS), you can order the EXP24S drawer with four
sets of six bays, two sets of 12 bays, or one set of 24 bays (mode 4, 2, or 1). With IBM i, you
can order the EXP24S drawer as one set of 24 bays (mode 1).
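The mode numbers map directly to how the 24 bays are split. The Python sketch below illustrates the split. It assumes that bays are numbered 1 through 24 and that each group is a contiguous range, which this text does not state explicitly; the function names are hypothetical.

```python
def bay_groups(mode: int, total_bays: int = 24) -> list[list[int]]:
    """Split the bays into `mode` equal groups (mode 1, 2, or 4)."""
    if mode not in (1, 2, 4):
        raise ValueError("the EXP24S drawer supports mode 1, 2, or 4")
    size = total_bays // mode
    # Assumed: bays are numbered from 1 and each group is contiguous.
    return [list(range(start, start + size))
            for start in range(1, total_bays + 1, size)]


def orderable_modes(os_name: str) -> tuple[int, ...]:
    """Modes that can be ordered for an OS, as described in the text."""
    if os_name == "IBM i":
        return (1,)
    if os_name in ("AIX", "Linux", "VIOS"):
        return (1, 2, 4)
    raise ValueError(f"unknown OS: {os_name}")
```

Mode 4 yields four groups of six bays, mode 2 yields two groups of 12 bays, and mode 1 yields a single group of 24 bays.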
Figure 2-26 shows the front of the unit and the groups of disks on each mode.
Figure 2-26 EXP24S drawer front view with location codes and disk groups
Mode setting is done by IBM Manufacturing. The stickers indicate whether the enclosure is
set to mode 1, mode 2, or mode 4. They are attached to the lower-left shelf of the chassis (A)
and the center support between the Enclosure Services Manager (ESM) modules (B).
Figure 2-27 Mode sticker locations at the rear of the 5887 disk drive enclosure
The EXP24S SAS ports are attached to a SAS PCIe adapter or pair of adapters by using SAS
YO or X cables. The cable length varies depending on the FC, and the proper length should
be calculated by considering the routing for proper airflow and ease of handling.
The EXP24S drawer can support up to 24 SAS SFF Gen2 disks. Table 2-24 lists the available
disk options.
Feature code | CCIN | Description | OS support
ESD2 | 59CD | 1.1 TB 10 K RPM SAS SFF-2 Disk Drive (IBM i) | IBM i
ESF2 | 58DA | 1.1 TB 10 K RPM SAS SFF-2 Disk Drive 4 K Block - 4224 | IBM i
ESD3 | 0 | 1.2 TB 10 K RPM SAS SFF-2 Disk Drive (AIX/Linux) | AIX and Linux
ESF3 | 0 | 1.2 TB 10 K RPM SAS SFF-2 Disk Drive 4 K Block - 4096 | AIX and Linux
ESFS | 0 | 1.7 TB 10 K RPM SAS SFF-2 Disk Drive 4 K Block - 4224 | IBM i
ESFT | 0 | 1.8 TB 10 K RPM SAS SFF-2 Disk Drive 4 K Block - 4096 | AIX and Linux
ES80 | | 1.9 TB Read Intensive SAS 4 K SFF-2 SSD for AIX/Linux | AIX and Linux
ES81 | | 1.9 TB Read Intensive SAS 4 K SFF-2 SSD for IBM i | IBM i
ES62 | | 3.86 - 4.0 TB 7200 RPM 4 K SAS LFF-1 Nearline Disk Drive (AIX/Linux) | AIX and Linux
ES64 | | 7.72 - 8.0 TB 7200 RPM 4 K SAS LFF-1 Nearline Disk Drive (AIX/Linux) | AIX and Linux
ESEY | 0 | 283 GB 15 K RPM SAS SFF-2 4 K Block - 4224 Disk Drive | IBM i
1948 | 19B1 | 283 GB 15 K RPM SAS SFF-2 Disk Drive (IBM i) | IBM i
ESEZ | 0 | 300 GB 15 K RPM SAS SFF-2 4 K Block - 4096 Disk Drive | AIX and Linux
1953 | | 300 GB 15 K RPM SAS SFF-2 Disk Drive (AIX/Linux) | AIX and Linux
ES78 | | 387 GB SFF-2 SSD 5xx eMLC4 for AIX/Linux | AIX and Linux
1962 | 19B3 | 571 GB 10 K RPM SAS SFF-2 Disk Drive (IBM i) | IBM i
ESEU | 59D2 | 571 GB 10 K RPM SAS SFF-2 Disk Drive 4 K Block - 4224 | IBM i
ESFN | 0 | 571 GB 15 K RPM SAS SFF-2 4 K Block - 4224 Disk Drive | IBM i
ESDN | | 571 GB 15 K RPM SAS SFF-2 Disk Drive - 528 Block (IBM i) | IBM i
1964 | | 600 GB 10 K RPM SAS SFF-2 Disk Drive (AIX/Linux) | AIX and Linux
ESEV | 0 | 600 GB 10 K RPM SAS SFF-2 Disk Drive 4 K Block - 4096 | AIX and Linux
ESFP | 0 | 600 GB 15 K RPM SAS SFF-2 4 K Block - 4096 Disk Drive | AIX and Linux
ESDP | | 600 GB 15 K RPM SAS SFF-2 Disk Drive - 5xx Block (AIX/Linux) | AIX and Linux
ES7E | | 775 GB SFF-2 SSD 5xx eMLC4 for AIX/Linux | AIX and Linux
1738 | 19B4 | 856 GB 10 K RPM SAS SFF-2 Disk Drive (IBM i) | IBM i
1752 | | 900 GB 10 K RPM SAS SFF-2 Disk Drive (AIX/Linux) | AIX and Linux
There are six SAS connectors at the rear of the EXP24S drawer to which two SAS adapters
or controllers are attached. They are labeled T1, T2, and T3; there are two T1, two T2, and
two T3 connectors. While configuring the drawer, special configuration FCs indicate for IBM
Manufacturing the mode of operation in which the disks and ports will be split:
In mode 1, two or four of the six ports are used. Two T2 ports are used for a single SAS
adapter, and two T2 and two T3 ports are used with a paired set of two adapters or dual
adapters configuration.
In mode 2 or mode 4, four ports are used, two T2 and two T3, to access all SAS bays.
Figure 2-29 shows the rear connectors of the EXP24S drawer, how they relate to the modes
of operation, and disk grouping.
Figure 2-29 Rear view of the EXP24S drawer with the three modes of operation and the disks that are
assigned to each port
An EXP24S drawer in mode 4 can be attached to two or four SAS controllers and provide
high configuration flexibility. An EXP24S drawer in mode 2 has similar flexibility. Up to
24 HDDs can be supported by any of the supported SAS adapters or controllers.
Note: Not all possible scenarios are included. For more information about supported
scenarios, see “Planning for serial-attached SCSI cables” in IBM Knowledge Center.
Figure 2-30 shows the connection diagram and components of the solution.
Figure 2-30 (diagram labels: SAS YO cable, SAS adapter #EJ0J)
Figure 2-31 shows the connection diagram and components of the solution.
Figure 2-31 (diagram label: SAS YO cable)
The ports that are used on the SAS adapters must be the same for both adapters of the pair.
There is no SSD support for this scenario.
Figure 2-32 shows the connection diagram and components of the solution.
Figure 2-32 Dual Virtual I/O Servers sharing a single EXP24S drawer (2 x SAS X cables)
The ports that are used on the SAS adapters must be the same for both adapters of the pair.
There is no SSD support for this scenario.
Figure 2-33 shows the connection diagram and components of the solution.
Figure 2-33 Dual Virtual I/O Servers sharing two EXP24S drawers (SAS YO cables)
Figure 2-34 shows the connection diagram and components of the solution.
Figure 2-34 Four Virtual I/O Servers sharing two EXP24S drawers (SAS X cables, #EJ0J adapters)
Other scenarios
For more information about direct connection to logical partitions (LPARs), different adapters,
and cables, see “5887 disk drive enclosure” in IBM Knowledge Center.
The following PCIe3 SAS adapters support the EXP24SX and EXP12SX drawers:
PCIe3 RAID SAS Adapter Quad-port 6 Gb x8 (#EJ0J)
PCIe3 LP RAID SAS Adapter Quad-Port 6 Gb x8 (#EJ0M)
PCIe3 RAID SAS quad-port 6 Gb LP Adapter (#EL3B)
PCIe3 12 GB Cache RAID Plus SAS Adapter Quad-port 6 Gb x8 (#EJ14)
IBM i configurations require the drives to be protected (RAID or mirroring). Protecting the
drives is highly advised, but not required for other OSes. All Power Systems OS environments
that are using SAS adapters with write cache require the cache to be protected by using pairs
of adapters.
The EXP24SX and EXP12SX drawers have many high-reliability design points:
SAS bays that support hot-swap.
Redundant and hot-plug power and fan assemblies.
Dual power cords.
Redundant and hot-plug Enclosure Services Managers (ESMs).
Redundant data paths to all drives.
LED indicators on drives, bays, ESMs, and power supplies that support problem
identification.
Through the SAS adapters or controllers, drives can be protected with RAID,
mirroring, and hot-spare capability.
For the EXP24SX drawer, a maximum of twenty-four 2.5-inch SSDs or 2.5-inch HDDs
are supported in the #ESLS 24 SAS bays. There can be no mixing of HDDs and SSDs
in the same mode 1 drawer. HDDs and SSDs can be mixed in a mode 2 or mode 4
drawer, but they cannot be mixed within a logical split of the drawer. For example, in a
mode 2 drawer with two sets of 12 bays, one set can hold SSDs and one set can hold
HDDs, but you cannot mix SSDs and HDDs in the same set of 12 bays.
The EXP24S, EXP24SX, and EXP12SX drawers can be mixed on the same server and
on the same PCIe3 adapters.
The EXP12SX drawer does not support SSDs.
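The mixing rule for the EXP24SX drawer can be checked mechanically. The following Python sketch is an illustration only: it assumes that the drawer contents are given as a list of 24 entries in bay order and that each logical split is a contiguous run of bays. The function name is hypothetical.

```python
def exp24sx_mix_is_valid(mode: int, drive_types: list) -> bool:
    """Check that no logical split of the drawer mixes HDDs and SSDs.

    drive_types holds 24 entries in bay order: "HDD", "SSD", or None (empty bay).
    """
    if mode not in (1, 2, 4):
        raise ValueError("mode must be 1, 2, or 4")
    if len(drive_types) != 24:
        raise ValueError("the EXP24SX drawer has 24 SAS bays")
    if any(d not in ("HDD", "SSD", None) for d in drive_types):
        raise ValueError('entries must be "HDD", "SSD", or None')
    size = 24 // mode  # bays per logical split; assumed contiguous
    for start in range(0, 24, size):
        kinds = {d for d in drive_types[start:start + size] if d is not None}
        if len(kinds) > 1:
            return False
    return True
```

Twelve SSDs followed by twelve HDDs is valid in a mode 2 drawer but not in a mode 1 drawer, because mode 1 treats all 24 bays as one set.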
The cables that are used to connect an #ESLL or #ESLS storage enclosure to a server are
different from the cables that are used with the 5887 disk drive enclosure. Attachment
between the SAS controller and the storage enclosure SAS ports is through the appropriate
SAS YO12 or X12 cables. The PCIe Gen3 SAS adapters support 6 Gb throughput. The
EXP12SX drawer supports up to 12 Gb throughput if future SAS adapters support that
capability.
There are six SAS connectors at the rear of the EXP24SX and EXP12SX drawers to which
SAS adapters or controllers are attached. They are labeled T1, T2, and T3; there are two T1,
two T2, and two T3 connectors.
In mode 1, two or four of the six ports are used. Two T2 ports are used for a single SAS
adapter, and two T2 and two T3 ports are used with a paired set of two adapters or a dual
adapters configuration.
In mode 2 or mode 4, four ports are used, two T2 and two T3 connectors, to access all
the SAS bays.
Figure 2-35 shows the connector locations for the EXP24SX and EXP12SX storage
enclosures.
Figure 2-35 Connector locations for the EXP24SX and EXP12SX storage enclosures
Mode setting is done by IBM Manufacturing. If you need to change the mode after installation,
ask your IBM System Services Representative (IBM SSR) for support and direct them to Mode
Change on Power EXP24SX and EXP12SX SAS Storage Enclosures (Features #ESLL,
For more information about SAS cabling and cabling configurations, see “Connecting an
#ESLL or #ESLS storage enclosure to your system” in IBM Knowledge Center.
XIV Storage Systems extend ease of use with integrated management for large and multi-site
XIV deployments, reducing operational complexity and enhancing capacity planning. For
more information, see IBM XIV Storage System.
Additionally, the IBM System Storage DS8000 includes a range of features that automate
performance optimization and application quality of service, and also provide the highest
levels of reliability and system uptime. For more information, see IBM Knowledge Center.
In addition, the VIOS can be installed in special partitions that provide support to other
partitions running AIX or Linux OSes for using features such as virtualized I/O devices,
PowerVM Live Partition Mobility (LPM), or PowerVM Active Memory Sharing.
For more information about the software that is available on Power Systems, see IBM Power
Systems Software.
IBM periodically releases maintenance packages (service packs or technology levels) for the
AIX operating system. Information about these packages, downloading, and obtaining the
CD-ROM can be found at Fix Central.
The Fix Central website also provides information about how to obtain the fixes that are
included on the CD-ROM.
The Service Update Management Assistant (SUMA), which can help you automate the task
of checking and downloading operating system updates, is part of the base operating
system. For more information about the suma command, see IBM Knowledge Center.
Table 2-25 shows minimum supported AIX levels when using any I/O configuration.
Table 2-26 shows the minimum supported AIX levels when using virtual I/O only.
Table 2-26 Supported minimum AIX levels for virtual I/O only
Version Technology level Service pack Planned availability
For compatibility information for hardware features and the corresponding AIX Technology
Levels, see IBM Prerequisites.
2.9.2 IBM i
IBM i is supported on the Power E980 server by the following minimum required levels:
IBM i 7.2 TR9 or later
IBM i 7.3 TR5 or later
For compatibility information for hardware features and the corresponding IBM i Technology
Levels, see IBM Prerequisites.
The supported versions of Linux on the Power E980 server are as follows:
Red Hat Enterprise Linux 7.5 for Power LE (p8compat) or later
SUSE Linux Enterprise Server 12 Service Pack 3 or later
SUSE Linux Enterprise Server for SAP with SUSE Linux Enterprise Server 12 Service
Pack 3 or later
SUSE Linux Enterprise Server for SAP with SUSE Linux Enterprise Server 11 Service
Pack 4
SUSE Linux Enterprise Server 15
Ubuntu 16.04.4
Learn about developing on the IBM Power Architecture®, find packages, get access to cloud
resources, and discover tools and technologies by going to the Linux on IBM Power Systems
Developer Portal.
The IBM Advance Toolchain for Linux on Power is a set of open source compilers, runtime
libraries, and development tools that you can use to take leading-edge advantage of POWER
hardware features on Linux. For more information, see Advance toolchain for Linux on Power.
For more information about SUSE Linux Enterprise Server, see SUSE Linux Enterprise
Server.
For more information about Red Hat Enterprise Linux, see Red Hat Enterprise Linux.
IBM regularly updates the VIOS code. For more information, see Fix Central.
Chapter 3. Virtualization
Virtualization is a key factor for productive and efficient use of IBM Power Systems servers. In
this chapter, you find a brief description of virtualization technologies that are available for
POWER9 processor-based systems. The following IBM Redbooks publications provide more
information about the virtualization features:
IBM PowerVM Best Practices, SG24-8062
IBM PowerVM Virtualization Introduction and Configuration, SG24-7940
IBM PowerVM Virtualization Active Memory Sharing, REDP-4470
IBM PowerVM Virtualization Managing and Monitoring, SG24-7590
IBM Power Systems SR-IOV: Technical Overview and Introduction, REDP-5065
Combined with features in the POWER9 processors, the IBM POWER Hypervisor™ delivers
functions that enable other system technologies, including logical partitioning (LPAR)
technology, virtualized processors, IEEE virtual local area network (VLAN)-compatible virtual
switch, virtual SCSI adapters, virtual Fibre Channel adapters, and virtual consoles. The
POWER Hypervisor is a basic component of the system’s firmware and offers the following
functions:
Provides an abstraction between the physical hardware resources and the LPARs that use
them.
Enforces partition integrity by providing a security layer between LPARs.
Controls the dispatch of virtual processors to physical processors.
Saves and restores all processor state information during a logical processor context
switch.
Controls hardware I/O interrupt management facilities for LPARs.
Provides VLAN channels between LPARs that help reduce the need for physical Ethernet
adapters for inter-partition communication.
Monitors the service processor (SP) and performs a reset or reload if it detects the loss of
the SP, notifying the operating system if the problem is not corrected.
The POWER Hypervisor is always active, regardless of the system configuration or whether it
is connected to the managed console. It requires memory to support the resource
assignment of the LPARs on the server. The amount of memory that is required by the
POWER Hypervisor firmware varies according to several factors:
Memory usage for hardware page tables (HPTs)
Memory usage to support I/O devices
Memory usage for virtualization
The amount of memory for the HPT is based on the maximum memory size of the partition
and the HPT ratio. The default HPT ratio is 1/128th (for AIX, Virtual I/O Server (VIOS), and
Linux partitions) of the maximum memory size of the partition. AIX, VIOS, and Linux use
larger page sizes (16 KB and 64 KB) instead of using 4 KB pages. Using larger page sizes
reduces the overall number of pages that must be tracked, so the overall size of the HPT can
be reduced. As an example, for an AIX partition with a maximum memory size of 256 GB, the
HPT would be 2 GB.
When defining a partition, the maximum memory size that is specified should be based on the
amount of memory that can be dynamically added to the partition by using dynamic LPAR
(DLPAR) operations without having to change the configuration and restart the partition.
In addition to setting the maximum memory size, the HPT ratio can also be configured. The
hpt_ratio parameter for the chsyscfg Hardware Management Console (HMC) command can
be used to define the HPT ratio that is used for a partition profile. The valid values are 1:32,
1:64, 1:128, 1:256, or 1:512. Specifying a smaller absolute ratio (1/512 is the smallest value)
decreases the overall memory that is assigned to the HPT. Test any change to the HPT ratio:
a smaller HPT might increase CPU consumption because the operating system might need
to reload the entries in the HPT more frequently. Most customers choose to use the
IBM provided default values for the HPT ratios.
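The effect of the HPT ratio on hypervisor memory can be estimated with a one-line calculation. The Python sketch below is an approximation under the assumption that the HPT size is exactly the maximum partition memory multiplied by the ratio; the firmware might round the result, and the function name is hypothetical.

```python
VALID_HPT_RATIOS = ("1:32", "1:64", "1:128", "1:256", "1:512")
DEFAULT_HPT_RATIO = "1:128"  # default for AIX, VIOS, and Linux partitions


def hpt_size_gb(max_memory_gb: float, ratio: str = DEFAULT_HPT_RATIO) -> float:
    """Estimate the HPT size for a partition's maximum memory size."""
    if ratio not in VALID_HPT_RATIOS:
        raise ValueError(f"ratio must be one of {VALID_HPT_RATIOS}")
    numerator, denominator = (int(part) for part in ratio.split(":"))
    # Assumed: HPT size = maximum memory x ratio, with no firmware rounding.
    return max_memory_gb * numerator / denominator
```

For a partition with a maximum memory size of 256 GB, the default ratio gives 2 GB, a ratio of 1:32 gives 8 GB, and a ratio of 1:512 gives 0.5 GB.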
For physical I/O devices, the base amount of space for the translation control entries (TCEs)
is defined by the hypervisor based on the number of I/O devices that are supported. A system
that supports high-speed adapters can also be configured to allocate more memory to
improve I/O performance. Linux is the only operating system that uses these additional TCEs,
so the memory can be freed for use by partitions if the system is using only AIX.
The POWER Hypervisor must set aside save areas for the register contents for the maximum
number of virtual processors that is configured. The greater the number of physical hardware
devices, the greater the number of virtual devices, the greater the amount of virtualization,
and the more hypervisor memory is required. For efficient memory consumption, wanted and
maximum values for various attributes (processors, memory, and virtual adapters) should be
based on business needs, and not set to values that are significantly higher than actual
requirements.
The POWER Hypervisor provides the following types of virtual I/O adapters:
Virtual SCSI
Virtual Ethernet
Virtual Fibre Channel
Virtual (TTY) console
Virtual SCSI
The POWER Hypervisor provides a virtual SCSI mechanism for the virtualization of storage
devices. The storage virtualization is accomplished by using two paired adapters: a virtual
SCSI server adapter and a virtual SCSI client adapter.
Virtual Ethernet
The POWER Hypervisor provides a virtual Ethernet switch function that allows partitions
either to communicate quickly and securely within the same server without any physical
interconnection, or to connect outside of the server if a Layer 2 bridge to a physical Ethernet
adapter, also known as a Shared Ethernet Adapter (SEA), is set up in a VIOS partition.
On Power Systems servers, partitions can be configured to run in several modes, including
the following modes:
POWER7 compatibility mode
This is the mode for POWER7+ and POWER7 processors, implementing Version 2.06 of
the IBM Power Instruction Set Architecture (ISA). For more information, see
IBM Knowledge Center.
POWER8 compatibility mode
This is the native mode for POWER8 processors implementing Version 2.07 of the IBM
Power ISA. For more information, see IBM Knowledge Center.
POWER9 compatibility mode
This is the native mode for POWER9 processors implementing Version 3.0 of the IBM
Power ISA. For more information, see IBM Knowledge Center.
Figure 3-1 shows the available processor modes on a POWER9 processor-based system.
Processor compatibility mode is important when Live Partition Mobility (LPM) migration is
planned between different generations of servers. An LPAR that might be migrated to a
machine that is based on a processor from another generation must be activated in a
specific compatibility mode.
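As an illustration of this rule, the following minimal Python sketch picks the newest compatibility mode that both the source and the destination server can run. The function name and the table of supported modes are assumptions made for this example; they are not an HMC or PowerVM interface, and the table covers only the three modes that are listed above.

```python
# Illustrative only: the table and function are assumptions for this sketch,
# not an HMC or PowerVM interface.
# Modes are ordered from oldest to newest for each server generation.
SUPPORTED_MODES = {
    "POWER7": ["POWER7"],
    "POWER8": ["POWER7", "POWER8"],
    "POWER9": ["POWER7", "POWER8", "POWER9"],
}


def common_mode(source: str, destination: str) -> str:
    """Return the newest mode that both server generations support."""
    shared = [mode for mode in SUPPORTED_MODES[source]
              if mode in SUPPORTED_MODES[destination]]
    if not shared:
        raise ValueError("no common compatibility mode")
    return shared[-1]


# An LPAR on a POWER9 server that might move to a POWER8 server
# would be activated in POWER8 compatibility mode.
print(common_mode("POWER9", "POWER8"))  # POWER8
```

An LPAR that is activated in the mode that this function returns can run on either server, which is the precondition for a migration in both directions.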
Table 3-1 shows an example where the processor mode must be selected when a migration
from POWER9 to POWER8 is planned.
POWER9 POWER9 Fails because the wanted processor Fails because the wanted processor
mode is not supported on the mode is not supported on the
destination. destination.
POWER9 POWER8 Fails because the wanted processor Fails because the wanted processor
mode is not supported on the mode is not supported on the
destination. destination.
AME is a technology that allows the effective maximum memory capacity to be much larger
than the true physical memory maximum. Compression and decompression of memory
content can allow memory expansion up to 1000% for AIX partitions, which in turn enables a
partition to perform more work or support more users with the same physical amount of
memory. Similarly, it can allow a server to run more partitions and do more work for the same
physical amount of memory.
Note: The AME feature is not supported by IBM i and the Linux operating systems.
SR-IOV is a PCI standard architecture that enables PCIe adapters to become self-virtualizing. It
enables adapter consolidation through sharing, much like logical partitioning enables server
consolidation. With an adapter capable of SR-IOV, you can assign virtual slices of a single
physical adapter to multiple partitions through logical ports; all of this is done without a VIOS.
For more information, see IBM Power Systems SR-IOV: Technical Overview and Introduction,
3.4 PowerVM
The PowerVM platform is the family of technologies, capabilities, and offerings that delivers
industry-leading virtualization on Power Systems servers. It is the umbrella branding term for
Power Systems virtualization (logical partitioning, IBM Micro-Partitioning®, POWER
Hypervisor, VIOS, LPM, and more). As with Advanced Power Virtualization in the past,
PowerVM is a combination of hardware enablement and software.
Note: PowerVM Enterprise Edition License Entitlement is now included with each Power
E980 server. PowerVM Enterprise Edition is available as a hardware feature (#EPVV) and
supports up to 20 partitions per core, VIOS, and multiple shared processor pools (MSPPs).
It also offers LPM, Active Memory Sharing, and IBM PowerVP™ performance monitoring.
Logical partitions
LPARs and virtualization increase the usage of system resources and add a level of
configuration possibilities.
Logical partitioning is the ability to make a server run as though it were two or more
independent servers. When you logically partition a server, you divide the resources on the
server into subsets called LPARs. You can install software on an LPAR, and the LPAR runs as
an independent logical server with the resources that you allocated to the LPAR. An LPAR is
the equivalent of a virtual machine (VM).
You can assign processors, memory, and input/output devices to LPARs. You can run AIX,
Linux, and VIOS in LPARs. VIOS provides virtual I/O resources to other LPARs with
general-purpose operating systems.
LPARs share a few system attributes, such as the system serial number, system model, and
processor FCs. All other system attributes can vary from one LPAR to another.
When you use the Micro-Partitioning technology, you can allocate fractions of processors to
an LPAR. An LPAR that uses fractions of processors is also known as a shared processor
partition or micropartition. Micropartitions run over a set of processors that is called a shared
processor pool (SPP), and virtual processors are used to let the operating system manage
the fractions of processing power that are assigned to the LPAR. From an operating system
perspective, a virtual processor cannot be distinguished from a physical processor unless the
operating system is enhanced to determine the difference. Physical processors are
abstracted into virtual processors that are available to partitions.
On the POWER9 processors, a partition can be defined with a processor capacity as small as
0.05 processing units. This number represents 0.05 of a physical core. Each physical core
can be shared by up to 20 shared processor partitions, and the partition’s entitlement can be
incremented fractionally by as little as 0.05 of the processor. The shared processor partitions
are dispatched and time-sliced on the physical processors under the control of the POWER
Hypervisor. The shared processor partitions are created and managed by the HMC.
The Power E980 server supports up to 192 cores in a single system. Here are the maximum
numbers of partitions:
192 dedicated partitions
1000 micropartitions (1000 is the maximum that is supported by PowerVM.)
The maximum amounts are supported by the hardware, but the practical limits depend on
application workload demands.
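The limits above follow from simple arithmetic. The following Python sketch uses only the values that are quoted in this section (0.05 processing units as the minimum entitlement and 1000 micropartitions as the PowerVM maximum) to show which limit applies for a given number of cores. The function name is hypothetical.

```python
MIN_ENTITLEMENT = 0.05        # processing units per micropartition (from the text)
MAX_MICROPARTITIONS = 1000    # PowerVM maximum (from the text)


def max_micropartitions(cores: int) -> int:
    """Return the micropartition limit for a system with the given core count."""
    per_core = round(1 / MIN_ENTITLEMENT)   # 20 micropartitions per core
    return min(cores * per_core, MAX_MICROPARTITIONS)


print(max_micropartitions(192))  # 1000 (192 x 20 exceeds the PowerVM maximum)
print(max_micropartitions(32))   # 640
```

On a fully configured 192-core system, the PowerVM maximum of 1000 is reached long before the per-core limit of 20 micropartitions.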
Processing mode
When you create an LPAR, you can assign entire processors for dedicated use, or you can
assign partial processing units from an SPP. This setting defines the processing mode of the
LPAR.
Dedicated mode
In dedicated mode, physical processors are assigned as a whole to partitions. The SMT
feature in the POWER9 processor core allows the core to run instructions from two, four, or
eight independent software threads simultaneously.
Shared mode
In shared mode, LPARs use virtual processors to access fractions of physical processors.
Shared partitions can define any number of virtual processors (the maximum number is 20
times the number of processing units that are assigned to the partition). The POWER
Hypervisor dispatches virtual processors to physical processors according to the partition’s
processing units entitlement. One processing unit represents one physical processor’s
processing capacity. All partitions receive a total CPU time equal to their processing unit
entitlement. The logical processors are defined on top of virtual processors. So, even with a
virtual processor, the concept of a logical processor exists, and the number of logical
processors depends on whether SMT is turned on or off.
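To make the relationship between processing units, virtual processors, and logical processors concrete, consider the following Python sketch. It assumes that each virtual processor presents one logical processor per SMT thread (one thread when SMT is off). The function names are hypothetical.

```python
import math

VIRTUAL_PROCESSORS_PER_UNIT = 20   # from the text: 20 times the processing units


def max_virtual_processors(processing_units: float) -> int:
    """Largest number of virtual processors for the given entitlement."""
    # The small tolerance guards against floating-point rounding (0.15 * 20).
    return math.floor(processing_units * VIRTUAL_PROCESSORS_PER_UNIT + 1e-9)


def logical_processors(virtual_processors: int, smt_threads: int) -> int:
    """Logical processors that the operating system sees (assumed model)."""
    if smt_threads not in (1, 2, 4, 8):
        raise ValueError("smt_threads must be 1 (SMT off), 2, 4, or 8")
    return virtual_processors * smt_threads


print(max_virtual_processors(0.5))   # 10
print(logical_processors(10, 8))     # 80
```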
Micropartitions are created and then identified as members of either the default processor
pool or a user-defined SPP. The virtual processors that exist within the set of micropartitions
are monitored by the POWER Hypervisor, and processor capacity is managed according to
user-defined attributes.
If the Power Systems server is under heavy load, each micropartition within an SPP is
assured of its processor entitlement, plus any capacity that it might be allocated from the
reserved pool capacity if the micropartition is uncapped.
If certain micropartitions in an SPP do not use their capacity entitlement, the unused capacity
is ceded and other uncapped micropartitions within the same SPP are allocated the
additional capacity according to their uncapped weighting. In this way, the entitled pool
capacity of an SPP is distributed to the set of micropartitions within that SPP.
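The weighting scheme can be pictured with the following Python sketch. It assumes that ceded capacity is divided strictly in proportion to the weight of each uncapped micropartition and ignores other limits, such as the number of virtual processors. The actual POWER Hypervisor dispatching algorithm is not described here, and the function name is hypothetical.

```python
from typing import Dict


def distribute_unused(unused_units: float, weights: Dict[str, int]) -> Dict[str, float]:
    """Split unused processing units among uncapped micropartitions by weight."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        # No micropartition competes for the extra capacity.
        return {name: 0.0 for name in weights}
    return {name: unused_units * weight / total_weight
            for name, weight in weights.items()}


# Two uncapped micropartitions compete for 3.0 unused processing units.
print(distribute_unused(3.0, {"lpar1": 128, "lpar2": 64}))
# {'lpar1': 2.0, 'lpar2': 1.0}
```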
All Power Systems servers that support the MSPPs capability have a minimum of one (the
default) SPP and up to a maximum of 64 SPPs.
[Figure: VIOS partition with physical Ethernet and physical disk adapters providing virtual Ethernet and virtual SCSI adapters to virtual I/O clients]
By using the SEA, several client partitions can share one physical adapter, and you can
connect internal and external VLANs by using a physical adapter. The SEA service can be
hosted only in the VIOS, not in a general-purpose AIX or Linux partition, and acts as a Layer
2 network bridge to securely transport network traffic between virtual Ethernet networks
(internal) and one or more (Etherchannel) physical network adapters (external). These virtual
Ethernet network adapters are defined by the POWER Hypervisor on the VIOS.
Virtual SCSI
Virtual SCSI refers to a virtualized implementation of the SCSI protocol. Virtual SCSI is
based on a client/server relationship. The VIOS LPAR owns the physical resources and acts
as a server or, in SCSI terms, a target device. The client LPARs access the virtual SCSI
backing storage devices that are provided by the VIOS as clients.
The virtual I/O adapters (a virtual SCSI server adapter and a virtual SCSI client adapter) are
configured by using a managed console or through the Integrated Virtualization Manager
(IVM) on smaller systems. The virtual SCSI server (target) adapter is responsible for running
any SCSI commands that it receives. It is owned by the VIOS partition. The virtual SCSI client
adapter allows a client partition to access physical SCSI and SAN-attached devices and
LUNs that are assigned to the client partition. The provisioning of virtual disk resources is
provided by the VIOS.
N_Port ID Virtualization
N_Port ID Virtualization (NPIV) is a technology that allows multiple LPARs to access
independent physical storage through the same physical Fibre Channel adapter. This adapter
is attached to a VIOS partition that acts only as a pass-through, managing the data transfer
through the POWER Hypervisor.
Each partition has one or more virtual Fibre Channel adapters, each with its own pair of
unique worldwide port names, enabling you to connect each partition to independent physical
storage on a SAN. Unlike virtual SCSI, only the client partitions see the disk.
For more information and requirements for NPIV, see IBM PowerVM Virtualization Managing
and Monitoring, SG24-7590.
LPM provides systems management flexibility and improves system availability by:
Avoiding planned outages for hardware upgrade or firmware maintenance.
Avoiding unplanned downtime. With preventive failure management, if a server indicates a
potential failure, you can move its LPARs to another server before the failure occurs.
For more information and requirements for LPM, see IBM PowerVM Live Partition Mobility,
The physical memory of a Power Systems server can be assigned to multiple partitions in
either dedicated or shared mode. A system administrator can assign some physical memory
to a partition and some physical memory to a pool that is shared by other partitions. A single
partition can have either dedicated or shared memory:
With a pure dedicated memory model, the system administrator’s task is to optimize
available memory distribution among partitions. When a partition suffers degradation
because of memory constraints and other partitions have unused memory, the
administrator can manually issue a dynamic memory reconfiguration.
With a shared memory model, the system automatically decides the optimal distribution of
the physical memory to partitions and adjusts the memory assignment based on partition
load. The administrator reserves physical memory for the shared memory pool, assigns
partitions to the pool, and provides access limits to the pool.
Active Memory Deduplication allows the POWER Hypervisor to dynamically map identical
partition memory pages to a single physical memory page within a shared memory pool. This
enables better usage of the Active Memory Sharing shared memory pool, increasing
the system’s overall performance by avoiding paging. Deduplication can cause the hardware
to incur fewer cache misses, which also leads to improved performance.
Active Memory Deduplication depends on the Active Memory Sharing feature being available,
and it uses CPU cycles that are donated by the Active Memory Sharing pool’s VIOS partitions
to identify deduplicated pages. The operating systems that are running on the Active Memory
Sharing partitions can suggest to the POWER Hypervisor that some pages (such as
frequently referenced read-only code pages) are good for deduplication.
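The idea of mapping identical pages to one physical page can be sketched in Python as follows. This is a conceptual illustration only: the use of SHA-256 digests to find identical pages and the function name are assumptions, and the actual mechanism of the POWER Hypervisor is not described in this paper.

```python
import hashlib
from typing import Dict, List, Tuple


def deduplicate(pages: List[bytes]) -> Tuple[List[int], List[bytes]]:
    """Map each logical page to an index in a list of unique physical pages."""
    physical: List[bytes] = []
    index_by_digest: Dict[bytes, int] = {}
    mapping: List[int] = []
    for page in pages:
        digest = hashlib.sha256(page).digest()
        if digest not in index_by_digest:
            # First time this content is seen: keep one physical copy.
            index_by_digest[digest] = len(physical)
            physical.append(page)
        mapping.append(index_by_digest[digest])
    return mapping, physical


# Three logical pages, two of which hold identical content.
mapping, physical = deduplicate([b"a" * 4096, b"b" * 4096, b"a" * 4096])
print(mapping)         # [0, 1, 0]
print(len(physical))   # 2
```

A production design would also have to compare full page contents before sharing and give a partition a private copy when it writes to a shared page. Neither step is modeled here.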
HMC V9R2 provides the following enhancements to the Remote Restart feature.
Remote restart a partition with reduced or minimum CPU/memory on the target system.
Remote restart by choosing a different virtual switch on the target system.
Remote restart the partition without turning on the partition on the target system.
Remote restart the partition for test purposes when the source-managed system is in the
Operating or Standby state.
Remote restart through the REST API.
The remainder of this section provides reference lists of strategic RAS features that are
implemented in Power E980 servers. More detailed explanations for individual features are
given in subsequent sections of this chapter and in Power Processor-Based Systems RAS.
The POWER9 processor-based Power E980 RAS enhancements, which are unique to this
enterprise class server, are as follows:
Internode symmetric multiprocessing (SMP) cable redundancy
8-bit wide (x8) dual inline memory module (DIMM) ranks with Chipkill correction
Custom DIMM (CDIMM) support with extra spare DRAM chips
Asynchronous clocking across system nodes
Redundant system clock with concurrent failover per system node
Redundant Flexible Service Processor (FSP) with concurrent failover
Dedicated receptacles that are not on the FSP card, but are on the system control unit
(SCU) rear panel to facilitate FSP to system node interconnects, which improves the
serviceability of the FSP card
A SCU drawer with concurrent maintenance support for fans and redundant electrical
power that is supplied by up to two system nodes
Voltage regulators with N+1 phase redundancy and integrated spare
RAS features that are specific to the enterprise class servers and are shared by the
Power E980 and Power E950 systems are as follows:
Core-contained checkstops
Extended L2/L3/L4 cache line-delete
IBM memory buffer support and spare DRAM module capacity with 4-bit wide (x4) DIMMs
Memory row repair
Active Memory Mirroring (AMM) for Hypervisor
Internal Non-Volatile Memory Express (NVMe) drive boot support
Voltage regulators with N+1 phase redundancy
Redundant/spare voltage phases on voltage converters for levels feeding processors,
Power E980 memory CDIMMs, or Power E950 memory riser cards
PCIe3 optical cable adapter with new routing for clock logic within the card and extra
recovery procedures for faults during initial program load (IPL)
The following list shows the POWER9 processor base RAS features that are shared among
all POWER9 processor-based systems. It also shows important infrastructure-related RAS
features that pertain to the Power E980 system but are shared with the POWER9 scale-out
servers Power System S914, Power System S922, Power System S924, Power System
H922, and Power System H924.
Traditional POWER9 processor RAS features include first failure data capture (FFDC),
processor instruction retry, L2/L3 cache error correction code (ECC) protection with cache
line-delete, and a power/cooling monitor function integrated into an on-chip controller (OCC)
OCC error handling with power safe mode
New POWER9 cyclic redundancy check (CRC) including retry capability and spare data
lane support for the processor fabric bus
Memory ECC with Chipkill handling
Memory scrubbing
Memory preserving IPL
Dynamic memory relocation
Enhanced error handling (EEH) for all adapters
I/O adapter concurrent maintenance with PowerVM virtualization or operating system
(OS) software-based redundancy support
Hot-swap direct access storage devices (DASDs)
At least n+1 redundancy and concurrent maintenance support for power supplies and fans
of each system node
Power cord redundancy
Redundant vital product data (VPD)
Emergency power-off (EPOW) reporting
Concurrent firmware updates
Table 4-1 provides a comparison between IBM POWER9 scale-out and POWER9
enterprise-class systems regarding significant RAS features:
4.2 Reliability
Highly reliable systems are built with highly reliable components. On IBM POWER
processor-based systems, this basic principle is expanded upon by using a clear design for
the reliability architecture and methodology. A concentrated, systematic, and
architecture-based approach improves the overall system reliability with each successive
generation of system offerings. Reliability can be improved in primarily three ways:
Reducing the number of components
Using higher reliability grade parts
Reducing the stress on the components
In the POWER9 processor-based systems, elements of all three are used to improve system
reliability.
During the design and development process, subsystems go through rigorous verification and
integration testing processes. During system manufacturing, systems go through a thorough
testing process to help ensure the highest level of product quality.
Parts selection also plays a critical role in overall system reliability. IBM uses stringent design
criteria to select server-grade components that are extensively tested and qualified to meet
and exceed a minimum design life of 7 years. By selecting higher reliability grade
components, the frequency of all failures is lowered, and the failure of parts is not expected
within the operating life of the system. Component failure rates can be further improved by burning in select
components or running the system before shipping it to the client. This period of high stress
removes the weaker components with higher failure rates, that is, it cuts off the front end of
the traditional failure rate bathtub curve (see Figure 4-1).
[Figure 4-1: failure rate bathtub curve plotted against power-on hours]
This design for availability begins with implementing an architecture for error detection and
fault isolation (ED/FI).
FFDC is the capability of IBM hardware and microcode to continuously monitor hardware
functions. Within the processor and memory subsystem, detailed monitoring is done by
circuits within the hardware components themselves. Fault information is gathered into fault
isolation registers (FIRs) and reported to the appropriate components for handling.
Processor and memory errors that are recoverable in nature are typically reported to the
dedicated SP that is built into each system. The dedicated SP then works with the hardware
to determine the course of action to be taken for each fault.
Tolerating a correctable solid fault runs the risk that the fault aligns with a soft error and
causes an uncorrectable error situation. There is also the risk that a correctable error is
predictive of a fault that continues to worsen over time, resulting in an uncorrectable error.
To avoid such situations, you can predictively deallocate a component so that correctable
errors do not align with soft errors or other hardware faults and cause uncorrectable errors.
However, unconfiguring components, such as processor cores or entire caches in memory,
can reduce the performance or capacity of a system, which typically requires that the failing
hardware is replaced in the system. The resulting service action can also temporarily impact
system availability.
When such self-healing is successful, you avoid having to replace any hardware for a solid
correctable fault. The ability to predictively unconfigure a processor core is still available for
faults that cannot be repaired by self-healing techniques or because the sparing or
self-healing capacity is exhausted.
One extra advantage of the special ECC that is used in data error detection is that the
hardware can distinguish between an initial ECC error that is related to a specific component
of the data path and one that was passed along from earlier data transfer stages. This
advantage allows the correct component, the one originating the fault, to be reported as the
component to be replaced.
The advanced RAS features that are built into POWER9 processor-based systems handle
certain “uncorrectable” errors in ways that minimize the impact of the faults, even keeping an
entire system running after experiencing a failure.
Depending on the fault, a recovery might use the virtualization capabilities of PowerVM so
that the OS and any applications that are running in the system are not impacted and do not
need to participate in the recovery.
Information about column and row repair operations is stored persistently for processors so
that more permanent repairs can be made during processor reinitialization (during system
restart, or individual Core Power on Reset by using the Power-On Reset Engine (PORE)).
Soft errors that are detected in the level 1 cache are also correctable by a retry operation
that is handled by the hardware. But instead of using error correcting code, intermittent L1
cache errors can be corrected by using data from elsewhere in the cache hierarchy. A portion
of an L1 cache can be disabled (set delete) to avoid outages due to persistent hard errors. If
too many errors are observed across multiple sets, the core that uses the L1 cache can be
predictively deallocated.
Separate from the system caches and the description above are cache directories that
provide indexing to the caches. These also have single-bit error correction, but uncorrectable
directory errors typically result in system checkstops.
Beyond soft error correction, the intent of the POWER9 design is to manage a solid
correctable error in an L2 or L3 cache by using techniques to delete a cache line with a
persistent issue.
Beyond the L1, L2, and L3 functional units, single-bit correcting ECC is used in multiple areas
of the processor as the standard means of protecting data against single-bit errors. This
includes a number of the internal buses where data is passed between units.
During the cache purge operation, the data that is stored in the cache line is corrected where
possible. If correction is not possible, the associated cache line is marked with a special ECC
code that indicates that the cache line itself has bad data.
Nothing within the system stops just because such an event is encountered. Rather, the
hardware monitors the usage of pages with marks. If such data is never used, hardware
replacement is requested, but nothing stops as a result of the operation. Software layers are
not required to handle such faults.
Only when data is loaded to be processed by a processor core or sent out to an I/O adapter is
any further action needed. In such cases, if the data is owned by a partition, then the
partition OS might be responsible for stopping itself or just the program that uses the marked
page. If data is owned by the hypervisor, then the hypervisor might choose to stop, resulting in
a system-wide outage.
However, the exposure to such events is minimized because cache-lines can be deleted,
which eliminates the repetition of an uncorrectable fault that is in a particular cache-line.
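The deferred handling that is described above can be pictured with a small Python model. The class and exception names are hypothetical, and the model captures only the idea that marking a cache line has no effect until the marked data is consumed.

```python
from typing import List, Set


class UncorrectableDataError(Exception):
    """Raised only when marked data is actually consumed."""


class MarkedMemory:
    """Toy memory in which bad lines are marked instead of stopping the system."""

    def __init__(self, lines: List[int]) -> None:
        self._lines = list(lines)
        self._marked: Set[int] = set()

    def mark_uncorrectable(self, index: int) -> None:
        # Marking alone stops nothing; it only records that the line is bad.
        self._marked.add(index)

    def load(self, index: int) -> int:
        # The consumer of the data (partition or hypervisor) handles the error.
        if index in self._marked:
            raise UncorrectableDataError(f"line {index} holds bad data")
        return self._lines[index]


memory = MarkedMemory([10, 20, 30])
memory.mark_uncorrectable(1)
print(memory.load(0))   # 10; unaffected lines stay usable
```

Calling `memory.load(1)` in this model raises the exception, which corresponds to the point at which the owning partition or the hypervisor must act.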
4.3.5 Cyclic redundancy check and lane repair for processor fabric buses
ECC is used internally in various data paths as data is transmitted between processor units.
However, externally to the processor, high-speed data buses can be susceptible to occasional
multiple bit errors due to electrical noise, timing drift, and various other factors.
With the introduction of the POWER9 processor, the CRC code for error detection, the retry
capability on error conditions, and the ability to substitute a faulty data lane are extended to
the processor fabric bus interfaces for both the onboard processor interconnect (X-bus) and
the internode processor interconnect (O-bus).
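The combination of CRC detection and retry can be sketched as follows in Python. The sketch uses the CRC-32 function from the standard zlib module as a stand-in, because the CRC that is used on the fabric bus is not specified here. The frame layout, the retry limit, and all function names are assumptions.

```python
import zlib
from typing import Callable, Optional


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(received: bytes) -> Optional[bytes]:
    """Return the payload if its CRC matches, otherwise None."""
    payload, crc = received[:-4], int.from_bytes(received[-4:], "big")
    return payload if zlib.crc32(payload) == crc else None


def transfer(payload: bytes, bus: Callable[[bytes], bytes],
             max_retries: int = 3) -> bytes:
    """Send the payload over the bus and resend when the CRC check fails."""
    for _ in range(1 + max_retries):
        result = check(bus(frame(payload)))
        if result is not None:
            return result
    raise IOError("transfer failed after retries")


def make_noisy_bus(bad_transfers: int) -> Callable[[bytes], bytes]:
    """Simulated bus that flips one bit in the first `bad_transfers` frames."""
    remaining = [bad_transfers]

    def bus(data: bytes) -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            return bytes([data[0] ^ 0x01]) + data[1:]
        return data

    return bus


# One corrupted transfer is detected by the CRC and recovered by a retry.
print(transfer(b"fabric data", make_noisy_bus(1)))  # b'fabric data'
```

Spare lane substitution is not modeled. In this sketch, a bus that keeps corrupting data exhausts the retries and raises an error.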
Processor instruction retry allows the system to recover from soft faults that would otherwise
result in an outage of applications or the entire server.
Retry techniques are used in other parts of the system as well. Faults that are detected on
the memory bus that connects processor memory controllers to DIMMs can be retried. In
POWER9 processor-based systems, the memory controller is designed with a replay buffer
that allows memory transactions to be retried after certain faults internal to the memory
controller are detected. This complements the retry abilities of the memory buffer module
that is used in the Power E950 and Power E980 servers.
If such cases do occur, PowerVM can start a process for deallocating the failing processor
dynamically at run time. This process interacts with the OS that holds access to the processor
in question, and requires that control over the processor be ceded by the OS.
The core-contained checkstop technology allows PowerVM to be signaled when such faults
occur and stop the code that is being used by the failing processor core. This feature allows
the outage that is associated with the fault to be contained to the logical partition (LPAR)
that was using the core when the uncorrectable fault occurred.
The core-contained checkstop feature is beneficial for scale-up IBM Power Systems servers
such as the Power E980 server, which typically host many LPARs. However, a core-contained
checkstop signaling that a fault occurred on a core running a hypervisor instruction typically
results in hypervisor termination and a full system outage.
Processor designs without processor instruction retry typically must resort to such techniques
for all faults that can be contained to an instruction in a processor core.
PowerVM can handle certain other hardware faults without stopping applications, such as an
error in specific data structures (faults in translation tables or lookaside buffers).
Along with this hub freeze behavior comes what is termed Enhanced Error Handling for I/O
(EEH for I/O). This capability signals device drivers when various PCIe bus-related faults
occur. Device drivers may attempt to restart the adapter after such faults (EEH recovery).
A clock error in the PCIe clocking can be signaled and recovered by using EEH in any system
that incorporates redundant PCIe clocks with dynamic failover enabled.
Some severe faults require that memory under a portion of the controller becomes
inaccessible to prevent reliance on incorrect data. There are cases where the fault can be
limited to just one memory channel. In these cases, the memory controller asserts what is
known as a channel checkstop. In systems without hypervisor memory mirroring, a channel
checkstop usually results in a system outage. However, with hypervisor memory mirroring,
the hypervisor continues to operate despite the memory channel checkstop.
The auto-restart (restart) option, when enabled, can restart the system automatically
following an unrecoverable firmware error, firmware hang, hardware failure, or
environmentally induced failure.
The auto-restart (restart) option must be enabled from the Advanced System Management
Interface (ASMI).
Figure 4-2 shows the memory subsystem design of a POWER9 processor-based module that
is based on two memory controllers and eight DMI channels that connect to eight 32 GB
CDIMMs each.
[Figure 4-2 annotations: POWER9 scale-up processor with two memory controllers and 8 memory buses supporting 8 CDIMMs]
Memory controller
• 128-byte cache line support
• Replay buffer to retry after soft internal faults
• Special uncorrectable error handling for solid faults
Memory bus
• CRC protection with recalibration and retry on error
• Dynamic substitution of a failed data lane by spare lane
Memory buffer
• Retry after internal soft errors
• L4 cache SED/DED ECC code with persistent correctable error handling
One rank DIMM supporting two 128 bit data groups
• 4 memory buffer chip ports connect to DIMM rank
• 10 x8 DRAMs attached to each port: 8 DRAMs for data storage, 1 DRAM for ECC data, 1 spare DRAM
• 2 ports combined to form 128 data bit array: 8 read operations fill one 128-byte cache line; second port group can be used to fill a second cache line
Note: Bits used for data and for ECC are spread across nine DRAMs to maximize error correction capability
The memory buffer chip is manufactured in 22-nm lithography and incorporates similar
technologies that are used by POWER9 processor-based functional units to avoid soft errors.
The integrated L4 cache is based on embedded DRAM (eDRAM) technology with soft error
hardening and persistent error handling features. The memory buffer implements retry
for many internally detected faults. This function complements a replay buffer in the memory
controller in the processor, which also handles internally detected soft errors.
The bus between a processor memory controller and a CDIMM uses CRC error detection that
is coupled with the ability to retry a memory access operation in case a soft error occurs. The
bus features dynamic recalibration capabilities and a spare data lane that can be substituted
for a failing bus lane through the recalibration process.
The memory buffer on each CDIMM has four ports for communicating with DRAM modules.
For example, the 32 GB DDR4 DIMM features one rank that is composed of four ports of
8-bit wide (x8) DRAM modules and each port contains 10 DRAM modules.
For each such port, there are eight DRAM modules worth of data (64 bits) and another DRAM
module’s worth of error correction and other such data. There also is a spare DRAM module
for each port that can be substituted for a failing DRAM chip.
Two ranks on different IS DIMMs are combined into a 128-bit ECC word. The ECC that is
deployed can correct the result of an entire DRAM module that is faulty. This process is also
known as Chipkill correction. In addition, it can correct at least one other bit within the ECC word.
The extra spare DRAM modules are used so that when a CDIMM experiences a Chipkill
event within the DRAM modules under a port, the spare DRAM module can be substituted for
a failing module. This substitution avoids the need to replace the CDIMM for a single Chipkill
event.
Depending on how DRAM modules fail, it might be possible to tolerate up to four DRAM
modules failing on a single CDIMM without needing to replace the CDIMM. Such a
deprecated CDIMM still supports the full memory access bandwidth and also can correct soft
errors within the functional DRAM chips through ECC.
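The DRAM counts in this description can be checked with a short Python calculation. The function name is hypothetical, and the default values are taken from the text above. One spare DRAM module on each of the four ports is consistent with the statement that up to four failing DRAM modules might be tolerated.

```python
from typing import Dict


def cdimm_layout(ports: int = 4, data_drams: int = 8, ecc_drams: int = 1,
                 spare_drams: int = 1, bits_per_dram: int = 8) -> Dict[str, int]:
    """Derive per-port and per-CDIMM DRAM counts from the values in the text."""
    drams_per_port = data_drams + ecc_drams + spare_drams
    return {
        "drams_per_port": drams_per_port,                   # 8 + 1 + 1
        "data_bits_per_port": data_drams * bits_per_dram,   # 8 x8 DRAMs
        "total_drams": ports * drams_per_port,
        "total_spares": ports * spare_drams,
    }


print(cdimm_layout())
# {'drams_per_port': 10, 'data_bits_per_port': 64, 'total_drams': 40, 'total_spares': 4}
```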
In addition to the protection that is provided by the ECC and sparing capabilities, the memory
subsystem implements memory scrubbing to identify and correct single-bit soft-errors. The
PowerVM hypervisor is informed of incidents of single-cell persistent (hard) faults for
deallocation of associated pages. However, because of the ECC and sparing capabilities that
are used, such memory page deallocation is not relied upon for repair of faulty hardware.
Finally, should an uncorrectable error in data be encountered, the memory that is affected is
marked with a Special Uncorrectable Error (SUE) code and handled as described in 4.3.4,
“Cache uncorrectable error handling” on page 140.
EEH allows EEH-aware device drivers to retry after certain non-fatal I/O events to avoid
failover, especially in cases where a soft error is encountered. EEH also allows device drivers
to stop if there is an intermittent hard error or other unrecoverable errors while protecting
against reliance on data that cannot be corrected. This action often is done by “freezing”
access to the I/O subsystem with the fault. Freezing prevents data from flowing to and from an
I/O adapter and causes the hardware or firmware to respond with a defined error signature
whenever an attempt is made to access the device. If necessary, a SUE code can be used to
mark a section of data as bad when the freeze is first started.
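The freeze-and-recover behavior can be modeled as follows in Python. The all-ones error signature, the class, and the reset limit are assumptions for this sketch; they do not represent the actual firmware or a device driver interface.

```python
FROZEN_SIGNATURE = 0xFFFFFFFF   # assumed error signature for a frozen slot


class IoSlot:
    """Toy I/O slot that answers every read with a signature while frozen."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.frozen = False

    def freeze(self) -> None:
        self.frozen = True

    def reset(self) -> None:
        # In this model, a reset always clears the freeze.
        self.frozen = False

    def read(self) -> int:
        return FROZEN_SIGNATURE if self.frozen else self.value


def read_with_recovery(slot: IoSlot, max_resets: int = 1) -> int:
    """Read from the slot; on the error signature, reset and read again."""
    value = slot.read()
    resets = 0
    while value == FROZEN_SIGNATURE:
        if resets == max_resets:
            raise IOError("slot stays frozen; giving up")
        slot.reset()
        resets += 1
        value = slot.read()
    return value


slot = IoSlot(value=0x1234)
slot.freeze()                              # a fault freezes the slot
print(hex(read_with_recovery(slot)))       # 0x1234 after one reset
```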
IBM device drivers under AIX are fully EEH-capable. For Linux under PowerVM, EEH support
extends to many frequently used devices. There might be various third-party PCI devices that
do not provide native EEH support.
[Figure: PCIe Gen3 I/O expansion drawer showing x8 links to the I/O slots, power supplies, and fans]
These I/O drawers are attached by using a connecting card that is called a PCIe3 Optical
Cable Adapter (#EJ07) that plugs into a PCIe slot of a Power E980 system node. The cable
cards for POWER9 processor-based servers are redesigned in certain areas to improve error
handling. These improvements include new routing for clock logic within the cable card and
extra recovery for faults during IPL.
Each I/O drawer contains up to two PCIe FanOut Modules (#EMXG). An I/O module uses
x16 PCIe lanes that are controlled from a processor in a system node. An I/O module that
uses a PCIe switch to supply six PCIe slots is supported.
Two Active Optical Cables (AOCs) are used to connect a PCIe3 cable adapter to the
equivalent card in the I/O drawer module. Although these cables are not redundant, as of the
FW830 firmware the loss of one cable only reduces the I/O bandwidth (the number of lanes that
is available to the I/O module) by 50%.
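The lane arithmetic behind the 50% figure can be sketched as follows. The function name is hypothetical, and the sketch assumes that the x16 lanes of an I/O module are split evenly across the two cables, which is consistent with the 50% reduction that is stated above.

```python
def lanes_available(total_lanes=16, cables=2, failed_cables=0):
    """Return the PCIe lanes still usable by the I/O module."""
    if not 0 <= failed_cables <= cables:
        raise ValueError("failed_cables must be between 0 and cables")
    # Assumption: lanes are divided evenly across the cables.
    lanes_per_cable = total_lanes // cables
    return lanes_per_cable * (cables - failed_cables)


full = lanes_available()
degraded = lanes_available(failed_cables=1)
print(full, degraded, f"{degraded / full:.0%}")  # prints: 16 8 50%
```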
Infrastructure RAS features for the I/O drawer include redundant power supplies, fans, and
DC outputs of voltage regulators (phases).
The impact of the failure of an I/O drawer component is summarized for the most significant
cases in Table 4-2.
Table 4-2 PCIe Gen3 I/O Expansion Drawer RAS feature matrix

Faulty component: I/O adapter in an I/O slot.
Impact of failure: Loss of function of the I/O adapter.
Impact of repair: I/O adapter can be repaired while the rest of the system continues to operate.
Prerequisites: Multipathing I/O adapter redundancy, where implemented, can be used to prevent application outages.

Faulty component: Other failure of PCIe3 cable adapter in the system or I/O module.
Impact of failure: Loss of access to all the I/O of the connected I/O module.
Impact of repair: The associated I/O module must be taken down for repair; the rest of the system can remain active.
Prerequisites: Systems with a Hardware Management Console (HMC).

Faulty component: VRM associated with an I/O module.
Impact of failure: System continues to run for a phase failure transition to n mode. Other faults impact all the I/O in the module.
Impact of repair: The associated I/O module cannot be active during repair; the rest of the system can remain active.
Prerequisites: Systems with an HMC.

Faulty component: Chassis Management Card (CMC).
Impact of failure: No impact to the running system, but after it is powered off, the I/O drawer cannot be reintegrated until the CMC is repaired.
Impact of repair: The I/O drawer must be powered off to repair (loss of use of all I/O in the drawer).
Prerequisites: Systems with an HMC.

Faulty component: Midplane.
Impact of failure: Depending on the source of the failure, this failure might take down the entire I/O drawer.
Impact of repair: The I/O drawer must be powered off to repair (loss of use of all I/O in the drawer).
Prerequisites: Systems with an HMC.
The following advanced RAS features pertain to the Power E980 enterprise-class system:
Redundant SP
The SP is an essential component of a system. It is responsible for IPL, setup, monitoring,
control, and management. The control units that are present on enterprise-class systems
house two redundant SPs. If there is a failure in either of the SPs, the second processor
ensures continued operation of the system until a replacement is scheduled. Even a
system with a single system node has dual SPs in the SCU.
Redundant system clock cards
Another component that is crucial to system operations is the system reference clock
source, which is responsible for providing a synchronized clock signal to all functional
units. Each system node of a Power E980 server uses its own private set of two redundant
system clock/control cards. If there is a failure in any of the clock/control cards, the second
card ensures continued operation of the system until a replacement is scheduled. Unlike
the POWER8 processor-based enterprise class servers Power E870, Power E870C,
Power E880, and Power E880C, the Power E980 system does not require a global
reference clock source in the system control.
Dynamic processor sparing
Enterprise-class systems are Capacity Upgrade on Demand (CUoD)-capable. Processor
sparing helps minimize the effect on server performance that is caused by a failed
processor. An inactive processor is activated if a failing processor reaches a
predetermined error threshold, which helps to maintain performance and improve system
availability.
Dynamic processor sparing happens dynamically and automatically when dynamic logical
partitioning (DLPAR) is used and the failing processor is detected before failure. Dynamic
processor sparing does not require purchasing an activation code. Instead, it requires only
that the system have inactive CUoD processor cores available.
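The sparing decision can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the firmware algorithm: the function name, the core labels, and the threshold value are hypothetical.

```python
def spare_failing_cores(active, inactive, error_counts, threshold):
    """Swap in an inactive core for each active core at the threshold.

    active and inactive are lists of core labels and are updated in place.
    Returns a list of (failing_core, spare_core) pairs.
    """
    swaps = []
    for core in list(active):  # iterate over a copy; active is modified
        if error_counts.get(core, 0) >= threshold and inactive:
            spare = inactive.pop(0)   # take an inactive CUoD core
            active.remove(core)       # deallocate the failing core
            active.append(spare)      # activate the spare in its place
            swaps.append((core, spare))
    return swaps
```

If no inactive cores remain, the sketch leaves the failing core in place, which mirrors the requirement that inactive CUoD processor cores must be available for sparing to occur.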
By working in a redundant architecture, some tasks that require a specific application to
be brought offline can now be done with the application running, which allows for greater
availability.
When determining a highly available architecture that fits your needs, consider the following
questions:
Will you need to move your workloads off an entire server during service or planned
outages?
If you use a clustering solution to move the workloads, how will the failover time affect your
If you use a server evacuation solution to move the workloads, how long does it take to
migrate all the partitions with your current server configuration?
4.7.1 Clustering
Power Systems servers that are running under PowerVM, AIX, and Linux support many
clustering solutions. These solutions meet requirements for application availability regarding
server outages and data center disaster management, reliable data backups, and so on.
These offerings include distributed applications with IBM Db2® pureScale, HA solutions that
use clustering technology with IBM PowerHA® SystemMirror®, and disaster management
across geographies with PowerHA SystemMirror Enterprise Edition.
The use of redundant VIOSes mitigates this risk. Maintaining the redundancy of
adapters within each VIOS (in addition to having redundant VIOSes) avoids most faults that
keep a VIOS from running. Therefore, multiple paths to networks and SANs are advised.
A partition that is accessing data from two distinct VIOSes, each one with multiple network
and SAN adapters to provide connectivity, is shown in Figure 4-4.
[Figure 4-4: diagram of a logical partition with SAN adapters and LAN adapters]
Because each VIOS can be considered an AIX based partition, each VIOS also must access
a boot image, have paging space, and so on, under a root volume group or rootvg. The rootvg
can be accessed through a SAN, the same as the data that partitions use.
Alternatively, a VIOS can use the U.2 NVMe internal storage option that is available for
Power E980 systems. One system node of a Power E980 server supports four U.2 NVMe
devices, which can be individually assigned to independent partitions. To use storage that is
locally attached (DASD devices or SSDs), SAS adapters offer another option to provide boot
devices to VIOS partitions. However they are accessed, the rootvgs should use mirrored or
RAID drives with redundant access to the devices for best availability.
LPM provides systems management flexibility and improves system availability through the
following functions:
Avoid planned outages for hardware or firmware maintenance by moving LPARs to
another server and then performing the maintenance. LPM can help lead to zero
downtime for maintenance because you can use it to work around scheduled maintenance
Avoid downtime for a server upgrade by moving LPARs to another server and then
performing the upgrade. This approach allows your users to continue their work without
Avoid unplanned downtime. With preventive failure management, you can move a server’s
LPARs to another server before the failure occurs if a server indicates a potential failure.
Partition mobility can help avoid unplanned downtime.
Take advantage of server optimization:
– Consolidation: You can consolidate workloads that run on several small, underused
servers onto a single large server.
– Deconsolidation: You can move workloads from server-to-server to optimize resource
use and workload performance within your computing environment. With LPM, you can
manage workloads with minimal downtime.
Server Evacuation: This PowerVM function allows you to perform a server evacuation
operation. Server Evacuation is used to move all migration-capable LPARs from one
system to another if there are no active migrations in progress on the source or the target
With the Server Evacuation feature, multiple migrations can occur based on the concurrency
setting of the HMC. Migrations are performed as sets, with the next set of migrations starting
when the previous set completes. Any upgrade or maintenance operations can be performed
after all the partitions are migrated and the source system is powered off.
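The set-by-set behavior of Server Evacuation can be sketched in Python as follows. The function names and the migrate callback are hypothetical, and the inner loop runs sequentially only to keep the sketch short. It is not the HMC implementation.

```python
def migration_sets(partitions, concurrency):
    """Split partitions into consecutive sets of at most `concurrency`."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return [partitions[i:i + concurrency]
            for i in range(0, len(partitions), concurrency)]


def evacuate(partitions, concurrency, migrate):
    """Finish every migration in a set before the next set starts."""
    for batch in migration_sets(partitions, concurrency):
        for lpar in batch:  # a real evacuation runs a set concurrently
            migrate(lpar)
```

For example, five partitions with a concurrency setting of 2 are migrated as three sets that contain two, two, and one partition.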
You can migrate all the migration-capable AIX, IBM i, and Linux partitions from the source
server to the destination server by running the following command from the HMC command
line:
migrlpar -o m -m source_server -t target_server --all
For more information about LPM and how to implement it, see IBM PowerVM Virtualization
Introduction and Configuration, SG24-7940.
4.8 Serviceability
The purpose of serviceability is to repair the system while attempting to minimize or eliminate
service cost (within budget objectives) and maintain application availability and high customer
satisfaction. Serviceability includes system installation, Miscellaneous Equipment
Specification (MES) (system upgrades or fallbacks), and system maintenance or repair.
Depending on the system and warranty contract, service might be performed by the
customer, an IBM System Services Representative (SSR), or an authorized warranty service
provider.
The serviceability features that are delivered in this system provide a highly efficient service
environment by incorporating the following attributes:
Design for SSR Set Up and Customer Installed Features (CIFs).
The Guiding Light service indicator architecture is used to control a system of integrated
LEDs that lead the individual who services the machine to the correct part as quickly as
possible.
Service labels, service cards, and service diagrams are available on the system and
delivered through the HMC.
Step-by-step service procedures are available through the HMC.
This section provides an overview of how these attributes contribute to efficient service in the
progressive steps of error detection, analysis, reporting, notification, and repair found in all
POWER processor-based systems.
Although not all errors are a threat to system availability, those errors that go undetected can
cause problems because the system has no opportunity to evaluate and act if necessary.
POWER processor-based systems employ IBM Z® server-inspired error detection
mechanisms, which extend from processor cores and memory to power supplies and hard
disk drives (HDDs).
4.8.2 Error checkers, fault isolation registers, and first failure data capture
POWER processor-based systems contain specialized hardware detection circuitry that is
used to detect erroneous hardware operations. Error-checking hardware ranges from parity
error detection that is coupled with Processor Instruction Retry and bus retry, to ECC
correction on caches and system buses.
Within the processor/memory subsystem error checker, error-checker signals are captured
and stored in hardware FIRs. The associated logic circuitry is used to limit the domain of an
error to the first checker that encounters the error. In this way, runtime error diagnostic tests
can be deterministic so that for every check station, the unique error domain for that checker
is defined and mapped to field-replaceable units (FRUs) that can be repaired when
necessary.
Integral to the Power Systems design is the concept of FFDC. FFDC is a technique that
involves sufficient error-checking stations and coordination of faults so that faults are
detected and the root cause of the fault is isolated. FFDC also expects that necessary fault
information can be collected at the time of failure without needing to re-create the problem or
run an extended tracing or diagnostics program.
For many faults, a good FFDC design means that the root cause is isolated at the time of the
failure without intervention by an IBM SSR. For all faults, good FFDC design still makes failure
information available to the IBM SSR. This information can be used to confirm the automatic
diagnosis. More detailed information can be collected by an IBM SSR for rare cases where
the automatic diagnosis is not adequate for fault isolation.
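The idea of limiting the error domain to the first checker that encounters the error can be illustrated with a minimal Python sketch. The class, the checker names, and the checker-to-FRU mapping are hypothetical and serve only to show the principle. Real FIRs are hardware registers.

```python
class FaultIsolationRegister:
    """Toy FIR that latches only the first checker to report an error."""

    def __init__(self, checker_to_fru):
        self.checker_to_fru = checker_to_fru  # hypothetical mapping
        self.first_checker = None
        self.later_checkers = []

    def report(self, checker):
        if self.first_checker is None:
            self.first_checker = checker         # first failure is captured
        else:
            self.later_checkers.append(checker)  # likely side effects

    def failing_fru(self):
        """Return the FRU mapped to the first checker, or None."""
        if self.first_checker is None:
            return None
        return self.checker_to_fru[self.first_checker]
```

Because the first checker is latched at the time of failure, the diagnosis in this sketch does not depend on re-creating the problem, and later reports are kept only as supporting data.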
– The server input voltages are out of operational specification. The SP can shut down a
system in the following circumstances:
• The temperature exceeds the critical level or remains above the warning level for
too long.
• Internal component temperatures reach critical levels.
• Non-redundant fan failures occur.
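The shutdown circumstances in the list above can be expressed as a small decision function. This is a sketch only: the function name, the parameters, and the notion of counting consecutive samples above the warning level are assumptions, and actual thresholds are defined by the platform firmware.

```python
def should_shut_down(temps, warning, critical, max_warning_samples,
                     failed_fans, redundant_fans):
    """Decide whether the monitored conditions call for a shutdown.

    temps is a chronological list of temperature samples.
    """
    # Any sample above the critical level calls for a shutdown.
    if any(t > critical for t in temps):
        return True
    # Count the current run of consecutive samples above the warning level.
    run = 0
    for t in reversed(temps):
        if t <= warning:
            break
        run += 1
    if run > max_warning_samples:
        return True
    # Fan failures beyond the redundant spares are non-redundant failures.
    return failed_fans > redundant_fans
```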
PowerVM Hypervisor (system firmware) and HMC connection surveillance
The SP monitors the operation of the firmware during the boot process and monitors the
hypervisor for termination. The hypervisor monitors the SP and can perform a reset and
reload if it detects the loss of the SP. If the reset/reload operation does not correct the
problem with the SP, the hypervisor notifies the OS, which can then take appropriate
action, including calling for service. The FSP also monitors the connection to the HMC and
can report loss of connectivity to the OS partitions for system administrator notification.
Uncorrectable error recovery
When enabled, the auto-restart (restart) option can restart the system automatically
following an unrecoverable firmware error, firmware hang, hardware failure, or
environmentally induced (power) failure.
The auto-restart (restart) option must be enabled from the ASMI.
Concurrent access to the SP menus of the ASMI
This access allows nondisruptive abilities to change system default parameters,
interrogate SP progress and error logs, set and reset service indicators (Guiding Light for
enterprise servers), and access all SP functions without powering down the system to the
standby state.
The administrator or IBM SSR can dynamically access the menus from any web
browser-enabled console that is attached to the Ethernet service network concurrently
with normal system operation. Some options, such as changing the hypervisor type, do
not take effect until the next restart.
Managing the interfaces for connecting uninterruptible power source systems to the
POWER processor-based systems and performing timed power-on (TPO) sequences.
4.8.4 Diagnosing
General diagnostic objectives are created to detect and identify problems so that they can be
resolved quickly. The IBM diagnostic strategy includes the following elements:
Provide a common error code format that is equivalent to a System Reference Code
(SRC), system reference number, checkpoint, or firmware error code.
Provide fault detection and problem isolation procedures.
Support a remote connection ability that is used by the IBM Remote Support Center or
IBM Designated Service.
Provide interactive intelligence within the diagnostic tests with detailed online failure
information while connected to an IBM back-end system.
By using the extensive network of advanced and complementary error detection logic that is
built directly into hardware, firmware, and OSs, the Power Systems servers can perform
considerable self-diagnosis.
Because of the FFDC technology that is designed into IBM servers, re-creating diagnostic
tests for failures or requiring user intervention is unnecessary. Solid and intermittent errors
are correctly detected and isolated at the time that the failure occurs. Runtime and boot time
diagnostic tests fall into this category.
Boot time
When a Power Systems server starts, the SP initializes the system hardware. Boot time
diagnostic testing uses a multitier approach for system validation, starting with managed
low-level diagnostic tests that are supplemented with system firmware initialization and
configuration of I/O hardware, followed by OS-initiated software test routines.
To minimize boot time, the system determines which of the diagnostic tests are required to be
started to ensure correct operation. This determination is based on the way that the system was
powered off or on the boot-time selection menu.
With this HB initialization, new progress codes are available. An example of an FSP progress
code is C1009003. During the HB IPL, progress codes, such as CC009344, appear.
If there is a failure during the HB process, a new HB system memory dump is collected and
stored. This type of memory dump includes HB memory and is offloaded to the HMC when it
is available.
Extensive diagnostic and fault analysis routines were developed and improved over many
generations of POWER processor-based servers. These routines enable quick and accurate
predefined responses to actual and potential system problems. The PRD code running in the
special service partition correlates and processes runtime error information by using logic
that is derived from IBM engineering expertise to count recoverable errors (called
thresholding) and predict when corrective actions must be automatically initiated by the
system. These actions can include the following items:
Requests for a part to be replaced
Dynamic invocation of built-in redundancy for automatic replacement of a failing part
Dynamic deallocation of failing components so that system availability is maintained
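Thresholding, as described above, amounts to counting recoverable errors per component and acting when a count reaches a limit. The following Python sketch shows that pattern. The class name, the component label, and the returned action string are hypothetical, and the PRD code uses its own engineering-derived thresholds.

```python
from collections import Counter


class Thresholder:
    """Count recoverable errors per component and flag threshold crossings."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()
        self.flagged = set()

    def record(self, component):
        """Record one recoverable error; return an action or None."""
        self.counts[component] += 1
        if (self.counts[component] >= self.threshold
                and component not in self.flagged):
            self.flagged.add(component)  # act only once per component
            return f"request replacement of {component}"
        return None
```

In this sketch, the action is raised exactly once, when the count first reaches the threshold, so repeated recoverable errors on the same component do not generate repeated requests.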
Device drivers
In certain cases, diagnostic tests are best performed by OS-specific drivers, most notably
adapters or I/O devices that are owned directly by an LPAR. In these cases, the OS device
driver often works with I/O device microcode to isolate and recover from problems. Potential
problems are reported to an OS device driver, which logs the error.
In non-HMC managed servers, the OS can start the Call Home application to report the
service event to IBM. For optional HMC-managed servers, the event is reported to the HMC,
which can start the Call Home request to IBM. I/O devices can also include specific exercisers
that can be started by the diagnostic facilities for problem recreation (if required by service
procedures).
4.8.5 Reporting
If a system hardware or environmentally induced failure is detected, Power Systems servers
report the error through various mechanisms. The analysis result is stored in system NVRAM.
You can use error log analysis (ELA) to display the failure cause and the physical location of
the failing hardware.
Using the Call Home infrastructure, the system can automatically send an alert through a
phone line to a pager, or call for service if there is a critical system failure. A hardware fault
also illuminates the amber system fault LED that is on the system node to alert the user of an
internal hardware problem.
On POWER9 processor-based servers, hardware and software failures are recorded in the
system log. When a management console is attached, an ELA routine analyzes the error,
forwards the event to the Service Focal Point (SFP) application that is running on the
management console, and notifies the system administrator that it isolated a likely cause of
the system problem. The SP event log also records unrecoverable checkstop conditions,
forwards them to the SFP application, and notifies the system administrator.
After the information is logged in the SFP application, a Call Home service request is started
and the pertinent failure data with service parts information and part locations is sent to the
IBM service organization if the system is correctly configured. This information also contains
the client contact information that is defined in the IBM Electronic Service Agent (ESA) guided
setup wizard. In HMC V8R8.1.0, a Serviceable Event Manager is available to block problems
from being automatically transferred to IBM. For more information, see “Service Event
Manager” on page 170.
Data that contains information about the effect that the repair has on the system is also
included. Error log routines in the OS and FSP can then use this information and decide
whether the fault is a Call Home candidate. If the fault requires support intervention, a call is
placed with service and support. A notification is sent to the contact that is defined in the
ESA-guided setup wizard.
Remote support
The Resource Monitoring and Control (RMC) subsystem is delivered as part of the base OS,
which includes the OS that runs on the HMC. RMC provides a secure transport mechanism
across the local area network (LAN) interface between the OS and the optional HMC and is
used by the OS diagnostic application for transmitting error information. It performs several
other functions, but those functions are not used for the service infrastructure.
When a local or globally reported service request is made to the OS, the OS diagnostic
subsystem uses the RMC subsystem to relay error information to the optional HMC. For
global events (platform unrecoverable errors, for example), the SP also forwards error
notification of these events to the HMC, which provides a redundant error-reporting path in
case the errors are in the RMC subsystem network.
The first occurrence of each failure type is recorded in the Manage Serviceable Events task
on the management console. This task then filters and maintains a history of duplicate
reports from other LPARs or from the SP. It then looks at all active service event requests
within a predefined timespan, analyzes the failure to ascertain the root cause and, if enabled,
starts a Call Home for service. This methodology ensures that all platform errors are reported
through at least one functional path, which results in a single notification for a single problem.
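The filtering of duplicate reports can be sketched as follows. The function name, the report tuple layout, and the fixed time window are assumptions for illustration. The Manage Serviceable Events task applies its own rules.

```python
def filter_duplicates(reports, window):
    """Keep the first report of each failure type within a time window.

    reports: time-ordered iterable of (timestamp, source, failure_type).
    Returns (first_occurrences, duplicates).
    """
    last_opened = {}
    first, duplicates = [], []
    for timestamp, source, failure_type in reports:
        opened = last_opened.get(failure_type)
        if opened is not None and timestamp - opened <= window:
            # Same failure type reported again inside the window.
            duplicates.append((timestamp, source, failure_type))
        else:
            last_opened[failure_type] = timestamp
            first.append((timestamp, source, failure_type))
    return first, duplicates
```

With this approach, the same fault that is reported by two LPARs and by the SP produces one serviceable event and two history entries, which corresponds to a single notification for a single problem.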
Similar service functions are provided through the SFP application on the IVM for providing
service functions and interfaces on non-HMC partitioned servers.
The data is formatted and prepared for transmission back to IBM to assist the service support
organization with preparing a service action plan for the IBM SSR or for more analysis.
If more information that relates to the memory dump is required, or if viewing the memory
dump remotely becomes necessary, the management console memory dump record notifies
the IBM Support center about which management console holds the memory dump. If no
management console is present, the memory dump might be on the FSP or in the OS,
depending on the type of memory dump that was started and whether the OS is operational.
4.8.6 Notifying
After a Power Systems server detects, diagnoses, and reports an error to an appropriate
aggregation point, it notifies the client and, if necessary, the IBM Support organization.
Depending on the assessed severity of the error and support agreement, this client
notification might range from a simple notification to having field service personnel
automatically dispatched to the client site with the replacement part.
Client Notify
When an event is important enough to report but does not indicate the need for a repair action
or to call home to IBM Support, it is classified as Client Notify. Clients are notified because
these events might be of interest to an administrator. The event might be a symptom of an
expected systemic change, such as a network reconfiguration or failover testing of redundant
power or cooling systems, including the following examples:
Network events, such as the loss of contact over a LAN
Environmental events, such as ambient temperature warnings
Events that need further examination by the client (although these events do not
necessarily require a part replacement or repair action)
Client Notify events are serviceable events because they indicate that something happened
that requires client awareness if the client wants to take further action. These events can be
reported to IBM at the discretion of the client.
Call Home
Call Home refers to an automatic or manual call from a customer location to an IBM Support
structure with error log data, server status, or other service-related information. The Call
Home feature engages the service organization so that the appropriate service action can begin.
Call Home can be done through HMC or most non-HMC managed systems.
Although configuring a Call Home function is optional, clients are encouraged to implement
this feature to obtain service enhancements, such as reduced problem determination and
faster and potentially more accurate transmission of error information. The use of the Call
Home feature can result in increased system availability. The ESA application can be
configured for automated Call Home. For more information, see 4.9.4, “Electronic Services
and Electronic Service Agent” on page 169.
Guiding Light
High-end systems are usually repaired by IBM Support personnel. The enclosure and system
identify LEDs turn on solid and can be used to follow the path from the system to the
enclosure and down to the specific FRU.
Guiding Light uses a series of flashing LEDs, allowing a service provider to quickly and easily
identify the location of system components. Guiding Light can also handle multiple error
conditions simultaneously, which might be necessary in some complex high-end
configurations.
In these situations, Guiding Light waits for the servicer’s indication of which failure to attend to
first and then illuminates the LEDs to the failing component.
Data centers can be complex places, and Guiding Light is designed to do more than identify
visible components. When a component might be hidden from view, Guiding Light can flash a
sequence of LEDs that extends to the frame exterior, clearly guiding the service
representative to the correct rack, system, enclosure, drawer, and component.
Service labels
Service providers use these labels to assist with maintenance actions. Service labels are in
various formats and positions and are intended to transmit readily available information to the
IBM SSR during the repair process.
Operator panel
The operator panel of the Power E980 is in the SCU and is composed of a base unit and a
separate LCD unit, which are individually concurrently maintainable. The operator panel is
used to present boot progress codes, which indicate advancement through the system
power-on and initialization processes. The operator panel also is used to display error and
location codes when an error occurs that prevents the system from booting. It includes
several buttons, which enable an IBM SSR or client to change various boot-time options and
perform other limited service functions.
The LCD operator panel features two rows of 16 characters and increment, decrement, and
Enter buttons.
Concurrent maintenance
The IBM POWER9 processor-based systems are designed with the understanding that
certain components have higher intrinsic failure rates than others. These components can
include fans, power supplies, and physical storage devices. Other devices, such as I/O
adapters, can wear from repeated plugging and unplugging. For these reasons, these devices
are concurrently maintainable when properly configured. Concurrent maintenance is
facilitated by the redundant design for the power supplies, fans, and physical storage.
In addition to these components, the operator panel can be replaced concurrently by using
the service functions of the ASMI menu.
R&V procedures can be used by user engineers and IBM SSR providers who are familiar with
the task and those engineers and providers who are not. Education-on-demand content is
placed in the procedure at the appropriate locations. Throughout the R&V procedure, repair
history is collected and provided to the Service and Support Problem Management Database
for storage with the serviceable event to ensure that the guided maintenance procedures are
operating correctly.
Clients can subscribe through the subscription services on the IBM Support Portal to obtain
notifications about updates that are available for service-related documentation.
4.9 Manageability
Several functions and tools help manageability so you can efficiently and effectively manage
your system.
The following primary service interfaces are used, depending on the state of the system and
its operating environment:
Guiding Light (see “Guiding Light” on page 158 and “Service labels” on page 159)
Operator panel
OS service menu
SFP on the HMC
Service processor
The SP is a controller that is running its own OS. It is a component of the service interface
card. The SP OS includes specific programs and device drivers for the SP hardware. The
host interface is a processor support interface that is connected to the POWER processor.
The SP is used to monitor and manage the system hardware resources and devices. The SP
checks the system for errors, maintains the connection to the management console for
manageability purposes, and accepts ASMI Secure Sockets Layer (SSL) network
connections. The SP provides the ability to view and manage the machine-wide settings by using the ASMI. It
also enables complete system and partition management from the HMC.
Analyzing a system that does not boot: The FSP can analyze a system that does not
boot. Reference codes and detailed data are available in the ASMI and are transferred to
the HMC.
The SP uses two Ethernet ports that run at 1 Gbps speed. Consider the following points:
Both Ethernet ports are visible only to the SP and can be used to attach the server to an
HMC or to access the ASMI. The ASMI options can be accessed through an HTTP server
that is integrated into the SP operating environment.
Both Ethernet ports support only auto-negotiation. Customer-selectable media speed and
duplex settings are not available.
The Ethernet ports have the following default IP addresses:
– SP eth0 (HMC1 port) is configured as
– SP eth1 (HMC2 port) is configured as
The ASMI is accessible through the management console. It is also accessible by using a
web browser on a system that is connected directly to the SP (in this case, by using either a
standard Ethernet cable or a crossover cable) or through an Ethernet network. ASMI can also be
accessed from an ASCII terminal, but this option is available only while the system is in the
platform powered-off mode.
Use the ASMI to change the SP IP addresses or to apply certain security policies and prevent
access from unwanted IP addresses or ranges.
You might use the SP’s default settings to operate your server. If the default settings are used,
accessing the ASMI is not necessary. To access ASMI, use one of the following methods:
Management console
If configured to do so, the management console connects directly to the ASMI for a
selected system from this task.
To connect to the ASMI from a management console, complete the following steps:
a. Open Systems Management from the navigation pane.
b. From the work window, select one of the managed systems.
c. From the System Management tasks list, click Operations → Launch Advanced
System Management (ASM).
Web browser
At the time of writing, supported web browsers are Netscape, Microsoft Internet
Explorer 7.0, Opera 9.24, and Mozilla Firefox. Later versions of these browsers
might work, but are not officially supported. The JavaScript language and cookies must be
enabled and TLS 1.2 might need to be enabled.
The web interface is available during all phases of system operation, including the IPL and
run time. However, several of the menu options in the web interface are unavailable during
IPL or run time to prevent usage or ownership conflicts if the system resources are in use
during that phase. The ASMI provides an SSL web connection to the SP. To establish an
SSL connection, open your browser by using the following address:
Note: To make the connection through Microsoft Internet Explorer, click Tools → Internet
Options. Clear the Use TLS 1.0 option, and click OK.
ASCII terminal
The ASMI on an ASCII terminal supports a subset of the functions that are provided by the
web interface and is available only when the system is in the platform powered-off mode.
The ASMI on an ASCII console is not available during several phases of system operation,
such as the IPL and run time.
Command-line start of the ASMI
On the HMC or when properly configured on a remote system, the ASMI web interface
can be started from the HMC command line. Open a window on the HMC or access the
HMC with a terminal emulation and run the following command:
asmmenu --ip <ip address>
On the HMC, a browser window opens automatically with the ASMI window and, when
configured properly, a browser window opens on a remote system when issued from
there.
Operator panel
The SP provides an interface to the operator panel, which is used to display system status
and diagnostic information. The operator panel can be accessed in the following ways:
By using the normal operational front view
By pulling it out to access the switches and viewing the LCD display
When installed, online diagnostic tests are a part of the AIX or IBM i on the disk or
server. They can be started in single-user mode (service mode), run in maintenance mode, or
run concurrently (concurrent mode) with other applications. They can access the AIX error log
and the AIX configuration data. IBM i has a service tools problem log, IBM i history log
(QHST), and IBM i problem log.
The System Management Services (SMS) error log is accessible from the SMS menus. This
error log contains errors that are found by partition firmware when the system or partition is
You can also access the system diagnostics from a Network Installation Management (NIM)
Alternative method: When you order a Power Systems server, a DVD-ROM or DVD-RAM
might be an option. An alternative method for maintaining and servicing the system must
be available if you do not order the DVD-ROM or DVD-RAM.
IBM i and its associated machine code provide dedicated service tools (DSTs) as part of the
IBM i licensed machine code (Licensed Internal Code) and system service tools (SSTs) as
part of IBM i. DSTs can be run in dedicated mode (no OS is loaded). DSTs and diagnostic
tests are a superset of those available under SSTs.
The IBM i End Subsystem (ENDSBS *ALL) command can shut down all IBM and customer
application subsystems except for the controlling subsystem QCTL. The Power Down
System (PWRDWNSYS) command can be set to power down the IBM i partition and restart the
partition in DST mode.
You can start SST during normal operations, which keeps all applications running, by using
the IBM i Start Service Tools (STRSST) command (when signed onto IBM i with a secured user
With dedicated service tools (DST) or system service tools (SST), you can review various
logs, run various diagnostic tests, or take several kinds of system memory dumps or other
Depending on the OS, the following service-level functions are what you often see when you
use the OS service menus:
Product activity log
Trace Licensed Internal Code
Work with communications trace
Licensed Internal Code log
Main storage memory dump manager
Hardware service manager
Call Home/Customer Notification
Error information menu
LED management menu
Concurrent/Non-concurrent maintenance (within scope of the OS)
Managing firmware levels:
– Server
– Adapter
Remote support (access varies by OS)
Each LPAR reports errors that it detects and forwards the event to the SFP application that is
running on the management console without determining whether other LPARs also detect
and report the errors. For example, if one LPAR reports an error for a shared resource, such
as a managed system power supply, other active LPARs might report the same error.
By using the Manage Serviceable Events task in the management console, you can avoid
long lists of repetitive Call Home information by recognizing that these errors are repeated
errors and consolidating them into one error.
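The idea of consolidating repeated reports can be sketched in a few lines of Python. This is a simplified model and not the logic that the management console uses: the record fields, the merge key (resource plus error code), and the sample codes are all assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ReportedEvent:
    lpar: str        # partition that reported the error (hypothetical field)
    resource: str    # failing resource, for example a shared power supply
    error_code: str  # placeholder code, not a real reference code


@dataclass
class ConsolidatedEvent:
    resource: str
    error_code: str
    reporters: List[str] = field(default_factory=list)


def consolidate(events: Iterable[ReportedEvent]) -> List[ConsolidatedEvent]:
    """Merge reports that name the same resource and error code."""
    merged: Dict[Tuple[str, str], ConsolidatedEvent] = {}
    for event in events:
        key = (event.resource, event.error_code)
        entry = merged.setdefault(
            key, ConsolidatedEvent(event.resource, event.error_code)
        )
        if event.lpar not in entry.reporters:
            entry.reporters.append(event.lpar)
    return list(merged.values())


if __name__ == "__main__":
    reports = [
        ReportedEvent("lpar1", "power-supply-1", "EXAMPLE-0001"),
        ReportedEvent("lpar2", "power-supply-1", "EXAMPLE-0001"),
        ReportedEvent("lpar2", "adapter-3", "EXAMPLE-0002"),
    ]
    for entry in consolidate(reports):
        print(entry.resource, entry.error_code, entry.reporters)
```

In this model, the two reports for the shared power supply collapse into one entry that lists both reporting LPARs, and the adapter error stays separate.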
In addition, you can use the Manage Serviceable Events task to start service functions on
systems and LPARs, including the exchanging of parts, configuring connectivity, and
managing memory dumps.
Firmware entitlement
With HMC V8R8.1.0.0, the firmware installations are restricted to entitled servers. The
customer must be registered with IBM and have the appropriate service contract. During the
initial machine warranty period, the access key is installed in the machine by
IBM Manufacturing. The key is valid for the regular warranty period plus some extra time.
The Power Systems Firmware is relocated from the public repository to the access control
repository. The I/O firmware remains on the public repository, but the server must be entitled
for installation. When the lslic command is run to display the firmware levels, a new value,
update_access_key_exp_date, is added. The HMC GUI and the ASMI menu show the Update
access key expiration date.
When the system is no longer entitled, the firmware updates fail. The following new SRC
packages are available:
E302FA06: Acquisition entitlement check failed
E302FA08: Installation entitlement check failed
Any firmware release that was made available during the entitled time frame can still be
installed. For example, if the entitlement period ends on 31 December 2014 and a new
firmware release is available before the end of that entitlement period, it can still be installed.
If that firmware is downloaded after 31 December 2014 but it was made available before the
end of the entitlement period, it can still be installed. Any newer release requires a new
update access key.
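The rule in this example can be written as a single date comparison. The following Python sketch is illustrative only: the function name is hypothetical, and treating a release that becomes available on the expiration date itself as still entitled is an assumption. The download date is deliberately not a parameter because, as described above, it does not affect the outcome.

```python
from datetime import date


def release_is_installable(release_available_on: date, key_expires_on: date) -> bool:
    """Return True if the release became available within the entitled period.

    Assumption: a release published on the expiration date still counts.
    """
    return release_available_on <= key_expires_on


if __name__ == "__main__":
    key_expiration = date(2014, 12, 31)
    # Released during the entitled period: installable, even if downloaded later.
    print(release_is_installable(date(2014, 11, 15), key_expiration))
    # Released after the entitled period: requires a new update access key.
    print(release_is_installable(date(2015, 1, 20), key_expiration))
```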
Note: The update access key expiration date requires a valid entitlement of the system to
perform firmware updates.
You can find an update access key at IBM Capacity on Demand (CoD) Home.
For more information about entitled IBM Software Support, see My Entitled Systems Support.
Firmware updates
System firmware is delivered as a release level or a service pack. Release levels support the
general availability (GA) of new functions or features, and new machine types or models.
Upgrading to a higher release level is disruptive to customer operations. These release levels
are supported by service packs. Service packs are intended to contain only firmware fixes
and not introduce new functions. A service pack is an update to a release level.
The management console is used for system firmware updates. By using the management
console, you can use the Concurrent Firmware Maintenance (CFM) option when concurrent
service packs are available. CFM refers to Power Systems Firmware updates that can be
partially or wholly concurrent or nondisruptive. With the introduction of CFM, IBM is
increasing its clients’ opportunity to stay on a specific release level for longer periods. Clients
that want maximum stability can defer until there is a compelling reason to upgrade, such as
the following reasons:
A release level is approaching its end of service date (that is, it was available for
approximately one year and soon will no longer be supported).
You want to move a system to a more standardized release level when there are multiple
systems in an environment with similar hardware.
A new release has a new function that is needed in the environment.
A scheduled maintenance action causes a platform restart, which also provides an
opportunity to upgrade to a new firmware release.
Updating and upgrading system firmware depends on several factors, including the current
firmware that is installed and what OSs are running on the system. These scenarios and the
associated installation instructions are described in the Firmware section of Fix Central.
You also might want to review the preferred practice white papers that are found at Service
and support best practices for Power Systems.
The firmware and microcode can be downloaded and installed from the HMC or a running
Power Systems servers include a permanent firmware boot side (A side) and a temporary
firmware boot side (B side). New levels of firmware must be installed first on the temporary
side to test the update’s compatibility with applications. When the new level of firmware is
approved, it can be copied to the permanent side.
For access to the initial websites that address this capability, see POWER9 systems
For POWER9 processor-based servers, select POWER9 systems. Then, search for
“Firmware and HMC updates” to find the resources for keeping your system’s firmware
If there is an HMC to manage the server, the HMC interface can be used to view the levels of
server firmware and power subsystem firmware that are installed and that are available to
download and install.
Each Power Systems server has the following levels of server firmware and power subsystem
Installed level
This level of server firmware or power subsystem firmware is installed on the temporary
side of the system firmware. It also is installed into memory after the managed system is
powered off and then powered on.
Activated level
This level of server firmware or power subsystem firmware is active and running in
Accepted level
This level is the backup level of server or power subsystem firmware. You can return to this
level of server or power subsystem firmware if you decide to remove the installed level. It is
installed on the permanent side of system firmware.
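The relationship between the three levels can be modeled as a small state object. The following Python sketch is a simplified illustration and not an HMC interface: the class, its method names, and the level strings are hypothetical, and concurrent activation (without a power cycle) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class FirmwareLevels:
    installed: str  # level on the temporary side
    activated: str  # level that is currently running
    accepted: str   # backup level on the permanent side

    def install(self, level: str) -> None:
        """Place a new level on the temporary side."""
        self.installed = level

    def power_cycle(self) -> None:
        """Power off and on: the installed level is loaded and becomes active."""
        self.activated = self.installed

    def accept(self) -> None:
        """Copy the approved installed level to the permanent side."""
        self.accepted = self.installed

    def remove_installed(self) -> None:
        """Return the temporary side to the accepted (backup) level."""
        self.installed = self.accepted


if __name__ == "__main__":
    fw = FirmwareLevels(installed="level-1", activated="level-1", accepted="level-1")
    fw.install("level-2")
    fw.power_cycle()
    print(fw)  # installed and activated are level-2, accepted is still level-1
    fw.accept()
    print(fw)  # all three levels are now level-2
```

Keeping the accepted level unchanged until accept is called is what makes it possible to fall back if the new level proves incompatible during testing.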
Use the HMC-enhanced GUI to obtain information about the different firmware levels in effect
by selecting Resources → All Systems, selecting the system or the systems of interest,
selecting Actions → Updates → View system information, and selecting None - Display
current values.
IBM provides the CFM function on Power E980 servers. This function supports applying
nondisruptive system firmware service packs to the system concurrently (without requiring a
restart operation to activate changes).
The concurrent levels of system firmware can (on occasion) contain fixes that are known as
deferred. These deferred fixes can be installed concurrently, but are not activated until the
next IPL. Any deferred fixes are identified in the Firmware Update Descriptions table of the
firmware document. For deferred fixes within a service pack, only the fixes in the service pack
that cannot be concurrently activated are deferred.
The file-naming convention for the system firmware is listed in Table 4-3.
PP Package identifier 01 -
For example, here is the naming convention for the current (as of this writing) Power E980
Otherwise, an installation is concurrent if the service pack level (FFF) of the new firmware is
higher than the service pack level that is installed on the system and the conditions for
disruptive installation are not met.
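The rule in the preceding sentence can be expressed as a short check. In the following Python sketch, the function name is hypothetical, and the conditions for a disruptive installation are passed in as a flag by the caller instead of being evaluated. The sketch assumes that the FFF fields were already extracted from the firmware file names as integers.

```python
def is_concurrent_install(
    new_fff: int, installed_fff: int, disruptive_conditions_met: bool
) -> bool:
    """Return True if the installation qualifies as concurrent.

    new_fff and installed_fff are the service pack levels (FFF) of the new
    and the currently installed firmware. disruptive_conditions_met is
    supplied by the caller; this sketch does not evaluate those conditions.
    """
    if disruptive_conditions_met:
        return False
    return new_fff > installed_fff


if __name__ == "__main__":
    # Hypothetical service pack levels, chosen only for the illustration.
    print(is_concurrent_install(50, 40, disruptive_conditions_met=False))
    print(is_concurrent_install(50, 40, disruptive_conditions_met=True))
```

A result of False means only that the installation does not qualify as concurrent under this rule. It does not by itself classify the installation as disruptive.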
With PORE, the firmware can now dynamically power off processor components, change the
registers, and reinitialize while the system is running without discernible effect to any
applications that are running on a processor. This feature potentially allows concurrent
firmware changes in POWER9 processor-based systems, which in earlier designs required a
restart to take effect.
Activating new firmware functions requires installation of a firmware release level. This
process is disruptive to server operations and requires a scheduled outage and full server
The Electronic Services solution consists of the following separate (but complementary)
Electronic Services news page
Electronic Service Agent
Early knowledge about potential problems enables IBM to deliver proactive service that can
result in higher system availability and performance. In addition, information that is collected
through the Service Agent is made available to IBM SSRs when they help answer your
questions or diagnose problems. The installation and use of ESA for problem reporting
enables IBM to provide better support and service for your IBM server.
For more information about how Electronic Services can work for you, see IBM Electronic
Support (an IBM ID is required).
The basic configuration of the SEM can be accomplished by using the HMC Enhanced GUI.
Select Serviceability → Event Manager for Call Home (Figure 4-6) to get access to the
Events Manager for Call Home menu.
Figure 4-6 Service Event Manager configuration through the HMC Enhanced GUI
In the Events Manager for Call Home menu, you can add an HMC that is used to manage the
serviceable events to the list of registered management consoles and proceed with further
configuration steps, as shown in Figure 4-7.
Figure 4-7 Event Manager for Call Home menu of the HMC Enhanced GUI
The following menu options are available when you select an event in the table:
View Details...
Shows the details of this event.
View Files...
Shows the files that are associated with this event.
Approve Call Home
Approves the Call Home of this event. This option is available only if the event is not yet
The Help / Learn more function can be used to get more information about the other available
windows for the Serviceable Event Manager.
I/O subsystem
Memory availability
SUE handling X X X
OS error codes X X X
Inventory collection X X X
EED collection X X X
Redundant HMCs X X X
