The Nutanix Design Guide First Edition PDF
FIRST EDITION
Nutanix Design Guide
Edited by
Angelo Luciani, VCP
René van den Bedem, NPX, VCDX4, DECM-EA
By Nutanix, in collaboration with RoundTower Technologies
Table of Contents
1 Foreword
2 Contributors
3 Acknowledgements
5 Introduction to Nutanix
6 Why Nutanix?
9 Design Methodology & The NPX Program
10 Channel Charter
11 Mission-Critical Applications
12 SAP on Nutanix
13 Hardware Platforms
18 Xi IoT
21 Era
22 Karbon
24 Files
25 Volumes
26 Buckets
27 Prism
29 AHV
30 Move
31 X-Ray
32 Foundation
1 Foreword
I am honored to write this foreword for ‘The Nutanix Design Guide’.
We have always believed that cloud will be more than just a rented
model for enterprises. Computing within the enterprise is nuanced,
as it tries to balance the freedom and friction-free access of the
public cloud with the security and control of the private cloud. The
private cloud itself is spread between core datacenters, remote and
branch offices, and edge operations. The trifecta of the 3 laws – (a)
Laws of the Land (data and application sovereignty), (b) Laws of
Physics (data and machine gravity), and (c) Laws of Economics
(owning vs. renting in long term) – is forcing the enterprise to be
deliberate about its cloud journey. At Nutanix, we firmly believe that
the private and the public cloud must mimic each other when it
comes to ease-of-use and friction-free operations.
2 Contributors
Angelo Luciani is the Nutanix Technology Champion Community
Manager at Nutanix. He is VCP certified. He blogs at virtuwise.com
(voted Top 50 vBlog 2018 at vsphere-land.com) and can be
followed on Twitter at @AngeloLuciani.
3 Acknowledgements
The following people reviewed and provided feedback for this book:
5 Introduction to Nutanix
Author: Angelo Luciani
FIGURE 1
Nutanix Customer Journey
[Figure: tiers labeled Nutanix Core, Nutanix Essentials and Nutanix Enterprise; products shown: Prism Pro, Files, Era, Calm, Flow, Volumes, Buckets, Xi Leap, Xi Frame, Xi Beam, Xi Epoch.]
In the last 3 years, we have proved them wrong with the deep
inroads that AHV has made with a large swathe of workloads in
the enterprise. Industry watchers gave us no chance to shift from
an appliance business model to a pure software business model
as a public company.
5.2 References
What We Do:
https://www.nutanix.com/what-we-do/
Hardware Platforms:
https://www.nutanix.com/products/hardware-platforms/
Software Options:
https://www.nutanix.com/products/software-options/
6 Why Nutanix?
Author: Steve Kaplan
FIGURE 2
Excess Capacity Depreciation Required for a SAN
[Chart: Depreciation Cost per Consumed VM. Y-axis: Depreciation Cost, $0 to $300. X-axis: Quarter, 1 to 20.]
TABLE 1
Capital Expenses
Compute Layer (Blades, Rackmount Servers) vs. Nutanix   $1,740,000   $6,240,000   -$4,500,000
Data Storage Services   $9,346,920   $0   $9,346,920
Storage Area Network Total Services   $343,392   $0   $343,392
SAN Ports & Cables   $40,768   $0   $40,768
Server Virtualization Software/Hypervisor   $1,792,000   $1,064,000   $728,000
Capitalized Professional Services/Installation   $676,516   $96,000   $580,516
Total Capital Expense   $13,939,595   $7,400,000   $6,539,595
Operating Expense
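As a quick sanity check on the capital expense rows in Table 1, the following minimal sketch recomputes the third column and the totals. It assumes that the first two dollar columns are the two solutions being compared and that the third column is the first minus the second; the column headers did not survive in this copy, so the names `first` and `second` are placeholders rather than the table's own labels.

```python
# Capital expense rows from Table 1: (label, first cost column, second cost column).
# Assumption: the table's third column is "first minus second".
ROWS = [
    ("Compute Layer", 1_740_000, 6_240_000),
    ("Data Storage Services", 9_346_920, 0),
    ("Storage Area Network Total Services", 343_392, 0),
    ("SAN Ports & Cables", 40_768, 0),
    ("Server Virtualization Software/Hypervisor", 1_792_000, 1_064_000),
    ("Capitalized Professional Services/Installation", 676_516, 96_000),
]


def differences(rows):
    """Per-row difference between the two cost columns."""
    return {label: first - second for label, first, second in rows}


def totals(rows):
    """Column totals and their difference."""
    total_first = sum(first for _, first, _ in rows)
    total_second = sum(second for _, _, second in rows)
    return total_first, total_second, total_first - total_second


if __name__ == "__main__":
    for label, diff in differences(ROWS).items():
        print(f"{label}: {diff:,}")
    print("Totals:", totals(ROWS))
```

The recomputed first-column total is one dollar higher than the figure printed in the table, which is presumably a rounding effect in the underlying data.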
6.5 Quantifying Virtualization Savings
Moore’s Law, which states that the number of transistors on a
processor doubles every 18 months, has long powered the IT
industry. Laptops, the Internet, virtualization, smart phones, cloud
computing and hyperconverged infrastructure (HCI) are examples of
technologies enabled by ever faster CPUs. There is no end in sight
for the continued performance benefits of Moore’s Law, even though
the ways in which that performance is achieved, such as using more
cores, photonics and memristors, differ from the original precepts.
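To make the doubling claim concrete, here is a minimal sketch of the growth it implies. The 18-month doubling period is the figure quoted in the text; the function name is illustrative only.

```python
def growth_factor(months, doubling_period_months=18):
    """Multiplier on transistor count after `months`, given a fixed doubling period."""
    return 2 ** (months / doubling_period_months)


if __name__ == "__main__":
    for years in (3, 5, 10):
        print(f"{years} years: x{growth_factor(years * 12):.1f}")
```

Under this assumption, three years corresponds to two doublings, or a fourfold increase.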
TABLE 2
6.5.1.1 Security
Each VMware product, and each version of said product,
requires a separate hardening guide. The vSphere 6.7 Update
1 hardening guide alone includes 50 tasks, and these are not
trivial tasks. The hardening guides additionally do not operate in
isolation. Changes in hardening one product line can adversely
affect another.
6.5.1.2 Micro-segmentation
An IDC study, Private vs. Public Cloud, for example, says that
predictable workloads (which typically account for many
applications) on average cost more than twice as much in the public
cloud as when running on-premises with Nutanix HCI. A July 2018 IDC
survey of 400 organizations, Cloud Repatriation Accelerates in a
Multi-Cloud World, found that 80% of organizations in the study
had repatriated at least some applications out of public cloud back
on-premises, and that 50% of all public cloud applications installed
today will move back on-premises over the next two years.
If you are going to use a car a few weeks out of the year, it would
be silly to purchase a vehicle as it would be far more expensive. If,
however, you are going to use the car most of the time, it is far less
expensive to purchase it rather than rent it year-round. The same
type of logic applies to a public cloud. Elastic, burstable workloads
make all kinds of economic sense to run in the public cloud. But
customers can typically run predictable and persistent workloads
at a much lower cost on-premises with Nutanix.
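The rent-versus-own reasoning can be sketched as a simple break-even calculation. The dollar figures in the usage note are invented for illustration and do not come from the text; only the logic (renting wins at low utilization, owning wins at high utilization) is taken from the analogy.

```python
def break_even_weeks(own_cost_per_year, rent_cost_per_week):
    """Weeks of use per year at which renting costs the same as owning."""
    return own_cost_per_year / rent_cost_per_week


def cheaper_option(weeks_used, own_cost_per_year, rent_cost_per_week):
    """Return 'rent' or 'own' for a given number of weeks of use per year."""
    rent_total = weeks_used * rent_cost_per_week
    return "rent" if rent_total < own_cost_per_year else "own"


if __name__ == "__main__":
    # Hypothetical numbers: owning costs 6,000 per year, renting 400 per week.
    print(break_even_weeks(6000, 400))
    print(cheaper_option(3, 6000, 400))
    print(cheaper_option(50, 6000, 400))
```

With these hypothetical numbers, a few weeks of use favors renting, while near-continuous use favors owning, which mirrors the elastic versus persistent workload distinction.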
really any way that you can be less expensive than AWS. AWS is
really cheap.”
Tim responded later in the week with a TCO analysis. The summary
slide is shown in the table below. “Simon”, he said, “We have
mapped all the AWS costs and, in fact, about 58 of them were
TABLE 3
Capital Expenses
Compute Layer   $0   $271,173
SAN Ports & Cables   $0   $480
Capitalized Professional Services/Installation   $0   $4,800
Sub-Total Capital Expense   $0   $276,453
Operating Expense
For further reading, see Steve Kaplan’s book – The ROI Story:
A Guide for IT leaders, available now from Amazon.
6.8 References
7 The Nutanix Eco-System
Author: René van den Bedem
FIGURE 3
Nutanix Enterprise Cloud Eco-System
[Figure labels: Xi Frame, Xi IoT, Xi Epoch, Xi Beam, Xi Leap; Community Edition (CE), Sizer, Move, X-Ray; Era, Calm, Flow, Data Protection; Foundation, Phoenix; Nutanix NX Series, Dell XC Series, Lenovo HX Series, IBM CS Series (AIX & Power Linux).]
7.1 References
Nutanix Core:
https://www.nutanix.com/products/core/
Nutanix Essentials:
https://www.nutanix.com/products/essentials/
Nutanix Enterprise:
https://www.nutanix.com/products/enterprise/
8 Certification & Training
Author: René van den Bedem
Nutanix has four certification and learning tracks, with the Nutanix
Platform Expert being the premier level of certification (refer
to the following chapter for additional detail). The tracks are Sales,
Systems Engineer, Services and Technical.
FIGURE 4
Nutanix Certification by Role
[Figure: only the label NCSX survives.]
The NCP, NCAP, NCPI, NCS and NCSE exams are proctored
certifications.
8.1 References
Nutanix Virtual Technology Bootcamp:
https://www.nutanix.com/bootcamp/virtual/
Nutanix NuSchool:
https://nuschool.nutanix.com
9 Design Methodology & The NPX Program
Authors: René van den Bedem & Mark Brunstad
“Complex is competent.
Simple is Genius.”
– Binny Gill, Nutanix
9.1 References
NPX Link-O-Rama:
https://vcdx133.com/2015/03/06/nutanix-platform-link-o-rama/
10 Channel Charter
Author: René van den Bedem
There are levels to this game, and if you have a strategic project
that you want done right, it makes sense to align yourself with
the correct Nutanix partner.
The table below lists the criteria for each Partner Level in developed
Zone 1 countries.
TABLE 4
Closed Deals 2 9 30
Transformational Deals 0 1 6
NCSR L1-L3 2 4 5
NCSX 0 1 2
NCP 1 2 4
NCSE L1-L2 1 2 4
NPX 0 0 Optional
NCPI or NCS 0 1 2
10.1 References
11 Mission-Critical Applications
Author: Michael Webster
The main driving factor for this is how the Nutanix architecture reduces
risk and improves predictability and performance consistency from
day 1, during growth, and when disasters strike. The figure below
displays the workload use-case proportion from Q1 FY2019, which
was included in the earnings infographic (see references for full
infographic).
FIGURE 5
Nutanix Enterprise Cloud Use-Case Distribution Q1 FY2019
11.1 Use-Cases
• Scale up versus scale out. Does the application only scale up,
or can you scale it out and add multiple components to balance
the load across a data center or multiple data centers? In a
virtualized environment, having more, smaller VMs can achieve
higher performance and better load distribution than fewer,
larger VMs. In many cases, better-than-bare-metal performance
can be achieved with an optimized VM design due to more
efficient processor scheduling.
FIGURE 6
App Architecture for Mission-Critical System with Automated Failover
[Figure labels: Primary Data Center, Secondary Data Center, Metro Ethernet; DR Automation Cluster (Power-On/Power-Off/Switchover/Failover/Failback Scripts); Application/Message/DB Replication; App1, App2, App3, App4; GSLB; DNS; DNS Updates.]
11.3 Risks
11.4 References
SAP NetWeaver Certified:
http://scn.sap.com/docs/DOC-8760
Exchange ESRP:
https://technet.microsoft.com/en-us/office/dn756396.aspx
12 SAP on Nutanix
Author: Bas Raayman
12.1 Use-Cases
The following use-cases drive SAP design:
• Take CPU generations into account. Older x86 CPUs did not
have many cores to accommodate hyperthreading but ran at
very high clock speeds. Modern CPUs come with a large number
of cores, with the tradeoff that as more cores are available
on a CPU, the clock speed is lowered. Some workloads benefit
significantly from higher clock speed and therefore higher
single-threaded execution performance, whereas others are
better with many threads, even if each thread is slower.
SAP refers to this single-thread metric as SCU (Single Computing
Unit) performance. In general, higher frequency per core is
favorable for application processes in SAP.
12.3 Risks
These are some of the risks associated with running SAP:
12.4 References
Best Practices: SAP on Nutanix:
https://www.nutanix.com/go/virtualizing-sap-on-nutanix-best-practices.php
13 Hardware Platforms
Author: Wayne Conrad
TABLE 5
Nutanix NX; Dell Technologies XC, PowerEdge; Lenovo HX; Cisco UCS; HPE ProLiant, Apollo; Intel SU2600; Inspur NF5280M5; Hitachi HA8000V; Huawei FusionServer 2288H V5: Business-Critical Apps, VDI, Compute intensive, Data intensive, ROBO, SMB
IBM CS: AIX, PowerLinux
Klas Voyager 2; Crystal RS2616PS18: Rugged, MIL-spec
and Capacity
Anyone familiar with virtualization should be familiar with
CPU and memory sizing at this point, but we will briefly consider
the traditional virtualization sizing considerations. Nutanix does add
a wrinkle most of us have not considered in years: local storage.
Remember that the CVM is pinned to the first CPU, and the CVM
vCPU size may cause co-stop with other large CPU count VMs as
they may not be able to run side by side with the CVM.
CPUs with fewer cores but higher clock speeds are a much better idea.
The corollary to “You always run out of RAM first so you should buy
as much as you can afford” in virtualization is your SSD space in
hybrid storage. The working set size will almost certainly increase
over time as software continues to increase in size and its
patches get larger and larger. Consider the growth in size of your
server gold images over the last ten years. Did you start with 20GB
Windows 2003 images that are now 50-60GB on Windows Server
2016? One of the easiest ways to size your hot tier is by looking at
the change rates on daily backups.
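One way to turn daily backup change rates into a first hot-tier estimate is sketched below. The heuristic (peak daily change plus a growth headroom) and the 50% default headroom are assumptions made for illustration, not a Nutanix sizing rule; real sizing should go through Nutanix Sizer.

```python
def estimate_hot_tier_gb(daily_change_gb, headroom=0.5):
    """Rough hot-tier estimate: peak daily change plus headroom for growth.

    daily_change_gb: changed data per day, in GB, taken from backup reports.
    headroom: fractional allowance for working-set growth (assumed value).
    """
    if not daily_change_gb:
        raise ValueError("at least one daily change figure is required")
    return max(daily_change_gb) * (1 + headroom)


if __name__ == "__main__":
    # Hypothetical week of incremental backup sizes, in GB.
    week = [120, 95, 180, 140, 110, 60, 45]
    print(estimate_hot_tier_gb(week))
```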
All SSDs are 2.5-inch; the 3.5-inch form factor does not bring any
benefit for SSDs. However, 3.5-inch traditional hard drives can
provide a lot more space in hybrid nodes at a much lower price.
Deep storage nodes provide a lot of slots for either 2.5-inch or 3.5-
inch hard drives. Deep storage nodes are generally used for Files and
Buckets use-cases. Some deep storage nodes are more performance
oriented for business-critical applications like large databases.
Nodes with a single SSD may be more cost-effective but have more
risk of performance issues with SSD failure. Exercise caution with
the intended use-cases when purchasing single SSD nodes.
more GPUs. Note that NVIDIA GPUs will not work for VDI without
a separate license and license server VM running.
& Support
OEMs have varying global distribution networks for parts or
installation. An OEM with truly global reach may be needed to
support remote sites in smaller countries for instance. Some OEMs or
their resellers can perform integration work, such as loading custom
images onto servers or racking all hardware, cabling up switches, and
configuring everything for drop in installation into your data center.
13.3 Ruggedization
13.4 Compliance
Good use-cases for compute-only nodes are monster CPU and
memory business-critical application VMs, especially those
licensed per server or per core. If your applications are not CPU or
memory bound, you may be better off with partially populated
all-flash or traditional configurations that have data locality.
Since compute only nodes are for monster VMs, please consider
25GbE or better networking, and/or LACP unless you know the
storage and network throughput expected.
13.6 References
Nutanix Hardware Platform:
https://www.nutanix.com/products/hardware-platforms/
Compatibility matrix:
https://portal.nutanix.com/#/page/compatibilitymatrix
14 Sizer & Collector
Author: René van den Bedem
Note that Nutanix Sizer does not currently support non-x86 IBM
CS hardware.
If you have hard data on the expected data reduction ratios, use
it here. Otherwise the ratios are assumptions, which introduce risk
and should be avoided.
14.2 References
Size and Design Your Web-Scale Data Center with Nutanix Sizer:
https://www.nutanix.com/2014/11/10/size-and-design-your-web-scale-datacenter-with-nutanix-sizer/
Nutanix Sizer:
https://sizer.nutanix.com
Nutanix Collector:
http://download.nutanix.com/documentation/Documents_ANY_Version/Nutanix-Collector-User-Guide.pdf
15 IBM Power Systems
Author: René van den Bedem
15.1 Use-Cases
The following use-cases drive AIX and PowerLinux with IBM Power
Systems on Nutanix:
• Threads per core – Intel CPUs have 2 threads per core, IBM
POWER8 CPUs have 8 threads per core.
• AIX version – Run AIX 7.2 with the 7200-02 Technology Level with
Service Pack 7200-02-02-1810 and APAR IJ05283 or later.
15.3 Risks
• Avoiding application refactoring means there is still a need to
retain operations and administration staff who can manage
AIX and PowerLinux. However, the management and monitoring
complexity will be reduced by leveraging the Nutanix Enterprise
Cloud Platform eco-system.
15.4 References
16 Remote Office, Branch Office
Author: Greg White
• Management,
16.1 Use-Cases
Due to the ability to architect solutions using 1 or more nodes,
Nutanix is able to address a wide spectrum of use-cases for ROBO
and edge sites in all key verticals. Whether it is a retail store,
restaurant, manufacturing site, bank branch, drill rig, ship, clinic
or other location where latency, connectivity, data locality or
business reasons dictate a need for local compute and storage
resources, the flexibility of the HCI-based Nutanix Enterprise Cloud
software and the variety of hardware platforms and configurations
ensure that unique needs can be met.
Two-node clusters offer reliability for smaller sites that must be cost
effective and run with tight margins. These clusters use a witness
only in failure scenarios to coordinate rebuilding data and automatic
upgrades. You can deploy the witness offsite up to 200ms away, and
multiple clusters can use the same witness for two-node and metro
clusters. Metadata is maintained in RF4 with 2 copies on each node,
and data is RF2 with a copy on the other node. In a failure scenario,
the operational node will create a 2nd (RF2) copy of the lost node’s
data. During this rebuild, additional writes are held.
Recovery
Remote sites are great candidates for storing another copy
of native Nutanix snapshots for recovery and DR purposes.
Configuring backup on Nutanix lets an organization use its remote
site as a replication target to retrieve snapshots from it to restore
locally, but failover protection (that is, running failover VMs directly
from the remote site) is not enabled. Backup also supports using
multiple hypervisors, for example ESXi at the main site and AHV
at the remote site. Configuring the disaster recovery option allows
using the remote site both as a backup target and as a source for
dynamic recovery so that failover VMs can run directly from the
remote site. Nutanix provides cross-hypervisor disaster recovery
between ESXi and AHV clusters. Hyper-V clusters can only provide
disaster recovery to other Hyper-V-based clusters.
• Customizable dashboards,
Prism Central also provides labeling and tagging for VMs and
clusters. Tag VMs with labels to easily sort and find the ones
associated with a single application, site, business owner, or
customer. Tag clusters for similar needs in order to quickly
identify clusters by size, geography or other characteristics
you specify. You can perform operations or actions, like an
upgrade, on multiple entities at the same time.
Often remote and branch sites need local file server capabilities.
Nutanix Files provides the ability to have local SMB and NFS/Linux
file data. As little as one node can be enabled for file data, and it
can then be expanded either scale-up or scale-out. Consult the Files
chapter for more information on design considerations and limitations.
Security at remote and branch sites and at the edge can also be
a challenge due to the lack of attention and resources available
locally. Nutanix added Flow to provide built-in micro-segmentation
capabilities. This can be enabled at remote sites to protect east-
west traffic in the event of a breach. Consult the Flow chapter for
more information on design considerations and limitations.
16.8 References
ROBO Solution page:
https://www.nutanix.com/solutions/remote-and-branch-office/
17 Xi Frame & EUC
Author: Kees Baggerman
Because the EUC platform is often the first and most visible
point of access for end users to reach backend services provided
by IT, it is considered a mission-critical application set. When the
EUC environment becomes unavailable, whether through unplanned
downtime or technology choices that are misaligned with business
requirements, end users immediately lose their primary point of
access to backend services and feel the impact directly.
Even performance degradation is instantly visible to your
end users, because the EUC environment is their access platform,
and it results in additional service calls and escalations.
• Availability/DR
• Predictable performance
• Cons: Not all hypervisors or brokers support all the same features
or end user devices, which requires careful design and sizing
considerations and selection of components.
• Efficient clones:
• VAAI/VCAI (vSphere)
• OCX (Hyper-V)
17.1 Use-Cases
The following use-cases drive design for End User
Computing environments:
17.2 Xi Frame
FIGURE 7
Xi Frame Components
17.4 References
Xi Frame Product Page:
https://www.nutanix.com/products/frame/
18 Xi IoT
Author: Rohit Goyal
FIGURE 8
Classic IoT Model
[Figure labels: Sensors, Data, Long-term Processing.]
While IoT devices have been around for years, making sense of the
data generated from these devices has not been a top priority for
many organizations, largely due to complexity and cost. With the
right edge computing and IoT platform, however, deploying planet-
scale edge intelligence can be straightforward, cost-effective,
and a path to unprecedented innovation within the enterprise.
FIGURE 9
Nutanix Xi IoT - Move Real-time Processing to Edge and Gain Faster Insights
[Figure labels: Sensors, Data, Edge, Real-time Processing, AI Inference, Machine Learning in the Cloud, Long-term Processing.]
18.1 Use-Cases
18.1.1 Manufacturing
Increase efficiency and maximize productivity by using edge
intelligence to predict equipment failure, detect process
anomalies, improve quality control, and manage energy
consumption. Real-time analysis reduces decision latency
and minimizes costly production delays.
FIGURE 10
Xi IoT Manufacturing Use-Case
[Figure labels: Developer, Apps & Models, Machine, Inference, Analytics, Actuation, Operator, Insights, Dashboards, x 10s.]
18.1.2 Retail
Deliver unique customer experiences by leveraging data at the edge
to personalize offers, build an omnichannel customer relationship,
and streamline the purchase process. Edge data can also improve
inventory management, ensuring product availability and easing
supply chain strains.
FIGURE 11
Xi IoT Retail Use-Case
[Figure labels: Developer, Apps & Models, Machine Learning, Inference, Operator, Anomalies.]
18.1.4 Healthcare
Edge-based diagnostic equipment and monitoring tools bring
processing and analysis closer to the patient, improving care and
services without compromising patient privacy. Real-time detection
and diagnosis can make a significant impact on patient outcomes.
18.3 Risks
These are some of the risks associated with edge computing solutions:
18.4 References
19 Xi Leap, Data Protection, DR & Metro Availability
Author: Mark Nijmeijer
TABLE 6
For instance, you can have a ‘Gold’ Protection Policy that states a
15-minute RPO and a 1-week retention goal. A rule then ties the
Gold Protection Policy to any VM that has been configured with
the Nutanix Category “protection-level equals mission-critical”.
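The category-driven matching described above can be modeled in a few lines. This is a conceptual sketch only: the class and field names are hypothetical and do not correspond to the Prism Central API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProtectionPolicy:
    name: str
    rpo_minutes: int
    retention_days: int
    category: Tuple[str, str]  # (category key, category value) the policy binds to


@dataclass
class VM:
    name: str
    categories: Dict[str, str] = field(default_factory=dict)


# The 'Gold' example from the text: 15-minute RPO, 1-week retention.
GOLD = ProtectionPolicy(
    name="Gold",
    rpo_minutes=15,
    retention_days=7,
    category=("protection-level", "mission-critical"),
)


def matching_policies(vm: VM, policies: List[ProtectionPolicy]) -> List[ProtectionPolicy]:
    """Return every policy whose category key/value pair is set on the VM."""
    return [p for p in policies if vm.categories.get(p.category[0]) == p.category[1]]


if __name__ == "__main__":
    sql = VM("sql01", {"protection-level": "mission-critical"})
    test = VM("test01")
    print([p.name for p in matching_policies(sql, [GOLD])])
    print([p.name for p in matching_policies(test, [GOLD])])
```

Because the binding is on the category rather than on individual VMs, a newly created VM picks up the policy as soon as it carries the category.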
Disaster Recovery
On top of the efficient Nutanix snapshotting and replication
capabilities, Nutanix provides workflows that allow the admin to
configure the system to provide disaster avoidance and recovery
capabilities. The vision is to shift all of the hard work to a period of
time that is typically not high-stress and provide capabilities to help
you ensure you can quickly and successfully recover from any kind
of outage with intuitive workflows and the right level of information
to keep your organization abreast of progress towards and ETA of a
full recovery.
The admin can define Recovery Plans for each of his applications.
This Recovery Plan contains all information that is necessary to
migrate that application to another data center, or to provide
a failover after an outage occurs. In particular, a Recovery Plan
contains the following information:
Synchronous Replication
Nutanix Synchronous Replication provides a zero-RPO data protection
solution for those applications that require the highest levels of
data protection. Any application write that the system processes
will be acknowledged by at least 2 nodes in the local cluster
(assuming RF2) and at least 2 nodes in the remote cluster before
that write gets acknowledged back to the application’s VM.
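The acknowledgement rule for a synchronously replicated write can be expressed as a small predicate. This is a sketch of the rule as stated in the text, assuming RF2 on both clusters; it is not how AOS implements the write path.

```python
def write_is_acknowledged(local_acks, remote_acks, local_rf=2, remote_rf=2):
    """True once enough nodes in BOTH clusters have persisted the write.

    local_acks / remote_acks: number of nodes that have confirmed the write.
    local_rf / remote_rf: replication factor per cluster (RF2 assumed).
    """
    return local_acks >= local_rf and remote_acks >= remote_rf


if __name__ == "__main__":
    print(write_is_acknowledged(2, 1))  # remote cluster still catching up
    print(write_is_acknowledged(2, 2))  # safe to acknowledge to the VM
```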
versus Xi Leap
Nutanix provides two ways of managing the Disaster Recovery
configuration. There is the newly released orchestration that is used
for Xi Leap and for on-site Leap, which is part of AOS 5.10 and later,
and there is the legacy method using Protection Domains.
Refer to the table below for a comparison between the two methods.
TABLE 7
Management
  Leap: Managed at the data center level, via Prism Central
  Protection Domains: Managed at the cluster level via Prism Element
Dynamic/Static
  Leap: Very dynamic management. Protection Policies can automatically be applied to new VMs. Recovery Plan can automatically include new VMs
  Protection Domains: Being part of a Protection Domain is static. Admin must manually manage membership of VMs in a Protection Domain
Reusability
  Leap: Protection Policies can be re-used to protect applications that need the same protection specs (RPO, RTO, retention
  Protection Domains: Schedules must be defined on each Protection Domain.
Scope
  Leap: Managed at the data center level via Prism Central
  Protection Domains: Managed at the Nutanix cluster level through Prism Element
Xi Leap
19.6 References
DR Orchestration:
https://www.nutanix.com/products/acropolis/dr-orchestration/
20 Cloud Management & Automation: Calm, Xi Beam & Xi Epoch
Author: Chris Brown
Nothing is ever 100% in IT. Even the most ardent Dell EMC fan has
at least some NetApp running in their environment just in case
something goes wrong with a Dell EMC patch. Clouds are the same.
Using a single cloud introduces a new single point of failure, but the
friction of maintaining policy, governance, and control across clouds
makes it difficult to use more than a single public cloud at a time.
FIGURE 12
Multi-Cloud Operations
[Figure labels: Xi Epoch (Monitor & Alert), Xi Beam (Secure & Optimize), Calm (Deploy & Operate).]
Automation closes this gap by tracking all of this for you. No matter
how many components in an application, an automated upgrade
never grows in execution complexity. Automation remembers where
everything is, what it depends on, and what needs to be done.
Automation never forgets.
20.2 Use-Cases
• Cloud Optionality.
• Multi-Cloud Governance.
20.3 Calm
FIGURE 13
Calm Blueprint Components
[Figure labels: Application Profiles, Service, Deployment.]
Once the VMs are prepared, the blueprint installs Microsoft SQL
Server into each VM by accessing install media from a shared
repository and following the configuration specifications contained
20.4 Xi Beam
20.5 Xi Epoch
Businesses are increasingly adopting distributed application
Key benefits:
20.6 References
Nutanix Calm Product Page:
https://www.nutanix.com/products/calm/
Nutanix Beam:
https://www.nutanix.com/products/beam/
Xi Epoch:
https://www.nutanix.com/products/epoch/
21 Era
Author: René van den Bedem
21.1 Design Considerations
• Nutanix Era supports:
• Microsoft SQL Server 2008 R2, 2012, 2014, 2016 and 2017
• MariaDB
• Nutanix Era does not support time machine for Oracle 12c
Container Databases (CDB) and Pluggable Databases (PDB).
21.2 References
22 Karbon
Author: René van den Bedem
22.2 References
23 Acropolis Security & Flow
Author: Neil Ashworth
Lifecycle (SecDL)
Over the last few years, cyber threats have become much more
prevalent, not just in the intelligence and federal communities but
across the public and private sectors: Home Depot, Sony, Target,
Experian, the DNC. Each breach is seemingly more damaging than
the last, providing endless media junkets and sound bites. The
impact is measured not only in dollars and revenue lost, but also
in reputation and public perception.
FIGURE 14
Security Development Lifecycle
[Figure: cycle with the labels Assess, Measure, Report, Test, Update, Repeat.]
Ultimately, what this boils down to from a security perspective is
this: does the software vendor that is writing the product you are
using to solve a problem understand how you plan to use it? Does
the vendor you are speaking to actually understand the vertical
that you are in, the compliance requirements that you have, and
the controls that you may have to adhere to?
Another way a vendor can make your life easier in this regard
is by baking best practices and standards into the product. Nutanix
calls it the intrinsic method: understanding what you need
to do to the product and then building it in. Instead
of writing a product generally for the masses and then creating
documentation and procedures for you to use on your own later,
with your own resources and dollars, why not understand
those requirements up front, develop a product that has all of
that baked in, and ship it to the customer in
a hardened configuration state?
they are ideal candidates for third-party apps that probe for
deficiencies in a system configuration.
Note: The XCCDF XML format is highly efficient for conversion from
a manual process to machine automation. Designed specifically
to meet the SCAP standard, the XML format is future-proof, in
that it supports the transition to DoD DIARMF (Risk Management
Framework) for continuous monitoring. Any third-party system that
understands XCCDF XML style formatting can consume the STIGs.
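As a minimal sketch of what "consuming" an XCCDF document can look like, the following uses only the Python standard library to list the rules in a benchmark. The XML fragment is a hypothetical, hand-written example in the XCCDF style (it is not a real STIG), and the namespace shown is the XCCDF 1.1 one; a given benchmark may declare a different version.

```python
import xml.etree.ElementTree as ET

# XCCDF 1.1 namespace; adjust if your benchmark declares another version.
XCCDF_NS = {"x": "http://checklists.nist.gov/xccdf/1.1"}

# Hypothetical fragment for illustration only (not taken from a real STIG).
SAMPLE = """<Benchmark xmlns="http://checklists.nist.gov/xccdf/1.1" id="example-benchmark">
  <title>Example Benchmark</title>
  <Group id="G-1">
    <Rule id="R-1" severity="high">
      <title>Root login over SSH must be disabled</title>
    </Rule>
  </Group>
  <Group id="G-2">
    <Rule id="R-2" severity="medium">
      <title>Password minimum length must be enforced</title>
    </Rule>
  </Group>
</Benchmark>
"""


def list_rules(xccdf_text):
    """Return (rule id, severity, title) for every Rule in the benchmark."""
    root = ET.fromstring(xccdf_text)
    rules = []
    for rule in root.iterfind(".//x:Rule", XCCDF_NS):
        title = rule.findtext("x:title", default="", namespaces=XCCDF_NS)
        rules.append((rule.get("id"), rule.get("severity"), title))
    return rules
```

A scanner or reporting tool would typically go on to read each rule's check content; that part is omitted here because it depends on the specific benchmark.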
Note: By default, SCMA runs daily, for organizations that are willing
23.3.2 Encryption
Making information indecipherable as a means to protect it from
falling into the wrong hands is nothing new. As far back as
600 BCE, the ancient Spartans used a device called a scytale to
send secret messages during battle. Modern cryptography uses
an algorithm, a mathematical cipher, to encrypt or decrypt data,
turning plaintext into ciphertext.
• The first method is via hardware, with the use of Self-Encrypting
Drives (SEDs). Key Encryption Key management is done via an
External Key Manager (EKM) sometimes referred to as a Key
Management Server (KMS). Our system treats data in an encrypted
system much the same way as it treats data in a non-encrypted
system. Encryption happens when data lands on SEDs. When a
client reads data from SEDs, non-encrypted data is returned.
Since the second method allows for encryption without yet another
silo to manage, customers looking to simplify their infrastructure
operations can now have one-click infrastructure for their key
manager as well. Key management and properly storing secret
keying material is the centerpiece of the Nutanix design. In the LKM
we use a mathematical method referred to as Shamir Key Splitting.
This allows us to securely store only portions of each private key
per node, requiring a quorum of nodes to be present in order to
reassemble the key and decrypt the data. This ensures that drive
theft and node theft are covered use-cases.
Since the MEK is shared, each node can read what other nodes
have written. In order to reconstruct the keys, a majority of the
nodes need to be present. We use the equation K = Ceiling (N / 2)
to determine how many nodes are required for the majority. For
example, in an 11-node cluster (N = 11), we would need 6 nodes
online to decrypt the data.
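To make the quorum idea concrete, here is a minimal sketch of textbook Shamir secret sharing combined with the K = Ceiling (N / 2) formula above. It is an illustration under stated assumptions, not the LKM implementation: the prime field, the function names and the use of an integer as the secret are all choices made for this example.

```python
import math
import secrets

# Field modulus for this toy example (an assumption): the Mersenne prime 2^127 - 1.
PRIME = 2**127 - 1


def quorum(n):
    """Nodes required according to the formula K = Ceiling(N / 2)."""
    return math.ceil(n / 2)


def split_secret(secret, n, k):
    """Split an integer secret into n shares; any k of them reconstruct it."""
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

For the 11-node example, `quorum(11)` gives 6, so `split_secret(key, 11, 6)` produces one share per node and `recover_secret` succeeds with any 6 of them. With fewer than 6 shares the interpolation yields an unrelated value, which is the property that covers drive theft and node theft.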
FIGURE 15
EKM & LKM Workflows
[Diagram labels: DEK; Encrypted Data; KEK; EKM or LKM; "Split and encrypted to form the MEK"; MEK (repeated six times)]
23.4 Micro-Segmentation with Flow
Network security is big and complicated, requires specialist
training, and can often necessitate years of experience to deploy
in an enterprise organization. On top of that, architecting
appropriate solutions for isolating environments takes careful
planning, resources and time. Re-architecting an environment
can be even more problematic if not precisely carried out. Many
network engineers might jokingly state “it is always the network,”
yet they know the consequences of failure: potentially
breaking existing applications and functionality or, worse,
exposing network vulnerabilities.
In the legacy data center, external traffic bound for the database,
as indicated by the green line in Fig 3, is filtered only by a perimeter
firewall; this is considered North-South traffic. In the virtual data
center, East-West traffic between two VMs, as indicated by the blue
line in the figure below, must be routed out to the perimeter firewall
to be properly inspected, which creates a hair-pinning effect.
FIGURE 16
Legacy Traffic Flow
[Diagram labels: External Traffic, Firewall, Router, App VM, Database VM]
Rather, they simply swipe a credit card, build their application and
meet the business need. They do not care about the underlying
infrastructure because the application is what the end users
are touching; the application is what generates revenue. Given
this information, in the cloud-centric world, why are we
not allowing the application itself to manage some of its own
infrastructure and security needs? The key is to innovate solutions
around cloud-centric application needs rather than outdated silo-
centric infrastructure capabilities.
FIGURE 17
Flow OvS Bridge Chain (Micro-Segmentation)
Extrapolating this out for your environment, you can quickly build
layered categories for all your applications across environments.
FIGURE 18
Categories Applied to a Virtual Machine
Set Categories
Environment: Production
AppType: Exchange
AppTier: ExchangeMBox
Once categories are set and applied to all the relevant VMs, an
administrator can begin constructing policies. Flow maintains
three different policy types: Quarantine, Isolation, and Application.
FIGURE 19
Flow Isolation Policy
[Screenshot text: “An isolation policy allows you to isolate one set of VMs from another so they cannot talk to each other.” Fields shown: Isolate_Dev_Prod; Purpose: Isolate_Dev_Prod; Environment: Production]
FIGURE 20
Flow Policy Visual
Isolated Categories
Hopefully this brief example has impressed upon you the power of
simplicity embedded within this design. Note how we did not have
to carefully plan IP addressing or VLAN allocation, and we did not
have to think about VLAN IDs, ACLs or subnets. Quite instinctively and
naturally, we identified the two environments we wanted to isolate,
assigned those environments categories and effected segmentation
of those environments with a simple policy. That policy will now
maintain those dynamic environments automatically as new VMs are
spun up or older VMs are removed.
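The category-driven behavior described above can be modeled in a few lines. This is a conceptual sketch only, not the Flow API: the class names, the `blocks` method and the category value "Dev" (inferred from the policy name Isolate_Dev_Prod) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    # Category key/value pairs, e.g. {"Environment": "Production"}
    categories: dict = field(default_factory=dict)


@dataclass(frozen=True)
class IsolationPolicy:
    name: str
    key: str      # category key, e.g. "Environment"
    value_a: str  # first isolated category value
    value_b: str  # second isolated category value

    def blocks(self, src, dst):
        """True if src and dst sit on opposite sides of the isolation."""
        pair = {src.categories.get(self.key), dst.categories.get(self.key)}
        return pair == {self.value_a, self.value_b}


# Hypothetical VMs; membership is decided purely by category, never by IP or VLAN.
policy = IsolationPolicy("Isolate_Dev_Prod", "Environment", "Dev", "Production")
prod_vm = VM("exch-01", {"Environment": "Production", "AppType": "Exchange"})
dev_vm = VM("dev-01", {"Environment": "Dev"})
new_dev_vm = VM("dev-02", {"Environment": "Dev"})  # newly spun-up VM
```

Because the policy is evaluated against categories, `new_dev_vm` is covered the moment it is tagged: `policy.blocks(new_dev_vm, prod_vm)` is true without any change to the policy, while traffic between the two Dev VMs is not affected by this policy.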
23.5 References
Acropolis Security:
https://www.nutanix.com/products/acropolis/security/
24 Files
Author: Wayne Conrad
The traditional use-case for Files was VDI, but with NFS
support and various improvements, Files is now ready to tackle
many of your traditional use-cases.
24.1 Use-Cases
Nutanix Files provides file shares in the two most common network
file access protocols, Microsoft Windows SMB and Linux/Unix NFS.
Nutanix Files evolved to host VDI profile data, our most common
use-case, without having to create file shares or buy NAS filers
from third parties.
Nutanix Files has evolved rapidly since its launch two years ago,
scaling in performance, features and share size so that it can now
take on general purpose file shares, or some application level
transactional storage for workloads like containers.
24.3 References
25 Volumes
Author: Wayne Conrad
Why would you want to do this? The answer is simple. Shared disks
at the hypervisor layer for clustered workloads like databases have
always been painful and complex to set up.
25.1 Use-Cases
Nutanix Volumes are iSCSI disks attached inside the OS versus at
the hypervisor layer. This avoids the worst problems associated
with VMware raw device mappings and allows shared disks without
pain and suffering. Nutanix Volumes also supports physical servers,
with a large range of supported OSes. One use-case that is NOT
supported, however, is attaching Nutanix Volumes as VM storage to
non-Nutanix hypervisor hosts. Nutanix has not built the vSphere,
Hyper-V or other hypervisor level plugins required to make Volumes
work for hosting VMs.
25.3 References
26 Buckets
Author: Laura Jordana
26.1 Use-Cases
26.1.1 DevOps
The way IT departments deploy applications is rapidly changing.
With the emergence of containers and other cloud native
technologies, users need a scalable and resilient storage stack
which is optimized for the world of cloud computing. Nutanix
Buckets was built for cloud native environments and is the optimal
solution for next generation applications that could be running
anywhere, whether on-prem or in the cloud.
26.3 References
27 Prism
Author: Wayne Conrad
The Nutanix Prism UI lives at two layers: Prism Central, which manages
multiple clusters, and Prism Element, which runs on each cluster.
• Capacity Planning
• VM right sizing
• V3 APIs
27.2 Prism Central Design Considerations
• Scale Out Prism Central requires running Prism Central on a
Nutanix cluster. Single Prism Central servers may be placed on
any other virtualization platform.
• Scale Out Prism Central is three VMs and can tolerate one failure.
Scale Out Prism Central only resides on a single Nutanix cluster
and cannot span clusters or data centers.
Considerations
• Some UI features are only on some hypervisors, such as network
visualization, which is only on AHV.
• Microsoft IE11 and Edge have issues uploading large files to the
Prism interface. We recommend the use of a modern browser
such as Chrome or Firefox.
27.4 References
28 Life Cycle Manager
Author: René van den Bedem
The life cycle manager (LCM) tracks software and firmware versions
of all entities in the cluster.
• LCM 2.1 and later supports both Prism Element and Prism
Central.
28.2 References
29 AHV
Authors: Wayne Conrad & Magnus Andersson
• VMs will restart from a failed AHV host as long as any AHV host has
enough available resources to satisfy the memory requirement.
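The memory condition in the bullet above can be illustrated with a simple placement check. This is a sketch under assumptions, not the AHV scheduler: the function name, the first-fit strategy and the largest-first ordering are all choices made for this example.

```python
def plan_restarts(failed_vms, free_memory_gib):
    """Place each VM from a failed host on a surviving host with enough free memory.

    failed_vms: dict of VM name -> required memory in GiB
    free_memory_gib: dict of surviving host name -> free memory in GiB
    Returns (placements, unplaced).
    """
    free = dict(free_memory_gib)  # work on a copy, leave the input untouched
    placements, unplaced = {}, []
    # Largest VMs first so small VMs do not crowd them out (a simple heuristic).
    for vm, need in sorted(failed_vms.items(), key=lambda kv: kv[1], reverse=True):
        host = next((h for h, avail in free.items() if avail >= need), None)
        if host is None:
            unplaced.append(vm)  # no single host can satisfy the memory requirement
        else:
            free[host] -= need
            placements[vm] = host
    return placements, unplaced
```

A VM lands in `unplaced` only when no single surviving host has enough free memory for it, which mirrors the condition stated in the text.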
29.2 Pros and Cons of Nutanix AHV
Nutanix AHV Strengths:
• No metro-clustering support.
• No memory overcommitment.
Hypervisors
In addition to AHV, Nutanix supports three other hypervisors:
VMware vSphere ESXi, Microsoft Hyper-V, and Citrix Hypervisor.
29.4 References
AHV Virtualization Product Page:
https://www.nutanix.com/products/acropolis/virtualization/
30 Move
Author: René van den Bedem
30.1 Design Considerations
• Nutanix Move migrations from VMware ESXi to AHV support:
AOS 5.0.x-5.10.x, ESXi 5.5-6.7 and vCenter Server 5.5-6.7.
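A pre-migration check against the ranges listed above could be sketched as follows. The function names and the dictionary layout are hypothetical, and versions are compared on major and minor numbers only, which is an assumption about how the "x" in the stated ranges should be read.

```python
def parse_version(text):
    """'6.5.0' -> (6, 5, 0); only plain dotted numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


# Ranges as stated in the text, compared on (major, minor) only.
SUPPORTED = {
    "aos": ((5, 0), (5, 10)),
    "esxi": ((5, 5), (6, 7)),
    "vcenter": ((5, 5), (6, 7)),
}


def is_supported(component, version):
    """True if the version falls inside the stated range for the component."""
    low, high = SUPPORTED[component]
    return low <= parse_version(version)[:2] <= high
```

Comparing tuples of integers rather than strings avoids the classic mistake of treating "5.10" as lower than "5.9".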
30.2 References
31 X-Ray
Author: Gary Little
FIGURE 21
Amdahl’s Law
[Chart: speed up (y-axis, 0 to 20) versus number of processors (x-axis, 1 to 65536), plotted by parallel portion]
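For reference, Amdahl's Law gives the speedup on N processors as S = 1 / ((1 - p) + p / N), where p is the parallel portion of the work. A minimal sketch follows; the 0.95 mentioned afterwards is an assumed value, chosen because its limit of 20 is consistent with the top of the figure's speed-up axis.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup S = 1 / ((1 - p) + p / N) for parallel fraction p on N processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def speedup_limit(parallel_fraction):
    """Upper bound on speedup as the processor count grows without limit."""
    return 1.0 / (1.0 - parallel_fraction)
```

With a parallel portion of 0.95, `speedup_limit(0.95)` is approximately 20, and `amdahl_speedup(0.95, 65536)` stays just below that bound no matter how many processors are added.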
31.1 Criteria 1 – Raw IO Performance
Does HCI have the raw IO performance I need for my most
demanding applications?
TABLE 8
TABLE 9
Simply measuring how fast our storage will run under ideal
conditions tells us nothing about how applications will be impacted
in failure or multi-tenant environments. In almost all cases the raw
performance exceeds typical demands.
In this example, the single VM, single disk test yields 2,500 IOPS,
but the total capacity of the cluster is around 600,000 IOPS. The
reason for the discrepancy is that a single VM on a single node
cannot drive all the performance from the entire cluster. A Nutanix
cluster is designed to provide consistent performance to multiple
VMs running on multiple hosts. In the example chart below, we see
that the number of IOPS that the cluster delivers increases as we
add load, until it reaches saturation at around 600,000 IOPS on this
particular cluster.
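The arithmetic behind this example can be sketched with an idealized model in which each added VM contributes the same IOPS until the cluster saturates. Real scaling is not perfectly linear, so treat the model and the function names as assumptions for illustration; the two figures are the ones quoted in the text.

```python
import math


def cluster_iops(vm_count, per_vm_iops=2_500, cluster_limit=600_000):
    """Aggregate IOPS under an idealized linear-until-saturation model."""
    return min(vm_count * per_vm_iops, cluster_limit)


def vms_to_saturate(per_vm_iops=2_500, cluster_limit=600_000):
    """Smallest number of identical single-disk VMs that reaches the limit."""
    return math.ceil(cluster_limit / per_vm_iops)
```

Under these assumptions it takes 240 such single-disk workloads to reach the cluster's capacity, which is why a single-VM test says little about what the cluster as a whole can deliver.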
FIGURE 22
Cluster-wide Performance
[Chart labels: Random Read IOPS (All VMs); y-axis 200K to 600K; x-axis 10:00PM to 1:00AM; markers “You are here” and “Total Capacity”; second axis 1ms to 2ms]
FIGURE 23
Four Corners X-Ray Test
[Chart panels: “Random Read IOPS | What’s a good result?”; y-axes up to 1,000,000 and 0 KBps to 30 GBps; x-axis 1:54pm to 2:00pm]
31.2 Criteria 2 – Resilience
Can HCI architecture give me the same level of resilience that I am
used to with hardware-based redundancy?
In many cases testers fall back on the “Disk Pull” test during a
workload. While this experiment will reveal what happens when a disk
is pulled from a running system, it does not accurately simulate a disk
failure. The disk enclosure firmware treats a disk pull differently from
a disk failure, and a failing disk tends to degrade over time anyway.
For small clusters of the kind typically used for a POC, the largest
failure domain that can be sustained is an entire node. We can use the
X-Ray extended node failure test to show what happens to the remaining
nodes in the cluster, and the time and impact to rebuild the data.
In this test, X-Ray connects directly to the IPMI port on the cluster
hardware and issues a power-off command (not shutdown) without
giving the cluster software any warning. We choose to fail a node
because a) it is a larger failure domain, b) it is more like a real
failure, and c) using IPMI we can fail any sort of cluster node that
supports IPMI, because IPMI is a public interface. It is therefore
possible to compare the failure handling between nodes running
Nutanix and other vendors' HCI implementations.
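The text does not describe how X-Ray issues the command internally. As a rough manual equivalent, assuming the widely used `ipmitool` utility is installed and the BMC accepts IPMI over LAN, a hard power-off could be sketched like this. The host, user and password values are placeholders, and the function defaults to a dry run so that nothing is powered off by accident.

```python
import subprocess


def power_off_command(bmc_host, user, password):
    """Build the ipmitool argv for an immediate chassis power-off (no OS shutdown)."""
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host, "-U", user, "-P", password,
        "chassis", "power", "off",
    ]


def power_off_node(bmc_host, user, password, dry_run=True):
    """Return the command; execute it only when dry_run is explicitly False."""
    cmd = power_off_command(bmc_host, user, password)
    if not dry_run:
        # Cuts power at the chassis level, like the failure test described above.
        subprocess.run(cmd, check=True)
    return cmd
```

The `chassis power off` subcommand removes power immediately, which is what distinguishes this test from a graceful shutdown.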
Performance
Will HCI retain consistent performance in the face of demanding
multi-tenant applications?
FIGURE 24
X-Ray Consistent Performance Graph
[Chart panels: “HCI Without Data Locality” and “HCI With Data Locality”; OLTP IOPS axis 2000 to 8000; second axis 0ms to 15ms; x-axis 0m to 58.33m; annotations “OLTP Workload in isolation” and “DSS Workload begins on other host”]
• Predictable scaling
31.5 References
X-Ray Datasheet:
https://www.nutanix.com/documents/datasheets/nutanix-x-ray-datasheet.pdf
32 Foundation
Author: Wayne Conrad
• Confirm you have the correct hypervisor ISO for your chosen
hardware if you are not using AHV. Nutanix publishes a whitelist
of approved ISO files. Check that the MD5 sum of your ISO file
matches, as vendors have been known to update their ISO files
silently without changing a build number.
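One way to perform the checksum comparison is sketched below with the Python standard library. The function names are hypothetical, and the expected value is whatever the whitelist lists for your ISO; no real checksum is shown here.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte ISO files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iso_matches(path, expected_md5):
    """Compare the file's MD5 against the expected value, ignoring case and whitespace."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

If `iso_matches` returns false for an ISO whose build number looks right, that is exactly the silent-update case the bullet warns about.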
32.1 References
33 Data Center Facilities
Author: René van den Bedem
33.1 Use-Cases
• Power – UPS 1+1, UPS N+1 or Radial (Generator and UPS are
combined into one unit), Auto Transfer Switch (ATS), External
Generators, Fuel Cells, Distance from Building Transformers?
Supply voltage?
• Load rating – What is the load rating of the data center floor,
ramps and elevator?
33.3 Risks
These are some of the risks associated with Nutanix solutions and
Data Center Facilities:
• There is no “right solution”; there is only the solution that fits your
business requirements, budget and timeline. Talk to the experts who
specialize in this field, get quotations and advice, and then select the
best strategy for your company.
33.4 References
Schneider-Electric:
https://www.schneider-electric.com/en/work/solutions/for-business/data-centers-and-networks/
34 People & Process
Author: René van den Bedem
34.1 References
35 Risk Management
Author: Daemon Behr
How do you get this information and how do you organize it into
an actionable strategy? In this chapter, we will explore some
methods to obtain the operational intelligence required for making
design decisions based on identified risks.
FIGURE 25
Risk Management
1. Risk identified with an ID
2. Probability and impact defined
3. Risk severity determined
4. Risk added to register
5. Key Risk Indicators monitored, probability and impact updated
[Diagram labels: Risk Register (RI-002, RI-003, RI-004), Impact, Key Risk Indicators]
There are many more types of risks that can be considered, such
as budget, competition, compliance, force majeure, integration,
procurement, resource, strategic, etc. This chapter is focused
on infrastructure risk and the technology category. Below is an
example of how to determine the technology risks, based on its
design qualities.
35.1.1 Availability
This pertains not only to operational states, but also transition
states. This includes periods during upgrades, migrations and DR.
Availability should be considered on a per-application or per-workload
basis. The availability requirements of each workload or application
need to be determined in co-operation with the responsible
business units.
35.1.2 Manageability
This includes aspects such as who will perform management operations
on a technology, and how, when and from where.
35.1.3 Performance
This includes understanding the performance requirements
and SLAs of all the workloads in the environment and what the
ramifications are for not meeting them.
35.1.4 Recoverability
This includes understanding the various states that the
infrastructure can be in when a failure occurs.
35.1.5 Security
This includes knowing the attack surfaces in your environment, the
vulnerabilities, and the proactive and reactive actions for incidents.
35.3 Recommendations
Recommendations are the suggested risk treatments that consider
the organizational risk appetite, the severity, and the operational
capabilities to initiate a treatment. A recommendation based on the
above password example would be:
35.5 Prioritization
TABLE 10
Risk Prioritization
TABLE 11
Risk Register
3-tier network architectures use inter-switch links to provide network
connectivity across access layer segments. Link oversubscription will
arise when the spanning tree blocks redundant links to prevent network
loops on the L2 segments. (Scores: 7, 7, 49)
If a new infrastructure is being built in parallel to supersede the
existing environment, then the role of replication target for South
America needs to be considered and created. (Scores: 8, 8, 64)
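The scores in the register rows above (7, 7, 49 and 8, 8, 64) are consistent with severity being the product of probability and impact. Assuming that reading, a register can be sketched and prioritized as follows. The class, the field names and the pairing of risk IDs with descriptions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    risk_id: str
    description: str
    probability: int  # score on the register's scale (assumed numeric)
    impact: int       # score on the same scale

    @property
    def severity(self):
        """Severity as probability multiplied by impact (an inferred formula)."""
        return self.probability * self.impact


def prioritize(register):
    """Return the risks ordered from highest to lowest severity."""
    return sorted(register, key=lambda risk: risk.severity, reverse=True)


# Hypothetical IDs attached to the two example rows.
register = [
    Risk("RI-002", "Link oversubscription from spanning tree blocking", 7, 7),
    Risk("RI-003", "Replication target role for South America undefined", 8, 8),
]
```

Calling `prioritize(register)` puts the replication risk first. When monitoring of Key Risk Indicators changes a probability or impact, updating the field changes the severity automatically on the next sort.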
35.7 References
Designing Risk in IT Infrastructure, by Daemon Behr:
http://designingrisk.com/buy/
Insights:
https://portal.nutanix.com/#/page/insights
Field advisories:
https://portal.nutanix.com/#/page/static/fieldAdvisories
Security advisories:
https://portal.nutanix.com/#/page/static/securityAdvisories