© 2015 VMware Inc. All rights reserved.

1 VMware vSphere® 6.0


Knowledge Transfer Kit
Technical Walk-Through

2 Agenda
VMware ESXi™
Virtual Machines
VMware vCenter Server™
VMware vSphere vMotion®
Availability
VMware vSphere High Availability
VMware vSphere Fault Tolerance
VMware vSphere Distributed Resource Scheduler™
Content Library
VMware Certificate Authority (CA)
Storage
Networking

3 Technical Walk-Through
The technical walk-through expands on the architectural presentation to provide more detailed technical best practice and troubleshooting information for each topic
This is not comprehensive coverage of each topic
If you require more detailed information, the VMware vSphere Documentation and VMware Global Support Services might be of assistance

4 ESXi
5 Components of ESXi
The ESXi architecture comprises the underlying operating system, called
the VMkernel, and processes that run on top of it
VMkernel provides a means for running all processes on the system,
including management applications and agents as well as virtual
machines
It has control of all hardware devices on the server and manages
resources for the applications
The main processes that run on top of VMkernel are
Direct Console User Interface (DCUI)
Virtual Machine Monitor (VMM)
VMware Agents (hostd, vpxa)
Common Information Model (CIM) System

6 Components of ESXi (cont.)


Direct Console User Interface
Low-level configuration and management interface, accessible through
the console of the server, used primarily for initial basic configuration
Virtual Machine Monitor
Process that provides the execution environment for a virtual machine,
as well as a helper process known as VMX. Each running virtual machine
has its own VMM and VMX process
VMware Agents (hostd and vpxa)
Used to enable high-level VMware Infrastructure™ management from
remote applications
Common Information Model System
Interface that enables hardware-level management from remote
applications through a set of standard APIs

7 ESXi Technical Details


VMkernel
A POSIX-like operating system developed by VMware, which provides
certain functionality similar to that found in other operating systems,
such as process creation and control, signals, file system, and process
threads
Designed specifically to support running multiple virtual machines, providing core functionality such as
Resource scheduling
I/O stacks
Device drivers
Some of the more pertinent aspects of the VMkernel are presented in
the following sections

8 ESXi Technical Details (cont.)


File System
VMkernel uses a simple in-memory file system to hold the ESXi Server
configuration files, log files, and staged patches
The file system structure is designed to be the same as that used in the
service console of traditional ESX Server. For example
ESX Server configuration files are found in /etc/vmware
Log files are found in /var/log/vmware
Staged patches are uploaded to /tmp
This file system is independent of the VMware vSphere VMFS file system
used to store virtual machines
The in-memory file system does not persist when the power is shut
down. Therefore, log files do not survive a reboot if no scratch partition
is configured
ESXi has the ability to configure a remote syslog server and remote dump
server, enabling you to save all log information on an external system
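As a quick sketch of that setup from the ESXi Shell (the syslog host, dump collector address, and VMkernel interface below are placeholders, not values from this kit):
# Point the host at a remote syslog server, then reload the logger
esxcli system syslog config set --loghost='tcp://syslog.example.com:514'
esxcli system syslog reload
# Configure and enable a remote dump collector
esxcli system coredump network set --interface-name vmk0 --server-ipv4 192.0.2.50 --server-port 6500
esxcli system coredump network set --enable true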

9 ESXi Technical Details (cont.)


User Worlds
The term user world refers to a process running in the VMkernel operating system. The environment in which a user world runs is limited compared to what is found in a general-purpose POSIX-compliant operating system such as Linux
The set of available signals is limited
The system API is a subset of POSIX
The /proc file system is very limited
A single swap file is available for all user world processes. If a local disk
exists, the swap file is created automatically in a small VFAT partition.
Otherwise, the user is free to set up a swap file on one of the attached
VMFS datastores
Several important processes run in user worlds. Think of these as native
VMkernel applications. They are described in the following sections
10 ESXi Technical Details (cont.)
Direct Console User Interface (DCUI)
DCUI is the local user interface that is displayed only on the console of an
ESXi system
It provides a BIOS-like, menu-driven interface for interacting with the
system. Its main purpose is initial configuration and troubleshooting
The DCUI configuration tasks include
Set administrative password
Set Lockdown mode (if attached to VMware vCenter™)
Configure and revert networking tasks
Troubleshooting tasks include
Perform simple network tests
View logs
Restart agents
Restore defaults

11 ESXi Technical Details (cont.)


Other User World Processes
Agents used by VMware to implement certain management capabilities
have been ported from running in the service console to running in user
worlds
The hostd process provides a programmatic interface to VMkernel, and it
is used by direct VMware vSphere Client™ connections as well as APIs. It
is the process that authenticates users and keeps track of which users
and groups have which privileges
The vpxa process is the agent used to connect to vCenter. It runs as a
special system user called vpxuser. It acts as the intermediary between
the hostd agent and vCenter Server
The FDM agent used to provide vSphere High Availability capabilities has
also been ported from running in the service console to running in its
own user world
A syslog daemon runs as a user world. If you enable remote logging, that
daemon forwards all log files to the remote target in addition to putting
them in local files
A process that handles initial discovery of an iSCSI target, after which
point all iSCSI traffic is handled by the VMkernel, just as it handles any
other device driver

12 ESXi Technical Details (cont.)


Open Network Ports – A limited number of network ports are open on
ESXi. The most important ports and services are
80 – This port serves a reverse proxy that is open only to display a static
Web page that you see when browsing to the server. Otherwise, this port
redirects all traffic to port 443 to provide SSL-encrypted communications
to the ESXi Server
443 (reverse proxy) – This port also acts as a reverse proxy to a number of services to provide SSL-encrypted communication to these services. The services include API access to the host, which provides access to the RCLIs, the vSphere Client, vCenter Server, and the SDK
5989 – This port is open for the CIM server, which is an interface for
third-party management tools
902 – This port is open to support the older VIM API, specifically the older
versions of the vSphere Client and vCenter
Many other services (vSphere High Availability, vSphere vMotion, and so on) have their own port requirements, but those ports are opened only if the services are configured
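To see which rulesets (and therefore which ports) are currently open on a host, the ESXi firewall can be queried from the ESXi Shell. A minimal sketch (the ruleset name is only an example):
# List all firewall rulesets and whether they are enabled
esxcli network firewall ruleset list
# Disable a ruleset that is not needed
esxcli network firewall ruleset set --ruleset-id sshServer --enabled false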

13 ESXi Troubleshooting
Troubleshooting ESXi is very much the same as troubleshooting any other operating system
Start by narrowing down the component that is causing the problem
Next, review the logs as required to narrow down the issue
Common log files are as follows
/var/log/auth.log: ESXi Shell authentication success and failure
/var/log/esxupdate.log: ESXi patch and update installation logs
/var/log/hostd.log: Host management service logs, including virtual
machine and host Task and Events, communication with the vSphere
Client and vCenter Server vpxa agent, and SDK connections
/var/log/syslog.log: Management service initialization, watchdogs,
scheduled tasks and DCUI use
/var/log/vmkernel.log: Core VMkernel logs, including device discovery,
storage and networking device and driver events, and virtual machine
startup
/var/log/vmkwarning.log: A summary of Warning and Alert log messages
excerpted from the VMkernel logs
/var/log/vmksummary.log: A summary of ESXi host startup and
shutdown, and an hourly heartbeat with uptime, number of virtual
machines running, and service resource consumption
/var/log/vpxa.log: vCenter Server vpxa agent logs, including
communication with vCenter Server and the Host Management hostd
agent
/var/log/fdm.log: vSphere High Availability logs, produced by the FDM
service
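A minimal sketch of working through these logs from the ESXi Shell (the search terms are only examples):
# Follow management-agent activity live
tail -f /var/log/hostd.log
# Pull recent warnings and errors out of the VMkernel log
grep -iE 'warning|error' /var/log/vmkernel.log | tail -n 20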

14 ESXi Best Practices


For in-depth ESXi and other component practices, read the Performance Best Practices Guide
Always set up the VMware vSphere Syslog Collector (Windows) / VMware
Syslog Service (Appliance) to remotely collect and store the ESXi log files
Always set up the VMware vSphere ESXi Dump Collector Service to allow
dumps to be remotely collected in the case of a VMkernel failure
Ensure that only the firewall ports required by running services are
enabled in the Security profile
Ensure the management network is isolated from the general network
(VLAN) to decrease the attack surface of the hosts
Ensure the management network has redundancy through NIC Teaming
or by having multiple management interfaces
Ensure that the ESXi Shell and SSH connectivity are not permanently
enabled
Performance Best Practices Guide for vSphere 6.0 (soon to be released)
Performance Best Practices Guide for vSphere 5.5

15 Virtual Machines
16 Virtual Machine Troubleshooting
Virtual machines run as processes on the ESXi host
Troubleshooting is split into two categories
Inside the Guest OS – Standard OS troubleshooting should be used,
including the OS-specific log files
ESXi host level troubleshooting – Concerning the virtual machine process,
where the log file for the virtual machine is reviewed for errors
ESXi host virtual machine log files are located in the virtual machine's directory by default, and are named vmware.log
Generally, issues occur as a result of a problem in the guest OS
Host level crashes of the VM processes are relatively rare and are
normally a result of hardware errors or compatibility of hardware
between hosts
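As a sketch of reviewing that log from the ESXi Shell (the datastore and VM names are placeholders):
# List the virtual machine's log files, then search them for failures
ls /vmfs/volumes/datastore1/examplevm/vmware*.log
grep -iE 'error|fail' /vmfs/volumes/datastore1/examplevm/vmware.log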

17 Virtual Machine Best Practices


Virtual machines should always run VMware Tools™ to ensure that the
correct drivers are installed for virtual hardware
Right-size VMs to ensure that they use only required hardware. If VMs
are provisioned with an over-allocation of resources that are not used,
ESXi host performance and capacity is reduced
Any devices not being used should be disconnected from VMs (CD-
ROM/DVD, floppy, and so on)
If NUMA is used on ESXi, VMs should be right-sized to the size of the
NUMA nodes on the host to avoid performance loss
VMs should be stored on shared storage to allow for the maximum
vSphere vMotion compatibility and vSphere High Availability
configurations in a cluster
Memory/CPU reservations should not be used routinely because they reserve the resource and can prevent the VMware vSphere Hypervisor from taking advantage of overcommitment technologies
VM partitions should be aligned to the storage array partition alignment
Storage and Network I/O Control can dramatically help VM performance
in times of contention

18 vCenter Server
19 vCenter Server 6.0 with Embedded Platform Services Controller
[Diagram: a single node containing the Platform Services Controller (SSO, CM, License, IS, Web, Tools) and the Management Node]
Sufficient for most environments
Easiest to maintain and deploy
Recommended for 8 or fewer vCenter Servers
vCenter Server and the infrastructure controller are deployed on a single virtual machine or physical host.
vCenter Server with an embedded infrastructure controller is suitable for smaller environments with eight or fewer product instances.
To provide the common services, such as vCenter Single Sign-On, across multiple products and vCenter Server instances, you can connect multiple vCenter Server instances with embedded infrastructure controllers together.
You can do this by replicating the vCenter Single Sign-On data from one infrastructure controller to the others. This way, infrastructure data for each product is replicated to all of the infrastructure controllers, and each individual infrastructure controller contains a copy of the data for all of the infrastructure controllers.
The embedded infrastructure controller supports both an internal database, which is vPostgres, and an external database, such as Oracle or Microsoft SQL Server.
vCenter Server 6.0 with an embedded infrastructure controller is available in both Windows and virtual appliance formats.
Supports embedded and external databases
Available for Windows and vCenter Server Appliance

20 vCenter Server 6.0 with External Platform Services Controller


[Diagram: a Platform Services Controller node (SSO, CM, License, IS, Web, Tools) serving a separate Management Node (VC)]
For larger customers with numerous vCenter Servers
Reduces infrastructure by sharing Platform Services Controller across
several vCenter Servers
Recommended for 9 or more vCenter Servers
vCenter Server and the infrastructure controller are deployed on
separate virtual machines or physical hosts. The Platform Services
Controller can be shared across many products. This configuration is
suitable for larger environments with nine or more product instances.
To provide the common services, such as vCenter Single Sign-On, across
multiple products and vCenter Server instances, the products are
connected together through the Platform Services Controller. The
infrastructure data that each product needs is in the Platform Services
Controller.
If you have more than one Platform Services Controller, you can set up
the controllers to replicate data with each other all the time, so that the
data from each Platform Services Controller is shared with every
product.
The Platform Services Controller is lightly loaded and doesn’t use as
many resources as the management nodes.
Supports embedded and external database
Available for Windows and vCenter Server Appliance

21 Installation / Upgrade Overview


Component MSI Deployment
~ 50 component MSIs
Most MSIs only deliver binaries and configuration files to the machine
Firstboot Scripts
The installation process consists of two key parts.
First the vSphere installer copies approximately 44 MSI files to the
installation directory of the Windows machine. These MSI files are binary
installation and configuration files. All MSI files are installed silently.
[Click]
Second, once the MSI files are copied to the installation folder, the Firstboot process starts. Depending on the deployment type that you choose, a different number of services will be installed.
An Embedded node has about 25 services, the Management node has about 24, and an Infrastructure Controller has about 7 services installed.
Firstboot scripts are written in Python and the same scripts are used for
both the Windows installer and the vCenter Virtual Appliance.
Depending on your deployment type, these Python firstboot scripts are
run to install and configure the Components.
Firstboot scripts also take care of generating certificates and registering
the components with the Component Manager Service. It will also create
and start the Services, and if needed, open ports in the firewall.
~28 firstboot scripts – depending on the install method
Configures the product
Generates certificates
Registration between components
Creates and starts services
Opens ports in firewall

22 Installation Overview/Architecture
This graphic shows how the installation progresses for vSphere 6
[Diagram: the vCenter Server 6.0 installer runs a parent MSI and ~50 component MSIs, then firstboot scripts configure the services – SSO, VPXD, the VCDB (vPostgres or external), NGC, and so on]
Here’s a graphical overview of the installation process.
Once the installation media is downloaded and mounted on the destination machine, the vSphere 6.0 Installer menu is launched.
Once you pick the option to install vCenter for Windows, you will be
prompted with a series of questions such as
Deployment type
What database you want to use
Credentials for SSO, and so on
Once the installer has captured all this information, the MSI files are copied from the installation media to the destination installation folder. The number of MSIs will vary depending on the installation type.
28 firstboot scripts will be copied for an embedded node. Fewer will be copied for other deployment types.
[CLICK]
Once the MSIs are copied, they are installed and the firstboot process starts. The firstboot process runs and configures the services and performs tasks such as generating certificates, installing and starting services, and registering the components.
As the firstboot process progresses, the components are installed and
configured until you have successfully installed all the components.

23 Platform Services Controller Upgrade Scenarios
The upgrade decision-making process depends on the placement of the vCenter Single Sign-On service and the vCenter services
If a machine has Single Sign-On and vCenter Server installed, it becomes
vCenter Server with embedded Platform Services Controller
[Diagram: a vSphere 5.x host running Web Client, vCenter Server, Inventory Service, and Single Sign-On upgrades to a vSphere 6.0 embedded deployment: vCenter Server with its Platform Services Controller on one machine]

24 Upgrade Scenarios (cont.)


If one machine has vCenter Server and Inventory Service installed, and vSphere Web Client and Single Sign-On are on separate machines
[Diagram: in 5.x, one machine runs vCenter Server and Inventory Service while Web Client and Single Sign-On are separate; in 6.0, the Management Node runs vCenter Server, Web Client, and Inventory Service, and Single Sign-On moves to the PSC]

25 Upgrade Scenarios (cont.)


All vSphere 5.x components are installed on separate servers/VMs
After the upgrade, the standalone Web Client and Inventory Service instances are defunct
Web Client and Inventory Service become part of the Management Node
[Diagram: separate 5.x servers for Web Client, vCenter Server, Inventory Service, and Single Sign-On consolidate into a 6.0 Management Node (vCenter Server, Web Client, Inventory Service) plus a PSC (Single Sign-On)]

26 Upgrade Scenarios (cont.)


vSphere 6.0 still requires a load balancer for Platform Services Controller high availability
We are still working out the details
F5 and Citrix NetScaler
[Diagram: in 5.x, load-balanced Single Sign-On servers sit behind a load balancer used by the Management Node (vCenter Server, Web Client, Inventory Service); in 6.0, the Single Sign-On servers become load-balanced PSC nodes]

27 vCenter Troubleshooting
vCenter for Windows has been consolidated and reorganized in this release
Installation and logging directories mimic those of the vCenter Server Appliance in previous releases
Start by narrowing down the component that is causing the problem
Next, review the logs as required to narrow down the issue
Each process now has its own logging directory
28 vCenter Troubleshooting – Installer Logs
vSphere Installer logs
Can show up in %TEMP% or %TEMP%\<number>, e.g. %TEMP%\1
vminst.log – Logging created by custom actions – usually verification and handling of MSI properties
*msi.log (for example, vim-vcs-msi.log or vm-ciswin-msi.log)
MSI installation log – strings produced by the Microsoft Installer backend
pkgmgr.log – contains a list of installed sub-MSIs (for example, VMware-
OpenSSL.msi) and the command lines used to install them
pkgmgr-comp-msi.log – the MSI installation logs for each of the ~50 sub-
MSIs (appended into one file)

29 vCenter Troubleshooting – Log Bundle


Generate log bundles from the command prompt
In the command prompt navigate to
C:\Program Files\VMware\vCenter Server\bin
Run the command vc-support.bat -w %temp%\
The log bundle is generated in the system temp directory
The vc-support script is located in
C:\Program Files\VMware\vCenter Server\bin
To run it, type vc-support.bat in the
C:\Program Files\VMware\vCenter Server\bin folder
Without any parameters, the vc-support output bundle will be written to the %TEMP% folder.
You can specify an alternate location using the -w switch.
...

30 vCenter Server Log Bundle


The support bundle comes in TGZ format
Once the support bundle is on the local machine, you must unzip it
Commands – OS-specific details
Program Files – Configuration and properties for vCenter components
ProgramData – Component log files and firstboot logs
Users – %TEMP% folder
Windows – local Hosts file
The output from the vc-support bundle is in TGZ format.
Components of a support bundle include
Commands – OS-specific details
Program Files – Configuration and properties for vCenter components
ProgramData – Component log files and firstboot logs are located here
Users - %TEMP% folder
Windows – local hosts file

31 VC-Support – ProgramData > VMware > CIS – cfg


What is my database and deployment node type?
\ProgramData\VMware\CIS\cfg\db.type
\ProgramData\VMware\CIS\cfg\deployment.node.type
vCenter Configuration files – vmware-vpx folder
vpxd.cfg, vcdb.properties, embedded_db.cfg
32 vCenter Best Practices
Verify that vCenter, the Platform Services Controller, and any database
have adequate CPU, memory, and disk resources available
Verify that the proper inventory size is configured during the installation
Minimize latency between components (vCenter and Platform Services
Controller) by minimizing network hops between components
External databases should be used for large deployments
If using Enhanced Linked Mode, VMware recommends having external
Platform Services Controllers
Verify that DNS is configured and functional for all components
Verify that time is correct on vCenter and all other components in the
environment
VMware vSphere Update Manager™ should be installed on a separate
system if inventory is large

33 vSphere vMotion
34 vSphere vMotion and vSphere Storage vMotion Troubleshooting
vSphere vMotion and vSphere Storage vMotion are some of the best-logged features in vSphere
Each migration that occurs has a unique Migration ID (MID) that can be used to search the logs for that vSphere vMotion or vSphere Storage vMotion operation
MIDs look as follows:
Each time a vSphere vMotion or vSphere Storage vMotion migration is attempted, all logs can be reviewed to find the error by using grep to search for the term Migrate
Both the source and the destination logs should be reviewed
The following is a list of common log files and errors
VMKernel.log – VMkernel logs usually contain storage or network errors (and possibly vSphere vMotion and vSphere Storage vMotion timeouts)
hostd.log – contains interactions between vCenter and ESXi
vmware.log – virtual machine log file, which will show issues with starting the virtual machine processes
vpxd.log – vSphere vMotion as seen from vCenter; normally shows a timeout or other less relevant data because the errors occur on the host itself
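A minimal sketch of that search from the ESXi Shell (the Migration ID shown is a placeholder):
# Find migration activity on the host
grep -i migrate /var/log/vmkernel.log
# Narrow to a single migration by its MID on both source and destination
grep '1425314346232017' /var/log/vmkernel.log /var/log/hostd.log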

35 vSphere vMotion Troubleshooting – Example vmkernel.log – Source
T16:47:04.555Z cpu0:305224)Migrate: vm : InitMigration:3215: Setting
VMOTION info: Source ts = , src ip = < > dest ip = < > Dest wid = using
SHARED swap
T16:47:04.571Z cpu0:305224)Migrate: StateSet:158: S: Changing state
from 0 (None) to 1 (Starting migration off)
T16:47:04.572Z cpu0:305224)Migrate: StateSet:158: S: Changing state
from 1 (Starting migration off) to 3 (Precopying memory)
T16:47:04.587Z cpu1:3589)Migrate:
VMotionServerReadFromPendingCnx:192: Remote machine is ESX 4.0 or
newer.
T16:47:05.155Z cpu1:588763)VMotionSend: PreCopyStart:1294: S:
Starting Precopy, remote version
T16:47:07.985Z cpu1:305226)VMotion: MemPreCopyIterDone:3927: S:
Stopping pre-copy: only 156 pages left to send, which can be sent within
the switchover time goal of seconds (network bandwidth ~ MB/s, 51865%
t2d)
T16:47:07.991Z cpu1:305226)VMotion: PreCopyDone:3259:
Migration ID
S: for Source
Troubleshooting: Logging – Source
We can see the source and destination vMotion interface IP addresses.
Next, we see the Migration ID and the state changing from no vMotion to starting.
Notice that S: means that this log file is from the source ESXi Server.
We start the pre-copies. We do a number of pre-copies until the amount of memory left to transfer is small enough that we can transfer it in less than a second.
36 vMotion Troubleshooting – Example vmkernel.log – Destination
Migration ID is the same on the Destination
T16:45:35.156Z cpu1:409301)Migrate: vm : InitMigration:3215: Setting
VMOTION info: Dest ts = , src ip = < > dest ip = < > Dest wid = 0 using
SHARED swap
T16:45:35.190Z cpu1:409301)Migrate: StateSet:158: D: Changing state
from 0 (None) to 2 (migration on)
T16:45:35.432Z cpu0:3556)Migrate:
VMotionServerReadFromPendingCnx:192: Remote machine is ESX 4.0 or
newer.
T16:45:36.101Z cpu1:409308)VMotionRecv: PreCopyStart:416: D: got
MIGRATE_MSG_PRECOPY_START
T16:45:36.101Z cpu1:409308)Migrate: StateSet:158: D: Changing state
from 2 (Starting migration on) to 3 (Precopying memory)
T16:45:38.831Z cpu0:409308)VMotionRecv: PreCopyEnd:466: D: got
MIGRATE_MSG_PRECOPY_END
T16:45:38.831Z cpu0:409308)VMotionRecv: PreCopyEnd:478: D:
Estimated network bandwidth MB/s during pre-copy
T16:45:38.917Z cpu0:409308)Migrate: StateSet:158: D: Changing state
from 3 (Precopying memory) to 5 (Transferring cpt data)
T16:45:39.070Z cpu0:409308)Migrate: StateSet:158: D: Changing state
from 5 (Transferring cpt data) to 6 (Loading cpt data)
D: for Destination
Troubleshooting: Logging – Destination
This is the destination log file for the same vMotion.
We can tell it is the same migration because the Migration ID is identical.
We can tell it is the destination because it uses D:.
Highlighted is the entry that is logged when the destination receives a message from the source telling it that the pre-copy iterations have finished.
At the bottom, we can see it loading the checkpoint data.
37 vSphere vMotion Best Practices
ESXi host hardware should be as similar as possible to avoid failures
VMware Virtual Machine Hardware compatibility is important to avoid
failures as newer hardware revisions cannot be run on older ESXi hosts
10 Gb networking will improve vSphere vMotion performance
vSphere vMotion networking should be segregated from other traffic to prevent saturation of network links
Multiple network cards can be configured for vSphere vMotion VMkernel
networking to improve performance of migrations

38 vSphere Storage vMotion Best Practices


If vSphere Storage vMotion traffic takes place on storage that might also
have other I/O loads (from other VMs on the same ESXi host or from
other hosts), it can further reduce the available bandwidth, so it should
be done during times when there will be less impact
vSphere Storage vMotion will have the highest performance during times of low storage activity (when available storage bandwidth is highest) and when the workload in the VM being moved is least active
vSphere Storage vMotion can perform up to four simultaneous disk copies per vSphere Storage vMotion operation. However, vSphere Storage vMotion will involve each datastore in no more than one disk copy at any one time. This means, for example, that moving four VMDK files from datastore A to datastore B will occur serially, but moving four VMDK files from datastores A, B, C, and D to datastores E, F, G, and H will occur in parallel
For performance-critical vSphere Storage vMotion operations involving VMs with multiple VMDK files, you can use anti-affinity rules to spread the VMDK files across multiple datastores, thus ensuring simultaneous disk copies
vSphere Storage vMotion will often have significantly better performance on vStorage APIs for Array Integration (VAAI)-capable storage arrays

39 Availability
vSphere High Availability

40 vSphere High Availability Technical Details


vSphere High Availability uses a different agent model from that used by
previous versions of vSphere High Availability
There is no longer any primary/secondary host designation
It uses both the management network and storage devices for
communication
It introduces IPv6 support
vSphere High Availability also has a new deployment and configuration
mechanism which reduces the cluster configuration time to ~1 minute by
configuring hosts in parallel rather than serially
This is a vast improvement when compared to the ~1 minute per host in
previous versions of vSphere High Availability
It supports the concept of management network partitioning, where the
cluster can continue to function when some hosts are unreachable on
the management network
Error reporting has also been improved with FDM. Now it uses a single
log file per host and supports syslog integration

41 vSphere High Availability Technical Details (cont.)


In the vSphere High Availability architecture, each host in the cluster runs
an FDM agent
The FDM agents do not use vpxa and are completely decoupled from it
The agent (or FDM) on one host is the master, and the agents on all other hosts are its slaves
When vSphere High Availability is enabled, all FDM agents participate in an election to choose the master
The agent that wins the election becomes the master
If the host that is serving as the master should subsequently fail, be shut down, or need to abdicate its role, a new master election is held

42 vSphere High Availability Technical Details – Role of the Master
A master monitors ESXi hosts and VM availability
A master will monitor slave hosts, and it will restart VMs in the event of a slave host failure
It manages the list of hosts that are members of the cluster and manages adding and removing hosts from the cluster
It monitors the power state of all the protected VMs, and if one should fail, it will restart the VM
It manages the list of protected VMs and updates this list after each user-initiated power on or power off
It sends heartbeats to the slaves so the slaves know the master is alive
It caches the cluster configuration and informs the slaves of changes in configuration
A master reports state information to vCenter through property updates
Comparing this list to the existing product, note that many of the master's responsibilities also exist in the AAM architecture.
For example, in both architectures, the health of hosts is monitored.
In the AAM architecture, these responsibilities were shared by the vpxa/HA agents on each host and by the AAM primary that ran the rules.
In the FDM architecture, these responsibilities are all provided by the FDM agent running on the master host.

43 vSphere High Availability Technical Details – Role of the Slave
A slave monitors the runtime state of the VMs running locally and
forwards significant state changes to the master
It implements vSphere High Availability features that do not require
central coordination, most notably VM health monitoring
It monitors the health of the master, and if the master should fail, it
participates in a new master election

44 vSphere High Availability Technical Details – Master and Slave Summary Views
45 vSphere High Availability Technical Details – Master Election
A master is elected when the following conditions occur
vSphere High Availability is enabled
A master host fails
A management network partition occurs
The following algorithm is used for selecting the master
If a host has the greatest number of datastores, it is the best host
If there is a tie, then the host with the lexically highest moid is chosen.
For example, moid "host-99" would be higher than moid "host-100" because the character 9 sorts after 1
After a master is elected and contacts vCenter, vCenter sends a compatibility list to the master, which saves it on its local disk and then pushes it out to the slave hosts in the cluster
vCenter normally talks only to the master. It will sometimes talk to FDM agents on other hosts, especially if the master states that it cannot reach a slave agent. vCenter will try to contact the other host to figure out why
Moid – Managed Object ID – vCenter identifier
There are some other scenarios in which vCenter will talk to the other FDM agents
When scanning for a master
When vCenter powers on a vSphere FT secondary VM
When a host is reported isolated or partitioned

46 vSphere High Availability Technical Details – Partitioning


Under normal operating conditions, there is only one master
However, if a management network failure occurs, a subset of the hosts
might become isolated. This means that they cannot communicate with
the other hosts in the cluster over the management network
In such a situation, when the hosts can continue to ping the isolation response IP but not the other hosts, FDM considers the cluster network partitioned
Each partition without an existing master will elect a new one
Thus, a partitioned cluster state will have multiple masters, one per partition
However, vCenter cannot report on more than one master, so you might see details for only one partition – the master that vCenter finds first
When a network partition is corrected, one of the masters will take over
from the others, thus reverting back to a single master

47 vSphere High Availability Technical Details – Isolation


In some ways this is similar to a network partition state, except that a
host can no longer ping the default gateway/isolation IP address
In this case, a host is called network isolated
The host has the ability to inform the master that it is in this isolation
state, through files on the heartbeat datastores, which will be discussed
shortly
Then the Host Isolation Response is checked to determine whether the VMs on this host should be shut down or left powered on
If they are powered off, they can be restarted on other hosts in the
cluster

48 vSphere High Availability Technical Details – Virtual Machine Protection
The master is responsible for restarting any protected VMs that fail
The trigger to protect a VM is the master observing that the power state
of the VM changes from powered off to powered on
The trigger to unprotect a VM is the master observing the VM’s power state changing from powered on to powered off
After the master protects the VM, the master will inform vCenter that the
VM has been protected, and vCenter will report this fact through the
vSphere High Availability Protection runtime property of the VM
Because templates are in essence powered-off VMs, templates are not
protected. Further, VMs that are created from the templates are not
protected until the VMs are powered on.
Periodically (that is, once every 5 minutes), vCenter will compare the list
it has to the protected VM list last reported by the FDM master. If there
are any VMs on the vCenter list but not the master's, vCenter will call into
the master to inform it of the difference.
The master, in turn, will ensure that each VM in the list provided by
vCenter has been protected.

49 vSphere High Availability Troubleshooting


Troubleshooting vSphere High Availability since vSphere 5.1 is greatly simplified
Agents were upgraded from a third-party component to a component built by VMware called Fault Domain Manager (FDM)
A single log file, fdm.log, now exists for communication of all events
related to vSphere High Availability
When troubleshooting a vSphere High Availability failure, be sure to
collect logs from all hosts in the cluster
This is because when a vSphere High Availability event occurs, VMs might
be moved to any host in the cluster. To track all events, the FDM log for
each host (including the master host) is required
This should be the first port of call for
Partitioning issues
Isolation issues
VM protection issues
Election issues
Failure to failover issues
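A quick sketch of mining those logs on each host in the cluster (the search terms are only examples):
# Look for elections, isolation, and partition events
grep -iE 'election|isolat|partition' /var/log/fdm.log
# Look for restart attempts after a failure
grep -i restart /var/log/fdm.log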

50 vSphere High Availability Best Practices


Networking
When performing maintenance, use the host network maintenance feature to suspend vSphere High Availability monitoring
When changing networking configuration, always reconfigure vSphere
High Availability afterwards
Specify which networks are used for vSphere High Availability
communication. By default, this is the management network
Specify isolation addresses as appropriate for the cluster, if the default
gateway does not allow for ICMP pings
Network paths should be redundant to avoid isolations of vSphere High
Availability

51 vSphere High Availability Best Practices (cont.)


Interoperability
Do not mix versions of ESXi in the same cluster
Virtual SAN uses its network for vSphere High Availability, rather than the
default
When enabling Virtual SAN, vSphere High Availability should be disabled first and then re-enabled afterwards
Admission Control
Select the policy that best matches the need in the environment
Do not disable admission control or VMs might not all be able to fail over
if an event occurs
Size hosts equally to prevent imbalances

52 Availability
vSphere FT

53 vSphere FT Troubleshooting
vSphere FT has been completely rewritten in vSphere 6.0
Now, CPU compatibility is the same as vSphere vMotion compatibility
because the same technology is used to ship memory, CPU, storage, and
network states across to the secondary virtual machine
When troubleshooting
Get logs for both primary and secondary VMs and hosts
Grab logs before log rotation
Ensure time is synchronized on all hosts
When reviewing the configuration, you should find both primary and secondary VMX logs in the primary VM's directory
They will be named vmware.log and vmware-snd.log
Also, be sure to review vmkernel.log and hostd.log from both the
primary and secondary hosts for errors

54 vSphere FT Troubleshooting – General Things To Look For (vmkernel, vmx)
T18:12:25.892Z cpu3:35660)FTCpt: 2401: ( pri) Primary init: nonce
T18:12:25.892Z cpu3:35660)FTCpt: 2440: ( pri) Setting allowedDiffCount =
64
T18:12:25.892Z cpu3:35660)FTCpt: 1217: Queued accept request for
ftPairID
T18:12:25.892Z cpu3:35660)FTCpt: 2531: ( pri) vmx vmm 35662
T18:12:25.892Z cpu1:32805)FTCpt: 1262: ( pri) Waiting for connection
Generally, multiprocessor vSphere FT messages are prefixed with “FTCpt:” in the vmkernel and vmx logs
Like vSphere vMotion, vSphere FT sessions have a unique vSphere FT ID, taken from the migration ID that started it, shared by the vmx, vmkernel, primary, and secondary (this can be used to verify that all logs are present)
The role of the VM is either “pri” or “snd”

55 vSphere FT Troubleshooting – Legacy vSphere FT or vSphere FT?
Check the vmware.log (vmx) file of the primary or secondary
Search for: “ftcpt.enabled”
If present and set to “TRUE”: multiprocessor vSphere FT
Otherwise: legacy (uniprocessor) vSphere FT
Important for triaging failures
56 vSphere FT Troubleshooting – Has vSphere FT Started?
vmkernel.log
T14:32:13.607Z cpu5:89619)FTCpt: 3831: ( pri) Start stamp:
T14:32:13.607Z nonce

T14:46:23.860Z cpu2:89657)FTCpt: 9821: ( pri) Last ack stamp:
T14:46:15.639Z nonce
Grepping for FTCpt in the vmkernel.log provides a robust set of
information.
You can see vSphere FT starting for a VM by finding the keywords Start
Stamp and Last ack stamp.
Also, the vmware.log file provides very clean and easy-to-read notifications of the vSphere FT state changes. Again, grep for FTCpt.
When the secondary is being created, an XvMotion is started, using the vMotion network. If this fails, vSphere FT will fail to start.
vmware.log
T22:56:01.635Z| vcpu-0| I120: FTCpt: Activated ftcpt in VMM.
If you do not see these, vSphere FT may not have started
Check for XvMotion migration errors
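A minimal sketch of those checks from the ESXi Shell (the VM path is a placeholder):
# Confirm vSphere FT activity in the VMkernel log
grep FTCpt /var/log/vmkernel.log
# Determine whether multiprocessor FT is enabled for a VM
grep ftcpt.enabled /vmfs/volumes/datastore1/examplevm/vmware.log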

57 vSphere FT Best Practices


Hosts running primary and secondary VMs should run at approximately
the same processor frequency to avoid errors
Homogeneous clusters work best for vSphere FT
All hosts should have
Common access to datastores used by VMs
The same virtual network configuration
The same BIOS settings (power management, hyperthreading, and so on)
FT logging networks should be configured with 10 Gb networking connections
Jumbo frames can also help the performance of vSphere FT
Network configuration should
Distribute each NIC team over two physical switches
Use deterministic teaming policies to ensure network traffic affinity
ISOs should be stored on shared storage

58 Availability
vSphere Distributed Resource Scheduler

59 DRS Troubleshooting
DRS uses a proprietary algorithm to assess resource usage and to determine which hosts to balance VMs onto
DRS primarily uses vMotion to facilitate movements
Troubleshooting failures generally consists of figuring out why vMotion failed, not DRS itself, because the algorithm simply follows resource utilization
Ensure the following
vSphere vMotion is enabled and configured
The migration aggressiveness is set appropriately
Set DRS to fully automated if approvals are not needed for migrations
To test DRS, from the vSphere Web Client, select the Run DRS option,
which will initiate recommendations
Failures can be assessed and corrected at that time
60 DRS Best Practices
Hosts should be as homogeneous as possible to ensure predictability of
DRS placements
vSphere vMotion should be compatible for all hosts or DRS will not
function
The more hosts available, the better DRS functions because there are
more options for available placement of VMs
VMs that have a smaller CPU/RAM footprint provide more opportunities
for placement across hosts
DRS Automatic mode should be used to take full benefit of DRS
Idle VMs can affect DRS placement decisions
DRS anti-affinity rules should be used to keep VMs apart, such as in the case of a load-balanced configuration providing high availability

61 Content Library
62 Content Library Troubleshooting
The Content Library is easy to troubleshoot because there are two basic
areas to examine
Creation/administration of Content Libraries
This area consists of issues with Content Library creation, storage backing, creation and synchronization of Content Library items, and subscription problems.
Log files are cls-debug.log / cls-cis-debug.log
They are located in /var/log/vmware/vdcs/ or
C:\ProgramData\VMware\CIS\logs\vdcs
Synchronization of Content Libraries
This area consists of issues where there are synchronization failures and
problems with adding items to a content library. You can also track
transfer session ids between cls-debug and ts-debug.
Log files are ts-debug.log / ts-cis-debug.log
They are located in /var/log/vmware/vdcs/ or
C:\ProgramData\VMware\CIS\logs\vdcs

63 Content Library Troubleshooting – Logging (Modifying Level)


VCSA: /usr/lib/vmware-vdcs/vdcserver/webapps/cls/WEB-INF/log4j.properties
Windows: %ProgramFiles%\VMware\vCenter Server\vdcs\vdcserver\webapps\cls\WEB-INF\log4j.properties
Modifying Log Level
Make a backup before editing
Locate the required entries and modify
log4j.logger.xxx / log4j.appender.xxx
Modify the level (OFF / FATAL / ERROR / WARN / INFO / DEBUG / TRACE)
Restart vmware-vdcs service
Log4j.properties is used for configuring the logging for these Java
services.
Follow the steps as defined on the screen.
Notice that Content Library (cls) is highlighted in the paths; it can be replaced with Transfer Service (ts) so that you can configure logging for that service as well.
Note that restarting vmware-vdcs will restart the Virtual Datacenter / Content Library and Transfer Service.
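As a sketch, the entries being edited look like the following (the logger and appender names are placeholders; use the names actually present in the file):
# Raise a logger from INFO to DEBUG
log4j.logger.com.vmware.vdcs = DEBUG
# The matching appender threshold may also need to be lowered
log4j.appender.LOGFILE.Threshold = DEBUG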
[EXTRA INFORMATION]
================
Loggers are logical log file names. They are the names that are known to
the Java application. Each logger is independently configurable as to
what level of logging (FATAL, ERROR, etc.) it currently logs. In early
versions of log4j, these were called category and priority, but now they're
called logger and level, respectively.
The actual outputs are done by Appenders. There are numerous
Appenders available, with descriptive names, such as FileAppender,
ConsoleAppender, SocketAppender, SyslogAppender,
NTEventLogAppender and even SMTPAppender. Multiple Appenders can
be attached to any Logger, so it's possible to log the same information to
multiple outputs; for example to a file locally and to a socket listener on
another computer.
Level descriptions
OFF – The highest possible rank; intended to turn off logging
FATAL – Severe errors that cause premature termination. Expect these to be immediately visible on a status console
ERROR – Other runtime errors or unexpected conditions. Expect these to be immediately visible on a status console
WARN – Use of deprecated APIs, poor use of an API, 'almost' errors, and other runtime situations that are undesirable or unexpected but not necessarily "wrong". Expect these to be immediately visible on a status console
INFO – Interesting runtime events (startup/shutdown). Expect these to be immediately visible on a console, so be conservative and keep to a minimum
DEBUG – Detailed information on the flow through the system. Expect these to be written to logs only
TRACE – The most detailed information. Expect these to be written to logs only

64 Content Library Troubleshooting – Advanced Settings


Modifying Advanced Settings
Administration > System Configuration > Services
Changes take immediate effect
Advanced settings for Content Library and Transfer service appear in the
same location

65 Content Library Troubleshooting


Common problems with the Content Library
User permissions incorrect between the two vCenter Servers; these are managed with Global permissions from the vSphere Web Client
Password protected content libraries can cause authentication failures
when trying to connect to them
Insufficient space on the subscriber can cause errors when trying to synchronize content libraries

66 Content Library Best Practices


Ensure that there is enough available space on the subscriber to be able
to download the content library
Ensure that the synchronization occurs off hours if utilization of
bandwidth is a concern
67 VMware Certificate Authority
68 VMware CA – Management Tools
A set of CLIs is available for managing VMware CA, the VMware Endpoint Certificate Store, and the VMware Directory Service
certool
Use to generate private keys and public keys
Use to request a certificate
Use to promote a plain certificate server to a root CA
dir-cli
Use to create/delete/list/manage solution users in VMDirectory
vecs-cli
Use to create/delete/list/manage key stores in VMware Endpoint
Certificate Store
Use to create/delete/list/manage private keys and certificates in the key
stores
Use to manage the permissions on the key stores
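A minimal sketch of invoking these tools (file names are placeholders; dir-cli prompts for administrator credentials):
# List the key stores in the VMware Endpoint Certificate Store
vecs-cli store list
# List the solution users registered in the directory service
dir-cli service list
# Generate a key pair with certool
certool --genkey --privkey=example.key --pubkey=example.pub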

69 VMware CA – Management Tools (cont.)


By default, the tools are in the following locations
Platform
Location
Windows
C:\Program Files\VMware\vCenter Server\vmafdd\vecs-cli.exe
C:\Program Files\VMware\vCenter Server\vmafdd\dir-cli.exe
C:\Program Files\VMware\vCenter Server\vmca\certool.exe
Linux
/usr/lib/vmware-vmafd/bin/vecs-cli
/usr/lib/vmware-vmafd/bin/dir-cli
/usr/lib/vmware-vmca/bin/certool

70 certool Configuration File


certool uses a configuration file called certool.cfg
Override its values by using the --config=<file name> switch or individual switches such as --Locality="Cork"
OS
Location
VCSA
/usr/lib/vmware-vmca/share/config
Windows
C:\Program Files\VMware\vCenter Server\vmcad
certool.cfg
Country = US
Name= cert
Organization = VMware
OrgUnit = Support
State = California
Locality = Palo Alto
IPAddress =
Hostname = machine.vmware.com
Note: When using the "--Locality" switch to override the Locality information from the certool.cfg file, the keyword "--Locality" must be capitalized.
Other switches (see the next slide) are lowercase.
71 Machine SSL Certificates
The SSL certificates for each node, also called machine certificates, are
used to establish a socket that allows secure communication. For
example, using HTTPS or LDAPS
During installation, VMware CA provisions each machine (vCenter / ESXi)
with an SSL certificate
Used for secure connections to other services and for other HTTPS traffic
The machine SSL certificate is used as follows
By the reverse proxy service on each Platform Service Controller node
SSL connections to individual vCenter services always go to the reverse
proxy. Traffic does not go to the services themselves
By the vCenter service on Management and Embedded nodes
By the VMware Directory Service on PSC and Embedded nodes
By the ESXi host for all secure connections

72 Solution User Certificates


Solution user certificates are used for authentication to vCenter Single Sign-On
vCenter Single Sign-On issues the SAML tokens that allow services and other users to authenticate
Each solution user must be authenticated to vCenter Single Sign-On
A solution user encapsulates several services and uses its certificate to authenticate with vCenter Single Sign-On through SAML token exchange
The Security Assertion Markup Language (SAML) token contains group membership information, so the SAML token can be used for authorization operations
Solution user certificates enable the solution user to use any other
vCenter service that vCenter Single Sign-On supports without
authenticating

73 Certificate Deployment Options


VMware CA Certificates
You can use the certificates that VMware CA assigned to vSphere
components as is
These certificates are stored in the VMware Endpoint Certificate Store on
each machine
VMware CA is a Certificate Authority, but because all certificates are
signed by VMware CA itself, the certificates do not include a certificate
chain
Third-Party Certificates with VMware CA
You can use third-party certificates with VMware CA
VMware CA becomes an intermediary in the certificate chain that the
third-party certificate is using
VMware CA provisions vSphere components that you add to the
environment with certificates that are signed by the full chain
Administrators are responsible for replacing all certificates that are
already in your environment with new certificates

74 Certificate Deployment Options (cont.)


Third-Party Certificates without VMware CA
You can add third-party certificates as is to VMware Endpoint Certificate
Store
Certificates must be stored in VMware Directory Services and VMware
Endpoint Certificate Store, but VMware CA is not included in your
certificate chain
In that case, VMware CA no longer provisions new components with
certificates

75 VMware CA Best Practices


Replacement of the certificates is not required to have trusted
connections
VMware CA is a CA, and therefore, all certificates used by vSphere
components are fully valid and trusted certificates
Addition of the VMware CA as a trusted root certificate will allow the SSL
warnings to be eliminated
Integration of VMware CA to an existing CA infrastructure should be
done in secure environments
This allows the root certificate to be replaced, such that it acts as a
subordinate CA to the existing infrastructure

76 Storage
77 Storage Troubleshooting
Troubleshooting storage is a broad topic that very much depends on the
type of storage in use
Consult the vendor to determine what is normal and expected for
storage
In general, the following are problems that are frequently seen
Overloaded storage
Slow storage

78 Problem 1 – Overloaded Storage


Monitor the number of disk commands aborted on the host
If Disk Command Aborts > 0 for any LUN, then storage is overloaded on
that LUN
What are the causes of overloaded storage?
Excessive demand is placed on the storage device
Storage is misconfigured
Check
Number of disks per LUN
RAID level of a LUN
Assignment of array cache to a LUN

79 Problem 2 – Slow Storage


For a host’s LUNs, monitor Physical Device Read Latency and Physical
Device Write Latency counters
If average > 10ms or peak > 20ms for any LUN, then storage might be
slow on that LUN
Or monitor the device latency (DAVG/cmd) in resxtop/esxtop.
If value > 10, this might be a problem
If value > 20, this is a problem
Three main workload factors that affect storage response time
I/O arrival rate
I/O size
I/O locality
Use the storage device’s monitoring tools to collect data to characterize
the workload
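A quick sketch of that check in esxtop (resxtop behaves the same remotely):
# Start esxtop, then press u for the disk device view
esxtop
# Watch the DAVG/cmd, KAVG/cmd, and GAVG/cmd columns;
# DAVG/cmd is the device latency discussed above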

80 Example 1 – Bad Disk Throughput


[Chart: good throughput with low device latency versus bad throughput with high device latency (due to disabled cache)]
81 Example 2 – Virtual Machine Power On Is Slow
User complaint – Powering on a virtual machine takes longer than usual
Sometimes, powering on a virtual machine takes 5 seconds
Other times, powering on a virtual machine takes 5 minutes!
What do you check?
Check the disk metrics for the host. This is because powering on a virtual
machine requires disk activity

82 Monitoring Disk Latency Using the vSphere Client


Maximum disk latencies range from 100ms to 1100ms. This is very high

83 Using esxtop to Examine Slow VM Power On


Rule of thumb
GAVG/cmd > 20ms = high latency!
What does this mean?
Latency when command reaches device is high.
Latency as seen by the guest is high.
Low KAVG/cmd means command is not queuing in VMkernel

Very Large Values for DAVG/cmd and GAVG/cmd

84 Solving the Problem of Slow VM Power On


Monitor disk latencies if there is slow access to storage
The cause of the problem might not be related to virtualization
Host events show that a disk has connectivity issues. This leads to high latencies

85 Storage Troubleshooting – Resolving Performance Problems


Consider the following when resolving storage performance problems
Check your hardware for proper operation and optimal configuration
Reduce the need for storage by your hosts and virtual machines
Balance the load across available storage
Understand the load being placed on storage devices
To resolve the problems of slow or overloaded storage, solutions can
include the following
Verify that hardware is working properly
Configure the HBAs and RAID controllers for optimal use
Upgrade your hardware, if possible
Consider the trade-off between memory capacity and storage demand
Some applications, such as databases, cache frequently used data in
memory, thus reducing storage loads
Eliminate all possible swapping to reduce the burden on the storage
subsystem
86 Storage Troubleshooting – Balancing the Load
Spread I/O loads over the available paths to the storage
For disk-intensive workloads
Use enough HBAs to handle the load
If necessary, use separate storage processors for separate systems

87 Storage Troubleshooting – Understanding Load


Understand the workload
Use storage array tools to capture workload statistics
Strive for complementary workloads
Mix disk-intensive with non-disk-intensive virtual machines on a
datastore
Mix virtual machines with different peak access times

88 Storage Best Practices – Fibre Channel


Best practices for Fibre Channel arrays
Place only one VMFS datastore on each LUN
Do not change the path policy the system sets for you unless you
understand the implications of making such a change
Document everything. Include information about zoning, access control,
storage, switch, server and FC HBA configuration, software and firmware
versions, and storage cable plan
Plan for failure
Make several copies of your topology maps. For each element, consider
what happens to your SAN if the element fails
Cross off different links, switches, HBAs and other elements to ensure
you did not miss a critical failure point in your design
Ensure that the Fibre Channel HBAs are installed in the correct slots in
the host, based on slot and bus speed. Balance PCI bus load among the
available busses in the server
Become familiar with the various monitor points in your storage
network, at all visibility points, including host's performance charts, FC
switch statistics, and storage performance statistics
Be cautious when changing IDs of the LUNs that have VMFS datastores
being used by your ESXi host. If you change the ID, the datastore
becomes inactive and its virtual machines fail

89 Storage Best Practices – iSCSI


Best practices for iSCSI arrays
Place only one VMFS datastore on each LUN. Multiple VMFS datastores
on one LUN is not recommended
Do not change the path policy the system sets for you unless you
understand the implications of making such a change
Document everything. Include information about configuration, access
control, storage, switch, server and iSCSI HBA configuration, software
and firmware versions, and storage cable plan
Plan for failure
Make several copies of your topology maps. For each element, consider
what happens to your SAN if the element fails
Cross off different links, switches, HBAs, and other elements to ensure
you did not miss a critical failure point in your design
Ensure that the iSCSI HBAs are installed in the correct slots in the ESXi
host, based on slot and bus speed. Balance PCI bus load among the
available busses in the server
If you need to change the default iSCSI name of your iSCSI adapter, make
sure the name you enter is worldwide unique and properly formatted. To
avoid storage access problems, never assign the same iSCSI name to
different adapters, even on different hosts

90 Storage Best Practices – NFS


Best practices for NFS arrays
Make sure that NFS servers you use are listed in the VMware Hardware
Compatibility List. Use the correct version for the server firmware
When configuring NFS storage, follow the recommendations from your
storage vendor
Verify that the NFS volume is exported using NFS over TCP
Verify that the NFS server exports a particular share as either NFS 3 or
NFS 4.1, but does not provide both protocol versions for the same share.
This policy needs to be enforced by the server because ESXi does not
prevent mounting the same share through different NFS versions
NFS 3 and non-Kerberos NFS 4.1 do not support the delegate user
functionality that enables access to NFS volumes using nonroot
credentials. Typically, this is done on the NAS servers by using the
no_root_squash option
If the underlying NFS volume, on which files are stored, is read-only,
make sure that the volume is exported as a read-only share by the NFS
server, or configure it as a read-only datastore on the ESXi host.
Otherwise, the host considers the datastore to be read-write and might
not be able to open the files
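As a sketch (the server name, export path, and datastore name are placeholders), an NFS 3 export can be mounted from the ESXi Shell as follows:
# Mount an NFS 3 export as a datastore, then verify it
esxcli storage nfs add --host=nas01.example.com --share=/vol/exports/ds1 --volume-name=nfs-ds1
esxcli storage nfs list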

91 Networking
92 Networking Troubleshooting
Troubleshooting networking is very similar to physical network
troubleshooting
Start by validating connectivity
Look at network statistics from esxtop as well as the physical switch
Is it a network performance problem?
Validate throughput
Is CPU load too high?
Are packets being dropped?
Is the issue limited to the virtual environment, or is it seen in the physical
environment too?
One of the biggest issues that VMware has observed is dropped network
packets (discussed next)

93 Network Troubleshooting – Dropped Network Packets


Network packets are queued in buffers if the
Destination is not ready to receive them (Rx)
Network is too busy to send them (Tx)
Buffers are finite in size
Virtual NIC devices buffer packets when they cannot be handled
immediately
If the queue in the virtual NIC fills, packets are buffered by the virtual
switch port
Packets are dropped if the virtual switch port fills
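A quick sketch of spotting these drops in esxtop:
# Start esxtop, then press n for the network view
esxtop
# Watch the %DRPTX and %DRPRX columns for each port;
# nonzero values indicate dropped transmit or receive packets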

94 Example Problem 1 – Dropped Receive Packets


If a host’s droppedRx value > 0, there is a network throughput issue
Cause – High CPU utilization
Solution – Increase the CPU resources provided to the virtual machine, or increase the efficiency with which the virtual machine uses CPU resources
Cause – Improper guest operating system driver configuration
Solution – Tune the network stack in the guest operating system, or add virtual NICs to the virtual machine and spread the network load across them
95 Example Problem 2 – Dropped Transmit Packets
If a host’s droppedTx value > 0, there is a network throughput issue
Cause – Traffic from the set of virtual machines sharing a virtual switch exceeds the physical capabilities of the uplink NICs or the networking infrastructure
Solution – Add uplink capacity to the virtual switch, move some virtual machines with high network demand to a different virtual switch, enhance the networking infrastructure, or reduce network traffic
96 Networking Best Practices
CPU plays a large role in performance of virtual networking. More CPUs,
therefore, will generally result in better network performance
Sharing physical NICs is good for redundancy, but it can impact other
consumers if the link is overutilized. Carefully choose the policies and
how items are shared
Traffic between virtual machines on the same system does not need to
go external to the host if they are on the same virtual switch. Consider
this when designing the network
Distributed vSwitches should be used whenever possible because they
offer greater granularity on traffic flow than standard vSwitches
vSphere Network and Storage I/O Control can dramatically help with
contention on systems. This should be used whenever possible
VMware Tools, and subsequently VMXNET3 drivers, should be used in all
virtual machines to allow for enhanced network capabilities

97 Questions
98 VMware vSphere 6.0 Knowledge Transfer Kit
VMware, Inc Hillview Ave Palo Alto, CA 94304