Advanced SAN Troubleshooting: Mike Frase
Session BRKSAN 3708
Session_3708 © 2008 Cisco Systems, Inc. All rights reserved. Cisco Public 2
Link Failure Condition (Port A and Port B)
Both ports start and end in the AC state.
Sequences shown in the figure, top to bottom: NOS, OLS, LR, LRR, Idle, Idle
These are all special ordered sets of 8B/10B coding:
NOS = Not Operational Sequence
OLS = Offline Sequence
LR = Link Reset
LRR = Link Reset Response
Port states:
AC = Active State
LF = Link Failure State
OL = Offline State
LR = Link Recovery State
Primitive Sequences
Counters can determine Layer 0–1 problems.
Tip:
Clear counters and monitor them to verify active issues. Use the Device Manager Monitor tool to monitor live, and set and activate Threshold Manager to alert you.
MDS_Switch# clear counters interface fc 1/1
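The tip above amounts to comparing two counter snapshots taken some time apart and looking at what is still incrementing. A minimal Python sketch, assuming the counters have already been collected into dictionaries; the function name and the counter names are illustrative placeholders, not actual MDS output fields:

```python
def incrementing_counters(before, after):
    """Return the counters whose value grew between two snapshots."""
    deltas = {}
    for name, new_value in after.items():
        delta = new_value - before.get(name, 0)
        if delta > 0:
            deltas[name] = delta
    return deltas


# Hypothetical snapshots taken a few minutes apart, after clearing counters.
snapshot_1 = {"link_failures": 0, "sync_losses": 0, "invalid_crc": 0}
snapshot_2 = {"link_failures": 0, "sync_losses": 3, "invalid_crc": 12}

print(incrementing_counters(snapshot_1, snapshot_2))
```

A counter that keeps growing after a clear points to an active issue rather than a historical one.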
Exit to Detach
LINK_ACTIVE
032907 014953 369768 (0000) E_LINK_IDLE LINK_LR_RX
032907 014953 368963 (0000) E_LINK_LR LINK_NOS_RX
032907 014953 365690 (0000) E_LINK_NOS LINK_OLS_TX
032907 014953 365593 (0001) E_LINK_MIN_OLS LINK_INIT
032907 014953 360463 (016D) E_LINK_LINK_INIT LINK_DIS
032907 014949 710690 (413C) E_LINK_CLEANUP LINK_ACTIVE
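Link event entries like the ones above can be easier to follow in chronological order. The sketch below parses them in Python. The field meanings are assumptions inferred from the layout (date as MMDDYY, time as HHMMSS, microseconds, a hex code in parentheses, an event name, and a state name), and `parse_event` is a hypothetical helper, not a switch feature.

```python
import re
from datetime import datetime

# Assumed layout: MMDDYY HHMMSS usec (hexcode) EVENT STATE
LINE_RE = re.compile(
    r"^(\d{6}) (\d{6}) (\d{6}) \(([0-9A-Fa-f]{4})\) (\S+) (\S+)$"
)


def parse_event(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    date, time, usec, code, event, state = m.groups()
    stamp = datetime.strptime(f"{date} {time}", "%m%d%y %H%M%S")
    stamp = stamp.replace(microsecond=int(usec))
    return {"time": stamp, "code": int(code, 16), "event": event, "state": state}


log = """\
032907 014953 369768 (0000) E_LINK_IDLE LINK_LR_RX
032907 014953 368963 (0000) E_LINK_LR LINK_NOS_RX
032907 014953 365690 (0000) E_LINK_NOS LINK_OLS_TX
032907 014953 365593 (0001) E_LINK_MIN_OLS LINK_INIT
032907 014953 360463 (016D) E_LINK_LINK_INIT LINK_DIS
032907 014949 710690 (413C) E_LINK_CLEANUP LINK_ACTIVE
"""

events = [e for e in map(parse_event, log.splitlines()) if e is not None]
# The log lists the newest entry first, so sort ascending to read it in order.
for e in sorted(events, key=lambda e: e["time"]):
    print(e["time"].time(), e["event"], e["state"])
```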
FLOGI (acc)
PLOGI (acc)
ACC PLOGI (acc)
Addresses
From Fabric Manager
FCID: Domain / Area / Host = 0b / 00 / 01
Port World Wide Name
Vendor name derived from the standards-assigned OUI
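The Domain / Area / Host split of a 24-bit FCID is simple byte arithmetic. A short Python sketch, using the 0b / 00 / 01 example from the slide; `split_fcid` is a hypothetical helper name:

```python
def split_fcid(fcid):
    """Split a 24-bit FCID into (domain, area, host) bytes."""
    domain = (fcid >> 16) & 0xFF
    area = (fcid >> 8) & 0xFF
    host = fcid & 0xFF
    return domain, area, host


domain, area, host = split_fcid(0x0B0001)
print(f"domain=0x{domain:02x} area=0x{area:02x} host=0x{host:02x}")
```

The domain byte identifies the switch that assigned the address, which is often the quickest way to tell where a device logged in.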
Addresses
From Performance Manager
Wireshark Uses
Wireshark (once known as Ethereal) is part of the SAN-OS system image and can be run directly on the switch via SSH or Telnet (fcanalyzer command).
The combination of Wireshark on a PC with a PAA can give a complete look at the flow beyond the FLOGI/PLOGI process.
PLOGI to storage
ISL
• Common Fabric Parameters required
• Principal switch selection
• Domain IDs unique across SAN / VSAN
• FSPF routing is activated
• Zone Merging must occur
• Possible VSAN trunking involved
• Possible Port Channeling involved
The Domain ID
Domain IDs are assigned by the principal switch based on the domain ID that the non-principal switch requests.
On a fresh switch, the search for a free domain starts from 239 and goes in decreasing order.
Before a switch joins a fabric, it assigns itself a domain ID based on its configured domain ID. If the configured domain ID type is preferred and the configured domain ID is 0, then it assigns itself a random domain ID.
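The assignment rules above can be sketched as a small Python function. This is a simplified model, not the switch's actual algorithm: the function name is hypothetical, and the handling of a requested ID that is already taken (fall back to the search for a preferred request, refuse a static one) is an assumption made for illustration.

```python
def assign_domain_id(requested, in_use, preferred=True):
    """Simplified model of a principal switch granting a domain ID.

    requested: domain ID asked for by the joining switch (0 means no preference)
    in_use:    set of domain IDs already assigned in the fabric
    preferred: True for a 'preferred' request, False for a 'static' one
    """
    if 1 <= requested <= 239 and requested not in in_use:
        return requested
    if not preferred:
        # Assumption: a static request that cannot be honoured is refused.
        return None
    # Search for a free domain starting at 239 and going downward.
    for candidate in range(239, 0, -1):
        if candidate not in in_use:
            return candidate
    return None


print(assign_domain_id(0, set()))        # no preference, empty fabric
print(assign_domain_id(11, {11, 239}))   # preferred ID already taken
```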
Domain IDs
Configured Domains in a VSAN
Zoning
Zoning Choices
Switch FC Interface or Fabric Port WWN (FWWN)
Enhanced Zoning
FCanalyzer
SPAN & PAA (WireShark usage)
SAN/OS (Output analysis, debug, logs, Cores)
Performance Manager (Licensed part of Fabric Manager)
NTOP (Using Netflow and SPAN w/PAA)
[Figure: troubleshooting tools topology. Labels: iSCSI servers (iSCSIserver3, iSCSI.server4); Server1, Server2, Server3; Storage; PAA; Wireshark PC; PC running Fabric Manager and Device Manager on the Ethernet out-of-band management network; FC Analyzer (local and remote); terminal access for debugs and show commands via Telnet or console; SPAN of FC ports, port channels, iSCSI or FCIP ports to an SD port on the MDS.]
FC Analyzer Options
FC Linecard
Performance Manager
Accounting
SD
LUN 0 Traffic
Device Connections
ISLs
Zoning
IVR
NPV
[Figure labels: MDS 9124, PAA, Ethernet, Fibre Channel, Wireshark application, response side of trace.]
ISL Trace
ACK1 Filter Applied
Build Fabric
Debugging Zoning
LUNs Targets
port-channel 1,
Success
<show port internal info> will give greater detail on the merge failure reason
Callouts: different switches, same zone name, different member
BEAR %ZONE-2-ZS_MERGE_FAILED: %$VSAN 1%$ Zone merge failure, isolating interface fc1/2 error:
Member mismatch
Guernsey %ZONE-2-ZS_MERGE_FAILED: %$VSAN 1%$ Zone merge failure, isolating interface fc2/1 error:
Received rjt from adjacent switch
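When collecting these syslog messages from several switches, it can help to pull out the VSAN, the isolated interface, and the reason. A minimal Python sketch, assuming each message has first been joined onto a single line (the slide wraps them across two); `parse_merge_failure` is a hypothetical helper:

```python
import re

MERGE_RE = re.compile(
    r"%ZONE-2-ZS_MERGE_FAILED: %\$VSAN (\d+)%\$ Zone merge failure, "
    r"isolating interface (\S+) error: (.+)$"
)


def parse_merge_failure(message):
    """Extract VSAN, interface and reason from a zone merge failure message."""
    m = MERGE_RE.search(message)
    if m is None:
        return None
    return {
        "vsan": int(m.group(1)),
        "interface": m.group(2),
        "reason": m.group(3).strip(),
    }


messages = [
    "BEAR %ZONE-2-ZS_MERGE_FAILED: %$VSAN 1%$ Zone merge failure, "
    "isolating interface fc1/2 error: Member mismatch",
    "Guernsey %ZONE-2-ZS_MERGE_FAILED: %$VSAN 1%$ Zone merge failure, "
    "isolating interface fc2/1 error: Received rjt from adjacent switch",
]

for msg in messages:
    print(parse_merge_failure(msg))
```

In the pair above, the switch reporting "Member mismatch" carries the more specific reason, while its neighbor only sees the reject.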
Best Practices
VSAN10
Zoning
Keep default zone policy at deny
Manage local zones from an IVR-enabled switch
• Are devices logged into their local VSAN? (show flogi database)
• Are devices exported into remote FCNS in both directions? (show fcns database)
• Are FC devices in the native VSAN FCNS in all switches in that VSAN? (show fcns database)
• Use command line to view active local zoneset (show zoneset active)
• Does the same IVR zone show up in both the local and remote VSANs' active zoneset?
• Did IVR zoneset activation succeed in all VSANs for the affected devices?
• Is it possible that a NATted FCID changed because of a reload, causing AIX or HP-UX to have target binding issues?
• Ensure the HBA is not configured to time out PLOGI too quickly. IVR NAT delays ACC to PLOGI for a few seconds. Most HBAs have a 10 second timeout.
NPV Troubleshooting
NPV-Core Switch
fwwn of Port P1
NPV-Core Switch
VSAN:1 FCID:0x331e00
------------------------
port-wwn (vendor) :2f:ff:00:06:2b:10:c4:4c (LSI)
node-wwn :2f:ff:00:06:2b:10:c4:4c
class :3
node-ip-addr :0.0.0.0
ipa :ff ff ff ff ff ff ff ff
fc4-types:fc4_features :scsi-fcp:init
symbolic-port-name :LSI7404XP-LC BR A.1 03-01081-03A FW:01.03.17 Port 0
symbolic-node-name :
port-type :N
port-ip-addr :0.0.0.0
fabric-port-wwn :20:02:00:0d:ec:04:99:40
hard-addr :0x000000
permanent-port-wwn (vendor) :20:02:00:0d:ec:2f:c1:40 (Cisco)
[Figure labels: F (P1), NP (P2, fc1/2), NPV, F (P3), fc1/1, N (P4), NPV Switch, FC; callouts "fwwn of Port P2" and "fwwn of Port P1"; NP port table row: P2, 2 mapped F ports (fc1/7, fc1/21).]
Next F port on NPV would be assigned to NP Port P2
(NP port with minimum number of mapped F ports)
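The selection rule stated above (pick the NP port with the fewest mapped F ports) can be sketched in a few lines of Python. The function name, the P1 entry, and the tie-breaking behavior (first NP port in the mapping wins) are assumptions for illustration; only the P2 row comes from the slide.

```python
def pick_np_port(mapping):
    """Return the NP port with the fewest mapped F ports.

    mapping: dict of NP port name -> list of F ports currently mapped to it.
    Ties go to the first NP port in the dict (an assumption of this sketch).
    """
    return min(mapping, key=lambda np_port: len(mapping[np_port]))


np_to_f_ports = {
    "P1": ["fc1/3", "fc1/9", "fc1/15"],  # hypothetical entries
    "P2": ["fc1/7", "fc1/21"],           # from the slide
}

print(pick_np_port(np_to_f_ports))
```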
Conflict in Port-security F
FDISC-converted FLOGI of connecting host or storage device on the NPV switch
PLOGI accept to the real HBA on the MDS NPV switch; FCID of attached device is 0b0002
Remaining exchanges are from device 0b0002 to the core MDS switch
The following show commands can be used on the NPV-core switch to display information about the NPV devices. Because these outputs are based on name server information, the commands can be run from any non-NPV MDS switch running release 3.2(1) or later.
Example Outputs
VSAN 1:
-------------------------------------------------------------------------------
NPV NODE-NAME NPV IP_ADDR NPV IF CORE SWITCH WWN CORE IF
-------------------------------------------------------------------------------
20:00:00:0d:ec:3d:62:80 10.1.96.24 fc1/20 20:00:00:0d:ec:2d:af:40 fc4/4
20:00:00:0d:ec:3d:62:80 10.1.96.24 fc1/19 20:00:00:0d:ec:2d:af:40 fc4/3
20:00:00:0d:ec:3d:62:80 10.1.96.24 fc1/17 20:00:00:0d:ec:2d:af:40 fc4/1
...
....
VSAN 1:
--------------------------------------------------------------------------
FCID TYPE PWWN (VENDOR) FC4-TYPE:FEATURE
--------------------------------------------------------------------------
0x330f00 N 2f:ff:00:06:2b:10:c7:b2 (LSI) scsi-fcp:init
0x331000 N 2f:ff:00:06:2b:10:c7:b3 (LSI) scsi-fcp:init
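Rows in this name server format can be turned into records for scripted checks, for example to confirm that every NPV-attached device carries the core switch's domain in its FCID. A Python sketch under the assumption that rows look like the two above; `parse_fcns_row` is a hypothetical helper and the pattern is not guaranteed to cover every output variant:

```python
import re

ROW_RE = re.compile(
    r"^(0x[0-9a-f]{6})\s+(\S+)\s+((?:[0-9a-f]{2}:){7}[0-9a-f]{2})\s+"
    r"(?:\((.+?)\)\s*)?(.*)$",
    re.IGNORECASE,
)


def parse_fcns_row(line):
    """Parse one FCID/TYPE/PWWN/FC4 row into a dict, or return None."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    fcid, port_type, pwwn, vendor, fc4 = m.groups()
    return {
        "fcid": int(fcid, 16),
        "domain": int(fcid, 16) >> 16,  # top byte of the FCID
        "type": port_type,
        "pwwn": pwwn,
        "vendor": vendor,
        "fc4": fc4,
    }


rows = [
    "0x330f00 N 2f:ff:00:06:2b:10:c7:b2 (LSI) scsi-fcp:init",
    "0x331000 N 2f:ff:00:06:2b:10:c7:b3 (LSI) scsi-fcp:init",
]

for row in rows:
    entry = parse_fcns_row(row)
    print(hex(entry["fcid"]), hex(entry["domain"]), entry["pwwn"], entry["vendor"])
```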
Wrap-up
The cornerstone of SAN network troubleshooting is understanding how the standards operate
Core Dumps
Reported failure on ports 1/5-1/8 (Fibre Channel) due to X-bar Interface ASIC Error in
device 5 (device error 0xc05006a3)
• The reason for shutdown is 'bit error rate exceeded'; this follows the FICON specification
• By actively disabling a bad link, we can minimize the side effects on other good links
• A bad link can end up causing link flaps on good links, because of error handling mechanisms implemented in some storage systems
You can then clear all, which does not clear all
Core Dumps
Show cores
Guernsey# sh cores
Module-num Process-name PID Core-create-time
----------------- --------------------- ---------- -----------------------
1 cimxmlserver 20029 Jul 18 08:39
Recommended Reading