FibreCAT SX60 / SX80 / SX88
Service Manual
Edition September 2007 (Version 1.2, 2007-09-17)

Comments… Suggestions… Corrections…
The User Documentation Department would like to know your
opinion on this manual. Your feedback helps us to optimize our
documentation to suit your individual needs.
Feel free to send us your comments by e-mail to:
manuals@fujitsu-siemens.com
Certified documentation
according to DIN EN ISO 9001:2000
To ensure a consistently high quality standard and
user-friendliness, this documentation was created to
meet the regulations of a quality management system which
complies with the requirements of the standard
DIN EN ISO 9001:2000.
cognitas. Gesellschaft für Technik-Dokumentation mbH
www.cognitas.de
Copyright and Trademarks

Copyright © Fujitsu Siemens Computers GmbH 2007.
All rights reserved.
Delivery subject to availability; right of technical modifications reserved.
All hardware and software names used are trademarks of their respective manufacturers.
This manual is printed on paper treated with chlorine-free bleach.
Contents
1 Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1 Before You Read This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.2 Notational Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Architecture Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2 Controller and Expansion Enclosure Architecture . . . . . . . . . . . . . . . . . 13
2.2.1 Midplane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Enclosure ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2.1 Expansion Enclosure Usage Behind a RAID System . . . . . . . . . . . . . . . 14
2.3 Drive Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.1 Disk Drives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.2 Drive Module Dongle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.2.1 SAS Drive Dongle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.2.2 SATA Drive Dongle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Controller and Expansion Modules . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.1 Controller Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.1.1 Storage Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.1.2 Management Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.2 SAS Data Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4.3 Host Interface Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4.3.1 Host Interface Speed for FibreCAT SX60 / SX80 in Direct Attached Configurations 21
2.4.4 Expansion Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.5 Power and Cooling Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.5.1 Cooling Fans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.5.2 Airflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3 Fault Isolation Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1 Gather Fault Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.2 Determine Where the Fault Is Occurring . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Review the Event Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4 Isolate the Fault . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4 Troubleshooting Using System LEDs . . . . . . . . . . . . . . . . . . . . . . . . 29
4.1 Enclosure LEDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.1.1 Status LEDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.1.2 Enclosure ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 Drive Module LEDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3 Controller Module LEDs and Fault Isolation . . . . . . . . . . . . . . . . . . . . . 33
4.3.1 Isolating Faults Using the Host Link Status LEDs . . . . . . . . . . . . . . . . . . . . 33
4.3.2 Isolating Faults Using the Expansion Port Status LED . . . . . . . . . . . . . . . . . 35
4.3.3 Isolating Management Connection Faults . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.4 Checking Controller Module Status and Functionality . . . . . . . . . . . . . . . . . 38
4.3.5 FRU Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.6 FRU Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.7 Cache Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4 Power and Cooling Module LEDs . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5 Expansion Enclosure LEDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5 Troubleshooting Using WBI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.1 Determining Overall Array Status and Verifying Faults . . . . . . . . . . . . . . . 44

5.2 Stopping I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3 Isolating Faulty Disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3.1 Identifying a Faulty Disk Drive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3.2 Reviewing Disk Error Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3.3 Capturing Trend Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.3.4 Reviewing the Event Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5.4 Isolating Data Path Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5.4.1 Isolating Internal Data Path Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.4.2 Using the Expander Status Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.4.3 Reviewing the Event Log for Disabled PHYs . . . . . . . . . . . . . . . . . . . . . . 53
5.4.4 Resolving PHY Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.4.5 Isolating External Data Path Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.4.6 Resetting Host Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.5 Isolating FRU Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.6 Clearing Metadata From a Leftover Disk Drive . . . . . . . . . . . . . . . . . . . 55
5.7 Using Diagnostic Manage-Level Only Options . . . . . . . . . . . . . . . . . . . 55
5.7.1 Enabling and Using the Trust Virtual Disk for Disaster Recovery . . . . . . . . . . . 56
5.7.2 Clearing Unwritable Cache Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.7.3 Viewing the Debug Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.7.4 Viewing Error Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.7.5 Viewing CAPI Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.7.6 Viewing Management Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.7.7 Configuring Event Notification By Selecting Individual Events . . . . . . . . . . . . . 60
5.8 Using Advanced Manage-Level Recovery and Debug Utilities . . . . . . . . . . 62
5.8.1 Dequarantining a Virtual Disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.8.2 Saving Log Information to a File . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.9 Problems Accessing the Array Using FibreCAT SX Manager’s WBI . . . . . . . . 64
6 Troubleshooting Using Event Logs . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.1 Using Event Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

6.1.1 Saving the Event Log to a File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.1.2 Event Log Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.1.3 Event Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.1.4 Reviewing Event Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2 Using the Debug Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2.1 Setting Up the Debug Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2.2 Viewing the Debug Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

7 Voltage and Temperature Warnings . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.1 Resolving Voltage and Temperature Warnings . . . . . . . . . . . . . . . . . . . 73

7.2 Sensor Locations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.2.1 Power Supply Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.2.2 Cooling Element Fan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.2.3 Temperature Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.2.4 Voltage Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
8 Troubleshooting and Replacing FRUs . . . . . . . . . . . . . . . . . . . . . . . . 79
8.1 Available FRUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

8.2 Identifying a FibreCAT SX Enclosure by Its Serial Number . . . . . . . . . . . . . 83
8.2.1 Querying the Serial Number Remotely . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.2.2 Table of FibreCAT SX60 / SX80 / SX88 Serial Numbers . . . . . . . . . . . . . . . . 84
8.3 Identifying FibreCAT SX Spares Lists in Ersin . . . . . . . . . . . . . . . . . . . 85
8.4 Filling Out the Field Return Tag Form . . . . . . . . . . . . . . . . . . . . . . . . 88
8.5 Static Electricity Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.6 Identifying Controller or Expansion Module Faults . . . . . . . . . . . . . . . . . 90
8.7 Removing and Replacing a Controller or Expansion Module . . . . . . . . . . . 91
8.7.1 Saving Configuration Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8.7.2 Shutting Down a Controller Module . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
8.8 Removing a Controller Module or Expansion Module . . . . . . . . . . . . . . . 93
8.8.1 Installing a Controller Module or Expansion Module . . . . . . . . . . . . . . . . . 95
8.8.1.1 Fault/Service Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.8.1.2 Boot Handshake Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.9 Updating Firmware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.9.1 Updating Firmware During Controller Replacement . . . . . . . . . . . . . . . . . . 96
8.9.1.1 Disabling Partner Firmware Upgrade . . . . . . . . . . . . . . . . . . . . . . . . 97
8.9.2 Updating Firmware Using FibreCAT SX Manager’s WBI . . . . . . . . . . . . . . . . 97
8.10 Identifying Cable Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.10.1 Identifying Cable Faults on the Host Side . . . . . . . . . . . . . . . . . . . . . . . . 99
8.10.2 Identifying Cable Faults on the Expansion Enclosure Side . . . . . . . . . . . . . . . 99
8.11 Identifying Drive Module Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.11.1 Understanding Disk-Related Errors . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.11.2 Disk Drive Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

8.11.3 Disk Channel Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

8.11.4 Identifying Faulty Drive Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.12 Removing and Replacing a Drive Module . . . . . . . . . . . . . . . . . . . . . . 104
8.12.1 Replacing a Drive Module When the Virtual Disk Is Rebuilding . . . . . . . . . . . . 104
8.12.2 Identifying the Location of a Faulty Drive Module . . . . . . . . . . . . . . . . . . . 105
8.12.3 Removing a Drive Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
8.12.4 Installing a Drive Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.12.5 Verify that the Correct Power-On Sequence was Performed . . . . . . . . . . . . . . 109
8.12.6 Installing an Air Management Module . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.13 Identifying Virtual Disk Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.13.1 Clearing Metadata From a Disk Drive . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.14 Identify Power and Cooling Module Faults . . . . . . . . . . . . . . . . . . . . . 111
8.14.1 Removing and Replacing a Power and Cooling Module . . . . . . . . . . . . . . . . 112
8.14.2 Installing a Power and Cooling Module . . . . . . . . . . . . . . . . . . . . . . . . . 114
8.15 Replacing an Enclosure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
9 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
9.1 Event Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

9.2 Failover Reason Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.3 Troubleshooting Using the CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
9.3.1 Command Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
9.3.1.1 Keywords and Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
9.3.1.2 Disk Drive Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
9.3.1.3 Virtual Disk Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
9.3.1.4 Volume Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.3.1.5 Host Nickname Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.3.1.6 Volume Mapping Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.3.2 Viewing Command Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
9.3.3 clear cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
9.3.4 clear expander-status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
9.3.5 ping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
9.3.6 reset host-channel-link . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
9.3.7 restart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
9.3.8 restore defaults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
9.3.9 set debug-log-parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
9.3.10 set led . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
9.3.11 set protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
9.3.12 show debug-log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

9.3.13 show debug-log-parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

9.3.14 show enclosure-status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
9.3.15 show events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
9.3.16 show expander-status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
9.3.17 show frus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
9.3.18 show inquiry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
9.3.19 show protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
9.3.20 show redundancy-mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
9.3.21 trust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Related Documents and Links . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

1 Preface
This guide describes how to install, initially configure and operate the FibreCAT SX® series
storage system, and applies to the following models:
● FibreCAT SX60 Fibre Channel (FC) Controller Enclosure
● FibreCAT SX60 / SX80 / SX88 Serial Attached SCSI (SAS) Expansion Enclosure
Where the three controller enclosure models do not differ, they are referred to collectively
as FibreCAT SX (controller enclosure). The FibreCAT SX60 / SX80 / SX88 expansion enclosure
is referred to as the FibreCAT SX expansion enclosure.
This book is written for system administrators and service personnel who are familiar with
Fibre Channel (FC), Internet SCSI (iSCSI), and Serial Attached SCSI (SAS) configurations,
network administration, and RAID technology.
1.1 Before You Read This Book

Before you begin to follow procedures in this book, you must have already installed the
enclosures and learned of any late-breaking information related to system operation, as
described in the following documents:
● “FibreCAT SX60 / SX80 / SX88 Operating Manual”
● FibreCAT SX60 / SX80 / SX88 “Support Bulletins”
1.2 Notational Conventions

Typeface (1)          Meaning
Italics               Commands, options, file names and path names are written in italic
                      letters in continuous text
fixed font            Commands and options in syntax descriptions as well as system output
                      are written in a fixed font
<variable>            Angle brackets are used to enclose variables which are to be replaced
                      by actual values
semi-bold             Highlights text
“Quotation marks”     References to documents and chapters or sections in this or other
                      documents
NOTE                  Important information and tips
CAUTION               Reference to hazards that can lead to personal injury, loss of data or
                      damage to equipment
Table 1: Notational Conventions
(1) The settings on your browser might differ from these settings.

2 System Architecture
This chapter describes the FibreCAT SX architecture. Before troubleshooting any system,
it is important to understand its architecture: the system components, how they relate to
each other, and how data passes through the system. Topics covered in this chapter include:
● “Architecture Overview” on page 12
● “Controller and Expansion Enclosure Architecture” on page 13
● “Drive Modules” on page 14
● “Controller and Expansion Modules” on page 16
● “Power and Cooling Modules” on page 25

2.1 Architecture Overview

The FibreCAT SX architecture is based on a passive midplane design, meaning that all field
replaceable units (FRUs) connect to a central printed circuit board that provides redundant
pathways and components, as shown in Figure 1.
[Figure 1 labels: Power supply, Drive module, Controller/expansion module, Midplane]
Figure 1: FibreCAT SX Storage System Architecture Overview
The primary field replaceable units found in the controller/expansion enclosure are labeled
in Figure 1 and include:
● 2U controller/expansion enclosure and midplane (3.5 inches tall by 19 inches wide).
The midplane is replaced with the enclosure housing.
● Up to 12 SATA or SAS drive modules per enclosure. When a disk drive fails, the entire
drive module is replaced.
● Up to two controller/expansion modules. When a host or drive side bus fault,
management controller fault or storage controller fault occurs that is related to the
controller/expansion module, the entire module is replaced.
● Two redundant power and cooling modules. If a power supply fault or fan fault occurs,
the entire module must be replaced.
NOTE
Do not remove any field replaceable unit until the replacement is on hand.
Removing a field replaceable unit without a replacement will disrupt the system
airflow and cause an over-temperature condition.

2.2 Controller and Expansion Enclosure Architecture

The controller/expansion enclosure architecture consists of the following components:
● Midplane
● Enclosure ID (EID)
NOTE
The expansion enclosure consists of the same components as the controller
enclosure, except that an expansion module replaces the controller
module. The expansion enclosure features redundant expansion modules, each of
which provides two 3-Gbps SAS expansion ports for external drive expansion. The
details of the expansion module are in “Expansion Module” on page 23.
2.2.1 Midplane
The midplane is the common connection point for all system electronics and is part of the
controller/expansion enclosure. All FRUs plug into this board. The drive modules plug into
the front through a dongle board. The power and cooling modules and controller(s) plug into
the rear. The upper controller (controller module A) and lower controller (controller module
B) connect to the midplane through two Molex SQEQ series connectors for signals and one
HDM series connector for power. The midplane is designed to support 3.0-Gbps SATA II and
SAS operation.
The midplane incorporates high-speed differential pair design layout rules to match
impedance, minimize skin effect losses, minimize transition losses through even mode
impedance changes at transitions, and minimize crosstalk.
The midplane uses a serial EEPROM to hold the system serial number and WWN information.
The serial EEPROM is accessible by the I/O controller through an I2C connection. A second
I2C bus is used for the power supply and fan status/control functions, with the exception of
the Turn On and Mated states. These I2C buses are multiplexed from a single bus on the IOM.
2.2.2 Enclosure ID
The enclosure ID (EID) provides a visual single-digit numerical reference to each enclosure
in an array. It is located on the left mounting flange when you are facing the front of the array.
The array uses the SAS protocol for internal data routing; therefore, its devices are
addressed through their 64-bit world wide name (WWN). Although WWN addressing simplifies
the identification process, a WWN is not a user-friendly way to identify a device.

Because the WWN is used, there is no need for a selectable or equivalent mechanical
interface. Instead, the array uses an LED display on each enclosure in the system. The
value shown on the LED display serves as the EID. It is the responsibility of the controlling
member of the system, whether one or more host computers or RAID controllers, to set the
EID on each enclosure in the system.
The SCSI Enclosure Services (SES) chip on the I/O board obtains and sets the EID. A host
or RAID system manages the EID by setting bits 3-0 of byte 2 in either the A (top) or B
(bottom) SAS expander element when sending an SES enclosure control page.
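The bit manipulation described above can be sketched in Python. This is only an illustration: it assumes the SAS expander control element is handled as a 4-byte buffer with zero-based byte numbering, and the function names `set_eid_bits` and `get_eid_bits` are hypothetical, not part of any FibreCAT tool.

```python
def set_eid_bits(element: bytes, eid: int) -> bytes:
    """Return a copy of a control element with the EID in bits 3-0 of byte 2.

    Assumption: 'element' is the 4-byte SAS expander element of an SES
    enclosure control page, with bytes numbered from 0.
    """
    if len(element) != 4:
        raise ValueError("expected a 4-byte element")
    if not 0 <= eid <= 0x0F:
        raise ValueError("EID must fit in 4 bits")
    out = bytearray(element)
    # Keep bits 7-4 of byte 2 untouched; replace only bits 3-0.
    out[2] = (out[2] & 0xF0) | eid
    return bytes(out)


def get_eid_bits(element: bytes) -> int:
    """Read back bits 3-0 of byte 2."""
    return element[2] & 0x0F
```

Masking with 0xF0 before OR-ing in the new value preserves whatever the upper four bits of byte 2 carry.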
Refer to “Updating Firmware” on page 96 for information about how the enclosure ID
changes when expansion modules are moved.
2.2.2.1 Expansion Enclosure Usage Behind a RAID System
The following criteria define EID usage for an expansion enclosure behind a RAID system:
● The controller enclosure should always display zero (0) on its EID.
● An expansion enclosure attached to a controller enclosure should have a non-zero
value displayed on its EID.
● Each enclosure, within a single solution, should have a unique value displayed on its
EID.
● When one or more expansion enclosures are used, the RAID controllers within the
RAID system assign an ID to each enclosure.
● The RAID system uses a persistent algorithm to assign EIDs, so that they will not
change during simple reconfigurations.
● The values on the EID display can be used to correlate physical enclosures, and drives
within them, to logical views of the system provided by the FibreCAT SX Manager’s WBI
or CLI.
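As an illustration of the first three criteria above, the following Python sketch checks a set of displayed EID values. The function name `check_eids` and the idea of collecting the displayed values by hand are assumptions made for this example; the array itself assigns the EIDs.

```python
def check_eids(controller_eid, expansion_eids):
    """Return a list of rule violations for the displayed EID values.

    Hypothetical helper: the values are assumed to have been read off the
    enclosure LED displays by the service technician.
    """
    problems = []
    all_eids = [controller_eid] + list(expansion_eids)
    # The EID is a single-digit display, so only 0-9 can be shown.
    for eid in all_eids:
        if not 0 <= eid <= 9:
            problems.append(f"EID {eid} is not a single digit")
    if controller_eid != 0:
        problems.append("controller enclosure does not display 0")
    if any(eid == 0 for eid in expansion_eids):
        problems.append("an expansion enclosure displays 0")
    # Each enclosure within a single solution needs a unique value.
    duplicates = sorted({e for e in all_eids if all_eids.count(e) > 1})
    for eid in duplicates:
        problems.append(f"EID {eid} is displayed by more than one enclosure")
    return problems
```

An empty result means the displayed values are consistent with the criteria; any entry points to an enclosure worth inspecting.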
2.3 Drive Modules

Each drive module consists of three components: the carrier, disk drive, and dongle board
as shown in Figure 2. The carrier has a front bezel with a cam lever used for insertion and
removal of the disk sled. The carrier also has slide rails and a mount for the dongle board.

[Figure 2 labels: Dongle, Disk drive, Carrier]
Figure 2: Drive Sled
Each drive module is inserted into a drive slot in the enclosure. The drive slots are used to
identify drives; for example, enclosure 0 slot 0 is the upper left drive slot of enclosure 0, the
RAID enclosure. Figure 3 displays each of the drive slot numbers in the enclosure. Drive
modules are slot independent; that is, the drives can be moved to any slot with the power
off. Once power is applied, the RAID controllers use the metadata held on each disk to
locate each member of a virtual disk.
Figure 3: Drive Slot Numbers

2.3.1 Disk Drives

Each RAID controller has single port access from the local SAS expander to internal and
expansion enclosure drives. Alternate path dual port access to all internal drives is
accomplished through the expander inter-controller wide lane connection. Dual port access
assumes the presence of both controller modules. In a failed over configuration, where the
partner controller module is down or removed, only single port access to the drives exists.
The array can use SAS or SATA II disk drives. Native command queuing (NCQ) is supported
on SATA drives.
The disk drives are interchangeable with qualified equivalent drives. In addition, each
enclosure can be populated with disks of various capacities. To ensure the full use of a
disk’s capacity, construct all virtual disks with disks of the same capacity.
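The capacity advice above can be illustrated with a short Python sketch. It assumes, as is typical for RAID implementations, that every member of a virtual disk contributes only as much space as the smallest member; the manual does not state the exact rule this array applies, and the function name `unused_capacity` is hypothetical.

```python
def unused_capacity(member_sizes_gb):
    """Return (per-member usable size, total unused space) in GB.

    Assumption: each member contributes only as much capacity as the
    smallest disk in the virtual disk.
    """
    if not member_sizes_gb:
        raise ValueError("a virtual disk needs at least one member")
    smallest = min(member_sizes_gb)
    unused = sum(size - smallest for size in member_sizes_gb)
    return smallest, unused
```

Under this assumption, mixing one 750 GB disk into a set of 500 GB disks leaves 250 GB of that disk unused, while a set of identical disks leaves nothing unused.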
2.3.2 Drive Module Dongle

Each drive module has a dongle board mounted to the rear. The type of board and the purpose
of the dongle depend on the type of drive installed in the drive module. The dongle has
an SCA-II 40-pin connector, mechanically compatible with FC drives, that mates to the midplane.
Other common components include power switching FETs, drive fault/activity LEDs and a
simple micro-controller that is used to decode a single-wire serial interface from each
controller.
2.3.2.1 SAS Drive Dongle
Because the SAS drives are natively dual ported and can fully utilize the dual path FibreCAT
SX architecture, the SAS dongle board only serves to make the drive module connector
compatible with the enclosure midplane.
2.3.2.2 SATA Drive Dongle
The single ported SATA drive’s dongle board makes the drive connector compatible with
the midplane and includes an active/active (AA) multiplexer (MUX). The SATA AA MUX
enables a single port drive to appear as a dual port drive on the midplane.
2.4 Controller and Expansion Modules

This section describes the controller and expansion modules.

2.4.1 Controller Module

The controller module is a single hot-pluggable board that mates with the enclosure
midplane using a 150-pin midplane connector and provides all RAID functions and SAS
expansion (drive) channels. It can accept a variety of plug-in mezzanine boards, known as
host interface modules (HIMs). Note that the host mezzanine is not a FRU.
Together, a RAID I/O board and HIM form the controller module FRU.
The midplane connector interface supports high-speed serial lanes operating at up to
4-Gbps link speed. The controller module has three mezzanine connectors to support the
HIM and contains the storage controller, management controller, SAS data paths, and SES
processing functions.
2.4.1.1 Storage Controller
The storage controller (SC) consists of a processor subsystem which provides all RAID
functionality. The SC also provides the bridging functionality that takes in a Fibre Channel
signal and sends out a SAS signal to the back end drive bus.
2.4.1.2 Management Controller
The management controller (MC) is a separate processor subsystem. The MC provides all
out-of-band management features, including the FibreCAT SX Manager’s Web Based
Interface (WBI), SNMP, CLI, DMS and e-mail notification. The MC also includes the
external serial ports and Ethernet port.
Note that there are two primary processors: the SC and the MC. Both CPUs are
independent and, most importantly, each continues to operate if the other goes down. In
addition, because there are two CPUs, management functions have significantly less impact
on RAID I/O performance, which differentiates the storage product's architecture from
traditional approaches.
As illustrated in Figure 4, the controller module includes a number of high-speed serial
interfaces:
● SAS/SATA serial back-end disk channels (12 lanes per controller)
● SAS inter-controller alternate path (4 lanes)
● SAS disk channel expansion (4 lanes)
● PCI Express inter-controller messaging and write cache mirroring (4 lanes)
● FC serial front-end host channels (dual port per controller)
● Two FC serial connections between controllers used to facilitate controller failover (up
to 4 lanes)

Figure 4: Block Diagram of the Controller Module

2.4.2 SAS Data Path

The back-end data path of each controller module uses the SAS protocol. To accomplish
this, the controller module incorporates a number of SAS components as shown in Figure 5.
[Figure 5 label: Disk data path]
Figure 5: SAS Data Path
As the data path leaves the SC, the signal enters the SAS controller. It is then sent from
the SAS controller to the SAS expander and then on to the drive module. The SAS
expander is much like a Fibre Channel switch in that it maintains a routing map and can
route data to the addressed destination. The expander ports connect to each disk slot. The
expander also connects to the failover (alternate) path and to the expansion path.
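The routing-map idea can be pictured with a small Python sketch. This is a toy model only, not the expander firmware: the class name `ExpanderSketch`, the port labels and the sample WWN are all hypothetical.

```python
class ExpanderSketch:
    """Toy model of a routing map: 64-bit WWN -> expander port label."""

    def __init__(self):
        self._routes = {}

    def attach(self, wwn, port):
        """Record that the device with this WWN is reachable through 'port'."""
        if not 0 <= wwn < 2 ** 64:
            raise ValueError("a WWN must fit in 64 bits")
        self._routes[wwn] = port

    def route(self, wwn):
        """Return the port through which the addressed device is reached."""
        try:
            return self._routes[wwn]
        except KeyError:
            raise LookupError(f"no route for WWN {wwn:016x}") from None


# Hypothetical usage: one drive in a local slot, one device behind the expansion port.
expander = ExpanderSketch()
expander.attach(0x5000C50000000001, "slot 0")
expander.attach(0x5000C50000000002, "expansion")
```

The lookup by WWN mirrors the way a switch forwards a frame by its destination address; a missing entry means the device is not reachable through this expander.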
2.4.3 Host Interface Module

The Fibre Channel Host Interface Module (HIM) provides the connectivity between the host
system and the array. Its primary components are the two Fibre Channel ports that use an
SFP, the Fibre Channel controller chip, and port bypass circuits (Port interconnects). The
HIM board supports a variety of features, including:
● Two SFP (small form-factor pluggable) optical ports, each having 2-Gbps or 4-Gbps link
speed
● Support for 2/4-Gbit data rates at full duplex while supporting FC-AL (arbitrated
loop).
● Port interconnect technology provides fault isolation for host channels.
● Each FC PBC can be remotely configured as an FC-AL connection.
● Automated FC node ID designation is assigned according to drive slot insertion.
The HIM connects to the I/O board using three stacking connectors. Internal features of the
HIM include:
● Single 64-bit 100-MHz PCIX bus supporting up to two independent masters and
separate interrupts
● Power and miscellaneous control
● High-speed inter-controller serial lanes using intermediate PCB connector board
● Independent DC power regulation from 5V and 12V primary
● Board type/revision detection via PCI device configuration scan
The FC HIM uses a single PCIX bus and a single dual-port Fibre Channel controller. The
port interconnects are used for internal controller port connectivity when presenting all
LUNs from host port. The primary components of the HIM (Figure 6) are:
● Two 2/4-Gbit SFP sockets
● Link Status LED and Link Speed LED for each SFP
Figure 6: Block Diagram of the FC HIM Board
SCSI Enclosure Services (SES) intelligence controls LED indicators on the front and rear
panels to provide environmental and hardware status on enclosures and FRUs. The SES
controller also monitors the following:

● Internal +12 and +5 voltages

● Temperature sensors located throughout the enclosure
● Fans
Both RAID and expansion/JBOD enclosures support dual SES fail-over capabilities for fully
redundant event monitoring. To accomplish this, the SES protocol communicates in-band
from one array enclosure to another. The ability to communicate to all devices in a given
solution enables the storage array to auto-detect installation of all major components such
as drive modules, power and cooling modules, controller modules, and expansion modules.
It also enables the system to log all warning and critical events.
2.4.3.1 Host Interface Speed for FibreCAT SX60 / SX80 in Direct Attached Configurations
NOTE
i The following restriction applies to FibreCAT SX60 / SX80 in direct attached
configurations:
If your Host Interface Module (HIM) is Model 0 (or a dual controller enclosure
contains a mix of HIM Model 0 and Model 1), FibreCAT SX60 / SX80 supports only
2 Gbit FC speed in direct attached mode.
If both HIMs in your controller enclosure are Model 1 (or you have a single controller
FibreCAT SX and its HIM is Model 1), FibreCAT SX60 / SX80 supports up to 4 Gbit FC
speed in direct attached mode.
FibreCAT SX88 always supports up to 4 Gbit FC speed in direct attached mode.
In switch attached mode, FibreCAT SX60 / SX80 / SX88 always support up to 4 Gbit FC
speed (no restriction with any HIM Model).
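The speed rules in this note can be summarized as a small decision function. This is only a sketch of the rules as stated above; the function name and argument layout are assumptions made for illustration and are not part of any FibreCAT tool.

```python
from typing import Sequence


def max_fc_speed_gbit(system: str, him_models: Sequence[int],
                      direct_attached: bool) -> int:
    """Return the maximum supported FC speed (Gbit) per the note above.

    system          -- "SX60", "SX80" or "SX88"
    him_models      -- HIM Model (0 or 1) of each installed controller
    direct_attached -- True for direct attach, False for switch attach
    """
    if system not in ("SX60", "SX80", "SX88"):
        raise ValueError(f"unknown system {system!r}")
    if not him_models or any(m not in (0, 1) for m in him_models):
        raise ValueError("him_models must contain one or two values of 0 or 1")
    # Switch attached mode: no restriction with any HIM Model.
    if not direct_attached:
        return 4
    # SX88 has no restriction in direct attached mode either.
    if system == "SX88":
        return 4
    # SX60 / SX80 direct attached: 4 Gbit only if every HIM is Model 1.
    return 4 if all(m == 1 for m in him_models) else 2
```

A dual controller SX60 with mixed HIM Models, `max_fc_speed_gbit("SX60", [0, 1], True)`, is limited to 2 Gbit, while the same enclosure behind a switch is not.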
If you have a direct attached configuration with FibreCAT SX60 / SX80, you should find out
the HIM Model (0 or 1) of your controller(s) via the controllers’ part numbers or via FibreCAT
SX Manager’s Web Based Interface:
● Part Number (see Fujitsu Siemens Computers’ identification label on the rear side of
each controller module)
FibreCAT SX60 HIM Model 1 has the part number 10600862818 only.
FibreCAT SX80 HIM Model 1 has the part number 10600862820 only.
● FibreCAT SX Manager’s WBI
1. Open FibreCAT SX Manager’s Web Based Interface.
2. Login as monitor or manage user.
3. In MONITOR STATUS menu, click the Link advanced settings (see screenshots below).

Here you can find out the HIM Model of your controller module(s):
Figure 7: Detecting the HIM Model with FibreCAT SX Manager’s WBI (Example with two HIM Models 0)
Figure 8: Detecting the HIM Revision with FibreCAT SX Manager’s WBI (Example with two HIM Models 1)

2.4.4 Expansion Module

The architecture of the expansion module is a simplified version of the controller module.
Like the controller module, the expansion module contains management features and uses
the SAS protocol. As shown in Figure 9, each module provides a SAS “In” port and a SAS
“Out” port, which enables up to four expansion enclosures to be daisy-chained from the
enclosure. These are then connected to a 24-port SAS expander, which in turn is connected
to the drive module.
Figure 9: Expansion Module Block Diagram

The diagram in Figure 10 illustrates how the array can be configured for disk drive
expansion. Additional configurations are available.
Figure 10: Cabling of Controller Enclosure to Two Expansion Enclosures

2.5 Power and Cooling Modules

This section describes the power and cooling modules.
2.5.1 Cooling Fans

Each tray contains two power and cooling modules. The cooling fans are integrated into
each of the power and cooling module FRUs. Each module contains two fans mounted in
tandem (series). The fans are powered from the 12V common rail so that a single failed
power supply still enables all fans to continue to operate.
The fans cannot be accidentally removed as they are part of the power and cooling module.
Removing this module requires the disengagement of a captive panel fastener and the
operation of an ejector lever to remove it from the chassis.
Should one fan fail in either one of the two power and cooling modules, the system
continues to operate indefinitely. In addition, the fan system enables the airflow pattern to
remain unchanged and there is no pressure leak through the failed fan since there are
always two fans in tandem, and they are sealed to each other through a calibrated cavity.
Should a power and cooling module be turned off or unplugged, the fans inside the module
continue to operate at normal capacity. This is accomplished by powering each fan from a
power bus on the midplane.
The fans’ variable speed is controlled by the controller modules through an I2C interface.
The fans also provide tachometer speed information through the I2C interface. Speed
control is accomplished through the use of speed commands issued from the controller
module. The controller module has one temperature sensor at the inlet port of the controller
to sense the exhaust air temperature from the disk drives. Should the controller module
sense a rise in temperature, it can increase fan speed to keep the disk drive temperatures
within limits.
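The speed control loop described above can be sketched as a simple mapping from sensed temperature to a fan speed command. The temperature thresholds and speed percentages below are invented for illustration; this manual does not document the values or the algorithm the controller firmware actually uses.

```python
def fan_speed_percent(sensed_temp_c: float,
                      low_c: float = 30.0, high_c: float = 45.0,
                      min_pct: int = 40, max_pct: int = 100) -> int:
    """Map a sensed temperature to a fan speed command (percent).

    All default values are hypothetical. Below low_c the fans run at
    min_pct, above high_c at max_pct, and linearly in between.
    """
    if sensed_temp_c <= low_c:
        return min_pct
    if sensed_temp_c >= high_c:
        return max_pct
    fraction = (sensed_temp_c - low_c) / (high_c - low_c)
    return round(min_pct + fraction * (max_pct - min_pct))
```

With these assumed defaults, a reading halfway between the two thresholds yields a command halfway between the minimum and maximum speed.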
Balanced cooling for all of the drives is accomplished through the use of two mechanisms.
● Tuned port apertures in the midplane placed behind each drive carrier slot
● The use of a cavity behind the entire surface of the midplane (side-to-side and top-to-
bottom) that acts as an air pressure equalization chamber. This chamber is commonly
evacuated by all of the fans.
In this way the amount of mass flow through each drive slot is controlled to be the same slot
to slot.
Airflow is controlled and optimized over the power supply by using the power supply chassis
as the air-duct for the power supply, ensuring that there are no dead air spaces in the power
supply core and increasing the velocity flow (LFM) by controlling the cross sectional area
that the mass flow travels through.

Airflow is controlled and optimized over the RAID I/O board and HIM in a similar manner.
The controller cover is used as an air duct to force air over the entire surface of the controller
from front to back, ensuring no dead air spaces, and increasing the velocity flow (LFM) by
controlling the cross-sectional area that the mass flow travels through.
Cooling for all hot components is passive. There are no other fans in the system other than
the fans contained in the power and cooling module.
2.5.2 Airflow
CAUTION
! To allow for correct airflow and cooling, use an air management module for removed
disk drives and IOMs. Do not leave a FRU out of its slot for more than 2 minutes.
As noted above, the array's cooling system is comprised of four fans in a tandem parallel
array. These variable speed fans provide low noise and high mass flow rates. Airflow is from
front to back. Each drive slot draws ambient air in at the front of the drive, sending air over
the drive surfaces and then through tuned apertures in the chassis midplane.
Note that the air-flow washes over the top and bottom surface of the disk drive at high mass
flow and velocity flow rates, so both sides of the drive are used for cooling. The air-flow
system uses a cavity in the chassis behind the midplane as an air-pressure equalization
chamber to normalize the negative pressure behind each of the disk drive slots. This
mechanism together with the tuned apertures in the midplane behind each drive assures
an even distribution of airflow and therefore LFM for each drive slot. This even cooling
extends the operational envelope of the system by ensuring no 'hot' drive bypass.
Further, airflow is “in line” with the top and bottom surfaces of the drive to reduce back-
pressure and optimize fan performance. All of the mass flow at room ambient is used for
cooling the 12 disk drives. The high velocity flow helps to lower the thermal resistance of
the disk drive assembly to ambient temperature. The thermal temperature rise of the disk
drive is dependent upon the power consumed by the disk drive, which varies by drive model
as well as the level of drive activity.

3 Fault Isolation Methodology
The FibreCAT SX storage system provides many ways to isolate faults within the system.
This chapter presents the basic methodology used to locate faults and the associated
FRUs.
The basic fault isolation steps are:
● Gather fault information
● Determine where in the system the fault is occurring
● Review event logs
● If required, isolate the fault to a data path component
3.1 Gather Fault Information

When a fault occurs, it is important to gather as much information as possible. Doing so will
help you determine the correct action needed to remedy the fault.
Begin by reviewing the reported fault. Is the fault related to an internal data path or an
external data path? Is the fault related to a hardware component such as a drive module,
controller module, or power and cooling module? By isolating the fault to one of the systems
within the array, you will be able to determine the necessary action more rapidly.
3.2 Determine Where the Fault Is Occurring

Once you have an understanding of the reported fault, review the enclosure LEDs. The
enclosure LEDs are designed to alert users of any system faults and might be what alerted
the user to a fault in the first place.
When a fault occurs, the enclosure status LEDs, located on the right side of the drive slots
(shown in Figure 11), illuminate. Review the LEDs on the back of the enclosure to narrow
the fault to a FRU, connection, or both. The LEDs will also help you identify the location of
a FRU reporting a fault.

Use FibreCAT SX Manager’s WBI to verify any faults found while viewing the LEDs.
FibreCAT SX Manager’s WBI is also a good tool to use in determining where the fault is
occurring if the LEDs cannot be viewed due to the location of the system. FibreCAT SX
Manager’s WBI provides you with a visual representation of the system and where the fault
is occurring. It can also provide more detailed information about FRUs, data, and faults. See
“Troubleshooting Using System LEDs” on page 29 for more information about LEDs.
3.3 Review the Event Logs

The event logs record all system events. It is very important to review the logs, not only to
identify the fault, but also to search for events that might have caused the fault to occur. For
example, a host could lose connectivity to a virtual disk if a user changes channel settings
without taking the storage resources assigned to it into consideration. In addition, the type
of fault can help you isolate the problem to hardware or software. See “Troubleshooting
Using Event Logs” on page 67 for more information about event logs.
3.4 Isolate the Fault

Occasionally it might become necessary to isolate a fault. This is particularly true with data
paths due to the number of components the data path consists of. For example, if a host
side data error occurs, it could be created by any of the components in the data path:
controller module, cable, or HBA. For more information about isolating faults, see
“Troubleshooting Using System LEDs” on page 29.

4 Troubleshooting Using System LEDs
The first step in troubleshooting your storage system is to check the status of the system
LEDs. System LEDs can often target the FRU that is causing an error. This chapter
describes the system LEDs and includes the following topics:
● “Enclosure LEDs” on page 29
● “Drive Module LEDs” on page 31
● “Controller Module LEDs and Fault Isolation” on page 33
● “Power and Cooling Module LEDs” on page 40
● “Expansion Enclosure LEDs” on page 41
4.1 Enclosure LEDs

The enclosure LEDs include the Status LEDs and the Enclosure ID. When checking system
LEDs, start at the front status LEDs.
4.1.1 Status LEDs

As shown in Figure 11, the group of LEDs located on the right side of the drive slots are the
enclosure status LEDs. They provide a quick glimpse into the system, providing status for
fault notification, power, and temperature.

[Figure callouts: Unit Locator, Fault/Service Required, FRU OK, Temperature Fault]
Figure 11: Enclosure Status LEDs
Check the status LEDs as described in Table 2 periodically or after you have received an
error notification. It is important to note that more than one of the LEDs might display a fault
condition at the same time. For example, if a disk drive were to fail due to an exceedingly
high ambient temperature, the temperature fault LED and the fault/service LED both display
the fault. This functionality can help determine the cause of a fault in a FRU.
Location   LED                     Color   Power-On State              Operating State¹  Description
Right ear  Unit Locator            White   On for 3–4 seconds, Off     Off               Normal operation.
                                                                       Blink             Physically identifies the enclosure.
Right ear  Fault/Service Required  Yellow  On for 3–4 seconds, Off     Off               No fault.
                                                                       On                An enclosure-level fault occurred. Service action is required.
                                                                                         The event has been acknowledged but the problem needs attention.
Right ear  FRU OK                  Green   On for 3–4 seconds, blink   On                The enclosure is powered on with at least one power and
                                           for up to 2 minutes during                    cooling module operating normally.
                                           boot, On
                                                                       Off               Both power and cooling modules are off.
Right ear  Temperature Fault       Green   On green for 3–4 seconds,   Off               The enclosure temperature is normal.
                                           On
                                   Yellow                              On                The enclosure temperature is above threshold.
Table 2: Enclosure Status LEDs (Front)
¹ If new firmware is detected, the LEDs will be On for up to 10 seconds.
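When LED observations are recorded in a service log, the operating states in Table 2 can be turned into a lookup. The sketch below is an illustration only: the dictionary layout and function names are assumptions, and the descriptions are abbreviated from the table.

```python
from typing import Dict, List

# (LED name, operating state) -> meaning, abbreviated from Table 2.
STATUS_LEDS = {
    ("Unit Locator", "Off"): "Normal operation.",
    ("Unit Locator", "Blink"): "Physically identifies the enclosure.",
    ("Fault/Service Required", "Off"): "No fault.",
    ("Fault/Service Required", "On"): "An enclosure-level fault occurred.",
    ("FRU OK", "On"): "At least one power and cooling module operating normally.",
    ("FRU OK", "Off"): "Both power and cooling modules are off.",
    ("Temperature Fault", "Off"): "The enclosure temperature is normal.",
    ("Temperature Fault", "On"): "The enclosure temperature is above threshold.",
}

# States that indicate a problem (hypothetical grouping for this sketch).
FAULT_STATES = {
    ("Fault/Service Required", "On"),
    ("FRU OK", "Off"),
    ("Temperature Fault", "On"),
}


def describe(observed: Dict[str, str]) -> List[str]:
    """Return the Table 2 meaning of each observed LED state."""
    return [STATUS_LEDS[(led, state)] for led, state in observed.items()]


def faulted_leds(observed: Dict[str, str]) -> List[str]:
    """Return the names of the LEDs that currently show a fault."""
    return [led for led, state in observed.items()
            if (led, state) in FAULT_STATES]
```

For the over-temperature example in the text, `faulted_leds({"Fault/Service Required": "On", "Temperature Fault": "On", "FRU OK": "On"})` reports both the fault/service LED and the temperature fault LED.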
4.1.2 Enclosure ID
A hex display on the left enclosure ear as shown in Figure 12 provides the enclosure ID.
The ID number it presents enables you to correlate a physical enclosure with logical views
presented in FibreCAT SX Manager’s WBI. The enclosure ID for a controller enclosure is
always zero (0); the enclosure ID for an attached expansion enclosure is always nonzero.
For more information about the Enclosure ID, see “Enclosure ID” on page 13.
Figure 12: Enclosure ID
4.2 Drive Module LEDs

When a disk drive fault occurs, the failed drive module’s lower LED is solid yellow, indicating
that it must be replaced. If the failed drive module is a member of a virtual disk, all of the
virtual disk member drives blink yellow. These LEDs continue to blink until the virtual disk
is no longer in a critical state.

The drive module LEDs are shown in Figure 13 and described in Table 3.
Figure 13: Drive Module LEDs

Location      LED                   Color   Power-On State  Operating State¹  Description
Drive module  OK to Remove          Blue    Off             Off               The drive module is not prepared for removal.
                                                            On                The drive module has been removed from any active virtual disk,
                                                                              spun down, and prepared for removal.
Drive module  Power/Activity/Fault  Green   On              Off               The drive module is not powered on.
                                                            On                The drive module is operating normally.
                                                            Blink             The drive module is active and processing I/O or is performing
                                                                              a media scan.
                                    Yellow  Off             Off               No fault.
                                                            On                The drive module has experienced a fault or has failed.
                                                            Blink             Physically identifies the drive module in the enclosure.
Table 3: Drive Module LEDs
¹ If new firmware is detected, the LEDs will be On for up to 10 seconds.

4.3 Controller Module LEDs and Fault Isolation

The controller module LEDs provide status information that helps you isolate faults. They are
divided into the following types:
● Host connectivity
● Expansion connectivity
● Management connectivity
● Controller module status and functionality
4.3.1 Isolating Faults Using the Host Link Status LEDs

If you are having difficulty with a host-side connection, check the controller module’s link
status LEDs as shown in Figure 14 and described in Table 4.
[Figure callouts: Host link status, Host link speed, Host activity]
Figure 14: Host Link Status LEDs
Location           LED               Color  State  Description
Controller module  Host link status  Green  Off    The port is empty or the link is down.
                                            On     The port link is up and connected.
Controller module  Host link speed   Green  Off    The data transfer rate is 2 Gbps.
                                            On     The data transfer rate is 4 Gbps.
Controller module  Host activity     Green  Off    The host ports have no I/O activity.
                                            Blink  At least one host port has I/O activity.
Table 4: Host Link Status LEDs

If the host link status LED indicates that there is no link, review the event logs for indicators
of a specific fault in a host data path component. If you are unable to locate a specific fault
or are unable to access the event logs, halt all I/O and use the following procedure to isolate
the fault. The procedure requires scheduled downtime.
NOTE
i Do not perform more than one step at a time. Changing more than one variable at
a time can complicate the troubleshooting process.
1. Halt all I/O.
2. Check the host activity LED.
If there is activity, halt all applications that access the array.
3. Reseat the SFP and FC cable.
Is the host link status LED on?
● Yes – Monitor the status to ensure that there is no intermittent error present. If the
fault occurs again, clean the connections to ensure that a dirty connector is not
interfering with the data path.
● No – Proceed to Step 4.
4. Move the SFP and cable to a port with a known good link status.
This step isolates the problem to the external data path (SFP, host cable, HBA) or to the
I/O controller module port.
Is the host link status LED on?
● Yes – You now know that the SFP, host cable, and HBA are functioning properly.
Return the SFP and cable to the original port. If the link status LED remains off, you
have isolated the fault to the controller module’s port. Replace the controller
module.
● No – Proceed to Step 5.
5. Swap the SFP with the known good one.
Is the host link status LED on?
● Yes – Replace the original SFP. The fault has been isolated.
● No – Proceed to Step 6.
6. Place the original SFP back into the configuration and swap the cable with a known
good one.
Is the host link status LED on?
● Yes – Replace the original cable. The fault has been isolated.
● No – Proceed to Step 7.
7. Replace the HBA with a known good HBA, or move the host side cable and SFP to a
known good HBA.
● Yes – You have isolated the fault to the HBA. Replace the HBA.
● No – It is likely that the controller module needs to be replaced.
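The procedure above changes one variable at a time and stops at the first step after which the host link status LED comes on. A minimal sketch of that control flow follows; the step list paraphrases the procedure, while the function name and the `link_is_up` callback (which stands for a technician's observation of the LED) are hypothetical.

```python
from typing import Callable, Optional

# Step numbers and actions paraphrased from the procedure above.
STEPS = [
    (3, "Reseat the SFP and FC cable"),
    (4, "Move the SFP and cable to a port with a known good link status"),
    (5, "Swap the SFP with a known good one"),
    (6, "Restore the original SFP and swap the cable with a known good one"),
    (7, "Replace the HBA with a known good HBA"),
]


def first_step_restoring_link(link_is_up: Callable[[int], bool]) -> Optional[int]:
    """Perform the steps strictly in order, one at a time.

    link_is_up(step) reports whether the host link status LED is on
    after the given step. Returns the number of the first step that
    restores the link, or None if no step does.
    """
    for number, _action in STEPS:
        if link_is_up(number):
            return number
    return None
```

A result of `None` corresponds to the final "No" branch of Step 7, where the controller module is the likely candidate for replacement.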
4.3.2 Isolating Faults Using the Expansion Port Status LED

If you are having difficulty with the connectivity of an expansion enclosure, review the
expansion port status LED as shown in Figure 15 and described in Table 5.
Figure 15: Expansion Port Status LED

Location           LED                    Color  State  Description
Controller module  Expansion port status  Green  Off    The port is empty or the link is down.
Table 5: Expansion Port Status LED
If the expansion port status LED indicates that there is no link, review the event logs for
indicators of a specific fault. If you are unable to locate a specific fault or are unable to
access the event logs, halt all I/O and use the following procedure to isolate the fault. The
procedure requires scheduled downtime.
NOTE
i Do not perform more than one step at a time. Changing more than one variable at
a time can complicate the troubleshooting process.
1. Halt all I/O.
2. Check the host activity LED.
If there is activity, halt all applications that access the array.
3. Reseat the expansion cable.
Is the expansion port status LED on?
● Yes – Monitor the status to ensure there is no intermittent error present. If the fault
occurs again, clean the connections to ensure that a dirty connector is not
interfering with the data path.
● No – Proceed to Step 4.
4. Move the expansion cable to a port on the RAID enclosure with a known good link
status.
This step isolates the problem to the expansion cable or to the controller module’s
expansion port.
Is the expansion port status LED on?
● Yes – You now know that the expansion cable is good. Return the cable to the original
port. If the expansion port status LED remains off, you have isolated the fault to the
controller module’s expansion port. Replace the controller module.
● No – Proceed to Step 5.
5. Move the expansion cable back to the original port on the controller enclosure.
6. Move the expansion cable on the expansion enclosure to a known good expansion port
on the expansion enclosure.
Is the expansion port status LED on?
● Yes – You have isolated the problem to the expansion enclosure’s port. Replace the
expansion module.
● No – Proceed to Step 7.
7. Replace the cable with a known good cable, ensuring the cable is attached to the
original ports used by the previous cable.
Is the expansion port status LED on?
● Yes – Replace the original cable. The fault has been isolated.
● No – It is likely that the controller module needs to be replaced.
4.3.3 Isolating Management Connection Faults

Ethernet LEDs are shown in Figure 16. Use Table 6 to identify any link faults with the
Ethernet management connection. The LEDs described are the same for most Ethernet
connections. Use standard networking troubleshooting procedures to isolate faults on the
network.
[Figure callouts: Ethernet link status, Ethernet activity]
Figure 16: Ethernet LEDs

Location           LED                   Color  State  Description
Controller module  Ethernet link status  Green  Off    The Ethernet port is not connected or the link is down.
                                                On     The Ethernet link is up.
Controller module  Ethernet activity     Green  Off    The Ethernet link has no I/O activity.
                                                Blink  The Ethernet link has I/O activity.
Table 6: Ethernet LEDs

4.3.4 Checking Controller Module Status and Functionality

There are three conditions that can be isolated using the LEDs on the controller module:
● FRU status
● FRU location
● Cache status
The controller module LEDs shown in Figure 17 and described in Table 7 are associated
with the above conditions.
[Figure callouts: Unit Locator, FRU OK, OK to Remove, Fault/Service Required]
Figure 17: Controller Module LEDs

Location           LED                     Color   State  Description
Controller module  Unit Locator            White   Off    Normal operation.
                                                   Blink  Physically identifies the controller module.
Controller module  OK to Remove            Blue    Off    The controller module is not prepared for removal.
                                                   On     The controller module can be removed.
Controller module  Fault/Service Required  Yellow  On     A fault has been detected or a service action is required.
                                                   Blink  Indicates a hardware-controlled power up or a cache flush or restore error.
Controller module  FRU OK                  Green   Off    Controller module is not OK.
                                                   On     Controller module is operating normally.
                                                   Blink  System is booting.
Controller module  Cache status            Green   Off    Cache is clean (contains no unwritten data).
                                                   On     Cache is dirty (contains unwritten data) and operation is normal.
                                                   Blink  A Compact Flash flush or cache self-refresh is in progress.
Table 7: Controller Module Status LEDs
4.3.5 FRU Status

When a controller module failure is suspected, check the FRU OK LED. If the LED is off,
check the event log for specific information regarding the failure.
4.3.6 FRU Location

Once a fault has been isolated to a controller module, it is important to identify the FRU prior
to removing it. To identify the FRU, illuminate the Unit Locator LED using FibreCAT SX
Manager’s WBI.
1. Select Manage > General Config > Enclosure Management.
2. If there is more than one enclosure, select the enclosure you want to identify.
3. Click Illuminate Locator LED.
The white unit locator LED located on the enclosure’s ear blinks so you can determine
the enclosure’s physical location.
To stop an enclosure LED from illuminating, select the enclosure that you want to return
to a normal state, and click Turn Off Locator LED.
4.3.7 Cache Status

The Cache Status LED is primarily used when there is a power failure. When power to an
array fails, EcoStor keeps primary data handling components powered on while the data in
cache is moved from cache to compact flash. While this occurs, the Cache Status LED
blinks. It continues to blink as long as the super capacitors found in EcoStor can maintain
power for cache. Should the power be restored prior to the super capacitors being drained,
the system will boot, find the valid cache data, and write it to disk. Should the power be
restored after the super capacitors have drained, the system will have to restore from the
compact flash.
The Cache Status LED indicates that cache is dirty when it is solid on, which can be seen
when write back cache is enabled.
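The three Cache Status LED states listed in Table 7 can be expressed as a small function. This is a sketch for reasoning about the indicator; the function and parameter names are hypothetical and do not correspond to firmware interfaces.

```python
def cache_status_led(cache_dirty: bool, flush_or_refresh_in_progress: bool) -> str:
    """Return the expected Cache Status LED state per Table 7.

    Blink -- a Compact Flash flush or cache self-refresh is in progress
    On    -- cache is dirty (contains unwritten data), operation is normal
    Off   -- cache is clean (contains no unwritten data)
    """
    if flush_or_refresh_in_progress:
        return "Blink"
    return "On" if cache_dirty else "Off"
```

During a power failure with unwritten data in cache, the expected state is `cache_status_led(True, True)`, that is, blinking.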

4.4 Power and Cooling Module LEDs

As shown in Figure 18 and described in Table 8, there are two LEDs on the power and
cooling module. The top LED provides a visual indication of the AC power input, and the
bottom LED provides a visual indication of the DC power output.
If the AC Power Good LED turns off, ensure the power cord is properly connected and
check the power source it is connected to.
When a fan or power supply fails or falls below acceptable RPM/voltage ranges, the DC
Voltage/Fan Fault/Service Required LED illuminates. When isolating faults in the power and
cooling module, it is important to remember that the fans in the module are powered by a
power bus on the midplane and are not powered from the power supply. Thus, when a
power supply fails, the fans continue to operate normally.
[Figure callouts: AC Power Good, DC Voltage/Fan Fault/Service Required]
Figure 18: Power and Cooling Module LEDs

Location                  LED                                    Color   State  Description
Power and cooling module  AC Power Good                          Green   Off    AC power is off or input voltage is below the minimum threshold.
                                                                         On     AC power is on and input voltage is normal.
Power and cooling module  DC Voltage/Fan Fault/Service Required  Yellow  Off    DC output voltage is normal.
                                                                         On     DC output voltage is out of range or a fan is operating below the minimum required RPM.
Table 8: Power and Cooling Module LEDs

4.5 Expansion Enclosure LEDs

The indicators on the front of an expansion enclosure are the same as on a controller
enclosure (see Table 2). The SAS In port status and SAS Out port status can be used to
isolate connectivity problems using the same steps found in “Isolating Faults Using the
Expansion Port Status LED” on page 35.
Figure 19 shows the location of the expansion enclosure LEDs and Table 9 describes them.
[Figure callouts: AC Power Good, DC Voltage/Fan Fault/Service Required, SAS In port status,
SAS Out port status, Unit Locator, FRU OK, OK to Remove, Fault/Service Required]
Figure 19: Expansion Enclosure LEDs (Back View)

Location                  LED                                    Color   State  Description
Power and cooling module  AC Power Good                          Green   Off    AC power is off or input voltage is below the minimum threshold.
                                                                         On     AC power is on and input voltage is normal.
Power and cooling module  DC Voltage/Fan Fault/Service Required  Yellow  Off    DC output voltage is normal.
                                                                         On     DC output voltage is out of range or a fan is operating below the minimum required RPM.
Expansion module          SAS In port status                     Green   Off    The port is empty or the link is down.
Expansion module          SAS Out port status                    Green   Off    The port is empty or the link is down.
Expansion module          Unit Locator                           White   Off    Not active.
                                                                         Blink  Physically identifies the expansion module.
Expansion module          OK to Remove                           Blue    Off    Not implemented.
Expansion module          Fault/Service Required                 Yellow  On     A fault has been detected or a service action is required.
                                                                         Blink  Indicates a hardware-controlled power up or a cache flush or restore error.
Expansion module          FRU OK                                 Green   Off    Expansion module is not OK.
                                                                         On     Expansion module is operating normally.
                                                                         Blink  System is booting.
Table 9: Expansion Enclosure LEDs (Back)

5 Troubleshooting Using WBI
This chapter describes how to use FibreCAT SX Manager’s WBI to troubleshoot
your storage system enclosure and its FRUs. It also describes solutions to problems
you might experience when using FibreCAT SX Manager’s WBI.
Topics covered in this chapter include:
● “Determining Overall Array Status and Verifying Faults” on page 44
● “Stopping I/O” on page 46
● “Isolating Faulty Disks” on page 46
● “Isolating Data Path Faults” on page 49
● “Isolating FRU Faults” on page 54
● “Clearing Metadata From a Leftover Disk Drive” on page 55
● “Using Diagnostic Manage-Level Only Options” on page 55
● “Using Advanced Manage-Level Recovery and Debug Utilities” on page 62
● “Problems Accessing the Array Using FibreCAT SX Manager’s WBI” on page 64
NOTE
i You can also use the CLI to troubleshoot your storage system. “Troubleshooting
Using the CLI” on page 135 provides information on specific CLI commands that
can be used to troubleshoot your system.

5.1 Determining Overall Array Status and Verifying Faults

You can determine the overall status and health of the array and each FRU component
using the Status Summary screen of FibreCAT SX Manager’s WBI. By default, the Status
Summary screen is displayed upon login; you can also navigate to it.
To display the Status Summary page:
● Select Monitor > Status > Status Summary.
The following four panels are displayed:
– FibreCAT SX Manager’s WBI Administrator Status Message – Displays the status
of the controller(s) and provides system messages to the administrator.
– Virtual Disk Overview – Displays the status of each virtual disk created. To view
more details about a virtual disk, click on a virtual disk icon, which displays the
Virtual Disk Status page.
– Hardware Status – Shows whether each controller and each enclosure is in good
operational condition (green icon) or has a hardware failure (red icon with an
exclamation point). To view details, click on a component, which displays the
Module Status screen.
– System Panel – Displays the system status as well as the world wide name (WWN)
and IP address of each controller. This panel also enables you to click an IP address
to switch to the selected controller or to open the event log by clicking on the Event
Log icon in the upper right corner of the panel.
After a fault has occurred and you have checked the enclosure LEDs, it is important to
verify the fault. The Status Summary screen provides you with a way to discover any
faulty FRUs as described in the following steps.
1. Review the status icons in the upper left corner of each panel.
● A green icon indicates that components associated with that panel are good.
● A red icon with an exclamation point indicates a fault with a FRU that is included
in the panel or a FRU required by the component in the panel.
2. Review each panel that has a fault icon displayed, which indicates areas of the system
that are affected by the fault.
Figure 20: FibreCAT SX Manager’s WBI Summary Screen
3. Look for red text in the panels.
Any red text indicates where the fault is occurring. In Figure 20, for example, the panels
indicate an issue with a virtual disk.
4. To gather more details regarding the failure, click any of the red text.
The associated status screen is displayed.
5. Review the information displayed in the status screen.
If the fault is with a FRU, an image of the enclosure is displayed. The FRU is
highlighted in red if it has a fault, or grayed out if it has been removed or is not installed.

5.2 Stopping I/O

When troubleshooting disk and connectivity faults, ensure you have a current full backup.
As an additional data protection precaution, stop all I/O to the affected virtual disk(s). When
at a customer site, you can verify that there is no I/O activity by briefly monitoring the system
LEDs; however, when working on a remote system, this is not possible.
To check the I/O status of a remote system, use the Overall Rate Stats page located at
Monitor > Statistics > Overall Rate Stats. The Overall Rate Stats page enables you to view
I/O based on the host-side activity interval since the page was last refreshed. The following
data is presented for all virtual disks:
● The total IOPS and bandwidth for all virtual disks
● The IOPS and bandwidth for each virtual disk
To ensure that all I/O has ceased on a remote system using the Overall Rate Stats screen,
perform the following steps:
1. Quiesce all applications.
2. Select Monitor > Statistics > Overall Rate Stats.
3. Click the refresh button on your browser to ensure you are viewing the most current
information.
4. Verify that both the I/O Sec and Bandwidth indicators display no activity.
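If you record the values shown on the Overall Rate Stats page after each browser refresh, the check in step 4 can be sketched as follows. This is a minimal illustration only: the `RateSample` structure, the virtual disk labels, and the idea of recording readings by hand are assumptions, not part of FibreCAT SX Manager's WBI.

```python
from dataclasses import dataclass


@dataclass
class RateSample:
    """One reading for a single virtual disk (hypothetical structure)."""
    virtual_disk: str   # label of the virtual disk, chosen by the technician
    iops: float         # I/O operations per second shown on the page
    bandwidth: float    # bandwidth shown on the page, in the displayed units


def io_has_ceased(refreshes: list[list[RateSample]]) -> bool:
    """Return True only if every recorded reading shows zero IOPS and
    zero bandwidth. No readings at all counts as 'not verified'."""
    samples = [s for refresh in refreshes for s in refresh]
    if not samples:
        return False
    return all(s.iops == 0 and s.bandwidth == 0 for s in samples)


def active_virtual_disks(refreshes: list[list[RateSample]]) -> list[str]:
    """List the virtual disks that showed any activity in any refresh."""
    return sorted({
        s.virtual_disk
        for refresh in refreshes
        for s in refresh
        if s.iops > 0 or s.bandwidth > 0
    })
```

Recording more than one refresh is advisable because each page load reflects only the interval since the previous refresh.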
5.3 Isolating Faulty Disks

When a disk drive fault occurs, the base troubleshooting actions are:
● Identify the faulty disk drive
● Review the disk error statistics
● Review the event log
● Replace the faulty disk drive

5.3.1 Identifying a Faulty Disk Drive

The identification of a faulty disk drive involves confirming the disk drive fault and identifying
the physical location of the disk drive.
To confirm a disk drive fault, use the basic troubleshooting steps in “Determining Overall
Array Status and Verifying Faults” on page 44. You can also navigate to Monitor > Status >
Show Notification Screen and look for any notifications pertaining to a disk drive fault.
To identify the physical location of a disk drive, perform the following steps:
1. Select Manage > Utilities > Disk Drive Utilities > Locate Disk Drive.
2. Select the disk identified as having a fault by clicking in its check box.
3. Click Update LED Illumination, which turns on the identification LED for that disk.
For more information about viewing disk information, refer to the “FibreCAT SX60 / SX80 /
SX88 Administrator’s Guide”.
5.3.2 Reviewing Disk Error Statistics

The Disk Error Stats screen provides specific disk drive fault information. It shows a
graphical representation of the enclosures and disks installed in the system. The Disk Error
Stats screen can be used to gather disk drive information and to identify specific disk drive
errors. Additionally, you can capture intermittent errors.
To view the disk drive statistics, perform the following steps:
1. Select Monitor > Statistics > Disk Error Stats.
The top panel displays all enclosures and disks installed in the system.
2. Select the disk drive whose error statistics you want to view.
3. Click Show Disk Drive Error Statistics.
The drive error data for the selected disk is displayed in the second panel.
4. Note any current error counts displayed.
The data displayed includes the following:
● SMART Event Count – Number of SMART (Self-Monitoring, Analysis and Reporting
Technology) events the drive recorded. These events are often used by the vendor
to determine the root cause of a drive failure. Some, but not all, SMART events can
be fatal.
● I/O Timeout Count – Number of times the drive accepted an I/O request but did not
complete it in the required amount of time. Excessive timeouts can indicate
potential device failure (media retries or soft, recoverable errors).

● No Response Count – Number of times the drive failed to respond to an I/O request.
A high value can indicate that the drive is too busy to respond to further requests.
● Spin-up Retries – Number of times the drive failed to start on power-up or on
software request. Excessive spin-up retries can indicate that a drive is close to
failing.
● Media Errors – Number of times the drive had to retry an I/O operation because the
media did not record or retrieve the data correctly.
● Non Media Errors – Number of soft, recoverable errors not associated with drive
media.
● Bad Block Reassignments – Number of block reassignments that have taken place
since the drive was shipped from the vendor. A large number of reallocations in a
short period of time could indicate a serious condition.
● Bad Block List Size – Number of blocks that have been deemed defective either
from the vendor or over time due to reallocation.
5.3.3 Capturing Trend Data

To capture trend data regarding a specific drive or for all the disks, perform the following
steps:
1. Perform Step 1 through Step 4 from “Reviewing Disk Error Statistics” on page 47.
2. Create a baseline by clearing the current error statistics.
To clear the statistics for a single disk, select the desired disk and click Clear Selected
Disk Drive Error Statistics.
To clear the statistics of all disks, click Clear All Disk Drive Error Statistics.
If a faulty disk is present, disk errors are captured in a short period of time. If the disk
drive has intermittent errors you might have to monitor the array for more than 24 hours.
3. To view the disk error statistics, select the suspected disk drive and click Show Disk
Drive Error Statistics.
4. Review the Disk Drive Error Statistics panel for disk drive errors.
The Disk Drive Error Statistics panel enables you to review errors from each of the two
ports.
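The baseline-and-compare idea behind trend capture can be sketched as follows. The counter names are taken from the Disk Error Stats screen; the dictionaries are assumed to be filled in by hand from what the screen shows, since this manual does not describe a programmatic way to read the statistics.

```python
# Counter names as listed on the Disk Error Stats screen.
COUNTERS = (
    "SMART Event Count",
    "I/O Timeout Count",
    "No Response Count",
    "Spin-up Retries",
    "Media Errors",
    "Non Media Errors",
    "Bad Block Reassignments",
    "Bad Block List Size",
)


def counter_growth(baseline: dict, current: dict) -> dict:
    """Return only the counters that increased between two snapshots,
    mapped to the size of the increase. Missing counters count as 0."""
    growth = {}
    for name in COUNTERS:
        delta = current.get(name, 0) - baseline.get(name, 0)
        if delta > 0:
            growth[name] = delta
    return growth
```

After the statistics have been cleared, the baseline is all zeros, so any nonzero counter in a later snapshot is new since the baseline was created.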

5.3.4 Reviewing the Event Logs

If all the steps in “Identifying a Faulty Disk Drive” on page 47 and “Reviewing Disk Error
Statistics” on page 47 have been performed, you have determined the following:
● A disk drive has encountered a fault
● The location of the disk drive
● What the fault is
The next step is to review the event logs to determine if there were any events that led to
the fault. If this step is skipped, you could replace the faulty disk only to encounter another
fault.
The event logs can be accessed from any screen by scrolling to the last panel and clicking
the Event Log icon in the upper right corner. See “Using Event Logs” on page 67 for more
information about using event logs.
5.4 Isolating Data Path Faults

When isolating data path faults, you must first isolate the fault to an internal data path or an
external data path. This will help to target your troubleshooting efforts.
Internal data paths include the following:
● Controller to disk connectivity
● Controller to controller connectivity
● Controller ingress (incoming signals from expansion enclosures)
● Controller egress (outgoing signals to expansion enclosures)
External data paths consist of the connections between the array enclosure, SFPs, and
hosts.
To troubleshoot a data path using FibreCAT SX Manager’s WBI, take the following actions:
● Identify the fault as an internal or external data path fault using the steps in “Determining
Overall Array Status and Verifying Faults” on page 44
● Gather details regarding the fault
● Review event logs
● Replace the faulty component

5.4.1 Isolating Internal Data Path Faults

A Physical Layer Interface (PHY) is an interface in a device used to connect to other
devices. The term refers to the physical layer of the Open Systems Interconnect (OSI) basic
reference model. The physical layer defines all of the electrical and physical specifications
for a device.
In a SAS architecture, each physical point-to-point connection is called a lane. Every lane
has a PHY at either end. Lanes are sometimes referred to as physical links.
In the FibreCAT SX, fault isolation firmware monitors hardware PHYs for problems.
PHYs are tested and verified before shipment as part of the manufacturing and qualification
process. But subsequent problems can occur in a PHY because of installation problems
such as:
● a bad cable between enclosures.
● a controller connector that is damaged as a result of attaching a cable and then torquing
the cable connector until solder joints connecting the controller connector become
fatigued or break.
Problem PHYs can cause a host or RAID controller to continually rescan disks, which
disrupts I/O or causes I/O errors. I/O errors can result in a failed drive, causing a virtual disk
to become critical or causing complete loss of a virtual disk if more than one disk fails.
To avoid these problems, problem PHYs are identified and disabled, if necessary, and
status information is transmitted to the RAID controller so that each action can be reported
in the event log. Problem PHY identification and status information is reported in FibreCAT
SX Manager’s WBI, but disabled PHYs are only reported through event messages.
Some PHY errors can be expected when powering on an enclosure, removing or replacing
a controller, and when connecting or disconnecting an enclosure. An incompletely
connected or disturbed cable might also generate a PHY error. These errors are usually not
significant enough to disable a PHY, so the fault isolation firmware analyzes the number of
errors and the error rate. If errors for a particular PHY increase at a slow rate, the PHY is
usually not disabled. Instead the errors are accumulated and reported.
On the other hand, bad cables connecting enclosures, damaged controller connectors, and
other physical damage can cause continual errors, which the fault isolation firmware can
often trace to a single problematic PHY. The fault isolation firmware recognizes the large
number and rapid rate of these errors and disables this PHY without user intervention. This
disabling, sometimes referred to as PHY fencing, eliminates the I/O errors and enables the
system to continue operation without suffering performance degradation.
Once the firmware has disabled a PHY, the only way to enable the PHY again is to reset the
affected controller or power cycle the enclosure. Before doing so, it may be necessary to
replace a defective cable or FRU.

If a PHY becomes disabled, the event log entry helps to determine which enclosure or
enclosures and which controller (or controllers) are affected.
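The distinction between slowly accumulating errors and a large, rapid burst of errors can be sketched as follows. This is not the firmware's algorithm: the function name, the two thresholds, and their default values are hypothetical and exist only to illustrate a decision based on both error count and error rate.

```python
def phy_action(error_count: int, window_seconds: float,
               count_limit: int = 1000, rate_limit: float = 50.0) -> str:
    """Classify a PHY from the errors seen in an observation window.

    count_limit and rate_limit are made-up illustration values; the real
    thresholds used by the fault isolation firmware are not documented here.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    rate = error_count / window_seconds  # errors per second
    if error_count >= count_limit and rate >= rate_limit:
        return "disable"      # large number of errors at a rapid rate
    if error_count > 0:
        return "accumulate"   # record and report, but leave the PHY enabled
    return "none"
```

With these illustration values, a few errors during an enclosure power-on are only accumulated, while thousands of errors within seconds lead to the PHY being disabled.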
5.4.2 Using the Expander Status Screen

The Expander Status screen in FibreCAT SX Manager’s WBI includes an Expander Controller
PHY Detail panel. This panel shows the internal data paths for the
storage controller, expander controller, disks, and expansion ports. Review this screen to
locate an internal data path that has a fault.
NOTE
You must be logged in as a Diagnostic-level user to access the Expander Status
screen.
To access and properly use the Expander Status screen, perform the following steps:
1. Select Monitor > Status > Advanced Settings > Expander Status.
2. Ensure that you are viewing the proper system.
To aid in the identification of a system, the second panel of the Expander Status screen
displays the following information about the enclosure: Name, Vendor, Location, Status,
Misc. (enclosure ID), WWN, Model, Rack:Position and Firmware Version.
3. Review the PHY status panel.
The Expander Controller PHY Detail panel displays a graphical representation of the
signal path between the storage controller, expander controller, and expansion devices
(disk drives and expansion enclosures) as shown in Figure 21.
Figure 21: Expander Controller PHY Detail
a) Review each connection represented by a line between the device and Controller A
or Controller B.
Any red lines indicate a fault.

b) Review the details of any PHYs found to have a fault.

To do this, pause the mouse pointer over the “i” icon. If you click the icon, the information
remains on screen even after you move the cursor. The following information is
displayed:
● Status – OK, ERROR, or DISABLED. ERROR indicates that the error count has
reached a significant level. DISABLED indicates that the enclosure controller
has disabled the PHY.
● Type – Specifies whether the PHY lane type is for a Disk Drive, Inter-Expander,
Ingress, or Egress. Egress PHYs represent SAS ports on controller enclosures
and expansion enclosures. Ingress PHYs represent SAS ports on expansion
enclosures.
● Grp Num – Group number.
● Flag Bits – Control bits used by the SAS expander.
● PHY Change – Specifies the number of times the PHY originated a
BROADCAST (CHANGE). A BROADCAST (CHANGE) is sent if dword
synchronization is lost or at the end of a Link Reset sequence.
● Code Violation – Specifies the number of times the PHY received an
unrecognized or unexpected signal.
● Disparity – Specifies the number of dwords containing running disparity errors
that have been received by the PHY, not including those received during Link
Reset sequences. A running disparity error occurs when positive and negative
values in a signal don't alternate.
● CRC Error – In a sequence of SAS transfers (frames), the data is protected by
a cyclic redundancy check (CRC) value. This error specifies the number of
times the CRC value does not compute when data is decoded, meaning the
received data did not match the transmitted data.
● Inter-Conn Error – Specifies the number of times the lane between two
expanders experienced a communication error.
● Lost DWord – Specifies the number of times the PHY has lost dword
synchronization and restarted the Link Reset sequence.
● Invalid DWord – Specifies the number of invalid dwords that have been received
by the PHY, not including those received during Link Reset sequences.
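The principle behind the CRC Error counter can be illustrated with a generic CRC-32 check. The sketch uses Python's `zlib.crc32` purely as a stand-in; it does not claim to reproduce the exact CRC calculation or frame layout used on a SAS link, and the function names are hypothetical.

```python
import zlib


def frame_with_crc(payload: bytes) -> tuple[bytes, int]:
    """Sender side: compute a CRC over the payload and send both."""
    return payload, zlib.crc32(payload)


def crc_matches(payload: bytes, crc: int) -> bool:
    """Receiver side: recompute the CRC and compare it with the value
    that arrived with the frame. A mismatch would count as a CRC error."""
    return zlib.crc32(payload) == crc


payload, crc = frame_with_crc(b"example frame payload")

# Flip one bit to simulate corruption on the link.
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]

crc_error_count = 0
for received in (payload, corrupted):
    if not crc_matches(received, crc):
        crc_error_count += 1
```

In this sketch the intact payload passes the check and the corrupted one does not, so the counter ends at 1.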
When working with intermittent errors, you may want to reset PHY status so that you can
observe trend information. If so, note the PHY that is currently in error, and then click Reset
Phy Status to clear all PHY status information. Then, once the error recurs, check the PHY
status panel to see any changes. The error counts display only the errors that occurred in
the time period between the clearing of the PHY status and the current time.

5.4.3 Reviewing the Event Log for Disabled PHYs

If the fault isolation firmware disables a PHY, the event log shows a message such as the following:
A552 Phy disabled. Enclosure:ADD. Phy20.Phys:d20 Type:Egress. Reason:Externally Disabled
The message in the event log helps you determine which enclosure or enclosures and
which controller or controllers are affected. The reason can be Err Count Interrupts,
Externally Disabled, Ctrl Page Disabled, or Unknown Reason.
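When many event log entries have been saved to a file, the PHY type and the disable reason can be pulled out of each message with a short script. The sketch below reads only the `Type:` and `Reason:` fields, because the layout of the other fields is not described in this manual; the function name and the returned dictionary are assumptions.

```python
import re

# Reasons listed in this section.
KNOWN_REASONS = (
    "Err Count Interrupts",
    "Externally Disabled",
    "Ctrl Page Disabled",
    "Unknown Reason",
)


def parse_phy_disabled(message: str):
    """Return the PHY type and disable reason from a 'Phy disabled'
    event message, or None if the message is a different event."""
    if "Phy disabled" not in message:
        return None
    # Type runs up to the next period; Reason runs to the end of the text.
    type_match = re.search(r"Type:\s*([^.\n]+)", message)
    reason_match = re.search(r"Reason:\s*(.+?)\s*$", message)
    reason = reason_match.group(1).rstrip(".") if reason_match else None
    return {
        "type": type_match.group(1).strip() if type_match else None,
        "reason": reason,
        "reason_is_known": reason in KNOWN_REASONS,
    }


sample = ("A552 Phy disabled. Enclosure:ADD. Phy20.Phys:d20 "
          "Type:Egress. Reason:Externally Disabled")
result = parse_phy_disabled(sample)
```

A reason that is not in the list above is still returned, but flagged with `reason_is_known` set to `False` so that it can be checked manually.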
5.4.4 Resolving PHY Faults

1. Ensure that the cables are securely connected. If they are not, tighten the connectors.
2. Reset the affected controller or power-cycle the enclosure.
3. If the problem persists, replace the affected FRU or enclosure.
4. Periodically examine the Expander Status screen to see if the fault isolation firmware
disables the same PHY again. If it does:
a) Replace the appropriate cable
b) Reset the affected controller or power-cycle the enclosure.
5.4.5 Isolating External Data Path Faults

To troubleshoot external data path faults, perform the following steps:
1. Select Monitor > Status > Advanced Settings > Host Port Status.
The host port status provides a graphical representation of the Fibre Channel port
status as well as port details.
2. Review the graphical representation of the FC port status.
● Green – The link is up.
● Red – The link is down.
● White – There is no SFP installed.
An indication of link down can be caused by one or more of the following conditions:
● A faulty HBA in the host
● A faulty Fibre Channel cable

● A faulty SFP
● A faulty port in the host interface module
● A disconnected cable
3. To target the cause of the link failure, view the FC port details by clicking on a port in
the graphical view and then reviewing the details listed below it.
The data displayed includes:
● Host Port Status Details – Selected controller module and port number.
● SFP Detect – SFP Present or No SFP Present.
● Receive Signal – Present or Not Present.
● Link Status – Active or Inactive.
● Signal Detect – No Signal or Signal Detected.
● Topology – Loop. If the loop is active, shows Private Loop or Public Loop.
● Speed – 2 Gbit/sec or 4 Gbit/sec as set in FibreCAT SX Manager’s WBI. To
change this setting for host ports, go to the
Manage > General Config > Host Port Configuration page.
● FC Address – 24-bit FC address or Unavailable if the FC link is not active.
● Node WWN – FC World Wide Node Name (WWNN).
● Port WWN – FC World Wide Port Name (WWPN).
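One way to narrow down the link-down causes listed in step 2 is to work through the SFP Detect, Receive Signal, and Link Status fields in order. The grouping of causes by field value in the sketch below is an assumption made for illustration; the manual lists the possible causes but does not map them to individual fields.

```python
def likely_link_causes(sfp_present: bool, signal_present: bool,
                       link_active: bool) -> list[str]:
    """Suggest which listed causes to check first (hypothetical triage)."""
    if link_active:
        return []  # link is up, nothing to isolate
    if not sfp_present:
        # Shown as a white port in the graphical view.
        return ["No SFP installed"]
    if not signal_present:
        # Nothing is arriving at the port: check the path toward the host.
        return [
            "A disconnected cable",
            "A faulty Fibre Channel cable",
            "A faulty SFP",
            "A faulty HBA in the host",
        ]
    # A signal arrives but the link does not come up.
    return [
        "A faulty port in the host interface module",
        "A faulty SFP",
        "A faulty HBA in the host",
    ]
```

The order within each list is only a suggested checking order, starting with the items that are easiest to inspect.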
5.4.6 Resetting Host Channel

For a Fibre Channel system using FC-AL (loop) topology, you might need to reset a host
link to fix a host connection or configuration problem. This command enables you to
remotely issue a loop initialization primitive (LIP) on specified controller channels.
To reset host channels:
1. Select Manage > Utilities > Host Utilities > Reset Host Channel.
2. Set the channel and controller options, and then click Reset Host Channel.
5.5 Isolating FRU Faults

For information regarding the isolation of faults for disk drives see “Isolating Faulty Disks”
on page 46.

5.6 Clearing Metadata From a Leftover Disk Drive

All of the member disk drives in a virtual disk contain metadata in the first sectors. The array
uses the metadata to identify virtual disk members after restarting or replacing enclosures.
Clear the metadata if you have a disk drive that was previously a member of a virtual disk.
Disk drives in this state display “Leftover” in the Display All Devices page and in the Clear
Metadata page. After you clear the metadata, you can use the disk drive in a virtual disk or
as a spare.
To clear metadata from a disk drive:
1. Select Manage > Utilities > Disk Drive Utilities > Clear Metadata.
The virtual disk enclosure is shown slightly differently on this page. Instead of
color-coding individual virtual disks, only drives identified as “Leftover” or “Available” are
color-coded so that it is easier to select the drives whose metadata you want to clear.
2. Select the leftover drive whose metadata you want to clear.
3. Click Clear Metadata for Selected Device.
5.7 Using Diagnostic Manage-Level Only Options

To use FibreCAT SX Manager’s WBI functions covered in this section, you must be logged
in as a Diagnostic Manage-level user. Refer to the “FibreCAT SX60 / SX80 / SX88
Administrator’s Guide” for information on defining user configuration and setting access privileges.
The Diagnostic Manage-level only options include the following:
● Recovery utilities:
– Trust Virtual Disk
– Clear Unwritable Cache Data
● Debug utilities:
– Setting Up and Viewing the Debug Log
– Viewing Error Buffers
– Viewing CAPI Trace
– Viewing Management Trace
● Configuring event notification by selecting individual events

5.7.1 Enabling and Using the Trust Virtual Disk for Disaster Recovery
If a virtual disk appears to be down or offline (not quarantined) and the disks are labeled
“Leftover”, use the Trust Virtual Disk function to recover the virtual disk. The Trust Virtual
Disk function brings a virtual disk back online by ignoring metadata that indicates the drives
may not form a coherent virtual disk. This function can force an offline virtual disk to be
critical or fault tolerant, or a critical virtual disk to be fault tolerant. You might need to do this
when:
● A drive was removed or was marked as failed in a virtual disk due to circumstances you
have corrected (such as accidentally removing the wrong disk). In this case, one or
more disks of a virtual disk can start up more slowly, or might have been powered on
after the rest of the disks in the virtual disk. This causes the date and time stamps to
differ, which the array interprets as a problem. Also see “Dequarantining a Virtual Disk”
on page 62.
● A virtual disk is offline because a drive is failing, you have no data backup, and you want
to try to recover the data from the virtual disk. In this case, the Trust Virtual Disk function
might work, but only as long as the failing drive continues to operate.
CAUTION
If used improperly, the Trust Virtual Disk feature can cause unstable operation and
data loss. Only use this function for disaster recovery purposes and when advised
to do so by a service technician. The virtual disk has no tolerance for any additional
failures.
To trust a virtual disk, first enable the Trust Virtual Disk function and then use it:
1. Select Manage > Utilities > Recovery Utilities > Enable Trust Virtual Disk.
2. Select Enabled.
3. Click Enable/Disable Trust Virtual Disk.
The option is only enabled until you use it. After you trust a virtual disk, the option
reverts back to being disabled.
4. Select Manage > Utilities > Recovery Utilities > Trust Virtual Disk.
5. Select the array and click Trust This Array.
6. Back up the data from all the volumes residing on this virtual disk and audit it to make
sure that it is intact.
7. Verify the virtual disk using the verify utility: select
Manage > Virtual Disk Config > Verify Virtual Disk. While verify is running, any new data
written to any of the volumes on the virtual disk is written in a parity-consistent way.

NOTE
If the virtual disk does not come back online, it might be that too many disks are
offline or the virtual disk might have additional failures on the bus or enclosure that
Trust Virtual Disk cannot fix.
5.7.2 Clearing Unwritable Cache Data

Unwritable cache data is data in the controller cache that cannot be written out to a virtual
disk because that virtual disk is no longer accessible. The virtual disk may be offline or
missing. Unwritable cache data can exist if I/O to the virtual disk does not complete because
drives or enclosures fail or are removed before the data can be written. Recovery is possible
if the missing devices can be restored so that the cached data can be written to the virtual
disk.
Unwritable cache data might affect performance since it ties up the cache space and
prevents that space from being used by other virtual disks that might be performing I/O. The
Cache Data Status page shows the percentage of the cache filled with unwritable cache data.
This data can be from one or multiple virtual disks. If the data is from only one virtual disk,
the serial number of that virtual disk appears. If the data is from multiple virtual disks, only
the serial number of one virtual disk appears. If the unwritable cache data for that virtual
disk is cleared, the serial number of the next virtual disk appears.
CAUTION
Make sure that the data is no longer needed before clearing it. Once unwritable cache
data is cleared, it cannot be recovered.
To remove data from the cache, perform the following steps:
1. Select Manage > Utilities > Recovery Utilities > Cache Data Status.
2. Click Clear Unwritable Cache Data.
NOTE
On this page, percentages are shown for the full cache on both controllers. In the
event log, percentages for a volume are shown for a single controller. So if an event
log entry states that unwritable cache data comprises 12% of the cache (for one
controller), this page shows that the data comprises 6% of the total cache on both
controllers.
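The conversion in this note is a simple division, sketched below. It assumes that both controllers have the same cache size and that the unwritable data reported in the event log resides on one controller only; the function name is hypothetical.

```python
def total_cache_percent(per_controller_percent: float,
                        controllers: int = 2) -> float:
    """Convert a percentage of one controller's cache (as reported in the
    event log) into a percentage of the combined cache of all controllers
    (as shown on the Cache Data Status page). Assumes equal cache sizes."""
    if controllers < 1:
        raise ValueError("controllers must be at least 1")
    return per_controller_percent / controllers


page_value = total_cache_percent(12.0)  # event log reports 12% on one controller
```

For the example in the note, 12% of one controller's cache corresponds to 6% of the total cache on both controllers.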
5.7.3 Viewing the Debug Log

To set up and view the debug log using FibreCAT SX Manager’s WBI, see “Using the Debug
Log” on page 70.

5.7.4 Viewing Error Buffers

The View Debug Buffers panel shows crash dump and boot information saved by the
management controller. During normal operation, the management controller
communicates with the storage controller. If there are problems with this communication, there is little
information available to the LAN subsystem to show. In this case and under certain failure
conditions, crash and boot buffer data can be examined. For normal operation, these
buffers are empty.
To view error buffers:
● Select Manage > Utilities > Debug Utilities > View Error Buffers.
The Error Buffer Log is displayed. To save this information to a file, see “Saving Log
Information to a File” on page 63.
Figure 22: Debug Buffer Example
5.7.5 Viewing CAPI Trace

The CAPI Command Trace panel shows the Configuration API (CAPI) commands sent and
received by the management controller. For example, if an attempt to create a virtual disk
fails, the trace shows the creation request and the reason why it failed. This panel provides
detail for the underlying action that supports the failed function.
To view the CAPI trace:
● Select Manage > Utilities > Debug Utilities > View CAPI Trace.
The CAPI Command Trace panel is displayed. To change the number of commands
shown, select a value from the Requested Lines to Display and click Load/Reload CAPI
Command Trace. The default number of lines to display is 200. To capture this
information to a file, see “Saving Log Information to a File” on page 63.

Figure 23: CAPI Command Trace Example
5.7.6 Viewing Management Trace

The LAN Debug Trace panel shows a debug trace for the management controller. It traces
interface activity between the controllers’ internal processors and activity on the
management processor.
To view the management trace:
1. Select Manage > Utilities > Debug Utilities > View Mgmt Trace.
2. Select Load/Reload Debug Trace.
The LAN Debug Trace panel is displayed. Because the debug trace can be large, this
page limits the display to 100 entries. To view all trace entries and save this information
to a file, see “Saving Log Information to a File” on page 63.

Figure 24: LAN Debug (Management) Trace Example
5.7.7 Configuring Event Notification By Selecting Individual Events

As described in the “FibreCAT SX60 / SX80 / SX88 Administrator’s Guide”, you can
configure how and under what conditions the array alerts you when specific events occur.
In addition to selecting event categories, as a Diagnostic Manage-level user, you can select
individual events that you want to be notified of.
NOTE
Selecting many individual events can result in the array sending numerous event
notifications. Select the categories and individual events that are most important to
you.
Use this method when you want to track or watch for a specific event, warning, or error. You
can also use it to receive notification of specific functions being started or completed, such
as reconstruction or completion of initialization.
Individual event selections do not override the Notification Enabled or Event Categories
settings as explained in the “FibreCAT SX60 / SX80 / SX88 Administrator’s Guide”. If the
notification is disabled, the individual event selection is ignored. Similarly, Event Categories
settings have higher precedence for enabling events than individual event selection. If the
critical event category is selected, all critical events cause a notification regardless of the
individual critical event selection. You can select individual events to fine-tune notification
either instead of or in addition to selecting event categories. For example, you can select
the critical event category to be notified of all critical events, and then select additional
individual warning and informational events.
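The precedence described above can be summarized as a short decision function. This is a sketch of the rules as stated in this section, not the array's implementation; the function name, the category strings, and the numeric event identifiers are hypothetical.

```python
def should_notify(notification_enabled: bool,
                  event_category: str,
                  event_id: int,
                  selected_categories: set,
                  selected_events: set) -> bool:
    """Decide whether an event leads to a notification."""
    # 1. If notification is disabled, individual selections are ignored.
    if not notification_enabled:
        return False
    # 2. A selected category enables all of its events, regardless of
    #    whether the individual event is selected.
    if event_category in selected_categories:
        return True
    # 3. Otherwise the event must be selected individually.
    return event_id in selected_events
```

For example, with the critical category selected and one warning event selected individually, all critical events and that single warning event lead to a notification, while other warning events do not.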
To select events for notification:
1. Select Manage > Event Notification > Select Individual Events.
2. From the Manage menu, select the type of individual event you want to track:
● Critical Events. Represent serious device status changes that might require
immediate intervention.
● Warning Events. Represent device status changes that might require attention.
● Informational Virtual Disk Events. Represent device status changes related to
virtual disks that usually do not require attention.
● Informational Drive Events. Represent device status changes related to disk drives
that do not require attention.
● Informational Health Events. Represent device status changes related to the array’s
health that usually do not require attention.
● Informational Status Events. Represent device status changes related to the array’s
status that usually do not require attention.
● Informational Configuration Events. Represent device status changes related to the
array’s configuration that usually do not require attention.
● Informational Miscellaneous Events. Represent device status changes related to
informational events that usually do not require attention.
3. Select events by clicking the corresponding check box in the column.

4. For each event you want to be notified of, select a notification method.
For a description of each notification method, refer to the “FibreCAT SX60 / SX80 /
SX88 Administrator’s Guide”.
5. Click Change Events to save your changes.
5.8 Using Advanced Manage-Level Recovery and Debug Utilities

This section describes additional FibreCAT SX Manager’s WBI troubleshooting functions
that require the user to be logged in as an Advanced Manage-level user. Refer to the
“FibreCAT SX60 / SX80 / SX88 Administrator’s Guide” for information about defining user
configuration and setting access privileges.
5.8.1 Dequarantining a Virtual Disk

The quarantining process prevents the array from making a virtual disk critical and starting
reconstruction when the missing drive is just slow to spin up, not properly seated in its slot,
in an enclosure that is not powered up, or from an unknown virtual disk.
The array quarantines a virtual disk (shown by the Quarantined Virtual Disk icon) if it does
not see all of the virtual disk’s drives in these cases:
● After restarting one or both controllers, typically after powering up the array, or after a
failover
● After inserting a disk drive that is part of a virtual disk from another controller/disk
enclosure combination
The virtual disk can be fully recovered if the missing disk drives can be restored. Make sure
that no disk drives have been inadvertently removed or that no cables have been
unplugged. Sometimes not all drives in the virtual disk power up. Check that all enclosures
have rebooted after a power failure. If these problems are found and then fixed, the virtual
disk recovers and no data is lost.
The quarantined virtual disk’s drives are “write locked,” and the virtual disk is not available
to hosts until the virtual disk is dequarantined. The array waits indefinitely for the missing
drive. If the drive does spin up, the array automatically dequarantines the virtual disk. If the
drive never spins up, because it has been removed or has failed, you must dequarantine the
virtual disk manually.

If the missing drives cannot be restored (for example, a failed drive), you can use
dequarantine to restore operation in some cases. If the virtual disk is fault-tolerant and is not
missing too many drives and you dequarantine the virtual disk, it comes back up in a critical
state. If a spare of the appropriate size is available, it is used to reconstruct a critical virtual
disk.
NOTE
After you dequarantine the virtual disk, make sure that a spare drive is available to
let the virtual disk reconstruct.
CAUTION
If the virtual disk does not have enough drives to continue operation, it comes back
up in an offline state when it is dequarantined, and its data
is not recoverable.
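The two outcomes of a manual dequarantine described above can be sketched as follows. The parameter `tolerated_failures` is a hypothetical input standing for the number of drive losses the virtual disk's RAID level can survive (for example, 1 for a single-parity level and 0 for a level without redundancy); this manual does not define such a value.

```python
def state_after_dequarantine(missing_drives: int,
                             tolerated_failures: int) -> str:
    """Predict the virtual disk state after a manual dequarantine."""
    if missing_drives < 1:
        # With no drive missing, the array dequarantines automatically.
        raise ValueError("manual dequarantine applies only to missing drives")
    if tolerated_failures < 0:
        raise ValueError("tolerated_failures must not be negative")
    if missing_drives <= tolerated_failures:
        return "critical"   # comes back up; a spare can reconstruct it
    return "offline"        # too many drives missing; data not recoverable
```

Checking the expected outcome this way before clicking Dequarantine Selected Virtual Disk helps to avoid the offline case described in the caution above.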
To dequarantine a virtual disk:
1. Select Manage > Utilities > Recovery Utilities > Virtual Disk Quarantine.
2. Select the array you want to dequarantine.
3. Click Dequarantine Selected Virtual Disk.
5.8.2 Saving Log Information to a File

You can save the following types of log information to a file:
● Device status summary, which includes basic status and configuration information for
the array.
● Event logs from both controllers when in active-active mode.
● Debug logs from both controllers when in active-active mode.
● Boot logs, which show the startup sequence for each controller.
● Up to four critical error dumps from each controller. These will exist only if critical errors
have occurred.
● Management controller traces, which trace interface activity between the controllers’
internal processors and activity on the management processor.
To save log information:
1. Select Manage > Utilities > Debug Utilities > Save Logs to File.
2. Optional. Enter contact information to include in the log information file.
3. Under File Contents, choose the logs to include in the file.
By default, all logs are selected.

4. Click Generate Log Information.

A processing message is displayed, followed by the Log File Identification page.
5. Click Download Selected Logs to File to save the file to the management host.
The default file name is store.logs. If you intend to capture multiple event logs, be
sure to name the files appropriately so that they can be identified later.
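Because every download defaults to the same file name, a small helper on the management
host can give each capture a distinct name. The following Python sketch is only an
illustration: the function name, the naming pattern, and the array label are assumptions
and are not part of FibreCAT SX Manager.

```python
from datetime import datetime
from pathlib import Path


def unique_log_name(array_label: str, captured_at: datetime,
                    default_name: str = "store.logs") -> str:
    """Build a distinct file name from the default download name.

    array_label is a hypothetical, user-chosen identifier for the array.
    """
    base = Path(default_name)
    stamp = captured_at.strftime("%Y%m%d-%H%M%S")
    # Pattern (an assumption): <stem>_<label>_<timestamp><suffix>
    return f"{base.stem}_{array_label}_{stamp}{base.suffix}"


if __name__ == "__main__":
    print(unique_log_name("array1", datetime.now()))
```

Renaming the downloaded file this way keeps captures from several arrays or several points
in time apart when they are sent to support.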
5.9 Problems Accessing the Array Using FibreCAT SX Manager’s WBI

Table 10 lists problems you might encounter when using FibreCAT SX Manager’s WBI to
access the array.
Problem: You cannot access FibreCAT SX Manager’s WBI.
Solution:
– Verify that you entered the correct IP address.
– Enter the IP address using the format http://ip-address/index.html
– If the array has two controllers, enter the IP address of the partner controller.

Problem: FibreCAT SX Manager’s WBI pages do not display properly.
Solution:
– Configure your browser according to the information contained in the “FibreCAT SX60 /
SX80 / SX88 Administrator’s Guide”.
– Click Refresh or Reload in your browser to display the most current FibreCAT SX
Manager’s WBI information.
– Be sure that someone else is not accessing the array using the CLI. It is possible for
someone else to change the array’s configuration using the CLI. The other person’s
changes might not display in FibreCAT SX Manager’s WBI until you refresh the
FibreCAT SX Manager’s WBI page.
– If you are using Internet Explorer, clear the following option: Tools > Internet Options >
Accessibility > Ignore Colors specified on web pages.
– Prevent FibreCAT SX Manager’s WBI pages from being cached by disabling web page
caching in your browser.

Problem: Menu options are not available.
Solution:
– User configuration affects the FibreCAT SX Manager’s WBI menu. For example,
diagnostic functions are available only to users with Diagnostic access privileges. Refer
to the “FibreCAT SX60 / SX80 / SX88 Administrator’s Guide” for information on user
configuration and setting access privileges.

Problem: All user profiles have been deleted and you cannot log into FibreCAT SX
Manager’s WBI or the CLI with a remote connection.
Solution:
1. Use a terminal emulator (such as Microsoft HyperTerminal) to connect to the system.
2. In the emulator, press Enter to display the serial CLI prompt (#). No password is
required because the local host is expected to be secure.
3. Use the create user command to create new users. For information about using the
command, enter create user ? or refer to the “FibreCAT SX Manager Command Line
Interface (CLI)” manual.

Table 10: Problems Accessing the Array Using FibreCAT SX Manager’s WBI
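The address format given in Table 10 can be checked before it is typed into a browser. The
sketch below is a minimal illustration in Python, assuming an IPv4 management address; the
function name and the sample address are hypothetical.

```python
import ipaddress


def wbi_url(ip_text: str) -> str:
    """Return the WBI address in the form http://ip-address/index.html.

    Raises ValueError if ip_text is not a valid IPv4 address.
    """
    ip = ipaddress.IPv4Address(ip_text.strip())
    return f"http://{ip}/index.html"


if __name__ == "__main__":
    # 10.0.0.2 is a made-up example address, not a documented default.
    print(wbi_url("10.0.0.2"))
```

A mistyped address is rejected with an error instead of producing a URL that silently fails
to load.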

6 Troubleshooting Using Event Logs
Event logs capture reported events from components throughout the array. Each event
consists of an event code, the date and time the event occurred, which controller reported
the event, and a description of what occurred. This chapter describes the event logs and
includes the following topics:
● “Using Event Logs” on page 67
● “Using the Debug Log” on page 70
6.1 Using Event Logs

You can use FibreCAT SX Manager’s WBI or the CLI to view the event logs.
● FibreCAT SX Manager’s WBI
– Select Monitor > Status > View Event Log.
– Click the event log icon in the last panel of any screen.
– Save the event log to a file.
● CLI
– Run the show events command.
6.1.1 Saving the Event Log to a File

To save the event log to a file, perform the following steps:
1. Select Manage > Utilities > Debug Utilities > Save Logs to File.
2. Optional. Enter contact information to include in the log information file.
Contact information provides the support representatives who are reviewing the file a
means to identify who saved the log. In the additional comments field, enter the reason
the event logs are being saved as well as any pertinent information regarding system
faults.

3. Under File Contents, choose the logs to include in the file.

By default, all logs are selected.
4. Click Generate Log Information.
A processing message is displayed, followed by the Log File Identification page.
5. Click Download Selected Logs to File to save the file to the management host.
The default file name is store.logs. If you intend to capture multiple event logs, be
sure to name the files appropriately so that they can be identified later.
6.1.2 Event Log Format

Event logs are presented in three formats:
● Critical/warning events only
● Combined events
● Events separated by controller
All of the event log formats are presented as a table with column headings that vary,
depending on whether the logs are being viewed from FibreCAT SX Manager’s WBI or from
the event log capture. The log headings are:
● From FibreCAT SX Manager’s WBI:
– C/W – Type of event: Critical or Warning.
– Date/Time – The date and time the event occurred.
– EC – Error Code. A code used to look up the description of the error. A summary of
this description is provided in the Message column of the event log.
– ESN – Event Serial Number where the first letter indicates the controller reporting
the error.
– Message – A clear text explanation of the Error Code and Event Serial Number.

● From event log capture:

– Event ID – Event Serial Number where the first letter indicates the controller
reporting the error.
– Date/Time – The date and time the event occurred.
– Code – Error Code. A code used to look up the description of the error. A summary
of this description is provided in the Message column of the event log.
– Criticality – Type of event: Critical or Warning.
– Controller – The controller reporting the error.
– Description – A clear text explanation of the Error Code and Event Serial Number.
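Because the first letter of the Event Serial Number identifies the reporting controller, a
captured log can be split by controller with very little code. The Python sketch below
assumes that the letter is A or B, matching the controller names used elsewhere in this
manual; the exact ESN layout is not described here, and the sample values are made up.

```python
def reporting_controller(esn: str) -> str:
    """Return the controller letter taken from the first character of an ESN.

    Assumption: the letter is "A" or "B". Anything else is rejected.
    """
    esn = esn.strip()
    if not esn:
        raise ValueError("empty event serial number")
    letter = esn[0].upper()
    if letter not in ("A", "B"):
        raise ValueError(f"unexpected controller letter: {letter!r}")
    return letter


if __name__ == "__main__":
    for sample in ("A1234", "B77"):  # hypothetical ESN values
        print(sample, "->", reporting_controller(sample))
```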
6.1.3 Event Types

There are three event types: informational, warning, and critical.
● Informational – A problem occurred that the system corrected or a system change has
been made. These events are purely informational; no action is required.
● Warning – Something related to an array or a virtual disk has a problem. Correct the
problem as soon as possible.
● Critical – Something related to the array or virtual disk has failed and requires
immediate attention.
There are a number of conditions that trigger warning or critical events and can affect the
state of status LEDs. For a list of events, see “Event Codes” on page 115.
6.1.4 Reviewing Event Logs

Perform the following steps when reviewing events:
1. Review the critical/warning events.
Identify the primary event(s) and any that might be the cause of the primary event. For
example, an over temperature event could cause a disk drive failure.
2. Review the event log for the controller that reported the critical/warning event by viewing
the event log by controller. Locate the critical/warning event(s) in the sequence.
Repeat this step for the other controller if necessary.

3. Review the events that occurred before and after the primary event.
During this review you are looking for any events that might indicate the cause of the
critical/warning event. You are also looking for events that resulted from the
critical/warning event, known as secondary events.
4. Review the events following the primary and secondary events.
You are looking for any actions that might have already been taken to resolve the event.
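The review steps above can be sketched in Python for a log that has already been parsed into
records. This is an illustration under stated assumptions: the Event fields mirror the event
log capture headings, while the class, the function, the window size, and the lower-case
criticality strings are hypothetical choices, not a format defined by the array.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class Event:
    event_id: str       # Event Serial Number
    when: datetime      # Date/Time
    code: int           # Error Code
    criticality: str    # "informational", "warning", or "critical"
    controller: str     # controller that reported the event


def review(events: List[Event],
           window: int = 3) -> List[Tuple[Event, List[Event], List[Event]]]:
    """For each critical/warning event, collect the events from the same
    controller that occurred just before and just after it."""
    results = []
    for primary in events:
        if primary.criticality not in ("warning", "critical"):
            continue
        # View the log by controller, in time order.
        same = sorted((e for e in events if e.controller == primary.controller),
                      key=lambda e: e.when)
        i = same.index(primary)
        before = same[max(0, i - window):i]       # possible causes
        after = same[i + 1:i + 1 + window]        # possible secondary events
        results.append((primary, before, after))
    return results
```

The "before" list is where a cause such as an over temperature event would appear, and the
"after" list is where secondary events and any corrective actions would appear.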
6.2 Using the Debug Log

When instructed to do so by service personnel, you can set up and view the debug log, which
includes additional troubleshooting information. The debug log captures information that
helps engineering locate problems within the array logic.
After you set up the debug log per engineering’s instructions, you will need to run data to
the array or recreate the situation that is causing the fault. This populates the debug log with
information that engineering can use to diagnose the array.
NOTE
i The debug log only collects data after you set it up. It will not contain information
about any problems that occurred before you set it up.
6.2.1 Setting Up the Debug Log

To set up the debug log, perform the following steps:
1. Select Manage > Utilities > Debug Utilities > Debug Log Setup.
The Debug Log Setup page is displayed.
2. Select the debug log setup you want.
● Standard – Under normal conditions, you should select the Standard option. At
minimal impact to I/O performance, it collects a wide range of debug data.
● Fibre Channel Performance – Used for diagnosing Fibre Channel problems. Using
this option, the debug log is dedicated to collecting Fibre Channel information, with
minimal impact on I/O performance.
● Device-Side – Used for diagnosing device-side problems. It collects device failure
data as well as Fibre Channel information, with minimal impact on I/O performance.
● Device Management – Collects very verbose information, including all CAPI
transactions. Because this option collects a lot of data, it has a substantial impact on
performance and quickly fills up the debug trace.

● Custom Debug Tracing – Shows that specific events are selected for inclusion in the
log.
3. Click Change Debug Logging Setup.
4. If instructed by service personnel, click Advanced Debug Logging Setup Options and
select one or more additional types of events.
Under normal conditions, you should not select any of these options because they have
a slight impact on read/write performance.
6.2.2 Viewing the Debug Log

To view the debug log, perform the following steps:
1. Select Manage > Utilities > Debug Utilities > View Debug Log.
2. Select the number of debug lines to display.
The default is 400.
3. Select the controller from which to include debug lines.

7 Voltage and Temperature Warnings
The array provides voltage and temperature warnings, which generally indicate input power
or environmental conditions. Voltage warnings can occur if the input voltage is too low or if
a FRU is receiving too little or too much power from the power and cooling module.
Temperature warnings are generally the result of a fan failure, a FRU being removed from
the array for a lengthy time period, or a high ambient temperature around the enclosure.
This chapter describes the steps to take to resolve voltage and temperature warnings and
provides information about the power supply, cooling fan, temperature, and voltage sensor
locations and alarm conditions. Topics covered in this chapter include:
● “Resolving Voltage and Temperature Warnings” on page 73
● “Sensor Locations” on page 74
7.1 Resolving Voltage and Temperature Warnings

To resolve voltage and temperature warnings, perform the following steps:
1. Check that all of the fans are working by making sure each power and cooling module’s
DC Voltage & Fan Fault/Service LED is off or by using the FibreCAT SX Manager’s WBI
Status Summary page (see “Determining Overall Array Status and Verifying Faults” on
page 44).
2. Make sure that the controller modules are properly seated in their slots and that their
latches are locked.
3. Make sure that no slots are left open for more than two minutes.
If you need to replace a module, leave the old module in place until you have the
replacement or use a blank cover to close the slot. Leaving a slot open negatively
affects the airflow and may cause the unit to overheat.
4. Try replacing each power and cooling module one at a time.
5. Replace the controller modules, one at a time.

7.2 Sensor Locations

Monitoring conditions at different points within the enclosure enables you to avoid problems
before they occur. Power, cooling fan, temperature, and voltage sensors are located at key
points in the enclosure. The enclosure management processors (EMPs) monitor the status
of these sensors to perform SCSI enclosure services (SES) functions. Various FibreCAT SX
Manager’s WBI pages display the sensor information, for example Monitor > Status >
Module Status.
The following sections describe each element and its sensors.
7.2.1 Power Supply Sensors

As shown in Figure 25, each array has two fully redundant power and cooling modules with
load-sharing capabilities. The power supply sensors described in Table 11 monitor the
voltage, temperature, and fans in each power and cooling module. If the power supply
sensors report a voltage that is under or over the threshold, check the input voltage.
ID  Description         Location                    Alarm Conditions
0   Left power supply   Power and cooling module 0  Voltage, temperature, or fan fault
1   Right power supply  Power and cooling module 1  Voltage, temperature, or fan fault
Table 11: Power Supply Sensors

Figure 25: Power and Cooling Module and Cooling Fan Locations (labels: Fan 0 and Fan 1 in
power and cooling module 0; Fan 2 and Fan 3 in power and cooling module 1)
7.2.2 Cooling Element Fan

As shown in Figure 25, each power and cooling module includes two fans. The normal
range for fan speed is 4000 to 6000 RPM. When a fan’s speed drops below 4000 RPM, the
EMP considers it a failure and posts an alarm in the array’s event log. Table 12 lists the
element ID, location, and alarm condition for each fan. If the fan speed remains under the
4000 RPM threshold, the internal array temperature may continue to rise. Replace the
power and cooling module reporting the fault.
ID  Description  Location                    Alarm Condition
0   Fan 0        Power and cooling module 0  < 4000 RPM
Table 12: Cooling Element Fan Sensor Descriptions
During a shutdown, the cooling fans do not shut off. This allows the unit to continue cooling.
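The fan rule described in this section reduces to a single comparison. The Python sketch
below is illustrative only; the constant and function names are assumptions, and the manual
gives no alarm condition for speeds above the normal range.

```python
FAN_MIN_RPM = 4000  # below this, the EMP treats the fan as failed
FAN_MAX_RPM = 6000  # upper end of the documented normal range


def fan_status(rpm: int) -> str:
    """Classify a fan speed reading against the documented range."""
    if rpm < FAN_MIN_RPM:
        return "failed"
    if rpm <= FAN_MAX_RPM:
        return "normal"
    # No alarm condition is documented above 6000 RPM.
    return "above normal range"
```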

7.2.3 Temperature Sensors

Extreme high and low temperatures can cause significant damage if they go unnoticed.
There are 12 temperature sensors at key points in the enclosure. Table 13 lists the element
IDs, location, and alarm condition for each temperature sensor. Once a temperature fault is
noted, it must be remedied as quickly as possible to avoid system damage. This can be
done by warming or cooling the installation location.
ID  Description            Location                    Alarm Condition
0   Temperature sensor 0   Midplane left               < 32°F (0°C) or > 131°F (55°C)
1   Temperature sensor 1   Midplane left               < 32°F (0°C) or > 131°F (55°C)
2   Temperature sensor 2   Midplane center             < 32°F (0°C) or > 131°F (55°C)
3   Temperature sensor 3   Midplane center             < 32°F (0°C) or > 131°F (55°C)
4   Temperature sensor 4   Midplane right              < 32°F (0°C) or > 131°F (55°C)
5   Temperature sensor 5   Midplane right              < 32°F (0°C) or > 131°F (55°C)
6   Temperature sensor 6   Upper controller module     < 32°F (0°C) or > 140°F (60°C)
7   Temperature sensor 7   Upper controller module     < 32°F (0°C) or > 140°F (60°C)
8   Temperature sensor 8   Lower controller module     < 32°F (0°C) or > 140°F (60°C)
9   Temperature sensor 9   Lower controller module     < 32°F (0°C) or > 140°F (60°C)
10  Temperature sensor 10  Power and cooling module 0  < 32°F (0°C) or > 140°F (60°C)
11  Temperature sensor 11  Power and cooling module 1  < 32°F (0°C) or > 140°F (60°C)
Table 13: Temperature Sensor Descriptions
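The limits in Table 13 differ only by sensor location, so a reading can be checked with a
small lookup. The following Python sketch is a hedged illustration: the location keys and
function names are assumptions, and the readings would have to come from whatever
monitoring source is in use.

```python
# Alarm limits in °C, taken from Table 13.
LIMITS_C = {
    "midplane": (0.0, 55.0),
    "controller module": (0.0, 60.0),
    "power and cooling module": (0.0, 60.0),
}


def c_to_f(celsius: float) -> float:
    """Convert °C to °F, to compare against the Fahrenheit values in the table."""
    return celsius * 9.0 / 5.0 + 32.0


def temperature_alarm(location: str, celsius: float) -> bool:
    """Return True if the reading is outside the limits for that location."""
    low, high = LIMITS_C[location]
    return celsius < low or celsius > high
```

For example, a reading of 57°C raises an alarm on a midplane sensor but not on a controller
module sensor.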

7.2.4 Voltage Sensors

Voltage sensors ensure that the array’s voltage is within normal ranges. Table 14 lists the
element IDs, location, and alarm condition for each voltage sensor.
ID  Description        Location                                Alarm Condition
0   Voltage sensor 0   Power and cooling module 0 (5V)         < 4.00V or > 6.00V
                                                               > 13.00V
                                                               > 6.00V
                                                               > 13.00V
4   Voltage sensor 4   Upper controller module (2.5V Local)    < 2.25V or > 2.75V
5   Voltage sensor 5   Upper controller module (3.3V Local)    < 3.00V or > 3.60V
6   Voltage sensor 6   Upper controller module (midplane 5V)   < 4.00V or > 6.00V
7   Voltage sensor 7   Upper controller module (midplane 12V)  < 11.00V or > 13.00V
8   Voltage sensor 8   Lower controller module (2.5V Local)    < 2.25V or > 2.75V
9   Voltage sensor 9   Lower controller module (3.3V Local)    < 3.00V or > 3.60V
10  Voltage sensor 10  Lower controller module (midplane 5V)   < 4.00V or > 6.00V
11  Voltage sensor 11  Lower controller module (midplane 12V)  < 11.00V or > 13.00V
Table 14: Voltage Sensor Descriptions
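The alarm conditions in Table 14 depend only on the nominal rail, so they can be expressed
as one lookup per rail. The Python sketch below is an illustration under that assumption;
the rail keys and function name are hypothetical, and the limits are copied from the
controller module rows of the table.

```python
# (low, high) alarm limits in volts per nominal rail, from Table 14.
VOLTAGE_LIMITS = {
    "2.5V": (2.25, 2.75),
    "3.3V": (3.00, 3.60),
    "5V": (4.00, 6.00),
    "12V": (11.00, 13.00),
}


def voltage_alarm(rail: str, volts: float) -> str:
    """Return "under", "over", or "ok" for a reading on the given rail."""
    low, high = VOLTAGE_LIMITS[rail]
    if volts < low:
        return "under"
    if volts > high:
        return "over"
    return "ok"
```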

8 Troubleshooting and Replacing FRUs
This chapter describes how to troubleshoot and replace field-replaceable units. A
field-replaceable unit (FRU) is a system component that is designed to be replaced onsite.
This chapter contains the following sections:
● “Available FRUs” on page 80
● “Identifying a FibreCAT SX Enclosure by Its Serial Number” on page 83
● “Identifying FibreCAT SX Spares Lists in Ersin” on page 85
● “Filling Out the Field Return Tag Form” on page 88
● “Static Electricity Precautions” on page 89
● “Identifying Controller or Expansion Module Faults” on page 90
● “Removing and Replacing a Controller or Expansion Module” on page 91
● “Updating Firmware” on page 96
● “Identifying Cable Faults” on page 99
● “Identifying Drive Module Faults” on page 99
● “Removing and Replacing a Drive Module” on page 104
● “Identifying Virtual Disk Faults” on page 110
● “Identify Power and Cooling Module Faults” on page 111
● “Replacing an Enclosure” on page 114

8.1 Available FRUs

The following tables list the available FRUs for your storage system.
Available FibreCAT SX60 Controller Enclosure FRUs

Material No.  Part No.         Description
34001882      DHH:PFRUHK02-01  FRONT PLASTIC COVER LEFT SIDE SX60
34001884      DHH:PFRUHK04-01  FRONT PLASTIC COVER RIGHT SIDE FSC
34002999      SNP:A3C40087238  HDD SATA 750GB 7,2K
88038929      DHH:PFRUHA01-01  BOX, 2U, NEP, Chass+ Midplane, SX60, empty
88038940      DHH:PFRUHC01-01  RAID CONTROLLER SX60
88038941      DHH:PFRUHF04-01  SATA DISC 250GB 7.2K
88038943      DHH:PFRUHF06-01  HDD DUMMY
88038944      DHH:PFRUHE01-01  AC POWER SUPPLY
Table 15: FibreCAT SX60 Controller Enclosure FRUs

Material No.  Part No.         Description
34001252      DHH:PFRUHA01-02  BOX, 2U, NEP,CHASS+ MIDPLANE,SX80,empty
34002998      SNP:A3C40085280  HDD SAS 300GB 15K-SX
88038946      DHH:PFRUHF01-01  SAS DISC 73GB 15K
Table 16: FibreCAT SX80 Controller Enclosure FRUs


Material No.  Part No.         Description
34007210      DHH:PFRUHA01-04  BOX, 2U, NEP,CHASS+ MIDPLANE,EXP,EMPTY
34002998      SNP:A3C40085280  HDD SAS 300GB 15K-SX
Table 17: FibreCAT SX60 / SX80 / SX88 Controller Enclosure FRUs

Available FibreCAT SX60 / SX80 / SX88 Expansion Enclosure FRUs

Material No.  Part No.         Description
34001253      DHH:PFRUHA01-03  BOX, 2U, NEP,CHASS+ MIDPLANE, EXP, empty
34003814      SNP:A3C40085779  CASCADING CABLE FOR SINGLE SHIPMENT
88038948      DHH:PFRUHC03-01  EXPANSION (I/O) MODULE
Table 18: FibreCAT SX88 Expansion Enclosure FRUs
Order numbers for additional or changed FRUs can be retrieved using the spare part
information tool Ersin.

8.2 Identifying a FibreCAT SX Enclosure by Its Serial Number

The serial number of each FibreCAT SX enclosure is printed in human-readable and barcode
form on the left side of the enclosure and on the back, below the right power supply (see
Figure 26 and Figure 27).
Figure 26: Serial Number Location (1)
Figure 27: Serial Number Location (2)

8.2.1 Querying the Serial Number Remotely

Log in to a controller module of the FibreCAT controller enclosure in question via Telnet,
using a terminal emulator (for example, HyperTerminal). The default IP address, default user
name, default password, and the required settings are described in the “FibreCAT SX60 /
SX80 / SX88 Operating Manual”, chapter “Configuring an Array for the First Time”.
After logging in, you automatically work with the controller’s Command Line Interface (CLI).
1. Type the CLI command show configuration. The category ENCLOSURE FRU contains
the serial number at Configuration SN:
Figure 28: Querying the Serial Number Remotely (example, showing a FibreCAT SX80)
The first four letters of the FibreCAT SX serial numbers are:

● YKCT for FibreCAT SX controller enclosures
● YHAM or YKCM for FibreCAT SX expansion enclosures
8.2.2 Table of FibreCAT SX60 / SX80 / SX88 Serial Numbers
Serial number (1st four letters)  FibreCAT SX Unit

YHAK... or YKCK...                FibreCAT SX60 controller enclosure
YHAL... or YKCL...                FibreCAT SX80 controller enclosure
YKCT...                           FibreCAT SX88 controller enclosure
YHAM... or YKCM...                FibreCAT SX60 / SX80 expansion enclosure
Table 19: 1st Four Letters of the FibreCAT SX Serial Numbers
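The prefixes in Table 19 lend themselves to a simple lookup, for example when serial numbers
are collected from several enclosures. The Python sketch below is illustrative only: the
function name is an assumption, and the sample serial number is made up.

```python
# First four letters of the serial number, as listed in Table 19.
SERIAL_PREFIXES = {
    "YHAK": "FibreCAT SX60 controller enclosure",
    "YKCK": "FibreCAT SX60 controller enclosure",
    "YHAL": "FibreCAT SX80 controller enclosure",
    "YKCL": "FibreCAT SX80 controller enclosure",
    "YKCT": "FibreCAT SX88 controller enclosure",
    "YHAM": "FibreCAT SX60 / SX80 expansion enclosure",
    "YKCM": "FibreCAT SX60 / SX80 expansion enclosure",
}


def identify_unit(serial_number: str) -> str:
    """Map a serial number to its unit type using the first four letters."""
    prefix = serial_number.strip().upper()[:4]
    return SERIAL_PREFIXES.get(prefix, "unknown")


if __name__ == "__main__":
    print(identify_unit("YKCL000123"))  # hypothetical serial number
```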

8.3 Identifying FibreCAT SX Spares Lists in Ersin
Using the Serial Number
1. Check the first four letters of the serial number.
Figure 29: Checking the First Four Letters of the Serial Number (example, showing a FibreCAT SX80
controller enclosure)

2. Open Ersin and enter the first four letters at “Description”.
Figure 30: 1st Four Letters of the FibreCAT SX Serial Number in Ersin (example, showing a FibreCAT SX80
controller enclosure)

Using the Product Class
1. In Ersin, select “Product Class” “PR75-FSC Online Storage”.
Figure 31: Selecting the FibreCAT SX Product Class

2. Select one of the two FibreCAT SX spares lists.
Figure 32: Selecting one of the two FibreCAT SX Spares Lists
8.4 Filling Out the Field Return Tag Form

Every spare part (FRU) should be supplied with a Field Return Tag. When the replaced or
failed part is returned to the repairer, a Field Return Tag must be attached to the returned
module. Because some FRUs might not include the tag, three tags are attached to Support
Bulletin SB-STO-07002, which is available (for service partners only) at:
http://extranet.fujitsu-siemens.com/service/information/storage/online/supportbulletins/2007/SB-STO-07002.pdf
If the tag is missing, print the bulletin and attach one tag to each returned FRU. It is strongly
recommended that every replaced FRU shipped back to the repairer has a filled-out Field
Return Tag attached.

8.5 Static Electricity Precautions

To prevent damaging a FRU, make sure you adhere to the following static electricity
precautions:
● Remove plastic, vinyl, and foam from the work area.
● Wear an antistatic wrist strap, attached to a ground.
● Before handling a FRU, discharge any static electricity by touching a ground surface.
● Do not remove a FRU from its antistatic protective bag until you are ready to install it.
● When removing a FRU from a controller enclosure, immediately place the FRU in an
antistatic bag and in antistatic packaging.
● Handle a FRU only by its edges, and avoid touching the circuitry.
● Do not slide a FRU over any surface.
● Limit body movement (which builds up static electricity) during FRU installation.

8.6 Identifying Controller or Expansion Module Faults

The controller and expansion modules are the central components of the array and can
contain subcomponents that require the replacement of the entire FRU should they fail.
Each controller and expansion module contains LEDs that can be used to identify a fault.
Additionally, you can use FibreCAT SX Manager’s WBI to locate and isolate controller and
expansion module faults. See “Troubleshooting Using WBI” on page 43.
NOTE
i When troubleshooting, ensure that you review the reported events carefully. The
controller module is often the FRU reporting faults, but is not always the FRU where
the fault is occurring.
Table 20 lists the faults you might encounter with a controller module or expansion module.
Problem: The FRU OK LED is off.
Solution:
– Verify that the controller module is properly seated in the slot and latched.
– Check the FibreCAT SX Manager’s WBI event log for power-on initialization events and
diagnostic errors.

Problem: The FRU Fault LED is on.
Solution:
– Examine the event log to determine if there is any error event and take appropriate
action.
– Call technical support and send in the log and event files.
– Replace the controller that displayed the fault LED.

Problem: Only one controller module boots.
Solution:
– In a dual controller module configuration, if a conflict between controllers exists, only
controller module A will boot. For example, if the cache size is different on the controller
modules, controller module B will not boot.

Problem: The array reports an SDRAM memory error.
Solution:
– Replace the controller module where the error occurred.

Problem: Controller Failure (event codes 84 and 74)
Solution:
– The controller might need to have its firmware upgraded or be replaced.
– Check the specific error code to determine the corrective action to take.

Problem: Controller voltage fault
Solution:
– Check the power and cooling module and the input voltage.

Problem: Controller temperature fault
Solution:
– Check that the array’s fans are running.
– Check that the ambient temperature is not too warm. Refer to chapter “Important Notes”,
section “Site Requirements”, in the “FibreCAT SX60 / SX80 / SX88 Operating Manual” for
temperature specifications.
– Check for any obstructions to the airflow.
– When the problem is fixed, event 47 is logged.

Problem: Memory Error (event codes 65 and 138)
Solution:
– Contact Technical Support.
– The controller module needs to be replaced.
– Event 72 indicates that, after the failover to the other controller, the recovery has
started.

Problem: Flash write failure (event code 157)
Solution:
– The controller needs to be replaced.

Problem: Firmware mismatch (event code 89)
Solution:
– The downlevel controller needs to be upgraded.

Table 20: Controller Module or Expansion Module Faults
8.7 Removing and Replacing a Controller or Expansion Module

In a dual-controller configuration, stop all I/O during a controller or expansion module
replacement.
In a single-controller configuration, stop all I/O to the FibreCAT SX and power it off.
A controller/expansion module might need replacing when:
● The Fault/Service Required LED is illuminated
● Events in FibreCAT SX Manager’s WBI indicate a problem with the module
● Troubleshooting indicates a problem with the module
CAUTION
! After Expansion Module replacement, verify the Expansion Module Firmware.
CAUTION
! In a dual-controller configuration, both controllers must have the same cache size.
If the new controller has a different cache size, controller A will boot and controller
B will not boot. To view the cache size, select
Monitor > Advanced Settings > Controller Version.
8.7.1 Saving Configuration Settings

Before replacing a controller module, save the enclosure’s configuration settings to a local
management host. This makes a backup of your settings in case a subsequent configuration
change causes a problem, or in case you want to “clone” your array’s settings and apply
them to another array.

If you use FibreCAT SX Manager’s WBI to save the configuration settings, the file will
contain all FibreCAT SX configuration data, including the following settings:
● FC host port
● Enclosure management
● Options
● Disk
● LAN
● Service security
● Remote notification
NOTE
i The configuration file does not include any virtual disk or volume information. You
do not need to save this information before replacing the controller or expansion
module because it is saved to a special area on the disk drive.
To save your array’s configuration data to a file on the management host or another host on
your network using FibreCAT SX Manager’s WBI, perform the following steps:
1. Connect to the FibreCAT SX from FibreCAT SX Manager’s WBI using the IP address
for one of the controller modules.
2. Select Manage > Utilities > Configuration Utilities > Save Config File.
3. Click Save Configuration File.
4. If prompted to open or save the file, click Save.
5. Specify the file location and name, using a .config extension.
The default file name is saved_config.config.
NOTE
i If you are using Firefox and have a download directory set, the file is automatically
saved to it.
8.7.2 Shutting Down a Controller Module

You can replace a controller module without powering off the enclosure. You only need to
power off the enclosure when you plan to remove both controller modules or shut down the
entire system for maintenance, repair, or a move.
Shutting down the controller module halts I/O to the module, ensures that any data in the
write cache is written to disk, and forces a failover to the other controller module in the
enclosure.

CAUTION
! To ensure continuous availability of the system, be sure that the other controller
module is online before shutting down a controller module. Check the status of the
other controller module using FibreCAT SX Manager’s WBI to determine whether
the other controller module is online.
To shut down a controller module using FibreCAT SX Manager’s WBI, perform the following
steps:
1. Select Manage > Restart System > Shut Down/Restart.
2. In the Shut Down panel, select the controller module you want to shut down.
3. Click Shut Down.
A warning might appear that data access redundancy will be lost until the selected
controller is restarted. This is an informational message that requires no action.
4. Click OK.
FibreCAT SX Manager’s WBI shows that the module is shut down.
8.8 Removing a Controller Module or Expansion Module

You can remove a module without powering down the enclosure as long as the other module
of the same type in that enclosure remains online and active. However, you must first shut
down a controller module as described in “Shutting Down a Controller Module” on page 92.
When removing a controller module, the other controller module must be online.
CAUTION
! The enclosure might overheat if a module slot is left open for more than two minutes.
When you replace a module with the enclosure powered on, install the new module
within two minutes of removing the old module.
To remove a controller module or expansion module, perform the following steps.
1. Follow all Static Electricity Precautions as described in “Static Electricity Precautions”
on page 89.
2. Check the status of the other module of the type you want to remove.
To ensure continuous availability of the system, be sure that the other module of the
type you want to remove is online. If the other module is offline, resolve the problem with
that module before removing the module you want to replace.
3. If the other module is online and you are removing a controller module, shut down the
module that you want to remove as described in “Shutting Down a Controller Module”
on page 92.

You only need to use the Shut Down function for controller modules. The blue OK to
Remove LED illuminates to indicate that the module can be removed safely.
4. Use FibreCAT SX Manager’s WBI to illuminate the Unit Locator LED for the enclosure
where you want to replace the module.
a) Select Manage > General Config > Enclosure Management
b) Click on Illuminate Locator LED
5. Physically locate the module with the Unit Locator LED blinking.
On the Enclosure Management page, look at the System Panel at the bottom of the
page. This panel shows the status of the controllers. In the enclosure, controller A is
always on top, and controller B is always on the bottom.
6. If the controller module is connected to an expansion enclosure, disconnect the SAS
cables from the controller module before removing the controller.
7. Turn the thumbscrew on each ejector handle (see Figure 33) counterclockwise until the
screw disengages from the module.
Do not remove the screw from the handle.
Ejector handles
Controller A
Controller B
Thumbscrews
Figure 33: Location of Controller/Expansion Module Ejector Thumbscrews (Controller Modules Shown)
8. Rotate both ejector handles downward, supplying leverage to disengage the module
from the interior connector.

NOTE
i If you have not already shut down the module, ejecting it forces the module offline
regardless of the state of the other module.
9. Pull outward on the ejector handles to slide the module out of the chassis.
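The checks in steps 2 and 3 of the removal procedure can be summarized as a short pre-removal check. The following is a minimal sketch only; the `ModuleState` type and the `removal_blockers` function are hypothetical names used for illustration and are not part of FibreCAT SX Manager or its CLI.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModuleState:
    """Hypothetical snapshot of one module, filled in by the technician."""
    kind: str                # "controller" or "expansion"
    online: bool
    shut_down: bool = False  # only meaningful for controller modules


def removal_blockers(target: ModuleState, partner: ModuleState) -> List[str]:
    """Return the reasons why the target module should not be removed yet."""
    blockers = []
    # Step 2: the other module of the same type must be online.
    if not partner.online:
        blockers.append("partner module is offline; resolve that problem first")
    # Step 3: only controller modules need the Shut Down function.
    if target.kind == "controller" and not target.shut_down:
        blockers.append("controller module has not been shut down")
    return blockers
```

An empty list means that the documented preconditions are met; the remaining steps (locating the module, disconnecting SAS cables, releasing the ejector handles) are physical and are not modeled here.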
8.8.1 Installing a Controller Module or Expansion Module

You can install a module without powering off the enclosure.
CAUTION
! When replacing a controller module, ensure that less than 10 seconds elapses
between the time you insert a controller module into a slot and the time you fully
seat it and lock the ejector handles in place. Failing to do so might cause the
controller to fail. If it is not locked within 10 seconds, pull the controller module
completely out and repeat the process.
To install a controller module or an expansion module, perform the following steps.
1. Follow all static electricity precautions as described in “Static Electricity Precautions” on
page 89.
2. Loosen the thumbscrews on the ejector handles of the new module and rotate the
ejectors downward.
3. Orient the module with the ejector handles toward the top, and slide the module into
the appropriate slot as far as it will go.
For controller modules (shown in Figure 34) and expansion modules, you must make
sure the rails on both sides of the module slide into the guides on both sides of the slot.
Figure 34: Sliding a Controller Module Into an Enclosure
4. Rotate the ejector handles upward until they are flush with the top edge of the module,
and turn the thumbscrews on each ejector handle clockwise until they are finger-tight.

The OK LED illuminates green when the module completes its initialization and is
online.
NOTE
i If partner firmware update is selected, when you install a new controller module, the
controller module with the older firmware will update itself with the newer firmware
from the other controller.
8.8.1.1 Fault/Service Required
If the Fault/Service Required yellow LED is illuminated, the module has not gone online and
likely failed its self-test. Try to put the module online (see “Shutting Down a Controller
Module” on page 92) or check for errors that were generated in the event log from FibreCAT
SX Manager’s WBI.
8.8.1.2 Boot Handshake Error
When powering on the controllers, if a boot handshake error occurs, try turning off both
controllers for two seconds and then powering them back on. If this does not correct the
error, remove and replace each controller following the instructions in “Removing a
Controller Module or Expansion Module” on page 93.
8.9 Updating Firmware

Occasionally new firmware is released to provide new features and fixes to known issues.
The firmware is updated during controller replacement or by using FibreCAT SX Manager’s
WBI.
CAUTION
! Do not power off the array during a firmware upgrade. Doing so might cause
irreparable damage to the controllers.
8.9.1 Updating Firmware During Controller Replacement

CAUTION
! Stop all I/O during controller replacement.
When a replacement controller is sent from the factory, it might have a more recent version
of firmware installed than the surviving controller in your system. By default, when you
insert the replacement controller, the system compares the firmware of the existing
controller and that of the new controller. The controller with the older firmware
automatically downloads the firmware from the controller with the most recent firmware (partner
firmware upgrade). If told to do so by a service technician, you can disable the partner
firmware upgrade function using FibreCAT SX Manager’s WBI.
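The partner firmware upgrade decision described above can be illustrated with a small comparison. This is a sketch under stated assumptions: firmware versions are represented here as tuples of integers, which is an assumption made for illustration, and `partner_upgrade_target` is a hypothetical name, not a function of the array firmware or the WBI.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]  # assumed representation, for example (2, 1, 5)


def partner_upgrade_target(fw_a: Version, fw_b: Version,
                           enabled: bool = True) -> Optional[str]:
    """Return "A" or "B" for the controller that would be updated, else None.

    Mirrors the documented behavior: when partner firmware upgrade is
    enabled, the controller with the older firmware takes the firmware of
    the controller with the more recent firmware.
    """
    if not enabled or fw_a == fw_b:
        return None
    return "A" if fw_a < fw_b else "B"
```

For example, with controller A at an assumed version (1, 0, 3) and a replacement controller B at (1, 2, 0), the function returns "A", the surviving controller with the older firmware.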
8.9.1.1 Disabling Partner Firmware Upgrade
The partner firmware upgrade option is enabled by default in FibreCAT SX Manager’s WBI.
Only disable this function if told to do so by a service technician.
1. Select Manage > General Config > System Configuration.
2. For Partner Firmware Upgrade, select Disable.
8.9.2 Updating Firmware Using FibreCAT SX Manager’s WBI

CAUTION
! Stop all I/O during firmware update.
FibreCAT SX Manager’s WBI enables you to upgrade the firmware on your array when new
releases are available. To update your firmware using FibreCAT SX Manager’s WBI,
perform the following steps:
1. Ensure that the software package file is saved to a location on your network that the
array can access.
2. Select Manage > Update software > Controller software.
The Load Software panel is displayed, which describes the update process and lists
your current software versions.
3. Click Browse and select the software package file.
4. Click Load Software Package File.
If the array finds a problem with the file, it shows a message at the top of the page. To
resolve the problem, try the following:
● Be sure to select the software package file that you just downloaded.
● Download the file again, in case it got corrupted. Do not attempt to edit the file.
After about 30 seconds, the Load Software to Controller Module panel is displayed. This
page lets you know whether the file was validated and what software components are
in the file. The array only updates the software that has changes.
5. Review the current and new software versions, and then click Proceed with Code
Update.

A Code Load Progress window is displayed to show the progress of the update, which
can take several minutes to complete. The update procedure for one controller will take
up to 30 minutes. Do not power off the array during the code load process. Once the
firmware upload is complete, the controller resets after which the opposite controller
automatically repeats the process to load the new firmware. When the update
completes on the connected controller, you are logged out. Wait one minute for the
controller to start and click Log In to reconnect to FibreCAT SX Manager’s WBI.
If an enclosure firmware update is necessary (see release notes), you must update the
enclosure firmware from Controller A and Controller B.

8.10 Identifying Cable Faults

When identifying cable faults, remember that there are two sides of the controller:
the input/output to the host and the input/output to the expansion enclosures. Identifying
a cable fault can be difficult because the data paths consist of multiple components, any
of which might be the cause of the fault.
Before you take too many troubleshooting steps, ensure that you have reviewed the proper
cabling steps in the “FibreCAT SX60 / SX80 / SX88 Operating Manual”. Many faults can be
eliminated by properly cabling the array.
8.10.1 Identifying Cable Faults on the Host Side

To identify a faulty cable on the host side, use the host link status LED and perform the
troubleshooting procedure described in “Isolating Faults Using the Host Link Status LEDs”
on page 33.
8.10.2 Identifying Cable Faults on the Expansion Enclosure Side

To identify a cable fault on the expansion enclosure side, perform the troubleshooting
procedure described in “Isolating Faults Using the Expansion Port Status LED” on page 35.
8.11 Identifying Drive Module Faults

When identifying faults in drive modules you must:
● Understand disk-related errors
● Be able to determine if the error is due to a faulty disk drive or faulty disk drive channel
● Identify what action the controller has taken to protect the virtual disk after the drive fault
occurred (that is, rebuilding to a hot-spare)
● Know how to identify disk drives in the enclosure
● Understand the proper procedure for replacing a faulty drive module

8.11.1 Understanding Disk-Related Errors

The event log includes errors reported by the enclosure management processors (EMPs)
and disk drives in your FibreCAT SX. If you see these errors in the event log, the following
will help you to understand the errors.
When a disk detects an error, it reports it to the controller by returning a SCSI sense key,
and if appropriate, additional information. This information is recorded in the FibreCAT SX
Manager’s event log. Table 21 lists some of the most common SCSI sense key descriptions
(in hexadecimal). Table 22 lists the descriptions for the standard SCSI sense codes (ASC)
and sense code qualifiers (ASCQ), all in hexadecimal. Refer to the SCSI Primary
Commands - 2 (SPC-2) Specification for a complete list of ASC and ASCQ descriptions.
Sense Key Description
0h No sense
1h Recovered error
2h Not ready
3h Medium error
4h Hardware error
5h Illegal request
6h Unit attention
7h Data protect
8h Blank check
9h Vendor-specific
Ah Copy aborted
Bh Aborted command
Ch Obsolete
Dh Volume overflow
Eh Miscompare
Fh Reserved
Table 21: Standard SCSI Sense Key Descriptions

ASC ASCQ Description
0C 02 Write error - auto-reallocation failed
0C 03 Write error - recommend reassignment
11 00 Unrecovered read error
11 01 Read retries exhausted
11 02 Error too long to correct
11 03 Multiple read errors
11 04 Unrecovered read error - auto-reallocation failed
11 0B Unrecovered read error - recommend reassignment
11 0C Unrecovered read error - recommend rewrite the data
47 01 Data phase CRC error detected
Table 22: Common ASC and ASCQ Descriptions
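When reading many event log entries, it can help to translate the hexadecimal values from Table 21 and Table 22 into text. The following minimal sketch encodes only the values listed in those two tables; the function name `describe_sense` and the output format are illustrative assumptions, not the format used by the FibreCAT SX Manager event log.

```python
# Sense keys from Table 21 (hexadecimal).
SENSE_KEYS = {
    0x0: "No sense", 0x1: "Recovered error", 0x2: "Not ready",
    0x3: "Medium error", 0x4: "Hardware error", 0x5: "Illegal request",
    0x6: "Unit attention", 0x7: "Data protect", 0x8: "Blank check",
    0x9: "Vendor-specific", 0xA: "Copy aborted", 0xB: "Aborted command",
    0xC: "Obsolete", 0xD: "Volume overflow", 0xE: "Miscompare",
    0xF: "Reserved",
}

# ASC/ASCQ pairs from Table 22 (hexadecimal).
ASC_ASCQ = {
    (0x0C, 0x02): "Write error - auto-reallocation failed",
    (0x0C, 0x03): "Write error - recommend reassignment",
    (0x11, 0x00): "Unrecovered read error",
    (0x11, 0x01): "Read retries exhausted",
    (0x11, 0x02): "Error too long to correct",
    (0x11, 0x03): "Multiple read errors",
    (0x11, 0x04): "Unrecovered read error - auto-reallocation failed",
    (0x11, 0x0B): "Unrecovered read error - recommend reassignment",
    (0x11, 0x0C): "Unrecovered read error - recommend rewrite the data",
    (0x47, 0x01): "Data phase CRC error detected",
}


def describe_sense(key: int, asc: int, ascq: int) -> str:
    """Translate a sense key and ASC/ASCQ pair into readable text."""
    key_text = SENSE_KEYS.get(key, "Unknown sense key")
    # Pairs not listed in Table 22 must be looked up in the SPC-2 specification.
    detail = ASC_ASCQ.get((asc, ascq), "not listed in Table 22, see SPC-2")
    return f"{key_text} ({key:X}h), ASC/ASCQ {asc:02X}/{ascq:02X}: {detail}"
```

For example, `describe_sense(0x3, 0x11, 0x00)` yields "Medium error (3h), ASC/ASCQ 11/00: Unrecovered read error".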
8.11.2 Disk Drive Errors

In general, media errors (sense key 3), recovered errors (sense key 1), and SMART events
(identified by the text “SMART event” in the event logs) clearly point to a problem
with a specific drive. Other events, such as protocol errors and I/O time-outs, might suggest
drive problems but might also indicate poorly seated or faulty cables, problems with
particular drive slots, or even problems with the drive’s dongle, a small printed circuit board
attached to the drive carrier of each drive. Each of these events may result in a warning or
critical notification in the FibreCAT SX Manager and the event log.
Problem Event Code Solution Event Code
Impending disk drive failure 55 Replace the disk before failure. 8
Ensure that the virtual disk that includes this disk is fault
tolerant. If it is not, add a spare disk to the FibreCAT SX.
The virtual disk will automatically use the spare when the
failed disk is removed.
Table 23: Disk Drive Problems

8.11.3 Disk Channel Errors

Disk channel errors are similar to disk-detected errors, except they are detected by the
controllers instead of the disk drive. Some disk channel errors are displayed as text strings.
Others are displayed as hexadecimal codes.
If the error is a critical error, perform the steps in “Disk Drive Errors” on page 101.
Table 24 lists the descriptions for disk channel errors. Most disk channel errors are
informational because the FibreCAT SX issues retries to correct any problem. Errors that cannot
be corrected with retries will result in another critical event describing the affected FibreCAT
SX (if any).
Error Code Description
CRC Error CRC error on data was received from a target.
Dev Busy Target reported busy status.
Dn/Ov Run Data overrun or underrun has been detected.
IOTimeout FibreCAT SX aborted an I/O request to this target because it timed
out.
Link Down Link down while communication in progress.
LIP I/O request was aborted because of a channel reset.
No Respon No response from target.
Port Fail Disk channel hardware failure. This may be the result of bad cabling.
PrtcolError FibreCAT SX detected an unrecoverable protocol error on the part of
the target.
QueueFull Target reported queue full status.
Stat: 04 Data overrun or underrun occurred while getting sense data.
Stat: 05 Request for sense data failed.
Stat: 32 Target has been reserved by another initiator.
Stat: 42 I/O request was aborted because of FibreCAT SX’s decision to reset
the channel.
Stat: 44 FibreCAT SX decided to abort I/O request for reasons other than bus
or target reset.
Stat: 45 I/O request was aborted because of target reset requested by
FibreCAT SX.
Stat: 46 Target did not respond properly to abort sequence.
Table 24: Disk Channel Error Codes

8.11.4 Identifying Faulty Drive Modules

To identify faulty drive modules, perform the following steps:
1. Does the fault involve a single drive?
● If yes, perform Step 2 through Step 4.
● If an entire enclosure of disk drives is faulty, check your cabling and, if necessary,
perform the steps in “Identifying Cable Faults” on page 99.
2. Identify the suspected faulty disk drive using the LEDs.
3. Replace the suspected faulty disk drive with a known good drive (a replacement drive).
4. Does this correct the fault?
● If yes, the fault has been corrected and no further action is necessary.
● If no, continue to Step 5.
5. The fault may be caused by a bad disk drive slot on the midplane. Confirm your findings
by powering off the array, moving an operating disk drive into the suspected slot, and
re-applying power.
NOTE
i Step 5 requires that you schedule down time for the system.
6. Does this drive fail when placed in the suspected slot?
● Yes, replace the enclosure. You have located the faulty FRU.
● No, continue to Step 7.
7. If it does not fail, move the drive back to its original slot and ensure that the replacement
drive is fully inserted into the slot.
If the drive fails again, the midplane might have an intermittent fault or the connector might
be dirty. Replace the enclosure.
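The isolation procedure above is a small decision tree. The following sketch restates it as a function that returns the next action for the observations made so far; the function name and parameters are hypothetical and exist only to make the branching explicit.

```python
from typing import Optional


def next_action(single_drive: bool,
                fixed_by_replacement: Optional[bool] = None,
                good_drive_fails_in_slot: Optional[bool] = None) -> str:
    """Return the next step of the drive fault isolation procedure.

    Pass None for an observation that has not been made yet.
    """
    # Step 1: a whole enclosure of faulty drives points at cabling.
    if not single_drive:
        return "check cabling (see Identifying Cable Faults)"
    # Steps 2 and 3: locate the drive by its LEDs and swap in a known good drive.
    if fixed_by_replacement is None:
        return "identify the drive by its LEDs and replace it with a known good drive"
    # Step 4: the replacement corrected the fault.
    if fixed_by_replacement:
        return "no further action"
    # Step 5: requires scheduled downtime.
    if good_drive_fails_in_slot is None:
        return "power off, move an operating drive into the suspected slot, power on"
    # Step 6: the slot on the midplane is faulty.
    if good_drive_fails_in_slot:
        return "replace the enclosure"
    # Step 7: reseat and observe; a repeated failure also means enclosure replacement.
    return "move the drive back, reseat the replacement drive, replace the enclosure if it fails again"
```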

8.12 Removing and Replacing a Drive Module

A drive module consists of a disk drive in a sled. Drive modules are hot-swappable, which
means they can be replaced without halting I/O to the FibreCAT SX or powering it off.
CAUTION
! To prevent any possibility of data loss, back up data to another virtual disk or other
location before removing the drive module.
CAUTION
! When you replace a drive module, the new module must be designed for the array,
must be of the same type (SAS or SATA), and must have a capacity equal to or greater than
that of the drive module you are replacing. Otherwise the array will not accept the new disk
drive for the virtual disk.
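The type and capacity rule in the caution above can be expressed as a simple check before a replacement drive is taken to the enclosure. This is a minimal sketch; the `Drive` type, its field names, and the use of gigabytes as the unit are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    """Hypothetical description of a drive module."""
    interface: str    # "SAS" or "SATA"
    capacity_gb: int  # assumed unit: gigabytes


def is_acceptable_replacement(old: Drive, new: Drive) -> bool:
    """True if the new drive has the same type and at least the same capacity."""
    same_type = new.interface.upper() == old.interface.upper()
    large_enough = new.capacity_gb >= old.capacity_gb
    return same_type and large_enough
```

The check does not cover the first condition of the caution, that the module must be designed for the array; that still has to be verified against the part number.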
If you are using disk management software or volume management software to manage
your disk storage, you might need to perform software operations to take a drive module
offline before you remove it and then, after you have replaced it, to bring the new drive
module online. Refer to the documentation that accompanies your disk management
software or volume management software for more information.
8.12.1 Replacing a Drive Module When the Virtual Disk Is Rebuilding

When a drive module fails or is removed, the system rebuilds the virtual disk by restoring
any data that was on the failed disk drive onto a global spare or virtual disk spare, if one is
available. If you replace more than one drive module at a time, the virtual disk cannot be
rebuilt. If more than one drive module fails in a virtual disk (except RAID 50 and 10), the
virtual disk fails and data from the virtual disk is lost.
When you want to replace a drive module and a virtual disk to which it belongs is being
rebuilt, you have two options:
● Wait until the rebuild process is completed, then replace the defective drive module. The
benefit is that the virtual disk is fully restored before you replace the defective drive. This
eliminates the possibility of lost data if the wrong drive is removed.
● Replace the defective drive and make the new drive a global spare while the rebuilding
process continues. This procedure installs the new drive and assigns it as a global
spare so that an automatic rebuild can occur if a drive module fails on another virtual
disk.
If a drive module fails in another virtual disk before a new global spare is assigned, you
must manually rebuild the virtual disk.

8.12.2 Identifying the Location of a Faulty Drive Module

Before replacing a drive module, perform the following steps to ensure that you have
identified the correct drive module for removal.
CAUTION
! Failure to identify the correct drive module might result in data loss from removing
the wrong drive.
1. When a disk drive fault occurs, the failed disk drive’s lower LED is solid yellow, indicating
that it must be replaced; locate the yellow LED at the front of the drive module.
2. To verify the faulty drive module from FibreCAT SX Manager’s WBI, select
Monitor > Status > Status Summary.
3. In the Virtual Disk Overview panel, locate and click any critical virtual disks. The
Virtual Disk Status panel is displayed. As shown in Figure 35, the Virtual Disk Drive List
panel shows the status of the faulty drive as Down.
Figure 35: Virtual Disk Drive List Panel.
4. Replace the failed module by following the instructions in “Removing a Drive Module”
on page 106.
You can also use the CLI show enclosure-status command. If the drive status is “Absent”
the drive might have failed, or it has been removed from the chassis. For details on the show
enclosure-status command, refer to the “FibreCAT SX Manager Command Line
Interface (CLI)” manual.

8.12.3 Removing a Drive Module

When you remove a drive module, it is important to maintain optimum airflow through the
chassis by either replacing it immediately with another one or by using an air management
module. If you do not have a replacement module or an air management module, do not
remove the drive module; it is not harmful to the storage system to keep a faulty drive
inserted until you have a replacement drive. If you do have an air management module, it
is installed using the same procedure for removing a drive module as described below.
CAUTION
! If you remove a drive module and do not replace it within two minutes, you alter the
air flow inside the enclosure, which could cause overheating of the enclosure. Do
not remove a drive module unless you have a replacement drive module or air
management module to immediately replace the one you removed.
To remove a drive from an enclosure, perform the following steps:
1. Follow all static electricity precautions as described in “Static Electricity Precautions” on
page 89.
2. Squeeze the release on the left edge of the drive ejector handle (see Figure 36).
3. Rotate the handle toward the right to disengage the drive module from the enclosure’s
internal connector.
Figure 36: Releasing the Drive Module Handle
(Figure callouts: drive ejector handle, squeeze point on the release)
4. Wait 20 seconds for the internal disks to stop spinning.

5. Pull the drive module out of the chassis.

8.12.4 Installing a Drive Module

To install a drive module, perform the following steps:
1. Follow all static electricity precautions as described in “Static Electricity Precautions” on
page 89.
2. If the ejector handle is closed, squeeze the release on the left edge of the drive ejector
handle and rotate the handle toward the right to open the locking mechanism (see
Figure 36).
3. Orient the drive module with the LEDs to the left, and slide the drive module into the
drive slot as far as it will go.
4. Rotate the drive ejector handle toward the left until the release clicks closed to firmly seat
the drive module in the enclosure’s internal connector.
If the controller enclosure is powered on, the green Power/Activity/Fault LED
illuminates, indicating that the disk drive is functional.
5. Use FibreCAT SX Manager’s WBI status page (Manage > Vdisk Configuration > Disk
Drive Status) to check the status of the disk and then use Table 25 to determine how to
continue.
Status: The status of the virtual disk that originally had the failed drive is Good. A global
or virtual disk (dedicated) spare has been successfully integrated into the virtual disk and the
replacement drive module can be assigned as either a global spare or a virtual disk spare.
Action: Use FibreCAT SX Manager’s WBI to assign the new drive module as either a global
spare or a vdisk spare: Select Manage > Virtual Disk Config > Global Spare Menu.

Status: The status of the disk drive just installed is LEFTOVER.
Action: All of the member disk drives in a virtual disk contain metadata in the first sectors.
The array uses the metadata to identify virtual disk members after restarting or replacing
enclosures. Use FibreCAT SX Manager’s WBI to clear the metadata if you have a disk drive that
was previously a member of a virtual disk. After you clear the metadata, you can use the disk
drive in a virtual disk or as a spare: Select Manage > Utilities > Disk Drive Utilities >
Clear Metadata. Select the disk, and click on Clear Metadata for Selected Disk Drives.

Status: The status of the virtual disk that originally had the failed drive is FATAL FAIL.
Two or more drive modules have failed.
Action: All data in the virtual disk is lost. Use the FibreCAT SX Manager’s WBI Trust Virtual
Disk function to attempt to bring the virtual disk back online: Select Manage > Utilities >
Recovery Utilities > Trust Virtual Disk.
Note: You must be a Diagnostic Manage-level user to access the Trust Virtual Disk submenu.
Refer to the “FibreCAT SX60 / SX80 / SX88 Administrator’s Guide” for more information on
access privileges.

Status: The status of the virtual disk that originally had the failed drive is DRV ABSENT or
INCOMPLETE. These status indicators only occur when the enclosure is initially powered up.
DRV ABSENT indicates that one drive module is bad. INCOMPLETE indicates that two or more
drive modules are bad.
Action: See “Verify that the Correct Power-On Sequence was Performed” on page 109. If the
power-on sequence was correct, locate and replace the additional failed drive modules.

Status: The status of the virtual disk that originally had the failed drive indicates that the
virtual disk is being rebuilt.
Action: Wait for the virtual disk to complete its operation.

Status: The status of the virtual disk that originally had the failed drive is DRV FAILED.
Action: If this status occurs after you replace a defective drive module with a known good
drive module, the enclosure midplane might have experienced a failure. Replace the enclosure.

Table 25: Disk Drive Status
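For quick reference at the rack, Table 25 can be condensed into a lookup from status to action. The sketch below is illustrative only: the status strings Good, LEFTOVER, FATAL FAIL, DRV ABSENT, INCOMPLETE, and DRV FAILED are taken from Table 25, while the key "REBUILDING" is a placeholder chosen here because the table does not give the exact text shown during a rebuild.

```python
VERIFY_POWER_ON = ("Verify the power-on sequence; if it was correct, "
                   "locate and replace the additional failed drive modules.")

# Condensed from Table 25. "REBUILDING" is a placeholder key (assumption).
STATUS_ACTIONS = {
    "GOOD": "Assign the new drive module as a global spare or a vdisk spare.",
    "LEFTOVER": "Clear the metadata, then use the drive in a virtual disk or as a spare.",
    "FATAL FAIL": "Data is lost; attempt recovery with the Trust Virtual Disk function.",
    "DRV ABSENT": VERIFY_POWER_ON,
    "INCOMPLETE": VERIFY_POWER_ON,
    "REBUILDING": "Wait for the virtual disk to complete its operation.",
    "DRV FAILED": "If the replacement drive is known good, replace the enclosure.",
}


def action_for(status: str) -> str:
    """Look up the condensed action for a status, ignoring case and padding."""
    return STATUS_ACTIONS.get(status.strip().upper(),
                              "Status not covered by Table 25.")
```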

8.12.5 Verify that the Correct Power-On Sequence was Performed

Review the power-on sequence which you most recently used with the enclosure. If you are
uncertain about the sequence used, repeat the power-on sequence in the following order
and see if it results in a Good status for the virtual disk that originally had the failed drive.
1. Power up the enclosures and associated data host in the following order:
a) Expansion enclosures first
b) Controller enclosure next
c) Data hosts last (if they had been powered down for maintenance purposes)
2. In FibreCAT SX Manager’s WBI, select Monitor > Status > Vdisk Status to display the
virtual disk overview panel.
This panel displays an icon for each virtual disk with information about the virtual disk
below it.
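The power-on order in step 1 can be written down as a simple ordering rule, which may be useful when a site has several enclosures and hosts. This is a minimal sketch; the role names and the `power_on_order` function are hypothetical and only restate the documented sequence.

```python
from typing import List, Tuple

# Documented sequence: expansion enclosures, then the controller enclosure,
# then data hosts.
POWER_ON_RANK = {"expansion": 0, "controller": 1, "host": 2}


def power_on_order(devices: List[Tuple[str, str]]) -> List[str]:
    """Sort (name, role) pairs into the documented power-on sequence.

    Devices with the same role keep their original relative order.
    """
    ordered = sorted(devices, key=lambda device: POWER_ON_RANK[device[1]])
    return [name for name, _role in ordered]
```

For example, a list containing one host, one controller enclosure, and two expansion enclosures is returned with both expansion enclosures first and the host last.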
8.12.6 Installing an Air Management Module

An air management module looks like a drive module; however, it is an empty box used to
maintain optimum airflow and proper cooling in an enclosure. If your system was ordered
with fewer than 12 drive modules, it was shipped with air management modules for the slots
without drive modules. Optionally, air management modules can be ordered.
If you must remove a drive module and cannot immediately replace it, you must leave the
faulty drive module in place, or insert an air management module to maintain the optimum
airflow inside the chassis. The blank is installed using the same procedure as “Installing a
Drive Module” on page 107.

8.13 Identifying Virtual Disk Faults

Obvious virtual disk problems involve the failure of a member disk drive. However, there are
a number of not so obvious issues that result in virtual disk faults as seen in Table 26.
Problem: Expanding a virtual disk requires days to complete.
Solution: In general, expanding a virtual disk can take days to complete. You cannot stop the
expansion once it is started. If you have an immediate need, create a new virtual disk of the
size you want, transfer your data to the new virtual disk, and delete the old virtual disk.

Problem: Failover causes a virtual disk to become critical when one of its drives “disappears.”
Solution: In general, controller failover is not supported if a disk drive is in an expansion
enclosure that is connected with only one cable to the controller enclosure. This is because
access to the expansion enclosure will be lost if the controller to which it is connected fails.
When the controller with the direct connection to the expansion enclosure comes back online,
access to the expansion enclosure drives is restored. To avoid this problem, ensure that two
cables are used to connect the enclosures as shown in the “FibreCAT SX60 / SX80 / SX88
Operating Manual” and that the cables are connected securely and are not damaged.
If the problem persists or affects a disk drive in a controller enclosure, a hardware problem
might have occurred in the drive module, dongle, midplane, or controller module. Identify and
replace the FRU where the problem occurred.

Problem: A virtual disk is much smaller than it should be.
Solution: Verify that the disk drives are all the same size within the virtual disk. The virtual
disk is limited by the smallest sized disk.

Problem: Volumes in the virtual disk are not visible to the host.
Solution: Verify that the volumes are mapped to the host using FibreCAT SX Manager’s WBI:
Manage > Volume Management > Volume Mapping > Map by Volume.

Problem: Virtual Disk Degraded. Event codes 58 and 1, or event codes 8 and 1.
Solution: Replace the failed disk drive and add the replaced drive as a spare to the critical
virtual disk. If you have dynamic spares enabled, you only need to replace the drive. The
system will automatically reconstruct the virtual disk.

Problem: Virtual Disk Failure. Event codes 58 and 3, or event codes 8 and 3.
Solution: Replace the bad disk drive and restore the data from backup.

Problem: Virtual Disk Quarantined. Event code 172.
Solution: Ensure that all drives are turned on. When the vdisk is de-quarantined, event
code 79 is returned.

Problem: Spare Disk Failure. Event code 62.
Solution: Replace the disk. If this disk was a dedicated spare for a vdisk, assign another
spare to the vdisk.

Problem: Spare Disk Unusable. Event code 78.
Solution: The disk might not have a great enough capacity for the vdisk. Replace the spare
with a disk that has a capacity equal to or greater than the smallest disk in the vdisk.

Problem: Mixed drive type errors.
Solution: Virtual disks do not support mixed drive types. Verify that the drives in the virtual
disk are of the same type (SATA or SAS) and that they have the same capacity. If you attempt
to build a virtual disk with mixed drive types you will receive an error. If you attempt to build
a virtual disk with various sized disk drives, a warning will be displayed. The capacity of the
smallest disk will be set for all others.

Table 26: Virtual Disk Faults
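The event code combinations in Table 26 can be checked mechanically against a set of codes collected from the event log. The following sketch uses only the codes named in Table 26; the function name `classify_vdisk_events` and the idea of passing the codes as a set are assumptions made for illustration, not a feature of FibreCAT SX Manager or the CLI.

```python
from typing import List, Set


def classify_vdisk_events(codes: Set[int]) -> List[str]:
    """Map event codes seen together in the log to Table 26 problems."""
    findings = []
    # Codes 58 or 8 identify the drive event; 1 and 3 give the vdisk state.
    drive_event = 58 in codes or 8 in codes
    if drive_event and 1 in codes:
        findings.append("Virtual Disk Degraded")
    if drive_event and 3 in codes:
        findings.append("Virtual Disk Failure")
    if 172 in codes:
        findings.append("Virtual Disk Quarantined")
    if 79 in codes:
        findings.append("Virtual Disk De-quarantined")
    if 62 in codes:
        findings.append("Spare Disk Failure")
    if 78 in codes:
        findings.append("Spare Disk Unusable")
    return findings
```

For example, the set {8, 1} is reported as "Virtual Disk Degraded", for which Table 26 recommends replacing the failed disk drive and adding the replaced drive as a spare.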
8.13.1 Clearing Metadata From a Disk Drive

All of the member disk drives in a virtual disk contain metadata in the first sectors. The array
uses the metadata to identify virtual disk members after restarting or replacing trays.
Clear the metadata if you have a disk drive that was previously a member of a virtual disk.
Disk drives in this state display “Leftover” in the Display All Devices page and in the Clear
Metadata page. After you clear the metadata, you can use the disk drive in a virtual disk or
as a spare.
To clear metadata from a disk drive, see “Clearing Metadata From a Disk Drive” on
page 111.
8.14 Identify Power and Cooling Module Faults

When isolating faults in the power and cooling module, it is important to remember that the
module consists of two primary components: fans and a power supply. When either of these
components fails, FibreCAT SX Manager’s WBI provides notification, the faults are
recorded in the event log, and the power and cooling module’s status LED changes from
green to yellow. Alternatively, you can use the CLI to poll for events; refer to the “FibreCAT
SX Manager Command Line Interface (CLI)” manual.

NOTE
i When a power supply fails, the fans of the module continue to operate because they
draw power from the power bus located on the midplane.
Once a fault is identified in the power and cooling module, you need to replace the entire
module.
CAUTION
! Because removing the power and cooling module significantly disrupts the
enclosure’s airflow, do not remove the power and cooling module until you have the
replacement module.
Table 27 lists possible power and cooling module faults.
Fault: Power supply fan warning or failure, or power supply warning or failure. Event code 168.
Solution: Check that all of the fans are working using FibreCAT SX Manager’s WBI.
Make sure that no slots are left open for more than 2 minutes. If you need to replace a module,
leave the old module in place until you have the replacement, or use a blank cover to close the
slot. Leaving a slot open negatively affects the airflow and might cause the unit to overheat.
Make sure that the controller modules are properly seated in their slots and that their latches
are locked.

Fault: Power and cooling module status is listed as failed or you receive a voltage event
notification. Event code 168.
Solution: Check that the switch on each power and cooling module is turned on.
Check that the power cables are firmly plugged into both power and cooling modules and into
an appropriate electrical outlet.
Replace the power and cooling module.

Fault: AC Power LED is off.
Solution: Same as above.

Fault: DC Voltage & Fan Fault/Service LED is on.
Solution: Replace the power and cooling module.
Table 27: Power and Cooling Module Faults
8.14.1 Removing and Replacing a Power and Cooling Module

A single power module is sufficient to maintain operation of the enclosure. It is not
necessary to halt operations and completely power off the enclosure when replacing only
one power module.

CAUTION
! When you remove a power and cooling module, install the new module within two
minutes of removing the old module. The enclosure might overheat if you take more
than two minutes to replace the power and cooling module.
To remove a power and cooling module from an enclosure, perform the following steps:
1. Follow all static electricity precautions as described in “Static Electricity Precautions” on
page 89.
2. Set the power switch on the module to the Off position.
3. Disconnect the power cable.
4. Turn the thumbscrew at the top of the latch (see Figure 37) counterclockwise until the
thumbscrew is disengaged from the power and cooling module.
Do not remove the thumbscrew from the latch.
Figure 37: Removing the Power and Cooling Module from the Chassis
(Figure callouts: thumbscrew, latch)
5. As shown in Figure 37, rotate the latch downward to about 45 degrees, supplying
leverage to disconnect the power and cooling module from the internal connector.
6. Use the latch to pull the power and cooling module out of the chassis.
NOTE
i Do not lift the power and cooling module by the latch. This could break the latch.
Hold the power and cooling module by the metal casing.

8.14.2 Installing a Power and Cooling Module

To install a power and cooling module in an enclosure, perform the following steps:
1. Orient the new power and cooling module with the AC connector and power switch
toward the right as shown in Figure 37, and slide the module into the power supply slot
as far as it will go.
2. Rotate the latch upward so that it is flush against the power and cooling module to ensure
that the connector on the module engages the connector inside the chassis.
3. Turn the thumbscrew at the top of the power supply latch clockwise until it is finger-tight
to secure the latch to the power and cooling module.
4. Reconnect the power cable.
5. Set the power switch to the On position.
8.15 Replacing an Enclosure

The enclosure consists of the enclosure’s metal housing and the midplane that connects
controller/expansion modules, drive modules, and power and cooling modules. This FRU
replaces an enclosure that has been damaged or whose midplane has been damaged.
A damaged midplane often appears as though a controller module has failed. If
replacing a controller module does not remedy the original fault, replace the
enclosure.
To make a fully functional enclosure, you must insert the following parts from the replaced
enclosure:
● Drive modules and air management modules
● Two power and cooling modules
● One or two controller modules (for a controller enclosure)
● One or two expansion modules (for an expansion enclosure)
To install the individual modules, use the replacement instructions provided in this guide. To
configure the enclosure, refer to the “FibreCAT SX60 / SX80 / SX88 Operating Manual”.
CAUTION
Data loss could occur if connected data hosts are still active during this replacement
procedure.

9 Appendix
9.1 Event Codes

Event messages appear in the event log, which you can view using FibreCAT SX Manager
or the CLI, and in debug logs. You may also receive notifications, depending on your
FibreCAT SX Manager event notification settings.
Table 28 describes critical, warning, and informational events that can occur during
operation. Events are listed in order by numeric event code. Recommended actions
available at this time are also listed.
Event Code   Event Type   Description   Recommended Action
1 Warning A disk drive in the specified vdisk failed. The vdisk is online but not fault
tolerant.
Recommended action: Refer to Table 24 on page 102 for recommended action. Replace the
failed disk drive and add it as a vdisk spare to the critical vdisk. If you have dynamic
spares enabled, you only need to replace the disk drive. The system will automatically
reconstruct the vdisk.
3 Critical The specified vdisk is now offline.
4 Informational A drive in a vdisk had an error that could
not be corrected. The controller has
reassigned the block.
6 Informational or warning Creating the specified vdisk failed during vdisk creation.
This event is considered informational if it occurs immediately or is aborted by the
user, and a warning if it occurs during vdisk initialization.
7 Critical The specified controller diagnostic failed.
Table 28: Event Descriptions and Recommended Actions

8 Warning A drive in a vdisk failed and the vdisk changed to a critical or offline state.
If a spare disk drive is present the controller will initiate an automatic reconstruct to
the spare.
Recommended action: Refer to Table 24 on page 102 for recommended action. Replace the
failed disk drive and add it as a vdisk spare to the critical vdisk. If you have dynamic
spares enabled, you only need to replace the disk drive.
9 Informational A spare disk drive has been used in a
critical vdisk to bring the vdisk back to a
fault-tolerant state. Rebuilding of the vdisk
has started or will start automatically.
16 Informational A global spare has been added.
18 Informational or warning Vdisk reconstruction has completed. This event is considered
informational if the operation succeeds, or a warning if the operation fails.
19 Informational A bus scan has completed.
20 Informational A firmware update has completed. A controller restart is required for
new firmware to take effect.
Recommended action: Perform a controller restart.
21 Informational or warning Vdisk verification has completed. This event is considered
informational if the command fails immediately, succeeds, or is aborted by the user; or a
warning if the operation fails during verification.
22 Critical A battery failure has occurred on the
controller.
23 Informational Vdisk creation has started.
24 Informational The assigned LUN for this volume has
changed.
25 Informational The statistics for the specified vdisk have
been reset.
27 Informational Cache parameters have been changed for
the specified vdisk.
28 Informational Controller parameters have been
changed.
31 Informational A spare drive was deleted.
32 Informational Vdisk verification has started.

33 Informational Controller time/date has been changed.
This event is logged before the change
happens so the event timestamp shows
the "old" time.
34 Informational Controller has been restored to factory defaults. A controller restart is
required for some defaults to take effect.
Recommended action: Perform a controller restart.
37 Informational Vdisk reconstruction has started.
39 Warning The sensors monitored a temperature or voltage in the warning range.
Recommended action: Check that the array’s fans are running. Check that the ambient
temperature is not too warm. Check for any obstructions to the airflow. When the problem
is fixed, event 47 is logged.
40 Critical The sensors monitored a temperature or voltage in the failure range.
Recommended action: Check that the array’s fans are running. Check that the ambient
temperature is not too warm. Check for any obstructions to the airflow. When the problem
is fixed, event 47 is logged.
41 Informational A vdisk spare has been added.
43 Informational A vdisk has been deleted.
44 Warning The controller contains dirty cache data for the specified volume but the
corresponding disk drives are not present.
Recommended action: Determine the reason that the drives are not online. If an enclosure
is down, determine corrective action. If the virtual disk is no longer needed, you can
clear the orphan data; this will result in lost data.
45 Informational A communication failure has occurred
between the controller and an environ-
mental processor.
47 Informational An error detected by the sensors has
been cleared.
48 Informational The vdisk name has been changed.
49 Informational A SCSI maintenance command has
completed.


52 Informational Vdisk expansion has started.
53 Informational Vdisk expansion has completed.
54 Warning The battery life monitor indicates that the
battery needs to be replaced.
55 Informational A SMART event occurred on the specified drive.
Recommended action: Impending drive failure. Refer to Table 24 on page 102 for
recommended action.
56 Informational Controller power up.
58 Warning or informational A disk drive or other SCSI device detected an error. This
event is considered a warning for serious errors such as parity or drive hardware failure,
and informational for other errors.
59 Warning or informational The controller detected an error while communicating with the
specified SCSI device. The error was detected by the controller, not the disk. This event
is considered a warning for parity errors, and informational for other errors.
60 Informational A disk channel was reset by a third party.
61 Critical A serious error, which might indicate hardware failure, occurred while
communicating on the specified disk channel. The controller will attempt to recover.
Recommended action: Check the cables to ports on the channel.
62 Informational A spare drive has failed.
Recommended action: Replace the failed drive.
63 Informational The controller battery for cache backup is
now charged.
64 Informational The specified disk channel changed from
single-ended (SE) to LVD mode or vice
versa.
66 Warning A battery failure has occurred.
67 Informational The controller has identified disk drives
(with an existing vdisk) that have been
added from a different controller and has
taken ownership of them.
68 Informational Controller is in a shut-down state.
69 Critical Enclosure reported a general failure.


70 Warning Battery temperature is in the warning
range.
71 Informational The controller has started or completed
failing over.
72 Informational (Active-active environment)
After failover, recovery has started or has
completed.
73 Informational (Active-active environment)
The two controllers are communicating
with each other and the mirror bus is
enabled.
74 Informational The host interface ID for a vdisk was
changed to be consistent with other vdisk
IDs, or the controller or Enclosure
Management Processor (EMP) LUNs.
75 Informational LUN assignment for a volume was
changed because of a conflict with other
volumes, or the controller or Enclosure
Management Processor (EMP) LUNs.
76 Informational The controller is using default configuration settings. This event
occurs on the first power up, and might occur after a firmware update.
Recommended action: If you have just performed a firmware update and your system requires
special configuration settings, you must make those configuration changes before your
system will operate as before.
77 Informational The cache was initialized as a result of
power up or failover.
78 Warning The controller could not use an assigned spare for a vdisk because the spare’s
capacity is too small. This occurs when a vdisk’s status becomes critical and available
global spares are too small or (if dynamic spares are enabled) available disk drives are
too small.
Recommended action: Replace the spare drive with a disk drive that has enough capacity to
replace the smallest disk drive in the vdisk. The vdisk size is limited by the smallest
capacity disk.
79 Informational The controller cleared dead drives on a
vdisk. The trust vdisk operation has
completed successfully.
80 Informational The controller has modified mode param-
eters on one or more drives.


81 Informational The current controller is releasing the
partner controller's kill line. The other
controller will reboot.
83 Informational The partner controller is changing state
(shutting down or rebooting).
84 Warning In an active-active configuration, the
current controller has forced the partner
controller to fail over for the specified
reason.
86 Informational The FC host port or drive parameters
have been changed.
87 Warning The mirrored configuration retrieved by
this controller from the partner controller
has bad cyclic redundancy check (CRC).
The local flash configuration will be used
instead.
88 Warning The mirrored configuration retrieved by
this controller from the partner controller is
corrupt. The local flash configuration will
be used instead.
89 Warning The mirrored configuration retrieved by this controller from the partner
controller has a configuration level that is too high for the firmware in this controller
to process. The local flash configuration will be used instead.
Recommended action: This likely indicates that the current controller has down-level
firmware. Update the firmware on the down-level controller. Both controllers should have
the same firmware versions. When the problem is fixed, event 20 is logged.
90 Informational The partner controller does not have a
mirrored configuration image for the
current controller, so the current
controller’s local flash configuration is
being used. This event is expected if the
other controller is new or its configuration
has been cleared.
91 Critical The diagnostic utility that tests hardware
reset signals between controllers in an
active-active configuration has failed.


92 Informational The enclosure has detected a status
change in a monitored component (for
example, a fan, power supply, port switch,
or FC loop GBIC receiver).
93 Warning The replacement controller has assumed
the World Wide Name (WWN) of the
original controller. This makes the
replacement of a controller in an active-
active configuration transparent to the
host. If both controllers lose power or are
rebooted, however, the original
controller's WWN will be lost and the
current controller will generate a new
WWN based on its own serial number. A
dual controller reboot will therefore cause
the controller's WWN to change from the
host's perspective.
94 Informational This controller was replaced and assumed the World Wide Name (WWN) of
the original controller. However, a dual controller reboot resulted in this controller
generating a new WWN. (See event 93.)
Recommended action: If you see this event, verify the WWN information for this controller
on all hosts that access it.
95 Critical Both controllers in an active-active configuration have the same serial
number. Non-unique serial numbers can cause system problems; for example, array ownership
and WWNs are determined by serial number.
Recommended action: A service technician must examine both controller serial numbers and
change at least one of them.
96 Informational Pending configuration changes that take effect at boot time were ignored
because customer data may still be present in cache.
Recommended action: If the requested configuration changes did not occur, make the changes
again and then do a clean shutdown.
100 Informational During active-active operation, an event
(potential error) occurred while communi-
cating with the Enclosure Management
Processor (EMP), which reports SES
data.
101 Informational Triggers an update of the Enclosure
Management Processor (EMP) data from
the slave controller.


102 Informational (Controllers with parallel SCSI disk
channels only.) Ultra 160 domain
validation failed on one of the controller’s
disk channels. Parameters indicate the
minimum and maximum negotiated rates
on the disk channel, and which device IDs
were affected.
103 Informational Volume name change is complete.
104 Informational Volume size change is complete.
105 Informational Volume LUN change is complete.
106 Informational A volume has been added.
107 Critical The controller experienced the specified critical error. In a non-redundant
configuration the controller will be restarted automatically. In an active-active
configuration the surviving controller will kill the controller that experienced the
critical error.
Recommended action: A service technician can use the debug log to determine the problem.
108 Informational A volume has been deleted.
109 Informational The statistics for the specified vdisk have
been reset.
110 Informational Ownership of the specified vdisk has been
given to the other controller.
111 Informational The link for the specified host port is up.
112 Informational The link for the specified host port is
down. (Occurs after every LIP event.)
113 Informational The link for the specified disk channel port
is up.
114 Informational The link for the specified disk channel port
is down.
115 Critical After a recovery, the other controller was
killed while mirroring write-back data to
this controller. Some writes to LUNs
owned by the other controller may have
been lost.


116 Critical After a recovery, the partner controller
was killed while mirroring write-back data
to the current controller. The current
controller rebooted to avoid losing the
data in the partner controller's cache, but
if the other controller does not reboot
successfully, the data will be lost.
118 Informational Cache parameters have been changed for
the specified vdisk.
119 Informational Through CAPI, the user has completed
putting a replaceable module offline.
putting a replaceable module online.
forcing a replaceable module offline.
forcing a replaceable module online.
123 Informational The system has completed putting a
replaceable module offline.
124 Informational The system has completed putting a
replaceable module online.
125 Informational The system has completed forcing a
replaceable module offline.
126 Informational The system has completed forcing a
replaceable module online.
127 Warning The controller has detected an invalid disk
drive dual-port connection. This
connection does not have the benefit of
fault-tolerance. Failure of the disk drive
port would cause loss of access to the
drive.
128 Informational Through CAPI, a user has completed
forcing a module offline; however, avail-
ability constraints were detected. An
equivalent "put offline" would have failed.
129 Informational The controller has detected a change in
the status of one of the modules that it
monitors.


130 Warning During controller startup, the system
attempts to put online all installed compo-
nents that were not taken offline by the
user. In this case, a system-initiated "Put
Online" request against the specified
module failed for the specified reason.
131 Informational During controller startup, the system attempts to put online all
installed components that were not originally taken offline by the user. Because the
module indicated was taken offline by the user, it was left offline during startup.
Recommended action: The user should manually request a "Put Online" for the module when
it is ready.
132 Informational During controller startup, the system found a module that the current
configuration indicates should not be installed. This module was left down.
Recommended action: If the identified module should be online, change the system
configuration to allow this module to be used.
133 Critical During controller startup, the system was
unable to bring any data gates online. As
a result, data managers cannot execute at
the same time. In order to bring both data
managers online, at least one data gate
must be online.
134 Warning The system put offline operation has
failed.
135 Warning The system put online operation has
failed.
136 Warning Errors detected on the specified disk channel have caused the array to mark
the channel as degraded.
Recommended action: Determine the source of the errors on the specified disk channel and
replace the faulty hardware.
logged.
137 Informational The system has finished downing (removing from online status) a failed
module. The module is marked as failed and is no longer online.
Recommended action: Replace the failed module as soon as possible.
139 Informational The LAN processor subsystem has
powered up.
140 Informational The LAN processor subsystem is about to
reboot.


141 Informational The IP address has been changed on the
LAN processor.
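146 Informational The system has started putting a
replaceable module offline.
147 Informational The system has started putting a
replaceable module online.
148 Informational The system has started forcing a
replaceable module offline.
149 Informational The system has started forcing a
replaceable module online.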
150 Informational The system has started removing a failed
module from online status. This module
will be marked as failed when the
operation is complete.
151 Informational The module latch microswitch has been activated (that is, pushed) on the
indicated module. The system interprets this as a request to put the module offline.
Recommended action: Check subsequent events to see if the Put Offline request succeeded.
152 Informational or warning The management controller has not sent a command to the
storage controller for an interval that exceeds the management controller communication
timeout, and may have failed. This is sometimes referred to as a “LAN not talking” error.
This event is considered informational for short timeouts (<160 sec.), and a warning for
longer timeouts (up to 15 min.).
Recommended action: Try restarting the management controller. When the problem is fixed,
event 153 is logged.
153 Informational The management controller has re-estab-
lished communication with the storage
controller.
154 Informational New code has been loaded on the
management controller.
155 Informational New loader code has been loaded on the
management controller.
156 Informational The management controller has been
rebooted from the storage controller.
157 Critical A failure was encountered when trying to write to the flash chip.
Recommended action: Replace the controller module. When the problem is fixed, event 111 is
logged.


158 Informational A correctable ECC error has occurred in
the CPU memory.
160 Warning The Enclosure Management Processor
(EMP) enclosures are not configured
correctly. All enclosure EMPs on that
channel are disabled.
161 Informational One or more enclosures do not have a
valid path to an Enclosure Management
Processor (EMP). All enclosure EMPs are
disabled.
162 Warning The host Fibre Channel World Wide Names (node and port) previously presented
by this controller module in this system are unknown. This event has two possible causes:
● One or both controller modules have been replaced or moved while the system was
powered off.
● One or both controller modules have had their flash configuration cleared (this is
where the previously used WWNs are stored).
The controller module recovers from this situation by generating a WWN based on its own
serial number.
Recommended action: Verify the WWN information for this controller module on all hosts
that access it.
163 Warning The host FC World Wide Names (node and port) previously presented by an
offline controller module in this system are unknown. This event has two possible causes:
● The online controller module reporting the event was replaced or moved while the
system was powered off.
● The online controller module had its flash configuration (where previously used WWNs
are stored) cleared.
The online controller module recovers from this situation by generating a WWN for the
other controller module based on its own serial number.
Recommended action: Verify the WWN information for the other controller module on all
hosts that access it.


166 Warning The RAID metadata level of the two controllers does not match. Usually, the
controller at the higher firmware level can read metadata written by a controller at a
lower firmware level. The reverse is typically not true. Therefore, if the controller at
the higher firmware level failed, the surviving controller at the lower firmware level
cannot read the metadata on drives that have failed over.
Recommended action: Update the controller with the lower firmware level to match the
firmware level on the other controller.
167 Warning A diagnostic test at controller bootup detected an abnormal operation, which
might require a power cycle to correct.
Recommended action: A service technician must review the error information returned.
168 Error, warning, or informational The specified SES alert condition was detected in the
enclosure indicated.
Recommended action: Most voltage and temperature errors and warnings relate to the power
and cooling module. Refer to Table 27 on page 112 for recommended action.
169 Informational The specified SES alert condition has been cleared in the enclosure
indicated.
Recommended action: This event is generated when the problem that caused event 168 is
cleared.
170 Informational The last rescan indicates that the
specified enclosure was added to the
system.
171 Informational The last rescan indicates that the
specified enclosure was removed from the
system.
172 Warning The specified vdisk has been quarantined because not all of its drives are
available. There are not enough drives to be fault tolerant. The partial vdisk will be
held in quarantine until it becomes fault tolerant.
Recommended action: Ensure that all drives are latched into their slots and have power.
During quarantine, the vdisk is not visible to the host. If, after latching drives into
their slots and powering up the vdisk, the vdisk is still quarantined, you can manually
dequarantine the vdisk so that the host can see the vdisk. The vdisk is still critical.
When the vdisk has been dequarantined, event 173 is logged.
173 Informational The specified vdisk has been dequarantined.


174 Informational A device firmware update has completed.
175 Informational An Ethernet link has changed status
(up/down).
176 Informational The error statistics for the specified drive
have been reset.
177 Informational The cache data for a missing volume was
purged.
178 Informational A host has been added to the list of hosts
that can access, or be denied access to, a
LUN. An Add Host command will be
successful.
179 Informational A host has been removed from the list of
hosts that can access or be denied access
to a LUN.
180 Informational Hosts can either access, or be denied
access to, a LUN. This event indicates
when a host list type is changed from
include (to allow access) to exclude (to
deny access) or from exclude to include.
181 Informational Advanced Network Interface Structure
was set. The management controller
configuration has been changed.
182 Informational All busses have been paused. I/O will not
be performed on the drives until all busses
are unpaused.
183 Informational All busses have been unpaused, meaning
that I/O can resume. An unpause initiates
a bus rescan.
184 Informational The battery life monitor has been set.
185 Informational An environmental management processor
(EMP) write command has completed.
186 Informational Enclosure parameters have been set.
187 Informational The write-back cache has been enabled
due to a battery state change.
188 Informational Write-back cache has been disabled due
to a battery state change.
189 Informational A disk channel that was previously
degraded or failed is now healthy.


Code
190 - 201 Informational Includes component-specific environmental indicator events
generated by the auto-write-through feature when an environmental change occurs. If an
auto-write-through-trigger condition has been met, write-back cache is disabled.
202 Informational An auto-write-through-trigger condition has been cleared, causing
write-back cache to be re-enabled. The environmental change event is also logged. (See
events 190-200.)
203 Warning An environmental change occurred that allows write-back cache to be enabled,
but the auto-write-back preference is not set. The environmental change is also logged.
(See events 190-200.)
Recommended action: Manually enable write-back cache.
205 Informational The specified volume has been mapped
or unmapped.
210 Informational All snapshot partitions have been deleted.
211 Informational The Serial Attached SCSI (SAS) topology
has changed; components were added or
removed.
212 Informational All master volume partitions have been
deleted.
213 Informational A standard volume has been converted to
a master volume or a master volume has
been converted to a standard volume.
214 Informational The creation of a batch of snapshots is
complete. The number of snapshots is
specified.
215 Informational A previously created batch of snapshots is
now committed and ready for use. The
number of snapshots is specified.
216 Informational The deletion of a batch of snapshots is
complete.


217 Critical A super-capacitor failure has occurred on the controller.
Recommended action: A service technician must replace the super-capacitor pack in the
controller reporting this event.
218 Warning The super-capacitor pack is near end of life.
Recommended action: A service technician must replace the super-capacitor pack in the
controller reporting this event.
220 Informational Master volume rollback operation has
started.
221 Informational All master volume partitions have been
deleted.
222 Informational Setting of the policy for the backing store
is complete. Policy is the action to be
taken when the backing store hits the
threshold level.
223 Informational The threshold level for the backing store
has been set. Threshold is the percent
value of the backing store to be set to
handle the out of space issue. The options
are warning, error and critical.
To summarize, policy is the action taken
depending on the threshold value.
224 Informational A background master volume rollback
operation has completed.
225 Critical Background master write copy-on-write
operation has failed. There was an
internal I/O error. Could not complete the
write operation to the disk.
226 Critical A background master volume rollback failed to start due to inability to
initialize the snap pool. All rollback is in a suspended state.
Recommended action: Make sure that the backing store and the array on which this partition
exists are online, and restart the operation.
227 Critical Failure to execute rollback for a particular portion of the master volume.
Recommended action: Restart the rollback operation.
228 Critical Background rollback for a master volume failed to end due to inability to
initialize the snap pool. All rollback is in a suspended state.
Recommended action: Make sure that the backing store and the array on which this partition
exists are online, and restart the operation.


229 Warning The snap pool has reached the snap-pool will behave as per the policy set for the
warning threshold. Backing store.
error threshold. Backing store.
critical threshold. Backing store.
232 Warning The maximum number of enclosures allowed for the current configuration has
been exceeded.
233 Warning The specified drive type is invalid and not allowed in the current
configuration.
234 Critical The specified snap pool is unrecoverable and can therefore no longer be used.
Recommended action: All the snapshots associated with this backing store are invalid and
the user may want to delete them. However, the data on the master volume can be recovered
by converting it to a standard volume.
235 Informational A non-disk SCSI device, such as an SES component or partner controller,
has reported a check condition.
236 Informational A special shutdown operation has started.
237 Informational A firmware update has started and is in
progress.
238 Warning An attempt to write license data failed due
to an invalid license.
239 Warning A timeout has occurred to allow for a
compact flash flush operation.
240 Warning A compact flash flush error has been
detected.
241 - 242 Informational Compact flash status events generated by the auto-write-through
feature whenever an environmental change occurs. If an auto-write-through-trigger
condition has been met, write-back cache is disabled.


243 Informational A new RAID enclosure has been
detected.
244 Warning There is not enough space to expand the specified snap pool.
Recommended action: Add more storage and retry the operation.
245 Warning An existing disk channel target device is
not responding to SCSI discovery
commands.
247 Warning The ID for the specified field replaceable
unit (FRU) cannot be read.
248 Informational A valid license was successfully installed.
249 Informational A valid license has been installed for the
specified feature. This event is logged for
each feature license installed.
250 Warning A license could not be installed (license is
invalid).
252 Informational Snapshot write data on the specified
master volume has been deleted.
253 Informational A license was uninstalled.
254 Warning An incorrect data-parity chunk has been
detected on the specified vdisk.
255 Informational The port bypass circuits (PBCs) on
Controller A and Controller B do not
match, which may limit available configu-
rations.
256 Informational The specified snapshot has been created
but not committed. A commit action is
required before the snapshot can be used.
257 Informational The specified snapshot has been created
and committed.
258 Informational The specified snapshot has been
committed and is ready for use.
259 Informational Inband CAPI commands have been
disabled.
260 Informational Inband CAPI commands have been
enabled.
261 Informational Inband SES commands have been
disabled.

262 Informational Inband SES commands have been
enabled.
263 Warning The specified drive spare is missing. It
was either removed or is not responding.
264 Informational The port bypass circuit’s (PBC’s) link
speed and interconnect mode have been
set to the default.
265 Informational Port bypass circuits currently use the service port, which may limit the
link speed or interconnect mode support.
Recommended action: Perform a system-level shutdown and reboot.
266 Informational A copy operation for the specified master
volume has been aborted.
268 Informational A background copy operation for the
specified master volume completed.
269 Informational A partner firmware update operation has
started and is in progress.
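Several entries in Table 28 note that a follow-up event is logged when the problem is fixed. As a rough illustration of how those notes can be used when reading an event log, the following Python sketch reports problem events that have not yet been followed by their clearing event. The event pairs are taken from Table 28 and are only a subset. The function name and the input format (a plain chronological list of event codes) are assumptions made for this example; they are not part of FibreCAT SX Manager or the CLI.

```python
# Problem event -> event logged once the problem is fixed (subset of Table 28).
CLEARED_BY = {
    39: 47,    # sensor reading in warning range -> sensor error cleared
    89: 20,    # configuration level too high -> firmware update completed
    152: 153,  # management controller silent -> communication re-established
    168: 169,  # SES alert condition detected -> SES alert condition cleared
    172: 173,  # vdisk quarantined -> vdisk dequarantined
}


def unresolved_events(codes):
    """Return problem event codes not yet followed by their clearing event.

    `codes` is a hypothetical chronological list of event codes, oldest first.
    """
    open_events = []
    for code in codes:
        if code in CLEARED_BY:
            open_events.append(code)
        else:
            # Simplification: one clearing event resolves every earlier
            # matching problem event, regardless of component or enclosure.
            open_events = [c for c in open_events if CLEARED_BY[c] != code]
    return open_events


if __name__ == "__main__":
    log = [56, 39, 152, 47, 172]
    print(unresolved_events(log))
```

The sketch ignores event parameters such as the affected enclosure or vdisk, so treat its result as a hint for where to look in the event log, not as a diagnosis.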
9.2 Failover Reason Codes

Table 29 lists the reason codes for failover. Use these reason codes along with the event
messages to determine the reasons for a failover.
Code CAPI Event Short Description
0 CAPI_FR_NA Not applicable
1 CAPI_FR_FIRMWARE_INCOMPATIBLE Firmware incompatible
2 CAPI_FR_MODEL_INCOMPATIBLE Model incompatible
3 CAPI_FR_HEARTBEAT_LOST Heartbeat lost
4 CAPI_FR_MSG_TO_OTHER_FAILED Message to other failed
5 CAPI_FR_OTHER_NOT_PRESENT Other not present
6 CAPI_FR_CAPI_REQUESTED System call requested
7 CAPI_FR_FOC_REGISTER_ERROR FOC register error
8 CAPI_FR_MEMORY_SIZE_INCOMPATIBLE Memory size incompatible
9 CAPI_FR_BOOT_HANDSHAKE_TIMEOUT Boot handshake timeout
10 CAPI_FR_FIRMWARE_UPDATE Firmware update
11 CAPI_FR_SHUTDOWN Shutdown
12 CAPI_FR_REBOOTING Rebooting
13 CAPI_FR_WRITE_UNIQUE_DATA Write unique data
14 CAPI_FR_OTHER_ORPHAN_DIRTY Orphan data for other
15 CAPI_FR_LOCK_MGR_LOST_COMM Lock manager lost communication
16 CAPI_FR_SAME_SERIAL_NUMBER Same serial number
17 CAPI_FR_CPLD_REVISION_MISMATCH CPLD revision mismatch
18 CAPI_FR_HARDWARE_INCOMPATIBLE Hardware incompatible
19 CAPI_FR_NO_DG_AVAILABLE No data gate available
20 CAPI_FR_FORCED_OFFLINE Forced offline
21 CAPI_FR_PCIX_CONFIG_SEQ_TIMEOUT PCIX config sequence timeout
22 CAPI_FR_I2C_MSG_TO_OTHER_FAILED I2C message to other failed
23 CAPI_FR_OTHER_SHUTDOWN Other shutdown
24 CAPI_FR_OPERATING_MODE_MISMATCH Operating mode mismatch
25 CAPI_FR_SAS_LOCK_TIMEOUT SAS lock timeout
26 CAPI_FR_INTER_CTLR_MSG_TIMEOUT Intercontroller message timeout
27 CAPI_FR_PBC_FORWARD_INCOMPATIBLE Old PBC incompatible with new PBC configuration
0x7F CAPI_FR_UNKNOWN Unknown
Table 29: Failover Reason Codes
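When reading logs, it can help to have Table 29 available as a lookup. The following is a minimal Python sketch, not part of the product software; the names FAILOVER_REASONS and describe_failover_reason are hypothetical, and the handling of unlisted codes is an assumption.

```python
# Failover reason codes transcribed from Table 29:
# code -> (CAPI event, short description).
FAILOVER_REASONS = {
    0: ("CAPI_FR_NA", "Not applicable"),
    1: ("CAPI_FR_FIRMWARE_INCOMPATIBLE", "Firmware incompatible"),
    2: ("CAPI_FR_MODEL_INCOMPATIBLE", "Model incompatible"),
    3: ("CAPI_FR_HEARTBEAT_LOST", "Heartbeat lost"),
    4: ("CAPI_FR_MSG_TO_OTHER_FAILED", "Message to other failed"),
    5: ("CAPI_FR_OTHER_NOT_PRESENT", "Other not present"),
    6: ("CAPI_FR_CAPI_REQUESTED", "System call requested"),
    7: ("CAPI_FR_FOC_REGISTER_ERROR", "FOC register error"),
    8: ("CAPI_FR_MEMORY_SIZE_INCOMPATIBLE", "Memory size incompatible"),
    9: ("CAPI_FR_BOOT_HANDSHAKE_TIMEOUT", "Boot handshake timeout"),
    10: ("CAPI_FR_FIRMWARE_UPDATE", "Firmware update"),
    11: ("CAPI_FR_SHUTDOWN", "Shutdown"),
    12: ("CAPI_FR_REBOOTING", "Rebooting"),
    13: ("CAPI_FR_WRITE_UNIQUE_DATA", "Write unique data"),
    14: ("CAPI_FR_OTHER_ORPHAN_DIRTY", "Orphan data for other"),
    15: ("CAPI_FR_LOCK_MGR_LOST_COMM", "Lock manager lost communication"),
    16: ("CAPI_FR_SAME_SERIAL_NUMBER", "Same serial number"),
    17: ("CAPI_FR_CPLD_REVISION_MISMATCH", "CPLD revision mismatch"),
    18: ("CAPI_FR_HARDWARE_INCOMPATIBLE", "Hardware incompatible"),
    19: ("CAPI_FR_NO_DG_AVAILABLE", "No data gate available"),
    20: ("CAPI_FR_FORCED_OFFLINE", "Forced offline"),
    21: ("CAPI_FR_PCIX_CONFIG_SEQ_TIMEOUT", "PCIX config sequence timeout"),
    22: ("CAPI_FR_I2C_MSG_TO_OTHER_FAILED", "I2C message to other failed"),
    23: ("CAPI_FR_OTHER_SHUTDOWN", "Other shutdown"),
    24: ("CAPI_FR_OPERATING_MODE_MISMATCH", "Operating mode mismatch"),
    25: ("CAPI_FR_SAS_LOCK_TIMEOUT", "SAS lock timeout"),
    26: ("CAPI_FR_INTER_CTLR_MSG_TIMEOUT", "Intercontroller message timeout"),
    27: ("CAPI_FR_PBC_FORWARD_INCOMPATIBLE",
         "Old PBC incompatible with new PBC configuration"),
    0x7F: ("CAPI_FR_UNKNOWN", "Unknown"),
}


def describe_failover_reason(code):
    """Return 'CAPI_EVENT: description' for a reason code (hypothetical helper)."""
    if code not in FAILOVER_REASONS:
        # Assumption: codes absent from Table 29 are reported as unlisted.
        return f"Unlisted reason code {code}"
    name, text = FAILOVER_REASONS[code]
    return f"{name}: {text}"


print(describe_failover_reason(3))
```

The sketch only restates the table; it does not read anything from the storage system.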

9.3 Troubleshooting Using the CLI

This appendix provides specific CLI commands that you can run to troubleshoot your
storage system. For detailed information about using the CLI, see the “FibreCAT SX
Manager Command Line Interface (CLI)” manual.
Topics include:
● Command Syntax
● Viewing Command Help
● Size of Devices and Logical Units
● clear cache
● clear expander-status
● ping
● reset host-channel-link
● restart
● restore defaults
● set debug-log-parameters
● set led
● set protocols
● show debug-log
● show debug-log-parameters
● show enclosure-status
● show events
● show expander-status
● show frus
● show inquiry
● show protocols
● show redundancy-mode
● trust

9.3.1 Command Syntax

This section describes syntax rules for CLI commands.
● Keywords and Parameters
● Disk Drive Syntax
● Virtual Disk Syntax
● Volume Syntax
● Host Nickname Syntax
● Volume Mapping Syntax
9.3.1.1 Keywords and Parameters
Command keywords must be entered in lowercase. Parameter values can be entered in
uppercase and lowercase.
Parameter values that contain non-alphanumeric characters, such as spaces, must be
enclosed in double quotes, which the CLI parses and removes.
9.3.1.2 Disk Drive Syntax
You can specify disk drives by using any of the following:

● A disk drive ID. For example, 4.
● A hyphenated range of disk drive IDs from a to z. For example, 5-7.
● A list of disk drive IDs, ranges, or both, separated by commas; do not include spaces
before or after commas. For example, 4,5-7.
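The disk drive syntax above can be illustrated with a short Python sketch that expands a specification into individual drive IDs. This is not part of the CLI; the function name parse_drive_ids is hypothetical, and rejecting a range whose first ID is larger than its last is an assumption.

```python
def parse_drive_ids(spec):
    """Expand a drive specification such as '4,5-7' into [4, 5, 6, 7]."""
    if " " in spec:
        raise ValueError("do not include spaces before or after commas")
    ids = []
    for part in spec.split(","):
        if "-" in part:
            # A hyphenated range such as 5-7 covers both end points.
            first, last = part.split("-")
            first, last = int(first), int(last)
            if first > last:
                # Assumption: a descending range is treated as an error.
                raise ValueError(f"invalid range: {part}")
            ids.extend(range(first, last + 1))
        else:
            ids.append(int(part))
    return ids


print(parse_drive_ids("4,5-7"))
```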
9.3.1.3 Virtual Disk Syntax
You can specify virtual disks by using either:

● Virtual disk serial number. A unique 32-digit number that is automatically assigned
when a virtual disk is created, and does not change for the life of the virtual disk.
● Virtual disk name. A user-defined name that can include a maximum of 19 printable
ASCII characters. A name cannot include a comma, backslash (\), or quotation mark (");
however, a name that includes a space must be enclosed in quotation marks. Names
are case-sensitive.

Some commands accept a comma-separated list of virtual disk serial numbers and names.
Do not include spaces before or after commas. The following virtual disk list specifies a
serial number and two names:
00c0ff0a43180048e6dd1c4500000000,Sales/Mktg,"Vdisk #1"
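The naming rules can be checked before a list is typed at the prompt. The following Python sketch is illustrative only; format_name is a hypothetical helper, and the default limit of 19 characters applies to virtual disk names (pass max_len=20 for volume names, as described under Volume Syntax).

```python
FORBIDDEN = set(',\\"')  # comma, backslash, quotation mark


def format_name(name, max_len=19):
    """Validate a user-defined name and quote it if it contains a space."""
    if not name or len(name) > max_len:
        raise ValueError(f"name must have 1 to {max_len} characters")
    if not (name.isascii() and name.isprintable()):
        raise ValueError("name must consist of printable ASCII characters")
    if FORBIDDEN & set(name):
        raise ValueError("name cannot include a comma, backslash, or quotation mark")
    # Only names that include a space need to be enclosed in quotation marks.
    return f'"{name}"' if " " in name else name


# Rebuild the list from the example: one serial number and two names.
serial = "00c0ff0a43180048e6dd1c4500000000"
vdisk_list = ",".join([serial] + [format_name(n) for n in ["Sales/Mktg", "Vdisk #1"]])
print(vdisk_list)
```

Serial numbers are passed through unchanged because they are assigned by the system rather than by the user.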
9.3.1.4 Volume Syntax
You can specify volumes by using either:

● Volume serial number. A unique 32-digit number that is automatically assigned when
a volume is created, and does not change for the life of the volume.
● Volume name. A user-defined name that can include a maximum of 20 printable ASCII
characters. A name cannot include a comma, backslash (\), or quotation mark (");
however, a name that includes a space must be enclosed in quotation marks. Names
are case-sensitive.
NOTE
Volumes on different virtual disks can have the same name.
Some commands accept a comma-separated list of volume serial numbers and names. Do
not include spaces before or after commas. The following volume list specifies a serial
number and two names:
AA43BF501234560987654321FEDCBA,Image-Data,"Vol #1"
9.3.1.5 Host Nickname Syntax
You can specify a nickname for a data host’s host bus adapter (HBA). A nickname is a
user-defined string that can include a maximum of 16 printable ASCII characters. For example,
MyHBA. A name cannot include a comma, backslash (\), or quotation mark ("); however, a
name that includes a space must be enclosed in quotation marks. Names are
case-sensitive.
9.3.1.6 Volume Mapping Syntax
You specify the mapping of a host to a volume by using the syntax channels.LUN, where:
● channels is a single host channel number or a list of host channel numbers, ranges,
or both. For example, 0,1,3-5.
● LUN is a logical unit number (LUN) from 0–127 to assign to the mapping.
For example, 8.

The following example maps channels 0 and 1 using LUN 8:
0-1.8
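As an illustration of the channels.LUN syntax, the following Python sketch splits a mapping into its channel list and LUN and checks the LUN range of 0 to 127. The function name parse_mapping is hypothetical and the sketch is not part of the CLI.

```python
def parse_mapping(spec):
    """Split a mapping such as '0-1.8' into ([0, 1], 8)."""
    channels_part, sep, lun_part = spec.rpartition(".")
    if not sep or not channels_part:
        raise ValueError("expected the form channels.LUN")
    lun = int(lun_part)
    if not 0 <= lun <= 127:
        raise ValueError("LUN must be in the range 0-127")
    channels = []
    for part in channels_part.split(","):
        if "-" in part:
            # A range such as 3-5 includes both end points.
            first, last = (int(x) for x in part.split("-"))
            channels.extend(range(first, last + 1))
        else:
            channels.append(int(part))
    return channels, lun


print(parse_mapping("0-1.8"))
```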
9.3.2 Viewing Command Help

To view brief descriptions of all commands that are available to the user level you logged in
as, type:
# help
To view help for a specific command, type either:
# help command
# command ?
To view information about the syntax to use for specifying disk drives, virtual disks, volumes,
and volume mapping, type:
# help syntax
9.3.3 clear cache

Clears any unwritable cache in both RAID controllers for a specified volume, or any
orphaned data for volumes that no longer exist. This command can be used with a dual-
controller configuration only.
For details about using clear cache, see the “CLI” manual.
9.3.4 clear expander-status

(For service technicians only) Clears the counters and status for SAS expander controller
lanes. Counters and status can be reset to a good state for all enclosures, or for a specific
enclosure whose status is ERROR as shown by the show expander-status command.
For details about using clear expander-status, see the “CLI” manual.

9.3.5 ping
Tests communication with a remote host. The remote host is specified by IP address. Ping
sends ICMP echo request packets and waits for replies.
For details about using ping, see the “CLI” manual.
9.3.6 reset host-channel-link

Issues a loop initialization primitive (LIP) from specified controllers on specified channels.
This command is for use with an FC system using FC-AL (loop) topology.
For details about using reset host-channel-link, see the “CLI” manual.
9.3.7 restart
Restarts the RAID controller or the management controller in either or both controller
modules.
If you restart a RAID controller, it attempts to shut down with a proper failover sequence,
which includes stopping all I/O operations and flushing the write cache to disk, and then the
controller restarts. The management controllers are not restarted so they can provide
status information to external interfaces.
If you restart a management controller, communication with it is temporarily lost until it
successfully restarts. If the restart fails, the partner management controller remains active
with full ownership of operations and configuration information.
CAUTION
If you restart both controller modules, you and users lose access to the system and
its data until the restart is complete.
For details about using restart, see the “CLI” manual.
9.3.8 restore defaults

(For service technicians only) Restores the manufacturer's default configuration to the
controllers. When the command informs you that the configuration has been restored, you
must restart the RAID controllers and management controllers for the changes to take
effect. After restarting the controllers, hosts might not be able to access volumes until you
re-map them.

CAUTION
This command changes how the system operates and might require some
reconfiguration to restore host access to volumes.
For details about using restore defaults, see the “CLI” manual.
9.3.9 set debug-log-parameters

(For service technicians only) Sets the types of debug messages to include in the storage
controller debug log. If multiple types are specified, use spaces to separate them and
enclose the list in quotation marks (").
For details about using set debug-log-parameters, see the “CLI” manual.
9.3.10 set led

Changes the state of drive module or enclosure LEDs to help you locate devices. For a drive
module, the OK to Remove LED will blink yellow. For an enclosure, the Unit Locator LED
on the chassis ear and on each controller module will blink white.
For details about using set led, see the “CLI” manual.
9.3.11 set protocols

Enables or disables one or more of the following service and security protocols.
● Standard WBI (http)
● Secure WBI (https)
● Telnet CLI
● Secure shell CLI (ssh)
● FTP firmware upgrade interface
● Storage Management Initiative Specification (SMI-S)
● Simple Network Management Protocol (SNMP)
● Telnet service port 1023
● Telnet debug port 4048
● In-band CAPI management interface
● In-band SES management interface
For details about using set protocols, see the “CLI” manual.

9.3.12 show debug-log

(For service technicians only) Shows the debug logs for the RAID storage controller (SC),
the management controller (MC), or both.
For details about using show debug-log, see the “CLI” manual.
9.3.13 show debug-log-parameters

(For service technicians only) Shows which debug message types are enabled (on) or
disabled (off) for inclusion in the storage controller debug log.
Output
Field Description
host Host interface debug messages
disk Disk interface debug messages
mem Internal memory debug messages
fo Failover/recovery debug messages
msg Inter-controller message debug messages
fca, fcb, fcc, fcd Four levels of Fibre Channel driver debug messages
misc Internal debug messages
rcm Removable-component manager debug messages
raid RAID debug messages
cache Cache debug messages
emp Enclosure Management Processor debug messages
capi Internal Configuration API debug messages
mui Internal service interface debug messages
bkcfg Internal configuration debug messages
awt Auto-write-through feature debug messages
res2 Internal debug messages
capi2 Internal Configuration API tracing debug messages
dms Snapshot feature debug messages
Table 30: Debug Log Parameters
For details about using show debug-log-parameters, see the “CLI” manual.
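The type names in Table 30 can be combined into a single list value following the quoting rule given for set debug-log-parameters (types separated by spaces, the list enclosed in quotation marks). The Python sketch below only builds that list value; it assumes that the field names shown in Table 30 are also the names accepted as input, and debug_type_list is a hypothetical helper. For the full command syntax, see the “CLI” manual.

```python
# Debug message types as listed in Table 30.
DEBUG_TYPES = {
    "host", "disk", "mem", "fo", "msg", "fca", "fcb", "fcc", "fcd",
    "misc", "rcm", "raid", "cache", "emp", "capi", "mui", "bkcfg",
    "awt", "res2", "capi2", "dms",
}


def debug_type_list(types):
    """Join debug message types into one list value, quoted when it has several entries."""
    unknown = [t for t in types if t not in DEBUG_TYPES]
    if unknown:
        raise ValueError(f"not listed in Table 30: {unknown}")
    joined = " ".join(types)
    # Multiple types are separated by spaces and enclosed in quotation marks.
    return f'"{joined}"' if len(types) > 1 else joined


print(debug_type_list(["host", "disk", "fo"]))
```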

9.3.14 show enclosure-status

Shows the status of system enclosures and their components. For each attached
enclosure, the command shows general SCSI Enclosure Services (SES) information
followed by component-specific information.
Output
General SES fields:

Field Description
Chassis Chassis serial number
Vendor Name of enclosure vendor
Product ID Product model identifier
Rev Product revision number
CPLD Complex Programmable Logic Device revision number
WWPN World wide port name of the SES device reporting the enclosure status
Status Overall status of the enclosure
Table 31: Enclosure General SES Fields
Enclosure Component Status fields:
Field Description
Type The component type:
Fan: Cooling fan unit
PSU: Power supply unit
Temp: Temperature sensor
Voltage: Voltage sensor
DiskSlot: Disk drive module
# Unit ID
Status Component status:
Absent: Component is not present
Fault: One or more subcomponents has a fault
OK: All subcomponents are operating normally
N/A: Status is not available
FRU P/N Part number of the field-replaceable unit (FRU) that contains the component
FRU S/N Serial number of the FRU that contains the component
Add’l Data Additional data such as temperature (Celsius), voltage, or slot address
Table 32: Enclosure Component Status Fields

For details about using show enclosure-status, see the “CLI” manual.
9.3.15 show events

Shows events for an enclosure, including events from each management controller and
each storage controller. A separate set of event numbers is maintained for each controller
module. Each event number is prefixed with a letter identifying the controller module that
logged the event.
Events are listed from newest to oldest, based on a timestamp with one-second granularity;
therefore the event log sequence matches the actual event sequence within about one
second.
If SNMP is configured, events can be sent to SNMP traps.
Output
Shows the following information for each event:

Field Description
Timestamp Day, date, time, and year when the event was logged
Event code Identifies the type of event and might help service technicians diagnose
problems; for example, [181]
Event ID Event number prefixed by A or B to indicate which controller module logged
the event; for example, #A123
Controller ID Model, serial number, and ID of the controller module that logged the event.
Severity CRITICAL: Events that might affect data integrity or system stability.
WARNING: Events that do not affect data integrity.
INFORMATIONAL: Events that show the change of state or configuration
changes.
Message Event-specific message giving details about the event; for example, LAN
configuration parameters have been set
Table 33: Event Information
For details about using show events, see the “CLI” manual.
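Because each controller module keeps its own set of event numbers, it is useful to split an event ID into its controller letter and number when comparing logs. The following Python sketch assumes the ID format shown in Table 33 (for example, #A123); parse_event_id is a hypothetical helper, not a CLI function.

```python
import re

# '#' followed by the controller letter (A or B) and the event number.
EVENT_ID = re.compile(r"^#([AB])(\d+)$")


def parse_event_id(event_id):
    """Split an event ID such as '#A123' into ('A', 123)."""
    match = EVENT_ID.match(event_id)
    if match is None:
        raise ValueError(f"unexpected event ID format: {event_id!r}")
    return match.group(1), int(match.group(2))


print(parse_event_id("#A123"))
```

Event numbers from controller module A and controller module B are independent, so compare the numbers only within the same controller letter and use the timestamp to relate events across modules.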
9.3.16 show expander-status

(For service technicians only) Shows diagnostic information relating to SAS expander
controller physical channels, known as PHY lanes. Information is shown by controller for
each enclosure.

Output
Parameter Description
Id Identifier for a specific PHY lane.
Encl Enclosure that contains the SAS expander
Status OK: No errors detected on the PHY lane
ERROR: An error has occurred on the PHY lane
Type DRIVE: Disk drive PHY lane
INTER-EXP: Inter-expander PHY lane, communicating between the SAS
expanders in a dual-controller system
INGRESS: SAS ports on controller enclosures and expansion enclosures
EGRESS: SAS ports on expansion enclosures
Table 34: SAS Expander Information
For details about using show expander-status, see the “CLI” manual.
9.3.17 show frus

Shows information for all field-replaceable units (FRUs) in the controller enclosure and in
any attached expansion enclosures. Some information reported is for use by service
technicians.

Output
Field Description
Name FRU name:
CHASSIS_MIDPLANE: 2U chassis and midplane; the metal
enclosure and the circuit board to which power, controller,
expansion, and drive modules connect
RAID_IOM: Controller module
BOD_IOM: Expansion module
POWER_SUPPLY: Power and cooling module
Description FRU description
Part Number FRU part number
Mid-Plane SN For the CHASSIS_MIDPLANE FRU, the mid-plane serial number
Serial Number For the RAID_IOM, BOD_IOM, and POWER_SUPPLY FRUs, the
FRU serial number
Revision FRU revision number
Dash Level FRU template revision number
FRU Shortname FRU part number
Mfg Date Date and time that the FRU was programmed
Mfg Location Location where the FRU was programmed
Mfg Vendor ID JEDEC ID of the manufacturer
FRU Location Location of the FRU in the enclosure, as viewed from the back:
MID-PLANE SLOT: Chassis midplane
UPPER IOM SLOT: Upper controller module or expansion
module
LOWER IOM SLOT: Lower controller module or expansion
module
LEFT PSU SLOT: Left power and cooling module
RIGHT PSU SLOT: Right power and cooling module
Configuration SN A customer-specific configuration serial number
FRU Status Component status:
Absent: Component is not present
Fault: One or more subcomponents has a fault
OK: All subcomponents are operating normally
N/A: Status is not available
Table 35: FRU Information
For details about using show frus, see the “CLI” manual.

9.3.18 show inquiry

(For service technicians only) Shows inquiry information for each controller module.
Output
● Management controller firmware version and loader version
● Storage controller firmware version and loader version
● Controller module serial number, physical MAC address, and IP address
For details about using show inquiry, see the “CLI” manual.
9.3.19 show protocols

Shows which service and security protocols are enabled or disabled.
For details about using show protocols, see the “CLI” manual.
9.3.20 show redundancy-mode

Shows the redundancy status of the system.
Output
Field Description
Redundancy Mode Active-Active
Redundancy Status Redundant Operation: Both controllers are operating
Only Operational: Only the connected controller is operating
Controller ID Status Operational: The controller is operational
Not Installed: The controller is not installed or has failed
Controller ID Serial Number Controller module serial number
Not Available
Table 36: Redundancy Information
For details about using show redundancy-mode, see the “CLI” manual.

9.3.21 trust
Enables an offline virtual disk to be brought online for emergency data collection only. It
must be enabled before each use.
CAUTION
This command can cause unstable operation and data loss if used improperly. It is
intended for disaster recovery only. Use only when advised to do so by a service
technician.
The trust command resynchronizes the time and date stamp and any other metadata on a
bad disk drive. This makes the disk drive an active member of the virtual disk again. You
might need to do this when:
● One or more disks of a virtual disk start up more slowly or were powered on after the
rest of the disks in the virtual disk. This causes the date and time stamps to differ, which
the system interprets as a problem with the “late” disks. In this case, the virtual disk
functions normally after being trusted.
● A virtual disk is offline because a drive is failing, you have no data backup, and you want
to try to recover the data from the virtual disk. In this case, trust may work, but only as
long as the failing drive continues to operate.
When the “trusted” virtual disk is back online, back up its data and audit the data to make
sure that it is intact. Then delete that virtual disk, create a new virtual disk, and restore data
from the backup to the new virtual disk. Using a trusted virtual disk is only a disaster-
recovery measure; the virtual disk has no tolerance for any additional failures.
For details about using trust, see the “CLI” manual.

Figures
Figure 1: FibreCAT SX Storage System Architecture Overview
Figure 2: Drive Sled
Figure 3: Drive Slot Numbers
Figure 4: Block Diagram of the Controller Module
Figure 5: SAS Data Path
Figure 6: Block Diagram of the FC HIM Board
Figure 7: Detecting the HIM Model with FibreCAT SX Manager’s WBI (Example with two HIM Models 0)
Figure 8: Detecting the HIM Revision with FibreCAT SX Manager’s WBI (Example with two HIM Models 1)
Figure 9: Expansion Module Block Diagram
Figure 10: Cabling of Controller Enclosure to Two Expansion Enclosures
Figure 11: Enclosure Status LEDs
Figure 12: Enclosure ID
Figure 13: Drive Module LEDs
Figure 14: Host Link Status LEDs
Figure 15: Expansion Port Status LED
Figure 16: Ethernet LEDs
Figure 17: Controller Module LEDs
Figure 18: Power and Cooling Module LEDs
Figure 19: Expansion Enclosure LEDs (Back View)
Figure 20: FibreCAT SX Manager’s WBI Summary Screen
Figure 21: Expander Controller PHY Detail
Figure 22: Debug Buffer Example
Figure 23: CAPI Command Trace Example
Figure 24: LAN Debug (Management) Trace Example
Figure 25: Power and Cooling Module and Cooling Fan Locations
Figure 26: Serial Number Location (1)
Figure 27: Serial Number Location (2)
Figure 28: Querying the Serial Number Remotely (example, showing a FibreCAT SX80)
Figure 29: Checking the First Four Letters of the Serial Number (example, showing a FibreCAT SX80 controller enclosure)
Figure 30: 1st Four Letters of the FibreCAT SX Serial Number in Ersin (example, showing a FibreCAT SX80 controller enclosure)
Figure 31: Selecting the FibreCAT SX Product Class
Figure 32: Selecting one of the two FibreCAT SX Spares Lists
Figure 33: Location of Controller/Expansion Module Ejector Thumbscrews (Controller Modules Shown)
Figure 34: Sliding a Controller Module Into an Enclosure
Figure 35: Virtual Disk Drive List Panel
Figure 36: Releasing the Drive Module Handle
Figure 37: Removing the Power and Cooling Module from the Chassis

Tables
Table 1: Notational Conventions
Table 2: Enclosure Status LEDs (Front)
Table 3: Drive Module LEDs
Table 4: Host Link Status LEDs
Table 5: Expansion Port Status LED
Table 6: Ethernet LEDs
Table 7: Controller Module Status LEDs
Table 8: Power and Cooling Module LEDs
Table 9: Expansion Enclosure LEDs (Back)
Table 10: Problems Accessing the Array Using FibreCAT SX Manager’s WBI
Table 11: Power Supply Sensors
Table 12: Cooling Element Fan Sensor Descriptions
Table 13: Temperature Sensor Descriptions
Table 14: Voltage Sensor Descriptions
Table 15: FibreCAT SX60 Controller Enclosure FRUs
Table 16: FibreCAT SX80 Controller Enclosure FRUs
Table 17: FibreCAT SX60 / SX80 / SX88 Controller Enclosure FRUs
Table 18: FibreCAT SX88 Expansion Enclosure FRUs
Table 19: 1st Four Letters of the FibreCAT SX Serial Numbers
Table 20: Controller Module or Expansion Module Faults
Table 21: Standard SCSI Sense Key Descriptions
Table 22: Common ASC and ASCQ Descriptions
Table 23: Disk Drive Problems
Table 24: Disk Channel Error Codes
Table 25: Disk Drive Status
Table 26: Virtual Disk Faults
Table 27: Power and Cooling Module Faults
Table 28: Event Descriptions and Recommended Actions
Table 29: Failover Reason Codes
Table 30: Debug Log Parameters
Table 31: Enclosure General SES Fields
Table 32: Enclosure Component Status Fields
Table 33: Event Information
Table 34: SAS Expander Information
Table 35: FRU Information
Table 36: Redundancy Information

Related Documents and Links
[1] Storage Systems Safety Notes
Basic safety information including the handling of racks and rack mount enclosures.
Supplied with the hardware as a printed manual.
[2] FibreCAT SX60 / SX80 / SX88 Quick Start Guide
Overview and references for getting current information; short instructions for installation,
configuration, and administration.
Available at http://www.fujitsu-siemens.com/support/manuals.html
[3] FibreCAT SX60 / SX80 / SX88 Operating Manual
Installing and configuring hardware.
[4] FibreCAT SX60 / SX80 / SX88 Best Practices Guide
Recommendations for maximizing reliability, accessibility, and serviceability.
[5] FibreCAT SX60 / SX80 / SX88 Administrator’s Guide
Using the FibreCAT SX Manager’s (FSM) web-based interface (WBI) to configure and
manage a system.
[6] FibreCAT SX Manager Command Line Interface (CLI)
Using the FibreCAT SX Manager’s (FSM) command-line interface (CLI) to configure and
manage a system.
[7] FibreCAT SX60 / SX80 / SX88 Service Manual (the manual in hand)
The latest version of the manual is available at
http://www.fujitsu-siemens.com/support/manuals.html

[8] Late-breaking information not included in the information set is available at
http://www.fibrecat.net
[9] See also the user forum for the FibreCAT SX Series at
http://www.fibreservice.net
[10] The Helpdesk telephone numbers are listed at
http://www.fujitsu-siemens.com/support/helpdesk.html
[11] Latest firmware is available at
http://www.fujitsu-siemens.com/support/downloads.html

Index
A syntax 137
AC Power Good LED 40 volume mapping syntax 137
AC power module command syntax
installing 114 CLI 136
advanced manage-level functions controller module
dequarantining a virtual disk 62 cache status LED 39
saving log information to a file 63 Ethernet activity LED 37
Ethernet link status LED 37
B expansion port status LED 35
bad block Fault/Service Required LED 38
list size, displaying 48 FC link speed LED 33
reassignments, displaying 48 FRU OK LED 38
boot handshake 96 host activity LED 33
host link speed LED 33
C host link status LED 33
cables identifying faults 90
identifying faults installing 95, 96
expansion enclosure side 99 LEDs 33
host side 99 OK to Remove LED 38
cache only one boots 90
checking status 39 removing 93
clearing 57 replacing 96
size 91 Unit Locator LED 38
status LED 39 updating firmware 97
CLI controller modules
disk drive syntax 136 conflicts 90
help, view command 138 cooling element
host nickname syntax 137 fan sensor descriptions 75
keyword syntax 136 critical events
parameter syntax 136 selecting to monitor 61
virtual disk critical state, virtual disk
name 136 preventing 62
syntax 136
volume D
serial number 137 data paths

Index
isolating faults 49 installing 104, 107

DC Voltage/Fan Fault/Service Required LED 40 removing 106
debug log 70 replacing 106, 109
setting up 70 drive modules
viewing 71 identifying faults 99
debug utilities LEDs 31
debug log setup 70 OK to Remove LED 32
saving log information to a file 63 Power/Activity/Fault LED 32
view CAPI trace 58
view error buffers 58 E
view mgmt trace 59 enclosure
viewing debug log 71 replacing 114
dequarantining, virtual disks 63 enclosure ear
diagnostic manage-level only functions Fault/Service Required 30
clearing unwritable cache data 57 FRU OK LED 30
selecting individual events for notification 60 Temperature Fault LED 31
view CAPI trace 58 Unit Locator LED 30
view error buffers 58 enclosure ID
view mgmt trace 59 locating 31
viewing the debug log 71 errors
disabled PHY 50 displaying media errors 48
disaster recovery. See trust virtual disk PHY 50
disk drive reviewing disk drive statistics 47
See drive module Ersin 85
disk drives Ethernet activity LED 37
available 55 Ethernet link status LED 37
bad block reassignments 48 event logs
bad block size 48 disabled PHY 53
capturing trend data 48 reviewing 49
clearing metadata 55 event notification
event logs 49 selecting individual events to monitor 60
I/O timeout count 47 events
identifying faulty disks 46 configuring notification 60
LEDs 105 expansion module
leftover 55 Fault/Service Required LED 42
locating 47 FRU OK LED 42
media errors 48 identifying faults 90
no response count 48 installing 95, 96
reviewing error statistics 47 OK to Remove LED 42
capturing trend data 48 removing 93
spin-up retires 48 replacing 96
disk error stats 47 SAS In port status LED 41
drive module SAS Out port status LED 41
identifying 105 Unit Locator LED 42

Index
  expansion port status LED 35

F
fault isolation 50
Fault/Service Required LED
  controller module 38
  enclosure ear 30
  expansion module 42
faults
  identifying
    cables 99
    disk drive 46
    drive modules 99
    power and cooling modules 111
    virtual disks 110
  isolating
    controller module 33
    data path faults 53
    Ethernet management connection 37
    expansion enclosure connectivity 35
    host-side connection 33
    methodology 27
Field Return Tag 88
firmware
  controller partner, disabling automatic update 97
  updating 96
FRU OK LED
  controller module 38
  enclosure ear 30
  expansion module 42
FRUs
  checking status 39
  determining health status 44
  identifying location 39
  replacing
    enclosure 114
  static electricity precautions 89

H
HIM
  restriction 21
host activity LED 33
host channels, resetting 54
Host Interface Module
  restriction 21
host link speed LED 33
host link status LED 33

I
I/O
  checking status 46
  displaying timeout count 47
icons
  system status 44
informational events
  enabling 69
  selecting to monitor 61

L
LEDs
  controller module
    cache status LED 39
    Ethernet activity 37
    Ethernet link status 37
    Fault/Service Required 38
    FRU OK LED 38
    host activity LED 33
    host link speed LED 33
    host link status LED 33
    OK to Remove 38
    Unit Locator 38
  drive module
    OK to Remove LED 32
    Power/Activity/Fault LED 32
  enclosure ear
    Fault/Service Required 30
    FRU OK LED 30
    Unit Locator LED 30
  expansion module
    Fault/Service Required LED 42
    FRU OK LED 42
    OK to Remove LED 42
    SAS In port status 41
    SAS Out port status 41
    Unit Locator LED 42
  expansion port status 35
  power and cooling module

    AC Power Good 40
    DC Voltage/Fan Fault/Service Required 40
  Temperature Fault LED 31
leftover disk drives
  clearing metadata 55
LIP, remotely issuing on host channels 54
log information
  saving to a file 63
log information, saving 67
loop initialization primitive. See LIP

M
metadata
  clearing 55
Model of HIM 21

O
OK to Remove LED
  controller module 38
  drive module 32
  expansion module 42

P
partner controller, disabling automatic update 97
PHY
  disabled 50
  errors 50
  event logs 53
  fault isolation 50
  fencing 50
  rescan disks 50
  reset status 52
physical layer interface. See PHY 50
power and cooling module
  AC Power Good LED 40
  DC Voltage/Fan Fault/Service Required LED 40
  identifying faults 111
power module
  replacing 114
Power/Activity/Fault LED 32
Product Class 87

R
RAIDar
  cache data status 57
  checking I/O status 46
  configuring event notification 60
  debug utilities 58
  diagnostic manage-level user only functions 55
  disk error statistics 47
  displaying system status 44
  enable/disable trust virtual disk 56
  icons, system status 44
  locating a disk drive 47
  reviewing event logs 49
  status summary 44
  using to troubleshoot 43
recovery
  clearing cache data 57
  disaster
    trust virtual disk 56
rescan disks 50
reset PHY status 52
resetting host channels 54
restriction
  direct connect mode 21
  FibreCAT SX60 / SX80 21

S
SAS In port status LED 41
SAS Out port status LED 41
saving
  log information 67
sensors
  cooling fan 75
  locating 74
  power supply 74
  temperature 76
  voltage 77
Serial Number 83, 84, 85
SMART
  displaying event count 47
spin-up retries, displaying 48
static electricity precautions 89
status
  determining overall array health 44
  disk 47
status summary 44
system status, displaying 44

T
Temperature Fault LED 31
temperature sensor descriptions 76
temperature warnings, resolving 73
trust virtual disk

U
Unit Locator LED
  controller module 38
  enclosure ear 30
  expansion module 42

V
view CAPI trace 58
view error buffers 58
view mgmt trace 59
virtual disks
  clearing cache data 57
  dequarantining 63
  disaster recovery 56
  identifying faults 110
  preventing critical state 62
voltage sensor descriptions 77
voltage warnings, resolving 73

W
warning events
  selecting to monitor 61
warnings, temperature 73


