Bda X7
Customer:
Task Number:
Technician:
Version EIS-DVD:
Date:
• It is recommended that the EIS web pages are checked for the latest version of this checklist prior
to commencing the installation.
• The idea behind this checklist is to help the installer achieve a "good" installation.
• It is assumed that the installer has attended the appropriate training classes.
• Use of a laptop (preferably with Solaris or Linux available) is recommended during the
installation.
• It is not intended that this checklist be handed over to the customer.
• Feedback on issues with EIS content or product quality is welcome – refer to the last page of this
checklist.
The final configuration steps that set up the BDA system to use ASR are expected to be
performed by an Oracle Advanced Customer Services (ACS) software engineer after the EIS
activities are complete, and hence are not part of this installation checklist. The exception
is the configuration of the InfiniBand switches for ASR (page 49), since the standard ASR
scripts do not include the InfiniBand switches.
Some or all of the following additional EIS Installation Checklists may be required:
• Oracle Advanced Support Gateway (OASG) Server
• Multi-Rack Cabling of Engineered Systems (via InfiniBand).
Oracle Internal and Approved Partners Only Page 1 of 51 Vn 1.1b Created: 15 Feb 2018
ENGINEERED SYSTEMS ENTERPRISE SUPPORT TEAM EEST
If a Field Support Engineer (FSE) requires assistance while installing an Engineered System, a
streamlined method of engaging an Engineered Systems Enterprise Support Team (EEST)
engineer when the SR owner is not available is described below.
The SR referred to is the installation SR.
An FSE would use this option if they are onsite and require immediate assistance from EEST.
This process is specifically for FSE Callbacks for hardware, installation and Field Change Order
(FCOs) support and should NOT be used by ACS or partners.
The complete process is described in MOS Document ID 1803744.1 EEST: Silent Menu Option :
GCSEXA (internal-only). Ensure that you understand the contents before going onsite.
Task Check
INFORMATION: EIS Checklist Steps and Estimated Timing
Item First Page Est. Duration
ASR Preparation 4 1 hour
BDA Rack Preparation 5 1 hour
Unpacking 9 1 hour
Initial Power-On Actions 11 30 mins
Configuring the CISCO Ethernet Switch 13 45 mins
Verifying / Configuring Rack PDUs 20 30 mins
Configuring the InfiniBand NM2-GW Switches 22 15 mins / switch
InfiniBand Spine Switch Installation 28 15 mins
IB Switches Instance Check 32 5 mins / switch
BDA Server Nodes 33 15 mins / server
Verification of the InfiniBand Network 37 5 to 30 mins
Customer Network Preparation 39 1 hour
Connecting to Customer Network 44 1 hour
Configure the InfiniBand switches to use ASR 49 1 hour
BDA Mammoth Utility 51
Handover 51 30 mins
Task Comment Check
When the ASR manager version went from 4.x to 5.x the package name and subsequently
the path names changed. Thus if the search for package SUNWswasr failed, try again for
asrmanager :
On a Linux system:
# rpm -qa | grep asrmanager
asrmanager-5.5.1 <<<<<<< Version 5.5 is OK!
On a Solaris System:
# pkginfo -l asrmanager
PKGINST: asrmanager
NAME: ASR Manager
CATEGORY: application
ARCH: all
VERSION: 5.3.0 <<<<<<< Version 5.3.0 requires an update!
BASEDIR: /
VENDOR: Oracle Corporation
<SNIP>
If the installed ASR Manager host does not meet the minimum requirements, ask the
customer to update it before the BDA rack(s) are installed.
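The version check above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the minimum version is an assumption supplied by the caller, since this checklist only shows 5.5.1 as acceptable and 5.3.0 as requiring an update.

```python
import re

# Assumption for illustration only: the checklist does not state the exact minimum.
ASSUMED_MINIMUM = (5, 5)


def parse_asr_version(text):
    """Extract the ASR Manager version from rpm or pkginfo output as a tuple of ints."""
    # Linux: "rpm -qa | grep asrmanager" prints e.g. "asrmanager-5.5.1"
    match = re.search(r"asrmanager-(\d+(?:\.\d+)*)", text)
    if match is None:
        # Solaris: "pkginfo -l asrmanager" prints a "VERSION: 5.3.0" line
        match = re.search(r"VERSION:\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def needs_update(version, minimum=ASSUMED_MINIMUM):
    """Return True when the installed version is older than the minimum."""
    return version < minimum
```

With the two sample outputs shown earlier, the Linux host passes and the Solaris host is flagged for an update.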
ASSUMPTIONS
In an EIS installation the following assumptions are made:
• Systems come pre-configured from the factory with a default name and IP address. These are
used in this checklist for actions before connecting to the customer network. The root
password for both the OS and ILOM on all systems has been set to welcome1. The default name
and IP address scheme is on page 7.
• We understand that v4.10.1 shipped on all BDA X7 racks starting from RR in October 2017.
• EIS recommends that connections from the rack components to the Customer network should
be made AFTER the initial configuration (as described in this checklist). Connecting before
power on could cause undesirable interactions due to possible presence of a duplicate IP
address in the Customer's environment.
• For BDA X7-2 racks, it is required to use a laptop (preferably with Solaris or Linux) plugged
into the Cisco switch using SSH to the default IP addresses defined on page 7. A cable is
provided plugged into Cisco port 48 with settings appropriate for the customer network uplink
to another switch. Do NOT use port 48 on the Cisco for a laptop – use any other free port or
temporarily one of the PDU ports.
• Component numbering starts with 1 at the rack bottom working upwards based on the server
type. BDA Server Node 1 is the lower-most server node in the rack location U2, BDA Server
Node 9 is the upper-most server node in the bottom half rack location U18; BDA Server Node
10 starts the top half progressing up to 18.
• A BDA Full Rack configuration has 18 server nodes. A BDA Starter Rack has 6 server nodes,
and BDA Elastic Upgrades add BDA Server Nodes one at a time, up to 12 more, for a total
of 18 server nodes in the rack.
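The numbering rule for the bottom half of the rack can be sketched in Python. This is a minimal illustration, assuming the bottom-half server nodes are evenly spaced 2U apart (consistent with node 1 at U2 and node 9 at U18). The function name is hypothetical, and the top-half locations are not given in this checklist, so they are deliberately not computed.

```python
def bottom_half_rack_unit(node: int) -> int:
    """Return the rack unit (U position) of a bottom-half BDA server node.

    Assumes 2U spacing: node 1 -> U2, node 9 -> U18.
    """
    if not 1 <= node <= 9:
        raise ValueError("this sketch only covers server nodes 1 to 9")
    return 2 * node
```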
Table Showing Default Settings for Full, Expansion & Starter Racks
PDU-A 192.168.1.210
PDU-B 192.168.1.211
INFORMATION: CABLE LABELS WITHIN RACK
The cables between the various units within the rack are labelled by manufacturing. The cables
are also colour-coded as follows:
• Black – InfiniBand Data
• Black – InfiniBand Switch & BDA Server Node Ethernet management cables
• Red – ILOM Ethernet management cables
• Black – AC power jumper cables
Some examples are given here:
At an InfiniBand switch (connection to second switch) (black cable):
R1 U20 P8A (local): Rack Unit 20 Port 8A on the switch.
R1 U24 P8A (remote): Rack Unit 24 Port 8A on the switch.
At an InfiniBand switch to a PCI card on a server (black cable):
R1 U20 P15A (local): Rack Unit 20 Port 15A on the switch.
R1 U12 PCIE3-1 (remote): Rack Unit 12 PCIE card in slot 3, port #1.
At a server's ILOM to Ethernet switch (red cable):
R1 U8 ILOM (local): Rack Unit 8 ILOM NET MGT port.
R1 U23 P38 (remote): Rack Unit 23 Port 38 on the Ethernet switch.
At a server's power cable / PDU1 (black cable):
U19 PS0 (local): Rack Unit 19 Power Supply 0.
PDU A G2-3 (remote): Group 2 Output 3 on PDU A (left side, viewed from rear).
For data cables, the label at the opposite end of the cable carries the same two entries with
local and remote exchanged; for power cables, the labels at each end are identical.
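The data-cable label scheme shown in the examples can be decoded mechanically. The following Python sketch is illustrative only: the function names and the regular expression are assumptions based on the sample labels above (rack, rack unit, then a port designator), not an official label grammar.

```python
import re

# Matches labels such as "R1 U20 P8A" or "R1 U12 PCIE3-1"
LABEL_RE = re.compile(r"^R(\d+) U(\d+) (\S+)$")


def parse_label(label):
    """Split a data-cable label into rack number, rack unit and port designator."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised cable label: {label!r}")
    return {"rack": int(match.group(1)), "unit": int(match.group(2)), "port": match.group(3)}


def far_end_labels(local, remote):
    """For data cables the far end shows the same pair with local and remote exchanged."""
    return remote, local
```

For example, `parse_label("R1 U23 P38")` yields rack 1, unit 23, port `P38`.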
UNPACKING
For reference refer to the Oracle® Big Data Appliance Site Checklists. Use the latest
release available here: https://docs.oracle.com/en/bigdata/
Oversee the delivery of the rack(s).
Comment: It is the responsibility of the delivery company to unpack the rack and roll it into
place in the data center. The Oracle FSE should be present for oversight to ensure that the
delivery company follows proper procedure (e.g., doesn't roll it over rough floor).
Determine the Rack Master Serial Number and contact your regional Installation Coordinator,
either by phone or email, and provide this serial number. This is so that your Installation
Coordinator can begin the process to verify that the Install Base information is correct for
future entitlement purposes.
Comment: The Rack's Master Serial Number is located on the top left side wall (viewed from
rear) inside the rack on the rear of the chassis.
Delivery complete?
Collect the white Customer Information Sheets (CIS). Any documentation delivered with the
system should be stored with the spares.
Comment: The installer should inform the customer about the location of the spares (&
documentation) kit and request the customer to safely store the kit outside of the data center
room.
Allow the system to acclimatise (power off) at the customer site if required.
Comment: Refer to EIS standard “Acclimatisation of Oracle Hardware Products”.
Collect packing material together for disposal.
Unpack outside data center to ensure no
contamination/dust is released inside
customer's controlled environment.
Verify all packing material has been removed, i.e. nothing is blocked.
Comment: Fans & air vents must be free to operate. Metal brace plates screwed on the rear of
the rack where the PDU power cords are tied are brackets for shipping only. They should be
removed if they were not already removed by the unpacking engineers, so as to not block
airflow out of the rack.
Ensure that the Rack levelling feet have been lowered to stabilize the rack to the floor.
Comment: This will prevent it rolling forward or back while working on the rack. Refer to the
Oracle® Rack Cabinet 1242 User’s Guide, section Stabilize the Rack (Leveling Feet).
There are a number of spare parts that should be handed to the Customer for safe-keeping
(must be able to locate them when needed!):
• Located inside the rack in a bundle:
• 1 QSFP InfiniBand copper cable (3 m) for FRU replacement
• 1 Black Cat5e Ethernet cable (10')
• 1 Red Cat5e Ethernet cable (10')
• 1 Blue Cat5e Ethernet cable (7')
• Located in the ride-along boxes:
• 1 10TB Disk Drive
• Cisco Switch Accessory Kit
• InfiniBand cables for multi-racking (6x 3m & 10x 5m)
• X7-2L Documentation Kit
• Oracle Rack Cabinet 1242 (Foxconn) Accessory Kit
There is also a set of 2 keys to open the rack doors and side panels.
Verify that for all systems the OK LED is blinking “standby”. This means that the ILOM is up
and that the host is off.
If the system does not go into Standby, connect to that unit's SP SER MGT port with baud
settings 9600,8,N,1. If it is at the pre-boot> menu, then check the locate button on front and
rear is not stuck depressed, then type boot.
Comment: The OK LED blinks on for 0.1 seconds once every 3 seconds when in “standby”. The
system OK LED does NOT flash while ILOM is booting as it did on past systems. The LED will
stay dark until it goes into Standby blink mode after 2 to 3 minutes.
Verify that the server’s LEDs have been correctly manufactured (Bug 27416683).
On all X7-2 and X7-2L servers within the rack, press the chassis locate LED for 10 seconds
until all the LEDs come on. Verify the LED colours on the front left indicator module are
as follows:
Left:
• Locate LED is White
• Service LED is Amber
• System OK LED is Green
• Do NOT Service LED is White
Right:
• Top Fan LED is Amber
• Rear PS LED is Amber
• Temperature LED is Amber
• SP OK LED is Green.
In particular verify that the Service LED is not Green and the SP OK LED is not Amber. If
the system does not have the correct colours, replace the front left indicator module (part
7322171) and open a CPAS citing Bug 27416683.
2 The BDA rack's ride-along accessory kit may or may not (depending on Cisco) include this Cisco adapter.
GETTING STARTED
We understand that manufacturing will wipe the switch configuration before delivery. Thus
when the rack's PDUs are powered up the switch will boot and enter the Basic System
Configuration Dialog. Since the connection to the Cisco serial port has probably been
made after the power was applied the initial output from the switch will probably have
been missed.
Enter “?” then press the RETURN key – you should find one of the following:
1. If you see a prompt similar to <switch_name> login: the switch has been
previously configured – go to Erasing the Configuration below.
2. If you see the prompt:
Abort Auto Provisioning and continue with normal setup ?(yes/no)[n]:
Enter "yes" & go to System Admin Account Setup & Basic Configuration on page 15.
If an additional return was entered and the default response of ‘n’ was selected, power
down the switch and turn it back on.
ERASING THE CONFIGURATION
As mentioned above this step should only be needed if (somehow) the switch was shipped
pre-configured... Firstly log in (hoping that the default password has been used):
orcltsw-adm01 login: admin
Password: welcome1
orcltsw-adm0#
Then reload the system. This will cause the switch to reboot & there will be a long
dialogue which has been clipped here:
orcltsw-adm0# reload
This command will reboot the system. (y/n)? [n] y
2017 Aug 31 01:09:00 exadatax7-adm0 %$ VDC-1 %$ %PLATFORM-2-PFM_SYSTEM_RESET: Manual
system restart from Command Line Interface
CISCO SWITCH Ver7.59
Device detected on 0:1:2 after 0 msecs
Device detected on 0:1:1 after 0 msecs
Device detected on 0:1:0 after 0 msecs
MCFrequency 1333Mhz
Relocated to memory
Time: 8/31/2017 1:9:22
<SNIP>
INIT: version 2.88 booting
<SNIP>
INIT: Entering runlevel: 3
Running S93thirdparty-script...
Populating conf files for hybrid sysmgr ...
Starting hybrid sysmgr ...
inserting /isan/lib/modules/klm_cisco_nb.o ... done
The following is the first prompt after the reboot of the switch:
Abort Auto Provisioning and continue with normal setup ?(yes/no)[n]: yes
Wait 30 seconds for the next prompt. Do not press Enter, otherwise the default of 'yes' will
be accepted for the next prompt. If you enter yes by accident, you will have to go through the
whole configuration before you can erase it and start again.
SYSTEM ADMIN ACCOUNT SETUP & BASIC CONFIGURATION
We now set the password for user admin – recommended is welcome1:
Do you want to enforce secure password standard (yes/no) [y]: no
The Basic System Configuration Dialogue will start. Ignore the request to register the
Cisco Nexus9000 device:
---- Basic System Configuration Dialog VDC: 1 ----
<SNIP>
Would you like to enter the basic configuration dialog (yes/no): yes
Answer “no” to the following questions except when entering the name for the switch:
Create another login account (yes/no) [n]: no
Type of ssh key you would like to generate (dsa/rsa) [rsa]: rsa
Final configuration:
Configure default interface layer (L3/L2) [L2]: L2
Configure STP to be "network" for port 48, which is designated as the uplink port, and
expected to be connected as uplink to another switch:
orcltsw-adm01(config)# interface Ethernet 1/48
orcltsw-adm01(config-if)# spanning-tree port type network
orcltsw-adm01(config-if)# exit
If you will be connecting the switch to 2 different uplink switches for redundancy on the
same VLAN – OR – using an uplink port other than 48, then change this and the previous
command accordingly. Use "network" for uplink ports so the switch is able to detect and
protect against Layer 2 (bridging) loops, and "edge" for non-uplink ports.
Before plugging the uplink port(s) into the customer's network, consult with their network
administrator to see if the default spanning-tree configuration is acceptable.
Set the STP priority to the least-preferred value (61440) to avoid this switch becoming the "root bridge":
orcltsw-adm01(config)# spanning-tree vlan <1> priority 61440
The value for vlan may be variable depending on Customer uplink settings.
Normally Customers have already configured their core switches to have a better priority
so this problem would not happen at a real Customer.
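As background for the priority value: spanning-tree bridge priorities are set in steps of 4096 from 0 to 61440, and the bridge with the numerically lowest priority is preferred as root, so 61440 is the least-preferred setting. A minimal Python sketch of that rule follows; the helper names and the switch name "customer-core" are illustrative, and the MAC address tie-break between equal priorities is not modelled.

```python
def is_valid_bridge_priority(value: int) -> bool:
    """Bridge priority must be a multiple of 4096 in the range 0..61440."""
    return 0 <= value <= 61440 and value % 4096 == 0


def preferred_root(priorities: dict) -> str:
    """Return the name of the bridge with the numerically lowest priority."""
    return min(priorities, key=priorities.get)
```

With the rack switch at 61440, any customer core switch configured with a lower value, such as `{"orcltsw-adm01": 61440, "customer-core": 4096}`, is preferred as root.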
By default the Cisco treats all ports as Layer3 interfaces. Change the default using
"switchport" by itself to change all ports to act as Layer2 interfaces:
orcltsw-adm01(config)# interface Ethernet 1/1-48
orcltsw-adm01(config-if-range)# switchport
orcltsw-adm01(config-if-range)# exit
Configure the DNS client (this does not work using the install script):
orcltsw-adm01(config)# ip domain-name x7toi.com <<< Example domain name!
orcltsw-adm01(config)# ip name-server 10.100.100.2 <<< Example address!
VERIFICATION
Verify that all changed settings are correct using the "show running-config" command.
Below is an edited example output. More than likely there will be additional default
settings displayed that we did not set which may be different to the settings required by the
customer's network. If a setting is incorrect and needs to be changed refer to the next task
(on page 18 below).
orcltsw-adm01# show running-config
!Command: show running-config
!Time: Thu Aug 31 13:06:40 2017
version 7.0(3)I5(2)
power redundancy-mode combined force
switchname orcltsw-adm01
vdc orcltsw-adm01 id 1
limit-resource vlan minimum 16 maximum 4094
limit-resource vrf minimum 2 maximum 4096
limit-resource port-channel minimum 0 maximum 511
limit-resource u4route-mem minimum 248 maximum 248
limit-resource u6route-mem minimum 96 maximum 96
limit-resource m4route-mem minimum 58 maximum 58
limit-resource m6route-mem minimum 8 maximum 8
feature interface-vlan
clock timezone PST -8 0
no password strength-check
username admin password 5 $5$CrlKuGiG$A2skAGr3jmZvBn1fgDYGFoHA9xWrdez9MpuSTHSwo96
role network-admin
ip domain-lookup
ip domain-name x7toi.com
ip name-server 10.100.100.2
system default switchport
copp profile lenient
snmp-server user admin network-admin auth md5 0x962ed71b70064edc73d53a89eb14ea8a
priv 0x962ed71b70064edc73d53a89eb14ea8a localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO
ntp server 10.100.100.2 use-vrf default
vlan 1
interface Vlan1
no shutdown
ip address 10.100.100.110/24
interface Ethernet1/1
spanning-tree port type edge
<SNIP>
interface Ethernet1/47
spanning-tree port type edge
interface Ethernet1/48
spanning-tree port type network
interface Ethernet1/49
<SNIP>
interface Ethernet1/54
interface mgmt0
vrf member management
line console
line vty
boot nxos bootflash:/nxos.7.0.3.I5.2.bin
ip route 0.0.0.0/0 10.100.100.1
no system default switchport shutdown
orcltsw-adm01#
If anything in the above list is incorrect, go back and repeat the appropriate section. To
erase a setting whilst in config mode, insert "no" in front of the same command. Any other
settings that the customer requires should be checked and corrected by the customer.
The installer should send the above output from running-config (minus any password
lines) to the customer network administrator to verify so they may suggest any changes
necessary for attaching to the customer network.
Make the current configuration permanent:
orcltsw-adm01# copy running-config startup-config
[########################################] 100%
Copy complete.
orcltsw-adm01# exit
FINALLY
Disconnect the cable from the CISCO console.
The Cisco switch must NOT be connected to the Customer's management network at this
stage. This will be done later, after Oracle has configured the systems with the customer's
IP addresses and the customer has verified the running-config on the switch and worked
with the FSE to make any additional changes necessary for attaching to the customer
network.
If you wish to check the Cisco switch, attach your laptop to a port on the Cisco and ping the
IP addresses of the BDA internal management network. Do NOT use port 48 on the Cisco
for a laptop – use any other free port or temporarily one of the PDU ports.
Connect an RS-232 cable between the SER MGT port and the host (e.g. Laptop or other
suitable system).
Configure the host’s terminal or terminal emulator (it is assumed that the installer knows
how to do this – otherwise see page 58 of the PDU User’s Guide).
Comment: Terminal Configuration Settings:
• 9600 baud
• 8 bit
• 1 stop bit
• no parity bit
• no flow control
At the terminal device, log in to the PDU metering unit:
Comment: User = admin, pwd = adm1n
Password may be: welcome1
After successful login, enter the Customer’s network configuration:
pducli->set net_ipv4_dhcp=Off
pducli->set net_ipv4_ipaddr=xxx.xxx.xxx.xxx
pducli->set net_ipv4_subnet=xxx.xxx.xxx.xxx
pducli->set net_ipv4_gateway=xxx.xxx.xxx.xxx
It is sufficient to perform the reset just once (as shown in the task below).
The PDU can be optionally configured for DNS with:
pducli->set net_ipv4_dns1=xxx.xxx.xxx.xxx
pducli->set net_ipv4_dns2=xxx.xxx.xxx.xxx
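Before typing the pducli settings it can help to sanity-check the customer's values. The Python sketch below is illustrative only: the function name is hypothetical, the parameter names are the ones shown in the commands above, and the same-subnet gateway check is an added assumption rather than a requirement stated in this checklist.

```python
import ipaddress


def pdu_net_commands(ipaddr, subnet_mask, gateway, dns_servers=()):
    """Build the pducli 'set' lines for a static IPv4 configuration."""
    iface = ipaddress.IPv4Interface(f"{ipaddr}/{subnet_mask}")
    gw = ipaddress.IPv4Address(gateway)
    if gw not in iface.network:
        raise ValueError("gateway is not in the PDU's subnet")
    if len(dns_servers) > 2:
        raise ValueError("only net_ipv4_dns1 and net_ipv4_dns2 are shown in the checklist")
    commands = [
        "set net_ipv4_dhcp=Off",
        f"set net_ipv4_ipaddr={iface.ip}",
        f"set net_ipv4_subnet={iface.netmask}",
        f"set net_ipv4_gateway={gw}",
    ]
    # Optional DNS settings, numbered dns1 and dns2 as in the commands above
    for index, server in enumerate(dns_servers, start=1):
        commands.append(f"set net_ipv4_dns{index}={ipaddress.IPv4Address(server)}")
    return commands
```

Each returned line is entered at the `pducli->` prompt; a malformed address or an out-of-subnet gateway raises `ValueError` before anything is typed on the PDU.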
Click on the Net Configuration link found in the upper left side of the page to view the IP
settings.
Comment: You will need to login:
User = admin, pwd = adm1n
Password may be: welcome1
SETTING PDU SYSTEM TIME, NTP & INFORMATION A B
Task Comment Switch
1 2
Note that the Gateway switches are referred to as Leaf 1 and Leaf 2; however, the default
hostname scheme usually refers to them as ib2 and ib3, where ib1 is the non-gateway spine
switch.
Set the DNS server and domain name:
-> set /SP/clients/dns auto_dns=enabled
-> set /SP/clients/dns nameserver=<IP address>
-> set /SP/clients/dns searchpath=<domain name>
where <IP address> is up to three comma-separated name server IP addresses in preferred
search order, e.g. “nameserver=10.196.23.245,138.2.202.15”, and
<domain name> is the customer's full DNS domain name (everything after the host
name and first dot), e.g. “searchpath=us.oracle.com”.
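The nameserver and searchpath values can be assembled as in the following Python sketch. It is illustrative only: the function name is hypothetical, and it simply enforces the "up to three, comma-separated, in preferred order" rule described above.

```python
def ilom_dns_commands(nameservers, searchpath):
    """Build the ILOM 'set' lines for the DNS name servers and search path."""
    if not 1 <= len(nameservers) <= 3:
        raise ValueError("ILOM accepts between one and three name servers")
    return [
        # No spaces after the commas; order is the preferred search order
        "set /SP/clients/dns nameserver=" + ",".join(nameservers),
        "set /SP/clients/dns searchpath=" + searchpath,
    ]
```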
Verify the DNS settings:
-> show /SP/clients/dns
/SP/clients/dns
Targets:
Properties:
auto_dns = enabled
nameserver = 10.196.23.245, 138.2.202.15
retries = 1
searchpath = us.oracle.com
timeout = 5
<SNIP>
If any of the “ip<parameter>” values are wrong, correct them repeating the above
“pendingip<parameter>” settings followed by commitpending=true.
Set the Timezone. Setting the timezone first ensures the offset from UTC is maintained
correctly as normally the hardware clock is kept in UTC. If the date is not close to current
time, then set it after setting the timezone:
-> show /SP/clock (Verify the current setting)
-> set /SP/clock timezone=<zone identifier>
-> show /SP/clock (Verify the new setting displays correctly)
where <zone identifier> is the identifier of the proper timezone file, such as “US/Eastern”
or “America/New_York”. This should be provided by the Customer on the configuration
worksheet.
Time zone data provided with the Oracle Big Data Appliance and Oracle Enterprise Linux
comes from the zoneinfo database. For a reference list of the latest time zone values, refer to
the zoneinfo database available in the directory /usr/share/zoneinfo on one of the Linux-based
server nodes, or in the public domain via http://www.iana.org/time-zones
The timezone files supplied on the IB switch may not be as recent as those on the
above site.
Set the SP clock manually to something near current time, if not already.
-> show /SP/clock (Verify the current setting)
-> set /SP/clock datetime=MMddHHmmCCyy
-> show /SP/clock (Verify the new setting displays correctly)
using the format MMddHHmmCCyy (Month, Day, Hour, Minute, Century, Year).
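To avoid mistyping the datetime string, it can be generated from a known time. A minimal Python sketch (the function name is hypothetical); the four-digit year produced by `%Y` covers both the century and year fields:

```python
from datetime import datetime


def ilom_datetime(moment: datetime) -> str:
    """Format a datetime as MMddHHmmCCyy for 'set /SP/clock datetime=...'."""
    return moment.strftime("%m%d%H%M%Y")
```

For example, 30 January 2012 at 11:53 becomes `013011532012`.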
Configure the NTP settings. NTP is critical to the operation of the BDA applications:
-> set /SP/clients/ntp/server/number address=IP_address
Where number can be 1 or 2 depending on how many and which NTP server you are
configuring and IP_address is the address of that server. Use “1” for the primary NTP
server and repeat the command using “2” for the secondary.
-> set /SP/clock usentpserver=enabled
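The NTP commands for one or two servers can be generated as in this Python sketch. The function name is hypothetical; the ILOM targets are those shown above.

```python
def ilom_ntp_commands(servers):
    """Build the ILOM 'set' lines for a primary and optional secondary NTP server."""
    if not 1 <= len(servers) <= 2:
        raise ValueError("configure one primary and at most one secondary NTP server")
    commands = [
        f"set /SP/clients/ntp/server/{number} address={address}"
        for number, address in enumerate(servers, start=1)
    ]
    # Enable NTP only after the server addresses have been set
    commands.append("set /SP/clock usentpserver=enabled")
    return commands
```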
If the customer does not use NTP on their network, then the first two BDA server nodes
should be configured as NTP servers before the deployment scripts are run. Without properly
synchronized clocks, the deployment scripts for Hadoop will fail. For
specific instructions on how to configure an NTP server on Linux, refer to MOS Document ID
1554253.1.
VERIFY THE SETTINGS
-> show /SP/clients/ntp/server/1
/SP/clients/ntp/server/1
Targets:
Properties:
address = 10.204.74.2
Commands:
cd
set
show
-> show /SP/clients/ntp/server/2
/SP/clients/ntp/server/2
Targets:
Properties:
address = 10.196.16.1
Commands:
cd
set
show
-> show /SP/clock
/SP/clock
Targets:
Properties:
datetime = Mon Jan 30 11:53:19 2012
timezone = EST (America/New_York)
usentpserver = enabled
Commands:
cd
set
show
Verify that the Rack Master Serial Number is populated on the InfiniBand Gateway Switch
ILOM:
-> show /SP system_identifier
/SP
Properties:
system_identifier = Oracle Big Data Appliance X7-2 AK012345678
Enter the Fabric Management shell:
-> start /SYS/Fabric_Mgmt
Are you sure you want to start /SYS/Fabric_Mgmt (y/n)? y
FabMan@bda1sw-ib2->
Check the overall health of the switch:
FabMan@bda1sw-ib2-> showunhealthy
OK - No unhealthy sensors
Comment: If there are issues discovered at this point, they must be corrected.
Use the setsmpriority list command to determine the current priority setting:
FabMan@bda1sw-ib2->setsmpriority list
Current SM settings:
smpriority 5
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
M_Key None
Routing engine FatTree
FabMan@bda1sw-ib2->
The switch LEDs will flash while rebooting. It will take ~5 minutes, with no serial output,
until the switch is completely booted and gives the login prompt again.
Disconnect the laptop's serial cable from the IB switch's USB to DB9 adapter port, leaving
the USB to DB9 adapter wired in the rack.
FabMan@bda1sw-ib1->
Use the setsmpriority list command to determine the current settings for priority and
controlled handover. For the Spine switch they should be as follows:
FabMan@bda1sw-ib1-> setsmpriority list
Current SM settings:
smpriority 8
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
M_Key None
If one of the settings needs to be changed, perform the procedure in the following steps:
• disablesm
• setsmpriority and/or setcontrolledhandover
• enablesm
The detailed procedure is shown in the following tasks / rows.
Use the disablesm command to stop the Subnet Manager:
FabMan@bda1sw-ib1-> disablesm
Stopping IB Subnet Manager.. [ OK ]
The smnodes list needs to contain the IP addresses of all switches which have Subnet
Manager enabled so that partition configuration can be synchronized across all these
switches.
Check if the smnodes list exists, and if it does not have the IP addresses of all the switches
listed, then add or delete them as needed:
FabMan@bda1sw-ib1-> smnodes list
FabMan@bda1sw-ib1-> smnodes add IP_address IP_address ...
or
FabMan@bda1sw-ib1-> smnodes delete IP_address IP_address ...
FabMan@bda1sw-ib1-> smnodes list
Logout from the InfiniBand switch ILOM shell.
Comment: Exit the Fabric Manager shell:
FabMan@bda1sw-ib1->exit
Disconnect the laptop’s serial cable from the
InfiniBand switch’s USB-to-DB9 adapter port
or the laptop’s Ethernet cable from the Cisco
Ethernet switch.
REPEATING ABOVE ACTIONS ON GW LEAF SWITCH WITH PRIORITY 5 2 3
Check if the smnodes list exists, and if it does not have the correct IP addresses of all the
switches listed, then add or delete them as needed:
FabMan@bda1sw-ib1-> smnodes list
FabMan@bda1sw-ib1-> smnodes add IP_address IP_address ...
or
FabMan@bda1sw-ib1-> smnodes delete IP_address IP_address ...
FabMan@bda1sw-ib1-> smnodes list
Logout from the InfiniBand switch ILOM shell.
Comment: Exit the Fabric Manager shell:
FabMan@bda1sw-ib1->exit
Connect to any InfiniBand switch using a serial cable or Ethernet cable to the Cisco
Ethernet switch (preferred).
Login as ilom-admin:
localhost: ilom-admin
password: welcome1
The switch OS is Linux-based but has an ILOM interface that will be used to make the
necessary configuration changes.
Comment: ILOM supports up and down arrows for command-line history and left and right
arrows for command-line editing, which should be used to make these steps easier to
complete. Tab can also be used for command-line completion where possible.
Enter the Fabric Management shell as above:
-> start /SYS/Fabric_Mgmt
Use the getmaster command to check the location of the Master Subnet Manager. The
following example shows that the Master Subnet Manager is currently running on the first
leaf switch:
FabMan@bda1sw-ib1-> getmaster
Local SM enabled and running, state STAND BY
20110207 11:34:04 OpenSM Master on Switch : 0x0021286cccb6a0a0 ports 36 Sun DCS
36 QDR switch bda1sw-ib2 enhanced port 0 lid 1 lmc 0
If the Master Subnet Manager is running on the Spine switch, no further action is required;
go to the next section of this checklist.
If the Master Subnet Manager is not running on the Spine switch, you need to relocate the
Master Subnet Manager as described in the following steps.
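The decision above depends on reading the switch hostname out of the getmaster output, which wraps across lines. The Python sketch below is illustrative only: the function names are hypothetical, the pattern is based solely on the sample output in this checklist, and the spine hostname default follows the ib1 naming scheme mentioned earlier.

```python
import re


def master_switch(getmaster_output):
    """Return the hostname of the switch running the Master Subnet Manager, or None."""
    # Collapse line wraps so the hostname can be matched on a single line
    flat = " ".join(getmaster_output.split())
    match = re.search(
        r"OpenSM Master on Switch : \S+ ports \d+ .*? switch (\S+) enhanced port", flat
    )
    return match.group(1) if match else None


def needs_relocation(getmaster_output, spine="bda1sw-ib1"):
    """True when the master is identified and is not running on the spine switch."""
    master = master_switch(getmaster_output)
    return master is not None and master != spine
```

Applied to the sample output above, the master is reported as `bda1sw-ib2`, so relocation is needed.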
Log in as user ilom-admin on the switch that
is the current Master Subnet Manager and
enter the fabric manager shell.
Use the disablesm command to stop the Subnet Manager. The Master Subnet Manager
will then fail over to another switch.
Wait ten seconds (may be longer for larger multi-rack cablings) for the InfiniBand network
to update and then use the getmaster command to identify the current location of the
Master Subnet Manager. Based on the previous configuration of the priority setting the
Master Subnet Manager should now be running on the Spine switch.
The following example shows that the Master Subnet Manager has been relocated from the
first leaf switch:
FabMan@bda1sw-ib2-> disablesm
Stopping IB Subnet Manager.. [ OK ]
FabMan@bda1sw-ib2-> getmaster
20110207 11:34:04 OpenSM Master on Switch : 0x0021286cccb6a0a0 ports 36 Sun DCS 36
QDR switch bda1sw-ib1 enhanced port 0 lid 1 lmc 0
Use the enablesm command to re-enable the Subnet Manager on any switches where the
Subnet Manager has been disabled during this procedure:
FabMan@bda1sw-ib2-> enablesm
Starting IB Subnet Manager. [ OK ]
Starting partitiond daemon. [ OK ]
FabMan@bda1sw-ib2->
If it reports No BXM system name set... (as shown above) or the system name is set to
0, then it MUST be set manually to a value between 0 and 63. Even numbered values are
preferred.
If GW1 is not 10, then set it to 10 following the factory scheme:
FabMan@bda1sw-ib2->setgwinstance 10
Stopping Bridge Manager..-. [ OK ]
Starting Bridge Manager. [ OK ]
FabMan@bda1sw-ib2->setgwinstance --list
BXM system name set to 10
Now repeat the above for GW2 using 20 as the value (in place of 10).
Each server will boot itself up through BIOS and boot the OS with the default factory IP
configuration.
The servers may take 5 – 10 minutes to boot through the normal BIOS POST tests.
From a laptop connected to the Cisco switch on 192.168.1.x, SSH to the BDA ILOM at
192.168.1.101.
It is recommended NOT to use port 48 on the Cisco for a laptop; use any other free port or,
temporarily, one of the PDU ports.
Then connect to the console and log in to the host. You may have to press Enter to wake up
the system first.
Example:
-> start /SP/console
<press RETURN key>
bda01 login:
User: root
Password: welcome1
The system checks are done using the dcli command to run across all the nodes in the rack
at the same time, using SSH authorization keys. The dcli command defaults to using the
known BDA configuration JSON files hence there is no need to specify the -g option to
dcli.
For Elastic configurations you may need to pass the number of nodes in the rack (between 7
and 18) with the '-j' option for each dcli command, as follows:
[root@bda01 bda]# dcli -j "eth0_ips[1:9]" "hostname ; date"
where n in eth0_ips[1:n] is the total number of nodes in the rack (9 in this example). If the
-j option is omitted, the command is delayed while SSH times out on the non-existent
hosts.
SSH keys should have already been distributed across the rack during the factory
configuration. Verify SSH keys:
[root@bda01 ~]# dcli "hostname ; date"
If this asks you for a password then enter Control-C (several times) and continue,
otherwise go to the next step.
To generate the root SSH keys and push them across the rack with the default 'welcome1'
password, use the provided script:
[root@bda01 ~]# setup-root-ssh -p welcome1
Add the following option to the above setup-root-ssh command according to the
configuration:
Starter rack: -j "eth0_ips[1:6]"
Full rack: -j "eth0_ips[1:18]"
Elastic config: -j "eth0_ips[1:x]" where x is between 7 and 18.
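The -j argument can be built from the node count rather than typed by hand. This is a minimal sketch: the function name eth0_range is hypothetical, and the 6 to 18 bounds are taken only from the configurations listed above (Starter rack 6, Full rack 18, Elastic 7 to 18).

```shell
# Hypothetical helper: build the "-j" argument for dcli / setup-root-ssh
# for a rack with N nodes. Bounds 6..18 come from the list above.
eth0_range() {
    n=$1
    case $n in
        ''|*[!0-9]*) echo "node count must be a number" >&2; return 1 ;;
    esac
    if [ "$n" -lt 6 ] || [ "$n" -gt 18 ]; then
        echo "node count must be between 6 and 18" >&2
        return 1
    fi
    printf 'eth0_ips[1:%s]\n' "$n"
}

# Example: print the command line for a 9-node Elastic configuration.
echo "dcli -j \"$(eth0_range 9)\" \"hostname ; date\""
```

The example only prints the command line, so it can be reviewed before it is run on the first server.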
Re-verify by re-running:
[root@bda01 ~]# dcli "hostname ; date"
Verify the System Serial Numbers (check against front of systems S/N sticker) are correct
for each node assignment, where 1 is the lowest system, and 18 is the highest system.
[root@bda01 ~]# dcli "dmidecode -s chassis-serial-number"
192.168.10.1: # SMBIOS implementations newer than version 2.8 are not
192.168.10.1: # fully supported by this version of dmidecode.
192.168.10.1: 1733XC2033
...
192.168.10.6: # SMBIOS implementations newer than version 2.8 are not
192.168.10.6: # fully supported by this version of dmidecode.
192.168.10.6: 1733XC2034
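The dmidecode warning lines make the serial numbers harder to compare against the stickers. The sketch below strips them and flags any serial reported by more than one node. The function names node_serials and duplicate_serials are hypothetical, and the filter assumes the "host: value" output format shown above.

```shell
# Hypothetical helper: reduce the dcli/dmidecode output (on stdin) to
# "host serial" pairs, dropping the "#" warning lines from dmidecode.
node_serials() {
    grep ': ' | grep -v ': #' | sed 's/: */ /'
}

# Hypothetical helper: print any serial number that appears on more than
# one node; no output means every node reported a unique serial.
duplicate_serials() {
    node_serials | awk '{ print $2 }' | sort | uniq -d
}

# Usage on the first server (not run here):
#   dcli "dmidecode -s chassis-serial-number" | node_serials
```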
Verify the Rack Master Serial Number is set correctly, check against the rack front S/N
sticker:
[root@bda01 ~]# dcli "ipmitool sunoem cli 'show /SP system_identifier'" | grep =
192.168.10.1: system_identifier = Oracle Big Data Appliance AK00024695
...
192.168.10.18: system_identifier = Oracle Big Data Appliance AK00024695
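Since every node should report the same Rack Master Serial Number, the listing can be collapsed to its distinct values. A minimal sketch, assuming the "system_identifier = ..." line format shown above; the function name rack_identifiers is hypothetical.

```shell
# Hypothetical helper: given the "system_identifier = ..." lines on stdin,
# print each distinct identifier once.
rack_identifiers() {
    sed -n 's/.*system_identifier = //p' | sort -u
}

# Usage on the first server (not run here):
#   dcli "ipmitool sunoem cli 'show /SP system_identifier'" | rack_identifiers
```

A single line of output that matches the rack front sticker indicates all nodes agree; more than one line means at least one ILOM holds a different value.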
If the Rack Master Serial Number is incorrect, insert it into the ILOM on every system
(refer to the IP addresses on page 7):
Enter the following command (on one line, no break):
[root@bda01 ~]# dcli -l root \
"ipmitool sunoem cli 'set /SP system_identifier=\
"\"Oracle Big Data Appliance AK00024695\""'" \
> /tmp/set-rack-csn.out
After powering up each server you should see the following two files in the /root
directory:
BDA_IMAGING_SUCCEEDED
BDA_REBOOT_SUCCEEDED
[root@bda01 ~]# dcli ls -l /root/BDA*
The first one should already be there from the factory, but the second one is generated
after you power up the system at the customer site.
If instead you find the file BDA_IMAGING_FAILED or BDA_REBOOT_FAILED, one or more of
our hardware or software checks failed. The files /root/bda_imaging_status and
/root/bda_reboot_status will give you more detailed information on what checks succeeded
or failed.
The following checks should also help identify any issues that should be rectified before
continuing to the ACS portion.
Gather the hardware profile output from each system into a file for review:
[root@bda01 ~]# dcli bdacheckhw > ~/all-bdahwcheck.out
Check the output file for failures by filtering out the SUCCESS lines, for example with
grep -vi success ~/all-bdahwcheck.out (the same filter that is used for the software profile
later in this checklist). If there are no issues the filter returns nothing; otherwise it prints
the lines that need attention. Optionally, use “less” or “more” to page through the output
file, which verifies that the hardware configuration is supported and that components are in
the correct slots. Any INFO lines can be ignored. Any failures and warnings need to be
investigated and rectified before continuing.
Specific lines that are worthy of grep'ing out for individual component verification are:
[root@bda01 ~]# grep cores ~/all-bdahwcheck.out
should be 96.
[root@bda01 ~]# grep memory ~/all-bdahwcheck.out
should be ~252 (256GB).
[root@bda01 ~]# grep fans ~/all-bdahwcheck.out
should be 4.
[root@bda01 ~]# grep supply ~/all-bdahwcheck.out
should be all OK.
An easy way to verify that the power supplies are all present is to use wc -l to count the
number of output lines: there should be 12 for a Starter, an appropriate multiple of 2 for
elastic configurations and 36 for a Full Rack.
Verify that the disks and volumes are all present:
[root@bda01 ~]# grep disk ~/all-bdahwcheck.out | grep "model\|status" | more
should be LSI with disks 0 to 11 the same model and all “Online, Spun Up No alert”.
If using “wc -l” to count lines, replace “| more” with “| grep Online | wc -l”, which will
count the disks online: there should be 72 for a Starter, an appropriate multiple of 12 for
elastic configurations and 216 for a Full Rack.
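The expected line counts above are simply the number of nodes multiplied by the per-node quantity (2 power supplies, 12 disks). The sketch below wraps that arithmetic around a line count; the function name expect_count is hypothetical, and the patterns to use are the ones given in the grep commands above.

```shell
# Hypothetical helper: compare the number of lines matching PATTERN in a
# saved output file with NODES * PER_NODE.
# Usage: expect_count FILE PATTERN NODES PER_NODE
expect_count() {
    file=$1 pattern=$2 nodes=$3 per_node=$4
    expected=$((nodes * per_node))
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
    actual=$(grep -c -- "$pattern" "$file" || true)
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual lines match '$pattern'"
    else
        echo "MISMATCH: $actual lines match '$pattern', expected $expected"
        return 1
    fi
}

# Usage for the power supplies of a Starter rack (6 nodes, 2 per node), not run here:
#   expect_count ~/all-bdahwcheck.out supply 6 2
```

For the disk count, the file would first need to be reduced with the grep pipeline shown above, since this helper applies only a single pattern.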
Verify that the IB HCA is being seen properly:
[root@bda01 ~]# grep Host ~/all-bdahwcheck.out | grep model
should be:
SUCCESS: Correct Host Channel Adapter model: Mellanox Technologies MT27500 [ConnectX-3]
If Memory DIMM faults are encountered during the installation or upgrade of any
engineered systems, please follow the "BEST PRACTICE" as supplied by the hardware
team – refer to page 2 of this checklist.
Verify that the RAID volumes are all present and in optimal state:
[root@bda01 ~]# dcli MegaCli64 -ldinfo -lall -a0 | grep "Virtual Drive\|State" > ~/all-ldstate.out
[root@bda01 ~]# less ~/all-ldstate.out
There should be 12 virtual drives, numbered 0 to 11, for each server node, all in the Optimal
state. If using “wc -l” to count lines, replace the “less” command with
“grep Optimal ~/all-ldstate.out | wc -l”: there should be 72 for a Starter, an appropriate
multiple of 12 for elastic configurations and 216 for a Full Rack.
Gather the software profile output from each system into a file for review:
[root@bda01 ~]# dcli bdachecksw > ~/all-bdaswcheck.out
The software profile output checks the partition setup and software versions. Check there
are no failures in the software profile output:
[root@bda01 ~]# grep -vi success ~/all-bdaswcheck.out
If there are no issues the above command should return nothing; otherwise it prints the lines
that need attention. Optionally, use less or more to page through the output file, which
verifies the software configuration is supported. If there are any INFO lines, these can be
ignored. If there are any failures then the system OS may need to be re-imaged for
re-partitioning.
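Because INFO lines can be ignored, the review filter can drop them along with the SUCCESS lines. A minimal sketch follows; the function name lines_to_review is hypothetical, and it assumes INFO lines contain the literal text INFO.

```shell
# Hypothetical helper: print the lines of a saved bdachecksw (or bdacheckhw)
# output file that are neither SUCCESS nor INFO nor blank.
# Returns 0 when nothing is left to review, 1 otherwise.
lines_to_review() {
    if grep -vi success "$1" | grep -v INFO | grep -v '^[[:space:]]*$'; then
        return 1
    fi
    return 0
}

# Usage (not run here):
#   lines_to_review ~/all-bdaswcheck.out && echo "no failures found"
```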
Verify the boot order is correct on all of the nodes:
[root@bda01 ~]# dcli efibootmgr
192.168.10.1: BootCurrent: 0000
192.168.10.1: Timeout: 1 seconds
192.168.10.1: BootOrder: 0000,0004,0005,0006,0007
192.168.10.1: Boot0000* Oracle Big Data Appliance 4
192.168.10.1: Boot0004* NET0:PXE IP4 Intel(R) I210 Gigabit Network Connection
192.168.10.1: Boot0005* Oracle Linux
192.168.10.1: Boot0006* Oracle Linux
192.168.10.1: Boot0007* Oracle Linux
[...]
192.168.10.4: BootCurrent: 0000
192.168.10.4: Timeout: 1 seconds
192.168.10.4: BootOrder: 0000,0001,0002,0003,0004
192.168.10.4: Boot0000* Oracle Big Data Appliance 4
192.168.10.4: Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit Network Connection
192.168.10.4: Boot0002* Oracle Linux
192.168.10.4: Boot0003* Oracle Linux
192.168.10.4: Boot0004* Oracle Linux
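With 18 nodes the efibootmgr listing is long, so it can help to reduce it to the first boot entry per node. The sketch below does that; the function name first_boot_entry is hypothetical. It assumes the dcli line format shown above, and that the entry of interest is the first one in BootOrder (in both examples above that is "Oracle Big Data Appliance 4").

```shell
# Hypothetical helper: for each node in "dcli efibootmgr" output (stdin),
# print the label of the first entry in its BootOrder.
first_boot_entry() {
    awk '
        # "host: BootOrder: 0000,0004,..." -> remember the first entry per host
        $2 == "BootOrder:" { split($3, o, ","); first[$1] = o[1]; next }
        # "host: Boot0000* label..." -> print the label if it is that entry
        $2 ~ /^Boot[0-9A-Fa-f]+\*?$/ {
            id = substr($2, 5, 4)
            if (first[$1] == id) {
                label = $0
                sub(/^[^ ]+ [^ ]+ /, "", label)
                print $1, label
            }
        }'
}

# Usage on the first server (not run here):
#   dcli efibootmgr | first_boot_entry
```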
NOTE: If upgrading to a new BDA server base image is required for any reason, then it is
recommended that this be done now, prior to any additional setup of customer network or
connections, although re-imaging should work whether or not the network has been
configured. ACS, EEST, or TSC x86 can provide instructions and the image if necessary.
LINK bdasw-ib2.4A ... bda10.HCA-1.2 UP
LINK bdasw-ib2.8A ... bdasw-ib3.8A UP
LINK bdasw-ib2.8B ... bdasw-ib1.1B UP
[root@bda1 ~]#
For more extensive checking of the fabric, review the output from "iblinkinfo".
IMAGE_VERSION : 4.10.0
IMAGE_CREATION_DATE : Fri Sep 29 04:58:55 UTC 2017
IMAGE_LABEL : BDA_MAIN_LINUX.X64_170927
LINUX_VERSION : Oracle Linux Server release 6.9
KERNEL_VERSION : 4.1.12-94.5.9.el6uek.x86_64
BDA_RPM_VERSION : bda-4.10.0-1.el6.x86_64
JDK_VERSION : jdk1.8.0_141-1.8.0_141-fcs.x86_64
IMAGE_VERSION : 4.10.1
IMAGE_CREATION_DATE : Tue Oct 24 06:34:17 UTC 2017
IMAGE_LABEL : BDA_4.10.0_LINUX.X64_RELEASE
LINUX_VERSION : Oracle Linux Server release 6.9
KERNEL_VERSION : 4.1.12-94.5.9.el6uek.x86_64
BDA_RPM_VERSION : bda-4.10.1-1.el6.x86_64
JDK_VERSION : jdk1.8.0_141-1.8.0_141-fcs.x86_64
Update the BDA rpm on all nodes and remove the package afterwards:
[root@bda01 ~]# dcli "rpm -Uvh /tmp/bda-4.10.0-3.el6.x86_64.rpm"
[root@bda01 ~]# dcli "rm -f /tmp/bda-4.10.0-3.el6.x86_64.rpm"
INITIAL CONFIGURATION OF THE NETWORK
The HW installation engineer will now perform the initial configuration of the network
based on the customer's configuration, using the network setup JSON files provided by the
ACS engineer via the installation tracking website. Two scripts are used to set up the
network: one to be run before connecting the Cisco and Client 10GbE to the customer's
networks and one to be run after.
Next, the customer-specific JSON files obtained from ACS are copied to BDA Server 1 via
scp or USB thumb drive. Detailed instructions for using the USB drive follow; if using scp
go to the Section COPY CUSTOMER_SPECIFIC NETWORK.JSON FILE on the next page.
COPYING FILE TO BDA SERVER 1 FROM USB MEDIA
Insert the USB drive (see footnote 3) into the first server (bda01) and locate the drive:
# for x in `blkid | cut -d: -f1 | grep -i sd` ; do udevadm info -q property -n $x \
| grep -iq "id_bus=usb" ; if [ $? -eq 0 ] ; then echo $x ; fi ; done
/dev/sdb1
#
From the above output, one can see that the USB primary partition to mount is /dev/sdb1.
Alternative method for locating the USB partition to mount:
1. Locate the drive from the kernel messages log (some lines below have wrapped):
[root@node8 oracle.SupportTools]# tail -20 /var/log/messages
Apr 15 16:15:48 node8 lsidiagd-monitor: lsidiagd is alive
Apr 15 20:26:06 node8 kernel: usb 2-1.4: new high speed USB device number 3 using ehci_hcd
Apr 15 20:26:06 node8 kernel: usb 2-1.4: New USB device found, idVendor=0781, idProduct=5530
Apr 15 20:26:06 node8 kernel: usb 2-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
From the above one can see that the new device is sdb.
2. Output from fdisk for a FAT32-formatted USB stick:
[root@node8 oracle.SupportTools]# fdisk -l /dev/sdb
Disk /dev/sdb: 8036 MB, 8036285952 bytes
255 heads, 63 sectors/track, 977 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
From the above output one can see that the USB primary partition to mount is sdb1.
Footnote 3: Confirm that the file system type on the USB stick is FAT32 in order to prevent
Linux mounting problems during the installation.
Create a directory for mounting the USB drive on the BDA server using the following
command:
# mkdir /mnt/usb
Mount the device. Use the device name given in the first part of this subsection. Example:
# mount -t vfat /dev/sdb1 /mnt/usb
If the above fails due to ssh keys, then follow the instruction given to run
“/opt/oracle/bda/bin/remove-root-ssh” first, and if necessary also “rm
/root/.ssh/authorized_keys ” then re-run “rack-networksetup”.
Since the network setup script also changes the ILOM network address, at the end of the
script the connection may appear to be hung. In fact it has simply changed networks and is
no longer accessible.
After completing the above step, the systems will have IP addresses suitable for the
Customer's network. The network on the node you are connected to will also be restarted.
The installer must now take steps so that their laptop is on the customer’s network range:
• Disconnect PDU A’s Ethernet cable.
• Change the installer laptop to PDU A’s IP and netmask (NO GATEWAY)
Now the “Reconnect” step in the next task should work.
Reconnect to the host’s ILOM to connect to its console:
• Login via SSH to the server's new IP for ILOM.
Example:
# ssh root@10.100.50.101
Password: welcome1
-> start -f /HOST/console
Connect the Cisco switch port 48 to the Customer's management network – the Customer's
network administrator may wish to perform this step. There is already a blue network
cable attached to this port, coiled and tied off within the cabinet. It may have been used for
earlier configuration work. It may be used for this connection, or replaced with a
customer-supplied cable if it is not of sufficient length or is the wrong colour.
The Cisco switch should not be connected until the running configuration has been verified
and any necessary changes have been made by the customer's network administrator.
After connection ensure that you can get to the management addresses from outside the
switch.
If the customer wishes to use the SFP+ ports for a fibre uplink in port 48, then the interface
setting for port 48 needs to be changed as follows:
bda1sw-ip#configure terminal
CONNECTING TO CUSTOMER'S CLIENT DATA NETWORK
Oracle recommends the following guidelines for connecting 10GbE connections to the IB
Gateway switches on the BDA:
• The same number of 10GbE connections should be made to both IB Gateway switches.
• If connecting between 1 and 4 10GbE connections to each switch, use a single QSFP
splitter cable to the 0A-ETH port of both switches.
• If connecting between 5 and 8 10GbE connections to each switch, use 2 QSFP splitter
cables to both 0A-ETH and 1A-ETH ports of both switches. In this case divide
connections as evenly as possible between the 2 splitter cables.
• When connecting multiple 10GbE connections to a single QSFP splitter cable, make
connections starting with the lowest numbered port and counting upwards e.g. for 2
connections use 0A-ETH1 and 0A-ETH2.
• 10GbE connections should be made to exactly the same ports on both IB Gateway
switches. If connections are made to 0A-ETH1 and 0A-ETH2 on one switch,
connections should only be made to 0A-ETH1 and 0A-ETH2 on the other switch.
Connect 10Gb Ethernet cables between the BDA InfiniBand gateway switch ports and the
customer's network.
In the simplest configuration there will be 1 cable between the Customer's network and each
InfiniBand gateway switch. If additional network connections are used, the number of
cables will multiply accordingly.
Once the cables are routed, the Customer's network administrator may need to perform
some network switch end configuration to bring the links up.
Verify the 10GbE gateway switch links are up on both IB Gateway leaf switches. The
minimum supported configuration is 1 10GbE link on each IB Gateway leaf switch.
Connect to the IB gateway switch via SSH, and login as ilom-admin user. Then enter the
Fabric Management shell:
-> start /SYS/Fabric_Mgmt
Are you sure you want to start /SYS/Fabric_Mgmt (y/n)? y
FabMan@bda1sw-ib2->
Following Task to be Performed on both IB Gateway Switches
(Leaf 1 & 2 with hostnames ib2 & ib3 respectively)
Run the following, and check the Bridge entries have active links up for all ports that are
connected from the IB Gateway leaf switches to the customer's network switch. In this
example, 4 ports are connected on this switch:
FabMan@bda1sw-ib2-> listlinkup
…
Connector 0A-ETH Present
Bridge-0 Port 0A-ETH-1 (Bridge-0-2) up (Enabled)
Bridge-0 Port 0A-ETH-2 (Bridge-0-2) up (Enabled)
Bridge-0 Port 0A-ETH-3 (Bridge-0-1) up (Enabled)
Bridge-0 Port 0A-ETH-4 (Bridge-0-1) down (Enabled)
Connector 1A-ETH Present
Bridge-1 Port 1A-ETH-1 (Bridge-1-2) up (Enabled)
Bridge-1 Port 1A-ETH-2 (Bridge-1-2) up (Enabled)
Bridge-1 Port 1A-ETH-3 (Bridge-1-1) up (Enabled)
Bridge-1 Port 1A-ETH-4 (Bridge-1-1) down (Enabled)
…
Note: If the customer is using an Oracle Sun Network 10GbE Switch 72p, then the links
may not come up until the appropriate ports are enabled. Ensure the EIS checklist for that
switch has been completed as well.
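To spot ports that are cabled but not up, the listlinkup output can be filtered for Bridge ports in the down state. A minimal sketch, assuming the "Bridge-N Port NAME (...) state (Enabled)" layout shown above; the function name down_ports is hypothetical.

```shell
# Hypothetical helper: print the Bridge ports reported as "down" in
# listlinkup output supplied on stdin.
down_ports() {
    awk '$1 ~ /^Bridge-[0-9]+$/ && $2 == "Port" && $5 == "down" { print $3 }'
}

# Example with lines taken from the output above:
printf '%s\n' \
    'Bridge-0 Port 0A-ETH-3 (Bridge-0-1) up (Enabled)' \
    'Bridge-0 Port 0A-ETH-4 (Bridge-0-1) down (Enabled)' | down_ports
```

Any port listed here that is supposed to be connected to the customer's network switch needs to be checked with the customer's network administrator before continuing.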
Now verify that the DNS and NTP servers can be pinged from an IBGW switch. Log into
IBGW1 (ib2) as root and ping each of the customer specified 1G router, DNS and NTP
servers. If DNS can’t be contacted, cluster-networksetup (below) will likely fail.
If the router or DNS/NTP servers cannot be pinged, this should be resolved before moving
forward to cluster-networksetup.
Use the 2nd script cluster-networksetup to push out and create the VNICs and
10GbE bond interfaces on the servers:
[root@bda1c01 network]# ./cluster-networksetup
Make sure you capture the output to a file. The output file should be provided to your
Install Co-ordinator to upload to the install tracker so the ACS engineer can review this
prior to coming on-site to do the SW install.
Now reboot all the nodes and the InfiniBand Gateway switches.
Verify the VNICs are created properly on each switch as follows. Connect to the IB
gateway switch via ssh and login as ilom-admin user.
Then enter the Fabric Management shell:
-> start /SYS/Fabric_Mgmt
Are you sure you want to start /SYS/Fabric_Mgmt (y/n)? y
FabMan@bda1sw-ib2->
Each active port should be assigned the default VLAN (vlan id=0):
FabMan@hostname-> showvlan
Connector/LAG VLN PKEY
------------- --- ----
0A-ETH-1 0 ffff
0A-ETH-3 0 ffff
1A-ETH-1 0 ffff
1A-ETH-3 0 ffff
If the interfaces are configured with Link Aggregation (LAG), you will see this instead of
the connector port/link name:
FabMan@bda1sw-ib2->showvlan
Connector/LAG VLN PKEY
------------- --- ------
LAG-01 0 0xffff
Finally you should see VNICs created round-robin on each server and 10GbE interface
(output cut for brevity):
FabMan@hostname-> showvnics
ID  STATE    FLG IOA_GUID                NODE                             IID  MAC               VLN PKEY GW
--- -------- --- ----------------------- -------------------------------- ---- ----------------- --- ---- --------
561 UP       N   0021280001CF4C23        bda1node12 BDA 192.168.41.31     0000 CE:4C:23:85:2B:0A NO  ffff 0A-ETH-1
...<SNIP>
Also from the BDA nodes, make sure you can ping out the 10Gb interfaces and verify the
customer can ping into the BDA nodes 10Gb interfaces from the customer's data network.
NTP VERIFICATION ON THE INFINIBAND SWITCHES
Login as root to the IB switches and verify that the NTP service is running properly.
Note: The first example is of the internal Cisco NTP server. The second example is a
failure because NTP was not enabled on the NTP server. Note the transmit and receive. If
it is not responsive you will only see transmits with no receives.
First example:
[root@edx1sw-iba ~]# ntpdate -ud 129.148.9.196
6 Dec 17:49:40 ntpdate[30753]: ntpdate 4.2.6p2@1.2194-o Fri Jun 9 11:44:57 UTC 2017 (1)
Looking for host 129.148.9.196 and service ntp
host found : 129.148.9.196
transmit(129.148.9.196)
receive(129.148.9.196)
transmit(129.148.9.196)
receive(129.148.9.196)
transmit(129.148.9.196)
receive(129.148.9.196)
transmit(129.148.9.196)
receive(129.148.9.196)
transmit(129.148.9.196)
server 129.148.9.196, port 123
stratum 8, precision -19, leap 00, trust 000
refid [129.148.9.196], delay 0.02647, dispersion 0.00000
transmitted 4, in filter 4
reference time: ddd2ac39.95a4fe78 Wed, Dec 6 2017 17:49:45.584
originate timestamp: ddd2ac44.444d1403 Wed, Dec 6 2017 17:49:56.266
transmit timestamp: ddd2ac44.442bb16f Wed, Dec 6 2017 17:49:56.266
filter delay: 0.02657 0.02655 0.02647 0.02654
0.00000 0.00000 0.00000 0.00000
filter offset: 0.000021 0.000036 0.000021 0.000026
0.000000 0.000000 0.000000 0.000000
delay 0.02647, dispersion 0.00000
offset 0.000021
6 Dec 17:49:58 ntpdate[30753]: adjust time server 129.148.9.196 offset 0.000021 sec
Second Example:
[root@hyw1r002sw-iba01 ~]# ntpdate -ud 129.148.9.196
6 Dec 17:54:45 ntpdate[32489]: ntpdate 4.2.6p2@1.2194-o Fri Jun 9 11:44:57 UTC 2017 (1)
Looking for host 129.148.9.196 and service ntp
host found : 129.148.9.196
transmit(129.148.9.196)
transmit(129.148.9.196)
transmit(129.148.9.196)
transmit(129.148.9.196)
transmit(129.148.9.196)
129.148.9.196: Server dropped: no data
server 129.148.9.196, port 123
stratum 0, precision 0, leap 00, trust 000
refid [129.148.9.196], delay 0.00000, dispersion 64.00000
transmitted 4, in filter 4
reference time: 00000000.00000000 Thu, Feb 7 2036 6:28:16.000
originate timestamp: 00000000.00000000 Thu, Feb 7 2036 6:28:16.000
transmit timestamp: ddd2ad75.b4f468c3 Wed, Dec 6 2017 17:55:01.706
filter delay: 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000
filter offset: 0.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000
delay 0.00000, dispersion 64.00000
offset 0.000000
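The difference between the two examples is whether any receive lines appear. That check can be scripted as below; this is a sketch only, the function name ntp_responded is hypothetical, and it assumes the receive(...) lines start at the beginning of a line as in the first example.

```shell
# Hypothetical helper: succeed if "ntpdate -ud" debug output (on stdin)
# contains at least one receive(...) line, i.e. the server answered.
ntp_responded() {
    grep -q '^receive('
}

# Usage on a switch (not run here; substitute the customer's NTP server):
#   ntpdate -ud <ntp-server> 2>&1 | ntp_responded && echo "NTP server answered"
```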
If this asks you for a password then enter Control-C (several times) and continue,
otherwise go to the next step.
To generate the root SSH keys and push them across the rack with the default 'welcome1'
password, use the provided script:
[root@bda1c01 bda]# setup-root-ssh -p welcome1
Re-verify by re-running:
[root@bda1c01 bda]# dcli "hostname ; date"
Note that there is some uncertainty as to whether the above is entirely correct. Feedback
via the EIS Support alias would be appreciated.
Using the date output from above, check all systems and switches are clock
synchronized and also seeing the NTP server correctly.
If systems are not within a difference of a few seconds, then the installation scripts will fail.
Manually correct any time difference that is too large for NTP to correct automatically,
setting the clock close enough that NTP can correct it.
If not, reboot each box or switch and monitor the console boot messages, or check the
routing through the Cisco switch to the NTP server.
Also ntpq -p 127.0.0.1 can be used to verify the NTP client is running and configuration is
correct.
Use the bdachecknet-cluster script to verify all network connectivity and expected
network services are working properly.
Pipe the output to tee in order to log the output to a file. The output file should be
provided to your Install Co-ordinator to upload to the install tracker so the ACS engineer
can review this prior to coming on-site to do the SW install.
[root@bda1c01 network]# bdachecknet-cluster | tee -a /tmp/bdachecknet_cluster.out
bdachecknet-cluster: do basic sanity checks on /opt/oracle/bda/rack-network.json
and /opt/oracle/bda/cluster-network.json
bdachecknet-cluster: passed
bdachecknet-cluster: checking for rack-expansion.json
bdachecknet-cluster: ping test private infiniband ips (bondib0 40gbs)
bdachecknet-cluster: passed
bdachecknet-cluster: ping test admin ips (eth0 1gbs)
bdachecknet-cluster: passed
bdachecknet-cluster: test client network (eoib) resolve and reverse resolve
bdachecknet-cluster: passed
bdachecknet-cluster: test client name array matches ip array
bdachecknet-cluster: passed
bdachecknet-cluster: ping servers on client network by ip
bdachecknet-cluster: passed
bdachecknet-cluster: test ntp servers
bdachecknet-cluster: passed
bdachecknet-cluster: ping client gateway
bdachecknet-cluster: passed
bdachecknet-cluster: test arp -a
bdachecknet-cluster: passed
bdachecknet-cluster: test vnics for this node
host if status actv primary switch gw port ping gw vlan
======================= === ====== ==== ===== ==================== ========= ======= ====
bdax72bur09node01 eth8 up no no bdax72bur09sw-ib2 0A-ETH-1 no N/A
bdax72bur09node01 eth9 up yes yes bdax72bur09sw-ib3 0A-ETH-1 yes N/A
1Ping gtw error on host bdax72bur09node01, interface eth8, switch bdax72bur09sw-ib2, port
0A-ETH-1
bdachecknet-cluster: network checks failed
[root@bdacl01 bin]#
If systems are not able to resolve and use DNS names, then the installation scripts will fail.
Manually correct any /etc/resolv.conf files and verify the reverse lookup as well.
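Because the bdachecknet-cluster output is long, the saved log can be scanned for the failure markers seen in the example above before it is handed on. This is a minimal sketch: the function name netcheck_failures is hypothetical, and it looks only for the two strings that appear in the example output ("network checks failed" and "Ping gtw error"), so other failure messages would not be caught.

```shell
# Hypothetical helper: print the lines of a saved bdachecknet-cluster log
# that contain the failure markers seen in the example output above.
# Returns 0 if any such line is found, non-zero otherwise.
netcheck_failures() {
    grep -E 'network checks failed|Ping gtw error' "$1"
}

# Usage (not run here):
#   netcheck_failures /tmp/bdachecknet_cluster.out || echo "no known failure markers"
```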
Verify the PDU metering units are accessible from a laptop or system connected to the
Customer's management network.
Use a web browser to log on to the PDU metering unit.
Enter the metering unit’s static IP address or hostname into the browser’s address line. If the
network configuration was successful, the browser displays the Current Measurement page.
If the PDUs are not accessible then the installation scripts will fail.
Set various values:
-> set destination=<ASR_mgr_IP> <<< IP address of ASR Manager.
-> set destination_port=162
-> set level=minor
-> set snmp_version=2c
-> set community_or_username=public
-> set type=snmptrap
/SP/alertmgmt/rules/1
Targets:
Properties:
community_or_username = public
destination = <ASR_mgr_IP> <<< IP address of ASR Manager.
destination_port = 162
email_custom_sender = (none)
email_message_prefix = (none)
event_class_filter = (none)
event_type_filter = (none)
level = minor
snmp_version = 2c
testrule = (Cannot show property)
type = snmptrap
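After setting the values, each property in the rule listing can be compared with what was entered. The sketch below extracts a single property from a pasted copy of the listing; the function name rule_prop is hypothetical, and it assumes the "name = value" layout shown above.

```shell
# Hypothetical helper: print the value of one property from the alert rule
# listing (as shown above) supplied on stdin.
rule_prop() {
    awk -v key="$1" '$1 == key && $2 == "=" { print $3 }'
}

# Example with lines taken from the listing above:
printf '%s\n' 'destination_port = 162' 'level = minor' 'snmp_version = 2c' | rule_prop level
```

The example prints minor, which should match the value given to "set level=" in the steps above.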
Disconnect the laptop's serial cable from the IB switch's USB to DB9 adapter port, leaving
the USB to DB9 adapter wired in the rack & move to next IB switch.
ACTIVATE THE ASR ASSET
Log into the ASR Manager system as user root (via ssh or rlogin).
Activate the ASR Asset for the switch’s SP:
# asr activate_asset -i <IP-of-SW>
During the ASR registration the technical contact for the system(s) will receive an email
with report results.
Confirm that the switch has been activated via asr list_asset. If you omit the -i option you
will see all assets – this may be a very long list.
# asr list_asset -i <IP-of-SW>
IP_ADDRESS HOST_NAME SERIAL_NUMBER ASR PROTOCOL SOURCE PRODUCT_NAME
------------- --------- ------------- ------- -------- ------ -------------
10.172.144.76 MySwitch 1013AK208D Enabled SNMP ILOM Sun Network QDR InfiniBand GW Switch
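The activation state can also be checked from a saved copy of the list_asset output. A minimal sketch, assuming the column order shown above (IP address first, ASR state fourth); the function name asset_enabled is hypothetical.

```shell
# Hypothetical helper: succeed if the asset with the given IP address is
# listed as "Enabled" in "asr list_asset" output supplied on stdin.
# Assumption: IP_ADDRESS is column 1 and ASR is column 4, as shown above.
asset_enabled() {
    awk -v ip="$1" '$1 == ip && $4 == "Enabled" { found = 1 } END { exit !found }'
}

# Usage on the ASR Manager (not run here):
#   asr list_asset -i <IP-of-SW> | asset_enabled <IP-of-SW> && echo "activated"
```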
HANDOVER
The installer should ensure that their laptop’s Ethernet cable is disconnected and that the
PDU A Ethernet cable is plugged back in.
Hand over to ORACLE ACS Software Engineer (ASE).
• Ensure the output files /tmp/bdachecknet_*.out from the bdachecknet-rack,
bdachecknet-cluster and cluster-networksetup scripts are given to the Install
Co-ordinator to upload to the install tracker for use by ACS.
• If you are installing a Multi-rack at the same time as a BDA rack continue now with
the EIS installation checklist for each additional rack.
• Please note that multi-rack cabling can only be performed after the network
configuration of all racks has been completed, among other prerequisites; refer to the
Multi-Rack Cabling EIS Checklist for details.
Copies of the checklists are available on the EIS web pages or on the EIS-DVD. We recommend that you always check
the web pages for the latest version.