Opc - TSG Oc3 - 12 - 48


FIBER

WORLD

OPC Troubleshooting Guide for OC-3/OC-12 TBM Rel 13 OC-48 Rel 14.10
Editor: Ross Brydon Date: Aug. 11, 1998 Issue: AD05

Security Notice: The information disclosed herein is property of Nortel or others and is not to be used by or disclosed to unauthorized persons without the written consent of Nortel. The recipient of this document shall respect the security status of the information.

For Nortel Internal Use Only

Issue / Change History


Issue AA01 Date July 18, 1997 Reason Created the PLS book version of the internal trouble shooting guide opctsg_i.aa01 Added problem text to SRT section Updated SRT section. Added information to the CUA section. Added information to the LMR section. Editing changes. Added the stpprov to Chapter 16 Included TL1 and rmopcld sections. Closed all final actions and incorporated the final comments. Update to the section Atools and 06brm Included Chapter 18 and some other final changes to the document Approved version of this document. Add CUA changes Add connection services updates Change wpit.24 Author Wayne Pitman

AB01

July 25, 1997

heald.1

Alex Balaban

AB02 AB03

July 29, 1997 August 5, 1997

heald.3 heald.5

Vladimir Milutinovic Greg Lamarre

AB04

August 7, 1997

olga.8

Olga BeskidWojcicka Colette Heald Chris Todd

AB05 AB06

August 11, 1997 Sept. 12, 1997

heald.6 CJT.21

AB07

Sept. 17, 1997

esami.23

Eric Sami

AB08

Sept. 28, 1997

esami.24

Eric Sami

AB09

Nov. 11, 1997

rbrydon.1

Ross Brydon

AB10

Nov. 25, 1997

rbrydon.2

Ross Brydon

AC01 AC02

May 26, 1998 June 22, 1998

CMRAGHU.8 BROOMH.1

Raghunath Mohanrao Hugh Broomfield

OPC Trouble Shooting Guide (OC-48 Rel. 14.1, OC-3/OC-12 Rel. 13) Issue: AD04

Date: Aug. 11, 1998 Editor: Ross Brydon


Issue | Date | Reason | Change | Author
AC03 | July 14, 1998 | Add ESWD to TL1 section | VEENA.7 | Madhuri Veena
AC04 | July 14, 1998 | Add ESWD to TL1 section | VEENA.8 | Madhuri Veena
AC05 | July 14, 1998 | Add ESWD to TL1 section | VEENA.9 | Madhuri Veena
AC06 | July 20, 1998 | Add TL1 over 7 layer problems | SHAKTI.1 | Shakti Thakur
AC07 | July 21, 1998 | Updating RHE document from changes in external review. | VEENA.10 | Madhuri Veena
AC08 | July 23, 1998 | Final formatting revisions for Rel 14.10 OPC TSG. | RBRYDON.4 | Ross Brydon
AD01 | July 28, 1998 | New stream for OC12 Rel 13 Troubleshooting Guide. | RBRYDON.5 | Ross Brydon
AD02 | August 6, 1998 | OPC Support for InService Timeslot Rollover on OC12 | GREGG.95 | Greg Gnaedinger
AD03 | August 6, 1998 | OC12 Rel 13 matched node provisioning. | CXINP.13 | Cindy Pu
AD04 | August 11, 1998 | Final formatting revisions for OC12 Rel 13 TSG and include PSU section from Jim Forbes. | RBRYDON.6 | Ross Brydon
AD05 | August 28, 1998 | Multiple updates | WPIT.28 | Wayne Pitman

TSG Versions and the corresponding stream and OPC release.


Stream | OC-48 Release | OC-12 Release | TSG Version
OPC 15, 16 | Rel 10 | - | 10.0
OPC 17, 18, 20, 22, 23 | Rel 11 | - | 11.0
OPC 24 | Rel 13 | - | 13.0
OPC30-36 | Rel 14 | - | 14.0
OPC38 | Rel 14.10 | - | 14.10
OPC45 | - | Rel 13.00 | 14.10 / 13.00

Document Editor: Ross Brydon 1B43

Document Location: This document is stored in pls fwpdoc under the document name opctsg_i. Uncontrolled copies, and copies of this document for previous releases, can be found on local file servers under /opcserv/common/operations/dept_docs/OPC_tsg.

Document Approval: This is an internal document updated by the designers, which follows the informal process; therefore no approval is necessary. The updated sections are cua, tl1, stp, and Atools.

Purpose of Document
The OPC Troubleshooting Guide is intended to assist with solving common technical problems that may arise during the operation of the OPC. It also lists potential defect PRSs which are pertinent to the particular release. Each chapter of the document is written by a relevant OPC designer and deals with specific aspects of OPC operation. Each chapter is updated, if necessary, with each OPC software release. THIS DOCUMENT IS NOT INTENDED FOR CUSTOMER USE.

Summary
The OPC Troubleshooting Guide represents the amalgamation of common problems and work-arounds encountered through the course of testing and working with OPC36, OPC38, and OPC45. This guide is not meant to be an exhaustive representation of all problems, only of those which have occurred most frequently during OPC operation and use. All problems outlined in this document are accompanied by one or more fault reasons and solutions. Generally the solutions are embedded in the reason text, but if a common work-around is available then a reference to that work-around is used instead. Throughout the document, some solutions say "Contact the appropriate OPC support authority"; in these cases gather as much data as possible and contact the OPC Customer Support Staff.

Target Audience
The OPC Troubleshooting Guide is meant for general use by the TransportNode Product Support Teams. It is assumed that the reader is familiar with the basic operation of the OPC, the UNIX environment and the OPC designer test tools. THIS DOCUMENT IS NOT INTENDED FOR CUSTOMER USE.

Document Highlights
This document contains:


Standard TOC: lists all problems and solutions, ordered as they are presented in the chapters.
The chapters describe common OPC problems. Each problem is described and possible diagnostic reasons are provided. Available work-arounds are given at the end of each problem reason description.
All Problem and Solution Titles are formatted in the following manner:
<paragraph number> <subject>: <problem or solution text>
where:
paragraph number = unique numerical identifier of all problems and solutions in this document
subject = basic category of the problem or solution; valid subjects are:

BRM Ethernet MIB OSI STP USM

CAM GUI NEA OWS SWI VCP

CNET H/W NUM ROA TAPE VT100

CUA LAPB OAM SCF TBOS X.25

DISK LAS ODS SCM TELNET X.3

DLM MC68302 OPC SRT TL1 XNTP

problem/solution = a brief single sentence description of the problem or solution
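The title format above is regular enough to be split mechanically. The following is a minimal Python sketch, not part of the OPC tool set, that separates a title into its three fields. The names Title, parse_title and has_known_subject are hypothetical, the subject set is copied from the list above, and the sketch assumes a subject never contains a colon.

```python
import re
from typing import NamedTuple, Optional

# Subject codes copied from the list above, stored upper-case so that
# spellings such as "Telnet" or "EtherNet" still match.
SUBJECTS = frozenset({
    "BRM", "CAM", "CNET", "CUA", "DISK", "DLM",
    "ETHERNET", "GUI", "H/W", "LAPB", "LAS", "MC68302",
    "MIB", "NEA", "NUM", "OAM", "ODS", "OPC",
    "OSI", "OWS", "ROA", "SCF", "SCM", "SRT",
    "STP", "SWI", "TAPE", "TBOS", "TELNET", "TL1",
    "USM", "VCP", "VT100", "X.25", "X.3", "XNTP",
})


class Title(NamedTuple):
    number: str   # paragraph number, e.g. "1.1.5"
    subject: str  # category, e.g. "USM"
    text: str     # one-sentence problem or solution description


# <paragraph number> <subject>: <problem or solution text>
# The subject is matched lazily so only the first colon ends it.
_TITLE_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+([^:]+?)\s*:\s*(.+)$")


def parse_title(line: str) -> Optional[Title]:
    """Return the parsed title, or None if the line has no subject field."""
    match = _TITLE_RE.match(line.strip())
    if match is None:
        return None
    return Title(*match.groups())


def has_known_subject(title: Title) -> bool:
    """True when the subject is one of the codes listed in this guide."""
    return title.subject.upper() in SUBJECTS
```

Titles that do not follow the pattern, for example one with no subject and colon, return None rather than raising, so a caller can report them for manual review.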


1 - OPC Login and Start Up


1.1.1 VT100: Login Prompt Missing
1.1.2 VT100: Login Won't Respond
1.1.3 VT100: Removing Port B Doesn't Automatically Logout
1.1.4 VT100: Password Automatically Rejected
1.1.5 USM: Cannot Login
1.1.6 USM: Root Password Not Valid
1.1.7 USM: opcui: command not found
1.1.8 USM: UI is Garbled
1.1.9 USM: UI Won't Start
1.1.10 USM: Critical System Resource Unavailable
1.1.11 USM: Excessively Slow
1.1.12 USM: Keeps Closing
1.1.13 USM: Tools Won't Open
1.1.14 USM: Tools Unavailable
1.1.15 USM: There are no toolsets defined for this user
1.1.16 USM: Failed to retrieve user profile
1.1.17 Telnet: Telnet Connection not running
1.1.18 GUI: Login Window Not Available
1.1.19 GUI: Text doesn't fit in Window
1.2.1 VT100: Reconfiguring Port B to Terminal
1.2.2 VT100: Port B Cable Pinouts
1.2.3 VT100: Port B settings
1.2.4 USM: Using the wall and write commands
1.2.5 GUI: Setting Up An Xterminal for an OPC
1.2.6 EtherNet: OPC EtherNet Connector Pinout

2 - OPC Base Operations


2.1.1 OAM: NE Indicates an OPC OAM S/W Failure
2.1.2 OWS: Both Primary and Backup OPCs are Active
2.1.3 OWS: Primary OPC is Inactive
2.1.4 OWS: Backup OPC Won't Go Active
2.1.5 ODS: Data Synchronization Fails
2.1.6 ODS: Want to Data Sync from Backup to Primary
2.1.7 CAM: Associations are Down or are Unstable
2.1.8 OPC: OPC is Not Communicating
2.1.9 OPC: OPCCLEAN is Not Running
2.1.10 OPC: OPC is Continuously Rebooting
2.1.11 OWS: OWS_SWACT Doesn't Work
2.2.1 OPC: Booting from Tape
2.2.2 OSI: Reconstructing an OPC's Serial Number

3 - OPC Hardware
3.1.1 H/W: ELAN Fail is Lit
3.1.2 H/W: CNET Fail
3.1.3 H/W: Active is not lit
3.1.4 H/W: Unit Fail is lit
3.1.5 TAPE: Amber Light is on
3.1.6 TAPE: Amber Light is Flashing Rapidly
3.1.7 TAPE: Green Light is on
3.1.8 TAPE: Green Light is Flashing Slowly
3.1.9 TAPE: Green Light is Flashing Slowly and the Amber Light is on
3.1.10 TAPE: Green Light is Flashing Rapidly
3.1.11 TAPE: Tape Won't Eject



3.1.12 TAPE: RBNCLEAN is Not Running
3.1.13 TAPE: Tape Drive Cleaning Alarm
3.1.14 BAD DISK: SCANDISK/KLS is Not Running
3.1.15 BAD DISK: Disk Bad Media Alarm
3.1.16 HARD DRIVE INDICATOR LIGHT: flashing
3.2.1 CNET: Using tstatc to evaluate CNET

4 - NE Software Download
4.1.1 DLM: Reboot/Load Manager is not Downloading an NE
4.1.2 DLM: Reboot/Load Manager Won't Start
4.1.3 DLM: Reboot/Load Manager is not displaying any NEs
4.1.4 DLM: NE is Continuously Rebooting
4.1.5 DLM: NE is Frozen Immediately After a Reboot
4.1.6 DLM: Reboot/Load Manager indicates Fail
4.1.7 DLM: Load Processor Activity is Disabled
4.1.8 DLM: NE shelf processor firmware load is Corrupt or Incomplete
4.1.9 DLM: NE shelf processor application load is Corrupt or Incomplete
4.2.1 DLM: NE serial number is corrupted
4.2.2 DLM: Removing Release From the Backup OPC

5 - Installations & Upgrades


5.1.1 SWI: Backup OPC is still active after start_backout completes
5.1.2 SWI: OPC Not Functioning After Installation
5.1.3 SWI: Installation is Failing During Validation
5.1.4 SWI: Installation is Failing During Transfer
5.1.5 NUM: Both Primary and Backup OPCs are Active
5.1.6 RBN: Disk 95% Full Alarm

6 - Backup/Restore Manager
6.1.1 BRM: NE Backup is Not Working
6.1.2 BRM: NE Restore is Not Working
6.1.3 BRM: Backup OPC is Handling the NE Database Requests
6.1.4 BRM: NE Database Backups Incomplete or Corrupted

7 - OPC Save And Restore


7.1.1 SRT: Save to Tape is Failing
7.1.2 SRT: Restore from Tape is Failing
7.1.3 SRT: Critical Files are Missing After a Restore
7.2.1 SRT: Files saved by Save and Restore Tool
7.2.2 SRT: Unable to Restore OPC Data from Disk

8 - Commissioning Manager
8.1.1 SCF: Can't Enable Clear Commissioning Button
8.1.2 SCF: Can't Commission a New NE
8.1.3 SCF: Error Message - This OPC contains invalid data
8.1.4 ODS: Data Synchronization is Failing
8.1.5 SCF: NE Release is Set to NONE
8.1.6 SCF: Cannot Edit the Commissioned NE
8.2.1 SCF: Replacing A Backup OPC
8.2.2 SCF: Replacing A Primary OPC
8.2.3 SCF: Clearing Commissioning Data
8.2.4 SCF: Dumping All Commissioning Data to File

9 - Network Surveillance
9.1.1 LAS: Network Surv Tools Display ? Symbol


9.1.2 LAS: OPC Alarm View Doesn't Match NE Alarm View
9.1.3 LAS: CMT Status Line Doesn't Match the NE Alarm Banner
9.1.4 LAS: Network Surv Tools Don't Display Newly Added NEs or NE Names
9.1.5 LAS: Event Browser Filter Settings Changed
9.1.6 LAS: Alarm Monitor Isn't Displaying All Alarms
9.1.7 RBN: RBNDISK is Not Running
9.1.8 RBN: Disk 95% Full Alarm
9.2.1 LAS: Dumping Contents of the LAS Database

10 - NE Login
10.1.1 NEA: NEs are missing
10.1.2 NEA: More NEs Displayed Than Commissioned
10.1.3 NEA: Duplicate NEs Error Message
10.1.4 NEA: NE Access is very slow
10.1.5 NEA: NE Access is not available
10.1.6 NEA: NE Login Manager will not Start
10.1.7 NEA: Cannot Auto-Login to NE from OPC
10.2.1 NEA: Using nelogin to Access NEs

11 - Remote Telemetry
11.1.1 TBOS: No NE Listed in Remote Telemetry Tool
11.1.2 TBOS: NEs Missing on Remote Telemetry Tool
11.1.3 TBOS: TBOS Display Screen IS Frozen
11.1.4 TBOS: Serial Telemetry for Remote Display is Incorrect
11.1.5 TBOS: Parallel Telemetry for Remote Display is Incorrect
11.1.6 TBOS: Data Selector for Monitor Display is Disabled
11.1.7 TBOS: Remote Telemetry Tool Won't Open on Active OPC
11.1.8 TBOS: Source NE and Source Display Unknown
11.1.9 TBOS: Maximum Number of Display Mappings Reached
11.1.10 TBOS: Position ID Field is Invalid
11.1.11 TBOS: Monitored Source Field is Invalid
11.1.12 TBOS: Monitored Source Name Doesn't Correspond to Source ID
11.1.13 TBOS: Display Field is Empty or Invalid
11.1.14 TBOS: Cannot Remove Display Mapping
11.1.15 TBOS: System Generated Time Out
11.1.16 TBOS: Association is Down Between OPC and NE
11.1.17 TBOS: Display is Already Mapped
11.1.18 TBOS: Maximum Number of Mappings Exceeded
11.1.19 TBOS: Display Mapped to its Source NE is Not Allowed

12 - Remote OPC Login


12.1.1 ROA: Remote OPC Login is Not Available
12.1.2 ROA: OPCs are missing
12.1.3 ROA: More OPCs Displayed Than Commissioned
12.1.4 ROA: Duplicate OPCs Error Message
12.1.5 ROA: OPC Access is very slow
12.1.6 ROA: Remote OPC Login to other OPCs Not Available
12.2.1 ROA: Using nelogin to Access OPCs

13 - Centralized Security
13.1.1 CUA: Users Cannot Login after Upgrade
13.1.2 CUA: Users Cannot Login
13.1.3 CUA: NE User Class Different Than Indicated
13.1.4 CUA: Userid is Disabled, But User Can Still Login
13.1.5 CUA: Userid is Disabled, But User Gets Wrong Error Message



13.1.6 CUA: Cannot Login to Newly Commissioned NEs
13.1.7 CUA: Cannot Login to NEs Which Have Been Restarted or Rebooted
13.1.8 CUA: Userid shows Assigned/Expired, But the Password Was Changed
13.1.9 CUA: Root Password Was Forgotten
13.2.1 CUA: Verifying the Contents of the Password File
13.2.2 CUA: Verifying the Contents of the Group File
13.2.3 CUA: Recovering the Root Password

14 - TL-1 and X.25


14.1.1 TL1: TID/SID Name Rejected
14.1.2 TL1: RETRIEVE PM Responds with DENY
14.1.3 TL1: Response Messages are Lost or Not Complete
14.1.4 TL1: Missing PM Counts
14.1.5 TL1: Call Accepted Logs in the Event Browser
14.1.6 TL1: Call Terminated Logs in the Event Browser
14.1.7 TL1: Call Rejected Logs in the Event Browser
14.1.8 TL1: Port not configured for X.25 Logs in the Event Browser
14.1.9 TL1: Cannot Establish Connection
14.1.10 TL1: Commands Aren't Being Performed
14.1.11 TL1: Autonomous Messaging Isn't Working
14.1.12 LAPB: LAPB is Dropping
14.1.13 LAPB: LAPB Problems
14.1.14 MC68302: MC68302 Problems
14.1.15 X.3PAD: X.3 PAD Can't Establish Connection
14.2.1 TL1: Setting Up X.25, VCP and X.3PAD
14.2.2 TL1: Determining Inhibited PM Counts
14.2.3 X.25: Default Settings Stored in X25INIT_TEMPLATE File
14.2.4 X.3: Default Settings Stored in x3config File
14.2.5 VCP: VCP PID Defaults
14.2.6 TL1 Interface Router Service: Cannot Establish Connection
14.2.7 TL1 configuration for TL1 Over TCP/IP: Error during Configuring and deleting the configuration
14.2.8 STA-ESWD is rejected and ESWD cannot be started
14.2.9 STA-ESWD accepted, ESWD initiated and then aborted
14.2.10 CANC-ESWD rejected
14.2.11 Association cannot be established over 7 Layers
14.2.12 Association Established through ACT-USER and then dropped
14.2.13 Association not Dropped by CANC-USER

15 - Configuration Manager
15.1.1 SCM: Cannot Save Configuration Data to NEs
15.1.2 SCM: Cannot Send Configuration Data to NE
15.1.3 SCM: Cannot Remove a Configuration
15.1.4 SCM: Configuration Manager Doesn't Start
15.1.5 SCM: Scheduled Configuration Audit Fails
15.1.6 SCM: Scheduled Configuration Audit Mismatch
15.2.1 SCM: Retrieving Configuration and Connection Data

16 - Connection Manager
16.1.1 STP: Cannot Send Connection Data to NEs
16.1.2 STP: Connection Manager Doesn't Start
16.1.3 STP: Unable to provision connections on an active Primary OPC
16.1.4 STP: Unable to provision connections on an active Backup OPC
16.1.5 STP: Connection Audit Fails


16.1.6 STP: Audit Mismatch
16.1.7 STP: No option to correct an audit mismatch
16.1.8 STP: Cannot Add a Matched Node Connection
16.1.9 STP: Unable to add a nodal cross-connect
16.1.10 STP: Unable to delete connections
16.1.11 STP: Unable to provision a DCP connection
16.1.12 STP: Bandwidth unavailable
16.1.13 STP: Uni-directional Nodal Cross-Connects present on TBM OC-12
16.2.1 STP: Retrieving Configuration and Connection Data
16.2.2 STP: Viewing the Protection State of a Matched Node Ring
16.2.3 STP: Viewing Mismatched Connection Data
16.2.4 STP: Correcting individual mismatches

17 - PM Collection
17.1.1 PM Counts not reported in TL1
17.1.2 TL1 RTRV-PM command retrieves no data
17.1.3 Daily counts on OPC do not match daily counts on the NE
17.1.4 PM Collection exceeded 15 minutes

18 - Network Time Protocol


18.1.1 XNTP: The OPC has entered freerun mode

19 - Protection Manager / 1:N


19.1.1 PSU: Can't Display Configuration
19.1.2 PSU: Dumping All Protection Data to File

20 - OPC Date / OPC Shutdown


20.1.1 SSD: Time Zone Missing from Date UI

A - Complete set of OPC Tools


Chapter 1 - OPC Login and Start Up

1.1 Problem Description


The following sections describe common login and user interface problems. Each problem will be described and possible diagnostic reasons will be provided. Available workarounds will be provided at the end of each problem reason description.

1.1.1 VT100: Login Prompt Missing

After an OPC shutdown, the VT100 connected to Port B shows all the proper OPC diagnostics but the login prompt is not available.

Reason-1 After an OPC shutdown, Port B always starts as a terminal so that it can display the OPC diagnostic results. After the diagnostics, Port B is initialized to the value indicated by the PortConfiguration tool. Use this tool to query the Port B configuration. See "VT100: Reconfiguring Port B to Terminal" on page 1-11.

Reason-2 Using the diskinit tape to boot the OPC causes Port B to initialize as a terminal with full modem line control. For this reason, a full RS-232 null modem cable is required. See "VT100: Port B Cable Pinouts" on page 1-12.

1.1.2 VT100: Login Won't Respond

During normal operation, the Port B connection suddenly stops working.

Reason-1 Port B has been changed to something other than terminal. Use the PortConfiguration tool to query the Port B configuration. See "VT100: Reconfiguring Port B to Terminal" on page 1-11.

Reason-2 VT100 settings changed. The OPC Port B cannot auto-baud down lower than 1200 baud. See "VT100: Port B settings" on page 1-12.


Reason-3 VT100 settings changed. The OPC Port B cannot auto-baud up from a lower baud rate. Remove and reconnect the Port B cable to drop the terminal connection. If removing the cable does not correct the problem, then reconfigure Port B. See "VT100: Reconfiguring Port B to Terminal" on page 1-11.

Reason-4 Port B cable is broken. Test the Port B cable. See "VT100: Port B Cable Pinouts" on page 1-12.

Reason-5 Port B is frozen. This is extremely rare and can be fixed by unconfiguring Port B and reconfiguring Port B as terminal.

1.1.3 VT100: Removing Port B Doesn't Automatically Logout

Removing the Port B cable does not log out the session.

Reason-1
The cable was disconnected at the terminal end. Always disconnect the cable from the OPC end. The OPC detects cable removal by an open circuit on the DTR. If the cable used has an internal loopback, the OPC cannot tell that the cable is missing.

Reason-2
Port B is frozen. This is extremely rare and can be fixed by unconfiguring Port B and reconfiguring Port B as terminal.

1.1.4 VT100: Password Automatically Rejected

When attempting to log in, the userid is accepted and the password is rejected before any characters are entered.

Reason-1
VT100 settings changed. The new line option on the VT100 is enabled. Disable the new line option. See VT100: Port B settings on page 1-12.

1.1.5 USM: Cannot Login

When logging into the OPC, the userid and password are rejected.


Reason-1
The userid does not exist or has been disabled. This can occur if the Centralized User Administration tool is in use by another user or if a restore from tape has been performed. Log in to the OPC as admin and use the Centralized User Administration tool to add or enable the userid. See SRT: Files saved by Save and Restore Tool on page 7-3. See USM: Cannot Login on page 1-2. See USM: Root Password Not Valid on page 1-3.

Reason-2
The password is not correct. This can occur if the Centralized User Administration or Password tools are in use by another user or if a restore from tape has been performed. Use the old user password, or log in to the OPC as admin and use the Centralized User Administration tool to change the password. See SRT: Files saved by Save and Restore Tool on page 7-3. See USM: Cannot Login on page 1-2. See USM: Root Password Not Valid on page 1-3.

Reason-3
Too many users are already logged in or too many processes (tools) are running. Log in to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users, asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.
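When many sessions are listed, the output of who can be condensed to one line per userid. The following is only a sketch and not an OPC tool: the function name sessions_per_user is hypothetical, and it assumes that the first column of the who output is the userid, as on standard UNIX systems.

```shell
#!/bin/sh
# Hypothetical helper: summarize `who` output as "<count> <userid>" lines,
# busiest userid first. Reads who-style lines on stdin and assumes that
# column 1 is the userid.
sessions_per_user() {
    awk 'NF > 0 { n[$1]++ } END { for (u in n) print n[u], u }' |
        sort -k1,1nr -k2,2
}

# Typical use at a prompt (not run here):
#   who | sessions_per_user
```

The userids with the most sessions are the first candidates for a wall or write message.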

1.1.6 USM: Root Password Not Valid

When logging into the OPC as root, the password is rejected.

Reason-1
The password is not correct. This can occur if a restore from tape has been performed. Use the old root password. See SRT: Files saved by Save and Restore Tool on page 7-3.

Reason-2
The password is not correct. This can occur if the Password tool is in use by another user or if the UNIX passwd command has been used. Ask the network administrator for the new password.


Reason-3
THIS IS NOT INTENDED FOR THE FIELD APPLICATION
The valid password has been forgotten. Use the diskinit tape to boot from tape, mount the hard drive and edit the password file directly. The passwd file is critical to the operation of the OPC; inadvertent changes may result in an OPC corruption.

1.1.7 USM: opcui: command not found

The error message "opcui: command not found" is usually displayed when the root user executes opcui at an OPC prompt.

Reason-1
The opcui command cannot be performed from within a UNIX shell already running under a user session (CMT and GUI).

Reason-2
The opcui command cannot be performed when the OPC is in the process of being upgraded.

Reason-3
The appropriate alias has not yet been set. To determine if the alias has been set, enter alias | grep opcui at the command line. Ensure the .login file contains the following line:

    alias opcui 'unsetenv DISPLAY;/iws/usm/usmstart'

The .login file is critical to the operation of the OPC; inadvertent changes may result in an OPC corruption.
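The .login check in Reason-3 can also be made without opening the file in an editor, which avoids the risk of an inadvertent change. The sketch below is illustrative only: the function name has_opcui_alias is hypothetical, and the path to the .login file is passed in by the user rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: report whether a login file defines an opcui alias.
# $1 is the path to the login file (for example the root user's .login).
# The file is only read, never modified.
has_opcui_alias() {
    # Match lines that start with "alias opcui", ignoring leading blanks.
    if grep -q '^[[:space:]]*alias[[:space:]][[:space:]]*opcui[[:space:]]' "$1" 2>/dev/null; then
        echo "opcui alias present in $1"
    else
        echo "opcui alias missing from $1"
        return 1
    fi
}
```

A result of "missing" only shows that the line is absent from that file; the alias may still have been set by hand in the current shell.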

1.1.8 USM: UI is Garbled

The user session was working fine, but later the screen columns and rows became garbled. The data is correct but displayed in the wrong places.

Reason-1
The terminal session has been corrupted. Run /usr/bin/reset to reset the terminal session.

1.1.9 USM: UI Won't Start

The user is able to log in to the OPC, but the User Session Manager will not start.


Reason-1
Another user session manager cannot be started from within a UNIX shell already running under a user session (CMT and GUI).

Reason-2
The user has logged in too soon after an OPC power up. Wait a few minutes and try to log in again.

Reason-3
Too many users are already logged in or too many processes (tools) are running. Log in to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users, asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.

Reason-4
A process is growing out of control. Open the log with view /var/log/syslog and enter ?LIMIT_REACHED to search for process limits that have been exceeded. The process nir may be consuming excessive amounts of CPU time. Kill the nir process if necessary.

Reason-5
Possible corruption of the /etc/group file. The group file contains information correlating group level privileges and associated userids. Ensure all userids correspond to the proper groups and ensure all groups exist. The group file is critical to the operation of the OPC; inadvertent changes may result in an OPC corruption. See CUA: Verifying the Contents of the Group File on page 13-6.

Reason-6
The owner of /home/<userid> has been changed or incorrectly set. The owner should be the same as the <userid>. Example listing for the userid maint:

    opc> ll /home/maint
    drwxrwxr-x 2 maint slat 1024 May 27 15:20 maint

The fields of interest are, from left to right, the privilege (drwxrwxr-x), the owner (maint), the group (slat) and the directory (maint).

The maint userid has a directory named maint. It is owned by maint and is part of the slat group.

Reason-7
Possible corruption of the /etc/passwd file. The passwd file contains information correlating startup shells and privileges. Ensure all userids correspond to the proper shells and the privileges are set correctly. The passwd file is critical to the operation of the OPC; inadvertent changes may result in an OPC corruption.


See CUA: Verifying the Contents of the Password File on page 13-4.

Reason-8
The CUA database may have been corrupted so that the available toolsets are missing. Use the /iws/opcdb/opcdbtst tool to determine the state of the OPC. This failure sometimes manifests itself by displaying the Critical System Resource Unavailable error dialogue when trying to start the USM session.

Reason-9
The OPC has failed the sanity check.
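The ownership rule in Reason-6 (the owner of /home/<userid> must be the same as the <userid>) can be checked for several home directories at once. The following is a minimal sketch, not an OPC tool: the function name mismatched_home_owners is hypothetical, and it assumes a long listing in which column 3 is the owner and the last column is the directory name, as in the maint example.

```shell
#!/bin/sh
# Hypothetical helper: flag home directories whose owner differs from the
# directory name. Reads long-listing lines (as printed by ll or ls -l) on
# stdin; assumes column 3 is the owner and the last column is the name.
mismatched_home_owners() {
    awk '$1 ~ /^d/ && $3 != $NF { print $NF " is owned by " $3 }'
}

# Typical use at a prompt (not run here):
#   ls -l /home | mismatched_home_owners
```

No output means that every listed directory is owned by the userid of the same name. The helper only reports; it changes nothing.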

1.1.10 USM: Critical System Resource Unavailable

The error message "Critical System Resource Unavailable" usually indicates that an MSR needed by an application is not available. This error message is only displayed when the particular tool is invoked.

Reason-1
The OPC is not commissioned. Open the Commissioning Manager and commission the system.

Reason-2
An MSR has been manually busied. Open the drmstat tool. Check for MSRs which are in the manualbusy:systemterminate state. Determine why the MSR has been busied and return the MSR to service if possible.

Reason-3
The OPC is on the verge of performing an automatic or manual shutdown. During the shutdown process all MSRs are brought down.

Reason-4
The CUA database has been corrupted and the following error message is displayed: "Information Failed to retrieve user profile information due to the unavailability of a critical system resource". See USM: Failed to retrieve user profile on page 1-9. See USM: There are no toolsets defined for this user on page 1-8.

1.1.11 USM: Excessively Slow

The USM is running but everything is very slow.


Reason-1
Too many users are already logged in or too many processes (tools) are running. Log in to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users, asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.

Reason-2
A process is growing out of control. In some rare cases a process does not operate properly and starts consuming increasing amounts of CPU time. To determine if a process is growing, enter ps -ef | more and record the highest CPU times and the associated processes. Repeat this a number of times. If a process seems to be growing, contact the appropriate OPC support authority.
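Recording the ps -ef output by hand several times, as Reason-2 describes, can be replaced by saving each listing to a file. The sketch below is an illustration under stated assumptions: the function name take_snapshots, the output directory and the interval are all hypothetical choices, and the command to run is passed in rather than fixed.

```shell
#!/bin/sh
# Hypothetical helper: run a command several times and save each output to
# a numbered file so the snapshots can be compared afterwards.
# Usage: take_snapshots <count> <seconds-between> <output-dir> <command...>
take_snapshots() {
    count=$1; delay=$2; dir=$3
    shift 3
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" > "$dir/snapshot.$i"
        # Pause between snapshots, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$delay"
        i=$((i + 1))
    done
}

# Typical use at a prompt (not run here): five listings, one minute apart.
#   take_snapshots 5 60 /tmp/ps_snapshots ps -ef
```

A process whose TIME value rises steadily from one snapshot to the next is a candidate for the growing process described above. Remove the snapshot files afterwards so they do not consume disk space.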

1.1.12 USM: Keeps Closing

The USM starts up properly and then suddenly closes.

Reason-1
The OPC is being shut down by another user. Log in to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users, asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.

Reason-2
A critical MSR has caused the OPC to shut down.

Reason-3
The file system is corrupted.

Reason-4
The hard disk is corrupted.

1.1.13 USM: Tools Won't Open

The UI starts up properly but the selected tools won't open.

Reason-1
The tool has already been opened by another user. Log in to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users, asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.


Reason-2
The tool is hung or another user cannot close the tool. Determine the subsystem name of the tool. Log in to the OPC as root. Locate the PID of the lct or xt interface of the subsystem and kill it.

1.1.14 USM: Tools Unavailable

The UI starts up properly but some or all tools are missing.

Reason-1
The user class does not support the tools expected. Log in as a user that supports the desired tools, or log in as the admin user and open the Centralized User Administration tool to view the toolsets of the user and group.

Reason-2
Possible corruption of the /etc/group file. See USM: UI Won't Start on page 1-4.

Reason-3
The Centralized User Administration database is corrupted. The CUA database can be rebuilt, but this results in a complete loss of all new userids and groups and causes all default users to revert to their default passwords. The OPC database is critical to the operation of the OPC; inadvertent changes may result in an OPC corruption. Contact the appropriate OPC support authority.

Reason-4
Two or more TL1 sessions have dropped simultaneously. If more than one TL1 session is running on the OPC and those sessions drop for whatever reason, there is a 25% chance that the database semaphore will be killed, resulting in unstable database accesses. View the /var/log/syslog file. Move to the end of the file by typing G and search for the opcdb message by typing ?opcdb. If a LAPB is DOWN message is seen around the time the opcdb error is detected, there is a high probability that the TL1 dropping problem has killed off the database semaphore. Re-initialize the opcdb MSR or reboot the OPC. See LAPB: LAPB is Dropping on page 14-7.
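The syslog search in Reason-4 can also be done in one pass, which makes it easier to see whether an opcdb message and a LAPB is DOWN message occur close together. This is a sketch only: the function name tl1_drop_candidates is hypothetical, and the two search strings are taken from the text above rather than from a known log format.

```shell
#!/bin/sh
# Hypothetical helper: list, with line numbers, the log lines that mention
# opcdb or "LAPB is DOWN" so that their timing can be compared.
# $1 is the path to the log or to a copy of it (for example /var/log/syslog).
tl1_drop_candidates() {
    grep -n -e 'opcdb' -e 'LAPB is DOWN' "$1"
}
```

Matches that sit within a few lines of each other, with similar timestamps, fit the pattern described in Reason-4. Matches that are far apart do not rule the problem out but make it less likely.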

1.1.15 USM: There are no toolsets defined for this user

The error message "Information There are no toolsets defined for this user" usually indicates that the Centralized User Administration database has been improperly reset.


Reason-1
During a database reset the Centralized User Administration database was corrupted.

Reason-2
Two or more TL1 sessions have dropped simultaneously. If more than one TL1 session is running on the OPC and those sessions drop for whatever reason, there is a 25% chance that the database semaphore will be killed, resulting in unstable database accesses. View the /var/log/syslog file. Move to the end of the file by typing G and search for the opcdb message by typing ?opcdb. If a LAPB is DOWN message is seen around the time the opcdb error is detected, there is a high probability that the TL1 dropping problem has killed off the database semaphore. Re-initialize the opcdb MSR or reboot the OPC. See LAPB: LAPB is Dropping on page 14-7.

1.1.16 USM: Failed to retrieve user profile

The error message "Information Failed to retrieve user profile information due to the unavailability of a critical system resource" usually indicates that the Centralized User Administration database has been improperly reset.

Reason-1
During a database reset the Centralized User Administration database was corrupted.

Reason-2
Two or more TL1 sessions have dropped simultaneously. If more than one TL1 session is running on the OPC and those sessions drop for whatever reason, there is a 25% chance that the database semaphore will be killed, resulting in unstable database accesses. View the /var/log/syslog file. Move to the end of the file by typing G and search for the opcdb message by typing ?opcdb. If a LAPB is DOWN message is seen around the time the opcdb error is detected, there is a high probability that the TL1 dropping problem has killed off the database semaphore. Re-initialize the opcdb MSR or reboot the OPC. See LAPB: LAPB is Dropping on page 14-7.

1.1.17 Telnet: Telnet Connection not running

Telnet to the OPC is not working or is working too slowly.

Reason-1
The EtherNet port is not turned on. Ensure that the hosts, netlinkrc and rc files are correctly edited. If not, use the /iws/lan/ether_admin tool to initialize/enable the port. After using the ether_admin tool, you may be prompted to reboot or shut down the OPC. NEVER enter reboot or shutdown at the OPC prompt; instead, use the OPC Shutdown tool from the USM.

Reason-2
The EtherNet connection is very noisy or congested. Use the ping command to determine the health of the EtherNet connection. In some cases, a ping with 1000 byte packets may be necessary to expose weaknesses in the LAN connection. If packets are being lost or are very slow to echo back, the LAN is very noisy and/or congested.

    opc> ping primary 1000 3
    PING primary: 1000 byte packets
    1000 bytes from 47.105.7.9: icmp_seq=0. time=17. ms
    1000 bytes from 47.105.7.9: icmp_seq=1. time=10. ms
    1000 bytes from 47.105.7.9: icmp_seq=2. time=15. ms

Reason-3
The EtherNet connection is down. Check the EtherNet drop and connector and ensure that it is properly connected. Contact the local LAN administrator to see if there are any problems with the LAN. Use the ping or tstatc command to determine if EtherNet connectivity is established.
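For Reason-2, "very slow to echo back" is easier to judge when the ping replies are reduced to a count, an average and a maximum. The following sketch assumes ping output in the format shown in the example above (a time=<n>. field on each reply line); the function name ping_summary is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize round-trip times from ping output of the
# form "... icmp_seq=0. time=17. ms". Reads ping output on stdin.
ping_summary() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^time=/) {
                    t = substr($i, 6) + 0      # numeric part after "time="
                    sum += t; n++
                    if (t > max) max = t
                }
            }
        }
        END {
            if (n == 0) { print "no replies"; exit 1 }
            printf "replies=%d avg_ms=%.1f max_ms=%.1f\n", n, sum / n, max
        }'
}

# Typical use at a prompt (not run here):
#   ping primary 1000 3 | ping_summary
```

Fewer replies than packets sent points to packet loss. What counts as an acceptable average depends on the LAN, so compare against a reading taken when the connection is known to be healthy.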

1.1.18 GUI: Login Window Not Available

The login window for the GUI session will not appear on the xterminal.

Reason-1
The EtherNet connection is not working properly. See Telnet: Telnet Connection not running on page 1-9.

Reason-2
The xterminal is not set up properly. See GUI: Setting Up An Xterminal for an OPC on page 1-14.

1.1.19 GUI: Text Doesn't Fit in Window

Sentences, titles and names are too large or too small for the GUI window allocated.


Reason-1
The fonts used by the GUI depend on the font server specified in the configuration parameters of the workstation or xterminal. For a workstation, it is normal for the fonts to be the incorrect size, as the workstation uses its own fonts to drive the GUI display. Workstation GUI to an OPC is not supported. For an xterminal, the font server and possibly the configuration server settings are incorrect. The font and configuration servers should be set to the OPC IP address. See GUI: Setting Up An Xterminal for an OPC on page 1-14.

1.2 Solution Description


1.2.1 VT100: Reconfiguring Port B to Terminal

The following procedure outlines the steps needed to unconfigure services on Port B and then reconfigure Port B as a terminal. A detailed procedure can be found in the NTP, Volume: Operations, Administration and Provisioning, Section: System Administration Procedures.

1) Open the PortConfiguration tool from the USM or enter PortConfiguration tool at the opc prompt
2) Select item 3, (Unconfigure a service)
3) Select item 1, to unconfigure the service
4) Repeat step 3, until there are no services configured on Port B
5) Select item 8, (Return to Main menu)
6) Select item 2, (Configure a service)
7) Select item 1, (Terminal)
8) Select item 8, (Return to Main menu)
9) Select item 1, (Query Port Configuration)
10) Ensure the port is configured as a terminal
11) Hit <ENTER> to return to the main menu
12) Select item 9, (Exit)


1.2.2 VT100: Port B Cable Pinouts

The following table outlines the pinouts of the various cables that can connect to Port B. The cable connector to Port B is always a male DB9. For more information, refer to the NTP, Volume: Installation, Section: Installing Peripheral Cables.

TABLE 1.

OPC Port B to Asynchronous DCE (MODEM)
OPC (DB9):   1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
DCE (DB25):  8 | 3 | 2 | 20 | 7 | 6 | 4 | 5

OPC Port B to Synchronous DCE (X25)
OPC (DB9):   1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
DCE (DB25):  17 | 3 | 2 | 20 | 7 | 15 | 4 | 5

OPC Port B to Asynchronous DTE (VT100)
OPC (DB9):   1 | 2 | 3 | 4 | 5 | 6, 8 | 7
DTE (DB25):  4 | 2 | 3 | 5, 6 | 7 | 20 | 8

OPC Port B to Asynchronous DTE (LAPTOP)
OPC (DB9):   1 2 3 4 5 6 7 8
DTE (DB9):   3 2 1 5 4 8 7

1.2.3 VT100: Port B settings

The following table outlines the VT100 settings required to connect a VT100 terminal to Port B.

TABLE 2. VT100 Terminal Settings

Variable        Setting                Comment
mode            VT100                  no other is supported
baud            1200 - 9600            auto-bauds down only
bits            8 bit
parity          none
stop            1
Xon/Xoff        enabled
auto new line   off
duplex          full (no local echo)
autowrap        off
scroll          jump                   smooth can also be used
columns         80
controls        interpret


1.2.4 USM: Using the wall and write commands

The wall and write commands are used to send messages to other users on the same OPC. wall stands for write all, and is used to broadcast messages to all users presently logged onto the same OPC. write is used to send messages to a specified user logged onto the same OPC.

To use wall:
1) From an OPC prompt, enter wall
2) Enter the message, ending each line of the message with a <return>
   NOTE: wall permits multiple line messages to be sent
3) When the message is completed, hit CTRL-D to send it

NOTE:
- wall can send messages to all OPC CMT sessions
- wall cannot send messages to the GUI session except for the GUI console window
- To abort the message, hit CTRL-C

Example Usage:
    opc> wall <return>
    Hello World <return>
    How are you <return>
    <CTRL-D>

To use write:
1) From an OPC prompt, enter who
2) Determine the user and serial device you want to write to
3) Enter write <userid> <device>
4) Enter the message, one line at a time, each followed by a <return>
   NOTE: the line is sent when the <return> is pressed
5) When the message is completed, hit CTRL-C to terminate the write

NOTE:
- write can send messages to all OPC CMT sessions
- write cannot send messages to the GUI session except for the GUI console window

Example Usage:
    opc> write admin pty/ttyu0 <return>
    Hello World <return>
    How are you <return>
    <CTRL-C>


1.2.5 GUI: Setting Up An Xterminal for an OPC

This procedure assumes that the /iws/lan/ether_admin command has already been run to enable the EtherNet port. The following steps are meant only to configure an xterminal to communicate with an OPC.

NOTE: Setting up an Xterminal for a Network Manager is different than setting up an Xterminal for an OPC.

For NCD xterminals only:
1) Log in to the OPC as the root user, enter /iws/lan/ether_admin
2) Select item 3 (X terminals configuration)
3) Add the appropriate data (boot version, xterminal address and boot server)
4) Save data
5) Continue with Generic xterminal setup

Generic xterminal setup:
Refer to your xterminal documentation on how to set up the font and configuration servers.
1) Open the setup window for your xterminal
2) Set the server values (font, configuration and/or display manager) to match the OPC's IP address

   Useful information to have:
   font server: (OPC ip address)
   X file server: (OPC ip address)
   configuration server: (OPC ip address)
   boot server: (appropriate workstation ip address or PROM)
   name server: (appropriate workstation ip address or none)
   font paths:
       /iws/X11/fonts
       /iws/X11/lib/fonts/misc
   The following fonts are NCD xterminal specific:
       /iws/X11/lib/ncd/fonts/misc
       /iws/X11/lib/ncd/fonts/100dpi
       /iws/X11/lib/ncd/fonts/75dpi

3) Save all settings
4) Reboot the xterminal


1.2.6 EtherNet: OPC EtherNet Connector Pinout

The following is the pinout of the OPC EtherNet port.

Note: The connector is not standard 10baseT (EMI restrictions). Use the EtherNet cable suggested by NT to comply with EMI guidelines.

[Figure: OPC EtherNet connector, pins numbered 6 5 4 3 2 1]

Pin 1   TX
Pin 2   TX
Pin 5   RX
Pin 6   RX


OPC Base Operations

2.1 Problem Description


The following sections describe common OPC base functionality problems. Each problem is described along with its possible causes, and any available workaround is given at the end of the description of the cause it applies to.

2.1.1 OAM: NE Indicates an OPC OAM S/W Failure

The NE containing the OPC is indicating a minor alarm for an OPC OAM Software Failure. The NE uses the OWS process of the OPC to determine the state of the OPC. If the OWS process is not running for any reason, the OAM failure is activated.

Reason-1
The OPC is in the middle of an upgrade or an install. While an OPC is being upgraded, it is normal for this alarm to become active.

Reason-2
The OPC is in the middle of the restore process from either a restore from tape or a data sync. While an OPC is being restored, it is normal for this alarm to be active.

Reason-3
The OWS process has been man-busied. Use the drmstat command to investigate the state of the MSRs. Re-initialize the MSR, if required.

Reason-4
The OPC is in the process of an OPC Shutdown. Wait for the OPC to shut down and restart.

Reason-5
The MBIF communication between OPC and NE is slow or failing. To ensure MBIF communication is working, open the tstatc tool and select the k option to view MBIF statistics. Ensure that the RX and TX failed packets are zero and the RX and TX packets are equal. Packets from the NE should arrive every minute.

Reason-6
If there is no k option in tstatc, the kernel does not support MBIF communication. Ensure that the running load actually supports the OPC OAM Software Failure alarm.
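The two MBIF checks in Reason-5 (failed packet counts are zero, RX and TX packet counts are equal) can be written down as a small check. The sketch below does not read tstatc output, whose format is not described here; the four counters are typed in by hand from the MBIF statistics screen, and the function name mbif_check is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: apply the Reason-5 checks to four counters read by
# hand from the MBIF statistics screen.
# Usage: mbif_check <rx_packets> <tx_packets> <rx_failed> <tx_failed>
mbif_check() {
    rx=$1; tx=$2; rx_failed=$3; tx_failed=$4
    status=0
    if [ "$rx_failed" -ne 0 ] || [ "$tx_failed" -ne 0 ]; then
        echo "failed packets present (RX $rx_failed, TX $tx_failed)"
        status=1
    fi
    if [ "$rx" -ne "$tx" ]; then
        echo "RX and TX packet counts differ ($rx vs $tx)"
        status=1
    fi
    [ "$status" -eq 0 ] && echo "MBIF counters look healthy"
    return "$status"
}

# Example with made-up counter values (not run here):
#   mbif_check 120 118 0 2
```

Because packets from the NE should arrive every minute, two readings taken a few minutes apart are more informative than a single one.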


2.1.2 OWS: Both Primary and Backup OPCs are Active

The Primary and Backup OPCs are both active and have been active for a period greater than five minutes.

Reason-1
There is a network partition. Use osiping to verify the continuity between the OPCs. Ensure the associations to all NEs are available. See CAM: Associations are Down or are Unstable on page 2-30.

A network partition can occur because:
- an optical fibre/CNET break occurs
- an NE is being downloaded
- an NE has dropped into debug mode

There is little that can be done for a fibre/CNET break but to wait until the break is repaired. If an NE is downloading, log in to both OPCs and open the Reboot/Load Manager to monitor the progress of the NE download. If an NE has dropped into debug mode, it is necessary to log in to the debug port at the NE and issue the go command. It may be necessary to issue the go command several times to get the NE out of debug mode.

Reason-2
The Backup OPC is not commissioned or the wrong OPC is commissioned as Backup. Open the Commissioning Manager tool to verify that the proper OPC has been commissioned.

Reason-3
A Primary or Backup OPC has been recently replaced and/or commissioned. Whenever there is a change to the commissioning data, it is recommended to datasync the changes and to shut down the OPC (use the OPC Shutdown tool).

Reason-4
The ows process is not working. Under normal operation the Primary and Backup OPCs are always exchanging messages. Use tstatc to monitor the packets being sent between the OPCs. If there is no ows process running or if the ows outgoing messages are not increasing, the ows MSR must be re-initialized.

Reason-5
If both Primary and Backup OPCs are found Active in situations other than those mentioned above, look for the /iws/ows/ows_swact.record file on both the Primary and Backup OPCs. Delete this file from both OPCs (or from whichever OPC it exists on), then busy and return to service the warmstandby (OWS) MSR on both OPCs. The Primary and Backup OPCs will return to a normal state.

2.1.3 OWS: Primary OPC is Inactive

The Primary OPC is inactive and the Backup OPC is active.


Reason-1
The Primary OPC is just recovering after a shutdown. Wait for the OPC to fully recover; after about 5 minutes the Primary OPC will regain control of the network.

Reason-2
The ows_swact tool was used to switch the OPC activities. Use ows_swact -a on the Primary OPC or ows_swact -i on the Backup OPC to revert the OPC activities.

Reason-3
The Primary and Backup OPC were being swapped and the process was not performed correctly. Refer to the NTP for OPC module replacement.

2.1.4 OWS: Backup OPC Won't Go Active

The Backup OPC will not go active even when the Primary OPC is shut down.

Reason-1
The ows_swact tool was used to lock the Backup OPC inactive. Use ows_swact -r on the Primary or Backup OPC to release the lock.

Reason-2
A data sync was not performed from the Primary. Verify the commissioning information.

2.1.5 ODS: Data Synchronization Fails

OPC data synchronization is failing.

Reason-1
The connection from Primary to Backup is down. Use osiping to verify the continuity between the OPCs. See CAM: Associations are Down or are Unstable on page 2-30.

Reason-2
Directory structures are missing at either the Primary or Backup OPCs. ODS requires the directory /iws/ods/odstmp to be on the Primary and the directory /users/VFS/users/opcods to be on the Backup.

Reason-3
The time between the Primary and Backup OPC has been changed using the UNIX date command. The OPC Date tool must be used to change the dates on the OPC.


To correct the problem, re-align the Backup date to match the Primary date, using the OPC Date tool.

Reason-4
Different loads are running on the OPCs. Put identical loads on the OPCs and then do a data sync.

Reason-5
Read, write or execute permissions are missing for one of the directories /users/VFS, /users/VFS/users or /users/VFS/users/opcds. Grant the missing permissions on these directories.

2.1.6 ODS: Want to Data Sync from Backup to Primary

Customers sometimes request a data sync from the Backup to the Primary. Normally, a restore from tape using the Save and Restore tool is the advisable method of restoring data. See ODS: Data Sync from Backup to Primary on page 2-34.
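For Reason-2 and Reason-5 of ODS: Data Synchronization Fails, the presence and permissions of the required directories can be checked in one step. The following is a sketch only: the function name check_dirs is hypothetical, the directories are passed in by the user, and the permission test reflects the access of the user running it, which may differ from the access of the process that performs the data sync.

```shell
#!/bin/sh
# Hypothetical helper: report directories that are missing or that the
# current user cannot read, write, or search.
# Usage: check_dirs <directory> [<directory> ...]
check_dirs() {
    rc=0
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "$d: missing"; rc=1
        elif [ ! -r "$d" ] || [ ! -w "$d" ] || [ ! -x "$d" ]; then
            echo "$d: read, write or execute permission missing"; rc=1
        else
            echo "$d: ok"
        fi
    done
    return "$rc"
}

# Typical use on the Primary OPC (not run here):
#   check_dirs /iws/ods/odstmp
```

When run as root the permission test always passes, so a long listing of the directories is still needed to confirm the permissions granted to other users.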

2.1.7 CAM: Associations are Down or are Unstable

The only reliable means of determining if an NE is no longer communicating is to use osiping from an OPC or clping and coping from an NE. osiping, clping and coping send a number of test patterns to the specified target. If the target is responding properly, it echoes the patterns back. For more information about osiping, see OPC Tools on page A-137.

Reason-1
Fibre break. When a fibre break occurs and datacom cannot be routed to NEs via other means, associations will drop. If possible, log in to both the Primary and Backup OPC and perform an osiping broadcast at both OPCs. A broadcast sends test patterns to all NEs and OPCs on the network. The resulting data can be correlated to determine where the fibre break is and which nodes are affected.

Note: broadcast sends a large number of messages throughout the network, which may degrade network performance. If this occurs, target the NEs individually.

Reason-2
An NE is downloading. When an NE is downloading, it is unable to provide routing services for datacom to other NEs, thus causing a network partition at the NE. To quickly determine if an OPC is downloading an NE, use the tstatc tool to look for outgoing and incoming messages being sent to the dls process and the NE.


Reason-3
During network upgrades using NUM, it is normal to see only a partial SOC because both the Primary and Backup OPCs are sharing the SOC.

Reason-4
The NE has been re-commissioned and the NEid has been changed from its previous value. The OPC uses the unique NEid to identify an NE. If the NEid has changed on the OPC (re-commissioned) but not at the NE (requires a reboot), the associations to that NE will be down. Use nnsmon -d to view all visible NEs on the network. Re-commission the NEid back to its original value. It may take up to 5 minutes to re-establish the associations.

Reason-5
The NE is experiencing trouble with the CNET or SDCC. Log in to the NE. Ensure the CNET and SDCC alarms are not masked. Check the NE UI alarm screen. Correct the CNET and SDCC alarms at the NE. At the NE UI, enter cnet readstat, mlapd readstats 0 and mlapd readstats 1. Frame Fragments or Errors greater than 0 (zero) indicate a possible noisy CNET cable or port. Replace the hardware.

Reason-6
The NE comm ports are down. At the NE UI, enter the ports all command to retrieve a summary of all CNET, SDCC and LAPD ports. If the CNET or SDCC port is OOS or off, log in to the NE from an hmi port and use the fa comm;ports;portprov;chgstate command to enable the comm port. If the port is IStbl, there is a hardware fault. Verify the fault by putting the port OOS and then IS. If the problem persists, return the SP to Nortel.

Reason-7
There is another NE with the same NEid in the network. Use nnsmon -d to locate the duplicate NE. Change the NEids of the duplicate NEs.

Reason-8
The Backup OPC is active and has taken control of the SOC. See OWS: Both Primary and Backup OPCs are Active on page 2-28.

Reason-9
There are too many NEs on the CNET. See H/W: CNET Fail on page 3-37.

2.1.8 OPC: OPC is Not Communicating

The OPC is not communicating over CNET, SONET, EtherNet or Port B. There is no access to the debug port of the OPC.


Reason-1 The user logged out near the end of an OPC shutdown without confirming the final shutdown prompt. The work-around is to remove and re-seat the OPC. Before removing the OPC, ensure the hard disk is not spinning by listening to the OPC.

Reason-2 An OPC shutdown was performed while the OPC Save and Restore tool was performing a restore operation. The work-around is to remove and re-seat the OPC. Before removing the OPC, ensure the hard disk is not spinning by listening to the OPC.

Reason-3 After an installation or an upgrade, the /etc/reboot command fails to reboot the OPC properly. This is very rare and can be remedied by removing and re-seating the OPC. Before removing the OPC, ensure the hard disk is not spinning by listening to the OPC.

Reason-4 The file system is corrupted.

Reason-5 The hard disk is corrupted.

2.1.9 OPC: OPCCLEAN is Not Running

If OPCCLEAN is not running, there is a chance that the OPC disk will become full. There should be events recorded in the /var/log/syslog file indicating that OPCCLEAN was run each night.

Reason-1 The opcclean script is not available or not executable. Run ll /iws/opcfiles/opcclean. If the script is not found, retrieve a copy from another OPC or extract the file from tape. If the permissions are not set properly, change them with the chmod command.

2.1.10 OPC: OPC is Continuously Rebooting

The OPC is not recovering from its reboot cycle. The symptom is a continuously running diagnostic cycle. A general troubleshooting hint is to read the diagnostic messages being displayed, as they provide valuable information for determining the root cause of the failure.

Reason-1 The 1 Hz clock has been enabled on an OPC running firmware release OPCREL02 or OPCREL01. There is a bug in firmware releases older than OPCREL03 such


that the OPC will remain in the reboot cycle indefinitely when the 1 Hz clock is enabled. To determine the firmware release of the OPC, follow the procedure in 2.2.1 OPC: Booting from Tape. The work-around is to pull the OPC out of the shelf and re-seat it. This will cause the 1 Hz clock to turn off and the OPC will recover from the reboot. The 1 Hz clock must then be turned off from the OPC Date tool.

Reason-2 There has been a file system corruption. This usually means the OPC has managed to get through the diagnostic cycle but is unable to run the kernel. If this is the case, there is very little that can be done except to re-initialize the disk and re-install the load.

Reason-3 There has been a disk media failure. Nothing can be done if this is the case, except to re-initialize the disk and re-install the load.

Reason-4 There has been a hardware failure. This can be determined from the diagnostic readout. The OPC must be returned to Nortel.

2.1.11 OWS: OWS_SWACT Doesn't Work

The ows_swact command is not working.

Reason-1 The backup OPC is not commissioned. The ows_swact command requires that a backup be commissioned. Commission a backup OPC.

Reason-2 Communication to the backup OPC is down. Ensure communication to the backup OPC and the NEs is available. Use osiping and tstatc to check the communication paths. See CAM: Associations are Down or are Unstable on page 2-30. See OPC Tools on page A-137.

Reason-3 The ows_swact communication protocol is confused between the old and the new software releases. View the /usr/adm/swilog file and go to the end of the file by pressing G. The log message will indicate the failure occurred during the


ows_disable_swact. Quit the file by entering :q!. Use tstatc to ensure ows is messaging properly to the backup OPC. Perform ows_swact; if the communication path is bad, then there is an ows_swact protocol problem. The work-around is to busy the warmstandby MSR on both the primary and backup OPCs at the same time. Ensure both OPCs are OOS, then Return to Service the warmstandby MSR on both OPCs at the same time. Perform ows_swact and ensure the communication path is okay. Resume the NUM in progress. See OPC Tools on page A-137.

Reason-4 NUM has not initialized ows_swact properly. View the /usr/adm/swilog file and go to the end of the file by pressing G. The log message will indicate the failure occurred during the ows_disable_swact. Quit the file by entering :q!. Use tstatc to ensure ows is messaging properly to the backup OPC. Perform ows_swact; if the peer OPC is reported as not commissioned, then ows_swact did not initialize properly. Shut down the primary OPC to correct the problem. After the OPC recovers from the shutdown, perform ows_swact and ensure the communication path is okay. Resume the NUM in progress.

2.2 Solution Description


2.2.1 OPC: Booting from Tape

In order to perform a boot from tape, the following criteria must be met: the tape is a diskinit tape and the OPC firmware release is 3 or higher.

To determine what firmware is running on the OPC, perform the following:
1) Configure Port B as a terminal
2) Connect a VT100 to Port B
3) Log in to the OPC and shut down the OPC using the Shutdown tool
4) The very first line spooling on the screen will be: ROM Load Information: OPCRELX, where X is the release version (03 and higher)
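The firmware check above can be expressed as a short sketch. It assumes the banner line is captured as text from the VT100 session and that the release number directly follows "OPCREL"; the exact spacing of the banner and the function names are assumptions made for illustration. The second helper reflects the 1 Hz clock problem described in 2.1.10 for firmware older than OPCREL03.

```python
import re

# Matches the banner line described in step 4, e.g. "ROM Load Information: OPCREL03".
ROM_LINE = re.compile(r"ROM Load Information:\s*OPCREL(\d+)")


def firmware_release(line):
    """Return the firmware release as an int, or None if the line is not a ROM banner."""
    match = ROM_LINE.search(line)
    return int(match.group(1)) if match else None


def can_boot_from_tape(release):
    """Booting from tape requires firmware release 3 or higher."""
    return release is not None and release >= 3


def has_1hz_reboot_bug(release):
    """Releases older than OPCREL03 can loop on reboot when the 1 Hz clock is enabled."""
    return release is not None and release < 3


if __name__ == "__main__":
    release = firmware_release("ROM Load Information: OPCREL03")
    print(release, can_boot_from_tape(release), has_1hz_reboot_bug(release))
```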

To boot from tape:
1) Configure Port B as a terminal
2) Connect a VT100 to Port B
3) Insert the diskinit tape into the OPC tape drive
4) Log in to the OPC and shut down the OPC using the Shutdown tool
5) The OPC will now boot from the tape


6) The root password is toor. To access the disk maintenance menu, the userid is opc and the password is cpo.

2.2.2 OSI: Reconstructing an OPC's Serial Number

There are several methods to regenerate an OPC serial number. The procedures below use the checksum command; if checksum is unavailable, guessing can be used to determine the full serial number. The OPC Commissioning Manager displays the OPC's serial number in the upper right-hand corner of the screen. Use these procedures only if you do not have access to the Commissioning Manager.

Reconstructing an OPC Serial Number using checksum:
1) Log in to the OPC as root
2) Enter checksum -l
3) The serial number will be given

or

1) Log in to the OPC as root
2) Enter uname -a
3) A line will be displayed which will include the base serial number b17e####
4) Enter checksum -o 3e####, where #### are the last 4 numbers retrieved from the base serial number in step 3
5) The serial number will be given

Note: The above information assumes you are using a so-called legacy OPC. The partitioned OPC uses serial numbers with a slightly different format.


Chapter 3

OPC Hardware

3.1 Problem Description


The following sections describe common hardware problems. Each problem will be described and possible diagnostic reasons will be provided. Available work-arounds will be provided at the end of each problem reason description.

3.1.1 H/W: ELAN Fail is Lit

The EtherNet Fail light is on.

Reason-1 The EtherNet connection is broken or is not connected. Check the EtherNet cable and connection for continuity.

Reason-2 The EtherNet port is not turned on. Use the /iws/lan/ether_admin tool to initialize/enable the EtherNet port. See Telnet: Telnet Connection not running on page 1-9.

Reason-3 The LAN is having problems. See Telnet: Telnet Connection not running on page 1-9.

Reason-4 The EtherNet LOS is on but EtherNet is actually running. Sometimes the EtherNet LOS gets stuck. This can be remedied by shutting down the OPC through the OPC Shutdown tool. Normally this will clear on its own when the OPC detects traffic on the LAN.

3.1.2 H/W: CNET Fail

The Control NET (CNET) light is on or a CNET Fail alarm is raised.

Reason-1 The CNET connection between the OPC and NE shelf processor is being corrupted. Use the tstatc tool to determine the extent of the CNET problem. See CNET: Using tstatc to evaluate CNET on page 3-6.


Reason-2 There is a termination fault with the CNET connection. Ensure that the proper equipment is connected to the CNET ports and that all unused CNET ports are terminated with a proper CNET terminator.

Reason-3 The system contains more than 15 NEs and more than 5 of those NEs are on the same CNET. This situation results in excessive messaging on the CNET and causes the CNET hardware to go off-line. The work-around is to identify the CNET path between the problem NE and the primary OPC. Then, for all other NEs that are connected to CNET but not on that path, disable CNET until the problem NE completely recovers. Once the NE recovers, enable CNET again on all the NEs on CNET.

Reason-4 There is a hardware fault with the CNET connection. Contact the appropriate OPC support authority.

3.1.3 H/W: Active is not lit

The Green Active light is off.

Reason-1 The OPC is presently being shut down or is in the process of starting up after a shutdown. Wait 5 to 20 minutes for the OPC to complete the shutdown process.

Reason-2 The OPC is not powered up or is not seated properly in the shelf. Reseat the OPC.

3.1.4 H/W: Unit Fail is lit

The Unit Fail light is lit.

Reason-1 An OPC shutdown was performed with the halt option and the OPC was left in the shelf. If an OPC is halted and left in the shelf for three minutes or longer, the OPC will re-start. In order to alert the user that the OPC is starting up, the Unit Fail will light. While the Unit Fail is lit, do not pull out the OPC. The OPC restart can be monitored by connecting a VT100 to Port B.

Reason-2 The OPC was improperly seated. Log in to the OPC to verify that the OPC is functioning properly. Run the /etc/dmesg command to ensure no hardware faults


have occurred during re-start. If no reason for the Unit Fail can be found, shut down the OPC using the OPC Shutdown tool and reseat the OPC.

Reason-3 The OPC is unable to function as a result of a file system corruption.

Reason-4 There is a real hardware fault. Return the OPC to Nortel.

3.1.5 TAPE: Amber Light is on

The amber light on the tape drive is on.

Reason-1 The tape is rewinding or is in use by another application. Log in to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.

3.1.6 TAPE: Amber Light is Flashing Rapidly

The amber light on the tape drive is flashing rapidly. Use the /etc/dmesg command to determine the tape drive problem.

Reason-1 Moisture has been detected in the tape drive. This will occur most often when the OPC has been brought in from a cold environment or if the environment is very humid. Wait up to 60 minutes for the OPC to dry out.

Reason-2 There has been a media fault. The tape drive needs to be cleaned.

Reason-3 There has been a real hardware fault. Return the OPC to Nortel.

3.1.7 TAPE: Green Light is on

The green light on the OPC is lit.


Reason-1 A tape has been inserted into the tape drive. It is normal for the green light to be on while a tape is in the drive.

3.1.8 TAPE: Green Light is Flashing Slowly

The green light on the tape drive is flashing slowly. Use the /etc/dmesg command to determine the tape drive problem.

Reason-1 There has been a media fault. The tape drive needs to be cleaned.

Reason-2 The tape inserted is generating excessive errors. The following can cause excessive errors: dirty tape drive, bad tape or kernel problems. Try to clean the tape drive; if the error persists, try a different tape. If the new tape is still causing excessive errors, then there is a real hardware fault and the OPC must be returned to Nortel.

3.1.9 TAPE: Green Light is Flashing Slowly and the Amber Light is on

The green light on the tape drive is flashing slowly and the amber light on the tape drive is on.

Reason-1 A pre-recorded audio tape has been inserted into the tape drive and it is being automatically played. Eject the tape and put in a proper OPC tape.

3.1.10 TAPE: Green Light is Flashing Rapidly

The green light on the tape drive is flashing rapidly. Use the /etc/dmesg command to determine the tape drive problem.

Reason-1 There has been a media fault. The tape drive needs to be cleaned.

Reason-2 The tape drive is having difficulty writing to the tape. The following can cause tape write problems: dirty tape drive, bad tape or kernel problems. Try to clean the tape drive; if the error persists, try a different tape. If the new tape is still causing write problems, then there is a real hardware fault and the OPC must be returned to Nortel.
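The tape drive light states in sections 3.1.5 to 3.1.10 can be collected into a single lookup table for quick reference. The sketch below is only a convenience summary of those sections; the symptom strings and the function name are chosen for illustration and are not part of any OPC tool.

```python
# Symptom -> possible causes, transcribed from sections 3.1.5 to 3.1.10.
TAPE_LIGHT_CAUSES = {
    "amber on": [
        "Tape is rewinding or in use by another application",
    ],
    "amber flashing rapidly": [
        "Moisture detected in the tape drive (wait up to 60 minutes)",
        "Media fault (clean the tape drive)",
        "Hardware fault (return the OPC)",
    ],
    "green on": [
        "A tape is inserted (normal)",
    ],
    "green flashing slowly": [
        "Media fault (clean the tape drive)",
        "Tape is generating excessive errors (clean the drive, then try another tape)",
    ],
    "green flashing slowly, amber on": [
        "Pre-recorded audio tape inserted (eject it)",
    ],
    "green flashing rapidly": [
        "Media fault (clean the tape drive)",
        "Drive has difficulty writing to the tape (clean the drive, then try another tape)",
    ],
}


def possible_causes(symptom):
    """Return the listed causes for a symptom, ignoring case and extra spaces."""
    key = " ".join(symptom.lower().split())
    return TAPE_LIGHT_CAUSES.get(key, [])


if __name__ == "__main__":
    for cause in possible_causes("Amber flashing rapidly"):
        print(cause)
```

In every flashing case the guide also recommends running /etc/dmesg to see what the tape drive itself reports.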


3.1.11 TAPE: Tape Won't Eject

Pushing the eject button won't eject the tape.

Reason-1 The eject mechanism is jammed. Shut down the OPC using the OPC Shutdown tool. While the OPC is re-starting, press the eject button.

3.1.12 TAPE: RBNCLEAN is Not Running

If RBNCLEAN is not running, there is a chance that a need to clean the heads of the OPC tape drive will not be detected. There should be events recorded in the /var/log/syslog file indicating that RBNCLEAN was run each day.

Reason-1 The rbnclean script is not available or not executable. Run ll /etc/drive_cleaning/rbnclean. If the script is not found, retrieve a copy from another OPC or extract the file from another tape. If the permissions are not set properly, change them with the chmod command.

NOTE: This script is only available on the Legacy OPC.
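The "is the script present and executable" check used for rbnclean (and for similar scripts elsewhere in this guide) can be sketched as follows. It is an illustration of what the ll listing is meant to confirm, not a replacement for it; the function name is hypothetical, and it assumes a Python interpreter is available, which may not be the case on the OPC itself.

```python
import os


def script_status(path):
    """Classify a maintenance script as 'missing', 'not executable' or 'ok'."""
    if not os.path.isfile(path):
        return "missing"          # retrieve a copy from another OPC or from tape
    if not os.access(path, os.X_OK):
        return "not executable"   # fix the permissions with chmod
    return "ok"


if __name__ == "__main__":
    # Paths as given in this chapter.
    for script in ("/etc/drive_cleaning/rbnclean", "/etc/scandisk"):
        print(script, script_status(script))
```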

3.1.13 TAPE: Tape Drive Cleaning Alarm

The Primary and/or Backup "tape drive cleaning required" alarm is active on the OPC.

Reason-1 The heads require cleaning on the tape drive of the OPC.

NOTE: You must clear the alarm manually using the OPC Alarm Provisioning tool after you clean the tape drive.

3.1.14 BAD DISK: SCANDISK/KLS is Not Running

If SCANDISK and/or the Kernel Login System (KLS) is not running, there is a chance that bad disk media will not be detected on the OPC. There should be events recorded in the /var/log/syslog file indicating that SCANDISK was run each Wednesday and that KLS was running.

Reason-1 The scandisk script is not available or not executable. Run ll /etc/scandisk. If the script is not found, retrieve a copy from another OPC or extract the file from another tape or flash card. If the permissions are not set properly, change them with the chmod command.


Reason-2 KLS is not running on the OPC. KLS runs only when the OPC is commissioned. Commission the OPC.

3.1.15 BAD DISK: Disk Bad Media Alarm

The Primary and/or Backup "disk bad media detected" alarm is active on the OPC.

Reason-1 There is a problem on the hard disk of the OPC. Reinitialize the OPC hard disk.

NOTE: You must clear the alarm manually using the OPC Alarm Provisioning tool after you repair the problem.

3.1.16 HARD DRIVE INDICATOR LIGHT: Flashing

Reason-1 OPCs with a 1000 Mb hard drive have a hard drive indicator light. If this light is flashing, the hard drive is being accessed. It therefore serves as a warning that the OPC should not be re-seated while the hard drive is in use, as this may cause hardware problems.

3.2 Solution Description


3.2.1 CNET: Using tstatc to evaluate CNET

tstatc can perform a number of OSI-related functions. This procedure shows how to locate the CNET socket information using tstatc. See Complete set of OPC Tools on page A-1.

Using tstatc for CNET information:
1) Log in to the OPC as root
2) Enter tstatc
3) A menu of available commands will appear
4) Enter c, for CLNP subnet statistics
5) A display of CNET and LAN sockets will appear
6) Enter t, for a full display of the CNET sockets information
7) A display of complete CNET socket information will appear


Chapter 4

NE Software Download

4.1 Problem Description


The following sections describe common NE Software Download problems. Each problem will be described and possible diagnostic reasons will be provided. Available work-arounds will be provided at the end of each problem reason description.

4.1.1 DLM: Reboot/Load Manager is not Downloading an NE

The NE is going through the reboot cycle but the NE is not being downloaded by the Reboot/Load Manager.

Reason-1 The NE is not commissioned. Open the Commissioning Manager and ensure that the NE has been commissioned with the proper data.

Reason-2 The NE is commissioned with the wrong data. Open the Commissioning Manager and ensure that the NE has been commissioned with the proper data. If the NE has been commissioned with the wrong serial number, use the Commissioning Manager to change the serial number to the proper value. If the commissioned NE release does not match an NE load release, use the following command to change the NE release: lomui setNE_release -p <product> -r <release> -n <NE ID>

Reason-3 The download request is not being received by the OPC. Open tstatc and determine if the dlm MSR is receiving incoming messages. If no messages are being received, check for connectivity problems. See CAM: Associations are Down or are Unstable on page 2-4. See Complete set of OPC Tools on page A-1.


Reason-4 The NE network address is invalid or incomplete. The download request is being received by the OPC and the download server (dls) is being created, so the download is started, but the packets are never received by the NE. As a result, the download fails almost immediately. Using the tstatc tool, verify that the dls has been created. Check which NE address is being served by the dls server:

Ref  PID   Proc    Messages      Dst. Osihost Name or Address
                   in     out
12   1358  dls DB  5350   5350   49+00003D3141B01B0402

Log on to the NE, go to the fa comm menu and obtain the NE address from there (for example 49+00003D3141B01B0400). Compare the two NE addresses: the one reported by tstatc and the one obtained from the NE. If they are identical (except for the last two digits), contact the NE support group to verify what is happening on the NE and why the NE is not accepting received packets. If the NE addresses are different or the 49+0000 area code is missing, correct or add the area code using the areadaddr command on the NE (option 7 from the fa comm menu).

Reason-5 The download request received by the OPC has the wrong serial number. Open the /var/log/syslog file and search for "not in OPC". The SWERR contains a representation of the NE serial number, in a format of 8 numbers. Reconstruct the NE's serial number and ensure the serial number sent by the NE matches the NE serial number commissioned. If the serial numbers are different, it may be necessary to replace the shelf id card.

Reason-6 The download request received by the OPC has the wrong shelf type. Open the /var/log/syslog file and search for "not in OPC". The SWERR will refer to the shelf type. If the shelf type does not match the commissioned value, it may be necessary to replace the shelf id card.

Reason-7 The SOC table does not contain all the NEs commissioned. Use the spock -u command to ensure that the SOC table and database contain the same information. The spock command can also be used to correct any misalignment of the SOC table.
Reason-8 The software release of the NE is set to NONE. Ty


The NONE indicates that no releases have been found on the OPC. Use the install_release utility to install an NE load on the OPC. Commission the appropriate NE using the Commissioning Manager tool.

Reason-9 The specified NE software load doesn't exist for the selected release. A log will be generated in the Event Browser indicating "load <NE load name> not found". Ensure the proper NE load exists in the /users/VFS/download directory. If the load does not exist in the directory, ftp the missing NE software load to the /users/VFS/download directory.

Reason-10 The catalogue file associated with the release that the rebooting NE is assigned to does not exist in the /users/VFS/download directory. A log will be generated in the Event Browser indicating "Cannot find Catalog <catalogue name> for release <release name>". Ftp the missing catalogue file to the /users/VFS/download directory.

Reason-11 The catalogue file exists but refers to a different release than that of the rebooting NE. A log will be generated in the Event Browser indicating that Data mismatch between Catalog file occurred. Regenerate the catalogue file for the valid release and ftp it to the OPC.

Reason-12 The NE is being downloaded by the backup OPC. See DLM: Reboot/Load Manager indicates Fail on page 4-5.

Reason-13 The NE load is corrupt. See DLM: NE shelf processor firmware load is Corrupt or Incomplete on page 4-6.
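The address comparison described under Reason-4 can be sketched as a small helper. It assumes both addresses have been copied by hand (one from the tstatc display, one from the fa comm menu on the NE) and follows the rule given in the text: the 49+0000 area code must be present, and the addresses must agree except for the last two digits. The function name and return strings are hypothetical.

```python
AREA_CODE = "49+0000"


def compare_ne_addresses(tstatc_address, ne_address):
    """Compare the address served by dls with the address configured on the NE."""
    if not ne_address.startswith(AREA_CODE):
        # Add the area code on the NE before retrying the download.
        return "area code missing"
    if tstatc_address[:-2] == ne_address[:-2]:
        # Addresses agree apart from the last two digits: the NE side needs investigation.
        return "match"
    # Correct the address on the NE.
    return "different"


if __name__ == "__main__":
    # Example addresses taken from the Reason-4 text.
    print(compare_ne_addresses("49+00003D3141B01B0402", "49+00003D3141B01B0400"))
```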

4.1.2 DLM: Reboot/Load Manager Won't Start

The Reboot/Load Manager won't start.

Reason-1 The OPC is not commissioned. Once the OPC is commissioned, the critical MSRs needed for the Reboot/Load Manager will become available. Use the drmstat tool to ensure the loadisumgr and downloadmgr MSRs are in service and available.


4.1.3 DLM: Reboot/Load Manager is not displaying any NEs

The Reboot/Load Manager comes up okay but there are no NEs visible in the list.

Reason-1 The NEs have not been commissioned. Use the Commissioning Manager to commission the NEs.

Reason-2 The SOC table has been corrupted. Run the spock tool to correct the SOC.

4.1.4 DLM: NE is Continuously Rebooting

After an NE successfully reboots, the NE starts to reboot again.

Reason-1 The NE load is corrupted. See DLM: NE shelf processor firmware load is Corrupt or Incomplete on page 4-6.

Reason-2 The system contains 12 or more NEs and 5 or more of those NEs are on the same CNET. This situation results in excessive messaging on the CNET and causes the download to halt. This is particular to 1:N systems. The problem is caused by 2 or more equal cheapest LAPD routes converging at the CNET at the same time. The work-around is to disable one of the SDCC connections.

Reason-3 The NE control processor has a hardware fault. Connect a VT100 to the crafts port of the NE and watch for hardware faults in the diagnostics displayed. If a fault is detected, send the CP back to Nortel.

Reason-4 FW vintage mismatch on the NE loads. (Note: the FW load is not corrupted and the NE accepts the load, even though the OPC keeps sending the same FW load over and over again.) Open the Event Browser and search for SDA logs indicating the start and finish of a FW download. If repeating FW download logs to the same NE appear in the Event Browser, then the FW vintage mismatch problem exists. Check the firmware vintage value (eighth field) in the catalogue file, in the line containing the OAM_Firm string for the hardware equipment type (for example ringadm, regen, term etc.) that your system is configured for (fifth field). For example:


# field in the OAM_Firm line in a catalogue file:
1  2   3         4     5        6   7  8   9         10     11   12
B  13  OAM_Firm  OC48  ringadm  sp  0  16  LOAD1.LD  40847  525  internalName
B  13  OAM_Firm  OC48  regen    sp  0  16  LOAD1.LD  40847  525  internalName
B  13  OAM_Firm  OC48  term     sp  0  16  LOAD1.LD  40847  525  internalName

where the firmware vintage value is 16. Obtain the firmware vintage value for the shelf main processor firmware load currently running on the NE. Compare the two numbers, the one from the catalogue file and the one from the NE: if they are different, then the FW vintage mismatch problem exists. To correct the problem, re-generate the catalogue file with the firmware vintage indicated by the NE and ftp the new catalogue to the OPC.

4.1.5 DLM: NE is Frozen Immediately After a Reboot

Immediately after an NE has completed downloading, it seems to freeze and not function.

Reason-1 After a reboot occurs, the NE requests a restore from the OPC. If the OPC Backup and Restore Manager does not respond to the NE's request, the NE will continue requesting a backup database until the OPC services the request. See BRM: NE Restore is Not Working on page 6-2.

Reason-2 The NE load is corrupted. See DLM: Reboot/Load Manager indicates Fail on page 4-5.

Reason-3 The NE control processor has a hardware fault. Connect a VT100 to the crafts port of the NE and watch for hardware faults in the diagnostics displayed. If a fault is detected, send the CP back to Nortel.
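The firmware vintage comparison from 4.1.4 Reason-4 can be sketched as a small parser. It assumes the catalogue lines are whitespace-separated exactly as in the example above, with OAM_Firm in the third field, the equipment type in the fifth and the vintage in the eighth; the function names are hypothetical and the real catalogue file may contain other line types that this sketch simply skips.

```python
def firmware_vintages(catalogue_text):
    """Return {equipment_type: vintage} for every OAM_Firm line in the catalogue."""
    vintages = {}
    for line in catalogue_text.splitlines():
        fields = line.split()
        # Field 3 is OAM_Firm, field 5 the equipment type, field 8 the vintage.
        if len(fields) >= 8 and fields[2] == "OAM_Firm":
            vintages[fields[4]] = int(fields[7])
    return vintages


def vintage_mismatch(catalogue_text, equipment_type, ne_vintage):
    """True if the catalogue vintage differs from the vintage reported by the NE."""
    catalogue_vintage = firmware_vintages(catalogue_text).get(equipment_type)
    return catalogue_vintage is not None and catalogue_vintage != ne_vintage


if __name__ == "__main__":
    sample = (
        "B 13 OAM_Firm OC48 ringadm sp 0 16 LOAD1.LD 40847 525 internalName\n"
        "B 13 OAM_Firm OC48 regen sp 0 16 LOAD1.LD 40847 525 internalName\n"
        "B 13 OAM_Firm OC48 term sp 0 16 LOAD1.LD 40847 525 internalName\n"
    )
    print(firmware_vintages(sample))
    print(vintage_mismatch(sample, "ringadm", 15))
```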

4.1.6 DLM: Reboot/Load Manager indicates Fail

The Reboot/Load Manager indicates that a download to an NE has failed.

Reason-1 The NE is going through another reboot cycle. The NE reboot cycle and the OPC download cycle run asynchronously, so an NE can be entering its reboot cycle just as the OPC is about to service the request. This is considered to be normal OPC behavior and the OPC will service the request on the next download request from the NE.


Reason-2 The NE is being serviced by another OPC. The other servicing OPC can be the backup OPC or a portable OPC. When an NE reboots, any OPC which recognizes that NE as part of its SOC can service the NE. Since the backup OPC is always synchronized with the primary, there should be no problem with the NE being serviced by the backup. Once the download has been completed, control will return to the primary OPC. A problem with an NE being downloaded by the portable OPC is that the portable OPC will not relinquish control of the NE after the NE has been downloaded. The only way for the primary OPC to regain control of the NE is to remove the NE from the portable OPC's SOC or shut down the portable OPC using the Shutdown tool.

Reason-3 The NE load is corrupted. See DLM: NE shelf processor firmware load is Corrupt or Incomplete on page 4-6.

4.1.7 DLM: Load Processor Activity is Disabled

The Reboot/Load Manager load processor list item menu entry has been disabled.

Reason-1 Associations to the NE are down. When associations to an NE are not available, the Reboot/Load Manager cannot send the request to the NE to reboot the load.

4.1.8 DLM: NE shelf processor firmware load is Corrupt or Incomplete

The NE download of a shelf processor firmware load is failing as a result of a corrupted load. This is extremely rare. The loads can be checked for validity as follows: if the dlmbfa command is available, it can be used to perform a B block check on the NE load. Usage: dlmbfa integrity <NE shelf processor firmware load>. A failure in a B block check will indicate "Corrupt B-record".

Reason-1 An OPC shutdown was performed while the user was trying to deliver a load via ftp to the OPC. This will cause the load to be incomplete.

Reason-2 The restore operation of an OPC Save and Restore was aborted. This will cause the load to be incomplete. If the file is corrupted or missing, see DLM: NE shelf processor application load is Corrupt or Incomplete on page 4-5.


4.1.9 DLM: NE shelf processor application load is Corrupt or Incomplete

The NE download of a shelf processor application load is failing as a result of a corrupted load. This is extremely rare. The loads can be checked for validity as follows: if the dlmbfa command is available, it can be used to perform a B block check on the NE load. Usage: dlmbfa integrity <NE shelf processor application load>. A failure in a B block check will indicate "Corrupt B-record". To determine whether any load files are missing, type lomui -e validate at the OPC prompt.

Reason-1 An OPC shutdown was performed while the user was trying to deliver a load to the OPC via ftp. This will cause the load to be incomplete. If the file is corrupted or missing, remove the NE shelf processor load using the UNIX rm command and use the /iws/thf/thf_extract tool to retrieve an NE shelf processor load from tape.

Reason-2 The restore operation of an OPC Save and Restore was aborted. This will cause the load to be incomplete. If the file is corrupted or missing, remove the NE shelf processor load using the UNIX rm command and use OPC Save and Restore to perform a restore from tape.

4.2 Solution Description


4.2.1 DLM: NE serial number is corrupted

There are two methods to regenerate an NE serial number. The first involves using the checksum command and the second involves guessing.

Reconstructing NE Serial Number using checksum:
1) Log in to the OPC as root
2) Enter checksum -n NN #####
   where NN = 02 for OC48 LTE, ADM, RING_ADM
              03 for OC48 Regen
              08 for OC12 TBM, ADM, RING_ADM
              0C for STM16 LTE
              0D for STM16 Regen
   where ##### is the last 5 digits of the NE serial identifier received by the OPC. The NE serial identifier can be located in the OPCDLM352 SWERR.
   Example SWERR in /var/log/syslog:
   downloadmgr: OPCDLM352: Unable to download to NE 02081223; not in OPCs span of control. Commissioning of this transport MUX shelf may have been performed incorrectly. Verify the serial number of this NE.
   In the above SWERR the NE serial identifier is 02081223 and the ##### is 81223.
   Example: Assuming the shelf is an OC-48 NE, enter:


   checksum -n 02 81223
   The NE serial number is A2 30281223
3) The serial number of the NE is generated

Reconstructing NE Serial Number using guessing:
1) Log in to the OPC as slat
2) The serial number format of an NE is A2 H NN#####
   where NN and ##### are the same as in the checksum method above
   where H is the checksum (hexadecimal value from 0-f)
3) Open the Commissioning Manager
4) Enter the NE data including the above serial number format, substituting H with a hexadecimal number from 0 to f
   Example: Assuming the shelf is an OC-48 NE, enter: A2 H 02 81223, where H ranges from 0 to F
5) Repeat Step 4 until the serial number is accepted

4.2.2 DLM: Removing Release From the Backup OPC

In order to remove the release from the Backup OPC, the activity of both OPCs must be switched, such that the Primary is inactive and the Backup is active. Once the switch of activity is performed, the lomui commands can be used to remove the release.

Remove release from Backup OPC:
1) Log in to the Backup OPC as root
2) Enter ows_swact -a
3) The Backup OPC will go active and the Primary will go inactive
4) Execute the lomui deleteNE_release -p <product> -r <release> command
   The command will fail if:
   - the release to be deleted is the release commissioned on the OPC
   - the release is used by at least one NE commissioned on the system
5) Log out from the session
6) Enter ows_swact -i
7) The Primary OPC will go active and the Backup will go inactive
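The two NE serial number methods in 4.2.1 can be sketched in a few lines. The sketch only prepares the inputs: it extracts the serial identifier from an OPCDLM352 SWERR line, builds the checksum command line to type at the OPC, and lists the 16 candidate serial numbers for the guessing method. It does not compute the checksum digit itself, since the algorithm is not described here. The function names are hypothetical, and NN is passed in by the caller because it depends on the shelf type.

```python
import re

# Serial identifier as it appears in the OPCDLM352 SWERR (8 characters).
SWERR_ID = re.compile(r"Unable to download to NE ([0-9A-Fa-f]{8})")


def serial_identifier(syslog_line):
    """Return the NE serial identifier from an OPCDLM352 SWERR line, or None."""
    match = SWERR_ID.search(syslog_line)
    return match.group(1) if match else None


def checksum_command(nn, identifier):
    """Build the 'checksum -n NN #####' command line for the checksum method."""
    return "checksum -n {} {}".format(nn, identifier[-5:])


def candidate_serials(nn, identifier):
    """List all 16 candidates 'A2 H NN#####' for the guessing method."""
    return ["A2 {}{}{}".format(h, nn, identifier[-5:]) for h in "0123456789abcdef"]


if __name__ == "__main__":
    line = ("downloadmgr: OPCDLM352: Unable to download to NE 02081223; "
            "not in OPCs span of control.")
    ident = serial_identifier(line)
    print(checksum_command("02", ident))
    print(candidate_serials("02", ident)[:4])
```

With the example SWERR from 4.2.1 and NN = 02, the candidate list includes A2 30281223, the serial number the checksum method reports.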

Chapter 5 Installations & Upgrades

5.1 Problem Description


The following sections describe common release installation problems. Each problem will be described and possible diagnostic reasons will be provided. Available workarounds will be provided at the end of each problem reason description.

5.1.1 SWI: Backup OPC is still active after start_backout completes

After starting the backout with the start_backout utility, the backup OPC is still active and in upgrade mode.

Reason-1 The start_backout command did not tell the backup OPC to exit upgrade mode. Use /tmp/install/util/abort_upgrade to force upgrade mode to be exited.

5.1.2 SWI: OPC Not Functioning After Installation

After an installation using the install_release utility, the OPC is running but none of the MSRs have started.

Reason-1 The /etc/reboot command was not performed. Use /etc/dmesg | more and view /var/log/syslog to determine if the OPC rebooted. Since the OPC MSRs are not running, the only way to shut down the OPC is to use the /etc/reboot command.

5.1.3 SWI: Installation is Failing During Validation

The install_release utility is failing at the very start of the installation. During the validation several tests are performed to ensure the installation is possible.

Reason-1 Not enough disk space. The installation program always checks that there is enough disk space to perform the installation. This should not normally happen, since the OPCCLEAN process is supposed to remove miscellaneous files every night and a software installation is normally only performed after removing the current release. Use view /var/log/syslog and enter /CLEAN to see if the OPCCLEAN process has been running. Remove any miscellaneous files which may have been stored on the OPC by accident. See OPC: OPCCLEAN is Not Running on page 2-6.

Reason-2 The communication channel is unreliable (only applicable for remote release installations). The installation program always checks that there is communication with the remote OPC before starting an installation. Communication between a primary and backup OPC is normally over OWS. If the associations are down then the two OPCs will be unable to communicate and both OPCs will become active. See OWS: Both Primary and Backup OPCs are Active on page 2-2.

Reason-3 The upgrade path is not supported. The installation program always checks that the current release, if any, can be upgraded to the new release. If this error occurs then the OPC must first be upgraded to an intermediate release and then to the final release. If an upgrade is not required and the data presently on the OPC is not needed, then the OPC disk can be initialized and the new release installed. Instead of initializing the disk, an alternative means of cleaning the OPC is to wipe the release. This method is faster but not as thorough as a disk initialization. Use the disk initialization method if there is any doubt about the integrity of the OPC environment. See rmopcld - remove OPC load on page A-24.

5.1.4 SWI: Installation is Failing During Transfer

The install_release utility is failing during the extraction of the files from the tape.

Reason-1 Not enough disk space. This should not normally happen, since the installation program always checks the available disk space before starting an installation. Login to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users asking them to log off or to free up disk space. See USM: Using the wall and write commands on page 1-13.

Reason-2 The tape has been ejected. Re-insert the tape and restart the installation.

Reason-3 There has been a tape read error. Check for a hardware failure. See TAPE: Amber Light is Flashing Rapidly on page 3-3. See TAPE: Green Light is Flashing Slowly on page 3-4.
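Reason-1 of 5.1.3 checks whether OPCCLEAN has been running by searching /var/log/syslog for the string CLEAN. Where a copy of the syslog is available on a workstation, the same search could be sketched in Python as below. The function name and the sample lines are hypothetical; the sample is made up purely to exercise the code and does not reproduce real OPC log output.

```python
def find_keyword_lines(syslog_text: str, keyword: str = "CLEAN") -> list[str]:
    """Return the syslog lines that contain the keyword, in file order."""
    return [line for line in syslog_text.splitlines() if keyword in line]


# Made-up sample lines for illustration only; real OPC syslog entries will differ.
SAMPLE = """\
Aug 10 02:00:01 opc1 OPCCLEAN: nightly cleanup started
Aug 10 02:00:09 opc1 OPCCLEAN: nightly cleanup finished
Aug 10 09:14:30 opc1 downloadmgr: request received
"""

if __name__ == "__main__":
    for line in find_keyword_lines(SAMPLE):
        print(line)
```

To search a real file, pass its contents instead of the sample, for example the text read from a copy of /var/log/syslog. An empty result suggests OPCCLEAN has not logged anything in the retained period.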

5.1.5 NUM: Both Primary and Backup OPCs are Active

The Primary and Backup OPCs are both active and have been active for more than five minutes.

Reason-1 The Network Upgrade Manager is in the middle of upgrading the NEs. During NE upgrades it is normal for both the backup and primary OPCs to be active.

5.1.6 RBN: Disk 95% Full Alarm

The Primary and/or Backup: disk 95% full alarm is active on the OPC.

Reason-1 The hard disk of the OPC is more than 95% full.

NOTE: If this alarm is active, do not proceed with a software upgrade until you have cleared the alarm. After more disk space is available, you must clear the alarm using the OPC Alarm Provisioning tool or wait for the next periodic audit to clear the alarm.
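The 95% threshold in 5.1.6 is a simple ratio of used to total disk space. The Python sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, and the way the OPC itself measures usage is not documented here, so its figure may differ slightly from this calculation (df, for example, can exclude reserved blocks).

```python
import shutil


def percent_used(total_bytes: int, used_bytes: int) -> float:
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def disk_alarm_expected(total_bytes: int, used_bytes: int, threshold: float = 95.0) -> bool:
    """Return True when usage is strictly above the threshold (more than 95% full)."""
    return percent_used(total_bytes, used_bytes) > threshold


if __name__ == "__main__":
    # shutil.disk_usage reports (total, used, free) for the filesystem holding the path.
    usage = shutil.disk_usage("/")
    print(f"{percent_used(usage.total, usage.used):.1f}% used")
    print("alarm expected:", disk_alarm_expected(usage.total, usage.used))
```

The comparison is strict because the text describes the disk as more than 95% full; a disk at exactly 95% does not trip the check in this sketch.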

Chapter 6 Backup/Restore Manager

6.1 Problem Description


The following sections describe common NE backup and restore problems. Each problem will be described and possible diagnostic reasons will be provided. Available workarounds will be provided at the end of each problem reason description.

6.1.1 BRM: NE Backup is Not Working

The NE backup request is failing, or the request is succeeding but the NE FWDB logs indicate the backup request failed.

Reason-1 The NE is not commissioned. Open the Commissioning Manager and ensure the NE has been properly commissioned.

Reason-2 The backuprestoremgr MSR is MANUALBUSY or SYSTEMBUSY. Re-initialize the MSR. Once the re-initialization has completed, make sure the MSR is INSERVICE/AVAILABLE. If the problem persists, contact the appropriate OPC support authority.

Reason-3 The Backup and Restore Manager is not receiving the NE's backup request. Open the tstatc tool and enter d to display the sockets. Watch the brm incoming sockets and determine if the backup requests are being received by the OPC. See BRM: Backup OPC is Handling the NE Database Requests on page 6-3.

Reason-4 The NE associations are down. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-5 Not enough disk space to store the NE database backups. This should not normally happen, since the OPCCLEAN process is supposed to trim miscellaneous files every night. Use view /var/log/syslog and enter ?CLEAN to see if the OPCCLEAN process has been running. Remove any miscellaneous files which may have been stored on the OPC by accident. See OPC: OPCCLEAN is Not Running on page 2-6.

Reason-6 The neshadow MSR is MANUALBUSY or SYSTEMBUSY. Re-initialize the MSR. If the problem persists, contact the appropriate OPC support authority.

Reason-7 There are too many NEs requesting a backup. The Backup and Restore Manager can handle at most 10 backups at one time. However, the NE will retry sending the request, and usually all backups are eventually received by the OPC. A SWERR is generated in the /var/log/syslog file whenever the maximum is exceeded. The message in syslog is socket limit reached. This message appears when there are more than 10 NEs in the SOC and a database backup command is sent to all of them. This is normal and the backup request will be handled at a later time. It is normally recommended to schedule the backups of each NE to occur at different times.

Reason-8 The NE is waiting for the QAPPROVECI alarm to be cleared. The command QAPPROVECI must be entered at the NE UI and the prompted instructions must be followed in order to clear the alarm before an auto-prov or change to the NE database is permitted. Prior to the clearing of this alarm, an NE backup is not permitted.

Reason-9 The backup database on the OPC is missing. Check the alarms on the OPC for Database Missing or outdated alarms. To resolve this, re-initialize the backuprestoremgr MSR. Once the re-initialization has completed, make sure the MSR is INSERVICE/AVAILABLE. After a few minutes the alarm will be cleared and you can initiate a manual NE database backup.

6.1.2 BRM: NE Restore is Not Working

The NE restore request is failing, or the request is succeeding but the NE FWDB logs indicate the restore request failed.

Reason-1 The NE is not commissioned. Open the Commissioning Manager and ensure the NE has been properly commissioned.

Reason-2 The backuprestoremgr MSR is MANUALBUSY or SYSTEMBUSY. Re-initialize the MSR. Once the re-initialization has completed, make sure the MSR is INSERVICE/AVAILABLE. If the problem persists, contact the appropriate OPC support authority.

Reason-3 The Backup and Restore Manager is not receiving the NE's restore request. Open the tstatc tool and enter d to display the sockets. Watch the brm incoming sockets and determine if the restore requests are being received by the OPC. See BRM: Backup OPC is Handling the NE Database Requests on page 6-3.

Reason-4 The NE associations are down. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-5 The neshadow MSR is MANUALBUSY or SYSTEMBUSY. Re-initialize the MSR. If the problem persists, contact the appropriate OPC support authority.

Reason-6 Both the current and the backup1 NE database backups are corrupted. See BRM: NE Database Backups Incomplete or Corrupted on page 6-4.

Reason-7 The NE is waiting for the QAPPROVECI alarm to be cleared. The command QAPPROVECI must be entered at the NE UI and the prompted instructions must be followed in order to clear the alarm before an auto-prov or change to the NE database is permitted. Prior to the clearing of this alarm, an NE backup is not permitted.

Reason-8 No backups are found on the OPC for that particular NE.

Reason-9 Data conversion failed. This occurs when the OPC does not have a backup for the current release but does have one for a previous release. The NE conversion program tried to convert the backup to the new release entity set, but it failed.

Reason-10 There is no NE database backup stored on the OPCs.

6.1.3 BRM: Backup OPC is Handling the NE Database Requests

The NE backup and restore requests are being handled by the backup OPC.

Reason-1 Both primary and backup OPC are active. This is normal for a network partition. See OWS: Both Primary and Backup OPCs are Active on page 2-2.

Reason-2 The Backup OPC (BOPC) is active and the Primary OPC (POPC) is inactive.

6.1.4 BRM: NE Database Backups Incomplete or Corrupted

The NE database backups are corrupted or incomplete. This may result in failure of NE restore operations.

Reason-1 The OPC Save and Restore was aborted in the middle of a restore operation. This may cause the NE database backup to be incomplete. Re-execute the restore and let it run to completion.

Reason-2 The OPC was shut down during an NE database backup operation. This will cause the NE database to be incomplete or corrupted. Restore the NE with the backup1 version of the NE database or restore the NE database backups from tape.

Reason-3 The OPC hard disk data is corrupted. If the restore is coming from the Primary OPC, try switching activity to the Backup OPC and then restore from the Backup OPC.
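Reason-7 of 6.1.1 notes that the Backup and Restore Manager handles at most 10 backups at a time and recommends scheduling NE backups at different times. One way to plan such a schedule is to group the NEs into batches of no more than 10 and give each batch its own start offset. The Python sketch below illustrates only the planning arithmetic: the function name, the NE labels and the 30-minute spacing are assumptions, and the resulting times would still have to be entered through the normal scheduling tools.

```python
def stagger_backups(ne_ids: list[str], batch_size: int = 10,
                    interval_minutes: int = 30) -> dict[str, int]:
    """Map each NE to a start offset in minutes so that no more than
    batch_size NEs share the same start time."""
    if batch_size <= 0 or interval_minutes <= 0:
        raise ValueError("batch_size and interval_minutes must be positive")
    schedule = {}
    for index, ne_id in enumerate(ne_ids):
        # Integer division puts NEs 0-9 in batch 0, 10-19 in batch 1, and so on.
        schedule[ne_id] = (index // batch_size) * interval_minutes
    return schedule


if __name__ == "__main__":
    # Hypothetical NE labels for a span of control with 25 NEs.
    plan = stagger_backups([f"NE{n}" for n in range(1, 26)])
    for ne_id, offset in plan.items():
        print(f"{ne_id}: start + {offset} min")
```

With 25 NEs and the default values, the first ten NEs start at offset 0, the next ten at 30 minutes and the last five at 60 minutes, which keeps each batch within the limit of 10.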

Chapter 7 OPC Save And Restore

7.1 Problem Description


The following sections describe common OPC Save and Restore problems. Each problem will be described and possible diagnostic reasons will be provided. Available workarounds will be provided at the end of each problem reason description.

7.1.1 SRT: Save to Tape is Failing

The save to tape is failing to write to the tape.

Reason-1 The tape is write-protected. Eject the tape and set the write-protect tab to the write position.

Reason-2 There is no tape in the tape drive. Insert a tape.

Reason-3 One of the files the Save and Restore Tool is trying to save cannot be read or is unavailable. Search the Event Browser for SDA logs. The SDA log will display the trouble file. Perform ls -l <trouble file>. If the file is not available, retrieve a copy from the backup OPC. If it is not readable, use the chmod command to change the permissions.

In some cases the problem isn't a file but a directory. Perform ls -ld <trouble directory>. If the directory is not available, retrieve a copy from the backup OPC. If it is not readable, use the chmod command to change the permissions. If the missing directory is of the form /home/<userid> or /usr/<userid> then the missing directory is the home directory of an illegal user. Remove the illegal user from the /etc/passwd file.

In other cases the problem may lie with an NE that sends one backup after another to the OPC. Check the syslog for the name of the missing file. Then perform ls -l, replacing the digits in the name with a wildcard (e.g. change N1.S1.NE32767.EQ1.REL1300.current.T000001FA to N1.S1.NE32767.EQ1.REL1300.current.*). Perform ls every ten seconds or so and observe whether the name of the file changes. If it does, SRT cannot perform the save operation because the file on its list is no longer there. Deal with this problem first, then retry saving to tape.

Reason-4 The flag file /iws/srt/srtsave.lck is present. This file is used to prevent multiple save operations from being requested. To correct the problem, ensure that a save is not in progress and then delete the file manually.

Reason-5 There is a fault with the tape or tape drive. Insert a new tape. See TAPE: Amber Light is on on page 3-3. See TAPE: Amber Light is Flashing Rapidly on page 3-3. See TAPE: Green Light is Flashing Slowly on page 3-4. See TAPE: Green Light is Flashing Rapidly on page 3-4.
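Reason-3 of 7.1.1 replaces the changing part of an NE backup file name with a wildcard and then watches whether the matching name changes between listings. The Python sketch below shows the same pattern construction and matching on a plain list of names. The function names are hypothetical, and treating everything after the last dot as the changing part is an assumption based on the single example name given in that reason.

```python
import fnmatch


def backup_pattern(filename: str) -> str:
    """Replace the last dot-separated component of the name with '*'."""
    stem, dot, _changing_part = filename.rpartition(".")
    if not dot or not stem:
        raise ValueError("expected a name with at least one '.' separator")
    return stem + ".*"


def matching_names(pattern: str, names: list[str]) -> list[str]:
    """Return the names that match the pattern (case-sensitive), sorted."""
    return sorted(name for name in names if fnmatch.fnmatchcase(name, pattern))


if __name__ == "__main__":
    pattern = backup_pattern("N1.S1.NE32767.EQ1.REL1300.current.T000001FA")
    print(pattern)
```

Calling matching_names on two directory listings taken about ten seconds apart and comparing the results shows whether the NE is still rewriting its backup file.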

7.1.2 SRT: Restore from Tape is Failing

The restore from tape is failing to read from the tape.

Reason-1 There is no tape in the tape drive. Insert a tape.

Reason-2 There is a fault with the tape or tape drive. Try to read from the tape by using the /iws/thf/thf_extract -l command. See TAPE: Amber Light is on on page 3-3. See TAPE: Amber Light is Flashing Rapidly on page 3-3. See TAPE: Green Light is Flashing Slowly on page 3-4. See TAPE: Green Light is Flashing Rapidly on page 3-4.

7.1.3 SRT: Critical Files are Missing After a Restore

After a restore using the Save and Restore Tool, critical files are missing. The critical files can be any of the files that the Save and Restore Tool saves to tape. See SRT: Files saved by Save and Restore Tool on page 7-3.

Reason-1 The restore was aborted in the middle of reading the tape. Aborting the restore includes such actions as shutting down the OPC, powering down the OPC, killing the srt process, busying critical resources and cancelling the restore. During the process of restoring an OPC, the Save and Restore Tool first erases disk files and then replaces them with those extracted from tape. If the restore is aborted before or during the file replacement then those files will be missing or corrupted. After a restore is aborted, it is always recommended to perform a complete restore from tape.

7.2 Solution Description


7.2.1 SRT: Files saved by Save and Restore Tool

The following is a list of the types of files saved by the Save and Restore Tool. Note that only the files resident on the local OPC are stored to tape, not those of the remote peer:

- Directory Service - used to establish associations between OPC and NEs
- Network Element Database Backups - as administered by the Backup/Restore Manager
- Network Element Software loads
- OPC Database - OPC object data and PM data
- Span of control table - defines NEs commissioned in this OPC's span of control
- Commissioning, Remote Telemetry, Access, toolset data passwords, system addresses, system configuration, Xterminal configuration, communications configuration, user configuration
- 1 for N database

The following is a list of files saved by the Save and Restore Tool (this list is probably out of date):

/etc/csh.login
/etc/hosts
/etc/netlinkrc
/etc/osihosts
/etc/profile
/etc/rc
/etc/x25init_scc0
/etc/osiservices
/etc/osiaet
/etc/x25/x29hosts
/etc/x25/x3config
/etc/xntp/xntpd.conf
/etc/xntp/xntpd_osi.conf
/etc/authparms
/etc/services
/etc/inetd.conf
/home/<home directories defined in /etc/passwd>
/iws/brm/nedsb/<NE database backups>
/iws/srt/srttmp/dsadir.dir
/iws/srt/srttmp/dsadir.pag
/iws/srt/srttmp/oomdb.dir
/iws/srt/srttmp/oomdb.pag
/iws/uie/uievars.text
/iws/xdm/Xservers
/iws/tir/x25_pid_p
/iws/tir/x25_pid_b
/iws/tir/opc_tid_p
/iws/tir/opc_tid_b
/iws/ops/opscrdir.file
/iws/las/data/opcalmprov
/iws/via/x25_pid
/iws/X11/fonts/usr/lib/X11/ncd/configs
/iws/nma/nma_pm_report_flags.*
/iws/opm/opmconf.text
/iws/tl1/tl1tpid.text
/iws/dlm/opcspan.tbl
/iws/scf/opccom/opcname.tbl
/iws/rtu/rtu_ne
/iws/srt/srtsch.flag
/iws/dlm/opcspan.tbl
/iws/tl1/tl1_tid
/iws/stp/cot/cot.data
/iws/hbt/util/hbt.audit
/iws/hbt/util/baseline.mod
/.psurc
/tmp/opcdb.tar (entire OPC database)
/tmp/sch_db_dir/sch_ljob_db.dir
/tmp/sch_db_dir/sch_ljob_db.pag
/tmp/sch_db_dir/sch_type_db.dir
/tmp/sch_db_dir/sch_type_db.pag
/tmp/sch_db_dir/schinsts
/tmp/sch_db_dir/schtypes
/usr/etc/ows_reachability
/usr/etc/osiadminrc
/usr/adm/inetd.sec
/usr/etc/oc3ts.dat
/users/VFS/download/<NE loads>

The list of files may change from load to load. The current list is in the PLS, in the file srtsys.<xx00>.

7.2.2 SRT: Unable to Restore OPC Data from Disk

OC-48 has a critical need to find out how to put OPC Save & Restore files recovered from a customer tape onto a new tape in the lab environment.

This process is needed when a customer has an outage and designers want to restore a lab OPC with the customer's data. The customer has a tape created with OPC Save and Restore, and can extract the four SRT_* files to our server immediately following an outage. OC-48 field support needs a way to do the following:

1. Go to the OPC Save and Restore tool and perform an OPC Save.
2. Extract the OPC save files that were put on the tape onto a workstation with thf_extract -l.
3. This produces four files: SRT_TAPE_ARCHIVE (which appears to be a big .tar file), SRT_TAPE_CONFIG, SRT_TAPE_INDEX, and SRT_ARCHIVE_CHECKSUM.
4. Take those files and, through some process similar to what the OPC itself does when it performs an OPC save, put them onto another NEW tape.
5. Take the new tape just created and do an OPC Restore with that tape.

Solution 1

Process to restore the OPC archive (image) on the OPC from the archive file:

a) Use the /iws/thf/thf_extract -f /iws/srt/SRT_TAPE_ARCHIVE command to extract the archive file from the tape.
b) Upload this file to your OPC.
c) Execute the /iws/srt/srtsystem PREP_RESTORE command. This command puts the OPC out of service (busies MSRs, etc.).
d) OPTIONAL: Use this step only if you want to restore all OPC data including users and passwords. Do not use it if you don't know the customer's password, otherwise you may not be able to log in.
   Execute the /iws/srt/srtuser.text PREP_RESTORE command. This command prepares the list of users and passwords on your machine so that it can be compared with the restored data.
e) Use the tar xf SRT_TAPE_ARCHIVE command to untar the archive. Make sure that the file /tmp/opcdb.tar exists.
f) Execute the /iws/srt/srtsystem RESTORE command. This command restores the OPC database from the archive backup file and then returns the OPC to service.
g) OPTIONAL: Use this step only if you want to restore all OPC data including users and passwords. Do not use it if you don't know the customer's password, otherwise you may not be able to log in.
   Execute the /iws/srt/srtuser.text RESTORE command. This restores the users from the archive.
h) Reboot the OPC to restore internal variables.

Process to create the OPC archive file on disk (not on tape):

This may be useful if you have access to the customer's OPC or if there is no tape drive on the OPC. You can create the archive on the OPC disk and bypass the tape creation/extraction part of the process.

a) Make sure you have enough free space on the disk. You can use the command /iws/srt/srtsystem SAVE_DISK_SPACE to determine how much space you need.
b) Execute the tar cf <archive.tar> `srtsystem SAVE` command to build the archive file. Here <archive.tar> is the path and filename of the archive.
c) ftp <archive.tar> to your OPC and use process #5 to extract the archive.
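Step e) of Solution 1 asks you to confirm that /tmp/opcdb.tar exists after untarring SRT_TAPE_ARCHIVE. If the archive has been copied to a workstation with Python, the same check can be made before any restore is attempted by listing the archive members. The sketch below is an illustration only: the function names are hypothetical, and it assumes that SRT_TAPE_ARCHIVE is a plain tar file whose member names may or may not carry a leading "/" or "./".

```python
import io
import tarfile

REQUIRED_MEMBER = "tmp/opcdb.tar"  # /tmp/opcdb.tar with the leading "/" removed


def has_opc_database(fileobj) -> bool:
    """Return True if the tar archive read from fileobj contains tmp/opcdb.tar."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        names = archive.getnames()
    # Tolerate "/tmp/...", "./tmp/..." and "tmp/..." spellings of the member name.
    return any(name.lstrip("./") == REQUIRED_MEMBER for name in names)


def build_demo_archive(member_names: list[str]) -> io.BytesIO:
    """Build a small in-memory tar archive, used only to exercise the check."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in member_names:
            payload = b"demo"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = build_demo_archive(["etc/hosts", "tmp/opcdb.tar"])
    print("OPC database present:", has_opc_database(demo))
```

For a real archive, open the file in binary mode and pass the file object, for example with open("SRT_TAPE_ARCHIVE", "rb") as the source. A False result suggests the save was incomplete and the restore should not be started from that file.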

Chapter 8 Commissioning Manager

8.1 Problem Description


The following sections describe common commissioning problems. Each problem will be described and possible diagnostic reasons will be provided. Available work-arounds will be provided at the end of each problem reason description.

8.1.1 SCF: Can't Enable Clear Commissioning Button

The Clear Commissioning button is grayed out and is unavailable.

Reason-1 The Clear Commissioning button is only enabled for OPCs which do not match any of the serial numbers listed in the commissioning database. To enable the Clear Commissioning button, use the Enable Clear Commissioning tool (available from OPCUI). See Complete set of OPC Tools on page A-1. See SCF: Clearing Commissioning Data on page 8-3.

8.1.2 SCF: Can't Commission a New NE

Unable to add another NE.

Reason-1 The maximum number of NEs that can be supported has been reached. The maximum number of NEs is 34. The maximum number of LTEs is 24. Other span of control rules are described in the NTP.

Reason-2 An NE already exists in the network which has the same NEid and/or serial number. If the problem is related to the NEid then change the NEid to another value. If the problem is related to the serial number then contact the owner of the other NE to sort out the serial numbers.

Reason-3 The commissioning and/or SOC information has been corrupted. Run spock to align the database and file information.
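The first two reasons in 8.1.2 amount to simple checks that can be made on paper before a new NE is added: the 34-NE and 24-LTE limits, and uniqueness of the NEid and serial number. The Python sketch below expresses them under stated assumptions. The NE record and function names are hypothetical, the LTE limit is interpreted as a count of NEs of type LTE, and the other span of control rules in the NTP are not modelled. Note that the source describes NEid and serial number uniqueness as network-wide, so the list passed in would need to cover every known NE, not just one span of control.

```python
from dataclasses import dataclass

MAX_NES = 34   # maximum NEs, from 8.1.2 Reason-1
MAX_LTES = 24  # maximum LTEs, from 8.1.2 Reason-1


@dataclass(frozen=True)
class NE:
    ne_id: int
    serial: str
    ne_type: str  # for example "LTE" or "ADM"


def commissioning_problems(existing: list[NE], new: NE) -> list[str]:
    """Return the reasons, if any, why adding the new NE would be rejected."""
    problems = []
    if len(existing) >= MAX_NES:
        problems.append(f"maximum of {MAX_NES} NEs already reached")
    lte_count = sum(1 for ne in existing if ne.ne_type == "LTE")
    if new.ne_type == "LTE" and lte_count >= MAX_LTES:
        problems.append(f"maximum of {MAX_LTES} LTEs already reached")
    if any(ne.ne_id == new.ne_id for ne in existing):
        problems.append(f"NEid {new.ne_id} is already in use")
    if any(ne.serial == new.serial for ne in existing):
        problems.append(f"serial number {new.serial} is already in use")
    return problems


if __name__ == "__main__":
    # Hypothetical data: one existing NE and a new NE that reuses its NEid.
    current = [NE(1, "A2 30281223", "LTE")]
    print(commissioning_problems(current, NE(1, "A2 40281224", "ADM")))
```

An empty list means none of these particular checks failed; it does not guarantee that the Commissioning Manager will accept the NE.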

8.1.3 SCF: Error Message - This OPC contains invalid data

The Commissioning Manager tool displays the error message: This OPC contains invalid data.

Reason-1 The commissioning and/or SOC information has been corrupted. Run spock to align the database and file information.

Reason-2 The commissioning data is corrupted. Clear all commissioning data. See SCF: Clearing Commissioning Data on page 8-3.

8.1.4 ODS: Data Synchronization is Failing

See ODS: Data Synchronization Fails on page 2-3.

8.1.5 SCF: NE Release is Set to NONE

The list of available NE releases in the Commissioning Manager is set to NONE.

Reason-1 There are no loads available in the /iws/lom/loads directory. Use the install_release tool to deliver NE loads to the OPC. Note that the install_release tool will not become available until the OPC system data has been commissioned. Once the NE loads are delivered, close the Commissioning Manager and re-open it to read the new NE load information. Commission the new NEs.

8.1.6 SCF: Cannot Edit the Commissioned NE

The NE commissioning information cannot be changed.

Reason-1 Associations to the NE are down. The NE commissioning information cannot be edited unless associations are available. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-2 The new NE type change is incompatible with the NE. The NE will reject any commissioning change that it cannot support. To force a change to an NE, first delete the NE and then add it back again with the new changes.

Reason-3 Editing of the NE data is not supported for all NE types.

8.2 Solution description

8.2.1 SCF: Replacing A Backup OPC

See NTP Module Replacement Procedures, Replacing an OPC module.

8.2.2 SCF: Replacing A Primary OPC

See NTP Module Replacement Procedures, Replacing an OPC module.

8.2.3 SCF: Clearing Commissioning Data

The procedure listed below will clear all commissioning data. Note: save the commissioning data before clearing it. The data can be retrieved later to re-commission the system to its original state, if required.

Clearing Commissioning Data:
1) Login to the primary OPC, as root
2) Store all previous commissioning data to a file. See SCF: Dumping All Commissioning Data to File on page 8-3.
3) Enter opcui
4) Open the enable clear commissioning tool
5) Select enable clear commissioning
6) Quit the tool
7) Open the Commissioning Manager
8) Select the Clear Commissioning button
9) Wait for the data to get cleared
10) Commission the OPC

8.2.4 SCF: Dumping All Commissioning Data to File

This procedure saves all commissioning information to a data file.

Dumping Commissioning Data:
1) Login to OPC, as root
2) Enter /iws/scf/scct -d > /tmp/commissioning.data.old

NOTE: scfxt -d also works. In both cases the OPC must be active.

Chapter 9 Network Surveillance

9.1 Problem Description


The following sections describe common surveillance problems. Each problem will be described and possible diagnostic reasons will be provided. Available work-arounds will be provided at the end of each problem reason description.

The network surveillance tools include:
- Network Summary
- Alarm Monitor
- Network Browser
- Event Browser

9.1.1 LAS: Network Surv Tools Display ? Symbol

There are ? symbols displayed on the network surveillance tools.

Reason-1 The ? indicates that the alarm information is out of date and should not be trusted. The ? will normally go away within 30 minutes, when the alarm audit runs. It is not necessarily indicative of a loss of association. See CAM: Associations are Down or are Unstable on page 2-4.

9.1.2 LAS: OPC Alarm View Doesn't Match NE Alarm View

The alarm counts shown on the network surveillance tools do not agree with the alarm counts shown on the NE.

Reason-1 Heavy messaging between OPC and NE may be causing alarm update messages to be lost. Wait 30 minutes for a regular alarm audit to occur. The audit will cause the alarm information to align with the NE view. A quick alarm audit (5 minutes) can be arranged to retrieve the lost alarms on the OPC.

Reason-2 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-3 Periodic alarm audits are not occurring. If after 30 minutes the OPC alarm view still does not match the NE alarm view, re-initialize the logalarmsystem MSR. Shutting down the OPC using the OPC Shutdown tool will also work.

Reason-4 An NE has recently been added or deleted. This problem becomes more prominent during heavy OPC-NE messaging. To correct this problem, re-initialize the logalarmsystem MSR. Shutting down the OPC using the OPC Shutdown tool will also work.

Reason-5 There has been a corruption in the LAS database. Reset the LAS database. Note that all the alarm and log information will be lost.

Reason-6 There has been a memory corruption in the LAS cache. Re-initialize the logalarmsystem MSR or shut down the OPC.

9.1.3 LAS: CMT Status Line Doesn't Match the NE Alarm Banner

The CMT status line, located at the bottom of the CMT session manager, does not match the counts displayed at the NE alarm banner, but the other network surveillance tools are correct.

Reason-1 The status tool has hung. The NE counts and the time freeze on the status line when the status tool is hung. To correct the problem, log out and log in again.

9.1.4 LAS: Network Surv Tools Don't Display Newly Added NEs or NE Names

Adding a new NEid does not update the surveillance tools. Changing or creating an NE name does not update the surveillance tools.

Reason-1 The update message was not received. The work-around is to close and re-open the tool.

Reason-2 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-4.

9.1.5 LAS: Event Browser Filter Settings Changed

The filter settings for the Event Browser have changed since the last invocation.

Reason-1 Multiple users logged in with the same userid have the Event Browser open. Only one set of filter settings is saved for any given userid. Login to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.

9.1.6 LAS: Alarm Monitor Isn't Displaying All Alarms

The Alarm Monitor isn't showing all the alarms or is not updating properly.

Reason-1 The Alarm Monitor filter selection is set improperly. Change the filter to the appropriate values and save the settings.

Reason-2 The auto-update option has been turned off. Turn on the auto-update option.

Reason-3 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-4 The Alarm Monitor discriminator has been disabled because of heavy messaging. Check the OPC syslog. The discriminator will be enabled once the system has stabilized.

9.1.7 RBN: RBNDISK is Not Running

If RBNDISK is not running there is a chance that the OPC disk will become more than 95% full. There should be events recorded in the /var/log/syslog file indicating that RBNDISK was run each hour.

Reason-1 The rbndisk script is not available or not executable. Perform ll /iws/rbn/rbndisk. If the script is not found, retrieve a copy from another OPC or extract the file from another tape or flash card. If the permissions are not set properly, change them with the chmod command.
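Reason-1 of 9.1.7 distinguishes two cases: the rbndisk script is missing, or it is present but not executable. The Python sketch below classifies a path in the same way. It is only an illustration of the check that ll performs by eye: the function name is hypothetical, and the result reflects the permissions of the user running it rather than those of the process that normally runs rbndisk.

```python
import os


def script_status(path: str) -> str:
    """Classify a script path as 'missing', 'not executable' or 'ok'."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    print(script_status("/iws/rbn/rbndisk"))
```

A result of "missing" corresponds to retrieving a copy of the script, and "not executable" to correcting the permissions with chmod.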


9.1.8 RBN: Disk 95% Full Alarm

The Primary and/or Backup: disk 95% full alarm is active on the OPC.

Reason-1 The hard disk of the OPC is more than 95% full.

NOTE After more disk space is available, you must clear the alarm manually using the OPC Alarm Provisioning tool or wait for the next periodic audit to clear the alarm.
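To see which file system is over the 95% mark, the output of df can be filtered. The following is a sketch only: it assumes the OPC's df accepts the POSIX -P option (not confirmed by this guide), and the function name over_threshold is hypothetical.

```shell
# Read `df -P` style output on stdin and print "<mount point> <percent>"
# for every file system whose use is above the threshold (default 95,
# matching the "more than 95% full" wording of the alarm).
# over_threshold is a hypothetical helper name.
over_threshold() {
    awk -v limit="${1:-95}" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)            # strip the trailing percent sign
        if (pct + 0 > limit + 0) print $6, pct
    }'
}

# On the OPC (assuming df -P is available):
#   df -P | over_threshold
```

A file system at exactly 95% is not reported, since the alarm condition is "more than 95% full".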

9.2 Solution Description


9.2.1 LAS: Dumping Contents of the LAS Database

To view all logs and alarms, issue the following commands:

lasdump        to dump all OPC logs (the neldump tool performs the same action)
lasaldmp       to dump all active alarms
lasaldmp -a    to dump all historical alarms
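When the dumps are needed for later analysis, they can be captured to files in one pass. The following is a minimal sketch under two assumptions that this guide does not confirm: the tools write their dump to standard output, and they are on the PATH. The function name and the output file names are hypothetical.

```shell
# Run each LAS dump command named in 9.2.1 and save its output under $1.
# ASSUMPTION: the tools print their dump on standard output.
# save_las_dumps and the file names are illustrative only.
save_las_dumps() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    for spec in "lasdump:logs" "lasaldmp:active_alarms" "lasaldmp -a:historical_alarms"; do
        cmd=${spec%%:*}      # command line, e.g. "lasaldmp -a"
        name=${spec#*:}      # base name of the output file
        tool=${cmd%% *}      # first word of the command line
        if command -v "$tool" >/dev/null 2>&1; then
            $cmd > "$outdir/$name.txt" 2>&1
            echo "$name: saved"
        else
            echo "$name: $tool not found, skipped"
        fi
    done
}

# On the OPC, as root:
#   save_las_dumps /tmp/lasdumps
```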

Chapter 10

NE Login

10.1 Problem Description


The following sections describe common NE login problems. Each problem will be described and possible diagnostic reasons will be provided. Available work-arounds will be provided at the end of each problem reason description.

10.1.1 NEA: NEs are missing

The NE list of the NE Login Manager is incomplete.

Reason-1 The NE Login Manager does not dynamically update the NE list. Select the Refresh List button or close and re-open the tool to update the list.

Reason-2 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-4.

10.1.2 NEA: More NEs Displayed Than Commissioned

The NE list of the NE Login Manager has more NEs than are commissioned in the system.

Reason-1 The NE Login Manager traverses all the SOCs in a network and displays all NEs that are visible to the OPC, including those of the other SOCs.

10.1.3 NEA: Duplicate NEs Error Message

The Duplicate Error Message is displayed when the NE Login Manager is opened.

Reason-1 Another NE with the same NEid was found. The NE Login Manager can traverse many SOCs in a network; as a result, an NE with the same NEid may be found. As root, enter nnsmon -d | more and determine if another NE with the same NEid exists. If this is the case, the NEids must be changed to something unique throughout the entire network.

10.1.4 NEA: NE Access is very slow

The response time for the NEA is very slow.


Reason-1 The OPC-NE messaging is very high. This can be a result of an NE download, excessive alarm messaging or too many users. See USM: Excessively Slow on page 1-6.

Reason-2 Too many NE sessions open. The OPC can only support a maximum of 4 NE sessions per NE, with a maximum of 12 NE sessions in total for an OPC. Login to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.

10.1.5 NEA: NE Access is not available

Cannot log into an NE.

Reason-1 Too many NE sessions open. The OPC can only support a maximum of 4 NE sessions per NE, with a maximum of 12 NE sessions in total for an OPC. Login to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.

Reason-2 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-3 Duplicate NEid. See NEA: Duplicate NEs Error Message on page 10-1.

Reason-4 The RMAP executable is missing or is non-executable. Perform ll /iws/rmp and assure the following line is displayed:

-rwxr-xr-x 1 29470 538 68251 Oct 23 19:14 rmapcl

Note that the size, date and time will vary. If the executable is missing then it will need to be replaced. If the attributes are not set correctly then use chmod to set the attributes correctly.

The OC-3 Express and OC-192 NEs use the telnet protocol. Perform ll /iws/tel and assure the following line is displayed:

-rwxr-xr-x 1 root 3697 56857 Sep 15 04:16 nelogin

Reason-5 Userids and passwords are not set correctly. Check the Centralized User Administration tool and force an audit to the NEs. See USM: Cannot Login on page 1-2.

Reason-6 The NE Login Manager is stuck. Try to login using the nelogin command. If nelogin works then there is a problem with the NE Login Manager itself; contact the appropriate OPC support authority. See NEA: Using nelogin to Access NEs on page 10-4.

10.1.6 NEA: NE Login Manager will not Start

The NE Login Manager will not start up.

Reason-1 The NE Login Manager executables are missing or are non-executable. Perform ll /iws/nea and assure the following lines are displayed:
-r--r--r--   1 root 519   14523 Oct 24 00:11 nea.uidt
-rwxr-xr-x   1 root 519   54804 Oct 24 00:11 neainit
-rwxr-xr-x   1 root 519     739 Oct 24 00:11 neainstl
-rwsr-xr-x  11 root 519 1940084 Oct 24 00:10 nealct
-r--r--r--   1 root 519    7104 Oct 24 00:11 neaohelp.help
-rwsr-xr-x  11 root 519 1586295 Oct 24 00:10 neaxt

Note that the size, date and time will vary. If the executables are missing then they will need to be replaced. If the attributes are not set correctly then use chmod to set the attributes correctly.

10.1.7 NEA: Cannot Auto-Login to NE from OPC

From the OPC, the auto-login command is failing to log into an NE.

Reason-1 Logged into OPC as root. The root userid does not exist on the NE, so root cannot auto-login to an NE. Perform a regular login using a valid userid and password.

Reason-2 Logged into OPC as a user without NE access. Since the userid does not have NE access, auto-login is disabled. Perform a regular login using a valid userid and password.


Reason-3 The OPC and NE databases have become out of sync. This can occur as a result of a restore from an older database backup or an auto-prov. Restore the NE with the current database backup.

Reason-4 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-5 The NE type is OC-3 Express. This NE does not support the auto-login feature.
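The file checks in 10.1.5 (Reason-4) and 10.1.6 (Reason-1) come down to comparing the permission string shown by ll against the expected one. The following is a minimal sketch of that comparison. The function names mode_of and check_modes are hypothetical, and ls -ld is used on the assumption that it shows the same permission string as ll.

```shell
# Print the 10-character permission string of a file, as shown by ls -l.
mode_of() {
    ls -ld "$1" 2>/dev/null | cut -c1-10
}

# Read "<expected-mode> <file name>" pairs on stdin and compare each
# against the file of that name in the directory given as $1.
# check_modes is a hypothetical helper name.
check_modes() {
    dir=$1
    while read -r want name; do
        have=$(mode_of "$dir/$name")
        if [ -z "$have" ]; then
            echo "$name: missing"
        elif [ "$have" != "$want" ]; then
            echo "$name: mode $have, expected $want"
        else
            echo "$name: ok"
        fi
    done
}

# On the OPC, using the listing from 10.1.6:
#   check_modes /iws/nea <<'EOF'
#   -r--r--r-- nea.uidt
#   -rwxr-xr-x neainit
#   -rwxr-xr-x neainstl
#   -rwsr-xr-x nealct
#   -r--r--r-- neaohelp.help
#   -rwsr-xr-x neaxt
#   EOF
```

Only the permission string is compared, because the size, date and time vary between loads.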

10.2 Solution Description


10.2.1 NEA: Using nelogin to Access NEs

The following method of accessing NEs is not supported. Using this method will bypass the safety mechanisms that prevent excessive numbers of NE login sessions from being opened.

Using nelogin to Access an NE:

1) Login to the OPC as admin
2) Open a UNIX shell
3) Enter cd /iws/tel
4) Enter nnsmon -d
5) A list of available nodes will be displayed
6) Enter nelogin x where x=NEid (NE.1.1.x)
7) Enter userid and password

If the user is required to log data (e.g. lasdump, xterm logging), the user should login to the OPC as root and not start OPCUI.
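Because nelogin bypasses the session safeguards, it can be worth tallying the known sessions by hand before opening another one. The following is a sketch only: this guide does not name a tool that lists NE sessions, so the input format (one line per open session, NEid then userid) and the function name check_session_limits are assumptions. The limits themselves (4 sessions per NE, 12 per OPC) are taken from 10.1.4.

```shell
# Read one open NE session per line ("<NEid> <userid>") and report any
# breach of the limits quoted in 10.1.4: 4 sessions per NE, 12 per OPC.
# The input format is hypothetical; build the list from whatever record
# of open sessions is available.
check_session_limits() {
    awk '{ per_ne[$1]++; total++ }
         END {
             for (ne in per_ne)
                 if (per_ne[ne] > 4)
                     print "NE " ne ": " per_ne[ne] " sessions (limit 4)"
             if (total > 12)
                 print "OPC: " total " sessions (limit 12)"
         }'
}
```

No output means the listed sessions are within both limits.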

Chapter 11

Remote Telemetry

11.1 Problem Description


The following sections describe common remote telemetry problems. Each problem will be described and possible diagnostic reasons will be provided. Available work-arounds will be provided at the end of each problem reason description.

11.1.1 TBOS: No NE Listed in Remote Telemetry Tool

When the Remote Telemetry tool is opened there are no TBOS port listings.

Reason-1 By default there are no TBOS ports set on an NE, therefore no TBOS listings will appear on the Remote Telemetry tool. Login to the NE and add some TBOS ports.

Reason-2 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-30.

11.1.2 TBOS: NEs Missing on Remote Telemetry Tool

The Remote Telemetry tool does not display all the NEs with active TBOS ports.

Reason-1 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-30.

Reason-2 TBOS information was lost. Login to the NE, then delete and re-add the TBOS port.

11.1.3 TBOS: TBOS Display Screen Is Frozen

The TBOS serial display equipment is frozen.

Reason-1 The cable connecting the NE TBOS port to the TBOS display equipment is faulty or loose. Check the cable and connections.


Reason-2 The NE TBOS port state is OOS. Login to the NE and change the TBOS port state to IS.

Reason-3 The NE TBOS port status is OFF. Login to the NE and change the TBOS port status to ON.

Reason-4 TBOS display equipment is not set correctly. Verify the settings with the manufacturer of the equipment.

Reason-5 TBOS display equipment cannot display past an empty port. Verify with the manufacturer of the equipment. If the display equipment cannot display past empty ports then login to the NE and re-assign the ports to assure none are empty.

11.1.4 TBOS: Serial Telemetry for Remote Display is Incorrect

The remote display of the serial telemetry port is not updating correctly or contains incorrect data.

Reason-1 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-30.

Reason-2 The eventhandler MSR is not updating properly. Perform a view /var/log/syslog and enter ?ems. If the eventhandler is SWERRing then re-initialize the MSR. If the problem persists, contact the appropriate OPC support authority.

11.1.5 TBOS: Parallel Telemetry for Remote Display is Incorrect

The remote display of the parallel telemetry port is not updating correctly or contains incorrect data.

Reason-1 The parallel port assignments have changed or are incorrect. Login to the NE and verify that the settings for the parallel port are correct.

Reason-2 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-30.


Reason-3 The eventhandler MSR is not updating properly. Perform a view /var/log/syslog and enter ?ems. If the eventhandler is SWERRing then re-initialize the MSR. If the problem persists, contact the appropriate OPC support authority.

11.1.6 TBOS: Data Selector for Monitor Display is Disabled

All selections for the Remote Telemetry monitored display data selector have been grayed out.

Reason-1 The NE has already been assigned Monitor Display 2. There is no need to assign it again. To move Monitor Display 2 elsewhere, delete the assignment and re-add it.

Reason-2 Monitor Display 2 has already been assigned twice in the SOC. To re-assign Monitor Display 2, delete one of the assignments and re-add it elsewhere.

11.1.7 TBOS: Remote Telemetry Tool Won't Open on Active OPC

The Remote Telemetry tool will not open on the active OPC.

Reason-1 An instance of the tool is already open. Login to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users asking them to log off or close tools. See USM: Using the wall and write commands on page 1-23.

Reason-2 The primary OPC is not active. Use the Remote Telemetry tool at the backup OPC (do not perform remote telemetry provisioning at the backup) or use the ows_swact command to make the primary OPC active and the backup OPC inactive. See OPC Tools on page A-137.

Reason-3 MSRs which are critical for the Remote Telemetry tool are not running. Use the drmstat tool to determine the state of the MSRs. All MSRs, except possibly TL1X25SM, should be in the INSERVICE/AVAILABLE state. Initialize any MSR which is not INSERVICE/AVAILABLE. See OPC Tools on page A-137.


11.1.8 TBOS: Source NE and Source Display Unknown

The following warning is shown: "There are mapping assignments whose Source NE and Source Display are unknown. Please reassign or delete these mappings."

Reason-1 Provisioning of the Remote Telemetry tool was performed at the backup OPC without performing a data sync to the primary OPC. The alarm is indicative of a misalignment between the primary and backup OPC. Align the primary to the backup OPC by performing a data sync, or delete and re-add the TBOS settings at the primary OPC. See ODS: Data Sync from Backup to Primary on page 2-34.

Reason-2 The NE has restarted with an older version of a database backup. The TBOS settings are stored as part of the NE database backup; an older version may have obsolete information. Restore the NE with the current NE database backup, or delete and re-add the TBOS settings at the primary OPC.

Reason-3 The backup OPC became active before a data sync from primary to backup could occur. The alarm is indicative of a misalignment between the primary and backup OPC. Align the primary to the backup OPC by performing a data sync, or delete and re-add the TBOS settings at the backup OPC.

Reason-4 The NE was deleted from the Commissioning Manager. Delete the TBOS setting at the primary OPC.

11.1.9 TBOS: Maximum Number of Display Mappings Reached

The following warning is shown: "The maximum number of display mapping to the indicated TBOS port has been reached."

Reason-1 No port position was supplied and the Remote Telemetry tool was unable to select the first empty position because all the positions have already been assigned, either locally or remotely. Select an assigned position and re-assign it with the desired port.

11.1.10 TBOS: Position ID Field is Invalid

The following warning is shown: "The display Mapping cannot be performed because the Position ID field is invalid."


Reason-1 The given Position ID is out of range. The Position ID must be a number between 1 and 8, or empty. An empty Position ID will automatically select the first available empty position.
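The rule above is simple enough to check before a value is entered in the tool. The following is a minimal sketch; the function name valid_position_id is hypothetical, and rejecting forms such as "01" is a choice made here, not something this guide specifies.

```shell
# Succeed (exit status 0) when the Position ID is empty or a single
# digit from 1 to 8; fail otherwise.
# valid_position_id is a hypothetical helper name.
valid_position_id() {
    case "$1" in
        ''|[1-8]) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example:
#   valid_position_id "$pos" || echo "Position ID must be 1 to 8, or empty"
```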

11.1.11 TBOS: Monitored Source Field is Invalid

The following warning message is shown: "The display Mapping cannot be performed because the Monitored Source field is empty or invalid."

Reason-1 The given Monitored Source is empty or invalid. A Monitored Source NE must be selected for every assignment request. There is a data selector of available NEs for this field.

11.1.12 TBOS: Monitored Source Name Doesn't Correspond to Source ID

The following warning is shown: "The display Mapping cannot be performed because the Monitored Source name does not correspond to Monitored Source ID provided."

Reason-1 The name of the NE has been entered incorrectly. There is a data selector of available NEs for this field.

Reason-2 NEs have been recommissioned while the Remote Telemetry tool was open. Close and re-open the Remote Telemetry tool to read the new NE data.

11.1.13 TBOS: Display Field is Empty or Invalid

The following warning is shown: "The display Mapping cannot be performed because the Display field is empty or invalid."

Reason-1 The given Display field is incorrect or empty. A Display must be selected if the Monitored Source is not the Network. There is a data selector of available Displays for this field.

11.1.14 TBOS: Cannot Remove Display Mapping

One of the following warnings is shown: "The removal of display Mapping cannot be performed because there is no display assignment at the indicated position." or "The removal of display Mapping cannot be performed because no display assignment has been selected."


Reason-1 Attempting to delete an empty port position. Select the assigned port and try the delete command again.

Reason-2 No assignment was selected for deletion. Select the assigned port and try the delete command again.

11.1.15 TBOS: System Generated Time Out

The following warning is shown: "System generated time out. OPC failed to provide a response to previous request within a timely manner. Consult the Event Browser tool for more detail."

Reason-1 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-30.

Reason-2 The message was discarded as a result of overflow conditions caused by heavy messaging between the OPC and NE. Wait for the messaging rate to reduce. Use tstatc to view socket usage. See USM: Excessively Slow on page 1-17. See OPC Tools on page A-137.

11.1.16 TBOS: Association is Down Between OPC and NE

The following warning is shown: "Association is down between the OPC and requested Network Element." When this warning is displayed, NE provisioning changes cannot be made. The content of existing remote assignments will not be updated until the association is restored.

Reason-1 Association to the NE is down. See CAM: Associations are Down or are Unstable on page 2-30.

11.1.17 TBOS: Display is Already Mapped

The following warning is shown: "The display Mapping cannot be performed because the Display selected is already mapped to the selected Network Element."


Reason-1 Each display is only allowed to be mapped once to an NE through Remote Telemetry, regardless of the number of TBOS ports the NE has. The assignment for this display to this NE must be deleted before another assignment of it will be permitted to this NE.

11.1.18 TBOS: Maximum Number of Mappings Exceeded

The following warning is shown: "The display Mapping cannot be performed because it will exceed the limit of the maximum number of mappings allowed for each display."

Reason-1 A display is only allowed to be assigned twice within the OPC span of control through the Remote Telemetry tool. Delete a previous assignment to the display.

11.1.19 TBOS: Display Mapped to its Source NE is Not Allowed

The following warning is shown: "The display Mapping will not be performed because the display is being mapped back to its source NE. Please use the local NE UI to perform this assignment."

Reason-1 Displays can be assigned to any NE as long as that NE is not the source of the display. This provisioning must be done locally on the NE UI.
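The three mapping rules in 11.1.17 to 11.1.19 can be checked against a planned set of assignments before they are entered in the tool. The following is a sketch only: the input format (source NE, display, target NE on each line) and the function name check_mappings are assumptions made for illustration, and identifying a display by its source NE and display number is likewise an assumption.

```shell
# Read planned mappings, one per line: "<source NE> <display> <target NE>".
# Report any mapping that would break the rules in 11.1.17 to 11.1.19:
#   - a display mapped back to its own source NE
#   - the same display mapped more than once to the same NE
#   - a display mapped more than twice within the span of control
# The input format is hypothetical.
check_mappings() {
    awk '{
        disp = $1 "/" $2
        if ($1 == $3)
            print disp ": mapped back to its source NE " $3
        if (++pair[disp SUBSEP $3] == 2)
            print disp ": mapped more than once to NE " $3
        if (++count[disp] == 3)
            print disp ": more than two mappings in the span of control"
    }'
}
```

Each violation is reported once, at the first input line that causes it.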

Chapter 12

Remote OPC Login

12.1 Problem Description


The following sections describe common Remote OPC Login problems. Each problem will be described and possible diagnostic reasons will be provided. Available workarounds will be provided at the end of each problem reason description.

12.1.1 ROA: Remote OPC Login is Not Available

The Remote OPC Login Manager is missing.

Reason-1 The user is logged in through a CMT. The Remote OPC Login Manager is only available on the GUI. Use the nelogin command to access a remote OPC. See ROA: Using nelogin to Access OPCs on page 12-3.

12.1.2 ROA: OPCs are missing

The OPC list of the Remote OPC Access tool is incomplete.

Reason-1 The Remote OPC Login Manager does not dynamically update the OPC list. Close and re-open the tool to update the list.

Reason-2 Communication to the OPC is unavailable. See CAM: Associations are Down or are Unstable on page 2-4.

12.1.3 ROA: More OPCs Displayed Than Commissioned

The OPC list of the Remote OPC Login Manager has more OPCs than are commissioned in the system.

Reason-1 The Remote OPC Login Manager traverses all the SOCs in a network and displays all OPCs that are visible to the OPC, including those of the other SOCs.


12.1.4 ROA: Duplicate OPCs Error Message

The Duplicate Error Message is displayed when the Remote OPC Login Manager is opened.

Reason-1 Another OPC with the same OPC name was found. The Remote OPC Login Manager can traverse many SOCs in a network; as a result, an OPC with the same name may be found. As root, enter nnsmon -d | more and determine if another OPC with the same name exists. If this is the case, the names must be changed to something unique throughout the entire network.

12.1.5 ROA: OPC Access is very slow

The response time for the ROA is very slow.

Reason-1 The OPC-NE messaging is very high. This can be a result of an NE download, excessive alarm messaging or too many users. See USM: Excessively Slow on page 1-6.

Reason-2 Too many NE sessions open. The OPC can only support a maximum of 12 OPC and NE login sessions. Login to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.

12.1.6 ROA: Remote OPC Login to other OPCs Not Available

Cannot log into an OPC.

Reason-1 Too many OPC sessions open. The OPC can only support a maximum of 12 OPC and NE login sessions. Login to the OPC as root and enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.

Reason-2 Communication to the OPC is unavailable. See CAM: Associations are Down or are Unstable on page 2-4.


Reason-3 Duplicate OPC name. See ROA: Duplicate OPCs Error Message on page 12-2.

Reason-4 Userids and passwords are not set correctly. Check the Centralized User Administration tool and force an audit to the NEs. See USM: Cannot Login on page 1-2.

Reason-5 The Remote OPC Login Manager is stuck. Try to login using the nelogin command. If nelogin works then there is a problem with the Remote OPC Login Manager itself; contact the appropriate OPC support authority. See ROA: Using nelogin to Access OPCs on page 12-3.

12.2 Solution Description


12.2.1 ROA: Using nelogin to Access OPCs

The following method of accessing OPCs is not supported. Using this method will bypass the safety mechanisms that prevent excessive numbers of OPC login sessions from being opened.

Using nelogin to Access an OPC:

1) Login to the OPC as admin
2) Open a UNIX shell
3) Enter cd /iws/tel
4) Enter nnsmon -d
5) A list of available nodes will be displayed
6) Enter nelogin x 1600 where x=OPC node name (OPCMnnnP/B), or enter nelogin -n z 1600 where z=OPC nsap address (49.0000...)
7) Enter userid and password
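The three nelogin forms given in this guide (NEid in 10.2.1, OPC node name and NSAP address above) differ only in their arguments. The following is a small sketch that assembles the command line for each case. The function name nelogin_command and its ne/opc/nsap keywords are hypothetical; the function only prints the command and does not run nelogin.

```shell
# Print the nelogin command line for a target, following 10.2.1 and 12.2.1:
#   ne   <NEid>           -> nelogin <NEid>
#   opc  <OPC node name>  -> nelogin <name> 1600
#   nsap <NSAP address>   -> nelogin -n <address> 1600
# nelogin_command is a hypothetical helper; it prints, it does not connect.
nelogin_command() {
    case "$1" in
        ne)   echo "nelogin $2" ;;
        opc)  echo "nelogin $2 1600" ;;
        nsap) echo "nelogin -n $2 1600" ;;
        *)    echo "usage: nelogin_command ne|opc|nsap <target>" >&2
              return 1 ;;
    esac
}
```

The printed line is meant to be entered from /iws/tel, as in step 3 of the procedure.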

Chapter 13

Centralized Security

13.1 Problem Description


The following sections describe common user security problems. Each problem will be described and possible diagnostic reasons will be provided. Available work-arounds will be provided at the end of each problem reason description.

13.1.1 CUA: Users Cannot Login after Upgrade

After an upgrade to OC-48 Release 7, one or more users cannot log into the OPC even though the user accounts are not disabled.

Reason-1 The user profile upgrade procedure for upgrades to Release 7 can cause some user passwords to be corrupted under certain conditions. The root user's password is not affected. To correct the problem, assign new passwords to the affected userids.

13.1.2 CUA: Users Cannot Login

One or more users cannot log into the OPC.

Reason-1 The userid is disabled. Open the Centralized User Administration tool and verify that the userid exists and is not disabled. To correct the problem, assign new passwords to the affected userids.

Reason-2 Entries in the /etc/passwd file have been manually deleted. The passwd file contains all the valid userids and associated passwords. This file should never be edited. See CUA: Verifying the Contents of the Password File on page 13-4.

Reason-3 Entries in the /etc/group file have been manually edited. The group file contains all the valid user groups and all userids associated with each group. This file should never be edited. See CUA: Verifying the Contents of the Group File on page 13-6.


Reason-4 Another user has changed the password using either the Password Update tool or the Centralized User Administration tool. Ask the system administrator to assign a new password.

Reason-5 The user has forgotten his or her password. Ask the system administrator to assign a new password.

13.1.3 CUA: NE User Class Different Than Indicated

A user logged onto an NE has an access class different than the access class shown for the userid in the Centralized User Administration tool.

Reason-1 The user was logged into an NE when the Centralized User Administration tool was used to change the userid's access class for that NE. While the user is logged into the NE, the CUA changes will not take effect. For the CUA changes to take effect, log out and back into the NE.

Reason-2 Association to the NE is down. While the associations are down, CUA changes cannot be made to the NE. See CAM: Associations are Down or are Unstable on page 2-4.

13.1.4 CUA: Userid is Disabled, But User Can Still Login

A userid is logged onto an NE even though the Centralized User Administration tool shows the user account status as disabled.

Reason-1 The user was logged into an NE when the Centralized User Administration tool was used to disable/delete the userid. A warning is generated indicating that a user is still logged in. While the user is logged into the NE, the CUA changes will not take effect. Once the user has logged off, the Centralized User Administration audit will synchronize the NE with the OPC. Use the forceout <userid> command in the NE UI to force a user off an NE. Note that only an admin class user can use the forceout command.

Reason-2 Association to the NE is down. While the associations are down, CUA changes cannot be made to the NE. See CAM: Associations are Down or are Unstable on page 2-4.


13.1.5 CUA: Userid is Disabled, But User Gets Wrong Error Message

A userid is in the disabled state; however, when the user logs into an NE, a message is displayed stating that the user account has expired and the user must log into the OPC to change the password. The attempt to login to the OPC fails, since the userid is in the disabled state.

Reason-1 The user was logged into an NE when the Centralized User Administration tool was used to disable/delete the userid. A warning is generated indicating that a user is still logged in. While the user is logged into the NE, the CUA changes will not take effect. Once the user has logged off, the Centralized User Administration audit will synchronize the NE with the OPC. Use the forceout <userid> command in the NE UI to force a user off an NE. Note that only an admin class user can use the forceout command.

Reason-2 Association to the NE is down. While the associations are down, CUA changes cannot be made to the NE. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-3 The Centralized User Administration audit was not run. Login to the OPC as admin and force an audit, or wait until the next scheduled audit occurs.

13.1.6 CUA: Cannot Login to Newly Commissioned NEs

Cannot perform a manual or auto login to a newly commissioned NE.

Reason-1 New NEs are not automatically assigned non-default userids. The default userids and passwords assigned are admin, operator and netsurv. The next Centralized User Administration audit will synchronize the NE with the OPC.

Reason-2 Association to the NE is down. While the associations are down, CUA changes cannot be made to the NE. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-3 The Centralized User Administration audit was not run. Login to the OPC as admin and force an audit, or wait until the next scheduled audit occurs.


13.1.7 CUA: Cannot Login to NEs Which Have Been Restarted or Rebooted

Cannot perform a manual or auto login to an NE which has just been restarted or rebooted.

Reason-1 The NE has restored from an older NE database backup which does not have the userid changes. Only those users which were assigned at the time of the backup will be remembered. The next Centralized User Administration audit will synchronize the NE with the OPC.

Reason-2 There were no NE database backups available and the NE has auto-prov'd. The default userids and passwords assigned are admin, operator and netsurv. The next Centralized User Administration audit will synchronize the NE with the OPC.

Reason-3 The Centralized User Administration audit was not run. Login to the OPC as admin and force an audit, or wait until the next scheduled audit occurs.

13.1.8 CUA: Userid shows Assigned/Expired, But the Password Was Changed

The Centralized User Administration tool shows a userid account status as assigned/expired even though the user changed the password using the Password Update tool.

Reason-1 The Centralized User Administration tool does not dynamically update its list. Close and re-open the Centralized User Administration tool to update the list.

13.1.9 CUA: Root Password Was Forgotten

The root password was forgotten and must be recovered. See CUA: Recovering the Root Password on page 13-7.

13.2 Solution Description
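The manual checks in 13.2.1 and 13.2.2 (home directory, start up command, group membership) repeat the same test for every userid, so they lend themselves to a script. The following is a minimal sketch, assuming the standard colon-separated passwd and group formats shown in those sections. The function names check_passwd and check_groups are hypothetical, and check_groups only tests that each userid's numeric group ID is defined in the group file, which is a narrower test than reading the member lists by eye.

```shell
# For every entry in a passwd-format file, check that the home directory
# exists and that the start up command exists and is executable.
# check_passwd is a hypothetical helper name.
check_passwd() {
    while IFS=: read -r user pw uid gid gecos home startup; do
        [ -n "$user" ] || continue
        [ -d "$home" ] || echo "$user: home directory $home not found"
        [ -x "$startup" ] || echo "$user: start up command $startup missing or not executable"
    done < "$1"
}

# Print every userid in the passwd file ($1) whose numeric group ID is
# not defined in the group file ($2).
check_groups() {
    awk -F: 'FILENAME == ARGV[1] { gids[$3] = 1; next }
             !($4 in gids) { print $1 ": gid " $4 " not in group file" }' "$2" "$1"
}

# On the OPC, as root:
#   check_passwd /etc/passwd
#   check_groups /etc/passwd /etc/group
```

Both functions only read the files, in keeping with the rule that /etc/passwd and /etc/group should never be edited by hand.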


13.2.1 CUA: Verifying the Contents of the Password File

The following procedure will verify some of the major aspects of the /etc/passwd file. In particular, the home directories and the start up commands will be verified.

Verifying the Contents of the /etc/passwd File:

1) Login to the OPC as root


2) Enter view /etc/passwd

The format of each userid is:

admin:GtJOQfoqT1F6c:10:21:admin:/home/admin:/iws/usm/usmstart

userid = the name of the user (first field)
password = encrypted password for that userid (second field)
home = the home directory of the userid (sixth field)
start up = the command to launch for the userid (seventh field)

The previous example is for the userid admin. The admin home directory is /home/admin and the first command started when logging in as the admin user is /iws/usm/usmstart.

3) Record each userid, home and start up in the /etc/passwd file
4) Verify the existence and attributes of the userid home directory by entering ll -d <home>
5) Verify the existence and attributes of the userid start up command by entering ll <start up>
6) Repeat steps 4 and 5 for all userids.

The following is the contents of a default passwd file:

opcods::11:1001:opcods:/users/opcods:/bin/false
anonymous::12:1000:anonymous:/users/anonymous:/bin/false
netmgr:*:99:21:Network Manager:/users/netmgr:/bin/sh
swdelivery:*:14:21:software_delivery:/users/swdelivery:/bin/csh
root:QmGq2fte4KMnM:0:1:Super User:/:/bin/csh
daemon:*:1:1::/:/bin/sh
bin:*:2:2::/bin:/bin/sh
adm:*:4:4::/usr/adm:/bin/sh
lp:*:9:2::/usr/spool/lp:/bin/sh
admin:HxbnDYhh.IUoQ:10:21:admin:/home/admin:/iws/usm/usmstart
operator:JdEHd3P2A5cPM:13:21:operator:/home/operator:/iws/usm/usmstart
slat:sqL2ZypvLhmZ.:8:17:slat:/home/slat:/iws/usm/usmstart
demo:C8CH1Cq7omh82:6:16:demo:/home/demo:/iws/usm/usmstart
master:j82x7BZss6VJo:16:22:master:/home/master:/iws/usm/usmstart


root1:*:0:1:pseudo root user:/:/etc/rlucsh
netsurv:Qvnbl.h6wXxm6:18:18:netsurv:/home/netsurv:/iws/usm/usmstart
standby:1iT0/6RsXN9Co:15:100:standby:/home/standby:/iws/usm/usmstart
viewsurv:dASnONdNTYrX6:19:23:viewsurv:/home/viewsurv:/iws/usm/usmstart
nmaproc:kdv1lhT/u7Md.:20:26:nmaproc:/home/nmaproc:/bin/false
opsproc:FdO352d0KRgD.:21:27:opsproc:/home/opsproc:/bin/false
bthproc:ajgEbRhrRms5E:22:28:bthproc:/home/bthproc:/bin/false

13.2.2 CUA: Verifying the Contents of the Group File

The following procedure verifies some of the major aspects of the /etc/group file. In particular, the groups and userids are verified.

Verifying the Contents of the /etc/group File:

1) Log in to the OPC as root.
2) Enter view /etc/passwd
3) Record all the userids.
4) Enter view /etc/group

The format of each group entry is:

admin:*:21:netmgr,swdelivery,admin,operator,maint,admin1

Group   = the name of the group (first field)
Userids = the userids that belong to the group (last field, comma separated)

The previous example is for the group called admin. The admin group contains the userids netmgr, swdelivery, admin, operator, maint and admin1.

5) Ensure that every userid listed in the /etc/passwd file belongs to a group listed in the /etc/group file. Some userids will belong to more than one group.

The following is the contents of a default group file:

root::0:root
other::1:root,daemon
bin::2:root,bin,daemon,lp
sys::3:root,bin,sys,adm
adm::4:root,adm,daemon
daemon::5:daemon
mail::6:root
admin:*:21:netmgr,swdelivery,admin,operator
ftam:*:1000:anonymous


opcgrp:*:1001:opcods
lp::7:lp
slat::17:slat
demo::16:demo
netsurv::18:netsurv
techsupp::25:
standby::100:standby
master::22:master
oc3admin::24:
viewsurv::23:viewsurv
nmaproc::26:nmaproc
opsproc::27:opsproc
bthproc::28:bthproc

13.2.3 CUA: Recovering the Root Password

The following procedure allows you to recover the root password.

1) Connect a vt100 to Port B of the OPC (since this procedure drops ethernet connectivity).
2) Insert a disk initialization tape into the primary OPC's tape drive.
3) Log in as any user that has access to the OPC Shutdown tool (e.g. admin) and use the OPC Shutdown tool to restart the OPC. Wait for the OPC to reboot from the disk initialization tape before going on to the next step.
   NOTE: The Backup OPC will become active during this step.
4) Log in to the OPC as the root user, using the password toor.
5) Type the following commands at the UNIX prompt:
      mount /dev/dsk/disk1 /disk
      cd /disk/etc
      cp passwd passwd.bak
      vi passwd
6) In the vi editor, change the line specifying the root account so that the root password is blank. For example:
      Line before edit: root:QmGq2fte4KMnM:0:1:Super User:/:/bin/csh
      Line after edit:  root::0:1:Super User:/:/bin/csh
7) Write the file, and then exit from the vi editor.
8) Remove the disk initialization tape from the OPC tape drive.
9) On the UNIX command line, type /etc/reboot. Wait for the OPC to reboot from disk before going on to the next step.
   NOTE: The Primary OPC will eventually become active, but you don't have to wait for it to do so before proceeding to the next step.
10) Log in to the OPC as the root user. You shouldn't have to specify a password.


11) Change the root password immediately using the UNIX command, passwd.
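The manual checks in 13.2.1 and 13.2.2 can also be scripted. The following is a minimal sketch in Python, assuming the two files (or copies of them) can be read on a machine that has a Python interpreter; this guide does not state that the OPC provides one. The function names are illustrative and are not OPC tools. A userid is treated as belonging to a group if its primary group ID matches a group entry or if it appears in a group's userid list, which is one reasonable reading of step 5 in 13.2.2.

```python
import os
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    userid: str
    password: str
    uid: int
    gid: int
    comment: str
    home: str
    startup: str


def parse_passwd_line(line):
    """Split one /etc/passwd line into its seven colon-separated fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d: %r" % (len(fields), line))
    userid, password, uid, gid, comment, home, startup = fields
    return PasswdEntry(userid, password, int(uid), int(gid), comment, home, startup)


def parse_group_line(line):
    """Return (group name, group id, list of userids) for one /etc/group line."""
    name, _password, gid, members = line.rstrip("\n").split(":")
    return name, int(gid), [m for m in members.split(",") if m]


def check_entry(entry):
    """Rough equivalent of 'll -d <home>' and 'll <start up>': existence only."""
    return {
        "home_exists": os.path.isdir(entry.home),
        "startup_exists": os.path.isfile(entry.startup),
    }


def users_without_group(passwd_lines, group_lines):
    """List userids whose primary gid is unknown and who are in no userid list."""
    groups = [parse_group_line(l) for l in group_lines if l.strip()]
    known_gids = {gid for _name, gid, _members in groups}
    listed = {m for _name, _gid, members in groups for m in members}
    missing = []
    for line in passwd_lines:
        if not line.strip():
            continue
        entry = parse_passwd_line(line)
        if entry.gid not in known_gids and entry.userid not in listed:
            missing.append(entry.userid)
    return missing


if __name__ == "__main__":
    with open("/etc/passwd") as p, open("/etc/group") as g:
        print(users_without_group(p.readlines(), g.readlines()))
```

The existence check does not replace reviewing the ownership and permissions that ll displays; it only flags entries whose home directory or start up command is missing.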

Chapter 14

TL-1 and X.25

14.1 Problem Description


The following sections describe common TL1 and X.25 problems. Each problem is described together with its possible reasons, and any available work-around is given at the end of each reason description. Throughout this section, the terms OS and DCE are used synonymously.

14.1.1 TL1: TID/SID Name Rejected

Using the set-sid TL1 command fails. Unable to change the value of the 20 character TID/SID.

Reason-1
The new name contains illegal characters. A valid TL1 name must not contain any spaces and must consist only of alphanumeric characters and underscores. TL1 converts all alphabetic characters to uppercase.

Reason-2
The old name contains illegal characters. The set-sid command uses the old name to address the target NE. If the old name contains invalid data, the set-sid command will not understand the name. Change the old name to a valid new name using the tidmap command on the OPC. See "Complete set of OPC Tools" on page A-1.
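The naming rule in Reason-1 can be checked before a name is sent to the NE. The following is a minimal sketch in Python, not part of any OPC tool; the function name is hypothetical, and the 20 character limit is an assumption taken from the problem statement above.

```python
import re

MAX_TID_LENGTH = 20  # assumption: from "20 character TID/SID" in the text
_TID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def normalize_tid(name):
    """Validate a TID/SID and return it in the uppercase form TL1 will use."""
    if not name:
        raise ValueError("TID/SID must not be empty")
    if len(name) > MAX_TID_LENGTH:
        raise ValueError("TID/SID is longer than %d characters" % MAX_TID_LENGTH)
    if not _TID_PATTERN.fullmatch(name):
        raise ValueError("TID/SID may contain only letters, digits and underscore")
    return name.upper()


if __name__ == "__main__":
    for candidate in ("ottawa_ne1", "bad name", "ne-5"):
        try:
            print(candidate, "->", normalize_tid(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```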

14.1.2 TL1: RETRIEVE PM Responds with DENY

Issuing the RETRIEVE PM T3 TL1 command results in an empty response block.

Reason-1
PM collection is disabled. Use RTRV-PMMODE to find the PM collection state for the associated facilities and SET-PMMODE to enable or disable PM collection for any or all facilities. Alternatively, use OPC PM Coll. Filter from the OPCUI.

14.1.3 TL1: Response Messages are Lost or Not Complete

Responses from TL1 are being lost or are missing information.


Reason-1
The OS is trying to implement hardware flow control. The OPC does not support hardware flow control. Disable flow control at the OS and try to increase the baud rate between the OPC and the OS to 9600 baud. Note that the maximum OPC baud rate is 9600.

Reason-2
The OS cannot handle the size of the response message. The OPC can generate response messages up to 1 Kbyte in size. The work-around is to avoid "retrieve all" commands and instead use specific retrieve calls, which return smaller responses that the OS can handle.

14.1.4 TL1: Missing PM Counts

PM counts are not being displayed by TL1.

Reason-1
An INH-PMREPT TL1 command is in effect. Determine which counts are being inhibited and use the ALW-PMREPT TL1 command to allow the PM counts. See "TL1: Determining Inhibited PM Counts" on page 14-12.

Reason-2
PM collection is not turned on at the OPC. Use the methods described in 15.1.2 to enable PM collection.

14.1.5 TL1: Call Accepted Logs in the Event Browser

The "NAD500 X.25 call accepted from X.121 address <X.121 address>" log is being generated in the Event Browser Tool.

Reason-1
An OS at the specified X.121 address has successfully connected to the OPC.

14.1.6 TL1: Call Terminated Logs in the Event Browser

The "NAD501 X.25 call terminated from X.121 address <X.121 address>" log is being generated in the Event Browser Tool.

Reason-1
An OS at the specified X.121 address has disconnected from the OPC.


14.1.7 TL1: Call Rejected Logs in the Event Browser

The "NAD300 X.25 call rejected from X.121 address <X.121 address>" log is being generated in the Event Browser Tool.

Reason-1
An OS at the specified X.121 address is trying to connect to the OPC, but the OPC is rejecting the call because the X.121 address is not on the VCP list. Use the config_TL1 or vcpinfo tools to list and add an X.121 address to the OPC. See "Complete set of OPC Tools" on page A-1.

14.1.8 TL1: Port Not Configured for X.25 Logs in the Event Browser

The "GEN310 X.25 is not configured on any port" log is being generated in the Event Browser Tool.

Reason-1
X.25 has not been configured on a port. Use the config_TL1 or the config_port tools to configure X.25 on a port. See "Complete set of OPC Tools" on page A-1.

14.1.9 TL1: Cannot Establish Connection

Cannot establish a TL1 connection over X.25 between the OS and the OPC.

Reason-1
Port B is not configured for X.25. Open the Event Browser tool and search for a NAD310 message indicating that X.25 is not configured, or use the config_port command to query the port. Configure the port using the config_TL1 or config_port command. When the port has been configured, initialize the TL1X.25SM MSR using the config_TL1 or drmstat tool. See "TL1: Setting Up X.25, VCP and X.3PAD" on page 14-10.

Reason-2
The TL1X.25SM MSR is MANUAL or SYSTEM BUSY. The TL1X.25SM MSR always goes SYSTEM BUSY if Port B is configured as anything other than X.25, and it remains in that state until the MSR is re-initialized. Initialize the MSR using the config_TL1 or drmstat tool.

Reason-3
X.3 PAD is enabled and the TL1 protocol ID is set to 01 or NULL. X.3 PAD reserves the PID of 01 for its own use; therefore, for TL1 and X.3 PAD to coexist, the TL1 PID must be set to a value other than 01. Use the config_TL1 or config_port tools to determine whether the port has been configured for X.3 PAD. Use the config_TL1 or vcpinfo tool to determine whether the TL1 PID is set to 01 or NULL. To correct the problem, unconfigure X.25 and X.3 PAD, change the TL1 PID to the proper value, reconfigure X.25 and X.3 PAD, and re-initialize TL1X.25SM. See "TL1: Setting Up X.25, VCP and X.3PAD" on page 14-10.

Reason-4
LAPB is down. Use the tstatc tool to determine the status of the X.25 layer and the LAPB layer. If the LAPB layer is down, unconfigure and reconfigure X.25. See "TL1: Setting Up X.25, VCP and X.3PAD" on page 14-10. See "LAPB: LAPB is Dropping" on page 14-7. See "LAPB: LAPB Problems" on page 14-7.

Reason-5
The X.25 driver is continuously messaging. From time to time, the chip that drives the X.25 gets stuck and continuously generates SABM, RNR and/or REJ messages. Use the tstatc tool to determine the status of the LAPB connection. Check the transmit message counts for SABM, RNR and REJ. If the counts are continuously rising at more than 1 message/second, use the x25init command to re-initialize the X.25 driver. Another work-around is to disable and enable the X.25 settings on Port B.

Reason-6
The X.25 parameters are not set correctly at both the OS and the OPC. For X.25 to work properly, all settings at the OPC and the OS must match and be set appropriately. A protocol analyzer may be necessary to determine whether this is the problem. The X.25 supplier will provide the proper X.25 parameter settings.

Reason-7
The X.25 parameters have been changed at the OPC but the changes are not running. The contents of the /etc/x25init_scc0 file have been manually altered, but the port has not been initialized to use the new values. This is done most commonly to quickly change the X.121 address of the OPC. The proper way to correct this problem is to unconfigure X.25, change the X.25 parameters, reconfigure X.25 and re-initialize TL1X.25SM. A faster alternative is to use the x25init command. See "TL1: Setting Up X.25, VCP and X.3PAD" on page 14-10.
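The "more than 1 message/second" test in Reason-5 amounts to comparing two readings of the transmit counters taken a known time apart. The following is a minimal sketch in Python of that arithmetic. The counter values are assumed to be read manually from two tstatc displays; the dictionary keys and the function name are hypothetical, and nothing here parses tstatc output.

```python
WATCHED_MESSAGES = ("SABM", "RNR", "REJ")


def stuck_message_types(before, after, elapsed_seconds, threshold=1.0):
    """Return the message types whose transmit count rose faster than threshold.

    before / after: transmit counts noted from two successive readings.
    elapsed_seconds: time between the two readings.
    threshold: messages per second; 1.0 follows Reason-5 above.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    stuck = []
    for name in WATCHED_MESSAGES:
        rate = (after.get(name, 0) - before.get(name, 0)) / elapsed_seconds
        if rate > threshold:
            stuck.append(name)
    return stuck


if __name__ == "__main__":
    first = {"SABM": 10, "RNR": 0, "REJ": 5}
    second = {"SABM": 70, "RNR": 0, "REJ": 10}
    # Readings taken 30 seconds apart: SABM rose by 2 per second.
    print(stuck_message_types(first, second, 30))
```

A non-empty result suggests the driver is stuck; taking more than two readings gives better evidence that the rise is continuous rather than a short burst.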


Reason-8
(This is common to both TL1 over X.25 and TL1 over TCP/IP.)
(1) Two or more NEs (or OPCs) in the OPC's span of control have the same TID (Target Identifier). Make sure that the TIDs are unique within the OPC's span of control. The TIDs can be set using the tidmap tool on the OPC. See "OPC Tools" on page A-1.
(2) TL1 is unable to get the OPC's commissioning data (OPCName). Look in the syslogs and ensure that the OPC is commissioned properly.

14.1.10 TL1: Commands Aren't Being Performed

The response from the OPC to TL1 commands is DENY with error SROF (Status, Requested Operation Failed).

Reason-1
The logalarmsystem MSR is MANUAL or SYSTEM BUSY. The logalarmsystem MSR is required to access events stored by the NE. Re-initialize the MSR.

Reason-2
The hardware does not support the requested operation. Log in to an NE and determine whether the required equipment is available.

14.1.11 TL1: Autonomous Messaging Isn't Working

Autonomous messages are not being sent to the OS.

Reason-1
There has been a SWACT (switch of activity) between the primary and backup OPC. The backup OPC is now handling the TL1 messaging, and the autonomous messages are being sent to the OS connected at the backup OPC. Correct the reason for the OPC SWACT and the primary OPC will revert to the active state. See "OWS: Both Primary and Backup OPCs are Active" on page 2-2. See "OWS: Primary OPC is Inactive" on page 2-2.

Reason-2
The NEs containing the primary and/or backup OPCs have not been commissioned. The workaround is to commission the NEs that contain the OPCs.

Reason-3
A silent failure has occurred after an OPC has reverted from a SWACT. A TL1 NMA session may silently fail when the OPC the session is running on goes from the active state to the inactive state and back to the active state. There is no indication that the failure has occurred. The workaround is to drop the connection and open a new one.

Reason-4
There are more than 4 TL1 sessions running. All TL1 sessions beyond the maximum of 4 will connect, but no TL1 session will be established for them. An Event Log and a TL1 message are generated indicating that the maximum number of sessions are up and running. Drop all extra TL1 sessions.

Reason-5
Two OSs with the same address have managed to connect to the OPC. This will only occur if the two connections arrive at the OPC within 5 seconds of each other. One of the connections will successfully establish a TL1 session, while the other will have a connection but no session running. Drop both connections and retry with one OS at a time.

Reason-6
An NE has been re-commissioned (edited, added, deleted), which has caused TL1 to no longer recognize the autonomous alarm events generated by the NE. Close and reopen the TL1 connection to correct the problem.

Reason-7
An NE is being downloaded. While the NE is being downloaded, autonomous alarm events will not be received at the NE or any NE downstream of the downloading NE. Wait for the NE to complete downloading.

Reason-8
Association to the NE is down. While the associations are down, TL1 messages cannot be sent by the NE. See "CAM: Associations are Down or are Unstable" on page 2-4.

Reason-9
The OPC has been restored to an earlier state. The restore may change NE information or port information. To correct the problem, unconfigure X.25, change the X.25 parameters, reconfigure X.25, and re-initialize the TL1X.25SM MSR. See "TL1: Setting Up X.25, VCP and X.3PAD" on page 14-10.

Reason-10
The OPC has been data synchronized. The data sync may change NE information or port information. To correct the problem, unconfigure X.25, change the X.25 parameters, reconfigure X.25, and re-initialize the TL1X.25SM MSR. See "TL1: Setting Up X.25, VCP and X.3PAD" on page 14-10.


14.1.12 LAPB: LAPB is Dropping

The LAPB connection drops and re-connects by itself.

Reason-1
The T3 inactivity timer is set too low. View the /var/log/syslog file. Move to the end of the file by typing G and search for the LAPB message by typing ? LAPB. Around the time the LAPB connection went down, search for a T3 expiration log. If the log is present, set the T3 timer in the /etc/x25init_scc0 file to a higher value, or disable the T3 timer by setting it to zero (0). Note that X.25 must be disabled and re-enabled on Port B for the new T3 value to take effect.

Reason-2
The OPC has gone from a busy to a non-busy state. When the OPC becomes too busy to handle LAPB messaging, the OPC sends an RNR (Receiver Not Ready) message to the OS. The OS in turn stops sending messages to the OPC. When the OPC becomes available to handle LAPB messaging again, an RR (Receiver Ready) message would normally be sent to the OS; however, the OPC sends an REJ (Reject) message instead. The REJ message causes the OS to respond with an FRMR (Frame Reject) message. The OPC sees the FRMR and takes down the LAPB connection. There is no known workaround.
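The manual search in Reason-1 (find the LAPB message, then look nearby for a T3 expiration) can be approximated on a copy of the syslog file. The following is a minimal sketch in Python. The exact wording of the LAPB and T3 log entries is not given in this guide, so the sketch only looks for the substrings "LAPB" and "T3" and pairs lines that lie close together; the function name, the sample log text and the 20 line window are assumptions.

```python
def t3_near_lapb(lines, window=20):
    """Pair each line mentioning LAPB with nearby lines mentioning T3.

    Returns a list of (lapb_line_number, t3_line_number) tuples, 1-based,
    for every T3 line within `window` lines of an LAPB line.
    """
    lapb = [i for i, text in enumerate(lines) if "LAPB" in text]
    t3 = [i for i, text in enumerate(lines) if "T3" in text]
    return [(a + 1, b + 1) for a in lapb for b in t3 if abs(a - b) <= window]


if __name__ == "__main__":
    # Hypothetical log text, for illustration only.
    sample = [
        "boot complete",
        "x25: T3 timer expired",
        "x25: LAPB link down",
        "unrelated entry",
    ]
    for lapb_line, t3_line in t3_near_lapb(sample):
        print("LAPB entry at line", lapb_line, "with T3 entry at line", t3_line)
```

Any pair that is reported still needs to be read by eye to confirm that the T3 entry is an expiration and that it coincides with the LAPB connection going down.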

14.1.13 LAPB: LAPB Problems

Using the tstatc command and entering X, the sccn (n=0,1,2,3) port indicates Level 2 down. Most LAPB problems require a protocol analyzer to solve.

Reason-1
Wrong cable or bad connection. Ensure that the right cable is being used and that it is properly connected. See "VT100: Port B Cable Pinouts" on page 1-12.

Reason-2
The X.25 parameters are not set correctly at both the OS and the OPC. For X.25 to work properly, all settings at the OPC and the OS must match and be set appropriately. A protocol analyzer may be necessary to determine whether this is the problem. The X.25 supplier will provide the proper X.25 parameter settings.

Reason-3
X.25 modems are not properly connected or set. Review the settings at all modem sites to ensure that they are correct. A protocol analyzer may be necessary to determine whether this is the problem.


Reason-4
RR messages are not being sent to and from the OS. Use the tstatc tool and enter L to view the LAPB statistics. Ensure that the received and sent RR messages increment together. If the RR messages do not increment together, the X.25 connection is not operating properly.

Reason-5
A UA message is not being sent from the OS. Use the tstatc tool and enter L to view the LAPB statistics. Ensure that the SABM messages are being sent out at T1 intervals (see /etc/x25init_scc0). Once the OS receives the SABM, the OS sends a UA message indicating that the connection has been acknowledged. If the UA message is not received by the OPC, the OPC continues to send the SABM message.

Reason-6
SABM collision. This is extremely rare, but occasionally both the OS and the OPC will try to establish the connection by sending SABM messages at the same time. Use the tstatc tool and enter L to view the LAPB statistics. Ensure that the SABM messages received and sent are incrementing at different times. If the SABM counts for both received and sent are incrementing at the same time, the OS is trying to establish the LAPB link at the same time as the OPC. Configure the OS not to establish the LAPB connection.

Reason-7
The X.25 modem auto-bauds up. The OPC always tries to establish the connection at 9600 baud and then starts auto-bauding down until a connection is made. Some modems are designed to auto-baud up to the connection speed, and in doing so the OPC and the modem miss each other. Disable the auto-baud capability or set the baud rate to 9600 at the modem.

Reason-8
The MC68302 has a hardware fault. Use the tstatc tool and enter L to view the LAPB statistics. The LAPB status is unknown. Perform view /var/log/syslog and enter ? MC68302 to search for fault SWERRs. See "MC68302: MC68302 Problems" on page 14-8.
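The counter comparisons in Reason-4, Reason-5 and Reason-6 follow a simple pattern that can be written down as rules over two readings of the LAPB statistics. The following is a minimal sketch in Python of those rules. It assumes the counts are copied by hand from two tstatc displays; the key names (rr_sent, rr_received, sabm_sent, sabm_received, ua_received) and the function name are hypothetical, and a single interval is only a coarse stand-in for "at the same time".

```python
def _delta(before, after, key):
    """Change in one counter between the two readings (missing keys count as 0)."""
    return after.get(key, 0) - before.get(key, 0)


def diagnose_lapb(before, after):
    """Return the Reason labels from 14.1.13 suggested by two counter readings."""
    findings = []

    rr_sent = _delta(before, after, "rr_sent")
    rr_received = _delta(before, after, "rr_received")
    # Reason-4: RR counts should increment together.
    if (rr_sent > 0) != (rr_received > 0):
        findings.append("Reason-4")

    sabm_sent = _delta(before, after, "sabm_sent")
    sabm_received = _delta(before, after, "sabm_received")
    ua_received = _delta(before, after, "ua_received")
    if sabm_sent > 0 and sabm_received > 0:
        # Reason-6: both ends sending SABM in the same interval.
        findings.append("Reason-6")
    elif sabm_sent > 0 and ua_received == 0:
        # Reason-5: SABM keeps going out but no UA comes back.
        findings.append("Reason-5")

    return findings


if __name__ == "__main__":
    first = {"rr_sent": 0, "rr_received": 0, "sabm_sent": 2, "ua_received": 0}
    second = {"rr_sent": 5, "rr_received": 0, "sabm_sent": 6, "ua_received": 0}
    print(diagnose_lapb(first, second))
```

The result is a hint about which reason to investigate first, not a diagnosis; as noted above, most LAPB problems still require a protocol analyzer.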

14.1.14 MC68302: MC68302 Problems

There is a fault with the MC68302. The LAPB state is unknown.

Reason-1
The MC68302 load is corrupted. Perform an ll /etc/load.bin to display the file load.bin. The following should be displayed:

-rwxr--r-- 1 root other 80635 Jun 1 03:09 /etc/load.bin

The size should always be around 80000. If the size is significantly different, copy the file from another OPC or extract it from an installation tape.

Reason-2
The permissions for the MC68302 load are incorrect. Perform an ll /etc/load.bin to display the file load.bin. The following should be displayed:

-rwxr--r-- 1 root other 80635 Jun 1 03:09 /etc/load.bin

Use chmod 744 /etc/load.bin to set the permissions of the load.bin file.

Reason-3
The x25init file is corrupted. Perform an ll /etc/x25init to display the file. The following should be displayed:

-rwxr--r-- 2 root other 84443 Jun 22 04:08 /etc/x25init

The size should always be around 84000. If the size is significantly different, copy the file from another OPC or extract it from an installation tape.

Reason-4
The permissions for x25init are incorrect. Perform an ll /etc/x25init to display the file. The following should be displayed:

-rwxr--r-- 2 root other 84443 Jun 22 04:08 /etc/x25init

Use chmod 744 /etc/x25init to set the permissions of the x25init file.

Reason-5
True hardware problem. Return the OPC to Northern Telecom.
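The size and permission checks in Reason-1 through Reason-4 can be expressed as one small routine. The following is a minimal sketch in Python, assuming an interpreter is available wherever the files can be inspected. The function name is hypothetical, and the 10 percent tolerance is an assumption standing in for "significantly different", which the text does not quantify. Mode 744 corresponds to the -rwxr--r-- listing shown above.

```python
import os
import stat


def check_file(path, expected_size, expected_mode=0o744, tolerance=0.10):
    """Return a list of problems found with the file's size or permissions."""
    info = os.stat(path)
    problems = []

    # Size check: flag anything further than `tolerance` from the nominal size.
    if abs(info.st_size - expected_size) > tolerance * expected_size:
        problems.append(
            "size %d is not close to %d" % (info.st_size, expected_size)
        )

    # Permission check: compare only the permission bits, not the file type.
    mode = stat.S_IMODE(info.st_mode)
    if mode != expected_mode:
        problems.append("mode %o should be %o" % (mode, expected_mode))

    return problems


if __name__ == "__main__":
    for path, size in (("/etc/load.bin", 80000), ("/etc/x25init", 84000)):
        try:
            print(path, check_file(path, size) or "looks as expected")
        except OSError as err:
            print(path, "could not be checked:", err)
```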

14.1.15 X.3PAD: X.3 PAD Can't Establish Connection

Unable to establish a connection with the X.3 PAD.

Reason-1
The X.3 PAD parameters are not set correctly at the OPC. For X.3 PAD to work properly, all settings at the OPC must match the settings provided by the X.3 PAD supplier. A protocol analyzer may be necessary to determine whether this is the problem. The X.3 PAD supplier will provide the proper parameter settings.

Reason-2
The TL1 protocol ID is set to 01 or NULL. X.3 PAD reserves the PID of 01 for its own use; therefore, for TL1 and X.3 PAD to coexist, the TL1 PID must be set to a value other than 01. Use the config_TL1 or config_port tools to determine whether the port has been configured for X.3 PAD. Use the config_TL1 or vcpinfo tool to determine whether the TL1 PID is set to 01 or NULL. To correct the problem, unconfigure X.25 and X.3 PAD, change the TL1 PID to the proper value, reconfigure X.25 and X.3 PAD, and re-initialize TL1X.25SM. See "TL1: Setting Up X.25, VCP and X.3PAD" on page 14-10.

14.2 Solution Description


14.2.1 TL1: Setting Up X.25, VCP and X.3PAD

The following procedures are outlined in greater detail in the NTP, Volume: Operations, Administration and Provisioning, Section: System Administration Procedures. They cover unconfiguring a port, setting up a port for X.25, setting a VCP, setting up a port for X.3 PAD, and re-initializing the TL1X.25SM. For OPC13 onwards a single tool, config_TL1, can be used to perform all TL1 configuration requirements. For more information about the individual tools used, see "Complete set of OPC Tools" on page A-1.

Unconfigure A Port:

1) Log in to the OPC as root.
2) Enter config_port
3) Select item 3, (Unconfigure a service)
4) Select item 1, to unconfigure the service
5) Repeat steps 3 and 4 until there are no services configured on Port B.
6) Select item 9, (Exit)

Configuring A Port for X.25:

1) Retrieve all X.25 parameter settings from the X.25 supplier, including the X.121 address of the OPC itself.
2) Log in to the OPC as root.
3) Enter config_port
4) Select item 2, (Configure a service)
5) Select item 3, (X.25)
6) Select item 2, (Enter X.25 parameters)
7) Enter Y to overwrite/create the /etc/x25init_scc0 file
8) Enter Y to edit the /etc/x25init_scc0 file
9) Change the appropriate parameters.
10) When the parameters have been successfully changed, select item 5, (Exit configuration program and create file)
11) Select item 3, (Enable X.25)
12) Enter Y to continue the port change
13) Hit <RETURN> when cables have been installed
14) Select item 9, (Exit)

Setting a VCP:

1) Retrieve the X.121 addresses and TL1 PIDs of all the OSs; these are normally available from the OS TL1 supplier. The PIDs are sometimes referred to as MPIDs, Module Processor Identifiers, Process Identifiers and/or Protocol Identifiers. BellCore specifies the following:
      TL1, PID=CF (BellCore recommendation)
      X3PAD, PID=01 (BellCore Standard)
      TCPIP, PID=CC (BellCore Standard)
2) Log in to the OPC as root.
3) Enter cd /iws/vcp
4) Enter vcpinfo -l, (list all existing X.121 addresses)
5) Delete X.121 addresses which are not allowed: enter vcpinfo -d <X.121 address>, repeat if necessary
6) Add X.121 addresses which are allowed: enter vcpinfo -a <X.121 address> <NMA or OPS>, repeat if necessary
7) Change the TL1 PID, if necessary: enter vcpinfo -p <TL1 PID>
8) Re-initialize the TL1X.25SM MSR

Configuring A Port for X.3 PAD:

1) Retrieve all X.3 PAD parameter settings from the X.3 PAD supplier, including the X.121 address of the X.3 PAD.
2) Log in to the OPC as root.
3) Enter config_port
4) Select item 2, (Configure a service)
5) Select item 4, (X.3 PAD)
6) Select item 1, (View/Modify parameters)
7) Select item 2, (Modify the X.3 PAD configuration)
8) Select item 1, (Add a new X.121 address)
9) Enter the X.121 address of the X.3 PAD
10) Modify the X.3 parameters if necessary
11) Enter q, (return to menu)
12) Select item 8, (Return to previous menu)
13) Select item 3, (Save X.3 PAD configuration)
14) Select item 8, (Return to Configure X.3 PAD menu)
15) Select item 2, (Enable X.3 PAD support)
16) Select item 9, (Exit)


14.2.2 TL1: Determining Inhibited PM Counts

The following process identifies which NMA sessions have inhibited PM counts. The file /iws/nma/nma_pm_report_flags.xxxx (where xxxx = OS identification number) contains the flags identifying which PM counts are inhibited or allowed.

Identifying the Status of the PM Counts:

1) Log in to the OPC as root.
2) Enter cd /iws/nma
3) Enter ls nma_pm_report_flags.*
4) Enter nmasee <OS identification number>
   A table will be displayed indicating which PM counts are A=active or I=inactive.
5) Repeat for other OS identification numbers.

14.2.3 X.25: Default Settings Stored in X25INIT_TEMPLATE File

The following lists the default settings stored in /etc/X25INIT_TEMPLATE:

Global Parameters x.121 address device name Level 2 Parameters t1 t3 .frmsize' n2 l2window Level 3 Parameters networktype lci max_circuits flowcontrol thruputclass fast_select_accept reverse_charge def_inpacketsize def_outpacketsize def_inwindow def_outwindow 1 DTE_84 svc 8 on on disabled disabled 128 128 2 2 8 3000 60000 263 20 7 123456789000 /dev/x25_0 scc0


def_inthruputclass     11
def_outthruputclass    11
neg_inpacketsize       256
neg_outpacketsize      256
neg_inwindow           2
neg_outwindow          2
neg_inthruputclass     11
neg_outthruputclass    11

14.2.4 X.3: Default Settings Stored in x3config File

The following lists the default settings stored in /etc/x25/x3config:

 1) Escape from data transfer     0    (escape not allowed)
 2) Echo                          0    (Echo off)
 3) Data forwarding character     126  (line)   126 (raw)
 4) Idle timer delay              0    (line)   1   (raw)
 5) Ancillary device control      1    (use X-ON & X-OFF)
 6) PAD service signal            5    (use prompt PAD & PAD service signal)
 7) Procedure on break            21   (TX interrupt & break indication, no output)
 8) Discard output                0    (normal delivery)
 9) Padding after return          0    (no padding)
10) Line folding                  0    (no linefolding)
11) Binary speed                  14   (9600 bps)
12) Flow control of the PAD       1    (use X-ON & X-OFF)
13) Linefeed after RETURN         0    (no linefeed insertion)
14) Linefeed padding              0    (no padding after linefeed)
15) Editing                       0    (no editing in data transfer)
16) Character delete              8    (Control H)
17) Line delete                   21   (Control U)
18) Line display                  18   (Control R)
19) Editing PAD service signal    2    (PAD service signal for display terminal)
20) Echo mask                     0    (no echo mask)
21) Parity treatment              0    (no parity checking or generation)
22) Page wait                     0    (disabled)


14.2.5 VCP: VCP PID Defaults

The VCP PIDs are provided by the OS TL1 provider. VCP PIDs are sometimes referred to as MPIDs, Module Processor Identifiers, Process Identifiers and/or Protocol Identifiers. BellCore specifies the following:

TL1, PID=CF (BellCore recommendation)
X3PAD, PID=01 (BellCore Standard)
TCPIP, PID=CC (BellCore Standard)
Primary Router, PID=C7 (BellCore Recommendation)
Backup Router, PID=C9 (BellCore Recommendation)
OC3 Router, PID=AC
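Several problems in this chapter come down to a PID that is NULL or that collides with a PID reserved for another service (for example, a TL1 PID of 01 while X.3 PAD is enabled). The following is a minimal sketch in Python of a planning check over the values listed above. It treats any shared PID as a conflict, which is a generalization of the specific cases this guide describes; the service key names and the function name are hypothetical, and the check does not read anything from the OPC.

```python
# Default PIDs as listed in 14.2.5; the key names are illustrative.
DEFAULT_PIDS = {
    "TL1": "CF",
    "X3PAD": "01",
    "TCPIP": "CC",
    "PRIMARY_ROUTER": "C7",
    "BACKUP_ROUTER": "C9",
    "OC3_ROUTER": "AC",
}


def pid_conflicts(pids):
    """Return a description of every service whose PID is unset or duplicated."""
    problems = []
    seen = {}
    for service, pid in pids.items():
        value = (pid or "").upper()
        if not value:
            problems.append("%s: PID is not set" % service)
        elif value in seen:
            problems.append(
                "%s: PID %s already used by %s" % (service, value, seen[value])
            )
        else:
            seen[value] = service
    return problems


if __name__ == "__main__":
    planned = dict(DEFAULT_PIDS, TL1="01")  # the conflict described in 14.1.9
    for problem in pid_conflicts(planned):
        print(problem)
```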

14.2.6 TL1 Interface Router Service: Cannot Establish Connection

Cannot establish a TL1 Interface Router Service connection over X.25 between the OS and the GatewayOPC, or between the GatewayOPC and the RemoteOPC.

Reason-1
Port B is not configured for X.25. Open the Event Browser tool and search for a NAD310 message indicating that X.25 is not configured, or use the config_port command to query the port. Configure the port using the config_TL1 or config_port command. When the port has been configured, enable the TIRDMONP/TIRDMONB (TL1 Primary Router Service/TL1 Backup Router Service) using the config_TL1 tool. See "TL1: Setting Up X.25, VCP and X.3PAD" on page 14-10.

Reason-2
The TIRDMONP/TIRDMONB process always goes to the pause state if the ProtocolId and the TL1 Interface Router Configuration have not been set up. When the ProtocolId and Router Configuration have been set up, enable the TIRDMONP/TIRDMONB (TL1 Primary Router Service/TL1 Backup Router Service) using the config_TL1 tool.

Reason-3
All X.25 based applications are enabled, the TL1 Interface Router Service ProtocolIds are set to 01/CF/AC/CC, and the Router process is enabled. Use the config_TL1 tool to determine whether the TL1 Interface Router Services are set to any of the above ProtocolIds. To correct the problem, disable the TL1 Interface Router Service, change the ProtocolId to the proper value (C7 in the case of the Primary Router service and C9 in the case of the Backup Router service), and enable the TL1 Router service.


Reason-4
LAPB is down. Use the tstatc tool to determine the status of the X.25 layer and the LAPB layer. If the LAPB layer is down, unconfigure and reconfigure X.25. See "TL1: Setting Up X.25, VCP and X.3PAD" on page 14-10. See "LAPB: LAPB is Dropping" on page 14-7. See "LAPB: LAPB Problems" on page 14-7.

Reason-5
The X.25 driver is continuously messaging. From time to time, the chip that drives the X.25 gets stuck and continuously generates SABM, RNR and/or REJ messages. Use the tstatc tool to determine the status of the LAPB connection. Check the transmit message counts for SABM, RNR and REJ. If the counts are continuously rising at more than 1 message/second, use the x25init command to re-initialize the X.25 driver. Another work-around is to disable and enable the X.25 settings on Port B.

Reason-6
The X.25 parameters are not set correctly at both the OS and the OPC. For X.25 to work properly, all settings at the OPC and the OS must match and be set appropriately. A protocol analyzer may be necessary to determine whether this is the problem. The X.25 supplier will provide the proper X.25 parameter settings.

Reason-7
The X.25 parameters have been changed at the OPC but the changes are not running. The contents of the /etc/x25init_scc0 file have been manually altered, but the port has not been initialized to use the new values. This is done most commonly to quickly change the X.121 address of the OPC. The proper way to correct this problem is to unconfigure X.25, change the X.25 parameters, reconfigure X.25 and re-initialize TL1X.25SM. A faster alternative is to use the x25init command. See "TL1: Setting Up X.25, VCP and X.3PAD" on page 14-10.

Reason-8
The TIRDMONP/TIRDMONB can't establish a connection to the RemoteOPC. In such cases, check that the Cnet/SDCC/Ethernet is present between the GatewaySOC and RemoteOPC SOC. An OS at the specified X.121 address is trying to connect to the OPC, but the OPC is rejecting the call because the X.121 address is not on the VCP list. Use the config_TL1 or vcpinfo tools to list and add an X.121 address to the OPC. See "Complete set of OPC Tools" on page A-1.


Reason-9 An OS at the specified X.121 address is trying to connect to the Gateway OPC, but the OPC is rejecting the call because the X.121 address is not on the VCP list. Use the config_TL1 or vcpinfo tools to list and add an X.121 address to the OPC. See Complete set of OPC Tools on page A-1.
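The rate test in Reason-5 (transmit counts for SABM, RNR and REJ rising at more than 1 message/second) can be sketched as below. This is only an illustration: it assumes the counts have been read by hand from two runs of the status tool and typed into dictionaries, and the function name and input format are hypothetical, not part of any OPC tool.

```python
from typing import Dict

# Frame types and the 1 message/second threshold come from Reason-5.
WATCHED_FRAMES = ("SABM", "RNR", "REJ")
THRESHOLD_PER_SECOND = 1.0


def stuck_frames(first: Dict[str, int], second: Dict[str, int],
                 interval_seconds: float) -> Dict[str, float]:
    """Return the frame types whose transmit count rose faster than the threshold.

    `first` and `second` are transmit counts noted from two readings taken
    `interval_seconds` apart (hypothetical, hand-entered input).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for frame in WATCHED_FRAMES:
        rate = (second.get(frame, 0) - first.get(frame, 0)) / interval_seconds
        if rate > THRESHOLD_PER_SECOND:
            rates[frame] = rate
    return rates


if __name__ == "__main__":
    earlier = {"SABM": 120, "RNR": 40, "REJ": 7}
    later = {"SABM": 420, "RNR": 41, "REJ": 7}
    # SABM rose by 300 in 60 seconds, which is above the threshold.
    print(stuck_frames(earlier, later, interval_seconds=60.0))
```

A non-empty result suggests the driver is stuck and that re-initializing it, as described in Reason-5, is worth trying.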

14.2.7 TL1 configuration for TL1 Over TCP/IP: Error while configuring or deleting the configuration

The config_TL1 tool displays the corresponding error message when an attempt is made to configure (or unconfigure) a port for TL1 TCP/IP operations.

Reason-1 The error Inetd not found indicates that the Internet daemon inetd is running but cannot be reconfigured. Next step: run /etc/inetd -k to kill the inetd daemon, then restart the daemon with /etc/inetd -l. Continue with the configuration.

Trouble-shooting for initiating and cancelling ESWD (Electronic Software Delivery) using the TL1 commands - STA-ESWD, CANC-ESWD

14.2.8 STA-ESWD is rejected and ESWD cannot be started

Reason-1 The STA-ESWD is rejected with the error message - ICNV, Input, Command Not Valid on a Backup OPC. This command must always be issued on the primary OPC. Check whether the OPC is primary, and issue the command on the primary OPC to which the software is to be delivered.

Reason-2 The STA-ESWD is rejected with the error message - SROF, OPC Is In Inactive State. The OPC, though primary, has been rendered inactive, possibly due to an OPC switch activity. Perform an OPC switch to make the primary OPC active, and then issue the STA-ESWD command.


Reason-3 The STA-ESWD is rejected with the error message - SROF, No Disk Space. Insufficient disk space is the reason for this rejection. For software delivery to occur, the free disk space on the OPC must exceed the size of the software release by 5 MEG or more. The release.info file, which is the first file transferred from the remote software repository, is used to extract the names of all the files in the software release. The file sizes are added up, and a check is made to see whether a buffer of 5 MEG exists in addition to the space needed for the software release. If the disk space is not sufficient, the above error message is displayed.

Reason-4 The STA-ESWD is rejected with the error message - SROF, OPC to Remote Server Communication Error. The connectivity between the remote software repository and the OPC to which software is to be delivered cannot be established. Establish the connectivity and issue the STA-ESWD again.

Reason-5 The STA-ESWD is rejected with the error message - SROF, NSAP Address Could Not Be Retrieved. The NET address specified in the STA-ESWD command is not recognised. Verify that the NET is the NET of the software repository containing the software release to be delivered.

Reason-6 The STA-ESWD is rejected with the error message - IPEX, Both NET and RTID cannot appear in the command. Both the Remote TID and the NET address are specified in the STA-ESWD command. Make sure that only one of the two is entered in the command.

Reason-7 The STA-ESWD is rejected with the error message - SROF, Requested Software Already Delivered. The software that is specified to be delivered is already present on the OPC.


Reason-8 The STA-ESWD is rejected with the error message - SROF, Product/Release Name Mismatch. The software to be delivered, as specified in the command, is older than what is currently on the system. For the software to be delivered, it must be more recent than the version of the software currently present on the system.

Reason-9 The STA-ESWD is rejected with the error message - SROF, Product/Release Name Mismatch. The release number given in the command is different from what is present in the release.info file on the remote server. Verify that the right release is specified in the command.

Reason-10 The STA-ESWD is rejected with the error message - SROF, Product/Release Name Mismatch. The software that is specified in the command is not for the product installed on the OPC. Verify that the software release to be delivered is the correct one for this OPC.

Reason-11 The STA-ESWD is rejected with the error message - SROF, Remote Server/Directory path Not Valid. The directory path specified in the command is not valid. Find out the directory path containing the software release to be delivered.

Reason-12 The STA-ESWD is rejected with the error message - IITA, The TID should be OPC TID. The TID specified in the command is not an OPC TID. The TID should be the TID of the OPC to which the software is being delivered. To determine the OPC TID, type TIDMAP at the OPC prompt.


Reason-13 The STA-ESWD is rejected with the error message - SROF, Another Software Delivery In Progress. Another software delivery is already in progress. There can only be one software delivery occurring to a particular OPC at a given point of time. Make sure that there is only one software delivery occurring.
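The disk-space rule described under Reason-3 of this section (the sizes of the files listed in release.info, plus a 5 MEG buffer) amounts to simple arithmetic. The sketch below shows it under stated assumptions: 1 MEG is taken to be 1024 * 1024 bytes, the file sizes are already known as a list of integers, and the function names are hypothetical. It does not parse release.info, whose format is not described here.

```python
from typing import Iterable

# 5 MEG buffer from the text; 1 MEG = 1024 * 1024 bytes is an assumption.
BUFFER_BYTES = 5 * 1024 * 1024


def required_bytes(file_sizes: Iterable[int], buffer_bytes: int = BUFFER_BYTES) -> int:
    """Total space needed: the sum of the release file sizes plus the buffer."""
    return sum(file_sizes) + buffer_bytes


def enough_disk_space(file_sizes: Iterable[int], free_bytes: int,
                      buffer_bytes: int = BUFFER_BYTES) -> bool:
    """True when the free space covers the release and the buffer."""
    return free_bytes >= required_bytes(file_sizes, buffer_bytes)


if __name__ == "__main__":
    mib = 1024 * 1024
    release = [1 * mib, 2 * mib]               # hypothetical file sizes
    print(required_bytes(release) // mib)      # release (3) + buffer (5)
    print(enough_disk_space(release, free_bytes=7 * mib))
```

With the hypothetical 3 MEG release above, 8 MEG of free space is needed, so 7 MEG free would lead to the No Disk Space rejection.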

14.2.9 STA-ESWD accepted, ESWD initiated and then aborted

Reason-1 The primary OPC that was active when ESWD was initiated has been rendered inactive, possibly due to an OPC swact. This is known from the REPT SW message on the tl1shell. Restore the primary OPC to active state for ESWD to resume from where it left off.

Reason-2 ESWD aborted with the message - SROF, OPC to Remote Server Connection Lost. The connectivity between the OPC and the remote server has been lost. Reestablish connectivity for ESWD to resume from where it left off.

Reason-3 ESWD aborted with the message - ESWD Aborted - No Disk Space. When ESWD was started, there was sufficient disk space. Subsequently, the available disk space was reduced for some reason other than software delivery, and ESWD was aborted as a result. Make sure the disk space is as required for ESWD. That is, there must be 5 MEG plus the amount of disk space needed for the software release to be transferred.

Reason-4 ESWD aborted with the message - ESWD Aborted - Checksum Failed. The checksum of the file that was transferred does not match the one specified in the release.info file because the checksum calculation failed. Remove the software release from the software repository and transfer the software release from the software release tape to the software repository again.


Reason-5 ESWD aborted with the message - ESWD Aborted - Checksum Failed. The checksum of the file that was transferred does not match the one specified in the release.info file. The file could be corrupted. As in Reason-4, remove the software release from the software repository and transfer the software release from the software release tape to the software repository again.
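The checksum comparison behind Reason-4 and Reason-5 can be illustrated as follows. The checksum algorithm used by ESWD is not stated in this guide, so CRC-32 from the Python standard library is used purely as a stand-in, and the expected value is passed in directly rather than read from release.info. All function names are hypothetical.

```python
import zlib
from typing import Iterable, Iterator


def crc32_of_chunks(chunks: Iterable[bytes]) -> int:
    """Fold CRC-32 over a sequence of byte chunks (stand-in algorithm)."""
    value = 0
    for chunk in chunks:
        value = zlib.crc32(chunk, value)
    return value & 0xFFFFFFFF


def read_chunks(path: str, chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield the contents of a file in fixed-size chunks."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def transfer_is_intact(path: str, expected: int) -> bool:
    """Compare the checksum of the transferred file with the listed value."""
    return crc32_of_chunks(read_chunks(path)) == expected


if __name__ == "__main__":
    # Chunked and one-shot calculations agree on the same data.
    print(crc32_of_chunks([b"ab", b"cd"]) == zlib.crc32(b"abcd"))
```

Reading in chunks keeps memory use flat for large release files. A mismatch only shows that the transferred file differs from what the listing expects, which is why both reasons recommend reloading the release from tape.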

14.2.10 CANC-ESWD rejected

Reason-1 CANC-ESWD rejected with the message - SROF, Status, Requested Operation Not Active. The combination of the directory path and the Remote TID / NET specified is not valid. Determine the RTID or NET of the software repository and the directory path of the software release for which the software delivery is occurring, and reissue the CANC-ESWD command. Alternatively, issue the CANC-ESWD command without any fblock parameters. If there is an ESWD in progress, this command will cancel it.

Reason-2 CANC-ESWD rejected with the message - IITA, The TID should be OPC TID. The TID specified is not that of the OPC. The TID should be the TID of the OPC to which the software is being delivered. To determine the OPC TID, type TIDMAP at the OPC prompt.

Reason-3 CANC-ESWD rejected with the message - ICNV, Input, Command Not Valid on a Backup OPC. CANC-ESWD has been issued on the backup OPC. This is an invalid action. Issue the command on the active primary OPC.

Reason-4 The OPC was rendered inactive, possibly due to an OWS swact. This is accompanied by a TL1 REPT SW message. Make the primary OPC active and then issue the CANC-ESWD.


Reason-5 CANC-ESWD is rejected with the message - IPEX, Both NET and RTID cannot appear in the command. Both the NET and the Remote TID have been specified in the command. Make sure only one of them is specified and issue the command again.

Reason-6 CANC-ESWD is rejected with the message - SROF, Status, Requested Operation Not Active. The software delivery specified by the command is not active.

Reason-7 CANC-ESWD is rejected with the message - SROF, NSAP Address Could Not Be Retrieved. The specified NET is not valid. Specify the NET of the software repository from which the software delivery is occurring and issue the command again. Alternatively, issue the CANC-ESWD command without any fblock parameters. If there is an ESWD in progress, this command will cancel it.
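The IPEX rule that applies to both STA-ESWD and CANC-ESWD (NET and RTID may not both appear) is easy to check before a command is sent. The sketch below is a hypothetical pre-check, not part of the OPC software. It only models the one rule stated in this chapter and does not build or parse TL1 command syntax.

```python
from typing import Optional


def selector_error(net: Optional[str] = None, rtid: Optional[str] = None) -> Optional[str]:
    """Return "IPEX" when both NET and RTID are given, otherwise None.

    Passing neither is allowed here because CANC-ESWD may be issued
    without any fblock parameters.
    """
    if net and rtid:
        return "IPEX"
    return None


if __name__ == "__main__":
    # Hypothetical values, used only to exercise the rule.
    print(selector_error(net="EXAMPLE-NET", rtid="EXAMPLE-RTID"))
    print(selector_error(rtid="EXAMPLE-RTID"))
```

The first call reports IPEX and the second reports no error, matching the behaviour described in Reason-5.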

Trouble-shooting for establishing and dropping association over TL1 over 7 Layers using TL1 commands

14.2.11 Association cannot be established over 7 Layers

Reason-1 The first command given from the OS is not an ACT-USER command. The user is notified with the error message PLNA, Privilege, Login Not Active. The association is activated only by an ACT-USER command, so the first command for a TL1 over 7 Layers association must be ACT-USER.

Reason-2 The ACT-USER command is rejected with the error message - IITA Input, Invalid TArget identifier. The user needs to provide the correct OPCTID in the ACT-USER command. The error occurs because TARP on the GNE is not able to resolve the OPCTID given in the ACT-USER command.

Reason-3 The ACT-USER command is rejected with the error message - IITA Input, Invalid TArget identifier. The GNE is not able to see the OPC, i.e. the OPC might not be present in the network and the association request from the GNE is not able to reach the OPC.

Reason-4 The ACT-USER command is rejected with the error message - IITA Input, Invalid TArget identifier. TARP is not enabled on the target OPC. As a result, the OPC cannot provide its NSAP to the GNE for the association to be established. The user needs to enable TARP on the target OPC.

Reason-5 The ACT-USER command is rejected with no error message, only a DENY response. The DENY response is generated by the OPC and means that an illegal UID/PID has been specified in the ACT-USER command. Syslogs are also generated to flag this incident.

Reason-6 The ACT-USER command with NMAPROC, OPSPROC or BTHPROC users is rejected with the error message - PIRC User not an administrator. The ACT-USER has already been done on the OPC as one of the NMAPROC, OPSPROC or BTHPROC users, and the association has already been established between the GNE and the OPC. A second ACT-USER can only use a UID in the ADMIN group or a ROOT UID. It will not accept a UID from any of the non-admin groups.

Reason-7 The ACT-USER command is rejected with the error message - SARB Status, All Resources Busy. When the GNE tries to invoke the OPS/BTH interface on an inactive OPC, the user is notified with this error message. Make sure the OPC is active before invoking BTH/OPS sessions on TL1 over 7 Layers.

Reason-8 The ACT-USER command is rejected with the error message - SSRE Status, System Resources Exceeded. This message appears if four sessions (NMAs and OPSs) are already running on the OPC and at least one of them is not coming over TL1 over 7 Layers. If all four sessions are coming over the 7 Layer association, this message does not appear.

14.2.12 Association Established through ACT-USER and then dropped

Reason-1 No error message will be generated under this scenario from the Responder. This will happen if OPS or BTH session is started on the OPC and OPC goes through a SWACT. In that case the association for OPS or BTH sessions will be dropped.

Reason-2 The association may drop because of problems in the Data Comms (SDCC etc.) or because the Data Comms go down. In this case the Initiator (GNE) should inform the user that the session has been dropped. The OPC cannot send a message to the GNE because there is no physical connectivity between it and the GNE.


14.2.13 Association not Dropped by CANC-USER

Reason-1 The CANC-USER command is rejected with the error message - PIUI, Privilege, Illegal User Identity. The UID specified in the CANC-USER command is not the same as the one given in the ACT-USER command that established the association over 7 Layers.


Chapter 15

Configuration Manager

15.1 Problem Description


The following sections describe common configuration (general, linear and ring) problems. Each problem will be described and possible diagnostic reasons will be provided. Available work-arounds will be provided at the end of each problem reason description.

15.1.1 SCM: Cannot Save Configuration Data to NEs

Unable to save the configuration data.

Reason-1 The Configuration Manager is running on a backup OPC. The Configuration Manager runs in read-only mode on a backup OPC. Convert the OPC to a primary OPC or make the changes at a primary OPC.

15.1.2 SCM: Cannot Send Configuration Data to NE

After the save button is selected, the Configuration Manager fails to apply the configuration data at the NE.

Reason-1 Association to the NE is down. While the associations are down, configuration changes cannot be sent to the NE. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-2 The NE is not commissioned for Ring_ADM, ADM or terminal capability. Open the Commissioning Manager and ensure the NE is commissioned for Ring_ADM, ADM or terminal operation. If the NE is to operate in a ring, commission the NE as a Ring_ADM. If the NE is to operate in a linear chain, commission the NE as an ADM or terminal.

15.1.3 SCM: Cannot Remove a Configuration

Cannot remove a linear or ring configuration from the General Configuration Manager.


Reason-1 There are NEs still defined in the Linear or Ring Configuration Manager. Delete the NEs from the Linear or Ring Configuration Manager before deleting the configuration from the General Configuration Manager.

Reason-2 There are STS connections still defined in the configuration. Open the Connection Manager and delete all connections for the configuration to be deleted. Note that this will cause loss of traffic on the deleted connections. Open the Configuration Manager, delete all NEs in the configuration and delete the configuration.

15.1.4 SCM: Configuration Manager Doesn't Start

The Linear or Ring Configuration Manager won't start.

Reason-1 The Connection Manager is running. The Linear or Ring Configuration Managers and the Connection Manager cannot be running at the same time. Enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users asking them to log off or close tools. See USM: Using the wall and write commands on page 1-13.

Reason-2 A critical MSR necessary to the Configuration Manager is in the MANUAL or SYSTEM BUSY state. The opc bandwidthmgr MSR is required for the operation of the Configuration Manager. If the opc bandwidthmgr MSR is not INSERVICE:AVAILABLE, re-initialize the MSR.

Reason-3 The executables do not have the proper permissions or are missing. Perform the following commands:

ll /iws/gcm/gcmmxt (general configuration manager for xterminal)
ll /iws/gcm/gcmmlct (general configuration manager for vt100)
ll /iws/lcm/lcmlxt (linear configuration manager for xterm)
ll /iws/lcm/lcmllct (linear configuration manager for vt100)
ll /iws/scm/scmxt (ring configuration manager for xterm)
ll /iws/scm/scmlct (ring configuration manager for vt100)


If the files are not executable, use the chmod 755 <executable> command to change the permissions of the files. If the files are missing, retrieve another copy from the OPC install tape or from the backup OPC.

Reason-4 The configuration database has been corrupted. This is extremely rare. Perform a view /var/log/syslog and enter ?OPCDB to search for database errors. If the configuration database is corrupted, it will be necessary to reset the configuration part of the OPC database.

15.1.5 SCM: Scheduled Configuration Audit Fails

The scheduled Configuration Audit fails.

Reason-1 A critical MSR necessary to the Configuration Manager is in the MANUAL or SYSTEM BUSY state. The opc bandwidthmgr MSR is required for the operation of the Configuration Manager. If the opc bandwidthmgr MSR is not INSERVICE:AVAILABLE, re-initialize the MSR.

Reason-2 The executable does not have the proper permissions or is missing. Perform the following command:

ll /iws/scm/scmcaudt (ring configuration audit)

If the file is not executable, use the chmod 755 <executable> command to change the permission of the file. If the file is missing, retrieve another copy from the OPC install tape or from the backup OPC.

Reason-3 The configuration database has been corrupted. This is extremely rare. Perform a view /var/log/syslog and enter ?OPCDB to search for database errors. If the configuration database is corrupted, it will be necessary to reset the configuration part of the OPC database.

Reason-4 Association to the NE is down. While the associations are down, configuration audits cannot be sent to the NE. See CAM: Associations are Down or are Unstable on page 2-4.
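The ll checks in 15.1.4 and 15.1.5 look for two conditions: the file is missing, or it lacks the execute permissions that chmod 755 would grant. A minimal sketch of the same check is shown below. It assumes a Python interpreter is available on the machine where the check is run, which this guide does not state, and the function names are hypothetical. The paths are the ones listed in the text.

```python
import os
import stat

# Executables named in 15.1.4 and 15.1.5.
CONFIG_MANAGER_FILES = [
    "/iws/gcm/gcmmxt", "/iws/gcm/gcmmlct",
    "/iws/lcm/lcmlxt", "/iws/lcm/lcmllct",
    "/iws/scm/scmxt", "/iws/scm/scmlct",
    "/iws/scm/scmcaudt",
]

# chmod 755 sets execute permission for user, group and other.
EXECUTE_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def classify_mode(mode: int) -> str:
    """Return "ok" when all three execute bits are set, else "not executable"."""
    return "ok" if mode & EXECUTE_BITS == EXECUTE_BITS else "not executable"


def check_executable(path: str) -> str:
    """Return "missing", "not executable" or "ok" for one path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    return classify_mode(mode)


if __name__ == "__main__":
    for path in CONFIG_MANAGER_FILES:
        print(path, check_executable(path))
```

A file reported as "not executable" calls for the chmod 755 step, and one reported as "missing" must be restored from the install tape or the backup OPC, as described above.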

15.1.6 SCM: Scheduled Configuration Audit Mismatch

The configuration audit log in the Event Browser indicates that a mismatch exists. The scheduled configuration audit has found configuration mismatches on the NEs.


Reason-1 The NE did not receive the configuration changes. Perform a configuration audit to correct the problem.

Reason-2 The NE was restored with an older database which did not have the same configuration data. Perform a configuration audit to correct the problem.
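Several reasons in this chapter call for opening /var/log/syslog and searching for OPCDB. Where the log has been copied to a workstation, the same search can be sketched as below. The function names are hypothetical, and the sketch simply lists every line containing the string. It does not interpret the log entries.

```python
from typing import Iterable, List, Tuple


def matching_lines(lines: Iterable[str], needle: str = "OPCDB") -> List[Tuple[int, str]]:
    """Return (line number, text) for every line containing `needle`."""
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if needle in line]


def search_log(path: str = "/var/log/syslog", needle: str = "OPCDB") -> List[Tuple[int, str]]:
    """Search a log file, replacing any undecodable bytes."""
    with open(path, "r", errors="replace") as handle:
        return matching_lines(handle, needle)


if __name__ == "__main__":
    # Made-up sample lines, used only to exercise the search.
    sample = ["startup complete\n", "OPCDB sample entry\n", "shutdown\n"]
    for number, text in matching_lines(sample):
        print(number, text)
```

An empty result suggests the database is not the cause and the other reasons should be checked first.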

15.2 Solution Description


15.2.1 SCM: Retrieving Configuration and Connection Data

The user requests the network configuration and connection layout in a readable format. See STP: Retrieving Configuration and Connection Data on page 16-8.


Chapter 16

Connection Manager

16.1 Problem Description


The following sections describe common Connection Manager problems. Each problem will be described and possible diagnostic reasons will be provided. Available workarounds will be provided at the end of each problem reason description.

16.1.1 STP: Cannot Send Connection Data to NEs

After the connection data is entered, the connection changes cannot be applied to the NEs.

Reason-1 Association to the NE is down. While the associations are down, connection changes cannot be sent to the NE. See CAM: Associations are Down or are Unstable on page 2-30.

Reason-2 The NE is not configured for Ring_ADM, ADM or terminal operation. Open the Configuration Manager and ensure the NE is configured for Ring_ADM, ADM or terminal operation. If the NE is to operate in a ring, configure the NE as a Ring_ADM. If the NE is to operate in a linear chain, configure the NE as an ADM or terminal.

Reason-3 The NE is not commissioned for Ring_ADM, ADM or terminal capability. Open the Commissioning Manager and ensure the NE is commissioned for Ring_ADM, ADM or terminal operation. If the NE is to operate in a ring, commission the NE as a Ring_ADM. If the NE is to operate in a linear chain, commission the NE as an ADM or terminal.

16.1.2 STP: Connection Manager Doesn't Start

The Connection Manager won't start.

Reason-1 The OPC has not been commissioned, or no configurations exist. If no configurations exist and at least one NE is configured for terminal operation, then the user will have access only to the Nodal Provisioning dialogs. This is the exception to the rule.


Reason-2 The Linear or Ring Configuration Manager is running. The Connection Manager and the Linear or Ring Configuration Manager cannot be running at the same time. Enter the who command to see who is logged into the OPC. The wall or write command can be used to send messages to other users asking them to log off or close tools. See USM: Using the wall and write commands on page 1-23.

Reason-3 A critical MSR necessary to the Connection Manager is in the MANUAL or SYSTEM BUSY state. The opcbandwidthmgr MSR is required for the operation of the Connection Manager. If the opcbandwidthmgr MSR is not INSERVICE:AVAILABLE, re-initialize the MSR.

Reason-4 The executables do not have the proper permissions or are missing. Perform the following commands:

ll /iws/stp/stpxt (xterminal connection manager)
ll /iws/stp/stplct (vt100 connection manager)
ll /iws/stp/stpcaudt (connection audit)

If the files are not executable, use the chmod 755 <executable> command to change the permissions of the files. If the files are missing, retrieve another copy from the OPC install tape or from the backup OPC.

Reason-5 The connection database has been corrupted. This is extremely rare. Perform a view /var/log/syslog and enter ?OPCDB to search for database errors. If the sts database is corrupted, it will be necessary to reset the sts part of the OPC database.

16.1.3 STP: Unable to provision connections on an active Primary OPC

The add, edit, and delete options are all disabled.

Reason-1 The read-only version, the Connection Mngr (R) tool, is running. Check the title (upper left hand corner) of the connection manager anchor window. If this is the case, restart the connection manager with one of the following three tools: Connection Mngr, Connection Mngr (AC), or Connection Mngr (BP).

Reason-2 CS Base is busy processing another request, for example, processing a discovery audit or provisioning other connections for other user sessions. In this case, wait for CS Base to free up by ensuring associations are stable (they do not have to be established) and/or ensure that only one application is performing connection services functions (no potential collision of requests to access the CS Database).

16.1.4 STP: Unable to provision connections on an active Backup OPC

The add, edit, and delete options are all disabled.

Reason-1 The Connection Mngr (BP) is not running. Check the title (upper left hand corner) of the connection manager anchor window. Restart the connection manager using the Connection Mngr (BP) tool.

Reason-2 CS Base is busy processing another request, for example, processing a discovery audit or provisioning other connections for other user sessions. In this case, wait for CS Base to free up by ensuring associations are stable (they do not have to be established) and/or ensure that only one application is performing connection services functions (no potential collision of requests to access the CS Database).

16.1.5 STP: Connection Audit Fails

The Connection Audit fails.

Reason-1 A critical MSR necessary to the Connection Manager is in the MANUAL or SYSTEM BUSY state. The opcbandwidthmgr MSR is required for the operation of the Connection Manager. If the opcbandwidthmgr MSR is not INSERVICE:AVAILABLE, re-initialize the MSR.

Reason-2 The executable does not have the proper permissions or is missing. Perform the following command:

ll /iws/stp/stpcaudt (connection audit)

If the file is not executable, use the chmod 755 <executable> command to change the permissions of the file. If the file is missing, retrieve another copy from the OPC install tape or from the backup OPC.

Reason-3 The connection database has been corrupted. This is extremely rare. Perform a view /var/log/syslog and enter ?OPCDB to search for database errors. If the sts database is corrupted, it will be necessary to reset the sts part of the OPC database.

Reason-4 Association to the NE is down. While the associations are down, connection audits cannot be sent to the NE.


See CAM: Associations are Down or are Unstable on page 2-30.

Reason-5 CS Base is busy processing another request, for example, processing a discovery audit or provisioning other connections for other user sessions. In this case, wait for CS Base to free up by ensuring associations are stable (they do not have to be established) and/or ensure that only one application is performing connection services functions (no potential collision of requests to access the CS Database).

16.1.6 STP: Audit Mismatch

The audit log in the Event Browser indicates that a mismatch exists. The scheduled audit has found mismatches on the NEs.

Reason-1 Connections were changed on the backup OPC but the changes were not applied on the primary OPC. Make the same changes on the primary OPC or perform a data sync from the backup OPC to the primary OPC. See ODS: Data Sync from Backup to Primary on page 2-34.

Reason-2 Association to the NE was down when the connection data was sent. While the associations are down, connection changes cannot be sent to the NE. Perform an audit to correct the problem.

Reason-3 The NE was restored with an older database which did not have the same connection data. Perform an audit to correct the problem.

Reason-4 TL1 OPS/INE was used to add/delete/edit the STS connections. The TL1 STS connection services and the OPC Connection Manager are mutually exclusive. Because the Connection Manager has no means of determining what changes were performed through TL1 OPS/INE, it will not have the proper connection data when it starts the connection audit, and the audit will fail. The work-around is to re-add the connection information using the Connection Manager. The added information must match exactly the data provided through TL1 OPS/INE, including the path the connection is to take and the STS channel used. NOTE: failure to match the connection exactly will result in traffic loss. Use this as a last resort.


16.1.7 STP: No option to correct an audit mismatch

After an audit, the only two options are Yes and No to view the audit mismatch.

Reason-1 The audit was not performed with the Connection Mngr (AC) tool. Check the title (upper left hand corner) of the connection manager anchor window. Restart the audit using the Connection Mngr (AC) tool.

16.1.8 STP: Cannot Add a Matched Node Connection

When attempting to add a Matched Node connection between rings, the Secondary Gateway change window rejects the entries.

Reason-1 The connection is not on a synchronous tributary at the gateway. Only STS connections can be used for Matched Node Rings. For the OC48 products, these include STS channels on the following cards: STS-1, OC-3 and OC-12. OC12 products support matched node connections on the STS-1 circuit pack only.

Reason-2 A Secondary gateway NE lies on the same A_term to Z_term path of the Primary gateway NEs. Change the Secondary gateway NE to an NE which is not on the path between the A_term and Z_term Primary gateway NEs or change the path between the A_term and Z_term Primary gateway NEs such that the Secondary gateway NE(s) are no longer on the path.


[Figure: three ring diagrams showing the positions of A_term, Z_term and Z_sec_term]

Diagram 1: This is wrong. The Secondary gateway NE is on the path between the Primary gateway NEs.

Diagram 2: This is correct. The Secondary gateway has been moved off the path between the Primary gateway NEs.

Diagram 3: This is correct. The path between the Primary gateway NEs has been changed to bypass the Secondary gateway NE.
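The rule in Reason-2 can be stated as a membership test: no Secondary gateway NE may appear on the ordered path from A_term to Z_term. The sketch below illustrates that test only. The path is represented as a hypothetical list of NE names, the name NE_2 is invented for the example, and nothing here reflects how the Connection Manager stores or validates paths internally.

```python
from typing import List, Sequence


def secondaries_on_path(path: Sequence[str], secondary_gateways: Sequence[str]) -> List[str]:
    """Return the Secondary gateway NEs that lie on the primary path.

    `path` is the ordered list of NEs from A_term to Z_term inclusive
    (hypothetical representation). An empty result means the rule holds.
    """
    on_path = set(path)
    return [ne for ne in secondary_gateways if ne in on_path]


if __name__ == "__main__":
    # Wrong: Z_sec_term sits between the Primary gateway NEs.
    print(secondaries_on_path(["A_term", "Z_sec_term", "Z_term"], ["Z_sec_term"]))
    # Correct: the path bypasses the Secondary gateway NE.
    print(secondaries_on_path(["A_term", "NE_2", "Z_term"], ["Z_sec_term"]))
```

Either remedy in Reason-2 (choosing a different Secondary gateway NE, or rerouting the primary path) makes the returned list empty.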

Reason-3 CS Base is busy processing another request, for example, processing a discovery audit or provisioning other connections for other user sessions. In this case, wait for CS Base to free up by ensuring associations are stable (they do not have to be established) and/or ensure that only one application is performing connection services functions (no potential collision of requests to access the CS Database).

16.1.9 STP: Unable to add a nodal cross-connect

The user is unable to add a nodal cross-connect to a specific NE in the Nodal Add dialog.

Reason-1 The NE cannot be selected from the chooser list or is invalidated when manually entered. Nodal cross-connects can only be added to NEs that are in the OPC's Span Of Control (commissioned NEs).


Reason-2 The NE doesn't belong to a configuration. All Ring_ADM NEs must exist in a configuration before they will be selectable from the Nodal Add dialog's chooser list. Linear NEs are exempted from this rule, but no end-to-end information will be available unless the NE belongs to a configuration.

Reason-3 There is no association to the NE. Nodal cross-connects can only be added when there is an association between the NE and the OPC (i.e. no pre-provisioning).

Reason-4 CS Base is busy processing another request, for example, processing a discovery audit or provisioning other connections for other user sessions. In this case, wait for CS Base to free up by ensuring associations are stable (they do not have to be established) and/or ensure there is only one application performing connection services functions (no potential collision of requests to access the CS Database).

16.1.10 STP: Unable to delete connections

The user is unable to delete connections.

Reason-1 CS Base is busy processing another request, for example, processing a discovery audit or provisioning other connections for other user sessions. In this case, wait for CS Base to free up by ensuring associations are stable (they do not have to be established) and/or ensure there is only one application performing connection services functions (no potential collision of requests to access the CS Database).

16.1.11 STP: Unable to provision a DCP connection

The user is unable to provision a DCP connection, and it is not obvious why the connection is being invalidated.

Reason-1 The SAP and Secondary Gateway cannot be provisioned on the same node for a DCP secondary path. For example, if the SAP is Primary NE A, and the user is provisioning a DCP connection with the gateway on Secondary NE A, the same NE cannot be used at both the Primary A and Secondary A.

Reason-2 The Complimentary Rule. The OPC Connection Manager will restrict the provisioning of two unidirectional matched node connections that drop or add with a DCP protection scheme at the same node and channel on opposing circuit pack groups. For example, suppose a unidirectional DCP connection was provisioned from NE A (10) to NE Z (20) to Secondary NE Z (30) on Channel 1. Now suppose the user tries to create a second unidirectional DCP connection from NE A (30) to NE Z (20) to Secondary NE Z (10) on Channel 1. The connection would be


invalidated. On both sides of the Primary Z NE, traffic is entering on Channel 1 on an optic on one side of the NE and leaving on the opposing cpg on channel 25 on the other side of the NE. If a protection switch were to occur, traffic being moved on to the protection channel would collide with traffic on the secondary gateway DCP channel.

Reason-3 The Hardware Provisioning Rule. A DCP connection will be invalidated if the user tries to provision a connection on the working Rx timeslot on one side of an NE where the extra traffic Rx timeslot is already in use on the optic on that side of the NE. The reason is that the traffic being received on the working Rx timeslot is actually moved to the extra traffic Rx timeslot and transmitted on the extra traffic Tx timeslot due to a limitation in the hardware.

16.1.12 STP: Bandwidth unavailable

When provisioning end-to-end connections, if the requested channel is not available (verified using the Channel Usage Dialog), yet no end-to-end connections appear to be using the channel, it is quite likely that the bandwidth is in use by a nodal cross-connect.

Reason-1 A nodal cross-connect has been provisioned on the channel in question.

Reason-2 An in-service connection rollover is in progress (which generates uni-directional and bi-directional nodal cross-connects to bridge the connection).

16.1.13 STP: Uni-directional Nodal Cross-Connects present on TBM OC-12

The nodal provisioning dialog lists uni-directional cross-connects, yet the TBM OC-12 product does not support provisioning of uni-directional connections.

Reason-1 An in-service connection rollover is in progress (which generates uni-directional and bi-directional nodal cross-connects to bridge the connection). Use the Filter option on the Connection Manager main inventory list to show connections that are in progress of an in-service rollover.

16.2 Solution Description


16.2.1 STP: Retrieving Configuration and Connection Data

The user requests the network configuration and connection layout in a readable format.

Dumping the Configuration and Connection Data:


1) Log in to the OPC as root.
2) Enter /iws/obm/scancs -h to get the help file.
3) The help file indicates the options that are available.

16.2.2 STP: Viewing the Protection State of a Matched Node Ring

On Matched Node Rings, both the Connection Manager and Network Summary tools are required to view the protection status of the rings. The Network Summary tool can be used to view the optical protection status within the ring, and the Connection Manager can be used to view the STS protection status of the connections between the rings.

Viewing Optical Protection Status Within A Ring
1) Log in to the OPC as netsurv.
2) Open the Network Summary tool.
3) From the selection menu, open the Protection menu item.
4) Select the appropriate ring configuration.
5) The Optical Protection Status window for the selected ring configuration is displayed.

Viewing STS Protection Status at Primary Gateways
1) Log in to the OPC as admin.
2) Open the Connection Manager tool.
3) From the selection menu, open the Utilities menu item.
4) Select the Show primary gateway selectors item.
5) In the Primary Gateway Selector Status window, select the appropriate ring configuration and NEid.
6) The Primary Gateway Selector Status window will display the Primary and Secondary Feed Protection State.

16.2.3 STP: Viewing Mismatched Connection Data

When a connection data audit reports Mismatch, use the Mismatch Details button available on the audit results dialog to view the cross-connects that are mismatching. Any cross-connects that are found on the OPC but not on the NE will be listed. Any cross-connects that are found on the NE but not on the OPC will be listed.

OPC CONNECTIONS NOT FOUND ON THE NE: Data listed in this section is easiest to manage in that it is inherently safe to correct. In fact, if ALL mismatch data was listed in this section, the user could, in full confidence, initiate a corrective audit knowing that no traffic was at risk because the correction involves only creating new cross-connects at the NE. Also, data listed in this section can be cross-referenced to the OPC Connection Manager inventory listing of end-to-end connections using fields that are common to both (e.g. Transport channel, Tributary for either the A or Z NEId). The user could, for example, use the search utility on the View Audit


Mismatch dialog to find all occurrences of transport channel endpoints (search for TP) and quickly find the full path connection containing that cross-connect by sorting the anchor window inventory by channel.

NE CONNECTIONS NOT FOUND ON THE OPC: Data listed in this section must be dealt with using caution because a corrective audit results in these cross-connects being DELETED from the NE. Data listed in this section should serve as a possible indicator for the case in which provisioning was done from an active Backup OPC (described above). If verified to be the case, the final required steps of shadowing all provisioning operations on the now active Primary OPC could be done. In any case, with this explicit audit mismatch listing now available, the user should verify that each cross-connect is indeed NOT carrying traffic; this could be done, for example, by logging into the NE and displaying the states of the facilities corresponding to each endpoint of a cross-connect.
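The two sections of the mismatch listing amount to two set differences between the OPC view and the NE view. The following Python sketch illustrates that reasoning only; it is not OPC code. The tuple representation of a cross-connect, the endpoint labels, and the function names are all assumptions made for illustration.

```python
# Hypothetical model: a cross-connect is a (from_endpoint, to_endpoint) tuple.
# Endpoint labels such as "TP 1" and "TRIB 3" are invented for this example.


def classify_mismatches(opc_xcs, ne_xcs):
    """Split audit mismatches into the two sections of the mismatch listing."""
    opc_set, ne_set = set(opc_xcs), set(ne_xcs)
    return {
        # Found on the OPC only: a corrective audit would create these at the NE.
        "opc_not_on_ne": sorted(opc_set - ne_set),
        # Found on the NE only: a corrective audit would DELETE these from the NE.
        "ne_not_on_opc": sorted(ne_set - opc_set),
    }


def safe_to_correct(mismatches):
    """True only when every mismatch is an OPC connection missing on the NE."""
    return not mismatches["ne_not_on_opc"]


opc = [("TP 1", "TRIB 3"), ("TP 2", "TRIB 4")]
ne = [("TP 2", "TRIB 4"), ("TP 5", "TRIB 9")]
result = classify_mismatches(opc, ne)
print(result["opc_not_on_ne"])   # [('TP 1', 'TRIB 3')]
print(result["ne_not_on_opc"])   # [('TP 5', 'TRIB 9')]
print(safe_to_correct(result))   # False
```

In this example the corrective audit is not inherently safe, because one cross-connect exists only on the NE and would be deleted; it should first be verified as not carrying traffic.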

16.2.4 STP: Correcting individual mismatches

To correct mismatches individually, the Nodal Provisioning Conflict dialog can be used. Select Nodal Provisioning from the Connection Manager Utilities menu option. From the Nodal Provisioning dialog, select the NE that contains the individual mismatch you want to correct. Choose Mismatch Details ... from the list item menu. The Nodal Provisioning Conflict dialog will be displayed, showing all conflicts with the selected cross-connect. To correct the conflicts, delete the cross-connects that should not exist.


Chapter 17

PM Collection

17.1 Problem Description


The following sections describe common PM Collection problems. Each problem is described and possible diagnostic reasons are provided. Available work-arounds are provided at the end of each problem reason description.

17.1.1 PM Counts not reported in TL1

A REPT-PM TL1 message reports no data, or does not appear.

Reason-1 All PM counts for the given facility type have a value of zero.

Reason-2 PM collection is disabled at the OPC (this is the default status after an installation). Use the OPC PM Collection Filter or the SET-PMMODE TL1 command to enable PM collection for the desired facility types at the OPC.

Reason-3 An INH-PMREPT TL1 command is in effect. Determine which counts are being inhibited and use the ALW-PMRPT TL1 command to allow the PM counts. See TL1: Determining Inhibited PM Counts on page 15-12.

Reason-4 Association to the NE is down. While the associations are down, PM data cannot be retrieved from the NE, and therefore cannot be reported. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-5 There has been a SWACT (switch of activity) between the primary and backup OPC. The backup OPC is now handling the TL1 messaging and the autonomous messages are being sent to the OS connected at the backup OPC. Correct the reason for the OPC SWACT and the primary OPC will revert back to the active state. See OWS: Both Primary and Backup OPCs are Active on page 2-2. See OWS: Primary OPC is Inactive on page 2-2.


Reason-6 The NE was unable to process a message during a PM collection cycle. This can occur when the NE is busy completing another task (e.g. a restart) which takes priority over processing other messages.

Reason-7 The PM collector MSR is not running. Use the drmstat tool to restart the npccoll MSR.

17.1.2 TL1 RTRV-PM command retrieves no data

A RTRV-PM TL1 command retrieves no data.

Reason-1 The requested facility does not exist, or all PM counts for the given facility type(s) have a value of zero.

Reason-2 PM collection for the requested facility type(s) is disabled at the OPC (this is the default status after an installation). Use the OPC PM Collection Filter or the SET-PMMODE TL1 command to enable PM collection for the desired facility types at the OPC.

Reason-3 Association to the NE was down for the requested time period. While association is down, PM data cannot be collected from the NE. As a result, the OPC PM database does not contain data for the requested period. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-4 A SWACT (switch of activity) between the primary and backup OPC occurred. The peer OPC was active and collecting PM data for the requested time period(s), and therefore the database on the currently active OPC does not contain this data. Correct the reason for the OPC SWACT and the primary OPC will revert back to the active state. See OWS: Both Primary and Backup OPCs are Active on page 2-2. See OWS: Primary OPC is Inactive on page 2-2.

Reason-5 The NE was unable to process a message during a PM collection cycle. This can occur when the NE is busy completing another task (e.g. a restart) which takes priority over processing other messages. In this case, the OPC PM database cannot be populated with counts for that interval.


Reason-6 The OPC date has been moved back. The OPC PM database is deleted if the previous collection time stored in the OPC PM database is more recent than the current OPC time.

Reason-7 The PM collector MSR is not running. Use the drmstat tool to restart the npccoll MSR.

Reason-8 The OPC PM database is empty. PM collection has not yet taken place, or counts were cleared with the TL1 INIT-REG command.

17.1.3 Daily counts on OPC do not match daily counts on the NE.

The value of the daily count as reported by the OPC does not match the value of the daily count as seen on the NE UI for some/all facilities.

Reason-1 The OPC starts a new daily bin at GMT midnight. The NE starts a new daily bin at midnight local NE time. Mismatches are expected when local NE time is something other than GMT.

Reason-2 PM collection for some/all facilities was disabled at the OPC for part of the period. Use the OPC PM Collection Filter or the SET-PMMODE TL1 command to enable PM collection for the desired facility types at the OPC.

Reason-3 Association to the NE was down for part of the requested time period. The OPC calculates daily counts as a sum of 15-minute counts, binned starting at GMT midnight. While association is down, PM data cannot be retrieved from the NE, and therefore the calculated daily counts in the OPC database may be inaccurate. See CAM: Associations are Down or are Unstable on page 2-4.

Reason-4 A SWACT (switch of activity) between the primary and backup OPC occurred. The peer OPC was active and collecting PM data for the requested time period(s), and therefore the PM database on the currently active OPC does not contain this data. Correct the reason for the OPC SWACT and the primary OPC will revert back to the active state. See OWS: Both Primary and Backup OPCs are Active on page 2-2. See OWS: Primary OPC is Inactive on page 2-2.


Reason-5 The NE was unable to process a message during a PM collection cycle. This can occur when the NE is busy completing another task (e.g. a restart) which takes priority over processing other messages. When this occurs, the PM database cannot be populated with counts for that interval, and that interval cannot be included in the daily bin summation.

Reason-6 A message between the OPC and NE was corrupted. When this occurs, the PM database cannot be populated accurately with counts for that interval, and the daily bin summation may become inaccurate. Verify data communications between the OPC and NE.

Reason-7 The OPC date has been moved back. The OPC PM database is deleted if the previous collection time stored in the OPC PM database is more recent than the current OPC time.
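The effect described in Reason-1 of 17.1.3 can be illustrated with a small Python sketch. It sums 15-minute counts into daily bins twice, once with bins starting at GMT midnight (the OPC behaviour described above) and once with bins starting at local midnight (the NE behaviour). The sample data, the GMT-5 offset, and the function name are assumptions for illustration; this is not the OPC's binning code.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def daily_totals(samples, tz):
    """Sum 15-minute counts into daily bins that start at midnight in tz."""
    totals = defaultdict(int)
    for start_utc, count in samples:
        totals[start_utc.astimezone(tz).date()] += count
    return dict(totals)


# Hypothetical data: four consecutive 15-minute intervals with 10 counts
# each, beginning at 02:00 GMT on 1998-08-11.
burst_start = datetime(1998, 8, 11, 2, 0, tzinfo=timezone.utc)
samples = [(burst_start + timedelta(minutes=15 * i), 10) for i in range(4)]

gmt = timezone.utc
ne_local = timezone(timedelta(hours=-5))  # assumed NE local time, GMT-5

opc_view = daily_totals(samples, gmt)      # bins start at GMT midnight
ne_view = daily_totals(samples, ne_local)  # bins start at local midnight

print(opc_view)  # {datetime.date(1998, 8, 11): 40}
print(ne_view)   # {datetime.date(1998, 8, 10): 40}
```

The same 40 counts land in the August 11 bin on the OPC and in the August 10 bin on the NE, so the daily values differ even though no data was lost.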

17.1.4 PM Collection exceeded 15 minutes.

PM collection on the OPC took longer than 15 minutes to complete.

Reason-1 The number of OPC users and/or tools running exceeds the limit specified in the engineering guidelines. Close some OPC tools and/or terminate some OPC login sessions to get below the limits specified.

Reason-2 A large number of alarms or events are occurring on the system. Messages containing PM data are interleaved with other messages to the OPC. The time required for PM collection will increase as the number of messages being processed by the OPC increases. If the problem is seen for a number of consecutive periods, use the OPC PM Collection Filter or the TL1 SET-PMMODE command to turn off PM collection for unused or unnecessary facility types.

Reason-3 Data communication between the OPC and the NEs is very slow. Ensure that the data communication channel(s) between the OPC and NEs are functioning properly. If the problem is seen for a number of consecutive periods, use the OPC PM Collection Filter or the TL1 SET-PMMODE command to turn off PM collection for unused or unnecessary facility types.


Chapter 18

Network Time Protocol

18.1 Problem Description


The following sections describe common network time protocol problems. Each problem is described and possible diagnostic reasons are provided. Available workarounds are provided at the end of each problem reason description.

18.1.1 XNTP: The OPC has entered freerun mode.

The OPC does not choose one of the external timing sources provisioned on the OPC.

Reason-1 If 5 external timing sources are provisioned on the OPC and all 5 are at the same stratum level, but they are not in sync, the OPC may not choose one of the sources to synchronize to. If this happens, the OPC will enter a freerun mode. The NTP config dialog will show the status of the sources as "A" if they are all valid, which means "Currently Synchronizing, Validating Time Host". Eliminate two of the provisioned sources. This will force the OPC to choose one of the remaining 3 to synchronize to.


Chapter 19

Protection Manager / 1:N

19.1 Problem Description


The following sections describe common Protection Manager problems. Each problem is described and possible diagnostic reasons are provided. Available workarounds are provided at the end of each problem reason description.

19.1.1 PSU: Can't Display Configuration

An error dialog is displayed.

Reason-1 The Configuration does not contain any NEs. Use the Configuration Manager (admin user) to add a valid set of NEs to the Configuration (normally seen with ring configurations because the Ring Configuration manager saves the data in two stages). See Complete set of OPC Tools on page A-1.

19.1.2 PSU: Dumping All Protection Data to File

This procedure will dump a data file of all protection information.

Dumping Protection Data:
1) Log in to the OPC as root.
2) Enter ps -ef | grep psu (and note the pid for the psu executable - psulct/xt).
3) Enter /iws/psu/psutrace on pid (dumps trace info).
4) Enter /iws/psu/psutrace cache pid (dumps the cache).
5) Enter /iws/psu/psutrace for other options.

NOTE: Data is dumped to file .psutrace in /tmp.
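Step 2 of the procedure above requires reading the pid out of the ps -ef listing. The Python sketch below shows one way to pick the pid out of captured ps output. It is an illustration only: the sample listing, the /iws/psu/psulct path, and the function name are hypothetical, and the eight-column layout (UID PID PPID C STIME TTY TIME CMD) is an assumption that does not hold on every UNIX system.

```python
def find_psu_pids(ps_output):
    """Return (pid, command) pairs for processes whose command mentions 'psu'."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # Assumed columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue
        command = fields[7]
        # Ignore the grep process itself, which also matches "psu".
        if "psu" in command and not command.startswith("grep"):
            matches.append((int(fields[1]), command))
    return matches


# Hypothetical listing; the path and pids are invented for this example.
SAMPLE = """\
     UID   PID  PPID  C    STIME TTY      TIME COMMAND
    root  1234     1  0 09:15:02 ?        0:03 /iws/psu/psulct
    root  1301  1288  0 09:20:11 ttyp1    0:00 grep psu
"""

print(find_psu_pids(SAMPLE))  # [(1234, '/iws/psu/psulct')]
```

The pid found this way is the value to supply in place of pid in steps 3 and 4.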


Chapter 20

OPC Date / OPC Shutdown

20.1 Problem Description


The following sections describe common OPC Date problems. Each problem is described and possible diagnostic reasons are provided. Available workarounds are provided at the end of each problem reason description.

20.1.1 SSD: Time Zone Missing from Date UI

Reason-1 The timezone environment variable is not set. The workaround is to select the timezone from the list.


Complete set of OPC Tools

Unless otherwise defined, all these tools exist under directory /iws/test-tools/tool-res on the OPC. Note that, unlike the NE, the OPC is case sensitive. Also, the OPC's UNIX operating system provides many basic tools and utilities, documentation for which is available from commercial sources.

1.1 archint - NE logger archive interval


NAME
archint - NE logger archive interval

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> archint

DESCRIPTION
archint is used to set the NE logger archive interval. When the logger interval time has elapsed, the OPC will request all archived log data from each NE. The retrieved data is then stored on the OPC as accessible historical data. The stored data is viewable through the event browser.

USES
Used only for setting the NE logger archive interval.

SAMPLE OUTPUT
opc> archint
Current NE log archive interval is 4 hours.
(A) 4 hours.
(B) 8 hours.
(C) 12 hours.
(D) 24 hours.
(E) Suspend.
Choose a new interval:


1.2 cchdata - command handler data


NAME
cchdata - command handler data

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> cchdata [ [-p <pd>] [-t <ti>] [-f] [-F] [-r] [-R] [-d] ]
where:
-p <pd> - Connect to process with pid <pd>
-t <ti> - Start with time interval <ti> seconds
-f - Write message data to stdout, must be used with the -p option
-F - Write message data to stdout for all processes using cch, cannot be used with the -p option
-r - Reset data stored about a pid, must be used with the -p option
-R - Reset data stored about all processes using cch, cannot be used with the -p option
-d - Dump of internal data structures
The tool may also be invoked with no options, resulting in the MAIN SCREEN being displayed as described below.

DESCRIPTION
cchdata is a tool used to monitor CMIS message flow and report on any message related errors encountered by an Object Manager or a User Interface. All CMISE messages are sent to and from the object managers and the UIs in the OPC software using command handler routines. cchdata collects statistics on the CMISE messages being passed through the command handler and displays them in a table format. The tool is divided into three screens:

MAIN SCREEN - Used to select options and go to other screens. From here, the user can issue the following commands:
P - Connect direct to a process. The user is prompted for the PID to connect to.
C - Establish a Connection With a Process. This takes the user to the connection screen from where the user can select a process to monitor.
M - Go to Message Screen. This will take the user from the Main Screen to the Message Screen provided a connection has been established.
I - Set Update Interval. This command allows the user to change the default update interval time (in seconds).


F - Force an Audit of Shared Memory. This command will initialize the Shared Memory area.
D - Dump CCH Internal Shared Memory. Dump all Processes and their present state to the screen.
T - Toggle Process Trace State for All New Processes. This will cause all new processes to log CMIS messages sent and received. The trace will be in /tmp/cchtrace.<PID>. Individual traces can be turned on/off from the process details screen.
Q - Quit from cchdata.

CONNECTION SCREEN - Used to make a connection to a process. On this screen the user is presented with a list of processes that cchdata has determined are candidates for connection. The commands for this screen are:
U - Move up in the list of processes.
D - Move down in the list of processes.
L - Move to left column.
R - Move to right column.
<RETURN> - Select a process for connection. This will take the user directly to the message screen.
Q - Quit to Main Screen. Occasionally the process is either not linked into the Command Handler or not set up to interact with cchdata. Hitting Q will return the user to the Main Screen.
On the Connection Screen, an asterisk indicates that there are more processes than can be displayed on the screen. An asterisk at the top of the list means there are more processes above, and an asterisk at the bottom means there are more processes below. Travelling up or down past the list reveals the next part of the list.

MESSAGE SCREEN - Used to display Command Handler data for the connected process. The commands are the same as those for the Main Screen with the following exceptions or additions:
R - Reset the counts.
Z - Toggle tracing for process.
? - Help. Displays a simple help screen.
T - Terminates the connection and brings up the Connection Screen.

USES
cchdata can be used for diagnosing messaging problems with OPC software. When tools are malfunctioning, such as showing invalid alarm counts, loss of messages may be the cause. When there are general system problems occurring, cchdata can be invoked if the Event Browser and /var/syslog are unable to indicate the problem. Also, if the fields in the


SYSTEM RESOURCES reach their maximum size, it is a good indication that a loss of messages is occurring.

SAMPLE OUTPUT

The cchdata display is divided into four areas:

1. MESSAGES SENT: - summary of the CMISE messages sent by and received by the object manager and handled by the command handler. The total messages being sent will be high during the commissioning process. The requests/confirmations (reqs/cnfs) and also the indications/responses (inds/rsps) should be equal in the M-Create, M-Delete, M-Get and columns. The number of scoped messages and null messages should be equal.

2. FLOW CONTROLLED MESSAGES: - summary of any CMISE messages waiting to be sent. If the receiver's UNIX message queue has reached its capacity, the message will be stored in the sender's flow-controlled message queue. These totals are typically low, reflecting the ability of the receiving process to handle the incoming message flow.

3. SYSTEM RESOURCES: (Table 3, SYSTEM RESOURCES Display Fields, on page A-5) - provides a summary of the maximum stress placed on the system's resources.


The maximum queue size is indicated for all fields. Loss of messages is guaranteed if the maximum values are attained.

TABLE 3. SYSTEM RESOURCES Display Fields

Max Outstanding Reqs (maximum outstanding request messages)
    maximum number of confirmed request messages waiting for a confirmation (maximum = 80)
Max Outstanding Inds (maximum outstanding indication messages)
    maximum number of confirmed indication messages waiting for a response (maximum = 80)
Max Ptrs in a LCS Msg
    maximum pointers in a Local Concrete Syntax message (maximum = 15)
Max partial msgs received
    maximum number of messages that were being assembled by the receiver at one time (maximum = 20)
Max Flow Controlled Msgs/Dst
    maximum number of messages waiting to be sent to the receiver at one time (maximum = 50)
Max Flow controlled dsts
    number of destination queues being used to store unsendable messages, one per destination (maximum = 5)
Max retrys for a LCS message
    times that command handler attempted to send a message to a particular destination before it was successful (maximum = 15)

4. SYSTEM ERRORS: (Table 4, SYSTEM ERRORS Display Fields, on page A-5) - indicates messages that never completed their cycle of being sent, received and replied to.

TABLE 4. SYSTEM ERRORS Display Fields

Request timeouts
    number of confirmed request messages sent that didn't receive a confirmation
Indication timeouts
    number of confirmed indication messages received that didn't get a response
Failed Sends
    number of messages discarded after repeated attempts at sending to destination (maximum number of retries is 15)
Missing CMISE headers
    number of messages discarded due to non-receipt of CMISE headers. Data received without a header is discarded.
Missing CMISE data
    If all parts of a message from a sender are not received before receiving a new CMISE message header, then the message is incomplete and discarded.
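When reading the SYSTEM RESOURCES area, the check to make is whether any field has reached the maximum listed in Table 3. The Python sketch below expresses that comparison. The limits are copied from Table 3, but the function name and the idea of supplying the observed values as a dictionary are assumptions for illustration; cchdata itself only displays these values on screen.

```python
# Maximum values as listed in Table 3, keyed by display field name.
SYSTEM_RESOURCE_LIMITS = {
    "Max Outstanding Reqs": 80,
    "Max Outstanding Inds": 80,
    "Max Ptrs in a LCS Msg": 15,
    "Max partial msgs received": 20,
    "Max Flow Controlled Msgs/Dst": 50,
    "Max Flow controlled dsts": 5,
    "Max retrys for a LCS message": 15,
}


def fields_at_limit(observed):
    """Return the names of fields whose observed value has reached its maximum."""
    return sorted(
        name
        for name, value in observed.items()
        if name in SYSTEM_RESOURCE_LIMITS and value >= SYSTEM_RESOURCE_LIMITS[name]
    )


# Hypothetical values transcribed by hand from a cchdata screen.
observed = {
    "Max Outstanding Reqs": 80,
    "Max Flow controlled dsts": 2,
    "Max retrys for a LCS message": 15,
}

print(fields_at_limit(observed))
# ['Max Outstanding Reqs', 'Max retrys for a LCS message']
```

Any field returned here has attained its maximum, which per the text above means messages have been lost.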

SEE ALSO AG2192 -- Command Handler


1.3 checksum - calculate serial number for an OPC/NE


NAME
checksum - calculate the full serial number including checksum for an OPC or NE

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To use, type:
opc> checksum <option>
where <option> is any of the following:
-h - display help
-l - read OPC serial number from hardware
-o <OPC serial #> - calculate # from given serial number
-n <NE shelf type> <NE serial #> - calculate NE serial number for a given shelf type
    02 = OC48 LTE, ADM, RING_ADM
    03 = OC48 Regen
    08 = OC12 TBM, ADM, RING_ADM

DESCRIPTION
This tool is used to calculate the complete serial number as used by the OPC software. The checksum tool can be used to calculate the serial numbers as they will be input to the Commissioning manager. The tool calculates the checksum component of the serial number and outputs the complete serial number for either an OPC or an NE.

USES
The checksum tool can be used as follows:
- To determine the serial number of the OPC that the user is currently logged into, use the command checksum -l
- To determine any OPC's serial number, the user can use checksum -o <OPC serial #>. Note that the OPC serial number can be extracted from the OPC address. The serial number consists of the 6 digits that follow the digits 490000. Note that the 7 must be substituted by a 3. For example, if the OPC address is 490000.7e-1a-2f-1b-00-0c.00, the OPC serial number can be calculated using the digits 7e-1a-2f, substituting a 3 for the 7 and using the command as follows: checksum -o 3e1a2f
- To determine an NE serial number, use checksum -n <NE shelf type> <NE serial number>. The <NE shelf type> is a two digit number (02 for terminal, 03 for regenerator) and the <NE serial number> is on the backplane of the shelf.
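The address-to-serial-number substitution described for checksum -o can be written out as a short Python sketch. It only derives the argument to pass to the tool and does not compute the checksum itself, since the checksum algorithm is not described here. The function name is hypothetical, and treating a leading digit other than 7 as an error is an assumption drawn from the single example above.

```python
def opc_serial_from_address(address):
    """Derive the argument for `checksum -o` from an OPC address."""
    prefix, rest = address.split(".", 1)
    if prefix != "490000":
        raise ValueError("expected an address starting with 490000")
    # Take the 6 digits that follow 490000, ignoring the separating hyphens.
    digits = rest.replace("-", "")[:6].lower()
    if len(digits) != 6 or digits[0] != "7":
        raise ValueError("expected six digits beginning with 7 after 490000")
    # The 7 must be substituted by a 3.
    return "3" + digits[1:]


print(opc_serial_from_address("490000.7e-1a-2f-1b-00-0c.00"))  # 3e1a2f
```

The result matches the worked example above, where the address 490000.7e-1a-2f-1b-00-0c.00 leads to the command checksum -o 3e1a2f.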


SAMPLE OUTPUT The following is output when checksum -l is typed at the command line

1.4 config_port - port configuration tool


NAME
config_port - interface for configuring OPC serial ports.

LOCATION
/etc

SYNOPSIS
To invoke, type:
opc> config_port
Alternatively the tool may be invoked from the User Session Manager.

DESCRIPTION
config_port is a menu-driven tool which enables the user to query, configure and unconfigure OPC serial port B (B=1). The serial port can be set to VT100 terminal, printer, X.25, X.3 PAD, PPL (electronic software delivery) or TCP/IP over X.25. The default setting is VT100 terminal. For each of the settings, the following should be observed:

VT100 Terminal:
null modem cable
9600 baud (max) with auto baud
8 bit, no parity, 1 stop

Printer:
null modem cable
9600, 4800, 2400, 1200 & 300 baud
8 bit, no parity, 1 stop
direct = without modem control
simple = with modem control

X25:
DTE to OS network cable
MUST match all OS X25 parameters


X3:
DTE to OS network cable
MUST match all X25 & X3 parameters

NOTE: The config_port tool cannot be run simultaneously with the TL1 configuration tool.

1.5 config_TL1 - TL1 configuration tool


NAME
config_TL1 - TL1 configuration tool

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> config_TL1

DESCRIPTION
config_TL1 is a menu-driven tool which permits the user to:
- set the ports (calls /etc/config_port)
- set the vcp information (calls /iws/vcp/vcpinfo)
- restart the TL1X25SM
/etc/config_port is another menu-driven tool to configure the OPC ports (B, 2, 3) to be either terminal, printer, X25 or X3 PAD. /iws/vcp/vcpinfo is used to set the virtual circuit profile. /iws/vcp/vcpinfo can add a virtual circuit, delete a virtual circuit, list all virtual circuits and set the TL1 Protocol Identifier. Due to problems with TL1X25SM, the MSR will go sysbusy/terminate whenever the OPC port B is not configured as X25. In order to configure port B to X25, it is necessary to always restart the TL1X25SM MSR.

USES
Configures TL1 parameters.

1.6 dbplog - Database Patch Log


NAME
dbplog - Database Patch Log tool

LOCATION
/iws/test-tools/tool-res


SYNOPSIS
To invoke, type:
opc> dbplog YYYYMMDDHHMMSS
where YYYYMMDDHHMMSS is YearMonthDayHourMinuteSecond

DESCRIPTION
This program removes future stamped events from the Log database by taking the Log System (NELogger) out of service. The OPC will stop collecting Logs during this activity, ensuring the database patch will not be corrupted. Patching the database should take less than 2 minutes, at which time the NELogger will be returned to service. Once started, this activity cannot be halted and the removed events will not be retrievable.

USES
For internal NORTEL use only (NOT TO BE USED IN THE FIELD). Future stamped events are inconvenient because they always occur at the top of the Event Browser anchor window even though they may be months out of date.

SAMPLE OUTPUT
opc> dbplog 19971004235959
Take NELogger out of service (y/n)? y
Attempting to busy NELogger.
NELogger has been taken out of service.
Making log database backup, file /iws/nel/data/logdb.dbp
A temporary Log database backup has been made as as percaution, to be used in the unlikely event of a problem during the patch. In which case the patch will be aborted and the backup restored to active use. The backup will be removed after a successful patch.
Remove all log events starting from 97 Oct 04 23:59:59 (y/n)? y
Begining log database patch.
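The YYYYMMDDHHMMSS argument is a fixed 14-digit timestamp. As a convenience, the Python sketch below shows how such a string could be built from a date and checked before it is typed at the opc> prompt. The function names are hypothetical and nothing here calls dbplog; it only illustrates the argument format stated in the SYNOPSIS.

```python
from datetime import datetime

# strftime/strptime pattern matching YearMonthDayHourMinuteSecond.
STAMP_FORMAT = "%Y%m%d%H%M%S"


def to_patch_stamp(moment):
    """Format a datetime as the 14-digit YYYYMMDDHHMMSS argument."""
    return moment.strftime(STAMP_FORMAT)


def parse_patch_stamp(text):
    """Validate a YYYYMMDDHHMMSS string and return it as a datetime."""
    if len(text) != 14 or not text.isdigit():
        raise ValueError("expected exactly 14 digits: YYYYMMDDHHMMSS")
    return datetime.strptime(text, STAMP_FORMAT)


stamp = to_patch_stamp(datetime(1997, 10, 4, 23, 59, 59))
print(stamp)                     # 19971004235959
print(parse_patch_stamp(stamp))  # 1997-10-04 23:59:59
```

The value printed first is the same argument used in the sample output above.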


1.7 dbpalm - Database Patch Alarm


NAME
dbpalm - Database Patch Alarm tool

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> dbpalm YYYYMMDDHHMMSS
where YYYYMMDDHHMMSS is YearMonthDayHourMinuteSecond.

DESCRIPTION
This program removes future-stamped events from the Alarm database. To do so, it takes the Alarm System (Logalarmsystem) out of service. The OPC stops collecting alarms during this activity, which ensures that the database patch is not corrupted. Patching the database should take less than 2 minutes, after which the Logalarmsystem is returned to service. Once started, this activity cannot be halted, and the removed events are not retrievable.

USES
For internal NORTEL use only (NOT TO BE USED IN THE FIELD). Future-stamped events are inconvenient because they always appear at the top of the Alarm Monitor anchor window even though they may be months out of date.

SAMPLE OUTPUT
opc> dbpalm 19971004235959
Take Logalarmsystem out of service (y/n)? y
Attempting to busy Logalarmsystem.
Logalarmsystem has been taken out of service.
Making alarm database backup, file /iws/las/data/alarmdb.dbp
A temporary Alarm database backup has been made as as percaution, to be used in the unlikely event of a problem during the patch. In which case the patch will be aborted and the backup restored to active use. The backup will be removed after a successful patch.
Remove all alarm events starting from 97 Oct 04 23:59:59 (y/n)? y
Begining alarm database patch.
900 alarms removed.


Finished Log/Alarm database patch.
Attempting to return Logalarmsystem to service. (One moment please)
Logalarmsystem has been returned to service.
Removing Alarm datbase backup.
Alarm database patch complete.
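Both dbplog and dbpalm take a single 14-digit YYYYMMDDHHMMSS argument, and neither patch can be halted once started, so it may be worth checking the stamp before invoking either tool. The following Python sketch is illustrative only: the helper names to_stamp and parse_stamp are hypothetical and are not part of the OPC tool set.

```python
from datetime import datetime

STAMP_FORMAT = "%Y%m%d%H%M%S"  # YYYYMMDDHHMMSS, as described in the SYNOPSIS


def to_stamp(moment: datetime) -> str:
    """Format a datetime as the 14-digit argument (hypothetical helper)."""
    return moment.strftime(STAMP_FORMAT)


def parse_stamp(stamp: str) -> datetime:
    """Validate a 14-digit stamp and return it as a datetime.

    Raises ValueError for a wrong length, non-digit characters,
    or an impossible date or time.
    """
    if len(stamp) != 14 or not (stamp.isascii() and stamp.isdigit()):
        raise ValueError(f"expected 14 digits (YYYYMMDDHHMMSS), got {stamp!r}")
    return datetime.strptime(stamp, STAMP_FORMAT)
```

For the stamp used in the sample sessions, parse_stamp("19971004235959") yields 4 October 1997, 23:59:59, which matches the date echoed back by the tools in their confirmation prompt.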

1.8 dmesg - internal kernel log information


NAME
/etc/dmesg - internal kernel log information

LOCATION
/etc

SYNOPSIS
To view, type:
opc> /etc/dmesg | more

DESCRIPTION
/etc/dmesg is a command that retrieves various internal messages generated by the kernel during initialization. These kernel messages indicate the status of the initialization sequence, report any device failures, and identify the OPC's tape drive, disk drive and other hardware devices.

The more command allows you to page through the output of /etc/dmesg or any text file. Within more, type h for help, press <space> to view the next page, press <return> to advance one line, and type q to quit.

USES
The /etc/dmesg output is useful when there are problems with the tape drive or disk drive. It gives the revision numbers of the disk drive and tape drive, along with any diagnostic messages.

SAMPLE OUTPUT
The following is output when /etc/dmesg is typed at the command line:


1.9 drmstat - dynamic full screen Resource Manager User interface


NAME
drmstat - full screen Resource Manager User interface (rmu)

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> drmstat

DESCRIPTION
drmstat is a full-screen interface for monitoring and manipulating the Managed Software Resources (MSRs) managed by the Distributed Resource Manager (DRM). It provides a full-screen interface to rmu and a subset of rmu's commands. It displays the current status of the OPC MSRs as well as their current fault status.

USES
drmstat may be used to observe the current state of the Managed Software Resources (MSRs) on the OPC and their current fault counts. The tool may also be used to busy (BUSY) or return to service (RTS) MSRs.


The proper procedure to busy out all MSRs is to busy out warmstandby, wait for all other MSRs to react to this, and then busy all other MSRs, starting at the bottom of the list and working up.

SAMPLE OUTPUT
The following is a typical drmstat screen:

The MSR column gives the name of the DRM Managed Software Resource (MSR). This is the descriptor used by DRM to identify the process. It is not necessarily the same as the actual executable name of the process.

1. OPC database manager. It controls all access operations to the OPC object database.
2. MIT (Management Info Tree) manager. The MIT stores the distinguished names of all objects. Applications need to retrieve an object's name from this directory before accessing the OPC database. All of the following processes depend on opcdbmgr and mitshmld, which means that if either opcdbmgr or mitshmld dies, all of those processes should die as well.
3. Warmstandby monitors the status of the peer OPC commissioned in the same span of control (Primary, Backup or Slat). It polls the peer OPC periodically (every 15 seconds). If the peer is dead (when the peer is the Primary OPC) or the network is split (for example, a fiber is broken), it switches the OPC's state. In the former case, the Backup OPC becomes active. In the latter case, both OPCs are active.
Note: warmstandby depends on the commissioning data to determine the role of the local OPC in the network.


All processes listed under warmstandby (see the screen above) depend on warmstandby. Manually busying warmstandby takes the local OPC from active (if it was active) to inactive and brings down all the applications on the OPC.
4. Download manager. It is in charge of downloading NE loads from the OPC to the NEs.
5. CMISE association manager. It manages all CMISE communication between the OPC and the NEs within the span. If any problem exists, the user should see a ? in the banner line or in a UI tool such as the Reboot/Load manager.
Note: All applications that need to talk to NEs depend on cmisassocmgr.
6. Event manager. It routes events coming from NEs or from OPC applications to the applications that are interested in them.
Notes: If eventhandler is in trouble, a ? is seen in the banner line. If no NEs are commissioned on the OPC, eventhandler shows INSERVICE:UNAVAILABLE after the OPC has initialized.
7. NE shadow manager. It maintains a copy of the NE-based object attributes that are needed for provisioning the NEs. It also stores the alarm, alert and protection count summary information for all the NEs within the span.
Note: If neshadow is in trouble, a ? is seen in the banner line.
8. NE database backup and restore manager. It is responsible for backing up NE databases to the OPC disk and restoring them from it.
9. OPC log and alarm manager. It is responsible for collecting and managing log and alarm events from the NEs and the OPC.
Note: If no NEs are commissioned on the OPC, logalarmsystem shows INSERVICE:UNAVAILABLE after the OPC has initialized.
10. OPC 1:N system manager. It is responsible for managing the 1:N system configuration and storing information such as channel status and protection status.
11. OPC Remote TBOS manager. It provides the Telemetry Function: it receives displays from some NEs and sends them to a destination NE that has a TBOS connection to a TBOS device.
12. OPC load manager. It is responsible for managing the NE loads on the OPC and helps to download NE loads to the NEs.
13. OPC Performance Measurements Collector. It is responsible for collecting performance measurements from the NEs and storing the data in an OPC PM database.
14. X25 Virtual Circuit Profile manager. It maintains, and provides access to, the list of X.121 addresses of the remote OSSs that the OPC may be connected to.
15. OPC TBOS object manager. It manages TBOS alarms.
16. OPC TL1 X25 session manager. It is responsible for setting up the X25-based TL1 connection with the remote OSS.


17. STS Connection Manager. It is responsible for managing the STS-1 connections in an OC-48 Ring network.

Depending on the product the OPC is supporting, additional MSRs may be listed, and some of the MSRs listed above will not be displayed if they provide a service that is not required or supported for that product.

The STATE column identifies the current state of the MSR. On an active OPC, all of the MSRs should be in the INSERVICE:AVAILABLE state, except for TL1X25SM, which is AVAILABLE only if the OPC is configured for X.25. There should not be any processes in the BUSY state.

The FAULTS column provides a count of the number of times the MSR has terminated unexpectedly. This count is reset to zero at midnight daily. When the fault count reaches 3, DRM no longer tries to restart the MSR and puts that MSR into a SYSTEM:BUSY state.
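The FAULTS behaviour described above can be modelled in a few lines. This Python sketch is an assumption-laden illustration, not DRM code: the class name MsrFaultCounter is hypothetical, and the sketch does not claim to know whether the midnight reset also returns a SYSTEM:BUSY MSR to service, since the text does not say.

```python
FAULT_LIMIT = 3  # from the text: at 3 faults DRM stops restarting the MSR


class MsrFaultCounter:
    """Toy model of one MSR's FAULTS column (hypothetical, for illustration)."""

    def __init__(self):
        self.faults = 0
        self.state = "INSERVICE:AVAILABLE"

    def record_unexpected_termination(self) -> str:
        """Count one unexpected termination and return the resulting state."""
        self.faults += 1
        if self.faults >= FAULT_LIMIT:
            # DRM gives up restarting the MSR at this point.
            self.state = "SYSTEM:BUSY"
        return self.state

    def midnight_reset(self) -> None:
        """Reset the count only; the state is left untouched (assumption)."""
        self.faults = 0
```

Two terminations in one day leave the MSR in service; the third moves it to SYSTEM:BUSY.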

WARNINGS
Placing a process in the BUSY state causes that process to terminate. Interdependencies between processes make it necessary to BUSY processes in a specific order. In general it is safer to BUSY processes starting at the bottom of the list; however, the safest approach is to shut down the OPC, as this restarts all processes in the correct order.

Never BUSY the database (opcdbmgr) or MIT (mitshmld) processes unless all other processes are already in the BUSY state. This tool has the potential to disrupt the functioning of the OPC, and even to corrupt the database, if used incorrectly.

SEE ALSO
ows_swact, rmu
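The ordering rules in this entry (warmstandby first, then the remaining MSRs from the bottom of the list upward, with opcdbmgr and mitshmld only after everything else is BUSY) can be sketched as a small Python function. The function name busy_order is hypothetical, and the example list order is assumed from the numbered descriptions above rather than taken from an actual drmstat screen.

```python
CORE_MSRS = ("opcdbmgr", "mitshmld")  # never busied before the others


def busy_order(msrs_top_to_bottom):
    """Return MSR names in the order the text suggests busying them.

    msrs_top_to_bottom: MSR names as listed on the drmstat screen,
    from the top of the list to the bottom.
    """
    if "warmstandby" not in msrs_top_to_bottom:
        raise ValueError("warmstandby is expected in the MSR list")
    bottom_up = list(reversed(msrs_top_to_bottom))
    others = [m for m in bottom_up if m != "warmstandby" and m not in CORE_MSRS]
    core = [m for m in bottom_up if m in CORE_MSRS]
    # warmstandby first, then everything else bottom-up, database and MIT last
    return ["warmstandby"] + others + core
```

The sketch only computes an order. It does not model the wait for the other MSRs to react after warmstandby is busied, and shutting down the OPC remains the safest approach according to the warning above.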

1.10 dsadarp - send an NE DARP request


NAME
dsadarp - send an NE DARP request to a specified OPC DSA

LOCATION
./

SYNOPSIS
To invoke, type:
opc> dsadarp [-p] [-n <nsap>] [-h <host>] <NE-id>


where:
-p  Use the old-style BNR object ID numbers (1.3.666.0.1.1.0.1.0).
-n  Specify an alternative DSA NSAP address, where nsap is the OSI NSAP address in the form 49.0000.XX-XX-XX-XX-XX-XX.00.
-h  Specify the destination OPC that the DARP request should be sent to, where host is the destination OPC's OSI hostname.
<NE-id>  The ID defined for the NE by the OPC commissioning manager.

DESCRIPTION
dsadarp allows the user to simulate an NE DARP request to a DSA running on a specified OPC. The simulated DARP request is sent as if it came from the network element NE-id. The network element specified by NE-id must be in the span of control of the destination OPC; otherwise the resident DSA will not respond to the request. If no NE-id is specified, dsadarp defaults to network element 1.

USES
dsadarp can be used for diagnosing DARP messaging problems between the OPC and a specific NE. On the NE side, several applications use DARP messages to find out which OPC is the master in the span of control. For example, before the NE BRM gets a database from the OPC, BRM sends a DARP message to find out which OPC should supply the database. As another example, when the user issues an OPC login from the NE, the NE sends a DARP message to find out which OPC is active. In both cases, if the NE cannot receive the DARP responses from the OPC, this tool can be used to find out whether the problem lies in the OPC DSA, in the physical connection between the OPC and the NE, or in the NE base software.
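The -n option expects an NSAP written as 49.0000.XX-XX-XX-XX-XX-XX.00. A quick format check can catch a mistyped address before the request is sent. In this Python sketch the function name is hypothetical, and it assumes that each XX stands for a pair of hexadecimal digits, which the text implies but does not state.

```python
import re

# Assumption: each XX in 49.0000.XX-XX-XX-XX-XX-XX.00 is two hex digits.
NSAP_PATTERN = re.compile(r"49\.0000\.[0-9A-Fa-f]{2}(-[0-9A-Fa-f]{2}){5}\.00")


def looks_like_dsa_nsap(nsap: str) -> bool:
    """Return True if nsap has the textual form given for the -n option."""
    return NSAP_PATTERN.fullmatch(nsap) is not None
```

A True result says only that the string has the expected shape; it says nothing about whether a DSA is reachable at that address.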

1.11 dsadib - display contents of the DIB


NAME
dsadib - display contents of the DIB

LOCATION
./

SYNOPSIS
To invoke, type:
opc> dsadib [-abpx] [-d <dsa-database>]
where:
-a  Include alias entries when displaying the database.
-b  Display contents of the database in long format.
-p  Use proprietary object arc IDs.


-x  Display a debug dump of entries in the database.
-d  Use the database specified by dsa-database.

DESCRIPTION
dsadib is a tool used to display the contents of the Directory Information Base (DIB). After the OPC 08 release, Network Element information is no longer stored in the DIB; see nnsmon for more information.

1.12 esconf - End System Configuration tool


NAME
esconf - End System Configuration tool

LOCATION
/usr/etc

SYNOPSIS
To invoke, type:
opc> esconf [test] | [edit]

DESCRIPTION
esconf has many capabilities, and only a small subset is described here. In the following examples, esconf is used to switch between the addressing schemes implemented in pre-OPC15 and post-OPC15 loads.

USES
- Test whether the kernel is using the FWP (old, pre-OPC15) or IEEE (new) addressing format:
    opc> esconf test krn_ieee
    If the result message is "Local KERNEL system id. -- within the IEEE range", the new addressing scheme is running.
    If the result message is "Local KERNEL system id. -- NOT within the IEEE range", the old addressing scheme is running.
- Force the kernel to the new addressing format:
    opc> esconf
    A menu screen will appear; type y I
    Once the command is completed, type q to quit.
- Force the kernel to the old addressing format:
    opc> esconf
    A menu screen will appear; type y f
    Once the command is completed, type q to quit.


- Convert the osihosts address for the backup and primary OPCs from the old to the new addressing format:
    opc> esconf edit host f2i backup
    opc> esconf edit host f2i primary
- Convert the osihosts address for the backup and primary OPCs from the new to the old addressing format:
    opc> esconf edit host i2f backup
    opc> esconf edit host i2f primary

SEE ALSO
nsap_tool
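When the output of esconf test krn_ieee is captured to a file or a variable, the two result messages quoted in this entry can be told apart with a simple substring check. This Python sketch is illustrative; the function name is hypothetical, and the only strings it relies on are the two messages quoted above.

```python
def addressing_scheme(result_message: str):
    """Classify the esconf test krn_ieee result message.

    Returns "FWP" (old, pre-OPC15), "IEEE" (new), or None if the
    message matches neither form quoted in the text.
    """
    # Check the negative form first: it contains the positive form as a substring.
    if "NOT within the IEEE range" in result_message:
        return "FWP"
    if "within the IEEE range" in result_message:
        return "IEEE"
    return None
```

The order of the two checks matters, because the "NOT within" message also contains the text "within the IEEE range".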

1.13 ether_admin
NAME
ether_admin - Ethernet administration

LOCATION
/iws/lan

SYNOPSIS
To invoke, type:
opc> /iws/lan/ether_admin

DESCRIPTION
This menu-driven tool is used to initialize and control (turn on/off) the Ethernet port. The OPC can communicate with the Network manager or an X terminal only when the Ethernet port is enabled. The following options are supported:
- Initialise and enable the Ethernet port
- Ethernet port control (enable/disable)
- X terminal configuration

When using ether_admin:
- NEVER use the /etc/reboot or /etc/reboot/shutdown commands; always use the OPC shutdown tool to restart the OPC.
- Use ncd_x19_config along with ether_admin to configure an NCD X terminal.


1.14 get_files - retrieve release management utilities


NAME
get_files - retrieve release management utilities

LOCATION
/etc

SYNOPSIS
To invoke, type:
opc> get_files [ -d ]
where:
-d  extract the files from /eswd (tape being the default)

DESCRIPTION
get_files performs the following actions on behalf of the user:
cd /tmp
dd if=/dev/rdt/tape2 bs=20b | tar xf -

USES
get_files is used to extract, from tape or disk, the install directory and the files needed to perform a release removal (rmopcld) or a release installation (install_release).

1.15 grok - General Resource, Operating system and Kernel tool


NAME
grok - General Resource, Operating system and Kernel display tool

LOCATION
/etc/ktools

SYNOPSIS
To invoke, type:
opc> grok [-h] [[-v][-s][-p <process_name>][-c <count>]][-i <interval>]
where:

TABLE 5.
-h                 displays the syntax for the command
-v                 displays memory statistics
-s                 displays swap space configuration
-p <process_name>  provides PID and statistics on every process with <process_name>
-c <count>         number of times to sample for system statistics (may be used with options v, s, p and i)
-i <interval>      delay in seconds between samples (may be used with options v, s, p and c; when used with v, specify an interval >= 5 seconds)

If no options are specified, grok becomes a menu-driven application.

DESCRIPTION
grok is a designer-oriented tool used for monitoring the operating system and kernel resources and for examining the lower-level functioning of the OPC system. It displays information on various areas, including the state of cables, ports, memory and bus usage.

USES
grok is useful for diagnosing problems with cables and ports. The MC68302 screen displays information on the ports. The Ethernet error counts screen displays Ethernet statistics that reflect the state of the cables.

SAMPLE OUTPUT
The main menu is reached by entering grok while in the UNIX shell.

The MC68302 screen is reached by selecting h - Hardware Device registers submenu and then selecting option 3 - MC68302 info. Config indicates the speed of the port. Mode indicates the number of stop bits, parity, character length, and status of the receiver and transceiver. The normal value for that field is 010d. Stat shows the status of the modem lines. A Stat of 04 means that there is a terminal attached to the port. A Stat of 03 means that there is no terminal attached to the port or that the cable is bad. Any other Stat indicates a bad cable (usually no modem lines).
Note: The interpretation of these registers depends on the mode of the port (UART or HDLC (X.25)). This information pertains only to UART (terminal, printer or PPL). The Stat field is valid only if the mask is ff.

The Ethernet error counts screen is reached by selecting h - Hardware Device registers submenu and then selecting option e - Ethernet error counts. The sampling interval may be altered by typing s for slower or f for faster. This screen is useful in analyzing problems with the network. If the Packets transmitted OK field is not incrementing but Lost Carrier and No Heartbeat are, then there is a bad Ethernet cable or no connection at all (especially if Packets received OK is not incrementing). If Framing errors are occurring, it is an indication of a bad cable. Slow response is usually characterized by a high number of collisions and can be explained by heavy traffic on the network.
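The interpretation rules for the MC68302 Stat field and for the Ethernet error counts can be written down as two small Python functions. This is a sketch under stated assumptions: the function names are hypothetical, the counter names are the field labels quoted in the text, and the values are assumed to have been read off the grok screens by hand, since grok's screen format is not reproduced here.

```python
def interpret_stat(stat: str, mask: str) -> str:
    """Interpret the Stat field for a UART port (terminal, printer or PPL)."""
    if mask.lower() != "ff":
        return "Stat not valid: mask is not ff"
    if stat == "04":
        return "terminal attached"
    if stat == "03":
        return "no terminal attached, or bad cable"
    return "bad cable (usually no modem lines)"


def diagnose_ethernet(before: dict, after: dict) -> list:
    """Compare two samples of the Ethernet error counts screen.

    before and after map field labels to counts taken one
    sampling interval apart.
    """
    def grew(name):
        return after.get(name, 0) > before.get(name, 0)

    findings = []
    if (not grew("Packets transmitted OK")
            and grew("Lost Carrier") and grew("No Heartbeat")):
        findings.append("bad Ethernet cable or no connection")
    if grew("Framing errors"):
        findings.append("possible bad cable (framing errors)")
    return findings
```

The collision rule is left out on purpose, because the text gives no threshold for what counts as a high number of collisions.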

SEE ALSO monitor


1.16 ipcmon - InterProcess Communications (IPC) monitor


NAME
ipcmon - InterProcess Communications (IPC) monitor

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:


opc> ipcmon <options>
where <options> are:

-p         display the IPC names of processes using IPC
-s         display IPC services currently provided
-t <proc>  turn on message traffic tracing for the provided <proc>
R          limit -t option to received messages
S          limit -t option to sent messages
x          turn off data display for -t option
o          trace messages in octal instead of hex
d          trace messages in decimal instead of hex
-q <mnsd>  query multi-nodal name server daemon (mnsd) status
-n <pat>   list multi-nodal name server (mns) entries
-l <pat>   list mns entries with LAN scope
-w <pat>   list mns entries with WAN scope
-g <pat>   list mns entries with GLOBAL scope

where:
<proc>  process id or a unique ipc_init name
<mnsd>  mnsd identifier (MNS-1 is default)
<pat>   any regular expression ( * is default)

DESCRIPTION
ipcmon is a tool used to monitor which processes are using InterProcess Communications (IPC). IPC provides a process with services for managing its communications:
- resolve services: advertise services or locate advertised services
- message services: send or receive messages
- event services: notification of message arrival
- control services: configure the IPC environment
Processes in the OPC software use the IPC functions to implement a protocol-independent method of messaging other processes. The OPC will not run unless IPC is running.

USES
When a process cannot connect to an IPC service, ipcmon can be used to try to isolate the problem to a process. ipcmon can also be run against a process to determine whether it is able to send or receive messages.

SAMPLE OUTPUT
Use -p to observe which processes are currently using IPC services to exchange messages with other processes.


All IPC services currently available are displayed by using -s. This display can be examined to ensure that the IPC services requested by a process are currently available.

A particular process may have its messaging monitored by using ipcmon -t <proc>, where <proc> is either the process id or the process name (obtained from the PROCESS NAME column of the output of the ipcmon -p command). To end this command, type CTRL_c. A summary of the number of messages sent and received by the process is then displayed.

DEPENDENCIES Two processes, ipcwd and mnsd, must be running in order for IPC to run.

1.17 lasaldmp - dump all alarms


NAME
lasaldmp - dump all alarms

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To use, type:
opc>lasaldmp [-h | -a | -c]
where:
-h  displays help screen
-a  displays all historical alarms
-c  displays the current active alarms (default)

DESCRIPTION
lasaldmp lists all alarms (active or historical) from the local OPC and all NEs in the span of control.

USES
lasaldmp is useful for retrieving alarm data, which can then be captured for review.
To view the alarm data, type:
opc>lasaldmp | more
To capture alarm data to a file called alarm.capture, type:
opc>lasaldmp > /alarm.capture


1.18 lasdump - dump OPC log file


NAME
lasdump - dump OPC log file

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To use, type:
opc>lasdump

DESCRIPTION
lasdump lists all OPC logs. This command is a symbolic link to the neldump tool.

USES
lasdump is useful for retrieving OPC log data, which can then be captured for review.
To view the log data, type:
opc>lasdump | more
To capture log data to a file called log.capture, type:
opc>lasdump > /log.capture

1.19 lomRelease - create a processor load release file


NAME
lomRelease - create a FiberWorld processor load release file

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To use, type:
opc>lomRelease -p proc -l lname -r rel -s chksum [-f] [-d date] [-v vint]


where:
proc    one of:
        hmu - host messaging unit
        apu - access processing unit
        lc - line card
        tac - test access card
        rsp - radio shelf processor
        oc48 - oc48 shelf processor
        tss - transport services shelf processor
        rdn - radio node shelf processor
        demux - demultiplexer processor
        lpbk - loopback processor
        irtu - integrated remote testing unit
        apufw - firmware
        sts1 - sts1 processor
lname   must be identical to the name of the load file (i.e. the .LD.SWD file) that the release file is meant to accompany
rel     the software release: fwp03 - fwp06, an07, REL_0500 - REL_0800
chksum  checksum of the file lname.LD.SWD (only the checksum is required, not the number of blocks)
-f      valid only for firmware loads under TBM and ABM
date    (optional) Must be in the form dd-mm-yy. The default is 0.
vint    (optional) Must be in the range 0-24. The default is 0.

DESCRIPTION
lomRelease creates a FiberWorld processor load release file for the specified processor, load name and release, and for the date and/or card vintage if these are specified. The file that lomRelease creates is named lname.REL.SWD.

USES
Processor load release files used to be created by hand. lomRelease is meant to hide the file format and contents from the user.

SAMPLE OUTPUT
The following is the command used when lomRelease creates the release file NT7E84CA0301.REL.SWD:
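Separately from the tool's own output, the naming and range rules stated in this entry can be checked ahead of time. The Python sketch below is not lomRelease itself: the function names are hypothetical, and it encodes only what the text states (the lname.REL.SWD naming, the dd-mm-yy date form, and the 0-24 vintage range).

```python
import re


def release_file_name(lname: str) -> str:
    """Name of the release file that accompanies lname.LD.SWD."""
    return f"{lname}.REL.SWD"


def valid_date(date: str) -> bool:
    """True if date has the dd-mm-yy shape (digits only; no calendar check)."""
    return re.fullmatch(r"\d{2}-\d{2}-\d{2}", date) is not None


def valid_vintage(vint: int) -> bool:
    """True if vint is within the documented 0-24 range."""
    return 0 <= vint <= 24
```

For the load name in the sample above, release_file_name("NT7E84CA0301") gives NT7E84CA0301.REL.SWD.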


1.20 neldump - NE logger dump


NAME
neldump - NE logger dump

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> neldump

DESCRIPTION
neldump lists all OPC logs. neldump provides the same functionality as lasdump.

USES
neldump is useful for retrieving OPC log data, which can then be captured for review.
To view the log data, type:
opc>neldump | more
To capture log data to a file called log.capture, type:
opc>neldump > /log.capture

1.21 nelogin - NE login tool


NAME
nelogin - NE login tool

LOCATION
/iws/tel

SYNOPSIS
To invoke, type:
opc> /iws/tel/nelogin


DESCRIPTION
nelogin implements the telnet protocol over the OSI stack. It allows logins from the OPC to either Network Elements in the span of control or a peer OPC.

Usage: nelogin [NE-id | OPC-Name] [t-selector]
where:
- NE-id - ID of the Network Element; no t-selector is required.
- OPC-Name - name of the OPC, if login to an OPC is desired. A t-selector of 1600 must be provided.
- t-selector - required only when logging in to an OPC, in which case it must be 1600.
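The argument rule above (an NE-id on its own, or an OPC name followed by a t-selector of 1600) can be captured in a small helper that builds the command line. This Python sketch is hypothetical: nelogin_args is not an OPC tool, and the OPC name used in the usage note is invented for illustration.

```python
NELOGIN = "/iws/tel/nelogin"
OPC_T_SELECTOR = "1600"  # required when the target is an OPC


def nelogin_args(target: str, is_opc: bool) -> list:
    """Build the nelogin argument list for an NE-id or an OPC name."""
    args = [NELOGIN, target]
    if is_opc:
        # A login to an OPC must supply the t-selector; an NE login must not.
        args.append(OPC_T_SELECTOR)
    return args
```

For example, nelogin_args("12", is_opc=False) produces a login to NE 12 with no t-selector, while nelogin_args("OPC001P", is_opc=True) appends 1600 (OPC001P being a made-up OPC name).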

1.22 nmasee - check what PMs are available on NMA


NAME
nmasee - display the PM status for various counts available to NMA

LOCATION
/iws/nma/

SYNOPSIS
To invoke, type:
opc> /iws/nma/nmasee [xxxx]
where xxxx is the OS identification number; see file /iws/nma/nma_pm_report_flags.xxxx

DESCRIPTION
nmasee dumps the contents of the file /iws/nma/nma_pm_report_flags.xxxx. If the OS identification number is not provided, a menu appears with the following options:
- View INH-PMREPT state: dumps /iws/nma/nma_pm_report_flags.xxxx
- Manual AID validation: check AIDs used in the surveillance interface
- Auto AID validation: check AIDs used in the surveillance interface
NOTE: A = active, I = inactive

EXAMPLE OUTPUT
Commands:
1 View INH-PMREPT state
2 Manual AID validation
3 Auto AID validation
CR EXIT
Enter command code: 1
Enter extension for file /iws/nma/nma_pm_report_flags: 0
NEN/AT1T3N/ASTS1OC48OC12OC3 506AAAAAAAAAA AIAAAAA

Commands:
1 View INH-PMREPT state
2 Manual AID validation
3 Auto AID validation
CR EXIT
Enter command code: 2
Enter SCCM, AID, or CR to exit: oc3 1-as
Enter system code:
0 - OC3 LTE
1 - OC3 Linear ADM
2 - OC12 LTE
3 - OC12 Regen
4 - OC12 Ring ADM
5 - OC12 Linear ADM
6 - OC48 LTE
7 - OC48 Regen
8 - OC48 Ring ADM
9 - ALL SYSTEM TYPES
1
OC3LADM: OC3 1-AS - VALID
Enter SCCM, AID, or CR to exit: oc3 1-as
Enter system code:
0 - OC3 LTE
1 - OC3 Linear ADM
2 - OC12 LTE
3 - OC12 Regen
4 - OC12 Ring ADM
5 - OC12 Linear ADM
6 - OC48 LTE
7 - OC48 Regen
8 - OC48 Ring ADM
9 - ALL SYSTEM TYPES
0
OC3LTE: OC3 1-AS - NOT VALID
Enter SCCM, AID, or CR to exit:

Commands:
1 View INH-PMREPT state
2 Manual AID validation
3 Auto AID validation
CR EXIT
Enter command code: 3
Enter system code:
0 - OC3 LTE
1 - OC3 Linear ADM
2 - OC12 LTE
3 - OC12 Regen
4 - OC12 Ring ADM
5 - OC12 Linear ADM
6 - OC48 LTE
7 - OC48 Regen
8 - OC48 Ring ADM
9 - ALL SYSTEM TYPES
2
Enter facility type or CR for all: ds3
********************************************************************
2OC12LTE: Shelf 0; Type 1; Rate 0
T3, VALID
T3, 1-3G0-1 NOT VALID
T3, 1-3G0-2 NOT VALID
T3, 1-3G0-3 NOT VALID
T3, 1-3G0-4 NOT VALID
T3, 1-3G1 NOT VALID
T3, 1-3G1 NOT VALID
T3, 1-3G1-1 VALID
T3, 1-3G1-2 VALID
T3, 1-3G1-3 VALID
T3, 1-3G1-4 NOT VALID
T3, 1-3G2 NOT VALID
T3, 1-3G2-1 VALID
T3, 1-3G2-2 VALID
T3, 1-3G2-3 VALID
T3, 1-3G2-4 NOT VALID
T3, 1-3G3 NOT VALID
T3, 1-3G3-1 VALID
T3, 1-3G3-2 VALID
T3, 1-3G3-3 VALID
T3, 1-3G3-3 VALID
T3, 1-3G3-4 NOT VALID
T3, 1-3G4 NOT VALID
T3, 1-3G4-1 VALID
T3, 1-3G4-2 VALID
T3, 1-3G4-3 VALID
T3, 1-3G4-4 NOT VALID
T3, 1-3G5 NOT VALID
T3, 1-3G5-1 NOT VALID
T3, 1-3G5-2 NOT VALID
T3, 1-3G5-3 NOT VALID
T3, 1-3G5-4 NOT VALID
T3, 1-3G1S NOT VALID
T3, 1-3G2S NOT VALID
T3, 1-A NOT VALID
T3, 1-B NOT VALID
T3, 1-AS NOT VALID
T3, 1-BS NOT VALID
T3, 1-AE NOT VALID
T3, 1-BE NOT VALID
T3, 1-AW NOT VALID
T3, 1-BW NOT VALID


USES Display PM status (inhibited/uninhibited) for an NMA connection. Also validate AIDs for current TL1 NMA.

1.23 nnsmon/nnsfmon - network name service monitor tool


NAME
nnsmon - network name service monitor tool
nnsfmon - menu-driven version of nnsmon


LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> nnsfmon
opc> nnsmon [-s | -r | -d | -D | -l | -L | -H <aec> | -c | -h]
where:
-s  Print status and statistics.
-r  Reset the statistics.
-d  Print the database.
-D  Print the database, showing internal information.
-l  Print the entry for the local node.
-L  Print the entry for the local node, showing internal information.
-H  Print a hash table, where:
    a - addresses
    e - equipment names
    c - character names
-c  Calculate and print database checksums.
-h  Print this message.
The -t and -n options are intended for designer use only.

DESCRIPTION
nnsfmon provides all the functionality of nnsmon in a menu-driven format.

nnsmon is a tool used to monitor the routing information maintained in the network. It can list the network name service protocol's status and statistics, print the data in the routing database, list all network-wide reachable node names and their addresses, and check the validity of the routing database. It helps the user get a view of network-wide connectivity and troubleshoot communication problems between network nodes. Note that new entries are added immediately, and unavailable entries are removed after 20 minutes.

USES
nnsmon is used in conjunction with osiping to troubleshoot OSI communication problems.


1.24 numdump - Network Upgrade Manager Status Dump


NAME
numdump - Network Upgrade Manager Status Dump (only during an upgrade)

LOCATION
/tmp/install/util

SYNOPSIS
To invoke, type:
opc> numdump [-d directory] [-backup] [-e ]
where:
-d <directory>  directory where the num.status file is located (default=/iws/num)
-backup         Use the backup status and data files for all operations
-e              Modify upgrade task information; design use only
-dist           Access the status and data files for the numdist tool (Distribution Phase) rather than NUM's status and data files

DESCRIPTION
numdump translates the contents of the NUM status and data files into English text and allows the user to skip to various steps of the upgrade currently in progress. When the edit option is specified, a menu is presented listing the available actions. NUM should not be running when changes are made using the -edit option of numdump, since there would be a conflict writing to the NUM status and data files. The edit menu has the following options:
- Change the current task.
- Change the current subtask.
- Modify pause after the current task.
- Display status/data files.
- Create backup status/data files.
- View backup status/data files.
- Restore backup status/data files.
- Exit NUMDUMP.

USES Used to dump the status of a current upgrade or to modify upgrade task information.

1.25 numover
NAME
numover - Override the HealthCheck and/or Baseline tools during an upgrade (only during an upgrade)

LOCATION
/tmp/install/util

SYNOPSIS
To invoke, type:
opc> numover
from the /tmp/install/util directory. The tool then displays the override status of the HealthCheck and Baseline tools, followed by options to toggle the override status of each of these tools on or off, and an option to quit the program.

DESCRIPTION
numover allows the user to override the upgrade's enforcement of a successful completion of the HealthCheck and Baseline tools. Normally, the Activation phase of an upgrade cannot be started unless both tools have completed successfully. numover makes it possible to start the Activation phase despite such a condition, by allowing the user to tell the Network Upgrade Manager which tool(s) to ignore when performing this check. numover can also be used to check the override status of each tool: invoke it and quit, since it displays this information automatically on invocation. For security reasons, numover can only be invoked by the root user.

SAMPLE OUTPUT
The following is a snapshot of the main screen of numover, showing its display as well as its command options:

Network Upgrade Manager Override Tool
==============================
HealthCheck tool Override : off
Baseline tool Override    : off

1. Toggle Healthcheck tool Override
2. Toggle Baseline Tool Override
3. Quit
Please select option:
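The gating rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as the text states it (both tools must pass unless overridden); the function and parameter names are hypothetical and do not correspond to anything in the Network Upgrade Manager itself.

```python
def activation_allowed(healthcheck_passed, baseline_passed,
                       override_healthcheck=False, override_baseline=False):
    """Return True if the Activation phase may start.

    Hypothetical model of the rule in the text: each tool must have
    completed successfully, unless its override has been toggled on.
    """
    health_ok = healthcheck_passed or override_healthcheck
    baseline_ok = baseline_passed or override_baseline
    return health_ok and baseline_ok


if __name__ == "__main__":
    # Baseline failed and is not overridden: activation stays blocked.
    print(activation_allowed(True, False))
    # Same failure with the Baseline override on: activation may start.
    print(activation_allowed(True, False, override_baseline=True))
```

Note that each override only covers its own tool: a failed HealthCheck still blocks activation when only the Baseline override is on.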

1.26 obmdump - Connection and Timeslot Usage dump utility


NAME
obmdump - Connection and Timeslot Usage dump utility

LOCATION
/iws/obm

SYNOPSIS
To use, type:
opc> /iws/obm/obmdump cache
where
cache   Dump and append the contents of the OPC cross-connect cache. Three files will be created or appended:
        /tmp/.obm_conn_cache - contains a listing of all cross-connects stored on the OPC
        /tmp/.obm_ne_cache - contains a listing, on a per-NE basis, of all Used and Available timeslots (in both Rx and Tx directions) on the tributary and both optical interfaces
        /tmp/.obm_isedit_cache - contains a listing of all active (i.e. left pending) connection in-service rollovers and their current states.

1.27 oft - OSI File Transfer


NAME
oft - OSI File Transfer

LOCATION
/etc

SYNOPSIS
To invoke, type:
opc> oft -R responder [-u name] [-p pw] [-v] [-d] [-e] [-L logfile] \
     {-a act [-t type] [-o overwrite] -l lfile [-r rfile] | -f cfile}
where:
All options are position independent.
{} must supply only one of the options separated by |.
[] denotes optional data.

-R  FTAM Responder Service Identifier. Format: service-name or host-name:service-name. If host-name is not specified, oft uses the local node. For example, to transfer a file to the backup OPC, use backup:ftamd.
-u  login name on the remote node.
-p  password to log into the remote node.
-v  verbose.
-d  activate debug facility.
-e  perform the action (or command) many times or until signal 16 is received. For testing purposes only. Using this option will degrade the system.
-L  name of the operation log file.
-a  action to perform (put or get).
-t  document type (FTAM-3).
-o  overwrite existing file. One of:
    r  <r>eplace the existing file when successful. Tries to preserve the old file's permissions on the new file. When there is no existing file, gives default access permission to the new file.
    o  First delete the existing file. Replace the existing file when successful. Tries to preserve the <o>ld file's permissions on the new file. When there is no existing file, gives default access permission to the new file.
    n  First delete the existing file. Replace the existing file when successful. Gives default access permission to the new file.
-l  name of the local file.
-r  name of the remote file.
-f  name of the file where all action details are kept.

DESCRIPTION
oft is used to transfer files from one OPC to another OPC using the OSI connection. Note that all files are transmitted to the target OPC under the directory /users/VFS. oft is found under the /etc directory.

USES
Example: to transfer /etc/hosts to the backup OPC as /example/hosts.orig:
opc> oft -v -Rbackup:ftamd -uroot -proot -aPut -oR -l/etc/hosts -r/example/hosts.orig
NOTE: the hosts.orig file will be located on the backup OPC under the directory /users/VFS/example/.
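Because the remote file name is interpreted relative to /users/VFS, it is easy to look for the transferred file in the wrong place. The following Python sketch rebuilds the example command line above and computes where the file should land on the target OPC. It does not run oft; the helper names are hypothetical, and the path rule is inferred only from the NOTE in the text.

```python
import posixpath

# Directory under which, per the text, all transferred files are placed.
VFS_ROOT = "/users/VFS"


def oft_put_argv(responder, user, password, local_file, remote_file):
    """Build an argument list shaped like the documented example (not executed)."""
    return ["oft", "-v", f"-R{responder}", f"-u{user}", f"-p{password}",
            "-aPut", "-oR", f"-l{local_file}", f"-r{remote_file}"]


def landing_path(remote_file):
    """Where the file is expected on the target OPC (assumption: the remote
    name is simply appended to /users/VFS, as in the documented NOTE)."""
    return posixpath.join(VFS_ROOT, remote_file.lstrip("/"))


if __name__ == "__main__":
    argv = oft_put_argv("backup:ftamd", "root", "root",
                        "/etc/hosts", "/example/hosts.orig")
    print(" ".join(argv))
    print(landing_path("/example/hosts.orig"))
```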

1.28 osinetping - Test OSI transport-level communications


NAME
osiping - Test OSI transport-level communications

LOCATION
/usr/etc

SYNOPSIS
To invoke, type:
opc> osiping [<remote host name> | <network address> | broadcast] [<packet size>] [-n <number of packets>] [-e <endbyte>] [-r <rate>] [-stream] [-l]

1.29 osiping - Test OSI transport-level communications


NAME
osiping - Test OSI transport-level communications

LOCATION
/usr/etc

SYNOPSIS
To invoke, type:
opc> osiping [<remote host name> | <network address> | broadcast] [<packet size>] [-n <number of packets>] [-e <endbyte>] [-r <rate>] [-stream] [-l]

where
<remote host name>      hostname as it appears in the /etc/osihosts file
<network address>       address of the remote target node as it appears in the /etc/osihosts file or in nnsmon
broadcast               sends packets to any node that has lower layer communication connections with the source node
<packet size>           number of bytes in the message. This value may be set to any value between 1 and 16384 for TP4, or between 1 and 7168 for CLTP. The default value is 64.
-n <number of packets>  sets the number of messages that osiping will send. The default value is 10,000. If the end byte is specified, -n is ignored.
-e <endbyte>            the first packet sent is of <packet size>. Each subsequent packet is one byte larger than the previous one. Packets are sent continually until the packet size reaches the specified end byte size.
-r <rate>               controls how fast osiping sends messages. The default value of rate is 1 second.
-stream                 causes the test to be run over a TP4 socket. By default, osiping runs the test over a CLTP socket.
-l                      causes osiping to ignore the rate value. Under this option, osiping sends and receives messages on a lock-step basis. The -r option is always ignored.
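The interaction between <packet size>, -n and -e is easier to see with a small model. The Python sketch below computes the sequence of message sizes a run would use under the rules in the option table. It is not osiping itself: the function name is hypothetical, and two details are assumptions rather than documented behaviour (the final packet of size <endbyte> is included, and an end byte smaller than the starting size yields no packets).

```python
# Size limits quoted in the option table: CLTP is the default transport,
# TP4 is selected with -stream.
MAX_SIZE = {"cltp": 7168, "tp4": 16384}


def packet_sizes(start=64, endbyte=None, count=10000, transport="cltp"):
    """Return the list of message sizes for one run (hypothetical model)."""
    limit = MAX_SIZE[transport]
    largest = start if endbyte is None else max(start, endbyte)
    if start < 1:
        raise ValueError("packet size must be at least 1 byte")
    if largest > limit:
        raise ValueError(f"packet size too big, maximum size is {limit} bytes")
    if endbyte is None:
        # Without -e, every message has the same size and -n sets the count.
        return [start] * count
    # With -e, -n is ignored and each message is one byte larger than the last.
    # Assumption: the end byte size itself is the last size sent.
    return list(range(start, endbyte + 1))


if __name__ == "__main__":
    print(packet_sizes(64, 67))
    print(len(packet_sizes()))
```

One practical consequence: a sweep with -e produces (endbyte - start + 1) messages regardless of -n, so a wide sweep can generate far more traffic than intended.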

DESCRIPTION
osiping is used to test OSI transport-level connectivity between the OPC and Network Elements (NEs) or a remote OPC. The target NE is specified by the first argument [<remote host name> | <network address>], which must be either a host name or a network address, as specified in the /etc/osihosts file. Use nnsmon to determine the network addresses. osiping also provides statistics on packet round-trip time, lost packets, and duplicated packet counts.

osiping starts by opening a TP4 or a CLTP socket to echod on the target node. CLTP is the default socket. echod is a daemon spawned by ositsapd that sends each message it receives back to the sending node. For a CLTP socket, some messages may be discarded by ositsapd or along the route to the destination. Consequently, the number of packets sent by osiping using a CLTP socket may not equal the number of packets received. A TP4 socket guarantees delivery of all messages, so the number of packets received must equal the number of packets transmitted.

osiping then sends messages to echod at the rate specified by the -r parameter. echod responds by sending back the messages that it received. osiping calculates the round-trip time using the time-stamp in the messages. Missing and duplicated messages are detected using a sequence number assigned to each message.

USES
osiping is used when there is an inexplicable loss of communication with an NE or remote OPC. One can osiping the NE using its network address or its hostname as indicated in the osihosts file. If the source node receives no packets back from the remote node, then lower level communications have been lost with this remote node. When osiping indicates a loss of lower layer communications, the following should be checked:
1. Use tstatc and view the routing table screen (option R). This table indicates all of the nodes that the source node has the physical ability to communicate with. If the address of the remote node does not appear in this table, then the cable must be checked to ensure that there is an actual hardware link from the source node to the remote node.
2. If the cable is determined to be in place and undamaged, check that ositsapd is running on the remote node. This is done by ensuring that /etc/ositsapd exists.
3. If ositsapd is up and running on the remote node, then ensure that /etc/echod exists.

4. The following files must exist and contain the following:

/etc/osihosts       - must contain the address and hostname of the remote node, if the OPC node name is going to be used.
/etc/services       - must exist
/etc/ositsapd.conf  - must contain the following two lines:
                      echod stream tp4 wait root /usr/etc/echod echod
                      echod dgram cltp wait root /usr/etc/echod echod
                    - echod must be readable/executable for root
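The /etc/ositsapd.conf requirement in step 4 can be checked mechanically. The Python sketch below takes the text of a configuration file and reports which of the two required echod lines are missing. The function name is hypothetical, and the comparison ignores differences in whitespace, which is an assumption about how the file is parsed rather than documented behaviour.

```python
# The two entries that step 4 says must be present in /etc/ositsapd.conf.
REQUIRED_ECHOD_LINES = (
    "echod stream tp4 wait root /usr/etc/echod echod",
    "echod dgram cltp wait root /usr/etc/echod echod",
)


def missing_echod_entries(conf_text):
    """Return the required echod lines that do not appear in conf_text.

    Assumption: runs of spaces or tabs between fields are not significant.
    """
    present = {" ".join(line.split()) for line in conf_text.splitlines()}
    return [req for req in REQUIRED_ECHOD_LINES if req not in present]


if __name__ == "__main__":
    sample = "echod\tstream tp4 wait root /usr/etc/echod echod\n"
    for line in missing_echod_entries(sample):
        print("missing:", line)
```

On an OPC, the file contents would be read from /etc/ositsapd.conf and passed to the function; an empty result means both the TP4 and CLTP echo services are configured.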

SAMPLE OUTPUT
The following is the output from executing the osiping command from the UNIX shell.

To terminate the tool, use CTRL-C.

DIAGNOSTICS
osiping: could not identify host name or address...
The remote host name or address could not be identified. If a name was specified, it could not be found in /etc/osihosts; if an address was specified, it was not formatted correctly. The address must be 25 or 28 characters in length, in the form:
49.0000.xx-xx-xx-xx-xx-xx or
49.0000.xx-xx-xx-xx-xx-xx.yy or
49+0000xxxxxxxxxxxx

osiping: packet size too big, maximum size is 7168 bytes
This message indicates that the packet size or end byte is greater than the maximum CLTP message size.

osiping: packet size too big, maximum size is 16384 bytes

This message indicates that the packet size or the end byte is greater than the maximum TP4 message size.

osiping: rate ... must be greater than or equal to 0.
When the -r option is specified, it must not be a negative value.

DEPENDENCIES
osiping assumes that ositsapd is running on the remote node (if an OPC), and that it can start echod. Also, the local and remote hostnames must appear in the /etc/osihosts file, and the local hostname must appear in the /etc/hosts file.

WARNINGS
System performance may be affected if osiping is used without limiting the number of packets (option -n) to a value smaller than the default of 10,000.

SEE ALSO
tstatc, nnsmon
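The address forms listed under DIAGNOSTICS can be checked before osiping is invoked. The Python sketch below matches the three printed forms with regular expressions. It is an illustration only: the function name is hypothetical, the x and y positions are assumed to be hexadecimal digits, and the 49.0000 prefix is treated as literal because that is how the forms are printed. The sketch matches the printed forms rather than enforcing the stated length of 25 or 28 characters, since the third form as printed is shorter.

```python
import re

_HEX2 = r"[0-9A-Fa-f]{2}"  # assumption: each "xx" or "yy" is two hex digits

_ADDRESS_FORMS = (
    # 49.0000.xx-xx-xx-xx-xx-xx
    re.compile(rf"49\.0000\.{_HEX2}(?:-{_HEX2}){{5}}"),
    # 49.0000.xx-xx-xx-xx-xx-xx.yy
    re.compile(rf"49\.0000\.{_HEX2}(?:-{_HEX2}){{5}}\.{_HEX2}"),
    # 49+0000xxxxxxxxxxxx
    re.compile(r"49\+0000[0-9A-Fa-f]{12}"),
)


def looks_like_osi_address(text):
    """Return True if text matches one of the three printed address forms."""
    return any(form.fullmatch(text) for form in _ADDRESS_FORMS)


if __name__ == "__main__":
    for candidate in ("49.0000.00-00-75-f0-12-34",
                      "49.0000.00-00-75-f0-12-34.01",
                      "OPCM004"):
        print(candidate, looks_like_osi_address(candidate))
```

A string that fails this check may still be a valid host name, in which case osiping looks it up in /etc/osihosts instead.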

1.30 ows_swact - force OPC switch of activity


NAME
ows_swact - force OPC switch of activity

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> ows_swact [-aipedv|h] [process_name]
where
a  Make this OPC active and the peer inactive
i  Make this OPC inactive and the peer active
c  Cancel Force Activity Switch
p  Prevent users from switching activity by disabling Force Switch
e  Allow users to switch activity by enabling the Force Switch
l  Lock Backup OPC into inactive state. Local or peer Backup OPC remains in this state even if Primary is inactive or if OPCs cannot communicate. Available for Backup only.
r  Release local or peer Backup OPC from inactive lock
u  Enter upgrade mode.
s  Exit upgrade mode.
v  Verbose - show all messages
d  Turn debug on
h  help
q  quit.
[process_name] - optional process name

DESCRIPTION
ows_swact is used to force a switch of activity on an OPC pair. Forcing a switch of activity causes the active OPC to go inactive and the inactive OPC to go active. Issuing ows_swact without parameters brings the user into a menu-driven display of ows_swact.

USES
ows_swact is used primarily for network troubleshooting or manual upgrades of the network.

SEE ALSO
drmstat

1.31 query - gather important OPC and NE status


NAME
query - query OPC and NE information

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> query [pp [ all | <pp_type> ]] [opc] [[ primary | backup ] opc load] [[ primary | backup ] opc date] [[ primary | backup ] opc name] [ne] [ne name] [ne logs [ NEid username password ]] [ne association] [ne load [ all | pp | fw ]] [logs [ log_id ]] [upgrade]
where:
pp [ all | <pp_type> ]                Peripheral Processor
opc                                   All OPC info listed below for Primary and Backup
[ primary | backup ] opc load         OPC load running
[ primary | backup ] opc date         OPC date and time
[ primary | backup ] opc name         OPC name and status
ne                                    All NE info listed below for all NEs
ne name                               NE name and status
ne logs [ NEid username password ]    State of NE TRAP or SWERR
ne association                        State of NE association
ne load [ all | pp | fw ]             Load in iws/lom/loads
logs [ log_id ]                       State of various log types: NAD329, STBY702, STBY612, STBY356, STBY348, STBY509...
upgrade                               State of upgrade

DESCRIPTION
query is used by the prepare_upgrade, pre_upgrade and post_upgrade checks. The information provided is minimal and is meant for status indication rather than in-depth information gathering.

USES
Determine the health of the OPC and NE.

1.32 remotsh - remote shell over OSI


NAME
remotsh - remote shell over OSI

LOCATION
/usr/bin

SYNOPSIS
To invoke, type:
opc> remotsh <host> [ -l loginName ] [ -n ] <command>
where
remotsh      located under /usr/bin
<host>       name of target OPC found in /etc/osihosts
[loginName]  userid privileges, default is the login userid
-n           redirect /dev/null to remote stdin
<command>    command to execute within the remote shell generated at <host> with privileges defined by [loginName]

DESCRIPTION
remotsh can be used to execute single commands at a remote OPC site. remotsh creates a UNIX shell on the target host OPC. The shell has the privileges set by the loginName userid.
NOTE: only the root user can create a csh; all other users automatically initiate a usm session. It is therefore recommended that only the root user use remotsh.
Once the shell is established, the command specified on the command line is issued to the remote end. All remote stdin and stdout are routed to the stdin and stdout of the local OPC for the duration of the remote shell's existence.
NOTE: only the standard ASCII set is interpreted correctly; special function and control keys are not recognized.

USES
remotsh can be used to execute single commands at remote OPCs. Issuing a single command through remotsh is faster than performing a remote OPC login. When used in conjunction with other commands, remotsh can be used to transfer files between the local OPC and a remote OPC.

EXAMPLES
1) Want to see the syslog at the backup OPC
opc> remotsh backup cat /var/log/syslog | more
2) Want to run config_port at OPCM004
opc> remotsh OPCM004 /etc/config_port
3) Want to transfer syslogs from backup OPC to primary OPC
opc> remotsh backup cp /var/log/syslog /var/log/syslog.backup
opc> remotsh backup tar cvf - /var/log/syslog.backup | tar xvf -
4) Want to transfer syslogs from local OPC to OPCM004
opc> cp /var/log/syslog /var/log/syslog.local
opc> tar cvf - /var/log/syslog.local | remotsh OPCM004 tar xvf -

DEPENDENCIES
remotsh works only if there is OSI connectivity available over SONET DCC or CNET to the remote OPC.

1.33 ristool - release information tool


NAME
ristool - release information tool

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> ristool /iws/info/release.info <option>
where <option> can be:
-allfiles                     prints out all the files listed in the release.info file.
-directory                    prints out a unique directory name to be used to store the files listed in the release.info file.
-getfiles <outfile filename>  lists to <outfile filename> all the files that are required to upgrade the current OPC.
-listfiles                    lists to the screen all the files that are required to upgrade the current OPC.
-opcloadname                  prints out the external OPC load name found in the release.info file.
-releasename                  prints out the release name of the release.info file.
-upgradetype                  prints out the type of upgrade required to upgrade the current OPC to the release of the release.info file.
-nereleasename                prints out the release found in the fourth field of the NE release files. It is the same for all loads.
-filenameopc                  prints out the file name for the OPC load. Example: opc13am_hp_80.tar.

DESCRIPTION
ristool is used to help filter through the data stored in the /iws/info/release.info file. The release.info file is a plain text file viewable with more or view. It contains the names of all files needed to configure the OPC, for upgrade and for download.

USES
ristool can be used to save the user from having to manually search the /iws/info/release.info file.

1.34 rmopcld - remove OPC load


NAME
rmopcld - remove OPC load

LOCATION
Retrieved off an OPC tape

SYNOPSIS
To invoke, put the software delivery tape of the release which is currently running on the OPC in the tape drive and type (the first two commands can be replaced by get_files):
opc> cd /tmp
opc> dd if=/dev/rdt/tape2 bs=20b | tar xvf -
opc> cd install
opc> rmopcld [-e | -f ]
where:
-e  preserve ethernet
-f  force remove without prompts

DESCRIPTION
rmopcld is used to remove a release. In the process of removing the release, a number of system files are also replaced. The replacement of the system files ensures that the OPC Port B is reset to terminal, the passwords are all reset and the Ethernet port is disabled (unless the -e option is specified).
Note: The base Operating System is not replaced in its entirety. To determine which system files are replaced, look under the /tmp/install/rmopclddir directory. Depending upon the kernel release, HPUX6.5 or HPUX8.0, look under the appropriate directory structure. Under hpux_6.5 are all the files replaced during a rmopcld for that HP-UX version. Under hpux_8.0 is a tar file, rmopcld_basesys.tar, which contains all the files replaced during rmopcld. To view the files, enter
tar tvf rmopcld_basesys.tar | more

The following files/directories are removed by rmopcld:
Subdirectories in /usr: everything but adm, bin, contrib, etc, games, include, lib, local, mail, news, preserve, pub, spool, tmp
/iws/
/nmsir/
/opcbld*/
/nmbld*/
/home/
/eswd/
/backout/
/etc/xntp/

/usr/spool/bftp/
/tmp/*
/iws/lib/*.sl
/users/VFS/nedsb/*
/users/VFS/download/*
/users/VFS/users/opcods/*
/users/VFS/users/anonymous/*
/cleanup_files
/loginInstall
/num_dist_started
/upgrade_state
/etc/checkupg
/etc/install_release
/etc/osiaet
/etc/xntpdsync
/usr/etc/xntpdsync

The following files are replaced by rmopcld:
/usr/etc/echod
/etc/ositsapd
/usr/etc/remotshd
/etc/ftamd
/usr/local/lib/osi/libaps.sl
/usr/local/lib/osi/libcm.sl
/usr/local/lib/osi/lib.sl
/usr/local/lib/osi/liboft.sl
/usr/local/lib/osi/librose.sl
+ anything else delivered in the rmopcld_basesys tarfile.

The following files are preserved if -e is specified:
/etc/hosts
/etc/netlinkrc

USES
rmopcld can be used as an alternative to a disk initialization.
Note: only a disk initialization can guarantee a complete Operating System replacement.

1.35 rmu - resource manager user interface


NAME
rmu - resource manager user interface

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> rmu
Once rmu is active, the opc> prompt changes to wm>. This indicates that rmu is in workstation manager mode and is ready to accept commands. To monitor OPC processes, the user must enter the command pm. The prompt then changes to pm> and rmu can be used to monitor OPC processes. The available commands are:
help [<command>]
    Provides specific syntax information on the specified command, or a list of valid commands when specified alone.
pm
    communicate with process manager (prompt changes to pm>)
wm
    communicate with workstation manager (prompt changes to wm>)
admin
    forces a restart of the resource's Master and a re-read of configuration data for that resource.
query {base | descendant} {all | service} {<target_name> | <root_name>}
    user specifies whether that specific target_name (base) or the first applicable element in the subtree starting from target_name (descendent) is being referenced. root_name indicates the maximum height upwards in the tree which can be reached during the traversal. If no value is specified for the instance value, denoting the class, a 0 specifier will automatically be assumed.
query next
    provides information on the next element in the MIB
map desc <descriptor>
    allows user to map an ID to a descriptor.
map id <object_id>
    allows a user to map a descriptor to an ID.
connection create <name>
    create a new connection.
connection retry <count>
    set retry count for current connection.
connection timeout <timeout>
    set timeout for current connection.
connection disconnect <name>
    disconnect and destroy a new connection.
connection use <name>
    use the named connection for all further DRM operation.
connection list <name>
    list all known connections.
exert {busy | rts} <name>
    causes the resource <name> to enter a busy state (MAN_BUSY) or to be returned to service (rts) (IN_SERVICE).
Workstation only: exert {offline | attribute | deconfigure} <name>
    allows one to offline the workstation, change its attributes (must first query it. Note that the new attribute values entered by the user are not nested to ensure that the identifier used for the query matches that for the attribute change exertion. If no value is entered to any of the prompts during this operation, the existing value is left unchanged) or remove (deconfigure) the resource from the management information base (MIB)
Workstation only: exert pmreset {reboot | halt | shutdown} <objid> <args>
    force restart of the workstation resource
Process only: exert spare <name>
    force sparing of the resource <name>.
notify add {base | descendent} {all | service} <target_id>
    allows user to register for notifications at any level in the tree, and specify whether notifications of changes within descendents are to be propagated.
notify delete <target_id>
quit
    any answer starting with a y will cause the program to exit.

DESCRIPTION
rmu is a test tool which provides the functionality of a Distributed Resource Management (DRM) client and is useful in testing Workstation Management (WM) and Process Management (PM). It can be modified to work from scripts for the purposes of automated testing. rmu allows the user to:
- perform a mapping between a resource's or class's object identifier and object descriptor
- register for resource state change notifications
- receive and display received state change notifications
- perform queries of resources associated with that resource manager
- address both Process Management and Workstation Management managers
- perform method exertions on resources associated with a workstation manager and receive responses to these exertions.

USES
rmu can be used to display information on any Managed Software Resource (MSR) on the OPC. Ensure that the process management prompt is displayed (pm>). The <name>s of the MSRs are: opcdbmgr, mitshmld, warmstandby, downloadmgr, cmisassocmgr, eventhandler, neshadow, backuprestoremgr, logalarmsystem, onefornobjectmgr, rntojecdtmgr, loadisumgr. The MSRs may be queried, busied or restarted using rmu. rmu can be used to restart an MSR if it has been put into the SYSTEM/BUSY state because it exceeded the maximum fault count allowed (3 retries before DRM will system busy the MSR). It is suggested that the tools, drmstat or drmstat, which call rmu, be used instead of a direct call to the rmu tool.

SAMPLE OUTPUT
The following displays the information available and format used for a query command:

DEPENDENCIES
rmu requires that the IWS IPC be available on that node, that the DRM Administration Service be running in a context visible to rmu, and that the DRM Workstation Management be running on the same node. Process Management may or may not be present. If Process Management is present, rmu will connect with it. Otherwise, it will begin interactive operation solely with Workstation Management, rejecting any commands directed at Process Management.

SEE ALSO
drmstat
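The fault-count behaviour mentioned under USES (an MSR is placed in the SYSTEM/BUSY state once it exceeds the allowed retries, and can then be restarted) can be pictured with a small state model. The Python sketch below is one reading of that sentence, not a description of how DRM is implemented: the class, its methods, and the exact threshold semantics (busy on the fault after the third retry) are assumptions made for illustration.

```python
MAX_RETRIES = 3  # from the text: 3 retries before DRM system-busies the MSR


class Msr:
    """Hypothetical model of a Managed Software Resource's fault handling."""

    def __init__(self, name):
        self.name = name
        self.faults = 0
        self.state = "IN_SERVICE"

    def record_fault(self):
        """Count a fault; once the retries are exhausted, mark the MSR busy."""
        self.faults += 1
        if self.faults > MAX_RETRIES:
            # Assumption: the fault following the third retry triggers the change.
            self.state = "SYSTEM/BUSY"

    def restart(self):
        """Model of a manual restart (for example through rmu or drmstat)."""
        self.faults = 0
        self.state = "IN_SERVICE"


if __name__ == "__main__":
    msr = Msr("neshadow")
    for _ in range(4):
        msr.record_fault()
    print(msr.name, msr.state)
    msr.restart()
    print(msr.name, msr.state)
```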

1.36 scancs
NAME
/iws/obm/scancs - A connection services data viewing tool

LOCATION
/iws/obm/

SYNOPSIS
To invoke a help screen for usage and available options, type:
/iws/obm/scancs

DESCRIPTION
Usage: [-o] [-s scantype]
This program outputs information maintained by the OPC Connection Services application. This information includes provisioned cross-connects, Network Element bandwidth usage, and Network Element configuration information.

[-o]
The -o command line option is optional and allows the user to specify that the output should be written to an output file. The program's output is appended to the appropriate output file, depending on the [scantype] option (see below) chosen:
For [scantype] xcdata, the output is written to /tmp/xcdata.
For [scantype] bwusage, the output is written to /tmp/bwusage.
For [scantype] configs, the output is written to /tmp/configs.
For [scantype] readdb, the output is written to /tmp/readdb.
If the -o option is not specified, the program's output is directed to the screen (standard output).

[-s scantype]
The -s command line option allows the user to specify which data to display. It is followed by the scantype argument, which can be one of the following:
- xcdata: display all provisioned cross-connect data
- bwusage: display all Network Element bandwidth usage
- configs: display all Network Element configurations
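Since the output file depends on the scantype, a small lookup helps when collecting results from several scans. The Python sketch below builds a scancs argument list and reports where the output would be appended when -o is used. It does not run scancs; the function name is hypothetical, and the option order shown is an assumption based on the usage line.

```python
# Output files named in the description of the -o option.
OUTPUT_FILES = {
    "xcdata": "/tmp/xcdata",
    "bwusage": "/tmp/bwusage",
    "configs": "/tmp/configs",
    "readdb": "/tmp/readdb",
}


def scancs_invocation(scantype, to_file=False):
    """Return (argv, output_path); output_path is None when writing to screen."""
    if scantype not in OUTPUT_FILES:
        raise ValueError(f"unknown scantype: {scantype}")
    argv = ["/iws/obm/scancs"]
    if to_file:
        argv.append("-o")
    argv += ["-s", scantype]
    return argv, (OUTPUT_FILES[scantype] if to_file else None)


if __name__ == "__main__":
    argv, path = scancs_invocation("bwusage", to_file=True)
    print(" ".join(argv), "->", path)
```

Because the tool appends rather than overwrites, it may be worth moving or removing an old /tmp output file before a new scan so that results from different runs are not mixed.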

1.37 scct/scfxt - commissioning manager


NAME
scct or scfxt - commissioning manager

LOCATION
/iws/scf

SYNOPSIS
To invoke, type:
opc> scct -d

DESCRIPTION
scct and scfxt are the display managers for the commissioning manager. Invoking either executable with the -d option dumps the complete system commissioning data to the screen.

USES
scct -d and scfxt -d are useful for getting a quick look at the commissioning data and for storing all the commissioning data in a file.

EXAMPLES
1) Store commissioning data to a file
opc> /iws/scf/scct -d > /commissioning.data

1.38 snapshot - gather all relevant OPC information


NAME
snapshot - gather all relevant OPC information (NOTE: this tool is not fully functional)

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> snapshot

DESCRIPTION
snapshot is a menu-driven tool designed to gather pertinent troubleshooting information about the OPC. The data gathered are:

General OPC information
    Retrieves the following information: date, hostname, IP address, kernel issue, hardware name, OPC software version, UNIX version and NE loads
Commissioning Data
    Retrieves all commissioning information. Equivalent to the scct -d command
Disk Space and Disk Usage
    Performs bdf on disk and performs find on all directories and files. NEVER kill this process when it is running... it may corrupt other processes
Running Processes
    Performs ps -ef
Loaded Packages
    Lists the OPC packages installed on the OPC
CPU / Memory Requirements
    Retrieves a list of running user processes and the amount of CPU used, and a list of the total usage of files, processes and swap space.
Software Installation Logs
    Retrieves the contents of the current /usr/adm/swilog and/or the contents of the historical /usr/adm/swilog.old# (# = 1 or 2 installations back)
System Logs
    Retrieves the contents of the current /var/log/syslog and/or the contents of the historical /var/log/syslog.day-# (# = 1 to 7 days back)
Log Database
    Retrieves event logs based on a user-selectable filter. Filtering is by alarm type (Critical, Major, Minor, Warning) and by date
Alarm Database
    Retrieves alarms based on a user-selectable filter. Filtering is by current or history, alarm type (Critical, Major, Minor, Warning, Cleared) and by date
MSR States
    Retrieves the status of all MSRs. Equivalent to drmstat or rmu
Association States
    Retrieves the status of the association
Communication States
    Retrieves the status of the communication layers

snapshot provides a quick and simple means of accessing much of the OPC's critical information and many of its tools. The information gathered can be displayed on screen or sent to a file, /tmp/snapshot.trace.###, where ### is the PID of the snapshot tool. Presently there are a number of options which are not supported in Rel 08. The help facilities are also not presently supported. The Disk Space and Disk Usage option takes approximately 10 to 20 minutes to run. While it is running, never kill the process, as doing so may corrupt other processes.

USES
This tool is very useful for retrieving data for troubleshooting purposes.

1.39 socdump - dump span of control


NAME
socdump - dump span of control

LOCATION
/iws/test-tools/tool-res

SYNOPSIS
To invoke, type:
opc> socdump

DESCRIPTION
socdump translates the span of control table stored in the encrypted /iws/dlm/opcspan.tbl file into a readable format and then displays it on the screen.

The span of control table contains the OPC Distinguished Name (DN) along with the DNs of all the Network Elements (NEs) under its span of control. The span of control table is created during commissioning. It is used by the Directory Service Agent (DSA) to determine which NEs the OPC must service. When an NE broadcasts for its primary OPC, the DSA checks the span of control table for this NE's DN. If it is found, the NE will be serviced by this OPC. The contents of the span of control table are updated by the commissioning manager. The download manager uses the span of control table to ensure that a particular NE is in its span and thus is to be serviced. The association manager accesses the span of control table to determine if any NEs have been added to or deleted from the OPC.
USES socdump is useful in determining which NEs are under the span of control of a particular OPC. An NE may be moved from the span of OPCa to the span of OPCb, in which case the NE is inserted in OPCb's span of control table. If the NE was not correctly removed from OPCa's span of control table, there will be confusion when the NE loses association with its primary OPC and starts broadcasting the message "Who is my primary OPC?". Both OPCa and OPCb are able to re-establish a connection with this NE. The first available OPC will be the one that does, and there is no guarantee that it will be the correct one. By using socdump on each OPC, it can be determined exactly which NEs that OPC can service.
SAMPLE OUTPUT The following is the output from socdump for the primary OPC on a system with 3 NEs. One record exists for each NE. The record displays the DN (networkId, fWPSystemId, NEid) and the services which the OPC provides to the NE.

SEE ALSO spock
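The duplicate-span situation described under USES for socdump can be illustrated with a small sketch. It assumes the NE DNs have already been copied by hand from the socdump output of two OPCs into Python tuples; the NeDn type, the shared_nes function and the sample values are hypothetical and are not part of socdump.

```python
from typing import NamedTuple


class NeDn(NamedTuple):
    """Hypothetical container for the DN fields shown in a socdump record."""
    network_id: int
    system_id: int
    ne_id: int


def shared_nes(span_a, span_b):
    """Return the NE DNs that appear in both span of control lists, sorted."""
    return sorted(set(span_a) & set(span_b))


# Hand-entered sample data (assumed values, not real socdump output).
opc_a = [NeDn(1, 1, 10), NeDn(1, 1, 11), NeDn(1, 1, 12)]
opc_b = [NeDn(1, 2, 20), NeDn(1, 1, 12)]

for dn in shared_nes(opc_a, opc_b):
    # An NE listed in both spans may be picked up by either OPC.
    print(f"NE {dn.ne_id} (network {dn.network_id}, system {dn.system_id}) is in both spans")
```

Any NE reported by such a comparison is a candidate for the "wrong OPC re-associates" problem and should be checked against the intended commissioning data.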

1.40 spock - SPan of Control Kreator


NAME spock - SPan of Control Kreator
LOCATION /iws/test-tools/tool-res
SYNOPSIS To invoke, type:


opc> spock [-s][ -C | -U | -I <NE Id> | -D <NE Id> ] [-b | -r | -c | -m | -u | -a <NE Id> | -d <NE Id>][-q]
-s - execute in silent mode.
-C - create empty Master and Active SOC files.
-I - insert the specified nEId into the Master SOC file.
-D - delete the specified nEId from the Master SOC file.
-U - audit the Master SOC file.
-Q - just query the state of the Master SOC file.
-b - back up both the Master and Active SOC files.
-r - restore both the backed up Master and Active SOC files.
-c - create an empty Active SOC file.
-m - make an Active SOC file from the Master SOC file.
-a - add the specified nEId to the Active SOC file.
-d - delete the specified nEId from the Active SOC file.
-u - audit the Active SOC file.
-q - just query the state of the Active SOC file.
DESCRIPTION spock is a tool which reads the Network Element (NE) Distinguished Names (DNs) from the span of control table of the resident OPC. It then accesses the Management Information Base (MIB) to obtain the recorded NE DNs under this OPC's span of control. These two sets of information are compared. If any discrepancies exist, the MIB information is considered to be correct and the span of control table is recreated. The result is that the information stored in the MIB and the span of control table is identical.
USES spock is run when the commissioning manager tool is closed to ensure the NE DNs in the span of control table and the MIB are identical. If they are not identical, the span of control table is recreated with the information found in the MIB. Consequently, after running spock, the information in the MIB and the span of control table will be identical regardless of whether any discrepancies were found. Manual invocation of spock can be useful when a loss of association occurs between an NE and an OPC yet lower-layer communication with the NE still exists. This loss of association can occur if the span of control table has been corrupted or the Directory Service Agent (DSA) is having trouble interpreting the contents of the span of control table. Invoking spock will replace any corrupt data in the span of control table with the correct data as found in the MIB.


Use of this tool is not typically required. If a SOC-related problem is discovered, please ensure BNR is contacted to investigate the root cause before the SOC table is recreated.
SAMPLE OUTPUT

SEE ALSO socdump
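The comparison that spock performs (the MIB is treated as correct and the span of control table is rebuilt from it) can be sketched as follows. This is only an illustration of the rule stated in the DESCRIPTION: the reconcile function and the integer NE ids are hypothetical, and the real tool works on DNs read from the SOC files and the MIB.

```python
def reconcile(soc_table, mib):
    """Compare two collections of NE identifiers, treating the MIB as authoritative.

    Returns (new_table, stale, missing):
      new_table - the rebuilt span of control contents (a copy of the MIB set)
      stale     - entries found only in the old table (dropped)
      missing   - entries found only in the MIB (added)
    """
    soc, ref = set(soc_table), set(mib)
    stale = sorted(soc - ref)
    missing = sorted(ref - soc)
    return sorted(ref), stale, missing


# Assumed sample data: NE 3 was left behind in the table, NE 4 was never added.
new_table, stale, missing = reconcile(soc_table=[1, 2, 3], mib=[1, 2, 4])
print("rebuilt table:", new_table)
print("dropped:", stale, "added:", missing)
```

Whether or not discrepancies are found, the rebuilt table always equals the MIB contents, which matches the behaviour described under USES.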

1.41 stpprov - STS provisioning tool


NAME stpprov - STS provisioning tool
LOCATION /iws/test-tools/tool-res
SYNOPSIS stpprov is a batch provisioning tool that allows multiple STS-1 connections to be added to or deleted from the OPC database. These connections are added or deleted according to a set of commands that are specified in a data file. These commands must follow a syntax that is specific to stpprov. It is possible to add multiple connections on consecutive channels and tributaries using one command (bulk provisioning approach).


stpprov also allows a user to dump all the connections in the OPC database to a data file. This data file lists all the connections as commands that stpprov can read in order to populate the OPC database with these connections.
Note: You must be the root user to use all of stpprov's commands. Only the bulkadd command is available to all users.
USES
Dumping Connection Data to a File
Connections can be dumped to a data file by typing the following at the command line:
stpprov -s output_file
where output_file is the name of the data file that will contain all the connections in the OPC database. This file lists the connections as commands that individually add connections into the OPC database using stpprov. The file is created in the directory from which stpprov is called. If a file named output_file already exists, stpprov deletes it and replaces it with the data file that it creates.
Adding or Deleting Connections
Connections can be added or deleted in the OPC database by typing the following at the command line:
stpprov < input_file
where input_file is the name of the data file that contains the commands to add or delete connections. These commands must follow a certain syntax for successful operation. This syntax is discussed in the section "stpprov Command Syntax". The commands in input_file only affect the OPC database. Using the Connection Manager, an audit correct must be done to pass the connections to the NEs.
Note: Although it is recommended to use an input_file, stpprov can be used without any parameters. The connection data can then be entered from the command line in the same format used in the input_file. See "Adding Individual Connections" in "stpprov Command Syntax". Type quit or CTRL-d to exit.
Displaying the help text
The help text can be displayed by typing:
stpprov -full (root user only)
or
stpprov -h
stpprov Command Syntax
Deleting Connections
Currently, the only delete functionality stpprov supports is deleting all the connections in the OPC database. The command to put in the data file for stpprov input is:
remove all_traffic;
Adding Individual Connections
Use this command to add connections individually in the OPC database for any type of configuration. Its syntax is as follows:
add_cons config connect rate [squelch] [direct] ne_a trib_a [drop_pt_1] [trb_drop_pt_1] ... [drop_pt_14] [trb_drop_pt_14] ne_z trib_z channel [vtgroup vtnumber] [route] [ne_a_s] [trib_a_s] [ne_z_s] [trib_z_s] ;


config - configuration name enclosed in quotation marks (1-20 characters)
connect - connection identifier enclosed in quotation marks (1-40 characters); if there is no connection name, just place two quotation marks with nothing between them
rate - connection rate (STS-1, STS-3C, STM-1, VT1.5)
o squelch - squelching type for an STS-1 connection (STS, VTM); only applicable for STS-1 connections -- STS is default
o direct - [optional] directionality (UNI, BI) -- BI is default
ne_a - identification code for NE A (1-32767)
trib_a - tributary for NE A enclosed in quotation marks
o drop_pt - identification code for drop point (1-32767)
o trb_drop_pt - tributary for drop point enclosed in quotation marks
ne_z - identification code for NE Z (1-32767)
trib_z - tributary for NE Z enclosed in quotation marks
channel - STS-1 channel number
o vtgroup - [optional] the VT group for a VT1.5 channel connection (1-7)
o vtnumber - [optional] the VT number for a VT1.5 channel connection (1-4)
o route - [optional] route variable for rings, long or short (l or s); short is the default. Will also accept G1 or G2 as a facility ID for NE A
o ne_a_s - [optional] identification code of secondary gateway node for NE A (1-32767)
o trib_a_s - [optional] tributary of secondary gateway node for NE A enclosed in quotation marks
o ne_z_s - [optional] identification code of secondary gateway node for NE Z (1-32767)
o trib_z_s - [optional] tributary of secondary gateway node for NE Z enclosed in quotation marks
; - termination for the command line
Any attribute that is not marked by an "o" or in square brackets must be included in the command line. Attributes marked with an "o" or in square brackets are optional in some types of connections, mandatory for other types of connections, or may cause an error if included in other types of connections.
Adding connections using bulk provisioning
The bulkadd command is used to provision multiple connections on consecutive channels and tributaries.
This command can only be used for LINEAR CONFIGURATIONS. Specify the connection parameters, the channel and tributaries of the first connection, and the number of desired connections. The connections will be added on consecutive tributaries and channels starting with the ones specified in the command. Here is the syntax of bulkadd:
bulkadd config connect rate [direct] ne_a start_trib_a ne_z start_trib_z start_channel nb_of_conn ;
config - configuration name enclosed in quotation marks (1-20 characters)
connect - connection identifier enclosed in quotation marks (1-40 characters); if there is no connection name, just place two quotation marks with nothing between them
rate - connection rate (STS-1, STS-3C, STS-12C, STM-1, VT1.5)
o direct - [optional] directionality (UNI, BI) -- BI is default
ne_a - identification code for NE A (1-32767)
start_trib_a - tributary for the start connection on NE A enclosed in quotation marks
ne_z - identification code for NE Z (1-32767)
start_trib_z - tributary for the start connection on NE Z enclosed in quotation marks
start_channel - STS-1 channel number
nb_of_conn - total number of desired consecutive connections
; - termination for the command line
Any attribute that is not marked by an "o" or in square brackets must be included in the command line.
Nodal Provisioning
A nodal cross connection refers to a local cross connect on one NE (trib to transport or vice versa). Use this command to add nodal connections individually in the OPC database for any type of configuration. Its syntax is as follows:
add_xc ne rate [squelch] [direct] ne_a ne_z source sink [ne_a_s][ne_z_s];
ne - identification code for NE (1-32767)
rate - connection rate (STS-1, STS-3C, STM-1, VT1.5)
o squelch - squelching type for an STS-1 connection (STS, VTM); only applicable for STS-1 connections -- STS is default
o direct - [optional] directionality (UNI, BI) -- BI is default
ne_a - identification code for NE A (1-32767)
ne_z - identification code for NE Z (1-32767)
source - tributary name (DS1 Gx y, where x=<1-16> and y=<1-3>)
sink - transport name (Gx y)



o ne_a_s - [optional] identification code of secondary gateway node for NE A (1-32767)
o ne_z_s - [optional] identification code of secondary gateway node for NE Z (1-32767)
; - termination for the command line

Any attribute that is not marked by an "o" or in square brackets must be included in the command line. Attributes marked with an "o" or in square brackets are optional in some types of connections, mandatory for other types of connections, or may cause an error if included in other types of connections.
Notes for Adding/Deleting Connections
An audit must be done after connections are added to or deleted from the OPC database. stpprov only manipulates the OPC database; it does not propagate the connections onto the network.
Connections that are owned by (i.e. set up by) OSI and TL1 cannot be deleted from the OPC database using stpprov. Only connections that are set up by stpprov, the Connection Manager, Network Manager, and INA can be deleted by stpprov.
Each command line must end with a semi-colon or else an error message will appear.
Attributes in a stpprov command must be separated by at least one blank space. Any number of blank spaces, tabs, and carriage returns may separate the attributes.
A command line may span more than one line in a data file.
Configuration names, connection names, and tributary strings must be in quotation marks.
A secondary gateway for only NE Z may be specified by entering zeros for the two NE A gateway parameters and filling in the NE Z parameters appropriately.
In bulkadd, if the number of desired connections exceeds the existing number of channels and tribs for a configuration, bulk provisioning stops once the limit is exceeded.
The progress/error messages may be output to a file by running the tool with the following format:
stpprov < input_file > output_file
where input_file is the data file that contains all the commands and output_file contains all the progress/error messages.
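The quoting, range and termination rules above lend themselves to a small generator for stpprov input files. The following is a minimal sketch only: it emits just the mandatory attributes of add_cons and bulkadd in the order given in the syntax lines, the helper names are hypothetical, and the tributary strings ("TRIB A", "TRIB Z") and configuration name are placeholders rather than real facility names. Whether a given connection type also needs optional attributes is not checked here.

```python
ADD_CONS_RATES = {"STS-1", "STS-3C", "STM-1", "VT1.5"}
BULKADD_RATES = ADD_CONS_RATES | {"STS-12C"}


def quoted(value, min_len=0, max_len=None):
    """Return value wrapped in quotation marks after checking its length."""
    too_long = max_len is not None and len(value) > max_len
    if '"' in value or len(value) < min_len or too_long:
        raise ValueError(f"bad string attribute: {value!r}")
    return f'"{value}"'


def ne_id(value):
    """Return an NE identification code as text; valid codes are 1-32767."""
    if not 1 <= value <= 32767:
        raise ValueError(f"NE id is outside of range 1-32767: {value}")
    return str(value)


def add_cons(config, connect, rate, ne_a, trib_a, ne_z, trib_z, channel):
    """Build one add_cons line using only the mandatory attributes."""
    if rate not in ADD_CONS_RATES:
        raise ValueError(f"unsupported rate for add_cons: {rate}")
    fields = ["add_cons", quoted(config, 1, 20), quoted(connect, 0, 40), rate,
              ne_id(ne_a), quoted(trib_a, 1), ne_id(ne_z), quoted(trib_z, 1),
              str(channel)]
    return " ".join(fields) + " ;"


def bulkadd(config, connect, rate, ne_a, start_trib_a, ne_z, start_trib_z,
            start_channel, nb_of_conn):
    """Build one bulkadd line (linear configurations only, per the text)."""
    if rate not in BULKADD_RATES:
        raise ValueError(f"unsupported rate for bulkadd: {rate}")
    if nb_of_conn < 1:
        raise ValueError("Number of desired connections not specified")
    fields = ["bulkadd", quoted(config, 1, 20), quoted(connect, 0, 40), rate,
              ne_id(ne_a), quoted(start_trib_a, 1), ne_id(ne_z),
              quoted(start_trib_z, 1), str(start_channel), str(nb_of_conn)]
    return " ".join(fields) + " ;"


# Placeholder values; an empty connection name becomes two quotation marks.
lines = [
    add_cons("Linear A", "", "STS-1", 1, "TRIB A", 2, "TRIB Z", 1),
    bulkadd("Linear A", "bulk", "STS-1", 1, "TRIB A", 2, "TRIB Z", 2, 10),
]
print("\n".join(lines))
```

The printed lines could be saved to a data file and reviewed before being passed to stpprov; an audit is still required afterwards, as noted above.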

Error Messages for Adding/Deleting Connections
Command Line Syntax Errors
Command line syntax errors are errors that stpprov finds in the command lines of an input data file. The error may be incorrect syntax that stpprov cannot recognize, or it may lie in the logic of the connection. Some of these command line syntax errors include:
unexpected command: <the command>


configuration name exceeds 20 characters
connection name not in strings
UNI is not available for non-STS-1 connections
NE id is outside of range 1-32767
can't have a UNI VT-managed STS-1 connection
invalid channel number
STS VT-managed connections don't have tributaries
unexpected character(s): <list of character(s)>
Number of desired connections not specified

These are just a few of the many command line syntax errors that may appear as output from stpprov. These errors are explicitly mentioned in stpprov's output.
Bandwidth Errors
Bandwidth errors occur when all the command line syntax is correct but the bandwidth that is requested is in error or already allocated. Unlike command line syntax errors, bandwidth errors do not explicitly mention the exact error. Output for a bandwidth error is as follows:
bandwidth error for connection <connection name>
A bandwidth error can have one of many different causes. Some of these are:

Attempting to provision traffic on bandwidth that is already occupied.
Allocating inappropriate channels for a certain connection rate (e.g. setting up an STS-3c connection starting from channel 5).
Invalid tributary types for match node connections.
Invalid tributary types for certain connection rates.
Bad mix of different tributaries for a single connection.
A VT-managed STS-1 pipe was not created to put the VT connections in.
Tributary slot already in use.

These are just some of the many different bandwidth error conditions that can occur.
Other Errors
These errors are reported when a certain subroutine returns with a fail condition. When these errors occur, either the OPC is not active or a process such as opcbandwidthmgr is not active. In either case, stpprov cannot function correctly. Some of these error messages include:

obm_load_config_cache() failed
obm_reg_conndata() failed
uic_attach_cache() failed
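When the progress/error output of a large stpprov run has been captured to a file, the three error families above can be separated with a short script. This is a sketch under two assumptions that the guide does not confirm: that each message appears on its own line, and that the wording matches the examples quoted in this section exactly. The classify function and its category labels are hypothetical.

```python
import re

# Patterns taken from the message examples quoted in this section.
BANDWIDTH = re.compile(r"^bandwidth error for connection\s+(.+)$")
FAILED_CALL = re.compile(r"^(\w+)\(\) failed$")


def classify(line):
    """Return (category, detail) for one line of captured stpprov output."""
    text = line.strip()
    match = BANDWIDTH.match(text)
    if match:
        return ("bandwidth", match.group(1))      # detail = connection name
    match = FAILED_CALL.match(text)
    if match:
        return ("subroutine", match.group(1))     # detail = failing routine
    return ("unclassified", text)                 # syntax errors, progress text


sample = [
    "bandwidth error for connection CONN 7",
    "uic_attach_cache() failed",
    "NE id is outside of range 1-32767",
]
for entry in sample:
    print(classify(entry))
```

A "subroutine" result suggests checking that the OPC and opcbandwidthmgr are active before looking at the input file itself.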


Notes for Dumping Connections
Each connection is represented in the add command syntax so that the output dump file may be used again by stpprov to provision all the connections.
If the tributary information cannot be received from an NE, that tributary appears in the data file in lower case letters. This is called a "best effort" tributary. The tributary information may not be the same as what was originally provisioned, but if the command in the file is used again to provision the connection, the original tributary will be provisioned in the OPC database.
All optional attributes are displayed in the data file. Some of the original attributes may be displayed in a different manner than originally provisioned. For example, if a connection was provisioned as a short route, the dump file indicates the facility (G1, G2) out of NE A for the route instead of explicitly mentioning the short route.
Both the opcbandwidthmgr process and the OPC must be active for stpprov to dump the OPC connection database to a file. If they are not, errors such as uic_attach_cache() failed and obm_load_config_cache() failed will occur.

See Also http://47.45.4.122/users/pkalab/CS/Stpprov/stpprov.html

1.42 syslog - system log file


NAME /var/log/syslog - system log file
LOCATION /var/log
SYNOPSIS To view the current day's logs, type:
opc> more /var/log/syslog
To view previous days' logs, type:
opc> zcat <compressed_syslog_filename> | more
where <compressed_syslog_filename> is a syslog file with the suffix .Z (e.g. /var/log/syslog.day-1.Z).
DESCRIPTION /var/log/syslog contains any log messages produced by the system that are not displayed in the event browser or by lasdump.


USES It is an essential first resource for diagnosing problems on the OPC. Display it to determine the possible areas which may be causing the problems. The more command allows you to page through /var/log/syslog or any other text file. After issuing the more command, help can be obtained by typing h. To view the next page, press the <space> bar. To advance one line, press <return>. To quit more, type q.
SAMPLE OUTPUT The following is a sample of the messages contained in /var/log/syslog:

The logs can be scanned for any warning messages, state changes, notifications or messages delivered by various actions.

1.43 thf_extract - Tape Handling Format Extraction tool


NAME thf_extract - Tape Handling Format Extraction tool
LOCATION /iws/thf
SYNOPSIS To invoke, type:
opc> thf_extract [ -l ] | [ -lf <filename> ] | [ -ld <description> ] | [ -f <filename> ] | [ -t <tape drive> ]
where
-k                        print tape type
-d                        print tape creation time in normal format
-e                        print tape creation time in seconds since 1970 (epoch)
-l                        lists all files on a load/upgrade tape
-lf <filename>            lists the description of the given filename


-ld <description>         lists the files matching the given description (MISC, UPR, INSTALL, OPCLOAD, NE_LOAD, NE_RELEASE, NOCLOAD)
-f <filename>             retrieves the contents of the given filename and dumps it to stdout
-fileset <diskfiles.info> load a set of files given in <diskfiles.info> file
-t <tape drive>           reads another drive (default=/dev/rdt/tape2)
-v                        verbose mode
-minfree <n>              abort if disk free space drops below <n> bytes
-dir <dirname>            store fileset in target directory
-noprecheck               don't use <diskfiles.info> to check disk space
-diskspacecheck           perform diskspace check only, don't load files
-exclude <filename>       skip file if it occurs in the fileset
-c <n>                    extract only the given <n> number of bytes of the file
-p <filename>             direct progress completion info to <filename>

DESCRIPTION thf_extract is used to directly access the data on a tape created by SRT or tapecrea. thf_extract is found under the /iws/thf directory.
USES Example for extracting the install file from an upgrade tape and storing it under the root directory as a file called installation.tar:
opc> thf_extract -l
release.info
OC481400.CTG
STS1BL01AE.LD
RDB16AM00005.LD
OC3DBL01AD.LD
OWB17AA27.LD
STS1AL01AP.LD
OWA17AA27.LD
SCPLDR03AJ.LD
RELB03AJ.LD
OC3DAL01AD.LD
RDA16AM00005.LD


bootstrap.tar
upgrade_opc35as_35as_hp_80.tar.BHU
upgrade_opc35as_15bp_hp_80.tar.BHU
upgrade_opc35as_15bi_hp_80.tar.BHU
upgrade_opc35as_17bo_hp_80.tar.BHU
upgrade_opc35as_24bx_hp_80.tar.BHU
upgrade_opc35as_24cg_hp_80.tar.BHU
install.tar
RPT00KPIPL.LD
SSL00KPIPL.LD
opc35as_hp_80.tar.BHU
opc> thf_extract -f install.tar > installation.tar

1.44 tidmap - show/change NE's target identifier for TL1


NAME tidmap - show/change NE's target identifier for TL1
LOCATION /iws/test-tools/tool-res
SYNOPSIS To invoke, type:
opc> tidmap [-a <NE ID> <New NE TID>] | [<TL1 TID>]
where
-a           changes the existing NE TID
NE ID        id number for a specific NE
TL1 TID      existing TID for an NE
New NE TID   new TID to be assigned to the NE

DESCRIPTION tidmap with the -a option allows the user to replace the existing TID for the specified NE id with a new TID. tidmap with a TL1 TID specified retrieves the TID, NE name and NE id for the specified TID. tidmap with no parameters retrieves the TID, NE name and NE id for all NEs in a system.
NOTE: the TID shown by tidmap is only applicable for TL1 NMA connections. When using TL1 OPS connections, the TID refers to the NE name.
USES tidmap is useful for retrieving the TL1 NMA TID for all the NEs.
SAMPLE OUTPUT
opc> tidmap
TID               NE NAME      NE ID
================  ===========  =======
NE01              GEORGE       1
NE02              FRANK        2

Note: For a TL1 NMA connection the TIDs are NE01 and NE02.
For a TL1 OPS connection the TIDs are GEORGE and FRANK.
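The NMA/OPS distinction in the note can be made explicit by turning captured tidmap output into records. The sketch below assumes the three-column layout of the sample output and NE names that contain no spaces; parse_tidmap and the record keys are hypothetical names chosen for illustration.

```python
SAMPLE = """\
TID               NE NAME      NE ID
================  ===========  =======
NE01              GEORGE       1
NE02              FRANK        2
"""


def parse_tidmap(text):
    """Parse captured tidmap output into one record per NE.

    Header and separator lines are skipped because they do not consist of
    exactly three fields ending in a number.
    """
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        tid, ne_name, ne_id = parts
        records.append({
            "ne_id": int(ne_id),
            "nma_tid": tid,       # TID used on a TL1 NMA connection
            "ops_tid": ne_name,   # on a TL1 OPS connection the TID is the NE name
        })
    return records


for record in parse_tidmap(SAMPLE):
    print(record)
```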

1.45 tl1mlct - monitor a TL1 OSS connection


NAME tl1mlct - monitor a TL1 OSS connection
LOCATION /iws/test-tools/tool-res
SYNOPSIS To invoke, type:
opc> tl1mlct
DESCRIPTION tl1mlct monitors activity on a single TL1 OSS connection and displays all the messages being sent between the OPC and the OS. tl1mlct also allows the user to log all the transactions to a file. If multiple TL1 OSS connections exist, the user can view only one connection at a time. Multiple instances of tl1mlct can be opened to view other TL1 NMA connections.
USES tl1mlct is meant only for debugging TL1 OSS connections.


1.46 tl1shell - simulate a TL1 connection within a UNIX shell


NAME tl1shell - simulate a TL1 connection within a UNIX shell
LOCATION /iws/test-tools/tool-res
SYNOPSIS To invoke, type:
opc> tl1shell <NMA | OPS | -f <filename>>
where
NMA             simulate a TL1 NMA connection
OPS             simulate a TL1 OPS connection
-f <filename>   simulate a TL1 connection with an OS specified by filename
DESCRIPTION tl1shell simulates a TL1 connection without the need to configure and set up an X.25 network and OS. tl1shell works by replacing the entire X.25 session manager and providing all the sockets and signals which would normally be generated by TL1X25SM.
USES tl1shell is not a replacement for a TL1 network, but it does permit ready access to TL1 functionality through serial devices.
SEE ALSO tl1mlct

1.47 tstatc - display OPC communications information


NAME tstatc - display OPC communications information
LOCATION /usr/etc
SYNOPSIS To invoke, type:
opc> tstatc


DESCRIPTION tstatc is a program that continuously displays statistics and configuration variables for OSI protocols implemented inside the HP-UX kernel. There are a number of displays that may be chosen from the main menu. These displays provide statistics on the socket layer, transport layer, network layer, and CNET layer, as well as X.25.
USES tstatc can be used for diagnosing lower layer data communications problems. A main menu screen is displayed when tstatc is invoked. From this screen, one can navigate through all the information screens.
The socket detail display (option D) indicates which processes that use sockets are running and how they are performing. Changes in the messages in and out columns indicate that the socket is active and thus that the process is active. From this screen, one can examine any of the sockets in more detail by selecting it using n for next socket. By examining the bytes in/out field it can be determined if any data is being taken from the socket. If no data is being taken, it is an indication of a problem with the application and not with lower level communication.
The subnet details indicate the status of the lan and cnet subnetworks. cnet0 should be up and running error-free. lan0 (ETHERNET) should be up and running, but some errors are to be expected. If the T option is used to select details on a cable, more detail is provided. NS <unavailable> means that the cable is not in service, which indicates a faulty or unconnected cable.
The N option under network commands provides the user with additional statistics on the status of the cables. It allows one to enable and disable cable connections. This screen also displays the address and hostname of the OPC as it appears in the /etc/osihosts file.
The R option under network commands displays the routing table. This is a list of all nodes with which the OPC has a physical connection and with which it can therefore communicate. The OPC has the ability to osiping any node contained in the routing table. Note that the OPC can communicate with nodes not in the routing table if at least one IS system is available.
The X option under the network commands displays an X.25 (TL1) screen. The L option under the network commands displays the LAPB screen. This screen is the X3PAD which allows a VT100 terminal to remotely access the system over X.25.
SAMPLE OUTPUT All options may be accessed from the main menu display. Note that the menu has had minor changes to it with each release. The following display is an example only.


A menu item is chosen simply by typing the letter displayed on the left-hand part of the screen, in either upper or lower case. No carriage return is required. This is true throughout tstatc, with the exception of two cases in which a variable-length string is required (the I command and the F command). Once a menu item is chosen, the new display is shown on the screen and is updated periodically according to the update time interval, which is shown on the main menu page. The I command is available to change the update interval from the default value of 2 seconds.
From any display, the main menu can be re-displayed by typing q (quit). Alternatively, you can type the letter corresponding to any main menu command, and that display will immediately be shown. Note that if you type I to change the value of the update interval, the display reverts to the display from which the I was typed, and the new interval value only affects that display. Once the display is changed, the interval returns to its previous value.
The F command writes statistics to a file and is available on all screens. The Z command zeros out statistics and only applies to Transport layer statistics (displayed by the S command).
The second line of the main menu displays the release-id of the OSI protocols in the kernel. This is either a date or an identifier that indicates the loadbuild that created the OSI library. The V command shows tstatc version information, including the version number of tstatc, the release id (loadbuild id), and the release-id of the OSI protocols.
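The reasoning given under USES for the socket detail display (changing message counts mean the socket is active; unchanged byte counts mean the application is not taking data) can be written down as a simple rule. The sketch below is an illustration only: it assumes two sets of counter values noted by hand from successive screen updates, and the SocketSample fields and assess function are hypothetical names that do not correspond to actual tstatc column labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocketSample:
    """Counters noted by hand from one refresh of the socket detail display."""
    msgs_in: int
    msgs_out: int
    bytes_taken: int  # assumed reading of the bytes in/out field


def assess(first, second):
    """Compare two samples of the same socket taken some seconds apart."""
    active = (first.msgs_in, first.msgs_out) != (second.msgs_in, second.msgs_out)
    draining = first.bytes_taken != second.bytes_taken
    if not active:
        return "no message activity on this socket"
    if not draining:
        return "socket active but no data taken: suspect the application"
    return "socket active and data being taken"


earlier = SocketSample(msgs_in=120, msgs_out=80, bytes_taken=4096)
later = SocketSample(msgs_in=131, msgs_out=80, bytes_taken=4096)
print(assess(earlier, later))
```

A result pointing at the application suggests looking at the process that owns the socket rather than at the lower communication layers.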


DIAGNOSTICS
tstatc: nlist() failed
    The nlist function failed to access /hp-ux correctly. This may be a problem with permissions on /hp-ux.
tstatc: open of /dev/kmem failed open: Permission denied
    This error occurs when tstatc is initiated without the proper permissions on /dev/kmem. tstatc requires read permissions on these files.
Socket called failed. Type q to return to main menu.
    This error may occur when selecting the Network layer statistics option or the CNET statistics option in the main menu. The error will occur when there is no CNET card attached.

SEE ALSO osiping

1.48 upr_manage_ne_backups - provide basic upgrade functionality


NAME upr_manage_ne_backups - provide basic upgrade functionality
LOCATION /iws/upr
SYNOPSIS To invoke, type:
opc> /iws/upr/upr_manage_ne_backups <-v <file> | f | t | a | b | d | e >
where
v <file>   validate the upgrade operation and dump the results to file
f          force all NEs to do a db backup
t          data synch primary to backup
a          check associations
b          busy all MSRs on the local OPC
d          disable operation of ows_swact
e          enable operation of ows_swact
brm        print database backup release strings

DESCRIPTION upr_manage_ne_backups provides a number of useful operations to the user during the course of an upgrade. There are more operations available but the above are the most useful.


USES upr_manage_ne_backups is used by the Network Upgrade Manager but can be used manually by the user. The above list indicates the available functions provided by upr_manage_ne_backups.

1.49 verify_tape - verify tape data is not corrupted


NAME verify_tape - verify tape data is not corrupted
LOCATION /iws/test-tools/tool-res
SYNOPSIS To use, type:
opc> verify_tape <tape device>
e.g. opc> verify_tape /dev/rdt/tape2
DESCRIPTION verify_tape can be used to ensure that the contents of a tape are valid according to the information stored in the header of the tape.
USES The tool recalculates the checksum and blocksize for the files stored on the tape and then compares them to the information stored in the tape header at the time of the tape creation. If there are any discrepancies, the tape contents are considered invalid. This command is very useful if the user is trying to extract a load from tape and the tape drive issues error messages. It is not always obvious whether the tape drive itself or the tape is causing the errors. By using verify_tape the user can verify whether the tape is the problem. At the end of tape validation, one of the following messages is displayed on the user's screen:
1. Case of valid tape:
****THE WHOLE TAPE IS VALID.****
2. Case of invalid tape:
****THE TAPE IS INVALID. ONE OR MORE FILES****
****FAILED VALIDATION****
SAMPLE OUTPUT The following is output when verify_tape is used on a tape containing an OPC load:


When the tool is finished checking the entire tape contents, one of two messages will be displayed:

    THE WHOLE TAPE IS VALID.

or

    THE TAPE IS INVALID, ONE OR MORE FILES FAILED VALIDATION.
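The header comparison described for verify_tape can be illustrated with a small sketch. This is not the tool's implementation: the actual tape header layout and checksum algorithm are not documented here, so the `HeaderEntry` record, the use of SHA-256, and the use of file size in place of blocksize are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HeaderEntry:
    """One record of a hypothetical tape header, written at tape creation time."""
    name: str
    size: int       # stands in for the recorded blocksize (assumption)
    checksum: str   # SHA-256 hex digest (assumption; real algorithm unknown)


def describe(name: str, data: bytes) -> HeaderEntry:
    """Compute the size and checksum of one file's contents."""
    return HeaderEntry(name, len(data), hashlib.sha256(data).hexdigest())


def failed_files(header: List[HeaderEntry], contents: Dict[str, bytes]) -> List[str]:
    """Return the names of files whose recomputed values differ from the header."""
    failed = []
    for entry in header:
        data = contents.get(entry.name)
        # A missing file, or any mismatch in size or checksum, fails validation.
        if data is None or describe(entry.name, data) != entry:
            failed.append(entry.name)
    return failed


# Illustrative data only: two made-up files recorded in the header at creation time.
original = {"opc.load": b"load image bytes", "readme": b"release notes"}
header = [describe(name, data) for name, data in original.items()]

# Simulate a tape on which one file was altered after the header was written.
damaged = dict(original, readme=b"release notes?")

print(failed_files(header, original))  # []
print(failed_files(header, damaged))   # ['readme']
```

An empty result corresponds to the "whole tape is valid" case; any non-empty result corresponds to the "one or more files failed validation" case.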

1.50 X/Y/Z-Modem file transfer software


NAME
    /usr/etc/sb, /usr/etc/sx, /usr/etc/sz, /usr/etc/rb, /usr/etc/rx, /usr/etc/rz

LOCATION
    /usr/etc

SYNOPSIS
    To invoke, type:

    opc> /usr/etc/sz <options>
    opc> /usr/etc/rz <options>

    where <options> is any of the options presented by the help screens shown below.

DESCRIPTION
    The zmodem software package is provided for Nortel field service to transfer files to and from the OPC. This package includes support for the X, Y and Z protocols. Typical usage may include connecting a modem to the OPC's port B and running:

    /usr/etc/sz -be /var/log/syslog


The command above transfers the /var/log/syslog file from the OPC to the receiving end. To receive files uploaded by the remote end, run:

    /usr/etc/rz -be

/usr/etc/sz -h

opc> /usr/etc/sz -h
Send file(s) with ZMODEM/YMODEM/XMODEM Protocol
    (Y) = Option applies to YMODEM only
    (Z) = Option applies to ZMODEM only
Usage: sz [-2+abdefkLlNnquvwYy] [-] file ...
       sz [-2Ceqv] -c COMMAND
       sb [-2adfkquv] [-] file ...
       sx [-2akquv] [-] file
    2    Use 2 stop bits
    +    Append to existing destination file (Z)
    a    (ASCII) change NL to CR/LF
    b    Binary file transfer override
    c    send COMMAND (Z)
    d    Change . to / in pathnames (Y/Z)
    e    Escape all control characters (Z)
    f    send Full pathname (Y/Z)
    i    send COMMAND, ack Immediately (Z)
    k    Send 1024 byte packets (Y)
    L N  Limit subpacket length to N bytes (Z)
    l N  Limit frame length to N bytes (l>=L) (Z)
    n    send file if source newer (Z)
    N    send file if source newer or longer (Z)
    o    Use 16 bit CRC instead of 32 bit CRC (Z)
    p    Protect existing destination file (Z)
    r    Resume/Recover interrupted file transfer (Z)
    q    Quiet (no progress reports)
    u    Unlink file after transmission
    v    Verbose - provide debugging information
    w N  Window is N bytes (Z)
    Y    Yes, overwrite existing file, skip if not present at rx (Z)
    y    Yes, overwrite existing file (Z)
    -    as pathname sends standard input as sPID.sz or environment ONAME

sz 2.15 12-08-91 for SYS III/V by Chuck Forsberg, Omen Technology INC
    The High Reliability Software

/usr/etc/rz -h


opc> /usr/etc/rz -h
Usage: rz [-abeuvy]       (ZMODEM)
   or  rb [-abuvy]        (YMODEM)
   or  rx [-abcv] file    (XMODEM or XMODEM-1k)
    -a  ASCII transfer (strip CR)
    -b  Binary transfer for all files
    -c  Use 16 bit CRC (XMODEM)
    -e  Escape control characters (ZMODEM)
    -v  Verbose more v's give more info
    -y  Yes, clobber existing file if any

rz 2.03 05-17-88 for SYS III/V by Chuck Forsberg, Omen Technology INC
    The High Reliability Software

USES
    This software should be used only by Nortel field service / development personnel. Its primary use is to transfer logs and other data back to Nortel personnel for off-line investigation.

LIMITATIONS
    Testing has shown that file transfers over a modem connected through an OC-48 / OC-12 NE port do not work, because the OC-48 / OC-12 port cannot handle the large data packets without dropping characters. No problems have been observed when the modem is connected via the OPC's Port B. Similar limitations are believed to exist on the OPC's debug port (ribbon cable onto the OPC itself). Modem connections via an OC-192 NE have not been tested but are believed to be capable of handling the throughput.

    We suggest that you always use the -e switch, as this escapes control characters that might otherwise cause the modem/link to disconnect or hang.
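The recommendation to always use -e can be captured in a small helper that assembles the sz command line. This is only a sketch: the function name `build_sz_command` and its `binary` parameter are hypothetical and not part of the zmodem package; the path and the -b and -e switches are those shown in this section.

```python
from typing import List, Sequence

SZ_PATH = "/usr/etc/sz"  # location given in this section


def build_sz_command(files: Sequence[str], binary: bool = True) -> List[str]:
    """Return an argument list for sz that always includes the -e switch.

    Hypothetical helper: -b (binary override) is optional, -e is always added
    so that control characters are escaped on the modem link.
    """
    if not files:
        raise ValueError("at least one file is required")
    flags = "-" + ("b" if binary else "") + "e"
    return [SZ_PATH, flags, *files]


# Reproduces the typical usage shown earlier in this section.
print(" ".join(build_sz_command(["/var/log/syslog"])))
# prints: /usr/etc/sz -be /var/log/syslog
```

On a system where sz is installed, the returned list could be passed to `subprocess.run`; that step is left out here because it depends on the local environment.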

1.51 Further list of OPC Tools, for field service / development purposes
The following list of OPC tools is included here for completeness. These tools are primarily for use by Nortel field service / development personnel. Nortel does not recommend and does not accept responsibility for the use of these tools by any personnel other than those mentioned above. The list is as follows:

    baseline_tool
    bhu_expand
    brm_tool


    cleanup_files
    diskcheck
    hcamtest
    install_ci
    lomConvert
    lomui
    opcdrmp
    osinetping
    restore_commands
    restore_opc_data
    schswi
    start_backout
    store_commands
    stpprov
    swigo
    transfer_commands
    transfer_file
    transfer_ne_loads
