NSM r11.1 Diag
Diagnostics Guide
r11.1/r11.2
This documentation (the Documentation) and related computer software program (the Software) (hereinafter collectively referred to as the Product) is for the end users informational purposes only and is subject to change or withdrawal by CA at any time. This Product may not be copied, transferred, reproduced, disclosed, modified or duplicated, in whole or in part, without the prior written consent of CA. This Product is confidential and proprietary information of CA and protected by the copyright laws of the United States and international treaties. Notwithstanding the foregoing, licensed users may print a reasonable number of copies of the Documentation for their own internal use, and may make one copy of the Software as reasonably required for back-up and disaster recovery purposes, provided that all CA copyright notices and legends are affixed to each reproduced copy. Only authorized employees, consultants, or agents of the user who are bound by the provisions of the license for the Software are permitted to have access to such copies. The right to print copies of the Documentation and to make a copy of the Software is limited to the period during which the license for the Product remains in full force and effect. Should the license terminate for any reason, it shall be the users responsibility to certify in writing to CA that all copies and partial copies of the Product have been returned to CA or destroyed. EXCEPT AS OTHERWISE STATED IN THE APPLICABLE LICENSE AGREEMENT, TO THE EXTENT PERMITTED BY APPLICABLE LAW, CA PROVIDES THIS PRODUCT AS IS WITHOUT WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT. 
IN NO EVENT WILL CA BE LIABLE TO THE END USER OR ANY THIRD PARTY FOR ANY LOSS OR DAMAGE, DIRECT OR INDIRECT, FROM THE USE OF THIS PRODUCT, INCLUDING WITHOUT LIMITATION, LOST PROFITS, BUSINESS INTERRUPTION, GOODWILL, OR LOST DATA, EVEN IF CA IS EXPRESSLY ADVISED OF SUCH LOSS OR DAMAGE. The use of this Product and any product referenced in the Documentation is governed by the end users applicable license agreement. The manufacturer of this Product is CA. This Product is provided with Restricted Rights. Use, duplication or disclosure by the United States Government is subject to the restrictions set forth in FAR Sections 12.212, 52.227-14, and 52.227-19(c)(1) - (2) and DFARS Section 252.227-7013(c)(1)(ii), as applicable, or their successors. All trademarks, trade names, service marks, and logos referenced herein belong to their respective companies. Copyright
Contents
Chapter 1: Introduction
  CA Product References
Unable to Run Commands
Message Records and Actions Not Working
Held Messages Do Not Appear in Next-Day Console
Unable to Execute opreload
Console Does Not Refresh
Console Messages Not Being Forwarded
Message Record Action Banner is not Functional
MCC Issues
  The RMI Connection to xxx has been lost. Please restart the Management Command Center to access this namespace
  The colors in the MCC are not propagating correctly up the LHP tree -or- Changes are not being reflected in the MCC
  When I select the Console Log plugin on the Right Hand Pane in the MCC, I get a message
  Newly installed component icons don't show correctly
  Cannot edit Message Records in MCC (Command execution denied)
  In the MCC left hand pane, there is no Alert plugin available or Console Log Plugin
  No alerts are created
  In the MCC Topology there is nothing under the WorldView object
System Performance Issues
  A performance object is shown as blue unknown state in nodeview
  I don't see the MIB values I associated to a specific class available in Trend
Unicenter Configuration Manager
  When I deliver a profile from UCM, the agent doesn't get the profile.
General/Miscellaneous Issues
  Cannot connect to UBI/IIS on Windows 2003 (HTTP 404)
  IP Address returned by hostname doesn't match IP returned by DNS
Chapter 1: Introduction
This Diagnostics Guide provides information to help you troubleshoot problems that may occur in your CA Network and Systems Management (NSM) infrastructure. Several of the chapters contain "symptoms and solutions": descriptions of errors that have occurred at customer sites and tips for diagnosing and fixing those errors. You can use this guide to try to solve problems yourself before calling CA Technical Support.
If you have visited the CA Support web site (http://support.ca.com), you may have seen some of these tips in the form of FAQs. In fact, this guide combines useful information from various sources at CA: support technicians, field service representatives, product specialists, and more.
The information in this guide focuses on CA NSM 11.x and is arranged as follows:
  Basic troubleshooting techniques
  Symptoms and solutions for:
    Installation and General Issues
    Enterprise Management
    WorldView
    Agent Technology and Performance
    Discovery Issues
  Information about CA Technical Support
Unless otherwise noted, the procedures and syntax provided in each section pertain to the 11.0/11.1 releases. Operating system variations are provided where applicable.
Important! The Additional Guidelines for Troubleshooting DIA chapter that was previously available in this guide has been removed and is now part of the new DIA Supplemental Implementation Topics Guide for CA NSM r11.1 and r11.2. This document is available for download from the CA NSM Home Page on http://support.ca.com.
CA Product References
This document contains references to the previous Unicenter NSM Diagnostics Guide (for Unicenter TNG 2.4 & NSM 3.0). If you cannot remedy your issue using either document, please contact CA support.
  The operating system version and most recently applied patches for each of the systems involved (for example, server and agent installations). Also note security details for the operating system setup (for example, domain details, mapped drives, user IDs being used). Useful commands include:
    winver.exe / ver.exe (for most Windows environments)
    ca_syscheck command from the CA NSM DVD image (for UNIX environments)
    uname command (for UNIX environments)
  Version and patch level of any additional software packages that are interacting with your installation (for example, Microsoft SQL Server).
  Network protocols, firewall port limitations, and any other relevant communications details (for example, LAN or WAN and network speed).
After you have identified your environment, it is time to identify the problem.
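As a rough illustration of recording the operating system details listed above, the following Python sketch gathers what the standard platform and socket modules report. It is not part of CA NSM and does not replace winver, ca_syscheck, or uname; in particular it does not list applied patches. The function name collect_environment_summary is hypothetical.

```python
import platform
import socket


def collect_environment_summary():
    """Return basic operating system details as a dictionary of strings."""
    return {
        "host_name": socket.gethostname(),
        "system": platform.system(),    # for example "Windows" or "Linux"
        "release": platform.release(),  # operating system release
        "version": platform.version(),  # build or kernel version string
        "machine": platform.machine(),  # hardware architecture
    }


if __name__ == "__main__":
    # Print one "name: value" line per detail so it can be pasted into a support case.
    for key, value in collect_environment_summary().items():
        print(f"{key}: {value}")
```

Running the script on each server and agent machine involved gives a consistent starting record; patch levels and security details still have to be noted separately.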
What Happened
Typically, a problem is identified when something unusual or unexpected happens, but it can also be suspected when something that normally happens does not. Therefore you need to identify the event that occurred (or did not occur). For example, you will need to identify:
  What should have happened if everything had been working properly (for example, discovery of a specific subnet)?
  What actually happened that should not have happened, or did not happen that was supposed to (for example, did a specific MRA not execute)?
  What is the specific error (for example, Agent View values are invalid)?
  What error messages or return codes were issued (and from where)?
  Was this an isolated incident (for example, were other subnets discovered or other MRAs executed on this node)?
Answering these questions will identify the scope of your search for a solution. It is critical to examine not only what did happen but also what did not happen.
For example, suppose an agent is running but the Agent View display includes only question marks in place of a valid status. In this case you would verify the communications between the agent and the machine launching Agent View, for example by launching mibbrowse from the same Agent View machine using both the machine name and the IP address for the agent.
Where It Happened
The next step is to further isolate the location of the error to determine how widespread it is. Find out whether the error message was generated on a single machine or on more than one. Useful questions to ask include the following:
  Which machines are affected (for example, all users in a specific subnet)?
  Note: Limit the number of functions being performed by the suspect machine to further isolate the problem. For example, if you suspect a problem with a specific policy on the DSM, exclude all other policies and update DSM to limit monitoring to a single instance of the failing agent. This will also reduce the amount of data that will need to be sifted through in the log files.
  If more than one machine is affected, what do they have in common (for example, does the installation only fail on computers using a dial-up connection)?
  Which specific component had the error (for example, the held messages pane on the Event Console)?
  If more than one component is affected, how do these components relate to one another?
By isolating the specific machines on which the error is observed, you can find other similarities that may identify the root of the problem. For example, are all users unable to log in or just a select few? If just a select few, are they all defined in the same domain? If so, this indicates a problem with that particular domain. If Job Management cannot run the job, can it be run from the command line? If not, the problem may be with the job itself, not Job Management.
When It Happened
Another crucial step is to identify the time the problem occurred. This can help you limit potential causes to only those events that happened during that time. Useful questions to ask include the following:
  What function or functions were being performed at the time?
  When did the error first occur (for example, the Monday after a long weekend, shortly after the router was replaced, after a change to daylight saving time, and so on)?
  Has the problem been repeated since that first observation? If so, is there a pattern to that repetition (for example, every Friday after the weekly backup is performed)?
  What changes were made to the product before the problem occurred (for example, was an upgrade recently applied)?
  How often does the problem recur (for example, if it occurs during the execution of a particular process, does it ALWAYS occur when that process executes)?
  What other events occurred at that time (for example, does the error occur only during times of heavy network traffic and disappear when the load is lighter)?
  Can the problem be repeated at will by running certain commands or by certain actions on the system?
  Does the problem happen randomly without relation to anything else on the system?
By identifying a pattern in the times during which the problem occurs (for example, the day after a holiday, the end of the month, shortly after the agent configuration set was modified, after a new patch was applied), you may be able to pinpoint a potential trigger. For example, is the install package set to be delivered via a particular router that happens to be undergoing a software update of its own and is, therefore, unavailable?
Important! If support requests and analyzes multiple log files, it is critical that all log files cover the same time span, unless otherwise specified.
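One way to check that a set of log files covers the same time span is to compare the first and last timestamp of each file. The Python sketch below assumes you have already extracted those two timestamps per log; how to parse them depends on each log's format, which is not covered here. The function name and the log names in the usage note are hypothetical.

```python
from datetime import datetime


def common_time_span(log_spans):
    """Return the (start, end) window covered by every log, or None.

    log_spans maps a log name to a (first_timestamp, last_timestamp) pair.
    """
    # The shared window starts at the latest first entry ...
    latest_start = max(start for start, _ in log_spans.values())
    # ... and ends at the earliest last entry.
    earliest_end = min(end for _, end in log_spans.values())
    if latest_start > earliest_end:
        return None  # at least two logs do not overlap at all
    return latest_start, earliest_end
```

For example, passing {"manager.log": (datetime(2007, 1, 1, 8), datetime(2007, 1, 1, 12)), "agent.log": (datetime(2007, 1, 1, 10), datetime(2007, 1, 1, 14))} returns the window from 10:00 to 12:00; if the problem occurred outside that window, the logs need to be collected again.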
Once again, the goal is to detect a behavioral pattern for the problem. Does it recur periodically? Has it spread to other components or machines? Did it affect only computers that recently had an upgrade or system change?
Changes to any part of your enterprise can affect other areas in ways that you may not expect. Therefore, you need to find out what changed and when. In addition to hardware, software, and operating system changes, you should identify changes to the following:
  Component profiles (such as users, user groups, computers, and computer groups)
  Configuration files
  Security and access rights
  Other software that has recently been installed (for example, virus scanning software or security software)
  OS configuration (such as registry changes or anything reported in a winmsd report)
  Variables (such as the default mode for Security Management)
User Error
Sometimes the problem lies not with the software or its environment but rather with the user, who either does not know or does not understand how the product or component works. For example, under Security Management, if a user has more than one access rule defined using the most specific format, the most restrictive rule applies. Further, confusion about how jobs and jobsets are processed in the new-day autoscan can leave you wondering why a job has not been processed. Use the autoscan simulation utility to test how your current job and jobset definitions affect each other and how they will be processed.
User error can also include something as simple as a missing or mistyped command parameter. Therefore, you should ensure that a simple error in tasks, or a misunderstanding of how a task was to be performed, did not cause the error. Check error logs and review correct procedures with the user who last worked with the affected function or component.
It is always wise to take care when modifying defaults. Some modifications are only temporary and are therefore lost when the system is recycled. Some modifications are affected by update intervals; changes are not applied until the next interval expires.
I have installed MDB on a dedicated server with just WorldView Manager but the AIS local catalog is missing
Symptom
No providers were selected during the install and, consequently, the AIS local catalog is missing and MCC does not launch correctly.
Solution
You must select at least one provider for the local catalog to be created. If no providers are selected, the install process will determine that there is no requirement for a local catalog. If you install MCC without any providers, it will create an AIS catalog but there will be no DNA cells available.
The installation halted because the estimated size of the System Path Entry would have been exceeded based on the components selected
Symptom
The install process was halted after the component selection was made. Typically, this occurs on a Windows 2003 system with SP1 applied.
Solution
For most contemporary Microsoft operating systems the maximum length for the PATH variable is 2,048 characters, but for Windows 2003 Service Pack 1 that restriction has been reduced to 1,024 characters (see Microsoft KB article 906469). Once the component selection has been made, the NSM install process verifies that the maximum system path length will not be exceeded. The limit is most likely to be exceeded when multiple products (for example, USD, DSM and NSM) are installed on the same server. To resolve this you can do the following:
  Shorten the directory path name. By default, the NSM location is
\Program Files\CA\Shared Components\CCS\WVEM
  Reduce the number of components selected.
See the Path Length Considerations for Unicenter NSM r11.x document on the Implementation Best Practices page for more details and a tool for estimating path length.
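To get a rough feel for how close a server is to these limits before installing, you can count the characters the PATH variable would contain after the new directories are appended. The Python sketch below is a simple character count under that assumption; it is not the estimate the NSM installer performs and not the tool referenced above. The function names and the directories passed in are hypothetical.

```python
DEFAULT_LIMIT = 2048       # limit the guide gives for most Microsoft operating systems
WIN2003_SP1_LIMIT = 1024   # limit the guide gives for Windows 2003 Service Pack 1


def projected_path_length(current_path, new_dirs, separator=";"):
    """Return the PATH length after appending new_dirs to current_path."""
    # Ignore empty entries produced by doubled or trailing separators.
    entries = [entry for entry in current_path.split(separator) if entry]
    entries.extend(new_dirs)
    return len(separator.join(entries))


def fits_within(current_path, new_dirs, limit):
    """Return True if the projected PATH length does not exceed limit."""
    return projected_path_length(current_path, new_dirs) <= limit
```

For example, fits_within(os.environ.get("PATH", ""), [r"C:\Program Files\CA\Shared Components\CCS\WVEM"], WIN2003_SP1_LIMIT), after importing os, reports whether adding that one directory would stay under the Windows 2003 SP1 limit. A real installation adds more than one directory, so treat the result as a lower bound.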
Internal error: function LookupAccountSid failed, rc=1789, Reason= The trust relationship between this workstation and the primary domain failed
Symptom
This message is encountered during installation. It occurs when the computer's machine account has an incorrect role or when its password has become mismatched with that of the domain database.
Solution
Refer to the following Microsoft article for further details on resolving this issue: http://support.microsoft.com/kb/162797/
1. Log on locally as a local administrator.
2. In the Network tool of Control Panel, select Change and enter a Workgroup name, leaving the domain.
3. Restart the computer and log on locally as a local administrator.
4. Rejoin the domain using one of the following two methods:
  Join the domain from the client, if at the same time you can provide an administrator name and password on the domain.
  Delete the existing computer account in Server Manager, recreate the computer account, synchronize the domain, and then rejoin the domain on the client.
If you have Active Directory (AD) running:
1. Remove the PC from the domain.
2. Go into Active Directory Users and Computers.
3. Delete your PC if it is listed in Computers.
4. Rejoin the domain.
Typically, as the DSM's capacity nears its limit, you will see an increase in the number of retries. If your DSM is constantly polling at your -m set value, then you are probably close to overload for that value.
What can I do to avoid overloading my DSMs?
To minimize the chances of overloading your DSMs you need to plan your architecture, understand what factors impact DSM performance, and regularly monitor DSM behavior to identify the early warning signs of a potential overload.
When planning your architecture, keep the following in mind for DSMs:
  Always monitor the DSM as a critical device
  Determine your failover policy up front
  DSMs need resources (8MB + processor cycles)
  Think of multiple DSMs when you monitor > 200 hosts
  Ensure polling frequency is reasonable
  PING Only takes no resources on the monitored device
  DSMs should be location based
  DSMs report to a specific MDB
  You can implement multiple classes of DSM (Ping only/MDB connected/etc.)
  You can implement trap-multiplexing in a DSM
  Avoid hierarchically organized layers of DSMs (can create bottlenecks)
  Put DSMs as close as possible to their monitored devices/agents
You should understand that DSM performance (and, therefore, overload potential) is affected by the following:
  type of hardware on which the DSM is installed
  location of the DSM in relation to the MDB and the agents being monitored
  use of cold start vs. warm start
  electronic proximity to hosts
  configuration and congestion of the network
  number of hosts
  number of managed objects
  polling configuration
After applying a system-wide security patch, we had to reboot a large number of servers relatively quickly. Now, one of the managed nodes is showing a status of any:absent. What does this mean?
Symptom
Managed node shows a status of any:absent immediately following a reboot.
Solution
A status of any:absent indicates that the DSM cannot communicate with the agent. Considering the timing of this event (right after the reboot of a significant number of machines), this may be due to DSM overloading, where a large number of status updates were communicated to the DSMs in a very short period of time. Therefore, you will need to look at the following:
  How many managed objects is the DSM managing?
  What is the polling interval?
  Has the aws_snmp -m option been specified?
  Is the any:absent status for different servers or always the same server? If the same server, review awm_catch. If for different servers, update aws_snmp to run level 4 debug mode and adjust the debug file size accordingly.
Additional DSM scalability considerations can be found on the Implementation Best Practices page on SupportConnect.
Solution
Other considerations to check:
  Make sure the proper community strings are set. You can do this by checking in the MCC's Tool Plugin under DSM Community Strings or by running dsmwiz.
  Open aws_agtgate.log and look in the DNA host list. Is the agent host included?
In DSM Monitor View there's a yellow exclamation mark next to MDB Connection and it has a WV Error Message
Symptom
The DSM Monitor icon is yellow and shows a WV Error status next to the MDB Connection.
Solution
Click on the Confirm Button and the status will go back to Normal.
Discovery Issues
Topics in this section are relevant to WorldView Discovery, both Classic and Continuous.
New classes are not classified correctly by Continuous Discovery or Classical Discovery
Symptom
After adding a new class with its SNMP OID to the MDB (with MCC Class Specification, for instance), it does not get classified correctly by Continuous Discovery or Classical Discovery.
Solution
Check that you did the following steps:
  Make sure that you added the following class level properties:
    asset_class_id with a default value of 0
  Make sure you have run the following commands after adding the new class:
Program Files\CA\SharedComponents\CCS\Discovery\BIN\UpdateClassRules.exe
Program Files\CA\SharedComponents\CCS\Discovery\BIN\RuleToDBConverter.exe
Solution
1. On Windows 2000/2003, start Administrative Tools, Local Security Policy.
2. Select Local Policies, User Rights Assignments.
3. Double click on the policy and add the caunint user.
4. Select Act as part of operating system (SeTcbPrivilege), Increase quotas (SeIncreaseQuota), and Replace a process level token (SeAssignPrimary).
To set required privileges for users logged on by Event Management:
1. On Windows 2000/2003, start Administrative Tools, Local Security Policy.
2. Select Local Policies, User Rights Assignments.
3. Double click on Logon as a batch job and add the user(s) that can be logged on by Event Management.
  Under Windows, ensure that the ID issuing the command is included in Users Authorized to Issue Commands in Config/Settings/Client Preferences/Event Management.
  Under Windows, verify that the user ID has permission to execute the command in the Unicenter TNG or NSM environment. Do this by entering the following from the Console command line and then executing the command:
/int cmd
Solution
For a connection from Windows to Windows, Transport and Windows-Server or Windows-Client should be running. For a connection from Windows to UNIX, Remote should be running.
On UNIX, enter the following and make sure that ccirmtd is running:
ps -ef | grep cci
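If you run this check often, the look-up can be scripted. The Python sketch below scans the text produced by ps -ef for a process name; it is an illustration rather than a CA-supplied utility, and the function names are hypothetical. It skips lines that contain the word grep so that the search command itself is not counted as a match.

```python
import subprocess


def process_listed(ps_output, process_name):
    """Return True if process_name appears in the given `ps -ef` output."""
    for line in ps_output.splitlines():
        tokens = line.split()
        if "grep" in tokens:
            continue  # ignore the grep command that was used to search
        # Compare the final path component so /some/dir/ccirmtd also matches.
        if any(token.rsplit("/", 1)[-1] == process_name for token in tokens):
            return True
    return False


def current_ps_output():
    """Return the output of `ps -ef` on a UNIX system."""
    return subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
```

On a UNIX machine, process_listed(current_ps_output(), "ccirmtd") returns True when the remote daemon is running.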
2. Define a policy on either or both servers to forward a received message to the other server. A policy consists of a message record (identifying the message to be intercepted) and the corresponding message action (indicating the action to be performed upon receipt of the message).
If either of these tests is successful, you can be reasonably sure that the event managers and agents are functioning properly. If these tests were not successful, do the following:
3. Make sure that you can ping the object by both IP address and host name. If you cannot ping by host name, add the host name to the DNS, WINS, or Hosts file, depending on your setup. If neither ping works, you may have a problem with the network.
4. If you are able to ping the target servers, execute the oprping command. Oprping is similar to ping, but it uses the common communication interface (CAICCI). The syntax of oprping on Windows is:
oprping target_server number_of_pings test_message
5. If the oprping command is successful, Security may be preventing the message from being forwarded. Make sure that the ID issuing the command to the target server is listed in that server's CA_OPR_AUTH_LIST.
If you are still unable to determine why your commands are not functioning properly, and if the oprping was not successful, verify that CAICCI is functioning correctly. On Windows, run the following command:
ccicntrl status
For connection from Windows to Windows, Transport and Windows-Server or Windows-Client should be running. For a connection from Windows to Unix, Remote should be running. On UNIX enter the following, and make sure that ccirmtd is running.
ps -ef | grep cci
Solution
This issue is caused by a change in behavior of the Windows operating system beginning with Windows Server 2003, and Microsoft is in the process of further changing this behavior to increase security. For more information on these changes, see the following Microsoft articles:
  Q171890 INFO: Services, Desktops and Windows Stations
  Q327618 INFO: Security, Services and the Interactive Desktop
  Q165194 INFO: CreateProcessAsUser() Windowstations, and Desktops
As Microsoft makes these changes and more in the future, CA will work to update CA NSM software in order to maintain previous levels of functionality with the newest operating system specifications.
MCC Issues
The following topics are related to problems with the CA NSM UIs, including WorldView and the Management Command Center (MCC). In addition to this section, you should also consult the "Troubleshooting" chapter in the Inside Systems Management Guide for additional information on such topics as:
  Agent View Message: Could Not Connect to ORB
  Agents Do Not Appear in Management Command Center or WorldView
  Agent View Message: No Response for this Request
  Inconsistent Agent Status
  Mismatched Community Strings
  DSM Policy Not Loaded
  Abrowser Does Not Open
  Abrowser Starts but Gives Error "Could not Connect to ORB"
  Abrowser Starts but Gives Error "No Configuration File Specified"
  Abrowser Starts but no Values can be Altered
  Abrowser Starts but Gives Error "Could Not Read in Configuration File"
The RMI Connection to xxx has been lost. Please restart the Management Command Center to access this namespace
Symptom
While using the MCC, this message pops up.
Solution
If this happens, a problem generally occurred communicating with your rmi_server.exe. To resolve this, close the MCC as well as any other instances accessing the MDB at that moment. Then, from a command prompt on the MDB box, do a camclose. Keep issuing a camclose until you get the response:
D:\>camclose
camclose: server closed.
D:\>camclose
camclose: select failed (15) Unable to connect to CAM server
To verify that CAM is down, check Task Manager to ensure that rmi_server.exe is not running. Now issue a cam start.
Note: If you have System Performance installed, a camclose will stop the services associated with SP. You will need to restart them in services.msc:
  CA Systems Performance Distribution Server
  CA Systems Performance Domain Server
If the performance agents are on this box as well, you need to start them:
hpaagent start
prfagent start
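The "keep issuing camclose" step can be sketched as a small loop. The Python example below repeats the command until the response contains the text "Unable to connect to CAM server" shown above. It is an illustration only: the function names and the max_attempts cap are assumptions added to avoid looping forever, and because the guide does not say which output stream carries the message, the sketch reads both.

```python
import subprocess

DONE_MARKER = "Unable to connect to CAM server"  # response text quoted in the guide


def run_camclose():
    """Run camclose once and return everything it printed."""
    completed = subprocess.run(["camclose"], capture_output=True, text=True)
    return completed.stdout + completed.stderr


def close_cam(run=run_camclose, max_attempts=10):
    """Issue camclose until DONE_MARKER appears; return the attempts used.

    max_attempts is an assumed safety cap, not a value from the guide.
    """
    for attempt in range(1, max_attempts + 1):
        if DONE_MARKER in run():
            return attempt
    raise RuntimeError(f"CAM still responding after {max_attempts} attempts")
```

After close_cam() returns, continue with the manual steps: confirm in Task Manager that rmi_server.exe is not running, then issue cam start.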
The colors in the MCC are not propagating correctly up the LHP tree -or- Changes are not being reflected in the MCC
Symptom
The colors in the MCC are not being propagated correctly. For example, even though you know that an object is really down and has a status of down, the color of that object is not correctly reflected. Alternatively, if you create a new object, that object is not being displayed in the MCC.
Solution
In general, symptoms such as these are related to sevprop. First, check whether you still have a valid connection to the catalog machine. Within the MCC, select File, Connect and enter your Master Catalog machine name. If you do not get any errors, go to the next step. If you do get an error, go through the solution steps mentioned in the previous issue.
To recycle sevprop, first close all instances of the MCC that are running. Then recycle the CA WorldView Severity Propagation Service either through services.msc or from a command prompt:
sevprop stop
sevprop start
Note: Make sure that, after issuing the sevprop stop, the following processes have actually stopped running:
  sevprop.exe
  sevpropcom.exe
  startbpv.exe
If this did not resolve the problem, follow the solution described in the previous issue.
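Checking that the three processes have stopped can be done in Task Manager or scripted. The Python sketch below parses the output of the Windows command tasklist /FO CSV /NH and reports which of the named processes are still listed. It is an illustration, not a CA utility; the function name is hypothetical, and the tasklist output has to be captured separately and passed in as text.

```python
import csv
import io

SEVPROP_PROCESSES = ("sevprop.exe", "sevpropcom.exe", "startbpv.exe")


def still_running(tasklist_csv, names=SEVPROP_PROCESSES):
    """Return the entries of names that appear in `tasklist /FO CSV /NH` output."""
    # The first CSV column of each row is the image name.
    running = {
        row[0].lower()
        for row in csv.reader(io.StringIO(tasklist_csv))
        if row
    }
    return [name for name in names if name.lower() in running]
```

An empty list means none of the processes is listed, so it is safe to issue sevprop start; otherwise wait and check again before restarting.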
When I select the Console Log plugin on the Right Hand Pane in the MCC, I get a message Cannot find EM Manager Cell xxx
Symptom
While in the MCC Topology View, selecting the Console Log plugin in the right hand pane results in the following error message: Cannot find EM Manager Cell <server name>.
Solution
This can occur when DIA is installed in non-DNS environments. The error occurs because the EM-Server property of the selected WV object is not set in Fully Qualified Domain Name (FQDN) format. For the MCC to locate the EM server via DIA, the EM-Server name has to be in FQDN format. The first system reference entry in the etc\hosts file is taken to set the EM-Server property.
To fix this, add the FQDN to the etc\hosts file and either run ipconfig /flushdns or recycle the system. The EM-Server property in WV then updates automatically and the plugin works.
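Before editing the hosts file, you can check whether its first entry already names the machine in FQDN form. The Python sketch below rests on two assumptions that the guide does not spell out: the "first system reference entry" is taken to be the first entry that is neither a comment nor a loopback address, and a name is treated as an FQDN if it contains a dot. The function names are hypothetical.

```python
def first_system_entry(hosts_text):
    """Return (address, names) for the first non-comment, non-loopback entry."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 2:
            continue  # blank line, comment, or address without a name
        address, names = fields[0], fields[1:]
        if address.startswith("127.") or address == "::1":
            continue  # assumption: loopback entries are not the system reference
        return address, names
    return None


def first_name_is_fqdn(hosts_text):
    """Return True if the first name of the first system entry contains a dot."""
    entry = first_system_entry(hosts_text)
    if entry is None:
        return False
    return "." in entry[1][0].strip(".")
```

Given an entry such as "138.42.147.12 machine1.domain.com machine1" the check passes; with the names in the opposite order it fails, which matches the situation this solution describes.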
Solution
To fix this, close the MCC and issue a camclose at the command line. Keep issuing a camclose until you get the response:
D:\>camclose
camclose: server closed.
D:\>camclose
camclose: select failed (15) Unable to connect to CAM server
To make sure it came down, check Task Manager to verify that rmi_server.exe is not running. Now issue a cam start.
Note: If you have System Performance installed, a camclose will stop the services associated with SP. You will need to restart them in services.msc:
  CA Systems Performance Distribution Server
  CA Systems Performance Domain Server
If the performance agents are on this box as well, you need to start them:
hpaagent start
prfagent start
In the MCC left hand pane, there is no Alert plugin available or Console Log Plugin
Symptom
The Alert drop down or the Console Log plugin is not available in the LHP of the MCC.
Solution
AMS utilizes the DIA protocol for communication between the MCC and the AMS manager. If DIA has not been configured, the Alert drop down that should be available in the left hand drop down within the MCC will not be available, and when the MCC is started you may receive a DIA warning dialog box. More information on DIA concepts can be found in the NSM r11 Implementation Guide, Appendix A: DIA Reference.
If the file has been created, this confirms that EM has called the alert action, so you can move on to the next step. If the file is not created, check EM and the MRA.
5. Start the AMS service and then confirm that the file is removed from the AMS hold folder.
To debug AMS, run the cautrace command to start the trace GUI and run the caamssrv traceon command to put AMS in debug mode.
Solution
This generally means you did not select WorldView Provider during the NSM installation. The provider is needed so that the MCC can communicate with that component. Re-run the installation, selecting only the WV Provider, and run through the setup.
I don't see the MIB values I associated to a specific class available in Trend
Symptom
You ran through the Associate MIB wizard in the Performance Configuration GUI for a particular class. You saved it, updated your profiles, and delivered that profile. An ample amount of time has passed, yet when you bring up Performance Trend none of the new MIB values you associated are available to report on.
Solution
The agent will only collect SNMP data for itself if this functionality is switched on under the machine properties in Performance Configuration. To do this:
  Locate the machine in the network tree.
  Right-click and select Properties.
  Select the SNMP Proxies tab.
  Select the Collect SNMP Resources for this machine check box.
  Redeliver the profile to the agent machine.
When I deliver a profile from UCM, the agent doesn't get the profile.
Symptom
A profile delivered from UCM is not received by an agent.
Solution
On the target machine, check that the agtctrlcell is up and running. This cell is responsible for putting the configuration on the agent. If the agtctrlcell is running, check that it is also registered with the DNA on that machine by using diatool to connect to the target host. If it is registered correctly, there should be a green check mark next to the agtctrlcell. For example:
If the agent has an exclamation mark through it, it is currently in a failed state. Try stopping and restarting CA DIA 1.2 DNA to see whether that fixes the problem. If it does not, you need to reregister the agtctrlcell. Refer to the next chapter for information on how to do this.
General/Miscellaneous Issues
The following section contains general symptoms and solutions that are not specific to a particular component or function.
Solution
nameserver 101.200.20.303
2. Make sure /etc/nsswitch.conf has the entry "hosts: dns files" so that the host name is resolved from DNS first:
hosts: dns files
3. Make sure /etc/hosts has a valid IP address along with the machine name. This is required in case DNS fails to resolve the host name.
cat /etc/hosts
: localhost
138.42.147.12 machine1.domain.com machine1
4. Check the host name (and check it against the entry in /etc/hosts).
LINUX only:
/etc/init.d/network status
Check whether the interfaces are active. If at least one of the interfaces (eth* interface) is not UP, start the interface using:
/etc/init.d/network start
hostname -i
Check whether the interface(s) is up. The IP address returned from 5 above should be for one of the interfaces that are UP.
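The file checks in the steps above can be scripted. The following Python sketch is illustrative only: it assumes the nameserver line belongs to /etc/resolv.conf (the usual location for that directive), and the function name check_resolver_files is hypothetical, not part of CA NSM. It takes the file contents as strings so that it can be tried without touching a live system.

```python
import ipaddress


def check_resolver_files(resolv_conf, nsswitch_conf):
    """Return a list of problems found in resolver configuration text.

    resolv_conf   -- contents of /etc/resolv.conf (assumed location)
    nsswitch_conf -- contents of /etc/nsswitch.conf
    """
    problems = []

    # Collect every "nameserver <address>" entry.
    nameservers = []
    for line in resolv_conf.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            nameservers.append(parts[1])
    if not nameservers:
        problems.append("no nameserver entry found")
    for addr in nameservers:
        try:
            ipaddress.ip_address(addr)
        except ValueError:
            problems.append("invalid nameserver address: " + addr)

    # Find the "hosts:" line and confirm that dns is consulted first.
    hosts_order = None
    for line in nsswitch_conf.splitlines():
        stripped = line.split("#", 1)[0].strip()
        if stripped.startswith("hosts:"):
            hosts_order = stripped[len("hosts:"):].split()
    if hosts_order is None:
        problems.append("no hosts: entry found")
    elif hosts_order[:1] != ["dns"]:
        problems.append("hosts: entry does not list dns first")

    return problems
```

An empty list means both files passed. An address whose octet exceeds 255 (for example, 303 in the last position) is reported as invalid, so a typing error in the nameserver line is caught before it causes lookup failures.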
The Technical Support Organization at CA provides support 24 hours a day, 7 days a week for all Severity 1 technical support calls. Review the Technical Support Policy posted on the Support web site for details regarding the service level objectives for each severity level.
Telephone Support
Contact information for CA Technical Support can be found on the http://support.ca.com site at the following link: https://support.ca.com/irj/portal/anonymous?NavigationTarget=navurl://036cebd7fd20d8d7505abb95e5ff120f
The telephone number listed connects you directly to the support center responsible for the product. During primary service hours, call this number to speak with a technician. If all technicians are busy at the time of your call, a receptionist will log and queue your call for a callback by a technician. The CA standard is to return all calls by the end of the same business day in priority sequence. If a call is received near the end of the day and you cannot be reached, your call will be returned the morning of the next business day. If you are calling with a severity 1 problem or need immediate assistance, always inform the receptionist so that a technician can be made available to take your call immediately.
Important! If you are calling from outside of North America, use the direct numbers during primary service hours. Call (631) 342-4683 for severity 1 problems only, during emergency service hours.
When you contact Customer Support, have the following information available:
The product name, version, operating system or platform, and general description of the problem
Your name and telephone number
Your company site ID
Any documentation that may help in resolving the problem
Use the guidelines provided in the "Basic Troubleshooting" chapter of this document to help identify the problem (for example, when it occurred, what changed in your environment, which machines are impacted, and under what circumstances the problem occurs).
Regardless of the route you use to contact Technical Support, your issue is entered into the StarTrak program, which is used to collect, record, disseminate, and track data related to client support requests worldwide. When you open an issue with Technical Support, the tracking number you are given is the StarTrak number. Refer to this number in subsequent support contacts to speed the response.
Escalation
If you need to escalate the severity level of your problem, request an escalation from the CA Support Center working on your problem. If you feel that your problem is not being adequately addressed, you can escalate your concerns by requesting to speak with the manager responsible for the technician assigned to your issue. If the issue is not assigned to a technician, you can request to speak with a manager.
CPU Bottlenecks
One common problem to watch for is CPU bottlenecks, which can cause processing delays and timeouts. You can identify potential CPU bottlenecks on the MS-SQL Server on which the MDB resides by installing the MS-SQL Agent and monitoring the following:
Processor object, % Processor Time counter
System object, % Total Processor Time counter
In general, if usage continuously exceeds 80%, a CPU bottleneck is likely.
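The 80% guideline can be applied to a series of counter samples with a few lines of Python. This is a minimal sketch, not part of CA NSM: the function name is hypothetical, and the interpretation of "continuously" as five consecutive samples is an assumption that you should tune to your sampling interval.

```python
def sustained_over_threshold(samples, threshold=80.0, min_consecutive=5):
    """Return True if usage stays above threshold for min_consecutive samples.

    samples         -- iterable of % Processor Time readings
    threshold       -- the 80% guideline from the text
    min_consecutive -- assumed meaning of "continuously"; adjust as needed
    """
    run = 0
    for value in samples:
        # Extend the current run of high readings, or reset it.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

A short spike followed by a drop does not trigger the check; only an unbroken run of high readings does, which matches the intent of looking for sustained rather than momentary load.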
Event Management
One of the most obvious ways to monitor your enterprise is by checking the Event Management console. It is recommended that you employ filtering techniques to minimize the message traffic and highlight messages that warrant action. Depending on whether DSM messages are sent to the Event Console, look for messages containing the words Critical or Down to identify items that have gone into Critical status or devices that are down.
Note: If you are using an Event Agent, keep in mind that, although the Event Agent machine has an Event DSB, it does not have a caioprdb database. Therefore, to review the DSB in effect for a particular Event Agent, you need to enter the following commands:
oprdb list dsb_filename
oprdb script db > c:\temp\cautili.txt
WorldView
Although the classic change in the color of an object on Node View provides a visual and obvious indication of trouble with a managed object, you can also use the Real World Interface in less obvious ways. For example, careful monitoring of the number of objects managed by a single DSM machine alerts you to potential scalability issues when that number becomes excessive. WorldView provides several ways to determine the number of objects being managed by a particular DSM:
Open DSM View (obrowser), select Query Option, and click Search. With no filtering selected, the number of matches the query returns equals the number of managed objects.
Open Node View and click the microscope icon. Once again, the number of matches equals the number of managed objects.
Run dsm_report to generate a report of managed objects. This command creates a CSV file (dsmrpt.csv) in the current directory.
Agent Technology
Agent Technology includes several SNMP diagnostic utilities that you can use to test and clarify the details of your AT setup. These include:
awget
Returns the value of a specific SNMP attribute. The syntax is as follows:
awget [-h hostname|IP-address] [-c community] [-p port|service-name] [-t timeout] [-d loglevel] [-f logfile] -o oid
awnext
Returns the value of the next SNMP attribute from the one specified. The syntax is as follows:
awnext [-h hostname|IP-address] [-c community] [-p port|service-name] [-t timeout] [-d loglevel] [-f logfile] [-o] oid
awtrap
Tests a manager's ability to process an agent's traps without actually running the agent. The syntax is as follows:
awtrap [-f from] [-h destination] [-p port] [-c community] enterprise-type [subtype] [oid type value]+
awwalk
Retrieves the value of every instance of every attribute defined in the MIB from the specified OID through the last OID in the tree. It is the equivalent of repeated executions of awnext. The syntax is as follows:
awwalk [-h hostname|IP-address] [-c community] [-p port|service-name] [-o] oid
For more information about these commands, including additional syntax and examples, see the online CA Reference.
Running Reports
Reports provide a view of activity in your enterprise over a period of time, enabling you to detect patterns that may indicate problems. CA NSM 11.x includes a variety of reports that can be helpful in troubleshooting your enterprise. Job Management, for example, includes a utility that invokes a simulated autoscan process and produces a set of detailed reports that identify:
Jobs that would be selected, executed late, or carried over to the next day (backlogged)
Date and time that each job would be processed
Location where each job would be processed
Resources required for each job
Amount of utilization for each report
Whenever you define new jobs or jobsets in Job Management, or update existing ones, run this report to ensure that the job definitions you provided actually have the result that you intended.
In addition to standard reports, CA NSM r11.x includes a number of commands that can generate data files listing detailed information about settings and activities for a particular component or components. These include:
cautenv dumpini
(Windows NT/Windows 2000 Unicenter Manager) Enables you to display and dynamically modify the CA NSM EM environment variables. You can also direct output to a text file for future reference. On UNIX, you can check the output of the env command for the user that starts Enterprise Management.
caiserv
Creates a comprehensive file detailing your EM environment. This includes the CA NSM history files, logs, general system information (such as system variables), and specific component details (such as causec for Security).
cauexpr.exe
Copies Job Management definitions to a text file, cauexpr.txt, which you can then upload using cautil.
dsbulist
Displays the Security Decision Support Binary (DSB) cache and files.
dsm_report
Writes a record of every object in the DSM store to a CSV file (dsmrpt.csv). The syntax is:
dsm_report -a agentClass,agentClass... | -c | -@hostname | -v | -o filename | -h hostname
In verbose mode, dsm_report provides detailed information, which may be enough to eliminate the need for running storectrl.
oprdb script db > c:\temp\cautili.txt
Gets a copy of all your message records and actions in cautil format. You can then use the output file to build other Event Management machines and load the same message records and actions there.
oprdb list
Lists the contents of your Event Management DSB as a summary (not suitable for cautil). The output can be redirected to a text file.
whohas
Displays the policies defined for assets in Security Management.
whathas
Displays the policies defined for user IDs in Security Management.
For more syntax and explanation, see the online Administrator Guide, CA Reference, and CA Procedures.
Verifying Functionality
The applyptf utility generates a history file which you can review to identify which patches were applied and when. This file, named machinename.his, can be found in the root of the install directory. It is in ASCII text format and, if needed, can be forwarded to Support for review as is. Consider the following sample entry from the history file:
[FRI Jan 10 14:27:21 2003] PTF Wizard installed LO91233 (TNGEM)
RELEASE=2.4
GENLEVEL=GA
COMPONENT=TNGCC
PREREQS=
MPREREQS=
MDBQS=
SUPERSEDE=
INSTALLEDFILE= /uni/cci/bin/caiccid
INSTALLEDFILE= /uni/cci/bin/cci
INSTALLEDFILE= /uni/cci/bin/cciclnd
INSTALLEDFILE= /uni/cci/bin/ccicntrl
INSTALLEDFILE= /uni/cci/bin/ccirmtd
INSTALLEDFILE= /uni/cci/bin/libcci.so
INSTALLEDFILE= /uni/cci/bin/rmt
INSTALLEDFILE= /uni/cci/bin/rmtcntrl
Line 1 indicates the patch number, along with the date and time it was applied. Lines 2 and 3 identify the release and genlevel of the patch. Line 4 identifies the target component, in this case CAICCI (TNGCC). Lines 5 through 7 identify any prerequisites, master image prerequisites, and corequisites. Line 8 identifies any other patches that might be superseded by this patch. All remaining lines, prefixed by INSTALLEDFILE, denote files that have been replaced.
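Because the history file is plain ASCII with one KEY=value pair per line, an entry can be summarized with a short script. The Python sketch below is an illustration based only on the sample entry shown here; the function name parse_history_entry is hypothetical, and real history files may contain variations that this parser does not handle.

```python
import re

# Matches the first line of an entry, for example:
# [FRI Jan 10 14:27:21 2003] PTF Wizard installed LO91233 (TNGEM)
HEADER = re.compile(
    r"^\[(?P<applied>[^\]]+)\]\s+PTF Wizard installed\s+"
    r"(?P<patch>\S+)\s+\((?P<product>[^)]+)\)"
)


def parse_history_entry(text):
    """Parse one applyptf history entry into a dictionary."""
    entry = {"files": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            entry.update(match.groupdict())
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line; ignore it
        key, value = key.strip(), value.strip()
        if key == "INSTALLEDFILE":
            entry["files"].append(value)  # one line per replaced file
        else:
            entry[key] = value
    return entry


SAMPLE = """[FRI Jan 10 14:27:21 2003] PTF Wizard installed LO91233 (TNGEM)
RELEASE=2.4
GENLEVEL=GA
COMPONENT=TNGCC
PREREQS=
SUPERSEDE=
INSTALLEDFILE= /uni/cci/bin/caiccid
INSTALLEDFILE= /uni/cci/bin/cci
"""

if __name__ == "__main__":
    info = parse_history_entry(SAMPLE)
    print(info["patch"], info["COMPONENT"], len(info["files"]))
```

Keeping the replaced files in a separate list makes it easy to answer the most common support question, namely which patch last touched a given binary.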
Verifying Functionality
Pulse check commands identify whether a component and its required services and machines are functional. The following sections identify such commands:
Components and co-requisite components
Databases
Communications
Components
When checking the functionality of a CA NSM component, such as Job Management, also verify that any co-requisite components are fully functioning. For example, Job Management uses CAICCI to manage cross-platform job scheduling. If CAICCI is not properly configured, Job Management is affected. Related components and required services are:
Primary Component: Agent Technology
Related Components: WorldView
Required Services: Services Control Mgr. (awservices), DSB (aws_orb), SNMP Gateway (aws_snmp), trap mux (aws_listen), object store (aws_store), DSM (aws_dsm), WorldView DSM Gateway (aws_wvgate)
CAICCI (required) Calendar (required) Calendar (optional) Agent Technology MS-Excel (required for Performance Trend and Chargeback) Job Management (for batch jobs to generate charts) CAM, CAFT (required) CAIENF, CAISSF (required) CAICCI (required), CAFT (optional), CAIENF (required) Event Management (optional) Calendar (required for Job Manager but not for Job Agent) Agent Technology MDB, WorldView Gateway Service, DSM Gateway, Severity Propagation Service Domain Server (pmdmnsrvr) Distribution Server (pmdstrbsrvr)
WorldView
Use the commands described next to verify the health of these components and services.
If awservices is stopped, you are simply told this. If it is running, you see a breakdown of the services that are running. To list the version of all binaries included in the awservices status command, execute the following command:
awservices version
This also lists the agent versions.
orbctrl
This command-line utility lists the services and agents that have attached to the instance of the Distributed Services Bus running on the specified system. It is a useful debugging tool because it can be used to verify that the required services are running on a remote host. The syntax is:
orbctrl -@ servicesHost
Where servicesHost is the hostname or IP address of a remote node.
servicectrl
Run the servicectrl utility to manage the awservices configuration for Agent Technology. To list the operational status of an agent or service, run the following:
servicectrl status
The display indicates whether the service is running or stopped. Servicectrl can also start and stop remote agents with the following syntax:
servicectrl stop -remote=[machinename] -name=caiW2kOs
servicectrl start -remote=[machinename] -name=caiW2kOs
The remote orb must be running for servicectrl to work. If awservices is not up, servicectrl fails.
storectrl
Run the storectrl command to display the information contained in any of the AT stores (aws_nsm, aws_sadmin, and objstore). Consider the following example:
Below is an example of the type of information contained in the resulting temp1.txt file.
Note: These commands are case sensitive.
You can also get this information by running dsm_report in verbose mode.
wvgethosts
The wvgethosts command can extract a list of discovered hosts from the MDB and compare it to the DSM filter file. This can be useful for determining whether a DSM is trying to manage a particular host. For example, if wvgethosts does not return the host machine you are trying to manage, then one of the following is probably true:
The object is not classified properly in the MDB. The DSM does not try to manage objects that are unclassified.
The discovered IP address is not in the range of that DSM's IP address scoping.
Syntax and Examples
wvgethosts has the following arguments:
wvgethosts [-n DSMServer|ALL] [-o nsm/hosts/agents] [-c class] [-r repositoryName -u user -p passwd] [-d dbglvl] [-f logfile]
Note: If -n is not specified, the local host is not selected. For example, to get a list of all hosts managed by the DSM, run the following:
wvgethosts -o hosts
Use the -o nsm switch to extract all agents instead of hosts. You can also list all hosts in the MDB with the DSM_server name instead of limiting the command to one DSM. The results of this command can be redirected to a file to make it easier to view a long list. Because wvgethosts queries the MDB for objects, MS-SQL is required, unless the wvdbt option is implemented.
Severity Propagation Service
The Severity Propagation Service is a key component for several applications. Therefore, if this service fails, applications will likely be affected. The Severity Propagation Service consists of sevprop.exe and sevpropcom.exe. The main function of sevprop is to administer the service. In prior releases, sevprop carried out the function of sevpropcom as well. To verify that the Severity Propagation Service is functioning correctly, do the following:
1. Verify that the sevprop.exe and sevpropcom.exe binaries reside in the \ca_appsw directory.
2. Verify that Severity Propagation is part of Administration Group and has "Log on as a Batch Job" user rights.
3. If the Severity Propagation service is active, ensure that sevpropcom.exe or sevpro~1.exe is running.
Enterprise Management
The following commands check the status of Enterprise Management and its components: Enterprise Management functions, common objects, and services:
unifstat
Common Services
This section details the commands for checking the following common services:
CAM and CAFT
CAICCI
CAM and CAFT
The Unicenter Explorer interface uses CA Message Queuing (CAM) and CA File Transfer (CAFT). Use the following command to review the status in both directions. You should see at least the opposing machine under the host category. A high number of retries could indicate a potential problem.
camstat nodename
Use the -s 8000 option to verify whether large packets can be sent. This can help indicate whether UDP is causing the problem. Use the following command to verify that CAM is operating successfully and to determine whether the cam.cfg file includes the forward 127.0.0.1 command.
camcheck
Look for collect_message_spec( 127.0.0.1 ) called in the output. If CAM detects a configuration file error during startup, the fact is logged and the configuration record ignored. The camcheck program performs a syntax check on the configuration file. Blank lines and lines starting with a # (hash, pound or number) character are ignored. The cam configuration file (cam.cfg) is not present by default; however, you can build a cam.cfg file by executing the following command:
camsave persist
Note: See the online CA Reference for additional details about the cam.cfg file.
CAICCI
The commands for checking CAICCI are described below. On Windows, use the following command to verify the status of CAICCI:
ccicntrl status
In particular, verify that the remote and transport services are running. To administer remote CAICCI services, enter the following command:
rmtcntrl
Look for a response with the path of ccirmtd; do not be fooled by seeing a response for your grep. If you see the line with ccirmtd, CAICCI is running. To find out what version of ccirmtd is running on UNIX, enter:
what $CAIGLBL0000/cci/bin/ccirmtd
To identify which machines or applications the machine can talk to, enter this command:
ccii
For consoles, ccii needs to list the UNIX machine with the application CA_STARUNIX_SERVER. If this entry is not present in the output of the ccii command, the star console cannot connect to the UNIX machine.
In addition to netstat, ping, and nslookup, you can use the following commands to troubleshoot remote CAICCI connections:
traceroute (UNIX) or tracert (Windows) to identify the route taken between two hosts. If the client cannot ping a host, this command can help identify where, along the network path, the failure is occurring.
Note: You can also use the tracert command to identify the number of router hops between machines. It is recommended that your agents be located close to their DSMs. In general, they should be no more than two to three hops away.
ccinet to pass commands to the ccirmtd daemon on UNIX (rmtcntrl on Windows NT). For example, ccinet ping can be used to send a special CAICCI test packet across the CAICCI connection, whereas ccinet status can be used to identify the status of the CAICCI connections.
netstat -a to list all the network connections on the local box. Run it at the Command Prompt. If the command takes a long time to return any information, while the command netstat -an (numeric output, with no name lookups) completes quickly, the system is having a problem resolving host names.
Databases
The MDB is critical to the functionality of your CA NSM implementation. On UNIX, you can use the caidbck program to get detailed statistics about a particular database. This program lists all the tablespaces and advises on capacity status, indicating which tablespaces are at or over 100%. If the values are high, run the schrecvr utility to clean up the database. It unloads, reloads, and clears logical errors.
To identify which version of the MDB is being used, enter the following command:
For UNIX:
RUN sql script \tnd\sql\INGVERSION.ING
For Windows:
isql -U tngsa -Q "select string from TNGD.dbo.tng_class_ext where name='class_version' and class_id=1"
To verify that DNS lookup is functioning correctly, ping the agent node and the manager node. In order to deliver traps from the agent node to the manager node, the communications path between agent and manager must be open. If your agent node configuration (aws_sadmin.cfg) file specifies manager nodes by hostname, use those hostnames in the ping. Enter the following command from both the agent and the manager:
ping -a IP-address
Verify that the correct hostname is returned.
The rping remote ping command is another useful tool that can be executed from a local manager machine to verify connectivity between another manager and the machines connected to it. You can also use the Agent Technology Remote Ping GUI to perform this function.
If you are unable to ping the target server using its name, you need to resolve the name resolution issue. If you are able to ping the target servers, the next step is to run the oprping command, which is similar to ping but uses CAICCI. The format of the oprping command is:
oprping target-server number-of-pings test-message
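The hostname check performed earlier with ping can also be scripted when many agent and manager nodes need to be verified. The following Python sketch is an illustration only, not a CA NSM utility: the function name name_round_trip is hypothetical, and it uses the standard socket module to look up the name for an address and then confirm that the name resolves back to the same address.

```python
import socket


def name_round_trip(ip, reverse=socket.gethostbyaddr,
                    forward=socket.gethostbyname):
    """Return (hostname, consistent) for the given IP address.

    hostname is None when the reverse lookup fails. consistent is True
    only when the returned name resolves back to the same address.
    The reverse and forward parameters exist so that the lookups can be
    replaced in a test; by default the system resolver is used.
    """
    try:
        hostname = reverse(ip)[0]
    except OSError:
        return None, False
    try:
        resolved = forward(hostname)
    except OSError:
        return hostname, False
    return hostname, resolved == ip
```

Run the check from both the agent and the manager, as with ping, because each machine may be using a different resolver configuration.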
To verify that you have SNMP communication to a specific machine, use ObjectView to browse the agent MIB manually through port 6665. ObjectView can be accessed in context or by executing the objview command. If you are unable to establish SNMP communication, the problem may be an incorrect SNMP agent configuration or other network issues, such as firewall use, or even a security policy (for example, the SNMP device has been restricted to respond only to requests issued from specific IP addresses).
On Windows, to verify CAICCI connectivity to a specific node, use the following command:
u0verify -d=nodename
If the UNIAPP.MAP file is missing an entry, this may appear as the first line of the error.
To test communications in Agent Technology, use the awm_config and awm_catch commands together. They let you manually push and catch messages on the Distributed Services Bus (DSB). The command awm_config pushes messages onto the DSB. The syntax is as follows:
awm_config -@ remote-node
The command awm_catch lets you display or redirect DSB messages to a file. awm_catch does not interfere with the normal delivery of the target message; the messages continue to their original destination. The syntax is as follows:
awm_catch -@ orbhostname message-type message-key
For example, the following command waits for poll event ICMP (ping) messages on the DSM at the node named OTHERHOST:
awm_catch -@ OTHERHOST POLL_EVENT ICMP
The following command waits for all SNMP poll responses containing the string mynode.cai.com:
awm_catch -F mynode.cai.com POLL_EVENT SNMP
Both commands can be run interactively or in batch. For more information about these commands and utilities, including examples and additional syntax options, see the online CA Reference.
Pertinent values include:
PerObjectLogFiles: If set to 1, creates a separate log file for each managed object.
BreakOnBreakpoint: If set to 1, aws_dsm breaks for debugging where AWDM_ASM_INT_3 is included.
ResetOnAddMo: If set to 1, DSM resets the managed object tree on an AddMO call.
LogFileSizeMaximum: Sets the maximum log file size in KB. The default is 4096.
LogLevel: Sets the debug log level.
LogActive: Set to 1 to turn logging on.
LogMode: Select WrapAround, Backup, or Limited. The default is WrapAround.
Where:
service
Indicates the name of the Agent Technology service you want to configure (for example, aws_sadmin or orbctrl). If you are not sure of the correct service name, use the orbctrl command to view the services.
parm
Specifies one of the following:
MODE: Set to 0 to specify log wrapping.
Num: Set to an integer to specify the number of log files (for example, 5).
SIZE: Set to an integer to specify the size of the log file, in KB.
Level: Indicates the debug level (usually 4).
n
Specifies the level of logging required, ranging from 1 (very high-level errors only) to 9 (extremely verbose logging). A log level of 4 provides a good middle level.
For example, to configure logging for the SNMP Gateway service (aws_snmp) on the local host, issue the following command:
awm_config -s LOG_CONFIG LOG:aws_snmp:Level SET:4
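When the same log level has to be set for several services, it can help to generate the command line rather than type it. The Python sketch below only assembles the argument list in the form shown in the example above; the helper name log_level_command is hypothetical, it covers the Level parameter only, and it does not run the command.

```python
def log_level_command(service, level):
    """Build the awm_config arguments that set a service's log level.

    service -- Agent Technology service name, for example aws_snmp
    level   -- log level from 1 (errors only) to 9 (extremely verbose)
    """
    if not service:
        raise ValueError("service name is required")
    if not 1 <= level <= 9:
        raise ValueError("log level must be between 1 and 9")
    return [
        "awm_config",
        "-s", "LOG_CONFIG",
        "LOG:{0}:Level".format(service),
        "SET:{0}".format(level),
    ]


if __name__ == "__main__":
    # Print the command for the SNMP Gateway at the suggested middle level.
    print(" ".join(log_level_command("aws_snmp", 4)))
```

Rejecting levels outside the documented range of 1 through 9 before the command is issued avoids sending a request that the service may silently ignore.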
3. Click the Client Preferences tab at the right.
4. On the Calendar Management tab at the bottom, set Calendar Trace to Y.
5. From the same Command Prompt as in Step 1, enter:
start unitrace
unicntrl start cal
7. Send the following trace output file to Computer Associates Technical Support: unitrace.001
UNIX
To run a diagnostic trace for Calendar on UNIX, follow this procedure.
1. Perform the following from a UNIX shell:
CA_CAIDEBUG=1; export CA_CAIDEBUG    # common debug option
CAICAL0000=1; export CAICAL0000      # cal debug option
unishutdown cal
script cal.out                       # save screen output to file cal.out
unistart cal
2. Run test to repeat the error, and then issue the following:
exit                                 # close cal.out
unset CA_CAIDEBUG
unset CAICAL0000
unicycle cal
3. Click the Client Preferences tab at the right.
4. Select the Event Management tab at the bottom and enter 2 for OPR Trace: 0-2.
5. Select the Diagnostic Trace tab at the bottom and set:
Router Trace to ON
GUI Trace to ON
Trace: 0-2 to 2
Common debug to Y
6. From the same Command Prompt as in Step 1, enter:
start unitrace
unicntrl start opr
8. Send the following files to Computer Associates Technical Support:
unitrace.001 (trace output)
Logs for that day (for example, for Oct 15, 2006, include the following from CA\SharedComponents\CCS\WVEM\Logs\):
20061015.IDX
20061015.LDX
20061015.LOG
UNIX
To run a diagnostic trace for Event Management on UNIX, follow this procedure.
1. Perform the following from a UNIX shell:
export CA_CAIDEBUG=1                 # common debug option
unishutdown opr
script caiopr.out                    # save screen output to file caiopr.out
unistart opr
2. Run test to repeat the error, and then issue the following:
exit                                 # close caiopr.out
unset CA_CAIDEBUG                    # turn off tracing
unicycle opr
3. Send the following files to Technical Support:
caiopr.out (trace output)
Console logs for that day from directory $CAIGLBL0000/opr/logs:
3. Click the Client Preferences tab at the right.
4. Select the Job Workload Management tab at the bottom and set:
Full Trace to Y
Common Debug to Y
5. From the same Command Prompt as in Step 1, enter:
start unitrace
unicntrl start sch
7. Send the following file to Technical Support: unitrace.001
UNIX
To run a diagnostic trace for Job Management on UNIX, follow this procedure.
1. If you can reproduce the problem, gather the following trace information:
# unishutdown sche
# LEVEL2TRC=y;export LEVEL2TRC
# LEVEL2TRK=y;export LEVEL2TRK
# LEVEL2MTR=y;export LEVEL2MTR
# unistart sche
2. Get all files in the $CAISCHD0006 directory, unset the trace variables, and recycle Job Management.
The /D:MSSQLServer parameter only needs to be specified if MS-SQL Server is on the same machine. This creates a sevprop_SCM log under the \CA_APPSW directory and a sevpropcom_trace.log under the C: drive root directory.
As on Windows, trace the library and the daemon processes. The environment variable CAI_CCI_DEBUG enables tracing when set. The environment of a process is fixed when the process starts, so you need to recycle the process to set the environment variable in its process space.
Note: Carefully monitor how long the CAICCI trace runs, because it may generate a large amount of data and could fill up the file system if left running too long.
Library and Daemon Tracing
On UNIX, do the following to trace CAICCI (library and daemon tracing):
1. Perform the following from a UNIX shell:
# script /tmp/trace.cci.script
# unishutdown all
# CAI_CCI_DEBUG=y;export CAI_CCI_DEBUG
# rm $CAIGLBL0000/cci/logs/*
# unistart all (Or the problem application where applicable).
# exit
4. Send the following documentation to Technical Support:
/tmp/trace.cci.script
/tmp/conlog.cci.out
$CAIGLBL0000/cci/logs directory
Remote Daemon Tracing
On UNIX, do the following to trace the CAICCI remote daemon:
1. Perform the following from the UNIX shell:
# script /tmp/trace.cci.script
# ccinet debugon
4. Send the following files to Technical Support:
/tmp/trace.cci.script
/tmp/conlog.cci.out
ccirmtd.prf
$CAIGLBL0000/cci/logs directory
Application and Daemon Tracing
To trace CAICCI daemons and applications on UNIX, do the following:
1. Enter the following from a UNIX shell:
# unishutdown all
# CAI_CCI_DEBUG=y;export CAI_CCI_DEBUG
# rm $CAIGLBL0000/cci/logs/*    (make it easier to get current doc)
# unistart all                  (start just the problem app here)
4. To unset the trace variable and restart the product, do the following:
The trace files are located in the $CAIGLBL0000/cci/logs/ directory. They include separate files for the CAICCI daemon processes and files in the format ccistub_pid.log, which are trace files generated by the applications calling into the CAICCI library.
CAM
To start tracing for a local CAM (CA Message Queuing), enter the following:
camconfig trace=all
Dynamic Tracing
aws_sadmin
To activate dynamic tracing for aws_sadmin, do the following:
1. Create a batch file that includes the following (nodename identifies the DSM machine):
rem get the current log setting
awm_config -c 1 -s LOG_CONFIG LOG:AwNsm@nodename:Level GET LOG_RESULT ""
rem change it to 4
if "%1" == "" goto end
awm_config -c 1 -s LOG_CONFIG LOG:AwNsm@nodename:Level SET:%1 LOG_RESULT ""
:end
Note: The first executable statement displays the current debug level and the second one sets the debug level to the required value. Set this to 4. To turn on the debug option for aws_wvgate dynamically, change AwNsm@nodename to aws_wvgate (in other words, awm_config -c 1 -s LOG_CONFIG LOG:aws_wvgate:Level SET:%1).
2. Restart awservices. Do not start aws_dsm or aws_wvgate in debug mode because this may generate an extremely large debug file.
3. If aws_dsm is not performing any FSM event, turn dynamic tracing on by executing the following (batchfilename identifies the batch file you created in Step 1):
batchfilename 4
4. Let the file run for a few minutes to ensure that the debug data is logged in the \AT\SERVICES\VAR\LOG\aws_dsm.log file.
Note: You may need to increase the size of the log file. Be advised that GET LOG_RESULT may not always return a response and may cause the request to hang.
CAICCI
To perform dynamic tracing for CAICCI, use the ccir and ccis commands.
To send from Windows to UNIX:
1. Enter the following on the UNIX machine:
cd $CAIGLBL0000/cci/bin
./ccir
2. Press Enter. The machine waits for a response.
3. Enter the following on the Windows machine:
dos> ccis unixmachine 3
This should send 3 test messages to the UNIX machine.
To send from UNIX to Windows:
1. Enter the following on the Windows machine:
dos> ccir
2. Press Enter. The machine waits for a response.
3. Enter the following on the UNIX machine:
cd $CAIGLBL0000/cci/bin
./ccis ntmachine 3
Circular Trace
To run the circular unitrace:
1. Set the environment variable UNITRACE_CIRC_SIZE to a number of bytes. Use a large enough number so that the data will not be overwritten if the trace keeps running after the problem occurs. For example, set UNITRACE_CIRC_SIZE=15000000 (~15MB). The default size is 1MB.
Note: Five files are created, so you need five times the amount of space specified by UNITRACE_CIRC_SIZE.
2. Start the tracing by running the following command:
start unitrace -c
This starts unitrace in circular mode. Five files are created: unitrace.001 through unitrace.005. Although each file is the size specified by the UNITRACE_CIRC_SIZE value, only unitrace.002 through unitrace.005 are overwritten, so that startup information is not lost.
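Before starting a circular trace, it is worth confirming that the target drive has room for all five files. The following Python sketch works through that arithmetic; the function names are hypothetical and the script is not part of CA NSM.

```python
TRACE_FILE_COUNT = 5  # unitrace.001 through unitrace.005


def circular_trace_space(circ_size_bytes):
    """Return the total bytes needed for a circular unitrace."""
    if circ_size_bytes <= 0:
        raise ValueError("UNITRACE_CIRC_SIZE must be a positive byte count")
    return TRACE_FILE_COUNT * circ_size_bytes


def trace_file_names():
    """Return the five trace file names in the order they are created."""
    return ["unitrace.{0:03d}".format(i) for i in range(1, TRACE_FILE_COUNT + 1)]


if __name__ == "__main__":
    size = 15000000  # the example value from the text
    print(circular_trace_space(size), "bytes needed for", trace_file_names())
```

With the example value of 15000000 bytes, the five files need 75000000 bytes in total, so allow roughly 75MB of free space.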