Troubleshoot CUCM Database Replication Issues
Contents
Introduction
Steps to Diagnose the Database Replication
Step 1. Verify database replication is broken
Step 2. Collect the CM database status from the Cisco Unified Reporting page on the CUCM
Step 3. Review the Unified CM Database Report for any component flagged as an error
Step 4. Check the individual components with the utils diagnose test command
Step 5. Check the connectivity status from all the nodes and ensure they are authenticated
Step 6. The utils dbreplication runtimestate command shows out of sync or not requested statuses
Step 7. Repair all/selective tables for database replication
Step 8. Reset the database replication from scratch
Introduction
This document describes how to diagnose database replication issues and provides the steps
necessary to troubleshoot and resolve those issues.
Steps to Diagnose the Database Replication
Step 1. Verify database replication is broken
To determine whether your database replication is broken, you must know the various replication states that the Real Time Monitoring Tool (RTMT) reports.
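To check the database replication status, run the utils dbreplication runtimestate command from the CLI of the publisher node. In the output, ensure that the Cluster Replication State does not contain old sync information. Use the timestamp to confirm this.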
If the broadcast sync has not been updated recently, run the utils dbreplication status command to check all the tables and the replication. The output lists any errors or mismatches that are discovered, and the RTMT state changes accordingly, as shown in this image.
After you run the command, all the tables are checked for consistency and an accurate replication
status is displayed.
Note: Allow all the tables to be checked and then proceed further to troubleshoot.
Once an accurate replication status is displayed, check the Replication Setup (RTMT) value and details, as shown in the first output. Check the status of every node. If any node has a state other than 2, continue to troubleshoot.
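The state check can be expressed as a small lookup. This is a minimal sketch. The state descriptions follow the commonly documented RTMT replication state values, and you should verify them against the RTMT documentation for your release. The node names and observed states are hypothetical, and you would read the real values from RTMT or the runtimestate output.

```python
# RTMT Replication Setup values (verify against your release's RTMT docs).
# Per this step, 2 is the only state that needs no further troubleshooting.
REPLICATION_STATES = {
    0: "Initializing: replication setup has not completed",
    1: "Number of replicates is not correct",
    2: "Replication is good",
    3: "Replication is bad in the cluster",
    4: "Replication setup did not succeed",
}


def nodes_needing_troubleshooting(node_states: dict[str, int]) -> dict[str, str]:
    """Return {node: description} for every node whose state is not 2."""
    return {
        node: REPLICATION_STATES.get(state, f"Unknown state {state}")
        for node, state in node_states.items()
        if state != 2
    }


if __name__ == "__main__":
    # Hypothetical node names and states, for illustration only.
    observed = {"cucm-pub": 2, "cucm-sub1": 2, "cucm-sub2": 3}
    for node, description in nodes_needing_troubleshooting(observed).items():
        print(f"{node}: {description}")
```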
Step 2. Collect the CM database status from the Cisco Unified Reporting page on the CUCM
1. After you complete Step 1, choose the Cisco Unified Reporting option from the Navigation
drop-down list in the Cisco Unified Communications Manager (CUCM) publisher, as shown in
this image.
2. Navigate to System Reports and click Unified CM Database Status as shown in this image.
3. Generate a new report with the Generate New Report option, or click the Generate New Report icon, as shown in this image.
4. Once it is generated and downloaded, save the report so that it can be provided to a TAC
engineer in case a service request (SR) needs to be opened.
Step 3. Review the Unified CM Database Report for any component flagged as an error
Any component that has an error is flagged with a red X icon, as shown in this image.
● In case of an error, check the network connectivity between the nodes. From the CLI of the node, use the utils service list command to verify whether the A Cisco DB service is running.
● If the A Cisco DB service is down, run the utils service start A Cisco DB command to start
the service. If this fails, contact Cisco TAC.
● Ensure that the Replication Server List (cdr list serv) is populated for all the nodes.
This image illustrates an ideal output.
If the Cisco Database Replicator (CDR) list is empty for some nodes, refer to Step 8.
● Ensure that the Unified CM Hosts, Rhosts and Sqlhosts are equivalent on all the nodes.
This is an important step. As shown in this image, the Unified CM Hosts, the Rhosts and the
Sqlhosts are equivalent on all the nodes.
Refer to this link for the procedure to change the IP address and hostname for the CUCM.
Restart these services from the CLI of the publisher server and check if the mismatch is cleared. If
yes, go to Step 8. If no, contact Cisco TAC. Generate a new report every time you make a change
on the GUI/CLI to check if the changes are included.
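If you have copied the text of each node's Hosts, Rhosts and Sqlhosts entries (for example, from the Unified CM Database Status report), you can compare them against the publisher offline. This is a minimal sketch under that assumption. How you obtain the file text is outside the sketch, and the node names and file labels are hypothetical. The comparison ignores blank lines, "#" comments, whitespace differences and entry order, which is a design choice. Make it stricter if order matters in your case.

```python
def normalize(text: str) -> list[str]:
    """Drop blank lines and '#' comments, collapse whitespace, and sort entries."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            entries.append(" ".join(line.split()))
    return sorted(entries)


def find_mismatched_nodes(
    files_by_node: dict[str, dict[str, str]], reference: str
) -> dict[str, list[str]]:
    """Compare each node's files with the reference node (normally the publisher).

    files_by_node maps node name -> {file label -> file text}. Returns
    {node: [labels of files that differ]} for nodes with any difference.
    """
    ref_files = files_by_node[reference]
    mismatches: dict[str, list[str]] = {}
    for node, files in files_by_node.items():
        if node == reference:
            continue
        # A file present on only one side counts as a mismatch.
        labels = sorted(set(ref_files) | set(files))
        differing = [
            label
            for label in labels
            if normalize(files.get(label, "")) != normalize(ref_files.get(label, ""))
        ]
        if differing:
            mismatches[node] = differing
    return mismatches


if __name__ == "__main__":
    # Hypothetical node names and file contents, for illustration only.
    sample = {
        "cucm-pub": {"hosts": "10.0.0.1 cucm-pub\n10.0.0.2 cucm-sub1"},
        "cucm-sub1": {"hosts": "10.0.0.1 cucm-pub"},
    }
    print(find_mismatched_nodes(sample, reference="cucm-pub"))  # {'cucm-sub1': ['hosts']}
```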
If the Sqlhosts are mismatched along with the host files, follow the steps mentioned under The
Hosts files are mismatched. If only the Sqlhosts files are mismatched, run the command from
the CLI:
● Ensure that the Database Layer Remote Procedure Call (DBL RPC) hello is successful, as shown in this image.
● Ensure network connectivity between the affected node and the publisher.
● Ensure that the port number 1515 is allowed on the network.
Refer to this link for details on TCP/UDP port usage:
● Ensure that the network connectivity is successful between the nodes, as shown in this image:
If the network connectivity fails for the nodes:
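To check from a workstation whether TCP port 1515 on a node accepts connections, you can time a TCP handshake. This is a minimal sketch, and the hostnames in the example are hypothetical. It only tests the path from wherever the script runs, not node to node, so the CUCM CLI checks in this step remain the authoritative ones. Because a TCP handshake takes about one round trip, the elapsed time also gives a rough RTT indication for the 80 ms WAN guideline in Step 5. Pass an IP address so that a DNS lookup does not inflate the number.

```python
import socket
import time
from typing import Optional

DBL_RPC_PORT = 1515  # port this step says must be allowed on the network


def tcp_connect_ms(host: str, port: int = DBL_RPC_PORT, timeout: float = 3.0) -> Optional[float]:
    """Time a TCP handshake to host:port.

    Returns the elapsed milliseconds, or None if the connection is refused,
    times out, or the name cannot be resolved. When host is a name, the time
    includes the DNS lookup.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # covers refused connections, timeouts and resolution errors
        return None
    return (time.perf_counter() - start) * 1000.0


# Example (hypothetical hostname):
# elapsed = tcp_connect_ms("cucm-sub1.example.com")
# print("blocked or unreachable" if elapsed is None else f"{elapsed:.1f} ms")
```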
Step 4. Check the individual components with the utils diagnose test command
The utils diagnose test command checks all the components and returns a passed/failed value.
The components that are essential for the proper functioning of the database replication are:
● Network Connectivity:
The validate_network command checks all aspects of network connectivity with all the nodes in the cluster. If there is a connectivity issue, an error is often reported for the Domain Name System/Reverse DNS (DNS/RDNS) lookups. The validate_network command completes in 300 seconds. These are common error messages seen in the network connectivity tests:
● Cause
This error is caused when one or more nodes in the cluster have a network connectivity problem.
Ensure that all the nodes have ping reachability.
● Effect
If the intra-cluster communication is broken, database replication issues occur.
● Cause
This error is caused when the reverse DNS lookup fails on a node. You can verify whether DNS is configured and functions properly with these commands:
utils network eth0 all - Shows the DNS configuration (if present)
utils network host <ip address/Hostname> - Checks for resolution of ip address/Hostname
● Effect
If DNS does not function correctly, it can cause database replication issues when the servers are defined by hostname.
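The forward and reverse lookups that these commands verify can also be reproduced off-box. This minimal sketch uses the Python standard resolver, which reflects the DNS that the machine running it sees and not necessarily the DNS that the CUCM nodes see. The resolver functions are parameters so that you can substitute your own. A name counts as consistent if the reverse lookup returns the same fully qualified or short name, which is an assumption you may want to tighten.

```python
import socket
from typing import Callable


def dns_consistent(
    hostname: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
) -> tuple[bool, str]:
    """Resolve hostname -> IP, then IP -> name, and check that the names agree."""
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup failed: {exc}"
    try:
        name, aliases, _ = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup for {ip} failed: {exc}"
    # Case-insensitive match on the full name, any alias, or the short host name.
    candidates = {n.lower().rstrip(".") for n in [name, *aliases]}
    wanted = hostname.lower().rstrip(".")
    short = wanted.split(".")[0]
    if wanted in candidates or short in {c.split(".")[0] for c in candidates}:
        return True, f"{hostname} -> {ip} -> {name}"
    return False, f"{hostname} -> {ip} resolves back to {name}"


# Example (hypothetical hostname):
# ok, detail = dns_consistent("cucm-sub1.example.com")
```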
It is extremely important for NTP to be fully functional in order to avoid database replication issues.
The NTP stratum (the number of hops to the parent reference clock) must be less than 5. Otherwise, it is deemed unreliable.
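To see the stratum that an NTP server advertises, you can send a single SNTP client request and read the stratum byte of the reply. This minimal sketch follows the packet layout in RFC 4330/RFC 5905 and uses IPv4 UDP port 123. The server name in the example is hypothetical. The check only shows what the NTP server reports to the machine running the script, so the CUCM CLI remains the authoritative view of each node's NTP state.

```python
import socket

NTP_PORT = 123
MAX_RELIABLE_STRATUM = 4  # this document: stratum must be less than 5


def parse_stratum(packet: bytes) -> int:
    """Return the stratum field (second byte) of an NTP packet."""
    if len(packet) < 48:
        raise ValueError("NTP packet too short")
    return packet[1]


def query_stratum(server: str, timeout: float = 3.0) -> int:
    """Send one SNTP client request and return the server's stratum."""
    # First byte 0x1B: leap indicator 0, version 3, mode 3 (client); rest zero.
    request = b"\x1b" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, NTP_PORT))
        response, _ = sock.recvfrom(512)
    return parse_stratum(response)


def stratum_is_reliable(stratum: int) -> bool:
    # RFC 5905 treats stratum 0 in a reply as unspecified/invalid (kiss-o'-death).
    return 1 <= stratum <= MAX_RELIABLE_STRATUM


# Example (hypothetical server name):
# s = query_stratum("ntp.example.com"); print(s, stratum_is_reliable(s))
```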
1. Use the utils diagnose test command to check the output, as shown in this image.
Step 5. Check the connectivity status from all the nodes and ensure they are authenticated
1. After you complete Step 4, if no issues are reported, run the utils network connectivity command on all the nodes to check that the connectivity to the database is successful, as shown in this image.
2. If you receive the error message Cannot send TCP/UDP packets, check your network for retransmissions or blocked TCP/UDP ports. The show network cluster command checks the authentication status of all nodes.
3. If the status of a node is unauthenticated, ensure that there is network connectivity and that the security password is the same on all the nodes, as shown in this image.
Database replication is a network-intensive task, because it pushes the actual tables to all the nodes in the cluster. Ensure that:
● The nodes are in the same Data Center/Site: all the nodes are reachable with a low Round Trip Time (RTT). If the RTT is unusually high, check the network performance.
● The nodes are scattered over the Wide Area Network (WAN): ensure that the RTT between the nodes is well under 80 ms. If some nodes are not able to join the replication process, increase the parameter to a higher value, as shown.
Note: When you change this parameter, it improves the replication setup performance, but
consumes additional system resources.
● The replication timeout is based on the number of nodes in the cluster: the replication timeout (default: 300 seconds) is the time that the publisher waits for all the subscribers to send their defined messages. Calculate the replication timeout from the number of nodes in the cluster:
Servers 1-5: 1 minute per server
Servers 6-10: 2 minutes per server
Servers >10: 3 minutes per server
Example: with 12 servers in the cluster, servers 1-5 × 1 min = 5 min, servers 6-10 × 2 min = 10 min, and servers 11-12 × 3 min = 6 min, for a total of 21 minutes. Set repltimeout to 21 minutes.
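The tiered calculation can be written as a short function, which makes it easy to recompute when nodes are added. This is a minimal sketch that applies only the per-server tiers in this step. It does not impose the 300-second default as a floor or cap.

```python
def repltimeout_minutes(server_count: int) -> int:
    """Replication timeout in minutes from the per-server tiers:
    servers 1-5 add 1 min each, servers 6-10 add 2 min each,
    and every server beyond 10 adds 3 min."""
    if server_count < 1:
        raise ValueError("a cluster has at least one server")
    tier1 = min(server_count, 5)             # servers 1-5
    tier2 = min(max(server_count - 5, 0), 5)  # servers 6-10
    tier3 = max(server_count - 10, 0)         # servers 11 and up
    return tier1 * 1 + tier2 * 2 + tier3 * 3


if __name__ == "__main__":
    minutes = repltimeout_minutes(12)
    print(f"12 servers -> {minutes} minutes ({minutes * 60} seconds)")  # 21 minutes (1260 seconds)
```

For clusters of five or fewer servers, the tiers give 5 minutes or less, which is at or below the 300-second default.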
Commands to check/set the replication timeout:
Step 6. The utils dbreplication runtimestate command shows out of sync or not requested statuses
Checklist:
● All the nodes have connectivity to each other. Refer to Step 5.
● If the cluster has more than 8 nodes, consult Cisco TAC before you proceed with Steps 7 and 8.
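The checklist acts as a gate before the repair and reset steps. This minimal sketch encodes it as a pre-flight function. The parameter names are hypothetical, and you determine the inputs yourself from Step 5 and your cluster inventory.

```python
def preflight_issues(node_count: int, all_nodes_connected: bool) -> list[str]:
    """Return the checklist items that block Steps 7 and 8 (empty list = proceed)."""
    issues = []
    # Checklist item 1: every node must have connectivity to every other node (Step 5).
    if not all_nodes_connected:
        issues.append("Restore connectivity between all nodes first (see Step 5).")
    # Checklist item 2: clusters with more than 8 nodes need Cisco TAC first.
    if node_count > 8:
        issues.append("More than 8 nodes: consult Cisco TAC before Steps 7 and 8.")
    return issues


if __name__ == "__main__":
    # Hypothetical inputs: a 10-node cluster with verified connectivity.
    for issue in preflight_issues(node_count=10, all_nodes_connected=True):
        print(issue)
```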
Step 7. Repair all/selective tables for database replication
If the utils dbreplication runtimestate command shows errors or mismatched tables, run the command:
Step 8. Reset the database replication from scratch
Refer to the sequence to reset the database replication and start the process from scratch.
Refer to the sequence to reset the database replication for a particular node:
● Run the utils create report database command from the CLI, then download the resulting .tar file with an SFTP server.