NBU TS Net 1
10.0.0.32:748 -> 10.0.0.59:13782
10.0.0.32:983 <- 10.0.0.59:635
A connection from the server fred to the client wilma is attempted by using the vnetd port number if
possible:
# bptestbpcd -M fred -client wilma -connect_options 2 2 0 1 1 1
10.0.0.59:40983 -> 10.0.0.104:13724
10.0.0.59:40984 -> 10.0.0.104:13724
10.0.0.59:40985 -> 10.0.0.104:13724
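The trace lines above can be read mechanically. The Python sketch below classifies each line by the daemon it reached and by whether the source port is reserved (below 1024). It assumes the default ports (PBX 1556, vnetd 13724, bpcd 13782). It also assumes that the left-hand address is the local server and that the arrow shows which side opened the connection. Neither assumption comes from this note.

```python
import re

# Assumed default NetBackup daemon ports; adjust if your site differs.
DAEMON_PORTS = {1556: "pbx", 13724: "vnetd", 13782: "bpcd"}

# "local_ip:port -> remote_ip:port" (outbound) or "local_ip:port <- remote_ip:port" (call-back).
LINE_RE = re.compile(
    r"^(?P<lip>[\d.]+):(?P<lport>\d+)\s*(?P<arrow>->|<-)\s*(?P<rip>[\d.]+):(?P<rport>\d+)$"
)


def classify(line):
    """Describe one bptestbpcd connection line (the interpretation is an assumption)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    lport, rport = int(m["lport"]), int(m["rport"])
    if m["arrow"] == "->":
        # The local host opened the connection: lport is its source port,
        # rport is the service it connected to.
        return {"direction": "outbound",
                "daemon": DAEMON_PORTS.get(rport, "unknown"),
                "reserved_source_port": lport < 1024}
    # "<-": the remote host connected back to a port the local host listens on.
    return {"direction": "call-back",
            "daemon": None,
            "reserved_source_port": rport < 1024}


if __name__ == "__main__":
    for trace in ("10.0.0.32:748 -> 10.0.0.59:13782",
                  "10.0.0.32:983 <- 10.0.0.59:635",
                  "10.0.0.59:40983 -> 10.0.0.104:13724"):
        print(trace, classify(trace))
```

Under these assumptions, the first example reached bpcd directly from a reserved port and then received a call-back. The wilma example reached vnetd from non-reserved ports.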
Will this command make connections to the client on the vnetd port instead of PBX?
And from then on, will backups use the vnetd port to connect to the client?
The connect options only apply to the connection bptestbpcd is about to perform.
If you want the connect options to be permanent, you have to place them in bp.conf on the master
and media servers.
The first setting indicates the type of port to use to connect to bpcd on the client:
0 = Use a reserved port number.
The second setting indicates the bpcd call-back method to use to connect to the client:
0 = Use the traditional call-back method
The third setting indicates the connection method to use to connect to the client:
2 = Connect to a daemon on the server by using the traditional port number of the daemon only.
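To make the setting permanent, the three values can be rendered as a bp.conf line with the meaning of each one spelled out. The Python helpers below are hypothetical and decode only the three values described in this note (0, 0, 2). Any other value is flagged as not described here rather than guessed. The layout `CONNECT_OPTIONS = host a b c` is my understanding of the bp.conf syntax; check it against the NetBackup documentation for your version.

```python
# Only the values described in this note are decoded; anything else is
# flagged instead of guessed.
SETTINGS = (
    ("port type used to reach bpcd", {0: "reserved port number"}),
    ("bpcd call-back method", {0: "traditional call-back method"}),
    ("daemon connection method", {2: "traditional daemon port number only"}),
)


def describe(options):
    """Return one human-readable line per connect-option setting."""
    if len(options) != len(SETTINGS):
        raise ValueError(f"expected {len(SETTINGS)} settings, got {len(options)}")
    return [f"{name}: {meanings.get(value, 'not described in this note')}"
            for (name, meanings), value in zip(SETTINGS, options)]


def bp_conf_line(host, options):
    """Render a CONNECT_OPTIONS entry (assumed syntax: host followed by the settings)."""
    describe(options)  # validates the number of settings
    return "CONNECT_OPTIONS = " + " ".join([host, *(str(v) for v in options)])


if __name__ == "__main__":
    print(bp_conf_line("wilma", (0, 0, 2)))
    print("\n".join(describe((0, 0, 2))))
```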
Description
The information below is accurate for the specific version of NetBackup that is targeted. The details
relevant to each NetBackup version can be found in this article:
000017676
Best Practices for bptestnetconn including arguments and outputs by NetBackup version
The following command can test a connection from one NetBackup host (a master server, for example)
to another NetBackup host (such as a media server), and to a service that should be running on that
host. Here is an example command:
$ bptestnetconn -v -cnbrmms/DiskPollingService.DPS -t 10 -o 5 -H mymm
If there is a perceived problem with the master server polling disk pools on a media server, a test to
the nbrmms service can be done. This example shows SUCCESS, so the remote host is reachable and
the service is running.
$ bptestnetconn -v -cnbrmms/DiskPollingService.DPS -t 10 -o 5 -H mymm
adding hostname = mymm
------------------------------------------------------------------------
Connecting to 'nbrmms/DiskPollingService.DPS'
CN: mymm : 80 ms [SUCCESS] PBX: Yes VNETD: Yes BPCD: Yes
------------------------------------------------------------------------
Total elapsed time: 0 sec
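When you script the same test against many hosts, it helps to split each `CN:` result line into fields. The minimal Python sketch below assumes the layout shown above: the host, the elapsed time in ms or sec, the status in brackets, then the PBX/VNETD/BPCD Yes/No flags. Other NetBackup versions may print the line differently.

```python
import re

# Layout assumed from the sample output above.
CN_RE = re.compile(
    r"CN:\s*(?P<host>\S+)\s*:\s*(?P<value>\d+)\s*(?P<unit>ms|sec)\s*"
    r"\[(?P<status>[A-Z]+)\]\s*"
    r"PBX:\s*(?P<pbx>Yes|No)\s*VNETD:\s*(?P<vnetd>Yes|No)\s*BPCD:\s*(?P<bpcd>Yes|No)"
)


def parse_cn(line):
    """Return the fields of one bptestnetconn "CN:" line, or None if it is not one."""
    m = CN_RE.search(line)
    if m is None:
        return None
    scale = 1000 if m["unit"] == "sec" else 1
    return {
        "host": m["host"],
        "elapsed_ms": int(m["value"]) * scale,
        "status": m["status"],
        "pbx": m["pbx"] == "Yes",
        "vnetd": m["vnetd"] == "Yes",
        "bpcd": m["bpcd"] == "Yes",
    }


if __name__ == "__main__":
    print(parse_cn("CN: mymm : 80 ms [SUCCESS] PBX: Yes VNETD: Yes BPCD: Yes"))
```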
The connection test makes a connection to PBX, which is then passed to the nbrmms service. Here is
what the behavior should look like in the logs when a successful connection is made:
Because the connection to PBX is passed to the nbrmms service, the nbrmms debug log will show the
connection arriving from the IP address of the master server.
12/07/2011 07:58:08.027 [Debug] NB 51216 libraries 137 PID:28355 TID:1123518784 File ID:222
[No context] 1 [vnet_cached_getaddrinfo_and_update] ../../libvlibs/vnet_addrinfo.c.1370:
1123518784: found in cache name: 10.10.6.101
12/07/2011 07:58:08.027 [Debug] NB 51216 libraries 137 PID:28355 TID:1123518784 File ID:222
[No context] 1 [vnet_cached_getaddrinfo_and_update] ../../libvlibs/vnet_addrinfo.c.1371:
1123518784: found in cache service: NULL
12/07/2011 07:58:08.027 [Debug] NB 51216 libraries 137 PID:28355 TID:1123518784 File ID:222
[No context] 1 [vnet_cached_getaddrinfo_and_update] ../../libvlibs/vnet_addrinfo.c.1514:
1123518784: found in file cache name: masterserver.name.local
12/07/2011 07:58:08.027 [Debug] NB 51216 libraries 137 PID:28355 TID:1123518784 File ID:222
[No context] 1 [vnet_cached_getaddrinfo_and_update] ../../libvlibs/vnet_addrinfo.c.1515:
1123518784: found in file cache service: NULL
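To find the connecting peer in records like these without reading every line, a small Python filter can collect the names that follow "found in cache name:" or "found in file cache name:". This sketch is based only on the four records above, and the message text may differ in other versions.

```python
import re

# Matches "found in cache name: X" and "found in file cache name: X".
NAME_RE = re.compile(r"found in (?:file )?cache name: (\S+)")


def peer_names(log_text):
    """Return the resolved peer names/addresses in order of first appearance."""
    names = []
    for match in NAME_RE.finditer(log_text):
        if match.group(1) not in names:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    sample = """\
1123518784: found in cache name: 10.10.6.101
1123518784: found in cache service: NULL
1123518784: found in file cache name: masterserver.name.local
1123518784: found in file cache service: NULL
"""
    print(peer_names(sample))
```

In this sample, the address and the name both appear to belong to the master server, which is what the text above says to expect.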
The verbose (-v) argument to bptestnetconn also causes a connection to the bpcd process on the media
server. This is a single connection to prove that bpcd is reachable; it does not attempt to bring up a
call-back connection. The bpcd debug log therefore shows the initial connection and then a failure when
bptestnetconn closes the initial socket without providing the information for the call-back connection.
Similarly, the verbose (-v) argument causes a connection to the vnetd service on the media server.
Once the connection is complete, bptestnetconn closes it without negotiating the vnetd protocol or
sending a vnetd command, as shown in the vnetd debug log on the media server.
The bptestnetconn program can be used to test connectivity to any valid service/object that uses CORBA.
See the Related Articles for a list of some of the services and objects.
All configured NetBackup servers can be tested at one time by replacing the host (-H host) argument
with the server (-s) argument.
$ bptestnetconn -v -c -o 5 -t 10 -s
SERVER = mymaster
SERVER = myadmin
SERVER = myoldmm
MEDIA_SERVER = mymm
------------------------------------------------------------------------
Connecting to 'nbsl/HSFactory'
CN: mymaster : 11 ms [SUCCESS] PBX: Yes VNETD: Yes BPCD: Yes
CN: myadmin : 4 sec [TRANSIENT] PBX: Yes VNETD: Yes BPCD: Yes
CN: myoldmm : 4 sec [TRANSIENT] PBX: No VNETD: No BPCD: No
CN: mymm : 12 ms [SUCCESS] PBX: Yes VNETD: Yes BPCD: Yes
------------------------------------------------------------------------
Total elapsed time: 17 sec
In the above example, the orbtimeout (-t 10) and orbobjtimeout (-o 5) arguments were specified.
These limit the time permitted for the connection to the PBX and the service/object respectively.
When a connection fails, the elapsed time indicates which portion of the connection failed.
In the above test we see two hosts with SUCCESS and two hosts with TRANSIENT. The myadmin host
shows TRANSIENT because it is a Windows Administration Console host and does not run the nbsl
service. The myoldmm host is on the network, but both PBX and NetBackup are shut down; ideally this
host should be removed from the NetBackup configuration. The 4-second timeout confirms that a
network route exists, but that the service and/or object was not available.
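The reasoning in the last paragraph can be sketched as a small triage over the full multi-host output. This Python sketch encodes only the cases discussed here: SUCCESS, a non-SUCCESS result with PBX reachable, and a non-SUCCESS result with PBX not reachable. It leaves timeout interpretation to the reader, and the wording of each finding is my own.

```python
import re

# Host, status in brackets, and the PBX flag from each "CN:" line.
CN_RE = re.compile(r"CN:\s*(\S+)\s*:.*?\[([A-Z]+)\]\s*PBX:\s*(Yes|No)")


def triage(output):
    """Map each tested host to a rough finding based on status and PBX reachability."""
    findings = {}
    for host, status, pbx in CN_RE.findall(output):
        if status == "SUCCESS":
            findings[host] = "reachable, service running"
        elif pbx == "Yes":
            findings[host] = "PBX reachable, but the service/object is not available"
        else:
            findings[host] = "PBX not reachable: NetBackup may be down on this host"
    return findings


if __name__ == "__main__":
    sample = """\
CN: mymaster : 11 ms [SUCCESS] PBX: Yes VNETD: Yes BPCD: Yes
CN: myadmin : 4 sec [TRANSIENT] PBX: Yes VNETD: Yes BPCD: Yes
CN: myoldmm : 4 sec [TRANSIENT] PBX: No VNETD: No BPCD: No
CN: mymm : 12 ms [SUCCESS] PBX: Yes VNETD: Yes BPCD: Yes
"""
    for host, finding in triage(sample).items():
        print(f"{host}: {finding}")
```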
The customer takes a checkpoint every 7 minutes. One of his clients, bkoweb26, began to back up
successfully after an exclusion list was created. We also noticed an infinite timeout in bpbrm (see snippet).
We have made the suggestions and recommendations below; some were implemented, but not all.
Environment info:
Master: nmbackup01, configured on a third-party server, NBU version 8.1.1, Platform: AIX ver. 7.1
Media: nmbpmed05, configured on a third-party server, NBU version 8.1.1, Platform: AIX ver. 7.1
Clients: nmocmi02, bkoweb26 and bkoweb25
Recommendations:
1) Increase the checkpoint interval to every 60 min (not done).
2) Decrease the timeouts to 7200 seconds instead of infinite (not done). I'm thinking these are
CLIENT_READ_TIMEOUT or CLIENT_CONNECT_TIMEOUT = 7200 in bp.conf.
3) Since exclusions helped one client succeed, I wondered if the test in this technote could apply to
this situation: https://www.veritas.com/support/en_US/article.100003560
4) Check for communication-related patches (tried, but did not find any online).
5) Run bppllist and bpplinfo on the media server. Run bpgetconfig from a failing and a successful
client, to compare (not done).
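For recommendation 2, a quick way to spot the timeout that differs from the others is to list every *_TIMEOUT entry in bp.conf that is not set to the common value. The Python sketch below makes three assumptions: entries are simple KEY = VALUE lines, lines starting with # can be ignored, and 7200 is the baseline, as mentioned in this thread. The sample text is hypothetical.

```python
def timeout_outliers(bp_conf_text, expected="7200"):
    """Return {KEY: value} for *_TIMEOUT entries whose value differs from `expected`."""
    outliers = {}
    for raw in bp_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and lines that are not KEY = VALUE
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.endswith("_TIMEOUT") and value != expected:
            outliers[key] = value
    return outliers


if __name__ == "__main__":
    # Hypothetical bp.conf excerpt, for illustration only.
    sample = """\
SERVER = nmbackup01
CLIENT_CONNECT_TIMEOUT = 7200
CLIENT_READ_TIMEOUT = 0
"""
    print(timeout_outliers(sample))
```

On UNIX/Linux hosts, bp.conf normally lives in /usr/openv/netbackup/. The infinite db_getdata timeout seen in the bpbrm log may not come from bp.conf at all. This check only narrows down what is configured there.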
Use vxlogview -p 50936 -o 103, or check /var/adm/syslog for the PBX logs written by the O/S.
It would also help to check the bprd logs on the master. You'll see the connection attempts to the
master, and you may see the cause of the network disruption.
Status 40 (network connection broken) is generally caused by something terminating the connection,
such as a firewall or another process on the system, as opposed to something timing out.
We need to see ALL text in job details to show timestamps and PIDs
Can you please point out which logs exactly contain the specific job, timestamps and PID reflected
here:
5:53:15 AM - Info bpbrm (pid=7471230)
It is important to look at one set of logs when you troubleshoot: see exactly which client and which
media server are involved and what the timestamps and PIDs are, then follow the process flow in the
relevant logs. For example, it does not help to compare this Activity Monitor entry with a bpbrm log
snippet that happened at a different time and with different PIDs. Are these even from the same media server?
Mar 28, 2020 5:53:14 AM - Error bpbrm (pid=10289380) db_FLISTsend failed: network connection broken (40)
Mar 28, 2020 5:53:15 AM - Info bpbrm (pid=7471230) sending message to media manager: STOP BACKUP bkoweb26_1585391368
00:07:52.377 [18415728.1] <2> db_getdata: timeout is 0 (infinite)
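The PID point above can be made mechanical: group the pasted lines by PID to see whether they belong to the same process at all. This Python sketch assumes the two formats shown here: "(pid=N)" in Activity Monitor job details, and "[PID.thread]" after the timestamp at the start of legacy debug log lines.

```python
import re

# "(pid=N)" in job details; "hh:mm:ss.mmm [PID.thread]" in legacy debug logs (assumed).
PID_PATTERNS = (
    re.compile(r"\(pid=(\d+)\)"),
    re.compile(r"^\S+ \[(\d+)\.\d+\]"),
)


def group_by_pid(lines):
    """Return {pid: [lines]} for every line in which a PID can be found."""
    groups = {}
    for line in lines:
        for pattern in PID_PATTERNS:
            m = pattern.search(line)
            if m:
                groups.setdefault(m.group(1), []).append(line)
                break
    return groups


if __name__ == "__main__":
    pasted = [
        "Mar 28, 2020 5:53:14 AM - Error bpbrm (pid=10289380) db_FLISTsend failed: network connection broken (40)",
        "Mar 28, 2020 5:53:15 AM - Info bpbrm (pid=7471230) sending message to media manager: STOP BACKUP bkoweb26_1585391368",
        "00:07:52.377 [18415728.1] <2> db_getdata: timeout is 0 (infinite)",
    ]
    for pid, entries in group_by_pid(pasted).items():
        print(pid, len(entries))
```

These three lines yield three different PIDs. Each one has to be followed in its own log before the lines can be compared.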
Level 3 bpbrm and bptm logs on the media server will confirm whether data was received from the client in a
continuous stream when the network failure occurred.
Always best to involve network and firewall team in situations like these - they need to monitor the
port connections while the backup is running.
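While the network team is engaged, a simple timestamped TCP reachability check from the client to the media server can show whether the NetBackup ports are open at a given moment. This Python sketch assumes the default ports (PBX 1556, vnetd 13724, bpcd 13782). It only tests that a connection can be opened. It will not catch a firewall dropping an established session mid-backup; that is what port monitoring during the backup is for.

```python
import socket
import sys
from datetime import datetime

# Assumed default NetBackup ports; adjust if your environment differs.
NBU_PORTS = {"pbx": 1556, "vnetd": 13724, "bpcd": 13782}


def probe(host, ports=NBU_PORTS, timeout=5.0):
    """Attempt a plain TCP connect to each port; return {name: (timestamp, result)}."""
    results = {}
    for name, port in ports.items():
        stamp = datetime.now().isoformat(timespec="seconds")
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = (stamp, "open")
        except OSError as exc:  # refused, unreachable, or timed out
            results[name] = (stamp, f"failed: {exc}")
    return results


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for service, (when, outcome) in probe(target).items():
        print(f"{when} {target} {service}: {outcome}")
```

Run it repeatedly, for example from cron, while a full backup is running. The timestamped results can then be lined up against the failure time in the job details.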
The issue you posted closely resembles the issue at hand because (I failed to mention) only FULL
backups are failing with this exit status. INCs are successfully backing up.
And herein lies the frustration: I kept getting half the conversation no matter how many times I
re-requested logs. So the job detail I posted was the original job log. I can post the whole log, but it
seems moot since the bkoweb26 client is now backing up successfully.
"Which timeouts exactly are configured as 0 (infinite)?"
I have no idea, nor does my senior engineer. I only suggested decreasing it to 7200 because all the other
timeouts on his server appear to be set to that value, apart from the infinite one, which I am guessing
was set on purpose.
Do you think checkpoints are an issue? If not, I will stop harping on him about that.
Could you point me to where you could deduce that? That is helpful to know.
The attached logs are the most recent bpbrm and bptm logs at verbose level 3. I'd really love to
involve their networking team, but we received pushback until we have concrete evidence that it's
network or FW related.
I've been in that situation before; it is quite frustrating to constantly carry the burden of proof just to
receive logs.
Can you post the Detailed Status from the failing client?
Would also help to see the output from bptestbpcd -client <client name> -verbose -debug
This will give you a detailed output of the connection attempt through BPCD to the client.
What kind of O/S is the client? If it's Windows, look at Windows Defender/Firewall. If it's Linux, take
a look at the iptables.
Is this a recent occurrence, or has this client always had issues? Lots of background info is needed, but
based on what you've said, you should focus your troubleshooting on the client.
I have requested a bptestbpcd, bpgetconfig -e, vxlogview and bppllist -allpolicies from the failing
client and master. It turned out @Marianne was right, we've been looking at the wrong media server,
which we realized when we did not see the client listed in the media's policies. So, I mustered up the
courage to ask for the full gamut of logs (again), by hostname and not by item this time. Stay tuned
for that info...
The media and master are running AIX and the clients are running RHEL 7.7. I did not look into
iptables because smaller fulls and incs are completing successfully. Something must have changed,
because this is the first time this client has had any issues (I checked back over the last year). They use
Accelerator in some of their policies, but I did not see it in this client's log. I will find out soon, though.
I believe everyone here is on the same page and knows it's a network or FW issue. I have a strong
feeling it's the FW timeout, but I HAVE to provide the proof before suggesting it. So, your keen eyes and
patience on this will be much appreciated!
This is a communication gap or issue between the master and media servers. I have faced a similar
situation, and it was really hard to pinpoint the problem.
Please verify whether you are using multiple NICs or IPs to communicate with the media servers. Try using
only one IP or NIC on the master server to communicate with all the media servers and check if it helps.
I just received a response. He sent only the vxlogs for the days of the failures, and confirmed that they
have a dedicated NIC for backups with only a single IP configured (no multiple-IP configurations).
He said that "no binary file is created for bpgetconfig" and left it at that. I have the vxlogs you
requested; I looked at them but didn't really know what to look for or how to tell if something is off.
Please let me know if you see anything out of the ordinary.