Chassis Cluster Interface Monitoring

Chassis cluster: Redundancy groups and interface monitoring
By richard_pracko
June 11, 2016 8:51 am

This post focuses on the interface monitoring functionality in redundancy
groups.

Redundancy groups (RGs) in an SRX chassis cluster provide high
availability: an RG fails over from one node to the other in case of
failure. You can configure the cluster to monitor the physical state of
interfaces (interface monitoring), to check the reachability of IP
addresses (IP monitoring), or both.

These options can be combined flexibly to define exactly which
circumstances represent a failure. For example: a single interface
physical failure, multiple interface physical failures, a single
unreachable IP address, multiple unreachable IP addresses, a single
interface physical failure together with one unreachable IP address, etc.

In this post we will look at a couple of the interface monitoring options.
Please see the example setup below:

• RG1 contains redundant ethernet reth0 (ge-0/0/4 and ge-5/0/4 are child
  interfaces).

• RG2 contains redundant ethernet reth1 (ge-0/0/5, ge-0/0/6, ge-5/0/5 and
  ge-5/0/6 are child interfaces). The cluster forms 2 LAG interfaces - one
  on node0 (ge-0/0/5 and ge-0/0/6) and the other on node1 (ge-5/0/5 and
  ge-5/0/6).

• ge-0/0/3 and ge-5/0/3 interfaces are uplinks. A dynamic routing protocol
  is used for the uplink path selection.

Each monitored interface is assigned a numerical value (range 0-255)
called a weight. Failover is triggered when the cumulative weight of all
failed interfaces is 255 or more. The configured weight of the monitored
interfaces is therefore crucial: it defines whether the failure of a single
interface (with a weight of 255) causes the failover, or whether multiple
interfaces (each with a weight less than 255) need to fail at the same
time.
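
The threshold rule above can be sketched in a few lines of Python. This is only an illustrative model of the arithmetic described in this post, not how Junos implements it; the function name and the sample weights are assumptions made for the example.

```python
FAILOVER_THRESHOLD = 255  # cumulative weight at which the post says failover is triggered


def triggers_failover(weights, failed):
    """Return True when the summed weight of the failed interfaces reaches the threshold.

    weights: dict mapping interface name -> configured weight (0-255)
    failed:  iterable of names of monitored interfaces that are currently down
    """
    total = sum(weights[name] for name in failed)
    return total >= FAILOVER_THRESHOLD


# Hypothetical weights, chosen only for illustration.
single = {"ge-0/0/4": 255}
pair = {"ge-0/0/5": 200, "ge-0/0/6": 200}

print(triggers_failover(single, ["ge-0/0/4"]))            # True
print(triggers_failover(pair, ["ge-0/0/5"]))              # False
print(triggers_failover(pair, ["ge-0/0/5", "ge-0/0/6"]))  # True
```

With a weight of 255 a single failure is enough; with two interfaces at 200 each, only the loss of both reaches the threshold.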

In our example, failure of any reth0 child interface causes the failover of
RG1.

The same approach can also be used for RG2. Because reth1 has 4 child
interfaces, another option exists: failover is triggered only when the
whole LAG fails, i.e. no active LAG links are available on the node. In
our case this requires both child interfaces on one node to fail at the
same time. To achieve this, the weight of each child interface has to be
less than 255, while the cumulative weight of the 2 child interfaces needs
to be 255 or more. For example: 200 and 200, 150 and 150, 200 and 100,
254 and 99, etc.

Please have a look at our example redundancy group configuration:

The “show chassis cluster status” command displays the RG state, and the
“show chassis cluster interfaces” command lists the details about
monitored interfaces.

Failure of any single reth1 child interface does not change the ownership
of RG2. It remains primary on node1.

However, failure of both reth1 child interfaces on node1 results in RG2
transitioning to node0.
This approach can be generalized. If the failure of N or more interfaces
should trigger the failover, their weights need to fulfill the following
criteria: the cumulative weight of any N interfaces is 255 or more, while
the cumulative weight of any N-1 interfaces is less than 255.

For instance, consider the following examples:

• Three or more interfaces (N=3) should trigger the failover. The
  cumulative weight of 3 interfaces is 255 or more, but the cumulative
  weight of 2 interfaces is less than 255. Possible options are:
  (100, 100, 100) or (120, 120, 120), etc.

• Four or more interfaces (N=4) should trigger the failover. The
  cumulative weight of 4 interfaces is 255 or more, but the cumulative
  weight of 3 interfaces is less than 255. Possible options are:
  (80, 80, 80, 80) or (70, 70, 70, 70), etc.
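
The N / N-1 criterion can be checked mechanically. The sketch below is a hypothetical helper (its name and shape are assumptions, not part of Junos): it tests the worst cases, meaning the N smallest weights must still reach 255 and the N-1 largest weights must stay below it.

```python
THRESHOLD = 255  # cumulative weight that triggers failover, per this post


def weights_fit_n_failures(weights, n):
    """Check that any n failed interfaces trigger failover but any n-1 do not.

    weights: list of configured weights of the monitored interfaces
    n:       number of simultaneous failures that should cause the failover
    """
    if not 1 <= n <= len(weights):
        raise ValueError("n must be between 1 and the number of interfaces")
    ordered = sorted(weights)
    # Worst case for "n failures are enough": the n lightest interfaces fail.
    smallest_n = sum(ordered[:n])
    # Worst case for "n-1 failures are not enough": the n-1 heaviest fail.
    largest_n_minus_1 = sum(ordered[len(ordered) - (n - 1):])
    return smallest_n >= THRESHOLD and largest_n_minus_1 < THRESHOLD


print(weights_fit_n_failures([100, 100, 100], 3))    # True
print(weights_fit_n_failures([80, 80, 80, 80], 4))   # True
print(weights_fit_n_failures([200, 100], 2))         # True
print(weights_fit_n_failures([255, 255], 2))         # False: one failure already reaches 255
```

Checking only the sorted extremes is sufficient because every other combination of N (or N-1) interfaces sums to a value between those two cases.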

Furthermore, RGs can monitor reth child interfaces from other RGs, or
interfaces that do not belong to any reth/RG at all (called local
interfaces).

A single interface can be monitored by multiple RGs and can have a
different weight defined in each RG. With a weight of 255 in each of them,
its failure causes simultaneous failover of multiple RGs.

In our example, ge-0/0/3 and ge-5/0/3 are local interfaces monitored by
RG1 as well as by RG2. Both RGs have a weight of 255 associated with
those interfaces. If one uplink fails, both RG1 and RG2 transition to the
node with the remaining uplink. This helps to avoid transit traffic
traversing the data link between the nodes.
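
To illustrate how one interface can drive several RGs, here is a small Python model of per-RG monitor tables. The weight of 255 for the uplinks comes from the text above; the weight of 255 for the reth0 children and 200 for the reth1 children are assumed values for illustration, and the function name is hypothetical. The model also assumes that all failed interfaces are on the same node.

```python
THRESHOLD = 255  # cumulative weight that triggers failover, per this post

# Hypothetical monitor tables: RG name -> {interface: weight}.
# Uplink weights (255) follow the post; the other weights are assumptions.
MONITORS = {
    "RG1": {"ge-0/0/4": 255, "ge-5/0/4": 255,
            "ge-0/0/3": 255, "ge-5/0/3": 255},
    "RG2": {"ge-0/0/5": 200, "ge-0/0/6": 200,
            "ge-5/0/5": 200, "ge-5/0/6": 200,
            "ge-0/0/3": 255, "ge-5/0/3": 255},
}


def rgs_failing_over(monitors, failed):
    """Return the RGs whose failed-interface weights reach the threshold.

    failed: set of interface names that are down (assumed to be on one node)
    """
    result = []
    for rg, table in monitors.items():
        lost = sum(weight for ifname, weight in table.items() if ifname in failed)
        if lost >= THRESHOLD:
            result.append(rg)
    return result


print(rgs_failing_over(MONITORS, {"ge-0/0/3"}))              # ['RG1', 'RG2']
print(rgs_failing_over(MONITORS, {"ge-5/0/5"}))              # []
print(rgs_failing_over(MONITORS, {"ge-5/0/5", "ge-5/0/6"}))  # ['RG2']
```

Because each RG keeps its own weight for the same interface, a single uplink failure moves both groups, while a LAG member failure affects only RG2 and only once both members are down.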

The cluster status below was taken after the previously failed interfaces
(ge-5/0/5 and ge-5/0/6) recovered. RG1 remains primary on node0.

Now, if the ge-0/0/3 interface fails, RG1 and RG2 fail over to node1.

The “show chassis cluster information” command is very useful for
troubleshooting because it displays detailed information about the chassis
cluster. The command accepts multiple parameters that provide further
details about the cluster. For instance, the “interface-monitor” parameter
reveals the history of monitored interfaces. Please keep in mind that the
command is hidden in Junos release 11.4.

This post focused on the interface monitoring functionality in SRX chassis
cluster. It lets you define monitored interfaces and their weights on a
per-redundancy-group basis. This makes it quite flexible and capable of
accommodating various failover scenarios.