VPC Peer Switch - Sharaf
This section describes the vPC Peer Switch enhancement, which is enabled with the peer-switch
vPC domain configuration command.
Overview
In many environments, a pair of Nexus switches in a vPC domain are aggregation or core
switches acting as the boundary between Layer 2 switched Ethernet domains and Layer 3 routed
domains. Both switches are configured with multiple VLANs and are responsible for routing
inter-VLAN east-west traffic as well as north-south traffic. In these environments, the Nexus
switches also typically act as root bridges from a Spanning Tree Protocol perspective.
Normally, one vPC peer is configured as the root bridge of the Spanning Tree by setting its
Spanning Tree priority to a low value, such as 0. The other vPC peer is configured with a slightly
higher Spanning Tree priority, such as 4096, which allows it to take over the role of root bridge
within the Spanning Tree if the vPC peer acting as the root bridge fails. With this configuration,
the vPC peer acting as the root bridge originates Spanning Tree Bridge Protocol Data Units
(BPDUs) with a Bridge ID containing its system MAC address.
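The election described above can be sketched in a few lines of Python. This is a simplified model, not switch code: the BridgeId class and elect_root function are hypothetical names, the MAC addresses are placeholders from the documentation range, and the per-VLAN system ID extension that is part of a real Bridge ID is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BridgeId:
    """Simplified Bridge ID: compared by priority first, then by MAC address."""
    priority: int
    mac: str  # lowercase, colon-separated hex, so string order matches numeric order


def elect_root(bridges):
    """Return the bridge with the numerically lowest Bridge ID."""
    return min(bridges)


# Placeholder system MAC addresses (assumed values, not taken from a real switch).
n9k_1 = BridgeId(priority=0, mac="00:00:5e:00:53:01")
n9k_2 = BridgeId(priority=4096, mac="00:00:5e:00:53:02")

root_before_failure = elect_root([n9k_1, n9k_2])
root_after_failure = elect_root([n9k_2])  # N9K-1 has gone offline

print("Root before failure:", root_before_failure)
print("Root after failure: ", root_after_failure)
print("Root Bridge ID changed:", root_before_failure != root_after_failure)
```

Because the two peers have different system MAC addresses, any failover in this model changes the Bridge ID that downstream bridges see as the root.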
However, if the vPC peer acting as the root bridge fails and causes the other vPC peer to take
over as the Spanning Tree root bridge, the other vPC peer originates Spanning Tree BPDUs with
a Bridge ID containing its system MAC address, which is different from the original root
bridge's system MAC address. Depending on how downstream bridges are connected, the impact
of this change varies and is described in the following subsections.
Redundantly Connected Non-vPC Bridges
Non-vPC-connected bridges with redundant links to both vPC peers (such that one link is in a
Blocking state from a Spanning Tree Protocol perspective) detect the change in the BPDU (and,
therefore, the change in root bridge) and observe a change in Root Port. Other Designated
Forwarding interfaces immediately transition to a Blocking state, then traverse the Spanning
Tree Protocol finite state machine (Blocking, Learning, and Forwarding) with pauses in between
equivalent to the configured Spanning Tree Protocol Forward Delay timer (15 seconds by
default). The change in Root Port and subsequent traversal of the Spanning Tree Protocol finite
state machine can cause a significant amount of disruption within the network.
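As a rough illustration of why this is disruptive, the following Python sketch adds up the pauses. It assumes exactly one Forward Delay pause per transition between the three states listed above; the function name and state tuple are hypothetical, and real convergence time depends on the Spanning Tree mode and topology.

```python
# States as listed in the text; one Forward Delay pause is assumed per transition.
STP_STATES = ("Blocking", "Learning", "Forwarding")


def seconds_until_forwarding(forward_delay_s=15, states=STP_STATES):
    """Total pause before a port that restarts in the first state forwards again."""
    transitions = len(states) - 1
    return transitions * forward_delay_s


print(seconds_until_forwarding())  # two transitions at the default 15-second timer
```

Under these assumptions, a port that is forced back through the state machine stays out of the forwarding path for roughly 30 seconds with the default timer.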
The vPC Peer Switch enhancement was introduced primarily to prevent network disruption
caused by this issue if one of the vPC peers were to go offline. With the vPC Peer Switch
enhancement, the non-vPC-connected bridge still has a single redundant link that is in a
Blocking state, but immediately transitions that interface to a Forwarding state if the existing
Root Port goes down due to link failure. The same process happens when the offline vPC peer
comes back online - the interface with the lowest cost to the root bridge seizes the Root Port role,
and the redundant link immediately transitions to a Blocking state. The only data plane impact
that is observed is the unavoidable loss of packets in-flight that were traversing the vPC peer as it
went offline.
vPC-Connected Bridges
vPC-connected bridges in the Spanning Tree domain detect the change in the BPDU (and,
therefore, the change in root bridge) and flush dynamically learned MAC addresses from their
local MAC address tables.
This behavior is inefficient and unnecessary in topologies with vPC-connected devices that are
not reliant on Spanning Tree Protocol for a loop-free topology. vPCs are viewed as a single
logical interface from a Spanning Tree Protocol perspective, just like normal port-channels, so
the loss of a vPC peer is similar to the loss of a single member link within a port-channel. In
either scenario, the spanning tree does not change, so the flush of dynamically learned MAC
addresses from bridges in the spanning tree domain (the purpose of which is to allow Ethernet's
flood-and-learn behavior to re-learn MAC addresses on newly forwarding interfaces of the
spanning tree) is unnecessary.
Furthermore, the flush of dynamically learned MAC addresses could potentially be disruptive.
Consider a scenario where two hosts have a largely unidirectional UDP-based flow (such as a
TFTP client sending data to a TFTP server). In this flow, data mostly flows from the TFTP client
to the TFTP server - rarely does the TFTP server send a packet back towards the TFTP client. As
a result, after a flush of dynamically learned MAC addresses in the Spanning Tree domain, the
TFTP server's MAC is not learned for some time.
This means the TFTP client's data sent towards the TFTP server is flooded throughout the
VLAN, as the traffic is unknown-unicast traffic. This can cause large data flows to travel to
unintended places within the network and can cause performance issues if it flows through
oversubscribed sections of the network.
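The flooding behavior described above can be reproduced with a toy learning switch. This is a minimal sketch under stated assumptions: the LearningSwitch class, the port names, and the host labels are all hypothetical, and the model ignores VLANs, aging timers, and everything else a real switch does.

```python
class LearningSwitch:
    """Toy flood-and-learn switch with a single VLAN and no aging."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port where it was last seen

    def receive(self, in_port, src, dst):
        """Learn the source MAC, then return the list of egress ports."""
        self.mac_table[src] = in_port
        out_port = self.mac_table.get(dst)
        if out_port is None:
            # Unknown unicast: flood out of every port except the ingress port.
            return sorted(self.ports - {in_port})
        return [] if out_port == in_port else [out_port]

    def flush(self):
        """Drop all dynamically learned MAC addresses."""
        self.mac_table.clear()


sw = LearningSwitch(["Eth1/1", "Eth1/2", "Eth1/3"])

# The TFTP server (on Eth1/2) sends one frame, so the switch learns its MAC.
sw.receive("Eth1/2", src="server", dst="client")
before_flush = sw.receive("Eth1/1", src="client", dst="server")

# A change in root bridge triggers a flush; the server stays silent afterwards.
sw.flush()
after_flush = sw.receive("Eth1/1", src="client", dst="server")

print("Before flush:", before_flush)
print("After flush: ", after_flush)
```

In this model the client's traffic keeps flooding out of every other port until the server next transmits a frame and its MAC address is re-learned.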
The vPC Peer Switch enhancement was introduced to prevent this inefficient and unnecessary
behavior from occurring if the vPC peer acting as the Spanning Tree root bridge for one or more
VLANs is reloaded or powered off.
To enable the vPC Peer Switch enhancement, both vPC peers must have identical Spanning Tree
Protocol configuration (including Spanning Tree priority values for all vPC VLANs) and be the
root bridge for all vPC VLANs. Once these prerequisites are met, configure the peer-switch vPC
domain configuration command to enable the vPC Peer Switch enhancement.
Once the vPC Peer Switch enhancement is enabled, both vPC peers begin originating identical
Spanning Tree BPDUs with a Bridge ID containing the vPC system MAC address that is shared
by both vPC peers. If a vPC peer is reloaded, the Spanning Tree BPDU that is originated by the
remaining vPC peer does not change, so other bridges in the Spanning Tree domain do not see
any change in the root bridge and do not react sub-optimally to the change in the network.
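The difference can be illustrated with a small Python sketch that compares the root advertised before and after a peer failure, with and without the enhancement. The advertised_root function is a hypothetical helper, the per-peer system MAC addresses are placeholders, and the vPC system MAC address is the one that appears in the packet capture later in this document.

```python
def advertised_root(online_peers, peer_switch, vpc_system_mac):
    """Return the (priority, MAC) pair downstream bridges see as the root.

    online_peers is a list of (priority, system_mac) tuples for the vPC peers
    that are currently up. This is a simplified model, not NX-OS behavior.
    """
    if not online_peers:
        return None
    if peer_switch:
        # Peer switch requires identical priorities on both peers.
        priorities = {priority for priority, _ in online_peers}
        if len(priorities) != 1:
            raise ValueError("peer switch requires identical priorities")
        return (priorities.pop(), vpc_system_mac)
    # Without peer switch, the lowest (priority, MAC) pair wins the election.
    return min(online_peers)


VPC_SYSTEM_MAC = "00:23:04:ee:be:01"
N9K_1_MAC = "00:00:5e:00:53:01"  # placeholder system MAC
N9K_2_MAC = "00:00:5e:00:53:02"  # placeholder system MAC

# Without peer switch: priorities 0 and 4096, N9K-1 fails.
classic_before = advertised_root([(0, N9K_1_MAC), (4096, N9K_2_MAC)], False, VPC_SYSTEM_MAC)
classic_after = advertised_root([(4096, N9K_2_MAC)], False, VPC_SYSTEM_MAC)

# With peer switch: identical priorities, N9K-1 fails.
shared_before = advertised_root([(0, N9K_1_MAC), (0, N9K_2_MAC)], True, VPC_SYSTEM_MAC)
shared_after = advertised_root([(0, N9K_2_MAC)], True, VPC_SYSTEM_MAC)

print("Root changes without peer switch:", classic_before != classic_after)
print("Root changes with peer switch:   ", shared_before != shared_after)
```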
Caveats
The vPC Peer Switch enhancement has some caveats you should be aware of prior to configuring
it in a production environment.
Before enabling the vPC Peer Switch enhancement, Spanning Tree priority configuration for all
vPC VLANs must be modified so that it is identical between both vPC peers.
Consider the configuration here, where N9K-1 is configured to be the Spanning Tree root bridge
for VLANs 1, 10, and 20 with a priority of 0. N9K-2 is the secondary Spanning Tree root bridge
for VLANs 1, 10, and 20 with a priority of 4096.
N9K-1# show running-config spanning-tree
spanning-tree vlan 1,10,20 priority 0
interface port-channel1
  spanning-tree port type network
!
N9K-2# show running-config spanning-tree
spanning-tree vlan 1,10,20 priority 4096
interface port-channel1
  spanning-tree port type network
!
Prior to enabling the vPC Peer Switch enhancement, you must modify the Spanning Tree priority
configuration for VLANs 1, 10, and 20 on N9K-2 to match the Spanning Tree priority
configuration for the same VLANs on N9K-1. An example of this modification is shown here.
N9K-2# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
N9K-2(config)# spanning-tree vlan 1,10,20 priority 0
N9K-2(config)# end
!
N9K-2# show running-config spanning-tree
spanning-tree vlan 1,10,20 priority 0
interface port-channel1
  spanning-tree port type network
!
N9K-1# show running-config spanning-tree
spanning-tree vlan 1,10,20 priority 0
interface port-channel1
  spanning-tree port type network
!
In this topology, two vPC peers (N9K-1 and N9K-2) have two Layer 2 trunks between them: Po1
and Po2. Po1 is the vPC Peer-Link carrying vPC VLANs, while Po2 is a Layer 2 trunk carrying
all non-vPC VLANs.
If the Spanning Tree priority values for non-vPC VLANs carried across Po2 are identical on
N9K-1 and N9K-2, then each vPC peer originates Spanning Tree BPDU frames sourced from the
vPC system MAC address, which is identical on both switches. As a result, N9K-1 appears to
receive its own Spanning Tree BPDU on Po2 for each non-vPC VLAN, even though N9K-2 is
the switch that originated the Spanning Tree BPDU. From a Spanning Tree perspective, N9K-1
places Po2 in a Blocking state for all non-vPC VLANs.
This is expected behavior. To prevent it, both vPC peers must be configured with different
Spanning Tree priority values for all non-vPC VLANs. This allows one vPC peer to become the
root bridge for each non-vPC VLAN and transition the Layer 2 trunk between vPC peers to a
Designated Forwarding state. Similarly, the other vPC peer transitions the Layer 2 trunk
between vPC peers to a Root Forwarding state. This allows traffic in non-vPC VLANs to flow
across both vPC peers through the Layer 2 trunk.
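The caveat above reduces to a comparison of priority values, which the following Python sketch captures. It is a deliberately simplified model with a hypothetical function name: it assumes both peers build their Bridge ID from the shared vPC system MAC address, so the priority is the only field that can differ, and it reports only the local peer's view of Po2.

```python
def po2_state(local_priority, remote_priority):
    """Expected state of Po2 on the local vPC peer for one non-vPC VLAN.

    Simplified model: both peers use the shared vPC system MAC address, so the
    Bridge IDs are identical whenever the priorities are identical.
    """
    if local_priority == remote_priority:
        # The received BPDU looks like the local switch's own BPDU.
        return "Blocking"
    if local_priority < remote_priority:
        # The local peer is the root bridge for this VLAN.
        return "Designated Forwarding"
    # The remote peer is the root bridge; Po2 is the path towards it.
    return "Root Forwarding"


print("Identical priorities (0, 0):   ", po2_state(0, 0))
print("Local 0, remote 4096:          ", po2_state(0, 4096))
print("Local 4096, remote 0:          ", po2_state(4096, 0))
```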
Configuration
An example of how to configure the vPC Peer Switch feature can be found here.
In this example, N9K-1 is configured to be the Spanning Tree root bridge for VLANs 1, 10, and
20 with a priority of 0. N9K-2 is the secondary Spanning Tree root bridge for VLANs 1, 10, and
20 with a priority of 4096.
Once the Spanning Tree priority values for these VLANs have been made identical on both vPC
peers (as described in the Caveats section), we can enable the vPC Peer Switch feature through
the peer-switch vPC domain configuration command. This changes the Bridge ID within Spanning
Tree BPDUs originated by both vPC peers, which causes other bridges in the Spanning Tree
domain to flush their local MAC address tables for all affected VLANs.
N9K-1# configure terminal
N9K-1(config)# vpc domain 1
N9K-1(config-vpc-domain)# peer-switch
N9K-1(config-vpc-domain)# end
!
N9K-2# configure terminal
N9K-2(config)# vpc domain 1
N9K-2(config-vpc-domain)# peer-switch
N9K-2(config-vpc-domain)# end
!
You can verify that the vPC Peer Switch feature is operating as expected by validating both vPC
peers claim to be the root bridge for vPC VLANs with the show spanning-tree summary
command. This output should also state that the vPC Peer Switch feature is enabled and
operational.
Use the show spanning-tree vlan {x} command to view more detailed information about a
specific VLAN. The switch holding the Primary or Operational Primary vPC role has all of its
interfaces in a Designated Forwarding state. The switch holding the Secondary or Operational
Secondary vPC role has all of its interfaces in a Designated Forwarding state except for the vPC
Peer-Link, which is in a Root Forwarding state. Note that the vPC system MAC address
displayed in the output of show vpc role is identical to the Root Bridge ID and Bridge ID of
each vPC peer.
Finally, we can use the Ethanalyzer control plane packet capture utility on either vPC peer to
confirm that both vPC peers are originating Spanning Tree BPDUs with a Bridge ID and Root
Bridge ID containing the vPC system MAC address shared between both vPC peers.
!
N9K-1# ethanalyzer local interface inband display-filter stp limit-captured-frames 0
Capturing on inband
2021-05-13 01:59:51.664206 68:9e:0b:aa:de:d4 -> 01:80:c2:00:00:00 STP RST. Root = 0/1/00:23:04:ee:be:01
!
N9K-2# ethanalyzer local interface inband display-filter stp limit-captured-frames 0
Capturing on inband
2021-05-13 01:59:51.777034 68:9e:0b:aa:de:34 -> 01:80:c2:00:00:00 STP RST. Root = 0/1/00:23:04:ee:be:01
!
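When captures are collected from both peers, the comparison can be scripted. The Python sketch below parses the Root field from the two summary lines shown above and checks that they match. The parse_root helper is hypothetical, and it assumes the field is formatted as priority/system ID extension/MAC address, which is how it appears in this capture.

```python
import re

# Matches "Root = <priority>/<system ID extension>/<MAC>", tolerating a line wrap
# between "Root =" and the value.
ROOT_RE = re.compile(
    r"Root\s*=\s*(\d+)/(\d+)/((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def parse_root(summary_line):
    """Return (priority, system_id_extension, mac) or None if no Root field is found."""
    match = ROOT_RE.search(summary_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3).lower()


n9k_1_line = (
    "2021-05-13 01:59:51.664206 68:9e:0b:aa:de:d4 -> 01:80:c2:00:00:00 "
    "STP RST. Root = 0/1/00:23:04:ee:be:01"
)
n9k_2_line = (
    "2021-05-13 01:59:51.777034 68:9e:0b:aa:de:34 -> 01:80:c2:00:00:00 "
    "STP RST. Root = 0/1/00:23:04:ee:be:01"
)

root_1 = parse_root(n9k_1_line)
root_2 = parse_root(n9k_2_line)
print("N9K-1 advertises:", root_1)
print("N9K-2 advertises:", root_2)
print("Peers advertise the same root:", root_1 == root_2)
```

The source MAC addresses of the two frames differ, but the advertised root is the same on both peers, which is the expected result when the feature is operational.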
The impact of enabling the vPC Peer Switch enhancement varies depending on whether other
bridges in the Spanning Tree domain are connected to both vPC peers via a vPC, or if they are
redundantly connected to both vPC peers without a vPC.
Redundantly Connected Non-vPC Bridges
If a non-vPC-connected bridge with redundant links to both vPC peers (such that one link is in a
Blocking state from a Spanning Tree Protocol perspective) detects a change in the Spanning Tree
root bridge advertised in Spanning Tree BPDUs, the Root Port of the bridge may change between
the two redundant interfaces. In turn, this can cause other Designated Forwarding interfaces to
immediately transition to a Blocking state, then traverse the Spanning Tree Protocol finite
state machine (Blocking, Learning, and Forwarding) with pauses in between equivalent to the
configured Spanning Tree Protocol Forward Delay timer (15 seconds by default). The change in
Root Port and subsequent traversal of the Spanning Tree Protocol finite state machine can cause
a significant amount of disruption within the network.
It is worth mentioning that this impact occurs whenever the vPC peer that is presently the root
bridge for the Spanning Tree domain goes offline (such as in the event of power failure,
hardware failure, or a reload). This behavior is not specific to the vPC Peer Switch
enhancement - from a Spanning Tree perspective, enabling the vPC Peer Switch enhancement simply
causes behavior similar to a vPC peer going offline.
vPC-Connected Bridges
If a vPC-connected bridge detects a change in the Spanning Tree root bridge advertised in
Spanning Tree BPDUs, the bridge flushes dynamically-learned MAC addresses from its MAC
address table. While configuring the vPC Peer Switch feature, you can observe this behavior
under the following two scenarios:
1. When Spanning Tree priority values are configured to match between both vPC peers, the
Spanning Tree root bridge may change from one vPC peer to another if the vPC peer that was
previously not the root bridge has a lower system MAC address than the vPC peer that was
previously the root bridge. An example of this scenario is shown in the vPC Peer Switch
Configuration section of this document.
2. When the vPC Peer Switch feature is enabled through the peer-switch vPC domain
configuration command, both vPC peers begin operating as root bridges of the Spanning Tree
domain. Both vPC peers begin originating identical Spanning Tree BPDUs asserting themselves
as the root bridge of the Spanning Tree domain.
In most scenarios and topologies, no data plane impact is observed as a result of either of these
two scenarios. However, for a short period of time, data plane traffic is flooded within a VLAN
as unknown unicast traffic, because the destination MAC addresses of frames are not learned on
any switchport as a direct result of the flush of dynamically-learned MAC addresses. In some
topologies, this can cause brief periods of performance issues or packet loss if data plane traffic
is flooded to oversubscribed network devices within the VLAN. This can also cause issues with
bandwidth-intensive unidirectional traffic flows or silent hosts (hosts that primarily receive
packets and rarely send packets), as this traffic is flooded within the VLAN for an extended
period of time instead of being switched directly to the destination host as normal.
It is worth mentioning that this impact is related to the flush of dynamically-learned MAC
addresses from the MAC address table of bridges within the affected VLAN. This behavior is not
specific to the vPC Peer Switch enhancement or a change in root bridge - it can also be caused
by a Topology Change Notification generated due to a non-edge port coming up within the
VLAN.
In this topology, N9K-1 and N9K-2 are vPC peers in a vPC domain. N9K-1 is configured with a
Spanning Tree priority value of 0 for all VLANs, making N9K-1 the root bridge for all VLANs.
N9K-2 is configured with a Spanning Tree priority value of 4096 for all VLANs, making N9K-2
the secondary root bridge for all VLANs. Access-1 is a switch that is redundantly connected to
both N9K-1 and N9K-2 through Layer 2 switchports.
These switchports are not bundled into a port-channel, so Spanning Tree Protocol places the
link connected to N9K-1 in a Root Forwarding state and the link connected to N9K-2 in an
Alternate Blocking state.
Consider a failure scenario where N9K-1 goes offline due to a hardware failure, power failure, or
a reload of the switch. N9K-2 asserts itself as the root bridge for all VLANs by advertising
Spanning Tree BPDUs using its system MAC address as the bridge ID. Access-1 sees a change
in the root bridge's ID. Furthermore, its Root Port transitions to a down/down state, which
means the new Root Port is the link that was in an Alternate Blocking state facing N9K-2.
This change in Root Port causes all non-edge Spanning Tree ports to step through the Spanning
Tree Protocol finite state machine (Blocking, Learning, and Forwarding) with pauses in between
equivalent to the configured Spanning Tree Protocol Forward Delay timer (15 seconds by
default). This process can be extremely disruptive to the network.
In the same failure scenario with the vPC Peer Switch enhancement enabled, both N9K-1 and
N9K-2 transmit identical Spanning Tree BPDUs using the shared vPC system MAC address as
the bridge ID. If N9K-1 fails, N9K-2 continues transmitting this same Spanning Tree BPDU. As
a result, Access-1 immediately transitions the Alternate Blocking link towards N9K-2 to a
Root Forwarding state and begins forwarding traffic across the link. Furthermore, the fact that
the Spanning Tree root bridge ID does not change prevents non-edge ports from stepping through
the Spanning Tree Protocol finite state machine, which reduces the amount of disruption
observed in the network.
In this topology, N9K-1 and N9K-2 are vPC peers in a vPC domain that perform inter-VLAN
routing between VLAN 10 and VLAN 20. N9K-1 is configured with a Spanning Tree priority
value of 0 for VLAN 10 and VLAN 20, making N9K-1 the root bridge for both VLANs. N9K-2
is configured with a Spanning Tree priority value of 4096 for VLAN 10 and VLAN 20, making
N9K-2 the secondary root bridge for both VLANs. Host-1, Host-2, Host-3, and Host-4 are all
continuously communicating with each other.
Consider a failure scenario where N9K-1 goes offline due to a hardware failure, power failure, or
a reload of the switch. N9K-2 asserts itself as the root bridge for VLAN 10 and VLAN 20 by
advertising Spanning Tree BPDUs using its system MAC address as the bridge ID. Access-1 and
Access-2 see a change in the root bridge's ID, and although the spanning tree remains the same
(meaning, the vPC facing N9K-1 and N9K-2 remains the Root Port), both Access-1 and Access-2
flush all dynamically learned MAC addresses in VLAN 10 and VLAN 20 from their MAC address
tables.
In most environments, the flushing of dynamically learned MAC addresses causes a minimal
amount of impact. No packets are lost (aside from those lost as they were transmitted to N9K-1
while it failed), but traffic is temporarily flooded within each broadcast domain as unknown
unicast traffic while all switches in the broadcast domain re-learn dynamic MAC addresses.
In the same failure scenario with the vPC Peer Switch enhancement enabled, both N9K-1 and
N9K-2 would be transmitting identical Spanning Tree BPDUs using the shared vPC system
MAC address as the bridge ID.
If N9K-1 fails, N9K-2 continues transmitting this same Spanning Tree BPDU. As a result,
Access-1 and Access-2 are unaware that any change in the Spanning Tree topology has taken
place - from their perspective, the root bridge's Spanning Tree BPDUs are identical, so there is
no need to flush dynamically learned MAC addresses from relevant VLANs. This prevents the
flooding of unknown unicast traffic in each broadcast domain in this failure scenario.