Layer 2 Loop Troubleshooting
Issue 01
Date 2016-10-25
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: http://www.huawei.com
Email: support@huawei.com
Contents
1.6.4.3 Switch Ports Connected to Terminals Are Not Configured as STP Edge Ports. When Booting from the Network Adapter, Some Terminals Cannot Obtain IP Addresses
1.6.4.4 STP Convergence Cannot Be Adjusted in Other MSTIs But Not MSTI 0 Because MST Region Configurations Are Different
1.6.4.5 Inconsistent MSTP Packet Formats Cause Ports to Be Down
1.6.4.6 RRPP Multi-Instance Causes a Temporary RRPP Loop
1.6.4.7 RRPP Master Node's Working Mode Is Different from That of Transit Nodes, Which Makes MAC Entries Fail to Be Updated
1.6.4.8 Users on a Transit Node and Downstream Nodes Connected to the Transit Node Cannot Go Online
1.6.4.9 An RRPP Loop Occurs Due to Original Multi-Instance Configuration
1.6.4.10 loopback internal Causes a Loop
1.6.4.11 Services Are Interrupted After Smart Link Master and Slave Interfaces Are Switched
1.6.5 Improper Configurations
1.6.5.1 A Large Number of TC BPDUs Cause an ARP Learning Error on a Modular Switch
1.6.5.2 Many TC BPDUs Cause a High CPU Usage
1.6.5.3 An MSTP Loop Causes a High CPU Usage
1.6.5.4 STP Convergence Is Abnormal When an S9300 Interface Processes BPDUs
1.6.5.5 STP Flapping Occurs Because the STP Timeout Interval on the ATAE Device Is Incorrectly Calculated
1.6.5.6 RSTP Cannot Provide Fast Convergence When the S6500 Port Connected to the S6500 Changes from Down to Up
1.6.5.7 Unicast Suppression Causes RRPP Flapping for One Hour
1.6.5.8 Unknown Unicast Suppression Causes RRPP Flapping
1.6.5.9 Services on the RRPP Ring Consisting of CX600 and S3300 Are Interrupted
1.6.5.10 ERPS Becomes Invalid When RTN Interconnects with an S Switch
1.6.6 Pseudo Loops
1.6.6.1 MAC Address Flapping Occurs But No Loop Is Detected
1.6.7 Others
1.6.7.1 The S2300SI Configured with Loopback Detection Cannot Detect Loops
1.6.7.2 The OSPF Neighbor Relationship Is Down Due to a Loop on the S Switch
1.6.7.3 Packet Loss Due to a Loop in Layer 2 Forwarding
1.7 FAQ
1.7.1 Can a Switch Transparently Transmit BPDUs?
1.7.2 What Are the Basis for STP Calculation? Will STP Topology Be Changed When Port Rate Is Changed?
1.7.3 Does a Switch Support MAC Address Flapping Detection?
1.7.4 After LDT or LBDT Detects a Loop on an Interface, the Interface Is Blocked. Can the Blocked Interface Continue to Send Protocol Packets?
1.7.5 Can Loopback Detection Be Used with VLAN Mapping on an Interface?
1.7.6 What Is the Destination MAC Address of SEP Packets?
1.7.7 How Many Modes Is Available to Block an Interface?
1.7.8 After the SEP Topology Changes, Which Ring Network Protocols Will Update Their Forwarding Tables?
1.7.9 What Is the Destination MAC Address of RRPP Packets?
1.7.10 What Are the Notes About Configuring RRPP?
1.7.11 How Does RRPP Implement Fast Switching?
1.7.12 Why Does the display Command Not Display Statistics About Health Packets on an RRPP Transit Node?
1.7.13 How Is Load Balancing Implemented When RRPP Is Deployed?
1.7.14 What Is the Maximum Number of Devices That Can Be Deployed in an RRPP Ring?
1.7.15 Do S Series Switches Support Sub-rings?
1.7.16 Can ERPS Be Used with Other Ring Network Protocols on the Same Network?
1.7.17 Does ERPS on S Series Switches Support Load Balancing?
1.7.18 Can ERPS Be Configured on an Eth-Trunk?
1.1 Overview
1.2 How to Detect a Loop
1.3 How to Quickly Remove a Loop
1.4 How to Find Out the Root Cause
1.5 Hardening and Optimizing the Network
1.6 Typical Loop Troubleshooting Cases
1.7 FAQ
1.1 Overview
Definition
To improve the reliability of an Ethernet switching network, device redundancy and link
redundancy are commonly used. However, factors such as networking adjustment,
configuration modification, and upgrade or migration may still cause loops. In
Figure 1.1, a loop forms if every two devices are connected to each other, and a broadcast
storm occurs if no loop prevention protocol is configured or network configurations are modified.
The major harm of a Layer 2 loop is a broadcast storm. Even when no loop exists on an Ethernet,
broadcast Ethernet frames are flooded on the network to ensure that every device receives
them. With sufficient bandwidth, each bridge forwards received broadcast frames to
all interfaces except the interface that received them. However, if a loop occurs, this
broadcast mechanism affects the entire network.
When a broadcast storm occurs, Ethernet frames are forwarded endlessly, and the
forwarding rate reaches or approaches the line rate of the interface, consuming link
bandwidth. According to Ethernet forwarding rules, these broadcast frames are copied to all
interfaces, so the entire network fills with broadcast frames. For example, on an Ethernet
that uses GE connections, every link carries broadcast frames at 1000 Mbit/s, and other
data packets cannot be forwarded.
In a broadcast domain, a broadcast storm occurs if Layer 2 devices forward broadcast frames
repeatedly. The broadcast storm makes the MAC address table unstable, affects services,
degrades communication quality, and can even interrupt communication.
To prevent loops and ensure network reliability, the following loop prevention protocols can
be configured on switches:
STP/RSTP/MSTP
RRPP
SEP
Smart Link
ERPS
In addition, Huawei S series switches support the following loop detection functions:
Loop Detection
Loopback Detection
This document describes how to identify Layer 2 loops.
Purpose
This is a guide for Huawei engineers to remove Layer 2 loops, including:
Helping frontline service engineers describe the fault symptom and determine fault range
Helping GTAC engineers collect NE information, analyze abnormalities of NEs, and
quickly locate the faulty NE and service
Helping R&D engineers locate the fault
PHY: Physical
*down: administratively down
^down: standby
~down: LDT down
#down: LBDT down
(l): loopback
(s): spoofing
(E): E-Trunk down
(b): BFD down
(e): ETHOAM down
(dl): DLDP down
(d): Dampening Suppressed
(ld): LDT block
(lb): LBDT block
InUti/OutUti: input utility/output utility
Interface PHY Protocol InUti OutUti inErrors outErrors
Ethernet0/0/0/0 up up 0.01% 0.01% 0 0
GigabitEthernet0/0/2 up up 0.01% 0.01% 0 0
GigabitEthernet0/0/16 up up 0.01% 0.56% 0.56% 0
GigabitEthernet1/0/12 up up 0.01% 0.56% 0.56% 0
Last query:
<HUAWEI> display interface brief | include up
PHY: Physical
*down: administratively down
^down: standby
~down: LDT down
#down: LBDT down
(l): loopback
(s): spoofing
(E): E-Trunk down
(b): BFD down
(e): ETHOAM down
(dl): DLDP down
(d): Dampening Suppressed
(ld): LDT block
(lb): LBDT block
InUti/OutUti: input utility/output utility
Interface PHY Protocol InUti OutUti inErrors outErrors
Ethernet0/0/0/0 up up 0.01% 0.01% 0 0
GigabitEthernet0/0/2 up up 0.01% 0.01% 0 0
GigabitEthernet0/0/16 up up 76% 76% 0 0
GigabitEthernet1/0/12 up up 76% 76% 0 0
Compare the displayed network traffic with the service traffic collected when network
services are normal. You can obtain the service traffic bandwidth from the network
monitoring diagram on the NMS.
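The comparison against normal service traffic can be scripted once the command output has been saved. The following is a minimal sketch, not a Huawei tool: it assumes the column layout shown in the sample output above, and the function name, the baseline values, and the factor of 5 are hypothetical choices for illustration only.

```python
import re

# Rows copied from the sample "display interface brief" output above.
SAMPLE = """\
Interface PHY Protocol InUti OutUti inErrors outErrors
Ethernet0/0/0/0 up up 0.01% 0.01% 0 0
GigabitEthernet0/0/2 up up 0.01% 0.01% 0 0
GigabitEthernet0/0/16 up up 76% 76% 0 0
GigabitEthernet1/0/12 up up 76% 76% 0 0
"""

# Interface name, PHY state, protocol state, then InUti and OutUti in percent.
ROW = re.compile(r"^(\S+)\s+\S+\s+\S+\s+([\d.]+)%\s+([\d.]+)%")


def busy_interfaces(text, baseline, factor=5.0):
    """Return interfaces whose InUti or OutUti exceeds factor x baseline.

    baseline maps an interface name to its normal utilization in percent
    (for example, read from the NMS). Interfaces without an entry fall
    back to 1%, which is an arbitrary assumption of this sketch.
    """
    hits = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # legend lines, the header row, and blank lines
        name = match.group(1)
        in_uti = float(match.group(2))
        out_uti = float(match.group(3))
        if max(in_uti, out_uti) > factor * baseline.get(name, 1.0):
            hits.append(name)
    return hits


# Hypothetical baseline: both ring ports normally run at about 10%.
BASELINE = {"GigabitEthernet0/0/16": 10.0, "GigabitEthernet1/0/12": 10.0}
print(busy_interfaces(SAMPLE, BASELINE))
```

An interface flagged here is only a candidate; confirm it against the checks in the following steps before treating it as part of a loop.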
Determine whether a loop has occurred:
− If the current network traffic volume is much higher than service traffic volume in
normal situation, a Layer 2 loop may be occurring.
− If current traffic volume is normal and broadcast suppression is not configured (the
broadcast-suppression { percent-value | cir cir-value [ cbs cbs-value ] | packets
Frames: 0
Total Error: 0
CRC: 0, Giants: 0
Jabbers: 0, Fragments: 0
Runts: 0, DropEvents: 0
Alignments: 0, Symbols: 0
Ignoreds: 0
Total Error: 0
Collisions: 0, ExcessiveCollisions: 0
Late Collisions: 0, Deferreds: 0
Buffers Purged: 0
----End
All fixed and modular switches of all versions support MAC address flapping prevention
configurations including alarm generation and interface blocking upon MAC address
flapping.
MAC address flapping detection commands and alarms differ for fixed and modular switches
of different versions.
1. Modular switches:
− On a switch running V100R002, global MAC address flapping detection takes
effect only on non-S series boards. In addition, when the switch detects MAC
address flapping, it can only send a trap. Run the following command to enable
MAC address flapping detection:
[HUAWEI] mac-flapping alarm enable
− In V100R003 and later versions, the switch supports VLAN-based MAC address
flapping detection and can perform actions when MAC address flapping is detected.
Run the following commands in the system or VLAN view to enable MAC address
flapping detection:
a. System view:
[HUAWEI] loop-detect eth-loop alarm-only
b. VLAN view:
<HUAWEI> system-view
[HUAWEI] vlan 10
[HUAWEI-vlan10] loop-detect eth-loop alarm-only
After enabling MAC address flapping detection, run the display trapbuffer command to
view MAC address flapping traps (OID: 1.3.6.1.4.1.2011.5.25.160.3.7 or OID:
1.3.6.1.4.1.2011.5.25.42.2.1.7.12). Table 1.1 describes MAC address flapping detection
traps in different versions.
Table 1.1 MAC address flapping detection traps on modular switches of different versions
Version Trap
2. Fixed switches:
Fixed switches (excluding the S2300/S2700) of V100R003 and later versions do not
support global MAC address flapping detection. They support only VLAN-based MAC
address flapping detection and actions such as sending traps and blocking interfaces
when MAC address flapping is detected. Run the following commands to enable MAC
address flapping detection:
VLAN view:
<HUAWEI> system-view
[HUAWEI] vlan 10
[HUAWEI-vlan10] loop-detect eth-loop alarm-only
After enabling MAC address flapping detection, run the display trapbuffer command to
view MAC address flapping traps (OID: 1.3.6.1.4.1.2011.5.25.160.3.7 or OID:
1.3.6.1.4.1.2011.5.25.42.2.1.7.12). Table 1.2 describes MAC address flapping detection
traps in different versions.
Table 1.2 MAC address flapping detection traps on fixed switches of different versions
Version Trap
V100R003 L2IF/4/MFLPPORTRESUME:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exist in vlan for(hwMflpVlanId:"[1001]";hwMflpVlanCfgAlarmReason:"[for flapping mac-address
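When the trap buffer is long, the two MAC address flapping OIDs given above can be searched for in a saved copy of the display trapbuffer output. The following is a minimal sketch under stated assumptions: it expects each trap to occupy one line of text, and the function name and the second sample line are hypothetical.

```python
import re

# The two MAC address flapping trap OIDs named in this section.
MAC_FLAPPING_OIDS = (
    "1.3.6.1.4.1.2011.5.25.160.3.7",
    "1.3.6.1.4.1.2011.5.25.42.2.1.7.12",
)

# Match each OID only as a complete OID, not as the prefix of a longer one.
_PATTERNS = [
    re.compile(r"(?<![\d.])" + re.escape(oid) + r"(?!\.?\d)")
    for oid in MAC_FLAPPING_OIDS
]


def mac_flapping_traps(trapbuffer_text):
    """Return the lines of saved trap buffer text that carry either OID."""
    return [
        line.strip()
        for line in trapbuffer_text.splitlines()
        if any(pattern.search(line) for pattern in _PATTERNS)
    ]


# First line: start of the V100R003 trap from Table 1.2.
# Second line: a made-up unrelated trap, included only to show filtering.
SAMPLE = """\
L2IF/4/MFLPPORTRESUME:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exist in vlan
EXAMPLE/4/OTHER:OID 1.3.6.1.4.1.2011.5.25.160.3.70 hypothetical unrelated trap
"""

for trap in mac_flapping_traps(SAMPLE):
    print(trap)
```

If the terminal wraps long traps across several lines, join the continuation lines before filtering, otherwise the OID and the VLAN details may end up on different lines.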
Run the display loop-detection interface command to check the status of a specified port.
<Quidway> display loop-detection interface gigabitethernet 1/0/0
The port is enable.
The port's status list:
Status WorkMode Recovery-time EnabledVLAN
-----------------------------------------------------------------------
Normal Shutdown 200 556
Loopback Detection
Fixed switches of all versions support loopback detection, and modular switches of
V200R001 and later versions support loopback detection.
After loopback detection is configured on a port, the port starts to send detection packets.
In a version before V200R003, the switch can detect a loop only when the detection
packets are sent and received by the same port. In V200R003 and later versions,
loopback detection allows the switch to detect a loop even if the detection packets are
sent and received by different ports.
Enable loopback detection:
[HUAWEI] loopback-detect enable
[HUAWEI] loopback-detect packet vlan { vlan-id1 [ to vlan-id2 ] } &<1-8>
After loopback detection is enabled, run the display loopback-detect command to view
the configuration and port status.
<Quidway> display loopback-detect
Loopback-detect is enabled in the system view
Loopback-detect interval: 30
Loopback-detect sending-packet interval: 5
Interface ProtocolID RecoverTime Action Status
--------------------------------------------------------------------------------
GigabitEthernet0/0/2 602 30 block NORMAL
The traps vary according to software versions. Table 1.4 provides loopback detection
trap messages in different versions.
Do not configure loop detection or loopback detection on upstream ports: if an upstream
port is blocked, the switch can no longer be managed. Inform the customer of the risks of
this configuration and obtain the customer's permission.
To view CPU usage, run the display cpu-usage command in any view.
If the PPI task has a high CPU usage, a loop has probably occurred.
<HUAWEI> display cpu-usage
CPU utilization for five seconds: 10%: one minute: 10%: five minutes: 10%
......
PPI 70% 0/ 512f8c PPI Product Process
Interface
......
If the CPU usage of the PPI task is normal, run the display cpu-defend statistics [
packet-type packet-type ] { all | slot slot-id | mcu } command to check whether protocol
packets are discarded by CPCAR. If so, a loop may have occurred; otherwise, find out
the cause.
For example, to view VRRP packet statistics, run the display cpu-defend vrrp statistics
all command. The output has the following format. If the Drop counters are not 0, VRRP
packets have been discarded, possibly because of a loop.
<HUAWEI> display cpu-defend vrrp statistics all
Statistics on mainboard:
-------------------------------------------------------------------------------
Packet Type Pass(Bytes) Drop(Bytes) Pass(Packets) Drop(Packets)
-------------------------------------------------------------------------------
vrrp 0 0 0 0
-------------------------------------------------------------------------------
Statistics on slot 1:
-------------------------------------------------------------------------------
Packet Type Pass(Bytes) Drop(Bytes) Pass(Packets) Drop(Packets)
-------------------------------------------------------------------------------
vrrp 0 0 0 0
-------------------------------------------------------------------------------
Statistics on slot 4:
-------------------------------------------------------------------------------
Packet Type Pass(Bytes) Drop(Bytes) Pass(Packets) Drop(Packets)
-------------------------------------------------------------------------------
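The check for discarded protocol packets can be automated on saved command output. The following is a minimal sketch that assumes the table layout shown above (one "Statistics on ..." line per section, followed by rows of five fields); the function name and the non-zero counters in the sample are hypothetical.

```python
import re


def dropped_packets(output):
    """Map each section (mainboard, slot N) to {packet type: dropped packets}.

    Only rows whose Drop(Packets) counter is greater than 0 are reported.
    """
    drops = {}
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        header = re.match(r"Statistics on (.+?):$", line)
        if header:
            section = header.group(1)
            continue
        fields = line.split()
        # Data row: type, Pass(Bytes), Drop(Bytes), Pass(Packets), Drop(Packets)
        if section and len(fields) == 5 and all(f.isdigit() for f in fields[1:]):
            dropped = int(fields[4])
            if dropped > 0:
                drops.setdefault(section, {})[fields[0]] = dropped
    return drops


# Hypothetical counters, used only to exercise the parser.
SAMPLE = """\
Statistics on mainboard:
-------------------------------------------------------------------------------
Packet Type Pass(Bytes) Drop(Bytes) Pass(Packets) Drop(Packets)
-------------------------------------------------------------------------------
vrrp 0 0 0 0
-------------------------------------------------------------------------------
Statistics on slot 1:
-------------------------------------------------------------------------------
Packet Type Pass(Bytes) Drop(Bytes) Pass(Packets) Drop(Packets)
-------------------------------------------------------------------------------
vrrp 6400 12800 100 200
-------------------------------------------------------------------------------
"""

print(dropped_packets(SAMPLE))
```

A non-empty result points to the board on which CPCAR is discarding packets, which narrows down where to look for the loop.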
Do not change the intermediate devices, ports, or VLANs used for remote login; otherwise,
the switch may become unmanageable or unreachable.
Manual loop removal is required when a network storm seriously affects services and services
need to be restored as soon as possible. Three manual loop removal methods are available:
Remove a port from the VLAN where the loop is detected.
This method has little impact on the network. Table 1.1 describes commands to be
executed on ports of different types.
Port Type Command Remarks
Access undo port default vlan This command may affect services on the downstream device. Use it with caution.
Trunk undo port trunk allow-pass vlan id None.
Hybrid undo port hybrid vlan id After this command is executed, the port treats tagged and untagged packets in the same way.
1.6.4.1 Services Are Interrupted When Ports Are Not Deleted from VLAN 1
1.6.4.6 RRPP Multi-Instance Causes a Temporary RRPP Loop
Shut down the port where the loop is generated.
This method can be used to remove a loop.
Before running the shutdown command in the interface view, ensure that data service
will not be affected. That is, the devices can communicate with each other in all VLANs.
This method is used in the following cases:
1.6.2.2 Incorrect Device Connections Cause Broadcast Storm
1.6.3.3 Network Construction Causes a Loop
Remove the optical fiber from the port where the loop is occurring.
This method can be used to remove a loop.
This method is similar to shutting down the port where the loop is occurring, and is used
only when you cannot log in to the switch.
This method is used in the following case:
1.6.2.1 RRPP Does Not Take Effect on an S9300 Because a Board Is Loose
Step 3 Check whether services are recovered.
Test communication quality with operations such as ping, and check whether
services are recovered.
Generally, there are redundant links and configurations in a ring topology; therefore, services
can automatically recover after the loop is eliminated.
----End
The loop is often caused by incorrect connections of fibers or network cables. To solve the
problem, correctly connect fibers or network cables.
In the following cases, loopback on a single interface occurs:
1.6.4.10 loopback internal Causes a Loop
1.6.5.9 Services on the RRPP Ring Consisting of CX600 and S3300 Are Interrupted
Prerequisite: No loop prevention protocol such as STP or LDT is configured on a switch, and
the local device does not have a loop.
Symptom: Traffic volume increases continuously in the inbound and outbound directions of
an interface, and a loop occurs on the downstream device.
Cause: A loopback occurs in the downlink or a self-loop occurs.
Handling Method:
a. Search for the link where a loop occurs hop by hop.
b. Disable internal loopback on an interface of the downstream device.
c. The following loops are caused by link loops:
Packets sent by an interface on the downstream device are received by the same interface, and
a loop occurs between two interfaces of a downstream device. The loop is often caused by
incorrect connections of fibers or network cables. To solve the problem, correctly connect
fibers or network cables.
In the following cases, loopback occurs on the downstream device:
1.6.1.1 MAC Address Flapping Occurs on a Non-Huawei Device
1.6.5.5 STP Flapping Occurs Because the STP Timeout Interval on the ATAE Device Is Incorrectly Calculated
Prerequisite: A loop prevention protocol such as STP, RRPP, SEP, or Smart Link (SMLK) is
configured.
Symptom: Network convergence temporarily fails, or flapping persists.
Cause: Link flapping causes protocol packets to be lost, so the protocol times out and
flaps frequently. For example:
Packets over the link are lost or error packets occur. That is, protocol packets are
discarded.
Protocol packets are discarded due to unknown unicast suppression or improper QoS
configuration.
Handling Method:
If error packets or packet loss occur, replace the faulty network cable, fiber, or
optical module.
If packets are discarded due to the suppression function, modify the suppression and QoS
configurations.
Check whether network congestion causes protocol packet loss, so that interfaces are
unblocked after a protocol timeout and form a loop. If so, the network needs to be
optimized.
In the following cases, a loop occurs between two ports, causing protocol flapping:
1.6.3.3 Network Construction Causes a Loop
1.6.7.2 The OSPF Neighbor Relationship Is Down Due to a Loop on the S Switch
1.6.5.1 A Large Number of TC BPDUs Cause an ARP Learning Error on a Modular Switch
1.6.5.8 Unknown Unicast Suppression Causes RRPP Flapping
1.6.5.7 Unicast Suppression Causes RRPP Flapping for One Hour
Prerequisite: Layer 2 network convergence is normal, and blocked port status is correctly
delivered.
Symptom: MAC flapping alarms are frequently generated on LSW 3, so a loop is suspected.
Cause: Layer 2 edge devices from some vendors, such as set-top boxes (STBs), send back
packets that they cannot process.
Handling Method: Replace the edge devices.
In the following cases, the pseudo loop is caused by abnormal packet forwarding on the
downstream device:
1.6.1.2 ATAEs Fail to Interwork with MSTP-enabled Switches Due to a Software Problem
1.6.5.10 ERPS Becomes Invalid When RTN Interconnects with an S Switch
application.
6 | Log | Suggested | Help R&D engineers check whether the problem is caused by unknown reasons. | Obtaining files through FTP or TFTP | Collect the logs recorded from 24 hours before the problem occurs till now.
7 | Diagnostic log | Suggested | Help R&D engineers check whether the problem is caused by unknown reasons. | Obtaining files through FTP or TFTP | Collect the diagnostic logs recorded from 24 hours before the problem occurs till now.
8 | STP calculation historical records (if enabled) | STP issue, Optional | Allow R&D engineers to analyze protocol calculation process. | Collecting information using commands on each device | Run the display stp history command in the hidden or diagnostic view on each device.
9 | display diagnostic-information (optional, about 3 minutes per device) | Suggested | The device-level diagnostic information allows R&D engineers to exclude unknown reasons and issues. | Collecting information using commands on each device | Complete information on each device
If physical link quality is poor, a loss of protocol packets will cause a temporary loop. Check
the link and replace the fiber or optical module.
If protocol packets are discarded due to insufficient bandwidth, expand bandwidth or
configure link aggregation to improve link reliability.
Step 3 Configure broadcast suppression to improve network robustness.
To prevent loops from occurring again, configure broadcast suppression on the ports on the
ring. Based on experience, setting the broadcast suppression rate to 5% can effectively
prevent broadcast storm. You can also set the suppression rate according to the concurrent
broadcast traffic volume on the live network.
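To see what a 5% limit means in absolute terms, the percentage can be converted into a bit rate and a worst-case frame rate. The following is a minimal sketch, not a statement of how the switch measures the limit: it assumes the percentage applies to the nominal port speed and counts the standard 20 bytes of per-frame Ethernet overhead (preamble and inter-frame gap); the function name is hypothetical.

```python
# Per-frame overhead on the wire: 8-byte preamble/SFD + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def suppression_limit(port_speed_mbps, percent, frame_bytes=64):
    """Return (bits per second, frames per second) allowed by a percent limit.

    frame_bytes is the Ethernet frame size including the FCS; 64 bytes is
    the minimum frame size and gives the highest possible frame rate.
    """
    bits_per_second = port_speed_mbps * 1_000_000 * percent // 100
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return bits_per_second, bits_per_second // bits_per_frame


bps, fps = suppression_limit(1000, 5)
print(bps, fps)  # 5% of a GE port
```

On a GE port, 5% corresponds to 50 Mbit/s, or roughly 74,000 minimum-size broadcast frames per second. Compare this with the normal broadcast volume on the live network before choosing a lower or higher value.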
Step 4 Configure QoS to ensure preferential forwarding of protocol packets.
If protocol packets cannot be promptly forwarded due to network congestion, configure QoS
to ensure a high priority of protocol packets.
Case:
1.6.5.9 Services on the RRPP Ring Consisting of CX600 and S3300 Are Interrupted
Step 5 Optimize the network structure.
Plan the access layer and aggregation layer properly.
If too many devices are located at one layer, allocate them into different domains according to
logical organization and physical locations.
Cases:
1.6.2.2 Incorrect Device Connections Cause Broadcast Storm
1.6.3.1 Improper Server Networking Causes MAC Address Flapping
----End
Network Diagram
In Figure 1.1, a firewall connects to three switches.
Symptom
MAC address 00e0-fc09-bcf9 flaps on the firewall, affecting service forwarding.
Cause Analysis
On Huawei switches, only NDP uses the MAC address 00e0-fc09-bcf9 as the source MAC
address of protocol packets. NDP is enabled by default. Therefore, the firewall reports MAC
address flapping in this scenario, which affects service forwarding on the firewall. Usually,
such MAC address flapping does not affect services (unless an action is configured for MAC
address flapping on the device).
The NDP packets are BPDUs. In the latest version, neither the switch nor the firewall learns
MAC addresses from BPDUs.
Handling Procedure
Run the ndp disable command to disable NDP globally.
Network Diagram
In Figure 1.1, ATAE devices, Switch-1, and Switch-2 form a square-shaped loop.
Symptom
After STP is enabled, STP convergence is abnormal. Both Switch-1 and ATAE-SW-8 are root
bridges; ports connecting the switches and ports connecting the ATAE devices are normally
converged. However, ports connecting Switch-1 and Switch-2 to the ATAE devices are not
normally converged.
Cause Analysis
Switch-1 is the root bridge, and its system MAC address is 4c1f-cc82-d659. The software
version of ATAE devices is V200R013SPC005, and this version has a software problem: it
cannot normally process STP packets whose root bridge MAC address ends with 59.
Handling Procedure
1. Check STP convergence on each port. The result shows that two root bridges exist.
Both Switch-1 and ATAE-SW-8 are STP root bridges.
<ATAE-SW-8> display stp brief
MSTID Port Role STP State Protection
0 GigabitEthernet0/7 DESI FORWARDING BPDU
0 GigabitEthernet0/15 DESI FORWARDING NONE //ATAE interconnection
0 GigabitEthernet0/18 DESI FORWARDING NONE //Connecting to Switch-2
----[Port18(GigabitEthernet0/18)][FORWARDING]----
Port Protocol :enabled
Port Role :CIST Designated Port
Port Priority :128
R&D engineers of ATAE confirm that the faulty ATAE uses a switching unit running the
V200R013SPC005 version. This version has a known software problem: it cannot normally
process STP packets whose root bridge MAC address ends with 59. This problem is solved in
V200R013SPC006 and later versions. After the root bridge is switched to Switch-2, MSTP
convergence becomes normal.
<ATAE-SW-8> display version
VRP (R) Software, Version 3.10, RELEASE 0010
Copyright (c) 2000-2008 HUAWEI TECH CO., LTD.
uptime is 0 week,0 day,2 hours,38 minutes
Upgrade the switching unit version of ATAE to the latest version V200R013SPC007.
1.6.1.3 RRPP Temporary Loop Occurs Because the Interface Up Time on the
Switch and CX600 Is Different
Network Diagram
In Figure 1.1, RRPP is enabled on the S5700s: S5700_1 and S5700_2 are the master nodes
of RRPP domains 1 and 2 respectively, and the other S5700s function as transit nodes.
RRPP is not enabled on the CX600s, which use different VPLS VSIs to transparently
transmit RRPP packets and data packets.
Figure 1.1 RRPP temporary loop occurs because the interface Up time on the switch and router is
different
Symptom
When the LPU in slot 1 of CX600_1 fails and CX600_1 restarts, GE1/1/1 on CX600_1 goes Up
8s or even 1 minute later than GE0/0/1 on S5700_1. After the faulty LPU restarts, a
temporary loop occurs for several seconds, which may cause service exceptions.
Cause Analysis
1. After a board on a CX600 restarts, the bottom-layer physical status becomes Up first,
regardless of whether the interface works in forced mode or auto-negotiation mode. If the
system finds that board configuration restoration is not finished, it does not
report the physical Up status to the software layer. The router interface goes Up after
one minute. Therefore, the router interface goes Up later than the switch interface.
2. The switch interface goes Up first. RRPP unblocks the interface 6s after the interface
goes Up. At this time, the router has not reported Up to the software layer. When the
software layer of the router reports Up, some data VSIs start to transparently transmit
data packets. The RRPP VSI of the router may be enabled late, or may be unable to
transparently transmit packets for a short time after it is enabled. Because the LPU of the
CX600 is busy and the RRPP VSI is not yet enabled, a temporary loop occurs. Depending on
the service configuration on the LPU of the CX600, the temporary loop may last for about
10s. If the intermediate switch receives many packets, eliminating the loop may take longer.
Handling Procedure
Optimize the CX600 so that the CX600 can rapidly report the Up event.
Network Diagram
In Figure 1.1, four S9300 switches form an RRPP ring. The slave port on the master node is
not blocked.
Symptom
The slave port of the master node on the RRPP ring network is not blocked.
Cause Analysis
The HG port on the MPU does not forward RRPP packets because the board is loose.
Handling Procedure
1. It is suspected that the RRPP delivery is abnormal.
2. Run the display diagnostic-information command to collect device information. The
command output shows that the HG port is not in the control VLAN. There is a
possibility that packets are discarded because the channel is unstable.
3. If the channel is unstable, remove and reinstall the board. If the problem is fixed, the
fault is caused by improper board connection.
4. Packet forwarding returns to normal, and the fault is fixed.
Root Cause
Connected interfaces between switches are often access interfaces, VLAN planning and
assignment are improper, and connections are complex. In this situation, incorrect connections
cause loops and the upper-layer core device is affected.
Identification Method
Focus on the solution or workaround.
Solution
1. Provide a proper network plan and VLAN assignment, reduce unnecessary connections,
and enable storm control.
2. Review the network plan if the networking is complex.
3. During network deployment and commissioning, shut down all interfaces connected to
the live network.
4. When the interface connected to the live network is restored, check whether there is
unexpected broadcast or multicast traffic on the interface for at least 20 minutes. If an
exception is detected, shut down the uplink interface.
5. If the indicator of a switch interface blinks fast or is steady on, heavy traffic may be
transmitted on the interface. Check whether there are loops.
Conclusion
None.
Network Diagram
In Figure 1.1, two servers have their NICs bound together and forward packets in load
balancing mode. The two NICs share the same IP address and MAC address.
Figure 1.1 MAC address flapping and ARP flapping on modular switches cause service
interruption
Symptom
MAC address flapping persists on a switch. ARP entries of the servers are learned on the
interconnected ports of the two switches. As a result, external access to the servers is
intermittently interrupted.
Cause Analysis
1. The ports on the two switches connected to the servers alternate between Up and Down,
and the servers' MAC addresses flap. Both the ports between the two switches and the
ports connected to the servers have learned the servers' MAC addresses.
2. When a user requests to access a server through Switch-1, Switch-1 searches for the
outbound interface according to MAC address entries. Due to MAC address flapping,
there are two outbound interfaces (downstream interface GE4/0/9y connected to server
and Eth-Trunk1 connected to another switch). If Eth-Trunk1 is selected, the packets are
sent to Switch-2. Switch-2 has learned the MAC address of server on the interface
connected to Switch-1, so Switch-2 discards the packets (depending on the Layer 2 loop
prevention mechanism).
Handling Procedure
1. The two servers are bound together in load balancing mode but connected to two
independent switches. The network is not symmetrical. It is recommended that the two
servers be configured to work in active/standby mode. This configuration will solve the
problem of MAC address flapping.
2. If load balancing between servers and cross-device networking are required, you are
advised to configure CSS on the switches and load balancing on CSS links.
Network Diagram
In Figure 1.1, two S series switches and the ATAE switching boards form an STP ring. The
two ATAE switching boards can be considered as two switches that are connected through
GE0/15 ports. Switch-1 is the root bridge and Switch-2 is the backup root bridge. Eth-Trunk 0
is created between Switch-1 and Switch-2. In normal situations, GE0/19 of ATAE slot8 is the
blocked port. Switch-1 and Switch-2 have VRRP enabled and function as gateways of the
ATAE switching boards.
Symptom
When a network fault occurs, service traffic sent by the ATAE switching boards is interrupted.
Services are temporarily recovered after Switch-1 is powered off.
Cause Analysis
On Switch-1, root protection is enabled on the ports connected to Switch-2 and ATAE slot7.
After an O&M switch with a higher priority is incorrectly connected to the network, root
protection takes effect. All ports that have root protection enabled are blocked and services are
interrupted.
Handling Procedure
After the fault occurs, check the VRRP state on Switch-1 and Switch-2. Both switches are in
the Master state, indicating that VRRP heartbeat packet forwarding is faulty.
Normally, VRRP heartbeat packets are forwarded through the Eth-Trunk between the two
switches. If the Eth-Trunk negotiation fails after the fault occurs, STP reconverges and
heartbeat packets are forwarded through the ATAE switch board.
Power on Switch-1 but do not connect it to the network. Check the configuration file of
Switch-1. The configuration file shows that STP root protection (stp root-protection) is
enabled on all ports in Up state. After receiving STP BPDUs with a higher priority, the ports
enter the Discarding state and stop forwarding packets. Because Switch-1 has been restarted,
it is unknown whether Switch-1 received BPDUs with a higher priority when the fault occurred.
Analyze the STP history calculation information of the ATAE switching board.
According to the STP history calculation information, GE0/19 of ATAE slot8 receives STP
BPDUs from the device whose MAC address is 000f-e2f6-1d18 and the priority is 0,
triggering STP recalculation.
GigabitEthernet0/19 Alte->Desi at 2011/10/29 04:38:06
{0.5489-98f5-26bf 18 4096. 5489-98f5-834d 0 4096. 5489-98f5-834d 128.18}
STP selects the root bridge according to the bridge ID (the bridge priority and MAC address).
When two devices have the same bridge priority, the device with the smaller system MAC
address has the smaller bridge ID and therefore the higher priority. When the fault occurs,
ATAE slot8 receives STP BPDUs with a higher priority (0.000f-e2f6-1d18) than the priority
(0.000f-e2f6-26bf) of the original root bridge Switch-1. As a result, ports configured with STP
root protection on Switch-1 are blocked. VRRP heartbeat packets cannot be forwarded between
Switch-1 and Switch-2. Both switches become the VRRP master and services are
interrupted.
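The bridge ID comparison described above can be sketched in a few lines of Python. This is only an illustration, not part of any switch software: the function names are hypothetical, and the bridge ID strings are the two values quoted in the text.

```python
from typing import Tuple


def parse_bridge_id(text: str) -> Tuple[int, int]:
    """Split 'priority.xxxx-xxxx-xxxx' into (priority, MAC address as an integer)."""
    priority, mac = text.split(".", 1)
    return int(priority), int(mac.replace("-", ""), 16)


def is_superior(candidate: str, current_root: str) -> bool:
    """Return True if candidate has the numerically smaller bridge ID.

    Priority is compared first; the MAC address breaks a tie.
    """
    return parse_bridge_id(candidate) < parse_bridge_id(current_root)


# Values quoted in the case: the O&M switch and the original root bridge.
om_switch = "0.000f-e2f6-1d18"
switch_1 = "0.000f-e2f6-26bf"
print(is_superior(om_switch, switch_1))
```

Because both devices use priority 0, the result depends only on the MAC address, which is why the incorrectly connected O&M switch is treated as a better root than Switch-1.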
It is found that 000f-e2f6-1d18 is the system MAC address of the O&M switch connected to
GE0/17. The switch is incorrectly connected to the network when the fault occurs.
Disable STP on ports that are not added to the STP ring on the ATAE switching board.
Network Diagram
After the network in Figure 1.1 is restructured and migrated, the original core devices
(Layer 3 devices) are re-deployed as access devices AS (Layer 2 devices). VRRP is configured
on DS_01 and DS_02.
Symptom
Ping the management IP address of the AS on the Layer 3 device DS. The command output
shows that the ping fails and the VRRP group status of the DS frequently alternates between
master and backup.
The following traps are reported on DS_02:
Sep 17 2013 21:46:11+08:00 DS_02 VRRP/3/VRRPMASTERDOWN:OID
1.3.6.1.4.1.2011.5.25.127.2.30.1 The state of VRRP changed from master to other
state.(VrrpIfIndex=143, VrId=48, IfIndex=143, IPAddress=11.91.127.239,
NodeName=DS_02, IfName=Vlanif948, CurrentState=2, ChangeReason=priority
calculation)
The VRRP group status frequently alternates. Check the VRRP group status after the
switchover. All VRRP groups are in Backup state.
<DS_02> display vrrp brief
VRID  State    Interface   Type     Virtual IP
--------------------------------------------------------
3     Backup   Vlanif903   Normal   10.93.4.30
5     Backup   Vlanif599   Normal   11.91.127.94
14    Backup   Vlanif914   Normal   10.93.41.126
24    Backup   Vlanif924   Normal   10.93.32.126
25    Backup   Vlanif925   Normal   10.93.32.254
…………
Cause Analysis
A loop exists on the network.
Handling Procedure
1. Run the display cpu-defend vrrp statistics all command to check statistics of the VRRP
packets. The command output shows a large number of packets are dropped on DS_02.
[DS_02] display cpu-defend vrrp statistics all
Statistics on mainboard:
-------------------------------------------------------------------------------
vrrp 0 0 0 0
-------------------------------------------------------------------------------
Statistics on slot 1:
-------------------------------------------------------------------------------
vrrp 0 0 0 0
-------------------------------------------------------------------------------
Statistics on slot 4:
-------------------------------------------------------------------------------
2. Run the display interface brief command to check interface bandwidth usage.
[DS_02] display interface brief
…………
Interface PHY Protocol InUti OutUti inErrors outErrors
Eth-Trunk1 up up 31% 31% 0 0
GigabitEthernet4/0/22 up up 0.72% 81% 0 0
GigabitEthernet4/0/23 up up 81% 0.73% 2 0
Ethernet0/0/0 down down 0% 0% 0 0
…………
GigabitEthernet4/0/0 up up 0% 81% 0 0
GigabitEthernet4/0/1 up up 0% 81% 0 0
GigabitEthernet4/0/2 up up 0% 81% 2 0
GigabitEthernet4/0/3 up up 0% 81% 0 0
GigabitEthernet4/0/4 up up 0% 81% 0 0
GigabitEthernet4/0/5 up up 0% 81% 0 0
GigabitEthernet4/0/6 up up 0% 81% 0 0
GigabitEthernet4/0/7 up up 0% 81% 0 0
GigabitEthernet4/0/8 up up 0% 82% 0 0
GigabitEthernet4/0/9 up up 0% 82% 0 0
GigabitEthernet4/0/10 up up 0% 82% 0 0
GigabitEthernet4/0/11 down down 0% 0% 0 0
GigabitEthernet4/0/12 up up 0% 82% 0 0
GigabitEthernet4/0/13 up up 0% 82% 0 0
GigabitEthernet4/0/14 up up 0% 82% 0 0
GigabitEthernet4/0/15 up up 0% 82% 0 0
GigabitEthernet4/0/16 up up 0% 82% 0 0
GigabitEthernet4/0/17 up up 0.01% 82% 0 0
GigabitEthernet4/0/18 up up 82% 0% 0 0
GigabitEthernet4/0/19 up up 87% 82% 0 0
GigabitEthernet4/0/20 down down 0% 0% 0 0
GigabitEthernet4/0/21 up up 0.01% 0.01% 0 0
LoopBack500 up up(s) 0% 0% 0 0
NULL0 up up(s) 0% 0% 0 0
Vlanif599 up up -- -- 0 0
…………
As shown in the preceding information, the outbound traffic on the interface connecting
to the AS reaches 80%, which indicates that a loop occurs. The inbound traffic on
GigabitEthernet4/0/18 and GigabitEthernet4/0/19 also reaches 80%, which indicates that
the loop occurs on the AS devices connected to the two interfaces. Manually shut down
the two interfaces, and then check CPU-defend statistics and ping the management IP
address of another AS. The number of dropped VRRP packets stops increasing and the ping
command succeeds.
3. Interfaces GigabitEthernet4/0/18 and GigabitEthernet4/0/19 connect to AS_03 and
AS_05 respectively. Both are non-Huawei Layer 3 devices on which STP is disabled.
When the two devices were re-deployed as Layer 2 devices, STP was not enabled on
them, resulting in the loop.
Enable STP on the two devices, and then enable GigabitEthernet4/0/18 and
GigabitEthernet4/0/19 on the DS. Check the STP status and the traffic on the interfaces
to confirm that services are recovered.
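The utilization check in step 2 can be automated when the command output is saved as text. The following Python sketch is an illustration only: the function names and the 80% threshold are assumptions, and the sample rows are a subset of the output shown above. Rows whose utilization columns are not percentages (for example Vlanif interfaces showing "--") are skipped.

```python
import re
from typing import Dict, List, Tuple

# Interface, PHY, Protocol, InUti, OutUti, inErrors, outErrors
ROW = re.compile(
    r"^(\S+)\s+(\S+)\s+(\S+)\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+(\d+)\s*$"
)


def parse_utilization(output: str) -> Dict[str, Tuple[float, float]]:
    """Map interface name to (inbound %, outbound %) for each parsable row."""
    usage: Dict[str, Tuple[float, float]] = {}
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            usage[match.group(1)] = (float(match.group(4)), float(match.group(5)))
    return usage


def high_inbound(usage: Dict[str, Tuple[float, float]],
                 threshold: float = 80.0) -> List[str]:
    """Return interfaces whose inbound utilization reaches the threshold."""
    return sorted(name for name, (inbound, _) in usage.items() if inbound >= threshold)


SAMPLE_OUTPUT = """\
Interface PHY Protocol InUti OutUti inErrors outErrors
Eth-Trunk1 up up 31% 31% 0 0
GigabitEthernet4/0/0 up up 0% 81% 0 0
GigabitEthernet4/0/17 up up 0.01% 82% 0 0
GigabitEthernet4/0/18 up up 82% 0% 0 0
GigabitEthernet4/0/19 up up 87% 82% 0 0
GigabitEthernet4/0/21 up up 0.01% 0.01% 0 0
Vlanif599 up up -- -- 0 0
"""

print(high_inbound(parse_utilization(SAMPLE_OUTPUT)))
```

Interfaces with high inbound utilization are the likely entry points of looped traffic, while high outbound utilization on many ports only shows where the flood is being replicated.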
Network Diagram
In Figure 1.1, SwitchA, SwitchB, SwitchC, SwitchG, SwitchF, and SwitchE form SEP
Segment 1, while SwitchC, SwitchD, and SwitchE form SEP Segment 2.
Figure 1.1 SEP deletion on a faulty port causes the switch to be out of management
Symptom
The link between SwitchC and SwitchD is faulty. After the SEP configuration is deleted on
the faulty port of SwitchD, SwitchD is out of management.
Cause Analysis
When the link between SwitchC and SwitchD is faulty, the blocked port in SEP Segment 2 is
unblocked. The two faulty ports are in Discarding state. After the SEP configuration is
deleted on the faulty port of SwitchD, SEP Segment 2 selects a new blocked port from the two
connected ports of SwitchD and SwitchE. Both links connecting SwitchD to SwitchC and
SwitchE fail. As a result, SwitchD cannot be managed.
Handling Procedure
Run the display sep topology segment segment-id command to view the current topology
information and locate the faulty port.
<SwitchD> display sep topology segment 2
SEP segment 2
SEP detects a segment failure that may be caused by an incomplete topology
-----------------------------------------------------------------
System Name Port Name Port Role Port Status
-----------------------------------------------------------------
SwitchE GE0/0/3 secondary forwarding
SwitchC GE0/0/1 common forwarding
SwitchD GE0/0/2 common discarding
When deleting an SEP configuration in an open ring scenario, you are advised to delete the
configuration from one end of the open ring. When only one SEP-enabled port is left, shut
down the port and then delete the SEP configuration on the port.
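The Port Status column in the topology output is what identifies the faulty port. As a minimal sketch (not a Huawei tool), the following Python helper scans saved display sep topology output for ports in the discarding state. The function name is hypothetical, and the sketch assumes that every data row consists of exactly four whitespace-separated fields, as in the output above.

```python
from typing import List, Tuple


def discarding_ports(output: str) -> List[Tuple[str, str]]:
    """Return (system name, port name) for every row whose status is discarding."""
    ports = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows: system name, port name, port role, port status.
        if len(fields) == 4 and fields[3].lower() == "discarding":
            ports.append((fields[0], fields[1]))
    return ports


SAMPLE_TOPOLOGY = """\
SEP segment 2
SEP detects a segment failure that may be caused by an incomplete topology
-----------------------------------------------------------------
System Name Port Name Port Role Port Status
-----------------------------------------------------------------
SwitchE GE0/0/3 secondary forwarding
SwitchC GE0/0/1 common forwarding
SwitchD GE0/0/2 common discarding
"""

print(discarding_ports(SAMPLE_TOPOLOGY))
```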
1.6.4 Misconfigurations
1.6.4.1 Services Are Interrupted When Ports Are Not Deleted from VLAN 1
Network Diagram
In Figure 1.1, the switch connects to routers through dual links and connects to downlink
access devices.
Symptom
All services on the dual links of the switch are interrupted. After the switch restarts, services
are restored for a short time. Then the problem occurs again.
Cause Analysis
A loop occurs on the access network, causing a broadcast storm. As a result, the bandwidth
of the uplink ports on the switch is exhausted and the OSPF neighbor relationship goes Down.
After the switch restarts, the broadcast storm is eliminated and services are restored. When
the broadcast storm recurs, the fault occurs again.
Handling Procedure
1. Check the log file. The log file shows that the OSPF neighbor relationship is Down
because the remote device does not receive OSPF Hello packets in a timely manner.
NBR_CHG_DOWN(l): Neighbor event:neighbor state changed to Down. (ProcessId=88,
NeighborAddress=x.x.x.x, NeighborEvent=KillNbr, NeighborPreviousState=Loading,
NeighborCurrentState=Down)
2. Check the diagnostic log file. The file shows that there is abnormal traffic on the
interfaces. There are alarms about outgoing traffic on GE1/0/0 and GE1/0/1, and alarms
about incoming traffic on GE1/0/3 and GE1/0/4.
3. Analyze the configurations of interfaces where there are alarms about abnormal traffic.
These interfaces all join VLAN 1. Traffic from VLAN 1 on GE1/0/3 and GE1/0/4 is
broadcast to all other interfaces. As a result, the outgoing traffic on uplink interfaces is
abnormal and OSPF Hello packets are discarded. A loop has occurred in VLAN 1. After
GE1/0/3 and GE1/0/4 are deleted from VLAN 1, the fault is rectified.
Conclusion
Loops often occur in VLAN 1. If traffic on interfaces is abnormal, check the configurations of
the interfaces and check whether the interfaces are in VLAN 1. In addition, pay attention to
the number of broadcast packets on the interface.
Network Diagram
In Figure 1.1, the switches run V100R005C01SPC100 and have global STP enabled. The
switches connect to multiple Cisco switches to constitute multiple STP rings.
Figure 1.1 Services are interrupted because switch ports are not configured with bpdu enable
Symptom
When services are interrupted, log in to the switches. There are many broadcast packets on
interconnected ports and loops occur.
Root Cause
The following configuration shows that global STP is enabled on the two switches but bpdu
enable is not configured on interconnected ports.
#
interface GigabitEthernet0/0/4
port link-type access
port default vlan 10
loopback-detect enable
undo ntdp enable
undo ndp enable
#
On the switch ports enabled with Layer 2 protocols such as STP and LACP, the bpdu enable
command needs to be configured so that received protocol packets can be sent to the CPU for
processing. Otherwise, protocol packets are discarded and protocol negotiation cannot be
implemented.
Handling Procedure
There are loops on the network. First check whether STP convergence is normal. When there
is no blocked port in the STP ring, run the display stp interface command to check the role
of the port in the spanning tree and check whether STP BPDUs are received and sent
normally. For example:
Port Role :Designated Port
Port Priority :128
Port Cost(Dot1T ) :Config=auto / Active=20000 //Path cost of the port
Designated Bridge/Port :4096.5489-98f5-a433 / 128.34 //Specified bridge ID
BPDU Sent :726
TCN: 0, Config: 0, RST: 0, MST: 726
BPDU Received :0
TCN: 0, Config: 0, RST: 0, MST: 0
If interconnected STP-enabled ports are designated ports, STP negotiation fails. Check
whether bpdu enable is configured on the ports. If bpdu enable is not configured, configure
bpdu enable on the ports that participate in STP calculation.
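The key signal in the display stp interface output above is a port that sends BPDUs but receives none. The following Python sketch shows one way to check this in saved output. It is an illustration only: the function names are hypothetical, and the field labels are taken from the sample output in this case.

```python
import re
from typing import Tuple


def bpdu_counters(output: str) -> Tuple[int, int]:
    """Extract the (sent, received) BPDU totals from display stp interface output."""
    sent = re.search(r"BPDU Sent\s*:\s*(\d+)", output)
    received = re.search(r"BPDU Received\s*:\s*(\d+)", output)
    if sent is None or received is None:
        raise ValueError("BPDU counters not found in output")
    return int(sent.group(1)), int(received.group(1))


def sends_but_never_receives(output: str) -> bool:
    """True when the port has sent BPDUs but has received none."""
    sent, received = bpdu_counters(output)
    return sent > 0 and received == 0


SAMPLE_PORT = """\
Port Role :Designated Port
BPDU Sent :726
TCN: 0, Config: 0, RST: 0, MST: 726
BPDU Received :0
TCN: 0, Config: 0, RST: 0, MST: 0
"""

print(sends_but_never_receives(SAMPLE_PORT))
```

A port flagged by this check is a candidate for a missing bpdu enable configuration, but the same pattern also appears on ports whose peer does not run STP, so the interface configuration still needs to be checked.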
Conclusion
For modular switches, including the X7 series modular switches, bpdu enable does not need
to be configured on the ports that participate in STP calculation. bpdu disable or bpdu bridge
disable is configured by default.
For fixed switches of versions earlier than V100R006, bpdu enable needs to be configured on
the ports that participate in STP calculation. Otherwise, the switch does not process received
STP BPDUs. For fixed switches of V100R006 and later versions, bpdu enable is configured
on ports by default.
1.6.4.3 Switch Ports Connected to Terminals Are Not Configured as STP Edge
Ports. When Booting from the Network Adapter, Some Terminals Cannot Obtain IP
Addresses
Network Diagram
In Figure 1.1, PCs connect to switches and obtain IP addresses through DHCP.
Figure 1.1 PCs booting from network adapters fail to obtain IP addresses
Symptom
When some types of terminals (such as Lenovo PCs) start, they cannot obtain IP addresses
from the DHCP server and cannot go online.
Cause Analysis
The switches connected to terminals run STP but the connected ports are not configured as
STP edge ports.
When the problematic terminals boot from network adapters, the corresponding switch ports
alternate between Up and Down. The terminals then send four messages to request IP
addresses. Because the ports are not configured as STP edge ports, each port state change
triggers STP to recalculate the network topology. The network convergence takes
about 30s. During this period, the ports cannot forward traffic and therefore discard the
request messages from the terminals. The terminals send only four request messages and
treat the attempt as failed if they receive no response to any of them. Therefore, the
terminals cannot obtain IP addresses.
Handling Procedure
1. Confirm that STP is enabled on the switch and that the switch ports connected to the
terminals are not configured as STP edge ports.
2. Confirm that the switch ports connected to the terminals alternate between Up and Down
when the terminals boot from the network adapter.
3. Run the stp edged-port enable command to configure switch ports connected to the
terminals as edge ports.
1.6.4.4 STP Convergence Cannot Be Adjusted in Other MSTIs But Not MSTI 0
Because MST Region Configurations Are Different
Network Diagram
In Figure 1.1, Switch-1 and Switch-2 are connected through GE0/0/20, GE0/0/23, and
GE0/0/24. GE0/0/20 is added to VLAN 99 and VLAN 101, GE0/0/23 is added only to
VLAN 99, and GE0/0/24 is added only to VLAN 101. VLAN 99 maps to MSTI 1, and VLAN
101 maps to MSTI 2.
Symptom
The STP convergence result on two switches is as follows:
<Switch-1> display stp brief
MSTID Port Role STP State Protection
0 GigabitEthernet0/0/20 DESI FORWARDING NONE
0 GigabitEthernet0/0/23 DESI FORWARDING NONE
0 GigabitEthernet0/0/24 DESI FORWARDING NONE
1 GigabitEthernet0/0/20 DESI FORWARDING NONE
1 GigabitEthernet0/0/23 DESI FORWARDING NONE
2 GigabitEthernet0/0/20 DESI FORWARDING NONE
2 GigabitEthernet0/0/24 DESI FORWARDING NONE
<Switch-2> display stp brief
MSTID Port Role STP State Protection
0 GigabitEthernet0/0/20 ROOT FORWARDING NONE
0 GigabitEthernet0/0/23 ALTE DISCARDING NONE
0 GigabitEthernet0/0/24 ALTE DISCARDING NONE
GE0/0/20 is in the forwarding state in both MSTI 1 and MSTI 2. The customer requires that
the STP status of GE0/0/20 differ between MSTIs. After the costs of GE0/0/20 in the
different MSTIs are adjusted, the status remains unchanged.
Cause Analysis
On the two switches, the MST region names are different. That is, the two switches belong to
different regions. STP or RSTP is used between different regions for them to converge. The
convergence result of MSTI 0 takes effect for all MSTIs.
Handling Procedure
After MSTP multi-instance is configured on the two switches, convergence can be performed
in each MSTI. The convergence result shows that the configurations are correct. The
convergence results in MSTIs 1 and 2 are the same as the convergence result in MSTI 0.
Check whether the two switches are in the same MST region.
The MST region configurations on the two switches are as follows:
Switch-1:
stp region-configuration
region-name vlan101
instance 1 vlan 101
instance 2 vlan 99
active region-configuration
Switch-2:
stp region-configuration
region-name vlan99
instance 1 vlan 101
instance 2 vlan 99
active region-configuration
According to the preceding configurations, the MST region names are different. Two devices
belong to the same MST region only when they have the same MST region name, mapping
between MSTIs and VLANs, format selector, and revision level.
Convergence can be performed independently in MSTIs of the same MST region. Configure
the same MST region name for two switches and adjust the costs of GE0/0/20 in different
MSTIs so that the STP status of GE0/0/20 in different MSTIs is different.
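The four conditions for two devices to share an MST region can be checked mechanically. The following Python sketch is an illustration only: the class and function names are hypothetical, the region names and VLAN mappings are copied from the configurations above, and the revision level and format selector are assumed to be 0 on both switches because the configurations do not set them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MstRegion:
    region_name: str
    vlan_to_instance: Dict[int, int]
    revision_level: int = 0   # assumption: not set in the configurations shown
    format_selector: int = 0  # assumption: not set in the configurations shown


def region_differences(a: MstRegion, b: MstRegion) -> List[str]:
    """List the attributes that prevent two devices from sharing an MST region."""
    diffs = []
    if a.region_name != b.region_name:
        diffs.append("region name")
    if a.vlan_to_instance != b.vlan_to_instance:
        diffs.append("VLAN-to-MSTI mapping")
    if a.revision_level != b.revision_level:
        diffs.append("revision level")
    if a.format_selector != b.format_selector:
        diffs.append("format selector")
    return diffs


switch_1 = MstRegion("vlan101", {101: 1, 99: 2})
switch_2 = MstRegion("vlan99", {101: 1, 99: 2})
print(region_differences(switch_1, switch_2))
```

An empty list means the two devices can converge independently in each MSTI. Any entry in the list means they interact as separate regions, so only the MSTI 0 result applies between them.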
Conclusion
By default, S series switches use the system MAC address as the region name, for example,
00d0d0c7ec77.
When a switch works in MSTP mode and several MSTIs are configured in an MST region,
pay attention to the MST region configuration and to the MSTI that is mapped to each VLAN
the interface has joined.
1.6.4.5 Inconsistent MSTP Packet Formats Cause Ports to Be Down
Network Diagram
In Figure 1.1, Switch-1, Switch-2, and two H3C S6500s form an MSTP ring.
Figure 1.1 Inconsistent MSTP packet formats on two ends cause ports to be Down
Symptom
After Switch-1 restarts and goes online again, GE0/0/4 of S6500-1 automatically goes down.
You need to run the undo shutdown command to manually recover the port. The following
alarm message is printed:
%Jul 5 08:13:33 2011 S6500-1 L2INF/5/PORT LINK STATUS CHANGE:
GigabitEthernet0/0/4: is UP
%Jul 5 08:13:42 2011 S6500-1 MSTP/3/BPDUFORMATERROR:Port GigabitEthernet0/0/4
received different format of MSTP BPDU packets continually! Shut down it in order
to voiding broadcast
%Jul 5 08:13:43 2011 S6500-1 L2INF/5/PORT LINK STATUS CHANGE:
GigabitEthernet0/0/4: is DOWN
Cause Analysis
The ports between the Huawei switches and the S6500s retain their default MSTP packet
formats, but the default formats of the two device types differ. As a result, the ports on the
S6500s are shut down.
The default stp compliance implementation mode on ports of Huawei switches is auto, in
which the ports send packets in dot1s format. However, the H3C S6500s send packets in
legacy format by default. After going Up, the port of an S6500 consecutively sends three
legacy packets. After going Up, the port of a Huawei switch sends one dot1s packet. The
S6500 replies with a dot1s packet and the Huawei switch replies to the legacy packet of the
S6500. After that, the S6500 and the Huawei switch exchange dot1s packets.
The S6500 uses a special mechanism to check the packet format: if it receives three or more
legacy and dot1s packets on a port within 10 seconds, it shuts down the port.
Handling Procedure
1. When the ports go Up after a reboot of the Huawei switches, run the display stp
interface command on the S6500s to check information about the interconnected ports.
In the command output, MSTP BPDU format displays legacy.
[S6500] display stp interface
......
----[Port4(GigabitEthernet0/0/4)][FORWARDING]----
Port Protocol :enabled
Port Role :CIST Designated Port
Port Priority :128
Port Cost(Legacy) :Config=auto / Active=20
Desg. Bridge/Port :32768.000f-e2e0-5501 / 128.4
Port Edged :Config=disabled / Active=disabled
Point-to-point :Config=auto / Active=true
Transit Limit :3 packets/hello-time
Protection Type :None
Receive/Send
MSTP BPDU format :legacy
Port Config
Digest Snooping :disabled
Num of Vlans Mapped :0
PortTimes :Hello 2s MaxAge 20s FwDly 15s RemHop 0
BPDU Sent :56143
TCN: 0, Config: 0, RST: 0, MST: 56143
BPDU Received :56734
TCN: 0, Config: 0, RST: 0, MST: 56734
......
2. Run the stp compliance legacy command on the Huawei switch ports connected to the
S6500s to set the MSTP packet format to legacy.
Conclusion
When Huawei switches connect to non-Huawei devices, check whether the MSTP packet
format on the remote port is auto and whether a special check mechanism is used.
If the STP compliance mode of a Huawei switch's port is not auto, and the format of the
packets received by the port differs from the configured format, the switch prints the
following log:
MSTP/3/PACKET_ERR_COMPLIAN:The port compliance protocol type of the the packet
received by MSTP from the port [port-name] is invalid.
If a non-Huawei device is used, run the related commands provided by the manufacturer
to obtain information on the device.
3. If MSTP sets the STP status of a port incorrectly after receiving invalid packets, loops
may occur on the Layer 2 network. It is recommended that the port be shut down to
avoid broadcast storm. To view STP status and check whether loops occur, run the
display stp brief command. If loops do not exist or are removed, run the undo
shutdown command to enable the port.
Network Diagram
In Figure 1.1, SwitchA, SwitchB, SwitchC, and SwitchD form an RRPP ring. According to
data planning, the RRPP ring protects data of VLAN 10 and VLAN 20, which are added to
instance 1. The protected VLANs are bound to instance 1.
Symptom
On the preceding network, a loop occurs in VLAN 1.
Cause Analysis
1. Run the display current-configuration interface GigabitEthernet 1/0/1 command to
check configuration on ports of the RRPP ring. If the command output does not contain
undo port trunk allow-pass vlan 1, the ports all join VLAN 1 by default.
[SwitchA] display current-configuration interface GigabitEthernet 1/0/0
#
interface GigabitEthernet1/0/0
port link-type trunk
port trunk allow-pass vlan 10 20
stp disable
#
return
Handling Procedure
You can use either of the following two methods to eliminate the loop in VLAN 1.
Method 1: Add VLAN 1 to instance 1 on SwitchA, SwitchB, SwitchC, and SwitchD. SwitchA
is used as an example.
[SwitchA] stp region-configuration
Info: Please activate the stp region-configuration after it is modified.
[SwitchA-mst-region] instance 1 vlan 1 10 20
[SwitchA-mst-region] active region-configuration
Info: This operation may take a few seconds. Please wait for a moment...done.
[SwitchA-mst-region] quit
[SwitchA] display stp region-configuration
Oper configuration
Format selector :0
Region name :00e084701700
Revision level :0
Method 2: Delete VLAN 1 on ports connected to the RRPP ring if VLAN 1 is not required.
SwitchA is used as an example.
[SwitchA] interface GigabitEthernet1/0/1
[SwitchA-GigabitEthernet1/0/1] undo port trunk allow-pass vlan 1
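The rule stated in the cause analysis (a trunk port carries VLAN 1 unless undo port trunk allow-pass vlan 1 is present) can be checked against saved interface configurations. The following Python sketch is an illustration only: the function name is hypothetical, and the sample text is the SwitchA configuration shown above.

```python
def trunk_carries_vlan1(interface_config: str) -> bool:
    """True if a trunk port still allows VLAN 1 according to its configuration text."""
    lines = [line.strip() for line in interface_config.splitlines()]
    is_trunk = "port link-type trunk" in lines
    vlan1_removed = "undo port trunk allow-pass vlan 1" in lines
    return is_trunk and not vlan1_removed


BEFORE = """\
interface GigabitEthernet1/0/0
 port link-type trunk
 port trunk allow-pass vlan 10 20
 stp disable
"""

# The same port after Method 2 has been applied.
AFTER = BEFORE + " undo port trunk allow-pass vlan 1\n"

print(trunk_carries_vlan1(BEFORE), trunk_carries_vlan1(AFTER))
```

Running the check on every ring port before and after the change gives a quick confirmation that VLAN 1 has been removed from the whole ring rather than from a single node.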
Network Diagram
In Figure 1.1, SwitchA, SwitchB, SwitchC, and SwitchD form an RRPP ring. SwitchA is the
master node. SwitchB, SwitchC, and SwitchD are transit nodes.
Figure 1.1 RRPP master node has a different mode than other transit nodes
Symptom
When links among SwitchB, SwitchC, and SwitchD fail and recover, MAC and ARP entries
on transit nodes are not updated. As a result, traffic forwarding is affected.
Cause Analysis
On the RRPP master node SwitchA, the configured RRPP mode is defined by international
standards, while that on SwitchB, SwitchC, and SwitchD is defined by Huawei's standard by
default. When a transit node fails, the common or complete packet sent by the RRPP master
node SwitchA is not processed. As a result, the MAC and ARP entries are not updated, and the
traffic forwarding is affected.
Handling Procedure
1. Check whether the RRPP master node is SwitchA.
<SwitchA> display rrpp verbose domain 1
Domain Index : 1
The preceding command output shows that the RRPP mode configured globally on
SwitchA is defined by international standards (rrpp working-mode GB), while the RRPP
mode on the transit nodes is the default working mode defined by Huawei's standard.
Conclusion
All nodes on the RRPP ring must be configured with the same working mode: either the
working mode defined by international standards or that defined by Huawei's standard.
Network Diagram
In Figure 1.1, SwitchA functions as the master node of the RRPP ring. Normally, GE1/0/0 is
the primary interface, and GE2/0/0 is the secondary interface (blocked interface).
Figure 1.1 Users fail to go online on a transit node and downstream nodes connected to the transit
node
Symptom
When the primary interface on a transit node becomes Down and recovers, users on the transit
node and other downstream nodes connected to the transit node cannot go online. The fault is
rectified several minutes later.
Cause Analysis
The master and transit nodes use different RRPP working modes. The master node works in
GB mode and the transit node works in HW mode. As a result, the transit node cannot process
Flush packets of the master node.
Handling Procedure
1. Check whether the device works in GB or HW mode, that is, check whether rrpp
working-mode gb or rrpp working-mode hw is configured.
2. Run the display rrpp brief command to check the RRPP Working Mode field. Check
whether the value of the RRPP Working Mode field on nodes of the RRPP network is
the same.
<Quidway> display rrpp brief
Abbreviations for Switch Node Mode :
M - Master , T - Transit , E - Edge , A - Assistant-Edge
RRPP Protocol Status: Disable
RRPP Working Mode: HW
RRPP Linkup Delay Timer: 1 sec (0 sec default)
Number of RRPP Domains: 1
Conclusion
If this problem occurs, check whether MAC address entries and ARP entries of the device are
updated. If not, check whether the RRPP working mode is the same.
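Comparing the RRPP Working Mode field across all ring nodes can be scripted once the display rrpp brief output of each node is saved. The following Python sketch is an illustration only: the function names and node names are hypothetical, and the GB sample for the master node is constructed for the example rather than taken from a real device.

```python
import re
from typing import Dict, Optional


def working_mode(output: str) -> Optional[str]:
    """Extract the RRPP Working Mode value from display rrpp brief output."""
    match = re.search(r"RRPP Working Mode\s*:\s*(\S+)", output)
    return match.group(1).upper() if match else None


def mode_mismatch(outputs: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the per-node modes if they differ, or an empty dict if they agree."""
    modes = {node: working_mode(text) for node, text in outputs.items()}
    return modes if len(set(modes.values())) > 1 else {}


ring = {
    # Hypothetical master node output, constructed for this example.
    "master": "RRPP Protocol Status: Enable\nRRPP Working Mode: GB\n",
    # Transit node output in the format shown in the case above.
    "transit": "RRPP Protocol Status: Disable\nRRPP Working Mode: HW\n",
}

print(mode_mismatch(ring))
```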
Network Diagram
In Figure 1.1, SwitchA, SwitchB, and SwitchC constitute an RRPP ring. SwitchB is the
master node while SwitchC is the transit node. GE2/0/4 and GE1/0/5 of SwitchA allow
packets from control VLAN 2515 of the RRPP ring to pass through. VLANs in instance 0 on
SwitchB and SwitchC are configured as protected VLANs.
Symptom
The original multi-instance configuration on the master node SwitchB causes a loop in a
VLAN that is not in instance 0. As a result, many access devices cannot be managed.
Cause Analysis
1. Check the RRPP configuration on SwitchB.
2. Run the display current-configuration configuration rrpp-domain-region command
to check RRPP domain configuration.
[SwitchB] display current-configuration configuration rrpp-domain-region
#
rrpp domain 1
control-vlan 2515
protected-vlan reference-instance 0
ring 1 node-mode master primary-port GigabitEthernet0/0/1 secondary-port
GigabitEthernet0/0/2 level 0
ring 1 enable
#
return
----------------
Interface Physical
GigabitEthernet0/0/1 UP
GigabitEthernet0/0/2 DOWN
Configuration of SwitchC:
[SwitchC] display vlan 2500
VLAN ID Type Status MAC Learning
----------------------------------------------------------
2500 common enable enable
----------------
Tagged Port: GigabitEthernet0/0/1 GigabitEthernet0/0/2
----------------
Interface Physical
GigabitEthernet0/1/1 UP
GigabitEthernet0/1/2 UP
Configuration of SwitchA:
[SwitchA] display vlan 2500
VLAN ID Type Status MAC Learning Broadcast/Multicast/Unicast Property
--------------------------------------------------------------------------------
2500 common enable enable forward forward forward default
----------------
GigabitEthernet2/0/2 GigabitEthernet2/0/4
GigabitEthernet2/0/5 GigabitEthernet2/0/6
----------------
Interface Physical
GigabitEthernet2/0/0 UP
GigabitEthernet2/0/1 UP
GigabitEthernet2/0/2 UP
GigabitEthernet2/0/4 UP
GigabitEthernet2/0/5 DOWN
GigabitEthernet2/0/6 UP
Besides the ports on the ring, some ports that are not on the ring also allow packets from
VLAN 2500 to pass through. VLAN 2500 is in instance 1, but the RRPP ring protects only the
VLANs of instance 0. As a result, a loop occurs in VLAN 2500.
Handling Procedure
In this case, the RRPP ring is deployed to protect all VLANs. Instance 1 can be deleted.
SwitchB is used as an example.
[SwitchB] stp region-configuration
Info: Please activate the stp region-configuration after it is modified.
[SwitchB-mst-region] undo instance 1
[SwitchB-mst-region] active region-configuration
Info: This operation may take a few seconds. Please wait for a moment...done.
[SwitchB-mst-region] quit
[SwitchB] display stp region-configuration
Oper configuration
Format selector :0
Region name :00259e5cec21
Revision level :0
Network Diagram
In Figure 1.1, a PC connects to a switch through an L2 switch, and requests to access the
internal server.
Symptom
When a PC connected to the switch accesses an intranet server, severe packet loss occurs and
services are interrupted.
Cause Analysis
loopback internal is configured on a switch, causing MAC address flapping.
Handling Procedure
Delete loopback internal on the L2 switch.
1. Run the loop-detect eth-loop alarm-only command in the VLAN view to enable MAC
flapping detection.
2. Run the display trapbuffer command to view alarm information. Check whether a MAC
flapping alarm has been reported. The alarm information shows that MAC flapping has
occurred on GE0/0/1. Check the configurations of the downstream device.
3. Run the display current-configuration command on the L2 switch to display the interface
configurations. loopback internal has been configured on the interface. Therefore, the
server's MAC address is learned by the ports between the switch and the L2 switch.
1.6.4.11 Services Are Interrupted After Smart Link Master and Slave
Interfaces Are Switched
Network Diagram
In Figure 1.1, SwitchA is configured with Smart Link, and GE1/0/2 is the master interface and
GE1/0/3 is the slave interface.
Figure 1.1 Services are interrupted after master and slave interfaces are switched
Symptom
When GE1/0/2 on SwitchA fails, services are switched to the slave interface but are then
interrupted. Services can be recovered only after MAC address entries and ARP entries are
manually updated.
Cause Analysis
When a link failover occurs in the Smart Link group, original forwarding entries are
no longer applicable to the new topology. MAC address entries and ARP entries on the entire network
need to be updated. The Smart Link group sends Flush packets to instruct upstream devices to
update their MAC address entries and ARP entries. Upstream devices can update their MAC
address entries and ARP entries only when they are enabled to receive Flush packets. If a
device rejects Flush packets, it cannot forward packets correctly after a link failover occurs in
the Smart Link group.
The interface on SwitchD is not configured to receive Flush packets. When Flush packets of
SwitchA during the failover reach SwitchD, SwitchD does not update its ARP entries
(GE1/0/2 on SwitchA needs to be changed to GE1/0/3). In this case, traffic passing SwitchD
is still sent to the original link that has been blocked. As a result, packets cannot pass.
Handling Procedure
Check whether the smart-link flush receive control-vlan vlan-id command is configured on
interfaces (GE1/0/2 and GE1/0/3 on SwitchB, GE1/0/3 and GE1/0/4 on SwitchC, and
GE1/0/4 and GE1/0/5 on SwitchD) of active and standby links of SwitchB, SwitchC, and
SwitchD.
There is no smart-link flush receive control-vlan vlan-id command configuration on the
interfaces. Run the smart-link flush receive control-vlan vlan-id command on interfaces of
active and standby links of SwitchB, SwitchC, and SwitchD. Ensure that the control VLAN
ID and password in the Flush packet are the same as those configured on SwitchA.
You only need to configure interfaces on active and standby links between Smart Link devices
and destination devices to receive Flush packets from a specified control VLAN.
Network Diagram
In Figure 1.1, Switch-A and Switch-B are directly connected through an Eth-Trunk, and
VRRP is run on them. Switch-A is the VRRP master and Switch-B is the VRRP backup.
Switch-A and Switch-B function as a Layer 3 gateway to connect to access switches. STP is
enabled on these devices. The Layer 2 access switches directly connect to users.
Figure 1.1 A large number of TC BPDUs cause an ARP learning error on a modular switch
Symptom
An ARP learning error occurs on Switch-A. Many incomplete ARP entries exist on the switch.
Switch-A cannot learn the ARP entries of users sometimes, affecting service stability.
Cause Analysis
On the Layer 2 access switches, the stp edged-port enable command is not run on STP edge
ports. When the status of edge ports changes, a TC BPDU is sent to the VRRP group. The
VRRP group starts STP convergence, and then clears ARP entries or detects aged ARP entries.
Too many ARP entries exist in the VRRP group, so the VRRP group sends many ARP request
packets for probing and receives many ARP reply packets. The rate of ARP packets exceeds
the CIR value. As a result, some ARP reply packets are discarded and ARP entries are aged
out. Services of the corresponding users are abnormal. When the VRRP group frequently
receives such TC BPDUs, services are unstable.
Handling Procedure
1. Log in to Switch-A to view the ARP entries on VLANIF 27. This VLANIF interface
connects to servers whose users stay online for a long time. Observe the output over an
extended period. The total number of ARP entries on the interface alternates
between 50 and 20. Some ARP entries are in Incomplete state, and the affected IP
addresses change frequently. The aging time of learned ARP entries sometimes becomes 0.
<Switch-A> display arp interface vlanif 27
IP ADDRESS MAC ADDRESS EXPIRE(M) TYPE INTERFACE VPN-INSTANCE
VLAN/CEVLAN
------------------------------------------------------------------------------
132.212.4.3 0025-9e7f-fd01 I - Vlanif27
132.212.4.129 0014-38b9-73c3 0 D-0 Eth4/0/42
27/-
132.212.4.133 00e0-fc94-cddd 0 D-0 Eth4/0/42
27/-
132.212.4.203 0018-7172-5901 0 D-0 Eth4/0/42
27/-
132.212.4.107 0011-43a3-388f 0 D-0 Eth4/0/42
The switch has received TC BPDUs, and aged out ARP entries.
2. Run the display stp tc command to view the TC BPDUs received by the interface.
[Switch-A-hidecmd] display stp tc
---------- Stp Instance 0 tc or tcn count ----------
Port GigabitEthernet1/0/0 0
Port GigabitEthernet1/0/1 0
Port GigabitEthernet1/0/2 0
Port GigabitEthernet1/0/3 0
Port GigabitEthernet1/0/4 87
Port GigabitEthernet1/0/5 123
Port GigabitEthernet1/0/6 99
Port GigabitEthernet1/0/8 71
Port GigabitEthernet1/0/9 173
Port GigabitEthernet1/0/10 146
Port GigabitEthernet1/0/13 8
Port GigabitEthernet1/0/21 0
3. Analyze the log. The log shows the received TC BPDUs and ARP entry aging records.
Apr 19 2011 09:59:58 DCN_S9306_A %%01MSTP/6/RECEIVE_MSTITC(l): MSTP received
BPDU with TC, MSTP process 0 instance 0, port name is Ethernet4/0/46.
The log also shows that ARP reply packets have been discarded because the CPCAR limit
was exceeded.
Apr 19 2011 09:28:13 DCN_S9306_A %%01QOSE/4/CPCAR_DROP_LPU(l): Some packets are
dropped by cpcar on the LPU in slot 1. (Protocol=arp-reply, Drop-Count=061)
The preceding information indicates that the switch has frequently received TC BPDUs and
aged out ARP entries. The device needs to send a large number of ARP probe packets, and
user terminals return many ARP reply packets, whose rate exceeds the CIR. Therefore, most
ARP reply packets are discarded. The ARP entries are aged out and deleted, affecting services.
The TC BPDUs received by the switch are sent from downstream access switches. Access
switches are directly connected to PCs, and STP is enabled on their interfaces; however, the
stp edged-port enable command is not run. When PCs are powered on and off, many edge
ports alternate between Up and Down. The switch repeatedly sends TC BPDUs.
After the stp edged-port enable command is run on the edge ports, the problem does not
recur within several days, and user services are normal.
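The per-port counters in step 2 can be ranked offline to show which downstream interfaces deliver the most TC BPDUs. The following is a minimal sketch, assuming the output of display stp tc has been saved as text in the "Port <name> <count>" layout shown above. The function names tc_counts and busiest_ports are illustrative and are not part of any switch software.

```python
import re

# Sample lines copied from the "display stp tc" output shown in step 2.
SAMPLE = """\
---------- Stp Instance 0 tc or tcn count ----------
Port GigabitEthernet1/0/3 0
Port GigabitEthernet1/0/4 87
Port GigabitEthernet1/0/5 123
Port GigabitEthernet1/0/9 173
Port GigabitEthernet1/0/10 146
Port GigabitEthernet1/0/13 8
"""

# Assumed line layout: the word "Port", the interface name, then the counter.
LINE = re.compile(r"^Port\s+(\S+)\s+(\d+)$")


def tc_counts(text):
    """Return {port: count} for every 'Port <name> <count>' line."""
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def busiest_ports(text, minimum=1):
    """Return (port, count) pairs with count >= minimum, highest first."""
    items = [(p, c) for p, c in tc_counts(text).items() if c >= minimum]
    return sorted(items, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for port, count in busiest_ports(SAMPLE):
        print(f"{port:28} {count}")
```

Ports at the top of the list are the first candidates for checking whether the access switches behind them have terminal-facing ports that are not configured as edge ports.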
Network Diagram
None.
Symptom
1. In Figure 1.1, the CPU usage of a switch displayed on the network management system
(NMS) is high.
3. There are also logs indicating that a large number of ARP packets are discarded because
the CPCAR limit is exceeded.
Switch %%01DEFD/4/CPCAR_DROP_MPU(l)[56]:Rate of packets to cpu exceeded the
CPCAR limit on the MPU. (Protocol=arp-miss, ExceededPacketCount=016956)
Switch %%01DEFD/4/CPCAR_DROP_MPU(l)[57]:Rate of packets to cpu exceeded the
CPCAR limit on the MPU. (Protocol=arp-reply, ExceededPacketCount=020699)
Switch %%01DEFD/4/CPCAR_DROP_MPU(l)[58]:Rate of packets to cpu exceeded the
CPCAR limit on the MPU. (Protocol=arp-request, ExceededPacketCount=0574
Root Cause
Based on statistics about TC BPDUs, the number of TC BPDUs received on STP-enabled
ports is large and keeps increasing. After receiving TC BPDUs, the switch deletes MAC
address entries and updates ARP entries. The switch has to process a large number of ARP
Miss, ARP Request, and ARP Reply packets, leading to a high CPU usage. OSPF Hello
packets and VRRP heartbeat packets cannot be processed in a timely manner, resulting in
protocol flapping.
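To see which protocol contributes most to the drops, the CPCAR logs can be totaled per protocol. The following is a minimal sketch, assuming each log record has been joined onto one line and uses the Protocol=..., ExceededPacketCount=... fields shown in the Symptom. Only the two complete records from the Symptom are used as sample data. The function name dropped_by_protocol is illustrative.

```python
import re
from collections import Counter

# The two complete log records from the Symptom, each joined onto one line.
SAMPLE = (
    "Switch %%01DEFD/4/CPCAR_DROP_MPU(l)[56]:Rate of packets to cpu exceeded the "
    "CPCAR limit on the MPU. (Protocol=arp-miss, ExceededPacketCount=016956)\n"
    "Switch %%01DEFD/4/CPCAR_DROP_MPU(l)[57]:Rate of packets to cpu exceeded the "
    "CPCAR limit on the MPU. (Protocol=arp-reply, ExceededPacketCount=020699)\n"
)

# Assumed field layout, taken from the sample records only.
PATTERN = re.compile(
    r"CPCAR_DROP_MPU.*?Protocol=([\w-]+),\s*ExceededPacketCount=(\d+)"
)


def dropped_by_protocol(log_text):
    """Sum ExceededPacketCount per protocol over all matching records."""
    totals = Counter()
    for match in PATTERN.finditer(log_text):
        totals[match.group(1)] += int(match.group(2))
    return totals


if __name__ == "__main__":
    for protocol, count in dropped_by_protocol(SAMPLE).most_common():
        print(f"{protocol:12} {count}")
```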
Identification Method
1. Run the stp tc-protection command in the system view.
This command ensures that the switch updates entries once every 2 seconds when
receiving a large number of TC BPDUs. This configuration prevents a high CPU usage
caused by frequent updates of MAC address entries and ARP entries.
2. Run the arp topology-change disable and mac-address update arp commands in the
system view.
By default, the switch deletes the MAC address entries and ages out ARP entries after
receiving TC BPDUs. If there are many ARP entries on the switch, ARP entry relearning
triggers a large number of ARP packets on the network. After the arp topology-change
disable and mac-address update arp commands are configured, the switch updates the
outbound interfaces in ARP entries based on the changed outbound interfaces in the
MAC address entries upon network topology changes. The commands prevent
unnecessary ARP entry updates.
The mac-address update arp command has been available since V100R006, and the
arp topology-change disable command has been available since V200R001.
Conclusion
When deploying STP, you are advised to enable TC protection and configure all ports
connected to terminals as edge ports. These measures prevent status change of an interface
from causing flapping and re-convergence of the entire STP network. When this problem
occurs, pay attention to packet loss caused by CPCAR.
Network Diagram
None.
Symptom
On an MSTP network, an S5700 has a high CPU usage.
Cause Analysis
When network topology recalculation occurs on an MSTP network, a large number of BPDUs
indicating topology changes will be advertised. The switch will then recalculate the topology,
causing a high CPU usage.
Handling Procedure
1. Run the display interface brief command to check interface bandwidth usage.
<HUAWEI> display interface brief
…………
Interface PHY Protocol InUti OutUti inErrors outErrors
GigabitEthernet4/0/1 up up 0.72% 81% 0 0
GigabitEthernet4/0/2 up up 81% 0.73% 2 0
2. Run the display stp tc-bpdu statistics command to check the number of TC and TCN
BPDUs received and sent by each interface. The command output shows that a large
number of TC BPDUs are received.
<HUAWEI> display stp tc-bpdu statistics
-------------------------- STP TC/TCN information --------------------------
MSTID Port TC(Send/Receive) TCN(Send/Receive)
0 GigabitEthernet4/0/1 3/2 0/0
0 GigabitEthernet1/0/10 14/9 0/0
It is difficult to locate the fault that causes topology changes. To resolve the high CPU
usage problem, perform the following operations:
− Run the arp topology-change disable command, so that ARP entries will not be
aged out or deleted when the network topology changes.
− Run the mac-address update arp command, so that the switch will update
outbound interfaces in ARP entries when outbound interfaces in MAC address
entries change.
The mac-address update arp command has been available since V100R006, and the
arp topology-change disable command has been available since V200R001.
After the preceding operations are performed, there is a noticeable decrease in the CPU
usage.
Figure 1.1 Abnormal STP convergence when the S9300 ports process BPDUs
Root Cause
The S5624 has the Neighbor Discovery Protocol (NDP) enabled in the system and on all
ports. The bpdu bridge enable command is configured on the S9300 ports connected to the
S8500 and S5600. As a result, NDP packets sent by the S5624 are looped and sent to the CPU
of the S5624. In this case, STP BPDUs cannot be processed normally.
Identification Method
Analyze the configuration of the ports on the S9300 and S8500 that connect to the S5624 and
S3328. The configurations are the same except the allowed VLANs.
The configurations of ports on the S9300 are as follows:
#
interface GigabitEthernet1/0/2
description S5600 1/0/2
port link-type trunk
port trunk pvid vlan 4080
port trunk allow-pass vlan 10 4080
stp disable
l2protocol-tunnel stp enable
bpdu bridge enable
trust upstream default
#
Reproduce the environment in the lab. After the interconnected ports of the S9300, S8500,
and S5600 become Up, the CPU usage of the S5600 rapidly reaches 100%. Obtain packets on
the S5600 ports. Many NDP packets with the destination MAC address of 0180-c200-000a
are received on the S5600 ports.
NDP is enabled on the S5600 in the system and on ports by default, so ports periodically send
NDP packets. Because the bpdu bridge enable command is configured on the primary and
secondary ports of the S9300, NDP packets that reach the S9300 are forwarded even though
the secondary port is blocked. As a result, NDP packets are looped between the
S8500 and S9300 and forwarded to the S5600 through GE1/0/2. The S5600 receives so many
NDP packets that it cannot process STP BPDUs. NDP is not enabled on the
S3328 in the system or on ports by default, so the S3328 discards received NDP packets
and its STP convergence is normal.
Solution
Delete the bpdu bridge enable command configuration on the S9300 ports that are connected
to the S8500, S3328, and S5600.
Conclusion
The bpdu enable command on the S9300 V100R002 and the bpdu bridge enable command
in V100R003 and V100R006 enable ports to forward BPDUs. Ports with this configuration
neither discard packets whose destination MAC address is the BPDU MAC address nor send
them to the CPU for processing. Instead, such packets are forwarded by hardware.
The bpdu enable or bpdu bridge enable command is not required to implement Layer 2
protocol transparent transmission. You can run the display bpdu mac-address command to
check the BPDU MAC address.
On the S2300 or S3300&5300 running V100R006 or an earlier version, the bpdu enable
command must be run on ports enabled with Layer 2 transparent transmission, except the
ports with bpdu-tunnel enable or l2protocol-tunnel enable configured; otherwise, the
related Layer 2 packets cannot be sent to the CPU.
1.6.5.5 STP Flapping Occurs Because the STP Timeout Interval on the
ATAE Device Is Incorrectly Calculated
Network Diagram
None.
Symptom
The switch connects to the ATAE device of an earlier version using STP. On the switch used
as the root bridge, the stp timer hello command is used to set the Hello time to 1s. When the
switch is briefly busy or a few packets are discarded, STP flapping occurs on the
ATAE device.
Cause Analysis
The timeout interval on the ATAE device of an earlier version is three times the Hello time
and is independent of the timer factor. When the Hello time on the root bridge is set to 1s, the
timeout interval on the ATAE device is 3s. When the switch is briefly busy or a few
packets are discarded, STP flapping easily occurs on the ATAE device.
The timeout interval on the ATAE device of the latest version is changed to be the same as
that on the switch. The timeout interval is calculated using the following formula: Hello time
x Time factor x 3. The default Hello time is 2s and the time factor is 3, so the default timeout
interval is 18s.
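The two timeout calculations described above can be compared with a short worked example. This is a minimal sketch of the arithmetic only. The function names are illustrative, and the earlier-version ATAE behavior is modeled exactly as stated in this case (three times the Hello time, ignoring the timer factor).

```python
def switch_timeout_s(hello_s, timer_factor=3):
    """Timeout on the switch: Hello time x timer factor x 3."""
    return hello_s * timer_factor * 3


def legacy_atae_timeout_s(hello_s):
    """Timeout on the earlier ATAE version: 3 x Hello time, no timer factor."""
    return hello_s * 3


if __name__ == "__main__":
    # Default Hello time of 2s and timer factor of 3 on the switch.
    print("switch, hello 2s:", switch_timeout_s(2), "s")
    # Hello time of 1s on the root bridge, as in this case.
    print("legacy ATAE, hello 1s:", legacy_atae_timeout_s(1), "s")
    # A longer Hello time of 3s widens the legacy ATAE timeout.
    print("legacy ATAE, hello 3s:", legacy_atae_timeout_s(3), "s")
```

With a Hello time of 1s, the earlier ATAE version tolerates only 3s without BPDUs, whereas the switch with default settings tolerates 18s. This gap explains why only the ATAE device flaps.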
Handling Procedure
1. Check whether the stp timer-factor command is configured on the ATAE device of an
earlier version.
2. Check whether the Hello time on the root bridge is 1s. That is, check whether the stp
timer hello 100 command is used. Here, the value 100 refers to 100 centiseconds.
3. During STP flapping, check whether the ATAE device first sends STP BPDUs with the
source MAC address of 00e0-fc09-bcf9.
Two solutions are available:
Solution 1: Upgrade the version of the ATAE device to the latest version that supports the
time factor.
Solution 2: The ATAE device still uses STP, and the stp timer hello 300 command is used on
the root bridge and secondary root bridge so that the timeout interval of the ATAE device
reaches 9s.
Conclusion
If a switch does not receive any BPDU from the upstream device within the timeout interval,
the switch considers that the upstream device fails and recalculates the spanning tree.
Sometimes, the switch cannot receive BPDUs from the upstream device for a long time
because the upstream device is busy. In this case, the switch should not recalculate the
spanning tree. Therefore, you can set a longer timeout interval on a stable network to save
network resources.
The recommended timer factor is 5 to 7 on a stable network.
1.6.5.6 RSTP Cannot Provide Fast Convergence When the S6500 Port
Connected to the S6500 Changes from Down to Up
Network Diagram
In Figure 1.1, two S6500s and the switch constitute an RSTP ring. When the network is
stable, the port connecting the switch and S6500-2 is the blocked port.
Figure 1.1 RSTP cannot provide fast convergence after the connected port between the switch and
S6500 changes from Down to Up
Symptom
Shut down the port on S6500-1 connected to the switch and restore the port to check the
RSTP fast convergence mechanism. After the link between S6500-1 and the switch recovers,
the port on S6500-1 remains in Discarding state and changes to Forwarding state after 30s.
Cause Analysis
Run the debugging stp all command to check whether there is the Agreement flag in the
Flags field. The following information shows that there is only the Proposal flag.
Port50(GigabitEthernet0/0/8) Rcvd Packet(Length: 43)
ProtocolVersionID : 02
BPDUType : 02( RST BPDU )
Flags : 0e( Proposal DESIGNATED )
Root Identifier : 0.000f-e2e0-7425
Root Path Cost : 0
Bridge Identifier : 0.000f-e2e0-7425
Port Identifier : 128.206
Message Age : 0
Max Age : 20
Hello Time : 2
Forward Delay : 15
Version 1 Length : 0
After the port between S6500-1 and the switch goes Up, the Proposal packets sent by S6500-1
do not carry the Agreement. As a result, the port cannot implement fast transition. That is, the
Proposal/Agreement mechanism does not take effect.
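The Flags value in the debugging output can be decoded bit by bit to confirm which flags are set. The following is a minimal sketch that uses the RST BPDU flag layout defined in IEEE 802.1D-2004 (bit 0 topology change, bit 1 proposal, bits 2 and 3 port role, bit 4 learning, bit 5 forwarding, bit 6 agreement, bit 7 topology change acknowledgment). The function name decode_rst_flags is illustrative.

```python
# Port role encoding carried in bits 2-3 of the RST BPDU Flags octet.
PORT_ROLES = {
    0: "UNKNOWN",
    1: "ALTERNATE_OR_BACKUP",
    2: "ROOT",
    3: "DESIGNATED",
}


def decode_rst_flags(flags):
    """Decode the one-octet Flags field of an RST BPDU into named fields."""
    return {
        "topology_change": bool(flags & 0x01),
        "proposal": bool(flags & 0x02),
        "port_role": PORT_ROLES[(flags >> 2) & 0x03],
        "learning": bool(flags & 0x10),
        "forwarding": bool(flags & 0x20),
        "agreement": bool(flags & 0x40),
        "topology_change_ack": bool(flags & 0x80),
    }


if __name__ == "__main__":
    # 0x0e is the Flags value shown in the debugging output above.
    for name, value in decode_rst_flags(0x0E).items():
        print(f"{name:20} {value}")
```

For the value 0x0e, the proposal bit is set and the port role is designated, while the agreement bit (0x40) is clear. This matches the "Proposal DESIGNATED" text in the output.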
Handling Procedure
Run the stp no-agreement-check command on the port between the switch and S6500.
Network Diagram
In Figure 1.1, after the loop on the RRPP master node is removed, the loop is generated again.
This process repeats.
Symptom
RRPP flapping persists for more than one hour. No error (such as interface flapping) is
recorded in the log, and the interfaces on the RRPP ring show no FCS error counts.
Cause Analysis
The test result shows that RRPP hello packets are discarded when unknown unicast traffic
volume on an interface increases. The RRPP ring status turns to Failed after three consecutive
packets are discarded, and is recovered when the next hello packet is received. The RRPP ring
status alternates between Failed and Complete.
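The alternation between Failed and Complete can be illustrated with a small state model. This is a simplified sketch of the behavior described above, not the actual RRPP implementation: it assumes the ring is marked Failed once three consecutive hello packets are lost and returns to Complete as soon as one hello packet arrives. The function name ring_states and the loss pattern are hypothetical.

```python
def ring_states(hello_received, fail_after=3):
    """Return the ring state after each hello interval.

    hello_received: iterable of booleans, True if the hello packet sent
    in that interval came back to the master node.
    fail_after: number of consecutive lost hellos that marks the ring Failed.
    """
    states = []
    missed = 0
    for received in hello_received:
        missed = 0 if received else missed + 1
        states.append("Failed" if missed >= fail_after else "Complete")
    return states


if __name__ == "__main__":
    # Hypothetical loss pattern: bursts of unknown unicast traffic drop
    # three hellos in a row, then one hello gets through.
    pattern = [True, False, False, False, True, False, False, False, True]
    for interval, state in enumerate(ring_states(pattern)):
        print(interval, state)
```

Each time the ring is marked Failed, the master node unblocks its secondary port, so a ring that is physically intact forms a temporary loop until the next hello packet is received.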
Handling Procedure
1. Simulate the live network in the lab. The RRPP status on S3328 is normal.
[119-S3328TP-01]display rrpp verbose domain 1
Domain Index : 1
Control VLAN : major 4091 sub 4092
Protected VLAN : Reference Instance 0
Hello Timer : 1 sec(default is 1 sec) Fail Timer : 3 sec(default is 6 sec)
RRPP Ring :1
Ring Level :0
Node Mode : Master
Ring State : Complete
Is Enabled : Enable Is Actived : Yes
Primary port : GigabitEthernet0/0/1 Port status: UP
Secondary port : GigabitEthernet0/0/2 Port status: BLOCKED
2. Send unknown unicast traffic carrying RRPP control VLAN ID from the tester to S3328.
3. RRPP flapping occurs and the recovery interval is the same as that on the live network.
Jan 2 2008 20:02:48 119-S3328TP-01 %%01RRPP/4/PFWD(l): Domain 1 Ring 1 Port
GigabitEthernet0/0/2 has been set to forwarding state.
#Jan 2 20:02:50 2008 119-S3328TP-01 RRPP/4/RNGDN:1.3.6.1.4.1.2011.5.25.113.4.2:
Domain 1 ring 1 is failed.
Network Diagram
In Figure 1.1, SwitchA functions as the master node of the RRPP ring. Normally, GE1/0/0 is
the primary interface, and GE2/0/0 is the secondary interface (blocked interface).
Symptom
After the loop on the RRPP master node is removed, the loop is generated again. This process
repeats.
Cause Analysis
Unknown unicast suppression is configured on a switch, and the destination MAC addresses
of RRPP packets are unknown unicast MAC addresses. When the volume of unknown unicast
traffic on an interface increases, RRPP packets are suppressed. As a result, the switch
considers that a fault occurs on the RRPP ring and unblocks the interface to form a loop.
Handling Procedure
1. Run the display rrpp statistics command. The command output shows that the switch
frequently sends and receives Link Down packets and the numbers of Send and Rcv
packets on the master and slave interfaces are different.
2. Check the configurations on the switch. The unicast-suppression command has been
run on the switch to suppress unknown unicast packets.
3. Run the undo unicast-suppression command in the interface view to disable unknown
unicast suppression. The fault is rectified.
1.6.5.9 Services on the RRPP Ring Consisting of CX600 and S3300 Are
Interrupted
Network Diagram
In Figure 1.1, a CX600 and two S3300 switches form an RRPP ring. Services on the ring are
interrupted.
Symptom
The port on CX600 connected to S3328 alternates between Up and Down, and a loop occurs
on the RRPP ring where the CX600 resides. In addition, the loop also occurs on other RRPP
rings connected to the CX600, causing service interruption. After the problematic RRPP rings
are manually broken, services are recovered.
Cause Analysis
Due to a poor link quality, there is a time difference between the Up time of two ports on
CX600 and S3328 after the fiber is removed and reinstalled. This causes a loss of RRPP
packets. Packets are duplicated into multiple copies and loops are quickly formed. Within a
short period (1s), traffic volume exceeds the chip scheduling capability, and the excess traffic
occupies bandwidth for service and protocol packets. The RRPP ring cannot be recovered,
affecting all RRPP rings connected to the same data VSI. As a result, services on S3300s are
abnormal.
Handling Procedure
1. No priority scheduling configuration is performed on the subinterface transparently
transmitting RRPP packets on CX600. Therefore, the CX600 treats RRPP packets as
common data packets. If traffic volume exceeds the bandwidth, RRPP packets may be
discarded randomly.
The counter information of the traffic scheduling chip TM shows that timeouts and errors have
occurred, indicating that the traffic volume was huge and exceeded the chip processing
capability. In this situation, many packets are discarded.
[A83101-CX600X8-01-diagnose]
……
SD587V_ReadReg:(0xc4420414)=0x0000b997( 32318) RBE_TIMEOUT_MCELL_CTR
SD587V_ReadReg:(0xc442046c)=0x0000712e( 43261) RBE TX CHECK ERROR CTR
……
2. When the port of CX600 connected to an RRPP ring goes Down, the master node on the
ring unblocks the secondary port. However, due to the time difference, there is a high
probability that a loop is generated within 1s. The secondary port is blocked as soon as
the next RRPP packet is received. However, many RRPP packets are discarded.
Jun 7 2010 15:38:16 A83101-CX600X8-01 %%01PHY/4/PHY_STATUS_UP(l)
[25555]:Slot=1;GigabitEthernet1/1/6 change status to up.
3. When a loop is detected, the related subinterface is blocked and a trap is reported. Loop
detection is a Huawei proprietary protocol whose packets do not carry priority
information, so the packets may be discarded. Therefore, not all loops have block records.
Jun 7 2010 15:38:28 A83101-CX600X8-01 FLD/4/TRAP:Slot=1;
1.3.6.1.4.1.2011.25.180.2.3 This interface is blocked.(PortIndex = 191,PortName =
GigabitEthernet1/1/3.99)
Jun 7 2010 15:38:29 A83101-CX600X8-01 FLD/4/TRAP:Slot=1;
1.3.6.1.4.1.2011.25.180.2.3 This interface is blocked.(PortIndex = 122,PortName =
GigabitEthernet1/1/9.99)
Jun 7 2010 15:38:30 A83101-CX600X8-01 FLD/4/TRAP:Slot=1;
1.3.6.1.4.1.2011.25.180.2.3 This interface is blocked.(PortIndex = 156,PortName =
GigabitEthernet1/1/5.99)
Network Diagram
In Figure 1.1, to improve network reliability, SwitchA and SwitchB are deployed on an RTN
network. The original chain network between RTNA and RTNB changes to a ring network.
ERPS is enabled on the RTN devices and switches.
Figure 1.1 ERPS becomes invalid when RTN interconnects with an S switch
Symptom
The RTN owner node and interfaces connected to switches are blocked.
Cause Analysis
ERPS packets of the RTN device and switch are different. EtherType is set to 0x8809 in
ERPS packets sent by the RTN device, which does not comply with protocol standards (the
standard value of EtherType is 0x8802). After receiving the ERPS packets, the switch cannot
forward them to upper-layer devices. As a result, the switch-side interface is also blocked.
Handling Procedure
1. Run the display erps verbose command to check the ERPS interface state.
<HUAWEI> display erps verbose
Ring ID : 1
Description : Ring 1
Control Vlan : 4094
Protected Instance : 1
WTR Timer Setting (min) : 5 Running (s) : 0
Guard Timer Setting (csec) : 50 Running (csec) : 0
Holdoff Timer Setting (deciseconds) : 0 Running (deciseconds) : 0
Ring State : Pending
RAPS_MEL : 7
Time since last topology change : 0 days 0h:31m:36s
--------------------------------------------------------------------------------
The RPL owner interface on the RTN side is blocked. GE2/0/36 of SwitchB is also
blocked. The ERPS port states are abnormal.
2. Run the display erps statistics command to view ERPS packet statistics, checking
whether the switch has received packets from RTN.
<SwitchB> display erps statistics
--------------------------------------------------------------------------------
Ring Port Directtion SF NR NRRB
--------------------------------------------------------------------------------
1 Eth-Trunk1 RX 0 80 0
1 Eth-Trunk1 TX 0 16 0
1 GE2/0/36 RX 0 0 0
1 GE2/0/36 TX 0 11 0
The statistics show that GE2/0/36 has not received ERPS packets from RTNB. Capture
packets on GE2/0/36 and check whether the ERPS packets sent by the RTN device differ
from those sent by the switch.
The analysis result shows that EtherType is 0x8809 in ERPS packets sent by the RTN
device and 0x8802 in ERPS packets sent by the switch. The standard value of
EtherType should be 0x8802. ERPS implementation of the RTN device does not
comply with standards, and the switch cannot forward ERPS packets sent from the RTN
device, causing a fault.
If permitted by the customer, deploy another loop prevention technology, such as STP.
Network Diagram
None.
Symptom
The switch generates a MAC address flapping alarm. Efforts are made to check for loops, but
the interface where the loop occurs fails to be located. The MAC address flapping problem
cannot be rectified.
Alarm information differs for fixed and modular switches based on versions. The following alarm
information is only used as an example.
Cause Analysis
1. A loop exists on the network.
2. There are multiple terminals with the same MAC address.
Handling Procedure
The preceding alarm information shows that the device can learn the same MAC address from
multiple interfaces. There are two possible causes: a loop exists, or there are multiple
terminals using the same MAC address. The second cause can be further divided into two
scenarios: multiple Layer 2 devices using the same MAC address, or multiple user terminals
using the same MAC address.
If there is a loop on the network, the alarm usually involves many MAC addresses. In
addition, traffic is heavy on some interfaces, with a large number of broadcast packets. If you
disable one interface where the alarm is generated, the alarm is cleared. MAC address
flapping occurs regardless of the service traffic volume.
If multiple terminals use the same MAC address, the alarm usually involves only one MAC
address or a small number of MAC addresses, and the statistics show that the number of
received and sent packets is within a normal range. Change the MAC address learning priority
for an interface. If traffic of users connected to this interface becomes abnormal, multiple user
terminals are using the same MAC address. In this case, change the MAC addresses of the
user terminals. If user traffic remains normal, some Layer 2 devices are using the same MAC
address. In this case, check the configuration of the Layer 2 devices and change their MAC
addresses.
1.6.7 Others
1.6.7.1 The S2300SI Configured with Loopback Detection Cannot Detect
Loops
Network Diagram
As shown in Figure 1.1, the S2300SI connects to the office network through the S2300EI.
Figure 1.1 Loops cannot be detected by the S2300SI configured with loopback detection
Symptom
The incoming traffic on the interface of the S2300SI connected to the S2300EI increases
continuously. There are loops on the office network, but the S2300SI configured with
loopback detection cannot detect loops.
Cause Analysis
The S2300SI has supported loopback detection since V100R006. BPDUs are sent in untagged
mode, so the downstream device is required to transparently transmit BPDUs. The S2300SI
then can detect loops. The S2300EI terminates BPDUs or sends them to the CPU for
processing. That is, LBDT packets sent by an interface of the S2300SI cannot be forwarded
by the S2300EI. As a result, the S2300SI cannot detect loops.
Handling Procedure
The S2300SI is configured with loopback detection. Check whether the downstream device
connected to the S2300SI can transparently transmit BPDUs.
Network Diagram
None.
Symptom
When a switch establishes an OSPF neighbor relationship with another device, the OSPF
neighbor relationship frequently goes Down.
Cause Analysis
The incoming and outgoing traffic rates on all Up interfaces of the switch are high. A loop
exists on the link. As a result, OSPF packets are discarded and the OSPF neighbor
relationship goes Down.
%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/7's flow is abnormal.
(Speed=1000Mbps, CurrentInSpeed=814Mbps, CurrentOutSpeed=813Mbps,
File=IFPDT_FUNC_C, Line=13072)
%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/12's flow is abnormal.
(Speed=1000Mbps, CurrentInSpeed=624Mbps, CurrentOutSpeed=813Mbps,
File=IFPDT_FUNC_C, Line=13072)
%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/13's flow is abnormal.
(Speed=1000Mbps, CurrentInSpeed=625Mbps, CurrentOutSpeed=813Mbps,
File=IFPDT_FUNC_C, Line=13072)
Handling Procedure
1. Check the interface traffic. The incoming and outgoing traffic rates on the interfaces
reach their maximum when the problem occurs.
2. Analyze logs and check whether the traffic rate is abnormal and exceeds the threshold.
3. Check the network configuration and eliminate the loop.
Conclusion
When a loop occurs on the network, many packets are looped back and traffic on many
interfaces is abnormal. If there is heavy incoming and outgoing traffic on some interfaces,
there is a high probability that loops occur.
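The ABNORMAL_FLOW logs in the Cause Analysis can be screened for interfaces where both directions are heavily loaded. The following is a minimal sketch, assuming each log record has been joined onto one line in the format shown above. The 60% threshold and the function names are assumptions made for illustration and are not values taken from the switch.

```python
import re

# The three log records from the Cause Analysis, each joined onto one line.
SAMPLE = (
    "%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/7's flow is abnormal. "
    "(Speed=1000Mbps, CurrentInSpeed=814Mbps, CurrentOutSpeed=813Mbps, "
    "File=IFPDT_FUNC_C, Line=13072)\n"
    "%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/12's flow is abnormal. "
    "(Speed=1000Mbps, CurrentInSpeed=624Mbps, CurrentOutSpeed=813Mbps, "
    "File=IFPDT_FUNC_C, Line=13072)\n"
    "%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/13's flow is abnormal. "
    "(Speed=1000Mbps, CurrentInSpeed=625Mbps, CurrentOutSpeed=813Mbps, "
    "File=IFPDT_FUNC_C, Line=13072)\n"
)

PATTERN = re.compile(
    r"Interface (\S+)'s flow is abnormal\.\s*"
    r"\(Speed=(\d+)Mbps, CurrentInSpeed=(\d+)Mbps, CurrentOutSpeed=(\d+)Mbps"
)


def parse_flows(log_text):
    """Return (interface, speed, in_rate, out_rate) tuples, rates in Mbit/s."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in PATTERN.finditer(log_text)
    ]


def suspect_interfaces(log_text, threshold=0.6):
    """Interfaces whose inbound AND outbound rates reach threshold x speed.

    The default threshold of 0.6 is an illustrative assumption.
    """
    return [
        name
        for name, speed, in_rate, out_rate in parse_flows(log_text)
        if in_rate >= threshold * speed and out_rate >= threshold * speed
    ]


if __name__ == "__main__":
    for name in suspect_interfaces(SAMPLE):
        print(name)
```

High utilization alone does not prove a loop. The result only narrows down which interfaces to inspect first.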
Network Diagram
As shown in Figure 1.1, the switch is connected to an enterprise customer through a leased
line. The switch functions as a Layer 2 aggregation switch, and the NE80 functions as the
gateway.
Figure 1.1 Network where layer 2 packet loss occurs due to loops
Symptom
The enterprise customer complains about slow service response. When you ping
an enterprise terminal from the NE80, packet loss occurs.
Cause Analysis
A loop exists on the downstream network attached to GE10/0/6. As a result, the MAC address
of the NE80 flaps between GE10/0/6 and GE12/0/0 of the switch. When GE10/0/6 learns the
MAC address of the NE80, user packets cannot be forwarded to the gateway.
Handling Procedure
1. Enable MAC address flapping detection on the switch and check alarms.
Alarm information differs for fixed and modular switches based on versions. The following alarm
information is only used as an example.
#Jul 28 09:59:34 2012 Switch L2IF/4/mac_flapping_alarm:OID
1.3.6.1.4.1.2011.5.25.42.2.1.7.12The mac-address has flap value .
(BaseTrapSeverity=0, BaseTrapProbableCause=0, BaseTrapEventType=4,
L2IfPort=549,entPhysicalIndex=1, MacAdd=0025-9e03-02f1,vlanid=107,
FormerIfDescName=GigabitEthernet12/0/0,CurrentIfDescName=GigabitEthernet10/0/6,
DeviceName= Switch)
The preceding alarm information indicates that MAC address flapping has occurred.
2. Set the NE80 MAC address to a static MAC address on GE12/0/0.
3. Eliminate the loop on the downstream network connected to GE10/0/6.
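The fields that matter in the alarm from step 1 (the MAC address, the VLAN, and the two interfaces between which the address moves) can be extracted with a short script. The following is a minimal sketch, assuming the alarm has been joined onto one line and uses the key=value layout of the example above. As noted, the alarm format differs between switch models and versions, so the field names are valid only for this sample. The function name parse_flapping_alarm is illustrative.

```python
import re

# The example alarm from step 1, joined onto one line.
SAMPLE = (
    "#Jul 28 09:59:34 2012 Switch L2IF/4/mac_flapping_alarm:OID "
    "1.3.6.1.4.1.2011.5.25.42.2.1.7.12The mac-address has flap value . "
    "(BaseTrapSeverity=0, BaseTrapProbableCause=0, BaseTrapEventType=4, "
    "L2IfPort=549,entPhysicalIndex=1, MacAdd=0025-9e03-02f1,vlanid=107, "
    "FormerIfDescName=GigabitEthernet12/0/0,CurrentIfDescName=GigabitEthernet10/0/6, "
    "DeviceName= Switch)"
)

# A key, "=", optional spaces, then a value that ends at a comma,
# a closing parenthesis, or white space.
FIELD = re.compile(r"(\w+)=\s*([^,)\s]+)")


def parse_flapping_alarm(alarm):
    """Return the MAC address, VLAN, and former/current interfaces."""
    fields = dict(FIELD.findall(alarm))
    return {
        "mac": fields["MacAdd"],
        "vlan": int(fields["vlanid"]),
        "former": fields["FormerIfDescName"],
        "current": fields["CurrentIfDescName"],
    }


if __name__ == "__main__":
    print(parse_flapping_alarm(SAMPLE))
```

In this case the result shows the NE80 MAC address moving from GE12/0/0 to GE10/0/6, which points to the downstream network attached to GE10/0/6.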
1.7 FAQ
1.7.1 Can a Switch Transparently Transmit BPDUs?
After the bpdu enable command is run on an interface, the interface sends received
BPDUs to the CPU for processing.
The local device determines whether to process BPDUs of a protocol depending on
whether the protocol is enabled. For example, whether STP BPDUs on an interface are
sent to the CPU depends on whether STP has been enabled on the interface using the stp
enable command.
After the bpdu disable command is run on an interface, the interface discards BPDUs.
By default, an interface discards received BPDUs.
To configure a switch to transparently transmit BPDUs, enable Layer 2 protocol transparent
transmission on an interface by running the l2protocol-tunnel all enable command in the
interface view. To ensure successful forwarding of packets, configure the default VLAN on
the inbound and outbound interfaces of all devices on the forwarding path.
1.7.2 What Is the Basis for STP Calculation? Will the STP Topology
Change When the Port Rate Changes?
A spanning tree is calculated based on two metrics: ID and path cost.
IDs used in STP calculation include bridge ID (BID) and port ID (PID). On an STP network,
the device with the smallest BID is elected as the root bridge. The port priority affects role
selection in a specified MSTI.
Path cost is a variable used for link selection. STP selects more "robust" links and blocks
redundant links based on path costs to prune a network into a loop-free tree topology.
Port rate is used for cost calculation. A change in port rate changes the path cost and
triggers STP recalculation.
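The relationship between port rate and path cost can be shown with the values recommended by IEEE 802.1t, which correspond to 20,000,000 divided by the link rate in Mbit/s. The following is a minimal sketch of that recommendation only. The default path cost that a given switch actually applies depends on the path cost calculation standard configured on it and may differ from these values. The function name dot1t_path_cost is illustrative.

```python
def dot1t_path_cost(rate_mbps):
    """IEEE 802.1t recommended path cost for a link rate in Mbit/s.

    The result is clamped to the 1..200000000 range used by the standard.
    """
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    cost = 20_000_000 // rate_mbps
    return max(1, min(cost, 200_000_000))


if __name__ == "__main__":
    for rate in (10, 100, 1000, 10000):
        print(f"{rate:>6} Mbit/s -> path cost {dot1t_path_cost(rate)}")
```

For example, if a port negotiates down from 1000 Mbit/s to 100 Mbit/s, the recommended cost rises from 20000 to 200000, which can change the path that STP selects.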
Fixed switches
Fixed switches (excluding S2700) of V100R003 and later versions do not support global
MAC address flapping detection. They support only VLAN-based MAC address
flapping detection and actions such as sending traps and blocking interfaces when MAC
address flapping is detected.
Run the following command in the VLAN view to enable MAC address flapping
detection:
VLAN view: loop-detect eth-loop alarm-only
Since V200R001, switches have supported global MAC address flapping detection,
VLAN whitelist, and quit-vlan action.
You can run the display rrpp statistics domain domain-id command to check statistics on RRPP
packets.