NE40E-M2 V800R021C10
Feature Description
No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.
Huawei and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the
customer. All or part of the products, services and features described in this document may not be within the
purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and
recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Bantian, Longgang
Shenzhen 518129
Website: https://www.huawei.com
Email: support@huawei.com
2022-07-08

Contents

2 VRPv8 Overview
2.2.1 Introduction
2.2.2 Architecture
3 Basic Configurations
3.2.2.1 TTY
4 System Management
4.2 VS Description
4.2.1 Overview of VS
4.2.2 Understanding VS
4.2.2.1 VS Fundamentals
4.15.4 Terms and Abbreviations for 1588v2, SMPTE-2059-2, and G.8275.1
5.2.3.1 Faults on an Intermediate Node or on the Link Connected to It - LDP FRR/TE FRR
5.4.3.1 Application of MPLS OAM in the IP RAN Layer 2 to Edge Scenario
5.5.3.1 Application of MPLS-TP OAM in the IP RAN Layer 2 to Edge Scenario
5.7.5.2 Fault Information Advertisement Between EFM and Other Modules
5.7.5.3 Fault Information Advertisement Between CFM and Other Modules
5.9.3.9 Load Balancing Based on Odd and Even MAC Addresses
5.10.2.11 Bit Error Rate-based Selection of an mLDP Tunnel Outbound Interface
5.10.3.2 Application of Bit-Error-Triggered Protection Switching in a Scenario in Which LDP LSPs Carry an IP RAN
6.3.3 Terms and Abbreviations for Transmission Alarm Customization and Suppression
7.14.3.2 Establishment of a VXLAN in Centralized Gateway Mode Using BGP EVPN
7.14.3.3 Establishment of a VXLAN in Distributed Gateway Mode Using BGP EVPN
7.14.4.1 Establishment of a Three-Segment VXLAN for Layer 3 Communication Between DCs
7.14.4.2 Using Three-Segment VXLAN to Implement Layer 2 Interconnection Between DCs
7.14.5.2 Application for Communication Between Terminal Users on a VXLAN and a Legacy Network
9 IP Services
9.3.3.1 ACLs Applied to Telnet (VTY), SNMP, FTP, and TFTP
10 IP Routing
10.2.2.16 Direct Routes Responding to L3VE Interface Status Changes After a Delay
10.2.3.2 Data Center Applications of Association Between Direct Routes and a VRRP Group
10.2.3.3 IP RAN Applications of Association Between Direct Routes and a VRRP Group
11 IP Multicast
11.11.2.6 BIERv6 Inter-AS Static Traversal and Intra-AS Automatic Traversal
12 MPLS
12.3.2.21 Support for the Creation of a Primary mLDP P2MP LSP in the Class-Specific Topology
12.4.5.5 Association Between CR-LSP Establishment and the IS-IS Overload
12.6.2.4 UNI Tunnel Calculation Using Both IP and Optical PCE Servers
12.6.2.5 SRLG Sharing Between Optical and IP Layers Within a Transport Network
13.2.2.6.5 Service Traffic Steering into an SR-MPLS BE Path Based on Flex-Algo
13.3.3.2 SRv6 Application for Cross-Domain Cloud Backbone Private Lines
15 VPN
15.3.3.1 Enlarging the Operation Scope of the Network with Limited Hops
15.3.3.3 CEs Connecting to the MPLS VPN over GRE Tunnels
15.7.3.4 Application of Route Import Between VPN and Public Network in the Traffic Cleaning Networking
15.9.2.1 Centralized Management of IP Hard-Pipe-based Leased Line Services on the NMS
15.9.3.3 Hard-Pipe-based Leased Line Services Implemented by Huawei and Non-Huawei Devices
15.13.12.6 NFVI Distributed Gateway Function (BGP VPNv4/v6 over E2E SR Tunnels)
15.13.12.7 NFVI Distributed Gateway Function (BGP EVPN over E2E SR Tunnels)
16 QoS
17 Security
17.4.3.1 Application of BGP Flow Specification on a Network with Multiple Ingresses
17.5.2.3 Man-in-the-Middle Attack, IP/MAC Spoofing Attack, and DHCP Exhaustion Attack
17.24.3.1.2 RADIUS Attributes Defined by Huawei+1.1 Protocol (Vendor ID = 2011, Attribute Number = 26)
18.8.3.6 TWAMP Light Application in Eth-Trunk Member Interface-based Measurement Scenarios
18.10.3.1.5 IFIT Application in a Scenario Where Public Network Traffic Enters an SRv6 Tunnel
19.9.3.5 User Access Dual-Device Hot Backup Configured Together with Value-Added Services
19.10.3 RADIUS Attribute Prohibition, Conversion, and Default Carrying Status
19.10.4.1.2 RADIUS Attributes Defined by Huawei+1.1 Protocol (Vendor ID = 2011, Attribute Number = 26)
19.10.4.1.3 RADIUS Attributes Defined by DSL Forum (Vendor ID = 3561, Attribute Number = 26)
19.10.4.1.4 RADIUS Attributes Defined by Microsoft (Vendor ID = 311, Attribute Number = 26)
19.10.4.1.5 RADIUS Attributes Defined by Redback (Vendor ID = 2352, Attribute Number = 26)
19.10.4.1.7 RADIUS Attributes Defined by Huawei+1.0 Protocol (Vendor ID = 2011, Attribute Number = 26)
20.2.6.3 NAT Deployment in Outbound Interface Traffic Diversion Mode for Education Network
20.2.6.10 NAT Easy IP and a GRE Tunnel Sharing an Interface IP Address
21 Value-Added-Service
2 VRPv8 Overview
Purpose
This document describes the VRP8 features in terms of their overview, architecture, and system features.
This document, together with other related documents, helps intended readers gain a deep understanding of the VRP8 features.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Commissioning engineers
Security Declaration
• Notice on Limited Command Permission
This document describes the commands used for network deployment and maintenance, but does not
cover confidential commands such as those used for production, assembly, and return-to-factory
inspection. For details about confidential commands, please submit an application.
■ When the password encryption mode is cipher, avoid setting both the start and end characters of a password to "%^%#", as this causes the password to be displayed directly in the configuration file.
■ Your purchased products, services, or features may use some of users' personal data during service
operation or fault locating. You must define user privacy policies in compliance with local laws and
take proper measures to fully protect personal data.
■ When discarding, recycling, or reusing a device, back up or delete data on the device as required to
prevent data leakage. If you need support, contact after-sales technical support personnel.
• Feature declaration
■ The NetStream feature may be used to analyze the communication information of terminal
customers for network traffic statistics and management purposes. Before enabling the NetStream
feature, ensure that it is performed within the boundaries permitted by applicable laws and
regulations. Effective measures must be taken to ensure that information is securely protected.
■ The mirroring feature may be used to analyze the communication information of terminal
customers for a maintenance purpose. Before enabling the mirroring function, ensure that it is
performed within the boundaries permitted by applicable laws and regulations. Effective measures
must be taken to ensure that information is securely protected.
■ The packet header obtaining feature may be used to collect or store some communication
information about specific customers for transmission fault and error detection purposes. Huawei
cannot offer services to collect or store this information unilaterally. Before enabling the function,
ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Special Declaration
• This document serves only as a guide. The content is written based on device information gathered
under lab conditions. The content provided by this document is intended to be taken as general
guidance, and does not cover all scenarios. The content provided by this document may be different
from the information on user device interfaces due to factors such as version upgrades and differences
in device models, board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are beyond the
scope of this document.
• The maximum values provided in this document are obtained in specific lab environments (for example,
only a certain type of board or protocol is configured on a tested device). The actually obtained
maximum values may be different from the maximum values provided in this document due to factors
such as differences in hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface numbers on devices
for configuration.
• The supported boards are described in the document. Whether a customization requirement can be met
is subject to the information provided at the pre-sales interface.
• In this document, public IP addresses may be used in feature introduction and configuration examples
and are for reference only unless otherwise specified.
• The configuration precautions described in this document may not accurately reflect all scenarios.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
DANGER: Indicates a hazard with a high level of risk which, if not avoided, will result in death or serious injury.
CAUTION: Indicates a hazard with a low level of risk which, if not avoided, could result in minor or moderate injury.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made
in earlier issues.
2.2.1 Introduction
Huawei has been dedicated to developing the Versatile Routing Platform (VRP) for the last 10-plus years to
provide improved IP routing services. The VRP has been widely applied to Huawei IP network devices,
including high-end and low-end switches and routers. As network convergence and IP orientation develop,
the VRP has also been applied to wireless and transmission devices, such as the Gateway GPRS Support Node (GGSN) and Serving GPRS Support Node (SGSN) wireless devices, and the Multi-Service Transport Platform (MSTP) and packet transport network (PTN) transmission devices.
The VRP provides various basic IP routing services and value-added services.
■ TCP
■ Multiprotocol Label Switching (MPLS) protocols, including MPLS Label Distribution Protocol (LDP)
and MPLS traffic engineering (TE)
■ Security
■ Firewall
The network devices running the VRP are configured and managed on the following universal management
interfaces:
• SNMP
• NETCONF
As a large-scale IP routing software package, the VRP has been developed based on industry standards and passes rigorous testing before each release. Major features and specifications of the VRP satisfy industry standards, including those defined by the Internet Engineering Task Force (IETF) and the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T).
The VRP software platform has also been verified by the market: so far, the VRP has been installed on more than 2,000,000 network devices. As IP technologies and hardware develop, new VRP versions are released to provide higher performance, extensibility, and reliability, and more value-added services.
Following this development trend to provide higher network reliability and to fully use the processing
capabilities of the multi-core hardware, Huawei developed the VRP8 based on pre-existing versions. The
VRP8 supports the following features:
• Distributed applications
The VRP5 is a distributed network operating system featuring high extensibility, reliability, and performance. Currently, network devices running VRP5 serve more than 50 carriers worldwide. The VRP5 provides various features, and its stability has stood the test of the market.
The VRP8 is a next-generation network operating system, which has a distributed, multi-process, and
component-based architecture. The VRP8 supports distributed applications and virtualization techniques. It
builds upon the hardware development trend and will meet carriers' exploding service requirements for the
next five to ten years.
2.2.2 Architecture
The VRP8 software can be customized for product architectures that differ significantly from its original hardware platform.
• Capacity and performance: Services based on fine-granularity distribution are simultaneously processed.
• Operation and maintenance tools: The configuration plane is separated from the control plane.
The trend is to utilize multi-main-control-board, multi-CPU, and multi-core architectures in the development of the hardware on existing core routers. The reason is that a traditional integrated OS does not support modular service deployment or processing and depends only on the processing capability of a single CPU with a single core. The second-generation OS supports coarse-granularity modules, allowing multiple protocol and service modules to process services simultaneously. These OSs, however, are incapable of processing protocol- and service-specific distributed instances and are still unable to take advantage of multi-CPU and multi-core processing capabilities. The VRP8, with its fine-granularity distributed architecture and protocol- and service-specific components, allows a device to deploy services in distributed instances and to process services simultaneously. This helps a device overcome the constraints of a single entity's processing capability and memory and take advantage of the integral hardware processing capability on the control plane, improving the sustainable extensibility of the device's performance and capacity.
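The fine-granularity distribution described above can be illustrated with a small sketch. This is not VRP8 code; the hash-based sharding and the worker count are illustrative assumptions about how protocol or service instances might be spread across CPUs and cores.

```python
import zlib

def partition(prefixes, num_workers):
    # Deterministically assign each work item (here, a route prefix)
    # to one of several worker instances, so shards can be processed
    # in parallel instead of on a single CPU core.
    shards = [[] for _ in range(num_workers)]
    for p in prefixes:
        shards[zlib.crc32(p.encode()) % num_workers].append(p)
    return shards

shards = partition(["10.0.0.0/8", "10.1.0.0/16", "192.168.0.0/24"], 4)
print(sum(len(s) for s in shards))  # 3
```

Because the assignment is a pure function of the item, any instance can recompute ownership without coordination, which is one common way distributed instances avoid a single processing bottleneck.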
On the VRP8, the data plane adopts a model-based data processing technique. A change to the forwarding model alone, with no code change, allows a new function to be implemented or an existing function to be modified on the forwarding plane, enabling quick responses to carriers' demands.
Configuration Management
As shown in Figure 1, the VRP8 management plane adopts a hierarchical architecture, consisting of the
following elements:
• Configuration tools
• Configuration data
A configuration interface layer provides various configuration tools. A configuration tool parses a
configuration request and then sends the request to a Configuration (CFG) component. The CFG component
uses a pre-defined configuration information model to perform verification, association, and generation of
configuration data. After a user commits a configuration and the configuration is successfully executed,
configuration data is saved in a central database. A process-specific APP database obtains the configuration
information from the central database.
The VRP8 supports two-phase configuration validation and configuration rollback.
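The two-phase validation and rollback workflow above can be sketched as follows. The class and method names are illustrative, not VRP8 APIs; the sketch only shows the candidate/commit/rollback semantics: edits land in a candidate copy, validation runs before anything is applied, and a committed snapshot can be restored.

```python
import copy

class ConfigSession:
    def __init__(self, running):
        self.running = running                    # committed configuration
        self.candidate = copy.deepcopy(running)   # phase 1: edits land here
        self.history = []                         # snapshots for rollback

    def set(self, key, value):
        # Phase 1: edits only touch the candidate configuration.
        self.candidate[key] = value

    def validate(self):
        # Phase 1 ends with checks before anything is applied (toy rule here).
        return all(isinstance(k, str) and v is not None
                   for k, v in self.candidate.items())

    def commit(self):
        # Phase 2: the validated candidate replaces the running configuration.
        if not self.validate():
            raise ValueError("validation failed; running config untouched")
        self.history.append(copy.deepcopy(self.running))
        self.running = copy.deepcopy(self.candidate)

    def rollback(self):
        # Restore the previously committed configuration.
        self.running = self.history.pop()
        self.candidate = copy.deepcopy(self.running)

session = ConfigSession({"hostname": "ne40e"})
session.set("mtu", 9000)
session.commit()
session.rollback()
print(session.running)  # {'hostname': 'ne40e'}
```

The key property is that a failed validation or an explicit rollback never leaves the running configuration in a half-applied state.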
Fault Management
As shown in Figure 2, the VRP8 implements fault management based on service objects. The VRP8 creates a
service object relationship model to analyze the correlation between alarms, filter out invalid alarms, and
report root alarms, speeding up fault identification.
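A minimal sketch of alarm correlation against a service object relationship model follows. The object names and parent relationships are invented for illustration; the point is that an alarm on an object whose ancestor also alarmed is treated as derived and suppressed, so only root alarms are reported.

```python
# Hypothetical dependency model: each object maps to its parent object.
parents = {"vlan10": "eth0", "bgp-peer": "vlan10", "eth0": None}

def root_alarms(alarmed):
    alarmed = set(alarmed)
    roots = []
    for obj in alarmed:
        p = parents.get(obj)
        # Walk up the dependency chain; suppress if any ancestor alarmed.
        derived = False
        while p is not None:
            if p in alarmed:
                derived = True
                break
            p = parents.get(p)
        if not derived:
            roots.append(obj)
    return sorted(roots)

print(root_alarms(["eth0", "vlan10", "bgp-peer"]))  # ['eth0']
```

Here a physical interface failure produces three alarms, but only the root cause on the interface itself is reported, which is the behavior the correlation model is meant to achieve.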
Performance Management
As shown in Figure 3, the VRP8 provides a flexible performance management mechanism. Information about
an object to be monitored, including a description of the object and a monitoring threshold, can be manually
defined on a configuration interface. The configuration data can then be delivered by the central database.
The APP component collects statistics about the configured object and sends them to a performance
management (PM) module through a PM agent. After receiving the statistics, the PM module generates
information about a fault based on the pre-defined object and monitoring threshold and then sends the
fault information to the network management system (NMS) through the fault management center.
Performance information can be viewed by running a command or through the NMS.
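The threshold-based monitoring flow above can be sketched as a simple check of collected statistics against monitored-object definitions. The object names and threshold values are made up for illustration; a fault record is produced only when a sample exceeds its configured threshold.

```python
def check_thresholds(definitions, samples):
    # Compare each monitored object's latest sample with its threshold
    # and emit a fault record for every violation.
    faults = []
    for name, threshold in definitions.items():
        value = samples.get(name)
        if value is not None and value > threshold:
            faults.append({"object": name, "value": value,
                           "threshold": threshold})
    return faults

defs = {"cpu_usage_percent": 80, "queue_drops_per_sec": 100}
stats = {"cpu_usage_percent": 93, "queue_drops_per_sec": 12}
print(check_thresholds(defs, stats))
# [{'object': 'cpu_usage_percent', 'value': 93, 'threshold': 80}]
```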
Plug-and-Play
As shown in Figure 4, VRP8 plug-and-play allows a large number of devices to be deployed on a site at a time and to be managed and maintained remotely, reducing OPEX.
3. Devices automatically apply for IP addresses and initial configurations and the DHCP server assigns IP
addresses and delivers initial configurations.
4. The devices report their presence to the NMS and the NMS detects the devices online. Then the
commissioning engineers remotely commission the devices and configure services.
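The bring-up steps above can be sketched as a tiny sequence. All names, addresses, and in-process dicts are illustrative; a real deployment uses DHCP options to deliver the configuration file location and an NMS protocol for registration, not function calls.

```python
def plug_and_play(device, dhcp_server, nms):
    # Step 3: the device requests an IP address and its initial
    # configuration; the DHCP server answers from its lease table.
    lease = dhcp_server[device]
    device_state = {"name": device,
                    "ip": lease["ip"],
                    "config": lease["config_url"]}
    # Step 4: the device reports its presence to the NMS, which marks
    # it online so engineers can commission it remotely.
    nms[device] = {"status": "online", "ip": lease["ip"]}
    return device_state

dhcp = {"ne40e-1": {"ip": "10.0.0.5",
                    "config_url": "tftp://10.0.0.1/init.cfg"}}
nms = {}
state = plug_and_play("ne40e-1", dhcp, nms)
print(nms["ne40e-1"]["status"])  # online
```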
• High extensibility
■ The VRP8 has a layered architecture with clear inter-layer dependency and interfaces and
independent intra-layer components.
■ The base framework uses the model-driven architecture technology, with stable processing
mechanisms and flexible separated service models, to rapidly respond to customers' requirements
for software features.
■ Based on the standard driver framework, hardware drivers support plug-and-play, implementing smooth hardware upgrades.
Benefits: flexible service operation, timely response to customers' requirements, and smooth hardware
upgrades
• High reliability
■ Process-based NSx helps implement seamless convergence on the forwarding plane, control plane,
and service plane.
Benefits: non-stop service operation with high reliability and reduced operation and maintenance
expenditures
• High performance
■ Services are distributed in fine granularity and processed at the same time, achieving industry-
leading performance and specification indicators.
■ Performance and specifications are expandable and can be improved along with hardware
upgrades.
Benefits: larger-scaled service deployment and faster fault convergence, full use of hardware
capabilities, and continuous improvement in performance and specifications
■ The carrier-class configuration and management plane facilitates service deployment and
maintenance.
Benefits: more effective service deployment capabilities, faster service monitoring and fault locating,
and lower operation and maintenance expenditures
3 Basic Configurations
Purpose
This document describes the basic configuration features in terms of their overview, principles, and applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Commissioning engineers
Security Declaration
• Notice on Limited Command Permission
This document describes the commands used for network deployment and maintenance, but does not
cover confidential commands such as those used for production, assembly, and return-to-factory
inspection. For details about confidential commands, please submit an application.
■ When the password encryption mode is cipher, avoid setting both the start and end characters of a password to "%^%#", as this causes the password to be displayed directly in the configuration file.
■ Your purchased products, services, or features may use some of users' personal data during service
operation or fault locating. You must define user privacy policies in compliance with local laws and
take proper measures to fully protect personal data.
■ When discarding, recycling, or reusing a device, back up or delete data on the device as required to
prevent data leakage. If you need support, contact after-sales technical support personnel.
• Feature declaration
■ The NetStream feature may be used to analyze the communication information of terminal
customers for network traffic statistics and management purposes. Before enabling the NetStream
feature, ensure that it is performed within the boundaries permitted by applicable laws and
regulations. Effective measures must be taken to ensure that information is securely protected.
■ The mirroring feature may be used to analyze the communication information of terminal
customers for a maintenance purpose. Before enabling the mirroring function, ensure that it is
performed within the boundaries permitted by applicable laws and regulations. Effective measures
must be taken to ensure that information is securely protected.
■ The packet header obtaining feature may be used to collect or store some communication
information about specific customers for transmission fault and error detection purposes. Huawei
cannot offer services to collect or store this information unilaterally. Before enabling the function,
ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Special Declaration
• This document serves only as a guide. The content is written based on device information gathered
under lab conditions. The content provided by this document is intended to be taken as general
guidance, and does not cover all scenarios. The content provided by this document may be different
from the information on user device interfaces due to factors such as version upgrades and differences
in device models, board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are beyond the
scope of this document.
• The maximum values provided in this document are obtained in specific lab environments (for example,
only a certain type of board or protocol is configured on a tested device). The actually obtained
maximum values may be different from the maximum values provided in this document due to factors
such as differences in hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface numbers on devices
for configuration.
• The supported boards are described in the document. Whether a customization requirement can be met
is subject to the information provided at the pre-sales interface.
• In this document, public IP addresses may be used in feature introduction and configuration examples
and are for reference only unless otherwise specified.
• The configuration precautions described in this document may not accurately reflect all scenarios.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
DANGER: Indicates a hazard with a high level of risk which, if not avoided, will result in death or serious injury.
CAUTION: Indicates a hazard with a low level of risk which, if not avoided, could result in minor or moderate injury.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made
in earlier issues.
• Console port
Routers support user login over console or VTY ports. You can use a console port to set user interface
parameters, such as the speed, databits, stopbits, and parity. You can also initiate a Telnet or Secure Shell
(SSH) session to log in to a VTY port.
3.2.2.1 TTY
User Management
You can configure, monitor, and maintain local or remote network devices only after configuring user
interfaces, user management, and terminal services. User interfaces provide login venues, user management
ensures login security, and terminal services provide login protocols. Routers support user login over console ports.
User Interface
A user interface is presented in the form of a user interface view for you to log in to a router. You can use
user interfaces to set parameters on all physical and logical interfaces that work in asynchronous and
interactive modes, and manage, authenticate, and authorize login users. Routers allow users to access user interfaces through console ports and VTY channels.
User Login
If a device is started for the first time and you log in to it through a console port, the system prompts you to set a password. When you log in to the device through the console port again, you must enter the correct password.
When a router is powered on for the first time, you must log in to the router through the console port, which is a
prerequisite for other login modes as well. For example, you can use Telnet to log in to a router only after you use the
console port to log in to the router and configure an IP address.
Definition
The command line interface (CLI) is an interface through which you can interact with a Router. The system
provides a series of commands that allow you to configure and manage the Router.
Purpose
The CLI is a traditional configuration tool, which is available on most data communication products.
However, as data communication products are more widely deployed worldwide, customers require a more usable, flexible, and user-friendly CLI.
Carrier-class devices have strict requirements for system security. Users must pass the Authentication,
Authorization and Accounting (AAA) authentication before logging in to a CLI or before running commands,
which ensures that users can view and use only the commands that match their rights.
• If a matching command exists, the system enters the command checking phase.
• If a matching command does not exist, the system informs you that the command is invalid and
waits for a new command.
• If all command elements are valid, the system authenticates the command.
• If any command element is invalid, the system informs you that the command is invalid and waits
for a new command.
• If you have permission to run the command, the system begins to parse the command.
• If you do not have permission to run the command, the system displays a message and waits for
a new command.
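The command-processing phases above (existence check, element check, permission check, then parsing) can be modeled with a short sketch. The command table, permission rule, and return strings are invented for illustration and do not reflect VRP8 internals.

```python
# Toy command table: first element -> set of valid second elements.
COMMANDS = {"display": {"version", "interface"}, "reset": {"counters"}}

def process(line, user_level):
    words = line.split()
    # Phase 1: does a matching command exist?
    if not words or words[0] not in COMMANDS:
        return "invalid command"
    # Phase 2: are all command elements valid?
    if len(words) < 2 or words[1] not in COMMANDS[words[0]]:
        return "invalid command element"
    # Phase 3: authenticate the command against the user's rights
    # (toy rule: "reset" requires level 3 or higher).
    if words[0] == "reset" and user_level < 3:
        return "permission denied"
    # Phase 4: the system parses and executes the command.
    return f"executing: {line}"

print(process("display version", user_level=1))  # executing: display version
print(process("reset counters", user_level=1))   # permission denied
```

Each phase either hands the command to the next phase or reports an error and waits for new input, mirroring the decision points listed above.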
• Full help
■ In any command view, when you enter a question mark (?) at the command prompt, the first elements of all commands available in the command view and their brief descriptions are listed.
■ When you enter a command followed by a space and a question mark (?), all the keywords and
their brief descriptions are listed if the position of the question mark (?) is for a keyword.
■ When you enter a command followed by a space and a question mark (?), the value range and
function of the parameter are listed if the position of the question mark (?) is for a parameter.
To provide full help in command mode, the CLI undergoes the following phases:
• If a matching command exists, the system matches commands with your permission and
displays all commands you can use.
• If a matching command does not exist, the system informs you that the command is invalid
and waits for a new command.
• If the entered command is incomplete, possible command elements and their description are
displayed.
• Partial help
■ When you enter a string followed by a question mark (?), the system lists all keywords that start
with the string.
■ If the position of the question mark (?) is for a keyword, all keywords in the command starting
with the string are listed.
■ If the position of the question mark (?) is for a parameter and the parameter is valid,
information about all the parameters starting with the string is listed, including the value
range.
■ If the position of the question mark (?) is for a parameter but the parameter is invalid, the CLI
informs you that the input is incorrect.
To provide partial help in specific command mode, the CLI undergoes the following phases:
• If a matching command exists, the system matches commands with your permission and
displays all commands you can use.
• If a matching command does not exist, the system informs you that the command is invalid
and waits for a new command.
• Tab help
Tab help is an application of partial help, which provides help only for keywords. The system does not
display the description of a keyword.
You can enter the first letters of a keyword in a command and press Tab.
■ If what you have entered identifies a unique keyword, the complete keyword is displayed.
■ If what you have entered does not identify a unique keyword, you can press Tab repeatedly to view
the matching keywords and select the desired one.
■ If what you have entered does not match any command element, the system does not modify the
input and just displays what you have entered.
■ If what you have entered is not a keyword in the command, the system does not modify the input
and just displays what you have entered.
The CLI also provides dynamic help for querying the database and script. If parameters in a command
support dynamic help and you enter the first letters of a parameter in the command and press Tab, the
following situations occur:
■ If what you have entered identifies a unique parameter, the complete parameter is displayed.
■ If what you have entered does not identify a unique parameter, you can press Tab repeatedly to
view the matching parameters and select the desired one.
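The prefix matching behind partial help and Tab help can be modeled as a simple filter over the keywords valid at the current position. The following sketch is illustrative only (the keyword list is hypothetical) and is not the device's actual implementation:

```python
def match_keywords(prefix, candidates):
    """Return all keywords valid at this position that start with the prefix."""
    return [kw for kw in candidates if kw.startswith(prefix)]

def tab_complete(prefix, candidates):
    """Model of Tab help: complete a unique match, otherwise leave the input as is."""
    matches = match_keywords(prefix, candidates)
    if len(matches) == 1:
        return matches[0]  # unique keyword: completed in full
    return prefix          # ambiguous or no match: input is not modified

# Hypothetical keywords available at some position in a command.
keywords = ["interface", "ip", "ipv6", "version"]
print(match_keywords("i", keywords))  # partial help lists all matching keywords
print(tab_complete("ve", keywords))   # unique match is completed
print(tab_complete("ip", keywords))   # ambiguous prefix: returned unchanged
```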
• User-defined shortcut key: You can associate a shortcut key with any command. When the shortcut key
is used, the system automatically executes the corresponding command.
• System shortcut key: System shortcut keys are fixed in the system. They represent fixed functions and
cannot be defined by users.
Different terminal software defines shortcut keys differently. Therefore, the shortcut keys on your terminal may be
different from those listed here.
Definition
• Configuration: a series of command operations performed on the system to meet service requirements.
These operations still take effect after the system restarts.
• Configuration file: a file used to save configurations. You can use a configuration file to view
configuration information. You can also upload a device's configuration file to other devices for batch
management.
A configuration file saves command lines in text format. (Only non-default values of command
parameters are saved in the file.) Commands are organized according to the command view framework:
commands in the same command view form a section, and sections are separated by empty lines or
comment lines. A line beginning with "#" is a comment line.
• Configuration management: a function for managing configurations and configuration files using a
series of commands.
A storage medium can save multiple configuration files. If the location of a device on the network
changes, its configurations need to be modified. To avoid reconfiguring the device, specify a
configuration file for the next startup. The device restarts with new configurations to adapt to its new
environment.
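As an illustration of the format described above, a configuration file fragment might look as follows (the contents are hypothetical; sections are separated by comment lines beginning with "#"):

```
#
sysname RouterA
#
interface GigabitEthernet0/1/0
 description to-core-switch
#
return
```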
Purpose
Configuration management allows you to lock, preview, and discard configurations, save the configuration
file used at the current startup, and set the configuration file to be loaded at the next startup of the system.
Benefits
Configuration management offers the following benefits:
Basic Principles
In two-phase validation mode, the system configuration process is divided into two phases. The actual
configuration takes effect after the two phases are complete. Figure 1 shows the two phases of the system
configuration process.
1. In the first phase, a user enters configuration commands. The system checks the data type, user level,
and object to be configured, and checks whether there are repeated configurations. If syntax or
semantic errors are found in the command line, the system displays a message on the terminal to
inform the user of the error and cause.
2. In the second phase, the user commits the configuration. The system then enters the configuration
commitment phase and commits the configuration in the candidate database to the running database.
• If the configuration takes effect, the system adds it to the running database.
• If the configuration fails, the system informs the user that the configuration is incorrect. The user
can enter the command line again or change the configuration.
In two-phase validation mode, if a configuration has not been committed, the symbol "*" is displayed in the
corresponding view (except the user view). If all configurations have been committed, the symbol "~" is displayed
in the corresponding view (except the user view).
• Running database:
A configuration set that is currently being used by the system.
• Candidate database:
For each user, the system generates a copy of the running database. Users can edit the
configuration in the candidate database and commit the edited configuration to the running database.
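A minimal model of the relationship between the two databases, assuming a key-value view of the configuration (the class and method names are illustrative, not an actual device API):

```python
class ConfigSession:
    """Toy model of two-phase validation: edit a candidate copy, then commit."""

    def __init__(self, running):
        self.running = running          # running database (used by the system)
        self.candidate = dict(running)  # per-user copy of the running database

    def set(self, key, value):
        self.candidate[key] = value     # phase 1: edits touch the candidate only

    def commit(self):
        # phase 2: commit only the entries that differ from the running database;
        # repeated configurations are not re-committed.
        changed = {k: v for k, v in self.candidate.items()
                   if self.running.get(k) != v}
        self.running.update(changed)
        return changed

running_db = {"hostname": "HUAWEI"}
session = ConfigSession(running_db)
session.set("hostname", "HUAWEI")  # identical to running: a repeated configuration
session.set("mtu", "1500")         # a real change
print(session.commit())            # only the real change reaches the running database
```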
Validity Check
After users enter the system view, the system assigns each user a candidate database. Users perform
configuration operations in their candidate databases, and the system checks the validity of each user's
configurations.
In two-phase validation mode, the system checks configuration validity and displays error messages. The
system checks the validity of the following configuration items:
• Repeated configuration
The system checks whether configurations in the candidate databases are identical to those in the
running database.
■ If configurations in the candidate databases are identical to those in the running database, the
system does not commit the configuration to the running database and displays repeated
configuration commands.
■ If configurations in the candidate databases are different from those in the running database, the
system commits the configuration to the running database.
• Data type
Benefits
The two-phase validation mode offers the following benefits:
• Clears configurations that do not take effect if an error occurs or the configuration does not meet
expectations.
Basic Concepts
• Configuration: a set of specifications and parameters about services or physical resources. These
specifications and parameters are visible to and can be modified by users.
• Configuration operation: a series of actions taken to meet service requirements, such as adding,
deleting, or modifying the system configurations.
• Configuration rollback point: Once a user commits a configuration, the system automatically generates
a configuration rollback point and saves the difference between the current configuration and the
historical configuration at this configuration rollback point.
Usage Scenario
Users can check the system running state after committing system configurations. If a fault or an
unexpected result (such as service overload, service conflict, or insufficient memory resources) derived from
misoperations is detected during the check, the system configurations must roll back to a previous version.
The system allows users to delete or modify the system configurations only one by one.
Configuration rollback addresses this issue by allowing users to restore the original configurations in batches.
• The system automatically records configuration changes each time a change is made.
• Users can specify the historical state to which the system configurations are expected to roll back based
on the configuration change history.
For example, a user has committed four configurations and four consecutive rollback points (A, B, C, and D)
are generated. If an error is found in configurations committed at rollback point B, configuration rollback
allows the system to roll back to the configurations at rollback point A.
Configuration rollback significantly improves maintenance efficiency, reduces maintenance costs, and
minimizes error risks when configurations are manually modified one by one.
Principles
As shown in Figure 1, a user committed configurations N times. Rollback point N indicates the most recent
configuration the user committed. The configuration rollback procedure is as follows:
1. The user determines to roll the system configuration back to rollback point X based on the comparison
between the historical and current configurations.
2. After the user performs the configuration rollback operation, the system rolls back to the historical
state at rollback point X and generates a new rollback point N+1, which is specially marked.
Configuration rollback works in a best-effort manner. If a configuration fails to be rolled back, the system
records the configuration.
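The rollback-point mechanism described above can be sketched as a list of committed snapshots: rolling back to point X restores that snapshot and records the rollback itself as a new point N+1. The class and method names are hypothetical:

```python
class RollbackHistory:
    """Toy model of configuration rollback points."""

    def __init__(self, initial):
        self.points = [dict(initial)]  # point 0: the initial configuration
        self.current = dict(initial)

    def commit(self, changes):
        self.current.update(changes)
        self.points.append(dict(self.current))  # each commit creates a rollback point

    def rollback_to(self, x):
        self.current = dict(self.points[x])     # restore the state at point X
        self.points.append(dict(self.current))  # the rollback itself becomes point N+1

hist = RollbackHistory({"bgp": "enabled"})
hist.commit({"ospf": "enabled"})   # rollback point 1
hist.commit({"bgp": "disabled"})   # rollback point 2 (a mistaken change)
hist.rollback_to(1)                # restore point 1; recorded as point 3
print(hist.current)                # {'bgp': 'enabled', 'ospf': 'enabled'}
```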
Benefits
Configuration rollback brings significant benefits for users in terms of configuration security and system
maintenance.
• Minimizes impact of mistakes caused by misoperations. For example, if a user mistakenly runs the undo
bgp command, Border Gateway Protocol (BGP)-related configurations (such as peer configurations) are
deleted. Configuration rollback allows the system to roll back configurations to what they were before
the user ran the undo bgp command.
• Facilitates feature testing: When a user is testing a feature, the system generates only one rollback
point if all the feature-related configurations are committed at the same time. Before the user tests
another feature, configuration rollback allows the system to roll back configurations to what they were
before the previous feature was tested, ruling out the possibility that the previous feature affects the
one to be tested.
• Functions properly regardless of whether the device restarts. A configuration rollback point remains
after a device restarts. If any change is made after the restart, the system automatically generates a
non-user-triggered configuration rollback point and saves it. Users can determine whether to roll system
configurations back to what they were before the device restarts.
Usage Scenario
Deploying unverified new services directly on live network devices may affect the current services or even
disconnect devices from the network management system (NMS). To address this problem, you can deploy
configuration trial run. Configuration trial run will roll back the system to the latest rollback point by
discarding the new service configuration if the new services threaten system security or disconnect devices
from the NMS. This function improves system security and reliability.
Principles
Configuration trial run takes effect only in two-phase configuration validation mode.
As shown in Figure 1, a user committed configurations N times. Rollback point N indicates the most recent
configuration that the user committed. The configuration trial run procedure is as follows:
In two-phase configuration validation mode, you can specify a timer for the configuration trial run to take
effect. Committing the configuration trial run is similar to committing an ordinary configuration, but the
committed configuration takes effect temporarily for the trial. Each time you commit a configuration, the
system generates a rollback point and starts the specified timer for the trial run. You cannot change the
configuration during the trial run, but you can check configurations at rollback points or perform
maintenance operations.
Before the timer expires, you can confirm or abort the tested configuration. If you confirm the tested
configuration, the timer stops and the configuration trial run ends. If you abort the configuration trial
run, the system rolls back to the latest rollback point, discarding the tested configuration.
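A simplified model of the trial-run behavior, assuming an in-memory key-value configuration (the class and timer handling are illustrative only):

```python
import time

class TrialRun:
    """Toy model of configuration trial run: a committed configuration takes effect
    temporarily and is rolled back unless confirmed before the timer expires."""

    def __init__(self, running, trial, timeout_s):
        self.rollback_point = dict(running)  # snapshot taken when the trial is committed
        self.running = running
        self.running.update(trial)           # trial configuration takes effect temporarily
        self.deadline = time.monotonic() + timeout_s
        self.done = False

    def confirm(self):
        if not self.done and time.monotonic() < self.deadline:
            self.done = True                 # timer stops; trial configuration is kept
            return True
        return False

    def abort(self):
        if not self.done:
            self.running.clear()
            self.running.update(self.rollback_point)  # discard the trial configuration
            self.done = True

running = {"acl": "old-policy"}
trial = TrialRun(running, {"acl": "new-policy"}, timeout_s=600)
trial.abort()   # operator aborts: roll back to the snapshot
print(running)  # {'acl': 'old-policy'}
```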
Usage Scenario
With the growth in network scale and complexity, network configuration becomes more complex, and
many configurations are duplicated across network devices. This function allows you to import the shared
configurations and then manually add the differing configurations, reducing the configuration workload.
Principles
You can copy the system configuration data file to a local device and then load the configuration file. After
the configuration file is loaded, you can directly commit the configuration, or edit the configuration through
the CLI before you commit it.
After the configuration file is loaded, the configuration in the file overwrites the candidate configuration. For
example, if the BGP configuration does not exist on the device but is required, you can load the
configuration file to load the BGP configuration. If the loaded configuration conflicts with the existing
configuration, the loaded configuration overwrites the existing configuration.
The one-click configuration import function is supported only in the two-phase configuration validation mode. After a
configuration file is loaded in this mode, run the commit command to commit the configuration.
Benefits
• File configuration replacement: Replace all the running configurations on the current device with a
configuration file that contains all the configurations of the device.
The system compares the specified configuration file with the full configuration running on the
device, identifies the differences, and then automatically executes the differing configurations. For
example, if the replacement file contains configurations a, b, c, and d, and the current configurations of the
device are a, b, e, and f, the differences between the two are +c, +d, -e, and -f. Replacing the
configuration file therefore adds configurations c and d and deletes configurations e and f.
■ +: added configuration
■ -: deleted configuration
• Segment configuration replacement: Replace the configuration only in a specified view or within a
scope restricted by the <replace/> tag. This applies when the specified configuration file contains only
the configuration of that view, or when the <replace/> tag restricts the replacement scope in a
configuration file that contains all configurations.
For example, replace the configuration in the AAA view on the current device. Enter the AAA view and
save the configuration in the AAA view to a specified file (the file name can be customized). In this way,
the saved configuration contains the <replace/> tag. When the configuration replacement command is
executed, the device replaces the configuration in the destination file according to the <replace/> tag.
• Pasting of differential configurations: Query the configurations that are different between the current
device and other devices, paste the differential configurations to the current device, and commit the
configurations.
For example, the current device is device A and its configuration needs to be the same as device B. After
the configuration file on device B is transmitted to device A, a command is executed on device A to
query the configuration differences between device A and device B. Then these differences are pasted to
device A.
• Character string replacement: Enter the specific service view and run the character string replacement
command to replace the specified character string in the current view with the target character string.
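The difference computation used by file configuration replacement (the +c/+d/-e/-f example above) amounts to a set difference between the target file and the running configuration. A minimal sketch, treating each configuration as a set of items:

```python
def replacement_diff(target, running):
    """Return the items to add and to delete so that `running` becomes `target`."""
    to_add = sorted(set(target) - set(running))     # '+' items
    to_delete = sorted(set(running) - set(target))  # '-' items
    return to_add, to_delete

# The example from the text: the file has a, b, c, d; the device runs a, b, e, f.
add, delete = replacement_diff(["a", "b", "c", "d"], ["a", "b", "e", "f"])
print(add, delete)  # ['c', 'd'] ['e', 'f']
```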
Definition
ZTP enables a newly delivered or unconfigured device to automatically load version files (including the
system software, configuration file, and patch file) when the device starts.
Purpose
In conventional network device deployment, network administrators are required to perform manual onsite
configuration and software commissioning on each device after hardware installation is complete. Therefore,
deploying a large number of geographically scattered devices is inefficient and incurs high labor costs.
ZTP addresses these issues by automatically loading version files from a file server, without requiring onsite
manual intervention in device deployment and configuration.
Benefits
ZTP eliminates the need for onsite device deployment and configuration, improves deployment efficiency,
and reduces labor costs.
• DHCP server: assigns the following addresses to a ZTP-enabled device: temporary management IP
address, default gateway address, DNS server address, and intermediate file server address.
• DHCP relay agent: forwards DHCP packets between the ZTP-enabled device and DHCP server that
reside on different network segments.
• Intermediate file server: stores the intermediate file (Intermediate File in the INI Format, Intermediate File
in the CFG Format, or Intermediate Files in the Python Format) required by the ZTP process. The
intermediate file contains the version file server address and information about version files, which the
ZTP-enabled device can learn by parsing the intermediate file. An intermediate file server can be a TFTP,
FTP, or SFTP server.
• Version file server: stores version files, including the system software, configuration file, and patch file.
A version file server and an intermediate file server can be deployed on the same server that supports
• DNS server: stores mapping between domain names and IPv4/IPv6 addresses. A DNS server can provide
a ZTP-enabled device with the IPv4/IPv6 address that maps the domain name of an IPv4/IPv6 file server,
so that the ZTP-enabled device can obtain files from the IPv4/IPv6 file server.
File transfer through TFTP or FTP is prone to security risks, and therefore the SFTP file transfer mode is recommended.
To enable ZTP to apply for an IPv4/IPv6 address through DHCP, select the DHCP server, DHCP relay agent, intermediate
file server, version file server, and DNS server that support IPv4/IPv6. In addition, the server address in the intermediate
file must be an IPv4/IPv6 address.
ZTP Process
Figure 2 shows the ZTP process.
fields contain the IP address of the DHCP server, default gateway address, intermediate file server
address, and intermediate file name.
The file name extension of a pre-configuration script must be .py. The file name is a string of 1 to 65 case-
sensitive characters, which can be a combination of digits, letters, and underscores (_). It cannot contain
spaces or other special characters, and must not start with a digit. A pre-configuration script can be named
preconfig.py, for example. Use Python 3.7 syntax to compile or modify the script file. For details about the
script file, see Preconfiguration Script File Explanation.
The following preconfiguration script file is only an example and needs to be modified as required.
The SHA256 checksum in the following file is only an example.
#sha256sum="68549835edaa5c5780d7b432485ce0d4fdaf6027a8af24f322a91b9f201a5101"
#!/usr/bin/env python
# coding=utf-8
#
# Copyright (C) Huawei Technologies Co., Ltd. 2008-2013. All rights reserved.
# ----------------------------------------------------------------------------------------------------------------------
# Project Code : VRPV8
# File name : preconfig.py
# ----------------------------------------------------------------------------------------------------------------------
# History:
# Date Modification
# 20180415 created file.
# ----------------------------------------------------------------------------------------------------------------------
import sys
import http.client
import logging
import logging.handlers
import string
import traceback
import re
import xml.etree.ElementTree as etree
import ops
from time import sleep  # required by the sleep() calls later in the script
# error code
OK = 0
ERR = 1
NOT_START_PNP = 2

ETHTRUNK_WORK_MODE = 'Static'
MAX_TIMES_CHECK_STARTUPCFG = 36
CHECK_CHECK_STARTUP_CFG_INTERVAL = 5


class OPIExecError(Exception):
    """"""
    pass


class NoNeedPNP(Exception):
    """"""
    pass


class OPSConnection(object):
    """Make an OPS connection instance."""
    def close(self):
        """Close the connection"""
        self.conn.close()


def convert_byte_to_str(data):
    result = data
    if type(data) != type(""):
        result = str(data, "iso-8859-1")
    return result
def get_startup_cfg_info(ops_conn):
    uri = "/cfg/startupInfos/startupInfo"
    req_data = '''<?xml version="1.0" encoding="UTF-8"?>
    <startupInfo>
        <position/>
        <configedSysSoft/>
        <curSysSoft/>
        <nextSysSoft/>
        <curStartupFile/>
        <nextStartupFile/>
        <curPatchFile/>
        <nextPatchFile/>
    </startupInfo>'''
    config = None
    config1 = None
    ret, _, rsp_data = ops_conn.get(uri, req_data)
    if ret != http.client.OK or rsp_data == '':
        logging.warning('Failed to get the startup information')
        return ERR, config, config1
    root_elem = etree.fromstring(rsp_data)
    namespaces = {'vrp': 'http://www.huawei.com/netconf/vrp'}
    mpath = 'data' + uri.replace('/', '/vrp:')  # match path
    nslen = len(namespaces['vrp'])
    elem = root_elem.find(mpath, namespaces)
    if elem is None:
        logging.error('Failed to get the startup information')
        return ERR, config, config1
def is_need_start_pnp(ops_conn):
    ret, config, _ = get_startup_cfg_info(ops_conn)
    if ret == OK and config is not None and config != "cfcard:/vrpcfg.zip":
        logging.info("No need to run ztp pre-configuration when device starts with configuration file")
        return False
    return True
def check_nextstartup_file(ops_conn):
    cnt = 0
    check_time = MAX_TIMES_CHECK_STARTUPCFG
    while cnt < check_time:
        ret, _, config1 = get_startup_cfg_info(ops_conn)
        if ret == OK and config1 is not None and config1 == "cfcard:/vrpcfg.zip":
            logging.info("check next startup file successful")
            return OK
        cnt += 1
        sleep(CHECK_CHECK_STARTUP_CFG_INTERVAL)  # wait before the next check
def print_precfg_info(precfg_info):
    """ Print Pre Config Info """
    str_temp = string.Template(
        'Pre-config information:\n'
        '    Eth-Trunk Name: $ethtrunk_name\n'
        '    Eth-Trunk Work Mode: $ethtrunk_work_mode\n'
        '    Eth-Trunk MemberIfs: $ethtrunk_member_ifs\n'
        '    Vlan: $vlan_pool\n'
    )
    precfg = str_temp.substitute(ethtrunk_name=precfg_info.get('ethtrunk_ifname'),
                                 ethtrunk_work_mode=precfg_info.get('ethtrunk_work_mode'),
                                 ethtrunk_member_ifs=', '.join(precfg_info.get('ethtrunk_member_ifs')),
                                 vlan_pool=precfg_info.get('vlan'))
    logging.info(precfg)
def get_device_productname(ops_conn):
    """Get system info, returns a dict"""
    logging.info("Get the system information...")
    uri = "/system/systemInfo"
    req_data = \
        '''<?xml version="1.0" encoding="UTF-8"?>
        <systemInfo>
            <productName/>
        </systemInfo>
        '''
    ret, _, rsp_data = ops_conn.get(uri, req_data)
    if ret != http.client.OK or rsp_data == '':
        raise OPIExecError('Failed to get the system information')
    productname = ""
    root_elem = etree.fromstring(rsp_data)
    namespaces = {'vrp': 'http://www.huawei.com/netconf/vrp'}
    uri = uri + '/productName'
    uri = 'data' + uri.replace('/', '/vrp:')
    elem = root_elem.find(uri, namespaces)
    if elem is not None:
        productname = elem.text
    if '910C' in productname:
        # ATN 910C product, all ports need the port-basic license activated
        uri = "/devm/portResourceInfos"
        lcsDescription = ['ATN 910C Any 4GE/FE Port RTU',
                          'ATN 910C 4*10GE Port RTU']
        try:
            req_data = etree.tostring(root_elem, 'UTF-8')
            ret, _, _ = ops_conn.set(uri, req_data)
            if ret == http.client.OK:
                active_flag = True
        except OPIExecError:
            pass
        else:
            logging.error('parse position failed, product: {0}, interface: {1}'.format(productname, if_port))
    else:
        logging.info('The current device no need active port-basic')
        active_flag = True
    if active_flag == False:
        logging.info('{0} port-basic license active failed'.format(if_port))
def create_ethtrunk(ops_conn, ifname, work_mode, member_ifs):
    uri = '/ifmtrunk/TrunkIfs/TrunkIf'
    str_temp = string.Template("""
        <?xml version="1.0" encoding="UTF-8"?>
        <TrunkIf operation="create">
            <ifName>$ifName</ifName>
            <workMode>$workmode</workMode>
            <TrunkMemberIfs>
$ifs
            </TrunkMemberIfs>
        </TrunkIf>
    """)
    ifs_temp = string.Template("""            <TrunkMemberIf operation="create">
                <memberIfName>$memberifname</memberIfName>
            </TrunkMemberIf>""")
    ifs = []
    for iface in member_ifs:
        ifs.append(ifs_temp.substitute(memberifname=iface))
    ifs = '\n'.join(ifs)
    req_data = str_temp.substitute(ifs=ifs, ifName=ifname, workmode=work_mode)
def delete_ethtrunk(ops_conn, ifname):
    uri = '/ifmtrunk/TrunkIfs/TrunkIf'
    str_temp = string.Template("""
        <?xml version="1.0" encoding="UTF-8"?>
        <TrunkIf operation="delete">
            <ifName>$ifName</ifName>
        </TrunkIf>
    """)
    req_data = str_temp.substitute(ifName=ifname)
    try:
        ret, _, rsp_data = ops_conn.delete(uri, req_data)
        if ret != http.client.OK:
            logging.error(rsp_data)
            raise OPIExecError('Failed to delete Eth-Trunk interface')
    except Exception as reason:
        logging.error('Error: %s', reason)
    else:
        logging.info('Succeeded in deleting Eth-Trunk interface')
def config_vlan(ops_conn, vlan):
    if vlan == 0:
        logging.info('Current vlan is 0, no need config')
        return
    str_temp = string.Template("""
        <?xml version="1.0" encoding="UTF-8"?>
        <vlanNotify>
            <startVlan>$startVlan</startVlan>
            <endVlan>$endVlan</endVlan>
        </vlanNotify>
    """)
def config_interface_nego_auto_and_l2mode(_ops):
    pass


def undo_autosave_config(_ops):
    handle, err_desp = _ops.cli.open()
    if err_desp not in ['Success', 'Error: The line has been opened.']:
        raise OPIExecError('Failed to open cli')
    _ops.cli.execute(handle, "sys")
    fd, _, err_desp = _ops.cli.execute(handle, "undo set save-configuration", None)
    if fd is None or err_desp != 'Success':
        raise OPIExecError('Failed to execute undo set save-configuration')


def main_proc(ops_conn, precfg_info):
    ifname = precfg_info.get('ethtrunk_ifname')
    work_mode = precfg_info.get('ethtrunk_work_mode')
    member_ifs = precfg_info.get('ethtrunk_member_ifs')
    vlan = precfg_info.get('vlan')
    _ops = ops.ops()
    if is_need_start_pnp(ops_conn) is False:
        return NOT_START_PNP
    sleep(15)
    try:
        undo_autosave_config(_ops)
    except OPIExecError as reason:
        logging.error('Error: %s' % reason)
        return ERR
    try:
        config_interface_nego_auto_and_l2mode(_ops)
    except OPIExecError as reason:
        logging.error('Error: %s' % reason)
        return ERR
    try:
        create_ethtrunk(ops_conn, ifname, work_mode, member_ifs)
    except OPIExecError as reason:
        logging.error('Error: %s' % reason)
        return ERR
    try:
        config_vlan(ops_conn, vlan)
    except OPIExecError as reason:
        logging.error('Error: %s' % reason)
        delete_ethtrunk(ops_conn, ifname)
        return ERR
    try:
        check_nextstartup_file(ops_conn)
    except OPIExecError as reason:
        logging.error('Error: %s', reason)
    return OK
def main():
    """
    :return:
    """
    host = 'localhost'
    try:
        work_mode = ETHTRUNK_WORK_MODE
    except NameError:
        work_mode = 'Static'
    try:
        vlan = VLAN
    except NameError:
        vlan = 0
    try:
        member_list = ETHTRUNK_MEMBER_LIST
    except NameError:
        member_list = []
    precfg_info = {
        'ethtrunk_ifname': 'Eth-Trunk0',
        'ethtrunk_work_mode': work_mode,
        'ethtrunk_member_ifs': member_list,
        'vlan': vlan
    }
    print_precfg_info(precfg_info)
    ops_conn = None
    try:
        ops_conn = OPSConnection(host)
        ret = main_proc(ops_conn, precfg_info)
    except Exception:
        logging.error(traceback.format_exc())
        ret = ERR
    finally:
        if ops_conn is not None:
            ops_conn.close()
    return ret


if __name__ == '__main__':
    main()
The SHA256 checksum is used to check the integrity of the script file.
You can use either of the following methods to generate an SHA256 checksum for a script file:
2. Run the certutil -hashfile filename SHA256 command provided by the Windows operating
system.
The SHA256 checksum is calculated based on the content following #sha256sum=. In practice, you need to
delete the first line in the file, move the following part one line above, calculate the SHA256 checksum, and
write #sha256sum= plus the generated SHA256 checksum at the beginning of the file.
The SHA256 algorithm can be used to verify the integrity of files. This algorithm has high security.
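The checksum procedure described above (delete the first line, hash the remaining content, then write an updated #sha256sum= header) can be sketched as follows. The function name and file handling are illustrative, assuming the file already begins with a header line:

```python
import hashlib

def stamp_sha256(path):
    """Compute the SHA256 of everything after the first line of the file and
    rewrite the file with an updated #sha256sum= header, as described above."""
    with open(path, "rb") as f:
        lines = f.read().split(b"\n", 1)
    body = lines[1] if len(lines) > 1 else b""  # content after the first line
    digest = hashlib.sha256(body).hexdigest()
    with open(path, "wb") as f:
        f.write(b'#sha256sum="' + digest.encode() + b'"\n' + body)
    return digest
```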
VLAN = 127
• Configure the maximum number of retries allowed when the check boot items fail to be set.
MAX_TIMES_CHECK_STARTUPCFG = 36
• Specify the interval for checking whether the system software is successfully configured.
CHECK_CHECK_STARTUP_CFG_INTERVAL = 5
def create()
def delete()
def get()
def set()
active_port_license()
def main()
#sha256sum="88298f97c634cb04b1eb4fe9ad2255abffc0a246112e1960cb6402f6b799f8b6"
;BEGIN ROUTER
[GLOBAL CONFIG]
FILESERVER=sftp://username:password@hostname:port/path
[DEVICEn DESCRIPTION]
ESN=2102351931P0C3000154
MAC=00e0-fc12-3456
DEVICETYPE=DEFAULT
SYSTEM-SOFTWARE=V800R021C10SPC600.cc
SYSTEM-CONFIG=test.cfg
SYSTEM-PAT=V800R021C10SPC600SPH001.PAT
;END ROUTER
• ;BEGIN ROUTER (mandatory): Start flag of the file. This field cannot be modified.
• [GLOBAL CONFIG] (mandatory): Start flag of the global configuration. This field cannot be modified.
• FILESERVER (mandatory): Address of the server from which version files are obtained. You can obtain
files through TFTP, FTP, or SFTP. Available address formats are as follows:
tftp://hostname/path
ftp://[username[:password]@]hostname/path
sftp://[username[:password]@]hostname[:port]/path
The username, password, and port parameters are optional. The path parameter specifies the
directory where version files are saved on the file server. The hostname parameter specifies a
server address, which can be an IPv4 address, domain name, or IPv6 address. The value of port
ranges from 0 to 65535. If the specified value is out of the range, the default value 22 is used. A
port number can be configured only when an IPv4 SFTP server address is specified.
• [DEVICEn DESCRIPTION] (mandatory): Start tag of the device description. n indicates the
device number. The value is an integer and starts from 0.
NOTE:
You can obtain the ESN of the device from the
nameplate on the device package.
The ESN is case-insensitive.
You are advised to use the ESN of a device to
specify the configuration information of the device,
but not to use DEFAULT to perform batch
configuration.
NOTE:
You can obtain the MAC address of the device from
the nameplate on the device package.
The MAC address is case-insensitive.
You need to fill in the intermediate file in strict
accordance with the MAC address format displayed
on the device. For example, if the MAC address
displayed on the device is 00e0-fc12-3456, the
MAC address 00e0fc123456 is incorrect because "-"
is also verified.
You are advised to use the MAC address of a device
to specify the configuration of the device, but not
to use DEFAULT to perform batch configuration.
• ;END ROUTER (mandatory): End flag of the file. This field cannot be modified.
The device checks the content of [DEVICEn DESCRIPTION] in the INI file in sequence.
The DEVICETYPE option is the first check item.
• If the value of the DEVICETYPE field is DEFAULT, or this field does not exist or is empty, the device only checks
ESN or MAC. If ESN or MAC matches the criteria, the device considers the DESCRIPTION configuration valid.
Otherwise, the device considers the DESCRIPTION configuration invalid.
• If the DEVICETYPE field has a value that is not DEFAULT, the device checks whether the value is the same as the
device type. If the value is different from the device type, the device considers the DESCRIPTION configuration
invalid and checks the next one. If the value is the same as the device type, the device moves on to check ESN or
MAC. If ESN or MAC matches the criteria, the device considers the DESCRIPTION configuration valid. Otherwise,
the device considers the DESCRIPTION configuration invalid.
• If the values of ESN and MAC are both DEFAULT, the two fields are not checked.
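The matching rules above can be expressed as a small function. The dict-based representation of a [DEVICEn DESCRIPTION] section is an assumption for illustration; the check order (DEVICETYPE first, then ESN or MAC) follows the rules described:

```python
def description_matches(desc, device):
    """Check one [DEVICEn DESCRIPTION] section against a device, per the rules above.
    `desc` and `device` are dicts with DEVICETYPE/ESN/MAC keys (hypothetical structure)."""
    dev_type = (desc.get("DEVICETYPE") or "DEFAULT").upper()
    # DEVICETYPE is checked first: a non-DEFAULT value must equal the device type.
    if dev_type != "DEFAULT" and dev_type != device["DEVICETYPE"].upper():
        return False
    esn, mac = desc.get("ESN", ""), desc.get("MAC", "")
    # If the values of ESN and MAC are both DEFAULT, neither field is checked.
    if esn.upper() == "DEFAULT" and mac.upper() == "DEFAULT":
        return True
    # Otherwise ESN or MAC must match; both are case-insensitive, and the MAC
    # must be written in the same hyphenated format displayed on the device.
    return (esn.lower() == device["ESN"].lower()
            or mac.lower() == device["MAC"].lower())

device = {"DEVICETYPE": "NE40E", "ESN": "2102351931P0C3000154", "MAC": "00e0-fc12-3456"}
desc = {"DEVICETYPE": "DEFAULT", "ESN": "2102351931p0c3000154", "MAC": ""}
print(description_matches(desc, device))  # True: ESN matches case-insensitively
```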
The following preconfiguration script file is only an example and needs to be modified based on deployment
requirements.
#sha256sum="126b05cb7ed99956281edef93f72c0f0ab517eb025edfd9cc4f31a37f123c4fc"
#!/usr/bin/env python
# coding=utf-8
#
# Copyright (C) Huawei Technologies Co., Ltd. 2008-2013. All rights reserved.
# ----------------------------------------------------------------------------------------------------------------------
# History:
# Date Author Modification
# 20180122 Author created file.
# ----------------------------------------------------------------------------------------------------------------------
"""
Zero Touch Provisioning (ZTP) enables devices to automatically load version files including system software,
patch files, configuration files when the device starts up, the devices to be configured must be new devices
or have no configuration files.
This is a sample of Zero Touch Provisioning user script. You can customize it to meet the requirements of
your network environment.
"""
import hashlib
import http.client
import logging
import os
import re
import string
import traceback
import xml.etree.ElementTree as etree
from time import sleep
from urllib.parse import urlparse
import ops
# error code
OK = 0
ERR = 1
# File server that stores the necessary system software, configuration, and patch files:
# 1) Specify the file server which supports the following format.
# tftp://hostname/path
# ftp://[username[:password]@]hostname/path
# sftp://[username[:password]@]hostname[:port]/path
# 2) Do not add a trailing slash at the end of file server path.
FILE_SERVER = 'sftp://username:password@hostname:port/path'
# Remote path of the patch file on the file server, keyed by product name
# (variable name assumed from its use in check_filename and the download logic)
REMOTE_PATH_PATCH = {
    'NE40E': 'V800R021C10SPC600SPH001.PAT'
}
# File path of the sha256 file, which contains the sha256 values of the image / patch / configuration files; the file extension is '.txt'
REMOTE_PATH_SHA256 = 'sha256.txt'
# constant
# autoconfig
HTTP_OK = 200
HTTP_BAD_REQUEST = 400
HTTP_BAD_RESPONSE = -1
CONFLICT_RETRY_INTERVAL = 5
POST_METHOD = 'POST'
GET_METHOD = 'GET'
DELETE_METHOD = 'DELETE'
PUT_METHOD = 'PUT'
MAX_TIMES_GET_STARTUP = 120
GET_STARTUP_INTERVAL = 15
MAX_TIMES_CHECK_STARTUP = 205
MAX_TIMES_CHECK_STARTUP_SLAVE = 265
CHECK_STARTUP_INTERVAL = 5
FILE_DELETE_DELAY_TIME = 3
# ztplib
LAST_STATE_MAP = {'true': 'enable', 'false': 'disable'}
# DNS
DNS_STATE_MAP = {'true': 'enable', 'false': 'disable'}
# download
FILE_TRANSFER_RETRY_TIMES = 3
FILE_DOWNLOAD_INTERVAL_TIME = 5
DISK_SPACE_NOT_ENOUGH = 48
IPV4 = 'ipv4'
IPV6 = 'ipv6'
OPS_CLIENT = None
# exception
class PNPStopError(Exception):
"""Stop by pnp"""
class OPIExecError(Exception):
"""OPS Connection Exception"""
class NoNeedZTP2PNPError(Exception):
"""No need start ztp"""
class SysRebootError(Exception):
"""Device reboot error"""
class ZTPDisableError(Exception):
"""ZTP set disable error"""
# opslib
class OPSConnection:
    """Make an OPS connection instance."""
    __slots__ = ['host', 'port', 'headers', 'conn']
    def __init__(self, host, port=80):
        # Constructor reconstructed from the attributes declared in __slots__; details assumed.
        self.host = host
        self.port = port
        self.headers = {'Content-type': 'text/xml', 'Accept': 'text/xml'}
        self.conn = http.client.HTTPConnection(self.host, self.port)
    def close(self):
        """Close the connection"""
        self.conn.close()
    def _rest_call(self, method, uri, body):
        """Send an HTTP request and return (status, reason, response body). Method name assumed."""
        try:
            self.conn.request(method, uri, body, self.headers)
except http.client.CannotSendRequest:
logging.warning('An error occurred during http request, try to send request again')
self.close()
self.conn = http.client.HTTPConnection(self.host, self.port)
self.conn.request(method, uri, body, self.headers)
except http.client.InvalidURL:
logging.warning('Failed to find url: %s in OPS whitelist', uri)
return HTTP_BAD_REQUEST, '', ''
try:
response = self.conn.getresponse()
except AttributeError:
logging.warning('An error occurred during http response, try again')
return HTTP_BAD_RESPONSE, '', ''
rest_message = response.read()
if isinstance(rest_message, bytes):
rest_message = str(rest_message, 'iso-8859-1')
        # logging.debug('uri = %s ret = %s \n %s \n %s', uri, response.status, body, rest_message)
        return response.status, response.reason, rest_message
OPS_CLIENT = OPSConnection("localhost")
# pnplib
def dhcp_stop():
"""Stop DHCP client, include dhcpv4 and dhcpv6."""
logging.info('Stopping dhcp client')
uri = '/pnp/stopPnp'
req_data = '''<?xml version="1.0" encoding="UTF-8"?>
<stopPnp/>'''
ret, err_code, rsp_data = OPS_CLIENT.create(uri, req_data)
if ret != HTTP_OK:
# ignore stop pnp err
logging.warning('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
logging.warning('Failed to stop dhcp client')
return
# commlib
def get_cwd():
"""Get the full filename of the current working directory"""
logging.info("Get the current working directory...")
namespaces = {'vrp': 'http://www.huawei.com/netconf/vrp'}
uri = "/vfm/pwds/pwd"
req_data = '''<?xml version="1.0" encoding="UTF-8"?>
<pwd>
<dictionaryName/>
</pwd>
'''
ret, _, rsp_data = OPS_CLIENT.get(uri, req_data)
    if ret != HTTP_OK or rsp_data == '':
        raise OPIExecError('Failed to get the current working directory')
root_elem = etree.fromstring(rsp_data)
uri = 'data' + uri.replace('/', '/vrp:') + '/vrp:dictionaryName'
elem = root_elem.find(uri, namespaces)
if elem is None:
        raise OPIExecError('Failed to get the current working directory for no "dictionaryName" element')
return elem.text
def file_exist(file_name, dir_path=None):
    """Check whether a file exists in the given directory (signature assumed from callers)."""
    if dir_path:
req_data = str_temp_2.substitute(dirName=dir_path, fileName=file_name)
else:
req_data = str_temp_1.substitute(fileName=file_name)
ret, _, rsp_data = OPS_CLIENT.get(uri, req_data)
if ret != HTTP_OK or rsp_data == '':
return False
root_elem = etree.fromstring(rsp_data)
namespaces = {'vrp': 'http://www.huawei.com/netconf/vrp'}
uri = 'data' + uri.replace('/', '/vrp:') + '/vrp:fileName'
elem = root_elem.find(uri, namespaces)
if elem is None:
return False
return True
def copy_file(src_path, dest_path):
    """Copy a file to the destination, deleting any existing destination file first (signature assumed from callers)."""
    if 'slave' in dest_path:
file_name = dest_path.split(':/')[1]
if file_exist(file_name, 'slave#cfcard:/'):
logging.info('Detect dest file exist, delete it first')
delete_file(dest_path)
uri = '/vfm/copyFile'
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<copyFile>
<srcFileName>$src</srcFileName>
<desFileName>$dest</desFileName>
</copyFile>''')
req_data = str_temp.substitute(src=src_path, dest=dest_path)
    ret, err_code, rsp_data = OPS_CLIENT.create(uri, req_data)
if ret != HTTP_OK:
file_name = dest_path.split(':/')[1]
if file_exist(file_name, "slave#cfcard:/"):
logging.info('Exists file copy fragment, delete it')
delete_file(dest_path)
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
logging.error('Failed to copy %s to %s', src_path, dest_path)
return False
    logging.info('Succeeded in copying %s to %s', src_path, dest_path)
return True
def delete_file(file_path):
"""Delete a file permanently"""
if file_path is None or file_path == '':
return
def has_slave_mpu():
"""Whether device has slave MPU, returns a bool value
:raise OPIExecError
"""
logging.info("Test whether device has slave MPU")
uri = '/devm/phyEntitys'
req_data = '''<?xml version="1.0" encoding="UTF-8"?>
<phyEntitys>
<phyEntity>
<entClass>mpuModule</entClass>
<entStandbyState/>
<position/>
</phyEntity>
</phyEntitys>'''
has_slave = False
mpu_slot = {}.fromkeys(('master', 'slave'))
ret, err_code, rsp_data = OPS_CLIENT.get(uri, req_data)
if ret != HTTP_OK or rsp_data == '':
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
raise OPIExecError('Failed to get the device slave information')
root_elem = etree.fromstring(rsp_data)
namespaces = {'vrp': 'http://www.huawei.com/netconf/vrp'}
uri = 'data{0}/vrp:phyEntity'.format(uri.replace('/', '/vrp:'))
for entity in root_elem.findall(uri, namespaces):
elem = entity.find("vrp:entStandbyState", namespaces)
if elem is not None and elem.text.lower().find('slave') >= 0:
has_slave = True
elem = entity.find("vrp:position", namespaces)
if elem is not None:
mpu_slot['slave'] = elem.text
if elem is not None and elem.text.lower().find('master') >= 0:
elem = entity.find("vrp:position", namespaces)
if elem is not None:
                mpu_slot['master'] = elem.text
    return has_slave, mpu_slot
def get_system_info():
"""Get device product esn mac
:raise: OPIExecError
"""
logging.info("Get the system information...")
uri = "/system/systemInfo"
req_data = '''<?xml version="1.0" encoding="UTF-8"?>
<systemInfo>
<productName/>
<esn/>
<mac/>
</systemInfo>
    '''
    sys_info = dict.fromkeys(('productName', 'esn', 'mac'))
    ret, err_code, rsp_data = OPS_CLIENT.get(uri, req_data)
    if ret != HTTP_OK or rsp_data == '':
        logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
        raise OPIExecError('Failed to get the system information')
    root_elem = etree.fromstring(rsp_data)
namespaces = {'vrp': 'http://www.huawei.com/netconf/vrp'}
uri = 'data' + uri.replace('/', '/vrp:')
nslen = len(namespaces['vrp'])
elem = root_elem.find(uri, namespaces)
if elem is not None:
for child in elem:
tag = child.tag[nslen + 2:]
if tag in list(sys_info.keys()):
sys_info[tag] = child.text
return sys_info
def reboot_system(save_config='false'):
"""Reboot system."""
logging.info('System will reboot to make the configuration take effect')
sleep(10)
uri = "/devm/reboot"
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<reboot>
<saveConfig>$saveConfig</saveConfig>
</reboot>''')
req_data = str_temp.substitute(saveConfig=save_config)
ret, err_code, rsp_data = OPS_CLIENT.create(uri, req_data)
    logging.info('/devm/reboot: rsp_data[%s]', str(rsp_data))
if ret != HTTP_OK or rsp_data == '':
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
raise OPIExecError('Failed to execute the reboot system operation')
return True
# startuplib
class StartupInfo:
    """Startup configuration information (system software, configuration, and patch file paths)."""
    def __init__(self, image=None, config=None, patch=None):
        # Attributes reconstructed from their use in the Startup class below.
        self.image = image
        self.config = config
        self.patch = patch
class Startup:
    """Startup configuration information manager."""
def __init__(self):
self.current, self.next = self._get_startup_info()
self.startup_info_from_ini_or_cfg = {}
self.startup_info_before_set = StartupInfo()
@staticmethod
def _get_startup_info(retry=True):
"""Get device startup information
:raise
opslib.OPIExecError
"""
uri = '/cfg/startupInfos/startupInfo'
req_data = '''<?xml version="1.0" encoding="UTF-8"?>
<startupInfo>
<position/>
<configedSysSoft/>
<curSysSoft/>
<nextSysSoft/>
<curStartupFile/>
<nextStartupFile/>
<curPatchFile/>
<nextPatchFile/>
</startupInfo>'''
if retry is True:
retry_time = MAX_TIMES_GET_STARTUP
else:
retry_time = 1
cnt = 0
elem = None
namespaces = {'vrp': 'http://www.huawei.com/netconf/vrp'}
ns_len = len(namespaces['vrp'])
path = 'data' + uri.replace('/', '/vrp:') # match path
while cnt < retry_time:
ret, _, rsp_data = OPS_CLIENT.get(uri, req_data)
if ret != HTTP_OK or rsp_data == '':
cnt += 1
logging.warning('Failed to get the startup information')
# sleep to wait for system ready when no query result
sleep(GET_STARTUP_INTERVAL)
continue
root_elem = etree.fromstring(rsp_data)
elem = root_elem.find(path, namespaces)
            if elem is not None:
                break
            cnt += 1
if elem is None:
raise OPIExecError('Failed to get the startup information')
@staticmethod
def _set_startup_image_file(file_path, slave=True):
"""Set the next startup system software"""
file_name = os.path.basename(file_path)
logging.info('Set the next startup system software to %s, please wait a moment', file_name)
uri = '/sum/startupbymode'
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<startupbymode>
<softwareName>$fileName</softwareName>
<mode>$startupMode</mode>
</startupbymode>''')
if slave:
startup_mode = 'STARTUP_MODE_ALL'
else:
            startup_mode = 'STARTUP_MODE_PRIMARY'
        req_data = str_temp.substitute(fileName=file_name, startupMode=startup_mode)
        # it is an action operation, so use create for HTTP POST
        ret, err_code, rsp_data = OPS_CLIENT.create(uri, req_data)
        if ret != HTTP_OK:
            logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
            raise OPIExecError('Failed to set startup system software')
@staticmethod
def _set_startup_config_file(file_path):
"""Set the next startup saved-configuration file"""
file_name = os.path.basename(file_path)
logging.info('Set the next startup saved-configuration file to %s', file_name)
uri = '/cfg/setStartup'
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<setStartup>
<fileName>$fileName</fileName>
</setStartup>''')
req_data = str_temp.substitute(fileName=file_name)
    # it is an action operation, so use create for HTTP POST
ret, err_code, rsp_data = OPS_CLIENT.create(uri, req_data)
if ret != HTTP_OK:
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
raise OPIExecError('Failed to set startup configuration file')
@staticmethod
def _del_startup_config_file():
"""Delete startup config file"""
logging.info('Delete the next startup config file')
uri = '/cfg/clearStartup'
req_data = '''<?xml version="1.0" encoding="UTF-8"?>
<clearStartup>
</clearStartup>'''
    # it is an action operation, so use create for HTTP POST
ret, err_code, rsp_data = OPS_CLIENT.create(uri, req_data)
if ret != HTTP_OK:
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
raise OPIExecError('Failed to delete startup configuration file')
@staticmethod
def _set_startup_patch_file(file_path):
"""Set the next startup patch file"""
file_name = os.path.basename(file_path)
logging.info('Set the next startup patch file to %s', file_name)
uri = "/patch/startup"
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<startup>
<packageName>$fileName</packageName>
</startup>''')
req_data = str_temp.substitute(fileName=file_name)
    # it is an action operation, so use create for HTTP POST
ret, err_code, rsp_data = OPS_CLIENT.create(uri, req_data)
if ret != HTTP_OK:
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
raise OPIExecError('Failed to set startup patch file')
@staticmethod
def _reset_startup_patch_file():
"""Reset patch file for system to startup"""
logging.info('Reset the next startup patch file')
uri = '/patch/resetpatch'
req_data = '''<?xml version="1.0" encoding="UTF-8"?>
<resetpatch/>'''
    # it is an action operation, so use create for HTTP POST
ret, err_code, rsp_data = OPS_CLIENT.create(uri, req_data)
if ret != HTTP_OK:
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
raise OPIExecError('Failed to reset startup patch file')
try:
self._set_startup_config_file(config_file)
if self._check_next_startup_file(config_file, 'config', slave) is False:
raise OPIExecError('Failed to check the next startup config file')
except OPIExecError as reason:
logging.error(reason)
delete_file_all(config_file, slave, [cur_startup.config, cur_next_startup.config])
self.reset_startup_info(slave)
raise
if next_startup.patch:
sleep(FILE_DELETE_DELAY_TIME)
delete_file_all(next_startup.patch, slave,
[cur_startup.patch, self.startup_info_before_set.patch])
except Exception as reason:
logging.error(reason)
def convert_byte_to_str(data):
result = data
if not isinstance(data, str):
result = str(data, "iso-8859-1")
return result
def read_chunks(fhdl):
'''read chunks'''
chunk = fhdl.read(8096)
while chunk:
yield chunk
chunk = fhdl.read(8096)
else:
fhdl.seek(0)
def sha256sum(fname):
    # Function name assumed; computes the sha256 digest of a file.
    sha256_obj = hashlib.sha256()
    if isinstance(fname, str):
        with open(fname, "rb") as fhdl:
            for chunk in read_chunks(fhdl):
                sha256_obj.update(chunk)
    return sha256_obj.hexdigest()
def sha256_get_from_file(fname):
"""Get sha256 num form file, stored in first line"""
with open(fname, "rb") as fhdl:
fhdl.seek(0)
line_first = convert_byte_to_str(fhdl.readline())
# if not match pattern, the format of this file is not supported
if not re.match('^#sha256sum="[\\w]{64}"[\r\n]+$', line_first):
return 'None'
return line_first[12:76]
def sha256_check_with_first_line(fname):
    """Validate sha256 for this file"""
    sha256_file = sha256_get_from_file(fname)
    # The checksum is assumed to cover the file content after the first line.
    with open(fname, "rb") as fhdl:
        fhdl.readline()
        sha256_obj = hashlib.sha256()
        for chunk in read_chunks(fhdl):
            sha256_obj.update(chunk)
    sha256_calc = sha256_obj.hexdigest()
    if sha256_file.lower() != sha256_calc:
logging.warning('SHA256 check failed, file %s', fname)
logging.warning('SHA256 checksum of the file "%s" is %s', fname, sha256_calc)
logging.warning('SHA256 checksum received from the file "%s" is %s', fname, sha256_file)
return False
return True
def parse_sha256_file(fname):
"""parse sha256 file"""
def read_line(fhdl):
"""read a line by loop"""
line = fhdl.readline()
while line:
yield line
line = fhdl.readline()
else:
fhdl.seek(0)
sha256_dic = {}
work_fname = os.path.join("ztp", fname)
with open(work_fname, "rb") as fhdl:
for line in read_line(fhdl):
line_spilt = convert_byte_to_str(line).split()
if 2 != len(line_spilt):
continue
            sha256_dic[line_spilt[0]] = line_spilt[1]
    return sha256_dic
def verify_and_parse_sha256_file(fname):
"""
verify data integrity of sha256 file and parse this file
file-name sha256
conf_5618642831132.cfg 1254b2e49d3347c4147a90858fa5f59aa2594b7294304f34e7da328bf3cdfbae
------------------------------------------------------------------
"""
if not sha256_check_with_first_line(fname):
return ERR, None
return OK, parse_sha256_file(fname)
def check_parameter(aset):
seq = ['&', '>', '<', '"', "'"]
if aset:
for c in seq:
if c in aset:
return True
return False
def check_filename():
sys_info = get_system_info()
url_tuple = urlparse(FILE_SERVER)
    if check_parameter(url_tuple.username) or check_parameter(url_tuple.password):
        logging.error('Invalid username or password, the name should not contain: & > < " \'.')
        return ERR
    file_name = os.path.basename(REMOTE_PATH_IMAGE.get(sys_info['productName'], ''))
    if file_name != '' and check_parameter(file_name):
        logging.error('Invalid filename of system software, the name should not contain: & > < " \'.')
        return ERR
    file_name = os.path.basename(REMOTE_PATH_CONFIG)
    if file_name != '' and check_parameter(file_name):
        logging.error('Invalid filename of configuration file, the name should not contain: & > < " \'.')
        return ERR
    file_name = os.path.basename(REMOTE_PATH_PATCH.get(sys_info['productName'], ''))
    if file_name != '' and check_parameter(file_name):
        logging.error('Invalid filename of patch file, the name should not contain: & > < " \'.')
        return ERR
try:
file_name = os.path.basename(REMOTE_PATH_SHA256)
except NameError:
file_name = ''
    if file_name != '' and check_parameter(file_name):
        logging.error('Invalid filename of sha256 file, the name should not contain: & > < " \'.')
        return ERR
return OK
if slave:
ret = copy_file(local_path_config, 'slave#' + local_path_config)
if ret is False:
logging.error('%s copy fail', local_path_config)
return False, local_path_config
if ret == DISK_SPACE_NOT_ENOUGH:
if slave:
ret = copy_file(local_path_patch, 'slave#' + local_path_patch)
if ret is False:
logging.error('%s copy fail', local_path_patch)
return ERR, local_path_patch
if ret == DISK_SPACE_NOT_ENOUGH:
logging.error('The space of disk is not enough')
return DISK_SPACE_NOT_ENOUGH, local_path_image
if slave:
ret = copy_file(local_path_image, 'slave#' + local_path_image)
if ret is False:
logging.error('%s copy fail', local_path_image)
return ERR, local_path_image
# current STARTUP_INFO
cur_startup, next_startup = STARTUP._get_startup_info()
cur_config = None if not cur_startup.config else os.path.basename(cur_startup.config)
cur_patch = None if not cur_startup.patch else os.path.basename(cur_startup.patch)
cur_image = None if not cur_startup.image else os.path.basename(cur_startup.image)
next_config = None if not next_startup.config else os.path.basename(next_startup.config)
next_patch = None if not next_startup.patch else os.path.basename(next_startup.patch)
next_image = None if not next_startup.image else os.path.basename(next_startup.image)
# download sha256 file first, used to verify data integrity of files which will be downloaded next
try:
cwd = get_cwd()
file_path = REMOTE_PATH_SHA256
if not file_path.startswith('/'):
file_path = '/' + file_path
file_name = os.path.basename(file_path)
if file_name:
url = FILE_SERVER + file_path
local_path = os.path.join(cwd, "ztp", file_name)
ret = download_file(url, local_path, ip_protocol, vpn_instance)
            if ret == ERR:
logging.error('Error: Failed to download sha256 file "%s"' % file_name)
return ERR, None, None, None
logging.info('Info: Download sha256 file successfully')
ret, sha256_val_dic = verify_and_parse_sha256_file(file_name)
# delete the file immediately
os.remove(os.path.join("ztp", file_name))
            if ret == ERR:
logging.error('Error: sha256 check failed, file "%s"' % file_name)
return ERR, None, None, None
else:
sha256_val_dic = {}
except NameError:
sha256_val_dic = {}
logging.info('no need sha256 to check download file')
    # if the user has already changed the startup files to the names given in the ini/cfg, ztp will not download them again
# 1. Download configuration file
if startup_info['SYSTEM-CONFIG'] and startup_info['SYSTEM-CONFIG'] not in [cur_config, next_config]:
ret, local_path_config = download_cfg_file(startup_info, slave, ip_protocol, vpn_instance, sha256_val_dic)
if ret is False:
logging.info('delete startup file [cfg]')
delete_startup_file(local_path_image, local_path_config, local_path_patch, slave)
return ERR, local_path_image, local_path_config, local_path_patch
logging.info('succeed to download config file')
elif startup_info['SYSTEM-CONFIG'] and startup_info['SYSTEM-CONFIG'] in [cur_config, next_config]:
logging.warning('The configured config version is the same as the current device version')
# ztplib
def set_ztp_last_status(state):
"""Set ztp last status."""
uri = '/ztpops/ztpStatus/ztpLastStatus'
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<ztpLastStatus>$ztpLastStatus</ztpLastStatus>''')
req_data = str_temp.substitute(ztpLastStatus=state)
ret, err_code, rsp_data = OPS_CLIENT.create(uri, req_data)
if ret != HTTP_OK:
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
logging.error('Failed to set ztp last status to %s', LAST_STATE_MAP[state])
return
def get_ztp_enable_status():
"""Get ztp enable status
:raise: OPIExecError
"""
uri = '/ztpops/ztpStatus/ztpEnable'
req_data = '''<?xml version="1.0" encoding="UTF-8"?>
<ztpEnable/>'''
ret, err_code, rsp_data = OPS_CLIENT.get(uri, req_data)
if ret != HTTP_OK or rsp_data == '':
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
raise OPIExecError('Failed to get ztp enable status')
root_elem = etree.fromstring(rsp_data)
namespaces = {'vrp': 'http://www.huawei.com/netconf/vrp'}
uri = 'data' + uri.replace('/', '/vrp:')
elem = root_elem.find(uri, namespaces)
if elem is None:
raise OPIExecError('Failed to read ztp enable status')
return elem.text
def parse_environment(env):
lines = re.split(r'\r\n', env)
for line in lines[3:-2]:
item = re.split(r'[ ][ ]*', line)
if item[1] == 'ztp_exit_flag':
logging.info('parse environment, ztp_exit_flag: ' + item[2])
return item[2]
return None
def get_ztp_exit_environment():
_ops = ops.ops()
handle, err_desp = _ops.cli.open()
ret = _ops.cli.execute(handle, "display ops environment")
if ret[2] == 'Success' and ret[0]:
return parse_environment(ret[0])
return None
def check_ztp_continue():
"""Check if ztp can continue to run"""
res = True
try:
enable_state = get_ztp_enable_status()
ztp_exit_flag = get_ztp_exit_environment()
if enable_state == 'false' or ztp_exit_flag == 'true':
res = False
except OPIExecError as ex:
logging.warning(ex)
return res
# DNS
class DNSServer:
"""Dns protocol service"""
__slots__ = ['dns_servers', 'enable_state', 'vpn_instance']
def __init__(self):
self.dns_servers = []
self.enable_state = 'false'
self.vpn_instance = {}
    def _set_dns_enable_switch(self, switch):
        """Enable or disable DNS; switch is 'true' or 'false'."""
        if self.enable_state == switch:
logging.info('The current enable state of dns is %s, no need to set', DNS_STATE_MAP.get(switch))
return
uri = '/dns/dnsGlobalCfgs/dnsGlobalCfg'
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<dnsGlobalCfg>
<dnsEnable>$dnsEnable</dnsEnable>
</dnsGlobalCfg>''')
req_data = str_temp.substitute(dnsEnable=switch)
ret, err_code, rsp_data = OPS_CLIENT.set(uri, req_data)
if ret != HTTP_OK:
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
raise OPIExecError('Failed to %s DNS' % DNS_STATE_MAP.get(switch))
self.enable_state = switch
        return
    def add_dns_servers_ipv4(self, new_dns_servers, vpn_instance):
        """Add IPv4 DNS servers configuration (method name assumed, mirroring del_dns_servers_ipv4)."""
        self._set_dns_enable_switch('true')
        logging.info('Add DNS IPv4 servers')
uri = '/dns/dnsIpv4Servers'
root_elem = etree.Element('dnsIpv4Servers')
for server_addr in new_dns_servers:
dns_server = etree.SubElement(root_elem, 'dnsIpv4Server')
etree.SubElement(dns_server, 'ipv4Addr').text = server_addr
etree.SubElement(dns_server, 'vrfName').text = vpn_instance
req_data = etree.tostring(root_elem, 'UTF-8')
ret, err_code, rsp_data = OPS_CLIENT.create(uri, req_data)
if ret != HTTP_OK:
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
raise OPIExecError('Failed to config DNS IPv4 server')
# configure success
self.dns_servers.extend(new_dns_servers)
self.vpn_instance.update(dict.fromkeys(new_dns_servers, vpn_instance))
def del_dns_servers_ipv4(self):
"""Delete IPv4 DNS servers configuration.
:raise: OPIExecError
"""
if not self.dns_servers:
logging.info('Current dns server is empty, no need to delete')
return
uri = '/dns/dnsIpv4Servers'
root_elem = etree.Element('dnsIpv4Servers')
for server_addr in self.dns_servers:
dns_server = etree.SubElement(root_elem, 'dnsIpv4Server')
etree.SubElement(dns_server, 'ipv4Addr').text = server_addr
etree.SubElement(dns_server, 'vrfName').text = self.vpn_instance.get(server_addr)
req_data = etree.tostring(root_elem, 'UTF-8')
        ret, err_code, rsp_data = OPS_CLIENT.delete(uri, req_data)
        if ret != HTTP_OK:
            logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
            raise OPIExecError('Failed to delete DNS IPv4 server')
        self._set_dns_enable_switch('false')
@staticmethod
def get_addr_by_hostname(host, vpn_instance, addr_type='1'):
"""Translate a host name to IPv4 address format. The IPv4 address is returned as a string.
:raise: OPIExecError
"""
logging.info('Get ipv4 address by host name %s', host)
uri = '/dns/dnsNameResolution'
root_elem = etree.Element('dnsNameResolution')
etree.SubElement(root_elem, 'host').text = host
etree.SubElement(root_elem, 'addrType').text = addr_type
etree.SubElement(root_elem, 'vrfName').text = vpn_instance
req_data = etree.tostring(root_elem, "UTF-8")
logging.warning(req_data)
ret, err_code, rsp_data = OPS_CLIENT.get(uri, req_data)
if ret != HTTP_OK or rsp_data == '':
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
raise OPIExecError('Failed to get ipv4 address by host name')
logging.warning(rsp_data)
root_elem = etree.fromstring(rsp_data)
namespaces = {'vrp': 'http://www.huawei.com/netconf/vrp'}
uri = 'data' + uri.replace('/', '/vrp:') + '/vrp:'
elem = root_elem.find(uri + 'ipv4Addr', namespaces)
if elem is None:
raise OPIExecError('Failed to read IP address by host name')
return elem.text
# download
def download_file(url, local_path, ip_protocol, vpn_instance):
"""
Description:
Download file, support TFTP, FTP, SFTP.
Args:
url: URL of remote file
tftp://hostname/path
ftp://[username[:password]@]hostname/path
            sftp://[username[:password]@]hostname[:port]/path
    """
    url_tuple = urlparse(url)
func_dict = {
'tftp': {
IPV4: TFTPv4,
IPV6: TFTPv6,
},
'ftp': {
IPV4: FTPv4,
IPV6: FTPv6,
},
'sftp': {
IPV4: SFTPv4,
IPV6: SFTPv6,
}
}
scheme = url_tuple.scheme
if scheme not in func_dict.keys():
logging.error('Unknown file transfer scheme %s', scheme)
return ERR
if ip_protocol == IPV4:
if not re.match(r'\d+\.\d+\.\d+\.\d+', url_tuple.hostname):
# get server ip by hostname from dns
try:
dns_vpn = '_public_' if vpn_instance in [None, ''] else vpn_instance
server_ip = DNS.get_addr_by_hostname(url_tuple.hostname, dns_vpn)
logging.info("server ip: " + server_ip)
except OPIExecError as ex:
logging.error(ex)
return ERR
ret = ERR
cnt = 0
while cnt < 1 + FILE_TRANSFER_RETRY_TIMES:
if cnt:
logging.info('Try downloading again, please wait a moment')
try:
ret = func_dict[scheme][ip_protocol](url, local_path, vpn_instance).start()
if ret in [OK, DISK_SPACE_NOT_ENOUGH]:
logging.info('download file %s using %s, ret:%d', os.path.basename(local_path), scheme, ret)
break
logging.error('Failed to download file %s using %s', os.path.basename(local_path), scheme)
sleep(FILE_DOWNLOAD_INTERVAL_TIME)
except OPIExecError as ex:
logging.error(ex)
except Exception as ex:
logging.exception(ex)
cnt += 1
return ret
class Download:
"""File download base class"""
def start(self):
"""Start to download file"""
uri = self.get_uri()
req_data = self.get_req_data()
self.pre_download()
ret, err_code, rsp_data = OPS_CLIENT.create(uri, req_data, False)
if ret != HTTP_OK:
logging.error('HTTP response: HTTP/1.1 %s %s\n%s', ret, err_code, rsp_data)
root = etree.fromstring(rsp_data)
rpc_error = root.find('rpc-error')
            if rpc_error is not None and rpc_error.find('error-app-tag') is not None:
ret = int(rpc_error.find('error-app-tag').text)
else:
ret = ERR
else:
ret = OK
self.after_download()
return ret
def get_uri(self):
"""Return download request uri"""
raise NotImplementedError
def get_req_data(self):
"""Return download request xml message"""
raise NotImplementedError
def pre_download(self):
"""Do some actions before download file"""
raise NotImplementedError
def after_download(self):
"""Do some actions after download file"""
raise NotImplementedError
class FTP(Download):
"""FTP download class"""
def get_uri(self):
"""Return ftp download request uri"""
return '/ftpc/ftpcTransferFiles/ftpcTransferFile'
def get_req_data(self):
"""Implemented by subclasses"""
raise NotImplementedError
def pre_download(self):
"""FTP not care"""
def after_download(self):
"""FTP not care"""
class FTPv4(FTP):
"""FTPv4 download class"""
def get_req_data(self):
"""Return ftpv4 download request xml message"""
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<ftpcTransferFile>
<serverIpv4Address>$serverIp</serverIpv4Address>
<commandType>get</commandType>
<userName>$username</userName>
<password>$password</password>
<localFileName>$localPath</localFileName>
<remoteFileName>$remotePath</remoteFileName>
<vpnInstanceName>$vpnInstance</vpnInstanceName>
</ftpcTransferFile>''')
url_tuple = urlparse(self.url)
req_data = str_temp.substitute(serverIp=url_tuple.hostname,
username=url_tuple.username,
password=url_tuple.password,
remotePath=url_tuple.path[1:],
localPath=self.local_path,
vpnInstance=self.vpn_instance)
return req_data
class FTPv6(FTP):
"""FTPv6 download class"""
def get_req_data(self):
"""Return ftpv6 download request xml message"""
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<ftpcTransferFile>
<serverIpv6Address>$serverIp</serverIpv6Address>
<commandType>get</commandType>
<userName>$username</userName>
<password>$password</password>
<localFileName>$localPath</localFileName>
<remoteFileName>$remotePath</remoteFileName>
<ipv6VpnName>$vpnInstance</ipv6VpnName>
</ftpcTransferFile>''')
url_tuple = urlparse(self.url)
idx = url_tuple.netloc.rfind('@')
server_ip = url_tuple.netloc[idx + 1:]
req_data = str_temp.substitute(serverIp=server_ip,
username=url_tuple.username,
password=url_tuple.password,
remotePath=url_tuple.path[1:],
localPath=self.local_path,
vpnInstance=self.vpn_instance)
return req_data
class TFTP(Download):
"""TFTP download class"""
def get_uri(self):
"""Return ftp download request uri"""
return '/tftpc/tftpcTransferFiles/tftpcTransferFile'
def get_req_data(self):
"""Implemented by subclasses"""
raise NotImplementedError
    def pre_download(self):
        """TFTP not care"""
    def after_download(self):
        """TFTP not care"""
class TFTPv4(TFTP):
"""TFTPv4 download class"""
def get_req_data(self):
"""Return tftpv4 download request xml message"""
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<tftpcTransferFile>
<serverIpv4Address>$serverIp</serverIpv4Address>
<commandType>get_cmd</commandType>
<localFileName>$localPath</localFileName>
<remoteFileName>$remotePath</remoteFileName>
<vpnInstanceName>$vpnInstance</vpnInstanceName>
</tftpcTransferFile>''')
url_tuple = urlparse(self.url)
req_data = str_temp.substitute(serverIp=url_tuple.hostname,
remotePath=url_tuple.path[1:],
localPath=self.local_path,
vpnInstance=self.vpn_instance)
return req_data
class TFTPv6(TFTP):
"""TFTPv6 download class"""
def get_req_data(self):
"""Return tftpv4 download request xml message"""
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<tftpcTransferFile>
<serverIpv6Address>$serverIp</serverIpv6Address>
<commandType>get_cmd</commandType>
<localFileName>$localPath</localFileName>
<remoteFileName>$remotePath</remoteFileName>
<ipv6VpnName>$vpnInstance</ipv6VpnName>
</tftpcTransferFile>''')
url_tuple = urlparse(self.url)
idx = url_tuple.netloc.rfind('@')
server_ip = url_tuple.netloc[idx + 1:]
req_data = str_temp.substitute(serverIp=server_ip,
remotePath=url_tuple.path[1:],
localPath=self.local_path,
vpnInstance=self.vpn_instance)
return req_data
class SFTP(Download):
"""SFTP download class"""
def get_uri(self):
"""Return ftp download request uri"""
return '/sshc/sshcConnects/sshcConnect'
def get_req_data(self):
"""Implemented by subclasses"""
raise NotImplementedError
    def pre_download(self):
self._set_sshc_first_time('Enable')
def after_download(self):
self._del_sshc_rsa_key()
self._set_sshc_first_time('Disable')
@classmethod
def _set_sshc_first_time(cls, switch):
"""Set SSH client attribute of authenticating user for the first time access"""
if switch not in ['Enable', 'Disable']:
return ERR
raise OPIExecError(reason)
return OK
def _del_rsa_peer_key(self):
"""Delete RSA peer key configuration"""
logging.info('Delete RSA peer key')
uri = '/rsa/rsaPeerKeys/rsaPeerKey'
root_elem = etree.Element('rsaPeerKey')
etree.SubElement(root_elem, 'keyName').text = self.get_key_name()
req_data = etree.tostring(root_elem, 'UTF-8')
ret, _, _ = OPS_CLIENT.delete(uri, req_data)
if ret != HTTP_OK:
logging.error('Failed to delete RSA peer key')
def _del_sshc_rsa_key(self):
"""Delete the RSA key recorded for the SFTP server"""
self._del_rsa_peer_key()
def get_key_name(self):
"""Get sftp server ip"""
raise NotImplementedError
class SFTPv4(SFTP):
"""SFTPv4 download class"""
def get_key_name(self):
url_tuple = urlparse(self.url)
return url_tuple.hostname
def get_req_data(self):
"""Return sftpv4 download request xml message"""
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<sshcConnect>
<HostAddrIPv4>$serverIp</HostAddrIPv4>
<commandType>get</commandType>
<userName>$username</userName>
<password>$password</password>
<serverPort>$port</serverPort>
<localFileName>$localPath</localFileName>
<remoteFileName>$remotePath</remoteFileName>
<vpnInstanceName>$vpnInstance</vpnInstanceName>
<identityKey>ssh-rsa</identityKey>
<transferType>SFTP</transferType>
</sshcConnect>''')
url_tuple = urlparse(self.url)
try:
if url_tuple.port is None:
port = 22
else:
port = url_tuple.port
except ValueError:
port = 22
req_data = str_temp.substitute(serverIp=url_tuple.hostname,
username=url_tuple.username,
password=url_tuple.password,
port=port,
remotePath=url_tuple.path[1:],
localPath=self.local_path,
vpnInstance=self.vpn_instance)
return req_data
class SFTPv6(SFTP):
"""SFTPv6 download class"""
def get_key_name(self):
url_tuple = urlparse(self.url)
idx = url_tuple.netloc.find('@')
return url_tuple.netloc[idx + 1:]
def get_req_data(self):
"""Return sftpv4 download request xml message"""
str_temp = string.Template('''<?xml version="1.0" encoding="UTF-8"?>
<sshcConnect>
<HostAddrIPv6>$serverIp</HostAddrIPv6>
<commandType>get</commandType>
<userName>$username</userName>
<password>$password</password>
<localFileName>$localPath</localFileName>
<remoteFileName>$remotePath</remoteFileName>
<ipv6VpnName>$vpnInstance</ipv6VpnName>
<identityKey>ssh-rsa</identityKey>
<transferType>SFTP</transferType>
</sshcConnect>''')
url_tuple = urlparse(self.url)
server_ip = self.get_key_name()
req_data = str_temp.substitute(serverIp=server_ip,
username=url_tuple.username,
password=url_tuple.password,
remotePath=url_tuple.path[1:],
localPath=self.local_path,
vpnInstance=self.vpn_instance)
return req_data
def _is_startup_info_valid(startup_info):
"""Does startup info valid
FILESERVER, SOFTWARE, CONFIG, PATCH, not None
"""
return startup_info.get('SYSTEM-CONFIG', None) and startup_info.get('FILESERVER', None)
sys_info = get_system_info()
slave, _ = has_slave_mpu() # Check whether slave MPU board exists or not
logging.info('Get devicetype=%s, esn=%s, mac=%s from the current system', sys_info['productName'],
sys_info['esn'], sys_info['mac'])
if not REMOTE_PATH_IMAGE.get(sys_info['productName']):
logging.warning(
"The product name of the current device [{}] not in REMOTE_PATH_IMAGE".format(sys_info['productName']))
if not REMOTE_PATH_PATCH.get(sys_info['productName']):
logging.warning(
"The product name of the current device [{}] not in REMOTE_PATH_PATCH".format(sys_info['productName']))
if '%s' in REMOTE_PATH_CONFIG:
REMOTE_PATH_CONFIG = REMOTE_PATH_CONFIG % sys_info['esn']
startup_info = {'FILESERVER': FILE_SERVER,
'SYSTEM-SOFTWARE': REMOTE_PATH_IMAGE.get(sys_info['productName'], ''),
'SYSTEM-CONFIG': REMOTE_PATH_CONFIG,
'SYSTEM-PAT': REMOTE_PATH_PATCH.get(sys_info['productName'], '')}
STARTUP.set_startup_info_from_ini_or_cfg(startup_info)
if not _is_startup_info_valid(startup_info):
logging.warning('FILESERVER is None or SYSTEM-CONFIG is None, no need download and '
'set system startup file')
return ERR
ret = check_filename()
if ret == ERR:
return ERR
if check_ztp_continue() is False:
logging.info('user stop ztp before setting, ztp will reset startup')
delete_startup_file(image_file, config_file, patch_file, slave)
return ERR
if not check_ztp_continue():
logging.info('user stop ztp after setting, ztp will reset startup')
STARTUP.reset_startup_info(slave)
return ERR
set_ztp_last_status('true')
dhcp_stop()
try:
reboot_system()
except OPIExecError as reason:
logging.error("reboot failed: {}".format(reason))
set_ztp_last_status('false')
STARTUP.reset_startup_info(slave)
return ERR
return OK
Args:
Raises:
Returns: user script processing result
"""
ip_protocol = ip_protocol.lower()
try:
ret = main_proc(vpn_instance, ip_protocol)
except Exception as reason:
logging.error(reason)
trace_info = traceback.format_exc()
logging.error(trace_info)
ret = ERR
finally:
# Close the OPS connection
OPS_CLIENT.close()
return ret
while True:
try:
STARTUP = Startup()
break
except OPIExecError as ex:
logging.warning(ex)
sleep(CHECK_STARTUP_INTERVAL)
DNS = DNSServer()
if __name__ == "__main__":
main()
• The content in bold in this example can be modified based on actual requirements.
• Do not modify the content that is not in bold in this example. Otherwise, the ZTP function may be unavailable.
• Do not modify the script logic. Otherwise, an infinite loop may occur during script execution, or the script may fail to
execute, making the ZTP function unavailable.
• If the preceding examples do not meet the requirements, contact Huawei engineers.
The SHA256 verification code is used to check the integrity of the script file.
You can use either of the following methods to generate an SHA256 verification code for a script file:
2. Run the certutil -hashfile filename SHA256 command provided by the Windows operating
system.
The SHA256 verification code is calculated based on the content following #sha256sum=. In practice, you need to
delete the first line of the file (shifting the remaining content up by one line), calculate the SHA256 verification code
of the remaining content, and then write #sha256sum= followed by the generated verification code as a new first line of the file.
The SHA256 algorithm can be used to verify the integrity of files. This algorithm has high security.
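As a rough illustration of the check described above (a sketch, not the device's actual implementation), the header line can be verified by hashing everything after the first line and comparing the digest with the value recorded after #sha256sum=:

```python
import hashlib

def verify_sha256_header(content):
    """Verify script bytes whose first line is '#sha256sum="<hex>"' (sketch)."""
    first_line, _, body = content.partition(b'\n')   # body = everything after line 1
    header = first_line.decode('ascii', 'ignore').strip()
    if not header.startswith('#sha256sum='):
        return False
    expected = header.split('=', 1)[1].strip().strip('"')
    return hashlib.sha256(body).hexdigest() == expected
```

Generating the header is the reverse: hash the file body first, then prepend #sha256sum="&lt;digest&gt;" as the new first line.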
You can obtain version files from an SFTP/TFTP/FTP server. Based on the server used, the path can be
any of the following:
■ tftp://hostname/path
■ ftp://[username[:password]@]hostname/path
■ sftp://[username[:password]@]hostname[:port]/path
If no system software needs to be loaded, leave this parameter blank or do not specify the device type.
For example:
REMOTE_PATH_IMAGE = {
'NE40E' : ''
}
Or
REMOTE_PATH_IMAGE = {}
If the device model entered here is inconsistent with the actual device model, the device skips this check and
continues the ZTP process. That is, the system considers that this item does not need to be set, and only logs are
recorded.
%s indicates a device ESN, based on which you can obtain a configuration file. This field cannot be
edited.
■ You are advised to use the ESN to specify the configuration file of a specific device. Do not use a
configuration file that does not contain the ESN for batch configuration.
■ The ESN is case-sensitive and must be the same as that on the device.
■ If the conf_%s.cfg file does not exist on the file server, a message is displayed indicating that the
configuration file fails to be downloaded. For example, if the ESN of the device is 2102351HLD10J2000012
and the conf_2102351HLD10J2000012.cfg file does not exist on the file server, an error message is
displayed.
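The ESN substitution corresponds to the '%s' check shown earlier in the script listing. As an illustration (the path and ESN below are example values, not defaults):

```python
# Illustrative values only
REMOTE_PATH_CONFIG = 'config/conf_%s.cfg'
esn = '2102351HLD10J2000012'

# Same substitution the script performs before downloading the configuration file
if '%s' in REMOTE_PATH_CONFIG:
    REMOTE_PATH_CONFIG = REMOTE_PATH_CONFIG % esn
```

After the substitution, the device requests config/conf_2102351HLD10J2000012.cfg from the file server.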
Or
REMOTE_PATH_PATCH = {}
You can use the SHA256 verification file to check the integrity of the files downloaded by the device.
For details about the format of the SHA256 verification file, see Version File Integrity Check.
If the downloaded files do not need to be checked, set this field to an empty string ('').
• Specify the waiting time for initiating a second request after a request failure.
CONFLICT_RETRY_INTERVAL = 5
• Specify the maximum number of retries allowed when the startup information fails to be obtained.
MAX_TIMES_GET_STARTUP = 120
• Specify the interval for retrying to obtain the startup information after a failure.
GET_STARTUP_INTERVAL = 15
• Specify the maximum number of retries allowed when checking whether boot items are successfully
configured on a device equipped with a single main control board.
MAX_TIMES_CHECK_STARTUP = 205
• Specify the maximum number of retries allowed when checking whether boot items are successfully
configured on a device equipped with two main control boards.
MAX_TIMES_CHECK_STARTUP_SLAVE = 265
• Specify the interval for checking whether the system software is successfully set.
CHECK_STARTUP_INTERVAL = 5
• Specify the ZTP status value mapping, which is used for logs.
LAST_STATE_MAP = {'true': 'enable', 'false': 'disable'}
• Specify the DNS status value mapping, which is used for logs.
DNS_STATE_MAP = {'true': 'enable', 'false': 'disable'}
• Specify the waiting time for the next download after a download failure.
FILE_DOWNLOAD_INTERVAL_TIME = 5
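The retry constants above are typically combined into a simple bounded-retry loop. This generic sketch illustrates the pattern; it is not the script's exact control flow:

```python
import time

def retry(operation, max_times, interval):
    """Call operation() until it returns True or max_times attempts are used."""
    for _ in range(max_times):
        if operation():
            return True
        time.sleep(interval)   # wait between attempts
    return False
```

For example, retry(get_startup, MAX_TIMES_GET_STARTUP, GET_STARTUP_INTERVAL) would poll the startup information every 15 seconds, up to 120 times, before giving up.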
class ZTPDisableError()
def create()
def delete()
def get()
def set()
• Copy files.
def copy_file()
If a file fails to be loaded, all files downloaded by the device must be deleted to roll the device back to
the state before ZTP is performed.
You do not need to edit this field.
• Check whether the files for the next startup are ready.
def _check_next_startup_file()
• Reset information about the next startup and delete the downloaded files.
def reset_startup_info()
def sha256_get_from_file()
def sha256_check_with_first_line()
def sha256_check_with_dic()
def parse_sha256_file()
def verify_and_parse_sha256_file()
• Check whether the username, password, and file name contain special characters.
def check_parameter()
def check_filename()
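A rough sketch of such a special-character check follows. The character set below is an assumption for illustration; the script's actual rules may differ:

```python
import re

# Assumed set of characters treated as unsafe in usernames, passwords, and file names
_SPECIAL_CHARS = re.compile(r'[<>|&;?*"\'\s]')

def has_special_characters(value):
    """Return True if value contains a character from the assumed unsafe set."""
    return bool(_SPECIAL_CHARS.search(value or ''))
```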
def delete_startup_file()
def get_ztp_exit_environment()
def main()
if __name__ == "__main__":
main()
#sha256sum="fffcd63f5e31f0891a0349686969969c1ee429dedeaf7726ed304f2d08ce1bc7"
fileserver=sftp://username:password@hostname:port/path;
mac=00e0-fc12-3456;esn=2102351931P0C3000154;devicetype=DEFAULT;system-version=V800R021C10SPC600;system-
software=V800R021C10SPC600.cc;system-config=test.cfg;system-pat=V800R021C10SPC600SPH001.PAT;
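The intermediate file shown above is a series of key=value pairs separated by semicolons, preceded by the #sha256sum= line. A hedged parsing sketch (not the device's implementation):

```python
def parse_intermediate_file(text):
    """Parse 'key=value;' pairs into a dict; the #sha256sum= line is skipped."""
    fields = {}
    for line in text.splitlines():
        if line.startswith('#sha256sum='):
            continue
        for part in line.split(';'):
            if '=' in part:
                key, value = part.split('=', 1)   # split once: URLs may contain ':'
                fields[key.strip()] = value.strip()
    return fields
```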
NOTE:
fileserver Yes Address of the server from which version files are
obtained. You can obtain files through
TFTP/FTP/SFTP. Available address formats are as
follows:
tftp://hostname/path
ftp://[username[:password]@]hostname/path
sftp://[username[:password]@]hostname[:port]/
path
The username, password, and port parameters
are optional. The path parameter specifies the
directory where version files are saved on the file
server. The hostname parameter specifies a
server address, which can be an IPv4 address,
domain name, or IPv6 address. The value of port
ranges from 0 to 65535. If the specified value is
out of the range, the default value 22 is used. A
port number can be configured only when an
IPv4 SFTP server address is specified.
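The "out of range falls back to 22" rule matches what the SFTPv4 class in the script does with its try/except around url_tuple.port. A sketch of the same logic (urlparse raises ValueError for an out-of-range port on Python 3.6 and later):

```python
from urllib.parse import urlparse

def resolve_sftp_port(url):
    """Return the port from an SFTP URL, defaulting to 22 when absent or invalid."""
    try:
        port = urlparse(url).port   # raises ValueError if the port is out of range
    except ValueError:
        return 22
    return port if port is not None else 22
```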
NOTE:
You can obtain the ESN of the device from the
nameplate on the device package.
The ESN is case-insensitive.
You are advised to use the ESN of a device to
specify the configuration information of the device,
but not to use DEFAULT to perform batch
configuration.
NOTE:
You can obtain the MAC address of the device from
the nameplate on the device package.
The MAC address is case-insensitive.
You need to fill in the intermediate file in strict
accordance with the MAC address format displayed
on the device. For example, if the MAC address
displayed on the device is 00e0-fc12-3456, the MAC
address 00e0fc123456 is incorrect because "-" is
also verified.
You are advised to use the MAC address of a device
to specify the configuration of the device, but not
to use DEFAULT to perform batch configuration.
NOTE:
For details about the device type, see "Chassis" in
Hardware Description.
If the value of this field is different from the actual
device type, the ZTP process is performed again.
NOTE:
You can use either of the following methods to generate an SHA256 checksum for a script file:
#sha256sum="29d29a2b0ef2136f0f192667d71627020e58438fbfb87323f2dae27b5cd9a797"
file-name sha256
conf_5618642831132.cfg 319c16ebcbc987ef11f28f78cb7d6e7ea4950b8b195e1388c031f3327cc2666e
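The verification file pairs each downloaded file name with its digest. A hedged sketch of parsing and checking such a file (the function names echo the helpers listed earlier, but the real implementation may differ):

```python
import hashlib

def parse_sha256_lines(text):
    """Parse 'file-name sha256' rows into a dict; header and comment lines are skipped."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or line.lower().startswith('file-name'):
            continue
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = parts[1].lower()
    return table

def check_sha256(data, expected_hex):
    """Return True if the SHA256 digest of data matches expected_hex."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()
```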
Condition: The device starts with non-base configuration, or the name of
the configuration file for the startup is not vrpcfg.zip.
Impact: ZTP becomes invalid.
Condition: The set ztp disable command is run.
Impact: The ZTP function is disabled. To use the ZTP function again, run
the set ztp enable command.
Condition: Any of the following configurations is performed on the device:
An IP address is configured for any interface, excluding the following
configurations: 192.168.0.1 is configured for the management network
port, an IP address is configured for a loopback interface, and an IP
address is configured for a DCN-related sub-interface.
The undo pnp enable command is configured globally.
A VLAN is configured globally.
A BD is configured globally.
A VSI is configured globally.
The device is configured as an AP.
The DHCP client function is configured on an interface.
Impact: The ZTP process ends.
Condition: Login to the device through DCN succeeds.
Impact: The ZTP process ends.
Terms
None
4 System Management
Purpose
This document describes the system management feature in terms of its overview, principles, and
applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Commissioning engineers
Security Declaration
• Notice on Limited Command Permission
This document describes the commands used for network deployment and maintenance, but does not
cover confidential commands such as those used for production, assembly, and return-to-factory
inspection. For details about confidential commands, please submit an application.
■ When the password encryption mode is cipher, avoid setting both the start and end characters of a
password to "%^%#". Otherwise, the password is displayed directly in the configuration file.
■ Your purchased products, services, or features may use some of users' personal data during service
operation or fault locating. You must define user privacy policies in compliance with local laws and
take proper measures to fully protect personal data.
■ When discarding, recycling, or reusing a device, back up or delete data on the device as required to
prevent data leakage. If you need support, contact after-sales technical support personnel.
• Feature declaration
■ The NetStream feature may be used to analyze the communication information of terminal
customers for network traffic statistics and management purposes. Before enabling the NetStream
feature, ensure that it is performed within the boundaries permitted by applicable laws and
regulations. Effective measures must be taken to ensure that information is securely protected.
■ The mirroring feature may be used to analyze the communication information of terminal
customers for a maintenance purpose. Before enabling the mirroring function, ensure that it is
performed within the boundaries permitted by applicable laws and regulations. Effective measures
must be taken to ensure that information is securely protected.
■ The packet header obtaining feature may be used to collect or store some communication
information about specific customers for transmission fault and error detection purposes. Huawei
cannot offer services to collect or store this information unilaterally. Before enabling the function,
ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Special Declaration
• This document serves only as a guide. The content is written based on device information gathered
under lab conditions. The content provided by this document is intended to be taken as general
guidance, and does not cover all scenarios. The content provided by this document may be different
from the information on user device interfaces due to factors such as version upgrades and differences
in device models, board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are beyond the
scope of this document.
• The maximum values provided in this document are obtained in specific lab environments (for example,
only a certain type of board or protocol is configured on a tested device). The actually obtained
maximum values may be different from the maximum values provided in this document due to factors
such as differences in hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface numbers on devices
for configuration.
• The supported boards are described in the document. Whether a customization requirement can be met
is subject to the information provided at the pre-sales interface.
• In this document, public IP addresses may be used in feature introduction and configuration examples
and are for reference only unless otherwise specified.
• The configuration precautions described in this document may not accurately reflect all scenarios.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
DANGER: Indicates a hazard with a high level of risk which, if not avoided, will
result in death or serious injury.
CAUTION: Indicates a hazard with a low level of risk which, if not avoided, could
result in minor or moderate injury.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made
in earlier issues.
4.2 VS Description
4.2.1 Overview of VS
Definition
A network administrator divides a physical system (PS) into multiple virtual systems (VSs) using hardware-
and software-level emulation. Each VS performs independent routing tasks. VSs share the same software
package and all public resources, including the same IPU, but each interface works only for one VS.
Background
As the demand on various types of network services is growing, network management becomes more
complex. Requirements for service isolation, system security, and reliability are steadily increasing. The
virtual private network (VPN) technique can be used to isolate services on a PS. If a module failure occurs on
the PS, all services configured on the PS will be interrupted. To prevent service interruptions, the VS
technique is used to partition a PS into several VSs. Each VS functions as an independent network element
and uses separate physical resources to isolate services.
Further development of the distributed routing and switching systems allows the VS technique to fully utilize
the service processing capability of a single PS. The VS technique helps simplify network deployment and
management, and strengthen system security and reliability.
Benefits
This feature offers the following benefits to carriers:
• Service integrity: Each VS has all the functions of a common Router to carry services. Each VS has an
independent control plane, which allows rapid response to future network services and makes network
services more configurable and manageable.
• Service isolation: A VS is a virtual Router on both the software and hardware planes. A software or
hardware fault in a VS does not affect other VSs. The VS technique ensures network security and
stability.
• Expenditure reduction: As an important feature of new-generation IP bearer devices, VSs play an active
role in centralized operation of service provider (SP) services, reducing capital expenditure (CAPEX) or
operational expenditure (OPEX).
4.2.2 Understanding VS
4.2.2.1 VS Fundamentals
Concepts
Admin-VS and common VS
Common VS (VSn): A network administrator divides a PS into multiple VSs using hardware- and software-
level emulation. Each VS performs independent routing tasks. VSs share the same software package and all
public resources, including the same IPU, but each interface works only for one VS.
Admin VS: Each PS has a default VS named admin VS. All unallocated interfaces belong to this VS. The
admin VS can process services in the same way as a common VS. In addition, the PS administrator can use
the admin VS to manage VSs.
The admin VS has permission to manage service VSs in independent management mode.
Each VS uses the independent configuration management and system management planes and serves as
independent network elements to provide flexible management and high security. VS supports the following
functions, in addition to service isolation:
• Flexible resource management: Resources are allocated using a resource template. The resource
template can be modified dynamically to allocate resources. This mode improves VS resource
management flexibility.
• File-directory isolation: Each VS has its own file directory. A PS administrator can check all VS file
directories, such as configuration files and log files, and access their contents. A VS administrator can
check and access only the file directory of its own VS. This results in improved security.
• Separate alarm reports: A VS reports its own alarms to the network administrator. Faults are located
quickly, and VS security is guaranteed.
• Independent starts and stops: A PS administrator starts, stops, or resets a VS without affecting other
VSs.
• VS-switches: When configuring or operating VSs, a PS administrator can switch between VSs.
After you create a VS, allocate logical and hardware resources to the VS.
Logical resources include u4route, m4route, u6route, m6route, and vpn-instance.
Before you configure VSs, specify a port allocation mode for the VSs.
A port allocation mode determines the scope of resources allocated to a VS. Currently, only the port mode is
supported. In port mode, VSs share service resources that a PS provides, and some features can only be
enabled on a single VS.
Resource template: By using a resource template, multiple logical resource items can be allocated to a VS at
a time, which saves the user's time. After a resource template is modified, it must be loaded to the
corresponding VS for the change to take effect.
In Figure 1, a PS is partitioned into VSs, VS1 carries voice services, VS2 carries data services, and VS3 carries
video services. Each type of service is transmitted through a separate VS, and these services are isolated from
one another. VSs share all resources except interfaces. Each VS functions as an individual Router to process
services.
Figure 1 VS partitioning
VSs share resources such as CPU memory and interface boards, but do not share interfaces. A physical or logical
interface belongs only to one VS.
The number of VSs that can be created is limited by the interface resources of a device. Multiple VSs can be created if
the device has sufficient interface resources.
VS Authority Management
Table 1 shows VS authority management.
PS administrator √ √
VS administrator - -
• A VS administrator can perform operations only on the managed VS, including starting and stopping the allocated
services, configuring routes, forwarding service data, and maintaining and managing the VS.
• On the NE40E, physical interfaces can be directly connected so that different VSs on the same Physical System (PS)
can communicate.
• If the entire section of a command does not contain any VS description, the command is supported by
the physical device, admin VS, and service VSs.
• If only certain parameters have VS descriptions, the parameters without VS descriptions are supported
by the physical device, admin VS, and service VSs.
• Different routing instances are isolated, which is more secure and reliable than route isolation
implemented using VPN.
• Physical resources of a device can be fully utilized. For example, without the VS technique, on a device
with 16 interfaces, if only 4 interfaces are needed to transmit services, the other 12 interfaces remain idle,
wasting resources.
• Devices of different roles are integrated to simplify network tests, deployment, management, and
maintenance.
• Links between devices are simplified into internal buses that are of higher reliability, higher
performance, and lower cost.
In Figure 2, a physical device can serve as both aggregation and core nodes (such as the BRAS, PE, and P),
which simplifies network topology and network management and maintenance.
The virtual system (VS) technique, which isolates services, can be used to prevent this risk.
As shown in Figure 1, new services are deployed on VSs to verify the services and avoid security risks. This
deployment makes full use of resources and does not affect the existing services.
Definition
Information management classifies output information, effectively filters it, and outputs it to a local
device or a remote server.
Purpose
The information management function helps users:
Type Description
Logs Logs are records of events and unexpected activities of managed objects. Logging is
an important method to maintain operations and identify faults. Logs provide
information for fault analysis and help an administrator trace user activities,
manage system security, and maintain a system.
Some logs are used by technical support personnel for troubleshooting only.
Because such logs have no practical significance to users, users are not notified
when the logs are generated. System logs are classified as user logs, diagnostic logs,
O&M logs, or security logs.
User logs: During device running, the log module in the host software records all
running information in logs. The logs are saved in the log buffer, sent to the Syslog
server, reported to an NMS, and displayed on the screen. Such logs are user logs.
Users can view the compressed log files and their content.
Diagnostic logs: The logs recorded after the device starts but before the logserver
component starts are diagnostic logs. Such logs are recorded in the process-side
black box, and they are not saved in the log buffer, sent to the Syslog server,
reported to an NMS, or displayed on the screen. Users can view the compressed log
files and their content.
NOTE:
The information recorded in diagnostic logs is used for troubleshooting only and does not
contain any sensitive information.
O&M logs: During the running of a device, the log module of the host software
records the data generated during the running of each service, forming O&M logs.
Log information is not saved in the log buffer, sent to the Syslog server, reported to
an NMS, or displayed on the screen. Users can view the compressed log files and
their content.
NOTE:
The information recorded in O&M logs is used for troubleshooting only and does not
contain any sensitive information.
Security logs: If the system of a device is intruded, the device must be informed of
the intrusion so that it can take responsive measures. Collecting logs about intrusion
from external attackers is an important means of security detection. Security logs
are recorded in the log buffer, sent to the Syslog server in SSL mode, reported to an
NMS, and displayed on the screen.
NOTE:
The system-defined user name _SYSTEM_ is displayed as the user name in operation and
security logs in the following scenarios:
No operation user is available for the security logs of events.
Operation logs record system behaviors, such as internal configuration and configuration
file restoration.
No username is available for password authentication.
If operation logs record system behaviors, such as internal configuration and configuration
file restoration, "**" is displayed for the IP and Terminal parameters.
Traps Traps are sent to a workstation to report urgent and important events, such as the
restart of a managed device. In general, the system also generates a log with the
same content after generating a trap, except that the trap contains an additional
OID.
Debugging Debugging information shows the device's running status, such as the sending or
information receiving of data packets. A device generates debugging information only after
debugging is enabled.
log.log The current information files of the system are saved in log format.
diag.log Logs recording exceptions that occur when the system is started or running are
saved in diag.log format.
pads.pads Logs generated during the running of each service after a device starts are saved in
.pads format.
security.log A security log is saved in the security log space in the .log format, and is also
recorded in the log.log file.
log_SlotID_time.log.zip If the size of a current information file reaches the upper threshold, the system
automatically compresses the file into a historical file and changes the file name to
log_SlotID_time.log.zip.
In the file name, SlotID indicates the slot ID and time indicates the compression
and saving time.
diag_SlotID_time.log.zip If the size of a current diagnostic log reaches the upper threshold, the system
automatically converts the file to a compressed file and names the compressed file
diag_SlotID_time.log.zip.
In the file name, SlotID indicates the slot ID and time indicates the compression
and saving time.
pads_SlotID_time.pads.zip
If the size of a current O&M log file reaches the upper threshold, the system
automatically converts the file to a compressed file and names the compressed file
pads_SlotID_time.pads.zip.
In the file name, SlotID indicates the slot ID and time indicates the compression
and saving time.
Overview
Identifying fault information is difficult if there is a large amount of information. Setting information levels
allows users to rapidly identify information.
Information Levels
Table 1 describes eight information severities. The lower the severity value, the higher the severity.
1 Alert A serious fault. For example, device memory reaches the maximum limit.
Such a fault must be rectified immediately.
2 Critical A critical fault. For example, memory usage reaches the upper limit, the
temperature reaches the upper limit, or bidirectional forwarding detection
(BFD) detects an unreachable device or error messages generated by a
local device. The fault must be analyzed and rectified.
4 Warning An exception. For example, users disable a routing process, BFD detects
packet loss, or error protocol packets are detected. The fault does not
affect subsequent services and requires attention.
5 Notice A key operation is performed to keep the device running properly. For
example, the shutdown command is used on an interface, a neighbor is
discovered, or the protocol status changes.
Logs can be output or filtered based on a specified severity value. A device can output logs with severity
values less than or equal to the specified value. For example, if the log severity value is set to 6, the device
only outputs logs with severity values 0 to 6.
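The threshold rule described above can be expressed directly. The severity names below follow the standard syslog levels; since only levels 1, 2, 4, and 5 survive in the table above, the full mapping here is an assumption:

```python
# Standard syslog-style severities: a lower value means a higher severity (assumed mapping)
SEVERITIES = {0: 'Emergency', 1: 'Alert', 2: 'Critical', 3: 'Error',
              4: 'Warning', 5: 'Notice', 6: 'Informational', 7: 'Debugging'}

def should_output(log_severity, configured_value):
    """Output a log only if its severity value does not exceed the configured value."""
    return log_severity <= configured_value
```

With configured_value set to 6, logs of severity 0 through 6 are output and Debugging (7) logs are suppressed, matching the example in the text.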
<Int_16> Leading characters These characters are added to the information to be sent
to a syslog server but are not added to the information
saved on a local device.
and are independent of each other. Information channels are available only after information sources are
specified. By default, a device defines information sources for the first six channels (console, monitor, log
host, trap buffer, log buffer, and SNMP agent) and for channel 9 (information file).
Figure 1 illustrates information output channels. Logs, traps, and debugging information are output through
default channels. All types of information can also be output through specified channels. For example, if
channel 6 is configured to carry information to the log buffer, information is sent to the log buffer through
channel 6, not channel 4.
Table 1 describes the default information output channels.
Channel 0 (name: Console), output to the console: Receives logs, traps, and debugging information for local query.
Channel 2 (name: Loghost), output to a syslog server: Receives and saves logs and traps. An administrator can monitor routers and locate faults by querying the files. The syslog server to which log information is output can be specified by configuring the server IP address, UDP port number, recording information facility, and information severity. Multiple source interfaces can be specified on devices to output log information, which allows a syslog server to identify which device output the information.
• Validity: Information to be sent must comply with the format required by the Syslog protocol.
• Integrity: Information must include various user operations, exceptions, and key events.
Definition
The fault management function is one of five functions (performance management, configuration
management, security management, fault management, and charging management) that make up a
telecommunications management network. The primary purposes of this function are to monitor the
operating anomalies and problems of devices and networks in real time and to monitor, report, and store
data on faults and device running conditions. Fault management also provides alarms, helping users isolate
or rectify faults so that affected services can be restored.
Purpose
With the popularity of networks, complexity of application environments, and expansion of network scales,
our goal must be to make network management more intelligent and effective. Improving and optimizing
fault management will help us meet this goal. Improved fault management can achieve the following:
• Alarm masking is terminal-specific. Specifically, alarms that are masked on a terminal can still be
received normally by other terminals.
• A terminal can be configured with an alarm masking table to control its alarm information.
• Jitter suppression: uses alarm continuity analysis so that the device does not report an alarm if a fault lasts only a short period of time, and displays a stable alarm if a fault flaps.
• Correlation suppression: uses alarm correlation rules to reduce the number of reported alarms, reducing
the network load and facilitating fault locating.
Alarm continuity analysis aims to differentiate events that require analysis and attention from those that do
not and to filter out unstable events.
Continuity analysis measures time after a stable event, such as fault occurrence or fault rectification, occurs.
If the event continues for a specified period of time, an alarm is sent. If the event is cleared, the event is
filtered out and no alarm is sent. If a fault lasts only a short period of time, it is filtered out and no alarm is
reported. Only stable fault information is displayed when a fault flaps.
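The continuity check described above can be sketched as a simple hold-off rule (an illustrative model assuming timestamped fault-up/fault-down events and a 5-second hold time; not device code):

```python
# Jitter suppression sketch: a fault is reported only if it persists
# for at least HOLD seconds; shorter faults are filtered out, so a
# flapping fault yields stable alarm information instead of a burst.

HOLD = 5  # hold-off time in seconds (assumed value for illustration)

def stable_faults(events, end_time, hold=HOLD):
    """events: list of (timestamp, state), state 'up' (fault occurs) or
    'down' (fault cleared). Returns start times of faults reported."""
    alarms = []
    fault_since = None
    for ts, state in events:
        if state == "up" and fault_since is None:
            fault_since = ts
        elif state == "down" and fault_since is not None:
            if ts - fault_since >= hold:
                alarms.append(fault_since)  # stable fault: report it
            fault_since = None              # short flap: filter it out
    if fault_since is not None and end_time - fault_since >= hold:
        alarms.append(fault_since)          # still faulty at end of window
    return alarms

# A 2-second flap is filtered out; a 10-second fault is reported.
print(stable_faults([(0, "up"), (2, "down"), (10, "up"), (20, "down")], 25))
```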
Figure 2 shows the alarm generated if a fault flaps.
• If NMS-based correlative alarm suppression is configured, the system filters out correlative alarms and
reports only root alarms and independent alarms to the NMS.
• If NMS-based correlative alarm suppression is not configured, the system reports root alarms, correlative alarms, and independent alarms to the NMS.
Terms
Jitter alarm: Alarms generated in batches due to managed object abnormalities, such as blocking/unblocking and flapping.
Alarm masking: A function that allows masking rules to be configured to prevent alarms matching the rules from being reported to the alarm terminal. Masked alarms are still saved on the device that generated them.
Alarm suppression: An alarm management function. Managed objects that generate jitter alarms and repeating alarms can be suppressed and prevented from generating a large number of useless alarms.
Alarm correlation analysis: The process of analyzing the alarms that meet alarm correlation rules. If alarm B is generated within 5 seconds after alarm A is generated and meets the alarm correlation rules, alarm B is masked or its severity is increased accordingly.
Root alarm: An alarm generated due to network abnormalities or faults. Lower-level alarms always accompany root alarms.
Correlative alarm: An alarm generated because of the same fault that caused another alarm. If alarm B is generated because of the fault that causes alarm A, alarm B is the correlative alarm of alarm A.
Definition
The performance management feature periodically collects performance statistics on a device to monitor the
performance and operating status of the device. This feature allows you to evaluate, analyze, and predict
device performance with current and historical performance statistics.
Purpose
The performance management feature is essential to device operation and maintenance. This feature
provides current and historical statistics about performance indicators, helping you to determine the device
operating status and providing a reference for you to locate faults and perform configurations.
Analysis on performance statistics helps you to predict the device performance trend. For example, by
analyzing the peak and valley values of user traffic during a day, you can predict the network traffic growth
trend and speed in the next 30 days or longer.
Performance statistics provide a reference for you to optimize network configuration and make network plans.
• Traffic rate calculated by dividing the traffic volume collected during a statistics period by the length of
the period
The statistics can be the peak, valley, or average values collected during a statistics period, or the snapshot
values collected at the end of a statistics period. The maximum, minimum, average, and current values of
the ambient temperature are examples of such statistics.
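The rate and summary statistics described above might be computed as follows (an illustrative sketch with made-up sample values, not a device algorithm):

```python
# Per-period statistics sketch: traffic rate = volume / period length,
# plus peak, valley, and average values over the collected samples.

def period_rate(volume_bytes, period_seconds):
    """Traffic rate for one statistics period, in bytes per second."""
    return volume_bytes / period_seconds

def summarize(samples):
    """Return (peak, valley, average) for a list of numeric samples."""
    return max(samples), min(samples), sum(samples) / len(samples)

print(period_rate(9_000_000, 900))     # rate over a 15-minute period
print(summarize([21.0, 24.5, 23.5]))   # e.g. ambient temperature samples
```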
The statistics collection function supports many types of statistics tasks. A statistics task can be bound to a
statistics period and multiple statistics instances.
You can query current and historical performance statistics or clear current performance statistics using
either commands or a network management system (NMS).
1. The NMS delivers a performance statistics task to the device.
2. The device collects performance statistics based on the performance statistics task and generates a
performance statistics file.
3. The device actively transfers the performance statistics file to the NMS or transfers the file to the NMS
upon request.
4. The NMS parses the performance statistics file, stores the file in the database, and presents collected
statistics if necessary.
The NMS can convert received performance statistics files to files recognizable to a third-party NMS and
transfer these files to the third-party NMS for processing.
Definition
If the performance of the current system software does not meet requirements, you can upgrade the system
software package or maintain the device to enhance the system performance. Specific operations involve:
Purpose
You can select a proper operation to upgrade and maintain the device according to the real-world situation.
Application scenarios of these operations are as follows:
• Upgrade
■ Patch installation
Patches are a type of software compatible with system software. They are used to fix urgent bugs
in system software. You can upgrade the system by installing patches, without having to upgrade
the system software.
Benefits
To add new features to a device or optimize device performance, or if the current resource files (including
the system software and GTL file) do not meet requirements, you can choose to upgrade software, install
patches, or update the GTL file as needed.
Background
Software management is a basic feature on a device. It involves various operations, such as software
installation, software upgrade, software version rollback, and patch operations.
• Software upgrade optimizes system performance, enables new performance capabilities, and resolves
problems in an existing software version.
• Patches are software compatible with the system software. Installing patches can resolve specific urgent problems without requiring a device upgrade.
Basic Concepts
Software management is a basic feature on a device. It involves various operations, such as software
installation, software upgrade, software version rollback, and patch operations.
When you install or upgrade the system software, enable the log and alarm management functions to record
installation or upgrade operations on a device. The recorded information helps diagnose faults if installation or an
upgrade fails.
• Software installation
A device can load software onto all main control boards simultaneously, which minimizes the loading
time.
• Software upgrade
Software can be upgraded to satisfy network and service requirements on a live network.
• Patch operations
Installing the latest patch optimizes software capabilities and fixes software bugs. Installing the latest
patch also dynamically upgrades software on a running device, which minimizes negative impact on
services and improves communication quality.
Software Upgrade
At present, the NE40E supports two types of software upgrade, including software upgrade that takes effect at the next startup.
Background
The system software of a running device may need to be upgraded to correct existing errors or add new
functions to meet service requirements. The traditional way is to disconnect the device from the network and
upgrade the system software in offline mode. This method is service-affecting.
Patches are specifically designed to upgrade the system software of a running device with minimum or no
impact on services.
Basic Concepts
A patch is an independent software unit used to upgrade system software.
• Incremental patch: A device can have multiple incremental patches installed. The latest incremental
patch contains all the information of previous incremental patches.
• Non-incremental patch: A device can have only one non-incremental patch installed. If you want to
install an additional patch for a device on which a non-incremental patch exists, uninstall the non-
incremental patch first.
• Hot patch: The patch takes effect immediately after it is installed. Installing a hot patch does not affect
services.
• Cold patch: The patch does not take effect immediately after it is installed. You must reset the
corresponding board or subcard or perform a master/slave main control board switchover for the patch
to take effect. Installing a cold patch affects services.
1. For an ECP that is released based on an ACU, if activating and validating the ECP would not affect
user experience, the ECP is a hot ECP and named HPyyyy; if activating and validating the ECP would
affect user experience, the ECP is a cold ECP and named CPyyyy.
2. The first y in HPyyyy or CPyyyy is fixed at 0, and the subsequent yyy is the same as yyy in SPCyyy or
SPHyyy of the corresponding ACU. Therefore, an ECP is named in the format of HP0yyy or CP0yyy. If a
calculated ECP name is the same as that of a previously released ECP, the number in the newly calculated name is incremented by 1.
1. For an ACU that is released based on the previous cold ACU, if the current ACU contains patches that
would affect user experience when being validated, the current ACU is a cold ACU and named SPCyyy.
2. For an ACU that is released based on the previous cold ACU, if the current ACU does not contain any
patches that would affect user experience when being validated, the current ACU is a hot ACU and
named SPHyyy.
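Based on the naming rules above, the derivation of an ECP name from its ACU name can be sketched as follows (an illustrative interpretation; the function and names are hypothetical, not Huawei tooling):

```python
# ECP naming sketch: an ECP released on ACU "SPCyyy" or "SPHyyy" is
# named "HP0yyy" (hot) or "CP0yyy" (cold); on a collision with an
# earlier ECP name, the numeric part is incremented by 1.

def ecp_name(acu_name, affects_user_experience, released=()):
    yyy = acu_name[3:]                      # digits after "SPC"/"SPH"
    prefix = "CP" if affects_user_experience else "HP"
    name = f"{prefix}0{yyy}"
    while name in released:                 # collision with a prior ECP
        num = int(name[2:]) + 1
        name = f"{prefix}{num:0{len(name) - 2}d}"
    return name

print(ecp_name("SPC123", affects_user_experience=False))   # HP0123
print(ecp_name("SPC123", True))                            # CP0123
print(ecp_name("SPC123", False, released={"HP0123"}))      # HP0124
```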
Principles
Patches have the following functions:
• Correct errors in the source version without interrupting services running on a device.
• Add new functions, which requires one or more existing functions in the current system to be replaced.
Patches are a type of software compatible with the Router system software. They are used to fix urgent bugs
in the Router system software.
Table 2 shows the patch status supported by a device.
None: The patch has been saved to the storage medium of the device but is not loaded to the patch area in the memory. When the patch is loaded to the patch area in the memory, its status changes to Running.
Running: The patch is loaded to the patch area and enabled permanently. If the board is reset, the patch on the board remains in the running state. A patch in the running state can be uninstalled and deleted from the patch area.
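These two patch states and their transitions can be modeled as a minimal state machine (an illustrative sketch; the operation names are assumptions, not actual commands):

```python
# Patch status sketch: None -> Running when the patch is loaded into
# the patch area; Running -> None when it is uninstalled and deleted.

TRANSITIONS = {
    ("None", "load"): "Running",       # patch loaded into the patch area
    ("Running", "uninstall"): "None",  # patch removed from the patch area
}

def next_status(status, operation):
    # Any other operation (e.g. a board reset) leaves the status unchanged.
    return TRANSITIONS.get((status, operation), status)

print(next_status("None", "load"))          # Running
print(next_status("Running", "reset"))      # Running (survives a reset)
print(next_status("Running", "uninstall"))  # None
```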
Figure 1 shows the relationships between the tasks related to patch installation.
In previous versions, after a cold patch is installed, the system instructs users to perform operations for the
patch to take effect. To facilitate patch installation, the system is configured to automatically perform the
operation that needs to be performed for an installed cold patch to take effect. Before the system performs
the operation, the system asks for your confirmation.
1. When a cold patch is released, its type and impact range are specified in the patch description.
2. After a cold patch is installed, the system determines which operation to perform based on the patch
description. For example, the system determines whether to reset a board or subcard based on the
impact range of the cold patch. Then, the system displays a message asking you to confirm whether to
perform the operation for the cold patch to take effect. The system automatically executes
corresponding operations based on users' choices.
Benefits
Patches allow you to optimize the system performance of a device with minimum or no impact on services.
Software Upgrade
If the performance of the current system software does not meet requirements, you can update the system
software package to enhance system performance.
There are two methods to obtain a system software package: remote download or local download. For
details on how to obtain a system software package, refer to the configuration guide of the corresponding
product.
Patch Upgrade
During device operation, the system software may need to be modified due to system bugs or new function
requirements. The traditional way is to upgrade the system software after powering off the device. This,
however, interrupts services and affects QoS.
Loading a patch into the system software achieves system software upgrade without interrupting services on
the device and improves QoS.
Definition
Simple Network Management Protocol (SNMP) is a network management standard widely used on TCP/IP
networks. With SNMP, a core device, such as a network management station (workstation) running network management software, manages network elements (NEs), such as Routers.
SNMP provides the following functions:
• A workstation uses GET, Get-Next, and Get-Bulk operations to obtain network resource information.
• A workstation uses a SET operation to set Management Information Base (MIB) objects.
• A management agent proactively reports traps and informs to notify the workstation of the network status, allowing network administrators to take real-time measures as needed.
Purpose
SNMP is primarily used to manage networks.
There are two types of network management methods:
• Network management issues related to software, including application management, simultaneous file
access by users, and read/write access permissions. This guide does not describe software management
in detail.
• Management of NEs that make up a network, such as workstations, servers, network interface cards
(NICs), Routers, bridges, and hubs. Many of these devices are located far from the central network site
where the network administrator is located. Ideally, a network administrator should be automatically
notified of faults anywhere on the network. Unlike users, however, Routers cannot pick up the phone
and call the network administrator when there is a fault.
To address this problem, some manufacturers produce devices with integrated network management
functions. The workstation can remotely query the device status, and the devices can use alarms to inform
the workstation of events.
Network management involves the following items:
• Agent: special software or firmware used to trace the status of managed objects
• Workstation: a core device used to communicate with agents about managed objects and to display the
status of these agents
• Network management protocol: a protocol run on the workstation and agents to exchange information
Access control: This function restricts a user's device administration rights. It gives a user the rights to manage specific objects on devices and therefore provides refined management.
Authentication and encryption: This function authenticates and encrypts packets transmitted between an NMS and a managed device, preventing data packets from being modified and improving data transmission security.
Error code: Error codes help a network administrator identify and resolve device faults. A wide range of error codes makes it easier for a network administrator to manage devices.
Trap: Traps are sent from a managed device to an NMS to notify a network administrator of device faults. A managed device does not require an acknowledgement from the NMS after it sends a trap.
Inform: Informs are sent from a managed device to an NMS to notify a network administrator of device faults. After an NMS restarts, it learns of the informs sent during the restart process.
NOTE: To ensure high security, do not use the MD5 algorithm as the SNMPv3 authentication algorithm.
Encryption modes: Data Encryption Standard-56 (DES-56), 3DES168, Advanced Encryption Standard-128 (AES128), AES192, and AES256.
NOTE: To ensure high security, do not use the DES-56 or 3DES168 algorithm as the SNMPv3 encryption algorithm.
A workstation running SNMP cannot manage NEs (managed objects) that run a network management protocol other than SNMP. In this situation, the workstation must use proxy agents for management. A proxy agent
provides functions, such as protocol transition and filtering operations. Figure 2 shows how a proxy agent
works.
• The workstation (or NMS) sends an SNMP Request message to an SNMP agent.
• The agent searches the management information base (MIB) on the managed object for the required
information and returns an SNMP Response message to the workstation.
• If the trap triggering conditions defined for a module are met, the agent for that module sends a
message to notify the workstation that an event has occurred on a managed object. This helps the
network administrator deal with network faults.
• Get-Request PDUs: Generated and transmitted by the workstation to obtain one or more parameter
values from an agent.
• Get-Next-Request PDUs: Generated and transmitted by the workstation to obtain parameter values in lexicographic order from an agent.
• Set-Request PDUs: Used to set one or more parameter values for an agent.
• Get-Response PDUs: Contains one or more parameters. Generated by an agent and transmitted in reply
to a Get-Request PDU from the workstation.
• Traps: Messages that originate with an agent and are sent to inform the workstation of network events.
Get-Request, Get-Next-Request, and Set-Request PDUs are sent by the workstation to an agent; Get-
Response PDUs and traps are sent by an agent to the workstation. When Get-Request PDUs, Get-Next-
Request PDUs, and Set-Request PDUs are generated and transmitted, naming is simplified to Get, Get-Next,
and Set for convenience. Figure 1 shows how the five types of PDUs are transmitted.
By default, an agent uses port 161 to receive Get, Get-Next, and Set messages, and the workstation uses port 162 to
receive traps.
An SNMP message consists of a common SNMP header, a Get/Set header or a trap header, and a variable binding list.
• Version
Specifies the SNMP version. In an SNMPv1 packet, the value of this field is 0.
• Community
The community is a simple text password shared by the workstation and an agent. It is a string. A
common value is the 6-character string "public".
• PDU type
There are five types of PDUs in total, as shown in Table 1.
0: get-request
1: get-next-request
2: get-response
3: set-request
4: trap
Get/Set Header
The Get or Set header contains the following fields:
• Request ID
An integer set by the workstation, it is carried in Get-Request messages sent by the workstation and in
Get-Response messages returned by an agent. The workstation can send Get messages to multiple
agents simultaneously. All Get messages are transmitted using UDP. A response to the request message
sent first may be the last to arrive. In such cases, Request IDs carried in the Get-Response messages
enable the workstation to identify the returned messages.
• Error status
An agent enters a value in this field of a Get-Response message to specify an error, as listed in Table 2.
• Error index
When a noSuchName, badValue, or readOnly error occurs, the agent sets an integer in the Response message to specify an offset value for the faulty variable in the list. By default, the offset value in Get-Request messages is 0.
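The Request ID matching described above can be sketched as follows (an illustrative model of the matching logic only, not a real SNMP implementation):

```python
# Request-ID matching sketch: Get messages travel over UDP, so
# responses may arrive in any order; the Request ID carried in each
# Get-Response lets the workstation pair it with its original request.

pending = {}  # request_id -> destination agent

def send_get(request_id, agent):
    pending[request_id] = agent   # remember the outstanding request

def on_response(request_id, value):
    agent = pending.pop(request_id, None)  # match by Request ID
    if agent is None:
        return None                        # unknown or duplicate response
    return (agent, value)

send_get(1, "agent-A")
send_get(2, "agent-B")
# Responses arrive out of order; Request IDs still pair them correctly.
print(on_response(2, "sysUpTime=42"))   # ('agent-B', 'sysUpTime=42')
print(on_response(1, "sysDescr=NE40E"))
```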
Trap Header
• Enterprise
This field is an object identifier of a network device that sends traps. The object identifier resides in the
sub-tree of the enterprise object {1.3.6.1.4.1} in the object naming tree.
To send a type 2, 3, or 5 trap, you must use the first variable in the trap's variable binding field to identify
the interface responding to the trap.
• Specific-code
If an agent sends a type 6 trap, the value in the Specific-code field specifies an event defined by the
agent. If the trap type is not 6, this field value is 0.
• Timestamp
This specifies the duration from when an agent is initialized to when the event reported by the trap occurs. The value is expressed in units of 10 ms. For example, a timestamp of 1908 means that the event occurred 19,080 ms (about 19 seconds) after the agent was initialized.
SNMPv1 has the following limitations:
• Does not provide a batch access mechanism, so access to bulk data is inefficient.
• Does not provide a communication mechanism between managers and is therefore suitable only for centralized management, not distributed management.
In 1996, the Internet Engineering Task Force (IETF) issued a series of SNMP-associated standards. These
documents defined SNMPv2c and abandoned the security standard in SNMPv2.
SNMPv2c enhances the following aspects of SNMPv1:
• Protocol control
SNMPv2c Security
SNMPv2c abandons SNMPv2 security improvements and inherits the message mechanism and community
concepts in SNMPv1.
• Strong adaptability: SNMPv3 is applicable to multiple operating systems. It can manage both simple and
complex networks.
SNMPv3 has four models: message processing and control model, local processing model, user security
model, and view-based access control model.
Unlike SNMPv1 and SNMPv2, SNMPv3 can implement access control, identity authentication, and data
encryption using the local processing model and user security model.
• Identity authentication: A process in which the agent (or workstation) confirms whether the received
message is from an authorized workstation (or agent) and whether the message is changed during
transmission. HMAC is an effective tool that is widely applied on the Internet to generate the message
authentication code using the security hash function and shared key.
• Data encryption: The workstation uses a key to encrypt the message in CBC mode and adds the ciphertext to the message, and the agent uses the same key to decrypt the message and obtain the actual information. Similar to identity authentication, encryption requires that the workstation and agent share the same key to encrypt and decrypt the message.
To improve system security, it is recommended that you configure different authentication and encryption passwords for
an SNMP user.
4.7.2.6 MIB
A Management Information Base (MIB) specifies variables (MIB object identifiers or OIDs) maintained by
NEs. These variables can be queried and set in the management process. A MIB provides a structure that
contains data on all NEs that may be managed on the network. The SNMP MIB uses a hierarchical tree
structure similar to the Domain Name System (DNS), beginning with a nameless root at the top. Figure 1
shows an object naming tree, one part of the MIB.
The three objects at the top of the object naming tree are ISO, ITU-T (formerly CCITT), and joint ISO-ITU-T. There are four objects under ISO. Of these, the number 3 identifies an organization. A
Department of Defense (DoD) sub-tree, marked dod (6), is under the identified organization (3). Under dod
(6) is internet (1). If the only objects being considered are Internet objects, you may begin drawing the sub-
tree below the Internet object (the square frames in dotted lines with shadow marks in the following
diagram), and place the identifier {1.3.6.1} next to the Internet object.
One of the objects under the Internet object is mgmt (2). The object under mgmt (2) is mib-2 (1), renamed in the new edition, MIB-II, defined in 1991. mib-2 is identified by an OID, {1.3.6.1.2.1} or {Internet(1).2.1}.
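As a sketch of how such an OID resolves through the naming tree (a minimal illustration; only the few branches named above are modeled):

```python
# MIB object naming tree sketch: each node is numbered under its
# parent, and an OID is the dot-joined path from the root.
# Only the iso(1) -> ... -> mib-2(1) path is modeled here.

TREE = {
    (): {1: "iso"},
    (1,): {3: "org"},
    (1, 3): {6: "dod"},
    (1, 3, 6): {1: "internet"},
    (1, 3, 6, 1): {2: "mgmt"},
    (1, 3, 6, 1, 2): {1: "mib-2"},
}

def resolve(oid):
    """Map a dotted OID string to the node names along the path."""
    nums = tuple(int(x) for x in oid.split("."))
    names = []
    for i, n in enumerate(nums):
        names.append(TREE.get(nums[:i], {}).get(n, str(n)))
    return ".".join(names)

print(resolve("1.3.6.1.2.1"))  # iso.org.dod.internet.mgmt.mib-2
```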
Internet Control Message Protocol (ICMP): 5: ICMP software (for collecting statistics about received ICMP messages)
External Gateway Protocol (EGP): 8: EGP software (for collecting statistics on EGP traffic)
MIB is defined independently of a network management protocol. Device manufacturers can integrate SNMP
agent software into their products (for example, Routers), but they must ensure that this software complies
with relevant standards after new MIBs are defined. You can use the same network management software
to manage Routers containing different MIB versions. However, the network management software cannot
manage a Router that does not support the MIB function.
4.7.2.7 SMI
Structure of Management Information (SMI) is a set of rules used to name and define managed objects. It
can define the ID, type, access level, and status of managed objects. At present, there are two SMI versions:
SMIv1 and SMIv2.
The following standard data types are defined in SMI:
• INTEGER
• OCTET STRING
• DisplayString
• OBJECT IDENTIFIER
• NULL
• IpAddress
• PhysAddress
• Counter
• Gauge
• TimeTicks
• SEQUENCE
• SEQUENCE OF
4.7.2.8 Trap
A managed device sends unsolicited trap messages to notify a network management system (NMS) that an
urgent and significant event has occurred on the managed device. For example, the managed device restarts.
If the trap triggering conditions defined for the agent's module are met, the agent sends a trap message to
notify the NMS that a significant event has occurred. Network administrators can promptly handle the event.
The NMS uses port 162 to receive trap messages from the agent. The trap messages are carried over the
User Datagram Protocol (UDP). After the NMS receives trap messages, it does not need to acknowledge the
messages.
With an increasing number of system features and scenarios, the current SNMP standard error code types
are inadequate. Consequently, the workstation cannot identify the scenario where the fault occurs when the
NE processes packets. As a solution, the extended error code was introduced.
When a fault occurs during packet processing, the NE returns an error code corresponding to the fault
scenario. If the fault scenario is beyond the range of the SNMP standard error code, a generic error or a
user-defined error code is returned.
The error code that is defined by users is called the extended error code.
The extended error code applies to more scenarios. Only Huawei workstations can correctly parse the fault
scenario of the current NE based on the agreement with NEs.
Extended error code can be enabled using either command lines or operations on the workstation. After
extended error code is enabled, SNMP converts the internal error codes returned from features into different
extended error codes and then sends them to the workstation based on certain rules. If the internal error
codes returned from features are standard error codes, SNMP sends them directly to the workstation.
If extended error code is disabled, standard error codes and internal error codes defined by modules are sent
directly to the workstation.
The system generates and manages extended error codes based on those registered on the modules and the
module number. The workstation parses extended error codes according to its agreement with NEs and then
displays the obtained information.
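The dispatch logic described above might look like the following (a purely illustrative sketch; the code values and combination rule are hypothetical, not Huawei's actual mapping):

```python
# Extended error code sketch: standard SNMP error codes pass through
# unchanged; internal (module-defined) codes are converted to extended
# codes only when the extended-error-code function is enabled.

STANDARD_CODES = {0, 1, 2, 3, 4, 5}  # e.g. noError..genErr (illustrative)

def to_workstation(code, module_id, extended_enabled):
    if code in STANDARD_CODES:
        return code                      # standard codes sent directly
    if extended_enabled:
        # Hypothetical rule: combine the module number and internal
        # code into an extended code the workstation can parse.
        return module_id * 1000 + code
    return code                          # internal code sent as-is

print(to_workstation(3, module_id=7, extended_enabled=True))   # 3
print(to_workstation(42, module_id=7, extended_enabled=True))  # 7042
print(to_workstation(42, module_id=7, extended_enabled=False)) # 42
```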
Background
The Simple Network Management Protocol (SNMP) communicates management information between a
network management station (NMS) and a device, such as a Router, so that the NMS can manage the
device. If the NMS and device use different SNMP versions, the NMS cannot manage the device.
To resolve this problem, configure SNMP proxy on a device between the NMS and device to be managed, as
shown in Figure 1. In the following description, the device on which SNMP proxy needs to be configured is
referred to as a middle-point device.
The NMS manages the middle-point device and managed device as an independent network element,
reducing the number of managed network elements and management costs.
• Receives SNMP packets from other SNMP entities, forwards SNMP packets to other SNMP entities, or
forwards responses to SNMP request originators.
• Enables communication between SNMP entities running SNMPv1, SNMPv2c, and SNMPv3.
An SNMP proxy can work between one or more NMSs and multiple network elements.
Principles
In Figure 2, the middle-point device allows you to manage the network access, configurations, and system
software version of the managed device. The network element management information base (MIB) files
loaded to the NMS include the MIB tables of both the middle-point device and managed device. After you
configure SNMP proxy on the middle-point device, the middle-point device automatically forwards SNMP
requests from the NMS to the managed device and forwards SNMP responses from the managed device to
the NMS.
• The process in which an NMS uses a middle-point device to query the MIB information of a managed
device is as follows:
1. The NMS sends an SNMP request that contains the MIB object ID of the managed device to the
middle-point device.
• The engine ID carried in an SNMPv3 request must be the same as the engine ID of the SNMP
agent on the managed device.
• If the SNMP request is an SNMPv1 or SNMPv2c packet, a proxy community name must be configured on the middle-point device, with the engine ID of the SNMP agent on the managed device specified. The community name carried in the SNMP request packet must match the community name configured on the managed device.
2. Upon receipt, the middle-point device searches its proxy table for a forwarding entry based on the
engine ID.
• If a matching forwarding entry exists, the middle-point device caches the request and
encapsulates the request based on forwarding rules.
• If no matching forwarding entry exists, the middle-point device drops the request.
3. The middle-point device forwards the encapsulated request to the managed device and waits for
a response.
4. After the middle-point device receives a response from the managed device, the middle-point
device forwards the response to the NMS.
If the middle-point device fails to receive a response within a specified period, the middle-point
device drops the SNMP request.
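The lookup-and-forward decision in steps 2 to 4 can be sketched as follows. This is a minimal illustration only: the proxy table contents, engine IDs, and function names are hypothetical, not the device's actual data structures.

```python
# Minimal sketch of the SNMP proxy forwarding decision (steps 2-4).
# The proxy table maps a context engine ID to a forwarding entry; all
# names and values here are illustrative, not the device's actual data.

PROXY_TABLE = {
    "800007db03001122334455": {"target": "10.1.1.2", "community": "proxy-ro"},
}

def forward_request(engine_id: str, request: bytes):
    """Return (target, encapsulated request), or None if the request is dropped."""
    entry = PROXY_TABLE.get(engine_id)
    if entry is None:
        return None  # no matching forwarding entry: drop the request
    # Cache the original request and re-encapsulate it per the entry's rules.
    encapsulated = b"community=" + entry["community"].encode() + b";" + request
    return entry["target"], encapsulated

# A request whose engine ID matches a forwarding entry is forwarded.
print(forward_request("800007db03001122334455", b"GET ifDescr"))
# A request with an unknown engine ID is dropped.
print(forward_request("ffffffffffffffff", b"GET ifDescr"))
```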
• The process in which a managed device uses a middle-point device to send a notification to an NMS is
as follows:
1. The managed device generates a notification due to causes such as overheating and sends the
notification to the middle-point device.
2. Upon receipt, the middle-point device searches its proxy table for a forwarding entry based on the
engine ID.
• If a matching forwarding entry exists, the middle-point device encapsulates the notification
based on forwarding rules.
• If no matching forwarding entry exists, the middle-point device drops the notification.
Background
AAA is an authentication, authorization, and accounting technique. AAA local users can be configured to log
in to a device through FTP, Telnet, or SSH. However, SNMPv3 supports only SNMP users, which can be an
inconvenience in unified network device management.
To resolve this issue, configure SNMP to support AAA users. AAA users can then access the NMS, and MIB
node operation authorization can be performed based on tasks. The NMS does not distinguish AAA users
and SNMP users.
Figure 1 shows the process of an AAA user logging in to the NMS through SNMP.
Principles
Figure 2 shows the principles of SNMP's support for AAA users.
3. SNMP synchronizes the AAA user data and updates the SNMP user list. Configure a mode to
authenticate the AAA user and a mode to encrypt the AAA user's data.
The AAA user's authentication and encryption modes are SNMP. An authentication password is not
used.
After the preceding operations are performed, the AAA user can log in to the NMS in the same way as an
SNMP user.
To improve system security, it is recommended that you configure different authentication and encryption passwords for
an SNMP local user.
• Add a user to a user group and associate a user group with a task group.
You can configure the read, write, and execute permissions for a specific task to control MIB node operations
that an AAA user is allowed to perform. As shown in Figure 3:
If the read permission is assigned in task 1, user 1 is allowed to read MIB nodes 1 and 2.
Figure 1 Networking diagram for monitoring an outdoor cabinet using SNMP proxy
The SNMP proxy is deployed on the main device. The NMS manages each cabinet as a virtual unit that
consists of the main device and monitoring device. This significantly reduces the number of NEs managed by
the NMS, lowering network management costs, facilitating real-time device performance monitoring, and
improving service quality.
Definition
The Network Configuration Protocol (NETCONF) is an Extensible Markup Language (XML)-based network
configuration and management protocol. NETCONF uses a simple remote procedure call (RPC) mechanism
to implement communication between a client and a server.
NETCONF provides a method for a network management system (NMS) to remotely manage and monitor
devices.
Purpose
As networks grow in scale and complexity, the Simple Network Management Protocol (SNMP) can no longer
meet carriers' network management requirements, especially configuration management requirements.
XML-based NETCONF was developed to meet the demands.
Table 1 lists the differences between SNMP and NETCONF.
Table 1 Differences between SNMP and NETCONF

Configuration management
SNMP: SNMP does not provide a lock mechanism to prevent the operations performed by multiple users from conflicting with each other.
NETCONF: NETCONF provides a lock mechanism to prevent the operations performed by multiple users from conflicting with each other.

Query
SNMP: SNMP requires multiple interaction processes to query one or more records in a database table.
NETCONF: NETCONF can directly query system configuration data and supports data filtering.

Extensibility
SNMP: Poor.
NETCONF: Good. NETCONF is defined based on multiple layers that are independent of one another. When one layer is expanded, its upper layers are least affected. XML encoding helps expand NETCONF's management capabilities and compatibility.

Security
SNMP: The Internet Architecture Board (IAB) released SNMPv2 (enhanced SNMP) in 1996, which still has poor security. SNMPv3, released in 2002, provides important security improvements over the previous two versions but is inextensible.
NETCONF: NETCONF uses existing security protocols to ensure network security and is not specific to any security protocols. NETCONF is more flexible than SNMP in ensuring security.
NOTE:
NETCONF prefers Secure Shell (SSH) at the transport layer and uses SSH to transmit XML information.
Benefits
NETCONF offers the following benefits:
• Facilitates configuration data management and interoperability between different vendors' devices
using XML encoding to define messages and the RPC mechanism to modify configuration data.
• Improves the efficiency of system software upgrade performed using a configuration tool.
• Provides high extensibility, allowing different vendors to define additional NETCONF operations.
Layer 1: Secure Transport Protocol (BEEP, Secure Shell (SSH), and Secure Sockets Layer (SSL))
The transport layer provides a communication path for interaction between a NETCONF client and the server.
NETCONF can be carried on any transport protocol that meets all of the following requirements:
The transport protocol is connection-oriented. A permanent link is established between the NETCONF client and server. After the permanent link is established, data is transmitted reliably and sequentially.
The transport layer provides user authentication, data integrity, and security encryption for NETCONF.
The transport protocol provides a mechanism to distinguish the session type (client or server) for NETCONF.
NOTE:
Currently, the device only supports SSH as the transport layer protocol of NETCONF.

Layer 2: RPC (<rpc> and <rpc-reply>)
The RPC layer provides a simple RPC request and response mechanism independent of transport protocols. The client uses the <rpc> element to encapsulate RPC request information and sends the RPC request information to the server through a secure and connection-oriented session. The server uses the <rpc-reply> element to encapsulate RPC response information (content at the operation and content layers) and sends the RPC response information to the client.
In normal cases, the <rpc-reply> element encapsulates data required by the client or information about a configuration success. If the client sends an incorrect request or the server fails to process a request from the client, the server encapsulates the <rpc-error> element containing detailed error information in the <rpc-reply> element and sends the <rpc-reply> element to the client.

Layer 3: Operations (<get-config>, <edit-config>, and <notification>)
The operation layer defines a series of basic operations used in RPC. These basic operations constitute the basic capabilities of NETCONF.

Layer 4: Content (Configuration data)
The content layer describes configuration data involved in network management. The configuration data depends on vendors' devices.
So far, only the content layer has not been standardized for NETCONF. The content layer has no standard NETCONF data modeling language or data model.
• NETCONF Manager
A NETCONF manager resides on an NMS server and functions as a client that uses NETCONF to
manage devices. It sends <rpc> elements to a NETCONF agent to query or modify configuration data,
and learns the status of a managed device based on the alarms and events actively reported by the
NETCONF agent.
• NETCONF Agent
A NETCONF agent resides on a managed device and functions as a server that maintains the
configuration data on the managed device, responds to the <rpc> elements sent by a NETCONF
manager, and sends the requested information to the NETCONF manager.
A NETCONF session is a logical connection between a NETCONF manager and agent. A network device must
support at least one NETCONF session.
The NETCONF manager obtains configuration data and status data from the running NETCONF agent and
operates the configuration data to migrate the NETCONF agent status to the expected status. NETCONF
deals with configuration data operations performed by the NETCONF manager and is not involved with how
configuration data is stored.
• Configuration data: a set of writable data that is required to transform a device from its initial default
state into its current state
• State data: the additional non-configuration data on a device, such as read-only status information and
collected statistics
Related Concepts
The NETCONF client and server communicate through the RPC mechanism. To implement the
communication, a secure and connection-oriented session must be established. The client sends an RPC
request to the server. After processing the request, the server sends a response to the client. The RPC request
of the client and the response message of the server are encoded in XML format.
NETCONF defines the syntax and semantics of capabilities. The protocol allows the client and server to
notify each other of supported capabilities. The client can send the operation requests only within the
capability range supported by the server.
• XML encoding
XML, the encoding format used by NETCONF, uses a text file to represent complex hierarchical data.
NETCONF allows a user to use a traditional text compilation tool or XML-specific compilation tool to
read, save, and operate configuration data.
XML-based network management uses XML to describe managed data and management operations so
that management information becomes a database comprehensible to computers. XML-based network
management helps computers efficiently process network management data, improving network
management capabilities.
■ version: indicates the XML version. "1.0" indicates that the XML 1.0 standard is used.
■ encoding: indicates the character set encoding format. Only UTF-8 encoding is supported.
• RPC mode
NETCONF uses the RPC mechanism and XML-encoded <rpc> and <rpc-reply> elements to provide a
framework of request and response messages independent of transport layer protocols. Table 1 lists
some basic RPC elements.
Table 1 Elements
Element Description
<rpc-reply> Encapsulates a response message for an <rpc> request message. The server returns
a response message, which is encapsulated in the <rpc-reply>element, for each
<rpc> request message.
<rpc-error> Notifies a client of an error that occurs during <rpc> request processing. The server
encapsulates the <rpc-error> element in the <rpc-reply> element and sends the
<rpc-reply> element to the client.
<ok> Notifies a client that no errors occur during <rpc> request processing. The server
encapsulates the <ok> element in the <rpc-reply> element and sends the <rpc-
reply> element to the client.
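As a concrete sketch of this request/response pairing, the snippet below builds an <rpc> request and the server-side <rpc-reply> that echoes its message-id and carries an <ok/> element. It uses Python's standard XML library purely for illustration; the element names follow the NETCONF base namespace, while the helper function names are hypothetical.

```python
# Sketch of the <rpc>/<rpc-reply> pairing: the server reuses the client's
# message-id and wraps the result (an <ok/> element here) in <rpc-reply>.
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:netconf:base:1.0"

def make_rpc(message_id: str, operation: str) -> str:
    """Build a client <rpc> element carrying a single operation element."""
    rpc = ET.Element(f"{{{NS}}}rpc", {"message-id": message_id})
    ET.SubElement(rpc, f"{{{NS}}}{operation}")
    return ET.tostring(rpc, encoding="unicode")

def make_ok_reply(rpc_xml: str) -> str:
    """Build the server's <rpc-reply>, echoing the request's message-id."""
    request = ET.fromstring(rpc_xml)
    reply = ET.Element(f"{{{NS}}}rpc-reply",
                       {"message-id": request.get("message-id")})
    ET.SubElement(reply, f"{{{NS}}}ok")
    return ET.tostring(reply, encoding="unicode")

print(make_ok_reply(make_rpc("101", "get-config")))
```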
• Capability set
A capability set includes basic and extended functions implemented based on NETCONF. A device can
add protocol operations through the capability set to extend the operation scope of existing
configuration objects.
Each capability is identified by a unique uniform resource identifier (URI). The URI format of the
capability set defined by NETCONF is as follows:
urn:ietf:params:xml:ns:netconf:capability:{name}:{version}
In addition to the capability set defined by NETCONF, a vendor can define additional capability sets to
extend management functions. A module that supports the YANG model needs to add YANG
notifications to Hello messages before sending the messages. The message format is as follows:
<capability>http://www.huawei.com/netconf/vrp/huawei-ifm?module=huawei-ifm&revision=2013-01-
01</capability>
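The module and revision parameters in such a capability URI can be pulled apart with ordinary query-string parsing. A minimal sketch, assuming single-valued parameters (the parse_capability helper is illustrative):

```python
# Split a capability URI into its base URI and its module/revision
# parameters, as advertised in the <hello> capability list.
from urllib.parse import parse_qs

def parse_capability(uri: str):
    """Return (base URI, {parameter: value}) for a capability URI."""
    base, _, query = uri.partition("?")
    params = {k: v[0] for k, v in parse_qs(query).items()}
    return base, params

cap = "http://www.huawei.com/netconf/vrp/huawei-ifm?module=huawei-ifm&revision=2013-01-01"
base, params = parse_capability(cap)
print(base)
print(params["module"], params["revision"])
```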
• Configuration Database
A configuration database is a collection of complete configuration parameters for a device. Table 2
describes NETCONF-defined configuration databases.
Configuration Description
Database
<running/> It stores the effective configuration running on a device, and the device's status
information and statistics.
Unless the NETCONF server supports the candidate capability, this configuration
database is the only standard database that is mandatory.
To support modification of the <running/> configuration database, the device must
have the writable-running capability.
NOTE:
The <candidate/> configuration databases supported by Huawei devices do not allow inter-
session data sharing. Therefore, the configuration of the <candidate/> configuration database
does not require additional locking operations.
<startup/> It stores the configuration data loaded during device startup, which is similar to the
saved configuration file.
To support the <startup/> configuration database, the current device must have the
Distinct Startup capability.
• Message: The message layer provides a simple and independent transmission frame mechanism for RPC
messages. The client encapsulates an RPC request into an <rpc> element. The server encapsulates the
request processing result in the <rpc-reply> element and responds to the client.
• Operations: The operations layer defines a set of basic NETCONF operations, and the operations are
invoked by RPC methods that are based on XML encoding parameters.
• Content: The content (managed object) layer defines a configuration data model. Currently, mainstream
configuration data models include the schema and YANG models.
• message-id: indicates the information code. The value is specified by the client that initiates the RPC
request. After receiving the RPC request message, the server saves the message-id attribute, which is
used when the <rpc-reply> message is generated.
■ base1.0: indicates that the <running/> configuration database is supported. Basic operations, such
as <get-config>, <get>, <edit-config>, <copy-config>, <delete-config>, <lock>, <unlock>, <close-
session>, and <kill-session>, are defined. You can set the <error-option> parameter to stop-on-
error, continue-on-error, or rollback-on-error.
■ base1.1: base1.1 is an upgrade of base1.0, with the following changes:
■ The chunked framing mechanism is added to resolve the security issues in the end-of-message
(EOM) mechanism.
If you want to perform an operation in base1.1, the client must support base1.1 so that this
capability can be advertised during capability set exchanges.
• <error-option>: indicates the mode for processing subsequent operations if an error occurs when <edit-
config> is set. The options are as follows:
■ continue-on-error: records the error information and continues the execution if an error occurs.
The NETCONF server returns to the client an <rpc-reply> message indicating an operation failure if
an error occurs.
■ rollback-on-error: stops the operation if an error occurs and rolls back the configuration to the
state before the <edit-config> operation is performed. This operation is supported only when the
device supports the <rollback-on-error> capability.
• <config>: indicates a group of hierarchical configuration items defined in the data model. The
configuration items must be placed in the specified namespace and meet the constraints of that data
model, as defined by its capability set.
The XML messages sent by a client to a server must be concluded with the end character ]]>]]>. Otherwise, the
server fails to identify the XML messages and does not respond to them. By default, the end character is
automatically added to XML messages sent by a device. In the following example, the end character is not added,
which facilitates XML format identification. In practice, the end character must be added.
If the capability set in the <hello> elements contains base1.1, the RPC messages in YANG model support the chunk
format. Messages in chunk format can be fragmented. The end character is \n##\n.
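The two framing mechanisms can be sketched as follows. This is a minimal illustration (the frame/unframe helper names are hypothetical), and the chunked decoder handles only the single-chunk case.

```python
# Sketch of the two framings: the base1.0 end-of-message delimiter
# ]]>]]> and the base1.1 chunked framing (\n#<length>\n<data> ... \n##\n).

EOM = "]]>]]>"

def frame_eom(msg: str) -> str:
    """base1.0: append the end-of-message delimiter."""
    return msg + EOM

def frame_chunked(msg: str) -> str:
    """base1.1: prefix the byte length, terminate with the end marker."""
    data = msg.encode("utf-8")
    return f"\n#{len(data)}\n" + msg + "\n##\n"

def unframe_chunked(framed: str) -> str:
    """Minimal decoder for a single chunk followed by the end marker."""
    header, _, rest = framed.lstrip("\n").partition("\n")
    length = int(header[1:])  # the header looks like "#123"
    return rest.encode("utf-8")[:length].decode("utf-8")

msg = '<rpc message-id="101"/>'
assert unframe_chunked(frame_chunked(msg)) == msg
print(frame_eom(msg))
```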
Response messages:
• For a successful response, an <rpc-reply> message carrying the <ok> element is returned.
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok />
</rpc-reply>
• For a failed response, an <rpc-reply> message carrying the <rpc-error> element is returned.
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="3">
<rpc-error>
<error-type>application</error-type>
<error-tag>bad-element</error-tag>
<error-severity>error</error-severity>
<error-app-tag>43</error-app-tag>
<error-path xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0"
xmlns:acl="http://www.huawei.com/netconf/vrp/huawei-acl">/nc:rpc/nc:edit-
config/nc:config/acl:acl/acl:aclGroups/acl:aclGroup[acl:aclNumOrName="2999"]/acl:aclRuleBas4s/acl:aclRuleBas4[acl:aclRuleName="r
2"]/acl:vrfAny</error-path>
<error-message xml:lang="en">vrfAny has invalid value a.</error-message>
<error-info>
<bad-element>vrfAny</bad-element>
</error-info>
</rpc-error>
</rpc-reply>
■ <error-type>: defines the protocol layer of an error. The layer can be the transport, RPC, protocol,
or application layer.
■ <error-severity>: indicates the severity of an error. The value can be error or warning.
■ <error-app-tag>: indicates a specific error type. This element does not appear if the correct <error-
tag> is not associated with the error type.
■ <error-path>: indicates the location where an error occurs and the file name.
■ <error-info>: contains the error content specific to a protocol or data model. This element does not
appear if no error-specific information is provided.
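A client-side sketch of extracting these error fields from such a reply, using the standard library parser (the collect_errors helper is illustrative, and the sample reply is abridged from the example above):

```python
# Extract (error-tag, error-message) pairs from an <rpc-reply> that
# carries <rpc-error> elements; an empty list indicates success.
import xml.etree.ElementTree as ET

NC = "{urn:ietf:params:xml:ns:netconf:base:1.0}"

REPLY = """<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="3">
  <rpc-error>
    <error-type>application</error-type>
    <error-tag>bad-element</error-tag>
    <error-severity>error</error-severity>
    <error-message xml:lang="en">vrfAny has invalid value a.</error-message>
  </rpc-error>
</rpc-reply>"""

def collect_errors(reply_xml: str):
    """Return a list of (error-tag, error-message) pairs, empty on success."""
    root = ET.fromstring(reply_xml)
    return [(e.findtext(f"{NC}error-tag"), e.findtext(f"{NC}error-message"))
            for e in root.findall(f"{NC}rpc-error")]

print(collect_errors(REPLY))
```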
4.8.2.4.1 HUAWEI-NACM
Overview
HUAWEI-NACM authorization includes:
• Data node access control: allows users to query and modify specific data nodes,
such as /ifm/interfaces/interface/ifAdminStatus and /devm/globalPara/maxChassisNum.
The access control rules for NETCONF operations and data nodes can be configured.
Principles
The HUAWEI-NACM mechanism is similar to the task authentication mechanism in command
authentication. HUAWEI-NACM is designed based on the NETCONF access control model.
Authentication, authorization and accounting (AAA) defines tasks, task groups, and user groups. The task
authentication mechanism uses a three-layer access control model. This model organizes commands into
tasks, tasks into task groups, and task groups into user groups.
The HUAWEI-NACM mechanism is based on the task authentication mechanism. The HUAWEI-NACM
mechanism subscribes to required information from the task authentication mechanism and stores the
obtained information in its local data structures.
NETCONF operations are implemented based on NETCONF sessions established using Secure Shell (SSH).
NETCONF authorization applies only to SSH users.
• The operation permissions of a user are defined by the user group to which the user belongs. All users
in a user group have the same permissions.
A user's rights cannot be greater than those of the user group.
Figure 1 shows the task authentication diagram, and Figure 2 shows the HUAWEI-NACM diagram. The
HUAWEI-NACM mechanism adds rules for NETCONF operation and data node access control based on the
task authentication mechanism.
Benefits
HUAWEI-NACM is a mechanism to restrict access for particular users to a pre-configured subset of all
available NETCONF protocol operations and content.
4.8.2.4.2 IETF-NACM
Overview
The IETF NETCONF Access Control Model (IETF-NACM) provides simple and easy-to-configure database
access control rules. It helps flexibly manage a specific user's permissions to perform NETCONF operations
and access NETCONF resources.
The YANG model defines IETF-NACM in the ietf-netconf-acm.yang file.
• Data node authorization: authorizes users to query and modify specific data nodes.
• Notification authorization: authorizes a system to report specified alarms or events through the
notification mechanism.
• Action authorization: authorizes users to define operations for data nodes through "action" statements.
• Emergency session recovery: authorizes users to directly initialize or repair the IETF-NACM
authentication configuration without the restriction of access control rules.
Emergency session recovery is a process in which a management-level user or a user in the manage-ug
group bypasses the access control rule and initializes or repairs the IETF-NACM authentication
configuration.
Management-level users are at Level 3 or 15.
By default, IETF-NACM authentication is disabled and the HUAWEI-NACM authentication process is used. If IETF-NACM authentication is enabled, the IETF-NACM authentication process is used instead.
If IETF-NACM authentication is enabled, the access permission on get/ietf-yang-library must be granted during session establishment. Otherwise, session establishment fails due to insufficient permissions.
• Read: allows a client to read a data node from a database or receive notification events.
Authentication is performed only for the delivered operations but not for all the changed nodes in the model tree. For
example, when a delete operation is performed for a parent node, this operation automatically applies to its child nodes
without authentication. Therefore, the data of both the parent node and its child nodes is deleted in this case.
Components of IETF-NACM
Table 1 describes the components and functions of IETF-NACM.
Component Description
User
A user defined in the NACM view. The user must be an SSH user. IETF-NACM authenticates users only. User authentication is implemented in the AAA view.

Group
A group defined in the NACM view. This group, instead of a user, performs protocol operations in a NETCONF session. The group identifier is a group name, which is unique on the NETCONF server. Different groups can contain the same user.
modified.
exec-default: sets the default execution permission for RPC operations. If the value is set to permit, NETCONF operations can be performed. If the permit setting is removed (undo permit), NETCONF operations cannot be performed.
Implementation Principles
After a NETCONF session is established and a user passes the authentication, the NETCONF server controls
access permissions based on the user name, group name, and NACM authentication rule list. Authentication
rules are associated with users through the user group. The administrator of a user group can manage the
permissions of users in the group.
• An IETF-NACM user is associated with an IETF-NACM user group. After IETF-NACM users are added to a
user group, the users in the same user group have the same permissions.
An IETF-NACM authentication rule list is a set of rules. Various authentication rules can be combined and added to an IETF-NACM authentication rule list. Users associated with the list can use the rules in it.
When the user groups and authentication rule lists are traversed, if no user name matching the one carried in the request is found, or no rule matching the requested operation is found, the resulting behavior depends on the content being authenticated. For details, see Table 2.
Protocol operation
If the RPC operation defined in the YANG file contains the nacm:default-deny-all statement, the RPC request is rejected.
If the requested operation is <kill-session> or <delete-config>, the RPC request is rejected.
If the user has the default execution permission for the RPC operation, the RPC request can be executed. Otherwise, the RPC request is rejected.

Data node
If the definition of the data node contains the nacm:default-deny-all statement, the data node supports neither read nor write operations.
If the definition of the data node contains the nacm:default-deny-write statement, the data node does not support write operations.
If the user has the query permission, the read operation is allowed. Otherwise, the read operation is rejected.
If the user has the configuration permission, the write operation is allowed. Otherwise, the write operation is rejected.
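These default decisions can be sketched as plain predicate functions. The flags and parameter names below are illustrative stand-ins for the YANG statements and configured defaults, not an actual device API.

```python
# Sketch of the default decisions applied when no user or rule match is
# found during traversal of the user groups and rule lists.

def default_rpc_decision(op: str, default_deny_all: bool,
                         exec_default_permit: bool) -> bool:
    """Return True if the unmatched RPC request may still be executed."""
    if default_deny_all:                   # nacm:default-deny-all in the YANG file
        return False
    if op in ("kill-session", "delete-config"):
        return False                       # always rejected by default
    return exec_default_permit             # fall back to exec-default

def default_node_decision(access: str, default_deny_all: bool,
                          default_deny_write: bool,
                          has_permission: bool) -> bool:
    """Return True if the unmatched read/write on a data node is allowed."""
    if default_deny_all:                   # node supports neither read nor write
        return False
    if access == "write" and default_deny_write:
        return False                       # node does not support write
    return has_permission                  # query or configuration permission

print(default_rpc_decision("get-config", False, True))
print(default_node_decision("write", False, True, True))
```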
A NETCONF server can send a <hello> element to advertise the capabilities that it supports.
When a Huawei device interconnects to a non-Huawei device:
• If the capabilities contained in a <hello> element sent from the peer are all standard capabilities, the Huawei
device replies with a YANG packet.
• If the capabilities contained in a <hello> element sent from the peer are all standard capabilities and the peer
expects a schema packet, the schema 1.0 capability set can be added in the <hello> element.
<capability>http://www.huawei.com/netconf/capability/schema/1.0</capability>
• If a <hello> element sent from the peer contains extended capabilities, the Huawei device replies with a schema
packet.
After a NETCONF server exchanges <hello> elements with a NETCONF client, the server waits for <rpc>
elements from the client. The server returns an <rpc-reply> element in response to each <rpc> element.
Figure 1 shows the process.
Figure 1 Capabilities exchange interaction between the NETCONF server and client
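The practical effect of the exchange is that a session may only use capabilities advertised by both sides, such as the base1.1 framing mentioned above. A minimal sketch of picking the common base version (the negotiated_base helper is illustrative):

```python
# Sketch of capability negotiation: each peer advertises capability URIs
# in <hello>; the session uses the highest base version both support.

BASE10 = "urn:ietf:params:netconf:base:1.0"
BASE11 = "urn:ietf:params:netconf:base:1.1"

def negotiated_base(server_caps, client_caps):
    """Pick the highest base version advertised by both peers."""
    common = set(server_caps) & set(client_caps)
    for base in (BASE11, BASE10):
        if base in common:
            return base
    raise ValueError("no common base capability")

server = {BASE10, BASE11, "urn:ietf:params:netconf:capability:startup:1.0"}
print(negotiated_base(server, {BASE10}))          # client supports only base1.0
print(negotiated_base(server, {BASE10, BASE11}))  # both sides support base1.1
```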
• Example of a <hello> element sent by the NETCONF server and YANG model
<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:base:1.0</capability>
<capability>urn:ietf:params:netconf:base:1.1</capability>
<capability>urn:ietf:params:netconf:capability:schema-sets:1.0?list=huawei-yang@2.0.0</capability>
<capability>urn:ietf:params:netconf:capability:writable-running:1.0</capability>
<capability>urn:ietf:params:netconf:capability:candidate:1.0</capability>
<capability>urn:ietf:params:netconf:capability:confirmed-commit:1.0</capability>
<capability>urn:ietf:params:netconf:capability:confirmed-commit:1.1</capability>
<capability>urn:ietf:params:netconf:capability:with-defaults:1.0?basic-mode=report-all&also-supported=report-
all-tagged,trim</capability>
<capability>http://www.huawei.com/netconf/capability/discard-commit/1.0</capability>
<capability>urn:ietf:params:netconf:capability:xpath:1.0</capability>
<capability>urn:ietf:params:netconf:capability:startup:1.0</capability>
<capability>urn:ietf:params:netconf:capability:rollback-on-error:1.0</capability>
<capability>http://www.huawei.com/netconf/capability/sync/1.3</capability>
<capability>http://www.huawei.com/netconf/capability/sync/1.2</capability>
<capability>http://www.huawei.com/netconf/capability/sync/1.1</capability>
<capability>http://www.huawei.com/netconf/capability/sync/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/exchange/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/exchange/1.2</capability>
<capability>http://www.huawei.com/netconf/capability/sync-config/1.1</capability>
<capability>http://www.huawei.com/netconf/capability/sync-config/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/active/1.0</capability>
<capability>urn:ietf:params:netconf:capability:validate:1.0</capability>
<capability>urn:ietf:params:netconf:capability:validate:1.1</capability>
<capability>http://www.huawei.com/netconf/capability/action/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/execute-cli/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/update/1.0</capability>
<capability>http://www.huawei.com/netconf/capability/commit-description/1.0</capability>
<capability>urn:ietf:params:netconf:capability:url:1.0?scheme=file,ftp,sftp</capability>
<capability>http://www.huawei.com/netconf/capability/schema/1.0</capability>
<capability>urn:ietf:params:netconf:capability:notification:1.0</capability>
<capability>urn:ietf:params:netconf:capability:interleave:1.0</capability>
<capability>urn:ietf:params:netconf:capability:notification:2.0</capability>
<capability>urn:ietf:params:netconf:capability:yang-library:1.0?revision=2016-06-21&module-set-
id=3520578387</capability>
<capability>urn:huawei:yang:huawei-acl?module=huawei-acl&revision=2019-04-23</capability>
<capability>urn:huawei:yang:huawei-acl-ucl?module=huawei-acl-ucl&revision=2019-04-23</capability>
<capability>urn:huawei:yang:huawei-bfd?module=huawei-bfd&revision=2019-03-27</capability>
<capability>urn:huawei:yang:huawei-bras-basic-access?module=huawei-bras-basic-access&revision=2019-04-
23</capability>
<capability>urn:huawei:yang:huawei-bras-chasten?module=huawei-bras-chasten&revision=2019-04-
29</capability>
<capability>urn:huawei:yang:huawei-bras-vas?module=huawei-bras-vas&revision=2019-04-23</capability>
<capability>urn:huawei:yang:huawei-cfg?module=huawei-cfg&revision=2019-04-29</capability>
<capability>urn:huawei:yang:huawei-cli?module=huawei-cli&revision=2019-05-01</capability>
<capability>urn:huawei:yang:huawei-debug?module=huawei-debug&revision=2019-04-10</capability>
<capability>urn:huawei:yang:huawei-dgntl?module=huawei-dgntl&revision=2019-04-09</capability>
<capability>urn:huawei:yang:huawei-dhcp?module=huawei-dhcp&revision=2019-04-29</capability>
<capability>urn:huawei:yang:huawei-dhcpv6?module=huawei-dhcpv6&revision=2019-04-29</capability>
<capability>urn:huawei:yang:huawei-dns?module=huawei-dns&revision=2019-04-01</capability>
<capability>urn:huawei:yang:huawei-ecc?module=huawei-ecc&revision=2019-05-01</capability>
<capability>urn:huawei:yang:huawei-ethernet?module=huawei-ethernet&revision=2019-04-23</capability>
<capability>urn:huawei:yang:huawei-etrunk?module=huawei-etrunk&revision=2019-04-29</capability>
<capability>urn:huawei:yang:huawei-extension?module=huawei-extension&revision=2019-05-07</capability>
<capability>urn:huawei:yang:huawei-hwtacacs?module=huawei-hwtacacs&revision=2019-04-23</capability>
<capability>urn:huawei:yang:huawei-ietf-netconf-ext?module=huawei-ietf-netconf-ext&revision=2017-12-
23</capability>
<capability>urn:huawei:yang:huawei-if-ip?module=huawei-if-ip&revision=2019-01-01</capability>
<capability>urn:huawei:yang:huawei-l2vpn?module=huawei-l2vpn&revision=2019-04-04</capability>
<capability>urn:huawei:yang:huawei-l3-multicast?module=huawei-l3-multicast&revision=2019-03-
30</capability>
<capability>urn:huawei:yang:huawei-l3vpn?module=huawei-l3vpn&revision=2019-04-27</capability>
<capability>urn:huawei:yang:huawei-lacp?module=huawei-lacp&revision=2019-04-23</capability>
<capability>urn:huawei:yang:huawei-lldp?module=huawei-lldp&revision=2019-04-23</capability>
<capability>urn:huawei:yang:huawei-mac?module=huawei-mac&revision=2019-04-23</capability>
<capability>urn:huawei:yang:huawei-mac-flapping?module=huawei-mac-flapping&revision=2019-04-
23</capability>
<capability>urn:huawei:yang:huawei-mpls-ldp?module=huawei-mpls-ldp&revision=2019-03-27</capability>
<capability>urn:huawei:yang:huawei-multicast?module=huawei-multicast&revision=2019-03-30</capability>
<capability>urn:huawei:yang:huawei-multicast-bas?module=huawei-multicast-bas&revision=2019-03-
30</capability>
<capability>urn:huawei:yang:huawei-netconf-sync?module=huawei-netconf-sync&revision=2018-08-
30</capability>
<capability>urn:huawei:yang:huawei-network-instance?module=huawei-network-instance&revision=2019-04-
27</capability>
<capability>urn:huawei:yang:huawei-pp4?module=huawei-pp4&revision=2019-04-10</capability>
<capability>urn:huawei:yang:huawei-pp6?module=huawei-pp6&revision=2019-04-01</capability>
<capability>urn:huawei:yang:huawei-pub-type?module=huawei-pub-type&revision=2019-04-27</capability>
<capability>urn:huawei:yang:huawei-radius?module=huawei-radius&revision=2019-04-02</capability>
<capability>urn:huawei:yang:huawei-routing?module=huawei-routing&revision=2019-01-01</capability>
<capability>urn:huawei:yang:huawei-routing-policy?module=huawei-routing-policy&revision=2019-04-27</capability>
<capability>urn:huawei:yang:huawei-sshc?module=huawei-sshc&revision=2019-05-01</capability>
<capability>urn:huawei:yang:huawei-sshs?module=huawei-sshs&revision=2019-05-01</capability>
<capability>urn:huawei:yang:huawei-syslog?module=huawei-syslog&revision=2019-01-01</capability>
<capability>urn:huawei:yang:huawei-system?module=huawei-system&revision=2018-11-23</capability>
<capability>urn:huawei:yang:huawei-telnets?module=huawei-telnets&revision=2019-05-01</capability>
<capability>urn:huawei:yang:huawei-tm?module=huawei-tm&revision=2019-04-10</capability>
<capability>urn:huawei:yang:huawei-vlan?module=huawei-vlan&revision=2019-04-29</capability>
<capability>urn:huawei:yang:huawei-vrrp?module=huawei-vrrp&revision=2019-03-27</capability>
<capability>urn:huawei:yang:huawei-vty?module=huawei-vty&revision=2019-05-01</capability>
<capability>urn:ietf:params:xml:ns:netconf:base:1.0?module=ietf-netconf&revision=2011-06-01&features=writable-running,candidate,confirmed-commit,rollback-on-error,validate,startup,xpath,url</capability>
<capability>urn:ietf:params:xml:ns:netconf:notification:1.0?module=notifications&revision=2008-07-14</capability>
<capability>urn:ietf:params:xml:ns:netmod:notification?module=nc-notifications&revision=2008-07-14</capability>
<capability>urn:ietf:params:xml:ns:yang:ietf-inet-types?module=ietf-inet-types&revision=2013-07-15</capability>
<capability>urn:ietf:params:xml:ns:yang:ietf-netconf-with-defaults?module=ietf-netconf-with-defaults&revision=2011-06-01</capability>
<capability>urn:ietf:params:xml:ns:yang:ietf-yang-library?module=ietf-yang-library&revision=2016-06-21</capability>
<capability>urn:ietf:params:xml:ns:yang:ietf-yang-types?module=ietf-yang-types&revision=2013-07-15</capability>
</capabilities>
<session-id>129</session-id>
</hello>
Overview
Subtree filtering allows an application to include particular XML subtrees in the <rpc-reply> elements for a
<get> or <get-config> operation.
Subtree filtering provides a small set of filters for inclusion, simple content exact-match, and selection. The
NETCONF agent does not need to use any data-model-specific semantics during processing, allowing for
simple and centralized implementation policies.
Subtree filter components:
• Namespace selection: If namespaces are used, the filter output includes only elements from the specified namespace.
• Containment node: A containment node is a node that contains child elements within a subtree filter. For each containment node specified in a subtree filter, all data model instances that exactly match the specified namespaces and element hierarchy are included in the filter output.
• Content match node: A content match node is a leaf node that contains simple content within a subtree filter. It is used to select some or all of its relevant nodes for filter output and represents an exact-match filter on the leaf node element content.
• Selection node: A selection node is an empty leaf node within a subtree filter. It represents an explicit selection filter on the underlying data model. The presence of any selection node within a set of sibling nodes causes the filter to select the specified subtrees and suppress automatic selection of the entire sibling node set in the underlying data model.
• Namespace selection
If the XML namespace associated with a specific node in the <filter> element is the same as that in the
underlying data model, the namespace is matched.
<filter type="subtree">
<top xmlns="http://example.com/schema/1.2/config"/>
</filter>
In this example, the <top> element is a selection node. If the node's namespace matches http://example.com/schema/1.2/config, the node and its child nodes are included in the filter output.
• Containment node
The child element of a containment node can be a node of any type, including another containment
node. For each containment node specified in the subtree filter, all data model instances that
completely match the specified namespace and element hierarchy, and any attribute matching
expression are included in the output result.
<filter type="subtree">
<top xmlns="http://example.com/schema/1.2/config">
<users>
<user>
<name>fred</name>
</user>
</users>
</top>
</filter>
In this example, both the <users> and <user> nodes are containment nodes, and the <name> node is a content match node. Because no sibling nodes of the <name> node are specified, only <user> entries that comply with the namespace http://example.com/schema/1.2/config, whose element hierarchy matches the name element, and whose value is fred are included in the filter output. All sibling nodes of the <name> node are included in the filter output.
The support-filter statement in the YANG model indicates whether to support content filtering for a node when the
node is being operated:
1. Content filtering is supported for key nodes by default.
2. Content filtering is not supported for non-key nodes by default. If the value of the support-filter statement is set
to true for a non-key node, content filtering is supported.
• Selection node
A selection node represents an explicit selection filter on the underlying data model. If any selection node appears in a group of sibling nodes, the filter selects the specified subtrees and suppresses automatic selection of the entire sibling node set in the underlying data model. In a filtering expression, an empty tag (such as <foo/>) or an expression with explicit start and end tags (such as <foo></foo>) can be used to specify an empty leaf node. In this case, all whitespace characters are ignored.
<filter type="subtree">
<top xmlns="http://example.com/schema/1.2/config">
<users/>
</top>
</filter>
In this example, the <top> node is a containment node, and the <users> node is a selection node. The <users> node is included in the filter output only if it complies with the namespace http://example.com/schema/1.2/config and is contained in the <top> element at the root of the data model.
• If no filter is used, all data in the current data model is returned in the query result.
RPC request
<rpc message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get/>
</rpc>
RPC reply
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<!-- ... entire set of data returned ... -->
</data>
</rpc-reply>
• If an empty filter is used, the query result contains no data, because no content match or selection node is specified.
RPC request
<rpc message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get>
<filter type="subtree">
</filter>
</get>
</rpc>
RPC reply
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
</data>
</rpc-reply>
• Multi-subtree filtering
The following example uses the root, fred, and barney subtree filters.
The root subtree filter contains two containment nodes (<users> and <user>), one content match node
(<name>), and one selection node (<company-info>). As for subtrees that meet selection criteria, only
<company-info> is selected.
The fred subtree filter contains three containment nodes (<users>, <user>, and <company-info>), one content match node (<name>), and one selection node (<id>). As for subtrees that meet the selection criteria, only <id> is selected.
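The RPC request for this example was lost to pagination. A sketch consistent with the description above and with the multi-subtree filter example in RFC 6241 (on which this example appears to be based) would be:

```xml
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <get-config>
    <source>
      <running/>
    </source>
    <filter type="subtree">
      <top xmlns="http://example.com/schema/1.2/config">
        <users>
          <!-- root: content match on <name>, selection node <company-info> -->
          <user>
            <name>root</name>
            <company-info/>
          </user>
          <!-- fred: content match on <name>, selection node <id> -->
          <user>
            <name>fred</name>
            <company-info>
              <id/>
            </company-info>
          </user>
          <!-- barney: its content match criteria are not met, so it is absent from the reply -->
          <user>
            <name>barney</name>
            <type>superuser</type>
            <company-info>
              <dept/>
            </company-info>
          </user>
        </users>
      </top>
    </filter>
  </get-config>
</rpc>
```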
RPC reply
<rpc-reply message-id="101"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<top xmlns="http://example.com/schema/1.2/config">
<users>
<user>
<name>root</name>
<company-info>
<dept>1</dept>
<id>1</id>
</company-info>
</user>
<user>
<name>fred</name>
<company-info>
<id>2</id>
</company-info>
</user>
</users>
</top>
</data>
</rpc-reply>
Definition
YANG is a data modeling language used to model configuration and state data manipulated by the Network
Configuration Protocol (NETCONF), NETCONF remote procedure calls (RPCs), and NETCONF notifications.
Purpose
In addition to providing devices, each vendor provides its own device management methods (for example, different command sets) to facilitate management. These management methods are independent of one another and cannot be used universally. As the network scale expands and the number of device types increases, traditional management methods no longer meet the requirements of managing a wide variety of devices. YANG was developed to uniformly manage, configure, and monitor the various devices on a network.
Benefits
YANG is gradually becoming a mainstream service description language for service provisioning interfaces. It
structures data models and defines attributes and values through tags. The YANG data model is a machine-
oriented model interface, which defines data structures and constraints to provide more flexible and
complete data description. Network administrators can use NETCONF to uniformly manage, configure, and
monitor various network devices that support YANG, simplifying network O&M and reducing O&M costs.
• Module definition
Modules and submodules: YANG structures data models into modules and submodules. A module can
import data from other modules and reference data from submodules. The hierarchy can be
augmented, allowing one module to add data nodes to the hierarchy defined in another module. This
augmentation is conditional, with new nodes presented only if certain conditions are met.
"import" and "include" statements for modules and submodules: The "include" statement allows a
module or submodule to reference materials in submodules, and the "import" statement allows
references to materials defined in other modules.
• Namespace of a module
The namespace of a module must be globally unique.
• Version of a module
The "revision" statement records the version change history of a module. A new revision entry is added each time the module is updated, and the latest revision date is associated with the corresponding file name.
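A minimal sketch of these statements (the module name, namespace, and revision dates are illustrative):

```yang
module example-module {
  // the namespace must be globally unique
  namespace "urn:example:example-module";
  prefix ex;

  // revision history, newest entry first
  revision 2019-04-23 {
    description "Second revision.";
  }
  revision 2018-11-01 {
    description "Initial revision.";
  }
}
```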
Operation Definition
You can define operations in the YANG model through RPCs or the "action" statement. The definitions
include operation names, input parameters, and output parameters.
}
action reset {
input {
leaf reset-at {
type yang:date-and-time;
mandatory true;
}
}
output {
leaf reset-finished-at {
type yang:date-and-time;
mandatory true;
}
}
}
}
}
The corresponding NETCONF XML description is as follows: The reset operation is performed for the
server named apache-1 at the user-specified time "2014-07-29T13:42:00Z", and a reply packet
indicating the execution end time is returned.
■ RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<action xmlns="urn:ietf:params:xml:ns:yang:1">
<server xmlns="urn:example:server-farm">
<name>apache-1</name>
<reset>
<reset-at>2014-07-29T13:42:00Z</reset-at>
</reset>
</server>
</action>
</rpc>
■ RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<reset-finished-at xmlns="urn:example:server-farm">
2014-07-29T13:42:12Z
</reset-finished-at>
</rpc-reply>
Notification Definition
NETCONF notification is a NETCONF-based mechanism for alarm and event subscription and reporting,
providing a data-model-based asynchronous reporting service. YANG allows the definition of notifications
suitable for NETCONF. YANG data definition statements are used to model the notification content.
Unlike RPC packets, which follow a single-request, single-reply mechanism that a client uses to query data, NETCONF notifications allow a server to proactively send packets to a client when an alarm or event occurs. NETCONF notifications apply to scenarios that require real-time monitoring of devices, such as alarms and events reported to an NMS through NETCONF.
The following example defines a notification named link-failure. If any of the if-name, if-admin-status, and if-oper-status parameter values changes, the change is reported to a client.
notification link-failure {
  // the body below the first line was lost in pagination; the leaf names follow
  // the text above, and the types follow the corresponding example in RFC 6020
  description "A link failure has been detected";
  leaf if-name {
    type leafref {
      path "/interface/name";
    }
  }
  leaf if-admin-status {
    type admin-status;
  }
  leaf if-oper-status {
    type oper-status;
  }
}
• Presence container (existent container): The existence of the container itself is meaningful. For configuration data, such a container acts as a configuration switch, in addition to organizing related configurations.
Take a container node named system as an example. The node contains another container node named services, which in turn contains the presence container ssh. YANG example:
container system {
container services{
container ssh{
presence "Enables SSH";
// more leafs, containers and stuff here...
}
}
}
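Because ssh above is a presence container, creating the empty element alone carries meaning. A hypothetical <edit-config> payload sketch (the paths follow the YANG sketch above):

```xml
<config>
  <system>
    <services>
      <!-- the presence of this empty element by itself enables SSH -->
      <ssh/>
    </services>
  </system>
</config>
```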
• Non-presence container (non-existent container): The container itself carries no meaning. It merely serves as a hierarchy for organizing data nodes and accommodating child nodes. By default, a container is a non-presence container.
Take a non-presence container node named system as an example. The node contains another container node named login, which contains a leaf node named message. YANG example:
container system {
container login {
leaf message {
type string;
description
"Message given at start of login session";
}
}
}
Each entry of a list node is uniquely identified by the values of its key leaf nodes. A list node can define multiple key leaf nodes and may contain any number of child nodes of any type (such as leaf-list, list, and container nodes).
Take a list node named user as an example. The list node includes three leaf nodes, and its key is the name leaf. YANG example:
list user {
key "name";
leaf name {
type string;
}
leaf full-name {
type string;
}
leaf class {
type string;
}
}
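Instance data for one entry of this list could then be encoded in XML as follows (the values are illustrative):

```xml
<user>
  <!-- name is the key and must be unique across entries -->
  <name>fred</name>
  <full-name>Fred Flintstone</full-name>
  <class>intermediate</class>
</user>
```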
<destination>
<address>192.168.2.1</address>
<port>830</port>
</destination>
</peer>
The grouping can be refined as it is used, allowing certain statements to be overridden. The following
example shows how the description is refined:
container connection {
container source {
uses target {
refine "address" {
description "Source IP address";
}
refine "port" {
description "Source port number";
}
}
}
container destination {
uses target {
refine "address" {
description "Destination IP address";
}
refine "port" {
description "Destination port number";
}
}
}
}
enum milk;
enum first-available;
}
}
}
}
}
list interface {
key "name";
leaf name {
type string;
}
leaf status {
type boolean;
default "true";
}
leaf observed-speed {
type yang:gauge64;
units "bits/second";
config false;
}
}
}
type uint8 {
range "0 .. 100";
}
description "Percentage";
}
leaf completed {
type percent;
}
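As a sketch, instance data for the completed leaf is then validated against the percent range:

```xml
<completed>75</completed>  <!-- accepted: 75 is within the "0 .. 100" range -->
```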
This example defines a "uid" node that is valid only when the user's "class" is not "wheel".
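The example itself was lost in pagination; a sketch consistent with the description (the leaf names follow the well-known when-statement example in RFC 6020) is:

```yang
leaf uid {
  type uint16;
  // the uid leaf is valid only when the sibling class leaf is not "wheel"
  when "../class != 'wheel'";
}
```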
• If the status attribute of a node is deprecated, using the node is not recommended.
The following uses a node in the huawei-devm.yang file as an example. The status attribute of the leaf node is deprecated, indicating that using the leaf node is not recommended.
leaf serial-number {
  type string {
    length "0..32";
  }
  config false;
  status deprecated;
  description
    "Entity number.";
}
support-filter: For non-key leaf nodes under a list node, if the support-filter value is true, filtering is supported. The support-filter value can be true or false. If support-filter is not set for a node, the node does not support filtering.
Example:
leaf type {
  type group4-type;
  mandatory true;
  ext:support-filter "true";
}
Model description: The type node supports filtering.
generated-by: indicates automatic creation of a node when the conditions are met and deletion when the conditions are not met. This syntax indicates only the list/leaf-list/presence container that is created; the value of each node in the list and presence container is not expressed. The syntax value can be user or system. The default value is user.
The syntax contains the following clauses:
when clause: describes the specific condition for creating or deleting a list or presence container. For details, see the standard when syntax.
filter clause: describes the filtering criterion for creating or deleting a list or presence container.
description clause: describes the generated-by scenario in detail.
Example:
description
  "In IPv4 VPN instances, the IPv4 unicast, IPv4 flow, and IPv4 labeled unicast
  address families can be configured. In IPv6 VPN instances, the IPv6 unicast
  and IPv6 flow address families can be configured. The IPv4 address family in
  the BGP _public_ VPN instance cannot be deleted.";
ext:generated-by system {
  when "../../../../ni:name = '_public_'";
  ext:filter "type = 'ipv4uni'";
  description "The public instances is generated automatically when BGP is enabled.";
}
Model description: When the when clause ../../../../ni:name = '_public_' is met, the system automatically creates a unicast address family of the ipv4uni type.
the
/ifm:ifm/ifm:interfaces/ifm:interface
node.
4.8.4.1 <get-config>
The <get-config> operation retrieves all or specified configuration data from the <running/>, <candidate/>,
and <startup/> configuration databases.
• source: specifies a configuration database from which data is retrieved. The value can be <running/>,
<candidate/>, or <startup/>.
• filter: specifies a range to be queried in the configuration database. If this parameter is not specified,
the entire configuration is returned.
Query interface configuration of the IFM feature in the <running/> configuration database and return the
interface information in an RPC reply message:
• RPC request:
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="827">
<get-config>
<source>
<running/>
</source>
<filter type="subtree">
<ifm:ifm xmlns:ifm="urn:huawei:yang:huawei-ifm">
<ifm:interfaces>
<ifm:interface/>
</ifm:interfaces>
</ifm:ifm>
</filter>
</get-config>
</rpc>
• RPC reply:
<?xml version="1.0" encoding="utf-8"?>
<data xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface>
<name>GigabitEthernet0/0/0</name>
<class>main-interface</class>
<type>MEth</type>
<number>0/0/0</number>
<admin-status>up</admin-status>
<link-protocol>ethernet</link-protocol>
<statistic-enable>true</statistic-enable>
<mtu>1500</mtu>
<spread-mtu-flag>false</spread-mtu-flag>
<vrf-name>_public_</vrf-name>
</interface>
</interfaces>
</ifm>
</data>
4.8.4.2 <get-data>
The <get-data> operation retrieves all or specified configuration or state data from the NMDA datastores.
• datastore: specifies the datastore from which data is retrieved. If the datastore is <ietf-datastores:running/>, <ietf-datastores:candidate/>, or <ietf-datastores:startup/>, configuration data is returned. If the datastore is <ietf-datastores:operational/>, both the configuration and state data of the current device are returned.
• xpath-filter: specifies a range to be queried in the configuration database in the form of an XPath. If this
parameter is not specified, all configurations on the device are returned.
• subtree-filter: specifies a range to be queried in the configuration database in the form of a subtree. If
this parameter is not specified, all configurations on the device are returned.
The following example shows how to query the task group configuration of the AAA feature in the <ietf-
datastores:running/> configuration database. The queried group information is returned in an RPC reply
message.
• RPC request
<?xml version="1.0" encoding="utf-8"?>
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<!-- the filter body was lost in the original; this subtree filter is reconstructed to match the reply below -->
<get-data xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-nmda" xmlns:ds="urn:ietf:params:xml:ns:yang:ietf-datastores">
<datastore>ds:running</datastore>
<subtree-filter>
<aaa xmlns="urn:huawei:yang:huawei-aaa">
<task-groups/>
</aaa>
</subtree-filter>
</get-data>
</rpc>
• RPC reply
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="5">
<data xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-nmda">
<aaa xmlns="urn:huawei:yang:huawei-aaa">
<task-groups>
<task-group>
<name>manage-tg</name>
</task-group>
<task-group>
<name>system-tg</name>
</task-group>
<task-group>
<name>monitor-tg</name>
</task-group>
<task-group>
<name>visit-tg</name>
</task-group>
</task-groups>
</aaa>
</data>
</rpc-reply>
• RPC request
<?xml version="1.0" encoding="utf-8"?>
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get-data xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-nmda" xmlns:ds="urn:ietf:params:xml:ns:yang:ietf-
datastores">
<datastore>ds:running</datastore>
<xpath-filter xmlns:aaa="urn:huawei:yang:huawei-aaa">/aaa:aaa/aaa:task-groups/aaa:task-group</xpath-filter>
</get-data>
</rpc>
• RPC reply
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="5">
<data xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-nmda">
<aaa xmlns="urn:huawei:yang:huawei-aaa">
<task-groups>
<task-group>
<name>manage-tg</name>
</task-group>
<task-group>
<name>system-tg</name>
</task-group>
<task-group>
<name>monitor-tg</name>
</task-group>
<task-group>
<name>visit-tg</name>
</task-group>
</task-groups>
</aaa>
</data>
</rpc-reply>
4.8.4.3 <get>
The <get> operation retrieves configuration and state data only from the <running/> configuration database.
If the <get> operation is successful, the server sends an <rpc-reply> element containing a <data> element
with the results of the query. Otherwise, the server returns an <rpc-reply> element containing an <rpc-error>
element.
• The <get-config> operation can retrieve data from the <running/>, <candidate/>, and <startup/> configuration
databases, whereas the <get> operation can only retrieve data from the <running/> configuration database.
• The <get-config> operation can only retrieve configuration data, whereas the <get> operation can retrieve both
configuration and state data.
Query interface configuration of the IFM feature in the <running/> configuration database and return the
interface information in an RPC reply message:
• RPC request:
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="831">
<get>
<filter type="subtree">
<ifm:ifm xmlns:ifm="urn:huawei:yang:huawei-ifm">
<ifm:interfaces>
<ifm:interface/>
</ifm:interfaces>
</ifm:ifm>
</filter>
</get>
</rpc>
• RPC reply:
<?xml version="1.0" encoding="utf-8"?>
<data xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface>
<name>GigabitEthernet0/0/0</name>
<index>4</index>
<class>main-interface</class>
<type>MEth</type>
<position>0/0/0</position>
<number>0/0/0</number>
<admin-status>up</admin-status>
<link-protocol>ethernet</link-protocol>
<statistic-enable>true</statistic-enable>
<mtu>1500</mtu>
<spread-mtu-flag>false</spread-mtu-flag>
<vrf-name>_public_</vrf-name>
<dynamic>
<oper-status>up</oper-status>
<physical-status>up</physical-status>
<link-status>up</link-status>
<mtu>1500</mtu>
<bandwidth>100000000</bandwidth>
<ipv4-status>up</ipv4-status>
<ipv6-status>down</ipv6-status>
<is-control-flap-damp>false</is-control-flap-damp>
<mac-address>00e0-fc12-3456</mac-address>
<line-protocol-up-time>2019-05-25T02:33:46Z</line-protocol-up-time>
<is-offline>false</is-offline>
<link-quality-grade>good</link-quality-grade>
</dynamic>
<mib-statistics>
<receive-byte>0</receive-byte>
<send-byte>0</send-byte>
<receive-packet>363175</receive-packet>
<send-packet>61660</send-packet>
<receive-unicast-packet>66334</receive-unicast-packet>
<receive-multicast-packet>169727</receive-multicast-packet>
<receive-broad-packet>127122</receive-broad-packet>
<send-unicast-packet>61363</send-unicast-packet>
<send-multicast-packet>0</send-multicast-packet>
<send-broad-packet>299</send-broad-packet>
<receive-error-packet>0</receive-error-packet>
<receive-drop-packet>0</receive-drop-packet>
<send-error-packet>0</send-error-packet>
<send-drop-packet>0</send-drop-packet>
</mib-statistics>
<common-statistics>
<stati-interval>300</stati-interval>
<in-byte-rate>40</in-byte-rate>
<in-bit-rate>320</in-bit-rate>
<in-packet-rate>2</in-packet-rate>
<in-use-rate>0.01%</in-use-rate>
<out-byte-rate>0</out-byte-rate>
<out-bit-rate>0</out-bit-rate>
<out-packet-rate>0</out-packet-rate>
<out-use-rate>0.00%</out-use-rate>
<receive-byte>0</receive-byte>
<send-byte>0</send-byte>
<receive-packet>363183</receive-packet>
<send-packet>61662</send-packet>
<receive-unicast-packet>66334</receive-unicast-packet>
<receive-multicast-packet>169727</receive-multicast-packet>
<receive-broad-packet>127122</receive-broad-packet>
<send-unicast-packet>61363</send-unicast-packet>
<send-multicast-packet>0</send-multicast-packet>
<send-broad-packet>299</send-broad-packet>
<receive-error-packet>0</receive-error-packet>
<receive-drop-packet>0</receive-drop-packet>
<send-error-packet>0</send-error-packet>
<send-drop-packet>0</send-drop-packet>
<send-unicast-bit>0</send-unicast-bit>
<receive-unicast-bit>0</receive-unicast-bit>
<send-multicast-bit>0</send-multicast-bit>
<receive-multicast-bit>0</receive-multicast-bit>
<send-broad-bit>0</send-broad-bit>
<receive-broad-bit>0</receive-broad-bit>
<send-unicast-bit-rate>0</send-unicast-bit-rate>
<receive-unicast-bit-rate>0</receive-unicast-bit-rate>
<send-multicast-bit-rate>0</send-multicast-bit-rate>
<receive-multicast-bit-rate>0</receive-multicast-bit-rate>
<send-broad-bit-rate>0</send-broad-bit-rate>
<receive-broad-bit-rate>0</receive-broad-bit-rate>
<send-unicast-packet-rate>0</send-unicast-packet-rate>
<receive-unicast-packet-rate>0</receive-unicast-packet-rate>
<send-multicast-packet-rate>0</send-multicast-packet-rate>
<receive-multicast-packet-rate>0</receive-multicast-packet-rate>
<send-broadcast-packet-rate>0</send-broadcast-packet-rate>
<receive-broadcast-packet-rate>0</receive-broadcast-packet-rate>
</common-statistics>
</interface>
</ifm>
</data>
4.8.4.4 <edit-config>
The <edit-config> operation loads all or part of a configuration to a specified target configuration database (<running/> or <candidate/>). The device first authorizes the <edit-config> operation; if authorization succeeds, the device performs the corresponding modification.
The <edit-config> operation supports multiple modes for loading configurations. For example, you can load
local and remote files, and edit files online. If a NETCONF server supports the URL capability, the <url>
parameter (which identifies a local configuration file) can be used to replace the <config> parameter.
• <config>: indicates a group of hierarchical configuration items defined in the data model.
The <config> parameter may contain the optional operation attribute, which is used to specify an
operation type for a configuration item. If the operation attribute is not present, the <merge> operation
is performed by default. The values of the operation attribute are as follows:
■ merge: modifies or creates data in the database. Specifically, if the target data exists, this
operation modifies the data. If the target data does not exist, this operation creates the data. This
is the default operation.
■ create: adds configuration data to the configuration database only if such data does not already
exist. If the configuration data already exists, <rpc-error> is returned, in which the <error-tag>
value is data-exists.
■ delete: deletes a specified configuration data record from the configuration database. If the data
exists, it is deleted. If the data does not exist, <rpc-error> is returned, in which the <error-tag>
value is data-missing.
■ remove: removes a specified configuration data record from the configuration database. If the data
exists, it is deleted. If the data does not exist, a success message is returned.
■ replace: replaces configuration data records in the configuration database. If the data exists, all
relevant data is replaced. If the data does not exist, the data is created. Different from the <copy-
config> operation (which completely replaces the configuration data in the target configuration
database), this operation affects only the configuration that exists in the <config> parameter.
• target: indicates the configuration database to be edited. The configuration database can be set based
on the scenario.
■ In two-phase validation mode, set the database to <candidate/>. After editing the database,
perform the <commit> operation to submit the configuration for the modification to take effect.
• <default-operation>: sets the default operation for the configuration data in the <config> parameter. The values are as follows:
■ merge: merges the configuration data in the <config> parameter with the configuration data in the target configuration database. This is the default operation.
■ replace: completely replaces the configuration data in the target configuration database with the
configuration data in the <config> parameter.
■ none: ensures that the configuration data in <config> does not affect that in the target
configuration database, with the exception that the operation specified by the operation attribute is
performed. If the <config> parameter contains configuration data that does not exist at the
corresponding data level in the target configuration database, <rpc-error> is returned, in which the
<error-tag> value is data-missing. This prevents redundant elements from being created when a
specified operation is performed. For example, when a specified child element is deleted, <config>
contains the parent hierarchical structure of the child element but the target database does not
contain the configuration of the parent element. If the value of the default-operation parameter is
not none, the configuration of the parent element is created in the database when the child
element is deleted. Otherwise, the child element is deleted, and the configuration of the parent
element is not created.
• <error-option>: sets a processing mode for subsequent instances after a configuration error of an
instance occurs. The default value is stop-on-error. The values are as follows:
1. If the target configuration library is <running/>:
■ stop-on-error: stops the operation if an error occurs and rolls back the configuration according to the rollback-on-error mode.
■ continue-on-error: records the error information and continues the execution if an error occurs.
The NETCONF server returns an <rpc-reply> message indicating an operation failure to the client
after an error occurs.
■ rollback-on-error: stops the operation if an error occurs and rolls back the configuration to the state before the <edit-config> operation was performed. This operation is supported only when the device supports the rollback-on-error capability.
2. If the target configuration library is <candidate/>, set the value of <error-option> to rollback-on-
error for subsequent instances after a configuration error of an instance occurs.
The following example shows how to change the description value of the interface named
GigabitEthernet0/0/1 in the <running/> configuration database to huawei.
• RPC request
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="15">
<edit-config>
<target>
<running/>
</target>
<config>
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface>
<name>GigabitEthernet0/0/1</name>
<description>huawei</description>
</interface>
</interfaces>
</ifm>
</config>
</edit-config>
</rpc>
• RPC reply
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
message-id="15"
nc-ext:flow-id="27">
<ok/>
</rpc-reply>
The following example shows how to delete the configuration on the interface named LoopBack1023 from
the running configuration database.
• RPC request
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="844">
<edit-config>
<target>
<running/>
</target>
<config>
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" nc:operation="delete">
<name>LoopBack1023</name>
</interface>
</interfaces>
</ifm>
</config>
</edit-config>
</rpc>
• RPC reply
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
message-id="844"
nc-ext:flow-id="29">
<ok/>
</rpc-reply>
If the validate capability is supported, the <edit-config> operation can carry the <test-option> parameter. If
the <test-option> parameter is not specified, the system processes the <edit-config> operation based on the
test-then-set process by default.
• If the <test-option> parameter value is test-then-set or the parameter is not specified, nodes at any
layer support the <delete> and <remove> operations that delete all configuration data of a specified
node in the configuration database.
The following example shows how to delete the configuration of all instances under the <l2vpn> node.
RPC request
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="849">
<edit-config>
<target>
<running/>
</target>
<config>
<l2vpn xmlns="urn:huawei:yang:huawei-l2vpn">
<instances xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" nc:operation="delete"/>
</l2vpn>
</config>
</edit-config>
</rpc>
RPC reply
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
message-id="849"
nc-ext:flow-id="31">
<ok/>
</rpc-reply>
4.8.4.5 <edit-data>
The <edit-data> operation can be used to load all or some configuration data to a specified target
configuration database (<ietf-datastores:running/> or <ietf-datastores:candidate/>). The device authorizes
the operation in <edit-data>. After the authorization succeeds, the device performs corresponding
modification.
The <edit-data> operation supports multiple modes for loading configurations. For example, you can load
local and remote files, and edit files online. If a NETCONF server supports the URL capability, the <url>
parameter (which identifies a local configuration file) can be used to replace the <config> parameter.
Parameters in an RPC message of the <edit-data> operation are described as follows:
• <config>: indicates a group of hierarchical configuration items defined in the data model.
The <config> parameter may contain the optional operation attribute, which is used to specify an
operation type for a configuration item. If the operation attribute is not present, the <merge> operation
is performed by default. The values of the operation attribute are as follows:
■ merge: modifies or creates data in the database. Specifically, if the target data exists, this
operation modifies the data. If the target data does not exist, this operation creates the data. This
is the default operation.
■ create: adds configuration data to the configuration database only if such data does not already
exist. If the configuration data already exists, <rpc-error> is returned, in which the <error-tag>
value is data-exists.
■ delete: deletes a specified configuration data record from the configuration database. If the data
exists, it is deleted. If the data does not exist, <rpc-error> is returned, in which the <error-tag>
value is data-missing.
■ remove: removes a specified configuration data record from the configuration database. If the data
exists, it is deleted. If the data does not exist, a success message is returned.
■ replace: replaces configuration data records in the configuration database. If the data exists, all
relevant data is replaced. If the data does not exist, the data is created. Different from the <copy-
config> operation (which completely replaces the configuration data in the target configuration
database), this operation affects only the configuration that exists in the <config> parameter.
• target: indicates the configuration database to be edited. The configuration database can be set based
on the scenario.
■ In two-phase validation mode, set the database to <ietf-datastores:candidate/>. After editing the
database, perform the <commit> operation so that the modification takes effect.
• default-operation: specifies the default operation for the <edit-data> operation. The values are as follows:
■ merge: merges the configuration data in the <config> parameter with that in the target
configuration database. This is the default operation.
■ replace: completely replaces the configuration data in the target configuration database with the
configuration data in the <config> parameter.
■ none: ensures that the configuration data in <config> does not affect that in the target
configuration database, with the exception that the operation specified by the operation attribute is
performed. If the <config> parameter contains configuration data that does not exist at the
corresponding data level in the target configuration database, <rpc-error> is returned, in which the
<error-tag> value is data-missing. This prevents redundant elements from being created when a
specified operation is performed. For example, when a specified child element is deleted, <config>
contains the parent hierarchical structure of the child element but the target database does not
contain the configuration of the parent element. If the value of the default-operation parameter is
not none, the configuration of the parent element is created in the database when the child
element is deleted. Otherwise, the child element is deleted, and the configuration of the parent
element is not created.
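The operation attribute described above is an ordinary XML attribute in the base NETCONF namespace, set on the node to be acted on. The following standard-library Python sketch shows one way to tag a node; the helper name mark_operation is illustrative, not part of any vendor toolkit.

```python
import xml.etree.ElementTree as ET

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"
IFM = "urn:huawei:yang:huawei-ifm"

def mark_operation(element, operation):
    """Set nc:operation (e.g. 'delete', 'remove', 'create') on a config node."""
    element.set(f"{{{NC}}}operation", operation)
    return element

# Build the <ifm>/<interfaces>/<interface> hierarchy and mark the interface
# node for deletion, mirroring the LoopBack1023 example in the text.
ifm = ET.Element(f"{{{IFM}}}ifm")
interfaces = ET.SubElement(ifm, f"{{{IFM}}}interfaces")
interface = ET.SubElement(interfaces, f"{{{IFM}}}interface")
mark_operation(interface, "delete")
name = ET.SubElement(interface, f"{{{IFM}}}name")
name.text = "LoopBack1023"

ET.register_namespace("nc", NC)  # emit the attribute as nc:operation
xml_text = ET.tostring(ifm, encoding="unicode")
```

This payload would go inside the <config> element of an <edit-data> (or <edit-config>) request.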
The following example shows how to change the description of the interface named GigabitEthernet0/1/0 in the
<ietf-datastores:running/> configuration database to huawei.
• RPC request
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="5">
<edit-data xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-nmda"
xmlns:ds="urn:ietf:params:xml:ns:yang:ietf-datastores">
<datastore>ds:running</datastore>
<config>
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface>
<name>GigabitEthernet0/1/0</name>
<description>huawei</description>
</interface>
</interfaces>
</ifm>
</config>
</edit-data>
</rpc>
• RPC reply
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
message-id="5"
nc-ext:flow-id="27">
<ok/>
</rpc-reply>
The following example shows how to delete the configuration on the interface named LoopBack1023 from
the running configuration database.
• RPC request
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="5">
<edit-data xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-nmda"
xmlns:ds="urn:ietf:params:xml:ns:yang:ietf-datastores">
<datastore>ds:running</datastore>
<config>
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" nc:operation="delete">
<name>LoopBack1023</name>
</interface>
</interfaces>
</ifm>
</config>
</edit-data>
</rpc>
• RPC reply
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
message-id="5"
nc-ext:flow-id="28">
<ok/>
</rpc-reply>
4.8.4.6 <copy-config>
The <copy-config> operation saves the data in <source/> to <target/>. The supported combinations are as follows:
Source          Target          Remarks
<startup/>      <url/>          If <source/> does not exist or the URL is unreachable, an error
                                message is displayed, and the configuration is not delivered.
<candidate/>    <url/>          -
<config/>       <candidate/>    -
Save the configuration data in the <running/> configuration database to the local eee.xml file:
• RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<copy-config>
<target>
<url>file:///eee.xml</url>
</target>
<source>
<running/>
</source>
</copy-config>
</rpc>
• RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Use FTP to save the configuration data in the <candidate/> configuration database to a remote path
specified by the URL:
• RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<copy-config>
<target>
<url>ftp://root:root@10.1.1.1/abc.xml</url>
</target>
<source>
<candidate/>
</source>
</copy-config>
</rpc>
• RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Use SFTP to copy remote configuration data to the <candidate/> database in URL mode:
• RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<copy-config>
<target>
<candidate/>
</target>
<source>
<url>sftp://root:root@10.1.1.1/abc.xml</url>
</source>
</copy-config>
</rpc>
• RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
4.8.4.7 <delete-config>
The <delete-config> operation deletes the <startup/> configuration database or deletes configuration data
from the <candidate/> configuration database.
If the <delete-config> operation is successful, the server sends an <rpc-reply> element containing an <ok>
element. Otherwise, the server sends an <rpc-reply> element containing an <rpc-error> element.
Delete the <startup/> configuration database:
• RPC request:
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<delete-config>
<target>
<startup/>
</target>
</delete-config>
</rpc>
• RPC reply:
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Delete the configuration data from the <candidate/> configuration database:
• RPC request:
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<delete-config>
<target>
<nc-ext:candidate xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"/>
</target>
</delete-config>
</rpc>
• RPC reply:
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
This operation requires two-phase commitment. That is, a commit packet needs to be delivered to
commit the configuration to the <running/> database.
After the <delete-config> operation is performed to delete configuration data from the <candidate/> database, if
the commit operation is directly delivered, the configuration information of the device is deleted. As a result, the
NETCONF session is disconnected. If you need to reconnect to the device, you must reconfigure the login
information.
4.8.4.8 <lock>
The <lock> operation locks a configuration database. A locked configuration database cannot be modified by
other clients. Locking eliminates errors caused by simultaneous modifications made by multiple NETCONF
managers, Simple Network Management Protocol (SNMP) managers, or command-line interface (CLI) scripts.
If the specified configuration database is already locked by a client, the <error-tag> element will be "lock-
denied" and the <error-info> element will include the <session-id> of the lock owner.
If the <running/> configuration database is successfully locked:
• RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<lock>
<target>
<running/>
</target>
</lock>
</rpc>
• RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
If the NMDA datastores are supported, the target configuration database is specified in a different format, as
shown in the following example:
• RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<lock xmlns:ds="urn:ietf:params:xml:ns:yang:ietf-datastores">
<target>
<datastore xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-nmda">ds:running</datastore>
</target>
</lock>
</rpc>
• RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
If the <running/> configuration database fails to be locked because another user holds the lock:
• RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<lock>
<target>
<running/>
</target>
</lock>
</rpc>
• RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<rpc-error>
<error-type>protocol</error-type>
<error-tag>lock-denied</error-tag>
<error-severity>error</error-severity>
<error-app-tag>43</error-app-tag>
<error-message>The configuration is locked by other user. [Session ID = 629] </error-message>
<error-info>
<session-id>629</session-id>
<error-paras>
<error-para>629</error-para>
</error-paras>
</error-info>
</rpc-error>
</rpc-reply>
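A client typically inspects such a reply to find out which session holds the lock. The following Python sketch (standard library only; the helper name lock_holder is illustrative) parses a lock-denied reply like the one above and extracts the <session-id> of the lock owner.

```python
import xml.etree.ElementTree as ET

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"

def lock_holder(reply_xml):
    """Return the session-id of the lock owner if the reply is lock-denied, else None."""
    root = ET.fromstring(reply_xml)
    for err in root.findall(f"{{{NC}}}rpc-error"):
        if err.findtext(f"{{{NC}}}error-tag") == "lock-denied":
            # RFC 6241 requires <error-info> to carry the owner's <session-id>.
            return err.findtext(f"{{{NC}}}error-info/{{{NC}}}session-id")
    return None

reply = """<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <rpc-error>
    <error-type>protocol</error-type>
    <error-tag>lock-denied</error-tag>
    <error-severity>error</error-severity>
    <error-info><session-id>629</session-id></error-info>
  </rpc-error>
</rpc-reply>"""
owner = lock_holder(reply)
```

With the sample reply above, owner holds the lock owner's session ID as a string.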
If the NMDA datastores are supported, the target configuration database is specified in a different format, as
shown in the following example:
• RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<lock xmlns:ds="urn:ietf:params:xml:ns:yang:ietf-datastores">
<target>
<datastore xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-nmda">ds:running</datastore>
</target>
</lock>
</rpc>
• RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<rpc-error>
<error-type>protocol</error-type>
<error-tag>lock-denied</error-tag>
<error-severity>error</error-severity>
<error-app-tag>43</error-app-tag>
<error-message>The configuration is locked by other user. [Session ID = 629] </error-message>
<error-info>
<session-id>629</session-id>
<error-paras>
<error-para>629</error-para>
</error-paras>
</error-info>
</rpc-error>
</rpc-reply>
4.8.4.9 <unlock>
The <unlock> operation releases a configuration lock previously obtained with the <lock> operation. A client
cannot unlock a configuration database that it did not lock.
If the <unlock> operation is successful, the server sends an <rpc-reply> element containing an <ok> element.
Otherwise, the server sends an <rpc-reply> element containing an <rpc-error> element.
Unlock the <running/> configuration database:
• RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<unlock>
<target>
<running/>
</target>
</unlock>
</rpc>
• RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
If the NMDA datastores are supported, the target configuration database is specified in a different format, as
shown in the following example:
• RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<unlock xmlns:ds="urn:ietf:params:xml:ns:yang:ietf-datastores">
<target>
<datastore xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-nmda">ds:running</datastore>
</target>
</unlock>
</rpc>
• RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
4.8.4.10 <close-session>
The <close-session> operation closes the current NETCONF session. After receiving a <close-session> request,
the NETCONF server releases all locks and resources associated with the session and terminates the session.
Close the current NETCONF session:
• RPC request:
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<close-session/>
</rpc>
• RPC reply:
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
4.8.4.11 <kill-session>
The <kill-session> operation forcibly closes a NETCONF session. Only an administrator is authorized to
perform this operation.
After receiving a <kill-session> request, the NETCONF server stops all operations that are being performed
for the session, releases all the locks and resources associated with the session, and terminates the session.
If the NETCONF server receives a <kill-session> request when performing the <commit> operation, it must
restore the configuration to the status before the configuration is committed.
Close the NETCONF session with session-id 4:
• RPC request:
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<kill-session>
<session-id>4</session-id>
</kill-session>
</rpc>
• RPC reply:
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
4.8.5.1 Writable-running
This capability indicates that the device supports writes to the <running/> configuration database. In other
words, the device supports <edit-config> and <copy-config> operations on the <running/> configuration
database.
• RPC request:
• RPC reply:
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
message-id="101"
nc-ext:flow-id="27">
<ok/>
</rpc-reply>
4.8.5.2 Candidate Configuration
This capability indicates that the device supports the <candidate/> configuration database and the following
operations on it:
• <commit>: converts all configuration data in the <candidate/> configuration database into running
configuration data.
If the device is unable to commit all of the changes in the <candidate/> configuration database, the
running configuration data remains unchanged.
• <discard-changes>: discards configuration data that has not been committed from the <candidate/>
configuration database. After this operation is performed, the configuration data in the <candidate/>
configuration database remains the same as that in the <running/> configuration database.
A device establishes an independent <candidate/> configuration database for each NETCONF session.
• RPC request:
<?xml version="1.0" encoding="utf-8"?>
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<edit-config>
<target>
<candidate/>
</target>
<config>
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface>
<name>GigabitEthernet1/0/0</name>
<mtu>1500</mtu>
</interface>
</interfaces>
</ifm>
</config>
</edit-config>
</rpc>
• RPC reply:
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
message-id="101"
nc-ext:flow-id="27">
<ok/>
</rpc-reply>
4.8.5.3 Confirmed Commit
confirmed-commit:1.0
The <commit> operation can carry the <confirmed> and <confirm-timeout> parameters.
• <confirmed>: submits the configuration data in the <candidate/> configuration database and converts it
into the running configuration data on a device (configuration data in the <running/> configuration
database).
• <confirm-timeout>: specifies a timeout period for confirming the <commit> operation, in seconds. The
default value is 600s. After the <commit> operation is performed, if the confirmation operation is not
performed within the timeout period, the configuration in the <running/> configuration database is
rolled back to the status before the <commit> operation is performed and the modified data in the
<candidate/> configuration database is abandoned.
This capability is valid only when the candidate configuration capability is supported. It is mainly used to
trial-run and verify services.
Submit the current configuration and set the timeout period for confirming the <commit> operation to 120s:
• RPC request:
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<commit>
<confirmed/>
<confirm-timeout>120</confirm-timeout>
</commit>
</rpc>
• RPC reply:
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
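The <commit> parameters above are plain child elements of the <commit> operation. The following Python sketch (standard library only; the helper name build_confirmed_commit is illustrative) builds a confirmed-commit RPC with an optional timeout and persist token.

```python
import xml.etree.ElementTree as ET

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"

def build_confirmed_commit(timeout_seconds=None, persist=None, message_id="101"):
    """Build a <commit> RPC carrying <confirmed>, and optionally
    <confirm-timeout> (seconds) and <persist>."""
    rpc = ET.Element(f"{{{NC}}}rpc", attrib={"message-id": message_id})
    commit = ET.SubElement(rpc, f"{{{NC}}}commit")
    ET.SubElement(commit, f"{{{NC}}}confirmed")
    if timeout_seconds is not None:
        t = ET.SubElement(commit, f"{{{NC}}}confirm-timeout")
        t.text = str(timeout_seconds)
    if persist is not None:
        p = ET.SubElement(commit, f"{{{NC}}}persist")
        p.text = persist
    ET.register_namespace("", NC)
    return ET.tostring(rpc, encoding="unicode")

req = build_confirmed_commit(timeout_seconds=120)
req_persist = build_confirmed_commit(persist="123", message_id="3")
```

If no confirming commit arrives within the timeout, the device rolls the running configuration back, as described above.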
confirmed-commit:1.1
• The <commit> operation can carry the <persist> and <persist-id> parameters.
If a <confirmed-commit> message carries the <persist> parameter, the trial run operation created using
<confirmed-commit> is still effective after the associated session is terminated. The device allows a
message to carry the <persist-id> parameter to update an existing trial-run operation.
Carry the <persist> parameter in a message for the <commit> operation:
RPC request:
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="3">
<commit>
<confirmed/>
<persist>123</persist>
</commit>
</rpc>
RPC reply:
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="3">
<ok/>
</rpc-reply>
Carry the <persist-id> parameter in a message for the <commit> operation to update the trial-run operation
created with <persist>:
RPC request:
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="2">
<commit>
<confirmed/>
<persist-id>123</persist-id>
</commit>
</rpc>
RPC reply:
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="2">
<ok/>
</rpc-reply>
• The <cancel-commit> operation is supported. It can carry the <persist-id> parameter to cancel an
ongoing trial-run operation that was created using <confirmed-commit> with the <persist> parameter.
RPC request:
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="2">
<cancel-commit>
<persist-id>IQ,d4668</persist-id>
</cancel-commit>
</rpc>
RPC reply:
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="2">
<ok/>
</rpc-reply>
4.8.5.4 Rollback
The rollback capability indicates that the device can roll back to the corresponding configuration based on
the specified file and commitId.
This capability is only available when the device supports the candidate configuration capability.
Roll back the current configuration to the configuration of the specified commitId:
• RPC request:
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
<cfg:rollback-by-commit-id xmlns:cfg="urn:huawei:yang:huawei-cfg">
<cfg:commit-id>1000033829</cfg:commit-id>
</cfg:rollback-by-commit-id>
</rpc>
• RPC reply:
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
<ok/>
</rpc-reply>
The Rollback on Error capability is supported. More specifically, "rollback-on-error" can be carried in the
<error-option> parameter of the <edit-config> operation. If an error occurs and the <rpc-error> element is
generated, the server stops performing the <edit-config> operation and restores the specified configuration
to the status before the <edit-config> operation is performed.
• RPC request:
<?xml version="1.0" encoding="utf-8"?>
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<edit-config>
<target>
<running/>
</target>
<error-option>rollback-on-error</error-option>
<config>
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface>
<name>GigabitEthernet1/0/0</name>
<mtu>1000</mtu>
</interface>
</interfaces>
</ifm>
</config>
</edit-config>
</rpc>
• RPC reply:
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
message-id="101"
nc-ext:flow-id="27">
<ok/>
</rpc-reply>
• RPC request:
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<copy-config>
<source>
<running/>
</source>
<target>
<startup/>
</target>
</copy-config>
</rpc>
• RPC reply:
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
• An XPath can only be an absolute path, and steps are separated using slashes (/), for example,
/acl:acl/acl:groups/acl:group.
• Only predicates in the [node name='value'] format (for example, [genre='Computer']) are supported.
There can be multiple predicates, which are in an AND relationship.
If an XPath expression is used as a filter criterion, the value of the type attribute in the <filter> element is
xpath, and the value of the select attribute (which must exist) is the XPath expression.
<filter type="xpath" xmlns:acl="urn:huawei:yang:huawei-acl" select="/acl:acl/acl:groups/acl:group[acl:identity='2000']"/>
XPath expressions cannot be used as filter criteria for such operations as notifications, full synchronization, incremental
synchronization, or copy-config.
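A <filter> element like the one above can be assembled programmatically. The following Python sketch (standard library only; the helper name build_xpath_filter is illustrative) builds the filter and declares the XPath prefix as a namespace on the filter element itself, which is required for the server to resolve the prefixed steps.

```python
import xml.etree.ElementTree as ET

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"

def build_xpath_filter(select, prefix, ns_uri):
    """Build <filter type="xpath" select="..."/> with the XPath prefix declared."""
    flt = ET.Element(f"{{{NC}}}filter",
                     attrib={"type": "xpath", "select": select})
    # ElementTree lets us emit the namespace declaration as a literal attribute.
    flt.set(f"xmlns:{prefix}", ns_uri)
    ET.register_namespace("", NC)
    return ET.tostring(flt, encoding="unicode")

flt_xml = build_xpath_filter(
    "/acl:acl/acl:groups/acl:group[acl:identity='2000']",
    "acl", "urn:huawei:yang:huawei-acl")
```

The returned fragment is placed inside a <get-config> or <get> request, as in the examples that follow.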
• Use the specified XPath as a filter criterion to query information about all nodes in the XPath.
For example, query information about all nodes in the /acl:acl/acl:groups/acl:group XPath of the
<running/> configuration database.
■ RPC request
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="19">
<get-config>
<source>
<running/>
</source>
<filter xmlns:acl="urn:huawei:yang:huawei-acl" type="xpath"
select="/acl:acl/acl:groups/acl:group"/>
</get-config>
</rpc>
■ RPC reply
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="6">
<data>
<acl xmlns="urn:huawei:yang:huawei-acl">
<groups>
<group>
<identity>2000</identity>
<type>basic</type>
<match-order>config</match-order>
<step>5</step>
</group>
</groups>
</acl>
</data>
</rpc-reply>
• Use the value of a node in the specified XPath as a filter criterion to query information about the node
that matches this value in the XPath.
For example, query information about the node for which "identity" is set to 2000 in the
/acl:acl/acl:groups/acl:group XPath of the <running/> configuration database.
■ RPC request
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
<get-config>
<source>
<running/>
</source>
<filter type="xpath" xmlns:acl="urn:huawei:yang:huawei-acl"
select="/acl:acl/acl:groups/acl:group[acl:identity='2000']"/>
</get-config>
</rpc>
■ RPC reply
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
<data>
<acl xmlns="urn:huawei:yang:huawei-acl">
<groups>
<group>
<identity>2000</identity>
<type>basic</type>
<match-order>config</match-order>
<step>5</step>
</group>
</groups>
</acl>
</data>
</rpc-reply>
• Use two or more XPath expressions in an OR relationship as filter criteria to query information about
the nodes matched by any of the expressions.
For example, query information about the /nacm/rule-list/group and /nacm/rule-list/rule XPaths of the
<candidate/> configuration database.
■ RPC request
<rpc message-id="1" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get-config>
<source>
<candidate/>
</source>
<filter type="xpath" select="/t:nacm/t:rule-list/t:group | /t:nacm/t:rule-list/t:rule"
xmlns:t="urn:ietf:params:xml:ns:yang:ietf-netconf-acm"/>
</get-config>
</rpc>
■ RPC reply
<rpc-reply message-id="1" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<nacm xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-acm">
<rule-list>
<name>list1</name>
<group>group1</group>
<rule>
<name>rule11</name>
<module-name>*</module-name>
<access-operations>create read update delete</access-operations>
<action>permit</action>
<rpc-name>commit</rpc-name>
</rule>
<rule>
<name>rule12</name>
<module-name>*</module-name>
<access-operations>read</access-operations>
<action>deny</action>
<rpc-name>edit-config</rpc-name>
</rule>
</rule-list>
</nacm>
</data>
</rpc-reply>
• Append /* to an XPath as a filter criterion to query information about all child nodes under the
specified path.
For example, you can query information about all child nodes under the /nacm XPath of the <candidate/>
configuration database.
■ RPC request
<rpc message-id="1" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get-config>
<source>
<candidate/>
</source>
<filter type="xpath" select="/t:nacm/*" xmlns:t="urn:ietf:params:xml:ns:yang:ietf-netconf-acm"/>
</get-config>
</rpc>
■ RPC reply
<rpc-reply message-id="1" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<nacm xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-acm">
<enable-nacm>false</enable-nacm>
<read-default>deny</read-default>
<write-default>deny</write-default>
<exec-default>deny</exec-default>
<groups>
<group>
<name>group1</name>
<user-name>puneeth1</user-name>
<user-name>puneeth2</user-name>
<user-name>puneeth3</user-name>
</group>
<group>
<name>group2</name>
<user-name>puneeth1</user-name>
<user-name>puneeth2</user-name>
<user-name>puneeth3</user-name>
</group>
</groups>
<rule-list>
<name>list1</name>
<group>group1</group>
<rule>
<name>rule11</name>
<module-name>*</module-name>
<access-operations>create read update delete</access-operations>
<action>permit</action>
<rpc-name>commit</rpc-name>
</rule>
<rule>
<name>rule12</name>
<module-name>*</module-name>
<access-operations>read</access-operations>
<action>deny</action>
<rpc-name>edit-config</rpc-name>
</rule>
</rule-list>
</nacm>
</data>
</rpc-reply>
• Use the pagination query function to query information about specified nodes. For <get-config> and
<get> operations, this function supports the optional expression [position() >= a and position() <= b] in
the XPath to query the data of list/leaf-list nodes in the specified range [a, b].
■ You can specify the left and right boundaries, which are interchangeable, for the pagination query.
For example, either of the following expressions is used to query the data of nodes 1 to 100 on the
list node interface.
select="/ifm:ifm/ifm:interfaces/ifm:interface[position() >= 1 and position() <= 100] "
select="/ifm:ifm/ifm:interfaces/ifm:interface[position() <= 100 and position() >= 1] "
■ You can specify the same left and right boundary values for the pagination query. If they are the
same, the query range is a fixed value rather than a value range.
For example, the following expression is used to query the data of the first node on the list node
interface.
select="/ifm:ifm/ifm:interfaces/ifm:interface[position() >= 1 and position() <= 1] "
■ You can specify only one boundary (left or right) for the pagination query. If only the left boundary
is specified, the data of the specified node and all the subsequent nodes is queried. Conversely, if
only the right boundary is specified, the data of node 1 to the specified node is queried.
For example, the first of the following two expressions is used to query the data of node 100 and
all the subsequent nodes on the list node interface, and the second expression is used to query the
data of nodes 1 to 200 on the list node interface.
select="/ifm:ifm/ifm:interfaces/ifm:interface[position() >= 100] "
select="/ifm:ifm/ifm:interfaces/ifm:interface[position() <= 200] "
■ The pagination query function verifies whether the specified query range meets the following
conditions:
■ The left boundary value is less than or equal to the right boundary value.
■ The left and right boundary values are integers ranging from 1 to 1000000000. If the left
boundary of the specified query range exceeds the actual number of data records to be
queried, no query result is displayed.
■ The pagination query function supports query based on multiple filter criteria, meaning that each
query can contain more than one filter criterion.
For example, the following expression contains two filter criteria, indicating that the GE port data
of nodes 1 to 100 is queried.
select="/ifm:ifm/ifm:interfaces/ifm:interface[ifm:type='gigabitethernet'][position() >= 1 and
position()<= 100]"
■ Only the list and leaf-list nodes support the pagination query function, and no other node can
follow the expression position().
For example:
select="/ifm:ifm/ifm:interfaces/ifm:interface[position() >= 1 and position() <= 100] "
In the preceding expression, the node interface is a list node, and the expression [position() >= 1
and position() <= 100] is not followed by another node.
■ Multiple position() parameters cannot be combined by the OR symbol (|) to deliver the pagination
query operation. To query different services, you must therefore deliver the pagination query
operations separately.
■ If a user sends two pagination query requests at a maximum interval of 3 minutes, the entered
XPaths are the same (same prefix, namespace, and node), and the input numbers for the
pagination query are consecutive, the device considers the later query request as part of the first
one and preferentially obtains the data to be queried from the cache. If the input numbers for
pagination query are not consecutive or the interval between two requests exceeds 3 minutes, the
later query operation is processed as a new request. The queried content is obtained from the
device configuration database.
For example:
■ A user delivers two pagination query requests within 3 minutes. The first queries the content
of nodes 1 to 100 of a specified list/leaf-list node, and the second queries the content of nodes
101 to 200 of the same XPath. If the device configuration changes at any time between the
two query operations, the data queried by the user is the data before the change, that is, the
data in the cache.
■ If a user delivers two pagination query requests (with the first querying the content of nodes 1
to 100 and the second querying the content of nodes 301 to 400) and the device configuration
changes at any time between the two query operations, the pre-change data is obtained for
the first request and the post-change data is obtained for the second. This is true regardless of
whether the interval between the two requests exceeds 3 minutes.
■ The same XPath indicates that the prefixes, namespaces, and nodes are identical. If one of
them is different, the queries are considered to be different. For example, the XPath prefixes of
the following two expressions are considered different, meaning that the device processes
them as two independent requests.
select="/t:ifm/t:interfaces/t:interface[position() >= 1 and position()<= 100]"
select="/l:ifm/l:interfaces/l:interface[position() >= 101 and position()<= 200]"
During packet delivery, the greater-than sign (>) and less-than sign (<) in the position() expression must be
represented by the escape sequences &gt; and &lt;.
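The continuation rules above (same XPath, consecutive numbering, 3-minute window) can be summarized in a small predicate. This is an illustrative sketch, not device code, and the parameter shapes are assumptions:

```python
def uses_cache(prev, new, elapsed_seconds):
    """Return True if the later pagination query is served from the cache.

    prev and new are (xpath, start, end) tuples describing the two
    pagination requests; the xpath comparison stands in for "same
    prefix, namespace, and node".
    """
    same_xpath = prev[0] == new[0]
    consecutive = new[1] == prev[2] + 1          # e.g. 1..100 then 101..200
    within_window = elapsed_seconds <= 3 * 60    # at most 3 minutes apart
    return same_xpath and consecutive and within_window

# 1..100 followed by 101..200 on the same XPath within 3 minutes: cache hit
print(uses_cache(("/t:ifm/t:interfaces/t:interface", 1, 100),
                 ("/t:ifm/t:interfaces/t:interface", 101, 200), 60))
```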
For example, query information about the first and second NACM authentication user groups in the
<running/> configuration database.
■ RPC request
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="827">
<get-config>
<source>
<running/>
</source>
<filter xmlns:t="urn:ietf:params:xml:ns:yang:ietf-netconf-acm" type="xpath"
select="/t:nacm/t:groups/t:group[position()&gt;=1 and position()&lt;=2]"/>
</get-config>
</rpc>
■ RPC reply
<?xml version="1.0" encoding="utf-8"?>
<data xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<nacm xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-acm">
<groups>
<group>
<name>1</name>
<user-name>test1</user-name>
<user-name>test2</user-name>
<user-name>test3</user-name>
<user-name>test4</user-name>
</group>
<group>
<name>2</name>
<user-name>test1</user-name>
<user-name>test2</user-name>
<user-name>test3</user-name>
</group>
</groups>
</nacm>
</data>
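On the client side, a helper that builds such a pagination filter and performs the required escaping might look as follows (a hypothetical helper, not part of the device software):

```python
from xml.sax.saxutils import escape

def pagination_filter(xpath: str, start: int, end: int) -> str:
    """Build the <filter> element for a NETCONF pagination query.

    escape() converts the > and < of position() into &gt; and &lt;,
    as required during packet delivery.
    """
    predicate = f"{xpath}[position() >= {start} and position() <= {end}]"
    return ('<filter xmlns:t="urn:ietf:params:xml:ns:yang:ietf-netconf-acm" '
            f'type="xpath" select="{escape(predicate)}"/>')

print(pagination_filter("/t:nacm/t:groups/t:group", 1, 2))
```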
After correcting the configuration delivery sequence, the device commits the configurations to the <running/>
configuration database.
Before performing the <validate> operation, you are advised to lock the <running/> configuration database to
prevent operations by other users on the <running/> configuration database from adversely affecting the
validation.
• RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<validate>
<source>
<candidate/>
</source>
</validate>
</rpc>
• RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
If the NMDA data set is supported, the data set format in the source configuration database is different, as
shown in the following:
• RPC request
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<validate xmlns:ds="urn:ietf:params:xml:ns:yang:ietf-datastores">
<source>
<datastore xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-nmda">ds:running</datastore>
</source>
</validate>
</rpc>
• RPC reply
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Validate checks are classified into syntactic checks and semantic checks.
• Syntactic check: checks RPC packet validity, model matching, data types, value ranges, authorization,
whether data to be created already exists or data to be deleted does not exist, and whether the parent node exists.
The <source> parameter of the Validate operation supports only <candidate/> and <running/>.
If the validate capability is supported, the <edit-config> operation can carry the test-option parameter. The
value of the <test-option> parameter can be test-then-set, set, or test-only. If this parameter is not carried in
the <edit-config> operation, the system uses the test-then-set process by default.
• <test-then-set>: The system checks the delivered configurations for syntactic and semantic errors. If the
check succeeds, the system modifies the configuration. If the check fails, the system displays a failure
message and the failure cause and does not modify the configuration.
• <set>: The system checks configurations for syntactic errors. After the check succeeds, the system
commits the configurations to the <candidate/> configuration database. Semantic errors are not
checked. However, when performing the <commit> or <confirmed-commit> operation, the system
checks configurations for semantic errors and commits the configurations to the <running/>
configuration database after the check succeeds.
• <test-only>: The system checks configurations only for syntactic and semantic errors and reports the
check result without committing the configurations to any configuration database.
Set the description of an interface in the IFM feature to text in the <running/> configuration database and
perform a syntactic and semantic check.
• RPC request
<?xml version="1.0" encoding="utf-8"?>
<rpc message-id="2" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<edit-config>
<target>
<running/>
</target>
<test-option>test-then-set</test-option>
<config>
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" nc:operation="merge">
<name>GigabitEthernet1/0/0</name>
<description>text</description>
</interface>
</interfaces>
</ifm>
</config>
</edit-config>
</rpc>
• RPC reply
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
message-id="2"
nc-ext:flow-id="27">
<ok/>
</rpc-reply>
4.8.5.8 URL
This capability indicates that a device can modify or copy files in a specified path. Currently, the <edit-
config> and <copy-config> operations are supported. Password information in URLs is protected. When
configuration data is exported, password information is exported in ciphertext.
• The <edit-config> operation commits the configuration file in a specified path to the <candidate/> or
<running/> configuration database.
• The <copy-config> operation copies configuration data in the <candidate/> or <running/> configuration
database to a file in a specified path.
Currently, the SFTP, FTP, file, HTTP, and HTTPS protocols are supported.
• The SFTP or FTP protocol is used to query files on an SFTP or FTP server. The path format is
ftp://username:password@IP address of the SFTP or FTP server/file directory/file name (use the sftp:// prefix for SFTP).
• The file protocol is used to query local files. The path format is file:///file directory/file name.
• The HTTP or HTTPS protocol is used to query files on an HTTP or HTTPS server. The path format is
http (or https)://IP address (or domain name) of the HTTP or HTTPS server:port number/file directory/file name.
The file name is a case-sensitive string that starts with an underscore (_) or a letter and can contain only
underscores, digits, and letters. A dot (.) can be used only in the file name extension, and only one dot is allowed.
The file name, including the path, cannot exceed 256 characters.
For the <copy-config> operation, if the file specified in the <url> element does not exist, the file is directly created. If the
file exists, it is overwritten.
For the <edit-config> operation, the file specified in the <url> element must exist.
The HTTP or HTTPS protocol supports only the <edit-config> operation in a YANG model.
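The file-name rules in the note above can be expressed as a short check (an illustrative validation, not the device's actual implementation):

```python
import re

# Starts with a letter or underscore; then letters, digits, underscores;
# at most one dot, which introduces the extension.
_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z0-9_]+)?$")

def is_valid_url_filename(path: str) -> bool:
    if len(path) > 256:                 # the 256-character limit includes the path
        return False
    name = path.rsplit("/", 1)[-1]      # keep only the file name part
    return bool(_NAME_RE.match(name))

print(is_valid_url_filename("file:///abc.xml"))   # True
print(is_valid_url_filename("1abc.xml"))          # False: starts with a digit
print(is_valid_url_filename("a.b.xml"))           # False: more than one dot
```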
Copy data in the <running/> configuration database to the local abc.xml file:
• RPC request:
<?xml version="1.0" encoding="UTF-8"?>
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<copy-config>
<target>
<url>file:///abc.xml</url>
</target>
<source>
<running/>
</source>
</copy-config>
</rpc>
• RPC reply:
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101">
<ok/>
</rpc-reply>
Commit data in the config.xml file on the FTP server to the <candidate/> configuration database:
• RPC request:
<?xml version="1.0" encoding="UTF-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="5">
<edit-config>
<target>
<candidate/>
</target>
<url>ftp://root:root@10.1.1.2/config.xml</url>
</edit-config>
</rpc>
• RPC reply:
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="5">
<ok/>
</rpc-reply>
Commit data in the config.xml file on the HTTP server to the <candidate/> configuration database:
• RPC request:
<?xml version="1.0" encoding="UTF-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="2">
<edit-config>
<target>
<candidate/>
</target>
<url>http://192.168.1.1:8080/config.xml</url>
</edit-config>
</rpc>
• RPC reply:
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="2">
<ok/>
</rpc-reply>
4.8.5.9 Notification
Notification 1.0
The device uses NETCONF to report alarms or events to the NMS through notifications, facilitating device
management by the NMS. You can perform the <create-subscription> operation to subscribe to device
alarms or events. If the <rpc-reply> element returned by the device contains an <ok> element, the <create-
subscription> operation is successful, and the device will proactively report its alarms or events through
NETCONF to the NMS.
1. Alarms or events can be subscribed to in either of the following modes: long-term subscription and
subscription within a specified period.
• Long-term subscription: After the subscription is successful, if the <startTime> element is specified in the
subscription packet, the device sends the historical alarms or events to the NMS and then sends a
<replayComplete> packet to notify the NMS that replay is complete. If a new alarm or event is
generated on the device, the device also sends it to the NMS. If the <startTime>
element is not specified in the subscription packet, the device sends all subsequently generated alarms or
events to the NMS. After a NETCONF session is terminated, the subscription is automatically canceled.
• Subscription within a specified period: After the subscription is successful, the device sends the alarms or
events that are generated from the start time to the end time and meet the filtering conditions to the
NMS. Because the <startTime> element is specified in the subscription packet, the device sends
historical alarms or events to the NMS and then sends a <replayComplete> packet to notify the NMS
that the replay is complete. When the specified <stopTime> arrives, the NETCONF module sends a
<notificationComplete> packet to notify the NMS that the subscription is terminated.
Historical alarms or events refer to alarms or events generated from the <startTime> specified in the
subscription packet to when the user performs the subscription operation.
The format of the subscription packet sent by the NMS to the device is as follows. If <stopTime> is not
specified, the subscription is a long-term one. If both <startTime> and <stopTime> are specified, the
subscription is within a specified period.
Request example:
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<create-subscription xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
<stream>NETCONF</stream>
<filter type="subtree">
<hwCPUUtilizationRisingAlarm xmlns="urn:huawei:yang:huawei-sem" />
</filter>
<startTime>2016-10-20T14:50:00Z</startTime>
<stopTime>2016-10-23T06:22:04Z</stopTime>
</create-subscription>
</rpc>
Response example:
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Table 1 Elements

Element   | Description | Value                            | Mandatory | Remarks
startTime | Start time  | The value is in the time format. | N         | The start time must be earlier than the time when the subscription operation is performed.
stopTime  | End time    | The value is in the time format. | N         | The end time must be later than the start time.
2. After the subscription is successful, the device encapsulates the alarm and event information into
notification messages and sends them to the NMS. The Notification message format is as follows:
<notification xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
<eventTime>2016-11-26T13:51:00Z</eventTime>
<hwCPUUtilizationResume xmlns="urn:huawei:yang:huawei-sem">
<TrapSeverity>0</TrapSeverity>
<ProbableCause>0</ProbableCause>
<EventType>0</EventType>
<PhysicalIndex>0</PhysicalIndex>
<PhysicalName>SimulateStringData</PhysicalName>
<RelativeResource>SimulateStringData</RelativeResource>
<UsageType>0</UsageType>
<SubIndex>0</SubIndex>
<CpuUsage>0</CpuUsage>
<Unit>0</Unit>
<CpuUsageThreshold>0</CpuUsageThreshold>
</hwCPUUtilizationResume>
</notification>
3. After alarms or events are reported to the NMS, the NETCONF module sends a subscription completion
packet to the NMS.
• After historical alarms or events are reported to the NMS, the NETCONF module sends a
replayComplete packet to the NMS. The format of the replayComplete packet is as follows:
<notification xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
<eventTime>2016-11-29T11:57:15Z</eventTime>
<replayComplete xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0" />
</notification>
• When the <stopTime> specified in the subscription packet arrives, the NETCONF module sends a
notification message to notify the NMS that the subscription is terminated. The format of the
notificationComplete packet is as follows:
<notification xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
<eventTime>2016-11-29T11:57:25Z</eventTime>
<notificationComplete xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0" />
</notification>
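An NMS-side consumer can extract the event time and event name from such messages with standard XML parsing. The sketch below uses a trimmed copy of the hwCPUUtilizationResume notification shown above:

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:netconf:notification:1.0"

notification = """<notification xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
  <eventTime>2016-11-26T13:51:00Z</eventTime>
  <hwCPUUtilizationResume xmlns="urn:huawei:yang:huawei-sem">
    <CpuUsage>0</CpuUsage>
  </hwCPUUtilizationResume>
</notification>"""

root = ET.fromstring(notification)
event_time = root.find(f"{{{NS}}}eventTime").text
# The sibling of eventTime is the alarm/event payload
payload = next(child for child in root if not child.tag.endswith("}eventTime"))
event_name = payload.tag.split("}", 1)[1]      # strip the namespace
print(event_time, event_name)
```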
Table 2 Elements
4.8.5.10 YANG-library
The YANG-library capability indicates that a device can report the YANG capabilities that it supports. Basic
information about the YANG modules that a server supports can be viewed on a NETCONF client. The
information includes the module name, YANG model version, namespace, and list of submodules, and is
saved in the local buffer.
Field description:
• module-set-id: module set ID. It indicates a set of YANG modules that the server supports. If a YANG
module changes, the ID changes.
XML example: Query the module-set-id value of the YANG module whose name is ietf-yang-library and
conformance-type is implement and query basic YANG module information of the YANG module huawei-
aaa.
RPC request:
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="23">
<get>
<filter type="subtree">
<modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library">
<module-set-id></module-set-id>
<module>
<name>ietf-yang-library</name>
<conformance-type>implement</conformance-type>
</module>
<module>
<name>huawei-aaa</name>
</module>
</modules-state>
</filter>
</get>
</rpc>
Information contained in the reply includes the module-set-id value, YANG module version used,
namespace, list of submodules, and revision date. If the reply does not contain the YANG module version
information, YANG 1.0 is used by default.
RPC reply:
<?xml version="1.0" encoding="utf-8"?>
<data xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library">
<module-set-id>2148066159</module-set-id>
<module>
<name>ietf-yang-library</name>
<revision>2016-06-21</revision>
<namespace>urn:ietf:params:xml:ns:yang:ietf-yang-library</namespace>
<conformance-type>implement</conformance-type>
</module>
<module>
<name>huawei-aaa</name>
<revision>2017-03-23</revision>
<namespace>urn:huawei:yang:huawei-aaa</namespace>
<conformance-type>implement</conformance-type>
<deviation>
<name>huawei-aaa-deviations-cx</name>
<revision>2017-03-23</revision>
</deviation>
<submodule>
<name>huawei-aaa-action</name>
<revision>2017-03-23</revision>
</submodule>
<submodule>
<name>huawei-aaa-lam</name>
<revision>2017-03-23</revision>
</submodule>
<submodule>
<name>huawei-aaa-lam-action</name>
<revision>2017-03-23</revision>
</submodule>
<submodule>
<name>huawei-aaa-lam-type</name>
<revision>2017-03-23</revision>
</submodule>
<submodule>
<name>huawei-aaa-type</name>
<revision>2017-03-23</revision>
</submodule>
</module>
</modules-state>
</data>
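Because the module-set-id changes whenever any supported YANG module changes, a client can cache the parsed module list and refresh it only when the ID differs. A sketch over a trimmed copy of the reply above:

```python
import xml.etree.ElementTree as ET

YL = "urn:ietf:params:xml:ns:yang:ietf-yang-library"

reply = """<data xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library">
    <module-set-id>2148066159</module-set-id>
    <module><name>huawei-aaa</name><revision>2017-03-23</revision></module>
  </modules-state>
</data>"""

root = ET.fromstring(reply)
state = root.find(f"{{{YL}}}modules-state")
module_set_id = state.find(f"{{{YL}}}module-set-id").text
modules = {m.findtext(f"{{{YL}}}name"): m.findtext(f"{{{YL}}}revision")
           for m in state.findall(f"{{{YL}}}module")}
print(module_set_id, modules)
```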
4.8.6.1 Sync
This capability indicates that the device allows the NMS to perform full or incremental data synchronization.
Through data synchronization, the NMS or controller that manages network devices maintains the same
configuration data as the NEs in real time.
Example of a full data synchronization operation: The NETCONF server uses FTP to transfer AAA module
configurations in the data to be synchronized to the home directory of user root (password is root) on the
server whose IP address is 10.1.1.1. The storage file name is Multi_App_sync_full.zip.
• RPC request:
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="4">
<sync-full xmlns="urn:huawei:yang:huawei-netconf-sync">
<target>
<user-name>root</user-name>
<password>root</password>
<target-addr>10.1.1.1</target-addr>
<path>/home</path>
</target>
<transfer-protocol>ftp</transfer-protocol>
<transfer-method>auto</transfer-method>
<filename-prefix>Multi_App_sync_full</filename-prefix>
<app-err-operation>stop-on-error</app-err-operation>
<filter>
<aaa xmlns="urn:huawei:yang:huawei-aaa"/>
</filter>
</sync-full>
</rpc>
• RPC reply:
The RPC reply message carries a full data synchronization identifier assigned by the NETCONF server,
which is returned using the <sync-full-id> parameter.
After full synchronization is triggered, the RPC reply message carries the nc-ext attribute.
Example of a <cancel-synchronization> operation that cancels the full data synchronization operation whose
<sync-full-id> is 185:
• RPC request:
<rpc message-id="cancel" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<cancel-synchronization xmlns="urn:huawei:yang:huawei-netconf-sync">
<sync-full-id>185</sync-full-id>
</cancel-synchronization>
</rpc>
• RPC reply:
Success reply
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply message-id="cancel" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Example of an <upload-sync-file> operation that uploads the result file of the full data synchronization whose
<sync-full-id> is 185:
• RPC request:
<rpc message-id="upload" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<upload-sync-file xmlns="urn:huawei:yang:huawei-netconf-sync">
<sync-full-id>185</sync-full-id>
<result-save-time>1</result-save-time>
</upload-sync-file>
</rpc>
• RPC reply:
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply message-id="upload" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
Example of querying the transfer status and progress of the full data synchronization whose <sync-full-id> is
185:
• RPC request:
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="query_status185">
<get>
<filter type="subtree">
<synchronization xmlns="urn:huawei:yang:huawei-netconf-sync">
<file-transfer-statuss>
<file-transfer-status>
<sync-full-id>185</sync-full-id>
<status></status>
<progress></progress>
<error-message></error-message>
</file-transfer-status>
</file-transfer-statuss>
</synchronization>
</filter>
</get>
</rpc>
• RPC reply:
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply message-id="query_status185" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<synchronization xmlns="urn:huawei:yang:huawei-netconf-sync">
<file-transfer-statuss>
<file-transfer-status>
<sync-full-id>185</sync-full-id>
<status>In-Progress</status>
<progress>50</progress>
</file-transfer-status>
</file-transfer-statuss>
</synchronization>
</data>
</rpc-reply>
Example of an incremental data synchronization operation:
• RPC request:
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<sync-increment xmlns="urn:huawei:yang:huawei-netconf-sync">
<target>
<flow-id>7</flow-id>
</target>
<source>
<flow-id>6</flow-id>
</source>
<filter type="subtree">
<ifm xmlns="urn:huawei:yang:huawei-ifm"/>
</filter>
</sync-increment>
</rpc>
• RPC reply:
<rpc-reply xmlns:nc-md="urn:huawei:yang:huawei-netconf-metadata">
<data xmlns="urn:huawei:yang:huawei-netconf-sync">
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface nc-md:difference="create">
<interfaceName>Gigabitethernet0/1/0.1</interfaceName>
<mtu>15000</mtu>
<adminStatus>down</adminStatus>
</interface>
<interface nc-md:difference="delete">
<interfaceName>Gigabitethernet0/1/1.1</interfaceName>
</interface>
<interface nc-md:difference="modify">
<interfaceName>Gigabitethernet0/2/0</interfaceName>
<mtu>15000</mtu>
<adminStatus>up</adminStatus>
</interface>
<interface nc-md:difference="modify">
<interfaceName>Gigabitethernet0/2/1</interfaceName>
<ifAm4s>
<ifAm4 nc-md:difference="create">
<ipAddress>10.164.11.10</ipAddress>
<netMask>255.255.255.0</netMask>
<addressType/>
</ifAm4>
</ifAm4s>
</interface>
</interfaces>
</ifm>
</data>
</rpc-reply>
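The nc-md:difference attributes in the reply mark each node as created, deleted, or modified. An NMS can collect them with standard XML parsing; the sketch below uses a simplified copy of the reply:

```python
import xml.etree.ElementTree as ET

MD_NS = "urn:huawei:yang:huawei-netconf-metadata"

# Simplified version of the <sync-increment> reply shown above
reply = """<data xmlns:nc-md="urn:huawei:yang:huawei-netconf-metadata">
  <interface nc-md:difference="create"><interfaceName>Gigabitethernet0/1/0.1</interfaceName></interface>
  <interface nc-md:difference="delete"><interfaceName>Gigabitethernet0/1/1.1</interfaceName></interface>
</data>"""

root = ET.fromstring(reply)
changes = [(e.findtext("interfaceName"), e.get(f"{{{MD_NS}}}difference"))
           for e in root.findall("interface")]
print(changes)
```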
Example of a keepalive message on the server after a client subscribes to <netconf-rpc-keepalive> messages:
• RPC request:
<netconf:rpc netconf:message-id="101" xmlns:netconf="urn:ietf:params:xml:ns:netconf:base:1.0">
<create-subscription xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
<filter netconf:type="subtree">
<nc-ext:netconf-rpc-keepalive xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"/>
</filter>
</create-subscription>
</netconf:rpc>
• Notification report:
<netconf:rpc netconf:message-id="101" xmlns:netconf="urn:ietf:params:xml:ns:netconf:base:1.0">
<create-subscription xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
<filter netconf:type="subtree">
<nc-ext:netconf-rpc-keepalive xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"/>
</filter>
</create-subscription>
</netconf:rpc>
4.8.6.3 Commit-description
The commit-description capability enables a user to add a description when a device performs a <commit>
operation. The description facilitates configuration rollback.
A description is carried in the <description> parameter of the <commit> operation. The YANG model defines
the capability in the huawei-ietf-netconf-ext.yang file.
• RPC request:
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<commit>
<description xmlns="urn:huawei:yang:huawei-ietf-netconf-ext">Config interfaces</description>
</commit>
</rpc>
• RPC reply:
<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
4.8.6.4 with-defaults
The <with-defaults> capability indicates that a device has the capability to process default values of the
model. The <get>, <get-config>, and <copy-config> operations can carry the <with-defaults> parameter.
The <with-defaults> parameter values are as follows:
• report-all: Query all nodes and do not perform any operation on the nodes.
■ RPC request:
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="4">
<get xmlns:wsss="urn:ietf:params:xml:ns:yang:ietf-netconf-with-defaults">
<filter type="subtree">
<system xmlns="urn:huawei:yang:huawei-system"/>
</filter>
<with-defaults xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-with-defaults">report-all</with-defaults>
</get>
</rpc>
■ RPC reply:
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="4">
<data>
<system xmlns="urn:huawei:yang:huawei-system">
<systemInfo>
<lsRole>admin</lsRole>
<authenFlag>false</authenFlag>
</systemInfo>
</system>
</data>
</rpc-reply>
• trim: Nodes whose values are equal to the default values are not displayed in the query result.
■ RPC request:
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="3">
<get xmlns:wsss="urn:ietf:params:xml:ns:yang:ietf-netconf-with-defaults">
<filter type="subtree">
<system xmlns="urn:huawei:yang:huawei-system"/>
</filter>
<with-defaults xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-with-defaults">trim</with-defaults>
</get>
</rpc>
■ RPC reply:
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="3">
<data>
<system xmlns="urn:huawei:yang:huawei-system">
<systemInfo>
<lsRole>admin</lsRole>
</systemInfo>
</system>
</data>
</rpc-reply>
• report-all-tagged: Query all nodes and use namespace:default="true" to identify the nodes whose
values are equal to the default values.
■ RPC request:
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="2">
<get xmlns:wsss="urn:ietf:params:xml:ns:yang:ietf-netconf-with-defaults">
<filter type="subtree">
<system xmlns="urn:huawei:yang:huawei-system"/>
</filter>
<with-defaults xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-with-defaults">report-all-tagged</with-defaults>
</get>
</rpc>
■ RPC reply:
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
xmlns:wd="urn:ietf:params:xml:ns:netconf:default:1.0"
message-id="2">
<data>
<system xmlns="urn:huawei:yang:huawei-system">
<systemInfo>
<lsRole>admin</lsRole>
<authenFlag wd:default="true">false</authenFlag>
</systemInfo>
</system>
</data>
</rpc-reply>
If a node is identified using namespace:default="true", the <edit-config> operation can identify the
<default> attribute on the node and determine whether the node value equals the default value.
The <operation> attribute of the <edit-config> operation can only be create, merge, or replace. If the
<operation> value is remove or delete, <rpc-error> is returned.
If the value of the default attribute is true or 1 and the value of the leaf node is the same as the default
value defined in the YANG file, <ok> is returned for the <edit-config> operation. In other cases, <rpc-
error> is returned, including the names and actual values of the leaf nodes whose values are
inconsistent with the default values defined in the YANG file.
■ The <default> attribute value of the leaf node ifDf is true, and the node value is false, which is the
same as the default value defined in the YANG file. After the <edit-config> operation is performed,
<ok> is returned.
RPC request:
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="2">
<edit-config xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
xmlns:wd="urn:ietf:params:xml:ns:netconf:default:1.0">
<target>
<running/>
</target>
<config>
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" nc:operation="merge">
<ifName>GigabitEthernet1/0/0</ifName>
<ifDf wd:default="true">false</ifDf>
</interface>
</interfaces>
</ifm>
</config>
</edit-config>
</rpc>
RPC reply:
<rpc-reply message-id="2" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<ok/>
</rpc-reply>
■ The <default> attribute value of the leaf node ifDf is true, and the node value is true, which is
different from the default value defined in the YANG file. After the <edit-config> operation is
performed, <rpc-error> is returned. <error-para> contains the name and value of the error node.
RPC request:
<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="2">
<edit-config xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
xmlns:wd="urn:ietf:params:xml:ns:netconf:default:1.0">
<target>
<running/>
</target>
<config>
<ifm xmlns="urn:huawei:yang:huawei-ifm">
<interfaces>
<interface xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" nc:operation="merge">
<ifName>GigabitEthernet1/0/0</ifName>
<ifDf wd:default="true">true</ifDf>
</interface>
</interfaces>
</ifm>
</config>
</edit-config>
</rpc>
RPC reply:
<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext"
xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
message-id="2"
nc-ext:flow-id="27">
<rpc-error>
<error-type>application</error-type>
<error-tag>bad-element</error-tag>
<error-severity>error</error-severity>
<error-path xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0"
xmlns:ifm="urn:huawei:yang:huawei-ifm">/nc:rpc/nc:edit-
config/nc:config/ifm:ifm/ifm:interfaces/ifm:interface[ifm:ifName='Ethernet0/1/0']/ifm:ifDf</error-path>
<error-message xml:lang="en">ifDf has invalid value true.</error-message>
<error-info xmlns:nc-ext="urn:huawei:yang:huawei-ietf-netconf-ext">
<bad-element>ifDf</bad-element>
<nc-ext:error-info-code>317</nc-ext:error-info-code>
<nc-ext:error-paras>
<nc-ext:error-para>ifDf</nc-ext:error-para>
<nc-ext:error-para>true</nc-ext:error-para>
</nc-ext:error-paras>
</error-info>
</rpc-error>
</rpc-reply>
Because each vendor provides a unique set of device management methods, configuring and managing these
devices using traditional methods is costly and highly inefficient. To resolve these issues, use NETCONF to
remotely configure, manage, and monitor devices.
You can use the Simple Network Management Protocol (SNMP) as an alternative to remotely configure, manage, and
monitor devices on a simple network.
Before using NETCONF to configure and manage devices shown in Figure 1, perform the following
operations:
1. Configure SSH on managed devices so that these devices can be configured, managed, and monitored
over SSH connections.
2. Enable NETCONF on managed devices so that these devices function as NETCONF agents.
3. Install a network management system (NMS) on a personal computer (PC) or workstation so that the
PC or workstation functions as a NETCONF manager.
• Allows devices to proactively report alarms and events, if any, to the NMS in real time.
• NETCONF supports VS-based independent device management. You can directly log in to a VS to
manage the corresponding device and use the NMS to configure NETCONF services for each VS through
schema packets.
• YANG supports VS-based independent device management. You can directly log in to a VS to manage
the corresponding device and use the NMS to configure YANG services for each VS through YANG
packets.
A device supports CLI-to-XML translation, through which YANG packets are obtained to manage devices.
Definition
The data communication network (DCN) refers to the network on which network elements (NEs) exchange
Operation, Administration and Maintenance (OAM) information with the network management system
(NMS). It is constructed for communication between managing and managed devices.
A DCN can be an external or internal DCN. In Figure 1, an external DCN is between the NMS and an access
point, and an internal DCN allows NEs to exchange OAM information within it. In this document, internal
DCNs are described.
Gateway network elements (GNEs) are connected to the NMS using protocols, for example, the Simple
Network Management Protocol (SNMP). GNEs are able to forward data at the network or application layer.
An NMS directly communicates with a GNE and uses the GNE to deliver management information to non-
GNEs.
Purpose
When constructing a large network, hardware engineers must install devices on site, and software
commissioning engineers must also configure the devices on site. This network construction method requires
significant human and material resources, causing high capital expenditure (CAPEX) and operational
expenditure (OPEX). If a new NE is deployed but the NMS cannot detect the NE, the network administrator
cannot manage or control the NE. Plug-and-play can be used so that the NMS can automatically detect new
NEs and remotely commission the NEs to reduce CAPEX and OPEX.
The DCN technique offers a mechanism to implement plug-and-play. After an NE is installed and started, an
IP address (NEIP address) mapped to the NEID of the NE is automatically generated. Each NE adds its NEID
and NEIP address to a link state advertisement (LSA). Then, Open Shortest Path First (OSPF) advertises all
Type-10 LSAs to construct a core routing table that contains mappings between NEIP addresses and NEIDs
on each NE. After detecting a new NE, the GNE reports the NE to the NMS. The NMS accesses the NE using
the IP address of the GNE and ID of the NE. To commission NEs, the NMS can use the GNE to remotely
manage the NEs on the network.
To improve the system security, it is recommended that the NEIP address be changed to the planned one.
Benefits
The NMS is able to manage NEs using service channels provided by the managed NEs. No additional devices
are required, reducing CAPEX and OPEX.
• NEIP
NEIP addresses help managed terminals access NEs and allow addressing between NEs in IP
networking. An NEIP address consists of a network number and a host number. A network number
uniquely identifies a physical or logical link. All the NEs along the link have the same network number.
A network number is obtained using an AND operation on the 32-bit IP address and subnet mask. A
host number uniquely identifies a device on a link.
An NEIP address is derived from an NEID when an NE is being initialized. An NEIP address is in the
format of 128.subnet-number.basic-ID.
The following example uses the default NEID 0x09BFE0, which is 1001.10111111.11100000 in binary
format. The basic ID is the 16 least significant bits 10111111.11100000, which is 191.224 in decimal
format. The subnet number is the 8 most significant bits 00001001, which is 9 in decimal format.
Therefore, the NEIP address derived from 0x09BFE0 is 128.9.191.224.
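The derivation above can be sketched as a short function. This is an illustrative sketch, not device code, assuming the NEID is a 24-bit value:

```python
def neid_to_neip(neid: int) -> str:
    """Derive the default NEIP address (128.subnet-number.basic-ID)
    from a 24-bit NEID, as described above."""
    subnet = (neid >> 16) & 0xFF   # 8 most significant bits
    basic_hi = (neid >> 8) & 0xFF  # upper byte of the 16-bit basic ID
    basic_lo = neid & 0xFF         # lower byte of the 16-bit basic ID
    return f"128.{subnet}.{basic_hi}.{basic_lo}"

print(neid_to_neip(0x09BFE0))  # 128.9.191.224
```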
Before the NEIP address is manually changed, the NEIP address and NEID are associated; therefore, the
NEIP address changes if the NEID is changed. Once the NEIP address is manually changed, it no longer
changes when the associated NEID is changed.
To improve system security, it is recommended that the NEIP address be changed to the planned one.
Huawei NEs can use serial interfaces or sub-interfaces numbered 4094 for DCN communication. Non-
Huawei NEs cannot use serial interfaces for DCN communication. Therefore, to implement DCN
communication between Huawei NEs and non-Huawei NEs, sub-interfaces numbered 4094 must be
configured.
Using Serial Interfaces for DCN Communication
The devices on a data communication network (DCN) communicate with each other using the Point-to-Point
Protocol (PPP) through single-hop logical channels. Therefore, packets transmitted on the DCN are
encapsulated into PPP frames and forwarded through service ports at the data link layer.
As shown in Figure 1, the NMS uses a GNE to manage non-GNEs in the following process:
1. When a device starts with base configuration, DCN is automatically enabled, and the NEID
configuration is generated based on device planning.
2. After the DCN function is enabled, a PPP channel and an OSPF neighbor relationship are established
between devices.
3. OSPF LSAs are sent between OSPF neighbors to learn host routes carrying NEIP addresses and to
obtain mappings between NEIP addresses and NEIDs.
4. The GNE sends the mappings to the NMS, and the NMS uses the GNE to access non-GNEs.
1. After PPP Network Control Protocol (NCP) negotiation is complete, a point-to-point route is generated
without network segment restrictions.
2. An OSPF neighbor relationship is set up, and an OSPF route is generated for the entire network.
3. NEIDs are advertised using OSPF LSAs, triggering the generation of a core routing table.
As shown in Figure 1, the NMS uses a GNE to manage non-GNEs in the following process:
1. Each neighbor learns host routes to NEIP addresses through OSPF, as well as mapping relationships
between NEIP addresses and NEIDs.
2. The GNE sends the mapping relationships to the NMS, and the NMS uses the GNE to access non-GNEs.
1. An OSPF neighbor relationship is set up, and an OSPF route is generated for the entire network.
2. NEIDs are advertised using OSPF link-state advertisements (LSAs), triggering the generation of a core
routing table.
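A core routing table is, in effect, a mapping from NEIDs to NEIP addresses. The following minimal sketch (with hypothetical NEID and NEIP values) shows the lookup the NMS relies on before accessing a non-GNE through a GNE:

```python
# Hypothetical mappings learned from OSPF LSAs; a real table is built on each NE.
core_routing_table = {
    0x09BFE0: "128.9.191.224",
    0x09BFE1: "128.9.191.225",
}

def resolve_neip(neid: int) -> str:
    """Return the NEIP address for a destination NEID, as the NMS must do
    before accessing a non-GNE through the GNE."""
    if neid not in core_routing_table:
        raise KeyError(f"NEID 0x{neid:06X} not in the core routing table")
    return core_routing_table[neid]

print(resolve_neip(0x09BFE1))  # 128.9.191.225
```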
DCN Application
During network deployment, every network element (NE) must be configured with software and
commissioned after hardware installation to ensure that all NEs can communicate with each other. As a
large number of NEs are deployed, on-site deployment for each NE requires significant manpower and is
time-consuming. In order to reduce the on-site deployment times and the cost of operation and
maintenance, the DCN can be deployed.
In Figure 1, to improve reliability, active and standby GNEs can be deployed. If the active GNE fails, the NMS
can gracefully switch this function to the standby GNE.
1. A DCN VLAN group is configured on the GNE, and the VLAN ID of the Dot1q termination subinterface
is the same as the DCN VLAN ID of the main interface.
2. The GNE sends DCN negotiation packets to VLANs in the DCN VLAN group.
3. The DCN negotiation packets are sent to different leaf nodes through VLLs.
4. NEs learn the DCN VLAN ID sent by the GNE and establish DCN connections with the GNE.
Terms
Term Description
GNE Gateway network elements (GNEs) are able to forward data at the
network or application layer. The NMS can use GNEs to manage
remote NEs connected through optical fibers.
Core routing table A core routing table consists of mappings between NEIDs and NEIP
addresses of NEs on a data communication network (DCN). Before
accessing a non-GNE through a GNE, the NMS must search the
core routing table for the NEIP address of the non-GNE based on
the destination NEID.
Definition
Link Automatic Discovery (LAD) is a Huawei proprietary protocol that discovers neighbors at the link layer.
LAD allows a device to issue link discovery requests as triggered by the NMS or command lines. After the
device receives link discovery replies, the device generates neighbor information and saves it in the local MIB.
The NMS can then query neighbor information in the MIB and generate the topology of the entire network.
Purpose
Large-scale networks demand increased NMS capabilities, such as obtaining the topology status of
connected devices automatically and detecting configuration conflicts between devices. Currently, most
NMSs use an automated discovery function to trace changes in the network topology but can only analyze
the network-layer topology. Network-layer topology information notifies you of basic events like the
addition or deletion of devices, but gives you no information about the interfaces used by one device to
connect to other devices or the location or network operation mode of a device.
LAD is developed to resolve these problems. LAD can identify the interfaces on a network device and provide
detailed information about connections between devices. LAD can also display paths between clients,
switches, routers, application servers, and network servers. The detailed information provided by LAD can
help efficiently locate network faults.
Benefits
LAD helps network administrators promptly obtain detailed network topology and changes in the topology
and monitor the network status in real time, improving security and stability for network communication.
• When Ethernet interfaces are used on links, LAD packets are encapsulated into Ethernet frames. Figure
1 shows the LAD packet format on Ethernet interfaces.
Information 20-44 bytes LAD data unit, main part of an LAD packet
• When Ethernet sub-interfaces are used on links, LAD packets are encapsulated into Ethernet frames.
Figure 2 shows the LAD packet format on Ethernet sub-interfaces.
Tag 4 bytes 2-byte Ethernet Type field and 2-byte VLAN field included
Information 20-44 bytes LAD data unit, main part of an LAD packet
• When low-speed interfaces are used on links, LAD packets are encapsulated into PPP frames. Figure 3
shows the LAD packet format on low-speed interfaces.
Control 1 byte PPP frame type, fixed at 0x03, indicating an unsequenced frame
Protocol 2 bytes Packet type (LAD) carried by PPP frames, fixed at 0xce05
Information 20-44 bytes LAD data unit, main part of an LAD packet
The Information field is the same in all the three LAD packet formats, meaning that the LAD data units are
irrelevant to the link type. Figure 4 shows the format of the LAD data unit.
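The fixed PPP header values given above (Control fixed at 0x03, Protocol fixed at 0xCE05) can be packed as follows. The leading 0xFF Address byte is a standard PPP assumption, not stated in this document:

```python
import struct

def ppp_lad_header() -> bytes:
    """Pack the PPP framing bytes that precede the LAD data unit.
    Control (0x03) and Protocol (0xCE05) are fixed values from the text;
    the 0xFF Address byte is a standard PPP assumption."""
    address, control, protocol = 0xFF, 0x03, 0xCE05
    return struct.pack("!BBH", address, control, protocol)

print(ppp_lad_header().hex())  # ff03ce05
```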
• Link Detect packets: link discovery requests triggered by the NMS or command lines. Link Detect
packets carry Send Link Info SubTLV in the data unit. Figure 5 shows the format of the Link Detect
packet data unit.
• Link Reply packets: link discovery replies in response to the Link Detect packets sent by remote devices.
Link Reply packets carry the Send Link Info SubTLV (the same as that in the received Link Detect
packets) and Recv Link Info SubTLV. Figure 6 shows the format of the Link Reply packet data unit.
4.10.2.2 Implementation
Background
To monitor the network status in real-time and to obtain detailed network topology and changes in the
topology, network administrators usually deploy the Link Layer Discovery Protocol (LLDP) on live networks.
LLDP, however, has limited applications due to the following characteristics:
• LLDP uniquely identifies a device by its IP address. IP addresses are expressed in dotted decimal notation
and therefore are not easy to maintain or manage, when compared with NE IDs that are expressed in
decimal integers.
• LLDP is not supported on Ethernet sub-interfaces, Eth-Trunk interfaces, or low-speed interfaces, and
therefore cannot discover neighbors for these types of interfaces.
• LLDP-enabled devices periodically broadcast LLDP packets, consuming many system resources and even
affecting the transmission of user services.
Link Automatic Discovery (LAD) addresses the preceding problems and is more flexible:
• LAD uniquely identifies a device by an NE ID in decimal integers, which are easier to maintain and
manage.
• LAD can discover neighbors for various types of interfaces and is therefore more widely applicable
than LLDP.
• LAD is triggered by an NMS or command lines and therefore can be implemented as you need.
Implementation
The following example uses the networking in Figure 1 to illustrate how LAD is implemented.
1. DeviceA determines the interface type, encapsulates local information into a Link Detect packet, and
sends the packet to DeviceB.
2. After DeviceB receives the Link Detect packet, DeviceB parses the packet, encapsulates local
information and DeviceA's information carried in the packet into a Link Reply packet, and sends the
packet to DeviceA.
3. After DeviceA receives the Link Reply packet, DeviceA parses the packet and saves local information
and DeviceB's information carried in the packet to the local MIB. The local and neighbor information is
recorded as one entry.
Local and remote devices exchange LAD packets to learn each other's NE ID, slot ID, subcard ID, interface
number, and even each other's VLAN ID if sub-interfaces are used.
4. The NMS exchanges NETCONF packets with DeviceA to obtain DeviceA's local and neighbor
information and then generates the topology of the entire network.
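The Detect/Reply exchange above can be sketched as follows. This is a minimal simulation, assuming simplified record fields (ne_id, slot, subcard, port); real packets carry Send Link Info and Recv Link Info SubTLVs:

```python
# Minimal sketch of the LAD exchange between two devices; field names are
# illustrative, not the on-wire SubTLV encoding.
def link_detect(local):
    return {"send_link_info": local}                          # step 1: DeviceA -> DeviceB

def link_reply(detect_packet, local):
    return {"send_link_info": detect_packet["send_link_info"],  # echoed unchanged
            "recv_link_info": local}                          # step 2: DeviceB -> DeviceA

device_a = {"ne_id": 1001, "slot": 1, "subcard": 0, "port": 1}
device_b = {"ne_id": 1002, "slot": 2, "subcard": 0, "port": 3}

reply = link_reply(link_detect(device_a), device_b)
# step 3: DeviceA records local and neighbor information as one MIB entry
mib_entry = {"local": reply["send_link_info"], "neighbor": reply["recv_link_info"]}
print(mib_entry["neighbor"]["ne_id"])  # 1002
```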
Benefits
After network administrators deploy LAD on devices, they can obtain information about all links connected
to the devices. LAD helps extend the network management scale. Network administrators can obtain
detailed network topology information and topology changes.
Networking Description
In single-neighbor networking, devices are directly connected, and each device interface connects only to one
neighbor. In Figure 1, DeviceA and DeviceB are directly connected, and each interface on DeviceA and Device
B connects only to one neighbor.
Feature Deployment
After enabling Link Automatic Discovery (LAD) on DeviceA, administrators can use the NMS to obtain Layer
2 configurations of DeviceA and DeviceB, get a detailed network topology, and determine whether a
configuration conflict exists. LAD helps improve security and stability for network communication.
Networking Description
In multi-neighbor networking, devices are connected over an unknown network, and each device interface
connects to one or more neighbors. In Figure 1, DeviceA, DeviceB, and DeviceC are connected over a Layer 2
virtual private network (L2VPN). Devices on the L2VPN may have Link Automatic Discovery (LAD) disabled
or may not need to be managed by the NMS, but they can still transparently transmit LAD packets. DeviceA
has two neighbors, DeviceB and DeviceC.
Feature Deployment
After enabling Link Automatic Discovery (LAD) on DeviceA, administrators can use the NMS to obtain Layer
2 configurations of DeviceA, DeviceB, and DeviceC, get a detailed network topology, and determine whether
a configuration conflict exists. LAD helps ensure security and stability for network communication.
Networking Description
On the network shown in Figure 1, an Eth-Trunk that comprises aggregated links exists between DeviceA
and DeviceB. Each aggregated link interface connects directly to only one neighbor, as if it were connected
in single-neighbor networking.
Feature Deployment
After enabling Link Automatic Discovery (LAD) on DeviceA, administrators can use the NMS to obtain Layer
2 configurations of DeviceA and DeviceB, get a detailed network topology, and determine whether a
configuration conflict exists. LAD helps ensure security and stability for network communication.
Terms
Term Definition
LAD A Huawei proprietary protocol that discovers neighbors at the link layer. LAD allows
a device to issue link discovery requests as triggered by the NMS or command lines.
After the device receives link discovery replies, the device generates neighbor
information and saves it in the local MIB. The NMS can then query neighbor
information in the MIB and generate the topology of the entire network.
LLDP A Layer 2 discovery protocol defined in IEEE 802.1ab. LLDP provides a standard link-
layer discovery mode to encapsulate information about the capabilities,
management address, device ID, and interface ID of a local device into LLDP packets
and send the packets to neighbors. The neighbors save the information received in a
standard MIB to help the NMS query and determine the communication status of
links.
Definition
The Link Layer Discovery Protocol (LLDP), a Layer 2 discovery protocol defined in IEEE 802.1ab, provides a
standard link-layer discovery method that encapsulates information about the capabilities, management
address, device ID, and interface ID of a local device into LLDP packets and sends the packets to neighboring
devices. These neighboring devices save the information received in a standard management information
base (MIB) to help the network management system (NMS) query and determine the link communication
status.
Purpose
Diversified network devices are deployed on a network, and configurations of these devices are complicated.
Therefore, NMSs must be able to meet increasing requirements for network management capabilities, such
as the capability to automatically obtain the topology status of connected devices and the capability to
detect configuration conflicts between devices. A majority of NMSs use an automated discovery function to
trace changes in the network topology, but most can only analyze the network layer topology. Network
layer topology information notifies you of basic events, such as the addition or deletion of devices, but gives
you no information about the interfaces to connect a device to other devices. The NMSs can identify neither
the device location nor the network operation mode.
LLDP is developed to resolve these problems. LLDP can identify interfaces on a network device and provide
detailed information about connections between devices. LLDP can also display information about paths
between clients, switches, routers, application servers, and network servers, which helps you efficiently locate
network faults.
Benefits
Deploying LLDP improves NMS capabilities. LLDP supplies the NMS with detailed information about network
topology and topology changes, and it detects inappropriate configurations existing on the network. The
information provided by LLDP helps administrators monitor network status in real time to keep the network
secure and stable.
LLDP Frames
LLDP frames are Ethernet frames encapsulated with LLDP data units (LLDPDUs). LLDP frames support two
encapsulation modes: Ethernet II and Subnetwork Access Protocol (SNAP). Currently, the NE40E supports the
Ethernet II encapsulation mode.
Figure 1 shows the format of an Ethernet II LLDP frame.
Field Description
Source MAC address A MAC address for an interface or a bridge MAC address for a device (Use
the MAC address for an interface if there is one; otherwise, use the bridge
MAC address for a device).
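The source MAC selection rule in the table above can be sketched as a one-line fallback; the MAC address values in the example are illustrative:

```python
def source_mac(interface_mac, bridge_mac):
    """Pick the frame's source MAC address: the interface MAC address if
    one exists, otherwise the device's bridge MAC address."""
    return interface_mac if interface_mac else bridge_mac

print(source_mac(None, "00e0-fc12-3456"))              # falls back to the bridge MAC
print(source_mac("00e0-fc00-0001", "00e0-fc12-3456"))  # interface MAC wins
```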
LLDPDU
An LLDPDU is a data unit encapsulated in the data field in an LLDP frame.
A device encapsulates local device information in type-length-value (TLV) format and combines several TLVs
into an LLDPDU for transmission. You can combine various TLVs to form an LLDPDU as required. TLVs allow
a device to advertise its own status and learn the status of neighboring devices.
Figure 2 shows the LLDPDU format.
Each LLDPDU carries a maximum of 28 types of TLVs. Each LLDPDU starts with the Chassis ID TLV,
Port ID TLV, and Time to Live TLV and ends with the End of LLDPDU TLV. These four TLVs are mandatory.
Additional TLVs are selected as needed.
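The ordering rule can be sketched as a validity check. The type codes 0 (End of LLDPDU) and 1 (Chassis ID) are stated later in this section; Port ID = 2 and Time to Live = 3 are standard LLDP assumptions:

```python
# Mandatory head TLVs in order: Chassis ID (1), Port ID (2), Time to Live (3);
# the LLDPDU must end with End of LLDPDU (0).
MANDATORY_HEAD = [1, 2, 3]
END_OF_LLDPDU = 0

def valid_lldpdu(tlv_types):
    """Check that a list of TLV type codes obeys the mandatory ordering."""
    return (len(tlv_types) >= 4
            and tlv_types[:3] == MANDATORY_HEAD
            and tlv_types[-1] == END_OF_LLDPDU)

print(valid_lldpdu([1, 2, 3, 5, 8, 0]))  # True
print(valid_lldpdu([2, 1, 3, 0]))        # False: wrong order
```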
TLV
A TLV is the smallest unit of an LLDPDU. It gives type, length, and other information for a device object. For
example, a device ID is carried in the Chassis ID TLV, an interface ID in the Port ID TLV, and a network
management address in the Management Address TLV.
LLDPDUs can carry basic TLVs, TLVs defined by IEEE 802.1, TLVs defined by IEEE 802.3, and Data Center
Bridging Capabilities Exchange Protocol (DCBX) TLVs.
TLV
• Organizationally specific TLVs: include TLVs defined by IEEE 802.1 and those defined by IEEE 802.3. They
are used to enhance network device management. Use these TLVs as needed.
• TLV type: a 7-bit field. Each value uniquely identifies a TLV type. For example, value 0 indicates the
End of LLDPDU TLV, and value 1 indicates a Chassis ID TLV.
• TLV information string length: a 9-bit field indicating the length of a TLV string.
• TLV information string: a string that contains TLV information. This field contains a maximum of 511
bytes.
When TLV Type is 127, it indicates that the TLV is an organization-defined TLV. In this case, the TLV
structure is shown in Figure 4.
Organizationally unique identifier (OUI) identifies the organization that defines the TLV.
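The 7-bit type and 9-bit length fields share the first two bytes of a TLV, which can be packed as follows. The Chassis ID subtype byte (0x04, MAC address) in the example is a standard LLDP assumption, not stated in this document:

```python
import struct

def pack_tlv(tlv_type: int, value: bytes) -> bytes:
    """Pack a TLV: the 7-bit type and 9-bit length occupy the first
    two bytes, followed by the information string (max 511 bytes)."""
    assert tlv_type < 2**7 and len(value) <= 511
    header = (tlv_type << 9) | len(value)
    return struct.pack("!H", header) + value

# Chassis ID TLV (type 1); subtype 0x04 = MAC address (standard assumption)
tlv = pack_tlv(1, b"\x04" + bytes.fromhex("00e0fc123456"))
print(tlv.hex())  # 02070400e0fc123456
```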
Each management address is encapsulated in a Management Address TLV in an LLDP frame. The
management address must be set to a valid unicast IP address of a device.
• If you do not specify a management address, a device searches the IP address list and automatically
selects an IP address as the default management address.
• If the device does not find any proper IP address from the IP address list, the system uses a bridge MAC
address as the default management address.
The system searches for the management IP address in the following sequence: IP address of the loopback interface, IP
address of the management network interface, and IP address of the VLANIF interface. Among the IP addresses of the
same type, the system selects the smallest one as the management address.
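The selection sequence can be sketched as follows; the candidate address lists and the fallback value are hypothetical examples:

```python
import ipaddress

def pick_management_address(loopback, mgmt_if, vlanif, bridge_mac):
    """Walk the candidate lists in the documented order (loopback,
    management network interface, VLANIF); within a type, the numerically
    smallest address wins. Fall back to the bridge MAC if no IP is found."""
    for candidates in (loopback, mgmt_if, vlanif):
        if candidates:
            return str(min(candidates,
                           key=lambda a: int(ipaddress.IPv4Address(a))))
    return bridge_mac

print(pick_management_address([], ["10.1.1.2", "10.1.1.1"], [],
                              "00e0-fc12-3456"))  # 10.1.1.1
```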
Implementation
LLDP must be used together with MIBs. LLDP requires that each device interface be provided with four MIBs.
An LLDP local system MIB that stores status information of a local device and an LLDP remote system MIB
that stores status information of neighboring devices are the most important. The status information
includes the device ID, interface ID, system name, system description, interface description, device capability,
and network management address.
LLDP requires that each device interface be provided with an LLDP agent to manage LLDP operations. The
LLDP agent performs the following functions:
• Identifies and processes LLDP packets sent by neighboring devices and maintains information in the
LLDP remote system MIB.
• Sends LLDP alarms to the NMS when detecting changes in information stored in the LLDP local or
remote MIB.
• The LLDP module maintains the LLDP local system MIB by exchanging information with the PTOPO
MIB, Entity MIB, Interface MIB, and Other MIBs of the device.
• An LLDP agent sends LLDP packets carrying local device information to neighboring devices directly
connected to the local device.
• An LLDP agent updates the LLDP remote system MIB after receiving LLDP packets from neighboring
devices.
The NMS collects and analyzes topology information stored in LLDP local and remote system MIBs on all
managed devices and determines the network topology. The information helps rapidly detect and rectify
network faults.
Working Mechanism
LLDP working modes
• Tx/Rx mode: enables a device to send and receive LLDP packets. The default working mode is Tx/Rx.
When the LLDP working mode changes on an interface, the interface initializes the LLDP state machines. To prevent
repeated initializations caused by frequent working mode changes, the NE40E supports an initial delay on the
interface. When the working mode changes on the interface, the interface initializes the LLDP state machines after a
configured delay interval elapses.
• After LLDP is enabled on a device, the device periodically sends LLDP packets to neighboring devices. If
the configuration is changed on the local device, the device immediately sends LLDP packets to notify
neighboring devices of the changes. If information changes frequently, set a delay for an interface to
send LLDP packets. After an interface sends an LLDP packet, the interface does not send another LLDP
packet until the configured delay time elapses, which reduces the number of LLDP packets to be sent.
• The fast sending mechanism allows the NE40E to override a pre-configured delay time and quickly
advertise local information to other devices in the following situations:
■ A device receives an LLDP packet sent by a transmitting device, whereas the device has no
information about the transmitting device.
The fast sending mechanism shortens the interval at which LLDP packets are sent to 1 second. After a
specified number of LLDP packets are sent, the pre-configured delay time is restored.
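The fast sending mechanism can be sketched as follows; the parameter names and the 30-second normal delay are illustrative assumptions, while the 1-second fast interval comes from the text:

```python
def send_intervals(normal_delay, fast_count, total_packets):
    """After a fast-send trigger, the interval drops to 1 second for
    fast_count packets, then the pre-configured delay is restored."""
    intervals = []
    for i in range(total_packets):
        intervals.append(1 if i < fast_count else normal_delay)
    return intervals

print(send_intervals(normal_delay=30, fast_count=4, total_packets=6))
# [1, 1, 1, 1, 30, 30]
```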
Networking Description
In single neighbor networking, no interfaces between devices or interfaces between devices and media
endpoints (MEs) are directly connected to intermediate devices. Each device interface is connected only to
one remote neighboring device. In the single neighbor networking shown in Figure 1, Device B is directly
connected to Device A and the ME, and each interface of Device A and Device B is connected only to a single
neighbor.
Feature Deployment
After LLDP is configured on Device A and Device B, an administrator can use the NMS to obtain Layer 2
configuration information about these devices, collect detailed network topology information, and determine
whether a configuration conflict exists. LLDP helps make network communications more secure and stable.
Networking Description
In multi-neighbor networking, each interface is connected to multiple remote neighboring devices. In the
multi-neighbor networking shown in Figure 1, the network connected to Device A, Device B, and Device C is
unknown. Devices on this unknown network may have LLDP disabled or may not need to be managed by
the NMS, but they can still transparently transmit LLDP packets. Interfaces on Device A, Device B, and Device
C are connected to multiple remote neighboring devices.
Feature Deployment
After LLDP is configured on Device A, Device B, and Device C, an administrator can use the NMS to obtain
Layer 2 configuration information about these devices, collect detailed network topology information, and
determine whether a configuration conflict exists. LLDP helps make network communications more secure
and stable.
Networking Description
In Figure 1, aggregated links exist between interfaces on Device A and Device B. Each aggregated link
interface is connected directly to another aggregated link interface, in the same way as in single-neighbor
networking.
Feature Deployment
After LLDP is configured on Device A and Device B, an administrator can use the NMS to obtain Layer 2
configuration information about these devices, collect detailed network topology information, and determine
whether a configuration conflict exists. LLDP helps make network communications more secure and stable.
Terms
Term Description
DCBX Data Center Bridging Capabilities Exchange Protocol. DCBX provides parameter
negotiation and remote configuration for Data Center Bridging (DCB)-enabled
network devices.
agent A process running on managed devices. Each device interface is provided with an
LLDP agent to manage LLDP operations.
VM virtual machine
Definition
Synchronization is classified into the following types:
• Clock Synchronization
Clock synchronization maintains a strict relationship between signal frequencies or between signal
phases. Signals are transmitted at the same average rate within the valid time. In this manner, all
devices on a network run at the same rate.
On a digital communication network, a sender places a pulse signal in a specific timeslot for
transmission. A receiver needs to extract this pulse signal from this specific timeslot to ensure that the
sender and receiver communicate properly. A prerequisite of successful communication between the
sender and receiver is clock synchronization between them. Clock synchronization enables the clocks on
the sender and receiver to be synchronized.
• Time Synchronization
Generally, the word "time" indicates either a moment or a time interval. A moment is a transient in a
period, whereas a time interval is the interval between two transients. Time synchronization is achieved
by adjusting the internal clocks and moments of devices based on received time signals. The working
principle of time synchronization is similar to that of clock synchronization. When a time is adjusted,
both the frequency and phase of a clock are adjusted. The phase of this clock is represented by a
moment in the form of year, month, day, hour, minute, second, millisecond, microsecond, and
nanosecond. Time synchronization enables devices to receive discontinuous time reference information
and to adjust their times to synchronize times. Clock synchronization enables devices to trace a clock
source to synchronize frequencies.
The figure shows the difference between time synchronization and clock synchronization. In time
synchronization (also known as phase synchronization), watches A and B always keep the same time. In
clock synchronization, watches A and B keep different times, but the time difference between the two
watches is a constant value, for example, 6 hours.
Purpose
On a digital communication network, clock synchronization is implemented to limit the frequency or phase
difference between network elements (NEs) within an allowable range. Pulse code modulation (PCM) is
used to encode information into digital pulse signals before transmission. If two digital switching devices
have different clock frequencies, or if interference corrupts the digital bit streams during transmission, phase
drift or jitter occurs. Consequently, code-element loss or duplication may occur in the buffer of the involved
digital switching device, resulting in slip of transmitted bit streams. In addition, if the clock frequency or
phase difference exceeds an allowable range, bit errors or jitter may occur, degrading the network
transmission performance.
Clock Source
A device that provides clock signals for another device is called a clock source. A device may have multiple
clock sources, which are classified as follows:
• Automatic clock source selection: The system uses the automatic clock source selection algorithm to
determine a clock source to be traced based on priorities, SSM levels, and clock IDs of clock sources.
• Manual clock source selection: A clock source to be traced is manually specified. This clock source must
have the highest SSM level.
• Forcible clock source selection: A clock source to be traced is forcibly specified. This clock source can be
any clock source.
You are advised to configure the automatic clock source selection mode. In this mode, the system dynamically selects an
optimal clock source based on clock source quality.
If a manually specified clock source becomes invalid, the system automatically switches to track the clock source
selected in automatic clock source selection mode. After the manually specified clock source recovers, the system does
not switch back to the manual clock source selection mode. If the conditions for manual clock source selection are not
met, automatic clock source selection takes effect. If a forcibly specified clock source becomes invalid, the system clock
enters the holdover state. If the conditions are not met, the system clock enters the free-run state.
If multiple clock sources have the same SSM level, the device selects a clock source based on the priorities of these clock sources.
SSM
The International Telecommunication Union-Telecommunication Standardization Sector (ITU-T) defined the
SSM to identify the quality level of a synchronization source on synchronous digital hierarchy (SDH)
networks. As stipulated by the ITU-T, the four spare bits in one of the five Sa bytes in a 2 Mbit/s bit stream
are used to carry the SSM value. The use of the SSM value in clock source selection improves
synchronization network performance, prevents timing loops, achieves synchronization on networks with
different structures, and enhances synchronization network reliability.
The SSM levels in ascending order are as follows:
Extended SSM
The extended SSM function enables clock IDs to participate in automatic clock source selection. This function
prevents clock loops.
When the extended SSM function is enabled, the device does not allow clock IDs to participate in automatic
clock source selection in either of the following cases:
• The clock ID of a clock source is the same as the clock ID configured on the device.
Enhanced SSM
The enhanced SSM function adds four SSM levels to the original SSM levels. After enhanced SSM is enabled,
the system uses the enhanced SSM levels as the information for clock source selection and collects statistics
on the number of high-precision devices and the number of common-precision devices on clock transmission
links.
The four new SSM levels in ascending order are as follows:
1. ESEC: The clock source is a G.8262.1 enhanced synchronous equipment clock (eSEC).
2. EPRC: The clock source is a G.811.1 enhanced primary reference clock (ePRC).
3. PRTC: The clock source is a G.8272 primary reference time clock (PRTC).
4. EPRTC: The clock source is a G.8272.1 enhanced primary reference time clock (ePRTC).
Pseudo Synchronization
In pseudo synchronization mode, each switching site has its own clock with very high accuracy and stability,
and these clocks are independent of each other. The differences in clock frequency and phase among these
clocks are so small that they do not affect service transmission and can be ignored. Therefore, clock
synchronization is not carried out among the switching sites, which is why this mode is called
pseudo synchronization.
Pseudo synchronization is typically applicable to digital communication networks between countries.
Generally, countries use caesium clocks in pseudo synchronization scenarios.
Master-Slave Synchronization
In master-slave synchronization mode, a master clock of high accuracy is set on a network and traced by
every site. Each sub-site traces its upper-stratum clock. In this way, clock synchronization is maintained
among all the NEs.
Master-slave synchronization is classified as direct or hierarchical master-slave synchronization.
Figure 1 illustrates direct master-slave synchronization. In this mode, all slave clocks synchronize with the
primary reference clock. Direct master-slave synchronization is applicable to simple networks.
Figure 2 illustrates hierarchical master-slave synchronization. In this mode, there are three stratums of
clocks: stratum-1 reference clock, stratum-2 slave clock, and stratum-3 slave clock. The stratum-2 slave
clocks synchronize with the stratum-1 reference clock, and the stratum-3 slave clocks synchronize with the
stratum-2 slave clocks. Hierarchical master-slave synchronization is applicable to large and complex networks.
• Acquiring
A slave clock traces the clock source provided by an upper-stratum clock. The clock source may be provided either by the master clock or by another upper-stratum clock.
• Holdover
After losing connections to all reference clocks, a slave clock enters the holdover state. In this state, the slave clock uses the last frequency stored before the connections were lost as its reference frequency, and continues to output clock signals that conform to the original reference clock, keeping the frequency difference between the output signals and the original reference clock small for a period of time.
Because the inherent frequency of the oscillator is prone to drift, a slave clock in the holdover state loses accuracy over a prolonged period of time. The accuracy of a clock in the holdover state is second only to that of a clock in the acquiring state.
• Free-run
If a slave clock loses its clock reference memory after losing connections to all external reference clocks, or remains in the holdover state for an excessively long time, the oscillator in the slave clock starts working in the free-run state.
Synchronization
Thanks to the long transmission distance of optical fibers, synchronizing clock signals through synchronous
Ethernet links has become the most common networking mode for clock synchronization.
0100 0x04: G.812 transit site clock signals (SSUA, a rubidium clock)
1000 0x08: G.812 local site clock signals (SSUB, a rubidium clock or a crystal clock)
When the clock board is powered on, the default SSM levels of all reference sources are Unknown. The
sequence of the SSM levels from high to low is PRC, SSUA, SSUB, SEC, UNKNOWN, and DNU. If the SSM
level of a clock source is DNU and the SSM level participates in the selection of a clock source, the clock
source is not selected during protection switching.
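The selection rule described above can be sketched in Python. This is an illustrative model only, not the device's implementation; the function name and data layout are assumptions.

```python
# Quality levels from high to low, as listed above.
SSM_PRIORITY = ["PRC", "SSUA", "SSUB", "SEC", "UNKNOWN", "DNU"]

def select_clock_source(sources):
    """Pick the clock source with the highest SSM level.

    sources: dict mapping source name -> SSM level string.
    Sources advertising DNU never participate in selection.
    """
    candidates = {name: level for name, level in sources.items()
                  if level != "DNU"}
    if not candidates:
        return None
    # The lowest priority index corresponds to the highest-quality source.
    return min(candidates, key=lambda n: SSM_PRIORITY.index(candidates[n]))
```

For example, given a BITS source at PRC level, a line source at SSUB level, and a line source advertising DNU, the BITS source is selected.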
The SSM level of output signals is determined by the traced clock source. When the clock works in the
trace state, the SSM level of output signals and that of the traced clock source are the same. When the
clock does not work in the trace state, the SSM level of output signals is SEC.
For a line clock source, the SSM can be extracted from an interface board and reported to the IPU. The
IPU then sends the SSM to the clock board. The IPU can also forcibly set the SSM of the line clock
source.
■ If the signal is 2.048 Mbit/s, the clock module can extract the SSM from the signal.
■ If the signal is 2.048 MHz, the SSM level can be set manually.
The Router can only select an SSM value listed in Table 1. For values not listed, the Router processes them as DNU.
• Interface board
An interface board is responsible for inserting and extracting the SSM. The SSM of the best clock source
sent by the clock board is set on each synchronous physical interface on the interface board for
distribution. The SSM of the best clock source received by each synchronous interface is processed by
the interface board.
• Clock board
A clock board extracts the SSMs of an external clock and implements protection switching between
clock sources. After receiving SSMs from an interface board, the clock board determines the clock
source to be traced based on SSM levels, implements clock protection switching, and sends the SSM
level of the current clock source to other interface boards.
Definition
The 1588 adaptive clock recovery (ACR) algorithm is used to carry out clock (frequency) synchronization
between the NE40E and clock servers by exchanging 1588v2 messages over a clock link that is set up by
sending Layer 3 unicast packets.
Unlike 1588v2 that achieves frequency synchronization only when all devices on a network support 1588v2,
1588 ACR is capable of implementing frequency synchronization on a network with both 1588v2-aware
devices and 1588v2-unaware devices.
After 1588 ACR is enabled on a server, the server provides 1588 ACR frequency synchronization services for
clients.
1588 ACR records PDV performance statistics in the CF card. The performance statistics indicate the delay and jitter
information about packets but not information in the packets.
Purpose
All-IP has become the trend for future networks and services. Therefore, traditional networks based on the
Synchronous Digital Hierarchy (SDH) have to overcome various constraints before migrating to IP packet-
switched networks. Transmitting Time Division Multiplexing (TDM) services over IP networks presents a
major technological challenge. TDM services are classified into two types: voice services and clock
synchronization services. With the development of VoIP, technologies of transmitting voice services over an
IP network have become mature and have been extensively used. However, development of technologies of
transmitting clock synchronization services over an IP network is still under way.
1588v2 is a software-based technology that carries out time and frequency synchronization. To achieve
higher accuracy, 1588v2 requires that all devices on a network support 1588v2; if not, frequency
synchronization cannot be achieved.
Derived from 1588v2, 1588 ACR implements frequency synchronization with clock servers on a network with
both 1588v2-aware devices and 1588v2-unaware devices. Therefore, in the situation where only frequency
synchronization is required, 1588 ACR is more applicable than 1588v2.
Benefits
This feature brings the following benefits to operators:
• Frequency synchronization can be achieved on networks with both 1588v2-aware and 1588v2-unaware
devices, reducing the costs of network construction.
• Operators can provide more services that can meet subscribers' requirements for frequency
synchronization.
1588 ACR clock synchronization is implemented in two modes: one-way mode and two-way mode.
• One-way mode
1. The server sends the client 1588v2 messages at t1 and t1' and time-stamps the messages with t1
and t1'.
2. The client receives the 1588v2 messages at t2 and t2' and time-stamps the messages with t2 and
t2'.
t1 and t1' are the clock time of the server, and t2 and t2' are the clock time of the client.
By comparing the sending times on the server with the receiving times on the client, 1588 ACR calculates the frequency offset between the server and the client and then implements frequency synchronization. For example, if the result of the formula (t2 - t1)/(t2' - t1') is 1, the frequencies of the server and the client are the same; otherwise, the frequency of the client needs to be adjusted to match that of the server.
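The one-way comparison above can be expressed as a short sketch (an illustration under the stated assumption of constant path delay, not the device's algorithm):

```python
def one_way_frequency_ratio(t1, t1_prime, t2, t2_prime):
    """One-way 1588 ACR comparison (illustrative sketch).

    t1, t1_prime: server sending times; t2, t2_prime: client receiving
    times. Assuming a constant path delay, the ratio equals 1 when the
    server and client frequencies match.
    """
    return (t2 - t1) / (t2_prime - t1_prime)
```

With a constant 5-unit delay and matched frequencies, for example, one_way_frequency_ratio(0, 10, 5, 15) evaluates to 1.0.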
• Two-way mode
1. The server sends a 1588 Sync packet carrying the timestamp t1 to the client at t1.
2. The client receives the 1588 Sync packet from the server at t2.
3. The client sends a 1588 Delay_Req packet to the server at t3.
4. The server receives the 1588 Delay_Req packet from the client at t4 and sends a Delay_Resp packet to the client.
The same calculation method is used in the two-way mode as in the one-way mode. The (t1, t2) samples are compared with the (t3, t4) samples, and the group of data with less jitter is used for calculation. Under the same network conditions, tracing the clock signals in the direction with less jitter is more precise than tracing clock signals in a single direction.
The two-way mode provides better frequency recovery accuracy and higher reliability than the one-way mode. If adequate bandwidth is available, clock synchronization in two-way mode is recommended for frequency synchronization when deploying 1588 ACR.
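Selecting the direction with less jitter can be sketched as follows (the function name and sample layout are assumptions for illustration; jitter is modeled here as the variance of the timestamp differences):

```python
from statistics import pvariance

def pick_traced_direction(t2_minus_t1_samples, t4_minus_t3_samples):
    """Two-way 1588 ACR sketch: compare the jitter of forward samples
    (t2 - t1) with reverse samples (t4 - t3) and trace the direction
    with less jitter."""
    fwd_jitter = pvariance(t2_minus_t1_samples)
    rev_jitter = pvariance(t4_minus_t3_samples)
    return "forward" if fwd_jitter <= rev_jitter else "reverse"
```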
Duration Mechanism
On a 1588 ACR client, you can configure a duration for Announce, Sync, and Delay_Resp packets. The duration value is carried in the TLV field of a signaling negotiation packet and sent to the server.
Generally, the client sends a packet to renegotiate with the server before the duration expires so that the server can continue to provide the client with synchronization services.
If the link connected to the client goes Down or fails, the client cannot renegotiate with the server. When the duration expires, the server stops sending Sync packets to the client.
On the preceding network, CSGs support 1588 ACR and function as clients that initiate requests for Layer 3 unicast connections to the upstream IPCLK server. The CSGs then exchange 1588v2 messages with the IPCLK server over these connections to achieve frequency recovery. BITS1 and BITS2 are configured as clock servers to provide protection for the CSGs.
One CSG sends line clock signals carrying frequency information to NodeB1 along an E1 link. The other CSG transmits frequency information to NodeB2 either along a synchronous Ethernet link or by sending 1588v2 messages. In this manner, both NodeBs connected to the CSGs can achieve frequency synchronization.
Terms
Time synchronization: Time synchronization, also called phase synchronization, refers to the consistency of both frequencies and phases between signals. This means that the phase offset between signals is always 0.
IEEE 1588v2 PTP: 1588v2, defined by the Institute of Electrical and Electronics Engineers (IEEE), is a standard for the Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, Precision Time Protocol (PTP) for short.
ITU-T G.8265.1: G.8265.1 defines the main protocols of 1588 ACR. Therefore, G.8265.1 usually refers to the 1588 ACR feature.
Abbreviations
Definition
Circuit emulation service (CES) adaptive clock recovery (ACR) clock synchronization implements adaptive clock frequency synchronization. CES ACR clock synchronization uses special circuit emulation headers to encapsulate time-division multiplexing (TDM) service packets that carry clock frequency information and transmits these packets over a packet switched network (PSN).
Purpose
If a clock frequency is outside the allowable error range, problems such as bit errors and jitter occur. As a
result, network transmission performance deteriorates. CES ACR uses the adaptive clock recovery algorithm
to synchronize clock frequencies and confines the clock frequencies of all network elements (NEs) on a
digital network to within the allowable error range, enhancing network transmission stability.
If the intermediate packet switched network (PSN) does not support clock synchronization at the physical
layer, CES ACR uses TDM services to implement synchronization.
4.14.2 References
The following table lists the references of this chapter.
ITU-T G.8261: Timing and synchronization aspects in packet networks (fully compliant)
CES
The CES technology originated from the Asynchronous Transfer Mode (ATM) network. CES uses emulated
circuits to encapsulate circuit service data into ATM cells and transmits these cells over the ATM network.
Later, circuit emulation was used on the Metro Ethernet to transparently transmit TDM and other circuit
switched services.
CES uses special circuit emulation headers to encapsulate TDM service packets that carry clock frequency
information and transmits these packets over the PSN.
CES ACR
The CES technology generally uses the adaptive clock recovery algorithm to synchronize clock frequencies. When an Ethernet network transmits TDM services over emulated circuits, it uses the adaptive clock recovery algorithm to extract clock synchronization information from data packets.
2. CE1 encapsulates clock frequency information into TDM service packets and sends them to gateway IWF1.
3. Gateway IWF1, which connects to the master clock, regularly sends service clock information to gateway IWF2, which connects to the slave clock. The service clock information is coded using sequence numbers or timestamps and is encapsulated into T1/E1 service packets for transmission.
4. Upon receipt, gateway IWF2, which connects to the slave clock, extracts the timestamp or sequence number from the packets and uses ACR to recover clocks. The clock recovered on IWF2 traces and locks to the clock imported to the TDM services on IWF1. This ensures frequency synchronization between the two devices on the PSN.
On the network shown in Figure 1, the clock source sends clock frequency information to CE1. CE1
encapsulates the clock frequency information into TDM services and transmits the services over the
intermediate PSN through routers. Upon receipt, the router connected to the slave clock uses CES ACR to
recover the clock frequency. In actual applications, multiple E1/T1 interfaces can belong to the same clock
recovery domain. The system uses the PW source selection algorithm to select a PW as the primary PW and
uses the primary PW to recover clocks. If the primary PW fails, the system automatically selects the next
available PW as the primary PW to recover clocks. If multiple PWs are configured to belong to the same
clock domain, the TDM services carried over these PWs must also have the same clock source. Otherwise,
packet loss or frequency deviation adjustment may occur.
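The primary-PW selection behavior described above can be sketched as follows (a simplified illustration; the field names and ordering are assumptions, not the router's actual PW source selection algorithm):

```python
def select_primary_pw(pws):
    """Sketch of primary PW selection in a clock recovery domain.

    pws: ordered list of dicts with assumed keys "name" and "up".
    The first operational PW becomes the primary; if the primary
    fails, the next available PW takes over on the next selection.
    """
    for pw in pws:
        if pw["up"]:
            return pw["name"]
    return None  # no PW available for clock recovery
```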
Abbreviations
Definition
• Synchronization
This is the process of ensuring that the frequency offset or time difference between devices is kept
within a reasonable range. In a modern communications network, most telecommunications services
require network clock synchronization in order to function properly. Network clock synchronization
includes time synchronization and frequency synchronization.
■ Time synchronization
Time synchronization, also called phase synchronization, means that signals are consistent in both frequency and phase. In this case, the time offset between signals is always 0.
■ Frequency synchronization
Frequency synchronization, also called clock synchronization, refers to a constant frequency offset
or phase offset. In this case, signals are transmitted at a constant average rate during any given
time period so that all the devices on the network can work at the same rate.
Figure 1 shows the differences between time synchronization and frequency synchronization. If Watch A and Watch B always show the same time, they are in time synchronization. If Watch A and Watch B show different times but the time offset between them remains constant, for example, 6 hours, they are in frequency synchronization.
• IEEE 1588
IEEE 1588 is defined by the Institute of Electrical and Electronics Engineers (IEEE) as the Precision Clock Synchronization Protocol for networked measurement and control systems. It is called the Precision Time Protocol (PTP) for short.
IEEE 1588v1, released in 2002, applies to the industrial automation and test and measurement fields. With the development of IP networks and the popularization of 3G networks, the demand for time synchronization on telecommunications networks has increased. To satisfy this need, IEEE drafted IEEE 1588v2 based on IEEE 1588v1 in June 2006, revised it in 2007, and released it at the end of 2008.
Targeted at telecommunications industry applications, IEEE 1588v2 improves on IEEE 1588v1 in the
following aspects:
1588v2 is a time synchronization protocol which allows for highly accurate time synchronization
between devices. It is also used to implement frequency synchronization between devices.
• ITU-T G.8275.1
ITU-T G.8275.1 defines the precision time protocol telecom profile for phase/time synchronization with
full timing support from the network.
G.8275.1 defines three types of clocks: T-GM, T-BC, and T-TSC. A bearer network device is configured as a T-BC.
• SMPTE-2059-2
SMPTE-2059-2 is an IEEE 1588-based standard that allows time synchronization of video devices over
an IP network.
Purpose
Data communications networks do not require time or frequency synchronization and, therefore, Routers on
such networks do not need to support time or frequency synchronization. On IP radio access networks
(RANs), time or frequency needs to be synchronized among base transceiver stations (BTSs). Therefore,
Routers on IP RANs are required to support time or frequency synchronization.
Frequency synchronization between BTSs on an IP RAN requires that frequencies between BTSs be
synchronized to a certain level of accuracy; otherwise, calls may be dropped during mobile handoffs. Some
wireless standards require both frequency and time synchronization. Table 1 shows the requirements of
wireless standards for time synchronization and frequency accuracy.
Table 1 Requirements of wireless standards for time synchronization and frequency accuracy
Different BTSs have different requirements for frequency synchronization. These requirements can be
satisfied through physical clock synchronization (including external clock input, WAN clock input, and
synchronous Ethernet clock input) and packet-based clock recovery.
Traditional packet-based clock recovery cannot meet the time synchronization requirement of BTSs. For
example, NTP-based time synchronization is only accurate to within one second and 1588v1-based time
synchronization is only accurate to within one millisecond. To meet time synchronization requirements, BTSs need to be connected directly to a global positioning system (GPS). This solution, however, has disadvantages: GPS installation and maintenance costs are high, and communications may be vulnerable to security breaches because GPS uses satellites from different countries.
1588v2, with hardware assistance, provides time synchronization accurate to within one microsecond, meeting the time synchronization requirements of wireless networks. Compared with a GPS, 1588v2 deployment is less costly and operates independently of GPS, making 1588v2 strategically significant.
In addition, operators are paying more attention to network operation and maintenance, requiring Routers to provide network quality analysis (NQA) that supports high-precision delay measurement at the 100 μs level. Consequently, high-precision time synchronization between measuring devices and measured devices is required. 1588v2 meets this requirement.
1588v2 packets are of the highest priority by default to avoid packet loss and keep clock precision.
Benefits
This feature brings the following benefits to operators:
• Construction and maintenance costs for time synchronization on wireless networks are reduced.
• Time synchronization and frequency synchronization on wireless networks are independent of GPS,
providing a higher level of strategic security.
Concepts of G.8275.1
ITU-T G.8275.1 defines the precision time protocol telecom profile for phase/time synchronization with full
timing support from the network. G.8275.1 is defined as a time synchronization protocol.
A physical network can be logically divided into multiple clock domains. Each clock domain has its own
independent synchronous time, with which clocks in the same domain synchronize.
A node on a time synchronization network is called a clock. G.8275.1 defines the following types of clocks:
• A Telecom grandmaster (T-GM) can only be the master clock that provides time synchronization.
• A Telecom-boundary clock (T-BC) has more than one G.8275.1 interface. One interface of the T-BC
synchronizes time signals with an upstream clock, and the other interfaces distribute the time signals to
downstream clocks.
• A Telecom-transparent clock (T-TC) has more than one G.8275.1 interface through which the T-TC
forwards G.8275.1 packets, and corrects the packet transmission delay. A T-TC does not synchronize the
time through any of these G.8275.1 interfaces.
• A Telecom time slave clock (T-TSC) can only be the slave clock that synchronizes the time information
of the upstream device.
Concepts of SMPTE-2059-2
SMPTE-2059-2 is an IEEE 1588-based standard that allows time synchronization of video devices over an IP
network.
The SMPTE-2059-2 protocol provides acceptable lock time, jitter, and precision.
SMPTE-2059-2 is developed based on IEEE 1588. For information about the principles, networking, and
related concepts of SMPTE-2059-2, see the IEEE 1588 protocol.
Clock Domain
A physical network can be logically divided into multiple clock domains. Each clock domain has a reference
time with which all devices in the domain are synchronized. The reference time in one clock domain is
different from and independent of that in another clock domain.
A device can transparently transmit the time information from multiple clock domains over a transport
network to provide reference times for multiple mobile carrier networks. The device, however, can join only
one clock domain and synchronize the time with only one reference time.
Clock Nodes
Each node on a time synchronization network is called a clock. 1588v2 defines the following types of clocks:
• TC
A TC has multiple 1588v2 interfaces. Through these interfaces, the TC forwards 1588v2 packets and corrects the packet forwarding delay. Unlike a BC or an OC, a TC does not synchronize time with other devices through any of these 1588v2 interfaces.
TCs are classified as either end-to-end (E2E) TCs or peer-to-peer (P2P) TCs.
• TC+OC
A TC+OC is a special TC. It has the same functions as a TC in terms of time synchronization (forwarding
1588v2 packets and correcting the forwarding delay) and performs clock synchronization on OC
interfaces (only clock synchronization is performed, whereas time synchronization is not).
As described earlier, a TC can correct the forwarding delay for the 1588v2 packets it forwards. As long as a TC's inbound and outbound interfaces keep synchronized time, the difference between the time when the inbound interface receives a packet and the time when the outbound interface sends the packet is the forwarding delay. However, if a TC is not synchronized with the BC or OC that performs time synchronization, the measured packet forwarding delay is inaccurate. This causes the BC or OC to calculate time synchronization incorrectly, decreasing the time synchronization precision.
Usually, it is recommended that the clock synchronization between a TC and a BC or OC be
implemented through a physical clock, such as a WAN clock or synchronous Ethernet clock. If no
physical clock is available, the TC needs to synchronize the frequency using the 1588v2 Sync packets
periodically sent by an upstream device, thereby achieving clock synchronization with the upstream
device. This is the function of a TC+OC.
TC+OCs are classified as either E2E TC+OCs or P2P TC+OCs.
Figure 1 shows the positions of the OC, BC, and TC on a time synchronization network.
(Figure 1 legend: BC, TC, E2E TC, P2P TC, E2E TC+OC, TC and BC, P2P TC+OC)
• In MAC encapsulation, VLAN IDs and 802.1p priorities are carried in 1588v2 packets. MAC encapsulation
is divided into two types:
■ Unicast encapsulation
■ Multicast encapsulation
• In UDP encapsulation, differentiated services code point (DSCP) values are carried in 1588v2 packets.
UDP encapsulation is divided into two types:
■ Unicast encapsulation
■ Multicast encapsulation
Grandmaster
A time synchronization network is like a spanning tree, on which the grandmaster clock is the root node.
Other nodes synchronize their time with the grandmaster clock.
Master/Slave
When a pair of nodes performs time synchronization, the upstream node distributing the reference time is
the master node and the downstream node receiving the reference time is the slave node.
Delay Mode
The Delay mode is applied to E2E delay measurement. Figure 1 shows the delay measurement in Delay
mode.
In Figure 1, t-sm and t-ms are delays in opposite directions. In the following example, the two delay values are the same.
If they are different, the asymmetrical delay correction mechanism can be used to compensate for the asymmetric delay.
For details about asymmetric delay correction, see the following part of this section.
Follow_Up packets are used in two-step mode. Here, the one-step mode is described and Follow_Up packets are
disregarded. The two-step mode is described later in this section.
A master node periodically sends a Sync packet carrying the sending timestamp t1 to the slave node. When
the slave node receives the Sync packet, it adds the timestamp t2 to the packet.
The slave node periodically sends a Delay_Req packet to the master node and records the sending
timestamp t3. When the master node receives the Delay_Req packet, it adds the timestamp t4 to the packet
and returns a Delay_Resp packet to the slave node.
In this way, the slave node obtains a set of timestamps, namely, t1, t2, t3, and t4. Essentially, the
bidirectional delays are as follows:
The sum of bidirectional delays on the link between the master and slave nodes is equal to (t4 – t1) – (t3 –
t2). The unidirectional delay (Delay) on the link between the master and slave nodes (assuming that the
delays in opposite directions are symmetric) is equal to [(t4 – t1) – (t3 – t2)]/2.
If the time offset of the slave node relative to the master node is Offset, then:
t2 – t1 = Delay + Offset
t4 – t3 = Delay – Offset
Therefore, Offset is [(t2 – t1) – (t4 – t3)]/2.
Based on the time offset, the slave node synchronizes its time with the master node.
This process is performed repeatedly to maintain time synchronization between the slave and master nodes.
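The Delay-mode formulas above can be captured in a short sketch (an illustration of the arithmetic under the symmetric-delay assumption, not the device's implementation):

```python
def delay_mode(t1, t2, t3, t4):
    """Delay-mode calculation from the four timestamps described above.

    Assumes the delays in opposite directions are symmetric.
    Returns (Delay, Offset): the one-way link delay and the slave's
    time offset relative to the master.
    """
    delay = ((t4 - t1) - (t3 - t2)) / 2
    offset = ((t2 - t1) - (t4 - t3)) / 2
    return delay, offset
```

For example, with t1 = 100, t2 = 115, t3 = 200, and t4 = 205, the result is a 10-unit link delay and a 5-unit offset, which the slave then deducts from its own time.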
Figure 2 shows the networking.
Figure 2 shows a scenario in which a BC and an OC are directly connected. TCs can also be deployed
between the BC and OC; however, the TCs must be 1588v2-capable devices in order to ensure the precision
of time synchronization. If TCs are deployed, they only transparently transmit 1588v2 packets and correct
the forwarding delays in these packets.
Stable delay, without variation, between two nodes is key to achieving high precision in 1588v2 time
synchronization. Generally, link delays can meet this requirement. However, because the forwarding delay
varies significantly, the precision of time synchronization cannot be ensured if a forwarding device is
deployed between two nodes that perform time synchronization. The solution to this is to perform
forwarding delay correction on forwarding devices (which must be TCs).
Figure 3 shows how the forwarding delay correction is performed on a TC.
The TC modifies the CorrectionField field of a 1588v2 packet on the inbound and outbound interfaces.
Specifically, the TC subtracts the timestamp indicating when the 1588v2 packet was received on the inbound
interface and adds the timestamp indicating when the 1588v2 packet was sent from the outbound interface.
As such, the forwarding delay of the 1588v2 packet on the TC is added to the CorrectionField field.
In this manner, 1588v2 packets exchanged between the master and slave nodes, when passing through multiple TCs, carry the packet forwarding delays of all the TCs in the CorrectionField field. When the slave node synchronizes with the master node, the value of the CorrectionField field is deducted, and the value obtained is the link delay. This ensures high-precision time synchronization.
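The CorrectionField update performed by each TC can be sketched as follows (a minimal illustration of the subtract-ingress, add-egress rule described above; the function name is an assumption):

```python
def update_correction_field(correction_ns, ingress_ns, egress_ns):
    """E2E TC residence-time correction (sketch).

    The TC subtracts the timestamp taken when the packet arrived on the
    inbound interface and adds the timestamp taken when it left the
    outbound interface, so the packet's forwarding delay accumulates
    in CorrectionField as it crosses each TC.
    """
    return correction_ns + (egress_ns - ingress_ns)
```

Chaining the call once per TC accumulates the forwarding delays of all TCs on the path.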
The preceding TCs are called E2E TCs. In Delay mode, only E2E TCs are applicable. Figure 4 shows how the
BC, OC and E2E TC are connected and how 1588v2 operates.
Figure 4 Networking of the BC, OC, and E2E TC and the synchronization process
PDelay Mode
When performing time synchronization in PDelay mode, the slave node deducts both the packet forwarding
delay and upstream link delay. The time synchronization in PDelay mode requires that each device obtains
its upstream link delay. This can be achieved by running the peer delay protocol between adjacent devices.
Figure 5 shows the time synchronization process.
In Figure 5, t-sm and t-ms are delays in opposite directions. In the following example, the two delay values are the same.
If they are different, the asymmetrical delay correction mechanism can be used to compensate for the asymmetric delay.
For details about asymmetric delay correction, see the following part of this section.
Follow_Up packets are used in two-step mode. Here, the one-step mode is described and Follow_Up packets are
disregarded. The two-step mode is described later in this section.
Node 1 periodically sends a PDelay_Req packet carrying the sending timestamp t1 to node 2. When node 2 receives the PDelay_Req packet, it adds the timestamp t2 to the packet. Node 2 sends a PDelay_Resp packet to node 1 and saves the sending timestamp t3. When node 1 receives the PDelay_Resp packet, it adds the timestamp t4 to the packet.
In this way, node 1 obtains a set of timestamps, namely, t1, t2, t3, and t4. Essentially, the bidirectional delays
are as follows:
The sum of bidirectional delays on the link between node 1 and node 2 is equal to (t4 – t1) – (t3 – t2).
The unidirectional delay on the link between node 1 and node 2 (assuming that the delays in opposite
directions are symmetric) is equal to [(t4 – t1) – (t3 – t2)]/2.
The delay measurement in PDelay mode does not differentiate between the master and slave nodes. All
nodes send PDelay packets to their adjacent nodes to calculate adjacent link delay. This calculation process
repeats and the packet transmission delay in one direction is updated accordingly.
In the preceding process, the link delay is calculated and updated in real time, but time synchronization is
not performed. For time synchronization, Sync packets must be sent from the master node to the slave node.
Specifically, the master node periodically sends a Sync packet to the slave node, which obtains two
timestamps, namely, t1 and t2. After the slave node corrects the delay by deducting the delay on the link
from the master node to the slave node, the obtained value (t2 – t1 – CorrectionField) is the time offset of
the slave node relative to the master node. Based on the time offset, the slave node synchronizes its time
with the master node. Figure 6 shows the networking.
Figure 6 Networking diagram of time synchronization in PDelay mode between the directly connected BC and OC
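The slave-side calculation described above (t2 − t1 − CorrectionField) can be sketched as follows, with the upstream link delay already maintained by the peer delay protocol (an illustration; the function name is an assumption):

```python
def pdelay_offset(t1, t2, link_delay):
    """PDelay-mode sketch: after deducting the master-to-slave link
    delay (measured by the peer delay protocol and carried as the
    correction), the remainder is the slave's time offset relative
    to the master."""
    return t2 - t1 - link_delay
```

For example, if the master sends a Sync packet at t1 = 1000, the link delay is 10, and the slave receives it at t2 = 1015, the slave's offset is 5.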
Figure 8 shows how the BC, OC and P2P TC are connected and how PDelay operates.
Figure 8 Networking and schematic diagram of forwarding delay correction in PDelay mode on a P2P TC
One-Step/Two-Step
In one-step mode, Sync packets for time synchronization in Delay mode and PDelay_Resp packets for time
synchronization in PDelay mode include a sending timestamp.
In two-step mode, Sync packets for time synchronization in Delay mode and PDelay_Resp packets for time
synchronization in PDelay mode do not include a sending timestamp. Instead, their sending time is recorded
and then added as a timestamp in subsequent packets, such as Follow_Up and PDelay_Resp_Follow_Up
packets.
Generally, the values of t-sm and t-ms are the same. If they are different and the difference remains fixed,
you can measure the delay difference using a meter, and then configure the delay difference. On this basis,
1588v2 calculates the asymmetry correction value during time synchronization calculation, thereby achieving
precise time synchronization even for links with asymmetric delays.
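The asymmetry correction described above can be sketched as follows (an illustration of the arithmetic; the function name and the sign convention for the configured asymmetry value, taken here as d_ms − d_sm, are assumptions):

```python
def corrected_offset(t1, t2, t3, t4, asymmetry):
    """Asymmetric-delay correction (sketch).

    asymmetry: the measured, configured difference d_ms - d_sm between
    the master-to-slave and slave-to-master delays. The naive offset
    assumes symmetric delays and is biased by half the asymmetry, so
    subtracting asymmetry/2 recovers the true offset.
    """
    naive = ((t2 - t1) - (t4 - t3)) / 2  # symmetric-delay assumption
    return naive - asymmetry / 2
```

For example, with a true offset of 5, d_ms = 12, and d_sm = 8 (asymmetry 4), the naive calculation yields 7, and the correction restores 5.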
Packet Encapsulation
1588v2 defines the following packet encapsulation modes:
server is connected to multiple BTSs and uses unicast UDP to exchange 1588v2 protocol packets. Figure
12 shows Layer 3 unicast encapsulation without tags.
The NE40E supports Layer 2 multicast encapsulation, Layer 2 unicast encapsulation, Layer 3 multicast
encapsulation, and Layer 3 unicast encapsulation.
BITS Interface
1588v2 enables time synchronization between clock nodes, but cannot synchronize these clock nodes with
the Coordinated Universal Time (UTC). To ensure that the clock nodes are synchronized with the UTC, an
external time source is required. In other words, the grandmaster clock needs to be connected to an external
time source to obtain synchronized time in non-1588v2 mode.
Currently, external time sources are predominantly satellite-based, for example, the GPS (US), Galileo
(Europe), GLONASS (Russia), and Beidou (China). Figure 14 shows the connection mode.
Each main control board on the NE40E provides one external time interface and one external clock interface,
and each channel of time/clock signals is exchanged between the active and standby main control boards.
Two RJ45 interfaces, one of which functions as an external clock interface and the other as an external
time interface. They provide the following clock or time signals:
■ 2 MHz clock signal (differential level with one line clock input and one line clock output)
■ 2 Mbit/s clock signal (differential level with one line clock input and one line clock output)
■ DC level shifter (DCLS) time signal (RS422 differential level with one line clock input and one line
clock output)
■ 1 pps + TOD time signal (RS422 differential level with one line time input)
■ 1 pps + TOD time signal (RS422 differential level with one line time output)
Clock Synchronization
In addition to time synchronization, 1588v2 can be used for clock synchronization. That is, frequency
synchronization can be achieved through 1588v2 packets.
1588v2 time synchronization in Delay or PDelay mode requires the device at one or both ends of a link to
periodically send Sync packets to its peer.
The Sync packet carries a sending timestamp. After receiving the Sync packet, the peer end adds a receiving
timestamp to it. If the link delay is stable, the sending and receiving timestamps change at the same pace. If
the receiving timestamp changes faster or slower than the sending timestamp, the clock on the receiving
device runs faster or slower than the clock on the sending device. In this case, the local clock on the
receiving device must be adjusted to ensure frequency synchronization between the two devices.
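The pace comparison described above can be sketched as follows. The timestamp series is hypothetical, and a real implementation filters many samples:

```python
# Illustrative sketch: estimate the receiver/sender frequency ratio from a
# series of Sync packets. If receive timestamps advance faster than send
# timestamps, the receiving clock runs fast and must be slowed down.

def frequency_ratio(send_ts, recv_ts):
    """Ratio of the receiver clock rate to the sender clock rate."""
    return (recv_ts[-1] - recv_ts[0]) / (send_ts[-1] - send_ts[0])

# The master sends Sync every 1 s; the slave's receive timestamps advance
# by 1.00001 s per interval, i.e. the slave clock is about 10 ppm fast.
send_ts = [0.0, 1.0, 2.0, 3.0]
recv_ts = [5.0, 6.00001, 7.00002, 8.00003]
ppm = (frequency_ratio(send_ts, recv_ts) - 1) * 1e6
print(f"{ppm:.1f} ppm")  # -> 10.0 ppm
```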
Frequency synchronization through 1588v2 packets has a lower precision than that through synchronous
Ethernet. Where possible, you are therefore advised to use synchronous Ethernet to perform clock
synchronization and use 1588v2 to perform time synchronization.
1588v2 frequency synchronization can be implemented in either of the following modes:
• E2E frequency synchronization (Delay variation may occur on the intermediate network.)
In end-to-end mode, the intermediate devices do not need to support 1588v2. This mode only requires
that the delay variation of the forwarding path meet a specified requirement, for example, less than 20
ms. However, the frequency synchronization accuracy in this mode is low and can meet only the
requirements of the G.8261 and wireless base stations (50 ppb) rather than that of the stratum 3 clock
standard.
To achieve high frequency synchronization accuracy, 1588v2 requires Sync packets to be sent at a high rate
of at least 100 pps.
The NE40E is compliant with the following clock standards:
At present, the NE40E supports frequency synchronization through 1588v2 packets only in hop-by-hop mode,
not in E2E or inter-PDV-network mode. Although the NE40E is compliant with G.8261 and G.823/G.824,
compliance with G.813 and G.8262 is not guaranteed.
Offset Introduction
1588v2 and G.8275.1 require that the delays on the transmit and receive paths between the master and slave devices be the same. If the receive and transmit path delays differ, a synchronization error is introduced, which is half of the difference between the two delays. In the hop-by-hop synchronization scenario, whether the receive and transmit path delays between the master and slave devices are the same is determined by the lengths of the receive and transmit fibers.
As shown in Figure 1, fiber asymmetry does not occur if the transmit and receive fibers between the master
and slave devices are routed through the same optical cable and the lengths of pigtails are the same. If the
transmit and receive optical fibers between the master and slave devices are routed through optical cables
of different lengths or the lengths of pigtails are different, fiber asymmetry occurs.
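As a rough worked example of why fiber asymmetry matters, assuming about 5 ns of propagation delay per metre of fiber (a common rule of thumb, not an NE40E specification):

```python
# Rough worked example of the time error caused by fiber asymmetry.
NS_PER_METRE = 5.0  # assumed propagation delay per metre of fiber

def asymmetry_error_ns(tx_fiber_m, rx_fiber_m):
    """Synchronization error = half the transmit/receive delay difference."""
    delay_diff_ns = abs(tx_fiber_m - rx_fiber_m) * NS_PER_METRE
    return delay_diff_ns / 2

# A 100 m length difference between the transmit and receive fibers
# introduces a time error of about 250 ns.
print(asymmetry_error_ns(1100, 1000))  # -> 250.0
```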
Figure 2 Real-time offset monitoring and automatic compensation when GPS is deployed on the base station side
2. The clock interface of each device sends the time information of the device to the base station.
3. The base station calculates the offset values and sends the signaling packets back to each device,
which carry offset information.
4. NCE obtains the offset information received by each clock port from each device in polling mode.
5. NCE determines asymmetric links and offset values on the network based on the offset information
reported by each device.
6. NCE delivers the offset values to the devices at both ends of the asymmetric links.
If the device cannot obtain the GPS time offset information, connect the reference source (such as the BITS
meter or Atom GPS) to the reference port on the device. Then, the device and NCE calculate the time offset
and automatically compensate for it.
1. After the passive port detection function is enabled on the entire network, the device automatically
determines the device with both the slave port and passive port using the BMC algorithm.
2. If the offset value on the passive port is greater than the threshold, an alarm is triggered. Otherwise, the clock network is normal and no offset compensation is required.
3. On the device where the passive port resides, select a reference port that supports 1588
synchronization and connect the reference source to the reference port.
4. Each device receives the time synchronization information delivered from the reference port and
calculates the offset between the restored time of the slave port and the time of the reference port.
5. NCE obtains the offset information fed back by each device and determines the asymmetric links and
offset values on the network.
6. NCE delivers the offset values to the devices at both ends of the asymmetric links.
1. When services are abnormal, select a reference port that supports 1588 synchronization at the end
node of the chain and connect the reference source to the reference port.
2. Each device receives the time synchronization information delivered from the reference port and
calculates the offset between the restored time of the slave port and the time of the reference port.
3. NCE obtains the offset information fed back by each device and determines the asymmetric links and
offset values on the network.
4. NCE delivers the offset values to the devices at both ends of the asymmetric links.
As shown in Figure 1, the clock source can send clock signals to NodeBs through the 1588v2 clock, WAN
clock, synchronous Ethernet clock, or any combination of clocks.
Scenario description:
• GE links on the bearer network support the 1588v2 clock rather than the synchronous Ethernet clock.
Solution description:
• The Synchronous Digital Hierarchy (SDH) or synchronous Ethernet clock sends stratum 3 clock signals
through physical links. On the GE links that do not support the synchronous Ethernet clock, stratum 3
clock signals are transmitted through 1588v2.
• Disadvantage of the solution: Only frequency synchronization rather than time synchronization is performed.
Scenario description:
• The bearer and wireless networks are in the same clock domain.
Solution description:
• All nodes on the bearer network function as BC nodes, which support the link delay measurement
mechanism to handle fast link switching.
• Links or devices that do not support 1588v2 can be connected to devices with GPS or BITS clock
interfaces to perform time synchronization.
• Advantage of the solution: The time of all nodes is synchronous on the entire network.
• Disadvantage of the solution: All nodes on the entire network must support 1588v2.
Figure 3 Networking diagram of the bearer and wireless networks in different clock domains
Scenario description:
Solution description:
• The GPS is used as a time source and is connected to the wireless IP clock server.
• BCs are deployed in the middle of the bearer network to synchronize the time of the intermediate
network.
• TCs are deployed on both ends of the bearer network. TCs only correct the message transmission delay
and send the time to NodeBs, but do not synchronize the time with the clock server.
• Advantage of the solution: The implementation is simple because the bearer network does not need to
synchronize with the clock server.
• Disadvantage of the solution: Devices on both ends of the bearer network need to support 1588v2 in TC and BC mode.
Scenario description:
• The bearer and wireless networks are in the same clock domain.
Solution description:
• Network-wide time synchronization is achieved from the core node in T-BC mode. All T-BC nodes
support path delay measurement to adapt to fast link switching.
• The advantage of the solution is that the network-wide time is synchronized to ensure the optimal
tracing path.
• The disadvantage of the solution is that all nodes on the network need to support 1588v2 and G.8275.1.
As shown in Figure 5, the clock server and the base station transmit TOP-encapsulated SMPTE-2059-2
packets over a bearer network enabled with QoS assurance (jitter < 20 ms).
Scenario Description
• The bearer network does not support SMPTE-2059-2 or the use of SyncE to restore frequency.
Solution Description
• Bearer network devices are connected to the wireless IP clock server, and SMPTE-2059-2 is used to
transmit and restore clock in E2E mode.
• The clock server sends timing messages in the SMPTE-2059-2 format. The bearer network transparently
transmits the timing messages. Upon receipt of the timing messages, NodeBs restore clock information.
• SMPTE-2059-2 packets are transparently transmitted over the bearer network by priority to ensure an
E2E jitter of less than 20 ms.
• Solution advantage: This solution is simple, with no need for bearer network devices to support SMPTE-2059-2.
• Solution disadvantages: Only frequency synchronization rather than time synchronization is supported.
An E2E jitter of 20 ms is hard to guarantee.
Terms

Synchronization: On a modern communications network, in most cases, the proper functioning of telecommunications services requires network clock synchronization, meaning that the frequency offset or time difference between devices must be kept in an acceptable range. Network clock synchronization includes time synchronization and frequency synchronization.

Time synchronization: Time synchronization, also called phase synchronization, refers to the consistency of both frequencies and phases between signals. This means that the phase offset between signals is always 0.

Frequency synchronization: Frequency synchronization, also called clock synchronization, refers to a strict relationship between signals based on a constant frequency offset or a constant phase offset, in which signals are sent and received at the same average rate over time. In this manner, all devices on the communications network operate at the same rate. That is, the phase difference between signals remains a fixed value.

IEEE 1588v2 PTP: 1588v2, defined by the Institute of Electrical and Electronics Engineers (IEEE), is a standard entitled Precision Clock Synchronization Protocol for Networked Measurement and Control Systems. It is also called the Precision Time Protocol (PTP).

Clock domain: Logically, a physical network can be divided into multiple clock domains. Each clock domain has a reference time, with which all devices in the domain are synchronized. Different clock domains have their own reference time, independent of each other.

Clock node: Each node on a time synchronization network is a clock. The 1588v2 protocol defines three types of clocks: OC, BC, and TC.

Clock reference source: Clock reference source selection is a method to select reference clocks based on the clock selection algorithm.

One-step mode: In one-step mode, Sync messages in Delay mode and PDelay_Resp messages in PDelay mode are stamped with the time when the messages are sent.

Two-step mode: In two-step mode, Sync messages in Delay mode and PDelay_Resp messages in PDelay mode only record the time when the messages are sent and carry no timestamps. The timestamps are carried in subsequent messages, such as Follow_Up and PDelay_Resp_Follow_Up messages.
Abbreviations
WiMax FDD Worldwide Interoperability for Microwave Access Frequency Division Duplex
WiMax TDD Worldwide Interoperability for Microwave Access Time Division Duplex
BC Boundary Clock
OC Ordinary Clock
TC Transparent Clock
Definition
1588 Adaptive Time Recovery (ATR) is a PTP-based technology that allows Routers to establish clock links and implement time synchronization over a third-party network using PTP packets in Layer 3 unicast mode.
1588 ATR is an advancement over 1588v2, which requires 1588v2 support on all network devices.
1588 ATR is a client/server protocol through which servers communicate with clients to achieve time
synchronization.
When the time server (such as the SSU2000) supports only the 1588v2 unicast negotiation mode, the client
sends a negotiation request to the server, and the server sends time synchronization packets to the client
after the negotiation is established. The client is configured with the 1588 ATR hop-by-hop mode and
interconnected with the time server to achieve time synchronization in 1588v2 unicast negotiation mode.
After that, the client can function as a BC to provide time synchronization for downstream NodeBs.
Purpose
1588v2 is a software-based technology used to achieve frequency and time synchronization and can support
hardware timestamping to provide greater accuracy. However, 1588v2 requires support from all devices on
the live network.
To address this disadvantage, 1588 ATR is introduced to allow time synchronization over a third-party
network that includes 1588v2-incapable devices. On the live network, 1588v2 is preferred for 1588v2-
capable devices, and 1588 ATR is used when 1588v2-incapable devices exist.
Benefits
This feature offers the following benefits to carriers:
• Does not require 1588v2 to be supported by all network devices, reducing network construction costs.
• Suits more network applications with time synchronization requirements.
Features Supported
The 1588 ATR features supported by NE40Es are as follows:
• An NE40E that functions as a 1588 ATR server can synchronize time information with upstream devices
using the BITS source and transmit time information to downstream devices.
• An NE40E that functions as a 1588 ATR server can synchronize time information with upstream devices
using 1588v2/G.8275.1 and transmit time information to downstream devices.
• The NE40E functioning as a 1588 ATR client supports time synchronization with upstream and
downstream devices in 1588 ATR hop-by-hop mode.
An NE40E can function only as the 1588 ATR server. The following restrictions apply to network deployment:
• When 1588 ATR is used to implement time synchronization over a third-party network, reduce the packet delay
variation (PDV) and the number of devices on the third-party network as much as possible in order to ensure time
synchronization performance on clients. For details, see performance specifications for clients.
• The server and client communicate with each other through PTP packets which can be either Layer 3 IP packets or
single-VLAN-tagged packets. The PTP packets cannot carry two VLAN tags or the MPLS label.
• The interface used to send PTP packets on the server needs to support 1588v2.
The NE40E supports the 1588 ATR client. Network deployment has the following restrictions:
• When the 1588 ATR client in hop-by-hop mode is interconnected with the time source in 1588v2 unicast
negotiation mode, the NE40E must be directly connected to the time source.
Synchronization Process
After negotiation is complete, 1588 ATR servers exchange PTP packets with clients to implement time
synchronization.
1588 ATR clock synchronization is implemented in two-way mode.
3. The client sends a 1588 Delay_Req packet carrying timestamp t3 to the server.
4. The server receives the 1588 Delay_Req packet at timepoint t4, and then generates a Delay_Resp packet and sends it to the client.
The round-trip latency of the link between the server and client is (t4 - t1) - (t3 - t2). 1588 ATR requires that the forward and reverse paths of the same round trip have the same latency. On this assumption, the offset of the client relative to the server is [(t2 - t1) - (t4 - t3)]/2. The client then uses the calculation result to adjust its local time.
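Using the four timestamps above, the calculation can be sketched as follows (a minimal illustration with hypothetical values, not the NE40E implementation):

```python
# Illustrative sketch of the 1588 ATR two-way calculation (hypothetical values).
# t1: server sends Sync          (server clock)
# t2: client receives Sync       (client clock)
# t3: client sends Delay_Req     (client clock)
# t4: server receives Delay_Req  (server clock)

def round_trip_delay(t1, t2, t3, t4):
    """Round-trip latency between server and client."""
    return (t4 - t1) - (t3 - t2)

def client_offset(t1, t2, t3, t4):
    """Client time minus server time, assuming symmetric path delays."""
    return ((t2 - t1) - (t4 - t3)) / 2

# Path delay of 100 us in each direction; the client clock is 40 us ahead.
t1, t2, t3, t4 = 0.0, 140e-6, 500e-6, 560e-6
print(round_trip_delay(t1, t2, t3, t4))  # ~200 us round trip
print(client_offset(t1, t2, t3, t4))     # ~40 us offset
```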
Duration Mechanism
A 1588 ATR client supports the duration specified in Announce, Sync, and Delay_Resp packets. The duration can be placed in the TLV field of Signaling packets before they are sent to the server.
In normal situations, a client initiates a re-negotiation with the server before the duration expires so that the server can continue providing synchronization for the client.
If a client goes Down, it cannot initiate a re-negotiation. After the duration expires, the server no longer sends synchronization packets to the client.
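The lease-style behavior described above can be sketched as follows. The duration and renewal margin are assumed values for illustration, not NE40E defaults:

```python
# Illustrative sketch of the duration (lease) mechanism: the client
# re-negotiates before the granted duration expires; if it goes Down and
# stops renewing, the server stops sending synchronization packets.

GRANTED_DURATION = 300.0   # seconds of synchronization granted (assumed value)
RENEW_MARGIN = 60.0        # client renews this long before expiry (assumed)

def next_renewal_time(granted_at):
    """Time at which a healthy client re-initiates negotiation."""
    return granted_at + GRANTED_DURATION - RENEW_MARGIN

def lease_expired(granted_at, now):
    """True once the server stops sending synchronization packets."""
    return now >= granted_at + GRANTED_DURATION

# A client granted a lease at t=0 renews at t=240; if it goes Down and
# never renews, the server stops serving it at t=300.
assert next_renewal_time(0.0) == 240.0
assert not lease_expired(0.0, 299.0)
assert lease_expired(0.0, 300.0)
```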
Per-hop BC + Server
1588 ATR servers can synchronize time with upstream devices and send the time source information to clients.
Scenario Description
• Third-party networks (such as microwave and switch networks) do not support 1588v2 time
synchronization.
Solution Description
• Configure 1588 ATR or an external Atom GPS timing module on the client to implement time
synchronization across third-party networks. BCs support the 1588 ATR server function. After
synchronizing time with an upstream device, a BC can function as an ATR server to provide the time
synchronization service for downstream NodeBs. A client can receive time synchronization information
through the ATOM GPS timing module or implement 1588 ATR time synchronization through
transparent transmission.
Scenario Description
• The time server (for example, the SSU2000) only supports the 1588v2 unicast negotiation mode.
• A client first sends a negotiation request to the server, which sends time synchronization packets back
to the client only after the negotiation relationship is established.
Solution Description
• You can configure the 1588 ATR hop-by-hop mode to interconnect the client with the time server in
order to implement time synchronization in 1588v2 unicast negotiation mode. The client then functions
as a BC to provide the time synchronization service for downstream NodeBs.
Scenario Description
Solution Description
Lightweight clocks cannot be used in mobile backhaul scenarios, because lightweight time synchronization cannot
meet base station performance requirements.
Server-and-Client Mode
If the time node where the high-precision time source resides and the router close to base stations belong to
different VPNs, the interconnection device between the two VPNs needs to serve as a client to synchronize
time with the time source and as a server to provide the time service for the router close to base stations.
A device configured with the server-and-client mode is called a T-BC, which involves two important concepts:
• master-only vport: The master-only vport on a T-BC is always in the master state and outputs time
source information to the downstream device. It is usually used on an NE where multiple rings intersect.
The master-only vport outputs time information to the lower-layer network. It can also be used on an
NE connected to base stations to provide time information for base stations.
• vport: The status of the vport on a T-BC is not fixed. It is usually used on an NE where multiple rings
intersect. The NE uses the vport BMCA algorithm to implement ring network protection.
Terms

Synchronization: Most telecommunication services running on a modern communications network require network-wide synchronization. Synchronization means that the frequency offset or time difference between devices must remain in a specified range. Clock synchronization is categorized as frequency synchronization or time synchronization.

Time synchronization: Time synchronization, also known as phase synchronization, refers to the consistency of both frequencies and phases between signals. That is, the phase offset between signals is always 0.

Frequency synchronization: Frequency synchronization, also known as clock synchronization, refers to the strict relationship between signals based on a constant frequency offset or phase offset, in which signals are sent and received at the same average rate over time. In this manner, all devices in the communications network operate at the same rate. That is, the phase difference between signals is a constant value.

IEEE 1588v2 PTP: A standard entitled Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, defined by the Institute of Electrical and Electronics Engineers (IEEE). It is also called the Precision Time Protocol (PTP).

ITU-T G.8275.2: G.8275.2 defines the main protocols of 1588 ATR. Therefore, G.8275.2 usually refers to the 1588 ATR feature.
Background
As the commercialization of LTE-TDD and LTE-A accelerates, there is a growing need for time
synchronization on base stations. Traditionally, the GPS and PTP solutions were used on base stations to
implement time synchronization.
The GPS solution requires a GPS antenna to be deployed on each base station, leading to high TCO. The PTP solution requires 1588v2 support on network-wide devices, resulting in huge network reconstruction costs for carriers.
Furthermore, GPS antennas can properly receive data from GPS satellites only when they are placed outdoors and meet installation angle requirements. For indoor deployment, long feeders must be routed through walls, and site selection requires careful consideration due to demanding lightning protection requirements. These disadvantages lead to high TCO and make GPS antenna deployment challenging for indoor devices. Another weakness is that most indoor equipment rooms are leased, which imposes strict requirements on coaxial cables penetrating walls and involves a complex application procedure. For example, for security reasons, laws and regulations in Japan specify that radio frequency (RF) cables are not allowed to be routed into rooms through walls.
To address the preceding challenges, the Atom GPS timing system is introduced to NE40Es. Specifically, an Atom GPS module, which is comparable to a lightweight BITS device, is inserted into an NE40E to provide GPS
access to the bearer network. Upon receipt of GPS clock signals, the Atom GPS module converts them into
SyncE signals and then sends the SyncE signals to NE40Es. Upon receipt of GPS time signals, the Atom GPS
module converts them into 1588v2 signals and then sends the 1588v2 signals to base stations. This mechanism greatly reduces the TCO for carriers.
Benefits
This feature offers the following benefits to carriers:
• For newly created time synchronization networks, the Atom GPS timing system reduces the deployment
costs by 80% compared to traditional time synchronization solutions.
• For the expanded time synchronization networks, the Atom GPS timing system can reuse the legacy
network to protect investment.
4.17.2.1 Modules
The Atom GPS timing system includes two types of modules: Atom GPS modules and clock/time processing
modules on Routers.
Related Modules
Figure 1 GPS timing
Atom GPS timing involves two modules: the Atom GPS timing module and the clock/time processing module on the Router.
• GPS receiver: processes GPS RF signals and obtains frequency and time information from the GPS RF
signals.
■ Frequency PLL: locks the 1PPS reference clocks and outputs a high-frequency clock.
■ Analog PLL (APLL): multiplies the system clock to a higher frequency clock.
■ Time PLL: locks the UTC time and outputs the system time.
• Real-time clock (RTC): provides real-time timestamps for PTP event messages.
• PTP grandmaster (GM): functions as the SyncE slave to obtain SyncE clock data.
• PTP BC: This module typically functions as a slave BC to process PTP messages and extract PTP
information.
2. The Atom GPS module uses a built-in frequency PLL module to trace and lock 1PPS phase and
frequency and output the system clock.
3. The Atom GPS module uses a built-in APLL module to multiply the system clock to a clock at GE rate
which is then used as the SyncE transmit clock.
4. The device uses the GE interface to obtain SyncE clock signals from the Atom GPS module and
transmits the clock signals to downstream devices.
2. The Atom GPS module uses a built-in time PLL module to trace and lock the UTC time and output the system time.
3. The Atom GPS module uses a built-in time RTC module to obtain the system time.
4. The Atom GPS module uses a built-in PTP GM module to process PTP messages. The timestamps
carried in PTP event messages are generated by the RTC module.
5. The device uses the GE interface to obtain PTP time signals from the Atom GPS module and transmits
the time signals to downstream devices.
Terms
Synchronization: Most telecommunication services running on a modern communications network require network-wide synchronization. Synchronization means that the frequency offset or time difference between devices must remain in a specified range. Clock synchronization is categorized as frequency synchronization or time synchronization.

Time synchronization: Time synchronization, also known as phase synchronization, refers to the consistency of both frequencies and phases between signals. That is, the phase offset between signals is always 0.

Frequency synchronization: Frequency synchronization, also known as clock synchronization, refers to the strict relationship between signals based on a constant frequency offset or phase offset, in which signals are sent and received at the same average rate over time. In this manner, all devices in the communications network operate at the same rate. That is, the phase difference between signals is a constant value.

IEEE 1588v2 PTP: A standard entitled Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, defined by the Institute of Electrical and Electronics Engineers (IEEE). It is also called the Precision Time Protocol (PTP).
Background
As the commercialization of LTE-TDD and LTE-A accelerates, there is a growing need for time
synchronization on base stations. Traditionally, the GNSS (GPS/GLONASS/Beidou) and PTP solutions were
used on base stations to implement time synchronization.
The GNSS solution requires a GNSS antenna to be deployed on each base station, leading to high TCO. The PTP solution requires 1588v2 support on network-wide devices, resulting in huge network reconstruction costs for carriers.
Furthermore, GNSS antennas can properly receive data from GNSS satellites only when they are placed outdoors and meet installation angle requirements. For indoor deployment, long feeders must be routed through walls, and site selection requires careful consideration due to demanding lightning protection requirements. These disadvantages lead to high TCO and make GNSS antenna deployment challenging for indoor devices. Another weakness is that most indoor equipment rooms are leased, which imposes strict requirements on coaxial cables penetrating walls and involves a complex application procedure. For example, for security reasons, laws and regulations in Japan specify that radio frequency (RF) cables are not allowed to be routed into rooms through walls.
To address the preceding challenges, the Atom GNSS timing system is introduced to NE40Es. Specifically, an Atom GNSS module, which is comparable to a lightweight BITS device, is inserted into an NE40E to provide GNSS access to the bearer network. Upon receipt of GNSS clock signals, the Atom GNSS module converts
them into SyncE signals and then sends the SyncE signals to NE40Es. Upon receipt of GNSS time signals, the
Atom GNSS module converts them into 1588v2 signals and then sends the 1588v2 signals to base stations.
This mechanism greatly reduces the TCO for carriers.
Benefits
This feature offers the following benefits to carriers:
• For newly created time synchronization networks, the Atom GNSS timing system reduces the
deployment costs by 80% compared to traditional time synchronization solutions.
• For the expanded time synchronization networks, the Atom GNSS timing system can reuse the legacy
network to protect investment.
4.18.2.1 Modules
The Atom GNSS timing system includes two types of modules: Atom GNSS modules and clock/time
processing modules on Routers.
Related Modules
Figure 1 Atom GNSS timing
Atom GNSS timing involves two modules: Atom GNSS timing module and clock/time processing module on
the Router.
• GNSS receiver: processes GNSS RF signals and obtains frequency and time information from the GNSS
RF signals.
■ Frequency PLL: locks the 1PPS reference clocks and outputs a high-frequency clock.
■ Analog PLL (APLL): multiplies the system clock to a higher frequency clock.
■ Time PLL: locks the UTC time and outputs the system time.
• Real-time clock (RTC): provides real-time timestamps for PTP event messages.
• PTP grandmaster (GM): functions as the SyncE slave to obtain SyncE clock data.
• PTP BC: This module typically functions as a slave BC to process PTP messages and extract PTP
information.
2. The Atom GNSS module uses a built-in frequency PLL module to trace and lock 1PPS phase and
frequency and output the system clock.
3. The Atom GNSS module uses a built-in APLL module to multiply the system clock to a clock at GE rate
which is then used as the SyncE transmit clock.
4. The device uses the GE interface to obtain SyncE clock signals from the Atom GNSS module and
transmits the clock signals to downstream devices.
2. The Atom GNSS module uses a built-in time PLL module to trace and lock the UTC time and output the system time.
3. The Atom GNSS module uses a built-in time RTC module to obtain the system time.
4. The Atom GNSS module uses a built-in PTP GM module to process PTP messages. The timestamps
carried in PTP event messages are generated by the RTC module.
5. The device uses the GE interface to obtain PTP time signals from the Atom GNSS module and
transmits the time signals to downstream devices.
Terms
Synchronization: Most telecommunication services running on a modern communications network require network-wide synchronization. Synchronization means that the frequency offset or time difference between devices must remain within a specified range. Clock synchronization is categorized as frequency synchronization or time synchronization.
Time synchronization: Also known as phase synchronization, time synchronization refers to the consistency of both frequencies and phases between signals. That is, the phase offset between signals is always 0.
Frequency synchronization: Also known as clock synchronization, frequency synchronization refers to the strict relationship between signals based on a constant frequency or phase offset, in which signals are sent and received at the same average rate at any moment. In this manner, all devices in the communications network operate at the same rate. That is, the phase difference between signals is a constant value.
IEEE 1588v2 PTP: A standard entitled Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, defined by the Institute of Electrical and Electronics Engineers (IEEE).
The Network Time Protocol (NTP) is supported only by a physical system (PS).
Definition
The Network Time Protocol (NTP) is an application layer protocol in the TCP/IP protocol suite. NTP
synchronizes the time among a set of distributed time servers and clients. NTP is built on the Internet
Protocol (IP) and User Datagram Protocol (UDP). NTP messages are transmitted over UDP, using port 123.
NTP evolved from the Time Protocol and the ICMP Timestamp message, but is specifically designed to
maintain time accuracy and robustness.
Purpose
In the NTP model, a number of primary reference sources, synchronized to national standards by wire or
radio, are connected to widely accessible resources, such as backbone gateways. These gateways act as
primary time servers. The purpose of NTP is to convey timekeeping information from these primary time
servers to other time servers (secondary time servers). Secondary time servers are synchronized to the
primary time servers. The servers are connected in a logical hierarchy called a synchronization subnet. Each
level of the synchronization subnet is called a stratum. For example, the primary time servers are stratum 1,
and the secondary time servers are stratum 2. Servers with larger stratum numbers are more likely to have
less accurate clocks than those with smaller stratum numbers.
When multiple time servers exist on a network, a clock selection algorithm is used to synchronize the stratums and time
offsets of the time servers. This helps improve local clock precision.
There is no provision for peer discovery or virtual-circuit management in NTP. Duplicate detection is
implemented using processing algorithms.
Implementation
Figure 1 illustrates the process of implementing NTP. Device A and Device B are connected through a wide
area network (WAN). They both have independent system clocks that are synchronized through NTP.
In the following example:
• Before Device A synchronizes its system clock to Device B, the clock of Device A is 10:00:00 am and the
clock of Device B is 11:00:00 am.
• Device B functions as an NTP server, and Device A must synchronize its clock signals with Device B.
1. Device A sends an NTP packet to Device B. When the packet leaves Device A, it carries a timestamp of
10:00:00 a.m. (T1).
2. When the NTP packet reaches Device B, Device B adds a receive timestamp of 11:00:01 a.m. (T2) to
the packet.
3. When the NTP packet leaves Device B, Device B adds a transmit timestamp of 11:00:02 a.m. (T3) to
the packet.
4. When Device A receives the response packet, it adds a new receive timestamp of 10:00:03 a.m. (T4) to
the packet.
Device A uses the received information to calculate the following important values:
• Roundtrip delay for the NTP packet: Delay = (T4 - T1) - (T3 - T2).
• Relative offset between Device A and Device B clocks: Offset = [(T2 - T1) + (T3 - T4)]/2.
According to the delay and the offset, Device A re-sets its own clock to synchronize with the clock of
Device B.
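The delay and offset calculation above can be checked with a short sketch that plugs in the four timestamps from the example (the function name is illustrative):

```python
from datetime import datetime, timedelta

def ntp_delay_offset(t1, t2, t3, t4):
    """Round-trip delay and clock offset from the four NTP timestamps."""
    delay = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return delay, offset

# Timestamps from the example above
t1 = datetime(2022, 7, 8, 10, 0, 0)   # request leaves Device A
t2 = datetime(2022, 7, 8, 11, 0, 1)   # request reaches Device B
t3 = datetime(2022, 7, 8, 11, 0, 2)   # response leaves Device B
t4 = datetime(2022, 7, 8, 10, 0, 3)   # response reaches Device A

delay, offset = ntp_delay_offset(t1, t2, t3, t4)
print(delay, offset)  # 0:00:02 1:00:00 -> Device A is one hour behind Device B
```

The two seconds of round-trip delay cancel out of the offset estimate, which is why the one-hour difference is recovered exactly.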
• Transmit process
• Receive process
• Update process
These processes share a database and are interconnected through a message-transfer system.
When the client has multiple peers, its database is divided into several parts, with each part dedicated to a
peer.
Figure 1 shows the NTP implementation model.
Transmit Process
The transmit process, controlled by each timer for peers, collects information in the database and sends NTP
messages to the peers.
Each NTP message contains a local timestamp marking when the message is sent or received and other
information necessary to determine a clock stratum and manage the association. The rate at which
messages are sent is determined by the precision required by the local clock and its peers.
Receive Process
The receive process receives messages, including NTP messages and other protocol messages, as well as
information sent by directly connected radio clocks.
When receiving an NTP message, the receive process calculates the offset between the peer and local clocks
and incorporates it into the database along with other information that is useful for locating errors and
selecting peers.
Update Process
The update process handles the offset of each peer after receiving NTP response messages and selects the
most precise peer using a specific selection algorithm.
This process may involve either many observations of few peers or a few observations of many peers,
depending on the accuracy.
The functions of the primary and secondary time servers are as follows:
• A primary time server is directly synchronized to a primary reference source, usually a radio clock or
global positioning system (GPS).
• A secondary time server is synchronized to another secondary time server or a primary time server.
Secondary time servers use NTP to send time information to other hosts in a Local Area Network (LAN).
When there is no fault, primary and secondary servers in the synchronization subnet assume a hierarchical
master-slave structure, with the primary servers at the root and secondary servers at successive stratums
toward the leaf nodes. The larger the stratum number, the less precise the clock (stratum 1 being the most
accurate). As the stratum number increases from one, the clock sample accuracy gradually decreases,
depending on the network paths and local-clock stability. To avoid the tedious calculations needed to
estimate errors in each specific configuration, it is useful to calculate proportionate errors. Proportionate
errors are approximate and based on the delay and dispersion relative to the root of the synchronization subnet.
This design helps the synchronization subnet automatically reconfigure its hierarchical master-slave
structure to produce the most accurate and reliable time, even when one or more primary or secondary
servers or the network paths in the subnet fail. If all primary servers fail, one or more backup primary servers
continue operations. If all primary servers over the subnet fail, the remaining secondary servers then
synchronize among themselves. In this case, distances reach upwards to a pre-selected maximum "infinity".
Upon reaching the maximum distance to all paths, a server drops off the subnet and runs freely based on its
previously calculated time and frequency. The timekeeping errors of a Device having a stabilized oscillator
are not more than a few milliseconds per day as these computations are expected to be very precise,
especially in terms of frequency.
In the case of multiple primary servers, a specific selection algorithm is used to select the server at a
minimum synchronization distance. When these servers are at approximately the same synchronization
distance, they may be selected randomly.
• Random selection does not reduce accuracy when the offset between the primary servers is less than the
synchronization distance.
• When the offset between the primary servers is greater than the synchronization distance, filtering and
selection algorithms are used to select the best servers available and discard the others.
Leap Indicator (LI): 2 bits. Warns of an impending leap second to be inserted or deleted in the last minute
of the current day. Bits 0 and 1 are coded as follows:
00: no warning.
01: last minute has 61 seconds.
10: last minute has 59 seconds.
11: alarm condition (clock not synchronized).
Root Delay: 32 bits. Roundtrip delay (in ms) between the client and the primary reference source.
Reference Timestamp: 64 bits. Local time at which the local clock was last set or corrected. The value 0
indicates that the local clock has never been synchronized.
Originate Timestamp: 64 bits. Local time at which the NTP request is sent by the client.
Receive Timestamp: 64 bits. Local time at which the request arrives at the time server.
Transmit Timestamp: 64 bits. Local time at which the response message is sent by the time server to the client.
• Peer mode
• Client/server mode
• Broadcast mode
• Multicast mode
• Manycast mode
Peer Mode
In peer mode, the active and passive ends can be synchronized. The end with a lower stratum (larger
stratum number) is synchronized to the end with a higher stratum (smaller stratum number).
• Symmetric active: A host operating in this mode periodically sends messages regardless of the
reachability or stratum of its peer. The host announces its willingness to synchronize and be
synchronized by its peer.
The symmetric active end is a time server close to the leaf node in the synchronization subnet. It has a
low stratum (large stratum number). In this mode, time synchronization is reliable. A peer is configured
on the same stratum and two peers are configured on the stratum one level higher (one stratum
number smaller). In this case, synchronization poll frequency is not important. Even when error packets
are returned because of connection failures, the local clocks are not significantly affected.
• Symmetric passive: A host operating in this mode receives packets and responds to its peer. The host
announces its willingness to synchronize and be synchronized by its peer.
■ The host receives messages from a peer operating in the symmetric active mode.
The host operating in the symmetric passive mode is at a low stratum in the synchronization subnet. It does
not need to know the feature of the peer. A connection between peers is set up and status variables must be
updated only when the symmetric passive end receives NTP messages from the peer.
In NTP peer mode, the active end functions as a client and the passive end functions as a server.
Client/Server Mode
• Client: A host operating in this mode periodically sends messages regardless of the reachability or
stratum of the server. The host synchronizes its clock with that on the server but does not alter the
clock on the server.
• Server: A host operating in this mode receives packets and responds to the client. The host provides
synchronization information for all its clients but does not alter its own clock.
A host operating in the client mode periodically sends NTP messages to a server during and after its restart.
The server does not need to retain state information when the client sends the request. The client freely
manages the interval for sending packets according to actual conditions.
Kiss-o'-Death (KOD) packets provide useful information to a client and are used for status reporting and
access control. When KOD is enabled on the server, the server can send packets with kiss codes DENY and
RATE to the client.
• After the client receives a packet with kiss code DENY, the client demobilizes any associations with that
server and stops sending packets to that server.
• After the client receives a packet with kiss code RATE, the client immediately reduces its polling interval
to that of the server and continues to reduce it each time it receives a RATE kiss code.
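The client/server exchange described above can be sketched with a minimal client-mode request. The packet layout below follows the standard 48-byte NTP header (LI, VN, and Mode packed into the first byte; the Transmit Timestamp in bytes 40-47); the helper names are illustrative, and the network query assumes a reachable NTP server:

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01)
NTP_EPOCH_OFFSET = 2208988800

def build_request():
    """Build a 48-byte client request: LI = 0, VN = 4, Mode = 3 in the first byte."""
    return bytes([0b00100011]) + 47 * b'\x00'

def parse_transmit_time(packet):
    """Extract the whole-seconds part of the Transmit Timestamp (bytes 40-43) as Unix time."""
    seconds, = struct.unpack('!I', packet[40:44])
    return seconds - NTP_EPOCH_OFFSET

def sntp_query(server, port=123, timeout=5.0):
    """Send one mode-3 request over UDP port 123 and return the server's transmit time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, port))
        reply, _ = sock.recvfrom(512)
    return parse_transmit_time(reply)
```

The server keeps no state for this exchange, matching the description above: it simply fills in its timestamps and returns a mode-4 packet to the client's address.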
Broadcast Mode
• A host operating in broadcast mode periodically sends clock-synchronization packets to the broadcast
IPv4 address regardless of the reachability or stratum of the clients. The host provides synchronization
information for all its clients but does not alter its own clock.
• A client listens to the broadcast packets sent by the server. When receiving the first broadcast packet,
the client temporarily starts in the client/server mode to exchange packets with the server. This allows
the client to estimate the network delay. The client then reverts to the broadcast mode, continues to
listen to the broadcast packets, and re-synchronizes the local clock based on the received broadcast
packets.
Broadcast mode is intended for LANs with multiple workstations where the highest accuracy is not required.
In a typical scenario, one or more time servers in a LAN periodically send broadcast packets to the
workstations, and the LAN packet transmission delay is only a few milliseconds.
If multiple time servers are available to enhance reliability, a clock selection algorithm is useful.
Multicast Mode
• A host operating in the multicast mode periodically sends clock-synchronization packets to a multicast
IPv4/IPv6 address. The host is usually a time server using high-speed multicast media in a LAN. The host
provides synchronization information for all its peers but does not alter its own clock.
• A client listens to multicast packets sent by the server. After receiving the first multicast packet, the
client temporarily starts in the client/server mode to exchange packets with the server. This allows the
client to estimate the network delay. The client then reverts to the multicast mode, continues to listen
to the multicast packets, and re-synchronizes the local clock based on the received multicast packets.
Manycast Mode
• A client operating in manycast mode sends periodic request packets to a designated IPv4 or IPv6
multicast address in order to search for a minimum number of associations. It starts with a time to live
(TTL) value of one and increments the TTL by one until the minimum number of associations is made or
the TTL reaches a maximum value. If the TTL reaches its maximum value and not enough associations
have been mobilized, the client stops transmission for a timeout period to clear all associations, and
then repeats the search process. If a minimum number of associations has been mobilized, the client
then transmits one packet per timeout period to maintain the associations.
• A designated manycast server within range of the TTL field in the packet header listens for packets with
that address. If a server is suitable for synchronization, it returns an ordinary server (mode 4) packet
using the client's unicast address.
Manycast mode is applied to a small set of servers scattered over a network. Clients can discover and
synchronize to the closest manycast server. Manycast can especially be used where the identity of the server
is not fixed and a change of server does not require reconfiguration of all the clients on the network.
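The expanding-TTL search described above can be sketched as follows; send_probe and the function name are illustrative stand-ins (in a real client, associations are mobilized from server replies rather than returned by a callback):

```python
def manycast_search(send_probe, min_assoc, max_ttl):
    """Expanding-ring search sketch: probe with increasing TTL until at least
    min_assoc associations are mobilized, or max_ttl is reached.
    send_probe(ttl) returns the number of new associations mobilized at that TTL."""
    associations = 0
    for ttl in range(1, max_ttl + 1):
        associations += send_probe(ttl)
        if associations >= min_assoc:
            break
    # If still short of min_assoc, the caller waits a timeout period, clears
    # all associations, and repeats the search.
    return associations
```

Starting at TTL 1 keeps the search local, so clients naturally discover and synchronize to the closest manycast server.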
NTP Operation
• A host operating in an active mode (symmetric active, client or broadcast mode) must be configured.
• Its peer operating in a passive mode (symmetric passive or server mode) requires no pre-configuration.
An error occurs when the host and its peer operate in the same mode. In such a case, one ignores messages
sent by the other, and their associations are then dissolved.
Transmit Process
In all modes (except the client mode with a broadcast server and the server mode), the transmit process
starts when the peer timer expires. In the client mode with a broadcast server, messages are never sent. In
the server mode, messages are sent only in response to received messages. This process is also invoked by
the receive process when the received NTP message does not result in a local persistent association. To
ensure a valid response, the transmit timestamp must be added to packets to be sent. Therefore, the values
of variables carried in the response packet must be accurately saved.
Broadcast and multicast servers that are not synchronized will start the transmit process when the peer
timer expires.
Receive Process
The receive process starts when an NTP message arrives. First, it checks the mode field in the packet. Value 0
indicates that the peer runs an earlier NTP version. If the version number in the packet matches the current
version, the receive process continues with the following steps. If the version numbers do not match, the
packet is discarded, and the association (if not pre-configured) is dissolved. The receive process then varies
according to the result of combining the local and remote clock modes:
• If both the local and remote hosts are operating in client mode, an error occurs, and the packet is
discarded.
• If the result is recv, the packet is processed, and the association is marked reachable if the received
packet contains a valid header. In addition, if the received packet contains valid data, the clock-update
process is called to update the local clock. If the association was not previously configured, it is
dissolved.
• If the result is xmit, the packet is processed, and an immediate response packet is sent. The association
is then dissolved if it is not pre-configured.
• If the result is pkt, the packet is processed, and the association is marked reachable if the received
packet contains a valid header. In addition, if the received packet contains valid data, the clock-update
process is called to update the local clock. If the association was not pre-configured, an immediate reply
is sent, and the association is dissolved.
Packet Process
The packet process checks message validity, calculates delay/offset samples, and invokes other processes to
filter data and select a reference source. First, the transmit timestamp must be different from the transmit
timestamp in the last message. If the transmit timestamps are the same, the message may be an outdated
duplicate.
Second, the originate timestamp must match the last message sent to the same peer. If a mismatch occurs,
the message may be out of order, forged, or defective.
Lastly, the packet process uses a clock selection algorithm to select the best clock sample from the specified
clocks or clock groups at different stratums. The delay (peer delay), offset (peer offset), and dispersion (peer
dispersion) for the peer are all determined.
Clock-Update Process
After the offset, delay, and dispersion of the valid clock are determined by the clock-filter process, the clock-
selection process invokes the clock-update process. The result of the clock-selection and clock-combining
processes is the final clock correction value. The local-clock process updates the local clock with this value.
If no reference source is found after these processes, the clock-update process performs no further
operation.
The clock-selection process is then invoked. It contains two algorithms: intersection and clustering.
• The intersection algorithm generates a list of candidate peers suitable to be the reference source and
calculates a confidence interval for each peer. It discards falsetickers using a technique adopted from
Marzullo and Owicki [MAR85].
• The clustering algorithm orders the list of remaining candidates based on their stratums and
synchronization distances. It repeatedly discards outlier peers based on the dispersion until only the
most accurate, precise, and stable candidates remain.
If the offsets, delays, and dispersions of the candidate peers are almost identical, the candidates are first
combined to analyze the clock situation. The parameters determined through this comprehensive analysis
are then provided to the local end to update the local clock.
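The intersection step can be illustrated with Marzullo's algorithm from [MAR85]: each candidate peer contributes a confidence interval around its offset, and the algorithm finds the range contained in the largest number of intervals, so falsetickers that fall outside it are discarded. This is a minimal sketch under those assumptions, not the device's actual implementation:

```python
def marzullo(intervals):
    """Return (count, (low, high)): the sub-interval agreed on by the largest
    number of sources. intervals is a list of (low, high) confidence intervals."""
    # Each interval contributes an opening edge (-1) and a closing edge (+1).
    edges = sorted([(lo, -1) for lo, hi in intervals] +
                   [(hi, +1) for lo, hi in intervals])
    best, count = 0, 0
    best_interval = None
    for i, (value, kind) in enumerate(edges):
        count -= kind  # +1 when an interval opens, -1 when one closes
        if count > best and i + 1 < len(edges):
            best = count
            # The agreed range runs from this opening edge to the next edge.
            best_interval = (value, edges[i + 1][0])
    return best, best_interval

print(marzullo([(8, 12), (11, 13), (10, 12), (0, 1)]))  # (3, (11, 12)) -> (0, 1) is discarded
```

Here three of the four sources agree on the range 11-12, so the outlier interval (0, 1) is rejected as a falseticker.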
Static Associations
Static associations are set up using commands.
Dynamic Associations
Dynamic associations are set up when an NTP packet is received by the client or peer.
• In client/server mode, you must configure the IP address of the NTP server on the client. In such a
case, a static association is established on the client. No configuration is required on the
server to set up the association because it only responds passively to the client request.
• In symmetric peer mode, you must configure the IP address of the symmetric peer on the symmetric
active end. In such a case, a static association is established on the symmetric active end.
• In multicast mode, you must configure the multicast IP addresses of the interfaces on the multicast
server. In such a case, a static association is established on the server. You must also configure the
multicast IP address of the client on the interface, which listens to the multicast NTP packets. This is not
intended for setting up a static association but is intended for setting up a dynamic association after the
client receives a packet from the server.
• In broadcast mode, you must enable the server mode on the interfaces of the broadcast server. In such
a case, a static association is set up on the server. You must also configure the client mode on the
interface, which should listen to the broadcast NTP packets. This is not intended for setting up a static
association but is intended for setting up a dynamic association after the client receives a packet from
the server.
Access Control
The NTP is designed to handle accidental or malicious data modification or destruction. These problems
typically do not result in timekeeping errors on other time servers in the synchronization subnet. The success
of this design is, however, based on the redundant time servers and various network paths. It is also
assumed that data modification or destruction does not occur simultaneously on many time servers over the
synchronization subnet. To prevent subnet vulnerability, select trusted time servers and allow them to be the
clock sources.
• Access authority
Access control protects a local NTP service by setting the access authority. This is a simple measure to
ensure security.
• NTP authentication
Enable NTP authentication on networks that demand high security.
VPN can also be used to link two separate networks over the Internet and operate as a single network. This
is useful for organizations that have two physical sites. Rather than setting up VPN connections on each PC,
the connection between the two sites can be handled by devices, one at each location. After the
configuration is complete, the devices maintain a constant tunnel between them that links the two sites. The
links between nodes of a VPN are formed over virtual circuits between hosts of the larger network. VPNs are
often deployed by organizations to provide remote access to a secure organizational network.
Figure 1 shows VPN support.
• Customer edge (CE): a device physically deployed at the customer site to provide access to VPN services.
• Provider edge (PE): a device or set of devices at the edge of the provider network and provides a
customer site view. PEs are aware of the VPNs that connect through them and maintain the VPN status.
• Provider (P): a device that operates inside the core network of the service provider and does not directly
connect to any customer endpoint. It is a part of implementing the provider-provisioned virtual private
network (PPVPN). It is not aware of VPN and does not maintain the VPN status. VPN is configured on
the interfaces on the PE devices that connect to the CE devices to provide VPN services.
Applicable Environment
The synchronization of clocks over the network is increasingly important as the network topology becomes
increasingly complex. NTP was developed to implement the synchronization of system clocks over the
network.
NTP ensures clock synchronization for the following applications:
• When incremental backup is performed between the standby server and client, both system clocks must
be consistent.
• Complicated events are profiled by multiple systems. To ensure the order of events, multiple systems
must be synchronized to the same clock.
• Normal Remote Procedure Call (RPC) should be ensured. To prevent the system from repeatedly calling
a process and to ensure that a call has a fixed period, the system clocks must be synchronized;
otherwise, a call may time out before being performed.
• Certain applications must know the time when a user logs in to the system or when a file is modified.
• On a network, the offset between system clocks may be 1 minute or less. On a large network, it is
impractical for the network administrator to adjust each system clock manually by running the clock
datetime command (the time-setting command).
• Collecting timestamps for debugging and events on different Devices is not helpful unless all these
Devices are synchronized to the same clock.
NTP synchronizes all clocks of network devices so that the devices can provide multiple applications based
on the uniform time. A local NTP end can be a reference source for other clocks or synchronize its clock to
other clock sources. Clocks on the network exchange time information and adjust the time until all are
almost identical.
Application Instances
As shown in Figure 1, the time server B in the LAN is synchronized to the time server A on the Internet, and
the hosts in the LAN are synchronized to the time server B in the LAN. In this way, the hosts are
synchronized to the time server on the Internet.
Definition
The open programmability system (OPS) is an open platform that provides OPS application programming
interfaces (APIs) to achieve device programmability, allowing third-party applications to run on the device.
Purpose
Customers may require devices with specific openness so that they can develop their own functions and
deploy proprietary management policies to implement automated O&M, thereby lowering management
costs. However, conventional network devices provide only limited functions and predefined services. As
networks continue to develop, the static and inflexible service provisioning mode cannot meet the
requirements for diversified and differentiated services.
To meet the preceding requirements, Huawei offers an open programmable platform called OPS. The OPS
enables users and third-party developers to develop and deploy network management policies using open
OPS APIs. Through programmability, the system implements rapid service expansion, automatic function
deployment, and intelligent device management, helping to reduce network O&M costs and simplify
network operations.
Benefits
The OPS offers the following benefits:
• Supports user-defined configurations and programs, enabling flexible and dynamic service deployment
and simplifying network device management.
Security
The OPS provides the following security measures:
• Operation security: Resources are isolated by module and their usage can be monitored.
• Security of important information: OPS APIs use a secure communication protocol to prevent
information leakage during transmission. However, local data and operation security needs to be
assured by users.
OPS APIs are designed based on REST architectural principles. These principles enable web services to be
designed with a focus on system resources. The OPS opens managed objects (MOs), each of which is
uniquely identified by a Uniform Resource Identifier (URI), to achieve device openness. You can perform
operations on these objects using standard HTTP methods, such as GET (query), PUT (modify), POST
(create), and DELETE (delete).
Currently, the system integrates the Python running environment, enabling it to run Python scripts. Such
scripts need to define the method of sending HTTP requests to the system based on OPS APIs. By sending
HTTP requests to the system, Python scripts can be used to manage the system.
For details about OPS APIs supported by the device, see OPS API Reference.
configured trigger condition is met. Maintenance assistants enable the device to monitor its running status
and take appropriate actions, thereby improving system maintainability.
You can run the condition timer cron command to set the execution time of a maintenance assistant, in
cron format, so that a maintenance assistant can be run one or more times at specified times, dates, or
intervals. Table 2 lists the cron time formats.
Execute an assistant within a time range: time1-time2. time1 and time2 are integers that specify the start
and end times of an assistant task, respectively. They are connected by a hyphen (-) without spaces, and
time2 must be greater than or equal to time1. The time is calculated as time1, time1 + 1, time1 + 2, ...,
time2.
For example, the condition timer cron 0 0-3 1 3 * 2020 command configures a maintenance assistant to be
executed at the following times: 00:00, 01:00, 02:00, and 03:00 on March 1, 2020.
Execute an assistant at a specified time combination: The preceding formats can be used together. The
times are separated by a comma (,) without spaces.
For example, the condition timer cron 0 0/10,2,4-5 1 3 * 2020 command configures a maintenance
assistant to be executed at the following times: 00:00, 02:00, 04:00, 05:00, 10:00, and 20:00 on March 1,
2020.
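The hour-field notation in the examples above (lists, ranges, and step values) can be sketched as follows. Treating a/n as "start at a, step by n" is an assumption consistent with the documented 0/10 output; the function name is illustrative:

```python
def expand_cron_field(field, low, high):
    """Expand one cron field (e.g. the hour field) into the sorted list of
    matching values. Supports lists (a,b), ranges (a-b), steps (a/n), and *."""
    values = set()
    for part in field.split(','):
        if '/' in part:
            start, step = part.split('/')
            values.update(range(int(start), high + 1, int(step)))
        elif '-' in part:
            lo, hi = part.split('-')
            values.update(range(int(lo), int(hi) + 1))
        elif part == '*':
            values.update(range(low, high + 1))
        else:
            values.add(int(part))
    return sorted(values)

print(expand_cron_field('0/10,2,4-5', 0, 23))  # [0, 2, 4, 5, 10, 20]
```

The expanded hours match the execution times listed for the condition timer cron 0 0/10,2,4-5 1 3 * 2020 example.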
In addition, the OPS supports the Maintain-Probe (MTP) function, which uses a maintenance assistant to
monitor protocol connectivity. If a protocol connection is torn down, the maintenance assistant script is run
to collect information about this event, thereby improving device maintainability.
1. Compile a Python script. For details, see Python APIs Supported by a Device.
2. Upload the Python script to a device. For details, see File System Configuration.
3. Install the Python script. The Python script can be executed on a device only after being installed.
If you use a maintenance assistant, you can configure it with a Python script and trigger conditions for
executing the Python script. Then the system monitors the device running status in real time and
automatically executes the Python script when the specified trigger conditions are met.
For details about how to install and execute Python scripts, see OPS Configuration.
• The OPS requires you to be familiar with Python and know how to correctly compile Python scripts.
• The following Python script is only an example. You can modify it as required.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import traceback
import http.client
import string
# Define a class for invoking the OPS API. This class defines some methods to perform the operations of setting up an
HTTP connection.
# This part can be directly invoked without being modified.
class OPSConnection(object):
"""Make an OPS connection instance."""
2022-07-08 339
Feature Description
2022-07-08 340
Feature Description
def get_startup_info(ops_conn):
    """Get the system startup information."""
    # Specify the URI of system startup information. URIs identify management objects defined in OPS APIs. Different management objects have different URIs.
    # Modify the URI as required. For details about the URIs supported by the device, see the OPS API Reference.
    uri = "/cfg/startupInfos/startupInfo"
    # Specify the request content to be sent. This part corresponds to the URI. Different URIs correspond to different request contents.
    # Modify the request content based on the used URI. For details on the format of the request content, see the OPS API Reference.
    req_data = \
'''<?xml version="1.0" encoding="UTF-8"?>
<startupInfo>
</startupInfo>
'''
    # Execute a GET request. uri and req_data indicate the request URI and request content, respectively. ret indicates whether the request is successful, and rsp_data indicates the response data returned by the system after the request is executed. For details about the format of the response data, see the OPS API Reference.
    # The following is the response data about the system startup information. You can parse the response data to obtain the system startup information.
    '''
    <?xml version="1.0" encoding="UTF-8"?>
    <rpc-reply>
      <data>
        <cfg xmlns="http://www.huawei.com/netconf/vrp" format-version="1.0" content-version="1.0">
          <startupInfos>
            <startupInfo>
              <position>6</position>
              <nextStartupFile>flash:/vrpcfg.cfg</nextStartupFile>
              <configedSysSoft>flash:/system-software.cc</configedSysSoft>
              <curSysSoft>flash:/system-software.cc</curSysSoft>
              <nextSysSoft>flash:/system-software.cc</nextSysSoft>
              <curStartupFile>flash:/vrpcfg.cfg</curStartupFile>
              <curPatchFile>NULL</curPatchFile>
              <nextPatchFile>NULL</nextPatchFile>
            </startupInfo>
          </startupInfos>
        </cfg>
      </data>
    </rpc-reply>
    '''
    # You can change the request type get() as required. For example, you can change it to set() or create().
    ret, _, rsp_data = ops_conn.get(uri, req_data)
    if ret != http.client.OK:
        return None
    return rsp_data
# The main() function defines the operations to be performed during script running. You can modify the function as required.
def main():
    """The main function."""
    # host indicates the loopback address. Currently, OPS APIs support only internal invoking on the device; that is, the value is localhost.
    host = "localhost"
    try:
        # Set up an HTTP connection.
        ops_conn = OPSConnection(host)
        # Invoke a function to obtain system startup information.
        rsp_data = get_startup_info(ops_conn)
        # Close the HTTP connection.
        ops_conn.close()
        return
    except:
        errinfo = traceback.format_exc()
        print(errinfo)
        return

if __name__ == "__main__":
    main()
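The rsp_data returned by get_startup_info() is XML and typically needs to be parsed. The following off-device sketch parses a trimmed copy of the sample response above using Python's standard xml.etree module; the namespace URI matches the xmlns attribute shown in the response.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response data shown above.
RSP_DATA = '''<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply>
  <data>
    <cfg xmlns="http://www.huawei.com/netconf/vrp" format-version="1.0" content-version="1.0">
      <startupInfos>
        <startupInfo>
          <position>6</position>
          <nextStartupFile>flash:/vrpcfg.cfg</nextStartupFile>
          <curSysSoft>flash:/system-software.cc</curSysSoft>
        </startupInfo>
      </startupInfos>
    </cfg>
  </data>
</rpc-reply>'''

# The <cfg> element and its children are in the Huawei VRP namespace.
NS = {"vrp": "http://www.huawei.com/netconf/vrp"}

def parse_startup_info(rsp_data):
    """Return a dict mapping each startupInfo field name to its text value."""
    root = ET.fromstring(rsp_data)
    info = root.find(".//vrp:startupInfo", NS)
    if info is None:
        return {}
    # Strip the "{namespace}" prefix that ElementTree adds to each tag.
    return {child.tag.split("}")[-1]: child.text for child in info}

info = parse_startup_info(RSP_DATA)
print(info["nextStartupFile"])
```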
• The Python APIs provided by the embedded running environment are unavailable outside the device.
• When a script assistant is used to execute a Python script, the ops_condition() and ops_execute() functions must be
defined in the script to set trigger conditions and tasks.
Function Description
After you subscribe to CLI events, the system executes the ops_execute() function in the Python script when the character string entered on the CLI matches the regular expression.
The OPS allows the system to use the Python script to open or close a CLI channel and execute commands.
Command Prototype
# Subscribe to a CLI event.
opsObj.cli.subscribe(tag, pattern, enter=False, sync=True, async_skip=False, sync_wait=30)
This API can only be used in the ops_condition() function of the maintenance assistant script.
# Execute a command.
Parameter Description
Table 1 describes parameters supported by CLI event subscription APIs.
Method Description
enter The value can be True or False. True indicates that the regular expression is
matched immediately after you press Enter. False indicates that the regular
expression is matched after the command keyword is completed.
sync Indicates whether the CLI terminal waits for script execution after a command
event is triggered. True indicates yes, and False indicates no.
async_skip The value can be True or False, indicating whether the original command is
skipped. (This setting takes effect only when sync is set to False.) True
indicates that the original command is not executed, and False indicates that
the original command is executed.
sync_wait The value is an integer ranging from 1 to 100, indicating the time during which
the CLI terminal waits for script execution. (This setting takes effect only when
sync is set to True.)
command Specifies a command to be executed. For example, system-view im. You do not
need to press Enter; the CLI automatically adds a carriage return. The value can
only be one command.
choice Specifies a lexical type, used for auto reply for interactive commands. choice =
{"Continue?": "n", "save": "n"}. A maximum of eight options are supported.
Multiple lines are entered for multi-line commands, such as header login
information. For example, choice={"": "a\r\nb\r\n\a"}.
■ First return value: The value 0 indicates a success, and the value 1 indicates a failure.
■ Second return value: This value describes success or failure reasons, expressed in a character string.
■ First return value: None indicates an error, and other values indicate command handles.
■ First return value: If None is returned, the command fails to be sent to the CLI or command
execution times out. Otherwise, the command output is returned. Each data package is 32 KB in
size, split at a carriage return.
■ Second return value: If Next is 0, there is no more output. If Next is 1, more output will be
displayed; call this function again to obtain the next batch of data, setting both command
and choice to None.
Example
test.py
import ops
import igpcomm
def ops_condition(_ops):
    ret, reason = _ops.cli.subscribe("con11", "logbuffer1", True, True, False, 10)
    _ops.correlate("con11")
    return ret
def ops_execute(_ops):
    handle, err_desp = _ops.cli.open()
    choice = {"Continue": "y", "save": "n"}
    _ops.cli.execute(handle, "sys")
    _ops.cli.execute(handle, "pm", None)
    _ops.cli.execute(handle, "undo statistics-task a", choice)
    _ops.cli.execute(handle, "commit", None)
    ret = _ops.cli.close(handle)
    print('test2 =', ret)
    return 0
1. When the front end executes the script, the CLI channel is opened, and the CLI terminal displays the
user view.
4. Run the undo statistics-task a command, which is an interactive command. The system then
automatically interacts based on the choice variable value.
• After the CLI channel is opened using the script, commands can be delivered to the device only when the CLI
terminal displays the user view.
• The CLI channel privileges are inherited from the user authorities of the maintenance assistant created.
• A script can be used to create only one CLI channel. If an attempt is made to create a second CLI channel using this
script, the system returns a failure.
• A VTY resource is consumed for every channel opened. The display users command shows that a VTY resource is
consumed by an assistant (Assistant: Name). When only three or fewer VTY resources are available, opening a
channel fails.
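The choice dictionary above pairs a prompt keyword with an auto-reply. A local sketch of how such a mapping could drive the interaction (the substring-matching rule is an assumption for illustration; the device implements this internally):

```python
# Pick the configured answer whose keyword appears in an interactive prompt.
def auto_reply(prompt, choice):
    """Return the answer for the first keyword found in the prompt."""
    for keyword, answer in choice.items():
        if keyword and keyword in prompt:
            return answer
    # An empty-string keyword can serve as a catch-all for multi-line input.
    return choice.get("")

choice = {"Continue?": "n", "save": "n"}
print(auto_reply("Warning: The operation may be risky. Continue? [Y/N]:", choice))
```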
Function Description
After you subscribe to timer events and a timer event is triggered, the system executes the ops_execute()
function in the Python script.
This API can only be used in the ops_condition() function of the maintenance assistant script.
Command Prototype
# Timer event defined in the Linux cron timer description format
opsObj.timer.cron(tag, crontime)
# Timer triggered after a specified number of seconds elapses since 00:00:00 on January 1, 1970
opsObj.timer.absolute(tag, timelength)
# Timer triggered after a specified number of seconds elapses since a timer event is subscribed
opsObj.timer.countdown(tag, timelength)
Parameter Description
Table 1 describes parameters supported by timer event subscription APIs.
Method Description
crontime Specifies a cron timer description. The value is a character string. For example, *
* * * * indicates that the timer is triggered every minute.
• Second return value: This value describes success or failure reasons, expressed in a character string.
Example
test.py
import ops
def ops_condition(_ops):
    _ops.timer.countdown("con11", 5)
    _ops.correlate("con11")
def ops_execute(_ops):
    _ops.syslog("Record an informational syslog.")
    return 0
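The countdown semantics can be pictured off-device with the standard threading module: a callback fires once the given number of seconds elapses. The function and callback names below are illustrative only; the real API merely registers the event on the device.

```python
import threading

# Local stand-in for opsObj.timer.countdown(tag, timelength): invoke a
# callback once `timelength` seconds have elapsed since registration.
def countdown(tag, timelength, callback):
    timer = threading.Timer(timelength, callback, args=(tag,))
    timer.start()
    return timer

fired = []
t = countdown("con11", 0.1, fired.append)
t.join()          # wait for the timer thread to run the callback
print(fired)
```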
Function Description
You can subscribe to IPv4 route change events. After you subscribe to IPv4 route change events and an IPv4
route change event is triggered, the system executes the ops_execute() function in the maintenance assistant
script.
This API can only be used in the ops_condition() function of the maintenance assistant script.
Command Prototype
opsObj.route.subscribe(tag, network, maskLen, minLen=None, maxLen=None, neLen=None, type="all", protocol="all")
Parameter Description
Table 1 describes parameters supported by IPv4 route change event APIs.
Method Description
network Specifies a route prefix. The value is in the IPv4 address format, such as
10.1.1.1.
maskLen Specifies a mask length. The value is an integer ranging from 0 to 32.
minLen Specifies the minimum mask length. The value must be greater than or equal
to the value of maskLen.
maxLen Specifies the maximum mask length. The value must be greater than or equal
to the value of minLen.
neLen Specifies a length that cannot be a mask length. The value must be greater
than or equal to the value of minLen and less than or equal to the value of
maxLen.
type Specifies an IPv4 route change event type. The value can be add, remove,
modify, or all. The value all indicates all route changes.
protocol Specifies a routing protocol. After this parameter is set, change events of routes
of the specified protocol are subscribed to. The value can be direct, static, isis,
ospf, bgp, rip, unr, or all. The default value is all, indicating that routes are not
filtered by protocol type.
• Second return value: This value describes success or failure reasons, expressed in a character string.
Interface Constraints
• A route change event can be triggered only when active routes change.
• A route change event is not triggered when route recursion results change or inactive routes change.
• The add event is triggered when an active route with a higher preference is added.
• When an active route is deleted and a sub-optimal route becomes active, the remove and add events
are triggered, respectively.
• A maximum of three route change events can be triggered per second. If multiple route changes match
the subscription conditions, a maximum of 100 events can be triggered.
• A route change event can be triggered only when public network routes change.
Example
test.py
import ops
def ops_condition(_ops):
    ret, reason = _ops.route.subscribe("con0", "10.1.1.1", maskLen=32, type="all", protocol="all")
    ret, reason = _ops.correlate("con0")
    return ret
def ops_execute(_ops):
    a, des = _ops.context.save("test.py", 'Route event trigger')
    return 0
When a route with the prefix 10.1.1.1/32 is added or deleted, a route change event is triggered.
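The maskLen/minLen/maxLen/neLen parameters can be read as a predicate over a route's prefix length. The following local sketch captures that reading (the exact on-device filtering behavior may differ):

```python
# Decide whether a route's prefix length passes the subscription filters.
def mask_len_matches(prefix_len, maskLen, minLen=None, maxLen=None, neLen=None):
    """minLen/maxLen bound the accepted range (defaulting to maskLen);
    neLen names one length that is explicitly excluded."""
    low = minLen if minLen is not None else maskLen
    high = maxLen if maxLen is not None else maskLen
    if not low <= prefix_len <= high:
        return False
    if neLen is not None and prefix_len == neLen:
        return False
    return True

print(mask_len_matches(32, maskLen=32))
```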
Function Description
You can subscribe to IPv6 route change events. After you subscribe to IPv6 route change events and an IPv6
route change event is triggered, the system executes the ops_execute() function in the maintenance assistant
script.
This API can only be used in the ops_condition() function of the maintenance assistant script.
Command Prototype
opsObj.route.subscribe6(tag, network, maskLen, minLen=None, maxLen=None, vpnName="_public_", optype="all", protocol="all")
Parameter Description
Table 1 describes parameters supported by IPv6 route change event APIs.
Method Description
network Specifies an IPv6 route prefix. The value is in the IPv6 address format, such as
2001:db8:1::2.
maskLen Specifies a mask length. The value is an integer ranging from 0 to 128.
minLen Specifies the minimum mask length. The value must be greater than or equal
to the value of maskLen.
maxLen Specifies the maximum mask length. The value must be greater than or equal
to the value of minLen.
vpnName Specifies the name of the VPN instance in which route change events are to be
subscribed to. The value is a string of 1 to 31 characters. This parameter is
optional. If the parameter is not specified, change events of the corresponding
public network routes are subscribed to by default.
NOTE:
The specified VPN instance must be an existing one on the device, and the IPv6
address family must have been enabled in the VPN instance. If either condition is not
met, the subscription rule does not take effect.
optype Specifies an IPv6 route change event type. The value can be add, remove,
modify, or all. The value all indicates all route changes.
protocol Specifies a routing protocol. After this parameter is set, change events of routes
of the specified protocol are subscribed to. The value can be direct, static, isis,
bgp, unr, or all. The default value is all, indicating that routes are not filtered
by protocol type.
• Second return value: This value describes success or failure reasons, expressed in a character string.
Interface Constraints
• A route change event can be triggered only when active routes change.
• A route change event is not triggered when route recursion results change or inactive routes change.
• The add event is triggered when an active route with a high preference is added.
• When an active route is deleted and a sub-optimal route becomes active, the remove and add events
are triggered, respectively.
• A maximum of three route change events can be triggered per second. If multiple route changes match
the subscription conditions, a maximum of 100 events can be triggered.
• You can specify a VPN instance to subscribe to change events of VPN routes. If no VPN instance is
specified, change events of the corresponding public network routes are subscribed to by default.
• The specified VPN instance must be an existing one on the device, and the IPv6 address family must
have been enabled in the VPN instance. If either condition is not met, the subscription rule does not
take effect. After a specified VPN instance is deleted, the corresponding subscription rule is also deleted.
Example
test.py
import ops
def ops_condition(_ops):
    ret, reason = _ops.route.subscribe6("con0", "2001:db8:1::2", maskLen=64, vpnName="testVpn", optype="all", protocol="all")
    ret, reason = _ops.correlate("con0")
    return ret
def ops_execute(_ops):
    a, des = _ops.context.save("test.py", 'Route event trigger')
    return 0
When a route with the prefix 2001:db8:1::2/64 is added to or deleted from the VPN instance testVpn, a
route change event is triggered.
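Whether a changed route falls inside the subscribed prefix can be checked off-device with the standard ipaddress module, as a local illustration of the prefix matching described above:

```python
import ipaddress

# Check whether a changed route is covered by the subscribed network/maskLen.
def route_matches_subscription(route, network, maskLen):
    subscribed = ipaddress.ip_network(f"{network}/{maskLen}", strict=False)
    return ipaddress.ip_network(route, strict=False).subnet_of(subscribed)

print(route_matches_subscription("2001:db8:1::2/128", "2001:db8:1::2", 64))
```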
Function Description
The OPS allows the maintenance assistant to subscribe to alarms. After an alarm is triggered, the system
executes the ops_execute() function in the Python script.
This API can only be used in the ops_condition() function of the maintenance assistant script.
Command Prototype
opsObj.alarm.subscribe(tag, feature, event, condition[4]=None, alarm_state=start, occurs=1, period=30)
Parameter Description
Table 1 describes parameters supported by alarm subscription APIs.
Method Description
feature Specifies a feature name, which is a well-known character string, such as ospf.
condition[4] Specifies a condition array. The value can be None. The array contains a
maximum of four members. For example:
conditions = []
con1 = {'name':'ifIndex', 'op':'eq', 'value':'100'}
conditions.append(con1)
con2 = {'name':' vpnInstance', 'op':'eq', 'value': 'abc'}
conditions.append(con2)
The relationship between multiple conditions is AND.
alarm_state Whether an alarm is generated or cleared. The value can be start or end.
• Second return value: This value describes success or failure reasons, expressed in a character string.
Example
test.py
import ops
def ops_condition(_ops):
    ret, reason = _ops.alarm.subscribe("con11", "ospf", "NBR_DOWN_REASON", [], "start", 1, 30)
    _ops.correlate("con11")
    return ret
def ops_execute(_ops):
    print("Hello World")
    return 0
After the script is executed, if the OSPF NBR status is DOWN and the alarm status is start, "Hello World" is
displayed.
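The condition array is ANDed across its entries, as the table states. A local sketch of evaluating such an array against the parameters of a raised alarm (only the 'eq' operator shown in the example is implemented; the full operator set is device-defined):

```python
# AND all condition entries; an empty or None array always matches.
def conditions_match(alarm_params, conditions):
    if not conditions:
        return True
    return all(
        c["op"] == "eq" and str(alarm_params.get(c["name"])) == c["value"]
        for c in conditions
    )

conditions = [{"name": "ifIndex", "op": "eq", "value": "100"}]
print(conditions_match({"ifIndex": 100}, conditions))
```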
Function Description
The OPS allows the maintenance assistant to subscribe to events. After an event is triggered, the system
executes the ops_execute() function in the Python script.
This API can only be used in the ops_condition() function of the maintenance assistant script.
Command Prototype
opsObj.event.subscribe(tag, feature, event, condition[4], occurs=1, period=30)
Parameter Description
Table 1 describes parameters supported by event subscription APIs.
Method Description
feature Specifies a feature name, which is a well-known character string, such as ospf.
condition[4] Specifies a condition array. The value can be None. The array contains a
maximum of four members. For example:
conditions = []
con1 = {'name':'ifIndex', 'op':'eq', 'value':'100'}
conditions.append(con1)
con2 = {'name':' vpnInstance', 'op':'eq', 'value': 'abc'}
conditions.append(con2)
The relationship between multiple conditions is AND.
• Second return value: This value describes success or failure reasons, expressed in a character string.
Example
test.py
import ops
def ops_condition(_ops):
    ret, reason = _ops.event.subscribe("con11", "ospf", "NBR_DOWN_REASON", [], 1, 30)
    _ops.correlate("con11")
    return ret
def ops_execute(_ops):
    print("Hello World")
    return 0
After the script is executed, if the OSPF NBR status is DOWN, "Hello World" is displayed.
Function Description
When user-compiled scripts are running on a device, some information is recorded in the device's log.
Command Prototype
opsObj.syslog(content, severity="informational", logtype="syslog")
Parameter Description
Table 1 describes the parameters supported by APIs for recording logs.
Method Description
content Specifies the log content. The maximum length of the character string is 512
bytes. If the length exceeds 512 bytes, the log fails to be recorded.
logtype Specifies a log type, which can be syslog or diagnose. If syslog is specified,
information is recorded in the syslog. After a syslog server is configured, the
syslog is uploaded to the syslog server. If diagnose is specified, information is
recorded in the diagnostic log on the device. The default value is syslog.
• Second return value: This value describes success or failure reasons, expressed in a character string.
Example
test.py
import ops
opsObj = ops.ops()
opsObj.syslog("Record an informational syslog.")
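Because content longer than 512 bytes fails to be recorded, a script can validate the length before calling opsObj.syslog(). The helper below is a local sketch of that check; its name and (status, reason) return convention are illustrative, mirroring the pairs used elsewhere in these examples:

```python
# Validate log content against the 512-byte limit before recording it.
def check_syslog_content(content, limit=512):
    size = len(content.encode("utf-8"))
    if size > limit:
        return 1, f"log content is {size} bytes; the {limit}-byte limit is exceeded"
    return 0, "success"

ret, reason = check_syslog_content("Record an informational syslog.")
print(ret, reason)
```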
Function Description
The OPS allows you to obtain the OID of a specified MIB object and the query packet of the corresponding
command.
Command Prototype
# Obtain the OID of a specified MIB object.
opsObj.snmp.get(oid)
# Obtain the query packet of the corresponding command based on the obtained OID of the MIB object.
opsObj.snmp.get_snmp_get_command(oid)
# Obtain the query packet of the corresponding command based on the OID of the next node of the MIB object.
opsObj.snmp.get_snmp_getnext_command(oid)
Parameter Description
Table 1 describes the parameters supported by the API for obtaining OIDs and corresponding packets.
Table 1 Parameters supported by APIs for obtaining OIDs and corresponding packets
Method Description
oid Node in the MIB tree. It can be considered as a rule-based device parameter
code. SNMP groups device parameters in a tree structure. Starting from the
root of the tree, nodes at each level have a code. These codes are separated by
periods (.) to form a string called an OID. You can use the OID to perform
operations on the parameters it represents. The value is a character
string, such as 1.3.6.1.2.1.7.1.0.
■ Second return value: This value describes success or failure reasons, expressed in a character string.
Example
test.py
import ops
test = ops.ops()
test.snmp.get("1.3.6.1.4")
test.snmp.getnext("1.3.6.1.4")
test.snmp.get_snmp_get_command("1.3.6.1.4")
test.snmp.get_snmp_getnext_command("1.3.6.1.4")
After the script is executed, the OID 1.3.6.1.4 is queried, the OID of the next node in the MIB tree is retrieved,
and the query packets of the corresponding commands are obtained.
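The getnext operation walks the MIB in lexicographic order of the numeric OID components: it returns the first OID strictly after the requested one. A small off-device illustration of that ordering:

```python
# Order OIDs by their numeric components, as SNMP does.
def oid_key(oid):
    return tuple(int(part) for part in oid.split("."))

# Return the first OID in the MIB view strictly after the requested one.
def getnext(oid, mib_view):
    later = sorted((o for o in mib_view if oid_key(o) > oid_key(oid)), key=oid_key)
    return later[0] if later else None

view = ["1.3.6.1.2.1.7.1.0", "1.3.6.1.4.1.2011", "1.3.6.1.5"]
print(getnext("1.3.6.1.4", view))
```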
Function Description
The OPS provides an API for outputting prompt information to the CLI terminal and reading user input from
the CLI terminal when the CLI terminal is waiting for CLI event synchronization.
Command Prototype
# Output prompt information to the CLI terminal.
opsObj.terminal.write(msg, vty=None, fgrd=False)
Parameter Description
Table 1 describes the parameters supported by terminal display and read APIs.
Method Description
vty Specifies a user terminal. Currently, messages can only be displayed on user
terminals that wait for script execution or execute the script on the front end.
You can enter environment('_cli_vty') to obtain the VTY name or enter None.
fgrd Boolean value, which indicates whether to display prompt information on the
login pages of all users.
The prompt information is displayed on all terminals only when fgrd is set to
True in a system script. If a common script is used, the prompt information
is displayed only on the current terminal.
maxLen Specifies the maximum number of characters allowed to be input. The default
value is 512 characters.
timeout Specifies a timeout period for waiting for user inputs. The default value is 30s.
■ First return value: The value 0 indicates a success, and the value 1 indicates a failure.
■ Second return value: This value describes success or failure reasons, expressed in a character string.
■ None: indicates a user input timeout, a user has entered Ctrl+C, or the parameter is incorrect.
Example
# When the front end executes the script, the Python script outputs "Hello World!" to the CLI terminal.
test.py
import ops
def ops_condition(_ops):
    ret, reason = _ops.cli.subscribe("corn1", "device", True, True, False, 20)
    return ret
def ops_execute(_ops):
    _ops.terminal.write("Hello world!", None, False)
    return 1
# When the front end executes the script, the character string entered by a user on the CLI terminal is
output.
test.py
import ops
def ops_condition(_ops):
    ret, reason = _ops.cli.subscribe("corn1", "device", True, True, False, 20)
    return ret
def ops_execute(_ops):
    _ops.terminal.write("Enter your passwd:", None)
    passwrd, ret = _ops.terminal.read(10, 15, None)
    print(passwrd)
Function Description
The OPS provides the function of saving and restoring script variables in Python scripts.
A maximum of 100 script variables can be stored. A variable with the same name as one that has been stored will
replace the stored one.
Command Prototype
# Save script variables.
opsObj.context.save(varName, value)
# Restore script variables.
opsObj.context.retrieve(varName)
Parameter Description
Table 1 describes the parameters supported by APIs for saving and restoring script variables.
Table 1 Parameters supported by APIs for saving and restoring script variables
Method Description
value Specifies the value of a variable. The value can be a string of a maximum of
1024 characters or an integer ranging from –2147483648 to 2147483647.
■ First return value: The value 0 indicates a success, and the value 1 indicates a failure.
■ Second return value: This value describes success or failure reasons, expressed in a character string.
■ First return value: If None is returned, restoring a specified user-defined environment variable fails.
Otherwise, user-defined environment variable values are returned.
■ Second return value: This value describes success or failure reasons, expressed in a character string.
Example
# Save script variables.
test.py
import ops
test = ops.ops()
print('test context save')
a, des = test.context.save("varInt1", 111)
print('save varInt1 return', a)
a, des = test.context.save("varStr2", 'testString')
print('save varStr2 return', a)
print('test context save over')
# Restore script variables.
test.py
import ops
test = ops.ops()
print('test context retrieve')
a, des = test.context.retrieve("varInt1")
print('retrieve varInt1 =', a)
a, des = test.context.retrieve("varStr2")
print('retrieve varStr2 =', a)
print('test context retrieve over')
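The constraints stated above (at most 100 variables, strings of up to 1024 characters, 32-bit integer values, duplicate names replacing the stored value) can be modeled locally. The class below is an off-device sketch of those rules, using the same (status, reason) and (value, reason) return pairs as the examples:

```python
# Local model of the context.save/context.retrieve constraints.
class ContextStore:
    MAX_VARS = 100

    def __init__(self):
        self._vars = {}

    def save(self, name, value):
        if isinstance(value, int) and not -2147483648 <= value <= 2147483647:
            return 1, "integer out of range"
        if isinstance(value, str) and len(value) > 1024:
            return 1, "string longer than 1024 characters"
        if name not in self._vars and len(self._vars) >= self.MAX_VARS:
            return 1, "too many variables"
        self._vars[name] = value   # a duplicate name replaces the stored value
        return 0, "success"

    def retrieve(self, name):
        if name in self._vars:
            return self._vars[name], "success"
        return None, "variable not found"

store = ContextStore()
store.save("varInt1", 111)
print(store.retrieve("varInt1"))
```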
Function Description
The OPS supports resident scripts. When a resident script is executed, ops.result() returns the execution
result, and ops.wait() suspends the script. After the script is triggered again, ops.wait() returns the result,
and script execution continues.
Command Prototype
# Return the script processing result to the OPS.
opsObj.result(status)
The result can also be returned using return. If neither of them is used, the default value 1 is returned. If both of them
are used, the result returned by opsObj.result() takes effect. If opsObj.result() is called consecutively, the first result takes
effect.
# Wait until the next event occurs and continue to execute the script.
opsObj.wait()
Parameter Description
Table 1 describes the parameters supported by resident script APIs.
Method Description
status Specifies a return value, indicating the script processing result sent to the OPS.
The value 0 indicates a success (the original command is skipped). Other values
are error codes.
Example
test.py
import ops
def ops_condition(_ops):
    ret, reason = _ops.cli.subscribe("con11", "this", True, True, False, 5)
    _ops.correlate("con11")
    return ret
def ops_execute(_ops):
    a, des = _ops.context.save("wait1", 'ac1')
    _ops.result(1)
    _ops.wait()
    a, des = _ops.context.save("wait2", 'ac2')
    return 0
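The suspend/resume behavior of ops.result() and ops.wait() can be pictured with a Python generator: each yield plays the role of returning a result and waiting for the next trigger. This is an analogy only, not the device mechanism:

```python
def resident_script(log):
    log.append(("save", "wait1", "ac1"))
    yield 1                     # _ops.result(1) then _ops.wait(): suspend here
    log.append(("save", "wait2", "ac2"))
    yield 0                     # final return value after the second trigger

log = []
script = resident_script(log)
first = next(script)            # first event triggers the script
second = next(script)           # next event resumes it after the wait
print(first, second, log)
```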
Function Description
The OPS allows you to associate multiple conditions with an OPS object.
Command Prototype
opsObj.correlate("correlation expression")
Parameter Description
Table 1 describes the parameters supported by multi-condition association APIs.
Method Description
correlation expression The value is a string of a maximum of 128 characters, consisting of condition
identifier strings and operators (and, or, andnot). Operators and and
andnot have the same priority, which is higher than that of or.
• Second return value: This value describes success or failure reasons, expressed in a character string.
Example
test.py
import ops
def ops_condition(_ops):
    ret1, reason1 = _ops.cli.subscribe("con1", "display device", True, True, False, 20)
    ret2, reason2 = _ops.cli.subscribe("con2", "display this", True, True, False, 20)
    _ops.correlate("con1 and con2")
def ops_execute(_ops):
    _ops.terminal.write("Hello world!", None)
    return 0
When con1 and con2 are both met, the assistant is triggered.
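The operator precedence described above (and/andnot binding tighter than or) can be sketched locally as follows. This is an illustrative evaluator under that stated precedence; parentheses are not handled here:

```python
# Evaluate a correlation expression such as "con1 and con2 or con3"
# against a dict of condition states.
def eval_correlation(expression, states):
    def eval_term(term):
        tokens = term.split()
        value = states[tokens[0]]
        for i in range(1, len(tokens), 2):
            op, name = tokens[i], tokens[i + 1]
            if op == "and":
                value = value and states[name]
            elif op == "andnot":
                value = value and not states[name]
        return value
    # "or" has the lowest priority, so split on it last.
    return any(eval_term(term) for term in expression.split(" or "))

print(eval_correlation("con1 and con2", {"con1": True, "con2": True}))
```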
Function Description
The OPS allows you to specify the interval for monitoring the operating status of maintenance assistants. By
default, a maintenance assistant is triggered when the condition is met once within 30 seconds. This function
can be used to configure multiple conditions at the same time.
Command Prototype
opsObj.trigger(occurs=1, period=30, delay=0, suppress=0)
Parameter Description
Table 1 describes the parameters supported by multi-condition triggering APIs.
Method Description
period Specifies a detection period, in seconds. This parameter is valid only when the
• Second return value: This value describes success or failure reasons, expressed in a character string.
Example
test.py
import ops
def ops_condition(_ops):
    ret1, reason1 = _ops.cli.subscribe("con1", "display device", True, True, False, 20)
    ret2, reason2 = _ops.cli.subscribe("con2", "display this", True, True, False, 20)
    _ops.correlate("con1 and con2")
    _ops.trigger(occurs=1, period=10, delay=0, suppress=0)
def ops_execute(_ops):
    _ops.terminal.write("Hello world!", None)
    return 0
When a user enters display device and display this on the terminal, the maintenance assistant is triggered
only once within 10 seconds, and "Hello world!" will be displayed.
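The occurs/period/suppress semantics can be modeled as a sliding window over condition hits. The class below is a local sketch of that reading (the delay parameter is omitted for brevity, and the exact on-device behavior may differ):

```python
import collections

# Fire once the condition has been met `occurs` times within `period`
# seconds, then stay silent for `suppress` seconds.
class TriggerWindow:
    def __init__(self, occurs=1, period=30, suppress=0):
        self.occurs, self.period, self.suppress = occurs, period, suppress
        self.hits = collections.deque()
        self.quiet_until = float("-inf")

    def condition_met(self, now):
        """Record one condition hit at time `now`; return True if it fires."""
        if now < self.quiet_until:
            return False
        self.hits.append(now)
        while self.hits and now - self.hits[0] > self.period:
            self.hits.popleft()
        if len(self.hits) >= self.occurs:
            self.hits.clear()
            self.quiet_until = now + self.suppress
            return True
        return False

window = TriggerWindow(occurs=2, period=10, suppress=5)
print(window.condition_met(0), window.condition_met(3))
```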
Function Description
The OPS allows you to obtain environment variables.
Command Prototype
opsObj.environment.get("envName")
Parameter Description
Table 1 describes the parameters supported by APIs for obtaining environment variables.
Method Description
envName Specifies the name of an environment variable. The value is a character string.
Example
test.py
import ops
_ops = ops.ops()
Debug, description = _ops.environment.get("ops_debug")
After the script is executed, the status of the debugging function is obtained.
• Environment variables are classified into user-defined environment variables and system environment variables.
• User-defined environment variables are defined by users and their names start with a letter.
• System environment variables are defined by the system and their names start with an underscore (_).
• System environment variables are classified into public environment variables and event environment variables.
• In the registration phase, some event-related environment variables cannot be obtained because the event has not
occurred.
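The naming convention above can be checked mechanically: system variable names start with an underscore, user-defined names start with a letter. A minimal local sketch of that rule:

```python
# Classify an environment-variable name per the naming rules above.
def classify_env_var(name):
    if name.startswith("_"):
        return "system"
    if name[:1].isalpha():
        return "user-defined"
    return "invalid"

print(classify_env_var("_cli_vty"), classify_env_var("myVar"))
```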
Function Description
The OPS allows you to set a data modeling language. The default language is Schema. If a device does not
support Schema, you can use this API to change the language to the YANG modeling language.
Command Prototype
opsObj.set_model_type(ops_model_type)
Parameter Description
Table 1 describes the parameters supported by the API for setting a model type.
• Second return value: This value describes success or failure reasons, expressed in a character string.
Example
test.py
import ops
_ops = ops.ops()
_ops.set_model_type("YANG")
After the script is executed, the OPS data modeling language is set to the YANG modeling language.
Function Description
This API is used to establish an OPS connection instance when OPS packets, commands, or SNMP operations
are delivered.
Command Prototype
# Create an OPS connection instance.
ops_conn = OPSConnection(host)
# Close the OPS connection instance.
ops_conn.close()
Parameter Description
Table 1 describes the parameters supported by the API for creating a connection.
Method Description
req_data Specifies the content of the request. The value contains a maximum of 65536
characters.
• Third return value: This value describes success or failure reasons, expressed in a character string.
Example
test.py
import ops
host ="localhost"
ops_conn = ops.OPSConnection(host)
uri = "/cli/cliTerminal"
req_data = '''<?xml version="1.0" encoding="UTF-8"?>
<cliTerminal>
<opType>open</opType>
</cliTerminal>
'''
ret, _, rsp_data = ops_conn.create(uri, req_data)
uri = "/cli/cliTermResult"
req_data = '''<?xml version="1.0" encoding="UTF-8"?>
<cliTermResult>
<status></status>
<result></result>
<output></output>
</cliTermResult>
'''
ret, _, rsp_data = ops_conn.get(uri, req_data)
After the script is executed, a connection instance is created, and server resources are obtained.
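The examples rely on an OPSConnection class whose full body is not reproduced in this section. The following minimal sketch shows how such a wrapper over the standard http.client module might look; the method names follow the get/create/close calls used in the examples, while the port and headers are assumptions for illustration:

```python
import http.client

class OPSConnection:
    """Minimal, illustrative sketch of an OPS connection wrapper."""

    def __init__(self, host, port=80):
        # The connection is opened lazily, on the first request.
        self.conn = http.client.HTTPConnection(host, port)

    def close(self):
        self.conn.close()

    def get(self, uri, req_data):
        return self._rest_call("GET", uri, req_data)

    def create(self, uri, req_data):
        return self._rest_call("POST", uri, req_data)

    def _rest_call(self, method, uri, req_data):
        headers = {"Content-Type": "text/xml"}
        self.conn.request(method, uri, req_data, headers)
        rsp = self.conn.getresponse()
        return rsp.status, rsp.reason, rsp.read()

conn = OPSConnection("localhost")   # no request is sent until get()/create()
```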
You can instead automate the health check by configuring the OPS function on the device, as shown in
Figure 1. When this function is configured, the device automatically runs the health check commands,
periodically collects health check results, and sends these results to a server for analysis. If a fault occurs, the
system runs the pre-configured commands or scripts to isolate the faulty module and rectify the fault. This
function reduces the workload involved in performing device maintenance.
Terms
The CUSP feature is only used to establish a communication channel between Huawei forwarder and controller in a CU
separation scenario.
Definition
Purpose
Traditional network devices have both built-in forwarding and control planes. The forwarding plane varies
according to the device and is therefore difficult to open up. As for the control plane, where forwarding
entries are generated, most devices do not allow a third-party control plane to replace the built-in one.
Hardware and software are closely coupled, which reduces the upgrade frequency of network devices and
extends the time needed for them to support new technologies. Nowadays, however, various network
technologies continuously emerge to meet new requirements, and customers urgently need these new
technologies to solve existing network problems.
To address this issue, CUSP is introduced to provide communication channels for the control and forwarding
planes. Using standardized open interfaces, CUSP separates the control plane from the forwarding plane and
allows the former to manage the latter.
In a CU separation scenario, CUSP channels are used for the communications between the control and
forwarding planes, so that the control plane delivers service entries to the forwarding plane and the
forwarding plane reports service events to the control plane.
Benefits
This feature promotes the standardization and generalization of high-performance forwarding planes
through standard interfaces.
Figure 1 Comparison between a traditional network architecture and an SDN network architecture
The controller uses Experimenter packets to deliver private flow tables to forwarders, implementing service
entry delivery.
A CUSP agent is a component on a forwarder used to manage the CUSP protocol. The agent provides the
following functions:
1. After the controller and forwarder are both configured, the controller and forwarder establish a TCP
connection.
2. The controller and forwarder exchange Hello packets carrying version information over the TCP
connection to negotiate a channel with each other.
3. After the negotiation is complete, the controller sends a Features Request packet to query the
attribute information of the forwarder. Upon receipt of the packet, the forwarder replies with the
requested attribute information, such as the flow table format and buffer size, to the controller.
4. The controller and forwarder periodically send Echo Request packets to each other to detect the
connection status. After receiving an Echo Request packet from the initiator, the peer returns an
Echo Reply packet. If the initiator receives neither an Echo Reply packet nor any other valid
CUSP packet after a specified number of attempts, it considers the peer faulty and tears
down the connection. If the initiator does not receive any Echo Reply packet but does receive another
valid CUSP packet, it does not tear down the connection.
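The keepalive behavior described in step 4 can be sketched as follows. The function and event names are illustrative, and the retry count is an assumption; the actual number of attempts is device-specific.

```python
MAX_MISSED = 3  # assumed retry count; the real value is configurable on the device

def peer_alive(events, max_missed=MAX_MISSED):
    """events: per-interval observations, each 'echo_reply', 'other_valid', or 'none'.
    Return False (tear down the connection) only after max_missed consecutive
    intervals in which neither an Echo Reply nor any other valid CUSP packet arrived."""
    missed = 0
    for ev in events:
        if ev == "none":
            missed += 1
            if missed >= max_missed:
                return False
        else:
            # An Echo Reply or any other valid CUSP packet resets the counter.
            missed = 0
    return True
```

Note that an interval delivering any valid CUSP packet counts as proof of liveness, matching the rule that the connection is kept when other valid packets arrive without Echo Replies.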
The controller uses Experimenter packets to deliver private flow tables to forwarders, with a private
smoothing process supported. A private flow table contains FES entries related to VXLAN services.
• Before a CUSP connection is reestablished, the controller uses the backed up data to process services.
• After the CUSP connection is reestablished, the controller re-collects forwarder information and updates
original information to ensure that services are properly processed.
Terms
Term Definition
Definition
Remote Network Monitoring (RMON) is a standard monitoring specification defined by the IETF. It is an
enhancement of the Management Information Base II (MIB II) specification and is used to monitor data traffic on a
network segment or across an entire network. RMON allows network administrators to monitor
specific network segments more easily.
RMON implements the traffic statistics and alarm functions. These functions allow the NMS to remotely
manage and monitor devices.
• The traffic statistics function enables a managed device to periodically or continuously collect traffic
statistics on its connected network segment. The statistics include the total number of received packets
and the number of received long packets.
• The alarm function allows a managed device to generate a log and send a trap message to the NMS after
it finds that a bound variable of a MIB object exceeds the alarm threshold (for
example, an interface rate or the percentage of broadcast packets reaches a specific value).
Purpose
RMON enables the NMS to monitor remote network devices more efficiently and proactively. In addition, it
decreases the volume of traffic between the NMS and agents and facilitates large-scale network
management.
Benefits
RMON allows the NMS to effectively and efficiently collect statistics of a device, lowering network
maintenance costs.
Background
The Simple Network Management Protocol (SNMP) is a widely used network management protocol that
collects network communication statistics using agent software embedded in managed devices. The
management software obtains network management data by polling the agent's Management
Information Base (MIB). Although MIB counters record data statistics, they cannot analyze the data
historically. To build an overall picture of network traffic and traffic changes and thus analyze the
overall network status, the NMS software must continuously poll the managed devices for data.
• SNMP occupies significant network resources. Polling generates a large number of packets on large-scale
networks, which can cause network congestion or blocking. Therefore, SNMP is not suitable for
managing large-scale networks or retrieving large volumes of data, such as an entire routing table.
• SNMP increases the burden on the network administrator. During polling, the network administrator
must manually collect information using the NMS software. If the administrator must monitor more
than three network segments, the workload becomes unmanageable.
To provide more valuable management information, lighten the NMS workload, and allow the network
administrator to monitor multiple network segments, the Internet Engineering Task Force (IETF) developed
RMON for monitoring data traffic on a network segment or across an entire network.
• By building on the SNMP architecture, RMON consists of two parts, the NMS and the Agent located on
each device. Since RMON is not an entirely new protocol, an SNMP NMS can be used as an RMON
NMS, and the administrator does not need to learn a new technology, making RMON easier to
implement.
• When an abnormality occurs on a monitored object, the RMON agent uses the SNMP trap packet
transmission mechanism to send trap messages to the NMS. The SNMP trap function is usually used to
notify the NMS of whether functions on the managed device are running properly and of interface status
changes. Therefore, the monitored objects, triggering conditions, and reported information differ between
RMON and SNMP.
• RMON enables SNMP to monitor remote network devices more efficiently and proactively. With
RMON, managed devices automatically send trap messages when a specific monitored value exceeds
the alarm threshold, so managing devices do not need to obtain MIB variables through continuous
polling and comparison. This reduces the traffic volume between the managing and
managed devices and allows large-scale networks to be managed more easily and effectively.
Related Concepts
• NMS: A workstation that runs the network management software.
• RMON MIB: the network management medium of RMON. The RMON agent is embedded in monitored
devices, which collect data and control the system within a network segment as defined by the MIB. The
NMS obtains management information from the RMON agent and controls network resources. The
RMON MIB provides data link layer monitoring and diagnosis of device faults. To monitor network
activities more easily and effectively, Huawei has implemented four of the nine groups defined in the
standard RMON MIB specifications: the statistics group, the history group, the event group,
and the alarm group.
Functions
Statistics function
Ethernet statistics (corresponding to the statistics group in the RMON MIB): The system collects basic
statistics on monitored networks. It continuously collects statistics on the traffic and the distribution of
various packet types on a network segment, as well as the numbers of various error frames and collisions.
The statistics include network collisions, CRC error packets, the number of oversize or undersize packets,
the number of broadcast or multicast packets, and the number of received bytes or packets.
Historical sampling function (corresponding to the history group in RMON MIB): The system periodically
samples network statuses and stores the information for later queries. The system also periodically samples
port traffic data, specifically bandwidth usage, the number of error packets, and the number of total packets.
Alarm function
Event handling, that is, recording a log or sending trap messages (corresponding to the event
group in the RMON MIB): The event group controls events and prompts and provides all the events
generated by the RMON agent. When an event occurs, a log is generated or trap messages are sent to the
NMS.
Alarm threshold (corresponding to the alarm group in the RMON MIB): The system monitors the objects of a
specific alarm type; a sampled value can be either an absolute value or a difference between values. Once an
alarm's upper and lower thresholds are defined, the system samples at a predefined interval. Sampled
values above the upper threshold trigger a rising alarm, and sampled values below the lower threshold
trigger a falling alarm. The NMS processes these alarms based on the definitions of the events, and the
RMON agent either records the information as a log or sends trap messages to the NMS.
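The alarm-group evaluation described above can be sketched in Python. The function below is a simplified model, not the device implementation: it evaluates either absolute sampled values or differences between consecutive samples against rising and falling thresholds, and re-arms each threshold only after the opposite crossing so that the same alarm is not raised repeatedly.

```python
def evaluate_samples(samples, rising, falling, delta=False):
    """Return the sequence of 'rising'/'falling' events as sampled values cross
    the thresholds. With delta=True, differences between consecutive samples are
    evaluated, matching the absolute-value vs. difference modes described above."""
    values = samples if not delta else [b - a for a, b in zip(samples, samples[1:])]
    armed_rising = armed_falling = True  # each alarm re-arms after the opposite crossing
    events = []
    for v in values:
        if v >= rising and armed_rising:
            events.append("rising")
            armed_rising, armed_falling = False, True
        elif v <= falling and armed_falling:
            events.append("falling")
            armed_falling, armed_rising = False, True
    return events
```

In the real alarm group, each event would additionally be dispatched to the event group, which logs it or sends a trap to the NMS.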
Benefits
RMON brings the following benefits for users:
• Expanded monitoring range: RMON MIB expands the range of network management to the data link
layer to more effectively monitor networks.
• Offline operation: The RMON agent can continuously collect error, performance, and configuration data
even when the administrator is not querying the network status. RMON provides a solution for analyzing
traffic within a specific range without consuming bandwidth resources.
• Data analysis: The RMON agent analyzes problems that occur on the network and the consumption of
network resources, providing information for fault diagnosis and reducing the overall workload of the
NMS.
• To collect traffic statistics on an Ethernet interface, you can collect real-time and historical traffic and
packet statistics, monitor port usage, and collect error packet data.
• To monitor the traffic bytes on an interface, you can configure the function to process an event as
recording a log and set a threshold. When the traffic bytes in a minute exceed the threshold, a log is
recorded.
• To monitor broadcast and multicast traffic on the network, configure the function to process an event
as sending a trap message and set a threshold. When the number of broadcast and multicast packets
exceeds the predefined threshold, a trap message is sent to the NMS.
Definition
System of active immunization and diagnosis (SAID) is an intelligent fault diagnosis system that
automatically diagnoses and rectifies severe device or service faults by simulating human operations in
troubleshooting.
Purpose
A network is prone to severe problems if it fails to recover from a service interruption. At present, device
reliability is implemented through various detection functions: once a device fault occurs, the device reports
an alarm or requires a reset for fault recovery. However, this mechanism is intended for fault detection on a
single module. When a service interruption occurs, the network may fail to promptly recover from the fault,
adversely affecting services.
In addition, after receiving a reported fault, maintenance engineers may have difficulty collecting fault
information, which hinders problem locating and adversely affects device maintenance.
SAID is introduced to address the preceding issues. It achieves automated device fault diagnosis,
fault information collection, and service recovery, comprehensively improving the self-healing capability and
maintainability of devices.
Benefits
The SAID can automatically detect, diagnose, and rectify device faults, greatly improving network
maintainability and reducing maintenance costs.
Basic Concepts
• SAID node: detects, diagnoses, and rectifies faults on a device's modules in the SAID. SAID nodes are
classified into the following types:
■ Module-level SAID node: defends against, detects, diagnoses, and rectifies faults on a module.
■ SAID-level SAID node: detects, diagnoses, and rectifies faults on multiple modules.
• SAID node state machine: state triggered when a SAID node detects, diagnoses, and rectifies faults. A
SAID node involves seven states: initial, detecting, diagnosing, invalid-diagnose, recovering, judging, and
service exception states.
• SAID tracing: The SAID collects and stores information generated when a SAID node detects, diagnoses,
and rectifies faults. The information can be used to locate the root cause of a fault.
SAID
Fault locating in the SAID involves the fault detection, diagnosis, and recovery phases. The SAID has multiple
SAID nodes. Each time valid diagnosis is triggered (that is, the recovery process has been triggered), the
SAID records the diagnosis process information for fault tracing. The SAID's main processes are described as
follows:
1. Defense startup phase: After the system runs, it instructs modules to deploy fault defense (for
example, periodic logic re-loading and entry synchronization), starting the entire device's fault
defense.
2. Detection phase: A SAID node detects faults and finds prerequisites for problem occurrence. Fault
detection is classified as periodic detection (for example, periodic traffic decrease detection) or
triggered detection (for example, IS-IS Down detection).
3. Diagnosis phase: Once a SAID node detects a fault, the SAID node diagnoses the fault and collects
various fault entries to locate fault causes (only causes based on which recovery measures can be
taken need to be located).
4. Recovery phase: After recording information, the SAID node starts to rectify the fault by level. After
the recovery action is completed at each level, the SAID node determines whether services recover (by
determining whether the fault symptom disappears). If the fault persists, the SAID node continues to
perform the recovery action at the next level until the fault is rectified. The recovery action is gradually
performed from a lightweight level to a heavyweight level.
5. Tracing phase: If the SAID determines the fault and its cause, this fault diagnosis is a valid diagnosis.
The SAID then records the diagnosis process. After entering the recovery phase, the SAID records the
recovery process for subsequent analysis.
The SAID node state machine transitions are as follows:
1. When detecting a trigger event in the initial state, the SAID node enters the detecting state.
2. If the detection is not completed in the detecting state, the SAID node keeps working in this state.
3. If a detection timeout occurs or no fault is detected in the detecting state, the SAID node enters the
initial state.
4. When detecting a fault in the detecting state, the SAID node enters the diagnosing state.
5. If the diagnosis action is not completed in the diagnosing state, the SAID node keeps working in this
state.
6. If an environmental change occurs in the diagnosing state or another SAID node enters the recovering
state, the SAID node enters the invalid-diagnose state.
7. If the diagnosis action is not completed in the invalid-diagnose state, the SAID node keeps working in
this state.
8. If no device exception is detected after the diagnosis action is completed in the diagnosing state, the
SAID node enters the initial state.
9. If a device exception is detected after the diagnosis action is completed in the diagnosing state, the
SAID node enters the recovering state.
10. If the recovery action is not completed in the recovering state, the SAID node keeps working in this
state.
11. If the recovery action is completed in the recovering state, the SAID node enters the judging state.
12. If the judgment action is not completed in the judging state, the SAID node keeps working in this
state.
13. If the service does not recover in the judging state and a secondary recovery action exists, the SAID
node enters the recovering state.
14. If the service does not recover in the judging state and no secondary recovery action exists, the SAID
node enters the service exception state.
15. In the service exception state, the SAID node periodically checks whether the service recovers.
16. If the service recovers in the judging state, the SAID node enters the initial state.
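The transitions above can be modeled as a simple state table. This is an illustrative sketch: the event names are invented, unlisted (state, event) pairs are treated as "keep working in this state" (items 2, 5, 7, 10, and 12), and the service exception state is assumed to return to the initial state once its periodic check finds that the service has recovered.

```python
# (state, event) -> next state, following steps 1-16 above (event names assumed).
TRANSITIONS = {
    ("initial", "trigger_event"): "detecting",                          # step 1
    ("detecting", "timeout_or_no_fault"): "initial",                    # step 3
    ("detecting", "fault_detected"): "diagnosing",                      # step 4
    ("diagnosing", "env_change_or_other_node_recovering"): "invalid-diagnose",  # step 6
    ("diagnosing", "done_no_exception"): "initial",                     # step 8
    ("diagnosing", "done_exception"): "recovering",                     # step 9
    ("recovering", "recovery_done"): "judging",                         # step 11
    ("judging", "not_recovered_secondary_action"): "recovering",        # step 13
    ("judging", "not_recovered_no_action"): "service-exception",        # step 14
    ("judging", "recovered"): "initial",                                # step 16
    ("service-exception", "recovered"): "initial",                      # assumed from step 15
}

def next_state(state, event):
    # Unknown events keep the node in its current state (action still in progress).
    return TRANSITIONS.get((state, event), state)
```

A driver loop feeding detection, diagnosis, and recovery outcomes into next_state() would reproduce the node lifecycle described in steps 1 through 16.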
Background
The failure to ping a directly connected device often occurs on networks, causing services to be interrupted
for a long time and fail to automatically recover. The ping process involves various IP forwarding phases. A
ping failure may be caused by a hardware entry error, board fault, or subcard fault on the local device or a
fault on an intermediate device or the peer device. Therefore, it is difficult to locate or demarcate the specific
fault.
Definition
The ping service node is a specific SAID service node. This node performs link-heartbeat loopback detection
to detect service faults, diagnoses each ping forwarding phase to locate or demarcate faults, and takes
corresponding recovery actions.
Principles
For details about the SAID framework and principles, see Basic SAID Functions. SAID uses IP packets in which
the protocol number is 1, indicating ICMP. The ping service node undergoes four phases (fault detection,
fault diagnosis, fault recovery, and service recovery determination) to implement automatic device diagnosis,
fault information collection, and service recovery.
• Fault detection
The ping service node performs link-heartbeat loopback detection to detect service faults. The packets
used are ICMP detection packets. There are 12 packet templates in total. Each template sends two
packets in sequence within a period of 30s. Therefore, a total of 24 packets are sent by the 12 templates
within a period of 30s. After five periods, the system starts to collect statistics on lost packets and
modified packets.
Link-heartbeat loopback detection is classified as packet modification detection or packet loss detection.
■ Packet modification detection checks whether the content of received heartbeat packets is the
same as the content of sent heartbeat packets. If one of the following conditions is met, a trigger
message is sent to instruct the SAID ping node to perform fault diagnosis:
■ Packet loss detection checks whether the difference between the number of received heartbeat
packets and the number of sent heartbeat packets is within the permitted range. If one of the
following conditions is met, a trigger message is sent to instruct the SAID ping node to perform
fault diagnosis:
■ After each packet sending period ends, the system checks the protocol status and whether ARP
entries exist on the interface. No ARP entry exists for three consecutive periods.
■ The absolute value of the difference between the number of lost packets whose payload is all
0s and the number of lost packets whose payload is all Fs is greater than 25% of the total
number of packets sent in five periods.
• Fault diagnosis
After receiving the triggered message in the fault detection state, the ping service node enters the fault
diagnosis state.
■ If a packet loss error is detected on the device, the SAID ping node checks whether a module
(subcard, TM, or NP) on the device is faulty. If no module is faulty, the system completes the
diagnosis and returns to the fault detection state.
■ If a packet loss error is detected on the device, the SAID ping node checks whether a module
(subcard, TM, or NP) on the device is faulty. If a module fault occurs, the system performs
loopback diagnosis. If packet loss or modification is detected during loopback, the local device is
faulty. The system then enters the fault recovery state. If no packet is lost during loopback
diagnosis, the system returns to the fault detection state.
■ If a packet modification error is detected on the device, the SAID ping node checks whether a
module (subcard, TM, or NP) on the device is faulty. Loopback diagnosis is performed regardless of
whether a module fault occurs. If packet loss or packet modification occurs during loopback, the
local device is faulty. The system then enters the fault recovery state. If no packet is lost during the
loopback, the system returns to the fault detection state and generates a packet modification
alarm.
• Fault recovery
If a fault is detected during loopback diagnosis, the ping service node determines whether a counting
error occurs on the associated subcard.
■ If a counting error occurs on the subcard, the ping service node resets the subcard for service
recovery. Then, the node enters the service recovery determination state and performs link-
heartbeat loopback detection to determine whether services recover. If services recover, the node
returns to the fault detection state. If services do not recover, the node returns to the fault recovery
state and takes a secondary recovery action. (For a subcard reset, the secondary recovery action is
board reset.)
■ If no counting error occurs on the subcard, the ping service node resets the involved board for
service recovery. After the board starts, the node enters the service recovery determination state
and performs link-heartbeat loopback detection to determine whether services recover. If services
recover, the node returns to the fault detection state. If services do not recover, the node remains
in the service recovery determination state and periodically performs link-heartbeat loopback
detection until services recover.
• Fault alarm
If link-heartbeat loopback detects packet loss, it triggers SAID ping diagnosis and performs recovery
operations (resetting the subcard or board). If services still fail to recover and the device continues to
detect packet loss, it reports an alarm.
If link-heartbeat loopback detects packet modification, it triggers SAID ping diagnosis and reports an
alarm when any of the following conditions is met:
■ If services fail to be restored after recovery operations (resetting the subcard or board), the device
detects packet loss and reports an alarm.
■ If a software error occurs, the device forcibly cancels link-heartbeat loopback and reports an alarm
if no other recovery operation is performed within 8 minutes.
■ If no packet loss or packet modification error occurs during link-heartbeat loopback, the device
cancels the recovery operation. If no other recovery operation is performed within 8 minutes, the
device reports an alarm.
■ If the board does not support SAID ping, the device reports an alarm.
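The escalating recovery described in the fault recovery phase (a lightweight subcard reset first, then a heavyweight board reset) can be sketched as a generic escalation loop; the function and action names below are hypothetical.

```python
def recover(actions, service_recovered):
    """Run recovery actions from lightweight to heavyweight, stopping once
    link-heartbeat loopback detection reports that services have recovered.
    actions: ordered (name, action) pairs, e.g. subcard reset then board reset.
    service_recovered: callable that re-runs the loopback check."""
    for name, action in actions:
        action()
        if service_recovered():
            return name  # the recovery level that fixed the fault
    # Fault persists after all levels: report an alarm and keep checking periodically.
    return None

# Illustrative usage with simulated actions: the subcard reset does not help,
# but the board reset restores services.
state = {"fixed": False}
actions = [("subcard_reset", lambda: None),
           ("board_reset", lambda: state.update(fixed=True))]
result = recover(actions, lambda: state["fixed"])
```

On the device, each action would be a real reset operation and the check would be another round of link-heartbeat loopback detection.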
Background
A large number of forwarding failures that cannot recover automatically occur on networks. As a result,
services are interrupted and cannot be restored for a long time. A mechanism is required to detect such
forwarding failures. After a forwarding entry failure (such as a route or ARP forwarding entry failure) is
detected, proper measures are taken to rectify the fault quickly.
Definition
The control plane with forwarding plane consistency check (CFC) service node is a specific service node in
the SAID framework. The CFC node selects some typical routes and compares the outbound interface, MAC
address, and label encapsulation information on the control plane with those on the forwarding plane. If the
information is inconsistent, the system enters the diagnosis state and performs the consistency check
multiple times. If the inconsistency persists, an alarm is generated.
Principles
The SAID system diagnoses the CFC service node through three phases: flow selection, check, and
troubleshooting. In this case, devices can perform automatic diagnosis, collect fault information
automatically, and generate alarms.
• Flow selection
There are a large number of routes on the live network, so the system selects typical routes for the check.
Routes are selected based on the following priorities: default route > 32-bit direct route > static
route > private routes > others.
A total of 4000 flows can be selected, and the quota of each type of flow is limited. The
system delivers a flow selection task based on the standard quota of each type of flow. If the quota of a
type of flow is not used up, the extra quota is used for other types of flows after the results are
summarized.
• Check
After summarizing the flow selection results of interface boards and obtaining the final flow set to be
checked, the main control board broadcasts the flow selection information to each interface board. The
interface boards start to check the flows.
Data on the control plane is inconsistent with that on the forwarding plane in the following situations:
1. The forwarding plane has the outbound interface, MAC address, and label encapsulation
information, but the control plane does not.
2. Data on the forwarding plane is incorrect (for example, an entry is invalid), and no hardware
forwarding result is obtained. If the outbound interface, MAC address, and label encapsulation
information can be obtained, the data is compared with that on the control plane. In normal cases,
the data on the forwarding plane is the same as, or is a subset of, that on the control plane.
• Troubleshooting
After a fault occurs, the context information related to the fault is collected. Then, the device enters the
diagnosis state and repeatedly checks the incorrect flow. If an entry error occurs three consecutive
times, the device enters the recovery state. If no error occurs in a check, the flow is considered normal and
no further diagnosis is required.
After the fault is diagnosed, you can run commands to restart the interface to rectify the fault.
After the fault recovery action is performed, the current flow needs to be checked again after it keeps
stable and does not change for 5 minutes. If the fault persists, an alarm is generated and the context
information related to the fault is collected. If the fault is rectified, the system enters the detection state
again and continues to check the subsequent flows.
After an alarm is generated, the SAID system keeps checking the current flow until the flow is correct.
Then, the alarm is cleared and the system enters the detection state.
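The repeat-check rule in the troubleshooting phase (three consecutive entry errors trigger recovery, while a single consistent check ends the diagnosis) can be sketched as follows; the function name and threshold parameter are illustrative.

```python
def diagnose_flow(check_results, threshold=3):
    """check_results: successive booleans, True = control and forwarding planes
    are consistent. Return 'recovery' if `threshold` consecutive inconsistencies
    are observed, or 'normal' as soon as one consistent check occurs."""
    consecutive_errors = 0
    for consistent in check_results:
        if consistent:
            return "normal"  # one consistent result ends the diagnosis
        consecutive_errors += 1
        if consecutive_errors >= threshold:
            return "recovery"
    return "normal"
```

In the real system, entering the recovery state leads to the interface restart and the post-recovery re-check described above.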
Background
As the manufacturing technique of electronic components evolves towards deep submicron, the per-unit soft
failure rate of storage units in such components has been increasing. As a result, single event upset (SEU)
faults often occur, adversely affecting services.
Definition
If a subcard encounters an SEU fault, SAID for SEU performs loopbacks on all interfaces of the subcard. If
packet loss or modification occurs during loopback detection, the subcard is reset for fault rectification.
Principles
The SAID system diagnoses an SEU fault through three phases: fault detection, loopback detection, and
troubleshooting. This enables devices to perform automatic diagnosis and fault information collection.
• Fault detection
SAID for SEU detects an SEU fault on a logical subcard and starts loopback detection.
• Loopback detection
Loopback detection is to send ICMP packets from the CPU on the involved interface board to an
interface on the faulty subcard and then loop back the ICMP packets from the interface to the CPU.
• Troubleshooting
1. If packet loss or modification occurs, SAID for SEU performs either of the following operations
depending on the status of the involved interface:
a. If the interface is physically Up, SAID for SEU resets the subcard.
b. If the interface is physically Down, SAID for SEU keeps the interface Down until the fault is
rectified.
2. If statistics about the sent and received loopback packets are properly collected and packet
verification is normal, the subcard does not need to be reset.
Terms
None.
Abbreviation
Definition
Key performance indicators (KPIs) indicate the performance of a running device at a specific time. A KPI may
be obtained by aggregating multiple levels of KPIs. The KPI data collected by the main control board and
interface boards is saved as an xxx.dat file and stored on the CF card on the main control board. The KPI
parsing tool parses the file according to a predefined format and converts it into an Excel file. The
Excel file provides relevant fault and service impairment information, facilitating fault locating.
Purpose
The KPI system records key device KPIs in real time and provides service impairment information (for
example, the fault generation time, service impairment scope/type, relevant operations, and possible fault
causes).
Benefits
The KPI system helps carriers quickly learn service impairment information and locate faults, so that they
can effectively improve network maintainability and reduce maintenance costs.
KPI System
Key performance indicators (KPIs) are periodically collected at a specified time, which slightly increases
memory and CPU usage. However, if a large number of KPIs are to be collected, services may be seriously
affected. Therefore, when memory or CPU usage exceeds 70%, the system collects KPIs only for
the CP-CAR traffic, message-queue, memory usage, and CPU usage objects, which do not increase
memory or CPU usage.
The KPI system checks whether the receiving buffer area has data every 30 minutes. If the receiving buffer
area has data, the system writes the data into a data file and checks whether the data file size is greater
than or equal to 4 MB. If the data file size is greater than or equal to 4 MB, the system compresses the file
as a package named in the yyyy-mm-dd.hh-mm-ss.dat.zip format. After the compression is complete, the
system deletes the data file.
The KPI system obtains information about the size of the remaining CF card space each time a file is
generated.
• If the remaining CF card space is less than or equal to 50 MB, the KPI system deletes the oldest
packages compressed from data files.
• If the remaining CF card space is greater than 50 MB, the KPI system obtains data files from the
cfcard:/KPISTAT path and computes the total space used by all the packages compressed from data
files. If the space usage is greater than or equal to 110 MB, the KPI system deletes the oldest packages.
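The CF card housekeeping rules above can be sketched in Python. This is a simplified model with sizes in MB; how many of the oldest packages the real system deletes at a time is not stated here, so the sketch assumes deletion continues until the triggering condition clears.

```python
def kpi_housekeeping(free_mb, packages):
    """packages: list of (name, size_mb) pairs, oldest first.
    Return the package names to delete under the rules above:
    - free CF card space <= 50 MB: delete the oldest packages;
    - otherwise, if total package space >= 110 MB: delete the oldest packages."""
    deleted = []
    if free_mb <= 50:
        # CF card nearly full: reclaim space from the oldest compressed packages.
        while packages and free_mb <= 50:
            name, size = packages.pop(0)
            deleted.append(name)
            free_mb += size
    else:
        # Enforce the 110 MB cap on the space used by all compressed packages.
        total = sum(size for _, size in packages)
        while packages and total >= 110:
            name, size = packages.pop(0)
            deleted.append(name)
            total -= size
    return deleted
```

On the device, the package list would come from the cfcard:/KPISTAT path and the free-space figure from the file system each time a new data file is generated.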
1. The KPI system provides a registration mechanism for service modules. After the modules register, the
system collects service data at the specific collection time through periodic collection and storage
interfaces.
2. When the collection period of a service module expires, the KPI system invokes the module to collect
data. The module converts the collected data into a desired KPI packet format and saves the data on
the main control board through the interface provided by the KPI system.
3. The KPI parsing tool parses the file based on a predefined format and converts it into an Excel
file.
KPI Categories
KPIs are categorized as access service, traffic monitoring, system, unexpected packet loss, and resource KPIs.
The monitoring period can be 1, 5, 10, 15, or 30 minutes. At present, components (for example, the NP and
TM), services (for example, QoS), and boards (for example, main control boards and interface boards)
support KPI collection.
Table 1 provides KPI examples.
Table 1 columns: KPI Category | KPI Sub-category | Board | KPI Collection Object | KPI | Monitoring Period | Collected When CPU/Memory Usage Is Higher Than 70% | Reporting Condition | Incremental/Total
• File header
For details about the header format of the .dat file, see Table 2.
• Data file
■ Packet header
For details about the packet header format, see Table 3.
■ Data packet
For details about the packet format, see Table 4.
Table 5 describes the file format output after the system parses the source file according to the data formats
in Table 2, Table 3, and Table 4.
Field                                Length (Bytes)    Value
Reserved                             4                 0
Record header: data collection time  4                 For example, the number of seconds elapsed since 00:00:00 on January 1, 1970
Reserved                             1                 -
Collection period                    2                 -
L                                    UCHAR             -
V                                    -                 -
KPI-Value                            -                 -
KPI 2 ... KPI N                      -                 -
KPI object 2 ... KPI object N        -                 -
Each record in the parsed file contains the following fields: Device Name, LoopBack IP, File Type, Collect Date, Version, DateTime, Slot, Module, KPI-Class, KPI-SubClass, KPI-object, KPI-ID, KPI-Name, Type (Incremental/Total), Interval, Record Mode, Threshold, KPI-Value, and Unit. The following sample rows illustrate the format:

HUAWEI | 1.1.1.1 | KPI LOG | 2017/4/27 | V800R021C10SPC600 | 2017-04-27 14:47:49+00:00 | 1 | CPUP | System | CPU | CPU Usage | 25088 | CPU Usage | Total | 300 | Always | NA | 6 | %
HUAWEI | 1.1.1.1 | KPI LOG | 2017/4/27 | V800R021C10SPC600 | 2017-04-27 14:48:49+00:00 | 1 | MEMP | System | Memory | Memory Usage | 25089 | Memory Usage | Total | 300 | Always | NA | 16 | %
HUAWEI | 1.1.1.1 | KPI LOG | 2017/4/27 | V800R021C10SPC600 | 2017-04-27 14:49:49+00:00 | 1 | CPUP | System | CPU | CPU Usage | 25088 | CPU Usage | Total | 300 | Always | NA | 6 | %
HUAWEI | 1.1.1.1 | KPI LOG | 2017/4/27 | V800R021C10SPC600 | 2017-04-27 14:50:49+00:00 | 1 | MEMP | System | Memory | Memory Usage | 25089 | Memory Usage | Total | 300 | Always | NA | 16 | %
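The fixed-width binary record header described above can be parsed along these lines. The field widths follow the table (4-byte reserved field, 4-byte collection time, 1-byte reserved field, 2-byte collection period); the big-endian byte order and the sample buffer are assumptions for illustration.

```python
import struct

def parse_record_header(buf: bytes) -> dict:
    """Parse one record header of a KPI .dat file (sketch).

    Layout per the table above: Reserved (4 bytes, 0), data collection
    time (4 bytes, seconds since 1970-01-01 00:00:00), Reserved (1 byte),
    collection period (2 bytes). Big-endian byte order is an assumption.
    """
    reserved4, collect_time, reserved1, period = struct.unpack_from(">IIBH", buf, 0)
    return {"collect_time": collect_time, "period": period}

# Illustrative buffer: collection time 1493275669, period 300 seconds.
sample = struct.pack(">IIBH", 0, 1493275669, 0, 300)
header = parse_record_header(sample)
```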
Definition
The protocol-aided diagnosis system (PADS) is an intelligent diagnosis system. It simulates service experts being online 24/7 to implement automatic end-to-end service fault prevention, discovery, and diagnosis. The PADS also supports automatic fault recovery with the help of the self-healing system.
Purpose
The PADS derives from technical research on future customer O&M. It summarizes common fault modes from thousands of faults reported by customers and simulates experts in all fields to monitor the IP protocol status 24/7, because the O&M capabilities available to most customers cannot meet the requirements of complex IP protocol O&M. The PADS provides a unified O&M interface, hierarchical fault diagnosis, and the ability to diagnose and process common faults at the system, device, and network levels of IP protocols. It can record and analyze exception signs before a fault occurs, automatically start fault diagnosis, and automatically isolate and recover from faults, facilitating intelligent O&M of devices on the live network.
Benefits
The PADS simplifies O&M, improves O&M efficiency, and reduces O&M costs.
PADS
The PADS simulates experts to monitor the service status in real time and automatically diagnoses and
recovers faults.
• Abnormal service status check: Diagnostic logs are recorded. The status of the latest abnormal services
can be queried using commands.
• In-process service status check: Diagnostic information is recorded in the PADS O&M file on the PADS-
dedicated CF card. The information can be used to restore the service status on site.
Implementation
1. Each service saves the service status in real time to the PADS O&M file on the CF card. Key
information is backed up in the memory for analysis.
2. The intelligent fault analysis/prevention unit monitors the running status data of each service in real
time.
3. The intelligent fault diagnosis unit starts end-to-end automatic diagnosis after detecting an exception.
You can also run diagnostic commands to start end-to-end fault analysis.
4. In the diagnosis process, if information needs to be collected and analyzed across components and
devices, use the cross-component and cross-device communications capability provided by the PADS.
5. Diagnosis results can be queried by running commands at any time. If any fault in the diagnosis
results needs to be self-healed, the PADS interworks with the self-healing system to complete fault
self-healing.
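The monitor-diagnose hand-off in steps 1 through 3 might look like the following toy model. All class and method names here are invented for this sketch; the actual PADS implementation is not public.

```python
class PadsSketch:
    """Toy model of the PADS monitor -> diagnose flow (steps 1-3)."""

    def __init__(self):
        self.oam_records = []   # service status saved to the O&M file
        self.diagnoses = []     # diagnosis results, queryable at any time

    def save_status(self, service, status):
        # Step 1: each service saves its status in real time.
        self.oam_records.append((service, status))

    def monitor(self):
        # Steps 2-3: the analysis unit watches the status data and starts
        # automatic diagnosis when it detects an exception.
        for service, status in self.oam_records:
            if status != "normal":
                self.diagnoses.append((service, self.diagnose(service)))

    def diagnose(self, service):
        # Placeholder for end-to-end automatic diagnosis (step 3).
        return f"fault detected on {service}"

pads = PadsSketch()
pads.save_status("bgp", "normal")
pads.save_status("ldp", "abnormal")
pads.monitor()
```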
Background
The theft of network devices can have severe consequences on network operations, interrupting service
continuity and affecting user experience. Stolen devices are often sold on the black market and subsequently
used illegally. The device anti-theft function restricts the services of stolen devices upon unauthorized use,
thereby reducing the possibility of device theft.
Definition
• Device anti-theft: By restricting the unauthorized use of stolen devices, the anti-theft function reduces
the possibility of device theft because unusable devices have little value on the black market.
After the device anti-theft function is enabled for the main control board of a device, the function is
automatically enabled for the service boards that support the function.
Benefits
Device anti-theft offers the following benefits:
• Benefits to carriers
• Benefits to users: Ensures service continuity.
5 Network Reliability
Purpose
This document describes the network reliability feature in terms of its overview, principles, and applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Commissioning engineers
Security Declaration
• Notice on Limited Command Permission
This document describes the commands used for network deployment and maintenance, but does not
cover confidential commands such as those used for production, assembly, and return-to-factory
inspection. For details about confidential commands, please submit an application.
■ When the password encryption mode is cipher, avoid setting both the start and end characters of a password to "%^%#"; otherwise, the password is displayed directly in the configuration file.
■ Your purchased products, services, or features may use some of users' personal data during service
operation or fault locating. You must define user privacy policies in compliance with local laws and
take proper measures to fully protect personal data.
■ When discarding, recycling, or reusing a device, back up or delete data on the device as required to
prevent data leakage. If you need support, contact after-sales technical support personnel.
• Feature declaration
■ The NetStream feature may be used to analyze the communication information of terminal
customers for network traffic statistics and management purposes. Before enabling the NetStream
feature, ensure that it is performed within the boundaries permitted by applicable laws and
regulations. Effective measures must be taken to ensure that information is securely protected.
■ The mirroring feature may be used to analyze the communication information of terminal
customers for a maintenance purpose. Before enabling the mirroring function, ensure that it is
performed within the boundaries permitted by applicable laws and regulations. Effective measures
must be taken to ensure that information is securely protected.
■ The packet header obtaining feature may be used to collect or store some communication
information about specific customers for transmission fault and error detection purposes. Huawei
cannot offer services to collect or store this information unilaterally. Before enabling the function,
ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Special Declaration
• This document serves only as a guide. The content is written based on device information gathered
under lab conditions. The content provided by this document is intended to be taken as general
guidance, and does not cover all scenarios. The content provided by this document may be different
from the information on user device interfaces due to factors such as version upgrades and differences
in device models, board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are beyond the
scope of this document.
• The maximum values provided in this document are obtained in specific lab environments (for example,
only a certain type of board or protocol is configured on a tested device). The actually obtained
maximum values may be different from the maximum values provided in this document due to factors
such as differences in hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface numbers on devices
for configuration.
• The supported boards are described in the document. Whether a customization requirement can be met
is subject to the information provided at the pre-sales interface.
• In this document, public IP addresses may be used in feature introduction and configuration examples
and are for reference only unless otherwise specified.
• The configuration precautions described in this document may not accurately reflect all scenarios.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
DANGER: Indicates a hazard with a high level of risk which, if not avoided, will result in death or serious injury.
CAUTION: Indicates a hazard with a low level of risk which, if not avoided, could result in minor or moderate injury.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made
in earlier issues.
Definition
Reliability is a technology that can shorten traffic interruption time and ensure the quality of service on a
network, improving user experience.
Device reliability can be assessed from the following aspects: system, hardware, and software reliability
design; reliability test and verification; IP network reliability design.
As networks rapidly develop and applications become diversified, various value-added services (VASs) are
widely used. The requirement for network bandwidth increases dramatically. Any network service
interruption will result in immeasurable loss to carriers.
Demands for network infrastructure reliability are increasing.
This chapter describes IP reliability technologies supported by the NE40E.
Reliability Indexes
Reliability indexes include the mean time to repair (MTTR), mean time between failures (MTBF), and
availability.
Generally, product or system reliability is assessed based on the MTTR and MTBF.
• MTTR: The MTTR indicates the fault rectification capability in terms of maintainability. This index refers
to the average time that a component or a device takes to recover from a failure. The MTTR involves
spare parts management and customer service and plays an important role in evaluating device
maintainability.
The MTTR is calculated using the following formula:
MTTR = Fault detection time + Board replacement time + System initialization time + Link recovery time
+ Route convergence time + Forwarding recovery time
A smaller addend indicates a shorter MTTR and higher device availability.
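Using the MTTR formula above with purely illustrative numbers (every addend below is an assumed value, in minutes), and the conventional availability relation Availability = MTBF / (MTBF + MTTR):

```python
# Illustrative MTTR computation; every addend is an assumed value in minutes.
fault_detection = 2.0
board_replacement = 10.0
system_init = 3.0
link_recovery = 1.0
route_convergence = 0.5
forwarding_recovery = 0.5

mttr = (fault_detection + board_replacement + system_init +
        link_recovery + route_convergence + forwarding_recovery)

# Availability is conventionally computed as MTBF / (MTBF + MTTR).
mtbf_minutes = 100_000.0   # assumed MTBF
availability = mtbf_minutes / (mtbf_minutes + mttr)
```

Shrinking any single addend (for example, faster fault detection) directly lowers the MTTR and raises availability.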
• MTBF: The MTBF indicates fault probability. This index refers to the average time (usually expressed in hours) for which a component or a device works properly between failures.
• Availability: Availability indicates system utility. Availability can be improved when the MTBF increases or the MTTR decreases.
1. Few faults in system software and hardware
   Hardware: simplified design, standardized circuits, reliable application of components, reliability control of purchased components, reliable manufacturing, environment endurability, highly accelerated life testing (HALT), and highly accelerated stress screening (HASS)
   Software: specifications for software reliability design
2. No impact on the system if a fault occurs
   Redundancy design, switchover policy, and switchover success rate improvement
• Hierarchical networking: A network is divided into three layers: core layer, convergence layer, and edge
layer. According to service status or prediction, redundancy backup is configured so that a customer
edge device is dual-homed to the devices at the convergence layer. The devices at the convergence layer
are dual-homed to multiple devices in a single node or different nodes at the upper layer. The devices
at the core and convergence layers can be deployed as required. The devices at the core layer are fully or partially meshed, and any two devices are reachable to each other over a single fast route, which avoids excessive interconnections.
• At the same layer, multiple interconnections between devices are preferred; within a single node, multiple devices are preferred.
• A lower-layer device is dual- or multi-homed to multiple devices in a single node or different nodes.
• Common fault detection technologies include Bidirectional Forwarding Detection (BFD), which applies
to all layers.
Each layer of the TCP/IP reference model has fault detection mechanisms:
• Data link layer: ETH OAM, Spanning Tree Protocol (STP), Rapid Spanning Tree Protocol (RSTP), MSTP
• Network layer: Hello mechanism provided by protocols, Virtual Router Redundancy Protocol (VRRP),
and graceful restart (GR)
• Echo mode: Received packets are sent back to a peer without any change.
• Switchback mode
As shown in Figure 1, an LDP LSP serves as a public network tunnel and TE is enabled between Ps to ensure
quality of service (QoS). This deployment enhances the QoS across the entire network and simplifies TE
deployment during PE replacements. If no intermediate devices exist and a fault occurs on the link between
P1 and P2, or P2 fails on a non-broadcast network, an LDP FRR switchover is performed on PE1 to ensure
that the switchover takes less than 50 ms.
TE FRR/LDP FRR switchovers depend on the detection of electrical or optical signals on an interface. If
intermediate devices exist and a link fails, a router cannot detect the interruption of optical signals and
therefore a switchover cannot be performed. BFD resolves this issue.
LSPs. If a link fails, traffic switches to the bypass tunnel within 50 milliseconds.
The P2P TE bypass tunnel is established over the path P1 -> P5 -> P2 on the network shown in Figure 1. It
protects traffic over the link between P1 and P2. If the link between P1 and P2 fails, P1 switches traffic to
the bypass tunnel destined for P2.
An FRR bypass tunnel must be manually configured. An administrator can configure an explicit path for a
bypass tunnel and determine whether or not to plan bandwidth for the bypass tunnel.
P2P and P2MP TE tunnels can share a bypass tunnel. FRR protection functions for P2P and P2MP TE tunnels are as
follows:
• A bypass tunnel with planned bandwidth can be bound to a specific number of both P2P and P2MP tunnels in configuration sequence. The total bandwidth of the bound P2P and P2MP tunnels must be lower than or equal to the bandwidth of the bypass tunnel.
• A bypass tunnel with no bandwidth can also be bound to both P2P and P2MP TE tunnels.
does not receive the detection packets within a specified period, the end assumes that the link is interrupted
and reports to related modules to perform switchover.
As shown in Figure 1, PE1 and PE2 form a VRRP group, functioning as a backup for each other. VRRP
monitors the BFD session. For example, PE1 serves as the primary PE. When the link between Switch1 and
PE1 fails, the failure is fast detected with BFD and reported to VRRP. The VRRP group fast switches traffic,
and then PE2 becomes the primary PE.
As shown in Figure 1, PE3 and PE4 access a VPN. If the user network on the left of PE1 needs to
communicate with the user network on the right of PE3, PE1 can access the user network on the right
through PE3 and PE4 that back up each other. VPN FRR is implemented on PE1.
Similar to other FRR technologies, VPN FRR has an available bypass path for fast switchovers if the primary
path fails. For VPN FRR, two next hops (PE3 and PE4) are reserved for PE1 to access the remote VPN. One is
the primary PE and the other is the backup PE. The primary and backup PEs can be manually configured.
As shown in the preceding figure, PE1 has two next hops, PE3 and PE4, for the remote VPN route. PE1 can
select one of PE3 and PE4 as the active next hop and the other as the standby next hop.
• If VPN FRR has not been configured, only a primary next-hop entry is delivered from the control plane
to the forwarding plane. When the primary next hop becomes invalid, the backup next-hop entry is
delivered to the forwarding plane, which slows down switchovers.
• If VPN FRR has been configured, both the primary and backup next-hop entries are delivered from the
control plane to the forwarding plane. When the primary next hop becomes invalid, the forwarding
plane immediately uses the backup next hop, which speeds up switchovers.
After BFD detects that the primary next hop fails, a switchover is performed within a very short period,
which implements high reliability.
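The contrast above can be sketched as a forwarding entry that carries a preinstalled backup next hop; the structure below is illustrative and is not the device's actual FIB format.

```python
class VpnFrrEntry:
    """Illustrative forwarding entry with a preinstalled backup next hop."""

    def __init__(self, primary, backup=None):
        self.primary = primary        # e.g. PE3, the active next hop
        self.backup = backup          # e.g. PE4, preinstalled when VPN FRR is on
        self.primary_valid = True

    def next_hop(self):
        # With VPN FRR, the forwarding plane switches to the backup next
        # hop immediately when the primary becomes invalid, without waiting
        # for the control plane to deliver a new entry.
        if self.primary_valid:
            return self.primary
        return self.backup

entry = VpnFrrEntry(primary="PE3", backup="PE4")
entry.primary_valid = False          # BFD reports the primary next hop down
```

Without VPN FRR, the `backup` slot would be empty and the switchover would stall until the control plane installs a new primary entry.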
Figure 1 IP FRR
As shown in Figure 1, PE1 has two paths to reach the CE. One is the path PE1 -> CE, and the other is PE1 ->
PE2 -> CE. In normal circumstances, traffic to the CE is forwarded by PE1 (the primary PE). If the link
between PE1 and the CE fails, IP FRR switches the traffic from the link between PE1 and the CE to the link
between PE2 and the CE.
Generally, a PE accesses a Layer 3 virtual private network (L3VPN). When IP FRR is used for a private
network, a private network neighbor relationship must be established between PE1 and PE2. The primary
and bypass paths are created for PE1 to access the CE.
Definition
Bidirectional Forwarding Detection (BFD) is a fault detection protocol that can quickly detect a
communication failure between devices and notify upper-layer applications.
Purpose
To minimize the impact of device faults on services and improve network reliability, a network device must
be able to quickly detect faults when communicating with adjacent devices. Measures can then be taken to
promptly rectify the faults to implement service continuity.
On a live network, link faults can be detected using either of the following mechanisms:
• Hardware detection: For example, the Synchronous Digital Hierarchy (SDH) alarm function can be used
to quickly detect link hardware faults.
• Hello detection: If hardware detection is unavailable, Hello detection can be used to detect link faults.
• Hello detection takes more than 1 second to detect faults. When traffic is transmitted at gigabit rates,
such slow detection causes great packet loss.
• On a Layer 3 network, the Hello packet detection mechanism cannot detect faults for all routes, such as
static routes.
• A low-overhead, short-duration method is used to detect faults in a path between adjacent forwarding
engines. The faults can be interface, data link, and even forwarding engine faults.
• A single, unified mechanism is used to monitor any media and protocol layers in real time.
Benefits
BFD offers the following benefits:
• BFD rapidly monitors link and IP route connectivity to improve network performance.
• Adjacent systems running BFD rapidly detect communication failures and establish a backup channel to
restore communications, which improves network reliability.
• An upper-layer application provides BFD with parameters, such as the detection address and interval.
• BFD creates, deletes, or modifies sessions based on these parameters and notifies the upper-layer
application of the session status.
• Provides a low-overhead, short-duration method to detect faults in the path between adjacent
forwarding engines.
• Provides a single, unified mechanism to monitor any media and protocol layers in real time.
The following sections describe BFD fundamentals, including the BFD detection mechanism, types of links
that can be monitored, session establishment modes, and session management.
• Asynchronous mode: the primary BFD detection mode. In this mode, both systems periodically send BFD Control packets to each other. If one system fails to receive several consecutive BFD Control packets, it considers the BFD session Down.
The echo function can be used when the demand mode is configured. After the echo function is activated,
the local system sends a BFD Control packet and the remote system loops back the packet along the
forwarding channel. If several consecutive echo packets are not received, the session is declared down.
MPLS LSP
Static BFD monitors the following types of LSPs:
• LDP LSPs
• Traffic engineering (TE) tunnels, and constraint-based routed label switched paths (CR-LSPs) and Resource Reservation Protocol (RSVP) CR-LSPs that are bound to tunnels
Dynamic BFD monitors the following types of LSPs:
• LDP LSPs
• RSVP CR-LSPs bound to tunnels
A BFD session used to monitor the connectivity of MPLS LSPs can be established in either of the following modes:
• Static configuration: The negotiation of a BFD session is performed using the local and remote discriminators that are manually configured for the BFD session to be established.
• Dynamic establishment: The negotiation of a BFD session is performed using the BFD discriminator type-length-value (TLV).
Segment Routing
• BFD for SR-MPLS BE
• BFD for SR-MPLS TE LSP
• BFD for SR-MPLS TE
• SBFD for SR-MPLS TE Policy
• SBFD for SRv6 TE Policy
BFD for locator routes can be applied to SRv6 BE.
Mode Description
Static mode BFD session parameters, such as the local and remote discriminators, are
manually configured, and a request to create a BFD session is manually
delivered.
NOTE:
In static mode, configure unique local and remote discriminators for each BFD
session. This mode prevents incorrect discriminators from affecting BFD sessions
that are established using correct discriminators and prevents the BFD sessions
from alternating between Up and Down.
Dynamic establishment When a BFD session is to be established dynamically, the system processes
the local and remote discriminators as follows:
• Down: A BFD session is in the Down state or a request has been sent.
• Init: The local end can communicate with the remote end and wants the session state to be Up.
• Up: The BFD session is successfully established.
The BFD status is displayed in the State field of a BFD Control packet. The system changes the session status
based on the local session status and the received session status of the peer.
The BFD state machine implements a three-way handshake for BFD session establishment or deletion to
ensure that the two systems detect the status changes.
The following shows BFD session establishment to describe the state machine transition process.
1. Device A and Device B start their own BFD state machines with the initial state of Down. Device A and
Device B send BFD Control packets with the State field set to Down. If a static BFD session is
established, the Your Discriminator value in the BFD Control packets is manually specified. If a
dynamic BFD session is established, the Your Discriminator value is 0.
2. After receiving a BFD Control packet with the State field set to Down, Device B switches its state to
Init and sends a BFD Control packet with the State field set to Init.
After the local BFD session status changes to Init, Device B no longer processes received BFD Control packets with
the State field set to Down.
3. The BFD status change of Device A is the same as that of Device B, and Device A sends a packet with
the State field being Init to Device B.
4. Upon receipt of the BFD Control packet with the State field set to Init, Device B changes the local
status to Up.
5. The BFD status changes on Device A in the same way as that on Device B.
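The three-way handshake in steps 1 through 5 can be modeled with a small transition function. This is a sketch of the RFC 5880 state machine (AdminDown handling omitted), not device code.

```python
def bfd_next_state(local, remote):
    """Return the next local BFD state given the peer's reported state.

    Implements the Down -> Init -> Up transitions used in the handshake
    above (AdminDown handling omitted for brevity).
    """
    if local == "Down":
        if remote == "Down":
            return "Init"   # step 2: peer reports Down, move to Init
        if remote == "Init":
            return "Up"     # peer already saw us, go straight to Up
    if local == "Init" and remote in ("Init", "Up"):
        return "Up"         # step 4: peer reached Init/Up, session is Up
    if local == "Up" and remote == "Down":
        return "Down"       # peer declared the session down
    return local            # otherwise keep the current state

# Replay the handshake from Device B's point of view:
state = "Down"
state = bfd_next_state(state, "Down")   # receives Down -> Init
state = bfd_next_state(state, "Init")   # receives Init -> Up
```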
Diag (Diagnostic) 5 bits Diagnostic word, which indicates the cause of a session status
change on the local BFD system:
0: No diagnostic information is displayed.
1: Detection timed out.
2: The Echo function failed.
3: The peer session went Down.
4: A BFD session on the forwarding plane was reset.
5: A path monitored by BFD went Down.
6: A cascaded path that is associated with the path monitored by
BFD went Down.
7: A BFD session is in the AdminDown state.
8: A reverse cascaded path that is associated with the path
monitored by BFD went Down.
9 to 31: reserved for future use.
P (Poll) 1 bit Whether the transmit end instructs the receive end to respond to a
packet:
0: The transmit end requests no confirmation.
1: The transmit end requests the receive end to confirm a connection
request or a parameter change.
F (Final) 1 bit Whether the transmit end responds to a packet with the P bit set to
1:
0: The transmit end does not respond to a packet with the P bit set
to 1.
1: The transmit end responds to a packet with the P bit set to 1.
C (Control Plane 1 bit Whether the forwarding plane is separate from the control plane:
Independent) 0: The forwarding plane is not separate from the control plane. At
least one of the received peer C bit and local C bit is not 1, indicating
that BFD packets are transmitted on the control plane. In this case, if
the BFD session detects a Down event during GR, the service does
not need to respond.
1: The forwarding plane is separate from the control plane. Both the
received peer C bit and local C bit are 1, indicating that the BFD
implementation of the transmit end does not depend on the control
plane. The BFD packets are transmitted on the forwarding plane.
Even if the control plane fails, the BFD can still take effect. For
example, during the IS-IS GR process on the control plane, BFD
continues to monitor the link status using BFD packets with the C bit
set to 1. In this case, if the BFD session detects a Down event during
GR, the service module responds to the Down event by changing the
topology and routes to minimize traffic loss.
M (Multipoint) 1 bit This bit is reserved for BFD to support P2MP extension in the future.
Detect Mult 8 bits Detection timeout multiplier, which is used by the detecting party to
calculate the detection timeout period:
Demand mode: The local detection multiplier takes effect.
Asynchronous mode: The peer detection multiplier takes effect.
Desired Min TX 32 bits Locally supported minimum interval (in milliseconds) at which BFD
Interval Control packets are sent.
Required Min RX 32 bits Locally supported minimum interval (in milliseconds) at which BFD
Interval Control packets are received.
Required Min 32 bits Locally supported minimum interval (in milliseconds) at which Echo
Echo RX Interval packets are received. Value 0 indicates that the local device does not
support the Echo function.
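The fields above determine the detection timeout. In asynchronous mode, per RFC 5880, the local detection time is the peer's Detect Mult multiplied by the negotiated receive interval (the larger of the peer's Desired Min TX Interval and the local Required Min RX Interval); the numbers below are illustrative.

```python
def async_detection_time_ms(peer_detect_mult, peer_desired_min_tx_ms,
                            local_required_min_rx_ms):
    """Asynchronous-mode detection time at the local system (sketch).

    The agreed packet interval is the larger of the peer's Desired Min TX
    Interval and the local Required Min RX Interval; the peer's Detect
    Mult then scales it, as the Detect Mult row notes for asynchronous mode.
    """
    interval = max(peer_desired_min_tx_ms, local_required_min_rx_ms)
    return peer_detect_mult * interval

# Illustrative values: the peer sends no faster than every 10 ms, the local
# system accepts at most one packet per 15 ms, and the peer's Detect Mult is 3.
timeout = async_detection_time_ms(3, 10, 15)
```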
• Single-hop BFD checks the IP continuity between directly connected systems. The single hop refers to a
hop on an IP link. Single-hop BFD allows only one BFD session to be established for a specified data
protocol on a specified interface.
• Multi-hop BFD detects all paths between two systems. Each path may contain multiple hops, and these
paths may partially overlap.
Typical application 2:
As shown in Figure 2, BFD monitors the multi-hop IPv4 path between Device A and Device C, and BFD
sessions are bound only to peer IP addresses.
Typical application 4:
As shown in Figure 4, BFD monitors the multi-hop IPv6 path between Device A and Device C, and BFD
sessions are bound only to peer IP addresses.
In BFD for IP scenarios, BFD for PST is configured on a device. If a link fault occurs, BFD detects the fault and
triggers the PST to go Down. If the device restarts and the link fault persists, BFD is in the AdminDown state
and does not notify the PST of BFD Down. As a result, the PST is not triggered to go Down and the interface
bound to BFD is still Up.
Usage Scenario
As shown in Figure 1, multicast BFD is configured on both Device A and Device B. BFD sessions are bound to
the outbound interface If1, and the default multicast address is used. After the configuration is complete,
multicast BFD quickly checks the continuity of the link between interfaces.
Usage Scenario
Figure 1 BFD for PIS
In Figure 1, a BFD session is established between Device A and Device B, and the default multicast address is
used to check the continuity of the single-hop link connected to the interface If1. After BFD for PIS is
configured and BFD detects a link fault, BFD immediately sends a message indicating the Down state to the
associated interface. The interface then enters the BFD Down state.
On the network shown in Figure 1, a BFD for link-bundle session consists of one main session and multiple
sub-sessions.
• Each sub-session independently monitors an Eth-Trunk member interface and reports the monitoring
results to the main session. Each sub-session uses the same monitoring parameters as the main session.
• The main session creates a BFD sub-session for each Eth-Trunk member interface, summarizes the sub-
session monitoring results, and determines the status of the Eth-Trunk.
■ The main session is down only when all its sub-sessions are down.
■ If no member interfaces are added to the Eth-Trunk interface, the BFD for link-bundle session does
not have sub-sessions. In this case, the main session is down.
The main session's local discriminator is allocated from the range from 0x00100000 to 0x00103fff without
occupying the original BFD session discriminator range. The main session does not learn the remote
discriminator because it does not send or receive packets. A sub-session's local discriminator is allocated
from the original dynamic BFD session discriminator range using the same algorithm as a dynamic BFD
session.
Only sub-sessions consume BFD session resources per board. A sub-session must select the board on which
the physical member interface bound to this sub-session resides as a state machine board. If no BFD session
resources are available on the board, board selection fails. In this situation, the sub-session's status is not
used to determine the main session's status.
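The aggregation rule above (the main session is down only when all sub-sessions are down, and also down when no sub-sessions exist) can be sketched as:

```python
def main_session_up(sub_session_states):
    """Return True if the BFD for link-bundle main session is up.

    Per the rules above: the main session is down only when all of its
    sub-sessions are down, and it is also down when the Eth-Trunk has no
    member interfaces (so no sub-sessions exist).
    """
    if not sub_session_states:
        return False                     # no members -> main session down
    return any(state == "up" for state in sub_session_states)

# One member link still up keeps the main session up:
trunk_ok = main_session_up(["down", "up", "down"])
empty_trunk = main_session_up([])
```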
The process of establishing a passive BFD echo session as shown in Figure 1 is as follows:
1. Device B functions as a BFD session initiator and sends an asynchronous BFD packet to Device A. The
Required Min Echo RX Interval field carried in the packet is a nonzero value, which specifies that
Device A must support BFD echo.
2. After receiving the packet, Device A finds that the value of the Required Min Echo RX Interval field
carried in the packet is a nonzero value. If Device A has passive BFD echo enabled, it checks whether
any ACL that restricts passive BFD echo is referenced. If an ACL is referenced, only BFD sessions that
match specific ACL rules can enter the asynchronous echo mode. If no ACL is referenced, BFD sessions
immediately enter the asynchronous echo mode.
3. Device B periodically sends BFD echo packets, and Device A sends BFD echo packets (the source and
destination IP addresses are the local IP address, and the destination physical address is Device B's
physical address) at the interval specified by the Required Min RX Interval field. Both Device A and
Device B start a receive timer, with a receive interval that is the same as the interval at which they send packets.
4. After Device A and Device B receive BFD echo packets from each other, they immediately loop back
the packets at the forwarding layer. Device A and Device B also send asynchronous BFD packets to
each other at an interval that is much less than that for sending echo packets.
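The decision in step 2, whether a received session enters asynchronous echo mode, can be sketched as follows. Modeling the referenced ACL as a simple allow-list of peer addresses is a simplification for illustration.

```python
def enters_echo_mode(required_min_echo_rx, passive_echo_enabled,
                     acl_allowed_peers, peer_addr):
    """Decide whether a session enters asynchronous echo mode (sketch).

    Per the flow above: the peer must advertise a nonzero Required Min
    Echo RX Interval, passive BFD echo must be enabled locally, and if an
    ACL is referenced, only sessions matching the ACL rules qualify.
    """
    if required_min_echo_rx == 0 or not passive_echo_enabled:
        return False
    if acl_allowed_peers is None:        # no ACL referenced
        return True
    return peer_addr in acl_allowed_peers

ok = enters_echo_mode(50, True, None, "10.1.1.2")                # no ACL
filtered = enters_echo_mode(50, True, {"10.1.1.9"}, "10.1.1.2")  # ACL blocks
```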
Table 1 Differences between BFD echo sessions and common static single-hop sessions
• Common static single-hop session: an IPv4 or IPv6 static single-hop session. MD and YD must be configured. A matching session must be established on the peer. The source and destination IP addresses are different.
• Passive BFD echo session: an IPv4 or IPv6 dynamic single-hop session. No MD or YD needs to be configured. A matching session must be established on the peer, and echo must be supported. Both the source and destination IP addresses are a local IP address of the initiator.
• One-arm BFD echo session: an IPv4 or IPv6 static single-hop session. Only MD needs to be configured (MD and YD are the same). A matching session does not need to be established on the peer.
  ■ If the source and destination addresses are not specified when a one-arm BFD echo session is created, the source and destination IP addresses are the same and the local IP address is used.
  ■ If the unicast reverse path forwarding (URPF) function is enabled, to prevent BFD packets from being incorrectly discarded, specify the source address when creating a one-arm BFD echo session. In this case, the source address is the specified IP address.
  ■ If a one-arm BFD echo session is created on an active-active network, the destination IP address must be specified to ensure that BFD packets can be sent back to the correct initiating device.
• Multi-hop session: The board with the interface that receives BFD negotiation packets is preferentially
selected. If the board does not have available BFD resources, board selection fails.
• Single-hop session bound to a physical interface or its sub-interfaces: If the board on which the bound
interface or sub-interfaces reside is BFD-capable in hardware, this board is selected. If the board does
not have available BFD resources, board selection fails.
• Single-hop session bound to a trunk interface: A board is selected from the boards on which trunk
member interfaces reside. If all of these boards are BFD-capable in hardware, one will be selected based
on load balancing. If none of the boards has available BFD resources, board selection fails.
• BFD for LDP LSP session: If an outbound interface is configured for a BFD for LDP LSP session, the
board on which the outbound interface resides is preferentially selected. If this board is BFD-capable in
hardware, it is selected. If the outbound interface is a tunnel interface, a board is selected based on
multi-hop session rules because tunnel interfaces reside on the main control board, which is BFD-
incapable in hardware. If a BFD session is not configured with an outbound interface, a board is
selected for the BFD session based on multi-hop session rules.
• BFD for VLANIF session: If a single-hop BFD for IP session that is not a Per-Link session is bound to a
VLANIF interface, board selection is performed among the physical boards where Eth-Trunk member
interfaces reside. If none of the physical boards have resources, board selection fails. If a single-hop BFD
for IP session that is a Per-Link session is bound to a VLANIF interface, board selection is performed for
each of the physical boards where Eth-Trunk member interfaces reside.
• If a BFD session associated with a static route detects a link failure when the BFD session is Down, the
BFD session reports the link failure to the system. The system then deletes the static route from the IP
routing table.
• If a BFD session associated with a static route detects that a faulty link recovers when the BFD session is
Up, the BFD session reports the fault recovery to the system. The system then adds the static route to
the IP routing table again.
• By default, a static route can still be selected even though the BFD session associated with it is
AdminDown (triggered by the shutdown command run either locally or remotely). If a device is
restarted, the BFD session needs to be re-negotiated. In this case, whether the static route associated
with the BFD session can be selected as the optimal route is subject to the re-negotiated BFD session
status.
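The route-selection rule described in the bullets above can be sketched as follows. This is a simplified, illustrative Python sketch (not device behavior); the function name is an assumption, and the state names follow the text.

```python
def static_route_selectable(bfd_state: str) -> bool:
    # Simplified rule from the text above: the static route is withdrawn
    # while the associated BFD session is Down, reinstalled when the
    # session is Up, and (by default) still selectable when the session
    # is AdminDown (shutdown run locally or remotely).
    return bfd_state in ("Up", "AdminDown")

assert static_route_selectable("Down") is False      # route deleted from IP routing table
assert static_route_selectable("Up") is True         # route added back
assert static_route_selectable("AdminDown") is True  # default behavior
```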
• Single-hop detection
In single-hop detection mode, the configured outbound interface and next hop address are the
information about the directly connected next hop. The outbound interface associated with the BFD
session is the outbound interface of the static route, and the peer address is the next hop address of the
static route.
• Multi-hop detection
In multi-hop detection mode, only the next hop address is configured. Therefore, the static route must
recurse to the directly connected next hop and outbound interface. The peer address of the BFD session
is the original next hop address of the static route, and the outbound interface is not specified. In most
cases, the original next hop is an indirect next hop. Multi-hop detection is performed on the static
routes that support route recursion.
For details about BFD, see the HUAWEI NE40E-M2 series Universal Service Router Feature Description - Network
Reliability.
Background
Routing Information Protocol (RIP)-capable devices monitor the neighbor status by periodically exchanging
Update packets. During the time it takes a local device to detect a link failure, carriers or users may lose a
large number of packets. Bidirectional forwarding detection (BFD) for RIP speeds up fault detection and
route convergence, which improves network reliability.
After BFD for RIP is configured on the Router, BFD can detect a fault (if any) within milliseconds and notify
the RIP module of the fault. The Router then deletes the route that passes through the faulty link and
switches traffic to a backup link. This process speeds up RIP convergence.
Table 1 describes the differences before and after BFD for RIP is configured.
Related Concepts
The BFD mechanism bidirectionally monitors data protocol connectivity over the link between two routers.
After BFD is associated with a routing protocol, BFD can rapidly detect a fault (if any) and notify the
protocol module of the fault, which speeds up route convergence and minimizes traffic loss.
• Static BFD
In static BFD mode, BFD session parameters (including local and remote discriminators) must be
configured, and requests must be delivered manually to establish BFD sessions.
Static BFD is applicable to networks on which only a few links require high reliability.
• Dynamic BFD
In dynamic BFD mode, the establishment of BFD sessions is triggered by routing protocols, and the local
discriminator is dynamically allocated, whereas the remote discriminator is obtained from BFD packets
sent by the neighbor.
When a new neighbor relationship is set up, a BFD session is established based on the neighbor and
detection parameters, including source and destination IP addresses. When a fault occurs on the link,
the routing protocol associated with BFD can detect the BFD session Down event. Traffic is switched to
the backup link immediately, which minimizes data loss.
Dynamic BFD is applicable to networks that require high reliability.
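The discriminator handling that distinguishes the two modes can be sketched as follows. This is an illustrative Python sketch under assumptions (the class and field names, and the starting discriminator value, are invented for the example): in dynamic mode the local discriminator is allocated automatically and the remote one is learned from the My Discriminator field of the neighbor's first packet.

```python
import itertools

# Assumed allocator for local discriminators (start value is arbitrary).
_local_discriminators = itertools.count(start=8192)

class DynamicBfdSession:
    """Sketch of dynamic BFD bootstrap: no discriminators are configured
    manually; the local one is allocated, the remote one is learned."""

    def __init__(self, src_ip: str, dst_ip: str):
        self.src_ip, self.dst_ip = src_ip, dst_ip
        self.local_disc = next(_local_discriminators)  # allocated dynamically
        self.remote_disc = 0                           # unknown until a packet arrives

    def on_packet(self, my_discriminator: int) -> None:
        # Learn the neighbor's discriminator from its first BFD packet.
        if self.remote_disc == 0:
            self.remote_disc = my_discriminator

a = DynamicBfdSession("10.0.0.1", "10.0.0.2")
a.on_packet(my_discriminator=4096)
assert a.remote_disc == 4096
```

In static mode, by contrast, both `local_disc` and `remote_disc` would be set from configuration before the session is established.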
Implementation
For details about BFD implementation, see "BFD" in Universal Service Router Feature Description - Reliability.
Figure 1 shows a typical network topology for BFD for RIP.
1. RIP neighbor relationships are established among Device A, Device B, and Device C and between
Device B and Device D.
3. Device A calculates routes, and the next hop along the route from Device A to Device D is Device
B.
4. If a fault occurs on the link between Device A and Device B, BFD will rapidly detect the fault and
report it to Device A. Device A then deletes the route whose next hop is Device B from the routing
table.
5. Device A recalculates routes and selects a new path Device C → Device B → Device D.
6. After the link between Device A and Device B recovers, a new BFD session is established between
the two routers. Device A then reselects an optimal link to forward packets.
Usage Scenario
BFD for RIP is applicable to networks that require high reliability.
Benefits
BFD for RIP improves network reliability and enables devices to rapidly detect link faults, which speeds up
route convergence on RIP networks.
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults between
forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two systems. The path
can be a physical link, a logical link, or a tunnel.
In BFD for OSPF, a BFD session is associated with OSPF. The BFD session quickly detects a link fault and then
notifies OSPF of the fault, which speeds up OSPF's response to network topology changes.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol convergence must be
as quick as possible to improve network availability. Link faults are inevitable, and therefore a solution must
be provided to quickly detect faults and notify routing protocols.
BFD for Open Shortest Path First (OSPF) associates BFD sessions with OSPF. After BFD for OSPF is
configured, BFD quickly detects link faults and notifies OSPF of the faults. BFD for OSPF accelerates OSPF
response to network topology changes.
Table 1 describes OSPF convergence speeds before and after BFD for OSPF is configured.
Table 1 OSPF convergence speeds before and after BFD for OSPF is configured
Principles
Figure 1 BFD for OSPF
Figure 1 shows a typical network topology with BFD for OSPF configured. The principles of BFD for OSPF are
described as follows:
3. The outbound interface on Device A connected to Device B is interface 1. If the link between Device A
and Device B fails, BFD detects the fault and then notifies Device A of the fault.
4. Device A processes the event that a neighbor relationship goes Down and recalculates routes. The new
route passes through Device C and reaches Device A, with interface 2 as the outbound interface.
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults between
forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two systems. The path
can be a physical link, a logical link, or a tunnel.
In BFD for OSPFv3, a BFD session is associated with OSPFv3. The BFD session quickly detects a link fault and
then notifies OSPFv3 of the fault, which speeds up OSPFv3's response to network topology changes.
Purpose
A link fault or a topology change causes devices to recalculate routes. Therefore, it is important to shorten
the convergence time of routing protocols to improve network performance.
As link faults are inevitable, rapidly detecting these faults and notifying routing protocols is an effective way
to quickly resolve such issues. If BFD is associated with the routing protocol and a link fault occurs, BFD can
speed up the convergence of the routing protocol.
Principles
Figure 1 BFD for OSPFv3
Figure 1 shows a typical network topology with BFD for OSPFv3 configured. The principles of BFD for
OSPFv3 are described as follows:
3. The outbound interface of the route from DeviceA to DeviceB is interface 1. If the link between DeviceA
and DeviceB fails, BFD detects the fault and notifies DeviceA of the fault.
4. DeviceA processes the neighbor Down event and recalculates the route. The new outbound interface
of the route is interface 2. Packets from DeviceA pass through DeviceC to reach DeviceB.
• Static BFD
In static BFD mode, BFD session parameters (including local and remote discriminators) are set using
commands, and requests must be delivered manually to establish BFD sessions.
• Dynamic BFD
In dynamic BFD mode, the establishment of BFD sessions is triggered by routing protocols.
BFD for IS-IS enables BFD sessions to be dynamically established. After detecting a fault, BFD notifies IS-IS of
the fault. IS-IS sets the neighbor status to Down, quickly updates link state protocol data units (LSPs), and
performs the partial route calculation (PRC). BFD for IS-IS implements fast IS-IS route convergence.
Instead of replacing the Hello mechanism of IS-IS, BFD works with IS-IS to rapidly detect the faults that occur on
neighboring devices or links.
■ Global BFD is enabled on each device, and BFD is enabled on a specified interface or process.
■ Neighbors are Up, and a designated intermediate system (DIS) has been elected on a broadcast
network.
■ P2P network
After the conditions for establishing BFD sessions are met, IS-IS instructs the BFD module to
establish a BFD session and negotiate BFD parameters between neighbors.
■ Broadcast network
After the conditions for establishing BFD sessions are met and the DIS is elected, IS-IS instructs BFD
to establish a BFD session and negotiate BFD parameters between the DIS and each device. No
BFD sessions are established between non-DISs.
On broadcast networks, devices (including non-DIS devices) of the same level on a network segment
can establish adjacencies. In BFD for IS-IS, however, BFD sessions are established only between the DIS
and non-DISs. On P2P networks, BFD sessions are directly established between neighbors.
If a Level-1-2 neighbor relationship is set up between the devices on both ends of a link, the following
situations occur:
■ On a broadcast network, IS-IS sets up a Level-1 BFD session and a Level-2 BFD session.
■ P2P network
If the neighbor relationship established between P2P IS-IS interfaces is not Up, IS-IS tears down the
BFD session.
■ Broadcast network
If the neighbor relationship established between broadcast IS-IS interfaces is not Up or the DIS is
reelected on the broadcast network, IS-IS tears down the BFD session.
If the configurations of dynamic BFD sessions are deleted or BFD for IS-IS is disabled from an interface,
all Up BFD sessions established between the interface and its neighbors are deleted. If the interface is a
DIS and the DIS is Up, all BFD sessions established between the interface and its neighbors are deleted.
If BFD is disabled from an IS-IS process, BFD sessions are deleted from the process.
BFD detects only the one-hop link between IS-IS neighbors because IS-IS establishes only one-hop neighbor
relationships.
Usage Scenario
Dynamic BFD needs to be configured based on the actual network. If the time parameters are not configured correctly,
network flapping may occur.
BFD for IS-IS speeds up route convergence through rapid link failure detection. The following is a networking
example for BFD for IS-IS.
If the link between Device A and Device B fails, BFD can rapidly detect the fault and report it to IS-IS. IS-IS
sets the neighbor status to Down to trigger an IS-IS topology calculation. IS-IS also updates LSPs so that
Device C can promptly receive the updated LSPs from Device B, which accelerates network topology
convergence.
Fundamentals
On the network shown in Figure 1, DeviceA and DeviceB belong to AS 100 and AS 200, respectively. An EBGP
connection is established between the two devices.
BFD is used to monitor the BGP peer relationship between DeviceA and DeviceB. If the link between them
becomes faulty, BFD can quickly detect the fault and notify BGP.
On the network shown in Figure 2, indirect multi-hop EBGP connections are established between DeviceA
and DeviceC and between DeviceB and DeviceD; a BFD session is established between DeviceA and DeviceC;
a BGP peer relationship is established between DeviceA and DeviceB; and the bandwidth between DeviceA
and DeviceB is low. If the original forwarding path DeviceA->DeviceC fails, traffic that is sent from DeviceE
to DeviceA is switched to the path DeviceA->DeviceB->DeviceD->DeviceC. Due to the low bandwidth on the
link between DeviceA and DeviceB, traffic loss may occur on this path.
BFD for BGP TTL check applies only to the scenario in which DeviceA and DeviceC are indirectly connected EBGP peers.
Figure 2 Network diagram of setting a TTL value for checking the BFD session with a BGP peer
To prevent this issue, you can set a TTL value on DeviceC for checking the BFD session with DeviceA. If the
number of forwarding hops of a BFD packet (TTL value in the packet) is smaller than the TTL value set on
DeviceC, the BFD packet is discarded, and BFD detects a session down event and notifies BGP. DeviceA then
sends BGP Update messages to DeviceE for route update so that the traffic forwarding path can change to
DeviceE->DeviceF->DeviceB->DeviceD->DeviceC. For example, the TTL value for checking the BFD session on
DeviceC is set to 254. If the link between DeviceA and DeviceC fails, traffic sent from DeviceE is forwarded
through the path DeviceA->DeviceB->DeviceD->DeviceC. In this case, the TTL value in a packet decreases to
252 when the packet reaches DeviceC. Since 252 is smaller than the configured TTL value 254, the BFD
packet is discarded, and BFD detects a session down event and notifies BGP. DeviceA then sends BGP Update
messages to DeviceE for route update so that the traffic forwarding path can change to DeviceE->DeviceF->
DeviceB->DeviceD->DeviceC.
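The TTL check in the example above can be sketched numerically as follows. This is an illustrative Python sketch (the function name is invented); it assumes the BFD packet leaves the sender with TTL 255 and each forwarding hop decrements the value by 1, so the receiver accepts the packet only if the received TTL is not smaller than its configured threshold.

```python
INITIAL_TTL = 255  # assumed initial TTL of the BFD packet

def bfd_ttl_check(hops: int, min_ttl: int) -> bool:
    # Each forwarding hop decrements the TTL; the receiving device
    # discards the packet if the received TTL is below the configured
    # threshold (254 on DeviceC in the example above).
    received_ttl = INITIAL_TTL - hops
    return received_ttl >= min_ttl

# Direct DeviceA->DeviceC path (1 hop): TTL arrives as 254, accepted.
assert bfd_ttl_check(hops=1, min_ttl=254)
# Detour DeviceA->DeviceB->DeviceD->DeviceC (3 hops): TTL arrives as 252,
# discarded, so BFD detects a session down event and notifies BGP.
assert not bfd_ttl_check(hops=3, min_ttl=254)
```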
Background
If a node or link along an LDP LSP that is transmitting traffic fails, traffic switches to a backup LSP. The path
switchover speed depends on the detection duration and traffic switchover duration. A delayed path
switchover causes traffic loss. LDP fast reroute (FRR) can be used to speed up the traffic switchover, but not
the detection process.
As shown in Figure 1, a local label switching router (LSR) periodically sends Hello messages to notify each
peer LSR of the local LSR's presence and establish a Hello adjacency with each peer LSR. The local LSR
constructs a Hello hold timer to maintain the Hello adjacency with each peer. Each time the local LSR
receives a Hello message, it updates the Hello hold timer. If the Hello hold timer expires before a Hello
message arrives, the LSR considers the Hello adjacency disconnected. The Hello mechanism cannot rapidly
detect link faults, especially when a Layer 2 device is deployed between the local LSR and its peer.
The rapid, light-load BFD mechanism is used to quickly detect faults and trigger a primary/backup LSP
switchover, which minimizes data loss and improves service reliability.
A BFD session that monitors LDP LSPs is negotiated in either static or dynamic mode:
• Static configuration: The negotiation of a BFD session is performed using the local and remote
discriminators that are manually configured for the BFD session to be established. On a local LSR, you
can bind an LSP with a specified next-hop IP address to a BFD session with a specified peer IP address.
• Dynamic establishment: The negotiation of a BFD session is performed using the BFD discriminator
type-length-value (TLV) in an LSP ping packet. You must specify a policy for establishing BFD sessions
on a local LSR. The LSR automatically establishes BFD sessions with its peers and binds the BFD sessions
to LSPs using either of the following policies:
■ Host address-based policy: The local LSR uses all host addresses to establish BFD sessions. You can
specify a next-hop IP address and an outbound interface name of LSPs and establish BFD sessions
to monitor the specified LSPs.
■ Forwarding equivalence class (FEC)-based policy: The local LSR uses host addresses listed in a
configured FEC list to automatically establish BFD sessions.
BFD uses the asynchronous mode to check LSP continuity. That is, the ingress and egress periodically send
BFD packets to each other. If one end does not receive BFD packets from the other end within a detection
period, BFD considers the LSP Down and sends an LSP Down message to the LSP management (LSPM)
module.
Although BFD for LDP is enabled on a proxy egress, a BFD session cannot be established for the reverse path of a proxy
egress LSP on the proxy egress.
• A BFD for LDP tunnel session is triggered using a host IP address, a FEC list, or an IP prefix list.
• No next-hop address or outbound interface name can be specified in any BFD session trigger policies.
Usage Scenarios
• BFD for LDP LSP can be used when primary and bypass LDP FRR LSPs are established.
• BFD for LDP Tunnel can be used when primary and bypass virtual private network (VPN) FRR LSPs are
established.
Benefits
BFD for LDP LSP provides a rapid, light-load fault detection mechanism for LDP LSPs, which improves
network reliability.
Benefits
No tunnel protection is provided in the NG-MVPN over P2MP TE function or VPLS over P2MP TE function. If
a tunnel fails, traffic can only be switched using route change-induced hard convergence, which results in
poor performance. This function provides dual-root 1+1 protection for the NG-MVPN over P2MP TE and
VPLS over P2MP TE functions. If a P2MP TE tunnel fails, BFD for P2MP TE rapidly detects the fault and
switches traffic, which improves fault convergence performance and reduces traffic loss.
Principles
Figure 1 BFD for P2MP TE principles
In Figure 1, BFD is enabled on the root PE1 and the backup root PE2. Leaf nodes UPE1 to UPE4 are enabled
to passively create BFD sessions. Both PE1 and PE2 send BFD packets to all leaf nodes along P2MP TE
tunnels. The leaf nodes receive the BFD packets transmitted only on the primary tunnel. If a leaf node
receives detection packets within a specified interval, the link between the root node and leaf node is
receives detection packets within a specified interval, the link between the root node and leaf node is
working properly. If a leaf node fails to receive BFD packets within a specified interval, the link between the
root node and leaf node fails. The leaf node then rapidly switches traffic to a protection tunnel, which
reduces traffic loss.
Figure 1 BFD
On the network shown in Figure 1, without BFD, if LSRE is faulty, LSRA and LSRF cannot immediately detect
the fault due to the existence of Layer 2 switches, and the Hello mechanism will be used for fault detection.
However, Hello mechanism-based fault detection is time-consuming.
To address these issues, BFD can be deployed. With BFD, if LSRE fails, LSRA and LSRF can detect the fault in
a short time, and traffic can be rapidly switched to the path LSRA -> LSRB -> LSRD -> LSRF.
BFD for TE can quickly detect faults on CR-LSPs. After detecting a fault on a CR-LSP, BFD immediately
notifies the forwarding plane of the fault to rapidly trigger a traffic switchover. BFD for TE is usually used
together with the hot-standby CR-LSP mechanism.
A BFD session is bound to a CR-LSP and established between the ingress and egress. A BFD packet is sent by
the ingress to the egress along the CR-LSP. Upon receipt, the egress responds to the BFD packet. The ingress
can rapidly monitor the link status of the CR-LSP based on whether a reply packet is received.
After detecting a link fault, BFD reports the fault to the forwarding module. The forwarding module searches
for a backup CR-LSP and switches service traffic to the backup CR-LSP. The forwarding module then reports
the fault to the control plane.
On the network shown in Figure 2, a BFD session is set up to detect faults on the link of the primary LSP. If a
fault occurs on this link, the BFD session on the ingress immediately notifies the forwarding plane of the
fault. The ingress switches traffic to the backup CR-LSP and sets up a new BFD session to detect faults on
the link of the backup CR-LSP.
On the network shown in Figure 3, a primary CR-LSP is established along the path LSRA -> LSRB, and a hot-
standby CR-LSP is configured. A BFD session is set up between LSRA and LSRB to detect faults on the link of
the primary CR-LSP. If a fault occurs on the link of the primary CR-LSP, the BFD session rapidly notifies LSRA
of the fault. After receiving the fault information, LSRA rapidly switches traffic to the hot-standby CR-LSP to
ensure traffic continuity.
Background
When a Layer 2 device is deployed on a link between two RSVP nodes, an RSVP node can only use the Hello
mechanism to detect a link fault. For example, on the network shown in Figure 1, a switch exists between P1
and P2. If a fault occurs on the link between the switch and P2, P1 keeps sending Hello packets and detects
the fault after it fails to receive replies to the Hello packets. The fault detection latency causes seconds of
traffic loss. To minimize packet loss, BFD for RSVP can be configured. BFD rapidly detects a fault and triggers
TE FRR switching, which improves network reliability.
Implementation
Usage Scenario
BFD for RSVP applies to a network on which a Layer 2 device exists between the TE FRR point of local repair
(PLR) on a bypass CR-LSP and an RSVP node on the primary CR-LSP.
Benefits
BFD for RSVP improves reliability on MPLS TE networks with Layer 2 devices.
Context
A VRRP group uses VRRP Advertisement packets to negotiate the master/backup VRRP status, implementing
device backup. If the link between devices in a VRRP group fails, VRRP Advertisement packets cannot be
exchanged to negotiate the master/backup status. A backup device attempts to preempt the master role
after a period that is three times the interval at which VRRP Advertisement packets are sent. During this
period, user traffic is still forwarded to the master device, which results in user traffic loss.
Bidirectional Forwarding Detection (BFD) is used to rapidly detect faults in links or IP routes. BFD for VRRP
enables a master/backup VRRP switchover to be completed within 1 second, thereby preventing traffic loss.
A BFD session is established between the master and backup devices in a VRRP group and is bound to the
VRRP group. BFD immediately detects communication faults in the VRRP group and instructs the VRRP
group to perform a master/backup switchover, minimizing service interruptions.
VRRP and BFD association modes
Association between VRRP and BFD can be implemented in the following modes. Table 1 lists their
differences.
• Association between a VRRP group and a common BFD session:
Usage scenario: A backup device monitors the status of a common BFD session.
Type of associated BFD session: Static BFD sessions or static BFD sessions with automatically negotiated
discriminators.
Impact: The VRRP group adjusts priorities based on the status of the monitored BFD session.
BFD support: VRRP-enabled devices must support BFD.
• Association between a VRRP group and link and peer BFD sessions:
Usage scenario: The master and backup devices monitor the link and peer BFD sessions simultaneously. A
peer BFD session is established between the master and backup devices. A link BFD session is established
between a downstream switch and each VRRP device. BFD helps determine whether the fault occurs
between the master device and the downstream switch or between the backup device and the downstream
switch.
Type of associated BFD session: Static BFD sessions or static BFD sessions with automatically negotiated
discriminators.
Impact: If the link or peer BFD session goes down, BFD notifies the VRRP group of the fault. After receiving
the notification, the VRRP group immediately performs a master/backup VRRP switchover.
BFD support: VRRP-enabled devices must support BFD.
Figure 1 Network diagram of associating a VRRP group with a common BFD session
• DeviceA (master) works in delayed preemption mode and its VRRP priority is 120.
• DeviceB works in immediate preemption mode and functions as the backup in the VRRP group with a
priority of 100.
• DeviceB in the VRRP group is configured to monitor a common BFD session. If BFD detects a fault and
the BFD session goes down, DeviceB increases its VRRP priority by 40.
1. Normally, DeviceA periodically sends VRRP Advertisement packets to notify DeviceB that it is working
properly. DeviceB monitors the status of DeviceA and the BFD session.
2. If BFD detects a fault, the BFD session goes down. DeviceB increases its VRRP priority to 140 (100 + 40
= 140), making it higher than DeviceA's VRRP priority. DeviceB then immediately preempts the master
role and sends gratuitous ARP packets to allow DeviceE to update address entries.
3. The BFD session goes up after the fault is rectified. In this case:
DeviceB restores its VRRP priority to 100 (140 – 40 = 100). DeviceB remains in the Master state and
continues to send VRRP Advertisement packets.
After receiving these packets, DeviceA checks that the VRRP priority carried in them is lower than the
local VRRP priority and preempts the master role after the specified VRRP status recovery delay
expires. DeviceA then sends VRRP Advertisement and gratuitous ARP packets.
After receiving a VRRP Advertisement packet that carries a priority higher than the local priority,
DeviceB enters the Backup state.
4. Both DeviceA and DeviceB are restored to their original states. As such, DeviceA forwards user-to-
network traffic again.
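The priority adjustment in this process can be sketched as follows. This is an illustrative Python sketch, not device code; the function name is invented, and the values follow the example above (master priority 120, backup base priority 100, adjustment of 40).

```python
def backup_priority(base: int, delta: int, bfd_session_up: bool) -> int:
    # Sketch of the rule above: while the monitored BFD session is down,
    # the backup raises its VRRP priority by the configured delta; when
    # the session recovers, the original priority is restored.
    return base if bfd_session_up else base + delta

MASTER_PRIORITY = 120

# BFD session down: 100 + 40 = 140 > 120, so DeviceB preempts the master role.
assert backup_priority(100, 40, bfd_session_up=False) > MASTER_PRIORITY
# BFD session up again: priority restored to 100, DeviceA can preempt back.
assert backup_priority(100, 40, bfd_session_up=True) == 100
```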
The preceding process shows that association between VRRP and BFD differs from VRRP. Specifically, after a
VRRP group is associated with a BFD session and a fault occurs, the backup device immediately preempts
the master role by increasing its VRRP priority, and it does not wait for a period three times the interval at
which VRRP Advertisement packets are sent. This means that a master/backup VRRP switchover can be
performed in milliseconds.
Association Between a VRRP Group and Link and Peer BFD Sessions
In Figure 2, the master and backup devices monitor the status of link and peer BFD sessions. The BFD
sessions help determine whether a link fault is a local or remote fault.
DeviceA and DeviceB run VRRP. A peer BFD session is established between DeviceA and DeviceB to detect
link and device faults. A link BFD session is established between DeviceA and DeviceE and between DeviceB
and DeviceE to detect link and device faults. After DeviceB detects that the peer BFD session goes down and
the link BFD session between DeviceE and DeviceB goes up, DeviceB switches to the Master state and
forwards user-to-network traffic.
Figure 2 Network diagram of associating a VRRP group with link and peer BFD sessions
• A peer BFD session is established between DeviceA and DeviceB to detect link and device faults between
them.
• Link 1 and link 2 BFD sessions are established between DeviceE and DeviceA and between DeviceE and
DeviceB, respectively.
1. Normally, DeviceA periodically sends VRRP Advertisement packets to inform DeviceB that it is working
properly and monitors the BFD session status. DeviceB monitors the status of DeviceA and the BFD
session.
2. The BFD session goes down if BFD detects either of the following faults:
• Link 1 or DeviceE fails. In this case, link 1 BFD session and the peer BFD session go down. Link 2
BFD session is up.
• DeviceA fails. In this case, link 1 BFD session and the peer BFD session go down. Link 2 BFD
session is up. DeviceB's VRRP state switches to Master.
3. After the fault is rectified, all the BFD sessions go up. If DeviceA works in preemption mode, DeviceA
and DeviceB are restored to their original VRRP states after VRRP negotiation is complete.
In normal cases, DeviceA's VRRP status is not impacted by a link 2 fault; instead, DeviceA continues to forward user-to-
network traffic. However, DeviceB's VRRP status switches to Master if both the peer BFD session and link 2 BFD session go
down, and DeviceB detects the peer BFD session down event before detecting the link 2 BFD session down event. After
DeviceB detects the link 2 BFD session down event, DeviceB's VRRP status switches to Initialize.
Figure 3 shows the state machine for association between a VRRP group and link and peer BFD sessions.
Figure 3 State machine for association between a VRRP group and link and peer BFD sessions
The preceding process shows that after link BFD for VRRP and peer BFD for VRRP are configured, the backup
device can immediately switch to the Master state if a fault occurs, without waiting for a period three times
the interval at which VRRP Advertisement packets are sent or changing its VRRP priority. This means that a
master/backup VRRP switchover can be performed in milliseconds.
Benefits
BFD for VRRP speeds up master/backup VRRP switchovers if faults occur.
Service Overview
Bidirectional Forwarding Detection (BFD) for pseudo wire (PW) monitors PW connectivity on a Layer 2
virtual private network (L2VPN) and informs the L2VPN of any detected faults. Upon receiving a fault
notification from BFD, the L2VPN performs a primary/secondary PW switchover to protect services.
BFD for PW has two modes: time to live (TTL) and non-TTL.
The two static BFD for PW modes are described as follows:
• Static BFD for PW in TTL mode: The TTL of BFD packets is automatically calculated or manually
configured. BFD packets are encapsulated with PW labels and transmitted over PWs. A PW can either
have the control word enabled or not. The usage scenarios of static BFD for PW in TTL mode are as
follows:
■ Static BFD for single-segment PW (SS-PW): Two BFD-enabled nodes negotiate a BFD session based
on the configured peer address and TTL (the TTL for SS-PWs is 1) and exchange BFD packets to
monitor PW connectivity.
■ Static BFD for multi-segment PW (MS-PW): The remote peer address of the MS-PW to be detected
must be specified. BFD packets can pass through multiple superstratum provider edge devices
(SPEs) to reach the destination, regardless of whether the control word is enabled for the PW.
• Static BFD for PW in non-TTL mode: The TTL of BFD packets is fixed at 255. BFD packets are
encapsulated with PW labels and transmitted over PWs. A PW must have the control word enabled and
differentiate control packets from data packets by checking whether these packets carry the control
word.
Networking Description
Figure 1 Service transmission over E2E PWs
Figure 1 shows an IP radio access network (RAN) that consists of the following device roles:
• Cell site gateway (CSG): CSGs form the access network. On the IP RAN, CSGs function as user-end
provider edge devices (UPEs) to provide access services for NodeBs.
• Aggregation site gateway (ASG): On the IP RAN, ASGs function as SPEs to provide access services for
UPEs.
• Radio service gateway (RSG): ASGs and RSGs form the aggregation network. On the IP RAN, RSGs
function as network provider edge devices (NPEs) to connect to the radio network controller (RNC).
The primary PW is along CSG1–ASG3–RSG5, and the secondary PW is along CSG1–CSG2–ASG4–RSG6. If the
primary PW fails, traffic switches to the secondary PW.
Feature Deployment
Configure static BFD for PW on the IP RAN as follows:
1. On CSG1, configure static BFD for the primary and secondary PWs.
When you configure static BFD for PW, note the following points:
• When you configure static BFD for the primary PW, ensure that the local discriminator on CSG1 is the remote
discriminator on RSG5 and that the remote discriminator on CSG1 is the local discriminator on RSG5.
• When you configure static BFD for the secondary PW, ensure that the local discriminator on CSG1 is the remote
discriminator on RSG6 and that the remote discriminator on CSG1 is the local discriminator on RSG6.
After you configure static BFD for PW on CSG1 and primary/secondary RSGs, services can quickly switch to
the secondary PW if the primary PW fails.
Service Overview
IP/MPLS backbone networks carry an increasing number of multicast services, such as IPTV, video
conferences, and massively multiplayer online role-playing games (MMORPGs), which all require bandwidth
assurance, QoS guarantee, and high network reliability. To provide better multicast services, the IETF
proposed the multicast VPLS solution. On a multicast VPLS network, the ingress transmits multicast traffic to
multiple egresses over a P2MP MPLS tunnel. This solution eliminates the need to deploy PIM and HVPLS on
the transit nodes, simplifying network deployment.
On a multicast VPLS network, multicast traffic can be carried over either P2MP TE tunnels or P2MP mLDP
tunnels. When P2MP TE tunnels are used, P2MP TE FRR must be deployed. If a link fault occurs, FRR allows
traffic to be rapidly switched to a normal link. If a node fails, however, traffic is not switched until the root
node detects the fault and recalculates links to set up a Source to Leaf (S2L) sub-LSP. Topology convergence
takes a long time in this situation, affecting service reliability.
To meet the reliability requirements of multicast services, configure BFD for multicast VPLS to monitor
multicast VPLS links. When a link or node fails, BFD on the leaf nodes can rapidly detect the fault and
trigger protection switching so that the leaf nodes receive traffic from the backup multicast tunnel.
Networking Description
Figure 1 BFD for multicast VPLS
Figure 1 shows a dual-root 1+1 protection scenario in which PE-AGG1 is the master root node and PE-AGG2
is the backup root node. Each root node sets up a complete MPLS multicast tree to the UPEs (leaf nodes).
The two MPLS multicast trees do not have overlapping paths. After multicast flows reach PE-AGG1 and PE-
AGG2, PE-AGG1 and PE-AGG2 send the multicast flows along their respective P2MP tunnels to UPEs. Each
UPE receives two copies of multicast flows and selects one to send to users.
1. An IGP runs between the UPEs, SPEs, and PE-AGGs to implement Layer 3 reachability.
2. Each PE-AGG sets up a P2P tunnel (a TE tunnel or LDP LSP) to each UPE. VPLS PWs are set up using
BGP-AD. In addition, BGP-AD is used to set up P2MP LSPs from PE-AGG1 and PE-AGG2 to the UPEs.
VPLS PWs recurse to the P2MP LSPs.
3. A protection group is configured on each UPE for P2MP tunnels so that each UPE can select one from
the two copies of multicast flows it receives.
4. BFD for multicast VPLS is deployed for P2MP tunnels to implement protection switching when BFD
detects a fault. On the PE-AGGs, BFD is configured to track the upstream AC interfaces. If the AC
between NPE1 and PE-AGG1 fails, the UPEs receive multicast flows from NPE2.
1. A root node triggers the establishment of a BFD session of the MultiPointHead type. Once established,
the BFD session is initially Up and requires no negotiation. BFD triggers the root node to periodically
send LSP ping packets along the P2MP tunnels and to send BFD detection packets at a configured BFD
detection interval.
2. A leaf node receives LSP ping packets and triggers the establishment of a BFD session of the
MultiPointTail type. Once established, the BFD session is initially Down. After the leaf node receives
BFD detection packets indicating that the BFD session on the root node is Up, the leaf node changes
its BFD session to the Up state and starts BFD detection.
BFD for multicast VPLS sessions support only one-way detection. The BFD session of the MultiPointHead type on
a root node only sends packets, whereas the BFD session of the MultiPointTail type on a leaf node only receives
packets.
On the network shown in Figure 1, if link 1 (an AC) fails, BFD on the master root node detects that the AC
interface is Down and stops sending BFD detection packets. The leaf nodes cannot receive BFD detection
packets, and therefore report the Down event, which triggers protection switching. The leaf nodes then
receive multicast flows from the backup multicast tunnel. Similarly, if node 2, link 3, node 4, or link 5 fails,
the leaf nodes also receive multicast flows from the backup multicast tunnel. After the fault is rectified, BFD
sessions are reestablished. The leaf nodes then receive multicast flows from the master multicast tunnel
again.
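The protection switching behavior on a leaf node can be summarized in a short sketch (illustrative names; not the actual UPE implementation):

```python
def leaf_traffic_source(master_bfd_up: bool) -> str:
    """A leaf node (UPE) takes traffic from the master multicast tunnel
    while its BFD session stays Up, and switches to the backup multicast
    tunnel when a Down event is reported (for example, when BFD
    detection packets stop arriving from the master root node)."""
    return "master_tunnel" if master_bfd_up else "backup_tunnel"
```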
• Hardware detection: For example, Synchronous Digital Hierarchy (SDH) alarms are generated if link
faults are detected. Hardware detection detects faults rapidly; however, it is not applicable to all
media.
• Slow Hello mechanism: This usually refers to the Hello mechanism offered by a routing protocol, which
takes seconds to detect a fault. In high-speed data transmission, for example, at gigabit
rates, a detection time longer than 1 second causes the loss of a large amount of data. For delay-sensitive
services such as voice, a delay longer than 1 second is also unacceptable.
• Other detection mechanisms: Different protocols or device vendors may provide dedicated detection
mechanisms. However, these detection mechanisms are difficult to deploy when systems are
interconnected.
Bidirectional Forwarding Detection (BFD) provides unified detection for all media and protocol layers on the
entire network within milliseconds. Two systems set up a BFD session and periodically send BFD control
packets along the path between them. If one system does not receive BFD control packets within a detection
period, the system considers that a fault has occurred on the path.
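The timeout behavior described above follows standard BFD semantics (RFC 5880): a session is declared down when no control packet arrives within the detection time, that is, the negotiated receive interval multiplied by the detect multiplier. A minimal sketch (parameter names are illustrative):

```python
def bfd_detection_time_ms(rx_interval_ms: int, detect_multiplier: int) -> int:
    """Detection time = negotiated receive interval x detect multiplier."""
    return rx_interval_ms * detect_multiplier

def session_timed_out(last_rx_ms: int, now_ms: int,
                      rx_interval_ms: int, detect_multiplier: int) -> bool:
    """True if no BFD control packet arrived within the detection time,
    in which case the path is considered faulty."""
    return (now_ms - last_rx_ms) > bfd_detection_time_ms(rx_interval_ms,
                                                         detect_multiplier)
```

With a 10 ms interval and a multiplier of 3, a fault is declared after roughly 30 ms, which is what enables millisecond-level detection.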
In multicast applications, if the current designated router (DR) on a shared network segment is faulty, other
PIM neighbors trigger a new round of DR election only after the neighbor relationship times out. As a result,
multicast data transmission is interrupted. The interruption time (usually in seconds) is not shorter than the
timeout time of the neighbor relationship.
BFD for PIM can detect a link's status on a shared network segment within milliseconds and respond quickly
to a fault on a PIM neighbor. If the interface configured with BFD for PIM does not receive any BFD packets
from the current DR within a configured detection period, the interface considers that a fault has occurred
on the DR. The BFD module notifies the route management (RM) module of the session status, and the RM
module notifies the PIM module. Then, the PIM module triggers a new round of DR election immediately
rather than waiting for the neighbor relationship to time out. This shortens the multicast data transmission
interruption period and improves the reliability of multicast data transmission.
Currently, BFD for PIM can be used on IPv4 and IPv6 PIM-SM/SSM networks.
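For reference, the DR election that BFD for PIM retriggers follows the standard PIM-SM rules (RFC 7761): the neighbor with the highest DR priority wins, and ties are broken by the highest IP address. A sketch (the neighbor representation is illustrative):

```python
def elect_dr(neighbors):
    """Elect the DR from (dr_priority, ip_address_as_int) tuples:
    the highest priority wins; the highest IP address breaks ties."""
    return max(neighbors, key=lambda n: (n[0], n[1]))
```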
In Figure 1, on the shared network segment connected to user hosts, a PIM BFD session is set up between
the downstream interface (Port 2) of DeviceB and the downstream interface (Port 1) of DeviceC. Both ends
of the link send BFD packets to detect the link status.
The downstream interface (Port 2) of DeviceB functions as the DR and is responsible for forwarding
multicast data to the receiver. If Port 2 fails, BFD immediately notifies the RM module of the session status,
and the RM module then notifies the PIM module. The PIM module triggers a new round of DR election. The
downstream interface (Port 1) of DeviceC is then elected as the new DR and immediately forwards multicast data to the
receiver. This shortens the multicast data transmission interruption period.
Figure 1 BFD for EVPN VPWS over SRv6 dual-homing active-active networking
On the network shown in Figure 1, CE2 is dual-homed to PE2 and PE3. CE1-to-CE2 traffic is transmitted in
load-balancing mode.
After BFD for EVPN VPWS is configured in the EVPL instances on PE1, PE2, and PE3, BFD sessions are
generated as follows:
• Bidirectional BFD sessions are generated between PE1 and PE2, and between PE1 and PE3.
On the network shown in Figure 2, CE2 is dual-homed to PE2 and PE3. CE1-to-CE2 traffic is transmitted in
active/standby mode.
After BFD for EVPN VPWS is configured in the EVPL instances on PE1, PE2, and PE3, BFD sessions are
generated as follows:
• A bidirectional BFD session is generated between PE1 and PE2 (active PE).
• No bidirectional BFD session is generated between PE1 and PE3 (standby PE).
SBFD Principles
Figure 1 shows SBFD principles. Before link detection, an initiator and a reflector exchange SBFD control
packets to notify each other of SBFD parameters, for example, discriminators. During link detection, the
initiator proactively sends an SBFD Echo packet, and the reflector loops this packet back. The initiator then
determines the local state based on the looped-back packet.
• The initiator is responsible for detection and runs both an SBFD state machine and a detection
mechanism. Because the state machine has only Up and Down states, the initiator can send packets
carrying only the Up or Down state and receive packets carrying only the Up or AdminDown state.
The initiator starts by sending an SBFD packet carrying the down state to the reflector. The destination
and source port numbers of the packet are 7784 and 4784, respectively; the destination IP address is a
user-configured address on the 127 network segment; the source IP address is the locally configured
LSR ID.
• The reflector does not have any SBFD state machine or detection mechanism. For this reason, it does
not proactively send SBFD Echo packets, but rather, it only reflects SBFD packets.
After receiving an SBFD packet from the initiator, the reflector checks whether the SBFD discriminator
carried in the packet matches the locally configured global SBFD discriminator. If they do not match,
the packet is discarded. If they match and the reflector is in the working state, the reflector reflects back
the packet. If they match but the reflector is not in the working state, the reflector sets the state to
AdminDown in the packet.
The destination and source port numbers in the looped-back packet are 4784 and 7784, respectively;
the source IP address is the locally configured LSR ID; the destination IP address is the source IP address
of the initiator.
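The addressing and port rules above can be illustrated with a simplified packet representation (plain dictionaries, not the actual wire format; function names are assumptions):

```python
def build_sbfd_packet(local_lsr_id: str, target_addr: str) -> dict:
    """Initiator-side SBFD packet: destination port 7784, source port 4784,
    destination IP on the 127.0.0.0/8 segment, source IP = local LSR ID."""
    assert target_addr.startswith("127.")
    return {"src_ip": local_lsr_id, "dst_ip": target_addr,
            "src_port": 4784, "dst_port": 7784, "state": "Down"}

def reflect(packet: dict, reflector_lsr_id: str, working: bool) -> dict:
    """Looped-back packet: the ports are swapped, the source IP becomes
    the reflector's LSR ID, and the destination IP is the initiator's
    source IP. A non-working reflector sets the state to AdminDown."""
    return {"src_ip": reflector_lsr_id, "dst_ip": packet["src_ip"],
            "src_port": 7784, "dst_port": 4784,
            "state": "Up" if working else "AdminDown"}
```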
• The SBFD packet sent by the initiator carries the binding SID of the SR-MPLS TE tunnel on the reflector.
If the SR-MPLS TE tunnel has primary and backup LSPs, the SBFD packet also carries the
Primary/Backup LSP flag.
• When constructing a loopback packet, the reflector adds the binding SID carried in the SBFD packet
sent by the initiator to the loopback SBFD Echo packet. In addition, depending on the Primary/Backup
LSP flag carried in the SBFD packet, the reflector determines whether to steer the loopback SBFD Echo
packet to the primary or backup LSP of the SR-MPLS TE tunnel. This ensures that the SBFD session
status reflects the actual link status. In real-world deployment, make sure that the forward and reverse
tunnels share the same LSP.
In an inter-AS SR-MPLS TE tunnel scenario, if SBFD return packets are forwarded over the IP route by
default, the inter-AS IP route may be unreachable, causing SBFD to go down. In this case, you can configure
the SBFD return packets to be forwarded over the SR-MPLS TE tunnel.
• Initial state: The initiator sets the initial state to Down in an SBFD packet to be sent to the reflector.
• Status migration: After receiving a looped packet carrying the Up state, the initiator sets the local status
to Up. After the initiator receives a looped packet carrying the Admin Down state, the initiator sets the
local status to Down. If the initiator does not receive a packet looped by the reflector before the timer
expires, the initiator also sets the local status to Down.
• Status holding: When the initiator is in the Up state and receives a looped packet carrying the Up state,
the initiator keeps the local state Up. When the initiator is in the Down state and receives a looped
packet carrying the Admin Down state or receives no packet before the timer expires, the initiator
keeps the local state Down.
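The three rules above amount to a two-state machine driven by three events, which can be sketched as follows (event names are illustrative):

```python
def next_initiator_state(current: str, event: str) -> str:
    """SBFD initiator state transitions. Events: 'rx_up' (looped packet
    carrying Up), 'rx_admin_down' (looped packet carrying Admin Down),
    and 'timeout' (no looped packet before the timer expires)."""
    if event == "rx_up":
        return "Up"
    if event in ("rx_admin_down", "timeout"):
        return "Down"
    return current
```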
After SBFD is configured, PE1 rapidly detects a failure and switches traffic to a backup SR-MPLS TE LSP once
a link or P on the primary LSP fails.
SBFD for SR-MPLS TE LSP determines whether a primary/backup LSP switchover needs to be performed,
whereas SBFD for SR-MPLS TE tunnel checks the actual tunnel status.
• If SBFD for SR-MPLS TE tunnel is not configured, the tunnel status remains Up by default, and the
actual status cannot be determined.
• If SBFD for SR-MPLS TE tunnel is configured but SBFD is administratively down, the tunnel interface
status is unknown because SBFD is not working in this case.
• If SBFD for SR-MPLS TE tunnel is configured and SBFD is not administratively down, the tunnel interface
status is the same as the SBFD status.
1. After SBFD for SR-MPLS TE Policy is enabled on the headend, the endpoint uses the endpoint address
(IPv4 address only) as the remote discriminator of the SBFD session corresponding to the segment list
in the SR-MPLS TE Policy by default. If multiple segment lists exist in the SR-MPLS TE Policy, the
remote discriminators of the corresponding SBFD sessions are the same.
2. The headend sends an SBFD packet encapsulated with a label stack corresponding to the SR-MPLS TE
Policy.
3. After the endpoint device receives the SBFD packet, it returns a reply through the shortest IP link.
4. If the headend receives the reply, it considers that the corresponding segment list in the SR-MPLS TE
Policy is normal. Otherwise, it considers that the segment list is faulty. If all the segment lists
referenced by a candidate path are faulty, SBFD triggers a candidate path switchover.
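Per step 4, a candidate-path switchover is triggered only when every segment list referenced by the candidate path is faulty, which can be expressed as (a sketch with illustrative names):

```python
def candidate_path_faulty(segment_list_sbfd_up: list) -> bool:
    """A candidate path is considered faulty, triggering a switchover,
    only when the SBFD sessions of all its segment lists are down."""
    return not any(segment_list_sbfd_up)
```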
SBFD return packets are forwarded over IP. If the primary paths of multiple SR-MPLS TE Policies between
two nodes differ due to different path constraints but SBFD return packets are transmitted over the same
path, a fault in the return path may cause all involved SBFD sessions to go down. As a result, all the SR-
MPLS TE Policies between the two nodes go down. The SBFD sessions of multiple segment lists in the same
SR-MPLS TE Policy also have this problem.
By default, if HSB protection is not enabled for an SR-MPLS TE Policy, SBFD detects all the segment lists only
in the candidate path with the highest preference in the SR-MPLS TE Policy. With HSB protection enabled,
SBFD can detect all the segment lists of candidate paths with the highest and second highest priorities in the
SR-MPLS TE Policy. If all the segment lists of the candidate path with the highest preference are faulty, a
switchover to the HSB path is triggered.
SBFD Implementation
Figure 1 shows how SBFD is implemented through the communication between an initiator and a reflector.
Before link detection, the initiator and reflector exchange SBFD control packets to advertise information,
such as the SBFD discriminator. In link detection, the initiator proactively sends an SBFD packet, and the
reflector reflects back this packet. The initiator determines the local state based on the reflected packet.
• The initiator is responsible for detection and runs an SBFD state machine and a detection mechanism.
Because the state machine has only Up and Down states, the initiator can send packets carrying only
the Up or Down state and receive packets carrying only the Up or Admin Down state.
The initiator first sends an SBFD packet with the initial state of Down and the destination port number
7784 to the reflector.
• The reflector does not have any SBFD state machine or detection mechanism. For this reason, it does
not proactively send SBFD Echo packets, but rather, it only reflects SBFD packets.
After receiving an SBFD packet from the initiator, the reflector checks whether the SBFD discriminator
carried in the packet matches the locally configured global SBFD discriminator. If they do not match,
the packet is discarded. If they match and the reflector is in the working state, the reflector reflects back
the packet. If they match but the reflector is not in the working state, the reflector sets the state to
Admin Down in the packet.
• Initial state: The initiator first sends an SBFD packet in which the initial state is Down to the reflector.
• State transition: After receiving a reflected packet carrying the Up state, the initiator sets the local state
to Up. If the initiator receives a reflected packet carrying the Admin Down state, it sets the local state to
Down. The initiator also sets the local state to Down if it does not receive any reflected packet before
the timer expires.
• State retention: If the initiator is in the Up state and receives a reflected packet carrying the Up state,
the local state remains Up. However, if the initiator is in the Down state and receives a reflected packet
carrying the Admin Down state or does not receive any packet before the timer expires, the local state
remains Down.
1. After SBFD for SRv6 TE Policy is enabled on the headend, the mapping between the destination IPv6
address and discriminator is configured on the headend. If multiple segment lists exist in the SRv6 TE
Policy, the remote discriminators of the corresponding SBFD sessions are the same.
2. The headend sends an SBFD packet encapsulated with a SID stack corresponding to the SRv6 TE
Policy.
3. After the endpoint device receives the SBFD packet, it returns an SBFD reply through the shortest IPv6
link.
4. If the headend receives the SBFD reply, it considers that the corresponding segment list in the SRv6 TE
Policy is normal. Otherwise, it considers that the segment list is faulty. If all the segment lists
referenced by a candidate path are faulty, SBFD triggers a candidate path switchover.
SBFD return packets are forwarded over IPv6. If the primary paths of multiple SRv6 TE Policies between two
nodes differ due to different path constraints but SBFD return packets are transmitted over the same path, a
fault in the return path may cause all involved SBFD sessions to go Down. As a result, all the SRv6 TE
Policies between the two nodes go Down. The SBFD sessions of multiple segment lists in the same SRv6 TE
Policy also have this problem.
By default, if HSB protection is not enabled for an SRv6 TE Policy, SBFD detects all the segment lists only in
the candidate path with the highest preference in the SRv6 TE Policy. With HSB protection enabled, SBFD
can detect all the segment lists referenced by candidate paths with the highest and second highest priorities
in the SRv6 TE Policy. If all the segment lists referenced by the candidate path with the highest preference
are faulty, a switchover to the HSB path is triggered.
1. Create two bidirectional co-routed SRv6 TE Policies between the headend and endpoint, ensuring that
the forward and reverse segment lists of one SRv6 TE Policy share the same path as those of the other
SRv6 TE Policy.
2. Specify a binding SID (BSID) and a reverse BSID for the bidirectional co-routed segment lists, with the
BSID of one segment list being the same as the reverse BSID of the other segment list.
Figure 4 Key configurations that enable the forwarding of SBFD return packets over a segment list
After the forwarding of SBFD return packets over a segment list is enabled and an SBFD session is initiated
for a specific segment list, the reverse BSID of this segment list is sent to the peer device through BFD
packets. The peer device then finds the corresponding segment list according to the received BSID and
forwards return packets over this segment list.
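The peer's lookup can be sketched as a BSID-to-segment-list table query (illustrative data structures, not the real forwarding table):

```python
def return_segment_list(bsid_table: dict, received_reverse_bsid: int):
    """Map the reverse BSID carried in the BFD packet to the co-routed
    reverse segment list used to forward the return packet; None means
    no matching segment list is configured on the peer."""
    return bsid_table.get(received_reverse_bsid)
```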
Return packets can be encapsulated in Insert or Encaps mode, depending on the encapsulation mode
configured on device D for the SRv6 TE Policy.
Figure 5 shows the detection process when SBFD return packets forwarded over a segment list are
encapsulated in Insert mode.
Figure 5 Detection process when SBFD return packets forwarded over a segment list are encapsulated in Insert
mode
Figure 6 shows the detection process when SBFD return packets forwarded over a segment list are
encapsulated in Encaps mode.
Figure 6 Detection process when SBFD return packets forwarded over a segment list are encapsulated in Encaps
mode
1. After SBFD for SRv6 TE Policy is enabled on the headend, SBFD packets are forwarded based on the
network slice.
2. After receiving an SBFD packet, the endpoint device sends an SBFD reply packet.
3. If the headend receives the SBFD reply packet, it considers the SRv6 TE Policy normal. Otherwise, the
headend considers the SRv6 TE Policy faulty and sets SBFD to Down.
In this scenario, you must ensure that network slicing is deployed for the SRv6 TE Policy in E2E mode and
that the primary path is working properly. Otherwise, SBFD fails.
1. After U-BFD is configured on the headend of the specified SRv6 TE Policy, the headend constructs a
special BFD packet in which both the source and destination IP addresses in the IP header are the local
IP address (that is, the TE IPv6 Router-ID) and both the local and remote discriminators are the same.
2. The headend encapsulates a SID stack corresponding to the SRv6 TE Policy into the BFD packet. This
transforms the packet into an SRv6 one.
3. After receiving the SRv6 packet through the SRv6 TE Policy, the endpoint processes the packet,
searches for a route according to the destination IPv6 address in the BFD packet, and then loops back
the BFD packet to the headend.
4. If the headend receives a U-BFD reply, it considers that the corresponding segment list in the SRv6 TE
Policy is normal. Otherwise, it considers that the segment list fails. If all the segment lists of a
candidate path fail, U-BFD triggers a switchover to the backup candidate path.
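The "special" packet in step 1 can be illustrated as follows (a simplified representation, not the real encapsulation):

```python
def build_ubfd_packet(te_ipv6_router_id: str, discriminator: int) -> dict:
    """U-BFD packet built by the headend: both IP addresses are the local
    TE IPv6 router ID, and the local and remote discriminators are equal,
    so the endpoint can route the packet straight back to the headend."""
    return {"src_ip": te_ipv6_router_id, "dst_ip": te_ipv6_router_id,
            "my_discriminator": discriminator,
            "your_discriminator": discriminator}
```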
U-BFD return packets are forwarded over IPv6 routes. In cases where the primary paths of multiple SRv6 TE
Policies between two nodes differ due to different path constraints, if U-BFD return packets are transmitted
over the same path and this path fails, all the involved U-BFD sessions may go Down. As a result, all the
SRv6 TE Policies between the two nodes go Down. This problem also applies to U-BFD sessions of multiple
segment lists in the same SRv6 TE Policy.
By default, if HSB is not enabled for an SRv6 TE Policy, U-BFD detects all the segment lists of only the
candidate path with the highest preference in the SRv6 TE Policy. Conversely, when HSB is enabled, U-BFD
can detect all the segment lists of the candidate paths with the highest and second highest preferences in
the SRv6 TE Policy. If the segment lists of the candidate path with the highest preference all fail, a
switchover to the HSB path is triggered.
solution is provided to allow U-BFD return packets to be forwarded over a segment list.
Figure 2 shows the key configurations that enable the forwarding of U-BFD return packets over a segment
list. The corresponding configuration requirements are as follows:
1. Create two bidirectional co-routed SRv6 TE Policies between the headend and endpoint, ensuring that
the forward and reverse segment lists of one SRv6 TE Policy share the same path as those of the other
SRv6 TE Policy.
2. Specify a binding SID (BSID) and a reverse BSID for the bidirectional co-routed segment lists, with the
BSID of one segment list being the same as the reverse BSID of the other segment list.
Figure 2 Key configurations that enable the forwarding of U-BFD return packets over a segment list
After the forwarding of U-BFD return packets over a segment list is enabled and a U-BFD session is initiated
for a specific segment list, the reverse BSID of this segment list is sent to the peer device through BFD
packets. The peer device then finds the corresponding segment list according to the received BSID and
forwards return packets over this segment list.
Return packets can also be encapsulated in Insert or Encaps mode, depending on the encapsulation mode
configured on device D for the SRv6 TE Policy.
Figure 3 shows the detection process when U-BFD return packets forwarded over a segment list are
encapsulated in Insert mode.
Figure 3 Detection process when U-BFD return packets forwarded over a segment list are encapsulated in Insert
mode
Figure 4 shows the detection process when U-BFD return packets forwarded over a segment list are
encapsulated in Encaps mode.
Figure 4 Detection process when U-BFD return packets forwarded over a segment list are encapsulated in Encaps
mode
1. After U-BFD for SRv6 TE Policy is enabled on the headend, U-BFD packets are forwarded based on the
network slice.
2. After receiving a U-BFD packet, the endpoint device sends a U-BFD loopback packet.
3. If the headend receives the U-BFD loopback packet, it considers the SRv6 TE Policy normal. Otherwise,
it considers the SRv6 TE Policy faulty and sets U-BFD to Down.
In this scenario, you must ensure that network slicing is deployed for the SRv6 TE Policy in E2E mode and
that the primary path is working properly. Otherwise, U-BFD fails.
In an SRv6 TE Policy network slicing scenario, U-BFD return packets can also be forwarded over a segment
list. The detection process is as follows:
1. After U-BFD for SRv6 TE Policy is enabled on the headend, U-BFD packets are forwarded based on the
network slice.
2. After receiving a U-BFD packet, the endpoint device sends a U-BFD loopback packet based on the
configured reverse SRv6 TE Policy.
3. If the headend receives the U-BFD loopback packet, it considers the bidirectional SRv6 TE Policies
normal. Otherwise, it considers the bidirectional SRv6 TE Policies faulty and sets U-BFD to Down.
In this scenario, both the forward and reverse SRv6 TE Policies must meet the following requirements.
Otherwise, U-BFD fails.
• For the forward SRv6 TE Policy, ensure that network slicing is deployed for this policy in E2E mode and
that the primary path is working properly.
• For the reverse SRv6 TE Policy, if the encapsulation mode is Insert, ensure that network slicing is
deployed for this policy in E2E mode and that the primary path is working properly. If the encapsulation
mode is Encaps, you only need to ensure that the primary path is working properly.
Definition
As a key technology used on scalable next-generation networks, Multiprotocol Label Switching (MPLS)
provides multiple services with quality of service (QoS) guarantee. MPLS, however, introduces an additional
network layer in which faults can occur. Therefore, MPLS networks must provide their own operation,
administration and maintenance (OAM) capabilities.
OAM is an important means to reduce network maintenance costs. The MPLS OAM mechanism manages
operation and maintenance of MPLS networks.
For details about the MPLS OAM background, see ITU-T Recommendation Y.1710. For details about the
MPLS OAM implementation mechanism, see ITU-T Recommendation Y.1711.
Purpose
Server-layer protocols, such as Synchronous Optical Network (SONET)/Synchronous Digital Hierarchy
(SDH), are below the MPLS layer; client-layer protocols, such as IP, FR, and ATM, are above the MPLS layer.
These protocols have their own OAM mechanisms, but failures in the MPLS network cannot be completely
rectified through the OAM mechanisms of other layers. In addition, the layered network architecture
requires MPLS to have an independent OAM mechanism to reduce inter-layer dependency.
The MPLS OAM mechanism can detect, identify, and locate a defect at the MPLS layer effectively. Then, the
MPLS OAM mechanism reports and handles the defect. In addition, if a failure occurs, the MPLS OAM
mechanism triggers protection switching.
MPLS offers an OAM mechanism that is completely independent of any upper or lower layer. The following
OAM feature is enabled on the MPLS user plane:
• Performing a traffic switchover if a fault occurs so that services meet service level agreements (SLAs).
Benefit
• MPLS OAM can rapidly detect link faults or monitor link connectivity, which helps measure
network performance and minimize OPEX.
• If a link fault occurs, MPLS OAM rapidly switches traffic to the standby link to restore services, which
shortens the defect duration and improves network reliability.
1. The ingress sends a connectivity verification (CV) or fast failure detection (FFD) packet along an LSP
to be monitored. The packet passes through the LSP and arrives at the egress.
2. The egress compares the packet type, frequency, and trail termination source identifier (TTSI) in a
received packet with the locally configured values to verify the packet. In addition, the egress collects
the numbers of correct and incorrect packets within a detection interval.
3. If the egress detects an LSP defect, it analyzes the defect type and sends a backward defect indication
(BDI) packet carrying defect information to the ingress along a reverse tunnel. The ingress can then
obtain the defect. If a protection group is correctly configured, the ingress switches traffic to a backup
LSP.
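The egress-side behavior in steps 2 and 3 can be sketched in code. This is an illustrative model only, not an NE40E interface: the packet fields, class names, and the majority-based defect rule are assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass
class OamPacket:
    kind: str          # "CV" or "FFD"
    frequency_ms: int  # interval at which the ingress sends packets
    ttsi: str          # trail termination source identifier

@dataclass
class EgressMonitor:
    """Verifies received CV/FFD packets against locally configured values."""
    expected_kind: str
    expected_frequency_ms: int
    expected_ttsi: str
    good: int = 0
    bad: int = 0

    def verify(self, pkt: OamPacket) -> bool:
        """Compare packet type, frequency, and TTSI; count the result."""
        ok = (pkt.kind == self.expected_kind
              and pkt.frequency_ms == self.expected_frequency_ms
              and pkt.ttsi == self.expected_ttsi)
        if ok:
            self.good += 1
        else:
            self.bad += 1
        return ok

    def defect_detected(self) -> bool:
        """Illustrative rule: declare a defect when at least half of the
        packets seen in the detection interval were incorrect."""
        total = self.good + self.bad
        return total > 0 and self.bad / total >= 0.5
```

On a defect, a real egress would then analyze the defect type and send a BDI packet toward the ingress over the reverse tunnel, as described above.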
Reverse Tunnel
A reverse tunnel is bound to an LSP that is monitored using MPLS OAM. The reverse tunnel can transmit BDI
packets to notify the ingress of an LSP defect.
A reverse tunnel and the LSP to which the reverse tunnel is bound must have the same endpoints.
The reverse tunnel transmitting BDI packets can be either of the following types:
• If OAM is enabled on the ingress of an LSP later than that on the egress or if OAM is enabled on the
egress but disabled on the ingress, the egress generates a loss of connectivity verification defect
(dLOCV) alarm.
• Before the OAM detection packet type or the interval at which detection packets are sent are changed,
OAM must be disabled on the ingress and egress.
• OAM parameters (such as a detection packet type and an interval at which detection packets are sent)
must be set on both the ingress and egress, which may cause parameter inconsistency.
The NE40E implements the OAM auto protocol to resolve these drawbacks.
The OAM auto protocol is configured on the egress. With this protocol, the egress can automatically start
OAM functions after receiving the first OAM packet. In addition, the egress can dynamically stop running the
OAM state machine after receiving an FDI packet sent by the ingress.
Background
The Multiprotocol Label Switching (MPLS) operation, administration and maintenance (OAM) mechanism
effectively detects and locates MPLS link faults. If a fault occurs, the MPLS OAM mechanism also triggers a
protection switchover.
Related Concepts
• MPLS OAM packets
Table 1 describes MPLS OAM packets.
Connectivity verification (CV) packet: Continuity check packet sent by a local MEP to detect exceptions. If
the local MEP detects an exception, it sends an alarm to its client-layer MEP. For example, if a CV-enabled
device receives a packet on an incorrect LSP, the device reports an alarm indicating a forwarding error to
the client-layer MEP.
Fast failure detection (FFD) packet: Continuity check packet sent by a maintenance association end point
(MEP) to rapidly detect an LSP fault. If the MEP detects a fault, it sends an alarm to the client layer.
NOTE:
FFD and CV packets contain the same information and provide the same function. They are processed in
the same way, but FFD packets are processed more quickly than CV packets.
FFD and CV cannot be started simultaneously.
Backward defect indication (BDI) packet: Sent by the egress to notify the ingress of an LSP defect.
• Channel defects
Table 2 describes channel defects that MPLS OAM can detect.
• Reverse tunnel
A reverse tunnel is bound to an LSP that is monitored using MPLS OAM. The reverse tunnel can
transmit BDI packets to notify the ingress of an LSP defect. A reverse tunnel and the LSP to which the
reverse tunnel is bound must have the same endpoints, and they transmit traffic in opposite directions.
The reverse tunnels transmitting BDI packets include private or shared LSPs. Table 3 lists the two types
of reverse tunnel.
Private reverse LSP: Bound to only one LSP. The binding between the private reverse LSP and its forward
LSP is stable but may waste LSP resources.
Shared reverse LSP: Bound to multiple LSPs. A TTSI carried in a BDI packet identifies the specific forward
LSP bound to the reverse LSP. The binding between a shared reverse LSP and multiple forward LSPs
minimizes LSP resource waste. However, if defects occur on multiple LSPs bound to the shared reverse LSP,
the reverse LSP may be congested with traffic.
Implementation
MPLS OAM periodically sends CV or FFD packets to monitor TE LSPs, PWs, or ring networks.
Figure 1 illustrates a network on which MPLS OAM monitors TE LSP connectivity. The process of using
MPLS OAM to monitor TE LSP connectivity is as follows:
1. The ingress sends a CV or FFD packet along a TE LSP to be monitored. The packet passes through
the TE LSP and arrives at the egress.
2. The egress compares the packet type, frequency, and TTSI in the received packet with the locally
configured values to verify the packet. In addition, the egress collects the number of correct and
incorrect packets within a detection interval.
3. If the egress detects an LSP defect, the egress analyzes the defect type and sends a BDI packet
carrying defect information to the ingress along a reverse tunnel. The ingress can then be notified
of the defect. If a protection group is configured, the ingress switches traffic to a backup LSP.
1. A PW is established between PE1 and PE2, OAM parameters are set on both PEs, and both PEs
are enabled to send and receive OAM packets. OAM monitors the PW between PE1 and PE2 and
obtains PW information.
2. If OAM detects a defect, PE2 sends a BDI packet to PE1 over a reverse tunnel.
3. The PEs notify CE1 and CE2 of the fault so that the CEs can use the information to maintain their networks.
• A dLOCV defect occurs if OAM is enabled on the ingress of an LSP later than on the egress, or if OAM is
enabled on the egress but disabled on the ingress.
• A dLOCV defect also occurs when OAM is disabled: OAM must be disabled on both the ingress and egress
before the OAM detection packet type or the interval at which detection packets are sent can be
changed.
• OAM parameters, including the detection packet type and the interval at which detection packets are
sent, must be set on both the ingress and egress, which is likely to cause parameter inconsistency.
The OAM auto protocol enabled on the egress provides the following functions:
• Triggers OAM
■ If the sink node is not configured with OAM CC parameters (including the detection packet type
and the interval at which packets are sent), then upon receipt of the first CV or FFD packet, the sink
node automatically records the packet type and sending interval and uses these parameters in the
CC detection that starts.
■ If the OAM function-enabled sink node does not receive CV or FFD packets within a specified
period of time, the sink node generates a BDI packet and notifies the NMS of the BDI defect.
• Dynamically stops the OAM state machine. If the detection packet type or the interval at which detection
packets are sent is to be changed on the source node, or if an OAM function is to be disabled on the
source node, the source node sends an FDI packet to instruct the sink node to stop the OAM state
machine.
Figure 1 illustrates an IP RAN in the Layer 2 to edge scenario. The MPLS OAM implementation is as follows:
• The BTS, NodeB, BSC, and RNC can be directly connected to an MPLS network.
• A TE tunnel between PE1 and PE4 is established. PWs are established over the TE tunnel to transmit
various services.
• MPLS OAM is enabled on PE1 and PE4, and OAM parameters are configured on both ends of the PW.
These PEs are enabled to send and receive OAM detection packets, which allows OAM to monitor the
PW between PE1 and PE4 and obtain basic PW information. If OAM detects a defect, PE4 sends a BDI
packet to PE1 over a reverse tunnel. The PEs notify the user-side BTS, NodeB, RNC, and BSC of fault
information so that the user-side devices can use the information to maintain their networks.
The working principles of PE2 and PE3 are the same as those of PE1.
Service Overview
The operation and maintenance of virtual leased line (VLL) and virtual private LAN service (VPLS) services
require an operation, administration and maintenance (OAM) mechanism. MPLS OAM provides a
mechanism to rapidly detect and locate faults, which facilitates network operation and maintenance and
reduces network maintenance costs.
Networking Description
As shown in Figure 1, a user-end provider edge (UPE) on the access network is dual-homed to SPE1 and
SPE2 on the aggregation network. A VLL supporting access links of various types is deployed on the access
network. A VPLS is deployed on the aggregation network to form a point-to-multipoint leased line network.
Additionally, Fast Protection Switching (FPS) is configured on the UPE; MPLS tunnel automatic protection
switching (APS) is configured on SPE1 and SPE2 to protect the links between the virtual switching instances
(VSIs) created on the two superstratum provider edges (SPEs).
Feature Deployment
To deploy MPLS OAM to monitor link connectivity of VLL and VPLS pseudo wires (PWs), configure
maintenance entity groups (MEGs) and maintenance entities (MEs) on the UPE, SPE1, and SPE2 and then
enable one or more of the continuity check (CC), loss measurement (LM), and delay measurement (DM)
functions. The UPE monitors link connectivity and performance of the primary and secondary PWs.
• When SPE1 detects a link fault on the primary PW, SPE1 sends a backward defect indication (BDI) packet
to the UPE, instructing the UPE to switch traffic from the primary PW to the secondary PW. Meanwhile,
the UPE sends a MAC Withdraw packet, in which the value of the PE-ID field is SPE1's ID, to SPE2. After
receiving the MAC Withdraw packet, SPE2 transparently forwards the packet to the NPE and the NPE
deletes the MAC address it has learned from SPE1. After that, the NPE learns a new MAC address from
the secondary PW.
• After the primary PW recovers, the UPE switches traffic from the secondary PW back to the primary PW.
Meanwhile, the UPE sends a MAC Withdraw packet, in which the value of the PE-ID field is SPE2's ID, to
SPE1. After receiving the MAC Withdraw packet, SPE1 transparently forwards the packet to the NPE and
the NPE deletes the MAC address it has learned from SPE2. After that, the NPE learns a new MAC
address from the new primary PW.
Terms
Item Definition
reverse: The direction opposite to the direction in which traffic flows along the monitored service link.
forward: The direction in which traffic flows along the monitored service link.
path merge LSR: An LSR that receives the traffic transmitted on the protection path in MPLS OAM
protection switching. If the path merge LSR is not the traffic destination, it merges the traffic transmitted
on the protection path onto the working path. If the path merge LSR is the traffic destination, it sends the
traffic to the upper-layer protocol for handling.
path switch LSR: An LSR that switches or replicates traffic between the primary service link and the bypass
service link.
user plane: A set of traffic forwarding components through which a traffic flow passes. An OAM CV or FFD
packet is periodically inserted into this traffic flow to monitor the forwarding component status. In IETF
drafts, the user plane is also called the data plane.
ingress: The LSR from which the forward LSP originates and at which the reverse LSP terminates.
egress: The LSR at which the forward LSP terminates and from which the reverse LSP originates.
CV Connectivity Verification
DM Delay Measurement
LM Loss Measurement
SD Signal Deterioration
SF Signal Failure
Definition
Multiprotocol Label Switching Transport Profile (MPLS-TP) is a transport technique that integrates
MPLS packet switching with traditional transport network features. MPLS-TP networks are poised to replace
traditional transport networks in the future. MPLS-TP Operation, Administration, and Maintenance (MPLS-TP
OAM) works on the MPLS-TP client layer. It can effectively detect, identify, and locate faults in the client
layer and quickly switch traffic when links or nodes become defective. OAM is an important part of any plan
to reduce network maintenance expenditures.
Purpose
Both networks and services are part of an ongoing process of transformation and integration. New services
like triple play services, Next Generation Network (NGN) services, carrier Ethernet services, and Fiber-to-the-
x (FTTx) services are constantly emerging from this process. Such services demand more investment and
have higher OAM costs. They require state-of-the-art QoS, full service access, and high levels of expansibility,
reliability, and manageability of transport networks. Traditional transport network technologies such as
Multi-Service Transfer Platform (MSTP), Synchronous Digital Hierarchy (SDH), or Wavelength Division
Multiplexing (WDM) cannot meet these requirements because they lack a control plane. Unlike traditional
technologies, MPLS-TP does meet these requirements because it can be used on next-generation transport
networks that can process data packets, as well as on traditional transport networks.
Because traditional transport networks have high reliability and maintenance benchmarks, MPLS-TP must
provide powerful OAM capabilities. MPLS-TP OAM provides the following functions:
• Fault management
• Performance monitoring
Benefits
• MPLS-TP OAM can rapidly detect link faults or monitor the connectivity of links, which helps measure
network performance and minimizes OPEX.
• If a link fault occurs, MPLS-TP OAM rapidly switches traffic to the standby link to restore services, which
shortens the defect duration and improves network reliability.
• ME
An ME maintains a relationship between two MEPs. On a bidirectional label switched path (LSP) that
has two MEs, MPLS-TP OAM detection can be performed on the MEs without affecting each other. One
ME can be nested within another ME but cannot overlap with another ME.
• MEG
A maintenance entity group (MEG) comprises one or more MEs that are created for a transport link. If
the transport link is a point-to-point bidirectional path, such as a bidirectional co-routed LSP or pseudo
wire (PW), a MEG comprises only one ME.
• MEP
A MEP is the source or sink node in a MEG. Figure 2 shows ME node deployment.
■ For a bidirectional LSP, only the ingress label edge router (LER) and egress LER can function as
MEPs, as shown in Figure 2.
■ For a PW, only user-end provider edges (UPEs) can function as MEPs.
MEPs trigger and control MPLS-TP OAM operations. OAM packets can be generated or terminated on
MEPs.
Fault Management
Table 1 lists the MPLS-TP OAM fault management functions supported by the NE40E.
Performance Monitoring
Table 2 lists the MPLS-TP OAM performance monitoring functions supported by the NE40E.
Loss measurement (LM): Collects statistics about lost frames. LM includes dual-ended measurement (using
CCMs) and single-ended measurement (using LMMs and LMRs).
Delay measurement (DM): Collects statistics about delays and delay variations (jitter). DM includes the
following functions:
One-way frame delay measurement
Two-way frame delay measurement
Maintenance entity (ME): All MPLS-TP OAM functions are performed on MEs. Each ME consists of two MEG
end points (MEPs). Section layer: each pair of adjacent LSRs forms an ME.
MEG end point (MEP): A MEP is the source or sink node in a MEG. Section layer: each LSR can function as a
MEP. LSP layer: only an LER can function as a MEP; LSRs A, D, E, and G are LERs functioning as MEPs. PW
layer: only PW terminating provider edge (T-PE) LSRs can function as MEPs; LSRs A and G are T-PEs
functioning as MEPs.
MEG intermediate point (MIP): An intermediate node between two MEPs on both ends of a MEG. MIPs
only respond to OAM packets sent by MEPs and do not initiate OAM packet transmission. Section layer: no
MIPs. LSP layer: LSRs B, C, and F function as MIPs. PW layer:
Usage Scenario
MPLS-TP OAM monitors the following types of links:
CC
CC is a proactive OAM operation. It detects loss of continuity (LOC) faults between any two MEPs in a MEG.
A MEP sends continuity check messages (CCMs) to a remote MEP (RMEP) at specified intervals. If the RMEP
does not receive a CCM within a period 3.5 times the specified interval, it considers the connection between
the two MEPs faulty. The RMEP then reports an alarm, enters the Down state, and triggers automatic
protection switching (APS) on both MEPs. After receiving a CCM from the MEP again, the RMEP clears the
alarm and exits the Down state.
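The timeout rule above can be sketched as a small predicate. This is an illustrative model under assumed names; a real implementation would drive it from a timer wheel per RMEP.

```python
def loc_fault(last_ccm_time: float, now: float, interval: float) -> bool:
    """Return True if the loss-of-continuity condition holds: no CCM has
    arrived within 3.5 times the configured transmission interval."""
    return (now - last_ccm_time) > 3.5 * interval

# Example: with a 1-second CCM interval, the connection is declared faulty
# only once more than 3.5 seconds have elapsed since the last CCM.
```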
CV
CV is also a proactive OAM operation. It enables a MEP to report alarms when unexpected or error packets
are received. For example, if a CV-enabled MEP receives a packet from an LSP and finds that this packet has
been transmitted in error along an LSP, the MEP will report an alarm indicating a forwarding error.
• Near-end packet loss value: the number of lost packets among the packets expected to arrive at the local MEP.
• Far-end packet loss value: the number of lost packets among the packets the local MEP has sent.
To collect packet loss statistics for both incoming and outgoing packets, each MEP maintains two local
counters, TxFCl (the count of packets sent to the RMEP) and RxFCl (the count of packets received from the
RMEP), and exchanges CCMs carrying the following values:
• TxFCf: the local TxFCl value recorded when the local MEP sent a CCM.
• RxFCb: the local RxFCl value recorded when the local MEP received a CCM.
• TxFCb: the TxFCf value carried in a received CCM. This TxFCb value is the local TxFCl when the local
MEP receives a CCM.
After receiving CCMs carrying packet count information, both MEPs use the following formulas to measure
near- and far-end packet loss values:
Near-end packet loss value = |TxFCf[tc] - TxFCf[tp]| - |RxFCl[tc] - RxFCl[tp]|
Far-end packet loss value = |TxFCb[tc] - TxFCb[tp]| - |RxFCb[tc] - RxFCb[tp]|
• TxFCf[tc], RxFCb[tc], and TxFCb[tc] are the TxFCf, RxFCb, and TxFCb values, respectively, which are
carried in the most recently received CCM. RxFCl[tc] is the local RxFCl value recorded when the local
MEP received the CCM.
• TxFCf[tp], RxFCb[tp], and TxFCb[tp] are the TxFCf, RxFCb, and TxFCb values, respectively, which are
carried in the previously received CCM. RxFCl[tp] is the local RxFCl value recorded when the local MEP
received the previous CCM.
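The dual-ended computation can be sketched as follows, under the Y.1731-style convention that each counter is differenced against its own value from the previous CCM. Field and function names are illustrative, not an NE40E API.

```python
from typing import NamedTuple

class CcmSnapshot(NamedTuple):
    """Counter values associated with one received CCM."""
    tx_fcf: int  # TxFCf carried in the CCM
    rx_fcb: int  # RxFCb carried in the CCM
    tx_fcb: int  # TxFCb carried in the CCM
    rx_fcl: int  # local RxFCl recorded when the CCM was received

def near_end_loss(tc: CcmSnapshot, tp: CcmSnapshot) -> int:
    """Packets lost among those expected to arrive at the local MEP."""
    return abs(tc.tx_fcf - tp.tx_fcf) - abs(tc.rx_fcl - tp.rx_fcl)

def far_end_loss(tc: CcmSnapshot, tp: CcmSnapshot) -> int:
    """Packets lost among those the local MEP has sent."""
    return abs(tc.tx_fcb - tp.tx_fcb) - abs(tc.rx_fcb - tp.rx_fcb)

# Example: between two CCMs the peer sent 100 packets of which 98 arrived
# (near-end loss 2), and of 80 packets we sent only 60 were counted as
# received by the peer (far-end loss 20).
previous = CcmSnapshot(tx_fcf=100, rx_fcb=100, tx_fcb=100, rx_fcl=100)
current = CcmSnapshot(tx_fcf=200, rx_fcb=160, tx_fcb=180, rx_fcl=198)
```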
In single-ended loss measurement mode, a local MEP periodically sends loss measurement messages
(LMMs) carrying the following information to an RMEP:
• TxFCf: the local TxFCl value recorded when the LMM was sent.
After receiving an LMM, the RMEP responds to the local MEP with loss measurement replies (LMRs) carrying
the following information:
• RxFCf: the local RxFCl value recorded when the LMM was received.
• TxFCb: the local TxFCl value recorded when the LMR was sent.
After receiving an LMR, the local MEP uses the following formulas to calculate near- and far-end packet loss
values:
Near-end packet loss value = |TxFCb[tc] - TxFCb[tp]| - |RxFCl[tc] - RxFCl[tp]|
Far-end packet loss value = |TxFCf[tc] - TxFCf[tp]| - |RxFCf[tc] - RxFCf[tp]|
• TxFCf[tc], RxFCf[tc], and TxFCb[tc] are the TxFCf, RxFCf, and TxFCb values, respectively, which are
carried in the most recently received LMR. RxFCl[tc] is the local RxFCl value recorded when the most
recent LMR arrives at the local MEP.
• TxFCf[tp], RxFCf[tp], and TxFCb[tp] are the TxFCf, RxFCf, and TxFCb values, respectively, which are
carried in the previously received LMR. RxFCl[tp] is the local RxFCl value recorded when the previous
LMR arrived at the local MEP.
The link delay time can be measured using either one- or two-way frame delay measurement. Table 1
describes these frame delay measurement functions.
One-way frame delay measurement: Measures the network delay time on a unidirectional link between
MEPs. It can be used only on a unidirectional link, and the MEP and its RMEP on both ends of the link must
have synchronous time.
Two-way frame delay measurement: Measures the network delay time on a bidirectional link between
MEPs. It can be used on a bidirectional link between a local MEP and its RMEP, and the local MEP does not
need to synchronize its time with the RMEP.
After the RMEP receives a 1DM (one-way delay measurement message), it subtracts the TxTimeStampf
value (the time the 1DM was sent) from the RxTimef value (the time the 1DM was received) to calculate the
delay time:
Frame delay time = RxTimef - TxTimeStampf
The frame delay value can be used to measure the delay variation, which is the absolute difference between
two delay time values.
One-way frame delay measurement can only be performed when the two MEPs on both ends of a link have
synchronous time. If these MEPs have asynchronous time, they can only measure the delay variation.
Two-way frame delay measurement is performed by E2E MEPs. A MEP periodically sends a DMM carrying
TxTimeStampf (the time when the DMM was sent). After receiving the DMM, the RMEP responds with a
delay measurement reply (DMR). This message carries RxTimeStampf (the time when the DMM was
received) and TxTimeStampb (the time when the DMR was sent). The value in every field of the DMM is
copied exactly to the DMR, with the exception that the source and destination MAC addresses are
interchanged.
Upon receipt of the DMR, the local MEP calculates the two-way frame delay time using the following
formula:
Frame delay = RxTimeb (the time the DMR was received) - TxTimeStampf
To obtain a more accurate result, RxTimeStampf and TxTimeStampb are used. RxTimeStampf indicates the
time a DMM is received, and TxTimeStampb indicates the time a DMR is sent. After the local MEP receives
the DMR, it calculates the frame delay time using the following formula:
Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)
Two-way frame delay measurement supports both delay and delay variation measurement even if the
MEPs do not have synchronous time. The frame delay time is the round-trip delay time. If both MEPs have
synchronous time, the one-way delay in each direction can also be calculated using the following formulas:
Forward frame delay = RxTimeStampf - TxTimeStampf
Backward frame delay = RxTimeb - TxTimeStampb
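The two-way delay formulas above can be checked with a small sketch (timestamps in seconds; function names are illustrative assumptions):

```python
def two_way_delay(rx_time_b: float, tx_stamp_f: float) -> float:
    """Round-trip delay, including the RMEP's processing time:
    Frame delay = RxTimeb - TxTimeStampf"""
    return rx_time_b - tx_stamp_f

def two_way_delay_precise(rx_time_b: float, tx_stamp_f: float,
                          tx_stamp_b: float, rx_stamp_f: float) -> float:
    """Round-trip delay with the RMEP's processing time removed:
    Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)"""
    return (rx_time_b - tx_stamp_f) - (tx_stamp_b - rx_stamp_f)

# Example: DMM sent at t=9.0 s, received by the RMEP at t=9.4 s; DMR sent
# back at t=9.7 s and received at t=10.0 s. The raw round trip is 1.0 s;
# removing the 0.3 s the RMEP spent processing leaves 0.7 s on the wire.
```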
• After a local MEP detects a link fault using the continuity check (CC) function, the local MEP sets the
RDI flag to 1 in CCMs and sends the CCMs along a reverse path to notify its RMEP of the fault.
• After the fault is rectified, the local MEP sets the RDI flag to 0 in CCMs and sends them to inform the
RMEP that the fault is rectified.
• The RDI function is associated with the proactive continuity check function and takes effect only after the
continuity check function is enabled.
• The RDI function applies only to bidirectional links. In the case of a unidirectional LSP, before RDI can be used, a
reverse path must be bound to the LSP.
5.5.2.6 Loopback
Background
On a Multiprotocol Label Switching Transport Profile (MPLS-TP) network, a virtual circuit may traverse
multiple switching devices (nodes), including maintenance association end points (MEPs) and maintenance
association intermediate points (MIPs). A fault on any node or link in a virtual circuit can render the entire
virtual circuit unavailable, and such a fault is difficult to locate. Loopback (LB) can be configured on a source
device (MEP) to detect or locate faults on links between the MEP and a MIP or between MEPs.
Related Concepts
LB and continuity check (CC) are both connectivity monitoring tools on an MPLS-TP network. Table 1
describes differences between CC and LB.
Implementation
The loopback function monitors the connectivity of bidirectional links between a MEP and a MIP and
between MEPs.
1. The source MEP sends a loopback message (LBM) to a destination. If a MIP is used as the destination,
the TTL in the LBM must equal the number of hops from the source to the destination. If a MEP is used
as the destination, the TTL must be greater than or equal to the number of hops to the destination. The
TTL setting prevents the LBM from being discarded before reaching the destination.
2. After the destination receives the LBM, it checks whether the target MIP ID or MEP ID matches the
local MIP ID or MEP ID. If they do not match, the destination discards the LBM. If they match, the
destination responds with a loopback reply (LBR).
3. If the source MEP receives the LBR within a specified period of time, it considers the destination
reachable and the loopback test successful. If the source MEP does not receive the LBR after the
specified period of time elapses, it records a loopback test timeout and log information that is used to
analyze the connectivity failure.
Figure 1 illustrates a loopback test. LSRA initiates a loopback test to LSRC on an LSP. The loopback test
process is as follows:
1. LSRA sends LSRC an LBM carrying a specified TTL and a MIP ID. LSRB transparently transmits the LBM
to LSRC.
2. Upon receipt, LSRC finds that the TTL carried in the LBM has expired and checks whether the
target MIP ID carried in the LBM matches the local MIP ID. If they do not match, LSRC discards the
LBM. If they match, LSRC responds with an LBR.
3. If LSRA receives the LBR within a specified period of time, it considers LSRC reachable. If LSRA fails to
receive the LBR after a specified period of time elapses, LSRA considers LSRC unreachable and records
log information that is used to analyze the connectivity failure.
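The TTL and ID-matching rules in the loopback test above can be sketched as follows. Node and ID names are hypothetical; this only illustrates the decision logic, not an NE40E implementation.

```python
from typing import Optional

def lbm_reaches_target(ttl: int, hops_to_dest: int, dest_is_mep: bool) -> bool:
    """Check the TTL requirement for an LBM to be processed at its destination."""
    if dest_is_mep:
        # A MEP terminates the LBM, so the TTL only needs to be large enough.
        return ttl >= hops_to_dest
    # A MIP processes the LBM only when the TTL expires exactly at that hop.
    return ttl == hops_to_dest

def respond(target_id: str, local_id: str) -> Optional[str]:
    """A destination replies with an LBR only if the target ID matches its own;
    otherwise it discards the LBM."""
    return "LBR" if target_id == local_id else None
```

For instance, in the LSRA-to-LSRC test above, LSRA would set TTL = 2 (two hops, with LSRB transparently forwarding) and a target ID matching LSRC's MIP ID.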
In Figure 1, in the Layer 2 to edge scenario on an IP RAN, mature PWE3 techniques are used to carry services.
The process of transmitting services between a BTS/NodeB and an RNC/BSC is as follows:
• The BTS, NodeB, BSC, and RNC can be directly connected to an MPLS-TP network.
• A TE tunnel between PE1 and PE4 is established. PWs are established over the TE tunnel to transmit
various services.
• MPLS-TP OAM is enabled on PE1 and PE4, and OAM parameters are configured on both ends of the PW.
These PEs are enabled to send and receive OAM detection packets, which allows OAM to monitor the
PW between PE1 and PE4 and obtain basic PW information. If OAM detects a defect, PE4 sends an RDI
packet to PE1 over a reverse tunnel. The PEs notify the user-side BTS, NodeB, RNC, and BSC of fault
information so that the user-side devices can use the information to maintain their networks.
Service Overview
The operation and maintenance of virtual leased line (VLL) and virtual private LAN service (VPLS) services
require an operation, administration and maintenance (OAM) mechanism. MultiProtocol Label Switching
Transport Profile (MPLS-TP) OAM provides a mechanism to rapidly detect and locate faults, which facilitates
network operation and maintenance and reduces the network maintenance costs.
Networking Description
As shown in Figure 1, a user-end provider edge (UPE) on the access network is dual-homed to SPE1 and
SPE2 on the aggregation network. A VLL supporting access links of various types is deployed on the access
network. A VPLS is deployed on the aggregation network to form a point-to-multipoint leased line network.
Additionally, Fast Protection Switching (FPS) is configured on the UPE; MPLS tunnel automatic protection
switching (APS) is configured on SPE1 and SPE2 to protect the links between the virtual switching instances
(VSIs) created on the two superstratum provider edges (SPEs).
Feature Deployment
To deploy MPLS-TP OAM to monitor link connectivity of VLL and VPLS pseudo wires (PWs), configure
maintenance entity groups (MEGs) and maintenance entities (MEs) on the UPE, SPE1, and SPE2 and then
enable one or both of the continuity check (CC) and loopback (LB) functions. The UPE monitors link
connectivity and performance of the primary and secondary PWs.
• When SPE1 detects a link fault on the primary PW, SPE1 sends a Remote Defect Indication (RDI) packet
to the UPE, instructing the UPE to switch traffic from the primary PW to the secondary PW. Meanwhile,
the UPE sends a MAC Withdraw packet, in which the value of the PE-ID field is SPE1's ID, to SPE2. After
receiving the MAC Withdraw packet, SPE2 transparently forwards the packet to the NPE and the NPE
deletes the MAC address it has learned from SPE1. After that, the NPE learns a new MAC address from
the secondary PW.
• After the primary PW recovers, the UPE switches traffic from the secondary PW back to the primary PW.
Meanwhile, the UPE sends a MAC Withdraw packet, in which the value of the PE-ID field is SPE2's ID, to
SPE1. After receiving the MAC Withdraw packet, SPE1 transparently forwards the packet to the NPE and
the NPE deletes the MAC address it has learned from SPE2. After that, the NPE learns a new MAC
address from the new primary PW.
Terms
None
Abbreviations
CC Continuity Check
CV Connectivity Verification
DM Delay Measurement
LB Loopback
LM Loss Measurement
LT Linktrace
PW pseudowire
SPE Superstratum PE
TST Test
UPE Underlayer PE
Definition
The Virtual Router Redundancy Protocol (VRRP) is a standard-defined fault-tolerant protocol that groups
several physical routing devices into a virtual one. If a physical routing device (master) that serves as the
next hop of hosts fails, the virtual device switches traffic to a different physical routing device (backup),
thereby ensuring service continuity and reliability.
VRRP allows logical and physical devices to work separately and implements route selection among multiple
egress gateways.
On the network shown in Figure 1, a VRRP group is configured on two Routers, one of which serves as the
master, and the other as the backup. The two devices form a virtual Router that is assigned a virtual IP
address and a virtual MAC address. Hosts are only aware of this virtual Router, as opposed to the master
and backup Routers, and they use it to communicate with devices on different network segments.
A virtual Router consists of a master Router and one or more backup Routers. Only the master Router
forwards packets. If the master Router fails, a backup Router is elected as the new master Router through
VRRP negotiation and takes over traffic.
On a multicast or broadcast LAN such as an Ethernet network, a logical VRRP gateway ensures reliability for
key links. VRRP is highly reliable and prevents service interruption if a physical VRRP-enabled gateway fails.
VRRP configuration is simple and takes effect without modifying configurations such as routing protocols.
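The master/backup election described above can be sketched as a simple rule: the router with the highest VRRP priority becomes the master, and (per standard VRRP, RFC 5798) a priority tie is broken by the highest interface IP address. Router names and values here are illustrative, not NE40E configuration.

```python
import ipaddress

def elect_master(routers):
    """routers: dict mapping router name -> (priority, interface IP string).

    Returns the name of the router elected master: highest priority wins,
    and a tie is broken by the numerically highest IP address.
    """
    return max(
        routers,
        key=lambda name: (routers[name][0],
                          ipaddress.ip_address(routers[name][1])),
    )

# Example: RouterA (priority 120) beats RouterB (priority 100) and becomes
# the master; RouterB stays in the backup state.
group = {"RouterA": (120, "10.1.1.1"), "RouterB": (100, "10.1.1.2")}
```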
Purpose
As networks rapidly develop and applications diversify, various value-added services (VASs), such as Internet
Protocol television (IPTV) and video conferencing, are being widely deployed. As a result, network reliability
is required to ensure uninterrupted service transmission for users.
Hosts are usually connected to an external network through a default gateway. If the default gateway fails,
communication between the hosts and external network is interrupted. System reliability can be improved
using dynamic routing protocols (such as RIP and OSPF) or ICMP Router Discovery Protocol (IRDP).
However, this method requires complex configurations and each host must support dynamic routing
protocols.
VRRP provides a better option, which involves grouping multiple routing devices into a virtual router without
changing existing networking. The IP address of the virtual router is configured as the default gateway
address. If a gateway fails, VRRP selects a different gateway to forward traffic, thereby ensuring reliable
communication.
Hosts on a local area network (LAN) are usually connected to an external network through a default
gateway. When the hosts send packets destined for addresses not within the local network segment, these
packets follow a default route to an egress gateway (PE in Figure 2). Subsequently, PE forwards packets to
the external network to enable the hosts to communicate with the external network.
If PE fails, the hosts connected to it will not be able to communicate with the external network, causing
service interruptions. This communication failure persists even if an additional Router is added to the LAN.
The reasons for this are that most hosts on a LAN allow only one default gateway to be configured, and that hosts send packets destined for addresses beyond the local network segment only through the default gateway, even if they are connected to multiple Routers.
One common method of improving system reliability is by configuring multiple egress gateways. However,
this works only if hosts support route selection among multiple egress gateways. Another method involves
deploying a dynamic routing protocol, such as Routing Information Protocol (RIP), or Open Shortest Path
First (OSPF), as well as Internet Control Message Protocol (ICMP). However, it is difficult to run a dynamic
routing protocol on every host due to possible management or security issues, as well as the fact that a
host's operating system may not support the dynamic routing protocol.
VRRP resolves this issue. VRRP is configured only on involved Routers to implement gateway backup, without
any networking changes or burden on hosts.
Benefits
Benefits to carriers:
• Simplified network management: On a multicast or broadcast LAN such as an Ethernet network, VRRP
provides a highly reliable default link that is applicable even if a device fails. Furthermore, it prevents
network interruptions caused by single link faults without changing configurations, such as those of
dynamic routing and route discovery protocols.
• Strong adaptability: VRRP Advertisement packets are encapsulated into IP packets, supporting various
upper-layer protocols.
Benefits to users:
• Simple configuration: Users only need to specify a gateway address, without the need to configure
complex routing protocols on their hosts.
• Improved user experience: Users are unaware of single points of failure on gateways, and their hosts can communicate with external networks without interruption.
Implementation
In this document, if a VRRP function supports both IPv4 and IPv6, the implementation of this VRRP function is the same
for IPv4 and IPv6 unless otherwise specified.
VRRP works in two modes: master/backup mode and load balancing mode. In master/backup mode:
• A single VRRP group is configured and consists of a master device and several backup devices.
• The Router with the highest priority functions as the master device and forwards service traffic.
• Other Routers function as backup devices and monitor the master device's status. If the master device
fails, a backup device with the highest priority preempts the master role and takes over service traffic
forwarding.
In load balancing mode, multiple VRRP groups are configured, and each Router serves as the master of one group and a backup of another. For example:
■ PE1 functions as the master device in VRRP group 1 and the backup device in VRRP group 2.
■ PE2 functions as the master device in VRRP group 2 and the backup device in VRRP group 1.
■ In normal circumstances, different Routers process different user groups' traffic to implement load
balancing.
VRRP load balancing can be implemented in two modes. For details, see VRRP Fundamentals in HUAWEI NE40E-M2
series Universal Service Router Feature Description - Network Reliability.
• Virtual router: also referred to as a VRRP group, consists of a master device and one or more backup
devices. A virtual router is a default gateway used by hosts within a shared LAN, and is identified by a
virtual router ID and one or more virtual IP addresses.
■ VRID: virtual router ID. A group of devices with the same VRID form a virtual router.
■ Virtual IP address: IP address of a virtual router. A virtual router can have one or more virtual IP
addresses, which are manually assigned.
■ Virtual MAC address: MAC address that is generated by the virtual router based on the VRID. A
virtual router has one virtual MAC address, in the format of 00-00-5E-00-01-{VRID} (VRRP for IPv4)
or 00-00-5E-00-02-{VRID} (VRRP for IPv6). A virtual router uses the virtual MAC address instead of
the actual interface MAC address to respond to ARP (VRRP for IPv4) or NS (VRRP for IPv6)
requests.
• IP address owner: A VRRP device is considered an IP address owner if it uses the virtual IP address as a
real interface address. If an IP address owner is available, it usually functions as the master in a VRRP
group.
• Primary IP address: an IP address (usually the first configured one) selected from the set of real
interface IP addresses. The primary IP address is used as the source IP address in a VRRP Advertisement
packet.
• VRRP router: a device running VRRP. It can belong to one or more virtual routers.
■ Virtual router backup: a group of VRRP devices that do not forward packets. Instead, they can be
elected as the new master if the current master fails.
• Priority: priority of a router in a VRRP group. A VRRP group elects the master and backup devices based
on priorities.
■ Preemption mode: In this mode, a backup device preempts the master role if it has a higher priority
than that of the current master.
■ Non-preemption mode: In this mode, a backup device does not preempt the master role even if it
has a higher priority than that of the current master, provided that the current master is working
properly.
• VRRP timers:
■ Adver_Interval timer: The master sends a VRRP Advertisement packet each time the Adver_Interval
timer expires. The default timer value is 1 second.
■ Master_Down timer: A backup device preempts the master role after the Master_Down timer
expires. The Master_Down timer value is calculated using the following formula: Master_Down
timer value = (3 x Adver_Interval timer value) + Skew_Time, where Skew_Time = (256 -
Priority)/256
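As a sketch, the virtual MAC format and the timer formulas above can be expressed as follows (Python is used here purely for illustration; the function names are not part of any product API):

```python
def virtual_mac(vrid: int, ipv6: bool = False) -> str:
    # 00-00-5E-00-01-{VRID} for VRRP for IPv4, 00-00-5E-00-02-{VRID} for IPv6.
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    return "00-00-5E-00-%02X-%02X" % (0x02 if ipv6 else 0x01, vrid)

def skew_time(priority: int) -> float:
    # Skew_Time = (256 - Priority) / 256, in seconds.
    return (256 - priority) / 256

def master_down_time(adver_interval: float, priority: int) -> float:
    # Master_Down timer value = (3 x Adver_Interval timer value) + Skew_Time.
    return 3 * adver_interval + skew_time(priority)
```

For example, a backup router with the default priority 100 and the default Adver_Interval of 1 second waits 3 + 156/256 = 3.609375 seconds before declaring the master down.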
Two VRRP versions currently exist: VRRPv2 and VRRPv3. VRRPv2 applies only to IPv4 networks, and VRRPv3
applies to both IPv4 and IPv6 networks. VRRP is classified as VRRP for IPv4 (VRRP4) or VRRP for IPv6
(VRRP6) by network type. VRRP for IPv4 supports VRRPv2 and VRRPv3, whereas VRRP for IPv6 supports only
VRRPv3.
• On an IPv4 network, VRRP packets are encapsulated into IPv4 packets and sent to an IPv4 multicast
address assigned to a VRRP group. In the IPv4 packet header, the source address is the primary IPv4
address of the interface that sends the packets (not the virtual IPv4 address), the destination address is
224.0.0.18, the TTL is 255, and the protocol number is 112.
• On an IPv6 network, VRRP packets are encapsulated into IPv6 packets and sent to an IPv6 multicast
address assigned to a VRRP6 group. In the IPv6 packet header, the source address is the link-local
address of the interface that sends the packets (not the virtual IPv6 address), the destination address is
FF02::12, the TTL is 255, and the protocol number is 112.
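The IPv4 encapsulation above carries a VRRPv2 message body whose layout is defined in RFC 3768. The following is a minimal sketch of building that body; the IP header with destination 224.0.0.18, TTL 255, and protocol 112 is assumed to be added separately:

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    # Standard one's-complement checksum (RFC 1071) used by VRRP.
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def vrrpv2_advertisement(vrid: int, priority: int, virtual_ips,
                         adver_int: int = 1) -> bytes:
    # Version 2 in the high nibble, packet type 1 (Advertisement) in the low nibble.
    ver_type = (2 << 4) | 1
    header = struct.pack("!BBBBBBH", ver_type, vrid, priority,
                         len(virtual_ips), 0, adver_int, 0)  # auth type 0, checksum 0
    body = b"".join(socket.inet_aton(ip) for ip in virtual_ips)
    body += b"\x00" * 8  # Authentication Data (unused with auth type 0)
    packet = header + body
    csum = internet_checksum(packet)
    return packet[:6] + struct.pack("!H", csum) + packet[8:]
```

Recomputing the checksum over a finished packet yields 0, which is how a receiver validates it.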
You can manually switch VRRP versions on an NE40E. Unless otherwise specified, VRRP packets in this document refer to
VRRPv2 packets.
VRRPv2 and VRRPv3 also differ in the time unit of the interval for sending VRRP Advertisement packets: VRRPv3 uses centiseconds, whereas VRRPv2 uses seconds.
Initialize: A VRRP router in this state is unavailable and does not process VRRP Advertisement packets. A router enters the Initialize state when it starts or detects a fault. After a router receives a Startup event, it changes its status as follows:
■ Changes from Initialize to Master if the router is an IP address owner with a priority of 255.
■ Changes from Initialize to Backup if the router has a priority less than 255.
Master: A router in the Master state provides the following functions:
■ Sends a VRRP Advertisement packet each time the Adver_Interval timer expires.
■ Responds to an ARP request with an ARP reply carrying the virtual MAC address.
■ Forwards IP packets sent to the virtual MAC address.
■ Allows ping to a virtual IP address by default.
The master router changes its status as follows:
■ Changes from Master to Backup if the VRRP priority in a received VRRP Advertisement packet is higher than the local VRRP priority.
■ Remains in the Master state if the VRRP priority in a received VRRP Advertisement packet is the same as the local VRRP priority.
■ Changes from Master to Initialize after it receives a Shutdown event, indicating that the VRRP-enabled interface has been shut down.
Backup: A router in the Backup state provides the following functions:
■ Receives VRRP Advertisement packets from the master router and checks whether the master router is working properly based on information in the packets.
■ Does not respond to an ARP request carrying a virtual IP address.
■ Discards IP packets sent to the virtual MAC address.
■ Discards IP packets sent to virtual IP addresses.
■ If, in preemption mode, it receives a VRRP Advertisement packet carrying a VRRP priority lower than the local VRRP priority, it preempts the Master state after a specified preemption delay.
■ If, in non-preemption mode, it receives a VRRP Advertisement packet carrying a VRRP priority lower than the local VRRP priority, it remains in the Backup state.
■ Resets the Master_Down timer but does not compare IP addresses if it receives a VRRP Advertisement packet carrying a VRRP priority higher than or equal to the local VRRP priority.
A backup router changes its status as follows:
■ Changes from Backup to Master after it receives a Master_Down timer timeout event.
■ Changes from Backup to Initialize after it receives a Shutdown event, indicating that the VRRP-enabled interface has been shut down.
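The state transitions above can be sketched as a small event-driven machine. This is an illustrative simplification; a real implementation also handles timers, IP address comparison, and preemption delays:

```python
class VrrpStateMachine:
    """Illustrative sketch of the Initialize/Backup/Master transitions."""

    def __init__(self, priority: int, ip_address_owner: bool = False):
        # An IP address owner runs with the highest priority, 255.
        self.priority = 255 if ip_address_owner else priority
        self.state = "Initialize"

    def handle(self, event: str, peer_priority: int = None) -> str:
        if event == "Shutdown":
            self.state = "Initialize"          # VRRP-enabled interface shut down
        elif self.state == "Initialize" and event == "Startup":
            self.state = "Master" if self.priority == 255 else "Backup"
        elif self.state == "Backup" and event == "Master_Down":
            self.state = "Master"              # Master_Down timer expired
        elif (self.state == "Master" and event == "Advertisement"
              and peer_priority is not None
              and peer_priority > self.priority):
            self.state = "Backup"              # a higher-priority master exists
        return self.state
```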
1. VRRP elects the master router from a VRRP group based on router priorities. Once elected, the master
router sends a gratuitous ARP packet carrying the virtual MAC address to its connected device or host.
2. The master router periodically sends VRRP Advertisement packets to all backup routers in the VRRP
group to advertise its configurations (such as the priority) and operating status.
3. If the master router fails, VRRP elects a new master router from the VRRP group based on router
priorities.
4. The new master router immediately sends a gratuitous ARP packet carrying the virtual MAC address
and virtual IP address to update MAC entries on its connected device or host. After the update is
complete, user traffic is switched to the new master router. The switching process is transparent to
users.
5. If the original master router recovers and its priority is 255, it immediately switches to the Master
state. If the original master router recovers and its priority is lower than 255, it switches to the Backup
state and restores its previously configured priority.
6. If a backup router's priority is higher than the master router's priority, VRRP determines whether to
reelect a new master router, depending on the backup router's working mode (preemption or non-
preemption).
To ensure that the master and backup routers work properly, VRRP must implement the following functions:
■ If a router finds that the VRRP Advertisement packet carries a priority higher than or equal to its
priority, this router remains in the Backup state.
■ If a router finds that the VRRP Advertisement packet carries a priority lower than its priority, the
router may switch to the Master state or remain in the Backup state, depending on its working
mode. If the router is working in preemption mode, it switches to the Master state; if the router is
working in non-preemption mode, it remains in the Backup state.
• If multiple VRRP routers enter the Master state at the same time, they exchange VRRP Advertisement packets to determine the master and backup roles. The VRRP router with the highest priority remains in the Master state, and VRRP routers with lower priorities switch to the Backup state. If the routers have the same priority, the router whose VRRP-enabled interface has the largest primary IP address becomes the master router.
• If a VRRP router is the IP address owner, it immediately switches to the Master state after receiving a Startup
event.
■ If the master router gives up the master role (for example, the master router leaves the VRRP
group), it sends VRRP Advertisement packets carrying a priority of 0 to the backup routers. Rather
than waiting for the Master_Down timer to expire, the backup router with the highest priority
switches to the Master state after a specified switching time. This switching time is called
Skew_Time, in seconds. The Skew_Time is calculated using the following equation:
Skew_Time = (256 - Backup router's priority)/256
■ If the master router fails and cannot send VRRP Advertisement packets, the backup routers cannot
immediately detect the master router's operating status. In this situation, the backup router with
the highest priority switches to the Master state after the Master_Down timer expires. The
Master_Down timer value (in seconds) is calculated using the following equation:
Master_Down timer value = (3 x Adver_Interval timer value) + Skew_Time
If network congestion occurs, a backup router may not receive VRRP Advertisement packets from the master router. If
this situation occurs, the backup router proactively switches to the Master state. If the new master router receives a
VRRP Advertisement packet from the original master router, the new master router will switch back to the Backup state.
As a result, the routers in the VRRP group frequently switch between Master and Backup. You can configure a
preemption delay to resolve this issue. After the configuration is complete, the backup router with the highest priority
switches to the Master state only when all of the following conditions are met:
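The election rules above (highest priority wins; equal priorities fall back to the largest primary IP address) can be sketched as follows; the function name and tuple layout are illustrative:

```python
import ipaddress

def elect_master(routers):
    """Pick the master from (priority, primary_ip) tuples.

    Highest priority wins; on a tie, the router whose VRRP-enabled
    interface has the largest primary IP address wins.
    """
    return max(routers,
               key=lambda r: (r[0], int(ipaddress.ip_address(r[1]))))
```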
VRRP Authentication
VRRP supports different authentication modes and keys in VRRP Advertisement packets to meet various network security requirements.
• On secure networks, you can use the non-authentication mode. In this mode, a device does not add authentication information to VRRP Advertisement packets before sending them. After a peer device receives VRRP Advertisement packets, it does not authenticate them either; it considers them authentic and valid.
• On insecure networks, you can use the simple authentication mode or the keyed message digest algorithm 5 (HMAC-MD5) authentication mode.
■ Simple authentication: A device adds an authentication key to a VRRP Advertisement packet before sending it. The peer device compares the authentication key carried in the received packet with the locally configured ones. If they are the same, the peer device considers the packet valid. If they are different, the peer device considers the packet invalid and discards it.
■ HMAC-MD5 authentication: A device uses the HMAC-MD5 algorithm to encrypt the locally configured authentication key and saves the encrypted key in the Authentication Data field of a VRRP Advertisement packet. After receiving the packet, the peer device encrypts its own locally configured authentication key using the HMAC-MD5 algorithm and checks packet validity by comparing the result with the encrypted key carried in the Authentication Data field of the packet.
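As an illustration of the receiver-side check (not the exact on-wire key derivation, which is implementation specific), Python's standard hmac module can model the comparison:

```python
import hashlib
import hmac

def auth_data(key: bytes, vrrp_message: bytes) -> bytes:
    # Digest placed in the Authentication Data field by the sender.
    return hmac.new(key, vrrp_message, hashlib.md5).digest()

def packet_valid(local_key: bytes, vrrp_message: bytes,
                 received_auth_data: bytes) -> bool:
    # The receiver recomputes the digest with its own key and compares it
    # with the digest carried in the packet, in constant time.
    return hmac.compare_digest(auth_data(local_key, vrrp_message),
                               received_auth_data)
```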
Master/Backup Mode
A VRRP group comprises a master router and one or more backup routers. As shown in Figure 1, Device A is
the master router and forwards packets, and Device B and Device C are backup routers and monitor Device
A's status. If Device A fails, Device B or Device C is elected as a new master router and takes over services
from Device A.
• Device A is the master. It supports delayed preemption and its VRRP priority is set to 120.
• Device B is a backup. It supports immediate preemption and its VRRP priority is set to 110.
• Device C is a backup. It supports immediate preemption and its VRRP priority is the default value 100.
1. When Device A functions properly, user traffic travels along the path Device E -> Device A -> Device D.
Device A periodically sends VRRP Advertisement packets to notify Device B and Device C of its status.
2. If Device A fails, its VRRP functions are unavailable. Because Device B has a higher priority than Device
C, Device B switches to the Master state and Device C remains in the Backup state. User traffic
switches to the new path Device E -> Device B -> Device D.
3. After Device A recovers, it enters the Backup state (its priority remains 120). After receiving a VRRP
Advertisement packet from Device B, the current master, Device A finds that its priority is higher than
that of Device B. Therefore, Device A preempts the Master state after the preemption delay elapses,
and sends VRRP Advertisement packets and gratuitous ARP packets.
After receiving a VRRP Advertisement packet from Device A, Device B finds that its priority is lower
than that of Device A and changes from the Master state to the Backup state. User traffic then
switches to the original path Device E -> Device A -> Device D.
• Multi-gateway load balancing: Multiple VRRP groups with virtual IP addresses are created and specified
as gateways for different users to implement load balancing.
Figure 2 illustrates multi-gateway load balancing.
■ VRRP group 1: Device A is the master router, and Device B is the backup router.
■ VRRP group 2: Device B is the master router, and Device A is the backup router.
VRRP groups 1 and 2 back up each other and serve as gateways for different users, thereby load-balancing service traffic.
• Single-gateway load balancing: A load-balance redundancy group (LBRG) with a virtual IP address is
created, and VRRP groups without virtual IP addresses are added to the LBRG. The LBRG is specified as
a gateway to implement load balancing for all users.
Single-gateway load balancing, an enhancement to multi-gateway load balancing, simplifies user-side
configurations and facilitates network maintenance and management.
Figure 3 shows single-gateway load balancing.
■ VRRP group 1: an LBRG. Device A is the master router, and Device B is the backup router.
■ VRRP group 2: an LBRG member group. Device B is the master router, and Device A is the backup
router.
VRRP group 1 serves as a gateway for all users. After receiving an ARP request packet from a user, VRRP
group 1 returns an ARP response packet and encapsulates its virtual MAC address or VRRP group 2's
virtual MAC address in the response.
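A minimal sketch of the single-gateway behavior described above, where the LBRG alternates between its own virtual MAC address and a member group's virtual MAC address when answering ARP requests. The round-robin policy here is an assumption for illustration; the actual distribution algorithm is internal to the device:

```python
import itertools

class LoadBalanceRedundancyGroup:
    def __init__(self, own_virtual_mac: str, member_virtual_macs):
        # Cycle through the LBRG's own MAC and its member groups' MACs.
        self._macs = itertools.cycle([own_virtual_mac] + list(member_virtual_macs))

    def arp_reply_mac(self) -> str:
        # Each ARP request from a host is answered with the next MAC,
        # spreading hosts across the gateway devices.
        return next(self._macs)
```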
5.6.2.5 mVRRP
Principles
A switch is dual-homed to two Routers at the aggregation layer on a metropolitan area network (MAN).
Multiple VRRP groups can be configured on the two Routers to transmit various types of services. Because
each VRRP group must maintain its own state machine, a large number of VRRP Advertisement packets are
transmitted between the Routers.
To help reduce bandwidth and CPU resource consumption during VRRP packet transmission, a VRRP group
can be configured as a management Virtual Router Redundancy Protocol (mVRRP) group. Other VRRP
groups are bound to the mVRRP group and become service VRRP groups. Only the mVRRP group sends
VRRP packets to negotiate the master/backup status. The mVRRP group determines the master/backup
status of service VRRP groups.
As shown in Figure 1, an mVRRP group can be deployed on the same side as service VRRP groups or on the
interfaces that directly connect Device A and Device B.
Related Concepts
mVRRP group: has all functions of a common VRRP group. Different from a common VRRP group, an mVRRP
group can be tracked by service VRRP groups and determine their statuses. An mVRRP group provides the
following functions:
• When the mVRRP group functions as a gateway, it determines the master/backup status of devices and
transmits services. In this situation, a common VRRP group with the same ID as the mVRRP group must
be created and assigned a virtual IP address. The mVRRP group's virtual IP address is a gateway IP
address set by users.
• When the mVRRP group does not function as a gateway, it determines the master/backup status of
devices but does not transmit services. In this situation, the mVRRP group does not require a virtual IP
address. You can create an mVRRP group directly on interfaces to simplify maintenance.
Service VRRP group: After common VRRP groups are bound to an mVRRP group, they become service VRRP
groups. Service VRRP groups do not need to send VRRP packets to determine their states. The mVRRP group
sends VRRP packets to determine its state and the states of all its bound service VRRP groups. A service
VRRP group can be bound to an mVRRP group in either of the following modes:
• Flowdown: The flowdown mode applies to networks on which both upstream and downstream packets
are transmitted over the same path. If the master device in an mVRRP group enters the Backup or
Initialize state, the VRRP module instructs all service VRRP groups that are bound to the mVRRP group
in flowdown mode to enter the Initialize state.
• Unflowdown: The unflowdown mode applies to networks on which upstream and downstream packets
can be transmitted over different paths. If the mVRRP group enters the Backup or Initialize state, the
VRRP module instructs all service VRRP groups that are bound to the mVRRP group in unflowdown
mode to enter the same state.
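The two binding modes can be sketched as a state-propagation rule (illustrative only; names are hypothetical):

```python
def service_group_states(mvrrp_state: str, bindings: dict) -> dict:
    """Derive each service VRRP group's state from the mVRRP group state.

    bindings maps a service group name to its binding mode,
    "flowdown" or "unflowdown".
    """
    states = {}
    for group, mode in bindings.items():
        if mode == "flowdown" and mvrrp_state in ("Backup", "Initialize"):
            states[group] = "Initialize"   # flowdown groups stop forwarding
        else:
            states[group] = mvrrp_state    # unflowdown groups follow the mVRRP state
    return states
```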
Multiple service VRRP groups can be bound to an mVRRP group. However, the mVRRP group cannot function as a service VRRP group.
Benefits
mVRRP offers the following benefits:
• Simplified management. An mVRRP group determines the master/backup status of service VRRP groups.
• Reduced CPU and bandwidth resource consumption. Service VRRP groups do not need to send VRRP
packets.
Background
Virtual Router Redundancy Protocol (VRRP) can monitor status changes only of the VRRP-enabled interface on the master device. If a VRRP-disabled interface on the master device or the uplink connecting that interface to a network fails, VRRP cannot detect the fault, which causes traffic interruptions.
To resolve this issue, configure VRRP to monitor the VRRP-disabled interface status. If a VRRP-disabled
interface on the master device or the uplink connecting the interface to a network fails, VRRP instructs the
master device to reduce its priority to trigger a master/backup VRRP switchover.
Related Concepts
If a VRRP-disabled interface of a VRRP device goes Down, the VRRP device changes its VRRP priority in either
of the following modes:
• Increased mode: The VRRP device increases its VRRP priority by a specified value.
• Reduced mode: The VRRP device reduces its VRRP priority by a specified value.
Implementation
As shown in Figure 1, a VRRP group is configured on Device A and Device B. Device A is the master device,
and Device B is the backup device.
Device A is configured to monitor interface 1. If interface 1 fails, Device A reduces its VRRP priority and sends
a VRRP Advertisement packet carrying a reduced priority. After Device B receives the packet, it checks that its
VRRP priority is higher than the received priority and preempts the Master state.
After interface 1 goes Up, Device A restores the VRRP priority. After Device A receives a VRRP Advertisement
packet carrying Device B's priority in preemption mode, Device A checks that its VRRP priority is higher than
the received priority and preempts the Master state.
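The priority adjustment above can be sketched as follows (the function name and values are illustrative):

```python
def adjusted_priority(configured_priority: int, interface_up: bool,
                      change: int, mode: str = "reduced") -> int:
    # When the tracked (VRRP-disabled) interface goes down, the device
    # raises or lowers its VRRP priority by the configured value; when the
    # interface comes back up, the configured priority is restored.
    if interface_up:
        return configured_priority
    if mode == "increased":
        return configured_priority + change
    return configured_priority - change
```

For example, assuming Device A is configured with priority 120 and a reduction value of 30, a failure of interface 1 leaves Device A advertising priority 90, so a backup at priority 100 preempts the Master state.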
Benefits
The association between VRRP and a VRRP-disabled interface helps trigger a master/backup VRRP
switchover if the VRRP-disabled interface fails or the uplink connecting the interface to a network fails.
Background
To prevent failures on a VRRP-disabled interface from causing service interruptions, configure a VRRP group
to track the VRRP-disabled interface. However, a VRRP group can track only one VRRP-disabled interface at
a time. As networks expand and more interfaces are deployed, a VRRP group may need to track many VRRP-disabled interfaces. With the original mechanism, this incurs a heavy configuration workload.
To reduce the configuration workload, you can add multiple VRRP-disabled interfaces to an interface
monitoring group and enable a VRRP group to track the interface monitoring group. When the link failure
ratio of the interface monitoring group reaches a specified threshold, the VRRP group performs a
master/backup switchover to ensure reliable service transmission.
Related Concepts
A VRRP group can track three interface monitoring groups at the same time.
• A VRRP group can track two interface monitoring groups on the access side in normal mode (link is not
specified). When the link failure ratio on the access side reaches a specified threshold, the VRRP group
reduces the priority of the local device to trigger the remote device to preempt the Master state.
• A VRRP group can track one interface monitoring group on the network side in link mode. When the
link failure ratio on the network side reaches a specified threshold, the local device in the VRRP group
changes to the Initialize state and sends a VRRP Advertisement packet carrying a priority of 0 to the
remote device to trigger the remote device to preempt the Master state.
Implementation
Each interface in an interface monitoring group is assigned a down weight. If an interface goes Down, the fault weight of the interface monitoring group to which the interface belongs increases; if an interface goes Up, the fault weight decreases. The fault weight of an interface monitoring group therefore reflects link quality. VRRP can be configured to track an interface monitoring group. If the group's fault weight changes, the system notifies the VRRP module of the change. The VRRP module then recalculates the VRRP priority or status based on the fault weight of the interface monitoring group, the configured monitoring mode, and the priority change value.
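The fault-weight bookkeeping above can be sketched as follows (class and method names are illustrative):

```python
class InterfaceMonitoringGroup:
    def __init__(self, down_weights: dict):
        # down_weights maps an interface name to its configured down weight.
        self.down_weights = dict(down_weights)
        self.is_up = {name: True for name in self.down_weights}

    def set_interface_state(self, name: str, up: bool) -> None:
        self.is_up[name] = up

    def fault_weight(self) -> int:
        # The group's fault weight is the sum of the weights of all
        # interfaces that are currently down.
        return sum(weight for name, weight in self.down_weights.items()
                   if not self.is_up[name])

    def threshold_reached(self, threshold: int) -> bool:
        # A VRRP group tracking this monitoring group reacts (priority
        # change or state change) once the threshold is reached.
        return self.fault_weight() >= threshold
```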
Benefits
Configuring VRRP to track an interface monitoring group on a device where a VRRP group is configured
helps to reduce the workload for configuring the VRRP group to track VRRP-disabled interfaces.
Context
A VRRP group uses VRRP Advertisement packets to negotiate the master/backup VRRP status, implementing
device backup. If the link between devices in a VRRP group fails, VRRP Advertisement packets cannot be
exchanged to negotiate the master/backup status. A backup device attempts to preempt the master role
after a period that is three times the interval at which VRRP Advertisement packets are sent. During this
period, user traffic is still forwarded to the master device, which results in user traffic loss.
Bidirectional Forwarding Detection (BFD) is used to rapidly detect faults in links or IP routes. BFD for VRRP
enables a master/backup VRRP switchover to be completed within 1 second, thereby preventing traffic loss.
A BFD session is established between the master and backup devices in a VRRP group and is bound to the VRRP group. BFD immediately detects communication faults in the VRRP group and instructs the VRRP group to perform a master/backup switchover, preventing user traffic loss.
The two association modes are as follows. In both modes, the associated BFD sessions are static BFD sessions or static BFD sessions with automatically negotiated discriminators, and the VRRP-enabled devices must support BFD.
• Association between a VRRP group and a common BFD session: A backup device monitors the status of the master device in a VRRP group. A common BFD session is used to monitor the link between the master and backup devices. The VRRP group adjusts priorities according to the BFD session status and determines whether to perform a master/backup switchover according to the adjusted priorities.
• Association between a VRRP group and link and peer BFD sessions: The master and backup devices monitor the link and peer BFD sessions simultaneously. A peer BFD session is established between the master and backup devices, and a link BFD session is established between a downstream switch and each VRRP device. BFD helps determine whether the fault occurs between the master device and the downstream switch or between the backup device and the downstream switch. If the link or peer BFD session goes down, BFD notifies the VRRP group of the fault. After receiving the notification, the VRRP group immediately performs a master/backup VRRP switchover.
Figure 1 Network diagram of associating a VRRP group with a common BFD session
• DeviceA (master) works in delayed preemption mode and its VRRP priority is 120.
• DeviceB works in immediate preemption mode and functions as the backup in the VRRP group with a
priority of 100.
• DeviceB in the VRRP group is configured to monitor a common BFD session. If BFD detects a fault and
the BFD session goes down, DeviceB increases its VRRP priority by 40.
1. Normally, DeviceA periodically sends VRRP Advertisement packets to notify DeviceB that it is working
properly. DeviceB monitors the status of DeviceA and the BFD session.
2. If BFD detects a fault, the BFD session goes down. DeviceB increases its VRRP priority to 140 (100 + 40
= 140), making it higher than DeviceA's VRRP priority. DeviceB then immediately preempts the master
role and sends gratuitous ARP packets to allow DeviceE to update address entries.
3. The BFD session goes up after the fault is rectified. In this case:
DeviceB restores its VRRP priority to 100 (140 – 40 = 100). DeviceB remains in the Master state and
continues to send VRRP Advertisement packets.
After receiving these packets, DeviceA checks that the VRRP priority carried in them is lower than the
local VRRP priority and preempts the master role after the specified VRRP status recovery delay
expires. DeviceA then sends VRRP Advertisement and gratuitous ARP packets.
After receiving a VRRP Advertisement packet that carries a priority higher than the local priority,
DeviceB enters the Backup state.
4. Both DeviceA and DeviceB are restored to their original states. As such, DeviceA forwards user-to-
network traffic again.
The preceding process shows how association between VRRP and BFD differs from VRRP used alone. Specifically, after a VRRP group is associated with a BFD session and a fault occurs, the backup device immediately preempts the master role by increasing its VRRP priority, instead of waiting for a period three times the interval at which VRRP Advertisement packets are sent. This means that a master/backup VRRP switchover can be performed within milliseconds.
Association Between a VRRP Group and Link and Peer BFD Sessions
In Figure 2, the master and backup devices monitor the status of link and peer BFD sessions. The BFD
sessions help determine whether a link fault is a local or remote fault.
DeviceA and DeviceB run VRRP. A peer BFD session is established between DeviceA and DeviceB to detect
link and device faults. A link BFD session is established between DeviceA and DeviceE and between DeviceB
and DeviceE to detect link and device faults. After DeviceB detects that the peer BFD session has gone down while the link BFD session between DeviceE and DeviceB is still up, DeviceB switches to the Master state and forwards user-to-network traffic.
Figure 2 Network diagram of associating a VRRP group with link and peer BFD sessions
• A peer BFD session is established between DeviceA and DeviceB to detect link and device faults between
them.
• Link 1 and link 2 BFD sessions are established between DeviceE and DeviceA and between DeviceE and
DeviceB, respectively.
1. Normally, DeviceA periodically sends VRRP Advertisement packets to inform DeviceB that it is working
properly and monitors the BFD session status. DeviceB monitors the status of DeviceA and the BFD
session.
2. The BFD session goes down if BFD detects either of the following faults:
• Link 1 or DeviceE fails. In this case, the link 1 BFD session and the peer BFD session go down, while
the link 2 BFD session remains up.
DeviceA's VRRP state switches to Initialize.
DeviceB's VRRP state switches to Master.
• DeviceA fails. In this case, the link 1 BFD session and the peer BFD session go down, while the link 2
BFD session remains up. DeviceB's VRRP state switches to Master.
3. After the fault is rectified, all the BFD sessions go up. If DeviceA works in preemption mode, DeviceA
and DeviceB are restored to their original VRRP states after VRRP negotiation is complete.
In normal cases, DeviceA's VRRP status is not impacted by a link 2 fault; instead, DeviceA continues to forward user-to-
network traffic. However, DeviceB's VRRP status switches to Master if both the peer BFD session and the link 2 BFD session go
down and DeviceB detects the peer BFD session down event before detecting the link 2 BFD session down event. After
DeviceB detects the link 2 BFD session down event, DeviceB's VRRP status switches to Initialize.
Figure 3 shows the state machine for association between a VRRP group and link and peer BFD sessions.
Figure 3 State machine for association between a VRRP group and link and peer BFD sessions
The preceding process shows that after link BFD for VRRP and peer BFD for VRRP are configured, the backup
device can immediately switch to the Master state if a fault occurs, without waiting for a period three times
the interval at which VRRP Advertisement packets are sent or changing its VRRP priority. This means that a
master/backup VRRP switchover can be performed in milliseconds.
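From DeviceB's perspective, the fault-localization logic above reduces to a small decision over the two sessions it tracks. This is a hypothetical simplification of the state machine in Figure 3 that ignores event ordering (covered in the note above); the state names are those used in the text.

```python
def device_b_state(peer_bfd_up: bool, link2_bfd_up: bool) -> str:
    """DeviceB's resulting VRRP state given the tracked BFD session states."""
    if not link2_bfd_up:
        return "Initialize"   # local fault: DeviceB's own uplink to DeviceE failed
    if not peer_bfd_up:
        return "Master"       # remote fault (link 1 or DeviceA): take over forwarding
    return "Backup"           # all sessions up: DeviceA keeps forwarding
```

No priority change is involved: the link BFD and peer BFD states alone drive the transition, which is why the switchover completes in milliseconds.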
Benefits
BFD for VRRP speeds up master/backup VRRP switchovers if faults occur.
Principles
Metro Ethernet solutions use Virtual Router Redundancy Protocol (VRRP) tracking Bidirectional Forwarding
Detection (BFD) to detect link faults and protect links between the master and backup network provider
edges (NPEs) and between NPEs and user-end provider edges (UPEs). If UPEs do not support BFD, Metro
Ethernet solutions cannot use VRRP tracking BFD. If UPEs support 802.3ah, Metro Ethernet solutions can use
802.3ah as a substitute for BFD to detect link faults and protect links between NPEs and UPEs. Ethernet
operation, administration and maintenance (OAM) technologies, such as Ethernet in the First Mile (EFM)
OAM defined in IEEE 802.3ah, provide functions, such as link connectivity detection, link failure monitoring,
remote failure notification, and remote loopback for links between directly connected devices.
Implementation
EFM can detect only local link failures. If the link between the UPE and NPE1 fails, NPE2 cannot detect the failure. NPE2
has to wait three VRRP Advertisement packet transmission intervals before it switches to the Master state. During this
period, upstream service traffic is interrupted. To speed up master/backup VRRP switchovers and minimize the service
interruption time, also configure VRRP to track the peer BFD session.
Figure 1 shows a network on which VRRP tracking EFM is configured. NPE1 and NPE2 are configured to
belong to a VRRP group. A peer BFD session is configured to detect the faults on the two NPEs and on the
link between the two NPEs. An EFM session is configured between the UPE and NPE1 and between the UPE
and NPE2 to detect the faults on the UPE and NPEs and on the links between the UPE and NPEs. The VRRP
group determines the VRRP status of NPEs based on the link status reported by EFM and the peer BFD
session.
In Figure 1, the following example describes how EFM and a peer BFD session affect the VRRP status when a
fault occurs:
• A peer BFD session is established between NPEs to detect link and device failures on the link between
the NPEs.
• An EFM session is established between NPE1 and the UPE and between NPE2 and the UPE to detect link
and node faults on the links between the NPEs and the UPE.
1. In normal circumstances, NPE1 periodically sends VRRP Advertisement packets to inform NPE2 that
NPE1 works properly. NPE1 and NPE2 both track the EFM and peer BFD session status.
2. If NPE1 or the link between the UPE and NPE1 fails, the status of the EFM session between the UPE
and NPE1 changes to Discovery, the status of the peer BFD session changes to Down, and the status
of the EFM session between the UPE and NPE2 changes to Detect. NPE1's VRRP status directly
changes from Master to Initialize, and NPE2's VRRP status directly changes from Backup to Master.
3. After NPE1 or the link between the UPE and NPE1 recovers, the status of the peer BFD session
changes to Up, and the status of the EFM session between the UPE and NPE1 changes to Detect. If the
preemption function is configured on NPE1, NPE1 changes back to the Master state after VRRP
negotiation, and NPE2 changes back to the Backup state.
In normal circumstances, if the link between the UPE and NPE2 fails, NPE1 remains in the Master state and
continues to forward upstream traffic. However, NPE2's VRRP status changes to Master if NPE2 detects the Down
state of the peer BFD session before it detects the Discovery state of the link between itself and the UPE. After
NPE2 detects the Discovery state of the link between itself and the UPE, NPE2's VRRP status changes from
Master to Initialize.
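The EFM and peer BFD states described in the steps above map onto NPE2's VRRP state as sketched below. This is a hypothetical simplification using the session state names from the text (EFM Detect/Discovery, peer BFD Up/Down); it does not model the event-ordering transient described in the note.

```python
def npe2_state(own_efm_state: str, peer_bfd_state: str) -> str:
    """NPE2's resulting VRRP state given its own tracked session states."""
    if own_efm_state == "Discovery":   # local fault: UPE-NPE2 link or UPE failed
        return "Initialize"
    if peer_bfd_state == "Down":       # remote fault: NPE1 or the UPE-NPE1 link failed
        return "Master"
    return "Backup"                    # EFM in Detect and peer BFD Up: stay backup
```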
Benefits
VRRP tracking EFM facilitates master/backup VRRP switchovers on a network on which UPEs do not support
BFD but support 802.3ah.
Context
Association between VRRP and Ethernet in the First Mile (EFM) effectively speeds up link fault detection on
a network where UPEs do not support BFD. However, EFM can detect faults only on single-hop links. As
shown in Figure 1, EFM cannot detect faults on the link between NPE1 and UPE2 or between NPE2 and
UPE2 because UPE1 is deployed between NPE1 and UPE2 and UPE3 is deployed between NPE2 and
UPE2.
Connectivity fault management (CFM) defined in 802.1ag provides functions, such as E2E connectivity fault
detection, fault notification, fault verification, and fault locating. CFM can monitor the connectivity of the
entire network and locate connectivity faults. It can also be used together with protection switching
techniques to improve network reliability. Association between VRRP and CFM enables a VRRP group to
rapidly perform a master/backup VRRP switchover when CFM detects a link fault. This implementation
minimizes service interruption time.
Implementation
A CFM session detects failures only on the link it monitors. If the NPE1-UPE2 link fails, NPE2, functioning as
the backup, cannot detect the failure through its own CFM session. NPE2 has to wait for a period three times the interval
at which VRRP Advertisement packets are sent before it switches to the Master state. During this period, upstream service
traffic is interrupted. To speed up master/backup VRRP switchovers and minimize the service interruption time, also
configure VRRP to track the peer BFD session.
Figure 2 shows a network on which VRRP tracks CFM and the peer BFD session.
• A peer BFD session is established between NPE1 and NPE2 to detect link and device failures between
them.
• A CFM session is configured between UPE2 and NPE1 and between UPE2 and NPE2 to detect the faults
on UPE2 and the NPEs and on links between UPE2 and the NPEs.
1. In normal circumstances, NPE1 periodically sends VRRP Advertisement packets to inform NPE2 that it
works properly. NPE1 monitors the CFM and peer BFD session status, and NPE2 monitors the master
device as well as the CFM and peer BFD session status.
2. If NPE1 or the link between NPE1 and UPE2 fails, the CFM session between NPE1 and UPE2 goes
down, as does the peer BFD session between NPE1 and NPE2. The CFM session between UPE2 and
NPE2 is up. In this case, NPE1's VRRP status directly changes from Master to Initialize, and NPE2's
VRRP status directly changes from Backup to Master.
3. After NPE1 or the link between UPE2 and NPE1 recovers, the peer BFD session goes up again, and the
CFM session between NPE1 and UPE2 also goes up. If NPE1 is configured to work in preemption
mode, NPE1 changes back to the Master state after VRRP negotiation, and NPE2 changes back to the
Backup state.
In normal circumstances, if the link between NPE2 and UPE2 fails, NPE1 remains in the Master state and
continues to forward upstream traffic. However, NPE2's VRRP status switches to Master if both the peer BFD
session between NPE1 and NPE2 and the CFM session between NPE2 and UPE2 go down, and NPE2 detects the
peer BFD session down event before detecting the CFM session down event. After NPE2 detects the CFM session
down event, NPE2's VRRP status switches to Initialize.
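The note above is order-dependent: NPE2's transient state depends on which down event it detects first. A hypothetical event-driven sketch of that behavior, with assumed event names:

```python
def replay(events: list[str]) -> str:
    """Return NPE2's VRRP state after processing tracked-session events in order."""
    state = "Backup"
    for ev in events:
        if ev == "cfm_down":            # NPE2-UPE2 CFM session failed: local fault
            state = "Initialize"
        elif ev == "peer_bfd_down":     # NPE1 unreachable: remote fault
            if state == "Backup":       # take over only while the local link looks up
                state = "Master"
    return state
```

If NPE2 sees `peer_bfd_down` first, it briefly becomes Master; the subsequent `cfm_down` moves it to Initialize, matching the note.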
Figure 3 shows the state machine for association between VRRP and CFM.
Benefits
Association between VRRP and CFM prevents service interruptions caused by dual master devices in a VRRP
group and speeds up master/backup VRRP switchovers, thereby improving network reliability.
Background
To improve network reliability, VRRP can be configured on a device to track the following objects:
• Interface
• EFM session
• BFD session
Failure of a tracked object can trigger a rapid master/backup VRRP switchover to ensure service continuity.
In Figure 1, however, if Interface 2 on Device C goes Down and its IP address (10.3.1.1) becomes
unreachable, VRRP is unable to detect the fault. As a result, user traffic is dropped.
To resolve the preceding issue, you can associate VRRP with network quality analysis (NQA). Using test
instances, NQA sends probe packets to check the reachability of destination IP addresses. After VRRP is
associated with an NQA test instance, VRRP tracks the NQA test instance to implement rapid master/backup
VRRP switchovers. For the example shown in the preceding figure, you can configure an NQA test instance
on Device A to check whether the IP address 10.3.1.1 of Interface 2 on Device C is reachable.
VRRP association with an NQA test instance is required on only the local device (Device A).
Implementation
You can configure VRRP association with an NQA test instance to track a gateway Router's uplink, which is a
cross-device link. If the uplink fails, NQA instructs VRRP to reduce the gateway Router's priority by a
specified value. Reducing the priority enables another gateway Router in the VRRP group to take over
services and become the master, thereby ensuring communication continuity between hosts on the LAN
served by the gateway and the external network. After the uplink recovers, NQA instructs VRRP to restore
the gateway Router's priority.
As shown in Figure 2:
• An NQA test instance is created on Device A to detect the reachability of the destination IP address
10.3.1.1.
• VRRP is configured on Device A to track the NQA test instance. If the status of the NQA test instance is
Failed, Device A reduces its priority to trigger a master/backup VRRP switchover. A VRRP group can
track a maximum of eight NQA test instances.
1. Device A tracks the NQA test instance periodically and sends VRRP Advertisement packets to notify its
status to Device B.
2. When the uplink fails, the status of the NQA test instance changes to Failed. NQA notifies VRRP of the
link detection failure, and Device A reduces its priority by a specified value. Because Device B has a
higher priority than Device A, Device B preempts the Master state and takes over services.
3. When the uplink recovers, the status of the NQA test instance changes to Success. NQA notifies VRRP
of the link detection success, and Device A restores the original priority. If preemption is enabled on
Device A, Device A preempts the Master state and takes over services after VRRP negotiation.
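The NQA-driven priority adjustment in the steps above can be sketched as follows. The base priorities and the reduction value are assumptions for illustration; the text specifies only that the reduced priority must fall below Device B's.

```python
DEVICE_A_BASE = 120       # assumed base priority of Device A (master)
DEVICE_B_PRIORITY = 100   # assumed priority of Device B (backup)
NQA_REDUCTION = 30        # assumed configured reduction value

def device_a_priority(nqa_status: str) -> int:
    """Device A lowers its priority while the tracked NQA instance reports Failed."""
    return DEVICE_A_BASE - NQA_REDUCTION if nqa_status == "Failed" else DEVICE_A_BASE

def current_master(nqa_status: str) -> str:
    """The device with the higher priority holds the Master state (preemption on)."""
    return "DeviceA" if device_a_priority(nqa_status) > DEVICE_B_PRIORITY else "DeviceB"
```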
Benefits
VRRP association with NQA implements a rapid master/backup VRRP switchover if a cross-device uplink fails.
Context
To improve device reliability, two user gateways working in master/backup mode are connected to a Layer 3
network, and VRRP is enabled on these gateways to determine their master/backup status. After a VRRP
group is configured, if an uplink route to a network becomes unreachable due to an uplink failure or
topology change, user hosts are unaware of the change, causing service traffic loss.
Association between VRRP and route status can resolve this problem. If the uplink route is withdrawn or
becomes inactive, the VRRP group is notified of the change, adjusts its master device's VRRP priority, and
performs a master/backup VRRP switchover. This ensures that user traffic can be forwarded along a properly
functioning link.
Implementation
A VRRP group can be configured to track an uplink route to determine whether the route is reachable. If
the route becomes unreachable, the VRRP priority of the master device decreases by a specified value. A backup device with a priority higher than
others preempts the Master state and takes over traffic. This process ensures communication continuity
between these hosts and the external network. After the uplink recovers, the RM module instructs VRRP to
restore the master device's VRRP priority.
In Figure 1, a VRRP group is configured on DeviceA and DeviceB. DeviceA is the master and forwards user-
to-network traffic, and DeviceB is the backup. DeviceA in the VRRP group is configured to track the
10.1.2.0/24 route.
When the uplink between DeviceA and DeviceC fails, the route to 10.1.2.0/24 becomes unreachable. As such,
DeviceA reduces its VRRP priority by a specified value so that its new priority is lower than the priority of
DeviceB. DeviceB immediately preempts the master role and takes over traffic forwarding, thereby
preventing user traffic loss.
Figure 1 Network diagram of configuring association between VRRP and route status
• DeviceA functions as the master in the VRRP group with a priority of 120.
• DeviceB works in immediate preemption mode and functions as the backup in the VRRP group with a
priority of 100.
• DeviceA tracks the 10.1.2.0/24 route and reduces its VRRP priority by 40 if it is notified that the route is
unreachable.
1. Normally, DeviceA periodically sends VRRP Advertisement packets to inform DeviceB that it is working
properly.
2. When the uplink between DeviceA and DeviceC fails, the 10.1.2.0/24 route becomes unreachable, and
the VRRP group is notified of the route status change. After receiving this notification, DeviceA reduces
its VRRP priority to 80 (120 – 40 = 80). Because the VRRP priority of DeviceB, which is working in
immediate preemption mode, is now higher than the priority of DeviceA, DeviceB immediately
preempts the master role and sends gratuitous ARP packets to allow DeviceE to update address
entries.
3. When the faulty link recovers, the 10.1.2.0/24 route becomes reachable again. DeviceA then restores
its VRRP priority to 120 (80 + 40 = 120), preempts the master role, and sends VRRP Advertisement and
gratuitous ARP packets. After DeviceB receives the Advertisement packet carrying a priority higher
than its own, it switches to the Backup state.
4. Both DeviceA and DeviceB are restored to their original states. As such, DeviceA forwards user-to-
network traffic again.
The preceding process shows that association between a VRRP group and an uplink route can prevent traffic
loss. In situations where the uplink route is unreachable, the VRRP group triggers a master/backup VRRP
switchover through priority adjustment so that the backup device can take over user-to-network traffic.
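The route-tracking behavior above can be sketched against a routing table: DeviceA's effective priority drops by 40 while the tracked 10.1.2.0/24 route is absent. The prefix and values come from the example; modeling the routing table as a set of prefixes is an assumption for illustration.

```python
import ipaddress

TRACKED_PREFIX = ipaddress.ip_network("10.1.2.0/24")
BASE_PRIORITY, REDUCTION = 120, 40   # values from the example above

def device_a_priority(routing_table: set) -> int:
    """DeviceA's effective priority: reduced while the tracked route is unreachable."""
    if TRACKED_PREFIX in routing_table:
        return BASE_PRIORITY
    return BASE_PRIORITY - REDUCTION   # 120 - 40 = 80, below DeviceB's 100
```

With the route present the priority is 120 and DeviceA stays master; once the route is withdrawn the priority falls to 80 and DeviceB (priority 100, immediate preemption) takes over, as in step 2.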
Benefits
Association between VRRP and route status helps implement a master/backup VRRP switchover when an
uplink route to a network is unreachable. It also ensures that the VRRP group performs a master/backup
VRRP switchback, minimizing traffic interruption duration.
Compared with association between VRRP and interface status, association between VRRP and route status
can detect not only faults of the directly connected uplink interface, but also device and link faults when
uplink traffic traverses multiple devices.
Background
A VRRP group is configured on Device1 and Device2 on the network shown in Figure 1. Device1 is a master
device, whereas Device2 is a backup device. The VRRP group serves as a gateway for users. User-to-network
traffic travels through Device1. However, network-to-user traffic may travel through Device1, Device2, or
both of them over a path determined by a dynamic routing protocol. Therefore, user-to-network traffic and
network-to-user traffic may travel along different paths, which interrupts services if firewalls are attached to
devices in the VRRP group, complicates traffic monitoring or statistics collection, and increases costs.
To address the preceding problems, the routing protocol is expected to select a route passing through the
master device so that the user-to-network and network-to-user traffic travels along the same path.
Association between direct routes and a VRRP group can meet expectations by allowing the dynamic routing
protocol to select a route based on the VRRP status.
Related Concepts
Direct route: a 32-bit host route or a network segment route that is generated after a device interface is
assigned an IP address and its protocol status is Up. A device automatically generates direct routes without
using a routing algorithm.
Implementation
Association between direct routes and a VRRP group allows VRRP interfaces to adjust the costs of direct
network segment routes based on the VRRP status. The direct route with the master device as the next hop
has the lowest cost. A dynamic routing protocol imports the direct routes and selects the direct route with
the lowest cost. For example, VRRP interfaces on Device1 and Device2 on the network shown in Figure 1 are
configured with association between direct routes and the VRRP group. The implementation is as follows:
• Device1 in the Master state sets the cost of its route to the directly connected virtual IP network
segment to 0 (default value).
• Device2 in the Backup state increases the cost of its route to the directly connected virtual IP network
segment.
A dynamic routing protocol selects the route with Device1 as the next hop because this route costs less than
the other route. Therefore, both user-to-network traffic and network-to-user traffic travel through Device1.
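The cost adjustment described above can be sketched as follows. The master's cost of 0 is the default stated in the text; the increment applied by the backup is an assumed value for illustration.

```python
BACKUP_COST_INCREMENT = 10   # assumed value added by the device in Backup state

def direct_route_cost(vrrp_state: str) -> int:
    """Cost of a device's direct route to the virtual IP network segment."""
    return 0 if vrrp_state == "Master" else BACKUP_COST_INCREMENT

def next_hop(device1_state: str, device2_state: str) -> str:
    """The dynamic routing protocol imports both direct routes and picks the cheaper one."""
    costs = {"Device1": direct_route_cost(device1_state),
             "Device2": direct_route_cost(device2_state)}
    return min(costs, key=costs.get)
```

Because the master always advertises the lower cost, network-to-user traffic follows the same device that carries user-to-network traffic, even after a master/backup switchover.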
Usage Scenario
In data center scenarios, firewalls are attached to devices in a VRRP group to improve network security.
Network-to-user traffic cannot pass through a firewall if it travels over a path different from the one used by
user-to-network traffic.
On an IP radio access network (RAN), VRRP is configured to set the master/backup status of
aggregation site gateways (ASGs) and radio service gateways (RSGs). Network-to-user and user-to-network
traffic may pass through different paths, complicating network operation and management.
Association between direct routes and a VRRP group can address the preceding problems by ensuring the
user-to-network and network-to-user traffic travels along the same path.
Principles
As shown in Figure 1, the base station attached to the cell site gateway (CSG) on a mobile bearer network
accesses aggregation nodes PE1 and PE2 over primary and secondary pseudo wires (PWs) and accesses PE3
and PE4 over primary and secondary links. PE3 and PE4 are configured to belong to a Virtual Router
Redundancy Protocol (VRRP) group. If PE1 fails, traffic switches from the primary link to the secondary link.
Before a master/backup VRRP switchover is complete, service traffic is temporarily interrupted.
To meet carrier-class reliability requirements, configure devices in the VRRP group to forward traffic even
when they are in the Backup state. This configuration can prevent traffic interruptions in the preceding
scenario.
Implementation
As shown in Figure 1, upstream traffic travels along the path CSG -> PE1 -> PE3 -> RNC1/RNC2 in normal
circumstances. PE3 is in the Master state, and PE4 in the Backup state.
If PE1 fails, traffic switches from the primary link between PE1 and PE3 to the secondary link between PE2
and PE4. Because a primary/secondary link switchover is faster than a master/backup
VRRP switchover:
• If PE4 cannot forward traffic, service traffic is temporarily interrupted before the master/backup VRRP
switchover is complete.
• If PE4 can forward traffic, PE4 takes over service traffic forwarding even if the master/backup VRRP
switchover is not complete.
Benefits
Traffic forwarding by a backup device improves master/backup VRRP switchover performance and reduces
the service interruption time.
Principles
On the network shown in Figure 1, VRRP-enabled NPEs are connected to user-side PEs through active and
standby links. User traffic travels over the active link to the master NPE1, and NPE1 forwards user traffic to
the Internet. If NPE1 is working properly, user traffic travels over the path UPE -> PE1 -> NPE1. If the active
link or NPE1's interface 1 tracked by the VRRP group fails, an active/standby link switchover and a
master/backup VRRP switchover are implemented. After the switchovers, user traffic switches to the path
UPE -> PE1 -> PE2 -> NPE2. After the fault is rectified, an active/standby link switchback and a
master/backup VRRP switchback are implemented. If the active link becomes active before the original
master device restores the Master state, user traffic is interrupted.
To prevent user traffic interruptions, the rapid VRRP switchback function is used to allow the original master
device to switch from the Backup state to the Master state immediately after the fault is rectified.
Related Concept
A VRRP switchback is a process during which the original master device switches its status from Backup to
Master after a fault is rectified.
Implementation
Rapid VRRP switchback allows the original master device to switch its status from Backup to Master without
using VRRP Advertisement packets to negotiate the status. For example, on the network shown in Figure 1,
device configurations are as follows:
• A common VRRP group is configured on NPE1 and NPE2 that run VRRP. An mVRRP group is configured
on directly connected interfaces of NPE1 and NPE2. The common VRRP group is bound to the mVRRP
group and becomes a service VRRP group. The mVRRP group determines the master/backup status of
the service VRRP group.
• NPE1 has a VRRP priority of 120 and works in the Master state in the mVRRP group.
• NPE2 has a VRRP priority of 100 and works in the Backup state in the mVRRP group.
• NPE1 tracks interface 1 and reduces its priority by 40 if interface 1 goes Down.
1. If NPE1 is working properly, NPE1 periodically sends VRRP Advertisement packets to notify NPE2 of
the Master state. NPE1 tracks interface 1 connected to the active link.
2. If the active link or interface 1 fails, interface 1 goes Down. The service VRRP group on NPE1 is in the
Initialize state. NPE1 reduces its mVRRP priority to 80 (120 - 40). As a result, the mVRRP priority of
NPE2 is higher than that of NPE1, and NPE2 immediately preempts the Master state. NPE2 then sends
a VRRP Advertisement packet carrying a higher priority than that of NPE1. After receiving the packet,
the mVRRP group on NPE1 stops sending VRRP Advertisement packets and enters the Backup state.
The status of the service VRRP group is the same as that of the mVRRP group on NPE2. User traffic
switches to the path UPE -> PE1 -> PE2 -> NPE2.
3. After the fault is rectified, interface 1 goes Up and NPE1 increases its VRRP priority to 120 (80 + 40).
NPE1 immediately preempts the Master state and sends VRRP Advertisement packets to NPE2. User
traffic switches back to the path UPE -> PE1 -> NPE1.
If rapid VRRP switchback is not configured and NPE1 restores its priority to 120, NPE1 has to wait until it receives
VRRP Advertisement packets carrying a lower priority than its own priority from NPE2 before preempting the
Master state.
4. NPE1 then sends VRRP Advertisement packets carrying a higher priority than NPE2's priority. After
receiving the VRRP Advertisement packets, NPE2 enters the Backup state. Both NPE1 and NPE2 restore
their previous status.
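The difference highlighted in the note above can be sketched as a single preemption decision. This is a hypothetical simplification: with rapid switchback, NPE1 preempts as soon as its priority is restored; without it, NPE1 must first receive an Advertisement from NPE2 carrying a lower priority.

```python
def npe1_preempts(priority_restored: bool, rapid_switchback: bool,
                  received_lower_priority_adv: bool) -> bool:
    """Whether NPE1 (the original master) preempts the Master state."""
    if not priority_restored:              # interface 1 still Down: priority is 80
        return False
    if rapid_switchback:
        return True                        # preempt immediately, no negotiation
    return received_lower_priority_adv     # otherwise wait for NPE2's packet
```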
Usage Scenario
Rapid VRRP switchback applies to a specific network with all of the following characteristics:
• The master device in an mVRRP group tracks a VRRP-disabled interface or feature and reduces its VRRP
priority if the interface or feature status becomes Down.
• Devices in a VRRP group are connected to user-side devices over the active and standby links.
Benefits
Rapid VRRP switchback speeds up a VRRP switchback after a fault is rectified.
Context
Common VRRP is multicast VRRP, in which VRRP Advertisement packets can only be multicast. Multicast
VRRP Advertisement packets, however, can be forwarded within only one broadcast domain (for example,
one VLAN or VSI). Therefore, common VRRP groups apply only to Layer 2 networks. In some special
networking scenarios, however, network devices need to be deployed on a Layer 3 network and work in
master/backup mode. This limitation means that common VRRP does not apply to devices on a Layer 3
network that need to negotiate their master/backup status using VRRP.
To address this issue, Huawei develops unicast VRRP based on VRRPv2, which allows VRRP Advertisement
packets to pass through a Layer 3 network. After a unicast VRRP group is configured on two devices on a
Layer 3 network, the master device in this group sends unicast VRRP Advertisement packets to the backup
device through the Layer 3 network, implementing master/backup status negotiation between the two
devices.
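The difference between multicast and unicast Advertisement delivery can be sketched as follows. This is a hypothetical illustration: the field layout of the Advertisement follows the standard VRRPv2 format (RFC 3768), on which the text says unicast VRRP is based, and the peer address is an assumption; it does not model any Huawei-specific extensions.

```python
import socket
import struct

VRRP_MULTICAST = "224.0.0.18"   # destination used by common (multicast) VRRP

def checksum(data: bytes) -> int:
    """Standard Internet checksum over the VRRP message."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_advertisement(vrid: int, priority: int, virtual_ip: str,
                        adver_int: int = 1) -> bytes:
    """Build a VRRPv2 Advertisement: version 2, type 1, one virtual IP,
    no authentication (8 zero bytes of auth data)."""
    header = struct.pack("!BBBBBBH", (2 << 4) | 1, vrid, priority,
                         1, 0, adver_int, 0)          # checksum placeholder
    body = socket.inet_aton(virtual_ip) + b"\x00" * 8
    pkt = header + body
    return pkt[:6] + struct.pack("!H", checksum(pkt)) + pkt[8:]

def destination(unicast_vrrp: bool, peer_ip: str) -> str:
    """Unicast VRRP addresses the peer directly; common VRRP multicasts."""
    return peer_ip if unicast_vrrp else VRRP_MULTICAST
```

The packet itself is unchanged; only the IP destination differs, which is what lets unicast VRRP Advertisements be routed across a Layer 3 network.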
Implementation
The implementation of unicast VRRP is similar to that of common VRRP.
In addition to the master/backup status negotiation between devices, unicast VRRP provides the following
extended functions:
• Security authentication: MD5 and HMAC-SHA256 authentication can be configured for unicast VRRP
groups. To improve network security, configuring HMAC-SHA256 authentication is recommended.
• Delayed preemption: prevents the master/backup status of devices in a unicast VRRP group from
changing frequently, thereby ensuring network stability.
• Association with an uplink interface or BFD: When the master device or the master link fails, a
master/backup unicast VRRP switchover is triggered to ensure network reliability.
• Association with an interface monitoring group: When the link failure rate on the access or network side
reaches a specified threshold, the unicast VRRP group performs a master/backup switchover to ensure
network reliability.
As an extension to association between VRRP and a VRRP-disabled interface, association between a unicast VRRP
group and an interface monitoring group reduces configuration workload and implements uplink and downlink
monitoring.
Application Scenarios
Unicast VRRP applies when two devices on a Layer 3 network need to use VRRP to negotiate their
master/backup status.
Unlike common VRRP, unicast VRRP does not provide redundancy protection for gateways that use virtual IP addresses
and does not periodically send gratuitous ARP packets.
Benefits
Unicast VRRP allows two devices on a Layer 3 network to use VRRP to negotiate their master/backup status.
Unicast VRRP can be associated with a VRRP-disabled interface or BFD. If the master device in a unicast
VRRP group fails, the backup device rapidly detects the fault and becomes the new master device.
Service Overview
NodeBs and radio network controllers (RNCs) on an IP radio access network (IPRAN) do not have dynamic
routing capabilities. Static routes must be configured to allow NodeBs to communicate with access
aggregation gateways (AGGs) and allow RNCs to communicate with radio service gateways (RSGs) at the
aggregation level. To ensure that various value-added services, such as voice, video, and cloud computing,
are not interrupted on mobile bearer networks, a VRRP group can be deployed to implement gateway
redundancy. When the master device in a VRRP group goes Down, a backup device takes over, ensuring
normal service transmission and enhancing device reliability at the aggregation layer.
Networking Description
Figure 1 shows the network for the IPRAN gateway protection solution. A NodeB is connected to AGGs over
an access ring or is dual-homed to two AGGs. The cell site gateways (CSGs) and AGGs are connected using
the Pseudowire Emulation Edge-to-Edge (PWE3) technology, which ensures connection reliability. Two VRRP
groups can be configured on the AGGs and RSGs to implement gateway backup for the NodeB and RNC,
respectively.
Feature Deployment
Table 1 describes VRRP-based gateway protection applications on an IPRAN.
Deploy VRRP groups on AGGs to implement gateway backup for the NodeB:
• Associate an mVRRP group with a service VRRP group. To meet various service demands, different VRRP groups can be configured on AGGs to provide gateway functions for different user groups. Each VRRP group maintains its own state machine, leading to transmission of multiple VRRP packets on the AGGs. These packets use a significant amount of bandwidth when traversing the access network. To simplify VRRP operations and reduce bandwidth consumption, an mVRRP group can be associated with service VRRP groups on AGGs. During this process, service VRRP groups function as gateways for the NodeB and are associated with the mVRRP group. The mVRRP group processes VRRP Advertisement packets and determines the master/backup status of the associated service VRRP group.
• Associate an mVRRP group with a BFD session. By default, when a VRRP group detects that the master device goes Down, the backup device attempts to preempt the Master state after 3 seconds (three times the interval at which VRRP Advertisement packets are broadcast). During this period, no master device forwards user traffic, which leads to traffic forwarding interruptions. BFD can detect link faults in milliseconds. After an mVRRP group is associated with a BFD session and BFD detects a fault, a master/backup VRRP switchover is implemented, preventing user traffic loss. When the master device goes Down, the BFD module instructs the backup device in the mVRRP group to preempt the Master state and take over traffic. The status of the service VRRP group associated with the mVRRP group changes accordingly. This implementation reduces service interruptions.
• Associate direct network segment routes with a service VRRP group. During traffic transmission between the NodeB and RNC, user-to-network and network-to-user traffic may travel through different paths, causing network operation, maintenance, and management difficulties. For example, the NodeB sends traffic destined for the RNC through the master AGG, whereas the RNC sends traffic destined for the NodeB through the backup AGG. This increases traffic monitoring costs. Association between direct network segment routes and a service VRRP group can be deployed to ensure that user-to-network and network-to-user traffic travels through the same path.
Deploy VRRP groups on RSGs to implement gateway backup for the RNC:
• Deploy basic VRRP functions. RSGs provide gateway functions for the RNC. Basic VRRP functions can be configured on the RSGs to implement gateway backup. In normal circumstances, the master device forwards user traffic. When the master device goes Down, the backup device takes over.
• Associate a VRRP group with a BFD session. A VRRP group can be associated with a BFD session to implement a rapid master/backup VRRP switchover when BFD detects a fault. When the master device goes Down, the BFD module instructs the backup device in the VRRP group to preempt the Master state and take over traffic. This implementation reduces service interruptions.
• Associate direct network segment routes with a VRRP group. Direct network segment routes can be associated with a VRRP group to ensure the same path for both user-to-network and network-to-user traffic between the NodeB and RNC.
path for network-to-user traffic is RNC -> RSG1 -> P -> AGG1 -> CSG.
When AGG1 goes Down, a primary/secondary PW switchover is performed. Traffic sent from the NodeB goes
through the CSGs to AGG2 through the new primary PW. AGG2 forwards the traffic to RSG1 through the P
device and RSG2. Then, RSG1 sends the traffic to the RNC. The path for user-to-network traffic is CSG ->
AGG2 -> P -> RSG2 -> RSG1 -> RNC, and the path for network-to-user traffic is RNC -> RSG1 -> RSG2 -> P -> AGG2 -> CSG.
When AGG1 recovers, it becomes the master device after a specified preemption delay elapses. AGG2 then
becomes the backup device. Traffic sent from the NodeB goes through the CSGs to AGG1 over the previous
primary PW. AGG1 sends the traffic to RSG1 through the P device. RSG1 then sends the traffic to the RNC.
The path for user-to-network traffic is CSG -> AGG1 -> P -> RSG1 -> RNC, and the path for network-to-user
traffic is RNC -> RSG1 -> P -> AGG1 -> CSG.
PW: pseudo wire
Definition
Ethernet operation, administration, and maintenance (OAM) is used for Ethernet networks. Easy-to-use Ethernet techniques support good bandwidth expansibility on low-cost hardware. With these advantages, Ethernet services and structures are the first choice for many enterprise networks, metropolitan area networks (MANs), and wide area networks (WANs). The increasing popularity of Ethernet applications encourages carriers to use improved Ethernet OAM functions to maintain and operate Ethernet networks.
OAM mechanisms for server-layer services such as synchronous digital hierarchy (SDH) and for client-layer services such as IP cannot be used on Ethernet networks. Ethernet OAM differs from client- and server-layer OAM and has been developed to support the following functions, which help carriers provide services based on service level agreements (SLAs):
• Fault management
■ Ethernet OAM sends detection packets on demand or periodically to monitor network connectivity.
■ Ethernet OAM uses methods similar to Packet Internet Groper (PING) and traceroute used on IP
networks to locate and diagnose faults on Ethernet networks.
■ Ethernet OAM is used together with a protection switching protocol to trigger a device or link switchover if a connectivity fault is detected. Switchovers help networks achieve carrier-class reliability by ensuring that network interruptions last 50 milliseconds or less.
• Performance management
Ethernet OAM measures network transmission parameters including packet loss ratio, delay, and jitter
and collects traffic statistics including the numbers of sent and received bytes and the number of frame
errors. Performance management is implemented on access devices. Carriers use this function to
monitor network operation and dynamically adjust parameters in real time based on statistical data.
This process reduces maintenance costs.
Link-level Ethernet OAM monitors physical Ethernet links directly connecting carrier networks to user networks. For example, Institute of Electrical and Electronics Engineers (IEEE) 802.3ah, also known as Ethernet in the First Mile (EFM), supports Ethernet OAM for last-mile links and also monitors direct physical Ethernet links. EFM supports link continuity check, fault detection, fault advertisement, and loopback for P2P Ethernet link maintenance. Unlike CFM, which is used for a specific type of service, EFM is used on links transmitting various services. EFM is used on links between customer edges (CEs) and user-end provider edges (UPEs) on the metropolitan area network (MAN) shown in Figure 1. It helps maintain the reliability and stability of connections between a user network and a provider network. EFM monitors and detects faults in P2P Ethernet physical links or simulated links.
Network-level Ethernet OAM checks network connectivity, pinpoints connectivity faults, and monitors E2E network performance at the access and aggregation layers. Examples include IEEE 802.1ag (CFM) and Y.1731.
IEEE 802.1ag, also known as connectivity fault management (CFM), defines OAM functions, such as continuity check (CC), loopback (LB), and linktrace (LT), for Ethernet bearer networks. CFM applies to large-scale E2E Ethernet networks. CFM is used at the access and aggregation layers of the MAN shown in Figure 1. For example, CFM monitors the link between a user-end provider edge (UPE) and a PE. It monitors network-wide connectivity and detects connectivity faults. CFM is used together with protection switchover mechanisms to maintain network reliability.
Y.1731 is an OAM protocol defined by the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T). It covers the items defined in IEEE 802.1ag and provides additional OAM messages for fault management and performance monitoring. Fault management includes alarm indication signal (AIS), remote defect indication (RDI), locked signal (LCK), test signal, maintenance communication channel (MCC), experimental (EXP) OAM, and vendor specific (VSP) OAM. Performance monitoring includes frame loss measurement (LM) and delay measurement (DM). Y.1731 is a CFM enhancement that applies to access and aggregation networks. In addition to the fault management functions that CFM supports, Y.1731 supports performance monitoring functions such as LM and DM.
Benefits
P2P EFM, E2E CFM, E2E Y.1731, and their combinations are used to provide a complete Ethernet OAM
solution, which brings the following benefits:
• Ethernet is deployed near user premises using remote terminals and roadside cabinets at remote central offices or in unattended areas. Ethernet OAM allows engineers to run detection, diagnosis, and monitoring protocols and techniques from remote locations to maintain Ethernet networks. Remote maintenance eliminates the need for onsite visits and helps reduce maintenance and operation expenditures.
• Ethernet OAM supports various performance monitoring tools that are used to monitor network
operation and assess service quality based on SLAs. If a device using the tools detects faults, the device
sends traps to a network management system (NMS). Carriers use statistics and trap information on
NMSs to adjust services. The tools help ensure proper transmission of voice and data services.
OAMPDUs
EFM works at the data link layer and uses protocol packets called OAM protocol data units (OAMPDUs).
EFM devices periodically exchange OAMPDUs to report link status, helping network administrators
effectively manage networks. Figure 1 shows the format and common types of OAMPDUs. Table 1 lists and
describes fields in an OAMPDU.
Field Description
Dest addr Destination MAC address, which is a slow-protocol multicast address. Network
bridges cannot forward slow-protocol packets. EFM OAMPDUs cannot be
forwarded over multiple devices, even if OAM is supported or enabled on the
devices.
Source addr Source address, which is a unicast MAC address of a port on the transmit end. If no
port MAC address is specified on the transmit end, the bridge MAC address of the
transmit end is used.
Subtype Subtype of a slow protocol. The value is 0x03, which means that the slow sub-
protocol is EFM.
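The header fields above can be illustrated with a short sketch. This is not NE40E code, only an illustration built on the standard slow-protocol constants (destination MAC 01-80-C2-00-00-02, Slow Protocols EtherType 0x8809, subtype 0x03 for EFM OAM); the helper names are hypothetical.

```python
import struct

SLOW_PROTOCOL_MAC = bytes.fromhex("0180c2000002")  # slow-protocol multicast address
SLOW_PROTOCOL_ETHERTYPE = 0x8809                   # Slow Protocols EtherType
EFM_SUBTYPE = 0x03                                 # slow-protocol subtype for EFM OAM

OAMPDU_CODES = {0x00: "Information", 0x01: "Event Notification", 0x04: "Loopback Control"}

def build_oampdu(src_mac: bytes, code: int, flags: int = 0, payload: bytes = b"") -> bytes:
    """Assemble a minimal EFM OAMPDU frame (without the FCS)."""
    header = SLOW_PROTOCOL_MAC + src_mac + struct.pack("!H", SLOW_PROTOCOL_ETHERTYPE)
    return header + struct.pack("!BHB", EFM_SUBTYPE, flags, code) + payload

def parse_oampdu(frame: bytes):
    """Return (source MAC, OAMPDU type name, flags) for an EFM OAMPDU, else None."""
    dst, src = frame[0:6], frame[6:12]
    ethertype, = struct.unpack("!H", frame[12:14])
    if dst != SLOW_PROTOCOL_MAC or ethertype != SLOW_PROTOCOL_ETHERTYPE:
        return None
    subtype, flags, code = struct.unpack("!BHB", frame[14:18])
    if subtype != EFM_SUBTYPE:
        return None
    return src, OAMPDU_CODES.get(code, "Unknown"), flags

frame = build_oampdu(bytes.fromhex("00e0fc000001"), code=0x00)
print(parse_oampdu(frame))
```

Because the destination is a slow-protocol multicast address, a parser like this would only ever see OAMPDUs arriving from the directly connected peer, matching the single-hop behavior described in the table.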
Type Description
Information OAMPDU Used for peer discovery. Two OAM entities in the handshake phase send OAMPDUs at a specified interval to monitor link connectivity. Information OAMPDUs are also used for fault notification: upon receipt of an Information OAMPDU carrying Link Fault or Critical Event in the Flags field, the local end sends an alarm to notify the NMS of the remote device fault.
Event Notification OAMPDU Used to monitor links. If an errored frame event, errored symbol period event, or errored frame seconds summary event occurs on an interface, the interface sends an Event Notification OAMPDU to notify the remote interface of the event.
Loopback Control OAMPDU Used to enable or disable the remote loopback function.
Connection Modes
EFM supports two connection modes: active and passive. Table 3 describes capabilities of processing
OAMPDUs in the two modes.
• An EFM connection can be initiated only by an OAM entity working in active mode. An OAM entity working in
passive mode waits to receive a connection request from its peer entity. Two OAM entities that both work in
passive mode cannot establish an EFM connection between them.
• An OAM entity that is to initiate a loopback request must work in active mode.
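The two rules above can be summarized in a tiny sketch (hypothetical helper functions for illustration, not device code):

```python
def can_establish_efm_connection(local_mode: str, peer_mode: str) -> bool:
    """An EFM connection forms only if at least one entity works in active mode."""
    return "active" in (local_mode, peer_mode)

def can_initiate_loopback(mode: str) -> bool:
    """Only an OAM entity working in active mode may initiate a loopback request."""
    return mode == "active"

# Two passive entities cannot establish a connection between them.
print(can_establish_efm_connection("passive", "passive"))
```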
5.7.2.2 Background
As telecommunication technologies develop rapidly and the demand for service diversity increases, various user-oriented telecom services are being provided over digital and intelligent media through broadband paths. Backbone network technologies, such as synchronous digital hierarchy (SDH), Asynchronous Transfer Mode (ATM), passive optical network (PON), and dense wavelength division multiplexing (DWDM), have grown mature and popular. These technologies allow voice, data, and video services to be transmitted over a single path to every home. Telecommunication experts and carriers focus on using existing network resources to support new types of services and improve service quality. The key point is to provide a solution for the last-mile link to a user network.
A "last mile" reliability solution also needs to be provided. High-end clients, such as banks and financial
companies, demand high reliability. They expect carriers to monitor both carrier networks and last-mile links
that connect users to those carrier networks. EFM can be used to satisfy these demands.
On the network shown in Figure 1, EFM is an OAM mechanism that applies to the last-mile Ethernet access
links to users. Carriers use EFM to monitor link status in real time, rapidly locate failed links, and identify
fault types if faults occur. OAM entities exchange various OAMPDUs to monitor link connectivity and locate
link faults.
OAM Discovery
During the discovery phase, a local EFM entity discovers and establishes a stable EFM connection with a
remote EFM entity. Figure 2 shows the discovery process.
EFM entities at both ends of an EFM connection periodically exchange Information OAMPDUs to monitor
link connectivity. The interval at which Information OAMPDUs are sent is also known as an interval between
handshakes. If an EFM entity does not receive Information OAMPDUs from the remote EFM entity within the
connection timeout period, the EFM entity considers the connection interrupted and sends a trap to the
network management system (NMS). Establishing an EFM connection is a way to monitor physical link
connectivity automatically.
Link Monitoring
Monitoring Ethernet links is difficult if network performance deteriorates while traffic is being transmitted
over physical links. To resolve this issue, the EFM link monitoring function can be used. This function can
detect data link layer faults in various environments. EFM entities that are enabled with link monitoring
exchange Event Notification OAMPDUs to monitor links.
If an EFM entity detects a link event listed in Table 1, it sends an Event Notification OAMPDU to notify the remote EFM entity of the event and also sends a trap to an NMS. After receiving the trap on the NMS, an administrator can determine the network status and take remedial measures as needed.
Errored symbol period event: If the number of symbol errors that occur on a device's interface during a specified period of time reaches a specified upper limit, the device generates an errored symbol period event, advertises the event to the remote device, and sends a trap to the NMS. This event helps the device detect code errors during data transmission at the physical layer.
Errored frame event: If the number of frame errors that occur on a device's interface during a specified period of time reaches a specified upper limit, the device generates an errored frame event, advertises the event to the remote device, and sends a trap to the NMS. This event helps the device detect frame errors that occur during data transmission at the MAC sublayer.
Errored frame seconds summary event: An errored frame second is a one-second interval in which at least one frame error is detected. If the number of errored frame seconds that occur during a specified period of time reaches a specified upper limit on a device's interface, the device generates an errored frame seconds summary event, advertises the event to the remote device, and sends a trap to the NMS. This event helps the device detect errored frame seconds that occur during data transmission at the MAC sublayer.
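The three link events share one pattern: count errors over a monitoring window and generate an event when a threshold is crossed. A minimal sketch of that pattern follows; the window length and threshold are hypothetical values for illustration, not NE40E defaults.

```python
def detect_link_events(error_counts, window, threshold, event_name):
    """Scan per-second error counts in consecutive windows of `window` seconds;
    record event_name for every window whose total reaches the threshold
    (in a real device this is where the event is advertised and a trap sent)."""
    events = []
    for start in range(0, len(error_counts) - window + 1, window):
        if sum(error_counts[start:start + window]) >= threshold:
            events.append((event_name, start))
    return events

# Per-second frame-error counts over 10 seconds; 5-second window, threshold 8.
counts = [0, 1, 0, 0, 2, 4, 3, 2, 0, 1]
print(detect_link_events(counts, window=5, threshold=8, event_name="errored frame event"))
```

The first window (3 errors) stays below the threshold, while the second window (10 errors) crosses it and produces an event, mirroring the generate-advertise-trap sequence described above.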
Fault Notification
After the OAM discovery phase finishes, two EFM entities at both ends of an EFM connection exchange
Information OAMPDUs to monitor link connectivity. If traffic is interrupted due to a remote device failure,
the remote EFM entity sends an Information OAMPDU carrying an event listed in Table 2 to the local EFM
entity. After receiving the notification, the local EFM entity sends a trap to the NMS. An administrator can
view the trap on the NMS to determine link status and take measures to rectify the fault.
Link fault If a loss of signal (LoS) error occurs because no OAMPDUs are received within the transmission interval or because a physical link fails, the local device sends a trap to the NMS.
Critical event If an unidentified critical event occurs because a fault is detected using
association between the remote EFM entity and a specific feature, the local
device sends a trap to the NMS. Remote EFM entities can be associated with
protocols, including Bidirectional Forwarding Detection (BFD), connectivity
fault management (CFM), and Multiprotocol Label Switching (MPLS) OAM.
Remote Loopback
Figure 3 demonstrates the principles of remote loopback. When a local interface sends non-OAMPDUs to a
remote interface, the remote interface loops the non-OAMPDUs back to the local interface, not to the
destination addresses of the non-OAMPDUs. This process is called remote loopback. An EFM connection
must be established to implement remote loopback.
A device enabled with remote loopback discards all data frames except OAMPDUs, causing a service
interruption. To prevent impact on services, use remote loopback to check link connectivity and quality
before a new network is used or after a link fault is rectified.
The local device calculates communication quality parameters such as the packet loss ratio on the current
link based on the numbers of sent and received packets. Figure 4 shows the remote loopback process.
If the local device attempts to stop remote loopback, it sends a message to instruct the remote device to
disable remote loopback. After receiving the message, the remote device disables remote loopback.
If remote loopback is left enabled, the remote device keeps looping back service data, causing a service
interruption. To prevent this issue, a capability can be configured to disable remote loopback automatically
after a specified timeout period. After the timeout period expires, the local device automatically sends a
message to instruct the remote device to disable remote loopback.
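The quality check described above reduces to comparing the number of frames sent into the loop with the number received back. A minimal sketch (hypothetical counters, not the device's statistics interface):

```python
def loopback_loss_ratio(frames_sent: int, frames_returned: int) -> float:
    """Packet loss ratio on the looped link: lost frames divided by sent frames."""
    if frames_sent == 0:
        raise ValueError("no test frames sent")
    return (frames_sent - frames_returned) / frames_sent

# 1000 test frames sent into the remote loop, 990 received back.
print(loopback_loss_ratio(1000, 990))
```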
shown in Figure 1, when EFM detects a fault in the link between CE1 and CE4, the association between EFM and interfaces can be used to trigger an active/standby link switchover, improving transmission quality and reliability.
Maintenance Domain
MDs are discrete areas within which connectivity fault detection is enabled. The boundary of an MD is
determined by MEPs configured on interfaces. An MD is identified by an MD name.
To help locate faults, MDs are divided into levels 0 through 7. A larger value indicates a higher level, and a higher level MD covers a larger area. One MD can be tangential to another MD. Tangential MDs share a single device, and this device has one interface in each of the MDs. A lower level MD can be nested in a higher level MD. The nested MD must be fully contained in the other MD; the two MDs cannot partially overlap. A higher level MD cannot be nested in a lower level MD.
Classifying MDs based on levels facilitates fault diagnosis. MD2 is nested in MD1 on the network shown in
Figure 1. If a fault occurs in MD1, PE2 through PE6 and all the links between the PEs are checked. If no fault
is detected in MD2, PE2, PE3, and PE4 are working properly. This means that the fault is on PE5, PE6, or PE7
or on a link between these PEs.
In actual network scenarios, the connectivity of a higher level MD can be monitored across a lower level MD nested within it. Level settings allow 802.1ag packets to transparently travel through a nested MD. For example, on the network shown in Figure 1, MD2 with the level set to 3 is nested in MD1 with the level set to 6. 802.1ag packets must transparently pass through MD2 to monitor the connectivity of MD1. The level setting allows 802.1ag packets that monitor MD1 connectivity to pass through MD2 but prevents 802.1ag packets that monitor MD2 connectivity from entering MD1. Setting levels for MDs helps locate faults.
Figure 1 MDs
802.1ag packets are exchanged and CFM functions are implemented based on MDs. Properly planned MDs
help a network administrator locate faults.
Default MD
A single default MD with the highest priority can be configured for each device according to IEEE Std 802.1ag-2007.
On the network shown in Figure 2, if default MDs with the same level as the higher level MDs are
configured on devices in lower level MDs, MIPs are generated based on the default MDs to reply to requests
sent by devices in higher level MDs. CFM detects topology changes and monitors the connectivity of both
higher and lower level MDs.
The default MD must have a higher level than all MDs to which MEPs configured on the local device belong.
The default MD must also be of the same level as a higher level MD. The default MD transmits high level
request messages and generates MIPs to send responses.
Standard 802.1ag-2007 states that one default MD can be configured on each device and associated with
multiple virtual local area networks (VLANs). VLAN interfaces can automatically generate MIPs based on the
default MDs and a creation rule.
Maintenance Association
Multiple MAs can be configured in an MD as needed. Each MA contains MEPs. An MA is uniquely identified
by an MD name and an MA name.
An MA serves a specific service such as VLAN. A MEP in an MA sends packets carrying tags of the specific
service and receives packets sent by other MEPs in the MA.
• Outward-facing MEP: sends packets out of the interface on which the MEP is configured.
Manual configuration Only IEEE Std 802.1ag-2007 supports manual MIP configuration. The MIP level
must be set. Manually configured MIPs are preferable to automatically generated
MIPs. Although configuring MIPs manually is easy, managing many manually
configured MIPs is difficult and errors may occur.
Automatic creation A device automatically generates MIPs based on configured creation rules.
Configuring creation rules is complex, but properly configured rules ensure correct
MIP settings.
The following part describes automatic MIP creation principles.
Whether a MIP is created depends on the 802.1ag version, whether manually configured MIPs exist on an interface, the configured creation rule, and whether MEPs are configured for low-level MDs. If the creation rule is none, no MIP is created.
2. Query all interfaces in the service instance and check whether MEPs are configured on these interfaces.
3. Query levels of all MEPs and locate the MEP with the highest level.
MIPs are separately calculated in each service instance such as a VLAN. In a single service instance, MAs in
MDs with different levels have the same VLAN ID but different levels.
For each service instance of each interface, the device attempts to calculate a MIP from the lowest level MEP
based on the rules listed in Table 1 and the following conditions:
• Each MD on a single interface has a specific level and is associated with multiple creation rules. The
creation rule with the highest priority applies. An explicit rule has a higher priority than a default rule,
and a default rule takes precedence over a none rule.
• The level of a MIP must be higher than that of any MEP on the same interface.
• An explicit rule applies to an interface only when MEPs are configured on the interface.
• A single MIP can be generated on a single interface. If multiple rules for generating MIPs with different
levels can be used, a MIP with the lowest level is generated.
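The conditions above can be sketched as a small selection routine. This is an illustrative model only; the rule names mirror the explicit/default/none rules in the text, but the data shapes and function names are hypothetical, not the device's implementation.

```python
RULE_PRIORITY = {"explicit": 2, "default": 1, "none": 0}

def effective_rule(rules):
    """Among the creation rules bound to one MD, the highest-priority rule applies:
    an explicit rule beats a default rule, which beats a none rule."""
    return max(rules, key=RULE_PRIORITY.__getitem__)

def calculate_mip(mds, mep_levels):
    """mds: list of (md_level, [rules]) for one service instance on one interface.
    mep_levels: levels of MEPs configured on that interface.
    Returns the level of the single MIP to create, or None."""
    candidates = []
    for md_level, rules in mds:
        rule = effective_rule(rules)
        if rule == "none":
            continue
        if rule == "explicit" and not mep_levels:
            continue  # an explicit rule applies only when MEPs exist on the interface
        if mep_levels and md_level <= max(mep_levels):
            continue  # the MIP level must exceed every MEP level on the interface
        candidates.append(md_level)
    # A single MIP is generated; if several levels qualify, the lowest is used.
    return min(candidates) if candidates else None

# Two default-rule MDs at levels 6 and 7; a MEP at level 5 exists on the interface.
print(calculate_mip([(6, ["default"]), (7, ["default"])], mep_levels=[5]))
```

With a MEP at level 5, both level-6 and level-7 MDs qualify, and the lowest qualifying level (6) wins, which matches the worked example that follows.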
The following example illustrates how to create a MIP based on a default rule defined in IEEE Std 802.1ag-
2007.
On the network shown in Figure 5, MD1 through MD5 are nested in MD7, and MD2 through MD5 are
nested in MD1. MD7 has a higher level than MD1 through MD5, and MD1 has a higher level than MD2
through MD5. Multiple MEPs are configured on Device A in MD1, and the MEPs belong to MDs with
different levels.
A default rule is configured on Device A to create a MIP in MD1. The procedure for creating the MIP is as
follows:
1. Device A compares MEP levels and finds the MEP at level 5, the highest level. The MEP level is
determined by the level of the MD to which the MEP belongs.
2. Device A selects the MD at level 6, which is higher than the MEP level of 5, and creates the MIP in that MD.
Hierarchical MP Maintenance
MEPs and MIPs are maintenance points (MPs). MPs are configured on interfaces and belong to specific MAs
shown in Figure 6.
Figure 6 MPs
The scope of maintenance performed and the types of maintenance services depend on the needs of the organizations that use carrier-class Ethernet services. These organizations include leased line users, service providers, and network carriers. Users purchase Ethernet services from service providers, and service providers use their own networks or carrier networks to provide E2E Ethernet services. Carriers provide transport services.
Figure 7 shows locations of MEPs and MIPs and maintenance domains for users, service providers, and
carriers.
Operator 1, Operator 2, Service provider, and Customer use MDs with levels 3, 4, 5, and 6, respectively. A
higher MD level indicates a larger MD.
Field Description
MD Level Level of an MD. The value ranges from 0 to 7. A larger value indicates a
higher level.
OpCode Message code value, specifying a specific type of CFM packet. Table 4
describes the types of CFM packets.
0x01 Continuity check message (CCM) Used for monitoring E2E link connectivity.
0x02 Loopback reply (LBR) message Reply to a Loopback message (LBM). LBRs are sent by
local nodes enabled with loopback.
0x03 Loopback message (LBM) Sent by an interface that initiates loopback detection.
0x04 Linktrace reply (LTR) message Reply to a Linktrace message (LTM). LTRs are sent by
local nodes enabled with linktrace.
5.7.3.2 Background
IP-layer mechanisms, such as Simple Network Management Protocol (SNMP), IP ping, and IP traceroute, are
used to manage network-wide services, detect faults, and monitor performance on traditional Ethernet
networks. These mechanisms are unsuitable for client-layer E2E Ethernet operation and management.
CFM supports service management, fault detection, and performance monitoring on the E2E Ethernet
network. In Figure 1:
• A network is logically divided into maintenance domains (MDs). For example, network devices that a
single Internet service provider (ISP) manages are in a single MD to distinguish between ISP and user
networks.
• Two maintenance association end points (MEPs) are configured on both ends of a management
network segment to be maintained to determine the boundary of an MD.
• Maintenance association intermediate points (MIPs) can be configured as needed. A MEP initiates a test
request, and the remote MEP (RMEP) or MIP responds to the request. This process provides information
about the management network segment to help detect faults.
CFM supports level-specific MD management. An MD at a given level can manage MDs at lower levels but
cannot manage an MD at a higher level than its own. Level-specific MD management is used to maintain a
service flow based on level-specific MDs and different types of service flows in an MD.
Continuity Check
CC monitors the connectivity of links between MEPs. A MEP periodically sends multicast continuity check
messages (CCMs) to an RMEP in the same MA. If an RMEP does not receive a CCM within a period 3.5 times
the interval at which CCMs are sent, the RMEP considers the path between itself and the MEP faulty.
Figure 1 CC
1. CCM generation
A MEP generates and sends CCMs. MEP1, MEP2, and MEP3 are in the same MA on the network shown
in Figure 1 and are enabled to send CCMs to one another at the same interval.
Each CCM carries a level equal to the MEP level.
3. Fault identification
If a MEP does not receive CCMs from its RMEP within a period 3.5 times the interval at which CCMs are sent, the MEP considers the path between itself and the RMEP faulty. A log is generated to provide information for fault diagnosis. A user can implement loopback or linktrace to locate the fault. MEPs in an MA exchange CCMs to monitor links, implementing multipoint-to-multipoint (MP2MP) detection.
4. CCM processing
If a MEP receives a CCM carrying a level higher than the local level, it forwards this CCM. If a MEP
receives a CCM carrying a level lower than the local level, it does not forward this CCM. This process
prevents a lower level CCM from being sent to a higher level MD.
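The 3.5-interval timeout rule used for fault identification can be sketched as follows (an illustrative model, not device code):

```python
def rmep_state(ccm_interval_s: float, last_ccm_age_s: float) -> str:
    """An RMEP declares the path faulty when no CCM has arrived within
    3.5 times the CCM transmission interval."""
    return "faulty" if last_ccm_age_s > 3.5 * ccm_interval_s else "ok"

# With a 1-second CCM interval, three consecutive lost CCMs (age 3 s) are still
# tolerated; once the age exceeds 3.5 s, the path is declared faulty.
print(rmep_state(1.0, 3.0), rmep_state(1.0, 4.0))
```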
Loopback
Loopback is also called 802.1ag MAC ping. Similar to IP ping, loopback monitors the connectivity of a path
between a local MEP and an RMEP.
A MEP initiates an 802.1ag MAC ping test to monitor the reachability of an RMEP or MIP destination
address. The MEP, MIP, and RMEP have the same level and they can share an MA or be in different MAs.
The MEP sends Loopback messages (LBMs) to the RMEP or MIP. After receiving the messages, the RMEP or
MIP replies with loopback replies (LBRs). Loopback helps locate a faulty node because a faulty node cannot
send an LBR in response to an LBM. LBMs and LBRs are unicast packets.
The following example illustrates the implementation of loopback on the network shown in Figure 2.
Figure 2 Loopback
CFM is configured to monitor a path between PE1 (MEP1) and PE4 (MEP2). The MD level of these MEPs is 6.
A MIP with a level of 6 is configured on PE2 and PE3. If a fault is detected in a link between PE1 and PE4,
loopback can be used to locate the fault. Figure 3 illustrates the loopback process.
MEP1 can measure the network delay based on 802.1ag MAC ping results or the frame loss ratio based on
the difference between the number of LBMs and the number of LBRs.
Linktrace
Linktrace is also called 802.1ag MAC trace. Similar to IP traceroute, linktrace identifies a path between two
MEPs.
A MEP initiates an 802.1ag MAC trace test to monitor a path to an RMEP or MIP destination address. The
MEP, MIP, and RMEP have the same level and they can share an MA or be in different MAs. A source MEP
constructs and sends a Linktrace message (LTM) to a destination MEP. After receiving this message, each
MIP forwards it and replies with a linktrace reply (LTR). Upon receipt, the destination MEP replies with an
LTR and does not forward the LTM. The source MEP obtains topology information about each hop on the
path based on the LTRs. LTMs are multicast packets and LTRs are unicast packets.
Figure 4 Linktrace
The following example illustrates the implementation of linktrace on the network shown in Figure 4.
1. MEP1 sends MEP2 an LTM carrying a time to live (TTL) value and the MAC address of the destination
MEP2.
2. After the LTM arrives at MIP1, MIP1 reduces the TTL value in the LTM by 1 and forwards the LTM if
the TTL is not zero. MIP1 then replies with an LTR to MEP1. The LTR carries forwarding information
and the TTL value carried by the LTM when MIP1 received it.
3. After the LTM reaches MIP2 and MEP2, the process described above for MIP1 is repeated for MIP2 and
MEP2. In addition, MEP2 determines that its MAC address is the destination address carried in the LTM
and therefore does not forward the LTM.
4. The LTRs from MIP1, MIP2, and MEP2 provide MEP1 with information about the forwarding path
between MEP1 and MEP2.
If a fault occurs on the path between MEP1 and MEP2, MEP2 or a MIP cannot receive the LTM or
reply with an LTR. MEP1 can locate the faulty node based on such a response failure. For example, if
the link between MEP1 and MIP2 works properly but the link between MIP2 and MEP2 fails, MEP1 can
receive LTRs from MIP1 and MIP2 but fails to receive a reply from MEP2. MEP1 then considers the
path between MIP2 and MEP2 faulty.
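The hop-by-hop behavior above can be modeled in a few lines. This is a simplified simulation for illustration; the node names come from Figure 4, while the function shape and TTL handling are assumptions, not the protocol implementation.

```python
def linktrace(path, dest, ttl=64, broken_after=None):
    """Walk LTM forwarding along `path` toward `dest` and collect the LTRs
    the initiator would receive. broken_after: index of the last reachable
    node; the LTM is lost on the link beyond it."""
    ltrs = []
    for hop, node in enumerate(path):
        if broken_after is not None and hop > broken_after:
            break                       # the LTM never reaches nodes past the failure
        ltrs.append((node, ttl - hop))  # each MP replies to the initiator with an LTR
        if node == dest or ttl - hop - 1 == 0:
            break                       # the destination MEP does not forward the LTM
    return ltrs

# Healthy path MEP1 -> MIP1 -> MIP2 -> MEP2: LTRs arrive from every hop.
print(linktrace(["MIP1", "MIP2", "MEP2"], dest="MEP2"))
# Link between MIP2 and MEP2 down: LTRs stop at MIP2, localizing the fault.
print(linktrace(["MIP1", "MIP2", "MEP2"], dest="MEP2", broken_after=1))
```

The second call reproduces the fault-localization logic in the text: because replies arrive from MIP1 and MIP2 but not MEP2, the initiator concludes that the MIP2-MEP2 segment is faulty.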
Alarm Types
If CFM detects a fault in an E2E link, it triggers an alarm and sends the alarm to the network management
system (NMS). A network administrator uses the information to troubleshoot. Table 1 describes alarms
supported by CFM.
hwDot1agCfmUnexpectedMEGLevelCleared
During an interval equal to 3.5 times the CCM transmission
period, a MEP does not receive CCM frames with an incorrect
MEG level.
hwDot1agCfmMismerge A MEP receives a CCM frame with a correct MEG level but an
incorrect MEG ID.
hwDot1agCfmUnexpectedMEP A MEP receives a CCM frame with a correct MEG level and MEG
ID but an unexpected MEP ID.
hwDot1agCfmUnexpectedPeriod A MEP receives a CCM frame with a correct MEG level, MEG ID,
and MEP ID but a Period field value different than its own CCM
transmission period.
hwDot1agCfmExceptionalMACStatus The interface connecting the RMEP to the MEP does not work
properly based on Status type-length-value (TLV) information
carried in a CCM sent by an RMEP.
hwDot1agCfmExceptionalMACStatusCleared
The interface connecting the RMEP to the MEP is restored based
on Status TLV information carried in a CCM sent by an RMEP.
hwDot1agCfmRDI A MEP receives a CCM frame with the RDI field set.
hwDot1agCfmRDICleared A MEP receives a CCM frame with the RDI field cleared.
Alarm Anti-jitter
Multiple alarms and clear alarms may be generated on an unstable network enabled with CC. These alarms
consume system resources and deteriorate system performance. An RMEP activation time can be set to
prevent false alarms, and an alarm anti-jitter time can be set to limit the number of alarms generated.
Function Setting Description
RMEP activation time Prevents false alarms. A local MEP with the ability to receive CCMs can accept CCMs only after the RMEP activation time elapses.
Alarm Suppression
If different types of faults trigger more than one alarm, CFM alarm suppression allows the alarm with the
highest level to be sent to the NMS. If alarms persist after the alarm with the highest level is cleared, the
alarm with the second highest level is sent to the NMS. The process repeats until all alarms are cleared.
• A single fault may trigger alarms with different levels. After the alarm with the highest level is cleared,
alarms with lower levels may also be cleared.
5.7.4.1 Background
EFM and CFM are used to detect link faults. Y.1731 is an enhancement of CFM and is used to monitor
service performance.
Figure 1 shows typical Y.1731 networking. Y.1731 performance monitoring tools can be used to assess the
quality of the purchased Ethernet tunnel services or help a carrier conduct regular service level agreement
(SLA) monitoring.
Function Overview
Y.1731 can manage fault information and monitor performance.
• Fault management functions include continuity check (CC), loopback (LB), and linktrace (LT). The
principles of Y.1731 fault management are the same as those of CFM fault management.
• Performance monitoring functions include single- and dual-ended frame loss measurement, one- and two-way frame delay measurement, alarm indication signal (AIS), the Ethernet test function (ETH-Test), single-ended synthetic loss measurement (SLM), the Ethernet lock signal function (ETH-LCK), and ETH-BN. These functions apply to virtual private LAN service (VPLS) networks, virtual leased line (VLL) networks, and virtual local area networks (VLANs). BGP VPLS and VLL scenarios support AIS only.
Single-ended frame loss measurement: Collects frame loss statistics to assess the quality of links between MEPs, independent of continuity check (CC).
Dual-ended frame loss measurement: Collects frame loss statistics to assess link quality on CFM CC-enabled devices.
To collect frame loss statistics, select either single- or dual-ended frame loss measurement. Dual-ended frame loss measurement provides more accurate results than single-ended frame loss measurement: the interval between dual-ended measurements varies with the CCM transmission interval, which is shorter than the interval between single-ended measurements, so dual-ended measurement allows for a short measurement interval. Single-ended frame loss measurement can be used to minimize the impact of a large number of CCMs on the network.
One-way frame delay measurement: Measures the network delay on a unidirectional link between MEPs.
Two-way frame delay measurement: Measures the network delay on a bidirectional link between MEPs.
To measure the link delay, select either one- or two-way frame delay measurement. One-way frame delay measurement can be used to measure the delay on a unidirectional link between a MEP and its RMEP; the MEP must synchronize its time with the RMEP.
AIS: Detects server-layer faults and suppresses alarms, minimizing the impact on network management systems (NMSs). AIS is used to suppress local alarms when faults must be rapidly detected.
ETH-Test: Verifies bandwidth throughput and bit errors. ETH-Test allows a carrier to verify the throughput and bit errors of a newly established link, and allows a user to verify the throughput and bit errors of a leased link.
ETH-LCK: Informs the server-layer (sub-layer) MEP of administrative locking and the interruption of traffic destined for the MEP in the inner maintenance domain (MD). The ETH-LCK function must work with the ETH-Test function.
Single-ended synthetic loss measurement (SLM): Collects frame loss statistics on point-to-multipoint or E-Trunk links to monitor link quality. Single-ended synthetic frame LM is used to collect accurate frame loss statistics on point-to-multipoint links.
ETH-LM
Ethernet frame loss measurement (ETH-LM) enables a local MEP and its RMEP to exchange ETH-LM frames
to collect frame loss statistics on E2E links. ETH-LM modes are classified as near- or far-end ETH-LM.
Near-end ETH-LM applies to an inbound interface, and far-end ETH-LM applies to an outbound interface on
a MEP. ETH-LM counts the number of errored frame seconds to determine the duration during which a link
is unavailable.
■ On-demand measurement: collects single-ended frame loss statistics once or a specified number of times for diagnosis.
A local MEP sends a loss measurement message (LMM) carrying an ETH-LM request to its RMEP. After
receiving the request, the RMEP responds with a loss measurement reply (LMR) carrying an ETH-LM
response. Figure 1 illustrates the process for single-ended frame loss measurement.
After single-ended frame loss measurement is enabled, a MEP on PE1 sends an RMEP on PE2 an ETH-LMM carrying an ETH-LM request and then receives an ETH-LMR message carrying an ETH-LM response from the RMEP on PE2. The ETH-LMM carries TxFCf, the value of the local transmit counter TxFCl at the time the MEP sent the message. After receiving the ETH-LMM, PE2 replies with an ETH-LMR message, which carries the following information:
■ RxFCf: value of the local counter RxFCl at the time of ETH-LMM reception
■ TxFCb: value of the local counter TxFCl at the time of ETH-LMR transmission
After receiving the ETH-LMR message, PE1 measures near- and far-end frame loss based on the following values:
■ The received ETH-LMR message's TxFCf, RxFCf, and TxFCb values, and the value of the local counter RxFCl at the time this ETH-LMR message was received. These values are represented as TxFCf[tc], RxFCf[tc], TxFCb[tc], and RxFCl[tc], where tc is the time when this ETH-LMR message was received.
■ The previously received ETH-LMR message's TxFCf, RxFCf, and TxFCb values, and the value of the local counter RxFCl at the time that ETH-LMR message was received. These values are represented as TxFCf[tp], RxFCf[tp], TxFCb[tp], and RxFCl[tp], where tp is the time when the previous ETH-LMR message was received.
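The counter snapshots above feed the standard loss computation. The following is a minimal sketch, assuming the single-ended loss formulas commonly given in ITU-T Y.1731 (counter wraparound handling is omitted for brevity; the dict-based helper is illustrative, not the device implementation):

```python
# Sketch of single-ended frame loss computation from two consecutive
# ETH-LMR snapshots: cur holds the [tc] values, prev holds the [tp] values.
# 32-bit counter wraparound is ignored for brevity.

def single_ended_loss(cur, prev):
    """cur/prev: dicts with TxFCf, RxFCf, TxFCb, RxFCl counter values."""
    # Far-end loss: frames the local MEP sent (TxFCf) minus frames the
    # RMEP counted as received (RxFCf) over the interval.
    far_end = abs(cur["TxFCf"] - prev["TxFCf"]) - abs(cur["RxFCf"] - prev["RxFCf"])
    # Near-end loss: frames the RMEP sent back (TxFCb) minus frames the
    # local MEP counted as received (RxFCl) over the interval.
    near_end = abs(cur["TxFCb"] - prev["TxFCb"]) - abs(cur["RxFCl"] - prev["RxFCl"])
    return far_end, near_end
```

For example, if 100 frames were sent and 98 counted at the far end, while 90 came back and 89 were counted locally, the result is a far-end loss of 2 and a near-end loss of 1.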
After dual-ended frame loss measurement is configured, each MEP periodically sends a CCM carrying a request to its RMEP. After receiving the CCM, the RMEP collects near- and far-end frame loss statistics but does not forward the message. The CCM carries the following information:
■ TxFCf: value of the local counter TxFCl at the time of CCM transmission
■ RxFCb: value of the local counter RxFCl at the time the last CCM was received
PE1 uses the received information to measure near- and far-end frame loss based on the following values:
■ The received CCM's TxFCf, RxFCb, and TxFCb values, and the value of the local counter RxFCl at the time this CCM was received. These values are represented as TxFCf[tc], RxFCb[tc], TxFCb[tc], and RxFCl[tc], where tc is the time when this CCM was received.
■ The previously received CCM's TxFCf, RxFCb, and TxFCb values, and the value of the local counter RxFCl at the time that CCM was received. These values are represented as TxFCf[tp], RxFCb[tp], TxFCb[tp], and RxFCl[tp], where tp is the time when the previous CCM was received.
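The CCM-based counters combine the same way. A minimal sketch, assuming the dual-ended loss formulas commonly given in ITU-T Y.1731 (the helper and its dict fields are illustrative; wraparound is ignored):

```python
# Sketch of dual-ended (CCM-based) frame loss computation from two
# consecutive CCM snapshots: cur = [tc] values, prev = [tp] values.

def dual_ended_loss(cur, prev):
    """cur/prev: dicts with TxFCf, RxFCb, TxFCb, RxFCl counter values."""
    # Far-end loss: our transmit count reflected back by the peer (TxFCb)
    # minus the peer's receive count (RxFCb) over the interval.
    far_end = abs(cur["TxFCb"] - prev["TxFCb"]) - abs(cur["RxFCb"] - prev["RxFCb"])
    # Near-end loss: the peer's transmit count toward us (TxFCf) minus
    # our local receive count (RxFCl) over the interval.
    near_end = abs(cur["TxFCf"] - prev["TxFCf"]) - abs(cur["RxFCl"] - prev["RxFCl"])
    return far_end, near_end
```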
ETH-DM
Delay measurement (DM) measures the delay and its variation. A MEP sends its RMEP a message carrying
ETH-DM information and receives a response message carrying ETH-DM information from its RMEP.
After receiving the 1DM message, the RMEP measures the one-way frame delay and its variation.
One-way frame delay measurement can be implemented only after the MEP synchronizes its time with its RMEP, whereas the delay variation can be measured regardless of time synchronization. That is, if a MEP synchronizes its time with its RMEP, both the one-way frame delay and its variation can be measured; if the time is not synchronized, only the one-way delay variation can be measured.
One-way frame delay measurement can be implemented in either of the following modes:
■ On-demand measurement: calculates the one-way frame delay once or a specified number of times for diagnosis.
One-way frame delay measurement is implemented on an E2E link between a local MEP and its RMEP.
The local MEP sends 1DMs to the RMEP and then receives replies from the RMEP. After one-way frame
delay measurement is configured, a MEP periodically sends 1DMs carrying TxTimeStampf (the time
when the 1DM was sent). After receiving the 1DM, the RMEP parses TxTimeStampf and compares this
value with RxTimef (the time when the DM frame was received). The RMEP calculates the one-way
frame delay based on these values using the following equation:
Frame delay = RxTimef - TxTimeStampf
The frame delay can be used to measure the delay variation.
A delay variation is an absolute difference between two delays.
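The equation and the variation rule above can be sketched in a few lines (the function names are illustrative; timestamps are assumed to be in seconds):

```python
# Sketch of one-way frame delay and delay variation as described above.
# The delay itself is meaningful only if MEP and RMEP clocks are
# synchronized; the variation is meaningful either way, because a
# constant clock offset cancels out when two delays are subtracted.

def one_way_delay(rx_timef, tx_timestampf):
    """Frame delay = RxTimef - TxTimeStampf."""
    return rx_timef - tx_timestampf

def delay_variation(delay_a, delay_b):
    """Delay variation = absolute difference between two measured delays."""
    return abs(delay_b - delay_a)
```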
802.1p priorities carried in service packets are used to prioritize services. Traffic passing through the P
device on the network shown in Figure 5 carries 802.1p priorities of 1 and 2.
One-way frame delay measurement is enabled on PE1 to send traffic with a priority of 1 to measure the
frame delay on a link between PE1 and PE2. Traffic with a priority of 2 is also sent. After receiving
traffic with priorities of 1 and 2, the P device forwards traffic with a higher priority, delaying the arrival
of traffic with a priority of 1 at PE2. As a result, the frame delay calculated on PE2 is inaccurate.
802.1p priority-based one-way frame delay measurement can be enabled to obtain accurate results.
Two-way frame delay measurement can be implemented in either of the following modes:
■ On-demand measurement: calculates the two-way frame delay once for diagnosis.
Two-way frame delay measurement is performed by a local MEP to send a delay measurement
message (DMM) to its RMEP and then receive a DMR from the RMEP. After two-way frame delay
measurement is configured, a MEP periodically sends DMMs carrying TxTimeStampf (the time when the
DMM was sent). After receiving the DMM, the RMEP replies with a DMR message. This message carries
RxTimeStampf (the time when the DMM was received) and TxTimeStampb (the time when the DMR
was sent). The value in every field of the DMM is copied to the DMR, except that the source and destination MAC addresses are interchanged. Upon receipt of the DMR message, the MEP calculates the two-way frame delay.
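The computation from these timestamps follows the usual Y.1731 DMM/DMR formula. A minimal sketch, assuming RxTimeb denotes the local time at which the DMR arrives back at the MEP (this name is not defined in the text above):

```python
# Sketch of two-way frame delay from a DMM/DMR exchange:
# Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)

def two_way_delay(tx_timestampf, rx_timestampf, tx_timestampb, rx_timeb):
    # Total round-trip time minus the RMEP's internal processing time
    # (the gap between DMM reception and DMR transmission at the RMEP).
    return (rx_timeb - tx_timestampf) - (tx_timestampb - rx_timestampf)
```

Because only differences of timestamps taken on the same clock appear in the formula, two-way measurement does not require time synchronization between the MEP and its RMEP.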
AIS
AIS is a protocol used to transmit fault information.
On the user network shown in Figure 8, a MEP in MD1 with a level of 6 is configured on each of CE1's and CE2's access interfaces. On the carrier network, a MEP in MD2 with a level of 3 is configured on each of PE1's and PE2's access interfaces.
• If CFM detects a fault in the link between AIS-enabled PEs, CFM sends AIS packet data units (PDUs) to
CEs. After receiving the AIS PDUs, the CEs suppress alarms, minimizing the impact of a large number of
alarms on a network management system (NMS).
• After the link between the PEs recovers, the PEs stop sending AIS PDUs. CEs do not receive AIS PDUs
during a period of 3.5 times the interval at which AIS PDUs are sent. Therefore, the CEs cancel the
alarm suppression function.
ETH-Test
ETH-Test is used to perform one-way on-demand in-service or out-of-service diagnostic tests on the
throughput, frame loss, and bit errors.
• Verifying throughput and frame loss: Throughput means the maximum bandwidth of a link without
packet loss. When you use ETH-Test to verify the throughput, a MEP sends frames with ETH-Test
information at a preconfigured traffic rate and collects frame loss statistics for a specified period. If the
statistical results show that the number of sent frames is greater than the number of received frames,
frame loss occurs. The MEP sends frames at a lower rate until no frame loss occurs. The traffic rate
measured at the time when no packet loss occurs is the throughput of this link.
• Verifying bit errors: ETH-Test verifies the cyclic redundancy check (CRC) of the Test TLV field carried in ETH-Test frames. Four types of test patterns can be specified in the Test TLV field: null signal without CRC-32, null signal with CRC-32, PRBS (2^31 - 1) without CRC-32, and PRBS (2^31 - 1) with CRC-32. A null signal is an all-0s signal. A PRBS (pseudo-random binary sequence) is used to simulate white noise. A MEP sends ETH-Test frames carrying the calculated CRC value to the RMEP. After receiving the ETH-Test frames, the RMEP recalculates the CRC value. If the recalculated CRC value differs from the CRC value carried in the received ETH-Test frames, bit errors have occurred.
ETH-Test provides two types of test modes: out-of-service ETH-Test and in-service ETH-Test:
• Out-of-service ETH-Test mode: Client data traffic is interrupted in the diagnosed entity. To resolve this
issue, the out-of-service ETH-Test function must be used together with the ETH-LCK function.
• In-service ETH-Test mode: Client data traffic is not interrupted, and frames with the ETH-Test information are transmitted using part of the link bandwidth.
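The rate-reduction procedure for throughput verification described above can be sketched as a simple loop. This is an illustrative simplification (the trial hook, start rate, and step size are hypothetical; a production implementation would typically use a binary search per RFC 2544-style benchmarking):

```python
# Illustrative sketch of the ETH-Test throughput procedure: send test
# frames at a rate, check for frame loss, and step the rate down until
# a loss-free rate is found. run_trial is a hypothetical hook returning
# (frames_sent, frames_received) for one timed trial at a given rate.

def find_throughput(run_trial, start_rate_mbps, step_mbps):
    rate = start_rate_mbps
    while rate > 0:
        sent, received = run_trial(rate)
        if sent == received:      # no frame loss at this rate:
            return rate           # this rate is the measured throughput
        rate -= step_mbps         # frame loss occurred: try a lower rate
    return 0
```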
ETH-LCK
ETH-LCK is used for administrative locking on the MEP in the outer MD with a higher level than the inner
MD, that is, preventing CC alarms from being generated in the outer MD. When implementing ETH-LCK, a
MEP in the inner MD sends frames with the ETH-LCK information to the MEP in the outer MD. After
receiving the frames with the ETH-LCK information, the MEP in the outer MD can differentiate the alarm
suppression caused by administrative locking from the alarm suppression caused by a fault in the inner MD
(the AIS function).
To prevent CC alarms from being generated in the outer MD, ETH-LCK is implemented together with out-of-service ETH-Test. A MEP in the inner MD with a lower level initiates ETH-Test by sending an ETH-LCK frame to a MEP in the outer MD. Upon receipt of the ETH-LCK frame, the MEP in the outer MD immediately suppresses all CC alarms and reports an ETH-LCK alarm indicating administrative locking. Until out-of-service ETH-Test is complete, the MEP in the inner MD keeps sending ETH-LCK frames to the MEP in the outer MD; after the test is complete, it stops sending ETH-LCK frames. If the MEP in the outer MD does not receive ETH-LCK frames for a period of 3.5 times the specified interval, it releases the alarm suppression and reports a clear ETH-LCK alarm.
As shown in Figure 9, MD2 with the level of 3 is configured on PE1 and PE2; MD1 with the level of 6 is
configured on CE1 and CE2. When PE1's MEP1 sends out-of-service ETH-Test frames to PE2's MEP2, MEP1
also sends ETH-LCK frames to CE1's MEP11 and CE2's MEP22 separately to suppress MEP11 and MEP22
from generating CC alarms. When MEP1 stops sending out-of-service ETH-Test frames, it also stops sending
ETH-LCK frames. If MEP11 and MEP22 do not receive ETH-LCK frames for a period of 3.5 times the specified interval, they release the alarm suppression.
Figure 9 ETH-LCK
Single-ended ETH-SLM
SLM measures frame loss using synthetic frames instead of data traffic. When implementing SLM, the local
MEP exchanges frames containing ETH-SLM information with one or more RMEPs.
1. The local MEP sends ETH-SLM request frames to one or more RMEPs.
2. After receiving the ETH-SLM request frames, the RMEPs send ETH-SLM reply frames to the local MEP.
A frame with the single-ended ETH-SLM request information is called an SLM, and a frame with the single-
ended ETH-SLM reply information is called an SLR. SLM frames carry SLM protocol data units (PDUs), and
SLR frames carry SLR PDUs.
Single-ended SLM and single-ended frame LM are differentiated as follows: On the point-to-multipoint
network shown in Figure 10, inward MEPs are configured on PE1's and PE3's interfaces, and single-ended
frame LM is performed on the PE1-PE3 link. Traffic coming through PE1's interface is destined for both PE2
and PE3, and single-ended frame LM will collect frame loss statistics for all traffic, including the PE1-to-PE2
traffic. As a result, the collected statistics are not accurate. Unlike single-ended frame LM, single-ended SLM collects frame loss statistics only for the PE1-to-PE3 traffic and is therefore more accurate.
When implementing single-ended SLM, PE1 sends SLM frames to PE3 and receives SLR frames from PE3.
SLM frames contain TxFCf, the value of TxFCl (frame transmission counter), indicating the frame count at
the transmit time. SLR frames contain the following information:
• TxFCf: value of TxFCl (frame transmission counter) indicating the frame count on PE1 upon the SLM
transmission
• TxFCb: value of RxFCl (frame receive counter) indicating the frame count on PE3 upon the SLR
transmission
After receiving the last SLR frame during a measurement period, a MEP on PE1 measures the near-end and
far-end frame loss based on the following values:
• Last received SLR's TxFCf and TxFCb, and value of RxFCl (frame receive counter) indicating the frame
count on PE1 upon the SLR reception. These values are represented as TxFCf[tc], TxFCb[tc], and
RxFCl[tc].
tc indicates the time when the last SLR frame was received during the measurement period.
• Previously received SLR's TxFCf and TxFCb, and value of RxFCl (frame receive counter) indicating the
frame count on PE1 upon the SLR reception. These values are represented as TxFCf[tp], TxFCb[tp], and
RxFCl[tp].
tp indicates the time when the last SLR frame was received during the previous measurement period.
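The SLR snapshots above combine into the loss figures as follows. A minimal sketch, assuming the synthetic loss formulas commonly given for ITU-T Y.1731 ETH-SLM (the helper is illustrative; wraparound is ignored):

```python
# Sketch of single-ended synthetic loss computation from the last SLR
# of the current (cur = [tc]) and previous (prev = [tp]) measurement
# periods, as observed on PE1.

def slm_loss(cur, prev):
    """cur/prev: dicts with TxFCf, TxFCb, RxFCl counter values."""
    # Far-end loss: SLM frames PE1 sent (TxFCf) minus frames PE3
    # counted and answered (TxFCb) over the period.
    far_end = abs(cur["TxFCf"] - prev["TxFCf"]) - abs(cur["TxFCb"] - prev["TxFCb"])
    # Near-end loss: SLR frames PE3 sent (TxFCb) minus frames PE1
    # actually received (RxFCl) over the period.
    near_end = abs(cur["TxFCb"] - prev["TxFCb"]) - abs(cur["RxFCl"] - prev["RxFCl"])
    return far_end, near_end
```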
ETH-BN
Ethernet bandwidth notification (ETH-BN) enables server-layer MEPs to notify client-layer MEPs of the
server layer's connection bandwidth when routing devices connect to microwave devices. The server-layer
devices are microwave devices, which dynamically adjust the bandwidth according to the prevailing
atmospheric conditions. The client-layer devices are routing devices. Routing devices can only function as
ETH-BN packets' receive ends and must work with microwave devices to implement this function.
As shown in Figure 12, server-layer MEPs are configured on the server-layer devices, and the ETH-BN
sending function is enabled. The levels of client-layer MEPs must be specified for the server-layer MEPs when
the ETH-BN sending function is enabled. Client-layer MEPs are configured on the client-layer devices, and
the ETH-BN receiving function is enabled. The levels of the client-layer MEPs are the same as those specified
for the server-layer MEPs.
• If the ETH-BN function has been enabled on the server-layer devices Device2 and Device3 and the
bandwidth of the server-layer devices' microwave links decreases, the server-layer devices send ETH-BN
packets to the client-layer devices (Device1 and Device4). After receiving the ETH-BN packets, the
client-layer MEPs can use bandwidth information in the packets to adjust service policies, for example,
to reduce the rate of traffic sent to the degraded links.
• When the server-layer devices' microwave links work properly, whether to send ETH-BN packets is
determined by the configuration of the server-layer devices. When the server-layer microwave devices
stop sending ETH-BN packets, the client-layer devices do not receive any ETH-BN packets. The ETH-BN
data on the client-layer devices is aged after 3.5 times the interval at which ETH-BN packets are sent.
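The aging rule in the last bullet reduces to a single comparison. A minimal sketch (function and parameter names are illustrative):

```python
# Sketch of the ETH-BN aging rule described above: client-layer ETH-BN
# data is considered aged once no ETH-BN packet has arrived within
# 3.5 times the packet transmission interval.

def ethbn_data_aged(last_rx_time_s, now_s, tx_interval_s):
    """Return True if the client-layer ETH-BN data should be aged out."""
    return (now_s - last_rx_time_s) > 3.5 * tx_interval_s
```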
When planning ETH-BN, ensure that the service traffic bursts are within the device's buffer capability.
Usage Scenario
Y.1731 supports performance statistics collection on both end-to-end and end-to-multi-end links.
End-to-end performance statistics collection
On the network shown in Figure 13, Y.1731 collects statistics about the end-to-end link performance
between the CE and PE1, between PE1 and PE2, or between the CE and PE3.
End-to-multi-end performance statistics collection
On the network shown in Figure 14, user-to-network traffic from different users traverses CE1 and CE2 and
is converged on CE3. CE3 forwards the converged traffic to the UPE. Network-to-user traffic traverses CE3,
and CE3 forwards the traffic to CE1 and CE2.
When Y.1731 is used to collect statistics about the link performance between the CE and the UPE, end-to-
end performance statistics collection cannot be implemented. This is because only one inbound interface (on
the UPE) sends packets but two outbound interfaces (on CE1 and CE2) receive the packets. In this case,
statistics on the outbound interfaces fail to be collected. To resolve this issue, end-to-multi-end performance
statistics collection can be implemented.
The packets carry the MAC address of CE1 or CE2. The UPE identifies the outbound interface based on the
destination MAC address carried in the packets and collects end-to-end performance statistics.
Both end-to-multi-end and end-to-end performance statistics collection apply to VLL, VPLS, and VLAN scenarios and share the same statistics collection principles.
5.7.5.1 Background
Link detection protocols are used to monitor the connectivity of links between devices and detect faults. A
single fault detection protocol cannot detect all faults in all links on a complex network. A combination of
protocols and techniques must be used to detect link faults.
Ethernet OAM detects faults in Ethernet links and advertises fault information to interfaces or other protocol
modules. Ethernet OAM fault advertisement is implemented by an OAM manager (OAMMGR) module,
application modules, and detection modules. An OAMMGR module associates one module with another. A
detection module monitors link status and network performance. If a detection module detects a fault, it
instructs the OAMMGR module to notify an application module or another detection module of the fault.
After receiving the notification, the application or detection module takes measures to prevent a
communication interruption or service quality deterioration.
The OAMMGR module helps an Ethernet OAM module to advertise fault information to a detection or
application module. If an Ethernet OAM module detects a fault, it instructs the OAMMGR module to send
alarms to the network management system (NMS). A network administrator takes measures based on
information displayed on the NMS. Ethernet OAM fault advertisement includes fault information
advertisement between CFM and other modules.
The following example illustrates fault information advertisement between EFM and detection modules over the path CE5 -> CE4 -> CE1 -> PE2 -> PE4. Table 1 describes the scenarios.
Scenario: EFM is used to monitor the direct link between CE1 and PE2, and CFM is used to monitor the link between PE2 and PE6.
Problem: Although EFM detects a fault, EFM cannot notify PE6 of the fault. As a result, PE6 still forwards network traffic to PE2, causing a traffic interruption. Although CFM detects a fault, CFM cannot notify CE1 of the fault. As a result, CE1 still forwards user traffic to PE2, causing a traffic interruption.
Solution: The EFM module can be associated with the CFM module. If the EFM module detects a fault, it instructs the OAMMGR module to notify the CFM module of the fault. If the CFM module detects a fault, it instructs the OAMMGR module to notify the EFM module of the fault. The association allows a module to notify another associated module of a fault and to send an alarm to an NMS.
Scenario: EFM is used to monitor the direct link between CE1 and PE2, and BFD is used to monitor the link between PE2 and PE6.
Problem: Although EFM detects a fault, EFM cannot notify PE6 of the fault. As a result, PE6 still forwards network traffic to PE2, causing a traffic interruption. Although BFD detects a fault, BFD cannot notify CE1 of the fault. As a result, CE1 still forwards user traffic to PE2, causing a traffic interruption.
Solution: The EFM module can be associated with the BFD module. If the EFM module detects a fault, it instructs the OAMMGR module to notify the BFD module of the fault. If the BFD module detects a fault, it instructs the OAMMGR module to notify the EFM module of the fault. If EFM on CE1 detects a fault or receives fault information sent by PE2, the association between EFM and BFD takes effect and deletes the MAC entry, which switches traffic to a backup link. The association allows a module to notify another associated module of a fault and to send an alarm to an NMS. A network administrator analyzes the alarm information and takes measures to rectify the fault.
Table 2 describes fault information advertisement between EFM and VRRP modules.
Scenario: A VRRP group is configured to determine the master/backup status of provider edge-aggregation devices (PE-AGGs).
Problem: If links connected to a VRRP group fail, VRRP packets cannot be sent to negotiate the master/backup status. A backup VRRP device preempts the Master state after a period of three times the interval at which VRRP packets are sent. As a result, data loss occurs.
Solution: To help prevent data loss, the VRRP module can be associated with the EFM module. If a fault occurs, the EFM module notifies the VRRP module of the fault. After receiving the notification, the VRRP module triggers a master/backup VRRP switchover.
Figure 1 Networking for fault information advertisement between CFM and detection modules
The following example illustrates fault information advertisement between CFM and detection modules over the path UPE1 -> PE2 -> PE4 -> PE6 -> PE8 on the network shown in Figure 1. Table 1 describes the scenarios.
Scenario: CFM is used to monitor the link between UPE1 and PE4.
Problem: Although CFM detects a fault in the link between UPE1 and PE4, CFM cannot notify PE6 of the fault. As a result, PE6 still forwards network traffic to PE4, causing a traffic interruption. Although port 1 on PE4 goes Down, port 1 cannot notify CE1 of the fault. As a result, CE1 still forwards user traffic to PE4, causing a traffic interruption.
Solution: CFM can be associated with port 1. If CFM detects a fault, it instructs the OAMMGR module to disconnect port 1 intermittently, which allows other modules to detect the fault. If port 1 goes Down, it instructs the OAMMGR module to notify CFM of the fault. After receiving the notification, CFM notifies PE6 of the fault. The association between CFM and a port is used to detect faults in the active link of a link aggregation group or in a link aggregation group in 1:1 active/standby mode. If a fault is detected, a protection switchover is triggered.
Scenario: EFM is deployed to monitor the link between CE1 and UPE1, and CFM is deployed to monitor the link between PE4 and PE8.
Problem: Although CFM detects a fault, CFM cannot notify CE1 of the fault. As a result, CE1 still forwards user traffic to PE4, causing a traffic interruption.
Solution: The EFM module can be associated with the CFM module. If the EFM module detects a fault, it instructs the OAMMGR module to notify the CFM module of the fault. If the CFM module detects a fault, it instructs the OAMMGR module to notify the EFM module of the fault. The association allows a module to notify another associated module of a fault and to send an alarm to an NMS. A network administrator analyzes the alarm information and takes measures to rectify the fault.
Scenario: CFM is configured to monitor the links between UPE1 and PE4 and between PE4 and PE8.
Problem: Although CFM detects a fault in the link between PE4 and PE8, it cannot notify UPE1 of the fault. As a result, UPE1 still forwards user traffic to PE4 through PE2, causing a traffic interruption. Similarly, if CFM detects a fault in the link between UPE1 and PE4, it cannot notify PE8 of the fault. As a result, PE8 still forwards network traffic to PE4 through PE6, causing a traffic interruption.
Solution: Two CFM modules can be associated with each other. If a CFM module detects a fault, it instructs the OAMMGR module to notify the other CFM module of the fault and sends an alarm to an NMS. A network administrator analyzes the alarm information and takes measures to rectify the fault. CFM can also be associated with MAC or ARP entry clearing: if CFM detects a fault, it instructs an interface to clear MAC or ARP entries, triggering traffic to be switched to a backup link.
Scenario: CFM is used to monitor the link between UPE1 and PE4. BFD can be used to monitor the non-Ethernet link between PE4 and PE8; the non-Ethernet link can be a packet over synchronous digital hierarchy (SDH)/synchronous optical network (SONET) (POS) link.
Problem: Although CFM detects a fault in the link between UPE1 and PE4, it cannot notify PE8 of the fault. As a result, PE8 still forwards network traffic to PE4 through PE6, causing a traffic interruption. Although BFD detects a fault, BFD cannot notify UPE1 of the fault. As a result, UPE1 still forwards user traffic to PE4 through PE2, causing a traffic interruption.
Solution: The CFM module can be associated with the BFD module. If the CFM module detects a fault, it instructs the OAMMGR module to notify the BFD module of the fault. If the BFD module detects a fault, it instructs the OAMMGR module to notify the CFM module of the fault. The association allows a module to notify another associated module of a fault and to send an alarm to an NMS. A network administrator analyzes the alarm information and takes measures to rectify the fault.
Table 2 describes fault information advertisement between CFM and VRRP modules.
Scenario: A VRRP backup group is configured to determine the master/backup status of network provider edges (NPEs). CFM is used to monitor links between NPEs and PE-AGGs.
Problem: If a fault occurs on the link between NPE1 (the master) and PE-AGG1, NPE2 cannot receive VRRP packets within a period of three times the interval at which VRRP packets are sent. NPE2 then preempts the Master state. As a result, two master devices coexist in a VRRP backup group, and the UPE receives double copies of network traffic.
Solution: CFM can be associated with the VRRP module on NPEs. If CFM detects a fault in the link between PE-AGG1 and NPE1, it instructs the OAMMGR module to notify the VRRP module of the fault. After receiving the notification, the VRRP module triggers a master/backup VRRP switchover. NPE1 then changes its VRRP status to Initialize, and NPE2 changes its VRRP status from Backup to Master after a period of three times the interval at which VRRP packets are sent. This process prevents two master devices from coexisting in the VRRP backup group.
Usage scenario: A VRRP backup group is configured to determine the master/backup status of NPEs. CFM is used to monitor links between NPEs and PE-AGGs. PW redundancy is configured to determine the active/standby status of PWs.
Fault description: If a fault occurs on the backbone network, it triggers a master/backup VRRP switchover but cannot trigger an active/standby PW switchover. As a result, the CE still transmits user traffic to the previous master NPE, causing a traffic interruption.
Association description: When VRRP status changes on NPEs, the VRRP module notifies PE-AGGs' CFM modules of VRRP status changes. The CFM module on each PE-AGG notifies the PW module of the status change and triggers an active/standby PW switchover. Each PE-AGG notifies its associated UPE of the PW status change. After the UPE receives the notification, it determines the primary/backup status of PWs.
Figure 1 shows a typical MAN network. The following example illustrates Ethernet OAM applications on a
MAN.
• EFM is used to monitor P2P direct links between a digital subscriber line access multiplexer (DSLAM)
and a user-end provider edge (UPE) or between a LAN switch (LSW) and a UPE. If EFM detects errored
frames, codes, or frame seconds, it sends alarms to the network management system (NMS) to provide
information for a network administrator. EFM uses the loopback function to assess link quality.
• CFM is used to monitor E2E links between a UPE and an NPE or between a UPE and a provider edge-
aggregation (PE-AGG). A network planning engineer groups the devices of each Internet service
provider (ISP) into an MD and maps a type of service to an MA. A network maintenance engineer
enables maintenance points to exchange CCMs to monitor network connectivity. After receiving an
alarm on the NMS, a network administrator can enable loopback to locate faults or enable linktrace to
discover paths.
• Y.1731 is used to measure packet loss and the delay on E2E links between a UPE and an NPE or
between a UPE and a PE-AGG at the aggregation layer.
On the mobile backhaul network shown in Figure 1, the transport network between the CSG and RSGs, and
the wireless networks between NodeBs/eNodeBs and the CSG and between RSGs and RNCs may be operated
by different carriers. When a link fault occurs on a network, it is very important to demarcate and locate the
fault.
Ethernet OAM can be used on the transport and wireless networks to demarcate and locate faults.
■ EFM is used to monitor the connectivity of links between a NodeB/eNodeB and CSG1 or between
RNCs and RSGs.
■ EFM detects errored codes, frames, and frame seconds on links between a NodeB/eNodeB and
CSG1 and between RNCs and RSGs. If the number of errored codes, frames, or frame seconds
exceeds a configured threshold, an alarm is sent to the NMS. A network administrator is notified of
link quality deterioration and can assess the risk of adverse impact on voice traffic.
■ Loopback is used to monitor the quality of voice links between a NodeB/eNodeB and CSG1 or
between RNCs and RSGs.
• CFM is used to locate faulty links over which E2E services are transmitted.
■ CFM periodically monitors links between cell site gateway (CSG) 1 and remote site gateways
(RSGs). If CFM detects a fault, it sends an alarm to the NMS. A network administrator analyzes
alarm information and takes measures to rectify the fault.
■ Loopback and linktrace are enabled on links between CSG1 and the RSGs to help link fault
diagnosis.
• Y.1731 is used together with CFM to monitor link performance and voice and data traffic quality.
Definition
Link-state pass through (LPT) transparently transmits the local link status to the opposite end so that the
opposite end can perform operations accordingly.
Purpose
Ethernet LPT can detect and report a link fault on the Ethernet user side or a fault on an intermediate point-
to-point network.
After detecting a fault on the local link, the local user equipment automatically enables a backup link and
uses the backup link to communicate with the opposite user equipment. The opposite user equipment,
however, cannot obtain information about the local link fault. Therefore, it still uses the original link to
communicate with the local user equipment. As a result, services are interrupted.
Benefits
If Ethernet LPT is enabled, the local user equipment can send information about the local link fault to the
opposite network edge equipment using Ethernet LPT packets. The opposite network edge equipment
disables the UNI-side port so that the opposite user equipment starts to use the backup link. In this manner,
services are transmitted over the backup link between the user equipment at both ends.
PE1 and PE2 are enabled with Ethernet LPT and transmit packets to each other. When a fault occurs on link
1:
1. CE1 detects that link 1 is malfunctioning and enables the backup link to communicate with CE2.
PE1 periodically transmits Ethernet LPT packets to PE2. After detecting that link 1 is malfunctioning,
PE1 sends Ethernet LPT packets containing a message to PE2, indicating that link 1 is malfunctioning.
2. After receiving and interpreting the Ethernet LPT packets, PE2 acknowledges that the user side link of
PE1 is malfunctioning and disables its user side port.
After detecting that the user side port of PE2 is disabled, CE2 enables the backup link to communicate
with CE1.
After the fault on the user side link of PE1 is rectified, services on the backup link can be switched back to
the working link according to the following steps.
1. After detecting that the fault on link 1 is rectified, CE1 switches services on the backup link to the
working link and tries to communicate with CE2 using the working link.
After detecting that the fault on link 1 is rectified, PE1 sends Ethernet LPT packets containing a
message to PE2, indicating that the fault on its user side link is rectified.
2. After receiving and interpreting the Ethernet LPT packets, PE2 acknowledges that the fault on the user
side link of PE1 is rectified and enables its user side port.
After detecting that the user side port is enabled, CE2 switches services on the backup link back to the
working link and communicates with CE1 using the working link.
PE1 and PE2 are enabled with Ethernet LPT and transmit packets to each other. When a point-to-point
network fault occurs:
1. PE1 receives no Ethernet LPT packets from PE2 and detects that Ethernet LPT communication fails.
Then, PE1 disables its user side port.
After detecting that the user side port of PE1 is disabled, CE1 enables the backup link to communicate
with CE2.
2. PE2 receives no Ethernet LPT packets from PE1 and detects that Ethernet LPT communication fails.
Then, PE2 disables its user side port.
After detecting that the user side port of PE2 is disabled, CE2 enables the backup link to communicate
with CE1.
After the point-to-point network fault is rectified, services on the backup link can be switched back to the
working link according to the following steps.
1. After receiving and interpreting the Ethernet LPT packets, PE1 detects that the fault is rectified and
enables its user side port.
After detecting that the user side port is enabled, CE1 switches services on the backup link back to the
working link and tries to communicate with CE2 using the working link.
2. After receiving and interpreting the Ethernet LPT packets, PE2 detects that the fault is rectified and
enables its user side port.
After detecting that the user side port is enabled, CE2 switches services on the backup link back to the
working link and communicates with CE1 using the working link.
Under common conditions, data between CE1 and CE2 traverses link 1, the point-to-point network, and link
2. The point-to-point network can be built based on PWE3 or QinQ links. If a fault occurs on link 1, link 2, or
the point-to-point network, communication between CE1 and CE2 is interrupted.
When link 1 is malfunctioning, PE2 disables link 2. When the point-to-point network is malfunctioning, PE1 disables link 1 and PE2 disables link 2. In this manner, CE1 and CE2 can communicate with each other by using the backup link.
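The PE behavior described above amounts to a small state machine: each PE periodically reports its user-side link state, mirrors a reported peer fault onto its own UNI port, and treats missing LPT messages as a point-to-point network fault. The sketch below is illustrative only; the message values and class names are assumptions, not the actual LPT packet encoding.

```python
# Illustrative sketch of the Ethernet LPT behavior described above.
# Message values and class names are hypothetical assumptions.

class LptPe:
    """A PE that mirrors its peer's user-side link state onto its UNI port."""
    def __init__(self):
        self.uni_port_enabled = True

    def build_lpt_message(self, local_user_link_up):
        # Sent periodically across the point-to-point network.
        return "link-ok" if local_user_link_up else "link-fault"

    def on_lpt_message(self, message):
        # Peer reports a user-side fault -> disable the local UNI port so the
        # local CE switches to its backup link; recovery re-enables the port.
        self.uni_port_enabled = (message == "link-ok")

    def on_lpt_timeout(self):
        # No LPT messages arrive: the point-to-point network itself is faulty,
        # so disable the UNI port to push the CE onto the backup link.
        self.uni_port_enabled = False

pe1, pe2 = LptPe(), LptPe()
pe2.on_lpt_message(pe1.build_lpt_message(local_user_link_up=False))
print(pe2.uni_port_enabled)  # False: CE2 moves to the backup link
pe2.on_lpt_message(pe1.build_lpt_message(local_user_link_up=True))
print(pe2.uni_port_enabled)  # True: CE2 switches back to the working link
```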
Definition
Dual-device backup is a feature that ensures service traffic continuity in scenarios in which a master/backup
status negotiation protocol (for example, VRRP or E-Trunk) is deployed. Dual-device backup enables the
master device to back up service control data to the backup device in real time. When the master device or
the link directly connected to the master device fails, service traffic quickly switches to the backup device.
When the master device or the link directly connected to the master device recovers, service traffic switches
back to the master device. Therefore, dual-device backup improves service and network reliability.
Purpose
In traditional service scenarios, all users use a single device to access a network. Once the device or the link directly connected to the device fails, all user services are interrupted, and the service recovery time is uncertain. To resolve this issue, deploy dual-device backup to enable the master device to back up service control data to the backup device in real time.
Benefits
Dual-device backup ensures service traffic continuity during master/backup switchovers, improving service and network reliability.
Related Concepts
If VRRP is used as a master/backup status negotiation protocol, dual-device backup involves the following
concepts:
• VRRP
VRRP is a fault-tolerant protocol that groups several routers into a virtual router. If the next hop of a
host is faulty, VRRP switches traffic to another router, which ensures communication continuity and
reliability.
For details about VRRP, see the chapter "VRRP" in NE40E Feature Description - Network Reliability.
• RUI
RUI is a Huawei-specific redundancy protocol that is used to back up user information between devices.
RUI, which is carried over the Transmission Control Protocol (TCP), specifies which user information can
be transmitted between devices and the format and amount of user information to be transmitted.
• RBS
The remote backup service (RBS) is an RUI module used for inter-device backup. A service module uses
the RBS to synchronize service control data from the master device to the backup device. When a
master/backup VRRP switchover occurs, service traffic quickly switches to a new master device.
• RBP
The remote backup profile (RBP) is a configuration template that provides a unified user interface for
dual-device backup configurations.
If E-Trunk is used as a master/backup status negotiation protocol, dual-device backup involves the following
concept:
• E-Trunk
E-Trunk implements inter-device link aggregation, providing device-level reliability. E-Trunk aggregates
data links of multiple devices to form a link aggregation group (LAG). If a link or device fails, services
are automatically switched to the other available links or devices in the E-Trunk, improving link and
device-level reliability.
For details about E-Trunk, see "E-Trunk" in NE40E Feature Description - LAN Access and MAN Access.
Implementation
There are two primary backup protocols: VRRP and E-Trunk. The following uses dual-device ARP hot backup and dual-device IGMP snooping hot backup as examples.
• Dual-device ARP hot backup enables the master device to back up the ARP entries at the control and
forwarding layers to the backup device in real time. When the backup device switches to a master
device, it uses the backup ARP entries to generate host routing information without needing to relearn
ARP entries, ensuring downlink traffic continuity.
■ Manually triggered dual-device ARP hot backup: You must manually establish a backup platform
and backup channel for the master and backup devices. In addition, you must manually trigger ARP
entry backup from the master device to the backup device. This backup mode has complex
configurations.
■ Automatically enabled dual-device ARP hot backup: You need to establish only a backup channel
between the master and backup devices, and the system automatically implements ARP entry
backup. This backup mode has simple configurations.
• Dual-device IGMP snooping hot backup enables the master device to back up IGMP snooping entries to
the backup device in a master/backup E-Trunk scenario. If the master device or the link between the
master device and user fails, the backup device switches to a master device and takes over, ensuring
multicast service continuity.
Benefits
Dual-device backup provides a unified platform for backing up service control data from the master device
to the backup device.
5.9.2.1 Overview
The NE40E ensures high reliability of services through the following approaches:
• Status control: Several BRASs negotiate a master BRAS through VRRP. With the help of BFD or Ethernet
OAM, the master BRAS can detect a link fault quickly and traffic can be switched to the standby BRAS
immediately.
• Service control: Information about access users is backed up to the standby BRAS from the master BRAS
through TCP. This ensures service consistency.
• Route control: By controlling routes in the address pool or user routes in a real-time manner, the BRAS
ensures that downstream traffic can reach users smoothly when an active/standby switchover occurs.
VRRP
VRRP is a fault-tolerant protocol defined in relevant standards. As shown in Figure 1, the Routers on the
LAN (Device1, Device2, and Device3) are arranged in a backup group using VRRP. This backup group
functions as a virtual router.
On the LAN, hosts need to obtain only the IP address of the virtual router rather than the IP address of each
router in the backup group. The hosts set the IP address of the virtual router as the address of their default
gateway. Then, the hosts can communicate with an external network through the virtual gateway.
VRRP dynamically associates the virtual router with a physical router that transmits services. When the
physical router fails, another router is selected to take over services and user services are not affected. The
internal network and the external network can communicate without interruption.
As shown in Figure 2, the two Routers negotiate the master and standby states using VRRP. The NE40E
supports active/standby status selection of interfaces and sub-interfaces.
BFD is enabled between the two Routers to detect links between the two devices. BFD in this mode is called
Peer-BFD. BFD is also enabled between the Router and the LSW to detect links between the Router and the
LSW. BFD in this mode is called Link-BFD.
When a link fails, through VRRP, the new master and standby devices can be negotiated, but several seconds
are needed and the requirements of carrier-grade services cannot be met. Through BFD or Eth OAM, a faulty
link can be detected in several milliseconds and the device can perform a fast active/standby switchover with
the help of VRRP.
During the implementation of an active/standby switchover, VRRP has to determine device status based on
Link-BFD status and Peer-BFD status. As shown in Figure 2, when Link 1 fails, the Peer-BFD status and Link-
BFD status of Device1 both go down and Device1 becomes the standby device. In this case, the Peer-BFD
status of Device2 goes down but the Link-BFD status of Device2 is still up. Therefore, Device2 becomes the
master device.
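The role decision described above can be reduced to a small function: a down Link-BFD session forces the standby role regardless of peer status, while a healthy downlink with an unreachable peer justifies taking over as master. The sketch below is a simplified illustration under those assumptions; the function and return values are hypothetical, not device behavior.

```python
# Sketch of the role decision described above: VRRP combines Peer-BFD and
# Link-BFD status to choose a role. Names and return values are illustrative.

def vrrp_role(peer_bfd_up, link_bfd_up):
    """Decide the device role after a BFD status change."""
    if not link_bfd_up:
        # The device's own downlink is broken: it must become the standby.
        return "standby"
    if not peer_bfd_up:
        # Peer unreachable but the own downlink is healthy: take over.
        return "master"
    # Both sessions up: the normal VRRP priority-based election applies.
    return "negotiated"

# Link 1 fails: Device1 sees both sessions down, Device2 only Peer-BFD down.
print(vrrp_role(peer_bfd_up=False, link_bfd_up=False))  # standby (Device1)
print(vrrp_role(peer_bfd_up=False, link_bfd_up=True))   # master  (Device2)
```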
In actual networking, certain LSWs may not support BFD. In this case, you have to select another detection
mechanism. Besides BFD, the NE40E also supports detection of links connecting to LSWs through Eth OAM.
The NE40E supports monitoring of upstream links (for example, Link 3 in Figure 2) to enhance reliability
protection for the network side. When an upstream link fails, the NE40E responds to the link failure quickly
and performs an active/standby link switchover.
Attribute Description
QosProfile: Name of a QoS profile delivered by the RADIUS server. It is used to meet users' requirements for QoS.
UCL-Group: UCL for user group policy control delivered by the RADIUS server.
Remanent-Volume: Volume of the remaining traffic delivered by the RADIUS server. It is used to control the online traffic of users.
Session-Timeout: Remaining time delivered by the RADIUS server. It is used to control the online duration of users.
Up-CIR: Upstream traffic committed information rate (CIR) delivered by the RADIUS server.
Up-PIR: Upstream traffic peak information rate (PIR) delivered by the RADIUS server.
Radius proxy IP address: Destination IP address carried in a received RADIUS packet sent by a client when the BAS device functions as a RADIUS proxy.
Radius client IP address: Source IP address carried in a received RADIUS packet sent by a client when the BAS device functions as a RADIUS proxy.
When backing up information about access users, you need to ensure that the configurations of the active
and standby BRASs are consistent, including the IP address, VLAN, and QoS parameters. You need to ensure
the consistency of common attributes. The special attributes of a user are backed up through TCP. Figure 1
shows the process of backing up the special attributes of a user. A TCP connection can be set up based on
the uplinks connecting to the MAN.
Figure 1 Diagram for user information backup for high service reliability
The user information backup function supports backup of information about authentication, accounting, and
authorization of users. The NE40E controls user access according to the master/backup status negotiated
through VRRP. Only the active device can handle users' access requests and perform authentication, real-
time accounting, and authorization for users. The standby device discards users' access requests.
After a user logs on through the active device, the active device backs up information about the user to the
standby device through TCP. The standby device generates a corresponding service based on user
information. This ensures that the standby device can smoothly take over services from the active device
when the active device fails.
When the active device fails (for example, the system restarts), services are switched to the standby device. When the active device recovers, services need to be switched back. The active device, however, lacks information about users. Therefore, information about users on the standby device must be backed up to the active device in batches. At present, the maximum backup rate is 1000 user entries per second.
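The paced batch backup above can be pictured as sending entries in one-second batches capped at the maximum rate of 1000 entries per second. The sketch below is a minimal illustration of that pacing; the function name and entry format are assumptions, and the real backup runs over a TCP channel between the devices.

```python
# Sketch of the batch backup pacing described above: user entries are sent
# back to the recovered device at no more than `rate` entries per second.
# Names and the entry format are illustrative assumptions.

import time

def batch_backup(entries, send, rate=1000):
    """Send user entries in one-second batches of at most `rate` entries."""
    for start in range(0, len(entries), rate):
        batch = entries[start:start + rate]
        send(batch)
        if start + rate < len(entries):
            time.sleep(1)  # pace the backup to `rate` entries per second

sent_batches = []
batch_backup([{"user": i} for i in range(2500)], sent_batches.append)
print([len(b) for b in sent_batches])  # [1000, 1000, 500]
```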
As shown in Figure 2, the entire service control process can be divided into the following phases:
1. Backup phase
• The two NE40Es negotiate the active device (Device1) and standby device (Device2) using VRRP.
• A user logs on through Device1, and information about this user is backed up to Device2 in a
real-time manner.
• The two NE40Es detect the link between them through BFD or Ethernet OAM.
2. Switchover phase
• For user-to-network traffic, if a link to Device1 fails, VRRP, with the help of BFD or Ethernet OAM, rapidly switches Device1 to the backup state and Device2 to the master state and advertises gratuitous ARP packets to update the MAC address table on the LSW, which allows subsequent user packets to reach Device2.
• For network-to-user traffic, if a link to Device1 fails, Device2 forwards traffic based on the backup ARP entries, preventing traffic loss.
3. Switchback phase
• The link on the Device1 recovers, and VRRP renegotiates the active device and the standby device.
Then, Device1 acts as the active device; Device2 acts as the standby device. In this case, Device2
needs to back up information about all users to Device1 in batch and Device1 needs to back up
information about users on it to Device2. User entry synchronization between the two devices is
bidirectional.
• Before the batch backup is complete, the VRRP switchover is not performed. At this time, Device1 is still the standby device and Device2 is still the active device. After the batch backup is complete, the VRRP switchover is performed. Device1 becomes the active device and sends a gratuitous ARP packet; Device2 becomes the standby device and completes the switchback of user services.
The NE40E provides high reliability protection for Web authentication users. The principle of high reliability protection
for Web authentication users is similar to that for ordinary access users. No special configuration is needed on the Web
server.
2. After receiving DHCP packets, the master BRAS attempts to authenticate user information. If
authentication is successful, the master BRAS allocates an IP address to the user. The slave BRAS
does not provide access services for the user.
4. The master BRAS sends user information to the slave BRAS along a backup channel. The slave
BRAS uses the information to locally generate control and forwarding information for the user.
On the network shown in Figure 1, the procedure for ordering multicast programs is as follows:
1. A DHCP STB sends an Internet Group Management Protocol (IGMP) Report message to an
aggregation switch, and the switch forwards the message to both the master and slave BRASs.
2. Both the master and slave BRASs receive the IGMP Report message, and pull multicast traffic
from multicast sources.
3. The master BRAS replicates multicast traffic to the STB, but the slave BRAS does not.
1. The STB establishes a Point-to-Point Protocol (PPP) connection to the master BRAS. The BRAS backs
up the STB information to the slave BRAS. After receiving the information, the slave BRAS locally
generates control and forwarding information for the PPP user.
2. The STB sends an IGMP Report message. After receiving the message, the master BRAS backs the
message up to the slave BRAS, sends a Join message to the RP to pull multicast traffic, and establishes
a rendezvous point tree (RPT).
3. After receiving the IGMP Report message from the backup channel, the slave BRAS sends a Join
message to the RP to pull multicast traffic and establishes an RPT.
4. The master BRAS replicates multicast traffic to the STB, but the slave BRAS does not.
Figure 2 Multicast service hot backup for a DHCP STB using SmartLink to control active and standby links
Figure 3 Multicast service hot backup for a DHCP STB using E-Trunk to control active and standby links
On the network shown in Figure 1, NE40E-1 and NE40E-2 function as BRASs and run redundancy user
information (RUI). A Virtual Router Redundancy Protocol (VRRP) group is configured for the two NE40Es,
with NE40E-1 as the master and NE40E-2 as the backup. When the link between the switch (SW) and NE40E
-1 goes faulty, the fault triggers a master/backup VRRP switchover. Then, NE40E-2 becomes the master and
starts neighbor discovery (ND) detection, and NE40E-1 becomes the backup and stops the ND detection. If
the link-local address or MAC address on an interface of NE40E-2 is different from that of an interface on
NE40E-1, some users will go offline, or some user packets will be discarded.
To prevent a user from detecting the active link fault, NE40E-2 must use the same link-local address and
MAC address as those of NE40E-1.
On the network shown in Figure 2, the NE40Es act as the master and backup DHCPv6 servers by running
VRRP. The master DHCPv6 server assigns an IPv6 address to the PC. The DHCPv6 packets that the master
DHCPv6 server sends carry the DHCP unique identifier (DUID), which uniquely identifies the DHCPv6 server.
If RUI is enabled for the two DHCPv6 servers, to ensure that the new master DHCPv6 server sends correct
DHCPv6 packets to the PC after a master/backup switchover, the master and backup DHCPv6 servers must
use the same DUID.
Each DHCPv6 server automatically generates a DUID in the link-layer address (DUID-LL) mode using the virtual MAC address of the VRRP group. This avoids the need to configure a DUID in the link-layer address plus time (DUID-LLT) mode or to configure a DUID statically.
After the DUID is generated in the DUID-LL mode, the master and backup DHCPv6 servers do not use the
globally configured DUID, saving the process of backing up the DUID between the servers.
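Because both servers derive the DUID from the same VRRP virtual MAC address, they obtain identical DUIDs without any extra backup. The sketch below illustrates the standard DUID-LL layout (a 2-byte type of 3, a 2-byte hardware type of 1 for Ethernet, then the link-layer address); the virtual MAC value shown is an example, not taken from the document.

```python
# Sketch of DUID-LL construction from the VRRP group's virtual MAC address.
# Layout: 2-byte DUID type (3 = link-layer address), 2-byte hardware type
# (1 = Ethernet), then the link-layer address itself.

import struct

DUID_LL = 3       # DUID type: link-layer address
HW_ETHERNET = 1   # hardware type for Ethernet

def duid_ll_from_mac(mac: str) -> bytes:
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    return struct.pack("!HH", DUID_LL, HW_ETHERNET) + mac_bytes

# Both servers derive the DUID from the same VRRP virtual MAC, so they
# obtain identical DUIDs without backing the DUID up between them.
virtual_mac = "00:00:5e:00:01:01"  # example VRRP virtual MAC (VRID 1)
print(duid_ll_from_mac(virtual_mac).hex())  # 0003000100005e000101
```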
On the network shown in Figure 3, the NE40Es act as the master and backup DHCPv6 relay agents. A unique
DHCPv6 relay agent remote-ID identifies the master DHCPv6 relay agent. In the RUI-enabled scenario, to
enable the backup DHCPv6 relay agent to forward the DHCPv6 packets after a master/backup switchover,
the master and backup DHCPv6 relay agents must use the same DHCPv6 relay agent remote-ID. This way
ensures that the DHCPv6 server processes the packets correctly.
The RUI-enabled NE40Es use the DUID that identifies the master and backup DHCPv6 servers as the DHCPv6 relay agent remote-ID, which identifies both the master and backup DHCPv6 relay agents.
Networking Description
Dual-device ARP hot backup enables the master device to back up the ARP entries at the control and
forwarding layers to the backup device in real time. When the backup device switches to a master device, it
uses the backup ARP entries to generate host routing information. After you deploy dual-device ARP hot
backup, the new master device forwards downlink traffic without needing to relearn ARP entries. Dual-
device ARP hot backup ensures downlink traffic continuity.
Dual-device ARP hot backup applies in both Virtual Router Redundancy Protocol (VRRP) and enhanced trunk (E-Trunk)
scenarios. This section describes the implementation of dual-device ARP hot backup in VRRP scenarios.
Figure 1 shows a typical network topology in which a Virtual Router Redundancy Protocol (VRRP) backup
group is deployed. In the topology, Device A is a master device, and Device B is a backup device. In normal
circumstances, Device A forwards both uplink and downlink traffic. If Device A or the link between Device A
and the switch fails, a master/backup VRRP switchover is triggered to switch Device B to the Master state.
Device B needs to advertise a network segment route to a device on the network side to direct downlink
traffic from the network side to Device B. If Device B has not learned ARP entries from a device on the user
side, the downlink traffic is interrupted. Device B forwards the downlink traffic only after it learns ARP
entries from a device on the user side.
Device A and Device B receive ARP packets sent by Device C, and each device learns incomplete ARP entries. In this case, Device A and Device B need to learn ARP entries from each other and back up ARP information for each other. If Device A fails, services can be switched to Device B, which prevents A-to-C or B-to-C traffic interruptions.
Feature Deployment
To prevent downlink traffic from being interrupted because Device B does not learn ARP entries from a
device on the user side, deploy dual-device ARP hot backup on Device A and Device B, as shown in Figure 3.
After the deployment, Device B backs up the ARP entries on Device A in real time. If a master/backup VRRP
switchover occurs, Device B forwards downlink traffic based on the backup ARP entries without needing to
relearn ARP entries from a device on the user side.
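The deployment above can be pictured as a simple real-time mirror: every ARP entry the master learns is pushed to the backup over the backup channel, so a new master can forward downlink traffic immediately after a switchover. The sketch below is illustrative only; the class, entry format, and interface name are hypothetical.

```python
# Illustrative sketch of dual-device ARP hot backup: the master mirrors each
# learned ARP entry to the backup in real time. Names are hypothetical.

class Device:
    def __init__(self, name):
        self.name = name
        self.arp_table = {}   # IP address -> (MAC address, interface)
        self.peer = None      # backup channel to the other device

    def learn_arp(self, ip, mac, interface):
        self.arp_table[ip] = (mac, interface)
        if self.peer is not None:
            # Real-time hot backup of the entry over the backup channel.
            self.peer.arp_table[ip] = (mac, interface)

device_a, device_b = Device("DeviceA"), Device("DeviceB")
device_a.peer = device_b

device_a.learn_arp("10.1.1.2", "00:1a:2b:3c:4d:5e", "GE0/1/0")
# After a master/backup VRRP switchover, DeviceB already holds the entry
# and can forward downlink traffic without relearning ARP:
print(device_b.arp_table["10.1.1.2"])  # ('00:1a:2b:3c:4d:5e', 'GE0/1/0')
```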
Networking Description
Dual-device IGMP snooping hot backup enables the master and backup devices to generate multicast entries synchronously in real time. IGMP protocol packets are synchronized from the master device to the backup device so that the same multicast forwarding entries can be generated on the backup device. After you deploy dual-device IGMP snooping hot backup, the new master device forwards downlink traffic without needing to relearn multicast forwarding entries through IGMP snooping. Dual-device IGMP snooping hot backup ensures downlink traffic continuity.
Figure 1 shows a typical network topology in which an Eth-Trunk group is deployed. In the topology, Device A is a master device, and Device B is a backup device. In normal circumstances, Device A forwards both uplink and downlink traffic. If Device A or the link between Device A and the switch fails, a master/backup Eth-Trunk link switchover is triggered to switch Device B to the Master state. Device B
needs to advertise a network segment route to a device on the network side to direct downlink traffic from
the network side to Device B. If Device B has not generated multicast forwarding entries directing traffic to
the user side, the downlink traffic is interrupted. Device B forwards the downlink traffic only after it
generates forwarding entries directing traffic to the user side.
Feature Deployment
To prevent downlink traffic from being interrupted because Device B does not generate multicast forwarding
entries directing traffic to the user side, deploy dual-device IGMP snooping hot backup on Device A and
Device B, as shown in Figure 2.
After the deployment, Device A and Device B generate the same multicast forwarding entries at the same
time. If a master/backup Eth-Trunk link switchover occurs, Device B forwards downlink traffic based on the
generated multicast forwarding entries without needing to generate the entries directing traffic to the user
side.
Networking Description
DHCP server dual-device hot backup effectively implements rapid service switching by keeping user session
information synchronized on the master and backup devices in real time on the control and forwarding
planes. The user session information (including the IP address, MAC address, DHCP lease, and Option 82)
generated during user access from the master device is synchronized to the backup device. When VRRP
detects a link failure on the master device, a VRRP packet is sent to adjust the priority, triggering a
master/backup VRRP switchover. After the master/backup VRRP switchover is performed, the original backup
device takes over to assign addresses for new users or process lease renewal requests from online users.
Users are not aware of DHCP server switching.
Figure 1 shows the typical network with a VRRP group deployed. DeviceA and DeviceB are the master and
backup devices, respectively. Both DeviceA and DeviceB are DHCP servers that assign IP addresses to clients.
In normal situations, DeviceA processes DHCP users' login and lease renewal requests. If DeviceA or the link
between DeviceA and the switch fails, a master/backup VRRP switchover is performed. DeviceB then
becomes the master. DeviceB can assign addresses to new users or process lease renewal requests from
online users only after user session information on DeviceA has been synchronized to DeviceB.
Feature Deployment
If DeviceA or the link between DeviceA and the switch fails, new users cannot go online and the existing
online users cannot renew their leases. To resolve this issue, configure DHCP server dual-device hot backup
on DeviceA and DeviceB.
On the network shown in Figure 2, after DHCP server dual-device hot backup is configured on DeviceA and
DeviceB, DeviceB synchronizes user session information from DeviceA in real time. If a master/backup VRRP
switchover occurs, DeviceB can assign addresses to new users or process lease renewal requests from online
users based on the user session information synchronized from DeviceA.
Dual-homing access may fail to be deployed in a multi-node backup scenario due to insufficient resources. If
this problem occurs, single-homing access can be used. On the network shown in Figure 1, network traffic
can be forwarded by either NE40E 1 or NE40E 2. If common single-homing access is used, NE40E 2 will discard User1's change-of-authorization (COA) or disconnect message (DM) and web authentication response messages upon receipt, causing User1's COA/DM and web authentications to fail. The same problem occurs if the link between NE40E 1 and the network becomes faulty.
To resolve the preceding problem, configure user data virtual backup between NE40E 1 and NE40E 2. On the
network shown in Figure 2, information about User1's identity is backed up on NE40E 2. The aggregation
switch S1 is single-homed to NE40E 1. VRRP is deployed on the access side, with one VRRP protection group
deployed for each pair of active and standby links. Users access the network through the link whose VRRP
group is in the Master state. If User1's COA/DM or web authentication response messages are randomly
delivered to NE40E 2, user data virtual backup allows NE40E 2 to forward the response messages to NE40E
1. Additionally, if the link between NE40E 1 and the network fails, NE40E 2 can take over the traffic on the
faulty link, preventing traffic interruption.
Single-homing access in a multi-node backup scenario can be implemented only after user data virtual backup is
configured.
Figure 2 Dual-homing access through the ring network (semi-ring) formed by aggregation switches
In the topology shown in Figure 1, focus on the VLAN planning, and make sure that the two NE40Es can be
accessed by users simultaneously.
Load Balancing Based on Odd and Even MAC Addresses
This section describes load balancing based on the odd and even media access control (MAC) addresses
carried in user packets.
As shown in Figure 1, two Virtual Router Redundancy Protocol (VRRP) groups are deployed on the access
side. One VRRP group uses NE40E 1 as the master and NE40E 2 as the backup, and the other uses NE40E 2
as the master and NE40E 1 as the backup.
In multi-device backup scenarios, configure load balancing based on odd and even MAC addresses to enable
the master NE40E to forward only user packets carrying odd or even MAC addresses.
To determine the forwarding path of uplink traffic and prevent packet disorder, the master and backup
NE40Es in the same virtual local area network (VLAN) must use different virtual MAC addresses to establish
sessions with hosts.
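As a rough illustration of this load-balancing scheme, the sketch below decides the serving device from the last bit of a user's MAC address. The function name and the even/odd-to-device mapping are our assumptions for illustration, not documented NE40E behavior:

```python
# Illustrative sketch only: map users to VRRP groups by MAC address parity.
# Which parity maps to which NE40E is an assumption here.
def serving_device(mac: str) -> str:
    """Return the device acting as master for this user's MAC address."""
    last_octet = int(mac.split(":")[-1], 16)
    return "NE40E 1" if last_octet % 2 == 0 else "NE40E 2"

print(serving_device("00:1a:2b:3c:4d:5e"))  # even MAC -> NE40E 1
print(serving_device("00:1a:2b:3c:4d:5f"))  # odd MAC  -> NE40E 2
```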
As shown in Figure 1, the NE40Es serve as multicast replication points. Multicast hot backup does not apply
to VLAN-based or interface-based multicast replication.
Networking Description
Dual-device ND hot backup enables the master device to back up ND entries at the control and forwarding
layers to the backup device in real time. When the backup device becomes the master device, it uses the
backed-up ND entries to generate host route information. After you deploy dual-device ND hot backup, once
a master/backup VRRP6 switchover occurs, the new master device forwards downlink traffic without needing
to relearn ND entries, ensuring downstream traffic continuity.
Figure 1 shows a typical network topology in which a VRRP6 backup group is deployed. In the topology,
Device A is a master device, and Device B is a backup device. In normal circumstances, Device A forwards
both upstream and downstream traffic. If Device A or the link between Device A and the switch fails, a
master/backup VRRP6 switchover is triggered and Device B becomes the master device. Then, Device B needs
to advertise network segment routes to devices on the network side so that downstream traffic is directed
from the network side to Device B. If Device B has not learned ND entries from user-side devices, the
downstream traffic is interrupted. Therefore, downstream traffic can be properly forwarded only after Device
B is deployed with ND dual-device hot backup and learns ND entries of user-side devices.
In addition to a master/backup VRRP6 switchover, a master/backup E-Trunk switchover can also cause this
problem. Therefore, dual-device ND hot backup also applies to E-Trunk master/backup scenarios. This section
describes the implementation of dual-device ND hot backup in VRRP6 scenarios.
Feature Deployment
As shown in Figure 2, a VRRP6 backup group is configured on Device A and Device B. Device A is a master
device, and Device B is a backup device. Device A forwards upstream and downstream traffic.
If Device A or the link between Device A and the switch fails, a master/backup VRRP6 switchover is triggered
and Device B becomes the master device. Device B advertises network segment routes to network-side
devices and downstream traffic is directed to Device B.
• Before dual-device ND hot backup is deployed, Device B has not learned the ND entries of user-side
devices. Therefore, a large number of ND Miss messages are generated, consuming system resources and
interrupting downstream traffic.
• After you deploy dual-device ND hot backup, Device B backs up ND information on Device A in real
time. When Device B receives downstream traffic, it forwards the downstream traffic based on the
backup ND information.
Terms
Dual-device backup: A feature in which one device functions as the master device and the other functions as
the backup device. In normal circumstances, the master device provides service access, and the backup
device monitors the running status of the master device. When the master device fails, the backup device
becomes the master and provides service access, ensuring service traffic continuity.
Remote Backup Profile: A configuration template that provides a unified user interface for dual-system
backup configurations.
Remote Backup Service: An inter-device backup channel used to synchronize data between two devices so
that user services can smoothly switch from a faulty device to the other device during a master/backup
device switchover.
Redundancy User Information: A Huawei-proprietary protocol used by devices to back up user information
to each other over TCP connections.
DR: designated router
TE: Traffic Engineering
Definition
A bit error refers to the deviation between a bit that is sent and the bit that is received. Cyclic redundancy
checks (CRCs) are commonly used to detect bit errors. Bit errors caused by line faults can be corrected by
rectifying the associated link faults. Random bit errors caused by optical fiber aging or optical signal jitter,
however, are more difficult to correct. Bit-error-triggered protection switching is a reliability mechanism that
triggers protection switching based on bit error events (bit error occurrence event or correction event) to
minimize bit error impact.
Purpose
The demand for network bandwidth is rapidly increasing as mobile services evolve from narrowband voice
services to integrated broadband services, including voice and streaming media. Meeting the demand for
bandwidth with traditional bearer networks dramatically raises carriers' operation costs. To tackle the
challenges posed by this rapid broadband-oriented development, carriers urgently need mobile bearer
networks that are flexible, low-cost, and highly efficient. IP-based mobile bearer networks are an ideal
choice. IP radio access networks (IPRANs), a type of IP-based mobile bearer network, are being increasingly
widely used.
Traditional bearer networks minimize bit error impact through retransmission or a mechanism in which the
receiving end accepts only one of the multiple packet copies sent by the transmitting end. When carrying
broadband services, IPRANs have higher reliability requirements than traditional bearer networks. Traditional
fault detection mechanisms cannot trigger protection switching based on random bit errors. As a result, bit
errors may degrade or even interrupt services on an IPRAN.
To solve this problem, configure bit-error-triggered protection switching.
To prevent impacts on services, check whether protection links have sufficient bandwidth resources before deploying bit-
error-triggered protection switching.
Benefits
Bit-error-triggered protection switching offers the following benefits:
• Protects traffic against random bit errors, meeting high reliability requirements and improving service
quality.
• Enables devices to record bit error events. These records help carriers locate the nodes or lines that have
bit errors and take corrective measures accordingly.
Background
Bit-error-triggered protection switching enables link bit errors to trigger protection switching on network
applications, minimizing the impact of bit errors on services. To implement bit-error-triggered protection
switching, establish an effective bit error detection mechanism to ensure that network applications promptly
detect bit errors.
Related Concepts
Bit error detection involves the following concepts:
• Bit error: deviation between a bit that is sent and the bit that is received.
• BER: number of bit errors divided by the total number of bits transferred during a certain period. The
BER can be considered an approximate estimate of the probability that any particular bit is received in
error.
• LSP BER: calculation result based on the BER of each node on an LSP.
• Link-quality: applies to link quality adjustment. This type of detection triggers route cost changes and in
turn route reconvergence to prevent bit errors from affecting services.
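The BER definition above can be expressed as a minimal sketch; the function name and the sample numbers are illustrative only:

```python
# Minimal sketch (not device code): BER over a sampling interval.
def bit_error_rate(error_bits: int, total_bits: int) -> float:
    """BER = number of bit errors / total number of transferred bits."""
    if total_bits == 0:
        return 0.0
    return error_bits / total_bits

# Example: 3 errored bits out of 10^9 transferred bits.
print(bit_error_rate(3, 10**9))  # 3e-09
```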
Nodes along the LSP use BFD packets to advertise the BER. On the network shown in Figure 1, a dynamic CR-LSP is deployed from
PE1 to PE2. If both the transit node P and egress PE2 detect bit errors:
1. The P node obtains the local BER and sends PE2 a BFD packet carrying the BER.
2. PE2 obtains the local BER. After receiving the BER from the P node, PE2 calculates the BER of the CR-
LSP based on the BER received and the local BER.
3. PE2 sends PE1 a BFD packet carrying the BER of the CR-LSP.
4. After receiving the BER of the CR-LSP, PE1 determines the bit error status based on a specified
threshold. If the BER exceeds the threshold, PE1 performs protection switching.
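The BER calculation and threshold check in steps 1 to 4 can be sketched as follows; the exact combination formula and the 1e-6 threshold are our illustrative assumptions, not the device's documented algorithm:

```python
# Sketch of the BER advertisement flow described above (illustrative only).
# For small per-node BERs the LSP BER is approximately their sum; the
# combination assumed here is 1 - prod(1 - ber_i).
def lsp_ber(node_bers):
    p_ok = 1.0
    for ber in node_bers:
        p_ok *= (1.0 - ber)
    return 1.0 - p_ok

def needs_switchover(ber, threshold=1e-6):  # threshold value is illustrative
    """Ingress-side check: switch protection if BER exceeds the threshold."""
    return ber >= threshold

# P-node BER plus egress-local BER, combined at PE2 and checked at PE1.
total = lsp_ber([2e-7, 9e-7])
print(needs_switchover(total))  # True: combined BER exceeds 1e-6
```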
1. The P node uses AIS packets to notify PE2 of the bit error event.
2. After receiving the AIS packets, PE2 reports an AIS alarm to trigger local protection switching. PE2
then sends CRC-AIS packets to PE1 and uses the APS protocol to complete protection switching
through negotiation with PE1.
Background
If bit errors occur on an interface, deploy bit-error-triggered section switching to trigger an upper-layer
application associated with the interface for a service switchover.
Implementation Principles
Trigger-section bit error detection must be enabled on an interface. After detecting bit errors on an inbound
interface, a device notifies the interface management module of the bit errors. The link layer protocol status
of the interface then changes to bit-error-detection Down, triggering an upper-layer application associated
with the interface for a service switchover. After the bit errors are cleared, the link layer protocol status of
the interface changes to Up, triggering an upper-layer application associated with the interface for a service
switchback. The device also notifies the BFD module of the bit error status, and then uses BFD packets to
advertise the bit error status to the peer device.
• If bit-error-triggered section switching also has been deployed on the peer device, the bit error status is
advertised to the interface management module of the peer device. The link layer protocol status of the
interface then changes to bit-error-detection Down or Up, triggering an upper-layer application
associated with the interface for a service switchover or switchback.
• If bit-error-triggered section switching is not deployed on the peer device, the peer device cannot detect
the bit error status of the interface's link. In this case, the peer device can only depend on an upper-
layer application (for example, IGP) for link fault detection.
For example, on the network shown in Figure 1, trigger-section bit error detection is enabled on each
interface, and nodes communicate through IS-IS routes. In normal cases, IS-IS routes on PE1 and PE2 are
preferentially transmitted over the primary link. Therefore, traffic in both directions is forwarded over the
primary link. If PE2 detects bit errors on the interface to PE1:
• The link layer protocol status of the interface changes to bit-error-detection Down, triggering IS-IS
routes to be switched to the secondary link. Traffic from PE2 to PE1 is then forwarded over the
secondary link. PE2 uses a BFD packet to notify PE1 of the bit errors.
• After receiving the BFD packet, PE1 sets the link layer protocol status of the corresponding interface to
bit-error-detection Down, triggering IS-IS routes to be switched to the secondary link. Traffic from PE1
to PE2 is then forwarded over the secondary link.
If trigger-section bit error detection is not supported or enabled on PE1's interface to PE2, PE1 can only use
IS-IS to detect that the primary link is unavailable, and then performs an IS-IS route switchover.
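A minimal sketch of the interface status transitions described above; the class and attribute names are ours, not NE40E internals:

```python
# Illustrative state sketch: detecting bit errors on an inbound interface
# flips the link-layer protocol status, which the upper-layer application
# (e.g. IS-IS) uses to switch routes to the secondary link.
class Interface:
    def __init__(self):
        self.protocol_status = "Up"

    def on_bit_error(self, detected: bool) -> str:
        """Update and return the link-layer protocol status."""
        self.protocol_status = (
            "bit-error-detection Down" if detected else "Up"
        )
        return self.protocol_status

intf = Interface()
print(intf.on_bit_error(True))   # bit-error-detection Down -> switch to secondary link
print(intf.on_bit_error(False))  # Up -> switch back to primary link
```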
Usage Scenario
If LDP LSPs are used, deploy bit-error-triggered section switching to cope with link bit errors on the LDP
LSPs.
After bit-error-triggered section switching is deployed, if bit errors occur on both the primary and secondary links on an
LDP LSP, the interface status changes to bit-error-detection Down on both the primary and secondary links. As a result,
services are interrupted. Therefore, it is recommended that you deploy bit-error-triggered IGP route switching.
Background
Bit-error-triggered section switching can cope with link bit errors. If bit errors occur on both the primary and
secondary links, bit-error-triggered section switching changes the interface status on both the primary and
secondary links to bit-error-detection Down. As a result, services are interrupted because no link is available.
To resolve the preceding issue, deploy bit-error-triggered IGP route switching. After the deployment is
complete, link bit errors trigger IGP route costs to be adjusted, preventing upper-layer applications from
transmitting service traffic to links with bit errors. Bit-error-triggered IGP route switching ensures normal
running of upper-layer applications and minimizes the impact of bit errors on services.
Implementation Principles
Link-quality bit error detection must be enabled on an interface. After detecting bit errors on an inbound
interface, a device notifies the interface management module of the bit errors. The link quality level of the
interface then changes to Low, triggering an IGP (OSPF or IS-IS) to increase the cost of the interface's link. In
this case, IGP routes do not preferentially select the link with bit errors. After the bit errors are cleared, the
link quality level of the interface changes to Good, triggering the IGP to restore the original cost for the
interface's link. In this case, IGP routes preferentially select the link again. The device also notifies the BFD
module of the bit error status, and then uses BFD packets to advertise the bit error status to the peer device.
• If bit-error-triggered IGP route switching also has been deployed on the peer device, the bit error status
is advertised to the interface management module of the peer device. The link quality level of the
interface then changes to Low or Good, triggering the IGP to increase the cost of the interface's link or
restore the original cost for the link. IGP routes on the peer device then do not preferentially select the
link with bit errors or preferentially select the link again.
• If bit-error-triggered IGP route switching is not deployed on the peer device, the peer device cannot
detect the bit error status of the interface's link. Therefore, the IGP does not adjust the cost of the link.
Traffic from the peer device may still pass through the link with bit errors. As a result, bidirectional IGP
routes pass through different links. The local device can receive traffic properly, and services are not
interrupted. However, the impact of bit errors on services cannot be eliminated.
For example, on the network shown in Figure 1, link-quality bit error detection is enabled on each interface,
and nodes communicate through IS-IS routes. In normal cases, IS-IS routes on PE1 and PE2 are preferentially
transmitted over the primary link. Therefore, traffic in both directions is forwarded over the primary link. If
PE2 detects bit errors on interface 1:
• PE2 adjusts the link quality level of interface 1 to Low, triggering IS-IS to increase the cost of the
interface's link to a value (for example, 40). PE2 uses a BFD packet to advertise the bit errors to PE1.
• After receiving the BFD packet, PE1 also adjusts the link quality level of interface 1 to Low, triggering IS-
IS to increase the cost of the interface's link to a value (for example, 40).
IS-IS routes on both PE1 and PE2 preferentially select the secondary link, because the cost (20) of the
secondary link is less than the cost (40) of the primary link. Traffic in both directions is then switched to the
secondary link.
If bit-error-triggered IGP route switching is not supported or enabled on PE1, PE1 cannot detect the bit
errors. In this case, PE1 still sends traffic to PE2 through the primary link. PE2 can receive traffic properly, but
services are affected by the bit errors.
If PE2 detects bit errors on both interface 1 and interface 2, PE2 adjusts the link quality levels of the
interfaces to Low, triggering the costs of the interfaces' links to be increased to 40. IS-IS routes on PE2 still
preferentially select the primary link to ensure service continuity, because the cost (40) of the primary link is
less than the cost (50) of the secondary link. To eliminate the impact of bit errors on services, you must
manually restore the link quality.
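The cost comparisons in this example can be sketched as follows, using the illustrative cost values from the text (the IGP prefers the link with the lowest cost):

```python
# Sketch of the IS-IS cost comparison described above (values from the
# text's example: primary cost raised to 40, secondary costs 20 and 50).
def preferred_link(primary_cost: int, secondary_cost: int) -> str:
    """Return which link IGP routes prefer (lowest cost wins)."""
    return "primary" if primary_cost <= secondary_cost else "secondary"

print(preferred_link(40, 20))  # bit errors on primary only -> secondary
print(preferred_link(40, 50))  # bit errors on both links -> primary stays preferred
```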
Bit-error-triggered section switching and bit-error-triggered IGP route switching are mutually exclusive.
Usage Scenario
If LDP LSPs are used, deploy bit-error-triggered IGP route switching to cope with link bit errors on the LDP
LSPs. Bit-error-triggered IGP route switching ensures service continuity even if bit errors occur on both the
primary and secondary links on an LDP LSP. Therefore, it is recommended that you deploy bit-error-
triggered IGP route switching.
Background
If a trunk interface is used to increase bandwidth, improve reliability, and implement load balancing, deploy
bit-error-triggered trunk update to cope with bit errors detected on trunk member interfaces.
Implementation Principles
According to the types of protection switching triggered, bit-error-triggered trunk update is classified as
follows:
Trunk-bit-error-triggered section switching
On the network shown in Figure 1, trigger-section or trigger-LSP bit error detection must be enabled on
each trunk member interface. After detecting bit errors on a trunk interface's member interface, a device
advertises the bit errors to the trunk interface, triggering the trunk interface to delete the member interface
from the forwarding plane. The trunk interface then does not select the member interface to forward traffic.
After the bit errors are cleared from the member interface, the trunk interface re-adds the member interface
to the forwarding plane. The trunk interface can then select the member interface to forward traffic. If bit
errors occur on all trunk member interfaces or the number of member interfaces without bit errors is lower
than the lower threshold for the trunk interface's Up links, the trunk interface goes Down. An upper-layer
application associated with the trunk interface is then triggered to perform a service switchover. If the
number of member interfaces without bit errors reaches the lower threshold for the trunk interface's Up
links, the trunk interface goes Up. An upper-layer application associated with the trunk interface is then
triggered to perform a service switchback.
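The trunk Up/Down rule described above can be sketched as a simple count check; the function name and parameters are illustrative assumptions:

```python
# Illustrative sketch: a trunk stays Up only while the number of
# bit-error-free member links is at least the configured lower threshold
# for the trunk's Up links.
def trunk_is_up(total_members: int, errored_members: int,
                min_up_links: int) -> bool:
    """Return True if the trunk remains Up under the threshold rule."""
    healthy = total_members - errored_members
    return healthy >= min_up_links

print(trunk_is_up(4, 1, 2))  # True: 3 healthy members >= threshold 2
print(trunk_is_up(4, 3, 2))  # False: 1 healthy member -> trunk goes Down
```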
The device also notifies the BFD module of the bit error status, and then uses BFD packets to advertise the
bit error status to the peer device connected to the trunk interface.
• If trunk-bit-error-triggered section switching also has been deployed on the peer device, the bit error
status is advertised to the trunk interface of the peer device. The trunk interface is then triggered to
delete or re-add the member interface from or to the forwarding plane. The trunk interface is also
triggered to go Down or Up, implementing switchover or switchback synchronization with the device.
• If trunk-bit-error-triggered section switching is not deployed on the peer device, the peer device cannot
detect the bit error status of the interface's link. To ensure normal running of services, the device can
receive traffic from the member interface with bit errors in the following cases:
■ The trunk interface of the device has deleted the member interface with bit errors from the
forwarding plane or has gone Down.
■ The trunk interface of the peer device can still forward traffic.
Trunk-bit-error-triggered IGP route switching
After detecting bit errors on a trunk interface's member interface, a device advertises the bit errors to the
trunk interface, triggering the trunk interface to delete the member interface from the forwarding plane. The trunk
interface then does not select the member interface to forward traffic. After the bit errors are cleared from
the member interface, the trunk interface re-adds the member interface to the forwarding plane. The trunk
interface can then select the member interface to forward traffic. If bit errors occur on all trunk member
interfaces or the number of member interfaces without bit errors is lower than the lower threshold for the
trunk interface's Up links, the trunk interface ignores the bit errors on the member interfaces and remains
Up. However, the link quality level of the trunk interface becomes Low, triggering an IGP (OSPF or IS-IS) to
increase the cost of the trunk interface's link. IGP routes then do not preferentially select the link. If the
number of member interfaces without bit errors reaches the lower threshold for the trunk interface's Up
links, the link quality level of the trunk interface changes to Good, triggering the IGP to restore the original
cost for the trunk interface's link. In this case, IGP routes preferentially select the link again.
The device also notifies the BFD module of the bit error status, and then uses BFD packets to advertise the
bit error status to the peer device connected to the trunk interface.
• If trunk-bit-error-triggered IGP route switching also has been deployed on the peer device, the bit error
status is advertised to the trunk interface of the peer device. The trunk interface is then triggered to
delete or re-add the member interface from or to the forwarding plane. The link quality level of the
trunk interface is also triggered to change to Low or Good. In this case, the cost of IGP routes is
adjusted, implementing switchover or switchback synchronization with the device.
• If trunk-bit-error-triggered IGP route switching is not deployed on the peer device, the peer device
cannot detect the bit error status of the interface's link. If the trunk interface of the device has deleted
the member interface with bit errors from the forwarding plane, the trunk interface of the peer device
may still select the member interface to forward traffic. Similarly, if the link quality level of the trunk
interface on the device has changed to Low, the IGP is triggered to increase the cost of the trunk
interface's link. In this case, IGP routes do not preferentially select the link. However, IGP on the peer
device does not adjust the cost of the link. Traffic from the peer device may still pass through the link
with bit errors. As a result, bidirectional IGP routes pass through different links. To ensure normal
running of services, the device can receive traffic from the member interface with bit errors. However,
bit errors may affect service quality.
Layer 2 trunk interfaces do not support an IGP. Therefore, bit-error-triggered IGP route switching cannot be deployed on
Layer 2 trunk interfaces. If bit errors occur on all Layer 2 trunk member interfaces or the number of member interfaces
without bit errors is lower than the lower threshold for the trunk interface's Up links, the trunk interface remains in the
Up state. As a result, protection switching cannot be triggered. To eliminate the impact of bit errors on services, you
must manually restore the link quality.
Usage Scenario
If a trunk interface is deployed, deploy bit-error-triggered trunk update to cope with bit errors detected on
trunk member interfaces. Trunk-bit-error-triggered IGP route switching is recommended.
Background
To cope with link bit errors along an RSVP-TE tunnel and reduce the impact of bit errors on services, deploy
bit-error-triggered RSVP-TE tunnel switching. After the deployment is complete, service traffic is switched
from the primary CR-LSP to the backup CR-LSP if bit errors occur.
Implementation Principles
On the network shown in Figure 1, trigger-LSP bit error detection must be enabled on each node's interfaces
on the RSVP-TE tunnels. To implement dual-ended switching, configure the RSVP-TE tunnels in both
directions as bidirectional associated CR-LSPs. If a node on a CR-LSP detects bit errors in a direction, the
ingress of the tunnel obtains the BER of the CR-LSP after BER calculation and advertisement. For details, see
Bit Error Detection.
The ingress then determines the bit error status of the CR-LSP based on the BER thresholds configured for
the RSVP-TE tunnel. For the rules used to determine the bit error status of the CR-LSP, see Figure 2.
• If the BER of the CR-LSP is greater than or equal to the switchover threshold of the RSVP-TE tunnel, the
CR-LSP enters the excessive BER state.
• If the BER of the CR-LSP falls below the switchback threshold, the CR-LSP changes to the normalized
BER state.
Figure 2 Rules for determining the bit error status of the CR-LSP
After the bit error statuses of the primary and backup CR-LSPs are determined, the RSVP-TE tunnel
determines whether to perform a primary/backup CR-LSP switchover based on the following rules:
• If the primary CR-LSP is in the excessive BER state, the RSVP-TE tunnel attempts to switch traffic to the
backup CR-LSP.
• If the primary CR-LSP changes to the normalized BER state or the backup CR-LSP is in the excessive BER
state, traffic is switched back to the primary CR-LSP.
The RSVP-TE tunnel in the opposite direction also performs the same switchover, so that traffic in the
upstream and downstream directions is not transmitted over the CR-LSP with bit errors.
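The switchover and switchback thresholds above form a hysteresis state machine, sketched below with illustrative threshold values (the real thresholds are configured per RSVP-TE tunnel):

```python
# Hysteresis sketch of the CR-LSP bit error status rules described above.
# A CR-LSP enters the excessive BER state at/above the switchover threshold
# and returns to the normalized state only below the switchback threshold.
def next_state(state: str, ber: float,
               switchover: float = 1e-5,     # illustrative threshold
               switchback: float = 1e-6) -> str:  # illustrative threshold
    if ber >= switchover:
        return "excessive"
    if ber < switchback:
        return "normalized"
    return state  # between the thresholds, the state is unchanged

s = "normalized"
s = next_state(s, 2e-5)  # above switchover -> excessive
s = next_state(s, 5e-6)  # between thresholds -> still excessive
s = next_state(s, 5e-7)  # below switchback -> normalized
print(s)  # normalized
```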
Usage Scenario
If RSVP-TE tunnels are used as public network tunnels, deploy bit-error-triggered RSVP-TE tunnel switching
to cope with link bit errors along the tunnels.
Background
SR-MPLS TE LSP establishment does not require a signaling protocol. Therefore, an SR-MPLS TE LSP can be
established as long as a label stack is delivered. If an SR-MPLS TE LSP encounters bit errors, upper-layer
services may be affected.
To cope with link bit errors along an SR-MPLS TE tunnel and reduce the impact of bit errors on services,
deploy bit-error-triggered SR-MPLS TE LSP switching. After this function is enabled, service traffic is switched
from the primary SR-MPLS TE LSP to the backup SR-MPLS TE LSP if bit errors occur.
Implementation Principles
On the network shown in Figure 1, bit error detection must be enabled on the PEs along the SR-MPLS TE
tunnel. If static BFD detects bit errors on the primary LSP of the SR-MPLS TE tunnel, it instructs the tunnel
to switch traffic from the primary LSP to the backup LSP, minimizing the impact on services.
The SR-MPLS TE tunnel is unidirectional. To detect bit errors on the LSP from PE1 to PE2, enable bit error
detection on PE1; to detect bit errors on the LSP from PE2 to PE1, enable bit error detection on PE2.
Usage Scenario
If an SR-MPLS TE tunnel is used as a public network tunnel, deploy bit-error-triggered SR-MPLS TE LSP
switching to cope with link bit errors along the tunnel.
Background
When PW redundancy is configured for L2VPN services, bit-error-triggered switching can be configured. With
this function, if bit errors occur, services can switch between the primary and secondary PWs.
Principles
Trigger-LSP bit error detection must be enabled on each node's interfaces. PW redundancy can be configured
in either a single-segment or a multi-segment scenario.
■ PE2 switches traffic destined for PE1 to the path bypass PW -> PE3 -> secondary PW -> PE1 and
sends a BFD packet to notify PE1 of the bit errors.
■ Upon receipt of the BFD packet, PE1 switches traffic destined for PE2 to the path secondary PW->
PE3 -> bypass PW -> PE2.
Traffic between PE1 and PE2 can travel along bit-error-free links.
■ PE2 switches traffic destined for PE1 to the path bypass PW -> PE3 -> PW2 -> SPE2 -> secondary
PW -> PE1 and sends a BFD packet to notify SPE1 of the bit errors.
■ Upon receipt of the BFD packet, SPE1 sends an LDP Notification message to notify PE1 of the bit
errors.
■ Upon receipt of the notification, PE1 switches traffic destined for PE2 to the path secondary PW ->
SPE2 -> PW2 -> PE3 -> bypass PW-> PE2.
Traffic between PE1 and PE2 can travel along bit-error-free links. If bit errors occur on a link between
PE1 and SPE1, the processing is the same as that in the single-segment PW redundancy scenario.
After traffic switches to the secondary PW and the bit errors on the primary PW are cleared, traffic switches
back to the primary PW based on a configured switchback policy.
If an RSVP-TE tunnel is established for PWs, and bit-error-triggered RSVP-TE tunnel switching is configured, a switchover
is preferentially performed between the primary and hot-standby CR-LSPs in the RSVP-TE tunnel. A primary/secondary
PW switchover can be triggered only if the primary/hot-standby CR-LSP switchover fails to remove bit errors in either of
the following situations:
Usage Scenario
If L2VPN is used to carry user services and PW redundancy is deployed to ensure reliability, deploy bit-error-
triggered switching for PW to minimize the impact of bit errors on user services and improve service quality.
Background
On an FRR-enabled HVPN, bit-error-triggered switching can be configured for VPN routes. With this
function, if bit errors occur on the HVPN, VPN routes re-converge so that traffic switches to a bit-error-free
link.
Principles
Trigger-LSP bit error detection must be enabled on each node's interfaces. In Figure 1, an HVPN is
configured on an IP/MPLS backbone network. VPN FRR is configured on a UPE. If SPE1 detects bit errors, the
processing is as follows:
• SPE1 reduces the Local Preference attribute value or increases the Multi-Exit Discriminator (MED)
attribute value. The preference of the VPN route that SPE1 advertises to an NPE is then reduced. As a
result, the NPE selects the VPN route to SPE2 instead of the VPN route to SPE1, and traffic switches to
the standby link. In addition, SPE1 sends a BFD packet to notify the UPE of the bit errors.
• Upon receipt of the BFD packet, the UPE switches traffic to the standby link over the VPN route
destined for SPE2.
If the bit errors on the active link are removed, the UPE re-selects the VPN routes destined for SPE1, and
SPE1 restores the preference value of the VPN route to be advertised to the NPE. Then the NPE also re-
selects the VPN route destined for SPE1.
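The route preference change described above can be sketched with a simplified BGP-style comparison (higher Local_Pref preferred, then lower MED); the attribute values below are illustrative, not device defaults:

```python
# Sketch of the NPE's route selection described above. Only Local_Pref and
# MED are modeled; real BGP selection has more tie-breakers.
def best_route(routes):
    """Prefer the highest Local_Pref; on a tie, the lowest MED."""
    return max(routes, key=lambda r: (r["local_pref"], -r["med"]))["via"]

normal = [{"via": "SPE1", "local_pref": 200, "med": 0},
          {"via": "SPE2", "local_pref": 100, "med": 0}]
errored = [{"via": "SPE1", "local_pref": 50, "med": 0},  # SPE1 lowered Local_Pref
           {"via": "SPE2", "local_pref": 100, "med": 0}]

print(best_route(normal))   # SPE1: primary route preferred
print(best_route(errored))  # SPE2: traffic switches to the standby link
```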
If an RSVP-TE tunnel is established for an L3VPN, and bit-error-triggered RSVP-TE tunnel switching is configured, a
traffic switchover between the primary and hot-standby CR-LSPs in the RSVP-TE tunnel is preferentially performed. An
active/standby L3VPN route switchover can be triggered only if the primary/hot-standby CR-LSP switchover fails to
remove the bit errors (for example, if bit errors occur on both the primary and hot-standby CR-LSPs).
Usage Scenario
If L3VPN is used to carry user services and VPN FRR is deployed to ensure reliability, deploy bit-error-
triggered L3VPN switching to minimize the impact of bit errors on user services and improve service quality.
Background
In PW/E-PW over static CR-LSP scenarios, if primary and secondary PWs are configured, deploy bit-error-
triggered protection switching. If bit errors occur, service traffic is switched from the primary PW to the
secondary PW.
Implementation Principles
The MAC-layer SD alarm function (Trigger-LSP type) must be enabled on interfaces, and then MPLS-TP
OAM must be deployed to monitor CR-LSPs/PWs. Static PWs/E-PWs are classified as SS-PWs or MS-PWs.
In an SS-PW networking scenario (see Figure 1), the bit error generation and clearing process is as follows:
Bit error generation:
• If the BER on an inbound interface of the P node reaches a specified threshold, the CRC module detects
the bit error status of the inbound interface, notifies all static CR-LSP modules, and constructs and sends
AIS packets to PE2.
• Upon receipt of the AIS packets, PE2 notifies static PWs established over the CR-LSPs of the bit errors
and instructs the TP OAM module to perform APS. APS triggers a primary/backup CR-LSP switchover,
and a PW established over the new primary CR-LSP takes over traffic.
Bit error clearing: After the bit errors are cleared, the CRC module no longer detects bit errors on the
inbound interface and informs the TP OAM module that the bit errors have been cleared. Upon receipt of the
notification, the TP OAM module stops sending AIS packets to PE2, which functions as the egress. If PE2 does
not receive AIS packets within a specified period, it determines that the bit errors have been cleared,
generates an AIS clear alarm, and instructs the TP OAM module to perform APS. APS triggers a
primary/backup CR-LSP switchover, and services are switched back to the PW over the primary CR-LSP.
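The egress-side clearing logic above, where bit errors are declared cleared only after no AIS packet arrives for a specified period, can be modeled as a simple timeout. The class name and the 3-second hold time below are illustrative assumptions, not the device's actual timer.

```python
# Sketch of the egress-side AIS state: bit errors are considered present
# while AIS packets keep arriving, and cleared once none has arrived for
# a hold period. Names and the 3-second period are illustrative.
class AisMonitor:
    def __init__(self, hold_time=3.0):
        self.hold_time = hold_time
        self.last_ais = None

    def on_ais(self, now):
        self.last_ais = now  # AIS received: refresh the timestamp

    def bit_errors_present(self, now):
        if self.last_ais is None:
            return False
        return (now - self.last_ais) < self.hold_time

mon = AisMonitor()
mon.on_ais(now=10.0)
assert mon.bit_errors_present(now=11.0)      # AIS still fresh
assert not mon.bit_errors_present(now=14.0)  # no AIS for 4 s: cleared
```

When `bit_errors_present` transitions from true to false, the egress would raise the AIS clear alarm and trigger the APS switchback described above.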
In an MS-PW networking scenario (see Figure 2), the bit error generation and clearing process is as follows:
Bit error generation:
• The CRC module of an inbound interface on the SPE detects bit errors and determines to send either an
SF or SD alarm based on a specified BER threshold. The CRC module then notifies the TP OAM module
of the bit errors. The TP OAM module advertises the bit error status in RDI packets and performs APS.
The APS module instructs the peer node to perform a traffic switchover, which triggers a
primary/backup CR-LSP switchover. The PW established over the bit-error-free CR-LSP takes over traffic.
• If the BER on an inbound interface of the SPE reaches a specified threshold, the CRC module detects the
bit error status of the inbound interface, sets all static CR-LSP modules to the bit error status, and
constructs and sends AIS packets to PE2.
• Upon receipt of the AIS packets, PE2 notifies the TP OAM module. The TP OAM module then performs
APS, which triggers a primary/backup CR-LSP switchover. The PW established over the bit-error-free CR-
LSP takes over traffic.
Bit error clearing: After the bit errors are cleared, the CRC module no longer detects bit errors on the
inbound interface and informs the TP OAM module that the bit errors have been cleared. Upon receipt of the
notification, the TP OAM module stops sending AIS packets to PE2, which functions as the egress. If PE2 does
not receive AIS packets within a specified period, it determines that the bit errors have been cleared,
generates an AIS clear alarm, and instructs the TP OAM module to perform APS. APS triggers a
primary/backup CR-LSP switchover, and services are switched back to the PW over the primary CR-LSP.
If a tunnel protection group has been deployed for static CR-LSPs carrying PWs/E-PWs, bit errors preferentially trigger
static CR-LSP protection switching. Bit-error-triggered PW protection switching is performed only when bit-error-
triggered static CR-LSP protection switching fails to protect services against bit errors (for example, bit errors occur on
both the primary and backup CR-LSPs).
Usage Scenario
If static CR-LSPs/PWs/E-PWs are used to carry user services and MPLS-TP OAM is deployed to ensure
reliability, deploy bit-error-triggered APS to minimize the impact of bit errors on user services and improve
service quality.
Bit error detection
Description: A device uses the CRC algorithm to detect bit errors on an inbound interface. Bit error
detection types are classified as trigger-LSP, trigger-section, or link-quality.
Prerequisites: -
Deployment: This feature is the basis of other bit-error-triggered protection switching features.
Precautions: To prevent line jitters from frequently triggering service switchovers and switchbacks, set the
bit error alarm clear threshold to be one order of magnitude lower than the bit error alarm generation
threshold.

Bit-error-triggered section switching
Description: If bit errors are generated or cleared on an interface, the link layer protocol status of the
interface changes to bit-error-detection Down or Up, triggering an upper-layer application associated with
the interface to perform a service switchover or switchback.
Prerequisites: Trigger-section bit error detection must be enabled on an interface. The bit error status must
be advertised using BFD packets.
Deployment: This feature is independently deployed. When deploying trunk-bit-error-triggered section
switching, you can enable bit-error-triggered section switching on trunk member interfaces.
Precautions: Enable bit-error-triggered section switching on the interfaces at both ends of a link. If bit
errors occur on both the primary and secondary links, bit-error-triggered section switching may interrupt
services. Therefore, bit-error-triggered IGP route switching is recommended.

Bit-error-triggered IGP route switching
Description: If bit errors are generated or cleared on an interface, the link quality level of the interface
changes to Low or Good, triggering an IGP (OSPF or IS-IS) to increase the cost of the interface's link or
restore the original cost for the link. IGP routes on the peer device then do not preferentially select the link.
Prerequisites: Link-quality bit error detection must be enabled on an interface. The bit error status must be
advertised using BFD packets.
Deployment: This feature is independently deployed. When deploying trunk-bit-error-triggered IGP route
switching, you must deploy bit-error-triggered IGP route switching on trunk interfaces.
Precautions: Enable bit-error-triggered IGP route switching on the interfaces at both ends of a link.

Bit-error-triggered trunk update
Description: If bit errors are generated or cleared on a trunk member interface, the trunk interface is
triggered to delete or re-add the member interface from or to the forwarding plane. If bit errors occur on all
trunk member interfaces or the number of member interfaces without bit errors is lower than the lower
threshold for the trunk interface's Up links, bit-error-triggered protection switching involves the following
modes:
Trunk-bit-error-triggered section switching: The trunk interface goes Down, triggering an upper-layer
application associated with the trunk interface to perform a service switchover.
Trunk-bit-error-triggered IGP route switching: The trunk interface ignores the bit errors on the member
interfaces and remains Up, but its link quality level changes to Low, triggering IGP route switching.
Prerequisites: When deploying trunk-bit-error-triggered section switching, you must enable trigger-section
or trigger-LSP bit error detection on trunk member interfaces. When deploying trunk-bit-error-triggered IGP
route switching, you must enable link-quality bit error detection on trunk member interfaces. The bit error
status must be advertised using BFD packets.
Deployment: Trunk-bit-error-triggered section switching is independently deployed. When deploying trunk-
bit-error-triggered IGP route switching, you must deploy bit-error-triggered IGP route switching on trunk
interfaces.
Precautions: Enable the same bit-error-triggered protection switching function on the trunk interfaces at
both ends. Trunk-bit-error-triggered IGP route switching is recommended. Layer 2 trunk interfaces do not
support an IGP. Therefore, bit-error-triggered IGP route switching cannot be deployed on Layer 2 trunk
interfaces.

Bit-error-triggered RSVP-TE tunnel switching
Description: The ingress of the primary and backup CR-LSPs determines the bit error statuses of the CR-
LSPs based on link BERs. A service switchover or switchback is then performed based on the bit error
statuses of the CR-LSPs.
Prerequisites: Trigger-LSP bit error detection must be enabled on an interface. The bit error status must be
advertised using BFD packets.
Deployment: This feature is independently deployed. It can also be deployed together with bit-error-
triggered PW switching or bit-error-triggered L3VPN switching.
Precautions: To implement dual-ended switching, deploy bit-error-triggered protection switching on the
RSVP-TE tunnels in both directions and configure the tunnels as bidirectional associated CR-LSPs.

Bit-error-triggered PW switching
Description: If bit errors occur, service traffic is switched from the primary PW to the secondary PW.
Prerequisites: Trigger-LSP bit error detection must be enabled on an interface. The bit error status must be
advertised using BFD packets.
Deployment: This feature is deployed together with bit-error-triggered RSVP-TE tunnel switching.
Precautions: If an RSVP-TE tunnel with bit-error-triggered protection switching enabled is used to carry a
PW, bit-error-triggered RSVP-TE tunnel switching is preferentially performed. Bit-error-triggered PW
switching is performed only when bit-error-triggered RSVP-TE tunnel switching fails to protect services
against bit errors.

Bit-error-triggered L3VPN route switching
Description: If bit errors occur, VPN routes are triggered to reconverge. Service traffic is then switched to
the link without bit errors.
Prerequisites: Trigger-LSP bit error detection must be enabled on an interface. The bit error status must be
advertised using BFD packets.
Deployment: This feature is deployed together with bit-error-triggered RSVP-TE tunnel switching.
Precautions: If an RSVP-TE tunnel with bit-error-triggered protection switching enabled is used to carry an
L3VPN, bit-error-triggered RSVP-TE tunnel switching is preferentially performed. Bit-error-triggered L3VPN
route switching is performed only when bit-error-triggered RSVP-TE tunnel switching fails to protect services
against bit errors.
Background
In an NG MVPN scenario, multicast data flows must be transmitted on a link that has no or few bit errors
because even low bit error rates may cause black screen, erratic display, or frame interruption. If multiple
links exist between the upstream and downstream nodes of an mLDP tunnel and some links are logical
instead of physical direct links, the upstream node randomly selects an outbound interface by default, and
sends packets over the link of the selected interface to the downstream node. Customers require link
switching for NG MVPN multicast data flows if the link in use has a high bit error rate. They expect the
mLDP upstream node to detect bit error rates of the links and switch to an outbound interface that is
connected to a downstream link with few or no bit errors if the link in use is of low quality.
Fundamentals
The upstream and downstream nodes establish an IS-IS neighbor relationship using a logical direct link.
BFD-based bit error detection is enabled on the logical interfaces. After the downstream node detects a bit
error fault on its inbound interface, the downstream node notifies the interface management module of the
fault. The upper-layer service module then takes an action, for example, changing the IGP cost. The
downstream node also notifies the BFD module of the bit error status and uses BFD messages to transmit
the bit error status and bit error rate to the IS-IS neighbor, that is, the upstream node. If the upstream node
is capable of bit error detection based on the IP neighbor type, the BFD module on the upstream node
receives the bit error rate. The mLDP tunnel then selects an outbound interface based on the bit error rates
of links, implementing association between NG MVPN services and bit error rates. After the bit error fault is
rectified, the service action taken on the associated interface is reverted; for example, the IGP cost of the
interface is restored to its original value.
On the network shown in Figure 1, the leaf node and P2 are directly connected using a logical link on the
path Leaf-P1-Root-P2. A physical direct link is also available between the leaf node and P2. On the NG
MVPN, the leaf node is a downstream node, and P1 and P2 are upstream nodes. Normally, the primary path
from the leaf node to the root node is Leaf-P1-Root. If bit errors occur on the interface connecting the leaf
node to P1:
• The leaf node notifies the local interface management module of the bit error fault, triggering IS-IS to
increase the link cost of the interface and switch the IS-IS route to the backup link. The mLDP egress
node then selects the backup unicast path Leaf-P2-Root.
• The leaf node also uses BFD messages to transmit the bit error status to P2. P2 functions as an mLDP
intermediate node and has two links to its downstream node. After P2 receives the bit error rate of the
logical direct link, P2 switches the downstream outbound interface to an interface on the physical direct
link with no bit errors. mLDP outbound interface switching is then complete.
Currently, bit error rate-based protection switching takes effect only after a neighbor relationship is established, and only
mLDP tunnels support bit error rate-based selection of outbound interfaces.
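Under this mechanism, the upstream node's choice of outbound interface reduces to picking the downstream link with the lowest advertised BER. The sketch below is illustrative; the interface names and BER values are hypothetical.

```python
# Pick the downstream outbound interface with the lowest reported BER.
# Interface names and BER values are hypothetical; in practice the BER
# of the logical link is learned from BFD messages sent by the neighbor.
def select_outbound(interfaces):
    return min(interfaces, key=lambda i: i["ber"])

links = [
    {"name": "logical-to-leaf", "ber": 1e-5},   # bit errors reported via BFD
    {"name": "physical-to-leaf", "ber": 0.0},   # clean direct link
]
assert select_outbound(links)["name"] == "physical-to-leaf"
```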
Application Scenarios
In an NG MVPN scenario, multiple links exist between an upstream node and a downstream node, and bit
errors need to be detected on logical instead of physical direct links between the two nodes.
Networking Description
Figure 1 shows typical L2VPN+L3VPN networking in an IP RAN application. A VPWS based on an RSVP-TE
tunnel is deployed at the access layer, an L3VPN based on an RSVP-TE tunnel is deployed at the aggregation
layer, and L2VPN access to L3VPN is configured on the AGGs. To ensure reliability, deploy PW redundancy
for the VPWS, configure VPN FRR protection for the L3VPN, and configure hot-standby protection for the
RSVP-TE tunnels.
Feature Deployment
To prevent the impact of bit errors on services, deploy bit-error-triggered RSVP-TE tunnel switching, bit-
error-triggered PW switching, and bit-error-triggered L3VPN route switching in the scenario shown in Figure
1. The deployment process is as follows:
• Bit-error-triggered L3VPN route switching: Configure bit-error-triggered L3VPN route switching in the
VPNv4 view of AGG1.
Scenario 2
On the network shown in Figure 3, if bit errors occur on both locations 1 and 2, both the primary and
secondary links of the RSVP-TE tunnel between the CSG and AGG1 detect the bit errors. In this case, bit-
error-triggered RSVP-TE tunnel switching cannot protect services against bit errors. The bit errors further
trigger PW and L3VPN route switching.
• After detecting the bit errors, the CSG performs a primary/secondary PW switchover and switches
upstream traffic to AGG2.
• After detecting the bit errors, AGG1 reduces the priority of VPNv4 routes advertised to RSG1, so that
RSG1 preferentially selects VPNv4 routes advertised by AGG2. Downstream traffic is then switched to
AGG2.
Networking Description
Figure 1 shows typical L2VPN+L3VPN networking in an IP RAN application. A VPWS based on an LDP LSP is
deployed at the access layer, an L3VPN based on an LDP LSP is deployed at the aggregation layer, and
L2VPN access to L3VPN is configured on the AGGs. To ensure reliability, deploy LDP and IGP synchronization
for the LDP LSPs, and configure Eth-Trunk interfaces on key links.
Feature Deployment
To prevent the impact of bit errors on services, deploy bit-error-triggered IGP route switching in the scenario
shown in Figure 1. Deploy trunk-bit-error-triggered IGP route switching on the Eth-Trunk interfaces. The
deployment process is as follows:
• Enable link-quality bit error detection on each physical interface and Eth-Trunk member interface.
• Enable bit-error-triggered IGP route switching on each physical interface and Eth-Trunk interface.
Scenario 2
On the network shown in Figure 3, if bit errors occur on location 2 (Eth-Trunk member interface), AGG1
detects the bit errors.
• If the number of member interfaces without bit errors is still higher than the lower threshold for the
Eth-Trunk interface's Up links, the Eth-Trunk interface deletes the Eth-Trunk member interface from the
forwarding plane. In this case, service traffic is still forwarded over the normal path.
• If the number of member interfaces without bit errors is lower than the lower threshold for the Eth-
Trunk interface's Up links, the Eth-Trunk interface ignores the bit errors on the Eth-Trunk member
interface and remains Up. However, the link quality level of the Eth-Trunk interface becomes Low,
triggering an IGP (OSPF or IS-IS) to increase the cost of the Eth-Trunk interface's link. IGP routes then
do not preferentially select the link. AGG1 also uses a BFD packet to advertise the bit errors to the peer
device, so that the peer device also performs the same processing. Both upstream and downstream
traffic are then switched to the paths without bit errors.
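The two cases above can be condensed into a single threshold check, sketched below. The function name and threshold semantics are illustrative assumptions; on a real device this decision is made per trunk by the forwarding plane.

```python
# Sketch of the Eth-Trunk decision described above: a member carrying
# bit errors is removed from the forwarding plane only while enough
# clean members remain; otherwise the trunk stays Up but its link
# quality drops to Low so that an IGP raises the link cost. The
# function name and the threshold comparison are illustrative.
def trunk_action(clean_members, min_up_links):
    if clean_members >= min_up_links:
        return "remove-errored-member"   # traffic stays on clean members
    return "set-link-quality-low"        # keep members, let IGP reroute

assert trunk_action(clean_members=3, min_up_links=2) == "remove-errored-member"
assert trunk_action(clean_members=1, min_up_links=2) == "set-link-quality-low"
```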
Networking Description
Figure 1 shows a typical IP RAN. L2VPN services are carried on static CR-LSPs. CR-LSP APS is configured to
provide tunnel-level protection. Additionally, PW APS/E-PW APS is configured for L2VPN services to provide
service-level protection.
Feature Deployment
To meet high reliability requirements of the IP RAN and protect services against bit errors, configure bit-
error-triggered protection switching for the CR-LSPs/PWs. To do so, enable bit error detection on the
interfaces along the CR-LSPs/PWs, configure the switching type as trigger-LSP, and configure bit error alarm
generation and clearing thresholds. If the BER reaches the bit error alarm threshold configured on an
interface of a device along a static CR-LSP or PW, the device determines that a bit error occurrence event
has occurred and notifies the MPLS-TP OAM module of the event. The MPLS-TP OAM module uses AIS
packets to advertise the bit error status to the egress, and then APS is used to trigger a traffic switchover.
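The generation and clearing thresholds form a hysteresis loop: an alarm raised when the BER reaches the generation threshold is cleared only once the BER drops below the lower clearing threshold, which damps flapping around a single value. A minimal Python sketch, with hypothetical threshold values:

```python
# Hysteresis between the bit error alarm generation and clearing
# thresholds: once raised, the alarm persists until the BER falls below
# the (lower) clearing threshold. Threshold values are hypothetical.
def update_alarm(alarm, ber, gen_thresh=1e-6, clear_thresh=1e-9):
    if not alarm and ber >= gen_thresh:
        return True    # bit error occurrence event
    if alarm and ber < clear_thresh:
        return False   # bit error clear event
    return alarm       # state unchanged between the two thresholds

alarm = update_alarm(False, 1e-5)        # crosses generation threshold
assert alarm is True
alarm = update_alarm(alarm, 1e-8)        # between thresholds: stays raised
assert alarm is True
alarm = update_alarm(alarm, 1e-10)       # below clearing threshold
assert alarm is False
```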
Terms
Term: Bit error
Definition: The deviation between a bit that is sent and the bit that is received. Cyclic redundancy checks
(CRCs) are commonly used to detect bit errors.

Term: BER (bit error rate)
Definition: Indicates the probability that incorrect packets are received and packets are discarded.
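As the definitions note, CRCs detect bit errors by recomputing a checksum at the receiver. A minimal illustration using CRC32 (standing in here for the link-layer CRC; the frame content is arbitrary):

```python
import zlib

# The sender computes a CRC32 over the frame; the receiver recomputes
# it, and any flipped bit makes the check fail.
frame = bytearray(b"example payload")
checksum = zlib.crc32(frame)

# Intact frame passes the check.
assert zlib.crc32(frame) == checksum

# A single flipped bit (a bit error) is detected.
frame[0] ^= 0x01
assert zlib.crc32(frame) != checksum
```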
PW pseudo wire
Purpose
This document describes the interface and link feature in terms of its overview, principle, and applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Commissioning engineers
Security Declaration
• Notice on Limited Command Permission
This document describes the commands used for network deployment and maintenance, but does not
cover confidential commands such as those used for production, assembly, and return-to-factory
inspection. For details about confidential commands, please submit an application.
■ When the password encryption mode is cipher, avoid setting both the start and end characters of a
password to "%^%#", because this causes the password to be displayed directly in the configuration file.
■ Your purchased products, services, or features may use some of users' personal data during service
operation or fault locating. You must define user privacy policies in compliance with local laws and
take proper measures to fully protect personal data.
■ When discarding, recycling, or reusing a device, back up or delete data on the device as required to
prevent data leakage. If you need support, contact after-sales technical support personnel.
• Feature declaration
■ The NetStream feature may be used to analyze the communication information of terminal
customers for network traffic statistics and management purposes. Before enabling the NetStream
feature, ensure that it is performed within the boundaries permitted by applicable laws and
regulations. Effective measures must be taken to ensure that information is securely protected.
■ The mirroring feature may be used to analyze the communication information of terminal
customers for a maintenance purpose. Before enabling the mirroring function, ensure that it is
performed within the boundaries permitted by applicable laws and regulations. Effective measures
must be taken to ensure that information is securely protected.
■ The packet header obtaining feature may be used to collect or store some communication
information about specific customers for transmission fault and error detection purposes. Huawei
cannot offer services to collect or store this information unilaterally. Before enabling the function,
ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Special Declaration
• This document serves only as a guide. The content is written based on device information gathered
under lab conditions. The content provided by this document is intended to be taken as general
guidance, and does not cover all scenarios. The content provided by this document may be different
from the information on user device interfaces due to factors such as version upgrades and differences
in device models, board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are beyond the
scope of this document.
• The maximum values provided in this document are obtained in specific lab environments (for example,
only a certain type of board or protocol is configured on a tested device). The actually obtained
maximum values may be different from the maximum values provided in this document due to factors
such as differences in hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface numbers on devices
for configuration.
• The supported boards are described in the document. Whether a customization requirement can be met
is subject to the information provided at the pre-sales interface.
• In this document, public IP addresses may be used in feature introduction and configuration examples
and are for reference only unless otherwise specified.
• The configuration precautions described in this document may not accurately reflect all scenarios.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates a hazard with a high level of risk which, if not avoided, will
result in death or serious injury.
Indicates a hazard with a low level of risk which, if not avoided, could
result in minor or moderate injury.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made
in earlier issues.
Definition
An interface is a point of interaction between devices on a network. Interfaces are classified into physical
and logical interfaces.
• Logical interfaces are manually configured interfaces that do not exist physically. They are used to
exchange data.
Purpose
A physical interface connects a device to another device using a transmission medium (for example, a cable).
The physical interface and transmission medium together form a transmission channel that transmits data
between the devices. Before data reaches a device, it must pass through the transmission channel. In
addition, sufficient bandwidth must be provided to reduce channel congestion.
A logical interface does not require additional hardware resources, thereby reducing investment costs.
Generally, a switching device provides multiple interfaces, many of which have the same configuration. To
simplify the configuration of interfaces, create an interface group and add interfaces to the interface group.
When you run a command in the interface group view, the system automatically applies the command to all
the interfaces in the interface group. In this manner, interfaces in a group are configured in batches.
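The interface group behavior described above can be sketched as replaying one command across all member interfaces. The class and interface names below are illustrative, not the actual CLI implementation:

```python
# Sketch of interface-group batch configuration: a command issued in the
# group view is applied to every member interface. Interface names and
# the command string are illustrative assumptions.
class InterfaceGroup:
    def __init__(self, interfaces):
        self.interfaces = interfaces
        self.config = {i: [] for i in interfaces}

    def run(self, command):
        # The system replays the command on each member interface.
        for i in self.interfaces:
            self.config[i].append(command)

group = InterfaceGroup(["GE0/1/0", "GE0/1/1", "GE0/1/2"])
group.run("description uplink")
assert all(cfg == ["description uplink"] for cfg in group.config.values())
```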
Benefits
Interface management brings the following benefits to users:
• Data can be transmitted properly over a transmission channel formed by a physical interface and a
transmission medium, thereby enabling communication between users.
• Data communication can be implemented using logical interfaces, without additional hardware
requirements.
• An interface group can be used to implement batch interface configurations, simplifying interface configuration.
Interface Types
Devices exchange data and interact with other devices on a network through interfaces. Interfaces are
classified into physical and logical interfaces.
• Physical Interfaces
Physical interfaces physically exist on boards. They are divided into the following types:
■ LAN interfaces: interfaces through which the Router can exchange data with other devices on a
LAN.
■ WAN interfaces: interfaces through which the Router can exchange data with remote devices on
external networks.
• Logical Interfaces
Logical interfaces are manually configured interfaces that do not exist physically. Logical interfaces can
be used to exchange data.
Table 1 Commands, views, and prompts of physical interfaces supported by the NE40E
NOTE:
The default rate of a 50GE interface (for example, 50GE 0/1/0) is 50 Gbit/s and can be switched to 100
Gbit/s by running a command in the system view.
The link layer is responsible for accurately sending data from a node to a neighboring node. It receives
packets from the network layer, encapsulates the packets in frames, and then sends the frames to the
physical layer.
Major link layer protocols supported by the NE40E are listed as follows:
• Ethernet
Currently, most LANs are Ethernet networks. Ethernet is a broadcast network that is flexible, simple to
configure, and easy to expand. For these reasons, Ethernet is widely used.
• Trunk
Trunks can be classified into Eth-Trunks and IP-Trunks. An Eth-Trunk must be composed of Ethernet
links, and an IP-Trunk must be composed of POS links.
The trunk technology has the following advantages:
■ Bandwidth increase: The bandwidth of a trunk is the total bandwidth of all member interfaces.
■ Reliability enhancement: When a link fails, other links in the same trunk automatically take over
the services on the faulty link to prevent traffic interruption.
• PPP
The Point-to-Point Protocol (PPP) is used to encapsulate IP packets on serial links. It supports both
asynchronous transmission of 8-bit data without parity and bit-oriented synchronous connections.
PPP consists of the Link Control Protocol (LCP) and the Network Control Protocol (NCP). LCP is used to
create, configure, and test links; NCP is used to control different network layer protocols.
• HDLC
The High-Level Data Link Control (HDLC) is a suite of protocols that are used to transmit data between
network nodes. HDLC is widely used at the data link layer.
In HDLC, the receiver responds with an acknowledgment when it receives frames transmitted over the
network. In addition, HDLC manages data flows and the interval at which data packets are transmitted.
MTU
The maximum transmission unit (MTU) is the size (in bytes) of the longest packet that can be transmitted
on a physical network. The MTU is very important for interworking between two devices on a network. If the
size of a packet exceeds the MTU supported by a transit node or a receiver, the transit node or receiver may
fragment the packet before forwarding it or may even discard it, increasing the network transmission load.
MTU values must be correctly negotiated between devices to ensure that packets reach the receiver.
• If fragmentation is disallowed, packet loss may occur during data transmission at the IP layer. To ensure
that long packets are not discarded during transmission, configure forcible fragmentation for long
packets.
• When an interface with a small MTU receives long packets, the packets have to be fragmented.
Consequently, when the quality of service (QoS) queue becomes full, some packets may be discarded.
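As a concrete example of why the MTU matters, the IPv4-style arithmetic below estimates how many fragments a payload needs on a link with a given MTU. The 20-byte header and 8-byte payload alignment follow IPv4 conventions; this is an illustration, not device-specific behavior.

```python
# Illustrative IPv4-style fragmentation arithmetic: each fragment's
# payload must fit within (MTU - header) bytes and, except for the last
# fragment, be a multiple of 8 bytes. Sizes are examples only.
def fragment_count(payload_len, mtu, header_len=20):
    per_frag = (mtu - header_len) // 8 * 8   # largest 8-byte-aligned payload
    return -(-payload_len // per_frag)       # ceiling division

# A 4000-byte payload over a 1500-byte MTU link needs 3 fragments
# (1480 + 1480 + 1040 bytes of payload).
assert fragment_count(4000, 1500) == 3
```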
Loopback
The physical interface of the router supports loopback local and loopback remote. The following figure
shows the two loopback paths.
• loopback local
The differences between local loopback and optical fiber loopback based on optical modules are as
follows: in local loopback mode, service traffic does not pass through the optical module driver circuit
of the Framer. During forwarding tests, only a few boards with optical modules installed perform
optical fiber loopback to test this circuit. Local loopback can be configured on interfaces to test
forwarding performance and stability, which saves materials.
Redirection is a traffic behavior in a QoS policy. Redirection can change the next-hop IP addresses and
outbound interfaces of IP packets, and can be applied to specific interfaces to change the forwarding
destination of IP services. When redirection works with interface loopback, you can use the interface
connected to the tester to test all the interfaces on the board. If only loopback local is configured on
the interface and redirection is not configured, or the configured policy is not matched, the system does
not forward packets.
Local loopback can also be used to verify system functions. Take the mirroring function as an example:
when materials are limited, you can run the loopback local command on the observing interface to
monitor traffic and verify whether the function takes effect.
• loopback remote
Remote loopback is used for fault diagnosis at the physical layer. You can check physical link quality
through the subcard statistics, interface status, or other parameters.
As shown in the preceding figure, after the interface with loopback remote configured receives a
packet from A, B does not forward the packet based on the destination address. Instead, B directly
returns the packet through another interface (Layer 2 or Layer 3 interface) to A.
The processing on A when A receives the returned packet from B is as follows:
■ If the interface on A is a Layer 3 interface, ping packets looped back from B are discarded by A
because their destination MAC address differs from the MAC address of the interface on A.
However, interface statistics still exist on the subcard, so you can check physical link quality using
the Input and Output fields on the interface.
■ If the interface on A is a Layer 2 interface, the interface cannot successfully transmit ping packets.
If A transmits a packet using a tester or another method, A does not check the MAC address of the
packet looped back from B; instead, A directly forwards the packet based on the MAC address.
■ If A sends the packet with the MAC address of the peer device as the destination MAC address,
the packet is repeatedly looped back between the two devices.
■ If A sends a packet whose destination MAC address is a broadcast MAC address, the packet is repeatedly looped back between the two devices and is also broadcast within the broadcast domain. This method can cause broadcast storms, so exercise caution when using it.
Control-Flap
The status of an interface on a device may alternate between up and down for various reasons, including
physical signal interference and incorrect link layer configurations. The changing status causes Multiprotocol
Label Switching (MPLS) and routing protocols to flap. As a result, the device may break down, causing
network interruption. Control-flap controls the frequency of interface status alternations between up and
down to minimize the impact on device and network stability.
The following two control modes are available.
• control-flap
Interface flapping control controls the frequency of interface status alternations between Up and Down
to minimize the impact on device and network stability.
Interface flapping suppression involves the following concepts:
■ Penalty value: This value is calculated from the interface status changes using the suppression algorithm. The penalty value increases each time the interface status changes and decreases by half each time a half life elapses.
■ Suppression threshold (suppress): The interface is suppressed when the penalty value is
greater than the suppression threshold.
■ Reuse threshold (reuse): The interface is no longer suppressed when the penalty value is
smaller than the reuse threshold.
■ Ceiling threshold (ceiling): The penalty value no longer increases when the penalty value
reaches the ceiling threshold.
The parameter configuration complies with the following rule: reuse threshold (reuse) <
suppression threshold (suppress) < maximum penalty value (ceiling).
■ Half life
When an interface goes down for the first time, the half life starts. The device applies the half life that corresponds to the actual interface status. Each time a half life elapses, the penalty value decreases by half, and another half life starts.
■ Half life when an interface is up (decay-ok): When the interface is up, if the period since the
end of the previous half life reaches the current half life, the penalty value decreases by half.
■ Half life when an interface is down (decay-ng): When the interface is down, if the period since
the end of the previous half life reaches the current half life, the penalty value decreases by
half.
■ Maximum suppression time: The maximum suppression time of an interface is 30 minutes. When
the period during which an interface is suppressed reaches the maximum suppression time, the
interface is automatically freed from suppression.
You can set the preceding parameters to restrict the frequency at which an interface alternates between up and down.
The following configuration recommendations apply to the suppress, reuse, decay-ok, and decay-ng parameters:
■ If an interface remains up for a long period and the interface needs to be used as soon as it goes up, decreasing decay-ok is recommended.
■ If an interface remains down for a long period and the interface needs to be suppressed as soon as it goes down, increasing decay-ng is recommended.
■ If the penalty value exceeds the interface suppression threshold, the interface is suppressed. When the interface is suppressed, the outputs of the display interface, display interface brief, and display ip interface brief commands show that the protocol status of the interface remains DOWN (dampening suppressed) and does not change with the physical status.
■ If the penalty value falls below the interface reuse threshold, the interface is freed from
suppression. When the interface is freed from suppression, the protocol status of the interface is in
compliance with the actual status and does not remain Down (dampening suppressed).
■ If the penalty value reaches ceiling, the penalty value no longer increases.
• damp-interface
Related concepts:
■ penalty value: a value calculated by a suppression algorithm based on an interface's flapping. The suppression algorithm increases the penalty value by a specific amount each time the interface goes down and decreases it exponentially after the interface goes up.
■ suppress: An interface is suppressed if the interface's penalty value is greater than the suppress
value.
■ reuse: An interface is no longer suppressed if the interface's penalty value is less than the reuse
value.
■ half-life-period: period required for the penalty value to decrease by half. A half-life-period begins when an interface goes down for the first time. Each time a half-life-period elapses, the penalty value decreases by half, and another half-life-period begins.
■ max-suppress-time: maximum period during which an interface's status is suppressed. After max-
suppress-time elapses, the interface's actual status is reported to upper layer services.
Figure 4 shows the relationship between the preceding parameters. To facilitate understanding, the values in Figure 4 are all multiplied by 1000.
At t1, an interface goes down, and its penalty value increases by 1000. Then the interface goes up, and its penalty value decreases exponentially according to the half-life rule. At t2, the interface goes down again, and its penalty value increases by 1000, reaching 1600, which exceeds the suppress value of 1500. If the interface goes up again at this point, its status is suppressed. As the interface keeps flapping, its penalty value keeps increasing until it reaches the ceiling value of 10000 at tA. As time goes by, the penalty value decreases and reaches the reuse value of 750 at tB. The interface status is then no longer suppressed.
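The dampening behavior above can be sketched in a few lines. This is an illustrative model only: the per-flap increment, suppress, reuse, and ceiling values follow the example in the text, while the half-life of 10 seconds is an assumed value.

```python
# Illustrative sketch of the damp-interface penalty algorithm. The constants
# 1000 (per down event), 1500 (suppress), 750 (reuse), and 10000 (ceiling)
# come from the example above; HALF_LIFE is an assumption for illustration.
PENALTY_PER_DOWN = 1000.0
SUPPRESS = 1500.0
REUSE = 750.0
CEILING = 10000.0
HALF_LIFE = 10.0  # seconds (assumed)

def on_down(penalty: float) -> float:
    """Each down event adds a fixed penalty, capped at the ceiling."""
    return min(penalty + PENALTY_PER_DOWN, CEILING)

def decay(penalty: float, elapsed: float) -> float:
    """The penalty halves every HALF_LIFE seconds while the interface is up."""
    return penalty * 0.5 ** (elapsed / HALF_LIFE)

def is_suppressed(penalty: float, currently_suppressed: bool) -> bool:
    """Suppression starts above SUPPRESS and ends only below REUSE."""
    if penalty > SUPPRESS:
        return True
    if penalty < REUSE:
        return False
    return currently_suppressed

# Replaying the t1/t2 walkthrough: a second down event before the penalty
# has fully decayed pushes it past the suppress value.
p = on_down(0.0)            # t1: penalty 1000, not yet suppressed
p = on_down(decay(p, 7.0))  # t2: ~616 + 1000 > 1500, so the interface is suppressed
```

Note how the gap between the suppress and reuse values gives the suppression state hysteresis: a penalty between 750 and 1500 keeps whatever state the interface is already in.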
Loopback interfaces, Layer 2 interfaces that are converted from Layer 3 interfaces using the portswitch command, and
Null interfaces do not support MTU or control-flap configuration.
DCN serial interface: After DCN is enabled globally, a DCN serial interface is automatically created.
Virtual Ethernet (VE) interface: When an L2VPN accesses multiple L3VPNs, VE interfaces are used to terminate the L2VPN for L3VPN access. Because a common VE interface is bound to only one board, services will be interrupted if the board fails.
Global VE interface: When an L2VPN accesses multiple L3VPNs, global VE interfaces are used to terminate the L2VPN for L3VPN access.
A common VE interface is bound to only one board. If the board fails, services on the common VE interface will be interrupted. Unlike common VE interfaces, global VE interfaces support global L2VE and L3VE. Services on global VE interfaces will not be interrupted if some boards fail.
The loopback function on global VE interfaces works properly even when a board is powered off or damaged. The loopback process has been optimized on global VE interfaces to enhance interface forwarding performance. Global VE interfaces can be created on a device as long as the device is powered on.
Flexible Ethernet (FlexE) interface: A physical interface in standard Ethernet mode has fixed bandwidth. FlexE technology, however, enables one or more physical interfaces to work in FlexE mode and adds them to a group. The total bandwidth of this group can be allocated on demand to logical interfaces in the group. The group to which physical interfaces are added is referred to as a FlexE group. The logical interfaces that share the bandwidth of the physical interfaces in the FlexE group are called FlexE interfaces (also referred to as FlexE service interfaces).
FlexE interface bandwidth varies, which allows services to be isolated. Compared with traditional technologies, FlexE technology permits bit-level interface bundling, which solves the uneven per-flow or per-packet hashing that challenges traditional trunk technology. In addition, each FlexE interface has a specific MAC address, and forwarding resources between interfaces are isolated. This prevents the head-of-line (HOL) blocking that occurs when traditional logical interfaces, such as VLAN sub-interfaces, are used for forwarding.
FlexE interface technology especially fits scenarios in which high-performance interfaces are required for transport, such as mobile bearer, home broadband, and leased line access. Services of different types are carried on specific FlexE interfaces and are assigned specific bandwidth. FlexE technology achieves service-specific bandwidth control and meets network slicing requirements in 5G scenarios.
VLAN channelized sub-interface: A channelized interface can strictly isolate interface bandwidth. A VLAN channelized sub-interface is a channelization-enabled sub-interface of an Ethernet physical interface. Different types of services are carried on different channelized sub-interfaces. Specific bandwidth values are configured on channelized sub-interfaces to strictly isolate bandwidth among different channelized sub-interfaces on the same physical interface. This allows each service to be assigned specific bandwidth and prevents bandwidth preemption.
InLoopBack0 interface: An InLoopBack0 interface is a fixed loopback interface that is automatically created at system startup.
An InLoopBack0 interface uses the fixed loopback address 127.0.0.1/8 to receive data packets destined for the host where the InLoopBack0 interface resides. The loopback address of an InLoopBack0 interface is not advertised.
Null0 interface: A Null0 interface, similar to the null device supported in some operating systems, is automatically created by the system. All data packets sent to a Null0 interface are discarded.
Therefore, you only need to ensure that the data packets to be filtered out are forwarded to a Null0 interface, without needing to configure any ACL.
Eth-Trunk interface: An Eth-Trunk interface can have multiple physical interfaces bundled to increase bandwidth, improve reliability, and implement load balancing.
For more information, see Trunk.
VLANIF interface: A VLANIF interface is a Layer 3 interface and can be configured with an IP address. A VLANIF interface that has an IP address configured enables a Layer 2 device to communicate with a Layer 3 device. Layer 3 switching combines routing and switching and improves overall network performance. After a Layer 3 switch transmits a data flow using a routing table, it generates a mapping between a MAC address and an IP address. When the Layer 3 switch receives the same data flow again, it transmits the data flow over Layer 2 instead of Layer 3. The routing table must have correct routing entries so that the Layer 3 switch can transmit the data flow for the first time. A VLANIF interface and a routing protocol must be configured on a Layer 3 switch to ensure Layer 3 route reachability.
ATM bundle interface: An ATM bundle interface is used to forward one type of service from NodeBs to an RNC over the same PW.
In scenarios where multiple NodeBs connect to a CSG through E1, CE1, or CPOS links, each NodeB may have voice, video, and data services, which require the CSG to create three PVCs for each NodeB. If one PW is used to transmit one type of service on each NodeB, a large number of PWs must be configured on the CSG. The growing number of NodeBs and service types increasingly burdens the CSG. To address this problem, the sub-interfaces that connect NodeBs to the CSG and transmit the same type of service can be bound to one ATM bundle interface. A PW is then set up on the ATM bundle interface to transmit the services to the RNC. In this way, each type of service requires only one ATM bundle interface and one PW on a CSG, thereby reducing the number of PWs, alleviating the burden on the CSG, and improving service scalability.
Channelized serial interface: Serial interfaces are channelized from E1 or CPOS interfaces to carry PPP services.
The number of a serial interface channelized from an E1 interface is in the format of E1 interface number:channel set number. For example, the serial interface channelized from channel set 1 of CE1 2/0/0 is serial 2/0/0:1.
The number of a serial interface channelized from a CPOS interface is in the format of CPOS interface number/E1 interface number:channel set number. For example, the serial interface channelized from channel set 3 of CPOS 2/0/0's E1 channel 2 is serial 2/0/0/2:3.
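The two numbering formats can be illustrated with a small helper. The function names are hypothetical; only the output formats and the two examples follow the text.

```python
# Hypothetical helpers that reproduce the channelized serial interface
# numbering formats described above. They are illustrations only, not
# device commands.
def serial_from_e1(e1_if: str, channel_set: int) -> str:
    """Serial interface channelized from a channel set of an E1 interface."""
    return f"serial {e1_if}:{channel_set}"

def serial_from_cpos(cpos_if: str, e1_channel: int, channel_set: int) -> str:
    """Serial interface channelized from an E1 channel of a CPOS interface."""
    return f"serial {cpos_if}/{e1_channel}:{channel_set}"

print(serial_from_e1("2/0/0", 1))       # serial 2/0/0:1
print(serial_from_cpos("2/0/0", 2, 3))  # serial 2/0/0/2:3
```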
IP-Trunk interface: To improve the communication capabilities of links, you can bundle multiple POS interfaces to form an IP-Trunk interface. An IP-Trunk interface obtains the sum of the bandwidths of its member interfaces, and you can add POS interfaces to an IP-Trunk interface to increase its bandwidth. To prevent traffic congestion, traffic to the same destination can be balanced among the member links of the IP-Trunk interface instead of traveling along a single path. You can configure an IP-Trunk interface to improve link reliability: if one member interface goes down, traffic can still be forwarded by the remaining active member interfaces. An IP-Trunk interface must have HDLC encapsulated as its link layer protocol.
For more information, see IP-Trunk.
POS-Trunk interface: A POS-Trunk interface can have multiple POS interfaces bundled to support APS. A POS-Trunk interface must have PPP encapsulated as its link layer protocol.
CPOS-Trunk interface: A CPOS-Trunk interface can have multiple CPOS interfaces bundled to support APS.
Trunk serial interface: A trunk serial interface is channelized from a CPOS-Trunk interface to support APS.
MP-group interface: An MP-group interface, which has multiple serial interfaces bundled, is exclusively used by MP to increase bandwidth and improve reliability.
For more information, see MP Principles.
Global MP-group interface: A protection channel can be configured to take over traffic from one or more working channels in case the working channels fail, which improves network reliability. Two CPOS interfaces are added to a CPOS-Trunk interface, which is then channelized into trunk serial interfaces. A global MP-group interface can have multiple trunk serial interfaces bundled to carry PPP services. If one CPOS link fails, the other CPOS link takes over the PPP traffic.
IMA-group interface: When users access an ATM network at a rate between T1 and T3 or between E1 and E3, it is cost-ineffective for the carrier to directly use T3 or E3 lines. In this situation, an IMA-group interface can have multiple T1 or E1 interfaces bundled to carry ATM services. The bandwidth of an IMA-group interface is approximately the total bandwidth of all member interfaces.
For more information, see ATM IMA.
Global IMA-group interface: A protection channel can be configured to take over traffic from one or more working channels in case the working channels fail, which improves network reliability. Before ATM services are deployed on CPOS interfaces, two CPOS interfaces must be added to a CPOS-Trunk interface, which is then channelized into trunk serial interfaces. A global IMA-group interface can have multiple trunk serial interfaces bundled to carry ATM services. If one CPOS link fails, the other CPOS link takes over the ATM traffic.
6.2.2.3 FlexE
Definition
Flexible Ethernet (FlexE) is an interface technology that implements service isolation and network slicing on
a bearer network. Based on the standard Ethernet technology defined in IEEE 802.3, FlexE decouples the
MAC layer from the PHY layer by adding a FlexE shim layer between them (for its implementation, see
Figure 1). With FlexE, a one-to-one mapping between MACs and PHYs is no longer required, and M MACs can be mapped to N PHYs, implementing flexible rate matching. For example, one 100GE PHY can be divided into a pool of twenty 5 Gbit/s timeslots, and service interfaces can flexibly apply for separate bandwidth from this pool.
Purpose
The need for higher mobile bearer bandwidth is increasing as 5G networks continue to evolve. In addition,
customers want a unified network to transmit various services, such as home broadband, private line access,
and mobile bearer services. These factors place increasingly higher requirements on telecommunication
network interfaces.
When standard Ethernet interfaces are used as telecommunication network interfaces, the following issues
exist:
• More flexible bandwidth granularities are not supported: Diverse services and application scenarios
require Ethernet interfaces to provide more flexible bandwidth granularities without being restricted by
the rate ladder (10 Gbit/s–25 Gbit/s–40 Gbit/s–50 Gbit/s–100 Gbit/s–200 Gbit/s–400 Gbit/s) defined by
IEEE 802.3. It may take years for IEEE 802.3 to define a new interface standard, which cannot meet the
requirements of application changes. Furthermore, formulating an interface standard for each
bandwidth requirement is impossible, and therefore other interface solutions are required.
• The Ethernet interface capability of IP devices depends on the capability of optical transmission devices,
and their development is not synchronous: For example, optical transmission devices do not have 25GE
or 50GE interfaces. However, when IP and optical transmission devices are interconnected, the link rate
of the optical transmission device must strictly match the Ethernet rate of the corresponding User-to-
Network Interface (UNI).
• Enhanced QoS capability for multi-service bearing is not supported: Standard Ethernet interfaces
perform scheduling based on QoS packet priorities. As a result, long packets will block the pipe,
increasing the latency of short packets. In this case, services affect each other.
FlexE addresses these issues as follows:
• Supporting more flexible bandwidth granularities: FlexE supports the flexible configuration of interface
rates, which may or may not correspond to the interface rates defined in the existing IEEE 802.3
standard. This meets the requirement for diverse services and application scenarios.
• Decoupling from the capability of optical transmission devices: The Ethernet interface rate of IP devices
is decoupled from the link rate of optical transmission devices, meaning that the link rate of optical
transmission devices does not need to strictly match the Ethernet rate of a UNI. In this way, the existing
optical transmission network (OTN) can be utilized to the maximum extent to support Ethernet
interfaces with new bandwidths.
• Supporting the enhanced QoS capability for multi-service bearing: FlexE provides channelized hardware
isolation on physical-layer interfaces to implement hard slicing for SLA assurance and isolated
bandwidth for services.
FlexE involves three concepts: FlexE client, FlexE shim, and FlexE group.
• FlexE client: corresponds to an externally observed user interface that functions in the same way as
traditional service interfaces on existing IP/Ethernet networks. FlexE clients can be configured flexibly to
meet specific bandwidth requirements. They support Ethernet MAC data streams of various rates
(including 10 Gbit/s, 40 Gbit/s, N x 25 Gbit/s, and even non-standard rates), and the Ethernet MAC data
streams are transmitted to the FlexE shim layer as 64B/66B encoded bit streams.
• FlexE shim: functions as a layer that maps or demaps the FlexE clients carried over a FlexE group. It
decouples the MAC and PHY layers and implements key FlexE functions through calendar timeslot
distribution.
• FlexE group: consists of various Ethernet PHYs defined in IEEE 802.3.
Bonding
As shown in Figure 1, bonding means that multiple PHYs are bonded to support a higher rate. For example,
two 100GE PHYs can be bonded to provide a MAC rate of 200 Gbit/s.
Figure 1 Bonding
Channelization
As shown in Figure 2, channelization allows multiple low-rate MAC data streams to share one or more PHYs.
For example, channelization allows four MAC data streams (35 Gbit/s, 25 Gbit/s, 20 Gbit/s, and 20 Gbit/s) to
be carried over one 100GE PHY or allows three MAC data streams (150 Gbit/s, 125 Gbit/s, and 25 Gbit/s) to
be carried over three 100GE PHYs.
Figure 2 Channelization
Sub-rating
As shown in Figure 3, sub-rating allows MAC data streams with a single low rate to share one or more PHYs,
and uses a specially defined error control block to reduce the rate. For example, a 100GE PHY carries only 50
Gbit/s MAC data streams.
Sub-rating is a subset of channelization in a certain sense.
Figure 3 Sub-rating
Calendar Mechanism
Figure 2 shows the calendar mechanism of the FlexE shim. Twenty blocks (corresponding to timeslots 0 to
19) are used as a logical unit, and 1023 "twenty blocks" are then used as a calendar component. The
calendar components are distributed in a specified order into timeslots, each of which has a bandwidth
granularity of 5 Gbit/s for data transmission.
In terms of bit streams, each 64B/66B block is carried over a timeslot (basic logical unit carrying the 64B/66B block), as
shown in Figure 2.
FlexE allocates available timeslots in a FlexE group based on bandwidth required by each FlexE client to
form a mapping from the FlexE client to one or more timeslots. In addition, the calendar mechanism is used
to carry one or more FlexE clients in the FlexE group.
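The client-to-timeslot mapping described above can be sketched as follows. The first-fit allocation policy and the client names are illustrative assumptions; the text only requires that each client be mapped to enough 5 Gbit/s timeslots for its bandwidth.

```python
# Sketch of how a FlexE shim could map clients to the 5 Gbit/s calendar
# timeslots of a single 100GE PHY (20 slots, numbered 0 to 19). The
# first-fit policy below is an assumption for illustration.
SLOT_GBPS = 5
NUM_SLOTS = 20  # one 100GE PHY

def allocate(clients: dict) -> dict:
    """Assign each client ceil(bandwidth / 5) free timeslots, in order."""
    mapping = {}
    next_slot = 0
    for name, gbps in clients.items():
        needed = -(-gbps // SLOT_GBPS)  # ceiling division
        if next_slot + needed > NUM_SLOTS:
            raise ValueError("FlexE group bandwidth exhausted")
        mapping[name] = list(range(next_slot, next_slot + needed))
        next_slot += needed
    return mapping

# The channelization example from the text: 35 + 25 + 20 + 20 Gbit/s clients
# exactly fill the twenty 5 Gbit/s timeslots of one 100GE PHY.
slots = allocate({"client1": 35, "client2": 25, "client3": 20, "client4": 20})
```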
The SH is a 2-bit synchronization header field added after 64B/66B encoding is performed on the data. If the value is 10, the carried data is a control block; if the value is 01, the carried data is a data block; if the value is 00 or 11, the field is invalid. The value ss indicates that the synchronization header is valid and may be either 10 or 01.
In an overhead frame, the first overhead block is a control block, the second and third overhead blocks are
data blocks, and the fourth to eighth overhead blocks are allocated to management or synchronization
messaging channels. Table 1 describes the meaning of each field in an overhead frame.
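The SH values and the overhead block roles described above can be summarized in a short sketch; the function names are illustrative.

```python
# Minimal sketch of the block classifications described above. The function
# names are illustrative; the values come from the text.
def classify_sh(sh: str) -> str:
    """2-bit SH: 10 = control block, 01 = data block, 00/11 = invalid."""
    return {"10": "control", "01": "data"}.get(sh, "invalid")

def overhead_block_role(index: int) -> str:
    """Role of the 1-based block index within an eight-block overhead frame."""
    if index == 1:
        return "control"
    if index in (2, 3):
        return "data"
    if 4 <= index <= 8:
        return "management/synchronization channel"
    raise ValueError("an overhead frame has eight blocks")
```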
Synchronous framing involves the SH, 0x4B, 0x5, and OMF fields in the data receive direction, and is used to identify the
first overhead block of the overhead frame. If the SH, 0x4B, and 0x5 fields do not match the expected positions for five
times, the FlexE overhead multiframe is unlocked, indicating that the received 32 overhead frames are not from the
same overhead multiframe. As a result, the restored timeslot information is incorrect. In addition, if the OMF field passes
the CRC, the overhead multiframe is locked when the bit changes from 0 to 1 or from 1 to 0. If an error occurs in a
frame, the overhead multiframe is unlocked.
1. FlexE Client1 on Router1 uses timeslot table A to send packets based on 5 Gbit/s bandwidth.
2. After the bandwidth of FlexE Client1 is adjusted to 10 Gbit/s, Router1 establishes timeslot table B in
the transmit direction and sends a CR message to Router2.
3. After receiving the CR message from Router1, Router2 establishes timeslot table B in the receive
direction and sends a CA message to Router1, indicating that timeslot table B in the receive direction
has been established.
4. After receiving the CA message from Router2, Router1 sends a CCC message for timeslot table
switching to Router2. After this and in the next timeslot period, both Router1 and Router2 use timeslot
table A for packet sending and receiving.
5. Router1 uses timeslot table B to send packets after the next timeslot period after Router1 sends the
CCC message. After receiving an overhead frame that identifies the next timeslot period, Router2 uses
timeslot table B to receive packets.
Similarly, after the bandwidth of FlexE Client1 is adjusted to 10 Gbit/s on Router2, the timeslot table is also
changed to timeslot table B in the receive direction of Router1. In this case, both ends use timeslot table B to
send and receive packets.
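Steps 1 to 5 can be sketched as a simple exchange. The in-memory message passing and the class names are illustrative assumptions; on a real device, the CR, CA, and CCC messages are carried in FlexE overhead.

```python
# Illustrative sketch of the CR/CA/CCC timeslot-table negotiation above.
class FlexEEnd:
    def __init__(self, name: str):
        self.name = name
        self.tx_table = "A"  # timeslot table used for sending
        self.rx_table = "A"  # timeslot table used for receiving

def negotiate_bandwidth_change(sender: FlexEEnd, receiver: FlexEEnd) -> list:
    messages = []
    # Step 2: the sender builds timeslot table B in the transmit direction
    # and requests the change (CR).
    messages.append("CR")
    # Step 3: the receiver builds table B in the receive direction and
    # acknowledges (CA).
    receiver.rx_table = "B"
    messages.append("CA")
    # Step 4: the sender confirms the switchover (CCC); both ends still use
    # table A until the next timeslot period.
    messages.append("CCC")
    # Step 5: from the next timeslot period on, the sender transmits with
    # table B and the receiver receives with table B.
    sender.tx_table = "B"
    return messages

r1, r2 = FlexEEnd("Router1"), FlexEEnd("Router2")
exchanged = negotiate_bandwidth_change(r1, r2)
```

Note that only one direction is switched per negotiation; as the text explains, the reverse direction runs the same exchange with the roles swapped.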
The 1 Gbit/s timeslot granularity is a sub-timeslot of a 5 Gbit/s timeslot and takes effect only in the 5 Gbit/s timeslot. If
the bandwidth exceeds 5 Gbit/s, the 5 Gbit/s timeslot granularity is used.
After the interfaces are switched to the FlexE mode, the link connection is automatically added to the
topology of the NMS. In addition, the DCN function is enabled by default to allow the NMS to manage the
devices.
After a standard Ethernet interface is switched to the FlexE mode, the original standard Ethernet interface
disappears. A FlexE client needs to be determined based on the defined rules to carry the bandwidth and
configuration of the original standard Ethernet interface, implementing configuration restoration. As shown in Figure 2, the process is as follows:
1. The configurations of the standard Ethernet interfaces are saved on the NMS, and the standard
Ethernet interfaces of the upstream and downstream NEs are switched to the FlexE mode.
2. FlexE clients are created based on the defined rules to carry the bandwidth of the original standard
Ethernet interfaces.
3. The configurations of the original standard Ethernet interfaces are restored to the created FlexE
clients.
Figure 2 Configuration restoration after the standard Ethernet interfaces are switched to the FlexE mode
The bandwidth of a FlexE client can be configured in either of the following modes:
• Set the bandwidth of a FlexE client to the bandwidth of an original standard Ethernet interface. For
example, set the bandwidth of a FlexE client to 100 Gbit/s for a 100GE interface. This mode applies to
existing network reconstruction scenarios. Before creating a slice, switch the standard Ethernet interface
to a FlexE client with the same bandwidth. After the reconstruction is complete, adjust the bandwidth of
the FlexE client according to the slice bandwidth requirement and create new slicing interfaces.
• Configure the default slice's bandwidth as the bandwidth of a FlexE client, and reserve other bandwidth
for new slices. For example, set the bandwidth of a FlexE client to 50 Gbit/s as the default slice's
bandwidth for a 100GE interface, and reserve the remaining 50 Gbit/s bandwidth for new slices.
To enable devices to be managed by the NMS when standard Ethernet interfaces are connected to FlexE
physical interfaces, four modes are designed for the FlexE physical interfaces. This implements DCN
communication between the standard Ethernet interfaces and FlexE physical interfaces.
• FlexE_DCN_Auto (Init State): default mode when a board goes online. It indicates that the FlexE mode is
used and services can be configured on FlexE physical interfaces. The underlying forwarding plane can
directly communicate with the peer FlexE physical interface through DCN.
• FlexE_DCN_Auto (ETH State): After the "PCS Link Up && Shim LOF" state is detected, the forwarding
plane is auto-negotiated to the standard Ethernet mode so that this interface can communicate with
the peer interface through DCN.
• FlexE_Lock_Mode: FlexE lock mode. The forwarding plane does not perform mode negotiation to
prevent auto-negotiation exceptions.
• ETH_Mode: standard Ethernet mode, which cannot be auto-negotiated to the FlexE mode.
As shown in Figure 2, when a standard Ethernet interface is connected to a FlexE physical interface, the
initial status of the FlexE physical interface is FlexE_DCN_Auto (Init State) and the FlexE physical interface
starts auto-negotiation. The standard Ethernet interface works in ETH_Mode mode. After the negotiation is
complete, the control and management planes of the FlexE physical interface remain in FlexE mode and
related configurations are retained, but the forwarding plane uses the standard Ethernet mode for
forwarding, implementing the DCN connectivity between the standard Ethernet interface and FlexE physical
interface.
Figure 2 Interconnection mode of a standard Ethernet interface and FlexE physical interface
FlexE supports the following clock synchronization modes:
• OH mode: Clock messages are transmitted using FlexE overhead timeslots. The configuration related to
clock synchronization is the same as the configuration on a standard Ethernet interface.
• Client mode: Clock messages are transmitted using FlexE clients. In this mode, the FlexE interface that
carries clock services must be bound to a FlexE physical interface that has clock services deployed.
The time synchronization modes at the two ends of a FlexE link must be the same (either the OH or Client mode).
In the transmit direction, the FlexE shim processes data as follows:
1. Each FlexE client is presented to the FlexE shim as a 64B/66B encoded bit stream.
2. A FlexE client is rate-adapted in idle insertion/deletion mode to match the clock of the FlexE group.
The rate of the adapted signal is slightly less than the nominal rate of the FlexE client to allow room
for the alignment markers on the PHYs of the FlexE group and insertion of the FlexE overhead.
3. The 66B blocks from each FlexE client are sequentially distributed and inserted into the calendar.
4. Error control blocks are generated for insertion into unused or unavailable timeslots to ensure that the
data in these timeslots is not considered valid.
5. The control function manages which timeslots each FlexE client is inserted into and inserts the FlexE
overhead on each PHY in the transmit direction.
6. Calendar distribution is responsible for allocating the 66B blocks of different FlexE clients in the
calendar to a sub-calendar according to the TDM timeslot distribution mechanism. The sub-calendar
then schedules the 66B blocks to the corresponding PHYs in the FlexE group in polling mode.
7. The stream of 66B blocks of each PHY is distributed to the PCS lanes of that PHY with the insertion of alignment markers, and the layers below the PCS continue to be used intact as specified for standard Ethernet defined by IEEE 802.3.
In the receive direction, the FlexE shim processes data as follows:
1. The lower layers of the PCS of the PHYs are used according to the standard Ethernet defined by IEEE
802.3. The PCS lanes complete operations such as deskewing and alignment marker removal, and
send traffic to the FlexE shim.
2. The calendar logically interleaves the sub-timeslots of each FlexE instance, re-orders them, and
extracts the FlexE overhead.
3. If any PHY in the FlexE group fails or overhead frame/multiframe locking is not implemented for any
FlexE instance, all FlexE clients in the FlexE group generate local faults (LFs).
4. The control function manages the timeslots extracted by each FlexE client from each FlexE instance in
the receive direction.
5. The extracted timeslots are sent to each FlexE client based on the 66B blocks.
6. The rate of a FlexE client is adjusted in idle insertion/deletion mode when necessary, and the stream
of 66B blocks is extracted to the FlexE client at the adaptation rate. Similarly, because the alignment
marker on a PHY of the FlexE group and the FlexE overhead occupy space, the rate of the FlexE client
after the adaptation is slightly lower than the nominal rate of the FlexE client.
Interface groups are classified into permanent and temporary interface groups. Multiple interfaces can be
added to a permanent or temporary interface group to enable batch command configurations for the
interfaces. The differences between permanent and temporary interface groups are described as follows:
• After a user exits the view of a temporary interface group, the system automatically deletes the
temporary interface group. A permanent interface group, however, can be deleted only by using a
command.
• Information about a permanent interface group can be viewed, whereas information about a temporary
interface group cannot.
In the example network shown in Figure 1, ten binding interfaces are located on the network side, and two
track interfaces are located on the user side. You can set a Down weight for each binding interface and a
Down weight threshold for each track interface. For example, the Down weight of each binding interface is
set to 10, and the Down weight thresholds of track interfaces A and B are set to 20 and 80, respectively.
When the number of Down binding interfaces in the interface monitoring group increases to 2, the system
automatically instructs track interface A to go Down. When the number of Down binding interfaces in the
interface monitoring group increases to 8, the system automatically instructs track interface B to go Down.
When the number of Down binding interfaces in the interface monitoring group falls below 8, track interface
B automatically goes Up. When the number of Down binding interfaces in the interface monitoring group
falls below 2, track interface A automatically goes Up.
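The down-weight logic of this example can be sketched as follows. This is an illustrative Python sketch; the function name and parameters are hypothetical, assuming the track interface goes Down when the accumulated down weight of the binding interfaces reaches its threshold.

```python
# Illustrative sketch: decide a track interface's state from the summed
# down weight of the Down binding interfaces, as in the example above
# (down weight 10 per binding interface; thresholds 20 and 80).

def track_state(num_down_bindings, down_weight, threshold):
    """Return 'Down' when the accumulated down weight reaches the threshold."""
    total = num_down_bindings * down_weight
    return "Down" if total >= threshold else "Up"

# 2 of 10 binding interfaces Down: track A (threshold 20) goes Down,
# while track B (threshold 80) stays Up.
print(track_state(2, 10, 20))  # Down
print(track_state(2, 10, 80))  # Up
print(track_state(8, 10, 80))  # Down
```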
6.2.3.1 Sub-interface
In the network shown in Figure 1, multiple sub-interfaces are configured on the physical interface of Device.
Like a physical interface, each sub-interface can be configured with one IP address. The IP address of a sub-
interface must be on the same network segment as the IP address of a remote network, and the IP address
of each sub-interface must be on a unique network segment.
Figure 1 GE Sub-interface
With these configurations, a virtual connection is established between a sub-interface and a remote network.
This allows the remote network to communicate with the local sub-interface and consequently communicate
with the local network.
6.2.3.2 Eth-Trunk
In the network shown in Figure 1, an Eth-Trunk that bundles two full-duplex 1000 Mbit/s interfaces is
established between Device A and Device B. The maximum bandwidth of the trunk link is 2000 Mbit/s.
Backup is enabled within the Eth-Trunk. If a link fails, traffic is switched to the other link to ensure link
reliability.
In addition, network congestion can be avoided because traffic between Device A and Device B is balanced
between the two member links.
The application and networking diagram of IP-Trunk are similar to those of Eth-Trunk.
As shown in Figure 1, the traditional LAG technology uses the hash algorithm to distribute data flows to
physical interfaces. As a result, load imbalance occurs and the bandwidth utilization cannot reach 100%. For
example, two 100GE physical interfaces are bonded into a LAG. Assume that there are four groups of data
flows. 80 Gbit/s data flows are hashed to the upper link, and 40 Gbit/s and 30 Gbit/s data flows are hashed
to the lower link. In this case, the bandwidth utilization cannot reach 100%, regardless of whether 50 Gbit/s
data flows are hashed to either the upper or lower link.
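The arithmetic behind this example can be checked as follows. This is an illustrative sketch only; the flow groupings follow the example above, and the point is simply that either placement of the 50 Gbit/s flow oversubscribes one 100GE member while leaving spare capacity on the other.

```python
# Illustrative arithmetic for the LAG hashing example: four flows on a
# 2 x 100GE LAG. Whichever link the 50 Gbit/s flow is hashed to, one
# member is oversubscribed and the other has idle capacity.

link_capacity = 100               # Gbit/s per 100GE member

upper = [80, 50]                  # 50G flow hashed to the upper link
lower = [40, 30]
print(sum(upper), sum(lower))     # upper link carries 130 > 100

upper = [80]
lower = [40, 30, 50]              # 50G flow hashed to the lower link
print(sum(upper), sum(lower))     # lower link carries 120 > 100
```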
Unaware Mode
Optical transmission devices carry mappings according to the bit transparent transmission mechanism, as
shown in Figure 1. The unaware mode applies to scenarios where the Ethernet rate is the same as the
wavelength rate of colored optical modules. It provides FlexE support without requiring hardware upgrades,
fully utilizing legacy optical transmission devices. In addition, it can use FlexE bonding to provide E2E ultra-
high bandwidth channels across OTNs.
Termination Mode
FlexE is terminated on the ingress interfaces of optical transmission devices. The OTN detects FlexE UNIs,
restores the FlexE client data flows, and further maps the data flows to the optical transmission devices for
transmission, as shown in Figure 2. This mode is the same as the mode in which standard Ethernet interfaces
are carried over the OTN, and can implement traffic grooming for different FlexE clients on the OTN.
Aware Mode
The aware mode mainly uses the FlexE sub-rating function and is applicable to scenarios where the single-
wavelength rate of colored optical modules is lower than the rate of an Ethernet interface. As shown in
Figure 3, 150 Gbit/s data flows need to be transmitted between routers and optical transmission devices. In
this case, two 100GE PHYs can be bonded to form a FlexE group. The PHYs in the FlexE group are configured
based on 75% valid timeslots, and the remaining 25% timeslots are filled with special error control blocks to
indicate that they are invalid.
When FlexE UNIs map data flows to the OTN in aware mode, the OTN directly discards invalid timeslots,
extracts the data to be carried based on the bandwidth of the original data flows, and then maps these data
flows to the optical transmission devices with matching rates. The configurations of the optical transmission
devices must be the same as those of the FlexE UNIs, so that the optical transmission devices can detect the
FlexE UNIs for data transmission.
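The effective bandwidth in this sub-rating example can be computed as follows. This is an illustrative sketch; the function name is hypothetical, assuming the invalid timeslots (filled with error control blocks) carry no payload and are discarded by the OTN.

```python
# Illustrative sketch: effective bandwidth of a FlexE group in aware
# (sub-rating) mode. Two 100GE PHYs with 75% of the timeslots valid
# carry 150 Gbit/s; the remaining 25% of the slots are filled with
# special error control blocks and discarded by the OTN.

def sub_rated_bandwidth(num_phys, phy_rate_gbps, valid_ratio):
    return num_phys * phy_rate_gbps * valid_ratio

print(sub_rated_bandwidth(2, 100, 0.75))  # 150.0 Gbit/s
```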
Channelized sub-interfaces are mainly used to isolate downstream traffic on the backbone aggregation
network and downstream traffic on the PE network side.
Improving Reliability
• IP address unnumbered
When an interface needs an IP address only for a short period, it can borrow an IP address from
another interface to save IP address resources. Usually, the interface is configured to borrow a loopback
interface address because loopback interfaces remain stable.
• Router ID
Some dynamic routing protocols require that Routers have IDs. A router ID uniquely identifies a Router
in an autonomous system (AS).
If no router ID is configured for OSPF or BGP, the system selects the largest IP address among the local
interface IP addresses as the router ID. If the IP address of a physical interface is selected and the
physical interface goes Down, the system does not reselect a router ID until the selected IP address is
deleted.
Because a loopback interface is stable and usually up, the IP address of the loopback interface is
recommended as the router ID of the Router.
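The automatic selection described above can be sketched as follows. This is an illustrative Python sketch; the interface names and addresses are hypothetical, and real implementations typically prefer loopback addresses before physical interface addresses, which this minimal version does not model.

```python
import ipaddress

# Illustrative sketch: when no router ID is configured, pick the largest
# IP address among the local interface addresses (addresses compared as
# 32-bit integers). Interface names and addresses are made up.

interface_addrs = {
    "GigabitEthernet1/0/0": "10.1.1.1",
    "GigabitEthernet2/0/0": "192.168.2.1",
    "LoopBack0": "10.255.255.9",
}

def select_router_id(addrs):
    return max(addrs.values(), key=lambda a: int(ipaddress.ip_address(a)))

print(select_router_id(interface_addrs))  # 192.168.2.1
```

Because 192.168.2.1 belongs to a physical interface here, its loss would trigger the re-selection problem described above, which is why a stable loopback address is recommended.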
• BGP
To prevent BGP sessions from being affected by physical interface faults, you can configure a loopback
interface as the source interface that sends BGP packets.
When a loopback interface is used as the source interface of BGP packets, note the following:
■ In the case of an EBGP connection, EBGP must be allowed to establish neighbor relationships through
indirectly connected interfaces.
• MPLS LDP
In MPLS LDP, a loopback interface address is often used as the transport address to ensure network
reliability.
Classifying Information
• SNMP
To ensure the security of servers, a loopback interface address is used as the source IP address rather
than the outbound interface address of SNMP trap messages. In this manner, packets are filtered to
protect the SNMP management system. The system allows only the packets from the loopback interface
address to access the SNMP port. This facilitates reading and writing trap messages.
• NTP
The Network Time Protocol (NTP) synchronizes the time of all devices. NTP specifies a loopback
interface address as the source address of the NTP packets sent from the local Router.
To ensure the security of NTP, NTP specifies a loopback interface address rather than the outbound
interface address as the source address. In this situation, the system allows only the packets from the
loopback interface address to access the NTP port. In this manner, packets are filtered to protect the
NTP system.
• Information recording
During the display of network traffic records, a loopback interface address can be specified as the
source IP address of the network traffic to be output.
In this manner, packets are filtered to facilitate network traffic collection. This is because only the
packets from the loopback interface address can access the specified port.
• Security
Identifying the source IP address of logs on the user log server helps to locate the source of the logs
rapidly. It is recommended that you configure a loopback address as the source IP address of log
messages.
• HWTACACS
After Huawei Terminal Access Controller Access Control System (HWTACACS) is configured, the packets
sent from the local Router use the loopback address as the source address. In this manner, packets are
filtered to protect the HWTACACS server.
This is because only the packets sent from the loopback interface address can access the HWTACACS
server. This facilitates reading and writing logs. There are only loopback interface addresses rather than
outbound interface addresses in HWTACACS logs.
• RADIUS authentication
During the configuration of a RADIUS server, a loopback interface address is specified as the source IP
address of the packets sent from the Router.
This ensures the security of the server. In this situation, packets are filtered to protect the RADIUS server
and RADIUS agent. This is because only the packets from a loopback interface address can access the
port of the RADIUS server. This facilitates reading and writing logs. There are only loopback interface
addresses rather than outbound interface addresses in RADIUS logs.
• Loop prevention
The Null0 interface is typically used to prevent routing loops. For example, during route aggregation, a
route to the Null0 interface is always created.
In the example network shown in Figure 1, DeviceA provides access services for multiple remote nodes.
DeviceA is the gateway of the local network that uses the Class B network segment address
172.16.0.0/16. DeviceA connects to three subnets through DeviceB, DeviceC, and DeviceD, respectively.
Figure 1 Example for using the Null0 interface to prevent routing loops
If DeviceE on the ISP network receives a packet with the destination address on the network
segment 172.16.10.0/24, it forwards the packet to DeviceA.
If the destination address of the packet does not belong to the network segment to which DeviceB,
DeviceC, or DeviceD is connected, DeviceA searches the routing table for the default route, and then
• Traffic filtering
The Null0 interface provides an optional method for filtering traffic. Unnecessary packets are sent to the
Null0 interface to avoid using an Access Control List (ACL).
Both the Null0 interface and ACL can be used to filter traffic as follows.
■ Before the ACL can be used, ACL rules must be configured and then applied to an interface. When
a Router receives a packet, it searches the ACL.
■ If the action is permit, the Router searches the forwarding table and then determines whether
to forward or discard the packet.
■ The Null0 interface must be specified as the outbound interface of unnecessary packets. When a
Router receives a packet, it searches the forwarding table. If the Router finds that the outbound
interface of the packet is the Null0 interface, it discards the packet.
Using a Null0 interface to filter traffic is more efficient and faster than using an ACL. For example, if
you do not want a Router to accept packets with a specified destination address, use the Null0 interface
for packet filtering. This only requires a route to be configured. Using an ACL for packet filtering
requires an ACL rule to be configured and then applied to the corresponding interface on a Router.
However, the Null0 interface can filter only Router-based traffic, whereas an ACL can filter both Router-
based and interface-based traffic. For example, if you do not want Serial 1/0/0 on a Router to accept
traffic with the destination address 172.18.0.0/16, you can only configure an ACL rule and then apply it
to Serial 1/0/0 for traffic filtering.
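The Null0-based filtering described above can be sketched as follows. This is an illustrative Python model of a forwarding lookup, not router code; the table contents reuse addresses from the examples above, and the interface names are hypothetical.

```python
import ipaddress

# Illustrative sketch: a forwarding lookup that silently discards
# packets whose best route points to the Null0 interface, instead of
# evaluating per-packet ACL rules.

# Forwarding table: prefix -> outbound interface (example entries).
fib = {
    ipaddress.ip_network("172.16.10.0/24"): "GigabitEthernet1/0/0",
    ipaddress.ip_network("172.18.0.0/16"): "Null0",   # unwanted traffic
}

def forward(dst):
    addr = ipaddress.ip_address(dst)
    # Longest-prefix match over the small example table.
    matches = [n for n in fib if addr in n]
    if not matches:
        return "no route"
    best = max(matches, key=lambda n: n.prefixlen)
    return "discard" if fib[best] == "Null0" else f"forward via {fib[best]}"

print(forward("172.18.1.1"))    # discard
print(forward("172.16.10.5"))   # forward via GigabitEthernet1/0/0
```

The sketch shows why the Null0 approach is cheap: filtering is a side effect of the route lookup the device performs anyway, with no per-interface ACL evaluation.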
Different encapsulation modes can be configured for tunnel interfaces depending on the utilities of the
interfaces. The following table lists the types of tunnels for which tunnel interfaces can be created.
IPv6 over IPv4 tunnel (carried over an IPv4 network): An IPv6 over IPv4 manual tunnel is manually
configured between two border routing devices. The source and destination IPv4 addresses of the tunnel
need to be statically specified. A manual tunnel can be used for communication between IPv6 networks and
can also be configured between a border routing device and a host. A manual tunnel offers the point-to-
point service.
IPv4 over IPv6 tunnel (carried over an IPv6 network): An IPv4 over IPv6 manual tunnel is manually
configured between two border routing devices. The source and destination IPv6 addresses of the tunnel
need to be statically specified. A manual tunnel can be used for communication between IPv4 networks and
can also be configured between a border routing device and a host. A manual tunnel offers the point-to-
point service.
command to the interface group. When you run the configuration command in the interface group view, the
system automatically executes this command on all the member interfaces in the interface group,
implementing batch configuration. As shown in Figure 1, a large number of interfaces exist on the
aggregation and access switches. Many of these interfaces have the same configurations. If they are
configured separately, the management cost is high. Therefore, you can create interface groups on these
switches.
To resolve this problem, you can configure an interface monitoring group and add multiple network-side
interfaces on the PEs to the interface monitoring group. When a link failure occurs on the network side and
the interface monitoring group detects that the status of a certain proportion of network-side interfaces
changes, the system instructs the user-side interfaces associated with the interface monitoring group to
change their status accordingly and allows traffic to be switched between the master and backup links.
Therefore, the interface monitoring group can be used to prevent traffic overloads or interruptions.
Definition
Currently, carrier-class networks require high reliability for IP devices. As such, devices on the networks are
required to rapidly detect faults. After fast detection is enabled on an interface, the alarm reporting speed is
accelerated. As a result, the physical status of the interface frequently alternates between up and down,
causing frequent network flapping. Therefore, alarms must be filtered and suppressed to prevent frequent
network flapping.
Transmission alarm suppression can efficiently filter and suppress alarm signals to prevent interfaces from
frequently flapping. In addition, transmission alarm customization can control the impact of alarms on the
interface status.
Transmission alarm customization and suppression provide the following functions:
• Transmission alarm customization allows you to specify alarms that can cause the physical status of an
interface to change. This function helps filter out unwanted alarms.
• Transmission alarm suppression allows you to suppress frequent network flapping by setting thresholds
and using a series of algorithms.
Purpose
Transmission alarm customization allows you to filter unwanted alarms, and transmission alarm suppression
enables you to set thresholds on customized alarms, allowing devices to ignore burrs generated during
transmission link protection and preventing frequent network flapping.
On a backbone or metro network, IP devices are connected to transmission devices, such as Synchronous
Digital Hierarchy (SDH), Wavelength Division Multiplexing (WDM), or Synchronous Optical Network
(SONET) devices. If a transmission device becomes faulty, the interconnected IP device receives an alarm.
The transmission devices then perform a link switchover. After the link of the transmission device recovers,
the transmission device sends a clear alarm to the IP device. After an alarm is generated, a link switchover
lasts 50 ms to 200 ms. In the log information on IP devices, the transmission alarms are displayed as burrs
that last 50 ms to 200 ms. These burrs will cause the interface status of IP devices to switch frequently. IP
devices will perform route calculation frequently. As a result, routes flap frequently, affecting the
performance of IP devices.
From the perspective of the entire network, IP devices are expected to ignore such burrs. That is, IP devices
must customize and suppress the alarms that are generated during transmission device maintenance or link
switchovers. This can prevent route flapping. Transmission alarm customization can control the impact of
transmission alarms on the physical status of interfaces. Transmission alarm suppression can efficiently filter
and suppress specific alarm signals to avoid frequent interface flapping.
Network Flapping
Network flapping occurs when the physical status of interfaces on a network frequently alternates between
Up and Down.
Alarm Burrs
An alarm burr is a process in which alarm generation and alarm clearance signals are received in a short
period (The period varies with specific usage scenarios, devices, or service types).
For example, if a loss of signal (LOS) alarm is cleared 50 ms after it is generated, the process from the alarm
generation to clearance is an alarm burr.
Alarm Flapping
Alarm flapping is a process in which an alarm is repeatedly generated and cleared in a short period (The
period varies with specific usage scenarios, devices, or service types).
For example, if an LOS alarm is generated and cleared 10 times in 1s, alarm flapping occurs.
• penalty: penalty value. Each time an interface receives an alarm generation signal, the figure of
merit value increases by the penalty value. Each time an interface receives an alarm clearance signal,
the figure of merit value decreases exponentially.
• suppress: alarm suppression threshold. When the figure of merit value exceeds this threshold, alarms
are suppressed. This value must be smaller than the ceiling value and greater than the reuse value.
• ceiling: maximum value of figure of merit. When an alarm is repeatedly generated and cleared in a
short period, figure of merit significantly increases and, therefore, takes a long time to return to reuse.
To avoid long delays returning to reuse, a ceiling value can be set to limit the maximum value of
figure of merit. The figure of merit value does not increase when it reaches the ceiling value.
• reuse: alarm reuse threshold. When the figure of merit value falls below this threshold, alarms are
no longer suppressed. This value must be smaller than the suppress value.
• decay-ok: time used by figure of merit to decrease to half when an alarm clearance signal is received.
• decay-ng: time used by figure of merit to decrease to half when an alarm generation signal is
received.
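The interaction of these parameters can be sketched as follows. This is an illustrative Python model only, not device code; the parameter values are made up, and the separate decay-ok and decay-ng half-lives are simplified into a single half-life.

```python
# Illustrative model of transmission alarm suppression: the figure of
# merit rises by "penalty" on each alarm signal, is capped at "ceiling",
# halves every "half_life" time units, and suppression starts above
# "suppress" and ends below "reuse". All values are made up.

class AlarmDampener:
    def __init__(self, penalty=1000, suppress=2000, ceiling=6000,
                 reuse=750, half_life=1.0):
        self.penalty, self.suppress = penalty, suppress
        self.ceiling, self.reuse = ceiling, reuse
        self.half_life = half_life
        self.merit = 0.0
        self.suppressed = False
        self.last_t = 0.0

    def _decay(self, t):
        # The figure of merit decreases exponentially between signals.
        self.merit *= 0.5 ** ((t - self.last_t) / self.half_life)
        self.last_t = t
        if self.suppressed and self.merit < self.reuse:
            self.suppressed = False   # alarm is freed from suppression

    def on_alarm(self, t):
        self._decay(t)
        self.merit = min(self.merit + self.penalty, self.ceiling)
        if self.merit > self.suppress:
            self.suppressed = True
        return self.suppressed        # True -> signal is suppressed

d = AlarmDampener()
print(d.on_alarm(0.0))   # False: first alarm affects interface state
print(d.on_alarm(0.1))   # False: merit ~1933, still below suppress
print(d.on_alarm(0.2))   # True: merit exceeds suppress, now suppressed
```

The sketch mirrors the behavior described above: isolated alarms pass through, while rapid repetition drives the figure of merit over the suppress threshold until decay brings it back below reuse.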
1. After a transmission device generates alarms, it determines whether to report the alarms to its
connected IP device based on the alarm types.
• If the alarms are b3tca, sdbere, or sfbere, the transmission device determines whether the alarm
threshold is reached.
If the threshold is reached, the transmission device reports the alarms to the IP devices for
processing.
If the threshold is not reached, the transmission device ignores these alarms.
• All other alarms are directly reported to the IP device for processing.
2. If the recording function is enabled on the IP device, the alarms are recorded.
3. The IP device determines whether to change the physical status of the interface based on customized
alarm types.
• If no alarm types are customized to affect the physical status of the interface, these alarms are
ignored. The physical status of the interface remains unchanged.
• If an alarm type is customized to affect the physical status of the interface, the alarm is processed
based on the transmission alarm customization mechanism.
• When a certain type of alarm is customized to affect the interface status but transmission alarm
filtering or suppression is not configured:
■ The physical status of the interface changes to Down if such an alarm is generated.
• If a certain type of alarm is customized to affect the interface status and transmission alarm filtering or
suppression is configured, the IP device processes the alarm according to the filtering mechanism or
suppression parameters.
■ If the alarm signal is a burr, it is ignored. The physical status of the interface remains unchanged.
■ The physical status of the interface changes to Down if the signal is an alarm generation signal.
■ The physical status of the interface changes to Up if the signal is an alarm clearance signal that is
not suppressed.
■ If no alarm generation or clearance signal is received, figure of merit decreases with time.
■ If an alarm generation signal is received, the physical status of the interface changes to Down, and
figure of merit increases by the penalty value.
■ If an alarm clearance signal is received, the physical status of the interface changes to Up, and
figure of merit decreases exponentially.
• When an alarm's figure of merit reaches suppress, this alarm is suppressed. The generation or
clearance signal of this alarm does not affect the physical status of the interface.
• When an alarm is frequently generated, figure of merit reaches ceiling. It does not increase even if
new alarm signals arrive. If no alarm signals arrive, figure of merit decreases with time.
• When an alarm's figure of merit decreases to reuse, this alarm is free from suppression.
After the alarm is free from suppression, the process repeats if this alarm is generated again.
Figure 1 shows how the figure of merit value increases and decreases as a transmission device sends
alarm generation signals.
1. At t1 and t2, figure of merit is smaller than suppress. Therefore, alarm signals generated at t1 and t2
affect the physical status of the interface, and the physical status of the interface changes to Down.
2. At t3, figure of merit exceeds suppress, and the alarm is suppressed. The physical status of the
interface is not affected, even if new alarm signals arrive.
3. At t4, figure of merit reaches ceiling. If new alarm signals arrive, figure of merit is recalculated but
does not exceed ceiling.
4. At t5, figure of merit falls below reuse, and the alarm is free from suppression.
Terms
None
Purpose
This document describes the LAN Access and MAN Access features in terms of their overview, principles, and
applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Commissioning engineers
Security Declaration
• Notice on Limited Command Permission
This document describes the commands used for network deployment and maintenance, but does not
cover confidential commands such as those used for production, assembly, and return-to-factory
inspection. For details about confidential commands, please submit an application.
■ When the password encryption mode is cipher, avoid setting both the start and end characters of a
password to "%^%#". Otherwise, the password will be displayed directly in the configuration file.
■ Your purchased products, services, or features may use some of users' personal data during service
operation or fault locating. You must define user privacy policies in compliance with local laws and
take proper measures to fully protect personal data.
■ When discarding, recycling, or reusing a device, back up or delete data on the device as required to
prevent data leakage. If you need support, contact after-sales technical support personnel.
• Feature declaration
■ The NetStream feature may be used to analyze the communication information of terminal
customers for network traffic statistics and management purposes. Before enabling the NetStream
feature, ensure that it is performed within the boundaries permitted by applicable laws and
regulations. Effective measures must be taken to ensure that information is securely protected.
■ The mirroring feature may be used to analyze the communication information of terminal
customers for a maintenance purpose. Before enabling the mirroring function, ensure that it is
performed within the boundaries permitted by applicable laws and regulations. Effective measures
must be taken to ensure that information is securely protected.
■ The packet header obtaining feature may be used to collect or store some communication
information about specific customers for transmission fault and error detection purposes. Huawei
cannot offer services to collect or store this information unilaterally. Before enabling the function,
ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Special Declaration
• This document serves only as a guide. The content is written based on device information gathered
under lab conditions. The content provided by this document is intended to be taken as general
guidance, and does not cover all scenarios. The content provided by this document may be different
from the information on user device interfaces due to factors such as version upgrades and differences
in device models, board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are beyond the
• The maximum values provided in this document are obtained in specific lab environments (for example,
only a certain type of board or protocol is configured on a tested device). The actually obtained
maximum values may be different from the maximum values provided in this document due to factors
such as differences in hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface numbers on devices
for configuration.
• The supported boards are described in the document. Whether a customization requirement can be met
is subject to the information provided at the pre-sales interface.
• In this document, public IP addresses may be used in feature introduction and configuration examples
and are for reference only unless otherwise specified.
• The configuration precautions described in this document may not accurately reflect all scenarios.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates a hazard with a high level of risk which, if not avoided, will
result in death or serious injury.
Indicates a hazard with a low level of risk which, if not avoided, could
result in minor or moderate injury.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made
in earlier issues.
Overview
Ethernet technology originated from an experimental network on which multiple PCs were connected at 3
Mbit/s. In general, Ethernet refers to the standard for 10 Mbit/s Ethernet networks that Digital Equipment
Corporation (DEC), Intel, and Xerox jointly developed and issued in 1982. The IEEE 802.3 standard is based
on and compatible with the Ethernet standard.
In TCP/IP, the encapsulation formats of IP packets on Ethernet and IEEE 802.3 networks are defined in RFC
standards. Currently, the most commonly used encapsulation format is Ethernet_II, which is also called
Ethernet DIX.
To distinguish the two types of Ethernet frames, this document refers to frames in the Ethernet format as Ethernet_II
frames and frames in the IEEE 802.3 format as IEEE 802.3 frames.
Purpose
Ethernet and token ring networks are typical local area networks (LANs).
Ethernet has become the most important LAN networking technology because it is flexible, simple, and easy
to implement.
• Shared Ethernet
Initially, Ethernet networks were shared networks with 10M Ethernet technology. Ethernet networks
were constructed with coaxial cables, and computers and terminals were connected through intricate
connectors. This structure is complex and only suitable for communications in half-duplex mode
because only one line exists.
In 1990, 10BASE-T Ethernet based on twisted pair cables emerged. In this technology, terminals are
connected to a hub through twisted pair cables and communicate through a shared bus in the hub. The
structure is physically a star topology. CSMA/CD is still used because inside the hub, all terminals are
connected to a shared bus.
All the hosts are connected to a coaxial cable in a similar manner. When a large number of hosts exist,
the following problems arise:
• 100M Ethernet
100M Ethernet works at a higher rate (10 times the rate of 10M Ethernet) and differs from 10M
Ethernet in the following ways:
■ Network type: 10M Ethernet supports only a shared Ethernet, while 100M Ethernet is a 10M/100M
auto-sensing Ethernet and can work in half-duplex or full-duplex mode.
■ Negotiation mechanism: 10M Ethernet uses Normal Link Pulses (NLPs) to detect the link
connection status, while 100M Ethernet uses auto-negotiation between two link ends.
• 10BASE-2
• 10BASE-5
• 10BASE-T
• 10BASE-F
• 100BASE-T4
• 100BASE-TX
• 100BASE-FX
• 1000BASE-SX
• 1000BASE-LX
• 1000BASE-TX
In these cabling standards, 10, 100, and 1000 represent the transmission rate (in Mbit/s), and BASE
represents baseband.
The greatest limitation of coaxial cable is that devices on the cable are connected in series, so a single
point of failure (SPOF) may cause a breakdown of the entire network. As a result, the physical
standards of coaxial cables, 10BASE-2 and 10BASE-5, have fallen into disuse.
10BASE-T and 100BASE-TX have different transmission rates, but both apply to Category 5 twisted pair
cables. 10BASE-T transmits data at 10 Mbit/s, while 100BASE-TX transmits data at 100 Mbit/s.
100BASE-T4 is now rarely used.
Using Gigabit Ethernet technology, you can upgrade an existing Fast Ethernet network from 100 Mbit/s
to 1000 Mbit/s.
Gigabit Ethernet uses 8B10B coding at the physical layer. In traditional Ethernet transmission
technologies, the data link layer delivers 8-bit data sets to the physical layer, where they are processed
and sent still as 8 bits to the physical link for transmission.
In contrast, on the optical fiber-based Gigabit Ethernet, the physical layer maps the 8-bit data sets
transmitted from the data link layer to 10-bit data sets before sending them out.
The development of 10GE is well under way, and 10GE will be widely deployed in the future.
CSMA/CD
• Concept of CSMA/CD
Ethernet was originally designed to connect stations, such as computers and peripherals, on a shared
physical line. However, the stations can only access the shared line in half-duplex mode. Therefore, a
mechanism of collision detection and avoidance is required to enable multiple devices to share the
same line in a way that gives each device fair access. Carrier Sense Multiple Access with Collision
Detection (CSMA/CD) was therefore introduced.
The concept of CSMA/CD is as follows:
1. Before sending data, a station listens to the line to determine whether it is idle:
• If the line is idle, the station sends data immediately.
• If the line is in use, the station waits until the line is idle.
2. If two stations send data at the same time, a conflict occurs on the line, and the signal becomes
unstable.
3. The stations stop transmitting after sensing the conflict, and then resume transmission after a
random delay time.
• Half-duplex mode
Half-duplex mode has the following features:
• Full-duplex mode
After Layer 2 switches replace hubs, the shared Ethernet changes to the switched Ethernet, and the
half-duplex mode is replaced by the full-duplex mode. As a result, the transmission rate of data frames
increases significantly, with the maximum throughput doubled.
The full-duplex mode fundamentally solves the problem of collisions on Ethernets and eliminates the
need for CSMA/CD.
Full-duplex mode has the following features:
Except for hubs, all network cards, Layer 2 switches, and Routers produced in the past 10 years support
full-duplex mode.
Full-duplex mode has the following requirements:
■ Physical media over which sending and receiving frames are separated
■ Point-to-point connection
Ethernet Auto-Negotiation
• Purpose of auto-negotiation
As the earlier Ethernet utilizes a 10 Mbit/s half-duplex mode, mechanisms such as CSMA/CD are
required to guarantee system stability. As technology developed, the full-duplex mode and 100M
Ethernet have emerged in succession, both of which have significantly improved Ethernet performance.
However, they have also introduced an entirely new problem: achieving compatibility between older
and newer Ethernet networks.
The auto-negotiation technology has been introduced to solve this problem. With auto-negotiation, the
device at each end of a physical link chooses the same operation parameters by exchanging
information. The main parameters to be automatically negotiated include mode (half-duplex or full-
duplex), rate, and flow control. Once negotiation completes, the devices operate in the agreed mode
and rate.
• Principle of auto-negotiation
Auto-negotiation is based on a bottom-layer mechanism of twisted-pair Ethernets, and applies only to
such Ethernets.
When data is not transmitted over a twisted pair cable, the cable does not remain idle. Instead, it continues transmitting low-frequency pulse signals, which any Ethernet adapter with twisted pair interfaces can identify. The device at each end can also identify inserted pulses, referred to as fast link pulses (FLPs). In this way, the devices achieve auto-negotiation by using FLPs to transmit a small amount of data. Figure 1 shows the pulse insertion process.
Auto-negotiation priorities of the Ethernet duplex link are listed as follows in descending order:
■ 1000M full-duplex
■ 1000M half-duplex
■ 100M full-duplex
■ 100M half-duplex
■ 10M full-duplex
■ 10M half-duplex
If auto-negotiation succeeds, the Ethernet card activates the link. Then, data can be transmitted over it.
If auto-negotiation fails, the link is inaccessible.
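The priority list above can be sketched as a simple resolution routine: the highest-priority ability advertised by both ends wins. This is an illustrative model, not the vendor implementation; the mode names and the `negotiate` helper are hypothetical.

```python
# Auto-negotiation abilities in descending priority order (from the list above).
PRIORITY = [
    "1000M-full", "1000M-half",
    "100M-full", "100M-half",
    "10M-full", "10M-half",
]

def negotiate(local: set, remote: set):
    """Return the highest-priority mode advertised by both ends, or None."""
    for mode in PRIORITY:
        if mode in local and mode in remote:
            return mode
    return None  # negotiation fails: the link stays inaccessible

print(negotiate({"1000M-full", "100M-full"}, {"100M-full", "100M-half"}))
# 100M-full
```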
Auto-negotiation is implemented at the physical layer and does not require any data packets or have
impact on upper-layer protocols.
Two connected interfaces can communicate with each other only when they are in the same working
mode.
■ If both interfaces work in the same non-auto-negotiation mode, the interfaces can communicate.
■ If both interfaces work in auto-negotiation mode, the interfaces can communicate through
negotiation. The negotiated working mode depends on the interface with lower capability.
Specifically, if one interface works in full-duplex mode and the other interface works in half-duplex
mode, the negotiated working mode is half-duplex. The auto-negotiation function also allows the
interfaces to negotiate the use of the traffic control function.
■ If a local interface works in auto-negotiation mode and the remote interface works in a non-auto-
negotiation mode, the negotiated working mode of the local interface depends on the working
mode of the remote interface.
Table 5 describes the auto-negotiation rules for interfaces of the same type.
Table 5 Auto-negotiation rules for interfaces of the same type (local interface working in auto-
negotiation mode)
According to the auto-negotiation rules described in Table 5 and Table 6, if an interface works in
auto-negotiation mode and the connected interface works in a non-auto-negotiation mode,
packets may be dropped or auto-negotiation may fail. It is recommended that you configure two
connected interfaces to work in the same mode to ensure that they can communicate properly.
FE and higher-rate optical interfaces support only full-duplex mode. Auto-negotiation is enabled on GE optical interfaces only for the negotiation of flow control. When devices are directly connected using GE optical interfaces, auto-negotiation is enabled on the optical interfaces to
connected using GE optical interfaces, auto-negotiation is enabled on the optical interfaces to
detect unidirectional optical fiber faults. If one of two optical fibers is faulty, the fault information
is synchronized on both ends through auto-negotiation. As a result, interfaces on both ends go
Down. After the fault is rectified, the interfaces go Up again through auto-negotiation.
HUB
• Hub principle
When terminals are connected through twisted pair cables, a convergence device called a hub is
required. Hubs operate at the physical layer. Figure 2 shows a hub operation model.
A hub is configured as a box with multiple interfaces, each of which can connect to a terminal.
Therefore, multiple devices can be connected through a hub to form a star topology.
Note that although the physical topology is a star, the hub uses bus and CSMA/CD technologies.
■ Category-II hub: provides interfaces of different types. For example, a Category-II hub can provide
both Category-5 twisted pair interfaces and optical fiber interfaces.
Aside from the interface provision, these hub types have no differences in their internal operation.
In practice, Category-I hubs are commonly used.
• Data is sent in full-duplex mode without having to detect if the line is idle.
Duplex mode, either half or full, refers to the operation mode of the physical layer, whereas access mode refers to how the data link layer accesses the medium. In Ethernet, the data link layer and physical layer are therefore associated: different operation modes require different access modes. This brings some inconvenience to the design and application of Ethernet.
Some organizations and vendors have proposed dividing the data link layer into two sub-layers: the Logical
Link Control (LLC) sub-layer and the Media Access Control (MAC) sub-layer. Then, different physical layers
correspond to different MAC sub-layers, and the LLC sub-layer becomes totally independent, as shown in
Figure 1.
MAC Sub-layer
• Functions of the MAC sub-layer
The MAC sub-layer is responsible for the following:
■ Transmitting data over the data link layer. After receiving data from the LLC sub-layer, the MAC
sub-layer adds the MAC address and control information to the data, and then transfers the data
to the physical link. During this process, the MAC sub-layer provides other functions, such as the
check function.
The two types of MAC are integrated in a network interface card. After the network interface card is
initialized, auto-negotiation is performed to choose an operation mode, and then a MAC is chosen
according to the operation mode.
3. The MAC sub-layer adds the destination and source MAC addresses to the data, calculates the
length of the data frame, and forms Ethernet frames.
4. The Ethernet frame is sent to the peer according to the destination MAC address.
5. The peer compares the destination MAC address with entries in the MAC address table.
The preceding describes frame transmission in unicast mode. After an upper-layer application is added
to a multicast group, the data link layer generates a multicast MAC address according to the
application, and then adds the multicast MAC address to the MAC address table. The MAC sub-layer
then receives frames with the multicast MAC address and transmits the frames to the upper layer.
■ DMAC
Indicates the destination MAC address, which specifies the receiver of the frame.
■ SMAC
Indicates the source MAC address, which specifies the sender of the frame.
■ Type
The 2-byte Type field identifies the upper layer protocol of the Data field. The receiver can interpret
the meaning of the Data field according to the Type field.
Multiple protocols can coexist on a local area network (LAN). The hexadecimal values in the Type
field of an Ethernet_II frame specify different protocols.
■ Frames with the Type field value 0806 are Address Resolution Protocol (ARP) frames.
■ Frames with the Type field value 8035 are Reverse Address Resolution Protocol (RARP) frames.
■ Frames with the Type field value 8137 are Internetwork Packet Exchange (IPX) and Sequenced Packet Exchange (SPX) frames.
■ Data
The minimum length of the Data field is 46 bytes, which ensures that the frame is at least 64 bytes in length. A 46-byte Data field is required even if a station transmits 1 byte of data.
If the payload of the Data field is less than 46 bytes, the Data field must be padded to 46 bytes.
The maximum length of the Data field is 1500 bytes.
■ CRC
The Cyclic Redundancy Check (CRC) field provides an error detection mechanism.
Each sending device calculates a CRC code from the DMAC, SMAC, Type, and Data fields. Then the
CRC code is filled into the 4-byte CRC field.
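The field layout, the 46-byte minimum Data field, and the CRC computation described above can be sketched as follows. This is an illustrative builder, not device code; Python's `zlib.crc32` implements the same CRC-32 polynomial used by the Ethernet FCS, and the byte ordering of the appended CRC is an assumption of this sketch.

```python
import struct
import zlib

def build_ethernet_ii(dmac: bytes, smac: bytes, eth_type: int, payload: bytes) -> bytes:
    """Build an Ethernet_II frame: pad the Data field to 46 bytes and append
    the 4-byte CRC computed over the DMAC, SMAC, Type, and Data fields."""
    if len(payload) < 46:
        payload = payload + b"\x00" * (46 - len(payload))  # pad to the minimum
    header = dmac + smac + struct.pack("!H", eth_type)
    fcs = struct.pack("<I", zlib.crc32(header + payload))  # 4-byte CRC field
    return header + payload + fcs

# A 1-byte ARP-typed payload still yields the minimum 64-byte frame.
frame = build_ethernet_ii(b"\xff" * 6, b"\x00\x11\x22\x33\x44\x55", 0x0806, b"x")
print(len(frame))  # 64
```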
As shown in Figure 3, the format of an IEEE 802.3 frame is similar to that of an Ethernet_II frame. In an
IEEE 802.3 frame, however, the Type field is changed to the Length field, and the LLC field and Sub-
Network Access Protocol (SNAP) field occupy 8 bytes of the Data field.
■ Length
The Length field specifies the number of bytes of the Data field.
■ LLC
The LLC field consists of three sub-fields: Destination Service Access Point (DSAP), Source Service
Access Point (SSAP), and Control.
■ SNAP
The SNAP field consists of the Org Code field and Type field. Three bytes of the Org Code field are
all 0s. The Type field functions the same as that in Ethernet_II frames.
■ If DSAP and SSAP are both 0xff, the IEEE 802.3 frame becomes a NetWare-Ethernet frame bearing
NetWare data.
■ If DSAP and SSAP are both 0xaa, the IEEE 802.3 frame becomes an Ethernet_SNAP frame.
Ethernet_SNAP frames can encapsulate the data of multiple protocols. The SNAP can be considered
as an extension of the Ethernet protocol. SNAP allows vendors to invent their own Ethernet
transmission protocols.
The Ethernet_SNAP standard is defined by IEEE 802.1 to ensure compatibility between the operations of IEEE 802.3 LANs and Ethernet networks.
• Jumbo frames
Jumbo frames are Ethernet frames of greater length complying with vendor standards. Such frames are
dedicated to Gigabit Ethernet.
Jumbo frames carry more payload than standard Ethernet frames, whose total length is generally limited to 1518 bytes (a maximum payload of 1500 bytes). To transmit large datagrams at the IP layer within that limit, datagram fragmentation is required, and a frame header and a frame trailer are added to each resulting frame during transmission. To reduce this overhead and improve network usage and transmission rates, jumbo frames are introduced.
The two Ethernet interfaces that need to communicate must both support jumbo frames so that NE40Es
can merge several standard-sized Ethernet frames into a jumbo frame to improve transmission
efficiency.
The default jumbo frame length is 10000 bytes.
LLC Sub-layer
As described, the MAC sub-layer supports IEEE 802.3 frames and Ethernet_II frames. In an Ethernet_II frame,
the Type field identifies the upper layer protocol. Therefore, on a device, the LLC sub-layer is not needed and
only the MAC sub-layer is required.
In an IEEE 802.3 frame, useful features are defined at the LLC sub-layer in addition to the traditional services
of the data link layer. These features are specified by the sub-fields of DSAP, SSAP, and Control.
Networks can support the following types of point-to-point services:
• Connection-less service
Currently, the Ethernet implements this service.
• Connection-oriented service
The connection is set up before data is transmitted. The reliability of the data transmission is ensured.
The following is an example describing the application of SSAP and DSAP with terminals A and B that use
connection-oriented services. Data is transmitted using the following process:
2. After receiving the frame, if B has enough resources, B returns an acknowledgment message that
contains a Service Access Point (SAP). The SAP identifies the connection required by A.
3. After receiving the acknowledgment message, A knows that B has set up a local connection between
them. After creating a SAP, A sends a message containing the SAP to B. The connection is set up.
4. The LLC sub-layer of A encapsulates the data into a frame. The DSAP field is filled in with the SAP
sent by B; the SSAP field is filled in with that created by A. Then the LLC sub-layer of A transfers the
data to its MAC sub-layer.
5. The MAC sub-layer of A adds the MAC address and Length field to the frame, and then transfers the
frame to the data link layer.
6. After the frame is received at the MAC sub-layer of B, the frame is transferred to the LLC sub-layer.
The LLC sub-layer identifies the connection that the frame belongs to according to the DSAP field.
7. After checking and acknowledging the frame based on the connection type, the LLC sub-layer of B
transfers the frame to the upper layer.
8. After the frame reaches its destination, A sends B a frame instructing B to release the connection. At
this time, the communications end.
Definition
Trunk is a technology that bundles multiple physical interfaces into a single logical interface. This logical
interface is called a trunk interface, and each bundled physical interface is called a member interface.
Trunk technology helps increase bandwidth, enhance reliability, and carry out load balancing.
Purpose
Without trunk technology, the transmission rate between two network devices connected by a 100 Mbit/s
Ethernet twisted pair cable can only reach 100 Mbit/s. To obtain a higher transmission rate, you must
change the transmission media or upgrade the network to a Gigabit Ethernet, which is costly for small- and medium-sized enterprises and schools.
Trunk technology provides an economical solution. For example, a trunk interface with three 100 Mbit/s
member interfaces working in full-duplex mode can provide a maximum bandwidth of 300 Mbit/s.
Both Ethernet interfaces and Packet over SONET/SDH (POS) interfaces can be bundled into a trunk
interface. These two types of interfaces, however, cannot be member interfaces of the same trunk interface.
The reasons are as follows:
• Ethernet interfaces apply to a broadcast network where packets are sent to all devices on the network.
• POS interfaces apply to a P2P network, because the link layer protocol of POS interfaces is High-level
Data Link Control (HDLC), which is a point-to-point (P2P) protocol.
Benefits
This feature offers the following benefits:
• Increased bandwidth
A trunk link can be considered a point-to-point link. The devices at the two ends of the link can both be Routers, both be switches, or be a Router on one end and a switch on the other.
A trunk has the following advantages:
• Greater bandwidth
The total bandwidth of a trunk interface equals the sum of the bandwidth of all its member interfaces.
In this manner, the interface bandwidth is multiplied.
• Higher reliability
If a member interface fails, traffic on the faulty link is then switched to an available member link. This
ensures higher reliability for the entire trunk link.
• Load balancing
Load balancing can be carried out on a trunk interface, which distributes traffic among its member
interfaces and then transmits the traffic through the member links to the same destination. This
prevents network congestion that occurs when all traffic is transmitted over one link.
• Parameters of the member physical interfaces on both ends of a trunk link must be consistent. The
parameters include:
link.
After the datagram forwarding mechanism is introduced, frames are transmitted in either of the
following manners:
■ Frames with the same source and destination MAC addresses are transmitted over the same
physical link.
■ Frames with the same source and destination IP addresses are transmitted over the same physical
link.
• Assignment of IP addresses
As shown in Figure 1, two devices are directly connected through three interfaces, and the three interfaces
are bundled into an Eth-Trunk interface on each end of the trunk link. If the bandwidth of each interface is 1
Gbit/s, the bandwidth of the Eth-Trunk interface is 3 Gbit/s. If the Eth-Trunk interface has two Up member
interfaces, its bandwidth is reduced to 2 Gbit/s.
You can set the following thresholds to stabilize an Eth-Trunk interface's status and bandwidth as well as
reduce the impact brought by frequent changes of member link status.
When the number of member links in the Up state is smaller than the lower threshold, the Eth-Trunk
interface goes Down. This ensures the minimum available bandwidth of an Up trunk link.
For example, if an Eth-Trunk interface needs to provide a minimum bandwidth of 2 Gbit/s and each
member link can provide 1 Gbit/s bandwidth, the lower threshold must be set to 2 or a larger value. If
one or no member links are in the Up state, the Eth-Trunk interface goes Down.
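The lower-threshold rule in the example above can be expressed as a simple check. This is an illustrative sketch, not device code; the `trunk_state` helper is hypothetical.

```python
# The Eth-Trunk interface goes Down when fewer member links are Up than the
# configured lower threshold, guaranteeing a minimum bandwidth when Up.
def trunk_state(up_members: int, lower_threshold: int) -> str:
    return "Up" if up_members >= lower_threshold else "Down"

# Threshold 2 with 1 Gbit/s members guarantees at least 2 Gbit/s when Up.
print(trunk_state(2, 2))  # Up
print(trunk_state(1, 2))  # Down
```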
MAC address
Each station or server connected to an Ethernet interface of a device has its own MAC address. The MAC
address table on the device records information about the MAC addresses of connected devices.
When a Layer 3 router is connected to a Layer 2 switch through two Eth-Trunk links for different services, if
both Eth-Trunk interfaces on the router adopt the default system MAC address, the system MAC address is
learned by the switch and alternates between the two Eth-Trunk interfaces. In this case, a loop probably
occurs between the two devices. To prevent loops, you can change the MAC address of an Eth-Trunk
interface by using the mac-address command. By configuring the source and destination MAC addresses for
two Eth-Trunk links, you can guarantee the normal transmission of service data flows and improve the
network reliability.
After the MAC address of an Eth-Trunk interface is changed, the device sends gratuitous ARP packets to
update the mapping relationship between MAC addresses and ports.
MTU
Generally, the IP layer controls the maximum length of frames that are sent each time. Each time the IP
layer receives an IP packet to be sent, it checks which local interface the packet needs to be sent to and
queries the MTU of the interface. Then, the IP layer compares the MTU with the packet length to be sent. If
the packet length is greater than the MTU, the IP layer fragments the packet to ensure that the length of
each fragment is smaller than or equal to the MTU.
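The fragmentation length check described above can be sketched as follows. This is a simplified illustration that splits only the payload; real IP fragmentation also copies headers and sets fragment offsets.

```python
# Sketch of the IP-layer MTU check: if the packet fits, send it as-is;
# otherwise split it so no piece exceeds the interface MTU.
def fragment(payload: bytes, mtu: int):
    if len(payload) <= mtu:
        return [payload]
    return [payload[i:i + mtu] for i in range(0, len(payload), mtu)]

pieces = fragment(b"x" * 3000, 1500)
print([len(p) for p in pieces])  # [1500, 1500]
```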
If forcible unfragmentation is configured, certain packets are lost during data transmission at the IP layer. To
ensure jumbo packets are not dropped during transmission, you need to configure forcible fragmentation.
Generally, it is recommended that you adopt the default MTU value of 1500 bytes. If you need to change the
MTU of an Eth-Trunk interface, you need to change the MTU of the peer Eth-Trunk interface to ensure that
the MTUs of both interfaces are the same. Otherwise, services may be interrupted.
Basic Concepts
• Link aggregation
Link aggregation is a method of bundling several physical interfaces into a logical interface to increase
bandwidth and reliability.
and then active member interfaces are selected according to the configuration of the Eth-Trunk
interface on the Actor. In static LACP mode, the active interfaces selected by devices must be consistent
at both ends; otherwise, the LAG cannot be set up. To ensure the consistency of the active interfaces
selected at both ends, you can set a higher priority for one end. Then the other end can select the active
interfaces accordingly.
If neither of the devices at the two ends of an Eth-Trunk link is configured with the system priority, the
devices adopt the default value. In this case, the Actor is selected according to the system ID. That is,
the device with the smaller system ID becomes the Actor.
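Actor selection can be sketched as a two-level comparison: the smaller system LACP priority value wins, and the smaller system ID breaks ties. This is an illustrative model; the tuple form and the `select_actor` helper are hypothetical.

```python
# LACP Actor selection: compare system LACP priority first (smaller value
# wins), then fall back to the system ID (smaller wins).
def select_actor(a, b):
    """a, b: (system_priority, system_id) tuples; return the Actor's tuple."""
    return min(a, b)  # tuple comparison: priority first, then system ID

# With equal (default) priorities, the smaller system ID becomes the Actor.
print(select_actor((32768, "00e0-fc00-0001"), (32768, "00e0-fc00-0002")))
```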
A member interface of an Eth-Trunk interface in static LACP mode can send LACPDUs in either active or
passive mode:
■ In active mode, an interface sends LACPDUs to the peer end for negotiation immediately after
being added to an Eth-Trunk interface in static LACP mode.
■ In passive mode, an interface does not actively send LACPDUs after being added to an Eth-Trunk
interface in static LACP mode. Instead, it responds to LACPDUs only when receiving LACPDUs from
the peer end that works in active mode.
The Eth-Trunk member interfaces at both ends cannot both work in passive mode; otherwise, neither end actively sends LACPDUs for negotiation, and LACPDU negotiation fails.
M:N backup applies to the scenario where bandwidth of M links needs to be provided and link
redundancy is required. If an active link fails, an LACP-enabled device can automatically select the
backup link with the highest priority and add it to the LAG.
If no backup link is available and the number of Up member links is less than the lower threshold for
the number of Up links, the device shuts down the trunk interface.
Difference/Similarity | Manual Load Balancing Mode | Static LACP Mode
Similarity | In both modes, the LAG is created and deleted manually, and the member links are added and deleted manually.
In this mode, load balancing is carried out among all member interfaces. The NE40E supports two types of
load balancing:
Figure 3 LACPDU
2. Devices at both ends determine the Actor according to the system LACP priority and system ID.
As shown in Figure 5, devices at both ends receive LACPDUs from each other. When Device B receives
LACPDUs from Device A, Device B checks and records information about Device A and compares their
system priorities. If the system priority of Device A is higher than that of Device B, Device A functions
as the Actor and Device B selects active interfaces according to the interface priority of Device A. In
this manner, devices on both ends select the same active interfaces.
3. Devices at both ends determine active interfaces according to the LACP priorities and interface IDs of
the Actor.
On the network shown in Figure 6, after the devices at both ends determine the Actor, both devices
select active interfaces according to the interface priorities on the Actor.
After both devices select the same interfaces as active interfaces, an Eth-Trunk is established between
them and traffic is then load balanced among active links.
■ A link is Down.
■ After LACP preemption is enabled, the priority of the backup interface is changed to be higher than
that of the current active interface.
If any of the preceding conditions are met, a link switching occurs in the following steps:
2. The backup link with the highest priority is selected to replace the faulty active link.
3. The backup link of the highest priority becomes the active link and then begins forwarding data.
The link switching is complete.
• LACP preemption
After LACP preemption is enabled, interfaces with higher priorities in a LAG function as active interfaces.
As shown in Figure 7, Port 1, Port 2, and Port 3 are member interfaces of Eth-Trunk 1. The upper
threshold for the number of active interfaces is 2. LACP priorities of Port 1 and Port 2 are set to 9 and
10, respectively. The LACP priority of Port 3 is the default value. When LACP negotiation is complete,
Port 1 and Port 2 are selected as active interfaces because their LACP priorities are higher. Port 3
becomes the backup interface.
■ Port 1 fails and then recovers. When Port 1 fails, Port 3 takes its place. After Port 1 recovers, if
LACP preemption is not enabled on the Eth-Trunk, Port 1 remains as the backup interface. If LACP
preemption is enabled on the Eth-Trunk, Port 1 becomes the active interface after it recovers, and
Port 3 becomes the backup interface again.
■ If LACP preemption is enabled and you want Port 3 to take the place of Port 1 or Port 2 as an
active interface, you can set the LACP priority value of Port 3 to a smaller value. If LACP
preemption is not enabled, the system does not re-select an active interface or switch the active
interface when the priority of a backup interface is higher than that of the active interface.
• Loop detection
LACP supports loop detection. If a local Eth-Trunk interface in static LACP mode receives the LACPDU
sent by itself, the Eth-Trunk interface sets its member interfaces to the Unselected state so that they
cease to participate in service traffic forwarding.
■ If the Eth-Trunk interfaces on each end of a link can exchange LACPDUs normally and LACP
negotiation succeeds, the member interfaces in Unselected state are restored to the Selected state
and resume service traffic forwarding.
■ If the Eth-Trunk interfaces on each end of a link still cannot exchange LACPDUs normally, the
member interfaces remain in the Unselected state, and the member interfaces still cannot
participate in service traffic forwarding.
7.3.2.5 E-Trunk
Definition
Enhanced trunk (E-Trunk) implements inter-device link aggregation, which increases reliability from the
board level to the device level.
Background
Eth-Trunk implements link reliability of single devices. However, if such a device fails, Eth-Trunk fails to take
effect.
To improve network reliability, carriers introduced device redundancy with master and backup devices. If the
master device or primary link fails, the backup device can take over user services. In this situation, another
device must be dual-homed to the master and backup devices, and inter-device link reliability must be
ensured.
E-Trunk was introduced to meet the requirements. E-Trunk aggregates data links of multiple devices to form
a LAG. If a link or device fails, services are automatically switched to the other available links or devices in
the E-Trunk, improving link and device-level reliability.
Basic Concepts
■ The LACP E-Trunk system priority is used for the E-Trunk to which Eth-Trunk interfaces in static LACP mode
are added.
■ The LACP system priority is used for Eth-Trunk interfaces in static LACP mode.
On the network shown in Figure 1, the LACP system priorities of PE1 and PE2 are 60 and 50,
respectively. The LACP E-Trunk system priorities of PE1 and PE2 are both 100. Because PE1 and PE2 are
added to the E-Trunk, their LACP E-Trunk system priority 100 takes effect and is used when PE1 and
PE2 perform LACP negotiation with the CE. Because the CE's LACP system priority is higher, the CE
becomes the LACP Actor.
■ The LACP E-Trunk system ID is used for the E-Trunk to which Eth-Trunk interfaces in static LACP mode are
added.
■ The LACP system ID is used for Eth-Trunk interfaces in static LACP mode.
• E-Trunk priority
E-Trunk priorities determine the master/backup status of the devices in an aggregation group. As shown
in Figure 1, the smaller the E-Trunk priority value, the higher the E-Trunk priority. PE1 has a higher E-
Trunk priority than PE2, and therefore PE1 is the master device while PE2 is the backup device.
• E-Trunk ID
An E-Trunk ID is an integer that uniquely identifies an E-Trunk.
• Working mode
The working mode is subject to the working mode of the Eth-Trunk interface added to the E-Trunk. The
Eth-Trunk interface works in one of the following modes: automatic, forced master and forced backup.
■ Automatic mode: Eth-Trunk interfaces as E-Trunk members work in automatic mode, and their
master/backup status is determined through negotiation.
■ Forced master mode: Eth-Trunk interfaces as E-Trunk members are forced to work in master mode.
■ Forced backup mode: Eth-Trunk interfaces as E-Trunk members are forced to work in backup
mode.
• Timeout period
Normally, the master and backup devices in an E-Trunk periodically send Hello messages to each other.
If the backup device does not receive any Hello message within the timeout period, it becomes the
master device.
The timeout period is obtained through the formula: Timeout period = Sending period x Multiplier.
If the multiplier is 3, the backup device becomes the master device if it does not receive any Hello
message within three consecutive sending periods.
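The timeout formula above can be expressed directly. The values here are illustrative, not defaults.

```python
# E-Trunk takeover timer: Timeout period = Sending period x Multiplier.
def timeout_period(send_period_ms: int, multiplier: int) -> int:
    """Return how long the backup waits for Hello silence before takeover."""
    return send_period_ms * multiplier

# With a 10-second sending period and multiplier 3, the backup device
# becomes master after 30 seconds without Hello messages.
print(timeout_period(10_000, 3))  # 30000
```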
■ PE end
The same Eth-Trunk and E-Trunk interfaces are created on PE1 and PE2. In addition, the Eth-Trunk
interfaces are added to the E-Trunk group.
The Eth-Trunk interfaces can work in either static LACP mode or manual load balancing mode. The Eth-Trunk
and E-Trunk configurations on PE1 and PE2 must be the same.
■ CE end
Adding Eth-Trunk interfaces in static LACP mode to an E-Trunk: Create an Eth-Trunk interface in
static LACP mode on the CE, and add the CE interfaces connecting to the PEs to the Eth-Trunk
interface. This ensures link reliability.
Adding Eth-Trunk interfaces in manual load balancing mode to an E-Trunk: Create an Eth-Trunk
interface in manual load balancing mode on the CE, and add the CE interfaces connecting to the
PEs to the Eth-Trunk interface. Then, configure Ethernet operation, administration and
maintenance (OAM) on the CE and PEs, ensuring link reliability.
The E-Trunk group is invisible to the CE.
IP addresses are rarely configured for Eth-Trunk interfaces that connect the CE and PEs to transmit Layer 3 services and that are added to an E-Trunk on the PEs. In most cases, Eth-Trunk interfaces work as Layer 2 interfaces.
E-Trunk packets carrying the source IP address and port number configured on the local end are sent
through UDP. Factors triggering the sending of E-Trunk packets are as follows:
■ The configurations change. For example, the E-Trunk priority, packet sending period, timeout
period multiplier, addition/deletion of a member Eth-Trunk interface, or source/destination IP
address of the E-Trunk group changes.
E-Trunk packets need to carry their timeout interval. The peer device uses this interval as the timeout
interval of the local device.
Status of the Local E-Trunk | Working Mode of the Local Eth-Trunk Interface | Status of the Peer Eth-Trunk Interface | Status of the Local Eth-Trunk Interface
In normal situations:
■ If PE1 functions as the master, Eth-Trunk 10 of PE1 functions as the master, and its link status is
Up.
■ If PE2 functions as the backup, Eth-Trunk 10 of PE2 functions as the backup, and its link status is
Down.
If the link between the CE and PE1 fails, the following situations occur:
1. PE1 sends an E-Trunk packet containing information about the faulty Eth-Trunk 10 of PE1 to PE2.
2. After receiving the E-Trunk packet, PE2 finds that Eth-Trunk 10 on the peer is faulty. Then, the
status of Eth-Trunk 10 on PE2 becomes master. After E-Trunk status negotiation, the Eth-Trunk
10 on PE2 goes Up.
The Eth-Trunk status on PE2 becomes Up, and traffic of the CE is forwarded through PE2. In this
way, traffic destined for the peer CE is protected.
If PE1 fails, the following situations occur:
1. If the PEs are configured with BFD, PE2 detects that the BFD session goes Down. PE2 then becomes
the master, and Eth-Trunk 10 of PE2 also becomes the master.
2. If the PEs are not configured with BFD, PE2 does not receive any E-Trunk packet from PE1 before
the timeout period expires. PE2 then becomes the master, and Eth-Trunk 10 of PE2 also becomes
the master.
After E-Trunk status negotiation, Eth-Trunk 10 on PE2 goes Up, and traffic of the CE is forwarded
through PE2. In this way, traffic destined for the peer CE is protected.
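The master/backup decision described above can be sketched in Python as follows. This is a simplified illustration, not the vendor algorithm; the assumption that a smaller E-Trunk priority value wins, with a device identifier as the tiebreaker, is made only for the example:

```python
def negotiate_master(local_priority, peer_priority, local_id, peer_id,
                     peer_alive, peer_trunk_up):
    """Illustrative E-Trunk role decision: a dead peer (BFD Down or
    keepalive timeout) or a peer whose member Eth-Trunk is faulty makes
    the local end master; otherwise the smaller priority value wins,
    with ties broken by a device identifier."""
    if not peer_alive or not peer_trunk_up:
        return "master"
    if local_priority != peer_priority:
        return "master" if local_priority < peer_priority else "backup"
    return "master" if local_id < peer_id else "backup"

# PE2's view while PE1 and its Eth-Trunk 10 are healthy:
print(negotiate_master(200, 100, 2, 1, peer_alive=True, peer_trunk_up=True))   # backup
# PE2's view after Eth-Trunk 10 on PE1 fails (or PE1 times out):
print(negotiate_master(200, 100, 2, 1, peer_alive=True, peer_trunk_up=False))  # master
```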
• Switchback mechanism
The local device is in master state. In such a situation, if the physical status of the Eth-Trunk interface
on the local device goes Down or the local device fails, the peer device becomes the master and the
physical status of the member Eth-Trunk interface becomes Up.
When the local end recovers, it needs to become the master again. The local Eth-Trunk interface
therefore enters the negotiation state. After Eth-Trunk reports that negotiation is possible, the local
E-Trunk starts the switchback delay timer. When this timer expires, the local Eth-Trunk interface
becomes the master again. After Eth-Trunk negotiation is complete, the Eth-Trunk link goes Up.
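The switchback delay can be modeled as a simple timer, as in the following Python sketch (the delay value and the two-state role model are illustrative assumptions):

```python
class SwitchbackTimer:
    """Sketch of the switchback delay: after the recovered local end
    reports that negotiation is possible, it waits for a delay before
    reclaiming the master role."""

    def __init__(self, delay):
        self.delay = delay      # switchback delay, in seconds (example value)
        self.armed_at = None
        self.role = "backup"

    def on_negotiation_ready(self, now):
        # Eth-Trunk reports that negotiation is possible again.
        self.armed_at = now

    def tick(self, now):
        if self.armed_at is not None and now - self.armed_at >= self.delay:
            self.role = "master"   # timer expired: reclaim the master role
        return self.role

t = SwitchbackTimer(delay=30)
t.on_negotiation_ready(now=0)
print(t.tick(now=10))  # backup: still within the switchback delay
print(t.tick(now=30))  # master: delay expired
```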
E-Trunk Restrictions
To improve the reliability of CE and PE links and to ensure that traffic can be automatically switched
between these links, the configurations on both ends of the E-Trunk link must be consistent. Use the
networking in Figure 2 as an example.
• The Eth-Trunk link directly connecting PE1 to the CE and the Eth-Trunk link directly connecting PE2 to
the CE must be configured with the same working rate and duplex mode. This ensures that both Eth-
Trunk interfaces have the same key and join the same E-Trunk group.
• Peer IP addresses must be specified for the PEs to ensure Layer 3 connectivity. The address of the local
PE is the peer address of the peer PE, and vice versa. It is recommended that loopback interface
addresses be used as the PE addresses.
• The two PEs must be configured with the same security key (if necessary).
7.3.2.6 mLACP
Definition
Multi-chassis LACP (mLACP) is used for LACP negotiation of aggregated links between devices in the same
redundancy group (RG). These devices use Inter-Chassis Communication Protocol (ICCP) channels that are
established using LDP sessions to exchange LACP configuration and status information.
Purpose
mLACP, which complies with RFC 7275, provides similar functions to E-Trunk. In multi-chassis scenarios, the
local device cannot obtain the configuration and negotiation parameters of the peer device that has Eth-
Trunk in LACP mode configured. As such, master/backup protection cannot be implemented. To resolve this
issue, mLACP can be used. It synchronizes LACP configuration and status information between dual-homed
devices through a reliable ICCP channel. In this way, master/backup protection can be implemented.
Basic Concepts
Figure 1 mLACP
• ICCP session
ICCP establishes reliable ICCP channels over LDP sessions for different devices to transmit information.
• ICCP RG
Two or more PEs in the same administrative domain form an RG to protect services and use the
established ICCP sessions to communicate with each other.
• mLACP node ID
Node IDs are used to ensure unique LACP port numbers on different devices in an RG. A node ID is an
integer ranging from 0 to 7. After a node ID is configured, the rules for generating an LACP port
number in an RG are as follows:
• mLACP ROID
A redundant object ID (ROID) identifies a redundant Eth-Trunk link in mLACP.
LACP uses the port priorities and port numbers of the member interfaces of the Actor to select active
links. LACP first compares their port priorities. Note that a smaller value indicates a higher priority. If
the port priorities are the same, LACP compares their port numbers. A smaller port number indicates a
higher priority. After an Eth-Trunk interface is added to mLACP, all member interfaces of the Eth-Trunk
interface share the same mLACP port priority. As shown in Figure 1, the port priority is 100 for PE1 and
is 32768 for PE2. As the port priority of PE1 is higher than that of PE2, PE1 becomes the master device,
and PE2 the backup device.
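The selection rule above (smaller priority value first, then smaller port number) can be sketched as follows; the tuple representation of member interfaces is an assumption made for the example:

```python
def select_reference_interface(members):
    """Pick the Actor's highest-priority member interface as described
    above: the smaller port priority value wins, with ties broken by
    the smaller port number. `members` is a list of
    (port_priority, port_number) pairs."""
    return min(members, key=lambda m: (m[0], m[1]))

# PE1's member interfaces use mLACP port priority 100, PE2's use 32768:
pe1_members = [(100, 3), (100, 1)]
pe2_members = [(32768, 2)]
print(select_reference_interface(pe1_members + pe2_members))  # (100, 1): a PE1 interface wins
```

Because all members of one Eth-Trunk share the same mLACP port priority, the device owning the winning interface (PE1 here) becomes the master.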
mLACP Implementation
Establishing an ICCP session
Before an ICCP session is established, an LDP session must be established between two devices in an RG.
After you create an RG, you need to specify the IP address of the remote LDP peer for the RG on the two
devices. The devices first negotiate ICCP capabilities through the LDP session. After that, the two devices
exchange RG Connect messages over the LDP session, requesting a connection. If a device both sends and
receives an RG Connect message to and from its peer, it considers that an ICCP session has been successfully
established.
Establishing an mLACP connection
mLACP is enabled for an RG after an mLACP system ID, system priority, and node ID are configured in the
RG. After an ICCP session is established, if mLACP is enabled in the RG, mLACP on one device sends an
mLACP Connect TLV message to other devices in the same RG to request to establish an mLACP connection.
After a device receives the message, it sets bit A to 1 in its mLACP Connect TLV message and notifies the
sender of the receipt. If one device in an RG both sends and receives an mLACP Connect TLV message with
bit A set to 1, it considers that an mLACP connection is set up successfully.
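The handshake above can be modeled as a toy state machine: a connection is up once an endpoint has both sent an mLACP Connect TLV and received one with bit A set to 1. The dictionary message format below is a simplification, not the RFC 7275 wire encoding:

```python
class MlacpEndpoint:
    """Toy model of the mLACP Connect TLV handshake described above."""

    def __init__(self):
        self.sent = False      # we have sent our own Connect TLV
        self.got_ack = False   # we have received a Connect TLV with A = 1

    def send_connect(self):
        self.sent = True
        return {"type": "mLACP-Connect", "A": 0}

    def receive(self, tlv):
        if tlv["type"] == "mLACP-Connect":
            self.got_ack = tlv["A"] == 1
        # Acknowledge by returning our own Connect TLV with bit A set.
        return {"type": "mLACP-Connect", "A": 1}

    def connected(self):
        return self.sent and self.got_ack

a, b = MlacpEndpoint(), MlacpEndpoint()
tlv_a, tlv_b = a.send_connect(), b.send_connect()
ack_b = b.receive(tlv_a)   # B answers A's request with bit A = 1
ack_a = a.receive(tlv_b)   # A answers B's request with bit A = 1
a.receive(ack_b)
b.receive(ack_a)
print(a.connected(), b.connected())  # True True
```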
mLACP negotiation process
After an mLACP connection is established, the two devices in an RG exchange mLACP Synchronization Data
TLV messages to notify each other of all data relating to their systems, Eth-Trunk interfaces, and Eth-Trunk
member interfaces. After the synchronization is complete, mLACP compares the port priorities and port
numbers of the member interfaces of the Actor and selects the Eth-Trunk member interface with the highest
priority as the reference interface. Then, it sets the master role for the local end. Following this, mLACP
selects appropriate member interfaces from the master Eth-Trunk interface and activates them according to
LACP rules. If the link of the master Eth-Trunk interface is faulty on one device, this device notifies its peer
through an mLACP message. After detecting this fault, the backup Eth-Trunk interface becomes the master.
If the master Eth-Trunk interface recovers and has the highest port priority, traffic is immediately switched
back. A device can send an mLACP Port Priority TLV message to request its peer to change the port priority.
This is used for traffic switchover and switchback in fault or recovery scenarios. As such, the mLACP port
priority actually used by mLACP may not be the configured value.
mLACP Constraints
To ensure mLACP runs normally and that there are automatic switchovers in the event of a fault, comply
with the following rules:
• The Eth-Trunk interfaces to be added to mLACP must work in static LACP mode. The Eth-Trunk IDs of
PE1 and PE2 must be the same.
• Devices in the same RG must be configured with the same system ID and system priority. However,
their node IDs must be different, or mLACP negotiation fails.
• To ensure a normal master/backup switchover if a fault occurs, you are advised to configure the two
devices running mLACP as the Actors. In other words, you are advised to set their mLACP system IDs
and system priorities to smaller values.
Service Overview
As the volume of services deployed on networks increases, the bandwidth provided by a single P2P physical
link working in full-duplex mode cannot meet the requirements of service traffic.
To increase bandwidth, existing interface boards can be replaced with interface boards of higher bandwidth
capacity. However, this would waste existing device resources and increase upgrade expenditure. If more
links are used to interconnect devices, each Layer 3 interface must be configured with an IP address, wasting
IP addresses.
To increase bandwidth without replacing the existing interface boards or wasting IP address resources,
bundle physical interfaces into a logical interface using Eth-Trunk to provide higher bandwidth.
Networking Description
As shown in Figure 1, traffic of different services is sent to the core network through the user-end provider
edge (UPE) and provider edge-access aggregation gateway (PE-AGG). Different services are assigned
different priorities. To ensure the bandwidth and reliability of the link between the UPE and the PE-AGG, a
link aggregation group, Eth-Trunk 1, is established.
Feature Deployment
In Figure 1, Eth-Trunk interfaces are created on the UPE and PE-AGG, and the physical interfaces that
directly connect the UPE and PE-AGG are added to the Eth-Trunk interfaces. Eth-Trunk offers the following
benefits:
• Improved link bandwidth. The maximum bandwidth of the Eth-Trunk link is three times that of each
physical link.
• Improved link reliability. If one physical link fails, traffic is switched to another physical link of the Eth-
Trunk link.
• Network congestion prevention. Traffic between the UPE and PE-AGG is load-balanced on the three
physical links of the Eth-Trunk link.
• Prompt transmission of high-priority packets, with quality of service (QoS) policies applied to Eth-Trunk
interfaces.
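The load-balancing benefit above relies on per-flow hashing: all packets of one flow hash to the same member link, preserving packet order. The following Python sketch illustrates the idea; the choice of hash inputs and the CRC32 algorithm are examples only, as real hash fields and algorithms are device-specific:

```python
import zlib

def pick_member_link(packet, n_links):
    """Per-flow load balancing sketch: hash flow-invariant fields so
    that every packet of a flow maps to the same member link."""
    flow_key = f"{packet['src_ip']}|{packet['dst_ip']}".encode()
    return zlib.crc32(flow_key) % n_links

flow_a = {"src_ip": "10.0.0.1", "dst_ip": "192.168.1.9"}
flow_b = {"src_ip": "10.0.0.2", "dst_ip": "192.168.1.9"}

# The same flow always maps to the same of the three physical links:
print(pick_member_link(flow_a, 3) == pick_member_link(flow_a, 3))  # True
# Different flows may land on different links, spreading the load:
print(pick_member_link(flow_a, 3), pick_member_link(flow_b, 3))
```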
You can select the operation mode for the Eth-Trunk as follows:
• If devices at both ends of the Eth-Trunk link support the Link Aggregation Control Protocol (LACP), Eth-
Trunk interfaces in static LACP mode are recommended.
• If the device at either end of the Eth-Trunk does not support LACP, Eth-Trunk interfaces in manual load
balancing mode are recommended.
Service Overview
Eth-Trunk ensures link-level reliability between two directly connected devices. However, if one of the
devices itself fails, Eth-Trunk cannot protect services.
To improve network reliability, carriers introduced device redundancy with master and backup devices. If the
master device or primary link fails, the backup device can take over user services. However, in this situation,
the master and backup devices must be dual-homed by a downstream device, and inter-device link reliability
must be ensured.
In dual-homing networking, Virtual Router Redundancy Protocol (VRRP) can be used to ensure device-level
reliability, and Eth-Trunk can be used to ensure link reliability. In some cases, however, traffic cannot be
switched to the backup device and secondary link simultaneously if the master device or primary link fails.
As a result, traffic is interrupted. To address this issue, use Enhanced Trunk (E-Trunk) to implement both
device- and link-level reliability.
Networking Description
In Figure 1, the customer edge (CE) is dual-homed to the virtual private LAN service (VPLS) network, and
Eth-Trunk is deployed on the CE and provider edges (PEs) to implement link reliability.
In normal situations, the CE communicates with remote devices on the VPLS network through PE1. If PE1 or
the link between the CE and PE1 fails, the CE cannot communicate with PE1. To ensure that services are not
interrupted, deploy an E-Trunk on PE1 and PE2. If PE1 or the link between the CE and PE1 fails, traffic is
switched to PE2. The CE then continues to communicate with remote devices on the VPLS network through
PE2. If PE1 or the link between the CE and PE1 recovers, traffic is switched back to PE1. An E-Trunk provides
backup between Eth-Trunk links of the PEs, improving device-level reliability.
Feature Deployment
Use an E-Trunk comprised of Eth-Trunk interfaces in static LACP mode as an example. Figure 1 shows how
the Eth-Trunk and E-Trunk are deployed.
• Deploy Eth-Trunk interfaces in static LACP mode on the CE and PEs and add the interfaces that directly
connect the CE and PEs to the Eth-Trunk interfaces to implement link reliability.
• Deploy an E-Trunk on the PEs and add the Eth-Trunk interfaces in static LACP mode to the E-Trunk to
implement device-level reliability.
Definition
GARP VLAN Registration Protocol (GVRP) is an application of Generic Attribute Registration Protocol (GARP)
for registering and deregistering VLAN attributes.
GARP propagates attributes among protocol participants so that they dynamically register or deregister
attributes. It supports different upper-layer applications by filling different attributes into GARP PDUs and
identifies applications through destination MAC addresses.
IEEE Std 802.1Q assigns 01-80-C2-00-00-21 to the VLAN application. GVRP is therefore developed for VLAN
pruning and dynamic VLAN creation.
Purpose
Before GVRP is developed, a network administrator must manually create VLANs on network devices. In
Figure 1, Device A and Device C connect to Device B through trunk links. VLAN 2 is created on Device A, and
VLAN 1 exists on Device B and Device C by default. To allow packets belonging to VLAN 2 on Device A to be
transmitted to Device C over Device B, the network administrator must manually create VLAN 2 on Device B
and Device C, a simple task considering this networking.
If the networking is so complicated that the network administrator cannot ascertain the topology in a short
time, or if numerous VLANs require configuration, manual VLAN configuration is time-consuming and prone
to misconfiguration. GVRP reduces this heavy workload by completing VLAN configuration through
automatic VLAN registration.
Benefits
GVRP can rapidly propagate VLAN attributes of one device throughout an entire switching network, thereby
reducing manual configuration workload and possible configuration errors.
Participant
A participant is an interface that runs a protocol. On a device running GVRP, each GVRP-enabled interface is
a GVRP participant, as shown in Figure 1.
A Router added to an ERPS ring is called a node. A maximum of two ports on a node can be added to the
same ERPS ring. Device A, Device B, and Device C in Figure 1 are nodes of the ERPS major ring.
GVRP registers and deregisters VLAN attributes through attribute declarations and reclaim declarations.
• When an interface receives a VLAN attribute declaration, it registers or joins the VLAN specified in the
declaration.
• When an interface receives a VLAN attribute reclaim declaration, it deregisters or leaves the VLAN
specified in the reclaim declaration.
GARP Messages
GARP participants exchange VLAN information through GARP messages. There are three types of GARP
messages:
• Join message
When a GARP participant expects other devices to register its attributes, it sends Join messages to other
devices. These attributes are either manually configured attributes or those registered by receiving Join
messages from other participants.
• Leave message
When a GARP participant expects other devices to deregister its attributes, it sends Leave messages to
other devices. These attributes are either manually deleted attributes or those deleted by receiving
Leave messages from other participants.
• LeaveAll message
When a GARP device starts, its LeaveAll timer starts. When the LeaveAll timer expires, the device sends
a LeaveAll message.
A LeaveAll message deregisters all attributes so that devices can re-register each other's attributes and
periodically delete junk attributes on the network. For example, an attribute of a participant has been
deleted but, due to a sudden power outage, the participant does not send any Leave messages to
request that other participants deregister the attribute. As a result, this attribute becomes junk. The junk
attribute is deleted when the other participants receive a LeaveAll message.
Timers
GARP defines four timers:
• Join timer
The Join timer controls the sending of Join messages, including JoinIn and JoinEmpty messages.
A participant starts the Join timer after sending an initial Join message. If the participant receives a
JoinIn message before the Join timer expires, it does not send a second Join message. If the participant
does not receive any JoinIn message before the Join timer expires, it sends a second Join message. This
mechanism ensures that Join messages can be sent to other participants.
Each interface maintains an independent Join timer.
• Hold timer
The Hold timer controls the sending of Join messages (JoinIn and JoinEmpty messages) and Leave
messages (LeaveIn and LeaveEmpty messages).
After a participant is configured with an attribute or receives messages and the Hold timer expires, it
sends the messages to other participants. The participant encapsulates messages received within the
hold time into a minimum number of packets before sending them. If the participant did not start the Hold
timer, it would forward messages immediately upon receipt. As a result, a large number of packets would be
transmitted on the network, jeopardizing network stability and wasting bandwidth.
Each interface maintains an independent Hold timer. The Hold timer value must be less than or equal
to half of the Join timer value.
• Leave timer
The Leave timer controls attribute deregistration.
A participant starts the Leave timer after receiving a Leave or LeaveAll message. If the participant does
not receive any Join message of an attribute before the Leave timer expires, the participant deregisters
the attribute.
A participant cannot deregister an attribute immediately upon receipt of a Leave message, because this
attribute may still exist on other participants. This is why the Leave timer is beneficial.
For example, an attribute has two sources on the network: participant A and participant B. Other
participants register this attribute through GARP. If this attribute is deleted from participant A,
participant A sends a Leave message to other participants. After receiving the Leave message,
participant B sends a Join message to notify other participants of the existence of this attribute. After
receiving the Join message from participant B, other participants retain the attribute. Other participants
deregister the attribute only if they do not receive any Join message of the attribute within a period
longer than twice the Join timer value. Therefore, the Leave timer value must be larger than twice the
Join timer value.
Each interface maintains an independent Leave timer.
• LeaveAll timer
When a GARP device starts, its LeaveAll timer starts. When the LeaveAll timer expires, the device sends
a LeaveAll message to request other GARP devices to re-register all of its attributes. Then, it restarts its
LeaveAll timer for a new round of polling.
After receiving the LeaveAll message, the other devices restart all of their GARP timers, including the
LeaveAll timer, and propagate the LeaveAll message to all other connected devices except the device
that sent it. Restarting the LeaveAll timers in this way staggers their expiry and prevents excessive
LeaveAll messages from being sent within a short period of time.
If the LeaveAll timers of multiple devices expire at the same time, all of the devices send LeaveAll
messages simultaneously. This results in the sending of unnecessary LeaveAll messages. To resolve this
problem, each device uses a random value that is larger than its LeaveAll timer value but less than 1.5
times its LeaveAll timer value. When a LeaveAll event occurs, all attributes of a device are deregistered.
The LeaveAll event affects the entire network; therefore, you must set a proper value for the LeaveAll
timer that is at least greater than the Leave timer value.
Each device maintains a global LeaveAll timer.
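The timer relationships stated above can be summarized and checked programmatically. The following Python sketch encodes the constraints (Hold ≤ Join/2, Leave > 2 × Join, LeaveAll > Leave) and the randomized LeaveAll expiry in [T, 1.5 × T]; the concrete timer values are examples only:

```python
import random

def check_garp_timers(hold, join, leave, leaveall):
    """Verify the timer relationships described above:
    Hold <= Join / 2, Leave > 2 * Join, and LeaveAll > Leave."""
    return hold <= join / 2 and leave > 2 * join and leaveall > leave

def randomized_leaveall(leaveall):
    # Each device fires LeaveAll at a random point in [T, 1.5 * T] so
    # that devices do not all send LeaveAll messages simultaneously.
    return random.uniform(leaveall, 1.5 * leaveall)

print(check_garp_timers(hold=10, join=20, leave=60, leaveall=1000))  # True
v = randomized_leaveall(1000)
print(1000 <= v <= 1500)  # True
```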
• Normal mode: allows a GVRP interface to register, deregister, and propagate dynamic and static VLANs.
• Fixed mode: forbids a GVRP interface to register, deregister, or propagate dynamic VLANs, but allows it
to register, deregister, and propagate static VLANs.
• Forbidden mode: forbids a GVRP interface to register, deregister, or propagate all VLANs.
One-Way Registration
Figure 1 One-way registration
VLAN 2 is manually created on Device A. Interfaces on Device B and Device C are automatically added to
VLAN 2 through one-way registration as follows:
1. After VLAN 2 is manually created on Device A, Port 1 of Device A starts the Join timer and Hold timer.
When the Hold timer expires, Port 1 sends the first JoinEmpty message to Device B. When the Join
timer expires, Port 1 restarts the Hold timer. When the Hold timer expires again, Port 1 sends the
second JoinEmpty message.
2. Upon receipt of the first JoinEmpty message, Device B creates dynamic VLAN 2, adds Port 2 to VLAN
2, and requests Port 3 to start the Join timer and Hold timer. When the Hold timer expires, Port 3
sends the first JoinEmpty message to Device C. When the Join timer expires, Port 3 restarts the Hold
timer. When the Hold timer expires again, Port 3 sends the second JoinEmpty message to Device C.
After Port 2 of Device B receives the second JoinEmpty message from Port 1, Port 2 of Device B leaves
the JoinEmpty message unprocessed because Port 2 has been added to VLAN 2.
3. After Port 4 of Device C receives the first JoinEmpty message, Device C creates dynamic VLAN 2 and
adds Port 4 to VLAN 2. After Port 4 of Device C receives the second JoinEmpty message, Port 4 leaves
the JoinEmpty message unprocessed, because Port 4 has already been added to VLAN 2.
4. Each time a LeaveAll timer expires or a LeaveAll message is received, each device restarts the LeaveAll
timer, Join timer, Hold timer, and Leave timer. Then, Port 1 of Device A repeats step 1 to send
JoinEmpty messages. Port 3 of Device B sends JoinEmpty messages to Device C in the same way.
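The one-way registration steps above can be sketched as a small simulation: a JoinEmpty declaration makes the receiving device create the dynamic VLAN, add the receiving port, and re-declare the VLAN downstream. The chain model and method names are simplifications for illustration:

```python
class GvrpDevice:
    """Minimal one-way GVRP registration model for a linear trunk chain
    (Device A -> Device B -> Device C)."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.vlans = set()
        self.downstream = downstream  # next device on the trunk, if any

    def declare(self, vlan):
        # Propagate a JoinEmpty declaration toward the downstream device.
        if self.downstream is not None:
            self.downstream.on_join_empty(vlan)

    def on_join_empty(self, vlan):
        if vlan not in self.vlans:     # duplicate declarations are ignored
            self.vlans.add(vlan)       # create the dynamic VLAN
            self.declare(vlan)         # and re-declare it downstream

c = GvrpDevice("DeviceC")
b = GvrpDevice("DeviceB", downstream=c)
a = GvrpDevice("DeviceA", downstream=b)
a.vlans.add(2)     # VLAN 2 manually created on Device A
a.declare(2)       # Port 1 sends JoinEmpty toward Device B
print(b.vlans, c.vlans)  # {2} {2}
```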
Two-Way Registration
Figure 2 Two-way registration
After one-way registration is complete, Port 1, Port 2, and Port 4 are added to VLAN 2. Port 3 is not added to
VLAN 2, because it does not receive a JoinEmpty or JoinIn message. To add Port 3 to VLAN 2, implement
VLAN registration from Device C to Device A as follows:
1. Manually create VLAN 2 on Device C after one-way registration is complete (the dynamic VLAN is
replaced by the static VLAN). Port 4 of Device C starts the Join timer and Hold timer. When the Hold
timer expires, Port 4 sends the first JoinIn message (because it has registered VLAN 2) to Device B.
When the Join timer expires, Port 4 restarts the Hold timer. When the Hold timer expires again, Port 4
sends the second JoinIn message to Device B.
2. Upon receipt of the first JoinIn message, Device B adds Port 3 to VLAN 2 and requests Port 2 to
start the Join timer and Hold timer. When the Hold timer expires, Port 2 sends the first JoinIn message
to Device A. When the Join timer expires, Port 2 restarts the Hold timer. When the Hold timer expires
again, Port 2 sends the second JoinIn message to Device A.
3. After Port 3 of Device B receives the second JoinIn message from Port 4, Port 3 leaves the message
unprocessed because it has already been added to VLAN 2.
4. After Device A receives the first JoinIn message from Port 2, it stops sending JoinEmpty messages to
Device B. Each time a LeaveAll timer expires or a LeaveAll message is received, each device restarts
the LeaveAll timer, Join timer, Hold timer, and Leave timer. When the Hold timer of Device A's Port 1
expires, Port 1 sends a JoinIn message to Device B.
5. Upon receipt of the JoinIn message, Device B requests Port 3 to send a JoinIn message to Device C.
6. Upon receipt, Device C does not create dynamic VLAN 2 because it has static VLAN 2 created.
One-Way Deregistration
Figure 3 One-way deregistration
1. Manually delete static VLAN 2 from Device A. Port 1 of Device A starts the Hold timer. When the Hold
timer expires, Port 1 sends a LeaveEmpty message to Device B. The LeaveEmpty message needs to be
sent only once.
2. Upon receipt, Port 2 of Device B starts the Leave timer. When the Leave timer expires, Port 2
deregisters VLAN 2. Because Port 3 is still in VLAN 2, VLAN 2 still exists on Device B. Device B then
requests Port 3 to start the Hold timer and Leave timer. When the Hold timer expires, Port 3 sends a
LeaveIn message to Port 4 of Device C. Upon receipt, Port 4 does not deregister VLAN 2 because VLAN
2 is a static VLAN on Device C. Because static VLAN 2 still exists on Device C, Port 3 still receives the
JoinIn message from Port 4 before the Leave timer expires. That means that both Device A and Device
B still learn dynamic VLAN 2.
3. After Device C receives the LeaveIn message, Port 4 does not deregister VLAN 2 because VLAN 2 is a
static VLAN on Device C.
Two-Way Deregistration
1. Manually delete static VLAN 2 from Device C. Port 4 of Device C starts the Hold timer. When the Hold
timer expires, Port 4 sends a LeaveEmpty message to Device B.
2. Upon receipt, Port 3 of Device B starts the Leave timer. When the Leave timer expires, Port 3
deregisters VLAN 2. Dynamic VLAN 2 is then deleted from Device B. Device B then requests Port 2 to
start the Hold timer. When the Hold timer expires, Port 2 sends a LeaveEmpty message to Device A.
3. Upon receipt, Port 1 of Device A starts the Leave timer. When the Leave timer expires, Port 1
deregisters VLAN 2. Dynamic VLAN 2 is then deleted from Device A.
Attribute Type: the attribute type, defined by specific GARP applications. For GVRP, the value is 0x01,
indicating that the attribute value is a VLAN ID.
Term
Term Definition
GARP Generic Attribute Registration Protocol. A protocol that propagates attributes among
participants so that they can dynamically register or deregister the attributes.
GVRP GARP VLAN Registration Protocol. An application of GARP for registering and
deregistering VLAN attributes.
Definition
Layer 2 protocol tunneling allows Layer 2 devices to use Layer 2 tunneling technology to transparently
transmit Layer 2 protocol data units (PDUs) across a Layer 2 network. Layer 2 protocol tunneling supports
standard protocols, such as Spanning Tree Protocol (STP) and Link Aggregation Control Protocol (LACP), as
well as user-defined protocols.
Purpose
Layer 2 protocol tunneling ensures transparent transmission of private Layer 2 PDUs over a public network.
The ingress device replaces the multicast destination MAC address in the received Layer 2 PDUs with a
specified multicast MAC address before transmitting them onto the public network. The egress device
restores the original multicast destination MAC address and then forwards the Layer 2 PDUs to their
destinations.
Background
Layer 2 protocols running between user networks, such as Spanning Tree Protocol (STP) and Link
Aggregation Control Protocol (LACP), must traverse a backbone network to perform Layer 2 protocol
calculation.
On the network shown in Figure 1, User Network 1 and User Network 2 both run a Layer 2 protocol,
Multiple Spanning Tree Protocol (MSTP). Layer 2 protocol data units (PDUs) on User Network 1 must
traverse a backbone network to reach User Network 2 to build a spanning tree. Generally, the destination
MAC addresses in Layer 2 PDUs of the same Layer 2 protocol are the same. For example, the MSTP PDUs are
BPDUs with the destination MAC address 0180-C200-0000. Therefore, when a Layer 2 PDU reaches an edge
device on a backbone network, the edge device cannot identify whether the PDU comes from a user network
or the backbone network and sends the PDU to the CPU to calculate a spanning tree.
In Figure 1, CE1 on User Network 1 builds a spanning tree together with PE1 but not with CE2 on User
Network 2. As a result, the Layer 2 PDUs on User Network 1 cannot traverse the backbone network to reach
User Network 2.
To resolve the preceding problem, use Layer 2 protocol tunneling. The NE40E supports tunneling for the
following Layer 2 protocols:
• 802.1X
Layer 2 PDUs can be tunneled across a backbone network if all of the following conditions are met:
• All sites of a user network can receive Layer 2 PDUs from one another.
• Layer 2 PDUs of a user network are not processed by the CPUs of backbone network devices.
• Layer 2 PDUs of different user networks must be isolated and not affect each other.
Layer 2 protocol tunneling prevents Layer 2 PDUs of different user networks from affecting each other,
which cannot be achieved by other technologies.
BPDU
Bridge protocol data units (BPDUs) are most commonly used by Layer 2 protocols, such as STP and MSTP.
BPDUs are protocol packets multicast between Layer 2 switches. BPDUs of different protocols have different
destination MAC addresses and are encapsulated in compliance with IEEE 802.3. Figure 2 shows the BPDU
format.
Layer 2 protocol tunneling provides a BPDU tunnel for BPDUs. The BPDU tunnel can be considered a Layer 2
tunneling technology that allows user networks at different regions to transparently transmit BPDUs across
a backbone network, isolating user networks from the backbone network.
1. The ingress device replaces the multicast destination MAC address in the Layer 2 PDUs with a
specified multicast MAC address so that backbone network devices do not send the Layer 2 PDUs to
their CPUs for processing.
The specified multicast MAC address cannot be a multicast MAC address used by well-known protocols.
2. The ingress device then determines whether to add an outer VLAN tag to the Layer 2 PDUs with a
specified multicast MAC address based on the configured Layer 2 protocol tunneling type.
1. The egress device restores the original multicast destination MAC address in the Layer 2 PDUs
based on the configured mapping between the multicast destination MAC address and the
specified multicast MAC address.
2. The egress device then determines whether to remove the outer VLAN tag from the Layer 2 PDUs
with the original multicast destination MAC address based on the configured Layer 2 protocol
tunneling type.
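The ingress/egress MAC rewrite above can be sketched as follows. The STP BPDU destination MAC is the well-known 0180-c200-0000 from the text; the replacement MAC and the frame representation are illustrative assumptions, since the specified multicast MAC address is configured per deployment:

```python
STP_MAC = "0180-c200-0000"        # well-known BPDU destination MAC
TUNNEL_MAC = "0100-5e00-00aa"     # example replacement MAC (illustrative)

def ingress_rewrite(frame, mac_map):
    # Replace the well-known multicast MAC so core devices forward the
    # frame instead of sending it to their CPUs for protocol processing.
    if frame["dst_mac"] in mac_map:
        frame = dict(frame, dst_mac=mac_map[frame["dst_mac"]])
    return frame

def egress_restore(frame, mac_map):
    # Restore the original destination MAC from the reverse mapping
    # before forwarding the PDU to the user network.
    reverse = {v: k for k, v in mac_map.items()}
    if frame["dst_mac"] in reverse:
        frame = dict(frame, dst_mac=reverse[frame["dst_mac"]])
    return frame

mac_map = {STP_MAC: TUNNEL_MAC}
bpdu = {"dst_mac": STP_MAC, "payload": b"bpdu"}
tunneled = ingress_rewrite(bpdu, mac_map)
restored = egress_restore(tunneled, mac_map)
print(tunneled["dst_mac"], restored["dst_mac"])  # 0100-5e00-00aa 0180-c200-0000
```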
Table 1 describes the Layer 2 protocol tunneling types that Huawei devices support.
• Untagged Layer 2 Protocol Tunneling: Backbone network edge devices receive untagged Layer 2 PDUs.
• VLAN-based Layer 2 Protocol Tunneling: Backbone network edge devices receive Layer 2 PDUs that
carry a single VLAN tag.
• QinQ-based Layer 2 Protocol Tunneling: Backbone network edge devices receive Layer 2 PDUs that
carry a single VLAN tag and need to tunnel Layer 2 PDUs that carry double VLAN tags.
• Hybrid VLAN-based Layer 2 Protocol Tunneling: Backbone network edge devices receive both tagged
and untagged Layer 2 PDUs.
On the network shown in Figure 1, each PE interface connects to one user network, and each user network
belongs to either LAN-A or LAN-B. Layer 2 PDUs from user networks to PEs on the backbone network do not
carry VLAN tags. The PEs, however, must identify which LAN the Layer 2 PDUs come from. Layer 2 PDUs
from a user network in LAN-A must be sent to the other user networks in LAN-A, but not to the user
networks in LAN-B. In addition, Layer 2 PDUs cannot be processed by PEs. To meet the preceding
requirements, configure interface-based Layer 2 protocol tunneling on backbone network edge devices.
1. The ingress device on the backbone network identifies the protocol type of the received Layer 2 PDUs
and tags them with the default VLAN ID of the interface that has received them.
2. The ingress device replaces the multicast destination MAC address in the Layer 2 PDUs with a specified
multicast MAC address based on the configured mapping between the multicast destination MAC
address and the specified multicast MAC address.
3. The internal devices on the backbone network forward the Layer 2 PDUs with a specified multicast
MAC address to the egress devices.
4. The egress devices restore the original destination MAC address in the Layer 2 PDUs based on the
configured mapping between the multicast destination MAC address and the specified multicast
address and send the Layer 2 PDUs to the user networks.
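The MAC rewrite performed at the ingress and egress devices in the steps above can be sketched as follows in Python. The MAC address values and the dictionary-based frame representation are illustrative assumptions, not the device's actual defaults:

```python
STP_MAC = "01:80:c2:00:00:00"      # well-known STP destination MAC
TUNNEL_MAC = "01:00:5e:00:01:01"   # assumed device-specific tunnel MAC

MAC_MAP = {STP_MAC: TUNNEL_MAC}                   # ingress rewrite table
REVERSE_MAP = {v: k for k, v in MAC_MAP.items()}  # egress restore table

def ingress(frame, pvid):
    """Tag the PDU with the receiving interface's default VLAN ID and
    replace its protocol MAC so transit devices forward it as ordinary
    multicast data instead of sending it to their CPUs."""
    frame = dict(frame, vlan=pvid)
    frame["dst_mac"] = MAC_MAP.get(frame["dst_mac"], frame["dst_mac"])
    return frame

def egress(frame):
    """Restore the original protocol MAC and strip the VLAN tag before
    handing the PDU back to the user network."""
    frame = dict(frame)
    frame["dst_mac"] = REVERSE_MAP.get(frame["dst_mac"], frame["dst_mac"])
    frame.pop("vlan", None)
    return frame

pdu = {"dst_mac": STP_MAC, "payload": "BPDU"}
in_transit = ingress(pdu, pvid=10)
assert in_transit["dst_mac"] == TUNNEL_MAC and in_transit["vlan"] == 10
assert egress(in_transit) == pdu   # the PDU is transparently restored
```

Because only the destination MAC is rewritten, the PDU payload crosses the backbone untouched, which is what makes the tunneling transparent to the user networks.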
In most circumstances, PEs serve as aggregation devices on a backbone network. On the network shown in
Figure 2, the aggregation interfaces on PE1 and PE2 receive Layer 2 PDUs from both LAN-A and LAN-B. To
differentiate between the Layer 2 PDUs of the two LANs, the PEs must identify tagged Layer 2 PDUs from
CEs, with Layer 2 PDUs from LAN-A carrying VLAN 200 and those from LAN-B carrying VLAN 100. To meet
the preceding requirements, configure backbone network devices to identify tagged Layer 2 PDUs and allow
Layer 2 PDUs carrying specified VLAN IDs to pass through and also configure VLAN-based Layer 2 protocol
tunneling on backbone network edge devices.
1. User network devices send Layer 2 PDUs with specified VLAN IDs to the backbone network.
2. The ingress device on the backbone network identifies that the Layer 2 PDUs carry a single VLAN tag
and replaces the multicast destination MAC address in the Layer 2 PDUs with a specified multicast
MAC address based on the configured mapping between the multicast destination MAC address and
the specified multicast MAC address.
3. The internal devices on the backbone network forward the Layer 2 PDUs with a specified multicast
MAC address to the egress devices.
4. The egress devices restore the original destination MAC address in the Layer 2 PDUs based on the
configured mapping between the multicast destination MAC address and the specified multicast
address and send the Layer 2 PDUs to the user networks.
If VLAN-based Layer 2 protocol tunneling is used when many user networks connect to a backbone network,
a large number of VLAN IDs of the backbone network are required. This may result in insufficient VLAN
resources. To reduce the consumption of VLAN resources, configure QinQ on the backbone network to
forward Layer 2 PDUs.
For details about QinQ, see QinQ in NE40E Feature Description - LAN and MAN Access.
On the network shown in Figure 3, after QinQ is configured, a PE adds an outer VLAN ID of 20 to the
received Layer 2 PDUs that carry VLAN IDs in the range 100 to 199 and an outer VLAN ID of 30 to the
received Layer 2 PDUs that carry VLAN IDs in the range 200 to 299 before transmitting these Layer 2 PDUs
across the backbone network. To tunnel Layer 2 PDUs from the user networks across the backbone network,
configure QinQ-based Layer 2 protocol tunneling on PEs' aggregation interfaces.
1. The ingress device on the backbone network adds a different outer VLAN tag (public VLAN ID) to the
received Layer 2 PDUs based on the inner VLAN IDs (user VLAN IDs) carried in the PDUs.
2. The ingress device replaces the multicast destination MAC address in the Layer 2 PDUs with a specified
multicast MAC address based on the configured mapping between the multicast destination MAC
address and the specified multicast MAC address.
3. The ingress device transmits the Layer 2 PDUs with a specified multicast MAC address through
different Layer 2 tunnels based on the outer VLAN IDs.
4. The internal devices on the backbone network forward the Layer 2 PDUs with a specified multicast
MAC address to the egress devices.
5. The egress devices restore the original destination MAC address in the Layer 2 PDUs based on the
configured mapping between the multicast destination MAC address and the specified multicast
address, remove the outer VLAN tags, and send the Layer 2 PDUs to the user networks based on the
inner VLAN IDs.
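The outer-tag selection at the ingress PE can be sketched as follows. The inner-to-outer VLAN mapping mirrors the example above (inner VLANs 100-199 map to outer VLAN 20, and 200-299 to outer VLAN 30); the dictionary-based frame representation is an assumption:

```python
# Inner (user) VLAN ranges and the outer (public) VLAN chosen for each.
OUTER_VLAN_RANGES = [(range(100, 200), 20), (range(200, 300), 30)]

def push_outer_tag(frame):
    """Ingress PE: add the outer tag selected by the frame's inner VLAN ID;
    the outer VLAN also selects the Layer 2 tunnel across the backbone."""
    for inner_range, outer in OUTER_VLAN_RANGES:
        if frame["inner_vlan"] in inner_range:
            return dict(frame, outer_vlan=outer)
    raise ValueError("inner VLAN not mapped to a Layer 2 tunnel")

def pop_outer_tag(frame):
    """Egress PE: strip the outer tag; the inner tag then selects the
    destination user network."""
    frame = dict(frame)
    frame.pop("outer_vlan")
    return frame

tagged = push_outer_tag({"inner_vlan": 150, "payload": "PDU"})
assert tagged["outer_vlan"] == 20
assert pop_outer_tag(tagged) == {"inner_vlan": 150, "payload": "PDU"}
```

Mapping many user VLANs onto a few public VLANs is exactly how QinQ conserves the backbone's VLAN space.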
On the network shown in Figure 4, PE1, PE2, and PE3 constitute a backbone network. LAN-A and LAN-C
belong to VLAN 3; LAN-B and LAN-D belong to VLAN 2. All LANs send tagged Layer 2 PDUs. CE1 can
forward Layer 2 PDUs carrying VLAN 2 and VLAN 3. CE2 can forward Layer 2 PDUs carrying VLAN 3. CE3 can
forward Layer 2 PDUs carrying VLAN 2. CE1, CE2, and CE3 also run an untagged Layer 2 protocol, such as
LLDP.
PEs therefore receive both tagged and untagged Layer 2 PDUs. To transparently transmit both tagged and
untagged Layer 2 PDUs, configure hybrid VLAN-based Layer 2 protocol tunneling on backbone network edge
devices.
Hybrid VLAN-based Layer 2 protocol tunneling functions as a combination of interface-based and VLAN-based Layer 2
protocol tunneling. For details about the tunneling process, see Untagged Layer 2 Protocol Tunneling and VLAN-based
Layer 2 Protocol Tunneling.
PE1, PE2, and PE3 constitute a backbone network and use different interfaces to connect to LAN-A and LAN-
B. Layer 2 PDUs from user networks to PEs on the backbone network do not carry VLAN tags. The PEs,
however, must identify which LAN the Layer 2 PDUs come from. Layer 2 PDUs from a user network in LAN-A
must be sent to the other user networks in LAN-A, but not to the user networks in LAN-B. In addition, Layer
2 PDUs cannot be processed by PEs. To meet the preceding requirements, configure interface-based Layer 2
protocol tunneling on backbone network edge devices. Multiple Spanning Tree Protocol (MSTP) runs on the
LANs.
To tunnel Layer 2 PDUs from the user network across the backbone network, configure untagged Layer 2
protocol tunneling on user-side interfaces on PE1, PE2, and PE3.
The Layer 2 protocol tunneling process is as follows:
1. PE1 identifies the protocol type of the Layer 2 PDUs and tags the Layer 2 PDUs with the default VLAN
ID of the interface that has received the Layer 2 PDUs.
2. PE1 replaces the multicast destination MAC address in the Layer 2 PDUs with a specified multicast
MAC address based on the configured mapping between the multicast destination MAC address and
the specified multicast MAC address.
3. The internal devices on the backbone network forward the Layer 2 PDUs with a specified multicast
MAC address to the egress devices.
4. The egress devices PE2 and PE3 restore the original destination MAC address in the Layer 2 PDUs
based on the configured mapping between the multicast destination MAC address and the specified
multicast address and send the Layer 2 PDUs to the user networks. The Layer 2 PDUs are
transparently transmitted.
In most circumstances, PEs serve as aggregation devices on a backbone network. PE1, PE2, and PE3
constitute a backbone network, and the aggregation interfaces on PE1 and PE2 receive Layer 2 PDUs from
both LAN-A and LAN-B. To differentiate the Layer 2 PDUs from the two LANs, the PEs must identify tagged
Layer 2 PDUs from CEs, with Layer 2 PDUs from LAN-A carrying VLAN 200 and those from LAN-B carrying
VLAN 100.
To tunnel Layer 2 PDUs from the user network across the backbone network, configure VLAN-based Layer 2
protocol tunneling on user-side interfaces on PE1, PE2, and PE3.
1. CE1 sends Layer 2 PDUs with specified VLAN tags to the backbone network.
2. Configure Layer 2 forwarding on the aggregation device PE1 to allow BPDUs that carry specific VLAN
tags to pass through.
3. PE1 receives Layer 2 PDUs from the user networks and identifies that the Layer 2 PDUs carry a single
VLAN tag. PE1 then replaces the multicast destination MAC address in the Layer 2 PDUs with a
specified multicast MAC address and sends the PDUs onto the backbone network.
4. The internal devices on the backbone network forward the Layer 2 PDUs with a specified multicast
MAC address to the egress devices.
5. The egress devices PE2 and PE3 restore the original destination MAC address in the Layer 2 PDUs
based on the configured mapping between the multicast destination MAC address and the specified
multicast address and send the Layer 2 PDUs to the user networks. The Layer 2 PDUs are
transparently transmitted.
PE1 and PE2 constitute a backbone network and use only VLAN 20 and VLAN 30 for Layer 2 forwarding. CEs
send Layer 2 PDUs carrying VLAN 100 and VLAN 200 to the PEs. After QinQ is configured, a PE adds an
outer VLAN ID of 20 to the received Layer 2 PDUs carrying VLAN 100 and an outer VLAN ID of 30 to the
received Layer 2 PDUs carrying VLAN 200 before transmitting these Layer 2 PDUs across the backbone
network. To tunnel Layer 2 PDUs from the user networks across the backbone network, configure QinQ-
based Layer 2 protocol tunneling on PEs' aggregation interfaces.
1. PE1 receives Layer 2 PDUs and adds a different outer VLAN tag (public VLAN ID) based on the inner
VLAN IDs carried in the PDUs.
2. PE1 replaces the multicast destination MAC address in the Layer 2 PDUs with a specified multicast
MAC address based on the configured mapping between the multicast destination MAC address and
the specified multicast MAC address.
3. The internal devices on the backbone network forward the Layer 2 PDUs with a specified multicast
MAC address through different Layer 2 tunnels based on the outer VLAN IDs to the egress device.
4. The egress device PE2 restores the original multicast destination MAC address in the Layer 2 PDUs
based on the configured mapping between the multicast destination MAC address and the specified
multicast MAC address. The egress device also removes the outer VLAN tags and sends the Layer 2
PDUs to the user networks based on the inner VLAN IDs. The Layer 2 PDUs are transparently
transmitted.
Application
When each edge device interface on a backbone network connects to more than one user network and some
Layer 2 protocol data units (PDUs) from the user networks carry VLAN tags while others do not, configure
hybrid VLAN-based Layer 2 protocol tunneling to allow both the tagged and untagged Layer 2 PDUs from
the user networks to be tunneled across the backbone network. Layer 2 PDUs from the user networks then
travel through different Layer 2 tunnels to their destinations, where Layer 2 protocol calculation is performed.
In Figure 1, PEs on the backbone network edge must tunnel both tagged and untagged Layer 2 PDUs from a
large number of VLAN users across the backbone network.
PE1, PE2, and PE3 constitute a backbone network. LAN-A and LAN-C belong to VLAN 3; LAN-B and LAN-D
belong to VLAN 2. All LANs send tagged Layer 2 PDUs. CE1 can forward Layer 2 PDUs carrying VLAN 2 and
VLAN 3. CE2 can forward Layer 2 PDUs carrying VLAN 3. CE3 can forward Layer 2 PDUs carrying VLAN 2.
CE1, CE2, and CE3 also run an untagged Layer 2 protocol, such as LLDP.
To tunnel both tagged and untagged Layer 2 PDUs from a large number of VLAN users across the backbone
network, configure hybrid tagged and hybrid untagged attributes and enable both interface-based and
VLAN-based Layer 2 protocol tunneling on the user-side interfaces of PE1, PE2, and PE3.
1. PE1 receives tagged and untagged Layer 2 PDUs and adds the default VLAN ID of the interface that
has received the untagged Layer 2 PDUs to these PDUs.
2. PE1 replaces the multicast destination MAC address in the Layer 2 PDUs with a specified multicast
MAC address based on the configured mapping between the multicast destination MAC address and
the specified multicast MAC address.
3. The internal devices on the backbone network forward the Layer 2 PDUs with a specified multicast
MAC address to the egress devices.
4. The egress devices PE2 and PE3 restore the original destination MAC address in the Layer 2 PDUs
based on the configured mapping between the multicast destination MAC address and the specified
multicast address. They also remove the outer VLAN tags and send the Layer 2 PDUs to the user
networks.
Definition
The virtual local area network (VLAN) technology logically divides a physical LAN into multiple VLANs, each
of which is a broadcast domain. Each VLAN contains a group of PCs that have the same requirements. A
VLAN has the same attributes as a LAN, but the PCs of a VLAN can be placed on different LAN segments.
Hosts within the same VLAN can communicate with each other, whereas hosts in different VLANs cannot. If
two PCs are located on one LAN segment but belong to different VLANs, they do not broadcast packets to
each other. In this manner, network security is enhanced.
Purpose
The traditional LAN technology based on the bus structure has the following defects:
• Networks have security risks because all the hosts in a LAN share the same transmission channel.
• The network constitutes a single collision domain. More computers on the network cause more collisions
and lower network efficiency.
• The network is also a single broadcast domain. When many computers on the network send data,
broadcast traffic consumes much bandwidth.
In summary, traditional networks face collision domain and broadcast domain issues and cannot ensure
information security.
To address these defects, bridges and Layer 2 switches were introduced. Bridges and Layer 2 switches
forward data from an inbound interface to an outbound interface in switching mode. This solves the access
conflict problem on the shared medium and limits the collision domain to the port level. Nevertheless, bridge
or Layer 2 switch networking solves only the collision domain problem, not the broadcast domain and
network security problems.
In this document, the Layer 2 switch is referred to as the switch for short.
To reduce the broadcast traffic, you need to enable the broadcast only among hosts that need to
communicate with each other, and isolate the hosts that do not need the broadcast. A Router can select
routes based on IP addresses and effectively suppress broadcast traffic between two connected network
segments. The Router solution, however, is costly. Therefore, multiple logical LANs, namely, VLANs are
developed on the physical LAN.
In this manner, a physical LAN is divided into multiple broadcast domains, that is, multiple VLANs. Intra-
VLAN communication is not restricted, while inter-VLAN communication is restricted. As a result, broadcast
traffic is confined to each VLAN and network security is improved.
Figure 1 is a networking diagram of a typical VLAN application. Device A, Device B, and Device C are placed
at different locations, such as different floors in an office building. Each switch connects to three computers
which belong to three different VLANs. In Figure 1, each dashed line frame identifies a VLAN. Packets of
enterprise customers in the same VLAN are broadcast within the VLAN but not among VLANs. In this way,
enterprise customers in the same VLAN can share resources as well as protect their information security.
Benefits
The VLAN technology offers the following benefits:
• Broadcast domains are confined. A broadcast domain is confined to a VLAN. This saves bandwidth and
improves network processing capabilities.
• Network security is enhanced. Packets from different VLANs are separately transmitted. PCs in one
VLAN cannot directly communicate with PCs in another VLAN.
• Network robustness is improved. A fault in a VLAN does not affect PCs in other VLANs.
• Virtual groups are set up flexibly. With the VLAN technology, PCs in different geographical areas can be
grouped together. This facilitates network construction and maintenance.
An 802.1Q frame carries a 4-byte VLAN tag that contains the following fields:
• EType
The 2-byte Type field indicates a frame type. If the value of the field is 0x8100, it indicates an 802.1Q
frame. If a device that does not support 802.1Q frames receives an 802.1Q frame, it discards the frame.
• PRI
The 3-bit Priority field indicates the frame priority. A greater PRI value indicates a higher frame
priority. If a switch is congested, it preferentially sends frames with a higher priority.
• CFI
The 1-bit Canonical Format Indicator (CFI) field indicates whether a MAC address is in the canonical
format. If the CFI field value is 0, the MAC address is in canonical format. If the CFI field value is 1, the
MAC address is not in canonical format. This field is mainly used to differentiate among Ethernet
frames, Fiber Distributed Digital Interface (FDDI) frames, and token ring frames. The CFI field value in
an Ethernet frame is 0.
• VID
The 12-bit VLAN ID (VID) field indicates to which VLAN a frame belongs. VLAN IDs range from 0 to
4095. The values 0 and 4095 are reserved, and therefore VLAN IDs range from 1 to 4094.
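As a rough illustration of the field layout described above, the following sketch packs and parses a 4-byte 802.1Q tag: a 2-byte TPID of 0x8100 followed by a 2-byte field holding the 3-bit PRI, 1-bit CFI, and 12-bit VID:

```python
import struct

def build_dot1q_tag(vid, pri=0, cfi=0):
    """Pack a 4-byte 802.1Q tag: TPID 0x8100, then PRI|CFI|VID."""
    if not 1 <= vid <= 4094:
        raise ValueError("VIDs 0 and 4095 are reserved")
    return struct.pack("!HH", 0x8100, (pri << 13) | (cfi << 12) | vid)

def parse_dot1q_tag(tag):
    """Split a 4-byte tag back into its PRI, CFI, and VID fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != 0x8100:
        raise ValueError("not an 802.1Q frame")
    return {"pri": tci >> 13, "cfi": (tci >> 12) & 1, "vid": tci & 0x0FFF}

tag = build_dot1q_tag(vid=100, pri=5)
assert parse_dot1q_tag(tag) == {"pri": 5, "cfi": 0, "vid": 100}
```

Note that the CFI bit defaults to 0, matching its fixed value in Ethernet frames.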
Each frame sent by an 802.1Q-capable switch carries a VLAN ID. On a VLAN, Ethernet frames are
classified into the following types:
• Tagged frame: an Ethernet frame that carries a 4-byte 802.1Q tag.
• Untagged frame: an Ethernet frame that carries no 802.1Q tag.
Link Types
VLAN links can be divided into the following types:
• Access link: a link connecting a host and a switch. Generally, a PC does not know which VLAN it belongs
to, and PC hardware cannot distinguish frames with VLAN tags. Therefore, PCs send and receive only
untagged frames. In Figure 2, links between PCs and the switches are access links.
• Trunk link: a link connecting switches. Data of different VLANs is transmitted along a trunk link. The
two ends of a trunk link must be able to distinguish frames with VLAN tags. Therefore, only tagged
frames are transmitted along trunk links. In Figure 2, links between switches are trunk links. Frames
transmitted over trunk links carry VLAN tags.
Port Types
Some ports of a device can identify VLAN frames defined by IEEE 802.1Q, whereas others cannot. Ports can
be divided into four types based on whether they can identify VLAN frames:
• Access port
An access port connects a switch to a host over an access link, as shown in Figure 2. An access port has
the following features:
■ Allows only frames tagged with the port default VLAN ID (PVID) to pass.
• Trunk port
A trunk port connects a switch to another switch over a trunk link. A trunk port has the following
features:
■ Directly sends the frame if the port permits the VLAN ID carried in the frame.
■ Discards the frame if the port denies the VLAN ID carried in the frame.
• Hybrid port
A hybrid port connects a switch to either a host over an access link or another switch over a trunk link.
A hybrid port allows frames from multiple VLANs to pass and can remove VLAN tags from some
outgoing VLAN frames.
Figure 3 Ports
• QinQ port
An 802.1Q-in-802.1Q (QinQ) port is a QinQ-enabled port. A QinQ port adds an outer tag to a
single-tagged frame so that the number of available VLANs can meet network requirements.
Figure 4 shows the format of a QinQ frame. The outer tag is a public network tag for carrying a public
network VLAN ID. The inner tag is a private network tag for carrying a private network VLAN ID.
VLAN Classification
VLANs are classified based on port numbers. In this mode, the network administrator configures a port
default VLAN ID (PVID) for each port on a switching device. When a data frame arrives at a port that is
configured with a PVID, the frame is tagged with the PVID if it carries no VLAN tag. If the frame already
carries a VLAN tag, the switching device does not add another tag, even though the port is configured with a
PVID. Different types of ports process VLAN frames in different manners.
Basic Principles
To improve frame processing efficiency, frames arriving at a switch must carry a VLAN tag for uniform
processing. If an untagged frame enters a switch port that has a PVID configured, the port adds a VLAN tag
whose VID is the same as the PVID to the frame. If a tagged frame enters a switch port that has a PVID
configured, the port does not add any tag to the frame.
A switch processes frames in different ways depending on the port type, as follows.
Access port
• Receiving an untagged frame: accepts the frame and adds a tag with the default VLAN ID to the frame.
• Receiving a tagged frame: accepts the frame if the VLAN ID carried in the frame is the same as the
default VLAN ID; discards the frame if the VLAN ID carried in the frame is different from the default
VLAN ID.
• Sending a frame: removes the tag from the frame and then sends the frame.
• Application: An access port connects a switch to a PC and can be added to only one VLAN.
Trunk port
• Receiving an untagged frame: discards the frame.
• Receiving a tagged frame: accepts the frame if the port permits the VLAN ID carried in the frame;
discards the frame if the port denies the VLAN ID carried in the frame.
• Sending a frame: directly sends the frame if the port permits the VLAN ID carried in the frame; discards
the frame if the port denies the VLAN ID carried in the frame.
• Application: A trunk port can be added to multiple VLANs to send and receive frames for these VLANs.
A trunk port connects a switch to another switch or to a router.
Hybrid port
• Receiving an untagged frame: accepts the frame and adds a tag with the default VLAN ID to the frame.
• Receiving a tagged frame: accepts the frame if the port permits the VLAN ID carried in the frame;
discards the frame if the port denies the VLAN ID carried in the frame.
• Sending a frame: removes the tag before sending the frame if the VLAN is configured as untagged on
the port; sends the frame with the tag if the VLAN is configured as tagged on the port.
• Application: A hybrid port can be added to multiple VLANs and can connect a switch to a PC, to
another switch, or to a router.
QinQ port
QinQ ports are enabled with the IEEE 802.1QinQ protocol. A QinQ port adds an outer tag to a
single-tagged frame so that the number of available VLANs can meet the requirements of a
metropolitan area network.
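The per-port-type rules can be sketched as follows. The hybrid-port egress behavior modeled here (an "untagged VLAN list" on the port) follows common practice and is an assumption rather than a statement of this device's exact implementation:

```python
def receive(port_type, pvid, permitted, frame_vid=None):
    """Return the VLAN an arriving frame joins, or None if it is discarded.
    frame_vid is None for an untagged frame."""
    if frame_vid is None:                       # untagged frame
        if port_type in ("access", "hybrid"):
            return pvid                         # tag with the default VLAN
        return None                             # trunk port: discard
    if port_type == "access":
        return frame_vid if frame_vid == pvid else None
    return frame_vid if frame_vid in permitted else None

def send(port_type, pvid, untagged_vlans, frame_vid):
    """Return the VID carried on the wire, or None for 'sent untagged'."""
    if port_type == "access":
        return None                             # access port strips the tag
    if port_type == "hybrid" and frame_vid in untagged_vlans:
        return None                             # hybrid strips configured VLANs
    return frame_vid                            # otherwise keep the tag

assert receive("access", 10, {10}) == 10          # untagged frame gets PVID
assert receive("trunk", 10, {10, 20}) is None     # trunk drops untagged frames
assert receive("trunk", 10, {10, 20}, frame_vid=20) == 20
assert send("hybrid", 10, {10}, 10) is None       # sent untagged
assert send("trunk", 10, set(), 20) == 20         # sent tagged
```

The access port's "permitted" set is effectively just its PVID, which is why it can belong to only one VLAN.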
For hosts in the same VLAN to communicate across switches, the port on each switch at either end of the
inter-switch link must be able to recognize and send packets belonging to this VLAN, and a trunk link is used.
A trunk link plays the following two roles:
• Relay function
A trunk link can transparently transmit VLAN packets from a switch to another interconnected switch.
• Backbone function
A trunk link can transmit multiple VLAN packets.
On the network shown in Figure 1, the trunk link between DeviceA and DeviceB must support both the
intra-VLAN 2 communication and the intra-VLAN 3 communication. Therefore, the ports at both ends of the
trunk link must be configured to be bound to VLAN 2 and VLAN 3. That is, Port 2 on DeviceA and Port 1 on
DeviceB must belong to both VLAN 2 and VLAN 3.
Host A sends a frame to Host B in the following process:
1. Host A sends an untagged frame to Port 4 on Device A.
2. A tag is added to the frame on Port 4. The VID field of the tag is set to 2, that is, the ID of the VLAN
to which Port 4 belongs.
3. Device A checks whether its MAC address table contains the MAC address of Host B.
• If so, Device A sends the frame through the matching port.
• If not, Device A sends the frame to all ports bound to VLAN 2 except for Port 4.
4. The tagged frame is transmitted over the trunk link from Port 2 on Device A to Port 1 on Device B.
5. After receiving the frame, Device B checks whether its MAC address table contains the MAC address
of Host B.
• If so, Device B sends the frame through the matching port.
• If not, Device B sends the frame to all ports bound to VLAN 2 except for Port 1.
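The lookup-or-flood decision in the steps above can be sketched as follows; the port names and MAC strings are illustrative:

```python
class Switch:
    """Minimal model of MAC learning and VLAN-confined flooding."""
    def __init__(self, vlan_ports):
        self.vlan_ports = vlan_ports     # vlan -> set of member ports
        self.mac_table = {}              # (vlan, mac) -> port

    def forward(self, in_port, vlan, src_mac, dst_mac):
        self.mac_table[(vlan, src_mac)] = in_port    # learn the source
        out = self.mac_table.get((vlan, dst_mac))
        if out is not None:
            return {out}                             # known unicast
        return self.vlan_ports[vlan] - {in_port}     # flood within the VLAN

sw = Switch({2: {"Port1", "Port2", "Port4"}, 3: {"Port3", "Port5"}})
# Unknown destination: flood to the other VLAN 2 ports only.
assert sw.forward("Port4", 2, "mac-A", "mac-B") == {"Port1", "Port2"}
# After mac-B is learned from a reply, traffic to it becomes unicast.
sw.forward("Port1", 2, "mac-B", "mac-A")
assert sw.forward("Port4", 2, "mac-A", "mac-B") == {"Port1"}
```

Because the flood set is drawn only from the ports bound to the frame's VLAN, VLAN 3 ports never see VLAN 2 broadcasts, which is the broadcast-domain confinement described earlier.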
If VLAN 2 and VLAN 3 are configured on the switch, to enable VLAN 2 to communicate with VLAN 3,
you need to perform the following operations: create two sub-interfaces on the routed Ethernet
interface that is connected to the switch. Sub-interface 1 is used to forward traffic to VLAN 2, and sub-
interface 2 is used to forward traffic to VLAN 3.
Then, configure 802.1Q encapsulation on and assign IP addresses to the sub-interfaces.
On the switch, you need to configure the switched Ethernet port as a trunk or hybrid port and allow
frames of VLAN 2 and VLAN 3 to pass.
• Layer 3 switch
Layer 3 switching combines both routing and switching techniques to implement routing on a switch,
improving the overall performance of the network. After sending the first data flow based on a routing
table, a Layer 3 switch generates a mapping table, in which the mapping between the MAC address and
the IP address about this data flow is recorded. If the switch needs to send the same data flow again, it
directly sends the data flow at Layer 2 but not Layer 3 based on the mapping table. In this manner,
delays on the network caused by route selection are eliminated, and data forwarding efficiency is
improved.
To allow the first data flow to be correctly forwarded based on the routing table, the routing table must
contain correct routing entries. Therefore, configuring a Layer 3 interface and a routing protocol on the
Layer 3 switch is required. VLANIF interfaces are therefore introduced.
A VLANIF interface is a Layer 3 logical interface, which can be configured on either a Layer 3 switch or a
router.
As shown in Figure 3, VLAN 2 and VLAN 3 are configured on the switch. You can then create two
VLANIF interfaces on the switch and assign IP addresses to and configure routes for them. In this
manner, VLAN 2 can communicate with VLAN 3.
Layer 3 switching offsets the defects of the Layer 2 switch + router scheme and implements faster
traffic forwarding at a lower cost. Nevertheless, Layer 3 switching has the following defects:
■ Layer 3 switching is applicable only to a network whose interfaces are almost all Ethernet
interfaces.
■ Layer 3 switching is applicable only to a network with stable routes and few changes in the
network topology.
• A PC does not need to know the VLAN to which it belongs. It sends only untagged frames.
• After receiving an untagged frame from a PC, a switching device determines the VLAN to which the frame belongs.
The determination is based on the configured VLAN division method such as port information, and then the
switching device processes the frame accordingly.
• If the frame needs to be forwarded to another switching device, the frame must be transparently transmitted along
a trunk link. Frames transmitted along trunk links must carry VLAN tags to allow other switching devices to
properly forward the frame based on the VLAN information.
• Before sending the frame to the destination PC, the switching device connected to the destination PC removes the
VLAN tag from the frame to ensure that the PC receives an untagged frame.
Generally, only tagged frames are transmitted on trunk links; only untagged frames are transmitted on access links. In
this manner, switching devices on the network can properly process VLAN information and PCs are not concerned about
VLAN information.
Background
A VLAN is widely used on switching networks because of its flexible control of broadcast domains and
convenient deployment. On a Layer 3 switch, the interconnection between the broadcast domains is
implemented by using one VLAN with a logical Layer 3 interface. However, this wastes IP addresses.
The following example shows how IP addresses are wasted.
As shown in Table 1, VLAN 2 requires 10 host addresses. A subnet address 1.1.1.0/28 with a
mask length of 28 bits is assigned to VLAN 2. 1.1.1.0 is the subnet number, and 1.1.1.15 is the directed
broadcast address. These two addresses cannot serve as the host address. In addition, 1.1.1.1, as the default
address of the network gateway of the subnet, cannot be used as the host address. The remaining 13
addresses ranging from 1.1.1.2 to 1.1.1.14 can be used by the hosts. In this way, although VLAN 2 needs only
ten addresses, 13 addresses are assigned to it according to the subnet division.
VLAN 3 requires five host addresses. A subnet address 1.1.1.16/29 with a mask length of 29 bits is assigned
to VLAN 3. VLAN 4 requires only one address. A subnet address 1.1.1.24/30 with a mask length of 30 bits is
assigned to VLAN 4.
Table 1 Address assignment for common VLANs
VLAN | Subnet Address | Gateway Address | Usable Host Addresses | Assignable Host Addresses (Excluding Gateway) | Required Host Addresses
2 | 1.1.1.0/28 | 1.1.1.1 | 14 | 13 | 10
3 | 1.1.1.16/29 | 1.1.1.17 | 6 | 5 | 5
4 | 1.1.1.24/30 | 1.1.1.25 | 2 | 1 | 1
The preceding VLANs require a total of 16 (10 + 5 + 1) addresses. However, at least 28 (16 + 8 + 4)
addresses are occupied by the common VLANs, so nearly half of the addresses are wasted. In addition, if
only three hosts, not 10, are later connected to VLAN 2, the extra addresses cannot be used by other VLANs
and are therefore wasted.
This division is also inconvenient for later network upgrades and expansions. For example, suppose that two
more hosts need to be added to VLAN 4 without changing the IP addresses already assigned to it, and that
the addresses after 1.1.1.24 have been assigned to others. In this case, a new subnet with a 29-bit mask
length and a new VLAN must be assigned to the new hosts. VLAN 4 then has only three hosts, but these
hosts are spread across two subnets and two VLANs, which is inconvenient for network management.
As shown above, many IP addresses are used as subnet numbers, directed broadcast addresses, and default
gateway addresses of subnets and therefore cannot be used as host addresses in VLANs. This reduces
addressing flexibility and wastes many addresses. To solve this problem, VLAN aggregation is used.
Principles
The VLAN aggregation technology, also known as the super VLAN, provides a mechanism that partitions the
broadcast domain by using multiple VLANs in a physical network so that different VLANs can belong to the
same subnet. In VLAN aggregation, two concepts are involved, namely, super VLAN and sub VLAN.
• Super VLAN: Unlike a common VLAN, a super VLAN contains only a Layer 3 interface and no physical
ports. It can be viewed as a logical Layer 3 concept: a collection of sub VLANs.
• Sub VLAN: A sub VLAN is used to isolate broadcast domains. It contains only physical ports, and no
Layer 3 VLANIF interface can be created for it. A sub VLAN implements Layer 3 switching through the
Layer 3 interface of its super VLAN.
A super VLAN can contain one or more sub VLANs that identify different broadcast domains. The sub VLAN
does not occupy an independent subnet segment. In the same super VLAN, IP addresses of hosts belong to
the subnet segment of the super VLAN, regardless of the mapping between hosts and sub VLANs.
Therefore, sub VLANs share the same Layer 3 interface. This conserves the IP addresses that would
otherwise be consumed as subnet numbers, default gateway addresses, and directed broadcast addresses,
and allows different broadcast domains to use addresses in the same subnet segment. As a result, subnet
boundaries are eliminated, addressing becomes flexible, and the number of idle addresses is reduced.
For example, as shown in Table 1, VLAN 2 requires 10 host addresses, VLAN 3 requires 5 host addresses,
and VLAN 4 requires 1 host address.
To implement VLAN aggregation, create VLAN 10 and configure VLAN 10 as a super VLAN. Then assign a
subnet address 1.1.1.0/24 with the mask length of 24 to VLAN 10; 1.1.1.0 is the subnet number, and 1.1.1.1 is
the gateway address of the subnet, as shown in Figure 2. Address assignment of sub VLANs (VLAN 2, VLAN
3, and VLAN 4) is shown in Table 2.
Table 2 Address assignment for sub VLANs
Sub VLAN | Required Host Addresses | Assigned Addresses | Number of Assigned Addresses
2 | 10 | 1.1.1.2-1.1.1.11 | 10
3 | 5 | 1.1.1.12-1.1.1.16 | 5
4 | 1 | 1.1.1.17 | 1
In VLAN aggregation, sub VLANs are no longer divided along the previous subnet borders. Instead, their
addresses are flexibly assigned within the subnet of the super VLAN according to the number of required
hosts.
As the Table 2 shows that VLAN 2, VLAN 3, and VLAN 4 share a subnet (1.1.1.0/24), a default gateway
address of the subnet (1.1.1.1), and a directed broadcast address of the subnet (1.1.1.255). In this manner,
the subnet ID (1.1.1.16, 1.1.1.24), the default gateway of the subnet (1.1.1.17, 1.1.1.25), and the directed
2022-07-08 798
Feature Description
broadcast address of the subnet (1.1.1.15, 1.1.1.23, and 1.1.1.27) can be used as IP addresses of hosts.
In total, 16 host addresses (10 + 5 + 1 = 16) are required for the three VLANs, and 16 addresses (1.1.1.2 to
1.1.1.17) are assigned to them. Together with the subnet ID (1.1.1.0), the default gateway address (1.1.1.1),
and the directed broadcast address (1.1.1.255), 19 IP addresses are used. The remaining 237 addresses
(256 - 19 = 237) in the network segment are available and can be used by hosts in any sub VLAN.
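The address saving described above can be sketched in Python. The `traditional_block` helper (a hypothetical name, not a device function) computes the smallest aligned block that per-VLAN subnetting would reserve for each sub VLAN, so the saving from VLAN aggregation becomes visible:

```python
import math

def traditional_block(hosts):
    """Smallest aligned power-of-two block that fits the hosts plus a
    subnet ID, a default gateway address, and a directed broadcast address."""
    need = hosts + 3
    return 1 << max(2, math.ceil(math.log2(need)))

demands = {2: 10, 3: 5, 4: 1}   # sub VLAN -> required host addresses

# Per-VLAN subnetting: each VLAN consumes a whole power-of-two block
# (a /28 for 10 hosts, a /29 for 5 hosts, a /30 for 1 host).
traditional = sum(traditional_block(h) for h in demands.values())

# VLAN aggregation: one shared subnet needs only the hosts plus a single
# subnet ID, default gateway address, and directed broadcast address.
aggregated = sum(demands.values()) + 3
```

With the example's demands, per-VLAN subnetting consumes 28 addresses (16 + 8 + 4) while aggregation uses only 19, which matches the accounting in the text.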
Inter-VLAN Communication
• Introduction
VLAN aggregation ensures that different VLANs use the IP addresses in the same subnet segment. This,
however, leads to the problem of Layer 3 forwarding between sub VLANs.
In common VLAN mode, the hosts of different VLANs can communicate with each other based on the
Layer 3 forwarding through their respective gateways. In VLAN aggregation mode, the hosts in a super
VLAN use the IP addresses on the same network segment and share the same gateway address. The
hosts in different sub VLANs belong to the same subnet. Therefore, they attempt to communicate
through Layer 2 forwarding rather than Layer 3 forwarding through a gateway. In practice, however,
hosts in different sub VLANs are isolated at Layer 2. As a result, sub VLANs cannot communicate with
each other.
To solve the preceding problem, you can use proxy ARP.
For details of proxy ARP, see the chapter "ARP" in the NE40E Feature Description - IP Services.
Figure 3 Layer 3 communication between different sub VLANs based on ARP proxy
In the scenario where Host A has no ARP entry of Host B in its ARP table and the gateway (L3 Switch)
has proxy ARP enabled, Host A in VLAN 2 wants to communicate with Host B in VLAN 3. The
communication process is as follows:
1. After comparing the IP address of Host B 1.1.1.3 with its IP address, Host A finds that both IP
addresses are on the same network segment 1.1.1.0/24 and its ARP table has no ARP entry of
Host B.
2. Host A broadcasts an ARP request to ask for the MAC address of Host B.
3. Host B is not in the broadcast domain of VLAN 2, and cannot receive the ARP request.
4. The proxy-ARP enabled gateway between the sub VLANs receives the ARP request from Host A
and finds that the IP address of Host B 1.1.1.3 is the IP address of a directly connected interface.
Then the gateway broadcasts an ARP request to all the other sub VLAN interfaces to ask for the
MAC address of Host B.
5. After receiving the ARP request, Host B sends an ARP response to the gateway.
6. After receiving the ARP response from Host B, the gateway replies with its MAC address to Host
A.
7. Both the gateway and Host A have the ARP entry of Host B.
8. Host A sends packets to the gateway, and then the gateway sends the packets from Host A to
Host B at the Layer 3. In this way, Host A and Host B can communicate with each other.
The process in which Host B sends packets to Host A is similar and is not described here.
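The gateway's decision in the proxy ARP exchange above can be modeled with a short sketch. This is a simplified illustration, not device behavior: it assumes the gateway already knows each host's sub VLAN and has resolved the target host's MAC, and the host-to-VLAN table and gateway MAC below are invented for the example:

```python
# Directly connected hosts and the sub VLANs they belong to (assumed).
SUB_VLAN_OF = {"1.1.1.2": 2, "1.1.1.3": 3, "1.1.1.4": 2}
GATEWAY_MAC = "00-e0-fc-00-00-01"   # assumed gateway MAC

def proxy_arp_reply(src_ip, target_ip):
    """Return the MAC the gateway answers with, or None if it stays silent."""
    src_vlan = SUB_VLAN_OF.get(src_ip)
    dst_vlan = SUB_VLAN_OF.get(target_ip)
    if src_vlan is None or dst_vlan is None:
        return None      # target is not on a directly connected interface
    if src_vlan == dst_vlan:
        return None      # same broadcast domain: hosts resolve each other
    # Different sub VLANs: answer with the gateway's own MAC so that the
    # sender forwards its traffic to the gateway for Layer 3 forwarding.
    return GATEWAY_MAC
```

For Host A (1.1.1.2, VLAN 2) asking for Host B (1.1.1.3, VLAN 3), the sketch returns the gateway MAC, mirroring step 6 in the text.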
Host A sends a frame to Switch 1 through Port 1. Upon receipt, Switch 1 adds a VLAN tag with VLAN
ID 2 to the frame. The VLAN ID is not changed to 10 on Switch 1 even though VLAN 2 is a sub VLAN of
VLAN 10. When the frame is sent out through trunk Port 3, it still carries the ID of VLAN 2.
That is, Switch 1 itself does not send frames from VLAN 10. If Switch 1 receives frames from VLAN 10,
it discards them because no physical port belongs to VLAN 10.
A super VLAN has no physical port. This restriction is mandatory and is enforced as follows:
■ If you configure the super VLAN and then the trunk interface, the frames of a super VLAN are
filtered automatically according to the allowed VLAN range set on the trunk interface.
On the network shown in Figure 4, no frame of super VLAN 10 passes through Port 3 on Switch 1,
even though the interface allows frames from all VLANs to pass through.
■ If you configure the trunk interface and allow all VLAN packets to pass through, you still cannot
configure the super VLAN on Switch 1, because any VLAN with physical ports cannot be configured
as the super VLAN.
As for Switch 1, the valid VLANs are just VLAN 2 and VLAN 3, and all frames from these VLANs are
forwarded.
As shown in Figure 5, Switch 1 is configured with super VLAN 4, sub VLAN 2, sub VLAN 3, and a
common VLAN 10. Switch 2 is configured with two common VLANs, namely, VLAN 10 and VLAN 20.
Suppose that Switch 1 is configured with the route to the network segment 1.1.3.0/24, and Switch 2 is
configured with the route to the network segment 1.1.1.0/24. Then Host A in sub VLAN 2, which belongs
to super VLAN 4, needs to communicate with Host C connected to Switch 2.
1. After comparing the IP address of Host C (1.1.3.2) with its own IP address, Host A finds that the two
IP addresses are not on the same network segment (1.1.1.0/24).
2. Host A broadcasts an ARP request to ask for the MAC address of the gateway (Switch 1).
3. After receiving the ARP request, Switch 1 finds the ARP request packet is from sub VLAN 2 and
replies with an ARP response to Host A through sub VLAN 2. The source MAC address in the ARP
response packet is the MAC address of VLANIF 4 for super VLAN 4.
4. After receiving the ARP response, Host A learns the MAC address of the gateway.
5. Host A sends the packet to the gateway, with the destination MAC address as the MAC address of
VLANIF 4 for super VLAN 4, and the destination IP address as 1.1.3.2.
6. After receiving the packet, Switch 1 performs the Layer 3 forwarding and sends the packet to
Switch 2, with the next hop address as 1.1.2.2, the outgoing interface as VLANIF 10.
7. After receiving the packet, Switch 2 performs the Layer 3 forwarding and sends the packet to
Host C through the directly connected interface VLANIF 20.
8. The response packet from Host C reaches Switch 1 after the Layer 3 forwarding on Switch 2.
9. After receiving the packet, Switch 1 performs the Layer 3 forwarding and sends the packet to
Host A through the super VLAN.
• After VLAN mapping is configured on an interface, the interface replaces the VLAN tag of a local VLAN
frame with an external VLAN tag before sending the VLAN frame out.
• When receiving the VLAN frame, the interface replaces the VLAN tag of a received VLAN frame with the
local VLAN tag.
If devices in two VLANs need to communicate using VLAN mapping, the IP addresses of these devices must
be on the same network segment. Otherwise, devices in the two VLANs must communicate through routes,
and VLAN mapping does not take effect.
The NE40E supports only 1 to 1 VLAN mapping. When a VLAN mapping-enabled interface receives a
single-tagged frame, the interface replaces the VLAN ID in the frame with a specified VLAN ID.
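This 1 to 1 mapping can be sketched as a per-interface translation table; the table contents below are assumptions for illustration:

```python
# Hypothetical 1 to 1 mapping table for one interface.
VLAN_MAP = {2: 102}                              # local VLAN -> external VLAN
REVERSE_MAP = {v: k for k, v in VLAN_MAP.items()}

def map_on_send(vid):
    """Replace the local VLAN ID before the frame leaves the interface."""
    return VLAN_MAP.get(vid, vid)

def map_on_receive(vid):
    """Replace the external VLAN ID with the local one on receipt."""
    return REVERSE_MAP.get(vid, vid)
```

Because the mapping is 1 to 1, applying the receive direction after the send direction restores the original VLAN ID.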
If a user runs a command to manually shut down a VLAN, VLAN damping does not need to be configured.
Background
On an ME network, users and services are differentiated based on a single VLAN tag or double VLAN tags
carried in packets and then access different Virtual Private Networks (VPNs) through sub-interfaces. In some
special scenarios where the access device does not support QinQ or a single VLAN tag is used in different
services, different services cannot be distributed to different Virtual Switching Instances (VSIs) or VPN
instances.
As shown in Figure 1, the high-speed Internet (HSI), Voice over Internet Protocol (VoIP), and Internet
Protocol Television (IPTV) services belong to VLAN 10 and are converged to the UPE through a switch. The
UPE is connected to the SR and BRAS through Layer 2 virtual private networks (L2VPNs).
If the UPE does not support QinQ, it cannot differentiate the received HSI, VoIP, and IPTV services for
transmitting them over different Pseudo Wires (PWs). In this case, you can configure the UPE to resolve the
802.1p priorities, DiffServ Code Point (DSCP) values, or EthType values of packets. Then, the UPE can
transmit different packets over different PWs based on the 802.1p priorities, DSCP values, or EthType values
of the packets.
In a similar manner, if the UPE is connected to the SR and BRAS through L3VPNs, the UPE can transmit
different services through different VPN instances based on the 802.1p priorities or DSCP values of the
packets.
Basic Concepts
As shown in Figure 1, sub-interfaces of different types are configured at the attachment circuit (AC) side of
the UPE to transmit packets with different 802.1p priorities, DSCP values, or EthTypes through different PWs
or VPN instances. This implements flexible service access. Flexible service access through sub-interfaces is a
technology that differentiates L2VPN access based on the VLAN IDs and 802.1p priorities/DSCP
values/EthType values in packets.
The sub-interfaces are classified in Table 1 based on service identification policies configured on them.
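The classification such sub-interfaces perform can be sketched as a lookup from (VLAN ID, 802.1p priority) to a VSI; the table entries and VSI names below are assumptions for illustration, not a documented mapping:

```python
# Hypothetical policy table: (VLAN ID, 802.1p priority) -> VSI name.
POLICY = {
    (10, 0): "vsi-hsi",    # assumed: HSI marked with 802.1p priority 0
    (10, 5): "vsi-voip",   # assumed: VoIP marked with 802.1p priority 5
    (10, 4): "vsi-iptv",   # assumed: IPTV marked with 802.1p priority 4
}

def select_vsi(vlan_id, pcp):
    """Pick the VSI (and hence the PW) for a single-tagged frame,
    or None if the frame matches no policy entry."""
    return POLICY.get((vlan_id, pcp))
```

A DSCP- or EthType-based sub-interface would work the same way with a different key in the lookup.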
NOTE: Untagged+DSCP is applicable only to the IP and L3VPN access scenario.
As shown in Figure 2, the 802.1p priority is represented by a 3-bit PRI (priority) field in a VLAN frame
defined in IEEE 802.1Q. The value ranges from 0 to 7. The greater the value, the higher the priority.
When the switching device is congested, the switching device preferentially sends packets with higher
priorities. In flexible service access, this field is used to identify service types so that different services
can access different L2VPNs/L3VPNs.
The EthType is represented by a 2-byte LEN/ETYPE field, as shown in Figure 2. In flexible service access,
this field is used to identify service types based on EthType values (PPPoE or IPoE) so that different
services can access different L2VPNs.
• DSCP
As shown in Figure 3, the DSCP is represented by the first 6 bits of the Type of Service (ToS) field in an
IPv4 packet header, as defined in relevant standards. The DSCP guarantees QoS on IP networks. Traffic
control on the gateway depends on the DSCP field.
In flexible service access, this field is used to identify service types so that different services can access
different L2VPNs/L3VPNs.
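Extracting the two fields described above can be sketched as bit operations on the 16-bit Tag Control Information (TCI) word of a VLAN tag and the IPv4 ToS byte:

```python
def pcp_of(tci):
    """802.1p priority: the top 3 bits of the 16-bit Tag Control Information."""
    return (tci >> 13) & 0x7

def vid_of(tci):
    """VLAN ID: the low 12 bits of the Tag Control Information."""
    return tci & 0xFFF

def dscp_of(tos):
    """DSCP: the most significant 6 bits of the IPv4 ToS byte."""
    return (tos >> 2) & 0x3F
```

For example, a TCI of (5 << 13) | 10 carries 802.1p priority 5 and VLAN ID 10, and a ToS byte of 0xB8 carries DSCP 46 (the common EF marking).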
On the network shown in Figure 4, when a CSG accesses an IP station, VPWS is not required on the CSG and
MASG. After the CSG receives IP packets, it performs the following:
1. The CSG directly encapsulates the packets with VLAN IDs and 802.1p priorities for differentiating
services. The CSG encapsulates the IP packets as follows:
• Encapsulates different services of the same user with the same VLAN ID but different 802.1p
priorities.
• Encapsulates different services of different users with different VLAN IDs but the same or
different 802.1p priorities.
2. Then, the CSG sends the encapsulated packets to PE1. After PE1 receives the packets, its 802.1p sub-
interface resolves the packets to obtain their VLAN IDs and 802.1p priorities. The packets then access
different VSIs through priority mapping. In this manner, different services are transmitted to PE2
through different VSIs.
3. After PE2 receives the packets, it sends the packets to the MASG.
• Huawei high-end Routers can function as PEs. In this scenario, only the configurations of PEs are mentioned. For
detailed configurations of other devices, see the related configuration guides.
• You can configure the 802.1p priorities on the CSG through commands.
• For details on L2VPNs, see the chapters "VPWS" and "VPLS" in the NE40E Feature Description - VPN.
On the network shown in Figure 6, after the CSG receives IP packets, it performs the following:
1. The CSG directly encapsulates the packets with VLAN IDs and DSCP values for differentiating services.
The CSG encapsulates the IP packets as follows:
• Encapsulates different services of the same user with the same VLAN ID but different DSCP
values.
• Encapsulates different services of different users with different VLAN IDs but the same or
different DSCP values.
2. Then, the CSG sends the encapsulated packets to PE1. After PE1 receives the packets, its DSCP sub-
interface resolves the packets to obtain their VLAN IDs and DSCP values. The packets then access
different VSIs through priority mapping. In this manner, different services are transmitted to PE2
through different VSIs.
3. After PE2 receives the packets, it sends the packets to the RNC.
• Huawei high-end Routers can function as PEs. In this scenario, only the configurations of PEs are mentioned. For
detailed configurations of other devices, see the related configuration guides.
• You can configure the DSCP values on the CSG through commands.
• For details on L2VPNs, see the chapters "VPWS" and "VPLS" in the NE40E Feature Description - VPN.
As shown in Figure 7, after the CSG receives IP packets, it performs the following:
1. The CSG directly encapsulates the packets with VLAN IDs and DSCP values for differentiating services.
The CSG encapsulates the IP packets as follows:
• Encapsulates different services of the same user with the same VLAN ID but different DSCP
values.
• Encapsulates different services of different users with different VLAN IDs but the same or
different DSCP values.
2. Then, the CSG sends the encapsulated packets to PE1. After PE1 receives the packets, its DSCP sub-
interface resolves the packets to obtain their VLAN IDs and DSCP values. The packets then access
different VPN instances through priority mapping. In this manner, different services are transmitted to
PE2 through different VPN instances.
3. After PE2 receives the packets, it sends the packets to the RNC.
• Huawei high-end Routers can function as PEs. In this scenario, only the configurations of PEs are mentioned. For
detailed configurations of other devices, see the related configuration guides.
• You can configure the DSCP values on the CSG through commands.
• For details on L3VPNs, see the chapter "BGP/MPLS IP VPN" in the NE40E Feature Description - VPN.
As shown in Figure 8, when a CSG accesses an IP station, VPWS is not required on the CSG and MASG. After
the CSG receives IP packets, it performs the following:
1. The CSG directly encapsulates the packets with VLAN IDs and 802.1p priorities for differentiating
services. The CSG encapsulates the IP packets as follows:
• Encapsulates different services of the same user with the same VLAN ID but different 802.1p
priorities.
• Encapsulates different services of different users with different VLAN IDs but the same or
different 802.1p priorities.
2. Then, the CSG sends the encapsulated packets to PE1. After PE1 receives the packets, its 802.1p sub-
interface resolves the packets to obtain their VLAN IDs and 802.1p priorities. The packets then access
different VPN instances through priority mapping. In this manner, different services are transmitted to
PE2 through different VPN instances.
3. After PE2 receives the packets, it sends the packets to the RNC.
• Huawei high-end Routers can function as PEs. In this scenario, only the configurations of PEs are mentioned. For
detailed configurations of other devices, see the related configuration guides.
• You can configure the 802.1p priorities on the CSG through commands.
• For details on L3VPNs, see the chapter "BGP/MPLS IP VPN" in the NE40E Feature Description - VPN.
Terms
None
Definition
802.1Q-in-802.1Q (QinQ) is a technology that adds another layer of IEEE 802.1Q tag to the 802.1Q tagged
packets entering the network. This technology expands the VLAN space by tagging the tagged packets. It
allows services in a private VLAN to be transparently transmitted over a public network.
Purpose
During intercommunication between Layer 2 LANs based on the traditional IEEE 802.1Q protocol, when two
user networks access each other through a carrier network, the carrier must assign VLAN IDs to users of
different VLANs, as shown in Figure 1. User Network1 and User Network2 access the backbone network
through PE1 and PE2 of a carrier network respectively.
Figure 1 Intercommunication between Layer 2 LANs using the traditional IEEE 802.1Q protocol
To connect VLAN 100 - VLAN 200 on User Network1 to VLAN 100 - VLAN 200 on User Network2, interfaces
connecting CE1, PE1, the P, PE2, and CE2 can be configured to function as trunk interfaces and to allow
packets from VLAN 100 - VLAN 200 to pass through.
This configuration, however, makes user VLANs visible on the backbone network and consumes the carrier's
VLAN ID resources (101 VLAN IDs in this example). In addition, the carrier has to manage user VLAN IDs, and
users have no right to plan their own VLANs.
The 12-bit VLAN tag defined in IEEE 802.1Q identifies a maximum of only 4096 VLANs, which is insufficient
to isolate and identify the massive number of users on the growing metro Ethernet (ME) network. QinQ was
therefore developed to expand the VLAN space by adding another 802.1Q tag to an 802.1Q tagged packet. In
this way, the number of VLANs increases to 4096 x 4096.
In addition to expanding VLAN space, QinQ is applied in other scenarios with the development of the ME
network and carriers' requirements on refined operation. The outer and inner VLAN tags can be used to
differentiate users from services. For example, the inner tag represents a user, while the outer tag represents
a service. Moreover, QinQ functions as a simple and practical VPN technology by transparently transmitting
private VLAN services over a public network. It extends services of a core MPLS VPN to the ME network and
implements an end-to-end VPN.
Because QinQ is easy to use, it has been widely applied on ISP networks, for example, by multiple services
on the metro Ethernet. As the metro Ethernet develops, vendors have proposed their own metro Ethernet
solutions, and QinQ, with its simplicity and flexibility, plays an important role in them.
Benefits
QinQ offers the following benefits:
• Facilitates service deployment by allowing the inner and outer tags to represent different information.
For example, use the inner tag to identify a user and the outer tag to identify a service.
• Allows ISPs to implement refined operation by providing diversified encapsulation and termination
modes.
QinQ packets carry two VLAN tags when they are transmitted across a carrier network. The meanings of the
two tags are described as follows:
• Inner VLAN tag: private VLAN tag that identifies the VLAN to which a user belongs.
• Outer VLAN tag: public VLAN tag that is assigned by a carrier to a user.
QinQ Encapsulation
QinQ encapsulation is to add another 802.1Q tag to a single-tagged packet. QinQ encapsulation is usually
performed on UPE interfaces connecting to users.
Currently, only interface-based QinQ encapsulation is supported. Interface-based QinQ encapsulation, also
known as QinQ tunneling, encapsulates packets that enter the same interface with the same outer VLAN
tag. This encapsulation mode cannot flexibly distinguish between users and services.
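Interface-based QinQ encapsulation can be sketched as pushing one fixed outer tag per inbound interface; the port-to-VLAN table and port names below are assumptions for the example:

```python
TPID_8021Q = 0x8100   # standard EtherType of an 802.1Q tag

# Interface-based QinQ: every frame entering the same interface gets the
# same outer tag (assumed per-port values for illustration).
OUTER_VLAN_OF_PORT = {"port1": 10, "port3": 10, "port2": 20}

def qinq_encapsulate(port, inner_vid):
    """Return (outer TPID, outer VID, inner VID) after the push.
    The inner (user) tag is carried through unchanged."""
    outer = OUTER_VLAN_OF_PORT[port]
    return (TPID_8021Q, outer, inner_vid)
```

Because the outer tag depends only on the inbound interface, this mode cannot separate users or services that share one port, which is exactly the limitation noted above.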
• After an interface receives a packet with one or two VLAN tags, the device removes the VLAN tags and
forwards the packet at Layer 3. The outbound interface decides whether to add one or two VLAN tags
to the packet.
• Before an interface forwards a packet, the device adds the planned VLAN tag to the packet.
The following section describes the termination types, the VLAN tag termination sub-interfaces, and the
applications of VLAN tag termination.
• Termination type
VLAN packets are classified into dot1q packets, which carry only one VLAN tag, and QinQ packets,
which carry two VLAN tags. Accordingly, there are two VLAN tag termination modes:
Sub-interfaces for QinQ VLAN tag termination are classified into the following types:
■ Explicit sub-interface for QinQ VLAN tag termination: The pair of VLAN tags specifies two
VLANs.
■ Implicit sub-interface for QinQ VLAN tag termination: The pair of VLAN tags specifies two
ranges of VLANs.
Dot1q and QinQ VLAN tag termination sub-interfaces do not support transparent transmission of packets that do
not contain a VLAN tag, and discard received packets that do not contain a VLAN tag.
■ Inter-VLAN communication
The VLAN technology is widely used because it allows Layer 2 packets of different users to be
transmitted separately. With the VLAN technology, a physical LAN is divided into multiple logical
broadcast domains (VLANs). Hosts in the same VLAN can communicate with each other at Layer 2,
but hosts in different VLANs cannot. The Layer 3 routing technology is required for communication
between hosts in different VLANs. The following interfaces can be used to implement inter-VLAN
communication:
User-VLAN Sub-interface
User-VLAN sub-interfaces are used for user access to a BRAS. Different user-VLAN sub-interfaces can be
configured on an interface for different VLAN users. After users' VLAN packets arrive on a BRAS, the BRAS
can differentiate user services based on the VLAN IDs in the packets and then use proper authentication and
address allocation methods for the users. After that, the BRAS sends users' VLAN packets to a RADIUS server
for user location identification.
After user-VLAN sub-interfaces on a BRAS receive matching packets, they remove VLAN tags and then
forward the packets at Layer 3.
• Incoming packets supported by user-VLAN sub-interfaces fall into the following categories:
■ Any-other packets
If packets received on user-VLAN sub-interfaces are neither single-tagged nor double-tagged VLAN
packets permitted by the sub-interfaces, these packets are forwarded by user-VLAN sub-interfaces
of any-other type at Layer 3.
To allow branches to communicate within Company 1 or Company 2 but not between the two companies,
configure QinQ tunneling on PE1 and PE2. The configuration roadmap is as follows:
• On PE1, user packets entering Port 1 and Port 3 are encapsulated with an outer VLAN tag 10, and user
packets entering Port 2 are encapsulated with an outer VLAN tag 20.
• On PE2, user packets entering Port 1 and Port 2 are encapsulated with an outer VLAN tag 20.
• Port 4 on PE1 and Port 3 on PE2 allow the packets tagged with VLAN 20 to pass.
Company      User VLAN IDs    Outer VLAN Tag
Company 1    2 to 500         10
• QinQ tunneling adds the same outer tag to the frames that enter a QinQ interface.
• Layer 2 selective QinQ adds distinctive outer tags to the frames that enter a QinQ interface according to
inner tags.
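Layer 2 selective QinQ can be sketched as a range lookup from the inner tag to an outer tag; the VLAN ranges below are assumptions for illustration:

```python
# Inner-tag ranges and the outer tags they map to (assumed values).
RULES = [((2, 500), 10),       # e.g. Company 1 VLANs -> outer tag 10
         ((501, 1000), 20)]    # e.g. Company 2 VLANs -> outer tag 20

def outer_tag_for(inner_vid):
    """Pick the outer tag based on the inner tag, or None if no rule
    matches and the frame is therefore not double-tagged."""
    for (low, high), outer in RULES:
        if low <= inner_vid <= high:
            return outer
    return None
```

Compare this with QinQ tunneling, where the same outer tag is pushed regardless of the inner tag.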
On the network shown in Figure 1, Company 1 and Company 2 have more than one branch.
• Interface 1 on PE1 receives packets from the VLANs of both Company 1 and Company 2.
To allow branches to communicate within Company 1 or Company 2 but not between the two companies,
configure Layer 2 selective QinQ on PE1 and PE2.
• Table 1 shows the planning of outer VLAN tags in the packets entering different interfaces on PE1 and
PE2.
• Interface 3 on PE1 or PE2 allows the packets tagged with VLAN 20 to pass.
In Figure 2, Device A is a non-Huawei device that uses 0x9100 as the EtherType value in the outer VLAN tag,
and Device B is a Huawei device that uses the default value 0x8100. To implement interworking between the
Huawei and non-Huawei devices, configure 0x9100 as the EtherType value in the outer VLAN tag of QinQ
packets sent by the Huawei device.
A UPE may receive packets whose outer tag represents the service and inner tag represents the user. In this
situation, you can configure VLAN tag swapping on the UPE to swap the inner and outer tags.
After VLAN tag swapping is configured, once the UPE receives packets with double VLAN tags, it swaps the
inner and outer VLAN tags. VLAN tag swapping does not take effect on packets carrying a single tag.
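The swap rule above can be sketched directly; the list-of-tags representation is an assumption for illustration:

```python
def swap_tags(tags):
    """Swap the inner and outer VLAN tags of a double-tagged frame.
    Frames carrying a single tag are returned unchanged, matching the
    rule that VLAN tag swapping only affects double-tagged packets."""
    if len(tags) == 2:
        outer, inner = tags
        return [inner, outer]
    return list(tags)
```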
Principles
QinQ mapping maps VLAN tags in user packets to specified tags before the user packets are transmitted
across the public network.
• Before sending a local VLAN frame, a sub-interface replaces the local VLAN tag in the frame with an
external VLAN tag.
• When receiving a frame carrying an external VLAN tag, a sub-interface replaces the external VLAN tag
with the local VLAN tag.
QinQ mapping allows a device to map a user VLAN tag to a carrier VLAN tag, shielding different user VLAN
IDs in packets.
QinQ mapping is deployed on edge devices of a Metro Ethernet. It is applied in but not limited to the
following scenarios:
• VLAN IDs deployed at new sites and old sites conflict, but new sites need to communicate with old sites.
• VLAN IDs planned by each site on the public network conflict. These sites do not need to communicate.
Currently, only 1 to 1 QinQ mapping is supported. When a QinQ mapping-enabled sub-interface receives a
single-tagged frame, the sub-interface replaces the VLAN ID in the frame with a specified VLAN ID.
2. Upon receipt, Switch 1 adds VLAN ID 10 to the frame, and forwards the frame to Switch 2. After Sub-
interface1 on Switch 2 receives the frame with VLAN ID 10, Sub-interface 1 on Switch 2 replaces VLAN
ID 10 with carrier VLAN ID 50. Interface 2 on Switch 2 then sends the frame with carrier VLAN ID 50
to the Internet service provider (ISP) network.
4. After Sub-interface 1 on Switch 3 receives the tagged frame from Switch 2, Sub-interface 1 on Switch
3 replaces the carrier VLAN ID 50 with VLAN ID 30.
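The two mapping points in this example can be sketched with the VLAN IDs from the figure (10 at the user site, 50 on the carrier network, 30 at the remote site):

```python
# 1 to 1 QinQ mapping tables at the two Metro Ethernet edges.
EDGE_A = {10: 50}      # Switch 2: site VLAN 10 -> carrier VLAN 50
EDGE_B = {50: 30}      # Switch 3: carrier VLAN 50 -> site VLAN 30

def remap(table, vid):
    """Replace the VLAN ID if a mapping rule exists; otherwise pass through."""
    return table.get(vid, vid)

vid = remap(EDGE_A, 10)   # frame leaves site A carrying carrier VLAN 50
vid = remap(EDGE_B, vid)  # frame enters site B carrying VLAN 30
```

The two sites keep their conflicting local VLAN plans while the carrier sees only VLAN 50.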
• In symmetric mode, when sub-interfaces for QinQ VLAN tag termination are used to access an L2VPN,
packets received by the edge devices on the two ends of the public network must carry the same VLAN
tags.
In symmetric mode, the VLAN planning at each site must be consistent, and only users in the same
VLAN at different sites can communicate with each other. In this mode, user VLANs can be isolated
according to inner tags. MAC address learning is based only on outer tags, and inner tags are
transparently transmitted to the remote end.
• In asymmetric mode, when sub-interfaces for QinQ VLAN tag termination are used to access an L2VPN,
packets received by the edge devices on the two ends of the public network may carry different VLAN
tags.
In asymmetric mode, the VLAN planning at each site can be different, and users in VLANs at any sites
can communicate with each other. In this mode, user VLANs cannot be isolated, and MAC address
learning is based on both inner and outer tags.
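The different MAC address learning keys of the two modes can be sketched as:

```python
def learning_key(mode, outer_vid, inner_vid):
    """MAC address learning key for the two L2VPN access modes (sketch).
    Symmetric mode learns on the outer tag only, so the inner tag is
    transparently carried; asymmetric mode learns on both tags."""
    if mode == "symmetric":
        return (outer_vid,)
    return (outer_vid, inner_vid)
```

In symmetric mode, two users with the same outer tag but different inner tags share one learning context; in asymmetric mode they do not.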
Table 1 and Table 2 describe how a PE processes user packets that arrive at an L2VPN in different ways.
Asymmetric mode    Removes both the inner and outer tags.    Removes both the inner and outer tags and adds another tag.
Asymmetric mode    Adds two tags.                            Removes one tag and adds two other tags.
• If the user packets contain one tag, the sub-interface that has IP forwarding configured is a sub-
interface for Dot1q VLAN tag termination.
• If the user packets contain double tags, the sub-interface that has IP forwarding configured is a sub-
interface for QinQ VLAN tag termination.
The sub-interface for Dot1q VLAN tag termination first identifies the outer VLAN tag and then generates an
ARP entry containing the IP address, MAC address, and outer VLAN tag.
• For the upstream traffic, the termination sub-interface strips the Ethernet frame header (including MAC
address) and the outer VLAN tag, and searches the routing table to perform Layer 3 forwarding based
on the destination IP address.
• For the downstream traffic, the termination sub-interface encapsulates IP packets with the Ethernet
frame header (including MAC address) and outer VLAN tag according to ARP entries and then sends IP
packets to the target user.
The sub-interface for QinQ VLAN tag termination first identifies double VLAN tags and then generates an
ARP entry containing the IP address, MAC address, and double VLAN tags.
• For the upstream traffic, the termination sub-interface strips the Ethernet frame header (including MAC
address) and double VLAN tags, and searches the routing table to perform Layer 3 forwarding based on
the destination IP address.
• For the downstream traffic, the termination sub-interface encapsulates IP packets with the Ethernet
frame header (including MAC address) and double VLAN tags according to ARP entries and then sends
IP packets to the target user.
• If the user packets contain one tag, the sub-interface that has proxy ARP configured is a sub-interface
for Dot1q VLAN tag termination.
• If the user packets contain double tags, the sub-interface that has proxy ARP configured is a sub-
interface for QinQ VLAN tag termination.
To solve this problem, configure proxy ARP on the sub-interface for Dot1q VLAN tag termination. The
detailed communication process is as follows:
2. After receiving the ARP Request message, the PE checks the destination IP address of the message and
finds that the destination IP address is not the IP address of its sub-interface for Dot1q VLAN tag
termination. Then, the PE searches its ARP table for the PC3's ARP entry.
• If the PE finds this ARP entry, the PE checks whether inter-VLAN proxy ARP is enabled.
■ If inter-VLAN proxy ARP is enabled, the PE sends the MAC address of its sub-interface for
Dot1q VLAN tag termination to PC1.
■ If inter-VLAN proxy ARP is not enabled, the PE discards the ARP Request message.
• If the PE does not find this ARP entry, the PE discards the ARP Request message sent by PC1 and
checks whether inter-VLAN proxy ARP is enabled.
■ If inter-VLAN proxy ARP is enabled, the PE sends an ARP Request message to PC3. After the
PE receives an ARP Reply message from PC3, an ARP entry of PC3 is generated in the PE's
ARP table.
■ If inter-VLAN proxy ARP is not enabled, the PE does not perform any operations.
3. After learning the MAC address of the sub-interface for Dot1q VLAN tag termination, PC1 sends IP
packets to the PE based on this MAC address.
To solve this problem, enable proxy ARP on the sub-interface for QinQ VLAN tag termination. The detailed
communication process is as follows:
2. After receiving the ARP Request message, the PE checks the destination IP address of the message and
finds that the destination IP address is not the IP address of its sub-interface for QinQ VLAN tag
termination. Then, the PE searches its ARP table for the PC3's ARP entry.
• If the PE finds this ARP entry, the PE checks whether inter-VLAN proxy ARP is enabled.
■ If inter-VLAN proxy ARP is enabled, the PE sends the MAC address of its sub-interface for
QinQ VLAN tag termination to PC1.
■ If inter-VLAN proxy ARP is not enabled, the PE discards the ARP Request message.
• If the PE does not find this ARP entry, the PE discards the ARP Request message sent by PC1 and
checks whether inter-VLAN proxy ARP is enabled.
■ If inter-VLAN proxy ARP is enabled, the PE sends an ARP Request message to PC3. After the
PE receives an ARP Reply message from PC3, an ARP entry of PC3 is generated in the PE's
ARP table.
■ If inter-VLAN proxy ARP is not enabled, the PE does not perform any operations.
3. After learning the MAC address of the sub-interface for QinQ VLAN tag termination, PC1 sends IP
packets to the PE based on this MAC address.
• If the user packets contain one tag, the sub-interface that has the DHCP server function configured is a
sub-interface for Dot1q VLAN tag termination.
• If the user packets contain double tags, the sub-interface that has the DHCP server function configured
is a sub-interface for QinQ VLAN tag termination.
On the network shown in Figure 1, the user packet received by the DHCP server carries a single tag. To
enable the sub-interface for Dot1q VLAN tag termination on the DHCP server to assign an IP address to a
DHCP client, configure the DHCP server function on the sub-interface for Dot1q VLAN tag termination.
On the network shown in Figure 2, the switch has selective QinQ configured, and the user packet received by
the DHCP server carries double tags. To enable the sub-interface for QinQ VLAN tag termination on the
DHCP server to assign an IP address to a DHCP client, configure the DHCP server function on the sub-
interface for QinQ VLAN tag termination.
The Option 82 function is configured on termination sub-interfaces. This function allows the sub-interfaces to add user tag
information to Option 82, so that a DHCP server can assign IP addresses based on the tag information.
The DHCP relay function can be configured on a sub-interface for Dot1q VLAN tag termination or sub-
interface for QinQ VLAN tag termination, based on whether the user packets received by a PE contain one or
two VLAN tags.
• If the user packets contain one tag, the sub-interface that has the DHCP relay function configured is a
sub-interface for Dot1q VLAN tag termination.
• If the user packets contain double tags, the sub-interface that has the DHCP relay function configured is
a sub-interface for QinQ VLAN tag termination.
On the sub-interface for Dot1q VLAN tag termination, the DHCP relay function is implemented as follows:
1. When receiving a DHCP request message, the DHCP relay adds user tag information into the Option
82 field in the message.
2. When receiving a DHCP reply message (ACK message) from the DHCP server, the DHCP relay analyzes
the DHCP reply and generates a binding table.
3. The DHCP relay checks user packets based on the user tag information.
2022-07-08 834
Feature Description
If the sub-interface for QinQ VLAN tag termination does not support the DHCP relay function, the DHCP relay regards the
received packet as invalid and discards it. As a result, the DHCP client cannot obtain an IP address
from the DHCP server.
On the sub-interface for QinQ VLAN tag termination, the DHCP relay function is implemented as follows:
1. When receiving a DHCP request message, the DHCP relay adds user tag information into the Option
82 field in the message.
2. When receiving a DHCP reply message (ACK message) from the DHCP server, the DHCP relay analyzes
the DHCP reply and generates a binding table.
3. The DHCP relay checks user packets based on the user tag information.
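The relay steps above can be sketched as follows. The dictionary layout and function names are illustrative only, not the on-wire DHCP message format.

```python
def relay_dhcp_request(packet, outer_vlan, inner_vlan=None):
    """Sketch of step 1: the relay records the user VLAN tag(s) in the
    Option 82 field before forwarding the request to the DHCP server."""
    tags = [outer_vlan] if inner_vlan is None else [outer_vlan, inner_vlan]
    packet = dict(packet)                 # do not mutate the caller's copy
    packet["option82"] = {"user-tags": tags}
    return packet

def build_binding(reply):
    """Sketch of step 2: on a DHCP reply (ACK), the relay generates a
    binding entry used in step 3 to check user packets against their tags."""
    return (reply["client-mac"], reply["assigned-ip"],
            reply["option82"]["user-tags"])
```

On a sub-interface for QinQ VLAN tag termination both tags are recorded; on a Dot1q termination sub-interface only the single user tag is.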
• If the user packets contain one tag, the sub-interface that has VRRP configured is a sub-interface for
Dot1q VLAN tag termination.
• If the user packets contain double tags, the sub-interface that has VRRP configured is a sub-interface
for QinQ VLAN tag termination.
On the network shown in Figure 1, sub-interfaces for Dot1q VLAN tag termination specify an outer tag, such
as tag 100, to configure a VRRP group.
• Only one VRRP instance needs to be created for users on the same network segment, even if they carry
different VLAN tags.
On the network shown in Figure 2, sub-interfaces for QinQ VLAN tag termination specify double tags (for
example, inner tag 100 and outer tag 1000) to configure a VRRP group.
• Only one VRRP instance needs to be created for users on the same network segment, even if they carry
different VLAN tags.
• If the user packets contain one tag, the sub-interface that has L3VPN functions configured is a sub-
interface for Dot1q VLAN tag termination.
• If the user packets contain double tags, the sub-interface that has L3VPN functions configured is a sub-
interface for QinQ VLAN tag termination.
Figure 1 L3VPN access through a sub-interface for Dot1q VLAN tag termination
Figure 2 L3VPN access through a sub-interface for QinQ VLAN tag termination
Figure 1 VPWS access through a sub-interface for QinQ VLAN tag termination
• If the user packets contain one tag, the sub-interface that has VPLS functions configured is a sub-
interface for Dot1q VLAN tag termination.
• If the user packets contain double tags, the sub-interface that has VPLS functions configured is a sub-
interface for QinQ VLAN tag termination.
Figure 1 VPLS access through a sub-interface for Dot1q VLAN tag termination
VPLS supports point-to-multipoint (P2MP) transmission and forwards data by learning MAC addresses. In
this case, VPLS access through a sub-interface for Dot1q VLAN tag termination can be performed by MAC
address learning based on a single VLAN tag. Note that there are no restrictions on VLAN tags for
VPLS access.
Figure 2 VPLS access through a sub-interface for QinQ VLAN tag termination
VPLS supports P2MP transmission and forwards data by learning MAC addresses. In this case, VPLS access through a
sub-interface for QinQ VLAN tag termination can be performed by MAC address learning based on
double VLAN tags. Note that there are no restrictions on VLAN tags for VPLS access.
On the network shown in Figure 1, when the DSLAM forwards double-tagged multicast packets to the UPE,
the UPE processes the packets as follows based on double-tag contents:
1. When the double-tagged packets carrying an outer S-VLAN tag and an inner C-VLAN tag are
transmitted to the UPE to access the Virtual Switching Instances (VSIs), the UPE terminates the double
tags and binds the packets to the multicast VSIs through Pseudo Wires (PWs). Then, the PE-AGG
terminates PWs and adds multicast VLAN tags to the packets. Finally, the packets are transmitted to
the multicast source. For example, IPTV packets with S-VLAN 3 and C-VLANs ranging from 1 to 1000
are terminated on the UPE and then access a PW. The PE-AGG terminates the PW and adds multicast
VLAN 8 to the packets. IGMP snooping sets up forwarding entries based on the interface number, S-
VLAN tag, and C-VLAN tag and supports multicast packets with different C-VLAN tags. Each PW then
forwards the multicast packets based on their S-VLAN IDs and C-VLAN IDs.
2. When the double-tagged packets carrying an outer C-VLAN tag and an inner S-VLAN tag are
transmitted to the UPE, the UPE enabled with VLAN swapping swaps the outer C-VLAN tag and inner
S-VLAN tag. If multicast packets access Layer 2 VLANs, the packets are processed in mode 1; if
multicast packets access VSIs, the packets are processed in mode 2.
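The IGMP snooping entries described in mode 1, keyed by interface number, S-VLAN tag, and C-VLAN tag, can be sketched as a simple in-memory table. All names, VLAN IDs, and group addresses below are illustrative.

```python
# Sketch: IGMP snooping forwarding entries keyed by (interface, S-VLAN, C-VLAN),
# so multicast packets carrying different C-VLAN tags hit separate entries.
snooping_table = {}

def add_entry(interface, s_vlan, c_vlan, group):
    """Record that a receiver behind (interface, s_vlan, c_vlan) joined a group."""
    snooping_table.setdefault((interface, s_vlan, c_vlan), set()).add(group)

def lookup(interface, s_vlan, c_vlan):
    """Return the multicast groups to forward on this (interface, S-VLAN, C-VLAN)."""
    return snooping_table.get((interface, s_vlan, c_vlan), set())

# Two receivers for the same group, distinguished by C-VLAN
add_entry("port1", 3, 1, "225.0.0.1")
add_entry("port1", 3, 2, "225.0.0.1")
```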
• Single-tagged packets: The sub-interface for Dot1q VLAN tag termination needs to have IGMP and
IGMP snooping configured.
• Double-tagged packets: The sub-interface for QinQ VLAN tag termination needs to have IGMP and
IGMP snooping configured.
To solve this problem, the 802.1p value in the inner VLAN tag must be processed on a QinQ sub-interface,
in one of the following three ways:
• Ignores the 802.1p value in the inner VLAN tag, but resets the 802.1p value in the outer VLAN tag.
• Automatically maps the 802.1p value in the inner VLAN tag to an 802.1p value in the outer VLAN tag.
• Sets the 802.1p value in the outer VLAN tag according to the 802.1p value in the inner VLAN tag, in either of the following ways:
■ Uniform mode: The 802.1p value in the inner VLAN tag is used.
■ Maps the 802.1p value in the inner VLAN tag to an 802.1p value in the outer VLAN tag. Multiple 802.1p
values in the inner VLAN tag can be mapped to one 802.1p value in the outer VLAN tag, but one 802.1p
value in the inner VLAN tag cannot be mapped to multiple 802.1p values in the outer VLAN tag.
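The many-to-one constraint on the mapping can be illustrated with a plain dictionary, which by construction lets several inner values share one outer value but can never map one inner value to two outer values. The priority values shown are arbitrary examples, not device defaults.

```python
# Sketch: an inner-to-outer 802.1p mapping. Each inner priority is a single
# dictionary key, so one inner value cannot map to multiple outer values,
# while several inner values may legitimately share one outer value.
inner_to_outer = {0: 2, 1: 2, 2: 2, 5: 4}   # illustrative values only

def map_8021p(inner_pri):
    # How the real device treats unmapped inner priorities is outside this
    # sketch; here they simply keep their value.
    return inner_to_outer.get(inner_pri, inner_pri)
```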
PVCs are used to carry services that are assigned with different VLAN ID ranges. The following table lists the
VLAN ID ranges for each service.
If a user needs to use the VoIP service, user VoIP packets are sent to a DSLAM over a specified PVC and
assigned with VLAN ID 301. When the packets reach the UPE, an outer VLAN ID (for example, 2000) is
added to the packets. The inner VLAN ID (301) represents the user, and the outer VLAN ID (2000) represents
the VoIP service (the DSLAM location can also be marked if you add different VLAN tags to packets received
by different DSLAMs). The UPE then sends the VoIP packets to the NPE where the double VLAN tags are
terminated. Then, the NPE sends the packets to an IP core network or a VPN.
HSI and IPTV services are processed in the same way. The difference is that QinQ termination of HSI services
is implemented on the BRAS.
The NPE can generate a Dynamic Host Configuration Protocol (DHCP) binding table to avoid network
attacks. In addition, the NPE can implement DHCP authentication based on the double tags and has the
Virtual Router Redundancy Protocol (VRRP) enabled to ensure reliable service access.
A carrier deploys the VPLS technology on the IP/MPLS core network and QinQ on the ME network. Three
VLANs are assigned for each site to identify the finance, marketing, and other departments: VLAN 100 for
finance, VLAN 200 for marketing, and VLAN 300 for others. An outer VLAN tag 1000 is added on a UPE
(different UPEs can add different outer VLAN tags). The sub-interface bound to a VSI on the NPE
connected to the UPE works in symmetric mode. In this way, users belonging to the same VLAN in different
sites can communicate with each other.
Terms
Term Definition
QinQ interface An interface that can process VLAN frames with a single tag (Dot1q termination) or
with double tags (QinQ termination).
VLAN tag termination sub-interface  An interface that identifies the single or double tags in a packet and removes the
single or double tags before sending the packet.
Definition
An Ethernet virtual connection (EVC) defines a uniform Layer 2 service transport and configuration model.
Proposed by the Metro Ethernet Forum (MEF), an EVC is an association of two or more user network
interfaces on an Internet service provider (ISP) network. In the EVC model, bridge domains (BDs) are used to
isolate user networks.
An EVC is a model for transporting Ethernet services, rather than a specific service or technique.
Purpose
Figure 1 shows the service model supported by the NE40E.
The service model of the NE40E has limitations, which are described in Table 1. To address the limitations in
Table 1, the EVC model is implemented on the NE40E, as shown in Figure 2.
Table 1 provides a comparison between the traditional service model and the EVC model of the NE40E.
Table 1 Comparison between the traditional service model and the EVC model of the NE40E
Ethernet Flow Point (EFP)
■ Traditional model: Sub-interfaces and Layer 2 interfaces, which have various types and require different configurations.
■ EVC model: EVC Layer 2 sub-interfaces only. Configurations are unified on the Layer 2
sub-interfaces. The configurations include traffic encapsulation types, traffic behaviors,
traffic policies, and traffic forwarding modes. Traffic encapsulation types and behaviors
can be combined flexibly in a traffic policy so that a device can use a policy to transmit a
specific type of service through a specific EVC Layer 2 sub-interface.
Layer 3 access
■ Traditional model: VLANIF interface, which terminates Layer 2 packets and provides Layer 3 access.
■ EVC model: BDIF interface, which terminates Layer 2 services and provides Layer 3 access.
Benefits
EVC unifies the Ethernet service model and configuration model, simplifies configuration management,
improves O&M efficiency, and enhances service access scalability.
Multipoint EVC  The E-LAN service is a multipoint-to-multipoint Ethernet connection that provides the
Ethernet/VLAN extension function across the carrier network.
Before describing how EVC carries services, this section introduces the related concepts.
Related Concepts
• EVC Layer 2 sub-interface
An EVC Layer 2 sub-interface is a Layer 2 service access object. It can be connected only to a BD or
VPWS network but cannot be directly connected to a Layer 3 network.
• BD
A BD is a broadcast domain in the EVC model. VLAN tags are transparent within a BD, and MAC
address learning is based on BDs.
An EVC Layer 2 sub-interface belongs to only one BD. After an EVC Layer 2 sub-interface is added to a
BD, its services are isolated from services in other BDs.
• BDIF
A BDIF interface is a Layer 3 logical interface that terminates Layer 2 services and provides Layer 3
access.
Each BD can have only one BDIF interface.
Figure 1 shows a diagram of EVC service bearing, involving EFPs, broadcast domains, and Layer 3 access.
EFP
An EVC Layer 2 sub-interface is used as an EVC EFP, on which traffic encapsulation types and behaviors can
be flexibly combined. A traffic encapsulation type and behavior are grouped into a traffic policy. Traffic
policies help implement flexible Ethernet traffic access.
• Traffic encapsulation
A Layer 2 Ethernet can transmit untagged, single-tagged, and double-tagged packets. To enable a
specific EVC Layer 2 sub-interface to transmit a specific type of packet, specify an encapsulation type on
the EVC Layer 2 sub-interface. Table 2 lists traffic encapsulation types supported by EVC Layer 2 sub-
interfaces.
Untagged  An EVC Layer 2 sub-interface with this traffic encapsulation type can receive only packets
carrying no VLAN tags. Note that only a single encapsulation type can be specified on each EVC
Layer 2 sub-interface.
On a main interface, if only one EVC Layer 2 sub-interface is created and the encapsulation type is
default, all traffic is forwarded through the EVC Layer 2 sub-interface.
On a main interface, if there are EVC sub-interfaces of both the default and other traffic encapsulation
types (such as dot1q and QinQ), and all the non-default EVC sub-interfaces are down, traffic precisely
matching these non-default EVC sub-interfaces will not be forwarded through the default EVC sub-
interface.
■ Different types of sub-interfaces, including common sub-interfaces, EVC Layer 2 sub-interfaces, sub-interfaces
for dot1q VLAN tag termination, and sub-interfaces for QinQ VLAN tag termination, can be created on the
same main interface. Among these sub-interfaces, only EVC Layer 2 sub-interfaces can be connected to BDs
and configured with traffic encapsulation types, traffic behaviors, traffic policies, and traffic forwarding
modes.
■ After an EVC Layer 2 sub-interface of the default type is connected to a BD, no BDIF interface can be created
for the BD.
• Traffic behavior
Table 3 lists traffic behaviors supported by EVC Layer 2 sub-interfaces.
The rules of the traffic behaviors in Table 3 are as follows:
■ Only one traffic behavior can be specified on each EVC Layer 2 sub-interface.
■ Except for the map single outbound behavior, the traffic behavior for incoming traffic must be
the inverse of that for outgoing traffic.
■ By default, no traffic behavior is specified on an EVC Layer 2 sub-interface. The EVC Layer 2 sub-
interface transparently forwards all received packets, without modifying their tag settings.
• Traffic policies
A traffic policy is a combination of a traffic encapsulation type and a traffic behavior. On the network
shown in Figure 7, users on PE1 need to communicate with users on other PEs at Layer 2. To meet this
requirement, the following steps must be performed:
■ Create a BD on PE1, create an EVC Layer 2 sub-interface on the PE1 interface used for user access,
configure an encapsulation type for this sub-interface and add it to the BD.
■ Create a BD with the same ID as that on PE1 on each of the other PEs, configure EVC Layer 2 sub-
interfaces on PE interfaces used for user access, specify traffic encapsulation types and behaviors,
and add all EVC Layer 2 sub-interfaces to the BD.
■ Create EVC Layer 2 sub-interfaces connecting all PEs and add them to the BD with the same ID as
that on PE1.
All users must be on the same network segment to enable users on PE1 to communicate with users on the other
PEs.
Device PE3, interface port3, traffic encapsulation QinQ, traffic behavior map 2-to-1 vid 10
■ Processing of incoming packets on the inbound interface: Maps the outer VLAN tag 30 and inner
VLAN tag 300 in the received packets to a single VLAN tag 10.
■ Processing of outgoing packets on the outbound interface: Maps the single VLAN tag 10 in the
packets to double VLAN tags (outer VLAN tag 30 and inner VLAN tag 300).
Device PE3, interface port4, traffic encapsulation default, traffic behavior push vid 10
■ Processing of incoming packets on the inbound interface: Adds VLAN tag 10 to the received
untagged packets.
■ Processing of outgoing packets on the outbound interface: Removes VLAN tag 10 from the packets.
Traffic encapsulation types and behaviors can be combined flexibly in policies. Table 4 describes the
matching between traffic encapsulation types and behaviors.
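The behaviors shown above can be sketched as operations on a VLAN tag list ordered [outer, inner]. The function names are illustrative shorthand for the configured behaviors, not a device API.

```python
# Sketch of tag manipulations corresponding to the behaviors above,
# modeled on a list of VLAN IDs ordered [outer, inner].
def push(tags, vid):
    """'push vid 10': add a tag, e.g. to untagged packets."""
    return [vid] + tags

def pop(tags):
    """Remove the outer tag."""
    return tags[1:]

def map_2_to_1(tags, vid):
    """'map 2-to-1 vid 10' inbound: replace double tags with one tag."""
    assert len(tags) == 2
    return [vid]

def map_1_to_2(tags, outer, inner):
    """The inverse direction on the outbound side: single tag -> double tags."""
    assert len(tags) == 1
    return [outer, inner]
```

For instance, the PE3 port3 row above corresponds to `map_2_to_1([30, 300], 10)` inbound and `map_1_to_2([10], 30, 300)` outbound.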
QoS policies can also be configured on EVC Layer 2 sub-interfaces to provide differentiated services (DiffServ) and
help use resources efficiently.
• Traffic forwarding
Figure 8 shows how traffic is forwarded based on the EVC model when Layer 2 sub-interfaces receive
packets carrying two VLAN tags.
EVC Layer 2 sub-interfaces are created on the PE1 and PE2 interfaces connecting to the CEs. A traffic
policy is deployed on each EVC Layer 2 sub-interface, and the sub-interfaces are added to BD1.
On PE1, the EVC Layer 2 sub-interface of port1 adds outer VLAN tag 100 and inner VLAN tag 10 to
the packets based on the configured traffic encapsulation type and traffic behavior, and then
forwards the packets.
Broadcast Domain
EVC provides a unified broadcast domain model, as shown in Figure 9.
Layer 3 Access
A BDIF interface can be created for a BD in the EVC model. This interface terminates Layer 2 services and
provides Layer 3 access. Figure 10 shows how a BDIF interface works.
A BD is created on a device. An EVC Layer 2 sub-interface is created on the user side of the device, and a
traffic policy (traffic encapsulation type and behavior) is configured on the EVC Layer 2 sub-interface. Then
the EVC Layer 2 sub-interface is bound to the BD. In this manner, packets from the user network are
forwarded at Layer 2 through the BD.
A BDIF interface is created for the BD, and an IP address is configured for the BDIF interface. The BDIF
interface then functions as a virtual interface that forwards packets at Layer 3.
When forwarding packets, the BDIF interface matches only the destination MAC address in each packet.
• Layer 2 to Layer 3: The EVC Layer 2 sub-interface sends the received user packets to the bound BD
based on the configured traffic policy. If the destination MAC address of a user packet is the MAC
address of the BDIF interface, the device removes the Layer 2 header of the packet and searches its
routing table for Layer 3 forwarding. For all the other user packets, the device directly forwards them at
Layer 2.
• Layer 3 to Layer 2: When receiving packets, the device searches its routing table for the outbound BDIF
interface and then sends the packets to this interface. Upon receipt, the BDIF interface encapsulates the
packets based on the ARP entry, searches the MAC address table for the outbound interface, and then
forwards them at Layer 2.
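The Layer 2 to Layer 3 decision above reduces to a single destination-MAC check, which can be sketched as follows. The return strings merely label the two forwarding paths; they are not device output.

```python
def forward_from_bd(packet, bdif_mac):
    """Sketch of the BDIF decision: only packets destined for the BDIF
    interface's MAC address are routed at Layer 3; all other packets stay
    in Layer 2 forwarding within the BD."""
    if packet["dst-mac"] == bdif_mac:
        return "strip L2 header, look up routing table"   # Layer 3 path
    return "forward at Layer 2 in the BD"                 # Layer 2 path
```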
• After PE1 receives the packet, PE1's forwarder selects a PW for forwarding the packet.
• PE1 then adds two labels to the packet based on the PW forwarding entry and tunnel information. The
inner private network label identifies the PW, and the outer public network label identifies the tunnel
between PE1 and PE2.
• After the Layer 2 packet reaches PE2 through the public network tunnel, PE2 removes the private
network label.
• PE2's forwarder then selects an AC and forwards the Layer 2 packet from CE1 to CE2 over the AC.
When PE1 receives Layer 2 packets from CE1, PE1 determines whether to accept them and how to process
their VLAN tags according to inbound interface types. Similarly, when PE2 forwards Layer 2 packets from
CE1 to CE2, PE2 processes their VLAN tags according to outbound interface types. The following describes
how different types of EVC Layer 2 sub-interfaces accept and process different types of packets, and how the
original data packets are changed in different traffic policies.
Untagged EVC sub-interface
An untagged EVC sub-interface accepts only packets that do not carry VLAN tags.
Figure 2 Unchanged original data packet when no traffic behavior is configured on untagged EVC sub-
interfaces
Figure 3 Processing on the original data packet when traffic behavior push1 is configured on untagged EVC
sub-interfaces
Figure 4 Processing on the original data packet when traffic behavior push2 is configured on untagged EVC
sub-interfaces
Figure 5 Unchanged original data packet when no traffic behavior is configured on dot1q EVC sub-interfaces
Figure 6 Processing on the original data packet when traffic behavior push1 is configured on dot1q EVC sub-
interfaces
Figure 7 Processing on the original data packet when traffic behavior push2 is configured on dot1q EVC sub-
interfaces
Figure 8 Processing on the original data packet when traffic behavior pop single is configured on dot1q EVC
sub-interfaces
Figure 9 Processing on the original data packet when traffic behavior map offset is configured on dot1q EVC
sub-interfaces
Figure 10 Processing on the original data packet when traffic behavior map 1 to 1 is configured on dot1q
EVC sub-interfaces
Figure 11 Processing on the original data packet when traffic behavior map 1 to 2 is configured on dot1q
EVC sub-interfaces
Figure 12 Processing on the original data packet when traffic behavior single outbound is configured on
dot1q EVC sub-interfaces
Figure 13 Unchanged original data packet when no traffic behavior is configured on QinQ EVC sub-
interfaces
Figure 14 Processing on the original data packet when traffic behavior push1 is configured on QinQ EVC
sub-interfaces
Figure 15 Processing on the original data packet when traffic behavior pop single is configured on QinQ EVC
sub-interfaces
Figure 16 Processing on the original data packet when traffic behavior pop double is configured on QinQ
EVC sub-interfaces
Figure 17 Processing on the original data packet when traffic behavior swap is configured on QinQ EVC sub-
interfaces
Figure 18 Processing on the original data packet when traffic behavior map offset is configured on QinQ EVC
sub-interfaces
Figure 19 Processing on the original data packet when traffic behavior map 1 to 1 is configured on QinQ
EVC sub-interfaces
Figure 20 Processing on the original data packet when traffic behavior map 2 to 1 is configured on QinQ
EVC sub-interfaces
Figure 21 Processing on the original data packet when traffic behavior map 1 to 2 is configured on QinQ
EVC sub-interfaces
Figure 22 Processing on the original data packet when traffic behavior map 2 to 2 is configured on QinQ
EVC sub-interfaces
Figure 23 Processing on the original data packet when traffic behavior single outbound is configured on
QinQ EVC sub-interfaces
Service Overview
As enterprises widen their global reach and establish more branches in different regions, applications such as
instant messaging and teleconferencing are becoming more common. This imposes high requirements on
end-to-end (E2E) Datacom technologies. A network capable of providing point-to-multipoint (P2MP) and
multipoint-to-multipoint (MP2MP) services is paramount to Datacom function implementation. To ensure
the security of enterprise data, secure, reliable, and transparent data channels must be provided for
multipoint transmission.
Generally, enterprises lease virtual switching instances (VSIs) on a carrier network to carry services between
branches.
Networking Description
Figure 1 Networking of enterprise service distribution
In Figure 1, Branch 1 and Branch 3 belong to one department (the Procurement department, for example),
and Branch 2 and Branch 4 belong to another department (the R&D department, for example). Services
must be isolated between these departments, but each department can plan their VLANs independently (for
example, different service development teams belong to different VLANs). The enterprise plans to
dynamically adjust the departments but does not want to lease multiple VSIs on the carrier network because
of the associated costs.
Feature Deployment
In the traditional service model supported by the NE40E shown in Figure 1, common sub-interfaces (VLAN
type), sub-interfaces for dot1q VLAN tag termination, or sub-interfaces for QinQ VLAN tag termination are
created on the user-side interfaces of the PEs. These sub-interfaces are bound to different VSIs on the carrier
network to isolate services in different departments. If the enterprise sets up another department, the
enterprise must lease another VSI from the carrier to isolate the departments, increasing costs.
To allow the enterprise to dynamically adjust its departments and reduce costs, the EVC model can be
deployed on the PEs. In the EVC model, multiple BDs are connected to the same VSI, and the BDs are
isolated from each other.
In the figure showing EVC bearing VPLS services, the EVC model is deployed as follows:
1. VPLS connections are created on the PEs to ensure communication on the Layer 2 network.
3. EVC Layer 2 sub-interfaces are created on the PEs on the user side, are configured with QinQ traffic
encapsulation type and pop double traffic behavior, and transmit Enterprise services to the carrier
network.
4. A BDIF interface is created in each BD, and the BDs are bound to the same VSI to transmit enterprise
services over pseudo wires (PWs) in tagged mode.
Figure 2 shows the VSI channel mode in which BDs are connected to the VPLS network. The VSI functions as
the network-side channel, and BDs function as service instances on the access layer. A VSI can carry service
traffic in multiple BDs.
• When a packet travels from a BD to a PW, the PE adds the BD ID to the packet as the outer tag (P-
Tag).
• When a packet travels from a PW to a BD, the PE finds the VSI based on the VC label and the BD
based on the P-Tag.
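The two directions above can be sketched as follows; the frame model and all names are illustrative, not an on-wire format.

```python
# Sketch of VSI channel mode, in which the BD ID travels as the outer P-Tag.
def bd_to_pw(frame, bd_id):
    """BD -> PW direction: the PE adds the BD ID as the outer tag (P-Tag)."""
    frame = dict(frame)
    frame["p-tag"] = bd_id
    return frame

def pw_to_bd(frame, label_to_vsi):
    """PW -> BD direction: the VSI is found from the VC label, and the
    target BD from the P-Tag carried in the frame."""
    vsi = label_to_vsi[frame["vc-label"]]
    bd_id = frame["p-tag"]
    return vsi, bd_id
```

This is what lets one VSI carry traffic for multiple BDs: the VC label selects the VSI channel, while the P-Tag disambiguates the BD.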
The NE40E also supports the exclusive VSI service mode. This mode is similar to a traditional service mode in
which sub-interfaces are bound to different VSIs to connect to the VPLS network. Figure 3 shows a diagram
of the exclusive VSI service mode.
In the exclusive VSI service mode, each VSI is connected to only one BD, and the BD occupies the VSI
resource exclusively.
Service Description
As globalization gains momentum, more and more enterprises set up branches in foreign countries and
requirements for office flexibility are increasing. An urgent demand for carriers is to provide Layer 2 links for
enterprises to set up their own enterprise networks, so that enterprise employees can conveniently visit
enterprise intranets outside their offices.
By combining previous access modes with the current IP backbone network, VPWS prevents duplicate
network construction and reduces operating costs.
Networking Description
Figure 1 Configuring EVC VPWS Services
In the traditional service model supported by the NE40E, common sub-interfaces (VLAN type), Dot1q VLAN
tag termination sub-interfaces, or QinQ VLAN tag termination sub-interfaces are created on the user-side
interfaces of PEs. These sub-interfaces are bound to different VSIs on the carrier network. If Layer 2 devices
use different access modes on a network, service management and configuration are complicated and
difficult. To resolve this issue, configure an EVC to carry Layer 2 services. This implementation facilitates
network planning and management, driving down enterprise costs.
On the VPWS network shown in Figure 1, VPN1 services use the EVC VPWS model. The traffic encapsulation
type and behavior are configured on the PE to ensure service connectivity within the same VPN instance.
Feature Deployment
1. Create a Layer 2 EVC sub-interface on the PE and specify the traffic encapsulation type and behavior
on the Layer 2 sub-interface.
Terms
Terms Definition
EVC Ethernet Virtual Connection. A model for carrying Ethernet services over a
metropolitan area network (MAN). It is defined by the Metro Ethernet Forum (MEF).
An EVC is a model, rather than a specific service or technique.
BD bridge domain
Definition
Generally, redundant links are used on an Ethernet switching network to provide link backup and enhance
network reliability. The use of redundant links, however, may produce loops, causing broadcast storms and
MAC address table instability. As a result, the communication quality deteriorates, and the communication
service may even be interrupted. The Spanning Tree Protocol (STP) is introduced to resolve this problem.
Related Concepts
STP has a narrow sense and a broad sense.
• STP, in a narrow sense, refers to only the STP protocol defined in IEEE 802.1D.
• STP, in a broad sense, refers to the STP protocol defined in IEEE 802.1D, the Rapid Spanning Tree
Protocol (RSTP) defined in IEEE 802.1W, and the Multiple Spanning Tree Protocol (MSTP) defined in
IEEE 802.1S.
• STP
STP, a management protocol at the data link layer, is used to detect and prevent loops on a Layer 2
network. STP blocks redundant links on a Layer 2 network and trims a network into a loop-free tree
topology.
The STP topology, however, converges at a slow speed. A port cannot be changed to the Forwarding
state until twice the time specified by the Forward Delay timer elapses.
• RSTP
• MSTP
MSTP defines a VLAN mapping table in which VLANs are associated with multiple spanning tree
instances (MSTIs). In addition, MSTP divides a switching network into multiple regions, each of which
has multiple independent MSTIs. In this manner, the entire network is trimmed into a loop-free tree
topology, and replication and circular propagation of packets and broadcast storms are prevented on
the network. In addition, MSTP provides multiple redundant paths to balance VLAN traffic.
MSTP is compatible with STP and RSTP. Table 1 shows a comparison between STP, RSTP, and MSTP.
STP
■ Characteristics: In an STP region, a loop-free tree is generated. Broadcast storms are therefore
prevented, and redundancy is implemented.
■ Usage scenario: A scenario where all VLANs share one spanning tree. In this situation, users or
services do not need to be differentiated.
■ Precautions: If the current switching device supports only STP, STP is recommended.
RSTP
■ Characteristics: In an RSTP region, a loop-free tree is generated. Broadcast storms are thereby
prevented, and redundancy is implemented. RSTP allows fast convergence of a network topology.
■ Precautions: If the current switching device supports both STP and RSTP, RSTP is recommended.
MSTP
■ Precautions: If the current switching device supports STP, RSTP, and MSTP, MSTP is recommended.
Purpose
After a spanning tree protocol is configured on an Ethernet switching network, it calculates the network
topology and implements the following functions to remove network loops:
• Loop prevention: The potential loops on the network are cut off after redundant links are blocked.
• Link redundancy: When an active path becomes faulty, a redundant link can be activated to ensure
network connectivity.
Benefits
This feature offers the following benefits to carriers:
• Compared with dual-homing networking, the ring networking requires fewer fibers and transmission
resources. This reduces resource consumption.
• STP prevents broadcast storms. This implements real-time communication and improves communication
reliability.
7.9.2.1 Background
STP is used to prevent loops in a LAN. As LANs expand, STP has become an important protocol for
them. The devices running STP discover loops on the network by exchanging information with one another
and block certain interfaces to cut off loops.
Basic Design
STP runs at the data link layer. The devices running STP discover loops on the network by exchanging
information with each other and trim the ring topology into a loop-free tree topology by blocking a certain
interface. In this manner, replication and circular propagation of packets are prevented on the network.
The devices running STP usually communicate with each other by exchanging Bridge Protocol Data Units (BPDUs). BPDUs are classified into two types:
• Configuration BPDU: used to calculate a spanning tree and maintain the spanning tree topology.
• Topology Change Notification (TCN) BPDU: used to inform associated devices of a topology change.
Configuration BPDUs contain the following information for devices to calculate the spanning tree.
• Root bridge ID: is composed of a root bridge priority and the root bridge's MAC address. Each STP network has only
one root bridge.
• Cost of the root path: indicates the cost of the shortest path to the root bridge.
• Designated bridge ID: is composed of a bridge priority and a MAC address.
• Designated port ID: is composed of a port priority and a port name.
• Message Age: specifies the lifetime of a BPDU on the network.
• Max Age: specifies the maximum time a BPDU is saved.
• Hello Time: specifies the interval at which BPDUs are sent.
• Forward Delay: specifies the time interface status transition takes.
• ID
Two types of IDs are available: Bridge IDs (BIDs) and Port IDs (PIDs).
■ BID
As defined in IEEE 802.1D, a BID is composed of a 16-bit bridge priority and a 48-bit bridge MAC address. The bridge priority occupies the leftmost 16 bits and the MAC address occupies the rightmost 48
bits.
On an STP-enabled network, the device with the smallest BID is selected to be the root bridge.
■ PID
The PID is composed of a 4-bit port priority and a 12-bit port number. The port priority occupies
the leftmost 4 bits and the port number occupies the remaining 12 bits on the right.
The PID is used to select the designated port.
The port priority affects the role of a port in a specified spanning tree instance. For details, see STP Topology
Calculation.
• Path cost
The path cost is a port variable and is used to select a link. STP calculates the path cost to select a
robust link and blocks redundant links to trim the network into a loop-free tree topology.
On an STP-enabled network, the accumulative cost of the path from a certain port to the root bridge is
the sum of the costs of all the segment paths into which the path is separated by the ports on the
transit bridges.
Table 1 shows the path costs defined in IEEE 802.1t. Different device manufacturers use different path
cost standards.
The rate of an aggregated link is the sum of the rates of all Up member links in the aggregated group.
Three Elements
There are generally three elements used when a ring topology is to be trimmed into a tree topology: root
bridge, root port, and designated port. Figure 1 shows the three elements.
• Root bridge
The root bridge is the bridge with the smallest BID. The smallest BID is determined by exchanging
configuration BPDUs.
• Root port
The root port is the port that has the lowest path cost to the root bridge. To be specific, the root port is
determined based on the path cost. Among all STP-enabled ports on a network bridge, the port with
the smallest root path cost is the root port. There is only one root port on an STP-enabled device, but
there is no root port on the root bridge.
• Designated port
For description of a designated bridge and designated port, see Table 2.
Table 2 Designated bridges and designated ports

Device
- Designated bridge: Device that forwards configuration BPDUs to a directly connected device.
- Designated port: Port on the designated bridge that forwards configuration BPDUs to the directly connected device.

LAN
- Designated bridge: Device that forwards configuration BPDUs to the LAN.
- Designated port: Port on the designated bridge that forwards configuration BPDUs to the LAN.
As shown in Figure 2, AP1 and AP2 reside on Device A; BP1 and BP2 reside on Device B; CP1 and CP2
reside on Device C.
■ Device A sends configuration BPDUs to Device B through AP1. Device A is the designated bridge of
Device B, and AP1 on Device A is the designated port.
■ Two devices, Device B and Device C, are connected to the LAN. If Device B is responsible for
forwarding configuration BPDUs to the LAN, Device B is the designated bridge of the LAN and BP2
on Device B is the designated port.
After the root bridge, root port, and designated port are selected successfully, the entire tree topology is set
up. When the topology is stable, only the root port and the designated port forward traffic. All the other
ports are in the Blocking state and receive only STP protocol packets instead of forwarding user traffic.
Root BID: Each STP-enabled network has only one root bridge.
Root path cost: Cost of the path from the port sending configuration BPDUs to the root bridge.
After a device on the STP-enabled network receives configuration BPDUs, it compares the fields shown in Table 3 with those in the received configuration BPDUs. The four comparison principles are as follows:
During the STP calculation, the smaller the value, the higher the priority.
• Smallest BID: used to select the root bridge. Devices running STP select the smallest BID as the root BID
shown in Table 3.
• Smallest root path cost: used to select the root port on a non-root bridge. On the root bridge, the path
cost of each port is 0.
• Smallest sender BID: used to select the root port when a device running STP selects the root port
between two ports that have the same path cost. The port with a smaller BID is selected as the root
port in STP calculation. Assume that the BID of Device B is less than that of Device C in Figure 1. If the
path costs in the BPDUs received by port A and port B on Device D are the same, port B becomes the
root port.
• Smallest PID: used to block the port with a greater PID but not the port with a smaller PID when the
ports have the same path cost. The PIDs are compared in the scenario shown in Figure 3. The PID of
port A on Device A is less than that of port B. In the BPDUs that are received on port A and port B, the
path costs and BIDs of the sending devices are the same. Therefore, port B with a greater PID is blocked
to cut off loops.
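The four comparison principles amount to a lexicographic comparison of a "priority vector" of BPDU fields, where smaller always wins. The following is a minimal, hypothetical Python sketch (the class and field names are illustrative, not from any Huawei implementation); the numeric BID values are invented for the example.

```python
# Hypothetical sketch of the four STP comparison principles: BPDU fields
# are compared in order (root BID, root path cost, sender BID, sender PID),
# and the smaller value indicates the higher priority.
from dataclasses import dataclass

@dataclass(frozen=True)
class PriorityVector:
    root_bid: int        # root bridge ID (priority + MAC, as an integer)
    root_path_cost: int  # cumulative cost to the root bridge
    sender_bid: int      # BID of the bridge that sent the BPDU
    sender_pid: int      # PID of the port that sent the BPDU

    def better_than(self, other: "PriorityVector") -> bool:
        # Lexicographic tuple comparison: smaller values win.
        return (self.root_bid, self.root_path_cost,
                self.sender_bid, self.sender_pid) < \
               (other.root_bid, other.root_path_cost,
                other.sender_bid, other.sender_pid)

# Two BPDUs advertise the same root and the same path cost: the one with
# the smaller sender BID wins, so its receiving port becomes the root port.
a = PriorityVector(root_bid=4096, root_path_cost=200, sender_bid=8192, sender_pid=1)
b = PriorityVector(root_bid=4096, root_path_cost=200, sender_bid=12288, sender_pid=1)
assert a.better_than(b)
```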
Port States
Table 4 shows the port status of an STP-enabled device.
Forwarding
- Description: A port in the Forwarding state forwards user traffic and BPDUs.
- Remarks: Only the root port and designated port can enter the Forwarding state.

Learning
- Description: When a device has a port in the Learning state, the device creates a MAC address table based on the received user traffic but does not forward user traffic.
- Remarks: This is a transitional state.

Listening
- Description: A port in the Listening state does not forward user traffic but receives BPDUs.
- Remarks: A root port or designated port can enter the Listening state only when the alternate port, backup port, or protection function takes effect.

Blocking
- Description: A port in the Blocking state receives and forwards only BPDUs, not user traffic.
- Remarks: This is the final state of a blocked port.

Disabled
- Description: A port in the Disabled state does not forward BPDUs or user traffic.
- Remarks: The port is Down.
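A port that is unblocked must pass through the Listening and Learning states, spending one Forward Delay period in each, before it reaches the Forwarding state. The following is a hypothetical sketch of that timing (the 15-second value is the IEEE default Forward Delay; the function name is illustrative only):

```python
# Hypothetical sketch of STP port state progression: a port moves
# Blocking -> Listening -> Learning -> Forwarding, spending one
# Forward Delay period in each of the two intermediate states.
FORWARD_DELAY = 15  # seconds; the IEEE 802.1D default

def time_to_forwarding(initial_state):
    order = ["Blocking", "Listening", "Learning", "Forwarding"]
    idx = order.index(initial_state)
    # Only the Listening and Learning states consume a Forward Delay each.
    waits = [s for s in order[idx:-1] if s in ("Listening", "Learning")]
    return len(waits) * FORWARD_DELAY

assert time_to_forwarding("Blocking") == 30  # two Forward Delay periods
```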
A Huawei datacom device uses MSTP by default. Port states supported by MSTP are the same as those supported by
STP/RSTP.
The following parameters affect the STP-enabled port states and convergence.
• Hello time
The Hello timer specifies the interval at which an STP-enabled device sends configuration BPDUs and
Hello packets to detect link faults.
Modification of the Hello timer takes effect only if the configuration of the root bridge is modified. The
root bridge adds certain fields in BPDUs to inform non-root bridges of the change in the interval. After
a topology changes, TCN BPDUs will be sent. This interval is irrelevant to the transmission of TCN
BPDUs.
• Forward Delay
The Forward Delay timer specifies the time a port spends in each of the Listening and Learning states. A port in the Listening or Learning state does not forward user traffic, which is key to preventing transient loops.
• Max Age
Configuration BPDUs are transmitted over the entire network, ensuring a unique Max Age value. After a
non-root bridge running STP receives a configuration BPDU, the non-root bridge compares the Message
Age value with the Max Age value in the received configuration BPDU.
■ If the Message Age value is less than or equal to the Max Age value, the non-root bridge forwards
the configuration BPDU.
■ If the Message Age value is greater than the Max Age value, the configuration BPDU ages, and the
non-root bridge directly discards it. In this case, the network size is considered too large and the
non-root bridge disconnects from the root bridge.
If the configuration BPDU is sent from the root bridge, the value of Message Age is 0. Otherwise, the value of
Message Age indicates the total time during which a BPDU is sent from the root bridge to the local bridge,
including the delay in transmission. In real world situations, each time a configuration BPDU passes through a
bridge, the value of Message Age increases by 1.
• Configuration BPDUs are heartbeat packets. STP-enabled designated ports send BPDUs at intervals
specified by the Hello timer.
• Topology Change Notification (TCN) BPDUs are sent only after the device detects network topology
changes.
A BPDU is encapsulated into an Ethernet frame. In an Ethernet frame, the destination MAC address is the
multicast MAC address 01-80-C2-00-00-00; the value of the Length/Type field is the length of MAC data; in
the LLC header, as defined in the IEEE standard, the values of DSAP and SSAP are 0x42 and the value of UI is
0x03; the BPDU header follows the LLC header. Figure 1 shows the format of an Ethernet frame.
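The encapsulation described above can be sketched in a few lines of Python (a hypothetical illustration only: the source MAC address and 35-byte dummy BPDU payload are invented for the example):

```python
# Hypothetical sketch of STP BPDU encapsulation in an 802.3 Ethernet frame
# with an LLC header, per the fields described above: destination MAC
# 01-80-C2-00-00-00, DSAP/SSAP 0x42, Control 0x03 (UI).
import struct

STP_MULTICAST_MAC = bytes.fromhex("0180C2000000")
LLC_HEADER = bytes([0x42, 0x42, 0x03])  # DSAP=0x42, SSAP=0x42, Control=0x03

def encapsulate_bpdu(src_mac: bytes, bpdu: bytes) -> bytes:
    # The Length/Type field carries the length of the MAC client data
    # (LLC header plus BPDU), not an EtherType.
    length = len(LLC_HEADER) + len(bpdu)
    return STP_MULTICAST_MAC + src_mac + struct.pack("!H", length) + LLC_HEADER + bpdu

frame = encapsulate_bpdu(bytes.fromhex("00E0FC000001"), b"\x00" * 35)
assert frame[:6] == STP_MULTICAST_MAC
assert struct.unpack("!H", frame[12:14])[0] == 38  # 3-byte LLC + 35-byte BPDU
```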
Configuration BPDU
Configuration BPDUs are most commonly used.
During initialization, each bridge actively sends configuration BPDUs. After the network topology becomes
stable, only the root bridge actively sends configuration BPDUs. Other bridges send configuration BPDUs
only after receiving configuration BPDUs from upstream devices. A configuration BPDU is at least 35 bytes
long, including the parameters such as the BID, path cost, and PID. A BPDU is discarded if both the sender
BID and Port ID field values are the same as those of the local port. Otherwise, the BPDU is processed. In
this manner, BPDUs containing the same information as that of the local port are not processed.
Table 1 shows the format of a BPDU.
Protocol Identifier (2 bytes): Always 0.
Protocol Version Identifier (1 byte): Always 0.
BPDU Type (1 byte): Indicates the type of a BPDU. The value can be 0x00 (configuration BPDU) or 0x80 (TCN BPDU).
Root Path Cost (4 bytes): Indicates the cumulative cost of all links to the root bridge.
Message Age (2 bytes): Records the time since the root bridge originally generated the information that a BPDU is derived from. If the configuration BPDU is sent from the root bridge, the value of Message Age is 0. Otherwise, the value indicates the total time during which a BPDU travels from the root bridge to the local bridge, including the transmission delay. In real-world situations, each time a configuration BPDU passes through a bridge, the value of Message Age increases by 1.
Forward Delay (2 bytes): Indicates the time spent in the Listening and Learning states.
Figure 2 shows the Flags field. Only the leftmost and rightmost bits are used in STP.
• Once the ports are enabled with STP, the designated ports send configuration BPDUs at intervals
specified by the Hello timer.
• When a root port receives configuration BPDUs, the device where the root port resides sends a copy of the configuration BPDUs through its own designated ports.
• When receiving a configuration BPDU with a lower priority, a designated port immediately sends its
own configuration BPDUs to the downstream device.
TCN BPDU
The contents of TCN BPDUs are simple, including only three fields: Protocol ID, Version, and Type, as shown in Table 1. A TCN BPDU is only four bytes long, and the value of its Type field is 0x80.
TCN BPDUs are transmitted by each device to its upstream device to notify the upstream device of changes
in the downstream topology, until they reach the root bridge. A TCN BPDU is generated in one of the
following scenarios:
• A port enters the Forwarding state, and at least one designated port resides on the device.
• A designated port receives TCN BPDUs and sends a copy to the root bridge.
As each bridge considers itself the root bridge, the value of the root BID field in the BPDU sent by each port is
recorded as its BID; the value of the Root Path Cost field is the cumulative cost of all links to the root bridge; the
sender BID is the ID of the local bridge; the Port ID is the PID of the local bridge port that sends the BPDU.
Once a port receives a BPDU with a priority higher than that of itself, the port extracts certain
information from the BPDU and synchronizes its own information with the obtained information. The
port stops sending the BPDU immediately after saving the updated BPDU.
When sending a BPDU, each device fills in the Sender BID field with its own BID. When a device
considers itself the root bridge, the device fills in the Root BID field with its own BID. As shown in
Figure 1, Port B on Device B receives a BPDU with a higher priority from Device A, and therefore
considers Device A the root bridge. When another port on Device B sends a BPDU, the port fills in its
Root BID field with DeviceA_BID. The preceding intercommunication is repeatedly performed between
two devices until all devices consider the same device as the root bridge. This indicates that the root
bridge is selected. Figure 2 shows the root bridge selection.
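The convergence described above can be modeled as a simple fixed-point iteration: every bridge initially claims to be the root and then adopts any smaller root BID it hears from a neighbor. This is a hypothetical sketch only; the device names and BID values are invented for the example.

```python
# Hypothetical sketch of root bridge convergence: each bridge starts by
# claiming itself as the root, then repeatedly adopts any smaller root BID
# advertised by a neighbor, until all bridges agree on one root.
def elect_root(bridges, links):
    # bridges: dict name -> BID; links: list of (name, name) adjacencies
    believed = dict(bridges)          # each bridge claims itself at first
    changed = True
    while changed:
        changed = False
        for a, b in links:
            best = min(believed[a], believed[b])
            for n in (a, b):
                if believed[n] != best:
                    believed[n] = best
                    changed = True
    return believed

bridges = {"DeviceA": 4096, "DeviceB": 8192, "DeviceC": 12288}
links = [("DeviceA", "DeviceB"), ("DeviceB", "DeviceC")]
# All bridges converge on Device A's BID, the smallest on the network.
assert set(elect_root(bridges, links).values()) == {4096}
```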
In the Root Path Cost algorithm, after a port receives a BPDU, the port extracts the value of the Root Path Cost field and adds the obtained value to the path cost of the port itself to obtain the root path cost. The path cost of the port covers only the directly connected path cost. The cost can be manually configured on a port. If the root
path costs on two or more ports are the same, the port that sends a BPDU with the smallest sender BID value is
selected as the root port.
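Root port selection can therefore be sketched as a minimum over (root path cost, sender BID) pairs. The following is a hypothetical illustration; port names, costs, and BID values are invented for the example.

```python
# Hypothetical sketch of root port selection: for each candidate port,
# root path cost = Root Path Cost advertised in the received BPDU plus
# the cost of the port's directly connected link; ties are broken by the
# smaller sender BID.
def select_root_port(candidates):
    # candidates: list of (port_name, advertised_root_path_cost,
    #                      port_link_cost, sender_bid)
    return min(candidates,
               key=lambda c: (c[1] + c[2], c[3]))[0]

ports = [("portA", 200, 200, 8192),   # total root path cost 400
         ("portB", 200, 200, 4096)]   # total 400, but smaller sender BID
assert select_root_port(ports) == "portB"
```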
The port on Device A is selected as the designated port. The device where a designated port resides is called the designated bridge on the network segment. In Figure 1, Device A is the designated bridge on the network segment.
After the network convergence is implemented, only the designated port and root port are in the
Forwarding state. The other ports are in the Blocking state. They do not forward user traffic.
Ports on the root bridge are all designated ports unless loops occur on the root bridge. Figure 4 shows
the designated port selection.
1. After the network topology changes, a downstream device continuously sends Topology Change
Notification (TCN) BPDUs to an upstream device.
2. After the upstream device receives TCN BPDUs from the downstream device, only the designated port
processes them. The other ports may receive TCN BPDUs but do not process them.
3. The upstream device sets the TCA bit of the Flags field in the configuration BPDUs to 1 and returns
the configuration BPDUs to instruct the downstream device to stop sending TCN BPDUs.
4. The upstream device sends a copy of the TCN BPDUs to the root bridge.
5. Steps 1, 2, 3, and 4 are repeated until the root bridge receives the TCN BPDUs.
6. The root bridge sets the TC and TCA bits of the Flags field in the configuration BPDUs to 1 to instruct
the downstream device to delete MAC address entries.
• TCN BPDUs are used to inform the upstream device and root bridge of topology changes.
• Configuration BPDUs with the Topology Change Acknowledgement (TCA) bit being set to 1 are used by the
upstream device to inform the downstream device that the topology changes are known and instruct the
downstream device to stop sending TCN BPDUs.
• Configuration BPDUs with the Topology Change (TC) bit being set to 1 are used by the upstream device to inform
the downstream device of topology changes and instruct the downstream device to delete MAC address entries. In
this manner, fast network convergence is achieved.
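The hop-by-hop relay in steps 1 through 5 can be sketched as a walk up the tree toward the root. This is a hypothetical illustration only; the device names and the `upstream` mapping are invented for the example.

```python
# Hypothetical sketch of TCN propagation: each device relays the TCN
# upstream through its root port (the upstream designated port
# acknowledges with TCA), repeating until the root bridge is reached.
def propagate_tcn(start, upstream):
    # upstream: dict mapping each non-root device to its upstream device;
    # the root bridge has no entry.
    path, node = [], start
    while node in upstream:      # steps 1-4 repeat at every hop
        node = upstream[node]
        path.append(node)        # each hop acks with TCA and relays the TCN
    return path                  # the last element is the root bridge

upstream = {"DeviceC": "DeviceB", "DeviceB": "DeviceA"}  # Device A is the root
assert propagate_tcn("DeviceC", upstream) == ["DeviceB", "DeviceA"]
```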
Figure 4 is used as an example to show how the network topology converges when the root bridge or
designated port of the root bridge becomes faulty.
As shown in Figure 6, if the root bridge becomes faulty, Device B and Device C reselect the root bridge by exchanging configuration BPDUs.
Figure 7 Diagram of topology changes in the case of a faulty designated port on the root bridge
As shown in Figure 7, if the designated port of the root bridge, port 1, becomes faulty, port 6 is selected as the root port after Device B and Device C exchange configuration BPDUs.
In addition, port 6 sends TCN BPDUs after entering the Forwarding state. Once the root bridge receives the TCN BPDUs, it sends TC BPDUs to instruct the downstream devices to delete MAC address entries.
Disadvantages of STP
STP ensures a loop-free network but has a slow network topology convergence speed, leading to service
deterioration. If the network topology changes frequently, the connections on the STP-enabled network are
frequently torn down, causing frequent service interruption. Users can hardly tolerate such a situation.
Disadvantages of STP are as follows:
• Port states and port roles are not subtly distinguished, which makes STP difficult for beginners to learn and deploy.
A network protocol that subtly defines and distinguishes different situations is likely to outperform the
others.
■ Ports in the Listening, Learning, and Blocking states do not forward user traffic and are indistinguishable to users.
■ The differences between ports in essence never lie in the port states but the port roles from the
perspective of use and configuration.
It is possible that the root port and designated port are both in the Listening state or Forwarding
state.
• The STP algorithm determines topology changes after the time set by the timer expires, which slows
down network convergence.
• The STP algorithm requires a stable network topology. After the root bridge sends configuration Bridge
Protocol Data Units (BPDUs), other devices forward them until all bridges on the network receive the
configuration BPDUs.
This also slows down topology convergence.
• More port roles are defined to simplify the learning and deployment of STP.
As shown in Figure 1, RSTP defines four port roles: root port, designated port, alternate port, and
backup port.
The functions of the root port and designated port are the same as those defined in STP. The alternate
port and backup port are described as follows:
■ An alternate port is blocked after learning the configuration BPDUs sent by other bridges.
■ A backup port is blocked after learning the configuration BPDUs sent by itself.
■ An alternate port backs up the root port and provides an alternate path from the designated
bridge to the root bridge.
■ A backup port backs up the designated port and provides an alternate path from the root
bridge to the related network segment.
After all RSTP-enabled ports are assigned roles, topology convergence is completed.
■ If a port neither forwards user traffic nor learns MAC addresses, the port is in the Discarding state.
■ If a port does not forward user traffic but learns MAC addresses, the port is in the Learning state.
■ If a port forwards user traffic and learns MAC addresses, the port is in the Forwarding state.
Table 1 shows the comparison between port states in STP and RSTP.
Port states and port roles are not necessarily related. Table 1 lists states of ports with different roles.
Table 1 Comparison between states of STP ports and RSTP ports with different roles
• Configuration BPDUs in RSTP are differently defined. Port roles are described based on the Flags field defined in STP.
Compared with STP, RSTP slightly redefines the format of configuration BPDUs.
■ The value of the Type field is no longer set to 0 but 2. Therefore, an STP-enabled device always discards the configuration BPDUs sent by an RSTP-enabled device.
■ The 6 bits in the middle of the Flags field, which are reserved in STP, are now used. Such a configuration BPDU is called an RST BPDU, as shown in Figure 2.
• Rapid convergence
■ Proposal/agreement mechanism
When a port is selected as a designated port, in STP, the port does not enter the Forwarding state
until a Forward Delay period expires; in RSTP, the port enters the Discarding state, and then the
proposal/agreement mechanism allows the port to immediately enter the Forwarding state. The
proposal/agreement mechanism must be applied on the P2P links in full duplex mode.
For details, see RSTP Implementation.
enters the Forwarding state. This is because there must be a path from the root bridge to a
designated port on the network segment connecting to the alternate port.
When the port role changes, the network topology will change accordingly. For details, see RSTP
Implementation.
■ Edge ports
In RSTP, a designated port on the network edge is called an edge port. An edge port directly
connects to a terminal and does not connect to any other devices.
An edge port does not receive configuration BPDUs, and therefore does not participate in the RSTP
calculation. It can directly change from the Disabled state to the Forwarding state without any
delay, just like an STP-incapable port. If an edge port receives bogus BPDUs from attackers, it is
deprived of the edge port attributes and becomes a common STP port. The STP calculation is
implemented again, causing network flapping.
• Protection functions
Table 2 shows protection functions provided by RSTP.
BPDU protection
- Scenario: On a device, ports that are directly connected to a user terminal such as a PC or file server are configured as edge ports. Usually, no Rapid Spanning Tree (RST) BPDU is sent to edge ports. If a device receives bogus RST BPDUs on an edge port, the device automatically sets the edge port to a non-edge port and performs STP calculation again. This causes network flapping.
- Implementation: After BPDU protection is enabled on a device, if an edge port receives an RST BPDU, the device shuts down the edge port without depriving it of its attributes, and notifies the NMS of the shutdown event. The edge port can be started only by the network administrator.
  NOTE: To allow an edge port to start automatically after being shut down, you can configure the auto recovery function and set a delay on the port. The edge port then starts automatically after the set delay. If the edge port receives RST BPDUs again, it will again be shut down.

Root protection
- Scenario: Due to incorrect configurations or malicious attacks on the network, the root bridge may receive RST BPDUs with a higher priority. Consequently, the valid root bridge can no longer serve as the root bridge, and the network topology changes incorrectly. This also causes traffic that should be transmitted over high-speed links to be transmitted over low-speed links, leading to network congestion.
- Implementation: If a designated port is enabled with the root protection function, its port role cannot be changed. Once a designated port that is enabled with root protection receives RST BPDUs with a higher priority, the port enters the Discarding state and does not forward packets. If the port does not receive any RST BPDUs with a higher priority before a period (generally two Forward Delay periods) expires, the port automatically enters the Forwarding state.
  NOTE: Root protection can take effect only on designated ports.

Loop protection
- Scenario: On an RSTP-enabled network, the device maintains the status of the root port and blocked ports by continually receiving BPDUs from the upstream device. If these ports cannot receive BPDUs from the upstream device due to link congestion or unidirectional link failures, the device re-selects a root port. The previous root port then becomes a designated port, and the blocked ports enter the Forwarding state, which may cause loops.
- Implementation: After loop protection is configured, if the root port or an alternate port does not receive RST BPDUs from the upstream device for a long time, the device notifies the NMS that the port has entered the Discarding state. The blocked port remains in the Blocked state and does not forward packets. This prevents loops on the network. The root port or alternate port restores the Forwarding state after receiving new RST BPDUs.
  NOTE: Loop protection can take effect only on the root port and alternate ports.

Topology Change (TC) BPDU attack defense
- Scenario: After receiving TC BPDUs, a device deletes its MAC entries and ARP entries. In the event of a malicious attack involving bogus TC BPDUs, the device receives a large number of TC BPDUs within a short period and busies itself deleting its MAC entries and ARP entries. As a result, the device is heavily burdened, rendering the network unstable.
- Implementation: After TC BPDU attack defense is enabled, the number of times that the device processes TC BPDUs within a given period is configurable. If the number of TC BPDUs that the device receives within that period exceeds the specified threshold, the device processes TC BPDUs only the specified number of times. Excess TC BPDUs are processed together once after the period expires. In this manner, the device is prevented from frequently deleting its MAC entries and ARP entries.
P/A Mechanism
To allow a Huawei device to communicate with a non-Huawei device, a proper rapid transition mechanism
needs to be configured on the Huawei device based on the Proposal/Agreement (P/A) mechanism on the
non-Huawei device.
The P/A mechanism helps a designated port to enter the Forwarding state as soon as possible. As shown in
Figure 1, the P/A negotiation is performed based on the following port variables:
1. proposing: When a port is in the Discarding or Learning state, this variable is set to 1. Additionally, a
Rapid Spanning Tree (RST) BPDU with the Proposal field being 1 is sent to the downstream device.
2. proposed: After a port receives an RST BPDU with the Proposal field being 1 from the designated port
on the peer device, this variable is set to 1, urging the designated port on this network segment to
enter the Forwarding state.
3. sync: After the proposed variable is set to 1, the root port receiving the proposal sets the sync variable
to 1 for the other ports on the same device; a non-edge port receiving the proposal enters the
Discarding state.
4. synced: After a port enters the Discarding state, it sets its synced variable to 1 in the following
manner: If this port is the alternate, backup, or edge port, it will immediately set its synced variable to
1. If this port is the root port, it will monitor the synced variables of the other ports. After the synced
variables of all the other ports are set to 1, the root port sets its synced variable to 1, and sends an
RST BPDU with the Agreement field being 1.
5. agreed: After the designated port receives an RST BPDU with the Agreement field being 1 and the port
role field indicating the root port, this variable is set to 1. Once the agreed variable is set to 1, this
designated port immediately enters the Forwarding state.
As shown in Figure 2, a new link is established between the root bridges Device A and Device B. On Device B,
p2 is an alternate port; p3 is a designated port in the Forwarding state; p4 is an edge port. The P/A
mechanism works in the following process:
2. After receiving an RST BPDU with a higher priority, p1 realizes that it will become a root port but not
a designated port, and therefore it stops sending RST BPDUs.
3. p0 enters the Discarding state, and sends RST BPDUs with the Proposal field being 1.
4. After receiving an RST BPDU with the Proposal field being 1, Device B sets the sync variable to 1 for all
its ports.
5. As p2 has been blocked, its status keeps unchanged; p4 is an edge port, and therefore it does not
participate in calculation. Therefore, only the non-edge designated port p3 needs to be blocked.
6. After p2, p3, and p4 enter the Discarding state, their synced variables are set to 1. The synced variable
of the root port p1 is then set to 1, and p1 sends an RST BPDU with the Agreement field being 1 to
Device A. Except for the Agreement field, which is set to 1, and the Proposal field, which is set to 0,
the RST BPDU is the same as that was received.
7. After receiving this RST BPDU, Device A identifies it as a reply to the proposal that it just sent, and
therefore p0 immediately enters the Forwarding state.
This P/A negotiation process finishes, and Device B continues to perform the P/A negotiation with its
downstream device.
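The handshake above can be condensed into a small sketch: the downstream bridge can agree to the proposal only once all its non-edge designated ports are blocked (synced). This is a hypothetical illustration; the function, its `allow_blocking` parameter, and the port names are invented for the example.

```python
# Hypothetical sketch of the P/A handshake: the upstream designated port
# proposes; the downstream bridge syncs by blocking its non-edge
# designated ports; once every port is synced, the root port sends an
# agreement and the upstream port enters the Forwarding state at once.
def pa_negotiate(downstream_ports, allow_blocking=True):
    # downstream_ports: dict port_name -> role on the peer bridge.
    # allow_blocking models whether non-edge designated ports can be
    # moved to the Discarding state during the sync step.
    synced = []
    for name, role in downstream_ports.items():
        if role in ("alternate", "backup", "edge"):
            synced.append(True)           # synced immediately
        elif role == "designated":
            synced.append(allow_blocking)  # must enter Discarding first
    agreement = all(synced)               # root port agrees only when all synced
    return "Forwarding" if agreement else "Discarding"

ports = {"p2": "alternate", "p3": "designated", "p4": "edge"}
assert pa_negotiate(ports) == "Forwarding"  # upstream p0 forwards immediately
```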
Theoretically, STP can quickly select a designated port. To prevent loops, STP has to wait for a period of time
long enough to determine the status of all ports on the network. All ports can enter the Forwarding state at
least one forward delay later. RSTP is developed to eliminate this bottleneck by blocking non-root ports to
prevent loops. By using the P/A mechanism, the upstream port can rapidly enter the Forwarding state.
After a device detects the topology change (TC), it performs the following procedures:
• Start a TC While Timer for every non-edge port. The TC While Timer value is twice the Hello timer value.
All MAC addresses learned by the ports whose status changes are cleared before the timer expires.
These ports send RST BPDUs with the TC field being 1. Once the TC While Timer expires, they stop
sending the RST BPDUs.
• After another device receives the RST BPDU, it clears the MAC addresses learned by all ports except the one that received the RST BPDU and the edge ports. The device then starts a TC While Timer for all non-edge ports and the root port, the same as in the preceding process.
To use the P/A mechanism, ensure that the link between the two devices is a point-to-point (P2P) link in full-duplex mode. If the P/A negotiation fails, a designated port can forward traffic only after the Forward Delay timer expires twice. This delay is the same as that in STP.
Unless otherwise specified, STP in this document includes STP defined in IEEE 802.1D, RSTP defined in IEEE 802.1W, and
MSTP defined in IEEE 802.1S.
• Background
On the network shown in Figure 1, users access the VPLS network through a ring network that is
comprised of CE1, CE2, PE1, and PE2. The PEs are fully connected on the VPLS network. The packet
forwarding process is as follows (using the forwarding of broadcast or unknown unicast packets from
CE1 as an example):
1. After CE1 receives a broadcast or unknown unicast packet, it forwards the packet to both PE1 and
CE2.
2. After PE1 (CE2) receives the packet, it cannot find the outbound interface based on the
destination MAC address of the packet, and therefore broadcasts the packet.
3. After PE2 receives the packet, it also broadcasts the packet. Because PEs do not forward data
received from a PW back to the PW, PE2 (PE1) sends the packet to a CE and the remote PE.
As a result, a loop occurs on the path CE1 -> CE2 -> PE2 -> PE1 -> CE1 or the path CE1 -> PE1 -> PE2 ->
CE2 -> CE1. The CEs and PEs all receive duplicate traffic.
• Solution
To address this problem, enable STP on CE1, CE2, PE1, and PE2; deploy an mPW between PE1 and PE2,
deploy a service PW between PE1 and the PE and between PE2 and the PE, and associate service PWs
with the mPW; enable MSTP for the mPW and AC interfaces so that the mPW can participate in STP
calculation and block a CE interface to prevent duplicate traffic. In addition, configure PE1 and PE2 as
the root bridge and secondary root bridge so that the blocked port resides on the link between the CEs.
As shown in Figure 2, STP is enabled globally on PE1, PE2, CE1, and CE2; an mPW is deployed between
PE1 and PE2; STP is enabled on GE 1/0/1 on PE1 and PE2 and on GE 1/0/1 and GE 1/0/2 on CE1 and
CE2. PE2 is configured as the primary root bridge and PE1 is configured as the secondary root bridge
(determined by the bridge priority) to block the port connecting CE2 to CE1. After STP calculation and
association between the mPW and service PWs are implemented, remote devices no longer receive
duplicate traffic.
• Reliability
On the network shown in Figure 3, the mPW does not detect a fault on the link between the PE and PE2,
because PE1 remains reachable to the PE and a new service PW can be created. In addition, the STP topology
remains unchanged; therefore, the blocked port is unchanged and STP recalculation is not required.
If the STP topology changes, each node sends a TCN BPDU to trigger the updating of local MAC address
entries. In addition, the TCN BPDU triggers the PW to send MAC Withdraw packets to instruct the
remote device to update the learned MAC address entries locally. In this manner, traffic is switched to
an available link.
As shown in Figure 4, if the mPW between PE1 and PE2 fails, the ring network topology is recalculated,
and the blocked port on CE2 is unblocked and enters the Forwarding state. In this situation, the remote
PE continuously receives duplicate packets.
To resolve this problem, configure root protection on the secondary root bridge PE1's GE 1/0/1
connecting to CE1. As shown in Figure 5, if the mPW between PE1 and PE2 fails, PE1's GE 1/0/1 is
blocked because it receives BPDUs with higher priorities. As the link along the path PE1 -> CE1 -> CE2 -
> PE2 is working properly, PE1's blocked port keeps receiving BPDUs with higher priorities, and
therefore this port remains in the blocked state. This prevents the remote PE from receiving duplicate
traffic.
• Load balancing
As shown in Figure 6, MSTP is enabled for ports connecting PEs and CEs, for the mPW between PE1 and
PE2, and for ports connecting CE1 and CE2. MSTP is globally enabled on PE1, PE2, CE1, and CE2. After
PE1 is configured as the primary root bridge and PE2 is configured as the backup root bridge
(determined by bridge priority), MSTP calculation is performed to block the port connecting CE1 and
CE2. A mapping is configured between VLANs and MSTIs to implement load balancing.
1. ASBRs in different VPLS ASs (Metro-E areas) are connected back to back. ASBR1#AS1, which functions
as the CPE of ASBR1#AS2, accesses VSI#AS2; ASBR1#AS2, which functions as the CPE of ASBR1#AS1,
accesses VSI#AS1. A VPLS or HVPLS network is set up in VPLS#AS1 and VPLS#AS2 (Metro-E areas) by
using LDP, and data is forwarded in the VSIs.
2. The local ASBR and the peer can be connected through PW interfaces, Layer 2 physical interfaces, and
Layer 3 physical interfaces. The peer ASBR is connected to the local ASBR as a CE.
• Option A problem
In inter-AS VPLS Option A mode, redundant connections are established between ASs, and broadcast
and unknown unicast packets may be forwarded in a loop. As shown in Figure 7, VPLS#AS1 and
VPLS#AS2 are connected by two links to improve reliability. After Option A is adopted, fully connected
PWs between PEs and ASBRs in an AS are configured with split horizon to prevent loops, but broadcast
and unknown unicast packets are looped between ASBRs. PEs receive duplicate packets even if ASBRs in
a VPLS AS are not connected.
■ As shown in Figure 10, inter-AS ASBRs are connected through Layer 2 or Layer 3 interfaces. VLANs
on an interface can be allocated to different instances by using the MSTP multi-instance feature.
Then MSTP can block a port based on the instances. Each AS contains multiple MSTIs that are
independent of each other. Therefore, load balancing can be implemented.
■ As shown in Figure 11, PWs between ASBRs are fully connected. By using the MSTP multi-process
feature, E-STP associates mPWs with MSTP processes. Processes are independent of each other,
and therefore the mPWs are independent of each other. Multiple service PWs are associated with
an mPW. After the mPW is blocked, the associated service PWs are also blocked. This helps break
off the loop between VPLS ASs and perform load balancing by blocking an interface as required.
On the network shown in Figure 1, after CE and PE running STP discover loops on the network by
exchanging information with each other, they trim the ring topology into a loop-free tree topology by
blocking a certain port. In this manner, replication and circular propagation of packets are prevented on the
network, and the switching devices are relieved of processing duplicate packets, thereby improving their
processing performance.
Terms
Term Definition
STP Spanning Tree Protocol. A protocol used in the local area network (LAN) to eliminate
loops. Devices running STP discover loops in the network by exchanging information
with each other, and block certain interfaces to eliminate loops.
RSTP Rapid Spanning Tree Protocol. A protocol described in detail in IEEE 802.1w. RSTP
modifies and supplements STP, and can therefore implement faster convergence than
STP.
MSTP Multiple Spanning Tree Protocol. A new spanning tree protocol defined in IEEE 802.1s
that introduces the concepts of region and instance. To meet different requirements,
MSTP divides a large network into regions where multiple spanning tree instances
(MSTIs) are created. These MSTIs are mapped to virtual LANs (VLANs) and bridge
protocol data units (BPDUs) carrying information about regions and instances are
transmitted between network bridges, and therefore a network bridge can determine
which region it belongs to based on the BPDU information. Multi-instance RSTP is
run within regions, whereas RSTP-compatible protocols are run between regions.
VLAN Virtual local area network. A switched network and an end-to-end logical network that
is constructed by using the network management software across different network
segments and networks. A VLAN forms a logical subnet, that is, a logical broadcast
domain. One VLAN can include multiple network devices.
Definition
Multiple Spanning Tree Protocol (MSTP) is defined in IEEE 802.1S. MSTP defines a VLAN mapping table in
which VLANs are associated with multiple spanning tree instances (MSTIs). In addition, MSTP divides a
switching network into multiple regions, each of which has multiple independent MSTIs. In this manner, the
entire network is trimmed into a loop-free tree topology, and replication and circular propagation of packets
and broadcast storms are prevented on the network. In addition, MSTP provides multiple redundant paths to
balance VLAN traffic. MSTP is compatible with STP and RSTP.
Purpose
After MSTP is configured on an Ethernet switching network, it calculates the network topology and
implements the following functions to remove network loops:
• Loop prevention: The potential loops on the network are cut off after redundant links are blocked.
• Link redundancy: When an active path becomes faulty, a redundant link can be activated to ensure
network connectivity.
Benefits
This feature offers the following benefits to carriers:
• Compared with dual-homing networking, the ring networking requires fewer fibers and transmission
resources. This reduces resource consumption.
• MSTP prevents broadcast storms. This implements real-time communication and improves
communication reliability.
On the network shown in Figure 1, STP or RSTP is enabled. The broken line shows the spanning tree. Device
F is the root device. The links between Device A and Device D and between Device B and Device E are
blocked. VLAN packets are transmitted by using the corresponding links marked with "VLAN2" or "VLAN3."
Host A and Host B belong to VLAN 2 but they cannot communicate with each other because the link
between Device B and Device E is blocked and the link between Device C and Device F denies packets from
VLAN 2.
To overcome the defects of STP and RSTP, IEEE released the 802.1S standard in 2002, which defined Multiple
Spanning Tree Protocol (MSTP). MSTP is compatible with STP and RSTP and supports both fast convergence
and multiple redundancy paths for data forwarding, achieving load balancing between VLAN data during
data forwarding.
MSTP divides a switching network into multiple regions, each of which has multiple spanning trees that are
independent of each other. Each spanning tree is called a Multiple Spanning Tree Instance (MSTI) and each
region is called a Multiple Spanning Tree (MST) region.
An instance is a collection of VLANs. Binding multiple VLANs to an instance saves communication costs and reduces
resource usage. The topology of each MSTI is calculated independent of one another, and traffic can be balanced among
MSTIs. Multiple VLANs that have the same topology can be mapped to one instance. The forwarding status of the
VLANs for a port is determined by the port status in the MSTI.
As shown in Figure 2, MSTP maps VLANs to MSTIs in the VLAN mapping table. Each VLAN can be mapped
to only one MSTI. This means that traffic of a VLAN can be transmitted in only one MSTI. An MSTI, however,
can correspond to multiple VLANs.
In this manner, devices within the same VLAN can communicate with each other; packets of different VLANs
are load balanced along different paths.
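The VLAN mapping table rule described above (each VLAN maps to exactly one MSTI, while one MSTI may carry several VLANs) can be sketched as follows. The class and method names are illustrative assumptions, not any real device API.

```python
# Sketch of an MSTP VLAN mapping table. Each VLAN is bound to exactly
# one MSTI; an MSTI may correspond to multiple VLANs. Illustrative only.

class VlanMappingTable:
    def __init__(self):
        self.vlan_to_msti = {}   # VLAN ID -> MSTI ID

    def map_vlan(self, vlan, msti):
        # Remapping a VLAN replaces its previous binding, so a VLAN can
        # never belong to two instances at the same time.
        self.vlan_to_msti[vlan] = msti

    def msti_of(self, vlan):
        # Unmapped VLANs fall back to MSTI 0 (the IST) by default.
        return self.vlan_to_msti.get(vlan, 0)

    def vlans_of(self, msti):
        # One MSTI may carry several VLANs.
        return sorted(v for v, m in self.vlan_to_msti.items() if m == msti)
```

Traffic of a VLAN is then forwarded only along the spanning tree of the single MSTI returned by `msti_of`, which is what allows different VLANs to be load balanced along different paths.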
MST Region
An MST region contains multiple devices and network segments between them. The devices of one MST
region have the following characteristics:
• MSTP-enabled
• Configured with the same region name
• Configured with the same VLAN-to-MSTI mapping table
• Configured with the same MSTP revision level
A LAN can comprise several MST regions that are directly or indirectly connected. Multiple devices can be
grouped into an MST region by using MSTP configuration commands.
As shown in Figure 2, the MST region D0 contains Device A, Device B, Device C, and Device D, and has three
MSTIs.
As shown in Figure 2, the mappings in the VLAN mapping table of the MST region D0 are as follows:
Regional Root
Regional roots are classified as Internal Spanning Tree (IST) and MSTI regional roots.
In regions B0, C0, and D0 on the network shown in Figure 4, the devices closest to the Common and
Internal Spanning Tree (CIST) root are IST regional roots.
An MST region can contain multiple spanning trees, each called an MSTI. An MSTI regional root is the root
of the MSTI. On the network shown in Figure 3, each MSTI has its own regional root.
Figure 3 MSTI
MSTIs are independent of each other. An MSTI can correspond to one or more VLANs, but a VLAN can be
mapped to only one MSTI.
Master Bridge
The master bridge is the IST master, which is the device closest to the CIST root in a region, for example,
Device A shown in Figure 2.
If the CIST root is in an MST region, the CIST root is the master bridge of the region.
CIST Root
On the network shown in Figure 4, the CIST root is the root bridge of the CIST. The CIST root is a device in
A0.
CST
A Common Spanning Tree (CST) connects all the MST regions on a switching network.
If each MST region is considered a node, the CST is calculated by using STP or RSTP based on all the nodes.
As shown in Figure 4, the MST regions are connected to form a CST.
IST
An IST resides within an MST region.
An IST is a special MSTI with the MSTI ID being 0, called MSTI 0.
An IST is a segment of the CIST in an MST region.
As shown in Figure 4, the devices in an MST region are connected to form an IST.
CIST
A CIST, calculated by using STP or RSTP, connects all the devices on a switching network.
As shown in Figure 4, the ISTs and the CST form a complete spanning tree, the CIST.
SST
A Single Spanning Tree (SST) is formed in either of the following situations:
• A switching device running STP or RSTP belongs to only one spanning tree.
• An MST region contains only one switching device.
Port Role
Based on RSTP, MSTP defines two additional port roles. MSTP ports can be root ports, designated ports,
alternate ports, backup ports, edge ports, master ports, and regional edge ports.
The functions of root ports, designated ports, alternate ports, and backup ports have been defined in RSTP.
Table 1 lists all port roles in MSTP.
Root port A root port is the non-root bridge port closest to the root bridge. Root bridges do not have
root ports.
Root ports are responsible for sending data to root bridges.
As shown in Figure 5, Device A is the root; CP1 is the root port on Device C; BP1 is the root
port on Device B; DP1 is the root port on Device D.
Designated port The designated port on a device forwards BPDUs to the downstream device.
As shown in Figure 5, AP2 and AP3 are designated ports on Device A; BP2 is a designated
port on Device B; CP2 is a designated port on Device C.
Alternate port From the perspective of sending BPDUs, an alternate port is blocked after a BPDU sent by
another bridge is received.
From the perspective of user traffic, an alternate port provides an alternate path to the root
bridge. This path is different from the path through the root port.
As shown in Figure 5, AP4 is an alternate port.
Backup port From the perspective of sending BPDUs, a backup port is blocked after a BPDU sent by itself
is received.
From the perspective of user traffic, a backup port provides a backup/redundant path to the
network segment to which the designated port already provides a path.
Master port A master port is on the shortest path connecting MST regions to the CIST root.
BPDUs of an MST region are sent to the CIST root through the master port.
Master ports are special regional edge ports, functioning as root ports on ISTs or CISTs and
master ports in instances.
As shown in Figure 5, Device A, Device B, Device C, and Device D form an MST region. AP1
on Device A, being the nearest port in the region to the CIST root, is the master port.
Regional edge port A regional edge port is located at the edge of an MST region and connects to another MST
region or an SST.
During MSTP calculation, the roles of a regional edge port in the MSTI and the CIST
instance are the same. If the regional edge port is the master port in the CIST instance, it is
the master port in all the MSTIs in the region.
As shown in Figure 5, AP1, DP2, and DP3 in an MST region are directly connected to other
regions, and therefore they are all regional edge ports of the MST region.
AP1 is a master port in the CIST. Therefore, AP1 is the master port in every MSTI in the MST
region.
Edge port An edge port is located at the edge of an MST region and does not connect to any device.
Generally, edge ports are directly connected to terminals.
After MSTP is enabled on a port, edge-port detection starts automatically. If the port
fails to receive BPDUs within a certain period, it is set as an edge port. Otherwise, it is
set as a non-edge port.
As shown in Figure 5, BP3 is an edge port.
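The automatic edge-port detection just described can be sketched as a small state holder. The detection window length and all names here are illustrative assumptions, not documented values.

```python
# Sketch of automatic edge-port detection: after MSTP starts on a port,
# the port is treated as an edge port only if no BPDU arrives within a
# detection window. DETECT_WINDOW is an assumed value for illustration.

DETECT_WINDOW = 4  # seconds; assumption, not a documented timer value

class EdgeDetector:
    def __init__(self):
        self.elapsed = 0
        self.bpdu_seen = False
        self.is_edge = None  # undecided until the window closes

    def on_bpdu(self):
        # Receiving any BPDU proves a bridge is attached, so the port
        # is immediately classified as a non-edge port.
        self.bpdu_seen = True
        self.is_edge = False

    def tick(self, seconds=1):
        # Once the detection window has passed without a BPDU, the
        # port is classified as an edge port.
        self.elapsed += seconds
        if self.elapsed >= DETECT_WINDOW and self.is_edge is None:
            self.is_edge = not self.bpdu_seen
```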
Figure 5 Root port, designated port, alternate port, and backup port
Discarding A port in the Discarding state can receive BPDUs but cannot send BPDUs, learn MAC
addresses, or forward user traffic.
Forwarding A port in the Forwarding state can send and receive BPDUs as well as forward user traffic.
Learning This is a transition state. A port in the Learning state learns MAC addresses from user traffic
to construct a MAC address table.
In the Learning state, the port can send and receive BPDUs, but cannot forward user traffic.
The port status is not bound to the port role. Table 3 lists the relationships between
port roles and port status.
Port Status | Root Port/Master Port | Designated Port | Regional Edge Port | Alternate Port | Backup Port
The first 36 bytes of an intra-region or inter-region MST BPDU are the same as those of an RST BPDU.
Fields from the 37th byte of an MST BPDU are MSTP-specific. The field MSTI Configuration Messages
consists of configuration messages of multiple MSTIs.
Table 2 lists the major information carried in an MST BPDU.
CIST Flags 1 Indicates the Common and Internal Spanning Tree (CIST) flags.
CIST Root Identifier 8 Indicates the ID of the root switching device on the CIST.
CIST External Path Cost 4 Indicates the total path costs from the MST region where the
switching device resides to the MST region where the CIST root
switching device resides. This value is calculated based on link
bandwidth.
CIST Regional Root Identifier 8 Indicates the ID of the regional root switching device on the CIST,
that is, the Internal Spanning Tree (IST) master ID. If the root is
in this region, the CIST Regional Root Identifier is the same as the
CIST Root Identifier.
CIST Port Identifier 2 Indicates the ID of the designated port in the IST.
Max Age 2 Indicates the maximum lifecycle of the BPDU. If the Max Age
timer expires, it is considered that the link to the root fails.
MST Configuration Identifier 51 Indicates the MST regional label information, which includes four
fields shown in Figure 2. Interconnected switching devices that
are configured with the same MST configuration identifier belong
to one region. For details about these four fields, see Table 3.
CIST Internal Root Path Cost 4 Indicates the total path costs from the local port to the IST
master. This value is calculated based on link bandwidth.
CIST Bridge Identifier 8 Indicates the ID of the designated switching device on the CIST.
CIST Remaining Hops 1 Indicates the remaining hops of the BPDU in the CIST.
MSTI Configuration Messages (may be absent) N x 16 Each MSTI configuration message is 16 bytes, and
therefore this field has N x 16 bytes in the case of N MSTIs. Figure 3 shows the structure of a single MSTI
configuration message. Table 3 describes every sub-field.
Configuration Name 32 Indicates the regional name. The value is a 32-byte string.
MSTI Internal Root Path Cost 4 Indicates the total path costs from the local port to the MSTI
regional root switching device. This value is calculated based on link bandwidth.
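The 51-byte MST Configuration Identifier mentioned above consists, per IEEE 802.1s, of a 1-byte format selector, the 32-byte configuration name, a 2-byte revision level, and a 16-byte configuration digest (1 + 32 + 2 + 16 = 51). A minimal sketch, with illustrative function names:

```python
# Sketch of building and comparing the 51-byte MST Configuration
# Identifier. Two switching devices belong to the same MST region only
# if all four sub-fields match. Function names are illustrative.

def build_mst_config_id(name, revision, digest):
    assert len(digest) == 16, "digest is a 16-byte hash of the VLAN mapping"
    selector = bytes([0])                               # format selector
    name_field = name.encode()[:32].ljust(32, b"\x00")  # 32-byte region name
    rev_field = revision.to_bytes(2, "big")             # 2-byte revision level
    config_id = selector + name_field + rev_field + digest
    assert len(config_id) == 51
    return config_id

def same_region(id_a, id_b):
    # Interconnected devices with identical MST configuration
    # identifiers belong to one MST region.
    return id_a == id_b
```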
If a port transmits either dot1s or legacy BPDUs by default, the user needs to identify the format of BPDUs
sent by the peer and then run a command to configure the port to support the peer BPDU format. If
the configuration is incorrect, a loop may occur due to incorrect MSTP calculation.
By using the stp compliance command, you can configure a port on a Huawei datacom device to
automatically adjust the MST BPDU format. With this function, the port automatically adopts the peer BPDU
format. The following MST BPDU formats are supported by Huawei datacom devices:
• auto
• dot1s
• legacy
In addition to the dot1s and legacy formats, the auto mode allows a port to automatically switch to the BPDU
format used by the peer based on BPDUs received from the peer. In this manner, the two ports use the same
BPDU format. In auto mode, a port uses the dot1s BPDU format by default and then keeps pace with the peer
BPDU format.
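The auto mode described above can be sketched as follows. The class and method names are illustrative assumptions, not the device's actual implementation.

```python
# Sketch of MST BPDU format adaptation: in auto mode the port starts
# with the dot1s format and follows whichever format ('dot1s' or
# 'legacy') it receives from the peer. Names are illustrative.

class StpPort:
    def __init__(self, compliance="auto"):
        self.compliance = compliance   # auto | dot1s | legacy
        self.tx_format = "dot1s"       # default transmit format in auto mode

    def on_bpdu(self, rx_format):
        if self.compliance == "auto":
            self.tx_format = rx_format  # follow the peer's format
        # in dot1s/legacy mode the configured format is always used

    def format_to_send(self):
        if self.compliance in ("dot1s", "legacy"):
            return self.compliance
        return self.tx_format
```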
MSTP Principle
In Multiple Spanning Tree Protocol (MSTP), the entire Layer 2 network is divided into multiple MST regions,
which are interconnected by a single Common Spanning Tree (CST). In a Multiple Spanning Tree (MST)
region, multiple spanning trees are calculated, each of which is called a Multiple Spanning Tree Instance
(MSTI). Among these MSTIs, MSTI 0 is also known as the internal spanning tree (IST). Like STP, MSTP uses
configuration messages to calculate spanning trees, but the configuration messages are MSTP-specific.
Vectors
Both MSTIs and the CIST are calculated based on vectors, which are carried in Multiple Spanning Tree Bridge
Protocol Data Units (MST BPDUs). Therefore, switching devices exchange MST BPDUs to calculate MSTIs and
the Common and Internal Spanning Tree (CIST).
A vector is denoted as {root ID, external root path cost, regional root ID, internal root path cost, designated
switching device ID, designated port ID, receiving port ID}. The priorities of the vector fields are in
descending order from left to right.
Table 1 describes the vectors.
Root ID Identifies the root switching device for the CIST. The root identifier consists of
the priority value (16 bits) and MAC address (48 bits).
External root path cost (ERPC) Indicates the path cost from a CIST regional root to the root. ERPCs saved on
all switching devices in an MST region are the same. If the CIST root is in an
MST region, ERPCs saved on all switching devices in the MST region are 0s.
Regional root ID Identifies the MSTI regional root. The regional root ID consists of the priority
value (16 bits) and MAC address (48 bits).
Internal root path cost (IRPC) Indicates the path cost from the local bridge to the regional root. The IRPC
saved on a regional edge port is greater than the IRPC saved on a non-
regional edge port.
Designated switching device ID Identifies the nearest upstream bridge on the path from the local bridge to
the regional root. If the local bridge is the root or the regional root, this ID is
the local bridge ID.
Designated port ID Identifies the port on the designated switching device connected to the root
port on the local bridge. The port ID consists of the priority value (4 bits) and
port number (12 bits). The priority value must be a multiple of 16.
Receiving port ID Identifies the port receiving the BPDU. The port ID consists of the priority
value (4 bits) and port number (12 bits). The priority value must be a multiple
of 16.
1. Compare root IDs.
2. If the root IDs are the same, compare ERPCs.
3. If the ERPCs are the same, compare regional root IDs.
4. If the regional root IDs are the same, compare IRPCs.
5. If the IRPCs are the same, compare the IDs of designated switching devices.
6. If the IDs of designated switching devices are the same, compare the IDs of designated ports.
7. If the IDs of designated ports are the same, compare the IDs of receiving ports.
If the priority of a vector carried in the configuration message of a BPDU received by a port is higher
than the priority of the vector in the configuration message saved on the port, the port replaces the
saved configuration message with the received one. In addition, the port updates the global
configuration message saved on the device. If the priority of a vector carried in the configuration
message of a BPDU received on a port is equal to or lower than the priority of the vector in the
configuration message saved on the port, the port discards the BPDU.
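The stepwise comparison above is a lexicographic comparison of the priority vector: fields are compared left to right, and a numerically smaller value means a higher priority. Python tuples compare exactly this way, so the update rule can be sketched as follows (field names follow Table 1; the function name is illustrative):

```python
# Sketch of priority-vector comparison for MSTP configuration messages.
# A vector with a smaller tuple is "better" (higher priority); equal or
# lower priority BPDUs are discarded. Names are illustrative.

from collections import namedtuple

Vector = namedtuple("Vector", [
    "root_id",            # CIST root: priority (16 bits) + MAC (48 bits)
    "erpc",               # external root path cost
    "regional_root_id",   # regional root: priority + MAC
    "irpc",               # internal root path cost
    "designated_bridge",  # designated switching device ID
    "designated_port",    # designated port ID
    "receiving_port",     # receiving port ID
])

def maybe_update(saved, received):
    """Replace the saved configuration message only if the received
    vector has a strictly higher priority (a smaller tuple)."""
    return received if received < saved else saved
```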
CIST Calculation
After completing the configuration message comparison, the switching device with the highest priority on
the entire network is selected as the CIST root. MSTP calculates an IST for each MST region, and computes a
CST to interconnect MST regions. On the CST, each MST region is considered a switching device. The CST
and ISTs constitute a CIST for the entire network.
MSTI Calculation
In an MST region, MSTP calculates an MSTI for each VLAN based on mappings between VLANs and MSTIs.
Each MSTI is calculated independently. The calculation process is similar to the process for STP to calculate a
spanning tree. For details, see STP Topology Calculation.
• The spanning tree is calculated independently for each MSTI, and spanning trees of MSTIs are
independent of each other.
• MSTP calculates the spanning tree for an MSTI in the manner similar to STP.
• A port can play different roles or have different status in different MSTIs.
Multiple Spanning Tree Protocol (MSTP) supports both ordinary and enhanced Proposal/Agreement (P/A)
mechanisms:
• Ordinary P/A
The ordinary P/A mechanism supported by MSTP is implemented in the same manner as that supported
by Rapid Spanning Tree Protocol (RSTP). For details about the P/A mechanism supported by RSTP, see
RSTP Implementation.
• Enhanced P/A
1. The upstream device sends a proposal to the downstream device, indicating that the port
connecting to the downstream device wants to enter the Forwarding state as soon as possible.
After receiving this bridge protocol data unit (BPDU), the downstream device sets its port
connecting to the upstream device as the root port, and blocks all non-edge ports.
2. The upstream device continues to send an agreement. After receiving this BPDU, the root port
enters the Forwarding state.
3. The downstream device replies with an agreement. After receiving this BPDU, the upstream
device sets its port connecting to the downstream device to the designated port, and the port
enters the Forwarding state.
By default, Huawei devices use the enhanced P/A mechanism. If a Huawei device needs to communicate
with a non-Huawei device that uses the ordinary P/A mechanism, run the stp no-agreement-check
command to configure the Huawei device to use the ordinary P/A mechanism. In this manner, these two
devices can communicate with each other.
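The three-step enhanced P/A exchange above can be sketched as a message sequence between two simplified bridges. All names and state strings are illustrative; this is not a Huawei implementation.

```python
# Sketch of the enhanced Proposal/Agreement (P/A) handshake:
# proposal (upstream) -> agreement (upstream) -> agreement (downstream).

def enhanced_pa():
    events = []

    # Step 1: proposal from upstream; the downstream device elects its
    # root port and blocks all non-edge ports before anything forwards.
    events.append("upstream: proposal")
    downstream = {"root_port": "blocked", "non_edge_ports": "blocked"}

    # Step 2: agreement from upstream; the downstream root port may
    # now enter the Forwarding state.
    events.append("upstream: agreement")
    downstream["root_port"] = "forwarding"

    # Step 3: agreement from downstream; the upstream designated port
    # enters the Forwarding state.
    events.append("downstream: agreement")
    upstream = {"designated_port": "forwarding"}

    return events, upstream, downstream
```

The extra upstream agreement in step 2 is what distinguishes the enhanced mechanism from the ordinary RSTP-style P/A exchange.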
Background
On the network shown in Figure 1:
• Multiple rings are connected to UPE1 and UPE2 through different ports.
• The devices on the rings reside at the access layer, running STP or RSTP. In addition, UPE1 and UPE2
work for different carriers, and therefore they need to reside on different spanning trees whose
topology changes do not affect each other.
On the network shown in Figure 1, devices and UPEs construct multiple Layer 2 rings. STP must be enabled
on these rings to prevent loops. UPE1 and UPE2 are connected to multiple access rings that are independent
of each other. The spanning tree protocol cannot calculate a single spanning tree for all devices. Instead, the
spanning tree protocol must be enabled on each ring to calculate a separate spanning tree.
MSTP supports MSTIs, but these MSTIs must belong to one MST region and devices in the region must have
the same configurations. If the devices belong to different regions, MSTP calculates the spanning tree based
on only one instance. Assume that devices on the network belong to different regions, and only one
spanning tree is calculated in one instance. In this case, the status change of any device on the network
affects the stability of the entire network. On the network shown in Figure 1, the devices connected to UPEs
support only STP or RSTP but not MSTP. When MSTP-enabled UPEs receive RST BPDUs from the devices, the
UPEs consider that they and devices belong to different regions. As a result, only one spanning tree is
calculated for the rings composed of UPEs and devices, and the rings affect each other.
To prevent this problem, MSTP multi-process is introduced. MSTP multi-process is an enhancement to MSTP.
The MSTP multi-process mechanism allows ports on devices to be bound to different processes. MSTP
calculation is performed based on processes. In this manner, only ports that are bound to a process
participate in the MSTP calculation for this process. With the MSTP multi-process mechanism, spanning trees
of different processes are calculated independently and do not affect each other. The network shown in
Figure 1 can be divided into multiple MSTP processes by using MSTP multi-process. Each process takes
charge of a ring composed of devices. The MSTP processes have the same functions and support MSTIs. The
MSTP calculation for one process does not affect the MSTP calculation for another process.
Purpose
On the network shown in Figure 1, MSTP multi-process is configured to implement the following:
• Improves network reliability. For a network composed of many Layer 2 access devices, using
MSTP multi-process reduces the adverse effect of a single node failure on the entire network.
The topology is calculated for each process. If a device fails, only the topology corresponding to the
process to which the device belongs changes.
• Reduces the network administrator workload during network expansion, facilitating operation and
maintenance.
To expand a network, you only need to configure new processes, connect the processes to the existing
network, and keep the existing MSTP processes unchanged. If device expansion is performed in a
process, only this process needs to be modified.
Principles
• Public link status
As shown in Figure 1, the public link between UPE1 and UPE2 is a Layer 2 link running MSTP. The public
link between UPE1 and UPE2 is different from the links connecting devices to UPEs. The ports on the
public link need to participate in the calculation for multiple access rings and MSTP processes.
Therefore, the UPEs must identify the process from which MST BPDUs are sent.
In addition, a port on the public link participates in the calculation for multiple MSTP processes and
may obtain a different status in each process. As a result, the port cannot determine which status to use.
To prevent this situation, a port on a public link always adopts its status in MSTP process 0 when
participating in the calculation for multiple MSTP processes.
After a device starts normally, MSTP process 0 exists by default, and MSTP configurations in the system view and
interface view belong to this process.
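The rule above (a shared port always adopts the status calculated in MSTP process 0) can be sketched as follows; the class and method names are illustrative assumptions.

```python
# Sketch of status resolution for a port on the public link that
# participates in several MSTP processes: the status computed in
# process 0 is authoritative. Names are illustrative.

class PublicLinkPort:
    def __init__(self):
        self.state_by_process = {}   # MSTP process ID -> computed state

    def set_state(self, process_id, state):
        self.state_by_process[process_id] = state

    def effective_state(self):
        # Process 0 exists by default after startup and decides the
        # actual state of a port shared across processes.
        return self.state_by_process[0]
```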
• Reliability
On the network shown in Figure 2, after the topology of a ring changes, the MSTP multi-process
mechanism helps UPEs flood a TC packet to all devices on the ring and prevent the TC packet from
being flooded to devices on the other ring. UPE1 and UPE2 update MAC and ARP entries on the ports
corresponding to the changed spanning tree.
On the network shown in Figure 3, if the public link between UPE1 and UPE2 fails, multiple devices that
are connected to the UPEs will unblock their blocked ports.
Assume that UPE1 is configured with the highest priority, UPE2 with the second highest priority, and
devices with default or lower priorities. After the link between UPE1 and UPE2 fails, the blocked ports
(replacing the root ports) on devices no longer receive packets with higher priorities and re-perform
state machine calculation. If the calculation changes the blocked ports to designated ports, a permanent
loop occurs, as shown in Figure 4.
• Solutions
To prevent a loop between access rings, use either of the following solutions:
Use the blue ring shown in Figure 5 as an example. UPE1 is configured with the highest priority,
UPE2 with the second highest priority, and devices on the blue ring with default or lower priorities.
In addition, root protection is enabled on UPE2.
Assume that a port on S1 is blocked. When the public link between UPE1 and UPE2 fails, the
blocked port on S1 begins to calculate the state machine because it no longer receives BPDUs of
higher priorities. After the calculation, the blocked port becomes the designated port and performs
P/A negotiation with the downstream device.
After S1, which is directly connected to UPE2, sends BPDUs of higher priorities to the UPE2 port
enabled with root protection, the port is blocked. From then on, the port remains blocked because
it continues receiving BPDUs of higher priorities. In this manner, no loop will occur.
• All devices on the network belong to the same Multiple Spanning Tree (MST) region.
• VLAN 10 packets are forwarded within MSTI 1; VLAN 30 packets are forwarded within MSTI 3; VLAN 40
packets are forwarded within MSTI 4; VLAN 20 packets are forwarded within MSTI 0.
On the network shown in Figure 1, Device A and Device B are aggregation-layer devices, and Device C and
Device D are access-layer devices. VLAN 10 and VLAN 30 are terminated on aggregation-layer devices, and
VLAN 40 is terminated on an access-layer device. Therefore, Device A and Device B can be configured as the
roots of instances 1 and 3 respectively; Device C can be configured as the root of instance 4.
Terms
Term Definition
STP Spanning Tree Protocol. A protocol used in a local area network (LAN) to eliminate
loops. Devices running STP discover loops on the network by exchanging information
with each other and block certain interfaces to eliminate the loops.
RSTP Rapid Spanning Tree Protocol. A protocol described in detail in IEEE 802.1w. RSTP
modifies and supplements STP and is therefore able to converge faster than STP.
MSTP Multiple Spanning Tree Protocol. A spanning tree protocol defined in IEEE 802.1s
that introduces the concepts of region and instance. To meet different requirements,
MSTP divides a large network into regions in which multiple spanning tree instances
(MSTIs) are created. These MSTIs are mapped to virtual LANs (VLANs), and bridge
protocol data units (BPDUs) carrying region and instance information are transmitted
between network bridges, so that each bridge can determine which region it belongs
to based on the BPDU information. Multi-instance RSTP runs within regions, whereas
RSTP-compatible protocols run between regions.
VLAN Virtual local area network. A switched, end-to-end logical network constructed
across different network segments and networks by using network management
software. A VLAN forms a logical subnet, that is, a logical broadcast domain, and can
include multiple network devices.
Definition
The Rapid Ring Protection Protocol (RRPP) is a link layer protocol used to prevent loops on an Ethernet ring
network. Devices running RRPP exchange packets with each other to detect loops on the network and block
specified interfaces to eliminate loops. RRPP snooping notifies a virtual private LAN service (VPLS) network
of RRPP ring status changes.
Purpose
As shown in Figure 1, Underlayer Provider Edges (UPEs) are connected to the VPLS network where NPEs
reside in the form of an RRPP ring. NPEs are connected through a PW, and therefore cannot serve as RRPP
nodes to directly respond to RRPP protocol packets. As a result, the VPLS network is unaware of status
changes of the RRPP ring. When the RRPP ring topology changes, each node on the VPLS network still
forwards downstream data according to the entries generated before the RRPP ring topology changes. As a
result, the downstream traffic cannot be forwarded.
To resolve the problem, configure RRPP snooping on sub-interfaces or VLANIF interfaces to allow the VPLS
network to transparently transmit RRPP protocol packets and detect changes on the RRPP ring. When the
RRPP ring is faulty, NPE D on the VPLS network synchronously clears the forwarding entries of the VSIs
(including the associated VSIs) on the local node and those of the remote NPE B to re-learn forwarding
entries. This ensures that traffic can be switched to a normal path and downstream traffic can be normally
forwarded.
Benefits
When RRPP snooping is configured on sub-interfaces or VLANIF interfaces, the VPLS network can
transparently transmit RRPP protocol packets, detect changes on the RRPP ring, and update its forwarding
entries to ensure that traffic is promptly switched to a normal path.
Basic Concepts
Ethernet devices can be configured as nodes with different roles on an RRPP ring. RRPP ring nodes exchange
and process RRPP packets to detect the status of the ring network and communicate any topology changes
throughout the network. The master node on the ring blocks or unblocks the secondary port depending on
the status of the ring network. If a device or link on the ring network fails, the backup link immediately
starts to eliminate loops.
• RRPP ring
An RRPP ring consists of interconnected devices configured with the same control VLAN. RRPP rings
are classified as major rings and sub-rings. Sub-ring protocol packets are transmitted through the major ring as data
packets, whereas major ring protocol packets are transmitted only within the major ring.
• Control VLAN
The control VLAN is a concept relative to the data VLAN. In an RRPP ring, a control VLAN is used to
transmit only RRPP packets, whereas a data VLAN is used to transmit data packets.
• Node type
Master node: The master node determines how to handle topology changes. Each RRPP ring must have
only one master node. Any device on the Ethernet ring can serve as the master node.
Transit node: On an RRPP ring, all nodes except the master node are transit nodes. Each transit node
monitors the status of its directly connected RRPP link and notifies the master node of any changes in
link status.
Edge node and assistant edge node: A device can serve as an edge node or assistant edge node on the
sub-ring, and as a transit node on the major ring. On an RRPP sub-ring, either of the two nodes crossed
with the major ring can be specified as an edge node, and if one of the two nodes crossed with the
major ring is specified as an edge node, the other node is the assistant edge node. Each sub-ring must
have only one edge node and one assistant edge node.
RRPP Packets
Table 1 lists the RRPP protocol packet types.
HEALTH (HELLO) A packet sent from the master node to detect whether a loop
exists on the network.
COMMON-FLUSH-FDB A packet sent from the master node to instruct the transit,
edge, or assistant edge node to update its MAC address
forwarding table, ARP entries, and ND entries.
COMPLETE-FLUSH-FDB A packet sent from the master node to instruct the transit,
edge, or assistant edge node to update its MAC address
forwarding table, ARP entries, and ND entries. In addition, this
packet instructs the transit node to unblock the temporarily
blocked ports.
MAJOR-FAULT A packet sent from an assistant edge node to notify the edge
node that the major ring in the domain has failed when the
assistant edge node does not receive an Edge-Hello packet
from the edge port within a specified period.
• Destination MAC Address: indicates the destination MAC address of an RRPP packet.
• Source MAC Address: indicates the source MAC address of an RRPP packet, which is the bridge MAC
address of the device.
• EtherType: indicates the encapsulation type. This field occupies 16 bits and has a fixed value of 0x8100
for tagged encapsulation.
• PRI: indicates the Class of Service (CoS) priority. This field occupies 4 bits and has a fixed value of
0xe.
• RRPP_LENGTH: indicates the length of an RRPP data unit. This field occupies 16 bits and has a fixed
value of 0x0040.
• RRPP_VER: indicates the version of an RRPP packet. This field occupies 8 bits, and the current version is
0x01.
■ HEALTH = 0x05
■ COMPLETE-FLUSH-FDB = 0x06
■ COMMON-FLUSH-FDB = 0x07
■ LINK-DOWN = 0x08
■ EDGE-HELLO = 0x0a
■ MAJOR-FAULT = 0x0b
• SYSTEM_MAC_ADDR: indicates the bridge MAC address from which the packet is sent. This field
occupies 48 bits.
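The field constants listed above can be collected in a short sketch. This is an illustrative Python fragment, not a device implementation; only the numeric values come from the description, while the names and the helper function are assumptions.

```python
# Illustrative constants for the RRPP packet fields and type codes
# described above. Names are assumptions for this sketch.

RRPP_ETHERTYPE = 0x8100   # fixed EtherType for tagged encapsulation
RRPP_LENGTH    = 0x0040   # fixed length of an RRPP data unit
RRPP_VER       = 0x01     # current RRPP packet version

PACKET_TYPES = {
    "HEALTH":             0x05,
    "COMPLETE-FLUSH-FDB": 0x06,
    "COMMON-FLUSH-FDB":   0x07,
    "LINK-DOWN":          0x08,
    "EDGE-HELLO":         0x0A,
    "MAJOR-FAULT":        0x0B,
}

def describe(type_code):
    """Map an RRPP type code back to its packet name, or None if unknown."""
    for name, code in PACKET_TYPES.items():
        if code == type_code:
            return name
    return None

print(describe(0x07))  # the master node's COMMON-FLUSH-FDB packet
```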
To resolve this problem, RRPP snooping can be enabled on the sub-interface or VLANIF interface of NPE D
and associated with VSIs that are not bound to the current sub-interface or VLANIF interface on NPE D. If
the RRPP ring fails, NPE D on the VPLS network clears the forwarding entries of the VSIs (including the
associated VSIs) on the local node and the forwarding entries of the remote NPE B to re-learn forwarding
entries. This ensures that traffic can be switched to a normal path and downstream traffic can be normally
forwarded.
As shown in Figure 2, the link between UPE C and UPE A is faulty, and the RRPP master node UPE A sends a
COMMON-FLUSH-FDB packet to notify the transit nodes on the RRPP ring to clear their MAC address tables.
NPE D does not clear its MAC address table because it cannot process the COMMON-FLUSH-FDB packet. If a
downstream data packet needs to be sent to UPE A, NPE D still sends it to UPE A along the original path,
leading to a traffic interruption. After UPE B clears its MAC address table, the upstream packet sent by UPE
A is regarded as an unknown unicast packet on the RRPP ring and is forwarded to the VPLS network along
the path UPE A -> UPE B -> NPE D. After relearning the MAC address, NPE D can normally forward the
downstream traffic destined to UPE A.
When the fault on the RRPP ring is rectified, the master node UPE A sends a COMPLETE-FLUSH-FDB packet
to request the transit nodes to clear their MAC address tables. NPE D does not clear its original MAC address
entry because it cannot process the COMPLETE-FLUSH-FDB packet. As a result, the downstream traffic
between NPE D and UPE A is interrupted as well.
On the network shown in Figure 3, after the RRPP snooping is enabled on sub-interface 1.1 and sub-
interface 2.1 of NPE D, NPE D can process the COMMON-FLUSH-FDB and COMPLETE-FLUSH-FDB packets.
When the RRPP ring topology changes and NPE D receives the COMMON-FLUSH-FDB or COMPLETE-FLUSH-
FDB packet from the master node UPE A, NPE D clears the MAC address table of the VSI associated with
sub-interface 1.1 and sub-interface 2.1 and then notifies other NPEs in this VSI to clear their MAC address
tables.
If a downstream data packet needs to be sent to UPE A, because NPE D cannot find the MAC address entry
of this packet, NPE D regards the packet as an unknown unicast packet and broadcasts it in the VLAN. After
learning the MAC address entry of UPE A, NPE D sends it to UPE A over UPE B. This ensures downstream
traffic continuity.
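The flush behavior described above can be sketched as follows. This is a minimal Python model under stated assumptions: the class, interface names, and VSI names are hypothetical, and a real NPE would also signal remote PEs in the affected VSIs to flush.

```python
# Minimal sketch (assumed names) of RRPP snooping on an NPE: when a
# FLUSH-FDB packet arrives on a snooping-enabled sub-interface, the node
# clears the MAC address tables of the bound VSI and of any associated
# VSIs, so all entries are relearned over the new ring topology.

class NpeNode:
    def __init__(self):
        # VSI name -> MAC address table (MAC -> outbound next hop)
        self.vsi_mac_tables = {}
        # sub-interface -> (bound VSI, list of associated VSIs)
        self.snooping = {}

    def enable_snooping(self, subif, bound_vsi, associated_vsis=()):
        self.snooping[subif] = (bound_vsi, list(associated_vsis))

    def on_flush_fdb(self, subif):
        """Handle COMMON-FLUSH-FDB / COMPLETE-FLUSH-FDB from the master node."""
        if subif not in self.snooping:
            return  # snooping not enabled: the packet is only forwarded
        bound, associated = self.snooping[subif]
        for vsi in [bound] + associated:
            self.vsi_mac_tables.get(vsi, {}).clear()
        # A real NPE would also tell remote PEs in these VSIs to flush.

npe_d = NpeNode()
npe_d.vsi_mac_tables["vsi1"] = {"00e0-fc12-3456": "pw-to-upe-a"}
npe_d.enable_snooping("GE0/1/0.1", bound_vsi="vsi1")
npe_d.on_flush_fdb("GE0/1/0.1")
print(npe_d.vsi_mac_tables["vsi1"])  # empty: entries are relearned afterwards
```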
Definition
Ethernet Ring Protection Switching (ERPS) is a protocol defined by the International Telecommunication
Union - Telecommunication Standardization Sector (ITU-T) to prevent loops at Layer 2. As the standard
number is ITU-T G.8032/Y.1344, ERPS is also called G.8032. ERPS defines Ring Auto Protection Switching
(RAPS) Protocol Data Units (PDUs) and protection switching mechanisms. It can be used for communication
between Huawei and non-Huawei devices on a ring network.
Related Concepts
ERPSv1 and ERPSv2 are currently available. ERPSv1 was released by the ITU-T in June 2008, and ERPSv2 was
released by the ITU-T in August 2010. ERPSv2, fully compatible with ERPSv1, extends ERPSv1 functions.
Table 1 compares ERPSv1 and ERPSv2.
Ring type: ERPSv1 supports single rings only. ERPSv2 supports single rings and multi-rings; a multi-ring
topology comprises major rings and sub-rings.
Port role configuration: ERPSv1 supports the RPL owner port and ordinary ports. ERPSv2 supports the RPL
owner port, RPL neighbor port, and ordinary ports.
Manual port blocking: Not supported in ERPSv1. ERPSv2 supports forced switch (FS) and manual switch (MS).
As ERPSv2 is fully compatible with ERPSv1, configuring ERPSv2 is recommended if all devices on an ERPS ring support
both ERPSv1 and ERPSv2.
Purpose
Generally, redundant links are used on an Ethernet switching network to provide link backup and enhance
network reliability. The use of redundant links, however, may produce loops, causing broadcast storms and
rendering the MAC address table unstable. As a result, the communication quality deteriorates, and
communication services may even be interrupted. To resolve these problems, ERPS can be used for loop
avoidance purposes.
ERPS blocks the ring protection link (RPL) owner port to remove loops and unblocks it to promptly restore
traffic if a failure occurs elsewhere on the ring.
Benefits
This feature offers the following benefits:
Introduction
Ethernet Ring Protection Switching (ERPS) is a protocol used to block specified ports to prevent loops at the
link layer of an Ethernet network.
On the network shown in Figure 1, Device A through Device D constitute a ring and are dual-homed to an
upstream IP/MPLS network. This access mode will cause a loop on the entire network. To eliminate
redundant links and ensure link connectivity, ERPS is used to prevent loops.
Figure 1 shows a typical ERPS single-ring network. The following describes ERPS based on this networking:
ERPS Ring
An ERPS ring consists of interconnected switches that have the same control VLAN. A ring is a basic ERPS
unit.
ERPS rings are classified as major rings (closed) or sub-rings (open). On the network shown in Figure 2,
Device A through Device D constitute a major ring, and Device C through Device F constitute a sub-ring.
Only ERPSv2 supports sub-rings.
Node
A node refers to a switch added to an ERPS ring. A node can have a maximum of two ports added to the
same ERPS ring. Device A through Device D in Figure 1 are nodes on an ERPS major ring.
Port Role
ERPS defines three port roles: ring protection link (RPL) owner port, RPL neighbor port (only in ERPSv2), and
ordinary port.
• Ordinary port
Ordinary ports are ring ports other than the RPL owner and neighbor ports.
An ordinary port monitors the status of the directly connected ERPS link and sends R-APS PDUs to
inform the other ports if the link status changes.
Port Status
On an ERPS ring, an ERPS-enabled port can be in either of the following states:
• Forwarding: The port forwards user traffic and sends and receives R-APS PDUs.
• Discarding: The port does not forward user traffic but can receive and send ERPS R-APS PDUs.
Control VLAN
A control VLAN is configured for an ERPS ring to transmit R-APS PDUs. Each ERPS ring must be configured
with a control VLAN. After a port is added to an ERPS ring that has a control VLAN configured, the port is
added to the control VLAN automatically. Different ERPS rings cannot be configured with the same control
VLAN ID.
Unlike control VLANs, data VLANs are used to transmit data packets.
ERP Instance
On a device running ERPS, the VLAN in which R-APS PDUs and data packets are transmitted must be
mapped to an Ethernet Ring Protection (ERP) instance so that ERPS forwards or blocks the VLAN packets
based on blocking rules. Otherwise, VLAN packets will probably cause broadcast storms on the ring network
and render the network unavailable.
Timer
ERPS defines four timers: guard timer, WTR timer, hold-off timer, and WTB timer (only in ERPSv2).
• Guard timer
After a faulty link or node recovers or a clear operation is executed, the nodes on the two ends of the
link or the recovered node sends R-APS No Request (NR) messages to inform the other nodes of the
link or node recovery and starts a guard timer. Before the timer expires, each involved node does not
process any R-APS PDUs to avoid receiving out-of-date R-APS (SF) messages. After the timer expires, if
the involved node still receives an R-APS (SF) message, the local port enters the Forwarding state.
• WTR Timer
If the RPL owner port is unblocked due to a link or node failure, the involved port may not go Up
immediately after the link or node recovers. To prevent the RPL owner port from alternating between
Up and Down, the node where the RPL owner port resides starts a WTR timer after receiving an R-APS
(NR) message. If the node receives an R-APS Signal Fail (SF) message before the timer expires, it
terminates the WTR timer (R-APS SF message: a message sent by a node to other nodes after the node
in an ERPS ring detects that one of its ring ports becomes Down). If the node does not receive any R-
APS (SF) message before the timer expires, it blocks the RPL owner port when the timer expires and
sends an R-APS (NR, RB) message. After receiving this R-APS (NR, RB) message, the nodes set their
recovered ports on the ring to the Forwarding state.
• Hold-off timer
Protection switching sequence requirements vary for Layer 2 networks running ERPS. For example, in a
multi-layer service application, a certain period of time is required for a server to recover should it fail.
(During this period, no protection switching is performed, and the client does not detect the failure.) A
hold-off timer can be set to ensure that the server is given adequate time to recover. If a fault occurs,
the fault is not immediately reported to ERPS. Instead, the hold-off timer starts. If the fault persists after
the timer expires, the fault will be reported to ERPS.
• WTB timer
The WTB timer starts after an FS or MS operation is performed. When multiple nodes on an ERPS ring
are in the FS or MS state, the clear operation takes effect only after the WTB timer expires. This ensures
that the RPL owner port will not be blocked immediately.
The WTB timer value cannot be configured. Its value is the guard timer value plus 5 seconds.
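Two of the timer rules above can be captured in a short sketch. This is a hedged Python model with assumed names; it encodes only the facts stated in the text: the WTB value is derived from the guard timer (guard timer plus 5 seconds), and a pending WTR timer is terminated if an R-APS (SF) message arrives before it expires.

```python
# Sketch (assumed names) of the ERPS timer relationships described above.

class ErpsTimers:
    def __init__(self, guard_s=2, wtr_s=300):
        self.guard_s = guard_s        # guard timer, in seconds
        self.wtr_s = wtr_s            # WTR timer, in seconds
        self.wtr_running = False

    @property
    def wtb_s(self):
        # The WTB value is not configurable: guard timer value plus 5 seconds.
        return self.guard_s + 5

    def on_raps_nr(self):
        """Recovery seen: the RPL owner node starts the WTR timer."""
        self.wtr_running = True

    def on_raps_sf(self):
        """A new failure before WTR expiry terminates the WTR timer,
        so the RPL owner port is not re-blocked."""
        self.wtr_running = False

t = ErpsTimers(guard_s=2)
print(t.wtb_s)        # guard timer (2s) + 5s
t.on_raps_nr()
t.on_raps_sf()
print(t.wtr_running)  # WTR terminated by the SF message
```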
• In revertive switching, the RPL owner port is re-blocked after the wait to restore (WTR) timer expires,
and the traffic channel is blocked on the RPL.
ERPSv1 supports only revertive switching. ERPSv2 supports both revertive and non-revertive switching.
If the RPL has high bandwidth, blocking a low-bandwidth link and unblocking the RPL allows traffic to use
the RPL and have more bandwidth. ERPS supports two manual port blocking modes: forced switch (FS) and
manual switch (MS).
• FS: forcibly blocks a port immediately after FS is configured, irrespective of whether link failures have
occurred.
• MS: forcibly blocks a port when link failures and FS conditions are absent.
In addition to FS and MS operations, ERPS also supports the clear operation. The clear operation has the
following functions:
• Triggers revertive switching before the WTR or wait to block (WTB) timer expires in the case of revertive
operations.
• With VCs: R-APS PDUs on sub-rings are transmitted to the major ring through interconnection nodes.
The RPL owner port of a sub-ring blocks both R-APS PDUs and data traffic.
• With NVCs: R-APS PDUs on sub-rings are terminated on the interconnection nodes. The RPL owner port
blocks data traffic but not R-APS PDUs on each sub-ring.
In ERPSv2, sub-rings can interlock in multi-ring topologies. The sub-rings attached to other sub-rings must use
non-virtual channels.
On the network shown in Figure 3, a major ring is interconnected with two sub-rings. The sub-ring on the
left has a VC, whereas the sub-ring on the right has an NVC.
By default, sub-rings use NVCs to transmit R-APS PDUs, except for the scenario shown in Figure 4.
When sub-ring links are not contiguous, VCs must be used. On the network shown in Figure 4, links b and d belong to
major rings 1 and 2, respectively; links a and c belong to the sub-ring. Because links a and c are not contiguous, they
cannot detect the status change between each other. Therefore, VCs must be used for R-APS PDU transmission.
Table 1 lists the advantages and disadvantages of R-APS PDU transmission modes on sub-rings with VCs or
NVCs.
Table 1 Comparison between R-APS PDU transmission modes on sub-rings with VCs or NVCs
Using VCs
Advantages: Applies to scenarios in which sub-ring links are not contiguous. Existing Ethernet ring
networks, even non-ERPS ring networks, can be interconnected using VCs, and the existing ring networks
can function as major rings without any additional configuration.
Disadvantages: Requires VC resource reservation and control VLAN assignment from adjacent rings. R-APS
PDUs of sub-rings are transmitted through VCs, and therefore sub-rings do not detect topology changes of
neighboring networks. This may affect protection switching performance if these topology changes require
protection switching on the sub-rings.
Using NVCs
Advantages: Does not require resource reservation or control VLAN assignment from adjacent rings. Each
sub-ring has an independent switching time, irrelevant to other network topologies.
Disadvantages: Does not apply to scenarios in which sub-ring links are not contiguous.
MEL 3 bits Identifies the maintenance entity group (MEG) level of the R-APS PDU.
OpCode 8 bits Indicates an R-APS PDU. The value of this field is 0x28.
TLV Offset 8 bits Indicates that the TLV starts after an offset of 32 bytes. The value of this
field is fixed at 0x20.
R-APS Specific Information 32 x 8 bits Carries R-APS ring information and is the core of an R-APS PDU.
Some of its sub-fields have different meanings in ERPSv1 and ERPSv2. Figure 2 shows the R-APS Specific
Information field format in ERPSv1, and Figure 3 shows the format in ERPSv2.
TLV Not limited Describes information to be loaded. The End TLV value is 0x00.
Request/State 4 bits Indicates that this R-APS PDU is a request or state PDU. The value can be:
1101: forced switch (FS)
1110: Event
1011: signal failed (SF)
0111: manual switch (MS)
0000: no request (NR)
Others: reserved
Reserved 1 4 bits Reserved 1 is used in ERPSv1 for message reply or protection identifier.
Node ID 6 x 8 bits Identifies the MAC address of a node on the ERPS ring. It is informational
and does not affect protection switching on the ERPS ring.
Reserved 2 24 x 8 bits Reserved for future extension and should be ignored upon reception.
Currently, this sub-field should be encoded as all 0s in transmission.
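The Request/State values listed above can be decoded with a small helper. This is an illustrative Python sketch; the bit patterns come from the table, while the function name and the assumption that Request/State occupies the high nibble of the first byte of the R-APS Specific Information field follow the field layout described above.

```python
# Hypothetical decoder for the 4-bit Request/State sub-field of an
# R-APS PDU. Only the bit patterns come from the table above.

REQUEST_STATE = {
    0b1101: "FS",     # forced switch
    0b1110: "Event",
    0b1011: "SF",     # signal fail
    0b0111: "MS",     # manual switch
    0b0000: "NR",     # no request
}

def decode_request_state(first_byte):
    """Request/State is the high nibble of the first byte of the
    R-APS Specific Information field; other values are reserved."""
    return REQUEST_STATE.get(first_byte >> 4, "reserved")

print(decode_request_state(0b10110000))  # signal fail
print(decode_request_state(0b00000000))  # no request
print(decode_request_state(0b00010000))  # a reserved value
```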
1. To prevent loops, ERPS blocks the RPL owner port and also the RPL neighbor port (if any is
configured). All other ports can transmit service traffic.
2. The RPL owner port sends R-APS (NR) messages to all other nodes on the ring at an interval of 5s,
indicating that ERPS links are normal.
A Link Fails
As shown in Figure 2, if the link between DeviceD and DeviceE fails, the ERPS protection switching
mechanism is triggered. The ports on both ends of the faulty link are blocked, and the RPL owner port and
RPL neighbor port are unblocked to send and receive traffic. This mechanism ensures that traffic is not
interrupted. The process is as follows:
1. After DeviceD and DeviceE detect the link fault, they block their ports on the faulty link and perform a
Filtering Database (FDB) flush.
2. DeviceD and DeviceE send three consecutive R-APS Signal Fail (SF) messages to the other LSWs and
then send one R-APS (SF) message at an interval of 5s afterwards.
3. After receiving an R-APS (SF) message, the other LSWs perform an FDB flush. DeviceC on which the
RPL owner port resides and DeviceB on which the RPL neighbor port resides unblock the respective
RPL owner port and RPL neighbor port, and perform an FDB flush.
Figure 2 ERPS single ring networking (unblocking the RPL owner port and RPL neighbor port if a link fails)
• If the ERPS ring uses revertive switching, the RPL owner port is blocked again, and the link that has
recovered is used to forward traffic.
• If the ERPS ring uses non-revertive switching, the RPL remains unblocked, and the link that has
recovered remains blocked.
The following example uses revertive switching to describe the process after the link recovers.
1. After the link between DeviceD and DeviceE recovers, DeviceD and DeviceE start a guard timer to
avoid receiving out-of-date R-APS PDUs. The two devices do not receive any R-APS PDUs before the
timer expires. At the same time, DeviceD and DeviceE send R-APS (NR) messages to the other LSWs.
2. After receiving an R-APS (NR) message, DeviceC on which the RPL owner port resides starts the WTR
timer. After the WTR timer expires, DeviceC blocks the RPL owner port and sends R-APS (NR, RB)
messages.
3. After receiving an R-APS (NR, RB) message, DeviceD and DeviceE unblock the ports at the two ends of
the link that has recovered, stop sending R-APS (NR) messages, and perform an FDB flush. The other
LSWs also perform an FDB flush after receiving an R-APS (NR, RB) message.
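The link-failure steps above can be sketched as a simplified simulation. This Python sketch uses assumed names and flattens the message exchange into direct method calls; it models only the behaviors stated in the text: the detecting nodes flush their FDBs and send three consecutive R-APS (SF) messages, and every node that receives an SF message flushes its FDB, with the RPL owner and neighbor nodes also unblocking their RPL ports.

```python
# Simplified simulation (assumed structure) of ERPS single-ring
# link-failure handling as described in the steps above.

class ErpsNode:
    def __init__(self, name, rpl_port=None):
        self.name = name
        self.rpl_port = rpl_port          # "owner", "neighbor", or None
        self.rpl_blocked = rpl_port is not None
        self.fdb_flushes = 0

    def on_link_fault(self):
        """Block the faulty-link port, flush the FDB, send three SFs."""
        self.fdb_flushes += 1
        return ["R-APS(SF)"] * 3

    def on_raps_sf(self):
        """Receiving SF: flush the FDB; owner/neighbor unblock the RPL."""
        self.fdb_flushes += 1
        if self.rpl_port:
            self.rpl_blocked = False

ring = {n: ErpsNode(n) for n in "ADE"}
ring["C"] = ErpsNode("C", rpl_port="owner")
ring["B"] = ErpsNode("B", rpl_port="neighbor")

# The link between DeviceD and DeviceE fails:
msgs = ring["D"].on_link_fault() + ring["E"].on_link_fault()
for node in (ring[n] for n in "ABC"):     # the other nodes receive an SF
    node.on_raps_sf()

print(ring["C"].rpl_blocked)  # RPL owner port is now unblocked
print(len(msgs))              # three SF messages from each detecting node
```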
Protection Switching
• Forced switch
On the network shown in Figure 3, DeviceA through DeviceE on the ERPS ring can communicate with
each other. A forced switch (FS) operation is performed on the DeviceE's port that connects to DeviceD,
and the DeviceE's port is blocked. The RPL owner port and RPL neighbor port are then unblocked to
send and receive traffic. This ensures that traffic is not interrupted. The process is as follows:
1. After the DeviceE's port that connects to DeviceD is forcibly blocked, DeviceE performs an FDB
flush.
2. DeviceE sends three consecutive R-APS (SF) messages to the other LSWs and then sends one R-APS
(SF) message at an interval of 5s afterwards.
3. After receiving an R-APS (SF) message, the other LSWs perform an FDB flush. DeviceC on which
the RPL owner port resides and DeviceB on which the RPL neighbor port resides unblock the
respective RPL owner port and RPL neighbor port, and perform an FDB flush.
• Clear
After a clear operation is performed on DeviceE, the port that is forcibly blocked by FS sends R-APS
(NR) messages to all other ports on the ERPS ring.
■ If the ERPS ring uses revertive switching, the RPL owner port starts the WTB timer after receiving
an R-APS (NR) message. After the WTB timer expires, the FS operation is cleared. The RPL owner
port is then blocked, and the blocked port on DeviceE is unblocked. If you perform a clear
operation on DeviceC on which the RPL owner port resides before the WTB timer expires, the RPL
owner port is immediately blocked, and the blocked port on DeviceE is unblocked.
■ If the ERPS ring uses non-revertive switching and you want to block the RPL owner port, perform a
clear operation on DeviceC on which the RPL owner port resides.
• Manual switch
Compared with an FS operation, a manual switch (MS) operation triggers protection switching in a
similar way except that an MS operation does not take effect in FS, MS, or link failure conditions.
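The precedence between the switching operations above can be sketched in a few lines. This is a hedged Python sketch with assumed names; it encodes only what the text states: an FS operation takes effect irrespective of link failures, whereas an MS operation does not take effect in FS, MS, or link-failure conditions.

```python
# Sketch (assumed names) of the FS/MS precedence described above.

def request_allowed(request, ring_state):
    """ring_state models the current ring condition:
    one of 'idle', 'SF' (link failure), 'FS', or 'MS'."""
    if request == "FS":
        return True                 # forced switch applies unconditionally
    if request == "MS":
        return ring_state == "idle" # blocked by failures and FS/MS states
    return False

print(request_allowed("MS", "SF"))    # MS is ignored during a link failure
print(request_allowed("MS", "idle"))  # MS takes effect on a healthy ring
print(request_allowed("FS", "SF"))    # FS takes effect even during a failure
```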
1. To prevent loops, each ring blocks its Ring Protection Link (RPL) owner port. All other ports can
transmit data traffic.
2. The RPL owner port on each ring sends R-APS (NR) messages to all other nodes on the same ring at
an interval of 5s. The R-APS (NR) messages on the major ring are transmitted only on this ring. The R-
APS (NR) messages on each sub-ring are terminated on the interconnection nodes and therefore are
not transmitted to the major ring.
Traffic between PC1 and the upper-layer network travels along the path PC1 <-> Device F <-> Device B <->
Device A <-> PE1; traffic between PC2 and the upper-layer network travels along the path PC2 <-> Device G
<-> Device D <-> Device E <-> PE2.
Figure 1 ERPS multi-ring networking with sub-rings that do not have VCs (links are normal)
A Link Fails
In Figure 2, if the link between Device D and Device G fails, ERPS is triggered. Specifically, the ports on both
ends of the faulty link are blocked, and the RPL owner port on sub-ring 2 is unblocked to send and receive
user traffic. In this situation, traffic from PC1 is not interrupted and still travels along the original path.
Device C and Device D inform the other nodes on the major ring of the topology change so that traffic from
PC2 is also not interrupted. Traffic between PC2 and the upper-layer network travels along the path PC2 <->
Device G <-> Device C <-> Device B <-> Device A <-> Device E <-> PE2. The detailed process is as follows:
1. After Device D and Device G detect the link fault, they both block their ports on the faulty link and
perform a Filtering Database (FDB) flush.
2. Device G sends three consecutive R-APS (SF) messages to the other devices on sub-ring 2 and then
sends one R-APS (SF) message at an interval of 5s afterwards.
3. Device G unblocks the RPL owner port and performs an FDB flush.
4. After the interconnection node Device C receives an R-APS (SF) message, it performs an FDB flush.
Device C and Device D then send R-APS Event messages within the major ring to notify the topology
change of sub-ring 2.
5. After receiving an R-APS Event message, the other major ring nodes perform an FDB flush. Traffic
from PC2 is then rapidly switched to a normal link.
• If the revertive switching mode is configured for the ERPS major and sub-rings, the RPL owner port is
blocked again, and the link that has recovered is used to forward traffic.
• If the non-revertive switching is configured for the ERPS major and sub-rings, the RPL owner port
remains unblocked, but the link that has recovered remains blocked.
The following example uses revertive switching to describe the process after the link recovers.
1. After the link between Device D and Device G recovers, Device D and Device G start a guard timer to
avoid receiving out-of-date R-APS PDUs. The two routers do not receive any R-APS PDUs before the
timer expires. Device D and Device G then send R-APS (NR) messages within sub-ring 2.
2. Device G on which the RPL owner port resides starts the WTR timer. After the WTR timer expires,
Device G blocks the RPL owner port, unblocks its port on the link that has recovered, and then sends
R-APS (NR, RB) messages within sub-ring 2.
3. After receiving an R-APS (NR, RB) message from Device G, Device D unblocks its port on the recovered
link, stops sending R-APS (NR) messages, and performs an FDB flush. Device C also performs an FDB
flush.
4. Device C and Device D, the interconnection nodes, send R-APS Event messages within the major ring
to notify the link recovery of sub-ring 2.
5. After receiving an R-APS Event message, the other major ring nodes perform an FDB flush.
Traffic from PC2 then travels in the same way as that shown in Figure 1.
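The topology-change notification in steps 4 and 5 can be sketched as follows. This is a rough Python sketch with assumed names: a sub-ring topology change is translated by the interconnection nodes into R-APS Event messages sent within the major ring, and every major-ring node that receives one performs an FDB flush (the interconnection nodes themselves flush on receiving the sub-ring's SF message, which the sketch does not model).

```python
# Rough sketch (assumed names) of interconnection-node behavior: the
# interconnection nodes inject R-APS Event messages into the major ring,
# and the other major-ring nodes respond by flushing their FDBs.

class MajorRingNode:
    def __init__(self, name):
        self.name = name
        self.fdb_flushes = 0

    def on_raps_event(self):
        # Relearn MAC addresses over the new topology.
        self.fdb_flushes += 1

def notify_subring_change(interconnection_nodes, major_ring_nodes):
    """Interconnection nodes send R-APS Event messages within the major ring."""
    for node in major_ring_nodes:
        if node not in interconnection_nodes:
            node.on_raps_event()

ring = [MajorRingNode(n) for n in ("A", "B", "C", "D", "E")]
# Device C and Device D are the interconnection nodes with sub-ring 2:
notify_subring_change(interconnection_nodes=ring[2:4], major_ring_nodes=ring)
print([n.fdb_flushes for n in ring])  # A, B, and E have flushed their FDBs
```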
1. To prevent loops, each ring blocks its RPL owner port. All other ports can transmit data traffic.
2. The RPL owner port on each ring sends R-APS (NR) messages to all other nodes on the same ring at
an interval of 5s. The R-APS (NR) messages of each major ring are transmitted only within the same
major ring, whereas the R-APS (NR) messages of the sub-ring are transmitted to the major rings over
the interconnection nodes.
Traffic between PC1 and PC2 travels along the path PC1 <-> Device E <-> Device B <-> Device C <-> Device F
<-> PC2.
Figure 3 ERPS multi-ring networking with a sub-ring that has VCs (links are normal)
A Link Fails
As shown in Figure 4, if the link between Device B and Device C fails, ERPS is triggered. Specifically, the ports
on both ends of the faulty link are blocked, and the RPL owner port on the sub-ring is unblocked to send
and receive user traffic. Device B and Device C inform the other nodes on the major rings of the topology
change so that traffic between PCs is not interrupted. Traffic between PC1 and PC2 then travels along the
path PC1 <-> Device E <-> Device B <-> Device A <-> Device D <-> Device C <-> Device F <-> PC2. The
detailed process is as follows:
1. After Device B and Device C detect the link fault, they both block their ports on the faulty link and
perform an FDB flush.
2. Device B sends three consecutive R-APS (SF) messages to the other devices on the sub-ring and then
sends one R-APS (SF) message at an interval of 5s afterwards. The R-APS (SF) messages then arrive at
major ring 1.
3. After receiving an R-APS (SF) message, Device A on major ring 1 unblocks its RPL owner port and
performs an FDB flush.
4. The other major ring nodes also perform an FDB flush. Traffic between PCs is then rapidly switched to
a normal link.
Figure 4 ERPS multi-ring networking with a sub-ring that has VCs (a link fails)
• If the revertive switching mode is configured for the ERPS major rings and sub-ring, the RPL owner port
is blocked again, and the link that has recovered is used to forward traffic.
• If non-revertive switching is configured for the ERPS major rings and sub-ring, the RPL owner port remains unblocked, and the recovered link remains blocked.
The following example uses revertive switching to describe the process after the link recovers.
1. After the link between Device B and Device C recovers, Device B and Device C start a guard timer to avoid processing out-of-date R-APS PDUs; they ignore any R-APS PDUs received before the timer expires. Device B and Device C then send R-APS (NR) messages, which are transmitted within the major rings and sub-ring.
2. Device A starts the WTR timer. After the WTR timer expires, Device A blocks the RPL owner port and
then sends R-APS (NR, RB) messages to other connected devices.
3. After receiving an R-APS (NR, RB) message from Device A, Device B and Device C unblock their ports on the recovered link, stop sending R-APS (NR) messages, and perform an FDB flush.
4. After receiving an R-APS (NR, RB) message from Device A, other devices also perform an FDB flush.
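The failure and recovery handling above can be modeled as a simplified event handler. The following Python sketch is illustrative only: the class, node names, and port names are assumptions for the example, not device behavior or CLI.

```python
# Simplified sketch of ERPS R-APS event handling (illustrative only).
# Each node tracks which of its ring ports are blocked and a MAC FDB.

class ErpsNode:
    def __init__(self, name, rpl_owner_port=None):
        self.name = name
        self.rpl_owner_port = rpl_owner_port   # None if not the RPL owner
        self.blocked_ports = set()
        if rpl_owner_port:
            # The RPL is blocked while all ring links are normal.
            self.blocked_ports.add(rpl_owner_port)
        self.fdb = {}

    def flush_fdb(self):
        self.fdb.clear()

    def on_local_link_failure(self, port):
        # Block the port on the faulty link, flush the FDB,
        # then send R-APS (SF) toward the other ring nodes.
        self.blocked_ports.add(port)
        self.flush_fdb()
        return ("R-APS", "SF", self.name)

    def on_raps_sf(self):
        # The RPL owner unblocks the RPL so traffic can use the backup path.
        if self.rpl_owner_port:
            self.blocked_ports.discard(self.rpl_owner_port)
        self.flush_fdb()

owner = ErpsNode("DeviceA", rpl_owner_port="port_rpl")
b = ErpsNode("DeviceB")
msg = b.on_local_link_failure("port_to_C")
owner.on_raps_sf()
```

After these two events, Device B's faulty-link port is blocked, and the RPL owner port on Device A is unblocked, matching steps 1 through 3 above.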
blocked ports. Each blocked port independently verifies the completeness of its physical ring and blocks or forwards data without affecting the other blocked ports.
ERPS multi-instance allows a physical ring to have two ERPS rings. Each ERPS ring is configured with one or
more ERP instances. Each ERP instance represents a VLAN range. The topology calculated for an ERPS ring
does not apply to or affect the other ERPS ring. With a specific ERP instance for each ERPS ring, a blocked
port takes effect only for VLANs of that specific ERPS ring. Different VLANs can use separate paths,
implementing traffic load balancing and link backup.
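Per-instance blocking can be pictured as a mapping from each logical ring to its own VLAN range and blocked port. The sketch below is a minimal illustration; the ring names, VLAN ranges, and port names are invented for the example.

```python
# Each ERP instance maps a VLAN range to its own blocked port, so the two
# logical rings on one physical ring can block different ports, giving
# different VLANs separate paths (load balancing plus link backup).
erp_instances = {
    "ring1": {"vlans": range(100, 200), "blocked_port": "port1"},
    "ring2": {"vlans": range(200, 300), "blocked_port": "port2"},
}

def port_forwards(port, vlan):
    """A port forwards a VLAN unless that VLAN's instance blocks this port."""
    for inst in erp_instances.values():
        if vlan in inst["vlans"]:
            return port != inst["blocked_port"]
    return False  # VLAN not mapped to any instance
```

With this mapping, VLAN 150 traffic avoids port1 while VLAN 250 traffic avoids port2, so both physical links carry traffic.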
After Ethernet CFM is deployed on ERPS nodes connecting to transmission devices and detects a
transmission link failure, Ethernet CFM informs the ERPS ring of the failure so that ERPS can perform fast
protection switching.
On the network shown in Figure 1, DeviceA, DeviceB, and DeviceC form an ERPS ring. Three relay nodes exist
between DeviceA and DeviceC. Ethernet CFM is configured on DeviceA and DeviceC. Interface 1 on DeviceA
is associated with Interface 1 on Relay 1, and Interface 1 on DeviceC is associated with Interface 1 on Relay
3.
In normal situations, the RPL owner port sends R-APS (NR) messages to all other nodes on the ring at an
interval of 5s, indicating that ERPS links are normal.
If Relay 2 fails, DeviceA and DeviceC detect the Ethernet CFM failure, block their Interface 1, send R-APS (SF)
messages through their respective interfaces connected to DeviceB, and then perform a Filtering Database
(FDB) flush.
After receiving an R-APS (SF) message, DeviceB unblocks the RPL owner port and performs an FDB flush.
Figure 2 shows the networking after Relay 2 fails.
After Relay 2 recovers, DeviceB, which is in revertive switching mode, re-blocks the RPL owner port and sends R-APS (NR, RB) messages.
After DeviceA and DeviceC receive an R-APS (NR, RB) message, DeviceA and DeviceC unblock their blocked
Interface 1 and perform an FDB flush so that traffic changes to the normal state, as shown in Figure 1.
If ERPS multi-instance is configured, ERPS is implemented in the same manner as in Figure 1, except that two logical ERPS rings are configured on the physical ring shown in Figure 1, and each logical ERPS ring has its own independently configured switches, port roles, and control VLANs.
Terms
Term Description
FDB Forwarding database. A collection of entries for guiding data forwarding. There are Layer 2
FDB and Layer 3 FDB. The Layer 2 FDB refers to the MAC table, which provides information
about MAC addresses and outbound interfaces and guides Layer 2 forwarding. The Layer 3
FDB refers to the ARP table, which provides information about IP addresses and outbound
interfaces and guides Layer 3 forwarding.
MSTP Multiple Spanning Tree Protocol. A new spanning tree protocol defined in IEEE 802.1s. MSTP
uses the concepts of region and instance. Based on different requirements, MSTP divides a
large network into regions where instances are created. These instances are mapped to
VLANs. BPDUs with region and instance information are transmitted between bridges. A
bridge determines which domain it belongs to based on the information carried in BPDUs.
RSTP Rapid Spanning Tree Protocol. A protocol defined in IEEE 802.1w, which was released in 2001. RSTP amends and supplements STP, implementing rapid convergence.
STP Spanning Tree Protocol. A protocol defined in IEEE 802.1d, which was released in 1998. STP eliminates loops on a LAN. Devices running STP detect loops on the network by exchanging information with each other and block specified interfaces to eliminate the loops.
FS Forced Switch
MS Manual Switch
NR No Request
SF Signal Fail
Definition
MAC flapping-based loop detection is a method for detecting Ethernet loops based on the frequency of MAC
address entry flapping.
Purpose
Generally, redundant links are used on an Ethernet network to provide link backup and enhance network
reliability. Redundant links, however, may produce loops and cause broadcast storms and MAC address entry
flapping. As a result, the communication quality deteriorates, and communication services may even be
interrupted. To eliminate loops on the network, spanning tree protocols and Layer 2 loop detection technologies were introduced. However, a spanning tree protocol must be supported by, and configured on, each user network device, and Layer 2 loop detection requires user network devices to allow Layer 2 loop detection packets to pass. Neither technology can therefore eliminate loops on user networks with unknown connections or on user networks that do not support spanning tree protocols or Layer 2 loop detection.
MAC flapping-based loop detection is introduced to address this problem. It does not require protocol packet
negotiation between devices. A device independently checks whether a loop occurs on the network based on
MAC address entry flapping.
Devices can block redundant links based on the frequency of MAC address entry flapping to eliminate loops
on the network.
Benefits
This feature offers the following benefits to users:
On the network shown in Figure 1, the consumer edge (CE) is dual-homed to the provider edges (PEs) of the
Ethernet network. To avoid loops and broadcast storms, deploy MAC flapping-based loop detection on PE1,
PE2, and the CE. For example, when receiving user packets from the CE, PE1 records in its MAC address table
the CE MAC address as the source MAC address and port1 as the outbound interface. When PE1 receives
packets forwarded by PE2 from the CE, the source MAC address of the packets remains unchanged, but the
outbound interface changes. In this case, PE1 updates the CE's MAC address entry in its MAC address table.
Because PE1 repeatedly receives user packets with the same source MAC address through different
interfaces, PE1 constantly updates the MAC address entry. In this situation, with MAC flapping-based loop
detection, PE1 detects the MAC address flapping and concludes that a loop has occurred. PE1 then blocks its
port1 and generates an alarm, or it just generates an alarm, depending on user configurations.
After MAC flapping-based loop detection is configured on a device and the device receives packets with fake source MAC
addresses from attackers, the device may mistakenly conclude that a loop has occurred and block an interface based on
the configured blocking policy. Therefore, key user traffic may be blocked. It is recommended that you disable MAC
flapping-based loop detection on properly running devices. If you have to use MAC flapping-based loop detection to
detect whether links operate properly during site deployment, be sure to disable this function after this stage.
The basic concepts for MAC flapping-based loop detection are as follows:
• Detection cycle
If a device detects a specified number of MAC address entry flaps within a detection cycle, the device
concludes that a loop has occurred. The detection cycle is configurable.
• Temporary blocking
If a device concludes that a loop has occurred, it blocks an interface or PW for a specified period of
time.
• Permanent blocking
After an interface or a PW is blocked and then unblocked, if the total number of times that loops occur
exceeds the configured maximum number, the interface or PW is permanently blocked.
An interface or PW that is permanently blocked can be unblocked only manually.
• Blocking policy
A device on which MAC flapping-based loop detection is deployed blocks PWs based only on the
blocking priorities of the PWs. If the device detects a loop, it blocks the PW with a lower blocking
priority.
• Accurate blocking
After MAC flapping-based loop detection is deployed on a device and the device detects a loop, the
device blocks an AC interface with a lower blocking priority by default. However, MAC address entries of
interfaces without loops may change due to the impact from a remote loop, and traffic over the
interfaces with lower blocking priorities is interrupted. To address this problem, deploy accurate
blocking of MAC flapping-based loop detection. Accurate blocking determines trusted and untrusted
interfaces by analyzing the frequency of MAC address entry flapping. When a MAC address entry
changes repeatedly, accurate blocking can accurately locate and block the interface with a loop, which
is an untrusted interface.
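The detection cycle, temporary blocking, and permanent blocking described above might be modeled as follows. The thresholds, timer values, and interface names in this sketch are illustrative assumptions, not the device defaults.

```python
import collections

class FlapDetector:
    """Counts MAC-entry flaps per interface within a detection cycle and
    escalates from temporary to permanent blocking (illustrative sketch)."""

    def __init__(self, flap_threshold=3, cycle_s=10.0, max_loop_events=5):
        self.flap_threshold = flap_threshold    # flaps per cycle => loop
        self.cycle_s = cycle_s                  # detection cycle length
        self.max_loop_events = max_loop_events  # loops before permanent block
        self.flap_times = collections.defaultdict(list)  # iface -> timestamps
        self.loop_events = collections.Counter()         # iface -> loop count
        self.temporarily_blocked = set()
        self.permanently_blocked = set()

    def record_flap(self, iface, now):
        times = self.flap_times[iface]
        times.append(now)
        # Keep only the flaps inside the current detection cycle.
        self.flap_times[iface] = [t for t in times if now - t <= self.cycle_s]
        if len(self.flap_times[iface]) >= self.flap_threshold:
            self.loop_events[iface] += 1
            self.flap_times[iface].clear()
            if self.loop_events[iface] > self.max_loop_events:
                # Permanently blocked: can only be unblocked manually.
                self.permanently_blocked.add(iface)
            else:
                self.temporarily_blocked.add(iface)
```

In this model, repeated loop detections on the same interface escalate a temporary block to a permanent one, mirroring the behavior described above.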
In addition, MAC flapping-based loop detection can associate an interface with its sub-interfaces that are bound to virtual switching instances (VSIs). If a loop occurs in the VSI bound to a sub-interface, the sub-interface is blocked. However, a loop may also exist in a VSI bound to another sub-interface, and if that loop is not eliminated in time, it can cause traffic congestion or even a network breakdown. To allow a device to inform the network administrator of such loops, enable MAC flapping-based loop detection association on the main interface of the VSI-bound sub-interfaces. In this situation, if a sub-interface bound to a VSI is blocked due to a loop, its main interface is also blocked and an alarm is generated. After that, all the other sub-interfaces bound to VSIs are blocked.
On the network shown in Figure 1, CE2 and CE3 are connected to PE1 to provide redundant links. This
deployment may generate loops because the connections on the user network of CE2 and CE3 are unknown.
Specifically, if CE2 and CE3 are connected, PE1 interfaces connected to CE2 and CE3 may receive user
packets with the same source MAC address, causing MAC address entry flapping or even damaging MAC
address entries. In this situation, you can deploy MAC flapping-based loop detection on PE1 and configure a
blocking policy for AC interfaces to prevent such loops. The blocking policy can be either of the following:
• Blocking interfaces based on their blocking priorities: If a device detects a loop, it blocks the interface
with a lower blocking priority.
• Blocking interfaces based on their trusted or untrusted states: If a device detects a loop, it blocks the
untrusted interface.
MAC flapping-based loop detection can also detect PW-side loops. The principles of blocking PWs are similar
to those of blocking AC interfaces.
Terms
None
AC attachment circuit
PW pseudowire
Definition
Virtual extensible local area network (VXLAN) is a Network Virtualization over Layer 3 (NVO3) technology
that uses MAC-in-UDP encapsulation.
Purpose
As a widely deployed core cloud computing technology, server virtualization greatly reduces IT and O&M
costs and improves service deployment flexibility.
On the network shown in Figure 1, a server is virtualized into multiple virtual machines (VMs), each of which
functions as a host. A great increase in the number of hosts causes the following problems:
• Most networks currently use VLANs to implement network isolation. However, the deployment of VLANs on large-scale virtualized networks has the following limitations:
■ The VLAN tag field defined in IEEE 802.1Q has only 12 bits and can support only a maximum of
4096 VLANs, which cannot meet user identification requirements of large Layer 2 networks.
Benefits
As server virtualization is being rapidly deployed on data centers based on physical network infrastructure,
VXLAN offers the following benefits:
• A maximum of 16M VXLAN segments are supported using 24-bit VNIs, which allows a data center to
accommodate multiple tenants.
• Non-VXLAN network edge devices do not need to identify the VM's MAC address, which reduces the
number of MAC addresses that have to be learned and enhances network performance.
• MAC-in-UDP encapsulation extends Layer 2 networks, decoupling virtual networks from physical networks. Tenants can plan their own virtual networks without being limited by physical network IP addresses or broadcast domains. This greatly simplifies network management.
VXLAN allows a virtual network to provide access services to a large number of tenants. In addition, tenants
are able to plan their own virtual networks, not limited by the physical network IP addresses or broadcast
domains. This greatly simplifies network management. Table 1 describes VXLAN concepts.
Concept Description
Underlay and overlay networks: VXLAN allows virtual Layer 2 or Layer 3 networks (overlay networks) to be built over existing physical networks (underlay networks). Overlay networks use encapsulation technologies to transmit tenant packets between sites over Layer 3 forwarding paths provided by underlay networks. Tenants are aware of only overlay networks.
Network virtualization edge (NVE): A network entity that is deployed at the network edge and implements network virtualization functions.
VXLAN tunnel endpoint (VTEP): A VXLAN tunnel endpoint that encapsulates and decapsulates VXLAN packets. It is represented by an NVE. A VTEP connects to a physical network and is assigned a physical network IP address. This IP address is irrelevant to virtual networks. In VXLAN packets, the source IP address is the local node's VTEP address, and the destination IP address is the remote node's VTEP address. This pair of VTEP addresses corresponds to a VXLAN tunnel.
VXLAN network identifier (VNI): A VXLAN segment identifier similar to a VLAN ID. VMs on different VXLAN segments cannot communicate directly at Layer 2. A VNI identifies only one tenant; even if multiple terminal users belong to the same VNI, they are considered one tenant. A VNI consists of 24 bits and supports a maximum of 16M tenants. A VNI can be a Layer 2 or Layer 3 VNI. A Layer 2 VNI is mapped to a BD for intra-segment transmission of VXLAN packets. A Layer 3 VNI is bound to a VPN instance for inter-segment transmission of VXLAN packets.
Bridge domain (BD): A Layer 2 broadcast domain through which VXLAN data packets are forwarded. VNIs identifying VNs must be mapped to BDs so that a BD can function as a VXLAN network entity to transmit VXLAN traffic.
Virtual Bridge Domain Interface (VBDIF): A Layer 3 logical interface created for a BD. Configuring IP addresses for VBDIF interfaces allows communication between VXLANs on different network segments and between VXLANs and non-VXLANs, and implements Layer 2 network access to a Layer 3 network.
Gateway: A device that ensures communication between VXLANs identified by different VNIs and between VXLANs and non-VXLANs.
The service networks carried over VXLAN tunnels are called overlay networks. The following combinations of underlay and overlay networks exist in VXLAN scenarios.
IPv4 over IPv4: The overlay and underlay networks are both IPv4 networks. In Figure 1, Server IP and VTEP IP are both IPv4 addresses.
IPv6 over IPv4: The overlay network is an IPv6 network, and the underlay network is an IPv4 network. In Figure 1, Server IP is an IPv6 address, and VTEP IP is an IPv4 address.
IPv4 over IPv6: The overlay network is an IPv4 network, and the underlay network is an IPv6 network. In Figure 1, Server IP is an IPv4 address, and VTEP IP is an IPv6 address.
IPv6 over IPv6: The overlay and underlay networks are both IPv6 networks. In Figure 1, Server IP and VTEP IP are both IPv6 addresses.
Figure 1 shows VXLAN packet formats for different combinations of underlay and overlay networks.
Field Description
Outer UDP header DestPort: destination port number, which is 4789 for VXLAN.
Source Port: source port number, which is calculated by performing a hash operation on the inner Ethernet frame header.
Outer IP header IP SA: source IP address, which is the IP address of the local VTEP of a VXLAN
tunnel.
IP DA: destination IP address, which is the IP address of the remote VTEP of a
VXLAN tunnel.
Outer Ethernet header MAC DA: destination MAC address, which is the MAC address mapped to the
next hop IP address based on the destination VTEP address in the routing
table of the VTEP on which the VM that sends packets resides.
MAC SA: source MAC address, which is the MAC address of the VTEP on which the VM that sends packets resides.
802.1Q Tag: VLAN tag carried in packets. This field is optional.
Ethernet Type: Ethernet frame type.
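For reference, the VXLAN header defined in RFC 7348 is eight bytes: an 8-bit flags field (with the valid-VNI bit set), 24 reserved bits, the 24-bit VNI, and 8 reserved bits. The sketch below builds and parses that header and derives a UDP source port from a hash of the inner Ethernet header; the hash function is a toy stand-in, since the actual hash algorithm is implementation-specific.

```python
import struct

def vxlan_header(vni):
    """Build the 8-byte VXLAN header (RFC 7348): flags 0x08 (valid-VNI bit),
    24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!B3xI", 0x08, vni << 8)

def parse_vni(header):
    # Unpack flags and the last 32-bit word; the VNI is its top 24 bits.
    flags, word = struct.unpack("!B3xI", header)
    return word >> 8

def udp_source_port(inner_frame, lo=49152, hi=65535):
    """Derive the source port from a hash of the 14-byte inner Ethernet
    header, so different flows can take different ECMP paths (toy hash)."""
    h = sum(inner_frame[:14]) * 2654435761
    return lo + h % (hi - lo + 1)
```

Because the source port is flow-dependent while the destination port is fixed at 4789, transit routers can load-balance VXLAN flows without parsing the inner frame.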
Introduction
Ethernet virtual private network (EVPN) is a VPN technology used for Layer 2 internetworking. EVPN is
similar to BGP/MPLS IP VPN. EVPN defines a new type of BGP network layer reachability information (NLRI),
called the EVPN NLRI. The EVPN NLRI defines new BGP EVPN routes to implement MAC address learning
and advertisement between Layer 2 networks at different sites.
VXLAN does not provide a control plane, and VTEP discovery and host information (IP and MAC addresses,
VNIs, and gateway VTEP IP address) learning are implemented by traffic flooding on the data plane,
resulting in high traffic volumes on DC networks. To address this problem, VXLAN uses EVPN as the control
plane. EVPN allows VTEPs to exchange BGP EVPN routes to implement automatic VTEP discovery and host
information advertisement, preventing unnecessary traffic flooding.
In summary, EVPN introduces several new types of BGP EVPN routes through BGP extension to advertise
VTEP addresses and host information. In this way, EVPN applied to VXLAN networks enables VTEP discovery
and host information learning on the control plane instead of on the data plane.
Field Description
Ethernet Segment Identifier: Unique ID for defining the connection between local and remote devices.
MAC Address Length: Length of the host MAC address carried in the route.
address of a host.
• ARP advertisement
A MAC/IP route can carry both the MAC and IP addresses of a host, and therefore can be used to
advertise ARP entries between VTEPs. The MAC Address field identifies the MAC address of the host,
whereas the IP Address field identifies the IP address of the host. This type of MAC/IP route is called the
ARP route.
• IP route advertisement
In distributed VXLAN gateway scenarios, to implement Layer 3 communication between inter-subnet
hosts, the source and remote VTEPs that function as Layer 3 gateways must learn the host IP routes.
The VTEPs function as BGP EVPN peers to exchange MAC/IP routes so that they can obtain the host IP
routes. The IP Address field identifies the destination address of the IP route. In addition, the MPLS
Label2 field must carry the L3VNI. This type of MAC/IP route is called the integrated routing and
bridging (IRB) route.
An ARP route carries host MAC and IP addresses and an L2VNI. An IRB route carries host MAC and IP addresses, an
L2VNI, and an L3VNI. Therefore, IRB routes carry ARP routes and can be used to advertise IP routes as well as ARP
entries.
An ND route carries host MAC and IPv6 addresses and an L2VNI. An IRBv6 route carries host MAC and IPv6
addresses, an L2VNI, and an L3VNI. Therefore, IRBv6 routes carry ND routes and can be used to advertise both host
IPv6 routes and ND entries.
Field Description
Flags: Flags indicating whether leaf node information is required for the tunnel. This field is inapplicable in VXLAN scenarios.
Inclusive multicast routes are used on the VXLAN control plane for automatic VTEP discovery and dynamic
VXLAN tunnel establishment. VTEPs that function as BGP EVPN peers transmit L2VNIs and VTEPs' IP
addresses through inclusive multicast routes. The originating router's IP Address field identifies the local
VTEP's IP address; the MPLS Label field identifies an L2VNI. If the remote VTEP's IP address is reachable at
Layer 3, a VXLAN tunnel to the remote VTEP is established. In addition, the local end creates a VNI-based
ingress replication list and adds the peer VTEP IP address to the list for subsequent BUM packet forwarding.
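The handling of a received inclusive multicast route can be sketched as follows. The data structures and the reachability check are simplified assumptions for illustration.

```python
# On receiving an inclusive multicast (Type 3) route, the local VTEP records
# the peer in the per-VNI ingress replication list and, if the peer VTEP
# address is reachable at Layer 3, brings up a VXLAN tunnel to it.

ingress_replication = {}   # l2vni -> set of peer VTEP addresses
tunnels = set()            # established (local_vtep, peer_vtep) pairs

def on_inclusive_multicast_route(local_vtep, peer_vtep, l2vni, reachable):
    ingress_replication.setdefault(l2vni, set()).add(peer_vtep)
    if reachable:
        tunnels.add((local_vtep, peer_vtep))

# Example: a Type 3 route carrying L2VNI 20 arrives from peer 10.2.2.2.
on_inclusive_multicast_route("10.1.1.1", "10.2.2.2", 20, reachable=True)
```

The per-VNI list built here is what the data plane later walks when it head-end replicates BUM traffic.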
Type 5 Route: IP Prefix Route
Figure 3 shows the format of an IP prefix route.
Field Description
Ethernet Segment Identifier: Unique ID for defining the connection between local and remote devices.
An IP prefix route can carry either a host IP address or a network segment address.
• When carrying a host IP address, the route is used for IP route advertisement in distributed VXLAN gateway scenarios, in which it functions the same as an IRB route on the VXLAN control plane.
• When carrying a network segment address, the route can be advertised to allow hosts on a VXLAN
network to access the specified network segment or external network.
• Advantage: Inter-segment traffic can be centrally managed, and gateway deployment and management are easy.
• Disadvantages:
■ Forwarding paths are not optimal. Inter-segment Layer 3 traffic of data centers connected to the
same Layer 2 gateway must be transmitted to the centralized Layer 3 gateway for forwarding.
■ The ARP entry specification is a bottleneck. ARP entries must be generated for tenants on the Layer
3 gateway. However, only a limited number of ARP entries are allowed by the Layer 3 gateway,
impeding data center network expansion.
• Function as a Layer 2 VXLAN gateway to connect to physical servers or VMs and allow tenants to access
VXLANs.
• Function as a Layer 3 VXLAN gateway to perform VXLAN encapsulation and decapsulation to allow
inter-segment VXLAN communication and access to external networks.
• Flexible deployment. A leaf node can function as both Layer 2 and Layer 3 VXLAN gateways.
• Improved network expansion capabilities. A leaf node only needs to learn the ARP or ND entries of
servers attached to it. A centralized Layer 3 gateway in the same scenario, however, has to learn the
ARP or ND entries of all servers on the network. Therefore, the ARP or ND entry specification is no
longer a bottleneck on a distributed VXLAN gateway.
The following description of VXLAN tunnel establishment uses an IPv4 over IPv4 network as an example. Table 1 shows how implementation for the other combinations of underlay and overlay networks differs from that for IPv4 over IPv4.
IPv6 over IPv4: During dynamic MAC address learning, a Layer 2 gateway learns the local host's MAC address using neighbor solicitation (NS) packets sent by the host. In the inter-subnet interworking scenario, an IPv6 address must be configured for the Layer 3 gateway's VBDIF interface. During inter-subnet packet forwarding, the Layer 3 gateway searches its IPv6 routing table for the next-hop address of the destination IPv6 address, queries the ND table based on the next-hop address, and then obtains information such as the destination MAC address.
IPv4 over IPv6: The VTEPs at both ends of a VXLAN tunnel use IPv6 addresses, and IPv6 Layer 3 route reachability must be implemented between the VTEPs.
IPv6 over IPv6: The VTEPs at both ends of a VXLAN tunnel use IPv6 addresses, and IPv6 Layer 3 route reachability must be implemented between the VTEPs. During dynamic MAC address learning, a Layer 2 gateway learns the local host's MAC address using NS packets sent by the host. In the inter-subnet interworking scenario, an IPv6 address must be configured for the Layer 3 gateway's VBDIF interface. During inter-subnet packet forwarding, the Layer 3 gateway searches its IPv6 routing table for the next-hop address of the destination IPv6 address, queries the ND table based on the next-hop address, and then obtains information such as the destination MAC address.
A VXLAN tunnel is identified by a pair of VTEP IP addresses. A VXLAN tunnel can be statically created after
you configure local and remote VNIs, VTEP IP addresses, and an ingress replication list, and the tunnel goes
Up when the pair of VTEPs is reachable at Layer 3.
On the network shown in Figure 1, Leaf 1 connects to Host 1 and Host 3; Leaf 2 connects to Host 2; Spine
functions as a Layer 3 gateway.
• To allow Host 3 and Host 2 to communicate, Layer 2 VNIs and an ingress replication list must be
configured on Leaf 1 and Leaf 2. The peer VTEPs' IP addresses must be specified in the ingress
replication list. A VXLAN tunnel can be established between Leaf 1 and Leaf 2 if their VTEPs have Layer
3 routes to each other.
• To allow Host 1 and Host 2 to communicate, Layer 2 VNIs and an ingress replication list must be
configured on Leaf 1, Leaf 2, and also Spine. The peer VTEPs' IP addresses must be specified in the
ingress replication list. A VXLAN tunnel can be established between Leaf 1 and Spine and between Leaf 2 and Spine if they have Layer 3 routes to each other's VTEP addresses.
Although Host 1 and Host 3 both connect to Leaf 1, they belong to different subnets and must communicate
through the Layer 3 gateway (Spine). Therefore, a VXLAN tunnel is also required between Leaf 1 and Spine.
VXLAN supports dynamic MAC address learning to allow communication between tenants. MAC address
entries are dynamically created and do not need to be manually maintained, greatly reducing maintenance
workload. The following example illustrates dynamic MAC address learning for intra-subnet communication
on the network shown in Figure 2.
1. Host 3 sends an ARP request for Host 2's MAC address. The ARP request carries the source MAC
address being MAC3, destination MAC address being all Fs, source IP address being IP3, and
destination IP address being IP2.
2. Upon receipt of the ARP request, Leaf 1 determines that the Layer 2 sub-interface receiving the ARP
request belongs to a BD that has been bound to a VNI (20), meaning that the ARP request packet
must be transmitted over the VXLAN tunnel identified by VNI 20. Leaf 1 then learns the mapping
between Host 3's MAC address, BDID (Layer 2 broadcast domain ID), and inbound interface (Port1 for
the Layer 2 sub-interface) that has received the ARP request and generates a MAC address entry for
Host 3. The MAC address entry's outbound interface is Port1.
3. Leaf 1 then performs VXLAN encapsulation on the ARP request, with the VNI being the one bound to
the BD, source IP address in the outer IP header being the VTEP's IP address of Leaf 1, destination IP
address in the outer IP header being the VTEP's IP address of Leaf 2, source MAC address in the outer
Ethernet header being NVE1's MAC address of Leaf 1, and destination MAC address in the outer
Ethernet header being the MAC address of the next hop pointing to the destination IP address. Figure
3 shows the VXLAN packet format. The VXLAN packet is then transmitted over the IP network based
on the IP and MAC addresses in the outer headers and finally reaches Leaf 2.
4. After Leaf 2 receives the VXLAN packet, it decapsulates the packet and obtains the ARP request
originated from Host 3. Leaf 2 then learns the mapping between Host 3's MAC address, BDID, and
VTEP's IP address of Leaf 1 and generates a MAC address entry for Host 3. Based on the next hop
(VTEP's IP address of Leaf 1), the MAC address entry's outbound interface recurses to the VXLAN
tunnel destined for Leaf1.
5. Leaf 2 broadcasts the ARP request in the Layer 2 domain. Upon receipt of the ARP request, Host 2
finds that the destination IP address is its own IP address and saves Host 3's MAC address to the local
MAC address table. Host 2 then responds with an ARP reply.
So far, Host 2 has learned Host 3's MAC address. Therefore, Host 2 responds with a unicast ARP reply. The
ARP reply is transmitted to Host 3 in the same manner. After Host 2 and Host 3 learn the MAC address of
each other, they will subsequently communicate with each other in unicast mode.
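Steps 2 and 4 above amount to populating a MAC table whose outbound "interface" is either a local sub-interface or a VXLAN tunnel. A minimal sketch follows; the table and entry names are illustrative.

```python
# Per-device MAC tables keyed by (BD ID, MAC address). A locally learned
# entry points at a physical sub-interface; an entry learned from a
# decapsulated VXLAN packet points at the tunnel toward the source VTEP.

leaf1_table, leaf2_table = {}, {}

def learn(table, bd, mac, kind, where):
    table[(bd, mac)] = (kind, where)

# Step 2: Leaf 1 learns Host 3's MAC against the receiving sub-interface.
learn(leaf1_table, 10, "MAC3", "port", "Port1")
# Step 4: after decapsulation, Leaf 2 learns the same MAC against the
# VXLAN tunnel whose destination is Leaf 1's VTEP address.
learn(leaf2_table, 10, "MAC3", "vxlan_tunnel", "VTEP-Leaf1")
```

The same MAC thus resolves to a local port on one leaf and to a tunnel on the other, which is exactly what makes subsequent unicast forwarding possible in both directions.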
Dynamic MAC address learning is required only between hosts and Layer 3 gateways in inter-subnet communication
scenarios. The process is the same as that for intra-subnet communication.
1. After Leaf 1 receives Host 3's packet, it determines the Layer 2 BD of the packet based on the access
interface and VLAN information and searches for the outbound interface and encapsulation
information in the BD.
2. Leaf 1's VTEP performs VXLAN encapsulation based on the encapsulation information obtained and
forwards the packets through the outbound interface obtained.
3. Upon receipt of the VXLAN packet, Leaf 2's VTEP verifies the VXLAN packet based on the UDP
destination port number, source and destination IP addresses, and VNI. Leaf 2 obtains the Layer 2 BD
based on the VNI and performs VXLAN decapsulation to obtain the inner Layer 2 packet.
4. Leaf 2 obtains the destination MAC address of the inner Layer 2 packet, adds VLAN tags to the
packets based on the outbound interface and encapsulation information in the local MAC address
table, and forwards the packets to Host 2.
1. After Leaf 1 receives Terminal A's packet, it determines the Layer 2 BD of the packet based on the access interface and VLAN information.
2. Leaf 1's VTEP obtains the ingress replication list for the VNI, replicates packets based on the list, and
performs VXLAN encapsulation by adding outer headers. Leaf 1 then forwards the VXLAN packet
through the outbound interface.
3. Upon receipt of the VXLAN packet, Leaf 2's VTEP and Leaf 3's VTEP verify the VXLAN packet based on
the UDP destination port number, source and destination IP addresses, and VNI. Leaf 2/Leaf 3 obtains
the Layer 2 BD based on the VNI and performs VXLAN decapsulation to obtain the inner Layer 2
packet.
4. Leaf 2/Leaf 3 checks the destination MAC address of the inner Layer 2 packet and finds it a BUM MAC
address. Therefore, Leaf 2/Leaf 3 broadcasts the packet onto the network connected to the terminals
(not the VXLAN tunnel side) in the Layer 2 broadcast domain. Specifically, Leaf 2/Leaf 3 finds the
outbound interfaces and encapsulation information not related to the VXLAN tunnel, adds VLAN tags
to the packet, and forwards the packet to Terminal B/Terminal C.
Terminal B/Terminal C responds to Terminal A in the same process as intra-subnet known unicast packet forwarding.
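The ingress replication in step 2 can be sketched as follows: the VTEP looks up the per-VNI replication list and sends one encapsulated copy to each remote VTEP. The dict-based replication list and the pluggable `encapsulate` callback are illustrative assumptions, not device structures.

```python
def replicate_bum(vni, frame, replication_lists, encapsulate):
    """Return one (remote_vtep, vxlan_packet) pair per entry in the VNI's
    ingress replication list. An empty result means no remote VTEP has been
    learned for this VNI, so the BUM frame is not sent over any tunnel."""
    copies = []
    for remote_vtep in replication_lists.get(vni, []):
        copies.append((remote_vtep, encapsulate(vni, frame)))
    return copies
```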
1. After Leaf 1 receives Host 1's packet, it determines the Layer 2 BD of the packet based on the access
interface and VLAN information and searches for the outbound interface and encapsulation
information in the BD.
2. Leaf 1's VTEP performs VXLAN encapsulation based on the outbound interface and encapsulation
information and forwards the packet to Spine.
3. After Spine receives the VXLAN packet, it decapsulates the packet and finds that the destination MAC
address of the inner packet is the MAC address (MAC3) of the Layer 3 gateway interface (VBDIF10),
indicating that the packet must be forwarded at Layer 3.
4. Spine removes the inner Ethernet header, parses the destination IP address, and searches the routing
table for a next hop address. Spine then searches the ARP table based on the next hop address to
obtain the destination MAC address, VXLAN tunnel's outbound interface, and VNI.
5. Spine performs VXLAN encapsulation on the inner packet again and forwards the VXLAN packet to
Leaf 2, with the source MAC address in the inner Ethernet header being the MAC address (MAC4) of
the Layer 3 gateway interface (VBDIF20).
6. Upon receipt of the VXLAN packet, Leaf 2's VTEP verifies the VXLAN packet based on the UDP
destination port number, source and destination IP addresses, and VNI. Leaf 2 then obtains the Layer 2
broadcast domain based on the VNI and removes the outer headers to obtain the inner Layer 2
packet. It then searches for the outbound interface and encapsulation information in the Layer 2
broadcast domain.
7. Leaf 2 adds a VLAN tag to the packet based on the outbound interface and encapsulation information
and forwards the packet to Host 2.
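The Layer 3 gateway behavior in steps 4 and 5 (routing-table lookup for the next hop, ARP lookup for the rewrite information, then inner MAC rewrite before re-encapsulation) can be sketched with toy dict-based tables. The table shapes and names here are assumptions for illustration only.

```python
def gateway_forward(inner, routing_table, arp_table, gw_mac):
    """Steps 4-5 on the Spine: resolve the inner destination IP to a next hop,
    fetch the ARP entry (dest MAC, tunnel outbound interface, VNI), and rewrite
    the inner Ethernet header so the source MAC is the gateway's VBDIF MAC.
    'inner' is a dict with 'dst_ip', 'src_mac', and 'dst_mac' keys."""
    next_hop = routing_table[inner["dst_ip"]]
    dst_mac, tunnel_if, vni = arp_table[next_hop]
    rewritten = dict(inner, src_mac=gw_mac, dst_mac=dst_mac)
    return rewritten, tunnel_if, vni
```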
This mode uses EVPN to automatically discover VTEPs and dynamically establish VXLAN tunnels, providing
high flexibility and is applicable to large-scale VXLAN networking scenarios. It is recommended for
establishing VXLANs with centralized gateways.
The following uses an IPv4 over IPv4 network as an example. Table 1 shows the implementation differences
between IPv4 over IPv4 networks and other combinations of underlay and overlay networks.
IPv6 over IPv4: During dynamic MAC address learning, the Layer 2 gateway learns the local host's MAC
address through neighbor discovery. Hosts at both ends learn each other's MAC address by exchanging
Neighbor Solicitation (NS)/Neighbor Advertisement (NA) packets.
In the inter-subnet interworking scenario, an IPv6 address must be configured for the Layer 3 gateway's
VBDIF interface. During inter-subnet packet forwarding, the Layer 3 gateway needs to search its IPv6
routing table for the next hop address of the destination IPv6 address, query the ND table based on the
next hop address, and then obtain information such as the destination MAC address.
IPv4 over IPv6: A BGP EVPN IPv6 peer relationship is established between gateways. The VTEP IP
addresses are IPv6 addresses.
IPv6 over IPv6: A BGP EVPN IPv6 peer relationship is established between gateways. The VTEP IP
addresses are IPv6 addresses.
During dynamic MAC address learning, the Layer 2 gateway learns the local host's MAC address through
neighbor discovery. Hosts at both ends learn each other's MAC address by exchanging NS/NA packets.
In the inter-subnet interworking scenario, an IPv6 address must be configured for the Layer 3 gateway's
VBDIF interface. During inter-subnet packet forwarding, the Layer 3 gateway needs to search its IPv6
routing table for the next hop address of the destination IPv6 address, query the ND table based on the
next hop address, and then obtain information such as the destination MAC address.
A VXLAN tunnel is identified by a pair of VTEP IP addresses. If a local VTEP learns the same remote VTEP IP
address multiple times, only one VXLAN tunnel is established; packets for different services are encapsulated
with their respective VNIs before being forwarded through that tunnel.
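The note above can be illustrated with a tunnel table keyed by the (local VTEP, remote VTEP) pair: learning the same remote VTEP again for another VNI reuses the existing tunnel and only records the extra VNI. The dict/set representation is an assumption for illustration.

```python
def add_tunnel(tunnels, local_vtep, remote_vtep, vni):
    """Record that 'vni' is carried over the tunnel between the VTEP pair.
    Because the key is the address pair, repeated learning of the same remote
    VTEP never creates a second tunnel; it only adds a VNI to the one tunnel."""
    tunnels.setdefault((local_vtep, remote_vtep), set()).add(vni)
    return tunnels
```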
The following example illustrates how to dynamically establish a VXLAN tunnel using BGP EVPN between
Leaf1 and Leaf2 on the network shown in Figure 2.
1. First, a BGP EVPN peer relationship is established between Leaf1 and Leaf2. Then, Layer 2 broadcast
domains are created on Leaf1 and Leaf2, and VNIs are bound to the Layer 2 broadcast domains. Next,
an EVPN instance is configured in each Layer 2 broadcast domain, and an RD, export VPN target
(ERT), and import VPN target (IRT) are configured for the EVPN instance. After the local VTEP IP
address is configured on Leaf1 and Leaf2, they generate a BGP EVPN route and send it to each other.
The BGP EVPN route carries the local EVPN instance's ERT, Next_Hop attribute, and an inclusive
multicast route (Type 3 route defined in BGP EVPN). Figure 3 shows the format of an inclusive
multicast route, which comprises a prefix and a PMSI attribute. VTEP IP addresses are stored in the
Originating Router's IP Address field in the inclusive multicast route prefix, and VNIs are stored in the
MPLS Label field in the PMSI attribute. The VTEP IP address is also included in the Next_Hop attribute.
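The fields called out in step 1 can be summarized as a small record: the VTEP IP address goes in the route prefix's Originating Router's IP Address field and the VNI in the PMSI attribute's MPLS Label field. This is a sketch of the fields named above, not a wire-format BGP encoder.

```python
from dataclasses import dataclass

@dataclass
class InclusiveMulticastRoute:
    """Type 3 (inclusive multicast) EVPN route as described above: the prefix
    carries the originating VTEP's IP address, and the PMSI attribute's MPLS
    Label field carries the VNI."""
    originating_router_ip: str  # prefix: Originating Router's IP Address
    pmsi_mpls_label: int        # PMSI attribute: MPLS Label field = VNI

def build_type3_route(local_vtep_ip, vni):
    return InclusiveMulticastRoute(originating_router_ip=local_vtep_ip,
                                   pmsi_mpls_label=vni)
```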
2. After Leaf1 and Leaf2 receive a BGP EVPN route from each other, they match the ERT of the route
against the IRT of the local EVPN instance. If a match is found, the route is accepted. If no match is
found, the route is discarded. Leaf1 and Leaf2 obtain the peer VTEP IP address (from the Next_Hop
attribute) and VNI carried in the route. If the peer VTEP IP address is reachable at Layer 3, they
establish a VXLAN tunnel to the peer end. Moreover, the local end creates a VNI-based ingress
replication table and adds the peer VTEP IP address to the table for forwarding BUM packets.
The process of dynamically establishing VXLAN tunnels between Leaf1 and Spine and between Leaf2 and
Spine using BGP EVPN is similar to the preceding process.
A VPN target is an extended community attribute of BGP. An EVPN instance can have the IRT and ERT configured. The
local EVPN instance's ERT must match the remote EVPN instance's IRT for EVPN route advertisement. If not, VXLAN
tunnels cannot be dynamically established. If only one end can successfully accept the BGP EVPN route, this end can
establish a VXLAN tunnel to the other end, but cannot exchange data packets with the other end. The other end drops
packets after confirming that there is no VXLAN tunnel to the end that has sent these packets.
For details about VPN targets, see Basic BGP/MPLS IP VPN.
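The accept/discard decision in step 2 is a simple set intersection between the export targets carried by the route and the import targets of the local instance, which can be sketched as:

```python
def accept_evpn_route(route_erts, local_irts):
    """A received BGP EVPN route is accepted if any export VPN target (ERT) it
    carries matches an import VPN target (IRT) of the local EVPN instance;
    otherwise the route is discarded."""
    return bool(set(route_erts) & set(local_irts))
```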
1. Host3 sends dynamic ARP packets when it first communicates with Leaf1. Leaf1 learns the MAC
address of Host3 and the mapping between the BDID and packet inbound interface (that is, the
physical interface Port 1 corresponding to the Layer 2 sub-interface), and generates a MAC address
entry about Host3 in the local MAC address table, with the outbound interface being Port 1. Leaf1
generates a BGP EVPN route based on the ARP entry of Host3 and sends it to Leaf2. The BGP EVPN
route carries the local EVPN instance's ERT, Next_Hop attribute, and a Type 2 route (MAC/IP route)
defined in BGP EVPN. The Next_Hop attribute carries the local VTEP's IP address. The MAC Address
Length and MAC Address fields identify Host3's MAC address. The Layer 2 VNI is stored in the MPLS
Label1 field. Figure 5 shows the format of a MAC route or an IP route.
2. After receiving the BGP EVPN route from Leaf1, Leaf2 matches the ERT of the EVPN instance carried
in the route against the IRT of the local EVPN instance. If a match is found, the route is accepted. If no
match is found, the route is discarded. After accepting the route, Leaf2 obtains the MAC address of
Host3 and the mapping between the BDID and the VTEP IP address (Next_Hop attribute) of Leaf1,
and generates the MAC address entry of Host3 in the local MAC address table. The outbound
interface is obtained through recursion based on the next hop, and the final recursion result is the
VXLAN tunnel destined for Leaf1.
• When hosts on different subnets communicate with each other, only the hosts and Layer 3 gateway need to
dynamically learn MAC addresses from each other. This process is similar to the preceding process.
• Leaf nodes can learn the MAC addresses of hosts during data forwarding, depending on their capabilities to learn
MAC addresses from data packets. If VXLAN tunnels are established using BGP EVPN, leaf nodes can dynamically
learn the MAC addresses of hosts through BGP EVPN routes, rather than during data forwarding.
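The two learning paths above, local learning triggered by the host's ARP packets and remote learning from BGP EVPN Type 2 routes, both feed the same MAC table; the difference is what the outbound "interface" is. A minimal sketch, with a dict keyed by (BDID, MAC) as an illustrative assumption:

```python
def learn_local_mac(mac_table, mac, bd_id, port):
    """Local learning: the MAC was seen on an access port in a bridge domain,
    so the outbound interface is the physical port (e.g. Port 1)."""
    mac_table[(bd_id, mac)] = ("port", port)

def learn_remote_mac(mac_table, mac, bd_id, next_hop_vtep):
    """Remote learning from a Type 2 (MAC/IP) route: the outbound interface
    recurses, via the route's Next_Hop attribute, to the VXLAN tunnel whose
    destination is the advertising leaf's VTEP address."""
    mac_table[(bd_id, mac)] = ("vxlan-tunnel", next_hop_vtep)
```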
1. After Leaf1 receives a packet from Host3, it determines the Layer 2 broadcast domain of the packet
based on the access interface and VLAN information, and searches for the outbound interface and
encapsulation information in the broadcast domain.
2. Leaf1's VTEP performs VXLAN encapsulation based on the obtained encapsulation information and
forwards the packet through the outbound interface obtained.
3. After the VTEP on Leaf2 receives the VXLAN packet, it checks the UDP destination port number, source
and destination IP addresses, and VNI of the packet to determine the packet validity. Leaf2 obtains the
Layer 2 broadcast domain based on the VNI and performs VXLAN decapsulation to obtain the inner
Layer 2 packet.
4. Leaf2 obtains the destination MAC address of the inner Layer 2 packet, adds a VLAN tag to the packet
based on the outbound interface and encapsulation information in the local MAC address table, and
forwards the packet to Host2.
1. After Leaf1 receives a packet from TerminalA, it determines the Layer 2 broadcast domain of the
packet based on the access interface and VLAN information in the packet.
2. Leaf1's VTEP obtains the ingress replication list for the VNI, replicates the packet based on the list, and
performs VXLAN encapsulation. Leaf1 then forwards the VXLAN packet through the outbound
interface.
3. After the VTEP on Leaf2 or Leaf3 receives the VXLAN packet, it checks the UDP destination port
number, source and destination IP addresses, and VNI of the packet to determine the packet validity.
Leaf2 or Leaf3 obtains the Layer 2 broadcast domain based on the VNI and performs VXLAN
decapsulation to obtain the inner Layer 2 packet.
4. Leaf2 or Leaf3 checks the destination MAC address of the inner Layer 2 packet and finds that it is a
BUM MAC address. Therefore, Leaf2 or Leaf3 broadcasts the packet onto the network connected to
terminals (not the VXLAN tunnel side) in the Layer 2 broadcast domain. Specifically, Leaf2 or Leaf3
finds the outbound interfaces and encapsulation information not related to the VXLAN tunnel, adds
VLAN tags to the packet, and forwards the packet to TerminalB or TerminalC.
The forwarding process of a response packet from TerminalB/TerminalC to TerminalA is similar to the intra-subnet
forwarding process of known unicast packets.
1. After Leaf1 receives a packet from Host1, it determines the Layer 2 broadcast domain of the packet
based on the access interface and VLAN in the packet, and searches for the outbound interface and
encapsulation information in the Layer 2 broadcast domain.
2. The VTEP on Leaf1 performs VXLAN tunnel encapsulation based on the outbound interface and
encapsulation information, and forwards the packet to Spine.
3. Spine decapsulates the received VXLAN packet, finds that the destination MAC address in the inner
packet is MAC3 of the Layer 3 gateway interface VBDIF10, and determines that the packet needs to be
forwarded at Layer 3.
4. Spine removes the Ethernet header of the inner packet and parses the destination IP address. It then
searches the routing table based on the destination IP address to obtain the next hop address, and
searches ARP entries based on the next hop to obtain the destination MAC address, VXLAN tunnel
outbound interface, and VNI.
5. Spine re-encapsulates the VXLAN packet and forwards it to Leaf2. The source MAC address in the
Ethernet header of the inner packet is MAC4 of the Layer 3 gateway interface VBDIF20.
6. After the VTEP on Leaf2 receives the VXLAN packet, it checks the UDP destination port number, source
and destination IP addresses, and VNI of the packet to determine the packet validity. The VTEP then
obtains the Layer 2 broadcast domain based on the VNI, decapsulates the packet to obtain the inner
Layer 2 packet, and searches for the outbound interface and encapsulation information in the
corresponding Layer 2 broadcast domain.
7. Leaf2 adds a VLAN tag to the packet based on the outbound interface and encapsulation information,
and forwards the packet to Host2.
This mode supports the advertisement of host IP routes, MAC addresses, and ARP entries. For details, see
EVPN VXLAN Fundamentals. This mode is recommended for establishing VXLANs with distributed gateways.
The following uses an IPv4 over IPv4 network as an example. Table 1 shows the implementation differences
between IPv4 over IPv4 networks and other combinations of underlay and overlay networks.
IPv6 over IPv4: In the inter-subnet forwarding scenario where VXLAN tunnels are established using BGP
EVPN, if VXLAN gateways advertise IP prefix routes to each other, they can advertise only network segment
routes, and cannot advertise host routes.
During dynamic MAC address learning, the Layer 2 gateway learns the local host's MAC address through
neighbor discovery. Hosts at both ends learn each other's MAC address by exchanging NS/NA packets.
During inter-subnet packet forwarding, a gateway must search the IPv6 routing table in the local L3VPN
instance.
IPv4 over IPv6: A BGP EVPN IPv6 peer relationship is established between gateways. The VTEP IP
addresses are IPv6 addresses.
IPv6 over IPv6: A BGP EVPN IPv6 peer relationship is established between gateways. The VTEP IP
addresses are IPv6 addresses.
During dynamic MAC address learning, the Layer 2 gateway learns the local host's MAC address through
neighbor discovery. Hosts at both ends learn each other's MAC address by exchanging NS/NA packets.
During inter-subnet packet forwarding, a gateway must search the IPv6 routing table in the local L3VPN
instance.
A VXLAN tunnel is identified by a pair of VTEP IP addresses. If a local VTEP learns the same remote VTEP IP
address multiple times, only one VXLAN tunnel is established; packets for different services are encapsulated
with their respective VNIs before being forwarded through that tunnel.
In distributed gateway scenarios, BGP EVPN can be used to dynamically establish VXLAN tunnels in either of
the following situations:
Intra-subnet Communication
On the network shown in Figure 2, intra-subnet communication between Host2 and Host3 requires only
Layer 2 forwarding. The process for establishing a VXLAN tunnel using BGP EVPN is as follows.
1. First, a BGP EVPN peer relationship is established between Leaf1 and Leaf2. Then, Layer 2 broadcast
domains are created on Leaf1 and Leaf2, and VNIs are bound to the Layer 2 broadcast domains. Next,
an EVPN instance is configured in each Layer 2 broadcast domain, and an RD, an ERT, and an IRT are
configured for the EVPN instance. After the local VTEP IP address is configured on Leaf1 and Leaf2,
they generate a BGP EVPN route and send it to each other. The BGP EVPN route carries the local
EVPN instance's ERT and an inclusive multicast route (Type 3 route defined in BGP EVPN). Figure 3
shows the format of an inclusive multicast route, which comprises a prefix and a PMSI attribute. VTEP
IP addresses are stored in the Originating Router's IP Address field in the inclusive multicast route
prefix, and VNIs are stored in the MPLS Label field in the PMSI attribute. The VTEP IP address is also
included in the Next_Hop attribute.
2. After Leaf1 and Leaf2 receive a BGP EVPN route from each other, they match the ERT of the route
against the IRT of the local EVPN instance. If a match is found, the route is accepted. If no match is
found, the route is discarded. Leaf1 and Leaf2 obtain the peer VTEP IP address (from the Next_Hop
attribute) and VNI carried in the route. If the peer VTEP IP address is reachable at Layer 3, they
establish a VXLAN tunnel to the peer end. Moreover, the local end creates a VNI-based ingress
replication table and adds the peer VTEP IP address to the table for forwarding BUM packets.
A VPN target is an extended community attribute of BGP. An EVPN instance can have the IRT and ERT configured. The
local EVPN instance's ERT must match the remote EVPN instance's IRT for EVPN route advertisement. If not, VXLAN
tunnels cannot be dynamically established. If only one end can successfully accept the BGP EVPN route, this end can
establish a VXLAN tunnel to the other end, but cannot exchange data packets with the other end. The other end drops
packets after confirming that there is no VXLAN tunnel to the end that has sent these packets.
For details about VPN targets, see Basic BGP/MPLS IP VPN Fundamentals.
Inter-Subnet Communication
Inter-subnet communication between Host1 and Host2 requires Layer 3 forwarding. When VXLAN tunnels
are established using BGP EVPN, Leaf1 and Leaf2 must advertise host IP routes. Typically, 32-bit host IP
routes are advertised. Because different leaf nodes may connect to the same network segment on the
VXLAN network, the network segment routes advertised by the leaf nodes may conflict. This conflict may
cause host unreachability of some leaf nodes. Leaf nodes can advertise network segment routes in the
following scenarios:
• The network segment that a leaf node connects to is unique on a VXLAN, and a large number of
specific host routes are available. In this case, the routes of the network segment to which the host IP
routes belong can be advertised so that leaf nodes do not have to store all these routes.
• When hosts on a VXLAN need to access external networks, leaf nodes can advertise routes destined for
external networks onto the VXLAN to allow other leaf nodes to learn the routes.
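The trade-off above rests on longest-prefix matching: a 32-bit host route always wins over the covering network segment route, so segment routes are safe only where the segment is unique to one leaf node. A minimal sketch of the lookup, using the standard `ipaddress` module with toy route entries:

```python
import ipaddress

def longest_prefix_match(dst, routes):
    """Return the (network, next_hop) whose prefix contains dst and is
    longest, as a FIB lookup would; None if no route covers dst.
    'routes' is a list of (prefix_string, next_hop) pairs."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best
```

With both a /24 segment route and a /32 host route installed, traffic to the host follows the /32; only hosts without a specific route fall back to the segment route.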
Before establishing a VXLAN tunnel, perform configurations listed in the following table on Leaf1 and Leaf2.
Step: Create a Layer 2 broadcast domain and associate a Layer 2 VNI with the Layer 2 broadcast domain.
Function: A broadcast domain functions as a VXLAN network entity to transmit VXLAN data packets.
Step: Establish a BGP EVPN peer relationship between Leaf1 and Leaf2.
Function: This configuration is used to exchange BGP EVPN routes.
Step: Configure an EVPN instance in a Layer 2 broadcast domain, and configure an RD, an ERT, and an IRT
for the EVPN instance.
Function: This configuration is used to generate BGP EVPN routes.
Step: Configure L3VPN instances for tenants and bind the L3VPN instances to the VBDIF interfaces of the
Layer 2 broadcast domain.
Function: This configuration is used to differentiate and isolate IP routing tables of different tenants.
Step: Specify a Layer 3 VNI for an L3VPN instance.
Function: This configuration allows the leaf nodes to determine the L3VPN routing table for forwarding
data packets.
Step: Configure the export VPN target (eERT) and import VPN target (eIRT) for EVPN routes in the L3VPN
instance.
Function: This configuration controls the local L3VPN instance to advertise and receive BGP EVPN routes.
Step: Configure the type of route to be advertised between Leaf1 and Leaf2.
Function: This configuration is used to advertise IP routes between Host1 and Host2. Two types of routes
are available, IRB routes and IP prefix routes, which can be selected as needed. IRB routes advertise only
32-bit host IP routes. IRB routes include ARP routes. Therefore, if only 32-bit host IP routes need to be
advertised, it is recommended that IRB routes be advertised.
Dynamic VXLAN tunnel establishment varies depending on how host IP routes are advertised.
• Host IP routes are advertised through IRB routes. (Figure 4 shows the process.)
1. When Host1 communicates with Leaf1 for the first time, Leaf1 learns the ARP entry of Host1
after receiving dynamic ARP packets. Leaf1 then finds the L3VPN instance bound to the VBDIF
interface of the Layer 2 broadcast domain where Host1 resides, and obtains the Layer 3 VNI
associated with the L3VPN instance. The EVPN instance of Leaf1 then generates an IRB route
based on the information obtained. Figure 5 shows the IRB route. The host IP address is stored in
the IP Address Length and IP Address fields; the Layer 3 VNI is stored in the MPLS Label2 field.
2. Leaf1 generates and sends a BGP EVPN route to Leaf2. The BGP EVPN route carries the local
EVPN instance's ERT, extended community attribute, Next_Hop attribute, and the IRB route. The
extended community attribute carries the tunnel type (VXLAN tunnel) and local VTEP MAC
address; the Next_Hop attribute carries the local VTEP IP address.
3. After Leaf2 receives the BGP EVPN route from Leaf1, Leaf2 processes the route as follows:
• If the ERT carried in the route is the same as the IRT of the local EVPN instance, the route is
accepted. After the EVPN instance obtains IRB routes, it can extract ARP routes from the IRB
routes for the advertisement of host ARP entries.
• If the ERT carried in the route is the same as the eIRT of the local L3VPN instance, the route
is accepted. Then, the L3VPN instance obtains the IRB route carried in the route, extracts the
host IP address and Layer 3 VNI of Host1, and saves the host IP route of Host1 to the routing
table. The outbound interface is obtained through recursion based on the next hop of the
route. The final recursion result is the VXLAN tunnel to Leaf1, as shown in Figure 6.
A BGP EVPN route is discarded only when its ERT matches neither the local EVPN instance's IRT nor
the local L3VPN instance's eIRT.
• If the route is accepted by the EVPN instance or L3VPN instance, Leaf2 obtains Leaf1's VTEP
IP address from the Next_Hop attribute. If the VTEP IP address is routable at Layer 3, a
VXLAN tunnel to Leaf1 is established.
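The recursion described above can be sketched as follows: the route's next hop (the remote VTEP IP from the Next_Hop attribute) is resolved against the local tunnel table, and the route's outbound interface becomes the matching VXLAN tunnel. The dict-based tunnel table is an illustrative assumption.

```python
def recurse_to_tunnel(route_next_hop, tunnel_table):
    """Resolve a BGP EVPN route's Next_Hop (a remote VTEP IP address) to the
    local VXLAN tunnel whose destination matches it. None means no tunnel to
    that VTEP exists yet, so the route cannot be installed with a usable
    outbound interface."""
    for tunnel_name, remote_vtep in tunnel_table.items():
        if remote_vtep == route_next_hop:
            return tunnel_name
    return None
```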
1. Leaf1 generates a direct route to Host1's IP address. An L3VPN instance on Leaf1 is configured to
import the direct route, so that Host1's IP route is saved to the routing table of the L3VPN instance
and the Layer 3 VNI associated with the L3VPN instance is added to the route. Figure 8 shows the
local host IP route.
If network segment route advertisement is required, use a dynamic routing protocol, such as OSPF. Then
configure an L3VPN instance to import the routes of the dynamic routing protocol.
2. Leaf1 is configured to advertise IP prefix routes in the L3VPN instance. Figure 9 shows the IP
prefix route. The host IP address is stored in the IP Prefix Length and IP Prefix fields; the Layer 3
VNI is stored in the MPLS Label field. Leaf1 generates and sends a BGP EVPN route to Leaf2. The
BGP EVPN route carries the local L3VPN instance's eERT, extended community attribute,
Next_Hop attribute, and the IP prefix route. The extended community attribute carries the tunnel
type (VXLAN tunnel) and local VTEP MAC address; the Next_Hop attribute carries the local VTEP
IP address.
3. After Leaf2 receives the BGP EVPN route from Leaf1, Leaf2 processes the route as follows:
• Matches the eERT of the route against the eIRT of the local L3VPN instance. If a match is
found, the route is accepted. Then, the L3VPN instance obtains the IP prefix type route
carried in the route, extracts the host IP address and Layer 3 VNI of Host1, and saves the
host IP route of Host1 to the routing table. The outbound interface is obtained through
recursion based on the next hop of the route. The final recursion result is the VXLAN tunnel
to Leaf1, as shown in Figure 10.
• If the route is accepted by the EVPN instance or L3VPN instance, Leaf2 obtains Leaf1's VTEP
IP address from the Next_Hop attribute. If the VTEP IP address is routable at Layer 3, a
VXLAN tunnel to Leaf1 is established.
1. Host3 sends dynamic ARP packets when it first communicates with Leaf1. Leaf1 learns the MAC
address of Host3 and the mapping between the BDID and packet inbound interface (that is, the
physical interface Port 1 corresponding to the Layer 2 sub-interface), and generates a MAC address
entry about Host3 in the local MAC address table, with the outbound interface being Port 1. Leaf1
generates a BGP EVPN route based on the ARP entry of Host3 and sends it to Leaf2. The BGP EVPN
route carries the local EVPN instance's ERT, Next_Hop attribute, and a Type 2 route (MAC/IP route)
defined in BGP EVPN. The Next_Hop attribute carries the local VTEP's IP address. The MAC Address
Length and MAC Address fields identify Host3's MAC address. The Layer 2 VNI is stored in the MPLS
Label1 field. Figure 12 shows the format of a MAC route or an IP route.
2. After receiving the BGP EVPN route from Leaf1, Leaf2 matches the ERT of the EVPN instance carried
in the route against the IRT of the local EVPN instance. If a match is found, the route is accepted. If no
match is found, the route is discarded. After accepting the route, Leaf2 obtains the MAC address of
Host3 and the mapping between the BDID and the VTEP IP address (Next_Hop attribute) of Leaf1,
and generates the MAC address entry of Host3 in the local MAC address table. The outbound
interface is obtained through recursion based on the next hop, and the final recursion result is the
VXLAN tunnel destined for Leaf1.
Leaf nodes can learn the MAC addresses of hosts during data forwarding, depending on their capabilities to learn MAC
addresses from data packets. If VXLAN tunnels are established using BGP EVPN, leaf nodes can dynamically learn the
MAC addresses of hosts through BGP EVPN routes, rather than during data forwarding.
1. After Leaf1 receives a packet from Host3, it determines the Layer 2 broadcast domain of the packet
based on the access interface and VLAN information, and searches for the outbound interface and
encapsulation information in the broadcast domain.
2. Leaf1's VTEP performs VXLAN encapsulation based on the obtained encapsulation information and
forwards the packet through the outbound interface obtained.
3. After the VTEP on Leaf2 receives the VXLAN packet, it checks the UDP destination port number, source
and destination IP addresses, and VNI of the packet to determine the packet validity. Leaf2 obtains the
Layer 2 broadcast domain based on the VNI and performs VXLAN decapsulation to obtain the inner
Layer 2 packet.
4. Leaf2 obtains the destination MAC address of the inner Layer 2 packet, adds a VLAN tag to the packet
based on the outbound interface and encapsulation information in the local MAC address table, and
forwards the packet to Host2.
1. After Leaf1 receives a packet from TerminalA, it determines the Layer 2 broadcast domain of the
packet based on the access interface and VLAN information in the packet.
2. Leaf1's VTEP obtains the ingress replication list for the VNI, replicates the packet based on the list, and
performs VXLAN encapsulation. Leaf1 then forwards the VXLAN packet through the outbound
interface.
3. After the VTEP on Leaf2 or Leaf3 receives the VXLAN packet, it checks the UDP destination port
number, source and destination IP addresses, and VNI of the packet to determine the packet validity.
Leaf2 or Leaf3 obtains the Layer 2 broadcast domain based on the VNI and performs VXLAN
decapsulation to obtain the inner Layer 2 packet.
4. Leaf2 or Leaf3 checks the destination MAC address of the inner Layer 2 packet and finds that it is a
BUM MAC address. Therefore, Leaf2 or Leaf3 broadcasts the packet onto the network connected to
terminals (not the VXLAN tunnel side) in the Layer 2 broadcast domain. Specifically, Leaf2 or Leaf3
finds the outbound interfaces and encapsulation information not related to the VXLAN tunnel, adds
VLAN tags to the packet, and forwards the packet to TerminalB or TerminalC.
The forwarding process of a response packet from TerminalB/TerminalC to TerminalA is similar to the intra-subnet
forwarding process of known unicast packets.
1. After Leaf1 receives a packet from Host1, it finds that the destination MAC address of the packet is a
gateway MAC address, which means the packet must be forwarded at Layer 3.
2. Leaf1 first determines the Layer 2 broadcast domain of the packet based on the inbound interface and
then finds the L3VPN instance to which the VBDIF interface of the Layer 2 broadcast domain is bound.
Leaf1 searches the routing table of the L3VPN instance for a matching host route based on the
destination IP address of the packet and obtains the Layer 3 VNI and next hop address corresponding
to the route. Figure 16 shows the host route in the L3VPN routing table. If the outbound interface is a
VXLAN tunnel, Leaf1 determines that VXLAN encapsulation is required and then:
• Obtains MAC addresses based on the VXLAN tunnel's source and destination IP addresses and
replaces the source and destination MAC addresses in the inner Ethernet header.
• Encapsulates the VXLAN tunnel's destination and source IP addresses in the outer header. The
source MAC address is the MAC address of the outbound interface on Leaf1, and the destination
MAC address is the MAC address of the next hop.
3. The VXLAN packet is then transmitted over the IP network based on the IP and MAC addresses in the
outer headers and finally reaches Leaf2.
4. After Leaf2 receives the VXLAN packet, it decapsulates the packet and finds that the destination MAC
address is its own MAC address. It then determines that the packet must be forwarded at Layer 3.
5. Leaf2 finds the corresponding L3VPN instance based on the Layer 3 VNI carried in the packet. Then,
Leaf2 searches the routing table of the L3VPN instance and finds that the next hop of the packet is
the gateway interface address. Leaf2 then replaces the destination MAC address with the MAC address
of Host2, replaces the source MAC address with the MAC address of Leaf2, and forwards the packet to
Host2.
When Huawei devices need to communicate with non-Huawei devices, ensure that the non-Huawei devices use the
same forwarding mode. Otherwise, the Huawei devices may fail to communicate with non-Huawei devices.
Background
To meet the requirements of inter-regional operations, user access, geographical redundancy, and other
scenarios, an increasing number of enterprises deploy DCs across regions. Data Center Interconnect (DCI) is
a solution that enables communication between VMs in different DCs. Using technologies such as VXLAN
and BGP EVPN, DCI securely and reliably transmits DC packets over carrier networks. Three-segment VXLAN
can be configured to enable inter-subnet communication between VMs in different DCs.
Benefits
Three-segment VXLAN enables Layer 3 communication between DCs and offers the following benefits to
users:
• Different DCs do not need to run the same routing protocol for communication.
Implementation
Three-segment VXLAN establishes one VXLAN tunnel segment in each of the DCs and also establishes one
VXLAN tunnel segment between the DCs. As shown in Figure 1, BGP EVPN is used to create VXLAN tunnels
in distributed gateway mode within both DC A and DC B so that the VMs in each DC can communicate with
each other. Leaf2 and Leaf3 are the edge devices within the DCs that connect to the backbone network. BGP
EVPN is used to configure a VXLAN tunnel between Leaf2 and Leaf3, so that the VXLAN packets received by
one DC can be decapsulated, re-encapsulated, and sent to the peer DC. This process provides E2E transport
for inter-DC VXLAN packets and ensures that VMs in different DCs can communicate with each other.
Control Plane
The following describes how three-segment VXLAN tunnels are established.
The process of advertising routes on Leaf1 and Leaf4 is not described in this section. For details, see VXLAN Tunnel
Establishment.
1. Leaf4 learns the IP address of VMb2 in DC B and saves it to the routing table for the L3VPN instance.
Leaf4 then sends a BGP EVPN route to Leaf3.
2. As shown in Figure 2, Leaf3 receives the BGP EVPN route and obtains the host IP route contained in it.
Leaf3 then establishes a VXLAN tunnel to Leaf 4 according to the process described in VXLAN Tunnel
Establishment. Leaf3 sets the next hop of the route to its own VTEP address, re-encapsulates the route
with the Layer 3 VNI of the L3VPN instance, and sets the source MAC address of the route to its own
MAC address. Finally, Leaf3 sends the re-encapsulated BGP EVPN route to Leaf2.
3. Leaf2 receives the BGP EVPN route and obtains the host IP route contained in it. Leaf2 then
establishes a VXLAN tunnel to Leaf3 according to the process described in VXLAN Tunnel
Establishment. Leaf2 sets the next hop of the route to its own VTEP address, re-encapsulates the route
with the Layer 3 VNI of the L3VPN instance, and sets the source MAC address of the route to its own
MAC address. Finally, Leaf2 sends the re-encapsulated BGP EVPN route to Leaf1.
4. Leaf1 receives the BGP EVPN route and establishes a VXLAN tunnel to Leaf2 according to the process
described in VXLAN Tunnel Establishment.
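The re-origination in steps 2 and 3 can be sketched as follows. This is a simplified model, not device software; the node names and VNI values are assumptions:

```python
def reoriginate(route, local_vtep, local_l3vni, local_mac):
    """Border-leaf re-origination: keep the host IP prefix, but replace the
    next hop, Layer 3 VNI, and source MAC with the local node's values."""
    out = dict(route)   # do not modify the received route in place
    out.update(next_hop=local_vtep, l3_vni=local_l3vni, src_mac=local_mac)
    return out

# Route for VMb2 as originated by Leaf4, then re-originated twice.
route = {"prefix": "VMb2/32", "next_hop": "leaf4", "l3_vni": 100, "src_mac": "mac4"}
via_leaf3 = reoriginate(route, "leaf3", 200, "mac3")       # step 2
via_leaf2 = reoriginate(via_leaf3, "leaf2", 300, "mac2")   # step 3
```

Each VXLAN tunnel segment therefore sees only its own neighbor as the next hop, which is what stitches the three segments together.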
A general overview of the packet forwarding process on Leaf1 and Leaf4 is provided below. For detailed information, see
Intra-subnet Packet Forwarding.
1. Leaf1 receives Layer 2 packets destined for VMb2 from VMa1 and determines that the destination
MAC addresses in these packets are all gateway interface MAC addresses. Leaf1 then terminates these
Layer 2 packets and finds the L3VPN instance corresponding to the BDIF interface through which
VMa1 accesses the broadcast domain. Leaf1 then searches the L3VPN instance routing table for the
VMb2 host route, encapsulates the received packets as VXLAN packets, and sends them to Leaf2 over
the VXLAN tunnel.
2. As shown in Figure 3, Leaf2 receives and parses these VXLAN packets. After finding the L3VPN
instance corresponding to the Layer 3 VNI of the packets, Leaf2 searches the L3VPN instance routing
table for the VMb2 host route. Leaf2 then re-encapsulates these VXLAN packets (setting the Layer 3
VNI and inner destination MAC address to the Layer 3 VNI and MAC address carried in the VMb2 host
route sent by Leaf3). Finally, Leaf2 sends these packets to Leaf3.
3. As shown in Figure 3, Leaf3 receives and parses these VXLAN packets. After finding the L3VPN
instance corresponding to the Layer 3 VNI of the packets, Leaf3 searches the L3VPN instance routing
table for the VMb2 host route. Leaf3 then re-encapsulates these VXLAN packets (setting the Layer 3
VNI and inner destination MAC address to the Layer 3 VNI and MAC address carried in the VMb2 host
route sent by Leaf4). Finally, Leaf3 sends these packets to Leaf4.
4. Leaf4 receives and parses these VXLAN packets. After finding the L3VPN instance corresponding to the
Layer 3 VNI of the packets, Leaf4 searches the L3VPN instance routing table for the VMb2 host route.
Using this routing information, Leaf4 forwards these packets to VMb2.
Other Functions
Local leaking of EVPN routes is needed in scenarios where different VPN instances are used for the access of
different services in a DC but an external VPN instance is used to communicate with other DCs, hiding the
DC's internal VPN instance allocation from the outside. Depending on route sources, this
function can be used in the following scenarios:
Local VPN routes are advertised through EVPN after being locally leaked
1. The function to import VPN routes to a local VPN instance named vpn1 is configured in the BGP VPN
instance IPv4 or IPv6 address family.
2. vpn1 sends received routes to the VPNv4 or VPNv6 component, which then checks whether the ERT of
vpn1 is the same as the IRT of the external VPN instance vpn2. If they are the same, the VPNv4 or
VPNv6 component imports these routes to vpn2.
3. vpn2 sends locally leaked routes to the EVPN component and advertises these routes as BGP EVPN
routes to peers. In this case, vpn2 must be able to advertise locally leaked routes as BGP EVPN routes.
Remote public network routes are advertised through EVPN after being locally leaked
1. The EVPN component receives public network routes from a remote peer.
2. The received routes are leaked to the local VPN instance vpn1.
3. vpn1 sends received routes to the VPNv4 or VPNv6 component, which then checks whether the ERT of
vpn1 is the same as the IRT of vpn2. If they are the same, the VPNv4 or VPNv6 component imports
these routes to vpn2. In this case, vpn1 must be able to perform remote and local route leaking in
succession.
4. vpn2 sends locally leaked routes to the EVPN component and advertises these routes as BGP EVPN
routes to peers. In this case, vpn2 must be able to advertise locally leaked routes as BGP EVPN routes.
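The ERT/IRT check that gates local leaking in both scenarios can be sketched as follows. This is an illustrative model only; the RT values and function name are assumptions:

```python
def local_leak(routes, vpn1_export_rts, vpn2_import_rts):
    """Leak routes from vpn1 to vpn2 only when an export RT (ERT) of vpn1
    matches an import RT (IRT) of vpn2, as the VPNv4/VPNv6 component does."""
    if set(vpn1_export_rts) & set(vpn2_import_rts):
        return list(routes)   # imported into vpn2
    return []                 # no RT match: nothing is leaked
```

For example, with ERT 100:1 on vpn1 and IRTs 100:1 and 200:1 on vpn2, the routes are leaked; with disjoint RTs, they are not.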
Background
Figure 1 shows the scenario where three-segment VXLAN is deployed to implement Layer 2 interconnection
between DCs. VXLAN tunnels are configured both within DC A and DC B and between transit leaf nodes in
both DCs. To enable communication between VM1 and VM2, implement Layer 2 communication between
DC A and DC B. If the VXLAN tunnels within DC A and DC B use the same VXLAN Network Identifier (VNI),
this VNI can also be used to establish a VXLAN tunnel between Transit Leaf1 and Transit Leaf2. In practice,
however, different DCs have their own VNI spaces. Therefore, the VXLAN tunnels within DC A and DC B tend
to use different VNIs. In this case, to establish a VXLAN tunnel between Transit Leaf1 and Transit Leaf2, VNI
conversion must be implemented.
Benefits
This solution offers the following benefits to users:
• Decouples the VNI space of the network within a DC from that of the network between DCs, simplifying
network maintenance.
• Isolates network faults within a DC from those between DCs, facilitating fault location.
Principles
Currently, this solution is implemented in local VNI mode, which is similar to downstream label allocation.
The local VNI of the peer transit leaf node functions as the outbound VNI, which the local transit leaf node
uses to encapsulate the VXLAN packets it sends to the peer transit leaf node.
Control Plane
1. Server Leaf1 learns VM1's MAC address, generates a BGP EVPN route, and sends it to Transit Leaf1.
The BGP EVPN route contains the following information:
• Type 2 route: EVPN instance's RD value, VM1's MAC address, and Server Leaf1's local VNI.
2. Upon receipt, Transit Leaf1 adds the BGP EVPN route to its local EVPN instance and generates a MAC
address entry for VM1 in the EVPN instance-bound BD. Based on the next hop and encapsulated
tunnel type, the MAC address entry's outbound interface recurses to the VXLAN tunnel destined for
Server Leaf1. The VNI in VXLAN tunnel encapsulation information is Transit Leaf1's local VNI.
3. Transit Leaf1 re-originates the BGP EVPN route and then advertises the route to Transit Leaf2. The re-
originated BGP EVPN route contains the following information:
• Type 2 route: EVPN instance's RD value, VM1's MAC address, and Transit Leaf1's local VNI.
4. Upon receipt, Transit Leaf2 adds the re-originated BGP EVPN route to its local EVPN instance and
generates a MAC address entry for VM1 in the EVPN instance-bound BD. Based on the next hop and
encapsulated tunnel type, the MAC address entry's outbound interface recurses to the VXLAN tunnel
destined for Transit Leaf1. The outbound VNI in VXLAN tunnel encapsulation information is Transit
Leaf1's local VNI.
5. Transit Leaf2 re-originates the BGP EVPN route and then advertises the route to Server Leaf2. The re-
originated BGP EVPN route contains the following information:
• Type 2 route: EVPN instance's RD value, VM1's MAC address, and Transit Leaf2's local VNI.
6. Upon receipt, Server Leaf2 adds the re-originated BGP EVPN route to its local EVPN instance and
generates a MAC address entry for VM1 in the EVPN instance-bound BD. Based on the next hop and
encapsulated tunnel type, the MAC address entry's outbound interface recurses to the VXLAN tunnel
destined for Transit Leaf2. The VNI in VXLAN tunnel encapsulation information is Server Leaf2's local
VNI.
The preceding process uses MAC address learning for VM1 as an example. The process for VM2 is the same and
is not described here.
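The downstream-style VNI handling between the transit leaf nodes (steps 3 to 5) can be sketched as follows. This is an illustrative model; the node names and VNI values are assumptions:

```python
def install_mac_entry(route):
    """Step 4: install a MAC entry whose outbound VNI is the VNI carried in
    the re-originated route, i.e. the advertising transit leaf's local VNI."""
    return {"mac": route["mac"], "tunnel_dst": route["next_hop"], "out_vni": route["vni"]}

def reoriginate_local(route, my_name, my_local_vni):
    """Steps 3 and 5: re-advertise the route with this node's own local VNI
    and itself as the next hop."""
    return dict(route, next_hop=my_name, vni=my_local_vni)

# Route for VM1 as re-originated by Transit Leaf1 (local VNI 20).
r = {"mac": "VM1", "next_hop": "TransitLeaf1", "vni": 20}
entry = install_mac_entry(r)                          # Transit Leaf2, step 4
readv = reoriginate_local(r, "TransitLeaf2", 30)      # Transit Leaf2, step 5
```

The sketch shows why the mode decouples VNI spaces: each node advertises its own local VNI and uses the peer's advertised VNI as the outbound VNI.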
Forwarding Plane
Figure 3 shows how known unicast packets are forwarded. The following example process shows how VM2
sends Layer 2 packets to VM1:
Figure 3 Known unicast packet forwarding with VXLAN mapping in local VNI mode
1. After receiving a Layer 2 packet from VM2 through a BD Layer 2 sub-interface, Server Leaf2 searches
the BD's MAC address table based on the destination MAC address for the VXLAN tunnel's outbound
interface and obtains VXLAN tunnel encapsulation information (local VNI, destination VTEP IP address,
and source VTEP IP address). Based on the obtained information, the Layer 2 packet is encapsulated
through the VXLAN tunnel and then forwarded to Transit Leaf2.
2. Upon receipt, Transit Leaf2 decapsulates the VXLAN packet, finds the target BD based on the VNI,
searches the BD's MAC address table based on the destination MAC address for the VXLAN tunnel's
outbound interface, and obtains the VXLAN tunnel encapsulation information (outbound VNI,
destination VTEP IP address, and source VTEP IP address). Based on the obtained information, the
Layer 2 packet is encapsulated through the VXLAN tunnel and then forwarded to Transit Leaf1.
3. Upon receipt, Transit Leaf1 decapsulates the VXLAN packet. Because the packet's VNI is Transit Leaf1's
local VNI, the target BD can be found based on this VNI. Transit Leaf1 also searches the BD's MAC
address table based on the destination MAC address for the VXLAN tunnel's outbound interface and
obtains the VXLAN tunnel encapsulation information (local VNI, destination VTEP IP address, and
source VTEP IP address). Based on the obtained information, the Layer 2 packet is encapsulated
through the VXLAN tunnel and then forwarded to Server Leaf1.
4. Upon receipt, Server Leaf1 decapsulates the VXLAN packet and forwards it at Layer 2 to VM1.
In the scenario with three-segment VXLAN for Layer 2 interworking, BUM packet forwarding is the same as that in the
common VXLAN scenario except that the split horizon group is used to prevent loops. The similarities are not described
here.
• After receiving BUM packets from a Server Leaf node in the same DC, a Transit Leaf node obtains the split horizon
group to which the source VTEP belongs. Because all nodes in the same DC belong to the default split horizon
group, BUM packets will not be replicated to other Server Leaf nodes within the DC. Because the peer Transit Leaf
node belongs to a different split horizon group, BUM packets will be replicated to the peer Transit Leaf node.
• Upon receipt, the peer Transit Leaf node obtains the split horizon group to which the source VTEP belongs. Because
the Transit Leaf nodes at both ends belong to the same split horizon group, BUM packets will not be replicated to
the peer Transit Leaf node. Because the Server Leaf nodes within the DC belong to a different split horizon group,
BUM packets will be replicated to them.
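The split horizon check described in the two notes above can be sketched as follows. This is a simplified model; the group numbers and neighbor names are assumptions:

```python
DEFAULT_GROUP = 0   # Server Leaf nodes within a DC (default split horizon group)
PEER_GROUP = 1      # group shared by the two Transit Leaf nodes

def replicate(src_group, neighbors):
    """Replicate a BUM packet only to neighbors whose split horizon group
    differs from the group of the source VTEP."""
    return sorted(n for n, g in neighbors.items() if g != src_group)

# Transit Leaf1's neighbors and their groups.
neighbors = {"ServerLeaf1": DEFAULT_GROUP, "TransitLeaf2": PEER_GROUP}
```

So a packet arriving from a Server Leaf node is replicated only toward the peer Transit Leaf node, and a packet arriving from the peer Transit Leaf node is replicated only toward the local Server Leaf nodes, which prevents loops.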
Basic Concepts
The network in Figure 1 shows a scenario where an enterprise site (CPE) connects to a data center. The VPN
GWs (PE1 and PE2) and CPE are connected through VXLAN tunnels to exchange L2/L3 service traffic between
the CPE and data center. The data center gateway (CE1) is dual-homed to PE1 and PE2 to access the VXLAN
network for enhanced network access reliability. If one PE fails, services can be rapidly switched to the other
PE, minimizing service loss.
PE1 and PE2 on the network use the same virtual address as an NVE interface address (Anycast VTEP
address) at the network side. In this way, the CPE is aware of only one remote NVE interface. After the CPE
establishes a VXLAN tunnel with this virtual address, the packets from the CPE can reach CE1 through either
PE1 or PE2. However, when a single-homed CE, such as CE2 or CE3, exists on the network, the packets from
the CPE to the single-homed CE may need to detour to the other PE after reaching one PE. To achieve PE1-
PE2 reachability, a bypass VXLAN tunnel needs to be established between PE1 and PE2. To establish this
tunnel, an EVPN peer relationship is established between PE1 and PE2, and different addresses, namely,
bypass VTEP addresses, are configured for PE1 and PE2.
Control Plane
• PE1 and PE2 exchange Inclusive Multicast routes (Type 3) whose source IP address is their shared
anycast VTEP address. Each route carries a bypass VXLAN extended community attribute, which contains
the bypass VTEP address of PE1 or PE2.
• After receiving the Inclusive Multicast route from each other, PE1 and PE2 consider that they form an
anycast relationship based on the following details: The source IP address (anycast VTEP address) of the
route is identical to PE1's and PE2's local virtual addresses, and the route carries a bypass VXLAN
extended community attribute. PE1 and PE2 then establish a bypass VXLAN tunnel between them.
• PE1 and PE2 learn the MAC addresses of the CEs through the upstream packets from the AC side and
advertise the MAC/IP routes (Type 2) to each other. The routes carry the ESIs of the access links of the
CEs, information about the VLANs that the CEs access, and the bypass VXLAN extended community
attribute.
• PE1 and PE2 learn the MAC address of the CPE through downstream packets from the network side.
After learning that the next-hop address of the MAC route can be recursed to a static VXLAN tunnel,
PE1 and PE2 advertise the route to each other through a MAC/IP route, without changing the next-hop
address.
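The anycast-pair detection in the second bullet can be sketched as follows. This is an illustrative check only; the addresses and attribute name are assumptions:

```python
ANYCAST_VTEP = "10.9.9.9"   # shared virtual NVE address of PE1 and PE2 (assumed)

def is_anycast_peer(route):
    """PE1/PE2 treat the sender as an anycast peer when an Inclusive Multicast
    route's source IP equals the shared anycast VTEP address and the route
    carries the bypass VXLAN extended community attribute."""
    return route["src_ip"] == ANYCAST_VTEP and "bypass_vtep" in route
```

Once the check passes, each PE uses the carried bypass VTEP address, rather than the anycast address, as the endpoint of the bypass VXLAN tunnel.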
■ Uplink
As shown in Figure 2, after receiving Layer 2 unicast packets destined for the CPE from CE1, CE2,
and CE3, PE1 and PE2 search for their local MAC address table to obtain outbound interfaces,
perform VXLAN encapsulation on the packets, and forward them to the CPE.
■ Downlink
As shown in Figure 3:
After receiving a Layer 2 unicast packet sent by the CPE to CE1, PE1 performs VXLAN decapsulation
on the packet, searches the local MAC address table for the destination MAC address, obtains the
outbound interface, and forwards the packet to CE1.
After receiving a Layer 2 unicast packet sent by the CPE to CE2, PE1 performs VXLAN decapsulation
on the packet, searches the local MAC address table for the destination MAC address, obtains the
outbound interface, and forwards the packet to CE2.
After receiving a Layer 2 unicast packet sent by the CPE to CE3, PE1 performs VXLAN decapsulation
on the packet, searches the local MAC address table for the destination MAC address, and forwards
it to PE2 over the bypass VXLAN tunnel. After the packet reaches PE2, PE2 searches its local MAC address
table for the destination MAC address, obtains the outbound interface, and forwards the packet to CE3.
The process for PE2 to forward packets from the CPE is the same as that for PE1 to forward
packets from the CPE.
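PE1's downlink decision above can be sketched as follows. The MAC table contents and interface names are assumptions made for the example:

```python
# PE1's MAC table view: destination MAC -> outbound interface (values assumed).
MAC_TABLE = {
    "ce1-mac": "ac-to-CE1",
    "ce2-mac": "ac-to-CE2",
    "ce3-mac": "bypass-tunnel-to-PE2",   # CE3 is single-homed to PE2
}

def pe1_downlink(dst_mac):
    """Forward locally when the entry points at an AC; hand off to PE2 when
    it points at the bypass VXLAN tunnel."""
    out = MAC_TABLE[dst_mac]
    return ("via-peer", out) if out.startswith("bypass") else ("local", out)
```

Only traffic to single-homed CEs behind the other PE takes the bypass tunnel; dual-homed and locally attached CEs are reached directly.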
■ As shown in Figure 4, if the destination address of a BUM packet from the CPE is the Anycast VTEP
address of PE1 and PE2, the BUM packet may be forwarded to either PE1 or PE2. If the BUM
packet reaches PE2 first, PE2 sends a copy of the packet to CE3 and CE1. In addition, PE2 sends a
copy of the packet to PE1 through the bypass VXLAN tunnel between PE1 and PE2. After the copy
of the packet reaches PE1, PE1 sends it to CE2, not to the CPE or CE1. In this way, CE1 receives only
one copy of the packet.
■ As shown in Figure 5, after a BUM packet from CE2 reaches PE1, PE1 sends a copy of the packet to
CE1 and the CPE. In addition, PE1 sends a copy of the packet to PE2 through the bypass VXLAN
tunnel between PE1 and PE2. After the copy of the packet reaches PE2, PE2 sends it to CE3, not to
the CPE or CE1.
■ As shown in Figure 6, after a BUM packet from CE1 reaches PE1, PE1 sends a copy of the packet to
CE2 and the CPE. In addition, PE1 sends a copy of the packet to PE2 through the bypass VXLAN
tunnel between PE1 and PE2. After the copy of the packet reaches PE2, PE2 sends it to CE3, not to
the CPE or CE1.
■ Uplink
As shown in Figure 2, after receiving Layer 3 unicast packets destined for the CPE from CE1, CE2,
and CE3, PE1 and PE2 search for the destination address and directly forward them to the CPE
because they are on the same network segment.
■ Downlink
As shown in Figure 3:
After the Layer 3 unicast packet sent from the CPE to CE1 reaches PE1, PE1 searches for the
destination address and directly sends it to CE1 because they are on the same network segment.
After the Layer 3 unicast packet sent from the CPE to CE2 reaches PE1, PE1 searches for the
destination address and directly sends it to CE2 because they are on the same network segment.
After the Layer 3 unicast packet sent from the CPE to CE3 reaches PE1, PE1 searches for the
destination address and sends the packet to PE2, which then sends it to CE3, because they are on the same
network segment.
The process for PE2 to forward packets from the CPE is the same as that for PE1 to forward
packets from the CPE.
■ Uplink
As shown in Figure 2:
Because the CPE is on a different network segment from PE1 and PE2, the destination MAC address
of a Layer 3 unicast packet sent from CE1, CE2, or CE3 to the CPE is the MAC address of the BDIF
interface on the Layer 3 gateway of PE1 or PE2. After receiving the packet, PE1 or PE2 removes the
Layer 2 tag from the packet, searches for a matching Layer 3 routing entry, and obtains the
outbound interface that is the BDIF interface connecting the CPE to the Layer 3 gateway. The BDIF
interface searches the ARP table, obtains the destination MAC address, encapsulates the packet
into a VXLAN packet, and sends it to the CPE through the VXLAN tunnel.
After receiving the Layer 3 packet from PE1 or PE2, the CPE removes the Layer 2 tag from the
packet because the destination MAC address is the MAC address of the BDIF interface on the CPE.
Then the CPE searches the Layer 3 routing table to obtain a next-hop address to forward the
packet.
■ Downlink
As shown in Figure 3:
Before sending a Layer 3 unicast packet to CE1 across subnets, the CPE searches its Layer 3 routing
table and obtains the outbound interface that is the BDIF interface on the Layer 3 gateway
connecting to PE1. The BDIF interface searches the ARP table to obtain the destination MAC
address, encapsulates the packet into a VXLAN packet, and forwards it to PE1 over the VXLAN
tunnel.
After receiving the packet from the CPE, PE1 removes the Layer 2 tag from the packet because the
destination address of the packet is the MAC address of PE1's BDIF interface. Then PE1 searches
the Layer 3 routing table and obtains the outbound interface that is the BDIF interface connecting
PE1 to its attached CE. The BDIF interface searches its ARP table and obtains the destination
address, performs Layer-2 encapsulation for the packet, and sends it to CE1.
The process for PE2 to forward packets from the CPE is the same as that for PE1 to forward
packets from the CPE.
The vUGW is a unified packet gateway developed based on Huawei's CloudEdge solution. It can be used for 3rd
Generation Partnership Project (3GPP) access in general packet radio service (GPRS), Universal Mobile
Telecommunications System (UMTS), and Long Term Evolution (LTE) modes. The vUGW can function as a gateway
GPRS support node (GGSN), serving gateway (S-GW), or packet data network gateway (P-GW) to meet carriers' various
networking requirements in different phases and operational scenarios.
The vMSE is developed based on Huawei's multi-service engine (MSE). The carrier's network has multiple functional
boxes deployed, such as the firewall box, video acceleration box, header enrichment box, and URL filtering box. All
functions are added through patch installation. As time goes by, the network becomes increasingly slow, complicating
service rollout and maintenance. To solve this problem, the vMSE integrates the functions of these boxes and manages
these functions in a unified manner, providing value-added services for the data services initiated by users.
Networking Overview
Figure 1 and Figure 2 show NFVI distributed gateway networking. The DC gateways are the DCN's border
gateways, which exchange Internet routes with the external network through PEs. L2GW/L3GW1 and
L2GW/L3GW2 access the virtualized network functions (VNFs). VNF1 and VNF2 can be deployed as
virtualized NEs to implement the vUGW and vMSE functions and connect to L2GW/L3GW1 and
L2GW/L3GW2 through the interface processing unit (IPU).
This networking can be considered a combination of the distributed gateway function and VXLAN active-
active/quad-active gateway function.
• The distributed gateway function is deployed on L2GW/L3GW1 and L2GW/L3GW2, and a VXLAN tunnel
is established between them.
In the NFVI distributed gateway scenario, the NE40E can function as either a DC gateway or an L2GW/L3GW. However,
if the NE40E is used as an L2GW/L3GW, east-west traffic cannot be balanced.
Each L2GW/L3GW in Figure 1 represents two devices on the live network. Anycast VXLAN active-active is configured on
the two devices.
Function Deployment
On the network shown in Figure 1, the number of bridge domains (BDs) must be planned according to the
number of network segments to which the IPUs belong. For example, if five IP addresses planned for five
IPUs are allocated to four network segments, you need to plan four different BDs. You also need to
configure all BDs and VBDIF interfaces on each of the DC gateways and L2GWs/L3GWs, and bind all VBDIF
interfaces to the same L3VPN instance. In addition, ensure that:
• A VPN BGP peer relationship is set up between each VNF and DC gateway, so that the VNF can
advertise UE routes to the DC gateway.
• Static VPN routes are configured on L2GW/L3GW1 and L2GW/L3GW2 for them to access VNFs. The
routes' destination IP addresses are the VNFs' IP addresses, and the next hop addresses are the IP
addresses of the IPUs.
• A BGP EVPN peer relationship is established between each DC gateway and L2GW/L3GW. An
L2GW/L3GW can flood static routes to the VNFs to other devices through BGP EVPN peer relationships.
A DC gateway can advertise local loopback routes and default routes to the L2GWs/L3GWs through the
BGP EVPN peer relationships.
• Traffic exchanged between a UE and the Internet through a VNF is called north-south traffic, whereas
traffic exchanged between VNF1 and VNF2 is called east-west traffic. Load balancing is configured on
DC gateways and L2GWs/L3GWs to balance both north-south and east-west traffic.
Forwarding entries are generated on each DC gateway and L2GW/L3GW through the following process:
1. BDs are deployed on each L2GW/L3GW and bound to links connecting to the IPU interfaces on the
associated network segments. Then, VBDIF interfaces are configured as the gateways of these IPU
interfaces. The number of BDs is the same as that of network segments to which the IPU interfaces
belong. A static VPN route is configured on each L2GW/L3GW, so that the L2GW/L3GW can generate
a route forwarding entry with the destination address being the VNF address, next hop being the IPU
address, and outbound interface being the associated VBDIF interface.
2. An L2GW/L3GW learns IPU MAC addresses and ARP information through the data plane, and then
advertises the information as an EVPN route to DC gateways. The information is then used to
generate an ARP entry and MAC forwarding entry for Layer 2 forwarding.
• The destination MAC addresses in MAC forwarding entries on the L2GW/L3GW are the MAC
addresses of the IPUs. For IPUs directly connecting to an L2GW/L3GW (for example, in Figure 1,
IPU1, IPU2, and IPU3 directly connect to L2GW/L3GW1), these IPUs are used as outbound
interfaces in the MAC forwarding entries on the L2GW/L3GW. For IPUs connecting to the other
L2GW/L3GW (for example, IPU4 and IPU5 connect to L2GW/L3GW2 in Figure 1), the MAC
forwarding entries use the VTEP address of the other L2GW/L3GW (L2GW/L3GW2) as the next
hop and carry the L2 VNI used for Layer 2 forwarding.
• In MAC forwarding entries on a DC gateway, the destination MAC address is the IPU MAC
address, and the next hop is the L2GW/L3GW VTEP address. These MAC forwarding entries also
store the L2 VNI information of the corresponding BDs.
To forward incoming traffic only at Layer 2, you are advised to configure devices to advertise only ARP (ND)
routes to each other. In this way, the DC gateway and L2GW/L3GW do not generate IP prefix routes based on IP
addresses. If the devices are configured to advertise IRB (IRBv6) routes to each other, enable the IRB asymmetric
mode on devices that receive routes.
3. After static VPN routes are configured on the L2GW/L3GW, they are imported into the BGP EVPN
routing table and then sent as IP prefix routes to the DC gateway through the BGP EVPN peer
relationship.
There are multiple links and static routes between the L2GW/L3GW and VNF. To implement load balancing, you
need to enable the Add-Path function when configuring static routes to be imported to the BGP EVPN routing
table.
4. By default, the next hop address of an IP prefix route received by the DC gateway is the IP address of
the L2GW/L3GW, and the route recurses to a VXLAN tunnel. In this case, incoming traffic is forwarded
at Layer 3. To forward incoming traffic at Layer 2, a routing policy must be configured on the
L2GW/L3GW to add the Gateway IP attribute to the static routes destined for the DC gateway.
Gateway IP addresses are the IP addresses of IPU interfaces. After receiving an IP prefix route carrying
the Gateway IP attribute, the DC gateway does not recurse the route to a VXLAN tunnel. Instead, it
performs IP recursion. Finally, the destination address of a route forwarding entry on the DC gateway
is the IP address of the VNF, the next hop is the IP address of an IPU interface, and the outbound
interface is the VBDIF interface corresponding to the network segment on which the IPU resides. If
traffic needs to be sent to the VNF, the forwarding entry can be used to find the corresponding VBDIF
interface, which then can be used to find the corresponding ARP entry and MAC entry for Layer 2
forwarding.
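The recursion behavior in step 4 can be sketched as follows. This is an illustrative model; the addresses and interface names are assumptions:

```python
# ARP entries learned on the DC gateway's VBDIF interfaces (values assumed).
ARP = {"192.168.1.10": {"iface": "VBDIF10", "mac": "ipu1-mac"}}

def resolve(route):
    """A route carrying the Gateway IP attribute is resolved via IP recursion
    to the IPU's VBDIF interface (enabling Layer 2 forwarding); without the
    attribute, it recurses directly to the VXLAN tunnel toward the
    advertising L2GW/L3GW (Layer 3 forwarding)."""
    gw = route.get("gateway_ip")
    if gw:
        return {"next_hop": gw, "out_if": ARP[gw]["iface"]}
    return {"next_hop": route["next_hop"], "out_if": "vxlan-tunnel"}
```

The Gateway IP attribute therefore changes what the route recurses to, not where the traffic ultimately goes.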
5. To establish a VPN BGP peer relationship with the VNF, the DC gateway needs to advertise its
loopback address to the L2GW/L3GW. In addition, because the DC gateway uses the anycast VTEP
address for the L2GW/L3GW, the VNF1-to-DCGW1 loopback protocol packets may be sent to DCGW2.
Therefore, the DC gateway needs to advertise its loopback address to the other DC gateway. Finally,
each L2GW/L3GW has a forwarding entry for the VPN route to the loopback addresses of DC
gateways, and each DC gateway has a forwarding entry for the VPN route to the loopback address of
the other DC gateway. After the VNF and DC gateways establish BGP peer relationships, the VNF can
send UE routes to the DC gateways, and the next hops of these routes are the VNF IP address.
6. The DCN does not need to be aware of external routes. Therefore, a route policy must be configured
on the DC gateway, so that the DC gateway can send default routes and loopback routes to the
L2GW/L3GW.
7. As the border gateway of the DCN, the DC gateway can exchange Internet routes with external PEs,
such as routes to server IP addresses on the Internet.
8. To implement load balancing during traffic transmission, load balancing and Add-Path can be
configured on the DC gateway and L2GW/L3GW. This balances both north-south and east-west traffic.
• North-south traffic balancing: Take DCGW1 in Figure 1 as an example. DCGW1 can receive EVPN
routes to VNF2 from L2GW/L3GW1 and L2GW/L3GW2. By default, after load balancing is
configured, DCGW1 sends half of traffic destined for VNF2 to L2GW/L3GW1 and half of traffic
destined for VNF2 to L2GW/L3GW2. However, L2GW/L3GW1 has only one link to VNF2, while
L2GW/L3GW2 has two links to VNF2. As a result, the traffic is not evenly balanced. To address
this issue, the Add-Path function must be configured on the L2GW/L3GWs. After Add-Path is
configured, L2GW/L3GW2 advertises two routes with the same destination address to DCGW1 to
implement load balancing.
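The effect of Add-Path on the traffic split can be sketched as follows. This is a simplified arithmetic model; the gateway names reflect the figure, but the path list is an assumption:

```python
from collections import Counter

def share_per_gateway(advertised_paths):
    """With Add-Path, each L2GW/L3GW advertises one path per link to VNF2,
    so equal-cost balancing over all advertised paths matches the actual
    link ratio instead of splitting per advertising peer."""
    counts = Counter(advertised_paths)
    total = len(advertised_paths)
    return {gw: n / total for gw, n in counts.items()}

# One link via L2GW/L3GW1, two links via L2GW/L3GW2.
shares = share_per_gateway(["L2GW/L3GW1", "L2GW/L3GW2", "L2GW/L3GW2"])
```

Without Add-Path the split would be 1:1 per peer; with it, the split becomes 1:2, matching the link counts.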
1. Upon receipt of UE traffic, the base station encapsulates these packets and redirects them to a GPRS
tunneling protocol (GTP) tunnel whose destination address is the VNF IP address. The encapsulated
packets reach the DC gateway through IP forwarding.
2. Upon receipt, the DC gateway searches its virtual routing and forwarding (VRF) table and finds a
matching forwarding entry whose next hop is an IPU IP address and outbound interface is a VBDIF
interface. Therefore, the received packets match the network segment on which the VBDIF interface
resides. The DC gateway searches for the desired ARP entry on the network segment, finds a matching
MAC forwarding entry based on the ARP entry, and recurses the route to a VXLAN tunnel based on
the MAC forwarding entry. Then, the packets are forwarded to the L2GW/L3GW over a VXLAN tunnel.
3. Upon receipt, the L2GW/L3GW finds the target BD based on the L2 VNI, searches for a matching MAC
forwarding entry in the BD, and then forwards the packets to the VNF based on the MAC forwarding
entry.
4. After the packets reach the VNF, the VNF removes their GTP tunnel header, searches the routing table
based on their destination IP addresses, and forwards them to the L2GW/L3GW through the VNF's
default gateway.
5. After the packets reach the L2GW/L3GW, the L2GW/L3GW searches their VRF table for a matching
forwarding entry. Over the default route advertised by the DC gateway to the L2GW/L3GW, the
packets are encapsulated with the L3 VNI and then forwarded to the DC gateway through the VXLAN
tunnel.
6. Upon receipt, the DC gateway searches the corresponding VRF table for a matching forwarding entry
based on the L3 VNI and forwards these packets to the Internet.
Figure 10 shows the process of forwarding north-south traffic from the Internet to a UE through the VNF.
1. A device on the Internet sends response traffic to a UE. The destination address of the response traffic
is the destination address of the UE route. The route is advertised by the VNF to the DC gateway
through the VPN BGP peer relationship, and the DC gateway in turn advertises the route to the
Internet. Therefore, the response traffic must first be forwarded to the VNF.
2. Upon receipt, the DC gateway searches the routing table for a forwarding entry that matches the UE
route. The route is advertised over the VPN BGP peer relationship between the DC gateway and VNF
and recurses to one or more VBDIF interfaces. Traffic is load-balanced among these VBDIF interfaces.
A matching MAC forwarding entry is found based on the ARP information on these VBDIF interfaces.
Based on the MAC forwarding entry, the response packets are encapsulated with the L2 VNI and then
forwarded to the L2GW/L3GW over a VXLAN tunnel.
3. Upon receipt, the L2GW/L3GW finds the target BD based on the L2 VNI, searches for a matching MAC
forwarding entry in the BD, obtains the outbound interface information from the MAC forwarding
entry, and forwards these packets to the VNF.
4. Upon receipt, the VNF processes them and finds the base station corresponding to the destination
address of the UE. The VNF then encapsulates tunnel information into these packets (with the base
station as the destination) and forwards these packets to the L2GW/L3GW through the default
gateway.
5. Upon receipt, the L2GW/L3GW searches its VRF table for the default route advertised by the DC
gateway to the L2GW/L3GW. Then, the L2GW/L3GW encapsulates these packets with the L3 VNI and
forwards them to the DC gateway over a VXLAN tunnel.
6. Upon receipt, the DC gateway searches its VRF table for the default (or specific) route based on the L3
VNI and forwards these packets to the destination base station. The base station then decapsulates
these packets and sends them to the target UE.
During this process, the VNF may send the received packets to another VNF for value-added service
processing, based on the packet information. In this case, east-west traffic is generated. Figure 11 shows the
process of forwarding east-west traffic (from VNF1 to VNF2), which differs from the north-south traffic
forwarding process in packet processing after packets reach VNF1:
1. VNF1 sends a received packet to VNF2 for processing. VNF2 re-encapsulates the packet by using its
own address as the destination address of the packet and sends the packet to the L2GW/L3GW over
the default route.
2. Upon receipt, the L2GW/L3GW searches its VRF table and finds that multiple load-balancing
forwarding entries exist. Some entries use the IPU as the outbound interface, and some entries use the
other L2GW/L3GW as the next hop.
3. If the path to the other L2GW/L3GW (L2GW/L3GW2) is selected preferentially, the packet is
encapsulated with the L2 VNI and forwarded to L2GW/L3GW2 over a VXLAN tunnel. L2GW/L3GW2
finds the target BD based on the L2 VNI and the destination MAC address, and forwards the packet to
VNF2.
4. Upon receipt, VNF2 processes the packet and forwards it to the Internet server. The subsequent
forwarding process is the same as the process for forwarding north-south traffic.
Traffic from the destination devices to UEs also undergoes this process. To meet the preceding requirements and
ensure that the UE traffic is load-balanced within the DCN, you need to deploy the NFVI distributed gateway
function on DCN devices.
The vUGW is a unified packet gateway developed based on Huawei's CloudEdge solution. It can be used for 3rd
Generation Partnership Project (3GPP) access in general packet radio service (GPRS), Universal Mobile
Telecommunications System (UMTS), and Long Term Evolution (LTE) modes. The vUGW can function as a gateway
GPRS support node (GGSN), serving gateway (S-GW), or packet data network gateway (P-GW) to meet carriers' various
networking requirements in different phases and operational scenarios.
The vMSE is developed based on Huawei's multi-service engine (MSE). The carrier's network has multiple functional
boxes deployed, such as the firewall box, video acceleration box, header enrichment box, and URL filtering box. All
functions are added through patch installation. As time goes by, the network becomes increasingly slow, complicating
service rollout and maintenance. To solve this problem, the vMSE integrates the functions of these boxes and manages
these functions in a unified manner, providing value-added services for the data services initiated by users.
Networking
Figure 1 and Figure 2 show NFVI distributed gateway networking. The DC gateways are the DCN's border
gateways, which exchange Internet routes with the external network through PEs. L2GW/L3GW1 and
L2GW/L3GW2 connect to virtualized network functions (VNFs). VNF1 and VNF2 can be deployed as
virtualized NEs to respectively provide vUGW and vMSE functions and connect to L2GW/L3GW1 and
L2GW/L3GW2 through interface processing units (IPUs).
This networking combines the distributed gateway function and the VXLAN active-active gateway function:
• The VXLAN active-active gateway function is deployed on DC gateways. Specifically, a bypass VXLAN
tunnel is established between DC gateways. Both DC gateways use the same virtual anycast VTEP
address to establish VXLAN tunnels with L2GW/L3GW1 and L2GW/L3GW2.
• The distributed gateway function is deployed on L2GW/L3GW1 and L2GW/L3GW2, and a VXLAN tunnel
is established between L2GW/L3GW1 and L2GW/L3GW2.
In the NFVI distributed gateway scenario, the NE40E functions as either a DCGW or an L2GW/L3GW. However, if the
NE40E is used as an L2GW/L3GW, east-west traffic cannot be balanced.
Each L2GW/L3GW in Figure 1 represents two devices on the live network. Anycast VXLAN active-active is configured on
these devices so that they function as one, improving network reliability.
Function Deployment
On the network shown in Figure 1, the number of bridge domains (BDs) must be planned according to the
number of subnets to which the IPUs belong. For example, if five IP addresses planned for five IPUs are
allocated to four subnets, you need to plan four different BDs. You need to configure all BDs and VBDIF
interfaces only on L2GWs/L3GWs and bind all VBDIF interfaces to the same L3VPN instance. In addition,
deploy the following functions on the network:
• Establish VPN BGP peer relationships between VNFs and DC gateways, so that VNFs can advertise UE
routes to DC gateways.
• Establish BGP EVPN peer relationships between any two of the DC gateways and L2GWs/L3GWs.
L2GWs/L3GWs can then advertise VNF routes to DC gateways and other L2GWs/L3GWs through BGP
EVPN peer relationships. DC gateways can advertise the local loopback route and default route as well
as obtained UE routes to L2GWs/L3GWs through BGP EVPN peer relationships.
• Traffic forwarded between the UE and Internet through VNFs is called north-south traffic, and traffic
forwarded between VNF1 and VNF2 is called east-west traffic. To balance both types of traffic, you
need to configure load balancing on DC gateways and L2GWs/L3GWs.
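The BD planning rule described above (one BD per subnet to which the IPU interface addresses belong) can be sketched as follows; the interface addresses are illustrative assumptions, not values from this document.

```python
import ipaddress

def planned_bd_count(ipu_interface_addresses):
    """One BD must be planned per distinct subnet to which the IPU
    interface addresses belong."""
    subnets = {ipaddress.ip_interface(a).network for a in ipu_interface_addresses}
    return len(subnets)

# Five IPU addresses spread over four subnets -> four BDs to plan,
# matching the example in the text above.
ipus = ["10.1.1.10/24", "10.1.1.11/24", "10.1.2.10/24",
        "10.1.3.10/24", "10.1.4.10/24"]
assert planned_bd_count(ipus) == 4
```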
Table 1 Differences between the asymmetric and symmetric modes in terms of forwarding entry generation
Asymmetric mode: All traffic is forwarded at Layer 2 from DC gateways to VNFs after entering the DCN,
regardless of whether it is from UEs to the Internet or vice versa. However, after traffic leaves the DCN, it
is forwarded at Layer 3 from VNFs to DC gateways. This prevents traffic loops between DC gateways and
L2GWs/L3GWs. On the network shown in Figure 2, IPUs connect to multiple L2GWs/L3GWs. If Layer 3
forwarding is used between DC gateways and VNFs, some traffic forwarded by an L2GW/L3GW to the VNF
will be forwarded to another L2GW/L3GW due to load balancing. For example, L2GW/L3GW2 forwards some
of the traffic to L2GW/L3GW1 and vice versa. As a result, a traffic loop occurs. If Layer 2 forwarding is
used, the L2GW/L3GW does not forward the Layer 2 traffic received from another L2GW/L3GW back,
preventing traffic loops.

Symmetric mode: After traffic enters the DCN, it is forwarded from DC gateways to the VNF at Layer 3. The
traffic from the VNF to DC gateways and then out of the DCN is also forwarded at Layer 3. On the network
shown in Figure 2, IPUs connect to multiple L2GWs/L3GWs. Layer 3 forwarding is used between DC
gateways and VNFs, and some traffic forwarded by an L2GW/L3GW to the VNF will be forwarded over a
VXLAN tunnel to another L2GW/L3GW due to load balancing. After receiving VXLAN traffic, an L2GW/L3GW
searches for matching routes. If these routes work in hybrid load-balancing mode, the L2GW/L3GW
preferentially selects the access-side outbound interface to forward the traffic, preventing loops.
In symmetric mode, forwarding entries are created on each DC gateway and L2GW/L3GW as follows:
1. BDs are deployed on each L2GW/L3GW and bound to links connecting to the IPU interfaces on the
associated network segments. Then, VBDIF interfaces are configured as the gateways of these IPU
interfaces. The number of BDs is the same as that of network segments to which the IPU interfaces
belong. A VPN static route is configured on each L2GW/L3GW or a VPN IGP neighbor relationship is
established between each L2GW/L3GW and the VNF, so that the L2GW/L3GW can generate a route
forwarding entry with the destination address being the VNF address, next hop being the IPU address,
and outbound interface being the associated VBDIF interface.
Figure 3 Route forwarding entry for traffic from an L2GW/L3GW to the VNF
2. After VPN static or IGP routes are configured on the L2GW/L3GW, they are imported into the BGP
EVPN routing table and then sent as IP prefix routes to the DC gateway through the BGP EVPN peer
relationship.
There are multiple links and routes between the L2GW/L3GW and VNF. To implement load balancing, you need
to enable the Add-Path function when configuring routes to be imported into the BGP EVPN routing table.
3. The next hop address of an IP prefix route received by the DC gateway is the IP address of the
L2GW/L3GW, and the route recurses to a VXLAN tunnel. In this case, incoming traffic is forwarded at
Layer 3.
4. To establish a VPN BGP peer relationship with the VNF, the DC gateway needs to advertise its
loopback address to the L2GW/L3GW. In addition, because the DC gateway uses the anycast VTEP
address for the L2GW/L3GW, the VNF1-to-DCGW1 loopback protocol packets may be sent to DCGW2.
Therefore, the DC gateway needs to advertise its loopback address to the other DC gateway. Finally,
each L2GW/L3GW has a forwarding entry for the VPN route to the loopback addresses of DC
gateways, and each DC gateway has a forwarding entry for the VPN route to the loopback address of
the other DC gateway. After the VNF and DC gateways establish BGP peer relationships, the VNF can
send UE routes to the DC gateways, and the next hops of these routes are the VNF IP address.
5. In symmetric mode, the L2GW/L3GW needs to learn UE routes. Therefore, a route-policy needs to be
configured on the DC gateway to enable the DC gateway to advertise UE routes to the L2GW/L3GW
after setting the original next hops of these routes to the gateway address. Apart from UE routes, the DCN
does not need to be aware of other external routes. Therefore, another route-policy needs to be
configured on the DC gateway to ensure that the DC gateway advertises only loopback routes and
default routes to the L2GW/L3GW.
6. As the border gateway of the DCN, the DC gateway can exchange Internet routes with external PEs,
such as routes to server IP addresses on the Internet.
7. To implement load balancing during traffic transmission, load balancing and Add-Path can be
configured on the DC gateway and L2GW/L3GW. This balances both north-south and east-west traffic.
• North-south traffic balancing: Take DCGW1 in Figure 1 as an example. DCGW1 can receive EVPN
routes to VNF2 from L2GW/L3GW1 and L2GW/L3GW2. By default, after load balancing is
configured, DCGW1 sends half of traffic destined for VNF2 to L2GW/L3GW1 and half of traffic
destined for VNF2 to L2GW/L3GW2. However, L2GW/L3GW1 has only one link to VNF2, while
L2GW/L3GW2 has two links to VNF2. As a result, the traffic is not evenly balanced. To address
this issue, the Add-Path function must be configured on the L2GW/L3GWs. After Add-Path is
configured, L2GW/L3GW2 advertises two routes with the same destination address to DCGW1 to
implement load balancing.
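The effect of Add-Path on north-south load balancing can be modeled as follows. As described above, without Add-Path DCGW1 sees one path per L2GW/L3GW and splits traffic 1:1; with Add-Path, L2GW/L3GW2 advertises both of its links, producing a 1:2 split that matches the actual link counts. This is a simplified sketch, not the device's actual BGP best-path logic.

```python
from collections import Counter

def traffic_share_per_gateway(advertised_paths):
    """With BGP Add-Path, a gateway can advertise several paths to the
    same destination; each received path gets an equal share of the
    load-balanced traffic, so a gateway's share is proportional to the
    number of paths it advertises."""
    counts = Counter(advertised_paths)
    total = sum(counts.values())
    return {gw: n / total for gw, n in counts.items()}

# One path from L2GW/L3GW1, two paths from L2GW/L3GW2 (Add-Path enabled).
shares = traffic_share_per_gateway(["L2GW/L3GW1", "L2GW/L3GW2", "L2GW/L3GW2"])
assert shares["L2GW/L3GW2"] == 2 / 3
```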
1. Upon receipt of UE traffic, the base station encapsulates these packets and redirects them to a GPRS
Tunneling Protocol (GTP) tunnel whose destination address is the VNF IP address. The encapsulated
packets reach the DC gateway through IP forwarding.
2. After receiving these packets, the DC gateway searches the VRF table and finds that the next hop of
the forwarding entry corresponding to the VNF address is an IPU address and the outbound interface
is a VXLAN tunnel. The DC gateway then performs VXLAN encapsulation and forwards the packets to
the L2GW/L3GW at Layer 3.
3. Upon receipt of these packets, the L2GW/L3GW finds the corresponding VPN instance based on the L3
VNI, searches for a matching route in the VPN instance's routing table based on the VNF address, and
forwards the packets to the VNF.
4. After the packets reach the VNF, the VNF removes their GTP tunnel header, searches the routing table
based on their destination IP addresses, and forwards them to the L2GW/L3GW through the VNF's
default gateway.
5. After the packets reach the L2GW/L3GW, the L2GW/L3GW searches their VRF table for a matching
forwarding entry. Over the default route advertised by the DC gateway to the L2GW/L3GW, the
packets are encapsulated with the L3 VNI and then forwarded to the DC gateway through the VXLAN
tunnel.
6. Upon receipt, the DC gateway searches the corresponding VRF table for a matching forwarding entry
based on the L3 VNI and forwards these packets to the Internet.
Figure 9 shows the process of forwarding north-south traffic from the Internet to a UE through the VNF.
1. A device on the Internet sends response traffic to a UE. The destination address of the response traffic
is the destination address of the UE route. The route is advertised by the VNF to the DC gateway
through the VPN BGP peer relationship, and the DC gateway in turn advertises the route to the
Internet. Therefore, the response traffic must first be forwarded to the VNF.
2. After the response traffic reaches the DC gateway, the DC gateway searches the routing table for
forwarding entries corresponding to UE routes. These routes are learned by the DC gateway from the
VNF over the VPN BGP peer relationship. These routes finally recurse to VXLAN tunnels, so the response
packets are encapsulated into VXLAN packets and forwarded to the L2GW/L3GW at Layer 3.
3. After these packets reach the L2GW/L3GW, the L2GW/L3GW finds the corresponding VPN instance
based on the L3 VNI, searches for a route corresponding to the UE address in the VPN instance's
routing table, and forwards these packets to the VNF.
4. Upon receipt, the VNF processes them and finds the base station corresponding to the destination
address of the UE. The VNF then encapsulates tunnel information into these packets (with the base
station as the destination) and forwards these packets to the L2GW/L3GW through the default
gateway.
5. Upon receipt, the L2GW/L3GW searches its VRF table for the default route advertised by the DC
gateway to the L2GW/L3GW. Then, the L2GW/L3GW encapsulates these packets with the L3 VNI and
forwards them to the DC gateway over a VXLAN tunnel.
6. Upon receipt, the DC gateway searches its VRF table for the default (or specific) route based on the L3
VNI and forwards these packets to the destination base station. The base station then decapsulates
these packets and sends them to the target UE.
During this process, the VNF may send the received packets to another VNF for value-added service
processing, based on the packet information. In this case, east-west traffic is generated. Figure 10 shows the
process of forwarding east-west traffic (from VNF1 to VNF2), which differs from the north-south traffic
forwarding process in packet processing after packets reach VNF1:
1. VNF1 sends a received packet to VNF2 for processing. VNF2 re-encapsulates the packet by using its
own address as the destination address of the packet and sends the packet to L2GW/L3GW1 over
the default route.
2. Upon receipt, L2GW/L3GW1 searches its VRF table and finds that multiple load-balancing routes
exist. Some routes use the IPU as the outbound interface, and some routes use L2GW/L3GW2 as the
next hop.
3. If these routes work in hybrid load-balancing mode, L2GW/L3GW1 preferentially selects only the
routes with the outbound interfaces being IPUs and steers packets to VNF2 to prevent loops. If these
routes do not work in hybrid load-balancing mode, L2GW/L3GW1 load-balances packets across all these
routes. Packets are encapsulated into VXLAN packets before they are sent to L2GW/L3GW2 at Layer 2.
After these packets reach L2GW/L3GW2, L2GW/L3GW2 finds the corresponding BD based on the L2
VNI, then finds the destination MAC address, and finally forwards these packets to VNF2.
4. Upon receipt, VNF2 processes the packet and forwards it to the Internet server. The subsequent
forwarding process is the same as the process for forwarding north-south traffic.
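The hybrid load-balancing decision in step 3 can be modeled as follows; this is a simplified sketch of the loop-prevention rule described above, not the device's actual route selection logic, and the route entries are illustrative assumptions.

```python
def select_forwarding_paths(routes, hybrid_mode):
    """Loop-prevention rule: in hybrid load-balancing mode, an
    L2GW/L3GW that receives VXLAN traffic uses only access-side
    (IPU-facing) paths if any exist; otherwise, or when hybrid mode is
    off, it load-balances over all available paths."""
    if hybrid_mode:
        access_side = [r for r in routes if r["outbound"] == "IPU"]
        if access_side:
            return access_side
    return routes

# Hypothetical entries on L2GW/L3GW1: one IPU-facing path to VNF2 and
# one VXLAN path via L2GW/L3GW2.
routes = [
    {"outbound": "IPU", "next_hop": "VNF2-IPU"},
    {"outbound": "VXLAN", "next_hop": "L2GW/L3GW2"},
]
assert select_forwarding_paths(routes, hybrid_mode=True) == [routes[0]]
assert len(select_forwarding_paths(routes, hybrid_mode=False)) == 2
```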
Service Description
Currently, data centers are expanding on a large scale for enterprises and carriers, with increasing
deployment of virtualization and cloud computing. In addition, to accommodate more services while
reducing maintenance costs, data centers are employing large Layer 2 and virtualization technologies.
As server virtualization is implemented in the physical network infrastructure for data centers, VXLAN, an
NVO3 technology, has adapted to the trend by providing virtualization solutions for data centers.
Networking Description
On the network shown in Figure 1, an enterprise has VMs deployed in different data centers. Different
network segments run different services. The VMs running the same service or different services in different
data centers need to communicate with each other. For example, VMs of the financial department residing
on the same network segment need to communicate, and VMs of the financial and engineering departments
residing on different network segments also need to communicate.
Feature Deployment
As shown in Figure 1:
• Deploy Device 1 and Device 2 as Layer 2 VXLAN gateways and establish a VXLAN tunnel between
Device 1 and Device 2 to allow communication between terminal users on the same network segment.
• Deploy Device 3 as a Layer 3 VXLAN gateway and establish a VXLAN tunnel between Device 1 and
Device 3 and between Device 2 and Device 3 to allow communication between terminal users on
different network segments.
Configure VXLAN on devices to trigger VXLAN tunnel establishment and dynamic learning of ARP and MAC
address entries. By now, terminal users on the same network segment and different network segments can
communicate through the Layer 2 and Layer 3 VXLAN gateways based on ARP and routing entries.
Service Description
Currently, data centers are expanding on a large scale for enterprises and carriers, with increasing
deployment of virtualization and cloud computing. In addition, to accommodate more services while
reducing maintenance costs, data centers are employing large Layer 2 and virtualization technologies.
As server virtualization is implemented in the physical network infrastructure for data centers, VXLAN, an
NVO3 technology, has adapted to the trend by providing virtualization solutions for data centers, allowing
intra-VXLAN communication and communication between VXLANs and legacy networks.
Networking Description
On the network shown in Figure 1, an enterprise has VMs deployed for the finance and engineering
departments and a legacy network for the human resource department. The finance and engineering
departments need to communicate with the human resource department.
Feature Deployment
As shown in Figure 1:
Deploy Device 2 as a Layer 2 VXLAN gateway and Device 3 as a Layer 3 VXLAN gateway. The VXLAN gateways
are VXLANs' edge devices connecting to legacy networks and are responsible for VXLAN encapsulation and
decapsulation. Establish a VXLAN tunnel between Device 2 and Device 3 for VXLAN packet transmission.
When the human resource department sends a packet to VM1 of the financial department, the process is as
follows:
1. The human resource department sends the packet to its gateway, Device 3.
2. Upon receipt, Device 3 parses the destination IP address, and searches the routing table for a next hop
address. Then, Device 3 searches the ARP or ND table based on the next hop address to determine the
destination MAC address, VXLAN tunnel's outbound interface, and VNI.
3. Device 3 encapsulates the VXLAN tunnel's outbound interface and VNI into the packet and sends the
VXLAN packet to Device 2.
4. Upon receipt, Device 2 decapsulates the VXLAN packet, finds the outbound interface based on the
destination MAC address, and forwards the packet to VM1.
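The lookup chain in steps 2 and 3 (routing table first, then the ARP/ND table for the next hop) can be sketched as follows. The subnet, next-hop address, MAC address, interface name Nve1, and VNI 10 are hypothetical values, and longest-prefix matching is simplified to a first-match scan.

```python
import ipaddress

def l3_gateway_forward(dst_ip, routing_table, arp_table):
    """Model Device 3's lookup chain: a routing lookup yields the next
    hop, and the ARP/ND entry for that next hop supplies the destination
    MAC address, the VXLAN tunnel's outbound interface, and the VNI.
    Longest-prefix matching is simplified to a first-match scan."""
    addr = ipaddress.ip_address(dst_ip)
    for prefix, next_hop in routing_table.items():
        if addr in ipaddress.ip_network(prefix):
            mac, out_if, vni = arp_table[next_hop]
            return {"dst_mac": mac, "interface": out_if, "vni": vni}
    raise LookupError("no route to " + dst_ip)

# Hypothetical entries: VM1's subnet and next hop, plus its ARP entry.
routing = {"192.168.10.0/24": "192.168.10.2"}
arp = {"192.168.10.2": ("00:1e:10:00:00:01", "Nve1", 10)}
entry = l3_gateway_forward("192.168.10.1", routing, arp)
```

Device 3 then encapsulates the packet with the returned VNI and sends it out of the returned interface, as in step 3.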
Service Description
Enterprises configure server virtualization on DCNs to consolidate IT resources, improve resource use
efficiency, and reduce network costs. With the wide deployment of server virtualization, an increasing
number of VMs are running on physical servers, and many applications are running in virtual environments,
which bring great challenges to virtual networks.
Network Description
On the network shown in Figure 1, an enterprise has two servers in the DC: engineering and finance
departments on Server1 and the marketing department on Server2.
The computing space on Server1 is insufficient, but Server2 is not fully used. The network administrator
wants to migrate the engineering department to Server2 without affecting services.
This scenario applies to IPv4 over IPv4, IPv6 over IPv4, IPv4 over IPv6, and IPv6 over IPv6 networks. Figure 1
shows an IPv4 over IPv4 network.
Feature Deployment
To ensure uninterrupted services during the migration of the engineering department, the IP and MAC
addresses of the engineering department must remain unchanged. This requires that the two servers belong
to the same Layer 2 network. If conventional migration methods are used, the administrator may have to
purchase additional physical devices to distribute traffic and reconfigure VLANs. These methods may also interrupt services.
Terms
Term Description
VXLAN Virtual extensible local area network. An NVO3 network virtualization technology
that encapsulates data packets sent from VMs into UDP packets and encapsulates IP
and MAC addresses used on the physical network in the outer headers before sending
the packets over an IP network. The egress tunnel endpoint then decapsulates the
packets and sends the packets to the destination VM.
BD bridge domain
8 WAN Access
Purpose
This document describes the WAN Access feature in terms of its overview, principles, and applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Commissioning engineers
Security Declaration
• Notice on Limited Command Permission
This document describes the commands used for network deployment and maintenance, but does not
cover confidential commands such as those used for production, assembly, and return-to-factory
inspection. For details about confidential commands, please submit an application.
■ When the password encryption mode is cipher, avoid setting both the start and end characters of a
password to "%^%#", as this causes the password to be displayed directly in the configuration file.
■ Your purchased products, services, or features may use some of users' personal data during service
operation or fault locating. You must define user privacy policies in compliance with local laws and
take proper measures to fully protect personal data.
■ When discarding, recycling, or reusing a device, back up or delete data on the device as required to
prevent data leakage. If you need support, contact after-sales technical support personnel.
• Feature declaration
■ The NetStream feature may be used to analyze the communication information of terminal
customers for network traffic statistics and management purposes. Before enabling the NetStream
feature, ensure that it is performed within the boundaries permitted by applicable laws and
regulations. Effective measures must be taken to ensure that information is securely protected.
■ The mirroring feature may be used to analyze the communication information of terminal
customers for a maintenance purpose. Before enabling the mirroring function, ensure that it is
performed within the boundaries permitted by applicable laws and regulations. Effective measures
must be taken to ensure that information is securely protected.
■ The packet header obtaining feature may be used to collect or store some communication
information about specific customers for transmission fault and error detection purposes. Huawei
cannot offer services to collect or store this information unilaterally. Before enabling the function,
ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Special Declaration
• This document serves only as a guide. The content is written based on device information gathered
under lab conditions. The content provided by this document is intended to be taken as general
guidance, and does not cover all scenarios. The content provided by this document may be different
from the information on user device interfaces due to factors such as version upgrades and differences
in device models, board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are beyond the
scope of this document.
• The maximum values provided in this document are obtained in specific lab environments (for example,
only a certain type of board or protocol is configured on a tested device). The actually obtained
maximum values may be different from the maximum values provided in this document due to factors
such as differences in hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface numbers on devices
for configuration.
• The supported boards are described in the document. Whether a customization requirement can be met
is subject to the information provided at the pre-sales interface.
• In this document, public IP addresses may be used in feature introduction and configuration examples
and are for reference only unless otherwise specified.
• The configuration precautions described in this document may not accurately reflect all scenarios.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates a hazard with a high level of risk which, if not avoided, will
result in death or serious injury.
Indicates a hazard with a low level of risk which, if not avoided, could
result in minor or moderate injury.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made
in earlier issues.
Definition
IMA is the acronym of Inverse Multiplexing for ATM. The general idea of IMA is that the sender schedules
and distributes a high-speed ATM cell stream to multiple low-speed physical links for transmission, and then
the receiver schedules and reassembles the stream fragments into one cell stream and submits the cell
stream to the ATM layer. In this manner, bandwidths are multiplexed flexibly, improving the efficiency of
bandwidth usage.
Purpose
Using multiple E1 lines is more flexible and efficient. IMA allows a network designer and administrator to
use multiple E1 lines to implement ATM access.
Benefits
IMA has the following advantages:
To understand the ATM IMA feature, you need to learn the basic concepts of ATM IMA.
Basic Concepts
• IMA group
An IMA group can be considered a logical link that aggregates several low-speed physical links
(member links) to provide higher bandwidth. The rate of the logical link is approximately the sum of
the rates of the member links in the IMA group.
• ICP cell
ICP is short for IMA Control Protocol. ICP cells are a type of IMA negotiation cells, used mainly to
synchronize frames and transmit control information (such as the IMA version, IMA frame length, and
peer mode) between communicating devices. The offset of ICP cells in IMA frames on a link is fixed.
Like common cells, ICP cells consist of a 5-byte header and 48-byte payload.
• Filler cell
In the ATM model without an IMA sub-layer, decoupling of cell rates is implemented by Idle cells at the
Transmission Convergence (TC) sub-layer. After the IMA sub-layer is adopted, decoupling of cell rates
can no longer be implemented at the TC sub-layer due to frame synchronization. Therefore, Filler cells
are defined at the IMA sub-layer to implement decoupling of cell rates. If there is no ATM cell to be
sent, the sender sends Filler cells so that the physical layer transmits cells at a fixed rate. These filler
cells are discarded at the IMA receiving end.
• Differential delay
Links in an IMA group may have different delays and jitters. If the difference between the greatest
phase and the smallest phase in an IMA group exceeds the configured differential delay, the IMA group
removes the link with the longest delay from the cyclical sending queue and informs the peer that the
link is unavailable by sending IMA Control Protocol (ICP) cells. Through negotiation between the
two ends of a link, the link becomes active and then rejoins the cyclical sending queue of the IMA
group.
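The differential delay rule above can be modeled as follows; the link names and delay values are illustrative assumptions, and the negotiation with the peer is reduced to simply removing the link from the active set.

```python
def enforce_differential_delay(link_delays_ms, max_diff_ms):
    """If the spread between the fastest and slowest member links
    exceeds the configured differential delay, remove the slowest link
    from the cyclical sending queue (it is reported unavailable to the
    peer) and re-check until the group is within tolerance."""
    active = dict(link_delays_ms)
    while active and max(active.values()) - min(active.values()) > max_diff_ms:
        slowest = max(active, key=active.get)
        del active[slowest]
    return sorted(active)

# Illustrative E1 member links with one-way delays in milliseconds.
links = {"e1/0": 3.0, "e1/1": 4.0, "e1/2": 29.5}
assert enforce_differential_delay(links, max_diff_ms=25) == ["e1/0", "e1/1"]
```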
IMA: IMA divides one higher-speed transmission channel into two or more lower-speed channels and
transports an ATM cell stream across these lower-speed channels. At the far end, IMA groups these
lower-speed channels and reassembles the cells to recover the original ATM cell stream. An IMA group can
be considered a logical link that aggregates several physical low-speed links (member links) to provide
higher bandwidth. The rate of the logical link is approximately the sum of the rates of the member links in
the IMA group. IMA transports ATM traffic over bundled low-speed E1 lines, allowing a network designer
and administrator to use these E1 lines to implement ATM access.
Principles
Figure 1 shows inverse multiplexing and de-multiplexing of ATM cells in an IMA group.
• The sending end: In the sending direction, IMA receives ATM cells from the ATM layer and places them
in circular order onto member links of the IMA group.
• The receiving end: After reaching the receiving end, these cells are reassembled into the original cell
flow and passed up to the ATM layer. The IMA process is transparent to the ATM layer.
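The inverse multiplexing and de-multiplexing process above can be sketched as a round-robin distribution and reassembly. This is a conceptual model only: it ignores ICP and Filler cells and assumes all member links deliver cells in order.

```python
def ima_distribute(cells, num_links):
    """Sender side: place cells in circular (round-robin) order onto
    the member links of the IMA group."""
    links = [[] for _ in range(num_links)]
    for i, cell in enumerate(cells):
        links[i % num_links].append(cell)
    return links

def ima_reassemble(links):
    """Receiver side: read the links in the same circular order to
    recover the original cell stream."""
    cells, i = [], 0
    total = sum(len(link) for link in links)
    while len(cells) < total:
        link = links[i % len(links)]
        idx = i // len(links)
        if idx < len(link):
            cells.append(link[idx])
        i += 1
    return cells

# A 7-cell stream distributed over 3 member links and recovered intact.
stream = [f"cell{n}" for n in range(7)]
assert ima_reassemble(ima_distribute(stream, 3)) == stream
```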
Acronym/Abbreviation
AN Access Node
PW pseudowire
Definition
ATM was designated as the transmission and switching mode for Broadband Integrated Services Digital
Networks (B-ISDN) by the ITU-T in June 1992. Due to its high flexibility and support for multimedia
services, ATM was considered key to realizing broadband communications.
Defined by the ITU-T, ATM is a cell-based, connection-oriented multiplexing and switching technology that
transmits, multiplexes, and switches data based on cells.
An ATM cell has a fixed length of 53 bytes. Voice, video, and data messages are all transmitted in cells of
this fixed length, which ensures fast data transmission.
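The 53-byte cell format can be sketched as follows, assuming the UNI header layout (4-bit GFC, 8-bit VPI, 16-bit VCI, PTI/CLP octet, HEC). The HEC is a CRC-8 over the first four header bytes XORed with 0x55, per ITU-T I.432; the VPI/VCI values and payload are illustrative assumptions.

```python
def crc8(data):
    """CRC-8 with generator polynomial x^8 + x^2 + x + 1 (0x07)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def build_atm_cell(vpi, vci, payload):
    """Build a 53-byte ATM UNI cell: a 5-byte header (GFC, VPI, VCI,
    PTI/CLP, HEC) followed by a fixed 48-byte payload (zero-padded)."""
    assert len(payload) <= 48
    gfc, pti_clp = 0, 0
    b0 = (gfc << 4) | (vpi >> 4)
    b1 = ((vpi & 0xF) << 4) | (vci >> 12)
    b2 = (vci >> 4) & 0xFF
    b3 = ((vci & 0xF) << 4) | pti_clp
    hec = crc8([b0, b1, b2, b3]) ^ 0x55  # HEC: CRC-8 of header, XOR 0x55
    return bytes([b0, b1, b2, b3, hec]) + payload.ljust(48, b"\x00")

cell = build_atm_cell(vpi=1, vci=32, payload=b"data")
assert len(cell) == 53
```

Every cell has this same fixed size, which is what allows hardware to switch voice, video, and data cells at a constant rate.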
Purpose
ATM provides the network with a versatile and connection-oriented transfer mode that applies to different
services.
Before the Gigabit Ethernet technology, ATM backbone switches were mostly used on backbone networks to
ensure high bandwidth. ATM dominated among network technologies because it can provide good QoS and
transmit voice, data, and video with high bandwidth.
Nevertheless, the initial roadmap for ATM, which aimed to solve all network communication issues, was too ambitious and idealistic. The resulting completeness of the ATM technology and the complexity of its architecture made the ATM system difficult to develop, configure, manage, and troubleshoot.
In addition, ATM network devices were expensive, so ATM networks were unaffordable for most users, and ATM's performance advantages remained little known.
In the late 1990s, the Internet and IP technology overshadowed ATM because of their simplicity and flexibility. Their rapid adoption dealt a severe blow to the B-ISDN plan.
ATM is, however, still regarded as the best transmission technology for B-ISDN because of its advantages in transporting integrated services. IP technology was therefore integrated with ATM, ushering in a new era of broadband networks built on the combination of the IP and ATM technologies.
• Control plane: This plane generates and manages signaling requests. It sets up, monitors, and removes
connections by using signaling protocols.
• Management plane: This plane is divided into layer management and plane management.
■ Layer management: It is responsible for the management of every layer in each plane. It has a
layered structure corresponding to other planes.
■ Plane management: It is responsible for the system management and the communications between
different planes.
• Physical layer: Similar to the physical layer of the OSI reference model, the physical layer manages the
transmission related to the medium.
• ATM layer: Together with the ATM adaptation layer (AAL), it corresponds to the data link layer of the OSI reference model. The ATM layer is responsible for multiplexing virtual circuits onto the physical link and transmitting ATM cells on the ATM network.
• AAL: Together with the ATM layer, it corresponds to the data link layer of the OSI reference model. AAL is mainly responsible for isolating the upper-layer protocols from the ATM layer. It prepares data for conversion into cells by segmenting it into 48-byte cell payloads.
• Upper layer: It receives data, divides it into packets, and transmits it to AAL for processing.
The comparison between the ATM protocol architecture and the OSI reference model is shown in Figure 2.
Figure 2 Comparison between the ATM protocol architecture and the OSI reference model
In the figure, the ATM adaptation layer (AAL) contains the convergence sublayer (CS), which provides standard interfaces.
The detailed functions of layers and sub-layers in the ATM reference model are described in the following
sections.
• Synchronizes the sending and receiving by sending and receiving continuous bit flows with timing
information.
• Specifies physical carriers for all physical media, including cables and connectors.
The ATM physical medium standard includes Synchronous Optical Network (SONET)/Synchronous Digital
Hierarchy (SDH), Digital Signal level 3 (DS-3), T3/E3, multimode fiber, and shielded twisted pair (STP).
Different media can use various transmission frame structures.
The following section describes the method to encapsulate ATM in SONET/SDH, T3/E3 frames:
Table 1 Comparison between the common transmission rates of SONET and SDH
■ The user layer lies at the top of the SONET physical layer.
■ The transmission channel layer, digital line layer, and segment regeneration layer are three sub-
layer entities of the SONET physical layer.
■ The transmission channel layer is mainly responsible for assembling and disassembling cells
for SONET frame signals.
■ The digital line layer adds the packet header (such as system overhead) and performs
multiplexing.
■ The segment regeneration layer includes the segment layer and the photon layer. After data arrives at the segment regeneration layer, the segment layer appends a segment header, encapsulates the data in a frame, and transmits this frame to the photon layer. The photon layer then converts the electrical signals into optical signals and sends the frame.
The frame format of the STS-M that bears ATM cells is shown in Figure 2.
Similar to T3, E3 adopts two technologies: PLCP and direct mapping into E3 frames.
Compared with DS-3 PLCP, E3 PLCP has the following differences:
■ It adopts the G.751 format and inserts the trailer used to synchronize E3 after every nine cells.
■ Its trailer length ranges from 18 to 20 bytes, whereas that of T3 PLCP ranges from 6.5 to 7 bytes.
In the G.832 standard, ATM cells are directly mapped into E3 frames: the cells are mapped into the 530-byte payload, and the frame overhead occupies 7 bytes.
ATM IMA
ATM IMA Description describes the principles of ATM IMA.
ATM Bundling
ATM bundling is an extended ATM PWE3 application and is applicable to IP RAN networks. On the network shown in Figure 5, NodeBs are connected to a cell site gateway (CSG) using ATM links. Each NodeB may transmit both voice and data services. Configuring a PWE3 PW for each service on every NodeB connected to a Radio Network Controller (RNC) would impose a heavy burden on the CSG. Bundling physical links into one PW to transmit the same type of service from different NodeBs to the RNC relieves the burden on the CSG and improves service scalability.
ATM bundling is an ATM PWE3 extension and provides logical ATM bundle interfaces. PWE3 PWs are
established on ATM bundle interfaces and PVCs are configured on Serial sub-interfaces (ATM is specified as
the link layer protocol). After Serial sub-interfaces join the ATM bundle interfaces, PVCs on these sub-
interfaces are mapped to specified PWs. This reduces the number of PWs and system burden. ATM bundle
interfaces forward traffic as follows:
1. After receiving user traffic through a PVC of an ATM bundle member interface on a CSG, the CSG
forwards user traffic to a PW to which the PVC is mapped.
2. After receiving traffic from an RNC, the CSG maps traffic to specific ATM bundle member interfaces
based on PVCs and these ATM bundle member interfaces forward traffic to specific nodeBs.
• Transmits the VC identifiers, that is, the Virtual Path Identifier (VPI) and Virtual Channel Identifier (VCI), and multiplexes and demultiplexes cells.
• User-to-Network Interface
The UNI defines the interfaces between the peripheral devices and ATM switches.
Depending on whether the switches are owned by customers or carriers, UNIs are divided into public UNIs and private UNIs.
Private UNIs are used inside a private ATM network to connect peripheral devices to private ATM switches. Public UNIs connect ATM peripheral devices or private ATM switches to public ATM switches.
• Network-to-Network Interface
The NNI refers to the interfaces between ATM switches.
Depending on whether the switches are owned by customers or carriers, NNIs are divided into two types: public NNIs and private NNIs.
Connecting two switches on the same private ATM network, the private NNI is used inside the private ATM network. Connecting two ATM switches of the same public network carrier, the public NNI is used inside the public ATM network.
The VP is used to adapt to high-speed networks in which network control costs are increasing. The VP technology reduces control costs by binding connections that follow the same path across a shared network into a single unit. In this way, network management needs to process only a small number of VPs instead of a large number of independent connections.
In the ATM communication, an ATM switch transmits the received cells to the output interface according to
the VPI/VCI of the input cells and the forwarding table that is generated during the setup of a connection. At
the same time, this ATM switch changes the VPI/VCI of a cell into that of an outgoing interface to complete
the VP switching or VC switching.
ATM VCs are of the following types: permanent virtual circuit (PVC), switching virtual circuit (SVC), and soft
virtual circuit (soft VC).
• The PVC is statically configured by the administrator and remains in place until the administrator removes it. PVCs apply to connections with demanding requirements.
• The SVC is set up through the signaling protocol. It can be connected and removed through commands.
When a node receives the connection request from other nodes, the connection response information
needs to be sent to this node if configuration requirements are satisfied. After the connection is set up,
the connection request is sent to the next target node.
The removing process is similar to the setting up of the connection.
• Soft VC indicates that the ATM network is based on SVC, but peripheral devices access the ATM network
in PVC mode.
The setting up of soft VCs is similar to that of SVCs. The only difference is that PVCs must be manually
configured between ATM switch interfaces and peripheral devices.
The advantage of this mode is that it is easy to manage users if PVCs are connected to the users. In
addition, SVCs can ensure the proper usage of the links.
Figure 3 Soft VC
In the ATM switching table shown in Figure 4, the first entry shows that, for cells arriving with VPI/VCI 4/55, the switch changes the cell header VPI/VCI to 8/62 and then sends the cells out through port 3.
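The lookup and header rewrite can be modeled as a simple table. The entry below mirrors the switching behavior just described; the ingress port number (1) is a hypothetical addition, since only the output port is named here:

```python
# Illustrative ATM switching table: each entry maps the incoming
# (port, VPI, VCI) to the outgoing (port, VPI, VCI). Values follow the
# example above (4/55 rewritten to 8/62, sent out of port 3).
switch_table = {
    (1, 4, 55): (3, 8, 62),
}

def switch_cell(in_port, vpi, vci):
    """Rewrite the cell's VPI/VCI per the table and pick the output port."""
    out_port, out_vpi, out_vci = switch_table[(in_port, vpi, vci)]
    return out_port, out_vpi, out_vci
```

A cell arriving on port 1 with VPI/VCI 4/55 leaves through port 3 carrying VPI/VCI 8/62.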
The NNI cell header is used for communication between two switching nodes.
Figure 6 shows the NNI cell header format.
• GFC: indicates generic flow control with a length of 4 bits. It applies to UNI interfaces only, performs flow control, and identifies different accesses on a shared-media network.
• VPI: indicates the virtual path identifier. In the UNI, it can identify 256 VPs and its length is 8 bits. In the
NNI, it can identify 4096 VPs and its length is 12 bits.
• VCI: indicates the virtual channel identifier. It can identify 65536 VCs and its length is 16 bits.
• CLP: indicates the cell loss priority. It is used for congestion control and its length is 1 bit. When
congestion occurs, cells with the CLP as 1 are discarded first.
• PTI: indicates the payload type indicator. It identifies the payload type and its length is 3 bits.
• HEC: indicates the header error control. It is used for error control and cell delimitation in a cell header
and its length is 8 bits. HEC can correct 1-bit error, find multi-bit error, and perform HEC on the physical
layer.
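A minimal sketch of building a UNI cell header from these fields, assuming the standard bit layout (GFC 4 bits, VPI 8, VCI 16, PTI 3, CLP 1, HEC 8) and the commonly specified HEC computation (CRC-8 with polynomial x^8 + x^2 + x + 1, XORed with the coset value 0x55):

```python
def hec(header4):
    """CRC-8 (x^8 + x^2 + x + 1) over the first 4 header bytes, XOR 0x55."""
    crc = 0
    for byte in header4:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc ^ 0x55

def pack_uni_header(gfc, vpi, vci, pti, clp):
    """Build the 5-byte UNI cell header: GFC(4) VPI(8) VCI(16) PTI(3) CLP(1) HEC(8)."""
    b0 = (gfc & 0xF) << 4 | (vpi >> 4) & 0xF
    b1 = (vpi & 0xF) << 4 | (vci >> 12) & 0xF
    b2 = (vci >> 4) & 0xFF
    b3 = (vci & 0xF) << 4 | (pti & 0x7) << 1 | clp & 0x1
    body = bytes([b0, b1, b2, b3])
    return body + bytes([hec(body)])
```

Note that with all fields zero, the HEC byte is 0x55 (CRC of zeros is zero, then XOR 0x55).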
Some specified VPI/VCI values are reserved for special cells. These special cells are described as follows:
• Idle cell: Its VPI is 0, VCI is 0, PTI is 0, and CLP is 1. It is used for rate adaptation.
• Unassigned cell: Its VPI is 0, VCI is 0, PTI can be any value, and CLP is 1.
• OAM cell: For the VP sub-layer, its VCI is 3 and it is used for the VP link. When VCI is 4, it is used for the
VP connection. For the VC sub-layer, it is used for the VC link when PTI is 4. When PTI is 5, it is used for
the VC connection.
■ Component signaling cell: Its VPI can be any value, and VCI is 1.
■ General broadcast signaling cell: Its VPI can be any value, and VCI is 2.
■ Point-to-point (P2P) signaling cell: Its VPI can be any value, and VCI is 5.
• Payload type: Its length is 3 bits. It is used to identify the information field, that is, the payload type. The
following lists the PT values and corresponding meanings defined by the ITU-T I.361.
■ PT = 000: indicates that the data cell does not experience congestion and ATM user to user (AUU)
is 0.
■ PT = 001: indicates that the data cell does not experience congestion and AUU is 1.
■ PT = 010: indicates that the data cell experiences congestion and AUU is 0.
■ PT = 011: indicates that the data cell experiences congestion and AUU is 1.
• The second bit identifies whether cells experience congestion and can be set through the network node
when there is congestion.
• The third bit is an AUU indicator. AUU = 0 indicates that the corresponding SAR-PDU is the beginning
segment or intermediate segment. AUU = 1 indicates that SAR-PDU is the ending segment.
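The PT bit meanings above can be expressed as a small decoder. This is an illustrative sketch; the dictionary keys are made-up names, not protocol fields:

```python
def decode_pt(pt):
    """Interpret the 3-bit PT field of a data cell per the values above.

    Bit 2 distinguishes user data (0) from OAM/management cells (1),
    bit 1 flags congestion, and bit 0 carries the AUU indicator
    (1 = last segment of the SAR-PDU).
    """
    if pt & 0b100:
        return None  # not a user data cell
    return {
        "congestion": bool(pt & 0b010),
        "auu": pt & 0b001,
    }
```

For example, PT = 011 decodes as a congested data cell carrying the last segment (AUU = 1).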
ATM OAM
• Overview of OAM
According to different protocols, OAM has two different definitions.
■ OAM: Operation Administration and Maintenance (LUCENT APC User Manual, 03/99)
OAM offers a mechanism to detect and locate faults, and verify the network performance without
interrupting the service. After some OAM cells with the standard structure are inserted in user cell flow,
certain specific information can be provided.
1. Two ends simultaneously send OAM cells at a specified interval to their peers.
2. If the peer replies with a signal after receiving the OAM cell, it indicates the link is normal. If the
local timer finds that the OAM cell times out, the local port considers that the link fails.
OAM functions can vary with different hardware. The main OAM functions are as follows.
Performance Monitoring (PM): manages performance and returns the assessment result to the local end.
F5 cells are classified as end-to-end or segment cells. The 3-bit PTI field in the ATM cell header differentiates the two types: a PTI of 100 indicates a segment cell, and a PTI of 101 indicates an end-to-end cell. Currently, OAM is used to detect links; therefore, Huawei products mainly support end-to-end F5.
• Convergence Sublayer
The convergence sublayer (CS) contains two sublayers: the service-specific convergence sublayer (SSCS) and the common part convergence sublayer (CPCS).
The CS converts upper-layer information into ATM payloads of a uniform size suitable for segmentation.
The SSCS handles the characteristics of specific services. The CPCS forms frames by adding variable-length padding before and after the data to support error detection. The frames
AAL Type
Currently, there are four types of AAL: AAL1, AAL2, AAL3/4, and AAL5. Each type supports certain specified
services on the ATM network. Products produced by most ATM equipment manufacturers widely adopt AAL5
to support data communication service.
• AAL1
AAL1 is used for constant bit rate (CBR), sending data at a fixed interval.
AAL1 uses part of the 48-byte payload to carry additional information, such as the sequence number (SN) and sequence number protection (SNP). The SN field contains a 1-bit convergence sublayer indication (CSI) and a 3-bit sequence count (SC). The CSI bit is also used for timing.
• AAL2
Compared with AAL1, AAL2 can transmit compressed voice and realize common channel signaling
(CCS) inside ISDN.
Details on AAL2 are defined in ITU-T 363.2.
AAL2 supports the processing of compressed voice at rates up to 5.3 kbit/s. This enables silence detection, suppression, and elimination, as well as CCS, and provides higher bandwidth utilization. Segments can be encapsulated into one or multiple ATM cells.
The CS of AAL2 is divided into the CPCS and SSCS, with the SSCS on top of the CPCS. The CPCS identifies the basic structure of AAL2 user data and performs error checking, data encapsulation, and payload segmentation.
AAL2 allows payloads of variable length to exist in one or multiple ATM cells.
• AAL3/4
As the first technology attempting to realize cell relay, AAL3/4 supports both connection-oriented and connectionless data transmission.
CPCS is used to detect and process errors, identify the CPCS-service data unit (SDU) to be transmitted,
and determine the length of the CPCS-packet data unit (PDU).
• AAL5
AAL5 also processes connection-oriented and connectionless data. AAL5 is called the simple and efficient adaptation layer. It uses the full 48 bytes of each cell to carry payload information and adds no per-cell overhead: its cells carry no sequence number and provide no per-cell error detection.
AAL5 SAR sublayer is simple. It divides CPCS-PDUs into 48-byte SAR-PDUs without any overhead and
realizes the reverse function when receiving data.
The CPCS-PDU format of AAL5 CPCS is shown in Figure 1.
The length of the CPCS-PDU payload is variable and ranges from 1 to 65535 bytes.
As shown in Figure 1, no CPCS-PDU header exists. A CPCS-PDU tail, however, occupies eight bytes. The
meaning of each field in Figure 1 is as follows:
■ PAD: indicates the padding field, which makes the CPCS-PDU length an integer multiple of the 48-byte cell payload.
The SSCS of the AAL5 CS is similar to that of AAL3/4, and the CPCS is likewise shared by the upper layers. The CPCS performs error detection, processes errors, adds padding to form 48-byte payloads, and discards incomplete received CPCS-PDUs.
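The padding arithmetic implied above can be sketched as follows. Only the length calculation is shown, not the real trailer fields (UU, CPI, Length, CRC-32); the function names are illustrative:

```python
TRAILER_LEN = 8    # 8-byte CPCS-PDU trailer
CELL_PAYLOAD = 48  # one ATM cell payload

def pad_length(payload_len):
    """Bytes of PAD needed so payload + PAD + trailer fills whole cells."""
    return (-(payload_len + TRAILER_LEN)) % CELL_PAYLOAD

def cells_needed(payload_len):
    """Number of 48-byte cell payloads a CPCS-PDU of this size occupies."""
    return (payload_len + pad_length(payload_len) + TRAILER_LEN) // CELL_PAYLOAD
```

A 40-byte payload needs no PAD (40 + 8 = 48, exactly one cell), while a 100-byte payload needs 36 bytes of PAD and occupies three cells (100 + 36 + 8 = 144).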
• Logical Link Control (LLC)/Sub-Network Attachment Point (SNAP), which is the default encapsulation
technology adopted in the standard protocols
• LLC/SNAP allows multiprotocol multiplexing on a single ATM virtual circuit (VC). The protocol carrying the PDU is identified by the IEEE 802.2 LLC header added to the PDU.
• VC multiplexing carries higher-layer protocols on ATM VCs, with each protocol carried on a distinct ATM VC.
LLC/SNAP Encapsulation
LLC encapsulation is needed when several protocols are carried over the same VC. To ensure that the
receiver properly processes the received AAL5 CPCS-PDU packets, the payload field must contain information
necessary to identify the protocol of the routed or bridged PDU. In LLC encapsulation, this information is
encoded in an LLC header placed in front of the carried PDU.
There are two types of LLC: LLC type 1 and LLC type 2.
Unless otherwise specified, LLC in this document refers to LLC type 1. The application of LLC type 2 is similar
to that of LLC type 1.
■ The LLC header value 0xFE-FE-03 identifies a routed ISO Protocol Data Unit (PDU).
■ The Ctrl field value is 0x03, specifying an unnumbered information command PDU.
For routed ISO PDUs, the format of the AAL5 CPCS-PDU Payload field is shown in Figure 2.
■ Length: It is 2 bytes.
ISO routing protocol is identified by a 1-byte Network Layer Protocol Identifier (NLPID) field that is a
part of the protocol data. NLPID values are administered by ISO and ITU-T.
An NLPID value of 0x00 is defined in ISO/IEC TR 9577 as the null network layer or inactive set. Since it
has no significance within the context of this encapsulation scheme, an NLPID value of 0x00 is invalid.
Although IP is not an ISO protocol, it has an NLPID value of 0xCC. IP packets can adopt the preceding encapsulation format, but it is seldom used.
The LLC header value 0xAA-AA-03 identifies the presence of a SNAP header as defined in IEEE 802.1a. Figure 3 shows the format of a SNAP header.
■ The organizationally unique identifier (OUI) is 3 bytes in length. The OUI identifies an organization
that administers the meaning of the following Protocol Identifier (PID). The OUI value 0x00-00-00
indicates that the PID is an Ethernet type.
In the detailed format of an IPv4 PDU, the Ethernet type value is 0x08-00. Figure 5 shows the format of
the IP PDU.
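Putting the pieces together, the LLC/SNAP prefix for a routed IPv4 PDU can be sketched as below. The byte values are those given above (LLC 0xAA-AA-03, OUI 0x00-00-00, EtherType 0x08-00); the function name is illustrative:

```python
LLC_SNAP = bytes([0xAA, 0xAA, 0x03])       # LLC header announcing a SNAP header
OUI_ETHERTYPE = bytes([0x00, 0x00, 0x00])  # OUI 0x00-00-00: PID is an EtherType
ETHERTYPE_IPV4 = bytes([0x08, 0x00])       # EtherType for IPv4

def encapsulate_routed_ipv4(ip_pdu: bytes) -> bytes:
    """Prefix an IPv4 PDU with the LLC/SNAP header for AAL5 transport."""
    return LLC_SNAP + OUI_ETHERTYPE + ETHERTYPE_IPV4 + ip_pdu
```

The resulting AAL5 CPCS-PDU payload begins with the fixed 8-byte prefix AA-AA-03-00-00-00-08-00, followed by the IP packet.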
- 0x00-0D Fragments
- 0x00-0E BPDUs
The AAL5 CPCS-PDU Payload field carrying a bridged PDU must have one of the following formats.
It is required to add padding after the PID field to align the user information field of the Ethernet,
802.3, 802.4, 802.5, FDDI, and 802.6 PDUs.
The sequence of a MAC address must be the same as that in the LAN or MAN.
Padding is added to ensure that the length of a frame on the Ethernet/802.3 physical layer reaches the
minimum value. Padding must be added when bridged Ethernet/802.3 PDU encapsulation with the LAN
FCS is used. Otherwise, you do not need to add padding.
When frames without the LAN FCS are received, the bridge must add some padding to the frames
before forwarding the frames to an Ethernet/802.3 subnet.
The common PDU header and trailer are conveyed in sequence at the egress bridge to an 802.6 subnet.
Specifically, the common PDU header contains the BAsize field, which contains the length of the PDU.
If this field is not available to the egress 802.6 bridge, that bridge cannot begin to transmit the
segmented PDU until it has received the entire PDU, calculated the length, and inserted the length into
the BAsize field.
If the field is available, the egress 802.6 bridge can extract the length from the BAsize field of the
Common PDU header, insert it into the corresponding field of the first segment, and immediately
transmit the segment onto the 802.6 subnet.
For the egress 802.6 bridge, you can set the length of the AAL5 CPCS-PDU to 0 to ignore AAL5 CPCS-
PDUs.
VC Multiplexing
In VC-based multiplexing, the VC between two ATM sites is used to differentiate the carried network interconnection protocols. That is, each protocol is carried over a separate VC.
Therefore, no additional multiplexing information is contained on the payload of each AAL5 CPCS-PDU. This
can save bandwidth and reduce the processing cost.
Because the PID field is not contained in a bridged Ethernet/802.3/802.4/802.5/FDDI PDU, whether the PDU contains the LAN FCS is determined by the VC. PDUs of the same bridged medium can carry different protocols, regardless of whether the PDUs contain the LAN FCS.
8.3.3.1 IPoA
IP over AAL5 (IPoA) means that AAL5 bears IP packets. That is, IP packets are encapsulated in ATM cells and
transmitted on the ATM network.
Realization
As shown in Figure 1, on DeviceA, PVC 0/40 can reach DeviceB, and PVC 0/41 can reach DeviceC. If IP
packets sent to DeviceB need to be sent from PVC 0/40, the IP address of DeviceB must be mapped on PVC
0/40. After address mapping is set up, DeviceA sets up a route that reaches the IP address of DeviceB. The
outgoing interface is the interface where ATM PVC 0/40 resides.
• DeviceA searches the routing table and finds that the outgoing interface is an interface configured with ATM.
• The outgoing interface encapsulates the IP packets into IPoA cells and sends them over PVC 0/40.
DeviceB sends these cells back to DeviceA, and the ping from DeviceA to DeviceB succeeds.
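The address mapping at the heart of IPoA can be pictured as a simple table from next-hop IP address to PVC. The IP addresses below are hypothetical; the PVC values follow the Figure 1 description (PVC 0/40 to DeviceB, PVC 0/41 to DeviceC):

```python
# Hypothetical IPoA mapping table: next-hop IP address -> (VPI, VCI).
ipoa_map = {
    "10.1.1.2": (0, 40),  # DeviceB via PVC 0/40
    "10.1.1.3": (0, 41),  # DeviceC via PVC 0/41
}

def select_pvc(next_hop_ip):
    """Return the (VPI, VCI) pair used to encapsulate packets to this next hop."""
    return ipoa_map[next_hop_ip]
```

After route lookup yields the next hop, this mapping decides which PVC carries the encapsulated IP packet.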
Terms
Term Description
ATM Recommendation ITU-R F.1499 defines the Asynchronous Transfer Mode (ATM)
as a protocol for the transmission of a variety of digital signals using uniform
53 byte cells. Recommendation ITU-R M.1224 defines ATM as a transfer mode
Cell ATM organizes digital data into 53-byte cells and then transmits, multiplexes, or switches them. An ATM cell consists of 53 bytes: the first 5 bytes are the cell header, which contains the routing and priority information, and the remaining 48 bytes are the payload.
Multi-network PVC A multi-network PVC travels multiple networks. It consists of PVC segments on
different networks.
CC Continuity Check
CS Convergence Sublayer
PM Performance Monitoring
PT Payload Type
VC Virtual Channel
VE Virtual-Ethernet
VP Virtual Path
VT Virtual-Template
Definition
Frame Relay (FR) is a Layer 2 packet-switched technology that allows devices to use virtual circuits (VCs) to
communicate on wide area networks (WANs).
Purpose
During the 1990s, rapid network expansion gave rise to new requirements on networks.
The traditional methods used to meet these requirements are circuit switching (leased lines) and X.25 packet switching. However, these two methods have the following disadvantages:
• Circuit switching: Service deployment is costly, link usage efficiency is low, and transmission of traffic
bursts is unsatisfactory.
• X.25 packet switching: Switches and service deployment are costly, and because the X.25 protocol is
complicated, the transmission rate is low and the latency high.
FR was therefore introduced to meet such requirements. Unlike circuit switching and X.25 packet switching,
FR is highly efficient, cost-effective, reliable, and flexible. With these advantages, FR became popular in WAN
deployment in the 1990s. Table 1 compares circuit switching, X.25 packet switching, and FR.
Function Description
FR operates at the physical and data link layers of the Open System Interconnection (OSI) reference model
and is independent of upper layer protocols. This simplifies FR service deployment. Characterized by a short
network delay, low deployment costs, and high bandwidth usage efficiency, FR became a popular
communication technology in the early 1990s for WAN applications. FR has the following features:
• Uses VCs instead of physical links to transmit data. Multiple VCs can be multiplexed over one physical
link, which improves bandwidth usage.
• Is a streamlined version of X.25 and retains only the core functionality of the link layer, thereby
improving data processing efficiency.
• Performs statistical multiplexing, transparent frame transmission, and error checking at the link layer. If FR detects an error, it drops the erroneous frame rather than correcting it. FR therefore involves no frame sequencing, flow control, acknowledgment, or monitoring mechanisms, which reduces switch deployment costs, improves network throughput, and shortens communication delay. The access rate of FR users ranges from 64 kbit/s to 2 Mbit/s.
• Supports a frame size of at least 1600 bytes, suitable for LAN data encapsulation.
• Provides several effective mechanisms for bandwidth management and congestion control. Besides
reserving committed bandwidth resources for users, FR also allows traffic bursts to occupy available
bandwidth, which improves bandwidth usage.
Benefits
FR offers the following benefits:
• Easy deployment. FR can be deployed on X.25 devices after upgrading the device software; existing
applications and hardware require no modification.
• Flexible accounting mode. FR is suitable for traffic bursts and requires lower user communication
expenditure.
• Dynamic allocation of idle network resources. FR increases carrier returns on existing investments by utilizing idle network resources.
DLCI
DLCIs are used to identify VCs.
A DLCI is valid only on the local interface and its directly connected remote interface, and enables the
remote interface to know to which VC a frame belongs. Because FR VCs are connection-oriented, the local
DLCIs can be considered as FR addresses provided by local devices.
A user interface on an FR network supports a maximum of 1024 VCs, and the number of available DLCIs
ranges from 16 to 1007.
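As a rough illustration, the 10-bit DLCI is carried split across the 2-byte FR Address field, together with the C/R, FECN, BECN, DE, and EA bits. The decoder below assumes the standard Q.922 layout (upper 6 DLCI bits in the first byte, lower 4 in the second); all names are illustrative:

```python
def parse_address(b0, b1):
    """Decode the 2-byte FR Address field (assumed Q.922 layout)."""
    dlci = ((b0 >> 2) & 0x3F) << 4 | (b1 >> 4) & 0x0F
    return {
        "dlci": dlci,            # 10-bit data-link connection identifier
        "cr": (b0 >> 1) & 1,     # command/response bit (not defined by FR)
        "fecn": (b1 >> 3) & 1,   # forward explicit congestion notification
        "becn": (b1 >> 2) & 1,   # backward explicit congestion notification
        "de": (b1 >> 1) & 1,     # discard eligibility
    }
```

For example, address bytes 0x04 0x01 decode to DLCI 16, the smallest user-assignable value.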
On the FR network shown in Figure 1, two DTEs (Device A and Device D) are connected across an FR
network formed by two DCEs (Device B and Device C). Each DTE is connected to a DCE through a UNI, and
each DTE and its directly connected DCE must have the same DLCI. A PVC is established between two DTEs
that are connected through NNIs. VCs are differentiated by different DLCIs.
VC
A VC is a virtual circuit established between two devices on a packet-switched network.
For the device on the DTE side, the PVC status is determined by the device on the DCE side. For the DCE, the
PVC status is determined by the network.
When two network devices are directly connected, the virtual circuit status on the DCE side is set by the
device administrator.
The local management interface (LMI) maintains the FR link status and PVC status through status request
packets and status packets.
8.4.2.2 LMI
Introduction
Both a DCE and its connected DTE need to know the PVC status. Local Management Interface (LMI) is a
protocol that uses status enquiry messages and state messages to maintain link and PVC status, including
adding PVC status information, deleting information about disconnected PVCs, monitoring PVC status
changes, and checking link integrity. There are three standards for LMI:
• Vendor-specific implementation
This section describes LMI defined in ITU-T Q.933 Appendix A, which specifies the information units and LMI
implementation.
LMI Messages
There are two types of LMI messages:
• Status enquiry messages: sent from a DTE to a DCE to request the PVC status or detect the link
integrity.
• Status messages: sent from a DCE to a DTE to respond to status enquiry messages. The status messages
carry the PVC status or link integrity information.
LMI Reports
There are three types of LMI reports:
• Full status report: verifies the link integrity and transmits link integrity information and PVC status.
• Single PVC asynchronous status report: notifies a DTE of a PVC status change.
On a UNI that connects a DTE to a DCE, the PVC status of the DTE is determined by the DCE. To request the
PVC status, the DTE sends a status enquiry message to the DCE. Upon receipt of the message, the DCE
replies with a status message that carries the requested status information. The PVC status of the DCE is
determined by other devices connected to the DCE.
On an NNI that connects DCEs of a network, the DCEs periodically exchange PVC status.
1. A DTE sends a status enquiry message to its connected DCE; at the same time, the link integrity verification polling timer (T391) and the DTE counter (V391) start. The value of T391 specifies the interval at which status enquiry messages are sent, and the value of the full status polling counter (N391) determines how often a full status report, which includes the status of all PVCs, is requested. You can specify the values of T391 and N391 or use the default values.
• If the value of V391 is less than that of N391, the status enquiry message sent by the DTE requests only link integrity information.
• If the value of V391 is equal to that of N391, V391 is reset to 0, and the status enquiry message sent by the DTE requests both link integrity and PVC status information.
2. After receiving the enquiry message, the DCE responds with a status message, and at the same time,
the polling confirm timer (T392) of the DCE starts. If the DCE does not receive a subsequent status
enquiry message before T392 expires, the DCE records an event and increases the value of the
monitored events counter (N393) by 1.
3. The DTE checks the status message from the DCE. In addition to responding to every enquiry that the
DTE sends, the DCE automatically informs the DTE of the PVC status when the PVC status changes or
a PVC is added or deleted. This mechanism enables the DTE to learn the PVC status in real time and
maintain up-to-date records.
4. If the DTE does not receive a status message before T391 expires, the DTE records an event and
increases the value of N393 by 1.
5. N393 is an error threshold and records the number of events that have occurred. If the value of N393
is greater than that of N392, the DTE or DCE considers the physical link and all VCs unavailable. You
can specify the values of N392 and N393 or use the default values.
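The DTE side of the polling procedure in steps 1 through 5 can be sketched as a small counter, shown below. The class and method names are illustrative, and the default N391 value of 6 is a common convention assumed here, not taken from this document:

```python
class LmiDte:
    """Decides, on each T391 expiry, which kind of status enquiry to send."""

    def __init__(self, n391=6):
        self.n391 = n391
        self.v391 = 0  # enquiries sent since the last full status request

    def next_enquiry(self):
        """Every N391th enquiry requests full PVC status; the rest request
        link integrity only (V391 is reset when it reaches N391)."""
        self.v391 += 1
        if self.v391 >= self.n391:
            self.v391 = 0
            return "full_status"
        return "link_integrity_only"
```

With N391 = 3, the DTE sends two link-integrity-only enquiries followed by one full status enquiry, matching the (N391 - 1)/1 ratio given in Table 1.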
Table 1 lists the parameters required for LMI packet exchange. These parameters can be configured to
optimize device performance.
DTE N391 (full status polling counter): The DTE sends a full status report or a link integrity
verification only report at an interval specified by T391. The numbers of full status reports and
link integrity verification only reports to be sent are determined using the following formula:
Number of link integrity verification only reports/Number of full status reports = (N391 - 1)/1.
DTE T391 (polling timer at the user side): Specifies the interval at which the DTE sends status
enquiry messages.
DCE T392 (polling timer at the network side): Specifies the period during which the DCE waits for a
status enquiry message from the DTE. The value of T392 must be greater than that of T391.
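The polling rule described above can be sketched as follows. The class name and the default N391 value of 6 are illustrative, not taken from the product:

```python
class DtePoller:
    """Sketch of the DTE polling rule: the DTE keeps a counter (V391)
    that it increments on every poll; while V391 is less than N391 it
    requests link integrity information only, and when V391 reaches
    N391 it resets the counter and requests full PVC status."""

    def __init__(self, n391=6):  # 6 is an illustrative default
        self.n391 = n391
        self.v391 = 0

    def next_enquiry(self):
        self.v391 += 1
        if self.v391 >= self.n391:
            self.v391 = 0
            return "full status"
        return "link integrity only"
```

With this rule, every N391th enquiry is a full status request, which matches the (N391 - 1)/1 ratio in the table above.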
FR Frame Encapsulation
FR encapsulates a network layer protocol (IP or IPX) in the Data field of a frame and sends the frame to the
physical layer for transmission. Figure 1 shows FR frame encapsulation.
Upon receipt of a Protocol Data Unit (PDU) from a network layer protocol (IP for example), FR places the
PDU between the Address field and frame check sequence (FCS). FR then adds Flags to delimit the
beginning or end of the frame. The value of the Flags field is always 01111110. After the encapsulation, FR
sends the frame to the physical layer for transmission.
Figure 2 shows the basic format of an FR frame. In the format, the Flags field indicates the beginning or end
of the FR frame, and key information about the frame is carried in the Address, Data, and FCS fields.
The 2-byte Address field comprises a 10-bit data-link connection identifier (DLCI) and a 6-bit
congestion management identifier.
■ DLCI: The 10-bit DLCI is the key part of an FR header because a DLCI identifies a VC between a DTE
and a DCE.
FR VCs are connection-oriented, and a local device can be connected to different peers through VCs
with different DLCIs. A peer device can therefore be identified by a local DLCI.
A maximum of 1024 VCs can be configured on a user interface of an FR device, but the available DLCI
values range from 16 to 1007. The values 0 and 1023 are reserved for LMI.
■ C/R: follows DLCI in the Address field. The C/R bit is currently not defined.
■ Extended Address (EA): indicates whether the byte in which the EA value is 1 is the last addressing
field. If the value is 1, the current byte is determined to be the last DLCI byte. Although a two-byte
DLCI is generally used in FR, EA supports longer DLCIs. The eighth bit of each byte of the Address
field indicates the EA.
■ Congestion control: consists of three bits, which are forward-explicit congestion notification (FECN),
backward-explicit congestion notification (BECN), and discard eligibility (DE).
• Data: contains encapsulated upper-layer data. Each frame in this variable-length field includes a user
data or payload field of a maximum of 16000 bytes.
• FCS: is used to check the integrity of frames. A source device computes an FCS value and adds it to a
frame before sending the frame to a receiver. Upon receipt of the frame, the receiver computes an FCS
value and compares the two FCS values. If the two values are the same, the receiver processes the
frame; if the two values are different, the receiver discards the frame. If the frame is discarded, FR does
not send a notification to the source device. Error control is implemented by the upper layer of the OSI
module.
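The Address field layout and the FCS check described above can be sketched as follows. The packing helpers are illustrative, and the CRC parameters (the reflected ITU-T CRC-16 used by X.25) are an assumption, not a statement of the exact FR FCS implementation:

```python
def fr_address(dlci, cr=0, fecn=0, becn=0, de=0):
    """Pack a 10-bit DLCI and the control bits into the 2-byte Address
    field: 6 high DLCI bits + C/R + EA=0, then 4 low DLCI bits + FECN +
    BECN + DE + EA=1 (EA=1 marks the last address byte)."""
    assert 16 <= dlci <= 1007 or dlci in (0, 1023)  # 0 and 1023: LMI
    byte1 = ((dlci >> 4) & 0x3F) << 2 | cr << 1 | 0
    byte2 = (dlci & 0x0F) << 4 | fecn << 3 | becn << 2 | de << 1 | 1
    return bytes([byte1, byte2])

def fr_dlci(addr):
    """Recover the DLCI from a received 2-byte Address field."""
    return (((addr[0] >> 2) & 0x3F) << 4) | ((addr[1] >> 4) & 0x0F)

def crc16_x25(data):
    """Bitwise CRC-16 (reflected 0x1021 polynomial, assumed here for
    illustration of the FCS computation)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF

def receiver_accepts(data, received_fcs):
    # The receiver recomputes the FCS and silently drops mismatches.
    return crc16_x25(data) == received_fcs
```

Note that a mismatched FCS simply causes the frame to be dropped; as stated above, no notification is sent to the source.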
FR Frame Forwarding
On the network shown in Figure 3, the source device and receiver are connected through a PVC passing
through Device A, Device B, and Device C. Each router maintains an address mapping table that records the
mapping between the inbound and outbound interfaces. FR frames are received from the inbound interface
and sent by the outbound interface to the next router. Transit devices can be configured and connected
through VCs on the FR network.
Two devices across an FR network can be connected through a PVC consisting of multiple VCs (each VC
is identified by a DLCI). Figure 3 shows how an FR frame is forwarded along a PVC:
1. The source device sends an FR frame from port 1 along the VC specified by DLCI 1.
2. After receiving the FR frame from port 1, Device A sends it through port 2 along the VC specified by
DLCI 2.
3. After receiving the FR frame from port 0, Device B sends it through port 1 along the VC specified by
DLCI 3.
4. After receiving the FR frame from port 1, Device C sends it to the receiver through port 0 along the VC
specified by DLCI 4.
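The four forwarding steps can be modeled as per-device lookup tables keyed by inbound interface and DLCI. The device names, ports, and DLCI values below simply reproduce the example:

```python
# (inbound port, inbound DLCI) -> (outbound port, outbound DLCI)
SWITCH_TABLES = {
    "DeviceA": {(1, 1): (2, 2)},
    "DeviceB": {(0, 2): (1, 3)},
    "DeviceC": {(1, 3): (0, 4)},
}

def forward(device, in_port, in_dlci):
    """Return the outbound interface and the DLCI of the next VC; a
    frame with no matching entry is dropped (modeled as None)."""
    return SWITCH_TABLES[device].get((in_port, in_dlci))
```

The lookup illustrates that a DLCI is locally significant: the same frame carries a different DLCI on each VC segment of the PVC.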
8.4.2.4 FR Sub-interfaces
Background
An FR sub-interface is a logical interface configured on a physical interface. FR sub-interfaces reduce the
number of physical interfaces and deployment costs as well as the impact of split horizon.
An FR network interconnects networks in different geographical locations using a star, full-mesh, or partial-
mesh network topology.
The star topology requires the least number of PVCs and is the most cost-effective. In the star topology,
PVCs are configured on an interface of the central node for communication with different branch nodes. The
star topology is an ideal option when a headquarters and its branch offices need to be interconnected. The
disadvantage of the star topology is that packets exchanged between branch nodes have to pass through
the central node.
In a full-mesh topology, every two nodes are connected using PVCs and exchange packets directly. This
topology ensures high transmission reliability because packets can be switched to other PVCs if the direct
PVC between two nodes fails. However, the full-mesh topology suffers from the "N-squared" problem:
fully interconnecting N nodes requires N(N-1)/2 PVCs, a number that grows quickly with network size.
In a partial-mesh topology, only some nodes have PVCs to other nodes. An FR network is of the non-
broadcast multiple access (NBMA) type by default; unlike Ethernet networks, an FR network does not
support broadcast. A node on an FR network must therefore duplicate a received route and send a copy
to the other nodes over each PVC.
To avoid loops, split horizon is deployed to prevent an interface from sending received routing information.
On the network shown in Figure 1, Device B sends a route to a POS interface of Device A. Due to split
horizon, Device A cannot send the route to Device C or Device D through the POS interface. To resolve this
problem, any of the following solutions can be used:
• Use multiple physical interfaces to connect two neighboring devices. This solution is not cost-efficient
because each device needs to provide multiple physical interfaces.
• Configure multiple sub-interfaces on a physical interface. Then assign a network address to each sub-
interface so that they can function as multiple physical interfaces.
• Disable split horizon. This solution increases the possibility of routing loops.
Implementation
FR can be deployed on interfaces or sub-interfaces, and multiple sub-interfaces can be configured on one
interface. Although sub-interfaces are logical, they function in the same way as physical interfaces
at the network layer. Protocol addresses and VCs can be configured on the sub-interfaces for
communication with other devices.
On the network shown in Figure 2, three sub-interfaces (POS 0/1/0.1, POS 0/1/0.2, and POS 0/1/0.3) are
configured on a POS interface of Device A. Each sub-interface is connected to a remote device through a VC.
POS 0/1/0.1 is connected to Device B, POS 0/1/0.2 is connected to Device C, and POS 0/1/0.3 is connected to
Device D.
With the preceding configurations, the FR network is partially meshed. Devices can therefore exchange
update messages with each other, overcoming the limitations of split horizon.
Benefits
FR sub-interfaces reduce deployment costs.
8.4.3.1 FR Access
A typical FR application is FR access. FR access allows upper-layer packets to be transmitted over an FR
network.
An FR network allows user devices, such as routers and hosts, to exchange data.
Terms
Term Definition
X.25 A data link layer protocol that defines how to maintain connections between DTE
and DCE devices for remote terminal access and PC communication on a PDN.
VC Virtual circuit
Definition
As a bit-oriented link layer protocol, HDLC transparently transmits bit streams of any type without
requiring the data to be organized as a set of characters.
Trunk technology aggregates multiple physical interfaces into an aggregation group to balance sent
and received data among these interfaces and to provide more reliable connections.
HDLC
Compared with other data link layer protocols, HDLC has the following features:
• Full-duplex communication: data can be sent continuously without waiting for acknowledgment,
which provides high data transmission efficiency.
• All frames are protected by a cyclic redundancy check (CRC), and information frames are numbered.
This prevents information frames from being lost or received repeatedly, improving transmission
reliability.
• The transmission control function is separated from the processing function, so HDLC offers high
flexibility and comprehensive control capabilities.
• HDLC does not depend on any character set and can transmit data transparently.
• Zero-bit insertion, which is used for transparent transmission, is easy to implement in hardware.
Background
Synchronous data link protocols include character-oriented, bit-oriented, and byte-oriented protocols.
IBM put forward the first character-oriented synchronous protocol, called Binary Synchronous
Communication (BISYNC or BSC).
Later, ISO put forward related standards. The ISO standard is ISO 1745:1975 Information processing - Basic
mode control procedures for data communication systems.
In the early 1970s, IBM introduced the bit-oriented Synchronous Data Link Control (SDLC) protocol.
Later, ANSI and ISO adopted and developed SDLC, and then later put forward their own standards. ANSI
introduced the Advanced Data Communications Control Protocol (ADCCP), and ISO introduced HDLC.
HDLC Features
HDLC is a bit-oriented code-transparent synchronous data link layer protocol. It provides the following
features:
• HDLC works in full-duplex mode and can transmit data continuously without waiting for
acknowledgement. Therefore, HDLC features high data link transmission efficiency.
• HDLC uses cyclic redundancy check (CRC) for all frames and numbers them. This helps you know which
frames are dropped and which frames are repeatedly transmitted. HDLC ensures high transmission
reliability.
• HDLC separates the transmission control function from the processing function and features high
flexibility and perfect control capabilities.
• HDLC is independent of any character encoding set and transparently transmits data.
• Zero-bit insertion, which is used for transparent data transmission, is easy to implement on hardware.
HDLC transmits data in blocks called frames, each of which is delimited by a start flag and an end
flag. In HDLC, all bit-oriented data link control protocols use a unified frame format, and both
data and control information are transmitted in frames. Each frame begins and ends with a frame
delimiter, the unique bit sequence 01111110. The frame delimiter marks the start or end of a frame
and is also used for synchronization. The delimiter pattern must never appear inside a frame, to
avoid confusion.
Zero-bit insertion is used to ensure that the sequence of bits used for the flag does not appear in normal
data. On the transmit end, zero-bit insertion monitors all fields except the flag and places a 0 after five
consecutive 1s. On the receive end, zero-bit insertion also monitors all fields except the flag. After five
consecutive 1s are found, if the following bit is a 0, the 0 is automatically deleted to restore the
original bit flow. If the following bit is a 1, it means that an error has occurred or an end
delimiter has been received. In this case,
the frame receive procedure is generally either restarted or aborted.
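The transmit-side and receive-side behavior described above can be sketched on lists of bits (the helper names are illustrative):

```python
def stuff(bits):
    """Zero-bit insertion on the transmit end: after five consecutive
    1s in the payload, insert a 0 so the 01111110 flag pattern can
    never appear inside a frame."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 5:
            out.append(0)
            run = 0
    return out

def unstuff(bits):
    """Receive end: delete the 0 that follows five consecutive 1s to
    restore the original bit flow (a 1 there would indicate an error
    or an end delimiter)."""
    out, run, i = [], 0, 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        run = run + 1 if b == 1 else 0
        i += 1
        if run == 5:
            i += 1  # skip the inserted 0
            run = 0
    return out
```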
Introduction
Nodes on a network running HDLC are called stations. HDLC specifies three types of stations: primary,
secondary, and combined.
A primary station is the controlling station on a link. It controls the secondary stations on the link and
manages data flow and error recovery.
A secondary station is present on a link where there is a primary station. The secondary station is controlled
by the primary station, and has no direct responsibility for controlling the link. Under normal circumstances,
a secondary station will transfer frames only when requested to do so by the primary station, and will
respond only to the primary station.
A combined station is a combination of primary and secondary stations.
Frames transferred by a primary station to a secondary station are called commands, and frames transferred
by a secondary station to a primary station are called responses.
On a point to multipoint (P2MP) link, there is a primary station and several secondary stations. The primary
station polls the secondary stations to determine whether they have data to transmit, and then selects one
to transmit its data. On a point to point (P2P) link, both ends can be combined stations. If a node is
connected to multiple links, the node can be the primary station for some links and a secondary station for
the other links.
• Information frames (I-frames): used to transmit valid user data. An I-frame contains a receive sequence
number N(R) and a sequence number of the sent frame N(S) in the Control field.
• Supervisory frames (S-frames): used for flow and error control. An S-frame contains only N(R) in the
Control field. S-frames do not have information fields.
• Unnumbered frames (U-frames): used to set up, tear down, and control links. A U-frame does not
contain N(R) or N(S) in the Control field.
8.5.2.5 IP-Trunk
A trunk aggregates multiple interfaces into an aggregation group to implement load balancing among
member interfaces, which makes link connectivity more reliable. Trunk interfaces are classified as
Eth-Trunk interfaces or IP-Trunk interfaces. An IP-Trunk can be composed only of POS links. It has
the following characteristics:
• Increased bandwidth: An IP-Trunk obtains the sum of bandwidths of all member interfaces.
• Improved reliability: When a link fails, traffic is automatically switched to other links, which improves
connection reliability.
Member interfaces of an IP-Trunk interface must be encapsulated with HDLC. IP-Trunk and Eth-Trunk
technologies have similar principles. For details, see the chapter about trunk in the NE40E Feature
Description - LAN Access and MAN Access.
Background
Due to unstable signals on physical links or incorrect configurations at the data link layer on live networks,
an interface on which High-Level Data Link Control (HDLC) is enabled may frequently experience HDLC
negotiation, and the HDLC protocol status of the interface may alternate between Up and Down, causing
routing protocol or MPLS flapping. As a result, devices and networks are severely affected. Worse still,
devices are paralyzed and networks become unavailable.
HDLC flapping suppression restricts the frequency at which the HDLC protocol status of an interface
alternates between Up and Down. This restriction minimizes the impact of flapping on devices and networks.
Implementation Principles
HDLC flapping suppression involves the following concepts:
• Penalty value: This value is calculated based on the HDLC protocol status of the interface using
the suppression algorithm. The core of the suppression algorithm is that the penalty value increases
each time the interface status changes and decreases exponentially over time.
• Suppression threshold: The HDLC protocol status of an interface remains Down when the penalty value
is greater than the suppression threshold.
• Reuse threshold: The HDLC protocol status of an interface is no longer suppressed when the penalty
value is smaller than the reuse threshold.
• Ceiling threshold: The penalty value no longer increases when the penalty value reaches the ceiling
threshold, preventing the HDLC protocol status of an interface from being suppressed for a long time.
The ceiling value can be calculated using the following formula: ceiling = reuse ×
2^(MaxSuppressTime/HalfLifeTime).
• Half-life-period: period in which the penalty value decreases by half. A half-life-period starts
when the HDLC protocol status of an interface goes Down for the first time. Each time a half-life-
period expires, the penalty value decreases by half, and another half-life-period starts.
• Max-suppress-time: maximum period during which the HDLC protocol status of an interface is
suppressed. After a max-suppress-time elapses, the HDLC protocol status of the interface is
renegotiated and reported.
At t1, the HDLC protocol status of an interface goes Down, and its penalty value increases by 1000. Then,
the interface goes Up, and its penalty value decreases exponentially based on the half-life rule. At t2, the
HDLC protocol status of the interface goes Down again, and its penalty value increases by 1000, reaching
1600, which has exceeded the suppression threshold of 1500. The HDLC protocol status of the interface is
therefore suppressed. As the interface keeps flapping, its penalty value keeps increasing until it reaches the
ceiling threshold of 10000 at tA. As time goes by, the penalty value decreases and reaches the reuse value of
750 at tB. The HDLC protocol status of the interface is then no longer suppressed.
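The behavior in this example can be sketched as follows. The per-event increment of 1000 and the reuse/suppression thresholds of 750/1500 come from the example above, while the half-life and max-suppress-time figures are assumed for illustration:

```python
class FlapDamper:
    """Toy model of HDLC flapping suppression: each Down event adds a
    fixed penalty (capped at the ceiling); the penalty halves every
    half-life period; status is suppressed once the penalty exceeds
    the suppression threshold and released below the reuse threshold."""

    def __init__(self, reuse=750, suppress=1500, max_suppress=60.0,
                 half_life=15.0, step=1000):
        # ceiling = reuse x 2^(MaxSuppressTime/HalfLifeTime)
        self.ceiling = reuse * 2 ** (max_suppress / half_life)
        self.reuse, self.suppress = reuse, suppress
        self.half_life, self.step = half_life, step
        self.penalty, self.suppressed = 0.0, False

    def advance(self, dt):
        """Exponential decay as time passes without a Down event."""
        self.penalty *= 0.5 ** (dt / self.half_life)
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False  # below reuse: no longer suppressed

    def down_event(self):
        self.penalty = min(self.penalty + self.step, self.ceiling)
        if self.penalty > self.suppress:
            self.suppressed = True   # above suppress: status held Down
```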
HDLC
On the network shown in Figure 1, a point-to-point link is established between Device A and Device B, and
HDLC is configured on Device A and Device B. HDLC provides simple, stable, and reliable data transmission
and features high fault tolerance at the data link layer.
Figure 1 HDLC
IP-Trunk
For an IP-Trunk interface, you can configure weights for member interfaces to implement load balancing
among member interfaces. There are two load balancing modes, namely, per-destination and per-packet
load balancing.
• Per-destination load balancing: packets with the same source and destination IP addresses are
transmitted over one member link.
• Per-packet load balancing: packets are transmitted over different member links.
As shown in Figure 2, two Routers are connected through POS interfaces that are bundled into an IP-Trunk
interface to transmit IPv4, IPv6, and MPLS packets.
Terms
Term Definition
Aggregation Two or more interfaces are bundled together so that they function as a single interface
for load balancing and link protection.
Inter-board aggregation Interfaces on different boards are bundled together to form a link
aggregation group to improve the reliability of the link aggregation group.
Bundling Two boards can be bundled together and considered as one board.
Load balancing Member interfaces in a link aggregation group are determined as outbound interfaces
for packets based on their source and destination MAC addresses.
Definition
The Point-to-Point Protocol (PPP) is a link-layer protocol used to transmit point-to-point (P2P) data over
full-duplex synchronous and asynchronous links.
PPP negotiation involves the following items:
• Link Control Protocol (LCP): used to set up, monitor, and tear down data links.
• Network Control Protocol (NCP): used to negotiate options for a network layer protocol running atop
PPP and the format and type of the data to be transmitted over data links.
PPP uses the Password Authentication Protocol (PAP) and Challenge Handshake Authentication Protocol
(CHAP) to secure network communication.
If carriers have high bandwidth requirements, bundle multiple PPP links into an MP link to increase link
bandwidth and improve link reliability.
Purpose
PPP, which works at the second layer (data link layer) of the open systems interconnection (OSI)
model, is mainly used on full-duplex links to transmit data. PPP is widely used because it provides
user authentication, supports synchronous and asynchronous communication, and is easy to extend.
PPP was developed from the Serial Line Internet Protocol (SLIP) and overcomes the shortcomings of
SLIP, which supports only IP packet transmission and does not support negotiation. Compared with
other link-layer protocols, PPP has the following advantages:
• PPP supports both synchronous and asynchronous links, whereas SLIP supports only asynchronous links,
and other link-layer protocols, such as X.25, support only synchronous links.
• PPP uses a Network Control Protocol (NCP), such as the IP Control Protocol (IPCP) or Internetwork
Packet Exchange Control Protocol (IPXCP), to negotiate network-layer parameters.
• PPP supports Password Authentication Protocol (PAP) and Challenge Handshake Authentication
Protocol (CHAP) which improve network security.
• PPP does not have a retransmission mechanism, which reduces network costs and speeds up packet
transmission.
PPP Architecture
PPP works at the network access layer of the Transmission Control Protocol (TCP)/IP suite for point-to-point
(P2P) data transmission over full-duplex synchronous and asynchronous links.
• Link Control Protocol (LCP): used to set up, monitor, and tear down data links.
• Network Control Protocol (NCP): used to negotiate the formats and types of the data transmitted on
data links.
• (Optional) Password Authentication Protocol (PAP) and Challenge Handshake Authentication Protocol
(CHAP): used to improve network security.
• Flag field
The Flag field identifies the start and end of a physical frame and is always 0x7E.
• Address field
The Address field uniquely identifies a peer. PPP is used on P2P links, so two devices communicating
using PPP do not need to know the link-layer address of each other. This field must be filled with a
broadcast address of all 1s and is of no significance to PPP.
• Control field
The Control field value defaults to 0x03, indicating an unsequenced frame. By default, PPP does not use
sequence numbers or acknowledgement mechanisms to ensure transmission reliability.
The Address and Control fields together identify a PPP packet. That is, a PPP packet header is FF03 by
default.
• Protocol field
The Protocol field identifies the protocol of the data encapsulated in the Information field of a PPP
packet.
The structure of this field complies with the International Organization for Standardization (ISO) 3309
extension mechanism for address fields. All Protocol field values must be odd. The least significant bit of
the least significant byte must be "1". The least significant bit of the most significant byte must be "0".
If a device receives a data packet that does not comply with these rules, the device considers the packet
unrecognizable and sends a Protocol-Reject packet padded with the protocol code of the rejected
packet to the sender.
0x8031 Bridging NCP
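The odd/even rule for the 2-byte Protocol field can be checked directly; the helper below is an illustrative sketch:

```python
def protocol_field_valid(value):
    """ISO 3309 extension rule quoted above for a 2-byte Protocol
    field: the least significant bit of the low byte must be 1 (the
    value is odd) and the least significant bit of the high byte must
    be 0; anything else triggers a Protocol-Reject."""
    low, high = value & 0xFF, (value >> 8) & 0xFF
    return low & 0x01 == 1 and high & 0x01 == 0
```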
• Information field
The Information field contains the data. The maximum length of the Information field, including the
Padding content, is equivalent to the maximum receive unit (MRU) length. The MRU defaults to 1500
bytes and can be negotiated.
In the Information field, the Padding content is optional. If data is padded, the communicating devices
can communicate only when they can identify the padding information as well as the payload to be
transmitted.
• Code field
The 1-byte-long Code field identifies the LCP packet type.
If a receiver receives an LCP packet with an unknown Code field from a sender, the receiver sends a
Code-Reject packet to the sender.
0x01 Configure-Request
0x02 Configure-Ack
0x03 Configure-Nak
0x04 Configure-Reject
0x05 Terminate-Request
0x06 Terminate-Ack
0x07 Code-Reject
0x08 Protocol-Reject
0x09 Echo-Request
0x0A Echo-Reply
0x0B Discard-Request
0x0C Reserved
• Identifier field
The Identifier field is 1 byte long. It is used to match requests and replies. If a packet with an invalid
Identifier field is received, the packet is discarded.
The sequence number of a Configure-Request packet usually starts at 0x01 and increases by 1 each
time the Configure-Request packet is sent. After a receiver receives a Configure-Request packet, it must
send a reply packet with the same sequence number as the received Configure-Request packet.
• Length field
The Length field specifies the length of a negotiation packet, including the length of the Code, Identifier,
Length, and Data fields.
The Length field value cannot exceed the MRU of the link. Bytes outside the range of the Length field
are treated as padding and are ignored after they are received.
• Data field
The Data field contains the contents of a negotiation packet and includes the following fields:
0x01 Maximum-Receive-Unit
0x02 Async-Control-Character-Map
0x03 Authentication-Protocol
0x04 Quality-Protocol
0x05 Magic-Number
0x06 RESERVED
0x07 Protocol-Field-Compression
0x08 Address-and-Control-Field-Compression
1. Two devices enter the Establish phase if one of them sends a PPP connection request to the other.
2. In the Establish phase, the two devices perform an LCP negotiation to negotiate the working mode,
maximum receive unit (MRU), authentication mode, and magic number. The working mode can be
either Single-Link PPP (SP) or Multilink PPP (MP). If the LCP negotiation succeeds, LCP enters the
Opened state, which indicates that a lower-layer link has been established.
3. If authentication is configured, the two devices enter the Authentication phase and perform
Password Authentication Protocol (PAP) or Challenge Handshake Authentication Protocol (CHAP)
authentication.
4. In the Authentication phase, if PAP or CHAP authentication fails, the two devices enter the
Termination phase. The link is torn down, and LCP enters the Down state. If PAP or CHAP
authentication succeeds, the two devices enter the Network phase, and LCP remains in the Opened
state.
5. In the Network phase, the two devices perform an NCP negotiation to select a network-layer protocol
and to negotiate network-layer parameters. After the two devices succeed in negotiating a network-
layer protocol, packets can be sent over this PPP link using the network-layer protocol.
Various control protocols, such as IP Control Protocol (IPCP) and Multiprotocol Label Switching
Control Protocol (MPLSCP), can be used in NCP negotiation. IPCP mainly negotiates the IP addresses
of the two devices.
6. If the PPP connection is interrupted during PPP operation, for example, if the physical link is
disconnected, the authentication fails, the negotiation timer expires, or the connection is torn down by
the network administrator, the two devices enter the Termination phase.
7. In the Termination phase, the two devices release all resources and enter the Dead phase. The two
devices remain in the Dead phase until a new PPP connection is established between them.
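Steps 1 to 7 can be summarized as a small state machine over the PPP phases; the event names below are illustrative:

```python
# (current phase, event) -> next phase, following steps 1-7 above
PPP_PHASES = {
    ("Dead", "link_up"): "Establish",
    ("Establish", "lcp_opened_with_auth"): "Authenticate",
    ("Establish", "lcp_opened_without_auth"): "Network",
    ("Authenticate", "auth_success"): "Network",
    ("Authenticate", "auth_failure"): "Terminate",
    ("Network", "link_down"): "Terminate",
    ("Terminate", "resources_released"): "Dead",
}

def next_phase(phase, event):
    """Return the next PPP phase, or stay put on an unknown event."""
    return PPP_PHASES.get((phase, event), phase)
```

A full connection lifecycle walks the loop Dead, Establish, Authenticate, Network, Terminate, and back to Dead.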
Dead Phase
The physical layer is unavailable during the Dead phase. A PPP link begins and ends with this phase.
When two devices detect that the physical link between them has been activated, for example, when carrier
signals are detected on the physical link, the two devices move from the Dead phase to the Establish phase.
After the PPP link is terminated, the two devices enter the Dead phase.
Establish Phase
In the Establish phase, the two devices perform an LCP negotiation to negotiate the working mode (SP or
MP), MRU, authentication mode, and magic number. After the LCP negotiation is complete, the two devices
enter the next phase.
In the Establish phase, the LCP status changes as follows:
• If the link is unavailable (in the Dead phase), LCP is in the Initial or Starting state. When the physical
layer detects that the link is available, the physical layer sends an Up event to the link layer. Upon
receipt, the link layer changes the LCP status to Request-Sent. Then, the devices at both ends send
Configure-Request packets to each other to configure a data link.
• If the local device first receives a Configure-Ack packet from the peer, the LCP status changes from
Request-Sent to Ack-Received. After the local device sends a Configure-Ack packet to the peer, the LCP
status changes from Ack-Received to Open.
• If the local device first sends a Configure-Ack packet to the peer, the LCP status changes from
Request-Sent to Ack-Sent. After the local device receives a Configure-Ack packet from the peer, the
LCP status changes from Ack-Sent to Open.
• After LCP enters the Open state, the next phase starts.
The next phase is the Authentication or Network phase, depending on whether authentication is required.
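The Establish-phase status changes above can likewise be sketched as a transition table (a small subset of the full LCP automaton; the event names are illustrative):

```python
# (current LCP state, event) -> next LCP state
LCP_STATES = {
    ("Initial", "link_available"): "Request-Sent",  # both ends send Configure-Request
    ("Request-Sent", "recv_configure_ack"): "Ack-Received",
    ("Request-Sent", "send_configure_ack"): "Ack-Sent",
    ("Ack-Received", "send_configure_ack"): "Open",
    ("Ack-Sent", "recv_configure_ack"): "Open",
}

def lcp_step(state, event):
    """Advance the LCP state; unknown events leave the state unchanged."""
    return LCP_STATES.get((state, event), state)
```

Either ordering of sending and receiving the Configure-Ack leads to the Open state.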
Authentication Phase
The Authentication phase is optional. By default, PPP does not perform authentication during PPP link
establishment. If authentication is required, the authentication protocol must be specified in the Establish
phase.
PPP provides two password authentication modes: PAP authentication and CHAP authentication.
Two authentication methods are available: unidirectional authentication and bidirectional authentication. In
unidirectional authentication, the device on one end functions as the authenticating device, and the device on the other
end functions as the authenticated device. In bidirectional authentication, each device functions as both the
authenticating and authenticated device. In practice, only unidirectional authentication is used.
1. The authenticated device sends the local user name and password to the authenticating device.
2. The authenticating device checks whether the received user name is in the local user list.
• If the received user name is in the local user list, the authenticating device checks whether the
received password is correct.
• If the received user name is not in the local user list, the authentication fails.
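The two checks above reduce to a lookup followed by a comparison; the user entries in this sketch are hypothetical:

```python
# Hypothetical local user list on the authenticating device.
LOCAL_USERS = {"branch1": "pap-secret"}

def pap_authenticate(username, password):
    """Step 2 above: reject user names that are not in the local user
    list, then verify the password that the authenticated device sent
    in plain text."""
    if username not in LOCAL_USERS:
        return False  # user name not in the local user list
    return LOCAL_USERS[username] == password
```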
A PAP packet is encapsulated into the Information field of a PPP packet with the Protocol field value 0xC023.
Figure 3 shows the PAP packet format.
Unidirectional CHAP authentication applies to the following scenarios. (The first scenario is
recommended, so that the authenticated device can check the user name of the authenticating device.)
• The authenticating device is configured with a user name. In this scenario:
1. The authenticating device initiates an authentication request by sending a Challenge packet that
carries a random number and the local user name to the authenticated device.
2. After receiving the Challenge packet through an interface, the authenticated device checks
whether a CHAP password is configured on the interface.
• If the password is configured, the authenticated device uses the hash algorithm to calculate
a hash value based on the packet ID, the CHAP password, and the random number in the
packet, and then sends a Response packet carrying the hash value and the local user name
to the authenticating device.
• If the password is not configured, the authenticated device searches the local user table for
the password matching the user name of the authenticating device in the received packet,
uses the hash algorithm to calculate a hash value based on the packet ID, the password
matching the user name, and the random number in the packet, and then sends a Response
packet carrying the hash value and the local user name to the authenticating device.
3. The authenticating device uses the hash algorithm to calculate a hash value based on the packet
ID, the locally saved password of the authenticated device, and the random number in the
Challenge packet, and then compares the hash value with that in the Response packet. If the two
hash values are the same, the authentication succeeds. Otherwise, the authentication fails.
• The authenticating device is not configured with a user name. In this scenario:
1. The authenticating device initiates an authentication request by sending a Challenge packet that
carries a random number to the authenticated device.
2. After receiving the Challenge packet, the authenticated device uses the hash algorithm to
calculate a hash value based on the packet ID, the CHAP password configured using the ppp
chap password command, and the random number in the packet, and then sends a Response
packet carrying the hash value and the local user name to the authenticating device.
3. The authenticating device uses the hash algorithm to calculate a hash value based on the packet
ID, the locally saved password of the authenticated device, and the random number in the
Challenge packet, and then compares the hash value with that in the Response packet. If the two
hash values are the same, the authentication succeeds. Otherwise, the authentication fails.
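The hash computation in the steps above can be sketched as follows. This assumes MD5 as the hash algorithm, the common choice for CHAP (RFC 1994); the password value is illustrative.

```python
import hashlib
import os

def chap_response(packet_id: int, secret: bytes, challenge: bytes) -> bytes:
    """Hash over the packet ID, the CHAP password, and the challenge random number."""
    return hashlib.md5(bytes([packet_id]) + secret + challenge).digest()

# Authenticating device: send a Challenge carrying a random number.
packet_id = 0x01
challenge = os.urandom(16)

# Authenticated device: compute the hash with its configured CHAP password.
secret = b"chap-password"            # hypothetical shared password
response = chap_response(packet_id, secret, challenge)

# Authenticating device: recompute with the locally saved password and compare.
authenticated = chap_response(packet_id, secret, challenge) == response
print(authenticated)                 # True: the two hash values match
```

Because only the hash travels on the link, the password itself is never transmitted.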
A CHAP packet is encapsulated into the Information field of a PPP packet with the Protocol field value
0xC223. Figure 5 shows the CHAP packet format.
• In PAP authentication, passwords are sent over links in simple text. After a PPP link is established, the
authenticated device repeatedly sends the user name and password until authentication finishes. PAP
authentication is used on networks that do not require high security.
• CHAP is a three-way handshake authentication protocol. In CHAP authentication, the authenticated device sends
only a user name to the authenticating device. Compared with PAP, CHAP features higher security because
passwords are not transmitted. CHAP authentication is used on networks that require high security.
Network Phase
In the Network phase, NCP negotiation is performed to select a network-layer protocol and to negotiate
network-layer parameters. An NCP can enter the Open or Closed state at any time. After an NCP enters the
Open state, network-layer data can be transmitted over the PPP link.
Termination Phase
PPP can terminate a link at any time. A link can be terminated manually by an administrator or be
terminated due to carrier loss, an authentication failure, or other causes.
Background
When two devices are connected through interfaces over an intermediate transmission device, their connection may be adjusted if it is found to be incorrect during traffic transmission. However, the interfaces cannot detect the adjustment because they do not go Down, and therefore LCP renegotiation is not triggered. PPP allows the interfaces to learn 32-bit host routes from each other only during LCP negotiation. As a result, the interfaces continue to use the host routes learned over the original connection even after the connection changes, and traffic is transmitted incorrectly.
To address this issue, deploy PPP magic number check on these devices. Even if the interfaces do not detect
the connection change, PPP magic number check can trigger LCP renegotiation. The interfaces then re-learn
the host routes from each other.
Principles
Magic numbers are generated by communication devices independently. To prevent devices from generating
identical magic numbers, each device randomly generates a unique magic number based on its serial number, hardware address, or clock.
Devices negotiate their magic numbers during LCP negotiation and send Echo packets carrying their
negotiated magic numbers to their peers after the LCP negotiation.
In Figure 1, Device A and Device B are connected over a transmission device, and Device C and Device D are
also connected over this transmission device. PPP connections have been established, and LCP negotiation is
complete between Device A and Device B and between Device C and Device D. If the connections are found
incorrect, an adjustment is required to establish a PPP connection between Device A and Device C. In this
situation, PPP magic number check can be used to trigger the LCP renegotiation as follows:
1. Device A sends to Device C an Echo-Request packet carrying Device A's negotiated magic number.
2. When receiving the Echo-Request packet, Device C compares the magic number carried in the packet
with its peer's negotiated magic number (Device D's). The magic numbers are different, and the error
counter on Device C increases by one.
3. Device C replies to Device A with an Echo-Reply packet carrying Device C's negotiated magic number.
4. When receiving the Echo-Reply packet, Device A compares the magic number carried in the packet
with the local magic number. The magic numbers are different. Device A then compares the magic
number in the packet with its peer's negotiated magic number (Device B's). The magic numbers are
also different, and the error counter on Device A increases by one.
5. The preceding steps are repeated. If the error counter reaches a specified value, LCP goes Down, and
LCP renegotiation is triggered.
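The error-counting logic in steps 1 to 5 can be sketched as follows; the class name and the threshold of five errors are assumptions for illustration, not the device's actual implementation.

```python
class MagicNumberCheck:
    """Sketch of PPP magic number check on one endpoint."""
    ERROR_THRESHOLD = 5   # assumed configurable value

    def __init__(self, local_magic: int, peer_magic: int):
        self.local_magic = local_magic   # negotiated locally during LCP negotiation
        self.peer_magic = peer_magic     # peer's magic number learned during LCP negotiation
        self.errors = 0

    def on_echo(self, magic_in_packet: int) -> bool:
        """Process a received Echo packet; return True when LCP renegotiation triggers."""
        if magic_in_packet in (self.local_magic, self.peer_magic):
            return False                 # expected magic number: no error
        self.errors += 1                 # neither local nor expected peer magic
        if self.errors >= self.ERROR_THRESHOLD:
            self.errors = 0
            return True                  # LCP goes Down and renegotiation starts
        return False

# Device A negotiated with Device B (magic 0x2222), but is now connected to Device C.
dev_a = MagicNumberCheck(local_magic=0x1111, peer_magic=0x2222)
renegotiate = [dev_a.on_echo(0x3333) for _ in range(5)]  # Device C's magic differs
print(renegotiate[-1])   # True: the error counter reached the threshold
```

After renegotiation, both ends re-learn each other's magic numbers and host routes.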
Figure 1 shows the connection status before LCP renegotiation. Device A and Device C still use the local and peer's
magic numbers that are negotiated previously. These magic numbers are not updated until the LCP renegotiation.
Background
Due to unstable signals on physical links or incorrect configurations at the data link layer on live networks,
PPP-capable interfaces may frequently experience PPP negotiation, and the PPP protocol status of these
interfaces may alternate between Up and Down, causing routing protocol or MPLS flapping. As a result,
devices and networks are severely affected. In the worst case, devices are paralyzed and the network becomes unavailable.
PPP flapping suppression restricts the frequency at which the PPP protocol status of an interface alternates
between Up and Down. This restriction minimizes the impact of flapping on devices and networks.
Implementation Principles
PPP flapping suppression involves the following concepts:
• Penalty value: This value is calculated based on the PPP protocol status of the interface using the suppression algorithm. The core of the suppression algorithm is that the penalty value increases each time the interface status changes and decreases exponentially over time.
• Suppression threshold: The PPP protocol status of an interface is suppressed and remains Down when
the penalty value is greater than the suppression threshold.
• Reuse threshold: The PPP protocol status of an interface is no longer suppressed when the penalty value
is smaller than the reuse threshold.
• Ceiling threshold: The penalty value no longer increases after it reaches the ceiling threshold, preventing the PPP protocol status of an interface from being suppressed for a long time. The ceiling value is calculated using the following formula: ceiling = reuse x 2^(MaxSuppressTime/HalfLifeTime).
• Half-life-period: period that the penalty value takes to decrease by half. A half-life-period begins to elapse when the PPP protocol status of an interface goes Down for the first time. Each time a half-life-period elapses, the penalty value decreases by half, and another half-life-period begins.
• Max-suppress-time: maximum period during which the PPP protocol status of an interface is
suppressed. After a max-suppress-time elapses, the PPP protocol status of the interface is renegotiated
and reported.
At t1, the PPP protocol status of an interface goes Down, and its penalty value increases by 1000. Then, the
interface goes Up, and its penalty value decreases exponentially based on the half-life rule. At t2, the PPP
protocol status of the interface goes Down again, and its penalty value increases by 1000, reaching 1600,
which has exceeded the suppression threshold of 1500. The PPP protocol status of the interface is therefore
suppressed. As the interface keeps flapping, its penalty value keeps increasing until it reaches the ceiling
threshold of 10000 at tA. As time goes by, the penalty value decreases and reaches the reuse value of 750 at
tB. The PPP protocol status of the interface is then no longer suppressed.
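The walkthrough above can be simulated with a minimal sketch of the suppression algorithm, using the values from the example (a penalty of 1000 per flap, suppression threshold 1500, reuse threshold 750, ceiling 10000); the 30-second half-life-period is an assumption.

```python
class PppFlapDamping:
    """Sketch of PPP flapping suppression; threshold values follow the example above."""
    def __init__(self, suppress=1500, reuse=750, ceiling=10000, half_life=30.0):
        self.suppress, self.reuse, self.ceiling = suppress, reuse, ceiling
        self.half_life = half_life      # assumed half-life-period in seconds
        self.penalty = 0.0
        self.suppressed = False
        self.last = 0.0

    def _decay(self, now):
        # The penalty value halves once per half-life-period.
        self.penalty *= 0.5 ** ((now - self.last) / self.half_life)
        self.last = now
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False     # below the reuse threshold: no longer suppressed

    def on_down(self, now):
        """Interface goes Down: decay the penalty, add 1000, cap at the ceiling."""
        self._decay(now)
        self.penalty = min(self.penalty + 1000, self.ceiling)
        if self.penalty > self.suppress:
            self.suppressed = True
        return self.suppressed

d = PppFlapDamping()
print(d.on_down(0))     # False: penalty 1000 is below the suppression threshold
print(d.on_down(10))    # True: decayed penalty plus 1000 exceeds 1500
```

As in the figure's tB, once enough time passes without flaps, the decayed penalty falls below the reuse threshold and suppression ends.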
8.6.2.5 MP Fundamentals
How MP Works
The Multilink protocol bundles multiple PPP links into an MP link to increase link bandwidth and reliability.
MP fragments packets exceeding the maximum transmission unit (MTU) and sends these fragments to the
PPP peer over the PPP links in the MP-group. The PPP peer then reassembles these fragments into packets
and forwards these packets to the network layer. For packets that do not exceed the MTU, MP directly sends
these packets over the PPP links in the MP-group to the PPP peer, which in turn forwards these packets to
the network layer.
MP Implementation
An MP-group interface is dedicated to MP applications. MP is implemented by adding multiple PPP
interfaces to an MP-group interface.
MP negotiation involves:
• LCP negotiation: Devices on both ends negotiate LCP parameters and check whether they both work in
MP mode. If they work in different working modes, LCP negotiation fails.
• Network Control Protocol (NCP) negotiation: Devices on both ends perform NCP negotiation by using
only NCP parameters (such as IP addresses) of the MP-group interfaces but not using the NCP
parameters of physical interfaces.
Benefits
MP provides the following benefits:
• Increased bandwidth
• Load balancing
• Link backup
8.6.3.1 MP Applications
A single PPP link can provide only limited bandwidth. To increase link bandwidth and reliability, bundle
multiple PPP links into an MP link.
As shown in Figure 1, there are two PPP links between Device A and Device B. The two PPP links are bundled
into an MP link by creating an MP-group interface. The MP link provides higher bandwidth than a single PPP
link. If one PPP link in the MP group fails, communication over the other PPP link is not affected.
Terms
None
Definition
The pseudo random binary sequence (PRBS) is used to generate random data.
The circuit emulation service (CES) technology carries traditional TDM data over a packet switched network
(PSN) and provides end-to-end PDH and SDH data transmission in the PWE3 architecture.
PRBS tests use the PRBS technique to generate a PRBS stream, encapsulate the PRBS stream into CES
packets, send and receive the CES packets over CES service channels, and calculate the proportion of error
bits to the total number of bits to obtain the bit error rate (BER) of CES service channels for measuring
service connectivity.
Purpose
When routers are connected over a public network, the transmission quality affects service deployment and
cutover. To address this problem, use the NMS to deliver a service connectivity test command after CES
services are deployed on PWs. After the test is conducted, the device returns the test result to the NMS. This shortens the service deployment period.
Benefits
• Monitors link quality during network cutover and helps identify potential risks, improving the cutover
success ratio and minimizing user complaints about operator network issues.
• Helps speed up service deployment and cutover on a network, shortening the service launch period.
PRBS Stream
PRBS tests use the PRBS technique to generate a PRBS stream, encapsulate the PRBS stream into CES
packets, send and receive the CES packets over CES service channels, and calculate the proportion of error
bits to the total number of bits to obtain the BER of CES service channels for measuring service connectivity.
A PRBS stream is a pseudo random binary sequence of bits.
1. PRBS stream generation: A PRBS stream is generated by a specific feedback shift register using a polynomial. The polynomial varies according to the length of the sequence.
2. PRBS stream measurement: Figure 1 shows how PRBS stream measurement is implemented. After the
PRBS module of PE1 generates a PRBS stream, the PRBS stream is encapsulated to CES packets, which
are then sent by the network-side high-speed TX interface to PE2 over a PW. Upon receipt, PE2's line-
side E1 interface performs a local loopback and sends the CES packets through the network-side
interface to PE1's RX interface. After PE1 receives the packets, it compares the sent and received data
and counts the error bits.
3. Bit error insertion during tests: During a test, bit errors can be inserted into the PRBS stream. PE1 generates a PRBS stream and inserts bit errors. If the PRBS receive unit detects the inserted bit errors, PE1 can verify the test validity.
4. Test termination by PRBS streams: If a PRBS test lasts for a long time, you can stop sending and
receiving the PRBS stream to terminate the test.
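Steps 1 to 3 above can be sketched with a linear feedback shift register and a bit-by-bit comparison. PRBS-15 with polynomial x^15 + x^14 + 1 is one standard sequence length, used here purely for illustration; the device may use other polynomials.

```python
def prbs15(seed: int = 0x7FFF, nbits: int = 64):
    """Generate a PRBS-15 bit stream from a 15-bit feedback shift register."""
    state = seed & 0x7FFF
    for _ in range(nbits):
        bit = ((state >> 14) ^ (state >> 13)) & 1   # feedback from taps 15 and 14
        state = ((state << 1) | bit) & 0x7FFF
        yield bit

def count_bit_errors(sent, received):
    """Step 2 above: compare the sent and looped-back streams bit by bit."""
    return sum(a != b for a, b in zip(sent, received))

tx = list(prbs15(nbits=64))
rx = list(tx)
rx[10] ^= 1                      # step 3 above: insert one bit error
print(count_bit_errors(tx, rx))  # 1
```

Because the receiver can regenerate the same pseudo random sequence from the same polynomial, no copy of the transmitted data needs to travel out of band.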
PRBS tests are offline detections and interrupt services. Therefore, this function applies to site deployment and fault
detection after a service interruption.
BER Calculation
The BER is calculated using the following equation:
BER = Number of error bits/(Interface rate x Test period)
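A worked example of the equation above, assuming the 2.048 Mbit/s E1 interface rate; the error count and test period are illustrative values.

```python
# BER = Number of error bits / (Interface rate x Test period)
error_bits = 41
interface_rate = 2_048_000   # bit/s on an E1 link
test_period = 10             # seconds

ber = error_bits / (interface_rate * test_period)
print(f"BER = {ber:.2e}")    # BER = 2.00e-06
```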
On an IP RAN shown in Figure 1, NE40E 1 is directly connected to a BTS over an E1 link, and NE40E 2 is
directly connected to a BSC over an E1 link. Link deterioration or incorrect connections may cause a cutover
failure.
Definition
Time division multiplexing (TDM) divides a channel by time: voice signals are sampled, and each sampled signal occupies a fixed interval, called a timeslot, in time sequence. In this way, multiple signals can be combined through TDM into one high-rate composite digital signal (group signal) with a defined structure, while each signal is transmitted independently.
TDM Circuits over Packet Switching Networks (TDMoPSN) is a type of PWE3 service emulation based on TDM circuits. TDMoPSN emulates TDM services over a PSN, such as an MPLS or Ethernet network, thereby transparently transmitting TDM services over the PSN.
Purpose
TDMoPSN is a mature solution for accessing and bearing TDM services on a PSN. It is mainly applied to IP RANs that carry wireless services and to carrying fixed-network services between MSAN devices.
Benefits
The TDMoPSN feature offers the following benefits to carriers:
• Binds only the useful timeslots into packets, improving resource utilization.
TDM
Time division multiplexing (TDM) divides a channel by time: voice signals are sampled, and each sampled signal occupies a fixed interval, called a timeslot, in time sequence. In this way, multiple signals can be combined through TDM into one high-rate composite digital signal (group signal) with a defined structure, while each signal is transmitted independently.
• In the SDH system, STM-1, STM-4, and STM-16 are usually used.
Clock Synchronization
TDM services require clock synchronization. One of the two communicating parties takes the clock of the other as the source; that is, the device functioning as the data circuit-terminating equipment (DCE) outputs clock signals to the device functioning as the data terminal equipment (DTE). If the clock mode is incorrect or the clock is faulty, bit errors are generated or synchronization fails.
The synchronization clock signals for TDM services are extracted from the physical layer. The 2.048 MHz
synchronization clock signals for E1 are extracted from the line code. The transmission adopts HDB3 or AMI
coding that carries timing information. Therefore, devices can extract clock signals from these two types of
codes.
TDMoPSN
TDM Circuits over Packet Switching Networks (TDMoPSN) is a type of PWE3 service emulation based on TDM circuits. TDMoPSN emulates TDM services over a PSN, such as an MPLS or Ethernet network, thereby transparently transmitting TDM services over the PSN. TDMoPSN is mainly implemented by means of two protocols: Structure-Agnostic TDM over Packet (SAToP) and Structure-Aware TDM Circuit Emulation Service over Packet Switched Network (CESoPSN).
• CESoPSN
The Structure-Aware TDM Circuit Emulation Service over Packet Switched Network (CESoPSN) function emulates low-rate PDH circuit services on E1/T1/E3 interfaces. Different from SAToP, CESoPSN provides structured emulation and transmission of TDM services. That is, with a framed structure, it can identify and transmit signaling in TDM frames.
Features of the structured transmission mode are as follows:
■ When services are carried on the PSN, the TDM structure needs to be protected explicitly.
■ Structure-aware transmission can be applied to a PSN with poor network performance, making transmission more reliable.
Figure 2 CESoPSN
■ MPLS Label
The specified PSN header includes data required for forwarding packets from the PSN border
gateway to the TDM border gateway.
PWs are distinguished by PW tags that are carried on the specified layer of the PSN. Since TDM is
bidirectional, two PWs in reverse directions should be correlated.
■ PW Control Word
The structure of the CESoPSN control word is shown in Figure 3.
■ L bit (1 bit), R bit (1 bit), and M bits (2 bits): used to transparently transmit alarms and to indicate that an upstream PE has detected severe alarms on the CE or AC side.
■ Length (6 bits): length of a TDMoPSN packet (control word and payload), used when padding is added to meet the minimum transmission unit requirement on the PSN. When the length of the TDMoPSN packet exceeds 64 bytes, this field is set to all 0s.
■ Sequence number (16 bits): used for PW sequencing and for detecting discarded and disordered packets. The 16-bit sequence number operates in an unsigned circular space, and its initial value is random.
■ Optional RTP
An RTP header can carry timestamp information to a remote device to support packet-based clock recovery, such as DCR. Packet-based clock recovery is not discussed in this document. In addition, packets transmitted on some devices must include the RTP header. To save bandwidth, omitting the RTP header is recommended in other situations.
The RTP header is not configured by default. You can add it to packets. Configurations of PEs on
both sides must be the same; otherwise, two PEs cannot communicate with each other.
The padding method for the RTP header on the NE40E is to keep the sequence number (16 bits)
consistent with the PW control word and pad other bits with 0s.
■ TDM Payload
The length of the TDM payload, in bytes, is the number of encapsulated frames multiplied by the number of timeslots bound to the PW. When the whole PW packet is shorter than 64 bytes, fixed padding is added to meet Ethernet transmission requirements.
• SAToP
The Structure-Agnostic TDM over Packet (SAToP) function emulates low-rate PDH circuit services. SAToP carries E1/T1/E3 services in unframed (unstructured) mode. It divides and encapsulates serial data streams of TDM services and then transmits the encapsulated packets over a PW.
SAToP is the simplest method for transparently transmitting low-rate PDH services among TDM circuit emulation schemes.
Features of non-structured transmission mode are as follows:
■ This mode does not need to protect the integrity of the TDM structure, nor does it need to interpret or process individual channels.
Figure 5 SAToP
■ MPLS Label
The MPLS label for SAToP is the same as the MPLS label for CESoPSN.
■ PW Control Word
The structure of the SAToP control word is shown in Figure 6.
■ L bit (1 bit) and R bit (1 bit): used to transparently transmit alarms and to indicate that an upstream PE has detected severe alarms on the CE or AC side.
■ Length (6 bits): length of a TDMoPSN packet (control word and payload), used when padding is added to meet the minimum transmission unit requirement on the PSN. When the length of the TDMoPSN packet exceeds 64 bytes, this field is set to all 0s.
■ Sequence number (16 bits): used for PW sequencing and for detecting discarded and disordered packets. The 16-bit sequence number operates in an unsigned circular space, and its initial value is random.
■ Optional RTP
The optional RTP for SAToP is the same as the optional RTP for CESoPSN.
■ TDM Payload
The length of the TDM payload, in bytes, is the number of encapsulated frames multiplied by 32. When the whole PW packet is shorter than 64 bytes, fixed padding is added to meet Ethernet transmission requirements.
IP RAN
IP RAN is a technology used by mobile carriers to carry wireless services over an IP network. IP RAN scenarios are complex because different base stations (BSs), interface technologies, and access and convergence scenarios are involved.
• 2G/2.5G/3G/LTE, traditional BSs/IP BSs, GSM/CDMA, TDM/ATM/IP (interface technologies) are involved.
• Varying with the BS type, distribution model, network environment, and evolution process, the
convergence modes include microwave, MSTP, DSL, PON, and Fiber. You can converge services on BSs
directly to the MAN UPE or through convergence gateways (with functions of BS convergence,
compression optimization, packet gateway, and offload).
• Reliability, security, QoS, and operation and maintenance (O&M) are considered in IP RAN scenarios. In some IP RAN scenarios, transmission efficiency is also a concern.
■ The packet encapsulation time equals 0.125 ms multiplied by the number of frames encapsulated
into a packet.
■ The network delay refers to the transmission delay between two PEs.
• Clock synchronization
TDMoPSN service packets are transmitted at a constant rate. The local and remote devices must have
synchronized clocks before exchanging TDMoPSN service packets. Traditional TDM services can
synchronize clocks through a physical link but TDMoPSN services are carried on a PSN. TDM services
lose synchronization clock signals when reaching a downstream PE.
A downstream PE uses either of the following methods to synchronize clocks:
• QoS processing
TDM services require low delay and jitter and fixed bandwidth. A high QoS priority must be specified for
TDM services.
Figure 1 shows the TDM implementation procedure from CE1, PE1, and PE2, to CE2:
■ In CESoPSN mode, PE1 encapsulates bytes 1 to 31 (payload) of the E1 frame received from CE1 in
a PW packet.
■ In SAToP mode, PE1 encapsulates 256 bits (32 timeslots x 8 bits = 256 bits) from the bit stream as the payload of a PW packet.
The frequency of E1 frames is fixed, and therefore PE1 receives data (31 bytes or 256 bits) of a fixed
frequency from CE1 and then encapsulates data in the PW packet continuously. When the number of
encapsulated frames reaches the pre-configured number, the whole PW packet is sent to the PSN.
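The payload sizing and packetization delay implied by the two modes above can be computed as sketched below; the 8-frame packet is a hypothetical pre-configured value, and the 0.125 ms frame period follows from the E1 frame rate.

```python
E1_FRAME_PERIOD_MS = 0.125   # one E1 frame arrives every 125 microseconds

def cesopsn_payload_bytes(frames: int, timeslots: int) -> int:
    """Structured (CESoPSN) mode: only the bound timeslots are carried per frame."""
    return frames * timeslots

def satop_payload_bytes(frames: int) -> int:
    """Unframed (SAToP) mode: all 32 timeslots (32 x 8 = 256 bits) per frame."""
    return frames * 32

frames = 8   # hypothetical pre-configured number of frames per PW packet
print(cesopsn_payload_bytes(frames, 31))   # 248 bytes of payload (timeslots 1-31)
print(satop_payload_bytes(frames))         # 256 bytes of payload
print(frames * E1_FRAME_PERIOD_MS)         # 1.0 ms packetization delay
```

Choosing more frames per packet improves header efficiency but increases the encapsulation delay linearly.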
In the encapsulation structure of a PW packet, the control word is mandatory; pay attention to the L bit, the R bit, and the sequence number field. The L and R bits carry alarm information. They are used when TDM transparent transmission carries E1 frame data received by PE1 over a PW to an E1 interface of PE2 and PE1 needs to transmit alarm information (such as AIS and RDI) from CE1 to the remote device. PE1 reports received alarm information (AIS/RDI) to the control plane, which modifies the L and R bits in the control word of the PW packet and then sends the packet with the E1 frame data to PE2.
The sequence number is used to prevent PW packets from being discarded or disordered during
forwarding on the PSN. Every time a PW packet is sent by PE1, the sequence number increases by 1.
Before PWE3 is applied, CEs are directly connected by cables or fibers. In this way, alarms generated on CE1
can be directly detected by CE2. After PWE3 is applied, CE2 cannot directly detect alarms generated on CE1
because the PWE3 tunnel between CEs does not have the circuit features of TDM services. To implement
better simulation, alarm transparent transmission is used.
As shown in Figure 2, it is assumed that data is transmitted from CE2 to CE1. Alarm transparent transmission
is the process of transmitting E1/T1 alarms on PE1 to downstream PE2 through the PW control word,
restoring E1/T1 alarms, and then transmitting them to CE2, and vice versa.
The types of alarms that can be transparently transmitted are AIS and RDI. Involved PW control words are
the L bit, R bit, and M bit.
• CPOS
Alarms: AUAIS, LOS, LOF, LOM, LOP, OOF, LAIS, LRDI, LREI, PAIS, PPLM, PRDI, PREI, PUNEQ and
RROOL.
Statistics: B1, B2, B3, SD and SF.
8.8.2.3 CEP
Basic Concepts
Circuit Emulation over Packet (CEP) is a protocol standard of TDM PWE3. Unlike Structure-Agnostic Time
Division Multiplexing over Packet (SAToP) and Structure-Aware TDM Circuit Emulation Service over Packet
Switched Network (CESoPSN), which encapsulate payload based on low-speed PDH services, CEP
encapsulates payload based on VCs. CEP emulates Synchronous Optical Network (SONET)/Synchronous
Digital Hierarchy (SDH) circuits and services over MPLS. The emulation signals include:
CEP treats these signals as serial data code flows and fragments and encapsulates them so that they can be
transmitted over PW tunnels.
• MPLS Label
The specified PSN header includes data required to forward packets from a PSN border gateway to a
TDM border gateway.
PWs are distinguished by MPLS labels that are carried on a specified PSN layer. To transmit bidirectional
TDM services, two PWs that transmit in opposite directions are associated.
• CEP Header
Figure 2 shows the CEP header format.
■ L bit: CEP-AIS. This bit must be set to 1 to signal to the downstream PE that a failure condition has
been detected on the attachment circuit.
■ R bit: CEP-RDI. This bit must be set to 1 to signal to the upstream PE that a loss of packet
synchronization has occurred. This bit must be set to 0 once packet synchronization is acquired.
■ N and P bits: These bits are used to explicitly relay negative and positive pointer adjustment events across the PSN. The use of the N and P bits is optional. If not used, they must be set to 0.
■ Length (6 bits): length of a TDMoPSN packet (including the length of a CEP header, plus the length
of the RTP header if used, and plus the length of the payload). If the length of the TDMoPSN
packet is shorter than the minimum transmission unit (64 bytes) on the PSN, padding bits are used.
If the length of the TDMoPSN packet is longer than 64 bytes, the entire field is padded with 0s.
■ Sequence Number (16 bits): used for PW sequencing and enabling the detection of discarded and disordered packets. The 16-bit sequence number operates in an unsigned circular space, and its initial value is random.
• Optional RTP
An RTP header can carry timestamp information to a remote device to support packet recovery clock,
such as DCR.
By default, the RTP header is not configured. You can add it to packets. RTP configurations of PEs on
both ends of a PWE3 must be the same; otherwise, the two PEs cannot communicate with each other.
The sequence number (16 bits) in the RTP header is padded in the same way as that in the CEP header.
The other bits in the RTP header are 0s.
• TDM Payload
The TDM packet payload can only be 783 bytes.
CEP Implementation
Each STM-1 frame consists of 9 rows and 270 columns. The VC-4 occupies 9 rows and 261 columns, a total of 2349 bytes. As a CEP payload is 783 bytes long, one VC-4 is broken into three CEP packets.
Figure 4 shows CEP packet transmission from CE1, PE1, and PE2 to CE2.
between the CE1 clock and PE2 clock in TDM transparent transmission.
Applicable Scenario 1
Figure 1 Applicable Scenario 1
Scenario description
After TDM services from 2G base stations are converged on the E1 interface of PE1, TDM packets are encapsulated into PSN packets that can be transmitted on PSNs. After reaching downstream PE2, the PSN packets are decapsulated into the original TDM packets, which are then sent to the 2G convergence device.
Advantages of the solution
In this solution, multiple types of services are converged at a PE on the PSN. The solution effectively saves original network resources, uses fewer PDH VLLs, and facilitates site deployment and the maintenance and administration of multiple services.
Applicable Scenario 2
Scenario description
TDM services of different office areas, residential areas, schools, enterprises, and institutions can be accessed
by a local PE through E1/T1 links. Heavy TDM services can be carried through CPOS interfaces.
Advantages of the solution
The solution saves VLL rental costs because TDM services for enterprises are accessed by a local PE. In addition, the solution allows flexible selection of access types and proper network planning.
Applicable Scenario 3
Figure 3 Applicable Scenario 3
Scenario description
In this solution, a network can carry 2G, 3G, and fixed-network services concurrently. This solution physically integrates the transmission of different types of services but keeps their management independent. Therefore, it provides different service bearer solutions for different operators on the same network.
Advantages of the solution
In this solution, different services can be carried on the same network, which improves resource utilization and reduces maintenance costs.
Applicable Scenario 4
Figure 4 Applicable Scenario 4
Scenario description
Services of different timeslots on different sites can be accessed by the PSN through local E1. The PE on the
convergence side binds different timeslots of different E1s to one E1 and then encapsulates bound timeslots
and other CE1/E1 services as SDH data, and finally sends encapsulated packets to the base station controller
(BSC) through the CPOS interface.
Advantages of the solution
The solution channelizes E1 services, transparently transmits E1 services, multiplexes timeslots of multiple
E1s to one E1, and manages services of multiple E1s/CE1s through the same CPOS interface.
PW: pseudo wire
Definition
The colored interface feature allows a router to directly output DWDM colored optical signals to the
multiplexer of a WDM device. The data link and transport layers are not isolated.
Purpose
With the rapid growth of Internet industry and traffic, revenue growth of carriers' data services lags far
behind. To address the pressure caused by traffic growth, carriers are increasing infrastructure investment
and O&M costs year by year. Carriers are in the dilemma where traffic increase does not bring corresponding
revenue increase. Carriers hope to reduce network layers to reduce Operating Expenses (OpEx) and Capital
Expenditures (CapEx).
To satisfy this need, Huawei has developed colored boards for the NE40E. With colored optical modules integrated, colored boards require fewer colorless optical modules, which reduces unnecessary optical-to-electrical and electrical-to-optical conversions. The colored optical modules also simplify network layers and
Benefits
Colored interfaces offer the following benefits to carriers:
• Simplified network layers: Network layers are reduced by simplifying WDM devices.
• Saved resources: The equipment room and power consumption are saved.
8.9.2.1 Concepts
Overview of WDM
Wavelength-division multiplexing (WDM), a technology used in the MAN and WAN, is used to transmit two
or more optical signals of different wavelengths through the same optical fiber. A WDM system uses a
multiplexer at the transmitter to join multiple optical carrier signals of different wavelengths (carrying
different information) together on a single optical fiber, and a demultiplexer at the receiver to split the
optical carrier signals apart. Then, an optical receiver further processes and restores the optical carrier
signals to the original signals.
WDM interfaces supported by the NE40E consist of two interfaces, namely the controller WDM interface and
its corresponding GE interface. Parameters related to the optical layer and electrical layer are configured in
the controller WDM interface view, and all service features are configured in the GE interface view. The
mapping mode of service signals on WDM interfaces is Ethernet over OTN.
Overview of OTN
Currently, the Synchronous Digital Hierarchy over Synchronous Optical Network (SDH/SONET) and WDM
networks are usually used as transport networks. SDH/SONET processes and schedules services at the
electrical layer and WDM processes and schedules services at the optical layer. As data services increase,
more and more bandwidth is required. The SDH/SONET network cannot meet the requirements for cross-connect
scheduling and network scalability. In addition, operators require WDM networks to provide high
maintainability, security, and service scheduling flexibility. The OTN was therefore developed to solve these
problems.
The OTN technology applies the operability and manageability of SDH/SONET networks to the WDM system
so that the OTN acquires the advantages of both the SDH/SONET network and the WDM network. In
addition, the OTN technology defines a complete system structure, including the management and
monitoring mechanism for each network layer and the network survival mechanism of the optical layer and
electrical layer. In this manner, operators' carrier-class requirements are truly met.
The OTN, which consists of optical network elements connected through optical fiber links, provides the
transport, multiplexing, routing, management, monitoring, and protection (survival) capabilities to optical
channels that are used to transmit client signals. A key feature of the OTN is that the transport of any
digital client signal is independent of specific client characteristics, that is, client independence. Optical
Transport Hierarchy (OTH) is a new connection-oriented transport technology used to develop the OTN. Owing
to its great scalability, the OTN is applicable to backbone mesh networks. Ideally, the future transport
network will be an all-OTN network. Compared with SDH networks, the OTN is the next-generation optical
transport network.
Compared with the traditional SDH and SONET networks, the OTN has the following advantages:
FEC Overview
The communication reliability is of great importance to communication technologies. Multiple channel
protection measures and automatic error correction coding techniques are used to enhance reliability.
The OTU overhead of an OTN frame contains FEC information. FEC, which corrects data by using algorithms,
can effectively improve the transport performance of the system where the signal-to-noise ratio (SNR) and
dispersion are limited. In this manner, the investment cost on the transport system is reduced accordingly. In
addition, in a system using FEC, the receiver can correctly receive signals with a lower SNR, so the maximum
single-span distance can be extended or the number of spans can be increased. In this manner, the total
transmission distance of signals is prolonged.
TTI Overview
Trail trace identifier (TTI) is a byte string in the overhead of an optical transport unit (OTU) or an optical
data unit (ODU). Like the J byte in the SDH segment overhead, the TTI identifies the source and destination
stations to which each optical fiber is connected to prevent incorrect connection. If the received TTI differs
from the expected value, a TIM alarm is generated.
OTU overhead: contains information about the transmission function of optical channels, and defines FAS,
MFAS, GCC0, and SM (such as TTI, BIP-8, BDI, BEI/BIAE, and IAE) overheads. Among these overheads, TTI is a
64-byte string monitoring the connectivity of the OTU segment.
ODU overhead: contains information about the maintenance and operation function of optical channels, and
defines TCM, PM, GCC1/GCC2, APS/PCC, and FTFL overheads. Among these overheads, TCM monitors the
serial connection, and PM monitors ODU paths.
• TCM overhead belongs to the ODU overhead. The TCM overhead has six levels (TCMn, n = 1...6) with
each TCMn occupying three bytes.
Figure 2 shows the specific allocation of the SM, PM, and TCM overheads.
Fundamentals
Delay measurement for the PM layer depends on bit 7 (DMp) of the PM&TCM byte in an ODU frame. Figure
1 shows the bits in the PM&TCM byte.
A toggled DMp signal indicates the start of delay measurement. Generally, the DMp signal has a fixed bit
value (0 or 1). When the value is toggled from 0 to 1 or 1 to 0, the two-way delay measurement starts. After
the value changes, the new value of the DMp signal remains unchanged until the next delay measurement
starts.
In Figure 2, the source path connection monitoring end point (P-CMEP) inserts and transmits the DMp signal
to the sink P-CMEP, which then loops it back to the source P-CMEP. If N is the number of frame periods
from the time the source P-CMEP transmits the toggled bit value of the DMp signal to the time the source
P-CMEP receives that bit value from the loopback node (sink P-CMEP), the OTN delay value can be
calculated using the following formula:
OTN delay = N × OTN frame period
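For illustration, the formula can be applied as follows. The frame-period constants are approximate per-ODUk values from ITU-T G.709, and the function name and inputs are illustrative only, not part of any product implementation:

```python
# Sketch of the delay formula above: round-trip delay = N frame periods.
# Frame-period values are approximate, per ITU-T G.709, in microseconds.
ODU_FRAME_PERIOD_US = {
    "ODU1": 48.971,  # ~2.5 Gbit/s
    "ODU2": 12.191,  # ~10 Gbit/s
    "ODU3": 3.035,   # ~40 Gbit/s
}

def otn_round_trip_delay_us(n_frame_periods: int, odu_level: str) -> float:
    """Return the measured round-trip delay in microseconds.

    n_frame_periods: number of frame periods counted between the source
    P-CMEP sending the toggled DMp bit and receiving it back from the
    loopback node (sink P-CMEP).
    """
    return n_frame_periods * ODU_FRAME_PERIOD_US[odu_level]

# Example: 1000 frame periods counted on an ODU2 path.
delay_us = otn_round_trip_delay_us(1000, "ODU2")
```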
Measurement Process
In delay measurement, devices, depending on their roles, can work in insertion mode, loopback mode, or
transparent transmission mode. Figure 3 shows the delay measurement process.
• The source device works in insertion mode and sends the DMp signal to the sink device.
• The sink device works in loopback mode, extracts the DMp signal from the ODU overhead, and loops it
back to the ODU overhead of the source device.
• The intermediate device works in transparent transmission mode and transparently transmits the DMp
signal of the ODUk layer without any processing.
Measurement Result
A measurement of a round-trip delay includes:
• The delay of the electrical layer (including the optical module) in routers' OTN subcards.
• The delay of the transport network (including both the electrical and optical layers)
Gray (colorless) light interfaces on the router are LAN, WAN, or POS.
To satisfy long-distance transmission, the router must provide OTN interfaces (OTUk) that have strong error correction
capabilities.
Terms
Term Definition
Definition
Linear multiplex section protection (LMSP) is an SDH interface-based protection technique that uses an SDH
interface to protect services on another SDH interface. If a link failure occurs, LMSP enables a device to send
a protection switching request over K bytes to its peer device. The peer device then returns a switching
bridge reply.
Purpose
Large numbers of low-speed links still exist on the user side. These links may be unstable due to aging, and
their small capacity means they may fail to work properly when congestion occurs in traffic burst scenarios.
Therefore, a protection technique is required to provide reliability and stability for these low-speed links.
LMSP is an inherent feature of an SDH network. When a mobile bearer network is deployed, a Router must
be connected to an add/drop multiplexer (ADM) or RNC, both of which support LMSP. As the original
protection function of the Router cannot properly protect the communication channel between the Router
and ADM or RNC, LMSP is introduced to resolve this issue.
Benefits
LMSP offers the following benefits:
• Improves the reliability and security of low-speed links and enhances product credibility and market
competitiveness by reducing labor costs (automatic switching) and decreasing network interruption
time (rapid switching).
8.10.2 Principles
LMSP is a redundancy protection mechanism that uses a backup channel to protect services on a channel.
LMSP is defined in ITU-T G.783 and G.841 and used to protect multiplex section (MS) layers in linear
networking mode. LMSP applies to point-to-point physical networks.
LMSP can protect services against disconnection of the optical fiber on which the working MS resides, regenerator
failures, and MS performance deterioration. It does not protect against node failures.
As a supporting network, an SDH network facilitates the establishment of large-scale data communications
networks with high bandwidth. For example, data networks A and B can communicate with each other by
multiplexing services to SDH payloads and transmitting the payloads over optical fibers. An LMSP-enabled
router can protect traffic on a link to an ADM on an SDH network that has LMSP functions. Two LMSP-
enabled routers can also interwork to protect traffic on the direct link between them.
Linear MS Mode
Linear MS modes are classified as 1+1 or 1:N protection modes by protection structure (only 1:1 protection is
implemented).
• In 1+1 protection mode, each working link has a dedicated protection link as its backup. In a process
called bridging, a transmit end transmits data on both the working and protection links simultaneously.
In normal circumstances, a receive end receives data from the working link. If the working link fails and
the receive end detects the failure, the receive end receives data from the protection link. Generally, only
the receive end performs a switching action; that is, single-ended protection is used. In this mode, K1 and
K2 bytes are not required for LMSP negotiation.
The 1+1 protection mode has advantages such as rapid traffic switching and high reliability. However,
this mode has a low channel usage (about 50%). Figure 1 shows the 1+1 protection mode.
• In 1:N protection mode, a protection link provides traffic protection for N working links (1 ≤ N ≤ 14). In
normal circumstances, a transmit end transmits data on a working link. The protection link can transmit
low-priority data or it may not transmit any data. If the working link fails, the transmit end bridges data
onto the protection link. The receive end then receives data from the protection link. If the transmit end
is transmitting low-priority data on the protection link, it will stop the data transmission and start
transmitting high-priority protected data. Figure 2 shows the 1:N protection mode.
If several working links fail at the same time, only data on the working link with the highest priority can
be switched to the protection link. Data on other faulty working links is lost.
When N is 1, the 1:N protection mode becomes the 1:1 protection mode.
The 1:N protection mode requires both a transmit end and a receive end to perform switching.
Therefore, K1 and K2 bytes are required for negotiation. The 1:N protection mode has a high channel
usage but poorer reliability than the 1+1 protection mode.
■ In single-ended switching mode, if a link failure occurs, only the receive end detects the failure and
performs a switching action. Because only the receive end performs switching and bridging actions,
the two ends of an LMSP connection may select different links to receive traffic.
■ In dual-ended switching mode, if a link failure occurs, the receive end detects the failure and
performs a switching action. The transmit end also performs a switching action through SDH K
byte negotiation although it does not detect the failure. As a result, both ends of an LMSP
connection select the same link to send and receive traffic.
Single-ended switching must work with 1+1 protection, but dual-ended switching can work with 1:1 or
1+1 protection.
• LMSP types can be classified as single-chassis LMSP or multi-chassis LMSP (MC-LMSP), depending on
the number of LMSP-enabled devices.
Linear MS K Bytes
LMSP uses APS to control bridging, switching, and recovery actions. APS information is transmitted over the
K1 and K2 bytes in the MS overhead in an SDH frame structure. Table 1 lists the bit layout of the K1 and K2
bytes.
K1 byte: Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0
K2 byte: Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0
• Bits 7, 6, 5, and 4 of the K1 byte: switching request code. Table 2 describes switching request code
values and their meanings.
• Bits 3, 2, 1, and 0 of the K1 byte: switching request channel numbers. The value 0 indicates a protection
channel. The values 1 to 14 indicate working channels (the value can be only 1 in 1+1 protection
mode). The value 15 indicates an extra service channel (the value can be 15 only in 1:N protection
mode).
• Bits 7, 6, 5, and 4 of the K2 byte: bridging/switching channel numbers. The value meanings of a bridging
channel number are the same as those of a switching request channel number.
• Bit 3 of the K2 byte: protection mode. The value 0 indicates 1+1 protection, and the value 1 indicates 1:1
protection.
• Bits 2, 1, and 0 of the K2 byte: MS status code. The values are as follows:
■ 101: dual-ended
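The K1/K2 field layout described above can be illustrated with a short decoding sketch. The function name and returned structure are illustrative only, not part of any product implementation:

```python
def decode_k_bytes(k1: int, k2: int) -> dict:
    """Decode LMSP APS fields from the K1 and K2 bytes, per the layout above."""
    return {
        "request_code": (k1 >> 4) & 0xF,     # K1 bits 7-4: switching request code
        "request_channel": k1 & 0xF,         # K1 bits 3-0: 0=protection, 1-14=working, 15=extra
        "bridged_channel": (k2 >> 4) & 0xF,  # K2 bits 7-4: bridging/switching channel
        "protection_mode": "1:1" if (k2 >> 3) & 0x1 else "1+1",  # K2 bit 3
        "ms_status": k2 & 0x7,               # K2 bits 2-0: e.g. 0b101 = dual-ended
    }

# Example: K1 requests switching for working channel 0's request code 1;
# K2 bit 3 = 1 (1:1 protection), status bits = 101 (dual-ended).
info = decode_k_bytes(0b00010000, 0b00001101)
```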
1. Device B receives a signal failure message and sends a bridge request to device A through the
protection channel.
2. After receiving the bridge request, device A sends a response to device B through the protection
channel.
3. After receiving the response, device B performs switching and bridging actions and sends a switching
acknowledgement to device A through the protection channel.
4. After receiving the switching acknowledgement, device A performs bridging and switching actions. The
switching is complete when LMSP enters the stable state.
• K1 and K2 bytes are sent in both single-ended protection and dual-ended protection. The information in
the K1 and K2 bytes, for example, 1:1/1+1 or single-/dual-ended protection information, must be
configured as required.
• Information in the K2 byte, for example, 1:1/1+1 or single-/dual-ended protection information, in both
single-ended protection and dual-ended protection must be verified. In single-ended protection mode, if
the local end finds that the configuration on the peer end is different from its configuration, it reports
an alarm, and switching is not affected. In dual-ended protection mode, if the local end finds that the
configuration on the peer end is different from its configuration, it reports an alarm, and switching is
affected.
PGP
MC-LMSP is implemented between main control boards over PGP. The connection mode is UDP. Figure 1
shows the communication process.
1. The interface board of the master device sends a message to the main control board through the IPC.
2. The main control board of the master device constructs a PGP packet and sends the packet from the
main control board to the interface board over the VP.
3. The master device sends the packet through an interface to the backup device.
4. The backup device sends the packet to the main control board over the VP.
5. The main control board of the backup device performs APS PGP processing and sends a message to the
interface board through the IPC.
6. The interface board of the backup device sends the packet back to the master device.
7. The master device sends the packet from the interface board to the main control board.
1. The interfaces on TPE2 and TPE3 form an MC-LMSP group. TPE2 and TPE3 are configured as the
working and protection NEs, respectively. The LMSP state machine runs on TPE3.
4. An ICB channel is deployed to synchronize the status between TPE2 and TPE3.
8.10.3 Applications
• On the access side, a NodeB/BTS is connected to the Router over an E1 or SDH link, and a microwave or
SDH device is connected to the Router over an optical fiber. Single-chassis LMSP is configured for the
STM-1 link between the Router and microwave or SDH device.
• On the network side, the Router is connected to PEs. Single-chassis LMSP is configured on POS or CPOS
interfaces.
Access Side
Scenario 1: On the network shown in Figure 2, a base station is connected to the Router through the
microwave devices and then over the IMA/TDM link (CPOS interface) that has LMSP configured. The RNC is
connected to the device over the IMA/TDM link (CPOS interface). After base station data reaches the Router,
the base station can interwork with the RNC over the PW between the Router and device.
Scenario 2: On the network shown in Figure 3, a base station is connected to the Router through the
microwave devices and then over the IMA link (CPOS interface) that has LMSP configured. The RNC is
connected to the device over the ATM link. After base station data reaches the Router, the base station can
interwork with the RNC over the PW between the Router and device.
Network Side
Scenario 1: On the network shown in Figure 4, the Router's network-side interface is a CPOS interface on
which a global MP group is configured. Single-chassis LMSP is configured on the CPOS interface. The Router
is connected to another device to carry PW/L3VPN/MPLS/DCN services.
Scenario 2: On the network shown in Figure 5, the Router's network-side interface is a POS interface. Single-
chassis LMSP is configured on the POS interface. The Router is connected to another device to carry
PW/VPLS/L3VPN/MPLS/DCN services.
• If the primary PW fails, traffic switches to the backup PW. The traffic forwarding path on the AC side
remains unchanged, that is, traffic is still forwarded over the link between the RNC and Device C. The
traffic is then transmitted from Device C to Device B over the bypass PW.
• If the link between the RNC and Device C fails, traffic switches to the link between the RNC and Device
B over LMSP. If the negotiation mode of PW redundancy has been set to Independent, a
primary/backup PW switchover is performed. If the negotiation mode of PW redundancy has been set to
Master/Slave, no primary/backup PW switchover is performed and traffic is transmitted from Device B
to Device C over the bypass PW.
MC-LMSP 1+1 protection provides rapid traffic switching and high reliability. When MC-LMSP 1+1 protection is configured, the primary and backup PWs are
deployed on the Routers to transparently transmit data from the RNC to a remote Router. Two bypass PWs
must also be deployed between Device C and Device B to provide bypass protection for the primary and
backup PWs.
When the primary PW or the link between the RNC and Device C fails, the protection method in the scenario
of MC-LMSP 1+1 protection+two bypass PWs is similar to that in the scenario of MC-LMSP 1:1
protection+one bypass PW. The difference is that in the scenario of MC-LMSP 1+1 protection+two bypass
PWs, two bypass PWs are deployed between Device C and Device B. This ensures traffic replication for MC-
LMSP 1+1 protection and provides bypass protection for the primary and backup PWs and AC-side working
and protection links. If a fault occurs, such deployment can implement rapid traffic switching to ensure that
the networking environment after the switching also has MC-LMSP 1+1 protection.
Figure 2 shows a network with MC-LMSP 1+1 protection+two bypass PWs deployed.
• If the working PW fails, traffic switches to the protection PW. After Device B receives traffic from the
public network side through port A, it queries the MC-LMSP status on the AC side. If MC-LMSP has not
performed a working/protection channel switchover, Device B forwards the traffic to Device C through
port C. Device C then forwards the traffic to the RNC through port B. If MC-LMSP has performed a
working/protection channel switchover, Device B forwards the traffic to the RNC through port B.
• If the working channel between the RNC and Device C fails, traffic switches to the protection channel
between the RNC and Device B over LMSP. After Device B receives traffic from the AC side through port
B, it queries the E-PW APS status on the public network side. If E-PW APS has not performed a
working/protection PW switchover, Device B forwards the traffic to Device C through port C. Device C
then forwards the traffic to Device A through port A. If E-PW APS has performed a working/protection
PW switchover, Device B forwards the traffic to Device A through port A.
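The two switchover cases described above can be sketched as simple decision logic. The device and port names follow the preceding description; the functions and return strings are illustrative only:

```python
def forward_from_public_side(mc_lmsp_switched: bool) -> str:
    """Device B receives traffic from the public network side through port A
    and queries the MC-LMSP status on the AC side, as described above."""
    if not mc_lmsp_switched:
        # No working/protection channel switchover: hand off to Device C
        # through port C; Device C forwards to the RNC through its port B.
        return "port C -> Device C -> RNC"
    # Switchover done: Device B forwards to the RNC through port B directly.
    return "port B -> RNC"

def forward_from_ac_side(epw_aps_switched: bool) -> str:
    """Device B receives traffic from the AC side through port B and queries
    the E-PW APS status on the public network side, as described above."""
    if not epw_aps_switched:
        # No working/protection PW switchover: hand off to Device C through
        # port C; Device C forwards to Device A through its port A.
        return "port C -> Device C -> Device A"
    # PW switchover done: Device B forwards to Device A through port A.
    return "port A -> Device A"
```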
If the primary link between the MUX and Device A fails, traffic switches to the secondary link between the
MUX and Device B.
When the MUX detects the fault, it sends traffic to the protection link between the MUX and Device B.
Device B then sends the traffic based on the neighbor relationship learned by OSPF. Finally, traffic reaches
the BSC.
9 IP Services
Purpose
This document describes the IP services feature in terms of its overview, principles, and applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Commissioning engineers
Security Declaration
• Notice on Limited Command Permission
This document describes the commands used for network deployment and maintenance, but does not
cover confidential commands such as those used for production, assembly, and return-to-factory
inspection. For details about confidential commands, please submit an application.
■ When the password encryption mode is cipher, avoid setting both the start and end characters of a
password to "%^%#"; otherwise, the password is displayed directly in the configuration file.
■ Your purchased products, services, or features may use some of users' personal data during service
operation or fault locating. You must define user privacy policies in compliance with local laws and
take proper measures to fully protect personal data.
■ When discarding, recycling, or reusing a device, back up or delete data on the device as required to
prevent data leakage. If you need support, contact after-sales technical support personnel.
• Feature declaration
■ The NetStream feature may be used to analyze the communication information of terminal
customers for network traffic statistics and management purposes. Before enabling the NetStream
feature, ensure that it is performed within the boundaries permitted by applicable laws and
regulations. Effective measures must be taken to ensure that information is securely protected.
■ The mirroring feature may be used to analyze the communication information of terminal
customers for a maintenance purpose. Before enabling the mirroring function, ensure that it is
performed within the boundaries permitted by applicable laws and regulations. Effective measures
must be taken to ensure that information is securely protected.
■ The packet header obtaining feature may be used to collect or store some communication
information about specific customers for transmission fault and error detection purposes. Huawei
cannot offer services to collect or store this information unilaterally. Before enabling the function,
ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Special Declaration
• This document serves only as a guide. The content is written based on device information gathered
under lab conditions. The content provided by this document is intended to be taken as general
guidance, and does not cover all scenarios. The content provided by this document may be different
from the information on user device interfaces due to factors such as version upgrades and differences
in device models, board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are beyond the
scope of this document.
• The maximum values provided in this document are obtained in specific lab environments (for example,
only a certain type of board or protocol is configured on a tested device). The actually obtained
maximum values may be different from the maximum values provided in this document due to factors
such as differences in hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface numbers on devices
for configuration.
• The supported boards are described in the document. Whether a customization requirement can be met
is subject to the information provided at the pre-sales interface.
• In this document, public IP addresses may be used in feature introduction and configuration examples
and are for reference only unless otherwise specified.
• The configuration precautions described in this document may not accurately reflect all scenarios.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates a hazard with a high level of risk which, if not avoided, will
result in death or serious injury.
Indicates a hazard with a low level of risk which, if not avoided, could
result in minor or moderate injury.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made
in earlier issues.
Definition
The Address Resolution Protocol (ARP) is an Internet protocol used to map IP addresses to MAC addresses.
Purpose
If two hosts need to communicate, the sender must know the network-layer IP address of the receiver. IP
datagrams, however, must be encapsulated with MAC addresses before they can be transmitted over the
physical network. Therefore, ARP is needed to map IP addresses to MAC addresses to ensure the
transmission of datagrams.
Function Overview
Table 1 lists ARP features.
Static ARP
Description: The mapping between IP and MAC addresses is manually created and cannot be dynamically
modified.
Usage scenario: Communication security is a priority, and network resources are sufficient.
Gratuitous ARP
Description: A device broadcasts gratuitous ARP packets that carry the local IP address as both the source
and destination IP addresses to notify the other devices on the same network segment of its address
information.
Usage scenario: Gratuitous ARP is used to check whether the local IP address conflicts with that of another
device, to notify other devices on the same network segment of the new MAC address after the local network
interface card is replaced, or to notify master/slave switchovers in a Virtual Router Redundancy Protocol
(VRRP) backup group.
Proxy ARP
Description: If a proxy ARP-enabled device receives an ARP request message destined for another device, the
proxy ARP-enabled device encapsulates its MAC address into an ARP reply message and sends the packet to the
device that sent the ARP request message.
Usage scenario: Two hosts have the same network ID but are located on different physical network segments.
If the hosts need to communicate, routed proxy ARP must be enabled on the intermediate device.
Two hosts belong to the same VLAN, but host isolation is configured for the VLAN. If the two hosts need to
communicate, intra-VLAN proxy ARP must be enabled on the interfaces that connect the two hosts.
Two hosts belong to different VLANs. If the two hosts need to communicate at Layer 2, inter-VLAN proxy ARP
must be enabled on the interfaces that connect the two hosts.
In the Ethernet virtual connection (EVC) mode, if two hosts belong to the same bridge domain (BD) for which
host isolation is configured, you must enable local proxy ARP on the VBDIF interfaces that connect the two
hosts. Otherwise, the two hosts cannot communicate.
ARP-Ping
Description: ARP-Ping uses ARP or ICMP request messages to detect whether an IP or MAC address to be
configured for a device is in use.
Usage scenario: To prevent address conflicts, send ARP messages to check whether an address is already in
use on the network before configuring an IP or MAC address for a device.
Dual-Device ARP Hot Backup
Description: Dual-device ARP hot backup enables ARP entries on the control and forwarding planes to be
synchronized between the master and backup devices.
Usage scenario: Dual-device ARP hot backup prevents downlink traffic from being interrupted because the
backup device does not learn ARP entries from a device on the user side during a master/backup VRRP
switchover.
Benefits
ARP ensures communication by mapping IP addresses at the network layer to MAC addresses at the link
layer on Ethernet networks.
The Ethernet Address of destination field contains a total of 48 bits. Ethernet Address of destination (0-31)
indicates the first 32 bits of the Ethernet Address of destination field, and Ethernet Address of destination
(32-47) indicates the last 16 bits of the Ethernet Address of destination field.
An ARP message consists of 42 bytes. The first 14 bytes indicate the Ethernet frame header, and the last
28 bytes are the ARP request or reply message content. Table 1 describes the fields in an ARP message.
Ethernet address of destination (48 bits): Ethernet destination MAC address in the Ethernet frame header.
This field in an ARP request message is the broadcast MAC address, with a value of 0xFF-FF-FF-FF-FF-FF.
Ethernet address of sender (48 bits): Ethernet source MAC address in the Ethernet frame header.
Frame type (16 bits): Data type. For an ARP request or reply message, the value of this field is 0x0806.
Hardware type (16 bits): Hardware address type. For an Ethernet network, the value of this field is 1.
Protocol type (16 bits): Type of the protocol address to be mapped by the sending device. For an IP address,
the value of this field is 0x0800.
Hardware length (8 bits): Hardware address length. For an ARP request or reply message, the value of this
field is 6.
Protocol length (8 bits): Protocol address length. For an ARP request or reply message, the value of this
field is 4.
Ethernet address of sender (48 bits): Source MAC address. The value of this field is the same as the
Ethernet source MAC address in the Ethernet frame header.
Ethernet address of destination (48 bits): Destination MAC address. The value of this field in an ARP
request message is 0x00-00-00-00-00-00.
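For illustration, the 42-byte ARP request described above can be constructed as follows. Note that, per the standard ARP layout (RFC 826), the 28-byte message body also carries a 16-bit operation code (1 for a request, 2 for a reply) between the length fields and the address fields; the function name and sample addresses below are illustrative only:

```python
import struct

def build_arp_request(sender_mac: bytes, sender_ip: bytes, target_ip: bytes) -> bytes:
    """Build a 42-byte Ethernet ARP request per the field layout above."""
    eth_header = struct.pack("!6s6sH",
                             b"\xff" * 6,   # Ethernet address of destination: broadcast
                             sender_mac,    # Ethernet address of sender
                             0x0806)        # Frame type: ARP
    arp_body = struct.pack("!HHBBH6s4s6s4s",
                           1,               # Hardware type: Ethernet
                           0x0800,          # Protocol type: IPv4
                           6,               # Hardware length (MAC)
                           4,               # Protocol length (IPv4)
                           1,               # Operation code: 1 = request
                           sender_mac,      # Sender MAC address
                           sender_ip,       # Sender IP address
                           b"\x00" * 6,     # Destination MAC: unknown, all zeros
                           target_ip)       # Destination IP address
    return eth_header + arp_body

frame = build_arp_request(b"\x00\x11\x22\x33\x44\x55",
                          bytes([192, 168, 1, 1]), bytes([192, 168, 1, 2]))
```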
• ARP table
An ARP table contains the latest mapping between IP and MAC addresses. If a host always broadcasts
an ARP request message for a MAC address before it sends an IP datagram, network communication
traffic will greatly increase. Furthermore, all other hosts on the network have to receive and process the
ARP request messages, which lowers network efficiency. To solve this problem, an ARP table is
maintained on each host to ensure efficient ARP operations. The mapping between an IP address and a
MAC address is called an ARP entry.
ARP entries can be classified as dynamic or static.
■ Dynamic ARP entries are automatically generated and maintained by using ARP messages.
Dynamic ARP entries can be aged and overwritten by static ARP entries.
■ Static ARP entries are manually configured and maintained by a network administrator. Static ARP
entries can neither be aged nor be overwritten by dynamic ARP entries.
Before sending IP datagrams, a host searches the ARP table for the MAC address corresponding to the
destination IP address.
■ If the ARP table contains the corresponding MAC address, the host directly sends the IP datagrams
to the MAC address instead of sending an ARP request message.
■ If the ARP table does not contain the corresponding MAC address, the host broadcasts an ARP
request message to request the MAC address of the destination host.
Implementation
• ARP implementation within a network segment
Figure 2 illustrates how ARP is implemented within a network segment, by using IP datagram
transmission from Host A to Host B as an example.
Figure 2 ARP implementation between Host A and Host B on the same network segment
1. Host A searches its ARP table and does not find the mapping between the IP and MAC addresses
of Host B. Host A then sends an ARP request message for the MAC address of Host B. In this ARP
request message, the source IP and MAC addresses are respectively the IP and MAC addresses of
Host A, the destination IP and MAC addresses are respectively the IP address of Host B and 00-
00-00-00-00-00, and the Ethernet source MAC address and Ethernet destination MAC address are
respectively the MAC address of Host A and the broadcast MAC address.
2. After CE1 receives the ARP request message, CE1 broadcasts it on the network segment.
3. After Host B receives the ARP request message, Host B adds the MAC address of Host A to its ARP
table and sends an ARP reply message to Host A. In this ARP reply message, the source IP and
MAC addresses are respectively the IP and MAC addresses of Host B, the destination IP and MAC
addresses are respectively the IP and MAC addresses of Host A, and the Ethernet source and
destination MAC addresses are respectively the MAC addresses of Host B and Host A.
The PE also receives the ARP request message but discards it because the destination IP address in the ARP
request message is not its own IP address.
4. CE1 receives the ARP reply message and forwards it to Host A.
5. After Host A receives the ARP reply message, Host A adds the MAC address of Host B to its ARP table and sends the IP datagrams to Host B.
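The request/reply exchange above can be sketched as a small simulation. Hosts, addresses, and helper names here are illustrative assumptions, not NE40E internals.

```python
# Minimal simulation of the ARP exchange between Host A and Host B;
# all addresses are illustrative.

BROADCAST = "ff-ff-ff-ff-ff-ff"
ZERO_MAC = "00-00-00-00-00-00"

def make_request(src_ip, src_mac, dst_ip):
    # ARP request: destination MAC unknown (all zeros), Ethernet frame broadcast.
    return {"op": "request", "src_ip": src_ip, "src_mac": src_mac,
            "dst_ip": dst_ip, "dst_mac": ZERO_MAC, "eth_dst": BROADCAST}

def handle(host_ip, host_mac, arp_table, msg):
    """What each host on the segment does with a received ARP request."""
    if msg["dst_ip"] != host_ip:
        return None                               # not for us: discard silently
    arp_table[msg["src_ip"]] = msg["src_mac"]     # learn the sender first
    # The ARP reply is unicast back to the requester.
    return {"op": "reply", "src_ip": host_ip, "src_mac": host_mac,
            "dst_ip": msg["src_ip"], "dst_mac": msg["src_mac"],
            "eth_dst": msg["src_mac"]}

host_b_table = {}
req = make_request("10.1.1.1", "00-e0-fc-00-00-0a", "10.1.1.2")
reply = handle("10.1.1.2", "00-e0-fc-00-00-0b", host_b_table, req)
# A third device (like the PE above) discards the request silently.
assert handle("10.1.1.3", "00-e0-fc-00-00-0c", {}, req) is None
print(reply["eth_dst"])   # unicast to Host A's MAC
```

Note how the responder learns the sender's mapping before replying, which is why Host B already has Host A's entry when the IP datagrams arrive.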
ARP messages are Layer 2 messages. Therefore, ARP is applicable only to devices on the same network segment. If two hosts on different network segments need to communicate, the source host sends IP datagrams to the default gateway, which in turn forwards the IP datagrams to the destination host. ARP implementation between different network segments therefore consists of separate ARP implementations within each network segment. In this manner, hosts on different network segments can communicate.
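The decision a host makes, resolving the destination directly on the local segment versus resolving the default gateway, can be sketched like this. The function name and parameters are illustrative assumptions.

```python
import ipaddress

# Illustrative next-hop selection: ARP resolves the destination itself only
# when it is on the local segment; otherwise the default gateway is resolved.

def arp_target(local_if_ip, prefix_len, dst_ip, gateway_ip):
    local_net = ipaddress.ip_network(f"{local_if_ip}/{prefix_len}", strict=False)
    if ipaddress.ip_address(dst_ip) in local_net:
        return dst_ip          # same segment: ARP for the destination itself
    return gateway_ip          # different segment: ARP for the default gateway

print(arp_target("10.1.1.1", 24, "10.1.1.2", "10.1.1.254"))  # 10.1.1.2
print(arp_target("10.1.1.1", 24, "10.2.1.3", "10.1.1.254"))  # 10.1.1.254
```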
The following example shows how ARP is implemented between different network segments, using IP datagram transmission from Host A to Host C.
Figure 3 illustrates how ARP is implemented between Host A and the PE on the same network segment.
1. Host A searches its ARP table and does not find the mapping between the IP and MAC addresses
of Interface 1 on the default gateway PE that connects to Host C. Host A then sends an ARP
request message for the MAC address of the PE's Interface 1. In this ARP request message, the
source IP and MAC addresses are respectively the IP and MAC addresses of Host A, the
destination IP and MAC addresses are respectively the IP address of the PE's Interface 1 and 00-
00-00-00-00-00, and the Ethernet source and destination MAC addresses are respectively the
MAC address of Host A and the broadcast MAC address.
2. After CE1 receives the ARP request message, CE1 broadcasts it on the network segment.
3. After the PE receives the ARP request message, the PE adds the MAC address of Host A to its ARP
table and sends an ARP reply message to Host A. In this ARP reply message, the source IP and
MAC addresses are respectively the IP and MAC addresses of the PE's Interface 1, the destination
IP and MAC addresses are respectively the IP and MAC addresses of Host A, and the Ethernet
source and destination MAC addresses are respectively the MAC address of the PE's Interface 1
and the MAC address of Host A.
Host B also receives the ARP request message but discards it because the destination IP address in the ARP
request message is not its own IP address.
4. CE1 receives the ARP reply message and forwards it to Host A.
5. After Host A receives the ARP reply message, Host A adds the MAC address of the PE's Interface 1 to its ARP table and sends the IP datagrams to the PE.
Figure 4 illustrates ARP implementation between the PE and Host C on the same network segment.
The PE searches its routing table and sends the IP datagrams from Interface 1 to Interface 2.
1. The PE searches its ARP table and does not find the mapping between the IP address and MAC
address of Host C. Then, the PE sends an ARP request message for the MAC address of Host C. In
this ARP request message, the source IP and MAC addresses are respectively the IP and MAC
addresses of the PE's Interface 2, the destination IP and MAC addresses are respectively Host C's IP address and 00-00-00-00-00-00, and the Ethernet source and destination MAC addresses are respectively the MAC address of the PE's Interface 2 and the broadcast MAC address.
2. After CE2 receives the ARP request message, CE2 broadcasts it on the network segment.
3. After Host C receives the ARP request message, Host C adds the MAC address of the PE's
Interface 2 to its ARP table and sends an ARP reply message to the PE. In this ARP reply message,
the source IP and MAC addresses are respectively the IP and MAC addresses of Host C, the
destination IP and MAC addresses are respectively the IP and MAC addresses of the PE's Interface
2, and the Ethernet source and destination MAC addresses are respectively the MAC address of
Host C and the MAC address of the PE's Interface 2.
Host D also receives the ARP request message but discards it because the destination IP address in the ARP
request message is not its own IP address.
4. CE2 receives the ARP reply message and forwards it to the PE.
5. After the PE receives the ARP reply message, the PE adds the MAC address of Host C to its ARP
table and sends the IP datagrams to Host C.
1. ARP request messages are broadcast, whereas ARP reply messages are unicast.
2. In ARP implementation, CE1 and CE2 transparently forward IP datagrams and do not modify them.
Definition
Dynamic ARP allows devices to dynamically learn and update the mapping between IP and MAC addresses
using ARP messages. You do not need to manually configure the mapping.
• Aging probe mode: Before a dynamic ARP entry on a device is aged, the device sends ARP aging probe messages to the other devices on the same network segment. An ARP aging probe message can be a unicast or broadcast message. By default, a device broadcasts ARP aging probe messages.
■ If the IP address of the peer device remains unchanged but its MAC address changes frequently, it is recommended that you configure ARP aging probe messages to be broadcast.
■ If the MAC address of the peer device remains unchanged, network bandwidth resources are insufficient, and the aging time of ARP entries is set to a small value, it is recommended that you configure ARP aging probe messages to be unicast.
• Aging time: A dynamic ARP entry has a life cycle. If a dynamic ARP entry is not updated before its life cycle ends, this dynamic ARP entry is deleted from the ARP table. The life cycle is called aging time. Two interconnected devices can learn the mapping between their IP and MAC addresses using ARP and can save the mapping in their ARP tables. Then, the two devices can communicate by using the ARP entries. When the peer device becomes faulty, or the network adapter of the peer device is replaced but the local device does not receive any status change information about the peer device, the local device continues sending IP datagrams to the peer device. As a result, network traffic is interrupted because the ARP table of the local device is not promptly updated. To reduce the risk of network traffic interruption, an aging timer can be set for each ARP entry. After the aging timer of a dynamic ARP entry expires, the entry is automatically deleted.
• Number of aging probe attempts: Before a dynamic ARP entry is aged, a device sends ARP aging probe messages to the peer device. If the device does not receive an ARP reply message after the number of aging probe attempts reaches a specified number, the dynamic ARP entry is aged. The ARP aging timer can help reduce the risk of network traffic interruptions that occur because an ARP table is not updated quickly enough, but cannot eliminate problems caused by delays. Specifically, if the dynamic ARP entry aging time is N seconds, the local device can detect the status change of the peer device only after N seconds. During the N seconds, the ARP table of the local device is not updated. If the number of aging probe attempts is specified, the local device can obtain the status change information about the peer device in time and update its ARP table.
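The interaction between aging and probe attempts described above can be sketched as follows. Timing is modeled in abstract probe rounds rather than real seconds, and the function is an illustrative assumption, not device behavior.

```python
# Sketch of dynamic-entry aging with a configurable number of probe attempts.

def age_entry(probe_attempts, peer_alive):
    """Return True if the entry survives, False if it is deleted.
    peer_alive(attempt) models whether the peer answers a given probe."""
    for attempt in range(1, probe_attempts + 1):
        if peer_alive(attempt):
            return True        # reply received: entry refreshed, aging restarts
    return False               # no reply after all probes: delete the entry

# Peer answers on the 2nd probe: the entry is kept.
print(age_entry(3, lambda n: n == 2))    # True
# Peer never answers: the entry is aged out after 3 probes.
print(age_entry(3, lambda n: False))     # False
```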
Implementation
Dynamic ARP entries can be created, updated, and aged.
If a device receives an ARP message that meets either of the following conditions, the device
automatically creates or updates an ARP entry:
■ The source IP address of the ARP message is on the same network segment as the IP address of the
inbound interface. The destination IP address of the ARP message is the IP address of the inbound
interface.
■ The source IP address of the ARP message is on the same network segment as the IP address of the
inbound interface. The destination IP address of the ARP message is the virtual IP address of the
VRRP group configured on the interface on the device.
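The two learning conditions above can be expressed as a single check. This is a simplified sketch; the VRRP virtual IP handling and parameter names are assumptions.

```python
import ipaddress

# Illustrative check of whether a received ARP message may create or update
# an entry: the source must be on the inbound interface's segment, and the
# destination must be the interface's own IP or its VRRP virtual IP.

def should_learn(msg_src_ip, msg_dst_ip, if_ip, prefix_len, vrrp_vip=None):
    same_net = ipaddress.ip_address(msg_src_ip) in ipaddress.ip_network(
        f"{if_ip}/{prefix_len}", strict=False)
    if not same_net:
        return False
    return msg_dst_ip == if_ip or (vrrp_vip is not None and msg_dst_ip == vrrp_vip)

print(should_learn("10.1.1.2", "10.1.1.1", "10.1.1.1", 24))                  # True
print(should_learn("10.1.1.2", "10.1.1.100", "10.1.1.1", 24, "10.1.1.100"))  # True
print(should_learn("10.2.1.2", "10.1.1.1", "10.1.1.1", 24))                  # False
```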
Enhanced Functions
Dynamic ARP has an enhanced Layer 2 topology probe function. This function enables a device to set the
aging time to 0 for all ARP entries corresponding to a VLAN to which a Layer 2 interface belongs when the
Layer 2 interface becomes Up. The device then resends ARP probe messages to update all ARP entries.
If a non-Huawei device that connects to a Huawei device receives an ARP aging probe message with the
destination MAC address as the broadcast address and the ARP table of the non-Huawei device contains the
mapping between the IP address and MAC address of the Huawei device, the non-Huawei device does not
respond to the broadcast ARP aging probe message. Therefore, the Huawei device considers the link to the
non-Huawei device Down and deletes the mapping between the IP address and MAC address of the non-
Huawei device. To prevent this problem, configure Layer 2 topology change so that the Huawei device
unicasts ARP aging probe messages to the non-Huawei device.
Usage Scenario
Dynamic ARP applies to a network with a complex topology, insufficient bandwidth resources, and a high
real-time communication requirement.
Benefits
Dynamic ARP entries are dynamically created and updated using ARP messages. They do not need to be
manually maintained, greatly reducing maintenance workload.
Definition
Static ARP allows a network administrator to create the mapping between IP and MAC addresses.
Background
The difference between static ARP and dynamic ARP lies in the method of generating and maintaining ARP
entries. Dynamic ARP entries are automatically generated and maintained using ARP packets, while static
ARP entries must be manually configured and maintained by network administrators. The advantages and
disadvantages of dynamic and static ARP are as follows:
• Dynamic ARP
• Static ARP
• Binds IP addresses to the MAC address of a specified gateway so that IP datagrams destined for these IP
addresses must be forwarded by this gateway.
• Binds the destination IP addresses of IP datagrams sent by a specified host to a nonexistent MAC
address, helping filter out unwanted IP datagrams.
To ensure the stability and security of network communication, deploy static ARP based on actual
requirements and network resources.
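The filtering use of static ARP, binding an IP address to a nonexistent MAC address, can be sketched as follows. The black-hole MAC value and helper are illustrative assumptions.

```python
# Illustrative effect of a static ARP entry bound to a nonexistent MAC:
# datagrams to the bound IP are sent to a MAC nobody owns and are dropped.

BLACKHOLE_MAC = "00-00-00-00-00-01"   # assumed unused on the segment

def next_hop_mac(dst_ip, static_arp, dynamic_arp):
    # Static entries take precedence and cannot be overwritten dynamically.
    return static_arp.get(dst_ip) or dynamic_arp.get(dst_ip)

static_arp = {"10.1.1.66": BLACKHOLE_MAC}
dynamic_arp = {"10.1.1.66": "00-e0-fc-11-22-33"}   # learned entry is ignored
print(next_hop_mac("10.1.1.66", static_arp, dynamic_arp))
# 00-00-00-00-00-01  (traffic to 10.1.1.66 is effectively filtered out)
```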
Related Concepts
Static ARP entries are classified as short or long entries.
In Network Load Balancing (NLB) scenarios, you must configure both MAC entries with multiple outbound
interfaces and short static ARP entries for the gateway. These MAC entries and short static ARP entries must have
the same MAC address. In NLB scenarios, short static ARP entries are also called ARP entries with multiple
outbound interfaces and cannot be updated manually.
Usage Scenario
Static ARP applies to the following scenarios:
• Short static ARP entries mainly apply to scenarios in which network administrators want to bind hosts'
IP and MAC addresses but hosts' access interfaces can change.
Benefits
Static ARP ensures communication security. If a static ARP entry is configured on a device, the device can
communicate with the peer device using only the specified MAC address. Network attackers cannot modify
the mapping between the IP and MAC addresses using ARP messages, ensuring communication between the
two devices.
Principles
Gratuitous ARP allows a device to broadcast gratuitous ARP messages that carry the local IP address as both
the source and destination IP addresses to notify the other devices on the same network segment of its
address information. Gratuitous ARP is used in the following scenarios to ensure the stability and reliability
of network communication:
• You need to check whether the IP address of a device conflicts with the IP address of another device on
the same network segment. The IP address of each device must be unique to ensure the stability of
network communication.
• When the MAC address of a host changes after its network adapter is replaced, the host must quickly notify other devices on the same network segment of the MAC address change before the ARP entry is aged. This ensures the reliability of network communication.
• When a master/backup switchover occurs in a VRRP group, the new master device must notify other
devices on the same network segment of its status change.
Related Concepts
Gratuitous ARP uses gratuitous ARP messages. A gratuitous ARP message is a special ARP message that
carries the sender's IP address as both the source and destination IP addresses.
Implementation
Gratuitous ARP is implemented as follows:
• If a device finds that the source IP address in a received gratuitous ARP message is the same as its own
IP address, the device sends a gratuitous ARP message to notify the sender of the address conflict.
• If a device finds that the source IP address in a received gratuitous ARP message is different from its
own IP address, the device updates the corresponding ARP entry with the sender's IP and MAC
addresses carried in the gratuitous ARP message.
As shown in Figure 1, the IP address of Interface 1 on PE1 is 10.1.1.1, and the IP address of Interface 2 on
PE2 is 10.1.1.1.
1. Interface 1 broadcasts an ARP request message. Interface 2 receives the ARP request message and
finds that the source IP address in the message conflicts with its own IP address. Interface 2 then
performs the following operations:
b. Generates a conflict node on its conflict link and then sends gratuitous ARP messages to
Interface 1 at an interval of 5 seconds.
2. Interface 1 receives the gratuitous ARP messages from Interface 2 and finds that the source IP address
in the messages conflicts with its own IP address. Interface 1 then performs the following operations:
b. Generates a conflict node on its conflict link and then sends gratuitous ARP messages to
Interface 2 at an interval of 5 seconds.
Interface 1 and Interface 2 send gratuitous ARP messages to each other at an interval of 5 seconds until the
address conflict is rectified.
If one interface does not receive a gratuitous ARP message from the other interface within 8 seconds, the
interface considers the address conflict rectified. The interface deletes the conflict node on its conflict link
and stops sending gratuitous ARP messages to the other interface.
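The two receive-side cases above can be sketched as a single handler. The function and table names are illustrative; the 5-second send interval and 8-second silence window appear only as constants.

```python
# Illustrative handling of a received gratuitous ARP message.

SEND_INTERVAL_S = 5      # interval between gratuitous ARP messages (per text)
CONFLICT_CLEAR_S = 8     # silence window after which a conflict is cleared

def on_gratuitous_arp(my_ip, arp_table, msg_ip, msg_mac):
    if msg_ip == my_ip:
        # Address conflict: answer with our own gratuitous ARP and keep a
        # conflict node until the peer stays silent for CONFLICT_CLEAR_S.
        return "conflict"
    arp_table[msg_ip] = msg_mac   # no conflict: refresh the mapping
    return "updated"

table = {}
print(on_gratuitous_arp("10.1.1.1", table, "10.1.1.1", "00-e0-fc-00-00-02"))  # conflict
print(on_gratuitous_arp("10.1.1.1", table, "10.1.1.5", "00-e0-fc-00-00-05"))  # updated
```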
Functions
Gratuitous ARP has the following functions:
• Checks for IP address conflicts. If a device receives a gratuitous ARP message from another device, the
IP addresses of the two devices conflict.
• Notifies MAC address changes. When the MAC address of a host changes after its network adapter is
replaced, the host sends a gratuitous ARP message to notify other devices of the MAC address change
before the ARP entry is aged. This ensures the reliability of network communication. After receiving the
gratuitous ARP message, other devices maintain the corresponding ARP entry in their ARP tables based
on the address information carried in the message.
• Notifies status changes. When a master/backup switchover occurs in a VRRP backup group, the new
master device sends a gratuitous ARP message to notify other devices on the network of its status
change.
Benefits
Gratuitous ARP reveals address conflict on a network so that ARP tables of devices can be quickly updated.
This feature ensures the stability and reliability of network communication.
Principles
MAC-ARP association allows ARP entries on a device to be updated when the corresponding MAC entries are updated, implementing fast network traffic convergence.
• On ring networks, when the primary link fails, user traffic must be switched to the secondary link. This requires ARP entries on the device to be refreshed promptly. If MSTP is applied to the network and an MSTP switchover occurs, the device exchanges ARP messages so that ARP entries age quickly and are relearned. (MSTP is a widely used loop-prevention protocol. For details, see "STP/RSTP/MSTP" in NE40E Feature Description - LAN Access and MAN Access.) When a large number of users access the network, this relearning is slow, and fast traffic convergence at Layer 3 cannot be implemented.
After MAC-ARP association is configured, the associated ARP entries update their outbound interface information when MSTP sends Topology Change Notification (TCN) messages. Therefore, ARP entries are updated promptly, and traffic convergence at Layer 3 speeds up.
• In data center virtualization scenarios, when the location of a virtual machine (VM) changes, user traffic
on the network may be interrupted if the VM cannot send gratuitous ARP messages promptly to update
ARP entries on the gateway. In this case, the device relearns ARP entries by exchanging ARP messages
only after ARP entries on the gateway age.
When the VM location is changed after MAC-ARP association is enabled and a gateway's MAC entries
are updated upon receipt of Layer 2 user traffic, ARP entries and outbound interface information are
updated as follows to accelerate Layer 3 traffic convergence:
■ If ARP entries exist and the outbound interface of MAC entries is inconsistent with that of ARP
entries, ARP entries are updated based on MAC entries, and outbound interface information is
updated.
■ If ARP entries do not exist, a broadcast suppression table is searched based on MAC entries and
ARP probe is re-initiated to update ARP entries and outbound interface information.
Implementation
Figure 1 illustrates how MAC-ARP association is implemented.
In normal situations, the PE records ARP entries of Host A and Host B, and the outbound interface is
Interface 1.
1. After link 1 or link 2 fails, the CE notifies the PE by sending TCN messages to update MAC entries so
that traffic is not interrupted.
2. The PE first updates the MAC entries and then ARP entries, with the outbound interface changed to
Interface 2.
MAC-ARP association can be used to update only dynamic ARP entries and short static ARP entries.
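The two update paths above, refreshing the outbound interface when entries exist and re-probing when they do not, can be sketched as follows. The data structures and function name are illustrative assumptions.

```python
# Sketch of MAC-ARP association: when a MAC entry moves to a new outbound
# interface, matching ARP entries follow it; otherwise an ARP probe is
# re-initiated.

def on_mac_entry_update(mac, new_if, arp_table):
    """arp_table maps ip -> {'mac': ..., 'out_if': ..., 'kind': ...}."""
    actions = []
    for ip, entry in arp_table.items():
        if entry["mac"] != mac:
            continue
        # Only dynamic and short static entries follow MAC entry updates.
        if entry["kind"] not in ("dynamic", "short-static"):
            continue
        if entry["out_if"] != new_if:
            entry["out_if"] = new_if
            actions.append((ip, "updated"))
    if not actions:
        actions.append((None, "re-probe"))   # no matching ARP entry: probe again
    return actions

arp = {"10.1.1.2": {"mac": "00-e0-fc-00-00-02", "out_if": "Interface1",
                    "kind": "dynamic"}}
print(on_mac_entry_update("00-e0-fc-00-00-02", "Interface2", arp))
# [('10.1.1.2', 'updated')]
```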
Usage Scenario
MAC-ARP association is mainly deployed on the gateway and applies to the network that has multiple
alternative links or where users can switch to another gateway interface for access.
Benefits
MAC-ARP association speeds up the update of ARP entries and effectively ensures real-time and stable transmission of user traffic.
Principles
ARP is applicable only to devices on the same physical network. When a device on one physical network needs to send IP datagrams to a device on another physical network, a gateway queries its routing table to implement communication between the two networks. However, routing table queries consume system resources and can affect other services. To resolve this problem, deploy proxy ARP on an intermediate device.
Proxy ARP enables devices that reside on different physical network segments but on the same IP network to
resolve IP addresses to MAC addresses. This feature helps reduce system resource consumption caused by
routing table queries and improves the efficiency of system processing.
Implementation
• Routed proxy ARP
A large company network is usually divided into multiple subnets to facilitate management. The routing
information of a host in a subnet can be modified so that IP datagrams sent from this host to another
subnet are first sent to the gateway and then to another subnet. However, this solution makes it hard
to manage and maintain devices. If the gateways to which hosts are connected have different IP
addresses, you can deploy routed proxy ARP on a gateway so that the gateway sends its own MAC
address to a source host.
Figure 1 illustrates how routed proxy ARP is implemented between Host A and Host B.
1. Host A sends an ARP request message for the MAC address of Host B.
2. After the PE receives the ARP request message, the PE checks the destination IP address of the
message and finds that it is not its own IP address and determines that the requested MAC
address is not its MAC address. The PE then checks whether there are routes to Host B.
• If a route to Host B is available, the PE checks whether routed proxy ARP is enabled on Interface 1.
■ If routed proxy ARP is enabled on the PE, the PE sends the MAC address of its Interface
1 to Host A.
■ If routed proxy ARP is not enabled on the PE, the PE discards the ARP request message
sent by Host A.
• If no route to Host B is available, the PE discards the ARP request message sent by Host A.
3. After Host A learns the MAC address of the PE's Interface 1, Host A sends IP datagrams to the PE
using this MAC address.
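The gateway's decision flow in the steps above can be sketched as follows. The routing-table lookup is mocked, and all names and addresses are illustrative assumptions.

```python
# Decision flow of routed proxy ARP on a gateway, per the steps above.

def routed_proxy_arp(req_dst_ip, my_ips, my_mac, has_route, proxy_enabled):
    if req_dst_ip in my_ips:
        return ("reply", my_mac)        # ordinary ARP reply for our own address
    if not has_route(req_dst_ip):
        return ("discard", None)        # no route to the requested host
    if not proxy_enabled:
        return ("discard", None)        # route exists but proxying is disabled
    return ("proxy-reply", my_mac)      # answer with the gateway's own MAC

has_route = lambda ip: ip.startswith("10.1.2.")   # toy stand-in for a lookup
print(routed_proxy_arp("10.1.2.9", {"10.1.1.1"}, "00-e0-fc-00-00-01",
                       has_route, proxy_enabled=True))
# ('proxy-reply', '00-e0-fc-00-00-01')
```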
1. VM1 sends an ARP request message for the MAC address of VM2.
2. After receiving the ARP request message, the PE checks the destination IP address of the message
and finds that the requested MAC address is not its own MAC address. The PE then checks
whether proxy ARP anyway is enabled on Interface1:
• If proxy ARP anyway is enabled, the PE sends the MAC address of Interface1 to
VM1.
• If proxy ARP anyway is not enabled, the PE discards the ARP request message sent by VM1.
3. After learning the MAC address of Interface1, VM1 sends IP datagrams to the PE based on this
MAC address.
Host A, Host B, and Host C belong to the same VLAN, but Host A and Host C cannot communicate at
Layer 2 because port isolation is enabled on the CE. To allow Host A and Host C to communicate,
configure interface1 on the CE and enable intra-VLAN proxy ARP.
1. Host A sends an ARP request message for the MAC address of Host C.
2. After the CE receives the ARP request message, the CE checks the destination IP address of the
message and finds that it is not its own IP address and determines that the requested MAC
address is not the MAC address of its Interface 1. The CE then searches its ARP table for the ARP
entry indicating the mapping between the IP and MAC addresses of Host C.
• If the CE finds this ARP entry in its ARP table, the CE checks whether intra-VLAN proxy ARP is enabled on interface1.
■ If intra-VLAN proxy ARP is enabled on the CE, the CE sends the MAC address of its
interface1 to Host A.
■ If intra-VLAN proxy ARP is not enabled on the CE, the CE discards the ARP request
message sent by Host A.
• If the CE does not find this ARP entry in its ARP table, the CE discards the ARP request
message sent by Host A and checks whether intra-VLAN proxy ARP is enabled.
■ If intra-VLAN proxy ARP is enabled on the CE, the CE broadcasts the ARP request
message with the IP address of Host C as the destination IP address within VLAN 4.
After the CE receives an ARP reply message from Host C, the CE generates an ARP entry
indicating the mapping between the IP and MAC addresses of Host C.
■ If intra-VLAN proxy ARP is not enabled on the CE, the CE does not perform any
operations.
3. After Host A learns the MAC address of interface1, Host A sends IP datagrams to the CE using
this MAC address.
Host A belongs to VLAN 3, whereas Host B belongs to VLAN 2. Therefore, Host A cannot communicate
with Host B. To allow Host A and Host B to communicate, configure interface1 on the PE and enable
inter-VLAN proxy ARP.
1. Host A sends an ARP request message for the MAC address of Host B.
2. After the PE receives the ARP request message, the PE checks the destination IP address of the
message and finds that it is not its own IP address and determines that the requested MAC
address is not the MAC address of its interface1. The PE then searches its ARP table for the ARP
entry indicating the mapping between the IP and MAC addresses of Host B.
• If the PE finds this ARP entry in its ARP table, the PE checks whether inter-VLAN proxy ARP is enabled on interface1.
■ If inter-VLAN proxy ARP is enabled on the PE, the PE sends the MAC address of its
interface1 to Host A.
■ If inter-VLAN proxy ARP is not enabled on the PE, the PE discards the ARP request message sent by Host A.
• If the PE does not find this ARP entry in its ARP table, the PE discards the ARP request
message sent by Host A and checks whether inter-VLAN proxy ARP is enabled.
■ If inter-VLAN proxy ARP is enabled on the PE, the PE broadcasts the ARP request
message with the IP address of Host B as the destination IP address within VLAN 2.
After the PE receives an ARP reply message from Host B, the PE generates an ARP entry
indicating the mapping between the IP and MAC addresses of Host B.
■ If inter-VLAN proxy ARP is not enabled on the PE, the PE does not perform any
operations.
3. After Host A learns the MAC address of interface1, Host A sends IP datagrams to the PE using this
MAC address.
Host A and Host B belong to the same bridge domain (BD) but cannot communicate at Layer 2 because
port isolation is enabled on the CE. To enable Host A and Host B to communicate, a VBDIF interface
(VBDIF 2) is configured on the CE to implement local proxy ARP.
1. Host A sends an ARP request message for the MAC address of Host B.
2. After the CE receives the ARP request message, the CE checks the destination IP address of the
message and finds that it is not its own IP address and determines that the requested MAC
address is not the MAC address of VBDIF 2. The CE then searches its ARP table for the ARP entry
indicating the mapping between the IP and MAC addresses of Host B.
• If the CE finds this ARP entry in its ARP table, the CE checks whether local proxy ARP is enabled.
■ If local proxy ARP is enabled on the CE, the CE sends the MAC address of VBDIF 2 to
Host A.
■ If local proxy ARP is not enabled on the CE, the CE discards the ARP request message.
• If the CE does not find this ARP entry in its ARP table, the CE discards the ARP request
message and checks whether local proxy ARP is enabled.
■ If local proxy ARP is enabled on the CE, the CE broadcasts an ARP request message to
request Host B's MAC address. After receiving an ARP reply message from Host B, the
CE generates an ARP entry for Host B.
■ If local proxy ARP is not enabled on the CE, the CE does not perform any operations.
3. After Host A learns the MAC address of VBDIF 2, Host A sends IP datagrams to the CE using this
MAC address.
Usage Scenario
Table 1 describes the usage scenarios for proxy ARP.
Routed proxy ARP: Two hosts that need to communicate belong to the same network segment but different physical networks. The gateways to which the hosts are connected have different IP addresses.
Proxy ARP anyway: Two VMs that need to communicate belong to the same network segment but different physical networks. The gateways to which the VMs are connected have the same IP address.
Intra-VLAN proxy ARP: Two hosts that need to communicate belong to the same network segment and the same VLAN in which port isolation is configured.
Inter-VLAN proxy ARP: Two hosts that need to communicate belong to the same network segment but different VLANs.
NOTE:
In VLAN aggregation scenarios, inter-VLAN proxy ARP can be enabled on the VLANIF interface
corresponding to the super-VLAN to implement communication between sub-VLANs.
Local proxy ARP: In an EVC model, two hosts that need to communicate belong to the same network segment and the same BD in which user isolation is configured.
Benefits
Proxy ARP offers the following benefits:
• Proxy ARP enables a host on a network to consider that the destination host is on the same network segment. Hosts therefore do not need to know the physical details of the network; they only need to be aware of their own network segment.
• All processing related to proxy ARP is performed on a gateway, with no configuration needed on the
hosts connecting to it. In addition, proxy ARP affects only the ARP tables on hosts and does not affect
the ARP table and routing table on a gateway.
• Proxy ARP can be used when no default gateway is configured for a host or a host cannot route
messages.
9.2.2.7 ARP-Ping
Principles
ARP-Ping is classified as ARP-Ping IP or ARP-Ping MAC and is used to maintain a network on which Layer 2
features are deployed. ARP-Ping uses ARP messages to detect whether an IP or MAC address to be
configured for a device is in use.
• ARP-Ping IP
Before configuring an IP address for a device, check whether the IP address is being used by another
device. Generally, the ping operation can be used to check whether an IP address is being used.
However, if a firewall is configured for the device using the IP address and the firewall is configured not
to respond to ping messages, the IP address may be mistakenly considered available. To resolve this
problem, use the ARP-Ping IP feature. ARP messages are Layer 2 protocol messages and, in most cases,
can pass through a firewall configured not to respond to ping messages.
• ARP-Ping MAC
The host's MAC address is the fixed address of the network adapter on the host. It does not normally
need to be configured manually; however, there are exceptions. For example, if a device has multiple
interfaces and the manufacturer does not specify MAC addresses for these interfaces, the MAC
addresses must be configured, or a virtual MAC address must be configured for a VRRP group. Before
configuring a MAC address, use the ARP-Ping MAC feature to check whether the MAC address is being
used by another device.
Related Concepts
• ARP-Ping IP
A device obtains the specified IP address and outbound interface number from the configuration
management plane, saves them to the buffer, constructs an ARP request message, and broadcasts the
message on the outbound interface. If the device does not receive an ARP reply message within a
specified period, the device displays a message indicating that the IP address is not being used by
another device. If the device receives an ARP reply message, the device compares the source IP address
in the ARP reply message with the IP address stored in the buffer. If the two IP addresses are the same,
the device displays the source MAC address in the ARP reply message and displays a message indicating
that the IP address is being used by another device.
• ARP-Ping MAC
The ARP-Ping MAC process is similar to the ping process but ARP-Ping MAC is applicable only to directly
connected Ethernet LANs or Layer 2 Ethernet virtual private networks (VPNs). A device obtains the
specified MAC address and outbound interface number (optional) from the configuration management
plane, constructs an Internet Control Message Protocol (ICMP) Echo Request message, and broadcasts
the message on all outbound interfaces. If the device does not receive an ICMP Echo Reply message
within a specified period, the device displays a message indicating that the MAC address is not being
used by another device. If the device receives an ICMP Echo Reply message within a specified period, the
device compares the source MAC address in the message with the MAC address stored on the device. If
the two MAC addresses are the same, the device displays the source IP address in the ICMP Echo Reply
message and displays a message indicating that the MAC address is being used by another device.
Implementation
• ARP-Ping IP implementation
As shown in Figure 1, DeviceA uses ARP-Ping IP to check whether IP address 10.1.1.2 is being used. After
DeviceA receives an ARP reply message from HostA with IP address 10.1.1.2, DeviceA displays the MAC
address of HostA along with a message indicating that the IP address is in use by another host.
The ARP-Ping IP implementation process is as follows:
1. After IP address 10.1.1.2 is specified using the arp-ping ip command on DeviceA, DeviceA
broadcasts an ARP request message and starts a timer for ARP reply messages.
2. After HostA on the same LAN receives the ARP request message, HostA finds that the destination
IP address in the message is the same as its own IP address and sends an ARP reply message to
DeviceA.
3. When DeviceA receives the ARP reply message, it compares the source IP address in the message
with the IP address specified in the command.
• If the two IP addresses are the same, DeviceA displays the source MAC address in the ARP
reply message along with a message indicating that the IP address is being used. In addition,
DeviceA stops the timer for ARP reply messages.
• If the two IP addresses are different, DeviceA discards the ARP reply message and displays a
message indicating that the IP address is not being used by any host.
If DeviceA does not receive any ARP reply messages before the timer for ARP reply messages
expires, it displays a message indicating that the IP address is not being used by any host.
The arp-ping ip command cannot be used to ping the device's own IP address, whereas the ping command supports this.
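The decision logic of ARP-Ping IP described above can be summarized in a short sketch (a simplified model for illustration only, not device code; the function name and data shapes are hypothetical):

```python
def arp_ping_ip(target_ip, replies):
    """Model of ARP-Ping IP: 'replies' is the list of (src_ip, src_mac)
    pairs from ARP reply messages received before the reply timer expires."""
    for src_ip, src_mac in replies:
        if src_ip == target_ip:
            # Matching reply: the address is in use; report the user's MAC.
            return ("in use", src_mac)
        # Replies whose source IP differs from the target are discarded.
    # Timer expired with no matching reply: the address is free.
    return ("not in use", None)

arp_ping_ip("10.1.1.2", [("10.1.1.2", "00e0-fce7-2ef5")])
# -> ("in use", "00e0-fce7-2ef5")
```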
As shown in Figure 2, DeviceA uses ARP-Ping MAC to check whether MAC address 00E0-FCE7-2EF5 is
being used by another host. After receiving ICMP Echo Reply messages from all hosts on the network,
DeviceA displays the IP address of the host with the MAC address 00E0-FCE7-2EF5 and displays a
message indicating that the MAC address is being used by another host.
The ARP-Ping MAC implementation process is as follows:
1. After MAC address 00E0-FCE7-2EF5 is specified using a command on DeviceA, DeviceA broadcasts
an ICMP Echo Request message and starts a timer for ICMP Echo Reply messages.
2. After receiving the ICMP Echo Request message, all the other hosts on the same LAN send ICMP
Echo Reply messages to DeviceA.
3. After DeviceA receives an ICMP Echo Reply message from a host, DeviceA compares the source
MAC address in the message with the MAC address specified in the command.
• If the two MAC addresses are the same, DeviceA displays the source IP address in the ICMP
Echo Reply message along with a message indicating that the MAC address is being used. In
addition, DeviceA stops the timer for ICMP Echo Reply messages.
• If the two MAC addresses are different, DeviceA discards the ICMP Echo Reply message and
displays a message indicating that the MAC address is not being used by any host.
If DeviceA does not receive any ICMP Echo Reply messages before the timer for ICMP Echo Reply
messages expires, it displays a message indicating that the MAC address is not being used by any
host.
Usage Scenario
ARP-Ping applies to directly connected Ethernet LANs or Layer 2 Ethernet VPNs.
Benefits
ARP-Ping checks whether an IP or MAC address to be configured is being used, preventing address conflicts.
Background
Figure 1 shows a typical network topology with a VRRP group deployed. In the topology, Device A is a
master device, and Device B is a backup device. In normal circumstances, Device A forwards uplink and
downlink traffic. If Device A or the link between Device A and the Switch becomes faulty, a master/backup
VRRP switchover is triggered to switch Device B to the Master state. Device B needs to advertise a network
segment route to a device on the network side to direct downlink traffic to Device B. If Device B has not
learned ARP entries from a device on the user side, the downlink traffic is interrupted.
Dual-device ARP hot backup applies in both Virtual Router Redundancy Protocol (VRRP) and enhanced trunk (E-Trunk)
scenarios. This section describes the implementation of dual-device ARP hot backup in VRRP scenarios.
Implementation
After you deploy dual-device ARP hot backup, the new master device forwards the downlink traffic without
learning ARP entries again. Dual-device ARP hot backup ensures downlink traffic continuity.
As shown in Figure 2, a VRRP group is configured on Device A and Device B. Device A is a master device, and
Device B is a backup device. Device A forwards uplink and downlink traffic.
If Device A or the link between Device A and the Switch becomes faulty, a master/backup VRRP switchover is
triggered to switch Device B to the Master state. Device B needs to advertise a network segment route to a
device on the network side to direct downlink traffic to Device B.
• Before you deploy dual-device ARP hot backup, Device B does not learn ARP entries from a device on
the user side and therefore a large number of ARP Miss messages are transmitted. As a result, system
resources are consumed and downlink traffic is interrupted.
• After you deploy dual-device ARP hot backup, Device B backs up ARP information on Device A in real
time. When Device B receives downlink traffic, it forwards the downlink traffic based on the backup ARP
information.
Usage Scenario
Dual-device ARP hot backup applies when VRRP or E-Trunk is deployed to implement a master/backup
device switchover.
To ensure that ARP entries are completely backed up, set the VRRP or E-Trunk switchback delay to a value greater than
the number of ARP entries that need to be backed up divided by the slowest backup speed.
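This sizing rule amounts to a simple division; the entry count and backup speed below are hypothetical figures for illustration only:

```python
def min_switchback_delay(arp_entries, backup_speed_per_second):
    """Minimum VRRP/E-Trunk switchback delay (seconds) so that all ARP
    entries can be backed up before traffic switches back to the master."""
    return arp_entries / backup_speed_per_second

# For example, 120000 entries at a slowest backup speed of 1000 entries/s:
min_switchback_delay(120000, 1000)  # 120.0 -> set the delay to more than 120s
```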
Benefits
Dual-device ARP hot backup prevents downlink traffic from being interrupted because the backup device
does not learn ARP entries of a device on the user side during a master/backup device switchover, which
improves network reliability.
Background
To minimize the impact of device faults on services and improve network availability, a network device must
be able to quickly detect communication faults of devices that are not directly connected. Then, measures
can be taken to quickly rectify the faults to ensure the normal running of services.
Association between ARP and interface status allows the local interface to send ARP probe packets to the
peer interface and to determine, based on whether a reply packet is received, whether the peer interface can
properly forward packets. This triggers fast route convergence when a fault is detected.
Related Concepts
• ARP probe message
An ARP probe message sent from the local device to the peer device is an ARP request packet.
• Working mode
■ Strict mode
In strict mode, an interface sends ARP probe messages when the physical status is Up.
The protocol status of the local interface remains unchanged only when the local interface receives
an ARP reply packet from the peer interface and the source IP address of the ARP reply packet is
the same as the destination IP address of the ARP probe packet. If no ARP reply packet is received
from the peer interface within the allowable attempts, the protocol status of the local interface is
set to Down.
■ Loose mode
In loose mode, an interface sends ARP probe messages only when both the physical status and
protocol status are Up.
The protocol status of the local interface remains unchanged only when the local interface receives
an ARP packet from the peer interface and the source IP address of the ARP packet is the same as
the destination IP address of the ARP probe packet. If no ARP packet is received from the peer
interface within the allowable attempts, the protocol status of the local interface is set to Down.
If association between ARP and interface status is configured on devices at both ends, you are advised to configure
at least the device on one side to work in strict mode. Do not configure the devices at both ends to send ARP probe
messages in loose mode.
Implementation
Figure 1 shows how association between ARP and interface status is implemented.
As shown in Figure 1, association between ARP and interface status is deployed on DeviceA.
When DeviceA works in strict or loose mode (the physical status of the local interface is Up):
• If DeviceA receives ARP reply messages from DeviceB, the protocol status of the interface on
DeviceA remains unchanged.
• If DeviceA does not receive ARP reply messages from DeviceB, the protocol status of the interface
on DeviceA is set to Down.
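The two working modes differ only in when probe messages are sent; a minimal model (illustrative only; the function names are hypothetical):

```python
def sends_probe(mode, physical_up, protocol_up):
    """Whether the interface sends ARP probe messages in each mode."""
    if mode == "strict":
        # Strict mode: probes are sent whenever the physical status is Up.
        return physical_up
    if mode == "loose":
        # Loose mode: both physical and protocol status must be Up.
        return physical_up and protocol_up
    raise ValueError("unknown mode")

def next_protocol_status(valid_reply_received):
    """Protocol status after the allowed number of probe attempts."""
    return "Up" if valid_reply_received else "Down"
```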
Usage Scenario
Association between ARP and interface status is used when a communication fault occurs between network
devices that are not directly connected.
Benefits
If association between ARP and interface status is deployed, fast route convergence is triggered upon a link
fault so that the normal running of services can be ensured.
Networking Description
As shown in Figure 1, to facilitate management, communication isolation is implemented for various
departments on the intranet of a company. For example, although Host A of the president's office, Host B of
the R&D department, and Host C of the financial department belong to the same VLAN, they cannot
communicate at Layer 2. However, the business requires that the president's office communicate with the
financial department. To permit this, enable intra-VLAN proxy ARP on the CE so that Host A can
communicate with Host C.
• Before intra-VLAN proxy ARP is enabled, if Host A sends an ARP request message for the MAC address
of Host C, the message cannot be broadcast to hosts of the R&D department and financial department
because port isolation is configured on the CE. Therefore, Host A can never learn the MAC address of
Host C and cannot communicate with Host C.
• After intra-VLAN proxy ARP is enabled, the CE does not discard the ARP request message sent from
Host A even if the destination IP address in the message is not its own IP address. Instead, the CE sends
the MAC address of its interface 1 to Host A. Host A then sends IP datagrams to this MAC address.
Feature Deployment
Configure interface 1, which is a Layer 3 interface, on the CE, and enable intra-VLAN proxy ARP. After the
deployment, the CE sends the MAC address of its interface 1 to Host A when receiving a request for the MAC
address of Host C from Host A. Host A then sends IP datagrams to the CE, which forwards the IP datagrams
to Host C. Consequently, the communication between Host A and Host C is implemented.
Networking Description
As shown in Figure 1, the intranet of an organization communicates with the Internet through the gateway
PE. To prevent network attackers from obtaining private information by modifying ARP entries on the PE,
deploy static ARP.
• Before static ARP is deployed, the PE dynamically learns and updates ARP entries using ARP messages.
However, dynamic ARP entries can be aged and overwritten by new dynamic ARP entries. Therefore,
network attackers can send fake ARP messages to modify ARP entries on the PE to obtain the private
information of the organization.
• After static ARP is deployed, ARP entries on the PE are manually configured and maintained by a
network administrator. Static ARP entries are neither aged nor overwritten by dynamic ARP entries.
Therefore, deploying static ARP can prevent network attackers from sending fake ARP messages to
modify ARP entries on the PE, and information security is ensured.
Feature Deployment
Deploy static ARP on the PE to set up fixed mapping between IP and MAC addresses of hosts on the
intranet. This can prevent network attackers from sending fake ARP messages to modify ARP entries on the
PE, ensuring the stability and security of network communication and minimizing the risk of private
information being stolen.
Terms
Definition
As the name indicates, an Access Control List (ACL) is a list of matching clauses. These clauses are matching
rules that tell the device whether or not to perform an action on a packet.
Purpose
ACLs are used to ensure reliable data transmission between devices on a network by performing the
following:
• Defend the network against various attacks, such as attacks by using IP, Transmission Control Protocol
(TCP), or Internet Control Message Protocol (ICMP) packets.
• Control network access. For example, ACLs can be used to control enterprise network user access to
external networks, to specify the specific network resources accessible to users, and to define the time
ranges in which users can access networks.
• Limit network traffic and improve network performance. For example, ACLs can be used to limit the
bandwidth for upstream and downstream traffic and to apply charging rules to user requested
bandwidth, therefore achieving efficient utilization of network resources.
Benefits
ACL rules are used to classify packets. After ACL rules are applied to a Router, the Router permits or denies
packets based on them. The use of ACL rules therefore greatly improves network security.
An ACL is a set of rules. It identifies a type of packet but does not filter packets. Other ACL-associated functions are used
to filter identified packets.
ACL Classification
ACL can be classified as ACL4 or ACL6 based on the support for IPv4 or IPv6.
• Basic ACL: Defines rules based on packets' source addresses. Number range: 2000 to 2999.
• Layer 2 ACL: Defines rules based on Layer 2 information, such as the source MAC address, destination MAC address, or protocol type of Ethernet frames. Number range: 4000 to 4999.
• MPLS-based ACL: Defines rules based on MPLS packets' EXP values, labels, or TTL values. Number range: 10000 to 10999.
• Basic ACL6: Defines rules based on packets' source addresses. Number range: 2000 to 2999.
For easy memorization, you can use names instead of numbers to define ACLs, just as domain names
replace IP addresses. ACLs of this type are called named ACLs; the ACLs described above are called
numbered ACLs. The only difference between named and numbered ACLs is that the former are more
recognizable owing to their descriptive names.
When naming an ACL, you can also specify a number for it. If no number is specified, the system allocates
one automatically.
Each name identifies exactly one ACL. Multiple ACLs cannot have the same name, even if they are of
different types.
ACL Increment
An ACL increment is the difference between two adjacent ACL rule numbers that are automatically
allocated. For example, if the ACL increment is set to 5, the rule numbers are multiples of 5, such as 5, 10,
15, and 20.
• If an ACL increment is changed, rules in the ACL are automatically renumbered. For example, if the ACL
increment is changed from 5 to 2, the original rule numbers 5, 10, 15, and 20 will be renumbered as 2,
4, 6, and 8.
• If the default increment 5 is restored for an ACL, the system immediately renumbers the rules in the
ACL based on the default increment. For example, if the increment of ACL 3001 is 2, rules in ACL 3001
are numbered 0, 2, 4, and 6. If the default increment 5 is restored, the rules will be renumbered as 5,
10, 15, and 20.
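The renumbering behavior described above can be sketched as follows (a simplified model; real devices renumber the existing rules in place):

```python
def renumber(rule_count, increment):
    """After an increment change, rule IDs become consecutive multiples
    of the new increment, preserving the rules' relative order."""
    return [increment * (i + 1) for i in range(rule_count)]

renumber(4, 2)  # [2, 4, 6, 8]     e.g. increment changed from 5 to 2
renumber(4, 5)  # [5, 10, 15, 20]  e.g. default increment 5 restored
```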
An ACL increment can be used to maintain ACL rules and makes it convenient to add new ACL rules. If a
user has created four rules numbered 0, 5, 10, and 15 in an ACL, the user can add a rule (for example, rule
number 1) between rules 0 and 5.
• An absolute time range lasts from one date (yyyy-mm-dd) to another. It takes effect only once and
does not repeat.
• A cyclic time range repeats with a one-week cycle. For example, an ACL rule takes effect from 8:00 to
12:00 every Sunday.
What is "Matched"
Matched: the ACL exists and contains a rule to which the packet conforms, regardless of whether the rule is
a permit or deny rule.
Mismatched: the ACL does not exist, the ACL contains no rules, or the packet does not conform to any rule
in the ACL.
Then, the device matches packets against the rules in ascending order of rule IDs. Once a packet matches a
rule, the match operation is complete, and no further rules are checked.
A rule is identified by a rule ID, which is configured by a user or generated by the system according to the ACL
increment. All rules in an ACL are arranged in ascending order of rule IDs.
If rule IDs are automatically allocated, there is a certain space between two rule IDs, determined by the ACL
increment. For example, if the ACL increment is set to 5, the difference between two adjacent rule IDs is 5 (5, 10,
15, and so on). If the ACL increment is 2, the rule IDs generated automatically by the system start from 2. In this
manner, the user can add a rule before the first rule.
In the configuration file, the rules are displayed in ascending order of rule IDs, not in the order in which they were
configured.
Rules can be arranged in two modes: configuration mode and auto mode. The default mode is configuration mode.
• If the Configuration mode is used, users can set rule IDs or allow a device to automatically allocate rule
IDs based on the increment.
If rule IDs are specified when rules are configured, the rules are inserted at places specified by the rule
IDs. For example, three rules with IDs 5, 10, and 15 exist on a device. If a new rule with ID 3 is
configured, the rules are displayed in ascending order, 3, 5, 10, and 15. This is the same as inserting a
rule before ID 5. If users do not set rule IDs, the device automatically allocates rule IDs based on the
increment. For example, if the ACL increment is set to 5, the interval between two adjacent rule IDs
is 5 (5, 10, 15, and so on).
If the ACL increment is set to 2, the device allocates rule IDs starting from 2. The increment allows users
to insert new rules, facilitating rule maintenance. For example, the ACL increment is 5 by default. If a
user does not configure a rule ID, the system automatically generates a rule ID 5 as the first rule. If the
user intends to add a new rule before rule 5, the user only needs to input a rule ID smaller than 5. After
the automatic realignment, the new rule becomes the first rule.
In the configuration mode, the system matches rules in ascending order of rule IDs. As a result, a rule
configured later may be matched earlier.
• If the auto mode is used, the system automatically allocates rule IDs, and places the most precise rule in
the front of the ACL based on the depth-first principle. This can be implemented by comparing the
address wildcard. The smaller the wildcard, the narrower the specified range.
For example, 172.16.1.1 0.0.0.0 specifies the single host 172.16.1.1, and 172.16.1.1 0.0.0.255 specifies the
network segment 172.16.1.0/24 (addresses 172.16.1.0 to 172.16.1.255). The former specifies a narrower
range and is therefore placed before the latter.
The detailed operations are as follows:
■ For basic ACL rules, the source address wildcards are compared. If the source address wildcards are
the same, the system matches packets against the ACL rules based on the configuration order.
■ For advanced ACL rules, the protocol ranges and then the source address wildcards are compared.
If both the protocol ranges and the source wildcards are the same, the destination address
wildcards are then compared. If the destination address wildcards are also the same, the ranges of
source port numbers are compared with the smaller range being allocated a higher precedence. If
the ranges of source port numbers are still the same, the ranges of destination port numbers are
compared with the smaller range being allocated a higher precedence. If the ranges of destination
port numbers are still the same, the system matches packets against ACL rules based on the
configuration order of rules.
For example, a wide range of packets are specified for packet filtering. Later, it is required that packets matching a
specific feature in the range be allowed to pass. If the auto mode is configured in this case, the administrator only
needs to define a specific rule and does not need to re-order the rules because a narrower range is allocated a
higher precedence in the auto mode.
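For basic ACL rules, the depth-first comparison can be modeled by counting the 1 bits in the source address wildcard: a smaller wildcard means a narrower range and a higher precedence. This is a simplified sketch for basic ACLs only (function names are hypothetical):

```python
def wildcard_bits(wildcard):
    """Number of 'don't care' bits in a dotted wildcard mask."""
    return sum(bin(int(octet)).count("1") for octet in wildcard.split("."))

def depth_first_order(rules):
    """Order basic-ACL rules depth-first: narrower source wildcards first.
    Rules with equal wildcards keep their configuration order (stable sort)."""
    return sorted(rules, key=lambda rule: wildcard_bits(rule[1]))

depth_first_order([("172.16.1.1", "0.0.0.255"), ("172.16.1.1", "0.0.0.0")])
# -> [("172.16.1.1", "0.0.0.0"), ("172.16.1.1", "0.0.0.255")]
```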
• Interface-based ACL: Rules in which any is specified are matched last; other rules are matched in the
order they are configured.
• Basic ACL: Rules with VPN instance information are matched before those without VPN instance
information. If multiple rules contain the same VPN instance information, the rule with the smaller
source IP address range (more 1s in the mask) is matched first. If multiple rules contain the same VPN
instance information and the same source IP address range, they are matched in the order they are
configured.
• Advanced ACL: Rules with VPN instance information are matched before those without VPN instance
information. If multiple rules contain the same VPN instance information, the rule that contains the
protocol type is matched first. If multiple rules contain the same VPN instance information and the
same protocol type, the rule with the smaller source IP address range (more 1s in the mask) is matched
first; if the source IP address ranges are also the same, the rule with the smaller destination IP address
range is matched first; then the rule with the smaller Layer 4 port number range (TCP/UDP port
numbers). If all of these are the same, the rules are matched in the order they are configured.
• Layer 2 ACL: Rules with smaller wildcards of Layer 2 protocol types (more 1s in the masks) are matched
first. If the Layer 2 protocol type wildcards are the same, the rule with the smaller source MAC address
range is matched first; then the rule with the smaller destination MAC address range; then the rule with
the smaller VLAN ID of the outer tag.
• User ACL (UCL): The rule that contains the protocol type is matched first. If multiple rules contain the
same VPN instance information and the same protocol type, the rule with the smaller source IP address
range (more 1s in the mask) is matched first; if the source IP address ranges are also the same, the rule
with the smaller destination IP address range is matched first; then the rule with the smaller Layer 4
port number range (TCP/UDP port numbers). If all of these are the same, the rules are matched in the
order they are configured.
• Rule checking stops as soon as a match is found. Therefore, different rule orders may produce different
results even if all the rules in an ACL are the same.
The processing of mismatched packets depends on the module to which the ACL is applied. For details, see
Table 2.
Table 2 Default actions of application modules in the mismatched case
Example
The following commands are configured one after another:
rule deny ip dscp 30 destination 1.1.0.0 0.0.255.255
rule permit ip dscp 30 destination 1.1.1.0 0.0.0.255
If the config mode is used, the rules in the ACL are displayed as follows:
acl 3000
rule 5 deny ip dscp 30 destination 1.1.0.0 0.0.255.255
rule 10 permit ip dscp 30 destination 1.1.1.0 0.0.0.255
If the auto mode is used, the rules in the ACL are displayed as follows:
acl 3000
rule 5 permit ip dscp 30 destination 1.1.1.0 0.0.0.255
rule 10 deny ip dscp 30 destination 1.1.0.0 0.0.255.255
If the device receives a packet with DSCP value 30 and destination IP address 1.1.1.1, the packet is dropped
when the config mode is used, but the packet is allowed to pass when the auto mode is used.
Filtering Principle
When an ACL is applied to Telnet, SNMP, FTP or TFTP:
• If the source IP address of the user matches a permit rule, the user is allowed to log in.
• If the source IP address of the user matches a deny rule, the user is prohibited from logging in.
• If the source IP address of the user does not match any rule in the ACL, the user is prohibited from logging in.
• If the ACL contains no rules or does not exist, all users are allowed to log in.
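The filtering principle above amounts to a first-match lookup with two defaults (a simplified sketch; it ignores the module-specific default behaviors noted in the following notes):

```python
import ipaddress

def login_allowed(acl_rules, src_ip):
    """First-match login filter for Telnet/SNMP/FTP/TFTP access."""
    if not acl_rules:                 # ACL absent or empty: allow everyone
        return True
    ip = ipaddress.ip_address(src_ip)
    for action, network in acl_rules:
        if ip in ipaddress.ip_network(network):
            return action == "permit"
    return False                      # no rule matched: deny

login_allowed([], "10.0.0.1")                            # True
login_allowed([("permit", "10.0.0.0/24")], "10.0.0.1")   # True
login_allowed([("permit", "10.0.0.0/24")], "192.0.2.1")  # False
```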
The default behavior is deny if the source IP address of the user does not match any rule in the ACL applied to FTP.
When an ACL is applied to SNMP, if the device receives a packet whose community name field is null, the device
discards the packet directly without filtering it based on the ACL rules, and a log about the community name error is
generated. ACL filtering is triggered only when the community name is not null.
If the NMS server belongs to a VPN, the VPN instance must be configured in the rule of ACL.
An ACL can be referenced in the VTY user interface accessed using Telnet only after user authentication is successful.
After a TCP connection is set up, to reference the ACL, you need to perform the following operations:
• If the login mode configured in the VTY user interface is SSH login, run the ssh server acl command.
• If the login mode configured in the VTY user interface is Telnet login, run the telnet server acl command.
• Classifier: defines a traffic class. A Classifier can be configured with one or more if-match clauses, or
with none at all. Each if-match clause can reference an ACL, and multiple Classifiers can reference the
same ACL. An ACL can contain one or more rules.
• Behavior: defines the action or actions applied to a traffic classifier. A Behavior can have one or more
actions.
• Traffic-Policy: associates traffic classifiers with behaviors. After the Traffic-Policy configuration is
complete, apply the Traffic-Policy to an interface to make it take effect.
Figure 1 shows relationships between an interface, traffic policy, traffic behavior, traffic classifier, and ACL.
Figure 1 Relationships between an interface, traffic policy, traffic behavior, traffic classifier, and ACL
By default, the precedences of classifiers A, B, and C are 1, 2, and 3, which matches their configuration
order. If you now want to move classifier A to the end, you can run the following command:
classifier A behavior A precedence 4
Precedence 1 is then no longer used, so you can add a classifier (named D) before classifier B with the
following command:
classifier D behavior D precedence 1
#
traffic policy T
classifier D behavior D precedence 1
classifier B behavior B
classifier C behavior C
classifier A behavior A precedence 4
#
You can also add classifier D with the following command, without specifying a precedence:
classifier D behavior D
• AND: Packets that match all the if-match clauses configured in a traffic classifier belong to this traffic
classifier.
• OR: Packets that match any one of the if-match clauses configured in a traffic classifier belong to this
traffic classifier.
As shown in Figure 2, for each classifier, if the logic between if-match clauses is OR, a packet is matched
against the if-match clauses in the order in which they were configured. Once the packet matches an
if-match clause:
• If no ACL is referenced by the matched if-match clause, the related behavior is executed.
• If an ACL is referenced by the matched if-match clause and the packet matches a permit rule, the
related behavior is executed.
• If an ACL is referenced by the matched if-match clause and the packet matches a deny rule, the packet
is discarded directly and the related behavior is not executed.
If the packet does not match any if-match clause, the related behavior is not executed, and the next classifier
is processed for the packet.
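The OR-logic procedure above can be sketched as a small model (illustrative only; predicates stand in for if-match conditions and ACL rules):

```python
def classify_or(clauses, packet):
    """OR logic across if-match clauses, in configuration order.
    A clause is either ('field', predicate) or ('acl', rules), where rules
    is a list of (action, predicate) pairs evaluated first-match."""
    for kind, body in clauses:
        if kind == "field":
            if body(packet):
                return "behavior"      # matched, no ACL: run the behavior
        else:  # 'acl'
            for action, pred in body:
                if pred(packet):
                    # permit: run the behavior; deny: discard, skip behavior
                    return "behavior" if action == "permit" else "discard"
            # ACL mismatch: fall through to the next if-match clause
    return None                        # nothing matched: next classifier
```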
Note: the rules within an ACL are not combined with one another. Therefore, in AND logic, the order of the
if-match clauses does not affect the final matching result, but the order of the rules within the ACL still
affects it.
For example, in the following configuration:
#
acl 3000
rule 5 permit ip source 1.1.1.1 0
rule 10 deny ip source 2.2.2.2 0
#
traffic classifier example operator and
if-match acl 3000
if-match dscp af11
#
The device will combine all if-match clauses. The combination result is the same as the following
configurations.
#
acl 3000
rule 5 permit ip source 1.1.1.1 0 dscp af11
rule 10 deny ip source 2.2.2.2 0 dscp af11
#
traffic classifier example operator or
if-match acl 3000
#
traffic behavior example
remark dscp af22
#
traffic policy example
share-mode
classifier example behavior example
#
interface GigabitEthernet0/1/2
traffic-policy example inbound
#
Then, the device processes the combined if-match clause according to the OR-logic procedure. The result is
as follows: the DSCP of a packet is re-marked as AF22 if the packet is received on GE0/1/2 with DSCP value
10 (AF11) and source IP address 1.1.1.1/32; a packet is discarded if it is received on GE0/1/2 with DSCP
value 10 (AF11) and source IP address 2.2.2.2/32; other packets are forwarded directly because they do not
match any rule.
AND logic permits only one if-match clause that references an ACL, whereas OR logic permits multiple if-match
clauses that reference ACLs.
For traffic behavior sampling, even if a packet matches a rule that defines a deny action, the traffic behavior takes effect
for the packet.
A permit or deny action can be specified in an ACL for a traffic classifier to work with specific traffic
behaviors as follows:
• If the deny action is specified in an ACL, the packet that matches the ACL is denied, regardless of what
the traffic behavior defines.
• If the permit action is specified in an ACL, the traffic behavior applies to the packet that matches the
ACL.
For example, the following configuration leads to this result: the IP precedence of packets with the source
IP address 10.1.1.1/32 is re-marked as 7, packets with the source IP address 10.1.1.2/32 are dropped, and
packets with the source IP address 10.1.2.1/32 are forwarded with the IP precedence unchanged.
acl 3999
rule 5 permit ip source 10.1.1.1 0
rule 10 deny ip source 10.1.1.2 0
traffic classifier acl
if-match acl 3999
traffic behavior test
remark ip-pre 7
traffic policy test
classifier acl behavior test
interface GigabitEthernet0/1/1
traffic-policy test inbound
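The interaction between the ACL's permit/deny actions and the traffic behavior can be sketched in Python. This is an illustrative model only, not device code; the rule and packet structures are assumptions.

```python
# Illustrative model (not device code) of how an ACL's permit/deny action
# interacts with a traffic behavior: deny drops the packet regardless of
# the behavior, permit lets the behavior take effect, and an unmatched
# packet is forwarded unchanged.
def classify_packet(acl_rules, behavior, packet):
    """acl_rules: list of (action, source_ip); behavior: fields to re-mark."""
    for action, source_ip in acl_rules:
        if packet["source_ip"] == source_ip:
            if action == "deny":
                return "dropped"          # deny overrides the traffic behavior
            packet.update(behavior)       # permit: the traffic behavior applies
            return "remarked"
    return "forwarded"                    # no rule matched: forwarded unchanged
```

With rules mirroring ACL 3999 above, a packet from 10.1.1.1 is re-marked, a packet from 10.1.1.2 is dropped, and a packet from 10.1.2.1 is forwarded unchanged.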
apply local-preference 20
#
route-policy a permit node 2
if-match acl 2001
if-match as-path-filter 3
apply cost 1000
#
route-policy a permit node 3
if-match ip-prefix prefix1
• A route-policy can contain multiple nodes. The logic between the nodes is "OR". The device processes the
nodes in ascending order of node number. If a route matches one of the nodes, the route is considered to
match the route-policy, and the remaining nodes are not processed for that route.
• Each node can contain one or more if-match clauses and apply clauses.
The if-match clauses define the matching rules, and the matching objects are route attributes. The logic
between the if-match clauses in the same node is "AND". A route is considered to match the node only if it
matches all the if-match clauses. If the route fails to match any if-match clause of the node, the route
continues to be matched against the next node.
The apply clauses define the actions applied to the routes that match the node.
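The node evaluation order described above ("OR" between nodes, "AND" within a node) can be sketched as follows. This is an illustrative Python model; the node and route structures are assumptions, not device internals.

```python
# Illustrative model of route-policy evaluation: nodes are tried in
# ascending order ("OR" logic); within a node, every if-match clause
# must pass ("AND" logic) before the apply clauses take effect.
def evaluate_route_policy(nodes, route):
    """nodes: iterable of (number, mode, match_fns, apply_fn) tuples."""
    for _, mode, match_fns, apply_fn in sorted(nodes, key=lambda n: n[0]):
        if all(match(route) for match in match_fns):    # "AND" within the node
            if mode == "permit":
                if apply_fn:
                    apply_fn(route)                     # apply clauses executed
                return True                             # later nodes are skipped
            return False                                # deny node: route denied
        # no clause matched: continue with the next node
    return False                                        # no node matched: deny
```

A node with no if-match clauses matches all routes, which reproduces the "empty permit node at the end" pattern used in the examples later in this section.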
• Node is permit, rule is permit:
If the route matches the rule, the route is considered to match the if-match clause, and the device continues
to process the remaining if-match clauses in the same node. If the route matches all the if-match clauses,
the apply clauses are executed, and the device does not match the route against the remaining nodes. If the
route does not match all the if-match clauses, the apply clauses are not executed, and the device continues
to process the remaining nodes for the route. If there is no remaining node, the route is denied.
If the route does not match the rule, the route is considered not to match the if-match clause, and the apply
clauses are not executed. The device continues to process the remaining nodes for the route. If there is no
remaining node, the route is denied.
• Node is permit, rule is deny:
Whether or not the route matches the rule, the node does not take effect, and the device continues to
process the remaining nodes for the route. If there is no remaining node, the route is denied.
• Node is deny, rule is permit:
If the route matches the rule, the route is denied, the apply clauses are not executed, and the device does
not process the remaining nodes for the route.
If the route does not match the rule, the route does not match the if-match clause, and the apply clauses
are not executed. The device continues to process the remaining nodes for the route. If there is no
remaining node, the route is denied.
• Node is deny, rule is deny:
Whether or not the route matches the rule, the node does not take effect, and the device continues to
process the remaining nodes for the route. If there is no remaining node, the route is denied.
• The device continues to process the remaining nodes if the route is denied by the ACL.
• The device continues to process the remaining nodes if the route does not match any rule in the ACL.
• It is recommended that you configure deny rules with smaller numbers to filter out unwanted routes, and
then configure permit rules with larger numbers in the same ACL to receive or advertise the other routes.
• Alternatively, configure permit rules with smaller numbers to permit the routes to be received or
advertised by the device, and then configure deny rules with larger numbers in the same ACL to filter out
unwanted routes.
• The relative ACL does not exist: the route-policy does not support this kind of ACL.
• The relative ACL exists and there are rules in the ACL, but the rule
Example 2
In the following configuration, only static routes matching 10.1.1.0/24 can be imported into BGP, and the
local preference of these imported routes is modified to 1300.
acl number 2000
rule 5 permit source 10.1.1.1 0.0.0.255
#
route-policy policy1 permit node 10
if-match acl 2000
apply local-preference 1300
#
bgp 100
import-route static route-policy policy1
#
Example 3
In the following configuration, routes to 10.1.0.0/24 cannot be advertised to BGP VPNv4 peer 1.1.1.1, no
matter which L3VPN the denied routes belong to. The "vpn-instance vpnb" parameter does not take effect.
acl number 2000
• The static route 10.1.1.0/24 matches the ACL in node 10, and node 10 is permit, so the local preference
of 10.1.1.0/24 is modified to 1300.
• The static route 10.1.2.0/24 does not match node 10 but matches node 20. There is no if-match clause in
node 20, so the attributes of 10.1.2.0/24 are not modified.
The result is that both static routes are imported into BGP, and only the local preference of 10.1.1.0/24 is
modified.
Node Is Permit, Rule Is Deny.
Configuration example:
acl number 2000
rule 1 deny source 10.1.1.0 0.0.0.255
#
route-policy policy1 permit node 10
if-match acl 2000
apply local-preference 1300
#
route-policy policy1 permit node 20
#
bgp 100
import-route static route-policy policy1
#
• 10.1.1.0/24 matches the deny rule in node 10, so 10.1.1.0/24 is denied, the apply clause in node 10 is
not executed for 10.1.1.0/24, and the device continues to process node 20. As a result, 10.1.1.0/24 is
imported to BGP and its local-preference is not changed.
• 10.1.2.0/24 does not match any rule in node 10, so the apply clause in node 10 is not executed, and the
device continues to process node 20 for 10.1.2.0/24. As a result, 10.1.2.0/24 is imported to BGP.
The result is, both the static routes are imported to BGP, and the local-preferences of both routes are not
modified.
Node Is Deny, Rule Is Permit.
Configuration example:
acl number 2000
rule 1 permit source 10.1.1.0 0.0.0.255
#
route-policy policy1 deny node 10
if-match acl 2000
apply local-preference 1300
#
route-policy policy1 permit node 20
#
bgp 100
import-route static route-policy policy1
#
• 10.1.1.0/24 matches the permit rule in node 10, and node 10 is deny, so 10.1.1.0/24 is denied. The
apply clause in node 10 is not executed for 10.1.1.0/24, and the device does not process node 20. As a
result, 10.1.1.0/24 is not imported to BGP and its local-preference is not modified.
• 10.1.2.0/24 does not match node 10, so the apply clause in node 10 is not executed for 10.1.2.0/24, and
the device continues to process node 20 for 10.1.2.0/24. As a result, 10.1.2.0/24 is imported to BGP.
The result is, only 10.1.2.0/24 is imported to BGP and its local-preference is not modified.
Node Is Deny, Rule Is Deny.
Configuration example:
acl number 2000
rule 1 deny source 10.1.1.0 0.0.0.255
#
route-policy policy1 deny node 10
if-match acl 2000
apply local-preference 1300
#
route-policy policy1 permit node 20
#
bgp 100
import-route static route-policy policy1
#
• 10.1.1.0/24 matches the deny rule in node 10, so 10.1.1.0/24 is denied, the apply clause in node 10 is
not executed for 10.1.1.0/24, and the device continues to process node 20. As a result, 10.1.1.0/24 is
imported to BGP and its local-preference is not modified.
• 10.1.2.0/24 does not match node 10, so the apply clause in node 10 is not executed for 10.1.2.0/24, and
the device continues to process node 20 for 10.1.2.0/24. As a result, 10.1.2.0/24 is imported to BGP.
The result is, both the static routes are imported to BGP, and the local-preferences of both routes are not
modified.
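The four node/rule combinations worked through above can be simulated as follows. This is an illustrative Python sketch, not device behavior; `import_route`, the exact-prefix matching, and the rule format are assumptions. The returned pair indicates whether the route is imported and whether its local preference is modified by node 10.

```python
# Illustrative simulation of the four "node mode x ACL rule action" cases.
# acl_rules: list of (action, prefix); nodes: list of (mode, acl_rules)
# in node-number order. A node without rules matches all routes.
def acl_match(acl_rules, prefix):
    """Return the action of the first rule the prefix matches, else None."""
    for action, rule_prefix in acl_rules:
        if prefix == rule_prefix:
            return action
    return None

def import_route(nodes, prefix):
    """Return (imported, local_pref_modified) for a route prefix."""
    for mode, acl_rules in nodes:
        if not acl_rules:                 # empty node: matches every route
            return (mode == "permit", False)
        action = acl_match(acl_rules, prefix)
        if action == "permit":
            if mode == "permit":
                return True, True         # apply local-preference executed
            return False, False           # deny node: route filtered out
        # deny rule matched, or no rule matched: try the next node
    return False, False
```

For instance, with a deny node 10 whose rule permits 10.1.1.0/24 and an empty permit node 20, 10.1.1.0/24 is not imported while 10.1.2.0/24 is imported unchanged, matching the "Node Is Deny, Rule Is Permit" example above.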
If you do not want to advertise the routes to 10.1.1.0/24 and 10.1.2.0/24 on RTB, you can configure the
following commands:
[RTB] acl 2000
[RTB-acl2000] rule 5 deny source 10.1.1.0 0.0.0.255
[RTB-acl2000] rule 10 deny source 10.1.2.0 0.0.0.255
[RTB-acl2000] rule 15 permit source any
[RTB] ospf 100
[RTB-ospf-100] filter-policy acl 2000 export
The filter-policy command affects only the routes advertised to or received from neighbors, not the routes
imported from one routing protocol to another. To import routes learned by other routing protocols, run the
import-route command in the OSPF view.
• If there are rules in the ACL but no rule is matched, the route is not imported or advertised.
• If the ACL does not exist, all routes are imported or advertised.
• If the ACL exists but there is no rule in the ACL, no routes are imported or advertised.
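The matching results above can be summarized in a short sketch. This is illustrative Python; `filter_policy_allows`, the `None`/empty-list encoding of a nonexistent/empty ACL, and the exact-prefix matching are assumptions.

```python
# Illustrative summary of filter-policy matching results:
# acl is None when the ACL does not exist, and [] when it exists but
# contains no rules; otherwise it is a list of (action, prefix) rules.
def filter_policy_allows(acl, prefix):
    if acl is None:
        return True                  # ACL does not exist: all routes pass
    if not acl:
        return False                 # ACL exists with no rules: no routes pass
    for action, rule_prefix in acl:
        if prefix == rule_prefix:
            return action == "permit"
    return False                     # rules exist but none matched: filtered
```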
Example 1
Only the static route 10.1.0.0/24 can be advertised to the BGP peer.
acl number 2000
rule 5 permit source 10.1.0.0 0.0.0.255
#
bgp 100
ipv4-family unicast
filter-policy acl 2000 export
#
Example 2
Routes to 10.1.0.0/24 cannot be advertised to any BGP VPNv4 peer, no matter which L3VPN the denied
routes belong to. The "vpn-instance vpnb" parameter does not take effect.
acl number 2000
rule 5 deny source 10.1.0.0 0.0.0.255 vpn-instance vpnb
rule 10 permit
#
route-policy policy1 permit node 10
if-match acl 2000
#
bgp 100
ipv4-family vpnv4
filter-policy 2000 export
#
• If a multicast route matches the permit rule, the action defined in the multicast policy is executed.
• If a multicast route matches the deny rule, the action defined in the multicast policy is not executed.
• If a multicast route does not match any rule, or the ACL does not exist, or there is no rule in the ACL,
the multicast route is denied in most multicast policies. For detailed information, see Table 1.
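The default actions described above can be sketched as follows. This is an illustrative Python model; the policy names, the `None`/empty encoding of an absent/empty ACL, and the exact-group matching are assumptions based on this section.

```python
# Illustrative model of multicast-policy ACL matching defaults:
# static-rp/c-rp group policies default to permit; the multicast boundary
# defaults to permit only when the ACL is absent or empty; other policies
# default to deny in all no-match cases.
RP_POLICIES = ("static-rp group-policy", "c-rp group-policy")

def multicast_policy_action(policy, acl, group):
    """acl: None (absent), [] (empty), or a list of (action, group) rules."""
    if not acl:                               # ACL absent or has no rules
        return "permit" if policy in RP_POLICIES + ("multicast boundary",) else "deny"
    for action, rule_group in acl:
        if group == rule_group:
            return action                     # a matched rule decides directly
    # rules exist but none matched
    return "permit" if policy in RP_POLICIES else "deny"
```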
static-rp group-policy and c-rp group-policy:
• If no rule in the ACL is matched, the default action is permit (the RP provides services for the multicast
group).
• If the ACL does not exist or there is no rule in the ACL, the default action is permit (the RP provides
services for all the multicast groups in 224.0.0.0/4).
Multicast boundary policy:
• If no rule in the ACL is matched, the default action is deny (the multicast group address is not in the
multicast boundary range).
• If the ACL does not exist or there is no rule in the ACL, the default action is permit (all groups are in the
multicast boundary range).
Other multicast policies:
• If no rule in the ACL is matched, the default action is deny (the action in the policy is not performed).
• A basic ACL can be used to specify the range of source addresses (unicast addresses) or the range of
multicast group addresses for multicast data packets and multicast protocol packets. A basic ACL
applied to a multicast policy supports only the source and time-range parameters.
• An advanced ACL applied to a multicast policy supports only the source, destination, and time-range parameters.
Named ACLs applied to multicast policies must be advanced ACLs. Otherwise, the ACLs do not take effect.
A basic ACL applied to a multicast policy supports only the source and time-range parameters, and does not
support other parameters, such as a destination IP address, VPN instance, and packet length.
An advanced ACL applied to a multicast policy supports only the source, destination, and time-range
parameters, and does not support other parameters, such as a VPN instance and packet length.
If unsupported parameters are specified in an ACL that is applied to a multicast policy, the matching result
for those parameters is permit by default.
Example 1
In the following configuration, multicast FRR is enabled for all multicast entries.
<HUAWEI> system-view
[~HUAWEI] acl name frracl
[*HUAWEI-acl4-advance-frracl] rule permit ip source 10.0.0.1 0 destination 226.0.0.1 0
[*HUAWEI-acl4-advance-frracl] rule permit ip packet-length eq 65535
[*HUAWEI-acl4-advance-frracl] commit
[~HUAWEI-acl4-advance-frracl] quit
[~HUAWEI] multicast routing-enable
[~HUAWEI] pim
[*HUAWEI-pim] rpf-frr policy acl-name frracl
Module Function
TCP/IP attack defense Directly discards TCP/IP attack packets. The TCP/IP
attack defense function is enabled by default.
TCP/IP attack defense supports discarding the
following four kinds of attack packets.
Malformed packets: IP null payload packets, IGMP
null payload packets, LAND attack packets, Smurf
attack packets, and packets with invalid TCP flag
bits.
Invalid fragmented packets: repetitive fragmented
packets, Tear Drop attack packets, syndrop attack
packets, nesta attack packets, fawx attack packets,
bonk attack packets, NewTear attack packets, Rose
attack packets, dead ping attack packets, and Jolt
attack packets.
UDP flood attack packets: UDP packets whose
destination port numbers are 7, 13, and 19.
TCP SYN flood attack packets.
The whitelist, blacklist, and user-defined flows use ACLs to define the characteristics of the flows.
Each CPU defend policy can be configured with one whitelist, one blacklist, and one or more user-defined
flows, as shown in the following figure.
cpu-defend policy 4
whitelist acl 2001
By default, a packet destined for the CPU is matched in the order of whitelist --> blacklist --> user-defined flow. This
order can be modified using commands.
1. Performs the URPF, TCP/IP attack defense, and GTSM checks. Packets that pass the checks proceed to
the next step; packets that fail the checks are discarded.
2. Matches packets against the whitelist. Packets that match a permit rule undergo CAR and proceed to
step 5. Packets that match a deny rule are discarded. Unmatched packets proceed to the next step.
3. Matches packets against the blacklist. Packets that match a permit rule undergo CAR and proceed to
step 5. Packets that match a deny rule are discarded. Unmatched packets proceed to the next step.
4. Matches packets against the user-defined flows. Packets that match a permit rule undergo CAR and
proceed to step 5. Packets that match a deny rule are discarded. Unmatched packets proceed to the
next step.
5. Checks all packets based on application layer association. Only packets belonging to enabled
protocols are sent; packets belonging to disabled protocols are discarded.
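The five processing steps above can be sketched as follows. This is an illustrative Python model; CAR itself is not modeled, the flow/protocol fields are assumptions, and the lists are checked in the default whitelist, blacklist, user-defined order.

```python
# Illustrative model of CPU-defend processing: step 1 checks, then the
# whitelist, blacklist, and user-defined flows in order, then application
# layer association for every packet that survives.
def cpu_defend(packet, whitelist, blacklist, user_defined, enabled_protocols,
               passes_checks=True):
    """Return 'send' or 'discard' for a packet destined for the CPU."""
    if not passes_checks:                    # step 1: URPF/attack defense/GTSM
        return "discard"
    for rules in (whitelist, blacklist, user_defined):   # steps 2-4
        for action, flow_id in rules:
            if packet["flow"] == flow_id:
                if action == "deny":
                    return "discard"
                # permit: CAR is performed (not modeled), then step 5
                return "send" if packet["protocol"] in enabled_protocols else "discard"
    # unmatched by all lists: still subject to application layer association
    return "send" if packet["protocol"] in enabled_protocols else "discard"
```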
Management packets received from non-management interfaces are directly dropped.
• If the packet matches the deny rule, NAT is not performed and the packet is forwarded directly.
• If the relative ACL does not exist, NAT is not performed and all packets are forwarded directly.
• A nonexistent ACL or an ACL without any rules cannot be applied to an IPsec policy.
• An IPsec policy supports only advanced ACLs (both numbered and named).
• Rules in the advanced ACL can match data flows only by source or destination IP address, source or
destination port, and protocol number.
• The ACL applied to an IPsec policy does not support deny rules.
• The ACL cannot contain rules that reference an address set or port set, and neither can the ACL of the
peer end.
• The source and destination port numbers in the ACL applied to an IPsec policy can be specified only by
the eq parameter, not the lt, gt, or range parameters.
• An IPsec policy can reference only one ACL. The original ACL must be removed before a new
ACL is applied.
• ACLs configured in the same IPsec policy group cannot include the same rules.
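The constraints above can be expressed as a simple validation sketch. This is illustrative Python only; `validate_ipsec_acl` and the rule dictionary format are assumptions, and only a subset of the constraints is checked.

```python
# Illustrative validation of an ACL against the IPsec constraints above:
# advanced ACL only, at least one rule, no deny rules, ports only via eq.
def validate_ipsec_acl(acl_type, rules):
    """Return a list of violations; an empty list means the ACL is acceptable."""
    problems = []
    if acl_type != "advanced":
        problems.append("IPsec policies support only advanced ACLs")
    if not rules:
        problems.append("an ACL without rules cannot be applied")
    for rule in rules:
        if rule.get("action") == "deny":
            problems.append("deny rules are not supported")
        if rule.get("port_operator") not in (None, "eq"):
            problems.append("ports must use eq, not lt/gt/range")
    return problems
```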
• If the packet matches the permit rule, the packet is processed by IPsec and then forwarded.
• If the relative ACL does not exist, IPsec does not support this kind of ACL.
BFD passive echo supports only basic ACLs, not advanced ACLs.
If the ACL applied to an established BFD session is modified, or a new ACL is applied to an established BFD session, the
ACL takes effect only after the session is re-established or the session parameters are modified.
• If the session matches a permit rule, passive echo is enabled for the session.
• If the session matches a deny rule, passive echo is not enabled for the session.
• If the session does not match any rule, passive echo is not enabled for the session.
• If the relative ACL does not exist, passive echo is not enabled for any session.
Terms
Term Definition
Interface-based ACL A list of rules for packet filtering based on the inbound interfaces of
packets.
Basic ACL A list of rules for packet filtering based on the source IP addresses of
packets.
Advanced ACL A list of rules for packet filtering based on the source or destination IP
addresses of packets and protocol types. It filters packets based on
protocol information, such as TCP source and destination port numbers
and the ICMP type and code.
Layer 2 ACL A list of rules for packet filtering based on the Ethernet frame header
information, such as source or destination Media Access Control (MAC)
addresses, protocol types of Ethernet frames, or 802.1p priorities.
User ACL A list of rules for packet filtering based on the source/destination IP
address, source/destination service group, source/destination user group,
source/destination port number, and protocol type.
MPLS-based ACL A list of rules for packet filtering based on the EXP values, Label values, or
TTL values of MPLS packets.
Definition
The Dynamic Host Configuration Protocol (DHCP) dynamically assigns IP addresses to hosts and centrally
manages host configurations. DHCP uses the client/server model. A client applies to the server for
configuration parameters, such as an IP address, subnet mask, and default gateway address; the server
replies with the requested configuration parameters.
DHCP and DHCPv6 are available for dynamic address allocation on IPv4 and IPv6 networks, respectively.
Though DHCP and DHCPv6 both use the client/server model, they are built based on different principles and
operate differently.
Purpose
A host can send packets to or receive packets from the Internet after it obtains an IP address, as well as the
router address, subnet mask, and DNS address.
The Bootstrap Protocol (BOOTP) was originally designed for diskless workstations to discover their own IP
addresses, the server address, the name of a file to be loaded into memory, and the gateway IP address.
BOOTP applies to a static scenario in which all hosts are allocated permanent IP addresses.
However, as the increasing network scale and network complexity complicate network configuration, the
proliferation of portable computers and wireless networks brings about host mobility, and the increasing
number of hosts causes IP address exhaustion, BOOTP is no longer applicable. To allow hosts to rapidly go
online or offline, as well as to improve IP address usage and support diskless workstations, an automatic
address allocation mechanism is needed based on the original BOOTP architecture.
DHCP was developed to implement automatic address allocation. DHCP extends BOOTP in the following
aspects:
• Allows a host to exchange messages with a server to obtain all requested configuration parameters.
Benefits
DHCP rapidly and dynamically allocates IP addresses, which improves IP address usage and prevents the
waste of IP addresses.
DHCP Architecture
Figure 1 shows the DHCP architecture.
• DHCP client
A DHCP client exchanges messages with a DHCP server to obtain an IP address and other configuration
parameters. A device interface can function as a DHCP client to dynamically obtain configuration
parameters from a DHCP server. This facilitates configuration and centralized management.
DHCP relay agents are not mandatory in the DHCP architecture. A DHCP relay agent is required only when the
server and client are located on different network segments.
• DHCP server
A DHCP server processes address allocation, lease extension, and address release requests originating
from a DHCP client or forwarded by a DHCP relay agent and assigns IP addresses and other
configuration parameters to the client.
To protect a DHCP server against network attacks, such as man-in-the-middle attacks, starvation attacks, and DoS
attacks by changing the CHADDR value, configure DHCP snooping on the intermediate device directly connecting
to a DHCP client to provide DHCP security services.
• DHCP Options
op 1 byte Message operation code that specifies the message type. The options are as
follows:
1: DHCP Request message
2: DHCP Reply message
The specific message type is carried in the options field.
htype 1 byte Hardware address type. For Ethernet, the value of this field is 1.
hlen 1 byte Hardware address length. For Ethernet, the value of this field is 6.
hops 1 byte Number of DHCP relay agents that have relayed this message. This field is set
to 0 by a DHCP client. The value increases by 1 each time a DHCP message
passes through a relay agent.
NOTE:
A maximum of 16 DHCP relay agents are allowed between a server and a client. If
this number is exceeded, DHCP messages are discarded.
xid 4 bytes Transaction ID for this message exchange. A DHCP client generates a random
number, which the client and server use to identify their message exchange.
secs 2 bytes Number of seconds elapsed since a DHCP client began to request an IP
address.
flags 2 bytes The leftmost bit determines whether the DHCP server unicasts or broadcasts
a DHCP Reply message. All remaining bits in this field are set to 0. The
options are as follows:
0: The DHCP server unicasts a DHCP Reply message.
1: The DHCP server broadcasts a DHCP Reply message.
ciaddr 4 bytes Client IP address. The IP address can be an existing IP address of a DHCP
client or an IP address assigned by a DHCP server to a DHCP client. During
initialization, the client has no IP address, and the value of this field is 0.0.0.0.
NOTE:
The IP address 0.0.0.0 is an invalid address that is used only for temporary
communication during system startup in DHCP mode.
yiaddr 4 bytes Client IP address assigned by the DHCP server. The DHCP server fills this field
into a DHCP Reply message.
siaddr 4 bytes Server IP address from which a DHCP client obtains the startup configuration
file.
giaddr 4 bytes Gateway IP address, which is the IP address of the first DHCP relay agent. If
the DHCP server and client are located on different network segments, the
first DHCP relay agent fills its own IP address into this field of the DHCP
Request message sent by the client. The relay agent forwards the message to
the DHCP server, which uses this field to determine the network segment
where the client resides. The DHCP server then assigns an IP address on this
network segment from an address pool.
The DHCP server also returns a DHCP Reply message to the first DHCP relay
agent. The DHCP relay agent then forwards the DHCP Reply message to the
client.
NOTE:
If the DHCP Request message passes through multiple DHCP Relay agents before
reaching the DHCP server, the value of this field remains as the IP address of the
first DHCP relay agent. However, the value of the Hops field increases by 1 each
time a DHCP Request message passes through a DHCP relay agent.
chaddr 16 bytes Client hardware address. This field must be consistent with the hardware type
and hardware length fields. When sending a DHCP Request message, the
client fills its hardware address into this field. For Ethernet, a 6-byte Ethernet
MAC address must be filled in this field when the hardware type and
hardware length fields are set to 1 and 6, respectively.
sname 64 bytes Server host name. This field is optional and contains the name of the server
from which a client obtains configuration parameters. The field is filled in by
the DHCP server and must contain a character string that ends with 0.
file 128 bytes Boot file name specified by the DHCP server for a DHCP client. This field is
optional and is delivered to the client when the IP address is assigned to the
client. The field is filled in by the DHCP server and must contain a character
string that ends with 0.
options Variable Optional parameters field. The length of this field must be at least 312 bytes.
This field contains the DHCP message type and configuration parameters
assigned by a server to a client, including the gateway IP address, DNS server
IP address, and IP address lease.
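The fixed-length fields described above can be parsed as follows. This is an illustrative Python sketch assuming the standard network-order layout from RFC 2131; the field and helper names follow the table but are not device code.

```python
import struct

# 236-byte fixed DHCP header: op, htype, hlen, hops, xid, secs, flags,
# ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file (network byte order).
DHCP_FIXED = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")

def parse_dhcp_fixed(data: bytes) -> dict:
    """Unpack the fixed portion of a DHCP message into a field dictionary."""
    (op, htype, hlen, hops, xid, secs, flags,
     ciaddr, yiaddr, siaddr, giaddr,
     chaddr, sname, file) = DHCP_FIXED.unpack(data[:DHCP_FIXED.size])
    return {
        "op": op,            # 1: DHCP Request message, 2: DHCP Reply message
        "htype": htype,      # 1 for Ethernet
        "hlen": hlen,        # 6 for an Ethernet MAC address
        "hops": hops,        # incremented by each DHCP relay agent
        "xid": xid,
        "secs": secs,
        "broadcast": bool(flags & 0x8000),  # leftmost bit of the flags field
        "ciaddr": ciaddr, "yiaddr": yiaddr,
        "siaddr": siaddr, "giaddr": giaddr,
        "chaddr": chaddr[:hlen],            # hardware address, hlen bytes used
    }
```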
DHCP Options
In the DHCP options field, the first four bytes are decimal numbers 99, 130, 83 and 99, respectively. This is
the same as the magic cookie defined in standard protocols. The remaining bytes identify several options as
defined in standard protocols. One particular option, the DHCP Message Type option (Option 53), must be
included in every DHCP message. Option 53 defines DHCP message types, including the DHCPDISCOVER,
DHCPOFFER, DHCPREQUEST, DHCPACK, DHCPNAK, DHCPDECLINE, DHCPRELEASE, and DHCPINFORM
messages.
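The magic cookie check and the Type/Length/Value walk described here can be sketched as follows. This is illustrative Python; the helper names are assumptions, and only pad (0) and end (255) options receive special handling, per the standard encoding.

```python
# Illustrative parser for the DHCP options field: verify the 4-byte magic
# cookie (decimal 99, 130, 83, 99), then walk Type/Length/Value entries
# until the End option (255).
MAGIC_COOKIE = bytes([99, 130, 83, 99])
OPTION_PAD, OPTION_END, OPTION_MESSAGE_TYPE = 0, 255, 53

def parse_dhcp_options(data: bytes) -> dict:
    """Return {option_type: value_bytes} from a DHCP options field."""
    if data[:4] != MAGIC_COOKIE:
        raise ValueError("missing DHCP magic cookie")
    options, i = {}, 4
    while i < len(data):
        opt_type = data[i]
        if opt_type == OPTION_PAD:       # single-byte padding, no length
            i += 1
            continue
        if opt_type == OPTION_END:       # end of options
            break
        length = data[i + 1]
        options[opt_type] = data[i + 2:i + 2 + length]
        i += 2 + length
    return options
```

For example, an options field containing Option 53 with value 1 identifies a DHCPDISCOVER message.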
Type Description
DHCP DISCOVER A DHCP Discover message is broadcast by a DHCP client to locate a DHCP server
when the client attempts to access a network for the first time.
DHCP OFFER A DHCP Offer message is sent by a DHCP server in response to a DHCP Discover
message. A DHCP Offer message carries various configuration parameters.
DHCP REQUEST A DHCP Request message is broadcast by a DHCP client to respond to a DHCP
Offer message from a DHCP server, to extend an IP address lease, or to re-obtain
configuration parameters.
Type Description
DHCP ACK A DHCP ACK message is sent by a DHCP server to acknowledge the DHCP Request
message from a DHCP client. After receiving a DHCP ACK message, the DHCP
client obtains the configuration parameters including the IP address.
DHCP NAK A DHCP NAK message is sent by a DHCP server to reject the DHCP Request
message from a DHCP client. For example, if a DHCP server cannot find matching
lease records after receiving a DHCP Request message, it sends a DHCP NAK
message indicating that no IP address is available for the DHCP client.
DHCP DECLINE A DHCP Decline message is sent by a DHCP client to notify the DHCP server that
the assigned IP address conflicts with another IP address. Then the DHCP client
applies to the DHCP server for another IP address.
DHCP RELEASE A DHCP Release message is sent by a DHCP client to release its IP address. After
receiving a DHCP Release message, the DHCP server can assign this IP address to
another DHCP client.
DHCP INFORM A DHCP Inform message is sent by a DHCP client to obtain other network
configuration parameters such as the gateway address and DNS server address
after the DHCP client has obtained an IP address.
• DHCP options
The options field in a DHCP message carries control information and parameters that are not defined in
common protocols. When a DHCP client requests an IP address from a DHCP server that has been
configured to encapsulate the options field, the server returns a DHCP Reply packet containing the
options field. Figure 2 shows the options field format.
The options field consists of the sub-fields Type, Length, and Value. Table 3 describes these sub-fields.
The type value of the options field ranges from 1 to 255. Table 4 lists common DHCP options.
Options ID Description
1 Subnet mask
3 Gateway address
6 DNS address
15 Domain name
44 NetBIOS name
50 Requested IP address
51 IP address lease
52 Additional option
54 Server identifier
■ Option 43
Option 43 is called the vendor-specific information option. Figure 3 shows the Option 43 format.
DHCP servers and DHCP clients use Option 43 to exchange vendor-specific information. When a
DHCP server receives a DHCP Request message with parameter 43 encapsulated in Option 55, the
server encapsulates Option 43 in a DHCP Reply message and sends it to the DHCP client.
To implement extensibility and allocate more configuration parameters to DHCP clients, Option 43
supports sub-options, which are shown in Figure 3. Sub-options follow a similar format to that
used for Options. They contain a Type, Length, and Value sub-field. In the Type sub-field, the value
0x01 indicates the Auto-configuration server (ACS) parameter, the value 0x02 indicates the SP ID,
and the value 0x80 indicates the Preboot execution environment (PXE) server address.
If a device functions as a DHCP client, it can obtain the following information using Option 43:
■ ACS parameters, including the uniform resource locator (URL), user name, and password
■ SP ID that the Customer Premises Equipment (CPE) notifies the ACS of so that the ACS can
select configuration parameters from the specified SP
■ PXE server address, which is used by a DHCP client to obtain the Bootfile or control
information from the PXE server
■ Option 82
The Option 82 field is called the DHCP relay agent information field. It records the location of a
DHCP client. A DHCP relay agent or a DHCP snooping-enabled device appends the Option 82 field
to a DHCP Request message sent from a DHCP client and forwards the message to a DHCP server.
Servers use the Option 82 field to learn the location of DHCP clients, implement client security and
accounting, and make parameter assignment policies, allowing for more flexible address allocation.
The Option 82 field contains a maximum of 255 sub-options. If the Option 82 field is defined, at
least one sub-option must be defined.
The content of the Option 82 field is not uniformly defined, and vendors fill in the Option 82 field
as needed.
Related Concepts
The Dynamic Host Configuration Protocol (DHCP) dynamically assigns IP addresses to hosts and centrally
manages host configurations. DHCP uses the client/server model. A client applies to the server for
configuration parameters, such as an IP address, subnet mask, and default gateway address; the server
replies with the requested configuration parameters.
Usage Scenarios
With the DHCP client function configured, a device uses DHCP to dynamically request an IP address from the
DHCP server. This achieves appropriate assignment and centralized management of IP addresses.
Implementation
To obtain a valid dynamic IP address, a DHCP client exchanges different information with the DHCP server
at different stages. Generally, the DHCP client and server interact in the following modes:
As shown in Figure 1, the DHCP client establishes a connection with the DHCP server through the
following four stages:
1. Discovery stage: The DHCP client searches for a DHCP server. The DHCP client broadcasts a
DHCPDISCOVER message and only DHCP servers respond to the message.
2. Offer stage: Each DHCP server offers an IP address to the DHCP client. After receiving the
DHCPDISCOVER message from the DHCP client, each DHCP server selects an unassigned IP
address from the IP address pool, and sends a DHCPOFFER message with the leased IP address
and other configurations to the DHCP client.
3. Request stage: The DHCP client selects an IP address. If multiple DHCP servers send DHCPOFFER
messages to the DHCP client, the DHCP client accepts the first DHCPOFFER message it receives,
and broadcasts to each DHCP server a DHCPREQUEST message carrying information about the
selected IP address.
4. Acknowledgement stage: indicates the stage at which the DHCP server acknowledges the IP
address that is offered. When the selected DHCP server receives the DHCP Request message, it
searches for a related lease record based on the MAC address or Option 61 field in the received
message.
• If the related lease record exists, the DHCP server sends the DHCP client a DHCP ACK
message containing the DHCP client's IP address. After receiving the DHCP ACK message, the
DHCP client broadcasts a gratuitous ARP message to check whether any host is using the IP
address assigned by the DHCP server. If the DHCP client does not receive a response within a
specified period, it uses the IP address.
• If the related lease record does not exist or the DHCP server fails to properly assign IP
addresses, the DHCP server sends a DHCP NAK message to inform the DHCP client that it
cannot assign a proper IP address. In this case, the DHCP client has to send another DHCP
Discover message for a new application.
Figure 2 shows how a DHCP client establishes a connection with the DHCP server to update the IP
address lease.
1. When the IP address lease reaches 50% (T1), the DHCP client automatically sends a DHCP
Request message in unicast mode to the DHCP server to renew the IP address lease.
• If a DHCP ACK message is received, the IP address lease is successfully renewed.
• If a DHCP NAK message is received, the DHCP client re-initiates the renewal procedure.
2. When the IP address lease reaches 87.5% (T2), if the DHCP client has not received a DHCP ACK
message yet, it broadcasts a DHCP Request message to DHCP servers to renew its IP address
lease.
• If a DHCP ACK message is received, the DHCP client returns to the binding state and the lease is renewed.
• If a DHCP NAK message is received, the DHCP client re-initiates the renewal procedure.
3. If the DHCP client receives no response before the IP address lease expires, the DHCP client stops
using the current IP address and sends a DHCP Discover message to request a new IP address.
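The 50% (T1) and 87.5% (T2) thresholds above can be computed directly from the lease duration; a quick sketch:

```python
def renewal_timers(lease_seconds):
    """Return (T1, T2) for a DHCP lease: renewal at 50%, rebinding at 87.5%."""
    t1 = lease_seconds // 2          # 50% of the lease
    t2 = lease_seconds * 7 // 8      # 87.5% of the lease
    return t1, t2

# A one-day lease (86400 s) yields T1 = 43200 s and T2 = 75600 s.
print(renewal_timers(86400))   # (43200, 75600)
```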
When a DHCP client accesses a network for the first time, the DHCP client goes through the following
stages to set up a connection with a DHCP server:
■ Discovery stage: The DHCP client searches for a DHCP server. The DHCP client broadcasts a
DHCPDISCOVER message and only DHCP servers respond to the message.
■ Offer stage: Each DHCP server offers an IP address to the DHCP client. After receiving the
DHCPDISCOVER message from the DHCP client, each DHCP server selects an unassigned IP address
from the IP address pool, and sends a DHCPOFFER message with the leased IP address and other
configurations to the DHCP client.
■ Request stage: The DHCP client selects an IP address. If multiple DHCP servers send DHCPOFFER
messages to the DHCP client, the DHCP client accepts the first DHCPOFFER message it receives,
and broadcasts to each DHCP server a DHCPREQUEST message carrying information about the
selected IP address.
■ Acknowledgment stage: The DHCP server acknowledges the IP address that is offered. After
receiving the DHCPREQUEST message, the DHCP server sends a DHCPACK message to the client.
The DHCPACK message contains the offered IP address and other settings. The DHCP client then
binds its TCP/IP protocol suite to the network interface card.
The IP addresses offered by the DHCP servers that the client did not select remain unassigned and are
available to other clients.
■ If the client has previously accessed the network correctly, it does not broadcast a DHCPDISCOVER
message. Instead, it broadcasts a DHCPREQUEST message that carries the previously assigned IP
address.
■ After receiving the DHCPREQUEST message, the DHCP server responds with a DHCPACK message if
the requested IP address is not assigned, notifying the client that it can continue to use the original
IP address.
■ If the IP address cannot be assigned to the DHCP client (for example, it has been assigned to
another client), the DHCP server responds with a DHCPNAK message to the client. After receiving
the DHCPNAK message, the client sends a DHCPDISCOVER message to apply for a new IP address.
■ When the DHCP client attempts to renew its lease by unicasting a DHCPREQUEST message to the
DHCP server according to the DHCP lease renewal process: If the DHCP server replies with a
DHCPACK message, the lease is successfully renewed. If the DHCP server replies with a DHCPNAK
message, the DHCP client needs to re-initiate a request.
■ If the DHCP server does not receive any response from the DHCP client for a period of time: when
address recycling is configured, the server reclaims the corresponding IP address; otherwise, the
server reclaims the IP address only when the lease expires.
• Manual address assignment: An administrator binds fixed IP addresses to specific clients, such as the
WWW server, and uses DHCP to assign these IP addresses to the clients.
• Dynamic address assignment: DHCP assigns IP addresses with a validity period to clients. After the
validity period expires, the clients must re-apply for addresses. This address assignment mode is widely
adopted.
• IP address that is in the database of the DHCP server and is statically bound to the MAC address of the
client
• IP address that has previously been assigned to the client, that is, the IP address carried in the
Requested IP Address option of the DHCPDISCOVER message sent by the client
• IP address that is first found when the DHCP server searches the DHCP address pool for available IP
addresses
• If the DHCP address pool has no available IP address, the DHCP server searches the expired IP addresses
and conflicting IP addresses, and then assigns a valid IP address to the client. If all the IP addresses are
in use, an error message is reported.
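The selection order above can be sketched as a small function. The dict-based data model and the function name are illustrative only, not the device's actual implementation:

```python
def select_address(server, mac, requested=None):
    """Sketch of the DHCP server's address-selection order described above:
    1. address statically bound to the client's MAC address
    2. the address the client requested, if still available
    3. the first free address found in the pool
    4. an expired or conflicting address reclaimed for reuse
    """
    if mac in server["static_bindings"]:
        return server["static_bindings"][mac]
    if requested and requested in server["free"]:
        return requested
    if server["free"]:
        return server["free"][0]
    if server["expired"]:
        return server["expired"][0]
    if server["conflicting"]:
        return server["conflicting"][0]
    raise RuntimeError("address pool exhausted")   # error reported

server = {"static_bindings": {"00:aa": "10.0.0.5"},
          "free": ["10.0.0.20", "10.0.0.21"],
          "expired": ["10.0.0.9"],
          "conflicting": []}
print(select_address(server, "00:aa"))               # 10.0.0.5 (static binding wins)
print(select_address(server, "00:bb", "10.0.0.21"))  # 10.0.0.21 (requested address)
print(select_address(server, "00:cc"))               # 10.0.0.20 (first free address)
```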
is not in use, and the DHCP server assigns the IP address to a client. (This method is implemented based on
standard protocols.)
IP Address Reservation
DHCP supports IP address reservation for clients. The reserved IP addresses must belong to the address pool.
If an address in the address pool is reserved, it is no longer assignable. Addresses are usually reserved for
specific clients, such as DNS and WWW servers.
• If the flags field value is set to 1, the DHCP relay agent broadcasts DHCP reply messages to the DHCP
client.
Figure 1 DHCP client requesting an IP address through a DHCP relay agent for the first time (the flags field
value is set to 1)
• If the flags field value is set to 0, the DHCP relay agent unicasts DHCP reply messages to the DHCP
client.
Figure 2 DHCP client requesting an IP address through a DHCP relay agent for the first time (the flags field
value is set to 0)
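On the wire, the "flags field value" above corresponds to the broadcast (leftmost) bit of the 16-bit flags field in the DHCP message header. A minimal sketch of the relay agent's delivery decision (the helper name is illustrative):

```python
BROADCAST_FLAG = 0x8000   # leftmost bit of the 16-bit DHCP flags field

def reply_mode(flags):
    """Decide how a relay agent delivers a DHCP reply to the client:
    broadcast if the client set the broadcast bit, unicast otherwise."""
    return "broadcast" if flags & BROADCAST_FLAG else "unicast"

print(reply_mode(0x8000))  # broadcast
print(reply_mode(0x0000))  # unicast
```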
1. When a DHCP client starts and initializes DHCP, it broadcasts a configuration request packet
(DHCPDISCOVER message) onto a local network. After a DHCP relay agent connecting to the local
network receives the broadcast packet, it processes and forwards the packet to the specified DHCP
server on another network.
2. After receiving the packet, the DHCP server sends the requested configuration parameters in a
DHCPOFFER message to the DHCP client through the DHCP relay agent.
3. The DHCP client replies to the DHCPOFFER message by broadcasting a DHCPREQUEST message.
Upon receipt, the DHCP relay agent sends the DHCPREQUEST message in unicast mode to the DHCP
server.
4. The DHCP server responds with a unicast DHCPACK or DHCPNAK message through the DHCP relay
agent.
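Step 1 above relies on the relay agent recording its own interface address in the giaddr field before unicasting the message to the server, which is standard DHCP relay behavior. A minimal sketch (the dict-based message and helper name are illustrative):

```python
# Sketch of DHCP relay forwarding: the relay agent receives a broadcast
# DHCPDISCOVER, fills in giaddr if it is still zero (only the first relay
# on the path does this), and forwards the message as unicast to the server.

def relay_to_server(msg, relay_ip, server_ip):
    out = dict(msg)
    if out.get("giaddr", "0.0.0.0") == "0.0.0.0":
        out["giaddr"] = relay_ip                 # first relay fills giaddr
    return {"dst": server_ip, "payload": out}    # unicast, not broadcast

discover = {"type": "DHCPDISCOVER", "giaddr": "0.0.0.0"}
fwd = relay_to_server(discover, "10.1.1.1", "10.9.9.9")
print(fwd["payload"]["giaddr"], fwd["dst"])      # 10.1.1.1 10.9.9.9
```

The server later uses giaddr to pick the address pool for the client's network segment and to unicast its reply back through the relay agent.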
DHCP Client Extending the IP Address Lease Through the DHCP Relay
Agent
An IP address dynamically assigned to a DHCP client usually has a validity period. The DHCP server
withdraws the IP address after the validity period expires. To continue using the IP address, the DHCP client
must renew the IP address lease.
The DHCP client enters the binding state after obtaining an IP address. The DHCP client has three timers to
control lease renewal, rebinding, and lease expiration. When assigning an IP address to the DHCP client, the
DHCP server can specify timer values. If the DHCP server does not specify timer values, the default values
are used. Table 1 describes the three timers.
Table 1 Timers

Lease renewal (triggered at 50% of the lease): When the lease renewal timer expires, the DHCP client
unicasts a DHCPREQUEST message to the DHCP server to renew the IP address lease.

Rebinding (triggered at 87.5% of the lease): After the DHCP client sends a DHCPREQUEST message to
extend the lease, it remains in the update state and waits for a response. If the DHCP client does not
receive any response from the server before the rebinding timer expires, it considers the original DHCP
server unavailable and broadcasts a DHCPREQUEST message. Any DHCP server on the network shown
in Figure 4 can reply to this request with a DHCPACK or DHCPNAK message. If the DHCP client receives
a DHCPACK message, it returns to the binding state and resets the lease renewal and rebinding timers,
as shown in Figure 3. If the DHCP client receives a DHCPNAK message, it immediately stops using the
current IP address and returns to the initializing state to apply for a new IP address.

Lease expiration (triggered at 100% of the lease): When the lease expires, the DHCP client stops using
the current IP address and returns to the initializing state to apply for a new IP address.
Figure 3 DHCP client extending the IP address lease by 50% through the DHCP relay agent
Figure 4 DHCP client extending the IP address lease by 87.5% through the DHCP relay agent
DHCP Relay Agent Setting the Priority of a DHCP Reply Message and
TTL Value of a DHCP Relay Message
• A DHCP relay agent can set the priority of DHCP reply messages. The priority of low-priority DHCP reply
messages can be raised so that they will not be discarded on access devices.
• A DHCP relay agent can set the TTL value of DHCP relay messages. The TTL value of DHCP relay
messages can be increased to prevent the messages from being discarded due to TTL becoming 0.
Principles
To implement PnP, a device must function as a DHCP client and obtain an IP address by exchanging DHCP
messages shown in Figure 1. The NMS can then use Telnet to log in to and configure the device.
1. A DHCP client is powered on and automatically starts the PnP process. The DHCP client broadcasts a
DHCP Discover message carrying Option 60 to apply for an IP address. The Option 60 field carries the
device identifier of the DHCP client.
2. After receiving the DHCP Discover message, the DHCP relay agent adds Option 82 to the message and
transmits the message in unicast mode to the NMS (DHCP server).
3. Based on the Option 60 and Option 82 fields in the message, the DHCP server searches the database
for a fixed IP address and sends a DHCP Offer message carrying the IP address to the DHCP relay
agent.
4. After receiving the DHCP Offer message, the DHCP relay agent forwards the message to the DHCP
client.
5. After receiving the DHCP Offer message, the DHCP client broadcasts a DHCP Request message.
6. After receiving the DHCP Request message, the DHCP relay agent adds Option 82 to the message and
transmits the message in unicast mode to the NMS.
7. The NMS confirms the IP address assigned to the DHCP client based on the data in the message and
sends a DHCP ACK message carrying the IP address to the DHCP relay agent.
8. After receiving the DHCP ACK message, the DHCP relay agent forwards the message to the DHCP
client.
9. After receiving the DHCP ACK message, the DHCP client sends gratuitous ARP messages to check
whether the IP address assigned to it is in use. If the IP address is available, the DHCP client obtains
the IP address, mask, and gateway address from the DHCP ACK message and generates a route based
on the information. Then the DHCP client automatically generates an IP address command
configuration in the configuration file. After these operations are complete, the DHCP client disables
the DHCP client function and stops sending or processing DHCP messages.
10. The NMS logs in to and configures the device. After the configuration takes effect, the device is
ready for use.
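Steps 2 and 6 above have the relay agent append Option 82 (Relay Agent Information) before unicasting the message to the server. A minimal byte-level sketch, assuming the common Circuit ID sub-option (sub-option 1); the circuit-id value is illustrative:

```python
# Sketch of a relay agent appending Option 82 to a DHCP message's options.
# Sub-option 1 (Circuit ID) identifies the interface/VLAN the request came in on.

def add_option82(packet_options, circuit_id):
    sub = bytes([1, len(circuit_id)]) + circuit_id        # sub-option 1: Circuit ID
    return packet_options + bytes([82, len(sub)]) + sub   # option 82 wrapper

opts = add_option82(b"", b"ge0/1/0:vlan100")
print(opts[0], opts[2], opts[4:].decode())   # 82 1 ge0/1/0:vlan100
```

The server reads Option 82 (together with Option 60) to locate the fixed IP address bound to that access position, which is what makes PnP address assignment deterministic.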
DHCP PnP reduces operation and maintenance (O&M) costs and improves O&M efficiency.
• A DHCP PnP-enabled device learns VLAN IDs automatically. This may affect other user configurations. If DHCP PnP
is not required, disable PnP on the DHCP client.
• After DHCP PnP is performed, the PnP default route is no longer required. Delete the default route on the DHCP
client to free up space in the routing table.
Service Overview
A DHCP server is used to assign IP addresses in the following scenarios:
• Manual configurations take a long time and bring difficulties to centralized management on a large
network.
• Hosts on the network outnumber the available IP addresses. Therefore, not every host can have a fixed
IP address assigned. For example, if service providers (SPs) limit the number of concurrent network
access users, many hosts must dynamically obtain IP addresses from the DHCP server.
Networking Description
On a typical DHCP network, a DHCP server and multiple DHCP clients exist, such as PCs and portable
computers. DHCP uses the client/server model. A client applies to the server for configuration parameters,
such as an IP address, subnet mask, and default gateway address; the server replies with the requested
configuration parameters. Figure 1 shows typical DHCP networking.
If a DHCP client and a DHCP server reside on different network segments, the client can obtain an IP address and other
configuration parameters from the server through a DHCP relay agent. For details about DHCP relay, see DHCP Relay.
Networking Description
DHCP server dual-device hot backup effectively implements rapid service switching by keeping user session
information synchronized on the master and backup devices in real time on the control and forwarding
planes. The user session information (including the IP address, MAC address, DHCP lease, and Option 82)
generated during user access from the master device is synchronized to the backup device. When VRRP
detects a link failure on the master device, a VRRP packet is sent to adjust the priority, triggering a
master/backup VRRP switchover. After the master/backup VRRP switchover is performed, the original backup
device takes over to assign addresses for new users or process lease renewal requests from online users.
Users are not aware of DHCP server switching.
Figure 1 shows the typical network with a VRRP group deployed. DeviceA and DeviceB are the master and
backup devices, respectively. Both DeviceA and DeviceB are DHCP servers that assign IP addresses to clients.
In normal situations, DeviceA processes DHCP users' login and lease renewal requests. If DeviceA or the link
between DeviceA and the switch fails, a master/backup VRRP switchover is performed. DeviceB then
becomes the master. DeviceB can assign addresses to new users or process lease renewal requests from
online users only after user session information on DeviceA has been synchronized to DeviceB.
Feature Deployment
If DeviceA or the link between DeviceA and the switch fails, new users cannot go online and the existing
online users cannot renew their leases. To resolve this issue, configure DHCP server dual-device hot backup
on DeviceA and DeviceB.
On the network shown in Figure 2, after DHCP server dual-device hot backup is configured on DeviceA and
DeviceB, DeviceB synchronizes user session information from DeviceA in real time. If a master/backup VRRP
switchover occurs, DeviceB can assign addresses to new users or process lease renewal requests from online
users based on the user session information synchronized from DeviceA.
When DHCP clients and servers reside on different network segments, it would otherwise be necessary to
configure a DHCP server on each network segment, which increases costs. The DHCP relay function solves
this problem.
Figure 1 illustrates the DHCP relay application. A DHCP client can apply for an IP address from a DHCP
server on another network segment through a DHCP relay agent. This function enables a single DHCP server
to serve DHCP clients on different network segments, which reduces costs and facilitates centralized
management.
DHCPv4 and DHCPv6 relay applications are the same. The DHCP relay application described in this section covers both
DHCPv4 and DHCPv6 relay. However, DHCPv4 and DHCPv6 relay cannot be used in the current version at the same
time.
Service Overview
Device installation is costly and needs to be complete in just one site visit. Engineers are classified into
hardware and software commissioning engineers. Hardware engineers install devices and lay out cables.
Software commissioning engineers are responsible for initial configuration. Hardware engineers must be on
site during device installation. To free software commissioning engineers from configuring devices on site,
you can configure DHCP PnP.
After DHCP PnP is configured, the NMS can use DHCP to configure and commission devices on a network
remotely. This solution effectively reduces operation and maintenance (O&M) costs.
Networking Description
Figure 1 shows a typical mobile bearer network. The NMS is connected to a device at the aggregation layer.
A large number of case-shaped UPEs exist on the network and are distributed sparsely. To reduce
installation expenditure and improve working efficiency, enable DHCP PnP on the case-shaped UPEs.
Feature Deployment
As shown in Figure 1, UPEs obtain management IP addresses using DHCP and are configured with NMS
parameters automatically. Then the NMS management channel is available to allow the NMS to make a
Telnet connection to UPEs and configure them remotely.
Definition
The Dynamic Host Configuration Protocol (DHCP) dynamically assigns IP addresses to hosts and centrally
manages host configurations. DHCP uses the client/server model. A client applies to the server for
configuration parameters, such as an IP address, subnet mask, and default gateway address; the server
replies with the requested configuration parameters.
DHCP and DHCPv6 are available for dynamic address allocation on IPv4 and IPv6 networks, respectively.
Though DHCP and DHCPv6 both use the client/server model, they are built based on different principles and
operate differently.
Purpose
A host can send packets to and receive packets from the Internet only after it obtains an IP address,
together with the router address, subnet mask, and DNS server address.
The Bootstrap Protocol (BOOTP) was originally designed for diskless workstations to discover their own IP
addresses, the server address, the name of a file to be loaded into memory, and the gateway IP address.
BOOTP applies to a static scenario in which all hosts are allocated permanent IP addresses.
However, as networks grow in scale and complexity, network configuration becomes more complicated;
the proliferation of portable computers and wireless networks brings about host mobility; and the
increasing number of hosts causes IP address exhaustion. BOOTP is therefore no longer applicable. To
allow hosts to rapidly go online or offline, improve IP address usage, and support diskless workstations,
an automatic configuration mechanism is required. DHCP, developed on the basis of BOOTP, meets these
requirements. DHCP:
• Allows a host to exchange messages with a server to obtain all requested configuration parameters.
Benefits
DHCP rapidly and dynamically allocates IP addresses, which improves IP address usage and prevents the
waste of IP addresses.
• Manual configuration. IPv6 addresses/prefixes and other network configuration parameters are
manually configured, such as the DNS server address, network information service (NIS) server address,
and Simple Network Time Protocol (SNTP) server address.
• Stateless address allocation. A host uses the prefix carried in a received Router Advertisement (RA)
message and the local interface ID to automatically generate an IPv6 address.
• Stateful address autoconfiguration using DHCPv6. DHCPv6 address allocation can be implemented in
any of the following modes:
■ A DHCPv6 server automatically configures IPv6 addresses/prefixes and other network configuration
parameters, such as the DNS server address, NIS server address, and SNTP server address.
■ A host uses the prefix carried in a received RA message and the local interface ID to automatically
generate an IPv6 address. The DHCPv6 server assigns configuration parameters other than IPv6
addresses, such as the DNS server address, NIS server address, and SNTP server address.
■ DHCPv6 Prefix Delegation (PD). IPv6 prefixes do not need to be manually configured for the
downstream routers. The DHCPv6 prefix delegation mechanism allows a downstream router to
send DHCPv6 messages carrying the IA_PD option to an upstream router to apply for IPv6 prefixes.
After the upstream router assigns a prefix that has less than 64 bits to the downstream router, the
downstream router automatically subnets the delegated prefix into /64 prefixes and assigns the /64
prefixes to the links attached to IPv6 hosts through RA messages. This mechanism implements
automatic configuration of IPv6 addresses for IPv6 hosts and hierarchical IPv6 prefix delegation.
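The PD subnetting step described above (a delegated prefix shorter than /64 split into /64 prefixes) can be sketched with Python's standard ipaddress module; the /56 prefix below is an example value:

```python
import ipaddress

# A delegated /56 prefix is subnetted into /64 prefixes for the links
# attached to IPv6 hosts, as in the DHCPv6 PD mechanism described above.
delegated = ipaddress.ip_network("2001:db8:100::/56")   # example delegated prefix
lan_prefixes = list(delegated.subnets(new_prefix=64))

print(len(lan_prefixes))   # 256 /64 prefixes from one /56
print(lan_prefixes[0])     # 2001:db8:100::/64
print(lan_prefixes[1])     # 2001:db8:100:1::/64
```

Each resulting /64 can then be advertised on a downstream link in RA messages so that hosts autoconfigure their addresses.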
DHCPv6 Architecture
Figure 1 DHCPv6 architecture
Figure 1 shows the DHCPv6 architecture. The DHCPv6 architecture involves the following roles:
• DHCPv6 client: exchanges DHCPv6 messages with a DHCPv6 server to obtain an IPv6 address/prefix and
other configuration parameters.
• DHCPv6 relay agent: forwards DHCPv6 messages between a client and a server so that the client can
obtain an IPv6 address from the server. When DHCPv6 clients and servers reside on the same link, a
DHCPv6 client uses a link-local multicast address to obtain an IPv6 address/prefix and other
configuration parameters from a DHCPv6 server. If the DHCPv6 client and server reside on different
links, a DHCPv6 relay agent must be used to forward DHCPv6 messages between the client and server.
DHCPv6 relay allows a single DHCPv6 server to serve DHCPv6 clients on different links, reducing costs
and facilitating centralized management.
DHCPv6 relay agents are not mandatory in the DHCPv6 architecture. DHCPv6 relay agents are not needed when a
DHCPv6 client and a DHCPv6 server reside on the same link or they can exchange unicast packets for address
allocation or information configuration. DHCPv6 relay agents are needed only when a DHCPv6 client and a
DHCPv6 server reside on different links or they cannot exchange unicast packets.
• DHCPv6 server: processes address allocation, lease extension, and address release requests originating
from a DHCPv6 client or forwarded by a DHCPv6 relay agent and assigns IPv6 addresses/prefixes and
other configuration parameters to the client.
Unlike DHCP, DHCPv6 uses multicast packets instead of broadcast packets. DHCPv6 uses the following
multicast addresses: FF02::1:2 (All_DHCP_Relay_Agents_and_Servers) and FF05::1:3 (All_DHCP_Servers).
• DHCPv6 servers and relay agents listen for DHCPv6 messages on UDP port 547.
• Each DHCPv6 client or server has a DUID. A DHCPv6 server and a client use DUIDs to identify
each other.
• The client DUID is carried in the Client Identifier option, and the server DUID is carried in the
Server Identifier option. Both options have the same format. The option-code field value
determines whether the option is a Client Identifier or Server Identifier option. If the option-code
field value is 1, the option is a Client Identifier option. If the option-code field value is 2, the
option is a Server Identifier option.
• An IA is a construct through which a server and a client can identify, group, and manage a set of
related IPv6 addresses. Each IA consists of an IAID and associated configuration information.
• Each DHCPv6 client must associate one or more IAs with each of its interfaces that request to
obtain IPv6 addresses from a DHCPv6 server. The client uses the IAs associated with an interface
to obtain configuration information from a DHCPv6 server for that interface. Each IA must be
associated with an interface.
• Each IA has an identity association identifier (IAID), which must be unique among all IAIDs for
the IAs of a client. An IAID must not be lost or changed by a device restart.
• An interface is associated with one or more IAs. An IA contains one or more addresses.
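The Client Identifier and Server Identifier options described above share one format and differ only in the option-code value. A minimal byte-level sketch (the helper names and the sample DUID are illustrative):

```python
import struct

OPTION_CLIENTID = 1   # Client Identifier option-code
OPTION_SERVERID = 2   # Server Identifier option-code

def build_id_option(option_code, duid):
    """Both identifier options share one format: option-code, option-len, DUID."""
    return struct.pack("!HH", option_code, len(duid)) + duid

def parse_id_option(data):
    """The option-code field alone decides which identifier option this is."""
    code, length = struct.unpack("!HH", data[:4])
    kind = {OPTION_CLIENTID: "Client Identifier",
            OPTION_SERVERID: "Server Identifier"}[code]
    return kind, data[4:4 + length]

duid = bytes.fromhex("000100011c39cf880800273e8f5c")   # an example DUID-LLT
opt = build_id_option(OPTION_CLIENTID, duid)
print(parse_id_option(opt)[0])   # Client Identifier
```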
• Introduction
• DHCPv6 Options
Introduction
• DHCPv6 message types
Unlike DHCP messages, DHCPv6 messages use the msg-type field in the header to identify the message
type. Table 1 lists the DHCPv6 message types.
DHCPv6 messages share an identical fixed format header and a variable format area for options, which
are different from those of DHCP messages. DHCPv6 messages transmitted between clients and servers
and between relay agents and servers have different header formats.
Only Relay-Forward and Relay-Reply messages are exchanged between DHCPv6 relay agents and
servers. In both message types, the options field is of variable length and must include the Relay
Message option; it may also include other options added by the relay agent.
DHCPv6 Options
• DHCPv6 options format
Overview
DHCPv6 relay agents relay DHCPv6 messages between DHCPv6 clients and servers that reside on different
network segments to facilitate dynamic address assignment. This function enables a single DHCPv6 server to
serve DHCPv6 clients on different network segments, which reduces costs and facilitates centralized
management.
• A DHCPv6 relay agent relays both messages from clients and Relay-Forward messages from other relay
agents. When a relay agent receives a valid message to be relayed, it constructs a new Relay-Forward
message. The relay agent copies the received DHCP message (excluding IP or UDP headers) into the
Relay Message option in the new message. If other options are configured on the relay agent, it also
adds them to the Relay-Forward message. Table 1 lists the fields that a DHCPv6 relay agent can
encapsulate into a Relay-Forward message.
Table 1 Fields that a DHCPv6 relay agent can encapsulate into a Relay-Forward message

Source address in the IP header: Set to the IPv6 global unicast address of the outbound interface.

Destination address in the IP header: Used to send unicast packets if the inbound interface is configured
with a unicast address of a server or relay agent. Used to send multicast packets to the
All_DHCP_Servers multicast address FF05::1:3 if the inbound interface is not configured with a unicast
address of any server or relay agent.

Hop limit in the IP header: Set to 32 if the destination address is the All_DHCP_Servers multicast
address FF05::1:3. Set to 255 if the destination address is a unicast address.

Link-address in the Relay-Forward message: Set to a global unicast or link-local address assigned to the
inbound interface if the message comes from a client. The server then determines the link it uses to
assign addresses and other configuration parameters to the client. Set to 0 if the message comes from
another relay agent.

Peer-address in the Relay-Forward message: Set to the source address in the IP header of the received
message.
• A DHCPv6 relay agent relays a Relay-Reply message from a server. The relay agent extracts the Relay
Message option from a Relay-Reply message and relays it to the address contained in the peer-address
field of the Relay-Reply message. Table 2 lists the fields that a DHCPv6 relay agent can encapsulate into
a Relay-Reply message.
Table 2 Fields that a DHCPv6 relay agent can encapsulate into a Relay-Reply message

Source address in the IP header: Set to the IPv6 global unicast address of the outbound interface.

Destination address in the IP header: Set to the peer-address of the received outer Relay-Reply
message.

Destination port number in the UDP header: Set to 547 if the Relay-Reply message is sent to another
relay agent. Set to 546 if the message extracted from the Relay-Reply message is sent to the client.
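The encapsulation and extraction described above can be sketched at the byte level. The Relay Message option carries the relayed DHCPv6 message verbatim; the dict-based Relay-Forward structure and helper names below are illustrative simplifications:

```python
import struct

OPTION_RELAY_MSG = 9   # Relay Message option (carries the relayed DHCPv6 message)

def wrap_relay_forward(client_msg, link_address, peer_address):
    """Relay-agent side: copy the received message (without IP/UDP headers)
    into the Relay Message option of a new Relay-Forward structure."""
    option = struct.pack("!HH", OPTION_RELAY_MSG, len(client_msg)) + client_msg
    return {"msg-type": "Relay-Forward",
            "link-address": link_address,   # tells the server which link to serve
            "peer-address": peer_address,   # where the reply must be relayed to
            "options": option}

def extract_relay_message(options):
    """Opposite direction: pull the inner message back out of the option."""
    code, length = struct.unpack("!HH", options[:4])
    assert code == OPTION_RELAY_MSG
    return options[4:4 + length]

solicit = b"\x01\x00\x00\x01"   # a (truncated) Solicit message as raw bytes
fwd = wrap_relay_forward(solicit, "2001:db8::1", "fe80::2")
print(extract_relay_message(fwd["options"]) == solicit)   # True
```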
Figure 1 DHCPv6 client applying to a DHCPv6 server for an IP address through a DHCPv6 relay agent for the first
time
1. The DHCPv6 client sends a Solicit message to discover servers. The DHCPv6 relay agent that receives
the Solicit message constructs a Relay-Forward message with the Solicit message in the Relay
Message option and sends the Relay-Forward message to the DHCPv6 server.
2. After the DHCPv6 server receives the Relay-Forward message, it parses the Solicit message and
constructs a Relay-Reply message with the Advertise message in the Relay Message option. The
DHCPv6 server then sends the Relay-Reply message to the DHCPv6 relay agent. The DHCPv6 relay
agent parses the Relay Message option in the Relay-Reply message and sends the Advertise message
to the DHCPv6 client.
3. The DHCPv6 client then sends a Request message to request IP addresses and other configuration
parameters. The DHCPv6 relay agent constructs a Relay-Forward message with the Request message
in the Relay Message option and sends the Relay-Forward message to the DHCPv6 server.
4. After the DHCPv6 server receives the Relay-Forward message, it parses the Request message and
constructs a Relay-Reply message with the Reply message in the Relay Message option. The Reply
message contains the assigned IPv6 address and other configuration parameters. The DHCPv6 server
then sends the Relay-Reply message to the DHCPv6 relay agent. The DHCPv6 relay agent parses the
Relay Message option in the Relay-Reply message and sends the Reply message to the DHCPv6 client.
On the network shown in Figure 2, IPv6 prefixes do not need to be manually configured for the CPEs. The
DHCPv6 prefix delegation mechanism allows a CPE to apply for IPv6 prefixes by sending DHCPv6 messages
carrying the IA_PD option to the DHCPv6 server. After the DHCPv6 server assigns a prefix that has less than
64 bits to the CPE, the CPE automatically subnets the delegated prefix into /64 prefixes and assigns the /64
prefixes to the user network through RA messages. This mechanism implements automatic configuration of
IPv6 addresses for IPv6 hosts and hierarchical IPv6 prefix delegation.
If a DHCPv6 relay agent is deployed to forward DHCPv6 messages between CPEs (DHCPv6 clients) and the
DHCPv6 server, the DHCPv6 relay agent must set up routes to the network segments on which the clients
reside and advertises these network segments after the DHCPv6 server assigns PD prefixes to the clients.
Otherwise, core network devices cannot learn the routes destined for the CPEs, and IPv6 hosts cannot access
the network. If a client sends a Release message to the server to return a delegated prefix, or the lease of a
delegated prefix is not extended after expiration, the DHCPv6 relay agent deletes the network segment of
the client.
Figure 1 illustrates the DHCP relay application. A DHCP client can apply for an IP address from a DHCP
server on another network segment through a DHCP relay agent. This function enables a single DHCP server
to serve DHCP clients on different network segments, which reduces costs and facilitates centralized
management.
DHCPv4 and DHCPv6 relay applications are the same. The DHCP relay application described in this section covers both
DHCPv4 and DHCPv6 relay. However, DHCPv4 and DHCPv6 relay cannot be used in the current version at the same
time.
Networking Description
DHCPv6 relay dual-device hot standby effectively implements rapid service switching by keeping user session
information synchronized on the master and backup devices in real time on the control and forwarding
planes. The user session information (including the user DUID, MAC address, IPv6 address, and lease)
generated during user access from the master device is synchronized to the backup device. When
VRRP/VRRP6 detects a link failure on the master device, a VRRP/VRRP6 packet is sent to adjust the priority,
triggering a master/backup VRRP/VRRP6 switchover. After the master/backup VRRP/VRRP6 switchover is
performed, the original backup device takes over to provide user address assignment, lease renewal, and
data packet forwarding. Users are not aware of DHCPv6 relay agent switching.
Figure 1 shows the typical network with a VRRP/VRRP6 group deployed. DeviceA and DeviceB are the master
and backup devices, respectively. Both DeviceA and DeviceB are DHCPv6 relay agents that forward messages
between the DHCPv6 client and server. DeviceC and DeviceD function as the DHCPv6 client and server,
respectively. In normal situations, DeviceA forwards users' service packets. In addition, DeviceA generates
prefix routes based on the PD prefixes assigned by DeviceD to DeviceC, and advertises the prefix routes to
the network. DeviceD can then obtain the information about the routes to DeviceC and its connected user
terminals, so that the user terminals can access the network normally.
If DeviceA or the link between DeviceA and the switch fails, a master/backup VRRP/VRRP6 switchover is
performed. DeviceB then becomes the master. DeviceD can access the user terminals connected to DeviceC
only after DeviceB synchronizes DeviceA's PD prefix routes and advertises the prefix routes to the network.
Feature Deployment
If DeviceA or the link between DeviceA and the switch fails, new users cannot go online and the existing
online users cannot renew their leases. To resolve this issue, configure DHCPv6 relay dual-device hot standby
on DeviceA and DeviceB.
On the network shown in Figure 2, after DHCPv6 relay dual-device hot standby is configured on DeviceA and
DeviceB, DeviceB synchronizes DHCPv6 PD user information from DeviceA in real time and generates PD
prefix routes. If a master/backup VRRP/VRRP6 switchover is performed, DeviceD can access the user
terminals connected to DeviceC through the PD prefix routes on DeviceB.
Definition
Domain Name System (DNS) is a distributed database for TCP/IP applications that provides conversion
between domain names and IP addresses.
Purpose
DNS uses a hierarchical naming method to specify a meaningful name for each device on the network and
uses a resolver to establish mappings between IP addresses and domain names. DNS allows users to use
meaningful and easy-to-memorize domain names instead of IP addresses to identify devices.
Benefits
When you check the continuity of a service, you can directly enter the domain name used to access the
service instead of the IP address. Even if the IP address used to access the service has changed, you can still
check continuity using the domain name, so long as the DNS server has obtained the new IP address.
Related Concepts
Static DNS is implemented based on the static domain name resolution table. The mapping between domain
names and IP addresses recorded in the table is manually configured. You can add common domain
names to the table to improve resolution efficiency.
Implementation
Static domain name resolution requires a static domain name resolution table, which lists the mapping
created manually between domain names and IP addresses. The table contains commonly used domain
names. After searching for a specified domain name in the resolution table, clients can obtain the IP address
mapped to it. This process improves domain name resolution efficiency.
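As a sketch, this table-driven lookup can be illustrated in Python. The table entries and function name below are hypothetical, not the device's actual implementation:

```python
# Minimal sketch of static domain name resolution: a manually
# maintained table maps domain names to IP addresses.
STATIC_DNS_TABLE = {
    "server.example.com": "192.0.2.10",  # hypothetical entries
    "gw.example.com": "192.0.2.1",
}

def resolve_static(domain):
    """Return the IP address mapped to the domain, or None if the
    domain is not in the static resolution table."""
    return STATIC_DNS_TABLE.get(domain)
```

Because the lookup is a direct table search with no network exchange, resolution of the configured names is immediate.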
Usage Scenario
If a HUAWEI NE40E-M2 series device functioning as a DNS client seldom uses domain names to access other
devices or no DNS server is available, you can configure static DNS on the device to resolve domain names.
Benefits
If there are not many hosts accessed by Telnet applications and the hosts do not change frequently, using
static DNS improves resolution efficiency.
Related Concepts
• Dynamic DNS: Client programs, such as ping and tracert, access the DNS server using the resolver of the
DNS client.
• Resolver: A component of the DNS client that handles a user's request for domain name resolution by
querying DNS servers for mappings between domain names and IP addresses.
• Recursive resolution: If a DNS server cannot find the IP address corresponding to a domain name, the
DNS server turns to other DNS servers for help and sends the resolved IP address to the DNS client.
• Query types:
■ Class-A query: a query used to request the IPv4 address corresponding to a domain name.
■ Class-AAAA query: a query used to request the IPv6 address corresponding to a domain name.
■ PTR query: a query used to request the domain name corresponding to an IP address.
Implementation
Dynamic DNS is implemented using the DNS server. Figure 1 shows the relationships between the client
program, resolver, DNS server, and cache.
The DNS client is composed of the resolver and cache and is responsible for accepting and responding to
DNS queries from client programs. Generally, the client program, cache, and resolver are on the same host,
whereas the DNS server is on another host.
1. A client program, such as a ping or tracert program, sends a DNS request carrying a domain name to
the DNS client.
2. After receiving the request, the DNS client searches the local database or the cache. If the required
DNS entry is not found, the DNS client sends a query packet to the DNS server. Currently, devices
support Class-A, Class-AAAA, and PTR queries.
3. The DNS server searches its local database for the IP address corresponding to the domain name
carried in the query packet. If the corresponding IP address cannot be found, the DNS server forwards
the query packet to the upper-level DNS server for help. The upper-level DNS server resolves the
domain name in recursive resolution mode, as specified in the query packet, and returns the resolution
result to the DNS server. The DNS server then sends the result to the DNS client.
4. After receiving the response packet from the DNS server, the DNS client sends the resolution result to
the client program.
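The four steps above can be sketched as follows. This is a simplified illustration: the cache is a plain dictionary, and query_server stands in for the real network exchange with the DNS server:

```python
# Sketch of the DNS client resolution flow: the client first consults
# its local cache and only queries the configured DNS server on a
# cache miss; a successful answer is cached for later requests.
def resolve(domain, cache, query_server):
    """Resolve a domain name, preferring the local cache."""
    if domain in cache:          # step 2: local lookup succeeds
        return cache[domain]
    ip = query_server(domain)    # steps 2-3: query the DNS server
    if ip is not None:
        cache[domain] = ip       # remember the answer
    return ip                    # step 4: result goes to the program
```

A repeated request for the same domain is then answered from the cache without contacting the server again.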
Dynamic DNS allows you to define a domain name suffix list by pre-configuring some domain name suffixes. After you
enter a partial domain name, the DNS server automatically displays the complete domain name with different suffixes
for resolution.
Dynamic DNS supports TCP-based TLS-encrypted packet transmission. You can configure an SSL policy and load a digital
certificate on the DNS client and server in advance. During domain name resolution, the DNS server encrypts and
decrypts packets based on the configured SSL policy to improve DNS packet transmission security.
Usage Scenario
Dynamic DNS is used in scenarios in which a large number of mappings between domain names and IP
addresses exist and these mappings change frequently.
Benefits
If a large number of mappings between domain names and IP addresses exist, manually configuring DNS
entries on each DNS server is laborious. To solve this problem, use dynamic DNS instead. Dynamic DNS
effectively improves configuration efficiency and facilitates DNS management.
• If you seldom use domain names to visit other devices or no DNS server is available, configure static
DNS on the DNS client. To configure static DNS, you must know the mapping between domain names
and IP addresses. If a mapping changes, manually modify the DNS entry on the DNS client.
• If you want to use domain names to visit many devices and DNS servers are available, configure
dynamic DNS. Dynamic DNS requires DNS servers.
Definition
The maximum transmission unit (MTU) defines the maximum length of an IP packet that can be sent on an
interface without fragmentation. If the length of an IP packet exceeds the MTU, the packet is fragmented
before being sent out.
Application
At the data link layer, the MTU is used to limit the length of a frame. Each vendor may define different
MTUs for their products or even different product models.
Use an Ethernet as an example. Figure 1 shows a complete Ethernet frame.
On some devices:
• The MTU is configured on an Ethernet interface to indicate the maximum length of the IP packet in an
Ethernet frame. Here, the MTU is an IP MTU.
• The MTU covers the payload plus the destination MAC address, source MAC address, and Length/Type
fields. That is, MTU = IP MTU + 14 bytes.
• The MTU covers the payload plus the destination MAC address, source MAC address, Length/Type
fields, and CRC. That is, MTU = IP MTU + 18 bytes.
On the NE40E, the MTU is defined at Layer 3. As shown in Figure 2, the MTU indicates the maximum length
of the IP header and payload. If the MTU of an Ethernet interface is set to 1500 bytes, packets whose IP
header plus payload does not exceed 1500 bytes are not fragmented.
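The frame-size arithmetic above can be checked with a short sketch, assuming standard Ethernet II framing (14-byte header covering the two MAC addresses and the Length/Type field, plus a 4-byte CRC):

```python
IP_MTU = 1500        # Layer 3 MTU configured on the interface
ETH_HEADER = 14      # destination MAC (6) + source MAC (6) + Length/Type (2)
ETH_CRC = 4          # frame check sequence

# Frame sizes corresponding to the two MTU definitions above
frame_without_crc = IP_MTU + ETH_HEADER            # MTU = IP MTU + 14
frame_with_crc = IP_MTU + ETH_HEADER + ETH_CRC     # MTU = IP MTU + 18
```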
Purpose
The MTU determines the maximum number of bytes of a packet that a sender can send each time. It must
be correctly set to ensure normal communication between devices.
Original IPv4 packet sending (control plane): Original IPv4 packets refer to IPv4 protocol packets sent from
the control plane of the local device. The source address of these packets is the local device. BGP, ICMP
error, and BFD packets are protocol packets. When the ping command is run on a device, the device sends
an ICMP request message with the source address being a local address.
Original IPv6 packet sending (control plane): Original IPv6 packets refer to IPv6 protocol packets sent from
the control plane of the local device. The source address of these packets is the local device. When the
ping ipv6 command is run on a device, the device sends an ICMPv6 request message with the source
address being a local address.
IPv4 packet forwarding (forwarding plane): The device checks the MTU when sending a packet but not
when receiving a packet. For the NE40E, the MTU configured on an interface is the IP MTU, which is a
Layer 3 concept (this MTU is also called the interface MTU). As such, the interface MTU typically takes
effect only on Layer 3 traffic, not on Layer 2 traffic. A Layer 2 packet is usually not fragmented, even if its
size (including the IP header and payload) exceeds the interface MTU.
NOTE:
Typically, only the source and destination IPv6 nodes parse IPv6 extension headers. Transit nodes forward IPv6 packets
without performing IPv6 MTU-based packet fragmentation, which is performed only on the source node.
Forcible Fragmentation
By default, when the length of an IPv4 packet exceeds the interface MTU:
• If the DF bit in the IP header is set to 1, the packet is not fragmented. After receiving the packet, the
device discards it and returns an ICMP Packet Too Big message.
The NE40E supports forcible fragmentation. If forcible fragmentation is enabled, a board fragments all
oversized IPv4 packets (whose length exceeds the interface MTU) and sets the DF bit to 0.
Forcible fragmentation takes effect only for IPv4 packets.
By default, forcible fragmentation is disabled.
As shown in Figure 1, the control plane fragments IP packets and then encapsulates them with tunnel
headers (such as MPLS and L2TP) if needed before sending the packets to the forwarding plane. Because the
fragmentation process is implemented in software, the fragmentation rules are the same across different
board types.
If the size (including the IP header and payload) of non-MPLS packets sent from the control plane is greater
than the MTU configured on an outbound interface:
• If the DF bit in the IP header is set to 0, the packet is fragmented. In this case, the size (including the IP
header and payload) of each fragment is less than or equal to the MTU of the outbound interface.
• If the DF bit in the IP header is set to 1 and forcible fragmentation is enabled, the device fragments the
packet and sets the DF bit in each fragment to 0. (By default, forcible fragmentation is disabled. If the
clear ip df command is run on an interface, forcible fragmentation is enabled for protocol packets.)
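These DF-bit rules can be summarized in a short sketch. The function and return values are illustrative, not device internals:

```python
# Sketch of the control-plane fragmentation decision for non-MPLS
# IPv4 packets, following the rules above.
def fragment_decision(packet_len, mtu, df_bit, forcible=False):
    """Return the action taken for a packet against the outbound MTU."""
    if packet_len <= mtu:
        return "send"                # fits; no fragmentation needed
    if df_bit == 0:
        return "fragment"            # fragments each <= outbound MTU
    if forcible:
        return "fragment-clear-df"   # fragment; DF set to 0 in fragments
    return "discard-icmp"            # drop and return an ICMP error
```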
For details about the fragmentation process of MPLS packets, see MPLS MTU Fragmentation.
The DF bit is usually set to 0 (fragmentation is enabled) for protocol packets, meaning that they are not
discarded by the local device even if they are longer than the MTU. Typically, the DF bit is set to 1
(fragmentation is disabled) for the protocol packets (ICMP packets) sent by a device only in the following
situations:
• The device is performing PMTU discovery, such as IPv6 PMTU negotiation or LDP/RSVP-TE PMTU
negotiation.
• Fragmentation takes effect only for traffic that needs to be forwarded over IPv4. The traffic includes
both raw IPv4 traffic that enters the device and traffic that needs to be forwarded over IPv4 after
decapsulation.
For example, MPLS L3VPN packets are MPLS encapsulated before being forwarded from a network-to-
network interface (NNI) to a user-to-network interface (UNI) on a PE. The PE fragments the packets
after removing MPLS labels.
Another example is in an L3VPN or Internet scenario where a customer-premises equipment (CPE) uses
a dot1q or QinQ VLAN tag termination sub-interface to access a PE. The packets sent from the CPE to
the PE are VLAN tagged. In such a scenario, the packets are also fragmented on the PE after the VLAN
tags are removed.
• Packets that are forwarded only through Layer 2 or MPLS are not fragmented.
• If a board supports forcible fragmentation (enabled using the ipv4 force-fragment enable command),
it ignores the DF bit. All oversized IPv4 packets are fragmented, and the DF bit is set to 0 after
fragmentation. By default (forcible fragmentation is disabled), if IPv4 packets are longer than the
interface MTU, the board discards those whose DF bit is set to 1 and returns ICMP Packet Too Big
messages to the source end. Forcible fragmentation takes effect only for IPv4 packets.
• IP header
• IP payload
• MPLS L3VPN scenario, with traffic forwarded from a user-to-network interface (UNI) to a network-to-
network interface (NNI).
• IP traffic on PEs or Ps is directed into LSPs using policy-based routing (PBR), the redirection function,
static routes, Interior Gateway Protocol (IGP) shortcuts, or forwarding adjacency.
• Packets originate from the control plane and are directed into LSPs. For example, when the ping -vpn-
instance or ping lsp command is run on the device, the device originates ICMP Request messages.
These messages are IP packets and are sent into MPLS tunnels.
• PMTU negotiated by LDP signalling (for details about this parameter, see chapter Protocols MTU
Negotiation)
• PMTU negotiated by RSVP-TE signalling (for details about this parameter, see chapter Protocols MTU
Negotiation)
Scenarios and the parameters that may affect MPLS MTU value selection ("Y" indicates the parameter
affects the selection, "N" indicates it does not; the smallest value among the affecting parameters is
selected as the MPLS MTU). The parameters, in order, are: interface MTU, MPLS MTU, PMTU negotiated by
LDP, PMTU negotiated by RSVP-TE, and tunnel interface MTU.
LDP LSP: Y | Y | Y | N | N
MPLS-TE: Y | Y | N | Y | N
LDP over TE: Y | Y | Y | N | Y
NOTE:
In the LDP over TE scenario, the interface MTU of the tunnel interface affects MPLS MTU value selection,
because the LDP LSP runs over the TE tunnel and the TE tunnel interface is an outbound interface of the
LDP LSP.
According to the preceding rules, the MPLS MTU selected on the NE40E cannot be larger than the physical
interface MTU. Therefore, MPLS-labeled packets are less than or equal to the physical interface MTU in size
and will not be discarded by the local device if DF=0.
• If DF=0, the packet is fragmented. Each fragment (including the IP header and labels) is less than or
equal to the MPLS MTU value.
• If DF=1, the packet is discarded and an ICMP Datagram Too Big message is sent to the source end.
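A minimal sketch of this check, assuming 4-byte MPLS labels (function and return values are illustrative):

```python
LABEL_LEN = 4  # each MPLS label is 4 bytes

def mpls_action(ip_len, num_labels, mpls_mtu, df_bit, forcible=False):
    """Decide what happens to an IP packet entering an MPLS tunnel,
    per the rules above: packet plus label stack is compared with
    the MPLS MTU."""
    if ip_len + num_labels * LABEL_LEN <= mpls_mtu:
        return "forward"
    if df_bit == 0 or forcible:
        return "fragment"      # each fragment + labels <= MPLS MTU
    return "discard-icmp"      # DF=1 without forcible fragmentation
```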
For the MPLS packets received from the physical interface of the NE40E device, different board types may
have different MPLS MTU fragmentation modes, as shown in Table 2.
Table 2 Fragmentation implementation for IP packets entering MPLS tunnels (only for packets received
from a physical interface)
Mode A (Figure 2): MPLS fragmentation is implemented only on the ingress for IP packets entering an
MPLS tunnel.
When the total length of the IP datagram and labels of a packet exceeds the specified MPLS MTU and DF is
set to 0, the IP datagram is fragmented. Each fragment is attached one or more MPLS labels and then
forwarded.
When the total length of the IP datagram and labels of a packet exceeds the specified MPLS MTU and DF is
set to 1:
• If forcible fragmentation is disabled, the IP datagram is attached with one or more MPLS labels and
then forwarded, without being fragmented.
• If forcible fragmentation is enabled, the IP datagram is fragmented, attached one or more MPLS
labels, and then forwarded.
A GRE tunnel encapsulates a packet by adding a GRE header and a transport protocol header (IP header)
before the packet's inner IP header. After the packet is encapsulated with the GRE header and transport
protocol header, its size may exceed the maximum size that the data link layer permits, resulting in a
forwarding failure. A GRE MTU is the maximum size of a non-fragmented IP packet to be sent before it
enters a GRE tunnel. After the packet enters the GRE tunnel, its maximum size must accommodate the GRE
header and transport protocol header, as shown in Figure 1.
Before forwarding an IPv4 packet through a GRE tunnel, a device compares the packet's size with the
GRE MTU. If the packet's size exceeds the GRE MTU, the device fragments the packet and then
encapsulates a GRE header and transport protocol into each fragment. The fragments are not
reassembled during transmission. After the fragments reach the tunnel's peer device, they are
decapsulated and then reassembled. A GRE MTU can be manually configured or automatically learned.
PMTU learning can be enabled on a tunnel interface to prevent TCP packets encapsulated with BGP
messages from being fragmented multiple times during transmission, improving BGP message
transmission efficiency. On the network shown in Figure 2, DeviceA sends a probe packet with the
maximum length of 1500 bytes and DF value of 1. If the MTU of DeviceB is less than 1500 bytes,
DeviceB discards the probe packet and returns an ICMP error message carrying its own MTU. When
the message reaches DeviceA, DeviceA learns the new MTU. The final MTU (GRE MTU) learned by
DeviceA is the minimum MTU of the entire path minus 32 (20-byte IP header + 12-byte GRE
header). When the default MTU is 1500 bytes, the GRE MTU is 1468 (1500 – 32) bytes.
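Assuming the 32-byte overhead described above (20-byte IPv4 delivery header plus 12-byte GRE header), the learned GRE MTU can be sketched as:

```python
GRE_OVERHEAD = 32  # 20-byte IPv4 delivery header + 12-byte GRE header

def gre_mtu_from_path(link_mtus):
    """GRE MTU learned by the ingress: the minimum MTU along the
    path minus the GRE encapsulation overhead."""
    return min(link_mtus) - GRE_OVERHEAD
```

With the default 1500-byte MTU on every hop, this yields the 1468-byte GRE MTU stated above.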
After PMTU learning is enabled on a device's tunnel interface, the device sends probe packets carrying
updated MTUs every 10 minutes.
■ If neither the tunnel pathmtu enable command nor the mtu command is run on a tunnel interface,
the GRE MTU is 1468 (1500 – 32) bytes.
■ If the mtu command is run on a tunnel interface, the GRE MTU is the MTU configured for the
tunnel interface minus 32.
■ If the mtu command is not run on a tunnel interface but the tunnel pathmtu enable command is
run, the GRE MTU is the minimum MTU of the tunnel interface minus 32.
■ The tunnel pathmtu enable and mtu commands cannot both be run on a tunnel interface.
■ In a scenario where PMTU learning is enabled for an IPv4 GRE tunnel and the minimum IPv4 MTU of
the tunnel is less than 1312 (1280 + 32) bytes, the IPv6 MTU learned by the corresponding IPv4 GRE
tunnel interface is less than 1280 bytes. If the ingress sends an IPv6 packet longer than the learned IPv6
MTU, the packet is dropped. To address this issue, perform either of the following operations:
■ Disable PMTU learning for the IPv4 GRE tunnel.
■ If the IPv4 GRE tunnel needs to have PMTU learning enabled and carry IPv6 packets, ensure that
the forwarding interfaces of the tunnel's transit nodes each have an IPv4 MTU of at least 1312
bytes.
■ If the mtu command is not run on a tunnel interface, the GRE MTU is 1468 (1500 – 32) bytes.
■ If the mtu command is run on a tunnel interface, the GRE MTU is the MTU configured for the
tunnel interface minus 32.
• Theoretical IPv4 over IPv6 MTU value = min (Outbound interface's IPv6 MTU value, IPv6 PMTU value) –
48 (IPv6 header)
• An IPv4 over IPv6 MTU value can be configured using the mtu command in the tunnel interface view.
Table 1 lists the parameters that affect the effective IPv4 over IPv6 MTU value.
Table 1 Parameters that affect the effective IPv4 over IPv6 MTU value
Scenario Parameters That Affect the Effective IPv4 over IPv6 MTU Value (√ Indicates
the Parameter That Affects the Value, and × Indicates the Parameter That
Does Not Affect the Value)
√ √ √
In an IPv6 over IPv4 tunnel scenario, packets can be fragmented during forwarding on an IPv4 network. Therefore, you
can choose to configure an MTU on a tunnel interface.
• If an MTU is configured on a tunnel interface, the IPv6 over IPv4 MTU is the configured MTU.
• If no MTU is configured on a tunnel interface, the IPv6 over IPv4 MTU is the default value (1500 bytes).
• Some devices check the MTU values carried in DD packets by default, while allowing users to disable
the MTU check.
• Some devices do not check the MTU values carried in DD packets by default, while allowing users to
enable the MTU check.
Implementation inconsistencies between devices from different vendors are a common cause of OSPF
adjacency problems.
NE40E devices by default do not check MTU values carried in DD packets and set the MTU values to 0 bytes
before sending DD packets.
NE40E devices allow you to set the MTU value in DD packets to be sent over a specified interface. After
DD packets arrive at an NE40E device, the device checks the Interface MTU field and allows an OSPF
neighbor relationship to reach the Full state only if the Interface MTU field in the packets is less than or
equal to the local MTU.
• Point-to-point (P2P) interfaces exchange Hello packets with the Padding field before they establish an
IS-IS neighbor relationship. After the IS-IS neighbor relationship is established, the P2P interfaces
exchange Hello packets without the padding field.
• Broadcast interfaces exchange Hello packets with the Padding field before and after they establish an
IS-IS neighbor relationship.
• The penultimate LSR assigned an implicit-null label uses the default LDP MTU equal to the MTU of the
local outbound interface mapped to the FEC.
• Except for the preceding LSRs, each LSR selects the smaller of the MTU of the local outbound interface
mapped to the FEC and the MTU advertised by the downstream LSR as its local LDP MTU. If an LSR
receives no MTU from any downstream LSR, the LSR uses the default LDP MTU.
A downstream LSR adds the calculated LDP MTU value to the MTU type-length-value (TLV) in a Label
Mapping message and sends the Label Mapping message upstream.
If an MTU value changes (such as when the local outbound interface or its configuration is changed), an LSR
recalculates an MTU value and sends a Label Mapping message carrying the new MTU value upstream. The
comparison process repeats to update MTUs along the LSP.
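The hop-by-hop minimum-MTU comparison can be sketched as follows. The list of outbound-interface MTUs is hypothetical and is ordered from the LSR nearest the egress to the one nearest the ingress:

```python
def ldp_mtu(outbound_mtus):
    """Compute the LDP MTU signaled toward the ingress. Each LSR takes
    the smaller of its local outbound-interface MTU and the MTU
    advertised by its downstream LSR; the first LSR (no downstream
    advertisement) uses its local value."""
    advertised = None
    for local_mtu in outbound_mtus:
        if advertised is None:
            advertised = local_mtu               # no downstream MTU yet
        else:
            advertised = min(local_mtu, advertised)
    return advertised
```

The value carried upstream in Label Mapping messages therefore converges to the smallest outbound-interface MTU along the LSP.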
If an LSR receives a Label Mapping message that carries an unknown MTU TLV, the LSR forwards this
message to upstream LDP peers.
NE40E devices exchange Label Mapping messages to negotiate MPLS MTU values before they establish LDP
LSPs. Each message carries either of the following two MTU TLVs:
• Huawei proprietary MTU TLV: sent by Huawei routers by default. If an LDP peer cannot recognize this
Huawei proprietary MTU TLV, the LDP peer forwards the message with this TLV so that an LDP peer
relationship can still be established between the Huawei router and its peer.
• Relevant standards-compliant MTU TLV: specified by commands on NE40E. NE40E uses this MTU TLV to
negotiate with non-Huawei devices.
1. The ingress sends a Path message with the ADSPEC object that carries an MTU value. The smaller
MTU value between the MTU configured on the physical outbound interface and the configured MPLS
MTU is selected.
2. Upon receipt of the Path message, a transit LSR selects the smallest MTU among the received MTU
value, the MTU configured on the physical outbound interface, and the configured MPLS MTU. The
transit LSR then sends a Path message with the ADSPEC object that carries the smallest MTU value to
the downstream LSR. This process repeats until a Path message reaches the egress.
3. The egress uses the MTU value carried in the received Path message as the PMTU. The egress then
sends a Resv message that carries the PMTU value upstream to the ingress.
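The shrinking MTU carried in Path messages can be sketched as follows, with each hop represented as a hypothetical (interface MTU, MPLS MTU) pair:

```python
def rsvp_pmtu(ingress, transits):
    """PMTU computed along an RSVP-TE LSP: the ingress contributes
    min(interface MTU, MPLS MTU); each transit LSR takes the minimum
    of the received value and its own two MTUs; the final value is
    the PMTU returned by the egress in a Resv message."""
    carried = min(ingress)               # step 1: ingress selection
    for hop in transits:                 # step 2: transit LSRs
        carried = min(carried, *hop)
    return carried                       # step 3: PMTU at the egress
```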
PWE3: Specify the MTU in the mpls switch-l2vc command, or configure the mtu mtu-value command in
the PW template view. One of the following MTUs, with priorities in descending order, is selected:
• MTU specified in the mpls l2vc command or mpls switch-l2vc command
• MTU configured in the PW template
• Interface MTU of the AC interface
• Default MTU value (1500 bytes)
BGP VLL: Configure the mtu mtu-value command in the MPLS-L2VPN instance view. One of the following
MTUs, with priorities in descending order, is selected:
• MTU configured in the MPLS-L2VPN instance view
• Default MTU value (1500 bytes)
VPLS: Configure the mtu mtu-value command in the VSI view. One of the following MTUs, with priorities
in descending order, is selected:
• MTU configured in the VSI view
• Default MTU value (1500 bytes)
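The descending-priority selection for PWE3 can be sketched as follows. This is a simplified illustration in which None marks an unconfigured value:

```python
def pwe3_mtu(cmd_mtu=None, template_mtu=None, ac_mtu=None):
    """Select the PWE3 MTU by the priority order described above:
    command-specified MTU > PW-template MTU > AC interface MTU >
    default MTU."""
    for mtu in (cmd_mtu, template_mtu, ac_mtu):
        if mtu is not None:
            return mtu
    return 1500  # default MTU value
```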
By default, Huawei routers implement MTU negotiation for VCs or PWs. Two nodes must use the same MTU
to ensure that a VC or PW is established successfully. L2VPN MTUs are only used to establish VCs and PWs
and do not affect packet forwarding.
To communicate with non-Huawei devices that do not verify L2VPN MTU consistency, L2VPN MTU
consistency verification can be disabled on NE40E. This allows NE40E to establish VCs and PWs with the
non-Huawei devices.
Intra-AS VPN: One VPN label and N public network labels.
Inter-AS VPN Option A: A packet transmitted within an AS carries one VPN label and N public network
labels. A packet transmitted between ASs carries no labels.
Inter-AS VPN Option B: A packet transmitted within an AS carries one VPN label and N public network
labels. A packet transmitted between ASs carries one VPN label.
Inter-AS VPN Option C: A packet sent within the first AS carries one VPN label, one Border Gateway
Protocol (BGP) label, and N public network labels.
NOTE:
For solution 1 (configuring inter-AS VPN Option C), an MPLS packet sent within the first AS carries one
VPN label, one BGP label, and N public network labels.
For solution 2 (configuring inter-AS VPN Option C), an MPLS packet sent within the first AS carries one
VPN label and N public network labels.
A packet transmitted between ASs carries one VPN label and one BGP label.
A packet sent within the second AS carries one VPN label, one BGP label, and N public network labels.
NOTE:
For solution 1 (configuring inter-AS VPN Option C), an MPLS packet sent within the second AS carries one
VPN label, one BGP label, and N public network labels.
For solution 2 (configuring inter-AS VPN Option C), an MPLS packet sent within the second AS carries one
VPN label and N public network labels.
Value of N (depending on the public network tunnel type):
• N is 1 when packets are transmitted on an LDP LSP.
• N is 1 when packets are transmitted on a static LSP.
• N is 1 when packets are transmitted on a TE tunnel.
• N is 2 when packets are transmitted on a TE tunnel in the LDP over TE scenario.
• N is 3 when packets are transmitted on a TE fast reroute (FRR) bypass tunnel in the LDP over TE
scenario.
NOTE:
The preceding N values take effect when PHP is disabled. If PHP is enabled and performed, N minus 1
(N – 1) takes effect.
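Using the N values above, the total label stack depth of a packet can be sketched as follows. The scenario keys are illustrative encodings, not device terminology:

```python
N_BY_TUNNEL = {            # public network labels per tunnel type
    "ldp-lsp": 1,
    "static-lsp": 1,
    "te": 1,
    "ldp-over-te": 2,
    "ldp-over-te-frr": 3,
}

def label_depth(tunnel, vpn_labels=1, bgp_labels=0, php=False):
    """Total labels carried by a packet inside an AS: VPN label(s),
    optional BGP label, and N public network labels (N - 1 if PHP is
    enabled and performed)."""
    n = N_BY_TUNNEL[tunnel]
    if php:
        n -= 1             # penultimate hop popping removes one label
    return vpn_labels + bgp_labels + n
```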
Definition
Load balancing distributes traffic among multiple links available to the same destination.
Purpose
After load balancing is deployed, traffic is split into different links. When one link used in load balancing
fails, traffic can still be forwarded through other links.
Benefits
Load balancing offers the following benefits to carriers:
If the Forwarding Information Base (FIB) of a device has multiple entries with the same destination address
and mask but different next hops, outbound interfaces, or tunnel IDs, route load balancing can be
implemented.
• Solution 1: Configure multiple equal-cost routes with the same destination network segment but different next
hops, and set the maximum number of equal-cost routes for load balancing. This solution is mostly used among
links that directly connect two devices. However, it is gradually being replaced by trunk technology, which saves IP
addresses and facilitates management by bundling links into a trunk.
• Solution 2: Separate destination IP addresses into several groups and allocate one link for each group. This solution
improves the utilization of bandwidth resources. However, if you use this solution to implement load balancing,
you must observe and analyze traffic and know the distribution and trends of traffic of various types.
• Load balancing distributes traffic among multiple links, providing higher bandwidth than each individual
link and preventing traffic congestion caused by link overload.
• Links used for load balancing back up each other. If a link fails, traffic can be automatically switched to
other available links, which increases link reliability.
Disadvantages:
Traffic is load-balanced randomly, which may result in poor traffic management.
Symmetric load balancing guarantees the data sequence but not the bandwidth usage.
Per-packet load-balancing means that the device sends packets in sequence alternately over the links used
for load balancing, as shown in Figure 3. Load is evenly distributed over the links.
• Links are of poor transmission quality. Delay, packet loss, or error packets may occur when the link
quality is poor.
• Packets are of varied sizes. When packets of different sizes are transmitted over the same link, under
circumstances of a steady transmission rate, small-sized packets may arrive at the peer first even
though they are sent later than large-sized packets. Therefore, check whether packet disorder is
tolerable and the links have the mechanism of keeping the original transmission sequence on the live
network before using per-packet load balancing.
As per-packet load balancing may cause packet disorder, it is not recommended for key services that
are sensitive to packet sequence, such as voice and video services.
By default, the load balancing mode for traffic on both the control plane and the forwarding plane is per-flow.
ECMP
ECMP evenly load-balances traffic over multiple equal-cost paths to a destination, irrespective of bandwidth.
UCMP
UCMP load-balances traffic over multiple equal-cost paths to a destination based on bandwidth ratios. All
paths carry traffic in proportion to their bandwidth, as shown in Figure 2. This increases bandwidth usage.
Trunk load balancing does not support ECMP or UCMP but provides similar functions. For example, if interfaces of
different rates, such as GE and FE interfaces, are bundled into a trunk interface and weights are assigned to the trunk
member interfaces, traffic can be load-balanced over the trunk member links based on the link weights, similar to
UCMP. By default, all trunk member interfaces have the same weight of 1. This default behavior is similar to ECMP,
but each member interface is then limited to the lowest forwarding capability among all member interfaces.
• Among the paths used for UCMP, the bandwidth of any link cannot be smaller than the total bandwidth
divided by the maximum number of load balanced paths supported on the board. Otherwise, the path
carries no traffic.
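Weight-based (UCMP-like) distribution over member links can be sketched with a simple bucket hash. The CRC32 hash and interface names below are illustrative; real hardware uses its own hash algorithm:

```python
import zlib

def pick_member(flow_key, members):
    """Hash a flow onto one of several members, where each member gets
    hash buckets in proportion to its weight.
    members: list of (name, weight) pairs."""
    buckets = [name for name, weight in members for _ in range(weight)]
    h = zlib.crc32(flow_key.encode())   # per-flow hash keeps packet order
    return buckets[h % len(buckets)]
```

Because the hash is computed per flow, all packets of one flow take the same member link, while the aggregate traffic splits roughly by weight.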
In the ECMP scenario shown in Figure 1, when a link fails, all traffic is rehashed and load-balanced again
among the remaining normal links to prevent traffic interruption. As a result, traffic forwarding paths may
change, and requests from the same user may be sent to different servers, greatly affecting services in
which sessions need to be maintained.
The ECMP load balancing consistency function solves this problem. As shown in Figure 2, it performs hash
calculation only for the traffic on the faulty link, without affecting traffic on other normal links, thereby
maintaining service sessions on those links.
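The difference between plain rehashing and the consistency behavior can be sketched as follows, with CRC32 standing in for the device's hash algorithm:

```python
import zlib

def naive_select(flow, links):
    """Plain hash-mod selection: removing any link changes the modulus,
    so flows on healthy links may also be remapped."""
    return links[zlib.crc32(flow.encode()) % len(links)]

def consistent_select(flow, all_links, up_links):
    """Consistency sketch: hash against the full, stable link set and
    re-hash only if the chosen link is down, so flows on healthy links
    keep their original paths."""
    first = all_links[zlib.crc32(flow.encode()) % len(all_links)]
    if first in up_links:
        return first
    # only flows that hashed to the failed link are redistributed
    return up_links[zlib.crc32(flow.encode()) % len(up_links)]
```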
Figure 2 Traffic forwarding based on hash calculation of ECMP load balancing consistency
■ OSPF is configured on Device A, Device B, Device C, Device D, and Device E. OSPF learns three
different routes.
■ Packets that enter Device A through Port 1 and are destined for Device E are forwarded over the three routes according to the configured load balancing mode, implementing load balancing.
In Figure 2, two equal-cost LSPs exist between Device B and Device C so that MPLS load balancing can
be performed.
Two-Level Hash
When the links connecting to next hops are trunk links, traffic that has been hashed by protocol-based load balancing is hashed again based on the trunk forwarding table. This is called two-level hashing.
In Figure 5, traffic is load balanced between Device A and Device B, and between Device B and Device C. If
the two load balancing processes use the same algorithm to calculate the hash key, the same flow is always
distributed to the same link. In this case, traffic forwarding is unbalanced.
Two-level load balancing works as follows:
A random number is introduced into the hash algorithm on each device. Because the random numbers differ between devices, the hash results also differ.
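The per-device random number can be sketched as a seed mixed into the hash key. This is an illustrative example; CRC32 stands in for the unspecified hardware hash function, and the seed values are hypothetical.

```python
# Sketch: avoiding hash polarization in two-level load balancing by mixing
# a device-specific random value into the hash key. Without the seed,
# every device computes the same hash for a flow and repeatedly picks the
# "same" member link.

import zlib

def pick_link(flow_key: bytes, num_links: int, device_seed: int) -> int:
    """Hash the flow key together with a device-specific seed."""
    return zlib.crc32(device_seed.to_bytes(4, "big") + flow_key) % num_links

flow = b"10.0.0.1->10.0.0.2:80"
# Same flow, same algorithm, but a different seed on each device can yield
# a different member-link choice, spreading traffic at the second level.
print(pick_link(flow, 2, device_seed=0x1111), pick_link(flow, 2, device_seed=0x2222))
```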
9.8.4.1.1 Overview
Huawei NE40E can implement load balancing using static routes and a variety of routing protocols, including
the Routing Information Protocol (RIP), RIP next generation (RIPng), Open Shortest Path First (OSPF),
OSPFv3, Intermediate System-to-Intermediate System (IS-IS), and Border Gateway Protocol (BGP).
When multiple dynamic routes participate in load balancing, these routes must have equal metrics. Because metrics can be compared only among routes of the same protocol, only routes of the same protocol can load-balance traffic.
Conditions
When the maximum number of static routes that load-balance traffic and the maximum number of routes
of all types that load-balance traffic are both greater than 1, the following rules apply:
• If N active static routes with the same prefix are available and N is less than or equal to the maximum
number of static routes that can be used to load-balance traffic, traffic is load-balanced among the N
static routes.
• If a static route is active and has N iterative next hops, traffic is load-balanced among N routes, which is
called iterative load balancing.
In Figure 1, R1 learns two OSPF routes to 10.1.1.2/32, both with the cost 2. The outbound interface and next
hop of one route are GE 0/1/0 and 10.1.1.34, and the outbound interface and next hop of the other route
are GE 0/2/0 and 10.1.1.38.
Although only one static route is configured, two iterative next hops (10.1.1.34 and 10.1.1.38) are
available. Therefore, the number of static routes displayed in the routing table is 1, but the number of
FIB entries is 2.
Traffic is load-balanced among three routes, although the cost of the new static route is different from
that of the other two routes.
• If the following command is run to set the priority of the new static route to 1:
ip route-static 10.1.1.45 30 10.1.1.42 preference 1
R1 will preferentially select the static route with next hop 10.1.1.42. As a result, the other static routes
become invalid, and traffic is no longer load-balanced.
Conditions
If the maximum number of OSPF routes that can be used to load-balance traffic and the maximum number
of routes of all types that can be used to load-balance traffic are both greater than 1 and multiple OSPF
routes with the same prefix exist, these routes participate in load balancing only when the following
conditions are met:
• These routes are of the same type (intra-area, inter-area, Type-1 external, or Type-2 external).
• If these routes are Type-2 external routes, the costs of the links to the ASBR or forwarding address are
the same.
• If OSPF route selection specified in relevant standards is implemented, these routes have the same area
ID.
The OSPF route selection rules specified in RFC 2328 differ from those specified in RFC 1583. By default, the Huawei NE40E performs OSPF route selection based on the rules specified in RFC 1583. To implement OSPF route selection based on the rules specified in RFC 2328, run the undo rfc1583 compatible command.
Principles
If the number of OSPF routes available for load balancing is greater than the configured maximum number
of OSPF routes that can be used to load-balance traffic, OSPF selects routes for load balancing in the
following order:
Weight indicates the route preference, and the weight of the next hop can be changed by the nexthop command
(in OSPF view). Routing protocols and their default preferences:
• DIRECT: 0
• STATIC: 60
• IS-IS: 15
• OSPF: 10
• OSPF ASE: 150
• OSPF NSSA: 150
• RIP: 100
• IBGP: 255
• EBGP: 255
Each interface has an index, which can be viewed by running the display interface interface-name command in any view.
Conditions
If the maximum number of IS-IS routes that can be used to load-balance traffic and the maximum number
of routes of all types that can be used to load-balance traffic are both greater than 1 and multiple IS-IS
routes with the same prefix exist, these routes can participate in load balancing only when the following
conditions are met:
Principles
If the number of IS-IS routes available for load balancing is greater than the configured maximum number
of IS-IS routes that can be used to load-balance traffic, IS-IS selects routes for load balancing in the
following order:
Weight indicates the route preference, and the weight of the next hop can be changed by the nexthop command
(in IS-IS view). Routing protocols and their default preferences:
• DIRECT: 0
• STATIC: 60
• IS-IS: 15
• OSPF: 10
• OSPF ASE: 150
• OSPF NSSA: 150
• RIP: 100
• IBGP: 255
• EBGP: 255
Each interface has an index, which can be viewed by running the display interface interface-name command in any view.
6. Routes carrying IPv4, IPv6, and OSI next hop addresses, in descending order
8. If all the preceding items are the same, IS-IS selects the routes that are first calculated for load
balancing.
If the maximum number of BGP routes that can be used to load-balance traffic and the maximum number
of routes of all types that can be used to load-balance traffic are both greater than 1, load balancing can be
performed among BGP routes in either of the following modes:
• By default, static routes or equal-cost IGP routes are used for BGP route recursion to implement load
balancing among BGP routes.
• BGP route attributes are changed and then routes are selected to implement load balancing when the
following conditions are met:
■ The routes have the same origin type (IGP, EGP, or incomplete).
■ All the routes are EBGP or IBGP routes. If the maximum load-balancing eibgp command is run, BGP
ignores this comparison item when selecting the optimal VPN route.
■ The metric values of the IGP routes to which BGP routes within an AS recurse are the same. After
the load-balancing igp-metric-ignore command is run, the device does not compare IGP metric
values when selecting routes for load balancing.
In addition, BGP labeled routes and non-labeled routes cannot load-balance traffic even if they meet
the preceding conditions. Load balancing cannot be implemented between blackhole routes and non-
blackhole routes.
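The eligibility conditions above can be sketched as a simple check. This is an illustrative Python model; the field names (origin, peer_type, igp_metric, labeled) are assumptions for the example, not BGP data structures on the device.

```python
# Sketch: checking whether a set of BGP routes may load-balance traffic
# per the conditions above: same origin type, all EBGP or all IBGP,
# equal IGP metrics after recursion, and no mixing of labeled and
# non-labeled routes. Illustrative only.

from dataclasses import dataclass

@dataclass
class BgpRoute:
    origin: str        # "igp", "egp", or "incomplete"
    peer_type: str     # "ebgp" or "ibgp"
    igp_metric: int    # metric of the IGP route the BGP route recurses to
    labeled: bool      # BGP labeled route or not

def can_load_balance(routes, ignore_igp_metric=False):
    first = routes[0]
    if any(r.origin != first.origin for r in routes):
        return False
    if any(r.peer_type != first.peer_type for r in routes):
        return False
    # load-balancing igp-metric-ignore skips the IGP metric comparison.
    if not ignore_igp_metric and any(r.igp_metric != first.igp_metric for r in routes):
        return False
    if any(r.labeled != first.labeled for r in routes):
        return False
    return True

r1 = BgpRoute("igp", "ibgp", 10, False)
r2 = BgpRoute("igp", "ibgp", 10, False)
r3 = BgpRoute("igp", "ebgp", 10, False)
print(can_load_balance([r1, r2]), can_load_balance([r1, r3]))  # -> True False
```

The ignore_igp_metric flag mirrors the effect of the load-balancing igp-metric-ignore command described above; the EBGP/IBGP check would similarly be relaxed under maximum load-balancing eibgp.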
• Routes advertised by the router with the smallest router ID. If the routes carry the Originator_ID
attribute, BGP selects the routes with the smallest Originator_ID without comparing router IDs.
• Routes that are learned from the BGP peer with the lowest IP address
In this scenario, if route load balancing is implemented on both PE1 and PE2 when CE1 accesses CE2, a traffic loop
occurs between PE1 and PE2. To prevent this problem, you need to configure the POPGO label allocation mode.
Figure 2 Load balancing among VPN unicast routes and leaked routes
The next hops of the routes to be used for UCMP must be of the same type. For example, UCMP can be implemented
only when the next hops of the three routes are all SRv6 TE Policies.
Figure 3 UCMP based on the BGP Link Bandwidth extended community attribute
After a multicast load balancing policy is configured, a multicast router selects equal-cost routes from each routing table on the device, such as the unicast, MBGP, MIGP, and multicast static routing tables. Based on the mask length and priority of each type of equal-cost route, the router selects the routing table on which multicast routing depends, and then load-balances traffic among the equal-cost routes in the selected routing table.
Load balancing can be implemented only between or among the same type of equal-cost routes. For example, load
balancing can be implemented between two MBGP routes but cannot be implemented between an MBGP route and an
MIGP route.
• Label switched path (LSP): includes LDP LSP, BGP LSP, and static LSP.
• Constraint-based Routed label switched path (CR-LSP): includes RSVP-TE CR-LSPs and static CR-LSPs. Compared with LSPs, CR-LSPs meet specified constraints, such as bandwidth or path constraints.
• Generic Routing Encapsulation (GRE) tunnel: GRE-encapsulated data packets are transparently
transmitted over the public IP network.
Generally, MPLS VPNs use LSPs or CR-LSPs as public network tunnels. However, if the core routers (Ps) on the backbone network provide only IP functionality and not MPLS functionality, whereas the PEs at the network edge support MPLS, LSPs or CR-LSPs cannot be used as public network tunnels. In this situation, GRE tunnels can be used for MPLS VPNs.
A tunnel policy determines the tunnel type to be used for a VPN. By default, a VPN uses LSPs to forward
data. To change the tunnel type or configure tunnel load balancing for VPN services, apply a tunnel policy to
the VPN and run the tunnel select-seq command in the tunnel policy view to configure the priority sequence
of tunnels and the number of tunnels used for load balancing.
After the tunnel policy is applied to a VPN, the VPN selects tunnels based on the following rules:
• If two or more CR-LSPs are available, the VPN selects any two of them at random.
• If less than two CR-LSPs are available, the VPN selects all CR-LSPs and also selects LSPs as substitutes to
ensure that two tunnels are available for load balancing.
• If two tunnels have been selected, one CR-LSP and the other LSP, and a CR-LSP is added or a CR-LSP
goes Up from the Down state, the VPN selects the CR-LSP to replace the LSP.
• If the number of existing tunnels for load balancing is smaller than the configured number and a CR-
LSP or LSP in the Up state is added, the newly added tunnel is also used for load balancing.
• If one or more tunnels used for load balancing go Down, the tunnel policy is triggered to re-select
tunnels. The VPN selects LSPs as substitutes to ensure that the configured number of tunnels are used
for load balancing.
• The number of tunnels used for load balancing depends on the number of eligible tunnels. For example,
if there are only one CR-LSP and one LSP in the Up state, load balancing is performed between the two
tunnels. The tunnels of other types are not selected even if they are Up.
• Routes used for load balancing must have equal costs, whereas tunnels used for load balancing can
have unequal costs.
On the network shown in Figure 1, assume that all links have the same route cost. If two routes are
available from PE1 to PE2 for load balancing, these two routes must have the same cost. If two tunnels
are available from PE1 to PE2 for load balancing, these tunnels can have unequal route costs.
Figure 1 Tunnels used for load balancing do not necessarily have the same cost
• Routes used for load balancing must go over different paths, whereas tunnels used for load balancing can go over the same path.
Figure 2 Tunnels used for load balancing are allowed to go over the same path
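The tunnel selection rules above can be sketched as follows. This is an illustrative Python model under assumed names; in particular, it deterministically takes the first CR-LSPs, whereas the device selects among eligible CR-LSPs at random.

```python
# Sketch of the tunnel selection rules: prefer CR-LSPs in the Up state,
# then fill with Up LSPs as substitutes until the configured
# load-balancing number is reached. Illustrative only.

def select_tunnels(cr_lsps_up, lsps_up, lb_number=2):
    selected = cr_lsps_up[:lb_number]          # CR-LSPs first
    if len(selected) < lb_number:              # fall back to LSPs
        selected += lsps_up[:lb_number - len(selected)]
    return selected

# One CR-LSP and two LSPs Up: the CR-LSP plus one LSP substitute.
print(select_tunnels(["crlsp1"], ["lsp1", "lsp2"]))  # -> ['crlsp1', 'lsp1']
# Three CR-LSPs Up, lb_number 2: two CR-LSPs, no LSP is needed.
print(select_tunnels(["crlsp1", "crlsp2", "crlsp3"], ["lsp1"]))
```

Re-running this selection whenever a tunnel goes Up or Down reproduces the replacement behavior described above: a newly Up CR-LSP displaces an LSP substitute.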
Configuring parallel adjacency labels does not affect the allocation of common adjacency labels between IGP
neighbors. After parallel adjacency labels are configured, the involved device advertises multiple adjacency
multiple peer nodes and peer adjacencies, the SID allocated to a peer set maps multiple peer-node SIDs and
peer-Adj SIDs.
On the network shown in Figure 3, ASBR1 and ASBR3 are directly connected through two physical links. An
EBGP peer relationship is established between ASBR1 and ASBR3 through loopback interfaces. ASBR1 runs
BGP EPE to assign the Peer-Node SID 28001 to its peer (ASBR3) and to assign the Peer-Adj SIDs 18001 and
18002 to the physical links. For an EBGP peer relationship established between directly connected physical
interfaces, BGP EPE allocates a Peer-Node SID rather than a Peer-Adj SID. For example, on the network
shown in Figure 3, BGP EPE allocates only Peer-Node SIDs 28002, 28003, and 28004 to the ASBR1-ASBR5,
ASBR2-ASBR4, and ASBR2-ASBR5 peer relationships, respectively.
An Eth-Trunk interface can work in either static LACP mode or manual load balancing mode.
• Static LACP mode: a link aggregation mode that uses the Link Aggregation Control Protocol (LACP) to negotiate parameters and select active links based on the IEEE 802.3ad standard. In static LACP mode, LACP determines the numbers of active and inactive links in a link aggregation group. This mode is also called the M:N mode, with M and N indicating the numbers of primary and backup links, respectively. It provides higher link reliability and allows load balancing to be performed among the M primary links.
On the network shown in Figure 1, three primary links and two backup links with the same attributes
exist between two devices. Traffic is load-balanced among the three primary links, but not along the
two backup links. The actual bandwidth of the aggregated link is the sum of the bandwidths of the
three primary links.
If a link in M links fails, LACP selects one from the N backup links to replace the faulty one to retain
M:N backup. The actual link bandwidth is still the sum of the bandwidths of the M primary links.
If a link cannot be found in the backup links to replace the faulty link and the number of member links
in the Up state falls below the configured lower threshold of active links, the Eth-Trunk interface goes
Down. Then all member interfaces in the Eth-Trunk interface no longer forward data.
An Eth-Trunk interface working in static LACP mode can contain member interfaces at different rates, in
different duplex modes, and on different boards. Eth-Trunk member interfaces at different rates cannot
forward data at the same time. Member interfaces in half-duplex mode cannot forward data.
• Manual load balancing mode: In this mode, you must manually create an Eth-Trunk interface, add
interfaces to the Eth-Trunk interface, and specify active member interfaces. LACP is not involved. All
active member interfaces forward data and perform load balancing.
Traffic can be evenly load-balanced among all member interfaces. Alternatively, you can set the weight
for each member interface to implement uneven load balancing; in this manner, the interface that has a
larger weight transmits a larger volume of traffic. If an active link in a link aggregation group fails,
traffic is balanced among the remaining active links evenly or based on weights, as shown in Figure 2.
An Eth-Trunk interface working in manual load balancing mode can contain member interfaces at
different rates, in different duplex modes, and on different boards.
Hash Algorithm
The hash algorithm uses a hash function to map a binary value of any length to a smaller binary value of a
fixed length. The smaller binary value is the hash value. The device then uses an algorithm to map the hash
value to an outbound interface and sends packets out from this outbound interface.
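The hash-value-to-interface mapping can be sketched as follows. This is an illustrative Python example; CRC32 stands in for the unspecified hardware hash function, and the interface names are hypothetical.

```python
# Sketch: mapping a flow's hash value to one of N outbound interfaces.
# A real device hashes the selected packet fields (hash factors) in
# hardware; CRC32 is only a stand-in here.

import zlib

def outbound_interface(hash_factors: bytes, interfaces):
    hash_value = zlib.crc32(hash_factors)
    return interfaces[hash_value % len(interfaces)]

ifaces = ["GE0/1/0", "GE0/2/0", "GE0/3/0"]
# Packets of the same flow carry the same hash factors, so they always
# map to the same outbound interface (per-flow load balancing).
a = outbound_interface(b"10.1.1.1|10.2.2.2|1024|80|6", ifaces)
b = outbound_interface(b"10.1.1.1|10.2.2.2|1024|80|6", ifaces)
print(a == b)  # -> True
```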
Hash Factor
Traffic is hashed based on traffic characteristics, which are called hash factors.
Traffic characteristics that can be used as hash factors include but are not limited to the following:
• MPLS header: MPLS label and some bits in the MPLS payload
For the default hash factors of the hash algorithm in typical load balancing scenarios, see Appendix: Default Hash Factors.
• Provider edge (PE): an edge device on the provider network, which is directly connected to a CE. The PE
receives IP packets from the CE, encapsulates them with MPLS headers, and then forwards them to the
P. The PE also receives MPLS packets from the P, removes MPLS headers from them, and then forwards
them to the CE.
• Provider (P): a backbone device on the provider network, which is not directly connected to a CE. The P
performs MPLS forwarding.
• Customer edge (CE): an edge device on a user network, which performs IP forwarding.
The PE performs load balancing based on the format of packets received by the CE-side inbound interface
(upstream). Huawei NE40E uses an IP 2-tuple or 5-tuple as hash factors. As such, the load balancing effect
depends on the diversity of private IP addresses and TCP/UDP source and destination port numbers.
The P performs MPLS forwarding and its load balancing algorithm is based on the MPLS packet format.
• Typically, a packet carries no more than four labels. Huawei NE40E supports an IP 5-tuple or 2-tuple as
hash factors. The load balancing effect depends on the diversity of private IP addresses.
• In scenarios such as inter-AS VPN, FRR, and LDP over TE FRR, packets carry more labels. By default,
Huawei NE40E uses the fourth or fifth label for hash calculation. In this case, the load balancing effect
depends on the diversity of the fourth or fifth label.
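The label-depth rule above can be sketched as a label picker. This is an illustrative Python example; the fallback order is an assumption for the sketch, not vendor-specified behavior.

```python
# Sketch: choosing an MPLS label as a hash factor. Per the text, when a
# packet carries many labels the device hashes on the fourth or fifth
# label by default; this toy function just indexes into a label stack.

def hash_label(label_stack, preferred_depth=5):
    """Return the fifth label if present, else the fourth, else the
    bottommost label (illustrative fallback only)."""
    if len(label_stack) >= preferred_depth:
        return label_stack[preferred_depth - 1]
    if len(label_stack) >= 4:
        return label_stack[3]
    return label_stack[-1]

print(hash_label([100, 200, 300, 400, 500, 600]))  # fifth label -> 500
print(hash_label([100, 200, 300]))                 # bottommost label -> 300
```

As the text notes, the load balancing effect then depends entirely on how diverse the values at that stack depth are.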
Load balancing on the egress PE is the same as that in scenario 2 if penultimate hop popping is not
supported and that in scenario 1 if penultimate hop popping is supported.
When an L2VPN accesses an L3VPN, the NPE removes MPLS and Layer 2 frame headers before forwarding
packets through L3 outbound interfaces. The load balancing algorithm is the same as that in scenario 1.
• Provider edge (PE): an edge device on the provider network, which is directly connected to a CE. The PE
receives Ethernet frames from the CE, encapsulates them with MPLS headers, and then sends them to
the P. The PE also receives MPLS packets from the P, removes MPLS headers from them, and then sends
the corresponding Ethernet frames to the CE.
• Provider (P): a backbone device on the provider network, which is not directly connected to a CE. The P
performs MPLS forwarding.
• Customer edge (CE): an edge device on a user network, which performs Layer 2 Ethernet/VLAN
forwarding.
The load balancing algorithm on the ingress PE (AC -> MPLS) is based on the type of traffic received from
the AC interface.
• For IP packets, hash calculation can be performed based on an IP 2-tuple or 5-tuple. The load balancing
effect depends on the diversity of private IP addresses of the packets.
• For non-IP over Ethernet packets, hash calculation is performed based on the source and destination
MAC addresses. The load balancing effect depends on the diversity of private MAC addresses of the
packets. Certain boards support hash calculation based on the 3-tuple <source MAC address, destination
MAC address, VC label>.
The P performs MPLS forwarding (MPLS -> MPLS) and its load balancing algorithm is based on the MPLS
packet format.
• Typically, a packet carries no more than four labels. Huawei NE40E supports an IP 5-tuple or 2-tuple as
hash factors. The load balancing effect depends on the diversity of private IP addresses.
• In scenarios such as inter-AS VPN, FRR, and LDP over TE FRR, packets carry more labels. By default,
Huawei NE40E uses the fourth or fifth label for hash calculation. In this case, the load balancing effect
depends on the diversity of the fourth or fifth label.
• If the inbound interface is a public network interface (MPLS - > AC), certain boards support hash
calculation based on an IP 5-tuple or 2-tuple, and certain boards support hash calculation based only on
the source and destination MAC addresses.
• If the inbound interface is a private network interface (AC -> AC), the load balancing algorithm is the
same as that in scenario 1.
When an L2VPN accesses an L3VPN, load balancing among L2 outbound interfaces on the NPE is the same
as that in scenario 1.
• Provider edge (PE): an edge device on the provider network, which is directly connected to a CE. The PE
receives Ethernet frames from the CE, encapsulates them with MPLS headers, and then sends them to
the P. The PE also receives MPLS packets from the P, removes MPLS headers from them, and then sends
the corresponding Ethernet frames to the CE.
• Provider (P): a backbone device on the provider network, which is not directly connected to a CE. The P
performs MPLS forwarding.
• Customer edge (CE): an edge device on a user network, which performs VLAN or IP forwarding.
The load balancing algorithm on the ingress PE is based on the type of traffic received from the AC interface.
• For IP packets, hash calculation can be performed based on an IP 2-tuple or 5-tuple. The load balancing
effect depends on the diversity of private IP addresses of the packets.
• For non-IP over Ethernet packets, hash calculation is typically performed based on the source and
destination MAC addresses. The load balancing effect depends on the diversity of private MAC
addresses of the packets.
• For non-Ethernet packets, most boards use the VC label for hash calculation.
The P performs MPLS forwarding (MPLS -> MPLS) and its load balancing algorithm is based on the MPLS
packet format.
• Typically, a packet carries no more than four labels. Huawei NE40E supports an IP 5-tuple or 2-tuple as
hash factors. The load balancing effect depends on the diversity of private IP addresses.
• In scenarios such as inter-AS VPN, FRR, and LDP over TE FRR, packets carry more labels. By default,
Huawei NE40E uses the fourth or fifth label for hash calculation. In this case, the load balancing effect
depends on the diversity of the fourth or fifth label.
The egress PE supports only trunk load balancing because the VC of VLL/PWE3 is P2P.
• If the inbound interface is a private network interface (AC -> AC), the load balancing algorithm is the
same as that in scenario 1.
• If the inbound interface is a public network interface (MPLS - > AC), certain boards support hash
calculation based on an IP 5-tuple or 2-tuple, certain boards support hash calculation based on the VC
label, and certain boards do not support hash calculation.
When an L2VPN accesses an L3VPN, the NPE removes MPLS headers before forwarding packets through L2
outbound interfaces. The load balancing algorithm is the same as that in scenario 1.
• L2TP Access Concentrator (LAC): a network device capable of PPP and L2TP. It is usually an ISP's access
device that provides access services for users over the PSTN/ISDN. An LAC uses L2TP to encapsulate the
packets received from users before sending them to an LNS and decapsulates the packets received from
• L2TP Network Server (LNS): a network device that accepts and processes L2TP tunnel requests. Users
can access VPN resources after they have been authenticated by the LNS. An LNS and an LAC are two
endpoints of an L2TP tunnel. The LAC initiates an L2TP tunnel, whereas the LNS accepts L2TP tunnel
requests. An LNS is usually deployed as an enterprise gateway or a PE on an IP public network.
• Transit node: a transmission device on the transit network between an LAC and an LNS. Various types
of networks can be used as the transit networks, such as IP or MPLS networks.
• Control message: used to establish, maintain, or tear down L2TP tunnels and sessions. The format of an L2TP control message is shown in Figure 2.
If the transit nodes of an L2TP tunnel perform per-packet load balancing, L2TP control messages may arrive out of order, which may cause L2TP tunnel establishment to fail.
• Data message: used to transmit PPP frames over an L2TP tunnel. Data messages are not retransmitted if lost. The format of an L2TP data message is shown in Figure 3.
tunnel address of the remote LNS. That is, the source and destination IP addresses of the new IP header are fixed. Therefore, the L2TP traffic belongs to the same flow. The load balancing result depends on the number of L2TP tunnels (Tunnel IDs) or sessions (Session IDs) carrying the traffic: the more L2TP tunnels or sessions, the better the load balancing result.
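The dependence on tunnel and session counts can be sketched as follows. This is an illustrative Python example; CRC32 stands in for the transit node's hash function, and session IDs are arbitrary.

```python
# Sketch: why more L2TP tunnels/sessions improve load balancing. With the
# outer IP header fixed between one LAC and one LNS, a transit node can
# only differentiate flows by Tunnel ID and Session ID; here we count how
# many member links actually receive traffic.

import zlib

def links_used(session_ids, num_links):
    return len({zlib.crc32(sid.to_bytes(4, "big")) % num_links
                for sid in session_ids})

# One session: a single link carries all traffic; many sessions spread
# the traffic across more links.
print(links_used([1], 4), links_used(list(range(1, 101)), 4))
```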
GTP Scenario
Load balancing in the GTP scenario is similar to that in the L2TP scenario. The transit node performs load
balancing based on the IP address in the IP header and the tunnel endpoint identifier (TEID) in the GTP
header.
Figure 1 Transmitting data of multi-protocol local networks through the single-protocol backbone network
In the preceding scenarios, the source and destination IP addresses of all packets in a GRE tunnel are the source and destination addresses of the GRE tunnel itself. Therefore, on any transit node or on the egress node of the GRE tunnel, the TTLs in the outer IP headers of the GRE packets are the same. If a flow is carried by only one GRE tunnel and the load balancing mode is per-flow, load balancing is unavailable. In this case, creating multiple GRE tunnels to carry the flow is recommended.
• 4-tuple <source IP address, destination IP address, source port number, destination port number>,
• 5-tuple <source IP address, destination IP address, source port number, destination port number, and
protocol number>
Therefore, the load balancing result depends on the IP addresses and TCP/UDP port numbers of the traffic.
The default hash factors of IP unicast traffic depend on the type of the inbound board.
• Stable-preferred
Based on this policy, a multicast router distributes (*, G) and (S, G) entries among their corresponding equal-cost routes. Therefore, stable-preferred is similar to the balance-preferred policy. This policy automatically adjusts load balancing when equal-cost routes are deleted. However, dynamic load balancing adjustment is not performed when multicast routing entries are deleted or when the weights of load balancing routes change.
This policy applies to networks with stable multicast services.
A VXLAN tunnel is determined by a pair of VTEP IP addresses. When a local VTEP receives the same remote VTEP IP
address repeatedly, only one VXLAN tunnel can be established, but packets are encapsulated with different VNIs before
being forwarded through the tunnel.
In the preceding distributed gateway scenario, the ingress (Leaf1) of the tunnel encapsulates VXLAN headers
into packets and then forwards them through the tunnel. If there are multiple equal-cost links in the tunnel,
hash calculations in different scenarios are as follows:
• When Host3 communicates with Host2 and Leaf1 functions as a Layer 2 gateway (that is, in a VXLAN
Layer 2 forwarding scenario), by default, the packets passing through Leaf1 (Host3's original packets
encapsulated with VXLAN headers) are hashed based on the VNI and MAC addresses (source and
destination MAC addresses of Host3). To implement a hash calculation based only on the VNI, perform
global configuration on Leaf1 connected to the VXLAN tunnel.
• When Host1 communicates with Host2 and Leaf1 functions as a Layer 3 gateway (that is, in a VXLAN
Layer 3 forwarding scenario), by default, the packets passing through Leaf1 (Host1's original packets
encapsulated with VXLAN headers) are hashed based on the VNI and IP addresses (source and
destination IP addresses of Host1). To implement a hash calculation based only on the VNI, perform
global configuration on Leaf1 connected to the VXLAN tunnel.
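The two hash-factor choices described above can be sketched as follows. This is an illustrative Python example; CRC32 and the field layout are stand-ins, and the vni_only flag models the optional global configuration, not an actual command.

```python
# Sketch: VXLAN hash factors on the gateway. In Layer 2 forwarding the
# hash covers the VNI plus the inner source/destination MAC addresses;
# in Layer 3 forwarding, the VNI plus the inner source/destination IP
# addresses. With vni_only set, only the VNI contributes.

import zlib

def vxlan_hash_key(vni, inner_src, inner_dst, vni_only=False):
    """inner_src/inner_dst are MAC strings (L2 gateway) or IP strings
    (L3 gateway)."""
    if vni_only:
        fields = (str(vni),)
    else:
        fields = (str(vni), inner_src, inner_dst)
    return zlib.crc32("|".join(fields).encode())

# Layer 2 gateway: VNI + inner MAC addresses.
l2 = vxlan_hash_key(5000, "aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02")
# Layer 3 gateway: VNI + inner IP addresses.
l3 = vxlan_hash_key(5000, "10.1.1.1", "10.1.1.2")
# With vni_only, all flows of a VNI hash to the same value.
print(vxlan_hash_key(5000, "x", "y", vni_only=True) ==
      vxlan_hash_key(5000, "p", "q", vni_only=True))  # -> True
```

Note the trade-off the sketch makes visible: hashing only on the VNI pins all traffic of one VNI to a single link, which is exactly why the VNI-plus-address factors are the default.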
(MPLS P node) (MPLS -> MPLS): "The inner layer is the IP header" means that the MPLS label stack is followed by the IP header (for example, an MPLS L3VPN packet) or that only the L2 Ethernet header is carried between the MPLS label stack and the IP header (for example, a VPLS packet). The inner layer is not the IP header in other cases, for example, when VLL is carried over MPLS and the packet carries control word + Ethernet header + IP header.
EVPN inbound tunnel (AC -> MPLS, AC -> SRv6):
• TCP/UDP: 5-tuple <source IP address, destination IP address, source port number, destination port number, protocol number>
• Non-TCP/Non-UDP: 3-tuple <source IP address, destination IP address, protocol number>
EVPN VPWS inbound tunnel (AC -> MPLS, AC -> SRv6):
• TCP/UDP over IP: 5-tuple <source IP address, destination IP address, source port number, destination port number, protocol number>
• Non-TCP/Non-UDP over IP: 3-tuple <source IP address, destination IP address, protocol number>
EVPN VPWS inbound tunnel (AC -> MPLS):
• Non-IP over Ethernet: When the traffic type is MPLS over Ethernet+non-IP, the hash factors vary according to the number of MPLS labels. If the number of labels is 5 or less, the hash factors are the innermost label plus the 12 bytes following the bottommost label. In this scenario, the same traffic may be hashed to multiple outbound interfaces, causing packet out-of-order. You are advised to run the load-balance hash-fields vll label-ip command to solve this problem. If the number of labels is greater than 5, the hash factors are the five outermost labels.
L3 forwarding (including IPv4 L3VPN inbound/outbound tunnels), IPv4/IPv6 unicast:
• TCP/UDP: 5-tuple <source IP address, destination IP address, source port number, destination port number, protocol number>
• Non-TCP/Non-UDP: 3-tuple <source IP address, destination IP address, protocol number>
Bridging/VPLS inbound/outbound tunnel (AC -> MPLS, AC -> AC):
• IPv4/IPv6, TCP/UDP: 5-tuple <source IP address, destination IP address, source port number, destination port number, protocol number>
• IPv4/IPv6, Non-TCP/Non-UDP: 3-tuple <source IP address, destination IP address, protocol number>
• IP over Ethernet, TCP/UDP: 5-tuple <source IP address, destination IP address, source port number, destination port number, protocol number>
• IP over Ethernet, Non-TCP/Non-UDP: 3-tuple <source IP address, destination IP address, protocol number>
VLL outbound tunnel (MPLS -> AC):
• IPv4/IPv6, TCP/UDP: 5-tuple <source IP address, destination IP address, source port number, destination port number, protocol number>
• IPv4/IPv6, Non-TCP/Non-UDP: 3-tuple <source IP address, destination IP address, protocol number>
• Non-IP: VC label
Definition
Unequal cost multipath (UCMP) distributes traffic according to the bandwidth ratio of multiple unequal-cost paths that point to the same destination and have the same precedence. Each path carries traffic in proportion to its bandwidth, achieving optimal load balancing.
Purpose
When equal-cost routes have multiple outbound interfaces that connect to both high-speed links and low-
speed links, equal cost multipath (ECMP) evenly distributes traffic among links to a destination, regardless of
the difference between link bandwidths. When the link bandwidths differ greatly, low-bandwidth links may
be congested, whereas high-bandwidth links may be idle. To fully utilize bandwidths of different links, traffic
must be balanced according to the bandwidth ratio of these links.
• GE interfaces
• POS interfaces
• Serial interfaces
Precautions
• If interface-based UCMP is enabled, global UCMP cannot be enabled. Similarly, if global UCMP is
enabled, interface-based UCMP cannot be enabled.
• The bandwidth accuracy for the interface board is Mbit/s, which supports high-speed links.
• After enabling UCMP on an interface, you must run the shutdown and then undo shutdown commands on the interface, which interrupts traffic. Global UCMP does not require this operation and therefore avoids the interruption.
• GigabitEthernet
• POS
• Eth-Trunk
• IP-Trunk
• Serial
• MP-Group
• If any outbound interface does not support UCMP, UCMP does not take effect after being enabled globally. In this case, traffic is still evenly load-balanced among the paths.
• If all outbound interfaces support UCMP, enabling UCMP globally triggers all routes that carry the
bandwidth of each outbound interface to be delivered to the interface board. The bandwidth is
delivered in the same way as interface-based UCMP. The interface board then calculates the traffic
distribution ratio based on the bandwidth of each outbound interface.
• Processing interface bandwidth changes takes time, so frequent bandwidth changes may keep the CPU busy. To avoid this problem, you can set an interval for reporting interface bandwidth changes to interface boards. If the interface bandwidth changes multiple times within the interval, only the latest bandwidth is reported to the interface boards.
Precautions
If global UCMP is enabled, interface-based UCMP cannot be enabled. Similarly, if interface-based UCMP is
enabled, global UCMP cannot be enabled.
9.9.3 Applications
interfaces are 10 Gbit/s, 1 Gbit/s, and 1 Gbit/s, respectively. Three IPv4 equal-cost routes are available
between DeviceA and DeviceB.
When UCMP is not enabled on the three interfaces, their traffic ratio is 1:1:1.
After UCMP is enabled on the three interfaces, the traffic ratio of the three interfaces approaches the
bandwidth ratio 10:1:1.
When UCMP is not enabled on the three interfaces, their traffic ratio is 1:1:1, irrespective of the bandwidth
ratio.
After global UCMP is enabled, traffic from DeviceA to DeviceB is load-balanced on the three outbound
interfaces, and the traffic ratio approaches the bandwidth ratio 3:1:1.
When a member interface of Eth-Trunk 1 is shut down, the bandwidth of Eth-Trunk 1 changes to 2 Gbit/s
and accordingly the bandwidth ratio of the three outbound interfaces is 2:1:1 for load balancing.
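The bandwidth-proportional distribution in this example can be sketched as follows. This is a conceptual illustration only, not the device algorithm, and ucmp_shares is a hypothetical helper.

```python
from fractions import Fraction

# Conceptual sketch of UCMP only (not the device algorithm): each equal-cost
# next hop carries traffic in proportion to its outbound-interface bandwidth.
def ucmp_shares(bandwidths_mbps):
    total = sum(bandwidths_mbps)
    return [Fraction(b, total) for b in bandwidths_mbps]

# Eth-Trunk 1 at 3 Gbit/s plus two 1 Gbit/s links, as in the example above:
print(ucmp_shares([3000, 1000, 1000]))  # shares in the ratio 3:1:1
# After one trunk member interface is shut down, Eth-Trunk 1 drops to 2 Gbit/s:
print(ucmp_shares([2000, 1000, 1000]))  # shares in the ratio 2:1:1
```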
When interfaces support UCMP, the bandwidths of equal-cost routes are displayed in the FIB table. By
calculating the bandwidth ratio of interfaces, you can see whether the bandwidth ratio approaches the
traffic ratio. In this way, you can learn whether UCMP functions normally.
Terms
None
Definition
Internet Protocol version 4 (IPv4) is the core protocol of the Transmission Control Protocol (TCP)/IP protocol
suite. It works at the Internet layer of the TCP/IP model. This layer corresponds to the network layer in the
OSI model. At the IP layer, information is divided into data units, and address and control information is
added to allow datagrams to be routed.
IP provides unreliable and connectionless data transmission services. Unreliable transmission means that IP
does not ensure that IP datagrams successfully arrive at their destination. IP only provides best effort
delivery. Once an error occurs, for example, a router exhausts the buffer, IP discards the excess datagrams
and sends ICMP messages to the source. The upper-layer protocols, such as TCP, are responsible for
resolving reliability issues.
Connectionless transmission means that IP does not maintain status information for subsequent datagrams.
Every datagram is processed independently, meaning that IP datagrams may not be received in the same
order they are sent. If a source sends two consecutive datagrams A and B in sequence to the same
destination, each datagram is possibly routed over a different path, and therefore B may arrive ahead of A.
Application
Each host on an IP network must have an IP address. An IP address is 32 bits long and consists of two parts:
network ID and host ID.
• A network ID uniquely identifies a network segment or a group of network segments. A network ID can
be obtained by converting an IP address and subnet mask into binary numbers and performing an AND
operation on the numbers.
• A host ID uniquely identifies a device on a network segment. A host ID can be obtained by converting
an IP address and subnet mask into binary numbers, reversing the post-conversion subnet mask, and
performing an AND operation on the numbers.
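The two AND operations above can be illustrated with Python's standard ipaddress module; the address 192.168.1.10 with mask 255.255.255.0 is an arbitrary example.

```python
import ipaddress

# Worked example of the two AND operations above, using 192.168.1.10 with
# subnet mask 255.255.255.0 (an arbitrary illustrative address).
addr = int(ipaddress.ip_address("192.168.1.10"))
mask = int(ipaddress.ip_address("255.255.255.0"))

network_id = addr & mask                  # AND the address with the mask
host_id = addr & (~mask & 0xFFFFFFFF)     # AND with the inverted mask

print(ipaddress.ip_address(network_id))   # 192.168.1.0
print(host_id)                            # 10
```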
If multiple devices on a network segment have the same network ID, they belong to the same network,
regardless of their physical locations.
Purpose
IPv4 shields the differences among data link layers and provides a uniform format for datagrams passed to the upper layer.
9.10.2.1 ICMP
The Internet Control Message Protocol (ICMP) is an error-reporting mechanism and is used by IP or an
upper-layer protocol (TCP or UDP). An ICMP message is encapsulated as a part of an IP datagram and
transmitted through the Internet.
An IP datagram contains information about only the source and destination, not about all nodes along the
entire path through which the IP datagram passes. The IP datagram can record information about all nodes
along the path only when route record options are set in the IP datagram. Therefore, if a device detects an
error, it reports the error to the source and not to intermediate devices.
When an error occurs during the IP datagram forwarding, ICMP reports the error to the source of the IP
datagram, but does not rectify the error or notify the intermediate devices of the error. A majority of errors
generally occur on the source. When an error occurs on an intermediate device, however, the source cannot
locate the device on which the error occurs even after receiving the error report.
When a routing device forwards a packet that meets the following conditions, it discards the packet and returns an ICMP Net Unreachable message to the source address to inform the source host to stop sending packets to this destination.
9.10.2.2 TCP
The Transmission Control Protocol (TCP) defined in standard protocols ensures high-reliability transmission
between hosts. TCP provides reliable, connection-oriented, and full-duplex services for user processes. TCP
transmits data as a sequenced, unstructured byte stream.
TCP is an end-to-end, connection-oriented, and reliable protocol. TCP supports multiple network
applications. In addition, TCP assumes that the lower layer provides only unreliable datagram services, and it
can run over a network of different hardware structures.
Figure 1 shows the position of TCP in a layered protocol architecture, where TCP is above IP. TCP can
transmit variable-length data through IP encapsulation. IP then performs data fragmentation and assembly
and transmits the data over multiple networks.
TCP works below applications and above IP. Its upper-layer interface consists of a series of calls similar to
the interrupt call of an operating system.
TCP can asynchronously transmit data for upper-layer applications and assumes that the lower-layer interface is provided by IP. To implement connection-oriented and reliable data transmission over unreliable networks, TCP must provide the following:
• Connection assurance
Figure 2 shows the process of setting up and tearing down a TCP connection.
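The connection-oriented byte-stream service described above can be demonstrated with a minimal localhost sketch (illustrative only): the three-way handshake runs inside connect() and accept(), and closing the sockets triggers the teardown.

```python
import socket
import threading

# Minimal localhost sketch of TCP's connection-oriented byte-stream service.
# The three-way handshake happens inside connect()/accept(); closing the
# sockets triggers the four-way teardown. Port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()          # completes the three-way handshake
    conn.sendall(b"hello over a reliable byte stream")
    conn.close()                       # begins the four-way teardown

t = threading.Thread(target=serve)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))    # SYN, SYN-ACK, ACK
data = b""
while True:                            # read the stream until the peer closes
    chunk = client.recv(1024)
    if not chunk:
        break
    data += chunk
client.close()
t.join()
server.close()
print(data)
```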
9.10.2.3 UDP
The User Datagram Protocol (UDP) is a computer communication protocol that provides packet switching
services on the Internet. By default, UDP uses IP as the lower-layer protocol. UDP provides the simplest
protocol mechanism for sending information to a user application. UDP is transaction-oriented and does not guarantee delivery or protect against duplication. Applications that require reliable data transmission use TCP instead. Figure 1 shows the format of a UDP datagram.
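The connectionless, message-oriented behavior described above can be demonstrated with a short localhost sketch (illustrative only; loopback delivery is used so the datagrams reliably arrive in order).

```python
import socket

# Sketch of UDP's connectionless, message-oriented service: each sendto() is
# one datagram, and recvfrom() returns exactly one datagram, so message
# boundaries are preserved (unlike TCP's byte stream).
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
port = rx.getsockname()[1]

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"first", ("127.0.0.1", port))    # no handshake, no connection state
tx.sendto(b"second", ("127.0.0.1", port))

first = rx.recvfrom(1024)[0]       # one datagram per call
second = rx.recvfrom(1024)[0]
tx.close()
rx.close()
print(first, second)
```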
9.10.2.4 RawIP
RawIP only fills in certain fields of an IP header and allows an application to provide its own IP header.
Similar to UDP, RawIP is unreliable. No control mechanism is available to verify whether a RawIP datagram
is received. RawIP is connectionless, and it transmits data between hosts without establishing a virtual circuit of any type. Unlike UDP, RawIP allows application data to be directly processed at the IP layer through a socket.
This is helpful to the applications that need to directly communicate with the IP layer.
9.10.2.5 Socket
A socket consists of a set of application programming interfaces (APIs) working between the transport layer
and application layer. The socket shields differences of transport layer protocols and provides the uniform
programming interfaces for the application layer. In this manner, the application layer, being exempt from
the detailed process of the TCP/IP protocol suite, can transmit data over IP networks by calling socket
functions. Figure 1 shows the position of the socket in the TCP/IP protocol stack.
The following types of sockets are supported by different protocols at the transport layer:
• TCP-based socket: provides reliable byte-stream communication services for the application layer.
• UDP-based socket: supports connectionless and unreliable data transmission for the application layer
and preserves datagram boundaries.
• RawIP socket: also called raw socket. Similar to the UDP-based socket, the RawIP socket supports
connectionless and unreliable data transmission and preserves datagram boundaries. The RawIP socket
is unique in that it can be used by applications to directly access the network layer.
• Link layer-based socket: used by Intermediate System to Intermediate System (IS-IS) to directly access
the link layer.
9.10.2.6 DSCP
The Internet Engineering Task Force (IETF) redefined the type of service (ToS) for IPv4 packets and Traffic
Class (TC) for IPv6 packets as the Differentiated Service (DS) field for the DiffServ model. The value of the
DS field is the DiffServ code point (DSCP) value. This is shown in Figure 1.
In an IPv4 packet, the six left-most bits (0 to 5) in the DS field are defined as the DSCP value, and the two
right-most bits (6 and 7) are reserved bits. Bits 0 to 2 are the Class Selector Code Point (CSCP) value,
indicating a class of DSCP. Devices that support the DiffServ function perform forwarding behaviors for
packets based on the DSCP value.
An IPv6 packet contains the Traffic Class field. The Traffic Class field is 8 bits long and functions the same as
the ToS field in an IPv4 packet to identify the service type.
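The bit layout above implies a simple extraction rule, sketched here. The helper names are illustrative; DSCP 46 is the standard EF code point.

```python
# Extraction helpers implied by the bit layout above (names are illustrative):
# the DSCP is the six high-order bits of the IPv4 ToS byte (or the IPv6
# Traffic Class byte); the three high-order bits of the DSCP are the CSCP.
def dscp_from_tos(tos_byte):
    return tos_byte >> 2      # drop the two reserved low-order bits

def class_selector(dscp):
    return dscp >> 3          # CSCP: the DSCP's class bits

tos = 0xB8                    # a ToS byte carrying DSCP 46 (EF)
print(dscp_from_tos(tos), class_selector(dscp_from_tos(tos)))  # 46 5
```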
Generally, each protocol has a default DSCP value, and the DSCP values of some protocols can be configured using the
host-packet type command or the corresponding commands for changing the DSCP values of the protocols. In this case,
the rules for the DSCP values to take effect are as follows:
• If a protocol has its own command for changing the DSCP value, the DSCP value configured using its own
command takes effect regardless of whether the DSCP value is controlled by the host-packet type command.
• If a protocol does not have its own command for changing the DSCP value and the DSCP value is controlled by the
host-packet type command, the DSCP value configured using the command takes effect.
• If a protocol does not have its own command for changing the DSCP value and the DSCP value is not controlled by
the host-packet type command, the default DSCP value takes effect.
For details about the DSCP value and meaning corresponding to each PHB, see DSCP and PHB.
ICMP_ECHO_REPLY    0     No    N/A
DNS                0     No    N/A
RADIUS             48    No    N/A
IGMP               48    No    N/A
PIM                48    No    N/A
IKE                48    No    N/A
RSVP-TE            48    No    N/A
MSDP               48    No    N/A
ICMP6_ECHO_REPLY   Copied from the TC/DSCP value of an ICMP6_ECHO message    No    N/A
ND (NS/NA/RS/RA)   48    No    N/A
DNSv6              0     No    N/A
TFTPv6 SERVER      N/A   No    N/A
HWTACACS           48    No    N/A
RADIUS             48    No    N/A
MLD                48    No    N/A
PIMv6              48    No    N/A
DHCPv6             48    No    N/A
In the inbound direction, you can control the following ICMP messages:
In the outbound direction, you can control the following ICMP messages:
If you disable the sending or receiving of ICMP messages, the Router does not send or receive any ICMP
message. This reduces network traffic and Router burden and prevents malicious attacks.
Alternatively, you can limit the ICMP message rate and configure the Router to discard ICMP messages with
the TTL 1 and ICMP messages that carry options. This reduces Router burden.
Definition
Internet Protocol version 6 (IPv6), also called IP Next Generation (IPng), is the second-generation standard
protocol of network layer protocols. As a set of specifications defined by the Internet Engineering Task Force
(IETF), IPv6 is the upgraded version of Internet Protocol version 4 (IPv4).
The most significant difference between IPv6 and IPv4 is that IP addresses are lengthened from 32 bits to
128 bits. Featuring a simplified header format, sufficient address space, hierarchical address structure,
flexible extended header, and an enhanced neighbor discovery (ND) mechanism, IPv6 has a competitive
future in the market.
Purpose
IP technology has become widely applied due to the great success of the IPv4 Internet. As the Internet
develops, however, IPv4 weaknesses have become increasingly obvious in the following aspects:
IPv6 solves the problem of IP address shortage and has the following advantages:
• Easy to deploy.
With so many obvious advantages over IPv4, IPv6 has developed rapidly.
• X:X:X:X:X:X:X:X
■ IPv6 addresses in this format are written as eight groups of four hexadecimal digits (0 to 9, A to F),
each group separated by a colon (:). Every "X" represents a group of hexadecimal digits. For
example, 2001:db8:130F:0000:0000:09C0:876A:130B is a valid IPv6 address.
For convenience, any zeros at the beginning of a group can be omitted; therefore, the given
example becomes 2001:db8:130F:0:0:9C0:876A:130B.
■ Any number of consecutive groups of 0s can be replaced with two colons (::). Therefore, the given
example can be written as 2001:db8:130F::9C0:876A:130B.
This double-colon substitution can only be used once in an address; multiple occurrences would be
ambiguous.
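These zero-suppression rules can be checked with Python's standard ipaddress module, which also parses the IPv4-mapped form described next.

```python
import ipaddress

# The zero-suppression rules above, applied by the standard library.
full = "2001:db8:130F:0000:0000:09C0:876A:130B"
addr = ipaddress.IPv6Address(full)
print(addr.compressed)       # 2001:db8:130f::9c0:876a:130b

# An IPv4-mapped IPv6 address (0:0:0:0:0:FFFF:d.d.d.d form):
mapped = ipaddress.IPv6Address("::ffff:192.0.2.1")
print(mapped.ipv4_mapped)    # 192.0.2.1
```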
• X:X:X:X:X:X:d.d.d.d
IPv4-mapped IPv6 address: The format of an IPv4-mapped IPv6 address is 0:0:0:0:0:FFFF:IPv4-address.
IPv4-mapped IPv6 addresses are used to represent IPv4 node addresses as IPv6 addresses.
"X:X:X:X:X:X" represents the high-order six groups of digits, each "X" standing for 16 bits represented by
hexadecimal digits. "d.d.d.d" represents the low-order four groups of digits, each "d" standing for 8 bits
represented by decimal digits. "d.d.d.d" is a standard IPv4 address.
• Unicast address: identifies a single network interface and is similar to an IPv4 unicast address. A packet
sent to a unicast address is transmitted to the unique interface identified by this address.
A global unicast address cannot be the same as its network prefix because an IPv6 address which is the
same as its network prefix is a subnet-router anycast address reserved for a device. However, this rule
does not apply to an IPv6 address with a 127-bit network prefix.
• Anycast address: assigned to a group of interfaces, which usually belong to different nodes. A packet
sent to an anycast address is transmitted to only one of the member interfaces, typically the nearest
according to the routing protocol's choice of distance.
Application scenario: When a mobile host communicates with the mobile agent on the home subnet, it
uses the anycast address of the subnet Router.
Addresses specifications: Anycast addresses do not have independent address space. They can use the
format of any unicast address. Syntax is required to differentiate an anycast address from a unicast
address.
As IPv6 defines, an IPv6 address with the interface identifier of all 0s is a subnet-router anycast address.
As shown in Figure 2, the subnet prefix is an IPv6 unicast address prefix which is specified during
configuration of an IPv6 unicast address.
An anycast address is not necessarily a subnet-router anycast address and can also be a global unicast address.
• Multicast address: assigned to a set of interfaces that belong to different nodes and is similar to an IPv4
multicast address. A packet that is sent to a multicast address is delivered to all the interfaces identified
by that address.
IPv6 addresses do not include broadcast addresses. In IPv6, multicast addresses can provide the
functions of broadcast addresses.
• Link-local unicast address: used in the neighbor discovery protocol and in the communication between
nodes on the local link during stateless address autoconfiguration. The packet with the link-local unicast
address as the source or destination address is only forwarded on the local link. The link-local unicast
address can be automatically configured on any interface using the link-local prefix FE80::/10 (1111
1110 10), and the interface identifier in IEEE EUI-64 format (an EUI-64 can be derived from an EUI-48).
• Unique Local unicast address: is globally unique and intended for local communication. Unique local
unicast addresses are not expected to be routable on the global internet. They are routable inside a site
and also possibly between a limited set of sites. These addresses are not auto-configured. A unique local
unicast address consists of a 7-bit prefix, a 41-bit global ID (including the L bit which is one bit), a 16-
bit subnet ID, and a 64-bit interface ID.
• Loopback address: is 0:0:0:0:0:0:0:1 or ::1 and not assigned to any interface. Similar to the IPv4 loopback
address 127.0.0.1, the IPv6 loopback address indicates that a node sends IPv6 packets to itself.
• Unspecified address (::): can neither be assigned to any node nor function as the destination address.
The unspecified address can be used in the Source Address field of the IPv6 packet sent by an initializing
host before it has learned its own address. During Duplicate Address Detection (DAD), the Source
Address field of a Neighbor Solicitation (NS) packet is an unspecified address.
• Global unicast address: equivalent to an IPv4 public network address. Global unicast addresses are used
on links that can be aggregated, and are provided to the Internet Service Provider (ISP). The structure of
this type of address enables route-prefix aggregation to solve the problem of a limited number of
global routing entries. A global unicast address consists of a 48-bit route prefix managed by operators,
a 16-bit subnet ID managed by local nodes, and a 64-bit interface ID. Unless otherwise specified, global
unicast addresses include site-local unicast addresses.
• For Layer 3 physical interfaces and sub-interfaces, the EUI-64 address is generated based on the MAC
address of a physical interface, with FFFE added in the middle.
• For loopback interfaces, VBDIF interfaces, and tunnel interfaces, the EUI-64 address is generated based on the MAC address of an interface, with the last two bytes following the interface index added in the middle.
• For Eth-Trunk interfaces and their sub-interfaces, Global-VE sub-interfaces, VE sub-interfaces, and VLANIF interfaces, the EUI-64 address is generated based on the MAC address of an interface, with FFFE added in the middle.
Taking the insertion of a hexadecimal number FFFE (1111 1111 1111 1110) into the middle of a MAC
address as an example, see Figure 3 for the detailed conversion procedure.
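The conversion can be sketched as follows. This illustrative helper also inverts the universal/local bit (bit 1 of the first MAC byte), as the IEEE EUI-64-based interface ID format requires; the MAC address used is an arbitrary example.

```python
# Sketch of the modified EUI-64 conversion: split the 48-bit MAC address in
# half, insert FFFE in the middle, and invert the universal/local bit (bit 1
# of the first byte) to form the 64-bit interface ID.
def eui64_interface_id(mac):
    b = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    eui = bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:6]
    groups = [eui[i:i + 2].hex() for i in range(0, 8, 2)]
    return ":".join(g.lstrip("0") or "0" for g in groups)

print(eui64_interface_id("00:1E:10:38:01:05"))  # 21e:10ff:fe38:105
```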
■ In stateful address autoconfiguration, the host obtains the address and configuration from a server.
■ In stateless address autoconfiguration, the host automatically configures an IPv6 address that
contains the prefix advertised by the local Router and interface ID of the host. If no Router exists
on the link, the host can only configure the link-local address automatically to interwork with local
nodes.
6. Prefer an address whose label value is the same as that of the destination address.
The candidate address can be a unicast address configured on the specified outbound interface. If no source address with the same label value and in the same address range as the destination address is found on the outbound interface, you can select such a source address from another interface.
Select a destination address using the following rules in descending order of priority.
5. Prefer an address whose label value is the same as that of the source address.
• QoS
In an IPv6 header, the new Flow Label field specifies how to identify and process traffic. The Flow Label
field identifies a flow and allows a Router to recognize packets in the flow and to provide special
processing.
QoS is guaranteed even for the packets encrypted with IPsec because the IPv6 header can identify
different types of flows.
• Built-in security
An IPv6 packet contains a standard extension header related to IPsec, and therefore IPv6 can provide
end-to-end security. This provides network security specifications and improves interoperability between
different IPv6 applications.
When multiple extension headers are used in the same packet, the headers must be listed in the following
order:
Not all extension headers must be examined and processed by Routers. When a Router forwards packets, it
determines whether or not to process the extension headers based on the Next Header value in the IPv6
basic header.
The destination options extension header can appear twice in a packet: once before the routing extension header and once after the upper-layer header. All other extension headers appear only once.
9.11.2.3 ICMPv6
Internet Control Message Protocol for Internet Protocol version 6 (ICMPv6) is an integral part of IPv6 and
used on IPv6 networks. It provides similar functions to those of ICMPv4 on IPv4 networks.
• Type: indicates a message type. Values 0 to 127 indicate the error message type, and values 128 to 255
indicate the informational message type.
PMTU Principles
Path MTU (PMTU) discovery is the process of determining the minimum IPv6 MTU along the path between the source and destination. The PMTU discovery mechanism dynamically discovers the PMTU of a path.
When an IPv6 node has a large amount of data to send to another node, the data is transmitted in a series
of IPv6 fragments. When these fragments are of the maximum length allowed in successful transmission
from the source node to the destination node, the fragment length is considered optimal and called PMTU.
A source node assumes that the PMTU of a path is the known IPv6 MTU of the first hop on the path. If any
of the packets sent on that path are too large to be forwarded, the transit node discards these packets and
returns an ICMPv6 Datagram Too Big message to the source node. The source node sets the PMTU for the
path based on the IPv6 MTU in the received message.
When the PMTU learned by the source node is less than or equal to the actual PMTU, the PMTU discovery process is complete. Before the process completes, ICMPv6 Datagram Too Big messages may be repeatedly sent and received because there may be links with smaller MTUs further along the path.
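The discovery loop above can be modeled with a toy simulation. The link MTU values are invented; real discovery relies on ICMPv6 Datagram Too Big messages rather than a known list of link MTUs.

```python
# Toy simulation of PMTU discovery (link MTU values are invented). The source
# starts from the first-hop IPv6 MTU and lowers its PMTU estimate each time a
# transit node would return an ICMPv6 Datagram Too Big message carrying the
# MTU of the offending link.
def discover_pmtu(link_mtus):
    pmtu = link_mtus[0]                   # initial estimate: first-hop MTU
    while True:
        too_big = next((m for m in link_mtus if m < pmtu), None)
        if too_big is None:               # a pmtu-sized packet fits every link
            return pmtu
        pmtu = too_big                    # adopt the MTU from the ICMPv6 message

print(discover_pmtu([1500, 1500, 1400, 1500, 1280]))  # 1280
```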
9.11.2.6 TCP6
Transmission Control Protocol version 6 (TCP6) provides a mechanism to establish virtual circuits between
processes of two endpoints. A TCP6 virtual circuit is similar to the full-duplex circuit that transmits data
between systems. TCP6 provides reliable data transmission between processes, and is known as a reliable
protocol. TCP6 also provides a mechanism to optimize transmission performance according to the network
status. When all data can be received and acknowledged, the transmission rate increases gradually. Delay
causes the sending host to reduce the sending rate before it receives Acknowledgement packets.
TCP6 is generally used in interactive applications, such as web applications, where errors in received data affect normal operation. TCP6 establishes virtual circuits using the three-way
handshake mechanism, and all virtual circuits are deleted through the four-way handshake. TCP6
connections provide checksum and reliability functions, but at a higher cost. As a result, TCP6 has
lower efficiency than User Datagram Protocol version 6 (UDP6).
Figure 1 shows the establishment and tearing down of a TCP6 connection.
9.11.2.7 UDP6
User Datagram Protocol version 6 (UDP6) is a computer communications protocol used to exchange packets
on a network. UDP6 has the following characteristics:
• UDP6 only uses source and destination information and is mainly used in simple request/response exchanges.
• UDP6 is unreliable because no control mechanism is available to ensure that UDP6 datagrams reach their destinations.
• UDP6 is connectionless, meaning that no virtual circuits are required during data transmission between hosts.
The connectionless feature of UDP6 enables it to send data to multicast addresses. This is different from
TCP6, which requires specific source and destination addresses.
9.11.2.8 RawIP6
RawIP6 fills only a limited number of fields in the IPv6 header, and allows application programs to provide
their own IPv6 headers.
RawIP6 is similar to UDP6 in the following aspects:
• RawIP6 is unreliable because no control mechanism is available to ensure that RawIP6 datagrams reach
their destinations.
• RawIP6 is connectionless, meaning that no virtual circuits are required during data transmission
between hosts.
Unlike UDP6, RawIP6 allows application programs to directly operate at the IP layer through the socket, which facilitates direct interaction of applications with the lower layer.
9.11.3 DSCP
The Internet Engineering Task Force (IETF) redefined the type of service (ToS) for IPv4 packets and Traffic
Class (TC) for IPv6 packets as the Differentiated Service (DS) field for the DiffServ model. The value of the
DS field is the DiffServ code point (DSCP) value. This is shown in Figure 1.
In an IPv4 packet, the six left-most bits (0 to 5) in the DS field are defined as the DSCP value, and the two
right-most bits (6 and 7) are reserved bits. Bits 0 to 2 are the Class Selector Code Point (CSCP) value,
indicating a class of DSCP. Devices that support the DiffServ function perform forwarding behaviors for
packets based on the DSCP value.
An IPv6 packet contains the Traffic Class field. The Traffic Class field is 8 bits long and functions the same as
the ToS field in an IPv4 packet to identify the service type.
Generally, each protocol has a default DSCP value, and the DSCP values of some protocols can be configured using the
host-packet type command or the corresponding commands for changing the DSCP values of the protocols. In this case,
the rules for the DSCP values to take effect are as follows:
• If a protocol has its own command for changing the DSCP value, the DSCP value configured using its own
command takes effect regardless of whether the DSCP value is controlled by the host-packet type command.
• If a protocol does not have its own command for changing the DSCP value and the DSCP value is controlled by the
host-packet type command, the DSCP value configured using the command takes effect.
• If a protocol does not have its own command for changing the DSCP value and the DSCP value is not controlled by
the host-packet type command, the default DSCP value takes effect.
For details about the DSCP value and meaning corresponding to each PHB, see DSCP and PHB.
ICMP_ECHO_REPLY    0     No    N/A
DNS                0     No    N/A
RADIUS             48    No    N/A
IGMP               48    No    N/A
PIM                48    No    N/A
IKE                48    No    N/A
RSVP-TE            48    No    N/A
MSDP               48    No    N/A
ICMP6_ECHO_REPLY   Copied from the TC/DSCP value of an ICMP6_ECHO message    No    N/A
ND (NS/NA/RS/RA)   48    No    N/A
DNSv6              0     No    N/A
TFTPv6 SERVER      N/A   No    N/A
HWTACACS           48    No    N/A
RADIUS             48    No    N/A
MLD                48    No    N/A
PIMv6              48    No    N/A
DHCPv6             48    No    N/A
9.12 ND Description
9.12.1 Overview of ND
Definition
The Neighbor Discovery (ND) protocol is an important part of the Internet Protocol suite used with IPv6.
ND in IPv6 replaces the Address Resolution Protocol (ARP) and ICMP Router Discovery (RD) used in IPv4. ND uses ICMPv6 packets to implement functions including router discovery, duplicate address detection (DAD), address resolution, neighbor unreachability detection (NUD), and redirection.
Purpose
If two hosts need to communicate on a local area network (LAN), the network-layer address (IPv6 address)
of the receiver must be available to the sender. In addition, IPv6 data packets must be encapsulated into
frames before they are sent over a physical network. Therefore, the sender must know the physical address
(MAC address) of the receiver, and the mapping between the IPv6 address and the physical address must be
available to ensure transmission of data packets.
Benefits
ND allows mapping between network-layer IPv6 addresses and link-layer MAC addresses to ensure
communication on an Ethernet.
9.12.2 Understanding ND
9.12.2.1 ND Fundamentals
Neighbor discovery (ND) is a group of messages and processes that identify relationships between
neighboring nodes. IPv6 ND provides similar functions as the Address Resolution Protocol (ARP) and ICMP
router discovery in IPv4, as well as additional functions.
After a node is configured with an IPv6 address, it checks that the address is available and does not conflict
with other addresses. When a node is a host, a Router must notify it of the optimal next hop address of a
packet to a destination. When a node is a Router, it must advertise its IPv6 address and address prefix, along
with other configuration parameters to instruct hosts to configure parameters. When forwarding IPv6
packets, a node must know the link layer addresses and check the availability of neighboring nodes. IPv6 ND
provides four types of ICMPv6 messages:
• Router Solicitation (RS): After startup, a host sends an RS message to a Router, and waits for the Router
to respond with a Router Advertisement (RA) message.
• Router Advertisement (RA): A Router periodically advertises RA messages containing prefixes and flag
bits.
• Neighbor Solicitation (NS): An IPv6 node uses NS messages to obtain the link-layer address of its
neighbor, check that the neighbor is reachable, and detect address conflicts.
• Neighbor Advertisement (NA): After receiving an NS message, an IPv6 node responds with an NA
message. In addition, the IPv6 node initially sends NA messages when the link layer changes.
The IPv6 address 2001:db8:1::1 is assigned to HostA as a tentative IPv6 address. To check the validity of this
address, HostA sends an NS message containing the requested address 2001:db8:1::1 to the solicited-node
multicast group to which 2001:db8:1::1 belongs. Because 2001:db8:1::1 is still tentative and not yet in use,
the source address of the NS message is the unspecified address (::). After receiving the NS message, HostB
processes the message as follows:
• If 2001:db8:1::1 is a tentative or unused address of HostB, HostB does not use this address as an
interface address, nor does it send an NA message.
• If HostB finds that 2001:db8:1::1 is an address it is already using, it sends an NA message that contains
2001:db8:1::1 to 2001:db8:2::1. After receiving the message, HostA determines that its tentative address is a duplicate.
An address conflict may also be detected in either of the following scenarios:
• A device receives an NS message with the same target address but a different source MAC address from
a peer device while sending an NS message.
• A device receives an NS message with the same target address and source MAC address from a peer
device while sending an NS message.
Figure 2 shows an address conflict self-recovery example (common address conflict). The principles for other
scenarios are similar to those for the common address conflict scenario.
At t1, HostA sends an NS message. After receiving an NA message from HostB, HostA continues to perform
address conflict detection at t2 and sends an NS message to HostB.
• If HostB replies with an NA message, HostA continues address conflict detection at the next scheduled
time and sends another NS message to HostB.
• If HostB does not reply with an NA message, the address is available, and HostA stops sending NS
messages to HostB.
Neighbor Discovery
Similar to ARP in IPv4, IPv6 ND parses the neighbor addresses and detects the availability of neighbors based
on NS and NA messages.
When a node needs to obtain the link-layer address of another node on the same local link, it sends an NS
message (ICMPv6 type 135). An NS message is similar to an ARP request message in IPv4, but is destined
for a multicast address rather than a broadcast address. Only nodes whose addresses have the same last 24
bits as the target address, and that therefore listen on the corresponding solicited-node multicast address,
receive the NS message. This reduces the possibility of broadcast storms. The destination node returns its
link-layer address in an NA message.
An NS message is also used to detect the availability of a neighbor when the link-layer address of the
neighbor is known. An NA message is the response to an NS message. After receiving an NS message, a
destination node responds with an ICMPv6 type 136 NA message on the local link. After receiving the NA
message, the source node can communicate with the destination node. When the link-layer address of a
node on the local link changes, the node actively sends an NA message.
Router Discovery
Router discovery is used to locate a neighboring Router and learn the address prefix and configuration
parameters related to address autoconfiguration. IPv6 router discovery is implemented based on the
following messages:
• RS message
When a host is not configured with a unicast address, for example, when the system has just started, it
sends an RS message. An RS message helps the host rapidly perform address autoconfiguration without
waiting for the RA message that is periodically sent by an IPv6 device. An RS message is of the ICMPv6
type 133.
• RA message
Interfaces on each IPv6 device periodically send RA messages only when they are enabled to do so.
After a Router receives an RS message from an IPv6 device on the local link, the Router responds with
an RA message. An RA message is sent to the all-nodes multicast address (FF02::1) or to the IPv6
unicast address of the node that sent the RS message. An RA message is of the ICMPv6 type 134 and
contains the following information:
■ One or more on-link prefixes (On-link nodes can perform address autoconfiguration using these
address prefixes.)
■ Whether the Router sending the RA message can be used as a default router (If so, the lifetime of
the default router is also included, expressed in seconds.)
■ Other information about the host, such as the hop limit and the MTU that specifies the maximum
size of the packet initiated by a host
After an IPv6 host on the local link receives an RA message, it extracts the preceding information to
obtain the updated default router list, prefix list, and other configurations.
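To make the RA fields above concrete, here is a small Python sketch (illustrative only) that unpacks the fixed part of an RA message (ICMPv6 type 134) as laid out in RFC 4861; prefix, MTU, and other options follow the fixed part and are not parsed here.

```python
import struct

def parse_ra_fixed(body: bytes) -> dict:
    """Unpack the 16-byte fixed part of an ICMPv6 Router Advertisement.

    RFC 4861 layout: type, code, checksum, cur hop limit, flags (M/O),
    router lifetime (s), reachable time (ms), retrans timer (ms).
    """
    (msg_type, code, checksum, hop_limit,
     flags, lifetime, reachable, retrans) = struct.unpack("!BBHBBHII", body[:16])
    if msg_type != 134:
        raise ValueError("not a Router Advertisement")
    return {
        "cur_hop_limit": hop_limit,
        "managed": bool(flags & 0x80),       # M: stateful (DHCPv6) addressing
        "other_config": bool(flags & 0x40),  # O: other config via DHCPv6
        "router_lifetime_s": lifetime,       # 0 means "not a default router"
        "reachable_time_ms": reachable,
        "retrans_timer_ms": retrans,
    }

# Sample RA: hop limit 64, M flag set, default-router lifetime 1800 s.
sample = struct.pack("!BBHBBHII", 134, 0, 0, 64, 0x80, 1800, 30000, 1000)
info = parse_ra_fixed(sample)
```

A host uses the router lifetime to maintain its default router list and the M/O flags to decide between stateless and stateful configuration.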
Neighbor Tracking
A neighbor state can transit from one to another. Hardware faults or the hot swapping of interface cards
can interrupt communication with neighboring devices. Communication cannot be restored if the
destination node itself becomes invalid, but it can be restored if only the path to it fails. Nodes therefore
maintain a neighbor table to monitor the state of each neighboring device.
RFC standards define five neighbor states: Incomplete, Reachable, Stale, Delay, and Probe.
Figure 3 shows the transition of neighbor states. The Empty state indicates that the neighbor table is empty.
The following example describes changes in neighbor state of node A during its first communication with
node B.
1. Node A sends an NS message and generates a cache entry. The neighbor state of node A is
Incomplete.
2. If node B replies with an NA message, the neighbor state of node A changes from Incomplete to
Reachable. Otherwise, the neighbor state changes from Incomplete to Empty after a certain period of
time, and node A deletes this entry.
3. After the neighbor reachable time expires, the neighbor state changes from Reachable to Stale,
indicating that the neighbor's reachability is unknown.
4. If node A in the Reachable state receives an unsolicited NA message from node B, and the link-layer
address of node B carried in the message is different from that learned by node A, the neighbor state
of node A changes to Stale.
5. After the aging time of ND entries in the Stale state expires, the neighbor state changes to Delay.
6. After a period of time (5s), the neighbor state changes from Delay to Probe. During this time, if node
A receives an NA message, the neighbor state of node A changes to Reachable.
7. Node A in the Probe state sends three unicast NS messages at the configured interval (1s). If node A
receives an NA message, the neighbor state of node A changes from Probe to Reachable. Otherwise,
the state changes to Empty and node A deletes the entry.
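The numbered transitions above can be summarized as a small state table. The following Python sketch (a simplification of the full machine; event names are illustrative, and timers are modeled as events) encodes them:

```python
# Simplified neighbor-state transitions from the steps above.
TRANSITIONS = {
    ("Incomplete", "na_received"):            "Reachable",  # step 2
    ("Incomplete", "solicit_timeout"):        "Empty",      # step 2, no reply
    ("Reachable",  "reachable_time_expired"): "Stale",      # step 3
    ("Reachable",  "na_with_new_lladdr"):     "Stale",      # step 4
    ("Stale",      "aging_time_expired"):     "Delay",      # step 5
    ("Delay",      "na_received"):            "Reachable",  # step 6
    ("Delay",      "delay_timer_expired"):    "Probe",      # step 6 (5s)
    ("Probe",      "na_received"):            "Reachable",  # step 7
    ("Probe",      "max_probes_exceeded"):    "Empty",      # step 7, no reply
}

def next_state(state: str, event: str) -> str:
    """Return the next neighbor state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

# Steps 1-3: entry created, NA received, then the reachable time expires.
s = "Incomplete"
s = next_state(s, "na_received")             # -> "Reachable"
s = next_state(s, "reachable_time_expired")  # -> "Stale"
```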
Address Autoconfiguration
A Router can notify hosts of how to perform address autoconfiguration using RA messages and prefix flags.
For example, the Router can specify stateful (DHCPv6) or stateless address autoconfiguration for the hosts.
When stateless address autoconfiguration is employed, a host uses the prefix information in a received RA
message and local interface ID to automatically form an IPv6 address, and sets the default router according
to the default router information in the message. The host can also obtain DNS information from the RDNSS
and DNSSL options in the message.
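Stateless address autoconfiguration can be sketched in Python. The example below (an illustration, not product behavior) combines an advertised /64 prefix with a modified EUI-64 interface ID derived from a MAC address:

```python
import ipaddress

def eui64_address(prefix: str, mac: str) -> str:
    """Form a SLAAC address from a /64 prefix and a modified EUI-64 ID.

    The MAC address is split in half, ff:fe is inserted, and the
    universal/local bit of the first octet is flipped (RFC 4291, App. A).
    """
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02                                   # flip the U/L bit
    iid = bytes(octets[:3] + [0xFF, 0xFE] + octets[3:])
    net = ipaddress.IPv6Network(prefix)
    return str(net.network_address + int.from_bytes(iid, "big"))

addr = eui64_address("2001:db8:1::/64", "00:1e:10:2a:3b:4c")
# addr == "2001:db8:1:0:21e:10ff:fe2a:3b4c"
```

Many hosts instead use randomized interface IDs (RFC 7217/4941) for privacy, but the prefix-plus-interface-ID structure is the same.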
ND is prone to the following attacks:
• NS/NA spoofing: An attacker sends an authorized node (host or router) an NS message with a bogus
source link-layer address option, or an NA message with a bogus target link-layer address option.
Packets from the authorized node are then sent to this link-layer address.
• Duplicate Address Detection (DAD) attack: An attacker responds to every DAD attempt made by a host
that accesses the network, claiming that the address is already in use, so that the host can never obtain
an address.
• Spoofed Redirect message: An attacker uses the link-local address of the first-hop router to send a
Redirect message to an authorized host. The host accepts the message because it mistakenly believes
that the message came from the first-hop router.
• Replay attack: An attacker captures valid messages and replays them. Even if Neighbor Discovery
Protocol (NDP) messages are cryptographically protected so that their contents cannot be forged, they
are still prone to replay attacks.
• Bogus address prefix: An attacker sends a bogus RA message specifying that certain prefixes are on-link.
If a prefix is on-link, a host does not send packets destined for that prefix to the router. Instead, it sends
NS messages to attempt address resolution, which go unanswered. As a result, the host is denied
service.
• Malicious last-hop router: An attacker multicasts bogus RA messages, or unicasts bogus RA messages in
response to multicast RS messages from a host attempting to discover a last-hop router. If the host
selects the attacker as its default router, the attacker can insert itself as a man-in-the-middle and
intercept all messages exchanged between the host and its destination.
To counter these threats, Secure Neighbor Discovery (SEND) specifies security mechanisms that extend ND.
SEND defines cryptographically generated addresses (CGAs) as well as the CGA and Rivest-Shamir-Adleman
(RSA) Signature options, which are used to ensure that the sender of an ND message is the owner of the
message's source address. SEND also defines the Timestamp and Nonce options to prevent replay attacks.
• CGA: contains an IPv6 interface identifier that is generated from a one-way hash of the public key and
associated parameters.
• CGA option: contains information used to verify the sender's CGA, including the modifier value and
public key of the sender. This option is used to check the validity of source IPv6 addresses carried in ND
messages.
• RSA Signature option: contains the hash of the sender's public key and the digital signature generated
from the sender's private key over the ND message. This option is used to check the integrity of ND
messages and authenticate the sender's identity.
If an attacker uses an address that belongs to an authorized node, the attacker must use the node's public key for
encryption. Otherwise, the receiver can detect the attempted attack after checking the CGA option. Even if the
attacker obtains the public key of the authorized node, the receiver can still detect the attempted attack after
checking the digital signature, which is generated from the sender's private key.
• Timestamp option: a 64-bit unsigned integer field containing a timestamp. The value indicates the
number of seconds since January 1, 1970, 00:00 UTC. This option prevents unsolicited advertisement
messages and Redirect messages from being replayed. The receiver checks that the timestamp of a
newly received message is later than those of previously received messages.
• Nonce option: contains a random number selected by the sender of a solicitation message. This option
prevents replay attacks during message exchange. For example, a sender sends an NS message carrying
the Nonce option and receives an NA message as a response that also carries the Nonce option; the
sender verifies the NA message based on the Nonce option.
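The CGA idea, an interface ID derived from a hash of the public key and associated parameters, can be sketched as follows. This is a deliberately simplified illustration with sec=0 and a raw key byte string; RFC 3972 additionally uses DER-encoded keys and a second hash (Hash2) for higher sec values.

```python
import hashlib

def cga_interface_id(modifier: bytes, prefix: bytes, pubkey: bytes,
                     collision_count: int = 0) -> bytes:
    """Simplified CGA interface-ID derivation (illustrative, sec=0).

    Hash the CGA parameters, keep the first 64 bits, then force the sec
    bits to 0 and clear the 'u' and 'g' bits of the first octet.
    """
    data = modifier + prefix + bytes([collision_count]) + pubkey
    iid = bytearray(hashlib.sha1(data).digest()[:8])
    iid[0] &= 0x1C   # top 3 bits (sec=0) and bottom 2 bits (u/g) cleared
    return bytes(iid)

# All inputs here are illustrative placeholders, not a real key.
iid = cga_interface_id(b"\x00" * 16, b"\x20\x01\x0d\xb8" + b"\x00" * 4,
                       b"example-public-key")
```

A receiver recomputes the same hash from the CGA option's modifier and public key; a mismatch reveals that the sender does not own the source address.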
To reject insecure ND messages, an interface can have the IPv6 SEND function configured. An ND message
that meets any of the following conditions is insecure:
• The received ND message does not carry the CGA or RSA option, which indicates that the interface
sending this message is not configured with a CGA.
• The key length of the received ND message exceeds the length limit that the interface supports.
• The rate at which ND messages are received exceeds the system rate limit.
• The time difference between the sent and received ND messages exceeds the time difference allowed by
the interface.
Because the Router implementation complies with standard protocols, the key-hash field in the RSA Signature option of ND
packets is generated using the SHA-1 algorithm. SHA-1 is no longer considered sufficiently secure.
9.12.2.2 Static ND
Definition
Static ND allows a network administrator to create a mapping between IPv6 and MAC addresses.
Related Concepts
The main difference between static ND and dynamic ND lies in how ND entries are generated and
maintained: dynamic ND entries are automatically generated and maintained using ND messages,
whereas static ND entries are manually configured and maintained by network administrators.
To ensure communication stability and security, deploy static ND based on actual requirements and network
resources.
• IPv6 addresses can be bound to the MAC address of a specified gateway to ensure that only this
gateway forwards the IPv6 datagrams destined for these IPv6 addresses.
• The destination IPv6 addresses of certain IPv6 datagrams sent by a specified host can be bound to a
specified MAC address.
Application Scenarios
Static ND is applicable to the following networks:
• Network with high requirements for information security, such as a government network or military
network
Benefits
Configuring static ND entries improves communication security. If a static ND entry is configured on a
device, the device can communicate with the peer device using only the specified MAC address. This
improves communication security, because network attackers cannot modify the mapping between the IPv6
and MAC addresses using ND messages.
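The security property described above, that static bindings cannot be overwritten by ND messages, can be sketched in a few lines of Python; the names and data structures here are illustrative, not the device's internal implementation.

```python
def apply_nd_update(table: dict, static_ips: set, ipv6: str, mac: str) -> bool:
    """Apply a learned IPv6-to-MAC mapping unless a static entry exists.

    Static ND entries are administrator-configured and are never updated
    by received ND messages, so spoofed NA messages cannot change them.
    """
    if ipv6 in static_ips:
        return False             # static binding wins; update is ignored
    table[ipv6] = mac
    return True

nd_table = {"2001:db8::1": "00-e0-fc-12-34-56"}   # static binding
static_ips = {"2001:db8::1"}
# A spoofed NA for 2001:db8::1 cannot overwrite the static entry:
apply_nd_update(nd_table, static_ips, "2001:db8::1", "00-11-22-33-44-55")
```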
9.12.2.3 Dynamic ND
Definition
Dynamic ND allows devices to dynamically learn and update the mapping between IPv6 and MAC addresses
through ND messages. That is, you do not need to manually configure the mapping.
Related Concepts
Dynamic ND entries can be created, updated, and aged through ND messages.
Upon receipt of an ND message whose source IPv6 address is on the same network segment as the IPv6
address of the inbound interface, a device automatically creates or updates an ND entry if the message
meets either of the following conditions:
■ The destination IPv6 address is the IPv6 address of the inbound interface.
■ The destination IPv6 address is the Virtual Router Redundancy Protocol (VRRP) virtual IPv6 address
of the inbound interface.
To prevent ND probing from consuming a large amount of system resources, a device limits the rate at
which it sends ND probe messages. As a result, in high-specification scenarios, an extended period of time
may elapse from when ND probing starts to when ND entry aging is complete.
Application Scenarios
The dynamic ND aging mechanism ensures that ND entries unused during a specified period are
automatically deleted. This mechanism helps save the storage space of ND tables and speed up ND table
lookups.
Dynamic ND applies to networks with complex topologies and high real-time communication requirements.
• Aging probe mode: Before a dynamic ND entry on a device is aged, the device sends unicast or multicast
ND aging probe messages to other devices. By default, unicast ND aging probe messages are sent. If the
IPv6 address of the peer device remains unchanged but its MAC address changes frequently, it is
recommended that you configure the local device to multicast ND aging probe messages. If the MAC
address of the peer device remains unchanged, network bandwidth resources are insufficient, and the
aging time of ND entries is set to a small value, it is recommended that you configure the local device
to unicast ND aging probe messages.
• Aging time: Every dynamic ND entry has a lifecycle, which is also called the aging time. If a dynamic ND
entry is not updated before its lifecycle ends, it is deleted from the ND table. Two interconnected
devices can use ND to learn the mapping between their IPv6 and MAC addresses, save the mapping in
their ND tables, and then communicate using the ND entries. When the peer device becomes faulty or
its NIC is replaced but the local device receives no status change information, the local device continues
to send IP datagrams to the peer device, and network traffic is interrupted because the ND table of the
local device is not promptly updated. To reduce this risk, an aging timer can be set for each ND entry.
After the aging timer of a dynamic ND entry expires, the entry is automatically deleted.
• Maximum number of probes for aging dynamic ND entries: Before a dynamic ND entry is aged, a device
sends ND aging probe messages to the peer device. If the device does not receive an ND reply message
after sending the specified maximum number of aging probe messages, the dynamic ND entry is
deleted. The aging timer reduces the risk of traffic interruptions caused by an outdated ND table, but it
cannot eliminate problems caused by delays. For example, if the aging time of a dynamic ND entry is N
seconds, the local device can detect a status change of the peer device only after N seconds, during
which its ND table is not updated. Setting the maximum number of probes for aging dynamic ND
entries helps ensure that the ND table is updated in time.
Benefits
Dynamic ND entries are dynamically created and updated using ND messages. In this way, they do not need
to be manually maintained, greatly reducing maintenance workload.
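The aging mechanism described above (an aging time plus a capped number of probes) can be modeled with a short Python sketch; the class, timer values, and structure are illustrative, not device defaults.

```python
class NDCache:
    """Toy dynamic ND table with aging probes (illustrative parameters)."""

    def __init__(self, aging_time: float = 1200.0, max_probes: int = 3):
        self.aging_time = aging_time
        self.max_probes = max_probes
        self.entries = {}   # ipv6 -> [mac, last_refresh, probes_sent]

    def learn(self, ipv6: str, mac: str, now: float) -> None:
        """Create or refresh an entry; a reply resets the probe counter."""
        self.entries[ipv6] = [mac, now, 0]

    def tick(self, now: float) -> None:
        """Probe expired entries; delete those that exhausted their probes."""
        for ipv6, entry in list(self.entries.items()):
            mac, last_refresh, probes = entry
            if now - last_refresh < self.aging_time:
                continue                    # entry still fresh
            if probes >= self.max_probes:
                del self.entries[ipv6]      # no reply after all probes
            else:
                entry[2] = probes + 1       # send one more aging probe

cache = NDCache(aging_time=10.0, max_probes=3)
cache.learn("2001:db8::1", "00-e0-fc-12-34-56", now=0.0)
for t in (10.0, 11.0, 12.0, 13.0):
    cache.tick(t)   # three unanswered probes, then the entry is deleted
```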
9.12.2.4 Proxy ND
Background
ND applies only to communication between hosts on the same network segment and physical network.
When a Router receives an NS packet from a host, it checks whether the destination IPv6 address in the
packet is its own IPv6 address, that is, whether the NS packet requests the local MAC address. If so, the
Router replies with an NA packet; if not, it discards the NS packet.
For hosts that are on the same network segment but different physical networks, or hosts on the same
network segment and physical network that cannot interwork at Layer 2, proxy ND can be deployed on the
Router between the hosts to allow them to communicate. With proxy ND deployed, when the Router
receives an NS packet whose destination address is not its own IPv6 address, it replies to the source host
with an NA packet carrying its own MAC address and the IPv6 address of the destination host. That is, the
Router replies with an NA packet on behalf of the destination host.
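This NS-handling decision can be summarized in a short Python sketch (the function and parameter names are illustrative, not product internals):

```python
def handle_ns(target_ip: str, local_ip: str, local_mac: str,
              has_route_to_target: bool, proxy_nd_enabled: bool):
    """Decide how a Router answers a received NS packet.

    It replies with its own MAC when it is the target; otherwise it
    replies on the target's behalf only when proxy ND is enabled and
    the target is reachable through the Router.
    """
    if target_ip == local_ip:
        return ("NA", local_mac)    # normal address resolution
    if proxy_nd_enabled and has_route_to_target:
        return ("NA", local_mac)    # proxy reply for the destination host
    return ("DROP", None)           # not the target, no proxy: discard

handle_ns("2001:db8::2", "2001:db8::1", "00-e0-fc-12-34-56", True, True)
# -> ("NA", "00-e0-fc-12-34-56")
```

The proxy ND variants below differ mainly in the condition checked before the proxy reply (routes, VLAN information, or BD information).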
Usage Scenarios
Table 1 describes the usage scenarios for different types of proxy ND.
• Routed proxy ND: Hosts that need to communicate reside on the same network segment but different
physical networks, and the gateways connecting to the two hosts are configured with different IP
addresses.
• Any proxy ND: Hosts that need to communicate reside on the same network segment but different
physical networks, and the gateways connected to the hosts have the same gateway address.
• Intra-VLAN proxy ND: Hosts that need to communicate reside on the same network segment and
belong to the same VLAN, but user isolation is configured in the VLAN.
• Inter-VLAN proxy ND: Hosts that need to communicate reside on the same network segment but
belong to different VLANs.
• Local proxy ND: Hosts that need to communicate reside on the same network segment and BD, but
user isolation is configured in the BD.
Implementation
• Routed proxy ND
If hosts that need to communicate are on the same network segment but different physical networks,
and the gateways connected to the hosts are configured with different IP addresses, enable routed
proxy ND on the interfaces connecting the Routers to the hosts.
As shown in Figure 1, Device A and Device B are connected to the same network, and the IPv6
addresses of interface 1 and interface 2 belong to different network segments. In this example, Host A
wants to communicate with Host B, and the destination IPv6 address is on the same network segment
as the local IPv6 address. Host A sends an NS packet to request Host B's MAC address. However, Host B
cannot receive the NS packet, and therefore cannot reply, because Host A and Host B are on different
physical networks.
To address this problem, enable routed proxy ND on Device A's interface 1 and Device B's interface 2.
1. Host A sends an NS packet to request Host B's MAC address.
2. Upon receipt of the NS packet, Device A finds that the destination IPv6 address in the packet is
not its own IPv6 address and therefore determines that the NS packet does not request its
MAC address. Device A then checks whether routes destined for Host B exist.
• If routes destined for Host B do not exist, the NS packet sent by Host A is discarded.
• If routes destined for Host B exist, Device A checks whether routed proxy ND is enabled on
the interface receiving the NS packet.
■ If routed proxy ND is enabled, Device A sends an NA packet that contains the MAC
address of interface 1 to Host A.
Upon receipt of the NA packet, Host A considers that this packet is sent by Host B. Host
A learns the MAC address of Device A's interface 1 in the NA packet and sends data
packets to Host B using this MAC address.
• Any proxy ND
In scenarios where servers are partitioned into VMs, to allow flexible deployment and migration of VMs
on multiple servers or gateways, the common solution is to configure Layer 2 interworking between
multiple gateways. However, this approach may lead to larger Layer 2 domains on the network and
risks of broadcast storms. To resolve this problem, a common way is to enable any proxy ND on a VM
gateway so that the gateway sends its own MAC address to the source VM and the traffic sent from the
source VM to other VMs is transmitted over routes.
As shown in Figure 2, the IPv6 address of VM1 is 2001:db8:300:400::1/64, the IPv6 address of VM2 is
2001:db8:300:400::2/64, and VM1 and VM2 are on the same network segment. Device A and Device B
are connected to two networks using two interface 1s with the same IPv6 address and MAC address.
Because the destination IPv6 address and local IPv6 address are on the same network segment, if VM1
wants to communicate with VM2, VM1 sends an NS packet to request VM2's MAC address. However,
because VM1 and VM2 are on different physical networks, VM2 cannot receive the NS packet and
therefore cannot reply.
To address the problem, enable any proxy ND on Device A's interface 1 and Device B's interface 1.
1. VM1 sends an NS packet to request VM2's MAC address.
2. Upon receipt of the NS packet, Device A finds that the destination IPv6 address in the packet is
not its own IPv6 address and therefore determines that the NS packet does not request its
MAC address. Then, Device A checks whether any proxy ND is enabled on the interface receiving
the NS packet.
• If any proxy ND is enabled, Device A sends an NA packet that contains the MAC address of
Interface 1 to VM1.
Upon receipt of the NA packet, VM1 considers that this packet is sent by VM2. VM1 learns
the MAC address of Device A's interface 1 in the NA packet and sends data packets to VM2
using this MAC address.
• Intra-VLAN proxy ND
If hosts belong to the same VLAN but the VLAN is configured with Layer 2 port isolation, intra-VLAN
proxy ND needs to be enabled on the associated VLAN interfaces to enable host interworking.
As shown in Figure 3, Host A and Host B are connected to Device, and the interfaces connecting Device
to Host A and Host B belong to the same VLAN. Because intra-VLAN Layer 2 port isolation is configured
on Device, Host A and Host B cannot communicate with each other at Layer 2.
To address this problem, enable intra-VLAN proxy ND on the associated VLAN interface.
1. Host A sends an NS packet to request Host B's MAC address.
2. Upon receipt of the NS packet, Device finds that the destination IPv6 address in the packet is not
its own IPv6 address and therefore determines that the NS packet does not request its MAC
address. Device then checks whether ND entries destined for Host B exist.
• If such ND entries exist and the VLAN information in the ND entries is consistent with the
VLAN information configured on the interface receiving the NS packet, Device determines
whether intra-VLAN proxy ND is enabled on the associated VLAN interface.
■ If intra-VLAN proxy ND is enabled, Device sends the MAC address of interface 1 to Host
A.
Upon receipt of the NA packet, Host A considers that this packet is sent by Host B. Host
A learns the MAC address of Device's interface 1 in the NA packet and sends data
packets to Host B using this MAC address.
• If such ND entries do not exist, the NS packet sent by Host A is discarded and Device checks
whether intra-VLAN proxy ND is enabled on the associated VLAN interfaces.
• Inter-VLAN proxy ND
If hosts are on the same network segment and physical network but belong to different VLANs, inter-
VLAN proxy ND must be enabled on the associated VLAN interfaces to enable Layer 3 interworking
between the hosts.
In a VLAN aggregation scenario shown in Figure 4, Host A and Host B are on the same network
segment, but Host A belongs to sub-VLAN 2 and Host B belongs to sub-VLAN 3. Host A and Host B
cannot implement Layer 2 interworking.
To address this problem, enable inter-VLAN proxy ND on the associated VLAN interface.
1. Host A sends an NS packet to request Host B's MAC address.
2. Upon receipt of the NS packet, Device finds that the destination IPv6 address in the packet is not
its own IPv6 address and therefore determines that the NS packet does not request its MAC
address. Device then checks whether ND entries destined for Host B exist.
• If such ND entries exist and the VLAN information in the ND entries is inconsistent with the
VLAN information configured on the interface receiving the NS packet, Device determines
whether inter-VLAN proxy ND is enabled on the associated VLAN interface.
■ If inter-VLAN proxy ND is enabled, Device sends the MAC address of Interface 1 to Host
A.
Upon receipt of the NA packet, Host A considers that this packet is sent by Host B. Host
A learns the MAC address of Device's interface 1 in the NA packet and sends data
packets to Host B using this MAC address.
• If such ND entries do not exist, the NS packet sent by Host A is discarded and Device checks
whether inter-VLAN proxy ND is enabled on the associated VLAN interface.
On the L2VPN+L3VPN IP RAN shown in Figure 5, the CSG is connected to the ASG through L2VE sub-
interfaces, and the ASG terminates L2VPN packets and is connected to the BGP/MPLS IPv6 VPN
through L3VE sub-interfaces. BTS1 belongs to VLAN 2 and BTS2 belongs to VLAN 3. Therefore, users
who are connected to the BTSs and belong to the same network segment cannot implement Layer 2
interworking.
To address this problem, enable inter-VLAN proxy ND on the L3VE sub-interfaces of the ASG.
1. CSG1 sends an NS packet to request CSG2's MAC address.
2. Upon receipt of the NS packet, the ASG finds that the destination IPv6 address in the packet is
not its own IPv6 address and therefore determines that the NS packet does not request its
MAC address. The ASG then checks whether ND entries destined for CSG2 exist.
• If such ND entries exist and the VLAN information in the ND entries is inconsistent with the
VLAN information configured on the interface receiving the NS packet, the ASG determines
whether inter-VLAN proxy ND is enabled on the associated VLAN interface.
■ If inter-VLAN proxy ND is enabled, the ASG sends the MAC address of the L3VE sub-
interface to CSG1.
Upon receipt of the NA packet, CSG1 considers that this packet is sent by CSG2. CSG1
learns the MAC address of the ASG's L3VE sub-interface in the NA packet and sends
data packets to CSG2 using this MAC address.
• If such ND entries do not exist, the NS packet sent by CSG1 is discarded and the ASG checks
whether inter-VLAN proxy ND is enabled on the associated L3VE sub-interface.
• Local proxy ND
Local proxy ND can be deployed if two hosts on the same network segment and in the same BD want
to communicate with each other but the BD is configured with split horizon.
On the network shown in Figure 6, Host A and Host B are connected to Device. The interfaces
connecting Host A and Host B belong to the same BD as Device. Because split horizon is configured on
Device for the BD, Host A and Host B cannot communicate with each other at Layer 2.
To address this problem, enable local proxy ND on the associated BD interface.
1. Host A sends an NS packet to request Host B's MAC address.
2. Upon receipt of the NS packet, Device finds that the destination IPv6 address in the packet is not
its own IPv6 address and therefore determines that the NS packet does not request its MAC
address. Device then checks whether ND entries destined for Host B exist.
• If such ND entries exist and the BD information in the ND entries is consistent with the BD
information configured on the interface receiving the NS packet, Device determines whether
local proxy ND is enabled on the associated BD interface.
■ If local proxy ND is enabled, Device sends the MAC address of interface 1 to Host A.
Upon receipt of the NA packet, Host A considers that this packet is sent by Host B. Host
A learns the MAC address of Device's interface 1 in the NA packet and sends data
packets to Host B using this MAC address.
• If such ND entries do not exist, the NS packet sent by Host A is discarded and Device checks
whether local proxy ND is enabled on the associated BD interface.
The corresponding ND entries are generated after the NA packet sent by Host B is received.
Related Concepts
Rate limiting on ND messages helps reduce CPU resource consumption by ND messages, protecting other
services. ND messages include Router Solicitation (RS), Router Advertisement (RA), Neighbor Solicitation
(NS), and Neighbor Advertisement (NA) messages. The rate of ND messages can be limited in the following
modes:
• Limiting the rate of sending ND messages. Table 1 describes how to limit the rate of sending ND
messages in different views.
The priorities of rate limits for sending ND messages are as follows: rate limit for sending ND multicast messages
configured in the interface view > rate limit for sending ND messages configured in the interface view > rate limit
for sending ND multicast messages configured in the system view > rate limit for sending ND messages configured
in the system view
• Limiting the rate of receiving ND messages. Table 2 describes how to limit the rate of receiving ND
messages in different views.
System view: The following rate limiting modes are supported:
• ND message type-based rate limiting on ND messages.
• Specified source MAC address-based rate limiting on ND messages: limits the rate of ND messages with
a specified source MAC address.
• Specified source IPv6 address-based rate limiting on ND messages: limits the rate of ND messages with
a specified source IPv6 address.
• Specified destination IPv6 address-based rate limiting on ND messages: limits the rate of ND messages
with a specified destination IPv6 address.
• Specified target IPv6 address-based rate limiting on ND messages: limits the rate of ND messages with
a specified target IPv6 address.
• Any source MAC address-based rate limiting on ND messages: limits the rate of ND messages with any
source MAC address.
• Any source IPv6 address-based rate limiting on ND messages: limits the rate of ND messages with any
source IPv6 address.
These modes limit the number of ND messages to be processed globally if ND message attacks occur on a
device. If a device is attacked, it receives a large number of ND messages within a short period. As a result,
the device consumes many CPU resources to learn and respond to ND entries, affecting the processing of
other services. To resolve this issue, configure a rate limit based on an ND message type, ND message
type+MAC address, ND message type+IPv6 address, or other modes in the system view. After the
configuration is complete, the device counts the number of ND messages received per period. If the number
exceeds the configured limit, the device does not process excess ND messages.
Benefits
Rate limiting on ND messages helps reduce CPU resource consumption by ND messages, protecting other
services.
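One common way to implement such limits is a token bucket per rate-limit key (an ND message type, a source MAC address, or a source, destination, or target IPv6 address). The Python sketch below is purely illustrative; the rate and burst values are not device defaults.

```python
class NDRateLimiter:
    """Per-key token-bucket rate limiter for ND messages (illustrative).

    Each key earns `rate` tokens per second up to `burst`; processing
    one message costs one token, and messages beyond the budget are
    dropped rather than punted to the CPU.
    """

    def __init__(self, rate: float = 100.0, burst: float = 100.0):
        self.rate = rate          # tokens added per second
        self.burst = burst        # bucket capacity
        self.buckets = {}         # key -> (tokens, last_update_time)

    def allow(self, key, now: float) -> bool:
        """Return True if one more message for this key may be processed."""
        tokens, last = self.buckets.get(key, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

A limiter keyed on ("NS", source MAC), for example, throttles a single spoofing host without affecting ND traffic from well-behaved neighbors.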
Background
If a device is flooded with IPv6 packets that contain unresolvable destination IPv6 addresses, the device
generates a large number of ND Miss messages. This is because the device has no ND entry that matches
the next hop of the route. IPv6 packets, which trigger ND Miss messages, are sent to the CPU for processing.
As a result, the device generates and delivers many temporary ND entries based on ND Miss messages, and
sends a large number of NS messages to the destination network. This increases CPU usage of the device
and consumes considerable bandwidth resources of the destination network. As shown in Figure 1, the
attacker sends IPv6 packets with the unresolvable destination IPv6 address 2001:db8:1::2/64 to the gateway
(Device).
Related Concepts
The rate of ND Miss messages can be limited in the following modes:
• Limiting the rate of ND Miss messages globally: If a device is flooded with IPv6 packets that contain
unresolvable destination IPv6 addresses, the number of ND Miss messages to be processed on the
device is limited.
■ Specified source IPv6 address-based rate limiting on ND Miss messages: limits the rate of ND Miss
messages with a specified source IPv6 address.
■ Any source IPv6 address-based rate limiting on ND Miss messages: limits the rate of ND Miss messages with any source IPv6 address.
• Limiting the rate of ND Miss messages on an interface: If an interface is flooded with IPv6 packets that
contain unresolvable destination IPv6 addresses, the number of ND Miss messages to be processed on
the interface is limited. The configuration on an interface does not affect IPv6 packet forwarding on
other interfaces.
■ Specified source IPv6 address-based rate limiting on ND Miss messages: limits the rate of ND Miss
messages with a specified source IPv6 address on an interface.
Benefits
Rate limiting on ND Miss messages helps reduce CPU resource consumption by ND Miss messages,
protecting other services.
As shown in Figure 1, PW1 is the primary PW and PW2 is the secondary PW. When the BTS transmits traffic
to the BSC, the BTS first sends NS multicast packets to CSG1. CSG1 forwards the received NS multicast
packets to ASG1. Upon receipt of the packets, ASG1 can learn ND entries of the BTS. In this case, ASG2 does
not receive NS multicast packets or learn ND entries of the BTS. When the CSG1-to-ASG1 link is faulty, the
secondary PW takes over and the BSC-to-BTS traffic is forwarded through ASG2. Because ASG2 does not
learn ND entries of the BTS, packet loss occurs.
As shown in Figure 2, the CSG1-to-ASG1 link becomes faulty. In this case, the BTS forwards NS multicast
packets to the BSC over the CSG1-to-ASG2 link. Upon receipt of NS multicast packets, ASG2 can learn ND
entries of the BTS. In this case, ASG1 does not receive NS multicast packets or learn ND entries of the BTS.
When the CSG1-to-ASG1 link recovers, the primary PW takes over and the BSC-to-BTS traffic is forwarded
through ASG1. Because ASG1 does not learn ND entries of the BTS, packet loss occurs.
With the ND dual-fed function configured on CSG1, when CSG1 receives NS/NA packets from the BTS, CSG1
caches the packets locally. After a primary/secondary PW switchover is performed, CSG1 sends the cached
NS/NA packets to the ASG whose PW status is Active. In this case, the ASG can generate ND entries based
on legitimate NS packets or update ND entries based on legitimate NS or NA packets. This prevents
downstream traffic from being discarded by the ASG, improving network reliability.
Networking Description
Dual-device ND hot backup enables the master device to back up ND entries at the control and forwarding
layers to the backup device in real time. When the backup device switches to a master device, it uses the
backup ND entries to generate host route information. After you deploy dual-device ND hot backup, once a
master/backup VRRP6 switchover occurs, the new master device forwards downlink traffic with no need for
relearning ND entries. Dual-device ND hot backup ensures downstream traffic continuity.
Figure 1 shows a typical network topology in which a VRRP6 backup group is deployed. In the topology,
Device A is a master device, and Device B is a backup device. In normal circumstances, Device A forwards
both upstream and downstream traffic. If Device A or the link between Device A and the switch fails, a
master/backup VRRP6 switchover is triggered and Device B becomes the master device. Then, Device B needs
to advertise network segment routes to devices on the network side so that downstream traffic is directed
from the network side to Device B. If Device B has not learned ND entries from user-side devices, the
downstream traffic is interrupted. Therefore, downstream traffic can be properly forwarded only after Device
B is deployed with dual-device ND hot backup and learns ND entries of user-side devices.
In addition to a master/backup VRRP6 switchover, a master/backup E-Trunk switchover also triggers this problem.
Therefore, dual-device ND hot backup also applies to E-Trunk master/backup scenarios. This section describes the
implementation of dual-device ND hot backup in VRRP6 scenarios.
Feature Deployment
As shown in Figure 2, a VRRP6 backup group is configured on Device A and Device B. Device A is a master
device, and Device B is a backup device. Device A forwards upstream and downstream traffic.
If Device A or the link between Device A and the switch fails, a master/backup VRRP6 switchover is triggered
and Device B becomes the master device. Device B advertises network segment routes to network-side
devices and downstream traffic is directed to Device B.
• Before you deploy dual-device ND hot backup, Device B does not learn the ND entry of a user-side
device and therefore a large number of ND Miss messages are transmitted. As a result, system
resources are consumed and downstream traffic is interrupted.
• After you deploy dual-device ND hot backup, Device B backs up ND information on Device A in real
time. When Device B receives downstream traffic, it forwards the downstream traffic based on the
backup ND information.
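The real-time backup behavior above can be modeled with a toy sketch; the class, field names, and sample entry are invented for illustration and do not reflect the device's internal data structures.

```python
class NdTable:
    """Toy model of dual-device ND hot backup: the master pushes every ND
    entry it learns to the backup in real time, so the backup can forward
    downstream traffic immediately after a master/backup switchover."""
    def __init__(self, peer=None):
        self.entries = {}   # IPv6 address -> (MAC address, interface)
        self.peer = peer    # backup device to synchronize to, if any

    def learn(self, ipv6, mac, interface):
        self.entries[ipv6] = (mac, interface)
        if self.peer is not None:
            # Real-time hot backup: mirror the entry to the backup device.
            self.peer.entries[ipv6] = (mac, interface)

backup = NdTable()
master = NdTable(peer=backup)
master.learn("2001:db8::10", "00:11:22:33:44:55", "GE0/1/0")
# After a VRRP6 switchover, the backup already holds the entry and does
# not need to relearn it:
assert backup.entries["2001:db8::10"] == ("00:11:22:33:44:55", "GE0/1/0")
```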
Definition
An IPv4 over IPv6 tunnel connects isolated IPv4 sites over the IPv6 network.
Objective
During the later transition phase from IPv4 to IPv6, IPv6 networks have been widely deployed, and IPv4 sites
are scattered across IPv6 networks. It is not economical to connect these isolated IPv4 sites with private lines.
The common solution is the tunneling technology. With this technology, IPv4 over IPv6 tunnels can be
created on IPv6 networks to enable communication between isolated IPv4 sites through IPv6 public
networks.
Benefits
Using IPv6 tunnels as virtual links for IPv4 networks allows carriers to fully utilize existing networks without
upgrading internal devices of their backbone networks.
Background
During the later transition phase from IPv4 to IPv6, IPv6 networks have been widely deployed, and IPv4 sites
are scattered across IPv6 networks. It is not economical to connect these isolated IPv4 sites with private lines.
The common solution is the tunneling technology. With this technology, IPv4 over IPv6 tunnels can be
created on IPv6 networks to enable communication between isolated IPv4 sites through IPv6 public
networks.
To transmit IPv4 packets across the IPv6 network, the tunnel border devices encapsulate the IPv4 packets into IPv6 packets. Figure 1 shows the standard protocol-defined format of an IPv6 header.
Version: A 4-bit field indicating the version number of the Internet Protocol. The value is 6 for an IPv6 header.
Traffic Class: An 8-bit field indicating the traffic class of an IPv4 over IPv6 tunnel, used to identify the service class of packets; similar to the ToS field in IPv4. The value is an integer ranging from 0 to 255. The default value is 0.
Flow Label: A 20-bit field used to mark the packets of a specified service flow so that a device can recognize and provide special handling of packets in the flow. The value is an integer ranging from 0 to 1048575. The default value is 0.
Next Header: An 8-bit field indicating the type of the header immediately following the IPv6 header. The value is 4 in IPv4 over IPv6 tunnel scenarios.
Hop Limit: An 8-bit field indicating the maximum number of hops along a tunnel, allowing packet transmission to terminate when routing loops occur on an IPv4 over IPv6 tunnel. The value is an integer ranging from 1 to 255. The default value is 64.
Source Address: A 128-bit field indicating the source IPv6 address of an IPv6 packet. The address is a 32-digit hexadecimal number, in the format of X:X:X:X:X:X:X:X.
Destination Address: A 128-bit field indicating the destination IPv6 address of an IPv6 packet. The address is a 32-digit hexadecimal number, in the format of X:X:X:X:X:X:X:X.
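For illustration, the fixed 40-byte IPv6 header described above can be packed field by field. This is a minimal Python sketch, not device behavior; the sample addresses are assumptions.

```python
import socket
import struct

def build_ipv6_header(payload_len, src, dst, traffic_class=0, flow_label=0,
                      next_header=4, hop_limit=64):
    """Pack a 40-byte IPv6 fixed header. next_header=4 marks an
    encapsulated IPv4 packet (IPv4 over IPv6 tunnel scenario)."""
    # First 32 bits: Version (4) | Traffic Class (8) | Flow Label (20).
    ver_tc_fl = (6 << 28) | (traffic_class << 20) | flow_label
    return (struct.pack("!IHBB", ver_tc_fl, payload_len, next_header, hop_limit)
            + socket.inet_pton(socket.AF_INET6, src)
            + socket.inet_pton(socket.AF_INET6, dst))

hdr = build_ipv6_header(20, "2001:db8::1", "2001:db8::2")
assert len(hdr) == 40        # fixed IPv6 header length
assert hdr[0] >> 4 == 6      # Version field is 6
assert hdr[6] == 4           # Next Header: encapsulated IPv4
assert hdr[7] == 64          # default Hop Limit
```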
Implementation Principle
An IPv4 over IPv6 tunnel is manually configured between two border Routers. You must manually specify the
source address/source interface and the destination address/destination domain name of the tunnel.
As shown in Figure 2, packets passing through the IPv4 over IPv6 tunnel are processed on border nodes (B
and C), and the other nodes (A, D, and intermediate nodes between B and C) are unaware of the tunnel.
IPv4 packets are transmitted between A, B, C, and D, whereas IPv6 packets are transmitted between B and C.
Therefore, border Routers B and C must be able to process both IPv4 and IPv6 packets, that is, IPv4/IPv6 dual
protocol stack must be supported and enabled on B and C.
Figure 2 shows the processing of IPv4 packets along an IPv4 over IPv6 tunnel.
1. IPv4 packet forwarding: Node A sends an IPv4 packet to node B in which the destination address is the
IPv4 address of node D.
2. Tunnel encapsulation: After B receives the IPv4 packet from A on the IPv4 network, B finds that the
destination address of the IPv4 packet is not itself and the outbound interface to the next hop is a tunnel
interface. B then adds an IPv6 header to the packet. Specifically, node B encapsulates its own IPv6 address
and that of node C into the Source Address and Destination Address fields, respectively, sets the value of
the Version field to 6 and that of the Next Header field to 4, and encapsulates other fields that ensure the
transmission of the packet along the tunnel as required.
3. Tunnel forwarding: Node B searches the IPv6 routing table based on the Destination Address field carried
in the IPv6 packet header and forwards the encapsulated IPv6 packet to node C. Other nodes on the IPv6
network are unaware of the tunnel and process the encapsulated packet as an ordinary IPv6 packet.
4. Tunnel decapsulation: Upon receipt of the IPv6 packet in which the destination address is its own IPv6
address, node C decapsulates the packet by removing its IPv6 header based on the Version field and
determines the encapsulated packet is an IPv4 packet based on the Next Header field.
5. IPv4 packet forwarding: Node C searches the IPv4 routing table based on the Destination Address field of
the IPv4 packet and forwards the packet to Node D.
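Steps 4 and 5 above can be sketched as follows. This is illustrative Python with a fabricated sample packet, not the device's forwarding code.

```python
import socket
import struct

def decapsulate(ipv6_packet, my_ipv6):
    """Border-node decapsulation: if the packet is addressed to this node
    and its Next Header field is 4, strip the 40-byte IPv6 header and
    return the inner IPv4 packet; otherwise return None so the packet is
    handled as ordinary IPv6 traffic."""
    if len(ipv6_packet) < 40 or ipv6_packet[0] >> 4 != 6:
        return None
    if ipv6_packet[24:40] != socket.inet_pton(socket.AF_INET6, my_ipv6):
        return None                 # not addressed to this node
    if ipv6_packet[6] != 4:         # Next Header: 4 = encapsulated IPv4
        return None
    return ipv6_packet[40:]         # inner IPv4 packet, ready for IPv4 lookup

# Build a minimal encapsulated packet: IPv6 header + dummy IPv4 payload.
inner = b"\x45" + b"\x00" * 19      # fake 20-byte IPv4 header
hdr = (struct.pack("!IHBB", 6 << 28, len(inner), 4, 64)
       + socket.inet_pton(socket.AF_INET6, "2001:db8::b")
       + socket.inet_pton(socket.AF_INET6, "2001:db8::c"))
assert decapsulate(hdr + inner, "2001:db8::c") == inner
assert decapsulate(hdr + inner, "2001:db8::d") is None  # transit node: not ours
```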
Definition
An IPv6 over IPv4 tunnel connects isolated IPv6 sites over the IPv4 network.
Objective
During the earlier transition phase from IPv4 to IPv6, IPv4 networks have been widely deployed, and IPv6
sites are scattered across IPv4 networks. It is not economical to connect these isolated IPv6 sites with private
lines. The common solution is the tunneling technology. With this technology, IPv6 over IPv4 tunnels can be
created on IPv4 networks to enable communication between isolated IPv6 sites through IPv4 public
networks.
Benefits
Fully uses existing networks. Devices on an IPv4 backbone network do not need to upgrade to IPv6 networks.
1. On the border Router, IPv4/IPv6 dual stack is enabled, and an IPv6 over IPv4 tunnel is configured.
2. After the border Router receives a packet from the IPv6 network, if the destination address of the
packet is not the border Router and the outbound interface is a tunnel interface, the border Router
appends an IPv4 header to the IPv6 packet to encapsulate it as an IPv4 packet.
3. On the IPv4 network, the encapsulated packet is transmitted to the remote border Router.
4. The remote border Router receives the packet, removes the IPv4 header, and then sends the
decapsulated IPv6 packet to the remote IPv6 network.
IPv6 over IPv4 tunnels are classified into IPv6 over IPv4 manual tunnels and IPv6-to-IPv4 (6to4)
tunnels depending on the application scenarios.
The following describes the characteristics and applications of each.
IPv6-to-IPv4 Tunnel
A 6to4 tunnel can connect multiple isolated IPv6 sites through an IPv4 network. A 6to4 tunnel can be a
P2MP connection, whereas a manual tunnel is a P2P connection. Therefore, Routers on both ends of the
6to4 tunnel are not configured in pairs.
A 6to4 tunnel uses a special IPv6 address, a 6to4 address in the format of 2002:IPv4 address:subnet
ID:interface ID. A 6to4 address has a 48-bit prefix composed of 2002:IPv4 address. The IPv4 address is the
globally unique IPv4 address applied by an isolated IPv6 site. This IPv4 address must be configured on the
physical interfaces connecting the border Routers between IPv6 and IPv4 networks to the IPv4 network. The
IPv6 address has a 16-bit subnet ID and a 64-bit interface ID, which are assigned by users in the isolated
IPv6 site.
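The 48-bit 6to4 prefix 2002:IPv4 address::/48 can be derived mechanically from the site's IPv4 address. A minimal Python sketch (the sample address is illustrative):

```python
import ipaddress

def sixto4_prefix(ipv4_addr):
    """Derive the 48-bit 6to4 prefix 2002:V4ADDR::/48 from a site's
    globally unique IPv4 address: 16 bits of 2002 followed by the
    32-bit IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4_addr))
    prefix = (0x2002 << 112) | (v4 << 80)
    return ipaddress.IPv6Network((prefix, 48))

# 192.0.2.1 = 0xC0000201, so the 6to4 prefix is 2002:c000:201::/48.
assert str(sixto4_prefix("192.0.2.1")) == "2002:c000:201::/48"
```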
When the 6to4 tunnel is used for communication between the 6to4 network and the native IPv6 network, you can configure an anycast address with the prefix 2002:c058:6301::/48 on the tunnel interface of the 6to4 relay Router.
The difference between a 6to4 address and anycast address is as follows:
• If a 6to4 address is used, you must configure different addresses for tunnel interfaces of all devices.
• If an anycast address is used, you must configure the same address for the tunnel interfaces of all
devices, effectively reducing the number of addresses.
A 6to4 network refers to a network on which all nodes are configured with 6to4 addresses. A native IPv6
network refers to a network on which nodes do not need to be configured with 6to4 addresses. A 6to4 relay
is required for communication between 6to4 networks and native IPv6 networks.
6RD Tunneling
IPv6 rapid deployment (6RD) tunneling allows rapid deployment of IPv6 services over an existing IPv4
network.
As an enhancement to the 6to4 solution, 6RD tunneling allows service providers to use one of their own IPv6
prefixes instead of the well-known 2002::/16 prefix standardized for 6to4. 6RD tunneling provides more
flexible network planning, allowing different service providers to deploy 6RD tunnels using different prefixes.
Therefore, 6RD tunneling is the most widely used IPv6 over IPv4 tunneling technology.
Basic Concepts
Figure 3 introduces the basic concepts of 6RD tunneling and 6RD relay.
• 6RD domain
A 6RD domain is a special IPv6 network. The IPv6 address prefixes of devices or hosts within a 6RD
domain share the same 6RD delegated prefix. A 6RD domain consists of 6RD customer edge (CE)
devices and 6RD border relays (BRs). Each 6RD domain uses a unique 6RD prefix.
• 6RD CE
A 6RD CE is an edge node connecting a 6RD network to an IPv4 network. An IPv4 address needs to be
configured for the interface connecting the 6RD CE to the IPv4 network. An IPv6 address needs to be
configured for the interface connecting the 6RD CE to the 6RD network, and the IPv6 prefix is a 6RD
delegated prefix.
• 6RD BR
A 6RD BR is used to connect a 6RD network to an IPv6 network. At least one IPv4 interface needs to be
configured for the 6RD BR. Each 6RD domain has only one 6RD BR.
• 6RD prefix
A 6RD prefix is an IPv6 prefix used by a service provider. It is part of a 6RD delegated prefix.
A 6RD address has a 64-bit length and consists of a 6RD delegated prefix and a customized subnet mask.
The 6RD delegated prefix is a combination of a 6RD prefix and all or part of an IPv4 address. The length of
the IPv4 address is determined by the IPv4 prefix length configured for the 6RD tunnel. That is, after
subtracting specified high-order bits from the IPv4 address, the rest of the IPv4 address becomes part of the
6RD delegated prefix.
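The combination described above can be computed from the 6RD prefix, the IPv4 address, and the configured IPv4 prefix length. This is a hedged Python sketch of the derivation; all sample values are assumptions.

```python
import ipaddress

def srd_delegated_prefix(srd_prefix, srd_prefix_len, ipv4_addr, ipv4_mask_len):
    """6RD delegated prefix: append the IPv4 address bits that remain
    after dropping ipv4_mask_len high-order bits to the 6RD prefix."""
    kept = 32 - ipv4_mask_len                              # IPv4 bits carried
    v4_bits = int(ipaddress.IPv4Address(ipv4_addr)) & ((1 << kept) - 1)
    value = (int(ipaddress.IPv6Address(srd_prefix))
             | (v4_bits << (128 - srd_prefix_len - kept)))
    return ipaddress.IPv6Network((value, srd_prefix_len + kept))

# 6RD prefix 2001:db8::/32, CE IPv4 address 10.100.100.1, leading /8 dropped:
prefix = srd_delegated_prefix("2001:db8::", 32, "10.100.100.1", 8)
assert str(prefix) == "2001:db8:6464:100::/56"
```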
Service Scenarios
A 6RD tunnel can be used in two scenarios: interworking between 6RD domains and interworking between a
6RD domain and an IPv6 network.
1. A service provider assigns a 6RD prefix and an IPv4 address to 6RD CE A, and 6RD CE A delivers
the 6RD delegated prefix calculated based on the 6RD prefix and IPv4 address to host A.
2. Upon receiving an IPv6 packet sent by host A, 6RD CE A searches the IPv6 forwarding information
base (FIB) table based on the destination address in the IPv6 packet and discovers that the 6RD
tunnel interface is the outbound interface and the destination address is a 6RD address. 6RD CE A
then encapsulates the IPv6 packet into an IPv4 packet in which the destination address is the IPv4
address extracted from the 6RD address and the source address is the IPv4 source address
configured for the local tunnel interface.
3. 6RD CE A forwards the IPv4 packet from the tunnel interface to 6RD CE B over the IPv4 network.
4. Upon receiving the IPv4 packet, 6RD CE B decapsulates the IPv4 packet, searches for the
destination address contained in the IPv6 packet header, and routes the IPv6 packet to host B.
5. After receiving the packet, host B responds to the packet. The returned packet is processed in a
similar way.
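In step 2 above, the tunnel destination is the IPv4 address embedded in the 6RD destination address. The extraction can be sketched as follows; the parameter names and sample values are assumptions for illustration.

```python
import ipaddress

def ipv4_from_6rd(srd_address, srd_prefix_len, ipv4_mask_len, ipv4_prefix):
    """Recover the tunnel destination IPv4 address embedded in a 6RD
    address: take the IPv4 bits that follow the 6RD prefix and restore
    the high-order bits dropped at encapsulation time."""
    kept = 32 - ipv4_mask_len
    addr = int(ipaddress.IPv6Address(srd_address))
    v4_bits = (addr >> (128 - srd_prefix_len - kept)) & ((1 << kept) - 1)
    high = int(ipaddress.IPv4Address(ipv4_prefix)) & ~((1 << kept) - 1)
    return ipaddress.IPv4Address(high | v4_bits)

# With 6RD prefix 2001:db8::/32 and the common IPv4 /8 prefix 10.0.0.0:
dest = ipv4_from_6rd("2001:db8:6464:100::1", 32, 8, "10.0.0.0")
assert str(dest) == "10.100.100.1"
```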
• As shown in Figure 6, a 6RD domain and an IPv6 network interwork over a 6RD tunnel.
1. A service provider assigns a 6RD prefix and an IPv4 address for the 6RD CE and assigns an IPv4
address for the 6RD BR. The 6RD CE delivers the 6RD delegated prefix calculated based on the
6RD prefix and IPv4 address to host A.
2. When the IPv6 packet sent by host A reaches the 6RD CE, the 6RD CE searches the IPv6 FIB table
based on the destination address in the IPv6 packet and discovers that the 6RD tunnel interface is
the outbound interface and the next-hop address instead of the destination address is a 6RD
address. The 6RD CE then encapsulates the IPv6 packet into an IPv4 packet in which the
destination address is the IPv4 address extracted from the next-hop 6RD address and the source
address is the IPv4 source address configured for the local tunnel interface.
3. The 6RD CE forwards the IPv4 packet from the tunnel interface to the 6RD BR over the IPv4
network.
4. Upon receiving the IPv4 packet, the 6RD BR decapsulates the IPv4 packet, searches for the
destination address contained in the IPv6 packet header, and routes the IPv6 packet to host B.
5. After receiving the packet, host B responds to the packet. The returned packet is processed in a
similar way.
10 IP Routing
Purpose
This document describes the IP Routing feature in terms of its overview, principles, and applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Commissioning engineers
Security Declaration
• Notice on Limited Command Permission
This document describes the commands used for network deployment and maintenance, but does not
cover confidential commands such as those used for production, assembly, and return-to-factory
inspection. For details about confidential commands, please submit an application.
■ When the password encryption mode is cipher, avoid setting both the start and end characters of a password to "%^%#"; otherwise, the password is displayed directly in the configuration file.
■ Your purchased products, services, or features may use some of users' personal data during service operation or fault locating. You must define user privacy policies in compliance with local laws and take proper measures to fully protect personal data.
■ When discarding, recycling, or reusing a device, back up or delete data on the device as required to
prevent data leakage. If you need support, contact after-sales technical support personnel.
• Feature declaration
■ The NetStream feature may be used to analyze the communication information of terminal
customers for network traffic statistics and management purposes. Before enabling the NetStream
feature, ensure that it is performed within the boundaries permitted by applicable laws and
regulations. Effective measures must be taken to ensure that information is securely protected.
■ The mirroring feature may be used to analyze the communication information of terminal
customers for a maintenance purpose. Before enabling the mirroring function, ensure that it is
performed within the boundaries permitted by applicable laws and regulations. Effective measures
must be taken to ensure that information is securely protected.
■ The packet header obtaining feature may be used to collect or store some communication
information about specific customers for transmission fault and error detection purposes. Huawei
cannot offer services to collect or store this information unilaterally. Before enabling the function,
ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Special Declaration
• This document serves only as a guide. The content is written based on device information gathered
under lab conditions. The content provided by this document is intended to be taken as general
guidance, and does not cover all scenarios. The content provided by this document may be different
from the information on user device interfaces due to factors such as version upgrades and differences
in device models, board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are beyond the
scope of this document.
• The maximum values provided in this document are obtained in specific lab environments (for example,
only a certain type of board or protocol is configured on a tested device). The actually obtained
maximum values may be different from the maximum values provided in this document due to factors
such as differences in hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface numbers on devices
for configuration.
• The supported boards are described in the document. Whether a customization requirement can be met
is subject to the information provided at the pre-sales interface.
• In this document, public IP addresses may be used in feature introduction and configuration examples
and are for reference only unless otherwise specified.
• The configuration precautions described in this document may not accurately reflect all scenarios.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
DANGER: Indicates a hazard with a high level of risk which, if not avoided, will result in death or serious injury.
CAUTION: Indicates a hazard with a low level of risk which, if not avoided, could result in minor or moderate injury.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made
in earlier issues.
Definition
As a basic concept on data communication networks, routing is the process of relaying or forwarding packets, and it provides the route information required for packet forwarding.
Purpose
During data forwarding, routers, routing tables, and routing protocols are indispensable. Routing protocols
are used to discover routes and contribute to the generation of routing tables. Routing tables store the
routes discovered by various routing protocols, and routers select routes and implement data forwarding.
10.2.2.1 Routers
On the Internet, network connection devices control network traffic and ensure data transmission quality on
networks. Common network connection devices include hubs, bridges, switches, and routers.
As a standard network connection device, a router is used to select routes and forward packets. Based on the
destination address in the received packet, a router selects a path to send the packet to the next router. The
last router is responsible for sending the packet to the destination host. In addition, a router can select an
optimal path for data transmission.
For example, in Figure 1, traffic from Host A to Host C needs to pass through three networks and two
routers. The hop count from a router to its directly connected network is zero. The hop count from a router
to a network that the router can reach through another router is one. The rest can be deduced by analogy. If
a router is connected to another router through a network, a network segment exists between the two
routers, and they are considered adjacent on the Internet. In Figure 1, the bold arrows indicate network
segments. The routers do not need to know about the physical link composition of each network segment.
Network sizes may vary greatly, and the actual lengths of network segments vary as well. Therefore, you can
set a weighted coefficient for the network segments of each network and then measure the cost of a route
based on the number of network segments.
A route with the minimal network segments is not necessarily optimal. For example, a route passing through
three high-speed Local Area Network (LAN) network segments may be a better choice than one passing
through two low-speed Wide Area Network (WAN) network segments.
• Routes discovered by link layer protocols, which are also called interface routes or direct routes
Each router that supports Layer 3 virtual private network (L3VPN) maintains a management routing table (local
core routing table) for each VPN instance.
• Destination: indicates the destination IP address or the destination network address of an IP packet.
• Mask: indicates the network mask. The network mask and the destination address are used together to
identify the address of the network segment where the destination host or router resides.
■ The address of the network segment where the destination host or router resides can be calculated by performing an AND operation on the destination address and the network mask. For example, if the destination address is 1.1.1.1 and the mask is 255.255.255.0, the address of the network segment where the host or the router resides is 1.1.1.0.
■ The mask, which consists of several consecutive 1s, can be expressed either in dotted decimal
notation or by the number of consecutive 1s in the mask. For example, the length of the mask
255.255.255.0 is 24, and therefore, the mask can also be expressed as 24.
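Both points above can be checked with Python's standard ipaddress module; this is a worked illustration, not device behavior.

```python
import ipaddress

# AND the destination address with the mask to get the network segment:
dest = ipaddress.IPv4Address("1.1.1.1")
mask = ipaddress.IPv4Address("255.255.255.0")
segment = ipaddress.IPv4Address(int(dest) & int(mask))
assert str(segment) == "1.1.1.0"

# A mask of consecutive 1s can equivalently be written as its length:
net = ipaddress.IPv4Network("1.1.1.0/255.255.255.0")
assert net.prefixlen == 24
```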
• Pre: indicates the priority of a route that is added to the IP routing table. If multiple routes have the
same destination but different next hops or outbound interfaces or these routes are static routes or
discovered by different routing protocols, the one with the highest priority (the smallest value) is
selected as the optimal route. For the route priority of each routing protocol, see Table 1.
• Cost: indicates the route cost. When multiple routes to the same destination have the same priority, the
route with the smallest cost is selected as the optimal route.
The Preference is used during the selection of routes discovered by different routing protocols, whereas the Cost is
used during the selection of routes discovered by the same routing protocol.
• Flags:
Route flag:
• Next hop: indicates the IP address of the next router through which an IP packet passes.
Based on the destination addresses, routes can be classified into the following types:
In addition, based on whether the destination is directly connected to the router, route types are as follows:
Setting a default route can reduce the number of routing entries in the routing table. When a router cannot
find a route in the routing table, the router uses the default route (destined for 0.0.0.0/0) to send packets.
In Figure 1, Device A is connected to three networks, and therefore, it has three IP addresses and three
outbound interfaces. Figure 1 shows the routing table on Device A.
• Interior Gateway Protocol (IGP): runs within an Autonomous System (AS), such as RIP, OSPF, and IS-IS.
• Exterior Gateway Protocol (EGP): runs between ASs. At present, BGP is the most widely used EGP.
• Distance-vector routing protocol: includes RIP and BGP. BGP is also called a path-vector protocol.
These routing algorithms differ in their methods of discovering and calculating routes.
This chapter describes unicast routing protocols only. For details on multicast routing protocols, see the
HUAWEI NE40E-M2 series Universal Service Router Feature Description - IP Multicast.
Routers manage both static and dynamic routes. These routes can be exchanged between different routing
protocols to implement readvertisement of routing information.
Route Priority
Routing protocols (including static routes) may discover different routes to the same destination, but not all
the routes are optimal. Only one routing protocol is used each time to determine the optimal route to a
destination. Routing protocols and static routes have their priorities. When multiple route sources exist, the
route with the highest priority (smallest value) is selected as the optimal route. Table 1 lists routing
protocols and their default priorities.
Value 0 indicates a direct route, and value 255 indicates any route learned from an unreliable source. A
smaller value indicates a higher priority.
Routing Protocol Default Priority
Direct 0
OSPF 10
IS-IS 15
Static 60
RIP 100
BGP 255
IBGP 255
EBGP 255
Priorities can be manually configured for routes of routing protocols, except for direct routes. In addition, the
priorities of static routes can be different.
The NE40E defines external and internal priorities. The external priorities refer to the priorities set by users
for routing protocols. Table 1 lists the default external priorities.
When different routing protocols are configured with the same priority, the system selects the optimal route
based on the internal priority. For the internal priority of each routing protocol, see Table 2.
Routing Protocol Internal Priority
Direct 0
OSPF inter-area 10
OSPFv3 inter-area 10
IS-IS Level-1 15
IS-IS Level-2 18
EBGP 20
Static 60
UNR 65
RIP 100
RIPng 100
IBGP 200
For example, both an OSPF route and a static route are destined for 10.1.1.0/24, and their protocol priorities
are set to 5. In this case, the NE40E selects the optimal route based on the internal priorities listed in Table 2. The internal priority of OSPF (10) is higher than that of the static route (60). Therefore, the device selects
the route discovered by OSPF as the optimal route.
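The selection order described above (external priority first, then internal priority, then cost) can be modeled as follows. The priority table is a simplified subset of Tables 1 and 2, and the tuple layout is an assumption for illustration.

```python
# Simplified internal priorities (smaller value = preferred), taken from
# a subset of Table 2; "OSPF" here stands in for the OSPF route types.
INTERNAL = {"Direct": 0, "OSPF": 10, "IS-IS Level-1": 15, "IS-IS Level-2": 18,
            "EBGP": 20, "Static": 60, "UNR": 65, "RIP": 100, "RIPng": 100,
            "IBGP": 200}

def select_route(candidates):
    """candidates: list of (protocol, external_priority, cost) tuples.
    Compare external priority first, then internal priority, then cost."""
    return min(candidates, key=lambda r: (r[1], INTERNAL[r[0]], r[2]))

# Both routes to 10.1.1.0/24 have external priority 5; OSPF wins because
# its internal priority (10) beats the static route's (60):
best = select_route([("OSPF", 5, 2), ("Static", 5, 0)])
assert best[0] == "OSPF"
```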
• If multiple OSPFv2 processes learn routes to the same destination and the external and internal priorities of the
routes are the same, the system selects the route with the smallest link cost; if the link costs of the routes are the
same, the routes participate in load balancing. If multiple OSPFv3 processes learn routes to the same destination
and the external and internal priorities of the routes are the same, the system selects the route with the smallest
process ID.
• If multiple IS-IS processes learn routes to the same destination and the external and internal priorities of the routes
are the same, the device selects the route with the smallest link cost; if the link costs of the routes are the same,
the routes perform load balancing.
• If multiple RIP/RIPng processes learn routes to the same destination and the external and internal priorities of the
routes are the same, the device selects the route with the smallest link cost; if the link costs of the routes are the
same, the routes perform load balancing.
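The selection order described above can be sketched in Python. This is a minimal, hypothetical model (not NE40E code): routes are compared by external priority, then internal priority, then link cost, with lower values winning; routes that still tie participate in load balancing. The `Route` class and field names are illustrative assumptions.

```python
# Hypothetical model of route selection: lower priority values win;
# routes that tie on all criteria load-balance traffic.
from dataclasses import dataclass

@dataclass
class Route:
    protocol: str
    external_pri: int   # user-configurable external priority
    internal_pri: int   # fixed per protocol (see Table 2)
    cost: int           # link cost

def select_best(routes):
    """Return every route tied for best; more than one means load balancing."""
    key = lambda r: (r.external_pri, r.internal_pri, r.cost)
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]

# Example from the text: OSPF and static routes to 10.1.1.0/24, both with
# external priority 5; OSPF wins on internal priority (10 < 60).
candidates = [Route("OSPF", 5, 10, 100), Route("Static", 5, 60, 0)]
assert [r.protocol for r in select_best(candidates)] == ["OSPF"]
```

Two routes from the same protocol with equal priorities and equal costs would both be returned, matching the load-balancing behavior described for OSPFv2, IS-IS, and RIP processes.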
Definition
Priority-based route convergence is an important technology that improves network reliability. It provides
faster route convergence for key services. For example, to minimize the interruption of key services in case of
network faults, real-time multicast services require that the routes to the multicast source quickly converge,
and the Multiprotocol Label Switching (MPLS) VPN bearer network requires that routes between PEs also
quickly converge.
Convergence priorities provide references for the system to converge routes for service forwarding. Different routes can be assigned different convergence priorities, which are, in descending order, critical, high, medium, and low.
Purpose
With the integration of network services, requirements on service differentiation increase. Carriers require that the routes for key services, such as Voice over IP (VoIP) and video conferencing, converge faster than those for common services. Therefore, routes need to converge based on their convergence priorities to improve network reliability.
Routing Protocol     Convergence Priority
Direct               Critical
Static               Medium
RIP                  Low
BGP                  Low
For VPN route priorities, only 32-bit host routes of OSPF and IS-IS are identified as medium, and the other routes are
identified as low.
Applications
Figure 1 shows networking for multicast services. An IGP runs on the network; Device A is the receiver, and
Device B is the multicast source server with IP address 10.10.10.10/32. The route to the multicast source
server is required to converge faster than other routes, such as 10.12.10.0/24. In this case, you can set a
higher convergence priority for 10.10.10.10/32 than that of 10.12.10.0/24. Then, when routes converge on
the network, the route to the multicast source server 10.10.10.10/32 converges first, ensuring the
transmission of multicast services.
Load Balancing
The NE40E supports the multi-route model (multiple routes with the same destination and priority). Routes
discovered by one routing protocol with the same destination and cost can load-balance traffic. In each
routing protocol view, you can run the maximum load-balancing number command to perform load
balancing. Load balancing can work per-destination or per-packet.
Device A needs to forward packets to 10.1.1.0/24 and 10.2.1.0/24. Based on per-destination load
balancing, packets of the same flow are transmitted along the same path. The processes for forwarding
packets on Device A are as follows:
■ The first packet P1 to 10.1.1.0/24 is forwarded through Port 1, and all subsequent packets to
10.1.1.0/24 are forwarded through Port 1.
■ The first packet P1 to 10.2.1.0/24 is forwarded through Port 2, and all subsequent packets to
10.2.1.0/24 are forwarded through Port 2.
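The per-destination behavior above can be sketched as a deterministic hash over the destination. This is a hypothetical illustration, not the NE40E hashing algorithm; the port names and the use of CRC32 are assumptions made for the example.

```python
# Per-destination load balancing sketch: a deterministic hash over the
# destination pins all packets of a flow to the same equal-cost port.
import zlib

PORTS = ["Port1", "Port2"]

def pick_port(dest_prefix: str) -> str:
    # The same destination always hashes to the same index, so every
    # packet to 10.1.1.0/24 leaves through the same port.
    return PORTS[zlib.crc32(dest_prefix.encode()) % len(PORTS)]

# Flow stability: repeated lookups for one destination never change ports.
assert pick_port("10.1.1.0/24") == pick_port("10.1.1.0/24")
assert pick_port("10.2.1.0/24") in PORTS
```

Per-packet load balancing would instead rotate across `PORTS` for successive packets regardless of destination, trading flow ordering for more even link utilization.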
Currently, RIP, OSPF, BGP, IS-IS, and static routes all support load balancing. The maximum number of equal-cost routes for load balancing varies with products.
Route Backup
The NE40E supports route backup to improve network reliability. You can configure multiple routes to the
same destination as required. The route with the highest priority functions as the primary route, and the
other routes with lower priorities function as backup routes.
In most cases, the NE40E uses the primary route to forward packets. If the link used by the primary route fails, the primary route becomes inactive, and the NE40E selects the backup route with the highest priority to forward packets. In this way, a primary/backup switchover is performed. When the original primary route recovers, the NE40E reselects the optimal route. Because the original primary route has the highest priority, the NE40E selects it again to send packets, and traffic is switched back from the backup route to the primary route.
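The switchover and switchback behavior can be sketched as follows. This is a simplified, hypothetical model: the active route is always the usable route with the highest priority (lowest value), so failover and recovery both fall out of one selection rule.

```python
# Route backup sketch: the active route is the highest-priority (lowest
# value) route whose link is currently up.
routes = [
    {"name": "primary", "priority": 60,  "up": True},
    {"name": "backup",  "priority": 100, "up": True},
]

def active_route(routes):
    usable = [r for r in routes if r["up"]]
    return min(usable, key=lambda r: r["priority"])["name"] if usable else None

assert active_route(routes) == "primary"
routes[0]["up"] = False            # primary link fails: switchover
assert active_route(routes) == "backup"
routes[0]["up"] = True             # primary link recovers: switchback
assert active_route(routes) == "primary"
```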
Overview
Fast Reroute (FRR) functions when the lower layer (physical layer or data link layer) detects a fault. The lower layer reports the fault to the upper-layer routing system and immediately forwards packets through a backup link.
If a link fails, FRR helps reduce the impact of the link failure on services transmitted on the link.
Background
On traditional IP networks, when a fault occurs at the lower layer of the forwarding link, the physical
interface on the router goes Down. After the router detects the fault, it instructs the upper layer routing
system to recalculate routes and then update routing information. The routing system takes several seconds
to reselect an available route.
For services that are sensitive to packet loss and delay, a convergence time of several seconds is intolerable
because it may lead to service interruptions. For example, the maximum convergence time tolerable for
Voice over IP (VoIP) services is within milliseconds. IP FRR enables the forwarding system to detect a fault
and then to take measures to restore services as soon as possible.
The static routes that are imported between public and private networks do not support IP FRR.
• When optimal routes are selected from the routes discovered by routing protocols, a backup link is
selected for each preferred primary link based on the protocol priority, and then the forwarding
information of primary and backup links is provided for the forwarding engine.
Feature           Description
IP FRR            Implements FRR through a backup route. IP FRR applies to networks on which a primary link and a backup link exist and load balancing is not configured.
Load balancing    Implements fast route switching through equal-cost routes. It applies to multi-link networking with load balancing.
Definition
Indirect next hop is a technique used to speed up route convergence. It changes the direct association between route prefixes and next hop information into an indirect association. Indirect next hop allows next hop information to be refreshed independently of the route prefixes that share the next hop, which speeds up route convergence.
Purpose
In the scenario requiring route recursion, when IGP routes or tunnels are switched, forwarding entries are
rapidly refreshed, which implements fast route convergence and reduces the impact of route or tunnel
switching on services.
Recursion Policy
A recursion policy is used to control the recursion result of the next hop to meet the requirements of different scenarios. Ordinary route recursion does not need to be controlled by a recursion policy; it only needs to comply with the longest match rule. A recursion policy needs to be applied only when VPN routes recurse to tunnels.
By default, the system selects Label Switched Paths (LSPs) for VPNs without performing load balancing. If
load balancing or other types of tunnels are required, configure a tunnel policy and bind it to a tunnel. After
the tunnel policy is applied, the system uses the tunnel bound to the tunnel policy or selects a tunnel based
on the priorities specified in the tunnel policy during next hop recursion.
As shown in Figure 1, without indirect next hop, prefixes are completely independent, each corresponding to its own next hop and forwarding information. When a dependent route changes, the next hop of each prefix performs recursion separately, and forwarding information is updated prefix by prefix. In this case, the convergence time is determined by the number of prefixes, even though prefixes learned from the same BGP peer share the same next hop and forwarding information and are refreshed in the same way.
As shown in Figure 2, with indirect next hop, prefixes of routes from the same BGP peer share the same next
hop. When a dependent route changes, only the shared next hop performs recursion and forwarding
information is updated based on the next hop. In this case, routes of all prefixes can converge at a time.
Therefore, the convergence time is irrelevant to the number of prefixes.
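The shared-next-hop idea can be sketched in a few lines. This is a hypothetical data-structure illustration, not NE40E internals: prefixes hold a reference to one shared next-hop object, so a single recursion update is visible to every prefix at once.

```python
# Indirect next hop sketch: many prefixes reference one shared next-hop
# object, so one update refreshes the forwarding path of all of them.
class SharedNextHop:
    def __init__(self, path):
        self.path = path      # resolved forwarding path after recursion

# Prefixes learned from one BGP peer all point at the same object
# (1,000 here stands in for the 100,000 routes in the example).
nh = SharedNextHop(["A", "B", "D"])
fib = {f"10.0.{i // 256}.{i % 256}/32": nh for i in range(1000)}

# The IGP path A->B->D fails: one recursion update, not one per prefix.
nh.path = ["A", "C", "D"]
assert all(entry.path == ["A", "C", "D"] for entry in fib.values())
```

Without the shared object, each of the 1,000 dictionary entries would store its own copy of the path and would have to be rewritten individually, which is the prefix-by-prefix refresh that indirect next hop avoids.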
In Figure 3, an IBGP peer relationship is established between loopback interfaces on Device A and Device D. The original BGP next hop cannot be used to guide packet forwarding because it is not directly reachable. Therefore, to refresh the forwarding table and guide packet forwarding, the system needs to find the actual outbound interface and directly connected next hop based on the original IBGP next hop.
Device D receives 100,000 routes from Device A. These routes have the same original BGP next hop. After
recursion, these routes eventually follow the same IGP path (A->B->D). If the IGP path (A->B->D) fails, these
IBGP routes do not need to perform recursion separately, and the relevant forwarding entries do not need to
be refreshed one by one. Note that only the shared next hop needs to perform recursion and be refreshed.
Consequently, these IBGP routes converge to the path (A->C->D) on the forwarding plane. Therefore, the
convergence time depends on only the number of next hops, not the number of prefixes.
If Device A and Device D establish a multi-hop EBGP peer relationship, the convergence procedure is the
same as the preceding one. Indirect next hop also applies to the recursion of a multi-hop EBGP route.
In Figure 4, a neighbor relationship is established between PE1 and PE2, and PE2 receives 100,000 VPN
routes from PE1. These routes have the same original BGP next hop. After recursion, these VPN routes
eventually follow the same public network tunnel (tunnel 1). If tunnel 1 fails, these routes do not need to
perform recursion separately, and the relevant forwarding entries do not need to be refreshed one by one.
Note that only the shared next hop needs to perform recursion, and the relevant forwarding entries need to
be refreshed. Consequently, these VPN routes converge to tunnel 2 on the forwarding plane. In this manner,
the convergence time depends on only the number of next hops, not the number of prefixes.
10.2.2.14 Multi-Topology
Multi-Topology Overview
On a traditional IP network, only one unicast topology exists, and only one unicast forwarding table is available on the forwarding plane. This forces services transmitted from one router to the same destination address to share the same next hop, and various end-to-end services, such as voice and data services, to share the same physical links. As a result, some links may become heavily congested while others remain relatively idle. To address this problem, configure multi-topology to divide a physical network into different logical topologies for different services.
By default, the base topology is created on the public network. The class-specific topology can be added or
deleted in the public network address family view. Each topology contains its own routing table. The class-
specific topology supports the addition, deletion, and import of protocol routes.
Background
A VRRP group is configured on Device1 and Device2 on the network shown in Figure 1. Device1 is a master
device, whereas Device2 is a backup device. The VRRP group serves as a gateway for users. User-to-network
traffic travels through Device1. However, network-to-user traffic may travel through Device1, Device2, or
both of them over a path determined by a dynamic routing protocol. Therefore, user-to-network traffic and
network-to-user traffic may travel along different paths, which interrupts services if firewalls are attached to
devices in the VRRP group, complicates traffic monitoring or statistics collection, and increases costs.
To address the preceding problems, the routing protocol is expected to select a route passing through the
master device so that the user-to-network and network-to-user traffic travels along the same path.
Association between direct routes and a VRRP group can meet expectations by allowing the dynamic routing
protocol to select a route based on the VRRP status.
Related Concepts
VRRP is a widely used fault-tolerant protocol that groups multiple routing devices into a VRRP group,
improving network reliability. A VRRP group consists of a master device and one or more backup devices. If
the master device fails, the VRRP group switches services to a backup device to ensure communication
continuity and reliability.
A device in a VRRP group operates in one of three states:
• Master: If a network is working correctly, the master device transmits all services.
• Backup: If the master device fails, the VRRP group selects a backup device as the new master device to
take over traffic and ensure uninterrupted service transmissions.
• Initialize: A device in the Initialize state waits for an interface Up or Startup event and then switches to the Master or Backup state.
For details about VRRP, see HUAWEI NE40E-M2 series Universal Service Router Feature Description - Network Reliability
- VRRP.
Implementation
Association between direct routes and a VRRP group allows VRRP interfaces to adjust the costs of direct
network segment routes based on the VRRP status. The direct route with the master device as the next hop
has the lowest cost. A dynamic routing protocol imports the direct routes and selects the direct route with
the lowest cost. For example, VRRP interfaces on Device1 and Device2 on the network shown in Figure 1 are
configured with association between direct routes and the VRRP group. The implementation is as follows:
• Device1 in the Master state sets the cost of its route to the directly connected virtual IP network
segment to 0 (default value).
• Device2 in the Backup state increases the cost of its route to the directly connected virtual IP network
segment.
A dynamic routing protocol selects the route with Device1 as the next hop because this route costs less than
the other route. Therefore, both user-to-network traffic and network-to-user traffic travel through Device1.
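The cost adjustment above can be sketched as follows. This is a hypothetical illustration: the increased backup cost of 10 is an assumed value (the real increment is configurable), and the selection step stands in for the dynamic routing protocol comparing imported direct routes.

```python
# Sketch of direct-route cost adjustment driven by VRRP state: the master
# keeps the default cost 0, the backup raises its cost, and the routing
# protocol then prefers the master's route.
BACKUP_COST = 10   # assumed increment; the actual value is configurable

def direct_route_cost(vrrp_state: str) -> int:
    return 0 if vrrp_state == "Master" else BACKUP_COST

# Device1 is the master, Device2 the backup; the lower-cost route wins.
advertised = {"Device1": direct_route_cost("Master"),
              "Device2": direct_route_cost("Backup")}
next_hop = min(advertised, key=advertised.get)
assert next_hop == "Device1"   # network-to-user traffic also uses Device1
```

If a VRRP switchover makes Device2 the master, the costs swap and the same selection rule moves both traffic directions to Device2, preserving path consistency.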
Usage Scenario
On a data center network, firewalls are attached to devices in a VRRP group to improve network security. Network-to-user traffic cannot pass through a firewall if it travels over a path different from the one used by user-to-network traffic.
On an IP radio access network (RAN), VRRP is configured to determine the master/backup status of aggregation site gateways (ASGs) and radio service gateways (RSGs). Network-to-user and user-to-network traffic may pass through different paths, complicating network operation and management.
Association between direct routes and a VRRP group can address the preceding problems by ensuring the
user-to-network and network-to-user traffic travels along the same path.
Background
In Figure 1, a Layer 2 virtual private network (VPN) connection is set up between each AGG and the CSG
through L2 virtual Ethernet (VE) interfaces, and BGP VPNv4 peer relationships are set up between the AGGs
and RSGs on an L3VPN. L3VE interfaces are configured on the AGGs, and VPN instances are bound to the
L3VE interfaces so that the CSG can access the L3VPN. BGP is configured on the AGGs to import direct
routes between the CSG and AGGs. The AGGs convert these direct routes to BGP VPNv4 routes before
advertising them to the RSGs.
AGG1 functions as the master device in Figure 1. In most cases, the RSGs select routes advertised by AGG1,
and traffic travels along Link A. If AGG1 or the CSG-AGG1 link fails, traffic switches over to Link B. After
AGG1 or the CSG-AGG1 link recovers, the L3VE interface on AGG1 goes from Down to Up, and AGG1
immediately generates a direct route destined for the CSG and advertises the route to the RSGs.
Downstream traffic then switches over to Link A. However, AGG1 has not learned the MAC address of the
NodeB yet. As a result, downstream traffic is lost.
To address this problem, configure the direct route to respond to L3VE interface status changes after a delay.
After you configure the delay, the RSG preferentially selects routes advertised by AGG1 only after AGG1
learns the MAC address of the NodeB.
Figure 1 Networking for the direct route responding to L3VE interface status changes after a delay
Implementation
After you configure the direct route to respond to L3VE interface status changes after a delay, the cost of the
direct route between the CSG and AGG1 is modified to the configured cost (greater than 0) when the L3VE
interface on AGG1 goes from Down to Up. After the configured delay expires, the cost of the direct route to
the CSG restores to the default value 0. Because BGP has imported the direct route and has advertised it to
RSGs, the cost value determines whether RSGs preferentially select the direct route.
RSGs preferentially transmit traffic over Link B before AGG1 has learned the MAC address of the NodeB,
which reduces traffic loss.
Usage Scenario
This feature applies to IP radio access networks (RANs) on which an L2VPN accesses an L3VPN.
Background
In Figure 1, PWs are set up between the AGGs and the CSG. BGP virtual private network version 4 (VPNv4)
peer relationships are set up between the AGGs and RSGs. Layer 3 virtual Ethernet (L3VE) interfaces are
configured on the AGGs, and VPN instances are bound to the L3VE interfaces so that the CSG can access the
L3VPN. BGP is configured on the AGGs to import direct routes between the CSG and AGGs. The AGGs
convert these direct routes to BGP VPNv4 routes before advertising them to the RSGs.
AGG1 functions as the master device in Figure 1. In most cases, the RSGs select routes advertised by AGG1,
and traffic travels along Link A. If AGG1 or the CSG-AGG1 link fails, traffic switches over to Link B. After
AGG1 or the CSG-AGG1 link recovers, the L3VE interface on AGG1 goes from Down to Up, and AGG1
immediately generates a direct route destined for the CSG and advertises the route to the RSGs.
Downstream traffic then switches over to Link A. However, PW1 is on standby. As a result, downstream
traffic is lost.
To address this problem, associate the direct route and PW status. After the association is configured, the
RSG preferentially selects the direct route only after PW1 becomes active.
Figure 1 Networking for the association between the direct route and PW status
Implementation
Configuring the association between the direct route and PW status allows a VE interface to adjust the cost
value of the direct route based on PW status. The cost value determines whether the RSGs preferentially
select the direct route because BGP has imported the direct route and has advertised it to RSGs. For
example, if you associate the direct route and PW status on the network shown in Figure 1, the
implementation is as follows:
• When PW1 becomes active, the cost value of the direct route between the CSG and AGG1 restores to
the default value 0. RSGs preferentially transmit traffic over Link A.
• When PW1 is on standby, the cost value of the direct route between the CSG and AGG1 is modified to a
configured value (greater than 0). RSGs preferentially transmit traffic over Link B, which reduces traffic
loss.
Usage Scenario
This feature applies to IP radio access networks (RANs) on which primary/secondary PWs are configured.
Background
By default, IPv4 Address Resolution Protocol (ARP) Vlink direct routes or IPv6 Neighbor Discovery Protocol
(NDP) Vlink direct routes are only used for packet forwarding in the same VLAN and cannot be imported to
dynamic routing protocols. This is because importing Vlink direct routes to dynamic routing protocols will
increase the number of routing entries and affect routing table stability. In some cases, however, operations need to be performed based on the Vlink direct routes of VLAN users. For example, different VLAN users may use different route export policies to attract traffic from the remote device. In this scenario, ARP or NDP Vlink direct routes need to be imported by a dynamic routing protocol and advertised to the remote device.
After advertisement of ARP or NDP Vlink direct routes is enabled, these direct routes can be imported by a
dynamic routing protocol (IGP or BGP) and advertised to the remote device.
Related Concepts
ARP Vlink direct routes: routing entries that record the physical interfaces of VLAN users and are used to forward IP packets. These physical interfaces are learned through ARP. On networks with VLANs, IP packets can be forwarded only by physical interfaces, not logical interfaces. After learning the ARP entry of a peer end, a VLANIF interface, QinQ interface, or QinQ VLAN tag termination sub-interface generates a 32-bit ARP Vlink direct route, which is displayed in the routing table. Regular physical interfaces do not generate 32-bit ARP Vlink direct routes.
NDP Vlink direct routes: routing entries carrying IPv6 addresses of VLAN users' physical interfaces. These IPv6
addresses are learned and resolved using NDP.
Implementation
On the network shown in Figure 1, Device A, Device B, and Device C are connected to the logical interface of
Device D which is a Border Gateway Protocol (BGP) peer of Device E. However, Device E needs to
communicate only with Device B rather than Device A and Device C. In this scenario, Vlink direct route
advertisement must be enabled on Device D. Then Device D obtains each physical interface of Device A,
Device B, and Device C, uses a routing policy to filter out network segment routes and routes destined for
Device A and Device C, and advertises the route destined for Device B to Device E.
Usage Scenario
Vlink direct route advertisement is applicable to networks in which a device needs to add Vlink direct routes
with physical interfaces of VLAN users to the routing table of a dynamic routing protocol before advertising
the routes to remote ends.
Advantages
With Vlink direct route advertisement, a device can add Vlink direct routes to the routing table of a dynamic
routing protocol (such as an Interior Gateway Protocol or BGP) and then use different export policies to
advertise routes required by remote ends.
Service Overview
A data center, used for service access and transmission, consists of many servers, disk arrays, security devices, and network devices that store and process a large number of services and applications. Firewalls are used
to improve data security, and VRRP groups are configured to improve communication reliability. VRRP may
cause user-to-network traffic and network-to-user traffic to travel along different paths, and as a result, the
firewall may discard the network-to-user traffic because of path inconsistency. To address this problem,
association between direct routes and a VRRP group must be configured.
Networking Description
Figure 1 shows a data center network. A server functions as a core service module in the data center. A
VRRP group protects data exchanged between the server and core devices, improving service security.
Firewalls are attached to devices in the VRRP group to improve network security.
Feature Deployment
The master device transmits server traffic to a core device. When the core device attempts to send traffic to
the server, the traffic can only pass through a firewall attached to the master device. On the network shown
in Figure 1, the server sends data destined for the core device through the master device, and the core device
sends data destined for the server along a path that an Interior Gateway Protocol (IGP) selects. The
association between the direct routes and a VRRP group can be configured on Switch A and Switch B so that
the IGP selects a route based on VRRP status. The IGP forwards core-device-to-server traffic over the same
path as the one over which server-to-core-device traffic is transmitted, which prevents the firewall from
discarding traffic.
Service Overview
NodeBs and radio network controllers (RNCs) on an IP radio access network (RAN) do not have dynamic
routing capabilities. Therefore, static routes must be configured to allow NodeBs to communicate with
aggregation site gateways (ASGs) and allow RNCs to communicate with remote service gateways (RSGs)
that are at the aggregation layer. VRRP is configured to provide ASG and RSG redundancy, improving device
reliability and ensuring non-stop transmission of value-added services, such as voice, video, and cloud computing services, over mobile bearer networks.
Networking Description
Figure 1 shows VRRP-based gateway protection applications on an IPRAN. A NodeB is dual-homed to VRRP-
enabled ASGs to communicate with the aggregation network. The NodeB sends traffic destined for the RNC
through the master ASG, whereas the RNC sends traffic destined for the NodeB through either the master or
backup ASG over a path selected by a dynamic routing protocol. As a result, traffic in opposite directions
may travel along different paths. Similarly, the RNC is dual-homed to VRRP-enabled RSGs. Path
inconsistency may also occur.
Feature Deployment
On the IPRAN shown in Figure 1, both ASGs and RSGs may send and receive traffic over different paths. For
example, user-to-network traffic enters the aggregation network through the master ASG, whereas network-
to-user traffic flows out of the aggregation network from the backup ASG. Path inconsistency complicates
traffic monitoring or statistics collection and increases the cost. In addition, when the master ASG is working
properly, the backup ASG also transmits services, which is counterproductive to VRRP redundancy backup
implementation. Association between direct routes and the VRRP group can be configured to ensure path
consistency.
On the NodeB side, the direct network segment routes of ASG VRRP interfaces can be associated with VRRP
status. The route with the master ASG as the next hop has a lower cost than the route with the backup ASG
as the next hop. The dynamic routing protocol imports the direct routes and selects the route with a lower
cost, ensuring path consistency. Implementation on the RNC side is similar to that on the NodeB side.
Routing Protocol     UDP Port     TCP Port
RIP                  520          -
RIPv2                520          -
RIPng                521          -
BGP                  -            179
OSPF                 -            -
IS-IS                -            -
Note that "-" indicates that the related transport layer protocol is not used.
Application          UDP Port     TCP Port
DHCP                 67/68        -
DNS                  53           53
FTP                  -            20/21
HTTP                 -            80
IMAP                 -            993
POP3                 -            995
SMTP                 25           25
SNMP                 161          -
TELNET               -            23
TFTP                 69           -
Note that "-" indicates that the related transport layer protocol is not used.
Terms

ARP Vlink direct routes: IP packets are forwarded through a specified physical interface. IP packets cannot be forwarded through a VLANIF interface, because a VLANIF interface is a logical interface with several physical interfaces as its member interfaces. If an IPv4 packet reaches a VLANIF interface, the device obtains information about the physical interface using ARP and generates the relevant routing entry. The route recorded in the routing entry is called an ARP Vlink direct route.

FRR: FRR is applicable to services that are very sensitive to packet loss and delay. When a fault is detected at the lower layer, the lower layer informs the upper-layer routing system of the fault. The routing system then forwards packets through a backup link. In this manner, the impact of the link fault on services is minimized.

NDP Vlink direct routes: IP packets are forwarded through a specified physical interface. IP packets cannot be forwarded through a VLANIF interface, because a VLANIF interface is a logical interface with several physical interfaces as its member interfaces. If an IPv6 packet reaches a VLANIF interface, the device obtains information about the physical interface using the Neighbor Discovery Protocol (NDP) and generates the relevant routing entry. The route recorded in the routing entry is called an NDP Vlink direct route.

UNR: When a user goes online through a Layer 2 device, such as a switch, no Layer 3 interface is available even though the user is assigned an IP address, so no dynamic routing protocol can be used. To enable devices to forward the user's traffic over IP routes, Huawei user network route (UNR) technology assigns a route to forward the user's traffic.
Abbreviations

Abbreviation     Full Name
CE               Customer Edge
PE               Provider Edge
RM               Route Management
Definition
Static routes are special routes that are manually configured by network administrators.
Purpose
On a simple network, static routes alone can ensure that the network runs properly. If a router cannot run
dynamic routing protocols or cannot generate routes to a destination network, you can configure static
routes on the router.
Route selection can be controlled using static routes. Properly configuring and using static routes can
improve network performance and guarantee the required bandwidth for important applications. When a
network fault occurs or the network topology changes, however, static routes must be changed manually by
the administrator.
10.3.2.1 Components
On the NE40E, you can run the ip route-static command to configure a static route, which consists of the
following components:
When creating a static route, you can specify interface-type interface-number, nexthop-address, or both. In addition, you can configure the Next-Table function, in which only a VPN instance name (public in the case of the public network) is specified as the next hop of a static route, and no outbound interface or next hop address is specified. You can configure these parameters as required.
Every route requires a next-hop address. Before sending a packet, a device searches its routing table for the route matching the destination address of the packet according to the longest match rule. The link layer can find the corresponding link-layer address and forward the packet only when a next-hop IP address is available.
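The longest match rule can be sketched as follows. This is a simplified, hypothetical lookup (the routing table contents and next-hop addresses are made up for the example, and real routers use trie-based structures rather than a linear scan).

```python
# Longest-match lookup sketch: among all prefixes that contain the
# destination, the most specific (longest) prefix supplies the next hop.
import ipaddress

routing_table = {
    "10.1.0.0/16": "192.168.1.2",
    "10.1.1.0/24": "192.168.2.2",
    "0.0.0.0/0":   "192.168.3.2",   # default route
}

def lookup(dest: str) -> str:
    matches = [(net, nh) for net, nh in routing_table.items()
               if ipaddress.ip_address(dest) in ipaddress.ip_network(net)]
    # Longest match rule: prefer the largest prefix length.
    return max(matches, key=lambda m: ipaddress.ip_network(m[0]).prefixlen)[1]

assert lookup("10.1.1.5") == "192.168.2.2"   # /24 beats /16 and the default
assert lookup("10.1.9.9") == "192.168.1.2"   # only /16 and the default match
```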
When specifying an outbound interface, note the following:
• For a Point-to-Point (P2P) interface, if an outbound interface is specified, the next hop address is the
address of the remote interface connected to the outbound interface. For example, when a GE interface
is configured with PPP encapsulation and obtains the remote IP address through PPP negotiation, you
can specify only an outbound interface, without the need to specify a next hop address.
• When configuring static routes, you are advised not to specify a broadcast interface (such as an
Ethernet interface) or a virtual template (VT) interface as the outbound interface. Ethernet interfaces
are broadcast interfaces, and each VT interface can be associated with multiple virtual access interfaces.
If either of the two types of interfaces is specified as the outbound interface, multiple next hops exist
and the next hop cannot be determined. In actual applications, to specify a broadcast interface (such as
an Ethernet interface) or a VT interface as the outbound interface, you need to specify a next hop
address along with the outbound interface.
In this example, static routes to networks 3, 4, and 5 need to be configured on Device A; static routes to networks 1 and 5 need to be configured on Device B; and static routes to networks 1, 2, and 3 need to be configured on Device C.
10.3.2.3 Functions
An IPv6 static route with destination address ::/0 (mask length 0) is a default IPv6 route. If the destination
address of an IPv6 packet fails to match any entry in the routing table, a router selects the default IPv6 route
to forward the IPv6 packet.
• If the BFD session associated with a static route goes Down, indicating that it has detected a link failure, the session reports the failure to the system. The system then deletes the static route from the IP routing table.
• If the BFD session associated with a static route goes Up, indicating that it has detected the recovery of a faulty link, the session reports the recovery to the system. The system then adds the static route to the IP routing table again.
• By default, a static route can still be selected even though the BFD session associated with it is
AdminDown (triggered by the shutdown command run either locally or remotely). If a device is
restarted, the BFD session needs to be re-negotiated. In this case, whether the static route associated
with the BFD session can be selected as the optimal route is subject to the re-negotiated BFD session
status.
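The selection rules above can be modeled as a minimal sketch (illustrative only, not the device implementation):

```python
def static_route_selectable(bfd_state):
    """Whether a BFD-associated static route may be selected (illustrative model).

    Down      -> the link failure is reported; the route is deleted from the table
    Up        -> the link has recovered; the route is added back
    AdminDown -> the session was shut down manually; selection is unaffected
    """
    return bfd_state in ("Up", "AdminDown")

assert static_route_selectable("Up")
assert static_route_selectable("AdminDown")   # default behavior after shutdown
assert not static_route_selectable("Down")
```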
• Single-hop detection
In single-hop detection mode, the configured outbound interface and next hop address are the
information about the directly connected next hop. The outbound interface associated with the BFD
session is the outbound interface of the static route, and the peer address is the next hop address of the
static route.
• Multi-hop detection
In multi-hop detection mode, only the next hop address is configured. Therefore, the static route must
recurse to the directly connected next hop and outbound interface. The peer address of the BFD session
is the original next hop address of the static route, and the outbound interface is not specified. In most
cases, the original next hop is an indirect next hop. Multi-hop detection is performed on the static
routes that support route recursion.
For details about BFD, see the HUAWEI NE40E-M2 series Universal Service Router Feature Description - Network Reliability.
Background
Static routes do not have a dedicated detection mechanism. If a link fails, the corresponding static route will
not be automatically deleted from the IP routing table. In this case, intervention of a network administrator
is required. This delays the link switchover and may cause lengthy service interruptions.
BFD for static route can use BFD sessions to monitor the link status of a static route. However, both ends of
a link must support BFD. BFD for static route may not be supported in some scenarios, for example, on a
network with Layer 2 devices. NQA for static route can solve this problem.
Table 1 compares BFD for static route and NQA for static route.
Table 1 Comparison between BFD for static route and NQA for static route
Requirements for devices: BFD for static route requires that both ends support BFD, whereas NQA for static route requires NQA on only one end.
Related Concepts
NQA helps carriers monitor network quality of service (QoS) in real time, and can be used to diagnose the
fault if a network fails.
NQA relies on a test instance to monitor the link status. The two ends of an NQA test are called the NQA
client and the NQA server. An NQA test is initiated by the NQA client. NQA test results are classified into the
following types:
• Success: The test is successful. It instructs the routing management module to set the status of the
static route to active and add the static route to the routing table.
• Failed: The test fails. It instructs the routing management module to set the status of the static route to
inactive and delete the static route from the routing table.
• No result: The test is running and no result has been obtained. If the test result is no result, the status
of the static route is not changed.
For NQA details, see "System Monitor" in the HUAWEI NE40E-M2 series Universal Service Router Feature Description.
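The mapping from NQA test results to routing-management actions described above can be sketched as follows (an illustrative model; the names are hypothetical, not the device's actual implementation):

```python
def apply_nqa_result(result, routing_table, route):
    """Map an NQA test result to a routing-management action (illustrative model)."""
    if result == "success":
        route["status"] = "active"
        routing_table.add(route["prefix"])
    elif result == "failed":
        route["status"] = "inactive"
        routing_table.discard(route["prefix"])
    # "no result": the test is still running; the route status is unchanged
    return route["status"]

rib = set()
route = {"prefix": "10.1.0.0/16", "status": "inactive"}
assert apply_nqa_result("success", rib, route) == "active" and "10.1.0.0/16" in rib
assert apply_nqa_result("failed", rib, route) == "inactive" and "10.1.0.0/16" not in rib
assert apply_nqa_result("no result", rib, route) == "inactive"   # status unchanged
```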
Implementation
NQA for static route associates an NQA test instance with a static route and uses the NQA test instance to
monitor the link status. The routing management module determines whether a static route is active based
on the NQA test result. If the static route is inactive, the routing management module deletes it from the IP
routing table and selects a normal backup link for data forwarding, which prevents lengthy service
interruptions.
In Figure 1, each access switch is connected to 10 clients, and a total of 100 clients exist. Because no
dynamic routing protocol can be used between DeviceB and clients, static routes to the clients need to be
configured on DeviceB. To ensure network stability, the same configuration is performed on DeviceC for
backup.
DeviceA, DeviceB, and DeviceC run a dynamic routing protocol and can learn routes from each other. On
DeviceB and DeviceC, the dynamic routing protocol is configured to import static routes, and different costs
are configured for the static routes. In this way, DeviceA can learn the routes to clients from DeviceB and
DeviceC through the dynamic routing protocol. DeviceA then determines the primary and backup links based
on the costs.
NQA for static route is configured on DeviceB, and an NQA test instance is used to monitor the status of the
primary link. If the primary link fails, the corresponding static route is deleted and downlink traffic switches
to the backup link. When both the links are running properly, downlink traffic is preferentially transmitted
along the primary link.
NQA test instances can monitor the links of IPv4 and IPv6 static routes. The mechanisms for monitoring IPv4 and IPv6
static routes are the same.
Each static route can be associated with only one NQA test instance.
Usage Scenario
NQA for static route applies to networks where BFD for static route cannot be deployed due to device limitations, for example, when user devices access the network through a switch, an OLT, a DSLAM, or an MSAN, or in xDSL mode.
Benefits
It can rapidly and periodically detect the link status of static routes and implement rapid primary/backup link
switchovers, preventing lengthy service interruptions.
Background
When the link over which a static route runs fails, the static route will be deleted from the IP routing table
to trigger a route re-selection. After a new route is selected, traffic is switched to the new route. Some
carriers, however, may require that specific traffic always travel along a fixed link, regardless of the link
status. Static route permanent advertisement is introduced to meet this service need.
Implementation
With static route permanent advertisement, a static route can still be advertised and added to the IP routing
table for route selection even when the link over which the static route runs fails. After static route
permanent advertisement is configured, the static route can be advertised and added to the IP routing table
in both of the following scenarios:
• An outbound interface is configured for the static route, and the outbound interface has an IP address. The static route is advertised regardless of whether the outbound interface is Up.
• No outbound interface is configured for the static route. The static route is advertised regardless of whether it can obtain an outbound interface through route recursion.
After static route permanent advertisement is enabled, a static route always remains in the IP routing table regardless of
route reachability. If the destination of the route becomes unreachable, traffic interruption occurs.
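The two advertisement scenarios above can be modeled with a small sketch (the helper and the interface record are hypothetical simplifications):

```python
def advertised(outbound_if, permanent):
    """Decide whether a static route is advertised and added to the IP routing table.

    outbound_if: None if no outbound interface is configured, otherwise a dict
    such as {"has_ip": True, "up": False} (an illustrative representation).
    """
    if permanent:
        # Permanent advertisement: an interface with an IP address (Up or not),
        # or no interface at all (recursion result does not matter), suffices.
        return outbound_if is None or outbound_if["has_ip"]
    # Without permanent advertisement, a usable (Up) outbound interface is needed.
    return outbound_if is not None and outbound_if["up"]

assert advertised({"has_ip": True, "up": False}, permanent=True)   # interface Down, still advertised
assert advertised(None, permanent=True)                            # recursion outcome irrelevant
assert not advertised({"has_ip": True, "up": False}, permanent=False)
```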
Typical Networking
On the network shown in Figure 1, BR1, BR2, and BR3 belong to ISP1, ISP2, and ISP3 respectively. Two links
(Link A and Link B) exist between BR1 and BR2, but ISP1 expects its service traffic destined for ISP2 to be
always transmitted over Link A.
A direct EBGP peer relationship is established between BR1 and BR2. A static route is created on BR1, with
10.1.1.2/24 (IP address of BR2) as the destination address and the local interface connected to BR2 as the
outbound interface.
Without static route permanent advertisement, Link A is used to transmit traffic. If Link A fails, BGP will
switch the traffic to Link B.
With static route permanent advertisement, Link A is used to transmit traffic regardless of whether the
destination is reachable through Link A. If Link A fails, no link switchover is performed, causing traffic
interruption. To check whether the destination is reachable through the static route, ping the destination
address of the static route to which static route permanent advertisement is applied.
Background
On a network with a backup path between label switching routers (LSRs), packet loss may occur during a
traffic switchover or switchback because the status of a static route is different from that of a Label
Distribution Protocol (LDP) session. To resolve this problem, configure association between LDP and static
routes.
Typical Networking
Figure 1 shows the typical networking of association between LDP and static routes. LSRA and LSRD
interwork through static routes. Primary and backup static routes are deployed on LSRA, with the next-hop
devices being LSRB and LSRC, respectively. Primary and backup LDP LSPs are established based on the static
routes. The primary LSP uses Link A, and the backup LSP uses Link B. In normal cases, Link A is preferred.
Association between LDP and static routes in switchover and switchback scenarios is described as follows.
Figure 1 Networking of LSP switching scenario where association between LDP and static routes is configured
Switchover scenario
In the switchover scenario, traffic of the primary static route is not switched to the backup link when the LDP
session on the primary link fails (not because of a link fault). As a result, traffic on the LSP over the primary
link is interrupted.
After an LDP session is established, LSP traffic travels along the primary link, Link A (LSRA → LSRB → LSRD).
If the LDP session between LSRA and LSRB is interrupted, traffic of the primary LSP is switched immediately
to the backup link, Link B (LSRA → LSRC → LSRD). However, because the link between LSRA and LSRB is
normal, traffic of the primary static route is not switched to the backup link. The asynchronous state
between LDP and the primary static route causes an LSP traffic interruption.
If association between LDP and static routes is enabled, traffic is automatically switched to the backup link
when the LDP session goes Down, ensuring uninterrupted traffic forwarding.
Switchback scenario
In the switchback scenario, when the primary link recovers from a fault, the traffic of the primary static
route is switched back to Link A earlier than the traffic of the primary LSP because the convergence of static
routes is faster than that of LDP LSPs. As a result, the backup LSP on Link B cannot be used, and the LSP on
Link A has not been set up yet. As a result, LSP traffic is interrupted.
If the link between LSRA and LSRB fails, traffic is switched immediately to the backup link, Link B (LSRA →
LSRC → LSRD). After the link between LSRA and LSRB recovers, traffic of the primary static route is
immediately switched back to Link A (LSRA → LSRB → LSRD). However, the backup LSP cannot be used, and
the LSP on Link A has not recovered yet. As a result, traffic is interrupted.
If association between LDP and static routes is enabled, the static route on Link A becomes active only when the LDP session on Link A goes Up. In this manner, the states of the primary static route and the primary LSP remain synchronized during the switchback, which prevents traffic loss.
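The switchover and switchback behavior described above reduces to one activation rule, sketched here as an illustrative model (not the device implementation):

```python
def primary_route_active(link_up, ldp_session_up, association_enabled):
    """Whether the primary static route stays active (illustrative sketch).

    With association enabled, the route tracks the LDP session state, so the
    static route and the LSP switch over (and back) together.
    """
    if association_enabled:
        return link_up and ldp_session_up
    return link_up

# Switchover: the LDP session fails although the link itself is still up.
assert not primary_route_active(True, False, association_enabled=True)   # traffic moves to Link B
assert primary_route_active(True, False, association_enabled=False)      # stale route: LSP traffic lost
```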
Usage Scenario
Association between LDP and static routes applies to scenarios where a static route backup path exists
between LSRs.
Benefits
Association between LDP and static routes ensures state consistency between LDP and static routes, prevents
traffic loss, and improves network reliability.
Definition
Routing Information Protocol (RIP) is a simple Interior Gateway Protocol (IGP). RIP is used in small-scale
networks, such as campus networks and simple regional networks.
As a distance-vector routing protocol, RIP exchanges routing information through User Datagram Protocol
(UDP) packets with port number 520.
RIP employs the hop count as the metric to measure the distance to the destination. In RIP, by default, the
number of hops from the Router to its directly connected network is 0; the number of hops from the Router
to a network that is reachable through another Router is 1, and so on. The hop count (the metric) equals the
number of Routers along the path from the local network to the destination network. To speed up route
convergence, RIP defines the hop count as an integer that ranges from 0 to 15. A hop count that is greater
than or equal to 16 is classified as infinite, indicating that the destination network or host is unreachable.
Due to the hop limit, RIP is not applicable to large-scale networks.
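The hop-count metric with 16 as infinity can be sketched as follows (an illustrative model of the rule above):

```python
RIP_INFINITY = 16   # hop counts of 16 or more mean "unreachable"

def add_hop(metric):
    """Increment a RIP metric by one hop, saturating at infinity (16)."""
    return min(metric + 1, RIP_INFINITY)

def reachable(metric):
    return metric < RIP_INFINITY

assert add_hop(0) == 1                  # directly connected network -> one hop away
assert not reachable(add_hop(15))       # the 16th hop makes the route unreachable
```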
RIP has two versions: RIP-1 and RIP-2.
RIP supports split horizon, poison reverse, and triggered update, which improves the performance and
prevents routing loops.
Purpose
As the earliest IGP, RIP is used in small and medium-sized networks. Its implementation is simple, and the
configuration and maintenance of RIP are easier than those of Open Shortest Path First (OSPF) and
Intermediate System-to-Intermediate System (IS-IS). Therefore, RIP is widely used on live networks.
10.4.2.1 RIP-1
RIP version 1 (RIP-1) is a classful routing protocol, which supports only the broadcast of protocol packets.
Figure 1 shows the format of a RIP-1 packet. A RIP packet can carry a maximum of 25 routing entries. RIP is
based on UDP, and a RIP-1 packet cannot be longer than 512 bytes. RIP-1 packets do not carry any mask
information, and RIP-1 can identify only the routes to natural network segments, such as Class A, Class B,
and Class C. Therefore, RIP-1 does not support route summarization or discontinuous subnets.
10.4.2.2 RIP-2
RIP version 2 (RIP-2) is a classless routing protocol. Figure 1 shows the format of a RIP-2 packet. Compared with RIP-1, RIP-2 has the following advantages:
• Supports external route tags and uses a routing policy to flexibly control routes based on the tag.
• Supports route summarization and classless inter-domain routing (CIDR) by adding mask information to
RIP-2 packets.
• Supports next hop specification so that the optimal next hop address can be specified on the broadcast
network.
• Supports multicast transmission of Update packets, so that only the Routers that support RIP-2 receive RIP-2 packets, which reduces resource consumption.
• Provides three packet authentication modes: simple text authentication, Message Digest 5 (MD5) authentication, and HMAC-SHA256 authentication. For security purposes, HMAC-SHA256 authentication is recommended.
10.4.2.3 Timers
RIP uses the following timers:
• Update timer: The Update timer periodically triggers Update packet transmission. By default, the
interval at which Update packets are sent is 30s.
• Age timer: If a RIP device does not receive any packets from its neighbor to update a route before the
route expires, the RIP device considers the route unreachable. By default, the age timer interval is 180s.
• Garbage-collect timer: If a route becomes invalid after the age timer expires or a route unreachable
message is received, the route is placed into a garbage queue instead of being immediately deleted
from the RIP routing table. The garbage-collect timer monitors the garbage queue and deletes expired
routes. If an Update packet of a route is received before the garbage-collect timer expires, the route is
placed back into the age queue. The garbage-collect timer is set to avoid route flapping. By default, the
garbage collect timer interval is 120s.
• Hold-down timer: If a RIP device receives an updated route with cost 16 from a neighbor, the route
enters the holddown state, and the hold-down timer is started. To avoid route flapping, the RIP device
does not accept any updated routes, even if the cost is less than 16, until the hold-down timer expires, except in the following scenarios:
1. The cost carried in the Update packet is less than or equal to that carried in the last update
packet.
2. The hold-down timer expires, and the corresponding route enters the Garbage state.
The relationship between RIP routes and the four timers is as follows:
• The advertisement of RIP routing updates is triggered by the update timer, which has a default value of 30 seconds.
• Each routing entry is associated with two timers: the age timer and garbage-collect timer.
1. Each time a route is learned and added to the routing table, the age timer is started.
2. If no Update packet is received from the neighbor within 180 seconds after the age timer is
started, the metric of the corresponding route is set to 16, and the garbage-collect timer is
started.
• If no Update packet is received within 120 seconds after the garbage-collect timer is started, the
corresponding routing entry is deleted from the routing table after the garbage-collect timer expires.
• By default, the hold-down timer is disabled. If you configure a hold-down timer, it starts after the system receives a route with a cost of 16 (unreachable) from its neighbor.
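The interplay of the timers above can be sketched as a small simulation with the default values of 30s, 180s, and 120s (an illustrative model):

```python
# Default RIP timer values, in seconds.
UPDATE, AGE, GARBAGE = 30, 180, 120

def route_state(seconds_since_last_update):
    """State of a RIP route as a function of the time since the last Update packet."""
    if seconds_since_last_update < AGE:
        return "active"                    # age timer still running
    if seconds_since_last_update < AGE + GARBAGE:
        return "garbage"                   # metric set to 16, garbage-collect timer running
    return "deleted"                       # removed from the routing table

assert route_state(30) == "active"     # refreshed by periodic updates
assert route_state(200) == "garbage"   # age timer (180s) expired
assert route_state(301) == "deleted"   # garbage-collect timer (120s) also expired
```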
In Figure 1, Device A sends Device B a route to 10.0.0.0/8. If split horizon is not configured, Device B will
send this route back to Device A after learning it from Device A. As a result, Device A learns the following
routes to 10.0.0.0/8:
• A route with Device B as the next hop and a total of two hops
Only direct routes, however, are active in the RIP routing table of Device A.
If the route from Device A to 10.0.0.0/8 becomes unreachable and Device B is not notified, Device B still
considers the route to 10.0.0.0/8 reachable and continues sending this route to Device A. Then, Device A
receives incorrect routing information and considers the route to 10.0.0.0/8 reachable through Device B;
Device B considers the route to 10.0.0.0/8 reachable through Device A. As a result, a loop occurs on the
network.
After split horizon is configured, Device B no longer sends the route back after learning it, which prevents such a loop.
In Figure 2, Device A sends the route to 10.0.0.0/8 that it learns from Device B only to Device C.
In Figure 1, Device A sends Device B a route to 10.0.0.0/8. If poison reverse is not configured, Device B will
send this route back to Device A after learning it from Device A. As a result, Device A learns the following
routes to 10.0.0.0/8:
• A route with Device B as the next hop and a total of two hops
Only direct routes, however, are active in the RIP routing table of Device A.
If the route from Device A to 10.0.0.0/8 becomes unreachable and Device B is not notified, Device B still
considers the route to 10.0.0.0/8 reachable and continues sending this route to Device A. Then, Device A
receives incorrect routing information and considers the route to 10.0.0.0/8 reachable through Device B;
Device B considers the route to 10.0.0.0/8 reachable through Device A. As a result, a loop occurs on the
network.
With poison reverse, after Device B receives the route from Device A, Device B sends a route unreachable
message to Device A with cost 16. Device A then no longer learns the reachable route from Device B, which
prevents routing loops.
If both split horizon and poison reverse are configured, only poison reverse takes effect.
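The two loop-prevention mechanisms can be contrasted in a short sketch (illustrative model; interface names are hypothetical):

```python
def routes_to_advertise(routes, out_interface, mode):
    """Filter routing updates sent on out_interface (illustrative model).

    mode: "split_horizon"  - suppress routes learned on that interface
          "poison_reverse" - advertise them back with metric 16 (unreachable)
    Each route is a tuple: (prefix, interface it was learned on, metric).
    """
    adverts = []
    for prefix, learned_on, metric in routes:
        if learned_on == out_interface:
            if mode == "poison_reverse":
                adverts.append((prefix, 16))   # poisoned: advertised as unreachable
            # split horizon: silently skip the route
        else:
            adverts.append((prefix, metric))
    return adverts

routes = [("10.0.0.0/8", "ge0/0/1", 1), ("10.1.0.0/16", "ge0/0/2", 2)]
assert routes_to_advertise(routes, "ge0/0/1", "split_horizon") == [("10.1.0.0/16", 2)]
assert ("10.0.0.0/8", 16) in routes_to_advertise(routes, "ge0/0/1", "poison_reverse")
```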
In Figure 1, if the route to 10.4.0.0 becomes unreachable, Device C learns the information first. By default, a
RIP-enabled device sends routing updates to its neighbors every 30s. If Device C receives an Update packet
from Device B within 30s while Device C is still waiting to send Update packets, Device C learns the incorrect
route to 10.4.0.0. In this case, the next hops of the routes from Device B or Device C to network 10.4.0.0 are
Device C and Device B respectively, which results in routing loops. If Device C sends an Update packet to
Device B immediately after it detects a network fault, Device B can rapidly update its routing table, which
prevents routing loops.
In addition, if the next hop of a route becomes unavailable due to a link failure, the local Device sets the
cost of the route to 16 and then advertises the route immediately to its neighbors. This process is called
route poisoning.
• Interface-based summarization
Users can specify a summary address.
For example, users can configure a RIP-enabled interface to summarize the route 10.1.1.0/24 with
metric 2 and route 10.1.2.0/24 with metric 3 into the route 10.1.0.0/16 with metric 2.
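The summarization example above, where the summary route takes the smallest metric among its specific routes, can be sketched as follows (an illustrative model):

```python
import ipaddress

def summarize(summary_prefix, specifics):
    """Summarize the specific routes covered by summary_prefix, keeping the
    smallest metric among them (illustrative sketch)."""
    summary = ipaddress.ip_network(summary_prefix)
    covered = [
        metric for prefix, metric in specifics.items()
        if ipaddress.ip_network(prefix).subnet_of(summary)
    ]
    return (summary_prefix, min(covered)) if covered else None

# The example above: 10.1.1.0/24 (metric 2) and 10.1.2.0/24 (metric 3)
# are summarized into 10.1.0.0/16 with metric 2.
assert summarize("10.1.0.0/16", {"10.1.1.0/24": 2, "10.1.2.0/24": 3}) == ("10.1.0.0/16", 2)
```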
Background
Routing Information Protocol (RIP)-capable devices monitor the neighbor status by periodically exchanging Update packets. During the period in which a local device is detecting a link failure, carriers or users may lose a large number of packets. Bidirectional forwarding detection (BFD) for RIP speeds up fault detection and route convergence, which improves network reliability.
After BFD for RIP is configured on the Router, BFD can detect a fault (if any) within milliseconds and notify
the RIP module of the fault. The Router then deletes the route that passes through the faulty link and
switches traffic to a backup link. This process speeds up RIP convergence.
Table 1 describes the differences before and after BFD for RIP is configured.
Related Concepts
The BFD mechanism bidirectionally monitors data protocol connectivity over the link between two routers.
After BFD is associated with a routing protocol, BFD can rapidly detect a fault (if any) and notify the
protocol module of the fault, which speeds up route convergence and minimizes traffic loss.
• Static BFD
In static BFD mode, BFD session parameters (including local and remote discriminators) must be
configured, and requests must be delivered manually to establish BFD sessions.
Static BFD is applicable to networks on which only a few links require high reliability.
• Dynamic BFD
In dynamic BFD mode, the establishment of BFD sessions is triggered by routing protocols, and the local
discriminator is dynamically allocated, whereas the remote discriminator is obtained from BFD packets
sent by the neighbor.
When a new neighbor relationship is set up, a BFD session is established based on the neighbor and
detection parameters, including source and destination IP addresses. When a fault occurs on the link,
the routing protocol associated with BFD can detect the BFD session Down event. Traffic is switched to
the backup link immediately, which minimizes data loss.
Dynamic BFD is applicable to networks that require high reliability.
Implementation
For details about BFD implementation, see "BFD" in the HUAWEI NE40E-M2 series Universal Service Router Feature Description - Network Reliability. Figure 1 shows a typical network topology for BFD for RIP.
1. RIP neighbor relationships are established among Device A, Device B, and Device C and between
Device B and Device D.
3. Device A calculates routes, and the next hop along the route from Device A to Device D is Device
B.
4. If a fault occurs on the link between Device A and Device B, BFD will rapidly detect the fault and
report it to Device A. Device A then deletes the route whose next hop is Device B from the routing
table.
5. Device A recalculates routes and selects a new path Device C → Device B → Device D.
6. After the link between Device A and Device B recovers, a new BFD session is established between
the two routers. Device A then reselects an optimal link to forward packets.
1. RIP neighbor relationships are established among Device A, Device B, and Device C and between
Device B and Device D.
3. If a fault occurs on the link between Device A and Device B, BFD will rapidly detect the fault and
report it to Device A. Device A then deletes the route whose next hop is Device B from the routing
table.
4. After the link between Device A and Device B recovers, a new BFD session is established between
the two routers. Device A then reselects an optimal link to forward packets.
Usage Scenario
BFD for RIP is applicable to networks that require high reliability.
Benefits
BFD for RIP improves network reliability and enables devices to rapidly detect link faults, which speeds up
route convergence on RIP networks.
• Simple authentication: The authenticated party adds the configured password directly to packets for
authentication. This authentication mode provides the lowest password security.
• MD5 authentication: The authenticated party uses the Message Digest 5 (MD5) algorithm to generate a
ciphertext password and adds it to packets for authentication. This authentication mode improves
password security. For the sake of security, using the HMAC-SHA256 algorithm rather than the MD5
algorithm is recommended.
• Keychain authentication: The authenticated party configures a keychain that changes over time. This
authentication mode further improves password security.
Keychain authentication improves RIP security by periodically changing the password and the encryption
algorithms. For details about Keychain, see "Keychain" in NE40E Feature Description - Security.
• HMAC-SHA256 authentication: The authenticated party uses the HMAC-SHA256 algorithm to generate
a ciphertext password and adds it to packets for authentication.
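The idea behind ciphertext authentication can be sketched with Python's standard hmac module (a conceptual illustration only, not the on-wire RIP authentication format; the key is a hypothetical example):

```python
import hashlib
import hmac

KEY = b"example-shared-key"   # hypothetical shared secret, not a real configuration

def sign(packet: bytes) -> bytes:
    """Append an HMAC-SHA256 digest to the packet as the authentication field."""
    return packet + hmac.new(KEY, packet, hashlib.sha256).digest()

def verify(signed: bytes) -> bool:
    """Discard (return False for) packets whose digest does not match the local key."""
    packet, digest = signed[:-32], signed[-32:]
    expected = hmac.new(KEY, packet, hashlib.sha256).digest()
    return hmac.compare_digest(digest, expected)

assert verify(sign(b"rip-update"))
tampered = bytearray(sign(b"rip-update"))
tampered[-1] ^= 1                         # flip one bit of the digest
assert not verify(bytes(tampered))
```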
RIP authentication ensures network security by adding an authentication field to a packet before the packet is sent. After receiving a RIP packet from a remote router, the local router discards the packet if the authentication password in the packet does not match the local authentication password. This mechanism protects the local router.
On IP networks of carriers, RIP authentication ensures the secure transmission of packets, improves the
system security, and provides secure network services for carriers.
Definition
RIP next generation (RIPng) is an extension to RIP version 2 (RIP-2) on IPv6 networks. Most RIP concepts
apply to RIPng.
RIPng is a distance-vector routing protocol, which measures the distance (metric or cost) to the destination
host by the hop count. In RIPng, the hop count from a device to its directly connected network is 0, and the
hop count from a device to a network that is reachable through another device is 1. When the hop count is
equal to or exceeds 16, the destination network or host is defined as unreachable.
To be applied on IPv6 networks, RIPng makes the following changes to RIP:
• UDP port number: RIPng uses UDP port number 521 to send and receive routing information.
• Multicast address: RIPng uses FF02::9 as the link-local multicast address of a RIPng device.
• Prefix length: RIPng packets carry a prefix length (0 to 128) rather than a subnet mask to identify the destination address.
• Source address: RIPng uses a link-local address (FE80::/10) as the source address of RIPng Update packets.
Purpose
RIPng is an extension to RIP for support of IPv6.
• Next hop RTE: It defines the IPv6 address of the next hop and is located before a group of IPv6-prefix
RTEs that have the same next hop. The Metric field of a next hop RTE is always 0xFF.
• IPv6-prefix RTE: It describes the destination IPv6 address and the cost in the RIPng routing table and is
located after a next hop RTE. A next hop RTE can be followed by multiple different IPv6-prefix RTEs.
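The RTE layout described above (a 16-byte IPv6 prefix, a 2-byte route tag, a 1-byte prefix length, and a 1-byte metric, as defined in RFC 2080) can be illustrated by packing entries in Python; the addresses are illustrative examples:

```python
import ipaddress
import struct

def rte(address, tag=0, prefix_len=0, metric=0):
    """Pack one 20-byte RIPng route table entry (RFC 2080 layout):
    16-byte IPv6 prefix, 2-byte route tag, 1-byte prefix length, 1-byte metric."""
    return ipaddress.IPv6Address(address).packed + struct.pack("!HBB", tag, prefix_len, metric)

# A next hop RTE (metric 0xFF) followed by two IPv6-prefix RTEs sharing that next hop.
entries = (
    rte("fe80::1", metric=0xFF)                     # next hop RTE
    + rte("2001:db8:1::", prefix_len=64, metric=2)  # IPv6-prefix RTE
    + rte("2001:db8:2::", prefix_len=64, metric=3)  # IPv6-prefix RTE
)
assert len(entries) == 3 * 20
assert entries[19] == 0xFF   # metric byte of the next hop RTE
```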
10.5.2.2 Timers
RIPng uses the following timers:
• Update timer: This timer periodically triggers Update packet transmission. By default, the interval at
which Update packets are sent is 30s. This timer is used to synchronize RIPng routes on the network.
• Age timer: If a RIPng device does not receive any Update packet from a neighbor before a route learned from that neighbor expires, the device considers the route unreachable.
• Garbage-collect timer: If no packet is received to update an unreachable route after the Age timer
expires, this route is deleted from the RIPng routing table.
• Hold-down timer: If a RIP device receives an updated route with cost 16 from a neighbor, the route
enters the holddown state, and the hold-down timer is started.
On the network shown in Figure 1, after DeviceB sends a route to network 123::45 to DeviceA, DeviceA does
not send the route back to DeviceB.
In Figure 1, if poison reverse is not configured, DeviceB sends DeviceA a route learned from DeviceA. The
cost of the route from DeviceA to network 123::0/64 is 1. If the route from DeviceA to network 123::0/64
becomes unreachable and DeviceB does not receive an Update packet from DeviceA and keeps sending
DeviceA the route from DeviceA to network 123::0/64, a routing loop occurs.
With poison reverse, after Device B receives the route from Device A, Device B sends a route unreachable
message to Device A with cost 16. Device A then no longer learns the reachable route from Device B, which
prevents routing loops.
If both poison reverse and split horizon are configured, only poison reverse takes effect.
In Figure 1, if network 123::0 is unreachable, DeviceC learns the information first. By default, a RIPng-
enabled device sends Update packets to its neighbors every 30 seconds. If DeviceC receives an Update packet
from DeviceB within 30s when DeviceC is still waiting to send Update packets, DeviceC learns the incorrect
route to 123::0. In this case, the next hops of the routes from DeviceB and DeviceC to 123::0 are DeviceC and
DeviceB, respectively, which results in routing loops. If DeviceC sends an Update packet to DeviceB
immediately after it detects a network fault, DeviceB can rapidly update its routing table, which prevents
routing loops.
In addition, if the next hop of a route becomes unavailable due to a link failure, the local Router sets the
cost of the route to 16 and then advertises the route immediately to its neighbors. This process is called
route poisoning.
Background
On large networks, the RIPng routing table of each device contains a large number of routes, which consumes excessive system resources. In addition, if a specific link connected to a device within an IP address
range frequently alternates between Up and Down, route flapping occurs.
To address these problems, RIPng route summarization was introduced. With RIPng route summarization, a
device summarizes routes destined for different subnets of a network segment into one route destined for
one network segment and then advertises the summary route to other network segments. RIPng route
summarization reduces the number of routes in the routing table, minimizes system resource consumption,
and prevents route flapping.
Implementation
RIPng route summarization is interface-based. After RIPng route summarization is enabled on an interface,
the interface summarizes routes based on the longest matching rule and then advertises the summary route.
The smallest metric among the specific routes for the summarization is used as the metric of the summary
route.
For example, an interface has two routes: 11:11:11::24 with metric 2 and 11:11:12::34 with metric 3. After
RIPng route summarization is enabled on the interface, the interface summarizes the two routes into the
route 11::0/16 with metric 2 and then advertises it.
Background
As networks develop, network security has become an increasing concern. Internet Protocol Security (IPsec)
authentication can be used to authenticate RIPng packets. The packets that fail to be authenticated are
discarded, which prevents data transmitted based on TCP/IP from being illegally obtained, tampered with, or
attacked.
Implementation
IPsec has an open standard architecture and ensures secure packet transmission on the Internet by
encrypting packets. RIPng IPsec provides a complete set of security protection mechanisms to authenticate
RIPng packets, which prevents devices from being attacked by forged RIPng packets.
IPsec includes a set of protocols that are used at the network layer to ensure data security, such as Internet
Key Exchange (IKE), Authentication Header (AH), and Encapsulating Security Payload (ESP). AH and ESP are
described as follows:
• AH: A protocol that provides data origin authentication, data integrity check, and anti-replay protection.
AH does not encrypt packets to be protected.
• ESP: A protocol that provides IP packet encryption and authentication mechanisms besides the functions
provided by AH. The encryption and authentication mechanisms can be used together or independently.
Benefits
RIPng IPsec offers the following benefits:
• Improves carriers' reputation and competitiveness by preventing services from being tampered with or
attacked by unauthorized users.
Definition
Open Shortest Path First (OSPF) is a link-state Interior Gateway Protocol (IGP) developed by the Internet
Engineering Task Force (IETF).
OSPF version 2 (OSPFv2) is intended for IPv4. OSPF version 3 (OSPFv3) is intended for IPv6.
Purpose
Before the emergence of OSPF, the Routing Information Protocol (RIP) was widely used as an IGP on
networks. RIP is a distance-vector routing protocol. Due to its slow convergence, routing loops, and poor
scalability, RIP is gradually being replaced with OSPF.
Typical IGPs include RIP, OSPF, and Intermediate System to Intermediate System (IS-IS). Table 1 describes
differences among the three typical IGPs.
Routing algorithm:
• RIP: uses a distance-vector algorithm and exchanges routing information over the User Datagram Protocol (UDP).
• OSPF: uses the shortest path first (SPF) algorithm to generate a shortest path tree (SPT) based on the network topology, calculates shortest paths to all destinations, and exchanges routing information over IP.
• IS-IS: uses the SPF algorithm to generate an SPT based on the network topology, calculates shortest paths to all destinations, and exchanges routing information over IP. The SPF algorithm runs separately in Level-1 and Level-2 databases.
Benefits
OSPF offers the following benefits:
• Wide application scope: OSPF applies to medium-sized networks with several hundred Routers, such as
enterprise networks.
• Network masks: OSPF packets can carry masks, and therefore the packet length is not limited by
natural IP masks. OSPF can process variable length subnet masks (VLSMs).
• Fast convergence: When the network topology changes, OSPF immediately sends link state update
(LSU) packets to synchronize the changes to the link state databases (LSDBs) of all Routers in the same
autonomous system (AS).
• Loop-free routing: OSPF uses the SPF algorithm to calculate loop-free routes based on the collected link
status.
• Area partitioning: OSPF allows an AS to be partitioned into areas, which simplifies management.
Routing information transmitted between areas is summarized, which reduces network bandwidth
consumption.
• Equal-cost routes: OSPF supports multiple equal-cost routes to the same destination.
• Hierarchical routing: OSPF uses intra-area routes, inter-area routes, Type 1 external routes, and Type 2
external routes, which are listed in descending order of priority.
• Authentication: OSPF supports area-based and interface-based packet authentication, which ensures
packet exchange security.
• Multicast: OSPF uses multicast addresses to send packets on certain types of links, which minimizes the
impact on other devices.
Router ID
A router ID is a 32-bit unsigned integer, which identifies a Router in an autonomous system (AS). A router ID
must exist before the Router runs OSPF.
A router ID can be manually configured or automatically selected by the Router.
If no router ID has been manually configured, the Router automatically selects the system ID or an interface
IP address as the router ID.
In any of the following situations, router ID reselection may be triggered:
• The system ID or IP address that is selected as the router ID is deleted, and the OSPF process is
restarted.
Areas
When a large number of Routers run OSPF, link state databases (LSDBs) become very large and require a
large amount of storage space. Large LSDBs also complicate shortest path first (SPF) computation and
overload the Routers. As the network scale expands, there is an increasing probability that the network
topology changes, causing the network to change continuously. In this case, a large number of OSPF packets
are transmitted on the network, leading to a decrease in bandwidth utilization efficiency. In addition, each
time the topology changes, all Routers on the network must recalculate routes.
OSPF prevents frequent LSDB updates and improves network utilization by partitioning an AS into different
areas. Routers can be logically allocated to different groups (areas), and each group is identified by an area
ID. The border of an area is a Router, not a link, and a network segment or link can belong to only one
area. An area must be specified for each OSPF interface.
OSPF areas include common areas, stub areas, and not-so-stubby areas (NSSAs). Table 1 describes these
OSPF areas.
Common area: By default, OSPF areas are defined as common areas. Common areas include:
• Standard area: the most prevalent type of area; transmits intra-area, inter-area, and AS external routes.
• Backbone area (area 0): connects to all other OSPF areas and transmits inter-area routes.
Notes: The backbone area must have all its devices connected, and all non-backbone areas must be connected to the backbone area.
Stub area: A stub area is a non-backbone area with only one area border router (ABR) and generally resides at the border of an AS. The ABR in a stub area does not transmit received AS external routes, which significantly decreases the number of entries in the routing table on the Router and the amount of routing information to be transmitted. To ensure the reachability of AS external routes, the ABR in the stub area generates a default route and advertises it to non-ABRs in the stub area.
A totally stub area allows only intra-area routes and ABR-advertised Type 3 default routes to be advertised within the area and does not allow AS external routes or inter-area routes to be advertised.
Notes: The backbone area cannot be configured as a stub area. An autonomous system boundary router (ASBR) cannot exist in a stub area; therefore, AS external routes cannot be advertised within the stub area. A virtual link cannot pass through a stub area.
NSSA: An NSSA is similar to a stub area. An NSSA does not support Type 5 LSAs but can import AS external routes. Type 7 LSAs carrying the information about AS external routes are generated by ASBRs in an NSSA and are advertised only within the NSSA. When the Type 7 LSAs reach an ABR in the NSSA, the ABR translates the Type 7 LSAs into Type 5 LSAs and floods them to the entire OSPF domain.
A totally NSSA allows only intra-area routes to be advertised within the area. AS external routes and inter-area routes cannot be advertised in a totally NSSA.
Notes: An ABR in an NSSA advertises Type 7 LSA default routes within the NSSA. All inter-area routes must be advertised by ABRs. A virtual link cannot pass through an NSSA.
Router Types
Routers are classified by location in an AS. Figure 1 and Table 2 show the classification.
• Internal router: All interfaces of an internal router belong to the same OSPF area.
• ABR: An ABR can belong to two or more areas, one of which must be the backbone area. An ABR connects the backbone area and non-backbone areas, and it can connect to the backbone area either physically or logically.
• Backbone router: At least one interface on this type of router belongs to the backbone area. Internal routers in the backbone area and all ABRs are backbone routers.
LSA
OSPF encapsulates routing information into LSAs for transmission. Table 3 describes LSAs and their
functions.
• Router-LSA (Type 1): Describes the link status and cost of a Router. Router-LSAs are generated by each Router and advertised within the area to which the Router belongs.
• Network-LSA (Type 2): Describes the link status on the local network segment. Network-LSAs are generated by a designated router (DR) and advertised within the area to which the DR belongs.
• ASBR-summary-LSA (Type 4): Describes routes of an area to the ASBRs of other areas. ASBR-summary-LSAs are generated by an ABR and advertised to other areas, excluding stub areas, totally stub areas, NSSAs, totally NSSAs, and the areas to which the ASBRs belong.
• NSSA LSA (Type 7): Describes AS external routes. NSSA-LSAs are generated by an ASBR and advertised only within NSSAs.
• Opaque-LSA (Type 9/Type 10/Type 11): Opaque-LSAs provide a general mechanism for OSPF extension.
Type 9 LSAs are advertised only on the network segment where the interface advertising the LSAs resides. The Grace LSAs used in graceful restart (GR) are one type of Type 9 LSA.
Type 10 LSAs are advertised within an OSPF area. The LSAs that are used to support traffic engineering (TE) are one type of Type 10 LSA.
Type 11 LSAs are advertised in an AS. The LSAs used to support routing loop detection for routes imported to OSPF are one type of Type 11 LSA.
Packet Types
OSPF uses IP packets to encapsulate protocol packets. The protocol number is 89. OSPF packets are classified
as Hello, database description (DD), link state request (LSR), link state update (LSU), or link state
acknowledgment (LSAck) packets, as described in Table 5.
• Hello packet: Hello packets are sent periodically to discover and maintain OSPF neighbor relationships.
• Database description (DD) packet: A DD packet contains the summaries of LSAs in the local LSDB. DD packets are used for LSDB synchronization between two Routers.
• Link state request (LSR) packet: LSR packets are sent to OSPF neighbors to request required LSAs. A Router sends LSR packets to its OSPF neighbor only after DD packets have been successfully exchanged.
• Link state update (LSU) packet: LSU packets are used to transmit required LSAs to OSPF neighbors.
• Link state acknowledgment (LSAck) packet: LSAck packets are used to acknowledge received LSAs.
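To see the encapsulation concretely, the 24-byte common header that OSPFv2 places at the start of every packet (defined in RFC 2328) can be decoded as follows; this is a minimal sketch, and the helper name is illustrative rather than an NE40E interface:

```python
import struct

# OSPFv2 packet types per RFC 2328; OSPF rides directly on IP, protocol 89.
OSPF_PROTO = 89
PACKET_TYPES = {1: "Hello", 2: "DD", 3: "LSR", 4: "LSU", 5: "LSAck"}

def parse_ospf_header(data: bytes) -> dict:
    """Decode the fixed fields of the 24-byte common OSPFv2 header."""
    (version, ptype, length, router_id, area_id,
     checksum, autype) = struct.unpack("!BBHIIHH", data[:16])
    return {
        "version": version,
        "type": PACKET_TYPES.get(ptype, "unknown"),
        "length": length,
        "router_id": ".".join(str(b) for b in router_id.to_bytes(4, "big")),
        "area_id": ".".join(str(b) for b in area_id.to_bytes(4, "big")),
    }

# A hand-built Hello header: version 2, type 1, length 44,
# router ID 1.1.1.1, area 0, null authentication.
hdr = struct.pack("!BBHIIHH8s", 2, 1, 44, 0x01010101, 0, 0, 0, b"\x00" * 8)
print(parse_ospf_header(hdr))
```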
Route Types
Route types are classified as intra-area, inter-area, Type 1 external, or Type 2 external routes. Intra-area and
inter-area routes describe the network structure of an AS. Type 1 or Type 2 AS external routes describe how
to select routes to destinations outside an AS.
Table 6 describes OSPF routes in descending order of priority.
Type 2 external route Because a Type 2 external route offers low reliability, its cost is
considered to be much greater than the cost of any internal route to
an ASBR.
Cost of a Type 2 external route = Cost of the route from an ASBR to
the destination
If multiple ASBRs have routes to the same destination, the route with the lowest cost from the
corresponding ASBR to the destination is selected. If those costs are equal, the route with the
smallest cost from the local router to the corresponding ASBR is selected.
OSPF classifies networks into the following types based on the link layer protocol:
• Broadcast: for example, Ethernet and FDDI.
• Non-broadcast multiple access (NBMA): for example, X.25.
• Point-to-point (P2P): for example, LAPB.
DR and BDR
On broadcast or NBMA networks, any two Routers need to exchange routing information. As shown in
Figure 2, n Routers are deployed on the network. n x (n - 1)/2 adjacencies must be established. Any route
change on a Router is transmitted to other Routers, which wastes bandwidth resources. OSPF resolves this
problem by defining a DR and a BDR. After a DR is elected, all Routers send routing information only to the
DR. Then the DR broadcasts LSAs. Routers other than the DR and BDR are called DR others. The DR others
establish only adjacencies with the DR and BDR and not with each other. This process reduces the number of
adjacencies established between Routers on broadcast or NBMA networks.
If the original DR fails, Routers must reelect a DR and the Routers except the new DR must synchronize
routing information to the new DR. This process is lengthy, which may cause incorrect route calculations. A
BDR is used to shorten the process. The BDR is a backup for a DR. A BDR is elected together with a DR. The
BDR establishes adjacencies with all Routers on the network segment and exchanges routing information
with them. If the DR fails, the BDR immediately becomes a new DR. Because no re-election is required and
adjacencies have been established, this process is very short. In this case, a new BDR needs to be elected.
Although this process takes a long time, it does not affect route calculation.
The DR and BDR are not designated manually. Instead, they are elected by all Routers on the network
segment. The DR priority of an interface on the Router determines whether the interface is qualified for DR
or BDR election. On the local network segment, the Routers whose DR priorities are greater than 0 are all
candidates. Hello packets are used for the election. Each Router adds information about the DR elected by
itself to a Hello packet and sends the packet to other Routers on the network segment. If two Routers on
the same network segment declare that they are DRs, the one with a higher DR priority wins. If they have
the same priority, the one with a larger router ID wins. If the priority of a Router is 0, it cannot be elected as
a DR or BDR.
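The election rule above (priority first, router ID as tie-breaker, priority 0 ineligible) can be sketched as a simple comparator. This is a simplification: a real OSPF election also honors the currently declared DR and BDR, which this sketch ignores:

```python
def elect_dr(candidates):
    """candidates: (dr_priority, router_id) pairs for one network segment.
    Routers with priority 0 are ineligible; the highest priority wins,
    and ties are broken by the larger router ID."""
    eligible = [c for c in candidates if c[0] > 0]
    if not eligible:
        return None
    # Compare router IDs numerically, octet by octet, not as strings.
    return max(eligible, key=lambda c: (c[0], tuple(map(int, c[1].split(".")))))

dr = elect_dr([(1, "1.1.1.1"), (1, "2.2.2.2"), (0, "9.9.9.9")])
print(dr)  # -> (1, '2.2.2.2'): equal priorities, larger router ID wins
```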
OSPF Multi-Process
OSPF multi-process allows multiple OSPF processes to independently run on the same Router. Route
exchange between different OSPF processes is similar to that between different routing protocols. A Router's
interface can belong only to one OSPF process.
A typical application of OSPF multi-process is that OSPF runs between PEs and CEs in VPN scenarios and
OSPF is also used as an IGP on the VPN backbone network. The OSPF processes on the PEs are independent
of each other.
• An ABR in an area advertises Type 3 default summary LSAs within the area to help the routers in the
area forward inter-area packets.
• An ASBR in an AS advertises Type 5 external default ASE LSAs or Type 7 external default NSSA LSAs to
help the routers in the AS forward AS external packets.
OSPF routes are hierarchically managed. The priority of the default route carried in Type 3 LSAs is higher
than the priority of the default route carried in Type 5 or Type 7 LSAs.
The rules for advertising OSPF default routes are as follows:
• An OSPF device can advertise default route LSAs only when it has an external interface.
• If an OSPF device has advertised default route LSAs, it no longer learns the same type of default route
advertised by other Routers. That is, the device no longer calculates the same type of default route LSA
advertised by other Routers. However, the corresponding LSAs exist in the database.
• If the advertisement of external default routes depends on other routes, the dependent routes cannot
be the routes (learned by the local OSPF process) in the local OSPF routing domain. This is because
external default routes are used to guide packet forwarding outside the domain. However, the next
hops of routes in the OSPF routing domain are within the domain, unable to guide packet forwarding
outside the domain.
• Before a Router advertises a default route, it checks whether a neighbor in the Full state is present in
area 0. The Router advertises a default route only when such a neighbor is present; otherwise, the
backbone area cannot forward packets and advertising a default route is meaningless. For the concept
of the Full state, see OSPF Neighbor States.
Table 8 describes the principles for advertising default routes in different areas.
Common area: By default, OSPF devices in a common area do not generate default routes, even if they have
default routes.
When a default route is generated by another routing process, the Router must advertise the
default route to the entire OSPF AS. To achieve this, a command must be run on the ASBR to
generate a default route. After the configuration is complete, the Router generates a default
ASE LSA (Type 5 LSA) and advertises it to the entire OSPF AS.
If no default route exists on the ASBR, the Router does not advertise a default route.
Totally stub area: Neither Type 3 LSAs (except default Type 3 LSAs) nor Type 5 LSAs can be advertised within a
totally stub area.
A Router in the totally stub area must learn AS external and inter-area routes from an ABR.
After you configure a totally stub area, an ABR automatically generates a default Summary
LSA (Type 3 LSA) and advertises it within the entire totally stub area. Then the device can
learn AS external and inter-area routes from the ABR.
NSSA: A small number of AS external routes learned from the ASBR in an NSSA can be imported into the
NSSA, but ASE LSAs (Type 5 LSAs) carrying external routes of other areas cannot be advertised within
the NSSA. When at least one neighbor in the Full state and one Up interface exist in the backbone
area, the ABR automatically generates a Type 7 LSA carrying a default route and advertises it within
the entire NSSA. In this case, a small number of routes are learned through the ASBR in the NSSA,
and other routes are learned through the ABR in the NSSA.
You can manually configure the ASBR to generate a default NSSA LSA (Type 7 LSA) and
advertise it in the entire NSSA. In this manner, external routes can also be learned through
the ASBR in the NSSA.
An ABR does not translate Type 7 LSA default routes into Type 5 LSA default routes for
transmission in the entire OSPF domain.
Totally NSSA: A totally NSSA does not allow ASE LSAs (Type 5 LSAs) of external routes or inter-area routes
(Type 3 LSAs, except the default Type 3 LSAs) to be transmitted within the area.
A Router in this area must learn external routes to other areas from an ABR. The ABR
automatically generates Type 3 and Type 7 LSAs carrying a default route and advertises them
to the entire totally NSSA. Then, AS external and inter-area routes can be advertised within
the area through the ABR.
1. Adjacency establishment
The adjacency establishment process is as follows:
a. The local and remote devices use OSPF interfaces to exchange Hello packets to establish a
Neighbor relationship.
b. The local and remote devices negotiate a master/slave relationship and exchange Database
Description (DD) packets.
c. The local and remote devices exchange link state advertisements (LSAs) to synchronize their link
state databases (LSDBs).
2. Route calculation: OSPF uses the shortest path first (SPF) algorithm to calculate routes, implementing
fast route convergence.
• Neighbor relationship: After the local Router starts, it uses an OSPF interface to send a Hello packet to
the remote Router. After the remote Router receives the packet, it checks whether the parameters
carried in the packet are consistent with its own parameters. If the parameters carried in the packet are
consistent with its own parameters, the remote Router establishes a neighbor relationship with the local
Router.
• Adjacency: After the local and remote Routers establish a neighbor relationship, they exchange DD
OSPF has eight neighbor states: Down, Attempt, Init, 2-way, Exstart, Exchange, Loading, and Full. Down, 2-
way, and Full are stable states. Attempt, Init, Exstart, Exchange, and Loading are unstable states, which last
only several minutes. Figure 1 shows the eight neighbor states.
The OSPF neighbor states and their meanings are as follows:
Down This is the initial state of a neighbor conversation. This state indicates that a Router has not
received any Hello packets from its neighbors within a dead interval.
Attempt In the Attempt state, a Router periodically sends Hello packets to manually configured
neighbors.
Init This state indicates that a Router has received Hello packets from its neighbors but the
neighbors did not receive Hello packets from the Router.
2-way This state indicates that a device has received Hello packets from its neighbors and
neighbor relationships have been established between the devices.
If no adjacency needs to be established, the neighbors remain in the 2-way state. If
adjacencies need to be established, the neighbors enter the Exstart state.
Exstart In the Exstart state, devices establish a master/slave relationship to ensure that DD packets
are sequentially exchanged.
Exchange In the Exchange state, Routers exchange DD packets. A Router uses a DD packet to describe
its own LSDB and sends the packet to its neighbors.
Loading In the Loading state, a device sends Link State Request (LSR) packets to its neighbors to
request their LSAs for LSDB synchronization.
Full In this state, a device establishes adjacencies with its OSPF neighbors and all LSDBs have
been synchronized.
The neighbor state of the local device may be different from that of a remote device. For example, the neighbor state of
the local Router is Full, but the neighbor state of the remote Router is Loading.
Adjacency Establishment
Adjacencies can be established in either of the following situations:
• Two Routers have established a neighbor relationship and communicate for the first time.
• The designated router (DR) or backup designated router (BDR) on a network segment changes.
a. Router A uses the multicast address 224.0.0.5 to send a Hello packet through the OSPF interface
connected to a broadcast network. In this case, Router A does not know which Router is the DR
or which Router is a neighbor. Therefore, the DR field is 0.0.0.0, and the Neighbors Seen field is
0.
b. After Router B receives the packet, it returns a Hello packet to Router A. The returned packet
carries the DR field of 2.2.2.2 (ID of Router B) and the Neighbors Seen field of 1.1.1.1 (Router
A's router ID). Router A has been discovered but its router ID is less than that of Router B, and
therefore Router B regards itself as a DR. Then Router B's state changes to Init.
c. After Router A receives the packet, Router A's state changes to 2-way.
The following procedures are not performed for DR others on a broadcast network.
a. Router A sends a DD packet to Router B. The packet carries the following fields:
• I field: The value 1 indicates that the packet is the first DD packet, which is used to
negotiate a master/slave relationship and does not carry LSA summaries.
• M field: The value 1 indicates that the packet is not the last DD packet.
To improve transmission efficiency, Router A and Router B determine which LSAs in each other's
LSDB need to be updated. If one party determines that an LSA of the other party is already in its
own LSDB, it does not send an LSR packet for updating the LSA to the other party. To achieve
the preceding purpose, Router A and Router B first send DD packets, which carry summaries of
LSAs in their own LSDBs. Each summary identifies an LSA. To ensure packet transmission
reliability, a master/slave relationship must be determined during DD packet exchange. One
party serving as a master uses the Seq field to define a sequence number. The master increases
the sequence number by one each time it sends a DD packet. When the other party serving as a
slave sends a DD packet, it adds the sequence number carried in the last DD packet received
from the master to the Seq field of the packet.
b. After Router B receives the DD packet, Router B's state changes to Exstart and Router B returns
a DD packet to Router A. The returned packet does not carry LSA summaries. Because Router
B's router ID is greater than Router A's router ID, Router B declares itself a master and sets the
Seq field to y.
c. After Router A receives the DD packet, it agrees that Router B is a master and Router A's state
changes to Exchange. Then Router A sends a DD packet to Router B to transmit LSA summaries.
The packet carries the Seq field of y and the MS field of 0. The value 0 indicates that Router A
declares itself a slave.
d. After Router B receives the packet, Router B's state changes to Exchange and Router B sends a
new DD packet containing its own LSA summaries to Router A. The value of the Seq field
carried in the new DD packet is changed to y + 1.
Router A uses the same sequence number as Router B to confirm that it has received DD packets from
Router B. Router B uses the sequence number plus one to confirm that it has received DD packets
from Router A. When Router B sends the last DD packet, it sets the M field of the packet to 0.
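The sequence-number rule in the walkthrough above can be modeled as a toy exchange; the class and function names are illustrative, not the NE40E implementation:

```python
class DDMaster:
    """Toy model of the Seq-number rule during DD exchange: the master
    increments Seq with each DD packet it sends, and the slave echoes
    the Seq it last received from the master."""
    def __init__(self, initial_seq):
        self.seq = initial_seq

    def send_dd(self):
        # The first packet carries the initial Seq ("y" in the text);
        # each subsequent packet carries the next value (y + 1, ...).
        current = self.seq
        self.seq += 1
        return current

def slave_reply(received_seq):
    # The slave acknowledges by echoing the master's Seq unchanged.
    return received_seq

m = DDMaster(initial_seq=100)   # "y" in the walkthrough above
first = m.send_dd()             # master sends Seq y
assert slave_reply(first) == 100
assert m.send_dd() == 101       # next DD carries Seq y + 1, as in step d
```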
3. LSDB synchronization
a. After Router A receives the last DD packet, it finds that many LSAs in Router B's LSDB do not
exist in its own LSDB, so Router A's state changes to Loading. After Router B receives the last
DD packet from Router A, Router B's state directly changes to Full, because Router B's LSDB
already contains all LSAs of Router A.
b. Router A sends an LSR packet for updating LSAs to Router B. Router B returns an LSU packet to
Router A. After Router A receives the packet, it sends an LSAck packet for acknowledgment.
The preceding procedures continue until the LSAs in Router A's LSDB are the same as those in Router
B's LSDB. Router A sets the state of the neighbor relationship with Router B to Full. After Router A and
Router B exchange DD packets and update all LSAs, they establish an adjacency.
a. After Router B sends a Hello packet to Router A, whose interface is in the Down state, Router B's
state changes to Attempt. No neighbor has been discovered yet, so Router B regards itself as the
DR. The packet carries the DR field of 2.2.2.2 (ID of Router B) and the Neighbors Seen field of 0.
b. After Router A receives the packet, Router A's state changes to Init and Router A returns a Hello
packet. The returned packet carries the DR and Neighbors Seen fields of 2.2.2.2. Router B has
been discovered but its router ID is greater than that of Router A, and therefore Router A agrees
that Router B is a DR.
The following procedures are not performed for DR others on an NBMA network.
3. LSDB synchronization
The procedure for synchronizing LSDBs on an NBMA network is the same as that on a broadcast
network.
Route Calculation
OSPF uses the shortest path first (SPF) algorithm to calculate routes, implementing fast route convergence.
OSPF uses an LSA to describe the network topology. A Router LSA describes the attributes of a link between
Routers. A Router transforms its LSDB into a weighted, directed graph, which reflects the topology of the
entire AS. All Routers in the same area have the same graph. Figure 4 shows a weighted, directed graph.
Based on the graph, each Router uses the SPF algorithm to calculate an SPT with itself as the root. The SPT
shows routes to nodes in the AS. Figure 5 shows an SPT.
Figure 5 SPT
When a Router's LSDB changes, the Router recalculates a shortest path. Frequent SPF calculations consume
a large amount of resources and affect Router efficiency. Changing the interval between SPF calculations can
prevent resource consumption caused by frequent LSDB changes. The default interval between SPF
calculations is 5 seconds.
The route calculation process is as follows:
If multiple equal-cost routes are produced during route calculation, the SPF algorithm retains all these routes in
the LSDB.
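The SPF computation described above is essentially Dijkstra's algorithm run over the graph derived from the LSDB. A minimal sketch follows; note that this version keeps a single best path per node, whereas OSPF retains equal-cost routes:

```python
import heapq

def spf(links, root):
    """Dijkstra-style SPF over a weighted, directed graph.
    links: {node: [(neighbor, cost), ...]} built from the LSDB;
    returns the shortest-path cost from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in links.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Illustrative topology; router names and costs are made up.
links = {"A": [("B", 10), ("C", 2)], "C": [("B", 3)], "B": []}
print(spf(links, "A"))  # -> {'A': 0, 'C': 2, 'B': 5}
```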
If the Router performing an SPF calculation is an ABR, the Router needs to check only Type 3 LSAs in the
backbone area.
If there are multiple paths to an ASBR, check whether the rules for selecting a path to the ASBR among intra-
area and inter-area paths on different types of devices are the same. If the rules are different, routing loops may
occur.
The RFC 1583 compatibility mode and RFC 1583 non-compatibility mode may affect path selection rules. Even in
the same mode, the path selection rules on devices from different vendors may be slightly different. In this case,
the rules used in RFC 1583 compatibility mode or RFC 1583 non-compatibility mode for selecting a path to an
ASBR can be adjusted, preventing loops to some extent.
• Route summarization
Route summarization enables a Router to summarize routes with the same prefix into a single route
and to advertise only the summarized route to other areas. Route summarization reduces the size of a
routing table and improves Router performance.
• Route filtering
OSPF can use routing policies to filter routes. By default, OSPF does not filter routes.
Route Summarization
When a large OSPF network is deployed, an OSPF routing table includes a large number of routing entries.
To accelerate route lookup and simplify management, configure route summarization to reduce the size of
the OSPF routing table. In addition, if a link within the summarized address range frequently alternates
between Up and Down, the flapping is not advertised to devices outside the range. This prevents route
flapping and improves network stability.
Route summarization can be carried out on an ABR or ASBR.
• ABR summarization
When an ABR transmits routing information to other areas, it generates Type 3 LSAs for each network
segment. If consecutive network segments exist in this area, you can summarize these network
segments into a single network segment. The ABR generates one LSA for the summarized network
segment and advertises only that LSA.
• ASBR summarization
If route summarization has been configured and the local Router is an ASBR, the local Router
summarizes imported Type 5 LSAs within the summarized address range. If an NSSA has been
configured, the local Router also summarizes imported Type 7 LSAs within the summarized address
range.
If the local Router is both an ASBR and an ABR, it summarizes Type 5 LSAs translated from Type 7 LSAs.
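The ABR behavior above, collapsing consecutive network segments into one summary before generating a single Type 3 LSA, can be illustrated with Python's ipaddress module; the prefixes here are made up:

```python
import ipaddress

def summarize_segments(prefixes):
    """Collapse consecutive network segments into the fewest covering
    prefixes, mirroring what an ABR does before advertising one LSA
    for the summarized network segment."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Four consecutive /26 segments collapse into a single /24.
segs = ["10.1.1.0/26", "10.1.1.64/26", "10.1.1.128/26", "10.1.1.192/26"]
print(summarize_segments(segs))  # -> ['10.1.1.0/24']
```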
Route Filtering
OSPF routing policies include access control lists (ACLs), IP prefix lists, and route-policies. For details about
these policies, see the section "Routing Policy" in the NE40E Feature Description - IP Routing.
OSPF route filtering applies in the following aspects:
• Route import
OSPF can import the routes learned by other routing protocols. A Router uses a configured routing
policy to filter routes and imports only the routes matching the routing policy. Only an ASBR can import
routes, and therefore a routing policy for importing routes must be configured on the ASBR.
• Route learning
A Router uses a routing policy to filter received intra-area, inter-area, and AS external routes, and adds
only the routes matching the routing policy to its routing table. Filtering applies only to the routes
calculated based on LSAs; the learned LSAs themselves are complete and can still be advertised.
The maximum number of external routes configured for all devices in the OSPF AS must be the same.
When the number of external routes in the LSDB reaches the maximum number, the device enters the
overload state and starts the overflow timer at the same time. The device automatically exits from the
overflow state after the overflow timer expires. Table 1 describes the operations performed by the device
after it enters or exits from the overload state.
Table 1 Operations performed by the device in the overflow state
• Staying in the overflow state: The device deletes self-generated non-default external routes and stops
advertising non-default external routes. It discards newly received non-default external routes and does
not reply with Link State Acknowledgment (LSAck) packets.
• When the overflow timer expires: The device checks whether the number of external routes is still
greater than or equal to the configured maximum. If so, it restarts the timer and stays in the overflow
state. If the number of external routes is less than the configured maximum, it exits from the overflow
state.
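The overflow behavior above can be sketched as a small state machine. This is an illustrative model only; the class name and the route-count bookkeeping are assumptions, not NE40E internals.

```python
# Illustrative sketch of the OSPF database-overflow behavior described above.
class OverflowStateMachine:
    def __init__(self, max_external_routes):
        self.max_external = max_external_routes
        self.in_overflow = False
        self.external_routes = 0

    def on_external_route_count(self, count):
        """Enter the overflow state when the LSDB's external-route count
        reaches the configured maximum; the overflow timer starts here."""
        self.external_routes = count
        if count >= self.max_external:
            self.in_overflow = True
        return self.in_overflow

    def on_overflow_timer_expiry(self):
        """On timer expiry: restart the timer if the count is still >= the
        maximum; otherwise exit the overflow state."""
        if self.external_routes >= self.max_external:
            return "restart-timer"      # stay in the overflow state
        self.in_overflow = False
        return "exit-overflow"

    def accept_external_route(self, is_default):
        """In the overflow state, non-default external routes are discarded
        (no LSAck is returned); default routes are still accepted."""
        return is_default or not self.in_overflow
```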
Background
All non-backbone areas must be connected to the backbone area during OSPF deployment to ensure that all
areas are reachable.
In Figure 1, area 2 is not connected to area 0 (backbone area), and Device B is not an ABR. Therefore, Device
B does not generate routing information about network 1 in area 0, and Device C does not have a route to
network 1.
Some non-backbone areas may not be connected to the backbone area. You can configure an OSPF virtual
link to resolve this issue.
Related Concepts
A virtual link refers to a logical channel established between two ABRs over a non-backbone area.
A virtual link is similar to a point-to-point (P2P) connection established between two ABRs. You can
configure interface parameters, such as the interval at which Hello packets are sent, at both ends of the
virtual link as you do on physical interfaces.
Principles
In Figure 2, two ABRs use a virtual link to directly transmit OSPF packets. The device between the two ABRs
only forwards packets. Because the destination of OSPF packets is not the device, the device transparently
transmits the OSPF packets as common IP packets.
10.6.2.5 OSPF TE
OSPF Traffic Engineering (TE) is developed based on OSPF to support Multiprotocol Label Switching (MPLS)
TE and establish and maintain TE LSPs. In the MPLS TE architecture described in "MPLS Feature Description",
OSPF functions as the information advertising component, responsible for collecting and advertising MPLS
TE information.
In addition to the network topology, TE needs to know network constraints, such as the bandwidth, TE
metric, administrative group, and affinity attribute. However, current OSPF functions cannot meet these
requirements. Therefore, OSPF introduces a new type of LSA to advertise network constraints. Based on the
network constraints, the Constraint Shortest Path First (CSPF) algorithm can calculate the path subject to
specified constraints.
OSPF uses the collected TE information to form the TE database (TEDB) so that CSPF can calculate routes.
OSPF is not concerned with the content of the information or how MPLS uses it.
TE-LSA
OSPF uses a new type of LSA (Type 10 opaque LSA) to collect and advertise TE information. Type 10 opaque
LSAs contain the link status information required by TE, including the maximum link bandwidth, maximum
reservable bandwidth, current reserved bandwidth, and link color. Based on the OSPF flooding mechanism,
Type 10 opaque LSAs synchronize link status information among devices in an area to form a uniform TEDB
for route calculation.
• An IGP shortcut-enabled device uses a tunnel interface as an outbound interface but does not advertise
the tunnel interface to neighbors. Therefore, other devices cannot use this tunnel.
• A forwarding adjacency-enabled device uses a tunnel interface as an outbound interface and advertises
the tunnel interface to neighbors. Therefore, other devices can use this tunnel.
• IGP shortcut is unidirectional and needs to be configured only on the device that uses IGP shortcut.
OSPF SRLG
OSPF supports the applications of the Shared Risk Link Group (SRLG) in MPLS by obtaining information
about the TE SRLG flooded among devices in an area. For details, refer to the chapter "MPLS" in this
manual.
Definition
As an extension of OSPF, OSPF VPN enables Provider Edges (PEs) and Customer Edges (CEs) in VPNs to run
OSPF for interworking and use OSPF to learn and advertise routes.
Purpose
OSPF is a widely used IGP and, in most cases, already runs in VPN sites. If OSPF runs between PEs and
CEs, and PEs use OSPF to advertise VPN routes to CEs, no other routing protocol needs to be configured
on the CEs for interworking with the PEs, which simplifies CE management and configuration.
• OSPF is used in a site to learn routes. Running OSPF between PEs and CEs can reduce the number of
the protocol types that CEs must support.
• Similarly, running OSPF both in a site and between PEs and CEs simplifies the work of network
administrators and reduces the number of protocols that network administrators must be familiar with.
• When a network using OSPF but not VPN on the backbone network begins to use BGP/MPLS VPN,
running OSPF between PEs and CEs facilitates the transition.
In Figure 1, CE1, CE3, and CE4 belong to VPN 1, and the numbers following OSPF refer to the process IDs of
the multiple OSPF instances running on PEs.
The routes that PE1 receives from CE1 are advertised to CE3 and CE4 as follows:
1. PE1 imports OSPF routes of CE1 into BGP and converts them to BGP VPNv4 routes.
3. PE2 imports the BGP VPNv4 routes into OSPF and then advertises these routes to CE3 and CE4.
The process of advertising routes of CE4 or CE3 to CE1 is the same as the preceding process.
A non-backbone area (Area 1) is configured between PE1 and CE1, and a backbone area (Area 0) is
configured in Site 1. The backbone area in Site 1 is separated from the VPN backbone area. To ensure that
the backbone areas are contiguous, a virtual link is configured between PE1 and CE1.
OSPF Domain ID
If inter-area routes are advertised between local and remote OSPF areas, these areas are considered to be in
the same OSPF domain.
• Each OSPF domain has one or more domain IDs. If more than one domain ID is available, one of the
domain IDs is a primary ID, and the others are secondary IDs.
• If an OSPF instance does not have a specific domain ID, its domain ID is considered null.
Before advertising the remote routes sent by BGP to CEs, PEs need to determine the type of OSPF routes
(Type 3, Type 5, or Type 7) to be advertised to CEs based on domain IDs.
• If local domain IDs are the same as or compatible with remote domain IDs in BGP routes, PEs advertise
Type 3 routes.
Table 1 Domain ID comparison and the type of routes generated
• Both the local and remote domain IDs are null: the domain IDs are considered equal, and inter-area
routes are generated.
• The remote domain ID is the same as the local primary domain ID or one of the local secondary
domain IDs: the domain IDs are considered equal, and inter-area routes are generated.
• The remote domain ID differs from the local primary domain ID and all local secondary domain IDs:
the domain IDs are not equal. If the local area is a non-NSSA, external routes are generated; if the
local area is an NSSA, NSSA routes are generated.
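The comparison rules above can be expressed as a short decision function. This is a minimal sketch under the assumption that null domain IDs are modeled as None; the function and parameter names are illustrative, not part of any device API.

```python
# Illustrative sketch of the domain-ID comparison rules in Table 1.
def ospf_route_type(local_primary, local_secondaries, remote_id, local_is_nssa):
    """Return the kind of OSPF route a PE generates from a received BGP VPN
    route, based on the domain-ID comparison described above."""
    local_ids = {local_primary, *local_secondaries}
    # Both IDs null, or remote matches the primary or a secondary: equal.
    if (local_primary is None and remote_id is None) or remote_id in local_ids:
        return "inter-area"      # advertised as a Type 3 LSA
    # Not equal: NSSA routes in an NSSA, external routes elsewhere.
    return "nssa" if local_is_nssa else "external"   # Type 7 / Type 5 LSA
```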
In Figure 3, on PE1, OSPF imports a BGP route destined for 10.1.1.1/32 and then generates and advertises a
Type 5 or Type 7 LSA to CE1. Then, CE1 learns an OSPF route with 10.1.1.1/32 as the destination address and
PE1 as the next hop and advertises the route to PE2. Therefore, PE2 learns an OSPF route with 10.1.1.1/32 as
the destination address and CE1 as the next hop.
Similarly, CE1 also learns an OSPF route with 10.1.1.1/32 as the destination address and PE2 as the next hop.
PE1 learns an OSPF route with 10.1.1.1/32 as the destination address and CE1 as the next hop.
As a result, CE1 has two equal-cost routes with PE1 and PE2 as next hops respectively, and the next hop of
the routes from PE1 and PE2 to 10.1.1.1/32 is CE1, which leads to a routing loop.
In addition, the priority of an OSPF route is higher than that of a BGP route. Therefore, on PE1 and PE2, BGP
routes to 10.1.1.1/32 are replaced with the OSPF route, and the OSPF route with 10.1.1.1/32 as the
destination address and CE1 as the next hop is active in the routing tables of PE1 and PE2.
The BGP route is inactive, and therefore, the LSA generated when this route is imported by OSPF is deleted,
which causes the OSPF route to be withdrawn. As a result, no OSPF route exists in the routing table, and the
BGP route becomes active again. This cycle causes route flapping.
OSPF VPN provides a few solutions to routing loops, as described in Table 2.
• VPN route tag: The VPN route tag is carried in the Type 5 or Type 7 LSAs generated by PEs based on
received BGP VPN routes. It is not carried in BGP extended community attributes and is valid only on
PEs. When a PE detects that the VPN route tag in an incoming LSA is the same as its local route tag,
the PE ignores the LSA, which prevents routing loops.
• Default route: A default route is a route whose destination IP address and mask are both 0. Default
routes are used to forward traffic from CEs, or the sites where CEs reside, to the VPN backbone
network.
During BGP or OSPF route exchanges, routing loop prevention prevents OSPF routing loops in VPN sites.
Exercise caution when disabling routing loop prevention, because doing so may cause routing loops.
In the inter-AS VPN Option A scenario, if OSPF runs between ASBRs to transmit VPN routes, the remote
ASBR may fail to learn the OSPF routes sent by the local ASBR due to the routing loop prevention
mechanism.
In Figure 4, inter-AS VPN Option A is deployed with OSPF running between PE1 and CE1. CE1 sends VPN
routes to CE3.
1. PE1 learns routes to CE1 using the OSPF process in a VPN instance, imports these routes into MP-BGP,
and sends the MP-BGP routes to ASBR1.
2. After receiving the MP-BGP routes, ASBR1 imports the routes into the OSPF process in a VPN instance
and generates Type 3, Type 5, or Type 7 LSAs carrying DN bit 1.
3. ASBR2 uses OSPF to learn these LSAs and checks the DN bit of each LSA. After learning that the DN
bit in each LSA is 1, ASBR2 does not add the routes carried in these LSAs to its routing table.
The routing loop prevention mechanism prevents ASBR2 from learning the OSPF routes sent from ASBR1. As
a result, CE1 cannot communicate with CE3.
To address the preceding problem, use either of the following methods:
• Disable the device from setting the DN bit to 1 in LSAs when importing BGP routes into OSPF. For
example, ASBR1 does not set the DN bit to 1 when importing MP-BGP routes into OSPF. After ASBR2
receives these routes and finds that the DN bit in the LSAs carrying them is 0, ASBR2 adds the routes
to its routing table.
• Disable the device from checking the DN bit after receiving LSAs. For example, ASBR1 sets the DN bit to
1 in LSAs when importing MP-BGP routes into OSPF. ASBR2, however, does not check the DN bit after
receiving these LSAs.
The preceding methods can be used based on specific types of LSAs. You can configure a sender to
determine whether to set the DN bit to 1 or configure a receiver to determine whether to check the DN bit
in the Type 3 LSAs based on the router ID of the device that generates the Type 3 LSAs.
In the inter-AS VPN Option A scenario shown in Figure 5, the four ASBRs are fully meshed and run OSPF.
ASBR2 may receive the Type 3, Type 5, or Type 7 LSAs generated on ASBR4. If ASBR2 is not configured to
check the DN bit in the LSAs, ASBR2 will accept the Type 3 LSAs, which may cause routing loops, as
described in Routing Loop Prevention. ASBR2 will deny the Type 5 or Type 7 LSAs, because the VPN route
tags carried in the LSAs are the same as the default VPN route tag of the OSPF process on ASBR2.
To address the routing loop problem caused by Type 3 LSAs, ASBR2 can be disabled from checking the DN
bit in the Type 3 LSAs generated by devices with router ID 1.1.1.1 and router ID 3.3.3.3. After the
configuration is complete, if ASBR2 receives Type 3 LSAs sent by ASBR4 with router ID 4.4.4.4, ASBR2 checks
the DN bit and denies these Type 3 LSAs because the DN bit is set to 1.
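The receiver-side behavior above (reject DN-bit-1 LSAs, except Type 3 LSAs from explicitly configured originators) can be sketched as follows. This is an illustrative model; the function name and parameters are assumptions, not a device API.

```python
# Sketch of the DN-bit check with per-router-ID exceptions for Type 3 LSAs,
# as in the fully meshed Option A scenario described above.
def accept_lsa(lsa_type, dn_bit, originator_router_id,
               dn_check_disabled_router_ids):
    """Return True if the received LSA may be used for route calculation."""
    if dn_bit == 0:
        return True
    if lsa_type == 3 and originator_router_id in dn_check_disabled_router_ids:
        return True   # configured not to check the DN bit for this originator
    return False      # DN bit is 1: deny to prevent routing loops
```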
Figure 5 Networking for fully meshed ASBRs in the inter-AS VPN Option A scenario
Sham Link
OSPF sham links are unnumbered P2P links between two PEs over an MPLS VPN backbone network.
Generally, BGP extended community attributes carry routing information over the MPLS VPN backbone
between BGP peers. OSPF running on the other PE can use the routing information to generate inter-area
routes.
In Figure 6, if an intra-area OSPF link exists between the network segments of the local and remote CEs,
the routes over this intra-area link have higher priorities than the inter-area routes over the MPLS VPN
backbone network. As a result, VPN traffic is always forwarded over the intra-area link instead of the
backbone network. To prevent this problem, an OSPF sham link can be established between the PEs so
that the routes over the MPLS VPN backbone network also become OSPF intra-area routes and take
precedence.
• A sham link is a link between two VPN instances. Each VPN instance contains the address of an end-
point of a sham link. The address is a loopback address with the 32-bit mask in the VPN address space
on the PE.
• After a sham link is established between two PEs, the PEs become neighbors on the sham link and
exchange routing information.
• A sham link functions as a P2P link within an area. Users can select a route from the sham link and
intra-area route link by adjusting the metric.
Multi-VPN-Instance CE
OSPF multi-instance generally runs on PEs. Devices that run OSPF multi-instance within user LANs are called
Multi-VPN-Instance CEs (MCEs).
Compared with OSPF multi-instance running on PEs, MCEs have the following characteristics:
• MCEs establish one OSPF instance for each service. Different virtual CEs transmit different services,
which ensures LAN security at a low cost.
• MCEs implement different OSPF instances on a single CE. The key to implementing MCEs is to disable
loop detection and calculate routes directly. MCEs also use received LSAs with the DN bit set to 1 for
route calculation.
Background
As defined in OSPF, stub areas cannot import external routes. This mechanism prevents external routes from
consuming the bandwidth and storage resources of Routers in stub areas. If you need to both import
external routes and prevent resource consumption caused by external routes, you can configure not-so-
stubby areas (NSSAs).
There are many similarities between NSSAs and stub areas. However, different from stub areas, NSSAs can
import AS external routes into the OSPF AS and advertise the imported routes in the OSPF AS without
learning external routes from other areas on the OSPF network.
Related Concepts
• N-bit
A Router uses the N-bit carried in a Hello packet to identify the area type that it supports. The same
area type must be configured for all Routers in an area. If Routers have different area types, they
cannot establish OSPF neighbor relationships. Some vendors' devices do not comply with standard
protocols and also set the N-bit in OSPF Database Description (DD) packets. You can manually set
the N-bit on a Router to interwork with these devices.
• Type 7 LSA
Type 7 LSAs, which describe imported external routes, are introduced to support NSSAs. Type 7 LSAs are
generated by an ASBR in an NSSA and advertised only within the NSSA. After an ABR in an NSSA
receives Type 7 LSAs, it selectively translates Type 7 LSAs into Type 5 LSAs to advertise external routes
to other areas on an OSPF network.
Principles
To advertise external routes imported by an NSSA to other areas, a translator must translate Type 7 LSAs
into Type 5 LSAs. Notes for an NSSA are as follows:
• By default, the translator is the ABR with the largest router ID in the NSSA.
• The propagate bit (P-bit) is used to notify a translator whether Type 7 LSAs need to be translated.
• Only Type 7 LSAs with the P-bit set and a non-zero forwarding address (FA) can be translated into Type
5 LSAs. An FA indicates that packets destined for a given address will be forwarded to the address
specified by the FA.
The loopback interface address in an area is preferentially selected as the FA. If no loopback interface exists, the
address of the interface that is Up and has the smallest logical index in the area is selected as the FA.
• The P-bit is not set for default routes in Type 7 LSAs generated by an ABR.
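The translation eligibility rule above can be captured in a one-line check. This is a hedged sketch of only the eligibility test described here, not of the full translator election; names are illustrative.

```python
# Sketch of the Type 7 -> Type 5 translation rule: only Type 7 LSAs with the
# P-bit set and a non-zero FA are translated, and only by the elected translator.
def translatable_to_type5(p_bit, forwarding_address, is_translator):
    """Return True if an NSSA ABR acting as translator converts this
    Type 7 LSA into a Type 5 LSA."""
    return is_translator and p_bit == 1 and forwarding_address != "0.0.0.0"
```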
Figure 1 NSSA
Advantages
Multiple ABRs may be deployed in an NSSA. To prevent routing loops caused by default routes, ABRs do not
calculate the default routes advertised by each other.
Background
When multicast and an MPLS TE tunnel are configured on a network and the TE tunnel is enabled with
IGP Shortcut, the outbound interface of a route calculated by an IGP may not be an actual physical
interface but a TE tunnel interface. The Device then sends multicast Join messages over the unicast route,
through the TE tunnel interface, toward the multicast source address. These Join messages are invisible
to the Devices spanned by the TE tunnel. As a result, the Devices spanned by the TE tunnel cannot
generate multicast forwarding entries.
To resolve the problem, configure OSPF local multicast topology (MT) to create a multicast routing table for
multicast packet forwarding.
Implementation
Multicast and an MPLS TE tunnel are deployed on the network, and the TE tunnel is enabled with IGP
Shortcut. As shown in Figure 1, DeviceB is spanned by the TE tunnel and therefore does not create any
multicast forwarding entry.
Because the TE tunnel is unidirectional, multicast data packets sent from the multicast source are directly
sent to the Routers spanned by the tunnel through physical interfaces. These Routers, however, do not have
multicast forwarding entries. As a result, the multicast data packets are discarded, and services are
unavailable.
After local MT is enabled, if the outbound interface of the calculated route is an IGP Shortcut TE tunnel
interface, the route management (RM) module creates a separate Multicast IGP (MIGP) routing table for the
multicast protocol, calculates the actual physical outbound interface for the route, and then adds the route
to the MIGP routing table. Multicast then uses routes in the MIGP routing table to forward packets.
In Figure 1, after the messages requesting to join a multicast group reach DeviceA, they are forwarded to
DeviceB through interface 1. In this manner, DeviceB can correctly create the multicast forwarding table.
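The local MT route handling above can be sketched roughly as follows, assuming routes are plain records and the RM module can resolve a tunnel to its physical outbound interface. All structures and names here are illustrative, not NE40E internals.

```python
# Sketch of local MT: when a route's outbound interface is an IGP-shortcut
# TE tunnel, resolve the actual physical interface and install the route
# into a separate MIGP table, which multicast uses for forwarding.
def install_route(route, unicast_table, migp_table, resolve_physical_oif):
    unicast_table[route["prefix"]] = route
    if route["oif_is_shortcut_tunnel"]:
        migp = dict(route)
        migp["oif"] = resolve_physical_oif(route)   # actual physical interface
        migp["oif_is_shortcut_tunnel"] = False
        migp_table[route["prefix"]] = migp          # multicast uses this entry
```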
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults between
forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two systems. The path
can be a physical link, a logical link, or a tunnel.
In BFD for OSPF, a BFD session is associated with OSPF. The BFD session quickly detects a link fault and then
notifies OSPF of the fault, which speeds up OSPF's response to network topology changes.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol convergence must be
as quick as possible to improve network availability. Link faults are inevitable, and therefore a solution must
be provided to quickly detect faults and notify routing protocols.
BFD for Open Shortest Path First (OSPF) associates BFD sessions with OSPF. After BFD for OSPF is
configured, BFD quickly detects link faults and notifies OSPF of the faults. BFD for OSPF accelerates OSPF
response to network topology changes.
Table 1 describes OSPF convergence speeds before and after BFD for OSPF is configured.
Table 1 OSPF convergence speeds before and after BFD for OSPF is configured
Principles
Figure 1 BFD for OSPF
Figure 1 shows a typical network topology with BFD for OSPF configured. The principles of BFD for OSPF are
described as follows:
3. The outbound interface on Device A connected to Device B is interface 1. If the link between Device A
and Device B fails, BFD detects the fault and then notifies Device A of the fault.
4. Device A processes the event that a neighbor relationship goes Down and recalculates routes. The new
route passes through Device C and reaches Device A, with interface 2 as the outbound interface.
Definition
Generalized TTL security mechanism (GTSM) is a mechanism that protects services over the IP layer by
checking whether the TTL value in an IP packet header is within a pre-defined range.
Purpose
On networks, attackers may simulate OSPF packets and keep sending them to a device. After receiving these
packets, the device directly sends them to the control plane for processing without checking their validity if
the packets are destined for the device. As a result, the control plane is busy processing these packets,
resulting in high CPU usage.
GTSM is used to protect the TCP/IP-based control plane against CPU-utilization attacks, such as CPU-
overload attacks.
Principles
GTSM-enabled devices check the TTL value in each received packet based on a configured policy. The
packets that fail to pass the policy are discarded or sent to the control plane, which prevents the devices
from possible CPU-utilization attacks. A GTSM policy involves the following items:
• Source port number and destination port number of protocols above TCP/UDP
• For directly connected OSPF neighbors, the TTL value of the unicast protocol packets to be sent is set to
255.
• GTSM takes effect on unicast packets rather than multicast packets. This is because the TTL value of
multicast packets can only be 255, and therefore GTSM is not needed to protect against multicast
packets.
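Because directly connected neighbors send OSPF packets with TTL 255, the receiver-side GTSM check reduces to a range test on the received TTL. The sketch below is illustrative; the default window and parameter names are assumptions.

```python
# Minimal sketch of a GTSM check for unicast OSPF packets from directly
# connected neighbors: senders set TTL 255, so a forged packet that crossed
# intermediate hops arrives with a lower TTL and fails the check.
def gtsm_pass(ttl, valid_ttl_min=255, valid_ttl_max=255):
    """Return True if a received unicast OSPF packet passes the GTSM policy.
    Packets that fail are dropped (or punted, depending on configuration)
    instead of being handed straight to the control plane."""
    return valid_ttl_min <= ttl <= valid_ttl_max
```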
Definition
Hello packets are periodically sent on OSPF interfaces of Routers. By exchanging Hello packets, the Routers
establish and maintain the neighbor relationship, and elect the DR and the Backup Designated Router (BDR)
on the multiple-access network (broadcast or NBMA network). OSPF uses a Hello timer to control the
interval at which Hello packets are sent. A Router can send Hello packets again only after the Hello timer
expires. Neighbors keep waiting to receive Hello packets until the Hello timer expires. This process delays the
establishment of OSPF neighbor relationships or election of the DR and the BDR.
Enabling Smart-discover can solve the preceding problem.
Without Smart-discover: Hello packets are sent only after the Hello timer expires, that is, at each
Hello interval. Neighbors must keep waiting for Hello packets, for up to the Dead interval, before
changing state.
With Smart-discover: Hello packets are sent immediately, regardless of whether the Hello timer has
expired. Neighbors receive the packets and change state immediately.
Principles
In the following situations, Smart-discover-enabled interfaces can send Hello packets to neighbors regardless
of whether the Hello timer expires:
• On broadcast or NBMA networks, neighbor relationships can be established and a DR and a BDR can be
elected rapidly.
■ The neighbor status becomes 2-way for the first time or returns to Init from 2-way or a higher
state.
• On P2P or P2MP networks, neighbor relationships can be established rapidly. The establishment of
neighbor relationships on a P2P or P2MP network is the same as that on a broadcast or NBMA network.
Background
When a new device is deployed on a network or a device is restarted, network traffic may be lost during BGP
route convergence because IGP routes converge more quickly than BGP routes.
OSPF-BGP synchronization can address this problem.
Purpose
If a backup link exists, BGP traffic may be lost during traffic switchback because BGP routes converge more
slowly than OSPF routes do.
In Figure 1, Device A, Device B, Device C, and Device D run OSPF and establish IBGP connections. Device C
functions as the backup of Device B. When the network is stable, BGP and OSPF routes converge completely
on the Router.
In most cases, traffic from Device A to 10.3.1.0/30 passes through Device B. If Device B fails, traffic is
switched to Device C. After Device B recovers, traffic is switched back to Device B. During this process, packet
loss occurs.
After Device B recovers, OSPF route convergence completes quickly, whereas BGP route convergence is
still in progress. As a result, Device B does not yet have the route to 10.3.1.0/30.
When packets from Device A to 10.3.1.0/30 reach Device B, Device B discards them because Device B does
not have the route to 10.3.1.0/30.
Principles
If OSPF-BGP synchronization is configured on a device, the device remains as a stub Router during the set
synchronization period. During this period, the link metric in the LSA advertised by the device is set to the
maximum value (65535), instructing other OSPF Routers not to use it as a transit Router for data
forwarding.
In Figure 1, OSPF-BGP synchronization is enabled on Device B. In this situation, before BGP route
convergence is complete, Device A keeps forwarding data through Device C rather than Device B until BGP
route convergence on Device B is complete.
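The stub-router behavior above amounts to advertising the maximum link metric while the synchronization hold period is running. The sketch below is illustrative; the function and parameter names are assumptions.

```python
# Sketch of OSPF-BGP synchronization: during the configured hold period the
# device advertises the maximum link metric (65535) in its LSAs, so other
# OSPF routers do not use it as a transit node until BGP has converged.
MAX_LINK_METRIC = 65535

def advertised_metric(real_metric, bgp_converged, hold_timer_expired):
    """Return the link metric to advertise in LSAs."""
    if bgp_converged or hold_timer_expired:
        return real_metric            # normal transit forwarding resumes
    return MAX_LINK_METRIC            # remain a stub router during convergence
```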
Background
LDP-IGP synchronization is used to synchronize the status between LDP and an IGP to minimize the traffic
loss time if a network fault triggers the LDP and IGP switching.
On a network with active and standby links, if the active link fails, IGP routes and an LSP are switched to the
standby link. After the active link recovers, IGP routes are switched back to the active link before LDP
convergence is complete. In this case, the LSP along the active link takes time to make preparations, such as
adjacency restoration, before being established. As a result, LSP traffic is discarded. If an LDP session or
adjacency between nodes fails on the active link, the LSP along the active link is deleted. However, the IGP
still uses the active link, and as a result, LSP traffic cannot be switched to the standby link, and is
continuously discarded.
On a network enabled with LDP-IGP synchronization, an IGP keeps advertising the maximum cost of an IGP
route over the new active link to delay IGP route convergence until LDP converges. That is, before the LSP of
the active link is established, the LSP of the standby link is retained so that the traffic continues to be
forwarded through the standby link. The standby LSP is torn down only after the active LSP is established
successfully.
LDP-IGP synchronization involves the following timers:
• Hold-max-cost timer
• Delay timer
Implementation
Figure 1 Switchback in LDP-IGP synchronization
• The network shown in Figure 1 has active and standby links. When the active link recovers from a fault,
traffic is switched from the standby link back to the active link. During the switchback, once IGP route
convergence is complete, the standby LSP can no longer be used, but the new LSP over the active link
has not yet been established. This causes a traffic interruption for a short period of time. To prevent this
problem, LDP-IGP synchronization can be configured to delay IGP route switchback until LDP
convergence is complete. Before convergence of the active LSP completes, the standby LSP is retained,
so that the traffic continues to be forwarded through the standby LSP until the active LSP is successfully
established. Then the standby LSP is torn down. The detailed process is as follows:
2. An LDP session is set up between LSR2 and LSR3. The IGP advertises the maximum cost of the
active link to delay the IGP route switchback.
4. The LDP session is set up. Label messages are exchanged to notify the IGP to start
synchronization.
5. The IGP advertises the normal cost of the active link, and its routes converge to the original
forwarding path. The LSP is reestablished and entries are delivered to the forwarding table
(within milliseconds).
• If an LDP session between nodes fails on the active link, the LSP along the active link is deleted.
However, the IGP still uses the active link, and as a result, LSP traffic cannot be switched to the standby
link, and is continuously discarded. To prevent this problem, you can configure LDP-IGP synchronization.
If an LDP session fails, LDP notifies the IGP of the failure. The IGP advertises the maximum cost of the
failed link, which switches the route from the active link to the standby link; the LSP is switched from
the primary LSP to the backup LSP accordingly. The process of LDP-IGP synchronization is as follows:
2. LDP notifies the IGP of the failure in the session over the active link. The IGP then advertises the
maximum cost along the active link.
4. An LSP is set up over the standby link, and then forwarding entries are delivered.
To prevent repeated failures in LDP session reestablishment, you can use the Hold-max-cost timer to
configure the device to always advertise the maximum cost, so that traffic is transmitted along the
standby link before the LDP session is reestablished on the active link.
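The cost the IGP advertises for the active link under LDP-IGP synchronization can be summarized in one rule. This is a hedged sketch; the names and the maximum-cost value shown are illustrative.

```python
# Sketch of LDP-IGP synchronization: the IGP advertises the maximum cost for
# the active link until LDP convergence completes, so routes (and traffic)
# stay on the standby link in the meantime.
MAX_IGP_COST = 65535

def active_link_cost(real_cost, ldp_synchronized):
    """Return the cost the IGP advertises for the active link."""
    return real_cost if ldp_synchronized else MAX_IGP_COST
```

With the Hold-max-cost timer configured to always advertise the maximum cost, `ldp_synchronized` stays False until the LDP session is successfully reestablished.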
After LDP-IGP synchronization is enabled on an interface, the IGP queries the status of the interface and
LDP session according to the process shown in Figure 2, enters the corresponding state according to the
query result, and then transits the state according to Figure 2.
Usage Scenario
LDP-IGP synchronization applies to the following scenario:
On the network shown in Figure 3, an active link and a standby link are established. LDP-IGP
synchronization and LDP FRR are deployed.
Benefits
Packet loss is reduced during an active/standby link switchover, improving network reliability.
• Partial Route Calculation (PRC): calculates only those routes which have changed when the network
topology changes.
• An OSPF intelligent timer: can dynamically adjust its value based on the user's configuration and the
interval at which an event is triggered, such as the route calculation interval, which ensures rapid and
stable network operation.
The OSPF intelligent timer uses exponential backoff so that its value can reach the millisecond
level.
PRC
When a node in a network topology changes, the Dijkstra algorithm recalculates all routes on the
network. This calculation takes a long time and consumes a large amount of CPU resources, which slows
convergence on the entire network. PRC, in contrast, recalculates routes only for the nodes that have
changed, thereby decreasing CPU usage.
In route calculation, a leaf represents a route, and a node represents a device. Either an SPT change or a leaf
change causes a routing information change. The SPT change is irrelevant to the leaf change. PRC processes
routing information as follows:
• If the SPT changes, PRC calculates all the leaves only on the changed node.
• If the SPT remains unchanged, PRC calculates only the changed leaves.
For example, if a new route is imported, the SPT of the entire network remains unchanged. In this case, PRC
updates only the interface route for this node, thereby reducing the CPU usage.
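The PRC decision above (all leaves of a changed node, plus individually changed leaves) can be sketched as a small set computation. Data shapes and names are illustrative assumptions.

```python
# Sketch of PRC: recompute all leaves (routes) attached to a node whose SPT
# position changed; if the SPT is unchanged, recompute only the changed leaves.
def prc_targets(spt_changed_nodes, changed_leaves, tree):
    """tree maps node -> set of leaves (routes) attached to it.
    Returns the set of leaves that must be recalculated."""
    targets = set()
    for node in spt_changed_nodes:
        targets |= tree.get(node, set())    # SPT changed at this node
    targets |= changed_leaves               # e.g. a newly imported route
    return targets
```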
• On a network where routes are calculated repeatedly, the OSPF intelligent timer dynamically adjusts
the route calculation interval based on the user's configuration and exponential backoff. This decreases
the number of route calculations and the CPU resource consumption; routes are calculated only after
the network topology stabilizes.
• On an unstable network, if a router generates or receives LSAs due to frequent topology changes, the
OSPF intelligent timer can dynamically adjust the interval. No LSAs are generated or processed within
an interval, which prevents invalid LSAs from being generated and advertised on the entire network.
Background
If an interface carrying OSPF services alternates between Up and Down, OSPF neighbor relationship flapping
occurs on the interface. During the flapping, OSPF frequently sends Hello packets to reestablish the neighbor
relationship, synchronizes LSDBs, and recalculates routes. In this process, a large number of packets are
exchanged, adversely affecting neighbor relationship stability, OSPF services, and other OSPF-dependent
services, such as LDP and BGP. OSPF neighbor relationship flapping suppression can address this problem by
delaying OSPF neighbor relationship reestablishment or preventing service traffic from passing through
flapping links.
Related Concepts
Flapping_event: reported when the status of a neighbor relationship on an interface last changes from Full
to a non-Full state.
Implementation
Flapping detection
Each OSPF interface on which OSPF neighbor relationship flapping suppression is enabled starts a flapping
counter. If the interval between two successive neighbor status changes from Full to a non-Full state is
shorter than detecting-interval, a valid flapping_event is recorded, and the flapping_count increases by 1.
When the flapping_count reaches or exceeds threshold, flapping suppression takes effect. If the interval
between two successive neighbor status changes from Full to a non-Full state is longer than resume-
interval, the flapping_count is reset.
The detecting-interval, threshold, and resume-interval are configurable.
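The flapping-detection counter can be modeled as below; the default parameter values are illustrative assumptions, and real devices evaluate live timestamps rather than a replayed list:

```python
def suppression_triggered(event_times, detecting_interval=60,
                          resume_interval=120, threshold=10):
    """Replay Full -> non-Full transition timestamps (in seconds) and report
    whether flapping suppression would take effect. Parameter names mirror
    the text; the default values are assumptions for illustration."""
    flapping_count, last = 0, None
    for t in event_times:
        if last is not None:
            gap = t - last
            if gap < detecting_interval:
                flapping_count += 1      # a valid flapping_event is recorded
            elif gap > resume_interval:
                flapping_count = 0       # quiet for long enough: reset
        last = t
        if flapping_count >= threshold:
            return True                  # suppression takes effect
    return False

print(suppression_triggered(list(range(0, 110, 10))))  # 11 flaps, 10 s apart: True
```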
Flapping suppression
Flapping suppression works in either Hold-down or Hold-max-cost mode.
• Hold-down mode: In the case of frequent flooding and topology changes during neighbor relationship
establishment, interfaces prevent neighbor relationship reestablishment during Hold-down suppression,
which minimizes LSDB synchronization attempts and packet exchanges.
• Hold-max-cost mode: If the traffic forwarding path changes frequently, interfaces use 65535 as the cost
of the flapping link during Hold-max-cost suppression, which prevents traffic from passing through the
flapping link.
Flapping suppression can also work first in Hold-down mode and then in Hold-max-cost mode.
By default, the Hold-max-cost mode takes effect. The mode and suppression duration can be changed
manually.
If an attack causes frequent neighbor relationship flapping, Hold-down mode can minimize the impact of
the attack.
When an interface enters the flapping suppression state, all neighbor relationships on the interface enter the state
accordingly.
Typical Scenarios
Basic scenario
In Figure 1, the traffic forwarding path is Device A -> Device B -> Device C -> Device E before a link failure
occurs. After the link between Device B and Device C fails, the forwarding path switches to Device A ->
Device B -> Device D -> Device E. If the neighbor relationship between Device B and Device C frequently
flaps at the early stage of the path switchover, the forwarding path will be switched frequently, causing
traffic loss and affecting network stability. If the neighbor relationship flapping meets suppression
conditions, flapping suppression takes effect.
• If flapping suppression works in Hold-down mode, the neighbor relationship between Device B and
Device C is prevented from being reestablished during the suppression period, in which traffic is
forwarded along the path Device A -> Device B -> Device D -> Device E.
• If flapping suppression works in Hold-max-cost mode, 65535 is used as the cost of the link between
Device B and Device C during the suppression period, and traffic is forwarded along the path Device A -
> Device B -> Device D -> Device E.
If the link between Device B and Device C is the only link between them and flapping suppression works in
Hold-down mode, the neighbor relationship between Device B and Device C is prevented from being reestablished, and the whole
network will be divided. Therefore, Hold-max-cost mode (rather than Hold-down mode) is recommended. If
flapping suppression works in Hold-max-cost mode, 65535 is used as the cost of the link between Device B
and Device C during the suppression period. After the network stabilizes and the suppression timer expires,
the link is restored.
Broadcast scenario
In Figure 3, four devices are deployed on the same broadcast network using switches, and the devices are
broadcast network neighbors. If Device C flaps due to a link failure, and Device A and Device B were
deployed at different time (Device A was deployed earlier for example) or the flapping suppression
parameters on Device A and Device B are different, Device A first detects the flapping and suppresses Device
C. Consequently, the Hello packets sent by Device A do not carry Device C's router ID. However, Device B has
not detected the flapping yet and still considers Device C a valid node. As a result, the DR candidates
identified by Device A are Device B and Device D, whereas the DR candidates identified by Device B are
Device A, Device C, and Device D. Different DR candidates result in a different DR election result, which may
lead to route calculation errors. To prevent this problem in scenarios where an interface has multiple
neighbors, such as on a broadcast, P2MP, or NBMA network, all neighbors on the interface are suppressed
when the status of a neighbor relationship last changes to ExStart or Down. Specifically, if Device C flaps,
Device A, Device B, and Device D on the broadcast network are all suppressed. After the network stabilizes
and the suppression timer expires, Device A, Device B, and Device D are restored to normal status.
Multi-area scenario
In Figure 4, Device A, Device B, Device C, Device E, and Device F are connected in area 1, and Device B,
Device D, and Device E are connected in backbone area 0. Traffic from Device A to Device F is preferentially
forwarded along an intra-area route, and the forwarding path is Device A -> Device B -> Device C -> Device
E -> Device F. When the neighbor relationship between Device B and Device C flaps and the flapping meets
suppression conditions, flapping suppression takes effect in the default mode (Hold-max-cost).
Consequently, 65535 is used as the cost of the link between Device B and Device C. However, the forwarding
path remains unchanged because intra-area routes take precedence over inter-area routes during route
selection according to OSPF route selection rules. To prevent traffic loss in multi-area scenarios, configure
Hold-down mode to prevent the neighbor relationship between Device B and Device C from being
reestablished during the suppression period. During this period, traffic is forwarded along the path Device A
-> Device B -> Device D -> Device E -> Device F.
By default, the Hold-max-cost mode takes effect. The mode can be changed to Hold-down manually.
Scenario with both LDP-IGP synchronization and OSPF neighbor relationship flapping suppression
configured
In Figure 5, if the link between PE1 and P1 fails, an LDP LSP switchover is implemented immediately, causing
the original LDP LSP to be deleted before a new LDP LSP is established. To prevent traffic loss, LDP-IGP
synchronization needs to be configured. With LDP-IGP synchronization, 65535 is used as the cost of the link
along which the new LSP is to be established. After the new LSP is established, the original link cost takes
effect. Consequently, the original LSP is deleted, and LDP traffic is forwarded along the new LSP.
LDP-IGP synchronization and OSPF neighbor relationship flapping suppression work in either Hold-down or
Hold-max-cost mode. If both functions are configured, Hold-down mode takes precedence over Hold-max-
cost mode, followed by the configured link cost. Table 1 lists the suppression modes that take effect in
different situations.
Table 1 Principles for selecting the suppression modes that take effect in different situations
For example, the link between PE1 and P1 frequently flaps in Figure 5, and both LDP-IGP synchronization
and OSPF neighbor relationship flapping suppression are configured. In this case, the suppression mode is
selected based on the preceding principles. No matter which mode (Hold-down or Hold-max-cost) is
selected, the forwarding path is PE1 -> P4 -> P3 -> PE2.
Figure 5 Scenario with both LDP-IGP synchronization and OSPF neighbor relationship flapping suppression
configured
Scenario with both bit-error-triggered protection switching and OSPF neighbor relationship flapping
suppression configured
If a link has poor link quality, services transmitted along it may be adversely affected. If bit-error-triggered
protection switching is configured and the bit error rate (BER) along a link exceeds a specified value, a bit
error event is reported, and 65535 is used as the cost of the link, triggering route reselection. Consequently,
service traffic is switched to the backup link. If both bit-error-triggered protection switching and OSPF
neighbor relationship flapping suppression are configured, they both take effect. Hold-down mode takes
precedence over Hold-max-cost mode, followed by the configured link cost.
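The precedence rule shared by these scenarios (Hold-down over Hold-max-cost over the configured cost) can be sketched as follows; the function is a hypothetical illustration:

```python
def effective_link_state(requested_modes, configured_cost):
    """Pick the state that takes effect when several features each request a
    suppression mode on the same link: Hold-down takes precedence over
    Hold-max-cost, which takes precedence over the configured cost."""
    if "hold-down" in requested_modes:
        return ("hold-down", None)       # adjacency re-establishment is blocked
    if "hold-max-cost" in requested_modes:
        return ("hold-max-cost", 65535)  # link advertised with the maximum cost
    return ("normal", configured_cost)

print(effective_link_state({"hold-max-cost", "hold-down"}, 10))  # ('hold-down', None)
print(effective_link_state({"hold-max-cost"}, 10))               # ('hold-max-cost', 65535)
```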
Context
If network-wide OSPF LSA flush causes network instability, source tracing must be implemented as soon as
possible to locate and isolate the fault source. However, OSPF itself does not support source tracing. A
conventional solution is to isolate nodes one by one until the fault source is located, but the process is
complex and time-consuming and may compromise network services. To solve the preceding problem, OSPF
introduces a proprietary protocol, namely, the source tracing protocol. This protocol supports the flooding of
flush source information. When the preceding problem occurs, you can quickly query the flush source
information on any device on the network to quickly locate the fault source.
Related Concepts
Source tracing
A mechanism that helps locate the device that flushes OSPF LSAs. This feature has the following
characteristics:
• Uses a new UDP port. Source tracing packets are carried by UDP packets, and the UDP packets also
carry the OSPF LSAs flushed by the current device and are flooded hop by hop based on the OSPF
topology.
• Forwards packets along UDP channels, which are independent of the channels used to transmit OSPF
packets. Therefore, this protocol facilitates incremental deployment. In addition, source tracing does not
affect the devices with the related UDP port disabled.
• Supports query of the node that flushed LSAs on any of the devices after source tracing packets are
flooded on the network, which speeds up fault locating and faulty node isolation.
Flush
Network-wide OSPF LSAs are deleted.
PS-Hello packets
Packets used to negotiate the OSPF flush source tracing capability between OSPF neighbors.
PS-LSA
When a device flushes an OSPF LSA, it generates a PS-LSA carrying information about the device and brief
information about the flushed LSA.
Fundamentals
The implementation of OSPF flush source tracing is as follows:
Only router-LSAs, network-LSAs, and inter-area-router-LSAs can be flushed. Therefore, a device generates a PS-LSA only
when it flushes a router-LSA, network-LSA, or inter-area-router-LSA.
Capability negotiation between DeviceA and DeviceB proceeds as follows:
• Devices A and B both support source tracing: DeviceA sends a PS-Hello packet to notify DeviceB of its
source tracing capability. Upon reception of the PS-Hello packet, DeviceB sets the source tracing field
for DeviceA and replies with an ACK packet to notify DeviceA of its own source tracing capability. Upon
reception of the ACK packet, DeviceA sets the source tracing field for DeviceB and does not retransmit
the PS-Hello packet.
• DeviceA supports source tracing, but DeviceB does not: DeviceA sends a PS-Hello packet to notify
DeviceB of its source tracing capability. DeviceA fails to receive an ACK packet from DeviceB after 10s
elapses and retransmits the PS-Hello packet. A maximum of two retransmissions are allowed. If DeviceA
still fails to receive an ACK packet from DeviceB after the PS-Hello packet is retransmitted twice,
DeviceA considers that DeviceB does not support source tracing.
• Devices A and B both support source tracing, but source tracing is disabled on DeviceB: After source
tracing is disabled on DeviceB, DeviceB sends a PS-Hello packet to notify DeviceA of its source tracing
incapability. Upon reception of the PS-Hello packet from DeviceB, DeviceA replies with an ACK packet
that carries its source tracing capability. Upon reception of the ACK packet from DeviceA, DeviceB
considers the capability negotiation complete and disables the UDP port.
• DeviceA does not support source tracing, and source tracing is disabled on DeviceB: After source
tracing is disabled on DeviceB, DeviceB sends a PS-Hello packet to notify DeviceA of its source tracing
incapability. DeviceB fails to receive an ACK packet from DeviceA after 10s elapses and retransmits the
PS-Hello packet. A maximum of two retransmissions are allowed. After two retransmissions, DeviceB
considers the capability negotiation complete and disables the UDP port.
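The retransmission rule in the negotiation above can be modeled as follows; the function is a hypothetical illustration that counts attempts instead of actually waiting 10 seconds per attempt:

```python
def negotiate(ack_received_on_attempt=None, max_retransmissions=2):
    """Model the PS-Hello exchange: one initial transmission plus up to two
    retransmissions, each followed by a 10-second ACK wait (not simulated).

    ack_received_on_attempt: 1-based attempt number on which an ACK arrives,
    or None if the neighbor never answers."""
    attempts = 1 + max_retransmissions
    for attempt in range(1, attempts + 1):
        if ack_received_on_attempt == attempt:
            return ("capable", attempt)   # neighbor answered: supports tracing
    return ("incapable", attempts)        # three sends, no ACK: mark incapable

print(negotiate(ack_received_on_attempt=2))  # ('capable', 2)
print(negotiate())                           # ('incapable', 3)
```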
• If a device flushes an OSPF LSA, it generates and floods a PS-LSA to source tracing-capable neighbors.
• If a device receives a flush LSA from a source tracing-incapable neighbor, the device generates and
floods a PS-LSA to source tracing-capable neighbors. If a device receives the same flush LSA (with the
same LSID and sequence number) from more than one source tracing-incapable neighbor, the device
generates only one PS-LSA.
• After DeviceA flushes an OSPF LSA itself, it generates a PS-LSA in which the Flush Router field is its
router ID and the Neighbor Router field is 0, and adds the PS-LSA to the queue where packets are to be
sent to all source tracing-capable neighbors.
• After DeviceA receives the flush LSA from source tracing-incapable DeviceB, DeviceA generates a PS-LSA
in which the Flush Router field is its router ID and the Neighbor Router field is the router ID of
DeviceB, and adds the PS-LSA to the queue where packets are to be sent to all source tracing-capable
neighbors.
• After DeviceA receives the flush LSA from DeviceB, followed by the same flush LSA sent by DeviceC,
DeviceA generates a PS-LSA in which the Flush Router field is its router ID and the Neighbor Router
field is the router ID of DeviceB, and adds the PS-LSA to the queue where packets are to be sent to all
source tracing-capable neighbors. No PS-LSA is generated in response to the flush LSA received from
DeviceC.
• During neighbor relationship establishment, a device initializes the sequence number of the PS-LSU
packet of the neighbor. When the device replies with a PS-LSU packet, it adds the sequence number of
the PS-LSU packet of the neighbor. During PS-LSU packet retransmission, the sequence number remains
unchanged. After the device receives a PS-LSU ACK packet with the same sequence number, it increases
the sequence number of the neighbor's PS-LSU packet by 1.
• The neighbor manages the PS-LSA sending queue. When a PS-LSA is added to the queue which was
empty, the neighbor starts a timer. After the timer expires, the neighbor adds the PS-LSA to a PS-LSU
packet, sends the packet to its neighbor, and starts another timer to wait for a PS-LSU ACK packet.
• After the PS-LSU ACK timer expires, the PS-LSU packet is retransmitted.
• When the device receives a PS-LSU ACK packet with a sequence number same as that in the neighbor
record, the device clears PS-LSAs from the neighbor queue, and sends another PS-LSU packet after the
timer expires.
■ If the sequence number of a received PS-LSU ACK packet is less than that in the neighbor record,
the device ignores the packet.
■ If the sequence number of a received PS-LSU ACK packet is greater than that in the neighbor
record, the device discards the packet.
• When a device receives a PS-LSU packet from a neighbor, the neighbor records the sequence number of
the packet and replies with a PS-LSU ACK packet.
• When the device receives a PS-LSU packet with the sequence number the same as that in the neighbor
record, the device discards the PS-LSU packet.
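The sender-side handling of PS-LSU ACK sequence numbers described above can be sketched as follows (a hypothetical illustration, not device code):

```python
def handle_ps_lsu_ack(neighbor_seq, ack_seq):
    """Sender-side handling of a PS-LSU ACK: a matching sequence number
    acknowledges the outstanding PS-LSU packet; a stale (smaller) or
    future (larger) number is not acted on.

    Returns (action, new_neighbor_seq)."""
    if ack_seq == neighbor_seq:
        return ("acknowledged", neighbor_seq + 1)  # clear the queue, bump seq
    if ack_seq < neighbor_seq:
        return ("ignored", neighbor_seq)           # stale ACK
    return ("discarded", neighbor_seq)             # ACK ahead of the record

print(handle_ps_lsu_ack(7, 7))  # ('acknowledged', 8)
print(handle_ps_lsu_ack(7, 6))  # ('ignored', 7)
print(handle_ps_lsu_ack(7, 9))  # ('discarded', 7)
```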
• After the device parses a PS-LSU packet, it checks whether each PS-LSA in the packet is newer than the
corresponding PS-LSA in the LSDB and, if so, adds the received PS-LSA to the LSDB.
■ If the received PS-LSA is the same as the corresponding local one, the device does not process the
received PS-LSA.
■ If the received PS-LSA is older, the device floods the corresponding local one to the neighbor.
• If the device receives a PS-LSU packet from a neighbor that is marked as source tracing-incapable, the
device changes the neighbor status to source tracing-capable.
• GTSM: a security mechanism that checks whether the time to live (TTL) value in each received IP packet
header is within a pre-defined range.
• CPU-CAR: a mechanism that enables interface boards to check the packets to be sent to the CPU for
processing and prevents the main control board from being overloaded by a large number of packets
that are sent to the CPU. The source tracing protocol applies for an independent CAR channel with
small CAR values configured.
Typical Scenarios
Scenario where all nodes support source tracing
All nodes on the network support source tracing, and DeviceA is the faulty source. Figure 3 shows the
networking.
When DeviceA flushes an OSPF LSA, it generates a PS-LSA that carries DeviceA information and brief
information about the flush LSA. Then the PS-LSA is flooded on the network hop by hop. After the fault
occurs, maintenance personnel can log in to any node on the network to locate DeviceA that keeps sending
flush LSAs and isolate DeviceA from the network.
Scenario where source tracing-incapable nodes are not isolated from source tracing-capable nodes
All nodes on the network except DeviceC support source tracing, and DeviceA is the fault source. In this case,
the PS-LSA can be flooded on the entire network, and the fault source can be accurately located. Figure 4
shows the networking.
Figure 4 Scenario where source tracing-incapable nodes are not isolated from source tracing-capable nodes
When DeviceA flushes an OSPF LSA, it generates a PS-LSA that carries DeviceA information and brief
information about the flush LSA. Then the PS-LSA is flooded on the network hop by hop. When DeviceB and
DeviceE negotiate the source tracing capability with DeviceC, they find that DeviceC does not support source
tracing. Therefore, after DeviceB receives the PS-LSA from DeviceA, DeviceB sends the PS-LSA to DeviceD,
but not to DeviceC. After receiving the flush LSA from DeviceC, DeviceE generates a PS-LSA that carries
information about the advertisement source (DeviceE), flush source (DeviceC), and the flush LSA, and floods
the PS-LSA on the network.
After the fault occurs, maintenance personnel can log in to any device on the network except DeviceC to
locate the faulty node. Two possible fault sources can be located in this case: DeviceA and DeviceC, both of
which send the same flush LSA. In this case, DeviceA takes precedence over DeviceC when the maintenance
personnel determine the most probable fault source. After DeviceA is isolated, the network recovers.
Scenario where source tracing-incapable nodes are isolated from source tracing-capable nodes
All nodes on the network except DeviceC and DeviceD support source tracing, and DeviceA is the faulty
source. In this case, the PS-LSA cannot be flooded on the entire network. Figure 5 shows the networking.
Figure 5 Scenario where source tracing-incapable nodes are isolated from source tracing-capable nodes
When DeviceA flushes an OSPF LSA, it generates a PS-LSA that carries DeviceA information and brief
information about the flush LSA. However, the PS-LSA can reach only DeviceB because DeviceC and DeviceD
do not support source tracing.
During source tracing capability negotiation, DeviceE finds that DeviceC does not support source tracing, and
DeviceF finds that DeviceD does not support source tracing. After DeviceE receives the flush LSA from
DeviceC, DeviceE generates and floods a PS-LSA on behalf of DeviceC. Similarly, after DeviceF receives the
flush LSA from DeviceD, DeviceF generates and floods a PS-LSA on behalf of DeviceD.
• If maintenance personnel log in to DeviceA or DeviceB, the personnel can locate the fault source
(DeviceA) directly. After DeviceA is isolated, the network recovers.
• If the maintenance personnel log in to DeviceE, DeviceF, DeviceG, or DeviceH, the personnel will find
that DeviceE claims DeviceC to be the fault source of the OSPF flush LSA and DeviceF claims DeviceD to
be the fault source of the same OSPF flush LSA.
• If the maintenance personnel log in to DeviceC or DeviceD, the personnel will find that the flush LSA
was initiated by DeviceB, not generated by DeviceC or DeviceD.
• If the maintenance personnel log in to DeviceB, the personnel will find that DeviceA is the faulty device,
and isolate DeviceA. After DeviceA is isolated, the network recovers.
Background
In OSPF, intra-area links take precedence over inter-area links during route selection even when the inter-
area links are shorter than the intra-area links. Each OSPF interface belongs to only one area. As a result,
even when a high-speed link exists in an area, traffic of another area cannot be forwarded along the link. A
common method used to solve this problem is to configure multiple sub-interfaces and add them to
different areas. However, this method has the defect that an independent IP address must be configured
and advertised for each sub-interface, which increases the total number of routes. In this situation,
OSPF multi-area adjacency is introduced.
OSPF multi-area adjacency allows an OSPF interface to be multiplexed by multiple areas so that a link can
be shared by the areas.
Figure 1 Traffic forwarding paths before and after OSPF multi-area adjacency is enabled
In Figure 1, the link between Device A and Device B in area 1 is a high-speed link.
In Figure 1 a, OSPF multi-area adjacency is disabled on Device A and Device B, and traffic from Device A to
Device B in area 2 is forwarded along the low-speed link of Device A -> Device C -> Device D -> Device B.
In Figure 1 b, OSPF multi-area adjacency is enabled on Device A and Device B, and their multi-area
adjacency interfaces belong to area 2. In this case, traffic from Device A to Device B in area 2 is forwarded
along the high-speed link of Device A -> Device B.
• Allows interface multiplexing, which reduces OSPF interface resource usage in multi-area scenarios.
• Allows link multiplexing, which prevents a traffic detour to low-speed links and optimizes the OSPF
network.
Related Concepts
Multi-area adjacency interface: indicates the OSPF logical interface created when OSPF multi-area
adjacency is enabled on an OSPF-capable interface (main OSPF interface). The multi-area adjacency
interface is also referred to as a secondary OSPF interface. The multi-area adjacency interface has the
following characteristics:
• The multi-area adjacency interface and the main OSPF interface belong to different OSPF areas.
• The network type of the multi-area adjacency interface must be P2P. The multi-area adjacency interface
runs an independent interface state machine and neighbor state machine.
• The multi-area adjacency interface and the main OSPF interface share the same interface index and
packet transmission channel. Whether the multi-area adjacency interface or the main OSPF interface is
selected to forward an OSPF packet is determined by the area ID carried in the packet header and
related configuration.
• If the interface is P2P, its multi-area adjacency interface sends packets through multicast.
• If the interface is not P2P, its multi-area adjacency interface sends packets through unicast.
Principles
Figure 2 Networking for OSPF multi-area adjacency
In Figure 2, the link between Device A and Device B in area 1 is a high-speed link. In area 2, traffic from
Device A to Device B is forwarded along the low-speed link of Device A -> Device C -> Device D -> Device B.
If you want the traffic from Device A to Device B in area 2 to be forwarded along the high-speed link of
Device A -> Device B, deploy OSPF multi-area adjacency.
Specifically, configure OSPF multi-area adjacency on the main interfaces of Device A and Device B to create
multi-area adjacency interfaces. The multi-area adjacency interfaces belong to area 2.
1. An OSPF adjacency is established between Device A and Device B. For details about the establishment
process, see Adjacency Establishment.
2. Route calculation is implemented. For details about the calculation process, see Route Calculation.
The optimal path in area 2 obtained by OSPF through calculation is the high-speed link of Device A ->
Device B. In this case, the high-speed link is shared by area 1 and area 2.
Background
As networks develop, voice over IP (VoIP) and online video services pose higher requirements for real-time
transmission. Nevertheless, if a primary link fails, OSPF-enabled devices need to perform multiple operations,
including detecting the fault, updating the link-state advertisement (LSA), flooding the LSA, calculating
routes, and delivering forwarding information base (FIB) entries, before switching traffic to a new link. This
process takes much longer than the minimum delay to which users are sensitive. As a result, the
requirements for real-time transmission cannot be met. OSPF IP FRR can solve this problem. OSPF IP FRR
conforms to dynamic IP FRR defined in standard protocols. With OSPF IP FRR, devices can switch traffic
from a faulty primary link to a backup link, protecting traffic against a link or node failure.
Major FRR techniques include loop-free alternate (LFA), U-turn, Not-Via, Remote LFA, and MRT, among
which OSPF supports only LFA and Remote LFA.
Related Concepts
OSPF IP FRR
OSPF IP FRR refers to a mechanism in which a device uses the loop-free alternate (LFA) algorithm to
compute the next hop of a backup link and stores the next hop together with the primary link in the
forwarding table. If the primary link fails, the device switches traffic to the backup link before routes
converge on the control plane. This mechanism shortens the traffic interruption duration and minimizes the
impact on services.
OSPF IP FRR policy
An OSPF IP FRR policy can be configured to filter alternate next hops. Only the alternate next hops that
match the filtering rules of the policy can be added to the IP routing table.
LFA algorithm
A device uses the shortest path first (SPF) algorithm to calculate the shortest path from each neighbor with
a backup link to the destination node. The device then uses the inequalities defined in standard protocols
and the LFA algorithm to calculate the next hop of the loop-free backup link that has the smallest cost of
the available shortest paths.
Remote LFA
LFA FRR cannot be used to calculate alternate links on large-scale networks, especially on ring networks.
Remote LFA FRR addresses this problem by calculating a PQ node and establishing a tunnel between the
source node of a primary link and the PQ node. If the primary link fails, traffic can be automatically switched
to the tunnel, which improves network reliability.
P space
Remote LFA uses the source end of a protection link as the root node and calculates an SPT to all the other
nodes on the network (with the protection link calculated in the tree). Then Remote LFA removes all the
nodes along the protection link from the SPT, and the set of the remaining nodes is called a P space.
Extended P space
Remote LFA uses neighbors of the source end of a protection link as root nodes and calculates separate SPTs
(with the protection link calculated in the trees). Then Remote LFA removes all the nodes along the
protection link from each SPT, and the set of the remaining nodes on the SPTs is called an extended P space.
Q space
Remote LFA uses the destination end of a protection link as the root node and calculates an SPT to all the
other nodes on the network (with the protection link calculated in the tree). Then Remote LFA removes all
the nodes along the protection link from the SPT, and the set of the remaining nodes is called a Q space.
PQ node
A PQ node exists both in the extended P space and Q space and is used by Remote LFA as the destination of
a protection tunnel.
Node-and-link protection
In Figure 2, traffic flows from Device S to Device D. The primary link is Device S->Device E->Device D, and
the backup link is Device S->Device N->Device D. The following inequalities are met. With OSPF IP FRR,
Device S switches the traffic to the backup link if the primary link fails, reducing the traffic interruption
duration.
Node-and-link protection takes effect when the following conditions are met:
• The link costs meet the inequality: Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D).
• The interface costs meet the inequality: Distance_opt(N, D) < Distance_opt(N, E) + Distance_opt(E, D).
Distance_opt(X, Y) indicates the shortest link from X to Y. S stands for a source node, E for the faulty node, N for a node
along a backup link, and D for a destination node.
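The two inequalities can be checked mechanically; the cost values below are invented for illustration:

```python
def lfa_protects(dist, S, E, N, D):
    """Check the two node-and-link protection inequalities quoted above.
    dist[(X, Y)] is Distance_opt(X, Y), the shortest-path cost from X to Y."""
    loop_free = dist[(N, D)] < dist[(N, S)] + dist[(S, D)]  # link protection
    node_safe = dist[(N, D)] < dist[(N, E)] + dist[(E, D)]  # node protection
    return loop_free and node_safe

# Illustrative costs for a topology like Figure 2:
dist = {("N", "D"): 10, ("N", "S"): 10, ("S", "D"): 20,
        ("N", "E"): 15, ("E", "D"): 10}
print(lfa_protects(dist, "S", "E", "N", "D"))  # True: N is a valid backup
```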
On the network shown in Figure 3, Remote LFA calculates the PQ node as follows:
1. Calculates an SPT with each of P1's neighbors (excluding the neighbor on the protection link) as the
root. In this case, neighbors PE1 and P3 are used for calculation. For each SPT, an extended P space is
composed of the root node and those reachable nodes that belong to the SPT but do not pass through
the P1→P2 link. When PE1 is used as a root node for calculation, the extended P space {PE1, P1, P3} is
obtained. When P3 is used as a root node for calculation, the extended P space {PE1, P1, P3, P4} is
obtained. By combining these two extended P spaces, the final extended P space {PE1, P1, P3, P4} is
obtained.
2. Calculates a reverse SPT with P2 as the root. The obtained Q space is {P2, PE2, P4}.
3. Selects the PQ node (P4) that exists both in the extended P space and Q space.
On a network with a large number of nodes, to ensure that RLFA/TI-LFA calculation can be completed as soon as
possible, the elected P and Q nodes may not be optimal, but they comply with rules. This does not affect the
protection effect.
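The extended P space, Q space, and PQ node computation can be reproduced on a small graph modeled on Figure 3; the unit link costs are assumptions, the graph is assumed connected with symmetric costs, and tie-breaking may differ from a real implementation:

```python
import heapq

def dijkstra(graph, src):
    """Shortest-path costs and predecessors from src. graph[u] = {v: cost}."""
    dist, prev = {src: 0}, {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return dist, prev

def avoids_link(prev, node, link):
    """True if the tree path to node (via prev) does not traverse link (u, v)."""
    while node in prev:
        if (prev[node], node) == link:
            return False
        node = prev[node]
    return True

# Ring topology modeled on Figure 3, unit costs (illustrative):
g = {"PE1": {"P1": 1}, "P1": {"PE1": 1, "P2": 1, "P3": 1},
     "P2": {"P1": 1, "PE2": 1, "P4": 1}, "P3": {"P1": 1, "P4": 1},
     "P4": {"P3": 1, "P2": 1}, "PE2": {"P2": 1}}
protected = ("P1", "P2")

# Extended P space: nodes reached from P1's neighbors (excluding P2)
# without crossing the protected link.
ext_p = set()
for nbr in g["P1"]:
    if nbr == protected[1]:
        continue
    _, prev = dijkstra(g, nbr)
    ext_p |= {n for n in g if avoids_link(prev, n, protected)}

# Q space: reverse SPT rooted at P2; with symmetric costs we can reuse g
# and test the reversed link.
_, prev_q = dijkstra(g, "P2")
q = {n for n in g if avoids_link(prev_q, n, (protected[1], protected[0]))}

print(sorted(ext_p & q))  # ['P4'] — the PQ node, matching the text
```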
OSPF FRR in the Scenario Where Multiple Nodes Advertise the Same
Route
Both OSPF LFA FRR and OSPF remote LFA FRR use the SPF algorithm to calculate the shortest path from
each neighbor (root node) that provides a backup link to the destination node and store the node-based
backup next hop, which applies to single-node routing scenarios. As networks are increasingly diversified,
two ABRs or ASBRs are deployed to improve network reliability. In this case, OSPF FRR in a scenario where
multiple nodes advertise the same route is needed.
In a scenario where multiple nodes advertise the same route (multi-node routing scenario), OSPF FRR is implemented by
calculating the Type 3 LSAs advertised by ABRs of an area for intra-area, inter-area, ASE, or NSSA routing. Therefore, the
OSPF FRR calculation methods are the same when multiple nodes advertise the same route. Inter-area routing is used as
an example to describe how FRR in a multi-node routing scenario works.
Figure 4 OSPF FRR in the scenario where multiple nodes advertise the same route
In Figure 4, Device B and Device C function as ABRs to forward routes between area 0 and area 1. Device E
advertises an intra-area route. Upon receipt of the route, Device B and Device C translate it into a Type 3
LSA and flood the LSA to area 0. After OSPF FRR is enabled on Device A, Device A considers both Device B
and Device C as its neighbors. Without a fixed neighbor as the root node, Device A fails to calculate the FRR
backup next hop. To address this problem, a virtual node is simulated between Device B and Device C and
used as the root node of Device A, and Device A uses the LFA or remote LFA algorithm to calculate the
backup next hop. This solution converts multi-node routing into single-node routing.
For example, both Device B and Device C advertise the route 10.1.1.0/24, and OSPF FRR is enabled on Device
A. After Device A receives the route, it fails to calculate a backup next hop for the route due to a lack of a
fixed root node. To address this problem, a virtual node is simulated between Device B and Device C based
on the two sources of the route 10.1.1.0/24. The virtual node forms a link with each of Device B and Device
C. If the virtual node advertises a 10.1.1.0/24 route, it will use the smaller cost of the routes advertised by
Device B and Device C as the cost of the route. If the cost of the route advertised by Device B is 5 and that of
the route advertised by Device C is 10, the cost of the route advertised by the virtual node is 5. The cost of
the link from Device B to the virtual node is 0, and that of the link from Device C to the virtual node is 5.
The costs of the links from the virtual node to Device B and Device C are both 65535, the maximum value.
Device A is configured to consider Device B and Device C as invalid sources of the 10.1.1.0/24 route and use
the LFA or remote LFA algorithm to calculate the backup next hop for the route, with the virtual node as the
root node.
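The cost derivation in this example can be sketched as follows; the helper function and its name are illustrative, not part of the product:

```python
# Derive the virtual node's costs from the routes advertised by the real
# advertising routers, following the example above: the virtual node
# advertises the smaller route cost; each real router's link to the
# virtual node costs (its advertised cost - the smaller cost); the
# reverse links get the maximum cost, 65535.
def virtual_node_costs(advertised_costs):
    """advertised_costs: {router_name: cost of the route it advertises}."""
    route_cost = min(advertised_costs.values())
    to_virtual = {r: c - route_cost for r, c in advertised_costs.items()}
    from_virtual = {r: 65535 for r in advertised_costs}
    return route_cost, to_virtual, from_virtual

route_cost, to_virtual, from_virtual = virtual_node_costs(
    {"DeviceB": 5, "DeviceC": 10})
```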
In a scenario where multiple nodes advertise the same route, OSPF FRR can use the LFA or remote LFA
algorithm. When OSPF FRR uses the remote LFA algorithm, PQ node selection has the following restrictions:
• An LDP tunnel will be established between a faulty node and a PQ node, and a virtual node in the
scenario where multiple nodes advertise the same route cannot transmit traffic through LDP tunnels. As
a result, the virtual node cannot be selected as a PQ node.
• The destination node is not used as a PQ node. After a virtual node is added to a multi-node routing
scenario, the destination node becomes the virtual node. As a result, the nodes directly connected to the
virtual node cannot be selected as PQ nodes.
OSPF LFA IP FRR provides protection for Link1, and Link2 or Link3 is selected as the backup link for traffic
forwarding. Assume that Link2 is selected as the backup link:
• If Link1 fails but the backup link (Link2) is normal, traffic can be forwarded normally after being
switched to the backup link.
• If both Link1 and Link2 fail, traffic is interrupted after being switched to the backup link.
OSPF SRLG FRR can be configured in the scenario where some links have the same risk of failure. If Link1
and Link2 have the same risk of failure, you can add them to an SRLG and configure OSPF SRLG FRR so that
a link outside the SRLG is preferentially selected as a backup link, which reduces the possibility of service
interruptions. After Link1 and Link2 are added to the same SRLG, OSPF LFA IP FRR selects Link3, which is not
in the SRLG, as the backup link to provide protection for Link1. If both Link1 and Link2 fail, traffic can be
switched to Link3 for normal transmission.
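The SRLG-aware selection above amounts to preferring a backup link that shares no SRLG with the protected link. A simplified sketch (the link names and SRLG IDs are illustrative):

```python
# Prefer a backup link outside every SRLG of the protected link; fall
# back to an arbitrary candidate only if no SRLG-disjoint link exists.
def pick_backup(protected, candidates, srlg_membership):
    """srlg_membership: {link: set of SRLG IDs the link belongs to}."""
    protected_srlgs = srlg_membership.get(protected, set())
    safe = [link for link in candidates
            if not (srlg_membership.get(link, set()) & protected_srlgs)]
    return safe[0] if safe else candidates[0]

# Link1 and Link2 share SRLG 100, so Link3 is selected as the backup.
srlgs = {"Link1": {100}, "Link2": {100}, "Link3": set()}
backup = pick_backup("Link1", ["Link2", "Link3"], srlgs)
```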
Derivative Functions
If a Bidirectional Forwarding Detection (BFD) session is bound to OSPF IP FRR and BFD detects a link fault,
the BFD session goes down, which triggers OSPF IP FRR to switch traffic from the faulty link to the backup
link, minimizing traffic loss.
• Area authentication
Area authentication is configured in the OSPF area view and applies to packets received by all interfaces
in the OSPF area.
• Interface authentication
Interface authentication is configured in the interface view and applies to all packets received by the
interface.
• Non-authentication
Authentication is not required.
• Simple authentication
The authenticated party directly adds the configured password to packets for authentication. This
authentication mode provides the lowest password security.
• MD5 authentication
The authenticated party encrypts the configured password using a Message Digest 5 (MD5) algorithm
and adds the ciphertext password to packets for authentication. This authentication mode improves
password security. The supported MD5 algorithms include MD5 and HMAC-MD5.
For the sake of security, using the HMAC-SHA256 algorithm rather than the MD5 algorithm is
recommended.
• Keychain authentication
A keychain consists of multiple authentication keys, each of which contains an ID and a password. Each
key has a lifetime, and the keychain dynamically selects the active authentication key based on key
lifetimes. By dynamically changing keys and algorithms in this way, keychain authentication enhances
attack defense and improves OSPF security.
• HMAC-SHA256 authentication
A password is encrypted using the HMAC-SHA256 algorithm before it is added to the packet, which
improves password security.
OSPF carries authentication types in packet headers and authentication information in packet tails.
The authentication types include:
• 0: non-authentication
• 1: simple authentication
• 2: ciphertext authentication
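The difference between the two ciphertext modes can be illustrated with Python's standard hashlib and hmac modules. This is a simplified sketch of the digest computation only; the real protocol also covers key IDs, sequence numbers, and the key-preparation rules of RFC 5709, which are omitted here:

```python
import hashlib
import hmac

# Simplified digest computation for OSPF ciphertext authentication.
def auth_digest(packet, key, algorithm="hmac-sha256"):
    if algorithm == "md5":
        # Classic OSPF MD5: digest over the packet bytes followed by the key.
        return hashlib.md5(packet + key).digest()
    # HMAC-SHA256, the recommended stronger algorithm.
    return hmac.new(key, packet, hashlib.sha256).digest()

md5_digest = auth_digest(b"ospf-packet-bytes", b"secret", "md5")  # 16 bytes
sha_digest = auth_digest(b"ospf-packet-bytes", b"secret")         # 32 bytes
```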
Usage Scenario
Figure 1 OSPF authentication on a broadcast network
• The interface authentication configurations must be the same on all devices on the same network so
that OSPF neighbor relationships can be established.
• The area authentication configurations must be the same on all devices in the same area.
• Hello packet
• DD packet
• LSR packet
• LSU packet
• LSAck packet
Packet length 16 bits Length of the OSPF packet, including the packet header, in bytes.
Area ID 32 bits ID of the area to which the Router that sends the OSPF packet belongs.
Checksum 16 bits Checksum of the OSPF packet that does not carry the Authentication
field.
Authentication 64 bits This field has different meanings for different AuType values:
0: This field is not defined.
1: This field defines password information.
2: This field contains the key ID, MD5 authentication data length, and
sequence number.
NOTE:
MD5 authentication data is appended to an OSPF packet and is not included in the Authentication field.
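The header fields described above can be unpacked programmatically. Below is a sketch of parsing the 24-byte OSPFv2 header; the field layout follows the description above, and the sample packet bytes are hand-built for illustration:

```python
import struct

# Parse the 24-byte OSPFv2 packet header: Version (8 bits), Type (8 bits),
# Packet length (16 bits), Router ID (32 bits), Area ID (32 bits),
# Checksum (16 bits), AuType (16 bits), Authentication (64 bits).
def parse_ospf_header(data):
    version, ptype, length = struct.unpack_from("!BBH", data, 0)
    router_id, area_id = data[4:8], data[8:12]
    checksum, autype = struct.unpack_from("!HH", data, 12)
    return {
        "version": version, "type": ptype, "length": length,
        "router_id": ".".join(str(b) for b in router_id),
        "area_id": ".".join(str(b) for b in area_id),
        "checksum": checksum, "autype": autype,
        "auth": data[16:24],
    }

# Hand-built Hello header: version 2, type 1 (Hello), length 44,
# router ID 10.0.0.1, area 0.0.0.0, AuType 0 (non-authentication).
sample = struct.pack("!BBH4s4sHH8s", 2, 1, 44,
                     bytes([10, 0, 0, 1]), bytes(4), 0, 0, bytes(8))
header = parse_ospf_header(sample)
```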
Hello Packet
Hello packets are commonly used packets, which are periodically sent by OSPF interfaces to establish and
maintain neighbor relationships. A Hello packet includes information about the designated router (DR),
backup designated router (BDR), timers, and known neighbors. Figure 2 shows the format of a Hello packet.
Network Mask 32 bits Mask of the network on which the interface that sends the Hello packet
resides.
RouterDeadInterval 32 bits Dead interval. If a device does not receive any Hello packets from its
neighbors within a specified dead interval, the neighbors are considered
down.
Table 3 lists the address types, interval types, and default intervals used when Hello packets are transmitted
on different networks.
NBMA Unicast address HelloInterval is used by the DR, the BDR, and Routers that can become a DR;
PollInterval is used when neighbors become Down, and HelloInterval is used in other cases.
Defaults: 30 seconds for HelloInterval, 120 seconds for PollInterval.
Routers on the same network segment must have the same HelloInterval and RouterDeadInterval values. Otherwise,
they cannot establish neighbor relationships. In addition, on an NBMA network, the PollInterval values must be the same
at both ends.
DD Packet
During an adjacency initialization, two Routers use DD packets to describe their own link state databases
(LSDBs) for LSDB synchronization. A DD packet contains the header of each LSA in an LSDB. An LSA header
uniquely identifies an LSA. The LSA header occupies only a small portion of the LSA, which reduces the
amount of traffic transmitted between Routers. A neighbor can use the LSA header to check whether it
already has the LSA. When two Routers exchange DD packets, one functions as the master, and the other
functions as the slave. The master defines a start sequence number and increases the sequence number by
one each time it sends a DD packet. After the slave receives a DD packet, it uses the sequence number
carried in the DD packet for acknowledgment.
Figure 3 shows the format of a DD packet.
Interface MTU 16 bits Maximum size of an IP packet that an interface can send without
fragmenting the packet.
I (Init) 1 bit If the DD packet is the first among multiple consecutive DD packets sent
by a device, this field is set to 1. Otherwise, this field is set to 0.
M (More) 1 bit If the DD packet is the last among multiple consecutive DD packets sent
by a device, this field is set to 0. Otherwise, this field is set to 1.
M/S 1 bit When two OSPF devices exchange DD packets, they negotiate a
(Master/Slave) master/slave relationship. The device with a larger router ID becomes
the master. If this field is set to 1, the device that sends the DD packet is
the master.
DD sequence 32 bits Sequence number of the DD packet. The master and slave use the
number sequence number to ensure that DD packets are correctly transmitted.
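The master/slave negotiation described above can be sketched as a comparison of router IDs; the helper below is illustrative, not the product's implementation:

```python
# During DD exchange, the device with the larger router ID becomes the
# master; the slave then acknowledges each DD packet by echoing the
# master's sequence number.
def elect_master(router_id_a, router_id_b):
    def as_int(rid):
        a, b, c, d = (int(x) for x in rid.split("."))
        return (a << 24) | (b << 16) | (c << 8) | d
    return router_id_a if as_int(router_id_a) > as_int(router_id_b) else router_id_b

master = elect_master("10.0.0.2", "10.0.0.1")   # -> "10.0.0.2"
```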
LSR Packet
After two Routers exchange DD packets, they send LSR packets to request LSAs from each other. The LSR
packets contain the summaries of the requested LSAs. Figure 4 shows the format of an LSR packet.
Link State ID 32 bits This field together with the LS type field describes an LSA in an AS.
The LS type, Link State ID, and Advertising Router fields can uniquely identify an LSA. If the preceding fields of two LSAs
are the same, the device uses the LS sequence number, LS checksum, and LS age fields to determine which LSA is newer.
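The newness comparison mentioned in the note can be sketched as follows. This is a simplified version of the RFC 2328 rules; the MaxAge and MaxAgeDiff special cases are omitted:

```python
# Decide which of two LSA instances is newer: higher LS sequence number
# wins; on a tie, higher LS checksum wins; on a further tie, the smaller
# LS age is treated as newer.
def newer_lsa(a, b):
    """a, b: dicts with 'seq', 'checksum', and 'age' keys."""
    if a["seq"] != b["seq"]:
        return a if a["seq"] > b["seq"] else b
    if a["checksum"] != b["checksum"]:
        return a if a["checksum"] > b["checksum"] else b
    return a if a["age"] < b["age"] else b

lsa_old = {"seq": 0x80000001, "checksum": 0x3C4D, "age": 10}
lsa_new = {"seq": 0x80000002, "checksum": 0x1A2B, "age": 100}
winner = newer_lsa(lsa_old, lsa_new)   # lsa_new wins on sequence number
```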
LSU Packet
A Router uses an LSU packet to transmit LSAs requested by its neighbors or to flood its own updated LSAs.
The LSU packet contains a set of LSAs. On networks that support multicast and broadcast, LSU packets are
multicast to flood LSAs. To ensure reliable LSA flooding, a device uses an LSAck packet to acknowledge the
LSAs contained in an LSU packet that is received from a neighbor. If an LSA fails to be acknowledged, the
LSA is directly retransmitted to the neighbor. Figure 5 shows the format of an LSU packet.
LSAck Packet
A device uses an LSAck packet to acknowledge the headers of the LSAs contained in a received LSU packet.
LSAck packets are transmitted in unicast or multicast mode according to the link type. Figure 6 shows the
format of an LSAck packet.
• Router-LSAs (Type 1)
• Network-LSAs (Type 2)
• Network-summary-LSAs (Type 3)
• ASBR-summary-LSAs (Type 4)
• AS-external-LSAs (Type 5)
LS age 16 bits Time that elapses after an LSA is generated, in seconds. The value of this
field continually increases regardless of whether the LSA is transmitted
over a link or saved in an LSDB.
Type 5: AS-external-LSA
Type 7: NSSA-LSA
Link State ID 32 bits This field together with the LS type field describes an LSA in an AS.
LS sequence 32 bits Sequence number of the LSA. Neighbors can use this field to identify the
number latest LSA.
length 16 bits Length of the LSA including the LSA header, in bytes.
Router-LSA
A router-LSA describes the link status and cost of a Router. Router-LSAs are generated by a Router and
advertised within the area to which the Router belongs. Figure 2 shows the format of a router-LSA.
Link State ID 32 bits Router ID of the Router that generates the LSA.
V (Virtual Link) 1 bit If the Router that generates the LSA is located at one end of a virtual
link, this field is set to 1. In other cases, this field is set to 0.
E (External) 1 bit If the Router that generates the LSA is an autonomous system boundary
router (ASBR), this field is set to 1. In other cases, this field is set to 0.
B (Border) 1 bit If the Router that generates the LSA is an area border router (ABR), this
field is set to 1. In other cases, this field is set to 0.
# links 16 bits Number of links and interfaces described in the LSA, including all links
and interfaces in the area to which the Router belongs.
Link ID 32 bits Object to which the Router is connected. Its meanings are as follows:
1: router ID
2: interface IP address of the designated router (DR)
3: network segment or subnet number
4: router ID of the neighbor on a virtual link
Type 8 bits Type of the Router link. The values are as follows:
1: The Router is connected to another Router in point-to-point (P2P)
mode.
2: The Router is connected to a transit network.
3: The Router is connected to a stub network.
4: The Router is connected to another Router over a virtual link.
Network-LSA
A network-LSA describes the link status of all Routers on the local network segment. Network-LSAs are
generated by a DR on a broadcast or non-broadcast multiple access (NBMA) network and advertised within
the area to which the DR belongs. Figure 3 shows the format of a network-LSA.
Attached Router 32 bits Router IDs of all Routers on the broadcast or NBMA network, including
the router ID of the DR
Summary-LSA
A network-summary-LSA describes routes on a network segment in an area. The routes are advertised to
other areas.
An ASBR-summary-LSA describes routes to the ASBR in an area. The routes are advertised to all areas except
the area to which the ASBR belongs.
The two types of summary-LSAs have the same format and are generated by an ABR. Figure 4 shows the
format of a summary-LSA.
When a default route is advertised, both the Link State ID and Network Mask fields are set to 0.0.0.0.
AS-External-LSA
An AS-external-LSA describes AS external routes. AS-external-LSAs are generated by an ASBR. Among the
five types of LSAs, only AS-external-LSAs can be advertised to all areas except stub areas and not-so-stubby
areas (NSSAs). Figure 5 shows the format of an AS-external-LSA.
Forwarding 32 bits Packets destined for the advertised destination address are forwarded to
Address the address specified by this field.
External Route 32 bits Tag added to the external route. This field can be used to manage
Tag external routes. OSPF itself does not use this field.
When AS-external-LSAs are used to advertise default routes, both the Link State ID and Network Mask fields are set to
0.0.0.0.
Related Concepts
Redistribute ID
IS-IS uses a system ID as a redistribution identifier, OSPF and OSPFv3 use a router ID + process ID as a
redistribution identifier, and BGP uses a VrfID + random number as a redistribution identifier. For ease of
understanding, the redistribution identifiers of different protocols are all called Redistribute IDs. When routes
are distributed, the information carried in the routes contains Redistribute IDs.
Redistribute List
A Redistribute list may consist of multiple Redistribute IDs. Each Redistribute list of BGP contains a maximum
of four Redistribute IDs, and each Redistribute list of any other routing protocol contains a maximum of two
Redistribute IDs. When the number of Redistribute IDs exceeds the corresponding limit, the oldest ones are
discarded in the order in which they were added.
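This bounded, oldest-first-discard behavior maps naturally onto a fixed-length queue. A sketch (the Redistribute ID strings are illustrative):

```python
from collections import deque

# A Redistribute list keeps at most 4 Redistribute IDs for BGP and at
# most 2 for other routing protocols; once full, the oldest ID is
# discarded when a new one is added.
def make_redistribute_list(protocol):
    return deque(maxlen=4 if protocol == "bgp" else 2)

rl = make_redistribute_list("ospf")
for rid in ("ospf1-DeviceB", "ospf2-DeviceD", "ospf1-DeviceE"):
    rl.append(rid)
# With a limit of 2, the oldest ID ("ospf1-DeviceB") has been discarded.
```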
Take the route distributed by DeviceA as an example. A stable routing loop is formed through the following
process:
Phase 1
On the network shown in Figure 2, OSPF process 1 on DeviceA imports the static route 10.0.0.1 and floods a
Type 5 AS-External-LSA in OSPF process 1. After receiving the LSA, OSPF process 1 on DeviceD and OSPF
process 1 on DeviceE each calculate a route to 10.0.0.1, with the outbound interfaces being interface1 on
DeviceD and interface1 on DeviceE, respectively, and the cost being 102. At this point, the routes to 10.0.0.1
in OSPF process 1 in the routing tables of DeviceD and DeviceE are active.
Figure 2 Phase 1
Phase 2
In Figure 3, DeviceD and DeviceE are configured to import routes from OSPF process 1 to OSPF process 2.
No route-policy is configured for the import, or the configured route-policy is improper. For example, OSPF
process 2 on DeviceE imports routes from OSPF process 1 and then floods a Type 5 AS-External-LSA in OSPF
process 2. After receiving the LSA, OSPF process 2 on DeviceD calculates a route to 10.0.0.1, with the cost
being 2, which is smaller than that (102) of the route calculated by OSPF process 1. As a result, the active
route to 10.0.0.1 in the routing table of DeviceD is switched from the one calculated by OSPF process 1 to
the one calculated by OSPF process 2, and the outbound interface of the route is sub-interface2.1.
Figure 3 Phase 2
Phase 3
In Figure 4, DeviceD imports the route from OSPF process 2 to OSPF process 1 and floods a Type 5
AS-External-LSA in OSPF process 1. After receiving the LSA, OSPF process 1 on DeviceE recalculates the route to
10.0.0.1. The cost of the route becomes 2, which is smaller than that of the previously calculated route.
Therefore, the route to 10.0.0.1 in OSPF process 1 on DeviceE is changed to the route distributed by DeviceD,
and the outbound interface is interface 2.
Figure 4 Phase 3
Phase 4
After the route to 10.0.0.1 on DeviceE is updated, OSPF process 2 still imports the route from OSPF process 1
as the route remains active, and continues to distribute/update a Type 5 AS-External-LSA.
As a result, a stable routing loop is formed. Assuming that traffic is injected from DeviceF, Figure 5 shows
the traffic flow when the routing loop occurs.
2. DeviceD learns the route distributed by DeviceB through OSPF process 1 and imports the route from
OSPF process 1 to OSPF process 2. DeviceE learns the route distributed by DeviceD through OSPF
process 2 and saves the Redistribute List distributed by DeviceD through OSPF process 2 to the routing
table when calculating routes.
3. DeviceE imports the route from OSPF process 2 to OSPF process 1 and redistributes the route through
OSPF process 1. The corresponding Type 11 extended prefix Opaque LSA contains the Redistribute ID
of OSPF process 1 on DeviceE and the Redistribute ID of OSPF process 2 on DeviceD. The Redistribute
ID of OSPF process 1 on DeviceB has been discarded from the LSA.
4. OSPF process 1 on DeviceD learns the Redistribute list corresponding to the route distributed by
DeviceE and saves the Redistribute list in the routing table. When importing the route from OSPF
process 1 to OSPF process 2, DeviceD finds that the Redistribute list of the route contains its own
Redistribute ID, considers that a routing loop is detected, and reports an alarm. OSPF process 2 on
DeviceD distributes a large cost when redistributing the route so that other devices preferentially
select other paths after learning the route. This prevents routing loops.
When detecting a routing loop upon route import between processes of the same protocol, the device increases
the cost of the corresponding route. As the cost of the delivered route increases, the optimal route in the IP
routing table changes. In this way, the routing loop is eliminated.
In the case of inter-protocol route import, if a routing protocol with a higher preference detects a routing loop,
although this protocol increases the cost of the corresponding route, the cost increase will not render the route
inactive. As a result, the routing loop cannot be eliminated. If the routing protocol with a lower preference
increases the cost of the corresponding route, this route competes with the originally imported route during route
selection. In this case, the routing loop can be eliminated.
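The detection step common to all the cases above is the check performed at import time: if the importing process's own Redistribute ID is already in the route's Redistribute list, a loop is declared. A sketch (the "poison" cost value and ID strings are illustrative):

```python
LARGE_COST = 16777214  # illustrative large cost advertised after loop detection

# On import, a loop exists if this process's own Redistribute ID already
# appears in the route's Redistribute list; otherwise the ID is appended
# and the route is re-advertised normally.
def import_route(route, my_redistribute_id, advertise_cost):
    if my_redistribute_id in route["redistribute_list"]:
        return {"loop_detected": True, "cost": LARGE_COST}
    route["redistribute_list"].append(my_redistribute_id)
    return {"loop_detected": False, "cost": advertise_cost}

route = {"prefix": "10.0.0.1/32", "redistribute_list": ["ospf1-DeviceD"]}
first = import_route(route, "ospf2-DeviceE", 2)    # imported normally
second = import_route(route, "ospf2-DeviceE", 2)   # own ID found -> loop
```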
Figure 7 Traffic flow when a routing loop occurs during route import between OSPF and IS-IS
1. DeviceD learns the route distributed by DeviceB through OSPF process 1 and imports the route from
OSPF process 1 to IS-IS process 2. When IS-IS process 2 on DeviceD distributes route information, it
uses the extended prefix sub-TLV to distribute the Redistribute ID of IS-IS process 2 through an LSP.
IS-IS process 2 on DeviceE learns the route distributed by DeviceD and saves the Redistribute ID
distributed by IS-IS process 2 on DeviceD to the routing table during route calculation.
2. DeviceE imports the route from IS-IS process 2 to OSPF process 1 and uses an E-AS-External-LSA to
distribute the Redistribute ID of OSPF process 1 on DeviceE when distributing route information.
Similarly, after OSPF process 1 on DeviceD learns the route from DeviceE, DeviceD saves the
Redistribute ID distributed by OSPF process 1 on DeviceE to the routing table during route calculation.
3. When importing the route from OSPF process 1 to IS-IS process 2, DeviceD finds that the Redistribute
list of the route contains its own Redistribute ID, considers that a routing loop is detected, and reports
an alarm. IS-IS process 2 on DeviceD distributes a large cost when distributing the imported route.
Because IS-IS has a higher preference than OSPF ASE, this does not affect the route selection result or
resolve the routing loop.
4. DeviceE imports the route from IS-IS process 2 to OSPF process 1, finds that the Redistribute list of the
route contains its own Redistribute ID, considers that a routing loop is detected, and reports an alarm.
OSPF process 1 on DeviceE distributes a large cost when distributing the imported route so that other
devices preferentially select other paths after learning the route. This prevents routing loops.
Figure 8 Traffic flow when a routing loop occurs during route import between OSPF and BGP
1. DeviceD learns the route distributed by DeviceB through BGP and imports the BGP route to OSPF
process 2. When DeviceD distributes the imported route through OSPF process 2, it uses a Type 11
extended prefix Opaque LSA to distribute the Redistribute ID of OSPF process 2 on DeviceD. DeviceE
learns the route distributed by DeviceD through OSPF process 2 and saves the Redistribute List
distributed by DeviceD through OSPF process 2 to the routing table when calculating routes.
2. DeviceE imports the route from OSPF process 2 to BGP and distributes the Redistribute ID of the BGP
process on DeviceE through a Type 11 extended prefix Opaque LSA when redistributing the imported
route. After BGP on DeviceD learns the route distributed by DeviceE, DeviceD saves the Redistribute ID
distributed by BGP on DeviceE to the routing table during route calculation.
3. When importing the route from BGP to OSPF process 2, DeviceD finds that the Redistribute list of the
route contains its own Redistribute ID, considers that a routing loop is detected, and reports an alarm.
OSPF process 2 on DeviceD distributes a large link cost when distributing the imported route. Because
OSPF has a higher preference than BGP, this does not affect the route selection result or resolve the
routing loop.
4. After learning the route distributed by OSPF on DeviceD, DeviceE imports the route to BGP. Upon
finding that the Redistribute list of the route contains its own Redistribute ID, DeviceE considers that a
routing loop is detected and reports an alarm. When BGP on DeviceE distributes the route, it reduces
the preference of the route. In this way, other devices preferentially select other paths after learning
this route, preventing routing loops.
When detecting a routing loop upon route import between processes of the same protocol, the device increases
the cost of the corresponding route. As the cost of the delivered route increases, the optimal route in the IP
routing table changes. In this way, the routing loop is eliminated.
In the case of inter-protocol route import, if a routing protocol with a higher preference detects a routing loop,
although this protocol increases the cost of the corresponding route, the cost increase will not render the route
inactive. As a result, the routing loop cannot be eliminated. If the routing protocol with a lower preference
increases the cost of the corresponding route, this route competes with the originally imported route during route
selection. In this case, the routing loop can be eliminated.
Application Scenarios
Figure 9 shows a typical seamless MPLS network. If the OSPF process deployed at the access layer differs
from that deployed at the aggregation layer, OSPF inter-process mutual route import is usually configured
on AGGs so that routes can be leaked between the access and aggregation layers. In this case, a routing
loop may occur between AGG1 and AGG2. If OSPF routing loop detection is configured on AGG1 and AGG2,
routing loops can be quickly detected and resolved.
Definition
Open Shortest Path First (OSPF) is a link-state Interior Gateway Protocol (IGP) developed by the Internet
Engineering Task Force (IETF).
OSPF version 2 (OSPFv2) is intended for IPv4, and OSPF version 3 (OSPFv3) is intended for IPv6.
Purpose
OSPFv3 is an extension of OSPF for support of IPv6.
• OSPFv3 and OSPFv2 are the same in terms of the working principles of the Hello packet, state machine,
link-state database (LSDB), flooding, and route calculation.
• OSPFv3 packets are encapsulated into IPv6 packets and can be transmitted in unicast or multicast
mode.
Hello packet Hello packets are sent periodically to discover and maintain
OSPFv3 neighbor relationships.
Database Description (DD) packet Such packets contain the summary of the local LSDB and are
used for LSDB synchronization between two devices.
Link State Request (LSR) packet LSR packets are sent to the neighbor to request the required
LSAs.
An OSPFv3 device sends LSR packets to its neighbor only after
they exchange DD packets.
Link State Update (LSU) packet LSU packets carry the LSAs required by neighbors.
Link State Acknowledgment (LSAck) packet LSAck packets acknowledge the receipt of an LSA.
LSA Types
OSPFv3 encapsulates routing information into LSAs for transmission. Table 1 describes LSAs and their
functions.
Router-LSA (Type 1) Describes the link status and link cost of a device. A
router-LSA is generated by the device for each area in
which it has an OSPFv3 interface and is advertised
within that area.
Network-LSA (Type 2) Describes the link status of all routers on the local
network segment. Network-LSAs are generated by a
designated router (DR) and advertised in the area to
which the DR belongs.
Router Types
Figure 1 Router types
Internal router All interfaces on an internal router belong to the same OSPFv3 area.
Area border router (ABR) An ABR belongs to two or more areas, one of which must be the
backbone area.
An ABR is used to connect the backbone area and non-backbone
areas. It can be physically or logically connected to the backbone
area.
Backbone router At least one interface on a backbone router belongs to the backbone
area.
Internal routers in Area 0 and all ABRs are backbone routers.
AS boundary router (ASBR) An ASBR exchanges routing information with other ASs.
An ASBR does not necessarily reside on the border of an AS. It can be
an internal router or an ABR. An OSPFv3 device that has imported
external routing information will become an ASBR.
Type 1 external route Such routes offer higher reliability. Their costs are comparable
with the costs of AS internal routes and can be directly compared
with the costs of routes generated by OSPFv3.
Cost of a Type 1 external route = Cost of the route from a local
router to an ASBR + Cost of the route from the ASBR to the
destination of the Type 1 external route
Type 2 external route Such routes have low reliability. Therefore, OSPFv3 considers that the
cost of the route from an ASBR to the destination of a Type 2 external
route is much greater than the cost of any internal route to the ASBR.
Cost of a Type 2 external route = Cost of the route from the ASBR to
the destination of the Type 2 external route
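The Type 1 external route cost formula above is a simple sum; a short worked example (values are illustrative):

```python
# Type 1 external route cost = cost(local router -> ASBR)
#                            + cost(ASBR -> external destination).
def type1_external_cost(cost_to_asbr, cost_asbr_to_destination):
    return cost_to_asbr + cost_asbr_to_destination

cost = type1_external_cost(10, 5)   # 10 to reach the ASBR, 5 beyond it -> 15
```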
Area
When a large number of Routers run OSPFv3, LSDBs become very large and require a large amount of
storage space. Large LSDBs also complicate shortest path first (SPF) computation and may overload Routers.
As the network scale expands, the probability of network topology changes increases, which causes the
network to continuously change. In such cases, large numbers of OSPFv3 packets are transmitted on the
network, leading to a decrease in bandwidth utilization efficiency. Each change in the network topology
causes all Routers on the network to recalculate routes.
OSPFv3 resolves this problem by partitioning an AS into different areas. An area is regarded as a logical
group, and each group is identified by an area ID. A Router, not a link, resides at the border of an area. A
network segment or link can belong only to one area. An area must be specified for each OSPFv3 interface.
OSPFv3 areas include common areas, stub areas, and NSSAs. Table 4 describes these in more detail.
Common area OSPFv3 areas are common areas by default. Common areas include standard areas and
backbone areas. A standard area transmits intra-area, inter-area, and AS external routes.
In the backbone area, all devices must be connected, and all non-backbone areas must be
connected to the backbone area.
Stub area A stub area is a non-backbone area with only one ABR and generally resides at the border
of an AS. The ABR in a stub area does not transmit received AS external routes, which
significantly decreases the number of entries in the routing table on the ABR and the
amount of routing information to be transmitted. To ensure the reachability of AS
external routes, the ABR in the stub area generates a default route and advertises it to
non-ABRs in the stub area.
The backbone area cannot be configured as a stub area. An ASBR cannot exist in a stub
area. Therefore, AS external routes cannot be advertised within the stub area.
NSSA An NSSA is similar to a stub area. An NSSA does not advertise Type 5 LSAs but can import
AS external routes. ASBRs in an NSSA generate Type 7 LSAs to carry the information about
the AS external routes. The Type 7 LSAs are advertised only within the NSSA. When the
Type 7 LSAs reach an ABR in the NSSA, the ABR translates the Type 7 LSAs into Type 5
LSAs and floods them to the entire AS. A totally NSSA allows only intra-area routes to be
advertised within the area.
ABRs in an NSSA advertise Type 3 LSAs carrying a default route within the NSSA. All
inter-area routes are advertised by ABRs.
Non-broadcast Multiple Access (NBMA) OSPFv3 considers networks with X.25 as the link layer protocol as
NBMA networks by default.
On an NBMA network, protocol packets, such as Hello, DD, LSR, LSU, and LSAck packets,
are sent in unicast mode.
Point-to-Multipoint (P2MP) No network is a P2MP network by default, no matter what type of link layer
protocol is used on the network. A non-fully meshed NBMA network can be changed to a
P2MP network.
Point-to-point (P2P) If the link layer protocol is PPP, HDLC, or LAPB, OSPFv3 defaults the
network type to P2P.
On a P2P network, protocol packets, such as Hello packets, DD packets, LSR
packets, LSU packets, and LSAck packets are sent in multicast mode using
the multicast address FF02::5.
Stub Area
Stub areas are specific areas where ABRs do not flood received AS external routes. In stub areas, routers
maintain fewer routing entries and less routing information than the routers in other areas.
Configuring a stub area is optional. Not every area can be configured as a stub area, because a stub area is
usually a non-backbone area with only one ABR and is located at the AS border.
To ensure the reachability of the routes to destinations outside an AS, the ABR in the stub area generates a
default route and advertises the route to the non-ABRs in the same stub area.
Note the following points when configuring a stub area:
• If an area needs to be configured as a stub area, all the devices in the area must be configured as stub
devices using the stub command.
• No ASBRs are allowed in the area to be configured as a stub area because AS external routes cannot be
transmitted in the stub area.
NSSA
Stub areas cannot import or transmit external routes, which prevents a large number of external routes from
consuming the bandwidth and storage resources of routers in the stub areas. If you need to import external
routes to an area and prevent these routes from consuming resources, configure the area as a not-so-stubby
area (NSSA).
Derived from stub areas, NSSAs resemble stub areas in many ways. Different from stub areas, NSSAs can
import AS external routes and advertise them within the entire OSPFv3 AS, without learning external routes
from other areas.
To advertise the external routes imported by an NSSA to other areas on the OSPFv3 network, a translator
must translate Type 7 LSAs into Type 5 LSAs.
• The propagate bit (P-bit) is used to notify a translator whether Type 7 LSAs need to be translated.
• By default, the translator is the ABR with the largest router ID in the NSSA.
OSPFv3 Multi-process
OSPFv3 supports multi-process. Multiple OSPFv3 processes can independently run on the same router. Route
exchange between different OSPFv3 processes is similar to that between different routing protocols.
• LSDB
• Flooding mechanism
• Five types of packets: Hello, DD, LSR, LSU, and LSAck packets
• Route calculation
• OSPFv3 runs over IPv6, which is based on links rather than network segments.
Therefore, the interfaces on which OSPFv3 is to be configured must be on the same link rather than in
the same network segment. In addition, the interfaces can establish OSPFv3 sessions without IPv6
global addresses.
■ OSPFv3 router LSAs and network LSAs do not contain IP addresses, which are advertised through
link LSAs and intra-area prefix LSAs.
■ In OSPFv3, router IDs, area IDs, and LSA link state IDs no longer indicate IP addresses, but the IPv4
address format is still reserved.
■ Neighbors are identified by router IDs instead of IP addresses on broadcast, NBMA, or P2MP
networks.
■ OSPFv3 can store or flood unrecognized LSAs, whereas OSPFv2 discards unrecognized LSAs.
■ In OSPFv3, unknown LSAs with 1 as the U flag bit can be flooded, and the flooding scope of such
LSAs is specified by the LSAs.
For example, DeviceA and DeviceB can identify LSAs of a certain type. DeviceA and DeviceB are
connected through DeviceC, which, however, cannot identify these LSAs. If DeviceA floods such LSAs to
DeviceC, DeviceC can still flood the received LSAs to DeviceB even though DeviceC cannot identify them.
DeviceB then processes these LSAs.
If OSPFv2 is run, DeviceC discards the unidentified LSAs. As a result, these LSAs cannot reach DeviceB.
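The handling of unrecognized LSAs described above can be sketched as follows. This is an illustrative model that assumes, per the OSPFv3 LSA format, that the U bit is the high-order bit of the 16-bit LS type field:

```python
# Sketch: how an OSPFv3 router may treat an LSA of an unrecognized
# type, based on the U bit of the LS type field.

def handle_unknown_lsa(ls_type: int) -> str:
    u_bit = (ls_type >> 15) & 1  # high-order bit of the 16-bit LS type
    if u_bit == 1:
        # Store and flood the LSA as if it were understood, using the
        # flooding scope encoded in the LSA itself.
        return "store-and-flood-per-lsa-scope"
    # U = 0: treat the LSA as having link-local flooding scope.
    return "store-and-flood-link-local"
```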
■ The packets flooded on a link are not transmitted to other links, which prevents unnecessary
flooding and saves bandwidth.
■ Link LSA: A device floods a link LSA on the link where it resides to advertise its link-local address
and the configured global IPv6 address.
■ Intra-area prefix LSA: A device advertises an intra-area prefix LSA in the local OSPF area to inform
the other routers in the area or the network (either a broadcast network or an NBMA network) of
its IPv6 global address.
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults between
forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two systems. The path
can be a physical link, a logical link, or a tunnel.
In BFD for OSPFv3, a BFD session is associated with OSPFv3. The BFD session quickly detects a link fault and
then notifies OSPFv3 of the fault, which speeds up OSPFv3's response to network topology changes.
Purpose
A link fault or a topology change causes devices to recalculate routes. Therefore, it is important to shorten
the convergence time of routing protocols to improve network performance.
As link faults are inevitable, rapidly detecting these faults and notifying routing protocols is an effective way
to quickly resolve such issues. If BFD is associated with the routing protocol and a link fault occurs, BFD can
speed up the convergence of the routing protocol.
Principles
Figure 1 BFD for OSPFv3
Figure 1 shows a typical network topology with BFD for OSPFv3 configured. The principles of BFD for
OSPFv3 are described as follows:
1. OSPFv3 neighbor relationships are established between the devices.
2. After the neighbor relationship between DeviceA and DeviceB reaches the Full state, a BFD session is
established between them.
3. The outbound interface of the route from DeviceA to DeviceB is interface 1. If the link between
DeviceA and DeviceB fails, BFD detects the fault and notifies DeviceA of the fault.
4. DeviceA processes the neighbor Down event and recalculates the route. The new outbound interface
of the route is interface 2. Packets from DeviceA pass through DeviceC to reach DeviceB.
A higher convergence priority can be configured for routes over which key services are transmitted so that
these routes can converge first, which minimizes the impact on key services.
Background
As networks develop, Voice over Internet Protocol (VoIP) and online video services pose higher requirements
for real-time transmission. Nevertheless, if a primary link fails, OSPFv3-enabled devices need to perform
multiple operations, including detecting the fault, updating the link-state advertisement (LSA), flooding the
LSA, calculating routes, and delivering forward information base (FIB) entries before switching traffic to a
new link. This process takes a much longer time than the minimum delay to which users are sensitive. As a
result, the requirements for real-time transmission cannot be met.
Principles
OSPFv3 IP FRR refers to a mechanism in which a device uses the loop-free alternate (LFA) algorithm to
precompute the next hop of a backup route, and stores the primary and backup routes to the same
destination address but with different next hops in the forwarding table. If the primary link fails, the device
switches traffic to the backup link before route convergence is complete on the control plane. This
mechanism minimizes the length of traffic interruptions and protects services. The NE40E supports OSPFv3
IP FRR.
A device uses the shortest path first (SPF) algorithm to calculate the shortest path from each neighbor that
can provide a backup link to the destination node. The device then uses the inequalities defined in standard
protocols and the LFA algorithm to calculate the next hop of the loop-free backup link that has the smallest
cost of the available shortest paths.
An OSPFv3 IP FRR policy is used to filter alternate next hops. Only the alternate next hops that match the
filtering rules of the policy can be added to the IP routing table. Users can configure a desired OSPFv3 IP
FRR to filter alternate next hops.
If a Bidirectional Forwarding Detection (BFD) session is bound to OSPFv3 IP FRR, the BFD session goes down
if BFD detects a link fault. If the BFD session goes down, OSPFv3 IP FRR is triggered on the interface to
switch traffic from the faulty link to the backup link, which minimizes the loss of traffic.
Usage Scenario
OSPFv3 IP FRR guarantees protection against either a link failure or a node-and-link failure. Distance_opt (X,
Y) indicates the shortest path from node X to node Y.
• Link protection: Link protection takes effect when the traffic to be protected flows along a specified
link and the link costs meet the inequality: Distance_opt (N, D) < Distance_opt (N, S) + Distance_opt (S,
D).
■ S: source node
■ N: node on the backup link
■ D: destination node
On the network shown in Figure 1, traffic is forwarded from DeviceS to DeviceD. The primary link is
DeviceS -> DeviceE -> DeviceD, and the backup link is DeviceS -> DeviceN -> DeviceE -> DeviceD. The
link costs satisfy the link protection inequality. If the primary link fails, DeviceS switches the traffic to
the backup link, minimizing the traffic interruption duration.
• Link-and-node protection: Node-and-link protection takes effect when the traffic to be protected flows
along a specified link and node. Figure 2 shows the networking for link-and-node protection. The link-
and-node protection takes precedence over the link protection.
Link-and-node protection must satisfy the following conditions:
■ The link cost must satisfy the inequality: Distance_opt (N, D) < Distance_opt (N, S) + Distance_opt
(S, D).
■ The interface cost must satisfy the inequality: Distance_opt (N, D) < Distance_opt (N, E) +
Distance_opt (E, D).
S indicates the source node of traffic, E indicates the faulty node, N indicates the node on the backup
link, and D indicates the destination node of traffic.
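The two protection inequalities above can be sketched as follows, assuming a helper distance_opt(x, y) that returns the shortest-path cost from x to y (for example, precomputed with the SPF algorithm):

```python
# Sketch of the LFA protection inequalities quoted above.
# distance_opt(x, y) is assumed to return the cost of the shortest
# path from node x to node y.

def is_link_protecting(distance_opt, n, s, d) -> bool:
    # Link protection: Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D)
    return distance_opt(n, d) < distance_opt(n, s) + distance_opt(s, d)

def is_node_protecting(distance_opt, n, s, e, d) -> bool:
    # Link-and-node protection additionally avoids the protected node E:
    # Distance_opt(N, D) < Distance_opt(N, E) + Distance_opt(E, D)
    return (is_link_protecting(distance_opt, n, s, d)
            and distance_opt(n, d) < distance_opt(n, e) + distance_opt(e, d))
```

For example, if N reaches D only through E, the second inequality fails, so N provides link protection but not node protection.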
OSPFv3 FRR in the Scenario Where Multiple Nodes Advertise the Same
Route
OSPFv3 IP FRR uses the SPF algorithm to calculate the shortest path from each neighbor (root node) that
provides a backup link to the destination node and store the node-based backup next hop, which applies to
single-node routing scenarios. As networks are increasingly diversified, two ABRs or ASBRs are deployed to
improve network reliability. In this case, OSPFv3 FRR in a scenario where multiple nodes advertise the same
route is needed.
In a scenario where multiple nodes advertise the same route, OSPFv3 FRR is implemented by calculating the Type 3 LSAs
advertised by ABRs of an area, and applies to intra-area, inter-area, and ASE routes. Inter-area routing is used as an example to
describe how OSPFv3 FRR works in a scenario where multiple nodes advertise the same route.
Figure 3 OSPFv3 FRR in the scenario where multiple nodes advertise the same route
In Figure 3, Device B and Device C function as ABRs to forward routes between area 0 and area 1. Device E
advertises an intra-area route. Upon receipt of the route, Device B and Device C translate it into a Type 3
LSA and flood the LSA to area 0. After OSPFv3 FRR is enabled on Device A, Device A considers both Device B
and Device C as its neighbors. Without a fixed neighbor as the root node, Device A fails to calculate the FRR
backup next hop. To address this problem, a virtual node is simulated between Device B and Device C and
used as the root node of Device A, and Device A uses the LFA algorithm to calculate the backup next hop.
This solution converts multi-node routing into single-node routing.
For example, both Device B and Device C advertise the route 2001:DB8:1::1/64, and OSPFv3 FRR is enabled
on Device A. After Device A receives the route, it fails to calculate a backup next hop for the route due to a
lack of a fixed root node. To address this problem, a virtual node is simulated between Device B and Device
C and used as the root node of Device A. The virtual node forms a link with each of Device B and Device C. If
the virtual node advertises a 2001:DB8:1::1/64 route, it will use the smaller cost of the routes advertised by
Device B and Device C as the cost of the route. If the cost of the route advertised by Device B is 5 and that of
the route advertised by Device C is 10, the cost of the route advertised by the virtual node is 5. The cost of
the link from Device B to the virtual node is 0, and that of the link from Device C to the virtual node is 5.
The costs of the links from the virtual node to Device B and Device C are both 65535, the maximum value.
Device A is configured to consider Device B and Device C as invalid sources of the 2001:DB8:1::1/64 route
and use the LFA algorithm to calculate the backup next hop for the route, with the virtual node as the root
node.
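The virtual-node cost assignment in this example can be sketched as follows. This is a simplified illustrative model, not the NE40E implementation:

```python
# Sketch of the virtual-node cost assignment described above.

MAX_COST = 65535  # links from the virtual node back to the ABRs

def virtual_node_costs(cost_b: int, cost_c: int):
    # The virtual node advertises the route with the smaller of the
    # two ABRs' route costs.
    route_cost = min(cost_b, cost_c)
    # Each ABR's link cost toward the virtual node is its own route
    # cost minus the virtual node's route cost.
    link_b_to_virtual = cost_b - route_cost
    link_c_to_virtual = cost_c - route_cost
    # Links from the virtual node to the ABRs use the maximum cost so
    # that traffic never transits the virtual node.
    return route_cost, link_b_to_virtual, link_c_to_virtual, MAX_COST
```

With the costs from the example (5 from Device B, 10 from Device C), this yields a route cost of 5, link costs of 0 and 5 toward the virtual node, and 65535 in the reverse direction.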
10.7.2.6 OSPFv3 GR
Graceful restart (GR) is a technology used to ensure proper traffic forwarding, especially the forwarding of
key services, during the restart of routing protocols.
Without GR, a master/slave main control board switchover leads to transient service interruption and
route flapping on the whole network. Such route flapping and
service interruption are unacceptable on large-scale networks.
GR is one of the high availability (HA) technologies which comprise a series of comprehensive technologies,
such as fault-tolerant redundancy, link protection, faulty node recovery, and traffic engineering technologies.
As a fault-tolerant redundancy technology, GR is widely used to ensure non-stop forwarding of key data
during master/slave main control board switchovers and system upgrades.
In GR mode, the forwarding plane continues data forwarding during a restart, and operations on the control
plane, such as re-establishment of neighbor relationships and route calculation, do not affect the forwarding
plane, preventing service interruptions caused by route flapping and improving network reliability.
Table 1 Comparison between master/slave main control board switchovers with and without GR
Master/Slave Main Control Board Switchovers Without GR:
• OSPFv3 neighbor relationships are reestablished.
• Routes are recalculated.
• FIB entries change.
• The entire network detects route changes, and route flapping occurs for a short period of time.
• Packets are lost during forwarding, and services are interrupted.
Master/Slave Main Control Board Switchovers with GR:
• OSPFv3 neighbor relationships are reestablished.
• Routes are recalculated.
• FIB entries remain unchanged.
• Except the neighbors of the router on which a master/slave main control board switchover occurs,
other routers do not detect route changes.
• No packets are lost during forwarding, and services are not affected.
Definition
As an extension to OSPFv3, OSPFv3 VPN multi-instance enables Provider Edges (PEs) and Customer Edges
(CEs) in VPN networks to run OSPFv3 for interworking and use OSPFv3 to learn and advertise routes.
Purpose
As a widely used IGP, OSPFv3 in most cases also runs in VPNs. If OSPFv3 runs between PEs and CEs, and PEs use
OSPFv3 to advertise VPN routes to CEs, no other routing protocols need to be configured on CEs for
interworking with PEs, which simplifies management and configuration of CEs.
• OSPFv3 is used in a site to learn routes. Running OSPFv3 between PEs and CEs can reduce the number
of protocol types supported by CEs.
• Similarly, running OSPFv3 both in a site and between PEs and CEs simplifies the work of network
administrators and reduces the number of protocols that network administrators must be familiar with.
• When a backbone network that originally runs OSPFv3 without VPN begins to use BGP/MPLS VPN,
running OSPFv3 between PEs and CEs facilitates the transition.
In Figure 1, CE1, CE3, and CE4 belong to VPN 1, and the numbers following OSPFv3 indicate the process IDs
of the multiple OSPFv3 instances running on PEs.
1. PE1 imports OSPFv3 routes of CE1 into BGP and forms BGP VPNv6 routes.
2. PE1 advertises the BGP VPNv6 routes to PE2 through the backbone network.
3. PE2 imports the BGP VPNv6 routes into OSPFv3 and then advertises these routes to CE3 and CE4.
The process of advertising routes of CE4 or CE3 to CE1 is the same as the preceding process.
OSPFv3 Domain ID
If inter-area routes are advertised between local and remote OSPFv3 areas, these areas are considered to be
in the same OSPFv3 domain.
• Each OSPFv3 domain has one or more domain IDs. If more than one domain ID is available, one of the
domain IDs is a primary ID, and the others are secondary IDs.
• If an OSPFv3 instance does not have specific domain IDs, its domain ID is considered null.
Before advertising the remote routes sent by BGP to CEs, PEs need to determine the type of OSPFv3 routes
(Type 3 or Type 5) to be advertised to CEs based on the domain IDs, as described in Table 1.
• If local domain IDs are the same as or compatible with remote domain IDs in BGP routes, PEs advertise
Type 3 routes.
• If local domain IDs are different from or incompatible with remote domain IDs in BGP routes, PEs
advertise Type 5 routes.
Table 1 Relationship between local and remote domain IDs and the type of the generated routes
• The remote domain ID equals the local primary domain ID or one of the local secondary domain IDs:
inter-area routes are generated.
• The remote domain ID is different from the local primary domain ID and all of the local secondary
domain IDs: if the local area is a non-NSSA, external routes are generated; if the local area is an NSSA,
NSSA routes are generated.
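The decision described in the table can be sketched as follows (illustrative Python; the parameter names are assumptions):

```python
# Sketch: which OSPFv3 route type a PE generates from a received BGP
# VPN route, based on domain ID comparison (per the table above).

def generated_route_type(local_primary, local_secondaries,
                         remote_domain_id, local_area_is_nssa: bool) -> str:
    if (remote_domain_id == local_primary
            or remote_domain_id in local_secondaries):
        # Matching or compatible domain IDs: advertise inter-area routes.
        return "inter-area (Type 3)"
    # Incompatible domain IDs: advertise external or NSSA routes.
    return "NSSA (Type 7)" if local_area_is_nssa else "external (Type 5)"
```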
In Figure 2, on PE1, OSPFv3 imports a BGP route destined for 2001:db8:1::1/64 and then generates and
advertises a Type 5 or Type 7 LSA to CE1. Then, CE1 learns an OSPFv3 route with 2001:db8:1::1/64 as the
destination address and PE1 as the next hop and advertises the route to PE2. Therefore, PE2 learns an
OSPFv3 route with 2001:db8:1::1/64 as the destination address and CE1 as the next hop.
Similarly, CE1 also learns an OSPFv3 route with 2001:db8:1::1/64 as the destination address and PE2 as the
next hop, and PE1 learns an OSPFv3 route with 2001:db8:1::1/64 as the destination address and CE1 as the
next hop.
As a result, CE1 has two equal-cost routes with PE1 and PE2 as next hops respectively, and the next hops of
the routes from PE1 and PE2 to 2001:db8:1::1/64 are CE1, which leads to a routing loop.
In addition, the priority of an OSPFv3 route is higher than that of a BGP route. Therefore, on PE1 and PE2,
BGP routes to 2001:db8:1::1/64 are replaced with the OSPFv3 route, and the OSPFv3 route with
2001:db8:1::1/64 as the destination address and CE1 as the next hop is active in the routing tables of PE1
and PE2.
The BGP route is inactive, and therefore, the LSA generated when this route is imported by OSPFv3 is
deleted, which causes the OSPFv3 route to be withdrawn. As a result, no OSPFv3 route exists in the routing
table, and the BGP route becomes active again. This cycle causes route flapping.
OSPFv3 VPN provides a few solutions to routing loops, as described in Table 2.
Table 2 Solutions to routing loops
DN-bit: A flag bit used by OSPFv3 multi-instance processes to prevent routing loops. When advertising the
generated Type 3, Type 5, or Type 7 LSAs to CEs, PEs set the DN-bit of these LSAs to 1. PEs retain the DN-bit
(0) of other LSAs. When calculating routes, the OSPFv3 multi-instance process of a PE ignores LSAs whose
DN-bit is 1, which prevents the PE from learning back the routes it advertised.
VPN route tag: The VPN route tag is carried in Type 5 or Type 7 LSAs generated by PEs based on the
received BGP VPN routes. It is not carried in BGP extended community attributes and is valid only on the
PEs that receive BGP routes and generate OSPFv3 LSAs. When a PE detects that the VPN route tag in an
incoming LSA is the same as that in the local LSA, the PE ignores this LSA, which prevents routing loops.
Default route: A route whose destination IP address and mask are both 0. PEs do not calculate default
routes. Default routes are used to forward the traffic from CEs or the sites where CEs reside to the VPN
backbone network.
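The DN-bit behavior described above can be sketched as follows (a simplified model; the dictionary fields are assumptions, not the actual LSA encoding):

```python
# Sketch of DN-bit handling on a PE, per the description above.

def advertise_to_ce(lsa: dict) -> dict:
    # PEs set DN-bit = 1 on the Type 3, 5, or 7 LSAs they generate
    # toward CEs; other LSAs keep DN-bit 0.
    if lsa["type"] in (3, 5, 7):
        return dict(lsa, dn_bit=1)
    return lsa

def usable_for_route_calc(lsa: dict) -> bool:
    # A PE's OSPFv3 multi-instance process ignores LSAs with DN-bit 1,
    # so routes it originally advertised are not re-imported.
    return lsa.get("dn_bit", 0) == 0
```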
Multi-VPN-Instance CE
OSPFv3 multi-instance generally runs on PEs. Devices that run OSPFv3 multi-instance within user LANs are
called Multi-VPN-Instance CEs (MCEs).
Compared with OSPFv3 multi-instance running on PEs, MCEs have the following characteristics:
• MCEs establish one OSPFv3 instance for each service. Different virtual CEs transmit different services,
which ensures LAN security at a low cost.
• MCEs implement different OSPFv3 instances on a CE. The key to implementing MCEs is to disable loop
detection and calculate routes directly. MCEs also use the received LSAs with the DN-bit 1 for route
calculation.
If DeviceC fails, traffic is switched to DeviceB after rerouting. Packets are lost when DeviceC recovers.
Because OSPFv3 route convergence is faster than BGP route convergence, OSPFv3 convergence is complete
whereas BGP route convergence is still going on when DeviceC recovers. The next hop of the route from
DeviceA to DeviceD is DeviceC, which, however, does not know the route to DeviceD since BGP convergence
on DeviceC is not complete.
Therefore, DeviceC discards the packets destined for DeviceD after receiving them from DeviceA, as shown in
Figure 2.
• Data integrity: Received packets are authenticated to check whether they have been modified.
• Data authentication: The data source is authenticated to ensure that the data is sent from a real sender.
• Anti-replay: The attacks from malicious users who repeatedly send obtained data packets are prevented.
Specifically, the receiver rejects old or repeated data packets.
IPsec adopts two security protocols: Authentication Header (AH) and Encapsulating Security Payload
(ESP):
• AH: A protocol that provides data origin authentication, data integrity check, and anti-replay protection.
AH does not encrypt packets to be protected.
AH authentication covers the following IP header fields:
■ IP version
■ Header length
■ Packet length
■ Identification
■ Protocol
■ Options
• ESP: A protocol that provides IP packet encryption and authentication mechanisms besides the functions
provided by AH. The encryption and authentication mechanisms can be used together or independently.
Prior to the OSPFv3 Authentication Trailer, OSPFv3 can use only IPsec for authentication. However, on some
special networks, a mobile ad hoc network (MANET) for example, IPsec is difficult to deploy and maintain.
To address this problem, standards introduced the Authentication Trailer for OSPFv3, which provides
another approach for OSPFv3 to implement authentication.
In OSPFv3 authentication, an authentication field is added to each OSPFv3 packet for encryption. When a
local device receives an OSPFv3 packet from a remote device, the local device discards the packet if the
authentication password carried in the packet is different from the local one, which protects the local device
against potential attacks. Therefore, OSPFv3 authentication improves network security.
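The discard behavior described above can be sketched as follows. This is an illustrative HMAC-based check only; the actual OSPFv3 Authentication Trailer format and key management are defined by standards and are more involved:

```python
# Illustrative sketch: a receiver recomputes a digest over the packet
# and discards the packet if the carried authentication data differs.

import hashlib
import hmac

def accept_packet(key: bytes, packet_body: bytes, received_digest: bytes) -> bool:
    expected = hmac.new(key, packet_body, hashlib.sha256).digest()
    # Constant-time comparison; a mismatch causes the packet to be discarded.
    return hmac.compare_digest(expected, received_digest)
```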
Based on the applicable scope, OSPFv3 authentication is classified as follows:
• Area authentication
Area authentication is configured in the OSPFv3 area view and applies to packets received by all
interfaces in an OSPFv3 area.
• Process authentication
Process authentication is configured in the OSPFv3 view and applies to all packets in an OSPFv3 process.
• Interface authentication
Interface authentication is configured in the interface view and applies to all packets received by the
interface.
OSPFv3 supports the following authentication modes:
• 1: simple authentication
• 2: ciphertext authentication
• Interface authentication configurations must be the same on all devices of the same network so that
OSPFv3 neighbor relationships can be established.
• Area authentication configurations must be the same on all devices in the same area.
Background
If the status of an interface carrying OSPFv3 services alternates between Up and Down, OSPFv3 neighbor
relationship flapping occurs on the interface. During the flapping, OSPFv3 frequently sends Hello packets to
reestablish the neighbor relationship, synchronizes LSDBs, and recalculates routes. In this process, a large
number of packets are exchanged, adversely affecting neighbor relationship stability, OSPFv3 services, and
other OSPFv3-dependent services, such as LDP and BGP. OSPFv3 neighbor relationship flapping suppression
can address this problem by delaying OSPFv3 neighbor relationship reestablishment or preventing service
traffic from passing through flapping links.
Related Concepts
flapping_event: reported when the status of a neighbor relationship on an interface last changes from Full
to a non-Full state. The flapping_event triggers flapping detection.
flapping_count: number of times flapping has occurred.
detect-interval: detection interval. The interval is used to determine whether to trigger a valid
flapping_event.
threshold: flapping suppression threshold. When the flapping_count reaches or exceeds threshold, flapping
suppression takes effect.
resume-interval: interval for exiting from OSPFv3 neighbor relationship flapping suppression. If the interval
between two successive valid flapping_events is longer than resume-interval, the flapping_count is reset.
Implementation
Flapping detection
Each OSPFv3 interface on which OSPFv3 neighbor relationship flapping suppression is enabled starts a
flapping_count. If the interval between two successive neighbor status changes from Full to a non-Full state
is shorter than detect-interval, a valid flapping_event is recorded, and the flapping_count increases by 1.
When the flapping_count reaches or exceeds threshold, flapping suppression takes effect. If the interval
between two successive neighbor status changes from Full to a non-Full state is longer than resume-
interval, the flapping_count is reset.
The detect-interval, threshold, and resume-interval are configurable.
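The flapping detection bookkeeping can be sketched as follows (illustrative Python; timestamps are in seconds, and the names mirror the concepts above):

```python
# Sketch of the flapping-detection counters described above.

class FlapDetector:
    def __init__(self, detect_interval: float, threshold: int,
                 resume_interval: float):
        self.detect_interval = detect_interval
        self.threshold = threshold
        self.resume_interval = resume_interval
        self.flapping_count = 0
        self.last_event_time = None

    def on_full_to_non_full(self, now: float) -> bool:
        """Record a neighbor transition from Full to a non-Full state.
        Returns True if flapping suppression should take effect."""
        if self.last_event_time is not None:
            gap = now - self.last_event_time
            if gap < self.detect_interval:
                self.flapping_count += 1   # a valid flapping_event
            elif gap > self.resume_interval:
                self.flapping_count = 0    # flapping has calmed down; reset
        self.last_event_time = now
        return self.flapping_count >= self.threshold
```

For example, with detect-interval 10s, threshold 3, and resume-interval 60s, three transitions arriving less than 10s apart trigger suppression, while a quiet period longer than 60s resets the count.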
Flapping suppression
Flapping suppression works in either Hold-down or Hold-max-cost mode.
• Hold-down mode: In the case of frequent flooding and topology changes during neighbor relationship
establishment, interfaces prevent neighbor relationships from being reestablished during the
suppression period, which minimizes LSDB synchronization attempts and packet exchanges.
• Hold-max-cost mode: If the traffic forwarding path changes frequently, interfaces use 65535 as the cost
of the flapping link during Hold-max-cost suppression, which prevents traffic from passing through the
flapping link.
Flapping suppression can also work first in Hold-down mode and then in Hold-max-cost mode.
By default, the Hold-max-cost mode takes effect. The mode and suppression duration can be changed
manually.
If an attack causes frequent neighbor relationship flapping, Hold-down mode can minimize the impact of
the attack.
When an interface enters the flapping suppression state, all neighbor relationships on the interface enter the state
accordingly.
Typical Scenarios
Basic scenario
In Figure 1, the traffic forwarding path is Device A -> Device B -> Device C -> Device E before a link failure
occurs. After the link between Device B and Device C fails, the forwarding path switches to Device A ->
Device B -> Device D -> Device E. If the neighbor relationship between Device B and Device C frequently
flaps at the early stage of the path switchover, the forwarding path will be switched frequently, causing
traffic loss and affecting network stability. If the neighbor relationship flapping meets suppression
conditions, flapping suppression takes effect.
• If flapping suppression works in Hold-down mode, the neighbor relationship between Device B and
Device C is prevented from being reestablished during the suppression period, in which traffic is
forwarded along the path Device A -> Device B -> Device D -> Device E.
• If flapping suppression works in Hold-max-cost mode, 65535 is used as the cost of the link between
Device B and Device C during the suppression period, and traffic is forwarded along the path Device A -
> Device B -> Device D -> Device E.
Broadcast scenario
In Figure 3, four devices are deployed on the same broadcast network using switches, and the devices are
broadcast network neighbors. If Device C flaps due to a link failure, and Device A and Device B were
deployed at different time (Device A was deployed earlier for example) or the flapping suppression
parameters on Device A and Device B are different, Device A first detects the flapping and suppresses Device
C. Consequently, the Hello packets sent by Device A do not carry Device C's router ID. However, Device B has
not detected the flapping yet and still considers Device C a valid node. As a result, the DR candidates
identified by Device A are Device B and Device D, whereas the DR candidates identified by Device B are
Device A, Device C, and Device D. Different DR candidates result in a different DR election result, which may
lead to route calculation errors. To prevent this problem in scenarios where an interface has multiple
neighbors, such as on a broadcast, P2MP, or NBMA network, all neighbors on the interface are suppressed
when the status of a neighbor relationship last changes to ExStart or Down. Specifically, if Device C flaps,
Device A, Device B, and Device D on the broadcast network are all suppressed. After the network stabilizes
and the suppression timer expires, Device A, Device B, and Device D are restored to normal status.
Multi-area scenario
In Figure 4, Device A, Device B, Device C, Device E, and Device F are connected in area 1, and Device B,
Device D, and Device E are connected in backbone area 0. Traffic from Device A to Device F is preferentially
forwarded along an intra-area route, and the forwarding path is Device A -> Device B -> Device C -> Device
E -> Device F. When the neighbor relationship between Device B and Device C flaps and the flapping meets
suppression conditions, flapping suppression takes effect in the default mode (Hold-max-cost).
Consequently, 65535 is used as the cost of the link between Device B and Device C. However, the forwarding
path remains unchanged because intra-area routes take precedence over inter-area routes during route
selection according to OSPFv3 route selection rules. To prevent traffic loss in multi-area scenarios, configure
Hold-down mode to prevent the neighbor relationship between Device B and Device C from being
reestablished during the suppression period. During this period, traffic is forwarded along the path Device A
-> Device B -> Device D -> Device E -> Device F.
By default, the Hold-max-cost mode takes effect. The mode can be changed to Hold-down manually.
Context
If network-wide OSPFv3 LSA flush causes network instability, source tracing must be implemented as soon
as possible to locate and isolate the fault source. However, OSPFv3 itself does not support source tracing. A
conventional solution is to isolate nodes one by one until the faulty node is located, which is complex and
time-consuming. Therefore, a fast source tracing method is required. To solve the preceding problem,
OSPFv3 introduces a proprietary protocol, namely, the source tracing protocol. This protocol supports the
flooding of flush source information. When the preceding problem occurs, you can quickly query the flush
source information on any device on the network to quickly locate the fault source.
Related Concepts
Source tracing
A mechanism that helps locate the device that flushes OSPFv3 LSAs. This feature has the following
characteristics:
• Uses a new UDP port. Source tracing packets are carried by UDP packets, and the UDP packets carry the
OSPFv3 LSAs flushed by the current device and are flooded hop by hop based on the OSPFv3 topology.
• Forwards packets along UDP channels which are independent of the channels used to transmit OSPFv3
packets. Therefore, this protocol facilitates incremental deployment. In addition, source tracing does not
affect the devices with the related UDP port disabled.
• Supports query of the node that flushed LSAs on any device that supports this feature after source
tracing packets are flooded on the network, which speeds up fault locating and faulty node isolation by
maintenance personnel.
Flush
An operation in which a device sets the age of an LSA to the maximum value and re-advertises the LSA to
delete it from the network.
Fundamentals
The implementation of OSPFv3 flush source tracing is as follows:
Only router-LSAs, network-LSAs, and inter-area-router-LSAs can be flushed. Therefore, a device generates a PS-LSA only
when it flushes a router-LSA, network-LSA, or inter-area-router-LSA.
• Devices A and B both support source tracing.
DeviceA sends a PS-Hello packet to notify DeviceB of its source tracing capability.
Upon reception of the PS-Hello packet, DeviceB sets the source tracing field for DeviceA and replies with
an ACK packet to notify DeviceA of its own source tracing capability.
Upon reception of the ACK packet, DeviceA sets the source tracing field for DeviceB and does not
retransmit the PS-Hello packet.
• DeviceA supports source tracing, but DeviceB does not.
DeviceA sends a PS-Hello packet to notify DeviceB of its source tracing capability.
DeviceA fails to receive an ACK packet from DeviceB within 10s and retransmits the PS-Hello packet. A
maximum of two retransmissions are allowed. If DeviceA still fails to receive an ACK packet from DeviceB
after two retransmissions, DeviceA considers that DeviceB does not support source tracing.
• Devices A and B both support source tracing, but source tracing is disabled on DeviceB.
After source tracing is disabled on DeviceB, DeviceB sends a PS-Hello packet to notify its source tracing
incapability.
Upon reception of the PS-Hello packet from DeviceB, DeviceA replies with an ACK packet that carries its
source tracing capability.
Upon reception of the ACK packet from DeviceA, DeviceB considers the capability negotiation complete
and disables the UDP port.
• DeviceA does not support source tracing, and source tracing is disabled on DeviceB.
After source tracing is disabled on DeviceB, DeviceB sends a PS-Hello packet to notify its source tracing
incapability.
DeviceB fails to receive an ACK packet within 10s and retransmits the PS-Hello packet. A maximum of
two retransmissions are allowed. After two retransmissions, DeviceB considers the capability negotiation
complete and disables the UDP port.
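The negotiation and retransmission behavior above can be sketched as follows. This is an illustrative model only; the function and field names (negotiate, source_tracing_capable, and the two callbacks) are assumptions, not the device implementation.

```python
ACK_TIMEOUT_S = 10      # wait this long for an ACK, per the description above
MAX_RETRANSMITS = 2     # then conclude the neighbor is source tracing-incapable

class Neighbor:
    def __init__(self, name):
        self.name = name
        self.source_tracing_capable = None  # unknown until negotiation finishes

def negotiate(neighbor, send_ps_hello, wait_for_ack):
    """Send a PS-Hello and retransmit up to MAX_RETRANSMITS times."""
    for _ in range(1 + MAX_RETRANSMITS):
        send_ps_hello(neighbor)
        ack = wait_for_ack(neighbor, timeout=ACK_TIMEOUT_S)
        if ack is not None:
            # The ACK carries the neighbor's own capability flag.
            neighbor.source_tracing_capable = ack["capable"]
            return neighbor.source_tracing_capable
    # No ACK after two retransmissions: treat the neighbor as incapable.
    neighbor.source_tracing_capable = False
    return False
```

In the failure case the sketch sends the initial PS-Hello plus two retransmissions before giving up, matching the text above.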
• If a device flushes an LSA, it generates and floods a PS-LSA to source tracing-capable neighbors.
• If a device receives a flush LSA from a source tracing-incapable neighbor, the device generates and
floods a PS-LSA to source tracing-capable neighbors. If a device receives the same flush LSA (with the
same LSID and sequence number) from more than one source tracing-incapable neighbor, the device
generates only one PS-LSA.
• After DeviceA receives the flush LSA from source tracing-incapable DeviceB, DeviceA generates a PS-LSA
in which the Flush Router field is its router ID and the Neighbor Router field is the router ID of
DeviceB, and adds the PS-LSA to the queue where packets are to be sent to all source tracing-capable
neighbors.
• After DeviceA receives the flush LSA from DeviceB, followed by the same flush LSA sent by DeviceC,
DeviceA generates a PS-LSA in which the Flush Router field is its router ID and the Neighbor Router
field is the router ID of DeviceB, and adds the PS-LSA to the queue where packets are to be sent to all
source tracing-capable neighbors. No PS-LSA is generated in response to the flush LSA received from
DeviceC.
• During neighbor relationship establishment, a device initializes the sequence number of the PS-LSU
packet of the neighbor. When the device replies with a PS-LSU packet, it adds the sequence number of
the PS-LSU packet of the neighbor. During PS-LSU packet retransmission, the sequence number remains
unchanged. After the device receives a PS-LSU ACK packet with the same sequence number, it increases
the sequence number of the neighbor's PS-LSU packet by 1.
• The device manages a PS-LSA sending queue for each neighbor. When a PS-LSA is added to an empty
queue, the device starts a timer. After the timer expires, the device packs the queued PS-LSAs into a PS-
LSU packet, sends the packet to the neighbor, and starts another timer to wait for a PS-LSU ACK packet.
• After the PS-LSU ACK timer expires, the PS-LSU packet is retransmitted.
• When the device receives a PS-LSU ACK packet with the same sequence number as that in the neighbor
record, the device clears the PS-LSAs from the neighbor queue and sends another PS-LSU packet after the
timer expires.
■ If the sequence number of a received PS-LSU ACK packet is less than that in the neighbor record,
the device ignores the packet.
■ If the sequence number of a received PS-LSU ACK packet is greater than that in the neighbor
record, the device discards the packet.
• When a device receives a PS-LSU packet from a neighbor, it records the sequence number of the packet
in the neighbor record and replies with a PS-LSU ACK packet.
• When the device receives a PS-LSU packet with the sequence number the same as that in the neighbor
record, the device discards the PS-LSU packet.
• After the device parses a PS-LSU packet, it checks whether each PS-LSA in the packet is newer than the
corresponding PS-LSA in the LSDB. If the received PS-LSA is newer, the device adds it to the LSDB.
■ If the received PS-LSA is the same as the corresponding local one, the device does not process the
received PS-LSA.
■ If the received PS-LSA is older, the device floods the corresponding PS-LSA in the LSDB to the
neighbor.
• If the device receives a PS-LSU packet from a neighbor that is marked as source tracing-incapable, the
device updates the neighbor status to source tracing-capable.
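The per-neighbor sequencing and acknowledgment rules above can be summarized in a small sketch. All class and method names here are hypothetical; the sketch only mirrors the described rules: retransmissions reuse the sequence number, a matching ACK clears the queue and increments the number, stale ACKs are ignored, and ahead-of-sequence ACKs are discarded.

```python
class PsLsuSender:
    """Per-neighbor PS-LSU send state, as described above (names assumed)."""

    def __init__(self):
        self.seq = 1          # initialized when the adjacency is established
        self.queue = []       # PS-LSAs waiting to be packed into a PS-LSU

    def build_packet(self):
        # Retransmissions reuse the same sequence number.
        return {"seq": self.seq, "ps_lsas": list(self.queue)}

    def on_ack(self, ack_seq):
        if ack_seq == self.seq:
            # Matching ACK: clear the queue and move to the next number.
            self.queue.clear()
            self.seq += 1
            return "accepted"
        if ack_seq < self.seq:
            return "ignored"    # stale ACK, smaller than the recorded number
        return "discarded"      # ACK ahead of the recorded number
```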
GTSM
A security mechanism that checks whether the time to live (TTL) value in each received IP packet header is
within a pre-defined range.
CPU-CAR
A mechanism through which interface boards check the packets to be sent to the CPU for processing,
preventing the main control board from being overloaded by a large number of packets sent to the CPU.
The source tracing protocol applies for an independent CAR channel and has small CAR values configured.
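As a rough illustration of the GTSM check described above, the following sketch accepts a packet only if its TTL falls within a pre-defined range. The one-hop default (TTL must be 255) is an assumption for illustration, not a statement of the device's configuration.

```python
def gtsm_accept(ttl, valid_hops=1):
    """Accept a packet only if its TTL is within the pre-defined range.

    valid_hops is an assumed knob: a sender sets TTL to 255, so a packet
    that traveled at most valid_hops hops arrives with TTL >= 256 - valid_hops.
    """
    lower = 256 - valid_hops   # valid_hops=1 -> TTL must be exactly 255
    return lower <= ttl <= 255
```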
Typical Scenarios
Scenario where all nodes support source tracing
Assume that all nodes on the network support source tracing and DeviceA is the fault source. In this
scenario, the fault source can be accurately located. Figure 3 shows the networking.
When DeviceA flushes an OSPFv3 LSA, it generates a PS-LSA that carries DeviceA information and brief
information about the OSPFv3 flush LSA. After the fault occurs, maintenance personnel can log in to any
node on the network to locate DeviceA, which keeps sending flush LSAs, and isolate DeviceA from the
network.
Scenario where source tracing-incapable nodes are not isolated from source tracing-capable nodes
All nodes on the network except DeviceC support source tracing, and DeviceA is the fault source. In this
case, the PS-LSA can be flooded on the entire network, and the fault source can be accurately located.
Figure 4 shows the networking.
Figure 4 Scenario where source tracing-incapable nodes are not isolated from source tracing-capable nodes
When DeviceA flushes an OSPFv3 LSA, it generates a PS-LSA that carries DeviceA information and brief
information about the flush LSA. When DeviceB and DeviceE negotiate the source tracing capability with
DeviceC, they find that DeviceC does not support source tracing. Therefore, after DeviceB receives the PS-LSA
from DeviceA, DeviceB sends the PS-LSA to DeviceD, but not to DeviceC. After receiving the OSPFv3 flush
LSA from DeviceC, DeviceE generates a PS-LSA that carries information about the advertisement source
(DeviceE), flush source (DeviceC), and the flush LSA, and floods the PS-LSA on the network.
After the fault occurs, maintenance personnel can log in to any device on the network except DeviceC to
locate the faulty node. Two possible fault sources are located in this case: DeviceA and DeviceC, both of
which send the same flush LSA. DeviceA takes precedence over DeviceC when the maintenance personnel
determine the most likely fault source. After DeviceA is isolated, the network recovers.
Scenario where source tracing-incapable nodes are isolated from source tracing-capable nodes
All nodes on the network except DeviceC and DeviceD support source tracing, and DeviceA is the fault
source. In this case, the PS-LSA cannot be flooded on the entire network. Figure 5 shows the networking.
Figure 5 Scenario where source tracing-incapable nodes are isolated from source tracing-capable nodes
When DeviceA flushes an OSPFv3 LSA, it generates a PS-LSA that carries DeviceA information and brief
information about the flush LSA. However, the PS-LSA can reach only DeviceB because DeviceC and DeviceD
do not support source tracing.
During source tracing capability negotiation, DeviceE finds that DeviceC does not support source tracing, and
DeviceF finds that DeviceD does not support source tracing. After DeviceE receives the flush LSA from
DeviceC, DeviceE generates and floods a PS-LSA on behalf of DeviceC. Similarly, after DeviceF receives the
flush LSA from DeviceD, DeviceF generates and floods a PS-LSA on behalf of DeviceD.
• If maintenance personnel log in to DeviceA or DeviceB, the personnel can locate the fault source
(DeviceA) directly. After DeviceA is isolated, the network recovers.
• If the maintenance personnel log in to DeviceE, DeviceF, DeviceG, or DeviceH, the personnel will find
that DeviceE claims DeviceC to be the fault source of the OSPFv3 flush LSA and DeviceF claims DeviceD
to be the fault source of the same OSPFv3 flush LSA.
• If the maintenance personnel log in to DeviceC or DeviceD, the personnel will find that the flush LSA
was initiated by DeviceB, not generated by DeviceC or DeviceD.
• If the maintenance personnel log in to DeviceB, the personnel will find that DeviceA is the fault source,
and isolate DeviceA. After DeviceA is isolated, the network recovers.
• Hello packet
Packet length 16 bits Length of the OSPFv3 packet containing the packet header, in bytes.
Area ID 32 bits ID of the area to which the Router that sends the OSPFv3 packet
belongs.
Checksum 16 bits Checksum of the OSPFv3 packet that does not contain the
Authentication field.
Hello Packet
Hello packets are periodically sent on OSPFv3 interfaces to establish and maintain neighbor relationships. A
Hello packet includes information about the designated router (DR), backup designated router (BDR),
timers, and known neighbors. Figure 2 shows the format of a Hello packet.
RouterDeadInterval 16 bits Dead interval. If a Router does not receive any Hello packets from its
neighbors within the specified dead interval, the neighbors are considered
Down.
Table 3 lists the address types, interval types, and default intervals used when Hello packets are transmitted
on different networks.
Network type: Non-broadcast multiple access (NBMA)
Address type: Unicast address
Interval type: HelloInterval for the DR, the BDR, and Routers that can become a DR; PollInterval when
neighbors become Down; HelloInterval in other cases
Default interval: 30 seconds for HelloInterval; 120 seconds for PollInterval
To establish neighbor relationships between Routers on the same network segment, you must set the same
HelloInterval, PollInterval, and RouterDeadInterval values for the Routers. PollInterval applies only to NBMA networks.
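The consistency requirement in the note above can be expressed as a small check; the parameter names are illustrative, and PollInterval is compared only on NBMA networks, as stated.

```python
def hello_params_compatible(local, remote, nbma=False):
    """Return True if two Routers' timer values allow a neighbor relationship.

    local/remote are assumed dicts of timer values in seconds; PollInterval
    applies only to NBMA networks, so it is compared only when nbma=True.
    """
    keys = ["hello_interval", "router_dead_interval"]
    if nbma:
        keys.append("poll_interval")
    return all(local[k] == remote[k] for k in keys)
```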
DD Packet
During an adjacency initialization, two Routers use DD packets to describe their own link state databases
(LSDBs) for LSDB synchronization. A DD packet contains the header of each LSA in an LSDB. An LSA header
uniquely identifies an LSA. The LSA header occupies only a small portion of the LSA, which reduces the
amount of traffic transmitted between Routers. A neighbor can use the LSA header to check whether it
already has the LSA. When two Routers exchange DD packets, one functions as the master and the other
functions as the slave. The master defines a start sequence number. The master increases the sequence
number by one each time it sends a DD packet. After the slave receives a DD packet, it uses the sequence
number carried in the DD packet for acknowledgment.
Figure 3 shows the format of a DD packet.
Interface MTU 16 bits Maximum length of the DD packet sent by the interface with packet
fragmentation disabled.
M (More) 1 bit If the DD packet is the last packet among multiple consecutive DD
packets sent by a Router, this field is set to 0. In other cases, this field is
set to 1.
M/S 1 bit When two Routers exchange DD packets, they negotiate a master/slave
(Master/Slave) relationship. The Router with a larger router ID becomes the master. If
this field is set to 1, the DD packet is sent by the master.
DD sequence 32 bits Sequence number of the DD packet. The master and slave use the
number sequence number to ensure that DD packets are correctly transmitted.
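The master/slave and sequence-number rules above can be sketched as follows; the election rule, start sequence number, and packet field names are simplified assumptions for illustration.

```python
def elect_master(router_id_a, router_id_b):
    # The Router with the larger router ID becomes the master.
    return router_id_a if router_id_a > router_id_b else router_id_b

class DdMaster:
    def __init__(self, start_seq=1000):
        self.seq = start_seq   # the master defines the start sequence number

    def next_dd(self, more=True):
        # M=1 means more DD packets follow; MS=1 marks a master's packet.
        pkt = {"seq": self.seq, "M": 1 if more else 0, "MS": 1}
        self.seq += 1          # incremented on every DD the master sends
        return pkt

def slave_ack(received_dd):
    # The slave acknowledges by echoing the master's sequence number.
    return {"seq": received_dd["seq"], "MS": 0}
```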
LSR Packet
After two Routers exchange DD packets, they send LSR packets to request each other's LSAs. The LSR
packets contain the summaries of the requested LSAs. Figure 4 shows the format of an LSR packet.
Link State ID 32 bits This field together with the LS type field describes an LSA in an AS.
The LS type, Link State ID, and Advertising Router fields can uniquely identify an LSA. If two LSAs have the same LS type,
Link State ID, and Advertising Router fields, a Router uses the LS sequence number, LS checksum, and LS
age fields to determine which LSA is the most recent.
LSU Packet
A Router uses an LSU packet to transmit LSAs requested by its neighbors or to flood its own updated LSAs.
The LSU packet contains a set of LSAs. For multicast and broadcast networks, LSU packets are multicast to
flood LSAs. To ensure reliable LSA flooding, a Router uses an LSAck packet to acknowledge the LSAs
contained in an LSU packet that is received from a neighbor. If an LSA fails to be acknowledged, the Router
retransmits the LSA to the neighbor. Figure 5 shows the format of an LSU packet.
LSAck Packet
A Router uses an LSAck packet to acknowledge the LSAs contained in a received LSU packet. The LSAs can
be acknowledged using LSA headers. LSAck packets can be transmitted over different links in unicast or
multicast mode. Figure 6 shows the format of an LSAck packet.
• Router-LSA (Type 1)
• Network-LSA (Type 2)
• Inter-Area-Prefix-LSA (Type 3)
• Inter-Area-Router-LSA (Type 4)
• AS-external-LSA (Type 5)
• Link-LSA (Type 8)
• Intra-Area-Prefix-LSA (Type 9)
LS age 16 bits Time that elapses after the LSA is generated, in seconds. The value of
this field continually increases regardless of whether the LSA is
transmitted over a link or saved in an LSDB.
Link State ID 32 bits This field together with the LS type field describes an LSA in an area.
LS sequence 32 bits Sequence number of the LSA. Routers can use this field to identify the
number latest LSA.
Length 16 bits Length of the LSA including the LSA header, in bytes.
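The LS sequence number, LS checksum, and LS age fields above are what allow a Router to pick the newer of two instances of the same LSA. The following simplified sketch follows the usual OSPF ordering (higher sequence number first, then higher checksum, then smaller age) and deliberately omits the MaxAge special cases.

```python
def newer_lsa(a, b):
    """Return the newer of two instances of the same LSA (simplified).

    a and b are assumed dicts with 'seq', 'checksum', and 'age' keys,
    mirroring the LS sequence number, LS checksum, and LS age fields.
    """
    if a["seq"] != b["seq"]:
        return a if a["seq"] > b["seq"] else b       # higher sequence wins
    if a["checksum"] != b["checksum"]:
        return a if a["checksum"] > b["checksum"] else b
    return a if a["age"] <= b["age"] else b          # smaller age is newer
```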
Router-LSA
A router-LSA (Type 1) describes the link status and cost of a Router. Router-LSAs are generated by a Router
and advertised within the area to which the Router belongs. Figure 2 shows the format of a router-LSA.
Nt (NSSA 1 bit If the Router that generates the LSA is an NSSA border router, this field
translation) is set to 1. In other cases, this field is set to 0. When this field is set to 1,
the Router unconditionally translates NSSA-LSAs into AS-external-LSAs.
V (Virtual Link) 1 bit If the Router that generates the LSA is located at one end of a virtual
link, this field is set to 1. In other cases, this field is set to 0.
E (External) 1 bit If the Router that generates the LSA is an autonomous system boundary
router (ASBR), this field is set to 1. In other cases, this field is set to 0.
B (Border) 1 bit If the Router that generates the LSA is an area border router (ABR), this
field is set to 1. In other cases, this field is set to 0.
Type 8 bits Type of the Router link. The values are as follows:
1: Connected to another Router in point-to-point (P2P) mode.
2: Connected to a transit network.
3: Reserved.
4: Virtual link.
Network-LSA
A network-LSA (Type 2) records the router IDs of all Routers on the local network segment. Network-LSAs
are generated by a DR on a broadcast or non-broadcast multiple access (NBMA) network and advertised
within the area to which the DR belongs. Figure 3 shows the format of a network-LSA.
Attached Router 32 bits Router IDs of all Routers on the same network, including the router ID
of the DR
Inter-Area-Prefix-LSA
An inter-area-prefix-LSA (Type 3) describes routes on a network segment in an area. It is generated by the
ABR. The routes are advertised to other areas.
Figure 4 shows the format of an inter-area-prefix-LSA.
PrefixOptions 8 bits Capability options associated with the prefix.
Inter-Area-Router-LSA
An inter-area-router-LSA (Type 4) describes routes to the ASBR in other areas. It is generated by the ABR.
The routes are advertised to all related areas except the area that the ASBR belongs to.
Figure 5 shows the format of an inter-area-router-LSA.
AS-External-LSA
An AS-external-LSA describes a route to a destination outside the AS and is generated by an ASBR.
Figure 6 shows the format of an AS-external-LSA.
T 1 bit Whether an External Route Tag has been included in the LSA.
If this field is 1, an External Route Tag has been included in the LSA.
If this field is 0, no External Route Tag is included in the LSA.
Referenced LS 16 bits Referenced LS type. If this value is not 0, an LSA with this LS type is to
Type be associated with this LSA (see Referenced Link State ID below).
External Route 32 bits External route tag, which can be used to communicate additional
Tag information between ASBRs.
NSSA-LSA
NSSA-LSAs are originated by ASBRs within an NSSA and describe routes to destinations external to the AS.
Figure 7 shows the format of an NSSA-LSA.
Link-LSA
Each Router generates a link LSA for each link. A link LSA describes the link-local address and IPv6 address
prefix associated with the link and the link option set in the network LSA. It is transmitted only on the link.
Figure 8 shows the format of a link-LSA.
Options 24 bits Set of options that may be set in the network-LSA generated by the DR
on broadcast or NBMA links.
Link-local 128 bits The originating Router's link-local interface address on the link.
Interface
Address
Intra-Area-Prefix-LSA
Each Router and DR generates one or more such LSAs and transmits them in the local area.
• An LSA generated by a Router describes the IPv6 address prefixes associated with its router-LSA.
• Such LSAs generated by a DR describe the IPv6 address prefixes associated with network-LSAs.
Referenced LS 16 bits Router-LSA or network-LSA with which the IPv6 address prefixes should
Type be associated.
If Referenced LS Type is 0x2001, the IPv6 prefixes are associated with a
router-LSA.
If Referenced LS Type is 0x2002, the IPv6 prefixes are associated with a
network-LSA.
Related Concepts
Redistribute ID
IS-IS uses a system ID as a redistribution identifier, OSPF and OSPFv3 use a router ID + process ID as a
redistribution identifier, and BGP uses a VrfID + random number as a redistribution identifier. For ease of
understanding, the redistribution identifiers of different protocols are all called Redistribute IDs. When routes
are distributed, the information carried in the routes contains Redistribute IDs.
Redistribute List
A Redistribute list may consist of multiple Redistribute IDs. Each Redistribute list of BGP contains a maximum
of four Redistribute IDs, and each Redistribute list of any other routing protocol contains a maximum of two
Redistribute IDs. When the number of Redistribute IDs exceeds the corresponding limit, the old ones are
discarded according to the sequence in which Redistribute IDs are added.
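The size limits and discard order described above behave like a fixed-length FIFO. A minimal sketch, assuming Redistribute IDs are represented as strings:

```python
from collections import deque

def make_redist_list(protocol):
    """Create a Redistribute list with the per-protocol size limit.

    BGP keeps up to four Redistribute IDs; any other routing protocol keeps
    up to two. A bounded deque drops the oldest entry when the limit is hit,
    matching the discard order described above.
    """
    maxlen = 4 if protocol == "bgp" else 2
    return deque(maxlen=maxlen)
```

For example, appending a third ID to an OSPFv3 Redistribute list silently discards the oldest one.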
Take the route distributed by DeviceA as an example. A stable routing loop is formed through the following
process:
Phase 1
On the network shown in Figure 2, OSPFv3 process 1 on DeviceA imports the static route 10.0.0.1 and floods
a Type 5 AS-External-LSA in OSPFv3 process 1. After receiving the LSA, OSPFv3 process 1 on DeviceD and
OSPFv3 process 1 on DeviceE each calculate a route to 10.0.0.1, with the outbound interfaces being
interface1 on DeviceD and interface1 on DeviceE, respectively, and the cost being 102. At this point, the
routes to 10.0.0.1 in OSPFv3 process 1 in the routing tables of DeviceD and DeviceE are active.
Figure 2 Phase 1
Phase 2
In Figure 3, DeviceD and DeviceE are configured to import routes from OSPFv3 process 1 to OSPFv3 process
2. No route-policy is configured for the import, or the configured route-policy is improper. For example,
OSPFv3 process 2 on DeviceE imports routes from OSPFv3 process 1 and then floods a Type 5 AS-External-
LSA in OSPFv3 process 2. After receiving the LSA, OSPFv3 process 2 on DeviceD calculates a route to
10.0.0.1, with the cost being 2, which is smaller than that (102) of the route calculated by OSPFv3 process 1.
As a result, the active route to 10.0.0.1 in the routing table of DeviceD is switched from the one calculated
by OSPFv3 process 1 to the one calculated by OSPFv3 process 2, and the outbound interface of the route is
sub-interface2.1.
Figure 3 Phase 2
Phase 3
In Figure 4, DeviceD imports the route from OSPFv3 process 2 to OSPFv3 process 1 and floods a Type 5 AS-
External LSA in OSPFv3 process 1. After receiving the LSA, OSPFv3 process 1 on DeviceE recalculates the
route to 10.0.0.1. The cost of the route becomes 2, which is smaller than that of the previously calculated
route. Therefore, the route to 10.0.0.1 in OSPFv3 process 1 on DeviceE is changed to the route distributed by
DeviceD.
Figure 4 Phase 3
Phase 4
After the route to 10.0.0.1 on DeviceE is updated, OSPFv3 process 2 still imports the route from OSPFv3
process 1 as the route remains active, and continues to distribute/update a Type 5 AS-External-LSA.
As a result, a stable routing loop is formed. Assuming that traffic is injected from DeviceF, Figure 5 shows
the traffic flow when the routing loop occurs.
2. DeviceD learns the route distributed by DeviceB through OSPFv3 process 1 and imports the route from
OSPFv3 process 1 to OSPFv3 process 2. DeviceE learns the route distributed by DeviceD through
OSPFv3 process 2 and saves the Redistribute List distributed by DeviceD through OSPFv3 process 2 to
the routing table when calculating routes.
3. DeviceE imports the route from OSPFv3 process 2 to OSPFv3 process 1 and redistributes the route
through OSPFv3 process 1. The corresponding E-AS-External-LSA contains the Redistribute ID of
OSPFv3 process 1 on DeviceE and the Redistribute ID of OSPFv3 process 2 on DeviceD. The
Redistribute ID of OSPFv3 process 1 on DeviceB has been discarded from the LSA.
4. OSPFv3 process 1 on DeviceD learns the Redistribute list corresponding to the route distributed by
DeviceE and saves the Redistribute list in the routing table. When importing the route from OSPFv3
process 1 to OSPFv3 process 2, DeviceD finds that the Redistribute list of the route contains its own
Redistribute ID, considers that a routing loop is detected, and reports an alarm. OSPFv3 process 2 on
DeviceD distributes a large cost when redistributing the route so that other devices preferentially
select other paths after learning the route. This prevents routing loops.
When detecting a routing loop upon route import between processes of the same protocol, the device increases
the cost of the corresponding route. As the cost of the delivered route increases, the optimal route in the IP
routing table changes. In this way, the routing loop is eliminated.
In the case of inter-protocol route import, if a routing protocol with a higher preference detects a routing loop,
although this protocol increases the cost of the corresponding route, the cost increase will not render the route
inactive. As a result, the routing loop cannot be eliminated. If the routing protocol with a lower preference
increases the cost of the corresponding route, this route competes with the originally imported route during route
selection. In this case, the routing loop can be eliminated.
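The loop-detection reaction described above (finding the device's own Redistribute ID in a route's Redistribute list, reporting an alarm, and advertising the route with a large cost so that other devices prefer other paths) can be sketched as follows; the cost value and data layout are illustrative assumptions.

```python
LARGE_COST = 16777214  # assumed "large cost" value for illustration

def import_route(route, my_redist_id, alarm):
    """Import a route between processes, applying the loop check above.

    route is an assumed dict with 'prefix', 'cost', and 'redist_list' keys.
    On a loop hit the route is re-advertised with a large cost; otherwise
    the importer appends its own Redistribute ID to the list.
    """
    if my_redist_id in route["redist_list"]:
        alarm(f"routing loop detected for {route['prefix']}")
        return {**route, "cost": LARGE_COST}   # deprioritize this path
    return {**route, "redist_list": route["redist_list"] + [my_redist_id]}
```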
On the network shown in Figure 7, DeviceA, DeviceB, and DeviceC run OSPFv3 process 1, DeviceF and
DeviceG run IS-IS process 2, and DeviceD and DeviceE run both processes. Route import between OSPFv3
process 1 and IS-IS process 2 is configured on DeviceD and DeviceE. The routes distributed by OSPFv3
process 1 on DeviceE are re-distributed back to OSPFv3 process 1 on DeviceD through IS-IS process 2. As the
costs of the routes newly distributed by DeviceD are smaller, they are preferentially selected by OSPFv3
process 1, resulting in routing loops.
Figure 7 Traffic flow when a routing loop occurs during route import between OSPFv3 and IS-IS
1. DeviceD learns the route distributed by DeviceB through OSPFv3 process 1 and imports the route from
OSPFv3 process 1 to IS-IS process 2. When IS-IS process 2 on DeviceD distributes route information, it
uses the extended prefix sub-TLV to distribute the Redistribute ID of IS-IS process 2 through an LSP.
IS-IS process 2 on DeviceE learns the route distributed by DeviceD and saves the Redistribute ID
distributed by IS-IS process 2 on DeviceD to the routing table during route calculation.
2. DeviceE imports the route from IS-IS process 2 to OSPFv3 process 1 and uses an E-AS-External-LSA to
distribute the Redistribute ID of OSPFv3 process 1 on DeviceE when distributing route information.
Similarly, after OSPFv3 process 1 on DeviceD learns the route from DeviceE, DeviceD saves the
Redistribute ID distributed by OSPFv3 process 1 on DeviceE to the routing table during route
calculation.
3. When importing the route from OSPFv3 process 1 to IS-IS process 2, DeviceD finds that the
Redistribute list of the route contains its own Redistribute ID, considers that a routing loop is detected,
and reports an alarm. IS-IS process 2 on DeviceD distributes a large cost when distributing the
imported route. Because IS-IS has a higher preference than OSPFv3 ASE, this does not affect the route
selection result or resolve the routing loop.
4. DeviceE imports the route from IS-IS process 2 to OSPFv3 process 1, finds that the Redistribute list of
the route contains its own Redistribute ID, considers that a routing loop is detected, and reports an
alarm. OSPFv3 process 1 on DeviceE distributes a large cost when distributing the imported route so
that other devices preferentially select other paths after learning the route. This prevents routing
loops.
Figure 8 Traffic flow when a routing loop occurs during route import between OSPFv3 and BGP
1. DeviceD learns the route distributed by DeviceB through BGP and imports the BGP route to OSPFv3
process 2. When DeviceD distributes the imported route through OSPFv3 process 2, it uses an
E-AS-External-LSA to distribute the Redistribute ID of OSPFv3 process 2 on DeviceD.
DeviceE learns the route distributed by DeviceD through OSPFv3 process 2 and saves the Redistribute
List distributed by DeviceD through OSPFv3 process 2 to the routing table when calculating routes.
2. DeviceE imports the route from OSPFv3 process 2 to BGP and distributes the Redistribute ID of the
BGP process on DeviceE through an E-AS-External-LSA when redistributing the imported route. After
BGP on DeviceD learns the route distributed by DeviceE, DeviceD saves the Redistribute ID distributed
by BGP on DeviceE to the routing table during route calculation.
3. When importing the route from BGP to OSPFv3 process 2, DeviceD finds that the Redistribute list of
the route contains its own Redistribute ID, considers that a routing loop is detected, and reports an
alarm. OSPFv3 process 2 on DeviceD distributes a large link cost when distributing the imported route.
Because OSPFv3 has a higher preference than BGP, this does not affect the route selection result or
resolve the routing loop.
4. When importing the route from OSPFv3 process 2 to BGP, DeviceE finds that the Redistribute list of
the route contains its own Redistribute ID, considers that a routing loop is detected, and reports an
alarm. In addition, when BGP on DeviceE distributes the imported route, it reduces the preference of
the route. In this way, other devices preferentially select other paths after learning this route,
preventing routing loops.
Usage Scenario
Figure 9 shows a typical seamless MPLS network. If the OSPFv3 process deployed at the access layer differs
from that deployed at the aggregation layer, OSPFv3 inter-process mutual route import is usually configured
on AGGs so that routes can be leaked between the access and aggregation layers. In this case, a routing
loop may occur between AGG1 and AGG2. If OSPFv3 routing loop detection is configured on AGG1 and
AGG2, routing loops can be quickly detected and resolved.
Definition
Intermediate System to Intermediate System (IS-IS) is a dynamic routing protocol initially designed by the
International Organization for Standardization (ISO) for its Connectionless Network Protocol (CLNP).
To support IP routing, the Internet Engineering Task Force (IETF) extends and modifies IS-IS in relevant
standards, which enables IS-IS to be applied to both TCP/IP and Open System Interconnection (OSI)
environments. This type of IS-IS is called Integrated IS-IS or Dual IS-IS.
In this document, IS-IS refers to Integrated IS-IS, unless otherwise stated.
If IS-IS IPv4 and IS-IS IPv6 implement a feature in the same way, details are not provided in this chapter.
Purpose
As an Interior Gateway Protocol (IGP), IS-IS is used in Autonomous Systems (ASs). IS-IS is a link state
protocol, and it uses the Shortest Path First (SPF) algorithm to calculate routes.
IS-IS Areas
To support large-scale routing networks, IS-IS adopts a two-level structure in a routing domain. A large
domain can be divided into areas. Figure 1 shows an IS-IS network. The entire backbone area covers all
Level-2 Routers in area 1 and Level-1-2 routers in other areas. Three types of Routers on the IS-IS network
are described as follows:
• Level-1 device
A Level-1 device manages intra-area routing. It establishes neighbor relationships with only the Level-1
and Level-1-2 devices in the same area and maintains a Level-1 LSDB. The LSDB contains routing
information in the local area. A packet to a destination beyond this area is forwarded to the nearest
Level-1-2 device.
• Level-2 device
A Level-2 device manages inter-area routing. It can establish neighbor relationships with all Level-2
devices and Level-1-2 devices, and maintains a Level-2 LSDB which contains inter-area routing
information.
All Level-2 Routers form the backbone network of the routing domain. Level-2 neighbor relationships
are set up between them, and they are responsible for communications between areas. The Level-2
Routers in the routing domain must be contiguous to ensure the continuity of the backbone network.
Only Level-2 Routers can directly exchange data packets or routing information with Routers outside the
area.
• Level-1-2 device
A device, which can establish neighbor relationships with both Level-1 devices and Level-2 devices, is
called a Level-1-2 device. A Level-1-2 device can establish Level-1 neighbor relationships with Level-1
devices and Level-1-2 devices in the same area. It can also establish Level-2 neighbor relationships with
Level-2 devices and Level-1-2 devices in other areas. Level-1 devices can be connected to other areas
only through Level-1-2 devices.
A Level-1-2 device maintains two LSDBs: a Level-1 LSDB and a Level-2 LSDB. The Level-1 LSDB is used
for intra-area routing, whereas the Level-2 LSDB is used for inter-area routing.
Level-1 devices in different areas cannot establish neighbor relationships. Level-1-2 devices can establish neighbor
relationships with each other, regardless of the areas to which the devices belong.
In general, Level-1 devices are located within an area, Level-2 devices are located between areas, and Level-
1-2 devices are located between Level-1 devices and Level-2 devices.
Interface level
A Level-1-2 device may need to establish only a Level-1 adjacency with one neighbor and establish only a
Level-2 adjacency with another neighbor. In this case, you can set the level of an interface to control the
setting of adjacencies on the interface. Specifically, only Level-1 adjacencies can be established on a Level-1
interface, and only Level-2 adjacencies can be established on a Level-2 interface.
• Area address
The IDP and the HODSP of the DSP together identify a routing domain and the areas within it;
therefore, the combination of the IDP and HODSP is referred to as an area address, equivalent to an
area ID in OSPF. Different Level-1 areas in the same routing domain should not use the same area
address, and the area addresses of Routers in the same Level-1 area must be the same.
A Router generally requires only one area address, and the area addresses of all nodes in the same area
must be the same. In the implementation of a device, an IS-IS process can be configured with a
maximum of three area addresses to support seamless combination, division, and transformation of
areas.
• System ID
A system ID uniquely identifies a host or a Router in an area. In the device, the length of the system ID
is 48 bits (6 bytes).
A router ID corresponds to a system ID. If a Router uses the IP address (192.168.1.1) of Loopback 0 as
its router ID, its system ID used in IS-IS can be obtained through the following steps:
■ Extend each part of the IP address 192.168.1.1 to 3 digits and add 0 or 0s to the front of the part
that is shorter than 3 digits.
■ Divide the extended address 192.168.001.001 into three parts, with each part consisting of 4
decimal digits. The resulting system ID is 1921.6800.1001.
There are many ways to specify a system ID. Whichever you choose, ensure that the system ID uniquely
identifies a host or a Router.
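The two conversion steps above can be sketched in Python (a hypothetical helper for illustration only, not part of any device software):

```python
def ip_to_system_id(ip: str) -> str:
    """Convert a dotted-decimal router ID into an IS-IS system ID."""
    # Step 1: extend each octet to 3 digits (192.168.1.1 -> 192168001001)
    digits = "".join(f"{int(octet):03d}" for octet in ip.split("."))
    # Step 2: divide the 12 digits into three 4-digit parts
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))

print(ip_to_system_id("192.168.1.1"))  # 1921.6800.1001
```

Deriving the system ID from a loopback address in this way keeps system IDs unique as long as the router IDs themselves are unique.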
If the same system ID is configured for more than one device on the same network, network flapping may occur.
To address this problem, IS-IS provides the automatic recovery function. With the function, if the system detects an
IS-IS system ID conflict, it automatically changes the local system ID to resolve the conflict. The first two bytes of
the system ID automatically changed by the system are Fs, and the last four bytes are randomly generated. For
example, FFFF.1234.5678 is such a system ID. If the conflict persists after the system has automatically changed the
system ID three times, the system stops attempting to resolve the conflict.
• SEL
The role of an SEL (also referred to as an NSAP Selector or N-SEL) is similar to that of the "protocol
identifier" in IP. Each transport protocol corresponds to an SEL. In IP, the SEL is "00".
• NET
A Network Entity Title (NET) indicates the network layer information of an IS itself and consists of an
area ID and a system ID. It does not contain the transport layer information (SEL = 0). A NET can be
regarded as a special NSAP. The length of the NET field is the same as that of an NSAP, varying from 8
bytes to 20 bytes. For example, in NET ab.cdef.1234.5678.9abc.00, the area is ab.cdef, the system ID is
1234.5678.9abc, and the SEL is 00.
In general, an IS-IS process is configured with only one NET. When areas need to be redefined, for
example, areas need to be combined or an area needs to be divided into sub-areas, you can configure
multiple NETs.
A maximum of three area addresses can be configured in an IS-IS process, and therefore, you can configure only a
maximum of three NETs. When you configure multiple NETs, ensure that their system IDs are the same.
The Routers in an area must have the same area address.
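Based on the structure described above, a NET can be split into its parts as follows (an illustrative sketch; the function name is hypothetical):

```python
def parse_net(net: str):
    """Split a NET into (area address, system ID, SEL)."""
    # Drop the dots; the last byte (2 hex digits) is the SEL, the
    # preceding 6 bytes (12 hex digits) are the system ID, and the
    # remaining leading digits form the area address.
    digits = net.replace(".", "")
    area, system_id, sel = digits[:-14], digits[-14:-2], digits[-2:]
    sys_fmt = ".".join(system_id[i:i + 4] for i in range(0, 12, 4))
    return area, sys_fmt, sel

print(parse_net("ab.cdef.1234.5678.9abc.00"))
# ('abcdef', '1234.5678.9abc', '00')
```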
• Broadcast network
Related Concepts
DIS and Pseudo Node
A Designated Intermediate System (DIS) is an intermediate router elected in IS-IS communication. A pseudo
node simulates a virtual node on a broadcast network and is not a real router. In IS-IS, a pseudo node is
identified by the system ID and 1-byte circuit ID (a non-zero value) of a DIS.
The DIS is used to create and update pseudo nodes and generate the link state protocol data units (LSPs) of
pseudo nodes. The routers advertise a single link to a pseudo node and obtain routing information about the
entire network through the pseudo node. The router does not need to exchange packets with all the other
routers on the network. Using the DIS and pseudo nodes simplifies network topology and reduces the length
of LSPs generated by routers. When the network changes, fewer LSPs are generated. Therefore, fewer
resources are consumed.
SPF Algorithm
The SPF algorithm, also named Dijkstra's algorithm, is used in a link-state routing protocol to calculate the
shortest paths to other nodes on a network. In the SPF algorithm, a local router takes itself as the root and
generates a shortest path tree (SPT) based on the network topology to calculate the shortest path to every
destination node on a network. In IS-IS, the SPF algorithm runs separately in Level-1 and Level-2 databases.
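As a sketch, the SPF calculation over an adjacency map derived from an LSDB can be expressed with a standard Dijkstra implementation (node names and costs below are illustrative):

```python
import heapq

def spf(adj, root):
    """Return the shortest-path cost from root to every reachable node.

    adj: {node: [(neighbor, link_cost), ...]} built from the LSDB.
    """
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, cost in adj.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

adj = {"A": [("B", 10), ("C", 10)], "B": [("D", 10)],
       "C": [("D", 50)], "D": []}
print(spf(adj, "A"))  # {'A': 0, 'B': 10, 'C': 10, 'D': 20}
```

On a Level-1-2 router, this calculation runs once against the Level-1 LSDB and once against the Level-2 LSDB.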
Implementation
All routers on the IS-IS network communicate through the following steps:
• LSDB Synchronization
• Route Calculation
Device A, Device B, Device C, and Device D are Level-2 routers. Device A is newly added to the
broadcast network. Figure 2 demonstrates the process of establishing the neighbor relationship between
Device A and Device B; the process of establishing a neighbor relationship between Device A and
Device C or Device D is similar.
As shown in Figure 2, the process for establishing a neighbor relationship on a broadcast link consists of
the following phases:
■ Device A broadcasts a Level-2 local area network (LAN) IS-to-IS Hello PDU (IIH). After Device B
receives the IIH, Device B detects that the neighbor field in the IIH does not contain its media
access control (MAC) address, and sets its neighbor status with Device A to Initial.
■ Device B replies with a Level-2 LAN IIH to Device A. After Device A receives the IIH, Device A detects
that the neighbor field in the IIH contains its MAC address, and sets its neighbor status with Device
B to Up.
■ Device A sends a Level-2 LAN IIH to Device B. After Device B receives the IIH, Device B detects that
the neighbor field in the IIH contains its MAC address, and sets its neighbor status with Device A to
Up.
DIS Election
On a broadcast network, any two routers exchange information. If n routers are available on the
network, n x (n - 1)/2 adjacencies must be established. Each status change of a router is transmitted to
other routers, which wastes bandwidth resources. IS-IS resolves this problem by introducing the DIS. All
routers send information to the DIS, which then broadcasts the network link status. Using the DIS and
pseudo nodes simplifies network topology and reduces the length of LSPs generated by routers. When
the network changes, fewer LSPs are generated. Therefore, fewer resources are consumed.
A DIS is elected after a neighbor relationship is established. Level-1 and Level-2 DISs are elected
separately. You can configure different priorities for DISs at different levels. In DIS election, a Level-1
priority and a Level-2 priority are specified for every interface on every router. A router uses every
interface to send IIHs and advertises its priorities in the IIHs to neighboring routers. The higher the
priority, the higher the probability of being elected as the DIS. If there are multiple routers with the
same highest priority on a broadcast network, the one with the largest MAC address is elected. The DISs
at different levels can be the same router or different routers.
In the DIS election procedure, IS-IS is different from Open Shortest Path First (OSPF). In IS-IS, DIS
election rules are as follows:
■ The router with the priority of 0 also takes part in the DIS election.
■ When a new router that meets the requirements of being a DIS is added to the broadcast network,
the router is selected as the new DIS, which triggers a new round of LSP flooding.
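The election rules above can be condensed into a comparison key (a simplified sketch; the MAC string format is illustrative):

```python
def elect_dis(candidates):
    """Elect the DIS from (priority, mac) tuples.

    Highest priority wins; ties are broken by the numerically largest
    MAC address. Unlike an OSPF DR election, a priority of 0 still
    participates.
    """
    return max(candidates,
               key=lambda c: (c[0], int(c[1].replace("-", ""), 16)))

routers = [(64, "00e0-fc12-3456"), (64, "00e0-fc12-9999"),
           (0, "00e0-fcff-ffff")]
print(elect_dis(routers))  # (64, '00e0-fc12-9999')
```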
The establishment of a neighbor relationship on a P2P link is different from that on a broadcast link. A
neighbor relationship on a P2P link can be established in 2-way or 3-way mode, as shown in Table 1. By
default, the 3-way handshake mechanism is used to establish a neighbor relationship on a P2P link.
LSDB Synchronization
IS-IS is a link-state protocol. An IS-IS router obtains first-hand information from other routers running link-
state protocols. Every router generates information about itself, directly connected networks, and links
between itself and directly connected networks. The router then sends the generated information to other
routers through adjacent routers. Every router saves link state information without modifying it. Finally,
every router has the same network interworking information, and LSDB synchronization is complete. The
process of synchronizing LSDBs is called LSP flooding. In LSP flooding, a router sends an LSP to its neighbors
and the neighbors send the received LSP to their neighbors except the router that first sends the LSP. The
LSP is flooded among the routers at the same level. This implementation allows each router at the same
level to have the same LSP information and keep a synchronized LSDB.
All routers in the IS-IS routing domain can generate LSPs, for example, when a neighbor relationship or an
interface changes or when an LSP is periodically refreshed. On a broadcast link, the DIS processes a received
LSP as follows:
1. When the DIS receives an LSP, it searches the LSDB for the related records. If the DIS does not
find the LSP in its LSDB, it adds the LSP to its LSDB and broadcasts the new LSDB.
2. If the sequence number of the received LSP is greater than that of the local LSP, the DIS replaces
the local LSP with the received LSP in the LSDB and broadcasts the new LSDB.
3. If the sequence number of the received LSP is less than that of the local LSP, the DIS sends the
local LSP in the LSDB to the inbound interface.
4. If the sequence number of the received LSP is equal to that of the local LSP, the DIS compares the
Remaining Lifetimes of the two LSPs. If the Remaining Lifetime of the received LSP is 0, the DIS
replaces the local LSP with the received LSP and broadcasts the new LSDB. If the Remaining Lifetime
of the local LSP is 0, the DIS sends the local LSP to the inbound interface.
5. If the sequence number of the received LSP and the local LSP in the LSDB are the same and
neither Remaining Lifetime is 0, the DIS compares the checksum of the two LSPs. If the received
LSP has a greater checksum than that of the local LSP in the LSDB, the DIS replaces the local LSP
in the LSDB with the received LSP and advertises the new LSDB. If the received LSP has a smaller
checksum than that of the local LSP in the LSDB, the DIS sends the local LSP in the LSDB to the
inbound interface.
6. If the checksums of the received LSP and the local LSP are the same, the LSP is not forwarded.
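The six rules above amount to a three-level comparison of sequence number, Remaining Lifetime, and checksum, which can be sketched as follows (field and action names are illustrative):

```python
from collections import namedtuple

LSP = namedtuple("LSP", "seq lifetime checksum")

def dis_action(local, received):
    """Decide how a DIS handles a received LSP relative to its LSDB copy."""
    if local is None:
        return "add-and-flood"  # rule 1: not in the LSDB yet
    if received.seq != local.seq:  # rules 2 and 3
        return "add-and-flood" if received.seq > local.seq else "reply-local"
    if received.lifetime == 0 and local.lifetime != 0:  # rule 4
        return "add-and-flood"
    if local.lifetime == 0 and received.lifetime != 0:
        return "reply-local"
    if received.checksum != local.checksum:  # rule 5
        return ("add-and-flood" if received.checksum > local.checksum
                else "reply-local")
    return "ignore"  # rule 6: identical LSPs

print(dis_action(LSP(5, 1200, 0x1A), LSP(6, 1200, 0x1A)))  # add-and-flood
```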
On a P2P link, a router processes a received LSP as follows:
1. If the sequence number of the received LSP is greater than that of the local LSP in the LSDB, the
router adds the received LSP to its LSDB. The router then sends a PSNP packet to acknowledge
the received LSP and sends the LSP to all its neighbors except the neighbor that sent the LSP.
2. If the sequence number of the received LSP is less than that of the local LSP, the router directly
sends its LSP to the neighbor and waits for a PSNP from the neighbor as an acknowledgement.
3. If the sequence number of the received LSP is the same as that of the local LSP in the LSDB, the
router compares the Remaining Lifetimes of the two LSPs. If Remaining Lifetime of the received
LSP is 0, the router adds the LSP to its LSDB. The router then sends a PSNP to acknowledge the
received LSP. If Remaining Lifetime of the local LSP is 0, the router directly sends the local LSP to
the neighbor and waits for a PSNP from the neighbor.
4. If the sequence number of the received LSP and the local LSP in the LSDB are the same, and
neither Remaining Lifetime is 0, the router compares the checksum of the two LSPs. If the
received LSP has a greater checksum than that of the local LSP, the router adds the received LSP
to its LSDB. The router then sends a PSNP to acknowledge the received LSP. If the received LSP
has a smaller checksum than that of the local LSP, the router directly sends the local LSP to the
neighbor and waits for a PSNP from the neighbor. Finally, the router sends the LSP to all its
neighbors except the neighbor that sent the LSP.
5. If the checksums of the received LSP and the local LSP are the same, the LSP is not forwarded.
Route Calculation
When LSDB synchronization is complete and network convergence is implemented, IS-IS performs SPF
calculation by using LSDB information to obtain the SPT. IS-IS uses the SPT to create a forwarding database
(a routing table).
In IS-IS, link costs are used to calculate shortest paths. The default cost for an interface on a Huawei router
is 10. The cost is configurable. The cost of a route is the sum of the cost of every outbound interface along
the route. There may be multiple routes to a destination, among which the route with the smallest cost is
the optimal route.
Level-1 routers can also calculate the shortest path to Level-2 routers to implement inter-area route
selection. When a Level-1-2 router is connected to other areas, the router sets the value of the attachment
(ATT) bit in its LSP to 1 and sends the LSP to neighboring routers. In the route calculation process, a Level-1
router selects the nearest Level-1-2 router as an intermediate router between the Level-1 and Level-2 areas.
To optimize IS-IS networks and facilitate traffic management, more precise route control is required. IS-IS
uses the following methods to control routing information:
• Route Leaking
• Route Summarization
• Load Balancing
• Administrative Tag
• Link-group
Route Leaking
When Level-1 and Level-2 areas both exist on an IS-IS network, Level-2 routers do not advertise the learned
routing information about a Level-1 area and the backbone area to any other Level-1 area by default.
Therefore, Level-1 routers do not know the routing information beyond the local area. As a result, the Level-
1 routers cannot select the optimal routes to the destination beyond the local area.
With route leaking, Level-1-2 routers can use routing policies or tags to select routes learned from other
Level-1 areas and the backbone area and advertise the selected routes to the local Level-1 area. Figure 1
shows the typical networking for route leaking.
• Device A, Device B, Device C, and Device D belong to area 10. Device A and Device B are Level-1
routers. Device C and Device D are Level-1-2 routers.
If Device A sends a packet to Device F, the selected optimal route should be Device A -> Device B -> Device
D -> Device E -> Device F because its cost is 40 (10 + 10 + 10 + 10 = 40) which is less than that of Device A -
> Device C -> Device E -> Device F (10 + 50 + 10 = 70). However, if you check routes on Device A, you can
find that the selected route is Device A -> Device C -> Device E -> Device F, which is not the optimal route
from Device A to Device F.
This is because Device A does not know the routes beyond the local area, and therefore, the packets sent by
Device A to other network segments are sent through the default route generated by the nearest Level-1-2
device.
In this case, you can enable route leaking on the Level-1-2 devices (Device C and Device D). Then, check the
route and you can find that the selected route is Device A -> Device B -> Device D -> Device E -> Device F.
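The cost arithmetic in this example can be verified with a quick sketch (the hop costs are taken from the description above):

```python
# Outbound-interface costs along each candidate path
paths = {
    "A->B->D->E->F": [10, 10, 10, 10],
    "A->C->E->F":    [10, 50, 10],
}
costs = {name: sum(hops) for name, hops in paths.items()}
best = min(costs, key=costs.get)
print(costs)  # {'A->B->D->E->F': 40, 'A->C->E->F': 70}
print(best)   # A->B->D->E->F
```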
Route Summarization
On a large-scale IS-IS network, links connected to devices within an IP address range may alternate between
up and down. With route summarization, multiple routes with the same IP prefix are summarized into one
route, which prevents route flapping, reduces routing entries and system resource consumption, and
facilitates route management. Figure 2 shows the typical networking.
• Device A, Device B, and Device C use IS-IS to communicate with each other.
• Device A belongs to area 20, and Device B and Device C belong to area 10.
• Device B maintains Level-1 and Level-2 LSDBs and leaks the routes to three network segments
(172.16.1.0/24, 172.16.2.0/24, and 172.16.3.0/24) from the Level-1 area to the Level-2 area. If a link
fault causes the Device C interface with IP address 172.16.1.1/24 to frequently alternate between up
and down, the state change is advertised to the Level-2 area, triggering frequent LSP flooding and SPF
calculation on Device A. As a result, the CPU usage on Device A increases, and even network flapping
occurs.
If Device B is configured to summarize routes to the three network segments in the Level-1 area into
route 172.16.0.0/22, the number of routing entries on Device B is reduced; in addition, the impact of link
state changes in the Level-1 area on route convergence in the Level-2 area can be reduced.
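That the three /24 prefixes fall within the 172.16.0.0/22 summary can be checked with Python's standard ipaddress module (a verification sketch, not device configuration):

```python
import ipaddress

specific = [ipaddress.ip_network(n) for n in
            ("172.16.1.0/24", "172.16.2.0/24", "172.16.3.0/24")]
summary = ipaddress.ip_network("172.16.0.0/22")

# Each specific prefix is covered by the summary, so Device B can
# advertise the single /22 into the Level-2 area instead of three /24s.
print(all(net.subnet_of(summary) for net in specific))  # True
```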
Load Balancing
When multiple equal-cost routes are available on a network, you can configure IS-IS load balancing to
improve link utilization and prevent network congestion caused by link overload. IS-IS load balancing evenly
distributes traffic among multiple equal-cost paths. Figure 3 shows the typical networking for load
balancing.
• Device A, Device B, Device C, and Device D communicate with each other on an IP network using IS-IS.
• Device A, Device B, Device C, and Device D belong to area 10 and are Level-2 routers.
• If load balancing is not enabled, traffic on Device A is transmitted along the optimal route obtained
through SPF calculation. Consequently, traffic on different links is unbalanced. Enabling load balancing
on Device A distributes traffic to Device D through both Device B and Device C, relieving the load on
the optimal route.
Load balancing supports per-packet load balancing and per-flow load balancing. For details, see NE40E
Feature Description - IP Routing.
IS-IS supports not only intra-process load balancing, but also inter-process load balancing when equal-cost
routes exist between different processes.
Administrative Tag
Administrative tags carry administrative information about IP address prefixes. When the cost type is wide,
wide-compatible, or compatible and the prefix of the reachable IP address to be advertised by IS-IS has this
cost type, IS-IS adds the administrative tag to the reachability type-length-value (TLV) in the prefix. In this
manner, the administrative tag is advertised to the entire routing domain along with the prefix so that
routes can be imported or filtered based on the administrative tag.
Link-group
In Figure 4, Router A is dual-homed to the IS-IS network through Router B and Router C. The path Router A
-> Router B is primary and the path Router A -> Router C is backup. The bandwidth of each link is 100
Gbit/s, and the traffic from Client is transmitted at 150 Gbit/s. In this situation, both links in the path Router
A -> Router B or the path Router A -> Router C need to carry the traffic. If Link-a fails, Link-b takes over all
the traffic. However, the bandwidth of Link-b is not sufficient to carry the traffic. As a result, traffic loss
occurs.
To address this problem, configure link groups. You can add multiple links to a link group. If one of the links
fails and the bandwidth of the other links in the group is insufficient to carry the traffic, the link group
automatically increases the costs of the other links to a configured value so that this link group is not
selected. Traffic is then switched to another link group.
In Figure 4, Link-a and Link-b belong to link group 1, and Link-c and Link-d belong to link group 2.
• If Link-a fails, link group 1 automatically increases the cost of Link-b so that the traffic is switched to
Link-c and Link-d.
• If both Link-a and Link-c fail, the link groups increase the costs of Link-b and Link-d (to the same value)
so that Link-b and Link-d load-balance the traffic.
Background
If the status of an interface carrying IS-IS services alternates between Up and Down, IS-IS neighbor
relationship flapping occurs on the interface. During the flapping, IS-IS frequently sends Hello packets to
reestablish the neighbor relationship, synchronizes LSDBs, and recalculates routes. In this process, a large
number of packets are exchanged, adversely affecting neighbor relationship stability, IS-IS services, and other
IS-IS-dependent services, such as LDP and BGP. IS-IS neighbor relationship flapping suppression can address
this problem by delaying IS-IS neighbor relationship reestablishment or preventing service traffic from
passing through flapping links.
Related Concepts
Flapping_event: reported when the status of a neighbor relationship on an interface last changes from Up
to Init or Down. The flapping_event triggers flapping detection.
Flapping_count: number of times flapping has occurred.
Detect-interval: interval at which flapping is detected. The interval is used to determine whether to trigger a
valid flapping_event.
Threshold: flapping suppression threshold. When the flapping_count exceeds the threshold, flapping
suppression takes effect.
Resume-interval: interval used to determine whether flapping suppression exits. If the interval between two
valid flapping_events is longer than the resume-interval, flapping suppression exits.
Implementation
Flapping detection
IS-IS interfaces start a flapping counter. If the interval between two flapping_events is shorter than the
detect-interval, a valid flapping_event is recorded, and the flapping_count increases by 1. When the
flapping_count exceeds the threshold, the system determines that flapping occurs, and therefore triggers
flapping suppression, and sets the flapping_count to 0. If the interval between two valid flapping_events is
longer than the resume-interval before the flapping_count reaches the threshold again, the system sets the
flapping_count to 0 again. Interfaces start the suppression timer when the status of a neighbor relationship
last changes to Init or Down.
The detect-interval, threshold, and resume-interval are configurable.
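The counting rules above can be modeled as follows (a simplified sketch; real device behavior may differ in details such as timer granularity):

```python
class FlapDetector:
    """Per-interface flapping counter."""

    def __init__(self, detect_interval, threshold, resume_interval):
        self.detect_interval = detect_interval
        self.threshold = threshold
        self.resume_interval = resume_interval
        self.count = 0
        self.last_event = None  # time of the previous flapping_event
        self.last_valid = None  # time of the previous valid flapping_event

    def flap(self, now):
        """Record a flapping_event; return True if suppression triggers."""
        suppress = False
        if (self.last_event is not None
                and now - self.last_event < self.detect_interval):
            # Short enough interval: this is a valid flapping_event
            if (self.last_valid is not None
                    and now - self.last_valid > self.resume_interval):
                self.count = 0  # long quiet period: restart counting
            self.count += 1
            self.last_valid = now
            if self.count > self.threshold:
                suppress = True
                self.count = 0
        self.last_event = now
        return suppress

d = FlapDetector(detect_interval=10, threshold=2, resume_interval=100)
print([d.flap(t) for t in (0, 5, 9, 12)])  # [False, False, False, True]
```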
Flapping suppression
Flapping suppression works in either Hold-down or Hold-max-cost mode.
• Hold-down mode: In the case of frequent flooding and topology changes during neighbor relationship
establishment, interfaces prevent neighbor relationships from being reestablished during the
suppression period, which minimizes LSDB synchronization attempts and packet exchanges.
• Hold-max-cost mode: If the traffic forwarding path changes frequently, interfaces use the maximum
cost of the flapping link during the suppression period, which prevents traffic from passing through the
flapping link.
Flapping suppression can also work first in Hold-down mode and then in Hold-max-cost mode.
By default, the Hold-max-cost mode takes effect. The mode and suppression period can be changed
manually.
When an interface enters the flapping suppression state, all neighbor relationships on the interface enter the state
accordingly.
• Three Hello packets in which the padding TLV carries a sub-TLV with the value being 251 are sent
consecutively to notify the peer device to forcibly exit flapping suppression.
• If flapping suppression works in Hold-down mode, the neighbor relationship between Device B and
Device C is prevented from being reestablished during the suppression period, in which traffic is
forwarded along the path Device A -> Device B -> Device D -> Device E.
• If flapping suppression works in Hold-max-cost mode, the maximum cost is used as the cost of the link
between Device B and Device C during the suppression period, and traffic is forwarded along the path
Device A -> Device B -> Device D -> Device E.
Broadcast scenario
In Figure 3, four devices are deployed on the same broadcast network using switches, and the devices are
broadcast network neighbors. If Device C flaps due to a link failure, and Device A and Device B were
deployed at different times (for example, Device A was deployed earlier) or the flapping suppression
parameters on Device A and Device B are different, Device A first detects the flapping and suppresses Device
C. Consequently, the Hello packets sent by Device A do not carry Device C's router ID. However, Device B has
not detected the flapping yet and still considers Device C a valid node. As a result, the DIS candidates
identified by Device A are Device B and Device D, whereas the DIS candidates identified by Device B are
Device A, Device C, and Device D. Different DIS candidates result in different DIS election results, which may
lead to route calculation errors. To prevent this problem in scenarios where an interface has multiple
neighbors, such as on a broadcast, P2MP, or NBMA network, all neighbors on the interface are suppressed
when the status of a neighbor relationship last changes to Init or Down. Specifically, if Device C flaps,
Device A, Device B, and Device D on the broadcast network are all suppressed. After the network stabilizes
and the suppression timer expires, Device A, Device B, and Device D are restored to normal status.
By default, the Hold-max-cost mode takes effect. The mode can be changed to Hold-down manually.
Scenario with both LDP-IGP synchronization and IS-IS neighbor relationship flapping suppression
configured
In Figure 5, if the link between PE1 and P1 fails, an LDP LSP switchover is implemented immediately, causing
the original LDP LSP to be deleted before a new LDP LSP is established. To prevent traffic loss, LDP-IGP
synchronization needs to be configured. With LDP-IGP synchronization, the maximum cost is used as the cost
of the new LSP to be established. After the new LSP is established, the original cost takes effect.
Consequently, the original LSP is deleted, and LDP traffic is forwarded along the new LSP.
LDP-IGP synchronization and IS-IS neighbor relationship flapping suppression work in either Hold-down or
Hold-max-cost mode. If both functions are configured, Hold-down mode takes precedence over Hold-max-
cost mode, followed by the configured link cost. Table 1 lists the suppression modes that take effect in
different situations.
Table 1 Principles for selecting the suppression modes that take effect in different situations
For example, the link between PE1 and P1 frequently flaps in Figure 5, and both LDP-IGP synchronization
and IS-IS neighbor relationship flapping suppression are configured. In this case, the suppression mode is
selected based on the preceding principles. No matter which mode (Hold-down or Hold-max-cost) is
selected, the forwarding path is PE1 -> P4 -> P3 -> PE2.
Figure 5 Scenario with both LDP-IGP synchronization and IS-IS neighbor relationship flapping suppression
configured
Scenario with both bit-error-triggered protection switching and IS-IS neighbor relationship flapping
suppression configured
If a link has poor link quality, services transmitted along it may be adversely affected. If bit-error-triggered
protection switching is configured and the bit error rate (BER) along a link exceeds a specified value, a bit
error event is reported, and the maximum cost is used as the cost of the link, triggering route reselection.
Consequently, service traffic is switched to the backup link. If both bit-error-triggered protection switching
and IS-IS neighbor relationship flapping suppression are configured, they both take effect. Hold-down mode
takes precedence over Hold-max-cost mode, followed by the configured link cost.
Scenario with both Link-bundle and IS-IS neighbor relationship flapping suppression configured
When the service traffic rate exceeds the capacity of the link, multiple links must be used. If one of the links
between two devices is faulty, traffic is switched to another link. Because of limited forwarding capacity on
the new link, excessive traffic is discarded. If the number of faulty links reaches the upper threshold, the
maximum cost is used as the cost of all links in the link bundle to switch all service traffic to the backup
nodes. When both link-bundle and neighbor relationship flapping suppression are configured, if the number
of flapping links reaches the upper threshold, the maximum cost must be configured as the cost of all other
links in the link bundle to prevent service loss caused by user traffic congestion. As shown in Figure 6, two
parallel links exist between Device A and Device C. If Link 1 is faulty and Link 2 bears all service traffic,
traffic loss occurs. If both link-bundle and neighbor relationship flapping suppression are configured and Link
1 flaps, the maximum cost must be configured for Link 2 to avoid service traffic congestion. Only the Hold-
max-cost mode therefore can be configured for neighbor relationship flapping suppression to switch the
traffic forwarding path to Device A->Device B->Device C.
Figure 6 Scenario with both Link-bundle and IS-IS neighbor relationship flapping suppression configured
If a system fails to store new LSPs for LSDB synchronization, the routes calculated by the system are
incorrect. In that case, the system enters the Overload state. The user can configure the device to enter the
Overload state when the system lacks sufficient memory. At present, users can set the Overload timer when
IS-IS is started and configure whether to delete the leaked routes and whether to advertise the imported
routes. A device enters the Overload state after an exception occurs on the device or when it is configured to
enter the state.
• If IS-IS enters the Overload state after an exception occurs on the device, the system deletes all
imported or leaked routes.
• If IS-IS enters the Overload state based on a user configuration, the system only deletes all imported or
leaked routes if configured to do so.
Although LSPs with overload fields are flooded throughout the network, they are ignored in the calculation
of the routes passing through the device in the Overload state. Specifically, after the overload field of LSPs is
configured on a device, other devices do not count the routes that pass through the device when performing
SPF calculation, but the direct routes between the device and other devices are still calculated.
If a device in an IS-IS domain is faulty, routes may be incorrectly calculated across the entire domain. The
overload field can be configured for the device to isolate it from the IS-IS network temporarily, which
prevents the faulty device from affecting route calculation across the domain.
• Intelligent timer
The first timeout period of the timer is fixed. If an event that triggers the timer occurs before the set
timer expires, the next timeout period of the timer increases.
The intelligent timer applies to LSP generation and SPF calculation.
I-SPF
In ISO 10589, the Dijkstra algorithm was adopted to calculate routes. When a node changes on the network,
the algorithm recalculates all routes. The calculation requires a long time to complete and consumes a
significant amount of CPU resources, reducing convergence speed.
I-SPF improves the algorithm. Except for the first time the algorithm is run, only the nodes that have
changed rather than all nodes in the network are used in the calculation. The SPT generated using I-SPF is
the same as that generated using the previous algorithm. This significantly decreases CPU usage and speeds
up network convergence.
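The full recalculation that I-SPF avoids repeating can be sketched as a standard Dijkstra run over the LSDB. The topology, node names, and costs below are illustrative only, not taken from any specific network:

```python
import heapq

def spf(adj, root):
    # Full SPF: Dijkstra over the link-state database. ISO 10589 reruns
    # this from scratch on every change; I-SPF limits recomputation to
    # the changed nodes but produces the same shortest path tree.
    dist, parent = {root: 0}, {root: None}
    pq = [(0, root)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    return dist, parent

# Illustrative four-node topology (names and costs are made up).
adj = {"A": [("B", 10), ("C", 10)], "B": [("D", 10)], "C": [("D", 10)]}
dist, parent = spf(adj, "A")   # dist["D"] == 20
```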
PRC
Similar to I-SPF, PRC calculates only routes that have changed. PRC, however, does not calculate the shortest
path. It updates routes based on the SPT calculated by I-SPF.
In route calculation, a leaf represents a route, and a node represents a device. Either an SPT change or a leaf
change causes a routing information change. The SPT change is irrelevant to the leaf change. PRC processes
routing information as follows:
• If the SPT changes after I-SPF calculation, PRC calculates all the leaves only on the changed node.
• If the SPT remains unchanged after I-SPF calculation, PRC calculates only the changed leaves.
For example, if a new route is imported, the SPT of the entire network remains unchanged. In this case, PRC
updates only the interface route for this node, thereby reducing the CPU usage.
PRC working with I-SPF further improves network convergence performance and replaces the original SPF
algorithm.
On the NE40E, only I-SPF and PRC are used to calculate IS-IS routes.
LSP fast flooding is supported by default and does not need to be configured.
Intelligent Timer
Although the route calculation algorithm is improved, the long interval for triggering route calculation also
affects the convergence speed. A millisecond-level timer can shorten the interval. Frequent network changes,
however, also consume too much CPU resources. The SPF intelligent timer can quickly respond to a few
external emergencies and avoid excessive CPU usage.
In most cases, an IS-IS network running normally is stable. Frequent network changes are rare, and IS-IS
does not need to calculate routes frequently. Therefore, a short period (within milliseconds) can be
configured as the first interval for route calculation. If the network topology changes frequently, the interval
set by the intelligent timer increases with the number of calculations to reduce CPU consumption.
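The back-off behavior can be sketched as follows. The initial, increment, and maximum values, and the doubling rule, are illustrative assumptions, not the device's actual parameters:

```python
def spf_delay_ms(trigger_count, init_ms=50, incr_ms=200, max_ms=5000):
    # First trigger fires after a short fixed delay so a single event
    # converges fast; each subsequent trigger within the hold period
    # doubles the delay up to a ceiling, capping CPU usage under churn.
    if trigger_count == 0:
        return init_ms
    return min(incr_ms * (2 ** (trigger_count - 1)), max_ms)

[spf_delay_ms(n) for n in range(6)]   # [50, 200, 400, 800, 1600, 3200]
```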
The LSP generation intelligent timer is similar to the SPF intelligent timer. When the LSP generation
intelligent timer expires, the system generates a new LSP based on the current topology. In the original
implementation mechanism, a timer with a fixed interval is used, which cannot meet the requirements of
fast convergence and low CPU usage at the same time. Therefore, the LSP generation timer is designed as
an intelligent timer to respond to emergencies (for example, the interface goes Up or Down) quickly and
speed up network convergence. In addition, when the network changes frequently, the interval for the
intelligent timer becomes longer to reduce CPU consumption.
Virtual system IDs can be configured, and virtual LSPs that carry routing information can be generated for
IS-IS.
IS-IS LSP fragment extension allows an IS-IS device to generate more LSP fragments and carry more IS-IS
information.
Terms
• Originating system
The originating system is a device that runs the IS-IS protocol. Both the originating system and its
virtual systems advertise LSPs, but the originating system is the real IS-IS process.
• Normal system ID
The normal system ID is the system ID of the originating system.
• Additional system ID
The additional system ID, assigned by the network administrator, is used to generate additional or
extended LSP fragments. A maximum of 256 additional or extended LSP fragments can be generated.
Like a normal system ID, an additional system ID must be unique in a routing domain.
• Virtual system
The virtual system, identified by an additional system ID, is used to generate extended LSP fragments.
These fragments carry additional system IDs in their LSP IDs.
Principles
IS-IS LSP fragments are identified by the LSP Number field in their LSP IDs. The LSP Number field is 1 byte.
Therefore, an IS-IS process can generate a maximum of 256 fragments. With fragment extension, more
information can be carried.
Each additional system ID identifies a virtual system, and each virtual system can generate 256 LSP
fragments. Because multiple virtual systems can be configured, an IS-IS process can generate far more LSP
fragments.
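The arithmetic is simple, as the following sketch shows; the maximum number of configurable virtual systems is platform-specific and not stated here:

```python
MAX_FRAGMENTS_PER_SYSTEM = 256   # the LSP Number field is 1 byte

def total_fragments(virtual_systems: int) -> int:
    # The originating system and every configured virtual system can
    # each generate up to 256 LSP fragments.
    return MAX_FRAGMENTS_PER_SYSTEM * (1 + virtual_systems)

total_fragments(2)   # 768 fragments with two virtual systems
```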
After a virtual system and fragment extension are configured, an IS-IS device adds the contents that cannot
be contained in its LSPs to the LSPs of the virtual system and notifies other devices of the relationship
between the virtual system and itself through a special TLV in the LSPs.
IS Alias ID TLV
Standard protocols define a special Type-Length-Value (TLV) for this purpose: the IS Alias ID TLV. Its Type
field is 1 byte; the value 24 identifies the IS Alias ID TLV.
LSPs with fragment number 0 sent by the originating system and virtual system carry IS Alias ID TLVs to
indicate the originating system.
Operation Modes
IS-IS devices can use the LSP fragment extension feature in the following modes:
• Mode-1
Mode-1 is used when some devices on the network do not support LSP fragment extension.
In this mode, virtual systems participate in SPF calculation. The originating system advertises LSPs
containing information about links to each virtual system and each virtual system advertises LSPs
containing information about links to the originating system. In this manner, the virtual systems
function the same as the actual devices connected to the originating system on the network.
Mode-1 is a transitional mode for earlier versions that do not support LSP fragment extension. In the
earlier versions, IS-IS cannot identify Alias ID TLVs. Therefore, the LSP sent by a virtual system must look
like a common IS-IS LSP.
The LSP sent by a virtual system contains the same area address and overload bit as those in the
common LSP. If the LSPs sent by a virtual system contain TLVs specified in other features, the TLVs must
be the same as those in common LSPs.
LSPs sent by a virtual system carry information of the neighbor (the originating system), and the carried
cost is the maximum value minus 1. LSPs sent by the originating system carry information of the
neighbor (the virtual system), and the carried cost is 0. This mechanism ensures that the virtual system
is a node downstream of the originating system when other devices calculate routes.
In Figure 1, Device B does not support LSP fragment extension; Device A supports LSP fragment
extension in mode-1; Device A1 and Device A2 are virtual systems of Device A. Device A1 and Device A2
send LSPs carrying partial routing information of Device A. After receiving LSPs from Device A, Device
A1, and Device A2, Device B considers there to be three devices at the peer end and calculates routes
normally. Because the cost of the route from Device A to Device A1 or Device A2 is 0, the cost of the
route from Device B to Device A is equal to that from Device B to Device A1.
• Mode-2
Mode-2 is used when all the devices on the network support LSP fragment extension. In this mode,
virtual systems do not participate in SPF calculation. All the devices on the network know that the LSPs
generated by the virtual systems actually belong to the originating system.
IS-IS working in mode-2 identifies IS Alias ID TLVs, which are used to calculate the SPT and routes.
In Figure 1, Device B supports LSP fragment extension, and Device A supports LSP fragment extension in
mode-2; Device A1 and Device A2 send LSPs carrying some routing information of Device A. After
receiving LSPs from Device A1 and Device A2, Device B obtains IS Alias ID TLV and learns that the
originating system of Device A1 and Device A2 is Device A. Device B then considers information
advertised by Device A1 and Device A2 to be about Device A.
Devices that support LSP fragment extension can resolve LSPs regardless of the mode. Devices that do not
support LSP fragment extension can resolve only mode-1 LSPs.
Process
After LSP fragment extension is configured, if information is lost because LSPs overflow, the system restarts
the IS-IS process. After being restarted, the originating system loads as much routing information as
possible. Any excessive information beyond the forwarding capability of the system is added to the LSPs of
the virtual systems for transmission. In addition, if a virtual system with routing information is deleted, the
system automatically restarts the IS-IS process.
Usage Scenario
If there are non-Huawei devices on the network, LSP fragment extension must be set to mode-1. Otherwise, these
devices cannot identify LSPs.
Configuring LSP fragment extension and virtual systems before setting up IS-IS neighbors or importing
routes is recommended. If IS-IS neighbors are set up or routes are imported first and the information to be
carried exceeds the forwarding capability of 256 fragments before LSP fragment extension and virtual
systems are configured, you have to restart the IS-IS process for the configurations to take effect.
• IPv6 Reachability
The IPv6 Reachability TLV indicates the reachability of a network by specifying the route prefix and
metric. The type value is 236 (0xEC).
• IPv6 Interface Address
The IPv6 Interface Address TLV is similar in function to the IPv4 IP Interface Address TLV, except that
it carries a 128-bit IPv6 address instead of a 32-bit IPv4 address. The type value is 232 (0xE8).
The NLPID is an 8-bit field that identifies network layer protocol packets. The NLPID of IPv6 is 142 (0x8E). If
an IS-IS router supports IPv6, it advertises routing information through the NLPID value.
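As an illustration of how a receiver learns this, the following sketch walks a PDU's TLV area and checks the Protocols Supported TLV (type 129) for the IPv6 NLPID. The sample buffer is fabricated:

```python
NLPID_IPV6 = 0x8E             # 142: network layer protocol ID for IPv6
PROTOCOLS_SUPPORTED = 129     # TLV listing the NLPIDs a router supports

def parse_tlvs(buf: bytes):
    # Walk a PDU's TLV area: 1-byte type, 1-byte length, then the value.
    i, tlvs = 0, []
    while i + 2 <= len(buf):
        t, length = buf[i], buf[i + 1]
        tlvs.append((t, buf[i + 2:i + 2 + length]))
        i += 2 + length
    return tlvs

# Fabricated TLV area: Protocols Supported listing IPv4 (0xCC) and IPv6.
tlvs = parse_tlvs(bytes([PROTOCOLS_SUPPORTED, 2, 0xCC, NLPID_IPV6]))
supports_ipv6 = any(t == PROTOCOLS_SUPPORTED and NLPID_IPV6 in v
                    for t, v in tlvs)   # True
```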
10.8.2.10 IS-IS TE
IS-IS Traffic Engineering (TE) allows MPLS to set up and maintain TE constraint-based routed label switched
paths (CR-LSPs).
To establish CR-LSPs, MPLS needs to learn the traffic attributes of all the links in the local area. MPLS can
acquire the TE information of the links through IS-IS.
Traditional routers select the shortest path as the primary route regardless of other factors, such as
bandwidth, even when the path is congested.
On the network shown in Figure 1, all the links have the same metric (10). The shortest path from DeviceA/
DeviceH to DeviceE is DeviceA/DeviceH → DeviceB → DeviceC → DeviceD → DeviceE. Data is forwarded
along this shortest path. Therefore, the link DeviceA (DeviceH) → DeviceB → DeviceC → DeviceD → DeviceE
may be congested whereas the link DeviceA/DeviceH → DeviceB → DeviceF → DeviceG → DeviceD → Device
E is idle.
To solve the preceding problem, you can adjust the link metric. For example, based on topology analysis, you
can adjust the metric of the link DeviceB → DeviceC to 30. In this manner, traffic can be diverted to the link
DeviceA/DeviceH → DeviceB → DeviceF → DeviceG → DeviceD → DeviceE.
This method eliminates the congestion on the link DeviceA/DeviceH → DeviceB → DeviceC → DeviceD →
DeviceE; however, the other link DeviceA/DeviceH → DeviceB → DeviceF → DeviceG → DeviceD → DeviceE
may be congested. In addition, on a network with complex topologies, it is difficult to adjust the metric
because the change in the metric of one link may affect multiple routes.
As an overlay model, MPLS can set up a virtual topology over the physical network topology and map traffic
to the virtual topology, effectively combining MPLS and TE technology into MPLS TE.
MPLS TE has advantages in solving the problem of network congestion. Through MPLS TE, carriers can
precisely control the path through which traffic passes, thus avoiding congested nodes. In addition, MPLS TE
reserves resources during tunnel establishment to ensure service quality.
To ensure service continuity, MPLS TE introduces the path backup and fast reroute (FRR) mechanisms to
switch traffic in time when a link is faulty. MPLS TE allows service providers (SPs) to fully utilize existing
network resources to provide diversified services. In addition, network resources can be optimized for
scientific network management.
To accomplish the preceding tasks, MPLS TE needs to learn TE information about all devices on the network.
MPLS TE itself, however, lacks a mechanism in which each device floods its TE information throughout the
entire network for synchronization, whereas IS-IS provides exactly such a flooding mechanism. Therefore,
MPLS TE can advertise and synchronize TE information with the help of IS-IS. To support MPLS TE, IS-IS
needs to be extended.
In brief, IS-IS TE collects TE information on IS-IS networks and then transmits the TE information to the
Constrained Shortest Path First (CSPF) module.
Basic Principles
IS-IS TE is an extension of IS-IS intended to support MPLS TE. As defined in standard protocols, IS-IS TE uses
LSPs to carry TE information to help MPLS implement the flooding, synchronization, and resolution of TE
information. Then, IS-IS TE transmits the resolved TE information to the CSPF module. In MPLS TE, IS-IS TE
plays the role of a porter. Figure 2 illustrates the relationships between IS-IS TE, MPLS TE, and CSPF.
To carry TE information in LSPs, IS-IS TE defines dedicated TLVs in standard protocols.
IS-IS TE extracts TE information from IS-IS LSPs and transmits the TE information to the CSPF module.
Usage Scenario
IS-IS TE helps MPLS TE set up TE tunnels. In Figure 3, a TE tunnel is set up between Device A and Device C.
• IS-IS and IS-IS TE are enabled on Device A, Device B, Device C, and Device D.
After the configurations are complete, IS-IS on Device A, Device B, Device C, and Device D sends LSPs
carrying TE information configured on each device. Device A obtains the MPLS TE configurations of DeviceB,
DeviceC, and DeviceD from the received LSPs. In this way, Device A obtains the TE information of the entire
network. The CSPF module can use the information to calculate the path required by the tunnel.
After IS-IS wide metric is enabled, TLV type 135 contains information about routes; TLV type 22 contains
information about IS-IS neighbors.
■ Extended IP Reachability TLV: replaces the earlier IP Reachability TLV and carries information about
routes. This TLV expands the range of the route cost to 4 bytes and carries sub-TLVs.
The metric style can be set to narrow, narrow-compatible, compatible, wide-compatible, or wide mode. Table 1 shows
which metric styles are carried in received and sent packets. A device can calculate routes only when it can receive, send,
and process corresponding TLVs. Therefore, to ensure correct data forwarding on a network, the proper metric style
must be configured for each device on the network.
Table 1 Metric styles carried in received and sent packets under different metric style configurations

Configured Metric Style    Metric Style Carried in Received Packets    Metric Style Carried in Sent Packets
narrow                     narrow                                      narrow
narrow-compatible          narrow and wide                             narrow
compatible                 narrow and wide                             narrow and wide
wide-compatible            narrow and wide                             wide
wide                       wide                                        wide
When the metric style is set to compatible, IS-IS sends the information both in narrow and wide modes.
Process
• If the metric style carried in sent packets is changed from narrow to wide:
The information previously carried by TLV type 128, TLV type 130, and TLV type 2 is now carried by TLV
type 135 and TLV type 22.
• If the metric style carried in sent packets is changed from wide to narrow:
The information previously carried by TLV type 135 and TLV type 22 is now carried by TLV type 128, TLV
type 130, and TLV type 2.
• If the metric style carried in sent packets is changed from narrow or wide to narrow and wide:
The information previously carried in narrow or wide mode is now carried by TLV type 128, TLV type
130, TLV type 2, TLV type 135, and TLV type 22.
Usage Scenario
IS-IS wide metric is used to support IS-IS TE, and the metric style needs to be set to wide, compatible, or
wide-compatible.
• Static BFD
In static BFD mode, BFD session parameters (including local and remote discriminators) are set using
commands, and requests must be delivered manually to establish BFD sessions.
• Dynamic BFD
In dynamic BFD mode, the establishment of BFD sessions is triggered by routing protocols.
BFD for IS-IS enables BFD sessions to be dynamically established. After detecting a fault, BFD notifies IS-IS of
the fault. IS-IS sets the neighbor status to Down, quickly updates link state protocol data units (LSPs), and
performs the partial route calculation (PRC). BFD for IS-IS implements fast IS-IS route convergence.
Instead of replacing the Hello mechanism of IS-IS, BFD works with IS-IS to rapidly detect the faults that occur on
neighboring devices or links.
A BFD session can be established only when the following conditions are met:
■ Global BFD is enabled on each device, and BFD is enabled on a specified interface or process.
■ Neighbors are Up, and a designated intermediate system (DIS) has been elected on a broadcast
network.
■ P2P network
After the conditions for establishing BFD sessions are met, IS-IS instructs the BFD module to
establish a BFD session and negotiate BFD parameters between neighbors.
■ Broadcast network
After the conditions for establishing BFD sessions are met and the DIS is elected, IS-IS instructs BFD
to establish a BFD session and negotiate BFD parameters between the DIS and each device. No
BFD sessions are established between non-DISs.
On broadcast networks, devices (including non-DIS devices) of the same level on a network segment
can establish adjacencies. In BFD for IS-IS, however, BFD sessions are established only between the DIS
and non-DISs. On P2P networks, BFD sessions are directly established between neighbors.
If a Level-1-2 neighbor relationship is set up between the devices on both ends of a link, the following
situations occur:
■ On a broadcast network, IS-IS sets up a Level-1 BFD session and a Level-2 BFD session.
■ On a P2P network, IS-IS sets up only one BFD session.
A BFD session is torn down in the following situations:
■ P2P network
If the neighbor relationship established between P2P IS-IS interfaces is not Up, IS-IS tears down the
BFD session.
■ Broadcast network
If the neighbor relationship established between broadcast IS-IS interfaces is not Up or the DIS is
reelected on the broadcast network, IS-IS tears down the BFD session.
If the configurations of dynamic BFD sessions are deleted or BFD for IS-IS is disabled from an interface,
all Up BFD sessions established between the interface and its neighbors are deleted. If the interface is a
DIS and the DIS is Up, all BFD sessions established between the interface and its neighbors are deleted.
If BFD is disabled from an IS-IS process, BFD sessions are deleted from the process.
BFD detects only the one-hop link between IS-IS neighbors because IS-IS establishes only one-hop neighbor
relationships.
Usage Scenario
Dynamic BFD needs to be configured based on the actual network. If the time parameters are not configured correctly,
network flapping may occur.
BFD for IS-IS speeds up route convergence through rapid link failure detection. The following is a networking
example for BFD for IS-IS.
If the link between Device A and Device B fails, BFD can rapidly detect the fault and report it to IS-IS. IS-IS
sets the neighbor status to Down to trigger an IS-IS topology calculation. IS-IS also updates LSPs so that
Device C can promptly receive the updated LSPs from Device B, which accelerates network topology
convergence.
Context
As networks develop, services such as Voice over IP (VoIP) and online video services require high-quality and
real-time transmission. However, if a link fails, IS-IS must complete the following procedure before switching
traffic to a new link: detect the fault, update LSPs, flood LSPs, calculate routes, and deliver route entries to
the FIB. This is a lengthy process, and the associated traffic interruption is often longer than users can
tolerate. As a result, real-time transmission requirements cannot be met.
IS-IS Auto fast reroute (FRR) is a dynamic IP FRR technology that minimizes traffic loss: if a link or
adjacent-node failure is detected, traffic is immediately switched to the backup link that the IGP
pre-computed based on the LSDBs of the entire network and stored in the FIB. Because the switchover
occurs before route convergence completes, IP FRR is becoming increasingly popular with carriers.
Major Auto FRR techniques include loop-free alternate (LFA), U-turn, Not-Via, TI-LFA, Remote LFA, and
MRT, among which IS-IS supports only LFA, TI-LFA, and Remote LFA.
Related Concepts
LFA
LFA is an IP FRR technology that calculates the shortest path from the neighbor that can provide a backup
link to the destination node based on the Shortest Path First (SPF) algorithm. Then, a loop-free backup link
with the smallest cost is calculated according to the following inequality:
Distance_opt (N, D) < Distance_opt (N, S) + Distance_opt (S, D). In the inequality, S, D, and N indicate the
source node, destination node, and a node on the backup link, respectively, and Distance_opt (X, Y) indicates
the shortest distance from node X to node Y.
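The inequality can be evaluated directly from the shortest distances, as in this minimal sketch; the node names and costs are hypothetical:

```python
def is_loop_free_alternate(dist, n, s, d):
    # Link-protection LFA condition:
    #   Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D)
    # If it holds, neighbor N never loops traffic for D back through S.
    return dist[(n, d)] < dist[(n, s)] + dist[(s, d)]

# Hypothetical triangle: direct link N-D costs 15, S-N and S-D cost 10.
dist = {("N", "D"): 15, ("N", "S"): 10, ("S", "D"): 10}
is_loop_free_alternate(dist, "N", "S", "D")   # True: 15 < 10 + 10
```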
P space
P space consists of the nodes that can be reached, over the SPT rooted at the source node of the
primary link, without traversing the primary link.
Extended P space
Extended P space consists of the nodes that can be reached, over the SPTs rooted at the neighbors of
the primary link's source node, without traversing the primary link.
Q space
Q space consists of the nodes from which the destination node of the primary link can be reached, over
the reverse SPT rooted at the destination node, without traversing the primary link.
PQ node
A PQ node exists both in the extended P space and Q space and is used by Remote LFA as the destination of
a protection tunnel.
Remote LFA
LFA FRR cannot be used to calculate backup links on large-scale networks, especially on ring networks.
Remote LFA Auto FRR addresses this problem by calculating a PQ node and establishing a tunnel between
the source node of a primary link and the PQ node. If the primary link fails, traffic can be automatically
switched to the tunnel, which improves network reliability.
When calculating an RLFA FRR backup path, a Huawei device calculates the extended P space by default.
TI-LFA
In some LFA FRR and RLFA scenarios, the extended P space and Q space neither intersect nor have direct
neighbors. Consequently, no backup path can be computed, failing to meet reliability requirements. TI-LFA
solves this problem by computing the extended P space, Q space, and post-convergence SPT based on the
protected path, computing a scenario-specific repair list, and establishing an SR tunnel from the source node
to a P node and then to a Q node to offer alternate next hop protection. If the protected link fails, traffic is
automatically switched to the backup path, improving network reliability.
When computing a TI-LFA FRR backup path, Huawei devices compute the extended P space by default.
• Link protection: Link protection applies to traffic transmitted over specified links.
In the example network shown in Figure 1, traffic flows from DeviceS to DeviceD, and the link cost
meets the preceding link protection inequality. If the primary link (DeviceS -> DeviceD) fails, DeviceS
switches the traffic to the backup link (DeviceS -> DeviceN -> DeviceD), minimizing traffic loss.
• Node-and-link protection: Node-and-link protection applies to traffic transmitted over specified nodes
or links. Figure 2 illustrates the networking. Node-and-link protection takes precedence over link
protection.
Node-and-link protection takes effect when the following conditions are met:
1. The link cost satisfies the inequality: Distance_opt (N, D) < Distance_opt (N, S) + Distance_opt (S,
D).
2. The interface cost of the device satisfies the inequality: Distance_opt (N, D) < Distance_opt (N, E)
+ Distance_opt (E, D).
S indicates the source node of traffic, E indicates the faulty node, N indicates the node on the
backup link, and D indicates the destination node of traffic.
Similar to IS-IS LFA Auto FRR, Remote LFA is also classified as link protection or node-and-link protection.
The following example shows how Remote LFA works to protect against link failures:
In Figure 3, traffic flows through PE1 -> P1 -> P2 -> PE2. To prevent traffic loss in the case of a failure on the
link between P1 and P2, remote LFA calculates a PQ node (P4) and establishes a Label Distribution Protocol
(LDP) tunnel between P1 and P4. If P1 detects a failure on the link to P2, P1 encapsulates packets into MPLS
packets and forwards the MPLS packets to P4. After receiving the packets, P4 removes the MPLS label from
them, searches its IP routing table for a next hop, and forwards the packets accordingly. In this way, the packets
finally reach PE2. Remote LFA ensures uninterrupted traffic forwarding.
On the network shown in Figure 3, Remote LFA calculates the PQ node as follows:
1. Calculates an SPT with each of P1's neighbors (PE1 and P3, excluding the neighbors on the protection
link) as the root. For each SPT, an extended P space is composed of the root node and those reachable
nodes that belong to the SPT but do not pass through the P1→P2 link. When PE1 is used as a root
node for calculation, the extended P space {PE1, P1, P3} is obtained. When P3 is used as a root node
for calculation, the extended P space {PE1, P1, P3, P4} is obtained. By combining the two extended P
spaces, the final extended P space {PE1, P1, P3, P4} is obtained.
2. Calculates a reverse SPT with P2 as the root. The Q space is {P2, PE2, P4}.
3. Determines the PQ node that is in both the extended P space and Q space. Therefore, the PQ node is
P4 in this example.
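The three steps above can be sketched with two Dijkstra runs per root: a node belongs to the space rooted at a given node if removing the protected link leaves its shortest distance from that root unchanged. The link costs below are hypothetical, chosen so the spaces match the example, and equal-cost ties are ignored for simplicity:

```python
import heapq

def dijkstra(adj, src):
    # Shortest distances from src over an undirected weighted graph.
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def build(links):
    adj = {}
    for a, b, w in links:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    return adj

# Hypothetical link costs chosen so the spaces match the example.
links = [("PE1", "P1", 10), ("P1", "P2", 10), ("P2", "PE2", 10),
         ("P1", "P3", 10), ("P3", "P4", 20), ("P4", "P2", 10)]
protected = {"P1", "P2"}

full = build(links)
cut = build([l for l in links if {l[0], l[1]} != protected])

def space(root):
    # A node is in the space rooted at `root` if removing the protected
    # link does not change its shortest distance from the root.
    d_full, d_cut = dijkstra(full, root), dijkstra(cut, root)
    return {n for n in d_full if d_cut.get(n) == d_full[n]}

ext_p = space("PE1") | space("P3")   # extended P space: {PE1, P1, P3, P4}
q_space = space("P2")                # Q space: {P2, PE2, P4}
pq_nodes = ext_p & q_space           # {P4}: the PQ node
```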
IPv6 IS-IS Remote LFA Auto FRR protects IPv6 traffic and uses IPv4 LDP LSPs. The principle of IPv6 IS-IS Remote LFA
Auto FRR is similar to that of IPv4 IS-IS Remote LFA Auto FRR.
IS-IS FRR in the Scenario Where Multiple Nodes Advertise the Same
Route
IS-IS LFA FRR uses the SPF algorithm to calculate the shortest path to the destination node, with each
neighbor that provides a backup link as the root node. The calculated backup next hop is node-based, which
applies to the scenario where each route is received from a single node. As networks diversify, multiple
nodes may advertise the same route. In this case, LFA conditions in the scenario where each route is received
from a single node cannot be met. As a result, the backup next hop cannot be calculated. IS-IS FRR for the
scenario where multiple nodes advertise the same route can address this problem by using one of the route
sources to protect the primary route source, improving network reliability.
Figure 4 IS-IS FRR in the scenario where multiple nodes advertise the same route
In Figure 4(a), the cost of the link between Device A and Device B is 5, whereas the cost of the link between
Device A and Device C is 10. Both Device B and Device C advertise the route 10.1.1.0/24. IS-IS FRR is enabled
on Device A. However, single-node LFA conditions are not met. As a result, Device A fails to calculate the
backup next hop of the route 10.1.1.0/24. IS-IS FRR in the scenario where multiple nodes advertise the same
route can address this problem.
In Figure 4(b), a virtual node is simulated between Device B and Device C and is connected to Device B and
Device C. The cost of the link from Device B or Device C to the virtual node is 0, whereas the cost of the link
from the virtual node to Device B or Device C is the maximum value. After the virtual node advertises the
route 10.1.1.0/24, the backup next hop is calculated for the virtual node because the scenario where multiple
nodes advertise the same route has been converted to the scenario where the route is received from only
one node. Then the route 10.1.1.0/24 inherits the backup next hop from the virtual node. Device A computes
two links to the virtual node. The primary link is from Device A to Device B, and the backup link is from
Device A to Device C.
Equal cost multi path (ECMP) evenly balances traffic over multiple equal-cost paths to the same destination.
If the ECMP FRR function is not supported in ECMP scenarios, no backup next hop can be calculated for
primary links.
IS-IS ECMP FRR is enabled by default, and a backup next hop is calculated separately for each primary link,
which enhances reliability in ECMP scenarios. With ECMP FRR, IS-IS pre-calculates backup paths for load
balancing links based on the LSDBs on the entire network. The backup paths are stored in the forwarding
table and are used for traffic protection in the case of link failures.
• In Figure 5, traffic is forwarded from Device A to Device D and is balanced among link 1, link 2, and link
3. Backup paths of the three links are calculated based on ECMP FRR. For example, the backup paths of
link 1, link 2, and link 3 are link 3, link 3, and link 2, respectively.
■ If the ECMP FRR function is not enabled in the load balancing scenario and link 1 fails, traffic over
link 1 is randomly switched to link 2 or link 3, which affects service traffic management.
■ If the ECMP FRR function is enabled in the load balancing scenario and link 1 fails, traffic over link
1 is switched to link 3 according to FRR route selection rules, which enhances service traffic
management.
• In Figure 6, traffic is forwarded from Device A to Device D and is balanced between link 1 and link 2.
Backup paths of the two links are calculated based on ECMP FRR. For example, the backup paths of link
1 and link 2 are both link 3.
■ If the ECMP FRR function is not enabled in the load balancing scenario and Device B fails, link 1
and link 2 fail accordingly, leading to a traffic interruption.
■ If the ECMP FRR function is enabled in the load balancing scenario and Device B fails, link 1 and
link 2 fail accordingly. However, traffic is switched to link 3, which prevents the traffic interruption.
If IS-IS LFA Auto FRR is enabled, it implements protection for the two links by calculating a backup link if
either Link 1 or Link 2 fails.
• If Link 1 fails but Link 2 is normal, traffic is not interrupted after being switched to the backup link.
• If both Link 1 and Link 2 fail, traffic is interrupted after being switched to the backup link.
IS-IS SRLG FRR prevents service interruptions in the scenario where links have the same risk of failure. To
prevent traffic interruption in this case, add link 1 and link 2 to an SRLG so that a link outside the SRLG is
preferentially selected as a backup link.
Background
As the Internet develops, more data, voice, and video information is exchanged over the Internet. New
services, such as e-commerce, online conferencing and auctions, video on demand, and distance learning,
emerge gradually. The new services have high requirements for network security. Carriers need to prevent
data packets from being illegally obtained or modified by attackers or unauthorized users. IS-IS
authentication applies to the area or interface where packets need to be protected. Using IS-IS
authentication enhances system security and helps carriers provide safe network services.
Related Concepts
Authentication Classification
Based on packet types, the authentication is classified as follows:
• Interface authentication: is configured in the interface view to authenticate Level-1 and Level-2 IS-to-IS
Hello PDUs (IIHs).
• Area authentication: is configured in the IS-IS process view to authenticate Level-1 CSNPs, PSNPs, and
LSPs.
• Routing domain authentication: is configured in the IS-IS process view to authenticate Level-2 CSNPs,
PSNPs, and LSPs.
Based on the authentication modes of packets, the authentication is classified into the following types:
• Simple authentication: The authenticated party directly adds the configured password to packets for
authentication. This authentication mode provides the lowest password security.
• MD5 authentication: uses the MD5 algorithm to encrypt a password before adding the password to the
packet, which improves password security. For the sake of security, using the HMAC-SHA256 algorithm
rather than the MD5 algorithm is recommended.
• Keychain authentication: further improves network security with a configurable key chain that changes
with time.
• HMAC-SHA256 authentication: uses the HMAC-SHA256 algorithm to encrypt a password before adding
the password to the packet, which improves password security.
Implementation
IS-IS authentication adds an authentication field to IS-IS packets to ensure network
security. After receiving IS-IS packets from a remote router, a local router discards the packets if the
authentication passwords in the packets are different from the locally configured one. This mechanism
protects the local router.
IS-IS provides a type-length-value (TLV) to carry authentication information. The TLV components are as
follows:
• Type: indicates the type of the TLV, which is 1 byte. The value defined by ISO is 10, whereas the value
defined by IP is 133.
• Length: indicates the length of the Value field, which is 1 byte.
• Value: indicates the authentication information, including the authentication type and the
authentication password, which ranges from 1 to 254 bytes. The authentication type is 1 byte:
■ 0: reserved
■ 1: simple authentication
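As a rough illustration of this encoding, the following Python sketch builds and parses an authentication TLV using the ISO-defined TLV type 10 and the simple authentication type; the password is hypothetical.

```python
import struct

AUTH_TLV_TYPE = 10   # ISO-defined authentication TLV type
AUTH_SIMPLE = 1      # authentication type: simple password

def build_auth_tlv(password: bytes, auth_type: int = AUTH_SIMPLE) -> bytes:
    """Encode an authentication TLV: 1-byte type, 1-byte length, then a
    1-byte authentication type followed by the password."""
    value = bytes([auth_type]) + password
    if not 1 <= len(value) <= 254:
        raise ValueError("value must be 1-254 bytes")
    return struct.pack("BB", AUTH_TLV_TYPE, len(value)) + value

def parse_auth_tlv(tlv: bytes):
    """Return (TLV type, authentication type, password)."""
    tlv_type, length = struct.unpack_from("BB", tlv)
    value = tlv[2:2 + length]
    return tlv_type, value[0], value[1:]

tlv = build_auth_tlv(b"huawei")
assert parse_auth_tlv(tlv) == (10, 1, b"huawei")
```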
Interface Authentication
Authentication passwords for IIHs are saved on interfaces. The interfaces send authentication packets with
the authentication TLV. Interconnected router interfaces must be configured with the same password.
Area Authentication
Every router in an IS-IS area must use the same authentication mode and have the same key chain.
Routing Domain Authentication
Every Level-2 or Level-1-2 router in an IS-IS area must use the same authentication mode and have the
same key chain.
For area authentication and routing domain authentication, you can set a router to authenticate SNPs and
LSPs separately in the following ways:
• A router sends LSPs and SNPs that carry the authentication TLV and verifies the authentication
information of the LSPs and SNPs it receives.
• A router sends LSPs that carry the authentication TLV and verifies the authentication information of the
LSPs it receives. The router sends SNPs that carry the authentication TLV and does not verify the
authentication information of the SNPs it receives.
• A router sends LSPs that carry the authentication TLV and verifies the authentication information of the
LSPs it receives. The router sends SNPs without the authentication TLV and does not verify the
authentication information of the SNPs it receives.
• A router sends LSPs and SNPs that carry the authentication TLV but does not verify the authentication
information of the LSPs and SNPs it receives.
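The four combinations above can be summarized as a small decision table. The mode names below are illustrative labels, not configuration keywords, and verifying the authentication information is reduced to checking TLV presence for brevity.

```python
# Per packet class, each mode controls whether the router adds the
# authentication TLV on send and whether it verifies on receive.
MODES = {
    "check-all":     {"lsp": ("send", "check"), "snp": ("send", "check")},
    "send-only-snp": {"lsp": ("send", "check"), "snp": ("send", None)},
    "no-snp-auth":   {"lsp": ("send", "check"), "snp": (None, None)},
    "send-only":     {"lsp": ("send", None),    "snp": ("send", None)},
}

def accept(mode: str, kind: str, tlv_present: bool) -> bool:
    """Return True if a received LSP or SNP passes under the given mode."""
    _, check = MODES[mode][kind]
    # If checking is disabled for this packet class, accept regardless.
    return True if check is None else tlv_present

assert accept("check-all", "snp", tlv_present=True)
assert not accept("check-all", "snp", tlv_present=False)
assert accept("send-only-snp", "snp", tlv_present=False)
assert not accept("send-only-snp", "lsp", tlv_present=False)
```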
Context
If network-wide IS-IS LSP deletion causes network instability, source tracing must be implemented as soon
as possible to locate and isolate the fault source. However, IS-IS itself does not support source tracing. A
conventional solution is to isolate nodes one by one until the fault source is located, but the process is
complex and time-consuming and may compromise network services. To address this problem, enable IS-IS
purge source tracing.
IS-IS purge source tracing is a Huawei proprietary protocol.
Related Concepts
• PS-PDU: packets that carry information about the node that floods IS-IS purge LSPs.
• CAP-PDU: packets used to negotiate the IS-IS purge source tracing capability between IS-IS neighbors.
• IS-IS purge source tracing port: UDP port number used to send and receive IS-IS purge source tracing
packets. This UDP port number is configurable.
Fundamentals
IS-IS purge LSPs do not carry source information. If a device fails on the network, a large number of purge
LSPs are flooded. Without a source tracing mechanism, nodes need to be isolated one by one until the faulty
node is located, which is labor-intensive and time-consuming. IS-IS purge LSPs can trigger route flapping on
the network or even render routes unavailable. In this case, the device that floods the purge LSPs must be
located and isolated immediately.
A solution that can meet the following requirements is required:
1. Information about the source that flooded the purge LSPs can be obtained when network routes are
unreachable.
2. The method used to obtain source information must apply to all devices on the network and support
incremental deployment, without compromising routing capabilities.
For requirement 1, IS-IS purge source tracing uses UDP to send and receive source tracing packets. These
packets carry IS-IS LSP information purged by the faulty device and are flooded hop by hop along the IS-IS
neighbor topology. After IS-IS purge source tracing packets are flooded, you can log in to any device that
supports IS-IS purge source tracing to view information about the device that flooded the purge LSPs. This
helps you quickly locate and isolate the faulty node.
For requirement 2, IS-IS purge source tracing forwards packets along UDP channels that are independent of
the channels used to transmit IS-IS packets. In addition, source tracing does not affect the devices with the
related UDP port disabled.
Capability Negotiation
Source tracing packets are transmitted over UDP. Devices listen for the UDP port and use it to send and
receive source tracing packets. If a source tracing-capable device sends source tracing packets to a device
that is source tracing-incapable, the former may be incorrectly identified as an attacker. Therefore, the
source tracing capability needs to be negotiated between devices so that source tracing packets are
exchanged between only source tracing-capable devices. In addition, source tracing capability negotiation is
also required to enable a source tracing-capable device to send source tracing information on behalf of a
source tracing-incapable device.
Source tracing capability negotiation depends on IS-IS neighbor relationships. Specifically, after an IS-IS
neighbor relationship is established, the local device initiates source tracing capability negotiation based on
the IP address of the neighbor.
PS-PDU Generation
If a fault source purges an LSP, it generates and floods a PS-PDU to all its source tracing neighbors.
If a device receives a purge LSP from a source tracing-incapable neighbor, the device generates and floods a
PS-PDU to all its neighbors. If a device receives the same purge LSP (with the same LSP ID and sequence
number) from more than one source tracing-incapable neighbor, the device generates only one PS-PDU.
PS-PDU flooding is similar to IS-IS LSP flooding.
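The deduplication rule for PS-PDU generation can be sketched as follows; the LSP ID format is illustrative.

```python
seen = set()  # (lsp_id, seq_num) pairs already answered with a PS-PDU

def on_purge_from_incapable_neighbor(lsp_id: str, seq_num: int) -> bool:
    """Return True if a PS-PDU should be generated and flooded.

    The same purge LSP (same LSP ID and sequence number) received from
    several source tracing-incapable neighbors yields only one PS-PDU.
    """
    key = (lsp_id, seq_num)
    if key in seen:
        return False
    seen.add(key)
    return True

assert on_purge_from_incapable_neighbor("0000.0000.0001.00-00", 42)
# The same purge LSP from a second incapable neighbor: no new PS-PDU.
assert not on_purge_from_incapable_neighbor("0000.0000.0001.00-00", 42)
# A new sequence number is a different purge and is answered again.
assert on_purge_from_incapable_neighbor("0000.0000.0001.00-00", 43)
```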
Security Concern
A UDP port is used to send and receive source tracing packets. Therefore, the security of the port must be
taken into consideration.
The source tracing protocol inevitably increases packet receiving and sending workload and intensifies
bandwidth pressure. To minimize its impact on other protocols, the number of source tracing packets must
be controlled.
• Authentication
Source tracing is embedded in the IGP, inherits existing configuration parameters of the IGP, and uses
authentication parameters of the IGP to authenticate packets.
• GTSM
GTSM is a security mechanism that checks whether the time to live (TTL) value in each received IP
packet header is within a pre-defined range.
Source tracing packets can only be flooded as far as one hop. Therefore, GTSM is used to check
such packets by default. When a device sends a packet, it sets the TTL of the packet to 255. If the TTL is
not 254 when the packet is received, the packet is discarded.
• CPU-CAR
The NP module on interface boards can check the packets to be sent to the CPU for processing and
prevent the main control board from being overloaded by a large number of packets that are sent to
the CPU.
The source tracing protocol needs to apply for an independent CAR channel and has small CAR values
configured.
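The GTSM check described above can be sketched as:

```python
SEND_TTL = 255

def gtsm_accept(received_ttl: int) -> bool:
    """Source tracing packets travel at most one hop, so a packet sent
    with TTL 255 must arrive with TTL 254; anything else is discarded."""
    return received_ttl == SEND_TTL - 1

assert gtsm_accept(254)
assert not gtsm_accept(253)   # traveled more than one hop, or forged
assert not gtsm_accept(255)   # TTL was never decremented
```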
Typical Scenarios
Scenario where all nodes support source tracing
All nodes on the network support source tracing and DeviceA is the fault source.
When DeviceA purges an IGP packet, it floods a source tracing packet that carries DeviceA information and
brief information about the IGP packet. Then the source tracing packet is flooded on the network hop by
hop. After the fault occurs, maintenance personnel can log in to any node on the network to locate DeviceA,
which keeps sending purge LSPs, and isolate DeviceA from the network.
Scenario where source tracing-incapable nodes are not isolated from source tracing-capable nodes
All nodes on the network except DeviceC support source tracing, and DeviceA is the fault source. In this
scenario, PS-PDUs can be flooded on the entire network, and the fault source can be accurately located.
Figure 2 shows the networking.
Figure 2 Scenario where source tracing-incapable nodes are not isolated from source tracing-capable nodes
When DeviceA purges an IGP packet, it floods a source tracing packet that carries DeviceA information and
brief information about the IGP packet. Then the source tracing packet is flooded on the network hop by
hop. When DeviceB and DeviceE negotiate the source tracing capability with DeviceC, they find that DeviceC
does not support source tracing. After DeviceB receives the PS-PDU from DeviceA, DeviceB sends the packet
to DeviceD, but not to DeviceC. After receiving the purge LSP from DeviceC, DeviceE finds that DeviceC does
not support source tracing and then generates a PS-PDU which carries information about the advertisement
source (DeviceE), purge source (DeviceC), and the purged LSP, and floods the PS-PDU on the network.
After the fault occurs, maintenance personnel can log in to any node on the network except DeviceC to
locate the faulty node. Two possible faulty nodes can be located in this case: DeviceA and DeviceC, and they
both send the same purge LSP. In this case, DeviceA takes precedence over DeviceC when the maintenance
personnel determine the most probable fault source. After DeviceA is isolated, the network recovers, ruling
out the possibility that DeviceC is the fault source.
Scenario where source tracing-incapable nodes are isolated from source tracing-capable nodes
Assume that all devices except DeviceC and DeviceD support source tracing and DeviceA is the fault source.
In this scenario, PS-PDUs cannot be flooded on the entire network, which makes locating the fault source
more complicated.
Figure 3 shows the networking.
Figure 3 Scenario where source tracing-incapable nodes are isolated from source tracing-capable nodes
When DeviceA purges an IS-IS LSP, it floods a PS-PDU that carries node A information and brief information
about the LSP. However, the PS-PDU sent by DeviceA can only reach DeviceB because DeviceC and DeviceD
do not support IS-IS purge source tracing.
During source tracing capability negotiation, DeviceE and DeviceF find that DeviceC and DeviceD do not
support source tracing, respectively. After receiving the purge LSP from DeviceC, DeviceE generates and
floods a PS-PDU on behalf of DeviceC. Similarly, after receiving the purge LSP from DeviceD, DeviceF
generates and floods a PS-PDU on behalf of DeviceD.
After the fault occurs, maintenance personnel can locate the fault source (DeviceA) directly if they log in to
DeviceA or DeviceB. After DeviceA is isolated, the network recovers. However, if the personnel log in to
DeviceE, DeviceF, DeviceG, or DeviceH, they will find that DeviceE claims DeviceC to be the fault source and
DeviceF claims DeviceD to be the fault source. If the personnel then log in to DeviceC or DeviceD, they will
find that the purge LSP was sent by DeviceB, and was not generated by DeviceC or DeviceD. If the personnel
then log in to DeviceB, they will determine that DeviceA is the fault source. After DeviceA is isolated, the
network recovers.
10.8.2.16 IS-IS MT
With IS-IS multi-topology (MT), IPv6, multicast, and advanced topologies can have their own routing tables.
This feature prevents packet loss if an integrated topology and the IPv4/IPv6 dual stack are deployed,
isolates multicast services from unicast routes, improves network resource usage, and reduces network
construction cost.
Context
On a traditional IP network, IPv4 and IPv6 share the same integrated topology, and only one unicast
topology exists, which causes the following problems:
• Packet loss if the IPv4/IPv6 dual stack is deployed: If some Routers and links in an IPv4/IPv6 topology do
not support IPv4 or IPv6, they cannot receive IPv4 or IPv6 packets sent from the Router that supports
the IPv4/IPv6 dual stack. As a result, these packets are discarded.
• Multicast services highly depending on unicast routes: Only one unicast forwarding table is available on
the forwarding plane because only one unicast topology exists, which forces services transmitted from
one router to the same destination address to share the same next hop, and various end-to-end
services, such as voice and data services, to share the same physical links. As a result, some links may be
heavily congested whereas others remain relatively idle. In addition, the multicast reverse path
forwarding (RPF) check depends on the unicast routing table. If the default unicast routing table is used
when transmitting multicast services, multicast services depend heavily on unicast routes, a multicast
distribution tree cannot be planned independently of unicast routes, and unicast route changes affect
multicast distribution tree establishment.
Deploying multiple topologies for different services on a physical network can address these problems. IS-IS
MT transmits MT information through new TLVs in IS-IS packets. Users can deploy multiple logical
topologies based on IP protocols or service types supported by links so that SPF calculations are performed
independently in different topologies, which improves network usage.
If an IPv4 or IPv6 BFD session is Down in a topology on a network enabled with MT, neighbors of the IPv4 or IPv6
address family will be affected.
Related Concepts
IS-IS MT allows multiple route selection subsets to be deployed on a versatile network infrastructure and
divides a physical network into multiple logical topologies, where each topology performs its own SPF
calculations.
IS-IS MT, an extension of IS-IS, allows multiple topologies to be applied to IS-IS. IS-IS MT complies with
standard protocols and transmits multi-topology information using new TLVs in IS-IS packets. Users can
deploy multiple logical topologies on a physical network. Each topology performs its own SPF calculations
and maintains its own routing table. Traffic of different services, including the traffic transmitted in different
IP topologies, has its own optimal forwarding path.
The MT ID configured on an interface identifies the topology bound to the interface. One or more MT IDs
can be configured on a single interface.
Reverse path forwarding (RPF) check: After receiving a packet, a device searches its unicast routing table,
MBGP routing table, MIGP routing table, and multicast static routing table based on the packet source and
selects an optimal route from these routing tables as the RPF route. If the interface that the packet arrives at
is the same as the RPF interface, the packet passes the RPF check and is forwarded. Otherwise, the RPF
check fails and traffic is interrupted.
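A minimal sketch of the RPF check, assuming the optimal-route selection across the routing tables has already been performed; the interface and address names are hypothetical.

```python
def rpf_check(arrival_if: str, source: str, rpf_routes: dict) -> bool:
    """Pass if the packet arrived on the interface of the optimal (RPF)
    route back toward its source; otherwise it is discarded.

    rpf_routes maps a source address to the RPF interface already chosen
    from the unicast, MBGP, MIGP, and multicast static routing tables
    (the table-selection step itself is not modeled here).
    """
    rpf_if = rpf_routes.get(source)
    return rpf_if is not None and rpf_if == arrival_if

rpf_routes = {"10.1.1.1": "Interface1"}   # hypothetical RPF result
assert rpf_check("Interface1", "10.1.1.1", rpf_routes)       # passes
assert not rpf_check("Interface2", "10.1.1.1", rpf_routes)   # discarded
```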
Implementation
IS-IS MT uses MT IDs to identify different topologies. Each Hello packet or LSP sent by a Router carries one
or more MT TLVs of the topologies to which the source interface belongs. If the Router receives from a
neighbor a Hello packet or LSP that carries only some of the local MT TLVs, the Router assumes that the
neighbor belongs to only the default IPv4 topology. On a point-to-point (P2P) link, an adjacency cannot be
established between two neighbors that share no common MT ID. On broadcast links, adjacencies can still
be established between neighbors even if they do not share the same MT ID.
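The adjacency rules can be sketched as follows; MT IDs 0 and 2 denote the default IPv4 topology and the IPv6 topology, respectively (an assumption based on standard MT ID assignments).

```python
def can_form_adjacency(link_type: str, local_mts: set, remote_mts: set) -> bool:
    """On a P2P link an adjacency requires at least one common MT ID;
    on a broadcast link it can form even without a shared MT ID."""
    if link_type == "p2p":
        return bool(local_mts & remote_mts)
    return True  # broadcast links

assert can_form_adjacency("p2p", {0, 2}, {2})      # share MT ID 2
assert not can_form_adjacency("p2p", {0}, {2})     # no common topology
assert can_form_adjacency("broadcast", {0}, {2})   # still forms
```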
Figure 1 shows the MT TLV format.
The following examples describe how IS-IS MT separates the IPv4 topology from the IPv6 topology in a
dual-stack scenario and separates the multicast topology from the unicast topology.
• Figure 2 shows the networking for separation of the IPv4 topology from the IPv6 topology. The values
in the networking diagram are link costs. Device A, Device C, and Device D support the IPv4/IPv6 dual
stack; Device B supports IPv4 only and cannot forward IPv6 packets.
Without IS-IS MT, Device A, Device B, Device C, and Device D use the IPv4/IPv6 topology to perform SPF
calculation. In this case, the shortest path from Device A to Device D is Device A -> Device B -> Device
D. IPv6 packets cannot reach Device D through Device B because Device B does not support IPv6.
If a separate IPv6 topology is set up using IS-IS MT, Device A chooses only IPv6 links to forward IPv6
packets. In this case, the shortest path from Device A to Device D is Device A -> Device C -> Device D.
• Figure 3 shows the networking for separation between unicast and multicast topologies using IS-IS MT.
On the network shown in Figure 3, all Routers are interconnected using IS-IS. A TE tunnel is set up
between Device A (ingress) and Device E (egress). The outbound interface of the route calculated by IS-
IS may not be a physical interface but a TE tunnel interface. In this case, Device C, through which the TE
tunnel passes, cannot set up multicast forwarding entries. As a result, multicast services cannot be
transmitted.
IS-IS MT addresses this problem by establishing separate unicast and multicast topologies. TE tunnels
are excluded from a multicast topology. Therefore, multicast services are unaffected by TE tunnels.
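The effect of running SPF independently per topology can be sketched with a plain Dijkstra over two link sets. The link costs below are hypothetical, chosen so that Device B offers the cheaper path but only exists in the IPv4 topology.

```python
import heapq

def spf(topology: dict, src: str) -> dict:
    """Plain Dijkstra; each topology runs its own independent SPF."""
    dist, heap = {src: 0}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for nbr, cost in topology.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

# Hypothetical costs: B is the cheaper transit, but only for IPv4.
ipv4_topo = {"A": {"B": 5, "C": 10}, "B": {"D": 5}, "C": {"D": 10}}
ipv6_topo = {"A": {"C": 10}, "C": {"D": 10}}   # Device B excluded: no IPv6

assert spf(ipv4_topo, "A")["D"] == 10   # IPv4 path via Device B
assert spf(ipv6_topo, "A")["D"] == 20   # IPv6 path via Device C only
```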
Background
On a network where multicast and a unidirectional TE tunnel are deployed, if the TE tunnel is configured
with IGP Shortcut, IS-IS uses an MPLS TE tunnel that is up to perform SPF calculation. In this case, the
outbound interface of the route calculated by IS-IS may be a TE tunnel interface rather than a physical
interface. As a result, the routers spanned by the TE tunnel cannot detect multicast packets and may discard
multicast data packets, affecting network reliability. Figure 1 shows the networking.
1. Client sends a Report message to DeviceA, requesting to join a multicast group. Upon receipt, DeviceA
sends a Join message to DeviceB.
2. When the Join message reaches DeviceB, DeviceB selects TE-Tunnel1/0/0 as the Reverse Path
Forwarding (RPF) interface and forwards the message to DeviceC through Interface 2 based on an
MPLS label.
3. Because the Join message is forwarded based on an MPLS label, DeviceC does not create a multicast
forwarding entry. As the penultimate hop of the MPLS forwarding, DeviceC removes the MPLS label
and forwards the Join message to DeviceD through Interface2.
4. After DeviceD receives the Join message, it generates a multicast forwarding entry in which the
upstream and downstream interfaces are Interface1 and Interface2, respectively. DeviceD then sends
the Join message to DeviceE. Then the shortest path tree is established.
5. When DeviceD receives traffic from the multicast source, DeviceD sends traffic to DeviceC. Because
DeviceC has not created a forwarding entry for the traffic, the traffic is discarded. As a result, multicast
services are interrupted.
Related Concepts
IS-IS local MT is a mechanism that enables the routing management (RM) module to create a separate
multicast topology on the local device so that protocol packets exchanged between devices are not
erroneously discarded. When the outbound interface of the route calculated by IS-IS is an IGP Shortcut-enabled
TE tunnel interface, IS-IS local MT calculates a physical outbound interface for the route. This ensures
that multicast protocol packets are not erroneously discarded.
Implementation
Figure 2 shows how multicast packets are forwarded after local MT is enabled.
Usage Scenario
IS-IS local MT prevents multicast services from being interrupted on a network that allows multicast and
has an IGP Shortcut-enabled TE tunnel.
Benefits
Local MT resolves the conflict between multicast and a TE tunnel and improves multicast service reliability.
The first eight bytes in all IS-IS PDUs are public. Figure 1 shows the IS-IS PDU format.
• Intradomain Routing Protocol Discriminator: network layer protocol identifier assigned to IS-IS, which is
0x83.
• ID Length: length of the system ID of network service access point (NSAP) addresses or NETs in this
routing domain.
• Maximum Area Addresses: maximum number of area addresses supported by an IS-IS area. The value 0
indicates that a maximum of three area addresses are supported by this IS-IS area.
• Type/Length/Value (TLV): encoding type that features high efficiency and expansibility. Each type of
PDU contains a different TLV. Table 2 shows the mapping between TLV codes and PDU types.
TLV Code    Name       PDU Type
8           Padding    IIH
As shown in Figure 3, most fields in a P2P IIH are the same as those in a LAN IIH. The P2P IIH does not
have the priority and LAN ID fields but has a local circuit ID field. The local circuit ID indicates the local
link ID.
LSP Format
LSPs are used to exchange link-state information. There are two types of LSPs: Level-1 and Level-2. Level-1
IS-IS transmits Level-1 LSPs. Level-2 IS-IS transmits Level-2 LSPs. Level-1-2 IS-IS can transmit both Level-1
and Level-2 LSPs.
Level-1 and Level-2 LSPs have the same format, as shown in Figure 4.
SNP Format
SNPs describe the LSPs in all or some of the databases and are used to synchronize and maintain all LSDBs.
SNPs consist of complete SNPs (CSNPs) and partial SNPs (PSNPs).
• CSNPs carry summaries of all LSPs in LSDBs, which ensures LSDB synchronization between neighboring
routers. On a broadcast network, the designated intermediate system (DIS) sends CSNPs at an interval.
The default interval is 10 seconds. On a P2P link, neighboring devices send CSNPs only when a neighbor
relationship is established for the first time.
Figure 5 shows the CSNP format.
• PSNPs list only the sequence numbers of recently received LSPs. A PSNP can acknowledge multiple LSPs
at a time. If an LSDB is not updated, PSNPs are also used to request a new LSP from a neighbor.
Figure 6 shows the PSNP format.
10.8.2.19 IS-IS GR
The NE40E can be configured as a GR helper rather than a GR restarter. This function is enabled by default and does not
need to be configured additionally.
Graceful restart (GR) is a technology that ensures normal data forwarding and prevents key services from
being affected when a routing protocol restarts.
When GR is not supported, the active/standby switchover triggered by various reasons causes short-time
forwarding interruption and route flapping on the entire network. Route flapping and service interruption
are unacceptable on a large-scale network, especially on a carrier network.
GR is an HA technique introduced to resolve the preceding problem. HA technologies comprise a set of
comprehensive techniques, such as fault-tolerant redundancy, link protection, faulty node recovery, and
traffic engineering. As a fault-tolerant redundancy technology, GR is widely used to ensure non-stop
forwarding of key data during the active/standby switchover and system upgrade.
When the GR function is enabled, the forwarding plane continues data forwarding during a restart, and
operations on the control plane, such as re-establishment of neighbor relationships and route calculation, do
not affect the forwarding plane, preventing service interruption caused by route flapping and improving
network reliability.
Related Concepts
Redistribute ID
IS-IS uses a system ID as a redistribution identifier, OSPF and OSPFv3 use a router ID + process ID as a
redistribution identifier, and BGP uses a VrfID + random number as a redistribution identifier. For ease of
understanding, the redistribution identifiers of different protocols are all called Redistribute IDs. When routes
are distributed, the extended TLVs carried in the routes contain Redistribute IDs.
Redistribute List
A Redistribute list may consist of multiple Redistribute IDs. Each Redistribute list of BGP contains a maximum
of four Redistribute IDs, and each Redistribute list of any other routing protocol contains a maximum of two
Redistribute IDs. When the number of Redistribute IDs exceeds the corresponding limit, the old ones are
discarded according to the sequence in which Redistribute IDs are added.
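The bounded, first-in-first-out behavior of a Redistribute list can be sketched with a bounded deque; the ID strings are illustrative.

```python
from collections import deque

def make_redist_list(protocol: str) -> deque:
    """A BGP Redistribute list keeps up to four Redistribute IDs; other
    routing protocols keep up to two. When the list is full, the oldest
    ID is discarded first."""
    maxlen = 4 if protocol == "bgp" else 2
    return deque(maxlen=maxlen)

rl = make_redist_list("isis")
for rid in ["isis2@DeviceD", "ospf@DeviceB", "ospf@DeviceE"]:
    rl.append(rid)

# Only the two most recent IDs survive; the oldest was discarded.
assert list(rl) == ["ospf@DeviceB", "ospf@DeviceE"]
```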
Take DeviceA distributing route 10.0.0.1/32 as an example. A stable routing loop is formed through the
following process:
Phase 1
On the network shown in Figure 2, IS-IS process 1 on DeviceA imports the static route 10.0.0.1, generates an
LSP carrying the prefix of this route and floods the LSP in IS-IS process 1. After receiving the LSP, IS-IS
process 1 on DeviceD and IS-IS process 1 on DeviceE each calculate a route to 10.0.0.1, with the outbound
interface being interface1 on DeviceD and interface1 on DeviceE, respectively, and the cost being 110. At this
point, the routes to 10.0.0.1 in IS-IS process 1 in the routing tables of DeviceD and DeviceE are active.
Figure 2 Phase 1
Phase 2
In Figure 3, DeviceD and DeviceE are configured to import routes from IS-IS process 1 to IS-IS process 2.
Either no route-policy is configured for the import or the configured route-policy is improper. DeviceE is used
as an example. In phase 1, the route to 10.0.0.1 in IS-IS process 1 in the routing table of DeviceE is active. In
this case, IS-IS process 2 imports this route from IS-IS process 1, generates an LSP carrying the prefix of this
route, and floods the LSP in IS-IS process 2. After receiving the LSP, IS-IS process 2 on DeviceD calculates a
route to 10.0.0.1, with the cost being 10, which is smaller than that (110) of the route calculated by IS-IS
process 1. As a result, the active route to 10.0.0.1 in the routing table of DeviceD is switched from the one
calculated by IS-IS process 1 to the one calculated by IS-IS process 2, and the outbound interface is sub-
interface 2.1.
Figure 3 Phase 2
Phase 3
In Figure 4, after the route to 10.0.0.1 in IS-IS process 2 on DeviceD becomes active, IS-IS process 1 imports
this route from IS-IS process 2, generates an LSP carrying the prefix of this route, and floods the LSP in IS-IS
process 1. After receiving the LSP, IS-IS process 1 on DeviceE recalculates the route to 10.0.0.1, with the cost
being 10, which is smaller than that (110) of the previously calculated route. As a result, the route to
10.0.0.1 in IS-IS process 1 in the routing table of DeviceE is switched to the route (with the smaller cost)
advertised by DeviceD, and the outbound interface is interface 2.
Figure 4 Phase 3
Phase 4
After the active route to 10.0.0.1 on DeviceE is updated, IS-IS process 2 still imports the route from IS-IS
process 1 as the route remains active, and continues to advertise/update an LSP.
As a result, a stable routing loop is formed. Assuming that traffic is injected from DeviceF, Figure 5 shows
the traffic flow when the routing loop occurs.
The following uses the networking shown in Figure 6 as an example to describe how a routing loop is
detected and resolved.
1. DeviceD learns the route distributed by DeviceB through IS-IS process 1 and imports the route from IS-
IS process 1 to IS-IS process 2. When distributing the imported route, IS-IS process 2 on DeviceD
distributes the Redistribute ID of IS-IS process 2 through the sub-TLV (with the type value of 10) of
the TLV (with the type value of 135 or 235). Similarly, IS-IS process 2 on DeviceE learns the route
distributed by DeviceD and saves the Redistribute ID distributed by IS-IS process 2 on DeviceD to the
routing table during route calculation.
2. When re-distributing the route imported from IS-IS process 2, IS-IS process 1 on DeviceE also
distributes the Redistribute ID of IS-IS process 1 on DeviceE through the sub-TLV (with the type value
of 10) of the TLV (with the type value of 135 or 235).
3. After learning the route from DeviceE, IS-IS process 1 on DeviceD saves the Redistribute ID distributed
by IS-IS process 1 on DeviceE in the routing table during route calculation.
4. When importing the route from IS-IS process 1 to IS-IS process 2, DeviceD finds that the re-
distribution information of the route contains its own Redistribute ID, considers that a routing loop is
detected, and reports an alarm. IS-IS process 2 on DeviceD distributes a large cost when distributing
the imported route so that other devices preferentially select other paths after learning the route. This
prevents routing loops.
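The detection step in item 4 can be sketched as follows; the alarm label and the large cost value are illustrative, not the device's actual alarm name or cost.

```python
def redistribute(route_redist_list: list, own_redist_id: str,
                 cost: int, large_cost: int = 1_000_000):
    """On import, a route whose Redistribute list already contains the
    local Redistribute ID indicates a loop: raise an alarm and
    re-advertise the route with a large cost so that other devices
    prefer other paths after learning it."""
    if own_redist_id in route_redist_list:
        return ("loop-alarm", large_cost)
    return ("ok", cost)

# Normal import: the list carries only other devices' IDs.
assert redistribute(["isis1@DeviceE"], "isis2@DeviceD", 10) == ("ok", 10)

# The route comes back carrying DeviceD's own ID: loop detected.
state, cost = redistribute(["isis2@DeviceD", "isis1@DeviceE"],
                           "isis2@DeviceD", 10)
assert state == "loop-alarm" and cost == 1_000_000
```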
Routing loop detection for routes imported to IS-IS and routing loop detection for routes imported to OSPF are
enabled by default and do not need to be manually configured.
1. DeviceA distributes its locally originated route 10.1.1.1/24 to DeviceB through IS-IS process 1. DeviceB
imports the route from IS-IS process 1 to OSPF and adds the Redistribute ID of OSPF on DeviceB to
the route when distributing the route through OSPF.
2. After learning the Redistribute list carried in the route advertised by DeviceB, OSPF on DeviceD saves
the Redistribute ID of OSPF on DeviceB to the routing table during route calculation. After DeviceD
imports this route from OSPF to IS-IS process 2, DeviceD redistributes the route through IS-IS process
2. In the redistributed route, the extended TLV contains the Redistribute ID of IS-IS process 2 on
DeviceD and the Redistribute ID of OSPF on DeviceB. After learning the Redistribute list carried in the
route advertised by DeviceD, IS-IS process 2 on DeviceE saves the Redistribute list in the routing table
during route calculation.
3. After DeviceE imports this route from IS-IS process 2 to OSPF, DeviceE redistributes the route through
OSPF. The redistributed route carries the Redistribute ID of OSPF on DeviceE and the Redistribute ID of
IS-IS process 2 on DeviceD. The Redistribute ID of OSPF on DeviceB has been discarded from the route.
DeviceD learns the Redistribute list carried in the route distributed by DeviceE and saves the
Redistribute list in the routing table. When importing the OSPF route to IS-IS process 2, DeviceD finds
that the Redistribute list of the route contains its own Redistribute ID, considers that a routing loop is
detected, and reports an alarm. To resolve the routing loop, IS-IS process 2 on DeviceD distributes a
large route cost when redistributing the route. However, because IS-IS has a higher preference than
OSPF ASE, DeviceE still prefers the route learned from DeviceD through IS-IS process 2. As a result, the
routing loop is not eliminated. The route received by DeviceE carries the Redistribute ID of OSPF on
DeviceE and the Redistribute ID of IS-IS process 2 on DeviceD.
4. When importing the route from IS-IS process 2 to OSPF, DeviceE finds that the Redistribution
information of the route contains its own Redistribute ID, considers that a routing loop is detected,
and reports an alarm. To resolve the routing loop, OSPF on DeviceE distributes a large route cost
when redistributing the route. In this case, DeviceD prefers the route distributed by DeviceB. As such,
the routing loop is resolved.
When detecting a routing loop upon route import between processes of the same protocol, the device increases
the cost of the corresponding route. As the cost of the delivered route increases, the optimal route in the IP
routing table changes. In this way, the routing loop is eliminated.
In the case of inter-protocol route import, if a routing protocol with a higher preference detects a routing loop,
although this protocol increases the cost of the corresponding route, the cost increase will not render the route
inactive. As a result, the routing loop cannot be eliminated. If the routing protocol with a lower preference
increases the cost of the corresponding route, this route competes with the originally imported route during route
selection. In this case, the routing loop can be eliminated.
Application Scenario
Figure 8 shows a typical intra-AS seamless MPLS network. If the IS-IS process deployed at the access layer
differs from that deployed at the aggregation layer, IS-IS inter-process mutual route import is usually
configured on AGGs so that routes can be leaked between the access and aggregation layers. As a result, a
routing loop may occur between AGG1 and AGG2. If routing loop detection for IS-IS inter-process mutual
route import is configured on AGG1 and AGG2, routing loops can be quickly detected and eliminated.
10.8.3.1 IS-IS MT
Figure 1 shows the use of IS-IS MT to separate an IPv4 topology from an IPv6 topology. Device A, Device C,
and Device D support IPv4/IPv6 dual-stack; Device B supports IPv4 only and cannot forward IPv6 packets.
If IS-IS MT is not used, Device A, Device B, Device C, and Device D consider the IPv4 and IPv6 topologies the
same when using the SPF algorithm for route calculation. The shortest path from Device A to Device D is Device A -> Device B -> Device D. However, Device B does not support IPv6 and cannot forward IPv6 packets to Device D.
If IS-IS MT is used to establish a separate IPv6 topology, Device A chooses only IPv6 links to forward IPv6
packets. The shortest path from Device A to Device D changes to Device A -> Device C -> Device D. IPv6
packets are then forwarded.
Figure 2 shows the use of IS-IS MT to separate unicast and multicast topologies.
All Routers in Figure 2 are interconnected using IS-IS. A TE tunnel is set up between Device A (ingress) and
Device E (egress). The outbound interface of the route calculated by IS-IS may not be a physical interface
but a TE tunnel interface. The Routers between which the TE tunnel is established cannot set up multicast
forwarding entries. As a result, multicast services cannot run properly.
IS-IS MT is configured to solve this problem by establishing separate unicast and multicast topologies. TE
tunnels are excluded from a multicast topology; therefore, multicast services can run properly, without being
affected by TE tunnels.
Definition
Border Gateway Protocol (BGP) is a dynamic routing protocol used between autonomous systems (ASs).
BGP-1, BGP-2, and BGP-3, three earlier versions of BGP, were used to exchange reachable inter-AS
routes, establish inter-AS paths, avoid routing loops, and apply routing policies between ASs.
Currently, BGP-4 is used.
As an exterior routing protocol on the Internet, BGP has been widely used among Internet service providers
(ISPs).
BGP has the following characteristics:
• Unlike an Interior Gateway Protocol (IGP), such as Open Shortest Path First (OSPF) and Routing
Information Protocol (RIP), BGP is an Exterior Gateway Protocol (EGP) which controls route
advertisement and selects optimal routes between ASs rather than discovering or calculating routes.
• BGP uses the Transmission Control Protocol (TCP) as its transport layer protocol, which enhances BGP
reliability.
■ BGP selects inter-AS routes, which poses high requirements on stability. Therefore, using TCP
enhances BGP's stability.
■ BGP peers must be logically connected through TCP. The destination port number is 179 and the
local port number is a random value.
• When routes are updated, BGP transmits only the updated routes, which reduces bandwidth
consumption during BGP route distribution. Therefore, BGP is applicable to the Internet where a large
number of routes are transmitted.
■ Between ASs: BGP routes carry information about the ASs along the path. The routes that carry the
local AS number are discarded to prevent inter-AS loops.
■ Within an AS: BGP does not advertise routes learned in an AS to BGP peers in the AS to prevent
intra-AS loops.
• BGP provides many routing policies to flexibly select and filter routes.
• BGP provides a mechanism that prevents route flapping, which effectively enhances Internet stability.
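The AS_Path-based loop-prevention rule described above can be sketched in a few lines. This is an illustrative model, not the device's implementation: the AS_Path is simplified to a flat list of AS numbers, and the local AS number is a hypothetical value.

```python
# Illustrative sketch of BGP inter-AS loop prevention (assumption:
# the AS_Path is modeled as a plain list of AS numbers).

LOCAL_AS = 65001  # hypothetical local AS number

def accept_from_ebgp_peer(as_path):
    """A route whose AS_Path already contains the local AS number is
    discarded: it has looped back into this AS."""
    return LOCAL_AS not in as_path
```

For example, a route carrying the AS_Path [65002, 65001, 65003] would be discarded on receipt, because it has already traversed the local AS.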
BGP4+ Definition
As a dynamic routing protocol used between ASs, BGP4+ is an extension of BGP.
Traditional BGP4 manages IPv4 routing information but does not support the inter-AS transmission of
packets encapsulated by other network layer protocols (such as IPv6).
To support IPv6, BGP4 must have the additional ability to associate an IPv6 protocol with the next hop
information and network layer reachability information (NLRI).
Two NLRI attributes that were introduced to BGP4+ are as follows:
• Multiprotocol Reachable NLRI (MP_REACH_NLRI): carries the set of reachable destinations and the next
hop information used for packet forwarding.
• Multiprotocol Unreachable NLRI (MP_UNREACH_NLRI): carries the set of unreachable destinations and
is used to withdraw routes.
The Next_Hop attribute in BGP4+ is in the format of an IPv6 address, which can be either a globally unique
IPv6 address or a link-local address.
Using multiple protocol extensions of BGP4, BGP4+ is applicable to IPv6 networks without changing the
messaging and routing mechanisms of BGP4.
Purpose
BGP transmits route information between ASs. It, however, is not required in all scenarios.
• On the network shown in Figure 1, users need to be connected to two or more ISPs. The ISPs need to
provide all or part of the Internet routes for the users. Routers, therefore, need to select the optimal
route through the AS of an ISP to the destination based on the attributes carried in BGP routes.
• Users need to transmit VPN routes through a Layer 3 VPN. For details, see the HUAWEI NE40E-M2
series Feature Description - VPN.
• Users need to transmit multicast routes and construct a multicast topology. For details, see the HUAWEI
NE40E-M2 series Feature Description - IP Multicast.
• The ISP does not need to provide Internet routes for users.
When BGP runs within an AS, it is called IBGP; when it runs between ASs, it is called EBGP.
• Peer: BGP speakers that exchange messages with each other are called peers.
BGP Messages
BGP runs by sending five types of messages: Open, Update, Notification, Keepalive, and Route-refresh.
• Open: first message sent after a TCP connection is set up. An Open message is used to set up a BGP
peer relationship. After a peer receives the Open message and negotiation between the local device and
peer succeeds, the peer sends a Keepalive message to confirm and maintain the peer relationship. Then,
the peers can exchange Update, Notification, Keepalive, and Route-refresh messages.
• Update: This type of message is used to exchange routes between peers. An Update message can
advertise multiple reachable routes with the same attributes and withdraw multiple unreachable
routes.
■ An Update message can be used to advertise multiple reachable routes that share the same set of
attributes. These attributes are applicable to all destinations (expressed by IP prefixes) in the
network layer reachability information (NLRI) field of the Update message.
■ An Update message can be used to withdraw multiple unreachable routes. Each route is identified
by its destination address (using the IP prefix), which identifies the routes previously advertised
between BGP speakers.
■ An Update message can be used only to withdraw routes. In this case, it does not need to carry
route attributes or NLRI. Similarly, when an Update message is used only to advertise reachable
routes, it does not need to carry information about routes to be withdrawn.
• Notification: If error conditions are detected, BGP sends Notification messages to its peers. The BGP
connections are then torn down immediately.
• Keepalive: BGP periodically sends Keepalive messages to peers to ensure the validity of BGP
connections.
• Route-refresh: This type of message is used to request that peers re-send all reachable routes to the
local device.
If all BGP devices are enabled with the route-refresh capability and an import routing policy changes,
the local device sends Route-refresh messages to its peers. Upon receipt, the peers re-send their routing
information to the local device. This ensures that the local BGP routing table is dynamically updated
and the new routing policy is used without tearing down BGP connections.
• In the Idle state, BGP denies all connection requests. This is the initial status of BGP.
• In the Connect state, BGP decides subsequent operations after a TCP connection is established.
• In the Active state, BGP attempts to set up a TCP connection. This is the intermediate status of BGP.
• In the OpenSent state, BGP is waiting for an Open message from the peer.
• In the OpenConfirm state, BGP is waiting for a Keepalive or Notification message from the peer.
• In the Established state, BGP peers can exchange Update, Route-refresh, Keepalive, and Notification
messages.
The BGP peer relationship can be established only when both BGP peers are in the Established state. Both
peers send Update messages to exchange routes.
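The state transitions described above can be modeled as a small lookup table. The sketch below is a simplified happy path only (event names are invented for illustration, and the OpenConfirm state, in which BGP waits for a Keepalive message, sits between OpenSent and Established); it is not the full BGP finite state machine.

```python
# Simplified sketch of BGP session states (assumption: invented event
# names; only the happy path and a generic error fallback are modeled).

TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "tcp_failed"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("OpenSent", "open_received"): "OpenConfirm",
    ("OpenConfirm", "keepalive_received"): "Established",
}

def next_state(state, event):
    # Any unmodeled event (for example, a Notification) tears the
    # session down to Idle.
    return TRANSITIONS.get((state, event), "Idle")
```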
BGP Processing
• BGP adopts TCP as its transport layer protocol. Therefore, a TCP connection must be available between
the peers. BGP peers negotiate parameters by exchanging Open messages to establish a BGP peer
relationship.
• After the peer relationship is established, BGP peers exchange BGP routing tables. BGP does not require
a periodic update of its routing table. Instead, Update messages are exchanged between peers to
update their routing tables incrementally if BGP routes change.
• BGP sends Keepalive messages to maintain the BGP connection between peers.
• If BGP detects an error (for example, it receives an error message), BGP sends a Notification message to
report the error, and the BGP connection is torn down accordingly.
BGP Attributes
BGP route attributes are a set of parameters that describe specific BGP routes, and BGP can filter and select
routes based on these attributes. BGP route attributes are classified into the following four types:
• Well-known mandatory: This type of attribute can be identified by all BGP devices and must be carried
in Update messages. Without this attribute, errors occur in the routing information.
• Well-known discretionary: This type of attribute can be identified by all BGP devices. As this type of
attribute is optional, it is not necessarily carried in Update messages.
• Optional transitive: This type of attribute is transitive between ASs. A BGP device may not recognize
this type of attribute, but it still accepts messages carrying it and advertises them to other peers.
• Optional non-transitive: If a BGP device does not recognize this type of attribute, the device ignores it
and does not advertise messages carrying it to other peers.
• Origin: identifies the origin of a route. The Origin attribute has the following three types:
■ IGP: This type has the highest priority. IGP is the Origin attribute for routes obtained
through an IGP in the AS from which the routes originate. For example, the Origin attribute of the
routes imported to the BGP routing table using the network command is IGP.
■ EGP: This type of attribute has the second highest priority. The Origin attribute of the routes
obtained through EGP is EGP.
■ Incomplete: This type of attribute has the lowest priority and indicates the routes learned through
other modes. For example, the Origin attribute of the routes imported by BGP using the import-
route command is Incomplete.
• AS_Path: records the ASs that a route has traversed. When a BGP speaker originates a route:
■ When advertising the route beyond the local AS, the BGP speaker adds the local AS number to the
AS_Path list and then advertises this attribute to peer Routers through Update messages.
■ When advertising the route within the local AS, the BGP speaker creates an empty AS_Path list in
an Update message.
When a BGP speaker advertises a route learned from the Update message of another BGP speaker:
■ When advertising the route beyond the local AS, the BGP speaker adds the local AS number to the
far left side of the AS_Path list. From the AS_Path attribute, the BGP device that receives the route
learns the ASs through which the route passes to the destination. The number of the AS closest to
the local AS is placed on the far left side of the list, and the other AS numbers are listed next to the
former in sequence.
■ When advertising the route within the local AS, the BGP speaker does not change its AS_Path
attribute.
The AS_Path attribute has four types: AS_Sequence, AS_Set, AS_Confed_Sequence, and AS_Confed_Set.
■ AS_Sequence: ordered set of ASs that the route in an Update message has traversed to reach the
destination.
■ AS_Set: unordered set of ASs that the route in an Update message has traversed to reach the
destination. The AS_Set attribute is used in route summarization scenarios. After route
summarization, the device records the unordered set of AS numbers because it cannot sequence
the numbers of ASs through which specific routes pass. Regardless of how many AS numbers an
AS_Set contains, BGP considers the AS_Set length to be 1 during route selection.
■ AS_Confed_Sequence: ordered set of member ASs in the local confederation that an Update
message has traversed.
■ AS_Confed_Set: unordered set of member ASs in the local confederation that an Update message
has traversed. This type is primarily used for route summarization in a confederation.
The AS_Confed_Sequence and AS_Confed_Set attributes are used to prevent routing loops and select
routes among the member ASs in a confederation.
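The AS_Path update rules above can be sketched as follows. This is a simplified model in which the path is a single AS_Sequence represented as a list (leftmost entry is the AS closest to the receiver), and the local AS number is a hypothetical value.

```python
# Sketch of how a BGP speaker updates the AS_Path when advertising a
# route (assumption: the path is a single AS_Sequence as a list).

LOCAL_AS = 65001  # hypothetical local AS number

def as_path_on_advertise(as_path, peer_type):
    if peer_type == "ebgp":
        # Prepend the local AS number on the far left when the route
        # is advertised beyond the local AS.
        return [LOCAL_AS] + as_path
    # Advertising within the local AS leaves the AS_Path unchanged.
    return list(as_path)
```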
• Next_Hop: indicates the next hop of a route. The Next_Hop attribute is set as follows:
■ When advertising a route to an EBGP peer, a BGP speaker sets the Next_Hop of the route to the
address of the local interface used to establish the EBGP peer relationship.
■ When advertising a locally generated route to an IBGP peer, a BGP speaker sets the Next_Hop of
the route to the address of the local interface used to establish the IBGP peer relationship.
■ When advertising a route learned from an EBGP peer to an IBGP peer, a BGP speaker does not
change the Next_Hop of the route.
• MED: The multi-exit discriminator (MED) is used to determine the optimal route for traffic that
enters an AS. If the Router running BGP obtains multiple routes from different EBGP peers and these
routes have the same destination but different next hops, the device selects the route with the smallest
MED value.
• Open Message
• Update Message
• Notification Message
• Keepalive Message
• Route-refresh Message
Marker (16 octets): Indicates whether the information synchronized between BGP peers is complete. This field is used for calculation in BGP authentication. If no authentication is used, the field is set to all ones in binary format or all Fs in hexadecimal notation.
Length (unsigned integer, 2 octets): Indicates the total length of a BGP message (including the header), in octets. The length ranges from 19 octets to 4096 octets.
Type (unsigned integer, 1 octet): Indicates the BGP message type, which has five values:
1: Open
2: Update
3: Notification
4: Keepalive
5: Route-refresh
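The fixed 19-octet header layout (16-octet Marker, 2-octet Length, 1-octet Type) can be illustrated with a short parser. This is a sketch for clarity, not the device's implementation.

```python
import struct

# Illustrative parser for the fixed BGP message header: 16-octet Marker,
# 2-octet Length, 1-octet Type, all in network byte order.

TYPE_NAMES = {1: "Open", 2: "Update", 3: "Notification",
              4: "Keepalive", 5: "Route-refresh"}

def parse_header(data: bytes):
    marker, length, msg_type = struct.unpack("!16sHB", data[:19])
    if not 19 <= length <= 4096:
        raise ValueError("Length must be 19..4096 octets")
    return marker, length, TYPE_NAMES[msg_type]

# A Keepalive message is only a header: marker of all ones, length 19, type 4.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
```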
Open Message
Open messages are used to establish BGP connections. The value of the Type field in the header of an Open
message is 1. Figure 2 shows the format of an Open message.
Table 2 Description of each field in the Open message without the header
Version (unsigned integer, 1 octet): Indicates the BGP version number. For BGP-4, the value of the field is 4.
My Autonomous System (unsigned integer, 2 octets): Indicates the AS number of the message sender.
Hold Time (unsigned integer, 2 octets): Indicates the hold time set by the message sender, in seconds. BGP peers use this field to negotiate the interval at which Keepalive or Update messages are sent so that the peers can maintain the connection between them. Upon receipt of an Open message, the finite state machine (FSM) of a BGP speaker must compare the locally configured hold time with that carried in the received Open message. The FSM uses the smaller value as the negotiation result. The value is greater than or equal to 3. A value of 0 indicates that no Keepalive messages are sent. The default value is 180.
BGP Identifier (4 octets): Indicates the BGP identifier (router ID) of the message sender, in IP address format.
Opt Parm Len (unsigned integer, 1 octet): Indicates the length of the Optional Parameters field. If the value is 0, no optional parameters are used.
Optional Parameters (variable): Indicates a list of optional BGP parameters, with each one representing a unit in TLV format.
0 7 15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
| Parm. Type | Parm. Length | Parameter Value (variable)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
AFI: short for address family identifier; occupies 2 octets. AFI is
used with the subsequent address family identifier (SAFI) to determine the relationship
between the network layer protocol and IP address. The encoding mode
is the same as that in multiprotocol extensions. The value complies with
the address family numbers defined in the related RFC.
Res: reserved; occupies 1 octet. This field is ignored by the
receiver of the message. The value must be set to 0.
SAFI: occupies 1 octet. SAFI is used with AFI to determine the
relationship between the network layer protocol and IP address. The
encoding mode is the same as that in multiprotocol extensions. The
value complies with the address family numbers defined in the related RFC.
If the value of Capability Code is 2:
The route-refresh capability is supported. The value of Capability Length
is 0, and Capability Value is omitted.
Devices can process Route-refresh messages only after the route-refresh
capability is negotiated successfully. By default, the IPv4 unicast and
route-refresh capabilities are supported.
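The hold-time negotiation rule described above (each side uses the smaller of its configured value and the peer's value; 0 disables Keepalive messages; nonzero values below 3 are invalid) can be sketched as:

```python
# Sketch of BGP hold-time negotiation (values in seconds, mirroring the
# rule described for the Open message's Hold Time field).

def negotiate_hold_time(local: int, received: int) -> int:
    hold = min(local, received)
    if hold != 0 and hold < 3:
        raise ValueError("hold time must be 0 or >= 3 seconds")
    return hold  # 0 means no Keepalive messages are sent
```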
Update Message
Update messages are used to transfer routing information between BGP peers. The value of the Type field in
the header of an Update message is 2. Figure 3 shows the format of an Update message without the header.
Table 3 Description of each field in the Update message without the header
Withdrawn Routes Length (unsigned integer, 2 octets): Indicates the length of the Withdrawn Routes field, in octets. If the value is 0, the Withdrawn Routes field is omitted.
Withdrawn Routes (variable): Contains a list of routes to be withdrawn. Each entry in the list contains a Length (1 octet) field and a variable-length Prefix field.
Length: indicates the mask length of the route to be withdrawn. The value 0 indicates a mask length that matches all routes.
Prefix: contains an IP address prefix, followed by the minimum number of trailing bits needed to make the end of the field fall on an octet boundary. For example, for the withdrawal of the route 192.168.200.200, the encoded Prefix varies with the mask length.
Total Path Attribute Length (unsigned integer, 2 octets): Indicates the total length of the Path Attributes field. If the value is 0, both the Network Layer Reachability Information field and the Path Attributes field are omitted from the Update message.
Path Attributes (variable): Indicates a list of path attributes in the Update message. The type codes of the path attributes are arranged in ascending order. Each path attribute is encoded as a variable-length TLV (<attribute type, attribute length, attribute value>).
The attribute type field consists of the one-octet Attr. Flags field (unsigned integer) and the one-octet
Attr. Type Code field (unsigned integer).
Attr.Flags: occupies one octet (eight bits) and indicates the attribute
flag. The meaning of each bit is as follows:
O (Optional bit): defines whether the attribute is optional. The value 1
indicates an optional attribute, whereas the value 0 indicates a well-
known attribute.
T (Transitive bit): Defines whether the attribute is transitive. For an
optional attribute, the value 1 indicates that the attribute is transitive,
whereas the value 0 indicates that the attribute is non-transitive. For a
well-known attribute, the value must be set to 1.
P (Partial bit): Defines whether the information in an optional-transitive
attribute is partial. If the information is partial, P must be set to 1; if the
information is complete, P must be set to 0. For well-known attributes
and for optional non-transitive attributes, P must be set to 0.
E (Extended Length bit): defines whether the length (Attr. Length) of the
attribute needs to be extended. If the attribute length does not need to
be extended, E must be set to 0 and the attribute length is 1 octet. If the
attribute length needs to be extended, E must be set to 1 and the
attribute length is 2 octets.
U (Unused bits): Indicates that the lower-order four bits of Attr. Flags
are not used. These bits are ignored on receipt and must be set to 0.
Attr.Type Code: Indicates the attribute type code and occupies 1 octet
(unsigned integer). For details about the type codes, see Table 4.
Attr.Value: Varies with Attr.Type Code.
Network Layer Reachability Information (NLRI) (variable): Indicates a list of IP address prefixes in the Update message. Each address prefix in the list is encoded as a 2-tuple LV (<prefix length, prefix of the reachable route>). The encoding mode is the same as that used for Withdrawn Routes.
4: Multi_Exit_Disc. The MED is used to identify the optimal route for traffic entering an AS.
5: Local_Pref. The Local_Pref is used to identify the optimal route for traffic leaving an AS.
6: Atomic_Aggregate. Indicates that the BGP speaker has selected the summary route rather than a specific route.
7: Aggregator. Carries the router ID and AS number of the device that performs route summarization.
10: Cluster_List. Lists the RRs through which the reflected route passes.
Notification Message
Notification messages are used to notify BGP peers of errors in a BGP process. The value of the Type field in
the header of a Notification message is 3. Figure 6 shows the format of a Notification message without the
header.
Table 5 Description of each field in the Notification message without the header
Error code (1 octet): Indicates an error type. The value 0 indicates a non-specific error type. For details about the error codes, see Table 6.
Error subcode (1 octet): Provides further information about the nature of a reported error. For example, for Open message errors, subcode 5 indicates an authentication failure and subcode 7 indicates an unsupported capability.
Other examples include subcode 7 (AS routing loop) for Update message errors, and subcodes 2 (Administrative shutdown), 4 (Administrative reset), 7 (Connection conflict), and 8 (Resource shortage) for Cease errors.
Keepalive Message
Keepalive messages are used to maintain BGP connections. The value of the Type field in the header of a
Keepalive message is 4. Each Keepalive message has only a header; it does not have a data portion.
Therefore, the total length of each Keepalive message is fixed at 19 octets. Figure 7 shows the format of a
Keepalive message.
Table 7 Description of each field in the Keepalive message
Marker (16 octets): Indicates whether the information synchronized between BGP peers is complete. This field is used for calculation in BGP authentication. If no authentication is used, the field is set to all ones in binary format or all Fs in hexadecimal notation.
Length (2 octets): Indicates the total length of a BGP message (including the header), in octets. The length ranges from 19 octets to 4096 octets.
Type (1 octet): Indicates the message type. The value of this field is an integer ranging from 1 to 5, indicating Open, Update, Notification, Keepalive, and Route-refresh messages, respectively. The value of the Type field in each Keepalive message is 4.
Route-refresh Message
Route-refresh messages are used to dynamically request a BGP route advertiser to re-send Update messages.
The value of the Type field in the header of a Route-refresh message is 5. Figure 8 shows the format of a
Route-refresh message without the header.
Table 8 Description of each field in the Route-refresh message without the header
AFI (unsigned integer, 2 octets): Indicates the address family identifier, which is defined the same as that in Open messages.
Res. (unsigned integer, 1 octet): Must be all zeros. The field is ignored when a Route-refresh message is received.
For details about route import, see Route Import; for details about BGP route selection rules, see BGP Route
Selection; for details about route summarization, see Route Summarization; for details about advertising
routes to BGP peers, see BGP Route Advertisement.
For details about import or export policies, see "Routing Policies" in NE40E Feature Description — IP Routing.
For details about BGP load balancing, see Load Balancing Among BGP Routes.
Route Import
BGP itself cannot discover routes. Therefore, it needs to import routes of other protocols, such as IGP or
static routes, into the BGP routing table. Imported routes can be transmitted within an AS or between ASs.
• The import mode enables BGP to import routes by protocol type, such as RIP, OSPF, IS-IS, static,
and direct routes.
• The network mode imports a route with the specified prefix and mask into the BGP routing table and is
more precise than the import mode.
1. Prefers the routes that do not recurse to an SRv6 TE Policy in the Graceful Down state (the SRv6 TE
Policy is in the delayed deletion state).
2. Prefers routes in descending order of Valid, Not Found, and Invalid after BGP origin AS validation
results are applied to route selection in a scenario where the device is connected to a Resource Public
Key Infrastructure (RPKI) server.
Locally originated routes include routes imported using the network or import-route command, as
well as manually and automatically generated summary routes.
b. Prefers a route obtained using the aggregate command over a route obtained using the
summary automatic command.
c. Prefers a route imported using the network command over a route imported using the import-
route command.
7. Prefers a route that carries the Accumulated Interior Gateway Protocol Metric (AIGP) attribute.
• The priority of a route that carries the AIGP attribute is higher than the priority of a route that
does not carry the AIGP attribute.
• If two routes both carry the AIGP attribute, the route with a smaller AIGP attribute value plus IGP
metric of the recursive next hop is preferred over the other route.
• The AS_CONFED_SEQUENCE and AS_CONFED_SET are not included in the AS_Path length.
• During route selection, a device assumes that an AS_SET carries only one AS number regardless of
the actual number of ASs it carries.
• If the bestroute as-path-ignore command is run, BGP no longer compares the AS_Path attribute.
After the load-balancing as-path-ignore command is run, the routes with different AS_Path values can load-
balance traffic.
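The AS_Path length rules above (confederation segments excluded, an AS_Set counted as 1) can be sketched as follows; segments are modeled as (type, AS-number-list) pairs for illustration.

```python
# Sketch of the AS_Path length calculation used during route selection:
# AS_CONFED_* segments are not counted, and an AS_SET counts as 1
# regardless of how many AS numbers it contains.

def as_path_length(segments):
    length = 0
    for seg_type, as_numbers in segments:
        if seg_type in ("AS_CONFED_SEQUENCE", "AS_CONFED_SET"):
            continue                 # excluded from the length
        elif seg_type == "AS_SET":
            length += 1              # counts as a single AS
        else:                        # AS_SEQUENCE
            length += len(as_numbers)
    return length
```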
9. Prefers routes whose Origin type is IGP, EGP, or Incomplete, in descending order of priority.
If the bestroute med-plus-igp command is run, BGP preferentially selects the route with the smallest sum of
MED multiplied by a MED multiplier and IGP cost multiplied by an IGP cost multiplier.
• BGP compares the MEDs of only routes from the same AS (excluding confederation sub-ASs).
MEDs of two routes are compared only when the first AS number in the AS_Sequence (excluding
AS_Confed_Sequence) of one route is the same as its counterpart in the other route.
• If a route does not carry MED, BGP considers its MED as the default value (0) during route
selection. If the bestroute med-none-as-maximum command is run, BGP considers its MED as
the largest MED value (4294967295).
• If the compare-different-as-med command is run, BGP compares MEDs of routes even when the
routes are received from peers in different ASs. If the ASs use the same IGP and route selection
mode, you can run this command. Otherwise, do not run this command because a loop may
occur.
• If the deterministic-med command is run, routes are no longer selected in the sequence in which
they are received.
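The default-MED rule above can be sketched as follows; the boolean flag models whether the bestroute med-none-as-maximum command has been run.

```python
# Sketch of the default-MED rule used during route selection: a route
# without a MED is treated as MED 0 by default, or as the largest MED
# value if "bestroute med-none-as-maximum" is configured.

MED_MAX = 4294967295

def effective_med(med, med_none_as_maximum=False):
    if med is None:                      # route carries no MED
        return MED_MAX if med_none_as_maximum else 0
    return med
```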
11. Prefers local VPN routes, LocalCross routes, and RemoteCross routes in descending order.
LocalCross routes indicate the routes that are leaked between local VPN instances or routes imported between
public network and VPN instances.
If the ERT of a VPNv4 route in the routing table of a VPN instance on a PE matches the IRT of another
VPN instance on the PE, the VPNv4 route is added to the routing table of the latter VPN instance. This
route is called a LocalCross route. If the ERT of a VPNv4 route learned from a remote PE matches the
IRT of a VPN instance on the local PE, the VPNv4 route is added to the routing table of that VPN
instance. This route is called a RemoteCross route.
12. Prefers EBGP routes over IBGP routes among the routes learned from peers. In the VPNv4, EVPN, and
VPNv6 address families, the routes sent by the local VRF take precedence over the routes learned from
peers.
13. Prefers the VPNv4, EVPN, and VPNv6 routes learned from peers.
If the peer high-priority command is run, the device preferentially selects the VPNv4, EVPN, and
VPNv6 routes learned from IPv4 or IPv6 peers.
14. Prefers the routes that are learned from VPNv4 or VPNv6 peers and are then leaked to a VPN instance
and that carry IPv4 or IPv6 next hop addresses.
If the bestroute nexthop-priority ipv4 command is run, the device preferentially selects the routes
that are learned from VPNv4 or VPNv6 peers and are then leaked to a VPN instance and that carry
IPv4 next hop addresses.
If the bestroute nexthop-priority ipv6 command is run, the device preferentially selects the routes
that are learned from VPNv4 or VPNv6 peers and are then leaked to a VPN instance and that carry
IPv6 next hop addresses.
15. Prefers the route that recurses to an IGP route with the smallest cost.
If the bestroute igp-metric-ignore command is run, BGP no longer compares the IGP cost.
16. Prefers the route with the shortest Cluster_List.
By default, Cluster_List takes precedence over Router ID during BGP route selection. To enable Router ID to take
precedence over Cluster_List during BGP route selection, run the bestroute routerid-prior-clusterlist command.
17. Prefers the route advertised by the Router with the smallest router ID.
After the bestroute router-id-ignore command is run, BGP does not compare router IDs during route
selection.
If each route carries an Originator_ID, the originator IDs rather than router IDs are compared during route
selection. The route with the smallest Originator_ID is preferred.
18. Prefers the route learned from the peer with the smallest IP address.
19. If BGP Flow Specification routes are configured locally, the first configured BGP Flow Specification
route is preferentially selected.
21. Prefers the Add-Path route with the smallest recv pathID.
23. Prefers locally received routes over the routes imported between VPN and public network instances.
Route Summarization
On a large-scale network, the BGP routing table can be very large. Route summarization can reduce the size
of the routing table.
Route summarization is the process of summarizing specific routes with the same IP prefix into a summary
route. After route summarization, BGP advertises only the summary route rather than all specific routes to
BGP peers.
• Automatic route summarization: takes effect on the routes imported by BGP. With automatic route
summarization, the specific routes for the summarization are suppressed, and BGP summarizes routes
based on the natural network segment and sends only the summary route to BGP peers. For example,
10.1.1.1/32 and 10.2.1.1/32 are summarized into 10.0.0.0/8, which is a Class A address.
• Manual route summarization: takes effect on routes in the local BGP routing table. With manual route
summarization, users can control the attributes of the summary route and determine whether to
advertise the specific routes.
IPv4 supports both automatic and manual route summarization, whereas IPv6 supports only manual route
summarization.
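The classful summarization example above can be illustrated with a short Python sketch. This is not device code; `natural_summary` is a hypothetical helper that simply derives the natural network segment for an IPv4 prefix, as automatic summarization does.

```python
import ipaddress

def natural_summary(prefix: str) -> str:
    """Return the classful (natural) network segment for an IPv4 prefix.

    Illustrative only: mirrors the idea that automatic summarization
    suppresses specific routes and advertises the natural-segment route.
    """
    net = ipaddress.ip_network(prefix, strict=False)
    first_octet = int(str(net.network_address).split(".")[0])
    if first_octet < 128:        # Class A -> /8
        masklen = 8
    elif first_octet < 192:      # Class B -> /16
        masklen = 16
    else:                        # Class C -> /24
        masklen = 24
    # strict=False masks the host bits down to the natural boundary.
    return str(ipaddress.ip_network((str(net.network_address), masklen),
                                    strict=False))
```

For instance, both 10.1.1.1/32 and 10.2.1.1/32 map to the Class A summary 10.0.0.0/8, matching the example in the text.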
• When there are multiple valid routes, a BGP speaker advertises only the optimal route to its peers.
• A BGP speaker advertises the routes learned from EBGP peers to all BGP peers, including EBGP peers
and IBGP peers.
• A BGP speaker does not advertise the routes learned from an IBGP peer to other IBGP peers.
• Whether a BGP speaker advertises the routes obtained from an IBGP peer to its EBGP peers depends on
the BGP-IGP synchronization state.
• A BGP speaker advertises all BGP optimal routes to new peers after peer relationships are established.
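The advertisement rules above can be modeled as a small decision function. This is a simplified sketch with hypothetical names, not device behavior; it only captures which peer types a route may be advertised to, based on where it was learned.

```python
def advertise_targets(learned_from: str, igp_synced: bool = True) -> set:
    """Peer types a BGP speaker may advertise a route to (sketch)."""
    if learned_from == "ebgp":
        # Routes learned from EBGP peers go to all BGP peers.
        return {"ebgp", "ibgp"}
    if learned_from == "ibgp":
        # Not re-advertised to other IBGP peers; advertisement to EBGP
        # peers depends on the BGP-IGP synchronization state.
        return {"ebgp"} if igp_synced else set()
    return set()
```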
Community attributes simplify the application, maintenance, and management of route-policies and allow a
group of BGP devices in multiple ASs to share a route-policy. The community attribute is a route attribute. It
is transmitted between BGP peers and is not restricted by the AS. Before advertising a route with the
community attribute to peers, a BGP peer can change the original community attribute of this route.
The peers in a peer group share the same policy, while the routes with the same community attribute share
the same policy.
In addition to well-known community attributes, you can use community filters to define custom community attributes to flexibly control route-policies.
Usage Scenario
On the network shown in Figure 2, EBGP connections are established between DeviceA and DeviceB, and between DeviceB and DeviceC. If the No_Export community attribute is configured on DeviceA in AS 10 and DeviceA sends a route with this community attribute to DeviceB in AS 20, DeviceB does not advertise the route outside AS 20.
Networking Application
On the network shown in Figure 1, Device B establishes an EBGP peer relationship with each of Device A and Device C. By setting the Large-Community attribute to 20:4:30 on Device B, you can prevent the routes received from Device A from being advertised to AS 30.
10.9.2.6 AIGP
Background
The Accumulated Interior Gateway Protocol Metric (AIGP) attribute is an optional non-transitive Border
Gateway Protocol (BGP) path attribute. The attribute type code assigned by the Internet Assigned Numbers
Authority (IANA) for the AIGP attribute is 26.
Routing protocols, such as IGPs that have been designed to run within a single administrative domain,
generally assign a metric to each link, and then choose the path with the smallest metric as the optimal
path between two nodes. BGP, designed to provide routing over a large number of independent
administrative domains, does not select paths based on metrics. If a single administrative domain (AIGP
domain) consists of several BGP networks, it is desirable for BGP to select paths based on metrics, just as an
IGP does. The AIGP attribute enables BGP to select paths based on metrics.
Related Concepts
An AIGP administrative domain is a set of autonomous systems (ASs) in a common administrative domain.
The AIGP attribute takes effect only in an AIGP administrative domain.
Implementation
AIGP Attribute Origination
The AIGP attribute can be added to a route only through a route-policy. You can configure a route-policy to add an AIGP value when routes are imported, received, or sent. If no AIGP value is configured, BGP routes do not carry the AIGP attribute. Figure 1 shows the typical AIGP application networking.
BGP cannot transmit the AIGP attribute outside the AIGP domain. If the AIGP attribute of a route changes,
BGP sends Update packets for BGP peers to update information about this route. In a scenario in which A, a
BGP speaker, sends a route that carries the AIGP attribute to B, its BGP peer:
• If B does not support the AIGP attribute or does not have the AIGP capability enabled for the peer, B ignores the AIGP attribute and does not transmit it to other BGP peers.
• If B supports the AIGP attribute and has the AIGP capability enabled for the peer, B can modify the AIGP attribute of the route only after setting itself as the next hop of the route. The rules for modifying the AIGP attribute are as follows:
■ If the BGP peer relationship between A and B is established over an IGP route, or a static route that
does not require recursion, B uses the metric value of the IGP or static route plus the received AIGP
attribute value as the new AIGP attribute value and sends the new AIGP attribute to other peers.
■ If the BGP peer relationship between A and B is established over a BGP route, or a static route that
requires recursion, route recursion is performed when B sends data to A. Each route recursion
involves a recursive route. B uses the sum of the metric values of the recursive routes plus the
received AIGP attribute value as the new AIGP attribute value and sends the new AIGP attribute to
other peers.
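Both cases above reduce to the same arithmetic: B adds the metric of every route involved in recursion (one IGP or static route in the first case, possibly several recursive routes in the second) to the received AIGP value. A minimal sketch with a hypothetical helper name:

```python
def new_aigp(received_aigp: int, recursive_metrics: list) -> int:
    """New AIGP value B advertises after setting itself as next hop.

    `recursive_metrics` lists the metric of each route the BGP session
    recurses through (illustrative model, not device code).
    """
    return received_aigp + sum(recursive_metrics)
```

For example, a received AIGP of 100 over an IGP route with metric 10 yields a new AIGP of 110.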
If multiple active routes exist between two nodes, BGP will make a route selection decision. If BGP cannot
determine the optimal route based on PrefVal, Local_Pref, and Route-type, BGP compares the AIGP
attributes of these routes. The rules are as follows:
• If BGP cannot determine the optimal route based on Route-type, BGP compares the AIGP attributes. If
this method still cannot determine the optimal route, BGP proceeds to compare the AS_Path attributes.
• The priority of a route that carries the AIGP attribute is higher than the priority of a route that does not
carry the AIGP attribute.
• If all routes carry the AIGP attribute, the route with the smallest AIGP attribute value plus the IGP
metric value of the recursive next hop is preferred over the other routes.
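These comparison rules can be sketched as follows. The tuple layout (AIGP value or None, IGP metric to the recursive next hop) and the function name are illustrative assumptions, not device structures.

```python
def better_by_aigp(route_a, route_b):
    """Return 'a', 'b', or 'tie' per the AIGP comparison rules (sketch).

    Each route is (aigp_or_None, igp_metric_to_recursive_next_hop).
    A 'tie' means BGP falls through to the AS_Path comparison.
    """
    aigp_a, igp_a = route_a
    aigp_b, igp_b = route_b
    # A route that carries the AIGP attribute beats one that does not.
    if (aigp_a is None) != (aigp_b is None):
        return "a" if aigp_a is not None else "b"
    if aigp_a is None:
        return "tie"                     # neither carries AIGP
    total_a, total_b = aigp_a + igp_a, aigp_b + igp_b
    if total_a == total_b:
        return "tie"
    return "a" if total_a < total_b else "b"
```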
Usage Scenario
The AIGP attribute is used to select the optimal route in an AIGP administrative domain.
The AIGP attribute can be transmitted between BGP unicast peers as well as between BGP VPNv4/VPNv6
peers. Transmitting the AIGP attribute between BGP VPNv4/VPNv6 peers allows L3VPN traffic to be
transmitted along the path with the smallest AIGP attribute value.
On the inter-AS IPv4 VPN Option B network shown in Figure 2, BGP VPNv4 peer relationships are established
between PEs and ASBRs. Two paths with different IGP costs exist between PE1 and PE2. If you want the PEs
to select a path with a lower IGP cost to carry traffic, you can enable the AIGP capability in the BGP VPNv4
address family view and configure a route policy to add the same AIGP initial value to BGP VPNv4 routes.
Take PE1 as an example. After this configuration, PE1 receives two BGP VPNv4 routes destined for CE2 from
ASBR1 and ASBR2, and the BGP VPNv4 route sent by ASBR1 has a lower AIGP value. If higher-priority route
selection conditions of the routes are the same, PE1 preferentially selects the BGP VPNv4 route with a lower
AIGP value so that traffic can be transmitted over the PE1 -> ASBR1 -> ASBR3 -> PE2 path.
Benefits
After the AIGP attribute is configured in an AIGP administrative domain, BGP selects paths based on metrics, just as an IGP does. As a result, all devices in the AIGP administrative domain use the optimal routes to forward data.
Background
As user networks and the scope of network services continue to expand, load-balancing techniques are used
to improve bandwidth between nodes. If tunnels are used for load balancing, transit nodes (P) obtain IP
content carried in MPLS packets as a hash key. If a transit node cannot obtain the IP content from MPLS
packets, the transit node can only use the top label in the MPLS label stack as a hash key. The top label in
the MPLS label stack cannot differentiate underlying-layer protocols in packets in detail. As a result, the top
MPLS labels are not distinguished when being used as hash keys, resulting in load imbalance. Per-packet
load balancing can be used to prevent load imbalance but results in packets being delivered out of sequence.
This drawback adversely affects user experience. To address these problems, use the entropy label capability
to improve load balancing performance.
Implementation
On the network shown in Figure 1, load balancing is performed on ASBRs (transit nodes) and the result is
uneven. To achieve even load balancing, you can configure the entropy label capability of the BGP LSP.
The entropy label is generated by the ingress solely for the purpose of load balancing. To help the egress
LSR distinguish the entropy label generated by the ingress LSR from application labels, label 7 is added
before an entropy label in the MPLS label stack.
The ingress generates an entropy label and encapsulates it into the MPLS label stack. If packets are not encapsulated with MPLS labels on the ingress, the ingress can easily obtain IP or Layer 2 protocol data for use as a hash key. If the ingress detects that the entropy label capability is enabled for a tunnel, the ingress uses the IP information carried in packets to compute an entropy label, adds the label to the MPLS label stack, and sends the packets to an ASBR. The ASBR uses the entropy label as a hash key to load-balance traffic and does not need to parse the IP data inside MPLS packets.
The entropy label is pushed into packets by the ingress and removed by the egress. Therefore, the egress
needs to notify the ingress of the support for the entropy label capability.
• Egress: If the egress can parse an entropy label, the egress adds the entropy label attribute to the Path Attributes field in BGP routes and then advertises the BGP routes to notify upstream nodes, including the ingress, of the local entropy label capability.
• Transit node: A transit node needs to be enabled with the entropy label advertisement capability so that
the transit node can advertise the BGP routes to notify upstream nodes of the local entropy label
capability.
• Ingress: determines whether to add an entropy label into packets to improve load balancing based on
the entropy label capability advertised by the egress.
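A conceptual sketch of the ingress behavior described above, assuming a simple 5-tuple hash. Real hardware hash algorithms differ, and all names here are illustrative; only the indicator label value (7) comes from the text.

```python
import hashlib

ENTROPY_LABEL_INDICATOR = 7   # label 7 precedes the entropy label

def entropy_label(src_ip, dst_ip, proto, sport, dport):
    """Derive a 20-bit label value from IP flow information (sketch)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:3], "big") & 0xFFFFF  # labels are 20 bits

def push_entropy(stack, flow):
    """Append the indicator label and entropy label to a label stack."""
    return stack + [ENTROPY_LABEL_INDICATOR, entropy_label(*flow)]
```

Because the label is derived deterministically from the flow, packets of one flow always hash the same way on the ASBR, which preserves packet ordering while balancing flows.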
Usage Scenario
• In Figure 1, entropy labels are used when load balancing is performed among transit nodes.
• On the network shown in Figure 2, the BGP labeled routes exchanged between PE1 and ASBR1 are sent
through an RR. If the RR needs to advertise the entropy label attribute, BGP LSP Entropy label attribute
advertisement needs to be enabled between the RR and PE1, and between the RR and ASBR1. The RR
also needs to be enabled to forward BGP LSP entropy labels. If the RR is not enabled to forward BGP
LSP entropy labels, it discards the BGP LSP entropy labels carried in routes.
Benefits
Entropy labels help achieve more even load balancing.
1. After BGP routing loop detection is enabled, the local device generates a random number, adds the
Loop-detection attribute to the routes to be advertised to EBGP peers or the locally imported routes to
be advertised to peers, and encapsulates the attribute with the random number and the local vrfID.
The local vrfID is automatically generated and globally unique. In the public network scenario, the
vrfID is 0. In the private network scenario, the vrfID is automatically generated after a VPN instance is
created. When OSPF/IS-IS routes are imported to BGP, the routing loop attributes of the OSPF/IS-IS
routes are inherited.
2. When the local device receives a route with the Loop-detection attribute from another device, the
local device performs the following checks:
• Compares the Loop-detection attribute of the received route with the combination of the vrfID
and random number that are locally stored.
■ If they are the same, the local device determines that a routing loop occurs.
■ If they are different, the local device determines that no routing loop occurs, and the route
participates in route selection.
■ If a route has a routing loop record, a routing loop once occurred. Such a route is considered
to be a looped route even if it does not carry the routing loop attribute of the local device.
■ If there is no routing loop record and the Loop-detection attribute of the received route is
different from the combination of the vrfID and random number that are locally stored, the
local device determines that no routing loop occurs, and the route participates in route
selection normally.
■ If there is no routing loop record but the Loop-detection attribute of the received route is the
same as the combination of the vrfID and random number that are locally stored, the local
device determines that a routing loop occurs.
3. If a routing loop is detected and the looped route is selected, the local device reports an alarm to
notify the user of the routing loop risk, enters the loop prevention state, and performs the following
operations:
• Preferentially selects non-looped routes when the BGP routing table contains multiple routes with
the same destination as the looped route.
• Increases the MED value and reduces the local preference of the looped route when advertising it.
4. After the device processes a looped route, the routing loop may be resolved. If the routing loop
persists, you need to locate the cause of the loop and resolve the loop. As the device cannot detect
when the loop risk is eliminated, the routing loop alarm will not be cleared automatically. To
manually clear the alarm after the loop risk is eliminated, you can run a related command.
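The checks in steps 1 and 2 can be modeled with a small sketch. The attribute is represented as a list of (vrfID, random number) pairs; the class and its names are hypothetical, not device code.

```python
class LoopDetector:
    """Simplified model of BGP routing loop detection (illustrative)."""

    def __init__(self, vrf_id, rand):
        self.local = (vrf_id, rand)   # locally generated identifier
        self.loop_record = set()      # prefixes that once looped

    def check(self, prefix, loop_attrs):
        """Return True if the route for `prefix` is considered looped."""
        if prefix in self.loop_record:
            return True               # a past loop keeps the route suspect
        if self.local in loop_attrs:
            self.loop_record.add(prefix)
            return True               # our own attribute came back
        return False
```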
Implementation
The Loop-detection attribute is a private BGP attribute. It uses a reserved value (type=255) to implement
routing loop detection in some scenarios. Figure 1 shows the Loop-detection attribute TLV, and Table 1
describes the fields in it.
Currently, the Loop-detection attribute is supported only in the BGP IPv4 public network, BGP IPv4 private network, BGP
IPv6 public network, BGP IPv6 private network, BGP VPNv4, and BGP VPNv6 address families.
Attr.Type Code: Attribute type, which occupies one byte. The value is an unsigned integer, with the initial value being 0xFF.
BGP also defines a sub-TLV for Attr.Value to identify the device that detects a routing loop. Figure 2 shows
the sub-TLV, and Table 2 describes the fields in the sub-TLV.
A maximum of four Loop-detection attribute sub-TLVs can be carried. If more than four sub-TLVs exist, they are
overwritten according to the first-in-first-out rule.
Attr.Value: A 32-bit vrfID followed by a 32-bit random number:

0              31             63
+--------------+--------------+
|    vrfID     | Random number|
+--------------+--------------+
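A sketch of how the sub-TLV value could be packed and how the four-entry first-in-first-out limit behaves. The helper names and the exact wire packing are illustrative assumptions; only the field widths and the FIFO limit come from the text.

```python
import struct

MAX_SUBTLVS = 4   # at most four Loop-detection sub-TLVs are carried

def append_subtlv(entries, vrf_id, rand):
    """Append a (vrfID, random number) pair, dropping the oldest beyond 4."""
    entries = entries + [(vrf_id, rand)]
    return entries[-MAX_SUBTLVS:]     # first-in-first-out overwrite

def pack_subtlv(vrf_id, rand):
    """Pack one sub-TLV value: 32-bit vrfID + 32-bit random number."""
    return struct.pack("!II", vrf_id, rand)   # 8 bytes on the wire
```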
Application Scenarios
BGP routing loops may occur in the following scenarios. You are advised to enable BGP routing loop
detection during network planning.
• On the network shown in Figure 3, DeviceA and DeviceC belong to AS 100, and DeviceB belongs to AS
200. An export policy is configured on DeviceB to delete the original AS numbers from the routes to be
advertised to DeviceC. After receiving a BGP route that originates from DeviceC, DeviceA advertises the
route to DeviceB, which then advertises the route back to DeviceC. As a result, a BGP routing loop
occurs on DeviceC. After BGP routing loop detection is enabled on the entire network, DeviceC adds
Loop-detection attribute 1 to the BGP route (locally imported) before advertising the route to DeviceA.
After receiving the route, DeviceA adds Loop-detection attribute 2 to the route before advertising the
route to DeviceB (EBGP peer). After receiving the route, DeviceB adds Loop-detection attribute 3 to the
route before advertising the route to DeviceC (EBGP peer). After receiving the Loop-detection attributes,
DeviceC discovers that these attributes contain Loop-detection attribute 1 which was added by itself,
and then reports a routing loop alarm.
• On the network shown in Figure 4, DeviceA resides in AS 100; DeviceB resides in AS 200; DeviceC and
DeviceD reside in AS 300. An export policy is configured on DeviceD to delete the original AS numbers
from the routes to be advertised to DeviceB. In this scenario, a BGP routing loop occurs on DeviceB.
• On the network shown in Figure 5, the PE advertises a VPN route through VPN1, and then receives this
route through VPN1, indicating that a routing loop occurs on the PE.
• On the network shown in Figure 6, DeviceA, DeviceB, and DeviceC belong to AS 100. An IBGP peer
relationship is established between DeviceA and the RR, between the RR and DeviceB, and between the
RR and DeviceC. OSPF runs on DeviceB and DeviceC. DeviceB is configured to import BGP routes to
OSPF, and DeviceC is configured to import OSPF routes to BGP. An export policy is configured on
DeviceA to add AS numbers to the AS_Path attribute for the routes to be advertised to the RR. After
receiving a BGP route from DeviceA, the RR advertises this route to DeviceB. DeviceB then imports the
BGP route to convert it to an OSPF route and advertises the OSPF route to DeviceC. DeviceC then
imports the OSPF route to convert it to a BGP route and advertises the BGP route to the RR. When
comparing the route advertised by DeviceA and the route advertised by DeviceC, the RR prefers the one
advertised by DeviceC as it has a shorter AS_Path than that of the route advertised by DeviceA. As a
result, a stable routing loop occurs.
To address this problem, enable BGP routing loop detection on DeviceC. After BGP routing loop
detection is enabled, DeviceC adds Loop-detection attribute 1 to the BGP route imported from OSPF
and advertises the BGP route to the RR. After receiving this BGP route, the RR advertises it (carrying
Loop-detection attribute 1) to DeviceB. As OSPF routing loop detection is enabled by default, when the
BGP route is imported to become an OSPF route on DeviceB, the OSPF route inherits the routing loop
attribute of the BGP route and has an OSPF routing loop attribute added as well before the OSPF route
is advertised to DeviceC. Upon receipt of the OSPF route, DeviceC imports it to convert it to a BGP
route. Because BGP routing loop detection is enabled, the BGP route inherits the routing loop attributes
of the OSPF route. Upon receipt of the route, DeviceC finds that the received route carries its own
routing loop attribute and therefore determines that a routing loop has occurred. In this case, DeviceC
generates an alarm, and reduces the local preference and increases the MED value of the route before
advertising the route to the RR. After receiving the route, the RR compares this route with the route
advertised by DeviceA. Because the route advertised by DeviceC has a lower local preference and a
larger MED value, the RR preferentially selects the route advertised by DeviceA. The routing loop is then
resolved.
When the OSPF route is transmitted to DeviceC again, DeviceC imports it to convert it to a BGP route,
and the route carries only the OSPF routing loop attribute added by DeviceB. However, DeviceC still
considers the route as a looped route because the route has a routing loop record. In this case, the RR
does not preferentially select the route after receiving it from DeviceC. Then routes converge normally.
• When BGP is configured to advertise the default route, the Loop-detection attribute is not added to the default
route.
• When BGP Add-Path is configured, the Loop-detection attribute is not added to routes.
• When the route server function is configured, the Loop-detection attribute is not added to the routes advertised by
the server.
• The Loop-detection attribute is not added to the received routes to be advertised to IBGP peers.
the configurations of this peer group. If the configurations of the peer group change, the configurations of
all the peers in the group change accordingly. A large number of BGP peers may exist on a large-scale BGP
network. If many of the BGP peers need the same policies, some commands need to be run repeatedly for
each peer. To simplify the configuration, you can configure a static peer group. Each peer in a peer group
can be configured with unique policies to advertise and receive routes.
However, multiple BGP peers can change frequently on some BGP networks, causing the establishment of
BGP peer relationships to change accordingly. If you configure peers in static mode, you must frequently add
or delete peer configurations on the local device, which increases the maintenance workload. To address this
problem, configure the dynamic BGP peer function to enable BGP to listen for BGP connection requests from
a specified network segment, dynamically establish BGP peer relationships, and add these peers to the same
dynamic peer group. This spares you from adding or deleting BGP peer configurations in response to each
change in dynamic peers.
Application
On the network shown in Figure 1, an EBGP peer relationship is established between Device A and Device B
and between Device A and Device C, and an IBGP peer relationship is established between Device A and
Device D and between Device A and Device E.
Device B and Device C are on the same network segment (10.1.0.0/16). In this case, you can configure a
dynamic peer group on Device A to listen for BGP connection requests from this network segment. After the
dynamic peer group is configured, Device B and Device C are dynamically added to this peer group, and the
devices to be deployed on this network segment will also be dynamically added to the peer group when they
request to establish BGP peer relationships with Device A. This process helps reduce the network
maintenance workload. In addition, you can configure another dynamic peer group on Device A so that
Device D and Device E are dynamically added to this peer group.
In Figure 1, there are multiple BGP routers in AS 200. To reduce the number of IBGP connections, AS 200 is divided into three sub-ASs: AS 65001, AS 65002, and AS 65003. In AS 65001, fully meshed IBGP connections are established among the three routers.
BGP speakers outside a confederation, such as Router F in AS 100, do not know of the existence of the sub-ASs (AS 65001, AS 65002, and AS 65003) in the confederation. The confederation ID is the AS number used to identify the entire confederation. For example, AS 200 in Figure 1 is the confederation ID.
Application
After receiving routes from peers, an RR selects the optimal route based on BGP route selection rules and
advertises the optimal route to other peers based on the following rules:
• If the optimal route is from a non-client IBGP peer, the RR advertises the route to all clients.
• If the optimal route is from a client, the RR advertises the route to all non-clients and clients.
• If the optimal route is from an EBGP peer, the RR advertises the route to all clients and non-clients.
An RR is easy to configure because it only needs to be configured on the Router that needs to function as an
RR, and clients do not need to know whether they are clients.
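The reflection rules above can be summarized in a small sketch (hypothetical model, not device code): given where the optimal route came from, it returns the peer types the RR advertises it to.

```python
def rr_reflect_to(source: str) -> set:
    """Peer types an RR advertises the optimal route to (sketch)."""
    if source == "nonclient_ibgp":
        return {"clients"}                    # reflect to clients only
    if source in ("client", "ebgp"):
        return {"clients", "nonclients"}      # advertise to everyone
    return set()
```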
On some networks, if fully meshed connections have already been established among clients of an RR, they
can exchange routing information directly. In this case, route reflection among the clients through the RR is
unnecessary and occupies bandwidth. For example, on the NE40E, route reflection through the RR can be
disabled, but the routes between clients and non-clients can still be reflected. By default, route reflection
between clients through the RR is enabled.
On the NE40E, an RR can change various attributes of BGP routes, such as the AS_Path, MED, Local_Pref,
and community attributes.
Originator_ID
Originator_ID and Cluster_List are used to detect and prevent routing loops.
The Originator_ID attribute is four bytes long and is generated by an RR. It carries the router ID of the route
originator in the local AS.
• When a route is reflected by an RR for the first time, the RR adds the Originator_ID attribute to this
route. The Originator_ID attribute is used to identify the Router that originates the route. If a route
already carries the Originator_ID attribute, the RR does not create a new one.
• After receiving the route, a BGP speaker checks whether the Originator_ID is the same as its router ID. If
Originator_ID is the same as its router ID, the BGP speaker discards this route.
Cluster_List
To prevent routing loops between ASs, a BGP Router uses the AS_Path attribute to record the ASs through
which a route passes. Routes with the local AS number are discarded by the Router. To prevent routing loops
within an AS, IBGP peers do not advertise routes learned from the local AS.
With RR, IBGP peers can advertise routes learned from the local AS to each other. However, the Cluster_List
attribute must be deployed to prevent routing loops within the AS.
An RR and its clients form a cluster. Within an AS, each cluster is uniquely identified by a Cluster_ID.
To prevent routing loops, the RR uses the Cluster_List attribute to record the Cluster_IDs of all RRs through
which a route passes.
Similar to an AS_Path, which records all the ASs through which a route passes, a Cluster_List is composed of
a series of Cluster_IDs and records all RRs through which a route passes. The Cluster_List is generated by the
RR.
• Before an RR reflects a route between its clients or between its clients and non-clients, the RR adds the
local Cluster_ID to the head of the Cluster_List. If a route does not carry any Cluster_List, the RR creates
one for the route.
• After the RR receives an updated route, it checks the Cluster_List of the route. If the RR finds that its
cluster ID is included in the Cluster_List, the RR discards the route. If its cluster ID is not included in the
Cluster_List, the RR adds its cluster ID to the Cluster_List and then reflects the route.
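The Cluster_List handling above can be sketched as follows (illustrative only): the RR discards a route whose Cluster_List already contains the local Cluster_ID, and otherwise prepends the local Cluster_ID before reflecting the route.

```python
def reflect(cluster_list, local_cluster_id):
    """Return the updated Cluster_List, or None if the route is looped."""
    cluster_list = cluster_list or []   # create the list if absent
    if local_cluster_id in cluster_list:
        return None                     # loop detected: discard the route
    # The local Cluster_ID is added to the head before reflection.
    return [local_cluster_id] + cluster_list
```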
Backup RR
To enhance network reliability and prevent single points of failure, more than one route reflector needs to be configured in a cluster. To prevent routing loops, all RRs in the same cluster must be configured with the same Cluster_ID.
With backup RRs, clients can receive multiple routes to the same destination from different RRs. The clients
then apply BGP route selection rules to choose the optimal route.
Figure 2 Backup RR
On the network shown in Figure 2, RR1 and RR2 are in the same cluster. An IBGP connection is set up
between RR1 and RR2. The two RRs are non-clients of each other.
• If Client 1 receives an updated route from an external peer, Client 1 advertises the route to RR1 and
RR2 through IBGP.
• After receiving the updated route, RR1 adds the local Cluster_ID to the top of the Cluster_List of the
route and then reflects the route to other clients (Client 1, Client 2, and Client 3) and the non-client
(RR2).
• After receiving the reflected route, RR2 checks the Cluster_List and finds that its Cluster_ID is contained
in the Cluster_List. In this case, it discards the updated route and does not reflect it to its clients.
If RR1 and RR2 are configured with different Cluster_IDs, each RR receives both the routes from its clients
and the updated routes reflected by the other RR. Therefore, configuring the same Cluster_ID for RR1 and
RR2 reduces the number of routes that each RR receives and memory consumption.
The application of Cluster_List prevents routing loops among RRs in the same AS.
Multiple Clusters in an AS
Multiple clusters may exist in an AS. RRs are IBGP peers of each other. An RR can be configured as a client or
non-client of another RR. Therefore, the relationship between clusters in an AS can be configured flexibly.
For example, a backbone network is divided into multiple reflection clusters. Each RR has other RRs
configured as its non-clients, and these RRs are fully meshed. Each client establishes IBGP connections only
to the RR in the same cluster. In this manner, all BGP peers in the AS can receive reflected routes. Figure 3
shows the networking.
Hierarchical Reflector
Hierarchical RRs are usually deployed on large-scale networks. On the network shown in Figure 4,
the ISP provides Internet routes for AS 100. Two EBGP connections are established between the ISP and AS
100. AS 100 is divided into two clusters. The four Routers in Cluster 1 are core routers.
• Two Level-1 RRs (RR-1s) are deployed in Cluster 1, which ensures the reliability of the core layer of AS
100. The other two Routers in the core layer are clients of RR-1s.
In Figure 5, all PEs and RRs reside in the same AS, and peer relationships are established between each PE
and its RR and between RRs in both VPNv4 and VPN-Target address families; PE1 is a client of the level-1
RR1, and PE2 is a client of the level-1 RR2; RRR is a level-2 RR, with RR1 and RR2 as its clients; RT 1:1 is
configured on PE1 and PE2. PE1 receives a VPN route from CE1.
If no RR cluster ID loop is allowed, after RR1 and RR2 advertise the RT routes learned from PEs to RRR
(Level-2 RR), RRR implements route selection. If RRR selects the route learned from RR1, RRR advertises a
VPN ORF route to RR1 and RR2. The Cluster_List of the route includes the local cluster ID. As a result, RR1
discards the VPN ORF route. Consequently, RR1 does not have the RT filter of RRR and cannot guide VPNv4 peers to advertise routes. As a result, CE2 fails to learn routes from CE1. To address this problem, run the
peer allow-cluster-loop command in the BGP-VPN-Target address family view on RR1. In the command,
the peer address is set to the address of RRR. After the command is run, RR1 can receive the RT routes
advertised by RRR (Level-2 RR) and can guide VPNv4 peers to advertise routes.
Application
In some scenarios on the live network, to achieve network traffic interworking, EBGP full-mesh connections
may be required. However, establishing full-mesh connections among devices that function as ASBRs is
costly and places high requirements on the performance of the devices, which adversely affects the network
topology and device expansion. In Figure 1, the route server can advertise routes to all its EBGP peers,
without requiring EBGP full-mesh connections among ASBRs. Therefore, the route server function reduces
network resource consumption.
• Remote route leaking: After a PE receives a BGP VPNv4/VPNv6 route from a remote PE, the local PE
matches the export target (ERT) of the route against the import targets (IRTs) configured for local VPN
instances. If the ERT matches the IRT of a local VPN instance, the PE converts the BGP VPNv4/VPNv6
route to a BGP VPN route and adds the BGP VPN route to the routing table of this local VPN instance.
• Local route leaking: A PE matches the ERT of a BGP VPN route in a local VPN instance against the IRTs
configured for other local VPN instances. If the ERT matches the IRT of a local VPN instance, the PE
adds the BGP VPN route to the routing table of this local VPN instance. Locally leaked routes include
locally imported routes or routes learned from VPN peers.
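The ERT-to-IRT matching described above can be sketched as a set intersection. The data layout is a hypothetical simplification: a route carries a list of export targets, and each local VPN instance is mapped to its set of import targets.

```python
def leak_targets(route_erts, vpn_instances):
    """Names of local VPN instances a route is leaked into (sketch).

    `vpn_instances` maps instance name -> set of import targets (IRTs).
    A route is leaked into every instance whose IRTs intersect its ERTs.
    """
    erts = set(route_erts)
    return {name for name, irts in vpn_instances.items() if erts & irts}
```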
After a PE receives VPNv4 routes destined for the same IP address from another PE or VPN routes from a CE,
the local PE implements route leaking by following the steps shown in Figure 1.
In Figure 2, PEs have the same VPN instance (vpna) and the RTs among VPN instances match each other.
The RD configured for PE2 and PE3 is 2:2, and that configured for PE4 is 3:3. Site 2 has a route destined for
10.1.1.0/24. The route is sent to PE2, PE3, and PE4, which then convert this route to multiple BGP VPNv4
routes and send them to PE1. Upon receipt of the BGP VPNv4 routes, PE1 implements route leaking as
shown in Figure 3. The detailed process is as follows:
1. After receiving the BGP VPNv4 routes from PE2, PE3, and PE4, PE1 adds them to the BGP VPNv4
routing table.
2. PE1 converts the BGP VPNv4 routes to BGP VPN routes by removing their RDs, adds the BGP VPN
routes to the routing table of the VPN instance, selects an optimal route from the BGP VPN routes
based on BGP route selection rules, and adds the optimal BGP VPN route to the IP VPN instance
routing table.
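The RT matching that drives both remote and local route leaking can be sketched as follows. This is a simplified Python illustration, not device behavior: the instance names, RT values, and data structures are hypothetical, and label allocation and best-path selection are omitted.

```python
# Sketch of BGP VPNv4 route leaking based on route-target matching.
# Simplified illustration: a real implementation also handles label
# allocation and BGP route selection among the leaked routes.

def leak_routes(vpnv4_routes, vpn_instances):
    """Leak received VPNv4 routes into local VPN instances whose
    import targets (IRTs) match any of the route's export targets (ERTs)."""
    for route in vpnv4_routes:
        for instance in vpn_instances:
            # A route is leaked if any ERT matches any IRT of the instance.
            if set(route["erts"]) & set(instance["irts"]):
                # Remove the RD to convert the VPNv4 route to a VPN route.
                vpn_route = {"prefix": route["prefix"], "next_hop": route["next_hop"]}
                instance["routing_table"].append(vpn_route)

# Hypothetical example: one VPNv4 route with ERT 1:1 and two local instances.
vpna = {"name": "vpna", "irts": ["1:1"], "routing_table": []}
vpnb = {"name": "vpnb", "irts": ["2:2"], "routing_table": []}
routes = [{"prefix": "10.1.1.0/24", "next_hop": "2.2.2.2", "erts": ["1:1"], "rd": "2:2"}]
leak_routes(routes, [vpna, vpnb])
print(len(vpna["routing_table"]), len(vpnb["routing_table"]))  # 1 0
```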
10.9.2.14 MP-BGP
Conventional BGP-4 manages only IPv4 unicast routing information; its support for inter-AS transmission of
packets of other network layer protocols, such as multicast, is limited.
To support multiple network layer protocols, the Internet Engineering Task Force (IETF) extends BGP-4 to
MP-BGP. MP-BGP is backward compatible: Routers that support MP-BGP can communicate with Routers that
do not support MP-BGP.
As an enhancement of BGP-4, MP-BGP provides routing information for various routing protocols, including
IPv6 (BGP4+) and multicast.
• MP-BGP maintains both unicast and multicast routes. It stores them in different routing tables to
separate unicast routing information from multicast routing information.
• MP-BGP supports both unicast and multicast address families and can build both the unicast routing
topology and multicast routing topology.
• Most unicast routing policies and configuration methods supported by BGP-4 can be applied to
multicast, and unicast and multicast routes can be maintained according to these routing policies.
Extended Attributes
BGP-4 Update packets carry three IPv4-related attributes: NLRI (Network Layer Reachability Information),
Next_Hop, and Aggregator. Aggregator contains the IP address of the BGP speaker that performs route
summarization.
To support multiple network layer protocols, BGP-4 needs to carry network layer protocol information in
NLRI and Next_Hop. MP-BGP introduces the following two route attributes:
• MP_REACH_NLRI: indicates the multiprotocol reachable NLRI. It is used to advertise a reachable route
and its next hop.
• MP_UNREACH_NLRI: indicates the multiprotocol unreachable NLRI. It is used to withdraw unreachable
routes.
The preceding two attributes are optional non-transitive. Therefore, the BGP speakers that do not support
MP-BGP will ignore the information carried in the two attributes and do not advertise the information to
other peers.
Address Family
The Address Family Information field consists of a 2-byte Address Family Identifier (AFI) and a 1-byte
Subsequent Address Family Identifier (SAFI).
BGP uses address families to distinguish different network layer protocols. For the values of address families,
see relevant standards. The NE40E supports multiple MP-BGP extension applications, such as VPN extension
and IPv6 extension, which are configured in their respective address family views.
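The AFI/SAFI mechanism described above can be illustrated with a short sketch. The (AFI, SAFI) values below are the well-known IANA registry values (AFI 1 = IPv4, AFI 2 = IPv6; SAFI 1 = unicast, 2 = multicast, 128 = MPLS-labeled VPN); the dictionary and helper are illustrative only.

```python
# Well-known (AFI, SAFI) pairs from the IANA registries (illustrative subset).
AFI_IPV4, AFI_IPV6 = 1, 2
SAFI_UNICAST, SAFI_MULTICAST, SAFI_MPLS_VPN = 1, 2, 128

ADDRESS_FAMILIES = {
    (AFI_IPV4, SAFI_UNICAST): "IPv4 unicast",
    (AFI_IPV4, SAFI_MULTICAST): "IPv4 multicast",
    (AFI_IPV6, SAFI_UNICAST): "IPv6 unicast (BGP4+)",
    (AFI_IPV4, SAFI_MPLS_VPN): "VPNv4",
    (AFI_IPV6, SAFI_MPLS_VPN): "VPNv6",
}

def encode_afi_safi(afi, safi):
    """Encode the 2-byte AFI and 1-byte SAFI as carried in MP_REACH_NLRI."""
    return afi.to_bytes(2, "big") + safi.to_bytes(1, "big")

print(encode_afi_safi(AFI_IPV4, SAFI_MPLS_VPN).hex())  # 000180
```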
Multicast-related address family views, such as the BGP-IPv4 multicast address family view, BGP-MVPN
address family view, BGP-IPv6 MVPN address family view, and BGP-MDT address family view, can transmit
inter-AS routing information and are mainly used in MBGP, BIER, NG MVPN, BIERv6, and Rosen MVPN
scenarios. For details about their application in multicast, see HUAWEI NE40E-M2 series Universal Service
Router Configuration Guide - IP Multicast.
VPN-related address family views, such as the BGP-VPNv4 address family view, BGP-VPNv6 address family
view, BGP-VPN instance view, BGP-L2VPN-AD address family view, and BGP multi-instance VPN instance
view, are mainly used in BGP/MPLS IP VPN, VPWS, and VPLS scenarios. For details, see HUAWEI NE40E-M2
series Universal Service Router Feature Description - VPN.
EVPN-related address family views, such as the BGP-EVPN address family view and BGP multi-instance EVPN
address family view, are mainly used in EVPN VPLS, EVPN VPWS, and EVPN L3VPN scenarios. For details, see
HUAWEI NE40E-M2 series Universal Service Router Feature Description - VPN - EVPN Feature Description.
The BGP IPv4 SR Policy address family view and BGP IPv6 SR Policy address family view are mainly used in
Segment Routing MPLS and Segment Routing IPv6 scenarios. For details, see HUAWEI NE40E-M2 series
Universal Service Router Feature Description - Segment Routing.
Flow-related address family views, such as the BGP-Flow address family view, BGP-Flow VPNv4 address
family view, BGP-Flow VPNv6 address family view, BGP-Flow VPN instance IPv4 address family view, and
BGP-Flow VPN instance IPv6 address family view are mainly used to defend against DoS/DDoS attacks and
improve network security and availability. For details, see HUAWEI NE40E-M2 series Universal Service Router
Feature Description - Security - BGP Flow Specification Feature Description.
The BGP-labeled address family view and BGP-labeled-VPN instance IPv4 address family view are mainly
used for carrier configuration using the BGP label distribution solution. For details, see HUAWEI NE40E-M2
series Universal Service Router Configuration - VPN - BGP/MPLS IP VPN Configuration and HUAWEI
NE40E-M2 series Universal Service Router Configuration - VPN - EVPN Configuration.
The BGP-LS address family view is mainly used to summarize the topology information collected by an IGP
and send the information to the upper-layer controller. For details, see HUAWEI NE40E-M2 series Universal
Service Router Feature Description - BGP Feature Description - BGP-LS.
BGP Authentication
BGP can work properly only after BGP peer relationships are established. Authenticating BGP peers can
improve BGP security. BGP supports the following authentication modes:
• MD5 authentication
BGP uses TCP as the transport layer protocol. Message Digest 5 (MD5) authentication can be used
when establishing TCP connections to improve BGP security. MD5 authentication sets the MD5
authentication password for the TCP connection, and TCP performs the authentication. If the
authentication fails, the TCP connection cannot be established.
The encryption algorithm used for MD5 authentication poses security risks. Therefore, you are advised to use an
authentication mode based on a more secure encryption algorithm.
• Keychain authentication
Keychain authentication is performed at the application layer. It prevents service interruptions and
improves security by periodically changing the password and encryption algorithms. When keychain
authentication is configured for BGP peer relationships over TCP connections, BGP messages as well as
the process of establishing TCP connections can be authenticated. For details about keychain, see
"Keychain" in HUAWEI NE40E-M2 series Feature Description - Security.
• TCP-AO authentication
The TCP authentication option (TCP-AO) is used to authenticate received and to-be-sent packets during
TCP session establishment and data exchange. It supports packet integrity check to prevent TCP replay
attacks. TCP-AO authentication improves the security of the TCP connection between BGP peers and is
applicable to the network that requires high security.
BGP GTSM
During network attacks, attackers may simulate BGP messages and continuously send them to the Router. If
the messages are destined for the Router, the Router sends them directly to the control plane for processing
without validating them. As a result, the increased processing workload on the Router's control plane results
in high CPU usage. The Generalized TTL Security Mechanism (GTSM) defends against attacks by checking
the time to live (TTL) value in each packet header. TTL refers to the maximum number of Routers through
which a packet can pass.
GTSM checks whether the TTL value in each IP packet header is within a pre-defined range, which protects
services above the IP layer and improves system security.
After a GTSM policy of BGP is configured, an interface board checks the TTL values of all BGP messages.
According to actual networking requirements, you can set the default action (to drop or pass) that GTSM
will take on the messages whose TTL values are not within a pre-defined range. If a valid TTL range is
specified based on the network topology and the default action that GTSM will take is set to drop, the BGP
messages whose TTL values are not within the valid range are discarded directly by the interface board upon
receipt. This prevents bogus BGP messages from consuming CPU resources.
You can enable the logging function so that the device can record information about message dropping in
logs. The recorded logs facilitate fault locating.
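The GTSM check performed by the interface board can be sketched as follows. This is a simplified illustration under the assumption that peers send packets with an initial TTL of 255; the function name, parameters, and logging hook are hypothetical.

```python
# Sketch of a GTSM check on an interface board: drop BGP messages whose
# TTL falls outside the valid range derived from the network topology.

def gtsm_check(ttl, valid_hops, default_action="drop", log=None):
    """A peer at most `valid_hops` router hops away sends packets with an
    initial TTL of 255, so the received TTL must be >= 255 - valid_hops + 1."""
    lower_bound = 255 - valid_hops + 1
    if lower_bound <= ttl <= 255:
        return "pass"
    # Out-of-range TTL: apply the configured default action and log the drop
    # so that the recorded logs can facilitate fault locating.
    if log is not None and default_action == "drop":
        log.append(f"dropped BGP message with TTL {ttl}")
    return default_action

log = []
print(gtsm_check(254, valid_hops=2, log=log))  # pass
print(gtsm_check(250, valid_hops=2, log=log))  # drop
```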
BGP RPKI
Resource Public Key Infrastructure (RPKI) ensures BGP security by verifying the validity of the BGP route
source AS or route advertiser.
Attackers can steal user data by advertising routes that are more specific than those advertised by carriers.
RPKI can resolve this problem. For example, if a carrier has advertised a route destined for 10.10.0.0/16, an
attacker can advertise a route destined for 10.10.153.0/24, which is more specific than 10.10.0.0/16.
According to the longest match rule, the route 10.10.153.0/24 is preferentially selected for traffic forwarding.
As a result, the attacker succeeds in illegally obtaining user data.
To solve the preceding problems, you can configure Route Origin Authorization (ROA) and regional
validation to ensure BGP security.
ROA
ROA stores the mapping between prefix addresses and the origin AS and checks whether the routes with a
specified IP address prefix are valid by verifying the AS number.
In Figure 1, a connection is created between Device B and the RPKI server. Device B can download the ROA
database from the RPKI server and verify the mapping between 10.1.1.0/24 and AS 100. When AS 200
receives a route with the prefix 10.1.1.0/24 from AS 100 and AS 300, it compares the origin AS with that in
the ROA database. If they are the same, the route is considered valid. If they are different, the route is
considered invalid. The route to 10.1.1.0/24 learned from AS 100 is valid because it matches the ROA
database, and the route advertisement from the origin AS 100 is considered valid. The route to 10.1.1.0/24
learned from AS 300 is invalid because it does not match the ROA database, and the route advertisement
from the origin AS 300 is considered invalid.
If no RPKI server is available, a static ROA database can be configured for Device B. In this way, ROA validation can be
implemented based on the static ROA database, without relying on an RPKI server, thereby preventing route hijacking to
a certain extent.
ROA validation produces one of the following results:
• Valid: indicates that the route advertisement from the origin AS to the specified IP address prefix is
valid.
• Invalid: indicates that the route advertisement from the origin AS to the specified IP address prefix is
invalid and that the route is not allowed to participate in route selection.
• Not Found: indicates that the origin AS does not exist in the ROA database and that the route
participates in route selection.
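The ROA check that produces these three results can be sketched as follows. This is a minimal Python illustration mirroring the example above; the database contents and function shape are hypothetical, and real ROA entries also carry fields not modeled here.

```python
# Sketch of ROA validation: check a route's origin AS against ROA entries
# of the form (prefix, maximum prefix length, authorized origin AS).
import ipaddress

# Hypothetical ROA database mirroring the example in the text.
ROA_DB = [("10.1.1.0/24", 24, 100)]

def roa_validate(prefix, origin_as):
    covered = False
    net = ipaddress.ip_network(prefix)
    for roa_prefix, max_len, roa_as in ROA_DB:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.subnet_of(roa_net) and net.prefixlen <= max_len:
            covered = True          # some ROA covers this prefix
            if origin_as == roa_as:
                return "Valid"      # origin AS matches the ROA database
    # Covered but wrong origin -> Invalid; not covered at all -> Not Found.
    return "Invalid" if covered else "Not Found"

print(roa_validate("10.1.1.0/24", 100))   # Valid
print(roa_validate("10.1.1.0/24", 300))   # Invalid
print(roa_validate("192.0.2.0/24", 100))  # Not Found
```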
In addition, ROA validation can be applied to the routes to be advertised to an EBGP peer, which prevents
route hijacking. In Figure 2, Device B is configured to perform ROA validation on the routes to be advertised.
Before advertising a route that matches the export policy to an EBGP peer, Device B performs ROA
validation on the route by matching the origin AS of the route against that of the corresponding route in the
ROA database. If the route is not found in the ROA database, the validation result is Not Found. If the route
is found in the ROA database and the origin AS of the route is the same as that of the corresponding route
in the ROA database, the validation result is Valid. If the origin AS of the route is different from that of the
corresponding route in the ROA database, the validation result is Invalid. If the validation result is Valid, the
route is advertised by default. If the validation result is Not Found, the route is not advertised by default. If
the validation result is Invalid, the route is not advertised by default and an alarm is generated; the alarm is
cleared when all routes with the validation result of Invalid are withdrawn.
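The default export behavior described above maps each validation result to an action. A minimal sketch (the function and the alarm flag are illustrative, not a device API):

```python
# Sketch of the default export decision when ROA validation is applied to
# routes to be advertised to an EBGP peer.

def export_decision(validation_result):
    """Return (action, alarm): Valid -> advertise; Not Found -> withhold;
    Invalid -> withhold and raise an alarm."""
    if validation_result == "Valid":
        return ("advertise", False)
    if validation_result == "Not Found":
        return ("withhold", False)
    return ("withhold", True)  # Invalid: not advertised, alarm generated

print(export_decision("Valid"))    # ('advertise', False)
print(export_decision("Invalid"))  # ('withhold', True)
```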
Regional validation
Users can manually configure regions by combining multiple trusted ASs into a region
and combining multiple regions into a regional confederation. Regional validation controls route selection
results by checking whether the routes received from EBGP peers in an external region belong to the local
region. This prevents intra-region routes from being hijacked by attackers outside the local region, and
ensures that hosts in the local region can securely access internal services.
Regional validation applies to the following typical scenarios: regional validation scenario and regional
confederation validation scenario.
SSL/TLS authentication
Secure Sockets Layer (SSL) is a security protocol that protects data privacy on the Internet. Transport Layer
Security (TLS) is a successor of SSL. TLS protects data integrity and privacy by preventing attackers from
eavesdropping on the data exchanged between a client and a server. To ensure data transmission security on
a network, SSL/TLS authentication can be enabled for BGP message encryption.
Fundamentals
On the network shown in Figure 1, DeviceA and DeviceB belong to AS 100 and AS 200, respectively. An EBGP
connection is established between the two devices.
BFD is used to monitor the BGP peer relationship between DeviceA and DeviceB. If the link between them
becomes faulty, BFD can quickly detect the fault and notify BGP.
On the network shown in Figure 2, indirect multi-hop EBGP connections are established between DeviceA
and DeviceC and between DeviceB and DeviceD; a BFD session is established between DeviceA and DeviceC;
a BGP peer relationship is established between DeviceA and DeviceB; and the bandwidth between DeviceA
and DeviceB is low. If the original forwarding path DeviceA->DeviceC fails, traffic that is sent from DeviceE
to DeviceA is switched to the path DeviceA->DeviceB->DeviceD->DeviceC. Due to the low bandwidth on the
link between DeviceA and DeviceB, traffic loss may occur on this path.
BFD for BGP TTL check applies only to the scenario in which DeviceA and DeviceC are indirectly connected EBGP peers.
Figure 2 Network diagram of setting a TTL value for checking the BFD session with a BGP peer
To prevent this issue, you can set a TTL value on DeviceC for checking the BFD session with DeviceA. If the
TTL value in a received BFD packet is smaller than the TTL value set on DeviceC, the BFD packet is
discarded; BFD then detects a session down event and notifies BGP, and DeviceA sends BGP Update
messages to DeviceE for route update so that the traffic forwarding path changes to DeviceE->DeviceF->
DeviceB->DeviceD->DeviceC. For example, assume that the TTL value for checking the BFD session on
DeviceC is set to 254. If the link between DeviceA and DeviceC fails, traffic sent from DeviceE is forwarded
through the path DeviceA->DeviceB->DeviceD->DeviceC, and the TTL value in a BFD packet decreases to
252 by the time the packet reaches DeviceC. Because 252 is smaller than the configured TTL value 254, the
BFD packet is discarded, BFD detects a session down event and notifies BGP, and DeviceA sends BGP Update
messages to DeviceE so that the traffic forwarding path changes to DeviceE->DeviceF->DeviceB->DeviceD->
DeviceC.
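The TTL arithmetic in this example can be worked through directly, assuming BFD packets start with a TTL of 255 and each router hop decrements the TTL by 1:

```python
# Worked example of the BFD TTL check: packets start with TTL 255, and
# each router hop along the path decrements the TTL by 1.

def ttl_on_arrival(initial_ttl, hops):
    return initial_ttl - hops

# Path DeviceA -> DeviceB -> DeviceD -> DeviceC: three hops from DeviceA.
received_ttl = ttl_on_arrival(255, 3)
configured_min = 254  # TTL value set on DeviceC for checking the session

print(received_ttl)                   # 252
print(received_ttl < configured_min)  # True: the BFD packet is discarded
```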
Networking
BGP peer tracking can quickly detect link or peer faults by checking whether routes to peers exist in the IP
routing table. If no route is found in the IP routing table based on the IP address of a BGP peer (or a route
exists but is unreachable, for example, the outbound interface is a Null0 interface), the BGP session goes
down, achieving fast BGP route convergence. If a reachable route can be found in this case, the BGP session
does not go down.
On the network shown in Figure 1, IGP connections are established between DeviceA, DeviceB, and DeviceC,
a BGP peer relationship is established between DeviceA and DeviceC, and BGP peer tracking is configured on
DeviceA. If the link between DeviceA and DeviceB fails, the IGP performs fast convergence first. As no route
is found on DeviceA based on the IP address of DeviceC, BGP peer tracking detects that no reachable route
to DeviceC is available and then notifies BGP on DeviceA of the fault. As a result, DeviceA terminates the
BGP connection with DeviceC.
• If a default route exists on DeviceA and the link between DeviceA and DeviceB fails, BGP peer tracking will not
terminate the peer relationship between DeviceA and DeviceC. This is because DeviceA can find the default route in
the IP routing table based on the peer's IP address.
• If DeviceA and DeviceC establish an IBGP peer relationship, you are advised to enable BGP peer tracking on both
devices to ensure that the peer relationship can be terminated soon after a fault occurs.
• If establishing a BGP peer relationship depends on IGP routes, you need to configure how long BGP peer tracking
waits after detecting peer unreachability before it terminates the BGP connection. The configured length of time
should be longer than the IGP route convergence time. Otherwise, before IGP route flapping caused by intermittent
disconnection is suppressed, the BGP peer relationship may have been terminated. This results in unnecessary BGP
convergence.
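The decision logic of BGP peer tracking can be sketched as follows. This is a simplified Python illustration: the routing-table shape, addresses, and function names are hypothetical, and the lookup stands in for a real longest-match search.

```python
# Sketch of the BGP peer tracking decision: the session is torn down only
# if no reachable route to the peer's address exists. A route whose
# outbound interface is Null0 counts as unreachable.

def lookup(routing_table, peer_ip):
    """Simplified route lookup keyed directly on the peer address."""
    return routing_table.get(peer_ip)

def peer_tracking_decision(routing_table, peer_ip):
    route = lookup(routing_table, peer_ip)
    if route is None or route.get("out_interface") == "Null0":
        return "terminate BGP session"
    return "keep BGP session"

# Hypothetical: only a Null0 route to the peer remains after the failure.
table = {"10.2.2.2": {"out_interface": "Null0"}}
print(peer_tracking_decision(table, "10.2.2.2"))  # terminate BGP session
print(peer_tracking_decision({}, "10.2.2.2"))     # terminate BGP session
```

In a real deployment this decision runs only after the configured wait time, which should exceed the IGP convergence time as noted above.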
Background
With the wide application of IPv6 technologies, more and more separate IPv6 networks emerge. IPv6
provider edge (6PE), a technology designed to provide IPv6 services over IPv4 networks, allows service
providers to provide IPv6 services without constructing new IPv6 backbone networks. The 6PE solution
connects separate IPv6 networks using MPLS tunnels on IPv4 networks. The 6PE solution implements
IPv4/IPv6 dual stack on the PEs of Internet service providers (ISPs) and uses MP-BGP to assign labels to IPv6
routes. In this manner, the 6PE solution connects separate IPv6 networks over IPv4 tunnels between PEs.
Related Concepts
In real-world situations, different metro networks of a carrier or backbone networks of collaborative carriers
often span different ASs. 6PE is classified as either intra-AS 6PE or inter-AS 6PE, depending on whether the
separate IPv6 networks connect to the same AS. If separate IPv6 networks are connected to different ASs,
inter-AS 6PE can be implemented in inter-AS 6PE Option B (with ASBRs as PEs), inter-AS 6PE Option B, or
inter-AS 6PE Option C mode.
• Intra-AS 6PE: Separate IPv6 networks are connected to the same AS. PEs in the AS exchange IPv6 routes
through MP-IBGP peer relationships.
• Inter-AS 6PE Option B (with ASBRs as PEs): ASBRs in different ASs exchange IPv6 routes through MP-
EBGP peer relationships.
• Inter-AS 6PE Option B: ASBRs in different ASs exchange IPv6 labeled routes through MP-EBGP peer
relationships.
• Inter-AS 6PE Option C: PEs in different ASs exchange IPv6 labeled routes through multi-hop MP-EBGP
peer relationships.
Intra-AS 6PE
Figure 1 shows intra-AS 6PE networking. 6PE runs on the edge of an ISP network. PEs that connect to IPv6
networks use the IPv4/IPv6 dual stack. PEs and CEs exchange IPv6 routes using the IPv6 EBGP or an IGP. PEs
exchange routes with each other and with Ps using an IPv4 routing protocol. PEs need to establish tunnels to
transparently transmit IPv6 packets. The tunnels mainly include MPLS label switched paths (LSPs) and MPLS
Local Ifnet tunnels. By default, LSPs are preferentially selected. If no LSPs are available, MPLS Local Ifnet
tunnels are used.
Figure 2 shows an intra-AS 6PE scenario where CE2 sends routes to CE1 and CE1 sends a packet to CE2. I-L
indicates the inner label, whereas O-L indicates the outer tunnel label. The outer tunnel label is allocated by
MPLS and is used to divert packets to the BGP next hop. The inner label indicates the outbound interface of
the packets or the CE to which the packets belong.
1. CE2 sends an ordinary IPv6 route to PE2 over a public network IPv6 link.
2. Upon receipt of the IPv6 route from CE2, PE2 changes the next hop of the route to itself, assigns an
inner label to the route, and sends the IPv6 labeled route to PE1, its IBGP peer.
3. Upon receipt of the IPv6 labeled route from PE2, PE1 recurses the route to a tunnel and delivers
information about the route to the forwarding table. Then, PE1 changes the next hop of the route to
itself, removes the label from the route, and sends the ordinary IPv6 route to CE1, its EBGP peer.
1. CE1 sends an ordinary IPv6 packet to PE1 over a public network IPv6 link.
2. Upon receipt of the IPv6 packet from CE1, PE1 searches its forwarding table based on the destination
address of the packet, encapsulates the packet with the inner label and outer tunnel label, and sends
the IPv6 packet with two labels to PE2 over a public network tunnel.
3. Upon receipt of the IPv6 packet with two labels, PE2 removes the two labels and forwards the packet
to CE2 over an IPv6 link based on the destination address of the packet.
The route and packet transmission processes show that CEs are unaware of whether the public network is an
IPv4 or IPv6 network.
Inter-AS 6PE
• Inter-AS 6PE Option B (with ASBRs as PEs)
Figure 3 shows the networking of inter-AS 6PE Option B (with ASBRs as PEs). Inter-AS 6PE Option B
(with ASBRs as PEs) is similar to intra-AS 6PE. The only difference is that in the former scenario, ASBRs
(shown in Figure 3) establish an EBGP peer relationship. The route and packet transmission processes in
an inter-AS 6PE Option B scenario (with ASBRs as PEs) are also similar to those in an intra-AS 6PE
scenario. For details, see Figure 2.
Figure 3 Networking diagram for inter-AS 6PE Option B (with ASBRs as PEs)
• Inter-AS 6PE Option B
Figure 5 shows the inter-AS 6PE Option B scenario where CE2 sends routes to CE1 and CE1 sends
packets to CE2. I-L indicates the inner label, whereas O-L indicates the outer tunnel label.
1. CE2 sends an ordinary IPv6 route to PE2 over a public network IPv6 link.
2. Upon receipt of the IPv6 route from CE2, PE2 changes the next hop of the route to itself, assigns
an inner label to the route, and sends the IPv6 labeled route to ASBR2, its IBGP peer.
3. Upon receipt of the IPv6 labeled route from PE2, ASBR2 recurses the route to a tunnel and
delivers the route to the forwarding table. Then, ASBR2 changes the next hop of the route to
itself, allocates a new inner label to the route, and sends the route to ASBR1, its EBGP peer.
4. Upon receipt of the IPv6 labeled route from ASBR2, ASBR1 recurses the route to a tunnel and
delivers the route to the forwarding table. Then, ASBR1 changes the next hop of the route to
itself, allocates a new inner label to the route, and sends the route to PE1, its IBGP peer.
5. Upon receipt of the IPv6 labeled route from ASBR1, PE1 recurses the route to a tunnel and
delivers the route to the forwarding table. Then, PE1 changes the next hop of the route to itself,
removes the label from the route, and sends the ordinary IPv6 route to CE1, its EBGP peer.
1. CE1 sends an ordinary IPv6 packet to PE1 over a public network IPv6 link.
2. Upon receipt of the IPv6 packet from CE1, PE1 searches its forwarding table based on the
destination address of the packet, encapsulates the packet with the inner label and outer tunnel
label, and sends the IPv6 packet with two labels to ASBR1 over an intra-AS public network LSP.
3. Upon receipt of the packet from PE1, ASBR1 removes the two labels from the packet, searches its
forwarding table based on the destination address of the packet, encapsulates the packet with a
new inner label and outer tunnel label, and sends the IPv6 packet to ASBR2 over an inter-AS
public network tunnel.
4. Upon receipt of the packet from ASBR1, ASBR2 removes the two labels from the packet, searches
its forwarding table based on the destination address of the packet, encapsulates the packet with
a new inner label and outer tunnel label, and sends the IPv6 packet to PE2 over an intra-AS
public network LSP.
5. Upon receipt of the IPv6 packet with two labels, PE2 removes the two labels and forwards the
packet to CE2 over an IPv6 link based on the destination address of the packet.
• Inter-AS 6PE Option C
Figure 6 shows inter-AS 6PE Option C networking. The difference between inter-AS 6PE Option B and
inter-AS 6PE Option C is as follows: in an inter-AS 6PE Option C scenario, PEs establish a multi-hop MP-
EBGP peer relationship, exchange labeled routes using an IPv4 routing protocol, and transparently
transmit IPv6 packets over an end-to-end BGP LSP between the PEs.
Two inter-AS 6PE Option C solutions are available, depending on the establishment methods of end-to-end LSPs. In
an inter-AS 6PE Option C scenario, PEs establish a multi-hop MP-EBGP peer relationship to exchange IPv6 labeled
routes and establish an end-to-end BGP LSP to transmit IPv6 packets. The way in which the end-to-end BGP LSP is
established does not matter much to inter-AS 6PE Option C and therefore is not described here.
Figure 7 shows the inter-AS 6PE Option C scenario where CE2 sends routes to CE1 and CE1 sends
packets to CE2. I-L indicates an inner label, B-L indicates a BGP LSP label, and O-L indicates an outer
tunnel label.
In Figure 7, the following two assumptions are made for clearer description:
■ An MPLS local Ifnet tunnel is established between the two ASBRs.
■ MPLS does not use the penultimate hop popping (PHP) function.
1. CE2 sends an ordinary IPv6 route to PE2 over a public network IPv6 link.
2. Upon receipt of the IPv6 route from CE2, PE2 changes the next hop of the route to itself, assigns
an inner label to the route, and sends the IPv6 labeled route to PE1, its MP-EBGP peer.
3. Upon receipt of the IPv6 labeled route from PE2, PE1 recurses the route to a tunnel and delivers
information about the route to the forwarding table. Then, PE1 changes the next hop of the route
to itself, removes the label from the route, and sends the ordinary IPv6 route to CE1, its EBGP
peer.
In this manner, the IPv6 route is transmitted from CE2 to CE1. During route transmission, ASBRs
transparently transmit packets carrying IPv6 labeled routes without modifying the IPv6 labeled routes.
1. CE1 sends an ordinary IPv6 packet to PE1 over a public network IPv6 link.
2. Upon receipt of the IPv6 packet from CE1, PE1 searches its forwarding table based on the
destination address of the packet, changes the next hop of the packet, encapsulates the packet
with an inner label, a BGP LSP label, and an outer tunnel label, and sends the IPv6 packet to P1
over an intra-AS public network tunnel.
3. Upon receipt of the IPv6 packet from PE1, P1 removes the outer label, adds a new outer label to
the packet, and forwards the packet with three labels to ASBR1 over an intra-AS public network
tunnel.
4. Upon receipt of the IPv6 packet from P1, ASBR1 removes the outer label and BGP LSP label,
encapsulates a new BGP LSP label into the IPv6 packet, and forwards the IPv6 packet with two
labels to ASBR2 over the inter-AS public network tunnel.
5. Upon receipt of the IPv6 packet from ASBR1, ASBR2 removes the BGP LSP label, encapsulates the
packet with a new outer label, and forwards the IPv6 packet with two labels to P2 over an intra-
AS public network tunnel.
6. Upon receipt of the IPv6 packet from ASBR2, P2 removes the outer label from the packet,
encapsulates the packet with a new outer label, and forwards the packet with two labels to PE2
over an intra-AS public network tunnel.
7. Upon receipt of the IPv6 packet with two labels, PE2 removes the two labels and forwards the
packet to CE2 over an IPv6 link based on the destination address of the packet.
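The label-stack operations along this Option C forwarding path can be traced step by step. The sketch below follows the packet transmission process above; the label names (I-L, B-L, O-L1, and so on) are symbolic stand-ins, not real label values.

```python
# Sketch of the label-stack operations along the inter-AS 6PE Option C
# forwarding path. Top of stack is the last list element.

OPS = {
    "PE1":   lambda s: s + ["I-L", "B-L", "O-L1"],  # push inner, BGP LSP, outer
    "P1":    lambda s: s[:-1] + ["O-L2"],           # swap outer tunnel label
    "ASBR1": lambda s: s[:-2] + ["B-L2"],           # pop O-L and B-L, push new B-L
    "ASBR2": lambda s: s[:-1] + ["O-L3"],           # pop B-L, push new outer label
    "P2":    lambda s: s[:-1] + ["O-L4"],           # swap outer tunnel label
    "PE2":   lambda s: s[:-2],                      # pop both remaining labels
}

stack = []  # CE1's IPv6 packet reaches PE1 unlabeled
for node in ["PE1", "P1", "ASBR1", "ASBR2", "P2", "PE2"]:
    stack = OPS[node](stack)
    print(f"after {node}: {stack}")
# After PE2 the stack is empty and the plain IPv6 packet is sent to CE2.
```

Note how the trace matches the step counts in the text: three labels toward ASBR1, two labels from ASBR1 onward.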
Usage Scenario
Each 6PE mode has its advantages and usage scenarios. Intra-AS 6PE applies to scenarios where separate
IPv6 networks connect to the same AS. Inter-AS 6PE applies to scenarios where separate IPv6 networks
connect to different ASs. Table 1 lists the usage scenarios of inter-AS 6PE.
Table 1 Usage scenarios of inter-AS 6PE
• Inter-AS 6PE Option B (with ASBRs as PEs)
Advantage: The configuration is simple and similar to that for intra-AS 6PE, and additional inter-AS
configuration is not required.
Disadvantage: The scalability is poor. ASBRs must manage information about all IPv6 labeled routes,
which increases the performance requirements on the ASBRs.
Usage scenario: It applies to small networks where different IPv6 networks connect to ASBRs in different
ASs. This mode is especially applicable to the scenario where a small number of ASs are spanned.
• Inter-AS 6PE Option B
Advantage: MPLS tunnels are established segment by segment, reducing management costs.
Disadvantage: Information about IPv6 labeled routes is stored and advertised by ASBRs in different ASs.
When a large number of IPv6 labeled routes exist, the ASBRs are overburdened and are likely to become
fault points.
Usage scenario: It applies to an inter-AS Option B public network with multi-segment tunnels established
between PEs in different ASs that are connected to separate IPv6 networks.
• Inter-AS 6PE Option C
Advantage: IPv6 labeled routes are directly exchanged between the ingress and egress PEs and do not
need to be stored or forwarded by transit devices. Information about IPv6 labeled
Usage scenario: It applies to an inter-AS Option C public network with E2E tunnels established between
PEs in different ASs that are connected to separate IPv6 networks. This solution is recommended
Benefits
6PE offers the following benefits:
• Easy maintenance: All configurations are performed on PEs, and IPv6 networks are unaware of the IPv4
network. The existing IPv4 network is used to carry IPv6 services, simplifying network maintenance.
• Low network construction cost: Carriers can make full use of the existing MPLS network resources and
provide IPv6 services for users without upgrading the network. 6PE devices can also provide various
types of services, such as IPv6 VPN and IPv4 VPN services.
Applications
On the network shown in Figure 1, DeviceA and DeviceB are directly connected, and prefix-based ORF is
enabled on them; after negotiating the prefix-based ORF capability with DeviceB, DeviceA adds the local
prefix-based inbound policy to a Route-Refresh packet and then sends the Route-Refresh packet to DeviceB.
DeviceB uses the information in the packet to work out an outbound policy to advertise routes to DeviceA.
As shown in Figure 2, DeviceA and DeviceB are clients of the RR in the domain, and prefix-based ORF is
enabled on all three devices. After negotiating the prefix-based ORF capability with the RR, DeviceA and
DeviceB add their local prefix-based inbound policies to Route-Refresh packets and then send the packets to
the RR. The RR uses the information in the Route-Refresh packets to work out the outbound policies used to
reflect routes to DeviceA and DeviceB.
Background
As networks develop, users keep increasing. The broadcast export policies used by carriers no longer meet
user requirements because the routes that users desire vary. Users want to receive only required routes, but
it is costly for carriers to maintain an export policy for each user. ORF allows users to receive only desired
routes, without requiring the carrier to maintain an export policy for each user.
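The effect of prefix-based ORF — turning the receiver's inbound filter into the sender's outbound policy — can be sketched as follows. This is a simplified illustration: a real ORF entry also carries a sequence number, match length bounds, and a permit/deny action, none of which are modeled here.

```python
# Sketch of prefix-based ORF: the receiver advertises its inbound prefix
# filter to the peer, which applies it as an outbound (export) policy.
import ipaddress

def build_outbound_policy(orf_prefixes):
    """Turn the peer's advertised inbound filter into an export check."""
    nets = [ipaddress.ip_network(p) for p in orf_prefixes]
    def permits(prefix):
        net = ipaddress.ip_network(prefix)
        return any(net.subnet_of(n) for n in nets)
    return permits

# Hypothetical: the receiving device only wants routes under 10.0.0.0/8.
permits = build_outbound_policy(["10.0.0.0/8"])
print(permits("10.1.1.0/24"))   # True: the route is advertised
print(permits("192.0.2.0/24"))  # False: filtered out before sending
```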
Related Concepts
2022-07-08 1765
Feature Description
Implementation
PEs bound with VPN instances send VPN ORF routes carrying their desired import route targets (IRTs) and
the original AS number to their BGP peers. Based on the VPN ORF routes, the peers generate an export
policy for each corresponding PE so that the PE receives only desired routes. This reduces the burden on the
PEs.
On the network shown in Figure 1, before VPN ORF is enabled, the RR sends PE3 all the VPN instance routes
received from PE1, although PE3 desires only the routes with ERT 1:1. Similarly, the RR sends PE1 all the
VPN instance routes received from PE3, although PE1 desires only the routes with ERT 1:1. In this case, PE1
and PE3 both receive unwanted routes.
After VPN ORF is enabled, BGP peer relationships are established in the VPN-Target address family view. In Figure 1, after BGP peer relationships are established between the RR and PE1 and between the RR and PE3, the peers negotiate the VPN ORF capability. PE1 and PE3 then send VPN ORF routes carrying the required import route targets (IRTs) and the original AS number to their VPN ORF peers, which construct export policies based on these routes. After receiving the routes with targets 1:1 and 2:2 from PE1, the RR advertises only the routes with target 1:1 to PE3. After receiving the routes with targets 1:1 and 3:3 from PE3, the RR advertises only the routes with target 1:1 to PE1.
Usage Scenario
• Intra-AS scenario where a VPN RR has clients
Benefits
• Reduced bandwidth consumption (because fewer routes are advertised)
Application
On the network shown in Figure 1, DeviceY advertises a learned BGP route to DeviceB and DeviceC in AS 100; DeviceB and DeviceC then advertise the route to the corresponding RR, which reflects it to DeviceA. DeviceA therefore receives two routes, whose next hops are DeviceB and DeviceC, and selects one based on a configured routing policy. Assume that the route sent by DeviceB is preferred. The route received over Link B then functions as the backup route.
If a node along Link A fails or a fault occurs on Link A, the next hop of the route from DeviceA to DeviceB becomes unavailable. If Auto FRR is enabled on DeviceA, the forwarding plane quickly switches DeviceA-to-DeviceY traffic to Link B, ensuring uninterrupted traffic transmission. In addition, DeviceA performs route reselection based on prefixes; it therefore selects the route sent by DeviceC and updates its FIB.
Usage Scenario
The BGP dynamic update peer-groups feature is applicable to the following scenarios:
• Scenario with an RR
• Scenario where routes received from EBGP peers need to be sent to all IBGP peers
The preceding scenarios have in common that the Router needs to send routes to a large number of BGP
peers, most of which share the same configuration. This situation is most evident in the networking shown in
Figure 2. In addition, when there are a large number of peers and routes, the packet sending efficiency is a
performance bottleneck.
The update peer-group feature can overcome this bottleneck. After the feature is applied, each routing update is grouped only once, and the generated Update message is sent to all peers in the group. For example, suppose an RR has 100 clients and needs to reflect 100,000 routes to them. If the RR groups routing updates per peer, it must perform the grouping operation on the 100,000 routing updates 100 times (10 million operations in total) before sending Update messages to the 100 clients. The update peer-group feature improves efficiency 100-fold because the 100,000 routing updates need to be grouped only once.
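The efficiency gain in the example above is simple arithmetic, shown here with the figures used in the text:

```python
clients = 100
routes = 100_000

# Without update peer-groups: every route is grouped once per peer.
groupings_per_peer = routes * clients   # 10,000,000 grouping operations

# With update peer-groups: each route is grouped once, and the resulting
# Update message is sent to all peers in the group.
groupings_grouped = routes              # 100,000 grouping operations

print(groupings_per_peer // groupings_grouped)  # 100-fold improvement
```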
Purpose
2-byte autonomous system (AS) numbers used on networks range from 1 to 65535, and the available AS
numbers are close to exhaustion as networks expand. Therefore, the AS number range needs to be extended.
4-byte AS numbers ranging from 1 to 4294967295 can address this problem. New speakers that support 4-
byte AS numbers can co-exist with old speakers that support only 2-byte AS numbers.
Definition
4-byte AS numbers are extended from 2-byte AS numbers. Border Gateway Protocol (BGP) peers use a new
capability code and optional transitive attributes to negotiate the 4-byte AS number capability and transmit
4-byte AS numbers. This mechanism enables communication between new speakers and between old
speakers and new speakers.
To support 4-byte AS numbers, an open capability code 0x41 is defined in a standard protocol for BGP
connection negotiation. 0x41 indicates that the BGP speaker supports 4-byte AS numbers.
The following new optional transitive attributes, AS4_Path and AS4_Aggregator, are defined by standard protocols and used to transmit 4-byte AS numbers.
If a new speaker with an AS number greater than 65535 communicates with an old speaker, the old speaker
needs to set the peer AS number to AS_TRANS. AS_TRANS is a reserved AS number with the value being
23456.
Related Concepts
• New speaker: a BGP peer that supports 4-byte AS numbers
• Old speaker: a BGP peer that does not support 4-byte AS numbers
• Old session: a BGP connection established between a new speaker and an old speaker, or between old
speakers
Fundamentals
BGP speakers negotiate capabilities by exchanging Open messages. Figure 1 shows the format of Open
messages exchanged between new speakers. The header of a BGP Open message is fixed, in which My AS
Number is supposed to be the local AS number. However, My AS Number carries only 2-byte AS numbers,
and does not support 4-byte AS numbers. Therefore, a new speaker adds the AS_TRANS 23456 to My AS
Number and its local AS number to Optional Parameters before it sends an Open message to a peer. After
the peer receives the message, it can determine whether the new speaker supports 4-byte AS numbers by
checking Optional Parameters in the message.
Figure 2 shows how peer relationships are established between new speakers, and between an old speaker
and a new speaker. BGP speakers notify each other of whether they support 4-byte AS numbers by
exchanging Open messages. After the capability negotiation, new sessions are established between new
speakers, and old sessions are established between a new speaker and an old speaker.
AS_Path and Aggregator in Update messages exchanged between new speakers carry 4-byte AS numbers,
whereas AS_Path and Aggregator in Update messages sent by old speakers carry 2-byte AS numbers.
• When a new speaker sends an Update message carrying an AS number greater than 65535 to an old
speaker, the new speaker uses AS4_Path and AS4_Aggregator to assist AS_Path and AS_Aggregator in
transferring 4-byte AS numbers. AS4_Path and AS4_Aggregator are transparent to the old speaker. In
the networking shown in Figure 3, before the new speaker in AS 2.2 sends an Update message to the
old speaker in AS 65002, the new speaker replaces each 4-byte AS number (2.2, 1.1, 65001) with 23456
in AS_Path. Therefore, the AS_Path carried in the Update message is (23456, 23456, 65001), and the
carried AS4_Path is (2.2, 1.1, 65001). After the old speaker in AS 65002 receives the Update message, it
transparently transmits the message to other ASs.
• When the new speaker receives an Update message carrying AS_Path, AS4_Path, AS_Aggregator, and
AS4_Aggregator from the old speaker, the new speaker uses the reconstruction algorithm to
reconstruct the actual AS_Path and AS_Aggregator. On the network shown in Figure 3, after the new
speaker in AS 65003 receives an Update message carrying AS_Path (65002, 23456, 23456, 65001) and
AS4_Path (2.2, 1.1, 65001) from the old speaker in AS 65002, the new speaker reconstructs the actual
AS_Path (65002, 2.2, 1.1, 65001).
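The reconstruction described above can be sketched as follows. This is a simplified Python model that treats AS_Path as a flat list (real AS_Path attributes are sequences of segments), with asdot values such as 2.2 shown as strings:

```python
def reconstruct_as_path(as_path, as4_path):
    """Splice AS4_Path into AS_Path: keep the leading ASes contributed by
    old speakers (which never appear in AS4_Path), then append AS4_Path."""
    if len(as4_path) > len(as_path):
        # Malformed combination: fall back to AS_Path as received.
        return list(as_path)
    lead = len(as_path) - len(as4_path)
    return list(as_path[:lead]) + list(as4_path)

# The example from Figure 3:
as_path = [65002, 23456, 23456, 65001]
as4_path = ["2.2", "1.1", 65001]
print(reconstruct_as_path(as_path, as4_path))
# [65002, '2.2', '1.1', 65001]
```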
If you adjust the display format of 4-byte AS numbers, the matching results of AS_Path regular expressions and extended
community filters are affected. Specifically, if the display format of 4-byte AS numbers is changed when an AS_Path
regular expression or extended community filter is used as an export or import policy, the AS_Path regular expression or
extended community filter needs to be reconfigured. If reconfiguration is not performed, routes cannot match the export
or import policy, and a network fault occurs.
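For reference, converting a 4-byte AS number between the plain (asplain) and dotted (asdot) display formats works as follows (a small Python sketch). It also illustrates why a display-format change breaks AS_Path regular expressions: an expression written against 131074 cannot match the string 2.2, and vice versa.

```python
def to_asdot(asn: int) -> str:
    """Render a 4-byte AS number in x.y notation; 2-byte values are unchanged."""
    return f"{asn >> 16}.{asn & 0xFFFF}" if asn > 65535 else str(asn)

def from_asdot(text: str) -> int:
    """Parse x.y notation back into the integer AS number."""
    if "." in text:
        high, low = text.split(".")
        return (int(high) << 16) | int(low)
    return int(text)

print(to_asdot(131074))    # '2.2'
print(from_asdot("2.2"))   # 131074
```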
Benefits
4-byte AS numbers alleviate AS number exhaustion and therefore are beneficial to carriers who need to
expand the network scale.
In Figure 1, the AS number of carrier A is 100, whereas the AS number of carrier B is 200; device A belongs to carrier B. Then carrier A acquires carrier B. In this case, the AS number of device A needs to be changed
from 200 to 100. Because device A already has a BGP peer relationship established with device B in AS 300
using AS 200, device A's AS number used to establish the BGP peer relationship needs to be changed to 100.
The carrier of AS 100 and the carrier of AS 300 then need to communicate about the change. In addition,
the AS number configured on device A and peer AS number configured on device B may not be changed at
the same time, which will lead to a lengthy interruption of the BGP peer relationship between the two
devices. To ensure a smooth merger, you can run the peer fake-as command on device A to set AS 200 of
carrier B as a fake AS number so that device A's AS number used to establish the BGP peer relationship
between devices A and B does not need to be changed.
In addition, after the merger, the original BGP speakers of carrier B may switch to the actual AS number at any time when establishing BGP peer relationships with devices of carrier A. If carrier B has a large number of BGP speakers, some of which use the actual AS number whereas others use the fake AS number during BGP peer relationship establishment with devices of carrier A, the local configuration on the BGP speakers of carrier B needs to be changed to match the configured peer AS number, which increases the maintenance workload. To address this problem, you can run the peer fake-as command with dual-as specified to allow the local end to use either the actual or the fake AS number to establish a BGP peer relationship with the specified peer.
In Figure 2, the AS number of carrier A is 100, whereas the AS number of carrier B is 200; devices A, B, C, and D belong to carrier B, and device A has established IBGP peer relationships with devices B, C, and D. Then carrier A acquires carrier B. In this case, the AS number of device A needs to be changed from 200 to 100. Because the AS number used by device A to establish the IBGP peer relationships with devices B, C, and D is 200, the AS number needs to be changed to 100, and carriers A and B need to communicate about the change. In addition, the AS number configured on device A and the peer AS number configured on devices B, C, and D may not be changed at the same time, which would cause a lengthy interruption of the IBGP peer relationships. To ensure a smooth merger, you can run the peer fake-as command on device A to set AS 200 of carrier B as a fake AS number so that device A's AS number used to establish the IBGP peer relationships with devices B, C, and D does not need to be changed.
10.9.2.25 BMP
Background
The BGP Monitoring Protocol (BMP) can monitor the BGP running status and trace data of BGP routes on
network devices in real time. The BGP running status includes the establishment and termination of peer
relationships and update of routing information. The trace data of BGP routes indicates how BGP routes on a
device are processed; for example, processing of the routes that match an import or export route-policy.
Before BMP is implemented, only manual query can be used to obtain the BGP running status of devices,
resulting in low monitoring efficiency.
After BMP is implemented, a device can be connected to a BMP server and configured to report its BGP
running status to the server, significantly improving monitoring efficiency.
• Initiation message: sent by a monitored device to the monitoring server to report information such as
the device vendor and software version.
• Peer Up Notification (PU) message: sent by a monitored device to notify the monitoring server that a
BGP peer relationship has been established.
• Route Monitoring (RM) message: used to provide the monitoring server with a collection of all routes
received from a BGP peer and notify the server of route addition or withdrawal in real time.
• Peer Down Notification (PD) message: sent to notify the monitoring server that a BGP peer relationship
has been disconnected.
• Stats Reports (SR) message: sent to report statistics about the device running status to the monitoring
server.
• Termination message: sent to report the cause for closing a BMP session to the monitoring server.
• Route Policy and Attribute Trace (ROFT) message: used to report the trace data of routes to the monitoring server.
BMP sessions are unidirectional. Devices send messages to the monitoring server but ignore messages sent by the server.
Implementation
On the network shown in Figure 1, TCP connections are established between the monitoring server and
monitored devices (PE1 and PE2 shown in the figure). The monitored devices send unsolicited BMP messages
to the monitoring server to report information about the BGP running status. Upon receiving these BMP
messages, the monitoring server parses them and displays the BGP running status in the monitoring view. By
analyzing the headers in the received BMP messages, the monitoring server can determine which BGP peers
have advertised the routes carried in these messages.
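As an illustration of the message stream the server parses, the fixed 6-byte BMP common header (version, total message length, message type, as defined in RFC 7854) can be decoded as follows (Python sketch; the vendor-specific ROFT message type is not included):

```python
import struct

# BMP message types defined in RFC 7854.
BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
}

def parse_bmp_common_header(data: bytes):
    """Parse version (1 octet), message length (4 octets), type (1 octet)."""
    version, length, msg_type = struct.unpack("!BIB", data[:6])
    return version, length, BMP_MSG_TYPES.get(msg_type, "Unknown")

# A header-only Initiation message (BMP version 3, total length 6, type 4):
header = struct.pack("!BIB", 3, 6, 4)
print(parse_bmp_common_header(header))  # (3, 6, 'Initiation')
```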
When establishing a connection between a BGP device and monitoring server, note the following guidelines:
• BMP operates over TCP, and you can specify a port number for the TCP connection between the BGP
device and monitoring server.
• One device can connect to multiple monitoring servers, and one monitoring server can also connect to
multiple devices.
• Each BMP instance can connect to multiple monitoring servers. The advantages are as follows:
■ Different servers can be used to monitor routes from the same BGP peer in different address
families, allowing each BGP service to be monitored by a different server.
Benefits
BMP facilitates the monitoring of the BGP running status and reports security risks on networks in real time
so that preventive measures can be taken promptly to improve network stability.
Background
If multiple routes to the same destination are available, a BGP device selects one optimal route based on
BGP route selection policies and advertises the route to its BGP peers.
For details about BGP route selection rules, see BGP Fundamentals.
However, in scenarios with master and backup provider edges (PEs), if routes are selected based on the
preceding policies and the primary link fails, the BGP route convergence takes a long time because no
backup route is available. To address this problem, the BGP Best External feature was introduced.
Related Concepts
BGP Best External: A mechanism that enables a backup device to select a sub-optimal route and send the
route to its BGP peers if the route preferentially selected based on BGP route selection policies is an Internal
Border Gateway Protocol (IBGP) route advertised by the master device. Therefore, BGP Best External speeds
up BGP route convergence if the primary link fails.
Best External route: The sub-optimal route selected after BGP Best External is enabled.
BGP Best External can be enabled on PE2 to address this problem. With BGP Best External, PE2 selects the
EBGP route from CE1 and advertises it to PE3. In this case, a backup link is available. Table 1 lists the
differences with and without BGP Best External.
Table 1 Differences with and without BGP Best External
Feature Enabling Status: After BGP Best External is enabled
Route Advertisement: PE3 receives two routes: CE1 -> PE1 -> PE3 and CE1 -> PE2 -> PE3
Optimal Route: CE1 -> PE1 -> PE3
Route Convergence in Case of a Link Fault: Traffic can be directly switched to the backup link.
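The selection behavior on the backup PE can be modeled as follows. This is a hedged Python sketch: the single `preference` number is a hypothetical stand-in for the full BGP route selection rules, and the route dictionaries are illustrative.

```python
def select_with_best_external(routes):
    """Return (optimal_route, best_external_route).

    If the optimal route is an IBGP route (e.g. learned from the master PE),
    also select the best route among EBGP routes as the Best External route
    to advertise, so that peers hold a backup path.
    """
    best = max(routes, key=lambda r: r["preference"])
    if best["type"] == "ibgp":
        external = [r for r in routes if r["type"] == "ebgp"]
        if external:
            return best, max(external, key=lambda r: r["preference"])
    return best, None

# On PE2, policy makes the IBGP route from PE1 the optimal route:
routes = [
    {"via": "PE1 (IBGP)", "type": "ibgp", "preference": 200},
    {"via": "CE1 (EBGP)", "type": "ebgp", "preference": 100},
]
best, best_external = select_with_best_external(routes)
print(best["via"], "/", best_external["via"])  # PE1 (IBGP) / CE1 (EBGP)
```

Without Best External, only `best` would be considered for advertisement; with it, the EBGP route is also advertised so that a backup link exists before any failure occurs.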
Usage Scenario
The BGP Best External feature applies to scenarios in which master and backup PEs are deployed and the
backup PE needs to advertise the sub-optimal route (Best External route) to its BGP peers to speed up BGP
route convergence.
Benefits
As networks develop, services, such as Voice over Internet Protocol (VoIP), online video, and financial
services, pose higher requirements for real-time transmission. After BGP Best External is deployed, if the
optimal route selected by a device is an IBGP route, the device selects the suboptimal route and advertises it
to BGP peers. This implements fast route convergence in the case of a link fault and reduces the impact of
traffic interruption on services.
Background
In a scenario with a route reflector (RR) and clients, if the RR has multiple routes to the same destination
(with the same prefix), the RR selects an optimal route from these routes and then sends only the optimal
route to its clients. Therefore, the clients have only one route to the destination. If a link along this route
fails, route convergence takes a long time, which cannot meet the requirements on high reliability.
To address this issue, deploy the BGP Add-Path feature on the RR. With BGP Add-Path, the RR can send two
or more routes with the same prefix to its clients. After reaching the clients, these routes can work in
primary/backup or load-balancing mode, which ensures high reliability in data transmission.
• For details about BGP route selection and advertisement policies, see BGP Fundamentals.
• Although BGP Add-Path can be deployed on any router, you are advised to deploy it on RRs.
• With BGP Add-Path, you can configure the maximum number of routes with the same prefix that an RR can send
to its clients. The actual number of routes with the same prefix that an RR can send to its clients is the smaller
value between the configured maximum number and the number of available routes with the same prefix.
Related Concepts
Add-Path routes: The routes selected by BGP after BGP Add-Path is configured.
Typical Networking
On the network shown in Figure 1, DeviceA, DeviceB, and DeviceC are clients of the RR, and DeviceD is an
EBGP peer of DeviceB and DeviceC. Both DeviceB and DeviceC receive a route to 10.1.1.1/32 from DeviceD.
DeviceB and DeviceC advertise the route 10.1.1.1/32 to the RR. The two routes have the same destination
address but different next hops. The RR selects an optimal route based on BGP route selection rules and
advertises the optimal route to DeviceA. Therefore, Device A has only one route to 10.1.1.1/32.
BGP Add-Path can be configured on the RR to control the maximum number of routes with the same prefix
that the RR can send to DeviceA. Assume that the configured maximum number of routes with the same
prefix that the RR can send to DeviceA is 2. Table 1 lists the differences with and without BGP Add-Path.
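The Add-Path selection can be sketched as follows (Python; `preference` is a hypothetical stand-in for the BGP route selection rules). Note how the number of routes sent per prefix is the smaller of the configured maximum and the number of available routes, as stated above.

```python
from collections import defaultdict

def addpath_select(routes, max_paths):
    """Group routes by prefix and pick up to max_paths routes per prefix."""
    by_prefix = defaultdict(list)
    for route in routes:
        by_prefix[route["prefix"]].append(route)
    return {
        prefix: sorted(rs, key=lambda r: r["preference"], reverse=True)[:max_paths]
        for prefix, rs in by_prefix.items()
    }

# The two routes to 10.1.1.1/32 that the RR receives from DeviceB and DeviceC:
routes = [
    {"prefix": "10.1.1.1/32", "nexthop": "DeviceB", "preference": 200},
    {"prefix": "10.1.1.1/32", "nexthop": "DeviceC", "preference": 100},
]
selected = addpath_select(routes, max_paths=2)
print(len(selected["10.1.1.1/32"]))  # 2: both routes are sent to DeviceA
```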
Usage Scenario
BGP Add-Path applies to scenarios in which an RR is deployed and needs to send multiple routes with the
same prefix to clients to ensure data transmission reliability.
BGP Add-Path is used in traffic optimization scenarios and allows multiple routes to be sent to the controller.
Benefits
Deploying BGP Add-Path can improve network reliability.
Background
BGP peer flapping occurs when BGP peer relationships are repeatedly disconnected and then quickly re-established. Frequent BGP peer flapping has various causes; for example, a link or an interface that carries BGP services may be unstable. After a BGP peer relationship is established, the local device and its BGP peer usually exchange all routes in their BGP routing tables with each other. If the BGP peer relationship is disconnected, the local device deletes all the routes learned from the BGP peer. Because a large number of BGP routes generally exist, frequent flapping causes a large number of route changes and a large amount of data processing, consuming substantial resources and driving up CPU usage. To prevent this issue, a device supports suppression of BGP peer flapping. With this function enabled, the local device suppresses the establishment of a BGP peer relationship that flaps continuously.
Related Concepts
ConnectFlaps: the peer flapping counter. Each time a BGP peer relationship flaps, the counter increases by 1.
Peer flapping suppression period: The peer flapping suppression period is adjusted based on the
ConnectFlaps value.
Idle hold timer: indicates the timer used by BGP to determine the waiting period for establishing a peer
relationship with a peer. After the Idle hold timer expires, BGP attempts to establish a new connection with
the BGP peer.
Half-life period: When the peer flapping counter (ConnectFlaps value) changes, the peer flapping count adjustment timer starts. When the timer expires (after more than 1800s), the ConnectFlaps value is halved. The period specified by the peer flapping count adjustment timer is called a half-life period.
Fundamentals
Entering flapping suppression
As shown in Figure 1, when the ConnectFlaps value reaches a certain value (greater than 5), the Idle hold
timer is used to suppress the establishment of the BGP peer relationship. The Idle hold timer value is
calculated as follows:
Idle hold timer = Initial waiting time + Peer flapping suppression period
If the peer timer connect-retry connect-retry-time command is not run, the initial time that BGP waits before establishing the peer relationship is 10s. If this command is run, the configured connect-retry-time value is used as the initial waiting time.
The peer flapping suppression period is processed as follows: If the ConnectFlaps value ranges from 1 to 5,
the establishment of the peer relationship is not suppressed. If the ConnectFlaps value ranges from 6 to 10,
the peer flapping suppression period increases by 10s each time the ConnectFlaps value is incremented by 1.
If the ConnectFlaps value ranges from 11 to 15, the peer flapping suppression period increases by 20s each
time the ConnectFlaps value is incremented by 1. For each of the following five-value ranges, the peer
flapping suppression period increases by twice the time of the previous range each time the ConnectFlaps
value is incremented by 1. The peer flapping suppression period no longer increases until the Idle hold timer
reaches 600s. This prevents a BGP negotiation failure due to long-time suppression.
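The timer growth described above can be sketched as follows. This Python model assumes the documented pattern: suppression starts at the 6th flap with 10s per flap, the per-flap step doubles after every five-value range, and the total timer is capped at 600s.

```python
def idle_hold_timer(connect_flaps: int, initial_wait: int = 10) -> int:
    """Idle hold timer = initial waiting time + peer flapping suppression period.

    Sketch of the rule described above: no suppression for ConnectFlaps 1-5,
    +10s per flap for 6-10, +20s per flap for 11-15, doubling each range,
    capped at 600s overall.
    """
    suppression = 0
    step = 10
    for flap in range(6, connect_flaps + 1):
        suppression += step
        if (flap - 5) % 5 == 0:  # last flap of a five-value range
            step *= 2            # next range doubles the increment
    return min(initial_wait + suppression, 600)

print(idle_hold_timer(5))    # 10  (no suppression yet)
print(idle_hold_timer(10))   # 60  (10s initial + 5 x 10s)
print(idle_hold_timer(15))   # 160 (10s initial + 50s + 5 x 20s)
```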
Figure 1 Relationship between the Idle hold timer and ConnectFlaps values when the initial waiting time is 10s
When the ConnectFlaps value changes, the peer flapping count adjustment timer starts. If the timer expires
(more than 1800s have passed), the ConnectFlaps value is reduced by half, and a half-life period ends. In
this case, if the ConnectFlaps value has not reached 0, the next half-life period will start. This process is
cyclically repeated until the ConnectFlaps counter is reset. Assume that the ConnectFlaps value is 10. After
four half-life periods elapse, the ConnectFlaps value changes to 0, as shown in Figure 2.
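The decay process can be checked with a short sketch (Python; assumes integer halving, which is consistent with the example of four half-life periods for an initial value of 10):

```python
def half_life_periods(connect_flaps: int) -> int:
    """Number of 1800s half-life periods until ConnectFlaps decays to 0."""
    periods = 0
    while connect_flaps > 0:
        connect_flaps //= 2  # value is halved at the end of each period
        periods += 1
    return periods

print(half_life_periods(10))  # 4 (10 -> 5 -> 2 -> 1 -> 0)
```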
Background
In some scenarios, if a large number of routes recurse to the same next hop that flaps frequently, the system
will be busy processing reselection and re-advertisement of these routes, which consumes excessive
resources and leads to high CPU usage. BGP recursion suppression in case of next hop flapping can address
this problem.
Principles
After this function is enabled, BGP calculates the penalty value that starts from 0 by comparing the flapping
interval with configured intervals if next hop flapping occurs. When the penalty value exceeds 10, BGP
suppresses route recursion to the corresponding next hop. For example, if the intervals for increasing,
retaining, and clearing the penalty value are T1, T2, and T3, respectively, BGP calculates the penalty value as
follows:
• Increases the penalty value by 1 if the flapping interval is less than T1.
• Retains the penalty value if the flapping interval is greater than or equal to T1, but less than T2.
• Reduces the penalty value by 1 if the flapping interval is greater than or equal to T2, but less than T3.
• Clears the penalty value if the flapping interval is greater than or equal to T3.
When the penalty value exceeds 10, the system processes reselection and re-advertisement of the routes
that recurse to a flapping next hop much slower.
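The penalty update rule above can be sketched as follows (Python; T1 < T2 < T3, with suppression once the penalty exceeds 10). The threshold values 60/120/600 seconds in the example are illustrative, not documented defaults.

```python
def update_penalty(penalty: int, interval: float,
                   t1: float, t2: float, t3: float) -> int:
    """Apply one next-hop flap to the penalty value (requires t1 < t2 < t3)."""
    if interval < t1:
        return penalty + 1          # rapid flapping: increase
    if interval < t2:
        return penalty              # retain
    if interval < t3:
        return max(penalty - 1, 0)  # calming down: decrease
    return 0                        # stable long enough: clear

SUPPRESS_THRESHOLD = 10  # recursion to the next hop is suppressed above this

penalty = 0
for _ in range(11):                  # 11 flaps in quick succession
    penalty = update_penalty(penalty, 1.0, t1=60, t2=120, t3=600)
print(penalty > SUPPRESS_THRESHOLD)  # True: recursion is now suppressed
```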
Benefits
BGP recursion suppression in case of next hop flapping prevents the system from frequently processing
reselection and re-advertisement of a large number of routes that recurse to a flapping next hop, which
reduces system resource consumption and CPU usage.
10.9.2.31 BGP-LS
BGP-link state (LS) enables BGP to report topology information collected by IGPs to the upper-layer
controller.
Background
BGP-LS is a new method of collecting topology information.
Without BGP-LS, the Router uses an IGP (OSPF, OSPFv3, or IS-IS) to collect network topology information
through routing information flooding and report the topology information of each area to the controller
separately. This method has the following disadvantages:
• The controller must have high computing capabilities and support both the IGP and its algorithm.
• The controller cannot obtain complete information about the inter-IGP area topology. As a result, it
cannot compute E2E optimal paths.
• The controller receives topology information from different routing protocols, making it difficult for the
controller to analyze and process such information.
After BGP-LS is introduced, BGP summarizes topology information collected by IGPs and reports it to an
upper-layer controller. With BGP's powerful route selection capabilities, BGP-LS has the following
advantages:
• Facilitates path selection and computation on the controller by using BGP to summarize topology
information in each process or AS and report the complete information directly to the controller.
• Requires only one routing protocol (BGP) to report information about the entire network's topology to
the controller.
Related Concepts
BGP-LS provides a simple and efficient method of collecting topology information.
BGP-LS routes carry topology information and are classified into six types of routes that carry node, link,
route prefix, IPv6 route prefix, SRv6 SID, and TE Policy information, respectively. These routes work together
to transmit topology information.
BGP-LS Routes
Based on BGP, BGP-LS introduces a series of new Network Layer Reachability Information (NLRI) attributes
to carry information about links, nodes, and IPv4/IPv6 prefixes. Such new NLRIs are called Link-State NLRIs.
BGP-LS includes the MP_REACH_NLRI or MP_UNREACH_NLRI attribute in BGP Update messages to carry
Link-State NLRIs.
BGP-LS defines the following types of Link-State NLRI:
• Node NLRI
• Link NLRI
In addition, the BGP-LS attribute is defined for Link-State NLRI to carry link, node, and IPv4/IPv6 prefix
parameters and attributes. The BGP-LS attribute is defined as a set of Type, Length, Value (TLV) triplets and
carried with Link-State NLRI attributes in BGP-LS messages. All these attributes are optional, non-transitive
BGP attributes, including Node Attribute, Link Attribute, and Prefix Attribute.
Node NLRI format
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
|  Protocol-ID  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Identifier                          |
|                            (64 bits)                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//               Local Node Descriptors (variable)             //
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Node NLRI fields:
Protocol-ID (1 octet): Protocol identifier, identifying a protocol such as IS-IS, OSPF, OSPFv3, or BGP.
Identifier (8 octets): Uniquely identifies a protocol instance when IS-IS, OSPFv3 multi-instance, or OSPF multi-instance is running.
Local Node Descriptors (variable): The Local Node Descriptors TLV contains Node Descriptors for the local node. This TLV consists of a series of Node Descriptor sub-TLVs.
Link NLRI fields:
Protocol-ID (1 octet): Protocol identifier, identifying a protocol such as IS-IS, OSPF, OSPFv3, or BGP.
Identifier (8 octets): Uniquely identifies a protocol instance when IS-IS, OSPFv3 multi-instance, or OSPF multi-instance is running.
Local Node Descriptors (variable): The Local Node Descriptors TLV contains Node Descriptors for the local node of the link. This TLV consists of a series of Node Descriptor sub-TLVs.
Remote Node Descriptors (variable): The Remote Node Descriptors TLV contains Node Descriptors for the remote node of the link.
Link Descriptors (variable): The Link Descriptors field is a set of TLV triplets. This field uniquely identifies a link among multiple parallel links between a pair of devices.
Prefix NLRI fields:
Protocol-ID (1 octet): Protocol identifier, identifying a protocol such as IS-IS, OSPF, OSPFv3, or BGP.
Identifier (8 octets): Uniquely identifies a protocol instance when IS-IS, OSPFv3 multi-instance, or OSPF multi-instance is running.
Local Node Descriptors (variable): The Local Node Descriptors TLV contains Node Descriptors for the local node. This TLV consists of a series of Node Descriptor sub-TLVs.
Prefix Descriptors (variable): The Prefix Descriptors field is a set of TLV triplets. This field uniquely identifies a prefix originated by a node.
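A minimal parser for the fixed part of the Node NLRI shown above can be written as follows (Python sketch; the descriptor sub-TLVs are left unparsed, and the sample Protocol-ID value is only illustrative):

```python
import struct

def parse_node_nlri(data: bytes):
    """Split a Node NLRI into Protocol-ID (1 octet), Identifier (8 octets),
    and the remaining Local Node Descriptors TLV bytes."""
    protocol_id = data[0]
    (identifier,) = struct.unpack("!Q", data[1:9])
    return protocol_id, identifier, data[9:]

# Sample NLRI: Protocol-ID 2, Identifier 0, followed by raw descriptor bytes.
nlri = bytes([2]) + struct.pack("!Q", 0) + b"\x01\x00\x00"
print(parse_node_nlri(nlri))  # (2, 0, b'\x01\x00\x00')
```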
TE: Traffic engineering.
2][IDENTIFIER100][LOCAL[as200][bgp-ls-identifier192.168.11.11][ospf-area-id0.0.0.0][igp-router-
id0000.0000.0004.00]][SID[mt-id0][sid2001:db8:1::1]].
Such routes carry information about reachable network segments.
Table 9 describes the fields in this type of route.
Typical Networking
Collecting topology information in an IGP area
In Figure 1, DeviceA, DeviceB, DeviceC, and DeviceD use IS-IS to communicate with each other at the IP
network layer. DeviceA, DeviceB, DeviceC, and DeviceD are all Level-2 devices in the same area (area 10).
After BGP-LS is deployed on any one of the devices (DeviceA, DeviceB, DeviceC, and DeviceD) and this device
establishes a BGP-LS peer relationship with the controller, topology information of the entire network can be
collected and reported to the controller. Reliability can be improved by deploying BGP-LS on two or more
devices and establishing a BGP-LS peer relationship between each BGP-LS device and the controller. Because
the BGP-LS devices collect the same topology information, they back up each other. This means that the
topology information can be reported promptly if any of the BGP-LS devices fails.
On the network shown in Figure 3, two controllers are each connected to a device in a different AS. If both
controllers need to obtain information about the entire network's topology, a BGP-LS peer relationship
needs to be established between the controllers or between the devices (DeviceB and DeviceC in this
example) connected to the controllers.
To minimize the number of connections with the controllers, one or more devices can be used as BGP-LS RRs, which
then function as proxies to establish BGP-LS peer relationships between the devices and controllers.
Usage Scenario
The Router functions as a forwarder and reports topology information to the controller for topology
monitoring and traffic control.
Benefits
BGP-LS offers the following benefits:
• Requires only one routing protocol (BGP) to report topology information to the controller.
Background
Route policy distribution (RPD) is used to distribute route-policies dynamically.
Without RPD, route-policies can be generated only through manual configuration, and then the route-
policies are applied to peers. Such a generation mode is not applicable when the route-policies need to be
adjusted dynamically and frequently. For example, in the inbound traffic optimization scenario with an NCE,
the NCE monitors the traffic bandwidth usage on the network in real time, and users perform traffic
optimization based on the analysis result. Specifically, for traffic optimization purposes, route-policies need
to be used to modify route attributes to control the route selection on the peer end. However, the traffic
bandwidth usage constantly changes, leading to the constant changes of traffic optimization policies. In this
case, route-policies configured manually are not suitable. RPD provides a dynamic route-policy distribution
mode for the NCE. With RPD, route-policy information is transmitted through the BGP RPD address family.
After RPD is configured, you can use the NCE to monitor and control traffic in real time. The traffic
optimization policy configurations are performed on the NCE, not on forwarders. Forwarders receive RPD
routes from the NCE, generate route-policies based on the routes, and implement the route-policies.
Related Concepts
RPD route: Carries route-policy information and distributes the information to peers in the BGP RPD address
family. After learning the RPD route, the receiver converts it into a route-policy and applies the policy.
Field Description
Route-policies carried in RPD routes are encapsulated through the WideCommunity attribute. Figure 1 shows
the WideCommunity format used by RPD routes.
The key fields are described as follows (the length values are those given in Figure 1):
• RouteAttr (length 8): Route attribute type. The TLV has three sub-TLVs: IP Prefix, AS-Path, and Community.
• IP Address (length 32): IP address.
• GeMask (length 8): Used to specify a range. The value of this field must be less than or equal to the length of the Mask or be 0.
• AS-Path (length 8): AS_Path.
• as-path regex string (length 32): Content of the AS_Path, presented using a regular expression. The maximum length of the AS_Path is 1024 bytes.
• ExcTargetTLV (length 8): This field is not used currently. Even if this TLV carries a value, the value is ignored.
Implementation
In the following example, the typical networking of inbound traffic optimization is used to describe how RPD
works to implement traffic optimization:
Figure 2 shows an inbound traffic optimization scenario. NCE collects traffic information from devices and
performs analysis and calculation to identify the routes to be adjusted. After a traffic optimization policy is
configured on NCE, NCE converts the policy into an RPD route and delivers the route to the devices, which
are forwarders in this scenario.
1. Before traffic optimization is implemented, the MED values of the BGP routes advertised from DeviceA
to DeviceC and from DeviceB to DeviceC are 50, and traffic is balanced.
2. NCE collects traffic information from devices and performs calculation and finds that the path from
DeviceC to DeviceA is congested. In this case, it is expected that the traffic that is from AS 200 and
destined for 10.1.1.1/24 enters AS 100 through DeviceB rather than DeviceA. NCE configures a traffic
optimization policy, converts it into an RPD route, and delivers the route to DeviceA to instruct DeviceA
to increase the MED value of the route advertised to AS 200 to 100. The MED value of the route
advertised by DeviceB to AS 200 remains unchanged.
3. After receiving the RPD route delivered by NCE, DeviceA generates a route-policy based on the RPD
route and executes the policy.
4. The RPD route-policy takes effect. After receiving the routes destined for 10.1.1.1/24 from DeviceA and
DeviceB, DeviceC selects the route received from DeviceB because its MED value is smaller than the
MED value of the route received from DeviceA. In this case, traffic that is from AS 200 and destined
for 10.1.1.1/24 enters AS 100 through DeviceB rather than DeviceA.
In this scenario, forwarders receive the policies delivered by NCE and adjust route attributes (MED, Community, or
AS_Path) based on the policies. The forwarders follow the policies strictly when advertising routes, but are not
responsible for the traffic optimization results. You can check the traffic optimization results in real time through NCE.
Usage Scenario
The IP network optimization solution provides users with a method of on-demand traffic scheduling to make
full use of network bandwidth. In the IP network optimization solution, this feature ensures inbound traffic
optimization in MAN ingress or IGW scenarios. In this solution, the Router functions as a forwarder and
needs to be configured with the RPD feature so that the Router executes the route-policies carried in the
RPD routes delivered by NCE to dynamically adjust traffic for inbound traffic optimization.
Benefits
In traffic optimization scenarios, this feature spares manual BGP route-policy maintenance, which is complex,
time-consuming, and error-prone. Therefore, this feature reduces maintenance workload and improves
maintenance quality.
Background
By default, all BGP routes are stored in the BGP basic instance, and separate route management and
maintenance are impossible. To address this problem, BGP multi-instance is introduced. A device can
simultaneously run two BGP instances: a BGP basic instance and a BGP multi-instance. The two BGP
instances are independent of each other and can have either the same AS number or different AS numbers.
BGP multi-instance can achieve separate route management and maintenance by having different address
families deployed in the BGP basic instance and BGP multi-instance.
Basic Concepts
A BGP instance can be classified as either a BGP basic instance or a BGP multi-instance.
A device can run the BGP basic instance and BGP multi-instance simultaneously. Their AS numbers can be
either the same or different. The BGP multi-instance process functions in a similar way to the BGP basic
instance process.
Implementation
On the network shown in Figure 1, to isolate private and public network services, specifically, to deploy
public network services between Device A and Device B and private network services between Device B and
Device C, configure BGP as follows:
• Configure BGP basic instance bgp 100 and BGP multi-instance bgp 200 instance a on Device B.
The public network BGP-IPv4 unicast address family is enabled in BGP basic instances on Device A and
Device B and a public network EBGP peer relationship is established for the exchange of public network
routes. The VPN address family is enabled in the BGP multi-instance on Device B and BGP basic instance on
Device C, and an EBGP-VPN peer relationship is established for the exchange of VPN routes. Check route
information on Device A, Device B, and Device C. If Device A has only public network routes, Device B has
both VPN and public network routes, and Device C has only VPN routes, instance-specific management and
maintenance of routes can be achieved.
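The configuration on Device B can be sketched as follows. The instance names and AS numbers follow the example above; the peer addresses and peer AS numbers are assumptions for illustration:

```
bgp 100                               // BGP basic instance: public network services
 peer 10.1.1.1 as-number 300          // public network EBGP peer (Device A, assumed address)
bgp 200 instance a                    // BGP multi-instance: VPN services
 peer 10.3.1.3 as-number 400          // EBGP-VPN peer (Device C, assumed address)
 ipv4-family vpnv4
  peer 10.3.1.3 enable
```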
1. Node D: An SRGB and a Label-Index are configured on node D. The incoming label (In Label for short)
of the route to 1.1.1.1/32 is 16100 on node D. In this case, node D instructs node C to use 16100 as
the BGP SR LSP label for the route to 1.1.1.1/32. Node D creates an ILM entry to guide the processing
of the In Label, encapsulates its SRGB and Label-Index into the Prefix SID attribute of a BGP route,
and advertises the BGP route to its BGP peers.
2. Node C: After parsing the BGP message advertised by node D, node C sets the outgoing label (Out
Label for short) to the In Label advertised by node D, instructs the tunnel management module to
update the BGP LSP information, and delivers an NHLFE entry. In addition, node C calculates the In
Label by adding the start value of its SRGB [36000–65535] and the Label-Index carried in the received
message. The calculated In Label is 36100 (36000 + 100). After applying for the label, node C creates
an ILM entry.
3. Node B: The process is similar to that on node C. The Out Label is 36100, and the In Label is 26100
(26000 + 100).
Data Forwarding
BGP SR LSPs use the same three types of label operations as those used in MPLS: push, swap, and pop.
Table 2 describes the process of data forwarding on the network shown in Figure 2.
1. Node A: Node A receives a data packet and finds that the destination IP address of the packet is
1.1.1.1. Node A searches for the corresponding BGP LSP and pushes the inner MPLS label 20100 to
the packet. Node A then searches the inner and outer label mapping table, pushes the outer MPLS
label 48123 to the packet, and forwards the packet through the outbound interface.
NOTE:
To implement MPLS forwarding, each node creates an inner and outer label mapping table. Take
node A as an example. According to the BGP SR LSP, when the inner label of a packet is 20100, the
destination address is 1.1.1.1. According to the LDP LSP, when the packet needs to be sent to node B,
the outer label 48123 needs to be added to the packet. Therefore, when the inner label of a packet is
20100, the outer label 48123 needs to be added. This mapping is recorded in the inner and outer
label mapping table. With this table, node A does not need to query the IP routing table for an entry
to send the packet to node B. Instead, node A only needs to query its inner and outer label mapping
table for packet forwarding.
2. Node B: After receiving the labeled packet, node B searches for an LDP LSP and pops the outer label
48123. Then, node B searches for a BGP LSP and swaps the inner label from 20100 to 26100. Finally,
node B queries its inner and outer label mapping table, pushes the outer label 48120 to the packet,
and forwards the packet through the outbound interface.
4. Node D: After receiving the labeled packet, node D searches for an LDP LSP and pops the outer label
48125 from the packet. Then, node D searches for a BGP LSP, pops the inner label 36100 from the
packet, and forwards the packet to the destination.
Definition
Routing policies are used to filter routes and control how routes are received and advertised. If route
attributes, such as reachability, are changed, the path along which network traffic passes changes
accordingly.
Purpose
When advertising, receiving, and importing routes, the Router implements certain routing policies based on
actual networking requirements to filter routes and change the route attributes. Routing policies serve the
following purposes:
Benefits
Routing policies have the following benefits:
• Modify attributes of routes for proper traffic planning, improving network performance.
PBR selects routes based on the user-defined routing policies, with reference to the source IP addresses and
lengths of incoming packets. PBR can be used to improve security and implement load balancing.
A routing policy and PBR have different mechanisms. Table 1 shows the differences between them.
Routing policy:
• Forwards packets based on destination addresses in the routing table.
• Based on the control plane, serves routing protocols and routing tables.
• Combines with a routing protocol to form a policy.
PBR:
• Forwards packets based on the policy. The device searches the routing table for packet forwarding only after packets fail to be forwarded based on the policy.
• Based on the forwarding plane, serves forwarding.
• Needs to be configured hop by hop to ensure that packets are forwarded based on the policies.
1. Define rules: The rules that contain characteristics of routes to which routing policies are applied need
to be defined. Specifically, you need to define a set of matching rules regarding different attributes of
routing information, such as the destination address and the source IP address of the device that
advertises routes. A filter, the core of a routing policy, is used to define a set of matching rules.
2. Apply rules: The rules are used in a routing policy to advertise, accept, and import routes.
Filter
By using filters, you can define matching rules for a group of routing policies. The NE40E provides multiple
types of filters for routing policies. Table 1 lists the filters supported by the device and their application
scopes and matching conditions.
• Access control list (ACL): applies to dynamic routing protocols; matches the inbound interface, source or destination IP address, protocol type, and source or destination port number.
• IP prefix list: applies to dynamic routing protocols; matches the source and destination IP addresses and the next hop address.
The ACL, IP prefix list, AS_Path filter, Large-community filter, community filter, extended community filter,
and RD filter can only be used to filter routes and cannot be used to modify the attributes of matched
routes. A route-policy is a comprehensive filter, and it can use the matching rules of the ACL, IP prefix list,
AS_Path filter, community filter, extended community filter, and RD filter to filter routes. In addition,
attributes of the matched routes can be modified. The following sections describe the filters in more detail.
ACL
An ACL is a set of sequential filtering rules. Users can define rules based on packet information, such as
inbound interfaces, source or destination IP addresses, protocol types, or source or destination port numbers
and specify an action to deny or permit packets. After an ACL is configured, the system classifies received
packets based on the rules defined in the ACL and denies or permits the packets accordingly.
An ACL only classifies packets based on defined rules and filters packets only after it is applied to a routing
policy.
ACLs are classified as the ACLs that apply to IPv4 routes or those that apply to IPv6 routes. Based on the
usage, ACLs are classified as interface-based ACLs, basic ACLs, or advanced ACLs. Users can specify the IP
address and subnet address range in an ACL to match the source IP address, destination network segment
address, or the next hop address of a route.
ACLs can be configured on network devices, such as access and core devices, to improve network security
and stability. For example:
• Protect the devices against IP, TCP, and Internet Control Message Protocol (ICMP) packet attacks.
• Control network access. For example, ACLs can be used to control the access of enterprise network
users to external networks, the specific network resources that users can access, and the period for
which users can access networks.
• Limit network traffic and improve network performance. For example, ACLs can be used to limit
bandwidth for upstream and downstream traffic, charge for the bandwidth that users have applied for,
and fully use high-bandwidth network resources.
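The following sketch shows an ACL used as a route filter in a route-policy; the ACL number, addresses, and policy name are examples:

```
acl number 2001                               // basic ACL
 rule 10 permit source 10.1.0.0 0.0.255.255   // match routes in 10.1.0.0/16
 rule 20 deny                                 // deny everything else
route-policy FILTER-BY-ACL permit node 10
 if-match acl 2001                            // the ACL takes effect only when referenced here
```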
IP Prefix List
An IP prefix list contains a group of route filtering rules. Users can specify the prefix and mask length range
to match the destination network segment address or the next hop address of a route. An IP prefix list is
used to filter routes that are advertised and received by various dynamic routing protocols.
An IP prefix list is easier and more flexible than an ACL. However, if a large number of routes with different
prefixes need to be filtered, configuring an IP prefix list to filter the routes is complex.
IP prefix lists are classified as IPv4 prefix lists that apply to IPv4 routes or IPv6 prefix lists that apply to IPv6
routes. IPv4 prefix lists and IPv6 prefix lists share the same implementation. An IP prefix list filters routes
based on the mask length or mask length range.
based on the mask length or mask length range.
• Mask length: An IP prefix list filters routes based on IP address prefixes. An IP address prefix is defined
by an IP address and the mask length. For example, for route 10.1.1.1/16, the mask length is 16 bits,
and the valid prefix is 16 bits (10.1.0.0).
• Mask length range: Routes with the IP address prefix and mask length within the range defined in the
IP prefix list meet the matching rules.
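For example, the following two entries (the list name and prefixes are examples) illustrate the two matching modes:

```
ip ip-prefix p1 index 10 permit 10.1.0.0 16
// index 10: mask length match, permits only the route 10.1.0.0/16
ip ip-prefix p1 index 20 permit 10.2.0.0 16 greater-equal 24 less-equal 28
// index 20: mask length range, permits routes in 10.2.0.0/16 with mask length 24 to 28
```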
0.0.0.0 is a wildcard address. If the IP prefix is 0.0.0.0, specify either a mask or a mask length range, with the following
results:
• If a mask is specified, all routes with the mask are permitted or denied.
• If a mask length range is specified, all routes with the mask length in the range are permitted or denied.
The following table describes the implementation of route matching rules when the preceding wildcard
address is used.
In all of the following cases, an IP prefix list cannot be configured if the prefix and mask do not match; such an attempt returns an error similar to "Failed to add the address prefix list 0.0.0.0/0, because the destination address and mask do not match."
For IPv4 prefix lists, the post-processing ipv4-address and mask-length are 0.0.0.0 and 0, respectively:
• Neither greater-equal nor less-equal exists: matches only the default IPv4 route.
Incorrect: ip ip-prefix aa index 10 permit 1.1.1.1 0
Correct: ip ip-prefix aa index 10 permit 0.0.0.0 0
• greater-equal exists, but less-equal does not: matches all the routes whose mask length is within the range from greater-equal to 32.
Incorrect: ip ip-prefix aa index 10 permit 1.1.1.1 0 greater-equal 16
Correct: ip ip-prefix aa index 10 permit 0.0.0.0 0 greater-equal 16 less-equal 32
• greater-equal does not exist, but less-equal does: matches all the routes whose mask length is within the range from 0 to less-equal.
Incorrect: ip ip-prefix aa index 10 permit 1.1.1.1 0 less-equal 30
Correct: ip ip-prefix aa index 10 permit 0.0.0.0 0 less-equal 30
• Both greater-equal and less-equal exist: matches all the routes whose mask length is within the range from greater-equal to less-equal.
Incorrect: ip ip-prefix aa index 10 permit 1.1.1.1 0 greater-equal 5 less-equal 30
Correct: ip ip-prefix aa index 10 permit 0.0.0.0 0 greater-equal 5 less-equal 30
For IPv6 prefix lists, the post-processing ipv6-address and prefix-length are :: and 0, respectively, and the error message references ::/0:
• Neither greater-equal nor less-equal exists: matches only the default IPv6 route.
Incorrect: ip ipv6-prefix aa index 10 permit 1::1 0
Correct: ip ipv6-prefix aa index 10 permit :: 0
• greater-equal exists, but less-equal does not: matches all the IPv6 routes whose prefix length is within the range from greater-equal to 128.
Incorrect: ip ipv6-prefix aa index 10 permit 1::1 0 greater-equal 16
Correct: ip ipv6-prefix aa index 10 permit :: 0 greater-equal 16 less-equal 128
• greater-equal does not exist, but less-equal does: matches all the IPv6 routes whose prefix length is within the range from 0 to less-equal.
Incorrect: ip ipv6-prefix aa index 10 permit 1::1 0 less-equal 120
Correct: ip ipv6-prefix aa index 10 permit :: 0 less-equal 120
• Both greater-equal and less-equal exist: matches all the IPv6 routes whose prefix length is within the range from greater-equal to less-equal.
Incorrect: ip ipv6-prefix aa index 10 permit 1::1 0 greater-equal 5 less-equal 30
Correct: ip ipv6-prefix aa index 10 permit :: 0 greater-equal 5 less-equal 30
AS_Path
An AS_Path filter is used to filter BGP routes based on AS_Path attributes contained in BGP routes. The
AS_Path attribute is used to record in distance-vector (DV) order the numbers of all ASs through which a
BGP route passes from the local end to the destination. Therefore, AS_Path attributes can be used to filter
BGP routes.
The matching condition of an AS_Path is specified using a regular expression. For example, ^30 indicates
that only the AS_Path attribute starting with 30 is matched. Using a regular expression can simplify
configurations. For details about regular expressions, see Configuration Guide - Basic Configurations.
The AS_Path attribute is a private attribute of BGP and is therefore used to filter BGP routes only. For details about the
AS_Path attribute, see BGP Fundamentals.
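A minimal sketch of an AS_Path filter referenced by a route-policy (the filter number and policy name are examples):

```
ip as-path-filter 1 permit ^30        // matches BGP routes whose AS_Path starts with AS 30
route-policy FROM-AS30 permit node 10
 if-match as-path-filter 1
```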
Community
A community filter is used to filter BGP routes based on the community attributes contained in BGP routes.
The community attribute is a group of destination addresses with the same characteristics. Therefore,
community attributes can be used to filter BGP routes.
In addition to the well-known community attributes, users can define community attributes using digits. The
matching condition of a community filter can be specified using a community ID or a regular expression.
Like AS_Path filters, community filters are used to filter only BGP routes because the community attribute is also a
private attribute of BGP. For details about the community attribute, see Community Attribute.
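A minimal sketch of a community filter referenced by a route-policy (the filter number, community value, and policy name are examples):

```
ip community-filter 1 permit 100:200  // matches BGP routes carrying community 100:200
route-policy BY-COMM permit node 10
 if-match community-filter 1
```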
Large-community
A large-community filter is used to filter BGP routes based on large-community attributes contained in BGP
routes. The large-community attribute is an extended community attribute. The community attribute is a
group of destination addresses with the same characteristics and consists of a set of 4-byte values, each of
which specifies a community. Generally, the community attribute on the NE40E is in the format of aa:nn,
where aa specifies a 2-byte AS number and nn specifies the community attribute ID defined by an
administrator. The community attribute is not flexible enough because it fails to carry a 4-byte AS number
and contains only one community attribute ID. To address this problem, the large-community attribute can
be used instead. The large-community attribute consists of a set of 12-byte values and is in the format of
Global Administrator:LocalData1:LocalData2.
The large-community filter is used to filter only BGP routes because the large-community attribute is also a private
attribute of BGP. For details about the large-community attribute, see Large-Community Attribute.
Extended Community
An extended community is used to filter BGP routes based on extended community attributes. BGP extended
community attributes are classified as follows:
• VPN target: A VPN target controls route learning between VPN instances, isolating routes of VPN
instances from each other. A VPN target may be either an import or export VPN target. Before
advertising a VPNv4 or VPNv6 route to a remote MP-BGP peer, a PE adds an export VPN target to the
route. After receiving a VPNv4 or VPNv6 route, the remote MP-BGP peer compares the received export
VPN target with the local import VPN target. If they are the same, the remote MP-BGP peer adds the
route to the routing table of the local VPN instance.
• Source of Origin (SoO): Several CEs at a VPN site may be connected to different PEs. The VPN routes
advertised from the CEs to the PEs may be re-advertised to the VPN site where the CEs reside after the
routes have traversed the backbone network, causing routing loops at the VPN site. In this situation,
configure an SoO attribute for VPN routes. With the SoO attribute, routes advertised from different VPN
sites can be distinguished and will not be advertised to the source VPN site, preventing routing loops.
• Segmented-nh: The segmented-nh can be added to intra-AS I-PMSI A-D routes in an NG MVPN
scenario where segmented tunnels are used.
The matching condition of an extended community can be specified using an extended community ID or a
regular expression.
An extended community is used to filter only BGP routes because the extended community attribute is also a private
attribute of BGP.
RD
An RD is used to filter BGP routes based on RDs in VPN routes. RDs are used to distinguish IPv4 and IPv6
prefixes in the same address space in VPN instances. RD-specific matching conditions can be configured in
an RD filter.
For details about how to configure an RD, see HUAWEI NE40E-M2 series Universal Service Router
Configuration Guide – VPN.
Route-Policy
A route-policy is a complex filter. It is used to match attributes of specified routes and change route
attributes when specific conditions are met. A route-policy can use the preceding six filters to define its
matching rules.
Composition of a Route-Policy
As shown in the following figure, a route-policy consists of node IDs, matching modes, if-match clauses,
apply clauses, and goto next-node clauses. The if-match, apply, and goto next-node clauses are optional.
1. Node ID
A route-policy can consist of multiple nodes. The method of specifying a node ID is the same as that
of specifying an index for an IP prefix list. In a route-policy, routes are filtered based on the following
rules:
• Sequential match: A device checks entries in ascending order by node ID. Specifying the node IDs
in a required order is recommended.
• Unique match: The relationship among the nodes of a route-policy is "OR". If a route matches
one node, the route matches the route-policy and will not be matched against a next node.
2. Matching mode
• permit: indicates the permit mode of a node. If a route matches the if-match clauses of the node
in permit mode, the apply clauses of the node are executed, and the route will not be matched
against a next node. If the route does not match the if-match clauses of the node, the device
continues to match the route against a next node.
• deny: indicates the deny mode of a node. In deny mode, apply clauses are not executed. If a
route matches all if-match clauses of the node, the route is rejected and is not matched against a
next node. If the entry does not match if-match clauses of the node, the device continues to
match the route against a next node.
To allow other routes to pass through, a route-policy that contains no if-match or apply clause in the permit
mode is usually configured for a node after multiple nodes in the deny mode are configured.
3. if-match clause
The if-match clause defines the matching rules.
Each node of a route-policy can comprise multiple if-match clauses or none. By default, if the address
family that a route belongs to does not match that specified in an if-match clause of a route-policy,
the route matches the route-policy. Take a route-policy node in permit mode (permit node for short)
as an example. If no if-match clause is configured for the permit node, all IPv4 and IPv6 routes are
considered to match this node. If the permit node is configured with if-match clauses for filtering IPv4
routes only, IPv4 routes that match the if-match clauses and all IPv6 routes are considered to match
this node. If the permit node is configured with if-match clauses for filtering IPv6 routes only, IPv6
routes that match the if-match clauses and all IPv4 routes are considered to match this node. This
implementation also applies to a deny node.
You are not advised to use the same route-policy to filter both IPv4 and IPv6 routes by default.
Otherwise, services may be interrupted in the following scenarios:
• For the same route-policy, some nodes apply only to IPv4 routes and some nodes apply only to
IPv6 routes.
To use the same route-policy to filter both IPv4 and IPv6 routes, you can change the default behavior
of the route-policy. When the address family that a route belongs to does not match that specified in
an if-match clause of a route-policy, to set the default action of the route-policy to deny, run the
route-policy address-family mismatch-deny command. Take a permit node as an example. If no if-
match clause is configured for the permit node, all IPv4 and IPv6 routes are considered to match this
node. If the permit node is configured with only an if-match clause for filtering IPv4 routes, only IPv4
routes that match the if-match clause are considered to match this node, and no IPv6 routes match
this node. If the permit node is configured with only an if-match clause for filtering IPv6 routes, only
IPv6 routes that match the if-match clause are considered to match this node, and no IPv4 routes
match this node. This implementation also applies to a deny node.
If an if-match clause of a node uses information such as the next hop address or direct route source as a
matching condition, the node compares the address family to which the next hop address or direct route source
belongs with that specified in the if-match clause.
4. apply clause
The apply clauses specify actions. When a route matches a route-policy, the system sets some
attributes for the route based on the apply clause.
Each node of a route-policy can comprise multiple apply clauses or no apply clause at all. No apply
clause needs to be configured if routes are to be filtered but their attributes do not need to be set.
If if-match clauses are not configured in a route-policy but apply clauses are configured, the route-
policy does not have any filtering conditions to match routes. In this case, if the matching mode of a
route-policy node is set to permit, all routes are permitted and the apply clauses are executed; if the
matching mode is set to deny, all routes are denied and the apply clauses are not executed.
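The node, matching mode, if-match, and apply concepts above can be combined as in the following sketch (the names, prefix, and cost value are examples). Node 10 rejects routes matching a prefix list; node 20 permits all remaining routes and sets their cost, illustrating the final permit node mentioned earlier:

```
ip ip-prefix unwanted index 10 permit 10.9.0.0 16 less-equal 32
route-policy RP1 deny node 10         // reject routes matching the prefix list
 if-match ip-prefix unwanted
route-policy RP1 permit node 20       // permit all other routes
 apply cost 100                       // and set their cost to 100
```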
The matching results of a route-policy are obtained based on the following aspects:
• Matching rules (either permit or deny) contained in the if-match clause (using filters such as IP prefix
lists or ACLs)
• Filter rule permit, node matching mode permit: routes matching the if-match clauses of the node match
the route-policy, and the matching is complete. Routes not matching the if-match clauses of the node
continue to match against the next node of the route-policy.
• Filter rule deny, node matching mode permit: routes matching the if-match clauses of the node are
denied by the route-policy and continue to match against the next node. Routes not matching the
if-match clauses of the node continue to match against the next node of the route-policy.
By default, all the routes that do not match the filtering conditions in a route-policy on the HUAWEI NE40E-M2 series
are rejected by the route-policy. If more than one node is defined in a route-policy, at least one of them must be in
permit mode. The reason is as follows:
• If a route fails to match any of the nodes, the route is denied by the route-policy.
• If all the nodes in the route-policy are set in deny mode, all the routes to be filtered are denied by the route-policy.
Other Functions
In addition to the preceding functions, routing policies have an enhanced feature: BGP to IGP.
In some scenarios, when an IGP uses a routing policy to import BGP routes, route attributes (the cost, for example) can be set based on private attributes, such as the community in BGP routes. However, without the
BGP to IGP feature, BGP routes are denied because the IGP fails to identify private attributes, such as
community attributes in these routes. As a result, apply clauses used to set route attributes do not take
effect.
With the BGP to IGP feature, route attributes can be set based on private attributes, such as the community,
extended community, and AS_Path attributes in BGP routes. The BGP to IGP implementation process is as
follows:
• When an IGP imports BGP routes through a route-policy, route attributes can be set based on private
attributes, such as the community attribute in BGP routes.
• If BGP routes carry private attributes, such as community attributes, the system filters the BGP routes
based on the private attributes. If the BGP routes meet the matching rules, the routes match the route-
policy, and apply clauses take effect.
• If BGP routes do not carry private attributes, such as community attributes, the BGP routes fail to match
the route-policy and are denied, and apply clauses do not take effect.
There are multiple approaches to meet the preceding requirements, and the following two approaches are used in this example:
• Use IP prefix lists
■ Configure an IP prefix list for Device A and configure the IP prefix list as an export policy on Device A for OSPF.
■ Configure another IP prefix list for Device C and configure the IP prefix list as an import policy on
Device C for OSPF.
• Use route-policies
■ Configure a route-policy (the matching rules can be the IP prefix list, cost, or route tag) for Device
A and configure the route-policy as an export policy on Device A for OSPF.
■ Configure another route-policy on Device C and configure the route-policy as an import policy on
Device C for OSPF.
Compared with an IP prefix list, a route-policy can change route attributes and control routes more
flexibly, but it is more complex to configure.
To meet the preceding requirements, configure a route-policy for Device A to set a tag for the imported IS-IS
routes. Device D identifies the IS-IS routes from OSPF routes based on the tag.
To establish an inter-AS label switched path (LSP) between PE1 and PE2, route-policies need to be
configured for autonomous system boundary routers (ASBRs).
• When an ASBR advertises the routes received from a PE in the same AS to the peer ASBR, the ASBR
allocates MPLS labels to the routes using a route-policy.
• When an ASBR advertises labeled IPv4 routes to a PE in the same AS, the ASBR reallocates MPLS labels
to the routes using another route-policy.
In addition, to control route transmission between different VPN instances on a PE, configure a route-policy
for the PE and configure the route-policy as an import or export policy for the VPN instances.
To enable devices on the MAN to access the backbone network, Device C and Device D need to import
routes. When OSPF imports BGP routes, a routing policy can be configured to control the number of
imported routes based on private attributes (such as the community) of the imported BGP routes or modify
the cost of the imported routes to control the MAN egress traffic.
Definition
Extended routing-policy language (XPL) is a language used to filter routes and modify route attributes. By
modifying route attributes (including reachability), XPL changes the path through which network traffic
passes. XPL provides the same functions as routing policies do, but it uses different editing and filtering
methods from routing policies. Therefore, XPL can meet different customer requirements.
Table 1 compares XPL and routing policies.
• XPL: filters routes and modifies route attributes; supports line-by-line or paragraph-by-paragraph editing; uses sets or single elements to filter routes; users can configure or modify policies as required in a text editor.
• Routing policies: filter routes and modify route attributes; support line-by-line editing only; use filters or single elements to filter routes; users must follow strict command configuration rules.
For details about routing policies, see "Routing Policies" in HUAWEI NE40E-M2 series Universal Service Router Feature
Description — IP Routing.
• Line-by-line editing
Applicable to: users who are used to the traditional configuration method or are unfamiliar with XPL.
Description: Each command is run in a command view, and one command is presented in one line, which is considered a configuration unit.
NOTE: To modify an existing global variable set, route attribute set, or route-filter through line-by-line editing, enter the specific command view and reconfigure the set or policy.
Characteristics: The desired command can be suggested using the command association function. If any configuration error occurs, it is reported after the command is configured.
• Paragraph-by-paragraph editing
Applicable to: users who are familiar with XPL clause configuration and want to simplify the configuration process.
Description: The paragraph editing UI functions as a text editor, in which users edit XPL clauses. The XPL clauses are committed after a paragraph of them is configured, and each paragraph is considered a configuration unit.
Characteristics: The command association function is not supported, and complete clauses must be entered in the paragraph editing UI. If any configuration error occurs, it is reported after the configurations of the whole paragraph are committed.
Purpose
When advertising, receiving, or importing routes, the Router can use XPL based on actual networking
requirements to filter routes and modify route attributes. XPL serves the following purposes:
Benefits
XPL offers the following benefits:
• Improves network performance by modifying route attributes for effective traffic planning.
XPL Implementation
XPL implementation involves the following two steps:
1. Define rules: Define route characteristics for route matching. Specifically, you need to define a set of
matching rules based on route attributes, such as the destination address and the address of the
router that advertises the routes. For details, see Route-Filters.
2. Apply rules: Apply the matching rules to route advertisement, acceptance, and import.
Sets
A set is a group of data that XPL uses as matching rules. Sets are classified as global variable sets and route
attribute sets.
• Global variable set: A global variable set is a group of frequently used values that are defined as global
variables. Global variables are variables that can be referenced by all route-filters on a device. To enable
a route-filter to reference a global variable, enter a dollar sign ($) followed by the variable name, for example, $glovar1. The global
variables on a device must be unique. A new global variable will override an existing global variable
with the same name.
• Route attribute set: A route attribute set is a group of data concerning a route attribute. If the routes to
be filtered have the same or similar route attribute, for example, they are destined for the same
network segment or originate from the same AS, you can configure a route attribute set for the routes
as matching rules. The application scopes and matching items vary with the route attribute set. Table 1
shows the application scopes and matching items of different route attribute sets.
Sets do not have the permit or deny function as routing policies do. Instead, sets are only groups of data used as
matching rules, and the actions to be applied are specified in route-filters.
• Extended community set: applies to VPN routes; the matching items are the route target and Site-of-Origin attributes.
Route-Filters
Route-filters are used to filter routes based on sets or a single element and modify route attributes of the
routes that match specified rules. Route-filters consist of condition and action clauses.
• Condition clause: A condition clause is defined based on a set or single element. The action specified in the action clause is applied only to the routes that match the conditions specified in the condition clause.
• Action clause: An action clause specifies an action to be applied to the routes that match the conditions
specified in the condition clause. An action clause determines whether the routes match the route-filter
or modifies their route attributes.
Figure 1 shows how a route-filter is used to filter routes. For details about condition and action clauses, see
XPL Statements.
XPL Statements
XPL statements are used to convert matching rules to sets and route-filters. XPL statements include the
remark, set definition, set element, condition clause, action clause, and route-filter with pre-defined
variables.
Remark
A remark is an explanation attached to an XPL policy configuration line, beginning with an exclamation
mark (!).
NOTE:
If the list is not empty, no remarks can be configured in the last line (the line above the end-list).
10.0.3.0 24 eq 26,
10.0.2.0 24 le 28,
10.0.4.0 24 ge 26 le 30
end-list
Set definition
A set definition specifies matching rules and begins and ends with fixed start and end clauses.
For example, an IPv4 prefix set begins with xpl ip-prefix-list and ends with end-list, with a group of IPv4
prefixes in between.
xpl ip-prefix-list prefix-list2
10.0.1.0 24,
10.0.3.0 24 eq 26,
10.0.2.0 24 le 28,
10.0.4.0 24 ge 26 le 30
end-list
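The eq, ge, and le qualifiers in the prefix-set example above can be modeled with a short Python sketch (an illustration using the standard ipaddress module, not device code): an entry such as "10.0.4.0 24 ge 26 le 30" matches routes that fall within 10.0.4.0/24 and whose mask length is between 26 and 30.

```python
import ipaddress

def entry_matches(entry, route):
    """Check a route against one ip-prefix-list entry (illustrative model).

    entry is (prefix, prefix_len, min_len, max_len): with no eq/ge/le
    qualifier, min_len == max_len == prefix_len; "eq N" sets both to N;
    "ge N le M" sets the range [N, M].
    """
    prefix, prefix_len, min_len, max_len = entry
    net = ipaddress.ip_network(f"{prefix}/{prefix_len}")
    rt = ipaddress.ip_network(route)
    if not (min_len <= rt.prefixlen <= max_len):
        return False
    # The route's first prefix_len bits must equal the entry's prefix.
    return rt.subnet_of(net)
```

For example, 10.0.4.64/26 matches the "10.0.4.0 24 ge 26 le 30" entry, while 10.0.4.0/25 does not, because its mask length falls outside the 26-30 range.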
Set element
Set elements include elements such as IP prefixes, AS_Path values, and communities. The elements are
separated with commas. Elements in a route-filter must have the same type as the route-filter.
Condition clause
Condition clauses are used in route-filters. Condition clauses can be used with sets to define matching
rules. Condition clauses can be if, elseif, or else clauses. Condition clauses may include eq (equal to), ge (greater than or equal to), le (less than or equal to), and in (included in) expressions, which can be used in conjunction with the Boolean operators not, and, and or.
in can be followed by a set so that the elements in the set are used as matching rules.
Action clause
Action clauses specify the actions to be applied to given routes and include the following clauses:
■ approve clause: permits routes.
■ refuse clause: denies routes.
■ finish clause: completes route filtering and indicates that the route matches the route-filter.
■ abort clause: aborts the route-filter or set modification.
■ apply clause: modifies route attributes.
■ call clause: references other route-filters.
■ break clause: enables the device to exit from the current route-filter. If the current route-filter is referenced by a parent route-filter, the device continues to execute the remaining condition and action clauses of the parent route-filter.
XPL supports route-filters with pre-defined variables. Route-filters with pre-defined variables can be
referenced during route-filter configuration through call clauses.
• After receiving the route from Device B, Device C forwards it directly to Device E, and Device D increases the MED attribute of the route before forwarding it to Device E to ensure that Device C functions as the egress for the traffic from Device E to 172.16.17.0/24.
1. Configure an IPv4 prefix set named ip-prefix1, which includes only the 172.16.17.0 24 element, on
Device A.
2. Configure a route-filter named route-filter1 on Device A, which permits the route carrying the
element in ip-prefix1 and denies other routes.
3. Configure route-filter1 as an export policy on Device A so that Device A advertises only route
172.16.17.0/24 to Device B.
4. Configure an IPv4 prefix set named ip-prefix2, which includes only 172.16.17.0 24, on Device D.
5. Configure a route-filter named route-filter2, which increases the MED value of the route carrying the
element in ip-prefix2, on Device D.
6. Configure route-filter2 as an export policy on Device D so that the MED value of the route advertised
by Device D is greater than that of the route advertised by Device C, making Device C the egress for
the traffic from Device E to 172.16.17.0/24.
Definition
All monitored network-side routes of the same type can be added to a group called a route monitoring
group. Each route monitoring group is identified by a unique name.
A route monitoring group monitors the status of its member routes, each of which has a down-weight. The
down-weight indicates the link quality. The higher the value, the more important the route. The down-
weight can be set based on parameters, such as the link bandwidth, rate, and cost.
• If a route in the route monitoring group goes Down, its down-weight is added to the down-weight sum of the route monitoring group.
• If a route in the route monitoring group goes Up again, its down-weight is subtracted from the down-
weight sum of the route monitoring group.
Purpose
Service modules can be associated with the route monitoring group, with a threshold configured for each
service module for triggering a primary/backup access-side link switchover. If the down-weight sum of the
route monitoring group reaches the threshold of a service module, the routing management (RM) module
notifies the service module to switch services from the primary link to the backup link. If the down-weight
sum of the route monitoring group falls below the threshold, the RM module notifies the service module to
switch services back.
Benefits
If a service module is associated with a route monitoring group in a dual-system backup scenario, services
can be switched to the backup link if the primary link fails, thereby preventing traffic overload and
forwarding failures.
• A route monitoring group monitors the status of its member routes. If a route in the group goes Down,
its down-weight is added to the down-weight sum of the route monitoring group. If the down-weight
sum of the route monitoring group reaches the threshold of a service module, the RM module notifies
the service module to switch services from the primary link to the backup link.
• If a route in the route monitoring group goes Up again, its down-weight is subtracted from the down-
weight sum of the route monitoring group. If the down-weight sum of the route monitoring group falls
below the threshold of a service module, the RM module notifies the service module to switch services
back. You can specify the switchback delay time based on your actual network requirements.
As shown in Figure 1, the route monitoring group contains 10 routes, each having a down-weight of 10. The
route monitoring group is associated with service modules A, B, C, and D whose thresholds for a
primary/backup switchover are 80, 50, 30, and 20, respectively.
• If two routes in the route monitoring group go Down, the RM module notifies service module D to
switch services from the primary link to the backup link. If one more route goes Down, the RM module
notifies service module C to perform a primary/backup link switchover. If five routes go Down, the RM
module notifies service module B to perform a primary/backup link switchover. If the number of routes
that go Down reaches eight, the RM module notifies service module A to perform a primary/backup link
switchover.
• If the number of routes that are Down in the route monitoring group falls below eight, the RM module
notifies service module A to switch services back. If the number of routes that are Down in the route
monitoring group falls below five, the RM module notifies service module B to switch services back. If
the number of routes that are Down in the route monitoring group falls below three, the RM module
notifies service module C to switch services back. If the number of routes that are Down in the route
monitoring group falls below two, the RM module notifies service module D to switch services back.
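The worked example above can be simulated with a small Python sketch (an illustrative model of the RM module's down-weight bookkeeping, not device code; the module names and thresholds A/B/C/D = 80/50/30/20 follow the example, and notifications are modeled as returned events):

```python
class RouteMonitoringGroup:
    """Illustrative model: track the down-weight sum and report which
    service modules must switch to the backup link or switch back."""

    def __init__(self, thresholds):
        self.thresholds = thresholds   # e.g. {"A": 80, "B": 50, "C": 30, "D": 20}
        self.down_sum = 0
        self.switched = set()          # modules currently using the backup link

    def _notify(self):
        events = []
        for mod, th in self.thresholds.items():
            if self.down_sum >= th and mod not in self.switched:
                self.switched.add(mod)
                events.append((mod, "switch to backup"))
            elif self.down_sum < th and mod in self.switched:
                self.switched.remove(mod)
                events.append((mod, "switch back"))
        return events

    def route_down(self, weight):
        """A member route goes Down: add its down-weight to the sum."""
        self.down_sum += weight
        return self._notify()

    def route_up(self, weight):
        """A member route comes back Up: subtract its down-weight."""
        self.down_sum -= weight
        return self._notify()
```

With ten routes of down-weight 10 each, the second route going Down (sum 20) triggers module D, the third (sum 30) triggers module C, and so on; when the sum falls back below a threshold, the corresponding module switches back.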
Service Overview
To improve network reliability, most carriers implement device-level redundancy by deploying two devices.
The two devices back up each other or share traffic load. If one of the devices fails, the other device takes
over services. Despite the benefit of enhanced network reliability, you must dual-home other devices to the
two devices, which may introduce link reliability and load balancing issues.
Networking Description
As shown in Figure 1, BRAS 2 backs up BRAS 1. NPEs on the user side are dual-homed to the two BRASs to
load-balance traffic, and the two BRASs are connected to Routers on the network side.
• If the link between BRAS 1 and Device A or between BRAS 1 and Device B fails, the link bandwidth
between BRAS 1 and the IP core network decreases. The NPEs, however, cannot detect the link failure
and keep sending packets to the IP core network through BRAS 1. As a result, the other link between
BRAS 1 and the IP core network may be overloaded.
• If the links between BRAS 1 and Device A and between BRAS 1 and Device B both fail, only the links
between BRAS 2 and the IP core network are available. The NPEs, however, cannot detect the link
failure and keep sending packets to the IP core network through BRAS 1. As a result, the packets are
discarded.
Feature Deployment
To address the packet drop issue, deploy a route monitoring group on each BRAS, and add network-side
routes of the BRAS to the route monitoring group. If the down-weight sum of the route monitoring group
reaches the threshold of a service module that is associated with the group, the RM module will notify the
service module to trigger a primary/backup access-side link switchover. This mechanism prevents traffic
overload and service interruptions.
Terms
Term Definition
Route monitoring A group that consists of monitored network-side routes of the same type.
group
11 IP Multicast
Purpose
This document describes the IP multicast feature in terms of its overview, principles, and applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Commissioning engineers
Security Declaration
• Notice on Limited Command Permission
This document describes the commands used for network deployment and maintenance, but does not
cover confidential commands such as those used for production, assembly, and return-to-factory
inspection. For details about confidential commands, please submit an application.
■ When the password encryption mode is cipher, avoid setting both the start and end characters of a password to "%^%#", as this causes the password to be displayed directly in the configuration file.
■ Your purchased products, services, or features may use some of users' personal data during service operation or fault locating. You must define user privacy policies in compliance with local laws and take proper measures to fully protect personal data.
■ When discarding, recycling, or reusing a device, back up or delete data on the device as required to
prevent data leakage. If you need support, contact after-sales technical support personnel.
• Feature declaration
■ The NetStream feature may be used to analyze the communication information of terminal
customers for network traffic statistics and management purposes. Before enabling the NetStream
feature, ensure that it is performed within the boundaries permitted by applicable laws and
regulations. Effective measures must be taken to ensure that information is securely protected.
■ The mirroring feature may be used to analyze the communication information of terminal
customers for a maintenance purpose. Before enabling the mirroring function, ensure that it is
performed within the boundaries permitted by applicable laws and regulations. Effective measures
must be taken to ensure that information is securely protected.
■ The packet header obtaining feature may be used to collect or store some communication
information about specific customers for transmission fault and error detection purposes. Huawei
cannot offer services to collect or store this information unilaterally. Before enabling the function,
ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Special Declaration
• This document serves only as a guide. The content is written based on device information gathered
under lab conditions. The content provided by this document is intended to be taken as general
guidance, and does not cover all scenarios. The content provided by this document may be different
from the information on user device interfaces due to factors such as version upgrades and differences
in device models, board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are beyond the
scope of this document.
• The maximum values provided in this document are obtained in specific lab environments (for example, only a certain type of board or protocol is configured on a tested device). The maximum values actually obtained may differ from those provided in this document due to factors such as differences in hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface numbers on devices
for configuration.
• The supported boards are described in the document. Whether a customization requirement can be met
is subject to the information provided at the pre-sales interface.
• In this document, public IP addresses may be used in feature introduction and configuration examples
and are for reference only unless otherwise specified.
• The configuration precautions described in this document may not accurately reflect all scenarios.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
• DANGER: Indicates a hazard with a high level of risk which, if not avoided, will result in death or serious injury.
• CAUTION: Indicates a hazard with a low level of risk which, if not avoided, could result in minor or moderate injury.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made
in earlier issues.
Definition
IP multicast is a method of sending a single IP stream to multiple receivers simultaneously, reducing
bandwidth consumption. IP multicast provides benefits for point-to-multipoint (P2MP) services, such as e-commerce, online conferencing, online auctions, video on demand, and e-learning. P2MP services offer
opportunities for significant profits, yet require high bandwidth and secure operation. IP multicast is used to
meet these requirements.
• Unicast IP address
A unicast IP address can identify only one host, and a host can identify only one unicast IP address. An
IP packet that carries a unicast destination address can be received by only one host.
• Broadcast IP address
A broadcast IP address can identify all hosts on a network segment, and an IP packet that carries a
broadcast destination IP address can be received by all hosts on a network segment. However, a host
can identify only one broadcast IP address. IP broadcast packets cannot be transmitted across network
segments.
• Multicast IP address
A multicast IP address can identify multiple hosts at different locations, and a host can identify multiple
multicast IP addresses. An IP packet that carries a multicast destination IP address can therefore be
received by multiple hosts at different locations.
IP Transmission Modes
Based on the IP address types, networks can transmit packets in the following modes:
• IP unicast mode
• IP broadcast mode
• IP multicast mode
• Unicast transmission
■ Features: A unicast packet uses a unicast address as the destination address. If multiple receivers
require the same packet from a source, the source sends an individual unicast packet to each
receiver.
■ Disadvantages: This mode consumes unnecessary bandwidth and processor resources when sending
the same packet to a large number of receivers. Additionally, the unicast transmission mode does
not guarantee transmission quality when a large number of hosts exist.
• Broadcast transmission
■ Features: A broadcast packet uses a broadcast address as the destination address. In this mode, a
source sends only one copy of each packet to all hosts on the network segment, irrespective of
whether a host requires the packet.
■ Disadvantages: This mode requires that the source and receivers reside on the same network
segment. Because all hosts on the network segment receive packets sent by the source, this mode
cannot guarantee information security or charging of services.
• Multicast transmission
As shown in Figure 1, a source exists on the network. User A and User C require information from the
source, while User B does not. The transmission mode is multicast.
■ Features: A multicast packet uses a multicast address as the destination address. If multiple
receivers on a network segment require the same packet from a source, the source sends only one
packet to the multicast address.
The multicast protocol deployed on the network establishes a routing tree for the packet. The tree's
root is the source, and routes branch off to all multicast members. As shown in Figure 1, multicast
data is transmitted along the path: Source → DeviceB → DeviceE [ → DeviceD → User A | → DeviceF → User C ].
■ Advantages: In multicast mode, a single information flow is sent to users along the distribution
tree, and a maximum of one copy of the data flow exists on each link. Users who do not require
the packet do not receive the packet, providing the basis for information security. Compared with
unicast, multicast does not increase the network load when the number of users increases in the
same multicast group. This advantage prevents the server and CPU from being overloaded.
Compared with broadcast, multicast can transmit information across network segments and across
long distances.
Multicast technologies therefore provide the ideal solution for efficient P2MP data transmission from one source to multiple receivers.
Multicast Group
A multicast group consists of a group of receivers that require the same data stream. A multicast group uses
an IP multicast address identifier. A host that joins a multicast group becomes a member of the group and
can identify and receive IP packets that have the IP multicast address as the destination address.
Multicast Source
A multicast source sends IP packets that carry multicast destination addresses.
• Multiple multicast sources can simultaneously send data to a same multicast group.
Hosts are free to join or leave a multicast group, so the members of a multicast group are dynamic. The members can be located anywhere on a network.
A multicast source is generally not a receiver or a member of a multicast group.
Multicast Router
A Router that supports the multicast feature is called a multicast router.
A multicast router implements the following functions:
• Manages group members on the leaf segment networks that connect to users.
IP multicast is an end-to-end service. Figure 1 shows the four IP multicast functions from the lower protocol
layer to the upper protocol layer.
• Addressing mechanism: transmits data to multicast groups based on multicast destination addresses.
• Host registration: allows a host to dynamically join or leave a group, implementing group member
management.
• Multicast routing: sets up a distribution tree to transmit packets from a source to receivers.
• Multicast application: To work together, multicast sources and receivers must support the same
multicast application software, such as a video conferencing application. The TCP/IP protocol suite must
support multicast data transmission and receipt.
• Multicast IP addresses are needed to implement the communication between a source and its receivers
on the network layer.
• Link layer multicast (also known as hardware multicast) is needed to transmit multicast data on a local
physical network. On an Ethernet link layer network, hardware multicast uses multicast MAC addresses.
• An IP-to-MAC address mapping technology is needed to map multicast IP addresses to multicast MAC
addresses.
• 239.0.0.0 to 239.255.255.255: temporary ASM group addresses that are valid only in the local administration domain, called local administration multicast addresses. Local administration multicast addresses are private addresses, and the same local administrative group address can be used in different administration domains.
• Permanent multicast group addresses, also known as reserved multicast group addresses, are reserved
by the Internet Assigned Number Authority (IANA) for routing protocols and remain unchanged. Each
permanent multicast group address identifies all devices in a multicast group that may contain any
number (including 0) of members. For details, see Table 2.
• A temporary multicast group address, also known as a common group address, is an IPv4 address that
is assigned to a multicast group temporarily. If there is no user in this group, this address is reclaimed.
■ 0: indicates the most significant bit, which is reserved and has a fixed value of 0.
■ R: indicates whether the multicast address is embedded with an RP address. If the value is 1, the
multicast address is embedded with an RP address.
■ P: indicates whether the address is a unicast prefix-based multicast address. If the value is 1, the
address is a unicast prefix-based multicast address.
■ T: indicates whether the multicast address is a permanent multicast group address. If the value is 0, the address is a permanent multicast group address or a well-known multicast address defined by the IANA.
• The scope field (4 bits) indicates whether a multicast group contains any node in the global address
space or only the nodes of the same local network, the same site, or the same organization. Values in
this field are defined as follows:
■ 1: node/interface-local scope
■ 2: link-local scope
■ 4: admin-local scope
■ 5: site-local scope
■ 8: organization-local scope
■ E: global scope
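The flag and scope fields described above occupy the two nibbles after the leading 0xFF byte. The following Python sketch (illustrative, using the standard ipaddress module) extracts the R, P, and T flag bits and the scope nibble from an IPv6 multicast address:

```python
import ipaddress

def parse_ipv6_multicast(addr):
    """Return the R/P/T flag bits and the scope nibble of an IPv6
    multicast address, following the field layout described above."""
    raw = int(ipaddress.IPv6Address(addr))
    if (raw >> 120) != 0xFF:
        raise ValueError("not an IPv6 multicast address")
    flags = (raw >> 116) & 0xF   # the 0RPT nibble (top bit reserved as 0)
    scope = (raw >> 112) & 0xF
    return {
        "R": bool(flags & 0x4),  # embedded-RP address
        "P": bool(flags & 0x2),  # unicast-prefix-based address
        "T": bool(flags & 0x1),  # 0 means a permanent (IANA-assigned) address
        "scope": scope,          # e.g. 2 = link-local, 5 = site-local, 0xE = global
    }
```

For example, FF02::1 has all flags set to 0 (a permanent address) and scope 2 (link-local), while an address in the FF3E::/32 SSM range has P and T set and scope E (global).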
Table 3 shows the scopes and meanings of fixed IPv6 multicast addresses.
• FF3x::/32 (x cannot be 1 or 2): SSM addresses. This is the default SSM group address scope and is valid on the entire network.
• The last bit in the first byte of a unicast MAC address is fixed at 0.
• The last bit in the first byte of a multicast MAC address is fixed at 1.
Multicast MAC addresses identify receivers of the same multicast group at the link layer.
Ethernet interface boards can identify multicast MAC addresses. After a multicast MAC address of a
multicast group is configured on a device's driver, the device can then receive and forward data of the
multicast group on the Ethernet. The mapping between the multicast IPv4 address and multicast IPv4 MAC
address is as follows:
As defined by the IANA, the 24 most significant bits of a MAC address are 0x01005e, the 25th bit is 0, and
the 23 least significant bits are the same as those of a multicast IPv4 address. Figure 2 shows the mapping
between multicast IPv4 addresses and multicast MAC addresses.
Figure 2 Mapping between multicast IPv4 addresses and multicast MAC addresses
The 25 most significant bits of a multicast MAC address are fixed. Of the last 28 bits of an IPv4
multicast address, only the 23 least significant bits are mapped to the MAC address, resulting in the loss
of 5 bits. Therefore, 32 IPv4 multicast addresses are mapped to the same MAC address.
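The IPv4-to-MAC mapping described above can be sketched in Python. The function name is illustrative, but the fixed 0x01005e prefix and the 23 copied bits follow the IANA mapping:

```python
def ipv4_to_multicast_mac(addr: str) -> str:
    """Map an IPv4 multicast address to its multicast MAC address.

    The 25 most significant MAC bits are fixed (01-00-5e plus a 0 bit);
    only the 23 least significant IPv4 bits are copied over.
    """
    octets = [int(o) for o in addr.split(".")]
    assert 224 <= octets[0] <= 239, "not an IPv4 multicast address"
    # Mask the second octet to 7 bits: the 8th bit is lost in the mapping.
    mac = [0x01, 0x00, 0x5E, octets[1] & 0x7F, octets[2], octets[3]]
    return "-".join(f"{b:02x}" for b in mac)

# 224.1.1.1 and 225.1.1.1 collide on the same MAC (5 bits are lost).
print(ipv4_to_multicast_mac("224.1.1.1"))  # 01-00-5e-01-01-01
print(ipv4_to_multicast_mac("225.1.1.1"))  # 01-00-5e-01-01-01
```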
As defined by the IANA, the higher-order 16 bits of an IPv6 MAC address are 0x3333, and the low-order 32
bits of an IPv6 MAC address are the same as those of a multicast IPv6 address. Figure 3 shows the mapping
between multicast IPv6 addresses and multicast IPv6 MAC addresses.
Figure 3 Mapping between multicast IPv6 addresses and multicast MAC addresses
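The IPv6 mapping works the same way with the fixed 0x3333 prefix and the low-order 32 bits; a minimal sketch (function name illustrative):

```python
import ipaddress

def ipv6_to_multicast_mac(addr: str) -> str:
    """Map an IPv6 multicast address to a MAC: 33-33 plus the low 32 bits."""
    packed = ipaddress.IPv6Address(addr).packed  # 16 bytes, network order
    mac = bytes([0x33, 0x33]) + packed[-4:]      # keep the last 4 bytes
    return "-".join(f"{b:02x}" for b in mac)

print(ipv6_to_multicast_mac("ff02::1:ff00:1"))  # 33-33-ff-00-00-01
```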
This document focuses on IP multicast technology and device operation. Multicast in the document refers to IP multicast,
unless otherwise specified.
The NE40E supports various multicast routing protocols to implement different applications. Table 1
describes commonly used multicast routing protocols.
Position: Between a user host and a multicast router
Protocol: Internet Group Management Protocol (IGMP) for IPv4 networks; Multicast Listener Discovery
(MLD) for IPv6 networks
Function: Allows hosts to access multicast networks. On the host side, IGMP/MLD allows hosts to
dynamically join and leave multicast groups. On the Router side, IGMP/MLD exchanges information with
upper-layer multicast routing protocols and manages and maintains multicast group member
relationships.

Position: Between multicast routers in the same domain
Protocol: Protocol Independent Multicast (PIM)
Function: Routes and forwards multicast packets. Creates multicast routing entries, responds to network
topology changes and maintains multicast routing tables, and forwards multicast data based on routing
entries.

Position: Between multicast routers in different domains
Protocol: Multicast Source Discovery Protocol (MSDP) for IPv4 networks
Function: Inter-domain multicast source information sharing. Transmits source information between
routers in different domains.
Multicast protocols have two main types of functions: managing member relationships; establishing and
maintaining multicast routes.
• IGMP has three versions: IGMPv1, IGMPv2, and IGMPv3. At present, IGMPv2 is most widely used. IGMP
versions are backward compatible.
• All the IGMP versions support the Any-Source Multicast (ASM) model. IGMPv3 can support the Source-
Specific Multicast (SSM) model independently, while IGMPv1 or IGMPv2 needs to work with SSM
mapping to support the SSM model.
• Both MLD versions support the ASM model. MLDv2 supports the SSM model independently,
while MLDv1 needs to work with SSM mapping to support the SSM model.
• Intra-domain multicast routing protocols discover multicast sources and establish multicast distribution
trees in an autonomous system (AS) to deliver information to receivers.
• Inter-domain multicast routing protocols transmit multicast source information between domains to set
up inter-domain routes. Multicast resources can then be shared among different domains. MSDP is a
typical inter-domain multicast routing protocol. It usually works with the Multicast Border Gateway
Protocol (MBGP) to implement inter-domain multicast. MSDP applies to domains that run PIM-SM.
In the SSM model, domains are not classified as intra-domains or inter-domains. Receivers know the location
of the multicast source domain; therefore, multicast transmission paths can be directly established with the
help of partial PIM-SM functions.
• ASM model
• SFM model
• SSM model
ASM Model
In the any-source multicast (ASM) model, any sender can act as a multicast source and send information to
a multicast group address. Receivers can receive the information sent to this group after joining the group
and can join and leave the group any time. Receivers do not know the multicast source location before they
join a multicast group.
SFM Model
From the sender's point of view, the source-filtered multicast (SFM) model works the same as the ASM
model. That is, any sender can act as a multicast source and send information to a multicast group address.
Compared with the ASM model, the SFM model extends the following function: The upper layer software
checks the source addresses of received multicast packets, permitting or denying packets of multicast
sources as configured.
Compared with ASM, SFM adds multicast source filtering policies. The basic principles and configurations of ASM and
SFM are the same. In this document, information about ASM also applies to SFM.
SSM Model
In real-world situations, users may not require all data sent by multicast sources. The source-specific
multicast (SSM) model allows users to specify multicast data sources.
Compared with receivers in the ASM model, receivers in the SSM model know the multicast source location
before they join a multicast group. The SSM model uses a different address scope from the ASM model and
sets up a dedicated forwarding path between a source and receivers.
• Reverse path forwarding (RPF) ensures that multicast routing uses the shortest path tree. RPF is used by
most multicast protocols to create multicast route entries and forward packets.
On this network:
• P belongs to the public network. Each customer edge (CE) device belongs to a VPN. Each Router is
dedicated to one network and maintains only one forwarding mechanism.
• PEs are connected to both the public network and one or more VPNs. The network information
must be completely separated, and a separate set of forwarding mechanisms must be maintained for
each network. The set of software and hardware resources that serves one network on a PE is
called an instance. A PE supports multiple instances, and one instance can reside on multiple PEs.
For details of the multi-instance multicast technique, see the HUAWEI NE40E-M2 series Universal Service Router Feature
Description - VPN.
• Maintains a separate multicast forwarding mechanism for each instance. A forwarding mechanism
supports all multicast protocols and maintains a PIM neighbor list and a multicast routing table. Each
instance searches its own forwarding table or routing table when forwarding multicast data.
• Implements communication and data exchange between a public network instance and a VPN instance.
Definition
In the TCP/IP protocol suite, the Internet Group Management Protocol (IGMP) manages IPv4 multicast
members, and sets up and maintains multicast member relationships between IP hosts and their directly
connected multicast routers.
After IGMP is configured on hosts and their directly connected multicast routers, the hosts can dynamically
join multicast groups, and the multicast routers can manage multicast group members on the local network.
IGMP is a signaling mechanism used by IP multicast on the end network. IGMP is applicable to both the host
side and router side:
• On the host side, IGMP allows hosts to dynamically join and leave multicast groups anytime and
anywhere.
A host's operating system (OS) determines the IGMP version that the host supports.
• On the router side, IGMP enables a router to determine whether multicast receivers of a specific group
exist. Each host stores information about only the multicast groups it joins.
Purpose
IGMP allows receivers to access IP multicast networks, join multicast groups, and receive multicast data from
multicast sources. IGMP manages multicast group members by exchanging IGMP messages between hosts
and routers. IGMP records host join and leave information on interfaces, ensuring correct multicast data
forwarding on the interfaces.
IGMP Messages
Figure 1 IGMP networking
• IGMP Query message: This type of message is sent by a Router to hosts to learn whether multicast
receivers exist on a specific network segment. IGMP Query messages are sent only by queriers. IGMP
Query messages are categorized into the following types:
■ General Query message: It does not contain specific source or group information.
■ Group-specific Query message: It contains specific multicast group information, but does not
contain specific source information.
■ Group-and-Source-Specific Query message: It contains both specific multicast source and group
information.
• IGMP Report message: It is sent by a host to an upstream device when the host wants to join a
multicast group.
• IGMP Leave message: It is sent by a host to an upstream device when the host wants to leave a
multicast group.
IGMPv2 and IGMPv3 support leave messages, but IGMPv1 does not.
• Querier
A querier is responsible for sending IGMP Query messages to hosts and receiving IGMP Report
messages and Leave messages from hosts. A querier can then learn which multicast group has receivers
on a specified network segment.
• Non-querier
A non-querier only receives IGMP Report messages from hosts to learn which multicast group has
receivers. Then, based on the querier's action, the non-querier identifies which receivers leave multicast
groups.
Generally, a network segment has only one querier. Multicast devices follow the same principle to select a
querier. The process is as follows (using DeviceA, DeviceB, and DeviceC as examples):
• After IGMP is enabled on DeviceA, DeviceA considers itself a querier in the startup process by default
and sends IGMP Query messages. If DeviceA receives IGMP Query messages from DeviceB that has a
lower IP address, DeviceA changes from a querier to a non-querier. DeviceA starts the another-querier-
existing timer and records DeviceB as the querier of the network segment.
• If DeviceA is a non-querier and receives IGMP Query messages from the querier DeviceB, the another-
querier-existing timer is updated; if DeviceA is a non-querier and receives IGMP Query messages from
DeviceC that has a lower IP address than the querier DeviceB, the querier is changed to DeviceC, and
the another-querier-existing timer is updated.
• If DeviceA is a non-querier and the another-querier-existing timer expires, DeviceA changes to a querier.
IGMPv1 does not support querier election; an IGMPv1 querier is designated by the upper-layer protocol, such as PIM.
Querier election can be implemented only among multicast devices that run the same IGMP version on a
network segment.
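The election rule above (the device with the lowest IP address becomes the IGMPv2 querier) can be sketched as follows; the helper name is illustrative, and the another-querier-existing timer handling is omitted:

```python
import ipaddress

def elect_querier(device_ips):
    """Return the elected querier: the lowest IP address on the segment."""
    return min(device_ips, key=lambda ip: ipaddress.IPv4Address(ip))

# DeviceA, DeviceB, and DeviceC interface addresses on one segment.
segment = ["10.0.0.3", "10.0.0.1", "10.0.0.2"]
print(elect_querier(segment))  # 10.0.0.1
```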
IGMP Implementation
IGMP enables a multicast router to identify receivers by sending IGMP Query messages to hosts and
receiving IGMP Report messages and Leave messages from hosts. A multicast router forwards multicast data
to a network segment only if the network segment has multicast group members. Hosts can decide whether
to join or leave a multicast group.
As shown in Figure 2, IGMP-enabled DeviceA functions as a querier to periodically send IGMP Query
messages. All hosts (Host A, Host B, and Host C) on the same network segment of DeviceA can receive these
IGMP Query messages.
• When a host (for example, Host A) receives an IGMP Query message of a multicast group G, the
processing flow is as follows:
■ If Host A is already a member of group G, Host A replies with an IGMP Report message of group G
at a random time within the response period specified by DeviceA.
After receiving the IGMP Report message, DeviceA records information about group G and
forwards the multicast data to the network segment of the host interface that is directly connected
to DeviceA. Meanwhile, DeviceA starts a timer for group G or resets the timer if it has been started.
If no members of group G respond to DeviceA within the interval specified by the timer, DeviceA
stops forwarding the multicast data of group G.
■ If Host A is not a member of any multicast group, Host A does not respond to the IGMP Query
message from DeviceA.
• When a host (for example, Host A) joins a multicast group G, the processing flow is as follows:
Host A sends an IGMP Report message of group G to DeviceA, instructing DeviceA to update its
multicast group information. Subsequent IGMP Report messages of group G are triggered by IGMP
Query messages sent by DeviceA.
• When a host (for example, Host A) leaves a multicast group G, the processing flow is as follows:
Host A sends an IGMP Leave message of group G to DeviceA. After receiving the IGMP Leave message,
DeviceA triggers a query to check whether group G has other receivers. If DeviceA does not receive
IGMP Report messages of group G within the period specified by the query message, DeviceA deletes
the information about group G and stops forwarding multicast traffic of group G.
Version Characteristic
IGMPv1 IGMPv1 manages multicast groups by exchanging IGMP Query messages and IGMP Report
messages. In IGMPv1, a host does not send an IGMP Leave message when leaving a
multicast group, and a Router deletes the record of a multicast group when the timer for
maintaining the members in the multicast group expires.
IGMPv1 provides only General Query messages.
IGMPv2 In IGMPv2, an IGMP Report message contains information about a multicast group, but does
not contain information about a multicast source. Each message carries the record of a single
multicast group.
After a host sends an IGMP Report message of a multicast group to a Router, the Router
notifies the multicast forwarding module of this join request. Then the multicast forwarding
module can correctly forward multicast data to the host.
IGMPv2 is capable of suppressing IGMP Report messages to reduce repetitive IGMP Report
messages. This function works as follows:
After a host (for example, Host A) joins a multicast group G, Host A receives an IGMP Query
message from the Router. Then the host randomly selects a value from 0 to the maximum
response time (specified in the IGMP Query message) as the timer value. When the timer
expires, Host A sends an IGMP Report message of group G to the Router. However, if Host A
receives an IGMP Report message of group G from another host in group G before the timer
expires, Host A does not send an IGMP Report message of group G to the Router.
When a host leaves group G, the host sends an IGMP Leave message of group G to a Router.
Because of the Report message suppression mechanism in IGMPv2, the Router cannot
determine whether another host exists in group G. Therefore, the Router triggers a query on
group G. If another host exists in group G, the host sends an IGMP Report message of G to
the Router. If the Router sends the query on group G for a specified number of times, but
does not receive an IGMP Report message for group G, the Router deletes information about
group G and stops forwarding multicast data of group G.
IGMPv2 provides General Query messages and Group-specific Query messages.
NOTE:
Both IGMP queriers and non-queriers can process IGMP Report messages, but only queriers can
forward IGMP Report messages. IGMP non-queriers cannot process IGMPv2 Leave messages.
IGMPv3 An IGMPv2 Report message contains information about multicast groups, but does not
contain information about multicast sources. Therefore, an IGMPv2 host can select a
multicast group, but not a multicast source/group pair. IGMPv3 resolves this problem: an
IGMPv3 message from a host can contain multiple multicast group records, with each
multicast group record containing multiple multicast sources.
On the Router side, the querier sends IGMP Query messages and receives IGMP Report and
Leave messages from hosts to identify network segments that contain receivers and forward
the multicast data to the network segments. In IGMPv3, source information in multicast
group records can be filtered in either include mode or exclude mode:
In include mode:
If a source is included in a group record and the source is active, the Router forwards the
multicast data of the source.
If a source is included in a group record but the source is inactive, the Router deletes the
source information and does not forward the multicast data of the source.
In exclude mode:
If a source is active, the Router forwards the multicast data of the source, because there are
hosts that require the multicast data of the source.
If a source is inactive, the Router does not forward the multicast data of the source.
If a source is excluded in a group record, the Router forwards the multicast data of the
source.
IGMPv3 does not have the Report message suppression mechanism. Therefore, all hosts
joining a multicast group must reply with IGMP Report messages when receiving IGMP
Query messages.
In IGMPv3, multicast sources can be selected. Therefore, besides General Query and
Group-specific Query messages, IGMPv3 adds Group-and-Source-Specific Query messages,
enabling the Router to find out whether receivers require data from a specified
multicast source.
Advantages of IGMPv2 over IGMPv1: IGMPv2 provides IGMP Leave messages and can therefore manage
members of multicast groups effectively. The multicast group can be selected directly, making the
selection more precise.
Advantages of IGMPv3 over IGMPv2: IGMPv3 allows hosts to select multicast sources, while IGMPv2 does
not. An IGMPv3 message can contain records of multiple multicast groups, reducing the number of IGMP
messages on the network segment.
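The include/exclude source filtering described for IGMPv3 can be sketched as a per-group-record decision. This is a simplification with illustrative names: it ignores source activity and the merging of records from multiple hosts:

```python
def forwards(filter_mode: str, sources: set, src: str) -> bool:
    """Decide whether traffic from `src` is forwarded for one group record."""
    if filter_mode == "INCLUDE":
        return src in sources          # forward only the listed sources
    if filter_mode == "EXCLUDE":
        return src not in sources      # forward all except the listed sources
    raise ValueError(filter_mode)

print(forwards("INCLUDE", {"10.1.1.1"}, "10.1.1.1"))  # True
print(forwards("EXCLUDE", {"10.1.1.1"}, "10.1.1.1"))  # False
```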
an earlier IGMP version, the multicast device automatically changes the version of the corresponding
multicast group to be the same as that of the hosts and then operates in the earlier IGMP version. The
process works as follows:
• When an IGMPv2 multicast device receives an IGMPv1 Report message from a multicast group, the
multicast device lowers the IGMP version of the multicast group to IGMPv1. Then, the multicast device
ignores the IGMPv2 Leave messages of the multicast group.
• When an IGMPv3 multicast device receives IGMPv2 Report messages from a multicast group, the
multicast device lowers the IGMP version of the multicast group to IGMPv2. Then, the multicast device
ignores the IGMPv3 BLOCK messages and the multicast source list in the IGMPv3 TO_EX messages. The
multicast source-selecting function of IGMPv3 messages is then disabled.
• When an IGMPv3 multicast device receives IGMPv1 Report messages from a multicast group, the
multicast device lowers the IGMP version of the multicast group to IGMPv1. Then, the multicast device
ignores the IGMPv2 Leave messages, IGMPv3 BLOCK messages, IGMPv3 TO_IN messages, and multicast
source list in the IGMPv3 TO_EX messages.
If you manually change the IGMP version of a multicast device to a later version, the multicast device still
operates in the original version if group members of the original version exist. The multicast device upgrades
its IGMP version only after all group members of the original version leave.
• If the multicast device is not configured to check the Router-Alert option, it sends the IGMP packet
to the routing protocol layer, regardless of whether the IGMP packet contains the Router-Alert option.
• If the multicast device is configured to check the Router-Alert option, the multicast device sends the
IGMP packet to the routing protocol layer only if the packet contains the Router-Alert option.
• IGMP-Limit
IGMP-limit is configured on Router interfaces connected to users to limit the maximum number of
multicast groups, including source-specific multicast groups. This mechanism enables users who have
successfully joined multicast groups to enjoy smoother multicast services.
• Group-Policy
Group-policy is configured on Router interfaces to allow the Router to set restrictions on specific
multicast groups, so that entries will not be created for the restricted multicast groups. This mechanism
improves IGMP security.
IGMP-Limit
When a large number of multicast users request multiple programs simultaneously, excessive bandwidth
resources of the Router will be exhausted, and the Router's performance will be degraded, deteriorating the
multicast service quality.
To prevent this problem, configure IGMP-limit on the Router interface to limit the maximum number of
IGMP entries on the interface. When receiving an IGMP Report message from a user, the Router interface
first checks whether the configured maximum number of IGMP entries is reached. If the maximum number
is reached, the Router interface discards the IGMP Report message and rejects the user. If the maximum
number is not reached, the Router interface sets up an IGMP membership and forwards data flows of the
requested multicast group to the user. This mechanism enables users who have successfully joined multicast
groups to enjoy smoother multicast services.
For example, on the network shown in Figure 1, if the maximum number of IGMP entries is set to 1 on
Interface 1 of DeviceA, Interface 1 allows only one host to join a multicast group and creates an IGMP entry
only for the permitted host.
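The IGMP-limit check can be sketched as follows; the class and method names are illustrative, and the ACL exemption for specific group ranges is omitted:

```python
class IgmpInterface:
    """Sketch of an interface with a configured maximum number of IGMP entries."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.entries = set()   # (*, G) and (S, G) entries

    def on_report(self, entry) -> bool:
        """Accept a join only if the entry limit is not yet reached."""
        if entry in self.entries:
            return True        # already joined, nothing to add
        if len(self.entries) >= self.max_entries:
            return False       # discard the Report and reject the user
        self.entries.add(entry)
        return True

intf = IgmpInterface(max_entries=1)   # like Interface 1 of DeviceA in Figure 1
print(intf.on_report(("*", "225.1.1.1")))  # True: first join accepted
print(intf.on_report(("*", "226.1.1.1")))  # False: limit reached
```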
• IGMP-limit allows you to configure a maximum number of IGMP entries on the Router interface. After
receiving IGMP Report messages, the Router interface limits the number of IGMP entries on the
interface.
• IGMP-limit allows you to configure an ACL on the Router interface, so that the interface permits IGMP
Report messages that contain a group address, including a source-specific group address, that is in the
range specified in the ACL, regardless of whether the configured maximum number of IGMP entries is
reached. An IGMP entry that contains a group address in the range specified in the ACL is not counted
as one entry on an interface.
• Each (*, G) entry is counted as one entry on an interface, and each (S, G) is counted as one entry on an
interface.
• SSM-mapping (*, G) entries are not counted as entries on an interface, and each (S, G) entry mapped
using the SSM-mapping mechanism is counted as one entry on an interface.
• Source address-based IGMP message filtering for IGMP Report and Leave messages:
■ The device permits the message only if the message's source address is 0.0.0.0 or an address on the
same network segment as the interface that receives the message.
■ If ACL rules are configured for filtering IGMP Report and Leave messages, the device determines
whether to permit or discard an IGMP Report or Leave message based on the ACL configurations.
• Source address-based IGMP message filtering for IGMP Query messages: A device determines whether
to permit or drop an IGMP Query message based on only the configured ACL rules.
On the network shown in Figure 2, the IP address of DeviceA's interface connected to a user network is
10.0.0.1/24. Host A sends IGMP Report or Leave messages with the source address 10.1.0.1, Host B sends
IGMP Report or Leave messages with the source address 10.0.0.8, and Host C sends IGMP Report or Leave
messages with the source address 0.0.0.0. If no ACL rule is configured, DeviceA permits the messages
received from Host B and Host C and denies the messages received from Host A. If ACL rules are configured,
DeviceA accepts only the IGMP Report or Leave messages whose source addresses match the ACL rules.
For example, if an ACL rule only permits IGMP Report or Leave messages with the source address 10.0.0.8,
DeviceA permits the IGMP Report or Leave messages received from Host B and denies the IGMP Report or
Leave messages received from Host C.
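The default source-address check for Report and Leave messages (no ACL configured) can be sketched as follows; the function name is illustrative:

```python
import ipaddress

def permit_report(src: str, intf_net: str) -> bool:
    """Permit if the source is 0.0.0.0 or on the interface's network segment."""
    if src == "0.0.0.0":
        return True
    return ipaddress.IPv4Address(src) in ipaddress.IPv4Network(intf_net)

net = "10.0.0.0/24"                    # DeviceA interface is 10.0.0.1/24
print(permit_report("10.1.0.1", net))  # False: Host A is denied
print(permit_report("10.0.0.8", net))  # True:  Host B is permitted
print(permit_report("0.0.0.0", net))   # True:  Host C is permitted
```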
On the network shown in Figure 3, DeviceA is a querier that receives IGMP Report or Leave messages from
hosts. If DeviceB constructs bogus IGMP Query messages that contain a source address (such as 10.0.0.1/24)
lower than DeviceA's address, DeviceA will become a non-querier and fail to respond to IGMP Leave
messages from hosts. However, DeviceA continues to forward multicast traffic to user hosts who have left,
which wastes network resources. To resolve this problem, you can configure an ACL rule on DeviceA to deny
the IGMP Query messages with the source address 10.0.0.1/24.
IGMP Group-Policy
Group-policy is a filtering policy configured on Router interfaces. For example, on the network shown in
Figure 4, Host A and Host C request to join the multicast group 225.1.1.1. Host B and Host D request to join
the multicast group 226.1.1.1. Group-policy is configured on RouterA to permit join requests only for the
multicast group 225.1.1.1. Then, RouterA creates entries for Host A and Host C, but not for Host B or Host D.
To improve network security and facilitate network management, you can use group-policy to disable the
Router from receiving IGMP Report messages from or forwarding multicast data to specific multicast groups.
Group-policy is implemented through ACL configurations.
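The group-policy decision reduces to an ACL membership test; a minimal sketch with an illustrative permit list matching the Figure 4 scenario:

```python
PERMITTED = {"225.1.1.1"}        # illustrative ACL: permit only 225.1.1.1

def accept_join(group: str) -> bool:
    """Create an entry only for groups the configured policy permits."""
    return group in PERMITTED

print(accept_join("225.1.1.1"))  # True:  Host A and Host C get entries
print(accept_join("226.1.1.1"))  # False: Host B and Host D do not
```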
In real-world situations, static-group is configured on the Router interface that is connected to hosts, which
facilitates multicast data forwarding to the Router. The Router interface can then quickly forward the
multicast data, which shortens the channel switchover period.
multicast device cannot determine whether another host exists in group G. Therefore, the multicast device
triggers a query on group G. If another host exists in group G, the host sends the IGMP Report message of
group G to the multicast device. If the multicast device sends the query on group G a specified number of
times but does not receive IGMP Report messages from any host, the multicast device deletes information
about group G and stops forwarding multicast data of group G.
If a multicast device is directly connected to an access device on which IGMP proxy is enabled, when the
access device leaves group G and sends the IGMP Leave message of group G to the multicast device, the
multicast device can identify that group G contains no receivers and will not trigger the IGMP Query
message. Then, the multicast device deletes all records of group G and stops forwarding data of group G.
This is called IGMP Prompt-Leave.
After IGMP Prompt-Leave is enabled on a multicast device, the multicast device does not trigger IGMP Query
messages destined for the multicast group when the multicast device receives IGMP Leave messages from
the multicast group. In this case, the multicast device deletes all records about the multicast group and stops
forwarding the data of the multicast group. In this manner, the multicast device responds faster to IGMP
Leave messages.
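The difference prompt-leave makes to Leave processing can be sketched as follows (illustrative names; in the standard path the device would go on to send a group-specific query before deciding):

```python
def on_leave(group_state: dict, group: str, prompt_leave: bool) -> str:
    """With prompt-leave, delete the group at once instead of querying."""
    if prompt_leave:
        group_state.pop(group, None)   # delete all records, stop forwarding
        return "deleted"
    return "query"                     # standard: trigger a group-specific query

state = {"225.1.1.1": {"members"}}
print(on_leave(state, "225.1.1.1", prompt_leave=True))   # deleted
print(on_leave({"225.1.1.1": set()}, "225.1.1.1", False))  # query
```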
Background
IGMPv3 supports source-specific multicast (SSM) but IGMPv1 and IGMPv2 do not. Although the majority of
latest multicast devices support IGMPv3, most legacy multicast terminals only support IGMPv1 or IGMPv2.
SSM mapping is a transition solution that provides SSM services for such legacy multicast terminals.
Using rules that specify the mapping from a particular multicast group G to a source-specific group, SSM
mapping can convert IGMPv1 or IGMPv2 packets whose group addresses are within the SSM range to
IGMPv3 packets. This mechanism allows hosts running IGMPv1 or IGMPv2 to access SSM services. SSM
mapping allows IGMPv1 or IGMPv2 terminals to access only specific sources, thus minimizing the risks of
attacks on multicast sources.
For multicast groups in the SSM address range, a multicast device processes only (S, G) join requests, not
(*, G) requests. For details about SSM, see PIM-SSM.
If a large number of multicast devices on a network have IGMPv1 or IGMPv2 users and there are many SSM
mappings, you can use DNS-based SSM mapping to provide dynamic mapping services to facilitate mapping
rule management and simplify maintenance. That is, after receiving IGMPv1 or IGMPv2 messages whose
group addresses are in the SSM range, a multicast device queries the DNS server for the multicast source
address, and converts IGMPv1 or IGMPv2 messages into IGMPv3 messages based on the reply from the DNS
server.
without upgrading the IGMP versions to IGMPv3, configure SSM mapping on the multicast device.
If Device A has SSM mapping enabled and is configured with mappings between group addresses and source
addresses, it will perform the following actions after receiving a (*, G) message from Host B or Host C:
• If the multicast group address contained in the message is within the any-source multicast (ASM)
range, Device A processes the request as described in Principles of IGMP.
• If the multicast group address contained in the message is within the SSM range, Device A maps a (*, G)
join message to multiple (S, G) join messages based on mapping rules. With this processing, hosts
running IGMPv1 or IGMPv2 can access multicast services available only in the SSM range.
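Static SSM mapping can be sketched as a rule lookup that expands a (*, G) join into (S, G) joins; the mapping table and the simplified SSM-range check below are illustrative:

```python
MAPPINGS = {"232.1.1.1": ["10.1.1.1", "10.2.2.2"]}  # G -> configured sources

def map_join(group: str):
    """Return the (S, G) joins for a (*, G) report, per the mapping rules."""
    if not group.startswith("232."):   # simplified check for the 232.0.0.0/8 SSM range
        return [("*", group)]          # ASM group: process as usual
    return [(src, group) for src in MAPPINGS.get(group, [])]

print(map_join("232.1.1.1"))  # [('10.1.1.1', '232.1.1.1'), ('10.2.2.2', '232.1.1.1')]
print(map_join("225.1.1.1"))  # [('*', '225.1.1.1')]
```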
If DNS-based SSM mapping is enabled on Device A and the domain name suffix of the DNS server is
configured, after Device A receives an IGMP (*, G) Join message from Host B or Host C, it performs the
following operations based on the actual situation:
• If the multicast group of the message is in the Any-Source Multicast (ASM) address range, see Principles of IGMP.
• If the multicast group of the message is in the SSM address range, Device A adds the domain name
suffix to the multicast group address to form a complete domain name, and sends a query request to
the DNS server. The domain name is in the format of reverse multicast group address + domain name
suffix. For example, if the default domain name suffix in-addr.arpa is used and the (*, 232.0.0.1) Join
message is received, Device A queries the DNS server for the IP address corresponding to the domain
name 1.0.0.232.in-addr.arpa.
• After receiving the query request, the DNS server returns the corresponding IP address to Device A.
Device A uses the IP address in the response packet as the source address to convert (*, G) into (S, G).
Then, Device A can provide multicast services in the SSM range for user hosts using lower IGMP
versions.
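Building the reverse-DNS query name used by DNS-based SSM mapping is a simple octet reversal; a sketch with the default in-addr.arpa suffix (function name illustrative):

```python
def reverse_name(group: str, suffix: str = "in-addr.arpa") -> str:
    """Reverse the group address octets and append the domain name suffix."""
    return ".".join(reversed(group.split("."))) + "." + suffix

print(reverse_name("232.0.0.1"))  # 1.0.0.232.in-addr.arpa
```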
Background
After IGMP is configured on hosts and the hosts' directly connected multicast device, the hosts can
dynamically join multicast groups, and the multicast device can manage multicast group members on the
local network.
In some cases, however, the device directly connected to a multicast device may not be a host but an IGMP
proxy-capable access device to which hosts are connected. If you configure only IGMP on the multicast
device, access device, and hosts, the multicast and access devices need to exchange a large number of
packets.
To resolve this problem, enable IGMP on-demand on the multicast device. The multicast device sends only
one general query message to the access device. After receiving the general query message, the access
device sends the collected Join and Leave status of multicast groups to the multicast device. The multicast
device uses the Join and Leave status of the multicast groups to maintain multicast group memberships on
the local network segment.
Benefits
IGMP on-demand reduces packet exchanges between a multicast device and its connected access device and
reduces the loads on these devices.
Related Concepts
IGMP on-demand
IGMP on-demand enables a multicast device to send only one IGMP general query message to its connected
access device (IGMP proxy-capable) and to use the Join/Leave status of multicast groups reported by the
access device to maintain multicast group memberships on the local network segment.
Implementation
When a multicast device is directly connected to hosts, the multicast device sends IGMP Query messages to
and receives IGMP Report and Leave messages from the hosts to identify the multicast groups that have
receivers. The device directly connected to the multicast device, however, may not be a host but an IGMP
proxy-capable access device, as shown in Figure 1.
The provider edge (PE) is a multicast device, and the customer edge (CE) is an access device.
• On the network segment a shown in Figure 1, if IGMP on-demand is not enabled on the PE, the PE
sends a large number of IGMP Query messages to the CE, and the CE sends a large number of Report
and Leave messages to the PE. As a result, lots of PE and CE resources are consumed.
• On the network segment b shown in Figure 1, after IGMP on-demand is enabled on the PE, the PE
sends only one general query message to the CE. After receiving the general query message from the
PE, the CE sends the collected Join and Leave status of IGMP groups to the PE. The CE sends a Report or
Leave message for a group to the PE only when the Join or Leave status of the group changes. To be
specific, the CE sends an IGMP Report message for a multicast group to the PE only when the first user
joins the multicast group and sends a Leave message only when the last user leaves the multicast
group.
After you enable IGMP on-demand on a multicast device connected to an IGMP proxy-capable access device, the multicast device implements IGMP differently from standard IGMP in the following aspects:
• The multicast device interface connected to the access device sends only one IGMP general query message to the
access device.
• The records about dynamically joined IGMP groups on the multicast device interface connected to the access device
do not time out.
• The multicast device interface connected to the access device directly deletes the entry for a group only after the
multicast device interface receives an IGMP Leave message for the group.
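The report-suppression behavior described above can be modeled with a short sketch (a hypothetical class and method names, not Huawei code): the access device tracks per-group membership and forwards a message upstream only on the first join or the last leave.

```python
class IgmpOnDemandProxy:
    """Toy model of the access device's IGMP on-demand behavior."""

    def __init__(self):
        self.members = {}  # group address -> set of joined hosts

    def host_join(self, host, group):
        """Return the message sent upstream, if any, when a host joins."""
        hosts = self.members.setdefault(group, set())
        first = not hosts          # only the first join triggers a Report
        hosts.add(host)
        return "Report" if first else None

    def host_leave(self, host, group):
        """Return the message sent upstream, if any, when a host leaves."""
        hosts = self.members.get(group)
        if not hosts:
            return None            # no membership state for this group
        hosts.discard(host)
        return "Leave" if not hosts else None  # only the last leave triggers a Leave
```

With two hosts joined to the same group, only the first join produces a Report and only the last leave produces a Leave, which is why the multicast device needs to send just one general query.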
IGMP IPsec
Function: IGMP IPsec is used to authenticate IGMP packets to prevent bogus IGMP protocol packet attacks, improving multicast service security.
Implementation: IGMP IPsec uses a security association (SA) to authenticate sent and received IGMP packets. The IGMP IPsec implementation process is as follows:
• Before an interface sends out an IGMP protocol packet, IPsec adds an AH header to the packet.
• After an interface receives an IGMP protocol packet, IPsec uses an SA to authenticate the AH header in the packet. If the AH header is authenticated, the interface forwards the packet. Otherwise, the interface discards the packet.
Usage scenario: IGMP IPsec applies to multicast devices connected to user hosts.
Background
On the live network, the OTT service mode uses unicast technologies rather than multicast technologies. As
such, this mode provides only delayed live broadcast services. Applying this mode to the programs that have
high requirements on real-time performance, such as galas and sport events, will cause poor user experience
and consume a large amount of bandwidth. The IPTV mode, which uses multicast technologies for transport,
can provide real-time live broadcast services with low bandwidth consumption. Therefore, carriers urgently
need an upgrade from the OTT mode to the IPTV mode. One of the key prerequisites for such an upgrade is
configuring the IGMP over L2TP function for the BRAS.
Implementation
Carriers can perform an upgrade from the OTT mode to the IPTV mode on the live network using either of
the following methods:
• Deploy a standalone LAC on the network, and configure the home gateway to use PPPoE to dial up to
the LNS.
• Configure the STB to function as a LAC and dial up to the LNS, and configure the home gateway to use
PPPoE to dial up to the BRAS (the LNS and BRAS can be deployed on the same device).
Method 1
Figure 1 shows the networking where a standalone LAC is deployed and the home gateway uses PPPoE to
dial up to the LNS. The service process is as follows:
2. The home gateway dials up to the LNS through PPPoE to obtain an IP address.
3. The STB user orders a program by sending an IGMP Report message to the home gateway.
4. The home gateway converts the received IGMP Report message into a PPPoE packet by encapsulating
it with a PPPoE header and sends the PPPoE packet to the LAC.
5. The LAC decapsulates the PPPoE packet, converts it into an L2TP packet by encapsulating it with an
L2TP header, and sends the L2TP packet to the LNS.
6. The LNS decapsulates the L2TP packet and authenticates the ordered multicast program. If the
authentication succeeds, the LNS generates a multicast routing entry for traffic diversion. In addition,
the LNS periodically sends IGMP Query messages to check whether the program is still required.
7. When the user no longer watches the video program, the STB sends an IGMP Leave message. After
receiving the message, the LNS deletes the program authentication result as well as the multicast
routing entry.
8. After detecting the logout of the user, the LNS deletes the user information.
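The encapsulation chain in steps 4 to 6 can be illustrated with a toy string model (hypothetical helper functions; real packets carry full PPP, L2TP, UDP, and IP headers rather than labels):

```python
def encap(payload, header):
    """Wrap the payload in one more header (string model)."""
    return f"{header}({payload})"

def decap(packet, header):
    """Strip the outermost header, which must match the expected one."""
    prefix = header + "("
    assert packet.startswith(prefix) and packet.endswith(")")
    return packet[len(prefix):-1]

pkt = encap("IGMP-Report", "PPPoE")       # step 4: home gateway adds PPPoE
pkt = encap(decap(pkt, "PPPoE"), "L2TP")  # step 5: LAC swaps PPPoE for L2TP
report = decap(pkt, "L2TP")               # step 6: LNS strips L2TP and processes
```

The key point the model captures is that each hop removes exactly the header it terminates and adds the header of the next tunnel segment.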
Figure 1 Networking where a standalone LAC is deployed and the home gateway uses PPPoE to dial up to the
LNS
Method 2
Figure 2 shows the networking where the STB functions as a LAC and dials up to the BRAS (LNS) and the
home gateway uses PPPoE to dial up to the BRAS (LNS). The service process is as follows:
1. The user performs PPPoE dialup through the home gateway to obtain an IP address from the BRAS
(LNS). The STB obtains a private network IP address from the home gateway through DHCP.
2. The STB performs L2TP dialup and establishes an L2TP tunnel with the BRAS (LNS). The BRAS (LNS)
authenticates the STB and assigns an IP address to the STB.
3. The STB user orders a program by sending an IGMP Report message. The message is then
encapsulated with an outer PPPoE header and an inner L2TP header and sent to the BRAS (LNS).
4. The BRAS (LNS) decapsulates the PPPoE packet and authenticates the ordered multicast program. If
the authentication succeeds, the BRAS (LNS) generates a multicast routing entry for traffic diversion.
In addition, the BRAS (LNS) periodically sends IGMP Query messages to check whether the program is
still required.
5. When the user no longer watches the video program, the STB sends an IGMP Leave message. After
receiving the message, the BRAS (LNS) deletes the program authentication result as well as the
multicast routing entry.
6. After detecting the logout of the user, the BRAS (LNS) deletes the user information.
Figure 2 Networking where the STB functions as a LAC and dials up to the BRAS (LNS) and the home gateway
uses PPPoE to dial up to the BRAS (LNS)
Definition
Unless otherwise specified, IPv4 PIM and IPv6 PIM implement a feature in the same way. For details about
implementation differences between IPv4 PIM and IPv6 PIM, see Appendix.
PIM is a multicast routing protocol that uses unicast routing protocols to forward data, but PIM is
independent of any specific unicast routing protocols.
PIM can be implemented in PIM-DM, PIM-SM, or PIM-SSM mode. PIM-SM and PIM-SSM apply to both IPv4 and IPv6 networks.
Purpose
On a network, multicast data is replicated and forwarded through a multicast network from a multicast
source to receivers. PIM is a widely used intra-domain multicast protocol that builds MDTs to transmit
multicast data.
PIM can create multicast routing entries on demand, forward packets based on these entries, and
dynamically respond to network topology changes.
Benefits
PIM works together with other multicast protocols to implement applications, such as:
IP multicast is being widely used in Internet services provided by ISPs, such as online broadcast, network TV,
remote education, telemedicine, network TV stations, and real-time video/voice conferencing services.
11.4.2.1 PIM-DM
Background
Multicast protocols are required to implement data forwarding on a multicast network. Protocol
Independent Multicast (PIM) is the most widely used multicast protocol that forwards data between devices
in the same domain. Protocol Independent Multicast-Dense Mode (PIM-DM) is one type of PIM.
PIM-DM mainly uses the flood-prune mechanism to implement multicast data forwarding. Specifically, PIM-
DM floods a multicast flow to all network segments and then prunes the network segments on which no
receivers want the flow. PIM-DM periodically performs flood-prune operations to build up and maintain a
shortest path tree (SPT) that connects a multicast source and multicast receivers. Then, PIM-DM forwards
multicast data along this unidirectional loop-free SPT. PIM-DM applies to small-scale networks on which multicast receivers are densely located. PIM-DM is not a good choice for large-scale networks because the flood-prune period would be long on such networks, nor is it suitable for networks with sparsely located receivers because excessive Prune messages would be generated.
Related Concepts
This section provides basic PIM-DM concepts. See Figure 1.
• PIM device
A multicast router that supports PIM is called a PIM device. A PIM-enabled interface on a PIM device is called a PIM interface.
• SPT
A shortest path tree (SPT) is a multicast distribution tree (MDT) with the multicast source at the root
and group members at leaves. SPTs can be used in PIM-DM, Protocol Independent Multicast-Sparse
Mode (PIM-SM), and Protocol Independent Multicast-Source-Specific Multicast (PIM-SSM) scenarios.
Implementation
The multicast data forwarding process in a PIM-DM domain is as follows:
1. Neighbor Discovery
Each PIM device in a PIM-DM domain periodically sends Hello messages to all other PIM devices to
discover PIM neighbors and maintain PIM neighbor relationships.
By default, a PIM device permits other PIM control messages or multicast messages from a neighbor, irrespective
of whether the PIM device has received Hello messages from the neighbor. However, if a PIM device has the
neighbor check function enabled, the PIM device permits other PIM control messages or multicast messages from
a neighbor only after the PIM device has received Hello messages from the neighbor.
2. Flooding
PIM-DM assumes that at least one multicast group member exists on each network segment, and
floods multicast data to all routers on the network. Therefore, all PIM devices on the network can
receive multicast data.
3. Prune
After flooding multicast data, PIM-DM prunes network segments that have no multicast data receiver
and retains only the network segments that have multicast data receivers. Only PIM devices that
require multicast data can receive multicast data.
4. State Refresh
If a downstream device is in the prune state, the upstream device maintains a prune timer for this
device. When the prune timer expires, the upstream device resumes data forwarding to the
downstream device, which wastes network resources. To prevent this problem, the state-refresh
function can be enabled on the upstream router. This function enables the upstream router to
periodically send State-Refresh messages to refresh the status of the prune timers of downstream
devices. Downstream devices that do not require multicast data remain in the prune state.
5. Graft
If a node on a pruned network segment has new group members, PIM-DM uses the graft mechanism
to enable the node to immediately forward multicast data.
6. Assert
If there are multiple PIM devices on a network segment, the same multicast packets are sent
repeatedly across the network segment. The Assert mechanism can be used to select a unique
multicast data forwarder, preventing redundant multicast data forwarding.
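The neighbor-check behavior noted in step 1 reduces to a simple predicate (a sketch with hypothetical argument names, not device code):

```python
def accept_pim_message(hello_received, neighbor_check_enabled):
    """Whether a PIM control or multicast message from a neighbor is accepted.

    By default any message is accepted, regardless of Hello state; with
    neighbor check enabled, messages are accepted only after a Hello has
    been received from that neighbor.
    """
    return hello_received or not neighbor_check_enabled
```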
Neighbor Discovery
This mechanism is the same as that in PIM-SM. For details about this mechanism, see PIM-SM.
Flooding
The following example uses the network shown in Figure 2 to describe the flooding function. The source
sends a data packet to DeviceA. Then DeviceA floods the packet to all its neighbors. DeviceB and DeviceC
also exchange data packets with each other. To prevent data duplication, PIM-DM capable DeviceB uses the
reverse path forwarding (RPF) mechanism to ensure that it only permits data packets from one neighbor,
DeviceA or DeviceC. (For details about RPF check, see RPF Check.) Finally, data is flooded to DeviceB with
receivers, as well as DeviceC without receivers. This process is called flooding.
Prune
The following example uses the network shown in Figure 3 to describe the prune function. DeviceC has no
receivers, so it sends a Prune message upstream to DeviceA to instruct DeviceA to stop forwarding data to
the interface connected to DeviceC. After receiving the Prune message, DeviceA stops forwarding data to the
downstream interface connected to DeviceC. This process is called pruning.
Because a downstream interface on DeviceA is connected to DeviceB that has a receiver, DeviceA forwards
multicast data to the downstream interface connected to DeviceB. In this manner, a unidirectional and loop-
free SPT is set up from the source to User A.
State Refresh
The following example uses the network shown in Figure 3 to describe the state refresh function. After
DeviceA prunes the network segment of DeviceC, DeviceA maintains a prune timer for DeviceC. When the
prune timer expires, DeviceA resumes data forwarding to DeviceC. This results in a waste of network
resources.
The state refresh function can prevent this problem and works as follows: DeviceA periodically floods State-
Refresh messages to all its downstream interfaces to reset the prune timers of all the downstream devices.
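The prune-timer and State-Refresh interaction can be sketched as follows (a hypothetical class; the 210-second default is an assumption for illustration, not a documented value):

```python
class PruneState:
    """Toy model of one downstream branch's prune state on the upstream router."""

    def __init__(self, holdtime=210):
        self.holdtime = holdtime
        self.remaining = 0           # 0 means the branch is forwarding

    def prune(self):
        """A Prune message from downstream starts the prune timer."""
        self.remaining = self.holdtime

    def state_refresh(self):
        """A State-Refresh message resets the timer, keeping the branch pruned."""
        if self.remaining > 0:
            self.remaining = self.holdtime

    def tick(self, seconds):
        """Advance time; when the timer expires, forwarding resumes."""
        self.remaining = max(0, self.remaining - seconds)

    @property
    def forwarding(self):
        return self.remaining == 0
```

Without State-Refresh the timer expires and forwarding resumes toward a branch with no receivers; periodic refreshes keep pruned branches pruned.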
Graft
The following example uses the network shown in Figure 4 to describe the graft function. After DeviceC in the pruned state receives an IGMP Report message from user B, DeviceC uses the graft function to implement fast data forwarding, without waiting for a flood-prune period. The graft function works as follows: DeviceC sends a Graft message upstream to request DeviceA to restore the forwarding status of the downstream interface connected to DeviceC. After restoring the forwarding status, DeviceA sends multicast data to DeviceC. Therefore, the graft function implements rapid data forwarding for devices in the pruned state.
Assert
Either of the following conditions indicates that other multicast forwarders are present on the network segment:
• The interface that receives the multicast packet is a downstream interface in the (S, G) entry on the
local Router.
If other multicast forwarders are present on the network segment, the Router starts the Assert mechanism.
The Router sends an Assert message through the downstream interface. The downstream interface also
receives an Assert message from a different multicast forwarder on the network segment. The destination
address of the multicast packet in which the Assert message is encapsulated is 224.0.0.13. The source
address of the packet is the downstream interface address. The TTL value of the packet is 1. The Assert
message carries the route cost from the PIM device to the source or RP, priority of the used unicast routing
protocol, and the group address.
The Router compares its information with the information contained in the message sent by its neighbor.
This is called Assert election. The election rules are as follows:
1. The Router that runs a higher priority unicast routing protocol wins.
2. If the Routers have the same unicast routing protocol priority, the Router with the smaller route cost
to the source wins.
3. If the Routers have the same priority and route cost, the Router with the highest IP address for the
downstream interface wins.
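The three election rules can be condensed into one comparison key (a model, not device code; representing the protocol priority so that a larger value means a more preferred protocol is an assumption about the numeric encoding, which differs on the wire):

```python
def elect_assert_winner(candidates):
    """Pick the Assert winner from dicts with 'preference', 'cost', and 'ip'.

    Rules above: higher unicast-protocol preference first, then lower route
    cost to the source, then higher downstream interface IP address (as int).
    """
    return max(candidates, key=lambda c: (c["preference"], -c["cost"], c["ip"]))
```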
The Router performs the following operations based on the Assert election result:
• If the Router wins the election, the downstream interface of the Router is responsible for forwarding
multicast packets on the network segment. The downstream interface is called an Assert winner.
• If the Router does not win the election, the downstream interface is prohibited from forwarding
multicast packets and is deleted from the downstream interface list of the (S, G) entry. The downstream
interface is called an Assert loser.
After Assert election is complete, only one upstream Router that has a downstream interface exists on the
network segment, and the downstream interface transmits only one copy of each multicast packet. The
Assert winner then periodically sends Assert messages to maintain its status as the Assert winner. If the
Assert loser does not receive any Assert messages from the Assert winner after the timer of the Assert loser
expires, the loser re-adds downstream interfaces for multicast data forwarding.
The following example uses the network shown in Figure 5 to describe the assert function. DeviceB and DeviceC both receive multicast packets from the multicast source, and these packets pass the RPF check, so (S, G) entries are created on DeviceB and DeviceC. Because the downstream interfaces of DeviceB and DeviceC are connected to the same network segment, DeviceB and DeviceC can both send multicast data to the network segment. The assert function is used to ensure that only one multicast data forwarder exists on the network segment:
1. DeviceB receives a multicast packet from DeviceC through a downstream interface, but this packet
fails the RPF check and is discarded by DeviceB. At the same time, DeviceB sends an Assert message to
the network segment.
2. DeviceC compares its routing information with that carried in the Assert message sent by DeviceB. DeviceC loses the election because the route cost from DeviceB to the source is lower. The downstream interface of DeviceC is prohibited from forwarding multicast packets and deleted from the downstream interface list of the (S, G) entry.
3. DeviceC receives a multicast packet from DeviceB through the network segment, but the packet fails
the RPF check and therefore is discarded.
11.4.2.2 PIM-SM
PIM-SM implements P2MP data transmission on large-scale networks on which multicast data receivers are
sparsely distributed. PIM-SM forwards multicast data only to network segments with receivers that have requested the data.
PIM-SM assumes that no host wants to receive multicast data. Therefore, PIM-SM sets up an MDT only after
a host requests multicast data, and then sends the data to the host along the MDT.
Concepts
Basic PIM-SM concepts are described based on the networking shown in Figure 1.
• PIM device
A router that runs PIM is called a PIM device. A router interface on which PIM is enabled is called a PIM
interface.
• PIM domain
A network constructed by PIM devices is called a PIM network.
A PIM-SM network can be divided into multiple PIM-SM domains by configuring BSR boundaries on
router interfaces to restrict BSR message transmission. PIM-SM domains isolate multicast traffic
between domains and facilitate network management.
• DR
■ In PIM-SM, a multicast source's DR is a PIM device directly connected to a multicast source and is
responsible for sending Register messages to a Rendezvous Point (RP).
■ A receiver's DR is a PIM device directly connected to receivers and is responsible for sending Join
messages to an RP and forwarding multicast data to the receivers.
• RP
An RP is the forwarding core in a PIM-SM domain, used to process join requests of the receiver's DR and
registration requests of the multicast source's DR. An RP constructs an MDT with itself at the root and
creates (S, G) entries to transmit multicast data to hosts. All routers in the PIM-SM domain must know
the RP's location. The following table lists the types of RPs.
Table 1 RP classifications
Static RP
Definition: A static RP is manually configured. If a static RP is used, the same RP address must be configured on all PIM devices in the same domain.
Usage suggestion: Static RPs are recommended on small- and medium-sized networks because such networks are stable and have low requirements on network devices.
Precaution: To use a static RP, ensure that all Routers, including the RP, have the same RP and multicast group address range information.
Dynamic RP
Definition: A dynamic RP is elected among candidate-RPs (C-RPs) in the same PIM domain. The BSR sends Bootstrap messages to collect all C-RP information as an RP-Set, and advertises the RP-Set information to all PIM devices in the domain. Then, all the PIM devices use the same RP-Set information and follow the same rules to elect an RP. If the elected RP fails, the other C-RPs start an election process again to elect a new RP.
Usage suggestion: Dynamic RPs can be used on large-scale networks to improve network reliability and maintainability.
NOTE: If multiple multicast sources are densely distributed on the network, configuring core devices close to the multicast sources as C-RPs is recommended. If multiple users are densely distributed on the network, configuring core devices close to the users as C-RPs is recommended.
Precaution: To use a dynamic RP, you must configure a BSR that dynamically advertises group-to-RP mapping information.
• BSR
A BSR on a PIM-SM network collects RP information, summarizes that information into an RP-Set
(group-RP mapping database), and advertises the RP-Set to the entire PIM-SM network.
A network can have only one BSR but can have multiple C-BSRs. If a BSR fails, a new BSR is elected
from the C-BSRs.
• RPT
An RPT is an MDT with an RP at the root and group members at the leaves.
• SPT
An SPT is an MDT with the multicast source at the root and group members at the leaves.
Implementation
The multicast data forwarding process in a PIM-SM domain is as follows:
1. Neighbor discovery
Each PIM device in a PIM-SM domain periodically sends Hello messages to all other PIM devices in the
domain to discover PIM neighbors and maintain PIM neighbor relationships.
By default, a PIM device permits other PIM control messages or multicast packets from a neighbor, regardless of whether the PIM device has received Hello messages from the neighbor. However, if a PIM device has the neighbor check function enabled, it permits other PIM control messages or multicast packets from a neighbor only after the PIM device has received Hello messages from the neighbor.
2. DR election
PIM devices exchange Hello messages to elect a DR on a shared network segment. The receiver's DR is
the only multicast data forwarder on a shared network segment. The source's DR is responsible for
forwarding multicast data received from the multicast source to the RP.
3. RP discovery
An RP is the forwarding core in a PIM-SM domain. A dynamic or static RP forwards multicast data
over the entire network.
4. RPT setup
PIM-SM assumes that no hosts want to receive multicast data. Therefore, PIM-SM sets up an RPT only
after a host requests multicast data, and then sends the data from the RP to the host along the RPT.
5. SPT switchover
A multicast group in a PIM-SM domain is associated with only one RP and one RPT. All multicast data
packets are forwarded by the RP. The path along which the RP forwards multicast data may not be
the shortest path from the multicast source to receivers. The load of the RP increases when the
multicast traffic volume increases. If the multicast data forwarding rate exceeds a configured
threshold, an RPT-to-SPT switchover can be implemented to reduce the burden on the RP.
If a network problem occurs, the Assert mechanism or a DR switchover delay can be used to guarantee that
multicast data is transmitted properly.
• Assert
If multiple multicast data forwarders exist on a network segment, each multicast packet is repeatedly
sent across the network segment, generating redundant multicast data. To resolve this issue, the Assert
mechanism can be used to select a unique multicast data forwarder on a network segment.
• DR switchover delay
If the role of an interface on a PIM device is changed from DR to non-DR, the PIM device immediately
stops using this interface to forward data. If the new DR has not received multicast data, multicast data
traffic is temporarily interrupted. If a DR switchover delay is configured, the interface continues to
forward multicast data until the delay expires. Setting a DR switchover delay prevents multicast data
traffic from being interrupted.
Neighbor Discovery
Each PIM-enabled interface on a PIM device sends Hello messages. A multicast packet that carries a Hello
message has the following features:
• The destination address is 224.0.0.13, indicating that this packet is destined for all PIM devices on the
same network segment as the interface that sends this packet.
• The TTL value is 1, indicating that the packet is sent only to neighbor interfaces.
Hello messages are used to discover neighbors, adjust protocol parameters, and maintain neighbor relationships. Key parameters carried in a Hello message include the following:
■ DR_Priority: priority used by each Router to elect a DR. The higher a Router's priority is, the higher
the probability that the Router will be elected as the DR.
■ Holdtime: timeout period during which the neighbor remains in the reachable state.
■ LAN_Delay: delay for transmitting a Prune message on the shared network segment.
DR Election
The network segment on which a multicast source or group members reside is usually connected to multiple
PIM devices, as shown in Figure 2. The PIM devices exchange Hello messages to set up PIM neighbor
relationships. A Hello message carries the DR priority and the address of the interface that connects the PIM
device to this network segment. The Router compares the local information with the information carried in
the Hello messages sent by other PIM devices to elect a DR. This process is a DR election. The election rules
are as follows:
• The PIM device with the highest DR priority wins.
• If PIM devices have the same DR priority, or if PIM devices that do not support Hello messages carrying DR priorities exist on the network segment, the PIM device with the highest IP address wins.
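The DR election comparison can be sketched as follows (a model following the standard PIM-SM rules in RFC 7761: highest advertised DR priority first; ties, or any device not advertising a priority, fall back to the highest interface IP address):

```python
import ipaddress

def elect_dr(neighbors, all_advertise_priority=True):
    """Elect the DR from (dr_priority, ip_string) tuples.

    If every device advertises a DR priority, the highest priority wins and
    the highest IP address breaks ties; otherwise only the IP address counts.
    """
    key_ip = lambda n: int(ipaddress.IPv4Address(n[1]))
    if all_advertise_priority:
        return max(neighbors, key=lambda n: (n[0], key_ip(n)))
    return max(neighbors, key=key_ip)
```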
Figure 2 DR election
RP Discovery
• Static RP
A static RP is specified using a command. A static RP's address needs to be manually configured on
other Routers so they can find and use this RP for data forwarding.
• Dynamic RP
A dynamic RP is elected from a set of PIM devices.
1. To use a dynamic RP, configure C-BSRs to elect a BSR among the set of C-BSRs.
Each C-BSR considers itself a BSR and advertises a Bootstrap message. The Bootstrap message
carries the address and priority of the C-BSR. Each Router compares the information contained in
all received Bootstrap messages to determine which C-BSR becomes the BSR. The election rules
are as follows:
a. If the C-BSRs have different priorities, the C-BSR with the highest priority (largest priority
value) is elected as the BSR.
b. If the C-BSRs have the same priority, the C-BSR with the highest IP address is elected as the
BSR.
All Routers use the same election rule and therefore they will elect the same BSR and learn the
BSR address.
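Rules a and b above reduce to a single comparison (a sketch, not device code):

```python
import ipaddress

def elect_bsr(c_bsrs):
    """Elect the BSR from (priority, ip_string) tuples: the highest priority
    value wins, and a tie is broken by the highest C-BSR IP address."""
    return max(c_bsrs, key=lambda c: (c[0], int(ipaddress.IPv4Address(c[1]))))
```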
2. The C-RPs send C-RP Advertisement messages to the BSR. Each message carries the address of the C-RP that sent it, the range of multicast groups that the C-RP serves, and the priority of the C-RP.
3. The BSR collects the received information as an RP-Set, encapsulates the RP-Set information in a
Bootstrap message, and advertises the Bootstrap message to all PIM-SM devices.
4. Each Router uses the RP-Set information to perform calculation and comparison using the same
rule to elect an RP from multiple C-RPs. The election rules are as follows:
a. The C-RP with the longest mask length of the served group address range matching the
specific multicast group wins.
b. If group addresses that all C-RPs serve have the same mask length, the C-RP with the
highest priority wins (a larger priority value indicates a lower priority).
c. If the priorities are also the same, a hash function is used, and the C-RP with the largest calculated hash value wins.
d. If all the preceding factors are the same, the C-RP with the highest IP address wins.
5. Because all Routers use the same RP-Set and the same election rules, the mapping between the
multicast group and the RP is the same for all the Routers. The Routers save the mapping to
guide subsequent multicast operations.
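The C-RP election rules, including the hash step, can be sketched for IPv4 as follows (the hash is the RFC 7761 RP hash function; the dict keys are illustrative, not a real data structure):

```python
import ipaddress

def rp_hash(group, rp, mask_len=30):
    """RFC 7761 RP hash: among otherwise-equal C-RPs, the largest value wins."""
    mask = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    g = int(group) & mask
    return (1103515245 * ((1103515245 * g + 12345) ^ int(rp)) + 12345) % (1 << 31)

def elect_rp(group, rp_set):
    """Elect the RP for a group. Each C-RP is a dict with 'range'
    (IPv4Network it serves), 'priority' (smaller value = higher priority),
    and 'ip' (IPv4Address).

    Order of comparison: longest matching served-group prefix, then highest
    priority (smallest value), then largest hash value, then highest IP.
    """
    matches = [c for c in rp_set if group in c["range"]]
    return max(matches, key=lambda c: (c["range"].prefixlen,
                                       -c["priority"],
                                       rp_hash(group, c["ip"]),
                                       int(c["ip"])))
```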
If a router needs to interwork with an auto-RP-capable device, auto-RP listening must be enabled. After
auto-RP listening is enabled, the router can receive auto-RP announcement and discovery messages,
parse the messages to obtain source addresses, and perform RPF checks based on the source addresses.
■ If the RPF check succeeds, the router forwards the auto-RP message to PIM neighbors. The auto-RP message carries the multicast group address range served by the RP to guide subsequent multicast operations.
■ If the RPF check fails, the router discards the auto-RP message.
• Embedded RP
Embedded-RP is a mode used by the Router in the ASM model to obtain an RP address and applies only
to IPv6 PIM-SM. To ensure consistent RP election results, an RP obtained in embedded-RP mode takes
precedence over RPs elected using other mechanisms. The address of an RP obtained in embedded-RP
mode must be embedded in an IPv6 multicast group address, which must meet both of the following
conditions:
■ The IPv6 multicast group address must not be within the SSM group address range.
After a router calculates the RP address from the IPv6 multicast group address, the router uses the RP
address to discover a route for forwarding multicast packets. The process for calculating the RP address
is as follows:
1. The router copies the first N bits of the network prefix in the IPv6 multicast group address. Here,
N is specified by the plen field.
2. The router replaces the last four bits with the contents of the RIID field. An RP address is then
obtained. RIID indicates the interface ID of the RP. There is no default value.
Figure 4 shows the mapping between the IPv6 multicast group address and RP address.
Figure 4 Mapping between the IPv6 multicast group address and RP address
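The two-step calculation can be written out directly (a sketch; the field offsets follow the RFC 3956 embedded-RP address layout, and the example group address in the comment is made up for illustration):

```python
import ipaddress

def embedded_rp_address(group):
    """Derive the RP address embedded in an IPv6 multicast group address.

    Per the steps above: copy the first plen bits of the 64-bit network
    prefix carried in the group address, then use the 4-bit RIID field as
    the interface ID of the RP.
    """
    g = int(ipaddress.IPv6Address(group))
    riid = (g >> 104) & 0xF                   # 4-bit RP interface ID
    plen = (g >> 96) & 0xFF                   # number of valid prefix bits
    prefix = (g >> 32) & ((1 << 64) - 1)      # 64-bit network-prefix field
    if riid == 0 or not 1 <= plen <= 64:
        raise ValueError("not a valid embedded-RP group address")
    mask = ((1 << plen) - 1) << (64 - plen)   # keep only the first plen bits
    return ipaddress.IPv6Address(((prefix & mask) << 64) | riid)

# e.g. group ff7e:b40:2001:db8:beef:feed:0:1234 (RIID=0xb, plen=64)
# yields the RP address 2001:db8:beef:feed::b
```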
• Anycast RP
In a traditional PIM-SM domain, each multicast group is mapped to only one RP. When the network is
overloaded or traffic is heavy, many network problems can occur. For example, if the RP is overloaded,
routes will converge slowly, or the multicast forwarding path will not be optimal.
Anycast-RP can be used to address these problems. Currently, Anycast-RP can be implemented through
MSDP or PIM:
■ Through MSDP: Multiple RPs with the same address are configured in a PIM-SM domain and MSDP
peer relationships are set up between the RPs to share multicast data sources.
This mode is only for use on IPv4 networks. For details about the implementation principles, see
Anycast-RP in MSDP.
■ Through PIM: Multiple RPs with the same address are configured in a PIM-SM domain and the
device where an RP resides is configured with a unique local address to identify the RP. These local
addresses are used to set up connectionless peer relationships between the devices. The peers share
multicast source information by exchanging Register messages.
This mode is for use on both IPv4 and IPv6 networks.
These two modes cannot both be configured on the same device in a PIM-SM domain. If Anycast-RP is implemented
through PIM, you can also configure the device to advertise the source information obtained from MSDP peers in
another domain to peers in the local domain.
Receivers and the multicast source each select the RPs closest to their own location to create RPTs. After
receiving multicast data, the receiver's DR determines whether to trigger an SPT switchover. This ensures the
optimal RPT and load sharing. The following section covers the principles of Anycast-RP in PIM.
As shown in Figure 5, in a PIM-SM domain, multicast sources S1 and S2 send multicast data to multicast
group G, and U1 and U2 are members of group G. Perform the following operations to use PIM to
implement Anycast-RP in the PIM-SM domain:
• Configure RP1 and RP2 and assign both the same IP address (address of a loopback interface). Assume
that the IP address is 10.10.10.10.
• Set up a connectionless peer relationship between RP1 and RP2 using unique IP addresses. Assume that
the IP address of RP1 is 1.1.1.1 and the IP address of RP2 is 2.2.2.2.
1. The receiver sends a Join message to the closest RP and builds an RPT.
• U1 joins the RPT with RP1 as the root, and RP1 creates an (*, G) entry.
• U2 joins the RPT with RP2 as the root, and RP2 creates an (*, G) entry.
2. The source's DR sends a Register message to the closest RP.
• DR1 sends a Register message to RP1, and RP1 creates an (S1, G) entry. Multicast data from S1 reaches U1 along the RPT.
• DR2 sends a Register message to RP2, and RP2 creates an (S2, G) entry. Multicast data from S2 reaches U2 along the RPT.
3. After receiving Register messages from the source's DRs, RPs re-encapsulate the Register messages
and forward them to peers to share multicast source information.
• After receiving the (S1, G) Register message from DR1, RP1 replaces the source and destination
addresses with 1.1.1.1 and 2.2.2.2, respectively, and re-encapsulates the message and sends it to
RP2. Upon receiving the specially encapsulated Register message from peer 1.1.1.1, RP2 processes
this Register message without forwarding it to other peers.
• After receiving the (S2, G) Register message from DR2, RP2 replaces the source and destination
addresses with 2.2.2.2 and 1.1.1.1, respectively, and re-encapsulates the message and sends it to
RP1. Upon receiving the specially encapsulated Register message from peer 2.2.2.2, RP1 processes
this Register message without forwarding it to other peers.
4. The RP joins an SPT with the source's DR as the root to obtain multicast data.
• RP1 sends a Join message to S2. Multicast data from S2 first reaches RP1 along the SPT and then
reaches U1 along the RPT.
• RP2 sends a Join message to S1. Multicast data from S1 reaches RP2 first through the SPT and
then reaches U2 through the RPT.
5. After receiving multicast data, the receiver's DR determines whether to trigger an SPT switchover.
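The Register re-encapsulation and peer-forwarding behavior in steps 2 and 3 can be sketched as follows. This is an illustrative model only; the function and field names are assumptions, not the device implementation.

```python
# Illustrative model of Anycast-RP Register sharing (hypothetical names).
def forward_register(local_rp, peers, register, from_peer=None):
    """Return the Register messages this RP re-encapsulates and sends onward.

    register: dict with 'source' (multicast source S) and 'group' (G).
    A Register learned from a peer RP is processed locally but never
    forwarded to other peers, so the result is empty in that case.
    """
    if from_peer is not None:
        return []  # received from a peer: install (S, G), do not re-forward
    out = []
    for peer in peers:
        out.append({
            'src': local_rp,              # replaced with this RP's unique address
            'dst': peer,                  # replaced with the peer's unique address
            'source': register['source'],
            'group': register['group'],
        })
    return out
```

For example, with RP1 (1.1.1.1) peering with RP2 (2.2.2.2), an (S1, G) Register from DR1 yields one re-encapsulated message with source 1.1.1.1 and destination 2.2.2.2, and RP2 does not forward it further.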
RPT Setup
Figure 6 RPT setup and data forwarding processes
Setting up an RPT creates a forwarding path for multicast data. Figure 6 shows the networking.
• When a multicast source sends the first multicast packet of a multicast group to its DR, the source's DR
encapsulates the multicast packet in a Register message and unicasts the Register message to the RP.
The RP creates an (S, G) entry to register the multicast source information.
• When a receiver joins a multicast group through IGMP, the receiver's DR sends a Join message to the
RP. An (*, G) entry is then created on each hop, and an RPT is created.
• When a receiver joins a multicast group and a multicast source sends a multicast packet for the group,
the multicast source's DR encapsulates the multicast packet in a Register message and unicasts the
Register message to the RP. The RP then forwards the multicast data along the RPT to group members.
The RPT implements on-demand multicast data forwarding, which reduces bandwidth consumption.
To reduce the RPT forwarding loads and improve multicast data forwarding efficiency, PIM-SM supports SPT switchovers,
allowing a multicast network to set up an SPT with the multicast source as the root. Then, the multicast source can send
multicast data directly to receivers along the SPT.
SPT Switchover
In a PIM-SM domain, a multicast group interacts with only one RP, and only one RPT is set up. If SPT
switchover is not enabled, all multicast packets must be encapsulated in Register messages and then sent to
the RP. After receiving the packets, the RP de-encapsulates them and forwards them along the RPT.
Since all multicast packets forwarded along the RPT are transferred by the RP, the RP may be overloaded
when multicast traffic is heavy. To resolve this problem, PIM-SM allows the RP or the receiver's DR to trigger
an SPT switchover.
1. As shown in Figure 7, multicast data is forwarded along the RPT. The receiver's DR (DeviceD)
sends (*, G) Join messages to the RP. Multicast data is sent to the receiver's DR (DeviceD) along
the path multicast source's DR (DeviceA) -> RP (DeviceB) -> receiver's DR (DeviceD).
2. The receiver's DR periodically checks the forwarding rate of multicast packets. If the receiver's DR
finds that the forwarding rate is greater than the configured threshold, the DR triggers an SPT
switchover.
3. The receiver's DR sends (S, G) Join messages to the source's DR. After receiving multicast data
along the SPT, the receiver's DR discards multicast data received along the RPT and sends a Prune
message to the RP to delete the receiver from the RPT. The switchover from the RPT to the SPT is
complete.
4. Multicast data is forwarded along the SPT. Specifically, multicast data is transmitted to receivers
along the path multicast source's DR (DeviceA) -> receiver's DR (DeviceD).
An SPT is set up from the source to group members, and therefore subsequent packets can bypass the
RP. Because the RPT is not necessarily the shortest path, performing an SPT switchover reduces delays
in transmitting multicast data on the network.
If one source sends packets to multiple groups simultaneously and an SPT switchover policy is specified for a
specified group range:
• Before an SPT switchover, these packets reach the receiver's DR along the RPT.
• After an SPT switchover, only the packets sent to the groups within the range specified in the SPT
switchover policy are forwarded along the SPT. Packets sent to other groups are still forwarded along
the RPT.
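The receiver DR's switchover decision described above can be sketched as a simple check; the function and parameter names are illustrative assumptions, not the device's algorithm.

```python
# Hypothetical sketch of the receiver DR's SPT switchover decision.
def should_switch_to_spt(rate_kbps, threshold_kbps, group, policy_groups=None):
    """Trigger an SPT switchover only if the measured forwarding rate exceeds
    the configured threshold and the group matches the switchover policy.
    policy_groups of None means the policy applies to all groups."""
    if policy_groups is not None and group not in policy_groups:
        return False  # groups outside the policy range stay on the RPT
    return rate_kbps > threshold_kbps
```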
Assert
The following condition indicates that other multicast forwarders are present on the network segment:
• The interface that receives the multicast packet is a downstream interface in the (S, G) entry on the
local Router.
If other multicast forwarders are present on the network segment, the Router starts the Assert mechanism.
The Router sends an Assert message through the downstream interface. The downstream interface also
receives an Assert message from a different multicast forwarder on the network segment. The destination
address of the multicast packet in which the Assert message is encapsulated is 224.0.0.13. The source
address of the packet is the downstream interface address. The TTL value of the packet is 1. The Assert
message carries the route cost from the PIM device to the source or RP, priority of the used unicast routing
protocol, and the group address.
The Router compares its information with the information carried in the message sent by its neighbor. This
process is called Assert election. The election rules are as follows:
1. The Router that runs a higher priority unicast routing protocol wins.
2. If the Routers have the same unicast routing protocol priority, the Router with the smaller route cost
to the source wins.
3. If the Routers have the same priority and route cost, the Router with the highest IP address for the
downstream interface wins.
The Router performs the following operations based on the Assert election result:
• If the Router wins the election, the downstream interface of the Router is responsible for forwarding
multicast packets on the network segment. The downstream interface is called an Assert winner.
• If the Router does not win the election, the downstream interface is prohibited from forwarding
multicast packets and is deleted from the downstream interface list of the (S, G) entry. The downstream
interface is called an Assert loser.
After Assert election is complete, only one upstream Router that has a downstream interface exists on the
network segment, and the downstream interface transmits only one copy of each multicast packet. The
Assert winner then periodically sends Assert messages to maintain its status as the Assert winner. If the Assert
loser does not receive any Assert message from the Assert winner before its own timer expires,
the loser re-adds its downstream interfaces for multicast data forwarding.
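The three election rules can be expressed as a single comparison. The sketch below is illustrative: it assumes a smaller preference value denotes a higher-priority unicast routing protocol, which is a common router convention rather than something stated here.

```python
import ipaddress

# Each candidate: (protocol_preference, route_cost, downstream_interface_ip).
def assert_winner(candidates):
    """Apply the Assert election rules in order:
    1. higher-priority protocol (smaller preference value) wins;
    2. smaller route cost to the source wins;
    3. highest downstream-interface IP address wins."""
    return min(
        candidates,
        key=lambda c: (c[0], c[1], -int(ipaddress.ip_address(c[2]))),
    )
```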
DR Switchover Delay
If an existing DR fails, the PIM neighbor relationship times out, and a new DR election is triggered.
By default, when an interface changes from a DR to a non-DR, the Router immediately stops using the
interface to forward data. If the new DR has not received multicast data, multicast data traffic is temporarily
interrupted.
When a PIM-SM interface that has a PIM DR switchover delay configured receives Hello messages from a
new neighbor and changes from a DR to a non-DR, the interface continues to function as a DR and to
forward multicast packets until the delay times out.
If the Router that has a DR switchover delay configured receives packets from a new DR before the delay
expires, the Router immediately stops forwarding packets. When a new IGMP Report message is received on
the shared network segment, the new DR (instead of the original DR configured with a DR switchover delay)
sends a PIM Join message to the upstream device.
If the new DR receives multicast data from the original DR before the DR switchover delay expires, an Assert election is
triggered.
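The delay behavior above reduces to two conditions; this is a hypothetical helper for illustration only, not device code.

```python
# Hypothetical sketch: does a former DR keep forwarding during the delay?
def former_dr_forwards(elapsed_s, delay_s, new_dr_traffic_seen):
    """A former DR keeps forwarding until its switchover delay expires,
    unless traffic from the new DR is seen first."""
    if new_dr_traffic_seen:
        return False            # stop immediately once the new DR forwards
    return elapsed_s < delay_s  # otherwise forward until the delay expires
```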
Each BSR administrative domain has only one BSR, which serves multicast groups in a specific address range.
The global domain has a BSR that serves all other multicast groups.
The relationship between the BSR administrative domain and the global domain is described as follows in
terms of the domain space, group address range, and multicast function.
• Domain space
As shown in Figure 8, different BSR administrative domains contain different Routers. A Router cannot
belong to multiple BSR administrative domains. Each BSR administrative domain is independent and
geographically isolated from other domains. A BSR administrative domain manages a multicast group
for a specific address range. Multicast packets within this address range can be transmitted only in this
BSR administrative domain and cannot exit the border of the domain.
The global domain contains all the Routers on the PIM-SM network. Multicast packets that do not
belong to a particular BSR administrative domain can be transmitted over the entire PIM network.
• Group address range
Each BSR administrative domain provides services to the multicast groups within a specific address range.
The multicast groups that different BSR administrative domains serve can overlap. However, a multicast
group address that a BSR administrative domain serves is valid only in its BSR administrative domain
because it is a private group address. As shown in Figure 9, the group address range
of BSR1 overlaps with that of BSR3.
Multicast groups that do not belong to any BSR administrative domain belong to the global
domain. That is, the group address range of the global domain is G-G1-G2.
• Multicast function
As shown in Figure 8, the global domain and each BSR administrative domain have their respective C-
RP and BSR devices. Devices only function in the domain to which they are assigned. Each BSR
administrative domain has a BSR mechanism and RP elections that are independent of other domains.
Each BSR administrative domain has a border. Multicast information for this domain, such as the C-RP
Advertisement messages and BSR Bootstrap message, can be transmitted only within the domain.
Multicast information for the global domain can be transmitted throughout the entire global domain
and can traverse any BSR administrative domain.
11.4.2.3 PIM-SSM
Protocol Independent Multicast-Source-Specific Multicast (PIM-SSM) enables a user host to rapidly join a
multicast group if the user knows the multicast source address. PIM-SSM sets up a shortest path tree (SPT)
from a multicast source to a multicast group, whereas PIM-SM uses rendezvous points (RPs) to set up
rendezvous point trees (RPTs). Therefore, PIM-SSM implements a more rapid join function than PIM-SM.
Different from the any-source multicast (ASM) model, the SSM model does not need to maintain an RP,
construct an RPT, or register a multicast source.
The SSM model is based on PIM-SM and IGMPv3/Multicast Listener Discovery version 2 (MLDv2). The
procedure for setting up a multicast forwarding tree on a PIM-SSM network is similar to the procedure for
setting up an SPT on a PIM-SM network. The receiver's DR, which knows the multicast source address, sends
Join messages directly to the source so that multicast data streams can be sent to the receiver's designated
router (DR).
In SSM mode, multicast traffic forwarding is based on (S, G) channels. To receive the multicast traffic of a channel, a
multicast user must join the channel. A multicast user can join or leave a multicast channel by subscribing to or
unsubscribing from the channel. Currently, only IGMPv3 can be used for channel subscription or unsubscription.
Related Concepts
PIM-SSM implementation is based on PIM-SM. For the concepts involved in PIM-SSM, see the related concepts described in PIM-SM.
Implementation
The process for forwarding multicast data in a PIM-SSM domain is as follows:
1. Neighbor Discovery
Each PIM device in a PIM-SSM domain periodically sends Hello messages to all other PIM devices in
the domain to discover PIM neighbors and maintain PIM neighbor relationships.
By default, a PIM device permits other PIM control messages or multicast messages from a neighbor, regardless
of whether the PIM device has received Hello messages from the neighbor. However, if the neighbor check
function is enabled, the PIM device permits other PIM control messages or multicast messages from a
neighbor only after the PIM device has received Hello messages from the neighbor.
2. DR Election
PIM devices exchange Hello messages to elect a DR on a shared network segment. The receiver's DR is
the only multicast data forwarder on the segment.
3. SPT setup
Users on a PIM-SSM network can know the multicast source address and can, therefore, specify the
source when joining a multicast group. After receiving a Report message from a user, the receiver's DR
sends a Join message towards the multicast source to establish an SPT between the source and the
user. Multicast data is then sent by the multicast source to the user along the SPT.
• SPT establishment can be triggered by user join requests (both dynamic and static) and SSM-mapping.
• The DR in an SSM scenario is valid only in the shared network segment connected to group members. The DR on
the group member side sends Join messages to the multicast source, creates the (S, G) entry hop by hop, and then
sets up an SPT.
• PIM-SSM supports PIM silent, BFD for PIM, and a PIM DR switchover delay.
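The hop-by-hop (S, G) entry creation in step 3 can be sketched as a walk along the RPF path from the receiver's DR toward the source; all names here are illustrative assumptions.

```python
# Illustrative sketch of SPT setup in PIM-SSM: the receiver's DR sends an
# (S, G) Join toward the known source, and each hop installs an (S, G) entry.
def propagate_ssm_join(rpf_next_hop, dr, source, group):
    """rpf_next_hop maps each node to its RPF next hop toward the source.
    Returns the Join's path and the (S, G) entries created hop by hop."""
    path, entries, node = [], {}, dr
    while node != source:
        entries[node] = (source, group)  # each hop creates the (S, G) entry
        path.append(node)
        node = rpf_next_hop[node]
    return path, entries
```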
Networks typically use the following mechanisms to detect faults:
• Hardware detection: For example, Synchronous Digital Hierarchy (SDH) alarms are generated if link
faults are detected. Hardware detection detects faults rapidly; however, it is not applicable to all
media.
• Slow Hello mechanism: It usually refers to the Hello mechanism offered by a routing protocol. This
mechanism takes seconds to detect a fault. In high-speed data transmission, for example, at gigabit
rates, a detection time longer than 1s causes the loss of a large amount of data. For delay-sensitive
services such as voice services, a delay longer than 1s is also unacceptable.
• Other detection mechanisms: Different protocols or device vendors may provide dedicated detection
mechanisms. However, these detection mechanisms are difficult to deploy when systems are
interconnected.
Bidirectional Forwarding Detection (BFD) provides unified detection for all media and protocol layers on the
entire network within milliseconds. Two systems set up a BFD session and periodically send BFD control
packets along the path between them. If one system does not receive BFD control packets within a detection
period, the system considers that a fault has occurred on the path.
In multicast applications, if the current designated router (DR) on a shared network segment is faulty, other
PIM neighbors trigger a new round of DR election only after the neighbor relationship times out. As a result,
multicast data transmission is interrupted. The interruption time (usually in seconds) is not shorter than the
timeout time of the neighbor relationship.
BFD for PIM can detect a link's status on a shared network segment within milliseconds and respond quickly
to a fault on a PIM neighbor. If the interface configured with BFD for PIM does not receive any BFD packets
from the current DR within a configured detection period, the interface considers that a fault has occurred
on the DR. The BFD module notifies the route management (RM) module of the session status, and the RM
module notifies the PIM module. Then, the PIM module triggers a new round of DR election immediately
rather than waiting for the neighbor relationship to time out. This shortens the multicast data transmission
interruption period and improves the reliability of multicast data transmission.
Currently, BFD for PIM can be used on IPv4 and IPv6 PIM-SM/SSM networks.
In Figure 1, on the shared network segment connected to user hosts, a PIM BFD session is set up between
the downstream interface (Port 2) of DeviceB and the downstream interface (Port 1) of DeviceC. Both ends
of the link send BFD packets to detect the link status.
The downstream interface (Port 2) of DeviceB functions as the DR and is responsible for forwarding
multicast data to the receiver. If Port 2 fails, BFD immediately notifies the RM module of the session status,
and the RM module then notifies the PIM module. The PIM module triggers a new round of DR election. The
downstream interface (Port 1) of DeviceC is then elected as the new DR and immediately starts forwarding
multicast data to the receiver. This shortens the multicast data transmission interruption period.
Limit on the BSR address range (IPv4 PIM-SM, IPv6 PIM-SM)
Description: Any router on a PIM-SM network that uses the BootStrap router (BSR) mechanism can be configured as a Candidate-BootStrap Router (C-BSR) and participate in a BSR election.
Implementation: An ACL and filtering rules can be configured to limit the range of valid BSR addresses. Consequently, Bootstrap messages carrying BSR addresses outside the valid address range are discarded.
Deployed on: All multicast devices on a network
Protects: The BSR
Limit on the C-RP address range (IPv4 PIM-SM, IPv6 PIM-SM)
Description: Any router on a PIM-SM network that uses the BSR mechanism can be configured as a Candidate-Rendezvous Point (C-RP) and serve multicast groups in a specified range. Each C-RP unicasts an Advertisement message to the BSR. The BSR collects all received C-RP information, summarizes it as the RP-Set, and floods the RP-Set over the entire network using Bootstrap messages. Based on the RP-Set, routers on the network can calculate the RP to which a multicast group in a specific range corresponds. This function guarantees C-RP security by preventing C-RP spoofing and malicious hosts from replacing valid C-RPs. With this function, an RP can be correctly elected.
Implementation: An ACL and filtering rules can be configured to limit the range of valid C-RP addresses and the range of multicast groups that each C-RP serves. The BSR will then discard Advertisement messages carrying C-RP addresses outside the valid C-RP address range.
Deployed on: The C-BSR
Protects: The RP
Limit on the number of PIM entries (IPv4 PIM-SM, IPv4 PIM-SSM)
Description: This feature limits the number of PIM-SM/PIM-SSM entries to prevent a device from generating excessive multicast routing entries when attackers send numerous multicast data or IGMP/PIM protocol messages. It therefore helps prevent high memory and CPU usage and improves multicast service security.
Implementation: A PIM entry number limit can be configured globally to restrict the maximum number of PIM-SM/PIM-SSM entries that can be created. After the specified limit is reached, the device will not create new PIM-SM/PIM-SSM entries. PIM (*, G) and (S, G) entries are limited separately: after the specified limit for PIM (*, G) entries is reached, the device will stop creating PIM-SM (*, G) entries; after the specified limit for PIM (S, G) entries is reached, the device will stop creating PIM-SM/PIM-SSM (S, G) entries.
Deployed on: All PIM devices on a network
Protects: All PIM devices on a network
PIM neighbor filtering (IPv4 PIM-SM, IPv6 PIM-SM)
Description: Some unknown devices on a network may set up PIM neighbor relationships with the local device, which poses security risks.
Implementation: An ACL and filtering rules can be configured to filter out invalid PIM neighbors.
Deployed on: All multicast devices on a network
Protects: All multicast devices on a network
Join information filtering (IPv4 PIM-SM, IPv6 PIM-SM)
Description: A Join/Prune message received by an interface contains both join and prune information.
Implementation: An ACL and filtering rules can be configured to filter the join information in received Join/Prune messages.
Deployed on: All multicast devices on a network
Protects: All multicast devices on a network
Source address-based filtering (IPv4 PIM-SM, IPv6 PIM-SM)
Description: This function enables a device to filter multicast data packets based on source or source/group addresses.
Implementation: An ACL and filtering rules can be configured so that the device forwards only multicast data packets with addresses within the valid source or source/group address range.
Deployed on: All multicast devices on a network
Protects: All multicast devices on a network
PIM neighbor check (IPv4 PIM-SM, IPv6 PIM-SM)
Description: This function guarantees the security of Join/Prune or Assert messages received or sent by the device.
Implementation: When receiving or sending Join/Prune or Assert messages, the device checks whether the peer is a PIM neighbor from which Hello messages have been received, and processes the messages only if it is.
Deployed on: All multicast devices on a network
Protects: All multicast devices on a network
PIM silent (IPv4 PIM-SM, IPv6 PIM-SM, IPv4 PIM-SSM)
Description: If PIM-SM is enabled on the interface directly connecting a multicast device to user hosts, this interface can set up PIM neighbor relationships with any host that sends PIM Hello messages, which poses security risks.
Implementation: The interface is not allowed to receive or forward any PIM packets.
Deployed on: The interface directly connected to the user host network
Protects: PIM devices directly connected to the user host network
PIM IPsec (IPv4 PIM-SM, IPv6 PIM-SM, IPv4 PIM-SSM, IPv6 PIM-SSM)
Description: This function is used to authenticate PIM packets to prevent bogus PIM protocol packet attacks or denial of service (DoS) attacks, improving multicast service security.
Implementation: PIM IPsec uses a security association (SA) to authenticate sent and received PIM packets. The PIM IPsec implementation process is as follows: before an interface sends out a PIM protocol packet, IPsec adds a protocol header to the packet; after an interface receives a PIM protocol packet, IPsec uses the protocol header to authenticate the packet. If the authentication is successful, the packet is forwarded. Otherwise, the packet is discarded.
NOTE: For the IPsec feature description, see IPsec.
Deployed on: All PIM devices on a network
Protects: All PIM devices on a network
Background
SPT setup relies on unicast routes. If a link or node failure occurs, a new SPT can be set up only after unicast
routes are converged. This process is time-consuming and may cause severe multicast traffic loss.
PIM FRR resolves these issues. It allows a device to search for a backup FRR route based on unicast routing
information and send the PIM Join message of a multicast receiver along both the primary and backup
routes, setting up both primary and backup SPTs. The cross node of the primary and backup links can receive
one copy of a multicast flow from each of the links. Each device's forwarding plane permits the multicast
traffic on the primary link and discards that on the backup link. However, the forwarding plane starts
permitting multicast traffic on the backup link as soon as the primary link fails, thus minimizing traffic loss.
PIM FRR supports fast SPT switchovers only in IPv4 PIM-SSM or PIM-SM. In extranet scenarios, PIM FRR supports only
source VPN, not receiver VPN entries.
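The dual-feed, selective-receiving behavior of PIM FRR can be modeled as a per-packet acceptance check; this is an illustrative sketch, not the router's forwarding-plane code.

```python
# Illustrative model: the forwarding plane permits the copy arriving on the
# primary inbound interface and discards the backup copy until the primary
# link fails, then immediately accepts the backup copy instead.
def accept_packet(in_interface, primary_if, backup_if, primary_up):
    if primary_up:
        return in_interface == primary_if  # discard the backup copy
    return in_interface == backup_if       # fail over to the backup copy
```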
Implementation
PIM FRR implementation involves three steps:
Table 1 PIM FRR implementation before and after a link or node failure occurs
Local primary link failure: Before the failure (Figure 2, PIM FRR implementation before a local primary
link failure occurs), DeviceA permits the multicast traffic on the primary link and discards that on the
backup link. After the failure (Figure 3), DeviceA permits the multicast traffic on the backup link
(DeviceB -> DeviceD -> DeviceA) immediately after the local primary link fails.
Remote primary link failure: Before the failure (Figure 6), DeviceA permits the multicast traffic on the
primary link and discards that on the backup link. After the failure (Figure 7), DeviceA permits the
multicast traffic on the backup link (DeviceC -> DeviceD -> DeviceA) immediately after DeviceA detects
the remote primary link failure.
3. Traffic switchback
After the link or node failure is resolved, PIM detects a route change at the protocol layer, starts route
switchback, and then smoothly switches traffic back to the primary link.
PIM FRR in Scenarios Where IGP FRR Cannot Fulfill Backup Route
Computation Independently
PIM FRR relies on IGP FRR to compute both primary and backup routes. IGP FRR can generally compute
both primary and backup routes on a node. However, on a live network, backup route computation may fail
on some nodes as the number of network nodes increases. Therefore, if IGP FRR
cannot fulfill route computation independently on a network, deploy IP FRR to work jointly with IGP FRR.
The following example uses a non-ECMP network.
In a PIM FRR scenario on a non-ECMP network, the devices between the multicast source and receivers must be Huawei
devices with PIM FRR configured.
On the ring network shown in Figure 8, DeviceC connects to a multicast receiver. The primary multicast
traffic link for this receiver is DeviceC -> DeviceB -> DeviceA. To compute a backup route for the link
DeviceD -> DeviceC, IGP FRR requires that the cost of link DeviceD -> DeviceA be less than the cost of link
DeviceC -> DeviceA plus the cost of link DeviceD -> DeviceC. That is, the cost of link DeviceD -> DeviceE ->
DeviceF -> DeviceA must be less than the cost of link DeviceC -> DeviceA plus the cost of link DeviceD ->
DeviceC. This ring network does not meet this requirement; therefore, IGP FRR cannot compute a backup
route for link DeviceD -> DeviceC.
To solve the preceding problem, you can manually specify the primary and backup paths to the multicast
source. To configure a multicast static route, you need to specify the outbound interface and next-hop
address. The following example uses DeviceC as an example. The primary and backup links are as follows:
• Primary link of the multicast static route: DeviceC -> DeviceB -> DeviceA, with a higher priority.
Before a link or node failure occurs, DeviceC permits the multicast traffic on the primary link and discards
that on the backup link. After a link or node failure (between DeviceB and DeviceC for example) occurs,
DeviceC permits the multicast traffic on the backup link immediately after detecting the failure.
• When a remote link fault occurs (for example, the link between DeviceA and DeviceB fails), the control plane of
DeviceC cannot detect the fault. The master and backup inbound interfaces in PIM entries remain unchanged, and
the forwarding plane switches traffic to the backup path.
• When multicast static routes are configured on two adjacent devices, and each device uses the route passing
through the other as its primary route, link fault protection cannot be implemented, and multicast traffic
cannot be received.
• The next-hop outbound interfaces of the primary and backup multicast static routes configured on two adjacent
devices must be on the same link. For example, the next-hop outbound interface of the primary multicast static
route configured on DeviceC must be on the same link as the next-hop outbound interface of the backup multicast
static route configured on DeviceB. If there are multiple links between adjacent devices, you need to bind the links
to the trunk interface.
• In the case of multi-ring cross, you need to specify the route towards the multicast source as the primary route
when configuring the multicast static route for the multi-ring cross node.
• When adding a node to the network, you need to change the next hops of the multicast static routes on the
upstream and downstream nodes.
Benefits
PIM FRR helps improve the reliability of multicast services and minimize service loss for users.
Limitations
PIM FRR has the following limitations:
• Node protection cannot take precedence over link protection in equal-cost multiple path (ECMP)
scenarios, because IGPs cannot compute backup paths in ECMP scenarios.
■ On an IGP network with PIM FRR deployed, the IGP does not back up the information about the
backup link. After a primary/backup link switchover occurs, the multicast backup link may be
deleted during smooth data verification. As a result, traffic fails to be switched to the backup link,
and rapid switchover cannot be implemented.
■ Only non-ECMP PIM FRR based on LFA FRR is supported. Non-ECMP PIM FRR based on remote FRR
is not supported.
■ On a static route network with PIM FRR deployed, a local route to a neighboring device and the
route from the neighboring device to the local device cannot be both configured as primary routes.
Otherwise, multicast data fails to be received, and link protection cannot be implemented.
■ On a static route network with PIM FRR deployed, you need to modify the multicast static routes of
the upstream and downstream devices and affected devices when a new node is added to the
network.
• If PIM FRR deployment is based on LFA FRR, PIM FRR also has the limitations that IGP LFA FRR has.
• If PIM FRR deployment is based on LFA FRR, rapid primary/backup link switchover is not supported if
the backup link is an ECMP one.
• Regardless of whether PIM FRR is enabled, primary and backup links cannot be generated for multicast
traffic if the following conditions are met: A TE tunnel is configured, local MT is enabled, and the TE
tunnel interface is the next hop interface of the route to the multicast source.
• On a network that uses multicast static routes, PIM FRR has the following limitations:
■ If load balancing is required, the load balancing modes of neighboring devices must be the same.
■ If a local device's neighboring device is the next hop of the primary link connected to a multicast
source, the local device cannot respond to remote route faults. If a remote route fault occurs,
multicast users fail to receive traffic. To resolve this issue, configure the next hop address as the
multicast source address and a multicast static route as the backup route. (This method does not
apply to networks with equal-cost routes. If equal-cost routes exist and an equal-cost route's next
hop outbound interface is connected to two or more devices, change the route cost to eliminate
equal-cost routes.)
■ If loops exist on primary/backup links, PIM entries fail to be deleted even if users have left
multicast groups and stopped requesting traffic.
• PIM FRR supports only PIM-SM SPT (S, G) entries. The backup path and PIM-SSM entries are generated
only when multicast traffic is transmitted.
• PIM FRR cannot implement link protection in discontinuous multicast IPTV service scenarios.
Background
PIM FRR relies on unicast route FRR or multicast static route FRR when establishing backup paths. Such
implementation enables PIM FRR to improve link and node reliability, but it cannot effectively provide an end-
to-end node and link protection mechanism in complex networking scenarios.
Multicast source cloning-based PIM FRR can address this issue. This feature enables a device to send cloned
multicast source Join messages toward a multicast source, after which cloned multicast traffic is sent to
multicast receivers along user-planned RPF vector paths. Normally, a multicast traffic receive device permits the traffic
on the primary link and discards that on the backup link. However, the device starts to permit the traffic on
the backup link immediately after detecting a primary link failure, minimizing service loss.
• Multicast source cloning-based PIM FRR applies only to IPv4 PIM-SM, IPv4 PIM-SSM, and Rosen MVPN scenarios.
Implementation
Multicast source cloning-based PIM FRR implements dual feed and selective receiving of multicast traffic by
cloning multicast source Join messages. You manually specify two paths to the same multicast source;
multicast traffic is then cloned from the source and transmitted along the user-planned paths.
The implementation of multicast source cloning-based PIM FRR involves the following steps:
Usage Scenario
Multicast Source Cloning-based PIM FRR Through Strict Explicit Paths in PIM-SM/PIM-SSM Scenarios
On the network shown in the following figure, Device A is connected to a multicast user (Receiver 1). The
user's terminal runs IGMPv3 for multicast services. The multicast source is connected to Device F.
Figure 2 Multicast source cloning-based PIM FRR through strict explicit paths in a PIM-SM/PIM-SSM scenario
• Enable Device A to clone (S, G) source Join messages to (S1, G) and (S2, G) source Join messages.
Specify explicit paths to S1 and S2. Configure the path to S1 as the primary path and the path to S2 as
the backup path. The path to S1 passes through Interface B, Interface C, and Interface F1. The path to
S2 passes through Interface D, Interface E, and Interface F2. Both paths pass through Device F.
• Enable Device F to clone multicast traffic, so that Device F can replicate the traffic of the (S, G) group to
the traffic of the (S1, G) and (S2, G) groups and forward the cloned traffic. In this manner, two copies
of the same multicast traffic flow are forwarded along the primary and backup paths established for
the multicast source Join messages.
• Device A permits the traffic on the primary path but discards that on the backup path. However, Device
A starts to permit the traffic on the backup path immediately after detecting a primary path failure.
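The cloning steps in this scenario can be sketched as a pair of mappings. The (S, G), (S1, G), and (S2, G) tuples below are placeholders standing in for the addresses planned on Device A and Device F.

```python
# Configured on Device A: one (S, G) source Join expands into two cloned Joins.
CLONE_MAP = {("S", "G"): [("S1", "G"), ("S2", "G")]}

def clone_join(entry):
    """Device A: expand an (S, G) source Join into its cloned Joins."""
    return CLONE_MAP.get(entry, [entry])

def clone_traffic(entry, payload):
    """Device F: replicate (S, G) traffic into one copy per cloned entry,
    so one copy follows each of the primary and backup paths."""
    return [(cloned, payload) for cloned in clone_join(entry)]

assert clone_join(("S", "G")) == [("S1", "G"), ("S2", "G")]
assert clone_traffic(("S", "G"), b"data") == [
    (("S1", "G"), b"data"), (("S2", "G"), b"data")]
```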
Multicast Source Cloning-based PIM FRR Through Loose Explicit Paths in PIM-SM/PIM-SSM Scenarios
On the network shown in the following figure, Device A is connected to a multicast user (Receiver 1). The
user's terminal runs IGMPv3 for multicast services. The multicast source is connected to Device F.
Figure 3 Multicast source cloning-based PIM FRR through loose explicit paths in a PIM-SM/PIM-SSM scenario
• Enable Device A to clone (S, G) source Join messages to (S1, G) and (S2, G) source Join messages.
Specify explicit paths to S1 and S2. Configure the path to S1 as the primary path and the path to S2 as
the backup path. The path to S1 passes through Loopback 2, Loopback 3, and Loopback 6. The path to
S2 passes through Loopback 4, Loopback 5, and Loopback 6. Both paths pass through Device F.
• Enable Device F to clone multicast traffic, so that Device F can replicate the traffic of the (S, G) group to
the traffic of the (S1, G) and (S2, G) groups and forward the cloned traffic. In this manner, two copies
of the same multicast traffic flow are forwarded along the primary and backup paths established for
the multicast source Join messages.
• Device A permits the traffic on the primary path but discards that on the backup path. However, Device
A starts to permit the traffic on the backup path immediately after detecting a primary path failure.
Multicast Source Cloning-based PIM FRR Through Strict Explicit Paths in Rosen MVPN Scenarios
On the network shown in the following figure, Device A is connected to a multicast user (Receiver 1). The
user's terminal runs IGMPv3 for multicast services. The multicast source is connected to Device F. Device A,
Device C, and Device E are PEs, and Device F is a CE. Both the user-side and multicast source-side networks
are VPN networks.
Figure 4 Multicast source cloning-based PIM FRR through strict explicit paths
• Enable Device A to clone (S, G) source Join messages to (S1, G) and (S2, G) source Join messages.
Specify explicit paths to S1 and S2 on the VPN network. Configure the path to S1 as the primary path
and the path to S2 as the backup path. The path to S1 passes through Loopback 3. The path to S2
passes through Loopback 5. After receiving an (S, G) source Join message, both Device C and Device E
forward the message to Device F (the multicast source-side device) on the VPN network.
• Device F forwards the multicast traffic to Device C and Device E. Configure Device C and Device E to
clone multicast traffic. Device C clones the traffic of (S, G) to the traffic of (S11, G) and (S12, G) and
forwards the cloned traffic. Device E clones the traffic of (S, G) to the traffic of (S21, G) and (S22, G)
and forwards the cloned traffic. The traffic of (S11, G) and (S21, G) is sent to Device A along the public
network strict explicit path specified on Device A. The traffic of (S12, G) and (S22, G) is sent to Device B
and Device D and then forwarded to Device A along the public network strict explicit path specified on
Device A. Four copies of the same multicast flow are sent to Device A.
• Device A permits the traffic on the primary path but discards that on the backup path. However, Device
A starts to permit the traffic on the backup path immediately after detecting a primary path failure.
Note the following when the feature is used in Rosen MVPN scenarios:
• If the multicast traffic on a VPN network is discontinuous and multicast source cloning-based PIM FRR is deployed
on the public network, configure a policy on the root node to allow discontinuous traffic to be forwarded through
the share-group on the public network.
• RPF vector paths can only be strict explicit paths.
• Multicast source cloning-based PIM FRR cannot protect the multicast traffic of share-groups.
• The IP address configured in the strict explicit path must be the IP address of the BGP peer.
Benefits
Multicast source cloning-based PIM FRR helps improve the reliability of multicast services and minimize
service loss for users.
• The destination address identifies a receiver. The destination address can be either a unicast address or
a multicast address.
In PIM messages, unicast and multicast addresses are encapsulated in encoding formats, for example, group addresses in
the Encoded-Group format, source addresses in the Encoded-Source format, and BSR addresses in the Encoded-Unicast
format. The length of the address that can be encoded and encapsulated is variable, depending on the supported
protocol type, such as IPv4 and IPv6.
Field Description
Reserved Reserved field.
Checksum Checksum of the PIM message.
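The Checksum field holds the standard Internet checksum (the 16-bit one's complement of the one's complement sum) computed over the PIM message. A minimal sketch; the sample bytes are illustrative, not a captured message:

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum (RFC 1071),
    as used for the PIM Checksum field."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# With the correct value written into its checksum field, a message
# verifies to 0 when the checksum is recomputed over the whole message.
msg = bytearray(b"\x20\x00\x00\x00\x00\x01\x00\x02")
msg[2:4] = struct.pack("!H", internet_checksum(bytes(msg)))
assert internet_checksum(bytes(msg)) == 0
```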
Hello Messages
PIM devices periodically send Hello messages through all PIM interfaces to discover neighbors and maintain
neighbor relationships.
In an IP packet that carries a Hello message, the source address is a local interface's address, the destination
address is 224.0.0.13, and the TTL value is 1. The IP packet is transmitted in multicast mode.
Field Length Description
Reserved 4 bits Reserved. The field is set to 0 when the message is sent and is ignored when the
message is received.
Register Messages
When a multicast source becomes active on a PIM-SM network, the source's DR sends a Register message to
register with the rendezvous point (RP).
In an IP packet that carries a Register message, the source address is the address of the source's DR, and the
destination address is the RP's address. The message is transmitted in unicast mode.
Reserved 8 bits Reserved. The field is set to 0 when the message is sent and is ignored when the
message is received.
Reserved2 30 bits Reserved. The field is set to 0 when the message is sent and is ignored when the
message is received.
Multicast data packet Variable length The source's DR encapsulates the received multicast data in a
Register message and sends the message to the RP. After decapsulating the message, the RP learns the
(S, G) information of the multicast data packet.
A multicast source can send data to multiple groups, and therefore a source's DR must send Register
messages to the RP of each target multicast group. A Register message encapsulates only one multicast
data packet, so the message carries only one copy of (S, G) information.
In the register suppression period, a source's DR sends Null-Register messages to notify the RP of the
source's active state. A Null-Register message contains only an IP header, including the source address and
group address. After the register suppression times out, the source's DR encapsulates a multicast data packet
in a Register message again.
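The Register/Null-Register alternation described above can be modeled as a small state sketch. The class name and suppression period are illustrative assumptions, not values from the product:

```python
class SourceDR:
    """Illustrative register-state sketch for a source's DR."""

    def __init__(self, suppression_period=60.0):
        self.suppression_period = suppression_period
        self.suppressed_until = 0.0

    def on_register_stop(self, now):
        # Enter the register suppressed state.
        self.suppressed_until = now + self.suppression_period

    def message_for(self, now, multicast_packet):
        if now < self.suppressed_until:
            # During suppression, only Null-Register messages (source and
            # group addresses, no data) keep the RP informed.
            return ("Null-Register", None)
        return ("Register", multicast_packet)

dr = SourceDR()
assert dr.message_for(0.0, b"pkt") == ("Register", b"pkt")
dr.on_register_stop(now=10.0)
assert dr.message_for(30.0, b"pkt") == ("Null-Register", None)
assert dr.message_for(80.0, b"pkt") == ("Register", b"pkt")   # suppression timed out
```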
Register-Stop Messages
An RP sends a Register-Stop message to a source's DR in cases such as the following:
• Multicast data has been switched from a rendezvous point tree (RPT) to a shortest path tree (SPT).
After receiving a Register-Stop message, a source's DR stops using Register messages to encapsulate
multicast data packets and enters the register suppressed state.
In an IP packet that carries a Register-Stop message, the source address is the RP's address, and the
destination address is the source DR's address. The message is transmitted in unicast mode.
Reserved 8 bits Reserved. The field is set to 0 when the message is sent and
this field is ignored when the message is received.
An RP can serve multiple groups, and a group can receive data from multiple sources. Therefore, an RP may
simultaneously perform multiple (S, G) registrations.
A Register-Stop message carries only one piece of (S, G) information. When an RP sends a Register-Stop
message to a source's DR, the RP can terminate only one (S, G) registration.
After receiving the Register-Stop message carrying the (S, G) information, the source's DR stops
encapsulating (S, G) packets. The DR still uses Register messages to encapsulate and send packets
destined for other groups.
Join/Prune Messages
A Join/Prune message can contain both Join messages and Prune messages. A Join/Prune message that
contains only Join information is called a Join message. A Join/Prune message that contains only Prune
information is called a Prune message.
• When a PIM device no longer has multicast receivers, it sends Prune messages through its upstream
interfaces to instruct the upstream device to stop forwarding packets to the network segment on which
the PIM device resides.
• When a receiver starts to require data from a PIM-SM network, the receiver's DR sends a Join message
through the reverse path forwarding (RPF) interface towards the RP to instruct the upstream neighbor
to forward packets to the receiver. The Join message is sent upstream hop by hop to set up an RPT.
• When an RP triggers an RPT-to-SPT switchover, the RP sends a Join message through the RPF interface
that points to the source to instruct the upstream neighbor to forward packets to the network segment.
The Join message is sent upstream hop by hop to set up an MDT from the RP to the source.
• When a receiver's DR triggers an RPT-to-SPT switchover, the DR sends a Join message through the RPF
interface that points to the source to instruct the upstream neighbor to forward packets to the network
segment. The Join message is sent upstream hop by hop to set up an SPT.
• A PIM shared network segment may be connected to a downstream interface and multiple upstream
interfaces. If an upstream interface sends a Prune message, but other upstream interfaces still require
multicast packets, these interfaces that require multicast packets must send Join messages within the
override-interval. Otherwise, the downstream interface responsible for forwarding packets on the
network segment performs the prune action.
■ If PIM is enabled on the interfaces of user-side routers, a receiver's DR is elected, and outbound interfaces are
added to the PIM DR's outbound interface list. The PIM DR then sends Join messages to the RP.
As shown in Figure 7, interface 1 on DeviceA is a downstream interface, and interface 2 on DeviceB and
interface 3 on DeviceC are upstream interfaces. If DeviceB sends a Prune message through interface 2,
interface 3 of DeviceC and interface 1 of DeviceA will receive this message. If DeviceC still wants to
receive the multicast data of the group, DeviceC must send a Join message within the override-interval.
This message will notify interface 1 of DeviceA that a downstream Router still wants to receive the
multicast data. Therefore, the prune action is not performed.
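The override behavior in the DeviceA/B/C example can be sketched as a timer on DeviceA's downstream interface (interface 1); the interval value is illustrative:

```python
class DownstreamInterface:
    """Downstream interface of the upstream router on a shared segment."""

    def __init__(self, override_interval=2.5):
        self.override_interval = override_interval
        self.prune_at = None                  # time at which the prune takes effect

    def on_prune(self, now):
        # Delay the prune so other routers on the segment can override it.
        self.prune_at = now + self.override_interval

    def on_join(self, now):
        self.prune_at = None                  # a Join within the interval cancels it

    def pruned(self, now):
        return self.prune_at is not None and now >= self.prune_at

intf = DownstreamInterface()
intf.on_prune(now=0.0)                        # DeviceB's Prune arrives
assert not intf.pruned(now=1.0)               # still inside the override-interval
intf.on_join(now=1.0)                         # DeviceC overrides the prune
assert not intf.pruned(now=5.0)               # forwarding continues
```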
In an IP packet that carries a Join/Prune message, the source address is a local interface's address, the
destination address is 224.0.0.13, and the TTL value is 1. The message is transmitted in multicast mode.
Upstream Neighbor Address (Encoded-Unicast format) Variable length Upstream neighbor's address, that
is, the address of the downstream interface that performs the Join or Prune action on the Router that
receives the Join/Prune message.
Joined Source Address (Encoded-Source format) Variable length Address of the source whose multicast
traffic is requested.
Pruned Source Address (Encoded-Source format) Variable length Address of the source whose multicast
traffic is no longer requested.
Bootstrap Messages
When a dynamic RP is used on a PIM-SM network, candidate-bootstrap Routers (C-BSRs) periodically send
Bootstrap messages through all PIM interfaces to participate in BSR election. The winner continues to send
Bootstrap messages carrying RP-Set information to all PIM devices in the domain.
In an IP packet that carries a Bootstrap message, the source address is a PIM interface's address, the
destination address is 224.0.0.13, and the TTL value is 1. The packet is transmitted in multicast mode,
forwarded hop by hop on the PIM-SM network, and flooded across the entire network.
Fragment Tag 16 bits Random number used to distinguish fragments belonging to different Bootstrap messages.
Hash Mask Length 8 bits Length of the hash mask of the C-BSR.
RP-Count 8 bits Total number of C-RPs that want to serve the group.
Frag RP-Cnt 8 bits Number of C-RP addresses included in this fragment of the
Bootstrap message for the corresponding group range. This
field facilitates parsing of the RP-Set for a given group range,
when carried over more than one fragment.
RP-holdtime 16 bits Aging time of the advertisement message sent by the C-RP.
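The Hash Mask Length above feeds the group-to-RP hash defined in the PIM-SM specification (RFC 7761, section 4.7.2). A sketch assuming IPv4 addresses represented as 32-bit integers; this is an illustrative model, not the product's implementation:

```python
def rp_hash(group: int, hash_mask_len: int, crp_addr: int) -> int:
    """Value(G, M, C) = (1103515245 * ((1103515245 * (G & M) + 12345)
    XOR C) + 12345) mod 2**31, per RFC 7761 section 4.7.2."""
    mask = (0xFFFFFFFF << (32 - hash_mask_len)) & 0xFFFFFFFF
    masked_group = group & mask
    inner = (1103515245 * masked_group + 12345) ^ crp_addr
    return (1103515245 * inner + 12345) % (2 ** 31)

def elect_rp(group: int, hash_mask_len: int, crps: list) -> int:
    # The C-RP with the highest hash value wins; ties go to the
    # highest C-RP address.
    return max(crps, key=lambda c: (rp_hash(group, hash_mask_len, c), c))

# All groups sharing the same masked prefix hash to the same C-RP,
# which spreads group ranges across the candidates.
crps = [0x0A000001, 0x0A000002]            # 10.0.0.1, 10.0.0.2
winner = elect_rp(0xE8000001, 30, crps)    # group 232.0.0.1, hash mask /30
assert winner in crps
```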
The BSR boundary can be set using the pim bsr-boundary command on a PIM interface. Multiple BSR
boundary interfaces divide the network into different PIM-SM domains. Bootstrap messages cannot pass
through these boundaries.
Assert Messages
On a shared network segment, if a PIM device receives an (S, G) packet through a downstream interface of
its (S, G) or (*, G) entry, other forwarders exist on the network segment. The Router then sends an Assert
message through the downstream interface to participate in forwarder election. The devices that lose the
election stop forwarding multicast packets through the downstream interface.
In an IP packet that carries an Assert message, the source address is a local interface's address, the
destination address is 224.0.0.13, and the TTL value is 1. The packet is transmitted in multicast mode.
Field Length Description
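Forwarder election compares the competing routes to the source carried in Assert messages: lower metric preference wins, then lower metric, and the higher interface address breaks remaining ties (standard PIM assert rules). A minimal sketch with placeholder tuples:

```python
def assert_winner(a, b):
    """a, b: (metric_preference, metric, interface_addr) tuples.
    Lower preference wins, then lower metric, then higher address."""
    key = lambda c: (c[0], c[1], -c[2])
    return a if key(a) < key(b) else b

assert assert_winner((100, 10, 1), (90, 50, 2)) == (90, 50, 2)    # preference first
assert assert_winner((100, 10, 1), (100, 5, 2)) == (100, 5, 2)    # then metric
assert assert_winner((100, 10, 9), (100, 10, 2)) == (100, 10, 9)  # then higher address
```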
Graft Messages
On the PIM-DM network, when a Router receives a Report message from a host, the Router sends a Graft
message through the upstream interface of the related (S, G) entry if the Router is not on the SPT. The
upstream neighbor immediately restores the forwarding of the downstream interface. If the upstream
neighbor is not on the SPT, the neighbor forwards the Graft message upstream.
In the IP packet that carries a Graft message, the source address is the local interface's address, and the
destination address is the RPF neighbor's address. The packet is sent in unicast mode.
The format of the Graft message is the same as that of the Join/Prune message except for the values of
some fields. Table 9 shows the values of these fields in the Graft message.
Field Description
Number of Pruned Sources This field is not used in a Graft message. The value is 0.
Graft-Ack Messages
On the PIM-DM network, when a Router receives a Graft message from a downstream device, the Router
restores the forwarding of the related downstream interface and sends a Graft-Ack message through the
downstream interface to acknowledge the Graft message. If the Router that sent the Graft message does
not receive any Graft-Ack message in the set period, the Router considers that the upstream device does not
receive the Graft message and resends it.
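The retransmit-until-acknowledged behavior can be sketched as follows; the retry bound and callables are illustrative:

```python
def send_graft(transmit, ack_received, max_tries=5):
    """Retransmit a Graft until the upstream Graft-Ack is seen."""
    for attempt in range(1, max_tries + 1):
        transmit()                      # send (or resend) the Graft
        if ack_received():              # Graft-Ack observed for this Graft
            return attempt
    raise TimeoutError("no Graft-Ack received")

acks = iter([False, False, True])       # the Ack arrives after the third Graft
sent = []
tries = send_graft(lambda: sent.append("Graft"), lambda: next(acks))
assert tries == 3 and len(sent) == 3
```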
The source address of the IP packet that carries the Graft-Ack message is the downstream interface address
of an upstream device and the destination address is the address of the Router that sent the Graft message.
The packet is sent in unicast mode.
The format of the Graft-Ack message is the same as that of the Graft message, and the Graft-Ack message
copies the contents of the Graft message. The values of some fields in the Graft-Ack message are different
from those in the Graft message, as described in Table 10.
Field Description
Upstream Neighbor Address (Encoded-Unicast format) Indicates the address of the Router that sends out
the Graft message.
Advertisement Messages
When a dynamic RP is used, C-RPs periodically send Advertisement messages to notify the BSR of the range
of groups they want to serve.
In an IP packet that carries an Advertisement message, the source address is the source's C-RP address, and
the destination address is the BSR's address. The packet is transmitted in unicast mode.
State-Refresh Message
In the PIM-DM network, to prevent a pruned interface from resuming forwarding when its prune timer
expires, the first-hop Router nearest to the source periodically triggers State-Refresh messages. State-Refresh
messages are flooded across the entire network to refresh the status of prune timers on all Routers.
The source address of the IP packet encapsulated with the State-Refresh message is the downstream
interface address, the destination address is 224.0.0.13, and the TTL value is 1. The packet is sent in multicast
mode.
Originator Address (Encoded-Unicast format) Variable length Indicates the address of the first-hop router.
Metric Preference 32 bits Indicates the priority of the unicast route to the source.
Metric 32 bits Indicates the cost of the unicast route to the source.
Masklength 8 bits Indicates the address mask length of the unicast route to the
source.
TTL 8 bits Indicates the TTL of the State-Refresh message. The TTL is
used to limit the transmission range of the messages. The TTL
value is reduced by 1 each time the State-Refresh message is
forwarded by a Router.
Background
Traditional core networks and backbone networks usually use the IP/MPLS backbone network to transmit
service packets. Deployment of multicast services, such as IPTV, multimedia conferencing, and real-time online
games, continues to increase on the IP/MPLS backbone network. These services require sufficient bandwidth,
assured quality of service (QoS), and high reliability on the bearer network. Currently, the following
multicast solutions are used to run multicast services, but these solutions cannot meet the requirements of
multicast services and network carriers:
• IP multicast technology: It can be deployed on point-to-point (P2P) networks to run multicast services,
reducing network upgrade and maintenance costs. Similar to IP unicast, IP multicast does not support
QoS or traffic planning and has low reliability. Multicast applications place high demands on real-time
transmission and reliability, and IP multicast technology cannot meet these requirements.
• Establishing a dedicated multicast network: A dedicated multicast network is usually constructed over
Synchronous Optical Network (SONET)/Synchronous Digital Hierarchy (SDH). SONET/SDH has high
reliability and provides a high transmission rate. However, such a network is expensive to construct,
incurs significant OPEX, and must be maintained separately.
IP/MPLS backbone network carriers require a multicast solution with high TE capabilities to run multicast services.
Multicast over P2MP TE tunnels can meet the carriers' requirements by establishing tree tunnels to transmit
multicast data. It has the advantages of high IP multicast packet transmission efficiency and assured MPLS
TE end-to-end (E2E) QoS.
Benefits
Deploying P2MP TE on an IP/MPLS backbone network brings the following benefits:
• Simplifies network deployment because multicast protocols, such as PIM, do not need to be deployed on
core devices on the backbone network.
Related Concepts
P2MP TE data forwarding is similar to IP multicast data forwarding. A branch node copies MPLS packets,
performs label operations, and sends only one copy of each packet over every sub-LSP. This process improves
network bandwidth efficiency.
For details on P2MP TE concepts, see Related Concepts in the HUAWEI NE40E-M2 series Feature Description
- MPLS.
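The branch-node replication can be sketched as a label-indexed fan-out table; the label values and sub-LSP names below are placeholders:

```python
# Fan-out table on a P2MP branch node: incoming label -> list of
# (sub-LSP, outgoing label) pairs. Values are illustrative only.
BRANCH_TABLE = {100: [("sub-lsp-1", 201), ("sub-lsp-2", 202)]}

def replicate(in_label, payload):
    """Send exactly one relabeled copy of the packet down each sub-LSP."""
    return [(out_if, (out_label, payload))
            for out_if, out_label in BRANCH_TABLE[in_label]]

copies = replicate(100, b"mcast")
assert copies == [("sub-lsp-1", (201, b"mcast")),
                  ("sub-lsp-2", (202, b"mcast"))]
```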
• Ingresses
The P2MP tunnel interfaces of the ingresses (PE1 and PE2) direct multicast data to a P2MP TE tunnel.
• Egresses
The egresses (PE3, PE4, PE5, and PE6) must be configured to ignore the unicast reverse path forwarding
(URPF) check. Whether to configure multicast source proxy on the egresses is based on the location of
the rendezvous point (RP).
route to the RP. In this case, multicast source proxy can be used to enable the egress to register
multicast source information with the RP.
If a multicast data packet for a group in the any-source multicast (ASM) address range is directed
to an egress which is not directly connected to the multicast source and does not function as the
RP to which the group corresponds, the multicast data packet stops being forwarded. As a result,
downstream hosts cannot receive these multicast data packets. Multicast source proxy can be used
to address this problem. Multicast source proxy enables the egress to send a Register message to
the RP in a PIM domain, such as AR1 or AR2.
Service Overview
Continuing development of the Internet has led to considerable growth in the types of data, voice, and video
information exchanged online. New services, such as Video on Demand (VOD) and Broadcast Television
(BTV) have emerged and continue to develop. Multicast plays an increasingly important role in transmitting
these services.
Multicast services are deployed on the small-scale network shown in Figure 1. An IGP has been deployed,
and each network segment route is reachable. Group members are distributed densely. Users want to receive
VoD information without consuming too many network bandwidth resources.
Networking Description
On the network shown in Figure 1, Hosts A and B are multicast information receivers, each located on a
different leaf network. The hosts receive VoD information in multicast mode. PIM-DM is used throughout
the PIM domain. Device D is connected to the multicast source. Device A is connected to Host A. Devices B and C are connected to Host B.
• IGMP runs between Device A and Host A, between Device B and Host B, and between Device C and
Host B.
When configuring IGMP on Router interfaces, ensure that interface parameters are consistent. All
Routers connected to the same network must run the same version of IGMP (IGMPv2 is recommended)
and be configured with the same interface parameter values, such as the Query timer value and hold
time of memberships. If the IGMP versions or interface parameters are different, IGMP group
memberships are inconsistent on different Routers.
Implementation Solution
On the network shown in Figure 1, Host A and Host B are multicast information receivers on different leaf
networks. The hosts receive VoD information in multicast mode. PIM-SM is configured in the entire PIM
domain. DeviceB is connected to multicast source S1. DeviceA is connected to multicast source S2. DeviceC is
connected to Host A. Devices E and F are connected to Host B.
• As shown in Figure 1, multicast sources are densely distributed. Candidate rendezvous points (C-RPs)
can be deployed on devices close to the multicast sources. Loopback 0 interfaces on Devices A and D
are configured as candidate bootstrap routers (C-BSRs) and C-RPs. A BSR and an RP are elected
dynamically to serve the PIM-SM network.
■ Static RPs are recommended on small- and medium-sized networks because such networks are
stable and have low requirements on network devices.
If only one multicast source exists on the network, setting the device directly connected to the
multicast source as a static RP is recommended. This eliminates the need for the source DR to
register with the RP.
To use a static RP, ensure that all Routers, including the RP, have the same RP information and the
same range of multicast groups that the RP serves.
■ Dynamic RPs or anycast RPs are recommended on large-scale networks because such RPs are easy
to maintain and provide high reliability.
■ Dynamic RP
■ If multiple multicast sources are densely distributed on the network, configuring core
devices close to the multicast sources as C-RPs is recommended.
■ If multiple users are densely distributed on the network, configuring core devices close to
the users as C-RPs is recommended.
■ Anycast-RP
■ Large-scale network: You are advised to specify the RP address in BSR RP mode to
facilitate RP information maintenance.
To ensure RP information consistency, do not use static RP addresses on some Routers but RP addresses
dynamically elected by BSRs on other Routers in the same PIM domain.
• IGMP runs between DeviceC and Host A, between DeviceE and Host B, and between DeviceF and Host
B.
When configuring IGMP on Router interfaces, ensure that interface parameters are consistent. All
Routers connected to the same network must run the same IGMP version (IGMPv2 is recommended)
and be configured with the same parameter values, such as the interval at which IGMP Query messages
are sent and holdtime of memberships. If the IGMP versions or interface parameters are different, IGMP
group memberships are inconsistent on different Routers.
• After the network is deployed, Host A and Host B send Join messages to the RP based on service
requirements, and multicast data sent from the multicast source can reach the receivers.
Configuring interfaces on network edge devices to statically join all multicast groups is recommended to increase
the speed for changing channels and to provide a stable viewing environment for users.
Implementation Solution
On the network shown in Figure 1, Host A and Host B are multicast information receivers on different leaf
networks. The hosts receive VoD information in multicast mode. PIM-SSM is configured in the entire PIM
domain. DeviceB is connected to multicast source S1. DeviceA is connected to multicast source S2. DeviceC is
connected to Host A. Devices E and F are connected to Host B.
A receiver in a PIM-SSM scenario can send a Join message directly to a specific multicast source. A shortest path
tree (SPT) is established between the multicast source and the receiver, not requiring rendezvous points (RPs) on
the network.
• IGMP runs between Device C and Host A, between Device E and Host B, and between Device F and Host
B.
When configuring IGMP on Router interfaces, ensure that interface parameters are consistent. All
Routers connected to the same network must run the same IGMP version (IGMPv2 is recommended)
and be configured with the same interface parameter values, such as the Query timer value and hold
time of memberships. If the IGMP versions or interface parameters are different, IGMP group
memberships are inconsistent on different Routers.
• After the network is deployed, Host A directly sends a Join message to multicast source S1, and Host B
directly sends a Join message to multicast source S2. Multicast data sent from the multicast source can
reach the receivers.
Configuring interfaces on network edge devices to statically join all multicast groups is recommended to increase
the speed for changing channels and to provide a stable viewing environment for users.
Service Overview
There is an increasing diversity of multicast services, such as IPTV, multimedia conferencing, and real-time
online multi-player gaming. These services require the bearer network to have the following capabilities:
• Normally and smoothly forward multicast traffic even during traffic congestion.
• Rapidly detect network faults and switch traffic to a backup link if the primary link fails.
Networking Description
Point-to-multipoint (P2MP) traffic engineering (TE) is deployed on an IP/MPLS backbone network to resolve
multicast traffic congestion and maintain reliability. Figure 1 shows the application of P2MP TE for multicast
services on an IP/MPLS backbone network.
Feature Deployment
The deployment of P2MP TE for IP multicast services involves the following aspects:
■ Path planning: Configuring explicit paths is recommended. Prevent the re-merge and cross-over
problems during path planning.
■ RSVP Srefresh: Configure RSVP Srefresh to improve the resource utilization of the backbone
network.
■ P2MP TE FRR: Configure FRR to improve the reliability of the backbone network.
■ Configure PIM on the egresses (PE2 and PE3) to generate multicast forwarding entries. Configure
the devices to ignore reverse path forwarding (RPF) check.
■ An egress cannot forward a received multicast data message of an any-source multicast (ASM)
group if the RPF check result shows that the egress is neither directly connected to the multicast
source nor the rendezvous point (RP) of the multicast group. To enable downstream hosts to
receive the message in such a case, deploy multicast source proxy, which enables the egress to
send a Register message to the RP (for example, SR1) in the PIM domain. The data message can
then be forwarded along an RPT.
Service Overview
There is an increasing diversity of multicast services, such as IPTV, multimedia conferencing, and massively
multiplayer online role-playing games (MMORPGs). To bear these services, service providers' networks must
meet the following requirements:
Networking Description
PIM FRR deployed on user-access devices helps the network prevent multicast traffic congestion
and maintain reliability. PIM FRR is used on the IPTV service network shown in Figure 1.
Feature Deployment
PIM FRR is used to transmit and protect IP multicast services. The process consists of the following stages:
Service Overview
On a non-ECMP network, the IGP LFA FRR function may fail to calculate backup unicast routes. To avoid
multicast service failures, configure static primary and backup routes to establish primary and backup links.
Networking Description
PIM FRR deployed on user-access devices helps the network prevent multicast traffic congestion
and maintain reliability. PIM FRR is used on the IPTV service network shown in Figure 1.
Feature Deployment
PIM FRR is used to transmit and protect IP multicast services. The process consists of the following stages:
Service Overview
PIM over GRE is used to transmit multicast data traffic over GRE tunnels.
Network Description
As shown in Figure 1, the vehicle-mounted control system uses GRE keepalive messages to check network
connectivity. A GRE tunnel to the vehicle-mounted control system must be configured on DeviceA. Multicast
services are transmitted between DeviceA and the vehicle-mounted control system. Therefore, PIM-SM needs
to be enabled on the GRE tunnel interface of DeviceA.
Feature Deployment
A GRE tunnel is configured between DeviceA and the vehicle-mounted control system.
PIM-SM is configured on DeviceA, and PIM-SM is enabled on the GRE tunnel interface.
The multicast data flow received by DeviceA from the source is forwarded to the vehicle-mounted control
system through the GRE tunnel.
11.4.4 Appendix
Feature Name IPv4 PIM IPv6 PIM Implementation Difference
Anycast-RP Anycast-RP implemented using MSDP is supported only in IPv4 PIM scenarios. Anycast-RP
implemented using PIM is supported in both IPv4 PIM and IPv6 PIM scenarios.
Definition
Multicast Source Discovery Protocol (MSDP) is an inter-domain multicast solution that applies to multiple
interconnected Protocol Independent Multicast-Sparse Mode (PIM-SM) domains. Currently, MSDP
applies only to IPv4.
Purpose
A network composed of PIM-SM devices is called a PIM-SM network. In real-world situations, a large PIM-
SM network may be maintained by multiple Internet service providers (ISPs).
A PIM-SM network uses Rendezvous Points (RPs) to forward multicast data. A large PIM-SM network can be
divided into multiple PIM-SM domains. On a PIM-SM network, an RP does not communicate with RPs in
other domains. An RP knows only the local multicast source's location and distributes data only to local
domain users. A multicast source registers only with the local domain RP, and hosts send Join messages only
to the local domain RP. Using this approach, PIM-SM domains implement load splitting among RPs, enhance
network stability, and facilitate network management.
After a large PIM-SM network is divided into multiple PIM-SM domains, a mechanism is required to
implement inter-domain multicast. MSDP provides this mechanism, enabling hosts in the local PIM-SM
domain to receive multicast data from sources in other PIM-SM domains.
In this section, a PIM-SM domain refers to the service range of an RP. A PIM-SM domain can be a domain defined by
bootstrap router (BSR) boundaries or a domain formed after you configure static RPs on the Router.
MSDP Peer
On a PIM-SM network, MSDP enables Rendezvous Points (RPs) in different domains to interwork. MSDP also
enables different PIM-SM domains to share multicast source information by establishing MSDP peer
relationships between RPs.
An MSDP peer relationship can be set up between two RPs in the following scenarios:
To ensure successful reverse path forwarding (RPF) checks in an inter-AS scenario, a BGP or a Multicast Border
Gateway Protocol (MBGP) peer relationship must be established on the same interfaces as the MSDP peer
relationship.
Basic Principles
Setting up MSDP peer relationships between RPs in different PIM-SM domains ensures communication
between PIM-SM domains, thereby forming an MSDP-connected graph.
MSDP peers exchange Source-Active (SA) messages. An SA message carries (S, G) information registered by
the source's DR with the RP. Message exchange between MSDP peers ensures that SA messages sent by any
RP can be received by all the other RPs.
Figure 1 shows a PIM-SM network divided into four PIM-SM domains. The source in the PIM-SM 1 domain
sends data to multicast group G. The receiver in the PIM-SM 3 domain is a member of group G. RP 3 and
the receiver in the PIM-SM 3 domain maintain an RPT for group G.
As shown in Figure 1, the receiver in the PIM-SM 3 domain can receive data sent by the source in the PIM-SM 1
domain after MSDP peer relationships are set up between RP 1, RP 2, and RP 3. The data processing flow is
as follows:
1. The source sends multicast data to group G. DR 1 encapsulates the data into a Register message and
sends the message to RP 1.
2. As the source's RP, RP 1 creates an SA message containing the IP addresses of the source, group G,
and RP 1. RP 1 sends the SA message to RP 2.
3. Upon receiving the SA message, RP 2 performs an RPF check on the message. If the check succeeds,
RP 2 forwards the message to RP 3.
4. Upon receiving the SA message, RP 3 performs an RPF check on the message. If the check succeeds
and (*, G) entries exist on RP 3, indicating that the local domain contains members of group G,
RP 3 creates an (S, G) entry and sends a Join message with the (S, G) information to the source
hop by hop. A multicast path (routing tree) from the source to RP 3 is then set up.
5. After the multicast data reaches RP 3 along the routing tree, RP 3 forwards the data to the receiver
along the rendezvous point tree (RPT).
6. After receiving the multicast data, the receiver determines whether to initiate shortest path tree (SPT)
switchover.
Background
If multiple Multicast Source Discovery Protocol (MSDP) peers exist in the same or different ASs, the
following problems may easily occur:
• Source active (SA) messages are flooded between peers. Especially when many MSDP peers are
configured in the same PIM-SM domain, reverse path forwarding (RPF) rules cannot filter out useless
SA messages effectively. The MSDP peer needs to perform the RPF check on each received SA message,
which brings heavy workload to the system.
Implementation Principle
A mesh group requires every two MSDP peers in the group to set up a peer relationship, implementing full-
mesh connections in the group. To implement the mesh group function, add all MSDP peers in the same and
different ASs to the same mesh group on a multicast device. When a member of the mesh group receives an
SA message, it checks the source of the SA message:
• If the SA message is sent by a member of the mesh group, the member directly accepts the message
without performing the RPF check. In addition, it does not forward the message to other members in
the mesh group.
In real-world situations, adding all MSDP peers in the same and different ASs to the same mesh group is
recommended to prevent SA messages from being discarded due to RPF check failures.
• If the SA message is sent by an MSDP peer outside the mesh group, the member performs the RPF
check on the SA message. If the SA message passes the check, the member forwards it to other
members of the mesh group.
The mesh group mechanism greatly reduces SA messages to be exchanged among MSDP peers, relieving the
workload of the multicast device.
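The handling logic above can be summarized in a short Python sketch. This is an illustrative model only, not device code; the function and data-structure names are hypothetical:

```python
def process_sa_message(sa_source_peer, mesh_group, rpf_check):
    """Mesh-group SA handling: messages from fellow members are accepted
    without an RPF check and are not reflected back into the group;
    messages from outside are RPF-checked and, if valid, relayed to
    every mesh-group member."""
    if sa_source_peer in mesh_group:
        # Intra-group message: accept directly, do not forward in-group.
        return {"accepted": True, "forward_to": set()}
    if not rpf_check(sa_source_peer):
        # External message that fails the RPF check is discarded.
        return {"accepted": False, "forward_to": set()}
    # External message that passes the check is flooded into the group.
    return {"accepted": True, "forward_to": set(mesh_group)}
```

For example, an SA message arriving from a mesh-group member is accepted with an empty forwarding set, which is what suppresses redundant SA flooding inside the group.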
Usage Scenario
In a traditional PIM-SM domain, each multicast group is mapped to only one rendezvous point (RP). When
the network is overloaded or traffic is heavy, many network problems occur. For example, the RP may be
overloaded, routes may converge slowly if the RP fails, or the multicast forwarding path may not be optimal.
To resolve those problems, Anycast-RP is used in MSDP. Anycast-RP allows you to configure multiple
loopback interfaces as RPs in a PIM-SM domain, assign the same IP address to each of these loopback
interfaces, and set up MSDP peer relationships between these RPs. These configurations help select the
optimal paths and RPs and implement load splitting among the RPs.
If Anycast-RP is not applied to a PIM-SM domain, multicast source information and multicast group joining
information need to be aggregated to the same RP. As a result, the load of a single RP is heavy. Anycast-RP
can resolve this problem. In addition, a receiver sends Join messages to the nearest RP, and a multicast
source registers with the nearest RP, which ensures the optimal RP path.
Implementation Principle
As shown in Figure 1, in a PIM-SM domain, the multicast sources, S1 and S2, send multicast data to the
multicast group G. U1 and U2 are members of group G.
Figure 1 Anycast-RP
2. The receiver sends a Join message to the nearest RP and sets up a rendezvous point tree (RPT). The
multicast source registers with the nearest RP. RPs exchange source active (SA) messages to share
multicast source information.
3. Each RP joins a shortest path tree (SPT) with the source's designated router (DR) at the root. After the
receiver receives the multicast data, it determines whether to initiate the SPT switchover.
• MD5 authentication
MSDP uses TCP as the transport layer protocol. To enhance MSDP security, you can configure MD5 to
authenticate TCP connections. If a TCP connection fails to be authenticated, the TCP connection cannot
be established.
• Keychain authentication
Keychain authentication works at the application layer. This authentication method ensures smooth
service transmission and improves security by periodically changing the authentication password and
encryption algorithm. Keychain authenticates both MSDP packets and the TCP connection setup process.
For details about keychain, see the "Keychain" chapter in HUAWEI NE40E-M2 series Feature Description
- Security.
The encryption algorithm used for MD5 authentication poses security risks. Therefore, you are advised to use an
authentication mode based on a more secure encryption algorithm.
• Rule 1: If an SA message is sent from an MSDP peer that functions as a source rendezvous point (RP)
constructing the SA message, the receiving multicast device permits the SA message.
• Rule 2: If an SA message is sent from an MSDP peer that is a static RPF peer, the receiving multicast
device permits the SA message. A receiving multicast device can set up MSDP peer relationships with
multiple other multicast devices. You can specify one or more MSDP peers as static RPF peers.
• Rule 3: If the receiving multicast device has only one MSDP peer, the peer automatically becomes an
RPF peer. The receiving multicast device permits SA messages sent from this peer.
• Rule 4: If an SA message is sent from an MSDP peer that is in the same mesh group as the receiving
multicast device, the receiving multicast device permits the SA message. The receiving multicast device
does not forward the SA message to MSDP peers in the mesh group but forwards it to all MSDP peers
outside the mesh group.
• Rule 5: If an SA message is sent from an MSDP peer that is a route advertiser or the next hop of a
source RP, the receiving multicast device permits the SA message. If a network has multiple equal-cost
routes to a source RP, the receiving multicast device permits SA messages sent from all MSDP peers on
the equal-cost routes.
• Rule 6: If a network has inter-AS routes to a source RP, the receiving multicast device permits SA
messages sent from MSDP peers whose AS numbers are recorded in the AS-path.
If an SA message matches any of rules 1 to 4, the receiving multicast device permits the SA message. The
application of rules 5 and 6 depends on route types.
■ If an MSDP peer is an External Border Gateway Protocol (EBGP) or MEBGP peer, rule 6 applies.
■ If an MSDP peer is an Internal Border Gateway Protocol (IBGP) or MIBGP peer, rule 5 applies.
■ If an MSDP peer is not a BGP or an MBGP peer and the route to the source RP is an inter-AS route,
rule 6 applies. Rule 5 applies in other cases.
■ If no routes exist, the receiving multicast device discards SA messages sent from MSDP peers.
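The six rules can be condensed into a Python sketch. This is a deliberate simplification for illustration: `rpf_route_peers` stands in for the peers that rules 5 and 6 would derive from BGP/MBGP routing state (route advertiser, next hop, equal-cost next hops, or peers whose AS numbers appear in the AS-path), and all parameter names are hypothetical:

```python
def permit_sa(peer, source_rp, peer_count, mesh_group,
              static_rpf_peers, rpf_route_peers):
    """Decide whether to permit an SA message from a given MSDP peer,
    following the order of the six RPF rules described above."""
    if peer == source_rp:            # Rule 1: peer is the source RP itself
        return True
    if peer in static_rpf_peers:     # Rule 2: statically configured RPF peer
        return True
    if peer_count == 1:              # Rule 3: the only MSDP peer
        return True
    if peer in mesh_group:           # Rule 4: member of the same mesh group
        return True
    # Rules 5/6: peers derived from the route toward the source RP
    # (which of the two rules applies depends on the route type).
    return peer in rpf_route_peers
```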
Inter-Domain Multicast
Figure 1 shows an inter-domain multicast application.
• An MSDP peer relationship is set up between rendezvous points (RPs) in two different PIM-SM domains.
Multicast source information can then be shared between the two domains.
• After multicast data reaches RP 1 (the source's RP), RP 1 sends a source active (SA) message that
carries the multicast source information to RP 2.
• After Receiver receives the multicast data, it independently determines whether to initiate an SPT
switchover.
Anycast-RP
Figure 2 shows an Anycast-RP application.
• Device 1 and Device 2 function as RPs and establish an MSDP peer relationship between each other.
• Intra-domain multicast is performed using this MSDP peer relationship. A receiver sends a Join message
to the nearest RP to set up a rendezvous point tree (RPT).
• The multicast source registers with the nearest RP. RPs exchange SA messages to share the multicast
source information.
• After receiving the multicast data, the receiver decides whether to initiate an SPT switchover.
Figure 2 Anycast-RP
Definition
A multicast forwarding table consists of groups of (S, G) entries. In an (S, G) entry, S indicates the source
information, and G indicates the group information. The multicast route management module supports
multiple multicast routing protocols. The multicast forwarding table therefore collects multicast routing
entries generated by various types of protocols.
• Multicast multi-topology
• Multicast Boundary
Purpose
• RPF check
This function is used to find an optimal unicast route to the multicast source and build a multicast
distribution tree. The outbound interface of the unicast route functions as the inbound interface of the
forwarding entry. Then, when the forwarding module receives a multicast data packet, the module
matches the packet with the forwarding entry and checks whether the inbound interface of the packet
is correct. If the inbound interface of the packet is identical with the outbound interface of the unicast
routing entry, the packet passes the RPF check; otherwise, the packet fails the RPF check and is
discarded. The RPF check prevents traffic loops in multicast data forwarding.
• Multicast multi-topology
The multicast multi-topology function helps you plan a multicast topology for multicast services on a
physical network. Then, when a multicast device performs the RPF check, the device searches for routes
and builds a multicast distribution tree only in the multicast topology. In this manner, the problem that
multicast services heavily depend on unicast routes is addressed.
• Multicast Boundary
Multicast boundaries are used to control multicast information transmission by allowing the multicast
information of each multicast group to be transmitted only within a designated scope. A multicast
boundary can be configured on an interface to form a closed multicast forwarding area. After a
multicast boundary is configured for a specific multicast group on an interface, the interface cannot
receive or send multicast packets for the multicast group.
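The RPF check described above can be modeled in a few lines of Python. This is a conceptual sketch assuming a simple per-source route table, not the forwarding-plane implementation:

```python
def rpf_check(packet_in_interface, source, unicast_routes):
    """A packet passes the RPF check only if it arrived on the interface
    that the unicast route back to the source uses as its outbound
    interface; otherwise it is discarded to prevent traffic loops."""
    route = unicast_routes.get(source)
    if route is None:
        return False  # no route back to the source: check fails
    return packet_in_interface == route["out_interface"]
```

In the DeviceC example above, a packet arriving on Port 1 fails the check because the route back to the source leaves via Port 2.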
If all the MIGP, MBGP, and MSR routing tables have candidate routes for the RPF route, the system selects
one optimal route from each of the routing tables. If the routes selected from each table are Rt_urt (migp),
Rt_mbgp, and Rt_msr, the system selects the RPF route based on the following rules:
1. The system compares the preferences of Rt_urt (migp), Rt_mbgp, and Rt_msr. The route with the
smallest preference value is preferentially selected.
2. If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same preference, the system selects the route in
descending order of Rt_msr, Rt_mbgp, and Rt_urt (migp).
3. In a public network scenario, if a BGP route that carries NG MVPN attributes (import RT and
source AS) is preferentially selected, the route will not recurse to the local MT.
• If the multicast longest-match command is run to control route selection based on the route mask:
■ The system compares the mask lengths of Rt_urt (migp), Rt_mbgp, and Rt_msr. The route with the
longest mask is preferentially selected.
■ If routes have the same mask length, the system compares their preferences. The route with the
smallest preference value is preferentially selected.
■ If the routes have the same mask length and preference, the system selects a route in descending
order of Rt_msr, Rt_mbgp, and Rt_urt (migp).
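The two selection modes can be sketched as follows. The dictionary keys (`table`, `preference`, `mask_len`) are illustrative names for the attributes compared above; lower preference values win, and the table order Rt_msr > Rt_mbgp > Rt_urt (migp) is the final tiebreaker:

```python
# Final tiebreaker: msr is preferred over mbgp, which is preferred over migp.
TABLE_RANK = {"msr": 0, "mbgp": 1, "migp": 2}

def select_rpf_route(candidates, longest_match=False):
    """Select the RPF route among the optimal MIGP, MBGP, and MSR
    candidates. By default, preference is compared first; with
    multicast longest-match enabled, mask length is compared first."""
    if longest_match:
        key = lambda r: (-r["mask_len"], r["preference"], TABLE_RANK[r["table"]])
    else:
        key = lambda r: (r["preference"], TABLE_RANK[r["table"]])
    return min(candidates, key=key)
```

For example, a static (MSR) route with a 24-bit mask beats a 16-bit IGP route under longest match, even if the IGP route has a smaller preference value.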
In Figure 1, multicast packets reach DeviceC through Port 1. DeviceC performs the RPF check on the packets
and finds that the actual inbound interface of the packets is inconsistent with the inbound interface (Port 2)
in the corresponding forwarding entry. In this case, the RPF check fails, and DeviceC discards the packets.
Multicast group-based load splitting, multicast source-based load splitting, and multicast source- and multicast group-
based load splitting are all methods of hash mode load splitting.
Based on the hash algorithm, a multicast Router can select a route among several equal-cost routes for each
multicast group. The routes are used for packet forwarding for the groups. As a result, multicast traffic for
different groups can be split into different forwarding paths.
Based on the hash algorithm, a multicast Router can select a route among several equal-cost routes for each
multicast source. The routes are used for packet forwarding for the sources. As a result, multicast traffic
from different sources can be split into different forwarding paths.
Based on the hash algorithm, a multicast Router can select a route among several equal-cost routes for each
source-specific multicast group. The routes are used for packet forwarding for the source-specific multicast
groups. As a result, multicast traffic for different source-specific groups can be split into different forwarding
paths.
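All three hash modes share one idea: hash a key built from the group address, the source address, or both, and use the result to index the list of equal-cost routes. A minimal sketch, using `zlib.crc32` as a stand-in for the device's internal hash algorithm (which is not specified here):

```python
import zlib

def pick_equal_cost_route(equal_cost_routes, source=None, group=None):
    """Hash-mode load splitting: the same (source, group) key always maps
    to the same equal-cost route, so traffic for different keys can be
    split across different forwarding paths."""
    key = f"{source or ''}|{group or ''}".encode()
    index = zlib.crc32(key) % len(equal_cost_routes)
    return equal_cost_routes[index]
```

Passing only `group` models group-based splitting, only `source` models source-based splitting, and both together model source- and group-based splitting.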
A stable-preferred load splitting policy can be used in the preceding three load splitting scenarios, shown in
Figure 1, Figure 2, and Figure 3.
Stable-preferred load splitting enables a Router to select an optimal route for a new join entry. An optimal
route is one on which the fewest entries depend. When the network topology and entries are stable, all
entries with the sources on the same network segment are distributed evenly among the equal-cost routes.
If an imbalance occurs after entries are deleted or route costs change, stable-preferred load splitting does
not allow a Router to rebalance the existing entries immediately, but allows the Router to select the optimal
routes for subsequent entries to resolve the imbalance.
Stable-preferred load splitting is based on entries, not traffic. Therefore, if some multicast entries are not
used to guide traffic forwarding, multicast traffic may not be evenly split among outbound interfaces, even
though the outbound interfaces have an equal number of multicast entries.
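Stable-preferred selection can be sketched as follows: each new entry is bound to the equal-cost route on which the fewest entries currently depend, and existing bindings are left untouched. This is an illustrative model only:

```python
def stable_preferred_select(equal_cost_routes, entry_counts):
    """Bind a new (S, G) entry to the equal-cost route that currently
    carries the fewest entries. entry_counts maps route -> number of
    entries already bound to it and is updated in place; existing
    entries are never rebalanced."""
    route = min(equal_cost_routes, key=lambda r: entry_counts.get(r, 0))
    entry_counts[route] = entry_counts.get(route, 0) + 1
    return route
```

With two equal-cost routes and a stable topology, successive new entries alternate between the routes, which is what keeps entries evenly distributed.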
1. If the longest match principle is configured for route selection, a route with the longest matched mask
is chosen by the multicast router.
For example, there is a multicast source with the IP address of 10.1.1.1, and multicast data needs to be
sent to a host with the IP address of 192.168.1.1. There are two reachable routes to the source in the
static routing table and intra-domain unicast routing table, and the destination network segments are
10.1.0.0/16 and 10.1.1.0/24. Based on the longest match principle for route selection, the route to the
network segment of 10.1.1.0/24 is chosen as the forwarding path for the multicast data.
2. If the mask lengths of the routes are the same, the route with a higher priority is chosen as the
forwarding path for the multicast data.
3. If the mask lengths and priorities of the routes are the same, a route is selected in the order of a static
route, an inter-domain unicast route, and an intra-domain unicast route as the forwarding path for
multicast data.
4. If none of the preceding conditions can determine a forwarding path for multicast data, the route with
the highest next-hop address is chosen.
• Use multicast multi-topology to deploy multicast services on a network that has a unidirectional
Multiprotocol Label Switching – Traffic Engineering (MPLS TE) tunnel configured.
On the network, a unidirectional MPLS TE tunnel is established, and multicast services are enabled.
After Interior Gateway Protocol (IGP) Shortcut or Forwarding Advertise (FA) is configured, the outbound
interface of the route calculated by IGP is not the actual physical interface but a TE tunnel interface. A
receiver joins a multicast group, but the multicast data sent by the server can only travel through Device
E and reach Device C through a physical link. This is because the TE tunnel is unidirectional. Device C
has no multicast routing entries, so it does not forward the multicast data to the receiver. The multicast
service fails to work for this receiver.
Multicast multi-topology resolves this problem by dividing the network into several logical topologies.
For example, the links in green shown in Figure 1 construct a multicast topology and the network
operators deploy multicast services only in the multicast topology. Then, after Device A receives a Join
message from the receiver and performs the RPF check, it selects only the route in the multicast
topology with the upstream device being Device D and sets up an MDT hop-by-hop. The multicast data
travels through the path Device E → Device D → Device A and successfully reaches the receiver.
Usage Scenario
Multicast boundaries are used to control multicast information transmission by allowing the multicast
information of each multicast group to be transmitted only within a designated scope. A multicast boundary
can be configured on an interface to form a closed multicast forwarding area. After a multicast boundary is
configured for a specific multicast group on an interface, the interface cannot receive or send multicast
packets for the multicast group.
Principles
As shown in Figure 1, DeviceA, DeviceB, and DeviceC form multicast domain 1. DeviceD, DeviceE, and DeviceF
form multicast domain 2. The two multicast domains communicate through DeviceB and DeviceD.
Interface 1 and Interface 2 in this example are GE 1/0/0 and GE 2/0/0, respectively.
To isolate the data for a multicast group G from the other multicast domain, configure a multicast boundary
on GE 1/0/0 or GE 2/0/0 for group G. Then, the interface no longer forwards data to and receives data from
group G.
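The boundary check amounts to a simple per-interface filter, sketched below. The interface names follow the example above; the data structure itself is hypothetical:

```python
def boundary_allows(interface_boundaries, interface, group):
    """Once a multicast boundary for a group is configured on an
    interface, that interface neither sends nor receives packets for
    the group; all other traffic is unaffected."""
    return group not in interface_boundaries.get(interface, set())
```

Configuring a boundary for group G on GE 1/0/0 blocks G's traffic on that interface while leaving GE 2/0/0 and other groups untouched.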
Definition
Multicast VPN (MVPN) in Rosen Mode is based on the multicast domain (MD) scheme defined in relevant
standards. MVPN in Rosen Mode implements multicast service transmission over MPLS/BGP VPNs.
Purpose
MVPN in Rosen Mode transmits multicast data and control messages of PIM instances in a VPN over a
public network to remote sites of the VPN.
With MVPN in Rosen Mode, a public network PIM instance (called a PIM P-instance) does not need to know
multicast data transmitted in a PIM VPN instance (called a PIM C-instance), and a PIM C-instance does not
need to know multicast data transmitted in a PIM P-instance. Therefore, MVPN in Rosen Mode isolates
multicast data between a PIM P-instance and a PIM C-instance.
• Share-group
A share-group is a group that all PE VPN instances in the same MD should join. A VPN instance can join
a maximum of one share-group.
• Share-MDT
A share-multicast distribution tree (share-MDT) transmits PIM protocol packets and data packets
between PEs in the same VPN instance. A share-MDT is built when PIM C-instances join share-groups.
• MTI
A multicast tunnel interface (MTI) is the outbound or inbound interface of a multicast tunnel (MT) or
an MD. MTIs are used to transmit VPN data between local and remote PEs.
An MTI is regarded as a channel through which the public network instance and a VPN instance
communicate. An MTI connects a PE to an MT on a shared network segment and sets up PIM neighbor
relationships between PE VPN instances in the same MD.
• Switch-group
A switch-group is a group that all VPN data receivers' PEs join. Switch-groups are the basis of
switch-MDT setup.
• Switch-MDT
A switch-multicast distribution tree (switch-MDT) implements on-demand multicast data transmission,
so a switch-MDT transmits multicast data to only PEs that require the multicast data. A switch-MDT can
be built after a share-MDT is set up and VPN data receivers' PEs join a switch-group.
• A PIM instance that runs in a VPN instance bound to a PE is called a VPN-specific PIM instance or a PIM
C-instance.
• A PIM instance that runs in a public network instance bound to a PE is called a PIM P-instance.
1. MVPN establishes a multicast tunnel (MT) between every two PIM C-instances.
2. Each PIM C-instance creates a multicast tunnel interface (MTI) to connect to a specific MT.
VPN instances with the same share-group address construct a multicast domain (MD).
On the network shown in Figure 1, VPN BLUE instances bound to PE1 and PE2 communicate through MD
BLUE, and VPN RED instances bound to PE1 and PE2 communicate through MD RED. See Figure 2 and
Figure 3.
The PIM C-instance on the local PE considers the MTI as a LAN interface and sets up a PIM neighbor
relationship with the remote PIM C-instance. The PIM C-instances then use the MTIs to perform DR election,
send Join/Prune messages, and transmit multicast data.
The PIM C-instances send PIM protocol packets or multicast data packets to the MTIs and the MTIs
encapsulate the received packets. The encapsulated packets are public network multicast data packets that
are forwarded by PIM P-instances. Therefore, an MT is actually a multicast distribution tree on a public
network.
• VPNs use different MTs, and each MT uses a unique packet encapsulation mode, so multicast data is
isolated between VPNs.
• PIM C-instances on PEs in the same VPN use the same MT and communicate through this MT.
A VPN uniquely defines an MD. An MD serves only one VPN. This relationship is called a one-to-one relationship. The
VPN, MD, MTI, and share-group are all in a one-to-one relationship.
1. The PIM P-instance on PE1 sends the rendezvous point (RP) a Join message that carries a share-group
address. The RP, that is the P device, receives the Join message and creates the (*, 239.1.1.1) entry. PE2
and PE3 also send Join messages to the RP. A rendezvous point tree (RPT) is thus created in the
multicast domain (MD), with the RP at the root and PE1, PE2, and PE3 at the leaves.
2. The PIM P-instance on PE1 sends the RP a Register message that has the multicast tunnel interface
(MTI) address as the source address and the share-group address as the group address. The RP
receives the Register message and creates the (10.1.1.1, 239.1.1.1) entry. PE2 and PE3 also send
Register messages to the RP. Then, three independent RP-source trees that connect PEs to the RP are
built in the MD.
On the PIM-SM network, an RPT (*, 239.1.1.1) and three independent RP-source trees construct a share-
MDT.
2. The PE adds the MTI address as the source address and the share-group address as the group address
to the message and converts the message to a multicast data message of the public network,
regardless of whether the message is a protocol message or a data message. Figure 1 shows the
encapsulation format of a public network multicast data message.
4. The public network instance forwards the message to a public network instance on a remote PE along
the share-MDT.
5. The remote PE decapsulates the message, reverts it to a VPN multicast message, and forwards it to a
VPN instance.
Figure 1 shows the message conversion process. Table 1 describes the involved fields.
Field Description
P-IP Header IP header of a public network multicast data message. In this header,
the source address is the MTI's address, and the destination address is
the share-group's address.
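The encapsulation step can be sketched as a wrapper that leaves the VPN (C-) packet intact and prepends a P-IP header whose source is the local MTI address and whose destination is the share-group address. This is a structural sketch following Table 1, not the on-wire encapsulation format:

```python
def encapsulate_for_md(c_packet, mti_address, share_group):
    """Wrap a VPN multicast packet into a public-network multicast data
    packet for transmission along the share-MDT: the original C-IP
    header and payload become the new packet's payload, unchanged."""
    return {
        "p_ip_header": {"src": mti_address, "dst": share_group},
        "payload": c_packet,  # original C-packet travels through unchanged
    }
```

For example, PE1 wraps a VPN packet (192.168.1.1, 225.1.1.1) into a public-network packet (10.1.1.1, 239.1.1.1); the remote PE strips the P-IP header to recover the original packet.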
• MTIs exchange Hello messages to set up PIM neighbors between VPN instances.
• If receivers and the VPN rendezvous point (RP) belong to different sites, receivers send Join messages
across the public network to set up a shared tree.
• If the multicast source and the VPN RP belong to different sites, registration must be initiated across the
public network to set up a source tree.
In the following example, multicast protocol messages are transmitted along a Share-MDT, the public and
VPN networks run PIM-SM, and receivers on the VPN send Join messages across the public network.
As shown in Figure 2, the receiver in VPN A belongs to Site 2 and is connected to CE 2. CE 1 is the RP of the
VPN group G (225.1.1.1) and belongs to Site 1.
The process of transmitting multicast protocol messages along the Share-MDT is as follows:
1. The receiver instructs CE2 to receive and forward data of the multicast group G. CE 2 creates the (*,
225.1.1.1) entry, and then sends a Join message to the VPN RP (CE1).
2. The VPN instance on PE 2 receives the Join message sent by CE2, creates the (*, 225.1.1.1) entry, and
specifies an MTI as the upstream interface. The instance then forwards the Join message for further
processing. The VPN instance on PE2 considers that the Join message has been sent from the MTI.
3. PE 2 encapsulates the Join message with the address of the MTI on PE 2 as the source address and the
share-group address as the group address, and converts the message to a common multicast data
message (10.1.2.1, 239.1.1.1) on the public network. PE2 forwards the multicast data packet to the
public network instance.
4. The share-MDT forwards the multicast data message (10.1.2.1, 239.1.1.1) to the public network
instance on each PE. PEs decapsulate the message and revert it to the Join message sent to the VPN
RP. If the VPN RP (CE1) resides in the site directly connected with a PE, the PE sends the message to
its VPN instances for further processing. Otherwise, the PE discards the Join message.
5. After receiving the Join message, the VPN instance on PE 1 considers that the message is received
from an MTI. The instance creates the (*, 225.1.1.1) entry, and specifies an MTI as the downstream
interface and the interface towards CE1 as the upstream interface. Then, the instance sends the Join
message to the VPN RP.
6. After receiving the Join message from the instance on PE1, CE1 updates or creates the (*, 225.1.1.1)
entry. The multicast shared tree across VPNs is thus set up.
• When receivers and the VPN RP belong to different sites, the VPN multicast data is transmitted across
the public network along a VPN rendezvous point tree (RPT).
• When the multicast source and receivers belong to different sites, the VPN multicast data is transmitted
across the public network along a source tree.
In the following example, the public network and VPNs run PIM-SM. VPN multicast data is transmitted
across the public network along an SPT.
As shown in Figure 3, the multicast source in VPN A sends multicast data to the group G (232.1.1.1). The
receiver belongs to Site 2 and connects to CE 2.
1. The source sends a VPN multicast data message (192.168.1.1, 232.1.1.1) to CE1.
2. CE 1 forwards the VPN multicast data to PE 1 along the SPT. The VPN instance on PE 1 searches for a
matching forwarding entry. If the outbound interface of the forwarding entry contains an MTI, the
instance forwards the VPN multicast data for further processing. The VPN instance on PE 1 then
considers that the VPN multicast data is sent out from the MTI.
3. PE 1 encapsulates the VPN multicast message with the address of the MTI on PE 1 as the source
address and the share-group address as the group address, and converts the message to a public
network multicast data message (10.1.1.1, 239.1.1.1). The message is forwarded to the public network
instance.
4. The share-MDT forwards the multicast data message (10.1.1.1, 239.1.1.1) to the public network
instance on each PE. Each PE decapsulates it, reverts it to VPN multicast data, and forwards it to a
specific VPN instance for further processing. If there is an SPT downstream interface on the PE, the
data is forwarded along the SPT. Otherwise, the data is discarded.
5. PE2 searches for the forwarding entry in the VPN instance and sends the VPN multicast data message
to the receiver. Transmission of this VPN multicast data message is complete.
Background
In the process of establishing a share-multicast distribution tree (share-MDT) described in the previous
section, the VPN instance bound to PE3 has no receivers, but PE3 still receives the VPN multicast data
packets of the group (192.168.1.1, 225.1.1.1). This is a defect of the multicast domain
(MD) scheme: All the PEs belonging to the same MD can receive multicast data packets regardless of
2022-07-08 1964
Feature Description
whether they have receivers. This wastes the bandwidth and imposes extra burden on PEs.
To solve this problem, MVPN provides an optimized solution, the switch-MDT, which allows on-demand
multicast transmission. Traffic is switched from the share-MDT to the switch-MDT when the rate of
multicast traffic on a PE reaches a configured threshold. Only the PEs that have receivers connected to
them receive multicast data from the switch-MDT, which reduces the burden on PEs and saves bandwidth.
Implementation
Figure 1 shows the switch-MDT implementation process based on the assumption that a share-MDT has
been successfully established.
1. On PE1, set 238.1.1.0–238.1.1.255 as the switch-group-pool address range of the switch-MDT and set
the data forwarding rate threshold that triggers a switch-MDT switchover.
2. When the rate of data forwarded by the source connected with CE1 exceeds the configured threshold,
PE1 selects a group address (for example, 238.1.1.0) and periodically sends signaling packets to other
PEs through the share-MDT to instruct them to switch to the switch-MDT.
3. If PE2 has a receiver, after receiving the signaling packet, PE2 joins the group 238.1.1.0. Then, a
switch-MDT is set up. The switch-MDT setup process is similar to that of a share-MDT. If PE3 has no
receivers, after receiving the signaling packet, PE3 does not join the switch-MDT. As a result, only PE2
can receive the VPN multicast data packets of (192.168.1.1, 225.1.1.1). Note that PIM control
messages are still transmitted along the share-MDT.
A switch-MDT switchover occurs only if both of the following conditions are met:
• The source and group addresses of VPN multicast data packets match the source and group
address ranges defined in ACL filtering rules. Otherwise, the packets are still forwarded along the
share-MDT.
• The forwarding rate of VPN multicast data packets exceeds the switchover threshold for a
specified period.
4. In some cases, the forwarding rate of VPN multicast data packets fluctuates around the switchover
threshold. To prevent multicast data packets from being frequently switched between a share-MDT
and a switch-MDT, the system does not immediately perform a switchover after the system detects
that the forwarding rate exceeds the switchover threshold. Instead, the system starts a switch-delay
timer. During the switch-MDT setup, the share-MDT is still used for multicast data packet forwarding.
Therefore, the switch-delay timer helps implement non-stop data forwarding during a switchover from
a share-MDT to a switch-MDT. Before the switch-delay timer expires, the system keeps detecting the
data forwarding rate. If the rate remains consistently higher than the switchover threshold throughout
the timer period, data packets are switched to the switch-MDT. Otherwise, the packets are still
forwarded along the share-MDT.
Multicast traffic is switched back from the switch-MDT to the share-MDT if any of the following
conditions is met:
• The forwarding rate of VPN multicast data packets remains lower than the switchover threshold
throughout the switch-Holddown period. In some cases, the forwarding rate fluctuates around the
switchover threshold. To prevent the multicast data flow from being frequently switched between a
switch-MDT and a share-MDT, the system does not perform a switchover immediately after it detects
that the forwarding rate is lower than the switchover threshold. Instead, the system starts a Holddown
timer and keeps detecting the forwarding rate before the timer expires. If the rate remains consistently
lower than the switchover threshold throughout the timer period, the data packets are switched back
to the share-MDT. Otherwise, the packets are still forwarded along the switch-MDT.
• After the switch-group-pool changes, the switch-group address encapsulated in VPN multicast data is
not in the address range of the new switch-group-pool.
• After the advanced ACL rules that control a switch-MDT switchover change, VPN multicast data packets
no longer match the new ACL rules.
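The switch-delay and Holddown behavior can be modeled as a small state machine. The sketch below is illustrative only: it assumes the rate is sampled periodically and counts timer lengths in samples, and all names are invented.

```python
# Illustrative state machine for share-MDT <-> switch-MDT switchover.
# Assumption: one rate sample per tick; timers are expressed in ticks.

class MdtSwitcher:
    def __init__(self, threshold, switch_delay, holddown):
        self.threshold = threshold
        self.switch_delay = switch_delay  # ticks rate must stay above threshold
        self.holddown = holddown          # ticks rate must stay below threshold
        self.on_switch_mdt = False
        self.run = 0                      # consecutive ticks meeting the condition

    def sample(self, rate):
        """Feed one rate sample; return the MDT in use after the sample."""
        if not self.on_switch_mdt:
            # Waiting out the switch-delay timer: the rate must stay high
            # for the whole period, otherwise the count restarts.
            self.run = self.run + 1 if rate > self.threshold else 0
            if self.run >= self.switch_delay:
                self.on_switch_mdt, self.run = True, 0
        else:
            # Waiting out the Holddown timer: the rate must stay low
            # for the whole period before switching back.
            self.run = self.run + 1 if rate < self.threshold else 0
            if self.run >= self.holddown:
                self.on_switch_mdt, self.run = False, 0
        return "switch-MDT" if self.on_switch_mdt else "share-MDT"

s = MdtSwitcher(threshold=100, switch_delay=3, holddown=3)
assert [s.sample(r) for r in (150, 150, 150)][-1] == "switch-MDT"
# A brief spike back above the threshold resets the Holddown count,
# so traffic only returns to the share-MDT after three low samples in a row.
assert [s.sample(r) for r in (50, 150, 50, 50, 50)][-1] == "share-MDT"
```

The reset-on-fluctuation behavior is exactly what prevents flapping between the two trees when the rate hovers around the threshold.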
Background
Rosen MVPN supports only intra-VPN multicast service distribution. To enable a service provider on a VPN to
provide multicast services for users on other VPNs, use MVPN extranet.
Implementation
Remote cross: the source and receiver VPN instances reside on different PEs. The device supports the
configuration of a source VPN instance on a receiver PE.
• The address range of multicast groups using the MVPN extranet service cannot overlap that of multicast groups
using the intra-VPN service.
• Only a static RP can be used in an MVPN extranet scenario. The same static RP address must be configured on the
source and receiver VPN sides, and the static RP address must belong to the source VPN. If different RP addresses
are configured, inconsistent multicast routing entries will be created on the two instances, causing service
forwarding failures.
• To provide an SSM service using MVPN extranet, the same SSM group address must be configured on the source
and receiver VPN sides.
Remote Cross
On the network shown in Figure 1, VPN GREEN is configured on PE1; PE1 encapsulates packets with the
share-group G1 address; CE1 connects to the multicast source in VPN GREEN. VPN BLUE is configured on
PE2; PE2 encapsulates packets with the share-group G2 address; CE2 connects to the multicast source in
VPN BLUE. VPN BLUE is also configured on PE3; PE3 encapsulates packets with the share-group G2 address
and establishes a multicast distribution tree (MDT) with PE2 on the public network. Users connected to CE3
require multicast data from both VPN BLUE and VPN GREEN.
Configure source VPN GREEN on PE3 and a multicast routing policy for receiver VPN BLUE. Table 2 describes
the implementation process.
1 CE3 CE3 receives an IGMP Report message from a receiver that requires data from the
multicast source in VPN GREEN and then sends a PIM Join message to PE3.
2 PE3 After PE3 receives the PIM Join message from CE3 in VPN BLUE, it creates a multicast
routing entry. Through the RPF check, PE3 determines that the upstream interface of the
RPF route belongs to VPN GREEN. Then, PE3 adds the upstream interface (serving as an
extranet inbound interface) to the multicast routing table.
3 PE3 PE3 encapsulates the PIM Join message with the share-group G1 address of VPN GREEN
and sends the PIM Join message to PE1 in VPN GREEN over the public network.
4 PE1 After PE1 receives the multicast data from the source in VPN GREEN, PE1 encapsulates
the multicast data with the share-group G1 address of VPN GREEN and sends the data to
PE3 in VPN GREEN over the public network.
5 PE3 PE3 decapsulates and imports the received multicast data to receiver VPN BLUE and
sends the data to CE3. Then, CE3 forwards the data to the receiver in VPN BLUE.
Local Cross
On the network shown in Figure 2, PE1 is the source PE of VPN BLUE. CE1 connects to the multicast source
in VPN BLUE, and CE4 connects to the multicast source in VPN GREEN. Both CE3 and CE4 reside on the
same side of PE3. Users connected to CE3 require multicast data from both VPN BLUE and VPN GREEN.
Table 3 describes how MVPN extranet is implemented in the local crossing scenario.
1 CE3 CE3 receives an IGMP Report message from a receiver that requires data from the
multicast source in VPN GREEN and then sends a PIM Join message to PE3.
2 PE3 After PE3 receives the PIM Join message, it creates a multicast routing entry of VPN
BLUE. Through the RPF check, PE3 determines that the upstream interface of the RPF
route belongs to VPN GREEN. PE3 then imports the PIM Join message to VPN GREEN.
3 PE3 PE3 creates a multicast routing entry in VPN GREEN, records receiver VPN BLUE in the
entry, and sends the PIM Join message to CE4 in VPN GREEN.
4 PE3 After CE4 receives the PIM Join message, it sends the multicast data from VPN GREEN to
PE3. PE3 imports the multicast data to receiver VPN BLUE based on the multicast routing
entries of VPN GREEN.
5 PE3 PE3 sends the multicast data to CE3 based on the multicast routing entries of VPN BLUE.
Then, CE3 forwards the data to the receiver in VPN BLUE.
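In both the remote and local cross cases, the decisive step is an RPF check whose result crosses VPN instances. The sketch below illustrates that decision on PE3; the table, interface names, and return values are invented for illustration and do not reflect the device's actual data structures.

```python
# Simplified sketch of the extranet RPF decision on a receiver PE (PE3).
# When the RPF route toward a source points to an interface owned by a
# different (source) VPN instance, the Join is imported into that instance
# and the interface becomes an extranet inbound interface.

RPF_TABLE = {
    # hypothetical source prefix -> (upstream interface, owning VPN instance)
    "10.10.0.0/16": ("GE0/0/1", "GREEN"),  # extranet source in VPN GREEN
    "10.20.0.0/16": ("GE0/0/2", "BLUE"),   # ordinary intra-VPN source
}

def route_join(receiver_vpn, source_prefix):
    upstream_if, owner_vpn = RPF_TABLE[source_prefix]
    if owner_vpn != receiver_vpn:
        # Extranet case: import the Join into the source VPN instance and
        # record the receiver VPN in the source VPN's multicast entry.
        return ("import-to", owner_vpn, upstream_if)
    # Ordinary case: process the Join inside the receiver VPN instance.
    return ("forward-in", receiver_vpn, upstream_if)

assert route_join("BLUE", "10.10.0.0/16") == ("import-to", "GREEN", "GE0/0/1")
assert route_join("BLUE", "10.20.0.0/16") == ("forward-in", "BLUE", "GE0/0/2")
```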
In MVPN extranet scenarios where the multicast source resides on a public network and the receiver resides on a VPN,
static routes to the multicast source and public network RP must be configured in the receiver VPN instance. After the
device where the receiver VPN instance resides imports the PIM join message from the VPN instance to the public
network instance and establishes a multicast routing entry, the device can send multicast data from the public network
instance to the VPN instance, and then to the receivers. Multicast protocol and data packets can be directly forwarded to
the receiver without the need to be encapsulated and decapsulated by GRE.
Background
Multicast packets, including protocol packets and data packets, are transmitted over the public network to
private networks along public network multicast distribution trees (MDTs). Public network MDTs are
categorized into the following types:
• PIM-SM MDT: an MDT established by sending PIM-SM Join messages to the rendezvous point (RP).
PIM-SM MDTs are used in scenarios in which the location of the multicast source (the MTI address) is unknown.
• PIM-SSM MDT: an MDT established by sending PIM-SSM Join messages to the multicast source. PIM-
SSM MDTs are used in scenarios in which the location of the multicast source (the MTI address) is known.
A PIM-SSM MDT can be established only when the location of the public network multicast source (address
of the MTI on the PE) is known.
In MD MVPN scenarios, however, a PE cannot obtain the MTI address of the peer PE before an MDT is
established. Therefore, only the PIM-SM MDT can be used in this case. You can configure the RP on the
public network and establish a public network MDT for PEs through the RP.
In BGP A-D MVPN scenarios, MDT-AD routes are transmitted through BGP MDT-AD messages. MDT-AD
routes carry the public multicast source address, and a PE can obtain the MTI address of the peer PE.
Therefore, a PIM-SSM MDT can be established in this case to transmit multicast protocol and data packets.
In both the MD MVPN and BGP A-D MVPN scenarios, all PEs are logically fully-meshed, and public network
MDTs must be established between PEs. Therefore, public network MDTs can be established, regardless of
whether there is VPN traffic.
The establishment of public network MDTs depends only on the configurations of the VPN share-group
address and the MTI.
Related Concepts
The concepts related to BGP A-D MVPN are as follows:
• Peer: a BGP speaker that exchanges messages with another BGP speaker.
• BGP A-D: a mechanism in which PEs exchange BGP Update packets that carry A-D route information to
obtain and record peer information of a VPN.
Implementation
For multicast VPN in BGP A-D mode, only the MDT-SAFI A-D mode is supported, in which BGP defines a
new address family. In this mode, after a VPN instance is configured on a PE, the PE advertises the VPN
configuration including the RD, Share-Group address, and IP address of the MTI to all its BGP peers. After a
remote PE receives an MDT-SAFI message advertised by BGP, the remote PE compares the Share-Group
address in the message with its Share-Group address. If the remote PE confirms that it is in the same VPN as
the sender of the MDT-SAFI message, the remote PE establishes a PIM-SSM MDT on the public network to
transmit multicast VPN services.
As shown in Figure 1, PE1, PE2, and PE3 belong to VPN1, and join the Share-Group G1. The address of G1 is
within the SSM group address range. BGP MDT-SAFI A-D mode is enabled on each PE. In addition, the BGP
A-D function is enabled on VPN1. The site where CE1 resides is connected to Source of VPN1, and CE2 and
CE3 are connected to VPN1 users. Based on the BGP A-D mechanism, every PE on the network obtains and
records information about all its BGP peers on the same VPN, and then directly establishes a PIM-SSM MDT
on the public network for transmitting multicast VPN services. In this manner, MVPN services can be
transmitted over a public network tunnel based on the PIM-SSM MDT.
The following uses PE3 as an example to describe service processing in MVPN in BGP A-D mode:
1. After being configured with the BGP A-D function, PE1, PE2, and PE3 negotiate session parameters,
and confirm that they all support the BGP A-D function. Then, the PEs can establish BGP peer
relationships. After receiving a BGP Update message from PE1 and PE2, PE3 obtains and records the
BGP peer addresses of PE1 and PE2. The BGP Update messages carry the information about the PEs
that send the messages.
2. VPN1 is configured on PE3. PE3 joins the Share-Group G1. PE3 creates a PIM-SSM entry with G1 being
the group address and the address of PE1 being the source address and another PIM-SSM entry with
G1 being the group address and the address of PE2 being the source address. PE3 then directly sends
PIM Join messages to PE1 and PE2 to establish two PIM-SSM MDTs to PE1 and PE2, respectively.
3. CE3 sends a Join message to PE3. After receiving the Join message, PE3 encapsulates the Join message
with the PIM-SSM Share-Group address, and then sends the message to PE1 over the public network
tunnel. PE1 then decapsulates the received Join message, and then sends it to the multicast source.
4. After the multicast data sent by the multicast source reaches PE1, PE1 encapsulates the multicast data
with the Share-Group address, and then forwards it to PE3 over the public network tunnel. PE3 then
forwards the multicast data to CE3, and CE3 sends the multicast data to the user.
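The per-PE processing of MDT-SAFI A-D routes described in steps 1 and 2 can be sketched as follows. This is an illustrative Python model: the dictionary fields are a simplification of the actual MDT-SAFI message, and the share-group and addresses are invented example values.

```python
# Sketch of how a PE might react to a BGP MDT-SAFI A-D route.
# Assumption: the local share-group G1 lies inside the SSM range.

LOCAL_SHARE_GROUP = "232.100.1.1"   # hypothetical G1

def process_mdt_safi(ad_route, local_share_group=LOCAL_SHARE_GROUP):
    """If the advertised share-group matches ours, the advertising peer is
    in the same VPN: create a PIM-SSM (S, G) entry with S = the peer's MTI
    address and G = the share-group, then send a PIM Join directly to the
    peer to build the PIM-SSM MDT."""
    if ad_route["share_group"] != local_share_group:
        return None                  # different MD: ignore the route
    return ("pim-ssm-join", ad_route["mti_address"], ad_route["share_group"])

pe1_ad = {"rd": "65001:1", "share_group": "232.100.1.1", "mti_address": "1.1.1.9"}
pe2_ad = {"rd": "65001:2", "share_group": "232.100.2.2", "mti_address": "2.2.2.9"}
assert process_mdt_safi(pe1_ad) == ("pim-ssm-join", "1.1.1.9", "232.100.1.1")
assert process_mdt_safi(pe2_ad) is None
```

Because the A-D route carries the peer's MTI address, the (S, G) entry can be created immediately, which is exactly what makes the PIM-SSM MDT possible without an RP.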
The following example uses VPN BLUE to describe how multicast services are isolated between VPNs.
1. After a share-multicast distribution tree (MDT) is established for the BLUE instances, the two BLUE
instances connected with CE 1 and CE 2 exchange multicast protocol packets through a multicast
tunnel (MT).
2. Multicast devices in the BLUE instances can then establish neighbor relationships and send Join,
Prune, and BSR messages to each other. The protocol packets of the BLUE instances are encapsulated
and decapsulated only on the MTs of the PEs, so the devices in the BLUE instances are unaware of the
public network and process multicast protocol and data packets as if they were on a single network.
Multicast data is transmitted within the same MD but is isolated from VPN instances in other MDs.
Terms
Term Definition
SPT Shortest path tree. A multicast tree with a multicast source at the root and group
members at the leaves. An SPT applies to PIM-DM, PIM-SM, and PIM-SSM.
share-group A multicast group that all VPN instances on PEs in the same multicast domain join.
Currently, one VPN instance can be configured with only one share-group, so one VPN
instance can join only one MD.
share-MDT An MDT that is set up when PIM C-instances on PEs join a share-group. A share-MDT
transmits PIM protocol packets and low-rate data packets in a VPN to other PEs within
the same VPN. A share-MDT is considered as a multicast tunnel (MT) within an MD.
switch-group A group that PEs that have multicast VPN receivers join after a share-MDT is set up.
switch-MDT Switch-multicast distribution tree. An on-demand MDT that is set up after PEs that
have multicast VPN receivers join a switch-group. A switch-MDT prevents multicast
data packets from being transmitted to unnecessary PEs and transmits high-rate data
packets to other PEs in the same VPN.
AS autonomous system
RP rendezvous point
Definition
The NG MVPN is a new framework designed to transmit IP multicast traffic across a BGP/MPLS IP VPN. An
NG MVPN uses BGP to transmit multicast protocol packets, and uses PIM-SM, PIM-SSM, P2MP TE, or mLDP
to transmit multicast data packets. The NG MVPN enables unicast and multicast services to be delivered
using the same VPN architecture.
NG MVPN uses BGP to transmit VPN multicast routes and uses MPLS P2MP tunnels to carry VPN multicast
traffic so that the traffic can be transmitted from the multicast sources to the remote VPN site over the
public network.
Figure 1 shows the basic NG MVPN model. Table 1 describes the roles on an NG MVPN.
• Customer edge (CE): a device that directly connects to a service provider network. Usually, a CE is a
router at the customer site. Examples in Figure 1: CE1, CE2, and CE3.
• Provider edge (PE): a device that directly connects to CEs. On an MPLS network, PEs process all VPN
services, so the requirements for PE performance are high. Examples in Figure 1: PE1, PE2, and PE3.
• Receiver site: a site where multicast receivers reside. Examples in Figure 1: the networks where the
receivers reside.
• Sender site: a site where the multicast source resides. Example in Figure 1: the network where the
source resides.
Purpose
BGP/MPLS IP VPNs are widely deployed as they provide excellent reliability and security. In addition, IP
multicast is gaining increasing popularity among service providers as it provides highly efficient point-to-
multipoint (P2MP) traffic transmission. Rapidly developing multicast applications, such as IPTV, video
conference, and distance education, impose increasing requirements on network reliability, security, and
efficiency. As a result, service providers' demand for delivering multicast services over BGP/MPLS IP VPNs is
also increasing. In this context, the MVPN solution is developed. The MVPN technology, when applied to a
BGP/MPLS IP VPN, can transmit VPN multicast traffic to remote VPN sites across the public network.
Rosen MVPNs establish multicast distribution trees (MDTs) using PIM to transmit VPN multicast protocol
and data packets, and have the following limitations:
• VPN multicast protocol and data packets must be transmitted using the MDT, which complicates
network deployment because the multicast function must be enabled on the public network.
• The public network uses GRE for multicast packet encapsulation and cannot leverage the MPLS
advantages, such as high reliability, QoS guarantee, and TE bandwidth reservation, of existing
BGP/MPLS IP VPNs.
NG MVPNs, which have made improvements over Rosen MVPNs, have the following characteristics:
• The public network uses BGP to transmit VPN multicast protocol packets and routing information.
Multicast protocols do not need to be deployed on the public network, simplifying network deployment
and maintenance.
• The public network uses the mature label-based forwarding and tunnel protection techniques of MPLS,
improving multicast service quality and reliability.
Benefits
NG MVPNs, which implement hierarchical forwarding of multicast data and control packets on BGP/MPLS IP
VPNs, offer the following benefits:
• Better service quality and reliability by using mature label-based forwarding and tunnel protection
techniques of MPLS.
PEs on an NG MVPN exchange control messages to implement functions such as MVPN membership
autodiscovery, PMSI tunnel establishment, and VPN multicast group joining and leaving. The following
describes these NG MVPN control messages. All examples in this section are based on the network shown in
Figure 1. On this network:
• The service provider's backbone network provides both unicast and multicast VPN services for vpn1. The
AS number of the backbone network is 65001.
• The multicast source resides at Site1, accesses PE1 through CE1, and sends multicast traffic to multicast
group 232.1.1.1.
• The backbone network provides MVPN services for vpn1 over RSVP-TE or mLDP P2MP tunnels.
Figure 1 NG MVPN
MVPN NLRI
In NG MVPN, MVPN routing information is carried in the network layer reachability information (NLRI) field
of BGP Update messages. The NLRI containing MVPN routing information is called MVPN NLRI. The SAFI of
the MVPN NLRI is 5. Figure 2 shows the MVPN NLRI format.
The fields are described as follows:
• Route type: type of an MVPN route. Seven types of MVPN routes are available. For more
information, see Table 2.
• Length: length of the Route type specific field in the MVPN NLRI.
• Route type specific: MVPN routing information. The value of this field depends on the Route type
field. For details, see Table 2.
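This type/length/value envelope can be illustrated with a minimal parser. The sketch below follows the generic MVPN NLRI framing (one-octet route type, one-octet length, variable route-type-specific field); decoding of the per-type payloads is omitted, and the sample bytes are invented.

```python
# Minimal parser for the MVPN NLRI envelope: 1-byte route type,
# 1-byte length, then a "route type specific" field of that length.

ROUTE_TYPE_NAMES = {
    1: "Intra-AS I-PMSI A-D", 2: "Inter-AS I-PMSI A-D", 3: "S-PMSI A-D",
    4: "Leaf A-D", 5: "Source Active A-D",
    6: "Shared Tree Join", 7: "Source Tree Join",
}

def parse_mvpn_nlri(data: bytes):
    """Split a concatenation of MVPN NLRIs into (type name, payload) pairs."""
    out, i = [], 0
    while i < len(data):
        rtype, length = data[i], data[i + 1]
        payload = data[i + 2:i + 2 + length]
        out.append((ROUTE_TYPE_NAMES.get(rtype, "unknown"), payload))
        i += 2 + length
    return out

# Two back-to-back NLRIs: a Type 1 with a 2-byte body, a Type 6 with none.
parsed = parse_mvpn_nlri(bytes([1, 2, 0xAA, 0xBB, 6, 0]))
assert parsed == [("Intra-AS I-PMSI A-D", b"\xaa\xbb"), ("Shared Tree Join", b"")]
```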
Table 2 describes the types and functions of MVPN routes. Type 1-5 routes are called MVPN A-D routes.
These routes are used for MVPN membership autodiscovery and P2MP tunnel establishment. Type 6 and
Type 7 routes are called C-multicast routes (C is short for Customer. C-multicast routes refer to multicast
routes from the private network). These routes are used for VPN multicast group joining and VPN multicast
traffic forwarding.
1: Intra-AS I-PMSI A-D route. Function: used for MVPN membership autodiscovery in intra-AS scenarios.
MVPN-capable PEs use Intra-AS I-PMSI A-D routes to advertise and learn intra-AS MVPN membership
information. Format: see Figure 3. Fields:
• RD: route distinguisher, an 8-byte field in a VPNv4 address. An RD and a 4-byte IPv4 address prefix
form a VPNv4 address, which is used to differentiate IPv4 prefixes using the same address space.
• Originating router's IP address: IP address of the device that originates Intra-AS A-D routes. In NE40E
implementation, the value is the MVPN ID of the device that originates BGP A-D routes.
2: Inter-AS I-PMSI A-D route. Function: used for MVPN membership autodiscovery in inter-AS scenarios.
MVPN-capable ASBRs use Inter-AS I-PMSI A-D routes to advertise and learn inter-AS MVPN membership
information. Format: see Figure 4. Fields:
• RD: route distinguisher, an 8-byte field in a VPNv4 address. An RD and a 4-byte IPv4 address prefix
form a VPNv4 address, which is used to differentiate IPv4 prefixes using the same address space.
• Source AS: AS where the source device that sends Inter-AS A-D routes resides.
3: S-PMSI A-D route. Function: used by a sender PE to initiate a selective P-tunnel for a particular
(C-S, C-G). Format: see Figure 5. Fields:
• RD: route distinguisher, an 8-byte field in a VPNv4 address. An RD and a 4-byte IPv4 address prefix
form a VPNv4 address, which is used to differentiate IPv4 prefixes using the same address space.
• Multicast source length: length of a multicast source address. The value is 32 if the multicast source
address is an IPv4 address or 128 if it is an IPv6 address.
• Multicast source: address of a multicast source.
• Multicast group length: length of a multicast group address. The value is 32 if the multicast group
address is an IPv4 address or 128 if it is an IPv6 address.
• Multicast group: address of a multicast group.
• Originating router's IP address: IP address of the device that originates A-D routes. In NE40E
implementation, the value is the MVPN ID of the device that originates BGP A-D routes.
4: Leaf A-D route. Function: used to respond to a Type 1 Intra-AS I-PMSI A-D route whose PMSI attribute
has the flags field set to 1, or to a Type 3 S-PMSI A-D route. If a receiver PE has a request for establishing
an S-PMSI tunnel, it sends a Leaf A-D route to help the sender PE collect tunnel information. Format: see
Figure 6. Fields:
• Route key: set to the MVPN NLRI of the received A-D route. NOTE: When responding to an S-PMSI A-D
route, the Route key is set to the MVPN NLRI of the S-PMSI A-D route received.
• Originating router's IP address: IP address of the device that originates A-D routes. In NE40E
implementation, the value is the MVPN ID of the device that originates BGP A-D routes.
5: Source Active A-D route. Function: used by PEs to learn the identity of active VPN multicast sources.
Format: see Figure 7. Fields:
• RD: RD of the sender PE connected to the multicast source.
• Multicast source length: length of a multicast source address. The value is 32 if the multicast source
address is an IPv4 address or 128 if it is an IPv6 address.
• Multicast source: address of a multicast source.
• Multicast group length: length of a multicast group address. The value is 32 if the multicast group
address is an IPv4 address or 128 if it is an IPv6 address.
• Multicast group: address of a multicast group.
6: Shared Tree Join route. Function: used in (*, G) scenarios. A Shared Tree Join route is generated when a
receiver PE receives a (C-*, C-G) PIM Join message. The receiver PE sends the Shared Tree Join route to the
sender PEs with which it has established BGP peer relationships.
NOTE: A (*, G) PIM-SM join initiated by a VPN is called a (C-*, C-G) PIM join. Shared Tree Join routes and
Source Tree Join routes have the same NLRI format; for (C-*, C-G) joins, the multicast source address field
carries the RP address.
Format: see Figure 8. Fields:
• Route type: MVPN route type. The value 6 indicates a Type 6 Shared Tree Join route.
• Rt-import: VRF Route Import Extended Community of the unicast route to the multicast source. For
more information, see MVPN Extended Communities. This attribute is used by sender PEs to determine
whether to process the BGP C-multicast route sent by a receiver PE, and it also helps a sender PE
determine to which VPN instance routing table a BGP C-multicast route should be added.
• Next hop: next hop address.
• RD: RD of the sender PE connected to the multicast source.
• Source AS: Source AS Extended Community of the unicast route to the multicast source. For more
information, see MVPN Extended Communities.
• Multicast source length: length of a multicast source address. The value is 32 if the multicast source
address is an IPv4 address or 128 if it is an IPv6 address.
• RP address: rendezvous point address.
• Multicast group length: length of a multicast group address. The value is 32 if the multicast group
address is an IPv4 address or 128 if it is an IPv6 address.
• Multicast group: address of a multicast group.
7: Source Tree Join route. Function: used in (S, G) scenarios. A Source Tree Join route is generated when a
receiver PE receives a (C-S, C-G) PIM Join message. Format: see Figure 9. Fields:
• RD: RD of the sender PE connected to the multicast source.
On an NG MVPN, the sender PE sets up the P-tunnel and is therefore responsible for originating the PMSI
Tunnel attribute. The PMSI Tunnel attribute can be attached to Intra-AS I-PMSI A-D routes, Inter-AS I-PMSI
A-D routes, or S-PMSI A-D routes and sent to receiver PEs. Figure 10 is an example that shows the format of
an Intra-AS I-PMSI A-D route carrying the PMSI Tunnel attribute.
Figure 10 Intra-AS I-PMSI A-D route carrying the PMSI Tunnel attribute
• The multicast source can be registered with the RP by its directly connected DR, or the source-side
device can receive Join messages sent by the receiver DR, so that multicast data is sent to the receiver.
For details about multicast source registration, see Understanding PIM.
• A multicast user joins a multicast group through IGMP/MLD, and then the multicast device to which the
multicast group belongs sends a Join message to the multicast source through PIM. In this manner, the
multicast user can receive multicast data. For details about how multicast users join multicast groups on
VPNs, see Understanding IGMP and Understanding MLD.
On an NG MVPN, after a BGP peer relationship is established between PEs in the BGP MVPN address family,
the BGP MVPN extended community attribute can be used to carry the VPN multicast route (C-multicast
route) to transmit the join/leave information of multicast users.
• Source AS Extended Community: carried in VPNv4 routes advertised by PEs. This attribute is an AS
extended community attribute and is mainly used in inter-AS scenarios.
• VRF Route Import Extended Community: carried in VPNv4 routes advertised by sender PEs to receiver
PEs. When a receiver PE sends a BGP C-multicast route to a sender PE, the receiver PE attaches this
attribute to the route. In a scenario in which many sender PEs exist, this attribute helps a sender PE that
receives the BGP C-multicast route to determine whether to process the route and to which VPN
instance routing table the BGP C-multicast route should be added.
The value of the VRF Route Import Extended Community is in the format of "Administrator field value:
Local Administrator field value". The Administrator field is set to the local MVPN ID, whereas the Local
Administrator field is set to the local VPN instance ID of the sender PE.
On the network shown in Figure 1, PE1 and PE2 are both sender PEs, and PE3 is a receiver PE. PE1 and
PE2 connect to both vpn1 and vpn2. On PE1, the VRF Route Import Extended Community is 1.1.1.9:1 for
vpn1 and 1.1.1.9:2 for vpn2; on PE2, the VRF Route Import Extended Community is 2.2.2.9:1 for vpn1
and 2.2.2.9:2 for vpn2.
After PE1 and PE2 both establish BGP MVPN peer relationships with PE3, PE1 and PE2 both send to PE3
a VPNv4 route destined for the multicast source 192.168.1.2. The VRF Route Import Extended
Community carried in the VPNv4 route sent by PE1 is 1.1.1.9:1 and that carried in the VPNv4 route sent
by PE2 is 2.2.2.9:1. After PE3 receives the two VPNv4 routes, PE3 adds the preferred route (VPNv4 route
sent by PE1 in this example) to the vpn1 routing table and stores the VRF Route Import Extended
Community value carried in the preferred route locally for later BGP C-multicast route generation.
Upon receipt of a PIM Join message from CE3, PE3 generates a BGP C-multicast route with the RT-
import attribute and sends this route to PE1 and PE2. The RT-import attribute value of this route is the
same as the locally stored VRF Route Import Extended Community value, 1.1.1.9:1.
■ Upon receipt of the BGP C-multicast route, PE1 checks the RT-import attribute of this route. After
PE1 finds that the Administrator field value is 1.1.1.9, which is the same as its local MVPN ID, PE1
accepts this route and adds it to the vpn1 routing table based on the Local Administrator field
value (1).
■ Upon receipt of the BGP C-multicast route, PE2 also checks the RT-import attribute of this route.
After PE2 finds that the Administrator field value is 1.1.1.9, which is different from its local MVPN
ID 2.2.2.9, PE2 drops this route.
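The accept/drop decision in this example can be sketched as a small function from PE1's perspective. The dictionary of VPN instances and the function name are invented for illustration; the attribute values are the ones from the text.

```python
# Sketch of the RT-import check a sender PE performs on a received BGP
# C-multicast route, using the example values from the text.

LOCAL_MVPN_ID = "1.1.1.9"               # PE1's MVPN ID
VPN_INSTANCES = {1: "vpn1", 2: "vpn2"}  # Local Administrator -> VPN instance

def accept_c_multicast(rt_import: str):
    """Return the target VPN instance if the route is for this PE, else None.

    The Administrator field must equal the local MVPN ID; the Local
    Administrator field then selects the VPN instance routing table.
    """
    administrator, local_admin = rt_import.split(":")
    if administrator != LOCAL_MVPN_ID:
        return None                     # not addressed to this PE: drop
    return VPN_INSTANCES.get(int(local_admin))

assert accept_c_multicast("1.1.1.9:1") == "vpn1"  # PE1 accepts into vpn1
assert accept_c_multicast("1.1.1.9:2") == "vpn2"
assert accept_c_multicast("2.2.2.9:1") is None    # addressed to PE2: dropped
```

The same function run on PE2 (with LOCAL_MVPN_ID = "2.2.2.9") would drop "1.1.1.9:1", matching the behavior described above.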
This section describes the process of transmitting VPN multicast routes through the (S, G) and (*, G)
Join/Leave processes of multicast members.
On the network shown in Figure 1, CE1 connects to the multicast source, and CE2 connects to multicast
receivers. CE2 sends PIM (S, G) Join/Prune messages to CE1. This process shows how a multicast member
joins and leaves a multicast group.
Figure 1 NG MVPN
Figure 2 Time sequence for joining a multicast group in PIM (S, G) mode
PE1 After PE1 receives a unicast route destined for the multicast source from CE1, PE1
converts this route to a VPNv4 route, adds the Source AS Extended Community and VRF
Route Import Extended Community to this route, and advertises this route to PE2.
For more information about the Source AS Extended Community and VRF Route Import
Extended Community, see MVPN Extended Community Attributes.
PE2 After PE2 receives the VPNv4 route from PE1, PE2 matches the export VPN target of the
route against its local import VPN target:
If the two targets match, PE2 accepts the VPNv4 route and stores the Source AS
Extended Community and VRF Route Import Extended Community values carried in this
route locally for later generation of the BGP C-multicast route.
If the two targets do not match, PE2 drops the VPNv4 route.
CE2 After CE2 receives an IGMP join request, CE2 sends a PIM-SSM Join message to PE2.
PE2 After PE2 receives the PIM-SSM Join message, PE2 generates a BGP C-multicast route
based on the Source AS Extended Community and VRF Route Import Extended Community
values stored locally. The RT-import attribute of this route is set to the locally stored VRF
Route Import Extended Community value.
NOTE:
In the BGP route with MVPN information, the NLRI field is called MVPN NLRI. The routes whose
Route type value is 6 or 7 are C-multicast routes. For more information about C-multicast route
structure, see MVPN NLRI.
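The route types mentioned in the note come from the MVPN NLRI type space defined in RFC 6514; the following sketch lists them and shows the "type 6 or 7 means C-multicast route" check. The table is from the RFC, but the helper function is an illustrative assumption.

```python
# MVPN NLRI route types (RFC 6514). Types 6 and 7 are the C-multicast
# routes referred to in the note above.
MVPN_ROUTE_TYPES = {
    1: "Intra-AS I-PMSI A-D route",
    2: "Inter-AS I-PMSI A-D route",
    3: "S-PMSI A-D route",
    4: "Leaf A-D route",
    5: "Source Active A-D route",
    6: "Shared Tree Join route",   # C-multicast route for PIM (*, G) joins
    7: "Source Tree Join route",   # C-multicast route for PIM (S, G) joins
}

def is_c_multicast(route_type):
    """True if the MVPN NLRI carries a C-multicast route."""
    return route_type in (6, 7)

assert is_c_multicast(6) and is_c_multicast(7)
assert not is_c_multicast(1)
```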
CE1 After CE1 receives the PIM-SSM Join message, CE1 generates a multicast entry. In this
entry, the downstream interface is the interface that receives the PIM-SSM Join message.
After that, the multicast receiver successfully joins the multicast group, and CE1 can send
multicast traffic to CE2.
Figure 3 shows the procedure for leaving a multicast group, and Table 2 describes this procedure.
CE2 CE2 detects that a multicast receiver attached to itself leaves the multicast group.
PE2 PE2 deletes the corresponding multicast entry after this entry ages out. Then, PE2
generates a BGP Withdraw message.
PE1 After PE1 receives the BGP Withdraw message, PE1 deletes the corresponding multicast
entry and generates a PIM-SSM Prune message.
CE1 After CE1 receives the PIM-SSM Prune message, CE1 stops sending multicast traffic to
CE2.
PIM (*, G) multicast joining and leaving across the public network: PIM (*, G) entries are transmitted
across the public network to remote PEs. The multicast joining process includes rendezvous point tree
(RPT) construction (see Table 2 for more information) and switching from an RPT to a shortest path tree
(SPT) (see Table 3 for more information). The private network rendezvous point (RP) can be deployed on
either a CE or a PE. Currently, a private network RP must be a static RP. RPT-to-SPT switching may occur
on the public network. Therefore, PEs need to maintain a lot of state information.
PIM (*, G) multicast joining and leaving not across the public network: PIM (*, G) entries are converted
to PIM (S, G) entries before being transmitted to remote PEs across the public network. Because PIM (*, G)
entries are not transmitted across the public network, the performance requirements for PEs are lower.
The private network RP can be deployed on either a PE or a CE and can be either a static or a dynamic RP.
If a CE serves as the private network RP, the CE must establish an MSDP peer relationship with the
corresponding PE.
PIM (*, G) Multicast Joining and Leaving Across the Public Network
On the network shown in Figure 1, CE3 serves as the RP. Figure 2 shows the time sequence for establishing
an RPT. Table 2 describes the procedure for establishing an RPT.
CE2 After receiving a user join request through IGMP, CE2 sends a PIM (*, G) Join message to
PE2.
PE2 After receiving the PIM (*, G) Join message, PE2 generates a PIM (*, G) entry, in which
the downstream interface is the interface that receives the PIM (*, G) Join message. PE2
searches for the unicast route to the RP and finds that the upstream device is PE3. PE2
then generates a BGP C-multicast route (Shared Tree Join route) and sends it to PE3
through the BGP peer connection.
PE3 After PE3 receives the BGP C-multicast route (Shared Tree Join route):
PE3 checks the Administrator field and Local Administrator field values in the RT-import
attribute of the BGP C-multicast route. After PE3 confirms that the Administrator field
value is the same as its local MVPN ID, PE3 accepts the BGP C-multicast route.
PE3 determines the VPN instance routing table to which the BGP C-multicast route
should be added based on the Local Administrator field value in the RT-import attribute
of the route.
PE3 adds the BGP C-multicast route to the corresponding VPN instance routing table
and creates a VPN multicast entry to guide multicast traffic forwarding. In the multicast
entry, the downstream interface is PE3's P2MP tunnel interface.
PE3 converts the BGP C-multicast route to a PIM (*, G) Join message and sends this
message to CE3.
CE3 Upon receipt of the PIM (*, G) Join message, CE3 generates a PIM (*, G) entry. In this
entry, the downstream interface is the interface that receives the PIM (*, G) Join
message. Then, an RPT rooted at CE3 and with CE2 as the leaf node is established.
CE1 After CE1 receives multicast traffic from the multicast source, CE1 sends a PIM Register
message to CE3.
CE3 Upon receipt of the PIM Register message, CE3 generates a PIM (S, G) entry, which
inherits the outbound interface of the previously generated PIM (*, G) entry. In addition,
CE3 sends multicast traffic to PE3.
PE3 Upon receipt of the multicast traffic, PE3 generates a PIM (S, G) entry, which inherits the
outbound interface of the previously generated PIM (*, G) entry. Because the outbound
interface of the PIM (*, G) entry is a P2MP tunnel interface, multicast traffic is imported
to the I-PMSI tunnel.
PE2 Upon receipt of the multicast traffic, PE2 generates a PIM (S, G) entry, which inherits the
outbound interface of the previously generated PIM (*, G) entry.
CE2 Upon receipt of the multicast traffic, CE2 sends the multicast traffic to multicast
receivers.
When the multicast traffic sent by the multicast source exceeds the switching threshold set on CE2, CE2
initiates RPT-to-SPT switching. Figure 3 shows the time sequence for switching an RPT to an SPT. Table 3
describes the procedure for switching an RPT to an SPT.
When the receiver PE receives multicast traffic transmitted along the RPT, the receiver PE immediately initiates RPT-to-
SPT switching. The RPT-to-SPT switching process on the receiver PE is similar to that on CE2.
CE2 After the received multicast traffic exceeds the set threshold, CE2 initiates RPT-to-SPT
switching by sending a PIM (S, G) Join message to PE2.
PE2 Upon receipt of the PIM (S, G) Join message, PE2 updates the outbound interface status
in its PIM (S, G) entry, and switches the PIM (S, G) entry to the SPT. Then, PE2 searches
its multicast routing table for a route to the multicast source. After PE2 finds that the
upstream device on the path to the multicast source is PE1, PE2 sends a BGP C-multicast
route (Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP C-multicast route (Source Tree Join route), PE1 generates a PIM
(S, G) entry, and sends a PIM (S, G) Join message to CE1.
CE1 Upon receipt of the PIM (S, G) Join message, CE1 generates a PIM (S, G) entry. Then, the
RPT-to-SPT switching is complete, and CE1 can send multicast traffic to PE1.
PE1 To prevent duplicate multicast traffic, PE1 carries the PIM (S, G) entry information in a
Source Active AD route and sends the route to all its BGP peers.
PE3 Upon receipt of the Source Active AD route, PE3 records the route. After RPT-to-SPT
switching, PE3, the ingress of the P2MP tunnel for the RPT, discards the received
multicast traffic, generates the (S, G, RPT) state, and sends a PIM (S, G, RPT) Prune
message to its upstream device. In addition, PE3 updates its VPN multicast routing
entries and stops forwarding multicast traffic.
NOTE:
To prevent packet loss during RPT-to-SPT switching, the PIM (S, G, RPT) Prune operation is
performed after a short delay.
PE2 Upon receipt of the Source Active AD route, PE2 records the route. Because the Source
Active AD route carries information about the PIM (S, G) entry for the RPT, PE2 initiates
RPT-to-SPT switching. After PE2 sends a BGP C-multicast route (Source Tree Join route)
to PE1, PE2 can receive multicast traffic from PE1.
Figure 4 shows the time sequence for leaving a multicast group in PIM (*, G) mode. Table 4 describes the
procedure for leaving a multicast group in PIM (*, G) mode.
CE2 After CE2 detects that a multicast receiver attached to itself leaves the multicast group,
CE2 sends a PIM (*, G) Prune message to PE2. If CE2 has switched to the SPT, CE2 also
sends a PIM (S, G) Prune message to PE2.
PE2 Upon receipt of the PIM (*, G) Prune message, PE2 deletes the corresponding PIM (*, G)
entry. Upon receipt of the PIM (S, G) Prune message, PE2 deletes the corresponding PIM
(S, G) entry.
PE2 PE2 sends a BGP Withdraw message (Shared Tree Join route) to PE3 and a BGP
Withdraw message (Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP Withdraw message (Source Tree Join route), PE1 deletes the
previously recorded BGP C-multicast route (Source Tree Join route) as well as the
outbound interface in the PIM (S, G) entry.
PE3 Upon receipt of the BGP Withdraw message (Shared Tree Join route), PE3 deletes the
previously recorded BGP C-multicast route (Shared Tree Join route) as well as the
outbound interface in the PIM (S, G) entry.
PIM (*, G) Multicast Joining and Leaving Not Across the Public Network
On the network shown in Figure 1, each site of the MVPN is a PIM-SM BSR domain, and PE2 serves as the RP.
Figure 5 shows the time sequence for joining a multicast group when a PE serves as the RP. Table 5
describes the procedure for joining a multicast group when a PE serves as the RP.
Figure 5 Time sequence for joining a multicast group when a PE serves as the RP
CE2 After receiving a user join request through IGMP, CE2 sends a PIM (*, G) Join message to
PE2.
PE2 Upon receipt of the PIM (*, G) Join message, PE2 generates a PIM (*, G) entry. Because
PE2 is the RP, PE2 does not send the BGP C-multicast route (Shared Tree Join route) to
other devices. Then, an RPT rooted at PE2 and with CE2 as the leaf node is established.
CE1 After CE1 receives multicast traffic from the multicast server, CE1 sends a PIM Register
message to PE1.
PE1 Upon receipt of the PIM Register message, PE1 generates a PIM (S, G) entry.
PE1 PE1 sends a Source Active AD route to all its BGP peers.
PE2 Upon receipt of the Source Active AD route, PE2 generates a PIM (S, G) entry, which
inherits the outbound interface of the previously generated PIM (*, G) entry.
PE2 PE2 initiates RPT-to-SPT switching and sends a BGP C-multicast route (Source Tree Join
route) to PE1.
PE1 Upon receipt of the BGP C-multicast route (Source Tree Join route), PE1 imports
multicast traffic to the I-PMSI tunnel based on the corresponding VPN multicast
forwarding entry. Then, multicast traffic is transmitted over the I-PMSI tunnel to CE2.
Figure 6 shows the time sequence for leaving a multicast group when a PE serves as the RP. Table 6
describes the procedure for leaving a multicast group when a PE serves as the RP.
Figure 6 Time sequence for leaving a multicast group when a PE serves as the RP
CE2 After CE2 detects that a multicast receiver attached to itself leaves the multicast group,
CE2 sends a PIM (*, G) Prune message to PE2.
PE2 Upon receipt of the PIM (*, G) Prune message, PE2 deletes the corresponding PIM (*, G)
entry.
PE2 Upon receipt of the PIM (S, G) Prune message, PE2 deletes the corresponding PIM (S, G)
entry. PE2 sends a BGP Withdraw message (Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP Withdraw message (Source Tree Join route), PE1 deletes the
previously recorded BGP C-multicast route (Source Tree Join route) as well as the
outbound interface in the PIM (S, G) entry. In addition, PE1 sends a PIM (S, G) Prune
message to CE1.
CE1 Upon receipt of the PIM (S, G) Prune message, CE1 stops sending multicast traffic to
CE2.
On the network shown in Figure 1, each site of the MVPN is a PIM-SM BSR domain, and CE2 serves as the RP.
CE3 has established an MSDP peer relationship with PE3, and PE2 has established an MSDP peer relationship
with CE2. Figure 7 shows the time sequence for joining a multicast group when a CE serves as the RP. Table
7 describes the procedure for joining a multicast group when a CE serves as the RP.
Figure 7 Time sequence for joining a multicast group when a CE serves as the RP
CE2 After receiving a user join request through IGMP, CE2 generates a PIM (*, G) Join
message. Because CE2 is the RP, CE2 does not send the PIM (*, G) Join message to its
upstream.
CE1 After CE1 receives multicast traffic from the multicast server, CE1 sends a PIM Register
message to CE3.
CE3 Upon receipt of the PIM Register message, CE3 generates a PIM (S, G) entry.
CE3 CE3 carries the PIM (S, G) entry information in an MSDP Source Active (SA) message
and sends the message to its MSDP peer, PE3.
PE3 Upon receipt of the MSDP SA message, PE3 generates a PIM (S, G) entry.
PE3 PE3 carries the PIM (S, G) entry information in a Source Active AD route and sends the
route to other PEs.
PE2 Upon receipt of the Source Active AD route, PE2 learns the PIM (S, G) entry information
carried in the route. Then, PE2 sends an MSDP SA message to transmit the PIM (S, G)
entry information to its MSDP peer, CE2.
CE2 Upon receipt of the MSDP SA message, CE2 learns the PIM (S, G) entry information
carried in the message and generates a PIM (S, G) entry. Then, CE2 initiates a PIM (S, G)
join request to the multicast source. Finally, CE2 forwards the multicast traffic to
multicast receivers.
Figure 8 shows the time sequence for leaving a multicast group when a CE serves as the RP. Table 8
describes the procedure for leaving a multicast group when a CE serves as the RP.
Figure 8 Time sequence for leaving a multicast group when a CE serves as the RP
CE2 After CE2 detects that a multicast receiver attached to itself leaves the multicast group,
CE2 generates a PIM (*, G) Prune message. Because CE2 is the RP, CE2 does not send the
PIM (*, G) Prune message to its upstream.
PE2 Upon receipt of the PIM (S, G) Prune message, PE2 deletes the corresponding PIM (S, G)
entry. Then, PE2 sends a BGP Withdraw message (Source Tree Join route) to PE1.
PE1 Upon receipt of the BGP Withdraw message (Source Tree Join route), PE1 deletes the
previously recorded BGP C-multicast route (Source Tree Join route) as well as the
outbound interface in the PIM (S, G) entry. In addition, PE1 sends a PIM (S, G) Prune
message to CE1.
CE1 Upon receipt of the PIM (S, G) Prune message, CE1 stops sending multicast traffic to
CE2.
The establishment of NG MVPN tunnels depends on how the public network is deployed, including whether
the public network contains multiple ASs and whether different MPLS protocols are deployed in different
areas. Based on these two factors, NG MVPN deployment scenarios are classified into the following types:
• Intra-AS non-segmented NG MVPN: The public network contains only one AS, and only one MPLS
protocol is deployed.
• Intra-AS segmented NG MVPN: The public network contains only one AS but contains multiple areas.
Different MPLS protocols are deployed in adjacent areas.
• Inter-AS non-segmented NG MVPN: The public network contains multiple ASs, and only one MPLS
protocol is deployed in the ASs.
For details about the NG MVPN deployment scenarios, see NG MVPN Typical Deployment Scenarios on the
Public Network.
Tunnel establishment includes the following basic steps and slightly differs in different scenarios:
Compared with the I-PMSI tunnel, an S-PMSI tunnel sends multicast data only to PEs interested in the data,
reducing bandwidth consumption and PEs' burdens.
The concepts and protocols related to the multicast traffic carried by the public network tunnel are as
follows:
• PMSI Tunnel
• MVPN Targets
PMSI Tunnel
Public tunnels (P-tunnels) are transport mechanisms used to forward VPN multicast traffic across service
provider networks. In NE40E, PMSI tunnels can be carried over RSVP-TE P2MP or mLDP P2MP tunnels. Table
1 lists the differences between RSVP-TE P2MP tunnels and mLDP P2MP tunnels.
Table 1 Differences between RSVP-TE P2MP tunnels and mLDP P2MP tunnels
RSVP-TE P2MP tunnel: Established from the root node. RSVP-TE P2MP tunnels support bandwidth
reservation and can ensure service quality during network congestion. Use RSVP-TE P2MP tunnels to carry
PMSI tunnels if high service quality is required.
mLDP P2MP tunnel: Established from a leaf node. mLDP P2MP tunnels do not support bandwidth
reservation and cannot ensure service quality during network congestion. Configuring an mLDP P2MP
tunnel, however, is easier than configuring an RSVP-TE P2MP tunnel. Use mLDP P2MP tunnels to carry
PMSI tunnels if high service quality is not required.
Theoretically, a P-tunnel can carry the traffic of one or multiple MVPNs. However, in NE40E, a P-tunnel can
carry the traffic of only one MVPN.
On an MVPN that uses BGP as the signaling protocol, a sender PE distributes information about the P-tunnel
in a new BGP attribute called PMSI. PMSI tunnels are the logical tunnels used by the public network to
transmit VPN multicast data, and P-tunnels are the actual tunnels used by the public network to transmit
VPN multicast data. A sender PE uses PMSI tunnels to send specific VPN multicast data to receiver PEs. A
receiver PE uses PMSI tunnel information to determine which multicast data is sent by the multicast source
on the same MVPN as itself. There are two types of PMSI tunnels: I-PMSI tunnels and S-PMSI tunnels.Table 2
lists the differences between I-PMSI and S-PMSI tunnels.
I-PMSI tunnel: An I-PMSI tunnel connects to all PEs on an MVPN. Multicast data sent over an I-PMSI
tunnel can be received by all PEs on the MVPN. In a VPN instance, one PE corresponds to only one I-PMSI
tunnel.
S-PMSI tunnel: An S-PMSI tunnel connects to the sender and receiver PEs of specific sources and multicast
groups. Multicast data sent over an S-PMSI tunnel is received by only PEs interested in the data. In a VPN
instance, one PE can correspond to multiple S-PMSI tunnels.
A public network tunnel can consist of one PMSI logical tunnel or multiple interconnected PMSI tunnels. The
former is a non-segmented tunnel, and the latter forms a segmented tunnel.
• For a non-segmented tunnel, the public network between the sender PE and receiver PE uses the same
MPLS protocol. Therefore, an MPLS P2MP tunnel can be used to set up a PMSI logical tunnel to carry
multicast traffic.
• For a segmented tunnel, different areas on the public network between the sender PE and receiver PE
use different MPLS protocols. Therefore, PMSI tunnels need to be established in each area based on the
MPLS protocol type and MPLS P2MP tunnel type. In addition, tunnel stitching must be configured on
area connection nodes to stitch PMSI tunnels in different areas into one tunnel to carry the data traffic
of the MVPN. Currently, the NE40E supports intra-AS segmented tunnels, not inter-AS segmented
tunnels.
MVPN Targets
MVPN targets are used to control MVPN A-D route advertisement. MVPN targets function in a similar way
as VPN targets used on unicast VPNs and are also classified into two types:
• Export MVPN target: A PE adds the export MVPN target to an MVPN A-D route before advertising the
route.
• Import MVPN target: After receiving an MVPN A-D route from another PE, a PE matches the export
MVPN target of the route against the import MVPN targets of its VPN instances. If the export MVPN
target matches the import MVPN target of a VPN instance, the PE accepts the MVPN A-D route and
records the sender PE as an MVPN member. If the export MVPN target does not match the import
MVPN target of any VPN instance, the PE drops the MVPN A-D route.
By default, if you do not configure MVPN targets for an MVPN, MVPN A-D routes carry the VPN target communities
that are attached to unicast VPN-IPv4 routes. If the unicast and multicast network topologies are congruent, you do not
need to configure MVPN targets for MVPN A-D routes. If they are not congruent, configure MVPN targets for MVPN A-D
routes.
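The MVPN target matching rule described above can be sketched as follows. This is an illustrative model of the accept/drop decision, assuming simple string-valued targets; the instance and PE names are made up for the example.

```python
# Sketch of MVPN A-D route acceptance: a route is accepted if any of its
# export MVPN targets matches an import MVPN target of a local VPN
# instance; the sender PE is then recorded as an MVPN member.

def process_ad_route(export_targets, vpn_instances, members, sender_pe):
    """vpn_instances maps instance name -> set of import MVPN targets."""
    for name, import_targets in vpn_instances.items():
        if set(export_targets) & import_targets:
            members.add(sender_pe)   # record the sender PE as an MVPN member
            return name              # instance that accepted the A-D route
    return None                      # no match: drop the A-D route

members = set()
instances = {"vpn1": {"100:1"}}
assert process_ad_route(["100:1"], instances, members, "PE1") == "vpn1"
assert "PE1" in members
assert process_ad_route(["200:2"], instances, members, "PE9") is None  # dropped
```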
To transmit multicast traffic from multicast sources to multicast receivers, sender PEs must establish BGP
MVPN peer relationships with receiver PEs. On the network shown in Figure 1, PE1 serves as a sender PE,
and PE2 and PE3 serve as receiver PEs. Therefore, PE1 establishes BGP MVPN peer relationships with PE2
and PE3.
PEs on an NG MVPN use BGP Update messages to exchange MVPN information. MVPN information is
carried in the network layer reachability information (NLRI) field of a BGP Update message. The NLRI
containing MVPN information is also called the MVPN NLRI. For more information about the MVPN NLRI,
see MVPN NLRI.
• RSVP-TE P2MP tunnels: A sender PE sends an intra-AS PMSI A-D route to each receiver PE. Upon
receipt, each receiver PE sends a reply message. Then, the sender PE collects P2MP tunnel leaf
information from received reply messages and establishes an RSVP-TE P2MP tunnel for each MVPN
based on the leaf information of the MVPN. For more information about RSVP-TE P2MP tunnel
establishment, see "P2MP TE" in NE40E Feature Description - MPLS.
• mLDP P2MP tunnels: Receiver PEs directly send Label Mapping messages based on the root node
address (sender PE address) and opaque value information carried in the Intra-AS PMSI A-D route sent
by the sender PE to establish an mLDP P2MP tunnel. For more information about mLDP P2MP tunnel
establishment, see "mLDP" in NE40E Feature Description - MPLS.
For comparison between RSVP-TE and mLDP P2MP tunnels, see Table 1 in NG MVPN Public Network Tunnel Principle.
The following example uses the network shown in Figure 1 to describe how to establish PMSI tunnels.
Because RSVP-TE P2MP tunnels and mLDP P2MP tunnels are established differently, the following uses two
scenarios, RSVP-TE P2MP Tunnel and mLDP P2MP Tunnel, to describe how to establish PMSI tunnels.
This example presumes that:
• PE1 has established BGP MVPN peer relationships with PE2 and PE3, but no BGP MVPN peer
relationship is established between PE2 and PE3.
• The network administrator has configured MVPN on PE1, PE2, and PE3 in turn.
Figure 2 Time sequence for establishing an I-PMSI tunnel with the P-tunnel type as RSVP-TE P2MP LSP
Table 1 briefs the procedure for establishing an I-PMSI tunnel with the P-tunnel type as RSVP-TE P2MP LSP.
Table 1 Procedure for establishing an I-PMSI tunnel with the P-tunnel type as RSVP-TE P2MP LSP
PE1
Prerequisites: BGP and MVPN have been configured on PE1. PE1 has been configured as a sender PE. The
P-tunnel type for I-PMSI tunnel establishment has been set to RSVP-TE P2MP LSP.
Description: As a sender PE, PE1 initiates the I-PMSI tunnel establishment process. The MPLS module on
PE1 reserves resources for the corresponding RSVP-TE P2MP tunnel. Because PE1 does not know RSVP-TE
P2MP tunnel leaf information, the RSVP-TE P2MP tunnel is not established in a real sense.
PE1
Prerequisites: BGP and MVPN have been configured on PE2. PE1 has established a BGP MVPN peer
relationship with PE2.
Description: PE1 sends a Type 1 BGP A-D route to PE2. This route carries the following information:
MVPN Targets: used to control A-D route advertisement. The Type 1 BGP A-D route carries the export
MVPN target information configured on PE1.
PMSI Tunnel attribute: specifies the P-tunnel type (RSVP-TE P2MP LSP in this case) used for PMSI tunnel
establishment. This attribute carries information about the resources reserved for the tunnel.
PE2
Description: PE2 sends a BGP A-D route that carries the export MVPN target to PE1. Because PE2 is not a
sender PE configured with PMSI tunnel information, the BGP A-D route sent by PE2 does not carry the
PMSI Tunnel attribute.
After PE2 receives the BGP A-D route from PE1, PE2 matches the export MVPN target of the route against
its local import MVPN target. If the two targets match, PE2 accepts this route, records PE1 as an MVPN
member, and joins the P2MP tunnel that is specified in the PMSI Tunnel attribute carried in this route (at
the moment, the P2MP tunnel has not been established yet).
PE1
Description: After PE1 receives the BGP A-D route from PE2, PE1 matches the export MVPN target of the
route against its local import MVPN target. If the two targets match, PE1 accepts this route, records PE2 as
an MVPN member, and instructs the MPLS module to send an MPLS message to PE2 and add PE2 as a leaf
node of the RSVP-TE P2MP tunnel to be established.
PE1
Description: After PE1 receives a reply from PE2, the MPLS module on PE1 completes the process of
establishing an RSVP-TE P2MP tunnel with PE1 as the root node and PE2 as a leaf node. For more
information about RSVP-TE P2MP tunnel establishment, see "P2MP TE" in NE40E Feature Description -
MPLS.
PE2
Description: After PE2 receives the MPLS message from PE1, PE2 joins the established RSVP-TE P2MP
tunnel.
PE3 joins the RSVP-TE P2MP tunnel rooted at PE1 in a similar way as PE2. After PE2 and PE3 both join the
RSVP-TE P2MP tunnel rooted at PE1, the I-PMSI tunnel is established and the MVPN service becomes
available.
Figure 3 Time sequence for establishing an I-PMSI tunnel with the P-tunnel type as mLDP P2MP LSP
Table 2 briefs the procedure for establishing an I-PMSI tunnel with the P-tunnel type as mLDP P2MP LSP.
Table 2 Procedure for establishing an I-PMSI tunnel with the P-tunnel type as mLDP P2MP LSP
PE1
Prerequisites: BGP and MVPN have been configured on PE1. PE1 has been configured as a sender PE. The
P-tunnel type for I-PMSI tunnel establishment has been set to mLDP P2MP LSP.
Description: As a sender PE, PE1 initiates the I-PMSI tunnel establishment process. The MPLS module on
PE1 reserves resources (FEC information such as the opaque value and root node address) for the
corresponding mLDP P2MP tunnel. Because PE1 does not know leaf information of the mLDP P2MP
tunnel, the mLDP P2MP tunnel is not established in a real sense.
PE1
Prerequisites: BGP and MVPN have been configured on PE2. PE1 has established a BGP MVPN peer
relationship with PE2.
Description: PE1 sends a Type 1 BGP A-D route to PE2. This route carries the following information:
MVPN Targets: used to control A-D route advertisement. The Type 1 BGP A-D route carries the export
MVPN target configured on PE1.
PMSI Tunnel attribute: specifies the P-tunnel type (mLDP P2MP in this case) used for PMSI tunnel
establishment. This attribute carries information about the resources reserved for the tunnel.
PE2
Description: After PE2 receives the BGP A-D route from PE1, the MPLS module on PE2 sends a Label
Mapping message to PE1. This is because the PMSI Tunnel attribute carried in the received route specifies
the P-tunnel type as mLDP, meaning that the P2MP tunnel must be established from leaves.
After PE2 receives the MPLS message replied by PE1, PE2 becomes aware that the P2MP tunnel has been
established. For more information about mLDP P2MP tunnel establishment, see "mLDP" in NE40E Feature
Description - MPLS.
PE2
Description: PE2 sends a BGP A-D route that carries the export MVPN target to PE1. Because PE2 is not a
sender PE configured with PMSI tunnel information, the BGP A-D route sent by PE2 does not carry the
PMSI Tunnel attribute.
PE1
Description: After PE1 receives the BGP A-D route from PE2, PE1 matches the export MVPN target of the
route against its local import MVPN target. If the two targets match, PE1 accepts this route and records
PE2 as an MVPN member.
PE3 joins the mLDP P2MP tunnel and MVPN in a similar way as PE2. After PE2 and PE3 both join the mLDP
P2MP tunnel rooted at PE1, the I-PMSI tunnel is established and the MVPN service becomes available.
Background
An NG MVPN uses the I-PMSI tunnel to send multicast data to receivers. The I-PMSI tunnel connects to all
PEs on the MVPN and sends multicast data to these PEs regardless of whether these PEs have receivers. If
some PEs do not have receivers, this implementation will cause redundant traffic, wasting bandwidth
resources and increasing PEs' burdens.
To solve this problem, S-PMSI tunnels are introduced. An S-PMSI tunnel connects to the sender and receiver
PEs of specific multicast sources and groups on an NG MVPN. Compared with the I-PMSI tunnel, an S-PMSI
tunnel sends multicast data only to PEs interested in the data, reducing bandwidth consumption and PEs'
burdens.
For comparison between I-PMSI and S-PMSI tunnels, see Table 2 in NG MVPN Public Network Tunnel Principle.
Implementation
The following example uses the network shown in Figure 1 to describe switching between I-PMSI and S-
PMSI tunnels on an NG MVPN.
Switching from the I-PMSI tunnel to an S-PMSI tunnel
Trigger condition: The multicast data forwarding rate is consistently above the specified switching
threshold.
Remarks: S-PMSI tunnels are classified as RSVP-TE S-PMSI tunnels or mLDP S-PMSI tunnels, depending on
the bearer tunnel type. For details about switching from the I-PMSI tunnel to an S-PMSI tunnel, see:
Switching from the I-PMSI Tunnel to an RSVP-TE S-PMSI Tunnel
Switching from the I-PMSI Tunnel to an mLDP S-PMSI Tunnel
• After multicast data is switched from the I-PMSI tunnel to an S-PMSI tunnel, if the S-PMSI tunnel fails but the I-
PMSI tunnel is still available, multicast data will be switched back to the I-PMSI tunnel.
• After multicast data is switched from the I-PMSI tunnel to an S-PMSI tunnel, if the multicast data forwarding rate
is consistently below the specified switching threshold but the I-PMSI tunnel is unavailable, multicast data still
travels along the S-PMSI tunnel.
Figure 2 Time sequence for switching from the I-PMSI tunnel to an RSVP-TE S-PMSI tunnel
Table 2 Procedure for switching from the I-PMSI tunnel to an RSVP-TE S-PMSI tunnel
PE1 After PE1 detects that the multicast data forwarding rate exceeds the specified
switching threshold, PE1 initiates switching from the I-PMSI tunnel to an S-
PMSI tunnel by sending a BGP S-PMSI A-D route to its BGP peers. In the BGP S-
PMSI A-D route, the Leaf Information Require flag is set to 1, indicating that a
PE that receives this route needs to send a BGP Leaf A-D route in response if
the PE wants to join the S-PMSI tunnel to be established.
PE2 Upon receipt of the BGP S-PMSI A-D route, PE2, which has downstream
receivers, sends a BGP Leaf A-D route to PE1.
PE3 Upon receipt of the BGP S-PMSI A-D route, PE3, which does not have
downstream receivers, does not send a BGP Leaf A-D route to PE1 but records
the BGP S-PMSI A-D route information.
PE1 Upon receipt of the BGP Leaf A-D route from PE2, PE1 establishes an S-PMSI
tunnel with itself as the root node and PE2 as a leaf node.
PE2 After PE2 detects that the RSVP-TE S-PMSI tunnel has been established, PE2
joins this tunnel.
After PE3 has downstream receivers, PE3 will send a BGP Leaf A-D route to PE1. Upon receipt of the
route, PE1 adds PE3 as a leaf node of the RSVP-TE S-PMSI tunnel. After PE3 joins the tunnel, PE3's
downstream receivers will also be able to receive multicast data.
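The leaf-side handling of an S-PMSI A-D route described above can be sketched as follows. This is a simplified model, assuming RSVP-TE behavior (Leaf Information Require flag set to 1): only PEs with downstream receivers answer with a BGP Leaf A-D route, while all PEs record the S-PMSI A-D route. The function and return values are illustrative.

```python
# Sketch of a receiver PE's reaction to a BGP S-PMSI A-D route.
# Every PE records the route; with the Leaf Information Require flag
# set to 1, only a PE that currently has downstream receivers replies
# with a BGP Leaf A-D route (e.g. PE2 replies, PE3 does not).

def handle_s_pmsi_ad(leaf_info_require, has_receivers, recorded_routes, route):
    recorded_routes.append(route)        # record the S-PMSI A-D route
    if leaf_info_require == 1 and has_receivers:
        return "send Leaf A-D route"     # join the S-PMSI tunnel now
    return None                          # wait until receivers appear

recorded = []
assert handle_s_pmsi_ad(1, True, recorded, "S-PMSI A-D") == "send Leaf A-D route"  # PE2
assert handle_s_pmsi_ad(1, False, recorded, "S-PMSI A-D") is None                  # PE3
assert len(recorded) == 2  # both PEs recorded the route
```

When PE3 later gains receivers, it replays the recorded route through the same decision and sends its Leaf A-D route, matching the behavior described above.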
Figure 3 Time sequence for switching from the I-PMSI tunnel to an mLDP S-PMSI tunnel
Table 3 Procedure for switching from the I-PMSI tunnel to an mLDP S-PMSI tunnel
PE1 After PE1 detects that the multicast data forwarding rate exceeds the
specified switching threshold, PE1 initiates switching from the I-PMSI tunnel
to an S-PMSI tunnel by sending a BGP S-PMSI A-D route to its BGP peers. In
the BGP S-PMSI A-D route, the Leaf Information Require flag is set to 0.
PE2 Upon receipt of the BGP S-PMSI A-D route, PE2, which has downstream
receivers, directly joins the mLDP S-PMSI tunnel specified in the BGP S-PMSI
A-D route.
PE3 Upon receipt of the BGP S-PMSI A-D route, PE3, which does not have
downstream receivers, does not join the mLDP S-PMSI tunnel specified in the
BGP S-PMSI A-D route, but records the BGP S-PMSI A-D route information.
After PE3 has downstream receivers, PE3 will also directly join the mLDP S-PMSI tunnel. Then, PE3's
downstream receivers will also be able to receive multicast data.
PE1 starts a switch-delay timer upon the completion of S-PMSI tunnel establishment and determines whether to
switch multicast data to the S-PMSI tunnel as follows:
• If the S-PMSI tunnel fails to be established, PE1 still uses the I-PMSI tunnel to send multicast data.
• If the multicast data forwarding rate is consistently below the specified switching threshold throughout the
timer lifecycle, PE1 still uses the I-PMSI tunnel to transmit multicast data.
• If the multicast data forwarding rate is consistently above the specified switching threshold throughout the
timer lifecycle, PE1 switches data to the S-PMSI tunnel for transmission.
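The decision PE1 makes when the switch-delay timer expires can be sketched as follows (Python used purely for illustration; the tunnel labels and flag names are assumptions, not device internals):

```python
def select_pmsi_tunnel(s_pmsi_established: bool, rate_above_threshold: bool) -> str:
    """Return the tunnel PE1 uses when the switch-delay timer expires."""
    if not s_pmsi_established:
        # S-PMSI tunnel setup failed: keep sending over the I-PMSI tunnel.
        return "I-PMSI"
    if rate_above_threshold:
        # The forwarding rate stayed above the switching threshold for the
        # whole timer lifecycle: switch multicast data to the S-PMSI tunnel.
        return "S-PMSI"
    # The rate stayed below the threshold: keep using the I-PMSI tunnel.
    return "I-PMSI"
```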
Figure 4 Time sequence for switching from an S-PMSI tunnel to the I-PMSI tunnel
Table 4 Procedure for switching from an S-PMSI tunnel to the I-PMSI tunnel
PE1 After PE1 detects that the multicast data forwarding rate is consistently below
the specified switching threshold, PE1 starts a switchback hold timer:
If the multicast data forwarding rate goes back above the specified switching
threshold during the timer lifecycle, PE1 still uses the S-PMSI tunnel to send
traffic.
If the multicast data forwarding rate remains below the specified switching
threshold throughout the timer lifecycle, PE1 switches multicast data to the
I-PMSI tunnel for transmission. Meanwhile, PE1 sends a BGP Withdraw S-PMSI A-D
route to PE2, instructing PE2 to withdraw the bindings between its multicast
entries and the S-PMSI tunnel.
PE2 Upon receipt of the BGP Withdraw S-PMSI A-D route, PE2 withdraws the
bindings between its multicast entries and the S-PMSI tunnel. If PE2 has sent a
BGP Leaf A-D route to PE1, PE2 will send a BGP Withdraw Leaf A-D route to
PE1 in this step.
PE2 After PE2 detects that none of its multicast entries is bound to the S-PMSI
tunnel, PE2 leaves the S-PMSI tunnel.
PE1 PE1 deletes the S-PMSI tunnel after waiting for a specified period of time.
In an RSVP-TE P2MP tunnel dual-root 1+1 protection scenario, S-PMSI tunnels must be carried over RSVP-TE P2MP
tunnels. The I-PMSI/S-PMSI switching processes in this scenario are similar to those described above except that the leaf
PEs need to start a tunnel status check delay timer:
• Before the timer expires, leaf PEs delete tunnel protection groups to skip the status check of the primary I-PMSI or
S-PMSI tunnel. The leaf PEs select the multicast data received from the primary tunnel and discard the multicast
data received from the backup tunnel.
• After the timer expires, leaf PEs start to check the primary I-PMSI or S-PMSI tunnel status again. Leaf PEs select the
multicast data received from the primary tunnel only if the primary tunnel is Up. If the primary tunnel is Down,
leaf PEs select the multicast data received from the backup tunnel.
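The two timer phases above reduce to a simple selection rule on the leaf PE, sketched below (Python for illustration only; the function and flag names are assumptions):

```python
def select_traffic_source(check_delay_expired: bool, primary_up: bool) -> str:
    """Which tunnel's multicast traffic a leaf PE accepts (dual-root 1+1)."""
    if not check_delay_expired:
        # Before the tunnel status check delay timer expires, the primary
        # tunnel's status is not checked: always take the primary traffic.
        return "primary"
    # After the timer expires, the primary tunnel's status decides.
    return "primary" if primary_up else "backup"
```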
On a leaf PE, a P2MP tunnel can be mapped to only one VPN instance. Therefore, the import VPN target of each VPN
instance must be unique on a leaf PE. If multiple VPN instances with the same import VPN target exist on a leaf PE, only
the downstream node of one VPN instance can receive multicast traffic.
Figure 1 shows a typical NG MVPN networking, and Figure 2 shows how an IP multicast packet is
encapsulated and transmitted on the network.
• Intra-AS non-segmented NG MVPN: The public network contains only one AS, and only one MPLS
protocol is deployed.
• Inter-AS non-segmented NG MVPN: The public network contains multiple ASs, and only one MPLS
protocol is deployed in the ASs.
• Intra-AS segmented NG MVPN: The public network contains only one AS but multiple areas, and
different MPLS protocols are deployed in adjacent areas.
The public network that the multicast service traverses contains only one AS, and only one MPLS protocol is
used between PE1 on the multicast source side and PE2 on the multicast user side, as shown in Figure 1.
• Deploy MVPN on the PEs, so that the PEs in the same MVPN can automatically discover each other and
use BGP to transmit BGP C-multicast routes.
• Configure a P2MP tunnel and use BGP to transmit BGP A-D routes to each other, so that PE1 and PE2
can establish a PMSI tunnel based on the P2MP tunnel to transmit multicast traffic.
This scenario supports three inter-AS VPN modes: Option A, Option B, and Option C. In Option A mode, ASBRs
use each other as CEs, and the establishment process is similar to that in the intra-AS non-segmented scenario.
In Option B mode:
• Establish an IBGP peer relationship between a PE and an ASBR in the same AS. Establish an EBGP peer
relationship between ASBRs in different ASs.
• Deploy MVPN on the PEs, so that the PEs in the same MVPN can automatically discover each other and
use BGP to transmit BGP C-multicast routes through ASBRs.
• Configure a P2MP tunnel and use BGP to transmit BGP A-D routes to each other through ASBRs, so that
PE1 and PE2 can establish a PMSI tunnel based on the P2MP tunnel to transmit multicast traffic.
In Option C mode:
• Establish an IBGP peer relationship between a PE and an ASBR in the same AS. Establish an EBGP peer
relationship between ASBRs in different ASs. Establish an MP-EBGP peer relationship between PE1 and
PE2.
• Deploy MVPN on the PEs, so that the PEs in the same MVPN can automatically discover each other and
use BGP to directly transmit BGP C-multicast routes over ASBRs.
• Configure a P2MP tunnel and use BGP to directly transmit BGP A-D routes to each other over ASBRs, so
that PE1 and PE2 can establish a PMSI tunnel based on the P2MP tunnel to transmit multicast traffic.
In the intra-AS segmented scenario:
• Deploy MVPN on the PEs, so that the PEs in the same MVPN can automatically discover each other and
use BGP to transmit BGP C-multicast routes.
• Configure a P2MP tunnel and use BGP to transmit BGP A-D routes to each other so that PE1 and the
ABR can establish a PMSI tunnel based on the P2MP tunnel. The ABR and PE2 establish a PMSI tunnel
based on the P2MP tunnel. The two tunnels are stitched on the ABR to carry the multicast traffic
transmitted from PE1 to PE2.
Background
NG MVPN supports inter-VPN multicast service distribution. To enable a service provider on a VPN to
provide multicast services for users on other VPNs, configure NG MVPN extranet.
Implementation
Table 1 describes the usage scenarios of NG MVPN extranet.
Local cross: A multicast receiver and a multicast source are connected to the
same PE and belong to different VPN instances.
Remote cross: A multicast receiver and a multicast source are connected to
different PEs and belong to different VPN instances.
• The address range of multicast groups using the NG MVPN extranet service cannot overlap that of multicast
groups using the intra-VPN service.
• Only a static RP can be used in an NG MVPN extranet scenario, the same static RP address must be configured on
the source and receiver VPN sides, and the static RP address must belong to the source VPN. If different RP
addresses are configured, inconsistent multicast routing entries will be created on the two instances, causing
service forwarding failures.
• To provide an SSM service using NG MVPN extranet, the same SSM group address must be configured on the
source and receiver VPN sides.
Remote Cross
On the network shown in Figure 1, VPN GREEN is configured on PE1. CE1 connects to the multicast source in
VPN GREEN. VPN BLUE is configured on PE2. CE2 connects to the multicast source in VPN BLUE. VPN GREEN
and VPN BLUE are configured on PE3. Users connecting to CE3 need to receive multicast data from both
VPN BLUE and VPN GREEN.
Figure 1 Networking for configuring a source VPN instance on a receiver PE in the remote cross scenario of NG
MVPN extranet
Configure source VPN GREEN on PE3 and a multicast routing policy for receiver VPN BLUE. Table 2 describes
the implementation process.
Table 2 Process of configuring a source VPN instance on a receiver PE in the remote cross scenario of NG
MVPN extranet
1 CE3 CE3 receives an IGMP Report message from the receiver that requires data from the
multicast source in VPN GREEN and forwards a PIM Join message to PE3.
2 PE3 After PE3 receives the PIM Join message from CE3 in VPN BLUE, it creates a multicast
routing entry. Through the RPF check, PE3 determines that the upstream interface of the
RPF route belongs to VPN GREEN. Then, PE3 adds the upstream interface (serving as an
extranet inbound interface) to the multicast routing table.
3 PE3 PE3 sends the C-multicast route of VPN GREEN to PE1 in VPN GREEN through BGP.
4 PE1 After PE1 receives the multicast data from the multicast source in VPN GREEN, PE1 sends
the multicast traffic of VPN GREEN to PE3 in VPN GREEN over the public network.
5 PE3 PE3 decapsulates and imports the received multicast data to receiver VPN BLUE and
sends the data to CE3. Then, CE3 forwards the data to the receiver in VPN BLUE.
Local Cross
On the network shown in Figure 2, PE1 is the source PE of VPN BLUE, and PE3 is the source PE of VPN
GREEN. CE4 connects to the multicast source in VPN GREEN. Both CE3 and CE4 reside on the same side of
PE3. Users connecting to CE3 need to receive multicast data from both VPN BLUE and VPN GREEN.
Table 3 describes how NG MVPN extranet is implemented in the local cross scenario.
1 CE3 CE3 receives an IGMP Report message from the receiver that requires data from the
multicast source in VPN GREEN and forwards a PIM Join message to PE3.
2 PE3 After PE3 receives the PIM Join message, it creates a multicast routing entry of VPN
BLUE. Through the RPF check, PE3 determines that the upstream interface of the RPF
route belongs to VPN GREEN. PE3 then imports the PIM Join message to VPN GREEN.
3 PE3 PE3 creates a multicast routing entry in VPN GREEN, records receiver VPN BLUE in the
entry, and sends the PIM Join message to CE4 in VPN GREEN.
4 PE3 After CE4 receives the PIM Join message, it sends the multicast data from VPN GREEN to
PE3, and PE3 imports the multicast data to receiver VPN BLUE based on the multicast
routing entries of VPN GREEN.
5 PE3 PE3 sends the multicast data to CE3 based on the multicast routing entries of VPN BLUE.
Then, CE3 forwards the data to the receiver in VPN BLUE.
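The key step in both extranet scenarios is that the PE resolves the RPF route for a Join in the receiver VPN and, if the upstream interface belongs to a different VPN, imports the Join into that source VPN. A minimal sketch, assuming a hypothetical `rpf_vpn_of` lookup callback (Python for illustration; names are not NE40E internals):

```python
def process_join(receiver_vpn, source, rpf_vpn_of):
    """Handle a PIM Join on a PE with NG MVPN extranet (sketch).

    rpf_vpn_of(source) is a hypothetical callback returning the VPN instance
    that owns the upstream interface of the RPF route toward the source.
    """
    source_vpn = rpf_vpn_of(source)
    if source_vpn != receiver_vpn:
        # Extranet case: create the entry in the source VPN and record the
        # receiver VPN so traffic can be imported back to it.
        return {"entry_vpn": source_vpn, "receiver_vpn": receiver_vpn}
    # Ordinary intra-VPN join: entry stays in the receiver VPN.
    return {"entry_vpn": receiver_vpn, "receiver_vpn": receiver_vpn}
```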
Background
On an NG MVPN, when multiple sender PEs exist, receiver PEs select routes based on preferred unicast
routes by default. In this case, different receiver PEs select different sender PEs as their root nodes. This
requires multiple P2MP tunnels to be established. As a result, many public network tunnel resources are
consumed. To resolve the preceding issue, enable the highest IP address to be selected as the upstream
multicast hop (UMH) on receiver PEs, so that the receiver PEs select the same sender PE as their root node
during VPN route selection.
Implementation
• UMH route selection fundamentals
Figure 1 shows the fundamentals of UMH route selection on an NG MVPN.
PE1, PE2, and PE3 are sender PEs and advertise VPN routes of the VPN multicast source or RP. Both PE4
and PE5 can receive routes from the VPN multicast source or RP. The unicast routes, from PE4 to the
VPN multicast source or RP, advertised by PE3, PE2, and PE1 are in descending order of priority. The
unicast routes, from PE5 to the VPN multicast source or RP, advertised by PE2, PE3, and PE1 are in
descending order of priority.
After the system is enabled to select the highest IP address as the UMH in a VPN instance on PE4 or
PE5:
PE4 or PE5 constructs a UMH route candidate set by importing the VPN-IP routes
with the same prefix in the same VPN. Each UMH route candidate record consists of
a route, an UpstreamPE, and an UpstreamRD. PE4 or PE5 then selects the highest IP
address from the upstream PEs' IP addresses as the UMH.
According to the preceding topology, PE4 and PE5 select the route advertised by
PE1 as the UMH route because PE1's IP address is the highest. Both PE4 and PE5 use
the route to construct a C-multicast route. The RD in the C-multicast route is the
UpstreamRD of the selected route, and the vpn-target in the C-multicast route is
the VRF Route Import Extended Community of the selected route.
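The highest-IP UMH selection and the construction of the C-multicast route's RD and vpn-target can be sketched as follows (Python for illustration; the tuple layout is an assumption, not a device data structure):

```python
import ipaddress

def select_umh(candidates):
    """Pick the UMH among candidate VPN-IP routes with the same prefix.

    Each candidate is an (upstream_pe_ip, upstream_rd, vrf_route_import) tuple;
    the route whose upstream PE has the numerically highest IP address wins.
    """
    pe_ip, upstream_rd, rt_import = max(
        candidates, key=lambda c: ipaddress.ip_address(c[0])
    )
    # The C-multicast route reuses the selected route's UpstreamRD as its RD
    # and its VRF Route Import Extended Community as the vpn-target.
    return {"umh": pe_ip, "c_mcast_rd": upstream_rd, "c_mcast_target": rt_import}
```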
If a VPN-IP route does not carry the VRF Route Import Extended Community, the
upstream PE address is obtained from the BGP next-hop address of the VPN-IP route.
In this case, a C-multicast route cannot be constructed, and VPN multicast routes
cannot be established.
After the function of selecting the highest IP address as the UMH is enabled on a
PE, VPN multicast load splitting does not take effect on the (*, G) and (S, G)
entries that span the public network, but still takes effect on the (*, G) and
(S, G) entries that do not span the public network.
Single-MVPN Networking Protection
Protected objects: Sender CEs, receiver PEs, and the nodes and links between
sender CEs and receiver PEs.
Advantage: The network does not have redundant multicast traffic.
Disadvantages: This solution enhances network reliability through networking
redundancy alone. If a network fault occurs, traffic depends on unicast route
convergence to switch between links. A longer route convergence time results in
lower network reliability.
Dual-MVPN Networking Protection
Disadvantages: Redundant multicast traffic exists on the network, wasting
bandwidth resources.
Dual-Root 1+1 Protection
Protected objects: Sender PEs (P-tunnels can also be protected after this
solution is deployed).
Advantage: The network uses BFD or flow-based detection to detect link faults,
implementing fast route convergence and high network reliability.
Disadvantages: Redundant multicast traffic exists on the network, wasting
bandwidth resources. Only sender PEs and P-tunnels can be protected; receiver
PEs and CEs cannot be protected.
Figure 1 shows how a multicast receiver joins a multicast group and how the multicast traffic is transmitted
when unicast routing, VPN, BGP, MPLS, and multicast are deployed properly.
• Multicast joining process: After CE3 receives a multicast group join request from a receiver, CE3 sends a
PIM Join message to PE3. Upon receipt, PE3 converts the message to a BGP C-multicast route and sends
the route to PE1, its BGP MVPN peer. Upon receipt, PE1 converts the route to a PIM Join message and
sends the message to the multicast source. Then, the receiver joins the multicast group.
• Multicast forwarding process: After PE1 receives multicast traffic from the multicast source, PE1 sends
the multicast traffic to PE3 over the P2MP tunnel. Upon receipt, PE3 sends the traffic to CE3, which in
turn sends the traffic to the multicast receiver.
1 CE1 or the link between PE1 and the multicast source
The network can rely only on unicast route convergence for recovery. The handling
process is as follows:
PE1 detects that the multicast source is unreachable.
PE1 sends PE3 a BGP Withdraw message that carries information about a VPNv4
route to the source.
After PE3 receives the message, PE3 preferentially selects the route advertised by PE2
as the route to the multicast source. Then, PE3 sends a BGP C-multicast route to PE2.
Upon receipt, PE2 converts the route to a PIM Join message and sends the message to
CE2.
CE2 constructs an MDT and sends the multicast traffic received from the multicast
source to PE2. Upon receipt, PE2 sends the traffic to PE3 over the P2MP tunnel.
After PE3 receives the traffic, PE3 sends the traffic to CE3, which in turn sends the
traffic to the multicast receiver.
2 PE1 The network can rely only on unicast route convergence for recovery. The handling
process is as follows:
After PE3 uses BFD for BGP to detect that PE1 is unreachable, PE3 withdraws the route
(to the multicast source) advertised by PE1 and preferentially selects the route
advertised by PE2 as the route to the multicast source.
Then, PE3 sends a BGP C-multicast route to PE2. After PE2 receives the route, PE2
converts the route to a PIM Join message and sends the message to CE2.
CE2 constructs an MDT and sends the multicast traffic received from the multicast
source to PE2. Upon receipt, PE2 sends the traffic to PE3 over the P2MP tunnel.
After PE3 receives the traffic, PE3 sends the traffic to CE3, which in turn sends the
traffic to the multicast receiver.
3 Public If MPLS tunnel protection is configured, the network relies on MPLS tunnel protection
network for recovery. The MVPN is unaware of public network link changes. If MPLS tunnel
link protection is not configured, the network relies on unicast route convergence for
recovery. In this situation, the handling process is similar to the process for handling
PE1 failures.
4 PE3 The network can rely only on unicast route convergence for recovery. The handling
process is as follows:
When CE3 detects that PE3 is unreachable, CE3 withdraws the unicast route (to the
multicast source) advertised by PE3. After route convergence, CE3 preferentially
selects another route to the multicast source.
In single-MVPN networking protection, if PE3 and PE4 both receive PIM Join messages but their upstream
peers are different (for example, the upstream peer is PE1 for PE3 and PE2 for PE4), PE1 and PE2 both send
multicast traffic to PE3 and PE4. In this situation, you need to ensure that PE3 accepts only the multicast
traffic from PE1 and PE4 accepts only the multicast traffic from PE2. Specifically, you need to create multiple
P2MP tunnels (with each I-PMSI tunnel corresponding to one P2MP tunnel) if a receiver PE joins multiple I-
PMSI tunnels. Then, when multicast traffic reaches the receiver PE over multiple I-PMSI tunnels, the receiver
PE permits the traffic received from the P2MP tunnel corresponding to the upstream neighbor according to
its VPN instance multicast routing table and discards traffic received from other tunnels.
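The accept/discard rule described above — a receiver PE keeps only the traffic arriving over the P2MP tunnel bound to its upstream neighbor and drops copies from all other I-PMSI tunnels — can be sketched as follows (Python for illustration; the packet representation is an assumption):

```python
def accept_i_pmsi_traffic(packets, expected_tunnel):
    """Keep only traffic arriving over the tunnel that corresponds to the
    upstream neighbor in the VPN instance multicast routing table; traffic
    received from all other I-PMSI tunnels is discarded."""
    return [payload for tunnel, payload in packets if tunnel == expected_tunnel]
```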
■ The master sender and receiver PEs belong to one MVPN; the backup sender and receiver PEs
belong to another MVPN.
■ One receiver CE sends a PIM Join message to the master receiver PE, and the other receiver CE
sends a PIM Join message to the backup receiver PE. The master receiver PE sends a BGP C-
multicast route to the master sender PE, whereas the backup receiver PE sends a BGP C-multicast
route to the backup sender PE.
■ The master and backup sender PEs convert received BGP C-multicast routes to PIM Join messages
and send these messages to the two sender CEs. The two CEs then construct two MDTs.
■ The master and backup sender PEs send multicast traffic received from different sender CEs to the
master and backup receiver PEs respectively over different P2MP tunnels.
■ The master and backup receiver PEs send received multicast traffic to corresponding receiver CEs.
■ The receiver CEs send received multicast traffic to corresponding multicast receivers.
Figure 2 shows how a multicast receiver joins a multicast group and how the multicast traffic is transmitted
when unicast routing, VPN, BGP, MPLS, and multicast are deployed properly.
• CE3 serves as a DR. After CE3 receives a multicast group join request from a receiver, CE3 sends a PIM
Join message to PE3. Upon receipt, PE3 converts the message to a BGP C-multicast route and sends the
route to PE1, its BGP MVPN peer. Upon receipt, PE1 converts the BGP C-multicast route to a PIM Join
message and sends the message to CE1. Upon receipt, CE1 establishes an MDT. Then, multicast traffic
can be transmitted from the multicast source to the multicast receiver along the path CE1 -> PE1 -> P1
-> PE3 -> CE3.
• CE4 serves as a non-DR. After CE4 receives a multicast group join request from a receiver, CE4 does not
send a PIM Join message upstream. To implement traffic redundancy, configure static IGMP joining on
CE4, so that CE4 can send a PIM Join message to PE4. After PE4 receives the message, PE4 converts the
message to a BGP C-multicast route and sends the route to PE2. Upon receipt, PE2 converts the route to
a PIM Join message and sends the message to CE2. Upon receipt, CE2 establishes an MDT. Then,
multicast traffic can be transmitted along the path CE2 -> PE2 -> P2 -> PE4 -> CE4. The multicast traffic
will not be forwarded to receivers because CE4 is a non-DR.
1 CE1 or the link between PE1 and the multicast source
The network relies on unicast route convergence for recovery. The handling
process is as follows:
PE1 detects that the multicast source is unreachable.
PE1 sends PE3 a BGP Withdraw message that carries information about a VPNv4
route to the source.
After PE3 receives the message, PE3 withdraws the route (to the multicast source)
advertised by PE1.
CE3 performs route convergence and finds that the next hop of the route to the
multicast source is CE4. Then, CE3 sends a PIM Join message to CE4.
After CE4 receives the message, CE4 adds the downstream outbound interface on the
path to the multicast receiver to the corresponding multicast entry. Then, CE4 starts to
send the multicast traffic received from the multicast source to the multicast receiver.
2 PE1 The network relies on unicast route convergence for recovery. The handling process is
as follows:
After PE3 uses BFD for BGP to detect that PE1 is unreachable, PE3 withdraws the
route (to the multicast source) advertised by PE1. Then, PE3 instructs CE3 to withdraw
this route.
CE3 performs route convergence and finds that the next hop of the route to the
multicast source is CE4. Then, CE3 sends a PIM Join message to CE4.
After CE4 receives the message, CE4 adds the downstream outbound interface on the
path to the multicast receiver to the corresponding multicast entry. Then, CE4 starts to
send the multicast traffic received from the multicast source to the multicast receiver.
3 Public If MPLS tunnel protection is configured, the network relies on MPLS tunnel protection
network for recovery. The MVPN is unaware of public network link changes. If MPLS tunnel
link protection is not configured, the network relies on unicast route convergence for
recovery. In this situation, the handling process is similar to the process for handling
PE1 failures.
4 PE3 The network relies on unicast route convergence for recovery. The handling process is
as follows:
CE3 detects route changes during unicast route convergence and recalculates routes.
After CE3 finds that the next hop of the route to the multicast source is CE4, CE3
sends a PIM Join message to CE4.
After CE4 receives the message, CE4 adds the downstream outbound interface on the
path to the multicast receiver to the corresponding multicast entry. Then, CE4 starts to
send the multicast traffic received from the multicast source to the multicast receiver.
5 CE3 After CE4 uses BFD for PIM to detect that CE3 is faulty, CE4 starts to serve as a DR
and adds the downstream outbound interface on the path to the multicast receiver to
the corresponding multicast entry. Then, CE4 starts to send the multicast traffic
received from the multicast source to the multicast receiver.
• Configure PE1 and PE2 as sender PEs for the MVPN. Configure RSVP-TE/mLDP P2MP on PE1 and PE2, so
that two RSVP-TE/mLDP P2MP tunnels rooted at PE1 and PE2 respectively can be established. PE3
serves as a leaf node of both tunnels.
• Configure the PEs to use BFD for P2MP TE/mLDP to detect public network node or link failures.
• Configure VPN FRR on PE3, so that PE3 can have two routes to the multicast source. PE3 uses the route
advertised by PE1 as the primary route and the route advertised by PE2 as the backup route.
• Configure MVPN FRR on PE3 to import VPN multicast traffic to the primary and backup routes.
In a BFD for NG MVPN over P2MP scenario, if the leaf node of a P2MP tunnel is configured with a default static route,
the leaf node forwards the received BFD packet according to the default route. In this case, the BFD session cannot be
set up. To solve this problem, you can configure mutual import of public and private network routes so that routes from
the public network are copied to the NG MVPN network. This ensures that the leaf node can forward the BFD packet
received from the P2MP tunnel.
• Configure PE1 and PE2 as sender PEs for the MVPN. Configure RSVP-TE/mLDP P2MP on PE1 and PE2, so
that two RSVP-TE/mLDP P2MP tunnels rooted at PE1 and PE2 respectively can be established. PE3
serves as a leaf node of both tunnels.
• Configure VPN FRR on PE3, so that PE3 can have two routes to the multicast source. PE3 uses the route
advertised by PE1 as the primary route and the route advertised by PE2 as the backup route.
• Configure MVPN FRR on PE3 and specify the flow based detection as the detection method of MVPN
FRR.
Figure 3 shows how a multicast receiver joins a multicast group and how the multicast traffic is transmitted
when unicast routing, VPN, BGP, MPLS, and multicast are deployed properly.
• Multicast joining process: After CE3 receives a multicast group join request from a receiver, CE3 sends a
PIM Join message to PE3. Upon receipt, PE3 converts the message to a BGP C-multicast route and sends
the route to PE1 and PE2, its BGP MVPN peers. Upon receipt, PE1 and PE2 convert the route to a PIM
Join message and send the message to the multicast source. Then, the multicast receiver joins the
multicast group.
• Multicast forwarding process: After PE1 receives multicast traffic from the multicast source, PE1 sends
the multicast traffic to PE3 over the RSVP-TE/mLDP P2MP tunnel. Upon receipt, PE3 sends the traffic to
CE3, which in turn sends the traffic to the multicast receiver. After PE3 receives the multicast traffic sent
over the RSVP-TE/mLDP P2MP tunnel rooted at PE2, PE3 drops the traffic.
1 PE1 or the P2MP tunnel connected to PE1
If a fault occurs on the RSVP-TE/mLDP P2MP tunnel, PE3 can use BFD for P2MP
TE/mLDP to quickly detect the fault and choose to accept the multicast traffic
sent by PE2. Traffic switchover can be completed within 50 ms.
2 P1 or the link connected to P1
The handling process is similar to the process for handling failures of PE1 or
the P2MP tunnel connected to PE1.
1 PE1 or the P2MP tunnel connected to PE1
If a fault occurs on a node or the tunnel of the primary link, PE3 can use
flow-based detection to quickly detect the fault and choose to accept the
multicast traffic received from the backup link.
2 P1 or the link connected to P1
The handling process is similar to the process for handling failures of PE1 or
the P2MP tunnel connected to PE1.
3 Public network tunnel
The handling process is similar to the process for handling failures of PE1 or
the P2MP tunnel connected to PE1.
Overview
Multicast services, such as IPTV services, video conferences, and real-time multi-player online games, are
increasingly used in daily life. These services are transmitted over service bearer networks that need to:
• Detect network faults in a timely manner and quickly switch traffic from faulty links to normal links.
Networking Description
NG MVPN is deployed on the service provider's backbone network to solve multicast service issues related to
traffic congestion, transmission reliability, and data security. Figure 1 shows the application of NG MVPN to
IPTV services.
Feature Deployment
In this scenario, NG MVPN deployment consists of the following aspects:
■ Configure a BGP/MPLS IP VPN on the service provider's backbone network and ensure that this
VPN runs properly.
■ Configure MVPN on the service provider's backbone network, so that PEs belonging to the same
MVPN can use BGP to exchange BGP A-D and BGP C-multicast routes.
■ Configure static multicast joining on sender PEs (PE1 and PE2) to direct multicast traffic to the
P2MP tunnels corresponding to the I-PMSI tunnels.
■ Configure receiver PEs (PE3, PE4, PE5, and PE6) not to perform RPF checks.
You can use either single-MVPN or dual-MVPN networking protection to enhance network reliability or use
either of the following solutions to protect specific parts of the MVPN:
• To protect P-tunnels, configure P2MP TE FRR or use other MPLS tunnel protection technologies.
Terms
Term Definition
BFD Bidirectional Forwarding Detection. A common fault detecting mechanism that uses Hello
packets to quickly detect a link status change and notify a protocol of the change. The
protocol then determines whether to establish or tear down a peer relationship.
DR Designated router. A router that applies only to PIM-SM. On the network segment that
connects to a multicast source, a DR sends Register messages to the RP. On the network
segment that connects to multicast receivers, a DR sends Join messages to the RP. In SSM
mode, a DR at the group member side directly sends Join messages to a multicast source.
Join A type of message used on PIM-SM networks. When a host requests to join a network
segment, the DR of the network segment sends a Join message to the RP hop by hop to
generate a multicast route. When the RP starts an SPT switchover, the RP sends a Join
message to the source hop by hop to generate a multicast route.
PIM Protocol Independent Multicast. Reachable unicast routes are the basis of PIM
forwarding. PIM uses the existing unicast
routing information to perform RPF check on multicast packets to create multicast routing
entries and set up an MDT.
Prune A type of message. If there are no multicast group members on a downstream interface, a
router sends a prune message to the upstream node. After receiving the prune message,
the upstream node removes the downstream interface from the downstream interface list
and stops forwarding data of the specified group to the downstream interface.
P-tunnel A public network tunnel used to transmit VPN multicast traffic. A P-tunnel can be
established using GRE, MPLS, or other tunneling technologies.
PMSI A logical tunnel used by a public network to transmit VPN multicast traffic. A sender PE
transmits VPN multicast traffic to receiver PEs over a PMSI tunnel. Receiver PEs determine
whether to accept the VPN multicast traffic based on PMSI tunnel information. PMSI
tunnels are categorized as I-PMSI or S-PMSI tunnels.
RD Route distinguisher. An 8-byte field in a VPN IPv4 address. An RD together with a 4-byte
IPv4 address prefix constructs a VPN IPv4 address to differentiate the IPv4 prefixes using
the same address space.
(S, G) A multicast routing entry. S indicates a multicast source, and G indicates a multicast group.
After a multicast packet with S as the source address and G as the group address reaches a
router, it is forwarded through the downstream interfaces of the (S, G) entry. The packet is
expressed as an (S, G) packet.
(*, G) A PIM routing entry. * indicates any source, and G indicates a multicast group. The (*, G)
entry applies to all multicast packets whose group address is G. All multicast packets that
are sent to G are forwarded through the downstream interfaces of the (*, G) entry,
regardless of which source sends the packets.
tunnel ID A group of information that identifies a tunnel, including the token, the
slot number of the outgoing interface, and the tunnel type.
VPN Virtual private network. A technology that implements a private network over a public
network.
VPN instance An entity that is set up and maintained by the PE devices for directly-connected sites. Each
site has its VPN instance on a PE device. A VPN instance is also called a VPN routing and
forwarding (VRF) table. A PE device has multiple forwarding tables, including a public-
network routing table and one or more VRF tables.
VPN target A BGP extended community attribute that is also called Route Target. In
BGP/MPLS IP VPN, the VPN target controls VPN routing information: it defines
which sites can receive a VPN-IPv4 route and from which sites a PE device can
receive routes.
MVPN target A BGP extended community attribute that controls MVPN A-D route
advertisement. The MVPN target functions on an MVPN in a similar way to the VPN
target on unicast VPNs.
A-D autodiscovery
AS autonomous system
CE customer edge
P2MP point-to-multipoint
P provider (device)
PE provider edge
RP rendezvous point
TE traffic engineering
Definition
As an MVPN technology independent of NG MVPN, multipoint extensions for LDP (mLDP) in-band MVPN is
usually deployed on an IP/MPLS backbone network that needs to carry multicast traffic. It uses mLDP
signaling to transmit PIM-SM/PIM-SSM Join messages and the mLDP-based data bearer mode to transmit
multicast and unicast services in the same VPN architecture. In the current version, mLDP signaling can
transmit only PIM-SM/PIM-SSM (S, G) Join messages.
Purpose
The MVPN solution mainly uses MVPN technologies to allow multicast services to be deployed on a
BGP/MPLS IP VPN and C-multicast traffic to be transmitted to remote VPN sites through the public network.
mLDP in-band MVPN encapsulates the (S, G) information carried in C-multicast PIM Join messages into the
Opaque value of mLDP P2MP Label Mapping messages, implementing one-to-one mapping between
multicast (S, G) entries and mLDP P2MP tunnels. In this manner, C-multicast route transmission and tunnel
establishment are integrated. This MVPN technology can be used to implement C-multicast or Global Table
Multicast (GTM) in an MPLS domain.
Length 2 octets Length of the Opaque value. The value of this field is 16. The Opaque
value consists of a source, a group, and an RD.
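The composition of this 16-byte value can be sketched as follows. This is a simplified illustration assuming IPv4 source/group addresses and a type-0 RD, not the exact mLDP Opaque value TLV encoding:

```python
import ipaddress
import struct

def build_transit_opaque_value(source: str, group: str, rd: bytes) -> bytes:
    """Pack a 16-byte Opaque value: 4-byte C-source, 4-byte C-group, 8-byte RD."""
    assert len(rd) == 8, "an RD is an 8-byte value"
    return (int(ipaddress.IPv4Address(source)).to_bytes(4, "big")
            + int(ipaddress.IPv4Address(group)).to_bytes(4, "big")
            + rd)

# RD 100:1 in type-0 encoding: 2-byte type, 2-byte AS number, 4-byte assigned number
rd = struct.pack("!HHI", 0, 100, 1)
opaque = build_transit_opaque_value("10.1.1.1", "232.1.1.1", rd)
assert len(opaque) == 16  # matches the Length field value of 16
```

Because the (S, G) pair and the RD together fill the Opaque value, each customer multicast entry maps one-to-one to an mLDP P2MP tunnel, as described in the Purpose section.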
The Connector attribute is an optional transitive attribute of BGP and can be used to advertise root IP
addresses of mLDP tunnels. mLDP in-band MVPN is implemented differently according to whether the
VPNv4/EVPN route advertised by the ingress carries the Connector attribute. If the VPNv4/EVPN route
advertised by the ingress carries the Connector attribute, the egress uses the IP address carried in the
Connector attribute as the root IP address to instruct mLDP to create a tunnel. If the VPNv4/EVPN route
advertised by the ingress does not carry the Connector or VRF Route Import Extended Community attribute,
the egress uses the next hop address of the route as the root IP address to instruct mLDP to create a tunnel.
In an mLDP in-band MVPN scenario, you can determine whether to enable Connector attribute compatibility
on the ingress. If Connector attribute compatibility is enabled, the Connector attribute is sent to the remote
MP-BGP peer through BGP. If Connector attribute compatibility is disabled, the Connector attribute is
withdrawn, and the VRF Route Import Extended Community attribute is sent to the remote MP-BGP peer
through BGP.
For example, if CE2 sends a PIM (S, G) Join message, mLDP in-band MVPN is implemented as follows:
Scenario where the route advertised by the ingress carries the Connector attribute
Figure 2 shows the mLDP in-band MVPN data forwarding process in a scenario where the route advertised
by the ingress carries the Connector attribute.
Figure 2 Process of establishing an mLDP tunnel by mLDP in-band MVPN using VPNv4 (in a scenario where the
route advertised by the ingress carries the Connector attribute)
Table 1 Description of the process for establishing an mLDP tunnel by mLDP in-band MVPN using VPNv4 (in
a scenario where the route advertised by the ingress carries the Connector attribute)
PE1 After receiving the unicast route to the multicast source from CE1, PE1 converts it to a
VPNv4 route, encapsulates the Connector attribute into the VPNv4 route, and advertises
the VPNv4 route to PE2.
PE2 After receiving the route, PE2 matches it against import RTs of local VPN instances:
If a match is found, the VPNv4 route received from PE1 is accepted, and the Connector
attribute and RD information are stored.
If no match is found, the VPNv4 route is discarded.
CE2 After obtaining a join request through IGMP, CE2 sends a PIM Join message containing
an (S, G) entry to PE2 through PIM.
PE2 After receiving the PIM Join message, PE2 subscribes to the remote route based on the S
information in the PIM message, obtains the RD and root IP address (IP address carried
in the Connector attribute), and encapsulates the (S, G) information in the PIM Join
message and the obtained RD into the Opaque value of an mLDP Label Mapping
message. It then instructs mLDP to establish an mLDP P2MP tunnel from PE2 to PE1.
PE1 After receiving the mLDP Label Mapping message carrying the (S, G) information and
the RD, PE1 extracts the (S, G) information and converts it into a PIM Join message.
PE1 PE1 sends the PIM Join message to CE1 through PIM.
CE1 After receiving the PIM Join message, CE1 generates a multicast routing entry, in which
the downstream interface is the interface that receives the PIM Join message. At this
point, the multicast receiver successfully joins the multicast group, and CE1 can send
multicast traffic to CE2.
Scenario where the route advertised by the ingress carries the VRF Route Import Extended Community
attribute
Figure 3 shows the mLDP in-band MVPN data forwarding process in a scenario where the route advertised
by the ingress carries the VRF Route Import Extended Community attribute.
Figure 3 Process of establishing an mLDP tunnel by mLDP in-band MVPN using VPNv4 (in a scenario where the
route advertised by the ingress carries the VRF Route Import Extended Community attribute)
Table 2 Description of the process for establishing an mLDP tunnel by mLDP in-band MVPN using VPNv4 (in
a scenario where the route advertised by the ingress carries the VRF Route Import Extended Community
attribute)
PE1 After receiving the unicast route to the multicast source from CE1, PE1 converts it to a
VPNv4 route and advertises the VPNv4 route to PE2.
PE2 After receiving the route, PE2 matches it against import RTs of local VPN instances:
If a match is found, the VPNv4 route received from PE1 is accepted, and the route
information carried in the VRF Route Import Extended Community attribute is recorded
to the routing table of the corresponding local VPN instance.
If no match is found, the VPNv4 route is discarded.
CE2 After obtaining a join request through IGMP, CE2 sends a PIM Join message containing
an (S, G) entry to PE2 through PIM.
PE2 After receiving the PIM Join message, PE2 subscribes to the remote route based on the S
information in the PIM message, obtains the RD and root IP address (IP address carried
in the VRF Route Import Extended Community attribute), and encapsulates the (S, G)
information in the PIM Join message and the obtained RD into the Opaque value of an
mLDP Label Mapping message. It then instructs mLDP to establish an mLDP P2MP
tunnel from PE2 to PE1.
PE1 After receiving the mLDP Label Mapping message carrying the (S, G) information and
the RD, PE1 extracts the (S, G) information and converts it into a PIM Join message.
PE1 PE1 sends the PIM Join message to CE1 through PIM.
CE1 After receiving the PIM Join message, CE1 generates a multicast routing entry, in which
the downstream interface is the interface that receives the PIM Join message. At this
point, the multicast receiver successfully joins the multicast group, and CE1 can send
multicast traffic to CE2.
Scenario where the route advertised by the ingress does not carry the Connector attribute or VRF
Route Import Extended Community attribute
Figure 4 shows the mLDP in-band MVPN data forwarding process in a scenario where the route advertised
by the ingress does not carry the Connector attribute or VRF Route Import Extended Community attribute.
Figure 4 Process of establishing an mLDP tunnel by mLDP in-band MVPN using VPNv4 (in a scenario where the
route advertised by the ingress does not carry the Connector attribute or VRF Route Import Extended Community
attribute)
Table 3 Description of the process for establishing an mLDP tunnel by mLDP in-band MVPN using VPNv4 (in
a scenario where the route advertised by the ingress does not carry the Connector attribute or VRF Route
Import Extended Community attribute)
PE1 After receiving the unicast route to the multicast source from CE1, PE1 converts it to a
VPNv4 route and advertises the VPNv4 route to PE2.
CE2 After obtaining a join request through IGMP, CE2 sends a PIM Join message containing
an (S, G) entry to PE2 through PIM.
PE2 After receiving the PIM Join message, PE2 subscribes to the remote route based on the S
information in the PIM message, obtains the RD and root IP address (next-hop IP
address carried in the route) from the remote route, and encapsulates the (S, G)
information in the PIM Join message and the obtained RD into the Opaque value of an
mLDP Label Mapping message. It then instructs mLDP to establish an mLDP P2MP
tunnel from PE2 to PE1.
PE1 After receiving the mLDP Label Mapping message carrying the (S, G) information and
the RD, PE1 extracts the (S, G) information and converts it into a PIM Join message.
PE1 PE1 sends the PIM Join message to CE1 through PIM.
CE1 After receiving the PIM Join message, CE1 generates a multicast routing entry, in which
the downstream interface is the interface that receives the PIM Join message. At this
point, the multicast receiver successfully joins the multicast group, and CE1 can send
multicast traffic to CE2.
Dual-root 1+1 protection, which primarily protects sender PEs as well as public network tunnels, has the
following characteristics:
• It mainly uses traffic detection to detect link failures, ensuring fast convergence and high reliability.
• Only sender PEs and public network tunnels can be protected, whereas receiver PEs and CEs cannot be
protected.
• Two sender PEs (PE1 and PE2) are configured for an MVPN. mLDP P2MP is configured on PE1 and PE2
so that two mLDP P2MP tunnels (rooted at PE1 and PE2, respectively) can be established, with PE3 as a
leaf node.
• VPN FRR is configured on PE3 so that PE3 can have two routes to the same multicast source. PE3 uses
the route advertised by PE1 as the primary route, and that advertised by PE2 as the backup route.
Normal scenario
On the network shown in Figure 1, unicast routing, VPN, BGP, MPLS, and multicast are deployed properly.
When no fault occurs on the network:
• The process for a user to join a multicast group is as follows: CE3 sends a PIM Join message to PE3,
which converts the information in the message into the Opaque value of an mLDP Label Mapping
message and uses signaling information to establish mLDP tunnels to PE1 and PE2. PE1 and PE2 convert
the mLDP message to a PIM Join message, and send the PIM Join message to the corresponding CEs. In
this way, the user joins the multicast group.
• The multicast forwarding process is as follows: After receiving multicast traffic from the multicast
source, PE1 sends it to PE3 over the mLDP P2MP tunnel. Upon receipt, PE3 sends the traffic to CE3,
which in turn sends it to the multicast receiver. The multicast traffic received over the backup tunnel
(mLDP P2MP tunnel rooted at PE2) is discarded.
Fault scenario
A node or public network tunnel may fail on the network shown in Figure 1. If PE1 or the tunnel (primary
tunnel) passing through PE1 fails, PE3 uses traffic detection to quickly detect that traffic on the primary
tunnel has been interrupted and then accepts the multicast traffic received over the backup tunnel.
When a fault occurs on the public network tunnel, P1, or the link where P1 resides, the processing procedure
is the same as the preceding procedure.
Terms
Term Definition
Join A type of message used on PIM-SM networks. When a host on a network segment requests
to join a multicast group, the receiver DR sends a Join message to the RP hop by hop to
generate a multicast route. When starting a switchover to the SPT, the RP sends a Join
message to the source hop by hop to generate a multicast route.
RD Route distinguisher. It is an 8-byte field in a VPN IPv4 address. An RD and a 4-byte IPv4
address prefix constitute a VPN IPv4 address to differentiate the IPv4 prefixes using the
same address space.
(S, G) A PIM routing entry. S indicates a multicast source, and G indicates a multicast group.
VPN Virtual private network, used to construct a private network on a public network.
AS Autonomous system
CE Customer edge
P2MP Point-to-multipoint
P Provider
PE Provider edge
Definition
Bit Index Explicit Replication (BIER) is a new multicast technology. It encapsulates the set of destinations of
multicast packets in the BitString format in the packet header before sending the packets. Transit nodes do
not need to establish an MDT for each multicast flow or maintain the states of multicast flows. Instead, the
transit nodes perform packet replication and forwarding according to the destination set in the packet
header.
In BIER, each destination node is a network edge node. For example, on a network with no more than 256
edge nodes, each node is configured with a unique value from 1 to 256. The set of destinations is then
represented by a 256-bit (32-byte) BitString, in which the position (index) of each bit indicates one edge
node.
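As a minimal sketch of this representation, assuming a 256-bit BitString and the 1-to-256 numbering of edge nodes described above (bit i, counted from the least significant bit, stands for the node with value i+1):

```python
def encode_bitstring(bfr_ids, bsl=256):
    """Set one bit per destination edge node; bit (id - 1) stands for node id."""
    bits = 0
    for bfr_id in bfr_ids:
        assert 1 <= bfr_id <= bsl
        bits |= 1 << (bfr_id - 1)
    return bits

def decode_bitstring(bits):
    """Recover the set of edge-node IDs encoded in a BitString."""
    return {i + 1 for i in range(bits.bit_length()) if bits >> i & 1}

bs = encode_bitstring({3, 17, 256})
assert decode_bitstring(bs) == {3, 17, 256}
```

A single BitString therefore carries the whole destination set in the packet header, which is why transit nodes need no per-flow state.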
Purpose
In traditional multicast technologies, an MDT is established for each multicast flow, and the flow is
replicated along this specific MDT. In this way, the flow reaches all receivers while network bandwidth is
conserved. Traditional multicast technologies have the following characteristics:
• An MDT needs to be established for each multicast flow, and each node in the MDT needs to maintain
the multicast state. For example, PIM on the public network requires that a PIM MDT be established for
each multicast flow. NG MVPN requires that a P2MP tunnel be established for each multicast flow. The
P2MP tunnel is equivalent to a P2MP MDT.
• When a new multicast user joins a multicast group, the user needs to be added to the MDT hop by hop.
Traditional multicast technologies, however, cannot meet the requirements for rapid development of
multicast services in the following aspects:
• With the increase of multicast services, the number of MDTs that need to be maintained by traditional
multicast technologies increases sharply. Each node on the network is required to maintain the states of
a large number of multicast flows. When the network changes, the convergence of multicast entries is
slow.
• Multicast users are added to MDTs hop by hop, which increases the delay for users to be added to
MDTs, and multicast services cannot be quickly deployed. In addition, large-scale multicast service
requirements cannot be met. For example, to implement fast multicast service deployment on SDN
networks, it may be expected that a controller delivers destination information to edge nodes for
multicast replication.
To solve this problem, BIER uses the BitString format to encapsulate the set of destinations to which
multicast packets are to be sent in the packet header and then sends the packets.
Benefits
BIER offers the following benefits:
• Reduces resource consumption in large-scale multicast service scenarios, as BIER does not need to
establish an MDT for each multicast flow or maintain the states of multicast flows.
• Improves multicast group joining efficiency of multicast users in SDN network scenarios because
requests of the multicast users do not need to be forwarded along the MDT hop by hop, and instead
their requests are directly sent by leaf nodes to the ingress node. This is more suitable for the controller
on an SDN network to directly deliver the set of destinations to which multicast packets are to be sent
after collecting the set.
BIER Flooding
IS-IS for BIER encapsulates BIER path computation information in IS-IS packets and uses IS-IS LSPs to
flood the information.
IS-IS defines the BIER Info Sub-TLV to support the flooding of BIER information.
The sub-sub-TLVs field in the BIER Info Sub-TLV carries BIER MPLS encapsulation information and can
appear multiple times in one BIER Info Sub-TLV.
The format of sub-sub-TLVs is as follows:
• In the neighbor table, each directly connected neighbor has one entry.
• Each entry contains information about the edge nodes that are reachable through that neighbor.
BIER information is flooded through the IGP, and each node on the network generates its own BIER
forwarding information. After receiving a BIER packet carrying a BitString, each node replicates and
forwards the packet according to the BitString in the packet.
• The ID refers to a BFR-ID. This record is queried to determine the next hop toward the BFR-ID.
• F-BM is short for Forwarding BitMask. It indicates the set of BIER domain edge nodes that are reachable
through the next hop after packets are replicated and sent to the next hop.
• NBR is short for neighbor. It indicates the next hop neighbor of a BFR-ID.
Table 1 Comparison between NG MVPN over BIER and NG MVPN over mLDP P2MP/RSVP-TE P2MP
Packet encapsulation used by NG MVPN in transmitting multicast traffic: the packet encapsulation
formats are different.
Typical NG MVPN deployment scenarios on the public network:
NG MVPN over BIER supports only intra-AS intra-area scenarios.
NG MVPN over mLDP supports four scenarios: intra-AS intra-area, intra-AS inter-area nonsegmented,
intra-AS segmented, and inter-AS nonsegmented.
NG MVPN over RSVP-TE P2MP supports intra-AS intra-area and intra-AS segmented scenarios.
Table 1 Values of fields in the PTA on the ingress and egress during BIER I-PMSI tunnel establishment
Flags: 1 on the ingress; 0 on the egress.
Sub-domain-id: On the ingress, set by the ingress PE based on the service carried over the tunnel. On the
egress, set to the Sub-domain-id in the BIER tunnel information carried in the PMSI A-D route received
from the ingress.
Figure 1 Time sequence for establishing Inclusive-PMSI (I-PMSI) tunnels with the P-tunnel type as BIER
Table 1 Procedure for establishing I-PMSI tunnels with the P-tunnel type as BIER
PE1 (prerequisite: BGP and MVPN have been configured on PE1, PE1 has been configured as a sender PE,
and the I-PMSI tunnel type has been set to BIER): As a sender PE, PE1 initiates the I-PMSI tunnel
establishment process.
PE1 (prerequisite: BGP and MVPN have been configured on PE2, and PE1 has established a BGP MVPN
peer relationship with PE2): PE1 sends a Type 1 BGP A-D route to PE2. This route carries the following
information:
MVPN Target: used to control A-D route advertisement, with the value set to the export MVPN target
configured on PE1.
PMSI Tunnel attribute: the tunnel type in the attribute is set to BIER.
The PMSI Tunnel attribute carries the following information:
Sub-domain-id: the Sub-domain-id configured in the MVPN I-PMSI view on the ingress PE.
BFR-ID: the BFR-ID configured in the corresponding sub-domain on the ingress PE.
BFR-prefix: the BFR-prefix configured in the corresponding sub-domain on the ingress PE.
PE2: 1. After PE2 receives the BGP A-D route from PE1, PE2
matches the export MVPN target in the route against
its local import MVPN target. The two targets match.
Therefore, PE2 accepts this route.
2. PE2 replies with a Leaf A-D route carrying the
following information:
Sub-domain-id: The value is the Sub-domain-id in the
BIER tunnel information carried in the PMSI A-D route
sent by the ingress PE.
BFR-ID: The value is the BFR-ID configured in the
corresponding sub-domain on the leaf PE.
BFR-prefix: The value is the BFR-prefix configured in
the corresponding sub-domain on the leaf PE.
PE1: After PE1 receives the BGP Leaf A-D route from PE2,
PE1 matches the export MVPN target in the route
against its local import MVPN target. The two targets
match. Therefore, PE1 accepts this route and records
PE2 as an MVPN neighbor.
Figure 1 Time sequence for switching multicast traffic from the I-PMSI tunnel to the S-PMSI tunnel
Table 1 Procedure for switching multicast traffic from the I-PMSI tunnel to the S-PMSI tunnel
PE1 When PE1 detects that the multicast traffic forwarding rate is higher than the
threshold, PE1 initiates a switchover from the I-PMSI tunnel to the S-PMSI tunnel
and advertises a BGP S-PMSI A-D route to its BGP peers. In the BGP S-PMSI A-D
route, the Leaf Information Require flag is set to 1, instructing the BGP peers that
receive this route to reply with a BGP Leaf A-D route if they want to join the S-
PMSI tunnel to be established. Although the control plane initiates the tunnel
switchover instruction, multicast traffic is not switched to the S-PMSI tunnel until
the delay timer expires.
PE2 After receiving the BGP S-PMSI A-D route from PE1, PE2 replies with a BGP Leaf A-
D route carrying the PMSI Tunnel attribute as PE2 has a receiver downstream.
PE3 After receiving the BGP S-PMSI A-D route from PE1, PE3 does not reply with a BGP
Leaf A-D route as PE3 has no receivers downstream, but PE3 records the BGP S-
PMSI A-D route. PE3 does not join the S-PMSI tunnel whose information is carried
in the S-PMSI A-D route.
PE1 Upon receipt of the BGP Leaf A-D route from PE2, PE1 generates a new BitString
with itself as the root node and PE2 as a leaf node. Then an S-PMSI tunnel is
established.
After PE3 has downstream receivers, PE3 will send a BGP Leaf A-D route to PE1. After receiving the route,
PE1 updates the leaf node set and generates a new BIER BitString. Then a new S-PMSI tunnel is established.
Figure 2 Time sequence for switching traffic back from the S-PMSI tunnel to the I-PMSI tunnel
Table 2 Procedure for switching traffic back from the S-PMSI tunnel to the I-PMSI tunnel
PE1 After PE1 detects that the multicast data forwarding rate has fallen below the
threshold after the switchover, PE1 starts a switchback hold timer.
If the multicast data forwarding rate rises above the threshold before the timer
expires, PE1 continues to use the S-PMSI tunnel to send traffic.
If the multicast data forwarding rate is still lower than the threshold when the
timer expires, PE1 switches multicast traffic back to the I-PMSI tunnel for
transmission. In addition, PE1 sends a BGP Withdraw S-PMSI A-D route to PE2 and
withdraws bindings between multicast entries and the S-PMSI tunnel.
PE2 After receiving the BGP Withdraw S-PMSI A-D route from PE1, PE2 replies with a
BGP Withdraw Leaf A-D route.
PE2 After PE2 detects that none of its multicast entries is bound to the S-PMSI tunnel,
PE2 leaves the S-PMSI tunnel.
PE1 PE1 deletes the S-PMSI tunnel after waiting for a specified period of time.
MVPN Traffic Transmission over BIER
After a multicast receiver joins a multicast group, the multicast source can send MVPN traffic to the
multicast receiver over an established public network PMSI tunnel.
Table 1 describes how MVPN traffic is transmitted through NG MVPN over BIER.
Service Overview
In MVPN services, BIER can replace the traditional P2MP tunnel or public network PIM technology and
provide public network tunnels (P-Tunnels) to carry MVPN traffic.
Networking Description
PE1, PE2, PE3, and PE4 are edge nodes of the service provider's backbone network. PE4 is connected to the
multicast source, and PE1, PE2, and PE3 are connected to multicast users. An MVPN is configured for the
PEs, as shown in Figure 1.
Feature Deployment
In this scenario, BIER deployment consists of the following aspects:
• Control plane:
■ IGP BIER is configured on the service provider's backbone network, and the unicast VPN runs
properly.
■ MVPN is configured on the service provider's backbone network, and BGP is used by PEs in the
same MVPN to exchange BGP A-D and BGP C-multicast routes.
• Data plane:
■ After receiving a multicast packet from the private network side, the sender PE (PE4) encapsulates
the packet with a BIER header carrying the BitString that represents a destination node set and
sends the encapsulated packet to P1.
■ After receiving the packet with the BIER header, receiver PEs (PE1, PE2, and PE3) find their own bit
positions in the BitString of the packet, remove the BIER header from the packet, search their VPN
routing tables for routing entries, and forward the packet accordingly.
■ After receiving the packet with the BIER header, intermediate nodes (P1 and P2) replicate the
packet, edit the BitString of the packet copy according to the F-BM in the BIER forwarding table,
and forward the packet copy according to the BIER forwarding table.
Terms
Term Definition
BIER sub-domain One BIER domain can be divided into multiple BIER sub-domains.
BFR-ID ID of an edge router in a BIER sub-domain. For example, on a network with no more than
256 edge BFRs, the value of BFR-ID ranges from 1 to 256.
Definition
Bit Index Explicit Replication IPv6 Encapsulation (BIERv6) is a multicast solution. With it, the ingress on an
IPv6 network encapsulates the set of nodes for which each multicast packet is destined as a BitString in the
packet header. Based on the BitString, each multicast packet is then replicated and forwarded. In this way,
transit nodes do not need to establish a multicast distribution tree (MDT) for each multicast flow or
maintain per-flow states.
The combined use of the BitString (64, 128, or 256 bits long) and a set ID (1 byte long at most) determines
the destination nodes of each multicast packet. Currently, a BIERv6 sub-domain supports a maximum of
65535 destination nodes.
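The combined addressing of a set ID plus a BitString can be sketched as follows, assuming the conventional mapping in which BFR-ID 1 occupies bit 0 of set 0, BFR-ID 2 occupies bit 1, and so on:

```python
def bfr_id_to_position(bfr_id: int, bsl: int):
    """Map a BFR-ID to its (set ID, bit position) for a BitString of length bsl."""
    assert bfr_id >= 1
    set_id, bit = divmod(bfr_id - 1, bsl)
    return set_id, bit

# with a 256-bit BitString, BFR-ID 300 falls into set 1, bit 43
assert bfr_id_to_position(300, 256) == (1, 43)
# BFR-ID 256 is the last bit of set 0
assert bfr_id_to_position(256, 256) == (0, 255)
```

Packets destined for nodes in different sets must be sent as separate copies, one per set ID, which is why the set ID appears in the BIFT-ID of the packet header.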
Purpose
Conventional multicast protocols, such as Protocol Independent Multicast (PIM) and next-generation
multicast VPN (NG MVPN), need to establish an MDT for each multicast flow, and each node in the MDT
needs to maintain per-flow states. When joining a multicast group, a new user needs to be added to the
MDT hop by hop from the corresponding receiver PE, which resides at the edge of the network. This
mechanism brings the following problems:
• Difficult network capacity expansion: Because each multicast flow requires an MDT to be established
and each node in the MDT must maintain per-flow states, there is a linear increase in resource
consumption and volume of forwarded traffic. This means that conventional multicast protocols are
unsuitable for large-scale networks.
• Complex management and O&M: As multicast services continue to develop, there is a sharp increase in
the number of MDTs that need to be managed and maintained. Service management and O&M
become more complex due to the creation, teardown, and re-creation of numerous MDTs.
• Slow convergence after a failure occurs: A single point of failure causes the re-establishment of MDTs
for all multicast flows. As a result, fast convergence cannot be implemented.
• Difficulty in optimizing service experience: Each request message sent by users must be forwarded along
the MDT hop by hop, limiting the scope of optimizing user experience. This means that in an IPTV
scenario, for example, users cannot quickly receive program signals of a channel.
To resolve the preceding problems, a next-generation multicast technology is needed. This is where BIER or
BIERv6 comes into play. Compared with conventional multicast protocols, BIERv6 has the following
advantages (BIER does not have the first two advantages):
• Programmable IPv6 addresses, independent of MPLS label-based forwarding: Using the natural
programmability of IPv6 addresses, BIERv6 carries multicast service VPN information and BIER
forwarding instructions, eliminating the need for MPLS label-based forwarding. (This is not supported in
BIER.)
• Unified unicast and multicast protocols on an SRv6-based network: Similar to the SRv6 SID function
that carries L3VPN and L2VPN services, the IPv6 addresses in BIERv6 carry MVPN and Global Table
Multicast (GTM) services, simplifying network management and O&M. (This is not supported in BIER.)
• Applicable to large-scale networks: BIERv6 does not need to establish an MDT for each multicast flow
or maintain any per-flow state. This reduces resource consumption and allows BIERv6 to support large-
scale multicast services.
• Simplified protocol processing: Only an IGP and BGP need to be extended, and unicast routes are used
to forward traffic, sparing MDT establishment. Therefore, complex protocol processing, such as
multicast source sharing and SPT switching, is not involved.
• Simplified O&M: Transit nodes are unaware of changes in multicast service deployments. Consequently,
they do not need to withdraw or re-establish numerous MDTs when the network topology changes.
• Fast convergence and high reliability: Devices do not need to maintain per-flow MDT states, reducing
the number of entries that they need to store. Because devices need to update only one entry if a fault
occurs on a network node, faster convergence and higher reliability are achieved.
• Better service experience: When a multicast user requests to join a BIERv6 domain, the corresponding
receiver PE sends the request to the ingress directly, speeding up service response.
• SDN-oriented: Receiver PE and service information is set on the ingress. Other network nodes do not
need to create or manage complex protocol and tunnel entries. Instead, they only need to execute the
instructions contained in packets. This design concept is consistent with that of SDN.
Combining BIER with native IPv6 packet forwarding, BIERv6 does not need to explicitly establish MDTs, nor
does it require each transit node to maintain per-flow states. This means that BIERv6 can be seamlessly
integrated into an SRv6 network, simplifying protocol complexity and implementing efficient forwarding of
multicast packets for various services, such as IPTV, video conferencing, tele-education, telemedicine, and
online live telecasting.
BIERv6 has another form — Generalized Bit Index Explicit Replication (G-BIER). The implementation of G-BIER is similar
to that of BIERv6 except for the description of packet fields. By default, the device uses the BIERv6 mode to process
multicast packets.
The BIERv6 packet header uses one IPv6 extension header: the Destination Options Header (DOH), whose type
value is 60. Figure 1 shows the format of a BIERv6 packet header.
Next Header 8 bits Type of the header following the BIERv6 packet header. The common types
are as follows:
4: IPv4 packet
41: IPv6 packet
143: Ethernet frame
Hdr Ext Len 8 bits Length of the BIERv6 packet header excluding the first eight bytes (fixed
length), in multiples of 8 bytes.
Option Type 8 bits Option type. In BIERv6, the value is 0x7A, indicating a BIERv6 option.
Option Length 8 bits Length of the BIERv6 packet header, excluding the Option Type and Option
Length fields, in bytes.
BIFT-ID 20 bits ID of a bit index forwarding table (BIFT). It consists of a 4-bit BSL, 8-bit sub-
domain ID, and 8-bit set ID.
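A sketch of this 20-bit composition follows, with the coded BSL values 1, 2, and 3 corresponding to BitString lengths of 64, 128, and 256 bits:

```python
def encode_bift_id(bsl_code: int, sub_domain_id: int, set_id: int) -> int:
    """Pack the 20-bit BIFT-ID: 4-bit BSL code, 8-bit sub-domain ID, 8-bit set ID."""
    assert 0 <= bsl_code < 16 and 0 <= sub_domain_id < 256 and 0 <= set_id < 256
    return bsl_code << 16 | sub_domain_id << 8 | set_id

def decode_bift_id(bift_id: int):
    """Unpack a BIFT-ID, translating the coded BSL into a bit length."""
    bsl_code = bift_id >> 16 & 0xF
    sub_domain_id = bift_id >> 8 & 0xFF
    set_id = bift_id & 0xFF
    # coded BSL 1/2/3 -> actual BitString length 64/128/256 bits
    return 1 << (5 + bsl_code), sub_domain_id, set_id

bift_id = encode_bift_id(3, 0, 2)   # BSL 256 bits, sub-domain 0, set 2
assert decode_bift_id(bift_id) == (256, 0, 2)
```

A transit node can thus select the correct forwarding table from the BIFT-ID alone, before looking at the BitString.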
TC 3 bits This field is set to 0 during packet encapsulation and is ignored when the
packet is received. This field can be considered as a reserved field.
S 1 bit This field is set to 1 during packet encapsulation and is ignored when the
packet is received. This field can be considered as a reserved field.
In G-BIER mode, TC and S are combined into Rev1, whose value is set to 0
during packet encapsulation. Rev1 is ignored when the packet is received and
can be regarded as a reserved field.
TTL 8 bits Time to live (TTL), indicating the maximum number of hops through which a
packet can be forwarded using BIERv6. The TTL value decreases by 1 each
time the packet passes through a BIERv6 forwarding node. When the TTL
becomes 0, the packet is discarded.
Nibble 4 bits This field is set to 0 during packet encapsulation and is ignored when the
packet is received. This field can be considered as a reserved field.
Ver 4 bits Version of the BIERv6 packet format. The value is 0 in the current version and
can be ignored when the packet is received.
BSL 4 bits Coded value of the BitString length (BSL). The available coded values are as
follows:
0001: indicates that the BSL is 64 bits long.
0010: indicates that the BSL is 128 bits long.
0011: indicates that the BSL is 256 bits long.
Other values are currently reserved according to an RFC and unsupported.
One or more BSLs can be configured in a BIERv6 sub-domain.
In G-BIER mode, BSL, Nibble, and Ver are combined into Rev2, and the
requirements for packet encapsulation are the same as those in BIERv6. Rev2
is ignored when the packet is received and can be regarded as a reserved field.
Entropy 20 bits Entropy value that can be used for load-balancing purposes.
OAM 2 bits Used for operations, administration and maintenance (OAM). It has no impact
on packet forwarding. The default value is 0.
DSCP 6 bits This field can be used to differentiate services and is not used currently.
Proto 6 bits This field is set to 0 during packet encapsulation and is ignored when the
packet is received. This field can be considered as a reserved field.
BFIR-ID 16 bits This field is set to 0 during packet encapsulation and is ignored when the
packet is received. This field can be considered as a reserved field.
Next-Hop Address
On a BIERv6 network, each BIERv6 forwarding node must have SRv6 enabled and be configured with an IPv6
address (called an End.BIER SID) for forwarding BIERv6 packets. In G-BIER, this address is called a multicast
policy reserved address (MPRA). When processing a received BIERv6 packet, each node encapsulates the
End.BIER SID/MPRA of the next-hop node as the outer IPv6 destination address of the BIERv6 packet (note
that the destination nodes of the multicast packet are defined through the BitString). Upon receiving the
BIERv6 packet, the next-hop node forwards it according to the BIERv6 process.
Figure 2 shows the structure of the End.BIER SID/MPRA, in which the most significant bits must be a locator.
The locator is an IPv6 address prefix of an SRv6 node, can be considered as this node's identifier, and is used
for route addressing.
Sub-TLV: BIER Info Sub-TLV. Advertises information such as sub-domain IDs and BFR-IDs. Carried in the
TLV (Type = 237) in IS-IS packets.
Sub-sub-TLV: End.BIER Info Sub-sub-TLV/MPRA Info Sub-sub-TLV. Advertises End.BIER SIDs or MPRAs.
Carried in the BIER Info Sub-TLV.
Sub-sub-TLV: BIERv6 Encapsulation Sub-sub-TLV. Advertises the Max SI (short for set ID), BSL, and
start BIFT-ID. Carried in the BIER Info Sub-TLV.
Type 8 bits TLV type. The value is 236 or 237. The value 236 indicates that this TLV
is used if the IPv6 capability is enabled for an IS-IS process in standard
topology mode using the ipv6 enable topology standard command.
The value 237 indicates that this TLV is used if the IPv6 capability is
enabled for an IS-IS process in IPv6 topology mode using the ipv6
enable topology ipv6 command. Currently, only the type value 237 is
supported.
Prefix Len 8 bits The value ranges from 0 to 128, and is 128 when BIERv6 information is
carried.
Prefix 128 bits BFR-prefix, which is a loopback interface IPv6 address of a BFR in a sub-
domain.
BIER Info Variable This field is optional and used to carry BIER information.
Sub-TLV
BAR 8 bits BIER algorithm used to calculate a path to a BFER. BIER algorithms are
defined by the Internet Assigned Numbers Authority (IANA).
IPA 8 bits IGP algorithm, which can be used by the BIER algorithm defined in the
BAR field.
Sub-sub-TLV Variable This field is optional. Whether the sub-sub-TLV is present is determined
by the Length field. The BIER Info Sub-TLV may include End.BIER Info
Sub-sub-TLV and BIERv6 Encapsulation Sub-sub-TLV.
Extension Not limited This field is optional and is not filled in. If a received packet contains the
Extension field, this field is ignored.
BSL 4 bits Coded value of the BitString length. The available coded values are as
follows:
0001: indicates that the BSL is 64 bits long.
0010: indicates that the BSL is 128 bits long.
0011: indicates that the BSL is 256 bits long.
Other values are currently reserved according to an RFC and
unsupported.
BIFT-ID 20 bits This field is set to 0 during packet encapsulation and is ignored when
the packet is received. This field can be considered as a reserved field.
1. Each BFR in a BIERv6 sub-domain uses TLVs defined in IS-ISv6 for BIERv6 to advertise information
about the local BFR-prefix, sub-domain ID, BFR-ID, BSL, and path calculation algorithm to other BFRs.
2. Each BFR obtains the BFR-neighbor to each BFER through path calculation and generates BIRT entries.
3. Each BFR performs a bitwise OR operation on the bit positions corresponding to the BFR-IDs in the
BIRT entries that share the same BFR-neighbor to obtain the F-BM, and generates BIFT entries based on the BIRT entries.
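The F-BM generation in step 3 can be sketched as follows. This is an illustrative sketch, not device code: BIRT entries are modeled as (BFR-ID, BFR-neighbor) pairs within one set, and bit positions follow the usual BIER convention of ((BFR-ID - 1) mod BSL) + 1.

```python
# Group destination BFR-IDs by next-hop BFR-neighbor and OR their bit
# positions together to obtain one F-BM per neighbor.

def build_fbms(birt, bsl):
    """birt: list of (bfer_bfr_id, bfr_neighbor) pairs within one set."""
    fbms = {}
    for bfr_id, neighbor in birt:
        bit = 1 << ((bfr_id - 1) % bsl)               # bit for this BFER
        fbms[neighbor] = fbms.get(neighbor, 0) | bit  # bitwise OR per neighbor
    return fbms

# BFERs 1 and 2 reachable via neighbor B, BFER 3 via neighbor C (BSL 256):
fbms = build_fbms([(1, "B"), (2, "B"), (3, "C")], 256)
# fbms["B"] has bits 1 and 2 set; fbms["C"] has bit 3 set
```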
dynamically added to the multicast group. For example, if a terminal user requests to watch an IPTV channel
but the BFER corresponding to this user does not have the channel's traffic, the BFER needs to be added to a
multicast group corresponding to the channel. On a BIERv6 network, the process for a host to join a
multicast group does not require hop-by-hop negotiation and is transparent to transit nodes.
Figure 1 shows the process for a host to join a multicast group.
1. The receive end (host) sends a message (such as an IGMP Report message) to the connected BFER,
requesting to join the multicast group corresponding to a multicast service.
3. After receiving the Join message, the BFIR sets the bit position of the BFER in the BitString to 1.
4. Upon receipt of a multicast packet, each BFR replicates the packet and sends a packet copy to the
next hop based on the new BitString until the BFER receives the multicast packet.
Concept Description
Domain A network domain in which multicast data packets are forwarded using BIERv6. A
domain can contain a maximum of eight sub-domains.
BFR, BFIR, and BFER Bit forwarding routers (BFRs) are nodes that forward packets according to the
BIERv6 process.
A bit forwarding ingress router (BFIR) is an ingress router through which multicast
packets enter a sub-domain. A bit forwarding egress router (BFER) is an egress
router through which multicast packets leave a sub-domain. Both the BFIR and
BFER are BFRs, and they are edge nodes in a BIERv6 sub-domain.
When planning sub-domains, ensure that the BFIR and all the BFERs to which the multicast traffic received
by the BFIR is to be sent reside in the same sub-domain. This is necessary because sub-domain stitching is
currently not supported. To facilitate management and improve the forwarding efficiency on a multicast
network, you are advised to plan sub-domains based on the suggestions provided in Table 2.
Single-AS small- and medium-sized network: Plan only one sub-domain, with all IGP areas in it.
Single-AS multi-topology network: Plan sub-domains in one domain based on the number of topologies,
with each topology corresponding to one sub-domain.
Multi-AS large-scale network: Inter-AS static traversal must be deployed. The sub-domain planning
solutions are as follows: Plan only one sub-domain, with all ASs in it. In addition, set BFR-IDs
and BSLs so that the BFERs in the same AS have the same set ID. This helps to improve forwarding
efficiency. The concept and calculation formula of a set ID are described in the following section.
Set ID = int((BFR-ID - 1)/BSL); bit position in the BitString = ((BFR-ID - 1) mod BSL) + 1.
In the preceding formulas, mod indicates a modulo operation, and int rounds down to the nearest integer.
Figure 2 shows an example of the mapping among BFR-IDs, a 256-bit BitString, and set IDs.
A set ID and BitString together uniquely identify the destination nodes of each multicast packet in a sub-
domain. In cases where the destination nodes of a multicast packet reside in two or more sets, the BFIR
replicates the multicast packet based on the number of sets to which these destination nodes belong. On the
network shown in Figure 3, the BSL is 256 bits. If the destination nodes of a multicast packet are BFER 1
(with BFR-ID 1) and BFER 2 (with BFR-ID 2), the BFIR needs to send only one packet copy, in which the set
ID is 0 and the BitString is ...11 (in this example, ... indicates 254 consecutive 0s). If the destination nodes of
a multicast packet are BFER 1 (with BFR-ID 1) and BFER 257 (with BFR-ID 257), the BFIR needs to send two
packet copies: one in which the set ID is 0 and the BitString is ...01, and the other in which the set ID is 1 and
the BitString is ...01.
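The replication example above can be reproduced with a short sketch. This is illustrative only: it groups destination BFR-IDs by set ID (one packet copy per set) and builds each copy's BitString as an integer bitmask.

```python
# set ID = int((BFR-ID - 1)/BSL); bit position = ((BFR-ID - 1) mod BSL) + 1.

def split_by_set(bfr_ids, bsl):
    """Group destination BFR-IDs into per-set BitStrings (one copy per set)."""
    copies = {}
    for bfr_id in bfr_ids:
        set_id, bit_pos = divmod(bfr_id - 1, bsl)
        copies[set_id] = copies.get(set_id, 0) | (1 << bit_pos)
    return copies

# BFER 1 and BFER 2 share set 0: one copy with BitString ...11.
print(split_by_set([1, 2], 256))    # {0: 3}
# BFER 1 and BFER 257 fall into sets 0 and 1: two copies, each with BitString ...01.
print(split_by_set([1, 257], 256))  # {0: 1, 1: 1}
```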
Planning BSLs and BFR-IDs properly can reduce the number of multicast packet copies and improve the
forwarding efficiency on the multicast network. When planning BSLs and BFR-IDs, you are advised to adhere
to the following guidelines:
• Denseness: Set the maximum BFR-ID to the number of BFERs in a sub-domain so that as few sets as
possible are used. For example, if a sub-domain contains up to 256 BFERs, allocate BFR-IDs to the
BFERs within the range of 1 to 256. Similarly, if the sub-domain contains up to 512 BFERs, allocate
BFR-IDs within the range of 1 to 512.
• Region: Allocate the BFERs in the same region to the same set.
• Necessity: Allocate BFR-IDs only for BFERs. If a BFIR also functions as a BFER, you also need to
configure a BFR-ID for it. A BFIR-only node (functioning as a BFIR but not as a BFER) does not need
a BFR-ID.
• Evolvability: Reserve some IDs in each set for future network expansion.
After receiving a multicast packet, each BFR performs the following operations:
1. The BFR identifies that the destination address of the packet is the local End.BIER SID and initiates
BIERv6 processing.
2. The BFR identifies the BIFT-ID and BitString in the packet, and then locates the corresponding BIFT
according to the BIFT-ID.
3. The BFR reads the first line in the BIFT and performs a bitwise AND operation between the forwarding
bitmask (F-BM) in this line and the BitString of the packet to obtain a new BitString.
4. The BFR performs one of the following actions based on the calculation result:
• If the new BitString is 0 (all the bit positions are 0), the BFR does not replicate the packet or send
a copy to the BFR-neighbor (BFR-NBR).
• If the new BitString is not 0 and the BFR-neighbor is not the local BFR, the BFR replicates the packet
and replaces the current BitString with the new BitString in the packet copy. It also places the
End.BIER SID of the BFR-neighbor in the destination address field of the packet copy, and then
forwards the packet copy to the BFR-neighbor.
• If the new BitString is not 0 and the BFR-neighbor is the local BFR itself, the BFR replicates the
packet and checks whether the bit position set to 1 in the new BitString represents the BFR itself.
If it does, the BFR removes the BIERv6 header from the packet copy and forwards the packet out of
the BIERv6 network. Otherwise, the BFR discards the packet copy.
5. The BFR checks the rest of the lines in the BIFT one by one and performs corresponding operations.
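The per-line BIFT processing above can be walked through with a small sketch. This is illustrative, not device code: each BIFT line is modeled as an (F-BM, BFR-neighbor) pair, the string "LOCAL" marks lines whose neighbor is the BFR itself, and my_bits holds the bit representing this BFR.

```python
def process(bitstring, bift, my_bits):
    """Return (copies_to_forward, delivered_locally) for one received packet."""
    forwarded, delivered = [], False
    for fbm, neighbor in bift:
        new_bs = bitstring & fbm                  # step 3: bitwise AND with F-BM
        if new_bs == 0:
            continue                              # step 4: no destination behind this line
        if neighbor == "LOCAL":
            if new_bs & my_bits:                  # a set bit names this BFR itself:
                delivered = True                  # strip the BIERv6 header, hand off
        else:
            forwarded.append((neighbor, new_bs))  # copy carries the new BitString;
                                                  # destination = neighbor's End.BIER SID
    return forwarded, delivered

# BitString 0b111: bits 1-2 are behind neighbor B, bit 3 is the local BFR.
out, local = process(0b111, [(0b011, "B"), (0b100, "LOCAL")], my_bits=0b100)
# out == [("B", 0b011)], local is True
```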
Definition
Multicast VPN (MVPN) offers advantages such as bandwidth savings, service isolation, high reliability, and
good scalability, making it the preferred option as networks carry more and more video services. MVPN over
BIERv6 uses BIERv6 as a transport tunnel to forward VPN IP multicast traffic across the public network. MVPN
over BIERv6 applies to both IPv4 and IPv6 networks, and is known as MVPNv4 over BIERv6 and MVPNv6 over
BIERv6, respectively.
Figure 1 shows the typical MVPN over BIERv6 networking, in which PIM must be enabled on the sender PE
and the network where the multicast source resides for interworking. Due to the service deployment and to
facilitate subsequent descriptions, this document refers to the networks where CEs reside as VPNs, and the
network where PEs and the P reside as a public network.
Network Architecture
An MVPN over BIERv6 network consists of three layers, as shown in Figure 2.
• Underlay: It uses an IGP to establish adjacencies between BFRs and generate BIFTs to implement
network interworking. Similar to a BIERv6 network, the MVPN over BIERv6 network also uses the TLVs
defined in IS-ISv6 for BIERv6.
• BIERv6 layer: The BFIR encapsulates a BIERv6 packet header containing a BitString for each multicast
packet. According to the BitString, transit BFRs forward the packets. Upon receipt of such packets, BFERs
remove the BIERv6 packet header and send the packets to the overlay module for processing.
• Overlay: A VPN instance is created and bound to an interface on each PE, and BGP MVPN peer
relationships are established using BGP MVPN Network Layer Reachability Information (NLRI). Based on
MVPN A-D routes, the sender PE calculates the set ID and BitString to be encapsulated in the BIERv6
packet header. In this manner, multicast forwarding paths are established between the sender PE and
receiver PEs. In addition, a receiver PE constructs a BGP MVPN C-multicast (C is short for customer)
route based on the PIM Join/Prune message received from a VPN and sends the C-multicast route to
the sender PE, which then converts the route into a PIM Join/Prune message and sends the message to
the corresponding CE.
• Inclusive-PMSI (I-PMSI) tunnel: connects all PEs in the same MVPN. An I-PMSI tunnel is typically used as
the default tunnel for data forwarding.
• Selective-PMSI (S-PMSI) tunnel: connects to some PEs in the same MVPN. An S-PMSI tunnel is used to
transmit VPN data to PEs that require the data. If no multicast user requests multicast data from the
corresponding (S, G) in the VPN connected to a receiver PE, the receiver PE will not receive such data.
Compared with an I-PMSI tunnel, an S-PMSI tunnel prevents redundant data from being forwarded,
thereby conserving network bandwidth.
On an MVPN over BIERv6 network, an I-PMSI tunnel and one or more S-PMSI tunnels can coexist. The I-PMSI
tunnel is used for data forwarding by default, and S-PMSI tunnels are automatically generated based on
the (S, G) information of different multicast services. For details about how PMSI tunnels are
established, see BIERv6 PMSI Tunnel Establishment.
discovery, establish and maintain PMSI tunnels, and advertise C-multicast routes for MVPN members to join
or leave multicast groups. Each MVPN over BIERv6 control message is carried in the NLRI field in a BGP
Update message.
Field Description
Route Type Type of a BGP MVPN route (MVPN route for short). MVPN routes have seven
types. For details, see Table 2.
Route Type Specific MVPN route information. Different types of MVPN routes contain different
information. Therefore, the length of this field is variable.
Table 2 describes the types and functions of MVPN routes. C-S refers to the IP address (C-Source IP) of a C-
multicast source, and C-G refers to the IP address (C-Group IP) of a C-multicast group. (C-S, C-G) multicast
traffic is sent to all hosts that have joined the C-G multicast group and request data sent from the multicast
source address C-S. (C-*, C-G) multicast traffic is sent to all hosts that have joined the C-G multicast group
and have no requirements for a specific multicast source address.
1: Intra-AS I-PMSI A-D route. It is used for intra-AS MVPN member auto-discovery and is advertised by
each PE with MVPN enabled.
2: Inter-AS I-PMSI A-D route. It is used for inter-AS MVPN member auto-discovery and is advertised by
each ASBR with MVPN enabled. This type of route is currently not supported (the inter-AS static
traversal solution is used instead).
Route types 1 and 2 are called MVPN auto-discovery (A-D) routes. They are used to automatically
discover MVPN members and establish PMSI tunnels.
3: S-PMSI A-D route. It is used by a sender PE to send a notification of establishing a selective
P-tunnel for a particular (C-S, C-G).
6: Shared Tree Join route. It is used in (C-*, C-G) scenarios. When a receiver PE receives a PIM
(C-*, C-G) Join or Prune message, it converts the message into a Shared Tree Join route and sends the
route to the sender PE through the BGP peer relationship.
7: Source Tree Join route. It is used in (C-S, C-G) scenarios. When a receiver PE receives a PIM
(C-S, C-G) Join or Prune message, it converts the message into a Source Tree Join route and sends the
route to the sender PE through the BGP peer relationship.
Route types 6 and 7 are called C-multicast routes. They are used to initiate the join or leave
requests of VPN users and guide the transmission of C-multicast traffic.
Table 3 describes the fields of the Route type specific field in different route types.
1: Intra-AS I-PMSI A-D route. RT: Used to filter the routing entries that can be leaked to the local
routing table.
Prefix-SID In BIERv6 mode, this field carries the Src.DTX SID (IPv6 address
for forwarding BIERv6 packets).
In G-BIER mode, the MSID field carries Src.DTX SID information.
By default, a device uses the Prefix-SID attribute to carry the
information. You can configure the device to use a Multicast
Service Identifier (MSID) to carry the information.
3: S-PMSI A-D route. RT: Used to filter the routing entries that can be leaked to the local routing
table. RD: VPN RD.
4: Leaf A-D route. RT: Used to filter the routing entries that can be leaked to the local routing
table.
6: Shared Tree Join route; 7: Source Tree Join route. RT: Used to filter the routing entries that can
be leaked to the local routing table. Next Hop: Next-hop IP address. RD: VPN RD.
Figure 2 shows the format of the PTA. Table 4 describes the values of the fields in the PTA on the ingress
and egress.
Table 4 Values of the fields in the PTA on the ingress and egress
Flags: 1 on the ingress; 0 on the egress.
Sub-domain-id: On the ingress, set based on the service carried over a tunnel. On the egress, set
based on the Sub-domain ID in the BIER tunnel information carried in the PMSI A-D route received from
the ingress.
Field Description
TLV Type The value 5 indicates SRv6 L3 Service TLV. The value 6 indicates SRv6 L2 Service
TLV.
SRv6 Service Sub-TLVs (variable) SRv6 service information. The length of this field is variable. For the
detailed format of this field, see the format of SRv6 Service Sub-TLVs.
Field Description
SRv6 Service Sub-TLV The length of this field is variable, and this field contains data specific to SRv6 services.
Field Description
SRv6 Service Sub-TLV Type The value is fixed at 1, indicating SRv6 SID Information Sub-TLV.
Reserved1 Reserved field. For the transmit end, the value is 0. This field must be ignored
on the receive end.
SRv6 SID Flags SRv6 SID flag. For the transmit end, the value is 0. This field must be ignored on
the receive end.
Reserved2 Reserved field. For the transmit end, the value is 0. This field must be ignored on the receive end.
Field Description
SRv6 Service Data Sub-Sub-TLV Value Used to advertise SRv6 SID attributes. The length is variable.
Type Description
Sub-TLV Length Length. If there is one sub-sub-TLV, the value is 25. If there is no sub-sub-TLV, the
value is 17.
In G-BIER, the sub-TLV can carry one or no sub-sub-TLV. Figure 7 shows the format of the sub-sub-TLV, and
Table 9 describes the fields in the sub-sub-TLV.
Type Description
Type Description
MSID Length Length of the MSID. It is recommended that the maximum value be less than or equal
to 20.
On the MVPN over BIERv6 network, the ingress encapsulates the Src.DT4 SID or Src.DT6 SID as the outer
source IPv6 address of each BIERv6 packet. This source address remains unchanged during transmission from
the BFIR to BFERs on the MVPN over BIERv6 network.
1. After receiving a C-multicast packet, the sender PE selects a PMSI tunnel based on the C-Source IP and
C-Group IP in the C-IP Header of the packet. This PE then inserts the outer BIERv6 packet header
(including a set ID and BitString) into the packet based on the tunnel attribute, and sets the outer
source IPv6 address to its Src.DT4 SID or Src.DT6 SID.
2. According to the BIERv6 forwarding process, the sender PE queries the BIFT, sets the destination
address of the packet to the End.BIER SID of the next-hop node, and forwards the multicast packet
through one or more matched outbound interfaces. For details about the BIERv6 forwarding process,
see BIERv6 Forwarding Plane Fundamentals.
3. Transit nodes on the MVPN over BIERv6 network forward the packet according to the BIERv6
forwarding process.
4. After receiving the multicast packet, a receiver PE determines that it is a destination node of the
BIERv6 packet according to the BitString. This PE removes the BIERv6 header and then forwards the
packet out of the MVPN over BIERv6 network and into the VPN for forwarding.
1 PE1, PE2, and PE3 Complete basic network configurations, including BIERv6 network configurations.
3 PE1 Configure PE1 as a sender PE and set the I-PMSI tunnel type to BIER in the IPv6
MVPN I-PMSI view.
4 PE1 PE1 sends an Intra-AS I-PMSI A-D route to PE2 and PE3 through BGP peer
relationships. The route carries the following information:
MVPN RT: controls A-D route advertisement. The value is the export MVPN
target configured on PE1.
PTA:
Tunnel Type: The tunnel type is BIER.
Sub-domain-id: sub-domain ID of PE1.
BFR-ID: BFR-ID of PE1.
BFR-prefix: BFR-prefix of PE1.
In G-BIER scenarios, in addition to MVPN RT and PMSI tunnel attribute
information, the routes advertised by PE1 carry the MSID attribute, which
contains the following information:
IPv6 Address: MVPN Src.DTX SID
Prefix Length: Src.DTX SID locator prefix length
MSID Length: length of the MSID part of the Src.DTX SID (128 - prefix length - args length)
5 PE2 and PE3 After receiving the route from PE1, PE2 and PE3 reply with a Leaf A-D route. The
route carries the following information:
Sub-domain-id: sub-domain ID of PE2 or PE3. The value must be the same as the
sub-domain-ID of PE1.
BFR-ID: BFR-ID of PE2 or PE3.
BFR-prefix: BFR-prefix of PE2 or PE3.
6 PE1 After receiving the routes from PE2 and PE3, PE1 records PE2 and PE3 as MVPN
members and sets their bit positions in the BIERv6 BitString corresponding to the
tunnel to 1. Consequently, PE2 and PE3 join the BIERv6 I-PMSI tunnel.
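The membership bookkeeping in steps 5 and 6 can be sketched as follows. This is a hedged illustration, not device code: the sender PE records each member learned from a Leaf A-D route by setting that member's bit in the tunnel's per-set BitString. The BFR-IDs used here (2 for PE2, 3 for PE3) are illustrative, not from the document.

```python
# The sender PE tracks I-PMSI tunnel members and sets each member's bit.
# Bit position for a BFR-ID is ((bfr_id - 1) % bsl), per-set.

def add_member(bitstrings, bfr_id, bsl):
    set_id, bit_pos = divmod(bfr_id - 1, bsl)
    bitstrings[set_id] = bitstrings.get(set_id, 0) | (1 << bit_pos)
    return bitstrings

tunnel = {}
add_member(tunnel, 2, 256)   # Leaf A-D route received from PE2 (BFR-ID 2 assumed)
add_member(tunnel, 3, 256)   # Leaf A-D route received from PE3 (BFR-ID 3 assumed)
# tunnel == {0: 0b110}: bits 2 and 3 of set 0 are now set
```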
1 PE1 PE1 must be configured with an address pool range and criteria for switching
from an I-PMSI tunnel to an S-PMSI tunnel. This includes the multicast group
address pool, BSL, maximum number of S-PMSI tunnels that can be dynamically
established, forwarding rate threshold for the switching, and a delay for the
switching.
2 PE1 After PE1 receives a C-multicast packet, it determines the I-PMSI tunnel based on
the (C-S, C-G) or (C-*, C-G) entry carried in the packet. If the address pool range
and criteria for the I-PMSI-to-S-PMSI tunnel switching are met, PE1
automatically establishes an S-PMSI tunnel. If a delay for the switching has been
set, multicast traffic is not switched to the S-PMSI tunnel until the delay timer
expires.
3 PE1 PE1 sends an S-PMSI A-D route carrying PMSI tunnel information to PE2 and
PE3. In the route, the Leaf Information Require flag is set to 1, instructing PE2
and PE3 to reply with a BGP Leaf A-D route if they want to join the S-PMSI
tunnel.
4 PE2 and PE3 After receiving the S-PMSI A-D route, PE2 and PE3 record the route locally and
check whether they have receivers in the connected VPN. PE2 determines that it
has such receivers, and PE3 determines that it has no such receivers.
5 PE2 Because it has receivers in the connected VPN, PE2 sends a Leaf A-D route to PE1.
6 PE1 After receiving the Leaf A-D route from PE2, PE1 records PE2 as a receiver PE
and sets the bit position of PE2 in the BIERv6 BitString to 1. PE2 then joins the
BIERv6 S-PMSI tunnel.
If VPN receivers request multicast data from PE3 after a while, PE3 sends a Leaf A-D route to PE1. After
receiving the route, PE1 updates the receiver PE set and generates a new BIERv6 BitString. PE3 then joins the
BIERv6 S-PMSI tunnel.
Figure 1 Process of switching traffic from an S-PMSI tunnel back to the I-PMSI tunnel
Table 1 Description of the switchback from an S-PMSI tunnel to the I-PMSI tunnel
1 PE1 When PE1 detects that the forwarding rate of multicast traffic is lower than the
threshold, it starts the switchback timer. Before the timer expires:
If the multicast traffic forwarding rate increases above the threshold, PE1
continues using the S-PMSI tunnel to forward the traffic.
If the multicast traffic forwarding rate remains lower than the threshold, PE1
switches multicast traffic back to the I-PMSI tunnel for transmission.
2 PE1 PE1 sends an S-PMSI A-D route to PE2, instructing PE2 to withdraw the bindings
between multicast entries and the S-PMSI tunnel.
3 PE2 After receiving the S-PMSI A-D route, PE2 withdraws the bindings between
multicast entries and the S-PMSI tunnel, and then replies with a Leaf A-D route.
4 PE1 PE1 deletes the S-PMSI tunnel after waiting for a specified period of time.
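The switchback decision in step 1 can be sketched as a simple check over the forwarding rates sampled while the switchback timer runs. This is an illustrative sketch of the documented behavior, not device code; the sample list and threshold are hypothetical.

```python
# While the switchback timer runs, any rate sample rising back above the
# threshold keeps traffic on the S-PMSI tunnel; otherwise traffic is
# switched back to the I-PMSI tunnel when the timer expires.

def choose_tunnel(rates, threshold):
    """rates: forwarding-rate samples observed while the switchback timer runs."""
    if any(rate >= threshold for rate in rates):
        return "S-PMSI"   # traffic recovered: keep using the S-PMSI tunnel
    return "I-PMSI"       # remained below threshold: switch back to I-PMSI

assert choose_tunnel([5, 20], threshold=10) == "S-PMSI"
assert choose_tunnel([5, 3], threshold=10) == "I-PMSI"
```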
Definition
In IPTV service scenarios, carriers may use public networks to implement multicast user access and multicast
service deployment. The multicast routing information generated in this case can be stored in the global
table. Such multicast routing information can be referred to as Global Table Multicast (GTM) information.
GTM over BIERv6 allows GTM traffic to be transmitted to BIERv6 domains or receiver nodes over BIERv6
tunnels. GTM over BIERv6 is classified as GTMv4 over BIERv6 or GTMv6 over BIERv6.
Figure 1 shows the typical networking of GTM over BIERv6. In the networking of GTM over BIERv6, PIM must
be enabled on the sender PE and the network where the multicast source resides for interworking.
To facilitate subsequent description and take service deployment into consideration, we refer to the
networks where CEs reside as user-side public networks, and the network where PEs and the P reside as a
network-side public network.
• Inclusive-PMSI (I-PMSI) tunnel: connects all PEs on the network-side public network. An I-PMSI tunnel is
typically used as the default tunnel for data forwarding.
• Selective-PMSI (S-PMSI) tunnel: connects to some PEs on the network-side public network. The
multicast traffic carried over an S-PMSI tunnel is sent only to the PEs that require the traffic. If no
multicast user requests multicast traffic from the corresponding (S, G) on the user-side public network
to which a receiver PE is connected, the receiver PE will not receive such traffic. Compared with an I-
PMSI tunnel, an S-PMSI tunnel prevents redundant data from being forwarded, thereby conserving
network bandwidth.
On a GTM over BIERv6 network, one I-PMSI tunnel and multiple S-PMSI tunnels can coexist. The I-PMSI
tunnel is used for data forwarding by default, and one or more S-PMSI tunnels are automatically generated
based on the (S, G) information of different multicast services. For details about PMSI tunnel establishment,
see BIERv6 PMSI Tunnel Establishment.
Field Description
Route Type Type of a BGP MVPN route (MVPN route for short). MVPN routes are classified
into seven types. For details, see Table 2.
Route Type Specific MVPN route information. Different types of MVPN routes contain different
information. Therefore, the length of this field is variable.
Table 2 describes the types and functions of MVPN routes. C-S refers to the IP address (C-Source IP) of a
multicast source on a user-side public network, and C-G refers to the IP address (C-Group IP) of a multicast
group on a user-side public network. (C-S, C-G) multicast traffic is sent to all hosts that have joined the C-G
multicast group and request data sent from the multicast source address C-S. (C-*, C-G) multicast traffic is
sent to all hosts that have joined the C-G multicast group and have no multicast source address specified.
1: Intra-AS I-PMSI A-D route. It is mainly used for intra-AS MVPN member auto-discovery and is
initiated by each PE with MVPN enabled.
2: Inter-AS I-PMSI A-D route. It is mainly used for inter-AS MVPN member auto-discovery and is
initiated by each ASBR with MVPN enabled. This type of route is currently not supported (the inter-AS
static traversal solution is used instead).
Route types 1 and 2 are called MVPN auto-discovery (A-D) routes. They are used to automatically
discover MVPN members and establish PMSI tunnels.
6: Shared Tree Join route. It is used in (C-*, C-G) scenarios. When a receiver PE receives a PIM
(C-*, C-G) Join or Prune message, it converts the message into a Shared Tree Join route and sends the
route to the sender PE through the BGP peer relationship.
7: Source Tree Join route. It is used in (C-S, C-G) scenarios. When a receiver PE receives a PIM
(C-S, C-G) Join or Prune message, it converts the message into a Source Tree Join route and sends the
route to the sender PE through the BGP peer relationship.
Route types 6 and 7 are called C-multicast routes. They are used to initiate join and leave requests
of users on the user-side public network and guide the transmission of user-side public network
multicast traffic.
Table 3 describes the fields of the Route type specific field in different route types.
1: Intra-AS I-PMSI A-D route. RT: Used to filter the routing entries that can be leaked to the local
routing table. You can choose whether to configure this item.
Prefix-SID In BIERv6 mode, this field carries Src.DTX SID (IPv6 address used
to forward BIERv6 packets) information.
In G-BIER mode, the MSID field carries Src.DTX SID information.
By default, a device uses the Prefix-SID attribute to carry the
information. You can configure the device to use the MSID to
carry the information.
3: S-PMSI A-D route. RT: Used to filter the routing entries that can be leaked to the local routing
table. You can choose whether to configure this item.
4: Leaf A-D route. RT: Used to filter the routing entries that can be leaked to the local routing
table.
RT: Used to filter the routing entries that can be leaked to the local routing table. You can choose
whether to configure this item.
6: Shared Tree Join route; 7: Source Tree Join route. RT: Used to filter the routing entries that can
be leaked to the local routing table. Next Hop: Next-hop IP address.
to a receiver PE, and in MVPN NLRI Type 4 routes that are advertised by a receiver PE to a sender PE.
Figure 2 shows the format of the PTA. Table 4 describes the values of the fields in the PTA on the ingress
and egress.
Table 4 Values of the fields in the PTA on the ingress and egress
Flags: 1 on the ingress; 0 on the egress.
Sub-domain-id: On the ingress, set based on the service carried over a tunnel. On the egress, set
based on the Sub-domain ID in the BIER tunnel information carried in the PMSI A-D route sent by the
ingress.
BFR-ID: On the ingress, the BFR-ID configured for the ingress in the corresponding sub-domain. On the
egress, the BFR-ID configured for the egress in the corresponding sub-domain.
BFR-prefix: On the ingress, the BFR-prefix configured for the ingress in the corresponding sub-domain.
On the egress, the BFR-prefix configured for the egress in the corresponding sub-domain.
Field Description
TLV Type The value 5 indicates SRv6 L3 Service TLV. The value 6 indicates SRv6 L2 Service
TLV.
SRv6 Service Sub-TLVs SRv6 service information. The length of this field is variable. For detailed format
(variable) of this field, see the format of SRv6 Service Sub-TLVs.
Field Description
SRv6 Service Sub-TLV The length of this field is variable, and this field contains data specific to SRv6 services.
Field Description
SRv6 Service Sub-TLV Type The value is fixed at 1, indicating SRv6 SID Information Sub-TLV.
Reserved1 Reserved field. For the transmit end, the value is 0. This field must be ignored
on the receive end.
SRv6 SID Flags SRv6 SID flag. For the transmit end, the value is 0. This field must be ignored on
the receive end.
Reserved2 Reserved field. For the transmit end, the value is 0. This field must be ignored
SRv6 Service Data Sub-Sub-TLV Value: Used to advertise SRv6 SID attributes. The length is variable.
Sub-TLV Length: Length of the sub-TLV. If there is one sub-sub-TLV, the value is 25. If there is no sub-sub-TLV, the value is 17.
In G-BIER, the sub-TLV can carry one or no sub-sub-TLV. Figure 7 shows the format of the sub-sub-TLV, and Table 9 describes the fields in the sub-sub-TLV.
MSID Length: Length of the MSID. It is recommended that the maximum value be less than or equal to 20.
On the GTM over BIERv6 network, the ingress sets the outer source IPv6 address of each BIERv6 packet to the Src.DT4 SID or Src.DT6 SID. This source address remains unchanged during transmission from the BFIR to BFERs on the GTM over BIERv6 network.
1. After receiving a multicast packet of a user-side public network, the sender PE selects a PMSI tunnel based on the C-Source IP and C-Group IP in the C-IP Header of the packet. This PE then inserts the outer BIERv6 packet header (including a set ID and BitString) into the packet based on the tunnel attribute, and sets the outer source IPv6 address to its Src.DT4 SID or Src.DT6 SID.
2. According to the BIERv6 forwarding process, the sender PE queries the BIFT, sets the destination
address of the packet to the End.BIER SID of the next-hop node, and forwards the multicast packet
through one or more matched outbound interfaces. For details about the BIERv6 forwarding process,
see BIERv6 Forwarding Plane Fundamentals.
3. Transit nodes on the GTM over BIERv6 network forward the packet according to the BIERv6 forwarding process. If some transit nodes do not support BIERv6, inter-AS static traversal or intra-AS automatic traversal can be used to allow BIERv6 multicast traffic to traverse these nodes. For details about inter-AS static traversal and intra-AS automatic traversal, see BIERv6 Inter-AS Static Traversal and BIERv6 Intra-AS Automatic Traversal.
4. After receiving a multicast packet, a receiver PE determines that it is a destination node of the BIERv6
packet according to the BitString. This PE removes the BIERv6 header and then forwards the packet
out of the GTM over BIERv6 network and into the user-side public network for forwarding.
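The BitString encapsulation and BIFT lookup in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not device behavior: each BFER's BFR-ID maps to bit position (BFR-ID - 1), and each hypothetical BIFT entry pairs a next-hop End.BIER SID with a forwarding bitmask.

```python
# Illustrative sketch of BFIR-side BIERv6 forwarding. "bift" and the SID
# strings are assumptions for the example, not real device structures.

def build_bitstring(bfr_ids, bsl=256):
    """Set bit (bfr_id - 1) for each receiver BFER."""
    bits = 0
    for bfr_id in bfr_ids:
        bits |= 1 << (bfr_id - 1)
    return bits

def forward(bitstring, bift):
    """bift: list of (next_hop_end_bier_sid, forwarding_bitmask) entries.
    Returns one (next_hop_sid, bits_served) pair per packet copy."""
    copies = []
    remaining = bitstring
    for sid, mask in bift:
        matched = remaining & mask
        if matched:
            copies.append((sid, matched))
            remaining &= ~mask  # each BFER is served by exactly one copy
    return copies
```

For example, receivers with BFR-IDs 2 and 3 yield BitString 0b110; a single BIFT entry covering both bits produces one copy addressed to that neighbor's End.BIER SID.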
1. PE1, PE2, and PE3: Complete basic network configurations, including BIERv6 network configurations.
3. PE1: Configure PE1 as a sender PE. In the GTM instance I-PMSI view, set the I-PMSI tunnel type to BIER IPv6.
4. PE1: PE1 sends an Intra-AS I-PMSI A-D route to PE2 and PE3 through BGP peer relationships. The route carries the following information:
MVPN RT: controls A-D route advertisement. The RT is set to the export MVPN target configured on PE1. You can also choose not to set the RT.
PMSI tunnel attribute:
Tunnel Type: The tunnel type is BIER IPv6.
Sub-domain-id: sub-domain ID of PE1.
BFR-ID: BFR-ID of PE1.
BFR-prefix: BFR-prefix of PE1.
In G-BIER scenarios, in addition to MVPN RT and PMSI tunnel attribute information, the routes advertised by PE1 carry the MSID attribute, which contains the following information:
IPv6 Address: MVPN Src.DTX SID
Prefix Length: Src.DTX SID locator prefix length
MSID Length: length of the MSID in the Src.DTX SID (128 - prefix length - args length)
5. PE2 and PE3: After receiving the route from PE1, PE2 and PE3 reply with a Leaf A-D route. The route carries the following information:
Sub-domain-id: sub-domain ID of PE2 or PE3. The value must be the same as the sub-domain ID of PE1.
BFR-ID: BFR-ID of PE2 or PE3.
BFR-prefix: BFR-prefix of PE2 or PE3.
6. PE1: After receiving the routes from PE2 and PE3, PE1 records PE2 and PE3 as MVPN members and sets their bit positions in the BIERv6 BitString corresponding to the tunnel to 1. Consequently, PE2 and PE3 join the BIERv6 I-PMSI tunnel.
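The MSID Length arithmetic above (128 minus the locator prefix length minus the args length) can be checked with a one-line helper; the function name is illustrative.

```python
# Illustrative check of the MSID Length formula: for a 128-bit IPv6 SID,
# MSID Length = 128 - locator prefix length - args length.
def msid_length(prefix_len, args_len):
    return 128 - prefix_len - args_len
```

For example, a 96-bit locator prefix with no arguments leaves a 32-bit MSID.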
1. PE1: PE1 must be configured with an address pool range and criteria for switching from an I-PMSI tunnel to an S-PMSI tunnel. This includes the multicast group address pool, BSL, maximum number of S-PMSI tunnels that can be dynamically established, forwarding rate threshold for the switching, and a delay for the switching.
2. PE1: After PE1 receives a multicast packet of a user-side public network, it determines the I-PMSI tunnel based on the (C-S, C-G) or (C-*, C-G) entry carried in the packet. If the address pool range and criteria for the I-PMSI-to-S-PMSI tunnel switching are met, PE1 automatically establishes an S-PMSI tunnel. If a delay for the I-PMSI-to-S-PMSI tunnel switching has been set, multicast traffic is not switched to the S-PMSI tunnel until the delay timer expires.
3. PE1: PE1 sends an S-PMSI A-D route carrying PMSI tunnel information to PE2 and PE3. In the route, Leaf Information Require is set to 1, instructing PE2 and PE3 to reply with join information.
4. PE2 and PE3: After receiving the S-PMSI A-D route, PE2 and PE3 record the route locally and check whether they have receivers on the connected user-side public network. PE2 determines that it has receivers, and PE3 determines that it has no receivers.
5. PE2: Because PE2 has receivers on the connected user-side public network, it sends a Leaf A-D route to PE1. The route carries the PTA.
6. PE1: After receiving the Leaf A-D route from PE2, PE1 records PE2 as a receiver PE and sets the bit position of PE2 in the BIERv6 BitString to 1. PE2 then joins the BIERv6 S-PMSI tunnel.
If PE3 receives requests from receivers on the connected user-side public network after a while, it sends a Leaf A-D route to PE1. After receiving the route, PE1 updates the receiver PE set and generates a new BIERv6 BitString. PE3 then joins the BIERv6 S-PMSI tunnel.
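The switching criteria in the process above (multicast group address pool, forwarding rate threshold, and switching delay) can be sketched as a simple decision function. Parameter names, units, and the pool/threshold values in the example are illustrative assumptions, not device defaults.

```python
# Minimal sketch of the I-PMSI-to-S-PMSI switching decision: switch only if
# the group falls in the configured pool and the rate has stayed above the
# threshold for at least the configured delay. Illustrative only.
import ipaddress

def should_switch(c_group, rate_kbps, pool, threshold_kbps,
                  above_since_s, delay_s):
    in_pool = ipaddress.ip_address(c_group) in ipaddress.ip_network(pool)
    return in_pool and rate_kbps > threshold_kbps and above_since_s >= delay_s
```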
Figure 1 Process of switching traffic from an S-PMSI tunnel back to the I-PMSI tunnel
Table 1 describes the process of switching traffic from an S-PMSI tunnel back to the I-PMSI tunnel.
Table 1 Description of the switchback from an S-PMSI tunnel to the I-PMSI tunnel
1. PE1: When PE1 detects that the forwarding rate of multicast traffic is lower than the threshold, it starts the switchback timer. Before the timer expires, if the multicast traffic forwarding rate increases above the threshold, the traffic continues to be forwarded over the S-PMSI tunnel and no switchback occurs.
2. PE1: PE1 sends an S-PMSI A-D route to PE2, instructing PE2 to withdraw the bindings between multicast entries and the S-PMSI tunnel.
3. PE2: After receiving the S-PMSI A-D route, PE2 withdraws the bindings between multicast entries and the S-PMSI tunnel, and then replies to PE1 with a Leaf A-D route.
4. PE1: PE1 deletes the S-PMSI tunnel after waiting for a specified period of time.
1. PE1 must have ASBR2's End.BIER SID manually set as the next hop for the multicast packets destined
for PE2 (with BFR-ID 2) and PE3 (with BFR-ID 3).
2. PE1 generates a static BIRT based on the preceding configuration. The static BIRT contains only the
BFR-IDs and the specified next-hop End.BIER SID, without BFR-prefixes.
3. PE1 generates a BIFT based on the static BIRT according to the standard BIERv6 BIFT generation
process.
4. Because the BFR-neighbor in the BIFT is ASBR2, PE1 encapsulates the End.BIER SID of ASBR2 as the
destination address of packet copies and forwards these copies based on the BitString and BIFT
according to the standard BIERv6 forwarding process.
5. After receiving the packet copies, ASBR1 reads their destination address and forwards them to ASBR2
according to the native IPv6 forwarding process.
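The static BIRT-to-BIFT derivation described in steps 2 and 3 can be sketched as follows, assuming a static BIRT that maps each destination BFR-ID to a manually specified next-hop End.BIER SID (no BFR-prefixes). Entries sharing the same next hop merge into one BIFT entry with a combined bitmask; all names are illustrative.

```python
# Illustrative sketch: derive a BIFT-like map from a static BIRT.
# static_birt: {bfr_id: next_hop_end_bier_sid}; returns
# {next_hop_end_bier_sid: forwarding_bitmask}.
def birt_to_bift(static_birt):
    bift = {}
    for bfr_id, sid in static_birt.items():
        # bit (bfr_id - 1) is served via this next-hop End.BIER SID
        bift[sid] = bift.get(sid, 0) | (1 << (bfr_id - 1))
    return bift
```

In the scenario above, PE2 (BFR-ID 2) and PE3 (BFR-ID 3) both resolve to ASBR2's End.BIER SID, so a single packet copy carries both bits toward ASBR2.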
If some BIERv6-incapable nodes exist on a single-AS BIERv6 network, the BIERv6-capable nodes can generate
forwarding entries through the underlay. Multicast packets can automatically traverse the BIERv6-incapable
nodes as native IPv6 packets, requiring no additional configurations.
Figure 3 shows the forwarding process, in which P2 is BIERv6-incapable, and its upstream node (P1) is
BIERv6-capable. This example assumes that BFIR-to-BFER is the downstream direction.
1. P1 and all other BIERv6-capable nodes generate their BIRTs by flooding packets with the TLVs defined
in IS-ISv6 for BIERv6. The next hop from P1 to the node with BFR-ID 2 (PE2) is P2. Because P2 is
BIERv6-incapable, P1 converts the BFR-neighbor to PE2 into the indirectly connected BFR-neighbor PE2
(destination node). Similarly, P1 converts the BFR-neighbor to PE3 into the indirectly connected BFR-
neighbor PE3.
2. P1 generates a BIFT based on the BIRT according to the standard BIERv6 BIFT generation process.
3. Because the BFR-neighbors (PE2 and PE3) in the BIFT are indirectly connected to P1, P1 encapsulates
the End.BIER SID of PE2 or PE3 as the destination address of each packet copy based on the BIFT. P1
then forwards the packet copies based on the BitString and BIFT according to the standard BIERv6
forwarding process.
4. After receiving the two packet copies, P2 reads their destination addresses and forwards them
according to the native IPv6 forwarding process: one copy to PE2, and the other to PE3.
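The BFR-neighbor conversion in step 1 can be sketched as a small resolution rule, under the assumption that per-node BIERv6 capability is known from the IGP flooding described above; names are illustrative.

```python
# Illustrative sketch of intra-AS automatic traversal: if the IGP next hop
# toward a BFER is BIERv6-incapable, the BFER itself becomes the (indirectly
# connected) BFR neighbor, so the copy is addressed to the BFER's End.BIER
# SID and crosses the incapable node as a native IPv6 packet.
def resolve_bfr_neighbor(bfer, next_hop, bierv6_capable):
    """Return the node whose End.BIER SID becomes the copy's destination."""
    return next_hop if bierv6_capable(next_hop) else bfer
```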
Solution Overview
In an MVPN over BIERv6 scenario, if a node or link fails, multicast services can be restored only after BGP
peer convergence is complete. However, such convergence takes a long time and therefore cannot meet the
high reliability requirements of multicast services. To speed up convergence, you can use BFD for BGP. You
can also deploy the dual-root 1+1 protection solution, which further improves the performance of multicast
service convergence.
The networking shown in Figure 1 is used as an example to describe how the MVPN over BIERv6 dual-root
1+1 protection solution is deployed.
1. Two sender PEs (Root1 and Root2) are deployed. Root1 and Root2 each set up a PMSI tunnel with
themselves as the BFIRs. PE1 is a leaf node on the two tunnels.
2. VPN fast reroute (FRR) is configured on PE1 so that PE1 has two routes to the same multicast source.
In this example, PE1 selects the route advertised by Root1 as the primary route, and the one
advertised by Root2 as the backup route.
3. Flow detection-based C-multicast FRR is configured on PE1. When the links function correctly, both
the primary and backup tunnels forward the same multicast traffic. In this case, PE1 accepts the traffic
received through the primary tunnel (Root1 is the BFIR) and discards the traffic received through the
backup tunnel (Root2 is the BFIR).
1. For the multicast traffic PE1 receives from the multicast source through the primary tunnel, PE1
forwards it to the corresponding VPN. PE1 discards the multicast traffic it receives through the backup
tunnel.
2. If PE1 detects an interruption of traffic received through the primary tunnel, it immediately checks
whether the traffic received through the backup tunnel is normal. If the traffic received through the
backup tunnel is normal, PE1 performs a primary/backup tunnel switchover and forwards the traffic
received through the original backup tunnel to the corresponding VPN.
To ensure that failures do not lead to prolonged interruption of multicast services, you are advised to plan
separate paths for the primary and backup tunnels.
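Flow detection-based C-multicast FRR as described above can be sketched as a small state machine: traffic from the backup tunnel is discarded while the primary tunnel is healthy, and a switchover occurs only when the primary flow is interrupted while the backup flow is normal. Class and method names are illustrative, not device internals.

```python
# Illustrative sketch of dual-root 1+1 protection on the leaf PE (PE1).
class CMulticastFrr:
    def __init__(self):
        self.active = "primary"  # traffic accepted from Root1's tunnel

    def on_flow_state(self, primary_ok, backup_ok):
        """Switch over only if the primary flow stops and the backup is normal."""
        if self.active == "primary" and not primary_ok and backup_ok:
            self.active = "backup"
        return self.active

    def accept(self, tunnel):
        """Forward only traffic arriving on the currently active tunnel."""
        return tunnel == self.active
```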
• BIERv6 BIRT
• BIERv6 traffic statistics, including the number of forwarded packets, the number of bytes in packets, inbound interface information, and the traffic rate in each 15-second interval.
BIERv6 Ping
The BIERv6 ping function is used to check the connectivity of the BIERv6 network and check whether the
network between the BFIR and one or more BFERs is normal. The process is described as follows:
1. A BIERv6 ping test is initiated on the BFIR, with BFR-IDs of one or more BFERs specified as a
parameter. In this case, the BFIR constructs a BIERv6 ping request packet, sets the End.BIER SID of the
next-hop device to the outer IPv6 destination address of the packet, and encapsulates the BFR-IDs of
the BFERs as a BitString into the inner BIERv6 packet header.
2. The BIERv6 ping request packet is forwarded on the network according to the BIERv6 forwarding
process.
3. After receiving the ping request packet, the BFERs each respond to the BFIR with a reply packet.
4. The BFIR summarizes the reply packets and displays information, including the reachability of a BFER
and performance indicators such as the packet loss rate and delay. If no reply packet is received from
a BFER within the timeout period, the result of the BFER's network connectivity check is Timeout.
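The BFIR-side summary in step 4 can be sketched as follows; the data shapes (a dict of per-BFER round-trip times) and the result strings are illustrative assumptions.

```python
# Illustrative sketch of BIERv6 ping result summarization on the BFIR:
# each targeted BFER is reported reachable with its RTT, or "Timeout" if no
# reply arrived within the timeout period.
def summarize_ping(targets, replies, timeout_ms):
    """targets: BFR-IDs pinged; replies: {bfr_id: rtt_ms}."""
    results = {}
    for bfr_id in targets:
        rtt = replies.get(bfr_id)
        if rtt is None or rtt > timeout_ms:
            results[bfr_id] = "Timeout"
        else:
            results[bfr_id] = f"reachable, rtt {rtt} ms"
    return results
```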
Service Description
IP video traffic accounts for about 80% of the traffic on an IP network and comprises traffic for live TV (news, sports events, movies, TV series, and live webcasting), VOD, video surveillance, and other types of video broadcasts. Currently, IPTV services consist mainly of live TV and VOD services, lacking rich value-added IPTV applications. With the advent of the 5G and cloud era, there will be explosive growth in video transmission service applications such as 4K IPTV, 8K VR, smart city, smart home, autonomous driving, telemedicine, and safe city, especially in countries and regions with fast economic development.
MVPN over BIERv6 can be deployed on the public network to carry IPTV traffic. In addition to greatly
reducing the network load and improving user experience (for example, delivering fast VOD, clear images,
and smooth playback), BIERv6 multicast technology also simplifies deployment, O&M, and capacity
expansion. This makes BIERv6 an ideal choice for large-scale deployment.
Network Description
In Figure 1, MVPN over BIERv6 is deployed on the carrier's IP backbone network, PIM or BIERv6 inter-AS
static traversal is deployed on the IP metro network, and PIM is deployed in the VPN where the IPTV video
source resides.
Figure 1 Networking of MVPN over BIERv6 for IPTV and MVPN services
Feature Deployment
When BIERv6 is used to carry multicast VPN services, the following features need to be deployed:
■ Deploy MVPN so that MVPN A-D routes are used to establish BIERv6 PMSI tunnels and C-multicast
routes are used to transmit PIM Join/Prune messages received from a VPN.
Terms
Term Definition
BIERv6 sub-domain Sub-domain of a BIERv6 domain. A BIERv6 domain can be divided into multiple
BIERv6 sub-domains.
BFR-ID ID of a BFR.
(C-S, C-G) PIM routing entry. C, S, and G are short for customer, multicast source, and
multicast group, respectively. (C-S, C-G) multicast traffic is sent to all hosts that
have joined the C-G multicast group and request data sent from the multicast
source address C-S.
(C-*, C-G) PIM routing entry. The asterisk (*) indicates any source, and C and G are short for
customer and multicast group. (C-*, C-G) multicast traffic is sent to all hosts that
have joined the C-G multicast group and have no requirements for a specific
multicast source address.
BFR-ID: BFR-identifier
Definition
MLD manages IPv6 multicast members. MLD sets up and maintains member relationships between IPv6
hosts and the multicast Router to which the hosts are directly connected.
MLD has two versions: MLDv1 and MLDv2. Both MLD versions support the ASM model. MLDv2 supports the
SSM model independently, while MLDv1 needs to work with SSM mapping to support the SSM model.
MLD applies to IPv6 and provides functions similar to those of IGMP for IPv4. MLDv1 is similar to IGMPv2, and MLDv2 is similar to IGMPv3.
Some features of MLD and IGMP are implemented in the same manner. The following common features of MLD and IGMP are therefore not described here:
• MLD Prompt-Leave
• MLD static-group
• MLD group-policy
This section describes MLD principles and unique features of MLD, including the MLD querier election
mechanism and MLD group compatibility.
Configuring an ACL filtering rule is mandatory for source address-based MLD message filtering, but optional for source address-based IGMP message filtering.
Purpose
MLD allows hosts to dynamically join IPv6 multicast groups and manages multicast group members. MLD is
configured on the multicast Router to which hosts are directly connected.
• When a host (for example, Host A) receives a Multicast Listener Query message of group G, the
processing flow is as follows:
If Host A is already a member of group G, Host A replies with a Multicast Listener Report message of
group G at a random time point within the response period specified by Device A.
After receiving the Multicast Listener Report message, Device A records information about group G and
forwards the multicast data to the network segment of the host interface that is directly connected to
Device A. Meanwhile, Device A starts a timer for group G or resets the timer if it has been started. If no
members of group G respond to Device A within the interval specified by the timer, Device A stops
forwarding the multicast data of group G.
If Host A is not a member of any multicast group, Host A does not respond to the Multicast Listener
Query message from Device A.
• When a host (for example, Host A) joins a multicast group G, the processing flow is as follows:
Host A sends a Multicast Listener Report message of group G to Device A, instructing Device A to
update its multicast group information. Subsequent Multicast Listener Report messages of group G are
triggered by Multicast Listener Query messages sent by Device A.
• When a host (for example, Host A) leaves a multicast group G, the processing flow is as follows:
Host A sends a Multicast Listener Done message of group G to Device A. After receiving the Multicast
Listener Done message, Device A triggers a query on group G to check whether group G has other
receivers. If Device A does not receive Multicast Listener Report messages of group G within the period
specified by the query message, Device A deletes the information about group G, and stops forwarding
the multicast traffic of group G.
Both MLD queriers and non-queriers can process Multicast Listener Report messages, while only queriers are responsible
for forwarding Multicast Listener Query messages. MLD non-queriers cannot process Multicast Listener Done messages
of MLDv1.
• MODE_IS_INCLUDE: indicates that the corresponding mode between a group and its source list is
Include. That is, hosts receive the data sent by a source in the source-specific list to the group.
• MODE_IS_EXCLUDE: indicates that the corresponding mode between a group and its source list is
Exclude. That is, hosts receive the data sent by a source that is not in the source-specific list to the
group.
• CHANGE_TO_INCLUDE_MODE: indicates that the corresponding mode between a group and its source
list changes from Exclude to Include. If the source-specific list is empty, the hosts leave the group.
• CHANGE_TO_EXCLUDE_MODE: indicates that the corresponding mode between a group and its source
list changes from Include to Exclude.
• ALLOW_NEW_SOURCES: indicates that a host still wants to receive data from certain multicast sources.
If the current relationship is Include, certain sources are added to the current source list. If the current
relationship is Exclude, certain sources are deleted from the current source list.
• BLOCK_OLD_SOURCES: indicates that a host does not want to receive data from certain multicast
sources any longer. If the current relationship is Include, certain sources are deleted from the current
source list. If the current relationship is Exclude, certain sources are added to the current source list.
On the Router side, the querier sends Multicast Listener Query messages and receives Multicast Listener Report messages. In this manner, the Router can identify which multicast groups on the network segment contain receivers, and then forwards the multicast data to the network segment accordingly. In MLDv2, records of multicast groups can be filtered in either Include mode or Exclude mode.
• In Include mode:
■ The multicast source in the activated state requires the Router to forward its data.
■ The multicast source in the deactivated state is deleted by the Router, and data forwarding for this multicast source stops.
• In Exclude mode:
■ The multicast source in the activated state is in the collision domain. That is, no matter whether
hosts on the same network segment of the Router interface require the data of the multicast
source, the data is forwarded.
■ Data of the multicast source that is not recorded in the multicast group should be forwarded.
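The Include/Exclude forwarding behavior above reduces to a simple membership test, sketched below with illustrative names (the mode strings are assumptions for the example):

```python
# Illustrative sketch of the MLDv2 filter-mode rule: in Include mode,
# forward only sources on the list; in Exclude mode, forward any source
# that is not on the list.
def should_forward(mode, source, source_list):
    if mode == "include":
        return source in source_list
    return source not in source_list  # exclude mode
```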
• Querier
A querier is responsible for sending Multicast Listener Query messages to hosts and receiving Multicast
Listener Report and Multicast Listener Done messages from hosts. A querier can then learn which
multicast group has receivers on a specified network segment.
• Non-querier
A non-querier only receives Multicast Listener Report messages from hosts to learn which multicast
group has receivers. Then, based on the querier's action, the non-querier identifies which receivers leave
multicast groups.
Generally, a network segment has only one querier. Multicast devices follow the same principle to select a
querier. The process is as follows (using DeviceA, DeviceB, and DeviceC as examples):
• After MLD is enabled on DeviceA, DeviceA considers itself a querier in the startup process by default
and sends Multicast Listener Query messages on the network segment. If DeviceA receives a Multicast
Listener Query message from DeviceB that has a lower link-local address, DeviceA changes from a
querier to a non-querier. DeviceA starts the another-querier-existing timer and records DeviceB as the
querier of the network segment.
• If DeviceA is a non-querier and receives a Multicast Listener Query message from DeviceB in the querier state, DeviceA updates the another-querier-existing timer. If the received Multicast Listener Query message is sent from DeviceC, whose link-local address is lower than that of DeviceB in the querier state, DeviceA records DeviceC as the querier of the network segment and updates the another-querier-existing timer.
• If DeviceA is a non-querier and the another-querier-existing timer expires, DeviceA changes to a querier.
In this document version, querier election can be implemented only among multicast devices that run the same MLD
version on a network segment.
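The election rule above (the router with the lowest link-local address becomes the querier) can be sketched with standard-library address comparison; this ignores timers and version checks and is purely illustrative.

```python
# Illustrative sketch of MLD querier election: addresses are compared as
# IPv6 values, and the lowest link-local address wins.
import ipaddress

def elect_querier(link_local_addresses):
    return min(link_local_addresses, key=ipaddress.IPv6Address)
```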
Background
When a multicast device is directly connected to user hosts, the multicast device sends MLD Query messages
to and receives MLD Report and Done messages from the user hosts to identify the multicast groups that
have attached receivers on the shared network segment.
The device directly connected to a multicast device, however, may not be a host but an MLD proxy-capable
access device to which hosts are connected. If you configure only MLD on the multicast device, access device,
and hosts, the multicast and access devices need to exchange a large number of packets.
To resolve this problem, enable MLD on-demand on the multicast device. The multicast device sends only
one general query message to the access device. After receiving the general query message, the access
device sends the collected Join and Leave status of multicast groups to the multicast device. The multicast
device uses the Join and Leave status of the multicast groups to maintain multicast group memberships on
the local network segment.
Benefits
MLD on-demand reduces packet exchanges between a multicast device and its connected access device and
reduces the loads of these devices.
Related Concepts
MLD on-demand
MLD on-demand enables a multicast device to send only one MLD general query message to its connected access device (MLD proxy-capable) and to use the Join/Leave status of multicast groups reported by the access device to maintain multicast group memberships on the local network segment.
Implementation
When a multicast device is directly connected to user hosts, the multicast device sends MLD Query messages to and receives MLD Report and Done messages from the user hosts to identify the multicast groups that have attached receivers on the shared network segment. The device directly connected to the multicast device, however, may not be a host but an MLD proxy-capable access device, as shown in Figure 1.
The provider edge (PE) is a multicast device. The customer edge (CE) is an access device.
• On the network a shown in Figure 1, if MLD on-demand is not enabled on the PE, the PE sends a large
number of MLD Query messages to the CE, and the CE sends a large number of Report and Done
messages to the PE. As a result, lots of PE and CE resources are consumed.
• On the network b shown in Figure 1, after MLD on-demand is enabled on the PE, the PE sends only one
general query message to the CE. After receiving the general query message from the PE, the CE sends
the collected Join and Leave status of MLD groups to the PE. The CE sends a Report or Done message
for a group to the PE only when the Join or Leave status of the group changes. To be specific, the CE
sends an MLD Report message for a multicast group to the PE only when the first user joins the
multicast group and sends a Done message only when the last user leaves the multicast group.
After you enable MLD on-demand on a multicast device connected to an MLD proxy-capable access device, the multicast device implements MLD differently from standard MLD in the following aspects:
• The records on dynamically joined multicast groups on the multicast device interface connected to the access
device do not time out.
• The multicast device interface connected to the access device sends only one MLD general query message to the
access device.
• The multicast device interface connected to the access device directly deletes the entry for a group after it receives
an MLD Done message for the group.
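The first-join/last-leave reporting rule above can be sketched as a tiny proxy state table; the class, return values, and group addresses are illustrative assumptions.

```python
# Illustrative sketch of MLD on-demand reporting on the access device:
# a Report is sent only when the first user joins a group, and a Done only
# when the last user leaves it.
class OnDemandProxy:
    def __init__(self):
        self.members = {}  # group -> set of users

    def join(self, group, user):
        first = not self.members.get(group)
        self.members.setdefault(group, set()).add(user)
        return "report" if first else None

    def leave(self, group, user):
        users = self.members.get(group, set())
        users.discard(user)
        if not users:
            self.members.pop(group, None)
            return "done"
        return None
```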
MLDv1: An MLDv1 message contains multicast group information, but does not contain multicast source information. MLDv2: An MLDv2 message contains both multicast group and source information. Advantage of MLDv2: MLDv2 allows hosts to select multicast sources, while MLDv1 does not.
MLDv1: An MLDv1 message contains the record of only one multicast group. MLDv2: An MLDv2 message contains records of multiple multicast groups. Advantage of MLDv2: MLDv2 reduces the number of MLD messages on a network segment.
MLDv1: Multicast Listener Query messages of a specified multicast group cannot be retransmitted. MLDv2: Multicast Listener Query messages of a specified multicast group and Multicast Listener Query messages of a specified multicast source/group can be retransmitted. Advantage of MLDv2: MLDv2 ensures better multicast information consistency between a non-querier and a querier.
HostA is a receiver of leaf network N1, and HostC is a receiver of leaf network N2. DeviceA, DeviceB, and
DeviceC are directly connected to hosts. MLDv1 is configured on Port 1 of DeviceA. That is, leaf network N1
runs MLDv1. MLDv2 is configured on Port 1 of DeviceB and DeviceC. That is, leaf network N2 runs MLDv2.
MLD running on the multicast devices on the same network segment must be of the same version.
Definition
User-side multicast enables a BRAS to identify users of a multicast program and helps carriers better
manage and control online users.
In Figure 1, when the set top box (STB) and phone users go online, they send Multicast Listener Discovery
(MLD) Report messages of a multicast program to the BRAS. After receiving the messages, the BRAS
identifies the users and sends a Protocol Independent Multicast (PIM) Join message to the network-side
rendezvous point (RP) or the source's designated router (DR). The RP or source's DR creates multicast
forwarding entries for the users and receives the required multicast traffic from the source. The BRAS finally
sends the multicast traffic to the STB and phone users based on their forwarding entries and replication
modes. The multicast replication in this example is based on sessions.
User-side multicast supports both IPv4 and IPv6. For IPv4 users, user-side multicast applies to both private and public networks. For IPv6 users, user-side multicast applies only to the public network.
On Layer 2, user-side multicast supports the PPPoE and IPoE access modes for common users and the IPoE access mode
for Layer 2 leased line users.
Objective
Because conventional multicast does not provide a method to identify users, carriers cannot effectively
manage multicast users who access services such as Internet Protocol television (IPTV). Such users can join
multicast groups, without notification, by sending Internet Group Management Protocol (IGMP) Report
messages. To identify these users and allow for improved management of them, Huawei provides the user-
side multicast feature.
With user-side multicast, the BRAS can identify users in a multicast group and implement refined user
service control and management.
Benefits
User-side multicast can identify users and the programs they join or leave for carriers to better manage and
control online users.
11.13.2.1 Overview
Table 1 describes multicast service processes.
Related Concepts
Multicast program
A multicast program is an IPTV channel or program and is identified by a multicast source address and a
multicast group.
Access mode
In user-side multicast, only the Point-to-Point Protocol over Ethernet (PPPoE) access and IP over Ethernet
(IPoE) access modes are supported, and only session-based replication is supported.
• PPPoE access mode: allows a remote access device to provide access services for hosts on Ethernet
networks and to implement user access control and accounting. PPPoE is a link layer protocol that
transmits PPP datagrams through PPP sessions established over point-to-point connections on Ethernet
networks.
• IPoE access mode: allows the BRAS to perform authentication and authorization on users and user
services based on the physical or logical user information, such as the MAC address, VLAN ID, and
Option 82, carried in IPoE packets. In IPv4 network access where a user terminal connects to an
Ethernet interface of a BRAS through a Layer 2 device, the user IP packets are encapsulated into IPoE
packets by the user Ethernet interface before they are transmitted to the BRAS through the Layer 2
device.
Table 2 Differences between PPPoE access users and IPoE access users in user-side multicast
• PPPoE access mode
■ Encapsulation: Multicast traffic and IGMP messages exchanged between a user and a BRAS are encapsulated using PPPoE.
■ Characteristics: IGMP messages exchanged between a user and a BRAS are all unicast messages. Multicast traffic that a BRAS replicates to a user is sent in unicast PPPoE packets.
■ Restriction: Multicast replication by interface + VLAN is not supported for the PPPoE access mode.
Table 3 describes the multicast traffic replication modes on BAS interfaces of BRAS devices.
• Session-based multicast replication
■ Replication point: BRAS. The BRAS is used as the multicast replication point.
■ Replication behavior: The BRAS replicates multicast traffic to each session.
■ Usage scenario: The downstream Layer 2 device of the BRAS is not capable of IGMP snooping.
■ Benefit: Users who fail in authentication cannot join multicast programs, which allows for refined user management and control.
• Multicast replication by interface + VLAN
■ Replication point: BRAS' downstream Layer 2 device. This device is capable of IGMP snooping; in other words, it is capable of multicast replication.
■ Replication behavior: The BRAS replicates multicast traffic by interface + VLAN to users aggregated based on their VLANs. For users on the same VLAN who go online through the same interface and join the same multicast program, the BRAS replicates only one copy of the multicast traffic to the downstream Layer 2 device. Then the Layer 2 device replicates the multicast traffic to the users.
■ Usage scenario: IGMP Report messages carry VLAN tags, and multicast traffic forwarding across VLANs to users is not required.
■ Benefit: The burden on the BRAS to replicate multicast traffic is alleviated, and the bandwidth usage is reduced.
• Multicast replication by VLAN
■ Replication point: BRAS' downstream Layer 2 device.
■ Replication behavior: Users first join multicast VLANs, and then the BRAS replicates multicast traffic by VLAN.
■ Usage scenario: IGMP Report messages carry VLAN tags.
■ Benefit: The burden on the BRAS to replicate multicast traffic is alleviated, and the bandwidth usage is reduced.
• Replication by interface
■ Replication point: BRAS' downstream Layer 2 device. This device is capable of IGMP snooping; in other words, it is capable of multicast replication.
■ Replication behavior: The BRAS replicates multicast traffic based on interfaces, and the downstream Layer 2 device replicates the received multicast traffic based on sessions. This mode is a special case of multicast replication by VLAN, enabled by setting the VLAN value to 0.
■ Usage scenario: By default, multicast replication by interface is enabled.
■ Benefit: The burden on the BRAS to replicate multicast traffic is alleviated, and the bandwidth usage is reduced.
If all of the preceding multicast replication modes are configured, the priority is as follows in descending order:
replication by interface + VLAN, session-based replication, replication by multicast VLAN, and replication by interface.
In addition to replicating multicast data packets, the BRAS sends IGMP Query messages based on the preceding
multicast replication modes.
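The mode-priority rule above can be sketched as follows. This is a hypothetical Python illustration, not NE40E code; the mode labels are invented for this example.

```python
# Hypothetical sketch of how a BRAS might pick the effective replication mode
# when several are configured, following the priority order stated above.
PRIORITY = [
    "interface+vlan",   # replication by interface + VLAN (highest priority)
    "session",          # session-based replication
    "multicast-vlan",   # replication by multicast VLAN
    "interface",        # replication by interface (default, lowest priority)
]

def effective_mode(configured):
    """Return the highest-priority mode among the configured ones."""
    for mode in PRIORITY:
        if mode in configured:
            return mode
    return "interface"  # enabled by default when nothing else is configured

print(effective_mode({"session", "multicast-vlan"}))  # -> session
```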
Session-based multicast replication is used in the following illustration of the multicast program join process. Multicast
program join processes of other multicast replication modes are similar to that of session-based multicast replication.
Accessing the Internet through PPPoE or IPoE is a prerequisite for users to join multicast programs. Figure 2
illustrates the procedures of multicast program join, and Table 1 describes each procedure.
STB To join a multicast program after going online, an STB sends to an IGMP-
capable BRAS an IGMP Report message of a multicast program. Upon
receipt of the message, the BRAS identifies the user and the multicast
program that the user wants to join.
BRAS The BRAS creates a multicast forwarding entry for the STB. In this entry,
the downstream interface is the interface that connects to the STB. If it is
the first time that a BRAS creates a multicast forwarding entry for the
STB, the BRAS sends a Protocol Independent Multicast (PIM) Join
message to the network-side RP or the source's DR.
RP/source's DR After receiving the PIM Join message, the RP or the source's DR generates
a multicast forwarding entry for the STB. In this entry, the downstream
interface is the interface that receives the PIM Join message. Then, the
STB successfully joins the multicast group, and the RP or source's DR can
send the multicast traffic to the STB.
Source The multicast source sends multicast traffic to the RP or the source's DR.
BRAS The BRAS replicates the multicast traffic it receives to the STB by session
based on the multicast forwarding entry. The STB user can then watch
the program.
BRAS To determine whether any members remain in the multicast group, the
BRAS periodically sends an IGMP Query message to the STB. If no
members remain, the BRAS tears down the group.
STB Upon receipt of the IGMP Query message, the STB responds with an
IGMP Report message to keep the multicast program active.
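The join procedure above can be sketched as follows. This is a minimal, hypothetical illustration; the forwarding table and callback are invented, and real BRAS behavior involves many more checks (authentication, CAC, and so on).

```python
# Minimal sketch of the BRAS-side join handling described above.
forwarding = {}  # (source, group) -> set of subscriber session IDs

def handle_igmp_report(session_id, source, group, send_pim_join):
    key = (source, group)
    first = key not in forwarding
    forwarding.setdefault(key, set()).add(session_id)
    if first:
        # First forwarding entry for this program: request the traffic
        # from the network-side RP or the source's DR.
        send_pim_join(source, group)

joins = []
handle_igmp_report("sess1", "10.1.1.1", "225.0.0.1", lambda s, g: joins.append((s, g)))
handle_igmp_report("sess2", "10.1.1.1", "225.0.0.1", lambda s, g: joins.append((s, g)))
# The PIM Join is sent only once, when the first subscriber joins.
```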
Session-based multicast replication is used in the following illustration of the multicast program leave process. Multicast
program leave processes of other multicast replication modes are similar to that of session-based multicast replication.
BRAS The BRAS sends an IGMP Query message to members in the multicast
group specified in the IGMP Leave message it received. (If IGMP Prompt-
Leave is configured, this step is skipped.)
BRAS The BRAS deletes the multicast forwarding entry of the STB user only if
there are other members in the same multicast group. (If IGMP Prompt-
Leave is configured, this step is skipped.)
NOTE:
If the STB user is not a member of any multicast group, the BRAS stops sending
IGMP Query messages to the STB user after the robustness variable value is
reached.
BRAS The BRAS stops sending to the STB the multicast traffic of the
corresponding multicast group it joined.
BRAS If there is no member in the multicast group after the STB user leaves, the
BRAS sends a PIM Prune message to the RP or source's DR to stop the
multicast traffic replication to the group.
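The leave procedure above can be sketched as follows. This is a self-contained, hypothetical illustration; the data structures are invented, and IGMP group-specific queries and timers are omitted.

```python
# Sketch of the BRAS-side leave handling described above.
forwarding = {("10.1.1.1", "225.0.0.1"): {"sess1", "sess2"}}
upstream_stops = []  # groups for which the BRAS asks upstream to stop traffic

def handle_leave(session_id, source, group):
    key = (source, group)
    members = forwarding.get(key, set())
    members.discard(session_id)  # stop replicating to this session
    if not members:
        # Last member left: remove the entry and signal the RP or the
        # source's DR to stop sending this group's traffic.
        forwarding.pop(key, None)
        upstream_stops.append(key)

handle_leave("sess1", "10.1.1.1", "225.0.0.1")  # another member remains
handle_leave("sess2", "10.1.1.1", "225.0.0.1")  # last member leaves
```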
Session-based multicast replication is used in the following illustration of a user leaving all multicast groups by going
offline. The corresponding processes for other multicast replication modes are similar to that of session-based
multicast replication.
Figure 1 Process of multicast program leave of all multicast groups by going offline
Table 1 Key actions in each step of multicast program leave of all multicast groups by going offline
STB When a PPPoE or IPoE STB user goes offline, the user leaves all the
multicast programs it joined without sending IGMP Leave messages.
BRAS The BRAS searches for the multicast programs that the STB user joined and
removes all multicast entries of the STB user.
BRAS The BRAS stops the multicast traffic replication to the STB user.
BRAS The BRAS stops periodically sending the IGMP Query message to the offline
STB user.
BRAS If the offline STB user was the only member of the multicast program it
joined on the BRAS, the BRAS sends a PIM Prune message to the rendezvous
point (RP) or the source's designated router (DR). Upon receipt of the
message, the multicast source determines that the multicast data of this
program is no longer required.
Overview
User-side call admission control (CAC) is a bandwidth management and control method used to guarantee
multicast service quality of online users.
A conventional quality-guarantee mechanism is to limit the maximum number of multicast groups that
users can join. With this mechanism, a BRAS checks whether the maximum number of multicast groups has
been exceeded after receiving a Join message from a user. If the maximum number has been exceeded, the
device drops the Join message and denies the user request. This mechanism alone, however, has become
inadequate due to the continuous increase in IPTV program varieties. A high upper limit allows the device
to accept more join requests but cannot prevent the device from dropping messages due to limited
bandwidth resources on interfaces.
User-side multicast CAC addresses these issues by enabling a BRAS to limit bandwidth for users.
User-side multicast CAC enables a BRAS to check the bandwidth limit and deny user requests if the limit has
been exceeded.
User-side multicast CAC can be implemented for users in a specific domain and on a specific interface. It
works with the multicast group limit function to implement the following functions:
• User-level bandwidth limit: A bandwidth limit can be set for each user in a specific user access domain,
and new service requests of a user are denied when the bandwidth consumed by the user exceeds the
bandwidth limit.
• Interface-level bandwidth limit: A bandwidth limit can be set for a user access interface, and new service
requests are denied when the consumed bandwidth exceeds the bandwidth limit.
Principles
Figure 1 shows the working principles of user-side multicast CAC in a process of going online.
• The STB and phone users send IGMP Report messages to request multicast services.
• The BRAS receives the IGMP Report messages and checks the bandwidth limits configured for the user
access domain and interface.
■ If the remaining bandwidth resources are insufficient for the users, the BRAS discards the IGMP
Report message and denies the service requests.
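The admission check above can be sketched as follows, assuming bandwidth is tracked in a single unit (for example, kbit/s) and that both a per-user (domain) limit and a per-interface limit apply; all names are hypothetical.

```python
# Hypothetical sketch of user-side multicast CAC admission: a join request is
# admitted only if both the per-user (domain) limit and the per-interface
# limit leave enough room for the requested channel.
def admit(channel_bw, user_used, user_limit, intf_used, intf_limit):
    return (user_used + channel_bw <= user_limit and
            intf_used + channel_bw <= intf_limit)

# Admitted: both limits leave room for an 8-unit channel.
assert admit(8, user_used=10, user_limit=20, intf_used=50, intf_limit=100)
# Denied: the user-level limit would be exceeded.
assert not admit(8, user_used=15, user_limit=20, intf_used=50, intf_limit=100)
```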
Purpose
Limiting the maximum number of multicast groups can no longer guarantee service quality due to the
increasing variety of IPTV services and the large differences in bandwidth requirements among multicast channels.
Therefore, user-side multicast CAC was introduced to prevent bandwidth resources from being exhausted,
thus guaranteeing the IPTV service quality of online users.
Benefits
User-side multicast CAC brings the following benefits:
• Allows new user requests to be denied when a large number of multicast channels are requested and bandwidth
resources are insufficient.
Service Description
Because conventional multicast does not provide a method to identify users, carriers cannot effectively
manage multicast users who access services such as Internet Protocol television (IPTV). Such users can join
multicast groups, without notification, by sending Internet Group Management Protocol (IGMP) Report
messages.
To identify these users and allow for improved management of them, Huawei provides the user-side
multicast feature.
Networking Description
In Figure 1, a set top box (STB) user initiates a dial-up connection through Point-to-Point Protocol over
Ethernet (PPPoE) to the broadband remote access server (BRAS). The BRAS then assigns an IPv4 address to
the user for Internet access. To join a multicast program, the user sends an IGMP Report message to the
BRAS, and the BRAS creates a multicast forwarding entry for the user. In this entry, the downstream
interface is the interface that connects to the user. After the entry is created, the BRAS sends a Protocol
Independent Multicast (PIM) Join message to the network-side rendezvous point (RP) or the source's
designated router (DR). Upon receipt of this message, the RP or source's DR sends to the BRAS the multicast
traffic of the program that the user wants to join. The BRAS then replicates and sends the multicast traffic to
the user based on the multicast forwarding entry.
Feature Deployment
Deployment for the user-side multicast feature is as follows:
• Configure an IPv4 address pool on the BRAS to assign IPv4 addresses to online users.
1. Create a virtual template (VT).
2. Bind the VT to an interface.
3. Bind the sub-interface to the virtual local area network (VLAN) if users are connected to the sub-
interface. (For users connected to the main interface, skip this step.)
4. Configure a broadband access server (BAS) interface and specify a user access type for the
interface. (The BAS interface can be a main interface, a common sub-interface, or a QinQ sub-
interface.)
• Configure basic multicast functions on the BRAS and on the RP or source's DR.
1. Enable IP multicast routing.
2. Enable Protocol Independent Multicast-Sparse Mode (PIM-SM) on BRAS interfaces and on the RP
or source's DR interfaces.
• Configure a multicast replication mode on a BAS interface. By default, multicast replication by interface
is configured. You can choose to configure one of the following multicast replication modes:
Service Description
Because conventional multicast does not provide a method to identify users, carriers cannot effectively
manage multicast users who access services such as Internet Protocol television (IPTV). Such users can join
multicast groups, without notification, by sending Internet Group Management Protocol (IGMP) Report
messages.
To identify these users and allow for improved management of them, Huawei provides the user-side
multicast feature.
Networking Description
In Figure 1, a set top box (STB) user connects to the BRAS through IPoE. (Using IPoE, the user does not need
to initiate a dial-up connection, and so no client software is required.) The BRAS then assigns an IPv4
address to the user for Internet access. To join a multicast program, the user sends an IGMP Report message
to the BRAS. The BRAS then creates a multicast forwarding entry and establishes an outbound interface for
the user. After the entry is created, the BRAS sends a PIM Join message to the network-side RP or the
source's DR. Upon receipt of this message, the RP or source's DR sends to the BRAS the multicast data of the
program that the user wants to join. The BRAS then replicates and sends the multicast data to the user
based on the multicast forwarding entry.
Feature Deployment
Deployment for the user-side multicast feature is as follows:
• Configure an IPv4 address pool on the BRAS to assign IPv4 addresses to online users.
2. Bind the sub-interface to the virtual local area network (VLAN) if users are connected to the sub-
interface. (For users connected to the main interface, skip this step.)
3. Configure a broadband access server (BAS) interface and specify a user access type for the
interface. (The BAS interface can be a main interface, a common sub-interface, or a QinQ sub-
interface.)
• Configure basic multicast functions on the BRAS and on the RP or source's DR.
1. Enable IP multicast routing.
2. Enable Protocol Independent Multicast-Sparse Mode (PIM-SM) on BRAS interfaces and on the RP
or source's DR interfaces.
• Configure a multicast replication mode on a BAS interface. By default, multicast replication by interface
is configured. You can choose to configure one of the following multicast replication modes:
Service Overview
User-side multicast VPN enables a BRAS to identify users of a multicast program, which allows for improved
management of them.
Networking Description
As shown in Figure 1, the STB user and the multicast source belong to the same VPN instance, which is a
prerequisite for users to join programs of the multicast source on the VPN that they belong to. To join a
multicast program after accessing the Layer 3 VPN, the STB user sends an IGMP Report message to the
BRAS. Upon receipt of the IGMP Report message, the BRAS identifies the domain and private VPN instance
of the STB user. Then the BRAS creates the multicast entry for the STB user in the corresponding VPN
instance and sends the PIM Join message to the network-side multicast source or RP for the multicast traffic.
As the final step, the BRAS replicates the multicast traffic to the STB user based on different multicast
replication modes.
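The per-VPN behavior above can be sketched as follows. This is a hypothetical illustration: the BRAS keys multicast forwarding entries by VPN instance in addition to (source, group), so users in different VPNs joining the same group get separate entries. All names are invented.

```python
# Sketch of per-VPN multicast entry creation on the BRAS.
entries = {}  # (vpn, source, group) -> set of subscriber session IDs

def join(vpn, source, group, session_id):
    entries.setdefault((vpn, source, group), set()).add(session_id)

join("vpnA", "10.1.1.1", "225.0.0.1", "stb1")
join("vpnB", "10.1.1.1", "225.0.0.1", "stb2")  # same program, different VPN
# Two separate entries now exist, one per VPN instance.
```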
Feature Deployment
Deployment for the user-side multicast VPN is as follows:
• Configure an IPv4 address pool on the BRAS to assign IPv4 addresses to online users.
• Configure a multicast replication mode on a BAS interface. By default, multicast replication by interface
is configured. You can choose to configure one of the following multicast replication modes:
• Bind a VPN instance of the specified multicast service to the main interface on a BRAS.
Definition
Multicast network address translation (NAT) translates the source IP address, destination IP address, and
destination port number (subsequently referred to as characteristics) in multicast streams. Multicast NAT
allows you to configure traffic policies on inbound interfaces to match input multicast streams. It also allows
you to configure translation rules on outbound interfaces, so that a multicast stream can be replicated to
multiple outbound interfaces and multicast stream characteristics can be modified according to the rules. On
the network shown in Figure 1, after multicast NAT is deployed on DeviceB, DeviceB performs the following
operations: uses a traffic policy to match the input multicast stream StreamIn, translates StreamIn's
characteristics, and outputs the post-translation multicast streams StreamOut1 and StreamOut2.
Purpose
On the network shown in Figure 1, users 1 and 2 receive the input multicast stream StreamIn from different
multicast groups. However, traditional multicast technologies cannot meet the requirement for sending the
same multicast stream to different multicast groups. To resolve this issue, deploy multicast NAT on DeviceB
so that DeviceB can translate StreamIn's characteristics and output the stream to users 1 and 2.
Benefits
Multicast NAT offers the following benefits:
• Multicast stream characteristics can be translated so that different downstream users can receive
multicast streams.
• The matrices of multicast streams can be conveniently switched to replace traditional serial digital
interface (SDI) switching matrices.
1. The multicast NAT device (DeviceB) translates the input multicast stream StreamIn into one or more
output multicast streams.
2. The characteristics of the output multicast streams can be the same as or different from those of the
input multicast stream.
NOTE:
1. Traffic policies are applied to the inbound interface (Interface1) to match the source MAC address,
source IP address, destination IP address, source UDP port number, and destination UDP port number
of StreamIn. The traffic behavior is to associate the stream with a multicast NAT instance. The
mapping between StreamIn and the multicast NAT instance is established based on the traffic policies.
2. You can configure a multicast stream translation rule on each outbound interface (Interface2 and
Interface3) for them to translate some characteristics of output streams and bind the streams to a
multicast NAT instance. The input and output multicast streams can be associated through a multicast
instance.
3. Each multicast NAT instance can be bound to multiple multicast NAT outbound interfaces. This allows
one input multicast stream to be replicated to multiple outbound interfaces. The characteristics of
output multicast streams may be the same as or different from those of the input multicast stream.
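The instance binding described in the note above can be sketched as follows. This is a hypothetical illustration: one input stream maps to a NAT instance, and each bound outbound interface may rewrite some of the stream's characteristics. All names are invented.

```python
# Sketch of a multicast NAT instance replicating one input stream to several
# outbound interfaces, each with its own translation rules.
def translate(stream, rules):
    out = dict(stream)
    out.update(rules)  # rewrite only the characteristics given in the rules
    return out

instance = {
    "input": {"src_ip": "10.1.1.1", "dst_ip": "232.0.0.1", "dst_port": 5004},
    "out_ifs": {
        "Interface2": {"dst_ip": "232.0.0.2"},  # translated copy
        "Interface3": {},                       # unchanged copy
    },
}

outputs = {ifname: translate(instance["input"], rules)
           for ifname, rules in instance["out_ifs"].items()}
# The one input stream is replicated to both interfaces; only Interface2's
# copy has a translated destination IP address.
```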
1. Input multicast stream 1 is input to the multicast NAT device (DeviceB) and is translated into output
multicast stream 1. The characteristics of the multicast stream may or may not change.
2. Input multicast stream 2 is input to the multicast NAT device (DeviceB) but is not translated into an
output multicast stream. DeviceB receives both input multicast streams 1 and 2, but outputs only
output multicast stream 1.
3. After receiving a clean switching instruction from the controller, DeviceB switches multicast stream 1
to multicast stream 2. During the switching, the media gateway uses the stream characteristics to verify
that no excess packets are received and no packet loss occurs. Receiver 1 does not detect erratic
display or frame freezing.
switching between video sources output by multiple cameras, or switching between video sources of multiple
programs. The basic requirements of video switching are frame precision, clean switching, frame
synchronization for output signals before and after switching, and no picture damage. Clean switching
ensures that no black screen, erratic display, or frame freezing occurs when the receive end receives traffic
during the switching of two video streams.
1. The SSRC field is used to identify a synchronization source. The source of an RTP packet is identified by
the 32-bit SSRC identifier in the RTP header so that it does not depend on the network address. All
packets from a synchronization source form part of the same timing and SN space, and a receiver
groups packets by synchronization source for playback. In clean switching, this field can be configured
to ensure that two synchronization sources have different SSRC identifiers. A receiver distinguishes the
sources based on the SSRC identifiers.
2. The SN field is used to identify the sequence number of an RTP packet sent by a sender. Each time a
packet is sent, the sequence number increases by 1. This field can be used to check packet loss. It can
also be used to reorder data if network jitter occurs. The pre- and post-switching SNs must be
consecutive. In clean switching, this field can be configured to check whether packet disorder or packet
loss occur during the switching.
3. The SN is a 16-bit field, so it overflows after reaching 65535. An EXT-SN can be used to resolve this
issue. The SN is equivalent to the lower 16 bits of a 32-bit integer, and the EXT-SN is equivalent to the
upper 16 bits of the 32-bit integer. Each time the SN wraps past 65535, the EXT-SN increases by 1. This
field can be configured to check whether an abnormal carry occurs on the SN. An abnormal carry is a
carry that occurs when the SN has not yet exceeded 65535.
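The extended sequence number logic above can be sketched as follows. This is an illustrative Python sketch, not device code; the function names are invented.

```python
# Sketch of the 32-bit extended sequence number described above: the RTP SN
# forms the lower 16 bits and the EXT-SN the upper 16 bits. A carry into the
# EXT-SN is normal only when the SN wraps past 65535.
def extended_sn(ext_sn, sn):
    return (ext_sn << 16) | sn

def carry_is_normal(prev_sn, prev_ext, cur_sn, cur_ext):
    if cur_ext == prev_ext:
        return True  # no carry occurred
    # A carry is normal only if the EXT-SN advanced by 1 and the SN wrapped.
    return cur_ext == prev_ext + 1 and cur_sn < prev_sn

assert extended_sn(1, 0) == 65536
assert carry_is_normal(65535, 0, 0, 1)      # SN wrapped: normal carry
assert not carry_is_normal(100, 0, 101, 1)  # carry without a wrap: abnormal
```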
In some scenarios, the system checks the RTP-SN, RTP-EXT-SN, and RTP-SSRC, or checks only some of the
fields during the switching. Which fields need to be checked depends on the video stream format and media
gateway.
[Table: Multicast Stream Characteristics. Columns: Input Multicast Stream 1, Input Multicast Stream 2, Output Multicast Stream 1, Output Multicast Stream 2]
1. Input multicast stream 1 is matched based on two-level traffic policies on interface 1. The level-1
traffic policy matches the source MAC address of input multicast stream 1, and a traffic behavior is
associated with the level-2 traffic policy. The level-2 traffic policy matches the source IP address,
destination IP address, and UDP port number of input multicast stream 1, and a traffic behavior is
associated with a multicast NAT instance. The mapping between input multicast stream 1 and the
multicast NAT instance is established using the two-level traffic policies.
2. Input multicast stream 2 is matched based on two-level traffic policies on interface 2. The level-1
traffic policy matches the source MAC address of input multicast stream 2, and a traffic behavior is
associated with the level-2 traffic policy. The level-2 traffic policy matches the source IP address,
destination IP address, and UDP port number of input multicast stream 2, and a traffic behavior is
associated with a multicast NAT instance. The mapping between input multicast stream 2 and the
multicast NAT instance is established using the two-level traffic policies.
3. Multicast stream translation rules are configured on a specified outbound interface (interface 3) to
translate (or leave unchanged) some characteristics of output multicast stream 1 and to bind interface 3 to
the multicast NAT instance. Input multicast stream 1 and output multicast stream 1 can be associated
through the multicast instance.
4. After the controller delivers a clean switching instruction to DeviceB, DeviceB unbinds output multicast
stream 1 from the multicast NAT instance, and binds output multicast stream 2 to the multicast NAT
instance.
Service Description
In the broadcasting and TV industry, especially in TV stations or media centers, IP-based production and
broadcasting networks are gaining in popularity. Related IP standards are being formulated, which is an
important step in the development of the 4K industry. Traditional production and broadcasting networks can
be divided into three key domains according to service function, as follows:
• Production domain: produces programs and outputs video/audio stream signals to the control matrices
of the control domain.
• Control domain: schedules video/audio stream signals between departments in TV stations or media
centers.
Network Description
The following figure shows a traditional production and broadcasting network.
On an IP-based production and broadcasting network, routers configured with multicast NAT can be used to
replace the traditional SDI switching matrices. These routers can output multicast streams to the control
matrices or from the control matrices to the broadcast domain.
Background
With the development of IP-based production and broadcasting networks, related standards are gradually
maturing. SMPTE ST 2022 series standards define the rules for transmitting digital videos over IP networks.
SMPTE ST 2022-7 (seamless protection switching of RTP datagrams) specifies the requirements for
redundant data streams so that the receiver can perform seamless switching at the data packet level
without affecting the data content and data stream stability.
On IP-based production and broadcasting networks, deploying redundancy protection according to SMPTE
ST 2022-7 is a feasible scheme to guarantee the system stability and security. However, on SMPTE 2022-7
networks (2022-7 networks for short), the implementation of the media asset clean switching service
requires that clean switching be performed at the same point on the primary and secondary links. Otherwise,
exceptions such as interlacing may occur on the receiver. Traditional multicast NAT clean switching is
incompatible with SMPTE ST 2022-7. Therefore, multicast NAT 2022-7, a new clean switching algorithm, is
designed based on multicast NAT clean switching to support SMPTE ST 2022-7.
Networking Scenario
Figure 1 shows the networking of multicast NAT 2022-7.
Two forwarders (Device A and Device B) that back up each other and a controller are deployed on the
network. Multicast sources (cameras) Source 1 and Source 2 each send two copies of the same stream to
implement redundancy protection. Specifically, Source 1 sends stream 1 and stream 1', whereas Source 2
sends stream 2 and stream 2'. Streams 1 and 2 are forwarded by Device A, and Streams 1' and 2' are
forwarded by Device B. The controller delivers a switching instruction to Device A and Device B, which
perform the switching at the same time. Device A switches the output multicast stream from stream 1 to
stream 2, and Device B switches the output multicast stream from stream 1' to stream 2'. Stream 2 and
stream 2' are selectively received by the player.
Fundamentals
The SMPTE ST 2022-7 network requires that the streams output by Device A and Device B be the same at
any time. If they are inconsistent, artifacts occur on the player during the switching as a result of the
selective receiving. Figure 2 shows such a problem.
Figure 2 Problem arising from selective receiving of interlaced streams on a player in a traditional clean switching
scenario
• Stream sampling for learning: Packets are collected in advance to dynamically learn streams and
calculate stream characteristics (internal information and rules of encoding) based on stream
parameters, such as the duration of each frame image and the number of packets in each frame image.
• Switching point prediction: Stream switching points are calculated based on stream characteristics and
the Precision Time Protocol (PTP) timestamp, and switching is performed at the switching points. The
controller must ensure that the same stream characteristics be configured for media signals with a
mirroring relationship on Device A and Device B and the same PTP timestamp be delivered for the clean
switching.
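The switching-point prediction above can be sketched as follows. This is a hypothetical illustration under the assumption that each device knows a learned frame duration and the PTP timestamp of a known frame boundary; each device then independently computes the first frame boundary at or after the requested switch time, so both devices switch at the same point. Times are in nanoseconds; all names are invented.

```python
# Sketch of switching-point prediction from stream characteristics and a PTP
# timestamp, as described above.
def next_switch_point(switch_time_ns, frame_origin_ns, frame_duration_ns):
    elapsed = switch_time_ns - frame_origin_ns
    frames = -(-elapsed // frame_duration_ns)  # ceiling division
    return frame_origin_ns + frames * frame_duration_ns

# A 50 fps stream has 20 ms frames; both devices derive the same boundary.
assert next_switch_point(1_000_123_000_000, 0, 20_000_000) == 1_000_140_000_000
```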
On the SMPTE ST 2022-7 network, the network connection between the controller and devices must be
normal. Each device independently performs switching based on the instruction delivered by the controller.
There is no internal synchronization protocol or communication interaction between Device A and Device B.
When multicast NAT 2022-7 is used for switching, ensure that the PTP clocks of the multicast sources,
devices that perform switching, and controller are synchronous. Otherwise, the switching effect cannot be
ensured.
11.14.6 Terminology
Definition
Multicast network address translation (NAT) translates the source IP address, destination IP address, and
destination port number (subsequently referred to as characteristics) in multicast streams. Multicast NAT
allows you to configure traffic policies on inbound interfaces to match input multicast streams. It also allows
you to configure translation rules on outbound interfaces, so that a multicast stream can be replicated to
multiple outbound interfaces and multicast stream characteristics can be modified according to the rules. On
the network shown in Figure 1, after multicast NAT is deployed on DeviceB, DeviceB performs the following
operations: uses a traffic policy to match the input multicast stream StreamIn, translates StreamIn's
characteristics, and outputs the post-translation multicast streams StreamOut1 and StreamOut2.
2022-07-08 2160
Feature Description
Purpose
On the network shown in Figure 1, users 1 and 2 receive the input multicast stream StreamIn from different
multicast groups. However, traditional multicast technologies cannot meet the requirement for sending the
same multicast stream to different multicast groups. To resolve this issue, deploy multicast NAT on DeviceB
so that DeviceB can translate StreamIn's characteristics and output the stream to users 1 and 2.
Benefits
Multicast NAT offers the following benefits:
• Multicast stream characteristics can be translated so that different downstream users can receive
multicast streams.
• The matrixes of multicast streams can be conveniently switched to replace traditional serial digital
interface (SDI) switching matrixes.
1. The multicast NAT device (DeviceB) translates the input multicast stream StreamIn into one or more
output multicast streams.
2. The characteristics of the output multicast streams can be the same as or different from those of the
input multicast stream.
2022-07-08 2161
Feature Description
NOTE:
1. Traffic policies are applied to the inbound interface (Interface1) to match the source MAC address,
source IP address, destination IP address, source UDP port number, and destination UDP port number
of StreamIn. The traffic behavior is to associate the stream with a multicast NAT instance. The
mapping between StreamIn and the multicast NAT instance is established based on the traffic policies.
2. You can configure a multicast stream translation rule on each outbound interface (Interface2 and
Interface3) for them to translate some characteristics of output streams and bind the streams to a
multicast NAT instance. The input and output multicast streams can be associated through a multicast
NAT instance.
3. Each multicast NAT instance can be bound to multiple multicast NAT outbound interfaces. This allows
one input multicast stream to be replicated to multiple outbound interfaces. The characteristics of
output multicast streams may be the same as or different from those of the input multicast stream.
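The association described in this note can be sketched as follows. This is an illustrative model only, with hypothetical class, field, and interface names; it is not the device's implementation.

```python
# Illustrative sketch of multicast NAT (not the device implementation):
# a traffic policy maps an input stream, matched by its characteristics,
# to a NAT instance; each outbound interface bound to the instance applies
# its own translation rule when the stream is replicated.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Stream:
    src_ip: str
    dst_ip: str      # multicast group address
    src_port: int
    dst_port: int

class NatInstance:
    def __init__(self):
        # Outbound interface name -> translation rule (fields to rewrite).
        self.rules = {}

    def bind(self, interface, **rule):
        self.rules[interface] = rule

    def translate(self, stream):
        # Replicate the input stream to every bound outbound interface,
        # rewriting only the fields that each rule specifies.
        return {ifc: replace(stream, **rule) for ifc, rule in self.rules.items()}

# Traffic policy on Interface1 matches StreamIn and associates it with the
# instance; Interface2 rewrites the group address and port, while Interface3
# leaves the characteristics unchanged.
stream_in = Stream("10.1.1.1", "232.1.1.1", 5000, 6000)
instance = NatInstance()
instance.bind("Interface2", dst_ip="232.2.2.2", dst_port=7000)
instance.bind("Interface3")
out = instance.translate(stream_in)
```

Binding one instance to several outbound interfaces is what allows a single input stream to be replicated with per-interface characteristics, as described above.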
1. Input multicast stream 1 enters the multicast NAT device (DeviceB) and is translated into output
multicast stream 1. The stream's characteristics may or may not change during the translation.
2. Input multicast stream 2 is input to the multicast NAT device (DeviceB) but is not translated into an
output multicast stream. DeviceB receives both input multicast streams 1 and 2, but outputs only
output multicast stream 1.
3. After receiving a clean switching instruction from the controller, DeviceB switches from multicast
stream 1 to multicast stream 2. During the switching, the media gateway uses the stream
characteristics to verify that no excess packets are received and that no packets are lost. Receiver 1
does not detect erratic display or frame freezing.
Video switching includes switching between video sources output by multiple cameras, or switching between
video sources of multiple programs. The basic requirements of video switching are frame precision, clean
switching, frame synchronization for output signals before and after switching, and no picture damage. Clean
switching ensures that no black screen, erratic display, or frame freezing occurs when the receive end
receives traffic during the switching of two video streams.
1. The SSRC field is used to identify a synchronization source. The source of an RTP packet is identified by
the 32-bit SSRC identifier in the RTP header so that it does not depend on the network address. All
packets from a synchronization source form part of the same timing and SN space, and a receiver
groups packets by synchronization source for playback. In clean switching, this field can be configured
to ensure that two synchronization sources have different SSRC identifiers. A receiver distinguishes the
sources based on the SSRC identifiers.
2. The SN field is used to identify the sequence number of an RTP packet sent by a sender. Each time a
packet is sent, the sequence number increases by 1. This field can be used to check packet loss. It can
also be used to reorder data if network jitter occurs. The pre- and post-switching SNs must be
consecutive. In clean switching, this field can be configured to check whether packet disorder or packet
loss occurs during the switching.
3. Because the SN field is only 16 bits long, it overflows when its value exceeds 65535. An EXT-SN can
be used to resolve this issue: the SN serves as the lower 16 bits of a 32-bit integer, and the EXT-SN
serves as the upper 16 bits. Each time the SN wraps around past 65535, the EXT-SN increases by 1.
This field can be configured to check whether an abnormal carry occurs on the SN, that is, whether the
EXT-SN increases even though the SN has not wrapped around.
During the switching, the system checks all of the RTP-SN, RTP-EXT-SN, and RTP-SSRC fields in some
scenarios, and only some of them in others. Which fields need to be checked depends on the video stream
format and the media gateway.
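The combined use of the SN and EXT-SN fields can be sketched as follows; the function names and the exact check logic are illustrative assumptions, not the checks a particular media gateway performs.

```python
# Illustrative continuity check on RTP sequence numbers: the 16-bit SN is
# the lower half of a 32-bit sequence number and the EXT-SN is the upper
# half, so consecutive packets must differ by exactly 1 in the combined
# value if no loss or disorder occurred during the switching.
def extended_seq(ext_sn: int, sn: int) -> int:
    return (ext_sn << 16) | sn

def check_continuity(packets):
    """packets: (ext_sn, sn) tuples in arrival order; returns found issues."""
    issues = []
    for prev, cur in zip(packets, packets[1:]):
        if extended_seq(*cur) - extended_seq(*prev) != 1:
            issues.append(("loss or disorder", prev, cur))
        if cur[0] == prev[0] + 1 and prev[1] != 0xFFFF:
            # EXT-SN increased although the SN had not wrapped past 65535.
            issues.append(("abnormal carry", prev, cur))
    return issues

# A normal wrap (SN 65535 -> 0 with the EXT-SN increasing by 1) is clean.
assert check_continuity([(0, 0xFFFE), (0, 0xFFFF), (1, 0x0000)]) == []
```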
Table: Characteristics (source and destination addresses and UDP port numbers) of input multicast streams 1
and 2 and output multicast streams 1 and 2.
1. Input multicast stream 1 is matched based on two-level traffic policies on interface 1. The level-1
traffic policy matches the source MAC address of input multicast stream 1, and a traffic behavior is
associated with the level-2 traffic policy. The level-2 traffic policy matches the source IP address,
destination IP address, and UDP port number of input multicast stream 1, and a traffic behavior is
associated with a multicast NAT instance. The mapping between input multicast stream 1 and the
multicast NAT instance is established using the two-level traffic policies.
2. Input multicast stream 2 is matched based on two-level traffic policies on interface 2. The level-1
traffic policy matches the source MAC address of input multicast stream 2, and a traffic behavior is
associated with the level-2 traffic policy. The level-2 traffic policy matches the source IP address,
destination IP address, and UDP port number of input multicast stream 2, and a traffic behavior is
associated with a multicast NAT instance. The mapping between input multicast stream 2 and the
multicast NAT instance is established using the two-level traffic policies.
3. Multicast stream translation rules are configured on a specified outbound interface (interface 3) to
translate (or leave unchanged) some characteristics of output multicast stream 1 and to bind interface 3 to
the multicast NAT instance. Input multicast stream 1 and output multicast stream 1 can be associated
through the multicast NAT instance.
4. After the controller delivers a clean switching instruction to DeviceB, DeviceB unbinds output multicast
stream 1 from the multicast NAT instance, and binds output multicast stream 2 to the multicast NAT
instance. This implements clean switching.
Service Description
In the broadcasting and TV industry, especially in TV stations or media centers, IP-based production and
broadcasting networks are gaining in popularity. Related IP standards are being formulated, which is an
important step in the development of the 4K industry. Traditional production and broadcasting networks can
be divided into three key domains according to service function, as follows:
• Production domain: produces programs and outputs video/audio stream signals to the control matrices
of the control domain.
• Control domain: schedules video/audio stream signals between departments in TV stations or media
centers.
Network Description
The following figure shows a traditional production and broadcasting network.
On an IP-based production and broadcasting network, routers configured with multicast NAT can be used to
replace the traditional SDI switching matrices. These routers can output multicast streams to the control
matrices or from the control matrices to the broadcast domain.
Background
With the development of IP-based production and broadcasting networks, related standards are gradually
improved. SMPTE ST 2022 series standards define the rules for transmitting digital videos over IP networks.
SMPTE ST 2022-7 (seamless protection switching of RTP datagrams) specifies the requirements for
redundant data streams so that the receiver can perform seamless switching at the data packet level
without affecting the data content and data stream stability.
On IP-based production and broadcasting networks, deploying redundancy protection according to SMPTE
ST 2022-7 is a feasible way to guarantee system stability and security. However, on SMPTE ST 2022-7
networks (2022-7 networks for short), the implementation of the media asset clean switching service
requires that clean switching be performed at the same point on the primary and secondary links. Otherwise,
exceptions such as interlacing may occur on the receiver. Traditional multicast NAT clean switching is
incompatible with SMPTE ST 2022-7. Therefore, multicast NAT 2022-7, a new clean switching algorithm, is
designed based on multicast NAT clean switching to support SMPTE ST 2022-7.
Networking Scenario
Figure 1 shows the networking of multicast NAT 2022-7.
Two forwarders (Device A and Device B) that back up each other and a controller are deployed on the
network. Multicast sources (cameras) Source 1 and Source 2 each send two copies of the same stream to
implement redundancy protection. Specifically, Source 1 sends stream 1 and stream 1', whereas Source 2
sends stream 2 and stream 2'. Streams 1 and 2 are forwarded by Device A, and streams 1' and 2' are
forwarded by Device B. The controller delivers a switching instruction to Device A and Device B, which
perform the switching at the same time. Device A switches the output multicast stream from stream 1 to
stream 2, and Device B switches the output multicast stream from stream 1' to stream 2'. Stream 2 and
stream 2' are selectively received by the player.
Fundamentals
The SMPTE ST 2022-7 network requires that the streams output by Device A and Device B be the same at
any time. If they are inconsistent, artifacts occur on the player during the switching as a result of the
selective receiving. Figure 2 shows such a problem.
Figure 2 Problem arising from selective receiving of interlaced streams on a player in a traditional clean switching
scenario
• Stream sampling for learning: Packets are collected in advance to dynamically learn streams and
calculate stream characteristics (internal information and rules of encoding) based on stream
parameters, such as the duration of each frame image and the number of packets in each frame image.
• Switching point prediction: Stream switching points are calculated based on stream characteristics and
the Precision Time Protocol (PTP) timestamp, and switching is performed at the switching points. The
controller must ensure that the same stream characteristics are configured for media signals with a
mirroring relationship on Device A and Device B and that the same PTP timestamp is delivered for the
clean switching.
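Switching point prediction can be sketched as follows, under the illustrative assumption that the switching point is the next frame boundary at or after the instructed time; the frame period and timestamps are example values only.

```python
# Illustrative switching-point prediction: two PTP-synchronized devices
# that learned the same frame period and received the same instructed
# timestamp round it up to the same frame boundary, so they switch at the
# same point without communicating with each other.
import math

def next_switch_point(instructed_ns: int, frame_period_ns: int,
                      epoch_ns: int = 0) -> int:
    elapsed = instructed_ns - epoch_ns
    return epoch_ns + math.ceil(elapsed / frame_period_ns) * frame_period_ns

period_ns = 20_000_000                      # 20 ms frame period (50 fps)
instructed = 1_234_567_890                  # PTP timestamp from the controller
point_a = next_switch_point(instructed, period_ns)   # computed on Device A
point_b = next_switch_point(instructed, period_ns)   # computed on Device B
assert point_a == point_b == 1_240_000_000
```

Because both devices derive the point from the same inputs, the streams they output remain identical at every instant, which is what the SMPTE ST 2022-7 network requires.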
On the SMPTE ST 2022-7 network, the network connection between the controller and devices must be
normal. Each device independently performs switching based on the instruction delivered by the controller.
There is no internal synchronization protocol or communication interaction between Device A and Device B.
When multicast NAT 2022-7 is used for switching, ensure that the PTP clocks of the multicast sources,
devices that perform switching, and controller are synchronous. Otherwise, the switching effect cannot be
ensured.
11.15.6 Terminology
Definition
Multicast network address translation (NAT) translates the source IP address, destination IP address, and
destination port number (subsequently referred to as characteristics) in multicast streams. Multicast NAT
allows you to configure traffic policies on inbound interfaces to match input multicast streams. It also allows
you to configure translation rules on outbound interfaces, so that a multicast stream can be replicated to
multiple outbound interfaces and multicast stream characteristics can be modified according to the rules. On
the network shown in Figure 1, after multicast NAT is deployed on DeviceB, DeviceB performs the following
operations: uses a traffic policy to match the input multicast stream StreamIn, translates StreamIn's
characteristics, and outputs the post-translation multicast streams StreamOut1 and StreamOut2.
11.16.6 Terminology
Definition
Layer 2 multicast implements on-demand multicast data transmission at the data link layer. Figure 1 shows
a typical Layer 2 multicast application where Device B functions as a Layer 2 device. After Layer 2 multicast
is deployed on Device B, it listens to Internet Group Management Protocol (IGMP) packets exchanged
between Device A (a Layer 3 device) and hosts and creates a Layer 2 multicast forwarding table. Then,
Device B forwards multicast data only to users who have explicitly requested the data, instead of
broadcasting the data.
Purpose
Layer 2 multicast is designed to reduce network bandwidth consumption. For example, without Layer 2
multicast, Device B cannot know which interfaces are connected to multicast receivers. Therefore, after
receiving a multicast packet from Device A, Device B broadcasts the packet in the packet's broadcast
domain. As a result, all hosts in the broadcast domain (including those who do not request the packet) will
receive the packet, which wastes network bandwidth and compromises network security.
With Layer 2 multicast, Device B can create a Layer 2 multicast forwarding table and record the mapping
between multicast group addresses and interfaces in the table. After receiving a multicast packet, Device B
searches the forwarding table for downstream interfaces that map to the packet's group address, and
forwards the packet only to these interfaces, which reduces bandwidth consumption. A multicast group
address can be a multicast IP address or a mapped multicast MAC address.
Functions
Major Layer 2 multicast functions include:
• IGMP snooping
• Multicast VLAN
Benefits
Layer 2 multicast offers the following benefits:
Background
Layer 3 devices and hosts use IGMP to implement multicast data communication. IGMP messages are
encapsulated in IP packets. A Layer 2 device can neither process Layer 3 information nor learn multicast
MAC addresses in link layer data frames because source MAC addresses in data frames are not multicast
MAC addresses. As a result, when a Layer 2 device receives a data frame in which the destination MAC
address is a multicast MAC address, the device cannot find a matching entry in its MAC address table. The
Layer 2 device then broadcasts the multicast packet, which wastes bandwidth resources and compromises
network security.
IGMP snooping addresses this problem by controlling multicast traffic forwarding at Layer 2. IGMP snooping
enables a Layer 2 device to listen to and analyze IGMP messages exchanged between a Layer 3 device and
hosts. Based on the learned IGMP message information, the device creates a Layer 2 forwarding table and
uses it to implement on-demand packet forwarding.
Figure 1 shows a network on which Device B is a Layer 2 device and users connected to Port 1 and Port 2
require multicast data from a multicast group (for example, 225.0.0.1).
• If Device B does not run IGMP snooping, Device B broadcasts all received multicast data at the data link
layer.
• If Device B runs IGMP snooping and receives data for a multicast group, Device B searches the Layer 2
multicast forwarding table for ports connected to the users who require the data. In this example,
Device B sends the data only to Port 1 and Port 2 because the user connected to Port 3 does not require
the data.
Figure 1 Multicast packet transmission before and after IGMP snooping is configured on a Layer 2 device
(Layer 2 multicast forwarding entries in the figure: group 225.0.0.1 maps to Port 1 and Port 2.)
Related Concepts
Figure 2 illustrates IGMP snooping on a Layer 2 multicast network.
• A router port (labeled with a blue circle in Figure 2): It connects a Layer 2 multicast device to an
upstream multicast router.
Router ports can be dynamically discovered by IGMP or manually configured.
• A member port of a multicast group (labeled with a yellow square in Figure 2): It connects a Layer 2
multicast device to group member hosts and is used by a Layer 2 multicast device to send multicast
packets to hosts.
Member ports can be dynamically discovered by IGMP or manually configured.
• A Layer 2 multicast forwarding entry: It is stored in the multicast forwarding table and used by a Layer
2 multicast device to determine the forwarding of a multicast packet sent from an upstream device.
Information in a Layer 2 multicast forwarding entry includes:
• Multicast MAC address: It is mapped from a multicast IP address contained in a multicast data packet at
the data link layer. Multicast MAC addresses are used to determine multicast data packet forwarding at
the data link layer.
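The mapping from a multicast IPv4 address to a multicast MAC address follows the standard rule: the fixed prefix 01-00-5e is combined with the low 23 bits of the IP address, so the high 9 bits are lost and 32 group addresses share one MAC address. A minimal sketch:

```python
# Mapping a multicast IPv4 address to its multicast MAC address: the fixed
# prefix 01-00-5e is followed by the low 23 bits of the IP address, so 32
# different group addresses share one MAC address (the high 9 bits are lost).
import ipaddress

def multicast_mac(group: str) -> str:
    ip = int(ipaddress.IPv4Address(group))
    assert (ip >> 28) == 0xE, "not a multicast address (224.0.0.0/4)"
    low23 = ip & 0x7FFFFF
    octets = [0x01, 0x00, 0x5E,
              (low23 >> 16) & 0x7F, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return "-".join(f"{o:02x}" for o in octets)

# 225.0.0.1 and 224.128.0.1 differ only in the high 9 bits, so they map
# to the same MAC address.
assert multicast_mac("225.0.0.1") == "01-00-5e-00-00-01"
assert multicast_mac("224.128.0.1") == "01-00-5e-00-00-01"
```

This ambiguity is why forwarding based on multicast MAC addresses is coarser than forwarding based on multicast IP addresses.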
Implementation
IGMP snooping is implemented as follows:
1. After IGMP snooping is deployed on a Layer 2 device, the device uses IGMP snooping to analyze IGMP
messages exchanged between hosts and a Layer 3 device and then creates a Layer 2 multicast
forwarding table based on the analysis. Information in forwarding entries includes VLAN IDs or VSI
names, multicast source addresses, multicast group addresses, and numbers of ports connected to
hosts.
• After receiving an IGMP Query message from an upstream device, the Layer 2 device sets a
network-side port as a dynamic router port.
• After receiving a PIM Hello message from an upstream device, the Layer 2 device sets a network-
side port as a dynamic router port.
• After receiving an IGMP Report message from a downstream device or user, the Layer 2 device
sets a user-side port as a dynamic member port.
2. The IGMP snooping-capable Layer 2 device forwards a received packet based on the Layer 2 multicast
forwarding table.
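The learning and forwarding steps above can be sketched as follows; the class and method names are illustrative assumptions rather than actual device interfaces.

```python
# Illustrative IGMP snooping sketch: the Layer 2 device learns router and
# member ports from control messages and forwards group data only to the
# member ports recorded for that group (plus router ports), instead of
# broadcasting it in the VLAN.
from collections import defaultdict

class IgmpSnooping:
    def __init__(self, all_ports):
        self.all_ports = set(all_ports)
        self.router_ports = set()
        self.members = defaultdict(set)   # group address -> member ports

    def on_query_or_pim_hello(self, port):
        self.router_ports.add(port)       # network-side dynamic router port

    def on_report(self, port, group):
        self.members[group].add(port)     # user-side dynamic member port

    def on_leave(self, port, group):
        self.members[group].discard(port)

    def forward_ports(self, group, in_port):
        if group not in self.members:
            # No entry: without snooping this data would be broadcast.
            return self.all_ports - {in_port}
        return (self.members[group] | self.router_ports) - {in_port}

# Scenario from Figure 1: users on Port1 and Port2 request 225.0.0.1.
sw = IgmpSnooping(["Port1", "Port2", "Port3", "Uplink"])
sw.on_query_or_pim_hello("Uplink")
sw.on_report("Port1", "225.0.0.1")
sw.on_report("Port2", "225.0.0.1")
# Data for 225.0.0.1 arriving on the uplink reaches Port1 and Port2 only.
ports = sw.forward_ports("225.0.0.1", "Uplink")
```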
Other Functions
• IGMP snooping supports all IGMP versions.
IGMP has three versions: IGMPv1, IGMPv2, and IGMPv3. You can specify an IGMP version for your
device.
• IGMP snooping enables a Layer 2 device to rapidly respond to Layer 2 network topology changes.
Multiple Spanning Tree Protocol (MSTP) is usually used to connect Layer 2 devices to implement rapid
convergence. IGMP snooping adapts to this feature by enabling a Layer 2 device to immediately update
port information and switch multicast data traffic over a new forwarding path when the network
topology changes, which minimizes multicast service interruptions.
• IGMP snooping allows you to configure a security policy for multicast groups.
This function can be used to limit the range and number of multicast groups that users can join and to
determine whether to receive multicast data packets containing a security field. It provides refined
control over multicast groups and improves network security.
Deployment Scenarios
IGMP snooping can be used on VLANs and virtual private LAN service (VPLS) networks.
Benefits
IGMP snooping deployed on a user-side Router offers the following benefits:
Background
Multicast data can be transmitted to user terminals over an IP bearer network in either dynamic or static
multicast mode.
• In dynamic multicast mode, a device starts to receive and deliver a multicast group's data after
receiving the first Report message for the group. The device stops receiving the multicast group's data
after receiving the last Leave message. The dynamic multicast mode has both an advantage and a
disadvantage:
• In static multicast mode, multicast forwarding entries are configured for each multicast group on a
device. A multicast group's data is delivered to a device, regardless of whether users are requesting the
data from this device. The static multicast mode has the following advantages and disadvantages:
■ Advantages:
■ Multicast routes are fixed, and multicast paths exist regardless of whether there are multicast
data receivers. Users can change channels without delays, improving user experience.
■ Multicast source and group ranges are easy to manage because multicast paths are stable.
■ The delay when data is first forwarded is minimal because static routes already exist and do
not need to be established the way dynamic multicast routes do.
■ Disadvantages:
■ Each device on a multicast data transmission path must be manually configured. The
configuration workload is heavy.
■ Sub-optimal multicast forwarding paths may be generated because downstream ports are
manually specified on each device.
■ When a network topology or unicast routes change, static multicast paths may need to be
reconfigured. The configuration workload is heavy.
■ Multicast routes exist even when no multicast data needs to be forwarded. This wastes
network resources and creates high bandwidth requirements.
A Layer 2 multicast forwarding table can be dynamically built using IGMP snooping or be manually
configured. Choose the dynamic or static mode based on network quality requirements and demanded
service types.
If network bandwidth is sufficient and hosts require multicast data for specific multicast groups from a
router port for a long period of time, choose static Layer 2 multicast to implement stable multicast data
transmission on a metropolitan area network (MAN) or bearer network. After static Layer 2 multicast is
deployed on a device, multicast entries on the device do not age and users attached to the device can stably
receive multicast data for specific multicast groups.
Related Concepts
Static router ports or member ports are used in static Layer 2 multicast.
• Static member ports are used to send data for specific multicast groups.
Deployment Scenarios
Static Layer 2 multicast can be used on VLANs and VPLS networks.
Benefits
Static Layer 2 multicast offers the following benefits:
• Improved information security by preventing unregistered users from receiving multicast packets
Background
IGMPv3 supports source-specific multicast (SSM), but IGMPv1 and IGMPv2 do not. The majority of the latest
multicast devices support IGMPv3, but most legacy multicast terminals only support IGMPv1 or IGMPv2. SSM
mapping is a transition solution that provides SSM services for such legacy multicast terminals. Using rules
that specify the mapping from a particular multicast group to a source-specific group, SSM mapping can
convert IGMPv1 or IGMPv2 messages whose group addresses are within the SSM range to IGMPv3
messages. This mechanism allows hosts running IGMPv1 or IGMPv2 to access SSM services. SSM mapping
allows IGMPv1 or IGMPv2 terminals to access only specific sources, thus minimizing the risks of attacks on
multicast sources.
Layer 2 SSM mapping is used to implement SSM mapping on Layer 2 networks. For example, on the network
shown in Figure 1, the Layer 3 device runs IGMPv3 and directly connects to a Layer 2 device. Host A runs
IGMPv3, Host B runs IGMPv2, and Host C runs IGMPv1 on the Layer 2 network. If the IGMP versions of Host
B and Host C cannot be upgraded to IGMPv3, Layer 2 SSM mapping needs to be configured on the Layer 2
device to provide SSM services for all hosts on the network segment.
Implementation
If SSM mapping is configured on a multicast device and mapping between group addresses and source
addresses is configured, the multicast device will perform the following actions after receiving a (*, G)
message from a host running IGMPv1 or IGMPv2:
• If the message's multicast group address is not in the SSM group address range, the device processes
the message in the same manner as it processes an IGMPv1 or IGMPv2 message.
• If the message's multicast group address is in the SSM group address range, the device maps the (*, G)
message into (S, G) messages based on mapping rules.
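The mapping decision above can be sketched in Python. This is a minimal illustration, not device behavior: the SSM group range, the `MAPPING_RULES` table, and the `map_membership` helper are all hypothetical names and values.

```python
import ipaddress

# Hypothetical SSM mapping configuration: group range -> mapped source addresses.
SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")  # default IPv4 SSM range
MAPPING_RULES = {
    ipaddress.ip_network("232.1.1.0/24"): ["10.1.1.1", "10.1.1.2"],
}

def map_membership(group: str):
    """Convert a (*, G) report from an IGMPv1/v2 host into (S, G) states."""
    g = ipaddress.ip_address(group)
    if g not in SSM_RANGE:
        # Outside the SSM range: process as an ordinary IGMPv1/v2 report.
        return [("*", group)]
    for net, sources in MAPPING_RULES.items():
        if g in net:
            # Inside the SSM range with a matching rule: create (S, G) entries.
            return [(s, group) for s in sources]
    return []  # in the SSM range but no mapping rule: the report cannot be served

print(map_membership("232.1.1.5"))  # [('10.1.1.1', '232.1.1.5'), ('10.1.1.2', '232.1.1.5')]
print(map_membership("225.0.0.1"))  # [('*', '225.0.0.1')]
```

A group inside the SSM range with no configured rule yields no state, which mirrors the access restriction the text describes: IGMPv1/v2 hosts can reach only the sources an operator has mapped.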
Deployment Scenarios
Layer 2 SSM mapping can be used on VLANs and VPLS networks.
Benefits
Layer 2 SSM mapping offers the following benefits:
Background
Forwarding entries are generated when a Layer 3 device (PE on the network shown in Figure 1) exchanges
IGMP messages with user hosts. If there are many user hosts, excessive IGMP messages will reduce the
forwarding capability of the Layer 3 device.
To resolve this issue, deploy IGMP snooping proxy on a Layer 2 device (CE on the network shown in Figure 1
) that connects the Layer 3 device and hosts. IGMP snooping proxy enables a Layer 2 device to behave as
both a Layer 3 device and a user host, so that the Layer 2 device can terminate IGMP messages to be
transmitted between the Layer 3 device and user host. IGMP snooping proxy enables a Layer 2 device to
perform the following operations:
• Periodically send Query messages to hosts and receive Report and Leave messages from hosts.
After IGMP snooping proxy is deployed on a Layer 2 device, the Layer 2 device is no longer a transparent message forwarder between the Layer 3 device and user hosts. Furthermore, the Layer 3 device recognizes only the Layer 2 device and is unaware of the user hosts.
Implementation
A device that runs IGMP snooping proxy establishes and maintains a multicast forwarding table and sends
multicast data to users based on this table. IGMP snooping proxy implements the following functions:
• IGMP snooping proxy implements the querier function for upstream devices, enabling a Layer 2 device to send Query messages on behalf of its upstream device. The querier function must be enabled on the Layer 2 device, either directly or by enabling IGMP snooping proxy, if the upstream device cannot send IGMP Query messages or if static multicast groups are configured on the upstream device.
• IGMP snooping proxy enables a Layer 2 device to suppress Report and Leave messages if large numbers
of users frequently join or leave multicast groups. This function reduces message processing workload
for upstream devices.
■ After receiving the first Report message for a multicast group from a user host, the device checks
whether an entry has been created for this group. If an entry has not been created, the device
sends the Report message to its upstream device and creates an entry for this group. If an entry
has been created, the device adds the host to the multicast group and does not send a Report
message to its upstream device.
■ After receiving a Leave message for a group from a user host, the device sends a group-specific
query message to check whether there are any members of this group. If there are members of this
group, the device deletes only the user from the group. If there are no other members of this
group, the device considers the user as the last member of the group and sends a Leave message
to its upstream device.
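The Report/Leave suppression steps above can be sketched as follows. This is a simplified illustration under assumed names (`memberships`, `on_report`, `on_leave` are hypothetical); the group-specific query and its response timer are elided.

```python
# Hypothetical proxy state: group address -> set of member host ports.
memberships: dict[str, set[str]] = {}

def on_report(group: str, port: str) -> bool:
    """Handle a Report; return True only if it must be forwarded upstream."""
    first = group not in memberships            # first member of this group?
    memberships.setdefault(group, set()).add(port)
    return first                                # only the first Report goes upstream

def on_leave(group: str, port: str) -> bool:
    """Handle a Leave; return True only if a Leave must be sent upstream.
    (A real device first confirms via a group-specific query that no
    members remain; that timer-driven step is omitted here.)"""
    members = memberships.get(group)
    if not members or port not in members:
        return False
    members.discard(port)
    if not members:                             # last member left the group
        del memberships[group]
        return True
    return False

print(on_report("225.1.1.1", "p1"))  # True  -> forwarded upstream
print(on_report("225.1.1.1", "p2"))  # False -> suppressed
print(on_leave("225.1.1.1", "p1"))   # False -> members remain
print(on_leave("225.1.1.1", "p2"))   # True  -> last member, Leave sent upstream
```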
Deployment Scenarios
IGMP snooping proxy can be used on VLANs and VPLS networks.
Benefits
IGMP snooping proxy deployed on a user-side Layer 2 Router offers the following benefits:
Background
On the network shown in Figure 1 (Multicast flow replication before and after multicast VLAN is configured),
in traditional multicast on-demand mode, if users in different VLANs (VLAN 11 and VLAN 22) require the
in traditional multicast on-demand mode, if users in different VLANs (VLAN 11 and VLAN 22) require the
same multicast flow, PEs on the Layer 3 network must send a copy of the multicast flow to each VLAN. This
mode wastes bandwidth and imposes additional burdens.
The multicast VLAN function can be used to address this problem. Multicast VLAN implements multicast
replication across broadcast domains on devices on a Layer 2 network based on IGMP snooping. After the
multicast VLAN function is configured on the CE, the upstream PE does not need to send one copy of the
multicast stream to the VLAN of each downstream user. Instead, the upstream PE sends only one copy of
the multicast stream to a VLAN (VLAN 3) of the CE. Then, the CE replicates the multicast stream to other
VLANs (VLAN 11 and VLAN 22). The PE no longer needs to send identical multicast data flows downstream.
This mode saves network bandwidth and relieves the load on the PE.
Figure 1 Multicast flow replication before and after multicast VLAN is configured
The following uses the network shown in Figure 1 as an example to describe why multicast VLAN requires
IGMP snooping proxy to be enabled.
• If IGMP snooping proxy is not enabled on VLAN 3 and users in different VLANs want to join the same
group, the CE forwards each user's IGMP Report message to the PE. Similarly, if users in different VLANs
leave the same group, the CE also needs to forward each user's IGMP Leave message to the PE.
• If IGMP snooping proxy is enabled on VLAN 3 and users in different VLANs want to join the same
group, the CE forwards only one IGMP Report message to the PE. If the last member of the group
leaves, the CE sends an IGMP Leave message to the PE. This reduces network-side bandwidth
consumption on the CE and performance pressure on the PE.
Related Concepts
The following concepts are involved in the multicast VLAN function:
• Multicast VLAN: is a VLAN to which the interface connected to a multicast source belongs. A multicast
VLAN is used to aggregate multicast flows.
• User VLAN: is a VLAN to which a group member host belongs. A user VLAN is used to receive multicast
flows from a multicast VLAN.
One multicast VLAN can be bound to multiple user VLANs.
After the multicast VLAN function is configured on a device, the device receives multicast traffic through
multicast VLANs and sends the multicast traffic to users through user VLANs.
Implementation
The multicast VLAN implementation process can be divided into two parts:
■ After the user VLAN tag in an IGMP Report message is replaced with a corresponding multicast
VLAN tag, the message is sent out through a router port of the multicast VLAN.
■ After the multicast VLAN tag in an IGMP Query message is replaced with a corresponding user
VLAN tag, the message is sent out through a member port of the user VLAN.
■ Entries learned through IGMP snooping in user VLANs are added to the table of the multicast
VLAN.
■ If a matching forwarding entry exists, the Layer 2 device will identify the downstream ports and
their VLAN IDs, replicate the multicast data packet on each downstream port, and send a copy of
the packet to user VLANs.
■ If no matching forwarding entry exists, the Layer 2 device will discard the multicast data packet.
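The VLAN tag replacement steps above can be illustrated with a small Python sketch. The frame representation and helper names are hypothetical; VLAN 3, VLAN 11, and VLAN 22 follow the example in the text.

```python
MULTICAST_VLAN = 3          # VLAN toward the multicast source (example value)
USER_VLANS = {11, 22}       # VLANs of group member hosts (example values)

def report_upstream(frame: dict) -> dict:
    """Rewrite the user VLAN tag of an IGMP Report to the multicast VLAN
    before the message leaves through a router port of the multicast VLAN."""
    assert frame["vlan"] in USER_VLANS
    return {**frame, "vlan": MULTICAST_VLAN}

def query_downstream(frame: dict, user_vlan: int) -> dict:
    """Rewrite the multicast VLAN tag of an IGMP Query to a user VLAN
    before the message leaves through a member port of that user VLAN."""
    assert frame["vlan"] == MULTICAST_VLAN
    return {**frame, "vlan": user_vlan}

report = {"type": "report", "group": "225.0.0.1", "vlan": 11}
print(report_upstream(report))      # {'type': 'report', 'group': '225.0.0.1', 'vlan': 3}
query = {"type": "query", "vlan": 3}
print(query_downstream(query, 22))  # {'type': 'query', 'vlan': 22}
```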
Other Functions
A user VLAN allows you to configure the querier election function. The following uses the network shown in
Figure 2 as an example to describe the querier election function.
• A CE connects to Router A through both Router B and Router C, which improves the reliability of data
transmission. The querier function is enabled on Router B and Router C.
• Multicast VLAN is enabled on Router B and Router C. VLAN 11 is a multicast VLAN, and VLAN 22 is a
user VLAN.
Both Router B and Router C in VLAN 11 are connected to VLAN 22. As a result, VLAN 22 will receive two
identical copies for the same requested multicast flow from Router B and Router C, causing data
redundancy.
To address this problem, configure querier election on Router B and Router C in the user VLAN and specify
one of them to send Query messages and forward multicast data flows. In this manner, VLAN 22 receives
only one copy of a multicast data flow from the upstream Router A over VLAN 11.
A querier is elected as follows in a user VLAN (the network shown in Figure 2 is used as an example):
1. After receiving a Query message from Router A, Router B and Router C replace the source IP address
of the Query message with their own local source IP address (1.1.1.1 for Router B and 1.1.1.2 for
Router C).
2. Router B and Router C exchange Query messages. Based on the querier election algorithm, Router B
with a smaller source IP address is elected as a querier.
3. As a querier, Router B generates a forwarding entry after receiving a Join message from VLAN 22,
while Router C does not generate a forwarding entry. Then, multicast data flows from upstream
devices are forwarded by Router B to VLAN 22.
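The election rule in step 2 (the candidate with the smaller source IP address wins) can be shown in a one-line sketch; `elect_querier` is a hypothetical helper, and the addresses are those of Router B and Router C in the example.

```python
import ipaddress

def elect_querier(candidates: list[str]) -> str:
    """Return the elected querier: the candidate with the smallest source IP."""
    return min(candidates, key=ipaddress.ip_address)

# Router B (1.1.1.1) and Router C (1.1.1.2) from the example above:
print(elect_querier(["1.1.1.2", "1.1.1.1"]))  # '1.1.1.1' -> Router B is the querier
```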
Deployment Scenarios
The multicast VLAN function can be used on VLANs.
Benefits
The multicast VLAN function offers the following benefits:
Principles
With the growing popularity of IPTV applications, multicast services are more widely deployed than ever.
When multicast services are deployed on a Layer 2 network, a number of problems may arise:
• If users join a large number of multicast groups, sparsely distributed multicast groups will increase
performance pressure on network devices.
• If network bandwidth is insufficient, the demand for bandwidth resources will exceed the total network
bandwidth, overloading aggregation layer devices and degrading user experience.
• If multicast packets are used to attack a network, network devices become busy processing attack
packets and cannot respond to normal network requests.
On the network shown in Figure 1, Layer 2 multicast entry limit can be deployed on the UPE and NPEs to
address the problems described above. The Layer 2 multicast entry limit function limits entries of multicast
services on a Layer 2 network. This function implements multicast service access restrictions and refined
control on the aggregation network based on the number of multicast groups. Layer 2 multicast entry limit
also enables service providers to refine content offerings and develop flexible subscriber-specific policies. This
prevents the demand for bandwidth resources from exceeding the total bandwidth of the aggregation
network and improves service quality for users.
Related Concepts
Entry limit: provides rules to limit the number of multicast groups, implementing control over multicast entry
learning. Layer 2 multicast entry limit is a function of limiting entries of multicast services on a Layer 2
network.
Implementation
If IGMP snooping is enabled, Layer 2 multicast entry limit can be used to control multicast services. Multicast
entry limit constrains the generation of multicast forwarding entries. When a specified threshold is reached,
no more forwarding entries will be generated. This conserves the processing capacity of devices and controls
link bandwidth.
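The threshold behavior can be sketched as follows. The `SnoopingTable` class and the limit of four entries are hypothetical, chosen only for illustration; note that once the limit is reached, new groups are rejected while existing entries can still gain member ports.

```python
MAX_ENTRIES = 4  # hypothetical per-VLAN entry limit

class SnoopingTable:
    def __init__(self, limit: int):
        self.limit = limit
        self.entries: dict[str, set[str]] = {}  # group -> member ports

    def learn(self, group: str, port: str) -> bool:
        """Create or extend an entry; refuse new groups once the limit is hit."""
        if group not in self.entries and len(self.entries) >= self.limit:
            return False  # threshold reached: no new forwarding entry
        self.entries.setdefault(group, set()).add(port)
        return True

table = SnoopingTable(MAX_ENTRIES)
for i in range(5):
    ok = table.learn(f"225.0.0.{i + 1}", "p1")
    print(f"225.0.0.{i + 1}: {'learned' if ok else 'rejected'}")
# The fifth group is rejected; joins for already-learned groups still succeed.
```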
Layer 2 multicast entry limit can be classified by usage scenario as follows:
• VLAN scenario:
• VPLS scenario:
Deployment Scenarios
Layer 2 multicast entry limit can be used on VLANs and VPLS networks.
Benefits
• Prevents required bandwidth resources from exceeding the total bandwidth of the aggregation network
and improves service quality for users.
Background
With the growing popularity of IPTV applications, multicast services are more widely deployed than ever.
When multicast services are deployed on a Layer 2 network, a number of problems may arise:
• If users join a large number of multicast groups, sparsely distributed multicast groups will increase
performance pressure on network devices.
• If network bandwidth is insufficient, the demand for bandwidth resources will exceed the total
bandwidth of the network, overloading aggregation layer devices and degrading user experience.
• If multicast packets are used to attack a network, network devices become busy processing attack
packets and cannot respond to normal network requests.
• If static multicast group management policies are used, user requests for access to a variety of different
multicast services cannot be met. Service providers expect more refined channel management. For
example, they expect to limit the number and bandwidth of multicast groups in channels.
On the network shown in Figure 1, Layer 2 multicast CAC can be deployed on the UPE and NPEs to address
the problems described above. Layer 2 multicast CAC controls multicast services on the aggregation network
based on different criteria, including the multicast group quantity and bandwidth limit for a channel or sub-
interface. Layer 2 multicast CAC enables service providers to refine content offerings and develop flexible
subscriber-specific policies. This prevents the demand for bandwidth resources from exceeding the total
bandwidth of the aggregation network and ensures service quality for users.
Related Concepts
The following concepts are involved in multicast CAC.
• Call Admission Control (CAC): provides a series of rules for controlling multicast entry learning,
including the multicast group quantity and bandwidth limits for each multicast group, as well as for
each channel. Layer 2 multicast CAC is used to perform CAC operations for multicast services on Layer 2
networks.
• Channel: consists of a series of multicast groups, each of which can have its own bandwidth attribute.
For example, a TV channel consists of two groups, TV-1 and TV-5, with bandwidths of 4 Mbit/s and
18 Mbit/s, respectively.
Implementation
Layer 2 multicast CAC constrains the generation of multicast forwarding entries. When a preset threshold is
reached, no more forwarding entries can be generated. This ensures that devices have adequate processing
capabilities and controls link bandwidth.
Layer 2 multicast CAC can restrict the following items:
A Layer 2 entry is generated only if the number or bandwidth of member multicast groups is below the upper threshold.
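As an illustration of how such thresholds might interact, the following sketch checks both the group-quantity and bandwidth limits of the example channel above (TV-1 at 4 Mbit/s, TV-5 at 18 Mbit/s). The channel structure, the `admit` helper, and the 20 Mbit/s channel bandwidth cap are assumptions made for this example only.

```python
# Hypothetical channel definition, reusing the example from the text:
CHANNEL = {
    "groups": {"TV-1": 4, "TV-5": 18},  # per-group bandwidth, Mbit/s
    "max_groups": 2,                    # group-quantity limit for the channel
    "max_bandwidth": 20,                # bandwidth limit for the channel, Mbit/s
}

def admit(active: dict[str, int], group: str) -> bool:
    """Admit a join only if both the group-count and bandwidth limits hold."""
    if group in active:
        return True                             # entry already exists
    bw = CHANNEL["groups"].get(group)
    if bw is None:
        return False                            # group is not part of this channel
    if len(active) + 1 > CHANNEL["max_groups"]:
        return False                            # group-quantity limit exceeded
    if sum(active.values()) + bw > CHANNEL["max_bandwidth"]:
        return False                            # bandwidth limit exceeded
    active[group] = bw
    return True

active: dict[str, int] = {}
print(admit(active, "TV-1"))  # True  (4 Mbit/s now in use)
print(admit(active, "TV-5"))  # False (4 + 18 Mbit/s exceeds the 20 Mbit/s cap)
```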
Deployment Scenarios
Layer 2 multicast CAC applies to VPLS networks.
Benefits
The Layer 2 multicast CAC feature provides the following benefits:
• For providers:
• For users:
■ Prevents bandwidth resources required from exceeding the total bandwidth of the aggregation
network and ensures service quality for users.
Principles
Multicast services have relatively high demands for real-time transmissions. To ensure uninterrupted delivery
of multicast services, master and backup links and devices are deployed on a VPLS network with a UPE dual-
homed to SPEs. In the networking shown in Figure 1, a UPE is connected to two SPEs through a VPLS
network. The PWs between the UPE and SPEs work in master/backup mode. Multicast services are delivered
from a multicast source to users attached to the UPE.
This networking allows unicast services to be transmitted properly, but there are problems with the
transmission of multicast services. Multicast protocol and data packets are blocked on the backup PW and
this prevents the backup SPE (SPE2) from learning multicast forwarding entries. As a result, SPE2 has no
forwarding entries, and, in the event of a master/backup SPE switchover, it cannot begin forwarding
multicast data traffic immediately. The PE must first resend an IGMP Query message and users attached to
the UPE must reply with Report messages before SPE2 can learn multicast forwarding entries through the
backup PW and resume the forwarding of multicast data packets. As a result, services are interrupted on the
UPE for a long period of time, and network reliability is adversely affected.
If the master and backup PWs in this networking are hub PWs, split horizon still takes effect, meaning that protocol
and data packets are not transmitted from the master PW to the backup PW.
To address this problem, rapid multicast traffic forwarding is configured on the backup device, SPE2. SPE2
sends an IGMP Query message to the UPE along the backup PW, and receives an IGMP Report message
from the UPE to create a Layer 2 multicast forwarding table. Although the backup PW cannot be used to
forward multicast data traffic, it can be used by SPE2 to send an IGMP Query message. If there is a
switchover and the backup PW becomes the master, SPE2 has a Layer 2 multicast forwarding table ready to
use and can begin forwarding multicast data traffic immediately. This ensures uninterrupted delivery of
multicast services.
Related Concepts
The following concepts are involved in rapid multicast data forwarding on a backup device:
Other Functions
If the upstream and downstream devices (SPE and UPE) are not allowed to receive IGMP messages that
carry the same source MAC address but are sent from different interfaces, the backup device needs to be
configured to replace the source MAC addresses carried in IGMP messages.
• After rapid multicast traffic forwarding is configured, the UPE receives IGMP Query messages from both
SPE1 and SPE2. Both messages carry the same MAC address. If MAC-flapping or MAC address
authentication has been configured on the UPE, protocol packets that are received by the UPE through
different interfaces but carry the same source MAC address will be filtered out. The backup SPE can be
configured to change the source MAC addresses of packets to its MAC address before sending IGMP
Query messages along the backup PW. This allows the UPE to learn two different router ports and send
IGMP Report and Leave messages from attached users to SPE1 and SPE2.
• Similarly, if MAC-flapping or MAC address authentication has been configured on the PE, the backup
SPE needs to be configured to change the source MAC addresses of received IGMP Report or Leave
messages to its MAC address before sending them to the PE.
Deployment Scenarios
Rapid multicast data forwarding on a backup device is used on VPLS networks that have a device dual-
homed to upstream devices through PWs.
Benefits
Rapid multicast data forwarding on a backup device provides the following benefit:
• After a master/backup device switchover is performed, multicast data can be quickly forwarded on the
backup device. This ensures reliable multicast service transmission and enhances user experience.
Background
In conventional multicast on-demand mode, if users of a Layer 2 multicast device in different VLANs or VSIs
request the same multicast group's data from the same source, the connected upstream Layer 3 device
has to send a separate copy of each multicast flow of this group to each VLAN or VSI. Such implementation wastes
bandwidth resources and burdens the upstream device.
The Layer 2 multicast instance feature, which is an enhancement of multicast VLAN, resolves these issues by
allowing multicast data replication across VLANs and VSIs and supporting multicast data transmission of the
same multicast group across instances. These functions help save bandwidth resources and simplify multicast
group management. A Layer 2 network supports multiple Layer 2 multicast instances. For example, on the
network shown in Figure 1, if users in VLAN 11 and VLAN 22 request multicast data from channels in the
range of 225.0.0.1 to 225.0.0.5, Layer 2 multicast instances can be deployed on the CE. The CE then requests
only a single copy of each multicast data flow through VLAN 3 from the PE, replicates the multicast data
flow, and sends a copy to each VLAN. This implementation greatly reduces bandwidth consumption.
Layer 2 multicast instances allow devices to replicate multicast data flows across different types of instances,
such as flow replication from a VPLS to a VLAN or from a VLAN to a VPLS.
Related Concepts
• Multicast instance
An instance to which the interface connected to a multicast source belongs. A multicast instance
aggregates multicast flows.
• User instance
An instance to which the interface connected to a multicast receiver belongs. A user instance receives
multicast flows from a multicast instance.
A multicast instance can be associated with multiple user instances.
• Multicast channel
A multicast channel consists of one or more multicast groups. To facilitate service management,
multicast content providers generally operate different types of channels in different Layer 2 multicast
instances. Therefore, multicast channels need to be configured for Layer 2 multicast instances.
Implementation
After receiving a multicast data packet from an upstream device, a Layer 2 device searches for a matching
entry in the multicast forwarding table based on the multicast instance ID and the destination address
(multicast group address) contained in the packet. If a matching forwarding entry exists, the Layer 2 device
obtains the downstream interfaces and the VLAN IDs or VSI names, replicates the multicast data packet on
each downstream interface, and sends a copy of the packet to all involved user instances. If no matching
forwarding entry exists, the Layer 2 device broadcasts the multicast data packet in the local multicast VLAN or VSI.
This implementation is similar to the multicast VLAN implementation.
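The lookup described above can be sketched as follows; the forwarding-table layout, instance names, and the `forward` helper are hypothetical and serve only to show the (instance, group) key and the per-downstream replication.

```python
# Hypothetical forwarding table keyed by (multicast instance, group address),
# each entry listing (downstream interface, user instance) pairs.
FORWARDING = {
    ("vlan3", "225.0.0.1"): [("if1", "vlan11"), ("if2", "vlan22")],
}

def forward(instance: str, group: str, packet: bytes):
    """Replicate a packet to each downstream interface/user instance, or
    broadcast it in the local multicast VLAN/VSI if no entry matches."""
    entry = FORWARDING.get((instance, group))
    if entry is None:
        return [("broadcast", instance)]      # flood in the local VLAN or VSI
    return list(entry)                        # one copy per downstream interface

print(forward("vlan3", "225.0.0.1", b"..."))  # [('if1', 'vlan11'), ('if2', 'vlan22')]
print(forward("vlan3", "225.0.0.9", b"..."))  # [('broadcast', 'vlan3')]
```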
Usage Scenario
Layer 2 multicast instances apply to VLAN and VPLS networks.
Benefits
Layer 2 multicast instances bring the following benefits:
• Isolated unicast and multicast domains to prevent user traffic from affecting each other
Definition
Multicast Listener Discovery Snooping (MLD snooping) is an IPv6 Layer 2 multicast protocol. The MLD
snooping protocol maintains information about the outbound interfaces of multicast packets by snooping
multicast protocol packets exchanged between the Layer 3 multicast device and user hosts. MLD snooping
manages and controls multicast packet forwarding at the data link layer.
Purpose
Similar to an IPv4 multicast network, multicast data on an IPv6 multicast network (especially on a LAN)
has to pass through Layer 2 switching devices. As shown in Figure 1, a Layer 2 switch is located between
multicast users and the Layer 3 multicast device, Router.
After receiving multicast packets from Router, Switch forwards the multicast packets to the multicast
receivers. The destination address of the multicast packets is a multicast group address. Switch cannot learn
multicast MAC address entries, so it broadcasts the multicast packets in the broadcast domain. All hosts in
the broadcast domain will receive the multicast packets, regardless of whether they are members of the
multicast group. This wastes network bandwidth and threatens network security.
MLD snooping solves this problem. MLD snooping is a Layer 2 multicast protocol on the IPv6 network. After
MLD snooping is configured, Switch can snoop and analyze MLD messages between multicast users and
Router. The Layer 2 multicast device sets up Layer 2 multicast forwarding entries to control forwarding of
multicast data. In this way, multicast data is not broadcast on the Layer 2 network.
Principles
MLD snooping is a basic IPv6 Layer 2 multicast function that forwards and controls multicast traffic at Layer
2. MLD snooping runs on a Layer 2 device and analyzes MLD messages exchanged between a Layer 3 device
and hosts to set up and maintain a Layer 2 multicast forwarding table. The Layer 2 device forwards
multicast packets based on the Layer 2 multicast forwarding table.
On an IPv6 multicast network shown in Figure 2, after receiving multicast packets from Router, Switch at the
edge of the access layer forwards the multicast packets to receiver hosts. If Switch does not run MLD
snooping, it broadcasts multicast packets at Layer 2. After MLD snooping is configured, Switch forwards
multicast packets only to specified hosts.
With MLD snooping configured, Switch listens on MLD messages exchanged between Router and hosts. It
analyzes packet information (such as packet type, group address, and receiving interface) to set up and
maintain a Layer 2 multicast forwarding table, and forwards multicast packets based on the Layer 2
multicast forwarding table.
Figure 2 Multicast packet transmission before and after MLD snooping is configured on a Layer 2 device
Concepts
As shown in Figure 3, Router connects to the multicast source. MLD snooping is configured on SwitchA and
SwitchB. HostA, HostB, and HostC are receiver hosts.
Figure 3 shows MLD snooping ports. The following table describes these ports.
The router port and member port are outbound interfaces in Layer 2 multicast forwarding entries. A router
port functions as an upstream interface, while a member port functions as a downstream interface. Port
information learned through protocol packets is saved as dynamic entries, and port information manually
configured is saved as static entries.
Besides the outbound interfaces, each entry includes multicast group addresses and VLAN IDs.
• Multicast group addresses can be multicast IP addresses or multicast MAC addresses mapped from
multicast IP addresses. In MAC address-based forwarding mode, multicast data may be forwarded to
hosts that do not require the data because multiple IP addresses are mapped to the same MAC address.
The IP address-based forwarding mode can prevent this problem.
• The VLAN ID specifies a Layer 2 broadcast domain. After multicast VLAN is configured, the inbound
VLAN ID is the multicast VLAN ID, and the outbound VLAN ID is a user VLAN ID. If multicast VLAN is
not configured, both the inbound and outbound VLAN IDs are the ID of the VLAN to which a host
belongs.
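The many-to-one mapping behind the MAC-based forwarding problem follows RFC 2464: the Ethernet multicast MAC address is the 33-33 prefix plus the low-order 32 bits of the IPv6 group address. A short sketch (the helper name is hypothetical):

```python
import ipaddress

def ipv6_group_to_mac(group: str) -> str:
    """Map an IPv6 multicast address to its Ethernet MAC address: the 33-33
    prefix plus the low-order 32 bits of the group address (RFC 2464)."""
    addr = ipaddress.IPv6Address(group).packed
    return "33-33-" + "-".join(f"{b:02x}" for b in addr[-4:])

# Different groups can share one MAC address, which is why MAC-based
# forwarding may deliver data to hosts that did not request it:
print(ipv6_group_to_mac("ff1e::1:1"))  # 33-33-00-01-00-01
print(ipv6_group_to_mac("ff2e::1:1"))  # 33-33-00-01-00-01 (same MAC)
```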
Implementation
After MLD snooping is configured, the Layer 2 multicast device processes the received MLD protocol packets
in different ways and sets up Layer 2 multicast forwarding entries.
Message: MLD General Query
The MLD querier periodically sends General Query messages to all hosts and routers (FF02::1) on the local network segment to check which multicast groups have members on the network segment.
Processing on the Layer 2 device: The device forwards MLD General Query messages to all ports excluding the receiving port, and processes the receiving port as follows:
• If the port is included in the router port list, the device resets the aging timer of the router port.
• If the port is not in the router port list, the device adds it to the list and starts the aging timer.
Message: MLD Done (leave of multicast members)
Leaving a group involves two phases: members send MLD Done messages to notify the MLD querier that they have left a multicast group; upon receiving an MLD Done message, the MLD querier obtains the multicast group address and sends a Multicast-Address-Specific Query/Multicast-Address-and-Source-Specific Query message to the multicast group.
Processing on the Layer 2 device: The device determines whether the multicast group matches a forwarding entry and whether the receiving port is in the outbound interface list.
• If no forwarding entry matches the multicast group, or the outbound interface list of the matching entry does not contain the receiving port, the device drops the MLD Done message.
• If the multicast group matches a forwarding entry and the receiving port is in the outbound interface list, the device forwards the MLD Done message to all router ports in the VLAN.
Message: Multicast-Address-Specific Query/Multicast-Address-and-Source-Specific Query
Processing on the Layer 2 device: The device forwards the message to the ports connected to members of the specific groups.
Upon receiving an IPv6 PIM Hello message, a Layer 2 device forwards the message to all ports excluding the
port that receives the Hello message. The Layer 2 device processes the receiving port as follows:
• If the port is included in the router port list, the device resets the aging timer of the router port.
• If the port is not in the router port list, the device adds it to the list and starts the aging timer.
When the Layer 2 device receives an IPv6 PIM Hello message, it sets the aging time of the router port to the Holdtime
value in the Hello message.
If a static router port is configured, the Layer 2 device forwards received MLD Report and Done messages to
the static router port. If a static member port is configured for a multicast group, the Layer 2 device adds the
port to the outbound interface list for the multicast group.
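The router-port handling above can be sketched as follows. The port names, table layout, and 180-second default are hypothetical; as the note states, a real device takes the aging time from the Holdtime carried in the Hello message, which the sketch models with the `holdtime` parameter. Static router ports are kept in a separate set and never age out.

```python
import time

HOLDTIME_DEFAULT = 180  # hypothetical default aging time, in seconds

router_ports: dict[str, float] = {}        # dynamic port -> expiry timestamp
static_router_ports: set[str] = {"p-static"}  # configured ports, no aging

def on_hello(port: str, holdtime: int = HOLDTIME_DEFAULT):
    """Add the receiving port to the router port list, or refresh its aging
    timer, using the Holdtime carried in the IPv6 PIM Hello message."""
    router_ports[port] = time.monotonic() + holdtime

def active_router_ports() -> set[str]:
    """Dynamic ports that have not aged out, plus the static ports."""
    now = time.monotonic()
    return {p for p, exp in router_ports.items() if exp > now} | static_router_ports

on_hello("p1", holdtime=105)
print(sorted(active_router_ports()))  # ['p-static', 'p1']
```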
After a Layer 2 multicast forwarding table is set up, the Layer 2 device searches the multicast forwarding
table for outbound interfaces of multicast data packets according to the VLAN IDs and destination addresses
(IPv6 group addresses) of the packets. If outbound interfaces are found for a packet, the Layer 2 device
forwards the packet to all the member ports of the multicast group. If no outbound interface is found, the
Layer 2 device drops the packet or broadcasts the packet in the VLAN.
After MLD snooping proxy is deployed on the Layer 2 device, the Layer 3 device considers that it interacts
with only one user. The Layer 2 device interacts with the upstream device and downstream hosts. The MLD
snooping proxy function conserves bandwidth by reducing MLD message exchanges. In addition, MLD
snooping proxy functions as a querier to process protocol messages received from downstream hosts and
maintain group memberships. This reduces the load of the upstream Layer 3 device.
Implementation
A device that runs MLD snooping proxy sets up and maintains a Layer 2 multicast forwarding table and
sends multicast data to hosts based on the multicast forwarding table. Table 3 describes how the MLD
snooping proxy device processes MLD messages.
• MLD General Query message: The Layer 2 device forwards the message to all ports excluding the receiving port. The device also generates an MLD Report message based on the group memberships and sends the MLD Report message to all router ports.
• Multicast-Address-Specific Query/Multicast-Address-and-Source-Specific Query message: If the group specified in the message has member ports in the multicast forwarding table, the Layer 2 device responds by sending an MLD Report message to all router ports.
Service Overview
IPTV services are video services provided for users through an IP network. IPTV services pose high
requirements for bandwidth, real-time transmission, and reliability on IP MANs. Multiple users can receive
the same IPTV service data simultaneously.
Given the characteristics of IPTV, multicast technologies can be used to bear IPTV services. Compared with
traditional unicast, multicast ensures that network bandwidth demands do not increase with the number of
users and reduces the workload of video servers and the bearer network. If service providers want to deploy
IPTV services in a rapid and economical way, E2E multicast push is recommended.
Network Description
Currently, the IP MAN consists of a metro backbone network and broadband access network. IPTV service
traffic is pushed to user terminals through the metro backbone network and broadband access network in
sequence. Figure 1 shows an E2E IPTV service push model. The metro backbone network is mainly composed
of network layer (Layer 3) devices. PIM such as PIM-SM is used on each device on the metro backbone to
connect to the multicast source and IGMP is used on the devices directly connected to the broadband access
network to forward multicast packets to user terminals. The broadband access network is mainly composed
of data link layer (Layer 2) devices. Layer 2 multicast techniques such as IGMP proxy or IGMP snooping can
be used on Layer 2 devices to forward multicast packets to terminal users.
The following section describes Layer 2 multicast features used on the broadband access network.
Feature Deployment
The broadband access network is constructed using Layer 2 devices. Layer 2 devices exchange or forward
data frames by MAC address and have weak IP packet parsing and routing capabilities. As a result, Layer 2
devices do not support Layer 3 multicast protocols. Previously, Layer 2 devices broadcast IPTV multicast
traffic to all interfaces, which could easily result in broadcast storms.
To solve the problem of multicast packet flooding, commonly used Layer 2 multicast forwarding techniques,
such as IGMP snooping, IGMP proxy, and multicast VLAN, can be used.
• Deploy IGMP snooping on all Layer 2 devices, so that they listen to IGMP messages exchanged between
Layer 3 devices and user terminals and maintain multicast group memberships, implementing on-
demand multicast traffic forwarding.
• Deploy IGMP snooping proxy on CEs close to user terminals, so that the CEs listen to, filter, and forward
IGMP messages. This reduces the number of multicast protocol packets directly exchanged between CEs
and upstream devices, and reduces packet processing pressure on upstream devices.
• Deploy multicast VLAN on CEs close to user terminals to reduce the network bandwidth required for
transmissions between CEs and multicast sources.
• VSI or VLAN-based Layer 2 multicast instance (a multicast VLAN enhancement) can be deployed on CEs
close to user terminals to reduce the network bandwidth required for transmissions between CEs and
multicast sources.
• If the number of user terminals attached to a CE exceeds the number of IPTV channels, static multicast
groups can be configured on the CE to increase the channel change speed and improve the QoS for
IPTV services.
• If user hosts support IGMPv1 and IGMPv2 only, SSM mapping can be deployed on the CE connected to
these user terminals so the user hosts can access SSM services.
• Rapid multicast traffic forwarding can be deployed on a backup PE to improve the reliability of links
between the PE and CE.
• If a Layer 2 device uses no Layer 2 multicast forwarding technology, the device forwards multicast
packets to all IPTV users. Even if the bandwidth of the interface connecting the Layer 2 device to users
is 10 Mbit/s, broadcasting multicast packets for five IPTV channels leads to network congestion.
• After Layer 2 multicast forwarding technologies are used on the Layer 2 device, the Layer 2 device sends
multicast packets only to users that require the multicast packets. If each interface of the Layer 2 device
is connected to at least one IPTV user terminal, multicast packets (2 Mbit/s traffic) for at most one BTV
channel are forwarded to corresponding interfaces. This ensures the availability of adequate network
bandwidth and the quality of user experience.
Networking Description
As shown in Figure 1, a multicast source exists on an IPv6 PIM network and provides multicast video services
for users on the LAN. Some users such as HostA and HostC on the LAN want to receive video data in
multicast mode. To prevent multicast data from being broadcast on the LAN, configure MLD snooping on
Layer 2 multicast devices to accurately forward multicast data on the Layer 2 network, which prevents
bandwidth waste and network information leakage.
Deployed Features
You can deploy the following features to accurately forward multicast data on the network shown in Figure
1:
• IPv6 PIM and MLD on the Layer 3 multicast device Router to route multicast data to user segments.
• MLD snooping on the Layer 2 device Switch so that Switch can set up and maintain a Layer 2 multicast
forwarding table to forward multicast data to specified users.
• MLD snooping proxy after configuring MLD snooping on Switch to release Router from processing a
large number of MLD messages.
Terms
Term Definition
(*, G) A multicast routing entry used in the ASM model. * indicates any source,
and G indicates a multicast group.
(*, G) applies to all multicast messages with the multicast group address
as G. That is, all the multicast messages sent to G are forwarded through
the downstream interface of the (*, G) entry, regardless of which multicast
sources send the multicast messages.
(S, G) A multicast routing entry used in the SSM model. S indicates a multicast
source, and G indicates a multicast group.
After a multicast packet with S as the source address and G as the group
address reaches a router, it is forwarded through the downstream
interfaces of the (S, G) entry.
A multicast packet that contains a specified source address is expressed as
an (S, G) packet.
PW pseudo wire
12 MPLS
Purpose
This document describes the MPLS feature in terms of its overview, principles, and applications.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Commissioning engineers
Security Declaration
• Notice on Limited Command Permission
This document describes the commands used for network deployment and maintenance, but does not
cover confidential commands such as those used for production, assembly, and return-to-factory
inspection. For details about confidential commands, please submit an application.
■ When the password encryption mode is cipher, avoid setting both the start and end characters of a
password to "%^%#". Otherwise, the password is displayed directly in the configuration file.
■ Your purchased products, services, or features may use some of users' personal data during service
operation or fault locating. You must define user privacy policies in compliance with local laws and
take proper measures to fully protect personal data.
■ When discarding, recycling, or reusing a device, back up or delete data on the device as required to
prevent data leakage. If you need support, contact after-sales technical support personnel.
• Feature declaration
■ The NetStream feature may be used to analyze the communication information of terminal
customers for network traffic statistics and management purposes. Before enabling the NetStream
feature, ensure that it is performed within the boundaries permitted by applicable laws and
regulations. Effective measures must be taken to ensure that information is securely protected.
■ The mirroring feature may be used to analyze the communication information of terminal
customers for a maintenance purpose. Before enabling the mirroring function, ensure that it is
performed within the boundaries permitted by applicable laws and regulations. Effective measures
must be taken to ensure that information is securely protected.
■ The packet header obtaining feature may be used to collect or store some communication
information about specific customers for transmission fault and error detection purposes. Huawei
cannot offer services to collect or store this information unilaterally. Before enabling the function,
ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Special Declaration
• This document serves only as a guide. The content is written based on device information gathered
under lab conditions. The content provided by this document is intended to be taken as general
guidance, and does not cover all scenarios. The content provided by this document may be different
from the information on user device interfaces due to factors such as version upgrades and differences
in device models, board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are beyond the
scope of this document.
• The maximum values provided in this document are obtained in specific lab environments (for example,
only a certain type of board or protocol is configured on a tested device). The actually obtained
maximum values may be different from the maximum values provided in this document due to factors
such as differences in hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface numbers on devices
for configuration.
• The supported boards are described in the document. Whether a customization requirement can be met
is subject to the information provided at the pre-sales interface.
• In this document, public IP addresses may be used in feature introduction and configuration examples
and are for reference only unless otherwise specified.
• The configuration precautions described in this document may not accurately reflect all scenarios.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Indicates a hazard with a high level of risk which, if not avoided, will
result in death or serious injury.
Indicates a hazard with a low level of risk which, if not avoided, could
result in minor or moderate injury.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made
in earlier issues.
Background
IP-based Internet prevailed in the mid-1990s. The technology is simple and costs little to deploy. However,
IP technology, which relies on the longest-match algorithm, is not the most efficient choice for
forwarding packets.
In comparison, Asynchronous Transfer Mode (ATM) is much more efficient at forwarding packets. However,
ATM technology is a complex protocol with a high deployment cost, which has hindered its widespread
popularity and growth.
Users wanted a technology that combines the best of both IP and ATM. MPLS technology emerged to meet
this need.
Multiprotocol Label Switching (MPLS) is designed to increase forwarding rates. Unlike IP technology, MPLS
analyzes packet headers on the edge of a network, not at each hop. Therefore, packet processing time is
shortened.
MPLS supports multi-layer labels, and its forwarding plane is connection-oriented. MPLS is widely used in
virtual private network (VPN), traffic engineering (TE), and quality of service (QoS) scenarios.
Overview
MPLS operates between the data link layer and the network layer in the TCP/IP protocol stack. As implied by
its name, MPLS supports label switching for multiple network protocols. MPLS can use any Layer 2
medium to transfer packets and is not tied to any specific data link layer protocol.
MPLS is derived from the Internet Protocol version 4 (IPv4). The core MPLS technology can be extended to
multiple network protocols, such as the Internet Protocol version 6 (IPv6), Internet Packet Exchange (IPX),
Appletalk, DECnet, and Connectionless Network Protocol (CLNP). Multiprotocol in MPLS means that the
protocol supports multiple network protocols.
The MPLS technology supports multiple protocols and services and improves data transmission security.
• Label edge routers (LERs): reside on the edge of an MPLS domain and connect to one or more MPLS-
incapable nodes.
• Core LSRs: reside inside an MPLS domain and connect only to LSRs inside the domain.
All LSRs on the MPLS network forward data based on labels. When IP packets enter an MPLS network, the
ingress LER analyzes the packets and then adds appropriate labels to them. When the IP packets leave the
MPLS network, the egress LER removes the labels.
The path through which IP packets are transmitted on an MPLS network is called a label switched path
(LSP). The LSP is a unidirectional path, consistent with the direction of data flow.
The start node of an LSP is called the ingress, an intermediate node of the LSP is called the transit node, and
the end node of the LSP is called the egress. An LSP has one ingress, one egress, and zero, one, or multiple
transit nodes.
Label
A label is a short and fixed-length identifier that has only local significance. It is used to uniquely identify the
FEC to which a packet belongs. In some cases, a FEC can be mapped to multiple incoming labels to balance
loads, but a label only represents a single FEC on a Router.
Figure 3 illustrates the structure of an MPLS header.
• Label: a 20-bit field that carries the label value.
• Exp: a 3-bit field used for extension. This field is used to implement the class of service (CoS) function,
which is similar to Ethernet 802.1p.
• S: a 1-bit field that identifies the bottom of a label stack. MPLS supports multiple labels that may be
stacked. If the S field value is set to 1, the label is at the bottom of the label stack.
• TTL: an 8-bit field indicating a time to live (TTL) value. This field is the same as the TTL field in IP
packets.
Labels are encapsulated between the data link layer and network layer, and are supported by all data link
layer protocols. Figure 4 illustrates the position of the label in a packet.
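The header layout described above can be illustrated with a short Python sketch that packs and unpacks the 32-bit MPLS header (20-bit label, 3-bit Exp, 1-bit S, 8-bit TTL). This is an illustrative encoding exercise, not device code:

```python
import struct

# Bit layout of the 32-bit MPLS header described above:
# | label (20 bits) | Exp (3 bits) | S (1 bit) | TTL (8 bits) |

def pack_mpls(label, exp, s, ttl):
    assert 0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)  # network byte order

def unpack_mpls(data):
    (word,) = struct.unpack("!I", data)
    return {"label": word >> 12, "exp": (word >> 9) & 0x7,
            "s": (word >> 8) & 0x1, "ttl": word & 0xFF}

hdr = pack_mpls(label=1024, exp=0, s=1, ttl=64)
print(unpack_mpls(hdr))  # {'label': 1024, 'exp': 0, 's': 1, 'ttl': 64}
```

Setting s=1 here marks this label as the bottom of the label stack, as described above.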
Label Space
Label space is the label value range. The device supports the following label ranges:
• 16–1023: label space shared by static LSPs and static CR-LSPs. The value can be greater than 1023, but
a value ranging from 16 to 1023 is recommended.
• 1024 and higher: label space shared by dynamic signaling protocols, such as LDP, RSVP-TE, and MP-
BGP.
Each dynamic signaling protocol uses an independent and contiguous label space, which is not shared
with other dynamic signaling protocols.
0 IPv4 Explicit NULL Label If the egress receives a packet carrying a label with this
value, the egress must remove the label from the packet.
The egress then forwards the packet using IPv4. If the
egress allocates a label with the value of 0 to the
penultimate hop LSR, the penultimate hop LSR pushes
label 0 to the top of the label stack and forwards the
packet to the last hop. When the last hop finds that the
packet carries a label value of 0, it pops the label.
1 Router Alert Label This label takes effect only when it is not at the bottom
of a label stack. If a node receives a packet carrying a
label with this value (which is similar to the Router Alert
Option field in an IP packet), the node sends the packet
to a software module for further processing. The node
forwards the packet based on the next layer label. If the
packet needs to be forwarded using hardware, the node
pushes the Router Alert Label back onto the top of the
label stack before forwarding the packet.
2 IPv6 Explicit NULL Label The label must be popped out, and the packets must be
forwarded based on IPv6. If the egress allocates a label
with the value of 2 to the LSR at the penultimate hop,
the LSR pushes label 2 to the top of the label stack and
forwards the packet to the last hop. When the last hop
finds that the packet carries a label value of 2, it pops the label.
3 Implicit NULL Label If the penultimate LSR receives a packet carrying a label
with this value, it pops the label and forwards the
packet to the last hop. The last hop then forwards the
packet over an IP route or based on a next label.
4 to 13 Reserved -
14 OAM Router Alert Label MPLS operation, administration and maintenance (OAM)
sends OAM packets to detect and notify LSP faults. OAM
packets are carried over MPLS. The OAM packets are
transparent to the transit LSR and the penultimate LSR.
15 Reserved -
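The reserved-label table and the label-space ranges above can be summarized in a small, illustrative classifier; the returned strings are informal descriptions, not product terminology:

```python
# Classify a label value using the reserved-label table (0-15) and the
# label-space ranges (16-1023 static, 1024 and higher dynamic) from
# this document.

def classify_label(value):
    special = {0: "IPv4 Explicit NULL", 1: "Router Alert",
               2: "IPv6 Explicit NULL", 3: "Implicit NULL",
               14: "OAM Router Alert"}
    if value in special:
        return special[value]
    if 4 <= value <= 15:
        return "Reserved"
    if 16 <= value <= 1023:
        return "Static LSP/CR-LSP label space"
    return "Dynamic signaling label space (LDP, RSVP-TE, MP-BGP)"

print(classify_label(3))     # Implicit NULL
print(classify_label(500))   # Static LSP/CR-LSP label space
```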
Label Stack
A label stack is a set of sorted labels. MPLS allows a packet to carry multiple labels. The label next to the
Layer 2 header is called the top label or outer label, and the label next to the IP header is called the bottom
label or inner label. Theoretically, the number of MPLS labels that can be stacked is unlimited.
The labels are processed from the top of the stack based on the last in, first out principle.
Label Operations
The operations on MPLS labels include label push, label swap, and label pop. They are basic actions of label
forwarding and a part of the label forwarding information base (LFIB).
• Push: When an IP packet enters an MPLS domain, the ingress adds a label between the Layer 2 header
and the IP header of the packet. When the packet reaches a transit node, the transit node can also add
a label to the top of the label stack (label nesting) as needed.
• Swap: When the packet is forwarded inside the MPLS domain, a transit node searches the LFIB and
replaces the label on top of the stack in the MPLS packet with the label that is assigned by the next
hop.
• Pop: When the packet leaves the MPLS domain, the egress removes the MPLS label; or the MPLS node
at the penultimate hop removes the label on top of the stack to reduce the number of labels in the
label stack.
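The three operations can be sketched as simple list manipulations, treating the end of a Python list as the top of the label stack (an illustration only; the label values are arbitrary):

```python
# Push, swap, and pop on a label stack modeled as a Python list,
# where the end of the list is the stack top (last in, first out).

def push(stack, label):
    stack.append(label)       # ingress (or a nesting transit node) adds a label

def swap(stack, new_label):
    stack[-1] = new_label     # transit node replaces the top label

def pop(stack):
    return stack.pop()        # egress or penultimate hop removes the top label

stack = []
push(stack, 100)   # ingress pushes the inner label
push(stack, 200)   # nesting, e.g. an outer tunnel label
swap(stack, 201)   # a transit node swaps the outer label
print(stack)       # [100, 201]
pop(stack)
print(stack)       # [100]
```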
LER
An LSR that resides on the edge of an MPLS domain is called a label edge router (LER). When an LSR
connects to a node that does not run MPLS, the LSR acts as an LER.
An ingress LER classifies the packets that enter an MPLS domain into forwarding equivalence classes (FECs),
pushes labels into them, and then forwards them based on labels. An egress LER pops out the labels from
the packets that leave an MPLS domain, and then forwards them based on the original packet type (that is,
the type before labels were encapsulated).
• Ingress LSR: the start node on an LSP. An LSP can have only one ingress.
The ingress creates an MPLS header field into which it pushes a label. This essentially turns the IP
packet into an MPLS packet.
• Transit LSR: an optional intermediate node on an LSP. An LSP can have multiple transit LSRs.
A transit node searches the LFIB and forwards MPLS packets through label swapping.
• Egress LSR: the end node on an LSP. An LSP can have only one egress.
The egress pops the label out of an MPLS packet and restores the original packet before forwarding it.
The ingress and egress function as both LSRs and LERs. The transit node functions only as an LSR.
• Upstream LSRs: All LSRs that send MPLS packets to the local LSR are upstream LSRs.
• Downstream LSRs: All LSRs that receive MPLS packets from the local LSR are downstream LSRs.
In Figure 6, for data flows destined for 192.168.1.0/24, LSRA is the upstream LSR of LSRB, and LSRB is the
downstream LSR of LSRA. Similarly, LSRB is the upstream LSR of LSRC, and LSRC is the downstream LSR of LSRB.
Label Distribution
The packets with the same destination address are assigned to an FEC and a label is extracted from the label
resource pool and is allocated to this FEC. An LSR records a mapping between a label and FEC and notifies
upstream LSRs of the mapping. This process is called label distribution.
In Figure 7, packets with the destination address 192.168.1.0/24 are assigned to a specific FEC. LSRB and
LSRC allocate labels that represent the FEC and advertise the mapping between labels and the FEC to
upstream LSRs. Therefore, labels are allocated by downstream LSRs.
MPLS Architecture
As shown in Figure 8, the MPLS architecture consists of a control plane and a forwarding plane.
• The control plane depends on IP routes, and control protocol packets are transmitted over IP routes. It is
used to distribute labels, create a label forwarding table, and establish or tear down LSPs.
• The forwarding plane, also known as the data plane, does not depend on IP routes. It can carry services
and protocols supported by ATM and Ethernet. The forwarding plane adds labels to IP packets and
removes labels from MPLS packets. It forwards packets based on the label forwarding table.
Procedure
MPLS assigns packets to a FEC, distributes labels that identify the FEC, and establishes an LSP. Packets travel
along the LSP.
On the network shown in Figure 1, packets destined for 3.3.3.3 are assigned to a FEC. Each downstream LSR
assigns a label for the FEC and uses a label advertisement protocol to inform its upstream LSR of the
mapping between the label and the FEC. Each upstream LSR adds the mapping to a label
forwarding table. An LSP is established using the label mapping information.
LSPs can be either static or dynamic. Static LSPs are established manually. Dynamic LSPs are established
using signaling protocols and can be subject to constraints such as the following:
■ Bandwidth constraint
■ Link colors
■ Explicit paths
Field Description
• Token: an index used to search an MPLS forwarding table for a specific entry.
• Mixed: label spaces are created using both the global and per-slot methods but take effect based on the interface type:
■ Interfaces of a backbone network or VLANIF interfaces use the label space created using the global method.
■ Other interfaces use the label space created using the per-slot method.
• Mixed with 2 global space: label spaces are created using the global, global-with-reserved-tokens, and per-slot methods. Only one label space takes effect.
• 2 global space: label spaces are created using the global and global-with-reserved-tokens methods.
■ Tunnel ID
■ Label operation
■ Tunnel ID
■ Label operation
A transit node creates ILM entries containing the mapping between labels and NHLFEs. The node
searches the ILM table for an entry that matches the incoming label of a packet before forwarding the
packet.
An LSP for a FEC with the destination address 3.3.3.3/32 is established on the MPLS network shown in Figure 2.
The process of forwarding MPLS packets is as follows:
1. The ingress receives an IP packet destined for 3.3.3.3/32. The ingress adds Label Z to the packet and
forwards the packet to the adjacent transit node.
2. The transit node receives the labeled packet and swaps Label Z for Label Y in the packet. It then
forwards the packet to the penultimate transit node.
3. The penultimate transit node receives the packet with Label Y. Because the egress has assigned Label 3
(the implicit null label) to the penultimate node, the node removes Label Y and forwards the IP packet to the egress.
Nodes along an LSP search the following tables for entries used to forward MPLS packets:
1. The ingress searches the FIB and NHLFE tables to forward MPLS packets.
2. The transit node searches the ILM and NHLFE tables to forward MPLS packets.
FIB entries, ILM entries, and NHLFEs of the same tunnel have the same token values.
The ingress performs the following operations:
1. Searches the FIB table and finds a tunnel ID mapped to a specific destination IP address.
2. Finds an NHLFE mapped to the tunnel ID in the FIB table and associates the FIB entry with the
NHLFE.
3. Checks the NHLFE for the outbound interface name, next hop address, outgoing label value, and
label operation. The label operation type is Push.
4. Pushes a label into an IP packet, processes the EXP field based on a specific QoS policy and TTL
field, and sends the encapsulated MPLS packet to the transit node.
The transit node performs the following operations:
1. Searches the ILM table for a token that matches the MPLS label.
2. Finds the NHLFE mapped to the token.
3. Checks the NHLFE for the outbound interface name, next hop address, outgoing label value, and
label operation.
• If the label value is greater than or equal to 16, the label operation is Swap, and the transit node
performs the following operations:
■ Replaces the existing label with a new label in the MPLS packet.
■ Forwards the MPLS packet with the new label to the egress.
• If the label value is 3, the label operation is Pop. The transit node performs the following operations:
■ Pops the label.
■ Forwards the IP packet over an IP route or to a next hop based on another label.
• If the S field in the label is 1, the label is at the bottom of the stack and the egress forwards
the packet over an IP route.
• If the S field is 0 in the label, the label is not at the bottom of the stack. Therefore, the
egress forwards the packet based on the new topmost label.
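The table lookups described above can be illustrated with a toy model of the three tables; the tokens, label values (Z=1030, Y=1029), interface names, and next-hop addresses below are hypothetical:

```python
# Simplified sketch of MPLS forwarding lookups for the example LSP to
# 3.3.3.3/32. FIB entries, ILM entries, and NHLFEs of the same tunnel
# share a token; all values here are hypothetical.

NHLFE = {  # token -> forwarding instructions
    1: {"out_if": "GE0/1/0", "next_hop": "10.1.1.2", "out_label": 1030, "op": "push"},
    2: {"out_if": "GE0/1/1", "next_hop": "10.1.2.2", "out_label": 1029, "op": "swap"},
}
FIB = {"3.3.3.3/32": 1}        # destination -> tunnel ID (token)
ILM = {1030: 2}                # incoming label -> token

def ingress_forward(dest):
    entry = NHLFE[FIB[dest]]           # FIB lookup, then NHLFE via token
    return ("push", entry["out_label"], entry["out_if"])

def transit_forward(in_label):
    entry = NHLFE[ILM[in_label]]       # ILM lookup, then NHLFE via token
    return ("swap", entry["out_label"], entry["out_if"])

print(ingress_forward("3.3.3.3/32"))   # ('push', 1030, 'GE0/1/0')
print(transit_forward(1030))           # ('swap', 1029, 'GE0/1/1')
```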
An MPLS label contains an 8-bit TTL field. The TTL field has the same function as that in an IP packet
header. MPLS processes the TTL to prevent loops and implement traceroute.
As defined in relevant standards, MPLS processes TTLs in either uniform or pipe mode. By default, MPLS
processes TTLs in uniform mode.
• Uniform Mode
The IP TTL value decreases by one each time the packet passes through a node on the MPLS network.
When IP packets enter the MPLS network shown in Figure 4, the ingress reduces the IP TTL value by
one and copies the IP TTL value to the MPLS TTL field. Each transit node only processes the MPLS TTL.
Then the egress reduces the MPLS TTL value by one and copies the MPLS TTL value to the IP TTL field.
• Pipe Mode
The IP TTL value decreases by one only when passing through the ingress and egress.
On the network shown in Figure 5, the ingress reduces the IP TTL value in packets by one and sets the
MPLS TTL to a specific value. Transit nodes only process the MPLS TTL. When the egress receives the
packets, it removes the MPLS label carrying the MPLS TTL from each packet and reduces the IP TTL
value by one.
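The difference between the two modes can be shown with a small arithmetic sketch of the TTL accounting described above (an illustration, not device behavior):

```python
# TTL accounting along a path: ingress -> N transit nodes -> egress.

def uniform(ip_ttl, transits):
    mpls_ttl = ip_ttl - 1              # ingress: decrement, copy IP TTL to MPLS TTL
    mpls_ttl -= transits               # each transit decrements the MPLS TTL only
    return mpls_ttl - 1                # egress: decrement, copy back to IP TTL

def pipe(ip_ttl, transits, initial_mpls_ttl=255):
    ip_ttl -= 1                        # ingress decrements the IP TTL once
    _ = initial_mpls_ttl - transits    # MPLS TTL changes are never copied back
    return ip_ttl - 1                  # egress decrements the IP TTL once more

print(uniform(64, 3))  # 59: the MPLS hops are visible in the final IP TTL
print(pipe(64, 3))     # 62: only the ingress and egress decrement the IP TTL
```

In uniform mode, a traceroute therefore reveals each MPLS hop; in pipe mode, the MPLS network appears as a single hop.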
Background
As networks grow in scale and complexity, they accommodate devices of various specifications. Without packet
fragmentation enabled, an MPLS P node transparently transmits packets sent by the ingress PE to the egress
PE. If the MTU configured on the ingress PE is greater than the MRU configured on the egress PE, the egress
PE discards packets with sizes larger than the MRU.
Principles
In Figure 1, the ingress PE1 has MTU1 greater than MRU2 on the egress PE2. PE2 is enabled to discard
packets with sizes larger than MRU2. Without packet fragmentation enabled, a P node transparently
forwards a packet with a size of LENGTH (MTU1 > LENGTH > MRU2) to PE2. Since the packet length is
greater than MRU2, PE2 discards the packet. After packet fragmentation is enabled on the P node, the P
node fragments the same packet into a packet with the size of MTU2 (MTU2 < MRU2) and a packet with a
specified size (LENGTH minus MTU2). If the LENGTH-MTU2 value is greater than MTU2, the fragment is
also fragmented. After the fragments reach PE2, PE2 properly forwards them because their lengths are less
than MRU2.
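The fragmentation arithmetic described above (repeatedly splitting off MTU2-sized pieces until the remainder fits) can be sketched as:

```python
# Split a packet of size `length` into fragments no larger than mtu2,
# mirroring the rule above: if LENGTH - MTU2 is still greater than MTU2,
# the remainder is fragmented again.

def fragment(length, mtu2):
    fragments = []
    while length > mtu2:
        fragments.append(mtu2)
        length -= mtu2          # the remainder may itself need fragmenting
    fragments.append(length)
    return fragments

# Hypothetical sizes: LENGTH=3500 bytes, MTU2=1500 bytes.
print(fragment(3500, 1500))  # [1500, 1500, 500]
```

Every fragment is no larger than MTU2 (and hence smaller than MRU2), so the egress PE forwards all of them.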
value changes (such as when the local outbound interface or its configuration changes), an LSR recalculates
the MTU value and sends a Label Mapping message carrying the new MTU value to all upstream devices.
• Customer edge (CE): an edge device on a customer network. The CE can be a router, switch, or host.
• Provider (P): a backbone device on the service provider network that does not directly connect to CEs.
A P device has basic MPLS forwarding capabilities but does not maintain VPN information.
• PEs manage VPN users, establish LSPs between themselves, and advertise routes to VPN sites.
• The MPLS-based VPN supports IP address multiplexing between sites and the interconnection of VPNs.
• The traffic for original services is forwarded through the original network.
To forward part of the traffic of new services through the original network, you can configure PBR to an
LSP on DeviceA so that traffic matching the specified policy is forwarded through the original network.
You can also use PBR together with LDP fast reroute (FRR) to divert some traffic to the backup LSP, which
may otherwise remain relatively idle, thereby balancing traffic between the primary and backup LSPs.
Definition
The Label Distribution Protocol (LDP) is a Multiprotocol Label Switching (MPLS) control protocol, a signaling
protocol of a traditional network. It classifies forwarding equivalence classes (FECs), distributes labels, and
establishes and maintains label switched paths (LSPs). LDP defines messages in the label distribution process
as well as procedures for processing these messages.
Purpose
On an MPLS network, LDP distributes label mappings and establishes LSPs. LDP sends multicast Hello
messages to discover local peers and sets up local peer relationships. Alternatively, LDP sends unicast Hello
messages to discover remote peers and sets up remote peer relationships.
Two LDP peers establish a TCP connection, negotiate LDP parameters over the TCP connection, and establish
an LDP session. They exchange messages over the LDP session to set up an LSP. LDP networking is simple to
construct and configure, and LDP establishes LSPs using routing information.
• LDP LSPs guide IP data across a full-mesh MPLS network, over which a Border Gateway Protocol-free
(BGP-free) core network can be built.
• LDP works with BGP to establish end-to-end inter-autonomous system (inter-AS) or inter-carrier tunnels
to transmit Layer 3 virtual private network (L3VPN) services.
• LDP over traffic engineering (TE) combines LDP and TE advantages to establish end-to-end tunnels to
transmit virtual private network (VPN) services.
LDP Adjacency
When an LSR receives a Hello message from a peer, the LSR establishes an LDP adjacency with that peer.
An LDP adjacency maintains a peer relationship between the two LSRs. There are two types of
LDP adjacencies:
• Local adjacency: established by exchanging Link Hello messages between two LSRs.
• Remote adjacency: established by exchanging Target Hello messages between two LSRs.
LDP Peers
Two LDP peers establish LDP sessions and exchange Label Mapping messages over the session so that they
establish an LSP.
LDP Session
An LDP session between LSRs helps them exchange messages, such as Label Mapping messages and Label
Release messages. LDP sessions are classified into the following types:
• Local LDP session: created over a local adjacency. The two LSRs, one on each end of the local LDP
session, are directly connected. After a local LDP session is established, LSRs can exchange labels and
establish LDP LSPs.
• Remote LDP session: created over a remote adjacency. The two LSRs, one on each end of the remote
LDP session, can be either directly or indirectly connected. A remote LDP session can be used to transmit
protocol packets of an L2VPN. When two devices on an L2VPN are not directly connected, a remote LDP
session needs to be established. LDP labels are directly distributed between remote LDP peers through
remote LDP sessions. This mode applies to scenarios where LDP tunnels over other types of tunnels,
such as LDP over TE.
Differences and Relations Among the LDP Adjacency, Peer, and Session
Differences
Differences among the LDP adjacency, peer, and session are as follows:
• An LDP adjacency is formed after two devices exchange Hello messages with each other. A local
adjacency is based on the link between two interconnected interfaces.
• LDP peers are two devices that run LDP and exchange label messages over an established TCP
connection.
• An LDP session is the series of label message exchanges between two LDP peers.
Relations
The association between LDP adjacencies, peers, and sessions is summarized as follows: before an LDP
session can be set up, the two devices exchange Hello messages to establish an adjacency. After the
adjacency is established, the devices set up a TCP connection and exchange LDP session messages to
establish an LDP session and an LDP peer relationship. LDP peers then exchange label information.
Specifically:
• LDP maintains the existence of the peers through adjacencies. The type of peer is determined by the
type of the adjacency that maintains the peer.
• A peer can be maintained using multiple adjacencies. If a peer is maintained by both local and remote
adjacencies, the peer is a local and remote coexistent peer.
LDP Messages
Two LSRs exchange the following messages:
• Discovery message: used to notify or maintain the presence of an LSR on an MPLS network.
• Session message: used to establish, maintain, or terminate an LDP session between LDP peers.
• Advertisement message: used to create, modify, or delete a mapping between a specific FEC and label.
LDP transmits Discovery messages using the User Datagram Protocol (UDP) and transmits Session,
Advertisement, and Notification messages using the Transmission Control Protocol (TCP).
• LDP identifier
An LDP identifier identifies a label space used by a specified LSR. An LDP identifier consists of 6 bytes
including a 4-byte LSR ID and a 2-byte label space. An LDP identifier is in the format of <LSR ID>:<Label
space ID>.
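The 6-byte layout described above can be illustrated with a short sketch. The helper names are hypothetical, and packing the LSR ID as four network-order bytes followed by a 2-byte label space ID is an assumption consistent with the format stated in the text:

```python
# Illustrative encoding/decoding of a 6-byte LDP identifier:
# a 4-byte LSR ID followed by a 2-byte label space ID, rendered as
# "<LSR ID>:<Label space ID>".
import struct

def encode_ldp_id(lsr_id: str, label_space: int) -> bytes:
    """Pack a dotted-decimal LSR ID and a label space ID into 6 bytes."""
    octets = [int(o) for o in lsr_id.split(".")]
    return struct.pack("!4BH", *octets, label_space)

def decode_ldp_id(raw: bytes) -> str:
    """Render the 6-byte identifier in the <LSR ID>:<Label space ID> format."""
    a, b, c, d, space = struct.unpack("!4BH", raw)
    return f"{a}.{b}.{c}.{d}:{space}"

print(decode_ldp_id(encode_ldp_id("10.1.1.9", 0)))  # 10.1.1.9:0
```

A label space ID of 0 denotes the platform-wide label space, which is why identifiers such as 10.1.1.9:0 are commonly seen.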
• Basic discovery mechanism: used to discover directly connected LSR peers on a link.
An LSR periodically sends Link LDP Hello messages to discover LDP peers and establish local LDP
sessions with the peers.
The Link Hello messages are encapsulated in UDP packets with a specific multicast destination address
and are sent using LDP port 646. A Link Hello message carries an LDP identifier and other information,
such as the hello-hold time and transport address. If an LSR receives a Link Hello message on an
interface, the LSR has a potential LDP peer on that interface.
• Extended discovery mechanism: used to discover the LSR peers that are not directly connected to a local
LSR.
The Targeted Hello messages are encapsulated in UDP packets and carry unicast destination addresses
and are sent using LDP port 646. A Targeted Hello message carries an LDP identifier and other
information, such as the hello-hold time and transport address. If an LSR receives a Targeted Hello
message, the LSR has a potential LDP peer.
1. Two LSRs exchange Hello messages. After receiving the Hello messages carrying transport addresses,
the two LSRs use the transport addresses to establish an LDP session. The LSR with the larger
transport address serves as the active peer and initiates a TCP connection. In this example, LSRA serves
as the active peer that initiates the TCP connection, and LSRB serves as the passive peer that waits for
the TCP connection to be initiated.
2. After the TCP connection is successfully established, LSRA sends an Initialization message to negotiate
parameters used to establish an LDP session with LSRB. The main parameters include the LDP version,
label advertisement mode, Keepalive hold timer value, maximum PDU length, and label space.
3. Upon receipt of the Initialization message, LSRB replies to LSRA in either of the following situations:
• If LSRB rejects some parameters, it sends a Notification message to terminate LDP session
establishment.
• If LSRB accepts all parameters, it sends an Initialization message and a Keepalive message to
LSRA.
4. Upon receipt of the Initialization message, LSRA performs operations in either of the following
situations:
• If LSRA rejects some parameters, it sends a Notification message to terminate LDP session
establishment.
• If LSRA accepts all parameters, it sends a Keepalive message to LSRB.
After both LSRA and LSRB have accepted each other's Keepalive messages, the LDP session is successfully
established.
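Step 1's active/passive role selection can be sketched as follows. Comparing the two transport addresses as IPv4 values is an assumption consistent with the rule that the larger address initiates the TCP connection; the addresses are illustrative:

```python
# Sketch of the active/passive role selection in LDP session setup:
# the LSR with the larger transport address initiates the TCP connection.
from ipaddress import IPv4Address

def active_peer(local_addr: str, peer_addr: str) -> str:
    """Return the transport address of the LSR that initiates the TCP connection."""
    return local_addr if IPv4Address(local_addr) > IPv4Address(peer_addr) else peer_addr

# If LSRA's transport address (2.2.2.2) is larger than LSRB's (1.1.1.1),
# LSRA plays the active role.
print(active_peer("2.2.2.2", "1.1.1.1"))  # 2.2.2.2
```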
The default label advertisement and management modes are DU label advertisement+ordered label
control+liberal label retention.
• DU mode: An LSR binds a label to a specific FEC and notifies its upstream LSR of the binding, without
having to first receive a Label Request message sent by the upstream LSR.
In Figure 1, the downstream egress triggers the establishment of an LSP destined for FEC 192.168.1.1/32
in host mode by sending a Label Mapping message to the upstream transit LSR to advertise the label of
its host route 192.168.1.1/32.
Figure 1 DU mode
• DoD mode: An LSR binds a label to a specific FEC and notifies its upstream LSR of the binding only after
it receives a Label Request message from the upstream LSR.
In Figure 2, the downstream egress triggers the establishment of an LSP destined for FEC 192.168.1.1/32
in host mode. The upstream ingress sends a Label Request message to the downstream egress. The
downstream egress sends a Label Mapping message to the upstream LSR only after receiving the Label
Request message.
The label distribution control mode defines how an LSR distributes labels during the establishment of an
LSP. Two label distribution control modes are available:
• Independent mode: A local LSR binds a label to an FEC and distributes this label to an upstream LSR
without waiting for a label assigned by a downstream LSR.
■ In Figure 1, if the label distribution mode is DU and the label distribution control mode is
Independent, the transit LSR distributes labels to the upstream ingress without waiting for labels
assigned by the downstream egress.
■ In Figure 2, if the label distribution mode is DoD and the label distribution control mode is
Independent, the downstream transit LSR directly connected to the ingress LSR that sends a Label
Request message replies with labels without waiting for labels assigned by the downstream egress.
• Ordered mode: An LSR advertises the mapping between a label and an FEC to its upstream LSR only
when this LSR has received a Label Mapping message from the next hop of the FEC or the LSR is the
egress of the FEC.
■ In Figure 1, if the label distribution mode is DU and the label distribution control mode is ordered,
the transit LSR distributes a label to the upstream ingress only after receiving a Label Mapping
message from the downstream egress.
■ In Figure 2, if the label distribution mode is DoD and the label distribution control mode is ordered,
the transit LSR directly connected to the ingress LSR that sends a Label Request message
distributes a label upstream to the ingress only after receiving a Label Mapping message from the
downstream egress.
• Liberal mode: An LSR retains the label mappings received from a neighbor LSR regardless of whether
the neighbor LSR is its next hop.
• Conservative mode: An LSR retains the label mappings received from a neighbor LSR only when the
neighbor LSR is its next hop.
When the next hop of an LSR changes due to a network topology change:
• In liberal mode, the LSR can use the labels advertised by a non-next hop LSR to quickly reestablish an
LSP. This mode, however, requires more memory and label space than the conservative mode. An LSP
that has been assigned a label but fails to be established is called a liberal LSP.
• In conservative mode, the LSR retains the labels advertised by the next hop only. This mode saves
memory and label space but takes more time to reestablish an LSP. Conservative label retention mode
is usually used together with DoD on the LSRs that have limited label space.
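The two retention policies above can be summarized in a small sketch. The data structures and names are illustrative, not the device's implementation:

```python
# Liberal mode keeps every label mapping received from neighbors;
# conservative mode keeps only mappings from the current next hop.

def retained_mappings(mappings, next_hop, mode="liberal"):
    """mappings: list of (advertising_neighbor, fec, label) tuples."""
    if mode == "liberal":
        return list(mappings)  # keep all mappings, next hop or not
    # Conservative: keep only mappings advertised by the next hop.
    return [m for m in mappings if m[0] == next_hop]

mappings = [("LSRB", "192.168.1.1/32", 1024),
            ("LSRC", "192.168.1.1/32", 2048)]
print(len(retained_mappings(mappings, "LSRB", "liberal")))       # 2
print(len(retained_mappings(mappings, "LSRB", "conservative")))  # 1
```

The extra mapping kept in liberal mode is what allows a liberal LSP to be used for fast reestablishment when the next hop changes, at the cost of memory and label space.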
Background
As user networks and the scope of network services continue to expand, load-balancing techniques are used
to improve bandwidth between nodes. If tunnels are used for load balancing, a transit node (P) uses the IP
content carried in MPLS packets as a hash key. If a transit node cannot obtain the IP content from MPLS
packets, it can only use the top label in the MPLS label stack as a hash key. The top label in
the MPLS label stack cannot differentiate underlying protocols in detail. As a result, the top MPLS labels are
not distinguished when being used as hash keys, resulting in load imbalance. Per-packet load balancing can
be used to prevent load imbalance but results in packets being delivered out of sequence. This drawback
adversely affects service experience. To address the problems, the entropy label feature can be configured to
improve load balancing performance.
Implementation
An entropy label is generated on an ingress LSR, and it is only used to enhance the ability to load-balance
traffic. To help the egress distinguish the entropy label generated by the ingress from application labels, an
identifier label of 7 is added before an entropy label in the MPLS label stack.
The ingress LSR generates an entropy label and encapsulates it into the MPLS label stack. Before the ingress
LSR encapsulates packets with MPLS labels, it can easily obtain IP or Layer 2 protocol data for use as a hash
key. If the ingress LSR identifies the entropy label capability, it uses IP information carried in packets to
compute an entropy label, adds it to the MPLS label stack, and advertises it to the transit node (P). The P
uses the entropy label as a hash key to load-balance traffic and does not need to parse IP data inside MPLS
packets.
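The ingress behavior described above can be sketched as follows. The identifier (entropy label indicator) value of 7 comes from the text; the CRC-based hashing of the IP 5-tuple and the mapping into the non-reserved label range are assumptions for illustration, not the device's actual algorithm:

```python
# Illustrative sketch: an ingress derives an entropy label from the IP
# 5-tuple and pushes it, preceded by the identifier label 7, below the
# tunnel label in the MPLS label stack.
import zlib

ELI = 7  # identifier label placed before the entropy label (per the text)

def entropy_label(src, dst, proto, sport, dport):
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    # Map the hash into the non-reserved label space (16 .. 2^20 - 1).
    return 16 + zlib.crc32(key) % (2**20 - 16)

def push_entropy(stack, flow):
    """Return a new label stack (top first) with ELI + entropy label appended."""
    return stack + [ELI, entropy_label(*flow)]

stack = push_entropy([1024], ("10.0.0.1", "10.0.0.2", 6, 49152, 80))
print(stack[:2])  # [1024, 7]
```

Because the same flow always yields the same entropy label, a transit node hashing on that label keeps packets of one flow on one path, which avoids the out-of-sequence delivery of per-packet load balancing.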
The entropy label is negotiated using LDP for improved load balancing. The entropy label is pushed into
packets by the ingress and removed by the egress. Therefore, the egress needs to notify the ingress of the
entropy label capability, as follows:
• Egress: If the egress can parse an entropy label, the egress extends an LDP message by adding an
entropy label capability TLV into the message. The egress sends the message to notify upstream nodes,
including the ingress, of the local entropy label capability.
• Transit node: sends an LDP message to upstream nodes to transparently transmit the downstream
node's entropy label capability. If load balancing is enabled, the LDP messages sent by the transit node
carry the entropy label capability TLV only if all downstream nodes have the capability. If a transit node
cannot identify the entropy label capability TLV, it processes the TLV as an unknown TLV and transparently
transmits it.
• Ingress: determines whether to add an entropy label into packets to improve load balancing based on
the entropy label capability advertised by the egress.
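The transit-node rule above, which advertises the entropy label capability (ELC) upstream only if every downstream node has it, can be sketched as:

```python
# A transit node advertises the entropy label capability upstream only
# when all downstream next hops used for load balancing support it.

def advertise_elc_upstream(downstream_elc_flags):
    """downstream_elc_flags: ELC reported by each downstream next hop."""
    return all(downstream_elc_flags)

print(advertise_elc_upstream([True, True]))   # True
print(advertise_elc_upstream([True, False]))  # False: one path lacks ELC
```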
Application Scenarios
Entropy labels can be used in the following scenarios:
• On the network shown in Figure 1, entropy labels are used when load balancing is performed among
transit nodes.
• The entropy label feature applies to public network MPLS tunnels in service scenarios such as IPv4/IPv6
over MPLS, L3VPNv4/v6 over MPLS, VPLS/VPWS over MPLS, and EVPN over MPLS.
Function Restrictions
On the network shown in Figure 2, the entire tunnel has the entropy label capability only when both the
primary and backup paths of the tunnel have the entropy label capability. An LDP session is established
between each pair of directly connected devices (P1 through P4). On P1, for the tunnel to P3, the primary
LSP is P1–>P3, and the backup LSP is P1–>P2–>P4–>P3. On P2, for the tunnel to P3, the primary LSP is
P2–>P4–>P3, and the backup LSP is P2–>P1–>P3. In this example, P1 and P2 are the downstream nodes of
each other's backup path. Assume that the entropy label capability is enabled on P3 and this device sends an
LDP message carrying the entropy label capability to P1 and P4. After receiving the message, P1 checks
whether the entire LSP to P3 has the entropy label capability. Because the path P1–>P2 does not have the
entropy label capability, P1 considers that the LSP to P3 does not have the entropy label capability. As a
result, P1 does not send an LDP message carrying the entropy label capability to P2. P2 performs the same
check after receiving an LDP message carrying the entropy label capability from P4. If the path P2–>P1 does
not have the entropy label capability, P2 also considers that the LSP to P3 does not have the entropy label
capability. To prevent LDP tunnel entropy label negotiation failures, configure P1, P2, and P4 to perform
entropy label negotiation only based on the primary path.
Benefits
Entropy labels help achieve more even load balancing.
An LSR checks whether an outbound policy mapped to the labeled BGP route or non-BGP route is configured
before sending a Label Mapping message for a FEC.
• If no outbound policy is configured, the LSR sends the Label Mapping message.
• If an outbound policy is configured, the LSR checks whether the FEC in the Label Mapping message is
within the range defined in the outbound policy. If the FEC is within the FEC range, the LSR sends a
Label Mapping message for the FEC; if the FEC is not in the FEC range, the LSR does not send a Label
Mapping message.
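The outbound-policy check above amounts to a prefix-range test on the FEC. Representing the policy as a list of IP prefixes is an assumption for illustration:

```python
# Sketch of the outbound-policy decision: a Label Mapping message for a
# FEC is sent only if no policy is configured, or the FEC falls within
# the policy's prefix range.
from ipaddress import ip_network

def send_label_mapping(fec: str, outbound_policy=None) -> bool:
    if outbound_policy is None:      # no policy configured: always send
        return True
    fec_net = ip_network(fec)
    return any(fec_net.subnet_of(ip_network(p)) for p in outbound_policy)

print(send_label_mapping("192.168.1.1/32"))                      # True
print(send_label_mapping("192.168.1.1/32", ["192.168.0.0/16"]))  # True
print(send_label_mapping("10.1.1.1/32", ["192.168.0.0/16"]))     # False
```

The inbound-policy check described later follows the same logic on the receiving side, deciding whether to accept rather than send the message.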
Inbound policies do not take effect with L2VPN label mapping messages, which means that all L2VPN label
mapping messages can be received. In addition, the range of FECs to which the non-BGP routes are mapped
is configurable.
If FECs in the label mapping messages to be received by an LDP peer group or all LDP peers are in the same
range, the same inbound policy applies to the LDP peer group or all LDP peers.
An LSR checks whether an inbound policy mapped to a FEC is configured before receiving a label mapping
message for the FEC.
• If no inbound policy is configured, the LSR receives the label mapping message.
• If an inbound policy is configured, the LSR checks whether the FEC in the label mapping message is
within the range defined in the inbound policy. If the FEC is within the FEC range, the LSR receives the
label mapping message for the FEC; if the FEC is not in the FEC range, the LSR does not receive the
label mapping message.
If the FEC fails to pass an inbound policy on an LSR, the LSR receives no label mapping message for the
FEC.
• If a DU LDP session is established between an LSR and its peer, a liberal LSP is established. This liberal
LSP cannot function as a backup LSP after LDP FRR is enabled.
• If a DoD LDP session is established between an LSR and its peer, the LSR sends a Release message to
tear down label-based bindings.
1. If a label edge router (LER) on an MPLS network discovers a new direct route due to a network route
change, and the address carried in the new route does not belong to any existing forwarding
equivalence class (FEC), the LER creates a FEC for the address.
2. If the egress has available labels for distribution, it distributes a label for the FEC and proactively
sends a Label Mapping message to its upstream transit LSR. The Label Mapping message contains the
assigned label and the FEC bound to the label.
3. The transit LSR adds the mapping in the Label Mapping message to the label forwarding table and
sends a Label Mapping message with a specified FEC to its upstream LSR.
4. The ingress LSR also adds the mapping to its label forwarding table. The ingress LSR establishes an
LSP and forwards packets along the LSP.
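The four steps above amount to a hop-by-hop propagation of Label Mapping messages from the egress toward the ingress. This toy model sketches that flow for one FEC; the node names, sequential label allocation, and the implicit-null label (3) at the egress are illustrative assumptions:

```python
# Toy model of ordered DU label distribution: the egress allocates a
# label first, then each upstream LSR binds its own incoming label and
# records the downstream node's label as its outgoing label.

def build_lsp(path, fec):
    """path: LSRs ordered ingress -> egress. Returns each node's label
    table entry {fec: (in_label, out_label)}."""
    tables = {lsr: {} for lsr in path}
    next_label = 1024
    downstream_label = 3          # implicit-null at the egress (assumption)
    for lsr in reversed(path):    # egress first, then upstream LSRs
        in_label = None if lsr == path[0] else next_label
        tables[lsr][fec] = (in_label, downstream_label)
        downstream_label = in_label
        next_label += 1
    return tables

tables = build_lsp(["Ingress", "Transit", "Egress"], "192.168.1.1/32")
print(tables["Ingress"]["192.168.1.1/32"])  # (None, 1025)
```

The ingress has no incoming label (it pushes the label), while each transit node swaps its incoming label for the downstream node's label, which is exactly the mapping installed in step 3.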
A proxy egress LSP can be established on a network with MPLS-incapable routers or in the Border Gateway
Protocol (BGP) route load balancing scenario. For example, on the network shown in Figure 2, LSRA, LSRB,
and LSRC are in an MPLS domain, whereas LSRD is not. An LSP is established along the path LSRA -> LSRB ->
LSRC. LSRC functions as a proxy egress and extends the LSP to LSRD. The extended LSP is a proxy egress LSP.
Background
If a direct link for a local LDP session fails, the LDP adjacency is torn down, and the session and labels are
deleted. After the direct link recovers, the local LDP session is reestablished and distributes labels so that an
LSP can be reestablished over the session. LSP establishment takes a period of time. During this process,
traffic along the LDP LSP that is to be established is discarded.
To speed up LDP LSP convergence and minimize packet loss, the NE40E implements LDP session protection.
LDP session protection helps maintain an LDP session, eliminating the need to reestablish an LDP session or
re-distribute labels.
Principles
In Figure 1, LDP session protection is configured on the nodes at both ends of a link. The two nodes
exchange Link Hello messages to establish a local LDP session and exchange Targeted Hello messages to
establish a remote LDP session, forming a backup relationship between the remote LDP session and local
LDP session.
In Figure 1, if the direct link between LSRA and LSRB fails, the adjacency established using Link Hello
messages is torn down. Because the indirectly connected link is working properly, the remote adjacency
established using Targeted Hello messages remains. Therefore, the LDP session is maintained by the remote
adjacency, and the mapping between FECs and labels for the session also remains. After the direct link
recovers, the local LDP session can rapidly restore LSP information. There is no need to reestablish the LDP
session or re-distribute labels, which minimizes the time required for LDP session convergence.
Background
On an MPLS network with both active and standby links, if an active link fails, IGP routes re-converge, and
the IGP route of the standby link becomes reachable. An LDP LSP over the standby link is then established.
During this process, some traffic is lost. To minimize traffic loss, LDP Auto FRR is used.
On the network enabled with LDP Auto FRR, if an interface failure (detected by the interface itself or by an
associated BFD session) or a primary LSP failure (detected by an associated BFD session) occurs, LDP FRR is
notified of the failure and rapidly forwards traffic to a backup LSP, protecting traffic on the primary LSP. The
traffic switchover minimizes the traffic interruption time.
Implementation
LDP LFA FRR
LDP LFA FRR is implemented based on IGP LFA FRR's LDP Auto FRR. LDP LFA FRR uses the liberal label
retention mode to obtain a liberal label, applies for a forwarding entry associated with the label, and
forwards the forwarding entry to the forwarding plane as a backup forwarding entry to be used by the
primary LSP. If an interface detects a failure of its own, bidirectional forwarding detection (BFD) detects an
interface failure, or BFD detects a primary LSP failure, LDP LFA FRR rapidly switches traffic to a backup LSP
to protect traffic on the primary LSP.
Figure 1 Typical usage scenario for LDP Auto FRR (triangle topology)
Figure 1 shows a typical usage scenario for LDP Auto FRR. The preferred LSRA-to-LSRB route is LSRA-LSRB
and the second optimal route is LSRA-LSRC-LSRB. A primary LSP between LSRA and LSRB is established on
LSRA, and a backup LSP of LSRA-LSRC-LSRB is established to protect the primary LSP. After receiving a label
from LSRC, LSRA compares the label with the LSRA-to-LSRB route. Because the next hop of the LSRA-to-
LSRB route is not LSRC, LSRA preserves the label as a liberal label.
If the backup route corresponding to the source of the liberal label for LDP Auto FRR exists and its
destination meets the policy for LDP to create a backup LSP, LSRA can apply for a forwarding entry for the
liberal label, establish a backup LSP as the backup forwarding entry of the primary LSP, and send the entries
mapped to both the primary and backup LSPs to the forwarding plane. In this way, the primary LSP is
associated with the backup LSP.
LDP Auto FRR is triggered when the interface itself detects a fault, BFD detects an interface fault, or BFD
detects a primary LSP failure. After the FRR switchover is complete, traffic is switched to the backup LSP
based on the backup forwarding entry. The route then converges to LSRA-LSRC-LSRB, a new LSP is
established over this path (the original backup path), the original primary LSP is torn down, and traffic is
forwarded along the new LSP over the path LSRA-LSRC-LSRB.
LDP Remote LFA FRR
On large networks, especially ring networks, LDP LFA FRR may fail to calculate backup paths, failing to
meet reliability requirements. On the ring network shown in Figure 2, PE1-to-PE2 traffic is transmitted
along the shortest path PE1->PE2 based on the path cost. If the link between PE1 and PE2 fails, PE1 first
detects the fault. PE1 then forwards the traffic to P1 and expects P1 to forward the traffic to P2 and finally
to PE2. At the moment when the fault occurs, P1 does not detect the fault. After the traffic forwarded by
PE1 arrives at P1, P1 returns the traffic to PE1 based on the path cost. In this case, a routing loop occurs
between PE1 and P1. A large number of loop packets are transmitted on the link between PE1 and P1. As a
result, packets from PE1 to P1 are discarded due to congestion.
Figure 2 Typical LDP Auto FRR usage scenario – square-shaped topology (1)
To address this issue, LDP Remote LFA FRR is used. Remote LFA FRR is implemented based on IGP Remote
LFA FRR's (IS-IS Auto FRR) LDP Auto FRR. Figure 3 illustrates the typical LDP Auto FRR usage scenario. The
primary LDP LSP is established over the path PE1 -> PE2. Remote LFA FRR establishes a Remote LFA FRR LSP
over the path PE1 -> P2 -> PE2 to protect the primary LDP LSP.
1. An IGP uses the Remote LFA algorithm to calculate a Remote LFA route with the PQ node (P2) IP
address and the recursive outbound interface's next hop and then notifies the route management
module of the information. For the PQ node definition, see IS-IS Auto FRR.
2. LDP obtains the Remote LFA route from the route management module. PE1 automatically establishes
a remote LDP peer relationship with the PQ node and a remote LDP session for the relationship. PE1
then establishes an LDP LSP to the PQ node and a Remote LFA FRR LSP over the path PE1 -> P2 ->
PE2. For information about how to automatically establish a remote LDP session, see LDP Session.
3. LDP-enabled PE1 establishes an LDP LSP over the path PE1 -> P1 -> P2 with the recursive outbound
interface's next hop. This LSP is called a Remote LFA FRR Recursion LSP.
If PE1 detects a fault, PE1 rapidly switches traffic to the Remote LFA FRR LSP.
Background
LDP-IGP synchronization is used to synchronize the status between LDP and an IGP to minimize the traffic
loss time if a network fault triggers the LDP and IGP switching.
On a network with active and standby links, if the active link fails, IGP routes and an LSP are switched to the
standby link. After the active link recovers, IGP routes are switched back to the active link before LDP
convergence is complete. In this case, the LSP along the active link takes time to make preparations, such as
adjacency restoration, before being established. As a result, LSP traffic is discarded. If an LDP session or
adjacency between nodes fails on the active link, the LSP along the active link is deleted. However, the IGP
still uses the active link, and as a result, LSP traffic cannot be switched to the standby link, and is
continuously discarded.
According to the fundamentals of LDP-IGP synchronization, an IGP cost value is set to delay a route
switchback until LDP convergence is complete. Before the LSP along the active link is established, the LSP
along the standby link is retained, so that the traffic continues to be forwarded through the standby link.
The backup LSP is torn down only after the primary LSP is established successfully.
LDP-IGP synchronization timers are as follows:
• Hold-max-cost timer
• Delay timer
Implementation
Figure 1 Switchback problem to be solved in LDP-IGP synchronization
• In Figure 1, on a network with active and standby links, after the active link recovers, an attempt is
made to switch traffic back from the standby link to the active link. Revertive traffic is discarded
because the backup LSP becomes unavailable after the IGP convergence is complete but the primary
LSP is not established. In this situation, you can configure LDP-IGP synchronization to delay the IGP
route switchback until LDP convergence is complete. Before the primary LSP is converged, the backup
LSP is retained, so that the traffic continues to be forwarded through the backup LSP until the primary
LSP is successfully established. Then the backup LSP is torn down. The process is as follows:
1. The active link recovers.
2. An IGP advertises the maximum cost of the active link, delaying the IGP route switchback.
3. An LDP adjacency and session are reestablished over the active link.
4. After the LDP session and adjacency are successfully established, Label Mapping messages are
exchanged to instruct the IGP to start synchronization.
5. The IGP advertises the normal cost of the active link and converges to the original path. The LSP
is reestablished and the forwarding entries are delivered within milliseconds.
• If the LDP session or adjacency between nodes on the active link fails, the primary LSP is deleted, but
the IGP still uses the active link. As a result, LSP traffic cannot be switched to the standby link, and
traffic is continuously discarded. In this situation, you can configure LDP-IGP synchronization. If an LDP
session or adjacency fails, LDP informs the IGP that the LDP session or adjacency is faulty. In this case,
the IGP advertises the maximum cost of the faulty link. The route is switched to the standby link, and
the LSP is also switched to the standby link. The process is as follows:
1. The LDP session or adjacency between nodes on the active link is faulty.
2. LDP informs the IGP that the LDP session or adjacency along the active link is faulty. The IGP
then advertises the maximum cost of the active link.
3. The IGP route is switched to the standby link.
4. The LSP is reestablished along the standby link, and forwarding entries are delivered.
If the LDP session or adjacency repeatedly fails to be reestablished, you can configure the Hold-max-cost
timer to enable the node to permanently advertise the maximum cost, so that traffic keeps being
transmitted over the standby link until the LDP session and LDP adjacency are reestablished along the
active link.
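The cost-advertisement behavior above, where the maximum link cost is advertised while LDP is not yet synchronized and the normal cost is restored afterwards, can be sketched as follows. Using 65535 as the maximum IGP cost is an illustrative value, not a device setting:

```python
# Sketch of LDP-IGP synchronization cost advertisement: advertise the
# maximum cost while LDP is unsynchronized (keeping traffic on the
# standby link), and revert to the normal cost once LDP converges or
# the Hold-max-cost timer expires.

MAX_COST = 65535  # illustrative maximum IGP cost

def advertised_cost(normal_cost, ldp_synchronized,
                    hold_max_cost_infinite=False,
                    hold_max_cost_expired=False):
    if ldp_synchronized:
        return normal_cost            # LDP converged: normal cost again
    if hold_max_cost_infinite or not hold_max_cost_expired:
        return MAX_COST               # keep traffic on the standby link
    return normal_cost                # timer expired: fall back

print(advertised_cost(10, ldp_synchronized=False))  # 65535
print(advertised_cost(10, ldp_synchronized=True))   # 10
```

Setting `hold_max_cost_infinite` models the permanent advertisement described above for links where LDP repeatedly fails to recover.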
After LDP-IGP synchronization is enabled on an interface, an IGP queries the status of related interfaces,
LDP sessions, and LDP adjacencies based on the process shown in Figure 2. The interface then enters a
state based on the query result, and state transitions are performed as shown in Figure 2.
Figure 2 Query process of and status transition diagram for LDP-IGP synchronization
The states differ depending on the IGP protocol used.
■ When OSPF is used, the status transitions are performed based on the flowchart shown in Figure 2.
■ When IS-IS is used, no Hold-normal-cost state is involved. After the Hold-max-cost timer expires, IS-IS
advertises the normal cost of the interface link, but the Hold-max-cost state is still displayed.
Usage Scenario
On the network shown in Figure 3, an active link and a standby link are established. LDP-IGP
synchronization and LDP FRR can be deployed together.
Benefits
LDP-IGP synchronization reduces the packet loss rate during an active/standby link switchover and improves
the reliability of an entire network.
12.3.2.10 LDP GR
LDP supports graceful restart (GR), which enables a Restarter, together with a Helper, to perform a
master/backup switchover or protocol restart without interrupting traffic.
Figure 1 LDP GR
After a device without the GR capability performs a master/backup switchover, an LDP session between the
device and its neighbor node goes Down. As a result, the neighbor node deletes the LSP established over the
LDP session, and services are interrupted for a short period of time. LDP GR can be configured to prevent
such interruptions because a device enabled with LDP GR retains its forwarding entries during a
master/backup switchover or protocol restart. LDP GR thereby helps implement uninterrupted MPLS
forwarding. Figure 1 illustrates the LDP GR process:
1. Before a master/backup switchover is performed, LDP neighbors negotiate the GR capability when
establishing an LDP session.
2. When the GR Helper detects that the Restarter has performed a master/backup switchover or that
LDP has restarted, the Helper starts a Reconnect timer and retains the forwarding entries of the
Restarter until the timer expires to prevent forwarding interruptions.
3. If the LDP session between the Restarter and Helper is reestablished before the Reconnect timer
expires, the Helper stops the Reconnect timer and starts a Recovery timer.
4. The Helper and the Restarter help each other restore the forwarding entries before the Recovery timer
expires. After the timer expires, the Helper deletes all Restarter-related forwarding entries that were
not restored.
5. After the Restarter performs the master/backup switchover or protocol restart, the Restarter starts a
Forwarding State Holding timer. The Restarter preserves the forwarding entries before a restart and
restores the forwarding entries before the timer expires with the help of the Helper. After the
Forwarding State Holding timer expires, the Restarter deletes all forwarding entries that were not
restored.
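The helper-side timer logic in steps 2 through 4 can be modeled in a simplified sketch. The timer values and the function shape are illustrative assumptions, not the device's configuration:

```python
# Simplified model of the GR Helper's decision to retain the Restarter's
# forwarding entries: bounded by the Reconnect timer while the session is
# down, then by the Recovery timer once the session is back.

def helper_keeps_entries(reconnect_elapsed, session_reestablished,
                         recovery_elapsed=0.0,
                         reconnect_timer=300.0, recovery_timer=300.0):
    """Return True while the Helper must retain the Restarter's entries."""
    if not session_reestablished:
        # Waiting for the session: hold entries until the Reconnect
        # timer expires (step 2).
        return reconnect_elapsed < reconnect_timer
    # Session reestablished: hold unrestored entries until the Recovery
    # timer expires (steps 3 and 4).
    return recovery_elapsed < recovery_timer

print(helper_keeps_entries(120.0, False))                         # True
print(helper_keeps_entries(400.0, False))                         # False
print(helper_keeps_entries(120.0, True, recovery_elapsed=350.0))  # False
```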
The NE40E can function as a Helper to help the Restarter implement uninterrupted forwarding during a
master/backup switchover or protocol restart.
Background
If a node or link along an LDP LSP that is transmitting traffic fails, traffic switches to a backup LSP. The path
switchover speed depends on the detection duration and traffic switchover duration. A delayed path
switchover causes traffic loss. LDP fast reroute (FRR) can be used to speed up the traffic switchover, but not
the detection process.
As shown in Figure 1, a local label switching router (LSR) periodically sends Hello messages to notify each
peer LSR of the local LSR's presence and establish a Hello adjacency with each peer LSR. The local LSR
constructs a Hello hold timer to maintain the Hello adjacency with each peer. Each time the local LSR
receives a Hello message, it updates the Hello hold timer. If the Hello hold timer expires before a Hello
message arrives, the LSR considers the Hello adjacency disconnected. The Hello mechanism cannot rapidly
detect link faults, especially when a Layer 2 device is deployed between the local LSR and its peer.
The rapid, light-load BFD mechanism is used to quickly detect faults and trigger a primary/backup LSP
switchover, which minimizes data loss and improves service reliability.
A BFD session that monitors LDP LSPs is negotiated in either static or dynamic mode:
• Static configuration: The negotiation of a BFD session is performed using the local and remote
discriminators that are manually configured for the BFD session to be established. On a local LSR, you
can bind an LSP with a specified next-hop IP address to a BFD session with a specified peer IP address.
• Dynamic establishment: The negotiation of a BFD session is performed using the BFD discriminator
type-length-value (TLV) in an LSP ping packet. You must specify a policy for establishing BFD sessions
on a local LSR. The LSR automatically establishes BFD sessions with its peers and binds the BFD sessions
to LSPs using either of the following policies:
■ Host address-based policy: The local LSR uses all host addresses to establish BFD sessions. You can
specify a next-hop IP address and an outbound interface name of LSPs and establish BFD sessions
to monitor the specified LSPs.
■ Forwarding equivalence class (FEC)-based policy: The local LSR uses host addresses listed in a
configured FEC list to automatically establish BFD sessions.
BFD uses the asynchronous mode to check LSP continuity. That is, the ingress and egress periodically send
BFD packets to each other. If one end does not receive BFD packets from the other end within a detection
period, BFD considers the LSP Down and sends an LSP Down message to the LSP management (LSPM)
module.
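The asynchronous detection rule above can be sketched as a small calculation. The following Python snippet is illustrative only (the function name and millisecond units are assumptions, not part of this product's CLI); it applies the standard BFD rule that the local detection time is the peer's detect multiplier times the larger of the local required receive interval and the peer's desired transmit interval.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int) -> int:
    # Standard BFD asynchronous-mode rule: if no packet arrives within
    # this window, the session (and the bound LSP) is declared Down.
    return remote_detect_mult * max(local_required_min_rx_ms,
                                    remote_desired_min_tx_ms)

# Example: peer multiplier 3, local min RX 10 ms, peer desired TX 10 ms
print(bfd_detection_time_ms(3, 10, 10))  # 30
```

If either end slows its transmit interval, the detection window grows accordingly, which is why aggressive intervals shorten LSP fault detection at the cost of more control traffic.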
Although BFD for LDP is enabled on a proxy egress, a BFD session cannot be established for the reverse path of a proxy
egress LSP on the proxy egress.
• A BFD for LDP tunnel session is triggered using a host IP address, a FEC list, or an IP prefix list.
• No next-hop address or outbound interface name can be specified in any BFD session trigger policies.
Usage Scenarios
• BFD for LDP LSP can be used when primary and bypass LDP FRR LSPs are established.
• BFD for LDP Tunnel can be used when primary and bypass virtual private network (VPN) FRR LSPs are
established.
Benefits
BFD for LDP LSP provides a rapid, light-load fault detection mechanism for LDP LSPs, which improves
network reliability.
Background
As mobile services evolve from narrowband voice services to integrated broadband services, providing rich
voice, streaming media, and high speed downlink packet access (HSDPA) services, the demand for network
bandwidth is rapidly increasing. Meeting the bandwidth demand on traditional bearer networks requires
huge investments. Therefore, carriers urgently need an access mode that is low-cost, flexible, and highly efficient to meet the challenges brought by the growth in broadband services. In this context, all-IP mobile bearer networks are an effective means of dealing with these issues. IP radio access networks (RANs), a type of IP-based mobile bearer network, are increasingly widely used.
IP RANs, however, have more complex reliability requirements than traditional bearer networks when
carrying broadband services. Traditional fault detection mechanisms cannot trigger protection switching
based on random bit errors. Therefore, bit errors may degrade or even interrupt services on an IP RAN in
extreme cases. Bit-error-triggered protection switching can solve this problem.
Benefits
Bit-error-triggered LDP protection switching has the following benefits:
• Enables devices to record bit error events, enabling carriers to quickly locate the nodes or lines with bit
errors and take corrective measures.
Related Concepts
LDP interface bit error rate
LDP interface bit error rate is the bit error rate detected by LDP on an interface. A node uses a Link Hello
message to report its LDP interface bit error rate to an upstream LDP peer.
LSP bit error rate
LSP bit error rate on a node = LSP bit error rate reported by the downstream LDP peer + the LDP interface
bit error rate reported by the downstream LDP peer.
Implementation
The NE40E supports single-node and multi-node LDP bit error detection and calculation. When LDP detects
an interface bit error on a node along an LSP, the node sends a Link Hello message to notify its upstream
LDP peer of the interface bit error rate and a Label Mapping message to notify its upstream LDP peer of the
LSP bit error rate. Upon receipt of the notifications, the upstream LDP peer uses the received interface bit
error rate as the local LDP interface bit error rate, adds the LDP interface bit error rate to the received LSP
bit error rate to obtain the local LSP bit error rate, and sends the interface bit error rate and local LSP bit
error rate to its upstream LDP peer. This process repeats until the ingress of the LSP calculates its local LSP
bit error rate. Figure 1 illustrates the networking for bit-error-triggered LDP protection switching.
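The hop-by-hop accumulation described above can be sketched as a simple sum. The following Python snippet is an illustrative model only (the function name and the unit of errored bits per 10^9 bits are assumptions): each node adds the interface bit error rate reported by its downstream peer to the LSP bit error rate reported by that peer, so the ingress ends up with the sum of all per-hop interface bit error rates along the LSP.

```python
def ingress_lsp_ber(reported_interface_bers):
    # reported_interface_bers: per-hop interface bit error rates,
    # ordered from the egress side toward the ingress, expressed here
    # as errored bits per 10^9 bits.
    lsp_ber = 0
    for interface_ber in reported_interface_bers:
        # Each upstream peer adds the reported interface BER to the
        # reported LSP BER to obtain its local LSP BER.
        lsp_ber += interface_ber
    return lsp_ber

# Example: bit errors detected on two interfaces along the LSP
print(ingress_lsp_ber([2000, 0, 3000]))  # 5000
```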
In Figure 1, an LSP is established between PE1 and PE2. If if1 and if3 interfaces both detect bit errors, the bit
errors along the LSP to the ingress are advertised and calculated as described by the text in Figure 1.
LDP only detects and advertises bit errors; service switching, such as PW switching or L3VPN route switching, is performed by the services carried over LDP.
spoofing. An MD5 message digest is a unique result generated using an irreversible character string
conversion. If a message is modified during transmission, a different digest is generated. After the
message arrives at the receive end, the receive end can detect the modification after comparing the
received digest with a pre-computed digest.
LDP MD5 authentication prevents LDP packets from being modified by generating unique summary
information for the same information segment. It is stricter than the common TCP connection check.
LDP MD5 authentication is performed before LDP messages are sent over TCP. A unique message digest
is added following the TCP header in a message. The message digest is generated using the MD5
algorithm based on the TCP header, LDP message, and user-defined password.
When receiving the message, the receive end obtains the TCP header, message digest, and LDP
message. It generates the message digest based on the obtained information and the locally saved
password. Then, it compares the generated message digest with the message digest carried in the LDP
message. If they are different, the receive end interprets the LDP message as having been tampered
with.
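The digest comparison described above can be sketched as follows. This Python snippet is a minimal illustration of the idea only, not the actual wire format: the real TCP MD5 option covers a TCP pseudo-header and segment in a fixed order, whereas here the byte strings and function name are assumptions chosen purely to show the "hash header + message + shared password, then compare" principle.

```python
import hashlib

def md5_digest(tcp_header: bytes, ldp_message: bytes, password: bytes) -> bytes:
    # The sender computes a digest over the TCP header, the LDP message,
    # and the locally configured password; the receiver recomputes it
    # with its own saved password and compares.
    return hashlib.md5(tcp_header + ldp_message + password).digest()

sender = md5_digest(b"tcp-hdr", b"ldp-msg", b"secret")
receiver = md5_digest(b"tcp-hdr", b"ldp-msg", b"secret")
# Any modification in transit yields a different digest:
tampered = md5_digest(b"tcp-hdr", b"ldp-msg-modified", b"secret")
print(sender == receiver, sender == tampered)  # True False
```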
A password can be set either in ciphertext or simple text. If the password is set in simple text, the
password set by users is directly recorded in the configuration file. If the password is set in ciphertext,
the password is encrypted using a special algorithm and then recorded in the configuration file.
Characters set by users are used in digest calculation, regardless of whether the password is set in
simple text or ciphertext. Encrypted passwords are not used in digest calculations. Encryption/decryption
algorithms are proprietary to vendors.
The MD5 encryption algorithm provides low security, which may introduce security risks. Using a more secure authentication algorithm is recommended.
Principles
LDP over TE establishes LDP LSPs across RSVP-TE areas. RSVP-TE is an MPLS tunnel technique used to
generate LSPs as tunnels for other protocols to transparently transmit packets. LDP is another MPLS tunnel
technique used to generate LDP LSPs. LDP over TE allows an LDP LSP to span an RSVP-TE area so that a TE
tunnel functions as a hop along an LDP LSP.
After an RSVP-TE tunnel is established, an IGP (OSPF or IS-IS) locally computes routes or advertises link
state advertisements (LSAs) or link state PDUs (LSPs) to select a TE tunnel interface as the outbound
interface. In the following example, the originating Router is directly connected to the destination Router of the TE tunnel through logical interfaces. Packets are transparently transmitted along the TE tunnel.
In Figure 1, P1, P2, and P3 belong to an RSVP-TE domain. PE1 and PE2 are located in a VPN, and LDP
sessions between PE1 and P1 and between P3 and PE2 are established. The following example demonstrates
the process of establishing an LDP LSP between PE1 and PE2 over the RSVP-TE domain:
1. An RSVP-TE tunnel between P1 and P3 is set up. P3 assigns RSVP-Label-1 to P2, and P2 assigns RSVP-
Label-2 to P1.
2. PE2 initiates LDP to set up an LSP and sends a Label Mapping message carrying LDP-Label-1 to P3.
3. Upon receipt, P3 sends a Label Mapping message carrying LDP-Label-2 to P1 over a remote LDP
session.
Usage Scenario
LDP over TE is used to transmit VPN services. Because carriers have difficulties in deploying MPLS traffic
engineering on an entire network, they use LDP over TE to plan a core TE area and implement LDP outside
this area. Figure 2 illustrates an LDP over TE network.
The advantage of LDP over TE is that an LDP LSP is easier to operate and maintain than a TE tunnel, and
the resource consumption of LDP is lower than that of the RSVP soft state. On an LDP over TE network, TE
tunnels are deployed only in the core area, but not on all devices including PEs. This simplifies deployment
and maintenance on the entire network and relieves burden from PEs. In addition, the core area can take
full advantage of TE tunnels to perform protection switchovers, path planning, and bandwidth protection.
Principles
LDP GTSM applies the Generalized TTL Security Mechanism (GTSM) to LDP.
To protect the Router against attacks, GTSM checks the TTL in each packet. GTSM for LDP verifies LDP packets exchanged between Routers that are neighbors or a fixed number of hops apart. A valid TTL range is configured on each Router for packets from other Routers, and GTSM is enabled. If the TTL of an LDP packet received by an LDP-configured Router is outside this range, the packet is considered invalid and discarded. In this way, upper-layer protocols are protected.
Usage Scenario
GTSM is used to protect the TCP/IP-based control plane against CPU usage attacks, for example, CPU
overload attacks. GTSM for LDP is used to verify all LDP packets to prevent LDP from suffering CPU-based
attacks when LDP receives and processes a large number of forged packets.
In Figure 1, LSR1 through LSR5 are core Routers on the backbone network. When LSRA is connected to the backbone network through another device, LSRA may initiate an attack by forging LDP packets that appear to be exchanged among LSR1 through LSR5.
Although LSRA can forge the contents of such packets after accessing the backbone network through another device, it cannot forge a valid TTL, because every device the packet traverses decrements the TTL.
A GTSM policy is configured on LSR1 through LSR5 separately and is used to verify packets reaching possible
neighbors. For example, on LSR5, the valid number of hops is set to 1 or 2, and the valid TTL is set to 254 or
255 for packets sent from LSR2. The forged packet sent by LSRA to LSR5 through multiple intermediate
devices contains a TTL value that is out of the preset TTL range. LSR5 discards the forged packet and
prevents the attack.
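The range check in the LSR5 example can be sketched as follows. This Python snippet is an illustrative model only (the function name is an assumption): a packet from a neighbor at most `max_hops` away must arrive with a TTL in the range [255 - max_hops + 1, 255], and anything below that range is discarded.

```python
def gtsm_valid(ttl: int, max_hops: int) -> bool:
    # A legitimate peer at most `max_hops` hops away sends with TTL 255,
    # so the packet arrives with TTL >= 255 - max_hops + 1.
    return 255 - max_hops + 1 <= ttl <= 255

# LSR5 accepts LSR2's packets within 2 hops: TTL 254 or 255 is valid.
# A forged packet that crossed many intermediate devices arrives with a
# lower TTL and is dropped.
print(gtsm_valid(254, 2), gtsm_valid(250, 2))  # True False
```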
Principles
The local and remote LDP adjacencies can be connected to the same peer so that the peer is maintained by
both the local and remote LDP adjacencies.
On the network shown in Figure 1, when the local LDP adjacency is deleted due to a failure in the link to
which the adjacency is connected, the peer's type may change without affecting its presence or status. (The
peer type is determined by the adjacency type. The types of adjacencies can be local, remote, and coexistent
local and remote.)
If the link becomes faulty or is recovering from a fault, the peer type may change while the type of the
session associated with the peer changes. However, the session is not deleted and does not become Down.
Instead, the session remains Up.
Usage Scenario
Figure 1 Networking diagram for a coexistent local and remote LDP session
A coexistent local and remote LDP session typically applies to L2VPNs. On the network shown in Figure 1,
L2VPN services are transmitted between PE1 and PE2. When the directly connected link between PE1 and
PE2 recovers from a disconnection, the processing of a coexistent local and remote LDP session is as follows:
1. MPLS LDP is enabled on the directly connected PE1 and PE2, and a local LDP session is set up between
PE1 and PE2. PE1 and PE2 are configured as the remote peer of each other, and a remote LDP session
is set up between PE1 and PE2. Local and remote adjacencies are then set up between PE1 and PE2.
The session between PE1 and PE2 is now a coexistent local and remote LDP session, through which L2VPN signaling messages are transmitted.
2. When the physical link between PE1 and PE2 becomes Down, the local LDP adjacency also goes Down.
The route between PE1 and PE2 is still reachable through the P, which means that the remote LDP
adjacency remains Up. The session changes to a remote session so that it can remain Up. The L2VPN
does not detect the change in session status and does not delete the session. This prevents the L2VPN
from having to disconnect and recover services, and shortens service interruption time.
3. When the fault is rectified, the link between PE1 and PE2 and the local LDP adjacency go Up again. The session changes back to a coexistent local and remote LDP session and remains Up. Again, the L2VPN does not detect the change in session status and does not delete the session. This reduces service interruption time.
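The type transitions in the steps above can be sketched with a simplified model in which the session type follows the set of Up adjacencies. This Python snippet is illustrative only; the function name and type strings are assumptions, not product terminology.

```python
def session_type(local_adj_up: bool, remote_adj_up: bool):
    # The session stays Up as long as at least one adjacency remains;
    # only its type changes when an adjacency goes Down or comes back.
    if local_adj_up and remote_adj_up:
        return "local-and-remote"
    if local_adj_up:
        return "local"
    if remote_adj_up:
        return "remote"
    return None  # no adjacency left: the session is deleted

print(session_type(True, True))   # coexistent session in normal operation
print(session_type(False, True))  # link fault: session survives as remote
```

Because the return value never becomes `None` while the remote adjacency holds, the L2VPN sees an uninterrupted session across the link fault and recovery.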
When LDP distributes labels only to upstream nodes and P2 receives a Label Mapping message from P1, P2 does not send the Label Mapping message associated with the route to P1. If the link between P1 and P3 is faulty, the route from PE1 to PE3 is switched from PE1 -> P1 -> P3 -> PE3 to PE1 -> P1 -> P2 -> P4 -> P3 -> PE3, and P2 becomes the downstream node of P1. The LSP can only be set up after P2 resends a Label Mapping message. However, P2 does not send the Label Mapping message to P1, which slows LSP re-convergence.
When LDP is enabled to distribute labels to all peers, P2 sends a Label Mapping message associated with the
route to P1 after receiving the Label Mapping message from P1, which allows LDP to generate a liberal LSP
on P1. If the link between P1 and P3 becomes faulty, the route from PE1 to PE3 is switched from PE1 -> P1 -
> P3 -> PE3 to PE1 -> P1 -> P2 -> P4 -> P3 -> PE3, P2 becomes the downstream of P1, and the liberal LSP
changes to a normal LSP, which accelerates LSP convergence.
Figure 1 Networking diagram for both upstream and downstream LSRs assigned labels by LDP
In addition, split horizon can be configured to have Label Mapping messages only sent to specified upstream
LSRs.
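The liberal-label behavior above can be sketched with a hypothetical label table. In this Python snippet, the peer names and label values are assumptions for illustration: labels learned from all peers are retained, so when the next hop changes after a fault, the previously liberal label becomes active immediately, with no new Label Mapping exchange.

```python
# Hypothetical label table on P1 for one FEC: labels advertised by all
# peers are kept; only the label from the current next hop is used for
# forwarding, and the rest are "liberal".
labels = {"P3": 1030, "P2": 1020}  # peer -> advertised label

def active_label(next_hop: str):
    # On a next-hop change, the stored liberal label is activated
    # directly, which is what accelerates LSP convergence.
    return labels.get(next_hop)

print(active_label("P3"))  # primary path via P3
print(active_label("P2"))  # after the P1-P3 link fails, via P2
```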
12.3.2.19 mLDP
The multipoint extensions for Label Distribution Protocol (mLDP) transmits multicast services over IP or
Multiprotocol Label Switching (MPLS) backbone networks, which simplifies network deployment.
Background
Traditional core and backbone networks run IP and MPLS to flexibly transmit unicast packets and provide
high reliability and traffic engineering (TE) capabilities.
The proliferation of applications, such as IPTV, multimedia conference, and massively multiplayer online
role-playing games (MMORPGs), amplifies demands on multicast transmission over IP/MPLS networks. The
existing P2P MPLS technology requires a transmit end to deliver the same data packet to each receive end,
which wastes network bandwidth resources.
The point-to-multipoint (P2MP) Label Distribution Protocol (LDP) technique defined in mLDP can be used to
address the preceding problem. mLDP P2MP extends the MPLS LDP protocol to meet P2MP transmission
requirements and uses bandwidth resources much more efficiently.
Figure 1 shows the P2MP LDP LSP networking. A tree-shaped LSP originates at the ingress PE1 and is
destined for egresses PE3, PE4, and PE5. The ingress directs multicast traffic into the LSP. The ingress sends a
single packet along the trunk to the branch node P4. P4 replicates the packet and forwards the packet to its
connected egresses. This process prevents duplicate packets from wasting trunk bandwidth.
Related Concepts
Table 1 describes the nodes used on the P2MP LDP network shown in Figure 1.
Root node (PE1): an ingress on a P2MP LDP LSP. The ingress initiates LSP calculation and establishment, and pushes a label into each multicast packet before forwarding it along an established LSP.
Transit node (P1 and P3): an intermediate node that swaps an incoming label for an outgoing label in each MPLS packet. A branch node may function as a transit node.
Leaf node (PE3, PE4, and PE5): a destination node on a P2MP LDP LSP.
Bud node (PE2): an egress of one sub-LSP and a transit node of other sub-LSPs. The bud node is connected to a customer edge (CE) and functions as an egress.
Implementation
The procedure for using mLDP to establish and maintain a P2MP LDP LSP is as follows:
As shown in Figure 3, P2MP LDP-enabled label switching routers (LSRs) exchange signaling messages to
negotiate mLDP sessions. Two LSRs can successfully negotiate an mLDP session only if both LDP
Initialization messages carry the P2MP Capability TLV. After successful negotiation, an mLDP session is
established. The mLDP session establishment process is similar to the LDP session establishment process. The
difference is that the mLDP session establishment involves P2MP capability negotiation.
Field Description
Opaque Value Value that identifies a specific P2MP LSP on a root node and carries information
about the root (also called ingress) and leaf nodes on the P2MP LSP
The P2MP LDP LSP establishment mode varies depending on the node type. A P2MP LDP LSP contains the
following nodes:
• Leaf node: manually specified. When configuring a leaf node, you must also specify the root node IP
address and the opaque value.
• Transit node: any node that can receive P2MP Label Mapping messages and whose LSR ID is different
from the LSR IDs of the root nodes.
• Root node: a node whose host address is the same as the root node's IP address carried in a P2MP LDP
FEC.
• Nodes send Label Mapping messages to upstream nodes and generate forwarding entries.
As shown in Figure 5, each node performs the following operations before completing the LSP
establishment:
■ Leaf node: sends a Label Mapping message to its upstream node and generates a forwarding entry.
■ Transit node: receives a Label Mapping message from its downstream node and checks whether it
has sent a Label Mapping message to its upstream node:
■ If the transit node has sent no Label Mapping message to any upstream nodes, it looks up the
routing table and finds an upstream node. If the upstream and downstream nodes of the
transit node have different IP addresses, the transit node sends a Label Mapping message to
the upstream node. If the upstream and downstream nodes of the transit node have the same
IP address, the transit node does not send a Label Mapping message.
■ If the transit node has sent a Label Mapping message to its upstream node, it does not send a
Label Mapping message again.
■ Root node: receives a Label Mapping message from its downstream node and generates a
forwarding entry.
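The transit-node decision in the steps above can be sketched as follows. This Python snippet is an illustrative model only (the function name and IP addresses are assumptions): a transit node sends a P2MP Label Mapping message upstream at most once, and skips sending when its chosen upstream node and its downstream node share the same IP address.

```python
def transit_should_send_mapping(already_sent: bool,
                                upstream_ip: str,
                                downstream_ip: str) -> bool:
    # A transit node that has already sent a Label Mapping message to
    # its upstream node does not send another one.
    if already_sent:
        return False
    # If the upstream and downstream nodes have the same IP address,
    # no Label Mapping message is sent.
    return upstream_ip != downstream_ip

print(transit_should_send_mapping(False, "10.0.0.1", "10.0.0.2"))  # True
print(transit_should_send_mapping(True,  "10.0.0.1", "10.0.0.2"))  # False
print(transit_should_send_mapping(False, "10.0.0.1", "10.0.0.1"))  # False
```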
• Leaf node
A leaf node sends a Label Withdraw message to an upstream node. After the upstream node receives
the message, it replies with a Label Release message to instruct the leaf node to tear down the sub-LSP.
If the upstream node has only the leaf node as a downstream node, the upstream node sends the Label
Withdraw message to its upstream node. If the upstream node has another downstream node, the
upstream node does not send the Label Withdraw message.
• Transit node
If a transit node or an LDP session between a transit node and its upstream node fails or a user
manually deletes the transit node configuration, the upstream node of the transit node deletes the sub-
LSPs that pass through the transit node. If the upstream node has only the transit node as a
downstream node, the upstream node sends the Label Withdraw message to its upstream node. If the
upstream node has another downstream node, the upstream node does not send the Label Withdraw
message.
• Root node
If a root node fails or a user manually deletes the LSP configuration on the root node, the root node
deletes the whole LSP.
node to the LSP and updates the forwarding entry for the sub-LSP.
As shown in Figure 6, the upstream node of Leaf 2 is changed from P4 to P2. To prevent LSP loops, Leaf
2 sends a Label Withdraw message to P4. Upon receipt, P4 deletes the sub-LSP to Leaf 2 and deletes
the forwarding entry for the sub-LSP. Leaf 2 then sends a Label Mapping message to P2. Upon receipt,
P2 establishes a sub-LSP to Leaf 2 and generates a forwarding entry.
Other Usage
mLDP P2MP LSPs can transmit services on next generation (NG) multicast VPN (MVPN) and multicast VPLS
networks. In the MVPN or multicast VPLS scenario, NG MVPN signaling or multicast VPLS signaling triggers
the establishment of mLDP P2MP LSPs. There is no need to manually configure leaf nodes.
Usage Scenarios
mLDP can be used in the following scenarios:
• The virtual private LAN service (VPLS) is transmitted along a P2MP LDP LSP.
Benefits
mLDP used on an IP/MPLS backbone network offers the following benefits:
• Core nodes on the IP/MPLS backbone network can transmit multicast services, without Protocol
Independent Multicast (PIM) configured, which simplifies network deployment.
• Uniform MPLS control and forwarding planes are provided for the IP/MPLS backbone network. The
IP/MPLS backbone network can transmit both unicast and multicast VPN traffic.
Background
With the growth of user services, the demands for using mLDP LSPs to carry multicast traffic are increasing.
Therefore, mLDP LSP protection techniques become increasingly important. In the implementation of an mLDP LSP protection technique, an mLDP FRR LSP can be established if routes are reachable and its downstream outbound interface is not co-routed with the outbound interface of the primary mLDP LSP.
mLDP FRR link protection is implemented using the primary route to a downstream device, LFA FRR route,
RLFA FRR route, or multi-link method, which improves user network reliability.
mLDP FRR link protection does not support backup links on a TE tunnel.
Related Concepts
• DS node: downstream node
Implementation
An upstream node generates an mLDP FRR path for each outbound interface of an mLDP LSP. If the
outbound interface of the primary LSP fails, the forwarding plane rapidly switches traffic to the mLDP FRR
path to a directly connected downstream LDP peer, which protects traffic on the primary LSP.
The mLDP FRR path transmits traffic over a P2P LDP LSP that is destined for a directly connected
downstream peer. When traffic passes through the mLDP FRR path, the inner label is the outgoing label
mapped to the original primary P2MP LDP LSP, and the outer label is the outgoing label of a P2P LDP LSP.
After traffic arrives at the downstream directly connected LDP peer, the peer removes the P2P LDP label and
swaps the inner P2MP LDP label with another label before forwarding the traffic downstream. An mLDP FRR
path is selected based on the following rules:
• An mLDP FRR path has P2P LDP labels and its destination is the downstream directly connected LDP
peer.
• The outbound interface of a P2P LDP LSP is different from the outbound interface of the primary LDP
LSP.
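The two-level label stack described above can be sketched as follows. In this Python snippet, the label values are hypothetical: on the FRR path, the packet carries the P2P LDP label (outer) toward the directly connected downstream peer, with the outgoing label of the original primary P2MP LDP LSP kept inner.

```python
def frr_label_stack(p2p_outgoing_label: int, p2mp_outgoing_label: int):
    # [outer, inner]: outer label steers the packet over the P2P LDP LSP
    # to the downstream peer; inner label is the original P2MP label.
    return [p2p_outgoing_label, p2mp_outgoing_label]

stack = frr_label_stack(2001, 3001)
# The downstream peer pops the P2P label, then swaps the inner P2MP
# label before forwarding the traffic further downstream.
inner = stack[1]
print(stack, inner)  # [2001, 3001] 3001
```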
mLDP FRR link protection only protects traffic on the outbound interface of the primary mLDP LSP.
A link fault on the primary mLDP LSP triggers protocol convergence on the control plane. To minimize
packet loss during the convergence, configure LDP GR Helper and mLDP MBB.
On the triangle network shown in Figure 1, if a fault occurs on the link between the upstream and downstream nodes, mLDP LSP link protection performs the following convergence functions:
• After the upstream node detects the link fault, the forwarding plane rapidly switches traffic. A P2P LDP
label is added to the outbound label of the original primary mLDP LSP in each packet. The packet is
forwarded by the P to the downstream node. After the downstream node removes the P2P LDP label,
the node swaps the mLDP LSP outgoing label with another label before sending the packet
downstream.
• For mLDP FRR link protection, the LDP GR helper function must be enabled for the path between the upstream and downstream nodes. The GR helper function prevents the active and standby forwarding entries from being deleted on the upstream node if the LDP session is disconnected, which ensures that the MBB process can continue on the destination node.
• For mLDP FRR link protection, mLDP MBB must also have been enabled on the device. Once the control plane detects a fault, the downstream node identifies the change in the next hop of the route to the root node and enters the MBB process. After the downstream node-P-upstream node path is established, the downstream node receives traffic only from the new upstream P node, completing the convergence process.
Usage Scenarios
• Figure 1 shows the typical triangle networking.
• Figure 2 shows the typical four-point ring networking. If an RLFA route to the downstream node is used
and the outbound interface of the RLFA FRR route differs from the outbound interface of the primary
LSP, the upstream node selects the RLFA FRR path as a backup path.
■ If multiple links load-balance traffic, the upstream node selects a load-balancing link as a
protection path, but does not select the outbound interface of the primary mLDP LSP.
■ If multi-link routes form an LFA FRR path, the upstream node selects a protection path in the same
way as that in the typical triangle networking.
■ If one of multiple links has an active route and FRR is disabled, the upstream node selects one of
multi-link interfaces as a protection path, but does not select the outbound interface of the primary
mLDP LSP.
Benefits
mLDP LSP link protection offers the following benefits:
Background
Both NG MVPN over mLDP P2MP and VPLS over mLDP P2MP provide dual-root 1+1 protection. If an mLDP
P2MP master tree fails, traffic rapidly switches to a backup tree, which reduces service traffic loss.
mLDP P2MP searches a unicast routing table created in the base topology for root routes, which may cause
a protection failure. If unicast routes from the two roots to a leaf node partially overlap and the overlapping
link fails, dual-root 1+1 protection fails. Adjusting the unicast routes can prevent a protection failure stemming from an overlapping link, but adversely affects existing unicast services.
An apparent solution is to divide a physical network into different logical topologies for different services.
This is called multi-topology. Each class-specific topology in a public network address family contains an
independent routing table. The class-specific topology allows protocol routes to be added, deleted, and
imported. Based on the multi-topology and class-specific topology, the class-specific topology can be
configured to address the dual-root 1+1 protection failure for mLDP P2MP tunnels.
mLDP P2MP is a typical application of the class-specific topology. A primary mLDP P2MP LSP can be
configured in the class-specific topology. Routes then only partially depend on the unicast routing table.
Route priorities can be adjusted in the class-specific topology to prevent the overlapping link, which does not
affect unicast services.
The mLDP P2MP master tree is created in the class-specific topology, whereas an mLDP FRR LSP is created in the base
topology because the FRR LSP is established using unicast techniques.
Basic Concepts
• Base topology: is created by default on a public network and cannot be configured or deleted.
Implementation
On the network shown in Figure 1, in the base topology, the master tree PE1 -> P1 -> PE3 and the backup
tree PE2 -> P1 -> PE3 share the P1 -> PE3 link. If this link fails, both the master and backup trees are
interrupted, causing traffic loss. To prevent the overlapping link, properly plan the network deployment and
prevent the master and backup trees from overlapping. Change the path of the PE2-to-PE3 tunnel from PE2
-> P1 -> PE3 to PE2 -> P2 -> PE3. After the route is changed in the base topology, all service paths are
updated to PE2 -> P2 -> PE3. In this situation, existing unicast services are adversely affected. To address this
problem, create the master tree in the class-specific topology. Deploy the class-specific topology on each router and adjust the master tree to PE1 -> P2 -> PE3, leaving the backup tree as PE2 -> P1 -> PE3. This addresses the overlapping link issue without affecting existing unicast services along the path PE2 -> P1 -> PE3.
Benefits
• Deployment can be modified to prevent mLDP P2MP dual-root 1+1 protection failures stemming from
the overlapping link issue, without unicast services affected.
Implementation
LDP traffic statistics collection enables the ingress or a transit node to collect statistics only about outgoing LDP LSP traffic whose destination IP address has a 32-bit mask.
In Figure 1, each pair of adjacent devices establishes an LDP session and LDP LSP over the session. Two LSPs
originate from LSRA and are destined for LSRD along the paths LSRA -> LSRB -> LSRD and LSRA -> LSRB ->
LSRC -> LSRD. LSRB is used as an example. LSRB functions as either a transit node to forward LSRA-to-LSRD
traffic or the ingress to forward LSRB-to-LSRD traffic. LSRB collects statistics about traffic sent by the
outbound interface connected to LSRD and outbound interface connected to LSRC. LSRA can only function as
the ingress, and therefore, collects statistics about traffic only sent by itself. LSRD can only function as the
egress, and therefore, does not collect traffic statistics.
Benefits
No tunnel protection is provided in the NG-MVPN over mLDP P2MP function or VPLS over mLDP P2MP function. If an LSP fails, traffic can only be switched using route change-induced hard convergence, which results in poor performance. BFD for P2MP tunnel provides a dual-root mLDP 1+1 protection mechanism for these functions. Primary and backup tunnels are established for VPN traffic. If a P2MP tunnel fails, BFD for mLDP P2MP tunnel rapidly detects the fault and switches traffic, which improves convergence performance for the NG-MVPN over mLDP P2MP function or VPLS over mLDP P2MP function and minimizes traffic loss.
Principles
Figure 1 Dual-root P2MP LDP tunnel protection
In Figure 1, a root uses BFD to send protocol packets to all leaf nodes along a P2MP LDP LSP. If a leaf node
fails to receive BFD packets within a specified period, a fault occurs.
In an NG-MVPN or VPLS scenario shown in Figure 1, each of two roots establishes an mLDP P2MP tree. PE-
AGG1 is the master root, and PE-AGG2 is the backup root. The two trees do not overlap. BFD for P2MP
tunnel is configured on the roots and leaf nodes to establish BFD sessions. If a BFD session detects a fault in
the primary P2MP tunnel, a forwarder rapidly detects the fault and switches NG-MVPN or VPLS traffic to the
backup P2MP tunnel.
Principles
In a large-scale network, multiple IGP areas usually need to be configured for flexible network deployment
and fast route convergence. When advertising routes between IGP areas, to prevent a large number of
routes from consuming too many resources, an area border router (ABR) needs to aggregate the routes in
the area and then advertise the aggregated route to the neighbor IGP areas. The LDP extension for inter-
area LSP function supports the longest match rule for looking up routes so that LDP can use aggregated
routes to establish inter-area LDP LSPs.
As shown in Figure 1, there are two IGP areas: Area 10 and Area 20.
In the routing table of LSRD on the edge of Area 10, there are two host routes to LSRB and LSRC. You can
use IS-IS to aggregate the two routes into one route, 192.168.3.0/24, and advertise this route to Area 20 in order to prevent a large number of routes from occupying too many resources on LSRD. Consequently, LSRA's routing table contains only the aggregated route (192.168.3.0/24), not the 32-bit host routes. By default,
when establishing LSPs, LDP searches the routing table for the route that exactly matches the forwarding
equivalence class (FEC) in the received Label Mapping message. Figure 1 shows routing entry information of
LSRA and routing information carried in the FEC in the example shown in Table 1.
Table 1 Routing entry information of LSRA and routing information carried in the FEC
Routing entry of LSRA: 192.168.3.0/24
Routing information carried in the FEC: 192.168.3.1/32 and 192.168.3.2/32
By default, LDP uses a summarized route only to create a liberal LSP (an LSP that has been assigned labels but has not been established) and cannot create an inter-IGP-area LDP LSP to carry VPN services on a backbone network.
Therefore, in the situation shown in Figure 1, configure LDP to search for routes based on the longest match
rule for establishing LSPs. There is already an aggregated route to 192.168.3.0/24 in the routing table of
LSRA. When LSRA receives a Label Mapping message (for example, one carrying the FEC 192.168.3.1/32) from Area
10, LSRA searches for a route according to the longest match rule defined in relevant standards. Then, LSRA
finds information about the aggregated route to 192.168.3.0/24, and uses the outbound interface and next
hop of this route as those of the route to 192.168.3.1/32. LDP can establish inter-area LDP LSPs.
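The longest match lookup described above can be sketched as follows. The routing-table structure and interface names are illustrative assumptions, not an actual device data model.

```python
import ipaddress

def longest_match(routing_table, fec):
    """Find the most specific route covering a FEC, per the longest match rule.

    routing_table maps prefixes to (outbound_interface, next_hop) tuples;
    the entries below are hypothetical examples, not real configuration.
    """
    addr = ipaddress.ip_address(fec.split("/")[0])
    best = None
    for prefix, info in routing_table.items():
        net = ipaddress.ip_network(prefix)
        # Among all routes containing the FEC address, prefer the longest prefix.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, info)
    return best[1] if best else None
```

With a table holding only the aggregated route 192.168.3.0/24 and a default route, a Label Mapping message carrying the FEC 192.168.3.1/32 resolves to the outbound interface and next hop of the aggregated route, which is exactly how the inter-area LSP is established.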
In Figure 2, no exact routes between LSRA and LSRC are configured. The default route 0.0.0.0, with LSRB as LSRA's next hop, is used between LSRA and LSRC. A remote LDP session in DoD mode is established between LSRA and
LSRC. Before an LSP is established between the two LSRs, LSRA uses the longest match rule to query the
next-hop IP address and sends a Label Request packet to the downstream LSR. Upon receipt of the Label
Request packet, the transit LSRB checks whether an exact route to LSRC exists. If no exact route is
configured and the longest match function is enabled, LSRB uses the longest match function to find a route
and establish an LSP over the route.
A remote LDP session in DoD mode is established on LSRA, and LSRA does not find an exact route to the LDP peer ID (IP address). In this situation, after the IP address of the remote peer is specified on LSRA, LSRA uses the longest match function to automatically send a Label Request packet to the remote peer to request a DoD label.
Service Overview
The IP or Multiprotocol Label Switching (MPLS) technology has become a mainstream bearer technology on
backbone networks, and the demands for multicast services (for example, IPTV) transmitted over bearer
networks are evolving. Carriers draw on the existing MPLS mLDP technique to provide the uniform MPLS
control and forwarding planes for multicast services transmitted over backbone networks.
Networking Description
mLDP is deployed on IP/MPLS backbone networks. Figure 1 illustrates mLDP applications in an IPTV
scenario.
Feature Deployment
The procedure for deploying end-to-end (E2E) IP multicast services to be transmitted along mLDP label
switched paths (LSPs) is as follows:
2. Configure leaf nodes to send requests to the root node to establish point-to-multipoint (P2MP)
LDP LSPs.
■ Configure the egresses to run Protocol Independent Multicast (PIM) to generate multicast
forwarding entries.
■ Enable the egresses to ignore the Unicast Reverse Path Forwarding (URPF) check.
This is because the URPF check fails as PIM does not need to be run on core nodes on the P2MP
LDP network.
■ Enable multicast source proxy based on the location of the Rendezvous Point (RP).
After multicast data packets for a multicast group in an any-source multicast (ASM) address range
are directed to an egress, the egress checks the packets based on unicast routes. Multicast source
proxy is enabled or disabled based on the following check results:
■ If the egress is indirectly connected to a multicast source and does not function as the RP to
which the group corresponds, the egress stops forwarding multicast data packets. As a result,
downstream hosts cannot receive these multicast data packets. Multicast source proxy can be
used to address this problem. Multicast source proxy enables the egress to send a Register
message to the RP deployed on a source-side device (for example, SR1) in a PIM domain. The
RP adds the egress to a rendezvous point tree (RPT) to enable the egress to forward multicast
data packets to the downstream hosts.
■ If the egress is directly connected to a multicast source or functions as the RP to which the
group corresponds, the egress can forward multicast data packets, without multicast source
proxy enabled.
Definition
MPLS TE establishes constraint-based routed label switched paths (LSPs) and transparently transmits traffic
over the LSPs. Based on certain constraints, the LSP path is controllable, and links along the LSP reserve
sufficient bandwidth for service traffic. In the case of resource insufficiency, the LSP with a higher priority
can preempt the bandwidth of the LSP with a lower priority to meet the requirements of the service with a
higher priority. In addition, when an LSP fails or a node on the network is congested, MPLS TE can provide
protection through Fast Reroute (FRR) and a backup path. MPLS TE allows network administrators to deploy
LSPs to properly allocate network resources and prevent network congestion. As the number of LSPs
increases, you can use a dedicated offline tool to analyze traffic. As shown in Figure 1, MPLS TE sets up LSP1
over the path LSRG→LSRB→LSRC→LSRD→LSRI→LSRJ with the bandwidth of 80 Mbit/s and LSP2 over the
path LSRA→LSRB→LSRE→LSRF→LSRH→LSRI→LSRJ with the bandwidth of 40 Mbit/s. MPLS TE then directs
traffic to the two LSPs to prevent congestion.
Figure 1 MPLS TE
Function Description
Basic function: Basic MPLS TE functions include basic MPLS TE settings and the tunnel establishment capability.
Tunnel optimization: Tunnel optimization allows existing tunnels to be reestablished over other paths if the topology changes, or to be reestablished with updated bandwidth if service bandwidth values change.
Reliability: MPLS TE provides various reliability functions, including path protection, local protection, and node protection.
Security: RSVP authentication is implemented to improve the security of the signaling protocol on the MPLS TE network.
P2MP TE: P2MP TE is a promising solution to multicast service transmission. It helps carriers provide high TE capabilities and increased reliability on an IP/MPLS backbone network and reduce network operational expenditure (OPEX).
Purpose
TE techniques are common for carriers operating IP/MPLS bearer networks. These techniques can be used to
prevent traffic congestion and uneven resource allocation. Take the network shown in Figure 2 as an
example.
A node on a conventional IP network selects the shortest path as an optimal route, regardless of other
factors, for example, bandwidth. This easily causes the shortest path to be congested with traffic, whereas
other available paths are idle.
Each link on the network shown in Figure 2 has a bandwidth of 100 Mbit/s and the same metric value. LSRA
sends LSRJ traffic at 40 Mbit/s, and LSRG sends LSRJ traffic at 80 Mbit/s. Traffic from both routers travels
through the shortest path LSRA (LSRG) → LSRB → LSRC → LSRD → LSRI → LSRJ that is calculated by an
IGP. As a result, the path LSRA (LSRG) → LSRB → LSRC → LSRD → LSRI → LSRJ may be congested because
of overload, whereas the path LSRA (LSRG) → LSRB → LSRE → LSRF → LSRH → LSRI → LSRJ is idle.
Network congestion is a major cause of backbone network performance deterioration. It results either from insufficient resources or from incorrect local resource allocation. In the former case, expanding network devices can prevent the problem. In the latter case, TE can be used to direct some traffic to idle links so that traffic is distributed more evenly. TE dynamically monitors network traffic and the load on network elements and adjusts the parameters for traffic management, routing, and resource constraints in real time, preventing network congestion caused by load imbalance.
• IP traffic engineering: It controls network traffic by adjusting the metric of a path. This method
eliminates congestion only on some links. Adjusting a metric is difficult on a complex network because a
link change affects multiple routes.
• ATM traffic engineering: It uses an overlay network model and sets up virtual connections to guide
some traffic. The overlay model provides a virtual topology over the physical topology of a network,
which facilitates proper traffic scheduling and QoS. However, the overlay model has high extra
overhead, poor scalability, and high operation costs for carriers.
A scalable and simple solution is required to implement traffic engineering on a large-scale backbone
network. MPLS that uses an overlay model allows a virtual topology to be established over a physical
topology and maps traffic to the virtual topology. As such, MPLS TE, a technology that combines MPLS and
TE, is introduced.
Benefits
As a traffic engineering solution, MPLS TE offers the following advantages:
• Provides bandwidth and QoS guarantee for service traffic on the network.
• Establishes public network tunnels to isolate virtual private network (VPN) traffic.
Related Concepts
Concept Description
MPLS TE tunnel: MPLS TE often associates multiple LSPs with a virtual tunnel interface, and such a group of LSPs is called an MPLS TE tunnel. An MPLS TE tunnel is uniquely identified by the following parameters:
Tunnel interface: a P2P virtual interface that encapsulates packets. Similar to a loopback interface, a tunnel interface is a logical interface. A tunnel interface name consists of an interface type and number. The interface type is tunnel, and the interface number is expressed in the format of slot ID/card ID/interface ID.
Tunnel ID: a decimal number that uniquely identifies an MPLS TE tunnel, facilitating tunnel planning and management. A tunnel ID must be specified when an MPLS TE tunnel interface is configured.
A primary LSP with LSP ID 2 is established along the path LSRA → LSRB → LSRC → LSRD → LSRE on the network shown in Figure 1. A backup LSP with LSP ID 1024 is established along the path LSRA → LSRF → LSRG → LSRH → LSRE. The two LSPs are in an MPLS TE tunnel named Tunnel1 with tunnel ID 100.
CR-LSP: LSPs in an MPLS TE tunnel are generally called constraint-based routed label switched paths (CR-LSPs). Unlike Label Distribution Protocol (LDP) LSPs, which are established based only on routing information, CR-LSPs are established based on bandwidth and path constraints in addition to routing information.
2. Path calculation component: The path calculation component runs the Constraint Shortest Path First (CSPF) algorithm and uses data in the TEDB to calculate a path that satisfies specific constraints. Evolving from the Shortest Path First (SPF) algorithm, CSPF excludes nodes and links that do not satisfy specific constraints and uses SPF to calculate a path.
3. Path establishment component: The path establishment component establishes the following types of CR-LSPs:
Static CR-LSP: Static CR-LSPs are set up by manually configuring forwarding information and resource information, independent of signaling protocols and path calculation. Setting up a static CR-LSP consumes few resources because no MPLS control packets are exchanged between the two ends of the CR-LSP. Static CR-LSPs cannot be adjusted dynamically when the network topology changes; therefore, static CR-LSPs generally apply to small-scale networks with simple topologies.
Dynamic CR-LSP: Dynamic CR-LSPs are set up by the NE40E using Resource Reservation Protocol-Traffic Engineering (RSVP-TE) signaling information, which can carry constraint parameters such as the bandwidth, partial explicit routes, and colors. There is no need to manually configure each hop along a dynamic CR-LSP. Dynamic CR-LSPs apply to large-scale networks.
4. Traffic forwarding component: The traffic forwarding component imports traffic to MPLS TE tunnels and forwards the traffic based on MPLS. The preceding three components are enough for setting up an MPLS TE tunnel. However, an MPLS TE tunnel cannot automatically import traffic after being set up. Instead, it requires the traffic forwarding component to import traffic to the tunnel.
An MPLS TE network administrator only needs to configure link attributes based on link resource status and
tunnel attributes based on service needs and network planning. MPLS TE can then automatically establish
tunnels based on the configurations. After tunnels are set up and traffic import is configured, traffic can then
be forwarded along tunnels.
Related Concepts
The information advertisement component involves the following concepts:
Concept Description
Total link bandwidth: Total bandwidth of a physical link, which needs to be manually configured.
Maximum reservable bandwidth: Maximum bandwidth that a link can reserve for an MPLS TE tunnel. The maximum reservable bandwidth must be lower than or equal to the total bandwidth of the link. Manually configure the maximum reservable bandwidth according to the bandwidth usage of the link.
TE metric: A TE metric is used in TE tunnel path calculation, allowing the calculation process to be independent of IGP route-based path calculation. By default, the IGP metric is used as the TE metric.
SRLG: A shared risk link group (SRLG) is a set of links that share a common physical resource (such as a fiber). Links in an SRLG are at the same risk of faults. Specifically, if one of the links fails, the other links in the SRLG also fail. SRLG is mainly used in hot-standby CR-LSP and TE FRR scenarios to enhance TE tunnel reliability. For details about SRLG, see SRLG.
Link administrative group: A link administrative group, also called a link color, is a 128-bit vector. Each bit can be associated with a meaning, such as link bandwidth, a performance parameter (such as delay), or a management policy. The policy can be a traffic type (multicast, for example) or a flag indicating that a link is used by an MPLS TE tunnel. The link administrative group attribute is used together with affinities to control the paths of tunnels.
Contents to Be Advertised
The network resource information to be advertised includes:
• Link status information: interface IP addresses, link types, and link metric values, which are collected by
an IGP
• Bandwidth information, such as total link bandwidth and maximum reservable bandwidth
• TE metric: TE link metric, which is the same as the IGP metric by default
• SRLG
Advertisement Methods
Either of the following link status protocol extensions can be used to advertise TE information:
• IS-IS TE
• OSPF TE
OSPF TE and IS-IS TE automatically collect TE information and flood it to MPLS TE nodes.
• A CR-LSP fails to be established for an MPLS TE tunnel because adequate bandwidth cannot be reserved.
• Link attributes, such as the administrative group attribute or affinity attribute, change.
■ The proportion of the bandwidth reserved for an MPLS TE tunnel to the available bandwidth in the
TEDB is greater than or equal to a specific threshold.
■ The proportion of the bandwidth released by an MPLS TE tunnel to the available bandwidth in the
TEDB is greater than or equal to a specific threshold.
If either of the preceding conditions is met, an IGP floods link bandwidth information, and CSPF updates
the TEDB.
Assume that the available bandwidth of a link is 100 Mbit/s and 100 TE tunnels, each with bandwidth
of 1 Mbit/s, are established over the link. The flooding threshold is 10%. Figure 1 shows the proportion
of the bandwidth reserved for each MPLS TE tunnel to the available bandwidth in the TEDB.
Bandwidth flooding is not performed when tunnels 1 to 9 are created. After tunnel 10 is created, the
bandwidth information (10 Mbit/s in total) on tunnels 1 to 10 is flooded. The available bandwidth is 90
Mbit/s. Similarly, no bandwidth information is flooded after tunnels 11 to 18 are created. After tunnel
19 is created, bandwidth information of tunnels 11 to 19 is flooded. The process repeats until tunnel
100 is established.
Figure 1 Proportion of the bandwidth reserved for each MPLS TE tunnel to the available bandwidth in the
TEDB
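The threshold-based flooding behavior in the example above can be sketched as follows. This is an illustrative simulation under the stated assumptions (1 Mbit/s per tunnel, a 10% flooding threshold); the function and variable names are not device parameters.

```python
def simulate_flooding(link_bw=100, tunnel_bw=1, n_tunnels=100, threshold=0.10):
    """Simulate IGP bandwidth flooding triggered by a reservation threshold.

    Flooding occurs when the bandwidth reserved since the last flood,
    divided by the available bandwidth currently recorded in the TEDB,
    reaches the threshold. Returns the tunnel numbers that trigger floods.
    """
    tedb_available = link_bw   # available bandwidth the TEDB currently records
    pending = 0                # bandwidth reserved but not yet flooded
    flood_events = []
    for tunnel in range(1, n_tunnels + 1):
        pending += tunnel_bw
        if tedb_available > 0 and pending / tedb_available >= threshold:
            flood_events.append(tunnel)
            tedb_available -= pending   # TEDB updated with the flooded reservation
            pending = 0
    return flood_events
```

Running the simulation reproduces the example: no flood occurs for tunnels 1 to 9, the first flood occurs at tunnel 10 (10 Mbit/s against 100 Mbit/s), and the next at tunnel 19 (9 Mbit/s against the updated 90 Mbit/s).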
The TEDB and IGP link-state database (LSDB) are independent of each other. They have similarities and
differences:
• Similarities: The two types of databases both collect routing information flooded by IGPs.
• Differences: A TEDB contains TE information in addition to all the information in an LSDB. An IGP uses
information in an LSDB to calculate the shortest path, while MPLS TE uses information in a TEDB to
calculate the optimal path.
Related Concepts
The path calculation component involves the following concepts.
Concept Description
Tunnel bandwidth: Tunnel bandwidth needs to be planned and configured based on the services to be transmitted through a tunnel. When the tunnel is established, the configured bandwidth is reserved on each node along the tunnel, implementing bandwidth assurance.
Affinity: An affinity is a 128-bit vector that describes the links to be used by a TE tunnel. It is configured and implemented on the tunnel ingress and used together with the link administrative group attribute to manage link selection.
After a tunnel is assigned an affinity, a device compares the affinity with the administrative
group attribute during link selection. Based on the comparison result, the device
determines whether to select a link with specified attributes. The link selection criteria are
as follows:
The result of performing an AND operation between the IncludeAny affinity and the link
administrative group attribute is not 0.
The result of performing an AND operation between the ExcludeAny affinity and the link
administrative group attribute is 0.
IncludeAny = the affinity attribute value ANDed with the mask value; ExcludeAny = (NOT the affinity attribute value) ANDed with the mask value; the administrative group value used for comparison = the administrative group value ANDed with the mask value.
The mask of the affinity determines the link attributes to be checked by the device. In this
example, the bits with the mask of 1 are bits 11, 13, 14, and 16, indicating that these bits
need to be checked. The value of bit 11 in both the affinity and the administrative group
attribute of the link is 0 (not 1). In addition, the values of bits 13 and 16 in both the
affinity and the administrative group attribute of the link are 1. Therefore, the link matches
the affinity of the tunnel and can be selected for the tunnel.
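The affinity comparison rules above can be sketched as a bit-vector check. This is a simplified illustration of the IncludeAny/ExcludeAny rules as stated; as the note below cautions, actual comparison rules vary with vendors.

```python
def link_matches(affinity, mask, admin_group):
    """Check whether a link's administrative group satisfies a tunnel affinity.

    include_any: masked affinity bits, at least one of which must be set
                 on the link.
    exclude_any: masked complement bits, all of which must be clear on
                 the link.
    """
    include_any = affinity & mask
    exclude_any = (~affinity) & mask
    group = admin_group & mask        # only the masked bits are checked
    return (include_any & group) != 0 and (exclude_any & group) == 0
```

For example (hypothetical values), with affinity 0b1010 and mask 0b1110, a link colored 0b1010 matches, while a link colored 0b0100 is rejected because none of its bits overlap the IncludeAny set.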
NOTE:
Understand specific comparison rules before deploying devices of different vendors because the
comparison rules vary with vendors.
A network administrator can use the link administrative group and affinities to control the
paths over which MPLS TE tunnels are established.
Explicit path: An explicit path is used to establish a CR-LSP. Nodes to be included or excluded are specified on this path. Explicit paths are classified into the following types:
Strict explicit path
A hop is directly connected to its next hop on a strict explicit path. By specifying a strict
explicit path, the most accurate path is provided for a CR-LSP.
For example, a CR-LSP is set up between LSRA and LSRF on the network shown in Figure 2.
LSRA is the ingress, and LSRF is the egress. "X strict" specifies the LSR that the CR-LSP must
travel through. For example, "B strict" indicates that the CR-LSP must travel through LSRB,
and the previous hop of LSRB must be LSRA. "C strict" indicates that the CR-LSP must
travel through LSRC, and the previous hop of LSRC must be LSRB. This process continues hop by hop, providing the CR-LSP with a path on which every node is specified.
Loose explicit path
A loose explicit path contains specified nodes through which a CR-LSP must pass. Other
routers that are not specified can also exist on the CR-LSP.
For example, a CR-LSP is set up over a loose explicit path between LSRA and LSRF on the
network shown in Figure 3. LSRA is the ingress, and LSRF is the egress. "D loose" indicates that the CR-LSP must pass through LSRD, and that LSRD and LSRA may not be directly connected. This means that other LSRs may exist between LSRD and LSRA.
Hop limit: A hop limit is a condition for path selection during CR-LSP establishment. Like the administrative group and affinity attributes, it constrains path selection; it defines the maximum number of hops allowed on a CR-LSP.
CSPF Fundamentals
CSPF works based on the following parameters:
• TEDB
A TEDB can be generated only after IGP TE is configured. On an IGP TE-incapable network, CR-LSPs are established
based on IGP routes, but not CSPF calculation results.
1. Links that do not meet tunnel attribute requirements in the TEDB are excluded.
2. SPF calculates the shortest path to a tunnel destination based on TEDB information.
CSPF attempts to use the OSPF TEDB to establish a path for a CR-LSP by default. If a path is successfully calculated
using OSPF TEDB information, CSPF completes calculation and does not use the IS-IS TEDB to calculate a path. If path
calculation fails, CSPF attempts to use IS-IS TEDB information to calculate a path.
CSPF can be configured to use the IS-IS TEDB to calculate a CR-LSP path. If path calculation fails, CSPF uses
the OSPF TEDB to calculate a path.
CSPF calculates the shortest path to a destination. If there are several shortest paths with the same metric,
CSPF uses a tie-breaking policy to select one of them. The following tie-breaking policies for selecting a path
are available:
• Most-fill: selects a link with the highest proportion of used bandwidth to the maximum reservable
bandwidth, efficiently using bandwidth resources.
• Least-fill: selects a link with the lowest proportion of used bandwidth to the maximum reservable
bandwidth, evenly using bandwidth resources among links.
• Random: selects links randomly, allowing LSPs to be established evenly over links, regardless of
bandwidth distribution.
The most-fill and least-fill policies take effect only when the difference in bandwidth usage between two links exceeds 10%. For example, if the bandwidth usage of link A is 50% and that of link B is 45%, the difference is only 5%. In this case, the most-fill and least-fill policies do not take effect, and the random policy is used instead.
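The tie-breaking policies and the 10% rule can be sketched as follows. This is an illustrative model; the policy names and the usage-ratio representation are assumptions for the sketch.

```python
import random

def tie_break(candidates, policy="random", min_diff=0.10):
    """Select one link among equal-metric candidates.

    candidates: list of (link_name, bandwidth_usage_ratio) pairs.
    most-fill / least-fill apply only when the usage difference between
    the extremes exceeds min_diff; otherwise random selection is used,
    per the rule described above.
    """
    usages = [u for _, u in candidates]
    if policy in ("most-fill", "least-fill") and max(usages) - min(usages) > min_diff:
        key = max if policy == "most-fill" else min
        target = key(usages)
        return next(name for name, u in candidates if u == target)
    # Difference too small (or policy is random): pick randomly.
    return random.choice([name for name, _ in candidates])
```

With usage ratios of 50% and 30%, most-fill picks the busier link and least-fill the idler one; with 50% versus 45%, the 5% difference falls below the threshold and the choice is random.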
On the network shown in Figure 4, except the blue links and the links marked with a specific bandwidth value, all the links are black and have a bandwidth of 100 Mbit/s. In this topology, an MPLS TE tunnel needs to
be established. The constraints on this tunnel are: The destination is LSRE, the bandwidth is 80 Mbit/s, the
affinity is black, and a transit node is LSRH. The lower part of Figure 4 shows the topology in which links
that do not meet the constraints are removed.
CSPF calculates the path shown in Figure 5. CSPF differs from SPF in the following aspects:
• CSPF calculates the shortest path between the ingress and egress, and SPF calculates the shortest path
between a node and each of other nodes on a network.
• CSPF uses metrics such as the bandwidth, link attributes, and affinity attributes, in addition to link costs,
which are the only metric used by SPF.
• CSPF does not support load balancing and uses three tie-breaking policies to determine a path if
multiple paths have the same attributes.
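The two CSPF steps described earlier (prune links that fail the constraints, then run SPF over what remains) can be sketched as follows. This is a simplified illustration: links are modeled as unidirectional tuples, only bandwidth and a single color constraint are checked, and the names are assumptions, not a device data model.

```python
import heapq

def cspf(links, src, dst, min_bw, required_color=None):
    """CSPF sketch: prune constrained-out links, then Dijkstra (SPF).

    links: list of (u, v, metric, reservable_bw, color) tuples.
    Returns the node list of the computed path, or None if no path exists.
    """
    # Step 1: exclude links that do not satisfy the tunnel constraints.
    adj = {}
    for u, v, metric, bw, color in links:
        if bw < min_bw:
            continue
        if required_color is not None and color != required_color:
            continue
        adj.setdefault(u, []).append((v, metric))
    # Step 2: ordinary SPF (Dijkstra) over the pruned topology.
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, metric in adj.get(u, []):
            nd = d + metric
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return list(reversed(path))
```

For example, if the direct A-C link offers only 50 Mbit/s, an 80 Mbit/s tunnel request prunes it and the computed path detours through B; if no link can satisfy the bandwidth, no path is returned.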
RSVP-TE has the following characteristics:
• Unidirectional: RSVP-TE only takes effect on traffic that travels from the ingress to the egress.
• Receive end-oriented: A receive end initiates a request to reserve resources and maintains resource
reservation information.
• Soft state-based: RSVP uses a soft state mechanism to maintain the resource reservation information.
RSVP-TE Messages
RSVP-TE messages are as follows:
• Path message: used to request downstream nodes to distribute labels. A Path message records path
information on each node through which the message passes. The path information is used to establish
a path state block (PSB) on a node.
• Resv message: used to reserve resources at each hop of a path. A Resv message carries information
about resources to be reserved. Each node that receives the Resv message reserves resources based on
reservation information carried in the message. The reservation information is used to establish a
reservation state block (RSB) and to record information about distributed labels.
• PathErr message: sent upstream by an RSVP node if an error occurs during the processing of a Path
message. A PathErr message is forwarded by every transit node and arrives at the ingress.
• ResvErr message: sent downstream by an RSVP node if an error occurs during the processing of a Resv
message. A ResvErr message is forwarded by every transit node and arrives at the egress.
• PathTear message: sent downstream by the ingress to delete information about the local state created
on every node of the path.
• ResvTear message: sent upstream by the egress to delete the local reserved resources assigned to a
path. After receiving the ResvTear message, the ingress sends a PathTear message to the egress.
1. The ingress configured with RSVP-TE creates a PSB and sends a Path message to transit nodes.
2. After receiving the Path message, the transit node processes and forwards this message, and creates a
PSB.
3. After receiving the Path message, the egress creates a PSB, uses bandwidth reservation information in
the Path message to generate a Resv message, and sends the Resv message to the ingress.
4. After receiving the Resv message, the transit node processes and forwards the Resv message and
creates an RSB.
5. After receiving the Resv message, the ingress creates an RSB and confirms that the resources are
reserved successfully.
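The five-step signaling exchange above can be sketched as a walk of a Path message downstream followed by a Resv message upstream, with each hop recording its PSB and RSB. This is an illustrative state model only; the field names, the starting label value, and the Node class are assumptions, not RSVP-TE protocol structures.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Minimal model of RSVP-TE state held on one LSR."""
    name: str
    psb: dict = field(default_factory=dict)   # path state blocks, keyed by LSP ID
    rsb: dict = field(default_factory=dict)   # reservation state blocks

def signal_lsp(nodes, lsp_id, bandwidth):
    """Simulate Path (ingress -> egress) then Resv (egress -> ingress)."""
    # Path message: each hop records path info (previous hop, requested bw) in a PSB.
    for hop, node in enumerate(nodes):
        node.psb[lsp_id] = {"prev_hop": nodes[hop - 1].name if hop else None,
                            "bandwidth": bandwidth}
    # Resv message: each hop reserves resources and records label info in an RSB.
    label = 16  # an arbitrary starting label for the sketch
    for node in reversed(nodes):
        node.rsb[lsp_id] = {"reserved_bw": bandwidth, "out_label": label}
        label += 1
    # The ingress confirms success once every node holds both a PSB and an RSB.
    return all(lsp_id in n.psb and lsp_id in n.rsb for n in nodes)
```

Signaling a 40 Mbit/s LSP across an ingress, a transit node, and an egress leaves a PSB and an RSB on every node, with the ingress PSB recording no previous hop.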
Reservation Styles
A reservation style defines how a node reserves resources after receiving a request sent by an upstream
node. The NE40E supports the following reservation styles:
• Fixed filter (FF): defines a distinct bandwidth reservation for data packets from a particular transmit
end.
• Shared explicit (SE): defines a single reservation for a set of selected transmit ends. These senders share
one reservation but assign different labels to a receive end.
Background
RSVP Refresh messages are used to synchronize path state block (PSB) and reservation state block (RSB) information between nodes. They can also be used to monitor the reachability between RSVP neighbors and maintain RSVP neighbor relationships. Because Path and Resv messages are large, sending many of them to establish a large number of CR-LSPs consumes excessive network resources. RSVP Srefresh can be used to address this problem.
Implementation
RSVP Srefresh defines new objects based on the existing RSVP protocol.
Background
RSVP Refresh messages are used to synchronize path state block (PSB) and reservation state block (RSB)
information between nodes. They can also be used to monitor the reachability between RSVP neighbors and
maintain RSVP neighbor relationships.
Using Path and Resv messages to monitor neighbor reachability delays a traffic switchover if a link fault
occurs and therefore is slow. The RSVP Hello extension can address this problem.
Related Concepts
• RSVP Refresh messages: Although an MPLS TE tunnel is established using Path and Resv messages,
RSVP nodes still send Path and Resv messages over the established tunnel to update the RSVP status.
These Path and Resv messages are called RSVP Refresh messages.
• RSVP GR: ensures uninterrupted transmission on the forwarding plane while an AMB/SMB switchover is
performed on the control plane. A GR helper assists a GR restarter in rapidly restoring the RSVP status.
• TE FRR: a local protection mechanism for MPLS TE tunnels. If a fault occurs on a tunnel, TE FRR rapidly
switches traffic to a bypass tunnel.
Implementation
The principles of the RSVP Hello extension are as follows:
LSRA and LSRB are directly connected on the network shown in Figure 1.
• If RSVP Hello is enabled on LSRA, LSRA sends a Hello Request message to LSRB.
• After LSRB receives the Hello Request message and is also enabled with RSVP Hello, LSRB sends a
Hello ACK message to LSRA.
• After receiving the Hello ACK message, LSRA considers LSRB reachable.
If LSRA and LSRB are enabled with RSVP GR and the Hello extension detects that the neighbor LSRB is lost, LSRA waits for LSRB to send a Hello Request message carrying a GR extension. After receiving the message, LSRA starts the GR process for LSRB and sends a Hello ACK message to LSRB. After receiving the Hello ACK message, LSRB performs the GR process and restores the RSVP soft state. LSRA and LSRB then exchange Hello messages to maintain the restored RSVP soft state.
• If GR is disabled and FRR is enabled, FRR switches traffic to a bypass CR-LSP after the Hello extension detects that
the RSVP neighbor relationship is lost to ensure proper traffic transmission.
• If GR is enabled, the GR process is performed.
Deployment Scenarios
The RSVP Hello extension applies to networks enabled with both RSVP GR and TE FRR.
Static Route
A static route is the simplest method for directing traffic to a CR-LSP in an MPLS TE tunnel. A TE static route works in the same way as a common static route and uses a TE tunnel interface as its outbound interface.
Auto Route
An Interior Gateway Protocol (IGP) treats a CR-LSP in a TE tunnel as a logical link and uses the auto route related to the CR-LSP in path calculation. The tunnel interface is used as the outbound interface of the auto route. The TE tunnel is considered a P2P link with a specified metric value. The following auto routes are supported:
• IGP shortcut: A route related to a CR-LSP is not advertised to neighbor nodes, preventing other nodes
from using the CR-LSP.
• Forwarding adjacency: A route related to a CR-LSP is advertised to neighbor nodes, allowing these
nodes to use the CR-LSP.
Forwarding adjacency allows tunnel information to be advertised based on IGP neighbor relationships.
If the forwarding adjacency is used, nodes on both ends of a CR-LSP must be in the same area.
The following example demonstrates the IGP shortcut and forwarding adjacency.
A CR-LSP over the path LSRG → LSRF → LSRB is established on the network shown in Figure 1, and the TE
metric values are specified. Either of the following configurations can be used:
• The auto route is not used. LSRE uses LSRD as the next hop in a route to LSRA and a route to LSRB;
LSRG uses LSRF as the next hop in a route to LSRA and a route to LSRB.
• The auto route is used. Either IGP shortcut or forwarding adjacency can be configured:
■ The IGP shortcut is used to advertise the route of Tunnel 1. LSRE uses LSRD as the next hop in the
route to LSRA and the route to LSRB; LSRG uses Tunnel 1 as the outbound interface in the route to
LSRA and the route to LSRB. LSRG, unlike LSRE, uses Tunnel 1 in IGP path calculation.
■ The forwarding adjacency is used to advertise the route of Tunnel 1. LSRE uses LSRG as the next
hop in the route to LSRA and the route to LSRB; LSRG uses Tunnel 1 as the outbound interface in
the route to LSRA and the route to LSRB. Both LSRE and LSRG use Tunnel 1 in IGP path calculation.
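The difference between the two auto-route modes in the example above comes down to which nodes may use the tunnel in IGP path calculation. The following sketch captures just that distinction; the mode strings and function name are illustrative assumptions.

```python
def nodes_using_tunnel(mode, ingress, other_nodes):
    """Return the set of nodes that can use a CR-LSP in IGP path calculation.

    IGP shortcut: the route is not advertised, so only the tunnel ingress
    uses the CR-LSP. Forwarding adjacency: the route is advertised, so
    neighbor nodes can use the CR-LSP as well.
    """
    if mode == "igp-shortcut":
        return {ingress}
    if mode == "forwarding-adjacency":
        return {ingress} | set(other_nodes)
    raise ValueError("unknown auto-route mode")
```

In the example, with IGP shortcut only LSRG (the ingress of Tunnel 1) uses the tunnel, whereas with forwarding adjacency both LSRE and LSRG do.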
Policy-based Routing
Policy-based routing (PBR) allows the system to select routes based on user-defined policies, improving security and balancing traffic load. If PBR is enabled on an MPLS network, IP packets are forwarded over specific CR-LSPs based on PBR rules.
Like IP unicast PBR, MPLS TE PBR is implemented based on a set of matching rules and behaviors.
The rules and behaviors are defined using an apply clause, in which the outbound interface is a specific
tunnel interface. If packets do not match PBR rules, they are properly forwarded using IP; if they match PBR
rules, they are forwarded over specific CR-LSPs.
Tunnel Policy
Tunnel policies applied to virtual private networks (VPNs) guide VPN traffic to tunnels in either of the
following modes:
• Select-seq mode: The system selects tunnels for VPN traffic in the specified tunnel selection sequence.
• Tunnel binding mode: A CR-LSP is bound to a destination address in a tunnel policy. This policy applies
only to CR-LSPs.
If no path meets the bandwidth requirement of a desired tunnel, a device can tear down an
established tunnel and use the bandwidth resources assigned to that tunnel to establish the desired tunnel.
This is called preemption. The following preemption modes are supported:
• Hard preemption: A CR-LSP with a higher priority can directly seize resources assigned to a
CR-LSP with a lower priority. Some traffic on the lower-priority CR-LSP is dropped during the
hard preemption process, and the lower-priority CR-LSP is deleted immediately after its resources are
preempted.
• Soft preemption: A CR-LSP with a higher priority can directly preempt resources assigned to a CR-LSP
with a lower priority, but the CR-LSP with a lower priority is not deleted immediately after its resources
are preempted. During the soft preemption process, the bandwidth assigned to the CR-LSP with a lower
priority gradually decreases to 0 kbit/s. Some traffic is forwarded while some may be dropped on the
CR-LSP with a lower priority. The CR-LSP with a lower priority is deleted after the soft preemption timer
expires.
CR-LSPs use setup and holding priorities to determine whether to preempt resources. Both the setup and
holding priority values range from 0 to 7. The smaller the value, the higher the priority. If only the setup
priority is configured, the holding priority takes the same value. The setup priority of a tunnel must not
be higher than its holding priority.
The priority and preemption attributes are used in conjunction to determine resource preemption among
tunnels. If multiple CR-LSPs are to be established, CR-LSPs with high priorities can be established by
preempting resources. If resources (such as bandwidth) are insufficient, a CR-LSP with a higher setup priority
can preempt resources of an established CR-LSP with a lower holding priority.
Figure 1 shows the bandwidth of each link. Two TE tunnels are established.
• Tunnel 1: established over the path LSRA → LSRF → LSRD. Its bandwidth is 155 Mbit/s, and its setup
and holding priorities are higher than those of Tunnel 2.
• Tunnel 2: established over the path LSRB → LSRF → LSRC. Its bandwidth is 155 Mbit/s, and its setup
and holding priority values are 7.
If the link between LSRF and LSRD fails, LSRA recalculates a path LSRA → LSRF → LSRC → LSRE → LSRD for
Tunnel 1. The link between LSRF and LSRC is shared by Tunnels 1 and 2, but has insufficient bandwidth for
both tunnels. As a result, preemption is triggered.
• If hard preemption is used, since Tunnel 1 has a higher priority than Tunnel 2, LSRF sends an RSVP
message to tear down Tunnel 2. As a result, some traffic on Tunnel 2 is dropped if Tunnel 2 is
transmitting traffic.
• In soft preemption mode, after receiving a Resv message from LSRF, LSRB does not immediately tear
down the original Tunnel 2; instead, it reestablishes Tunnel 2 along the path LSRB → LSRD → LSRE → LSRC.
The original Tunnel 2 is torn down after the traffic switchover is complete.
Background
A tunnel affinity and a link administrative group attribute are 128-bit values expressed in hexadecimal notation. An IGP (IS-IS or
OSPF) advertises the administrative group attribute to devices in the same IGP area. RSVP-TE advertises the
tunnel affinity to downstream devices. CSPF on the ingress checks whether administrative group bits match
affinity bits to determine whether a link can be used to establish an LSP.
Hexadecimal calculations are complex, and maintaining and querying tunnels established using hexadecimal
calculations are difficult. To address this issue, the NE40E allows you to assign different names (such as
colors) for the 128 bits in the affinity attribute. Naming affinity bits helps verify that tunnel affinity bits
match link administrative group bits, facilitating network planning and deployment.
Implementation
An affinity name template can be configured to manage the mapping between affinity bits and names. On
an MPLS network, you are advised to configure the same template for all nodes, because inconsistent
configuration may cause a service deployment failure. As shown in Figure 1, the affinity bits are named
using colors. For example, bit 1 is named "red", bit 4 is "blue", and bit 6 is "brown." You can name each of
the 128 affinity bits differently.
Bits in a link administrative group must also be configured with the same names as the affinity bits.
After naming affinity bits, you can determine which links a CR-LSP can include or exclude on the ingress.
Rules for selecting links for path calculation are as follows:
• IncludeAny: CSPF includes a link in path calculation if at least one of the link's administrative group
bits has the same name as an affinity bit.
• ExcludeAny: CSPF excludes a link from path calculation if any of the link's administrative group bits has
the same name as an affinity bit.
• IncludeAll: CSPF includes a link in path calculation only if, for every affinity bit, the link's
administrative group has a bit with the same name.
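These three selection rules reduce to simple set operations on the named bits. The following sketch is illustrative only (the function name and dict-free representation are mine, not NE40E configuration):

```python
def link_usable(affinity: set[str], admin_group: set[str], rule: str) -> bool:
    """Decide whether CSPF may use a link, given named affinity bits.

    affinity:    names of the bits set in the tunnel's affinity attribute
    admin_group: names of the bits set in the link's administrative group
    """
    if rule == "include-any":
        # Usable if at least one administrative group bit matches an affinity bit.
        return bool(affinity & admin_group)
    if rule == "exclude-any":
        # Unusable if any administrative group bit matches an affinity bit.
        return not (affinity & admin_group)
    if rule == "include-all":
        # Usable only if every affinity bit has a same-named administrative group bit.
        return affinity <= admin_group
    raise ValueError(f"unknown rule: {rule}")
```

For example, with affinity bits {"red", "blue"}, a link whose administrative group sets only "red" passes include-any but fails include-all.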
Usage Scenarios
The affinity naming function is used when CSPF calculates paths over which RSVP-TE establishes CR-LSPs.
Benefits
The affinity naming function allows you to easily and rapidly use affinity bits to control paths over which CR-
LSPs are established.
Background
A main function of MPLS TE tunnels is to optimize traffic distribution over a network. Generally, the initial
bandwidth of an MPLS TE tunnel is configured based on the initial bandwidth requirement of services, and
its path is calculated and set up based on the initial network status. However, the network topology changes in
some cases, which may waste bandwidth or require traffic distribution to be optimized. In such cases, MPLS TE
tunnel re-optimization is required.
Implementation
Tunnel re-optimization allows the ingress to re-optimize a CR-LSP based on certain events so that the CR-
LSP can be established over the optimal path with the smallest metric value.
• If the fixed filter (FF) resource reservation style is used, tunnel re-optimization cannot be configured.
• Tunnel re-optimization is performed based on tunnel path constraints. During path calculation for re-optimization,
path constraints, such as explicit path constraints and bandwidth constraints, are also considered.
Re-optimization is classified into the following types based on the triggering mode:
• Automatic re-optimization
An interval at which a tunnel is re-optimized is configured on the ingress. When the interval elapses,
CSPF attempts to calculate a new path. If the calculated path has a metric smaller than that of the
existing CR-LSP, a new CR-LSP is set up over the new path. After the CR-LSP is successfully set up, the
ingress instructs the forwarding plane to switch traffic to the new CR-LSP and tears down the original
CR-LSP. Re-optimization is then complete. If the CR-LSP fails to be set up, traffic is still forwarded along
the original CR-LSP.
• Manual re-optimization
The re-optimization command is run in the user view to trigger re-optimization on the tunnel ingress.
The make-before-break mechanism is used to ensure uninterrupted service transmission during the re-
optimization process. This means that a new CR-LSP must be established first. Traffic is switched to the new
CR-LSP before the original CR-LSP is torn down.
Background
MPLS TE tunnels are used to optimize traffic distribution over a network. A tunnel's bandwidth is initially
set to meet the requirement of the maximum volume of services to be transmitted over the tunnel, ensuring
uninterrupted transmission. When actual traffic fluctuates well below this maximum, the reserved bandwidth
is wasted; automatic bandwidth adjustment is used to prevent this waste.
Related Concepts
Automatic bandwidth adjustment allows the ingress to dynamically detect bandwidth changes and
periodically attempt to reestablish a tunnel with the needed bandwidth.
Sampling frequency (B): the interval at which traffic rates on a specific tunnel interface are sampled.
This value takes the larger of the values set using the mpls te timer auto-bandwidth command and the
set flow-stat interval command.
Implementation
Automatic bandwidth adjustment is enabled on a tunnel interface of the ingress. The automatic bandwidth
adjustment procedure on the ingress is as follows:
1. Samples traffic.
The ingress starts a bandwidth adjustment timer (A) and samples traffic at a specific interval (B
seconds) to obtain the instantaneous bandwidth during each sampling period. The ingress records the
instantaneous bandwidths.
2. Calculates the average bandwidth.
When the bandwidth adjustment timer expires, the ingress calculates the average bandwidth (D) based
on the recorded instantaneous bandwidths.
3. Calculates a path.
The ingress runs CSPF to calculate a path with bandwidth D and establishes a new CR-LSP over that
path.
The preceding procedure repeats each time automatic bandwidth adjustment is triggered. Bandwidth
adjustment is not needed if traffic fluctuates below a specific threshold. The ingress calculates an average
bandwidth after the sampling interval time elapses. The ingress performs automatic bandwidth adjustment if
the ratio of the difference between the average and existing bandwidths to the existing bandwidth exceeds a
specific threshold. The following inequality applies:
[ |(D - C)| / C ] x 100% > Threshold
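The trigger condition is a one-line check; the sketch below is illustrative only (the function name is mine):

```python
def needs_adjustment(average_bw: float, current_bw: float, threshold_pct: float) -> bool:
    """Evaluate [ |D - C| / C ] x 100% > Threshold, where D is the average
    sampled bandwidth and C is the tunnel's current bandwidth."""
    return abs(average_bw - current_bw) / current_bw * 100 > threshold_pct
```

For example, with a 20% threshold, an average of 130 Mbit/s against a current 100 Mbit/s reservation (a 30% deviation) triggers adjustment, while 110 Mbit/s (10%) does not.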
Other Usage
The following functions are supported based on automatic bandwidth adjustment:
• The ingress only samples traffic on a tunnel interface, and does not perform bandwidth adjustment.
• The upper and lower limits can be set to define a range, within which the bandwidth can fluctuate.
Background
MPLS TE provides various traffic engineering and reliability functions, and MPLS TE applications keep
increasing; so does the complexity of MPLS TE tunnel configurations. Manually configuring full-meshed TE
tunnels on a large network is laborious and time-consuming. To address these issues, the HUAWEI NE40E-M2 series
implements the IP-prefix tunnel function. This function uses an IP prefix list to automatically establish a
number of tunnels to specified destination IP addresses and applies a tunnel template that contains public
attributes to these tunnels. MPLS TE tunnels that meet expectations can be established in a batch.
Benefits
The IP-prefix tunnel function allows you to establish MPLS TE tunnels in a batch. This function satisfies
various configuration requirements, such as reliability requirements, and reduces TE network deployment
workload.
Implementation
The IP-prefix tunnel implementation is as follows:
1. Configure an IP prefix list that specifies the destination IP addresses of the tunnels to be established.
2. Configure a tunnel template that contains the public tunnel attributes.
3. Use the template to automatically establish MPLS TE tunnels to the specified destination IP addresses.
The IP-prefix tunnel function uses the IP prefix list to filter LSR IDs in the traffic engineering database
(TEDB). Only the LSR IDs that match the IP prefix list can be used as destination IP addresses of MPLS TE
tunnels that are to be automatically established. After LSR IDs in the TEDB are added or deleted, the IP-
prefix tunnel function automatically creates or deletes tunnels, respectively. The tunnel template that the IP-
prefix tunnel function uses contains various configured attributes, such as the bandwidth, priorities, affinities,
TE FRR, CR-LSP backup, and automatic bandwidth adjustment. The attributes are shared by MPLS TE tunnels
that are established in a batch.
12.4.5.1 Make-Before-Break
The make-before-break mechanism prevents traffic loss during a traffic switchover between two CR-LSPs.
This mechanism improves MPLS TE tunnel reliability.
Background
MPLS TE provides a set of tunnel update mechanisms, which prevents traffic loss during tunnel updates. In
real-world situations, an administrator can modify the bandwidth or explicit path attributes of an established
MPLS TE tunnel based on service requirements. An updated topology allows for a path better than the
existing one, over which an MPLS TE tunnel can be established. Any change in bandwidth or path attributes
causes a CR-LSP in an MPLS TE tunnel to be reestablished using new attributes and causes traffic to switch
from the previous CR-LSP to the newly established CR-LSP. During the traffic switchover, the make-before-break
mechanism prevents the traffic loss that would occur if the original CR-LSP were torn down before the new
CR-LSP is ready to carry traffic.
Principles
Make-before-break is a mechanism that allows a CR-LSP to be established using changed bandwidth and
path attributes over a new path before the original CR-LSP is torn down. It helps minimize data loss and
additional bandwidth consumption. The new CR-LSP is called a modified CR-LSP. Make-before-break is
implemented using the shared explicit (SE) resource reservation style.
The new CR-LSP competes with the original CR-LSP on some shared links for bandwidth. The new CR-LSP
cannot be established if it fails the competition. The make-before-break mechanism allows the system to
reserve bandwidth used by the original CR-LSP for the new CR-LSP, without calculating the bandwidth to be
reserved. Additional bandwidth is used if links on the new path do not overlap the links on the original path.
On the network shown in Figure 1, the maximum reservable bandwidth on each link is 60 Mbit/s. A CR-LSP
is established along the path LSRA → LSRB → LSRC → LSRD, with a bandwidth of 40 Mbit/s.
The path is expected to change to LSRA → LSRE → LSRC → LSRD to forward data because LSRE has a light
load. The reservable bandwidth of the link between LSRC and LSRD is just 20 Mbit/s. The total available
bandwidth for the new path is less than 40 Mbit/s. The make-before-break mechanism can be used in this
situation.
The make-before-break mechanism allows the newly established CR-LSP over the path LSRA → LSRE →
LSRC → LSRD to use the bandwidth of the original CR-LSP's link between LSRC and LSRD. After the new CR-
LSP is established over the path, traffic switches to the new CR-LSP, and the original CR-LSP is torn down.
In addition to changing the path, make-before-break can be used to increase a tunnel's bandwidth. A new
CR-LSP can be established as long as each shared link can provide the additional bandwidth beyond the
original reservation.
In the example shown in Figure 1, the maximum reservable bandwidth on each link is 60 Mbit/s. A CR-LSP
along the path LSRA → LSRB → LSRC → LSRD is established, with the bandwidth of 30 Mbit/s.
The path is expected to change to LSRA → LSRE → LSRC → LSRD to forward data because LSRE has a light
load, and the bandwidth is expected to increase to 40 Mbit/s. The reservable bandwidth of the link between
LSRC and LSRD is just 30 Mbit/s. The total available bandwidth for the new path is less than 40 Mbit/s. The
make-before-break mechanism can be used in this situation.
The make-before-break mechanism allows the newly established CR-LSP over the path LSRA → LSRE →
LSRC → LSRD to use the bandwidth of the original CR-LSP's link between LSRC and LSRD. The bandwidth of
the new CR-LSP is 40 Mbit/s, out of which 30 Mbit/s is released by the link between LSRC and LSRD. After
the new CR-LSP is established, traffic switches to the new CR-LSP and the original CR-LSP is torn down.
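The shared-explicit bandwidth accounting in both examples can be sketched as follows. This is an illustrative model of the arithmetic only, not the RSVP-TE implementation; paths are given as node lists and links as ordered node pairs:

```python
def extra_bandwidth_needed(old_path, new_path, old_bw, new_bw):
    """Per-link extra bandwidth the new CR-LSP must reserve under the SE
    reservation style: links shared with the original CR-LSP reuse its
    reservation, so only the increase beyond old_bw is needed there;
    links that appear only on the new path need the full new_bw."""
    old_links = set(zip(old_path, old_path[1:]))
    extra = {}
    for link in zip(new_path, new_path[1:]):
        extra[link] = max(new_bw - old_bw, 0) if link in old_links else new_bw
    return extra
```

In the second example (30 Mbit/s raised to 40 Mbit/s), the shared LSRC → LSRD link needs only 10 Mbit/s of additional bandwidth, which fits within its 30 Mbit/s of reservable bandwidth.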
If an upstream node on an MPLS network is busy but its downstream node is idle or an upstream node is
idle but its downstream node is busy, a CR-LSP may be torn down before the new CR-LSP is established,
causing a temporary traffic interruption.
To prevent this temporary traffic interruption, switching and deletion delays are used together with the
make-before-break mechanism. In this case, traffic switches to the new CR-LSP only after a specified
switching delay elapses following the new CR-LSP's establishment, and the original CR-LSP is torn down
only after a specified deletion delay. Both delays can be manually configured.
12.4.5.2 TE FRR
Traffic engineering (TE) fast reroute (FRR) protects links and nodes on MPLS TE tunnels. If a link or node
fails, TE FRR rapidly switches traffic to a backup path, minimizing traffic loss.
Background
Generally, a link or node failure in an MPLS TE tunnel triggers a primary/backup CR-LSP switchover. During
the switchover, IGP routes converge to a backup CR-LSP, and CSPF recalculates a path over which the
primary CR-LSP can be reestablished. Traffic is dropped during this process.
TE FRR can be used to minimize traffic loss. It pre-establishes backup paths that bypass faulty links and
nodes. If a link or node on an MPLS TE tunnel fails, traffic can be rapidly switched to a backup path to
prevent traffic loss, without depending on IGP route convergence. In addition, when traffic is transmitted
along the backup path, the ingress will initiate the reestablishment of the primary path.
Benefits
TE FRR provides carrier-class local protection capabilities for MPLS TE, improving the reliability of an entire
network.
Related Concepts
Facility backup mode
In facility backup mode, TE FRR establishes a bypass tunnel for each link or node that may fail on a primary
tunnel, as shown in Figure 1. A bypass tunnel can protect traffic on multiple primary tunnels. In terms of the
protection granularity, facility backup enables tunnels to protect tunnels. This mode is extensible, resource
efficient, and easy to implement. However, bypass tunnels can only be manually planned and configured.
This is time-consuming and laborious on a complex network. The maintenance workload is also heavy.
In one-to-one backup mode, TE FRR automatically creates a backup CR-LSP on each possible node along a
primary CR-LSP to protect downstream links or nodes, as shown in Figure 2. In terms of the protection
granularity, one-to-one backup enables CR-LSPs to protect CR-LSPs. This mode is easy to configure,
eliminates manual network planning, and provides flexibility on a complex network. However, this mode has
low extensibility, requires each node to maintain backup CR-LSP status, and consumes more bandwidth.
• Bypass CR-LSP (facility backup): a backup CR-LSP that can protect multiple primary CR-LSPs. A bypass
CR-LSP and its primary CR-LSP belong to different tunnels.
• Detour CR-LSP (one-to-one backup): a backup CR-LSP that is automatically established on each node of a
primary CR-LSP. A detour CR-LSP and its primary CR-LSP belong to the same tunnel.
• PLR (facility backup and one-to-one backup): PLR is short for point of local repair. It is the ingress of a
bypass or detour CR-LSP. It must reside on a primary CR-LSP and can be the ingress or a transit node of
the primary CR-LSP, but cannot be its egress.
• DMP (one-to-one backup): DMP is short for detour merge point. It is an aggregation point of detour
CR-LSPs.
• Protected object:
■ Node protection: if a PLR and an MP are not directly connected, a backup CR-LSP protects the
direct link to the PLR and also the nodes between the PLR and MP. Both the bypass CR-LSP in
Figure 1 and Detour CR-LSP1 in Figure 2 provide node protection.
■ Link protection: if a PLR and an MP are directly connected, a backup CR-LSP protects only the
direct link to the PLR. Detour CR-LSP2 in Figure 2 provides link protection.
• Bandwidth guarantee (bandwidth protection):
■ Facility backup: it is recommended that the bandwidth of a bypass CR-LSP be less than or equal to
the bandwidth of the primary CR-LSP.
■ One-to-one backup: by default, a detour CR-LSP has the same bandwidth as its primary CR-LSP and
automatically provides bandwidth protection for the primary CR-LSP.
• Implementation (manual mode):
■ Facility backup: a bypass CR-LSP is manually configured.
■ One-to-one backup: manual mode is not supported.
In facility backup mode, an established bypass CR-LSP supports a combination of the above protection types. For
example, a bypass CR-LSP can implement manual, node, and bandwidth protection.
Implementation
Facility backup mode
In this mode, TE FRR is implemented as follows:
The PLR of the primary CR-LSP already knows the next hop (NHOP) and next-next hop (NNHOP). Link
protection can be provided if the egress LSR ID of the bypass CR-LSP is the same as the NHOP LSR ID.
Node protection can be provided if the egress LSR ID of the bypass CR-LSP is the same as the NNHOP
LSR ID. For example, in Figure 4, Bypass CR-LSP 1 protects a link, and Bypass CR-LSP 2 protects a
node.
If multiple bypass CR-LSPs are available on a node, the node selects a bypass CR-LSP based on the
following factors in sequence: bandwidth/non-bandwidth protection, implementation mode, and
protected object. Bandwidth protection takes precedence over non-bandwidth protection, node
protection takes precedence over link protection, and manual protection takes precedence over
automatic protection. If both Bypass CR-LSP 1 and Bypass CR-LSP 2 shown in Figure 4 are manually
configured and provide bandwidth protection, the primary CR-LSP selects Bypass CR-LSP 2 for binding.
If Bypass CR-LSP 1 provides bandwidth protection but Bypass CR-LSP 2 provides only path protection,
the primary CR-LSP selects Bypass CR-LSP 1 for binding.
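The selection preference can be expressed as a ranking over the candidate bypass CR-LSPs. This is a sketch under my own data representation (plain dicts), not NE40E internals; factors are compared in the sequence the text describes:

```python
def select_bypass(candidates):
    """Pick the bypass CR-LSP to bind, checking factors in sequence:
    bandwidth protection over non-bandwidth protection, manual
    configuration over automatic, and node protection over link
    protection. Lower rank tuples win."""
    def rank(lsp):
        return (
            0 if lsp["bandwidth_protection"] else 1,  # bandwidth first
            0 if lsp["manual"] else 1,                # then implementation mode
            0 if lsp["protects"] == "node" else 1,    # then protected object
        )
    return min(candidates, key=rank)
```

With both candidates manually configured and providing bandwidth protection, the node-protecting one wins; if only the link-protecting one offers bandwidth protection, it wins instead, matching the Figure 4 example.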
After a bypass CR-LSP is successfully bound to the primary CR-LSP, the NHLFE of the primary CR-LSP
is recorded. The NHLFE contains the NHLFE index of the bypass CR-LSP and the inner label assigned
by the MP for the previous node. The inner label is used to guide traffic forwarding during FRR
switching.
3. Fault detection
• In link protection, a data link layer protocol is used to detect and advertise faults. The fault
detection speed at the data link layer depends on link types.
• In node protection, a data link layer protocol is used to detect link faults. If no link fault occurs,
RSVP Hello detection or BFD for RSVP is used to detect faults in protected nodes.
In node protection, only the link between the protected node and PLR is protected. The PLR cannot detect faults
in the link between the protected node and MP.
4. Switchover
A switchover is a process that switches both service traffic and RSVP messages to a bypass CR-LSP and
notifies the upstream node of the switchover when a primary CR-LSP fails. During the switchover, the
MPLS label nesting mechanism is used. The PLR pushes the label that the MP assigns for the primary
CR-LSP as the inner label, and then the label for the bypass CR-LSP as the outer label. The
penultimate hop along the bypass CR-LSP removes the outer label from the packet and forwards the
packet only with the inner label to the MP. As the inner label is assigned by the MP, it can forward the
packet to the next hop on the primary CR-LSP.
Assume that a primary CR-LSP and a bypass CR-LSP are set up. Figure 5 describes the labels assigned
by each node on the primary CR-LSP and forwarding actions. The bypass CR-LSP provides node
protection. If LSRC or the link between LSRB and LSRC fails, traffic is switched to the bypass CR-LSP.
During the switchover, the PLR LSRB swaps 1024 for 1022 and then pushes label 34 as an outer label.
This ensures that the packet can be forwarded to the next hop after reaching LSRD. Figure 6 shows
the forwarding process.
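The label operations in this switchover can be traced with a small sketch, using the label values from the figures. This is an illustrative model only; the function names and stack representation (top of stack last) are mine:

```python
def plr_switchover(label_stack, mp_label_swap, bypass_outer_label):
    """PLR action during a facility-backup switchover: swap the top
    (primary CR-LSP) label for the label the MP assigned, then push the
    bypass CR-LSP's label as the outer label."""
    top = label_stack.pop()
    label_stack.append(mp_label_swap[top])   # e.g. swap 1024 -> 1022 (MP's label)
    label_stack.append(bypass_outer_label)   # e.g. push 34 (bypass outer label)
    return label_stack

def penultimate_hop_pop(label_stack):
    """The penultimate hop of the bypass CR-LSP removes the outer label, so
    the MP receives the packet carrying only the inner label it assigned."""
    label_stack.pop()
    return label_stack
```

Replaying the example: LSRB turns a stack of [1024] into [1022, 34]; after the penultimate hop pops the outer label, the MP sees [1022], a label it assigned itself, and forwards the packet along the primary CR-LSP.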
5. Switchback
After the switchover, the ingress of the primary CR-LSP attempts to reestablish the primary CR-LSP.
After the primary CR-LSP is successfully reestablished, service traffic and RSVP messages are switched
back from the bypass CR-LSP to the primary CR-LSP. The reestablished CR-LSP is called a modified CR-
LSP. In this process, TE FRR (including Auto FRR) adopts the make-before-break mechanism. With this
mechanism, the original primary CR-LSP is torn down only after the modified CR-LSP is set up
successfully.
• Link protection is provided if the detour CR-LSP's egress LSR ID is the same as the NHOP LSR ID.
(For example, Detour CR-LSP2 in Figure 7 provides link protection.)
• Node protection is provided if the detour CR-LSP's egress LSR ID is not the same as the NHOP
LSR ID (that is, other nodes exist between the PLR and MP). (For example, Detour CR-LSP1 in
Figure 7 provides node protection.)
If a PLR supports detour CR-LSPs that provide both link and node protection, the PLR can establish
only detour CR-LSPs that provide node protection.
3. Fault detection
• In link protection, a data link layer protocol is used to detect and advertise faults. The fault
detection speed at the data link layer depends on link types.
• In node protection, a data link layer protocol is used to detect link faults. If no link fault occurs,
RSVP Hello detection or BFD for RSVP is used to detect faults in protected nodes.
In node protection, only the link between the protected node and PLR is protected. The PLR cannot detect faults
in the link between the protected node and MP.
4. Switchover
A switchover is a process that switches both service traffic and RSVP messages to a detour CR-LSP and
notifies the upstream node of the switchover when a primary CR-LSP fails. During a switchover in this
mode, the MPLS label nesting mechanism is not used, and the label stack depth remains unchanged.
This is different from that in facility backup mode.
In Figure 7, a primary CR-LSP and two detour CR-LSPs are established. If no faults occur, traffic is
forwarded along the primary CR-LSP based on labels. If the link between LSRB and LSRC fails, LSRB
detects the link fault and switches traffic to Detour CR-LSP2 by swapping label 1024 for label 36 in a
packet and sending the packet to LSRE. LSRE is the DMP of these two detour CR-LSPs. On LSRE,
detour LSPs 1 and 2 merge into one detour CR-LSP (for example, Detour CR-LSP1). LSRE swaps the
existing label for label 37 and sends the packet to LSRC. On LSRC, Detour CR-LSP1 overlaps with the
primary CR-LSP. Therefore, LSRC uses the label of the primary CR-LSP and sends the packet to the
egress.
5. Switchback
After the switchover, the ingress of the primary CR-LSP attempts to reestablish the primary CR-LSP,
and service traffic and RSVP messages are switched back from the detour CR-LSP to the primary CR-
LSP after it is established successfully. The reestablished CR-LSP is called a modified CR-LSP. In this
process, TE FRR adopts the make-before-break mechanism. With this mechanism, the original primary
CR-LSP is torn down only after the modified CR-LSP is set up successfully.
Other Functions
When TE FRR is in the FRR in-use state, the RSVP messages sent by the transmit interface do not carry the
interface authentication TLV, and the receive interface does not perform interface authentication on the
RSVP messages that do not carry the authentication TLV and are in the FRR in-use state. In this case, you
can configure neighbor authentication.
Board removal protection: When the interface board where a primary CR-LSP's outbound interface resides is
removed from a PLR, MPLS TE traffic is rapidly switched to a backup path. When the interface board is re-
installed, MPLS TE traffic can be switched back to the primary path if the outbound interface of the primary
path is still available. Board removal protection protects traffic on the primary CR-LSP's outbound interface
of the PLR.
Without board removal protection, after an interface board on which a tunnel interface resides is removed,
tunnel information is lost. To prevent tunnel information loss, ensure that the interface board to be removed
does not have the following interfaces: primary CR-LSP's tunnel interface on the PLR, bypass CR-LSP's tunnel
interface, bypass CR-LSP's outbound interface, or detour CR-LSP's outbound interface. Configuring a TE
tunnel interface on a PLR's IPU is recommended.
After a TE tunnel interface is configured on the IPU, if the interface board on which the physical outbound
interface of the primary CR-LSP resides is removed or fails, the outbound interface enters the stale state and
the FRR-enabled primary CR-LSP that passes through the outbound interface is not deleted. When the
interface board is re-inserted, the interface becomes available, and the primary CR-LSP reestablishment
starts.
• Hot standby: A backup CR-LSP is set up immediately after a primary CR-LSP is set up. If the primary CR-
LSP fails, traffic switches to the backup CR-LSP. If the primary CR-LSP recovers, traffic switches back to
the primary CR-LSP. Hot-standby CR-LSPs support best-effort paths.
• Ordinary backup: A backup CR-LSP is set up only after the primary CR-LSP fails and then takes over
traffic from the primary CR-LSP. If the primary CR-LSP recovers, traffic switches back to the primary
CR-LSP.
Table 1 lists differences between hot-standby and ordinary CR-LSPs.
• When a backup CR-LSP is established: a hot-standby CR-LSP is created immediately after the primary
CR-LSP is established; an ordinary backup CR-LSP is created only after the primary CR-LSP fails.
• Path overlapping: for a hot-standby CR-LSP, whether the backup CR-LSP overlaps the primary CR-LSP's
path can be determined manually, and the backup CR-LSP can be set up over an explicit path if one is
allowed; an ordinary backup CR-LSP is allowed to use the path of the primary CR-LSP in any case.
• Best-effort path
The hot standby function supports the establishment of best-effort paths. If both the primary and hot-
standby CR-LSPs fail, a best-effort path is established and takes over traffic.
As shown in Figure 1, the primary CR-LSP uses the path PE1 -> P1 -> PE2, and the backup CR-LSP uses
the path PE1 -> P2 -> PE2. If both the primary and backup CR-LSPs fail, PE1 triggers the setup of a best-
effort path PE1 -> P2 -> P1 -> PE2.
A best-effort path does not provide reserved bandwidth for traffic. The affinity attribute and hop limit are
configured as needed.
• Automatic switchover: Traffic switches to a hot-standby CR-LSP from a primary CR-LSP when the
primary CR-LSP goes Down. If the primary CR-LSP goes Up again, traffic automatically switches back to
the primary CR-LSP. This is the default setting. You can determine whether to switch traffic back to the
primary CR-LSP and set a revertive switchover delay time.
• Manual switchover: You can manually trigger a traffic switchover. Forcibly switch traffic from the
primary CR-LSP to a hot-standby CR-LSP before some devices on a primary CR-LSP are upgraded or
primary CR-LSP parameters are adjusted. After the required operations are complete, manually switch
traffic back to the primary CR-LSP.
Path Overlapping
The path overlapping function can be configured for hot-standby CR-LSPs. This function allows a hot-
standby CR-LSP to use links of a primary CR-LSP. The hot-standby CR-LSP protects traffic on the primary CR-
LSP.
■ Fast reroute (FRR) is a partial protection mechanism used to protect a link or node on a CR-LSP. In
addition, FRR rapidly responds to a fault and takes effect temporarily, which minimizes the
switchover time.
Background
Most live IP radio access networks (RANs) use ring topologies and have the access ring separated from the
aggregation ring. Figure 1 illustrates an E2E VPN bearer solution. On this network, an inter-layer MPLS TE
tunnel is established between a cell site gateway (CSG) on the access ring and a radio service gateway (RSG)
on the aggregation ring. The MPLS TE tunnel implements E2E VPN service transmission. To meet high
reliability requirements for IP RAN bearer, hot standby is deployed for the TE tunnel, and the primary and
hot-standby CR-LSPs need to be separated.
However, the existing CSPF algorithm used by TE selects a CR-LSP with the smallest link metric and cannot
automatically calculate separated primary and hot-standby CR-LSPs. Assume that the TE metric of each link
is as shown in Figure 1. CSPF calculates the primary CR-LSP as CSG-ASG1-ASG2-RSG, but cannot calculate a
hot-standby CR-LSP that is completely separated from the primary CR-LSP. In fact, two completely
separated CR-LSPs do exist on the network: CSG-ASG1-RSG and CSG-ASG2-RSG.
Those two completely separated CR-LSPs can be obtained by specifying strict explicit paths. In real-world
situations, nodes are frequently added or deleted on an IP RAN. The method of specifying strict explicit paths
requires you to frequently modify path information, causing heavy O&M workload.
An ideal solution to the problem is to optimize CSPF path calculation so that CSPF can automatically
calculate separated primary and hot-standby CR-LSPs. To achieve this purpose, isolated CR-LSP computation
is introduced.
Implementation
Isolated CR-LSP computation uses the disjoint algorithm to optimize CSPF path calculation. On the network
shown in Figure 2, before the disjoint algorithm is used, CSPF selects CR-LSPs based on link metrics. It
calculates LSRA-LSRB-LSRC-LSRD as the primary CR-LSP, and then LSRA-LSRC-LSRD as the hot-standby CR-
LSP if the hot-standby overlap-path function is configured. These CR-LSPs, however, are not completely
separated.
After the disjoint algorithm is used, CSPF calculates the primary and backup CR-LSPs at the same time and
excludes the paths that may cause overlapping. Two completely separated CR-LSPs can then be calculated,
with the primary CR-LSP being LSRA-LSRB-LSRD, and the hot-standby CR-LSP being LSRA-LSRC-LSRD.
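The effect of the disjoint algorithm can be illustrated with a small sketch. The topology and metrics below are hypothetical (modeled on Figure 2), and a brute-force search over path pairs stands in for the real CSPF disjoint algorithm; the point is only that the primary and hot-standby CR-LSPs must be computed jointly rather than one after the other.

```python
from itertools import product

# Hypothetical topology modeled on Figure 2. Metrics are chosen so that
# the metric-best single path is LSRA-LSRB-LSRC-LSRD, although two fully
# link-disjoint paths (LSRA-LSRB-LSRD and LSRA-LSRC-LSRD) exist.
EDGES = {frozenset(e): m for e, m in [
    (("LSRA", "LSRB"), 1), (("LSRB", "LSRC"), 1), (("LSRC", "LSRD"), 1),
    (("LSRA", "LSRC"), 4), (("LSRB", "LSRD"), 4)]}

def simple_paths(edges, src, dst, seen=()):
    """Yield all loop-free paths from src to dst over the given edge set."""
    if src == dst:
        yield (dst,)
        return
    for e in edges:
        if src in e:
            (nxt,) = e - {src}
            if nxt not in seen:
                for tail in simple_paths(edges, nxt, dst, seen + (src,)):
                    yield (src,) + tail

def cost(path):
    return sum(EDGES[frozenset(h)] for h in zip(path, path[1:]))

def links(path):
    return {frozenset(h) for h in zip(path, path[1:])}

paths = sorted(simple_paths(EDGES, "LSRA", "LSRD"), key=cost)
greedy_primary = paths[0]   # metric-best path: LSRA-LSRB-LSRC-LSRD

# Sequential computation dead-ends: with the greedy primary's links
# removed, no second path from LSRA to LSRD remains at all.
rest = {e: m for e, m in EDGES.items() if e not in links(greedy_primary)}
assert not list(simple_paths(rest, "LSRA", "LSRD"))

# Joint computation succeeds: pick the cheapest fully link-disjoint pair.
primary, standby = min(
    ((p, q) for p, q in product(paths, paths) if not links(p) & links(q)),
    key=lambda pq: cost(pq[0]) + cost(pq[1]))
print(primary)   # ('LSRA', 'LSRB', 'LSRD')
print(standby)   # ('LSRA', 'LSRC', 'LSRD')
```

Note how the sequential approach fails outright here, while the joint search finds the two fully separated CR-LSPs named in the text.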
• CSPF calculates separate primary and hot-standby CR-LSPs only when the network topology permits. If there are
no two completely separate CR-LSPs, CSPF calculates the primary and hot-standby CR-LSPs based on the original
CSPF algorithm.
• The disjoint algorithm is mutually exclusive with the explicit path and hop limit. Ensure that these features are not
deployed before enabling the disjoint algorithm. If this algorithm has been enabled, these features cannot be
deployed.
• After you enable the disjoint algorithm, the shared risk link group (SRLG), if configured, becomes ineffective.
• If an affinity constraint is configured, the disjoint algorithm takes effect only when the primary and backup CR-
LSPs have the same affinity property or no affinity property is configured for the primary and backup CR-LSPs.
Application Scenarios
This feature applies to scenarios where RSVP-TE tunnels and hot standby are deployed.
Benefits
Isolated CR-LSP computation enables CSPF to isolate the primary and hot-standby CR-LSPs if possible. This
feature brings the following benefits:
• Reduces the maintenance workload as explicit path information does not need to be maintained.
The association between CR-LSP establishment and the IS-IS overload function ensures that MPLS TE
traffic travels properly along the CR-LSP, thereby improving CR-LSP reliability and service transmission quality.
Background
If a device is unable to store new link state protocol data units (LSPs) or use LSPs to update its link state
database (LSDB) information, the device will calculate incorrect routes, causing forwarding failures. The IS-IS
overload function enables the device to enter the IS-IS overload state to prevent such forwarding
failures. By configuring the ingress to establish a CR-LSP that excludes the overloaded IS-IS device, the
association between CR-LSP establishment and the IS-IS overload function helps the CR-LSP reliably transmit
MPLS TE traffic.
Related Concepts
IS-IS overload state
When a device cannot store new LSPs or use them to update its LSDB information, the device will
incorrectly calculate IS-IS routes. In this situation, the device enters the overload state. For example, an
IS-IS device becomes overloaded if its memory resources decrease to a specified threshold or if an exception
occurs on the device. A device can be manually configured to enter the IS-IS overload state.
Implementation
In Figure 1, RT1 supports the association between CR-LSP establishment and the IS-IS overload function. RT3
and RT4 support the IS-IS overload function.
In Figure 1, devices RT1 to RT4 are in an IS-IS area. RT1 establishes a CR-LSP named Tunnel1 destined for
RT2 along the path RT1 -> RT3 -> RT2. Association between the CR-LSP establishment and IS-IS overload is
implemented as follows:
1. If RT3 enters the IS-IS overload state, IS-IS propagates packets carrying overload information in the IS-
IS area.
2. RT1 determines that RT3 is overloaded and re-calculates the CR-LSP destined for RT2.
3. RT1 calculates a new path RT1 -> RT4 -> RT2, which bypasses the overloaded IS-IS node. Then RT1
establishes a new CR-LSP along this path.
4. After the new CR-LSP is established, RT1 switches traffic from the original CR-LSP to the new CR-LSP,
ensuring service transmission quality.
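The recalculation in steps 2 and 3 amounts to excluding any node that advertises the overload state before searching for a path. The topology and the first_path helper below are hypothetical, purely for illustration.

```python
# Hypothetical topology from Figure 1: RT1 reaches RT2 via RT3 or RT4.
TOPOLOGY = {"RT1": ["RT3", "RT4"], "RT3": ["RT2"], "RT4": ["RT2"], "RT2": []}

def first_path(topo, src, dst, excluded, path=()):
    """Depth-first search for a path that avoids all excluded nodes."""
    if src in excluded:
        return None
    if src == dst:
        return path + (dst,)
    for nxt in topo[src]:
        found = first_path(topo, nxt, dst, excluded, path + (src,))
        if found:
            return found
    return None

print(first_path(TOPOLOGY, "RT1", "RT2", set()))     # ('RT1', 'RT3', 'RT2')
print(first_path(TOPOLOGY, "RT1", "RT2", {"RT3"}))   # ('RT1', 'RT4', 'RT2')
```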
12.4.5.6 SRLG
The shared risk link group (SRLG) functions as a constraint that is used to calculate a backup path in the
scenario where CR-LSP hot standby or TE FRR is used. This constraint helps prevent backup and primary
paths from overlapping over links with the same risk level, improving MPLS TE tunnel reliability as a
consequence.
Background
Carriers use CR-LSP hot standby or TE FRR to improve MPLS TE tunnel reliability. However, in real-world
situations protection failures can occur, requiring the SRLG technique to be configured as a preventative
measure, as the following example demonstrates.
The primary tunnel is established over the path PE1 → P1 → P2 → PE2 on the network shown in Figure 1.
The link between P1 and P2 is protected by a TE FRR bypass tunnel established over the path P1 → P3 → P2.
In the lower part of Figure 1, core nodes P1, P2, and P3 are connected using a transport network device.
They share some transport network links marked in yellow. If a fault occurs on a shared link, both the
primary and FRR bypass tunnels are affected, causing an FRR protection failure. An SRLG can be configured
to prevent the FRR bypass tunnel from sharing a link with the primary tunnel, ensuring that FRR properly
protects the primary tunnel.
Related Concepts
An SRLG is a set of links that share the same risk of failure: if one link in the group fails, the other links in
the group may also fail. If a link in this group is used by a hot-standby CR-LSP or FRR bypass tunnel while
the primary path also traverses the group, the hot-standby CR-LSP or FRR bypass tunnel cannot provide protection.
Implementation
An SRLG link attribute is a number and links with the same SRLG number are in a single SRLG.
Interior Gateway Protocol (IGP) TE advertises SRLG information to all nodes in a single MPLS TE domain.
The constrained shortest path first (CSPF) algorithm uses the SRLG attribute together with other constraints,
such as bandwidth, to calculate a path.
The MPLS TE SRLG works in either of the following modes:
• Strict mode: The SRLG attribute is a necessary constraint used by CSPF to calculate a path for a hot-
standby CR-LSP or an FRR bypass tunnel.
• Preferred mode: The SRLG attribute is an optional constraint used by CSPF to calculate a path for a hot-
standby CR-LSP or FRR bypass tunnel. For example, if CSPF fails to calculate a path for a hot-standby
CR-LSP based on the SRLG attribute, CSPF recalculates the path, regardless of the SRLG attribute.
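The difference between the two modes can be sketched as follows. The candidate paths, SRLG numbers, and the pick_backup helper are hypothetical; the sketch only shows where strict and preferred modes diverge when no SRLG-disjoint path exists.

```python
def pick_backup(candidates, primary_srlgs, mode):
    """candidates: list of (path, srlg_set) tuples, sorted by metric."""
    srlg_free = [p for p, s in candidates if not (s & primary_srlgs)]
    if srlg_free:
        return srlg_free[0]       # both modes prefer an SRLG-disjoint path
    if mode == "strict":
        return None               # strict: SRLG is a mandatory constraint
    return candidates[0][0]       # preferred: recalculate ignoring SRLG

primary_srlgs = {100}             # P1-P2 shares transport links in SRLG 100
candidates = [
    (("P1", "P3", "P2"), {100}),  # bypass that shares the risky links
    (("P1", "P4", "P2"), set()),  # bypass over independent links
]
print(pick_backup(candidates, primary_srlgs, "strict"))     # ('P1', 'P4', 'P2')

risky_only = candidates[:1]       # suppose only the shared-risk bypass exists
print(pick_backup(risky_only, primary_srlgs, "strict"))     # None
print(pick_backup(risky_only, primary_srlgs, "preferred"))  # ('P1', 'P3', 'P2')
```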
Usage Scenario
The SRLG attribute is used in either the TE FRR or CR-LSP hot-standby scenario.
Benefits
The SRLG attribute limits the selection of a path for a hot-standby CR-LSP or an FRR bypass tunnel, which
prevents the primary and bypass tunnels from sharing links with the same risk level.
Related Concepts
As shown in Figure 1, concepts related to a tunnel protection group are as follows:
• Protection switchover: quickly switches traffic from a faulty working tunnel to a protection tunnel in a
tunnel protection group, which improves network reliability.
A primary tunnel (tunnel-1) and a protection tunnel (tunnel-2) are established on LSRA on the network
shown in Figure 1.
On LSRA (ingress), tunnel-2 is configured as a protection tunnel for tunnel-1 (primary tunnel). If the ingress
detects a fault in tunnel-1, traffic switches to tunnel-2, and LSRA attempts to reestablish tunnel-1. If tunnel-
1 is successfully established, LSRA determines whether to switch traffic back to the primary tunnel based on
the configured policy.
Implementation
An MPLS TE tunnel protection group uses a pre-configured protection tunnel to protect traffic on the
working tunnel, improving tunnel reliability. Therefore, network planning needs to be performed before
you deploy MPLS TE tunnel protection groups. To ensure optimal protection performance,
the protection tunnel must exclude the links and nodes through which the working tunnel passes.
Table 1 describes the implementation of a tunnel protection group.
1. Establishment: The working and protection tunnels must have the same ingress and destination
address. The tunnel establishment process is the same as that of an ordinary TE tunnel. The protection
tunnel can use attributes that differ from those of the working tunnel. To implement better protection,
ensure that the working and protection tunnels are established over different paths as much as possible.
NOTE: Attributes for a protection tunnel can be configured independently of those for the working tunnel.
2. Binding: After the tunnel protection group function is enabled for a working tunnel, the working tunnel
and protection tunnel are bound to form a tunnel protection group based on the tunnel ID of the
protection tunnel.
3. Fault detection: MPLS OAM/MPLS-TP OAM is used to detect faults in an MPLS TE tunnel protection
group, so that protection switching can be quickly triggered.
4. Protection switching: The tunnel protection group supports either of the following protection switching modes:
• Manual switching: The network administrator runs a command to forcibly switch traffic.
• Automatic switching: Traffic is automatically switched to the protection tunnel when a fault is detected
on the working tunnel. A switching interval can be set for automatic switching.
An MPLS TE tunnel protection group supports only bidirectional switching. Specifically, if a traffic
switchover is performed for traffic in one direction, a traffic switchover is also performed for traffic in
the opposite direction.
5. Switchback: After protection switching is complete, the system attempts to reestablish the working
tunnel. If the working tunnel is successfully established, the system determines whether to switch traffic
back to the working tunnel according to the configured switchback policy.
The following compares CR-LSP hot standby with an MPLS TE tunnel protection group.
Protected object: With CR-LSP hot standby, the primary and backup CR-LSPs belong to a single tunnel, and
the backup CR-LSP protects the primary CR-LSP. In a tunnel protection group, one tunnel protects another
entire tunnel.
LSP attributes: The primary and backup CR-LSPs have the same attributes (such as bandwidth, setup
priority, and hold priority), except for the TE FRR attribute. In contrast, the attributes of the tunnels in a
protection group are independent of each other. For example, a protection tunnel without bandwidth can
protect a working tunnel requiring bandwidth protection.
Figure 1 BFD
On the network shown in Figure 1, without BFD, if LSRE is faulty, LSRA and LSRF cannot immediately detect
the fault due to the existence of Layer 2 switches, and the Hello mechanism will be used for fault detection.
However, Hello mechanism-based fault detection is time-consuming.
To address these issues, BFD can be deployed. With BFD, if LSRE fails, LSRA and LSRF can detect the fault in
a short time, and traffic can be rapidly switched to the path LSRA -> LSRB -> LSRD -> LSRF.
BFD for TE can quickly detect faults on CR-LSPs. After detecting a fault on a CR-LSP, BFD immediately
notifies the forwarding plane of the fault to rapidly trigger a traffic switchover. BFD for TE is usually used
together with the hot-standby CR-LSP mechanism.
A BFD session is bound to a CR-LSP and established between the ingress and egress. A BFD packet is sent by
the ingress to the egress along the CR-LSP. Upon receipt, the egress responds to the BFD packet. The ingress
can rapidly monitor the link status of the CR-LSP based on whether a reply packet is received.
After detecting a link fault, BFD reports the fault to the forwarding module. The forwarding module searches
for a backup CR-LSP and switches service traffic to the backup CR-LSP. The forwarding module then reports
the fault to the control plane.
On the network shown in Figure 2, a BFD session is set up to detect faults on the link of the primary LSP. If a
fault occurs on this link, the BFD session on the ingress immediately notifies the forwarding plane of the
fault. The ingress switches traffic to the backup CR-LSP and sets up a new BFD session to detect faults on
the link of the backup CR-LSP.
On the network shown in Figure 3, a primary CR-LSP is established along the path LSRA -> LSRB, and a hot-
standby CR-LSP is configured. A BFD session is set up between LSRA and LSRB to detect faults on the link of
the primary CR-LSP. If a fault occurs on the link of the primary CR-LSP, the BFD session rapidly notifies LSRA
of the fault. After receiving the fault information, LSRA rapidly switches traffic to the hot-standby CR-LSP to
ensure traffic continuity.
Benefits
No tunnel protection is provided in the NG-MVPN over P2MP TE function or VPLS over P2MP TE function. If
a tunnel fails, traffic can only be switched through route change-induced hard convergence, which delivers
low performance. This function provides dual-root 1+1 protection for the NG-MVPN over P2MP TE function and
VPLS over P2MP TE function. If a P2MP TE tunnel fails, BFD for P2MP TE rapidly detects the fault and
switches traffic, which improves fault convergence performance and reduces traffic loss.
Principles
In Figure 1, BFD is enabled on the root node PE1 and the backup root node PE2. Leaf nodes UPE1 to UPE4
are enabled to passively create BFD sessions. Both PE1 and PE2 send BFD packets to all leaf nodes along P2MP TE
tunnels. The leaf nodes receive the BFD packets transmitted only on the primary tunnel. If a leaf node
receives detection packets within a specified interval, the link between the root node and leaf node is
working properly. If a leaf node fails to receive BFD packets within a specified interval, the link between the
root node and leaf node fails. The leaf node then rapidly switches traffic to a protection tunnel, which
reduces traffic loss.
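The leaf-side decision described above amounts to a detection timer: as with ordinary BFD, the detection time is the receive interval multiplied by the detection multiplier. The sketch below uses hypothetical interval values.

```python
def leaf_link_ok(last_rx_s, now_s, rx_interval_ms=10, multiplier=3):
    """True while BFD packets from the root arrive within the detection time.
    Interval and multiplier values here are illustrative, not product defaults."""
    detection_time_ms = rx_interval_ms * multiplier
    return (now_s - last_rx_s) * 1000 < detection_time_ms

print(leaf_link_ok(0.000, 0.020))  # True: 20 ms elapsed, within the 30 ms window
print(leaf_link_ok(0.000, 0.050))  # False: link declared down, switch tunnels
```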
Background
When a Layer 2 device is deployed on a link between two RSVP nodes, an RSVP node can only use the Hello
mechanism to detect a link fault. For example, on the network shown in Figure 1, a switch exists between P1
and P2. If a fault occurs on the link between the switch and P2, P1 keeps sending Hello packets and detects
the fault after it fails to receive replies to the Hello packets. The fault detection latency causes seconds of
traffic loss. To minimize packet loss, BFD for RSVP can be configured. BFD rapidly detects a fault and triggers
TE FRR switching, which improves network reliability.
Implementation
BFD for RSVP monitors RSVP neighbor relationships.
Unlike BFD for CR-LSP and BFD for TE that support multi-hop BFD sessions, BFD for RSVP establishes only
single-hop BFD sessions between RSVP nodes to monitor the network layer.
BFD for RSVP, BFD for OSPF, BFD for IS-IS, and BFD for BGP can share a BFD session. When protocol-specific
BFD parameters are set for a BFD session shared by RSVP and other protocols, the smallest values take
effect. The parameters include the minimum intervals at which BFD packets are sent, minimum intervals at
which BFD packets are received, and local detection multipliers.
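The parameter negotiation for a shared session can be expressed directly: for each parameter, the smallest value configured by any sharing protocol wins. The numbers below are hypothetical.

```python
def effective_bfd(params):
    """params: protocol -> (min_tx_ms, min_rx_ms, detect_multiplier).
    The smallest value of each parameter takes effect on the shared session."""
    return tuple(min(v[i] for v in params.values()) for i in range(3))

shared = {"rsvp": (10, 10, 3), "ospf": (100, 100, 4), "isis": (50, 50, 3)}
print(effective_bfd(shared))  # (10, 10, 3)
```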
Usage Scenario
BFD for RSVP applies to a network on which a Layer 2 device exists between the TE FRR point of local repair
(PLR) on a bypass CR-LSP and an RSVP node on the primary CR-LSP.
Benefits
BFD for RSVP improves reliability on MPLS TE networks with Layer 2 devices.
12.4.5.12 RSVP GR
RSVP graceful restart (GR) is a status recovery mechanism supported by RSVP-TE.
RSVP GR is designed based on non-stop forwarding (NSF). If a fault occurs on the control plane of a node,
the upstream and downstream neighbor nodes send messages to restore RSVP soft states, but the
forwarding plane does not detect the fault and is not affected. This helps stably and reliably transmit traffic.
RSVP GR uses the Hello extension to detect the neighboring nodes' GR status. For more information about
the Hello feature, see RSVP Hello.
RSVP GR principles are as follows:
On the network shown in Figure 1, if the restarter performs GR, it stops sending Hello messages to its
neighbors. If the GR-enabled helpers fail to receive three consecutive Hello messages, the helpers consider
that the restarter is performing GR and retain all forwarding information. In addition, the interface board
continues transmitting services and waits for the restarter to restore the GR status.
After the restarter restarts, if it receives Hello Path messages from helpers, it replies with Hello ACK
messages. The types of the Hello messages returned by the upstream and downstream nodes on a tunnel
are different:
• If an upstream helper receives a Hello message, it sends a GR Path message downstream to the
restarter.
• If a downstream helper receives a Hello message, it sends a Recovery Path message upstream to the
restarter.
Figure 1 Networking diagram for restoring the GR status by sending GR Path and Recovery Path messages
If both the GR Path and Recovery Path messages are received, the restarter creates a new path state block
(PSB) associated with the CR-LSP. This restores information about the CR-LSP on the control plane.
If no Recovery Path message is sent and only a GR Path message is received, the restarter creates a new
PSB associated with the CR-LSP based on the GR Path message. This also restores information about the CR-LSP
on the control plane.
The NE40E can only function as a GR Helper to help a neighbor node to complete RSVP GR.
12.4.5.13 Self-Ping
Self-ping is a connectivity check method for RSVP-TE LSPs.
Background
After an RSVP-TE LSP is established, the system sets the LSP status to up, without waiting for forwarding
relationships to be completely established between nodes on the forwarding path. If service traffic is
imported to the LSP before all forwarding relationships are established, some early traffic may be lost.
Self-ping can address this issue by checking whether an LSP can properly forward traffic.
Implementation
With self-ping enabled, an ingress constructs a UDP packet carrying an 8-byte session ID and adds an IP
header to the packet to form a self-ping IP packet. Figure 1 shows the format of a self-ping IP packet. In a
self-ping IP packet, the destination IP address is the LSR ID of the ingress, the source IP address is the LSR ID
of the egress, the destination port number is 8503, and the source port number is a variable ranging from
49152 to 65535.
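The packet layout described above can be sketched with Python's struct module. Only the fields named in the text (addresses, ports, and the 8-byte session ID) are modeled; everything else, including the function name and the bare-bones UDP header, is illustrative, and the real packet also carries a full IP header and the MPLS label stack.

```python
import struct

def build_self_ping(ingress_lsr_id, egress_lsr_id, session_id, src_port):
    """Return (source IP, destination IP, UDP datagram) for a self-ping packet."""
    assert 49152 <= src_port <= 65535          # source port range from the text
    payload = struct.pack("!Q", session_id)    # 8-byte session ID
    # UDP header: source port, destination port 8503, length, checksum (0 here).
    udp = struct.pack("!HHHH", src_port, 8503, 8 + len(payload), 0) + payload
    # Destination IP is the ingress LSR ID; source IP is the egress LSR ID.
    return egress_lsr_id, ingress_lsr_id, udp

src_ip, dst_ip, dgram = build_self_ping("1.1.1.1", "2.2.2.2", 0x1234, 50000)
print(src_ip, dst_ip, len(dgram))  # 2.2.2.2 1.1.1.1 16
```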
Figure 2 shows the self-ping process. In the example network, a P2P RSVP-TE tunnel is established from PE1
to PE2. Each of the numbers 100, 200, and 300 is an MPLS label assigned by a downstream node to its
upstream node through RSVP Resv messages.
Self-ping is enabled on PE1 (ingress). After PE1 receives a Resv message, it constructs a self-ping IP packet
and forwards the packet along the P2P RSVP-TE LSP. The outgoing label of the packet is 100, same as the
label carried in the Resv message. After the self-ping IP packet is forwarded to PE2 (egress) hop by hop, the
label is popped out, and the self-ping IP packet is restored.
The destination IP address of the packet is the LSR ID of PE1. PE2 searches the IP routing table for a route
matching the destination IP address of the self-ping IP packet, and then sends the packet to PE1 along the
matched route. After PE1 receives the self-ping IP packet, PE1 finds a P2P RSVP-TE LSP that matches the
session ID carried in the packet. If a matching LSP is found, PE1 considers the LSP normal, sets the LSP status
to up, and uses the LSP to transport traffic. The LSP self-ping test is then complete.
If PE1 does not receive the self-ping IP packet, it sends a new self-ping packet. If PE1 does not receive the
self-ping IP packet before the detection period expires, it considers the P2P RSVP-TE LSP faulty and does not
use the LSP to transport traffic.
Benefits
Self-ping detects the actual status of RSVP-TE LSPs, improving service reliability.
Principles
RSVP messages are sent over raw IP without any security mechanism. These messages are easy to modify,
and a device receiving them is exposed to attacks.
RSVP authentication prevents the following situations and improves device security:
• An unauthorized remote router sets up an RSVP neighbor relationship with the local router.
• A remote router constructs forged RSVP messages to set up an RSVP neighbor relationship with the
local router and launches attacks (such as maliciously reserving a large amount of bandwidth) against
the local router.
HMAC-MD5 authentication provides low security. For better security, keychain authentication with a more secure
algorithm, such as HMAC-SHA-256, is recommended.
Sequence number: Each packet is assigned a 64-bit monotonically increasing sequence number
before being sent, which prevents replay attacks. After receiving the packet, the remote node checks whether
the sequence number is within an allowable window. If the sequence number in the packet is smaller than the
lower limit of the window, the receiver considers the packet a replay packet and discards it.
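The acceptance rule can be sketched as a sliding window. The window size and the handling of in-window packets below are hypothetical simplifications (a real implementation also tracks duplicates inside the window); the sketch only shows the lower-limit check described above.

```python
class ReplayWindow:
    """Accept sequence numbers above the lower limit of a sliding window."""
    def __init__(self, window=32):               # window size: illustrative value
        self.window = window
        self.highest = None                      # highest sequence number seen
    def accept(self, seq):
        if self.highest is None or seq > self.highest:
            self.highest = seq
            return True
        return seq > self.highest - self.window  # below the window: replay

w = ReplayWindow(window=32)
print(w.accept(100))  # True: first packet
print(w.accept(101))  # True: monotonically increasing
print(w.accept(90))   # True: late, but still within the window
print(w.accept(10))   # False: below the window's lower limit, discarded
```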
RSVP authentication also introduces handshake messages. If a receiver receives the first packet from a
transmit end or packet mis-sequence occurs, handshake messages are used to synchronize the sequence
number windows between the RSVP neighboring nodes.
Authentication lifetime: Network flapping causes an RSVP neighbor relationship to be deleted and created
alternately. Each time the RSVP neighbor relationship is created, the handshake process is performed,
which delays the establishment of a CR-LSP. The RSVP authentication lifetime is introduced to resolve the
problem. If a network flaps, a CR-LSP is deleted and created. During the deletion, the RSVP neighbor
relationship associated with the CR-LSP is retained until the RSVP authentication lifetime expires.
• A single key is assigned to each interface and node. The key can be reconfigured but cannot be
changed.
A neighbor address-based key is associated with the IP address of an RSVP interface. The key takes
effect on the following packets:
■ Received packets with the source or next-hop address the same as the configured one
■ Sent packets with the destination or next-hop address the same as the configured one
On an RSVP node, if local interface-, neighbor node-, and neighbor address-based keys are all configured,
the neighbor address-based key takes precedence. If the neighbor address-based key fails, the neighbor
node-based key takes effect; if the neighbor node-based key also fails, the local interface-based key takes effect.
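This selection order is a simple fallback chain, sketched below. None stands for a key that is not configured or has failed; the function and key names are illustrative.

```python
def select_key(addr_key, node_key, intf_key):
    """Neighbor address-based key first, then node-based, then interface-based."""
    for key in (addr_key, node_key, intf_key):
        if key is not None:
            return key
    return None

print(select_key("K-addr", "K-node", "K-intf"))  # K-addr
print(select_key(None, "K-node", "K-intf"))      # K-node
print(select_key(None, None, "K-intf"))          # K-intf
```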
A specific RSVP authentication key is configured in a specific situation:
■ If multiple links or hops exist between two RSVP nodes, only a neighbor node-based key needs to
be configured, which simplifies the configuration. Two RSVP nodes authenticate all packets
exchanged between them based on the key.
■ On a TE FRR network, packets are exchanged on an indirect link between a Point of Local Repair
(PLR) node and a Merge Point (MP) node.
■ Two RSVP nodes cannot obtain the LSR ID of each other (for example, on an inter-domain
network).
12.4.7 DS-TE
12.4.7.1 Background
Figure 1 MPLS TE
delay for voice traffic, reduce the delay in processing voice packets on each hop. When traffic
congestion occurs, the more packets, the longer the queue, and the higher the delay in processing
packets. Therefore, you must restrict the voice traffic on each link.
In Figure 2, the bandwidth of each link is 100 Mbit/s, and all links share the same metric. Voice traffic is
transmitted from R1 to R4 and from R2 to R4 at the rate of 60 Mbit/s and 40 Mbit/s, respectively.
Traffic from R1 to R4 is transmitted along the LSP over the path R1 → R3 → R4, with the ratio of voice
traffic being 60% between R3 and R4. Traffic from R2 to R4 is transmitted along the LSP over the path
R2 → R3 → R7 → R4, with the ratio of voice traffic being 40% between R7 and R4.
If the link between R3 and R4 fails, as shown in Figure 3, the LSP between R1 and R4 changes to the
path R1 → R3 → R7 → R4 because this path is the shortest path with sufficient bandwidth. At this time,
the ratio of voice traffic from R7 to R4 reaches 100%, causing the overall delay of voice traffic to increase.
MPLS DS-TE combines MPLS TE and MPLS DiffServ to provide QoS guarantee.
The class type (CT) is used in DS-TE to allocate resources based on the service class. To provide
differentiated services, DS-TE divides the LSP bandwidth into one to eight parts, each part corresponding to
a CoS. Such a collection of bandwidths of an LSP or a group of LSPs with the same service class is called a
CT. DS-TE maps traffic with the same per-hop behavior (PHB) to one CT and allocates resources to each CT.
Defined by the IETF, DS-TE supports up to eight CTs, marked CTi, in which i ranges from 0 to 7.
If an LSP has a single CT, the LSP is also called a single-CT LSP.
DS Field
To implement DiffServ, the ToS field in an IPv4 header is redefined in relevant standards and then called the
Differentiated Services (DS) field. In the DS field, the six high-order bits are the DS codepoint (DSCP), and
the two low-order bits are reserved.
PHB
Per-Hop Behavior (PHB) is used to describe the next action on packets with the same DSCP. Commonly, PHB
contains traffic traits, such as delay and packet loss rate.
The IETF defines the existing three standard PHBs: Expedited Forwarding (EF), Assured Forwarding (AF), and
Best-Effort (BE). BE is the default PHB.
CT
To provide differentiated services, DS-TE divides the LSP bandwidth into one to eight parts, each part
corresponding to a CoS. Such a collection of bandwidths of an LSP or a group of LSPs with the same service
class is called a CT. A CT can transmit only the traffic of a CoS.
Defined by the IETF, DS-TE supports up to eight CTs, marked CTi, in which i ranges from 0 to 7.
TE-Class
A TE-class refers to a combination of a CT and a priority, in the format of <CT, priority>.
The priority is the priority of a CR-LSP in a TE-class mapping table, not the EXP value in the MPLS header.
The priority value is an integer ranging from 0 to 7. The smaller the value, the higher the priority. When
you create a CR-LSP, you can set its setup and holding priorities as well as CT bandwidth values. A CR-LSP
can be established only when both <CT, setup-priority> and <CT, holding-priority> exist in a TE-class
mapping table. Assume that the TE-class mapping table of a node contains only TE-Class [0] = <CT0, 6> and
TE-Class [1] = <CT0, 7>. Only the following three types of CR-LSPs can be successfully set up:
• CT = CT0, setup priority = 6, holding priority = 6
• CT = CT0, setup priority = 7, holding priority = 7
• CT = CT0, setup priority = 7, holding priority = 6
The combination of setup-priority = 6 and hold-priority = 7 does not exist because the setup priority cannot be higher
than the holding priority on a CR-LSP.
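The admission rule for this two-entry table can be sketched as follows; the helper is purely illustrative, but it reproduces the combinations discussed above.

```python
TE_CLASSES = {("CT0", 6), ("CT0", 7)}   # the two-entry mapping table above

def lsp_allowed(ct, setup, hold):
    """Both <CT, setup> and <CT, hold> must exist in the table, and the setup
    priority must not be higher (numerically lower) than the holding priority."""
    return ((ct, setup) in TE_CLASSES and (ct, hold) in TE_CLASSES
            and setup >= hold)

print(lsp_allowed("CT0", 6, 6))  # True
print(lsp_allowed("CT0", 7, 7))  # True
print(lsp_allowed("CT0", 7, 6))  # True
print(lsp_allowed("CT0", 6, 7))  # False: setup priority higher than holding
```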
CTs and priorities can be in any combination. Therefore, there are 64 TE-classes theoretically. The NE40E
supports a maximum of eight TE-classes, which are specified by users.
DS-TE Modes
DS-TE has two modes:
• IETF mode: The IETF mode is defined by the IETF and supports 64 TE-classes by combining 8 CTs and 8
priorities. The NE40E supports up to 8 TE-classes.
• Non-IETF mode: The non-IETF mode is not defined by the IETF and supports 8 TE-classes by combining
CT0 and 8 priorities.
BCM
The Bandwidth Constraints Model (BCM) defines the maximum number of bandwidth constraints
(BCs), which CTs can use the bandwidth of each BC, and how BC bandwidth can be used.
12.4.7.3 Implementation
Basic Implementation
A label edge router (LER) of a DiffServ domain sorts traffic into a small number of classes and marks class
information in the Differentiated Service Code Point (DSCP) field of packets. When scheduling and
forwarding packets, LSRs select Per-Hop Behaviors (PHBs) based on DSCP values.
The EXP field in the MPLS header carries DiffServ information. The key to implementing DS-TE is to map the
DSCP value (with a maximum of 64 values) to the EXP field (with a maximum of 8 values). Relevant
standards provide the following solutions:
• Label-Only-Inferred-PSC LSP (L-LSP): The discard priority is set in the EXP field, and the PHB type is
determined by labels. During forwarding, labels determine both the datagram forwarding path and the
scheduling behavior.
• EXP-Inferred-PSC LSP (E-LSP): The PHB type and the discard priority are set in the EXP field in an MPLS
label. During forwarding, labels determine the datagram forwarding path, and the EXP field determines
PHBs. E-LSPs are applicable to a network that supports no more than eight PHBs.
The NE40E supports E-LSPs. The mapping from the DSCP value to the EXP field complies with the definition
of relevant standards. The mapping from the EXP field to the PHB is manually configured.
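As an illustration of folding 64 DSCP values into 8 EXP values, a common convention is to take the three high-order bits of the DSCP. This is only a sketch of one possible mapping, not a statement of the NE40E's standard-defined mapping.

```python
def dscp_to_exp(dscp):
    """Fold a 6-bit DSCP into a 3-bit EXP value (one common convention)."""
    assert 0 <= dscp <= 63
    return dscp >> 3            # keep the three high-order bits

print(dscp_to_exp(46))  # 5: EF traffic
print(dscp_to_exp(0))   # 0: best-effort traffic
```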
The class type (CT) is used in DS-TE to allocate resources based on the class of traffic. DS-TE maps traffic
with the same PHB to one CT and allocates resources to each CT. Therefore, DS-TE LSPs are established
based on CTs. Specifically, when DS-TE calculates an LSP, it needs to take CTs and obtainable bandwidth of
each CT as constraints; when DS-TE reserves resources, it also needs to consider CTs and their bandwidth
requirements.
IGP Extension
To support DS-TE, related standards extend an IGP by introducing an optional sub-TLV (Bandwidth
Constraints sub-TLV) and redefining the original sub-TLV (Unreserved Bandwidth sub-TLV). This helps
inform and collect information about reservable bandwidths of CTs with different priorities.
RSVP Extension
To implement IETF DS-TE, the IETF extends RSVP by defining a CLASSTYPE object for the Path message in
related standards. For details about CLASSTYPE objects, see related standards.
After an LSR along an LSP receives an RSVP Path message carrying CT information, an LSP is established if
resources are sufficient. After the LSP is successfully established, the LSR recalculates the reservable
bandwidth of CTs with different priorities. The reservation information is sent to the IGP module to advertise
to other nodes on the network.
BCM
Currently, the IETF defines the following bandwidth constraint models (BCMs):
• Maximum Allocation Model (MAM): maps a BC to a CT. CTs do not share bandwidth resources. The BC
mode ID of the MAM is 1.
Figure 1 MAM
In the MAM, the sum of CTi LSP bandwidths does not exceed the BCi (0 ≤ i ≤ 7) bandwidth; the sum of
bandwidths of all LSPs of all CTs does not exceed the maximum reservable bandwidth of the link.
Assume that a link with the bandwidth of 100 Mbit/s adopts the MAM and supports three CTs (CT0,
CT1, and CT2). BC0 (20 Mbit/s) carries CT0 (BE flows); BC1 (50 Mbit/s) carries CT1 (AF flows); BC2 (30
Mbit/s) carries CT2 (EF flows). In this case, the total reserved LSP bandwidths that are used to transmit
BE flows cannot exceed 20 Mbit/s; the total reserved LSP bandwidths that are used to transmit AF flows
cannot exceed 50 Mbit/s; the total reserved LSP bandwidths that are used to transmit EF flows cannot
exceed 30 Mbit/s.
In the MAM, bandwidth preemption between CTs does not occur but some bandwidth resources may be
wasted.
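The MAM admission rules in this example can be sketched as follows. This is a minimal illustration of the two constraints described above (per-BC limit and link-wide limit); the function names and data layout are illustrative, not the device's implementation.

```python
# Sketch of MAM bandwidth admission (hypothetical simplification):
# each CTi draws only from its own BCi, and the total of all CTs
# must also stay within the link's maximum reservable bandwidth.

def mam_admit(reserved, bc, max_reservable, ct, request):
    """Return True if an LSP of class 'ct' requesting 'request'
    Mbit/s can be admitted under the MAM."""
    if reserved[ct] + request > bc[ct]:           # per-BC constraint
        return False
    if sum(reserved) + request > max_reservable:  # link-wide constraint
        return False
    reserved[ct] += request
    return True

# Values from the example: 100 Mbit/s link, BC0=20 (BE), BC1=50 (AF), BC2=30 (EF)
reserved = [0, 0, 0]
bc = [20, 50, 30]
print(mam_admit(reserved, bc, 100, 0, 15))  # True: CT0 now uses 15 of 20 Mbit/s
print(mam_admit(reserved, bc, 100, 0, 10))  # False: would exceed BC0
```

The second request fails even though the link still has spare capacity, which illustrates the bandwidth waste the MAM can incur.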
• Russian Dolls Model (RDM): allows CTs to share bandwidth resources. The BC mode ID of the RDM is 0.
The bandwidth of BC0 is less than or equal to the maximum reservable bandwidth of the link. Nesting
relationships exist among BCs. As shown in Figure 2, the bandwidth of BC7 is fixed; the bandwidth of
BC6 nests the bandwidth of BC7; this relationship applies to the other BCs, and therefore the bandwidth
of BC0 nests the bandwidth of all BCs. This model is similar to a Russian doll: A large doll nests a
smaller doll and then this smaller doll nests a much smaller doll, and so on.
Figure 2 RDM
Assume that a link with the bandwidth of 100 Mbit/s adopts the RDM and supports three BCs. CT0, CT1,
and CT2 are used to transmit BE flows, AF flows, and EF flows, respectively. The bandwidths of BC0,
BC1, and BC2 are 100 Mbit/s, 50 Mbit/s, and 20 Mbit/s, respectively. In this case, the total LSP
bandwidths that are used to transmit EF flows cannot exceed 20 Mbit/s; the total LSP bandwidths that
are used to transmit EF flows and AF flows cannot exceed 50 Mbit/s; the total LSP bandwidths that are
used to transmit BE, AF, and EF flows cannot exceed 100 Mbit/s.
The RDM allows bandwidth preemption among CTs. The preemption relationship among CTs is as
follows: in the case of 0 ≤ m < n ≤ 7 and 0 ≤ i < j ≤ 7, a CTi LSP with priority m can preempt the
bandwidth of a CTi LSP with priority n and the bandwidth of a CTj LSP with priority n. The total LSP
bandwidth of CTi, however, cannot exceed the bandwidth of BCi.
In the RDM, bandwidth resources are used efficiently.
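The nested RDM constraints in this example can be sketched as follows. The key simplification, matching the description above, is that each BCb bounds the combined bandwidth of all CTc with c ≥ b; the function names are illustrative only.

```python
# Sketch of RDM admission (hypothetical simplification): BCb bounds the
# combined bandwidth of all CTc with c >= b, giving the nested-doll model.

def rdm_admit(reserved, bc, ct, request):
    """Return True if an LSP of class 'ct' requesting 'request'
    Mbit/s fits every bandwidth constraint it is nested in."""
    for b in range(ct + 1):  # BC0 .. BCct all cover CTct
        nested = sum(reserved[c] for c in range(b, len(reserved)))
        if nested + request > bc[b]:
            return False
    reserved[ct] += request
    return True

# Values from the example: BC0=100 (BE+AF+EF), BC1=50 (AF+EF), BC2=20 (EF)
reserved = [0, 0, 0]
bc = [100, 50, 20]
print(rdm_admit(reserved, bc, 2, 20))  # True: EF uses all of BC2
print(rdm_admit(reserved, bc, 2, 1))   # False: BC2 exhausted
print(rdm_admit(reserved, bc, 1, 30))  # True: AF+EF = 50 <= BC1
```

Unlike in the MAM sketch, bandwidth left unused by EF flows would remain available to AF and BE flows under BC1 and BC0, which is why the RDM uses bandwidth more efficiently.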
If bandwidth constraints or CT reserved bandwidth is configured for a tunnel, the IETF and non-IETF modes cannot
be switched to each other.
The following compares the non-IETF and IETF DS-TE modes:
• TE-class mapping table: In non-IETF mode, the TE-class mapping table can be configured but does not
take effect. In IETF mode, the TE-class mapping table can be configured and takes effect.
• IGP message: In non-IETF mode, the priority-based reservable bandwidth is carried in the Unreserved
Bandwidth sub-TLV. In IETF mode, the CT information is carried in the Unreserved Bandwidth sub-TLV
and Bandwidth Constraints sub-TLV.
Background
As user networks and the scope of network services continue to expand, load-balancing techniques are used
to improve bandwidth between nodes. If tunnels are used for load balancing, transit nodes (P) obtain IP
content carried in MPLS packets as a hash key. If a transit node cannot obtain the IP content from MPLS
packets, the transit node can only use the top label in the MPLS label stack as a hash key. The top label in
the MPLS label stack cannot differentiate underlying protocols in detail. As a result, the top MPLS labels are
not distinguished when being used as hash keys, resulting in load imbalance. Per-packet load balancing can
be used to prevent load imbalance but results in packets being delivered out of sequence. This drawback
adversely affects service experience. To address the problems, the entropy label feature can be configured to
improve load balancing performance.
Implementation
An entropy label is generated on an ingress LSR and is used only to enhance the ability to load-balance
traffic. To help the egress distinguish the entropy label generated by the ingress from application labels, an
entropy label indicator (ELI), a reserved label with the value 7, is added before the entropy label in the MPLS label stack.
The ingress LSR generates an entropy label and encapsulates it into the MPLS label stack. Before the ingress
LSR encapsulates packets with MPLS labels, it can easily obtain IP or Layer 2 protocol data for use as a hash
key. If the ingress LSR identifies the entropy label capability, it uses IP information carried in packets to
compute an entropy label, adds it to the MPLS label stack, and advertises it to the transit node (P). The P
uses the entropy label as a hash key to load-balance traffic and does not need to parse IP data inside MPLS
packets.
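The ingress behavior above can be sketched as follows. This is an illustrative simplification: the hash choice (CRC32), the field names, and the tunnel label value are assumptions, as real hardware derives the entropy value with its own hash; only the ELI value of 7 and the stack layout follow the description above.

```python
# Sketch of how an ingress LSR might derive an entropy label from the
# IP 5-tuple and place it in the label stack (the CRC32 hash and the
# names here are illustrative, not the device's actual algorithm).
import zlib

ELI = 7  # entropy label indicator, as described above

def entropy_label(src_ip, dst_ip, proto, src_port, dst_port):
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    # MPLS label values 0-15 are reserved, so keep the result above 15
    return 16 + zlib.crc32(key) % (2 ** 20 - 16)

def push_entropy(label_stack, flow):
    """Return a new stack: tunnel labels, then ELI, then the entropy label."""
    return label_stack + [ELI, entropy_label(*flow)]

flow = ("10.1.1.1", "10.2.2.2", 6, 5000, 80)
stack = push_entropy([1025], flow)  # 1025: hypothetical tunnel label
print(stack)
```

Because the same flow always yields the same entropy label, transit nodes can hash on it without reordering packets within a flow.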
The entropy label is negotiated using RSVP for improved load balancing. The entropy label is pushed into
packets by the ingress and removed by the egress. Therefore, the egress needs to notify the ingress of the
support for the entropy label capability.
• Egress: If the egress can parse an entropy label, the egress extends a RESV message by adding an
entropy label capability TLV into the message. The egress sends the message to notify upstream nodes,
including the ingress, of the local entropy label capability.
• Transit node: sends a RESV message to upstream nodes to transparently transmit the downstream
node's entropy label capability. If load balancing is enabled, the RESV messages sent by the transit node
carry the entropy label capability TLV only if all downstream nodes have the capability. If a transit node
does not identify the entropy label capability TLV, it transparently transmits the TLV following the
unknown-TLV handling process.
• Ingress: determines whether to add an entropy label into packets to improve load balancing based on
the entropy label capability advertised by the egress.
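The capability propagation in the three bullets above can be sketched as follows. This deliberately reduces the RESV processing to one rule: a transit node signals the capability upstream only if every downstream branch has it, while the egress signals its own parsing ability.

```python
# Sketch of the RESV-based capability negotiation described above
# (a simplification: each node ANDs the capability of all its
# downstream branches before signaling upstream).

def advertise_upstream(local_parses_el, downstream_capabilities):
    """Capability a node places in the RESV message it sends upstream."""
    if not downstream_capabilities:      # egress: only its own ability counts
        return local_parses_el
    return all(downstream_capabilities)  # transit: all branches must agree

# Egress that can parse entropy labels advertises the capability:
print(advertise_upstream(True, []))             # True
# A transit node with one incapable downstream branch does not:
print(advertise_upstream(True, [True, False]))  # False
```

This is why, in the backup-path scenario below, a single incapable branch prevents the whole LSP from being treated as entropy-label capable.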
Application Scenarios
Entropy labels can be used in the following scenarios:
• On the network shown in Figure 1, entropy labels are used when load balancing is performed among
transit nodes.
• On the network shown in Figure 2, the entire tunnel has the entropy label capability only when both
the primary and backup paths of the tunnel have the entropy label capability. An RSVP-TE session is
established between each pair of directly connected devices (P1 through P4). On P1, for the tunnel to
P3, the primary LSP is P1–>P3, and the backup LSP is P1–>P2–>P4–>P3. On P2, for the tunnel to P3, the
primary LSP is P2–>P4–>P3, and the backup LSP is P2–>P1–>P3. In this example, P1 and P2 are the
downstream nodes of each other's backup path. Assume that the entropy label capability is enabled on
P3 and this device sends a RESV message carrying the entropy label capability to P1 and P4. After
receiving the message, P1 checks whether the entire LSP to P3 has the entropy label capability. Because
the path P1–>P2 does not have the entropy label capability, P1 considers that the LSP to P3 does not
have the entropy label capability. As a result, P1 does not send a RESV message carrying the entropy
label capability to P2. P2 performs the same check after receiving a RESV message carrying the entropy
label capability from P4. If the path P2–>P1 does not have the entropy label capability, P2 also considers
that the LSP to P3 does not have the entropy label capability.
• The entropy label feature applies to public network MPLS tunnels in service scenarios such as IPv4/IPv6
over MPLS, L3VPNv4/v6 over MPLS, VPLS/VPWS over MPLS, and EVPN over MPLS.
Benefits
Entropy labels help achieve more even load balancing.
Background
A static CR-LSP is established using manually configured forwarding and resource information. Signaling
protocols and path calculation are not used during the setup of CR-LSPs. Setting up a static CR-LSP
consumes few resources because the two ends of the CR-LSP do not need to exchange MPLS control
packets. However, a static CR-LSP cannot be adjusted dynamically in a changeable network topology. A static
CR-LSP configuration error may cause protocol packets of different NEs and states to interfere with one
another, which adversely affects services. To address the preceding problem, a device can be enabled to check the source
interfaces of static CR-LSPs. With this function configured, the device forwards packets only if both the labels
and inbound interfaces are correct.
Principles
In Figure 1, static CR-LSP1 is configured, with PE1 functioning as the ingress, the P as a transit node, and PE2
as the egress. The P's inbound interface connected to PE1 is Interface1 and the incoming label is Label1.
Static CR-LSP2 remains on PE3 that functions as the ingress of CR-LSP2. The P's inbound interface connected
to PE3 is Interface2 and the incoming label is Label1. If PE3 sends traffic along CR-LSP2 and Interface2 on
the P receives the traffic, the P checks the inbound interface information and finds that the traffic carries
Label1 but the inbound interface is not Interface1. Consequently, the P discards the traffic.
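The check performed by the P node above can be sketched as follows. The table keys on both the incoming label and the inbound interface, so traffic with a correct label but a wrong interface is discarded; the interface and label names come from the scenario, while the outgoing entry is an assumed illustration.

```python
# Sketch of the source-interface check described above: a transit node
# forwards a packet only when both the incoming label and the inbound
# interface match a configured static CR-LSP entry.

# Configured entry for static CR-LSP1 on the P node:
# (incoming label, inbound interface) -> (outgoing label, outbound interface)
lsp_table = {("Label1", "Interface1"): ("Label2", "Interface3")}

def forward(label, inbound_interface):
    entry = lsp_table.get((label, inbound_interface))
    if entry is None:
        return "discard"  # label correct but interface wrong, or vice versa
    return f"forward via {entry[1]} with {entry[0]}"

print(forward("Label1", "Interface1"))  # traffic from PE1 is forwarded
print(forward("Label1", "Interface2"))  # traffic from PE3 is discarded
```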
Background
On a transport network, which does not run a routing protocol, service packets exchanged between two
nodes must travel through the same links and nodes. Co-routed bidirectional static CR-LSPs can be used to
meet this requirement.
Definition
A co-routed bidirectional static CR-LSP is a type of CR-LSP over which two flows are transmitted in opposite
directions over the same links. A co-routed bidirectional static CR-LSP is established manually.
A co-routed bidirectional static CR-LSP differs from two LSPs that transmit traffic in opposite directions. Two
unidirectional CR-LSPs bound to a co-routed bidirectional static CR-LSP function as a single CR-LSP. Two
forwarding tables are used to forward traffic in opposite directions. The co-routed bidirectional static CR-LSP
can go Up only when the conditions for forwarding traffic in opposite directions are met. If the conditions for
forwarding traffic in one direction are not met, the bidirectional CR-LSP is in the Down state. Even if no IP
forwarding capability is enabled on the bidirectional CR-LSP, any intermediate node on the bidirectional
LSP can reply with a packet along the original path. The co-routed bidirectional static CR-LSP provides
consistent delay and jitter for packets transmitted in opposite directions, which guarantees QoS for traffic
transmitted in both directions.
Implementation
A bidirectional co-routed static CR-LSP is manually established. A user manually specifies labels and
forwarding entries mapped to two FECs for traffic transmitted in opposite directions. The outgoing label of a
local node (also known as an upstream node) is equal to the incoming label of a downstream node of the
local node.
A node on a co-routed bidirectional static CR-LSP only has information about the local LSP and cannot
obtain information about nodes on the other LSP. A co-routed bidirectional static CR-LSP shown in Figure 1
consists of a CR-LSP and a reverse CR-LSP. The CR-LSP originates from the ingress and terminates on the
egress. Its reverse CR-LSP originates from the egress and terminates on the ingress.
• On the ingress, configure a tunnel interface and enable MPLS TE on the outbound interface of the
ingress. If the outbound interface is Up and has available bandwidth higher than the bandwidth to be
reserved, the associated bidirectional static CR-LSP can go Up, regardless of the existence of transit
nodes or the egress node.
• On each transit node, enable MPLS TE on the outbound interface of the bidirectional CR-LSP. If the
outbound interface is Up and has available bandwidth higher than the bandwidth to be reserved for the
forward and reverse CR-LSPs, the associated bidirectional static CR-LSP can go Up, regardless of the
existence of the ingress, other transit nodes, or the egress node.
• On the egress, enable MPLS TE on the inbound interface. If the inbound interface is Up and has
available bandwidth higher than the bandwidth to be reserved for the bidirectional CR-LSP, the
associated bidirectional static CR-LSP can go Up, regardless of the existence of the ingress node or
transit nodes.
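The node-local Up condition repeated in the three bullets above can be sketched as follows. The point of the simplification is that each node checks only its own MPLS-TE-enabled interface state and available bandwidth; it never consults the other nodes on the CR-LSP.

```python
# Sketch of the node-local Up condition described above: a node reports
# its side of the bidirectional static CR-LSP as Up based purely on
# local interface state and bandwidth (a simplification for illustration).

def local_lsp_up(interface_up, te_enabled, available_bw, reserved_bw):
    return interface_up and te_enabled and available_bw >= reserved_bw

# Transit node: outbound interface Up with enough bandwidth for both directions
print(local_lsp_up(True, True, available_bw=100, reserved_bw=40))  # True
# Same node after most of the link bandwidth is reserved elsewhere
print(local_lsp_up(True, True, available_bw=20, reserved_bw=40))   # False
```

This locality is exactly why each node can report Up "regardless of the existence" of the other nodes, as stated above.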
1. Loopback is enabled on P1 to loop data packets back to the ingress. The ingress checks whether the
sent packets match the received ones.
• If the packets do not match, a fault occurs on the link between PE1 and P1. Loopback detection
can then be disabled on P1.
• If the packets match, the link between PE1 and P1 is working properly. The fault location
continues.
2. Loopback is disabled on P1 and enabled on P2 to loop data packets back to the ingress. The ingress
checks whether the sent packets match the received ones.
• If the packets do not match, a fault occurs on the link between P1 and P2. Loopback detection
can then be disabled on P2.
• If the packets match, a fault occurs on the link between P2 and PE2. Loopback detection can then
be disabled on P2.
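The hop-by-hop procedure in steps 1 and 2 above generalizes to any path length and can be sketched as follows. The link states in the example call are made up for illustration; the sketch only models whether looped-back packets match the sent ones at each step.

```python
# Sketch of the loopback-based fault location above: enable loopback on
# successive nodes and compare sent and received packets until the
# faulty link is isolated.

def locate_fault(path, link_ok):
    """path: ordered node names after the ingress; link_ok[i]: whether the
    link ending at path[i] is healthy. Returns the faulty link, or None."""
    reachable = "ingress"
    for i, node in enumerate(path):
        # Looped-back packets match only if every link up to 'node' is healthy
        if not all(link_ok[: i + 1]):
            return (reachable, node)  # sent and received packets differ
        reachable = node              # packets match; move loopback onward
    return None                       # no fault found on this path

# PE1 -> P1 -> P2 -> PE2 with the P1-P2 link broken:
print(locate_fault(["P1", "P2", "PE2"], [True, False, True]))  # ('P1', 'P2')
```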
Loopback detection information is not saved in a configuration file after loopback detection is enabled. A loopback
detection-enabled node loops traffic back to the ingress through a temporary loop. Loopback alarms can then be
generated to prompt users that loopback detection is performed. After loopback detection finishes, it can be manually or
automatically disabled. Loopback detection configuration takes effect only on a main control board. After a master/slave
main control board switchover is performed, loopback detection is automatically disabled.
Background
MPLS networks face the following challenges:
• Traffic congestion: RSVP-TE tunnels are unidirectional. The ingress forwards services to the egress along
an RSVP-TE tunnel. The egress forwards services to the ingress over IP routes. As a result, the services
may be congested because IP links do not reserve bandwidth for these services.
• Traffic interruptions: Two MPLS TE tunnels in opposite directions are established between the ingress
and egress. If a fault occurs on an MPLS TE tunnel, a traffic switchover can only be performed for the
faulty tunnel, but not for the reverse tunnel. As a result, traffic is interrupted.
A forward CR-LSP and a reverse CR-LSP between two nodes are established. Each CR-LSP is bound to the
ingress of its reverse CR-LSP. The two CR-LSPs then form an associated bidirectional CR-LSP. The associated
bidirectional CR-LSP is mainly used to prevent traffic congestion. If a fault occurs on one end, the other end
is notified of the fault so that both ends trigger traffic switchovers, which ensures that traffic transmission is
uninterrupted.
Implementation
Figure 1 illustrates an associated bidirectional CR-LSP that consists of Tunnel1 and Tunnel2. The
implementation of the associated bidirectional CR-LSP is as follows:
• MPLS TE Tunnel1 and Tunnel2 are established using RSVP-TE signaling or manually.
• The tunnel ID and ingress LSR ID of the reverse CR-LSP are specified on each tunnel interface so that
the forward and reverse CR-LSPs are bound to each other. For example, in Figure 1, set the reverse
tunnel ID to 200 and ingress LSR ID to 4.4.4.4 on Tunnel1 so the reverse tunnel is bound to Tunnel1.
The ingress LSR ID of the reverse CR-LSP is the same as the egress LSR ID of the forward CR-LSP.
The forward and reverse CR-LSPs can be established over the same path or over different paths. Establishing the
forward and reverse CR-LSPs over the same path is recommended to implement the consistent delay time.
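The binding rule above can be sketched as a simple consistency check. The reverse tunnel ID 200 and ingress LSR ID 4.4.4.4 come from the Figure 1 example; the other IDs and addresses are assumptions added for illustration.

```python
# Sketch of the association rule above: each tunnel records the reverse
# tunnel's ID and ingress LSR ID, and a valid binding requires the
# reverse ingress LSR ID to equal the forward egress LSR ID.

def valid_binding(forward, reverse):
    return (forward["reverse_tunnel_id"] == reverse["tunnel_id"]
            and forward["reverse_ingress"] == reverse["ingress"]
            and reverse["ingress"] == forward["egress"])

# Tunnel1 is bound to reverse tunnel 200 with ingress LSR ID 4.4.4.4
# (tunnel ID 100 and the 1.1.1.1 address are hypothetical):
tunnel1 = {"tunnel_id": 100, "ingress": "1.1.1.1", "egress": "4.4.4.4",
           "reverse_tunnel_id": 200, "reverse_ingress": "4.4.4.4"}
tunnel2 = {"tunnel_id": 200, "ingress": "4.4.4.4", "egress": "1.1.1.1"}
print(valid_binding(tunnel1, tunnel2))  # True
```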
Usage Scenario
• An associated bidirectional static CR-LSP transmits services and returned OAM packets on MPLS-TP
networks.
12.4.12 CBTS
Class-of-service based tunnel selection (CBTS) is a method of selecting a TE tunnel. Unlike the traditional
method of load-balancing services on TE tunnels, CBTS selects tunnels based on services' priorities so that
high quality resources can be provided for services with higher priority. In addition, FRR and HSB can be
configured for TE tunnels selected by CBTS. For more information about FRR and HSB, see the section
Configuration - MPLS - MPLS TE Configuration - Configuring MPLS TE Manual FRR and Configuration -
Background
Existing networks face a challenge that they may fail to provide exclusive high-quality transmission resources
for higher-priority services. This is because the policy for selecting TE tunnels is based on public network
routes or VPN routes, which causes a node to select the same tunnels for services with the same destination
IP or VPN address but with different priorities.
Traffic classification can be configured on CBTS-capable devices to match incoming services and map traffic
of different services to different priorities. A rule can be enforced based on traffic characteristics. For BGP
routes, a QoS Policy Propagation Through the Border Gateway Protocol (QPPB) rule can be enforced based
on BGP community attributes from the source device of the routes.
Service class attributes can be configured on a tunnel to which services recurse so that the tunnel can
transmit services with one or more priorities. Services with specified priorities can only be transmitted on
such tunnels instead of being load-balanced by all tunnels to which they may recurse. The default service
class attribute can also be configured for tunnels to carry services of non-specified priorities.
Implementation
Figure 1 illustrates CBTS principles. TE tunnels between LSRA and LSRB balance services, including high-
priority voice services, medium-priority Ethernet data services, and common data services. The following
operations are performed to use different TE tunnels to carry these services:
• Service classes EF, AF1+AF2, and default are configured for the three TE tunnels, respectively.
• Multi-field classification is configured on the PE to map voice services to EF and map Ethernet services
to AF1 or AF2.
• Voice services are transmitted along the TE tunnel that is assigned the EF service class, Ethernet services
along the TE tunnel that is assigned the AF1+AF2 service class, and other services along the TE tunnel
that is assigned the default service class.
The default service class is not a mandatory setting. If it is not configured, mismatching services will be transmitted
along a tunnel that is assigned no service class. If every tunnel is configured with a service class, these services will be
transmitted along a tunnel that is assigned a service class mapped to the lowest priority.
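The selection logic described above, including the default fallback, can be sketched as follows. The tunnel names are hypothetical; the service classes (EF for voice, AF1/AF2 for Ethernet data, default for the rest) follow the example above.

```python
# Sketch of CBTS selection as described above: pick the tunnel whose
# configured service class matches the packet's class; otherwise fall
# back to a tunnel configured with the 'default' service class.

tunnels = {
    "Tunnel10": {"EF"},            # voice
    "Tunnel20": {"AF1", "AF2"},    # Ethernet data
    "Tunnel30": {"default"},       # everything else
}

def select_tunnel(service_class):
    for name, classes in tunnels.items():
        if service_class in classes:
            return name
    for name, classes in tunnels.items():
        if "default" in classes:
            return name
    return None  # no default: mismatching services use tunnels without a class

print(select_tunnel("EF"))   # Tunnel10
print(select_tunnel("AF2"))  # Tunnel20
print(select_tunnel("BE"))   # Tunnel30
```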
Application Scenarios
• TE tunnels or LDP over TE tunnels functioning as public network tunnels are deployed for load
balancing on a PE.
• L3VPN, VLL, and VPLS services are configured on a PE. Inter-AS VPN services are not supported.
• The TE tunnel includes two types: RSVP-TE tunnel and SR-MPLS TE tunnel.
12.4.13 P2MP TE
Point-to-Multipoint (P2MP) Traffic Engineering (TE) is a promising solution to multicast service transmission.
It helps carriers provide high TE capabilities and increased reliability on an IP/MPLS backbone network and
reduce network operational expenditure (OPEX).
Background
The proliferation of applications, such as IPTV, multimedia conference, and massively multiplayer online
role-playing games (MMORPGs), amplifies demands on multicast transmission over IP/MPLS networks.
These services require sufficient network bandwidth, good quality of service (QoS), and high reliability. The
following multicast solutions are generally used to run multicast services, but these solutions fall short of the
requirements of multicast services or network carriers:
• IP multicast technology: deployed on a live network by upgrading software. This solution reduces
upgrade and maintenance costs. However, IP multicast, similar to IP unicast, does not provide QoS or
traffic planning capabilities and cannot ensure high reliability. Because multicast applications have high
requirements on real-time transmission and reliability, IP multicast cannot meet these requirements.
• Dedicated multicast network: deployed using ATM or SONET/SDH technologies, which provide high
reliability and transmission rates. However, the construction of a private network requires a large
amount of investment and independent maintenance, resulting in high operation costs.
IP/MPLS backbone network carriers require a multicast solution that has high TE capabilities and can be
P2MP TE is such a technology. It combines advantages such as high transmission efficiency of IP multicast
packets and MPLS TE end-to-end QoS guarantee, and provides excellent solutions for multicast services on
IP/MPLS backbone networks. P2MP