ADCX 17a
Student Guide
Volume 2
Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.
YEAR 2000 NOTICE
Juniper Networks hardware and software products do not suffer from Year 2000 problems and hence are Year 2000 compliant. The Junos operating system has no known
time-related limitations through the year 2038. However, the NTP application is known to have some difficulty in the year 2036.
SOFTWARE LICENSE
The terms and conditions for using Juniper Networks software are described in the software license provided with the software, or to the extent applicable, in an agreement
executed between you and Juniper Networks, or Juniper Networks agent. By using Juniper Networks software, you indicate that you understand and agree to be bound by its
license terms and conditions. Generally speaking, the software license restricts the manner in which you are permitted to use the Juniper Networks software, may contain
prohibitions against certain uses, and may state conditions under which the license is automatically terminated. You should consult the software license for further details.
Contents
This five-day course provides a comprehensive focus on Juniper Networks data center switching technologies.
The first three days are designed to introduce the data center features including zero touch provisioning (ZTP), unified
in-service software upgrade (ISSU), multichassis link aggregation (MC-LAG), mixed Virtual Chassis, and Virtual Chassis
Fabric (VCF) and provide students with knowledge of troubleshooting some of the key data center features including
MC-LAG, Virtual Chassis, and VCF deployments.
The last two days of the course are designed to introduce data center features that are more advanced including IP
Fabric, Virtual eXtensible Local Area Network (VXLAN) Layer 2 and Layer 3 Gateways, VXLAN with Ethernet VPN (EVPN)
signaling, and Data Center Interconnect (DCI) for a VXLAN overlay.
Students will learn to configure and monitor these features on the Junos operating system running on the QFX5100,
EX4300, and vMX Series platforms. Through demonstrations and hands-on labs, students will gain experience
configuring, monitoring, troubleshooting, and analyzing the mentioned features of the Junos OS. This content is based
on Junos OS Release 17.1R1.8.
Course Level
Advanced Data Center Switching (ADCX) begins at an intermediate level and finishes at an advanced level.
Intended Audience
This course benefits individuals responsible for configuring, monitoring, and troubleshooting data center features that
exist on the Junos OS running on data center-oriented platforms such as EX Series, QFX Series, MX Series, and vMX
Series devices. This includes individuals in professional services, sales and support organizations, and end users.
Prerequisites
The following are the prerequisites for this course:
• Understanding of the OSI model;
• Advanced routing knowledge—the Advanced Junos Enterprise Routing (AJER) course or equivalent
knowledge; and
• Intermediate switching knowledge—the Junos Enterprise Switching Using Enhanced Layer 2 Software
(JEX-ELS) course or equivalent knowledge.
Objectives
After successfully completing this course, you should be able to:
• Identify current challenges in today’s data center environments and explain how the QFX5100 system
solves some of those challenges.
• List the various models of QFX5100 Series switches.
• List some data center architecture options.
• Explain the purpose and value of ZTP.
• Describe the components and operations of ZTP.
• Deploy a QFX5100 Series switch using ZTP.
• Explain the purpose and value of ISSU.
• Describe the components and operations of ISSU.
• Upgrade a QFX5100 Series switch using ISSU.
• Explain the purpose and value of MC-LAG.
• Describe the components and operations of MC-LAG.
• Implement an MC-LAG on QFX5100 Series switches.
• Describe key concepts and components of a mixed Virtual Chassis.
• Explain the operational details of a mixed Virtual Chassis.
Day 1
Chapter 1: Course Introduction
Chapter 2: System Overview
Chapter 3: Zero Touch Provisioning
Lab 1: Zero Touch Provisioning
Chapter 4: In-Service Software Upgrade
Lab 2: In-Service Software Upgrade
Day 2
Chapter 5: MC-LAG
Lab 3: MC-LAG
Chapter 6: Troubleshooting Multichassis LAG
Lab 4: Troubleshooting Multichassis LAG
Chapter 7: Mixed Virtual Chassis
Lab 5: Mixed Virtual Chassis
Day 3
Chapter 8: Virtual Chassis Fabric
Chapter 9: Virtual Chassis Fabric Management
Lab 6: Virtual Chassis Fabric
Chapter 10: Troubleshooting Virtual Chassis Technologies
Lab 7: Troubleshooting Virtual Chassis Technologies
Day 4
Chapter 11: IP Fabric
Lab 8: IP Fabric
Chapter 12: VXLAN
Lab 9: VXLAN
Day 5
Chapter 13: EVPN
Lab 10: VXLAN with EVPN Signaling
Chapter 14: Data Center Interconnect
Lab 11: DCI
Franklin Gothic: Normal text. Most of what you read in the Lab Guide and Student Guide.
CLI Input: Text that you must enter. Example: lab@San_Jose> show route
GUI Input: Text that you must enter. Example: Select File > Save, and type config.ini in the Filename field.
CLI Undefined: Text where the variable’s value is the user’s discretion or text where the variable’s value as shown in the lab guide might differ from the value the user must input according to the lab topology. Examples: Type set policy policy-name. ping 10.0.x.y
GUI Undefined: Text where the variable’s value is the user’s discretion or text where the variable’s value as shown in the lab guide might differ from the value the user must input according to the lab topology. Example: Select File > Save, and type filename in the Filename field.
We Will Discuss:
• Routing in an IP Fabric;
• Scaling of an IP Fabric; and
• Configuring an IP Fabric.
IP Fabric Overview
The slide lists the topics we will discuss. We discuss the highlighted topic first.
IP Fabric
An IP Fabric is one of the most flexible and scalable data center solutions available. Because an IP Fabric operates strictly at Layer 3, no proprietary features or protocols are used, so this solution works very well with data centers that must accommodate multiple vendors. Some of the most complicated tasks in building an IP Fabric involve assigning the implementation details: IP addresses, BGP AS numbers, routing policy, loopback address assignments, and so on. Throughout this chapter we refer to the devices as nodes (Spine nodes and Leaf nodes). Keep in mind that all devices in an IP Fabric are basically just Layer 3 routers that rely on routing information to make forwarding decisions.
IP Fabric Routing
The slide highlights the topic we discuss next.
Layer 3 Connectivity
Remember that your IP Fabric will be forwarding IP data only. Each node is basically an IP router. In order to forward IP
packets between routers, they need to exchange IP routes. So, you have to make a choice between routing protocols. You
want to ensure that your choice of routing protocol is scalable and future proof. As you can see by the chart, BGP is the
natural choice for a routing protocol.
IBGP, Part 1
IBGP is a valid choice as the routing protocol for your fabric. IBGP peers almost always peer to loopback addresses as
opposed to physical interface addresses. In order to establish a BGP session (over a TCP session), a router must have a route
to the loopback address of its neighbor. To learn the route to a neighbor, an Interior Gateway Protocol (IGP) like OSPF must be
enabled in the network. One purpose of enabling an IGP is simply to ensure every router knows how to get to the loopback
address of all other routers. Another problem that OSPF will solve is determining all of the equal cost paths to remote
destinations. For example, router A will determine from OSPF that there are two equal-cost paths to reach router B. Now
router A can load share traffic destined for router B’s loopback address (IBGP learned routes, see next few slides) across the
two links towards router B.
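For reference, a minimal OSPF sketch that satisfies both purposes might look like the following on each node (the interface names are placeholders, not the lab topology). The fabric-facing interfaces are placed in area 0, and the loopback is advertised as a passive interface:

set protocols ospf area 0.0.0.0 interface et-0/0/0.0
set protocols ospf area 0.0.0.0 interface et-0/0/1.0
set protocols ospf area 0.0.0.0 interface lo0.0 passive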
IBGP, Part 2
There is a requirement in an IBGP network that if one IBGP router needs to advertise an IBGP route, then every other IBGP
router must receive a copy of that route (to prevent black holes). One way to ensure this happens is to have every IBGP router
peer with every other IBGP router (a full mesh). This works fine but it does not scale (i.e., add a new router to your IP fabric
and you will have to configure every router in your IP fabric with a new peer). There are two ways to help scale the full mesh
issue; route reflection or confederations. Most often, it is route reflection that is chosen (it is easy to implement). It is
possible to have redundant route reflectors as well (shown on the slide). It is best practice to configure one or more of the
Spine nodes as route reflectors.
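As an illustration only (the group name and addresses are hypothetical, not the lab configuration), a Spine node acting as a route reflector could be configured along these lines; the cluster statement is what turns the IBGP group into a route-reflector group:

set protocols bgp group ibgp-rr type internal
set protocols bgp group ibgp-rr local-address 192.168.100.1
set protocols bgp group ibgp-rr cluster 192.168.100.1
set protocols bgp group ibgp-rr neighbor 192.168.100.11
set protocols bgp group ibgp-rr neighbor 192.168.100.12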
IBGP, Part 3
Note: The next few slides will highlight the problem faced by a Spine node (router D) that is NOT a route reflector.
You must build your IP Fabric such that all routers load share traffic over equal cost paths (when they exist) towards remote
networks. Each router should be configured for BGP multipath so that they will load share when multiple BGP routes exist.
The slide shows that router A and B advertise the 10.1/16 network to RR-A. RR-A will use both routes for forwarding
(multipath) but will choose only one of those routes (the one from router B, because router B has the lower router ID) to send to
router C (a Leaf node) and router D (a Spine node). Router C and router D will receive the route for 10.1/16. Both copies will
have a BGP next hop of router B’s loopback address. This is the default behavior of route advertisement and selection in the
IBGP with route reflection scenario.
Did you notice the load balancing problem (Hint: the problem is not on router C)? Since router C has two equal cost paths to
get to router B (learned from OSPF), router C will load share traffic to 10.1/16 over the two uplinks towards the Spine routers.
The load balancing problem lies on router D. Since router D received a single route that has a BGP next hop of router B’s
loopback, it forwards all traffic destined to 10.1/16 towards router B. The path to router A (which is an equal cost path to
10.1/16) will never be used in this case. The next slide discusses the solution to this problem.
It is worth noting that although router C has no problem load sharing towards the 10.1/16 network, if router B were to fail, it may take some time for router C to learn about the route through router A. The next slide discusses the solution to
this problem.
IBGP, Part 4
The problem on RR-A is that it sees the routes received from routers A and B, 10.1/16, as a single route that has been received twice. If an IBGP router receives different versions of the same route, it is supposed to make a choice between them and then advertise the one chosen route to its appropriate neighbors. One solution to this problem is to make every Spine node a route reflector. This would be fine in a small fabric but probably would not make sense when there are tens of Spine nodes. Another option would be to make each of the advertisements from routers A and B look like unique routes. How can we make the multiple advertisements of 10.1/16 from routers A and B appear to be unique routes? There is a draft RFC (draft-ietf-idr-add-paths) that defines the ADD-PATH capability, which does just that: it makes the advertisements look unique. All routers in the IP Fabric should support this capability for it to work. Once enabled, routers advertise and evaluate routes based on a tuple of the network and its path ID. In the example, routers A and B advertise the 10.1/16 route. However, this time every router supports the ADD-PATH capability, so RR-A attaches a unique path ID to each route and is able to advertise both routes to all clients, including router D. When the routes arrive on the clients, the clients install both routes in their routing tables (allowing them to load share towards routers A and B). Although router C was already able to load share without the additional route, it will now be able to continue forwarding traffic to 10.1/16 even in the event of a failure of either router A or router B.
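In the Junos OS, the ADD-PATH capability is enabled per address family under BGP. A minimal sketch (the group names are assumptions) would enable sending of multiple paths on the route reflector and receiving of multiple paths on the clients.

On the route reflector:
set protocols bgp group ibgp-rr family inet unicast add-path send path-count 2

On the clients:
set protocols bgp group ibgp family inet unicast add-path receive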
EBGP, Part 1
EBGP is also a valid design to use in your IP Fabric. You will notice that the load balancing problem is much easier to fix in the
EBGP scenario. For example, there will be no need for the routers to support any draft RFCs! Generally, each router in an IP
Fabric should be in its own unique AS. You can use AS numbers from the private or public range or, if you will need
thousands of AS numbers, you can use 32-bit AS numbers.
EBGP, Part 2
In an EBGP-based fabric, there is no need for route reflectors or an IGP. The BGP peering sessions parallel the physical
wiring. For example, every Leaf node has a BGP peering session with every Spine node. There are no leaf-to-leaf or spine-to-spine BGP sessions, just as there is no leaf-to-leaf or spine-to-spine physical connectivity. EBGP peering is done
using the physical interface IP addresses (not loopback interfaces). To enable proper load balancing, all routers need to be
configured for multipath multiple-as as well as a load balancing policy. Both of these configurations will be covered
later in this chapter.
EBGP, Part 3
The slide shows that the routers in AS64516 and AS64517 are advertising 10.1/16 to their two EBGP peers. Because
multipath multiple-as is configured on all routers, the receiving routers in AS64512 and AS64513 will install both
routes in their routing table and load share traffic destined to 10.1/16.
EBGP, Part 4
The slide shows that the routers in AS64512 and AS64513 are advertising 10.1/16 to all of their EBGP peers (all Leaf
nodes). Since multipath multiple-as is configured on all routers, the receiving router in the slide, the router in
AS64514, will install both routes in its routing table and load share traffic destined to 10.1/16.
Best Practices
When enabling an IP fabric you should follow some best practices. Remember, two of the main goals of an IP fabric design (or a Clos design) are to provide a non-blocking architecture and predictable load-balancing behavior.
Some of the best practices that should be followed include...
• All Spine nodes should be the exact same type of router. They should be the same model and they should also
have the same line cards installed. This helps the fabric to have a predictable load balancing behavior.
• All Leaf nodes should be the exact same type of router. Leaf nodes do not have to be the same router as the
Spine nodes. Each Leaf node should be the same model and they should also have the same line cards
installed. This helps the fabric to have a predictable load balancing behavior.
• Every Leaf node should have an uplink to every Spine node. This helps the fabric to have a predictable load
balancing behavior.
• All uplinks from Leaf node to Spine node should be the exact same speed. This helps the fabric to have
predictable load balancing behavior and also helps with the non-blocking nature of the fabric. For example, let
us assume that a Leaf has one 40GbE uplink and one 10GbE uplink to the Spine. When using the combination
of OSPF (for loopback interface advertisement and BGP next hop resolution) and IBGP, when calculating the
shortest path to the BGP next hop, the bandwidth of the links will be taken into consideration. OSPF will most
likely always choose the 40GbE interface during its shortest path first (SPF) calculation and use that interface for
forwarding towards remote BGP next hops. This essentially blocks the 10GbE interface from ever being used. In
the EBGP scenario, the bandwidth will not be taken into consideration, so traffic will be equally load shared over
the two different speed interfaces. Imagine trying to equally load share 60 Gbps of data over the two links: how will the 10GbE interface handle 30 Gbps of traffic? The answer is that it won’t.
IP Fabric Scaling
The slide highlights the topic we discuss next.
Scaling
To increase the overall throughput of an IP Fabric, you simply need to increase the number of Spine devices (and the
appropriate uplinks from the Leaf nodes to those Spine nodes). If you add one more Spine node to the fabric, you will also
have to add one more uplink to each Leaf node. Assuming that each uplink is 40GbE, each Leaf node can now forward an
extra 40Gbps over the fabric.
Adding and removing both server-facing ports (downlinks from the Leaf nodes) and Spine nodes will affect the
oversubscription (OS) ratio of a fabric. When designing the IP fabric, you must understand OS requirements of your data
center. For example, does your data center need line rate forwarding over the fabric? Line rate forwarding would equate to
1-to-1 (1:1) OS. That means the aggregate server-facing bandwidth is equal to the aggregate uplink bandwidth. Or, maybe
your data center would work perfectly fine with a 3:1 OS of the fabric. That is, the aggregate server-facing bandwidth is 3
times that of the aggregate uplink bandwidth. Most data centers will probably not require a design built around a 1:1 OS. Instead,
you should make a decision on an OS ratio that makes the most sense based on the data center’s normal bandwidth usage.
The next few slides discuss how to calculate OS ratios of various IP fabric designs.
3:1 Topology
The slide shows a basic 3:1 OS IP Fabric. All Spine nodes, four in total, are qfx5100-24q routers that each have (32) 40GbE
interfaces. All leaf nodes, 32 in total, are qfx5100-48s routers that have (6) 40GbE uplink interfaces and (48) 10GbE
server-facing interfaces. Each of the (48) 10GbE ports for all 32 Leaf nodes will be fully utilized (i.e., attached to
downstream servers). That means that the total server-facing bandwidth is 48 x 32 x 10Gbps which equals 15360 Gbps.
Each of the 32 Leaf nodes has (4) 40GbE Spine-facing interfaces. That means, that the total uplink bandwidth is 4 x 32 x
40Gbps which equals 5120 Gbps. The OS ratio for this fabric is 15360:5120 or 3:1.
An interesting thing to note is that if you remove any number of Leaf nodes, the OS ratio does not change. For example, what would happen to the OS ratio if there were only 31 Leaf nodes? The server-facing bandwidth would be 48 x 31 x 10Gbps which
equals 14880 Gbps. The total uplink bandwidth is 4 x 31 x 40Gbps which equals 4960 Gbps. The OS ratio for this fabric is
14880:4960 or 3:1. This fact actually makes your design calculations very simple. Once you decide on an OS ratio and
determine the number of Spine nodes that will allow that ratio, you can simply add and remove Leaf nodes from the topology
without affecting the original OS ratio of the fabric.
2:1 Topology
The slide shows a basic 2:1 OS IP Fabric in which two Spine nodes were added to the topology from the last slide. All Spine
nodes, six in total, are qfx5100-24q routers that each have (32) 40GbE interfaces. All leaf nodes, 32 in total, are
qfx5100-48s routers that have (6) 40GbE uplink interfaces and (48) 10GbE server-facing interfaces. Each of the (48) 10GbE
ports for all 32 Leaf nodes will be fully utilized (i.e., attached to downstream servers). That means that the total
server-facing bandwidth is still 48 x 32 x 10Gbps which equals 15360 Gbps. Each of the 32 Leaf nodes has (6) 40GbE
Spine-facing interfaces. That means, that the total uplink bandwidth is 6 x 32 x 40Gbps which equals 7680 Gbps. The OS
ratio for this fabric is 15360:7680 or 2:1.
1:1 Topology
The slide shows a basic 1:1 OS IP Fabric. All Spine nodes, six in total, are qfx5100-24q routers that each have (32) 40GbE
interfaces. All leaf nodes, 32 in total, are qfx5100-48s routers that have (6) 40GbE uplink interfaces and (48) 10GbE
server-facing interfaces. There are many ways that a 1:1 OS ratio can be attained. In this case, although the Leaf nodes
each have (48) 10GbE server-facing interfaces, we are only going to allow 24 servers to be attached at any given moment.
That means that the total server-facing bandwidth is still 24 x 32 x 10Gbps which equals 7680 Gbps. Each of the 32 Leaf
nodes has (6) 40GbE Spine-facing interfaces. That means, that the total uplink bandwidth is 6 x 32 x 40Gbps which equals
7680 Gbps. The OS ratio for this fabric is 7680:7680 or 1:1.
Configure an IP Fabric
The slide highlights the topic we discuss next.
Example Topology
The slide shows the example topology that will be used in the subsequent slides. Notice that each router is the single
member of a unique autonomous system. Each router will peer using EBGP with its directly attached neighbors using the
physical interface addresses. Host A is singly homed to the router in AS 64514. Host B is multihomed to the routers in AS
64515 and AS 64516.
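As a point of reference for the slides that follow, the underlay EBGP peering on the Leaf node in AS 64514 might look roughly like the sketch below (the peer addresses are placeholders, not the actual lab addressing):

set routing-options autonomous-system 64514
set protocols bgp group underlay type external
set protocols bgp group underlay neighbor 172.16.1.1 peer-as 64512
set protocols bgp group underlay neighbor 172.16.1.5 peer-as 64513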
Verifying Neighbors
Once you configure BGP neighbors, you can check the status of the relationships using either the show bgp summary or
show bgp neighbor command.
Routing Policy
Once BGP neighbors are established in the IP Fabric, each router must be configured to advertise routes to its neighbors and
into the fabric. For example, as you attach a server to a top-of-rack (TOR) switch/router (which is usually a Leaf node of the
fabric) you must configure the TOR to advertise the server’s IP subnet to the rest of the network. The first step in advertising a route is to write a policy that will match on the route and then accept it. The slide shows the policy that must be
configured on the routers in AS64515 and AS 64516.
Applying Policy
After configuring a policy, the policy must be applied to the router’s EBGP peers. The slide shows the direct policy being applied as an export policy to AS64515’s EBGP neighbors.
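As a general reference, a policy of this type and its application as an export policy usually follow the pattern below. The policy name direct matches the example; the route filter (Host B’s subnet) and the BGP group name are assumptions:

set policy-options policy-statement direct term 1 from protocol direct
set policy-options policy-statement direct term 1 from route-filter 10.1.2.0/24 exact
set policy-options policy-statement direct term 1 then accept
set protocols bgp group underlay export direct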
Default Behavior
Assuming the routers in AS 64515 and AS 64516 are advertising Host B’s subnet, the slide shows the default routing
behavior on a Spine node. Notice that the Spine node has received two advertisements for the same subnet. However,
because of the default behavior of BGP, the Spine node chooses a single route to select as the active route in the routing
table (you can tell which is the active route because of the asterisk). Based on what is shown in the slide, the Spine node will
send all traffic destined for 10.1.2/24 over the ge-0/0/2 link. The Spine node will not load share over the two possible next
hops by default.
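To change this behavior, each router in the fabric is configured for BGP multipath. Because the equal-cost routes are received from neighbors in different autonomous systems, the multiple-as option is also required. A minimal sketch (the group name is an assumption):

set protocols bgp group underlay multipath multiple-as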
Verify Multipath
View the routing table to see the results of the multipath statement. As you can see, the active BGP route now has two next hops that can be used for forwarding. Do you think the router is using both next hops for forwarding?
Results
The output shows that after applying the load balancing policy to the forwarding table, all next hops associated with active
routes in the routing table have been copied into the forwarding table.
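A typical load-balancing policy and its application to the forwarding table follow the pattern below (the policy name is arbitrary). Note that the per-packet keyword actually results in per-flow load balancing on most Junos platforms:

set policy-options policy-statement pfe-load-balance then load-balance per-packet
set routing-options forwarding-table export pfe-load-balance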
AS 64514
The slide shows the BGP and policy configuration for the router in AS 64514.
AS 64515
The slide shows the BGP and policy configuration for the router in AS 64515.
AS 64512
The slide shows the BGP and policy configuration for the router in AS 64512.
We Discussed:
• Routing in an IP Fabric;
• Scaling of an IP Fabric; and
• Configuring an IP Fabric.
Review Questions
1.
2.
3.
Lab: IP Fabric
The slide provides the objectives for this lab.
We Will Discuss:
• Reasons why you would use VXLAN in your data center;
• The control and data plane of VXLAN in a controller-less overlay; and
• Configuration and monitoring of VXLAN when using multicast signaling.
Layer 2 Apps
The needs of the applications that run on the servers in a data center usually drive the designs of those data centers. There
are many server-to-server applications that have strict requirements for Layer 2 connectivity between servers. A switched infrastructure that is built around xSTP or a Layer 2 fabric (like Juniper Networks’ Virtual Chassis Fabric or Junos Fusion) is perfectly suited for this type of connectivity. This type of infrastructure allows broadcast domains to be stretched across
the data center using some form of VLAN tagging.
IP Fabric
Many of today’s next generation data centers are being built around IP Fabrics which, as their name implies, provide IP
connectivity between the racks of a data center. How can a next generation data center based on IP-only connectivity
support the layer 2 requirements of the traditional server-to-server applications? The rest of this section of this chapter will
discuss the possible solutions to the layer 2 connectivity problem.
Layer 2 VPNs
One possible solution to providing layer 2 connectivity over an IP-based data center would be to implement some form of
layer 2 virtual private network (VPN) on the routers that directly attach to the servers in the rack. Usually these routers would
be the top-of-rack (TOR) routers/switches. In this scenario, each TOR router would act as a layer 2 VPN gateway. A gateway is
the device in a VPN that performs the encapsulation and decapsulation of VPN data. In a layer 2 VPN based on Ethernet, a
gateway (router on left) will take Ethernet frames destined for a remote MAC address, encapsulate the original Ethernet
frame in some other data type (like IP, MPLS, IPsec, etc.) and transmit the newly formed packet to the remote gateway. The
receiving gateway (router on right) will receive the VPN data, decapsulate the data by removing the outer encapsulation, and
then forward the remaining original Ethernet frame to the locally attached server. Notice on the diagram, that the IP Fabric
simply had to forward IP data. The IP Fabric had no knowledge of the Ethernet connectivity that exists between Host A and B.
Data Plane
There are generally two components of a VPN. There is the data plane (as described on this slide) and the control plane (as
described on the next slide).
The data plane of a VPN describes the method in which a gateway encapsulates and decapsulates the original data. Also, in
regards to an Ethernet layer 2 VPN, it might be necessary for the gateway to learn the MAC addresses of both local and
remote servers much like a normal Ethernet switch learns MAC addresses. In almost all forms of Ethernet VPNs, the
gateways learn the MAC addresses of locally attached servers in the data plane (i.e. from received Ethernet frames). Remote
MAC addresses can be learned either in the data plane (after decapsulating data received from remote gateways) or in the
control plane.
Control Plane
One question that must be asked is, “How does a gateway learn about remote gateways?” The learning of remote gateways
can happen in one of two ways. Remote gateways can be statically configured on each gateway participating in a VPN or they
can be learned through some dynamic VPN signaling protocol.
Static configuration works fine but it does not really scale. For example, imagine that you have 20 TOR routers participating
in a statically configured layer 2 VPN. If you add another TOR router to the VPN, you would have to manually configure each
of the 20 switches to recognize the newly added gateway to the VPN.
Usually a VPN has some form of dynamic signaling protocol for the control plane. The signaling protocol can allow for
dynamic adds and deletions of gateways from the VPN. Some signaling protocols also allow a gateway to advertise its locally
learned MAC addresses to remote gateways. Usually a gateway has to receive an Ethernet frame from a remote host before
it can learn the host’s MAC address. Learning remote MAC addresses in the control plane allows the MAC tables of all
gateways to be more in sync. This has a positive side effect of causing the forwarding behavior of the VPN to be more
efficient (less flooding of data over the fabric).
Virtualization
Data centers are relying on virtualization more and more. The slide shows the concepts of virtualizing servers in a data
center. Instead of installing a bare metal server (BMS), a server can run as a virtual machine (VM) on a host machine. A VM is a software computer that runs the same OS and applications as a BMS. A host machine is the physical machine that houses the VMs that run inside it.
One interesting piece of virtualization is how networking works between VMs. Normally, a BMS would simply need a physical
network interface card (NIC) to attach to the network. In the virtualized world, the VMs also utilize NICs, however they are in
fact, virtual. VMs use their virtual NICs to communicate with other VMs. To provide connectivity between VMs on the same
host machine, the virtual NICs attach to virtual switches. To allow VMs to communicate over the physical network, the virtual
switches use the physical NICs of the host machine. If the physical network is a switched network (as in the diagram), the
virtual switches appear to be standard switches attached to the network. VLANs can simply be stretched from one virtual switch, across the physical switched network, and terminate on one or more remote virtual switches. This works great when the physical network is some sort of switched Ethernet network. However, what happens when the physical network is based
on IP routing?
VTEP, Part 1
The VXLAN Tunnel Endpoint (VTEP) is the VPN gateway for VXLAN. It performs the encapsulation (and decapsulation) of
Ethernet frames using VXLAN encapsulation. Usually, the mapping of VLAN (VM-facing) to VNI is manually configured on the
VTEP.
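On a QFX5100 acting as a VXLAN gateway, that VLAN-to-VNI mapping is typically configured under the vlans hierarchy, along the lines of this sketch (the VLAN name, VLAN ID, and VNI values are placeholders):

set vlans v100 vlan-id 100
set vlans v100 vxlan vni 5100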
VTEP, Part 2
The slide shows how a VTEP handles an Ethernet frame from a locally attached VM that must be sent to a remote VM. Here
is the step by step process taken by Virtual Switch 1...
1. VS1 receives an Ethernet frame with a destination MAC of VM3.
2. VS1 performs a MAC table look up and determines that the frame must be sent over the VXLAN tunnel to the
remote VTEP, VS2.
3. VS1 removes any outer VLAN tagging on the original Ethernet frame and then encapsulates the remaining
Ethernet frame using VXLAN encapsulation while also setting the destination IP address to VS2’s VTEP address
as well as setting the VNI appropriately.
4. VS1 forwards the VXLAN packet towards the IP Fabric.
VTEP, Part 3
The slide shows how a VTEP handles a VXLAN packet from a remote VTEP that must be decapsulated and sent to a local VM.
Here is the step by step process taken by the network and VS2...
1. The routers in the IP fabric simply route the VXLAN packet to its destination, VS2’s VTEP address.
2. VS2 receives the VXLAN packet and uses the received VNI to determine on which MAC table the MAC table
lookup should be performed.
3. VS2 strips the VXLAN encapsulation leaving the original Ethernet frame.
4. VS2 performs a MAC table lookup to determine the outgoing virtual interface to send the Ethernet frame.
5. VS2, if necessary, pushes on a VLAN tag and forwards the Ethernet frame to VM3.
One thing you should notice about the VLAN tagging between the VMs and the virtual switches is that since the VLAN tags
are stripped before sending over the IP Fabric, the VLAN tags do not have to match between remote VMs. This actually allows
for more flexibility in VLAN assignments from server to server and rack to rack.
BUM Traffic
The slide discusses the handling of BUM traffic by VTEPs according to the VXLAN standard model. In this model, you should
note that the underlay network must support a multicast routing protocol, preferably some form of Protocol Independent
Multicast Sparse Mode (PIM-SM). Also, the VTEPs must support the Internet Group Management Protocol (IGMP) so that they can inform the underlay network that they are members of the multicast group associated with a VNI.
For every VNI used in the data center, there must also be a multicast group assigned. Remember that there are 2^24 (~16M)
possible VNIs, so your customer could need up to 2^24 group addresses. Luckily, 239/8 is a reserved set of organizationally scoped
multicast group addresses (2^24 group addresses in total) that can be used freely within your customer’s data center.
Multicast Forwarding
When VTEP B receives a broadcast packet from a local VM, VTEP B encapsulates the Ethernet frame into the appropriate
VXLAN/UDP/IP headers. However, it sets the destination IP address of the outer IP header to the VNI’s group address
(239.1.1.1 on the slide). Upon receiving the multicast packet, VTEP B’s DR (the PIM router closest to VTEP B) encapsulates
the multicast packet into unicast PIM register messages that are destined to the IP address of the RP. Upon receiving the
register messages, the RP de-encapsulates the register messages and forwards the resulting multicast packets down the
(*,G) tree. Upon receiving the multicast VXLAN packet, VTEP A does the following:
1. Strips the VXLAN/UDP/IP headers;
2. Forwards the broadcast packet towards the VMs using the virtual switch;
3. If VTEP B was unknown, VTEP A learns the IP address of VTEP B; and
4. Learns the remote MAC address of the sending VM and maps it to VTEP B’s IP address.
For all of this to work, you must ensure that the appropriate devices support PIM-SM, IGMP, and the PIM DR and RP
functions.
It is not shown on this slide but once R1 receives the first native multicast packet from the RP (source address is VTEP B’s
address), R1 will build a shortest path tree (SPT) to the DR closest to VTEP B which will establish (S,G) state on all routers
along that path.
VXLAN Configuration
The slide highlights the topic we discuss next.
Example Topology
The slide shows the example topology that will be used for the subsequent slides.
Logical View
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. Using the help of
VXLAN, it will appear that Host A, Host B, and the IRBs of the routers in AS 64512 and 64513 will be in the same broadcast
domain as well as IP subnet. Also, VRRP will run between the two routers so as to provide a redundant default gateway to the
two hosts.
Routing
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback
interface will be used on Juniper Networks routers as the VTEP interface. Therefore, you must make sure that the loopback addresses of the routers are reachable. Remember, the loopback address for each router in the IP Fabric falls into the
172.16.100/24 range.
PIM
Some form of PIM must be enabled in the IP Fabric. The slide shows that the routers will run PIM-SM with a statically
configured RP. The configurations of the RP as well as all other routers are shown on the slide. Notice that PIM-SM only needs
to be enabled on the IP Fabric facing interfaces.
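As a general reference, a PIM-SM configuration with a statically defined RP typically follows this pattern (the RP address assumes one of the 172.16.100/24 loopbacks, and the interface names are placeholders):

set protocols pim rp static address 172.16.100.1
set protocols pim interface et-0/0/48.0 mode sparse
set protocols pim interface et-0/0/49.0 mode sparse

On the router acting as the RP itself, the same address would also be configured with set protocols pim rp local address 172.16.100.1.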
Source Address
You must decide on the source address of the VXLAN and multicast packets that will be generated by the local VTEP. Use
the vtep-source-interface statement to specify the interface where the IP address will come from. This command is
the same for both MX and QFX5100 Series devices.
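A minimal QFX5100 sketch, assuming the loopback is used as the VTEP source and that VNI 5100 is mapped to multicast group 239.1.1.1 (the VLAN name and values are placeholders):

set switch-options vtep-source-interface lo0.0
set vlans v100 vxlan vni 5100
set vlans v100 vxlan multicast-group 239.1.1.1

On MX Series devices, the same statement is typically configured inside the relevant virtual switch routing instance rather than under switch-options.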
PIM Neighbors
The commands on the slide verify which PIM neighbors have been discovered and the associated settings for the neighbors.
VTEP Interfaces
Prior to learning any remote neighbors, a VXLAN Gateway will create a single logical VTEP interface, vtep.32768 on the
slide. Although this interface is never used for forwarding, when it shows up in the output of this command it allows you to
verify two things: that the local device is configured as a VXLAN Gateway, and its source IP address for VXLAN packets. For each
remote VTEP learned, a gateway will instantiate another logical VTEP interface, vtep.32769 on the slide. These interfaces
represent the VXLAN tunnel established between the local gateway and the remote gateway. These interfaces are actually
used for forwarding as you can tell from the input and output packet counts.
MAC Table
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and
associated interfaces that have been learned by the gateway.
We Discussed:
• Reasons why you would use VXLAN in your data center;
• The control and data plane of VXLAN in a controller-less overlay; and
• Configuration and monitoring of VXLAN when using multicast signaling.
Review Questions
1.
2.
3.
Lab: VXLAN
The slide provides the objective for this lab.
We Will Discuss:
• The benefits of using EVPN signaling for VXLAN;
• The operation of the EVPN protocol; and
• Configuring and monitoring EVPN signaling for VXLAN.
MP-BGP
EVPN is based on Multiprotocol Border Gateway Protocol (MP-BGP). It uses the Address Family Identifier (AFI) of 25 which is
the Layer 2 VPN address family. It uses the Subsequent Address Family Identifier (SAFI) of 70, which is the EVPN address family.
BGP is a proven protocol in both service provider and enterprise networks. It has the ability to scale to millions of route
advertisements. BGP also has the added benefit of being policy oriented. Using policy, you have complete control over route
advertisements allowing you to control which devices learn which routes.
Active/Active Forwarding
When using PIM in the control plane for VXLAN, it is really not possible to have a server attach to two different top of rack
switches with the ability to forward data over both links (i.e., both links active). When using EVPN signaling in the control
plane, active/active forwarding is totally possible. EVPN allows for VXLAN gateways (Leaf1 at the top of the slide) to use
multiple paths and multiple remote VXLAN gateways to forward data to multihomed hosts. Also, EVPN has mechanisms (like
split horizon, etc.) to ensure that broadcast, unknown unicast, and multicast traffic (BUM) does not loop back towards a
multihomed host.
Proxy ARP
Although not currently supported, the EVPN RFC mentions that an EVPN Provider Edge (PE) router, Leaf1 in the example, can
perform Proxy ARP. It is possible that if Leaf2 knows the IP-to-MAC binding for HostB (because it was snooping some form of
IP traffic from HostB), it can send the MAC advertisement for HostB that also contains HostB’s IP address. Then, when HostA
sends an ARP request for HostB’s IP address (a broadcast Ethernet frame), Leaf1 can simply send an ARP reply back to
HostA without ever having to send the broadcast frame over the fabric.
EVPN Terminology
The slide highlights the terms used in a network using VXLAN with EVPN signaling.
• PE devices: These are the networking devices (Leaf nodes in the diagram) to which servers attach in a data
center. These devices also act as VXLAN Tunnel Endpoints (VTEPs) or VXLAN gateways (can be Layer 2 or Layer
3). These devices can be any node of an IP fabric; Leaf or Spine.
• P devices: These are networking devices that only forward IP data. They do not instantiate any bridge domains
related to the EVPN.
• Customer Edge (CE) devices: These are the devices that require the Layer 2 stretch over the data center. They
are the servers, switches, and storage devices that need layer 2 connectivity with other devices in the data
center.
• Site: An EVPN site is a set of CEs that communicate with one another without needing to send Ethernet frames
over the fabric.
• EVPN Instance (EVI): An EVPN Instance spanning the PE devices participating in that EVPN.
• Bridge Domain: A MAC table for a particular VLAN associated with an EVI. There can be many bridge domains
for a given EVI.
• MP-BGP Session: EVPN PEs exchange EVPN routes using MP-BGP.
• VXLAN Tunnel: A tunnel established between EVPN PE devices used to encapsulate Ethernet frames in VXLAN
IP packets.
EVPN Routes
The slide lists the EVPN routes, their usage, as well as where they are defined. The subsequent slides will discuss most of
these routes in detail.
Ethernet Segment
The set of links that attaches a site to one or more PEs is called an Ethernet segment. In the slide, there are two Ethernet
Segments. Site 1 has an Ethernet segment that consists of links A and B. Site 2 has an Ethernet segment that consists of
link C. Each Ethernet Segment must be assigned a 10-octet Ethernet Segment Identifier (ESI). There are two reserved ESI
values as shown in the slide. For a single-homed site, like Site 2, the ESI should be set to
0x00:00:00:00:00:00:00:00:00:00. This is the default ESI setting for a server facing interface on a Juniper Networks EVPN
PE. For any multihomed site, the ESI should be set to a globally unique ESI. In the example, both link A and link B have their
ESI set to 0x01:01:01:01:01:01:01:01:01:01. The commands below show how to set the ESI on the server-facing interface.
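A representative sketch, assuming the multihomed servers attach over aggregated Ethernet interface ae0 (the interface name is an assumption; the ESI value matches the example above). The all-active statement allows both attached PEs to forward traffic for the segment at the same time:

set interfaces ae0 esi 01:01:01:01:01:01:01:01:01:01
set interfaces ae0 esi all-active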
Remote PE Behavior
When a remote PE, Leaf3 in the example, receives the Ethernet Autodiscovery routes from Leaf1 and Leaf2, it now knows
that it can use either of the two VXLAN tunnels to forward data to MACs learned from Site 1. Based on the forwarding choice
made by CE1, it may be that Leaf1 was the only PE attached to Site1 that learned CE1’s MAC address. That means that
Leaf3 may have only ever received a MAC Advertisement for CE1’s MAC from Leaf1. However, since Leaf1 and Leaf2 are
attached to the same Ethernet Segment (as advertised in their Type 1 routes), Leaf3 knows it can get to CE1’s MAC through
either Leaf1 or Leaf2. You can see in Leaf3’s MAC table, that both VXLAN tunnels have been installed as next hops for CE1’s
MAC address.
Added Benefit
Another benefit of the Ethernet Autodiscovery route is that it helps to enable faster convergence times when a link fails.
Normally, when a site-facing link fails, a PE will simply withdraw each of its individual MAC Advertisements. Think about the
case where there are thousands of MACs associated with that link. The PE would have to send 1000s of withdrawals. When
the Ethernet Autodiscovery route is being advertised (because the esi statement is configured on the interface), a PE (like
Leaf1 on the slide) can simply send a single withdrawal of its Ethernet Autodiscovery route and Leaf3 can immediately
update the MAC table for all of the 1000s of MACs it had learned from Leaf1. This allows convergence times to greatly
improve.
BUM Traffic
When EVPN signaling is used with VXLAN encapsulation, Juniper Networks devices only support ingress replication of BUM
traffic. That is, when BUM traffic arrives on a PE, the PE will unicast copies of the BUM packets to each of the individual PEs
that belong to the same EVPN.
Designated Forwarder
To fix the problems described on the previous slide, all the PEs attached to the same Ethernet Segment will elect a
designated forwarder for the Ethernet segment (two or more PEs advertising the same ESI). A designated forwarder will be
elected per broadcast domain. Remember that an EVI can contain 1 or more broadcast domains or VLANs. The Ethernet
Segment Route (Type 4) is used to help with the election of the designated forwarder.
Underlay Topology
The slide shows the IP Fabric that will serve as the underlay network. It is based on EBGP with each router being in its own
autonomous system. Each router will advertise its loopback address, which will also serve as the VTEP address.
Overlay Topology
The slide shows the overlay topology. Each leaf will act as a VXLAN Layer 2 Gateway. Each Spine will act as a distributed
VXLAN Layer 3 Gateway and provide routing into and out of the 10.1.1/24 subnet. Host A will be dual-homed using a LAG to
two Leaf nodes. The control plane for VXLAN will be EVPN using MP-IBGP. In the IBGP topology, the Spine nodes will act as
route reflectors.
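A sketch of the overlay peering on a Leaf node might look like the following (the loopback addresses and group name are placeholders). The group is internal, carries the EVPN address family, and peers with the Spine loopbacks; on the Spine nodes, the same group would additionally include a cluster statement to enable route reflection:

set protocols bgp group overlay type internal
set protocols bgp group overlay local-address 172.16.100.11
set protocols bgp group overlay family evpn signaling
set protocols bgp group overlay neighbor 172.16.100.1
set protocols bgp group overlay neighbor 172.16.100.2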
Logical View
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. Using the help of VXLAN, it will appear that Host A, Host B, and the IRBs of the routers in AS 64512 and 64513 will be in the same broadcast
domain as well as IP subnet. Also, a matching virtual IP address and a matching virtual MAC address will be assigned to
each Spine node’s IRB interface which will provide a redundant, distributed default gateway to the two hosts.
Common Configuration
The slide shows the common configuration for all routers. Notice that a load-balancing policy has been applied to the
forwarding-table that will allow for multiple next hops to be installed in the forwarding table. Also, there is a policy called
direct that will be applied to the EBGP neighbors. The main purpose of this policy is to advertise each router’s loopback
interface (VTEP source interface) to all routers in the fabric. Lastly, in order for each router to run BGP, the autonomous
system must be set under [edit routing-options]. Looking at the example topology, you should notice that each
router will belong to two autonomous systems. Each router will belong to one AS in the underlay and one AS in the overlay. If
you plan to use the automatic route target function (described in subsequent slides) you should set the AS under [edit
routing-options] to the overlay network’s AS number.
Underlay Routing
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback
interface will be used on Juniper Networks routers as the VTEP interface. Therefore, you must make sure that the loopback
addresses of the routers are reachable.
BGP Status
Use the show bgp summary command to determine the status and routing tables used with your router’s BGP neighbors.
EVPN RIB-IN
The slide shows how to view all the routes (for all EVPN instances) that have been accepted by VRF import policies.
VRF Table
The slide shows you how to view the routes for a particular EVPN instance.
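Assuming a QFX5100 that uses the default switching instance, these routes can typically be viewed with show route table bgp.evpn.0 (the EVPN RIB-IN) and show route table default-switch.evpn.0 (the instance-specific table).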
VTEP Interfaces
Prior to learning any remote neighbors, a VXLAN Gateway will create a single logical VTEP interface, vtep.32768 on the
slide. Although this interface is never used for forwarding, when it shows up in the output of this command it allows you to
verify two things: that the local device is configured as a VXLAN Gateway, and its source IP address for VXLAN packets. For each
remote VTEP learned, a gateway will instantiate another logical VTEP interface, vtep.32769 on the slide. These interfaces
represent the VXLAN tunnel established between the local gateway and the remote gateway. These interfaces are actually
used for forwarding as you can tell from the input and output packet counts.
MAC Table
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and
associated interfaces that have been learned by the gateway.
We Discussed:
• The benefits of using EVPN signaling for VXLAN;
• The operation of the EVPN protocol; and
• Configuring and monitoring EVPN signaling for VXLAN.
Review Questions
1.
2.
3.
We Will Discuss:
• The meaning of the term Data Center Interconnect;
• The control and data plane of an MPLS VPN; and
• The DCI options when using a VXLAN overlay with EVPN signaling.
DCI Overview
The slide lists the topics we will discuss. We discuss the highlighted topic first.
Interconnect Network
Between two data centers that need to be interconnected is a network of some type. A typical interconnect network could be
a point-to-point line, an IP network, or an MPLS network. The slide shows that these networks can be owned by the customer
(the owners of the data center) or by a service provider. All the DCI options that we discuss in this chapter will work in both a
customer-owned or service provider-owned interconnect network. The main difference is how much control a customer has
over the DCI. Sometimes it is just easier and more cost effective to let the service provider manage the DCI.
Point-to-Point DCI
In general, if there is great distance between data centers, a point-to-point interconnect can be pretty expensive. However, if
the data centers are just down the street from one another, it might make sense to have a point-to-point interconnect. This
type of interconnect is usually provided as dark fiber between the data centers. The customer simply attaches equipment to
the fiber and has the choice of running any type of protocol they wish over the interconnect.
IP DCI
It is possible to provide a DCI over an IP network. If the DCI is meant to provide Layer 2 stretch (extending of VLANs) between
the data centers, then the Ethernet frames will need to be encapsulated in IP as they traverse the DCI. VXLAN and GRE are some of the typical IP encapsulations that provide the Layer 2 stretch. If the DCI is to provide Layer 3 reachability between data centers, then an IP network is well suited to meet those needs. However, sometimes the DCI network may only support
globally routeable IP addressing while the data centers use RFC 1918 addressing. When that is the case, it might make
sense to create a layer 3 VPN between the two data centers, like GRE, IPsec, or RFC 4364 (MPLS Layer 3 VPN over GRE).
MPLS DCI
The slide shows the encapsulation boundary of an MPLS transport network. The boundaries are different depending on who
owns the MPLS network. If the customer owns the MPLS network, then MPLS can be used for encapsulation from end to end. If the service provider owns the MPLS network, then the encapsulation between the DC and the MPLS network depends completely on what is allowed by the service provider. If the service provider is providing a Layer 2 VPN service, then the customer should expect that any Ethernet frames sent from one data center will appear unchanged as they arrive at the remote data center. If the service provider is providing a Layer 3 VPN service, then the customer should expect that any IP packets sent from one data center will appear unchanged as they arrive at the remote data center. In some cases, the service provider will allow a customer to establish data center-to-data center MPLS label switched paths (LSPs).
MPLS Advantages
Many of the DCI technologies that we will discuss depend on an MPLS network to transport frames between data centers.
Although in most cases an MPLS network can be substituted with an IP network (i.e., by encapsulating MPLS in GRE), there
are several advantages to using an MPLS network:
1. Fast failover between MPLS nodes: Fast reroute and Node/Link protection are two features of an MPLS network
that allow for 50ms or better recovery time in the event of a link failure or node failure along the path of an
MPLS label switched path (LSP).
2. Scalable VPNs: VPLS, EVPN, L3 MPLS VPNs are DCI technologies that use MPLS to transport frames between
data centers. These same technologies allow for the interconnection of many sites (potentially hundreds)
without the need for the manual setup of a full mesh of tunnels between those sites. In most cases, adding a
new site only requires the administrator to configure the devices at the new site. The remote sites do not need to be
touched.
3. Traffic engineering: MPLS allows for the administrator to decide the path that traffic takes over the MPLS network. You no
longer have to take the same path calculated by the IGP (i.e., all data takes the same path between sites). You
can literally direct different traffic types to take different paths over the MPLS network.
4. Any-to-any connectivity: When using an MPLS backbone to provide the DCI, it will allow you the flexibility to
provide any type of MPLS-based Layer 2 DCI, Layer 3 DCI, or any combination of the two. An MPLS
backbone is a network that can generally support most types of MPLS or IP-based connectivity at the same
time.
Label-Switching Router
The original definition of label-switching router is “a router that takes forwarding decisions based only on the content of the
MPLS header”. In other words, a label-switching router always operates in label-switching mode. We will use a definition
which is slightly less restrictive, to include also ingress and egress routers, sometimes referred to as label edge routers.
Traffic at the ingress or at the egress of a label-switched path is typically not encapsulated into MPLS, so label-switching is
not possible, and a forwarding decision needs to be taken according to other rules.
We will use the term label-switching router (LSR) to mean any router which participates in MPLS forwarding, including both
the ingress and the egress nodes. For brevity, in the rest of the course we will also use the term router as synonym for
label-switching router.
Label-Switched Path
A label-switched path (LSP) is a unidirectional path through the network defined in terms of label switching operations (push,
pop, swap). You can think of an LSP as a tunnel: any packet that enters it is delivered to its endpoint, no matter what type of
payload it contains.
Establishing a label-switched path across an MPLS domain means determining the actual labels and label operations
performed by the label-switching routers on the path. This can be done with manual configuration, or by some type of
dynamic label distribution protocol.
Often a label-switched path will reside within a single MPLS domain, for example within a single service provider. However,
the development of advanced BGP-based MPLS signaling allows the creation of label-switched paths that span multiple
domains and multiple administrations.
Penultimate-Hop Popping
Often the MPLS header is removed by the second-to-last (the penultimate) router in an LSP. This removal is an optimization
that helps in several cases, including using MPLS for IP traffic engineering. Removing the label at the penultimate hop
facilitates the work of the last-hop (egress) router, which, instead of having both to remove the MPLS header and then take
an IP routing decision, will only need to do the latter.
Penultimate-hop popping (PHP) is the default behavior on Juniper routers; however, it can be disabled in the configuration.
Some applications require PHP to be disabled, but that is often done automatically: the Junos OS is smart enough to detect
the need to signal the LSP so that PHP is disabled.
RSVP
The Junos OS uses RSVP as the label distribution protocol for traffic engineered LSPs.
• RSVP was designed to be the resource reservation protocol of the Internet and “provide a general facility for
creating and maintaining distributed reservation state across a set of multicast or unicast delivery paths”
(RFC 2205). Reservations are an important part of traffic engineering, so it made sense to continue to use
RSVP for this purpose rather than reinventing the wheel.
• RSVP was explicitly designed to support extensibility mechanisms by allowing it to carry what are called opaque
objects. Opaque objects make no real sense to RSVP itself but are carried with the understanding that some
adjunct protocol (such as MPLS) might find the information in these objects useful. This encourages RSVP
extensions that create and maintain distributed state for information other than pure resource reservation. The
designers believed that extensions could be developed easily to add support for explicit routes and label
distribution.
• Extensions do not make the enhanced version of RSVP incompatible with existing RSVP implementations. An
RSVP implementation can differentiate between LSP signaling and standard RSVP reservations by examining
the contents of each message.
LDP
LDP associates a set of destinations (prefixes) with each data link layer LSP. This set of destinations is called the FEC. These
destinations all share a common data LSP path egress and a common unicast routing path. LDP supports topology-driven
MPLS networks in best-effort, hop-by-hop implementations. The LDP signaling protocol always establishes LSPs that follow
the contours of the IGP’s shortest path. Traffic engineering is not possible with LDP.
Layer 2 Options
Three classifications exist for Layer 2 DCIs:
1. No MAC learning by the Provider Edge (PE) device: This type of layer 2 DCI does not require that the PE devices
learn MAC addresses.
2. Data plane MAC learning by the PE device: This type of DCI requires that the PE device learns the MAC
addresses of both the local data center as well as the remote data centers.
3. Control plane MAC learning by the PE device: This type of DCI requires that a local PE learn the local MAC addresses and then distribute these learned MAC addresses to the remote PEs using the control plane.
Layer 3 Options
A Layer 3 DCI uses routing to interconnect data centers. Each data center must maintain a unique IP address space. A
Layer 3 DCI can be established using just about any IP capable link. Another important consideration for DCIs is
incorporating some level of redundancy by using link aggregation groups (LAGs), IGPs using equal cost multipath, and BGP or
MP-BGP using the multipath or multihop features.
Provider Routers
Provider (P) routers are located in the IP/MPLS core. These routers do not carry VPN data center routes, nor do they interface
in the VPN control and signaling planes. This is a key aspect of the RFC 4364 scalability model; only PE devices are aware of
VPN routes, and no single PE router must hold all VPN state information.
P routers are involved in the VPN forwarding plane where they act as label-switching routers (LSRs) performing label
swapping (and popping) operations.
VPN Site
A VPN site is a collection of devices that can communicate with each other without the need to transit the IP/MPLS
backbone (i.e., a single data center). A site can range from a single location with one switch or router to a network consisting
of many geographically diverse devices.
The LSP
The next few slides discuss the details of MPLS Layer 3 VPNs. One thing to remember with Juniper Networks
routers is that once an LSP is established (from PE1 to PE2 in the diagram), the ingress PE installs a host route (/32) to the
loopback interface of the egress router in the inet.3 table with a next hop of the LSP (that is, the outbound interface of the
LSP plus a label push operation). Because this route lives in inet.3 rather than inet.0, not all traffic entering PE1 is routed
through the LSP. So what traffic does get routed over the LSP?
Looking at the example in the slide, remember that PE1 and PE2 are MP-BGP peers. That means PE2 advertises VPN
routes to PE1 using MP-BGP with a BGP next hop of 2.2.2.2 (PE2's loopback). For these VPN routes to be usable
by PE1, PE1 must find a route to 2.2.2.2 in the inet.3 table; PE1 does not look in inet.0 to resolve the next hop of MPLS
VPN MP-BGP routes.
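As a hedged sketch (the LSP name and interface scope are illustrative, not the lab configuration), the RSVP-signaled LSP from PE1 to PE2 that populates inet.3 could be configured on PE1 roughly like this:

    [edit protocols]
    rsvp {
        interface all;                      # enable RSVP on the core-facing interfaces
    }
    mpls {
        label-switched-path PE1-to-PE2 {
            to 2.2.2.2;                     # PE2's loopback; installed in inet.3 as a /32
        }
        interface all;                      # core interfaces also need family mpls configured
    }

Once the LSP is up, the 2.2.2.2/32 entry in inet.3 is what MP-BGP uses to resolve the next hops of VPN routes received from PE2.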
VPN-IPv4 Route
The VPN-IPv4 route has a simple purpose: to advertise IP routes between PEs. PE2 installs locally learned routes in its VRF
table, including the directly connected PE-CE interface as well as any routes PE2 learns from CE2 (RIP, OSPF, BGP, and so on).
Once PE2 has locally learned routes in its VRF table, it advertises them (based on configured policy) to remote PEs and attaches
a route target community, the "Orange" target community in the example. PE1, upon receiving the route, must decide whether
to keep it. It makes this decision by resolving the BGP next hop in inet.3 and by examining the received route target community.
To accept and use this advertisement, PE1 must be configured with an import policy that accepts routes tagged with the
"Orange" target community. Without a configured policy that matches on the "Orange" route target, PE1 simply discards the
advertisement. So, at a minimum, each VRF on each participating PE for a given VPN must be configured with an export policy
that attaches a unique target community to routes, and with an import policy that matches and accepts advertisements based
on that unique target community.
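The following sketch shows what such explicit policies might look like. The community value target:65512:1, the policy names, and the route distinguisher are placeholders rather than the values used in the lab:

    [edit policy-options]
    community Orange members target:65512:1;
    policy-statement Orange-export {
        term add-target {
            then {
                community add Orange;       # tag locally learned VRF routes with the route target
                accept;
            }
        }
    }
    policy-statement Orange-import {
        term match-target {
            from community Orange;          # accept only advertisements carrying the route target
            then accept;
        }
    }

    [edit routing-instances Orange]
    instance-type vrf;
    route-distinguisher 1.1.1.1:1;
    vrf-import Orange-import;
    vrf-export Orange-export;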
EVPN over IP
The slide shows an example of the signaling/data plane when using EVPN/VXLAN over an IP network. EVPN MP-BGP is used
to synchronize MAC tables.
When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN
packet destined to the remote VTEP's loopback address. QFX2, the remote VTEP, strips the VXLAN encapsulation and
forwards the remaining Ethernet frame to the destination host.
Stretching Subnets
The slide shows the EVPN Type 2 MAC advertisements that must be exchanged between data centers when individual
subnets are stretched between data centers. Notice that Host1 and Host2 are attached to the same subnet. The example
shows the advertisement of just a single MAC address; however, in a real environment you might see thousands of MAC
addresses advertised between data centers. That is a lot of routes! MAC moves, adds, and changes in one data center
will actually affect the MAC tables and EVPN routing exchanges in another data center.
Unique Subnets
The EVPN Type 5 IP prefix route can be used in a DCI situation in which the IP subnets between data centers are completely
unique. Notice that Host1 and Host2 are attached to different subnets; this fact is very important to the discussion. In this
situation, if Host1 needs to send an IP packet to Host2, it sends it to its default gateway, which is the IRB interface on PE1.
Leaf1 encapsulates the Ethernet frames from Host1 into VXLAN and sends the VXLAN packets to PE1. PE1 strips the VXLAN
header and notices that the remaining Ethernet frames from Leaf1 have a destination MAC address of its own IRB interface.
It strips the Ethernet header and routes the remaining IP packet based on the routing table associated with the IRB interface.
PE1 uses the EVPN Type 5 route that it received from PE2 for the 10.1.2/24 network, and the packet is forwarded over the
VXLAN tunnel between PE1 and PE2. You might ask yourself, "Why couldn't PE1 use a standard IP route? Why does the
10.1.2/24 network need to be advertised in an EVPN Type 5 route?" The answer is that the Type 5 route allows inter-data
center traffic to be forwarded over VXLAN tunnels (that is, the end-to-end VXLAN-based VPN is maintained between data
centers). This is very similar to the stitching concept discussed earlier. PE2 then receives the VXLAN-encapsulated packet and
forwards the remaining IP packet toward the destination over the IRB interface, encapsulating the IP packet in an
Ethernet header with a destination MAC address of Host2. Finally, PE2 performs a MAC table lookup and forwards the
Ethernet frame over the VXLAN tunnel between PE2 and Leaf2.
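A hedged configuration sketch of the Type 5 behavior on PE1 might look like the following; the instance name, interface, VNI, and target values are illustrative rather than the exact lab values:

    [edit routing-instances VRF-T5]
    instance-type vrf;
    interface irb.100;                       # default gateway interface for Host1's subnet
    route-distinguisher 1.1.1.1:500;
    vrf-target target:65512:500;
    protocols {
        evpn {
            ip-prefix-routes {
                advertise direct-nexthop;    # advertise IP prefixes as EVPN Type 5 routes
                encapsulation vxlan;
                vni 9500;                    # VNI used for the inter-data center VXLAN tunnel
            }
        }
    }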
DCI Example
The slide highlights the topic we discuss next.
VRF Configuration
The slide shows the VRF configuration for PE1. Notice the use of the vrf-target statement. Originally, VRF import and export
policies could only be enabled by writing explicit policies under [edit policy-options] and applying them using the
vrf-import and vrf-export statements. However, more recent versions of the Junos OS allow you to
skip those steps and simply configure a single vrf-target statement. The vrf-target statement actually enables two
hidden policies. One policy is a VRF export policy that takes all locally learned routes in the VRF (direct interface routes as
well as routes learned from the local CE) and advertises them to the remote PE tagged with the specified target community.
The other is a VRF import policy that accepts all VPN-IPv4 routes learned from remote PEs that are tagged with the
specified target community.
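In other words, a single statement such as the following (the target value is illustrative) replaces the explicit vrf-import and vrf-export policy pair shown earlier:

    [edit routing-instances Orange]
    vrf-target target:65512:1;    # auto-generates the matching hidden import and export policies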
MP-BGP Routing
The slide shows how to enable VPN-IPv4 signaling between PEs. Use the show bgp summary command to verify that the
MP-BGP neighbor relationship is established and that the PE is receiving routes from the remote neighbor.
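A minimal sketch of that peering on PE1 might look as follows; the group name and loopback addresses are assumptions rather than the exact slide values:

    [edit protocols bgp group PE-to-PE]
    type internal;
    local-address 1.1.1.1;            # PE1 loopback
    family inet-vpn {
        unicast;                      # enables VPN-IPv4 (Layer 3 VPN) route exchange
    }
    neighbor 2.2.2.2;                 # PE2 loopback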
VRF Table
Remember, the main purpose of establishing an underlay network and the DCI is to ensure that the routers in each site can
reach the loopback addresses (VTEP source addresses) of the remote Leaf nodes. The slide shows that PE1 has learned the
loopback address of Leaf2.
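To confirm this on PE1, you could inspect the VRF's route table with a command along these lines (the instance name and the Leaf2 loopback address shown here are placeholders):

    user@PE1> show route table VRF-A.inet.0 192.168.100.2/32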
Leaf1 Configuration
The slide shows the underlay and overlay network configuration of Leaf1. Leaf2 would be configured very similarly.
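While the exact lab configuration is shown on the slide, a hedged outline of what a leaf's EVPN-VXLAN overlay configuration typically includes is shown below; the group name, AS numbers, addresses, VLAN, and VNI are illustrative:

    protocols {
        bgp {
            group overlay {
                type external;
                multihop;                          # loopback-to-loopback EVPN peering
                local-address 192.168.100.1;       # Leaf1 loopback (VTEP source address)
                family evpn {
                    signaling;
                }
                peer-as 65101;
                neighbor 192.168.100.2;            # Leaf2 loopback
            }
        }
        evpn {
            encapsulation vxlan;
            extended-vni-list all;
        }
    }
    switch-options {
        vtep-source-interface lo0.0;
        route-distinguisher 192.168.100.1:1;
        vrf-target target:65101:1;
    }
    vlans {
        v100 {
            vlan-id 100;
            vxlan {
                vni 5100;                          # maps VLAN 100 to VXLAN VNI 5100
            }
        }
    }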
We Discussed:
• The meaning of the term Data Center Interconnect;
• The control and data plane of an MPLS VPN; and
• The DCI options when using a VXLAN overlay with EVPN signaling.
Review Questions
We Will Discuss:
• Available troubleshooting tools; and
• A basic troubleshooting approach.
Troubleshooting Tools
The slide lists the topics we will discuss. We discuss the highlighted topic first.
Troubleshooting Tools
A number of tools can be useful when troubleshooting Layer 2 environments using the QFX5100 Series switches. The slide
introduces some of the tools designed to aid in troubleshooting efforts. We cover some of these tools in more detail
throughout the remainder of this section.
Visual Indicators
Using the visual indicators on a physical device is one of the most basic troubleshooting tools. While other CLI commands
also provide alarm and link status for system components, the visual indicators (or the CLI equivalents shown on the slide)
can quickly help you determine whether alarms exist and what the status of a given interface is.
For detailed coverage of the QFX5100 Series switches and their LEDs, refer to the hardware documentation at:
http://www.juniper.net/techpubs/en_US/release-independent/junos/information-products/pathway-pages/hardware/qfx-series/qfx5100.html.
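For example, operational commands such as the following provide CLI equivalents of the alarm and link LEDs (these are standard Junos OS commands; the exact commands on the slide may differ):

    user@qfx> show chassis alarms
    user@qfx> show system alarms
    user@qfx> show interfaces terse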
Traceoptions: A Review
Tracing operations allow you to monitor the operation of various protocols by decoding the sent and received protocol
packets. In many ways, tracing is synonymous with the debug function on equipment made by other vendors. Note that
because of the design of hardware-based Juniper Networks platforms, you can enable reasonably detailed tracing in a
production network without negative impact on overall performance or packet forwarding.
In most cases when you enable tracing (through configuration), you create a trace file that stores decoded protocol
information. You analyze these files using standard CLI log file syntax such as show log logfile-name. While you can
enable detailed tracing in a production network without significantly impacting performance, it is still recommended that you
turn tracing off once you complete your testing to avoid unnecessary resource consumption.
The slide shows a generic tracing stanza that, if applied to the [edit protocols] portion of the configuration hierarchy,
would result in tracing of the specified protocol’s events. Specified protocol tracing operations track the flagged operations
and record them in the specified log file.
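As a generic, hedged example (the protocol, file name, and flags are placeholders rather than the exact stanza on the slide), tracing OSPF hello and error events could be configured like this:

    [edit protocols ospf]
    traceoptions {
        file ospf-trace size 1m files 3;   # write to /var/log/ospf-trace and keep 3 rotated files
        flag hello;                        # trace hello packet events
        flag error;                        # trace protocol errors
    }

You would then review the results with show log ospf-trace and remove the traceoptions stanza once troubleshooting is complete.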
Monitor Mode
This slide covers the Monitor mode, which is where you monitor traffic, system utilization, sessions, and system status. You
can also perform some troubleshooting operations in this mode, as well as other system verification tasks, through
Network Director.
Fault Mode
This slide covers the Fault mode, which is where you view and manage faults on the switches through
Network Director.
Report Mode
This slide covers the Report mode, which is where you generate, manage, and run reports for managed devices through
Network Director.
A Troubleshooting Approach
The slide highlights the topic we discuss next.
We Discussed:
• Available troubleshooting tools; and
• A basic troubleshooting approach.
Review Questions
Copyright 2017
Juniper Networks, Inc. All rights reserved.
Juniper Networks, the Juniper Networks logo, Junos, NetScreen, and ScreenOS
are registered trademarks of Juniper Networks, Inc. in the United States and
other countries. All other trademarks, service marks, registered marks, or
registered service marks are the property of their respective owners. Juniper
Networks assumes no responsibility for any inaccuracies in this document.
Juniper Networks reserves the right to change, modify, transfer, or otherwise
revise this publication without notice.
Education Services
EDU-JUN-ADCX, Courseware
Revision 17.a