ADCX 17a
Student Guide
Volume 2
Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.
YEAR 2000 NOTICE
Juniper Networks hardware and software products do not suffer from Year 2000 problems and hence are Year 2000 compliant. The Junos operating system has no known
time-related limitations through the year 2038. However, the NTP application is known to have some difficulty in the year 2036.
SOFTWARE LICENSE
The terms and conditions for using Juniper Networks software are described in the software license provided with the software, or to the extent applicable, in an agreement
executed between you and Juniper Networks, or Juniper Networks agent. By using Juniper Networks software, you indicate that you understand and agree to be bound by its
license terms and conditions. Generally speaking, the software license restricts the manner in which you are permitted to use the Juniper Networks software, may contain
prohibitions against certain uses, and may state conditions under which the license is automatically terminated. You should consult the software license for further details.
Contents
This five-day course provides a comprehensive focus on Juniper Networks data center switching technologies.
The first three days are designed to introduce the data center features including zero touch provisioning (ZTP), unified
in-service software upgrade (ISSU), multichassis link aggregation (MC-LAG), mixed Virtual Chassis, and Virtual Chassis
Fabric (VCF) and provide students with knowledge of troubleshooting some of the key data center features including
MC-LAG, Virtual Chassis, and VCF deployments.
The last two days of the course are designed to introduce data center features that are more advanced including IP
Fabric, Virtual eXtensible Local Area Network (VXLAN) Layer 2 and Layer 3 Gateways, VXLAN with Ethernet VPN (EVPN)
signaling, and Data Center Interconnect (DCI) for a VXLAN overlay.
Students will learn to configure and monitor these features on the Junos operating system running on the QFX5100,
EX4300, and vMX Series platforms. Through demonstrations and hands-on labs, students will gain experience
configuring, monitoring, troubleshooting, and analyzing the mentioned features of the Junos OS. This content is based
on Junos OS Release 17.1R1.8.
Course Level
Advanced Data Center Switching (ADCX) begins at an intermediate level and finishes at an advanced level.
Intended Audience
This course benefits individuals responsible for configuring, monitoring, and troubleshooting data center features that
exist on the Junos OS running on data center-oriented platforms such as EX Series, QFX Series, MX Series, and vMX
Series devices. This includes individuals in professional services, sales and support organizations, and end users.
Prerequisites
The following are the prerequisites for this course:
• Understanding of the OSI model;
• Advanced routing knowledge—the Advanced Junos Enterprise Routing (AJER) course or equivalent
knowledge; and
• Intermediate switching knowledge—the Junos Enterprise Switching Using Enhanced Layer 2 Software
(JEX-ELS) course or equivalent knowledge.
Objectives
After successfully completing this course, you should be able to:
• Identify current challenges in today’s data center environments and explain how the QFX5100 system
solves some of those challenges.
• List the various models of QFX5100 Series switches.
• List some data center architecture options.
• Explain the purpose and value of ZTP.
• Describe the components and operations of ZTP.
• Deploy a QFX5100 Series switch using ZTP.
• Explain the purpose and value of ISSU.
• Describe the components and operations of ISSU.
• Upgrade a QFX5100 Series switch using ISSU.
• Explain the purpose and value of MC-LAG.
• Describe the components and operations of MC-LAG.
• Implement an MC-LAG on QFX5100 Series switches.
• Describe key concepts and components of a mixed Virtual Chassis.
• Explain the operational details of a mixed Virtual Chassis.
Day 1
Chapter 1: Course Introduction
Chapter 2: System Overview
Chapter 3: Zero Touch Provisioning
Lab 1: Zero Touch Provisioning
Chapter 4: In-Service Software Upgrade
Lab 2: In-Service Software Upgrade
Day 2
Chapter 5: MC-LAG
Lab 3: MC-LAG
Chapter 6: Troubleshooting Multichassis LAG
Lab 4: Troubleshooting Multichassis LAG
Chapter 7: Mixed Virtual Chassis
Lab 5: Mixed Virtual Chassis
Day 3
Chapter 8: Virtual Chassis Fabric
Chapter 9: Virtual Chassis Fabric Management
Lab 6: Virtual Chassis Fabric
Chapter 10: Troubleshooting Virtual Chassis Technologies
Lab 7: Troubleshooting Virtual Chassis Technologies
Day 4
Chapter 11: IP Fabric
Lab 8: IP Fabric
Chapter 12: VXLAN
Lab 9: VXLAN
Day 5
Chapter 13: EVPN
Lab 10: VXLAN with EVPN Signaling
Chapter 14: Data Center Interconnect
Lab 11: DCI
Franklin Gothic: Normal text. Most of what you read in the Lab Guide and Student Guide.
CLI Input: Text that you must enter. Example: lab@San_Jose> show route
GUI Input: Text that you must enter. Example: Select File > Save, and type config.ini in the Filename field.
CLI Undefined: Text where the variable’s value is the user’s discretion or text where the variable’s value as shown in the lab guide might differ from the value the user must input according to the lab topology. Examples: Type set policy policy-name. ping 10.0.x.y
GUI Undefined: Text where the variable’s value is the user’s discretion or text where the variable’s value as shown in the lab guide might differ from the value the user must input according to the lab topology. Example: Select File > Save, and type filename in the Filename field.
We Will Discuss:
• Routing in an IP Fabric;
• Scaling of an IP Fabric; and
• Configuring an IP Fabric.
IP Fabric Overview
The slide lists the topics we will discuss. We discuss the highlighted topic first.
IP Fabric
An IP Fabric is one of the most flexible and scalable data center solutions available. Because an IP Fabric operates strictly at Layer 3, no proprietary features or protocols are used, so this solution works very well with data centers that must accommodate multiple vendors. Some of the most complicated tasks in building an IP Fabric involve assigning the implementation details: IP addresses, BGP AS numbers, routing policy, loopback address assignments, and so on. Throughout this chapter we refer to the devices as nodes (Spine nodes and Leaf nodes). Keep in mind that all devices in an IP Fabric are basically just Layer 3 routers that rely on routing information to make forwarding decisions.
IP Fabric Routing
The slide highlights the topic we discuss next.
Layer 3 Connectivity
Remember that your IP Fabric will be forwarding IP data only. Each node is basically an IP router. In order to forward IP
packets between routers, they need to exchange IP routes. So, you have to make a choice between routing protocols. You
want to ensure that your choice of routing protocol is scalable and future proof. As you can see by the chart, BGP is the
natural choice for a routing protocol.
IBGP, Part 1
IBGP is a valid choice as the routing protocol for your fabric. IBGP peers almost always peer to loopback addresses as
opposed to physical interface addresses. In order to establish a BGP session (over a TCP session), a router must have a route
to the loopback address of its neighbor. To learn the route to a neighbor, an Interior Gateway Protocol (IGP) like OSPF must be
enabled in the network. One purpose of enabling an IGP is simply to ensure every router knows how to get to the loopback
address of all other routers. Another problem that OSPF will solve is determining all of the equal cost paths to remote
destinations. For example, router A will determine from OSPF that there are two equal-cost paths to reach router B. Now
router A can load share traffic destined for router B’s loopback address (IBGP learned routes, see next few slides) across the
two links towards router B.
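For reference, a minimal OSPF sketch that satisfies both purposes might look like the following on each node (the interface names are placeholders, not the lab topology). The fabric-facing interfaces are placed in area 0, and the loopback is advertised as a passive interface:

set protocols ospf area 0.0.0.0 interface et-0/0/0.0
set protocols ospf area 0.0.0.0 interface et-0/0/1.0
set protocols ospf area 0.0.0.0 interface lo0.0 passive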
IBGP, Part 2
There is a requirement in an IBGP network that if one IBGP router needs to advertise an IBGP route, then every other IBGP
router must receive a copy of that route (to prevent black holes). One way to ensure this happens is to have every IBGP router
peer with every other IBGP router (a full mesh). This works fine but it does not scale (i.e., add a new router to your IP fabric
and you will have to configure every router in your IP fabric with a new peer). There are two ways to help scale the full mesh
issue; route reflection or confederations. Most often, it is route reflection that is chosen (it is easy to implement). It is
possible to have redundant route reflectors as well (shown on the slide). It is best practice to configure one or more of the
Spine nodes as route reflectors.
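As an illustration only (the group name and addresses are hypothetical, not the lab configuration), a Spine node acting as a route reflector could be configured along these lines; the cluster statement is what turns the IBGP group into a route-reflector group:

set protocols bgp group ibgp-rr type internal
set protocols bgp group ibgp-rr local-address 192.168.100.1
set protocols bgp group ibgp-rr cluster 192.168.100.1
set protocols bgp group ibgp-rr neighbor 192.168.100.11
set protocols bgp group ibgp-rr neighbor 192.168.100.12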
IBGP, Part 3
Note: The next few slides will highlight the problem faced by a Spine node (router D) that is NOT a route reflector.
You must build your IP Fabric such that all routers load share traffic over equal cost paths (when they exist) towards remote
networks. Each router should be configured for BGP multipath so that they will load share when multiple BGP routes exist.
The slide shows that router A and B advertise the 10.1/16 network to RR-A. RR-A will use both routes for forwarding
(multipath) but will choose only one of those routes (the one from router B, because router B has the lower router ID) to send to
router C (a Leaf node) and router D (a Spine node). Router C and router D will receive the route for 10.1/16. Both copies will
have a BGP next hop of router B’s loopback address. This is the default behavior of route advertisement and selection in the
IBGP with route reflection scenario.
Did you notice the load balancing problem (Hint: the problem is not on router C)? Since router C has two equal cost paths to
get to router B (learned from OSPF), router C will load share traffic to 10.1/16 over the two uplinks towards the Spine routers.
The load balancing problem lies on router D. Since router D received a single route that has a BGP next hop of router B’s
loopback, it forwards all traffic destined to 10.1/16 towards router B. The path to router A (which is an equal cost path to
10.1/16) will never be used in this case. The next slide discusses the solution to this problem.
It is worth noting that although router C has no problem load sharing towards the 10.1/16 network, if router B were to fail, it may take some time for router C to learn about the route through router A. The next slide discusses the solution to
this problem.
IBGP, Part 4
The problem on RR-A is that it sees the routes received from routers A and B, 10.1/16, as a single route that has been received twice. If an IBGP router receives different versions of the same route, it is supposed to make a choice between them and then advertise the one chosen route to its appropriate neighbors. One solution to this problem is to make every Spine node a route reflector. This would be fine in a small fabric but probably would not make sense when there are tens of Spine nodes. Another option would be to make each of the advertisements from routers A and B look like unique routes. How can we make the multiple advertisements of 10.1/16 from routers A and B appear to be unique routes? There is a draft RFC (draft-ietf-idr-add-paths) that defines the ADD-PATH capability, which does just that: it makes the advertisements look unique. All routers in the IP Fabric should support this capability for it to work. Once enabled, routers advertise and evaluate routes based on a tuple of the network and its path ID. In the example, routers A and B advertise the 10.1/16 route. However, this time every router supports the ADD-PATH capability, so RR-A attaches a unique path ID to each route and is able to advertise both routes to all clients, including router D. When the routes arrive on the clients, the clients install both routes in their routing tables (allowing them to load share towards routers A and B). Although router C was already able to load share without the additional route, it will now be able to continue forwarding traffic to 10.1/16 even in the event of a failure of either router A or router B.
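In the Junos OS, the ADD-PATH capability is enabled per address family under BGP. A minimal sketch (the group names are assumptions) would enable sending of multiple paths on the route reflector and receiving of multiple paths on the clients.

On the route reflector:
set protocols bgp group ibgp-rr family inet unicast add-path send path-count 2

On the clients:
set protocols bgp group ibgp family inet unicast add-path receive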
EBGP, Part 1
EBGP is also a valid design to use in your IP Fabric. You will notice that the load balancing problem is much easier to fix in the
EBGP scenario. For example, there will be no need for the routers to support any draft RFCs! Generally, each router in an IP
Fabric should be in its own unique AS. You can use AS numbers from the private or public range or, if you will need
thousands of AS numbers, you can use 32-bit AS numbers.
EBGP, Part 2
In an EBGP-based fabric, there is no need for route reflectors or an IGP. The BGP peering sessions parallel the physical
wiring. For example, every Leaf node has a BGP peering session with every Spine node. There are no leaf-to-leaf or spine-to-spine BGP sessions, just as there is no leaf-to-leaf or spine-to-spine physical connectivity. EBGP peering is done
using the physical interface IP addresses (not loopback interfaces). To enable proper load balancing, all routers need to be
configured for multipath multiple-as as well as a load balancing policy. Both of these configurations will be covered
later in this chapter.
EBGP, Part 3
The slide shows that the routers in AS64516 and AS64517 are advertising 10.1/16 to their two EBGP peers. Because
multipath multiple-as is configured on all routers, the receiving routers in AS64512 and AS64513 will install both
routes in their routing table and load share traffic destined to 10.1/16.
EBGP, Part 4
The slide shows that the routers in AS64512 and AS64513 are advertising 10.1/16 to all of their EBGP peers (all Leaf
nodes). Since multipath multiple-as is configured on all routers, the receiving router in the slide, the router in
AS64514, will install both routes in its routing table and load share traffic destined to 10.1/16.
Best Practices
When enabling an IP fabric you should follow some best practices. Remember, two of the main goals of an IP fabric design (or a Clos design) are to provide a non-blocking architecture and predictable load-balancing behavior.
Some of the best practices that should be followed include...
• All Spine nodes should be the exact same type of router. They should be the same model and they should also
have the same line cards installed. This helps the fabric to have a predictable load balancing behavior.
• All Leaf nodes should be the exact same type of router. Leaf nodes do not have to be the same router as the
Spine nodes. Each Leaf node should be the same model and they should also have the same line cards
installed. This helps the fabric to have a predictable load balancing behavior.
• Every Leaf node should have an uplink to every Spine node. This helps the fabric to have a predictable load
balancing behavior.
• All uplinks from Leaf node to Spine node should be the exact same speed. This helps the fabric to have
predictable load balancing behavior and also helps with the non-blocking nature of the fabric. For example, let
us assume that a Leaf has one 40GbE uplink and one 10GbE uplink to the Spine. When using the combination
of OSPF (for loopback interface advertisement and BGP next hop resolution) and IBGP, when calculating the
shortest path to the BGP next hop, the bandwidth of the links will be taken into consideration. OSPF will most
likely always choose the 40GbE interface during its shortest path first (SPF) calculation and use that interface for
forwarding towards remote BGP next hops. This essentially blocks the 10GbE interface from ever being used. In
the EBGP scenario, the bandwidth will not be taken into consideration, so traffic will be equally load shared over
the two different speed interfaces. Imagine trying to equally load share 60 Gbps of data over the two links: how will the 10GbE interface handle 30 Gbps of traffic? The answer is that it won’t.
IP Fabric Scaling
The slide highlights the topic we discuss next.
Scaling
To increase the overall throughput of an IP Fabric, you simply need to increase the number of Spine devices (and the
appropriate uplinks from the Leaf nodes to those Spine nodes). If you add one more Spine node to the fabric, you will also
have to add one more uplink to each Leaf node. Assuming that each uplink is 40GbE, each Leaf node can now forward an
extra 40Gbps over the fabric.
Adding and removing both server-facing ports (downlinks from the Leaf nodes) and Spine nodes will affect the
oversubscription (OS) ratio of a fabric. When designing the IP fabric, you must understand OS requirements of your data
center. For example, does your data center need line rate forwarding over the fabric? Line rate forwarding would equate to
1-to-1 (1:1) OS. That means the aggregate server-facing bandwidth is equal to the aggregate uplink bandwidth. Or, maybe
your data center would work perfectly fine with a 3:1 OS of the fabric. That is, the aggregate server-facing bandwidth is 3
times that of the aggregate uplink bandwidth. Most data centers will probably not require a design built around a 1:1 OS. Instead,
you should make a decision on an OS ratio that makes the most sense based on the data center’s normal bandwidth usage.
The next few slides discuss how to calculate OS ratios of various IP fabric designs.
3:1 Topology
The slide shows a basic 3:1 OS IP Fabric. All Spine nodes, four in total, are qfx5100-24q routers that each have (32) 40GbE
interfaces. All leaf nodes, 32 in total, are qfx5100-48s routers that have (6) 40GbE uplink interfaces and (48) 10GbE
server-facing interfaces. Each of the (48) 10GbE ports for all 32 Leaf nodes will be fully utilized (i.e., attached to
downstream servers). That means that the total server-facing bandwidth is 48 x 32 x 10Gbps which equals 15360 Gbps.
Each of the 32 Leaf nodes has (4) 40GbE Spine-facing interfaces. That means, that the total uplink bandwidth is 4 x 32 x
40Gbps which equals 5120 Gbps. The OS ratio for this fabric is 15360:5120 or 3:1.
An interesting thing to note is that if you remove any number of Leaf nodes, the OS ratio does not change. For example, what would happen to the OS ratio if there were only 31 Leaf nodes? The server-facing bandwidth would be 48 x 31 x 10Gbps which
equals 14880 Gbps. The total uplink bandwidth is 4 x 31 x 40Gbps which equals 4960 Gbps. The OS ratio for this fabric is
14880:4960 or 3:1. This fact actually makes your design calculations very simple. Once you decide on an OS ratio and
determine the number of Spine nodes that will allow that ratio, you can simply add and remove Leaf nodes from the topology
without affecting the original OS ratio of the fabric.
2:1 Topology
The slide shows a basic 2:1 OS IP Fabric in which two Spine nodes were added to the topology from the last slide. All Spine
nodes, six in total, are qfx5100-24q routers that each have (32) 40GbE interfaces. All leaf nodes, 32 in total, are
qfx5100-48s routers that have (6) 40GbE uplink interfaces and (48) 10GbE server-facing interfaces. Each of the (48) 10GbE
ports for all 32 Leaf nodes will be fully utilized (i.e., attached to downstream servers). That means that the total
server-facing bandwidth is still 48 x 32 x 10Gbps which equals 15360 Gbps. Each of the 32 Leaf nodes has (6) 40GbE
Spine-facing interfaces. That means, that the total uplink bandwidth is 6 x 32 x 40Gbps which equals 7680 Gbps. The OS
ratio for this fabric is 15360:7680 or 2:1.
1:1 Topology
The slide shows a basic 1:1 OS IP Fabric. All Spine nodes, six in total, are qfx5100-24q routers that each have (32) 40GbE
interfaces. All leaf nodes, 32 in total, are qfx5100-48s routers that have (6) 40GbE uplink interfaces and (48) 10GbE
server-facing interfaces. There are many ways that a 1:1 OS ratio can be attained. In this case, although the Leaf nodes
each have (48) 10GbE server-facing interfaces, we are only going to allow 24 servers to be attached at any given moment.
That means that the total server-facing bandwidth is still 24 x 32 x 10Gbps which equals 7680 Gbps. Each of the 32 Leaf
nodes has (6) 40GbE Spine-facing interfaces. That means, that the total uplink bandwidth is 6 x 32 x 40Gbps which equals
7680 Gbps. The OS ratio for this fabric is 7680:7680 or 1:1.
Configure an IP Fabric
The slide highlights the topic we discuss next.
Example Topology
The slide shows the example topology that will be used in the subsequent slides. Notice that each router is the single
member of a unique autonomous system. Each router will peer using EBGP with its directly attached neighbors using the
physical interface addresses. Host A is singly homed to the router in AS 64514. Host B is multihomed to the routers in AS
64515 and AS 64516.
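As a point of reference for the slides that follow, the underlay EBGP peering on the Leaf node in AS 64514 might look roughly like the sketch below (the peer addresses are placeholders, not the actual lab addressing):

set routing-options autonomous-system 64514
set protocols bgp group underlay type external
set protocols bgp group underlay neighbor 172.16.1.1 peer-as 64512
set protocols bgp group underlay neighbor 172.16.1.5 peer-as 64513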
Verifying Neighbors
Once you configure BGP neighbors, you can check the status of the relationships using either the show bgp summary or
show bgp neighbor command.
Routing Policy
Once BGP neighbors are established in the IP Fabric, each router must be configured to advertise routes to its neighbors and
into the fabric. For example, as you attach a server to a top-of-rack (TOR) switch/router (which is usually a Leaf node of the
fabric) you must configure the TOR to advertise the server’s IP subnet to the rest of the network. The first step in advertising a route is to write a policy that will match on the route and then accept it. The slide shows the policy that must be
configured on the routers in AS64515 and AS 64516.
Applying Policy
After configuring a policy, the policy must be applied to the router’s EBGP peers. The slide shows the direct policy being applied as an export policy to AS64515’s EBGP neighbors.
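As a general reference, a policy of this type and its application as an export policy usually follow the pattern below. The policy name direct matches the example; the route filter (Host B’s subnet) and the BGP group name are assumptions:

set policy-options policy-statement direct term 1 from protocol direct
set policy-options policy-statement direct term 1 from route-filter 10.1.2.0/24 exact
set policy-options policy-statement direct term 1 then accept
set protocols bgp group underlay export direct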
Default Behavior
Assuming the routers in AS 64515 and AS 64516 are advertising Host B’s subnet, the slide shows the default routing
behavior on a Spine node. Notice that the Spine node has received two advertisements for the same subnet. However,
because of the default behavior of BGP, the Spine node chooses a single route to select as the active route in the routing
table (you can tell which is the active route because of the asterisk). Based on what is shown in the slide, the Spine node will
send all traffic destined for 10.1.2/24 over the ge-0/0/2 link. The Spine node will not load share over the two possible next
hops by default.
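To change this behavior, each router in the fabric is configured for BGP multipath. Because the equal-cost routes are received from neighbors in different autonomous systems, the multiple-as option is also required. A minimal sketch (the group name is an assumption):

set protocols bgp group underlay multipath multiple-as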
Verify Multipath
View the routing table to see the results of the multipath statement. As you can see, the active BGP route now has two next hops that can be used for forwarding. Do you think the router is using both next hops for forwarding?
Results
The output shows that after applying the load balancing policy to the forwarding table, all next hops associated with active
routes in the routing table have been copied into the forwarding table.
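A typical load-balancing policy and its application to the forwarding table follow the pattern below (the policy name is arbitrary). Note that the per-packet keyword actually results in per-flow load balancing on most Junos platforms:

set policy-options policy-statement pfe-load-balance then load-balance per-packet
set routing-options forwarding-table export pfe-load-balance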
AS 64514
The slide shows the BGP and policy configuration for the router in AS 64514.
AS 64515
The slide shows the BGP and policy configuration for the router in AS 64515.
AS 64512
The slide shows the BGP and policy configuration for the router in AS 64512.
We Discussed:
• Routing in an IP Fabric;
• Scaling of an IP Fabric; and
• Configuring an IP Fabric.
Review Questions
1.
2.
3.
Lab: IP Fabric
The slide provides the objectives for this lab.
We Will Discuss:
• Reasons why you would use VXLAN in your data center;
• The control and data plane of VXLAN in a controller-less overlay; and
• Configuration and monitoring of VXLAN when using multicast signaling.
Layer 2 Apps
The needs of the applications that run on the servers in a data center usually drive the designs of those data centers. There
are many server-to-server applications that have strict requirements for Layer 2 connectivity between servers. A switched infrastructure that is built around xSTP or a Layer 2 fabric (like Juniper Networks’ Virtual Chassis Fabric or Junos Fusion) is perfectly suited for this type of connectivity. This type of infrastructure allows broadcast domains to be stretched across
the data center using some form of VLAN tagging.
IP Fabric
Many of today’s next generation data centers are being built around IP Fabrics which, as their name implies, provide IP
connectivity between the racks of a data center. How can a next generation data center based on IP-only connectivity
support the layer 2 requirements of the traditional server-to-server applications? The rest of this section of this chapter will
discuss the possible solutions to the layer 2 connectivity problem.
Layer 2 VPNs
One possible solution to providing layer 2 connectivity over an IP-based data center would be to implement some form of
layer 2 virtual private network (VPN) on the routers that directly attach to the servers in the rack. Usually these routers would
be the top-of-rack (TOR) routers/switches. In this scenario, each TOR router would act as a layer 2 VPN gateway. A gateway is
the device in a VPN that performs the encapsulation and decapsulation of VPN data. In a layer 2 VPN based on Ethernet, a
gateway (router on left) will take Ethernet frames destined for a remote MAC address, encapsulate the original Ethernet
frame in some other data type (like IP, MPLS, IPsec, etc.) and transmit the newly formed packet to the remote gateway. The
receiving gateway (router on right) will receive the VPN data, decapsulate the data by removing the outer encapsulation, and
then forward the remaining original Ethernet frame to the locally attached server. Notice on the diagram, that the IP Fabric
simply had to forward IP data. The IP Fabric had no knowledge of the Ethernet connectivity that exists between Host A and B.
Data Plane
There are generally two components of a VPN. There is the data plane (as described on this slide) and the control plane (as
described on the next slide).
The data plane of a VPN describes the method in which a gateway encapsulates and decapsulates the original data. Also, in
regards to an Ethernet layer 2 VPN, it might be necessary for the gateway to learn the MAC addresses of both local and
remote servers much like a normal Ethernet switch learns MAC addresses. In almost all forms of Ethernet VPNs, the
gateways learn the MAC addresses of locally attached servers in the data plane (i.e. from received Ethernet frames). Remote
MAC addresses can be learned either in the data plane (after decapsulating data received from remote gateways) or in the
control plane.
Control Plane
One question that must be asked is, “How does a gateway learn about remote gateways?” The learning of remote gateways
can happen in one of two ways. Remote gateways can be statically configured on each gateway participating in a VPN or they
can be learned through some dynamic VPN signaling protocol.
Static configuration works fine but it does not really scale. For example, imagine that you have 20 TOR routers participating
in a statically configured layer 2 VPN. If you add another TOR router to the VPN, you would have to manually configure each
of the 20 switches to recognize the newly added gateway to the VPN.
Usually a VPN has some form of dynamic signaling protocol for the control plane. The signaling protocol can allow for
dynamic adds and deletions of gateways from the VPN. Some signaling protocols also allow a gateway to advertise its locally
learned MAC addresses to remote gateways. Usually a gateway has to receive an Ethernet frame from a remote host before
it can learn the host’s MAC address. Learning remote MAC addresses in the control plane allows the MAC tables of all
gateways to be more in sync. This has a positive side effect of causing the forwarding behavior of the VPN to be more
efficient (less flooding of data over the fabric).
Virtualization
Data centers are relying on virtualization more and more. The slide shows the concepts of virtualizing servers in a data
center. Instead of installing a bare metal server (BMS), a server can run as a virtual machine (VM) on a host machine. A VM is a software computer that runs the same OS and applications as a BMS. A host machine is the physical machine that houses the VMs that run inside it.
One interesting piece of virtualization is how networking works between VMs. Normally, a BMS would simply need a physical
network interface card (NIC) to attach to the network. In the virtualized world, the VMs also utilize NICs, however they are in
fact, virtual. VMs use their virtual NICs to communicate with other VMs. To provide connectivity between VMs on the same
host machine, the virtual NICs attach to virtual switches. To allow VMs to communicate over the physical network, the virtual
switches use the physical NICs of the host machine. If the physical network is a switched network (as in the diagram), the
virtual switches appear to be standard switches attached to the network. VLANs can simply be stretched from one virtual switch, across the physical switched network, and terminate on one or more remote virtual switches. This works great when the physical network is some sort of switched Ethernet network. However, what happens when the physical network is based
on IP routing?
VTEP, Part 1
The VXLAN Tunnel Endpoint (VTEP) is the VPN gateway for VXLAN. It performs the encapsulation (and decapsulation) of
Ethernet frames using VXLAN encapsulation. Usually, the mapping of VLAN (VM-facing) to VNI is manually configured on the
VTEP.
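On a QFX5100 acting as a VXLAN gateway, that VLAN-to-VNI mapping is typically configured under the vlans hierarchy, along the lines of this sketch (the VLAN name, VLAN ID, and VNI values are placeholders):

set vlans v100 vlan-id 100
set vlans v100 vxlan vni 5100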
VTEP, Part 2
The slide shows how a VTEP handles an Ethernet frame from a locally attached VM that must be sent to a remote VM. Here
is the step by step process taken by Virtual Switch 1...
1. VS1 receives an Ethernet frame with a destination MAC of VM3.
2. VS1 performs a MAC table look up and determines that the frame must be sent over the VXLAN tunnel to the
remote VTEP, VS2.
3. VS1 removes any outer VLAN tagging on the original Ethernet frame and then encapsulates the remaining
Ethernet frame using VXLAN encapsulation while also setting the destination IP address to VS2’s VTEP address
as well as setting the VNI appropriately.
4. VS1 forwards the VXLAN packet towards the IP Fabric.
VTEP, Part 3
The slide shows how a VTEP handles a VXLAN packet from a remote VTEP that must be decapsulated and sent to a local VM.
Here is the step by step process taken by the network and VS2...
1. The routers in the IP fabric simply route the VXLAN packet to its destination, VS2’s VTEP address.
2. VS2 receives the VXLAN packet and uses the received VNI to determine on which MAC table the MAC table
lookup should be performed.
3. VS2 strips the VXLAN encapsulation leaving the original Ethernet frame.
4. VS2 performs a MAC table lookup to determine the outgoing virtual interface to send the Ethernet frame.
5. VS2, if necessary, pushes on a VLAN tag and forwards the Ethernet frame to VM3.
One thing you should notice about the VLAN tagging between the VMs and the virtual switches is that since the VLAN tags
are stripped before sending over the IP Fabric, the VLAN tags do not have to match between remote VMs. This actually allows
for more flexibility in VLAN assignments from server to server and rack to rack.
BUM Traffic
The slide discusses the handling of BUM traffic by VTEPs according to the VXLAN standard model. In this model, you should
note that the underlay network must support a multicast routing protocol, preferably some form of Protocol Independent
Multicast Sparse Mode (PIM-SM). Also, the VTEPs must support the Internet Group Management Protocol (IGMP) so that they can inform the underlay network that they are members of the multicast group associated with a VNI.
For every VNI used in the data center, there must also be a multicast group assigned. Remember that there are 2^24 (~16M)
possible VNIs, so your customer could need up to 2^24 group addresses. Luckily, 239/8 is a reserved set of organizationally scoped
multicast group addresses (2^24 group addresses in total) that can be used freely within your customer’s data center.
Multicast Forwarding
When VTEP B receives a broadcast packet from a local VM, VTEP B encapsulates the Ethernet frame into the appropriate
VXLAN/UDP/IP headers. However, it sets the destination IP address of the outer IP header to the VNI’s group address
(239.1.1.1 on the slide). Upon receiving the multicast packet, VTEP B’s DR (the PIM router closest to VTEP B) encapsulates
the multicast packet into unicast PIM register messages that are destined to the IP address of the RP. Upon receiving the
register messages, the RP de-encapsulates the register messages and forwards the resulting multicast packets down the
(*,G) tree. Upon receiving the multicast VXLAN packet, VTEP A does the following:
1. Strips the VXLAN/UDP/IP headers;
2. Forwards the broadcast packet towards the VMs using the virtual switch;
3. If VTEP B was unknown, VTEP A learns the IP address of VTEP B; and
4. Learns the remote MAC address of the sending VM and maps it to VTEP B’s IP address.
For all of this to work, you must ensure that the appropriate devices support PIM-SM, IGMP, and the PIM DR and RP
functions.
It is not shown on this slide but once R1 receives the first native multicast packet from the RP (source address is VTEP B’s
address), R1 will build a shortest path tree (SPT) to the DR closest to VTEP B which will establish (S,G) state on all routers
along that path.
VXLAN Configuration
The slide highlights the topic we discuss next.
Example Topology
The slide shows the example topology that will be used for the subsequent slides.
Logical View
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. Using the help of
VXLAN, it will appear that Host A, Host B, and the IRBs of the routers in AS 64512 and 64513 will be in the same broadcast
domain as well as IP subnet. Also, VRRP will run between the two routers so as to provide a redundant default gateway to the
two hosts.
Routing
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback
interface will be used on Juniper Networks routers as the VTEP interface. Therefore, you must make sure that the loopback addresses of the routers are reachable. Remember, the loopback address for each router in the IP Fabric falls into the
172.16.100/24 range.
PIM
Some form of PIM must be enabled in the IP Fabric. The slide shows that the routers will run PIM-SM with a statically
configured RP. The configurations of the RP as well as all other routers are shown on the slide. Notice that PIM-SM only needs
to be enabled on the IP Fabric facing interfaces.
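As a general reference, a PIM-SM configuration with a statically defined RP typically follows this pattern (the RP address assumes one of the 172.16.100/24 loopbacks, and the interface names are placeholders):

set protocols pim rp static address 172.16.100.1
set protocols pim interface et-0/0/48.0 mode sparse
set protocols pim interface et-0/0/49.0 mode sparse

On the router acting as the RP itself, the same address would also be configured with set protocols pim rp local address 172.16.100.1.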
Source Address
You must decide on the source address of the VXLAN and multicast packets that will be generated by the local VTEP. Use
the vtep-source-interface statement to specify the interface where the IP address will come from. This command is
the same for both MX and QFX5100 Series devices.
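A minimal QFX5100 sketch, assuming the loopback is used as the VTEP source and that VNI 5100 is mapped to multicast group 239.1.1.1 (the VLAN name and values are placeholders):

set switch-options vtep-source-interface lo0.0
set vlans v100 vxlan vni 5100
set vlans v100 vxlan multicast-group 239.1.1.1

On MX Series devices, the same statement is typically configured inside the relevant virtual switch routing instance rather than under switch-options.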
PIM Neighbors
The commands on the slide verify which PIM neighbors have been discovered and the associated settings for the neighbors.
VTEP Interfaces
Prior to learning any remote neighbors, a VXLAN Gateway will create a single logical VTEP interface, vtep.32768 on the
slide. Although this interface is never used for forwarding, when it shows up in the output of this command it allows you to
verify two things: that the local device is configured as a VXLAN Gateway, and its source IP address for VXLAN packets. For each
remote VTEP learned, a gateway will instantiate another logical VTEP interface, vtep.32769 on the slide. These interfaces
represent the VXLAN tunnel established between the local gateway and the remote gateway. These interfaces are actually
used for forwarding as you can tell from the input and output packet counts.
MAC Table
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and
associated interfaces that have been learned by the gateway.
We Discussed:
• Reasons why you would use VXLAN in your data center;
• The control and data plane of VXLAN in a controller-less overlay; and
• Configuration and monitoring of VXLAN when using multicast signaling.
Review Questions
1.
2.
3.
Lab: VXLAN
The slide provides the objective for this lab.
We Will Discuss:
• The benefits of using EVPN signaling for VXLAN;
• The operation of the EVPN protocol; and
• Configuring and monitoring EVPN signaling for VXLAN.
MP-BGP
EVPN is based on Multiprotocol Border Gateway Protocol (MP-BGP). It uses the Address Family Identifier (AFI) of 25 which is
the Layer 2 VPN address family. It uses the Subsequent Address Family Identifier (SAFI) of 70, which is the EVPN address family.
BGP is a proven protocol in both service provider and enterprise networks. It has the ability to scale to millions of route
advertisements. BGP also has the added benefit of being policy oriented. Using policy, you have complete control over route
advertisements allowing you to control which devices learn which routes.
Active/Active Forwarding
When using PIM in the control plane for VXLAN, it is really not possible to have a server attach to two different top of rack
switches with the ability to forward data over both links (i.e., both links active). When using EVPN signaling in the control
plane, active/active forwarding is totally possible. EVPN allows for VXLAN gateways (Leaf1 at the top of the slide) to use
multiple paths and multiple remote VXLAN gateways to forward data to multihomed hosts. Also, EVPN has mechanisms (like
split horizon, etc.) to ensure that broadcast, unknown unicast, and multicast traffic (BUM) does not loop back towards a
multihomed host.
Proxy ARP
Although not currently supported, the EVPN RFC mentions that an EVPN Provider Edge (PE) router, Leaf1 in the example, can
perform Proxy ARP. It is possible that if Leaf2 knows the IP-to-MAC binding for HostB (because it was snooping some form of
IP traffic from HostB), it can send the MAC advertisement for HostB that also contains HostB’s IP address. Then, when HostA
sends an ARP request for HostB’s IP address (a broadcast Ethernet frame), Leaf1 can simply send an ARP reply back to
HostA without ever having to send the broadcast frame over the fabric.
EVPN Terminology
The slide highlights the terms used in a network using VXLAN with EVPN signaling.
• PE devices: These are the networking devices (Leaf nodes in the diagram) to which servers attach in a data
center. These devices also act as VXLAN Tunnel Endpoints (VTEPs) or VXLAN gateways (can be Layer 2 or Layer
3). These devices can be any node of an IP fabric; Leaf or Spine.
• P devices: These are networking devices that only forward IP data. They do not instantiate any bridge domains
related to the EVPN.
• Customer Edge (CE) devices: These are the devices that require the Layer 2 stretch over the data center. They
are the servers, switches, and storage devices that need layer 2 connectivity with other devices in the data
center.
• Site: An EVPN site is a set of CEs that communicate with one another without needing to send Ethernet frames
over the fabric.
• EVPN Instance (EVI): An EVPN Instance spanning the PE devices participating in that EVPN.
• Bridge Domain: A MAC table for a particular VLAN associated with an EVI. There can be many bridge domains
for a given EVI.
• MP-BGP Session: EVPN PEs exchange EVPN routes using MP-BGP.
• VXLAN Tunnel: A tunnel established between EVPN PE devices used to encapsulate Ethernet frames in VXLAN
IP packets.
EVPN Routes
The slide lists the EVPN routes, their usage, as well as where they are defined. The subsequent slides will discuss most of
these routes in detail.
Ethernet Segment
The set of links that attaches a site to one or more PEs is called an Ethernet segment. In the slide, there are two Ethernet
Segments. Site 1 has an Ethernet segment that consists of links A and B. Site 2 has an Ethernet segment that consists of
link C. Each Ethernet Segment must be assigned a 10-octet Ethernet Segment Identifier (ESI). There are two reserved ESI
values as shown in the slide. For a single-homed site, like Site 2, the ESI should be set to
0x00:00:00:00:00:00:00:00:00:00. This is the default ESI setting for a server facing interface on a Juniper Networks EVPN
PE. For any multihomed site, the ESI should be set to a globally unique ESI. In the example, both link A and link B have their
ESI set to 0x01:01:01:01:01:01:01:01:01:01. The commands below show how to set the ESI on the server-facing interface.
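A representative sketch, assuming the multihomed servers attach over aggregated Ethernet interface ae0 (the interface name is an assumption; the ESI value matches the example above). The all-active statement allows both attached PEs to forward traffic for the segment at the same time:

set interfaces ae0 esi 01:01:01:01:01:01:01:01:01:01
set interfaces ae0 esi all-active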
Remote PE Behavior
When a remote PE, Leaf3 in the example, receives the Ethernet Autodiscovery routes from Leaf1 and Leaf2, it now knows
that it can use either of the two VXLAN tunnels to forward data to MACs learned from Site 1. Based on the forwarding choice
made by CE1, it may be that Leaf1 was the only PE attached to Site1 that learned CE1’s MAC address. That means that
Leaf3 may have only ever received a MAC Advertisement for CE1’s MAC from Leaf1. However, since Leaf1 and Leaf2 are
attached to the same Ethernet Segment (as advertised in their Type 1 routes), Leaf3 knows it can get to CE1’s MAC through
either Leaf1 or Leaf2. You can see in Leaf3’s MAC table, that both VXLAN tunnels have been installed as next hops for CE1’s
MAC address.
Added Benefit
Another benefit of the Ethernet Autodiscovery route is that it helps to enable faster convergence times when a link fails.
Normally, when a site-facing link fails, a PE will simply withdraw each of its individual MAC Advertisements. Think about the
case where there are thousands of MACs associated with that link. The PE would have to send 1000s of withdrawals. When
the Ethernet Autodiscovery route is being advertised (because the esi statement is configured on the interface), a PE (like
Leaf1 on the slide) can simply send a single withdrawal of its Ethernet Autodiscovery route and Leaf3 can immediately
update the MAC table for all of the 1000s of MACs it had learned from Leaf1. This allows convergence times to greatly
improve.
BUM Traffic
When EVPN signaling is used with VXLAN encapsulation, Juniper Networks devices only support ingress replication of BUM
traffic. That is, when BUM traffic arrives on a PE, the PE will unicast copies of the BUM packets to each of the individual PEs
that belong to the same EVPN.
Designated Forwarder
To fix the problems described on the previous slide, all the PEs attached to the same Ethernet Segment will elect a
designated forwarder for the Ethernet segment (two or more PEs advertising the same ESI). A designated forwarder will be
elected per broadcast domain. Remember that an EVI can contain 1 or more broadcast domains or VLANs. The Ethernet
Segment Route (Type 4) is used to help with the election of the designated forwarder.
Underlay Topology
The slide shows the IP Fabric that will serve as the underlay network. It is based on EBGP with each router being in its own
autonomous system. Each router will advertise its loopback address, which will also serve as the VTEP address.
Overlay Topology
The slide shows the overlay topology. Each leaf will act as a VXLAN Layer 2 Gateway. Each Spine will act as a distributed
VXLAN Layer 3 Gateway and provide routing into and out of the 10.1.1/24 subnet. Host A will be dual-homed using a LAG to
two Leaf nodes. The control plane for VXLAN will be EVPN using MP-IBGP. In the IBGP topology, the Spine nodes will act as
route reflectors.
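A sketch of the overlay peering on a Leaf node might look like the following (the loopback addresses and group name are placeholders). The group is internal, carries the EVPN address family, and peers with the Spine loopbacks; on the Spine nodes, the same group would additionally include a cluster statement to enable route reflection:

set protocols bgp group overlay type internal
set protocols bgp group overlay local-address 172.16.100.11
set protocols bgp group overlay family evpn signaling
set protocols bgp group overlay neighbor 172.16.100.1
set protocols bgp group overlay neighbor 172.16.100.2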
Logical View
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. Using the help of VXLAN, it will appear that Host A, Host B, and the IRBs of the routers in AS 64512 and 64513 will be in the same broadcast
domain as well as IP subnet. Also, a matching virtual IP address and a matching virtual MAC address will be assigned to
each Spine node’s IRB interface which will provide a redundant, distributed default gateway to the two hosts.
Common Configuration
The slide shows the common configuration for all routers. Notice that a load-balancing policy has been applied to the
forwarding-table that will allow for multiple next hops to be installed in the forwarding table. Also, there is a policy called
direct that will be applied to the EBGP neighbors. The main purpose of this policy is to advertise each router’s loopback
interface (VTEP source interface) to all routers in the fabric. Lastly, in order for each router to run BGP, the autonomous
system must be set under [edit routing-options]. Looking at the example topology, you should notice that each
router will belong to two autonomous systems. Each router will belong to one AS in the underlay and one AS in the overlay. If
you plan to use the automatic route target function (described in subsequent slides) you should set the AS under [edit
routing-options] to the overlay network’s AS number.
Underlay Routing
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback
interface will be used on Juniper Networks routers as the VTEP interface. Therefore, you must make sure that the loopback
addresses of the routers are reachable.
BGP Status
Use the show bgp summary command to determine the status and routing tables used with your router’s BGP neighbors.
EVPN RIB-IN
The slide shows how to view all the routes (for all EVPN instances) that have been accepted by VRF import policies.
VRF Table
The slide shows you how to view the routes for a particular EVPN instance.
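Assuming a QFX5100 that uses the default switching instance, these routes can typically be viewed with show route table bgp.evpn.0 (the EVPN RIB-IN) and show route table default-switch.evpn.0 (the instance-specific table).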
VTEP Interfaces
Prior to learning any remote neighbors, a VXLAN Gateway will create a single logical VTEP interface, vtep.32768 on the
slide. Although this interface is never used for forwarding, when it shows up in the output of this command it allows you to
verify two things: that the local device is configured as a VXLAN Gateway, and its source IP address for VXLAN packets. For each
remote VTEP learned, a gateway will instantiate another logical VTEP interface, vtep.32769 on the slide. These interfaces
represent the VXLAN tunnel established between the local gateway and the remote gateway. These interfaces are actually
used for forwarding as you can tell from the input and output packet counts.
MAC Table
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and
associated interfaces that have been learned by the gateway.
We Discussed:
• The benefits of using EVPN signaling for VXLAN;
• The operation of the EVPN protocol; and
• Configuring and monitoring EVPN signaling for VXLAN.
Review Questions
1.
2.
3.
We Will Discuss:
• The meaning of the term Data Center Interconnect;
• The control and data plane of an MPLS VPN; and
• The DCI options when using a VXLAN overlay with EVPN signaling.
DCI Overview
The slide lists the topics we will discuss. We discuss the highlighted topic first.
Interconnect Network
Between two data centers that need to be interconnected is a network of some type. A typical interconnect network could be
a point-to-point line, an IP network, or an MPLS network. The slide shows that these networks can be owned by the customer
(the owners of the data center) or by a service provider. All the DCI options that we discuss in this chapter will work in both a
customer-owned or service provider-owned interconnect network. The main difference is how much control a customer has
over the DCI. Sometimes it is just easier and more cost effective to let the service provider manage the DCI.
Point-to-Point DCI
In general, if there is great distance between data centers, a point-to-point interconnect can be pretty expensive. However, if
the data centers are just down the street from one another, it might make sense to have a point-to-point interconnect. This
type of interconnect is usually provided as dark fiber between the data centers. The customer simply attaches equipment to
the fiber and has the choice of running any type of protocol they wish over the interconnect.
IP DCI
It is possible to provide a DCI over an IP network. If the DCI is meant to provide Layer 2 stretch (extending of VLANs) between
the data centers, then the Ethernet frames will need to be encapsulated in IP as they traverse the DCI. VXLAN and GRE are some of the typical IP encapsulations that provide the Layer 2 stretch. If the DCI is to provide Layer 3 reachability between data centers, then an IP network is well suited to meet those needs. However, sometimes the DCI network may only support
globally routeable IP addressing while the data centers use RFC 1918 addressing. When that is the case, it might make
sense to create a layer 3 VPN between the two data centers, like GRE, IPsec, or RFC 4364 (MPLS Layer 3 VPN over GRE).
MPLS DCI
The slide shows the encapsulation boundary of an MPLS transport network. The boundaries are different depending on who
owns the MPLS network. If the customer owns the MPLS network, then MPLS can be used for encapsulation from end to end. If the service provider owns the MPLS network, then the encapsulation between the DC and the MPLS network depends completely on what is allowed by the service provider. If the service provider is providing a Layer 2 VPN service, then the customer should expect that any Ethernet frames sent from one data center will appear unchanged as they arrive at the remote data center. If the service provider is providing a Layer 3 VPN service, then the customer should expect that any IP packets sent from one data center will appear unchanged as they arrive at the remote data center. In some cases, the service provider will allow a customer to establish data center-to-data center MPLS label switched paths (LSPs).
MPLS Advantages
Many of the DCI technologies that we will discuss depend on an MPLS network to transport frames between data centers.
Although in most cases an MPLS network can be substituted with an IP network (i.e., by encapsulating MPLS in GRE), there
are several advantages to using an MPLS network:
1. Fast failover between MPLS nodes: Fast reroute and Node/Link protection are two features of an MPLS network
that allow for 50ms or better recovery time in the event of a link failure or node failure along the path of an
MPLS label switched path (LSP).
2. Scalable VPNs: VPLS, EVPN, L3 MPLS VPNs are DCI technologies that use MPLS to transport frames between
data centers. These same technologies allow for the interconnection of many sites (potentially hundreds)
without the need for the manual setup of a full mesh of tunnels between those sites. In most cases, adding a
new site only requires the administrator to configure the devices at the new site. The remote sites do not need to be
touched.
3. Traffic engineering: MPLS allows for the administrator to decide the path that traffic takes over the MPLS network. You no
longer have to take the same path calculated by the IGP (i.e., all data takes the same path between sites). You
can literally direct different traffic types to take different paths over the MPLS network.
4. Any-to-any connectivity: When using an MPLS backbone to provide the DCI, it will allow you the flexibility to
provide any type of MPLS-based Layer 2 DCI, Layer 3 DCI, or any combination of the two. An MPLS
backbone is a network that can generally support most types of MPLS or IP-based connectivity at the same
time.
Label-Switching Router
The original definition of label-switching router is “a router that takes forwarding decisions based only on the content of the
MPLS header”. In other words, a label-switching router always operates in label-switching mode. We will use a definition
which is slightly less restrictive, to include also ingress and egress routers, sometimes referred to as label edge routers.
Traffic at the ingress or at the egress of a label-switched path is typically not encapsulated into MPLS, so label-switching is
not possible, and a forwarding decision needs to be taken according to other rules.
We will use the term label-switching router (LSR) to mean any router which participates in MPLS forwarding, including both
the ingress and the egress nodes. For brevity, in the rest of the course we will also use the term router as synonym for
label-switching router.
Label-Switched Path
A label-switched path (LSP) is a unidirectional path through the network defined in terms of label switching operations (push,
pop, swap). You can think of an LSP as a tunnel: any packet that enters it is delivered to its endpoint, no matter what type of
payload it contains.
Establishing a label-switched path across an MPLS domain means determining the actual labels and label operations
performed by the label-switching routers on the path. This can be done with manual configuration, or by some type of
dynamic label distribution protocol.
Often a label-switched path will reside within a single MPLS domain, for example within a single service provider. However,
the development of advanced BGP-based MPLS signaling allows the creation of label-switched paths that span multiple
domains and multiple administrations.
Penultimate-Hop Popping
Often the MPLS header is removed by the second-to-last (the penultimate) router in an LSP. This removal is an optimization
that helps in several cases, including using MPLS for IP traffic engineering. Removing the label at the penultimate hop
facilitates the work of the last-hop (egress) router, which, instead of having both to remove the MPLS header and then take
an IP routing decision, will only need to do the latter.
Penultimate-hop popping (PHP) is the default behavior on Juniper routers; however, it can be disabled in the configuration.
Some applications require PHP to be disabled, but that is often done automatically: the Junos OS is smart enough to detect
the need to signal the LSP so that PHP is disabled.
RSVP
The Junos OS uses RSVP as the label distribution protocol for traffic engineered LSPs.
• RSVP was designed to be the resource reservation protocol of the Internet and “provide a general facility for
creating and maintaining distributed reservation state across a set of multicast or unicast delivery paths”
(RFC 2205). Reservations are an important part of traffic engineering, so it made sense to continue to use
RSVP for this purpose rather than reinventing the wheel.
• RSVP was explicitly designed to support extensibility mechanisms by allowing it to carry what are called opaque
objects. Opaque objects make no real sense to RSVP itself but are carried with the understanding that some
adjunct protocol (such as MPLS) might find the information in these objects useful. This encourages RSVP
extensions that create and maintain distributed state for information other than pure resource reservation. The
designers believed that extensions could be developed easily to add support for explicit routes and label
distribution.
• Extensions do not make the enhanced version of RSVP incompatible with existing RSVP implementations. An
RSVP implementation can differentiate between LSP signaling and standard RSVP reservations by examining
the contents of each message.
LDP
LDP associates a set of destinations (prefixes) with each data link layer LSP. This set of destinations is called the FEC. These
destinations all share a common data LSP path egress and a common unicast routing path. LDP supports topology-driven
MPLS networks in best-effort, hop-by-hop implementations. The LDP signaling protocol always establishes LSPs that follow
the contours of the IGP’s shortest path. Traffic engineering is not possible with LDP.
Layer 2 Options
Three classifications exist for Layer 2 DCIs:
1. No MAC learning by the Provider Edge (PE) device: This type of layer 2 DCI does not require that the PE devices
learn MAC addresses.
2. Data plane MAC learning by the PE device: This type of DCI requires that the PE device learns the MAC
addresses of both the local data center as well as the remote data centers.
3. Control plane MAC learning by the PE device: This type of DCI requires that a local PE learn the local MAC addresses and then distribute these learned MAC addresses to the remote PEs using the control plane.
Layer 3 Options
A Layer 3 DCI uses routing to interconnect data centers. Each data center must maintain a unique IP address space. A
Layer 3 DCI can be established using just about any IP capable link. Another important consideration for DCIs is
incorporating some level of redundancy by using link aggregation groups (LAGs), IGPs using equal cost multipath, and BGP or
MP-BGP using the multipath or multihop features.
Provider Routers
Provider (P) routers are located in the IP/MPLS core. These routers do not carry VPN data center routes, nor do they interface
in the VPN control and signaling planes. This is a key aspect of the RFC 4364 scalability model; only PE devices are aware of
VPN routes, and no single PE router must hold all VPN state information.
P routers are involved in the VPN forwarding plane where they act as label-switching routers (LSRs) performing label
swapping (and popping) operations.
VPN Site
A VPN site is a collection of devices that can communicate with each other without the need to transit the IP/MPLS
backbone (i.e., a single data center). A site can range from a single location with one switch or router to a network consisting
of many geographically diverse devices.
The LSP
The next few slides discuss the details of MPLS Layer 3 VPNs. One thing to remember with Juniper Networks
routers is that once an LSP is established (from PE1 to PE2 in the diagram), the ingress PE installs a host route (/32) to the
loopback interface of the egress router in the inet.3 table with a next hop of the LSP (that is, the outbound interface of the
LSP plus a label push operation). Because this route lives in inet.3 rather than inet.0, not all traffic entering PE1 is routed
through the LSP. So what traffic does get routed over the LSP?
Looking at the example in the slide, remember that PE1 and PE2 are MP-BGP peers. That means PE2 advertises VPN
routes to PE1 using MP-BGP with a BGP next hop of 2.2.2.2 (PE2's loopback). For these VPN routes to be usable
by PE1, PE1 must find a route to 2.2.2.2 in the inet.3 table; PE1 does not look in inet.0 to resolve the next hop of MPLS
VPN MP-BGP routes.
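As a hedged sketch (the LSP name and interface scope are illustrative, not the lab configuration), the RSVP-signaled LSP from PE1 to PE2 that populates inet.3 could be configured on PE1 roughly like this:

    [edit protocols]
    rsvp {
        interface all;                      # enable RSVP on the core-facing interfaces
    }
    mpls {
        label-switched-path PE1-to-PE2 {
            to 2.2.2.2;                     # PE2's loopback; installed in inet.3 as a /32
        }
        interface all;                      # core interfaces also need family mpls configured
    }

Once the LSP is up, the 2.2.2.2/32 entry in inet.3 is what MP-BGP uses to resolve the next hops of VPN routes received from PE2.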
VPN-IPv4 Route
The VPN-IPv4 route has a simple purpose: to advertise IP routes between PEs. PE2 installs locally learned routes in its VRF
table, including the directly connected PE-CE interface as well as any routes PE2 learns from CE2 (RIP, OSPF, BGP, and so on).
Once PE2 has locally learned routes in its VRF table, it advertises them (based on configured policy) to remote PEs and attaches
a route target community, the "Orange" target community in the example. PE1, upon receiving the route, must decide whether
to keep it. It makes this decision by resolving the BGP next hop in inet.3 and by examining the received route target community.
To accept and use this advertisement, PE1 must be configured with an import policy that accepts routes tagged with the
"Orange" target community. Without a configured policy that matches on the "Orange" route target, PE1 simply discards the
advertisement. So, at a minimum, each VRF on each participating PE for a given VPN must be configured with an export policy
that attaches a unique target community to routes, and with an import policy that matches and accepts advertisements based
on that unique target community.
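The following sketch shows what such explicit policies might look like. The community value target:65512:1, the policy names, and the route distinguisher are placeholders rather than the values used in the lab:

    [edit policy-options]
    community Orange members target:65512:1;
    policy-statement Orange-export {
        term add-target {
            then {
                community add Orange;       # tag locally learned VRF routes with the route target
                accept;
            }
        }
    }
    policy-statement Orange-import {
        term match-target {
            from community Orange;          # accept only advertisements carrying the route target
            then accept;
        }
    }

    [edit routing-instances Orange]
    instance-type vrf;
    route-distinguisher 1.1.1.1:1;
    vrf-import Orange-import;
    vrf-export Orange-export;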
EVPN over IP
The slide shows an example of the signaling/data plane when using EVPN/VXLAN over an IP network. EVPN MP-BGP is used
to synchronize MAC tables.
When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN
packet destined to the remote VTEP's loopback address. QFX2, the remote VTEP, strips the VXLAN encapsulation and
forwards the remaining Ethernet frame to the destination host.
Stretching Subnets
The slide shows the EVPN Type 2 MAC advertisements that must be exchanged between data centers when individual
subnets are stretched between data centers. Notice that Host1 and Host2 are attached to the same subnet. The example
shows the advertisement of just a single MAC address; however, in a real environment you might see thousands of MAC
addresses advertised between data centers. That is a lot of routes! MAC moves, adds, and changes in one data center
will actually affect the MAC tables and EVPN routing exchanges in another data center.
Unique Subnets
The EVPN Type 5 IP prefix route can be used in a DCI situation in which the IP subnets between data centers are completely
unique. Notice that Host1 and Host2 are attached to different subnets; this fact is very important to the discussion. In this
situation, if Host1 needs to send an IP packet to Host2, it sends it to its default gateway, which is the IRB interface on PE1.
Leaf1 encapsulates the Ethernet frames from Host1 into VXLAN and sends the VXLAN packets to PE1. PE1 strips the VXLAN
header and notices that the remaining Ethernet frames from Leaf1 have a destination MAC address of its own IRB interface.
It strips the Ethernet header and routes the remaining IP packet based on the routing table associated with the IRB interface.
PE1 uses the EVPN Type 5 route that it received from PE2 for the 10.1.2/24 network, and the packet is forwarded over the
VXLAN tunnel between PE1 and PE2. You might ask yourself, "Why couldn't PE1 use a standard IP route? Why does the
10.1.2/24 network need to be advertised in an EVPN Type 5 route?" The answer is that the Type 5 route allows inter-data
center traffic to be forwarded over VXLAN tunnels (that is, the end-to-end VXLAN-based VPN is maintained between data
centers). This is very similar to the stitching concept discussed earlier. PE2 then receives the VXLAN-encapsulated packet and
forwards the remaining IP packet toward the destination over the IRB interface, encapsulating the IP packet in an
Ethernet header with a destination MAC address of Host2. Finally, PE2 performs a MAC table lookup and forwards the
Ethernet frame over the VXLAN tunnel between PE2 and Leaf2.
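A hedged configuration sketch of the Type 5 behavior on PE1 might look like the following; the instance name, interface, VNI, and target values are illustrative rather than the exact lab values:

    [edit routing-instances VRF-T5]
    instance-type vrf;
    interface irb.100;                       # default gateway interface for Host1's subnet
    route-distinguisher 1.1.1.1:500;
    vrf-target target:65512:500;
    protocols {
        evpn {
            ip-prefix-routes {
                advertise direct-nexthop;    # advertise IP prefixes as EVPN Type 5 routes
                encapsulation vxlan;
                vni 9500;                    # VNI used for the inter-data center VXLAN tunnel
            }
        }
    }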
DCI Example
The slide highlights the topic we discuss next.
VRF Configuration
The slide shows the VRF configuration for PE1. Notice the use of the vrf-target statement. Originally, VRF import and export
policies could only be enabled by writing explicit policies under [edit policy-options] and applying them using the
vrf-import and vrf-export statements. However, more recent versions of the Junos OS allow you to
skip those steps and simply configure a single vrf-target statement. The vrf-target statement actually enables two
hidden policies. One policy is a VRF export policy that takes all locally learned routes in the VRF (direct interface routes as
well as routes learned from the local CE) and advertises them to the remote PE tagged with the specified target community.
The other is a VRF import policy that accepts all VPN-IPv4 routes learned from remote PEs that are tagged with the
specified target community.
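In other words, a single statement such as the following (the target value is illustrative) replaces the explicit vrf-import and vrf-export policy pair shown earlier:

    [edit routing-instances Orange]
    vrf-target target:65512:1;    # auto-generates the matching hidden import and export policies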
MP-BGP Routing
The slide shows how to enable VPN-IPv4 signaling between PEs. Use the show bgp summary command to verify that the
MP-BGP neighbor relationship is established and that the PE is receiving routes from the remote neighbor.
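A minimal sketch of that peering on PE1 might look as follows; the group name and loopback addresses are assumptions rather than the exact slide values:

    [edit protocols bgp group PE-to-PE]
    type internal;
    local-address 1.1.1.1;            # PE1 loopback
    family inet-vpn {
        unicast;                      # enables VPN-IPv4 (Layer 3 VPN) route exchange
    }
    neighbor 2.2.2.2;                 # PE2 loopback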
VRF Table
Remember, the main purpose of establishing an underlay network and the DCI is to ensure that the routers in each site can
reach the loopback addresses (VTEP source addresses) of the remote Leaf nodes. The slide shows that PE1 has learned the
loopback address of Leaf2.
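To confirm this on PE1, you could inspect the VRF's route table with a command along these lines (the instance name and the Leaf2 loopback address shown here are placeholders):

    user@PE1> show route table VRF-A.inet.0 192.168.100.2/32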
Leaf1 Configuration
The slide shows the underlay and overlay network configuration of Leaf1. Leaf2 would be configured very similarly.
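While the exact lab configuration is shown on the slide, a hedged outline of what a leaf's EVPN-VXLAN overlay configuration typically includes is shown below; the group name, AS numbers, addresses, VLAN, and VNI are illustrative:

    protocols {
        bgp {
            group overlay {
                type external;
                multihop;                          # loopback-to-loopback EVPN peering
                local-address 192.168.100.1;       # Leaf1 loopback (VTEP source address)
                family evpn {
                    signaling;
                }
                peer-as 65101;
                neighbor 192.168.100.2;            # Leaf2 loopback
            }
        }
        evpn {
            encapsulation vxlan;
            extended-vni-list all;
        }
    }
    switch-options {
        vtep-source-interface lo0.0;
        route-distinguisher 192.168.100.1:1;
        vrf-target target:65101:1;
    }
    vlans {
        v100 {
            vlan-id 100;
            vxlan {
                vni 5100;                          # maps VLAN 100 to VXLAN VNI 5100
            }
        }
    }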
We Discussed:
• The meaning of the term Data Center Interconnect;
• The control and data plane of an MPLS VPN; and
• The DCI options when using a VXLAN overlay with EVPN signaling.
Review Questions
We Will Discuss:
• Available troubleshooting tools; and
• A basic troubleshooting approach.
Troubleshooting Tools
The slide lists the topics we will discuss. We discuss the highlighted topic first.
Troubleshooting Tools
A number of tools can be useful when troubleshooting Layer 2 environments using the QFX5100 Series switches. The slide
introduces some of the tools designed to aid in troubleshooting efforts. We cover some of these tools in more detail
throughout the remainder of this section.
Visual Indicators
Using the visual indicators on a physical device is one of the most basic troubleshooting tools. While other CLI commands
also provide alarm and link status for system components, the visual indicators (or the CLI equivalents shown on the slide)
can quickly help you determine whether alarms exist and what the status of a given interface is.
For detailed coverage of the QFX5100 Series switches and their LEDs, refer to the hardware documentation at:
http://www.juniper.net/techpubs/en_US/release-independent/junos/information-products/pathway-pages/hardware/qfx-series/qfx5100.html.
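For example, operational commands such as the following provide CLI equivalents of the alarm and link LEDs (these are standard Junos OS commands; the exact commands on the slide may differ):

    user@qfx> show chassis alarms
    user@qfx> show system alarms
    user@qfx> show interfaces terse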
Traceoptions: A Review
Tracing operations allow you to monitor the operation of various protocols by decoding the sent and received protocol
packets. In many ways, tracing is synonymous with the debug function on equipment made by other vendors. Note that
because of the design of hardware-based Juniper Networks platforms, you can enable reasonably detailed tracing in a
production network without negative impact on overall performance or packet forwarding.
In most cases when you enable tracing (through configuration), you create a trace file that stores decoded protocol
information. You analyze these files using standard CLI log file syntax such as show log logfile-name. While you can
enable detailed tracing in a production network without significantly impacting performance, it is still recommended that you
turn tracing off once you complete your testing to avoid unnecessary resource consumption.
The slide shows a generic tracing stanza that, if applied to the [edit protocols] portion of the configuration hierarchy,
would result in tracing of the specified protocol’s events. Specified protocol tracing operations track the flagged operations
and record them in the specified log file.
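As a generic, hedged example (the protocol, file name, and flags are placeholders rather than the exact stanza on the slide), tracing OSPF hello and error events could be configured like this:

    [edit protocols ospf]
    traceoptions {
        file ospf-trace size 1m files 3;   # write to /var/log/ospf-trace and keep 3 rotated files
        flag hello;                        # trace hello packet events
        flag error;                        # trace protocol errors
    }

You would then review the results with show log ospf-trace and remove the traceoptions stanza once troubleshooting is complete.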
Monitor Mode
This slide covers the Monitor mode, which is where you monitor traffic, system utilization, sessions, and system status. You
can also perform some troubleshooting operations in this mode, as well as other system verification tasks, through
Network Director.
Fault Mode
This slide covers the Fault mode, which is where you view and manage faults on the switches through
Network Director.
Report Mode
This slide covers the Report mode, which is where you generate, manage, and run reports for managed devices through
Network Director.
A Troubleshooting Approach
The slide highlights the topic we discuss next.
We Discussed:
• Available troubleshooting tools; and
• A basic troubleshooting approach.
Review Questions
Copyright 2017
Juniper Networks, Inc. All rights reserved.
Juniper Networks, the Juniper Networks logo, Junos, NetScreen, and ScreenOS
are registered trademarks of Juniper Networks, Inc. in the United States and
other countries. All other trademarks, service marks, registered marks, or
registered service marks are the property of their respective owners. Juniper
Networks assumes no responsibility for any inaccuracies in this document.
Juniper Networks reserves the right to change, modify, transfer, or otherwise
revise this publication without notice.
Education Services
EDU-JUN-ADCX, Courseware
Revision 17.a