Creating Network Automation Strategies: On Topic

Applying closed-loop automation in broadband
Advanced Automation for Optical Transport


Reprinted with revisions to format from LIGHTWAVE.

Applying closed-loop automation in broadband networks

With a continual increase in service level expectations, network automation is a vital step in the digital transformation of an operator’s business. Multi-generational, multi-technology networks have led to complex operational processes with increased data sources and volumes, heightening the need for comprehensive data management and automation strategies to reduce human error and boost productivity.

Software-defined networking (SDN) helps to create more self-aware, self-governing networks that can apply artificial intelligence and machine learning (AI/ML) to automate operations, improve service assurance, and provide detailed analysis for anomaly detection, action recommendation, and capacity planning.

Figure 1. Closed-loop automation (CLA) with software-defined access networks.

In this article, we’ll look at how automation improves network performance through smart service level agreement (SLA) management. We will show how dynamic network optimization can deal with the presence of heavy users on a passive optical network (PON) and restore the other users’ capability to use peak capacity, without disturbing the download performance of the heavy users. Sounds contradictory, but it’s possible.

Software-defined access networks

Central to software-defined access network (SDAN) architectures is the SDN controller, which offers automation that can be adapted for a vast permutation of needs, services, and processes. Analytics engines can use SDAN’s open interfaces to retrieve real-time and historical data from central data lakes, while the closed-loop automation (CLA) framework enables zero-touch provisioning, network health analysis, and automated troubleshooting.

The open APIs enable operators to build their own automation apps on top of the SDN controller. As SDN controllers enable dynamic determination of threshold crossing alerts and KPI-based monitoring, the analytics engine can, in turn, trigger CLA routines for corrective actions, thus automating the loop (Figure 1).

The challenge of PON bandwidth management

According to data published by Openreach, internet traffic in the UK more than doubled from 2019 to 2020, and some studies predict that it will exceed 1 TB/subscriber/month by 2025 (Figure 2).

Figure 2: Internet traffic forecasting in UK. Source: https://www.increasebroadbandspeed.co.uk/average-home-monthly-internet-usage-forecast

The challenge for communication service providers (CSPs) in such a scenario is that network capacity and the corresponding infrastructure investments are planned over many years, while demand is booming and, as we’ve just seen with COVID-enforced lockdowns, can change almost overnight.

The notion of SLAs is also evolving, as consumers are now as dependent on their broadband connection as businesses are. “Best effort” services are becoming less acceptable to customers and regulators. So, it’s getting increasingly difficult to dimension the network (determine the maximum number of subscribers per PON) as well as to determine the network
configuration parameters like the maximum rate per subscriber (advertised peak rate).

Overbooking allows CSPs to advertise more throughput by relying on the fact that not all the subscribers will request the maximum throughput at the same time. However, to ensure acceptable service levels, overbooking has to be carefully managed.

The sharing of a PON among multiple users is typically handled by traffic schedulers and shapers. In this sharing scheme and without service differentiation, every active user will receive an equal share of the total bandwidth. The free bandwidth is automatically redistributed among active users. In a worst-case scenario when all subscribers are active, the minimum bandwidth that each user will receive is equal to the total bandwidth divided by the number of subscribers. In the best case, if there were only one active user, this user would get the full bandwidth.

CSPs make sure that all subscribers can access services, like video streaming, all at the same time, and that significant extra bandwidth is available on top to address best-effort services like web surfing. As user activities, at least for the best-effort applications, are totally uncorrelated, it is very unlikely that all of them request a high throughput at the same time, allowing significant overbooking with limited impact on the service level.

One of the limitations of the PON, especially when overbooking is introduced, is vulnerability to heavy users. Long-term subscriber fairness is hard to obtain because, at any time, the bandwidth is shared equally among the active users, irrespective of historical bandwidth utilization. This means PONs are vulnerable to people using their subscription to the max: the Netflix 8K streamer, the dubious downloader, the home-based 3D animator, or anyone using a residential subscription for commercial services. If a single user requests the maximum bandwidth permanently, it can prevent other subscribers from getting high bandwidth even for a short period of time.

Historically, CSPs have used rules of thumb to do traffic management. However, traditional ways of managing heavy users (throttling, data caps, or premium pricing for higher SLAs) are not dynamic; once applied they are permanent until unapplied, even when extra bandwidth is available or no other subscribers are affected, say during night-time or low-traffic periods.

CSPs are caught between raising advertised peak rates, which increases overbooking levels, and keeping both at a conservative level to guarantee SLAs and limit the impact of heavy users.

Applying online and offline network optimization

Using CLA, we can implement dynamic bandwidth management that preserves peak rate availability even in the presence of heavy users. It ensures all subscribers within a PON (or a slice of a PON) have access to a “fair” portion of available bandwidth.

SDN takes advantage of telemetry (network data being streamed continuously rather than being requested periodically) to understand traffic patterns in real time and compare them with historical bandwidth consumption measured at different levels in the network (subscriber, PON, operator, etc.). This ability enables bottlenecks to be identified and autonomously remediated in near-real time, which is impossible to do in traditional systems.
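The equal-share scheduling and overbooking arithmetic described earlier in this section can be illustrated with a minimal sketch. It is not any vendor's scheduler: the function names, the 2,400 Mbit/s PON capacity, the 1,000 Mbit/s peak rate, and the 32 subscribers are all assumptions chosen for illustration.

```python
def fair_share(capacity, demands):
    """Split `capacity` equally among active users, redistributing unused share.

    `demands` maps a user name to the rate it requests (same unit as
    `capacity`); a demand of 0 means the user is idle.
    """
    alloc = {user: 0.0 for user in demands}
    active = {user: d for user, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        share = remaining / len(active)
        # Users whose demand fits within an equal share are fully served.
        satisfied = [u for u, d in active.items() if d <= share]
        if not satisfied:
            # Everyone wants more than an equal share: split what is left evenly.
            for u in active:
                alloc[u] = share
            break
        for u in satisfied:
            alloc[u] = float(active.pop(u))
            remaining -= alloc[u]
    return alloc


def overbooking_ratio(subscribers, peak_rate, capacity):
    """Sum of advertised peak rates divided by the shared capacity."""
    return subscribers * peak_rate / capacity


# Illustrative (assumed) figures, in Mbit/s.
PON_CAPACITY = 2400.0
PEAK_RATE = 1000.0
SUBSCRIBERS = 32

# Worst case: every subscriber requests the peak rate at the same time.
worst = fair_share(PON_CAPACITY, {f"sub{i}": PEAK_RATE for i in range(SUBSCRIBERS)})

# Two light users and one heavy user: the heavy user absorbs all free capacity.
mixed = fair_share(PON_CAPACITY, {"a": 100.0, "b": 100.0, "heavy": 5000.0})
```

With these assumed numbers, the worst case gives each subscriber the total bandwidth divided by the number of subscribers, and the mixed case shows why a permanently active heavy user leaves no headroom for anyone else to reach the peak rate, since the scheduler ignores historical consumption.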

Achieving this goal requires a modern and efficient approach to network data capture and telemetry streaming via technologies like IPFIX and Kafka.

If insufficient capacity is detected, CLA reduces the bandwidth for heavy users based on their historical consumption by dynamically adapting the traffic schedulers and shapers. When more capacity becomes available, CLA restores the initial configuration. This practice guarantees long-term fairness for subscribers while maximizing bandwidth utilization. To illustrate the performance in a realistic scenario, one heavy user can prevent all other users from completing a successful speed test; but when the CLA is activated, we restore the probability of a successful speed test to 80% for all users, while the heavy user’s download times are increased by less than 1%, which is hardly a performance penalty.

While the bandwidth closed-loop optimization operates online, telemetry can also be leveraged in offline AI/ML tools that are able to learn the traffic patterns from the network and recommend optimal advertised peak rates and the number of subscribers per PON for a desired SLA (Figure 3).

Figure 3. Network design tool using machine learning.

For example, an ML model can be trained to determine the number of subscribers per PON that corresponds to a minimum speed test success probability and a given advertised peak rate. Alternatively, an ML model can be trained to determine the peak rate that corresponds to a certain speed test success probability and a given number of subscribers per PON.

The same overbooking/heavy user problem exists when a single network is shared by different virtual network operators. In the same way, virtual operators or infrastructure providers can use CLA and offline recommendation tools to dynamically dimension both the physical PON and the virtual slices according to real-time usage and historical trends.

We expect SDN automation to make important decisions for broadband operators at an accelerating rate: unlocking new service capabilities, bringing more agility to their operations, making for smarter decisions, and improving network performance.
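The closed-loop behavior described in this article (reduce the bandwidth of heavy users when capacity runs short, then restore the initial configuration when capacity frees up) can be sketched as a single control step. This is a hedged illustration only: the `Subscriber` fields, the utilization thresholds, the "five times the median" definition of a heavy user, and the 50% throttle are all assumptions, not values from the article or from any product.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Subscriber:
    name: str
    peak_rate: float                     # advertised peak rate, Mbit/s
    history_gb: float                    # consumption over a trailing window, GB
    shaper_rate: Optional[float] = None  # rate currently enforced by the shaper

    def __post_init__(self):
        if self.shaper_rate is None:
            self.shaper_rate = self.peak_rate


def cla_step(subs, utilization, congest_at=0.9, relax_at=0.6,
             heavy_factor=5.0, throttle=0.5):
    """Run one closed-loop iteration for a single PON.

    `utilization` is the measured PON load as a fraction of capacity.
    Two thresholds (assumed) give hysteresis so shapers do not flap.
    """
    typical = median(s.history_gb for s in subs)
    for s in subs:
        is_heavy = s.history_gb > heavy_factor * typical
        if utilization >= congest_at and is_heavy:
            # Insufficient capacity: shape heavy users below their peak rate.
            s.shaper_rate = s.peak_rate * throttle
        elif utilization <= relax_at:
            # Capacity is available again: restore the initial configuration.
            s.shaper_rate = s.peak_rate
    return subs


subs = [
    Subscriber("a", 1000.0, 10.0),
    Subscriber("b", 1000.0, 12.0),
    Subscriber("heavy", 1000.0, 500.0),
]
cla_step(subs, utilization=0.95)  # congestion: only "heavy" is shaped
```

In a real deployment the utilization and history values would come from streamed telemetry and the shaper change would be pushed through the SDN controller; here both are plain Python values so the decision logic can be read in isolation.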
Filip De Greve is product marketing director, Fixed Networks, at Nokia.

Advanced Network Automation for Next-Generation Optical Transport Networks

With the introduction of 4G and 5G mobile technology, the increasing number of networked devices, and the exploding number of services relying on connectivity, such as Internet of Things (IoT), virtual reality (VR), augmented reality (AR), gaming, and any other cloud offering, communications networking in general and optical transport networking in particular have become an increasingly central infrastructure for a large part of the global economy.

The optical networking industry is successfully addressing this important role through speed of innovation. Photonic and electronic integration have enabled smaller, more power-efficient optical transmission systems with higher throughput. With a two-year cadence, the capacity carried on a single wavelength within a DWDM wavelength band has moved from the already disruptive introduction of coherent 100G transmission per channel to 200G, 400G, 600G, and 800G wavelength capacities. The wavelength spectrum that has traditionally been used, the C-Band, has been widened to encompass the L-Band as well, providing mechanisms to further enhance reach, spectral efficiency, and optical layer connectivity.

Combined with the rise of internet content providers (ICPs), a planet-scale networking infrastructure has been built in the last decade to sustain social media and rich media content exchange. With ICPs, we have seen significant investments in data centers, which represent an amalgamation of compute, storage, and networking infrastructure. This has also had a transformational impact on traditional telco carriers/service providers (CSPs). For CSPs, the imminent arrival of 5G, which is expected to act as a force multiplier for other innovative technologies, including artificial intelligence (AI), IoT, and edge computing, is going to result in a new cycle of network capacity growth and the associated capital investment in infrastructure. Advanced automation is going to play an important role in improving operational costs and network availability.

To drive this fast innovation, network operators increasingly combine best-in-class products from different vendors into their networks.

All these innovations and capabilities have improved the scale, capacity, and power efficiency, but have also resulted in a significant amount of complexity in managing the impressive pool of bandwidth available across vendor and technology domains. To leverage this huge amount of scalability and flexibility in networks today, developing suitable new paradigms for control and network operation has now become the focus for network operators and the optical networking industry.

Figure 1. Autonomous, self-driving optical networks. Source: https://www.osapublishing.org/abstract.

It is worthwhile to briefly discuss some common methods used by network operators to achieve automation today. The motivation is to automate repeatable tasks that inherently do not require any enhanced “intelligence.” That is, the tasks are fairly contained, are well-defined, and achieve very specific goals. The scope of automation itself encompasses the overall spectrum of FCAPS operations; automation tasks could include performing custom monitoring beyond the capabilities offered by the network element, or a slightly more advanced task could involve the movement of bandwidth (wavelengths or circuits) based on time-of-day considerations. Given the progressive evolution of telecom systems over time, automation tasks that are initially defined and maintained by operators gradually become first-class features that eventually are implemented natively by the
networking equipment vendors as part of their software offerings.

Building Next-Generation Network Operations Centers

The changing operations paradigm from classic telco networking towards a new IT networking environment also affects the role of a network operations center (NOC). Instead of a central hub with operators in front of the network management system (NMS) graphical user interfaces, the general objective now is to establish autonomous networks.

Autonomous networks are the eventual goal beyond advanced automation (Figure 1). To contrast autonomous networks from automated/

Components of Automation

The introduction of software-defined networking (SDN) has pushed the networking industry (optical included) towards open and standardized interfaces, from routing to photonic layers, allowing greater control and visibility into the network. With well-defined APIs from the NEs, the SDN controller can manage the data, control, and management planes of the NE. As the lines between legacy NMSs and an SDN controller blur, the “SDN controller layer” is coalescing many other network functions: hosting automation scripts and tasks, archiving of performance monitoring (PM) data, real-time planning, and in the near future, AI/ML frameworks for cognitive, predictive/proactive analytics. A few key components are necessary for automation in the near term:

• Programmable Optical Hardware and NEs: The ability to automate requires fundamental capabilities in the optical data plane to be able to enact those intents. These capabilities span from colorless/directionless/contentionless (CDC) ROADMs and hybrid ODU/packet switches to highly spectrally efficient coherent WDM interfaces (that enable fine-grained tradeoff between capacity and reach). These characteristics enable optical networks to be remotely configured to changing traffic conditions. For instance, a circuit can be migrated automatically to a low-latency/high-priority path.

• Automated Network Equipment Provisioning: Zero-touch provisioning (ZTP) is the ability to commission optical devices with very little human involvement. Field personnel install equipment within the NOC and only perform mechanical and power installation procedures.

• SDN Control, Programmability, and APIs: SDN transport has brought in extensive API frameworks based on YANG data models known as model-driven networking (MDN). MDN normalizes network functions across vendor implementations through data abstractions, which are specified as YANG models. MDN results in separation of intent (what) from actuation (how), which is critical to scale operations in multi-vendor environments.

• Streaming Telemetry: Modern devices support streaming telemetry-based performance monitoring, which resolves the limitations of legacy SNMP pull-based monitoring. The devices can stream (push) data at varying frequencies (from seconds to minutes), which allows improved monitoring and observability. For AI/ML applications, where timely data is key to improved prediction accuracy, telemetry allows real-time tracking of key performance metrics, which further enhances predictive/proactive analytics.

• Analytics and Machine Learning Frameworks: Popular public cloud providers and other software vendors are increasingly providing “AI/ML and analytics as a service” offerings, allowing an easier starting point than having to build software stacks from scratch. These approaches increasingly provide benefit to network operations, e.g., through the analyses of traffic flows and traffic prediction, optical performance analytics and optimization, as well as identifying security breaches or denial of service attacks.

Operators are also considering policy-driven cognitive systems that are used to (re-)configure the network to accommodate dynamicity, helping automate daily tasks depending on user-specified conditions. Such a policy-based engine can be integrated with or without AI/ML capabilities, providing an optimal ecosystem for automation. Integration with AI/ML components provides the foundation for closed-loop autonomous actions. One such example was recently demonstrated by a North American operator when a policy-based system was able to dynamically reallocate/migrate optical capacity to support Ethernet bandwidth-on-demand services utilizing multiple best-in-class open source tools for streaming, messaging, data collection, and learning.
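A policy-driven engine of the kind mentioned above, which acts on user-specified conditions such as time of day, might be sketched as follows. This is an illustrative assumption rather than a description of any operator's system: the `Policy` structure, the telemetry sample fields (`hour`, `utilization`), the thresholds, and the action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Policy:
    name: str
    condition: Callable[[Dict], bool]  # user-specified condition on a telemetry sample
    action: str                        # hypothetical action name handed to the controller


def evaluate(policies: List[Policy], sample: Dict) -> List[str]:
    """Return the actions of every policy whose condition matches the sample."""
    return [p.action for p in policies if p.condition(sample)]


POLICIES = [
    # Add capacity when a link runs hot during business hours.
    Policy(
        "business-hours-boost",
        lambda s: 8 <= s["hour"] < 18 and s["utilization"] > 0.8,
        "allocate_extra_wavelength",
    ),
    # Give the capacity back when the link is quiet at night.
    Policy(
        "night-release",
        lambda s: (s["hour"] >= 22 or s["hour"] < 6) and s["utilization"] < 0.3,
        "release_extra_wavelength",
    ),
]

actions = evaluate(POLICIES, {"hour": 10, "utilization": 0.9})
```

Keeping the conditions as plain data-driven rules separates what the operator wants from how the controller enacts it; an AI/ML component could later replace a hand-written condition with a predicted one without changing the surrounding loop.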

automatic operations is akin to comparing a Level 3+ capable “hands-off” self-driving car to the automatic/semi-automatic cars that are ubiquitous today. Autonomous networks encompass several self-* properties, viz. self-bootstrapping, self-forming, self-managing, and self-healing. These networks can potentially run in a driverless auto-pilot mode with zero human-to-machine interaction. Where current NOCs include network engineers who have vendor-specialized certifications, an autonomous network pushes towards NOC-less operations. This involves retasking network engineers from existing responsibilities (i.e., deploy ⇔ provision ⇔ monitor ⇔ debug) to “DevOps” roles, where they double as software developers (Dev), developing automation systems for operational (Ops) tasks. Changes in the operational tools’ landscape and the associated paradigms imply that it is imperative to make operations personnel an active part of this transformation and equip them for this new environment.

Migrating to NOC-less network operations will take time and requires an industry effort among operators, software players, and equipment vendors, as the ability to automate in multi-vendor environments is a prerequisite. There are different stages in this journey, starting with automating existing tasks within current NOCs. Alongside the creation of an open environment, the obvious starting point is to automate existing repetitive tasks performed by technicians in the NOC. For instance, ICPs are gradually transforming network management and operations by minimizing direct human interaction and thereby reducing errors and misconfigurations. They are moving to NOC-less
operations, which are managed by robots. The network engineers build robots, which in turn manage the network. (For more on automation, see the “Components of Automation” sidebar.)

Other new and exciting technologies will further enhance this picture, in particular artificial intelligence and machine learning (AI/ML), which are expected to play an important role in moving toward a NOC-less network.

The ability to learn from existing information and predict future capacity, throughput, faults, or other events will turn into an important operations tool. ML algorithms are probabilistic rather than deterministic and rely purely on data for accurate predictions. ML approaches are cost-effective in comparison to conventional approaches, such as heuristic or analytical methods, in cases where the problem being solved suffers from either a model deficit or an algorithm deficit. ML is a powerful data-driven “tool” to improve/extend existing solutions; automation is still the “solution” that operators will deploy, with AI/ML as technologies to achieve this goal. We briefly highlight three optical networking use cases where ML techniques have potential to make an impact:

– Failure Prediction and Preventive Maintenance: Detect anomalous network parameters that could cause network failures, identify the root cause, and prescribe preventive actions.

– Cognitive Service Provisioning: Using historical data, an SDN-controlled autonomous network can predict traffic volume/growth and dynamically (re-)allocate resources (spectrum, wavelengths, circuits, etc.).

– Quality of Transmission (QoT) Estimation: Use ML to overcome model deficits of existing heuristic approaches and improve accuracy. The optical lightpath performance can be estimated by learning the characteristics from optical devices, especially in open disaggregated optical networks.

A NOC operator today uses vendor-specific NMSs to provision connectivity services, monitor alarms and performance data, and manually trigger actions to fix issues. With the prediction capabilities, operational practices will change substantially: The cognitive network will be expected to proactively predict impending issues, take preventive action whenever possible, and trigger steps to deal with those issues. Using AI/ML methodologies will enable operators to cover a much wider set of topics for a more versatile reaction to events. Once there are no repetitive tasks that need to be done from a central location, we move closer to achieving NOC-less operations, allowing operators to concentrate on optimizing their automated environments.

Operator and Vendor Collaboration

Finally, while the promise of automation is apparent, there are several challenges in migrating to next-generation autonomous optical networks that involve all the players in the automation ecosystem.

The need for standardized tools/APIs is crucial for multi-vendor integration. Progress in standardization activities (IETF, ONF, OIF, MEF, TIP) is slow, with parallel efforts trying to achieve the same outcomes due to siloed development. For operators, expanding the skills
of the operational workforce is necessary, but involves financial/business investments and a change towards a mindset supporting a DevOps model (executives to network engineers). Onboarding AI/ML-based solutions poses further challenges. As the accuracy of learning-based systems is fully dependent upon the quality of data, organizational and business boundaries within operator organizations make it challenging to collect, sanitize, and share collected data, which results in disparate pools of data.

Meanwhile, equipment vendors face their own set of challenges. The primary challenge is to facilitate operator interactions and data sharing by operators, which is difficult for privacy, business, and in some cases intellectual property reasons. Vendors need to learn from operators how their networking gear behaves in the field to improve performance and reliability. Operators should be encouraged to share their operational experiences with vendors to, in turn, enable vendors to build use-case-driven, high-value automation solutions for operators. Recently in the context of ML, several key software and hardware industry players have joined forces to establish the Open Neural Network Exchange, which strives to define common data formats and open source building blocks for ML and deep learning models. A similar initiative is required in the networking community that can bring the key players together to build reusable automation and AI/ML frameworks.

Parthiban Kandappan is chief technology officer at Infinera.

