AI Operations


TM Forum Whitepaper

AI Operations

May 5, 2020

TM Forum 2020. All Rights Reserved.



What are AI Operations (AIOps)?

Artificial intelligence (AI), combined with and empowered by advanced analytics, big data and
virtualized computing power, will drive the automation and enhancement of communications service
providers’ (CSPs’) network, IT service and business operations.

AI capabilities will gradually be infused into IT, network and business systems and services through
the implementation and deployment of AI models and AI components in all layers of CSPs’ systems
architecture.

Systems running in IT and network operations will provide AI capabilities through AI models and AI
components embedded in them (BSS, OSS, data analytics, ERP, third-party applications, digital
applications, etc.), supporting all kinds of business and operational processes.

Deploying AI in these systems will bring a tremendous opportunity to improve business processes,
business services and CSPs’ overall performance, but it will also create challenges. To meet the
challenges created by large-scale deployments of AI models in CSPs’ operations, service management
processes will have to be redesigned and adapted to manage the new AI-driven operations and
business scenarios, and the AI-based systems that underpin them.

AIOps definition

The term ‘AIOps’ is not new and has been used widely in the industry for many years, albeit in
different contexts and with different nuances (for example, by Gartner and IDC).

TM Forum defines AIOps (Figure 1) as operations that include:

1. AI components that are actively running and delivering business and operational services. In
AIOps, we assume that key systems in Production are deeply infused with AI capabilities, forming a
blend of AI and traditional software. In Figure 1 we indicate these systems as AI-based BSS,
AI-based OSS, AI-based Data Analytics, AI-based ERP, third-party AI platforms and other AI
applications.
2. Service management processes, frameworks and tools that have been properly re-engineered and
adapted to support the operations management of AI components in Production. We call this AIOps
Service Management.


Focus of AIOps
Transforming and re-engineering CSPs’ operations to prepare them for AI is a broad, complex and
challenging task. At TM Forum, we are identifying the gaps between traditional operations and AI
operations and creating a new operations process framework that allows CSPs to re-engineer their
processes so that they can implement and manage AI safely and securely.

Understanding the gaps between AI and traditional software

AI is essentially a software-based technology. However, there are significant differences between AI
and traditional software, and these characteristics of AI software require systems and processes to
be managed, governed and operated differently. For this reason, we need to identify, and ultimately
plug, the operational and process gaps between traditional operations and AI Operations (AIOps).

From an operations management perspective, our analysis and experience have identified the
following main differences between AI software and traditional software:

• The software lifecycle of traditional systems is mainly driven from left to right, i.e. from
Development to Operations (from Dev to Ops). In AIOps, a new and key aspect to manage is that
self-driven software updates in Production generate a new flow from right to left, i.e. from
Operations to Development (from Ops to Dev), which does not exist for traditional software.
Current continuous improvement practices are based on human feedback and interventions, not
on software-driven updates. By contrast, the lifecycle of AI components is bidirectional, also
flowing from Operations to Development, because AI models may autonomously change their state
and configuration in Production (online learning, self-driven updates) without human
intervention, which then requires a prompt and comprehensive retrospective evaluation (Figure 11).


AIOps: from Dev to Ops and from Ops to Dev

• All software evolves. Continuous improvement principles such as Lean and Kaizen have been
extensively adopted in software engineering and service operations management; indeed, the
retrospective approach is part of the Agile and DevOps methodologies.
However:
• For traditional software, evolution and maintenance are planned, or can be planned.
• For AI software, evolution is at once planned and spontaneous, autonomous and self-driven.
• Before the introduction of AI, Production environments have always been seen as static,
locked-down and sterile environments where all changes go (or should go) through a planned Change
Management process. Agile and DevOps practices have accelerated and streamlined the processes
leading up to Production (development and testing, integration and deployment), but the core
static essence of Production environments has not significantly changed. The introduction of AI
models into operations challenges and overturns this traditional static view, transforming
Production into an intrinsically dynamic environment. AI software is changing the operations
management approach and the operations culture, which will need to govern dynamic environments,
live with the “fear of change” and manage the risks associated with dynamic changes in
Production accordingly.
• In traditional software engineering, the baseline (the starting point from which we develop new
software) is usually well known. With the introduction of AI, the software baseline becomes
blurred and keeps changing.


• Data has a key role and is one of the key components of the structure of AI models: it is the fuel
driving the evolution of AI systems. New input datasets enable AI models to evolve, and new data
can bring new and different outcomes. For these reasons, data operations become even more
critical and central in AIOps (AIDataOps).
• Machine learning (ML) training of AI algorithms and the re-training of AI models in Production are
brand-new processes in software development and operations management that do not exist in the
traditional software lifecycle.
• AI models are nondeterministic by nature. All software in large and complex operations can be
considered nondeterministic to a certain degree, because of the high number of variables involved
and the unpredictable scenarios it may face. However, traditional software is, or should be,
deterministic by nature, i.e. given the same input, it provides the same output. AI models, on the
other hand, may behave differently in the same circumstances, because their internal state and
internal logic may continuously change and evolve.
• AI software can be even more fragile than traditional software. As with any software, a small
difference between code versions, software configurations or environment baselines can create
issues, defects or unexpected outcomes. In addition, for AI software, a new byte in the input data
can destabilize the AI model.
• AI models are exposed to the risk of bias. AI software can be biased by inappropriate,
incomplete, corrupted, incorrect or fraudulent input data. This risk adds to all the other risks
and weaknesses of traditional software, which obviously apply to AI software as well (viruses,
malicious agents, sabotage, vulnerabilities, etc.).
• AI models are black boxes. It is challenging to determine why an AI model makes a specific
decision, prediction or classification. There are hidden dependencies inside ML models, resulting
from the combination of input data, training parameters, configuration settings, etc. While code
review and other audit techniques would usually clarify the overall logic behind the behavior of
traditional software, for AI software this is not enough. Additional and different approaches and
techniques are needed to increase the transparency and “explainability” of AI software.
• We have learned from Continuous Delivery and DevOps practices that software should be
considered permanently in a working or beta state. This principle is even truer for AI models,
which are pieces of software with the capability to learn spontaneously and continuously when
exposed to new data. By definition, AI models are in a permanent evolutionary and working state
(like human brains).
• The intrinsic characteristics of AI models listed above further amplify the management
responsibility of Operations departments, making them even more central and accountable for
service quality and service performance, and for the proper and timely control and maintenance
of continuously evolving, nondeterministic AI systems in Production.
• With the deployment of AI at scale, Production environments become dynamic by nature.
Deploying only offline AI models would certainly create new challenges, but it would contain the
complexity of operations. However, if we used only offline AI models, we would give up the
benefits brought by online AI models. To leverage the full potential of AI, we need to learn how
to manage both offline and online AI models in Production, supervising their continuous, dynamic
evolution while ensuring full control and governance of our operations.
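The contrast between offline and online models, and the Ops-to-Dev flow described above, can be illustrated with a deliberately tiny example. The following Python sketch is hypothetical: the class names, the single-feature linear model and the stochastic gradient descent update are illustrative assumptions, not part of any TM Forum framework. The offline model is frozen after training, so the same input always yields the same output, while the online model updates its own parameters in Production and records each self-driven change in an audit log that Operations could hand back to Development for retrospective evaluation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Params = Tuple[float, float]  # (weight, bias)


@dataclass
class OfflineModel:
    """Trained in Dev and frozen in Production: deterministic for a given input."""
    weight: float
    bias: float

    def predict(self, x: float) -> float:
        return self.weight * x + self.bias


@dataclass
class OnlineModel:
    """Hypothetical online model that keeps learning in Production (one SGD step per label)."""
    weight: float
    bias: float
    learning_rate: float = 0.1
    version: int = 0
    # Audit trail of self-driven updates: (new version, old params, new params).
    update_log: List[Tuple[int, Params, Params]] = field(default_factory=list)

    def predict(self, x: float) -> float:
        return self.weight * x + self.bias

    def learn(self, x: float, y: float) -> None:
        # Gradient of the squared error 0.5 * (prediction - y)^2 for weight and bias.
        error = self.predict(x) - y
        old = (self.weight, self.bias)
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error
        self.version += 1
        # Record the change so Ops can feed it back to Dev (the Ops-to-Dev flow).
        self.update_log.append((self.version, old, (self.weight, self.bias)))


offline = OfflineModel(weight=1.0, bias=0.0)
online = OnlineModel(weight=1.0, bias=0.0)

before = online.predict(2.0)
for x, y in [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]:  # made-up labelled traffic
    online.learn(x, y)
after = online.predict(2.0)

print(offline.predict(2.0), offline.predict(2.0))  # 2.0 2.0
print(before, round(after, 3))                     # 2.0 4.864
print(len(online.update_log))                      # 3
```

The offline model answers the same input identically every time, whereas the online model's answer to the same input has changed after three updates that no human initiated; the only trace of those changes is `update_log`, which is why a retrospective review process is needed in AIOps.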

Figure 1. Main differences between traditional software and AI software

Processes must be redesigned for AIOps Service Management

Because of all the differences and gaps between traditional software and AI software listed above,
we need to rethink and redesign the operations management processes to prepare them to manage
and govern AI software and, more generally, to safely and effectively operate a blend of AI and
traditional software running together in CSPs’ operations. Since traditional software and AI
software differ, there are consequently gaps between service management for traditional software
and service management for AI. The very nature of AI means that we need to operate our systems
and processes differently.

Traditional Ops vs AIOps


The transformation journey from “Traditional Service Management” to “AIOps Service Management”
would address the business and operational need to deploy a significant number of AI components
with relevant business capabilities and integrate them into CSPs’ existing operations.

If your organization has no plans to deploy AI, or plans to deploy just a few isolated and/or siloed
AI models that are not relevant to, and do not really affect, key business processes, it is not
necessary to start this journey, because the existing frameworks already do a good job of managing
traditional software in operations.

However, if you plan to deploy AI at scale across your business, we strongly recommend that you
start the transformation journey towards AIOps Service Management immediately.

In AIOps Service Management, the end goals are to:

• Redesign the Deployment processes to release and commit AI models to Production.
• Redesign the Production processes to operate AI software.
• Redesign the Operations Governance processes to govern AI software.
• Deal with fast flows of changes coming from Dev to Ops and from Ops to Dev, for both offline
and online models.
• Integrate effective AI data operations and ML training practices into AI software operations
management.

Additionally, defining clear roles and responsibilities in new AI operating models, properly
redesigning the organizations concerned, and selecting and including the appropriate skills are key
success factors of this transformation journey, although they are outside the scope of this activity.
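The goals of handling fast flows of changes in both directions and of integrating data operations into AI operations management can be sketched as a single automated promotion gate that every candidate model must pass, whether it is a retrained model released from Dev or a self-updated online model flagged in Ops. The Python below is a hypothetical illustration only: the `Candidate` record, the data checks (missing-value ratio and value range), the accuracy tolerance and all numbers are assumptions chosen for readability, not a TM Forum-defined process.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Candidate:
    """A model version proposed for Production (hypothetical record)."""
    name: str
    origin: str              # "dev" = planned release, "ops" = self-driven update; kept for traceability
    holdout_accuracy: float  # accuracy on a fixed, versioned holdout dataset


def validate_inputs(rows: Sequence[Optional[float]], lo: float, hi: float,
                    max_missing_ratio: float = 0.05) -> List[str]:
    """Minimal data checks: share of missing values and out-of-range values."""
    if not rows:
        return ["empty input batch"]
    issues: List[str] = []
    missing = sum(1 for r in rows if r is None)
    ratio = missing / len(rows)
    if ratio > max_missing_ratio:
        issues.append(f"missing ratio {ratio:.2f} exceeds {max_missing_ratio}")
    out_of_range = [r for r in rows if r is not None and not (lo <= r <= hi)]
    if out_of_range:
        issues.append(f"{len(out_of_range)} value(s) outside [{lo}, {hi}]")
    return issues


def promotion_decision(candidate: Candidate, baseline_accuracy: float,
                       data_issues: List[str], tolerance: float = 0.01) -> str:
    """Same gate for both origins: data checks first, then quality versus the current baseline."""
    if data_issues:
        return "rollback: " + "; ".join(data_issues)
    if candidate.holdout_accuracy < baseline_accuracy - tolerance:
        return (f"rollback: accuracy {candidate.holdout_accuracy:.3f} "
                f"below baseline {baseline_accuracy:.3f}")
    return "promote"


# Usage with made-up numbers.
batch = [0.2, 0.5, None, 0.9, 0.4, 0.7, 0.1, 0.3, 0.6, 0.8]
issues = validate_inputs(batch, lo=0.0, hi=1.0, max_missing_ratio=0.2)
print(issues)  # []
print(promotion_decision(Candidate("churn-model", "dev", 0.89), 0.88, issues))  # promote
print(promotion_decision(Candidate("churn-model", "ops", 0.86), 0.88, issues))
# rollback: accuracy 0.860 below baseline 0.880
```

Treating both origins identically is the design choice in this sketch: a self-driven update from Production gets the same scrutiny as a planned release, and a failed data check blocks promotion before model quality is even considered.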

Although it is not strictly necessary for AIOps, setting up a loosely coupled, open, service-based,
well-structured and well-documented system architecture helps to support more agile and efficient
operations service management in general. The TM Forum Whitepaper ‘AIOps Service Management
Deployment’ identifies the gaps between traditional operations and AIOps in the deployment phase.

In addition to TM Forum’s work on identifying and redesigning operational processes for AI, a TM
Forum Catalyst (proof-of-concept) team, ‘AI for IT & Network Operations (AIOps) – Phase III’, has
recently completed its third phase with participation from 12 companies. The Catalyst included
seven leading CSPs, collectively representing over 1.5 billion customers. The team has developed
eight use cases addressing the various business needs presented by the CSP champions (China
Telecom, China Mobile, China Unicom, KDDI Research, PCCW/Hong Kong Telecommunications (HKT), Smart
Communications and Telefonica Deutschland). These use cases cut across customer experience,
quality of service, business performance and efficiency, and include:


• Predicting and preventing poor customer experience
• Predicting churn and using proactive techniques to retain customers
• Accurately monitoring service levels
• Identifying potential faults and their root causes in 5G networks before the issues generate
impacts or outages
• Preventing customer complaints
• Performing preventive maintenance activities
• Deploying an intelligent operations and maintenance (O&M) framework for home broadband
services
• Establishing closed-loop service assurance to continuously improve service quality and O&M
efficiency
