5 Challenges to Achieving Observability at Scale
Using automation and intelligence to overcome obstacles.

What teams are up against
Challenge One: The complexity of dynamic multi-cloud environments
Challenge Two: Monitoring dynamic microservices and containers in real time
Challenge Three: The volume, velocity, and variety of data and alerts
Challenge Four: Siloed Infra, Dev, Ops, Apps, and Biz teams
Challenge Five: Knowing what efforts drive positive business impact

Introduction
Successful digital transformation requires every application
and digital service, and the dynamic multi-cloud platforms
they run on, to work perfectly. All the time.
But these dynamic, highly distributed cloud-native
technologies are fundamentally different than their
predecessors. The resulting complexity brought on by
microservices, containers, and software-defined cloud
infrastructure is overwhelming at web scale. It’s all
beyond the limits of human teams to manage and scale
on their own.
To understand everything going on in these ever-changing
environments, all of the time, observability needs to scale.
5 Challenges to Achieving Observability at Scale ©2020 Dynatrace 2

More tools aren’t the answer
Some teams mistakenly try to solve this ‘observability at scale’
problem by adopting more siloed monitoring tools. They end up spending
more time on manual configuration, incurring more technical debt,
and struggling to identify issues and prioritize the efforts with the
greatest impact.

95% of applications in enterprise organizations are not monitored
due to siloed tools and burdensome manual effort.

As cloud complexity continues to grow, this approach becomes increasingly
unsustainable for even the most experienced teams, who are
continuously bogged down in manual-intensive tasks that make them
less effective at achieving what matters most.

The shift to intelligent observability
To scale observability, enterprise organizations must fundamentally
transform the way they work to innovate faster, keep up with constantly
changing tech stacks, and reduce risk across teams.
This scale happens when teams shift from simply observing and reacting to issues
as they arise, to a culture of proactive understanding and optimization. This
unlocks the ability to anticipate, predict and even auto-remediate problems that
matter most to the business.
In deciding how to accelerate digital transformation, companies need to
understand that every decision is an investment in achieving the original goal
of observability: to proactively and efficiently improve user experiences
that drive business growth.

[Figure: metrics, traces, logs, UX]

Automation and intelligence are essential
Whether selecting a DIY approach, buying another cheap tool, or investing in a strategic platform, everything costs time, money, people,
and quality. Prioritizing value and speed of delivery to the business and customers is paramount to finding success in this dynamic multi-cloud world.
Automation and intelligence are essential to transform how teams work to quickly and efficiently achieve observability at enterprise scale.
Requirements              Results
Complete coverage         More productivity and time to innovate
Automation everywhere     Higher quality releases
Real-time feedback        Better customer experiences
Precise answers           Reduced risk
Cross-collaboration       Accelerated business outcomes

Challenge 1:
The complexity of dynamic multicloud environments

[Figure: hybrid multi-cloud, serverless, microservices and containers, IoT]

The rate at which new technologies become available and are implemented is increasing,
exploding the complexity that results from the unmanageable volume and speed
of data emitted by dynamic environments.
This makes it nearly impossible for IT teams to manually understand how
everything is related in context, all of the time. So, teams must find ways
to automate the understanding of this data and context to accelerate
digital transformation.

Teams often fail at digital transformation because they’re:
Hindered by disconnected data silos that prevent understanding of entity relationships and interdependencies
Lacking understanding and context of upstream and downstream system impacts from potential changes
Forced to prioritize manual instrumentation and mundane tasks over developing new features

44% of an IT team’s time is spent on manual tasks, on average.
(Source: Dynatrace 2020 Global CIO Report)

These shortcomings introduce unnecessary risk and burden developers with repetitive toil,
ultimately hurting efforts to transform digitally and drive innovation forward.

How to overcome it
Automation is an absolute necessity to not only handle the scale of every single
component in an enterprise ecosystem, but also understand all the interdependencies.
You can’t hire your way to observability at scale. Understanding dynamic multicloud
environments requires an automated approach that can multiply productivity of your
existing team and shift effort from manual tasks to driving tangible business results.

[Figure: layered topology of applications, services, processes, hosts, and datacenters]

To scale observability and eliminate blind spots across increasingly complex
and expanding environments, teams need automation powered by:
Topology mapping that continuously maps components, cloud services, and ever-changing relationships between potentially billions of interdependencies
Auto-discovery of new components to prevent gaps in coverage in real time
No-code approach to better leverage skilled developers on proactive optimization efforts and business-driving innovation projects

This continuous automation and always-on context give teams confidence in keeping up
with dynamic technology stacks to digitally transform faster, without the ongoing burden
of constant deployments and manual maintenance in an attempt to slowly gain more coverage and understanding.
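
To make the idea of topology mapping more concrete, the sketch below models a small dependency graph and answers the question "what is affected if this component fails?". It is only an illustration under stated assumptions: the entity names, the `DEPENDS_ON` map, and the `register`, `invert`, and `impacted_by` helpers are hypothetical and do not describe how any particular product builds or stores its topology.

```python
from collections import deque

# Hypothetical topology: each key depends on the entities in its list.
# The layers loosely follow applications > services > processes > hosts > datacenters.
DEPENDS_ON = {
    "checkout-app": ["cart-service", "payment-service"],
    "cart-service": ["cart-process"],
    "payment-service": ["payment-process"],
    "cart-process": ["host-a"],
    "payment-process": ["host-b"],
    "host-a": ["datacenter-1"],
    "host-b": ["datacenter-1"],
}


def register(depends_on, entity, targets):
    """Record a newly discovered entity and what it depends on."""
    depends_on[entity] = list(targets)


def invert(depends_on):
    """Build the reverse map: entity -> set of entities that depend on it."""
    dependents = {}
    for entity, targets in depends_on.items():
        dependents.setdefault(entity, set())
        for target in targets:
            dependents.setdefault(target, set()).add(entity)
    return dependents


def impacted_by(entity, depends_on):
    """Return every entity that directly or transitively depends on `entity`."""
    dependents = invert(depends_on)
    seen = set()
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Everything that would be affected if host-a had a problem.
    print(sorted(impacted_by("host-a", DEPENDS_ON)))
```

In this toy model, a component added through `register` is included in the next impact query without anyone editing a dashboard or rule by hand, which is the kind of coverage gap auto-discovery is meant to close.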

Challenge 2:
Monitoring dynamic microservices
and containers in real-time
Short-lived containers and microservices, like those managed in Kubernetes,
provide the required speed and agility to successfully modernize. However,
the dynamic nature of technologies that can spin up and down within seconds
introduces several major issues to scaling observability for these technologies.
This all results in a lack of understanding of internal states of the application,
other interdependent components that microservices rely on, and even the
impact on users.

IT teams are still blind to what’s happening in their dynamic environments
and acting on incomplete data because they:
Don’t understand the relationships between containers and upstream components that can impact them
Can’t connect end-to-end tracing from real users accessing these microservices, to the nodes, the services, and containers they run on
Lack real-time visibility into exactly what’s inside the workloads running within containers

70% of CIOs say monitoring containerized microservices in real-time is almost impossible.
(Source: Dynatrace 2020 Global CIO Report)

How to overcome it
Enterprises need observability to scale across their multicloud, including cloud,
legacy, and hybrid environments, to handle the dynamic nature of Kubernetes
and containers.

To ensure everything’s accounted for, no matter how short lived,
teams need real-time intelligence and automation with:
Automatic discovery of containers at start-up, along with all things running inside each workload
Topology context external to containers, since anomalies often occur outside of Kubernetes nodes, pods, containers, and clusters
Full-stack visibility all the way from the pod, through the cloud provider and application, to the user to understand the end-to-end business impact

With this speed, automation, and context applied to containers and microservices,
IT teams can continuously understand system behavior, and easily isolate
and precisely pinpoint the true origin of anomalies at scale.

Challenge 3:
The volume, velocity, and variety
of data and alerts
Dynamic multi-cloud environments are exponentially increasing the amount
of telemetry data emitted, and overwhelmed teams are still stuck trying
to monitor every data point and make sense of it all.
Already constrained IT resources are stuck reacting to each new problem
after users and business goals are already impacted, trying to observe what’s
happening by manually building, maintaining, and constantly watching
potentially thousands of dashboards.

However, this approach doesn’t scale, and it perpetuates challenges
that cannot be solved using the same manual-intensive philosophy:
Defining and redefining “normal” for anomaly thresholds that constantly change with dynamic environments and seasonality
Monitoring “unknown unknowns”: issues you aren’t aware of, don’t understand, and don’t monitor
Siloed data sending mixed signals that multiply alert storms, intensifying team fatigue and unnecessary war rooms
Multiple teams struggling to pinpoint issues across different tools to guess the root cause, causing more finger-pointing and blaming

All of this forces teams to spend even more of their time “keeping the lights on”
by guessing about the problem, priority, and diagnosis, rather than continuously optimizing
and resolving issues before users are impacted.

How to overcome it
It’s clear that AI is needed to continuously and instantly understand when
and why anomalies occur. But the only way to transform from reactive
to proactive is having an AI that doesn’t need to learn or be trained.
Because dynamic multi-cloud environments can change within seconds, AI needs
to know precise answers and be able to anticipate and auto-remediate issues
before business impact.

[Figure: business impact, root cause, deployment history, problem evolution]

A few critical capabilities of AI
that enable observability at scale:
Auto-adaptive threshold baselining for anomaly detection to prioritize what really matters
Intelligent grouping of related anomalies into a single problem to eliminate redundant work across teams
Always-on causation-based AI with code-level analysis that processes billions of dependencies with complete fault tree analysis to instantly deliver answers
Integrating answers with context from external systems (like ServiceNow and other ITSMs) to broaden workflow automation across multiple teams

The goal of causation-based AI is to provide answers to engineering, infrastructure, operations, and application teams
and empower them to focus on the things that matter. Delivering one precise answer for each issue that
everyone understands can transform teams away from finger-pointing to efficient cross-team
collaboration that drives business outcomes.
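
As a rough illustration of what auto-adaptive threshold baselining can mean in practice, the sketch below derives alert bounds from a sliding window of recent measurements instead of a fixed, hand-tuned threshold. It is a deliberately simple statistical stand-in, not the causation-based AI described above: the `AdaptiveBaseline` class, its parameters, and the mean plus or minus three standard deviations rule are all assumptions chosen for readability.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveBaseline:
    """Sliding-window baseline (hypothetical helper, not a product API)."""

    def __init__(self, window=20, tolerance=3.0, min_samples=5):
        self.values = deque(maxlen=window)  # most recent normal observations
        self.tolerance = tolerance          # allowed spread, in standard deviations
        self.min_samples = min_samples      # observations needed before judging

    def bounds(self):
        """Return (low, high) for the current window, or None while warming up."""
        if len(self.values) < self.min_samples:
            return None
        center = mean(self.values)
        spread = pstdev(self.values)
        return center - self.tolerance * spread, center + self.tolerance * spread

    def observe(self, value):
        """Return True if `value` falls outside the current bounds.

        Only values judged normal are folded into the window, so a single
        spike does not widen the bounds for the observations that follow.
        """
        limits = self.bounds()
        anomalous = limits is not None and not (limits[0] <= value <= limits[1])
        if not anomalous:
            self.values.append(value)
        return anomalous


if __name__ == "__main__":
    baseline = AdaptiveBaseline()
    for response_time_ms in [100, 102, 98, 101, 99, 100, 103, 97, 250, 101]:
        if baseline.observe(response_time_ms):
            print("anomaly:", response_time_ms)
```

Because the window keeps sliding, gradual changes in "normal" are absorbed without anyone redefining a threshold. One trade-off of this simple design is that a sudden, permanent shift in level would keep being flagged; handling that case, along with seasonality, needs more than this sketch provides.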

Challenge 4:
Siloed Infra, Dev, Ops, Apps, and Biz teams

[Figure: separate tools for APM, AIOps, legacy, log, DEM, network, and infrastructure monitoring]

New cloud-native technologies require more solutions to instrument
and monitor, but teams are already drowning in tool sprawl. This tool
sprawl aggravates silos that hurt innovation, decrease software quality,
and reduce collaboration.

Each different tool and point solution amplifies these silos,
with the negative effects spreading across each team that continues to struggle
to identify and resolve the issues and optimizations with the highest impact.
Data: Lack of connective tissue inflicts time-consuming and error-prone joining of disparate data models
Environments: Isolated observability and monitoring across pre-prod and production environments hurt speed and quality of ‘shift-left’ efforts for DevOps and SRE teams
Platforms: Multiple tools for multi or hybrid cloud platforms create observability blind spots for infrastructure and platform operators
Teams: When each team receives alerts and symptoms in a vacuum, problems and blame are passed “over the wall” to others

How to overcome it
To eliminate these silos, a solution can’t simply stitch it all together. It has to
bring together teams through a single common language. Bridging these gaps
with a single source of truth removes confusion and multiplies productivity
across teams.
This cross-team collaboration and more efficient working environment boost
the speed of value-add product features and optimizations that drive better
user experiences.

Several key requirements enable teams to collaborate more efficiently
towards the same technical and business SLIs/SLOs:
Single data model to scale observability across all layers and components across the full tech stack
Shared context that facilitates cross-team collaboration, with flexibility to slice and dice across infrastructure, applications, operations, and business data
Seamlessly tying together the entire software lifecycle from feature development, testing, releases, and ongoing optimizations to innovate faster with higher quality
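
Shared SLIs and SLOs only work as a common language if every team computes them the same way. The sketch below shows one common convention, assumed here rather than taken from the text above: an availability SLI measured as the share of good events, compared against an SLO target to see how much error budget remains. The `error_budget` function and its field names are hypothetical.

```python
def error_budget(slo_target, total_events, bad_events):
    """Summarize an SLI and the remaining error budget for one SLO window.

    slo_target   -- required share of good events, e.g. 0.99 for 99%
    total_events -- all events (requests, sessions, ...) in the window
    bad_events   -- events that failed the SLI definition
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= bad_events <= total_events:
        raise ValueError("bad_events must be between 0 and total_events")

    sli = (total_events - bad_events) / total_events
    allowed_bad = (1 - slo_target) * total_events
    return {
        "sli": sli,                                     # measured share of good events
        "slo_met": sli >= slo_target,                   # is the objective currently met?
        "allowed_bad_events": allowed_bad,              # budget for the whole window
        "budget_remaining": 1 - bad_events / allowed_bad,  # 1.0 = untouched, below 0 = overspent
    }


if __name__ == "__main__":
    # Hypothetical window: 10,000 checkout requests, 25 of them failed.
    print(error_budget(slo_target=0.99, total_events=10_000, bad_events=25))
```

When infrastructure, application, and business teams all read the same budget figure, the question of whether to ship features or stabilize first can be settled from one shared number instead of from each team's own dashboard.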

Challenge 5:
Knowing which efforts drive
positive business impact
Even with complete visibility to back-end components, a lack of front-end
user perspective diminishes much of the tangible value that organizations
aim to achieve with observability efforts.

Without visibility into front-end application performance,
major risks to the user experience remain:
Disconnected front-end and back-end perspectives, hurting understanding of technology’s impact on users and business objectives
Critical blind spots like mobile app crashes, 3rd party services, CDN, and front-end errors still exist
Disparate solutions to attempt observability for mobile and edge-device channels, forcing teams to leave some applications ignored
No consideration of employees working from home, potentially damaging their ability to access the resources they need to deliver frictionless customer experiences

Neglecting the end-user experience of applications obstructs the ability
to prioritize optimizations and issues based on greatest business impact.
When teams only look at technology by itself, IT efforts may not align with business priorities.

How to overcome it
An outside-in user perspective of the application is needed to create
a feedback loop from back-end technology teams to product, digital,
and business teams, ensuring the entire cloud stack is supporting
expected outcomes.

To include user experience in a more intelligent observability approach,
organizations need to connect front-end and back-end perspectives to gain:
Complete insight into technology’s impact on user experience and business KPIs like revenue, conversions, and feature adoption
Observability and monitoring across web, mobile, and IoT to gain a holistic understanding of user experience across channels
All-in-one platform to optimize end-user experience for both customers and employees, no matter where they are in the world

To achieve observability that scales across channels, customers, employees, and all types of applications,
back-end and front-end application performance must be connected. Only then can teams across IT, product,
and business prioritize and align efforts that drive the bottom line.

Conclusion
To achieve observability at scale for dynamic multi-cloud environments
at the speed needed to exceed customer expectations and business goals,
a fundamentally different approach is required.
Continuing to waste effort on manual instrumentation and configuration,
digging through siloed data, and working on the wrong things prevents teams
from making progress, and ultimately from achieving strategic business goals.
Automated and intelligent observability is needed.
Dynatrace helps transform the way you work with:
Intelligent observability — See it all down to code-level, at scale
Continuous automation — Stay ahead of modern, dynamic multi-clouds
Precise Intelligence — Go from guessing to knowing

Our smarter approach to observability
helps teams turn AI into ROI, and drive:
99% fewer IT tickets: From 700 tickets a week to just 7.
20% higher cart value: Order-from-table mobile application drives higher value than order from bar.
75% faster innovation delivered: With 75% MTTR and 4x productivity increase.

Software intelligence for the enterprise cloud
Click the link to take the next step in your digital journey
and see what we can do for you.
If you’re ready to learn more, please visit dynatrace.com/platform for assets, resources, and a free 15-day trial.
About Dynatrace
Dynatrace provides software intelligence to simplify cloud complexity and accelerate digital transformation. With automatic and intelligent observability at scale, our all-in-one platform delivers precise answers about the performance of applications, the underlying
infrastructure and the experience of all users to enable organizations to innovate faster, collaborate more efficiently, and deliver more value with dramatically less effort. That’s why many of the world’s largest enterprises trust Dynatrace® to modernize and
automate cloud operations, release better software faster, and deliver unrivaled digital experiences.
dynatrace.com blog @dynatrace
12.17.20 10603_EBK_cs ©2020 Dynatrace
