ITSI Seattle - Service Intelligence Hands-On Workshop
2. If you have not done so already, sign up for the FREE Splunk ITSI Online Sandbox:
• http://splunk.com/itsi
• Select "Free Online Sandbox"
Copyright © 2016 Splunk Inc.
Building Business
Service Intelligence with
Splunk IT Service Intelligence
Thursday October 27, 2016
Defining Service Intelligence
Enabling a business-aware IT
Measuring and reporting on indicators that matter
Challenges
• Too many disparate components
• Difficult to define Service Model
• Labor intensive
• Most implementations fail
• Very important source is
[Diagram labels: Business Layer; Service Layer; Application Layer (Byte Code Instrumentation); Infrastructure Layer (Storage, Network); Service Model definition (HP Run-Time Service Model, CA Service Operations Insight, IBM NetCool/Omnibus); Adaptive Thresholding; 74%; -36%]
Data-Defined & Driven Service Insights
Splunk> is the missing link
• Data Fidelity
• Single Repository for ALL data
• Easier to Manage Services
• Reduced Integrations
• Reduced Point Solutions
• Collaborative Approach
• Quick time to value
MACHINE DATA (Data Fabric Platform)
Synthetic APM: Availability, Capacity, User Experience
Application Layer (Byte Code Instrumentation): Usage, Experience, Performance, Quality
Infrastructure Layer (Server): Performance, Usage, Dependency
Storage: Utilization, Capacity, Performance
Network: Packet, Payload, Traffic, Utilization, Perf
[Diagram labels: Service Intelligence; Apps, Services, Systems; Adaptive Thresholding; 74%; -36%]
Splunk Approach to Machine Data
Traditional Splunk
Machine learning-powered analytics for real-time service
insights, simplified operations and root-cause isolation
IT Service Intelligence Value Stack
§ Adaptive Threshold
§ Behavior Anomaly
ML § Correlates Data into Knowledge
§ Accelerators
§ Trend aggregation
§ Multi KPI Alerts
Service Health
Buttercup Games Example
What is a Service?
Service
Requests
Responses
What is a Service?
Technical Services
DNS: Requests, Responses
Auth: Requests, Responses
Web: Requests, Responses
What is a Service?
Technical Services
DNS: Requests, Responses
Auth: Requests, Responses
Web: Requests, Responses
Business Services
Order Entry: Volume, Revenue
Customer Care: Requests, SLA Compliance
What is a Service?
Customer Transactions
Business Function
API Services
API/Middleware
Web Services
Mobile
RDBMSs
DNS
Storage Tier
Packet Network
What is a KPI?
DNS
Customer Transactions
Business Function
Key Performance Indicators (KPIs)
Service Health Scores
A Health Score is a score from 0 to 100 (0 being critical and 100 being normal)
that measures the health of a Service. It is calculated once every minute from
the importance and status (e.g. green, orange, red) of all of the Service's KPIs.
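As a rough illustration of the description above, the sketch below computes a 0-100 score as an importance-weighted average of per-KPI status scores. The status-to-score mapping, the weighting formula, and the importance values are assumptions made for illustration only; this is not ITSI's actual algorithm.

```python
# Assumed mapping from KPI status to a 0-100 score (illustrative only).
STATUS_SCORE = {"green": 100, "orange": 50, "red": 0}


def health_score(kpis):
    """Importance-weighted average of KPI status scores.

    kpis: list of (name, importance, status) tuples, where importance is a
    positive weight and status is a key of STATUS_SCORE.
    """
    total_weight = sum(importance for _, importance, _ in kpis)
    if total_weight == 0:
        raise ValueError("at least one KPI needs a non-zero importance")
    weighted = sum(importance * STATUS_SCORE[status]
                   for _, importance, status in kpis)
    return weighted / total_weight


# Hypothetical importance values for three KPIs named in this deck.
example = [
    ("Response Time", 8, "red"),
    ("Online Orders", 5, "green"),
    ("Online Revenue", 7, "orange"),
]
score = health_score(example)  # (8*0 + 5*100 + 7*50) / 20 = 42.5
```

In this sketch a single high-importance KPI in red pulls the score down more than a low-importance one would, which is the behavior the slide describes.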
Splunk IT Service Intelligence
Let’s take a closer look at Service Intelligence with Splunk
Service Intelligence
Design Practices
Best Practices for Service Intelligence
The Business Problem for Buttercup Games
New online store has issues that impact customer experience and revenue
Supply Chain, ERP Systems, Online Store, Failed Interactions, Business Impact
? ? ?
Design Before Configuring
GOAL 1
Continuous improvement through visibility to key indicators of supply chain
performance
GOAL 2
Increase customer satisfaction and reduce cost through fewer failures and
restoration activities
Application Layer (Online Store, EDI)
• Online Orders
• Online Revenue
• Response Time
Business Layer
Mail Transport, Order Processing, E-Commerce, Financials
Application Layer
Middleware, Application Server, Database, Custom Apps
Infrastructure Layer
Power / Cooling / Facilities, Server, Networking, Storage
Service Intelligence Design in ITSI
1. High-value business services
• Buttercup Games Online Store and Supply Chain
3. Supporting services
• Web, Middleware, Database
Service Decomposition – Buttercup Games
Service Layer Supply Chain
Business Layer
Application Layer
Online Store EDI
Infrastructure Layer
Putting It All Together
Service Layer (Supply Chain)
• Service Health
• Incidents/Changes
• Customer Satisfaction
Application Layer (Online Store, EDI)
• Online Orders
• Online Revenue
• Response Time
Application Layer (Online Store, EDI)
• Application Logs
• Webserver Logs
• DB Perf Counters
Infrastructure Layer
• Wire data
• Perf Counters
• Access Logs
• Network Logs
Let’s Play!
CLICK
“Glass Tables”
Service Visibility in ITSI
Goal 1: Supply Chain Visibility
Goal 2: Online Store Process Flow
New Requirements!
● Create a new KPI for the DB Service:
● Network Utilization
Configuration of DB Service
Let’s Talk Entities
● Select DB Service
A KPI in 5 minutes? Absolutely!
KPIs Continued….
● Select Yes for Split by & Filter options
● Select host for Entity Lookup & Alias options
● Click Next
Almost There…
Select
● KPI Search Schedule: Every Minute
● Entity Calculation: Average
● Service/Agg Calculation: Average
● Calculation Window: Last Minute
● Click Next
● Unit: Bps
● Click Next
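The settings above describe a two-stage calculation: values within the calculation window are first averaged per entity (here, per host), and the per-entity results are then averaged into one service-level value. The sketch below mirrors that arithmetic in Python, assuming the aggregate is taken over the per-entity results. In ITSI this work is done by a scheduled search; the host names and sample values here are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def kpi_values(samples):
    """Compute per-entity and service-level averages for one window.

    samples: list of (host, value) pairs observed in the last minute.
    Returns (per_entity, aggregate), where per_entity maps host -> average.
    """
    by_host = defaultdict(list)
    for host, value in samples:
        by_host[host].append(value)
    # Entity Calculation: Average
    per_entity = {host: mean(values) for host, values in by_host.items()}
    # Service/Agg Calculation: Average (over the per-entity results)
    aggregate = mean(list(per_entity.values()))
    return per_entity, aggregate


# Hypothetical network utilization samples in Bps.
samples = [("db-01", 100.0), ("db-01", 300.0), ("db-02", 400.0)]
per_entity, aggregate = kpi_values(samples)
```

Note that averaging the per-entity averages gives each host equal weight, so a host that reports more samples does not dominate the service-level value.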
Final Steps …
Set your thresholds:
● Aggregate (All)
● Per Entity
● Click “Add Threshold” TWICE
● Color the resulting bands, Neapolitan ice cream style:
Yellow, Green, Yellow
● Drag the sliders until the current data graph sits entirely inside the
Green (normal) band
● Click Finish
● Other options are also available,
including adaptive thresholds and
anomaly detection
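The Yellow, Green, Yellow layout amounts to two static thresholds with a normal band between them. A minimal sketch of that classification follows; the threshold values are hypothetical, and treating the boundaries as inclusive is an assumption.

```python
def classify(value, lower, upper):
    """Return the band color for a KPI value given two static thresholds."""
    if lower <= value <= upper:
        return "green"   # normal band between the two thresholds
    return "yellow"      # below the lower or above the upper threshold


def all_normal(values, lower, upper):
    """True when every observed value falls inside the green band."""
    return all(classify(v, lower, upper) == "green" for v in values)


# Hypothetical recent values (Bps) and slider positions.
recent = [1200, 1500, 1800]
fits = all_normal(recent, lower=1000, upper=2000)
```

Dragging the sliders in the exercise corresponds to adjusting `lower` and `upper` until `all_normal` would hold for the data currently on screen.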
Adaptive Thresholds
What if your KPI data looks like this?
Adaptive Thresholds
Static thresholds will not work…
Adaptive Thresholds
Adaptive Thresholding works beautifully with cyclical (and other dynamic) data
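To make the idea concrete, the sketch below derives a separate normal band for each hour of the day from historical data, using mean plus or minus k standard deviations. This is a simple stand-in chosen for illustration, not the algorithm ITSI uses; the function names, the value of k, and the sample data are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_bands(history, k=2.0):
    """Build a normal band per hour of day from historical KPI values.

    history: list of (hour_of_day, value) pairs.
    Returns {hour: (low, high)} with low/high = mean -/+ k * std deviation.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    bands = {}
    for hour, values in by_hour.items():
        center, spread = mean(values), pstdev(values)
        bands[hour] = (center - k * spread, center + k * spread)
    return bands


def is_normal(bands, hour, value):
    """Check a value against the band learned for that hour of day."""
    low, high = bands[hour]
    return low <= value <= high


# Hypothetical cyclical data: quiet at 03:00, busy at 15:00.
history = [(3, 10), (3, 12), (3, 14), (15, 100), (15, 110), (15, 120)]
bands = hourly_bands(history)
```

With these bands a value of 100 is normal at 15:00 but abnormal at 03:00, which a single static threshold cannot express.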
Anomaly Detection
● Machine Learning
● Works well for data with patterns
● Requires some “training” (trial & error)
to zero in on best sensitivity
● More sophisticated capabilities coming!
(multivariate, more algorithms, etc)
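The role of the sensitivity setting can be illustrated with a much simpler technique than the one ITSI ships: a trailing-window z-score. The sketch below is only a stand-in for illustration and makes no claim about ITSI's machine learning; the window size, sensitivity values, and series are hypothetical.

```python
from statistics import mean, pstdev


def anomalies(series, window=5, sensitivity=3.0):
    """Return indexes whose value deviates from the trailing-window mean
    by more than `sensitivity` standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        spread = pstdev(past)
        if spread == 0:
            continue  # flat history gives no scale to compare against
        if abs(series[i] - mean(past)) / spread > sensitivity:
            flagged.append(i)
    return flagged


# Hypothetical KPI series with one spike at index 7.
series = [10, 11, 10, 12, 11, 10, 11, 40, 11, 10]
spikes = anomalies(series)
```

Raising `sensitivity` flags fewer points and lowering it flags more, which is the trial-and-error tuning the slide refers to.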
Let’s Fix that Glass Table
Clone the Glass Table
Return to Saved Glass Tables page
(click on Glass Tables in the upper menu bar)
Edit & Have Fun!
Click on Edit in the upper right corner of your Glass Table
More Fun with the Glass Table Editor…
Use the Configurations panel on the right to edit a
selected widget
• Can change the visualization type, drilldown
behavior, and other settings
Finishing up …
• Add a ServiceHealthScore widget for Online
Store under Buttercup
• Choose a Viz Type with a sparkline graph, then
resize to make it look pretty
• Modify the Custom Drilldown action to go to
the saved glass table,
Buttercup Games Online Store
• Bonus Points: Make the label bigger, more
readable
• Click Save
• View when done
Let’s Play!
A Troubleshooting Exercise
Let’s use ITSI to troubleshoot an outage
● Start at your Glass Table, “<UserName> Buttercup Business Process”
● Customer Care reports that unhappy customers are complaining of failures
and long delays when trying to purchase
● The calls began coming in at around the top of the last hour.
● In the upper right corner of the Glass Table, change the time picker from Now
to XX:00:00.0, where XX is the previous hour. For example, if it is currently
14:05, set the time picker to 13:00:00.0, then Apply
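If you want to double-check the value to enter, the "top of the previous hour" rule can be written as a small helper. The function name is hypothetical and the output format simply mimics the XX:00:00.0 pattern shown above.

```python
from datetime import datetime, timedelta


def top_of_previous_hour(now):
    """Truncate to the current hour, then step back one hour."""
    return now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)


# The example from the exercise: it is currently 14:05.
start = top_of_previous_hour(datetime(2016, 10, 27, 14, 5))
label = start.strftime("%H:%M:%S.0")  # "13:00:00.0"
```

Subtracting a `timedelta` rather than decrementing the hour field keeps the result correct just after midnight, when the previous hour falls on the previous day.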
A Troubleshooting Exercise, cont’d
● The Online Store seems to be degraded, just as Customer Care reported.
Click on the widget under Buttercup to drill down further
A Troubleshooting Exercise, cont’d.
● The Online Store Glass Table shows a much more detailed view, including the
impacted customer-facing KPIs at the far left (Revenue, etc.)
● Based on this view of all the relevant
services, where do you think the root cause
lies?
● Which service should we troubleshoot first?
● Click on the Health widget for that service to
drill down to a Deep Dive
Deep Dive
● Deep Dive shows multiple KPIs and Health Scores in parallel “swim
lanes”.
● The Health Score for this Service is the top swim lane. Can you see
when it begins to degrade from 100%?
● Mousing over this point in time, can you spot the KPI with the
leading fault indication, i.e., what failed first?
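What you are doing by eye in the swim lanes can be sketched as a search for the KPI whose status left normal earliest. The data layout, KPI names, and timestamps below are hypothetical; the Deep Dive itself is a visual tool and does not expose this function.

```python
def leading_indicator(kpi_series, normal="green"):
    """Find the KPI that degraded first.

    kpi_series: {kpi_name: [(timestamp, status), ...]} with each list
    sorted by time. Returns (kpi_name, timestamp) of the earliest
    non-normal status, or None if every KPI stayed normal.
    """
    first_faults = []
    for name, points in kpi_series.items():
        for timestamp, status in points:
            if status != normal:
                first_faults.append((timestamp, name))
                break  # only the first fault per KPI matters
    if not first_faults:
        return None
    timestamp, name = min(first_faults)
    return name, timestamp


# Hypothetical swim lanes; timestamps are minutes past the hour.
lanes = {
    "Database Queries": [(0, "green"), (2, "green"), (4, "red")],
    "CPU Utilization": [(0, "green"), (2, "orange"), (4, "red")],
    "Storage Free Space": [(0, "green"), (2, "green"), (4, "green")],
}
first = leading_indicator(lanes)
```

The earliest fault is a starting point for investigation rather than proof of root cause, since a KPI can degrade first simply because it is sampled more often.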
Multi-KPI Alerts and Notable Events
● Click on Notable Events Review
● Multiple KPIs and Health Scores can
be combined in sophisticated ways
to create Multi-KPI alerts
● When a Multi-KPI alert fires, one
of the outcomes is the creation of
a Notable Event
● Notable Events allow NOC
personnel and others to triage and
coordinate event management
efforts
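The flow described above, several KPI conditions combined into one alert that produces a Notable Event, can be sketched as follows. The rule format, the matching logic, and the shape of the returned event are assumptions for illustration and do not reflect ITSI's configuration or event schema.

```python
def multi_kpi_alert(statuses, rules, min_matches):
    """Combine several KPI conditions into a single alert.

    statuses: {kpi_name: current status}
    rules: {kpi_name: set of statuses that count as a match}
    Returns a notable-event dict if at least min_matches rules match,
    otherwise None.
    """
    matched = sorted(kpi for kpi, bad in rules.items()
                     if statuses.get(kpi) in bad)
    if len(matched) >= min_matches:
        return {"title": "Multi-KPI alert", "kpis": matched}
    return None


# Hypothetical current statuses and alert rules.
statuses = {"Response Time": "red", "Online Orders": "orange",
            "Online Revenue": "green"}
rules = {"Response Time": {"red"},
         "Online Orders": {"orange", "red"},
         "Online Revenue": {"red"}}
event = multi_kpi_alert(statuses, rules, min_matches=2)
```

Requiring several conditions to match before an event is created is one way a combined alert produces fewer, more actionable events than alerting on each KPI separately.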
Service Analyzer
● Click on Service Analyzer > Default Service Analyzer
Let’s Play!
Advanced Exercises
Summary
● High-value services can be decomposed and modeled in ITSI, using machine data
from the relevant systems
● Services and KPIs can be created in minutes, with sophisticated thresholding
techniques to distinguish “normal” from “not normal”
● Glass Tables allow service health and KPI metrics to be displayed in a way that
makes sense to specific groups, such as Executive Leadership, Business Service
Owners, the NOC, DevOps & Others
● Deep Dives allow KPIs to be compared side-by-side across any time range,
accelerating root cause analysis and significantly reducing MTTR
● Multi-KPI Alerts and Notable Events reduce alert noise, producing actionable
events and a means to manage them
● … and it’s fast+fun to build!
What our ITSI
Customers are
doing
Splunk IT Service Intelligence
Machine Learning-Powered, Analytics-Driven IT Operations
Prioritize incidents with context
Deliver business & service context to prioritize incident investigation & action
Redefine the role of IT
Support decisions & communicate results with powerful service-level insights
All the scores are time-based KPIs or nested sub-processes that are searching in real time for some relevant condition of interest.
All the scores are color coded to convey if they are “normal” or “abnormal” based on your criteria OR Splunk’s Packaged Machine Learning, enabled with an ON/OFF switch.
This example is an abbreviated ‘Book to Bill’ business process, sometimes called ‘Order to Cash’.
Online Msg
Inbound Calls
CIO Scorecard
Enterprise Service Status, Major Incidents, Major Changes
[Scorecard figure: six service tiles, each showing Service Health, Incidents, and Changes plus two business metrics (Volume and Revenue; Volume and Ontime Delivery; or Throughput and Container Util)]
The Vision - Business Operations Center
SOC
NOC BOC
• Splunk ITSI has the fundamentals to deliver on the promise of real-time business visualizations
• Modeled after your Security, Network, and IT Operations Centers
• Monitoring and diagnosis of important e-commerce and brick-and-mortar operations
• Enhanced with end-to-end process insight, alerts, machine learning and real-time response
Sign Up Now – We’re here to help!
Harness the creativity and domain knowledge of your
organization to unlock the value of data and solve an
important Business Service problem through a joint service
intelligence workshop with key stakeholders
● ITSI Documentation:
http://docs.splunk.com/Documentation/ITSI
Thank You
Please fill out the Survey
https://www.surveymonkey.com/r/NBXBYCG