RapidMiner: Real Data Science, Fast and Simple
Overview
Accolades
CB Insights, The AI 100 (2017): “100 Startups Using Artificial Intelligence to Transform Industries”
Ventana Research, 2016 Technology Innovation Awards Winner: Predictive Analytics
Insight Without Action Has No Value
Analytics 3.0*: Predictive & Prescriptive
Analytics 2.0: Diagnostic
Analytics 1.0: Descriptive
Proactive, Reactive, Passive
Step Five
The RapidMiner Competitive Advantage
Unified Platform
Prototype – Substantiate – Operationalize: seamless, high performance orchestration
Books
Videos & Webinars
In-Product Tutorials
Demos & Documentation
Systems
Global
Partners
Real data science, fast and simple.
RapidMiner Inc.
10 Milk Street
11th Floor
Boston, MA 02108
rapidminer.com
Boston, Budapest, Dortmund, London
@rapidminer
Additional Content
RapidMiner Data Science Impact
Data Access: Connect to any data source, any format, at any scale
Data Exploration: Quickly discover patterns or data quality issues
Data Prep: Speed & optimize ALL data exploration, blending & cleansing tasks
Modeling: Efficiently build and deliver better models faster
Validation: Confidently & accurately estimate model performance
Collaboration, Computation, Scheduling, Integration, Management
RapidMiner Radoop
Simplified, Intelligent Big Data Science & Machine Learning
Simplified Analytics: Reduces Hadoop complexity
Lightning Fast: Covers complete analytics lifecycle
Broad Data Access: Eliminate connectivity struggles
Integrated Security: Ensure security compliance
Optimized for Hadoop: Leverage Hadoop distributed power
Scalable Processing: Process in-Hadoop and in-memory
Spark Execution: Execute RapidMiner sub-processes in parallel
The RapidMiner Platform
[Architecture diagram. Labeled components: Web Services; Process Execution Engine; Process Scheduler; User/Group Access Rights management; Server (Java SE/EE Application); RapidMiner Radoop (Compile + Execute in Hadoop); Databases / Data warehouses; Application (BI, ERP, CRM…) / Portal]
Integrate using Web Services, JSON, SQL, …
RapidMiner Studio
All-In-One Data Science Workflow Designer
Lightning Fast
Visual interface for rapidly building complete analytic
workflows
Powerful
Rich library of algorithms and functions to build the strongest
possible model for any use case
RapidMiner Server
Operationalization & Collaboration Management
RapidMiner Radoop
Extends RapidMiner’s visual workflow to Hadoop
In-Hadoop Execution
Pushes analytic instructions into Hadoop
for computation
Secure
Complies with Hadoop security standards
Sample Use Cases
Customers by industry and region: Payments – Worldwide (2), Telco – Austria (2), Telco – Germany (2), Telco – Europe, Telco – Switzerland, Telco – Hungary, Marketing – Germany, Market Research – Worldwide, OEM – Europe
• Customer feedback & voice of the customer, churn prevention, text mining, automated text categorization, and sentiment analysis applied to customer support and satisfaction to prevent customer churn
• Automated Online Market Research, Text Analytics, Sentiment Analysis, Customer Insight
• Automated Customer Feedback Text Analysis for Automated E-Mail Categorization & Routing
• Optimize customer support by automatically categorizing unstructured data by content, to prioritize and to reduce response time and cost, thereby increasing customer satisfaction
• Server & Equipment Load Forecasting, Predictive Maintenance, Predicting & Preventing Server & Component Failures
• CRM applications including optimization of direct marketing campaigns, automated generation of product recommendations for cross-selling and up-selling, customer churn prevention, and fraud detection
• Customer Relationship Analytics, Churn Prediction & Prevention, Direct Marketing Campaign Optimization, Scheduling & Automated Execution of ETL Tasks
• Automated Online Market Research, Text & Sentiment Analysis, Customer Insight, Competitive Intelligence
• Prediction of sales volumes; CRM optimization; social media monitoring and sentiment analysis
• Fraud Detection & Prevention
• Fraud Detection & Prevention Solutions for Telecoms
• Sentiment Analysis of online text sources, including social media and other user generated content for customer care triage
Sample Customer Use Cases
• Fraud detection in retail network: historical data on service usage, transaction history, customer profiles, usage logs, and known cases of fraudulent behavior
• Prediction of sales volumes; CRM optimization; social media monitoring and sentiment analysis
• Automated Customer Feedback Text Analysis for Automated E-Mail Categorization & Routing
• CRM applications including optimization of direct marketing campaigns, automated generation of product recommendations for cross-selling and up-selling, customer churn prevention, and fraud detection
Sample Customer Use Cases
Fraud Detection
Fraud detection in retail network: historical data on service usage, transaction history, customer profiles, usage logs, and known cases of fraudulent behavior
Maximizing Customer Lifetime Value
CRM applications including optimization of direct marketing campaigns, automated generation of product recommendations for cross-selling and up-selling, customer churn prevention, and fraud detection
Safeguarding Electronic Payments
Anticipating the risk of fraud
Russia’s largest electronic payment service
The Challenge
• Protecting against fraud and anticipation of risk 7x24
• Large and diverse set of partners (merchants) – over 70,0000
• How to classify and check merchant ecommerce sites for payment system compliance?
RapidMiner Solution
• Analyze, classify and check merchants’ ecommerce sites for compliance
• Utilize text mining with NLP to auto-categorize with high sentiment accuracy
• Mash up the widest data sets: historical data on service usage, transaction history, customer profiles, usage logs, and known cases of fraudulent behavior
• Detect anomalies, misuse and fraud through operationalized classification model
Outcome
• Only 8-10% of merchant sites now screened manually at 80% confidence threshold
• Accurate automated analysis of high-risk sites: 92% correctly classified
• Elimination of false positives - no normal sites classified as high risk
• Time and cost to resolve fraud case radically reduced
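The 80% confidence threshold in the outcome above suggests a simple routing rule: predictions the model is confident about are handled automatically, and the rest go to manual screening. The Python sketch below illustrates that idea only. It assumes a classifier that returns a label plus a confidence score; the `Prediction` type, the `route_by_confidence` and `manual_share` functions, and the sample sites are hypothetical and are not part of RapidMiner or of the customer's deployment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """Hypothetical classifier output for one merchant site."""
    site: str
    label: str          # e.g. "high_risk" or "normal"
    confidence: float   # model confidence in `label`, from 0.0 to 1.0


def route_by_confidence(
    predictions: List[Prediction], threshold: float = 0.80
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (automated, manual_review) by confidence."""
    automated: List[Prediction] = []
    manual: List[Prediction] = []
    for p in predictions:
        # Confident predictions are accepted as-is; the rest need a human.
        if p.confidence >= threshold:
            automated.append(p)
        else:
            manual.append(p)
    return automated, manual


def manual_share(predictions: List[Prediction], threshold: float = 0.80) -> float:
    """Fraction of sites that still require manual screening."""
    if not predictions:
        return 0.0
    _, manual = route_by_confidence(predictions, threshold)
    return len(manual) / len(predictions)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        Prediction("shop-a.example", "normal", 0.97),
        Prediction("shop-b.example", "high_risk", 0.91),
        Prediction("shop-c.example", "normal", 0.62),
        Prediction("shop-d.example", "normal", 0.88),
    ]
    automated, manual = route_by_confidence(sample)
    print("automated:", [p.site for p in automated])
    print("manual review:", [p.site for p in manual])
    print("manual share:", manual_share(sample))
```

Raising the threshold sends more sites to manual review and lowers the risk of acting on a wrong automated decision; lowering it does the opposite.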
Repeat Business through Marketing Efficacy
Identify upsell offers through deep customer analytics
Large North American restaurant delivery chain
The Challenge
• Industry with tight margins & intense competition
• Broad array of online & mobile channels for customers to place orders
• Goal to improve marketing offers and create more repeat business
RapidMiner Solution
• Capture a vast array of customer ordering data from multiple online & mobile phone channels
• Use RapidMiner to join & enrich data with 3rd-party demographics & competitive data
• Use data science to assess performance and growth drivers at individual stores & franchise groups
• Results used to tailor coupons & upsell offers to customers
Outcome
• Greater flow of repeat customers, driving growth at individual stores and franchise groups
• Far outpaced the industry: Posted best Q2 & Q3 domestic same-store sales growth of the 25 largest
restaurant chains in the U.S.
• Next steps: RapidMiner Radoop
Customer Satisfaction through Quality of Service
Customer experience begins with network quality
Leading European Telecoms Provider
The Challenge
• Backend infrastructure footprint & costs increasing yearly
• Customer satisfaction driven by service quality in areas such as video streaming latency
• Network operation teams must accelerate root cause analysis, reduce time to repair
• Data visualization with big data alone cannot provide operationalized insight needed
RapidMiner Solution
• Secure large scale Hortonworks Hadoop Big Data Hub architecture to leverage data lakes
• Correlation of log events with historical log data to preempt service quality degradation
• Rapidly predict demand through machine learning as consumer usage patterns change
• Utilize text mining to optimize help desk ticket triage and processing
Outcome
• Reduce infrastructure requirements (-10%)
• Improved customer retention (2%+)
• IT Operations costs reduced (-30%)
Drive Data Science Agility & Cut Costs
Faster development & deployment of customer analytics models
Leading North American Financial Services Institution
The Challenge
• Existing data science teams looking to replace SAS
– Strong dislike of unwieldy SAS platform with the coding & complexity of its multiple applications & user interfaces
– Cost of SAS too high
RapidMiner Solution
• Pull together customer data from across a number of internal databases & third-party sources
• Easily incorporate a large library of legacy predictive models written in R & Python
• Small team of 4 data scientists using collaboration features in RapidMiner Server to share data
prep and machine learning processes
Outcome
• Improved upsell opportunities and customer retention
• Speeds the process of data prep, rapid prototyping & validation of models over SAS methods
and coding-only methods
• Expansion into Risk department where data science team doesn’t code in SAS, R or Python
Gartner & Forrester – RapidMiner a Clear Leader
2017
Gartner, Magic Quadrant for Data Science Platforms
“…a Leader, owing to its market presence, the volume of client inquiries that Gartner receives about it, its user community, and its well-rounded product that addresses most data science use cases well.”
“Reference customers praised many facets of the platform — its large selection of algorithms, flexible modeling capabilities, data source integration and consequent data preparation. The platform's strength lies not just in particular areas, but also in its all-around consistency.”
Forrester, PAML Wave
“RapidMiner wraps breadth and depth in a beautiful package. RapidMiner invested heavily to revamp visual interface to make it the most concise and fluid that we have seen during this evaluation. Add to that, RapidMiner’s comprehensive set of operators that encapsulate a wide range of data prep, analytical, and modeling functionality to increase productivity of data scientists.”
Peer Insights – True Expert Validation
Top Predictive Analytics Products by Enterprise reviewers
Reviews for Advanced Analytics Platforms