Making Big Data Simple With Databricks
We are Databricks, the company behind Spark
Data Value
PROBLEM
Your difficult journey to finding value in data
• Building a cluster
• Import and explore data with disparate tools
• Build and deploy data applications
ETL
Data Warehousing
Dashboards & Reports
SOLUTION
Data Value
ETL
Data Warehousing
Dashboards & Reports
Real-time query engine
Customizable dashboards & 3rd party apps
Cluster Manager: Managed Spark clusters
• Easily provision clusters
• Harness the power of Spark
• Import data seamlessly
Production pipeline scheduler
• Schedule production workflows
• Implement complete pipelines
• Monitor progress and results
Interactive workspace with notebooks
• Explore data and develop code in Java, Python, Scala, or SQL
• Collaborate with the entire team
• Point and click visualization
• Publish customized dashboards
3rd party applications
• Connect powerful BI tools
• Leverage a growing ecosystem of applications
Databricks benefits
Customer testimonials
“Without Databricks and the real-time insights from Spark, we wouldn't be
able to maintain our database at the pace needed for our customers”
Darian Shirazi, CEO, Radius Intelligence
“We condensed the 6 months we had planned for the initial prototype to
production process to just about a couple of weeks with Databricks.”
Rob Ferguson, Director of Engineering, Automatic Labs
Radius Intelligence
Gathering customer insights for marketers
• 25 million businesses
• Over 100 billion points of data
WHAT’S NEW?
What’s new with Databricks
• Databricks is now generally available (announced on June 15th, 2015)
• Upcoming features during the second half of 2015:
    • R-language notebooks: Analyze large-scale data sets using R in the Databricks environment.
    • Access control and private notebooks: Manage permissions to view and execute code at an individual level.
    • Version control: Track changes to source code in the Databricks platform.
    • Spark Streaming support: Enable fault-tolerant real-time processing.
What’s new with Spark
• The general availability of Spark 1.4 was announced on June 10th, 2015
• Spark 1.4 is the largest Spark release to date: more than 220 contributors and 1,200 commits.
• Key new features introduced in Spark 1.4:
    • New R language API (SparkR)
    • Expansion of Spark’s DataFrame APIs: window functions, statistical and mathematical functions, support for missing data.
    • API to build complete machine learning pipelines.
    • UI visualizations for debugging and monitoring programs: interactive event timeline for jobs, DAG visualization, visual monitoring for Spark Streaming.
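To make two of the DataFrame additions listed above concrete, here is a minimal sketch in plain Python of what a window function and missing-data handling compute. It does not use the Spark API: the sample rows and the helper names `fill_missing` and `running_total` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (region, day, revenue); None marks a missing value.
rows = [
    ("east", 1, 100.0),
    ("east", 2, None),
    ("east", 3, 300.0),
    ("west", 1, 50.0),
    ("west", 2, 70.0),
]

def fill_missing(rows, default):
    """Missing-data handling: replace a missing revenue with a default value."""
    return [(region, day, default if value is None else value)
            for region, day, value in rows]

def running_total(rows):
    """Window-style aggregate: cumulative revenue per region, ordered by day."""
    partitions = defaultdict(list)
    for region, day, value in rows:
        partitions[region].append((day, value))
    result = []
    for region in sorted(partitions):
        total = 0.0
        for day, value in sorted(partitions[region]):
            total += value
            # Every input row is kept and gains one extra column.
            result.append((region, day, value, total))
    return result

filled = fill_missing(rows, 0.0)
totals = running_total(filled)
for row in totals:
    print(row)
```

Unlike a group-by, which collapses each region to a single row, the window-style aggregate keeps every input row and adds a value computed over that row's partition.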
Data science made easy with Apache Spark
From ingest to production
• Instant
• Flexible
• Faster deployment to production of data pipelines
Contact sales@databricks.com