Predictive Analytics
Decoding Buzzwords
Big Data, Predictive Analytics, Business Intelligence
Brief about me
Bashar Tahayna
PhD in Computer Vision and Machine Learning
Part Timer @BirZeit, AAUJ
Founder GAA Software AG
Co-Founder GPAL IBM / Lenovo Partner
Consultant: Germany, Malaysia, UAE, KSA, and Palestine
Training ~~ Love
Agenda
Big Data
Big Data Analytics
Predictive Analytics & Big Data
Predictive Analytics Process
Predictive Analytics in Action
Predictive Analytics in Banking and Finance
3
VS
Web logs,
Emails,
Sensors,
Mobile data
Social media
and many other sources.
Variety
Volume
Velocity
Veracity
Value
Volume refers to the vast amounts of data generated every second. We are not talking terabytes but zettabytes or brontobytes. If we take all the data generated in the world between the beginning of time and 2008, the same amount of data will soon be generated every minute. This makes most data sets too large to store and analyse using traditional database technology. New big data tools use distributed systems so that we can store and analyse data across databases located anywhere in the world.
Velocity refers to the speed at which new data is generated and the speed at which data moves around. Just think of social media messages going viral in seconds. Technology now allows us to analyse data while it is being generated (sometimes referred to as in-memory analytics), without ever putting it into databases.
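As a rough illustration of analysing data while it is being generated, the sketch below keeps a running average over a stream without storing the individual values. The `sensor_readings` generator and its numbers are hypothetical stand-ins for a live feed, not something taken from the slides.

```python
def running_mean(stream):
    """Yield the mean of all values seen so far, keeping only a count and a total."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


def sensor_readings():
    # Hypothetical readings standing in for data arriving in real time.
    yield from [10.0, 20.0, 30.0, 40.0]


# Each reading is folded into the average as it arrives; nothing is written to a database.
means = list(running_mean(sensor_readings()))
print(means)
```

The point of the design is that memory use stays constant however long the stream runs, because only the count and the total are kept.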
Variety refers to the different types of data we can now use. In the past we focused only on structured data that fitted neatly into tables or relational databases, such as financial data. In fact, 80% of the world's data is unstructured (text, images, video, voice, etc.). With big data technology we can now analyse and bring together data of different types, such as messages, social media conversations, photos, sensor data, and video or voice recordings.
Veracity refers to the messiness or trustworthiness of the data. With many forms of big data, quality and accuracy are less controllable (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech, as well as the reliability and accuracy of the content), but technology now allows us to work with this type of data.
Put simply
More data leads to more accurate analyses.
More accurate analyses lead to more confident decision making.
More confident decisions lead to greater operational efficiencies, cost reductions, perfect scoring, and reduced risk.
Scoring
[Figure: scores 34, 52, 18, 23, 41, 11]
"Predictive Analytics helps connect data to effective action by drawing reliable conclusions about current conditions and future events."
- Gareth Herschel, Research Director, Gartner Group
Why are we on/off track?
What should we do next?
Data: ERP, CRM, SCM, 3rd party, black books
Past, Present, Future
What happened? What is happening? Why did it happen?
Predict
Act
Text Mining
Data Collection
Data Mining
Statistics
Deployment Technologies
Platform
Pre-built Content
Attract
Up-sell
Retain
Capture
Data Triangle Methodology
Sources
Traditional: relational databases, flat files, Excel spreadsheets, etc.
Big Data: Hadoop, NoSQL
Types
Structured, Unstructured
Forms
Data at rest, Data in motion
Predict
Data Mining
Text Mining
Statistical Analysis
Intelligence Degree
Week (Quarter 1)    Revenue (Thousands)
1                   $15
2                   $21
3                   $24
4                   $25
5                   $28
6                   $32
7                   $40
8                   -
9                   -
10                  -
11                  -
12                  -
[Chart: revenue (Rev. K$, $0 to $45) against week (0 to 12)]
Week (Quarter 1)    Revenue (Thousands)
1                   $15
2                   $21
3                   $24
4                   $25
5                   $28
6                   $32
7                   $40
8                   $41
9                   $44
10                  $48
11                  $52
12                  $54
Linear Regression
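A minimal sketch of how a straight line could be fitted to the Quarter 1 revenue figures for weeks 1 to 7 and extended to weeks 8 to 12, using ordinary least squares. The function names `fit_line` and `predict` are illustrative, and this is not necessarily how the figures on the slides were produced.

```python
weeks = [1, 2, 3, 4, 5, 6, 7]
revenue = [15, 21, 24, 25, 28, 32, 40]  # thousands of dollars, weeks 1 to 7


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(weeks, revenue)


def predict(week):
    """Revenue (thousands) that the fitted line gives for a week number."""
    return intercept + slope * week


# Project the remaining weeks of the quarter.
for week in range(8, 13):
    print(f"Week {week}: about ${predict(week):.1f}K")
```

On these seven points the fitted line has a slope of about 3.6 and an intercept of 12, so week 8 comes out near $40.9K.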
Revenue = 15 + A + 2B - C

Week (Quarter 1)    Revenue (Thousands)
1                   $15
2                   $21
3                   $24
4                   $25
5                   $28
6                   $32
7                   $40
8                   $31
9                   $26
10                  $19
11                  $17
12                  $32
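The formula Revenue = 15 + A + 2B - C can be evaluated directly once values for the three drivers are known. The slide does not say what A, B and C stand for, so the sketch below treats them as unnamed inputs, and the example values are hypothetical.

```python
def predict_revenue(a, b, c):
    """Evaluate Revenue = 15 + A + 2B - C (thousands of dollars)."""
    return 15 + a + 2 * b - c


# Hypothetical driver values for one week; not taken from the slides.
example = predict_revenue(a=10, b=5, c=4)
print(example)  # 15 + 10 + 10 - 4 = 31
```

Unlike a single trend line over week numbers, a model of this shape lets the forecast fall as well as rise from week to week, depending on how the drivers move.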
Week (Quarter 1)    Revenue (Thousands)    Multivariate Model
1                   $15                    $15
2                   $21                    $21
3                   $24                    $24
4                   $25                    $25
5                   $28                    $28
6                   $32                    $32
7                   $40                    $40
8                   $41                    $31
9                   $44                    $26
10                  $48                    $19
11                  $52                    $17
12                  $54                    $32
[Chart: revenue (Rev. k$, $0 to $45) against week number (Week#, 0 to 14)]
Agility
Accuracy
Rule-based anti-money laundering programs are often ineffective and time-consuming.
Segmentation
Segmenting the collected data into clusters reveals a suspicious cluster.
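One way such a segmentation could be sketched is with k-means clustering. Everything specific below is an assumption made for illustration: the two features (transactions per day, average amount in K$), the sample accounts, the starting centroids, and the rule that the smallest cluster is the one worth a closer look. The slides do not say which algorithm or features were used.

```python
from math import dist


def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, clusters


# Hypothetical accounts: (transactions per day, average amount in K$).
points = [
    (2, 1.0), (3, 1.2), (2, 0.8), (4, 1.5), (3, 0.9), (5, 1.1),
    (20, 5.0), (22, 5.5), (19, 4.8), (21, 5.2),
    (40, 9.5), (42, 9.8),
]
initial = [(2, 1.0), (20, 5.0), (40, 9.5)]  # assumed starting centroids

centroids, clusters = kmeans(points, initial)

# Illustrative rule only: treat the smallest cluster as the one to review.
suspicious = min(clusters, key=len)
print(suspicious)
```

In practice the features would usually be scaled before clustering, and a flagged cluster would be a starting point for review rather than a verdict.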
Thank You