EVALUATING CRISP-DM FOR DATA SCIENCE

How can you use the classic data science life cycle on your next project?

Data Science Process Alliance

Integrating data science process effectiveness research with industry leading agile training expertise

Executive Summary
What is CRISP-DM?
Published in 1999, the CRoss Industry Standard
Process for Data Mining (CRISP-DM) is the most
popular framework for executing data science
projects. It provides a natural description of a
data science life cycle (the workflow in
data-focused projects).

However, this task-focused approach for
executing projects fails to address team and
communication issues. Thus, CRISP-DM should
be combined with other team coordination
frameworks.

[Figure: Results of a 2020 DSPA poll on the use of data science process frameworks.]

Six Phases
1. Business understanding
What does the business need?
2. Data understanding
What data do we have / need? Is it clean?
3. Data preparation
How do we organize the data for modeling?
4. Modeling
What modeling techniques should we apply?
5. Evaluation
What best meets the business objectives?
6. Deployment
How do stakeholders access the results?

How can you use CRISP-DM on your next Project?

Every project, team, and organization is unique. So to evaluate CRISP-DM for your next
project, first review its key concepts. Then, assess its strengths and weaknesses. Finally,
consider some key tips for its use.

Evaluating CRISP-DM
1. Review the CRISP-DM framework
2. Explore Strengths & Weaknesses
3. Actions to consider

© Data Science Process Alliance 2022 www.datascience-pm.com

Reviewing CRISP-DM
Diving into the CRISP-DM Phases
I. Business Understanding
The Business Understanding phase focuses on understanding the objectives and requirements of the project.
While many teams hurry through this phase, establishing a strong business understanding is like building the
foundation of a house – absolutely essential. Aside from the third task, the three other tasks in this phase are
foundational project management activities that are universal to most projects:
1. Determine business objectives: Understand what the customer / client is trying to
achieve, including the business success criteria.
2. Assess situation: Determine resource availability and project requirements, assess
risks and contingencies, and conduct a cost-benefit analysis.
3. Determine project goals: In addition to defining the business objectives, you should
also define what success looks like from a technical data mining perspective.
4. Produce project plan: Select technologies and tools and define detailed plans for
each project phase.

II. Data Understanding

Adding to the foundation of Business Understanding, the Data Understanding phase focuses on identifying,
collecting, and analyzing data sets that can help the project. This phase also has four tasks:
1. Collect initial data: Acquire the necessary data and (if necessary)
load it into your analysis tool.
2. Describe data: Examine the data and document its surface
properties like data format, number of records, or field identities.
3. Explore data: Dig deeper into the data. Query it, visualize it, and
identify relationships among the data.
4. Verify data quality: How clean/dirty is the data? Document any
quality issues.
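
The "Describe data" and "Verify data quality" tasks can be sketched in a few lines of Python. This is only an illustration using the standard library; the sample CSV, its field names, and the describe helper are assumptions made for the example, not part of CRISP-DM.

```python
import csv
import io

# Hypothetical sample data standing in for a real extract; note the blank weight.
SAMPLE_CSV = """id,height_cm,weight_kg
1,170,65
2,182,
3,158,54
"""

def describe(csv_text):
    """Return record count, field names, and missing-value counts per field."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fields = list(rows[0].keys()) if rows else []
    # Count empty or whitespace-only cells as missing values.
    missing = {f: sum(1 for r in rows if not (r[f] or "").strip()) for f in fields}
    return {"records": len(rows), "fields": fields, "missing": missing}

summary = describe(SAMPLE_CSV)
print(summary)
```

A summary like this is one possible starting point for the data quality documentation that the fourth task asks for.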

III. Data Preparation

This phase, which is often referred to as “data munging”, prepares the final data set(s) for modeling. A common
rule of thumb is that 50% to 80% of the project effort is in the data preparation phase. This phase has five tasks:

1. Select data: Determine which data sets will be used and document reasons
for inclusion/exclusion.
2. Clean data: Often this is the lengthiest task. Without it, you’ll likely fall victim to
garbage-in, garbage-out. A common practice during this task is to correct,
impute, or remove erroneous values.
3. Construct data: Derive new attributes that will be helpful. For example, derive
someone’s body mass index from height and weight fields.
4. Integrate data: Create new data sets by combining data from multiple
sources.
5. Format data: Re-format data as necessary. For example, you might convert
string values that store numbers to numeric values so that you can perform
mathematical operations.
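
As a rough illustration of the clean, construct, and format tasks, the sketch below converts string values to numbers, drops rows that cannot be parsed, and derives body mass index from height and weight. The record layout and the function names are hypothetical, and removing bad rows is just one of the options mentioned above (correcting or imputing are the others).

```python
def to_float(value):
    """Format data: convert a numeric string to float; return None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def body_mass_index(height_cm, weight_kg):
    """Construct data: BMI = weight in kg divided by height in metres squared."""
    height_m = height_cm / 100
    return weight_kg / (height_m ** 2)

def prepare(records):
    """Clean records (drop unparseable or non-positive values) and add a derived 'bmi' field."""
    prepared = []
    for rec in records:
        height = to_float(rec.get("height_cm"))
        weight = to_float(rec.get("weight_kg"))
        if height is None or weight is None or height <= 0 or weight <= 0:
            continue  # remove erroneous rows; imputing a value is an alternative
        prepared.append({
            "height_cm": height,
            "weight_kg": weight,
            "bmi": round(body_mass_index(height, weight), 1),
        })
    return prepared

# Hypothetical raw records with values stored as strings.
raw = [
    {"height_cm": "170", "weight_kg": "65"},
    {"height_cm": "182", "weight_kg": ""},     # missing weight
    {"height_cm": "n/a", "weight_kg": "54"},   # unparseable height
]
clean = prepare(raw)
print(clean)
```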

IV. Modeling
Modeling is often regarded as data science’s most exciting work. In this phase, the team builds and assesses
various models, often using several different modeling techniques. Although the CRISP-DM guide suggests
that you “iterate model building and assessment until you strongly believe that you have found the best model(s)”, in
practice teams might iterate only until they have a “good enough” model. This phase has four tasks:
1. Select modeling techniques: Determine which algorithms to try (e.g. regression,
neural net).
2. Generate test design: Depending on your modeling approach, you might need to split the
data into training, test, and validation sets.
3. Build model: As glamorous as this might sound, this might just be executing a few
lines of code like “reg = LinearRegression().fit(X, y)”.
4. Assess model: Generally, multiple models are competing against each other, and
the data scientist needs to interpret the model results based on domain knowledge,
the pre-defined success criteria, and the test design.
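
The test design, build, and assess tasks can be sketched as follows. This is a minimal standard-library example, not the CRISP-DM guide's method: the 60/20/20 split ratios, the fixed seed, the hand-written least-squares fit, and the use of mean absolute error are all assumptions chosen for illustration, and the data is synthetic.

```python
import random

def split_data(rows, train=0.6, validation=0.2, seed=42):
    """Generate test design: shuffle reproducibly, then split into train/validation/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def fit_line(pairs):
    """Build model: ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, pairs):
    """Assess model: average absolute prediction error on held-out data."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in pairs) / len(pairs)

# Synthetic data following y = 2x + 1 exactly, purely for illustration.
data = [(x, 2 * x + 1) for x in range(20)]
train_set, validation_set, test_set = split_data(data)
model = fit_line(train_set)
validation_error = mean_absolute_error(model, validation_set)
print(model, validation_error)
```

In a real project, several candidate models would be compared on the validation set, with the test set held back for a final check against the pre-defined success criteria.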

V. Evaluation
Whereas the Assess Model task of the Modeling phase focuses on technical model assessment, the Evaluation
phase looks more broadly at which model best meets the business needs and what to do next. This phase has three
tasks:
1. Evaluate results: Do the models meet the business success criteria?
Which one(s) should we approve for the business?
2. Review process: Review the work accomplished. Was anything
overlooked? Were all steps properly executed? Summarize findings
and correct anything if needed.
3. Determine next steps: Based on the previous two tasks, determine
whether to proceed to deployment, iterate further, or initiate new
projects.

VI. Deployment
A model is not particularly useful unless the customer can access its results. So, think of deployment in terms
of what it takes to actually use the results of the project. Depending on the project, this can be as
simple as sharing a report or as complex as implementing a live real-time predictive model. This final phase has
four tasks:

1. Plan deployment: Develop and document a plan for deploying the model.
2. Plan monitoring and maintenance: Develop a thorough monitoring and
maintenance plan to avoid issues during the operational phase (or post-project
phase) of a model.
3. Produce final report: The project team documents a summary of the project
which might include a final presentation of data mining results.
4. Review project: Conduct a project retrospective about what went well, what
could have been better, and how to improve in the future.

Analyzing CRISP-DM
Strengths and Weaknesses
Strengths & Benefits
Common Sense: Data scientists naturally follow a CRISP-DM-like process. When people are asked to do a
data science project without project management direction, they tend toward a CRISP-like methodology and
can easily identify with the CRISP-DM phases and doing iterations.
Cyclical: CRISP-DM can support the iterative nature of data science (but how to actually do iterations is
not defined).
Adopt-able: CRISP-DM can be implemented without much training, organizational role changes, or
controversy.
Right Start: The initial focus on Business Understanding, an often-overlooked step, is helpful to align
technical work with business needs and to steer data scientists away from jumping into a problem without
properly understanding business objectives.
Flexible: A loose CRISP-DM implementation can be flexible to provide many of the benefits of agile
principles and practices. By accepting that a project starts with significant unknowns, the user can cycle
through steps, each time gaining a deeper understanding of the data and the problem. The empirical
knowledge learned from previous cycles can then feed into the following cycles.

Key Strengths:
Common sense steps
Easy to understand
Defines a shared vocabulary for the steps in a project

Weaknesses & Challenges

Not a Team Coordination Framework: Perhaps most significantly, CRISP-DM is not a true project
management methodology because it implicitly assumes that its user is a single person or small, tight-knit
team and ignores the teamwork coordination necessary for larger projects.
Can Ignore Stakeholders: CRISP-DM phases and tasks can be done with minimal input from stakeholders.
Outdated: CRISP-DM has not been updated since 1999 and is criticized for not meeting the considerations
of modern big data science projects (e.g., operational support).
Documentation Heavy: The full-fledged CRISP-DM approach requires a lot of time-consuming
documentation (although most teams seem to skip much of it). In fact, nearly every task has a
documentation step. While documenting one’s work is key in a mature process, CRISP-DM’s documentation
requirements might unnecessarily slow the team from actually delivering increments.
Slow Starts: The process closely resembles a waterfall-like approach, which could delay business value
delivery by spending too much time on the early phases, without incremental learning.

Key Weaknesses:
Not clear when to "loop back" to a previous phase
Missing phases (operational support)
No structured communication with stakeholders

Going Forward
Key Actions to Consider
1. Combine with a team coordination process
There needs to be a mechanism for the team to communicate and prioritize work.
The team process should define how the team communicates, prioritizes tasks and
“loops back” to previous project phases.
Teams can leverage the CRISP-DM phases, and then use a framework such as
Scrum, Kanban or Data Driven Scrum to prioritize potential tasks.

2. Ensure multiple experiments / iterations

Iterate quickly and do not get pulled into a waterfall of sequential work.
Rather, try to deliver thin vertical slices of end-to-end value. Your first deliverable
might not be too useful. That’s okay. Iterate.
While it’s important to do multiple iterations, each team needs to think through how
iterations are defined and then evaluated.

3. Define team roles

CRISP-DM does not include roles (nor a team).
Data science efforts are increasingly a team sport.
Roles can include stakeholders / product owners (to ensure the insight is
actionable), as well as a process expert.

4. Ensure actionable insight

CRISP-DM lacks a communication structure with stakeholders.
How does the team ensure actionable insight?
Be sure to communicate and set expectations with stakeholders frequently.

5. Add phases (if needed) and define the subitems within each phase
Add steps or phases for practices like Git version control and MLOps.
Be clear how tasks (within a phase) are defined.
Some tasks that should be explicitly discussed include: bias checks, accuracy
assessments, business validation, and dev discussions.

6. Document enough…but not too much

CRISP-DM can be documentation heavy; for example, CRISP-DM calls for 12
reports prior to data collection.
So, do what’s reasonable and appropriate but don’t go overboard.

Data Science Process Alliance

Data Science Project Management Training & Coaching

We believe that data science projects are unique. And that it’s time we start managing them as such. But
industry struggles to understand how to deliver data science projects - often resorting to ad hoc or
software engineering practices.

But there is a better way. Which is why the Data Science Team™ teaches leaders, teams, and
organizations to apply effective agile principles to data science projects so that they can deliver better data
science outcomes.

Why Get Certified?

Data Science Specific: Training specifically for data science teams
Flexible: Customizable team or individual training & coaching
Tried and True: Based on research, agile experts and real-world experience
Differentiate Yourself: Knowing how to manage data science projects is a rare in-demand skillset
Advance your Team: Improve team morale and deliver better outcomes

Learning Components

Individualized mentoring (DSTL™ and DSTL+™ courses)
On-demand video instruction & discussions
Real-world case studies
Exclusive training on Data Driven Scrum™
Curated resources (blogs, white papers, etc.)

About the Data Science Process Alliance

DSPA empowers individuals and teams to successfully define and deliver data science projects through
courses and coaching specifically focused on data science project management.
