Ds R Capstone Template

This document outlines a capstone project that applies data science with R. It has introduction, methodology, results, and conclusion sections. The methodology section covers collecting and wrangling bike-sharing data, then exploratory data analysis with SQL and visualization. Predictive analysis uses regression models to predict bike demand, and an R Shiny dashboard showcases the results. Key results are presented as SQL query outputs, visualizations of bike rentals over time and by weather, and performance metrics for the selected best model. Screenshots demonstrate the interactive dashboard.

Uploaded by

simeon taiwo

Applied Data Science with R
Capstone project
<LEARNER’s Name>
<Date>
Outline
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix

Executive Summary
• Point1
• Point2
  • Sub Point 1
  • Sub Point 2
  • Sub Point 3
• Point3
• Point4
• Point5

Introduction
• Point1
• Point2
• Point3
• Point4
  • Sub Point1
  • Sub Point2

Methodology
• Perform data collection
• Perform data wrangling
• Perform exploratory data analysis (EDA) using SQL and visualization
• Perform predictive analysis using regression models
  • How to build the baseline model
  • How to improve the baseline model
• Build an R Shiny dashboard app

Methodology

Data collection

• Describe how the data sets were collected.
• Present your data collection process using key phrases and flowcharts.
• Add screenshots of the Notebook code cells and cell outputs used for the OpenWeather API calls and web scraping to the Appendix section for peer review.

Data wrangling

• Describe how the data sets were processed.
• Present your data wrangling process using key phrases and flowcharts.
• Add screenshots of the data wrangling code cells and outputs for regular expressions, missing-value handling, and indicator-column generation to the Appendix section for peer review.
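
As a rough illustration of the three wrangling steps named above, here is a minimal base R sketch. The data frame, its column names (CITY, TEMPERATURE, SEASONS), and its values are made up for illustration and are not taken from the project data. Imputing with the column mean is only one possible way to handle missing values.

```r
# Hypothetical raw data; all column names and values are illustrative only.
raw <- data.frame(
  CITY        = c("Seoul[1]", "Paris [note 2]", "London"),
  TEMPERATURE = c(12.5, NA, 9.0),
  SEASONS     = c("Spring", "Winter", "Winter"),
  stringsAsFactors = FALSE
)

# 1. Regular expressions: strip bracketed reference markers such as "[1]",
#    then trim any leftover whitespace.
raw$CITY <- trimws(gsub("\\[[^]]*\\]", "", raw$CITY))

# 2. Missing values: replace numeric NAs with the column mean
#    (an assumption; dropping rows or using the median are alternatives).
temp_mean <- mean(raw$TEMPERATURE, na.rm = TRUE)
raw$TEMPERATURE[is.na(raw$TEMPERATURE)] <- temp_mean

# 3. Indicator columns: one 0/1 column per level of a categorical variable.
indicators <- model.matrix(~ SEASONS - 1, data = raw)
clean <- cbind(raw[, c("CITY", "TEMPERATURE")], indicators)

print(clean)
```

Screenshots of cells like these, together with their printed output, are the kind of evidence the Appendix asks for.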

EDA with SQL

• Summarize the SQL queries you performed using bullet points.
• Add screenshots of all required SQL queries to the Appendix section.

EDA with data visualization

• Summarize which charts you plotted using bullet points.
• Add screenshots of your ggplot code snippets to the Appendix section.

Predictive analysis

• Summarize how you built, evaluated, and improved your models, and how you found the best-performing one.
• Present your model development process using key phrases and a flowchart.

Build an R Shiny dashboard

• Summarize the plots and interactions you built into the dashboard using bullet points.

Results
• Exploratory data analysis results

• Predictive analysis results

• A dashboard demo in screenshots

EDA with SQL

Busiest bike rental times

• Find the dates and hours that had the most bike rentals.
• Present your query result with a short explanation here.

Hourly popularity and temperature by season

• Find hourly popularity and temperature by season.
• Present your query result with a short explanation here.

Rental Seasonality

• Rental Seasonality

• Present your query result with a short explanation here.

Weather Seasonality

• Weather Seasonality

• Present your query result with a short explanation here.

Bike-sharing info in Seoul

• Find the total bike count and city info for Seoul.
• Present your query result with a short explanation here.

Cities similar to Seoul

• Find the names and coordinates of all cities whose bike-sharing systems are comparable in scale to Seoul's.
• Present your query result with a short explanation here.

EDA with Visualization

Bike rental vs. Date
• Show a scatter plot of RENTED_BIKE_COUNT vs. DATE
• Show the screenshot of the scatter plot with explanations

Bike rental vs. Datetime
• Show the same plot of the RENTED_BIKE_COUNT time series, but now add HOURS as the colour
• Show the screenshot of the scatter plot with explanations

Bike rental histogram
• Show a histogram overlaid with a kernel density curve
• Show the screenshot of the histogram with explanations

Daily total rainfall and snowfall
• Show a bar chart of the daily total rainfall and snowfall
• Show the screenshot of the bar chart with explanations

Predictive analysis

Ranked coefficients
• Show a screenshot of the ranked coefficients bar chart for the baseline model
• Try to explain why some variables are important for predicting bike-sharing demand while others are not

Model evaluation
• Build at least 5 different models using polynomial terms, interaction terms, and regularization
• Visualize the refined models' RMSE and R-squared using a grouped bar chart
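
One possible way to compare candidate models is sketched below in base R. The data are synthetic, and the predictor names (TEMPERATURE, HUMIDITY), the 75/25 split, and the three formulas are assumptions made for illustration, not the course's prescribed setup. Regularized models are not covered here because they need an additional package.

```r
set.seed(42)
n <- 200

# Synthetic data standing in for the real bike-sharing table.
df <- data.frame(
  TEMPERATURE = runif(n, -10, 35),
  HUMIDITY    = runif(n, 20, 95)
)
df$RENTED_BIKE_COUNT <- 300 + 25 * df$TEMPERATURE - 0.5 * df$TEMPERATURE^2 -
  2 * df$HUMIDITY + rnorm(n, sd = 40)

# Simple train/test split (75% / 25%).
train_idx <- sample(n, size = 0.75 * n)
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Candidate formulas: baseline, polynomial term, interaction term.
formulas <- list(
  baseline    = RENTED_BIKE_COUNT ~ TEMPERATURE + HUMIDITY,
  polynomial  = RENTED_BIKE_COUNT ~ poly(TEMPERATURE, 2) + HUMIDITY,
  interaction = RENTED_BIKE_COUNT ~ TEMPERATURE * HUMIDITY
)

# Fit on the training set, then score RMSE and R-squared on the test set.
evaluate <- function(f) {
  fit   <- lm(f, data = train)
  pred  <- predict(fit, newdata = test)
  truth <- test$RENTED_BIKE_COUNT
  c(rmse = sqrt(mean((truth - pred)^2)),
    rsq  = 1 - sum((truth - pred)^2) / sum((truth - mean(truth))^2))
}

# One row per model, one column per metric.
metrics <- t(sapply(formulas, evaluate))
print(round(metrics, 3))
```

A table shaped like `metrics` (one row per model, one column per metric) is a convenient starting point for the grouped bar chart.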

Find the best performing model
• Select the best performing model, which must satisfy:
  • RMSE less than 330
  • R-squared larger than 0.72
• Show a screenshot of the model performance
• Show its model formula here (RENTED_BIKE_COUNT ~ x1 + x2 + x3 …)
• You could optionally present its final coefficients here
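
For reference, the two metrics in the criteria above are conventionally defined as follows. Here y_i is an observed RENTED_BIKE_COUNT value, ŷ_i is the model's prediction for it, ȳ is the mean of the observed values, and n is the number of observations being scored. Scoring on a held-out test set is assumed; the template does not state which data set or which variant of R-squared is expected.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

Under these definitions, a model meets the criteria when RMSE < 330 and R^2 > 0.72.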

Q-Q plot of the best model
• Plot the Q-Q plot of the best model's test predictions vs. the true values

Dashboard

<Dashboard screenshot 1>
• Replace <Dashboard screenshot 1> title with an appropriate title
• Show the screenshot for cities' max bike-sharing prediction on a map
• Explain the important elements on the screenshot

<Dashboard screenshot 2>
• Replace <Dashboard screenshot 2> title with an appropriate title
• Show the screenshot when one specific city is selected
• Explain the important elements on the screenshot

<Dashboard screenshot 3>
• Replace <Dashboard screenshot 3> title with an appropriate title
• Show the screenshot when another specific city is selected
• Explain the important elements on the screenshot

CONCLUSION
• Point 1
• Point 2
• Point 3
• Point 4
•…

APPENDIX
• Include any relevant assets, such as R code snippets, SQL queries, charts, Notebook outputs, or data sets that you created during this project

