
1-Week R Programming Syllabus (Data Science, ML, Time Series)


Day 1: Advanced R & Functional Programming
Learning Goals: Deepen R skills by mastering functional programming and efficient data wrangling. Learn
to write concise, modular code using higher-order functions and the tidyverse ecosystem.
Topics:
- Functional Programming in R: R treats functions as first-class objects. It supports anonymous functions, closures, and lists of functions. R is “at its heart, a functional language”, which encourages writing self-contained functions that can be passed around [1]. Explore lapply(), sapply(), purrr::map(), and anonymous function(x) ... constructs to replace for-loops. Practice creating function factories (functions that return functions) and using purrr::map() for list-column operations.
- Data Manipulation with dplyr/tidyverse: Learn the core “verbs” of the dplyr package: mutate(), select(), filter(), summarise(), arrange(), etc. dplyr is described as “a grammar of data manipulation, providing a consistent set of verbs” for common tasks [2]. Practice grouping (group_by()) and summarising data frames, and chaining operations with the pipe %>%. Understand tidy data principles.
- Advanced R Topics: Cover a refresher on control structures (if needed), environments, and debugging. Optionally explore object-oriented programming (S3/S4/R6) or the data.table package for high-performance data tables.

Exercises: Write R scripts/functions that use map() instead of loops, transform raw data frames into tidy format, and perform grouped summaries. For example, given a data frame of sales by date, use dplyr to compute monthly averages or purrr to apply a function to each column (see the sketch below).
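
A minimal sketch of these ideas: a function factory, purrr::map_dbl() on a built-in data frame, and a dplyr grouped summary. The sales data frame is a made-up example (not a dataset referenced in this syllabus), invented here only to show the grouping pattern.

```r
# Requires the tidyverse packages dplyr, purrr, tibble, and lubridate.
library(dplyr)
library(purrr)
library(tibble)
library(lubridate)

# Function factory: returns a function that raises its input to a fixed power.
power_of <- function(p) {
  function(x) x ^ p
}
square <- power_of(2)
square(1:5)            # 1 4 9 16 25

# map_dbl() instead of a for-loop: column means of a numeric data frame.
map_dbl(mtcars, mean)

# Grouped summary with dplyr: monthly average sales.
# `sales` is a hypothetical data frame with `date` and `amount` columns.
sales <- tibble(
  date   = seq(as.Date("2023-01-01"), by = "day", length.out = 90),
  amount = runif(90, 100, 500)
)
sales %>%
  mutate(month = floor_date(date, "month")) %>%
  group_by(month) %>%
  summarise(avg_amount = mean(amount), .groups = "drop")
```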

Resources: Hadley Wickham’s Advanced R (chapter on functional programming) [1]; dplyr documentation and cheat sheets; R for Data Science (R4DS) chapters on data transformation; online tutorials (RStudio Education, Swirl).

Day 2: Data Visualization with ggplot2


Learning Goals: Master the grammar of graphics in R using ggplot2. Create and customize a variety of
plots (scatter, line, bar, histogram, boxplot, etc.) and learn to combine layers, scales, and facets.
Topics:
- Introduction to ggplot2: Learn to start a plot with ggplot(data, aes(x, y)) and add layers (geom_point(), geom_line(), etc.). ggplot2 is “a system for declaratively creating graphics, based on the Grammar of Graphics” [3]. Understand aesthetics (color, shape, size) and how they map to data variables.
- Plot Layers and Faceting: Explore adding multiple layers (points + lines + smoothers), adjusting themes, and faceting (facet_wrap()) to compare subsets. Practice customizing axes, legends, and annotations.

- Visualizing Distributions and Relationships: Use geom_histogram(), geom_boxplot(), geom_violin(), geom_bar(), etc. to explore single-variable distributions. Create scatterplots and smoothing lines to examine relationships between variables.
- Interactive Plots (optional): Briefly mention interactive graphics tools (e.g. plotly) for web-based visualization.

Exercises: Create a scatterplot of iris (petal vs. sepal), a time series line chart of AirPassengers, and a faceted bar chart of mtcars counts by gear and cyl. Experiment with different geoms and themes to improve readability (a starting sketch follows).
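
A possible starting point for these exercises; the variable and dataset names come from R’s built-in iris, mtcars, and AirPassengers data, and the theme choices are illustrative.

```r
library(ggplot2)

# Scatterplot of iris: petal vs. sepal length, coloured by species,
# with a linear smoother per species.
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Iris: petal vs. sepal length") +
  theme_minimal()

# Faceted bar chart: counts of gear values in mtcars, one panel per cyl.
ggplot(mtcars, aes(x = factor(gear))) +
  geom_bar() +
  facet_wrap(~ cyl) +
  labs(x = "gear")

# Time series line chart of AirPassengers (converted to a data frame first).
air <- data.frame(
  time       = as.numeric(time(AirPassengers)),
  passengers = as.numeric(AirPassengers)
)
ggplot(air, aes(x = time, y = passengers)) +
  geom_line()
```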

Resources: ggplot2 documentation [3]; R for Data Science chapters on data visualization [4]; ggplot2 cheat sheets; RStudio’s Data Visualization Cheat Sheet; online video tutorials on ggplot2 (e.g. Posit webinars).

Day 3: Time Series – Trend & Seasonality


Learning Goals: Learn to work with time series data in R and identify trend and seasonal components.
Perform smoothing and decomposition techniques manually and with built-in functions.
Topics:
- Time Series Basics: Introduce R’s ts objects and plotting time series (plot.ts()). Discuss frequency (monthly, quarterly) and time indexing.
- Least Squares Trend (LSM): Fit a linear trend to data using the least squares method. The straight-line trend equation is y = a + bx, solved via the normal equations ∑y = Na + b∑x and ∑xy = a∑x + b∑x² [5]. In R, one can also use lm(value ~ time) to estimate the trend.
- Moving Averages: Compute centered moving averages to smooth the data. For a window of size 2k+1, slide it across the series and take the mean; for even-sized windows, compute a two-step centered average [6]. This yields a trend-cycle estimate that smooths short-term fluctuations.
- Ratio-to-Trend and Seasonal Indices: If seasonality is present, compute seasonal indices by dividing each observation by its trend value. The ratio-to-trend method “involves dividing each observation in a time series by a trend value” to isolate seasonal effects [7]. Contrast additive vs. multiplicative models: in an additive model, Time Series = Trend + Seasonality + Random, whereas in a multiplicative model, Time Series = Trend * Seasonality * Random [8].
- Decomposition: Learn how to decompose a series into trend, seasonal, and remainder components (using functions like decompose() or stl()), and how to interpret seasonal indices.

Exercises: Given a monthly sales series, plot the data and compute a centered 12-month moving average. Fit a linear trend via least squares (lm()) and overlay it. Compute seasonal indices by the ratio-to-trend method. Use decompose(AirPassengers) to decompose the famous dataset and compare to the manual calculations (see the sketch below).
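
One way to carry out these steps on AirPassengers using only base R and the stats package; the 2x12 moving-average weights and the normalisation of the seasonal indices are illustrative choices rather than something prescribed by the syllabus.

```r
y <- AirPassengers   # monthly series, frequency = 12

# Centered 12-month moving average via stats::filter()
# (a 2x12 MA: weights 1/24, 1/12, ..., 1/12, 1/24).
w  <- c(0.5, rep(1, 11), 0.5) / 12
ma <- stats::filter(y, w, sides = 2)

# Straight-line trend by least squares: value as a function of time.
t_idx     <- as.numeric(time(y))
trend_fit <- lm(as.numeric(y) ~ t_idx)

plot(y)
lines(ma, col = "red")                      # smoothed trend-cycle
abline(trend_fit, col = "blue", lty = 2)    # fitted linear trend

# Ratio-to-trend style seasonal indices (multiplicative):
# divide each observation by its moving-average trend, then average by month.
ratios  <- as.numeric(y) / as.numeric(ma)
monthly <- tapply(ratios, cycle(y), mean, na.rm = TRUE)
seasonal_index <- monthly / mean(monthly)   # normalise so indices average to 1
seasonal_index

# Classical decomposition for comparison.
dec <- decompose(y, type = "multiplicative")
plot(dec)
```

The seasonal factors from decompose() should be close to the manually computed seasonal_index values, which is a useful sanity check on the by-hand calculation.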

Resources: Hyndman & Athanasopoulos, Forecasting: Principles and Practice (ch. 2–3, online textbook) for methodology; R’s stats package (decompose(), filter()); R tutorials on time series (e.g. “A Little Book of R for Time Series”).

Day 4: Forecasting Techniques
Learning Goals: Apply statistical forecasting methods to time series data. Learn exponential smoothing
and ARIMA modeling using R packages. Evaluate forecast accuracy.
Topics:
- Exponential Smoothing (ETS): Introduce simple, Holt’s, and Holt–Winters methods for smoothing trend
and seasonality. Show how to use forecast::ets() to fit models automatically.
- ARIMA Modeling: Cover ARIMA (autoregressive integrated moving average) models for non-seasonal and seasonal data. Use forecast::auto.arima() to select an ARIMA model. Discuss differencing to remove trend.
- Using the forecast Package: The forecast package provides “methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling” [9]. Practice forecast() on fitted models to generate future predictions and prediction intervals.
- Model Evaluation: Compute accuracy metrics (MAE, RMSE, MAPE) using accuracy(). Perform train-test splits on time series (e.g. hold out the last year) and back-testing. Discuss forecast diagnostic plots.
- Model Comparison: Compare simple forecasts (naïve, mean, seasonal naïve) vs. fitted models to choose the best approach.

Exercises: Using the forecast package, fit an ETS and an ARIMA model to AirPassengers or another series. Generate 12-step-ahead forecasts and plot them. Compare the errors of both models, and contrast them with the naïve method (see the sketch below).
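
A minimal sketch of this workflow with the forecast package, holding out the last year of AirPassengers as a test set; the split point and the seasonal naïve benchmark are illustrative choices.

```r
library(forecast)

y <- AirPassengers

# Hold out the last 12 months for evaluation.
train <- window(y, end = c(1959, 12))
test  <- window(y, start = c(1960, 1))

# Fit an ETS model and an ARIMA model on the training data.
fit_ets   <- ets(train)
fit_arima <- auto.arima(train)

# 12-step-ahead forecasts with prediction intervals.
fc_ets   <- forecast(fit_ets, h = 12)
fc_arima <- forecast(fit_arima, h = 12)
plot(fc_arima)

# Compare accuracy on the held-out year, plus a seasonal naïve benchmark.
fc_snaive <- snaive(train, h = 12)
accuracy(fc_ets, test)
accuracy(fc_arima, test)
accuracy(fc_snaive, test)
```

Each accuracy() call reports both training-set and test-set errors (MAE, RMSE, MAPE, etc.), so the test-set rows are the ones to compare across models.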

Resources: R package forecast documentation [9]; Hyndman’s online book; Kaggle/R-bloggers tutorials on forecasting; free videos (e.g. Rob Hyndman’s forecasting webinars).

Day 5: Supervised Machine Learning in R


Learning Goals: Learn core supervised ML methods: regression and classification. Practice building and
evaluating models using R packages.
Topics:
- Regression (Continuous Outcome): Use lm() for linear regression. Discuss assessing fit (R², residuals). Introduce regularized regression via glmnet.
- Classification (Categorical Outcome): Train logistic regression (glm(family = "binomial")). Explain decision tree models (using rpart) and random forests via the randomForest package (Breiman & Cutler’s algorithm) [10].
- Support Vector Machines & Naive Bayes: Use the e1071 package for SVM and Naive Bayes (it provides “support vector machines, naive Bayes classifier, ...” functions [11]). Demonstrate classification on datasets like iris.
- Model Training with caret: The caret package streamlines training and tuning for regression/classification (“caret” = Classification And REgression Training) [12]. Learn caret::train() with resampling (cross-validation) to tune parameters (e.g. number of trees and mtry in random forests; cost and gamma in SVM). Show how to specify trainControl() and extract the best models.
- Tidymodels (advanced): Briefly mention the tidymodels ecosystem (parsnip, recipes, workflows) for a tidy approach to modeling [13].

Exercises: Using the built-in mtcars dataset, build a linear regression to predict mpg and evaluate it. On the iris dataset, train a random forest (with caret or parsnip) to classify species and report accuracy. Use caret to tune an SVM on a binary outcome (e.g. the Sonar data from mlbench). A starting sketch follows.
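
A minimal sketch of the regression and random-forest parts of these exercises, assuming the caret and randomForest packages are installed; the predictor choice for mtcars and the mtry grid are illustrative, not prescribed.

```r
library(caret)

# Linear regression: predict mpg from weight and horsepower in mtcars.
lin_fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(lin_fit)$r.squared

# Random forest on iris with 5-fold cross-validation via caret.
set.seed(123)
ctrl   <- trainControl(method = "cv", number = 5)
rf_fit <- train(Species ~ ., data = iris,
                method    = "rf",
                trControl = ctrl,
                tuneGrid  = expand.grid(mtry = c(1, 2, 3)))
rf_fit$results          # cross-validated accuracy for each mtry value

# Predictions on the training data (for illustration only;
# in practice evaluate on a held-out test set).
preds <- predict(rf_fit, newdata = iris)
mean(preds == iris$Species)
```

The same train() pattern extends to the SVM exercise by switching method (e.g. "svmRadial") and tuning grid, with the resampling handled by trainControl().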

Resources: caret package vignettes (short introduction [12]); Tidy Modeling with R online book; CRAN docs for randomForest and e1071; machine learning tutorials (e.g. Andrew Ng’s ML course concepts, applied in R).

Day 6: Unsupervised Learning & Clustering


Learning Goals: Explore clustering and dimensionality reduction. Learn to find structure in unlabeled data.
Topics:
- Clustering: Study k-means (partitioning) and hierarchical clustering. The goal is “to find homogeneous subgroups within the data” based on similarity [14]. Use kmeans() to cluster numeric data and hclust() on distance matrices. Determine the number of clusters (elbow method, silhouette).
- Dimensionality Reduction (PCA): Use prcomp() to perform Principal Component Analysis. Explain how
PCA projects high-dimensional data into principal components for visualization.
- Evaluating Clusters: Measure within-cluster SSE and silhouette scores (cluster package), and interpret dendrograms. Visualize cluster assignments on scatter plots or pairs plots.

Figure: Example of k-means clustering on the iris data (Sepal vs. Petal length), illustrating how data points are
grouped into 3 clusters.

Exercises: Perform k-means clustering on the numeric columns of iris. Plot the clusters (e.g. Petal.Length vs. Sepal.Length colored by cluster). Run hierarchical clustering and cut the tree into 3 groups; compare to the k-means labels. Apply PCA on mtcars and plot the first two principal components (see the sketch below).
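
A minimal sketch covering k-means, hierarchical clustering, and PCA on the built-in datasets; scaling the variables and the ward.D2 linkage are illustrative decisions, not requirements.

```r
set.seed(42)

# K-means on the (scaled) numeric columns of iris, asking for 3 clusters.
iris_num <- iris[, 1:4]
km <- kmeans(scale(iris_num), centers = 3, nstart = 25)
table(km$cluster, iris$Species)   # compare clusters to the true species

plot(iris$Petal.Length, iris$Sepal.Length,
     col = km$cluster, pch = 19,
     xlab = "Petal.Length", ylab = "Sepal.Length")

# Hierarchical clustering cut into 3 groups, compared to the k-means labels.
hc  <- hclust(dist(scale(iris_num)), method = "ward.D2")
grp <- cutree(hc, k = 3)
table(grp, km$cluster)

# PCA on mtcars, plotting the first two principal components.
pca <- prcomp(mtcars, scale. = TRUE)
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2")
text(pca$x[, 1], pca$x[, 2], labels = rownames(mtcars), cex = 0.6, pos = 3)
```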

Resources: Texts like An Introduction to Statistical Learning (R labs on clustering); R documentation for kmeans, hclust, silhouette; the Kaggle tutorial Clustering in R. For an unsupervised learning overview, see [14].

Day 7: Capstone Project & Review


Learning Goals: Integrate concepts through a hands-on project. Practice an end-to-end data science
workflow: import/clean data, EDA, analysis, modeling, and presentation. Review all key concepts.
Topics:
- Mini Projects/Case Studies: Apply learned skills to real-world scenarios. Examples:
- Time Series Project: Forecast future sales using ARIMA/ETS on a real dataset (e.g. monthly retail data).
- Supervised ML Project: Build and evaluate a predictive model on an open dataset (e.g. Titanic survival,
house prices, or credit risk).
- Unsupervised Project: Cluster customer or image data and interpret clusters (e.g. market segmentation).
- Cross-cutting Skills: Data acquisition (CSV/Excel import), cleaning (dealing with NAs), and reporting (creating reproducible reports with R Markdown or Quarto); a small import-and-clean sketch follows this list.
- Review & Discussion: Summarize what was learned each day. Discuss best practices (version control,
documentation).
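
A minimal sketch of the import-and-clean step; sales.csv and its columns (date, region, amount) are hypothetical placeholders for whatever dataset the chosen project uses, and the resulting summary would typically go into an R Markdown or Quarto report.

```r
library(readr)
library(dplyr)

# `sales.csv` is a hypothetical file with columns `date`, `region`, `amount`.
raw <- read_csv("sales.csv")

clean <- raw %>%
  mutate(date = as.Date(date)) %>%
  filter(!is.na(amount)) %>%                    # drop rows with missing amounts
  mutate(region = coalesce(region, "unknown"))  # fill missing region labels

# Simple per-region summary for the report.
clean %>%
  group_by(region) %>%
  summarise(total = sum(amount), .groups = "drop")
```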

Exercises/Projects: Choose one of the mini-project ideas. For example, forecast AirPassengers using the forecast package and compare the errors of different models, or predict Titanic survival with caret (train a random forest, compute ROC). Prepare a short report or presentation of your analysis and results.

Resources:
- Datasets: UCI ML repository, Kaggle Datasets (free signup), Rob Hyndman’s Time Series Data Library
(open data).
- Learning Materials: R for Data Science (free online) for data wrangling and viz, Forecasting: Principles and
Practice (free book) for time series, Tidy Modeling with R (free online) for modeling.
- Tutorials & Community: RStudio tutorials, Coursera (audit), YouTube lectures (e.g. “Machine Learning
with R” by freeCodeCamp), and Stack Overflow/RStudio Community.
- Documentation: CRAN package manuals (caret, randomForest, tidymodels, forecast) and cheat sheets linked above [12] [9].

Each day’s materials build on the previous ones, providing hands-on practice (code-alongs, exercises, mini-projects) and free resources for continued learning. All factual content above is supported by authoritative sources on R programming and data analysis [1], [2], [3], [5], [7], [9], [10], [11], [12], [14].

[1] Introduction | Advanced R
https://adv-r.hadley.nz/fp.html

[2] A Grammar of Data Manipulation • dplyr
https://dplyr.tidyverse.org/

[3] Create Elegant Data Visualisations Using the Grammar of Graphics • ggplot2
https://ggplot2.tidyverse.org/

[4] Data visualization – R for Data Science (2e)
https://r4ds.hadley.nz/data-visualize.html

[5], [6] Time series
https://pravin-hub-rgb.github.io/BCA/resources/sem4/comp_num_tbc405/unit5/index.html

[7], [8] Time Series Concepts (Terminologies) | by Nikitajain Jain | Medium
https://medium.com/@Niki_Data_n_AI/time-series-concepts-terminologies-73b9cc0b6378

[9] CRAN: Package forecast
https://cran.r-project.org/package=forecast

[10] CRAN: Package randomForest
https://cran.r-project.org/web/packages/randomForest/index.html

[11] CRAN: Package e1071
https://cran.r-project.org/web/packages/e1071/index.html

[12] A Short Introduction to the caret Package
https://cran.r-project.org/web/packages/caret/vignettes/caret.html

[13] tidymodels package - RDocumentation
https://www.rdocumentation.org/packages/tidymodels/versions/1.3.0

[14] Chapter 4 Unsupervised Learning | An Introduction to Machine Learning with R
https://lgatto.github.io/IntroMachineLearningWithR/unsupervised-learning.html
