1-Week R Programming Syllabus (Data Science, ML, Time Series)
Day 1: Advanced R & Functional Programming
Learning Goals: Deepen R skills by mastering functional programming and efficient data wrangling. Learn
to write concise, modular code using higher-order functions and the tidyverse ecosystem.
Topics:
- Functional Programming in R: R treats functions as first-class objects and supports anonymous functions, closures, and lists of functions. R is "at its heart, a functional language", which encourages writing self-contained functions that can be passed around [1]. Explore lapply(), sapply(), purrr::map(), and anonymous function(x) ... constructs to replace for-loops. Practice creating function factories (functions that return functions) and using purrr::map() for list-column operations.
- Data Manipulation with dplyr/tidyverse: Learn the core "verbs" of the dplyr package: mutate(), select(), filter(), summarise(), arrange(), etc. dplyr is described as "a grammar of data manipulation, providing a consistent set of verbs" for common tasks [2]. Practice grouping (group_by()) and summarising data frames, and chaining operations with the pipe %>%. Understand tidy data principles.
- Advanced R Topics: Cover a refresher on control structures (if needed), environments, and debugging. Optionally explore object-oriented programming (S3/S4/R6) or the data.table package for high-performance data tables.
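The functional-programming ideas above can be sketched in base R alone (no packages needed); power() here is an illustrative function-factory name, not a built-in:

```r
# A function factory: power() returns a new function whose exponent is
# captured in the returned function's closure environment.
power <- function(exp) {
  function(x) x ^ exp
}

square <- power(2)
cube   <- power(3)
square(4)  # 16
cube(2)    # 8

# lapply()/sapply() replace an explicit for-loop: apply a function to
# every element and collect the results.
squares <- sapply(1:5, square)
squares    # 1 4 9 16 25
```

purrr::map(1:5, square) would return the same values as a list; purrr::map_dbl() would return a numeric vector.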
Exercises: Write R scripts/functions that use map() instead of loops, transform raw data frames into tidy
format, and perform grouped summaries. For example, given a data frame of sales by date, use dplyr to
compute monthly averages or purrr to apply functions to each column.
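The monthly-averages exercise might look like this with dplyr (a sketch; assumes dplyr is installed, and the date/sales column names are hypothetical):

```r
library(dplyr)

# Toy sales data: one row per day for the first quarter of 2023.
sales_df <- data.frame(
  date  = seq(as.Date("2023-01-01"), by = "day", length.out = 90),
  sales = runif(90, min = 100, max = 200)
)

monthly <- sales_df %>%
  mutate(month = format(date, "%Y-%m")) %>%   # derive a year-month key
  group_by(month) %>%                         # one group per month
  summarise(avg_sales = mean(sales), .groups = "drop")

monthly   # one row per month (2023-01..2023-03) with its average sales
```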
Resources: Hadley Wickham's Advanced R (chapter on functional programming) [1]; dplyr documentation and cheat sheets; R for Data Science (R4DS) chapters on data transformation; online tutorials (RStudio Education, Swirl).
Day 2: Data Visualization with ggplot2
Learning Goals: Master the grammar of graphics in R using ggplot2. Create and customize a variety of
plots (scatter, line, bar, histogram, boxplot, etc.) and learn to combine layers, scales, and facets.
Topics:
- Introduction to ggplot2: Learn to start a plot with ggplot(data, aes(x, y)) and add layers (geom_point(), geom_line(), etc.). ggplot2 is "a system for declaratively creating graphics, based on the Grammar of Graphics" [3]. Understand aesthetics (color, shape, size) and how they map to data variables.
- Plot Layers and Faceting: Explore adding multiple layers (points + lines + smoothers), adjusting themes, and faceting (facet_wrap()) to compare subsets. Practice customizing axes, legends, and annotations.
- Visualizing Distributions and Relationships: Use geom_histogram(), geom_boxplot(), geom_bar(), and geom_violin() to explore single-variable distributions. Create scatterplots with smoothing lines to examine relationships between variables.
- Interactive Plots (optional): Briefly mention interactive graphics tools (e.g. plotly) for web-based visualization.
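A layered, faceted plot combining these ideas might look like this (a sketch; assumes ggplot2 is installed):

```r
library(ggplot2)

# Scatter + linear smoother, faceted by cylinder count.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl))) +    # map cyl to point colour
  geom_smooth(method = "lm", se = FALSE) +   # add a linear-trend layer
  facet_wrap(~ cyl) +                        # one panel per cyl value
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       colour = "Cylinders")

# print(p)  # renders the plot in an interactive session
```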
Exercises: Create a scatterplot of iris (petal vs. sepal), a time series line chart of AirPassengers, and a faceted bar chart of mtcars counts by gear and cyl. Experiment with different geoms and themes to improve readability.
Resources: ggplot2 documentation [3]; R for Data Science chapters on data visualization [4]; ggplot2 cheat sheets; RStudio's Data Visualization Cheat Sheet; online video tutorials on ggplot2 (e.g. Posit webinars).
Day 3: Time Series – Trend & Seasonality
Learning Goals: Learn to work with time series data in R and identify trend and seasonal components.
Perform smoothing and decomposition techniques manually and with built-in functions.
Topics:
- Time Series Basics: Introduce R's ts objects and plotting time series (plot.ts()). Discuss frequency (monthly, quarterly) and time indexing.
- Least Squares Trend (LSM): Fit a linear trend to data using the least squares method. The straight-line trend equation is y = a + bx, with coefficients found from the normal equations ∑y = Na + b∑x and ∑xy = a∑x + b∑x² [5]. In R, use lm(value ~ time) to estimate the trend (note the response goes on the left of the formula).
- Moving Averages: Compute centered moving averages to smooth data. For a window of size 2k+1, slide across the data and take the mean; for even-sized windows, compute a two-step centered average [6]. This yields a trend-cycle estimate that smooths short-term fluctuations.
- Ratio-to-Trend and Seasonal Indices: If seasonality is present, compute seasonal indices by dividing each observation by its trend value. The ratio-to-trend method "involves dividing each observation in a time series by a trend value" to isolate seasonal effects [7]. Contrast additive vs. multiplicative models: in an additive model, Time Series = Trend + Seasonality + Random, whereas in a multiplicative model, Time Series = Trend × Seasonality × Random [8].
- Decomposition: Learn how to decompose a series into trend, seasonal, and remainder components (using functions like decompose() or stl()), and how to interpret seasonal indices.
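These steps can be carried out with base R alone, sketched here on the built-in AirPassengers series:

```r
y <- AirPassengers
t <- seq_along(y)

# Least-squares trend: the response goes on the left of the formula.
trend_fit <- lm(as.numeric(y) ~ t)
coef(trend_fit)                  # intercept a and slope b of y = a + b*t

# Centred 12-term moving average via stats::filter(); the half-weights
# on the ends implement the two-step centring for an even window.
w  <- c(0.5, rep(1, 11), 0.5) / 12
ma <- stats::filter(y, w, sides = 2)

# Classical decomposition into trend, seasonal, and remainder parts.
dec <- decompose(y, type = "multiplicative")
round(dec$figure, 3)             # the 12 multiplicative seasonal indices
```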
Exercises: Given a monthly sales series, plot the data and compute a centered 12-month moving average. Fit a linear trend via least squares (lm()) and overlay it. Compute seasonal indices by the ratio-to-trend method. Use decompose(AirPassengers) to decompose the famous dataset and compare to manual calculations.
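The ratio-to-trend calculation from the exercises, done by hand in base R (using a straight-line trend for simplicity):

```r
# Divide each observation by its fitted trend value, then average the
# ratios by calendar month to get seasonal indices.
y     <- as.numeric(AirPassengers)
t     <- seq_along(y)
trend <- fitted(lm(y ~ t))        # straight-line trend via least squares

ratio <- y / trend                # observation / trend
month <- cycle(AirPassengers)     # month index 1..12 for each point

idx <- tapply(ratio, month, mean) # raw seasonal indices
idx <- idx / mean(idx)            # normalise so the indices average to 1
round(idx, 3)                     # summer months come out above 1
```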
Resources: Hyndman & Athanasopoulos, Forecasting: Principles and Practice (ch. 2–3, online textbook) for methodology; R's stats package (decompose(), stats::filter()); R tutorials on time series (e.g. "A Little Book of R for Time Series").
Day 4: Forecasting Techniques
Learning Goals: Apply statistical forecasting methods to time series data. Learn exponential smoothing
and ARIMA modeling using R packages. Evaluate forecast accuracy.
Topics:
- Exponential Smoothing (ETS): Introduce simple, Holt’s, and Holt–Winters methods for smoothing trend
and seasonality. Show how to use forecast::ets() to fit models automatically.
- ARIMA Modeling: Cover ARIMA (auto-regressive integrated moving-average) models for non-seasonal
and seasonal data. Use forecast::auto.arima() to select an ARIMA model. Discuss differencing to
remove trend.
- Using the forecast Package: The forecast package provides "methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling" [9]. Practice forecast() on fitted models to generate future predictions and prediction intervals.
- Model Evaluation: Compute accuracy metrics (MAE, RMSE, MAPE) using accuracy() . Perform train-test
splits on time series (e.g. hold out last year) and back-testing. Discuss forecast diagnostic plots.
- Model Comparison: Compare simple forecasts (naïve, mean, seasonal naïve) vs. fitted models to choose
the best approach.
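Base R's stats package has counterparts that make the ideas concrete without installing forecast: HoltWinters() for exponential smoothing and arima() for a hand-picked model. The seasonal ARIMA orders below are an assumption for illustration, not necessarily what auto.arima() would choose:

```r
# Hold out the last two years of AirPassengers for forecasting.
train <- window(AirPassengers, end = c(1958, 12))

# Holt-Winters with multiplicative seasonality (cf. forecast::ets()).
hw    <- HoltWinters(train, seasonal = "multiplicative")
fc_hw <- predict(hw, n.ahead = 24)     # 24-step-ahead point forecasts

# A seasonal ARIMA(0,1,1)(0,1,1)[12] on the log scale (cf. auto.arima()).
fit   <- arima(log(train), order = c(0, 1, 1),
               seasonal = list(order = c(0, 1, 1), period = 12))
fc_ar <- exp(predict(fit, n.ahead = 24)$pred)
```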
Exercises: Using the forecast package, fit an ETS and an ARIMA model to AirPassengers or another
series. Generate 12-step-ahead forecasts and plot them. Compare errors of both models, and contrast to
the naïve method.
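The error metrics reported by accuracy() are easy to compute by hand, which makes a good check of understanding (the numbers below are toy values):

```r
# MAE, RMSE, and MAPE on a small hold-out set.
actual    <- c(112, 118, 132, 129, 121)
predicted <- c(110, 120, 128, 131, 125)

err  <- actual - predicted
mae  <- mean(abs(err))                 # mean absolute error
rmse <- sqrt(mean(err^2))              # root mean squared error
mape <- mean(abs(err / actual)) * 100  # mean absolute percentage error

c(MAE = mae, RMSE = rmse, MAPE = mape)  # MAE = 2.8 here
```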
Resources: R package forecast documentation [9]; Hyndman's online book; Kaggle/R-bloggers tutorials on forecasting; free videos (e.g. Rob Hyndman's forecasting webinars).
Day 5: Supervised Machine Learning in R
Learning Goals: Learn core supervised ML methods: regression and classification. Practice building and
evaluating models using R packages.
Topics:
- Regression (Continuous Outcome): Use lm() for linear regression. Discuss assessing fit (R², residuals). Introduce regularized regression via glmnet.
- Classification (Categorical Outcome): Train logistic regression (glm(family = binomial)). Explain decision tree models (using rpart) and Random Forests via the randomForest package (Breiman & Cutler's algorithm) [10].
- Support Vector Machines & Naive Bayes: Use the e1071 package for SVM and Naive Bayes (it provides "support vector machines, naive Bayes classifier, ..." functions [11]). Demonstrate classification on datasets like iris.
- Model Training with caret: The caret package streamlines training and tuning for regression/classification ("caret" = Classification And REgression Training) [12]. Learn caret::train() with resampling (cross-validation) to tune parameters (e.g. number of trees and mtry in random forests; cost and gamma in SVM). Show how to specify trainControl() and extract best models.
- Tidymodels (advanced): Briefly mention the tidymodels ecosystem (parsnip, recipes, workflows) for a tidy approach to modeling [13].
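The two base-R building blocks, before any package enters the picture, can be sketched as:

```r
# Linear regression: predict mpg from weight and horsepower.
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit_lm)$r.squared           # proportion of variance explained

# Logistic regression: predict transmission type (am: 0/1) from weight.
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
prob <- predict(fit_glm, type = "response")   # fitted probabilities
pred <- as.integer(prob > 0.5)
mean(pred == mtcars$am)             # in-sample classification accuracy
```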
Exercises: Using the built-in mtcars dataset, build a linear regression to predict mpg and evaluate it. On the iris dataset, train a random forest (with caret or parsnip) to classify species and report accuracy. Use caret to tune an SVM on a binary outcome (e.g. the Sonar data from mlbench).
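The hold-out evaluation pattern that caret::train() automates can be written out by hand; the virginica-vs-rest target here is an illustrative choice:

```r
set.seed(42)
idx      <- sample(nrow(iris), 0.7 * nrow(iris))  # 70/30 split
train_df <- iris[idx, ]
test_df  <- iris[-idx, ]

# Binary target: is the flower virginica?
train_df$y <- as.integer(train_df$Species == "virginica")
test_df$y  <- as.integer(test_df$Species == "virginica")

fit  <- glm(y ~ Petal.Length + Petal.Width, data = train_df,
            family = binomial)
prob <- predict(fit, newdata = test_df, type = "response")
acc  <- mean(as.integer(prob > 0.5) == test_df$y)
acc   # hold-out accuracy on the 30% test set
```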
Resources: caret package vignettes (short introduction) [12]; Tidy Modeling with R online book; CRAN docs for randomForest and e1071; machine learning tutorials (e.g. Andrew Ng's ML course concepts, applied in R).
Day 6: Unsupervised Learning & Clustering
Learning Goals: Explore clustering and dimensionality reduction. Learn to find structure in unlabeled data.
Topics:
- Clustering: Study K-means (partitioning) and hierarchical clustering. The goal is "to find homogeneous subgroups within the data" based on similarity [14]. Use kmeans() to cluster numeric data and hclust() on distance matrices. Determine the number of clusters (elbow method, silhouette).
- Dimensionality Reduction (PCA): Use prcomp() to perform Principal Component Analysis. Explain how
PCA projects high-dimensional data into principal components for visualization.
- Evaluating Clusters: Measure within-cluster SSE and silhouette scores (cluster package), and interpret dendrograms. Visualize cluster assignments on scatter plots or pairs plots.
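A base-R k-means sketch, including the elbow-method computation:

```r
set.seed(1)                          # make the random starts reproducible
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

table(km$cluster, iris$Species)      # clusters vs. true species labels
km$tot.withinss                      # total within-cluster sum of squares

# Elbow method: total within-SS for k = 1..6; look for the bend.
wss <- sapply(1:6, function(k)
  kmeans(iris[, 1:4], centers = k, nstart = 25)$tot.withinss)
# plot(1:6, wss, type = "b")
```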
Figure: Example of k-means clustering on the iris data (Sepal vs. Petal length), illustrating how data points are
grouped into 3 clusters.
Exercises: Perform k-means clustering on the numeric columns of iris. Plot the clusters (e.g. Petal.Length vs. Sepal.Length colored by cluster). Run hierarchical clustering and cut the tree into 3 groups; compare to the k-means labels. Apply PCA on mtcars and plot the first two principal components.
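The hierarchical-clustering and PCA exercises can be sketched in base R:

```r
# Complete-linkage hierarchical clustering on the iris measurements.
d   <- dist(iris[, 1:4])             # Euclidean distance matrix
hc  <- hclust(d, method = "complete")
grp <- cutree(hc, k = 3)             # cut the dendrogram into 3 groups
table(grp, iris$Species)             # compare groups to species labels

# PCA on mtcars; scale. = TRUE standardises the variables first.
pc <- prcomp(mtcars, scale. = TRUE)
summary(pc)$importance[2, 1:2]       # variance explained by PC1 and PC2
# plot(pc$x[, 1], pc$x[, 2])         # first two principal components
```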
Resources: Texts like An Introduction to Statistical Learning (R labs on clustering); R documentation for kmeans, hclust, and silhouette; the Kaggle tutorial Clustering in R; for an unsupervised-learning overview see [14].
Day 7: Capstone Project & Review
Learning Goals: Integrate concepts through a hands-on project. Practice an end-to-end data science
workflow: import/clean data, EDA, analysis, modeling, and presentation. Review all key concepts.
Topics:
- Mini Projects/Case Studies: Apply learned skills to real-world scenarios. Examples:
- Time Series Project: Forecast future sales using ARIMA/ETS on a real dataset (e.g. monthly retail data).
- Supervised ML Project: Build and evaluate a predictive model on an open dataset (e.g. Titanic survival,
house prices, or credit risk).
- Unsupervised Project: Cluster customer or image data and interpret clusters (e.g. market segmentation).
- Cross-cutting Skills: Data acquisition (CSV/Excel import), cleaning (dealing with NAs), and reporting
(creating reproducible reports with R Markdown or Quarto).
- Review & Discussion: Summarize what was learned each day. Discuss best practices (version control,
documentation).
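The cross-cutting cleaning step might look like this in base R (the CSV is written to a temporary file only to keep the sketch self-contained):

```r
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,10", "2,NA", "3,30"), tmp)

df <- read.csv(tmp)                  # "NA" is parsed as missing by default
sum(is.na(df$score))                 # count the missing values

df_complete <- df[complete.cases(df), ]   # option 1: drop incomplete rows
df$score[is.na(df$score)] <-              # option 2: mean imputation
  mean(df$score, na.rm = TRUE)
df$score                             # 10, 20, 30 after imputation
```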
Exercises/Projects: Choose one of the mini-project ideas. For example, forecast AirPassengers using the
forecast package and compare errors of different models; or predict Titanic survival with caret (train a
random forest, compute ROC). Prepare a short report or presentation of your analysis and results.
Resources:
- Datasets: UCI ML repository, Kaggle Datasets (free signup), Rob Hyndman’s Time Series Data Library
(open data).
- Learning Materials: R for Data Science (free online) for data wrangling and viz, Forecasting: Principles and
Practice (free book) for time series, Tidy Modeling with R (free online) for modeling.
- Tutorials & Community: RStudio tutorials, Coursera (audit), YouTube lectures (e.g. “Machine Learning
with R” by freeCodeCamp), and Stack Overflow/RStudio Community.
- Documentation: CRAN package manuals (caret, randomForest, tidymodels, forecast) and the cheat sheets linked above [9] [12].
Each day's materials build on the previous ones, providing hands-on practice (code-alongs, exercises, mini-projects) and free resources for continued learning. All factual content above is supported by authoritative sources on R programming and data analysis [1] [2] [3] [5] [7] [9] [10] [11] [12] [14].
[1] Introduction | Advanced R
https://adv-r.hadley.nz/fp.html
[2] A Grammar of Data Manipulation • dplyr
https://dplyr.tidyverse.org/
[3] Create Elegant Data Visualisations Using the Grammar of Graphics • ggplot2
https://ggplot2.tidyverse.org/
[4] Data visualization – R for Data Science (2e)
https://r4ds.hadley.nz/data-visualize.html
[5], [6] Time series
https://pravin-hub-rgb.github.io/BCA/resources/sem4/comp_num_tbc405/unit5/index.html
[7], [8] Time Series Concepts (Terminologies) | Nikitajain Jain | Medium
https://medium.com/@Niki_Data_n_AI/time-series-concepts-terminologies-73b9cc0b6378
[9] CRAN: Package forecast
https://cran.r-project.org/package=forecast
[10] CRAN: Package randomForest
https://cran.r-project.org/web/packages/randomForest/index.html
[11] CRAN: Package e1071
https://cran.r-project.org/web/packages/e1071/index.html
[12] A Short Introduction to the caret Package
https://cran.r-project.org/web/packages/caret/vignettes/caret.html
[13] tidymodels package - RDocumentation
https://www.rdocumentation.org/packages/tidymodels/versions/1.3.0
[14] Chapter 4 Unsupervised Learning | An Introduction to Machine Learning with R
https://lgatto.github.io/IntroMachineLearningWithR/unsupervised-learning.html