Google data analytics professional course
Week-1
The exciting world of programming
The R-versus-Python debate
Additional Readings
● https://medium.com/analytics-and-data/r-vs-python-a-comprehensive-guide-for-data-professionals-321e8dead598
● https://www.dataquest.io/blog/python-vs-r/
● https://blog.rstudio.com/2019/12/17/r-vs-python-what-s-the-best-for-language-for-data-science/
Programming as a data analyst
From spreadsheets to SQL to R
R Packages
Palmer penguins
● https://allisonhorst.github.io/palmerpenguins/
● To view the data: load the package with library(palmerpenguins), then run View(penguins)
Tidyverse
● https://www.tidyverse.org/
Week-2
Understand basic programming concepts
Programming fundamentals
R is case sensitive (for example, b and B are two different names)
The basic concepts of R
● functions,
● comments,
● variables,
● data types,
● vectors, and
● pipes.
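Print command
- print() - displays a value
- ?print() - opens the help page for print()
Comments
- # - R ignores everything after # on a line
Variables
- a <- "Dhamu"
- b <- 10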
Some commands to know
● typeof(a) - returns the data type of a
● is.integer(a) - checks whether a is an integer (TRUE or FALSE)
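A short sketch tying together print(), comments, variables, typeof(), is.integer() and case sensitivity. The variable names c_int and B are made up for illustration; the expected values in the comments follow from R's default of storing plain numbers as doubles.

```r
# Comments start with # and are ignored when the code runs
a <- "Dhamu"     # a character (string) variable
b <- 10          # a numeric variable; R stores plain numbers as doubles
c_int <- 10L     # the L suffix makes an integer

print(a)
typeof(a)          # "character"
typeof(b)          # "double"
is.integer(b)      # FALSE, because 10 is a double
is.integer(c_int)  # TRUE

# R is case sensitive: B is a different name from b
B <- 20
b == B             # FALSE
```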
Vector
A vector is a group of data elements of the same type stored in a sequence in R.
Example: z <- c(23, 45, 67)
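A small sketch of working with the vector z: its length, indexing (R counts from 1), element-wise arithmetic, and what happens when types are mixed. The vector mixed is a made-up example.

```r
z <- c(23, 45, 67)
length(z)       # 3
z[2]            # 45 (R indexes from 1, not 0)
z * 2           # 46 90 134: arithmetic applies to every element
typeof(z)       # "double"

# A vector holds one type, so mixed values are coerced to a common type
mixed <- c(1, "two", TRUE)
typeof(mixed)   # "character"
```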
Pipe
A pipe is a tool in R for expressing a sequence of multiple operations.
Represented by %>%
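A minimal sketch of the same steps written with nested calls and with %>%. It assumes dplyr is installed (it re-exports %>% from the magrittr package) and uses the built-in mtcars dataset purely as example data.

```r
library(dplyr)  # re-exports the %>% pipe from the magrittr package

# Nested calls read inside-out: filter first, then summarise
nested <- summarise(filter(mtcars, cyl == 4), mean_mpg = mean(mpg))

# The pipe passes each result on as the first argument of the next call,
# so the same steps read top to bottom
piped <- mtcars %>%
  filter(cyl == 4) %>%
  summarise(mean_mpg = mean(mpg))

print(piped)
```

Both versions give the same result; the piped one is easier to extend one step at a time. In R 4.1 and later, the base pipe |> works the same way for simple chains like this.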
Vectors and lists in R
Some commands
● typeof(a)
● is.integer(a)
A list can hold elements of different data types
● list("a", 1L, 1.5, TRUE)
Naming list elements
● list('Chicago' = 1, 'New York' = 2, 'Los Angeles' = 3)
● https://r4ds.had.co.nz/vectors.html#vectors
For more information, refer to the PDF:
● "M7_W2_1_Vectors and lists in R.pdf"
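A short sketch of the two list examples above: checking the type of each element, and reading named elements with $ and [[ ]].

```r
# A list keeps each element's own type
x <- list("a", 1L, 1.5, TRUE)
sapply(x, typeof)      # "character" "integer" "double" "logical"
str(x)                 # compact view of the structure

# Named list: elements can be read back by name
cities <- list('Chicago' = 1, 'New York' = 2, 'Los Angeles' = 3)
cities$Chicago         # 1
cities[["New York"]]   # 2 (use [[ ]] when the name has a space)
names(cities)
```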
Dates and times in R
Install
● install.packages("tidyverse")
Load
● library(tidyverse)
● library(lubridate)
Get the current date and time
● today() - current date
● now() - current date and time
Converting from strings
● ymd("2021-01-20")
● mdy("January 20th, 2021")
● dmy("20-Jan-2021")
● ymd_hms("2021-01-20 20:11:59")
● mdy_hm("01/20/2021 08:01")
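A sketch showing that the parsing functions above turn differently formatted strings into the same date, plus a couple of helpers for date-times. It assumes lubridate is installed and an English LC_TIME locale (needed to read "Jan" and "January").

```r
library(lubridate)

today()                  # current date (class "Date")

# Three formats, one day: the function name gives the order of the parts
d1 <- ymd("2021-01-20")
d2 <- mdy("January 20th, 2021")
d3 <- dmy("20-Jan-2021")
class(d1)                # "Date"
d1 == d2 & d2 == d3      # TRUE

# Date-times are parsed in UTC unless another time zone is given
dt <- ymd_hms("2021-01-20 20:11:59")
class(dt)                # "POSIXct" "POSIXt"
hour(dt)                 # 20

dt2 <- mdy_hm("01/20/2021 08:01")
as_date(dt2)             # drops the time part: 2021-01-20
```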
Other common data structures
● data.frame(x = c(1, 2, 3), y = c(1.5, 5.5, 7.5)) - create a data frame
● dir.create("destination_folder") - create a folder
● file.create("new_text_file.txt") - create a blank file
● file.copy("new_text_file.txt", "destination_folder") - copy a file into a folder
● unlink("some_.file.csv") - delete a file
● matrix(c(3:8), nrow = 2) - create a matrix with 2 rows
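A runnable sketch of the commands above. To avoid leaving files behind, it assumes you are happy to work inside R's temporary directory (tempdir()) and switches back to the original working directory at the end.

```r
# Data frame: columns of equal length, each with its own type
df <- data.frame(x = c(1, 2, 3), y = c(1.5, 5.5, 7.5))
str(df)

# Files and folders, created inside a temporary directory
old_wd <- setwd(tempdir())
dir.create("destination_folder", showWarnings = FALSE)
file.create("new_text_file.txt")
file.copy("new_text_file.txt", "destination_folder")
copied <- file.exists(file.path("destination_folder", "new_text_file.txt"))

unlink("new_text_file.txt")                      # delete the original file
deleted <- !file.exists("new_text_file.txt")
unlink("destination_folder", recursive = TRUE)   # folders need recursive = TRUE
setwd(old_wd)

# Matrix: the values 3 to 8 fill 2 rows column by column
m <- matrix(c(3:8), nrow = 2)
print(m)   # row 1: 3 5 7, row 2: 4 6 8
```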
Explore coding in R
Operators and calculations
Assignment operators
Assignment operators are used to assign values to variables and vectors.
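A minimal sketch of the common assignment forms. <- is the conventional choice in R style guides; = and -> also work at the top level.

```r
x <- 5            # leftward assignment (the usual R style)
y = 7             # = also assigns at the top level
9 -> z            # rightward assignment
v <- c(x, y, z)   # assign a vector built from the three variables
v                 # 5 7 9
```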
Logical operators and conditional statements
Logical operators
● AND (& compares vectors element by element; && works on a single TRUE/FALSE value)
● OR (| compares vectors element by element; || works on a single TRUE/FALSE value)
● NOT (!)
Conditional statements
● if (condition) { ... }
● else { ... }
● else if (condition) { ... }
For more information, refer to the PDF:
● "M7_W2_2_Logical operators and conditional statements.pdf"
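A sketch of the logical operators and an if / else if / else chain. The function describe() is a hypothetical example, not part of any package.

```r
x <- c(3, 8, 12)
x > 5 & x < 10            # element-wise AND: FALSE TRUE FALSE
x < 5 | x > 10            # element-wise OR:  TRUE FALSE TRUE
!(x > 5)                  # NOT:              TRUE FALSE FALSE
length(x) == 3 && x[1] > 0  # && combines single values: TRUE

# Hypothetical helper: classify a single number
describe <- function(n) {
  if (n < 0) {
    "negative"
  } else if (n == 0) {
    "zero"
  } else {
    "positive"
  }
}

describe(-2)  # "negative"
describe(0)   # "zero"
describe(4)   # "positive"
```

Conditions inside if () must be a single TRUE or FALSE, which is why && and || fit there, while & and | suit whole vectors.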
Learning about R packages
Packages in R
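● Tidyverse - a collection of packages; library(tidyverse) loads 8 core packages: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr and forcats (tidyverse 2.0 added lubridate as a ninth)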
Pipe %>%
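Packages are installed once with install.packages() and loaded in every session with library(). The helper below, use_package(), is a hypothetical convenience function (not from any package) that installs a package only when it is missing and then loads it.

```r
# Hypothetical helper: install a package only if it is missing, then load it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)  # pkg is a string, not a bare name
}

# Example with a base package that is always installed
use_package("stats")

# e.g. use_package("tidyverse") would install it on first use, then load it
```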
Available R packages
Choosing the right packages
● https://www.tidyverse.org/
● https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages
● https://cran.r-project.org/web/views/
R resources for more help
● https://www.rstudio.com/
● https://blog.rstudio.com/
● https://blog.rstudio.com/categories/featured/
● https://stackoverflow.blog/
● https://www.r-bloggers.com/2015/12/how-to-learn-r-2/#h.y5b98o9o2h1r
Week-3
Explore data and R
Working with data frames
● install.packages("tidyverse")
● library(ggplot2)
● View(diamonds)
● head(diamonds), glimpse(diamonds) - preview the first rows and each column
● str(diamonds) - shows column names, data types and sample values
● colnames(diamonds) - returns only the column names
● mutate() - adds new columns or changes existing ones in the data frame
head(), glimpse(), str() and colnames() are summary functions.
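A sketch of the data-frame functions above on the diamonds dataset, assuming ggplot2 (which ships diamonds) and dplyr (glimpse(), mutate()) are installed. The new column name carat_2 is just an example.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides glimpse() and mutate()

head(diamonds)       # first six rows
str(diamonds)        # column names, types and a preview of values
colnames(diamonds)   # column names only
glimpse(diamonds)    # like str(), one line per column

# mutate() returns a copy with a new column; diamonds itself is unchanged
diamonds_new <- mutate(diamonds, carat_2 = carat * 100)
```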
More about tibbles
Tibbles
Tibbles are a modern take on data frames and a super useful tool for organizing data in R.
● as_tibble(diamonds)
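A small sketch showing that a tibble is still a data frame, just with extra classes and friendlier printing. The data frame df is made-up example data; it assumes the tibble package is installed.

```r
library(tibble)

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb <- as_tibble(df)

class(df)  # "data.frame"
class(tb)  # "tbl_df" "tbl" "data.frame": a tibble is still a data frame

# Tibbles print only the first rows and the columns that fit the screen
as_tibble(mtcars)
```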
Data-import basics
Importing data and the readxl package
Refer to the PDF "M7_W3_Data-import basics.pdf"
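A self-contained sketch of importing a CSV file with readr and an Excel sheet with readxl. It assumes both packages are installed; the CSV is written to a temporary file first, and the Excel file is one of the example workbooks that readxl ships with.

```r
library(readr)   # read_csv() for text files
library(readxl)  # read_excel() for .xls/.xlsx files

# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, city = c("Chicago", "New York", "Los Angeles")),
          csv_path, row.names = FALSE)
cities <- suppressMessages(read_csv(csv_path))  # hide the column-spec message

# readxl's bundled example workbook: list its sheets, then read one
xlsx_path <- readxl_example("datasets.xlsx")
excel_sheets(xlsx_path)
cars <- read_excel(xlsx_path, sheet = "mtcars")
```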
Cleaning data
Cleaning up with the basics
Install
● install.packages("here") - referencing file paths
● install.packages("skimr") - summarizing data
● install.packages("janitor") - cleaning data
● install.packages("dplyr") - manipulating data
Load
● library("here")
● library("skimr")
● library("janitor")
● library("dplyr")
These are the packages required for data cleaning.
There are a few different functions that we can use to get summaries of our data frame:
● skim_without_charts(),
● glimpse(),
● head(), and
● select().
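A sketch of the summary functions above. It uses the built-in mtcars dataset instead of the course dataset (an assumption, to avoid extra downloads) and assumes skimr and dplyr are installed.

```r
library(skimr)  # skim_without_charts()
library(dplyr)  # glimpse() and select()

skim_without_charts(mtcars)  # per-column summary statistics
glimpse(mtcars)              # column names, types and first values
head(mtcars)                 # first six rows

mpg_hp <- select(mtcars, mpg, hp)  # keep only these columns
no_mpg <- select(mtcars, -mpg)     # keep everything except mpg
```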
Some functions
● rename() - renames a column
● rename_with() - renames columns with a function, e.g. toupper to make names upper case
● clean_names() - makes sure that column names are unique and consistent (only letters, numbers and underscores)
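A sketch of the three renaming functions on a small made-up data frame whose column names contain spaces. It assumes dplyr (rename(), rename_with()) and janitor (clean_names()) are installed.

```r
library(dplyr)
library(janitor)

# Made-up data with messy column names (check.names = FALSE keeps the spaces)
messy <- data.frame(`Bill Length` = c(39.1, 39.5),
                    `Species Name` = c("Adelie", "Adelie"),
                    check.names = FALSE)

renamed <- rename(messy, bill_length = `Bill Length`)  # new_name = old_name
upper   <- rename_with(messy, toupper)                 # apply a function to every name
clean   <- clean_names(messy)                          # snake_case by default

colnames(clean)  # "bill_length" "species_name"
```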
File-naming conventions
Give files short, easily understandable names, using underscores instead of spaces
● https://speakerdeck.com/jennybc/how-to-name-files
● https://libguides.princeton.edu/c.php?g=102546&p=930626#:~:text=File%20naming%20best%20practices%3A&text=File%20names%20should%20be%20short,date%20format%20ISO%208601%3A%20YYYYMMDD
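One way to apply these conventions in code, as a sketch: make_file_name() is a hypothetical helper that puts an ISO 8601 date (YYYYMMDD) first, then a short description in lowercase letters, numbers and underscores.

```r
# Hypothetical helper: "YYYYMMDD_short_description.ext"
make_file_name <- function(date, description, ext) {
  slug <- tolower(description)
  slug <- gsub("[^a-z0-9]+", "_", slug)  # spaces and punctuation become "_"
  slug <- gsub("^_+|_+$", "", slug)      # trim leading/trailing underscores
  paste0(format(date, "%Y%m%d"), "_", slug, ".", ext)
}

make_file_name(as.Date("2021-10-30"), "Sales Report (Q3)", "csv")
# "20211030_sales_report_q3.csv"
```

Putting the date first in YYYYMMDD form means an alphabetical file listing is also a chronological one.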
More on R operators
In R, there are four main types of operators:
1. Arithmetic
2. Relational
3. Logical
4. Assignment
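Arithmetic operators
● + (addition), - (subtraction), * (multiplication), / (division), ^ (exponent), %% (remainder), %/% (integer division)
Relational operators
● < (less than), > (greater than), <= , >= , == (equal to), != (not equal to)
Logical operators
● & (AND), | (OR), ! (NOT), plus && and || for single values
Assignment operators
● <- and = (leftward), -> (rightward), <<- and ->> (assign in an enclosing environment)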
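A quick sketch running one example of each operator type on the numbers 7 and 2, with the result of each line in the comment.

```r
# Arithmetic
7 + 2    # 9
7 - 2    # 5
7 * 2    # 14
7 / 2    # 3.5
7 ^ 2    # 49
7 %% 2   # 1  (remainder)
7 %/% 2  # 3  (integer division)

# Relational: each returns TRUE or FALSE
7 > 2    # TRUE
7 == 2   # FALSE
7 != 2   # TRUE

# Logical
TRUE & FALSE  # FALSE
TRUE | FALSE  # TRUE
!TRUE         # FALSE

# Assignment
result <- 7 %% 2
```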
Organize your data
Some functions to organize the data.
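Organizing data helps turn information into knowledge.
● arrange() - sorts rows
● group_by() - groups rows so that later steps, such as summarize(), run once per group
● filter() - keeps only the rows that meet a condition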
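A sketch of the three functions on the built-in mtcars dataset (chosen here only as example data), assuming dplyr is installed.

```r
library(dplyr)

by_mpg      <- arrange(mtcars, mpg)        # ascending order
by_mpg_desc <- arrange(mtcars, desc(mpg))  # descending order
efficient   <- filter(mtcars, mpg > 25)    # rows meeting a condition

# group_by() + summarise(): one summary row per group
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n())
mpg_by_cyl
```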
Transforming data
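Some functions to transform data:
● separate() - splits one column into several
● unite() - combines several columns into one
● mutate() - adds new columns or changes existing ones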
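A sketch of separate(), unite() and mutate() on a small made-up employee table, assuming tidyr and dplyr are installed.

```r
library(tidyr)  # separate() and unite()
library(dplyr)  # mutate()

# Made-up example data
employees <- data.frame(
  id = 1:3,
  name = c("John Mendes", "Rob Stewart", "Rachel Abrahamson"),
  salary = c(50000, 60000, 55000)
)

# separate(): split one column into several at the separator
split_names <- separate(employees, name,
                        into = c("first_name", "last_name"), sep = " ")

# unite(): the reverse, pasting columns back together
rejoined <- unite(split_names, "name", first_name, last_name, sep = " ")

# mutate(): add a derived column
with_monthly <- mutate(employees, monthly_salary = salary / 12)
```

In tidyr 1.3.0 and later, separate() still works but is superseded by separate_wider_delim().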
Wide to long with tidyr
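A minimal sketch of reshaping with tidyr: pivot_longer() turns year columns into rows (wide to long), and pivot_wider() reverses it. The countries "A" and "B" and their values are made up.

```r
library(tidyr)

# Made-up wide data: one column per year
wide <- data.frame(country = c("A", "B"),
                   `2020` = c(10, 20),
                   `2021` = c(11, 21),
                   check.names = FALSE)

# Wide to long: one row per country-year pair
long <- pivot_longer(wide, cols = c(`2020`, `2021`),
                     names_to = "year", values_to = "value")

# Long back to wide
back <- pivot_wider(long, names_from = year, values_from = value)
```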
Additional resources
● https://tidyr.tidyverse.org/articles/pivot.html
● https://www.tidyverse.org/
● https://rladiessydney.org/courses/ryouwithme/02-cleanitup-5/
● https://scc.ms.unimelb.edu.au/resources-list/simple-r-scripts-for-analysis/r-scripts
Take a closer look at the data
Same data, different outcome
Anscombe's quartet has four datasets that have nearly identical summary statistics but look very different when plotted.
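Base R ships Anscombe's quartet as the anscombe dataset (columns x1 to x4 and y1 to y4). This sketch computes the summary statistics for each set and then plots all four, so the matching numbers and the different shapes can be compared.

```r
# Summary statistics for each of the four x/y pairs
summary_stats <- data.frame(
  set    = 1:4,
  mean_x = sapply(1:4, function(i) mean(anscombe[[paste0("x", i)]])),
  mean_y = sapply(1:4, function(i) mean(anscombe[[paste0("y", i)]])),
  sd_y   = sapply(1:4, function(i) sd(anscombe[[paste0("y", i)]])),
  cor_xy = sapply(1:4, function(i) cor(anscombe[[paste0("x", i)]],
                                       anscombe[[paste0("y", i)]]))
)
print(round(summary_stats, 2))

# The numbers match, but the plots do not
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i), main = paste("Set", i))
}
par(op)
```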
The bias function
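bias() - from the SimDesign package; it measures the average difference between estimated values and the true values, so a result far from zero suggests bias.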
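The idea behind bias() can be sketched in base R. average_bias() below is a hypothetical helper, not SimDesign's bias() (check its documentation for the exact argument order and options); it returns the average signed difference between predicted and actual values. The numbers are made up.

```r
# Hypothetical helper: average signed difference, predicted minus actual
average_bias <- function(predicted, actual) {
  mean(predicted - actual)
}

actual    <- c(68, 70, 72, 71)  # made-up observations
predicted <- c(69, 71, 72, 73)  # made-up forecasts

average_bias(predicted, actual)  # 1: forecasts run 1 unit high on average
```

A value near 0 suggests little systematic bias; the sign shows whether predictions tend to be too high or too low.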
Working with biased data
● https://www.rdocumentation.org/packages/SimDesign/versions/2.2/topics/bias
● https://datasciencebox.org/ethics.html
Week-4
Create data visualizations in R
Visualization basics in R and tidyverse
Some core concepts in ggplot2:
● aesthetics,
● geoms,
● facets,
● labels and annotations.
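A minimal ggplot2 sketch showing three of these pieces together: the data and aesthetic mapping in ggplot(), a geom that draws points, and labels from labs(). It uses the built-in mtcars dataset as example data.

```r
library(ggplot2)

p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +  # data + aesthetics
  geom_point() +                                             # geom: draw points
  labs(title = "Fuel economy vs. weight",                    # labels
       x = "Weight (1000 lbs)", y = "Miles per gallon")
print(p)
```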
Facets
Facets let you display smaller groups or subsets of your data.
With facets, you can create a separate plot for each value of a variable in your dataset.
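A sketch of the two faceting functions on the built-in mtcars dataset: facet_wrap() makes one panel per value of one variable, and facet_grid() makes a grid from two variables (here am, the transmission type, by cyl).

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# One panel per number of cylinders (3 panels)
p_wrap <- base + facet_wrap(~cyl)

# Rows by transmission (am), columns by cylinders (2 x 3 = 6 panels)
p_grid <- base + facet_grid(am ~ cyl)
```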
Common problems when visualizing in R
● Check the PDF "M7_W4_Common problems when visualizing in R.pdf"
Getting started with ggplot()
● ggplot() in R
Explore aesthetics in analysis
Aesthetic attributes
Three common aesthetic attributes in ggplot2 are:
● Color: this allows you to change the color of all of the points on your
plot, or the color of each data group
● Size: this allows you to change the size of the points on your plot by
data group
● Shape: this allows you to change the shape of the points on your plot
by data group
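A sketch of the difference between mapping an attribute to a data group (inside aes()) and setting it for every point (outside aes()). It uses mtcars as example data; factor() makes cyl and am behave as discrete groups rather than numbers.

```r
library(ggplot2)

# Mapped inside aes(): color, shape and size change with the data
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl), shape = factor(am), size = hp))

# Set outside aes(): one fixed color for all points
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "purple")
```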
Additional resources
● https://ggplot2.tidyverse.org/
● http://statseducation.com/Introduction-to-R/modules/graphics/aesthetics/
● https://www.rdocumentation.org/packages/ggplot2/versions/3.3.3/topics/aes
Smoothing
Smoothing enables the detection of a data trend even when you can't easily
notice a trend from the plotted data points.
Two types of smoothing
Loess smoothing
The loess smoothing process is best for smoothing plots with fewer than 1,000 points.
Gam smoothing
Gam smoothing, or generalized additive model smoothing, is useful for
smoothing plots with a large number of points.
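A sketch of both smoothing methods with geom_smooth(): loess on the 32-row mtcars dataset, and gam on 2,000 simulated points (the simulated data is made up for illustration). gam needs the mgcv package, which comes with standard R installations. When method is left unset, ggplot2 documents that it picks loess below 1,000 observations and gam otherwise.

```r
library(ggplot2)
library(mgcv)  # provides gam() and s()

# Small data (32 points): loess
p_loess <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Larger simulated data (2,000 points): gam
set.seed(42)
big <- data.frame(x = 1:2000)
big$y <- sin(big$x / 300) + rnorm(2000, sd = 0.3)

p_gam <- ggplot(big, aes(x = x, y = y)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "gam", formula = y ~ s(x))
```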
Filtering and plots
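A sketch of filtering before plotting: filter() keeps the rows of interest, and the result is passed to ggplot(). It uses mtcars as example data and assumes dplyr and ggplot2 are installed.

```r
library(dplyr)
library(ggplot2)

# Keep only 4-cylinder cars, then plot them
four_cyl <- mtcars %>% filter(cyl == 4)

p <- ggplot(four_cyl, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "4-cylinder cars only")

# The same thing in one pipeline (note: + inside ggplot, %>% before it)
p2 <- mtcars %>%
  filter(cyl == 4) %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point()
```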
Annotate and save visualizations
Drawing arrows and shapes in R
● https://ggplot2.tidyverse.org/reference/annotate.html
● https://www.r-graph-gallery.com/233-add-annotations-on-ggplot2-chart.html
● https://ggplot2-book.org/annotations.html
● https://www.r-bloggers.com/2017/02/how-to-annotate-a-plot-in-ggplot2/
● https://viz-ggplot2.rsquaredacademy.com/ggplot2-text-annotations.html
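A sketch of annotate() adding a text label, an arrow and a shaded rectangle to a scatter plot. The coordinates and label text are made up for mtcars; positions are given in data units. arrow() and unit() are available after loading ggplot2.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # text label at data coordinates
  annotate("text", x = 4.5, y = 30, label = "Heavier cars, lower mpg") +
  # arrow from the label toward the heaviest cars
  annotate("segment", x = 4.5, y = 29, xend = 5.2, yend = 16,
           arrow = arrow(length = unit(0.3, "cm"))) +
  # shaded rectangle around the light, efficient cars
  annotate("rect", xmin = 1.5, xmax = 2.5, ymin = 25, ymax = 35,
           alpha = 0.2, fill = "blue")
```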
Saving images without ggsave()
● https://ggplot2.tidyverse.org/reference/ggsave.html#saving-images-without-ggsave-
● https://www.tidyverse.org/
● https://www.datanovia.com/en/blog/how-to-save-a-ggplot/
● https://www.datamentor.io/r-programming/saving-plot/
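A sketch of two ways to save a plot: ggsave(), and the "without ggsave()" approach of opening a graphics device, printing the plot and closing the device. PDF output to temporary files is used here as an assumption, because the pdf() device is available on every R installation; png() and jpeg() follow the same open, print, close pattern.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# 1. ggsave(): width and height are in inches by default
out1 <- tempfile(fileext = ".pdf")
ggsave(out1, plot = p, width = 6, height = 4)

# 2. Without ggsave(): open a device, print the plot, close the device
out2 <- tempfile(fileext = ".pdf")
pdf(out2, width = 6, height = 4)
print(p)          # ggplot objects must be printed explicitly inside a device
invisible(dev.off())
```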
Week-5
Develop documentation and reports
R Markdown resources
R Markdown documentation
● https://rmarkdown.rstudio.com/lesson-1.html
R Markdown reference materials
● https://rmarkdown.rstudio.com/lesson-15.html
● https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf?_ga=2.49295910.1034302809.1602760608-739985330.1601281773
R for Data Science book
● https://r4ds.had.co.nz/communicate-intro.html
R Markdown: The Definitive Guide
● https://bookdown.org/yihui/rmarkdown/
● https://bookdown.org/yihui/rmarkdown/installation.html
● https://bookdown.org/yihui/rmarkdown/documents.html
● https://bookdown.org/yihui/rmarkdown/dashboards.html
● https://bookdown.org/yihui/rmarkdown/parameterized-reports.html
Optional: Jupyter notebooks
● https://colab.research.google.com/notebooks/intro.ipynb
● https://www.kaggle.com/docs/notebooks
● https://jupyter.org/
● https://realpython.com/jupyter-notebook-introduction/
To learn about basic formatting in Jupyter notebooks
● https://jupyter-notebook.readthedocs.io/en/stable/notebook.html
● https://gtribello.github.io/mathNET/assets/notebook-writing.html
● https://medium.com/analytics-vidhya/the-jupyter-notebook-formatting-guide-873ab39f765e
Understand code chunks and exports
Adding code chunks to R Markdown notebooks
Output formats in R Markdown
● Refer to the PDF "M7_W5_Output formats in R Markdown.pdf"
Exporting your R Markdown notebook
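A sketch, written in R, that creates a minimal R Markdown file (YAML header, some text and one code chunk) and then knits it to HTML. The file name, title and chunk label are made up. Rendering needs the rmarkdown package and pandoc (bundled with RStudio), so that step is skipped when either is missing; pdf_document output additionally needs a LaTeX installation.

```r
# Build a minimal R Markdown file: YAML header, text, and one code chunk
rmd_lines <- c(
  "---",
  "title: \"Example report\"",
  "output: html_document",  # could also be pdf_document or word_document
  "---",
  "",
  "Some text explaining the analysis.",
  "",
  "```{r mpg-summary}",
  "summary(mtcars$mpg)",
  "```"
)
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# Knit (export) the file, but only if rmarkdown and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(html_path)
}
```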
Using R Markdown templates
Quick Review
Week-1
● Introduction to R programming language
Week-2
● Basic concepts
● R Packages
Week-3
● Data frame
● Cleaning data
● Checking for bias
Week-4
● ggplot()
● Save plotted images
Week-5
● Jupyter notebook
● R Markdown notebook
Dhamodharan
30/10/2021