Introduction to R Programming:
Understanding the Basics of R Programming Language
● Definition and Purpose: R is a powerful language and environment for statistical
computing and graphics. It is designed for data analysis and visualization.
● History: Developed in the early 1990s by Ross Ihaka and Robert Gentleman at
the University of Auckland, New Zealand.
● Key Features:
○ Open-source and free.
○ Comprehensive statistical analysis and graphical capabilities.
○ Extensive community support and a large number of packages for
various applications.
Installing R and RStudio
● R Installation:
○ R can be downloaded from the Comprehensive R Archive Network
(CRAN) website.
○ Installation involves selecting the appropriate version for your
operating system (Windows, macOS, Linux) and following the
installation instructions.
● RStudio Installation:
○ RStudio is an integrated development environment (IDE) for R,
providing a more user-friendly interface.
○ Download the RStudio installer from the RStudio website and install it
after installing R.
○ RStudio includes a console, syntax-highlighting editor, and tools for
plotting, history, and workspace management.
Basic Operations
● Arithmetic Operations: Basic mathematical computations can be performed
using operators like + (addition), - (subtraction), * (multiplication), / (division),
and ^ (exponentiation).
● Relational Operations: Comparisons between values using == (equal to), != (not
equal to), > (greater than), < (less than), >= (greater than or equal to), and <=
(less than or equal to).
● Logical Operations: Logical comparisons using & (AND), | (OR), and ! (NOT).
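The three operator families above can be tried directly at the R console. The following is a minimal sketch; the variable names and values are chosen only for illustration.

```r
# Illustrative values (not from the original notes)
a <- 7
b <- 2

# Arithmetic operators
sum_ab  <- a + b   # 9
diff_ab <- a - b   # 5
prod_ab <- a * b   # 14
quot_ab <- a / b   # 3.5
pow_ab  <- a ^ b   # 49

# Relational operators return logical values
is_equal   <- a == b   # FALSE
is_greater <- a > b    # TRUE

# Logical operators combine or negate logical values
both   <- (a > 5) & (b > 5)   # FALSE
either <- (a > 5) | (b > 5)   # TRUE
negate <- !(a > 5)            # FALSE

print(c(sum_ab, diff_ab, prod_ab, quot_ab, pow_ab))
print(c(is_equal, is_greater, both, either, negate))
```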
Data Types and Variables in R
● Data Types:
○ Numeric: Real numbers, stored as double-precision floating-point values
by default (e.g., 3.14; a bare 5 is also numeric, not integer).
○ Integer: Whole numbers, declared by appending an L to the number
(e.g., 5L).
○ Character: Text or string data, enclosed in quotes (e.g., "Hello").
○ Logical: Boolean values, either TRUE or FALSE.
○ Factor: Categorical data used for representing variables with a fixed
number of unique values.
● Variables:
○ Variables store data values and are assigned using <- or =.
○ Naming conventions: Start with a letter (or a period not followed by a
digit); can contain letters, numbers, periods, and underscores, but not
spaces or other special characters.
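One way to check which data type a value has is the class() function. The sketch below assigns one value of each type listed above; the variable names are illustrative only.

```r
# One value of each basic type, assigned with <-
x_num <- 3.14
x_int <- 5L
x_chr <- "Hello"
x_lgl <- TRUE
x_fct <- factor(c("low", "high", "low"))

class(x_num)   # "numeric"
class(x_int)   # "integer"
class(x_chr)   # "character"
class(x_lgl)   # "logical"
class(x_fct)   # "factor"

# Factor levels are the unique categories, sorted alphabetically by default
levels(x_fct)  # "high" "low"

# Without the L suffix, a whole number is still stored as numeric
class(5)       # "numeric"
```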
Exploring Data Structures
● Vectors: One-dimensional, homogeneous structures that hold numeric, character, or logical data (one type per vector).
○ Created using c() function (e.g., c(1, 2, 3)).
● Matrices: Two-dimensional, homogeneous data structures with rows and
columns.
○ Created using matrix() function (e.g., matrix(1:6, nrow=2, ncol=3)).
● Arrays: Multi-dimensional, homogeneous data structures.
○ Created using array() function (e.g., array(1:8, dim=c(2, 2, 2))).
● Lists: Ordered collections that can hold different types of elements.
○ Created using list() function (e.g., list(name="John", age=25,
scores=c(90, 85, 92))).
● Data Frames: Two-dimensional, heterogeneous data structures similar to tables
in a database.
○ Created using data.frame() function (e.g., data.frame(name=c("Alice",
"Bob"), age=c(25, 30))).
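The constructors above can be combined with indexing to see how each structure is laid out. This is a small sketch that reuses the example values from the list; the object names are illustrative.

```r
v      <- c(1, 2, 3)
m      <- matrix(1:6, nrow = 2, ncol = 3)
arr    <- array(1:8, dim = c(2, 2, 2))
person <- list(name = "John", age = 25, scores = c(90, 85, 92))
df     <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

length(v)                  # 3
dim(m)                     # 2 3
m[2, 3]                    # 6, because matrices are filled column by column
arr[1, 2, 2]               # 7
person$scores[1]           # 90, list elements are accessed by name with $
df$age[df$name == "Bob"]   # 30, columns of a data frame are vectors

# str() prints a compact summary of any object's structure
str(df)
```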
Control Structures
● If-Else Statements:
Allows conditional execution of code based on whether a condition is true
or false.
Syntax:
if (condition) {
  # code to execute if condition is true
} else {
  # code to execute if condition is false
}
● Loops:
For Loop: Iterates over a sequence of values.
for (i in 1:5) {
print(i)
}
While Loop: Repeats code while a condition is true.
while (condition) {
  # code to execute
}
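To make the control structures concrete, here is a minimal runnable sketch. The score threshold and the doubling example are hypothetical, chosen only to give each construct something to do.

```r
# If-else: choose a label based on a condition
score <- 72  # hypothetical value
if (score >= 50) {
  result <- "pass"
} else {
  result <- "fail"
}
print(result)  # "pass"

# For loop: add up the values 1 through 5
total <- 0
for (i in 1:5) {
  total <- total + i
}
print(total)  # 15

# While loop: count how many doublings it takes to exceed 100
value <- 1
steps <- 0
while (value <= 100) {
  value <- value * 2
  steps <- steps + 1
}
print(steps)  # 7
```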
Functions and Their Usage in R
● Defining Functions: Functions encapsulate code for reuse and modularity.
Syntax:
function_name <- function(arg1, arg2) {
  # function body
  return(result)
}
Example:
add <- function(x, y) {
  return(x + y)
}
add(3, 5) # Returns 8
Importing Data into R from Various Sources
● CSV Files:
○ Imported using read.csv() function.
○ Example: data <- read.csv("path/to/file.csv").
● Excel Files:
○ Imported using the read_excel() function from the readxl package (load it first with library(readxl)).
○ Example: data <- read_excel("path/to/file.xlsx").
● Databases:
○ Connected using DBI and RSQLite packages.
○ Example:
library(DBI)
conn <- dbConnect(RSQLite::SQLite(), "path/to/database.sqlite")
data <- dbGetQuery(conn, "SELECT * FROM table_name")
dbDisconnect(conn)
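The import calls above depend on files that exist on your machine. To try read.csv() without any external file, the following sketch first writes a small, made-up data frame to a temporary CSV file and then reads it back. It uses base R only; the data and object names are illustrative.

```r
# Hypothetical sample data, written to a temporary file
sample_df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
csv_path  <- tempfile(fileext = ".csv")
write.csv(sample_df, csv_path, row.names = FALSE)

# Read the file back in, as you would with any CSV on disk
imported <- read.csv(csv_path)

str(imported)     # structure of the imported data frame
nrow(imported)    # 2
names(imported)   # "name" "age"

# Remove the temporary file
unlink(csv_path)
```

After any import, str() or head() is a quick way to confirm that column names and types came through as expected.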
Data Manipulation Using dplyr and tidyr Packages
● dplyr Package:
○ Provides a set of functions designed to simplify data manipulation.
○ Key functions:
■ filter(): Subset rows based on conditions.
■ select(): Select columns by name.
■ mutate(): Create new variables.
■ summarise(): Summarize data with aggregate functions.
■ arrange(): Sort data by specified variables.
● tidyr Package:
○ Designed for reshaping data.
○ Key functions:
■ gather(): Converts wide data to long format (superseded by pivot_longer() in current tidyr).
■ spread(): Converts long data to wide format (superseded by pivot_wider() in current tidyr).
■ separate(): Splits a column into multiple columns.
■ unite(): Combines multiple columns into one.
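The dplyr verbs are usually chained with the pipe operator %>%. The sketch below assumes the dplyr and tidyr packages are installed and uses a small, made-up table of student scores. For reshaping it uses pivot_longer(), the current tidyr replacement for gather().

```r
library(dplyr)
library(tidyr)

# Hypothetical data for illustration
scores <- data.frame(
  student = c("Alice", "Bob", "Cara"),
  math    = c(90, 65, 78),
  art     = c(70, 88, 95)
)

# mutate, filter, select, and arrange chained together
result <- scores %>%
  mutate(average = (math + art) / 2) %>%   # new column
  filter(average >= 80) %>%                # keep rows meeting the condition
  select(student, average) %>%             # keep two columns
  arrange(desc(average))                   # sort, highest first
print(result)  # Cara (86.5), then Alice (80)

# summarise collapses the table to one row of aggregates
overall <- scores %>%
  summarise(mean_math = mean(math), mean_art = mean(art))
print(overall)

# Reshape from wide (one column per subject) to long (one row per score)
long_scores <- pivot_longer(
  scores,
  cols      = c(math, art),
  names_to  = "subject",
  values_to = "score"
)
print(long_scores)  # 6 rows: 3 students x 2 subjects
```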
Data Visualization with Base R Graphics and ggplot2
● Base R Graphics:
○ Provides basic plotting functions for data visualization.
○ Common functions:
■ plot(): General plotting function.
■ hist(): Creates histograms.
■ boxplot(): Creates box plots.
● ggplot2:
○ A powerful and flexible package for creating advanced visualizations.
○ Based on the Grammar of Graphics.
○ Common functions:
■ ggplot(): Initializes a ggplot object.
■ geom_point(): Creates scatter plots.
■ geom_bar(): Creates bar charts.
■ geom_line(): Creates line plots.
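Both plotting systems can be compared on the same data. This sketch uses mtcars, a dataset that ships with R, and assumes the ggplot2 package is installed. The titles and axis labels are illustrative choices.

```r
# Base R graphics: each call draws a complete plot
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight")
hist(mtcars$mpg, xlab = "Miles per gallon", main = "Distribution of mpg")
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# ggplot2: build a plot object by adding layers with +
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       title = "Fuel economy vs. weight")
print(p)

# A bar chart counting cars per cylinder group
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(x = "Cylinders", y = "Number of cars")
print(p_bar)
```

Because a ggplot is an ordinary R object, it can be stored in a variable, extended with further layers, and printed later, whereas base graphics draw immediately on the active device.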