DATA ANALYTICS USING R LAB MANUAL
DATA ANALYTICS USING R LAB
MASTER MANUAL
[AI507PC]
III B.TECH – I SEMESTER
ACADEMIC YEAR : 2024-2025
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING (AI & ML)
CMR ENGINEERING COLLEGE
(Approved by AICTE- New Delhi, Affiliated to JNTUH)
Kandlakoya(V), Medchal Road, Hyderabad
CSE(AI & ML) Department Vision & Mission
Vision:
To produce admirable and competent graduates and experts in Artificial Intelligence and
Machine Learning through quality technical education, innovation and research, so as to
improve the lifestyle of society.
Mission:
M1: To impart value-based technical education in AI & ML through innovative
teaching and learning methods.
M2: To produce outstanding professionals by imparting quality training, hands-on
experience and value-based education.
M3: To produce competent graduates suitable for industries and organizations at the global
level, including research and development, with social responsibility.
CSE(AI & ML) Program Outcomes [POs]:
Engineering Graduates will be able to satisfy these NBA graduate attributes:
1. Engineering knowledge: An ability to apply knowledge of computing,
mathematics, science and engineering fundamentals appropriate to the discipline.
2. Problem analysis: An ability to analyze a problem, and identify and formulate the
computing requirements appropriate to its solution.
3. Design/development of solutions: An ability to design, implement, and evaluate a
computer-based system, process, component, or program to meet desired needs
with appropriate consideration for public health and safety, cultural, societal and
environmental considerations.
4. Conduct investigations of complex problems: An ability to design and conduct
experiments, as well as to analyze and interpret data.
5. Modern tool usage: An ability to use current techniques, skills, and modern tools
necessary for computing practice.
6. The engineer and society: An ability to analyze the local and global impact of
computing on individuals, organizations, and society.
7. Environment and sustainability: Knowledge of contemporary issues.
8. Ethics: An understanding of professional, ethical, legal, security and social issues
and responsibilities.
9. Individual and team work: An ability to function effectively individually and on
teams, including diverse and multidisciplinary, to accomplish a common goal.
10. Communication: An ability to communicate effectively with a range of audiences.
11. Project management and finance: An understanding of engineering and
management principles and an ability to apply these to one’s own work, as a member and leader
in a team, to manage projects.
12. Life-long learning: Recognition of the need for, and an ability to
engage in, continuing professional development.
CSE(AI & ML) Program Educational Outcomes [PEOs]
1. To provide an intellectual environment in which to successfully pursue higher education in the
area of AI.
2. To impart knowledge of cutting-edge Artificial Intelligence technologies on par with
industrial standards.
3. To create an atmosphere for exploring research areas and producing outstanding
contributions in various areas of Artificial Intelligence and Machine Learning.
CSE(AI & ML) Program Specific Outcomes [PSOs]
1. Ability to use knowledge of emerging technologies to identify research gaps and
provide solutions with innovative ideas.
2. Ability to analyze a problem and provide an optimal solution using fundamental
knowledge and skills in professional and engineering sciences.
LAB CODE
1. Students should report to the concerned lab as per the timetable.
2. Students who turn up late to the lab will in no case be permitted to do the
program scheduled for the day.
3. After completion of the program, certification by the concerned staff in-charge
in the observation book is necessary.
4. Students should bring a notebook of 100 pages and should enter the
readings/observations into the notebook while performing the experiment.
5. The record of observations, along with the detailed experimental procedure of
the experiment performed in the immediately preceding session, should be submitted
and certified by the staff member in-charge.
6. The group-wise division made at the beginning should be adhered to, and no
mixing of students among different groups will be permitted.
7. When the experiment is completed, students should disconnect the setup made by
them and return all the components/instruments taken for the purpose.
8. Any damage to the equipment or burnt-out components will be viewed
seriously, either by imposing a penalty or by dismissing the whole group of students
from the lab for the semester/year.
9. Students should be present in the lab for the total scheduled duration.
10. Students are required to prepare thoroughly to perform the experiment before
coming to the laboratory.
INDEX
S.No.  List of Experiments
1      Data Preprocessing
       a. Handling missing values
       b. Noise detection and removal
       c. Identifying data redundancy and elimination
2      Implement any one imputation model
3      Implement Linear Regression
4      Implement Logistic Regression
5      Implement Decision Tree Induction for classification
6      Implement Random Forest Classifier
7      Implement ARIMA on Time Series data
8      Object segmentation using hierarchical based methods
9      Perform Visualization techniques (types of maps - Bar, Column, Line, Scatter, 3D Cubes etc.)
10     Perform Descriptive analytics on healthcare data
11     Perform Predictive analytics on Product Sales data
12     Apply Predictive analytics for Weather forecasting
Program No. : 1
Date:
Problem Statement:
Data Preprocessing
a. Handling missing values
b. Noise detection and removal
c. Identifying data redundancy and elimination
Source Code:
A. Handling missing values
# Sample data with missing values
data <- data.frame(
A = c(1, 2, NA, 4, 5),
B = c(NA, 2, 3, NA, 5),
C = c(1, 2, 3, 4, NA)
)
# Display original data
cat("Original Data:\n")
print(data)
# Method 1: Remove rows with missing values
cleaned_data <- na.omit(data)
cat("\nData after removing rows with missing values:\n")
print(cleaned_data)
# Method 2: Imputation (Replace missing values with mean)
mean_imputation <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  return(x)
}
data_mean_imputed <- as.data.frame(lapply(data, mean_imputation))
cat("\nData after mean imputation:\n")
print(data_mean_imputed)
# Method 3: Imputation (Replace missing values with median)
median_imputation <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  return(x)
}
data_median_imputed <- as.data.frame(lapply(data, median_imputation))
cat("\nData after median imputation:\n")
print(data_median_imputed)
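# Method 4: Imputation using mice package (Multiple Imputation by Chained Equations)
# Imputed values are drawn at random, so they vary between runs unless the
# seed argument of mice() is set
library(mice)
imputed_data <- mice(data, printFlag = FALSE) # printFlag = FALSE hides the iteration log
imputed_data <- complete(imputed_data)
cat("\nData after imputation using mice package:\n")
print(imputed_data)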
Output:
Original Data:
   A  B  C
1  1 NA  1
2  2  2  2
3 NA  3  3
4  4 NA  4
5  5  5 NA
Data after removing rows with missing values:
  A B C
2 2 2 2
Data after mean imputation:
  A        B   C
1 1 3.333333 1.0
2 2 2.000000 2.0
3 3 3.000000 3.0
4 4 3.333333 4.0
5 5 5.000000 2.5
Data after median imputation:
  A B   C
1 1 3 1.0
2 2 2 2.0
3 3 3 3.0
4 4 3 4.0
5 5 5 2.5
Data after imputation using mice package:
  A B C
1 1 3 1
2 2 2 2
3 3 3 3
4 4 3 4
5 5 5 2
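Before choosing among the methods above, it helps to know how many values are missing in each column. The following is a minimal sketch using only base R on the same sample data; the variable names missing_per_column and complete_rows are illustrative and are not part of the original program.

```r
# Same sample data as in the program above
data <- data.frame(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, NA, 5),
  C = c(1, 2, 3, 4, NA)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(data))
print(missing_per_column)

# Number of rows with no missing value (the rows that na.omit() keeps)
complete_rows <- sum(complete.cases(data))
cat("Complete rows:", complete_rows, "\n")
```

A column with many missing values is usually a poor candidate for row deletion, since na.omit() discards every row in which any column is missing.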
B. Noise detection and removal
# Sample data with noise
data <- c(1, 2, 3, 100, 5, 6, 7, 200, 9, 10)
# Display original data
cat("Original Data:\n")
print(data)
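# Method 1: Robust z-score method for outlier detection and removal
# The classical z-score, (x - mean(x)) / sd(x), cannot flag anything here:
# 100 and 200 inflate the mean and sd, and with n = 10 no |z| can exceed
# (n - 1) / sqrt(n), about 2.85. The median and MAD are used instead.
z_score_remove_outliers <- function(x, threshold = 3) {
  z <- abs((x - median(x)) / mad(x))
  outliers <- which(z > threshold)
  x[outliers] <- NA
  return(x)
}
# Apply robust z-score method
data_without_outliers <- z_score_remove_outliers(data)
cat("\nData after removing outliers using z-score method:\n")
print(data_without_outliers)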
Output:
Original Data:
[1] 1 2 3 100 5 6 7 200 9 10
Data after removing outliers using z-score method:
[1] 1 2 3 NA 5 6 7 NA 9 10
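An alternative to a z-score rule is the interquartile-range (IQR) rule, which flags values lying more than 1.5 IQR beyond the quartiles. The sketch below applies it to the same sample vector with base R only; the names lower_fence, upper_fence and data_iqr_cleaned are illustrative, and the multiplier 1.5 is the conventional choice rather than something the manual prescribes.

```r
# Same sample data as in the program above
data <- c(1, 2, 3, 100, 5, 6, 7, 200, 9, 10)

# Quartiles and interquartile range
q1 <- quantile(data, 0.25, names = FALSE)
q3 <- quantile(data, 0.75, names = FALSE)
iqr <- q3 - q1

# Tukey fences: values beyond 1.5 * IQR from the quartiles are flagged
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
is_outlier <- data < lower_fence | data > upper_fence

# Replace flagged values with NA, as in the z-score program
data_iqr_cleaned <- ifelse(is_outlier, NA, data)
print(data_iqr_cleaned)
```

Because quartiles are barely affected by a few extreme values, this rule stays usable on small samples where the extremes themselves distort the mean and standard deviation.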
C. Identifying data redundancy and elimination
# Sample data with redundancy
data <- data.frame(
ID = c(1, 2, 3, 4, 5),
Name = c("John", "Alice", "Bob", "John", "Alice"),
Age = c(25, 30, 35, 25, 30),
Gender = c("Male", "Female", "Male", "Male", "Female")
)
# Display original data
cat("Original Data:\n")
print(data)
# Columns that define a duplicate record. ID is excluded because it is unique
# for every row, so comparing whole rows would never find a duplicate.
key_cols <- c("Name", "Age", "Gender")
# Method 1: Identifying redundant rows (repeats of an earlier record)
find_redundant_rows <- function(df, cols) {
  redundant_rows <- df[duplicated(df[, cols]), ]
  return(redundant_rows)
}
redundant_rows <- find_redundant_rows(data, key_cols)
cat("\nRedundant Rows:\n")
print(redundant_rows)
# Method 2: Eliminating redundant rows (keep the first occurrence of each record)
eliminate_redundancy <- function(df, cols) {
  unique_data <- df[!duplicated(df[, cols]), ]
  return(unique_data)
}
cleaned_data <- eliminate_redundancy(data, key_cols)
cat("\nData after eliminating redundancy:\n")
print(cleaned_data)
Output:
Original Data:
ID Name Age Gender
1 1 John 25 Male
2 2 Alice 30 Female
3 3 Bob 35 Male
4 4 John 25 Male
5 5 Alice 30 Female
Redundant Rows:
ID Name Age Gender
4 4 John 25 Male
5 5 Alice 30 Female
Data after eliminating redundancy:
ID Name Age Gender
1 1 John 25 Male
2 2 Alice 30 Female
3 3 Bob 35 Male
Program No. : 2
Date:
Problem Statement: Implement any one imputation model
Source Code:
# Sample data with missing values
data <- data.frame(
A = c(1, 2, NA, 4, 5),
B = c(NA, 2, 3, NA, 5),
C = c(1, 2, 3, 4, NA)
)
# Display original data
cat("Original Data:\n")
print(data)
# Imputation model using linear regression
# Each column with missing values is regressed on the remaining columns.
# Missing values in the predictor columns are temporarily filled with column
# means so that every row can be used for fitting and prediction.
impute_with_regression <- function(data) {
  # Mean-filled copy, used only as the source of predictor values
  filled <- as.data.frame(lapply(data, function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
  }))
  result <- data
  for (col in colnames(data)) {
    missing_indices <- which(is.na(data[[col]]))
    if (length(missing_indices) > 0) {
      observed_indices <- which(!is.na(data[[col]]))
      # Regress the target column on all other columns (the target is excluded
      # from its own predictors)
      model <- lm(reformulate(setdiff(colnames(data), col), response = col),
                  data = filled[observed_indices, ])
      result[missing_indices, col] <- predict(model, newdata = filled[missing_indices, ])
    }
  }
  return(result)
}
# Apply imputation model
data_imputed <- impute_with_regression(data)
# Display data after imputation
cat("\nData after imputation using linear regression:\n")
print(data_imputed)
Output:
Original Data:
   A  B  C
1  1 NA  1
2  2  2  2
3 NA  3  3
4  4 NA  4
5  5  5 NA
Data after imputation using linear regression:
         A B C
1 1.000000 1 1
2 2.000000 2 2
3 3.214477 3 3
4 4.000000 4 4
5 5.000000 5 5
Program No. : 3
Date:
Problem Statement: Implement Linear Regression
Source Code:
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)
# Perform linear regression
model <- lm(y ~ x)
# Display regression coefficients
cat("Regression Coefficients:\n")
print(coef(model))
# Plot the data points
plot(x, y, main = "Linear Regression", xlab = "X", ylab = "Y", pch = 19, col = "blue")
# Add regression line to the plot
abline(model, col = "red")
# Add legend
legend("topleft", legend = "Regression Line", col = "red", lty = 1, cex = 0.8)
Output:
Regression Coefficients:
(Intercept) x
1 1
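The coefficients reported by lm() for a single predictor can be checked by hand with the closed-form least-squares formulas: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A small sketch on the same sample data follows; the variable names slope and intercept are illustrative.

```r
# Same sample data as in the program above
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)

# Closed-form least-squares estimates for simple linear regression
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
cat("Intercept:", intercept, " Slope:", slope, "\n")

# Compare with the coefficients estimated by lm()
model <- lm(y ~ x)
print(all.equal(unname(coef(model)), c(intercept, slope)))
```

Both approaches give an intercept of 1 and a slope of 1 here, because the sample points lie exactly on the line y = x + 1.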
Program No. : 4
Date:
Problem Statement: Implement Logistic Regression
Source Code:
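# Sample data
# Note: the two classes are perfectly separated here (y = 0 for x <= 4 and
# y = 1 for x >= 5), so glm() warns that fitted probabilities numerically 0 or 1
# occurred and the coefficient estimates are unstable; data with overlapping
# classes gives finite, interpretable estimates
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
# Perform logistic regression
model <- glm(y ~ x, family = binomial)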
# Display regression coefficients
cat("Regression Coefficients:\n")
print(summary(model)$coefficients)
# Plot the data points
plot(x, y, main = "Logistic Regression", xlab = "X", ylab = "Probability", pch = 19, col = "blue")
# Add logistic regression curve to the plot
curve(predict(model, data.frame(x = x), type = "response"), add = TRUE, col = "red")
# Add legend
legend("bottomright", legend = "Logistic Regression Curve", col = "red", lty = 1, cex = 0.8)
Output:
Regression Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.2280228 2.7501662 -1.173513 0.2403259764
x 0.5256342 0.4552689 1.154603 0.2484599465
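A fitted logistic model is normally used to predict the probability of class 1 for new observations. The sketch below assumes a hypothetical sample in which the two classes overlap along x (these are not the values from the program above), so that glm() converges to finite coefficients; the names new_point, prob and predicted_class, and the 0.5 cut-off, are illustrative choices.

```r
# Hypothetical sample in which the two classes overlap along x
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(0, 0, 1, 0, 0, 1, 0, 1, 1, 1)

model <- glm(y ~ x, family = binomial)

# TRUE when the iterative fitting procedure converged
print(model$converged)
print(coef(model))

# Predicted probability of class 1 at a new value of x
new_point <- data.frame(x = 5.5)
prob <- predict(model, newdata = new_point, type = "response")
cat("P(y = 1 | x = 5.5):", prob, "\n")

# Classify with a 0.5 cut-off
predicted_class <- ifelse(prob > 0.5, 1, 0)
print(predicted_class)
```

Passing type = "response" returns probabilities; without it, predict() returns values on the log-odds scale.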
Program No. : 5
Date:
Problem Statement: Implement Decision Tree Induction for classification
Source Code:
# Install and load the rpart package if not already installed
if (!requireNamespace("rpart", quietly = TRUE)) {
install.packages("rpart")
}
library(rpart)
# Sample data
data <- data.frame(
Feature1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
Feature2 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
Class = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
)
# Perform decision tree induction
# rpart's default minsplit = 20 would leave this 10-row sample as a single
# root node (and plot() would then fail), so smaller nodes are allowed to split
tree_model <- rpart(Class ~ ., data = data, method = "class",
                    control = rpart.control(minsplit = 2))
# Plot the decision tree
plot(tree_model, uniform = TRUE, main = "Decision Tree for Classification")
text(tree_model, use.n = TRUE, all = TRUE, cex = 0.8)
# Output the decision rules
cat("Decision Rules:\n")
print(tree_model)
Output:
Decision Rules:
n= 10
node), split, n, loss, yval, (yprob)
      * denotes terminal node
1) root 10 5 A (0.5000000 0.5000000)
  2) Feature2< 0.5 5 0 A (1.0000000 0.0000000) *
  3) Feature2>=0.5 5 0 B (0.0000000 1.0000000) *
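For classification, rpart selects splits by impurity reduction, using the Gini index by default. The following sketch computes the Gini impurity by hand for two candidate splits of the sample data, using base R only; the helper names gini and split_gini are illustrative and are not rpart functions.

```r
# Same sample data as in the program above
data <- data.frame(
  Feature1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Feature2 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  Class = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
)

# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Weighted Gini impurity of the two groups produced by a logical split
split_gini <- function(labels, goes_left) {
  n <- length(labels)
  left <- labels[goes_left]
  right <- labels[!goes_left]
  (length(left) / n) * gini(left) + (length(right) / n) * gini(right)
}

cat("Root impurity:", gini(data$Class), "\n")
cat("Split Feature1 < 5.5:", split_gini(data$Class, data$Feature1 < 5.5), "\n")
cat("Split Feature2 < 0.5:", split_gini(data$Class, data$Feature2 < 0.5), "\n")
```

The root impurity is 0.5, the Feature1 split leaves 0.48, and the Feature2 split leaves 0, so Feature2 separates the two classes completely in this sample.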
Program No. : 6
Date:
Problem Statement: Implement Random Forest Classifier
Source Code:
# Install and load the randomForest package if not already installed
if (!requireNamespace("randomForest", quietly = TRUE)) {
install.packages("randomForest")
}
library(randomForest)
# Sample data
data <- iris
# Split data into training and testing sets
set.seed(123) # For reproducibility
train_indices <- sample(1:nrow(data), 0.7 * nrow(data)) # 70% for training
train_data <- data[train_indices, ]
test_data <- data[-train_indices, ]
# Perform Random Forest classification
rf_model <- randomForest(Species ~ ., data = train_data)
# Make predictions on the test set
predictions <- predict(rf_model, newdata = test_data)
# Output predictions
cat("Predictions:\n")
print(predictions)
Output:
Predictions:
[1] setosa setosa setosa setosa setosa setosa setosa
[8] setosa setosa setosa setosa setosa setosa setosa
[15] setosa setosa setosa setosa setosa setosa setosa
[22] setosa setosa setosa setosa setosa setosa setosa
[29] setosa setosa setosa setosa setosa setosa setosa
[36] setosa setosa setosa setosa setosa setosa setosa
[43] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[50] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[57] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[64] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[71] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[78] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[85] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[92] virginica versicolor versicolor versicolor versicolor versicolor versicolor
[99] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[106] virginica virginica virginica virginica virginica virginica virginica
[113] virginica virginica virginica virginica virginica virginica virginica
[120] virginica virginica virginica virginica virginica virginica virginica
[127] virginica virginica virginica virginica virginica virginica virginica
[134] virginica virginica virginica virginica virginica virginica virginica
[141] virginica virginica virginica virginica virginica virginica virginica
[148] virginica virginica virginica virginica
Levels: setosa versicolor virginica
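A list of predicted labels says little about model quality on its own; comparing predictions with the true labels in a confusion matrix is more informative. The sketch below uses short hypothetical label vectors so that it runs without the randomForest package. In the program above, the same two lines could plausibly be applied to predictions and test_data$Species.

```r
# Hypothetical true and predicted labels for six test flowers
actual <- factor(c("setosa", "setosa", "versicolor",
                   "versicolor", "virginica", "virginica"))
predicted <- factor(c("setosa", "setosa", "versicolor",
                      "virginica", "virginica", "virginica"),
                    levels = levels(actual))

# Rows are predicted classes, columns are actual classes
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Accuracy: share of predictions on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)
cat("Accuracy:", accuracy, "\n")
```

Here five of the six hypothetical predictions are correct; the single error is a versicolor flower predicted as virginica.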
Program No. : 7
Date:
Problem Statement: Implement ARIMA on Time Series data
Source Code:
# Install and load the forecast package if not already installed
if (!requireNamespace("forecast", quietly = TRUE)) {
install.packages("forecast")
}
library(forecast)
# Sample time series data
ts_data <- c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65)
# Convert the data to a time series object
ts_data <- ts(ts_data)
# Perform ARIMA modeling
arima_model <- auto.arima(ts_data)
# Generate forecast for the next 3 time points
forecast_data <- forecast(arima_model, h = 3)
# Output forecast data
cat("Forecasted values for the next 3 time points:\n")
print(forecast_data$mean)
Output:
Forecasted values for the next 3 time points:
Time Series:
Start = 11
End = 13
Frequency = 1
[1] 70 75 80
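The "I" (integrated) part of ARIMA works on differences of the series rather than on the raw values. The sketch below differences the same sample series by hand and extends it with the average step; it is a hand calculation for intuition, under the assumption of a constant step, and does not reproduce what auto.arima() does internally. The names first_diff, drift and manual_forecast are illustrative.

```r
ts_data <- ts(c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65))

# First differences: the change from one time point to the next
first_diff <- diff(ts_data)
print(first_diff)

# Average step size between consecutive observations
drift <- mean(first_diff)

# Extend the series three steps beyond the last observation
last_value <- ts_data[length(ts_data)]
manual_forecast <- last_value + drift * (1:3)
print(manual_forecast)
```

Every difference equals 5, so extending the last value 65 in steps of 5 gives 70, 75 and 80.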
Program No. : 8
Date:
Problem Statement: Object segmentation using hierarchical based methods
Source Code:
# Sample data
set.seed(123)
data <- matrix(rnorm(100), ncol = 2)
# Perform hierarchical clustering
hc <- hclust(dist(data))
# Determine clusters
k <- 3
clusters <- cutree(hc, k)
# Output cluster assignments
cat("Cluster Assignments:\n")
print(clusters)
# Plot dendrogram with clusters
plot(hc, main = "Dendrogram with Clusters")
rect.hclust(hc, k = k, border = 2:4)
Output:
Cluster Assignments:
[1] 2 2 1 1 1 1 1 1 3 3 2 3 1 1 3 1 3 3 1 1 1 1 3 1 3 2 2 3 3 1 2 2 2 3 2 2 2
[38] 1 3 2 1 3 2 2 1 3 1 3 2 2 2 2 2 1 3 3 2 1 3 1 1 2 2 2 2 2 1 1 1 2 3 1 1 1
[75] 1 1 1 1 2 3 3 3 2 1 1 3 2 2 3 1 1 2 2 3 1 1 2 2 2
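Random normal data has no real group structure, which makes the cluster assignments hard to interpret. The sketch below uses a small hypothetical set of points forming two well-separated groups, so the effect of hclust() and cutree() is easy to verify by eye; it also compares two linkage criteria. The point coordinates and variable names are illustrative.

```r
# Hypothetical 2-D points forming two well-separated groups
pts <- rbind(
  c(0.0, 0.0), c(0.2, 0.1), c(0.1, 0.3),   # group near the origin
  c(5.0, 5.0), c(5.2, 4.9), c(4.9, 5.3)    # group near (5, 5)
)

# Agglomerative clustering with two different linkage criteria
hc_complete <- hclust(dist(pts), method = "complete")
hc_average <- hclust(dist(pts), method = "average")

# Cut each tree into two clusters
clusters_complete <- cutree(hc_complete, k = 2)
clusters_average <- cutree(hc_average, k = 2)

# Cluster sizes
print(table(clusters_complete))

# Cross-tabulate to see whether both linkages give the same grouping
print(table(clusters_complete, clusters_average))
```

With groups this far apart both linkages recover the same two clusters of three points; on less clearly separated data the choice of linkage can change the result.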
Program No. : 9
Date:
Problem Statement: Perform Visualization techniques (types of maps - Bar, Column, Line,
Scatter, 3D Cubes etc.)
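Bar Graph:
Source Code:
# Note: the visualization programs in this experiment are written in Python
# (pandas, matplotlib, seaborn), not R
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Path of the file to read
flight_filepath = "../input/flight_delays.csv"
# Read the file into a variable flight_data
flight_data = pd.read_csv(flight_filepath, index_col="Month")
# Print the data
flight_data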
# Set the width and height of the figure
plt.figure(figsize=(10,6))
# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")
# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])
# Add label for vertical axis
plt.ylabel("Arrival delay (in minutes)")
Output :
Line Graph:
Source Code:
# Path of the file to read
spotify_filepath = "../input/spotify.csv"
# Read the file into a variable spotify_data
spotify_data = pd.read_csv(spotify_filepath, index_col="Date", parse_dates=True)
# Print the first 5 rows of the data
spotify_data.head()
# Print the last five rows of the data
spotify_data.tail()
# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)
Output:
Scatter Graph:
Source Code:
# Path of the file to read
insurance_filepath = "../input/insurance.csv"
# Read the file into a variable insurance_data
insurance_data = pd.read_csv(insurance_filepath)
insurance_data.head()
sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])
Output:
   age     sex     bmi  children  smoker     region      charges
0   19  female  27.900         0     yes  southwest  16884.92400
1   18    male  33.770         1      no  southeast   1725.55230
2   28    male  33.000         3      no  southeast   4449.46200
3   33    male  22.705         0      no  northwest  21984.47061
4   32    male  28.880         0      no  northwest   3866.85520
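The same three chart types can also be drawn in R with base graphics. The sketch below is one possible equivalent, not a translation of the programs above: it uses small hypothetical vectors in place of the CSV files, and all values and variable names are illustrative.

```r
# Hypothetical monthly sales figures (illustrative values only)
months <- c("Jan", "Feb", "Mar")
sales <- c(120, 150, 135)

# Bar (column) chart; barplot() returns the bar midpoints on the x-axis
bar_midpoints <- barplot(sales, names.arg = months,
                         main = "Bar Chart", xlab = "Month", ylab = "Sales")

# Line chart of the same values, with month names on the x-axis
plot(seq_along(sales), sales, type = "l", xaxt = "n",
     main = "Line Chart", xlab = "Month", ylab = "Sales")
axis(1, at = seq_along(sales), labels = months)

# Scatter plot of two hypothetical numeric variables
bmi <- c(22.7, 27.9, 28.9, 33.0, 33.8)
charges <- c(2100, 3900, 4400, 5200, 6100)
plot(bmi, charges, pch = 19, col = "blue",
     main = "Scatter Plot", xlab = "BMI", ylab = "Charges")
```

With real data, the vectors would typically come from columns of a data frame loaded with read.csv().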
Program No. : 10
Date:
Problem Statement: Perform Descriptive analytics on healthcare data
Source Code:
# Load necessary libraries
library(dplyr) # for data manipulation
library(ggplot2) # for data visualization
# Load healthcare data (sample data; the code below assumes the columns
# systolic_bp, diastolic_bp and total_cholesterol)
healthcare_data <- read.csv("healthcare_data.csv")
# View the structure of the dataset
str(healthcare_data)
# Summary statistics
summary_stats <- summary(healthcare_data)
print(summary_stats)
# Descriptive statistics for blood pressure
blood_pressure_stats <- summarize(healthcare_data,
avg_systolic_bp = mean(systolic_bp),
avg_diastolic_bp = mean(diastolic_bp),
max_systolic_bp = max(systolic_bp),
max_diastolic_bp = max(diastolic_bp),
min_systolic_bp = min(systolic_bp),
min_diastolic_bp = min(diastolic_bp))
print(blood_pressure_stats)
# Descriptive statistics for cholesterol levels
cholesterol_stats <- summarize(healthcare_data,
avg_total_cholesterol = mean(total_cholesterol),
max_total_cholesterol = max(total_cholesterol),
min_total_cholesterol = min(total_cholesterol))
print(cholesterol_stats)
# Data visualization - Histogram of blood pressure
blood_pressure_hist <- ggplot(healthcare_data, aes(x = systolic_bp)) +
geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
labs(title = "Histogram of Systolic Blood Pressure", x = "Systolic Blood Pressure", y = "Frequency")
print(blood_pressure_hist)
# Data visualization - Boxplot of cholesterol levels
cholesterol_boxplot <- ggplot(healthcare_data, aes(x = "", y = total_cholesterol)) +
geom_boxplot(fill = "lightgreen", color = "black") +
labs(title = "Boxplot of Total Cholesterol Levels", x = "", y = "Total Cholesterol")
print(cholesterol_boxplot)
Output:
## Pregnancies Glucose BloodPressure SkinThickness
## Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.000 1st Qu.: 99.0 1st Qu.: 62.00 1st Qu.: 0.00
## Median : 3.000 Median :117.0 Median : 72.00 Median :23.00
## Mean : 3.845 Mean :120.9 Mean : 69.11 Mean :20.54
## 3rd Qu.: 6.000 3rd Qu.:140.2 3rd Qu.: 80.00 3rd Qu.:32.00
## Max. :17.000 Max. :199.0 Max. :122.00 Max. :99.00
## Insulin BMI DiabetesPedigreeFunction Age
## Min. : 0.0 Min. : 0.00 Min. :0.0780 Min. :21.00
## 1st Qu.: 0.0 1st Qu.:27.30 1st Qu.:0.2437 1st Qu.:24.00
## Median : 30.5 Median :32.00 Median :0.3725 Median :29.00
## Mean : 79.8 Mean :31.99 Mean :0.4719 Mean :33.24
## 3rd Qu.:127.2 3rd Qu.:36.60 3rd Qu.:0.6262 3rd Qu.:41.00
## Max. :846.0 Max. :67.10 Max. :2.4200 Max. :81.00
## Outcome
## Min. :0.000
## 1st Qu.:0.000
## Median :0.000
## Mean :0.349
## 3rd Qu.:1.000
## Max. :1.000
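Healthcare data frequently contains missing readings, and functions such as mean() and max() return NA unless na.rm = TRUE is supplied. The sketch below computes a few descriptive statistics with base R on a small hypothetical data frame; the column values and the helper name describe are illustrative.

```r
# Hypothetical patient readings (illustrative values, with one missing entry)
readings <- data.frame(
  systolic_bp = c(118, 125, 132, NA, 140),
  total_cholesterol = c(180, 195, 210, 205, 240)
)

# Mean, median, standard deviation and range for one numeric column
describe <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE))
}

# Apply to every column; the result has one column per variable
stats_table <- sapply(readings, describe)
print(round(stats_table, 2))
```

The missing systolic reading is ignored, so the mean of that column is taken over the four available values.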
Program No. : 11
Date:
Problem Statement: Perform Predictive analytics on Product Sales data
Source Code:
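# Load necessary libraries
library(ggplot2) # for data visualization
library(dplyr) # for data manipulation
# Load product sales data (sample data with date and sales columns)
sales_data <- read.csv("product_sales_data.csv")
# Convert the date column from text to Date so that it is plotted and
# modelled as a time axis rather than as a set of categories
sales_data$date <- as.Date(sales_data$date)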
# View the structure of the dataset
str(sales_data)
# Summary statistics
summary_stats <- summary(sales_data)
print(summary_stats)
# Data visualization - Time series plot of sales
time_series_plot <- ggplot(sales_data, aes(x = date, y = sales)) +
geom_line() +
labs(title = "Time Series Plot of Sales", x = "Date", y = "Sales")
print(time_series_plot)
# Train-test split (80-20 split)
set.seed(123) # For reproducibility
train_indices <- sample(1:nrow(sales_data), 0.8 * nrow(sales_data))
train_data <- sales_data[train_indices, ]
test_data <- sales_data[-train_indices, ]
# Simple linear regression model
sales_lm <- lm(sales ~ date, data = train_data)
# Summary of the linear regression model
summary(sales_lm)
# Predictions on test data
predicted_sales <- predict(sales_lm, newdata = test_data)
# Evaluate model performance
rmse <- sqrt(mean((predicted_sales - test_data$sales)^2))
cat("Root Mean Squared Error (RMSE):", rmse, "\n")
# Plot actual vs. predicted sales
test_data$predicted_sales <- predicted_sales
actual_vs_predicted_plot <- ggplot(test_data, aes(x = date)) +
  geom_line(aes(y = sales), color = "blue", linetype = "solid") +
  geom_line(aes(y = predicted_sales), color = "red", linetype = "dashed") +
  labs(title = "Actual vs. Predicted Sales", x = "Date", y = "Sales")
print(actual_vs_predicted_plot)
Output:
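As a supplementary sketch: when the rows are ordered in time, a chronological split (train on the earlier rows, test on the later ones) evaluates the model on data that lies in its future, which a random split does not guarantee. The example below uses a small hypothetical sales table rather than product_sales_data.csv; all values and the name n_train are illustrative.

```r
# Hypothetical daily sales, ordered by date (illustrative values only)
sales_data <- data.frame(
  date = as.Date("2024-01-01") + 0:9,
  sales = c(100, 104, 109, 113, 118, 121, 127, 131, 134, 140)
)

# Chronological 80-20 split: the model never sees the future
n_train <- floor(0.8 * nrow(sales_data))
train_data <- sales_data[1:n_train, ]
test_data <- sales_data[(n_train + 1):nrow(sales_data), ]

# Fit on the earlier rows, predict the later ones
sales_lm <- lm(sales ~ date, data = train_data)
predicted_sales <- predict(sales_lm, newdata = test_data)

# Root mean squared error on the held-out rows
rmse <- sqrt(mean((predicted_sales - test_data$sales)^2))
cat("Test rows:", nrow(test_data), " RMSE:", rmse, "\n")
```

RMSE is expressed in the same units as sales, so it can be read directly as a typical prediction error.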
Program No. : 12
Date:
Problem Statement: Apply Predictive analytics for Weather forecasting
Source Code:
# Load necessary libraries
library(forecast) # for time series forecasting
# Load weather data (sample data)
weather_data <- read.csv("weather_data.csv")
# Convert date column to Date type
weather_data$date <- as.Date(weather_data$date)
# View the structure of the dataset
str(weather_data)
# Summary statistics
summary_stats <- summary(weather_data)
print(summary_stats)
# Data visualization - Time series plot of temperature
# (plot() draws directly on the graphics device and returns NULL,
# so its result is not stored or printed)
plot(weather_data$date, weather_data$temperature,
     type = "l", xlab = "Date", ylab = "Temperature",
     main = "Time Series Plot of Temperature")
# Create time series object
weather_ts <- ts(weather_data$temperature, frequency = 365)
# Fit ARIMA model
arima_model <- auto.arima(weather_ts)
# Forecast for the next 7 days
forecast_result <- forecast(arima_model, h = 7)
# Plot the forecast
plot(forecast_result, main = "Forecast for Next 7 Days")
# Print forecasted values
print(forecast_result)
Output:
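As a supplementary sketch: a forecast is easier to judge against a simple baseline. The example below holds out the last 7 days of a hypothetical temperature series and scores a naive forecast (repeat the last observed value) by mean absolute error. It uses base R only, and the values and the names horizon, naive_forecast and mae are illustrative. An ARIMA forecast for the same held-out days could be scored in the same way and compared.

```r
# Hypothetical daily temperatures in degrees Celsius (illustrative values only)
temperature <- c(21.0, 22.5, 23.1, 22.8, 24.0, 25.2, 24.7,
                 23.9, 24.4, 25.0, 26.1, 25.5, 24.9, 25.8)

# Hold out the last 7 days for evaluation
horizon <- 7
n <- length(temperature)
train <- temperature[1:(n - horizon)]
test <- temperature[(n - horizon + 1):n]

# Naive forecast: repeat the last observed training value
naive_forecast <- rep(train[length(train)], horizon)

# Mean absolute error of the naive baseline on the held-out days
mae <- mean(abs(test - naive_forecast))
cat("Naive forecast MAE:", round(mae, 2), "\n")
```

A model whose error on the held-out days is not clearly below this baseline adds little over repeating the most recent reading.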