Data Analytics Using R Lab - Master Manual

This lab manual for the Data Analytics Using R course is aimed at III B.Tech, I Semester students of Computer Science and Engineering (AI & ML). It sets out the department's vision, mission, and program outcomes, gives guidelines for lab conduct and a list of experiments, and provides sample code for data preprocessing, regression, classification, time series, clustering, and visualization tasks.

Uploaded by

Vinay Kumar Goud

DATA ANALYTICS USING R LAB MANUAL

DATA ANALYTICS USING R LAB

MASTER MANUAL
[AI507PC]

III B.TECH – I SEMESTER

ACADEMIC YEAR : 2024-2025
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING ( AI & ML)

CMR ENGINEERING COLLEGE

(Approved by AICTE- New Delhi, Affiliated to JNTUH)
Kandlakoya(V), Medchal Road, Hyderabad

CSE(AI & ML) Department Vision & Mission
Vision:
To produce admirable and competent graduates & experts in Artificial Intelligence &
Machine Learning by quality technical education, innovations and research to
improve the life style in the society.
Mission:
M1: To impart value based technical education in AI & ML through innovative
teaching and learning methods.
M2: To produce outstanding professionals by imparting quality training, hands-on
experience and value based education.
M3: To produce competent graduates suitable for industries and organizations at global
level including research and development with Social responsibility.

CSE(AI &ML) Program Outcomes [PO’s]:

Engineering Graduates will be able to satisfy these NBA graduate attributes:
1. Engineering knowledge: An ability to apply knowledge of computing,
mathematics, science and engineering fundamentals appropriate to the discipline.
2. Problem analysis: An ability to analyze a problem, and identify and formulate the
computing requirements appropriate to its solution.
3. Design/development of solutions: An ability to design, implement, and evaluate a
computer-based system, process, component, or program to meet desired needs
with appropriate consideration for public health and safety, cultural, societal and
environmental considerations.
4. Conduct investigations of complex problems: An ability to design and conduct
experiments, as well as to analyze and interpret data.
5. Modern tool usage: An ability to use current techniques, skills, and modern tools
necessary for computing practice.
6. The engineer and society: An ability to analyze the local and global impact of
computing on individuals, organizations, and society.
7. Environment and sustainability: Knowledge of contemporary issues.
8. Ethics: An understanding of professional, ethical, legal, security and social issues
and responsibilities.
9. Individual and team work: An ability to function effectively individually and on
teams, including diverse and multidisciplinary, to accomplish a common goal.
10. Communication: An ability to communicate effectively with a range of audiences.
11. Project management and finance: An understanding of engineering and
management principles and an ability to apply these to one’s own work, as a member and leader
in a team, to manage projects.
12. Life-long learning: Recognition of the need for and an ability to
engage in continuing professional development.


CSE(AI & ML)Program Educational Outcomes [PEO’s]

1. To provide an intellectual environment to successfully pursue higher education in the
area of AI.
2. To impart knowledge of cutting-edge Artificial Intelligence technologies on par with
industrial standards.
3. To create an atmosphere to explore research areas and produce outstanding
contributions in various areas of Artificial Intelligence and Machine Learning.

CSE(AI & ML) Program Specific Outcome [PSO’s]

1. Ability to use knowledge in emerging technologies in identifying research gaps and
provide solutions with innovative ideas.
2. Ability to analyze a problem and provide an optimal solution using fundamental
knowledge and skills in professional and engineering sciences.


LAB CODE

• Students should report to the concerned lab as per the timetable.
• Students who turn up late to the lab will in no case be permitted to do the
program scheduled for the day.
• After completing the program, students must get it certified in the observation
book by the staff member in-charge.
• Students should bring a notebook of 100 pages and enter the readings/observations
into it while performing the experiment.
• The record of observations, along with the detailed experimental procedure of
the experiment done in the immediately preceding session, should be submitted and
certified by the staff member in-charge.
• The group-wise division made at the beginning should be adhered to, and no
mixing of students among different groups will be permitted.
• When the experiment is completed, students should disconnect the setup they
made and return all the components/instruments taken for the purpose.
• Any damage to the equipment or burnt-out components will be viewed
seriously, either by imposing a penalty or by dismissing the whole group of students
from the lab for the semester/year.
• Students should be present in the lab for the total scheduled duration.
• Students are required to prepare thoroughly to perform the experiment before
coming to the laboratory.


INDEX

S.No.  List of Experiments

1      Data Preprocessing
       a. Handling missing values
       b. Noise detection removal
       c. Identifying data redundancy and elimination
2      Implement any one imputation model
3      Implement Linear Regression
4      Implement Logistic Regression
5      Implement Decision Tree Induction for classification
6      Implement Random Forest Classifier
7      Implement ARIMA on Time Series data
8      Object segmentation using hierarchical based methods
9      Perform Visualization techniques (types of maps - Bar, Column, Line, Scatter, 3D Cubes etc)
10     Perform Descriptive analytics on healthcare data
11     Perform Predictive analytics on Product Sales data
12     Apply Predictive analytics for Weather forecasting


Program No. : 1

Date:

Problem Statement:
Data Preprocessing
a. Handling missing values
b. Noise detection removal
c. Identifying data redundancy and elimination

Source Code:

A. Handling missing values

# Sample data with missing values
data <- data.frame(
A = c(1, 2, NA, 4, 5),
B = c(NA, 2, 3, NA, 5),
C = c(1, 2, 3, 4, NA)
)

# Display original data

cat("Original Data:\n")
print(data)

# Method 1: Remove rows with missing values

cleaned_data <- na.omit(data)
cat("\nData after removing rows with missing values:\n")
print(cleaned_data)

# Method 2: Imputation (Replace missing values with mean)

mean_imputation <- function(x) {
x[is.na(x)] <- mean(x, na.rm = TRUE)
return(x)
}
data_mean_imputed <- as.data.frame(lapply(data, mean_imputation))
cat("\nData after mean imputation:\n")
print(data_mean_imputed)

# Method 3: Imputation (Replace missing values with median)

median_imputation <- function(x) {
x[is.na(x)] <- median(x, na.rm = TRUE)
return(x)
}
data_median_imputed <- as.data.frame(lapply(data, median_imputation))
cat("\nData after median imputation:\n")
print(data_median_imputed)

# Method 4: Imputation using mice package (Multiple Imputation by Chained Equations)

library(mice)
imputed_data <- mice(data)
imputed_data <- complete(imputed_data)
cat("\nData after imputation using mice package:\n")
print(imputed_data)

Output :
Original Data:
   A  B  C
1  1 NA  1
2  2  2  2
3 NA  3  3
4  4 NA  4
5  5  5 NA

Data after removing rows with missing values:
  A B C
2 2 2 2

Data after mean imputation:
  A        B   C
1 1 3.333333 1.0
2 2 2.000000 2.0
3 3 3.000000 3.0
4 4 3.333333 4.0
5 5 5.000000 2.5

Data after median imputation:
  A B   C
1 1 3 1.0
2 2 2 2.0
3 3 3 3.0
4 4 3 4.0
5 5 5 2.5

Data after imputation using mice package:
(mice draws imputations at random, so the values differ between runs unless a seed is set)
  A B C
1 1 3 1
2 2 2 2
3 3 3 3
4 4 3 4
5 5 5 2
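
Before choosing among the methods above, it can help to quantify how much is missing. The following is a minimal base-R sketch on the same sample data; the names na_per_column and complete_rows are illustrative and not part of the original program.

```r
# Same sample data as in Program 1 (A)
data <- data.frame(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, NA, 5),
  C = c(1, 2, 3, 4, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(data))
print(na_per_column)

# Number of rows that have no missing value at all
complete_rows <- sum(complete.cases(data))
cat("Rows with no missing values:", complete_rows, "\n")
```

When only a small share of rows is complete, as here, removing rows discards most of the data, which is one reason to prefer an imputation method.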


B. Noise detection removal

# Sample data with noise
data <- c(1, 2, 3, 100, 5, 6, 7, 200, 9, 10)

# Display original data

cat("Original Data:\n")
print(data)

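# Method 1: Z-score method for outlier detection and removal
# The classical z-score (mean and standard deviation) cannot exceed
# (n - 1) / sqrt(n), about 2.85 for these 10 values, so a threshold of 3
# would flag nothing. The robust variant below uses the median and the
# median absolute deviation (MAD) instead.

z_score_remove_outliers <- function(x, threshold = 3) {
z <- abs((x - median(x)) / mad(x))
outliers <- which(z > threshold)
x[outliers] <- NA
return(x)
}

# Apply z-score method

data_without_outliers <- z_score_remove_outliers(data)
cat("\nData after removing outliers using z-score method:\n")
print(data_without_outliers)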

Output:
Original Data:
[1] 1 2 3 100 5 6 7 200 9 10

Data after removing outliers using z-score method:

[1] 1 2 3 NA 5 6 7 NA 9 10
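
As a complementary check that does not rely on a z-score at all, the interquartile-range (IQR) rule can be sketched as follows. This is an illustrative alternative, not part of the original program; the 1.5 multiplier is the conventional choice and the variable names are assumptions.

```r
# Same sample data as in Program 1 (B)
data <- c(1, 2, 3, 100, 5, 6, 7, 200, 9, 10)

# First and third quartiles, and the interquartile range
quartiles <- unname(quantile(data, probs = c(0.25, 0.75)))
iqr <- quartiles[2] - quartiles[1]

# Values beyond 1.5 * IQR from the quartiles are treated as outliers
lower_fence <- quartiles[1] - 1.5 * iqr
upper_fence <- quartiles[2] + 1.5 * iqr
is_outlier <- data < lower_fence | data > upper_fence

cat("Outliers found by the IQR rule:\n")
print(data[is_outlier])

# Replace the flagged values with NA, as the z-score function does
data_iqr_cleaned <- replace(data, is_outlier, NA)
print(data_iqr_cleaned)
```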


C. Identifying data redundancy and elimination

# Sample data with redundancy
data <- data.frame(
ID = c(1, 2, 3, 4, 5),
Name = c("John", "Alice", "Bob", "John", "Alice"),
Age = c(25, 30, 35, 25, 30),
Gender = c("Male", "Female", "Male", "Male", "Female")
)

# Display original data

cat("Original Data:\n")
print(data)

# Method 1: Identifying redundant rows
# ID is unique for every row, so duplicates are detected on the remaining columns

find_redundant_rows <- function(df, key_cols = setdiff(names(df), "ID")) {
duplicated_rows <- duplicated(df[, key_cols])
redundant_rows <- df[duplicated_rows, ]
return(redundant_rows)
}
redundant_rows <- find_redundant_rows(data)
cat("\nRedundant Rows:\n")
print(redundant_rows)

# Method 2: Eliminating redundant rows (the first occurrence of each record is kept)

eliminate_redundancy <- function(df, key_cols = setdiff(names(df), "ID")) {
unique_data <- df[!duplicated(df[, key_cols]), ]
return(unique_data)
}

cleaned_data <- eliminate_redundancy(data)

cat("\nData after eliminating redundancy:\n")
print(cleaned_data)

Output:
Original Data:
ID Name Age Gender
1 1 John 25 Male
2 2 Alice 30 Female
3 3 Bob 35 Male
4 4 John 25 Male
5 5 Alice 30 Female

Redundant Rows:
ID Name Age Gender
4 4 John 25 Male
5 5 Alice 30 Female

Data after eliminating redundancy:

ID Name Age Gender
1 1 John 25 Male
2 2 Alice 30 Female
3 3 Bob 35 Male
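
To see how often each record is repeated, the rows can also be counted per combination of the descriptive columns. This is a small sketch using base R's aggregate(); the column name Count and the variable names are illustrative assumptions.

```r
# Same sample data as in Program 1 (C)
data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("John", "Alice", "Bob", "John", "Alice"),
  Age = c(25, 30, 35, 25, 30),
  Gender = c("Male", "Female", "Male", "Male", "Female")
)

# Count the rows for every (Name, Age, Gender) combination
occurrences <- aggregate(ID ~ Name + Age + Gender, data = data, FUN = length)
names(occurrences)[names(occurrences) == "ID"] <- "Count"
print(occurrences)

# Combinations that appear more than once are the redundant records
repeated <- occurrences[occurrences$Count > 1, ]
print(repeated)
```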

Program. No. : 2
Date:
Problem Statement: Implement any one imputation model

Source Code:
# Sample data with missing values
data <- data.frame(
A = c(1, 2, NA, 4, 5),
B = c(NA, 2, 3, NA, 5),
C = c(1, 2, 3, 4, NA)
)

# Display original data

cat("Original Data:\n")
print(data)

# Imputation model using linear regression
# Only one row has all three columns observed, so each missing value is
# predicted from a simple regression on one other column: the column that is
# observed in that row and shares the most complete pairs with the target.

impute_with_regression <- function(data) {
imputed <- data
for (col in colnames(data)) {
for (i in which(is.na(data[[col]]))) {
candidates <- setdiff(colnames(data), col)
candidates <- candidates[!is.na(unlist(data[i, candidates]))]
if (length(candidates) == 0) next
n_pairs <- sapply(candidates, function(p) sum(!is.na(data[[col]]) & !is.na(data[[p]])))
if (max(n_pairs) < 2) next
best <- candidates[which.max(n_pairs)]
model <- lm(reformulate(best, response = col), data = data)
imputed[i, col] <- predict(model, newdata = data[i, , drop = FALSE])
}
}
return(imputed)
}

# Apply imputation model

data_imputed <- impute_with_regression(data)

# Display data after imputation

cat("\nData after imputation using linear regression:\n")
print(data_imputed)

Output:
Original Data:
   A  B  C
1  1 NA  1
2  2  2  2
3 NA  3  3
4  4 NA  4
5  5  5 NA

Data after imputation using linear regression:
  A B C
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5

Program. No. : 3

Date:

Problem Statement: Implement Linear Regression

Source Code:
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)

# Perform linear regression

model <- lm(y ~ x)

# Display regression coefficients

cat("Regression Coefficients:\n")
print(coef(model))

# Plot the data points

plot(x, y, main = "Linear Regression", xlab = "X", ylab = "Y", pch = 19, col = "blue")

# Add regression line to the plot

abline(model, col = "red")

# Add legend
legend("topright", legend = "Regression Line", col = "red", lty = 1, cex = 0.8)

Output:

Regression Coefficients:
(Intercept) x
1 1
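
Once the model is fitted, it can be used to predict y for new values of x. The following sketch reuses the sample data above; new_points and predicted are illustrative names, not part of the original program.

```r
# Same sample data and model as in Program 3
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)
model <- lm(y ~ x)

# New x values must be supplied in a data frame whose column name matches the predictor
new_points <- data.frame(x = c(6, 7))
predicted <- predict(model, newdata = new_points)

cat("Predicted values for x = 6 and x = 7:\n")
print(predicted)
```

Because the fitted line is y = 1 + x, the predictions are 7 and 8 up to floating-point rounding.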


Program No. : 4

Date:

Problem Statement: Implement Logistic Regression

Source Code:

# Sample data
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

# Perform logistic regression

model <- glm(y ~ x, family = binomial)

# Display regression coefficients

cat("Regression Coefficients:\n")
print(summary(model)$coefficients)

# Plot the data points

plot(x, y, main = "Logistic Regression", xlab = "X", ylab = "Probability", pch = 19, col = "blue")

# Add logistic regression curve to the plot

curve(predict(model, data.frame(x = x), type = "response"), add = TRUE, col = "red")

# Add legend
legend("topright", legend = "Logistic Regression Curve", col = "red", lty = 1, cex = 0.8)

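Output:

Note: the sample data are perfectly separated (y = 0 for x <= 4 and y = 1 for x >= 5), so glm() warns that fitted probabilities numerically 0 or 1 occurred, and the estimates and standard errors it prints are very large and unstable. The table below therefore only illustrates the layout of the coefficient matrix.

Regression Coefficients:
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -3.2280228  2.7501662 -1.173513 0.2403259764
x            0.5256342  0.4552689  1.154603 0.2484599465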
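
A fitted logistic model returns probabilities; turning them into class labels needs a cut-off. The sketch below uses a hypothetical data set in which the two classes overlap (so that glm() converges normally) and an assumed cut-off of 0.5; the data and variable names are illustrative only.

```r
# Hypothetical sample data with overlapping classes (not the data used above)
x <- 1:10
y <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)

# Fit the logistic regression model
model <- glm(y ~ x, family = binomial)

# Predicted probability of y = 1 for each observation
prob <- predict(model, type = "response")

# Classify with a 0.5 cut-off (an assumption; other cut-offs may suit other problems)
pred_class <- ifelse(prob >= 0.5, 1, 0)

# Share of observations classified correctly on the training data
accuracy <- mean(pred_class == y)
cat("Training accuracy:", accuracy, "\n")
```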

Program No. : 5

Date:

Problem Statement: Implement Decision Tree Induction for classification

Source Code:

# Install and load the rpart package if not already installed

if (!requireNamespace("rpart", quietly = TRUE)) {
install.packages("rpart")
}
library(rpart)

# Sample data
data <- data.frame(
Feature1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
Feature2 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
Class = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
)

# Perform decision tree induction
# rpart's default minsplit = 20 would leave this 10-row sample as a single
# root node (and plot() would then fail), so minsplit is lowered here.

tree_model <- rpart(Class ~ ., data = data, method = "class",
control = rpart.control(minsplit = 2))

# Plot the decision tree

plot(tree_model, uniform = TRUE, main = "Decision Tree for Classification")
text(tree_model, use.n = TRUE, all = TRUE, cex = 0.8)

# Output the decision rules

cat("Decision Rules:\n")
print(tree_model)

Output:

Decision Rules:
n= 10

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 10 5 A (0.5000000 0.5000000)
  2) Feature2< 0.5 5 0 A (1.0000000 0.0000000) *
  3) Feature2>=0.5 5 0 B (0.0000000 1.0000000) *
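
Decision tree induction picks the split that most reduces class impurity. As a rough illustration, the Gini impurity of two candidate splits on the sample data can be computed by hand in base R; the helper names gini and split_gini are illustrative and are not functions of the rpart package.

```r
# Same sample data as in Program 5
data <- data.frame(
  Feature1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Feature2 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  Class = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
)

# Gini impurity of a set of class labels: 1 - sum of squared class proportions
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Weighted Gini impurity after splitting on feature < threshold
split_gini <- function(feature, labels, threshold) {
  left <- labels[feature < threshold]
  right <- labels[feature >= threshold]
  (length(left) * gini(left) + length(right) * gini(right)) / length(labels)
}

cat("Impurity before splitting:", gini(data$Class), "\n")
cat("After Feature1 < 5.5:", split_gini(data$Feature1, data$Class, 5.5), "\n")
cat("After Feature2 < 0.5:", split_gini(data$Feature2, data$Class, 0.5), "\n")
```

The lower the impurity after the split, the better the split separates the classes; on this data the split on Feature2 leaves both branches pure.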


Program No. : 6

Date:

Problem Statement: Implement Random Forest Classifier

Source Code:

# Install and load the randomForest package if not already installed

if (!requireNamespace("randomForest", quietly = TRUE)) {
install.packages("randomForest")
}
library(randomForest)

# Sample data
data <- iris

# Split data into training and testing sets

set.seed(123) # For reproducibility
train_indices <- sample(1:nrow(data), 0.7 * nrow(data)) # 70% for training
train_data <- data[train_indices, ]
test_data <- data[-train_indices, ]

# Perform Random Forest classification

rf_model <- randomForest(Species ~ ., data = train_data)

# Make predictions on the test set

predictions <- predict(rf_model, newdata = test_data)

# Output predictions
cat("Predictions:\n")
print(predictions)

Output:

Predictions:
A factor of 45 predicted species labels, one for each row of test_data (the 30% of the 150 iris rows held out from training), named by the original row numbers.
Levels: setosa versicolor virginica
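
Predictions are usually judged against the true labels with a confusion matrix and an accuracy figure. The sketch below uses two short hypothetical label vectors so that it runs without the randomForest package; in the program above, actual would be test_data$Species and predicted would be the predictions vector.

```r
# Hypothetical true and predicted labels (illustrative values only)
species_levels <- c("setosa", "versicolor", "virginica")
actual <- factor(c("setosa", "setosa", "versicolor", "versicolor",
                   "virginica", "virginica"), levels = species_levels)
predicted <- factor(c("setosa", "setosa", "versicolor", "virginica",
                      "virginica", "virginica"), levels = species_levels)

# Confusion matrix: rows are predicted classes, columns are actual classes
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Accuracy: correctly classified cases (the diagonal) over all cases
accuracy <- sum(diag(confusion)) / sum(confusion)
cat("Accuracy:", accuracy, "\n")
```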


Program No. : 7

Date:

Problem Statement: Implement ARIMA on Time Series data

Source code:
# Install and load the forecast package if not already installed
if (!requireNamespace("forecast", quietly = TRUE)) {
install.packages("forecast")
}
library(forecast)

# Sample time series data

ts_data <- c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65)

# Convert the data to a time series object

ts_data <- ts(ts_data)

# Perform ARIMA modeling

arima_model <- auto.arima(ts_data)

# Generate forecast for the next 3 time points

forecast_data <- forecast(arima_model, h = 3)

# Output forecast data

cat("Forecasted values for the next 3 time points:\n")
print(forecast_data$mean)

Output:

Forecasted values for the next 3 time points:

Time Series:
Start = 11
End = 13
Frequency = 1
[1] 70 75 80
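
The forecast of 70, 75, 80 can be checked by hand. The series rises by a constant 5 at each step, so after one round of differencing only a constant (the drift) remains. The base-R sketch below reproduces the forecast as a random walk with drift, i.e. ARIMA(0,1,0) with a constant; whether auto.arima() selects exactly this model is an assumption.

```r
# Same sample series as in Program 7
ts_data <- ts(c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65))

# First differences: the change from one time point to the next
steps <- diff(ts_data)
print(steps)

# The drift is the average step
drift <- mean(steps)

# Forecast h steps ahead: last observed value plus h times the drift
h <- 1:3
manual_forecast <- as.numeric(tail(ts_data, 1)) + h * drift
print(manual_forecast)
```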

Program No. : 8

Date:

Problem Statement: Object segmentation using hierarchical based methods

Source Code:
# Sample data
set.seed(123)
data <- matrix(rnorm(100), ncol = 2)

# Perform hierarchical clustering

hc <- hclust(dist(data))

# Determine clusters
k <- 3
clusters <- cutree(hc, k)

# Output cluster assignments

cat("Cluster Assignments:\n")
print(clusters)

# Plot dendrogram with clusters

plot(hc, main = "Dendrogram with Clusters")
rect.hclust(hc, k = k, border = 2:4)

Output:
Cluster Assignments:
A vector of 50 cluster labels (1, 2 or 3), one for each row of data, followed by the dendrogram plot with the three clusters outlined.
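
The result of hierarchical clustering depends on the linkage method. hclust() uses complete linkage by default; the sketch below compares it with average linkage on the same simulated data and tabulates the cluster sizes. The variable names are illustrative.

```r
# Same simulated data as in Program 8: 50 points in 2 dimensions
set.seed(123)
data <- matrix(rnorm(100), ncol = 2)
distances <- dist(data)

# Complete linkage (the default) and average linkage
hc_complete <- hclust(distances, method = "complete")
hc_average <- hclust(distances, method = "average")

# Cut both trees into 3 clusters and compare the cluster sizes
sizes_complete <- table(cutree(hc_complete, k = 3))
sizes_average <- table(cutree(hc_average, k = 3))

cat("Cluster sizes with complete linkage:\n")
print(sizes_complete)
cat("Cluster sizes with average linkage:\n")
print(sizes_average)
```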


Program No. : 9

Date:

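Problem Statement: Perform Visualization techniques (types of maps - Bar, Column, Line,
Scatter, 3D Cubes etc)

Bar Graph:

Source Code:

# Note: the examples in this program are written in Python (pandas, matplotlib, seaborn)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Path of the file to read
flight_filepath = "../input/flight_delays.csv"

# Read the file into a variable flight_data

flight_data = pd.read_csv(flight_filepath, index_col="Month")
# Print the data
flight_data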

# Set the width and height of the figure

plt.figure(figsize=(10,6))

# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])

# Add label for vertical axis

plt.ylabel("Arrival delay (in minutes)")

Output :


Line Graph:

Source Code:

# Path of the file to read

spotify_filepath = "../input/spotify.csv"

# Read the file into a variable spotify_data

spotify_data = pd.read_csv(spotify_filepath, index_col="Date", parse_dates=True)
# Print the first 5 rows of the data
spotify_data.head()
# Print the last five rows of the data
spotify_data.tail()

# Line chart showing daily global streams of each song

sns.lineplot(data=spotify_data)

Output:


Scatter Graph:

Source Code:

# Path of the file to read

insurance_filepath = "../input/insurance.csv"

# Read the file into a variable insurance_data

insurance_data = pd.read_csv(insurance_filepath)
insurance_data.head()

sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])

Output:

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520
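
The same three chart types can also be drawn in R with base graphics. The sketch below is illustrative: the monthly delay figures are made-up values (the flight_delays.csv file is not reproduced here), while the bmi and charges values are the five rows shown in the output above.

```r
# When run as a script, send plots to a null device instead of writing Rplots.pdf
if (!interactive()) pdf(NULL)

# Made-up monthly average delays (hypothetical values for illustration)
delays <- data.frame(
  month = month.abb[1:6],
  avg_delay = c(6.9, 7.2, 3.1, 0.5, 0.4, 9.4)
)

# Bar chart
barplot(delays$avg_delay, names.arg = delays$month,
        main = "Average Arrival Delay by Month",
        xlab = "Month", ylab = "Arrival delay (in minutes)", col = "skyblue")

# Line chart of the same values, with month names on the horizontal axis
plot(seq_along(delays$avg_delay), delays$avg_delay, type = "l", xaxt = "n",
     main = "Average Arrival Delay by Month",
     xlab = "Month", ylab = "Arrival delay (in minutes)")
axis(1, at = seq_along(delays$month), labels = delays$month)

# Scatter plot of the five insurance rows shown in the output above
bmi <- c(27.900, 33.770, 33.000, 22.705, 28.880)
charges <- c(16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520)
plot(bmi, charges, pch = 19, col = "blue",
     main = "Insurance Charges vs. BMI", xlab = "bmi", ylab = "charges")
```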


Program No. : 10

Date:

Problem Statement: Perform Descriptive analytics on healthcare data

Source Code:

# Load necessary libraries

library(dplyr) # for data manipulation
library(ggplot2) # for data visualization

# Load healthcare data (sample data)

healthcare_data <- read.csv("healthcare_data.csv")

# View the structure of the dataset

str(healthcare_data)

# Summary statistics
summary_stats <- summary(healthcare_data)
print(summary_stats)

# Descriptive statistics for blood pressure

blood_pressure_stats <- summarize(healthcare_data,
avg_systolic_bp = mean(systolic_bp),
avg_diastolic_bp = mean(diastolic_bp),
max_systolic_bp = max(systolic_bp),
max_diastolic_bp = max(diastolic_bp),
min_systolic_bp = min(systolic_bp),
min_diastolic_bp = min(diastolic_bp))
print(blood_pressure_stats)

# Descriptive statistics for cholesterol levels

cholesterol_stats <- summarize(healthcare_data,
avg_total_cholesterol = mean(total_cholesterol),
max_total_cholesterol = max(total_cholesterol),
min_total_cholesterol = min(total_cholesterol))
print(cholesterol_stats)

# Data visualization - Histogram of blood pressure

blood_pressure_hist <- ggplot(healthcare_data, aes(x = systolic_bp)) +
geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
labs(title = "Histogram of Systolic Blood Pressure", x = "Systolic Blood Pressure", y = "Frequency")
print(blood_pressure_hist)

# Data visualization - Boxplot of cholesterol levels

cholesterol_boxplot <- ggplot(healthcare_data, aes(x = "", y = total_cholesterol)) +
geom_boxplot(fill = "lightgreen", color = "black") +
labs(title = "Boxplot of Total Cholesterol Levels", x = "", y = "Total Cholesterol")
print(cholesterol_boxplot)

Output:

## Pregnancies Glucose BloodPressure SkinThickness

## Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.000 1st Qu.: 99.0 1st Qu.: 62.00 1st Qu.: 0.00
## Median : 3.000 Median :117.0 Median : 72.00 Median :23.00
## Mean : 3.845 Mean :120.9 Mean : 69.11 Mean :20.54
## 3rd Qu.: 6.000 3rd Qu.:140.2 3rd Qu.: 80.00 3rd Qu.:32.00
## Max. :17.000 Max. :199.0 Max. :122.00 Max. :99.00
## Insulin BMI DiabetesPedigreeFunction Age
## Min. : 0.0 Min. : 0.00 Min. :0.0780 Min. :21.00
## 1st Qu.: 0.0 1st Qu.:27.30 1st Qu.:0.2437 1st Qu.:24.00
## Median : 30.5 Median :32.00 Median :0.3725 Median :29.00
## Mean : 79.8 Mean :31.99 Mean :0.4719 Mean :33.24
## 3rd Qu.:127.2 3rd Qu.:36.60 3rd Qu.:0.6262 3rd Qu.:41.00
## Max. :846.0 Max. :67.10 Max. :2.4200 Max. :81.00
## Outcome
## Min. :0.000
## 1st Qu.:0.000
## Median :0.000
## Mean :0.349
## 3rd Qu.:1.000
## Max. :1.000
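
Descriptive statistics can also be computed without dplyr. The sketch below uses a small hypothetical data frame, since healthcare_data.csv is not reproduced in this manual; the column names follow the program above, and the sex column and all values are assumptions made for illustration.

```r
# Hypothetical healthcare sample (illustrative values only)
healthcare_sample <- data.frame(
  systolic_bp = c(118, 135, 142, 110, 128, 150),
  total_cholesterol = c(180, 220, 245, 170, 200, 260),
  sex = c("F", "M", "M", "F", "F", "M")
)

# Central tendency and spread of systolic blood pressure
bp_summary <- c(
  mean = mean(healthcare_sample$systolic_bp),
  median = median(healthcare_sample$systolic_bp),
  sd = sd(healthcare_sample$systolic_bp),
  iqr = IQR(healthcare_sample$systolic_bp)
)
print(bp_summary)

# Mean systolic blood pressure for each group
by_sex <- aggregate(systolic_bp ~ sex, data = healthcare_sample, FUN = mean)
print(by_sex)
```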

Program NO. : 11

Date:

Problem Statement: Perform Predictive analytics on Product Sales data

Source Code:

# Load necessary libraries

library(ggplot2) # for data visualization
library(dplyr) # for data manipulation
library(lmtest) # for diagnostic tests on linear models (lm() itself is in base R)

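# Load product sales data (sample data)

sales_data <- read.csv("product_sales_data.csv")

# Convert the date column to Date type so it can be plotted and used as a numeric predictor
# (assumes the dates are stored in a format as.Date() recognizes, e.g. YYYY-MM-DD)

sales_data$date <- as.Date(sales_data$date)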

# View the structure of the dataset

str(sales_data)

# Summary statistics
summary_stats <- summary(sales_data)
print(summary_stats)

# Data visualization - Time series plot of sales

time_series_plot <- ggplot(sales_data, aes(x = date, y = sales)) +
geom_line() +
labs(title = "Time Series Plot of Sales", x = "Date", y = "Sales")
print(time_series_plot)

# Train-test split (80-20 split)

set.seed(123) # For reproducibility
train_indices <- sample(1:nrow(sales_data), 0.8 * nrow(sales_data))
train_data <- sales_data[train_indices, ]
test_data <- sales_data[-train_indices, ]

# Simple linear regression model

sales_lm <- lm(sales ~ date, data = train_data)

# Summary of the linear regression model

summary(sales_lm)

# Predictions on test data

predicted_sales <- predict(sales_lm, newdata = test_data)

# Evaluate model performance

rmse <- sqrt(mean((predicted_sales - test_data$sales)^2))
cat("Root Mean Squared Error (RMSE):", rmse, "\n")

# Plot actual vs. predicted sales

actual_vs_predicted_plot <- ggplot() +
geom_line(data = test_data, aes(x = date, y = sales), color = "blue", linetype = "solid") +

geom_line(data = test_data, aes(x = date, y = predicted_sales), color = "red", linetype = "dashed") +
labs(title = "Actual vs. Predicted Sales", x = "Date", y = "Sales")
print(actual_vs_predicted_plot)

Output:
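
For time-ordered data such as sales, a chronological split (train on the earlier part, test on the later part) is often preferred to a random split, because it mimics forecasting the future from the past. The sketch below uses a small hypothetical series; the column names day and sales and all values are assumptions made for illustration.

```r
# Hypothetical daily sales with an upward trend and small fluctuations
sales_sample <- data.frame(day = 1:20)
sales_sample$sales <- 100 + 5 * sales_sample$day + rep(c(-1, 1), times = 10)

# Chronological 80-20 split: the first 80% of days for training, the rest for testing
n_train <- floor(0.8 * nrow(sales_sample))
train_data <- sales_sample[seq_len(n_train), ]
test_data <- sales_sample[(n_train + 1):nrow(sales_sample), ]

# Fit on the earlier days and predict the later days
sales_lm <- lm(sales ~ day, data = train_data)
predicted_sales <- predict(sales_lm, newdata = test_data)

# Root mean squared error on the held-out days
rmse <- sqrt(mean((predicted_sales - test_data$sales)^2))
cat("Root Mean Squared Error (RMSE):", rmse, "\n")
```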

Program NO. : 12
Problem Statement: Apply Predictive analytics for Weather forecasting

Source Code:

# Load necessary libraries

library(forecast) # for time series forecasting

# Load weather data (sample data)

weather_data <- read.csv("weather_data.csv")

# Convert date column to Date type

weather_data$date <- as.Date(weather_data$date)

# View the structure of the dataset

str(weather_data)

# Summary statistics
summary_stats <- summary(weather_data)
print(summary_stats)

# Data visualization - Time series plot of temperature
# (base plot() draws directly and returns NULL, so there is nothing to store or print)

plot(weather_data$date, weather_data$temperature,
type = "l", xlab = "Date", ylab = "Temperature",
main = "Time Series Plot of Temperature")

# Create time series object

weather_ts <- ts(weather_data$temperature, frequency = 365)

# Fit ARIMA model

arima_model <- auto.arima(weather_ts)

# Forecast for the next 7 days

forecast_result <- forecast(arima_model, h = 7)

# Plot the forecast

plot(forecast_result, main = "Forecast for Next 7 Days")

# Print forecasted values

print(forecast_result)
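
ts() assumes equally spaced observations in time order, so it is worth checking the dates before building the time series. The sketch below uses a small hypothetical data frame with the same column names as the program above; the values are made up for illustration.

```r
# Hypothetical weather records, deliberately out of order and with one missing day
weather_sample <- data.frame(
  date = c("2024-01-03", "2024-01-01", "2024-01-02", "2024-01-05"),
  temperature = c(21.5, 19.0, 20.2, 22.1)
)

# Convert to Date type and sort chronologically
weather_sample$date <- as.Date(weather_sample$date)
weather_sample <- weather_sample[order(weather_sample$date), ]

# Number of days between consecutive records; anything other than 1 is a gap
day_gaps <- as.numeric(diff(weather_sample$date))
has_gaps <- any(day_gaps != 1)

print(weather_sample)
cat("Gaps in the daily series:", has_gaps, "\n")
```

If gaps are found, the missing days can be filled (for example with one of the imputation methods from Program 1) before the series is passed to ts().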
Output:
