Data Wrangling (Data Preprocessing)

The document discusses data wrangling and preprocessing. It generates three synthetic datasets - a sales dataset with 150 rows, a customer dataset with 200 rows, and an inventory dataset with 200 rows. Each dataset contains randomly generated data along with missing values and outliers introduced. The datasets are exported to CSV files. Next steps mentioned are merging the datasets, checking the structure of the combined data, generating summary statistics, and scanning for missing values. However, the code blocks provided are empty and explanations for each step are missing.

Uploaded by

Siddharth Raul

9/18/23, 7:29 PM Data Wrangling (Data Preprocessing)

Data Wrangling (Data Preprocessing) Code

Mid-term assessment
Siddharth Dinkar Raul (s4015125)
18-09-2023

Setup

# Load the necessary packages required to reproduce the report.

library(tibble)
library(dplyr)
library(lubridate)
library(stringr) # provides the `words` vector sampled for product names below

Data generation

file:///C:/Users/SIDDHARTH/Downloads/Data Wrangling 2/Mid-term-Assessment-Rmarkdown-Template.nb.html 1/4


# Data generation, provide your R codes

# Generating date range

start_date <- as.Date("2023-01-01")
end_date <- as.Date("2023-12-31")
date_range <- seq(start_date, end_date, by = "days")

# Setting the seed

set.seed(285)

# Creating the first dataset (sales dataset)

sales_data <- tibble(
date = sample(date_range, 150, replace = TRUE),
product_id = sample(1:200, 150, replace = TRUE),
product_name = as.character(replicate(150, paste(sample(words, 2), collapse = " "))),
quantity_sold = as.numeric(sample(1:20, 150, replace = TRUE)),
price = as.numeric(runif(150, min = 50, max = 500)),
customer_id = as.factor(sample(1:500, 150, replace = TRUE)),
store_id = as.factor(sample(1:5, 150, replace = TRUE)) # Common variable "store_id"
)

# Introducing missing values in the "price" column (5 of 150 rows, about 3%)

sales_data[sample(1:150, 5), "price"] <- NA

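# Introducing outliers: pick the rows once, then scale those same rows
# (a price outlier that lands on an NA row stays NA)

qty_outlier_rows <- sample(1:150, 5)
sales_data$quantity_sold[qty_outlier_rows] <- sales_data$quantity_sold[qty_outlier_rows] * 10

price_outlier_rows <- sample(1:150, 5)
sales_data$price[price_outlier_rows] <- sales_data$price[price_outlier_rows] * 2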

# Exporting to CSV
write.csv(sales_data, "sales_data.csv", row.names = FALSE)

# Creating the second dataset (customer dataset)

set.seed(286)

customer_data <- tibble(

customer_id = as.factor(1:200),
customer_name = as.character(replicate(200, paste(sample(LETTERS, 5), collapse = ""))),
email = as.character(paste0(replicate(200, paste(sample(letters, 5), collapse = "")), "@example.com")),
total_purchases = as.numeric(sample(100:1000, 200, replace = TRUE)),
is_member = as.logical(sample(c(TRUE, FALSE), 200, replace = TRUE, prob = c(0.6, 0.4))),
store_id = as.factor(sample(1:5, 200, replace = TRUE)) # Common variable "store_id"
)

# Introduce missing values in the "email" column (approximately 5%)

customer_data[sample(1:200, 10), "email"] <- NA

# Export to CSV
write.csv(customer_data, "customer_data.csv", row.names = FALSE)

# Creating the third dataset (inventory dataset)

set.seed(789)
inventory_data <- tibble(
product_id = as.factor(1:200),
product_name = as.character(replicate(200, paste(sample(words, 2), collapse = " "))),
stock_level = as.numeric(sample(1:100, 200, replace = TRUE)),
supplier = as.character(replicate(200, paste(sample(LETTERS, 3), collapse = ""))),
cost_price = as.numeric(runif(200, min = 50, max = 200)),
selling_price = as.numeric(runif(200, min = 100, max = 500)),
store_id = as.factor(sample(1:5, 200, replace = TRUE)) # Common variable "store_id"
)

# Introduce missing values in the "stock_level" column (approximately 5%)

inventory_data[sample(1:200, 10), "stock_level"] <- NA

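# Introduce outliers: pick the rows once, then scale those same rows
cost_outlier_rows <- sample(1:200, 5)
inventory_data$cost_price[cost_outlier_rows] <- inventory_data$cost_price[cost_outlier_rows] * 0.5
selling_outlier_rows <- sample(1:200, 5)
inventory_data$selling_price[selling_outlier_rows] <- inventory_data$selling_price[selling_outlier_rows] * 2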

# Export to CSV
write.csv(inventory_data, "inventory_data.csv", row.names = FALSE)

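Three synthetic datasets are generated above: a sales dataset (150 rows), a customer dataset (200 rows), and an inventory dataset (200 rows). Each one carries a store_id factor with values 1 to 5 as a common variable; the sales data also shares product_id with the inventory data and customer_id with the customer data. Missing values are introduced into price, email, and stock_level, and outliers are introduced by scaling a handful of quantity_sold, price, cost_price, and selling_price values. A seed is set before each dataset so the results are reproducible, and each dataset is exported to its own CSV file.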

Merging data sets


# Merge your synthetic data sets, provide R codes here.
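The template leaves this step empty. One possible approach, sketched here under stated assumptions, is to left-join the customer and inventory tables onto the sales table by their ID columns. The `*_demo` tibbles and the `to_chr()` helper are hypothetical stand-ins that only mirror the key columns of the generated datasets; they are not part of the original report.

```r
library(tibble)
library(dplyr)

# Hypothetical miniature versions of the three datasets (illustration only).
sales_demo <- tibble(
  product_id    = c(1L, 2L, 3L),            # integer, as in sales_data
  customer_id   = factor(c(10, 20, 999)),   # 999 has no customer record
  store_id      = factor(c(1, 2, 1)),
  quantity_sold = c(5, 12, 3)
)
customer_demo <- tibble(
  customer_id = factor(c(10, 20, 30)),
  is_member   = c(TRUE, FALSE, TRUE),
  store_id    = factor(c(1, 2, 3))
)
inventory_demo <- tibble(
  product_id  = factor(1:3),                # factor, as in inventory_data
  stock_level = c(40, NA, 75),
  store_id    = factor(c(1, 1, 2))
)

# Bring the join keys to one common type first: product_id is an integer in
# the sales data but a factor in the inventory data.
to_chr <- function(df) {
  mutate(df, across(any_of(c("product_id", "customer_id")), as.character))
}

# Keep every sales row; unmatched customers or products yield NA columns.
merged_demo <- to_chr(sales_demo) %>%
  left_join(to_chr(customer_demo), by = "customer_id",
            suffix = c("", "_customer")) %>%
  left_join(to_chr(inventory_demo), by = "product_id",
            suffix = c("", "_inventory"))

merged_demo
```

Joining on store_id alone is avoided in this sketch because each dataset assigns store_id independently from only five values, so such a join would pair every row with many unrelated rows. Note also that the sales data draws customer_id from 1:500 while the customer data only covers 1:200, so a left join on the real data would leave many customer columns as NA.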

Provide explanations here.

Checking structure of combined data


# Check structure of combined data and perform all necessary data type conversions, provide R codes here.
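As a minimal sketch of this step, the example below inspects a small table and converts each column to a suitable type. The `combined_demo` tibble is a hypothetical stand-in in which every column is stored as text, roughly the situation that can arise after merging or re-importing data; the column names follow the generated datasets, but the values are invented.

```r
library(tibble)
library(dplyr)
library(lubridate)

# Hypothetical combined data with every column stored as character.
combined_demo <- tibble(
  date          = c("2023-01-15", "2023-06-30", "2023-12-01"),
  product_id    = c("1", "2", "3"),
  store_id      = c("1", "2", "1"),
  quantity_sold = c("5", "12", "3"),
  is_member     = c("TRUE", "FALSE", NA)
)

# Inspect the structure before converting anything.
str(combined_demo)
glimpse(combined_demo)

# Convert each column to the type its content calls for.
converted_demo <- combined_demo %>%
  mutate(
    date          = ymd(date),                 # character -> Date
    product_id    = as.factor(product_id),     # identifier -> factor
    store_id      = as.factor(store_id),
    quantity_sold = as.numeric(quantity_sold), # count -> numeric
    is_member     = as.logical(is_member)      # "TRUE"/"FALSE" -> logical
  )

# Confirm the resulting classes.
sapply(converted_demo, class)
```

Identifiers are treated as factors here because they label categories rather than measure quantities, which matches how store_id is created in the data generation code.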

Provide explanations here.

Generate summary statistics


# Generate summary statistics, provide R codes here.
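One way this step could look, shown as a sketch rather than the author's solution, is an overall `summary()` followed by per-store statistics with dplyr. The `stats_demo` tibble is a hypothetical stand-in with one missing price and one inflated quantity, similar in spirit to the NA values and outliers introduced earlier.

```r
library(tibble)
library(dplyr)

# Hypothetical data: one missing price, one outlying quantity (200).
stats_demo <- tibble(
  store_id      = factor(c(1, 1, 2, 2, 2)),
  quantity_sold = c(4, 6, 10, 200, 10),
  price         = c(100, NA, 300, 250, 50)
)

# Overall summary: min, quartiles, mean, max, and NA counts per column.
summary(stats_demo)

# Grouped summary per store; na.rm = TRUE skips the missing prices.
store_summary <- stats_demo %>%
  group_by(store_id) %>%
  summarise(
    n_rows          = n(),
    mean_price      = mean(price, na.rm = TRUE),
    median_quantity = median(quantity_sold),
    max_quantity    = max(quantity_sold),
    .groups = "drop"
  )

store_summary
```

Reporting the median next to the maximum is a deliberate choice in this sketch: the median is barely moved by a single extreme value, so a large gap between the two hints at an outlier.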

Provide explanations here.

Scanning data

# Scan variables for missing values, provide R codes here.
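A minimal sketch of a missing-value scan follows, assuming a small hypothetical tibble (`scan_demo`) that reuses the three column names into which the data generation code injects NA values. It counts missing values per column, expresses them as a share of rows, and pulls out the affected rows.

```r
library(tibble)
library(dplyr)

# Hypothetical data with a few missing values.
scan_demo <- tibble(
  price       = c(100, NA, 300, NA),
  email       = c("abcde@example.com", NA, "fghij@example.com", "klmno@example.com"),
  stock_level = c(40, 75, 12, 8)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(scan_demo))
na_counts

# Share of missing values in each column (0 = none, 1 = all).
na_share <- scan_demo %>%
  summarise(across(everything(), ~ mean(is.na(.x))))
na_share

# Rows that contain at least one missing value.
rows_with_na <- scan_demo %>%
  filter(if_any(everything(), is.na))
rows_with_na
```

Looking at the share as well as the count makes columns of different lengths comparable and helps decide whether dropping or imputing the affected rows is the more reasonable next step.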

Provide explanations here.
