DV Unit 2 Update

Unit 2 covers data manipulation and visualization using R, emphasizing the importance of organizing data for better interpretation. It introduces various packages like dplyr, ggplot2, and data.table for efficient data handling and visualization techniques such as scatter plots and bar charts. Additionally, it discusses Watson Studio's features for collaborative data analysis and machine learning model development.

Uploaded by

krunalsawarkar2004

Unit 2

Data Manipulation with R

• Data manipulation is the process of organizing or arranging data
in order to make it easier to interpret.
• It is also called ‘data exploration’.
• It involves manipulating data using the available set of variables.
This is done to enhance the accuracy and precision associated with
the data.
• The data collection process can have many loopholes. Various
uncontrollable factors lead to inaccuracy in data, such as the
mental state of respondents, personal biases, and differences or
errors in machine readings. Data manipulation is done to
increase the achievable accuracy of the data.

Different Ways to Manipulate / Treat Data

1. Manipulating data using built-in base R functions
2. Using packages for data manipulation
3. Using ML (machine learning) algorithms for data manipulation
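
As a minimal sketch of the first approach (built-in base R functions), the example below filters, sorts, derives a column and summarizes a small data frame. The data frame `scores` and its columns are hypothetical, invented only for illustration.

```r
# Hypothetical sample data, invented for illustration
scores <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  dept  = c("CS", "IT", "CS", "IT", "CS"),
  marks = c(82, 45, 67, 91, 58)
)

# Keep only the rows that satisfy a condition
passed <- subset(scores, marks >= 50)

# Sort rows by marks, highest first
ranked <- scores[order(-scores$marks), ]

# Derive a new column from an existing one
scores$result <- ifelse(scores$marks >= 50, "pass", "fail")

# Summarize: mean marks per department
dept_mean <- aggregate(marks ~ dept, data = scores, FUN = mean)
print(dept_mean)
```

All of these functions ship with R itself, so no package installation is needed.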

You can install a package using

install.packages("package name")

List of Packages
1. dplyr
2. data.table
3. ggplot2
4. reshape2
5. readr
6. tidyr
7. lubridate

1. dplyr Package: This package was created and is maintained by
Hadley Wickham.
• This package has everything needed to accelerate data manipulation
efforts.
• It includes 5 major data manipulation commands:
1. filter(): Filters the rows of the data based on a condition.
2. select(): Selects the columns of interest from a data set.
3. arrange(): Arranges data set values in ascending or
descending order.
4. mutate(): Creates a new variable from existing
variables.
5. summarise(): Summarizes data using commonly used
operations such as min, max, mean, etc.
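
The five commands can be sketched as follows, assuming the dplyr package is installed. The `sales` data frame and its column names are hypothetical and exist only to give each verb something to act on.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration
sales <- data.frame(
  region = c("East", "West", "East", "North", "West"),
  units  = c(10, 25, 40, 5, 30),
  price  = c(2.5, 3.0, 2.0, 4.0, 3.5)
)

big_orders <- filter(sales, units >= 10)           # rows meeting a condition
two_cols   <- select(sales, region, units)         # columns of interest
sorted     <- arrange(sales, desc(units))          # descending order
with_rev   <- mutate(sales, revenue = units * price)  # new variable
stats      <- summarise(sales,
                        min_units  = min(units),
                        max_units  = max(units),
                        mean_units = mean(units))
print(stats)
```

Each verb takes a data frame as its first argument and returns a new data frame, leaving `sales` unchanged.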

2. data.table Package:
• This package allows you to perform faster manipulation of a data
set.
• data.table helps reduce computing time compared to
data.frame.
• A data.table query has 3 parts, written DT[i, j, by].
• Read it as: subset the rows using ‘i’, then calculate ‘j’, grouped by
‘by’.
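
A minimal sketch of the DT[i, j, by] form, assuming the data.table package is installed. The table `DT` and its columns are hypothetical sample data.

```r
library(data.table)

# Hypothetical sample data, invented for illustration
DT <- data.table(
  region = c("East", "West", "East", "North", "West"),
  units  = c(10, 25, 40, 5, 30)
)

# i:  keep rows with units >= 10
# j:  compute the total units
# by: one result row per region
totals <- DT[units >= 10, .(total_units = sum(units)), by = region]
print(totals)
```

Here `.()` is data.table's shorthand for `list()`, used to name the computed column.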

3. ggplot2 Package: ggplot2 offers a set of colors and
patterns.
• It provides functions to plot graphs such as scatter plots, bar
plots, histograms, etc.
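
As an illustration, a scatter plot in ggplot2 might be built like this, assuming the package is installed. The data frame `df` is hypothetical sample data.

```r
library(ggplot2)

# Hypothetical sample data, invented for illustration
df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2, 4, 6, 8, 10))

# Map columns to axes, then add a layer of points and some labels
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(colour = "blue") +
  labs(title = "Scatter plot", x = "x-axis", y = "y-axis")

print(p)  # draws the plot on the active graphics device
```

Swapping `geom_point()` for `geom_col()` or `geom_histogram()` (with a suitable mapping) gives bar plots and histograms in the same style.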
4. reshape2 Package:
• This package is useful for reshaping data.
• It has two main functions, melt and cast:
1. melt(): Converts data from wide format to long
format.
2. cast: Converts data from long format to wide
format. In reshape2 this is done with dcast() (returns a data
frame) or acast() (returns a vector, matrix or array).
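
A small round trip from wide to long and back can be sketched as below, assuming reshape2 is installed. The `wide` data frame is hypothetical sample data.

```r
library(reshape2)

# Hypothetical sample data: one row per student, one column per subject
wide <- data.frame(id = c(1, 2), math = c(90, 80), science = c(85, 75))

# Wide -> long: one row per (id, subject) pair
long <- melt(wide, id.vars = "id",
             variable.name = "subject", value.name = "score")
print(long)

# Long -> wide again: one column per subject
wide_again <- dcast(long, id ~ subject, value.var = "score")
print(wide_again)
```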
5. readr Package:
• ‘readr’ helps in reading various forms of data into R.
• This package can replace the traditional read.csv() and
read.table() base R functions. It reads:
1. Web log files with read_log()
2. Fixed-width files with read_fwf(), and whitespace-separated
files with read_table()
3. Comma-separated files with read_csv(), and semicolon-separated
files with read_csv2()
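
To illustrate read_csv(), the sketch below writes a tiny CSV to a temporary file and reads it back, assuming readr is installed. The file contents and column names are hypothetical.

```r
library(readr)

# Write a small, hypothetical CSV file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("month,articles", "jan,17", "feb,2", "mar,8"), tmp)

# Read it back; col_types = cols() lets readr guess the column types quietly
articles <- read_csv(tmp, col_types = cols())
print(articles)

unlink(tmp)  # remove the temporary file
```

Unlike read.csv(), read_csv() returns a tibble and does not convert strings to factors.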
6. tidyr Package:
• This package helps make your data tidy.
• It has four main functions:
1. gather(): It ‘gathers’ multiple columns and converts
them into key:value pairs, taking data from wide form to long form.
2. spread(): It is the reverse of gather(). It takes key:value pairs
and converts them into separate columns.
3. separate(): It splits a column into multiple columns.
4. unite(): It does the reverse of separate(). It unites multiple
columns into a single column.
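
The four functions can be sketched as follows, assuming tidyr is installed. The data frames are hypothetical. Note that current tidyr releases mark gather() and spread() as superseded by pivot_longer() and pivot_wider(), although the older functions still work.

```r
library(tidyr)

# Hypothetical sample data, invented for illustration
wide <- data.frame(id = c(1, 2), math = c(90, 80), science = c(85, 75))

# gather(): wide -> long, producing key/value columns
long <- gather(wide, key = "subject", value = "score", math, science)

# spread(): long -> wide, the reverse of gather()
back <- spread(long, key = subject, value = score)

people <- data.frame(name = c("Asha_Rao", "Ravi_Kumar"),
                     stringsAsFactors = FALSE)

# separate(): split one column into several
parts <- separate(people, name, into = c("first", "last"), sep = "_")

# unite(): join several columns into one, the reverse of separate()
joined <- unite(parts, "name", first, last, sep = "_")
print(joined)
```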

7. lubridate Package: It is used to work with date-time
variables in R.
• The built-in functions of this package provide an easy way to parse
dates and times.
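
A brief sketch of parsing and working with dates, assuming lubridate is installed. The specific dates are arbitrary examples.

```r
library(lubridate)

# Parse dates written in different orders
d1 <- ymd("2024-01-15")   # year-month-day
d2 <- dmy("15/01/2024")   # day-month-year

# Extract components
print(year(d1))
print(month(d1))

# Date arithmetic: add 30 days
due <- d1 + days(30)
print(due)

# Parse a date-time (UTC unless a time zone is given)
ts <- ymd_hms("2024-01-15 14:30:00")
print(hour(ts))
```

The parser name simply lists the order of the components in the input, which is what makes these functions easy to remember.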

Data Visualization with R

• Data visualization is a way to explore, analyze and present data in
a visual format.
• R provides a wide range of libraries and packages specifically
designed for creating various types of data visualization.
• Some commonly used packages for data visualization in R
include ggplot2, plotly, lattice, and base graphics.
1) Scatter plot:
• A scatter plot is a set of dotted points representing
individual data pieces on the horizontal and vertical axes.
• In a graph in which the values of two variables are
plotted along the x-axis and y-axis, the pattern of the resulting
points reveals any correlation between them.
R-Scatter Plot
• In R, a scatter plot is created using the plot() function.
Syntax: plot(x, y, main, xlab, ylab, xlim, ylim, axes)

Parameters
1. x: This parameter sets the horizontal coordinates.
2. y: This parameter sets the vertical coordinates.
3. xlab: This parameter is the label for the horizontal axis.
4. ylab: This parameter is the label for the vertical axis.
5. main: This parameter is the title of the chart.
6. xlim: This parameter sets the limits of the x values used for plotting.
7. ylim: This parameter sets the limits of the y values used for plotting.
8. axes: This parameter indicates whether both axes
should be drawn on the plot.
Ex: x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
plot(x, y, main = "scatter plot", xlab = "x-axis",
ylab = "y-axis")
[Scatter plot diagram]
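
The xlim and ylim parameters, and the idea that the pattern of points reveals a correlation, can be illustrated with a short sketch. The vectors `hours` and `score` are hypothetical data, and cor() is used here only as one way to put a number on the pattern.

```r
# Hypothetical sample data, invented for illustration
hours <- c(1, 2, 3, 4, 5)
score <- c(52, 58, 61, 70, 74)

# xlim and ylim fix the range shown on each axis
plot(hours, score,
     main = "Study hours vs score",
     xlab = "hours", ylab = "score",
     xlim = c(0, 6), ylim = c(40, 80))

# Pearson correlation coefficient: values near 1 indicate a strong
# positive linear relationship
r <- cor(hours, score)
print(r)
```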

R-Bar chart
• A bar chart is a pictorial representation of data that
presents categorical data with rectangular bars whose
heights or lengths are proportional to the values that they
represent.
• R uses the function barplot() to create bar charts.
Syntax:
barplot(h, xlab, ylab, main, names.arg, col)

Parameters:
h: This parameter is a vector or matrix containing the numeric values
used in the bar chart (barplot() itself names this argument height).
xlab: This parameter is the label for the x-axis.
ylab: This parameter is the label for the y-axis.
main: This parameter is the title of the bar chart.
names.arg: This parameter is a vector of names appearing under each
bar in the bar chart.
col: This parameter is used to give colors to the bars in the graph.
Ex: A <- c(17, 32, 8, 53, 1)
barplot(A, xlab = "x-axis", ylab = "y-axis", main = "Bar chart")
[Bar chart diagram]

Horizontal Bar chart

Set horiz = TRUE to draw the bars horizontally.
Ex: A <- c(17, 32, 8, 53, 1)
barplot(A, horiz = TRUE, xlab = "x-axis", ylab = "y-axis",
main = "barchart")
[Bar chart diagram]

Ex: A <- c(17, 2, 8, 13, 1, 22)
B <- c("jan", "feb", "mar", "apr", "may", "jun")
barplot(A, names.arg = B, xlab = "month", ylab = "Articles",
col = "green", main = "Article chart")
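
Since the first argument of barplot() may also be a matrix, a grouped bar chart can be sketched as below. The matrix `counts`, its row and column names, and the use of the `beside` and `legend.text` arguments are illustrative choices, not part of the examples above.

```r
# Hypothetical data: articles per month for two years
counts <- matrix(c(17, 2, 8, 13, 1, 22), nrow = 2,
                 dimnames = list(c("2023", "2024"),
                                 c("jan", "feb", "mar")))

# beside = TRUE draws the rows side by side within each column group;
# with the default beside = FALSE the bars would be stacked instead
mids <- barplot(counts, beside = TRUE,
                col = c("green", "blue"),
                legend.text = rownames(counts),
                xlab = "month", ylab = "Articles",
                main = "Articles by month and year")
```

barplot() returns the bar midpoints (here as a matrix), which is useful for adding text or lines above the bars afterwards.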

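R-Histograms
• In R, the hist() function is used to plot a histogram.
Syntax: hist(v, main, xlab, xlim, ylim, breaks, col, border)

Parameters:
v: This parameter contains the numerical values used in the histogram.
main: This parameter is the title of the chart.
col: This parameter is used to set the color of the bars.
xlab: This parameter is the label for the horizontal axis.
border: This parameter is used to set the border color of each bar.
xlim: This parameter sets the range of values on the x-axis.
ylim: This parameter sets the range of values on the y-axis.
breaks: This parameter controls how the values are divided into bins,
which determines the width of each bar.
Ex: v <- c(19, 23, 11, 6, 16, 21, 32, 14, 19, 27, 39)
hist(v, xlab = "no. of Articles", ylab = "frequency", col = "green",
border = "black")

[Histogram diagram]
A histogram is used to show the frequency distribution of data.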
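
To see how breaks shapes the frequency distribution, the sketch below reuses the vector `v` from the example and supplies explicit bin edges. The choice of edges at 0, 10, 20, 30 and 40 is an assumption made for illustration.

```r
v <- c(19, 23, 11, 6, 16, 21, 32, 14, 19, 27, 39)

# Explicit bin edges: each bar covers a range of width 10
h <- hist(v, breaks = seq(0, 40, by = 10),
          main = "Histogram of articles",
          xlab = "no. of Articles", ylab = "frequency",
          col = "green", border = "black")

# hist() also returns the computed bins and counts
print(h$breaks)
print(h$counts)
```

The returned counts are the bar heights, so they give the frequency distribution as numbers as well as a picture.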
Watson Studio
• It is an integrated development environment (IDE) provided by
IBM that allows data scientists and developers to work collaboratively
on data analysis and machine learning projects.
• It is a part of the IBM Watson suite of AI and machine learning
tools.

Key features of Watson Studio
1) Data preparation- Watson Studio allows users to access and explore data
from various sources, including databases, cloud storage
and streaming data. It provides tools for data cleansing,
transformation and enrichment.
2) Notebook- Data scientists can create Jupyter notebooks within
Watson Studio to perform data analysis and data visualization and to
build and train machine learning models.
3) Model development- Watson Studio provides tools and libraries
for building and training machine learning models. Users
can use different frameworks such as TensorFlow, PyTorch,
scikit-learn, etc.
4) Model deployment- After creating and training a machine
learning model, Watson Studio allows users to deploy the model
as a web service or API, making it accessible for real-time
predictions and integration with other applications.
5) AutoAI- Watson Studio offers AutoAI, an automated machine learning
capability that helps users quickly build and deploy machine
learning models without the need for extensive manual
configuration.
6) Collaboration- Teams can collaborate on projects within Watson
Studio, share code, notebooks and data, and manage access
controls to ensure secure collaboration.
7) Model monitoring- Watson Studio provides tools to monitor the
performance of deployed machine learning models and track
their usage over time.
Applications of Watson Studio-
1) Data analysis and exploration
2) Predictive analysis
3) Natural language processing (NLP): Watson Studio can be used
for sentiment analysis, language translation and chatbot
development.
4) Image and video analysis
5) Social media analysis
6) Agriculture and environment monitoring
7) Education
8) Healthcare and life sciences
9) Customer segmentation and marketing

Data Refinery-
• Data refinery is a process which collects data (from
disparate sources), enriches it (by blending different data
sets) and creates an integrated, refined data repository which
can be used for analysis in order to take action.
Collect → Enrich → Refined data repository → Analyze → Act

Steps to visualize data in Watson Studio

1) On the Assets tab of your project, click Data asset in the list of
asset types and select a data asset.
2) Click the Visualization tab.
3) Select the chart type from the options that are listed and input
your preferences in the graphical options pane.
Available chart types are ordered from most relevant to least
relevant, based on the selected columns.
If there are no columns in the data set with a data type that is
supported for a chart type, that chart will not be available. If a
column's data type is not supported for a chart, that column is
not available for selection for that chart. Dots next to the
chart names suggest the best charts for your data.
As you are building the chart, the canvas displays a preview of
the chart. The preview uses the actual variable labels and
measurement levels that are representative of your actual data.
4) To save your visualization, select Actions > Save visualization to
project.
Your saved visualization is listed as a visualization asset in your
project.
You can view or edit the visualization by clicking its name in
the visualization assets of your project.
