DV Unit 2 Update
2. data.table Package:
• This package lets you manipulate large data sets quickly.
• data.table reduces computing time compared with data.frames.
• A data.table query has three parts: DT[i, j, by].
• Subset the rows using 'i', compute 'j' on that subset, and group the result by 'by'.
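The DT[i, j, by] form can be illustrated with a small sketch. It assumes the data.table package is installed; the table, its column names (region, sales) and values are made up for illustration.

```r
library(data.table)

# Hypothetical example data
DT <- data.table(
  region = c("North", "South", "North", "East", "South"),
  sales  = c(100, 250, 175, 90, 300)
)

# i  : keep only rows where sales > 95
# j  : compute the total of sales for those rows
# by : do the computation separately for each region
res <- DT[sales > 95, .(total = sum(sales)), by = region]
print(res)
```

Because the East row (sales = 90) is filtered out by i, only North and South appear in the result.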
Scatter plot parameters (plot() function):
1. x:- This parameter sets the horizontal coordinates.
2. y:- This parameter sets the vertical coordinates.
3. xlab:- This parameter is the label for the horizontal axis.
4. ylab:- This parameter is the label for the vertical axis.
5. main:- This parameter is the title of the chart.
6. xlim:- This parameter sets the range of x values plotted, given as c(min, max).
7. ylim:- This parameter sets the range of y values plotted, given as c(min, max).
8. axes:- This parameter indicates whether both axes should be drawn on the plot.
Ex:
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
plot(x, y, main = "scatter plot", xlab = "x-axis", ylab = "y-axis")
(Figure: scatter plot)
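The xlim, ylim and axes parameters can be seen together in a minimal sketch using five illustrative points. With axes = FALSE nothing is drawn around the points, so the axes and frame are added back by hand with axis() and box().

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

# xlim/ylim take c(min, max); axes = FALSE suppresses both axes
plot(x, y,
     main = "scatter plot",
     xlab = "x-axis", ylab = "y-axis",
     xlim = c(0, 6), ylim = c(0, 12),
     axes = FALSE)

axis(1)  # draw the horizontal (x) axis
axis(2)  # draw the vertical (y) axis
box()    # draw the frame around the plotting region
```

Widening xlim and ylim beyond the data range leaves empty space around the points, which is useful when several plots must share the same scale.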
R-Bar chart
A bar chart is a pictorial representation of categorical data that uses rectangular bars whose heights or lengths are proportional to the values they represent.
R uses the function barplot() to create bar charts.
Syntax:
barplot(H, xlab, ylab, main, names.arg, col)
Parameters:
H:- This parameter is a vector or matrix containing the numeric values used in the bar chart.
xlab:- This parameter is the label for the x-axis.
ylab:- This parameter is the label for the y-axis.
main:- This parameter is the title of the bar chart.
names.arg:- This parameter is a vector of names appearing under each bar in the bar chart.
col:- This parameter gives colors to the bars in the graph.
Ex:
a <- c(17, 32, 8, 53, 1)
barplot(a, xlab = "x-axis", ylab = "y-axis", main = "Bar chart")
(Figure: bar chart)
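The names.arg and col parameters are not used in the example above; a minimal sketch that adds them is shown here. The category labels "A" to "E" and the color are hypothetical choices for illustration.

```r
a <- c(17, 32, 8, 53, 1)

# barplot() invisibly returns the x positions (midpoints) of the bars
mids <- barplot(a,
                names.arg = c("A", "B", "C", "D", "E"),  # hypothetical labels
                col = "steelblue",
                xlab = "x-axis", ylab = "y-axis",
                main = "Bar chart")

# The midpoints can be used to write each value above its bar
text(mids, a, labels = a, pos = 3)
```

names.arg must have one entry per bar, so its length should match the length of the height vector.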
R-Histograms
In R, the hist() function is used to plot a histogram.
Syntax: hist(v, main, xlab, xlim, ylim, breaks, col, border)
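Parameters:
v:- This parameter contains the numerical values used in the histogram.
main:- This parameter is the title of the chart.
col:- This parameter sets the color of the bars.
xlab:- This parameter is the label for the horizontal axis.
border:- This parameter sets the border color of each bar.
xlim:- This parameter sets the range of x values, given as c(min, max).
ylim:- This parameter sets the range of y values, given as c(min, max).
breaks:- This parameter controls the bin boundaries, and therefore the width of each bar. It can be a vector of break points or a single number suggesting how many bins to use.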
Ex:
v <- c(19, 23, 11, 6, 16, 21, 32, 14, 19, 27, 39)
hist(v, xlab = "no. of Articles", ylab = "frequency", col = "green", border = "black")
(Figure: histogram)
A histogram shows the frequency distribution of numeric data.
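To see how breaks sets the bar width, the sketch below passes explicit bin edges every 10 units for the same article counts. The title text is an illustrative choice. hist() also returns the binned data it plotted, which makes the frequency distribution visible as numbers.

```r
v <- c(19, 23, 11, 6, 16, 21, 32, 14, 19, 27, 39)

# Explicit bin edges: every bar is 10 units wide
h <- hist(v,
          breaks = c(0, 10, 20, 30, 40),
          main = "Articles histogram",
          xlab = "no. of Articles", ylab = "frequency",
          col = "green", border = "black")

print(h$breaks)  # bin edges used for the bars
print(h$counts)  # how many values fall in each bin
```

By default the bins are closed on the right, [0,10], (10,20], (20,30] and (30,40], so the counts are 1, 5, 3 and 2.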
Watson Studio
It is an integrated development environment (IDE) provided by IBM that allows data scientists and developers to work collaboratively on data analysis and machine learning projects.
It is part of the IBM Watson suite of AI and machine learning tools.
Data Refinery
Data refinery is a process that collects data (from disparate sources), enriches it (by blending different data sets), and creates an integrated, refined data repository that can be analyzed to decide on actions.
Collect → Enrich → Refined data repository → Analyze → Act
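The collect, enrich, refine and analyze stages can be mimicked in plain R as an analogy. This is not Watson Studio's Data Refinery tool, only a base-R sketch; the two sources, their columns and values are hypothetical.

```r
# Collect: two hypothetical sources that share an "id" column
customers <- data.frame(id = c(1, 2, 3), name = c("Asha", "Ben", "Chen"))
orders    <- data.frame(id = c(1, 1, 3, 4), amount = c(120, NA, 80, 50))

# Enrich: blend the sources on the shared key (inner join)
blended <- merge(customers, orders, by = "id")

# Refine: drop rows with missing values
refined <- blended[complete.cases(blended), ]

# Analyze: total amount per customer
summary_tbl <- aggregate(amount ~ name, data = refined, FUN = sum)
print(summary_tbl)
```

The inner join keeps only ids present in both sources, and the refine step removes the incomplete order, leaving a clean table ready for analysis.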
4) To save your visualization, select Actions > Save visualization to project.
Your saved visualization is listed as a visualization asset in your
project.
You can view or edit the visualization by clicking its name in the visualization assets of your project.