QB Samplealllllll Hemu
NO Questions
1. What is R? Give the features of R. What are the limitations of R? Explain.
• R is a language and environment for statistical computing and graphics.
• R provides a wide variety of statistical and graphical techniques, and is
highly extensible.
• R programming is a leading tool for machine learning, statistics, and data
analysis.
• An environment in R is a data structure that stores objects such as variables,
functions, and data frames.
• One of R’s strengths is the ease with which well-designed publication-
quality plots can be produced, including mathematical symbols and
formulae where needed.
• R is available as Free Software under the terms of the Free Software
Foundation’s GNU General Public License in source code form. It compiles and
runs on a wide variety of UNIX platforms and similar systems (including Linux),
Windows and macOS.
• R is a true computer language: it allows users to add additional functionality
by defining new functions.
• Much of the system is itself written in R. R can be linked to C, C++
and Fortran code and call this code at run time. Advanced users can write C
code to manipulate R objects directly.
Features of R are:
• It is an open-source tool
• R supports Object-oriented as well as Procedural programming.
• It provides an environment for statistical computation and software
development.
• Provides extensive packages & libraries
• R has a wonderful community for people to share and learn from experts
• Platform independence
• Integration with other languages
• Robust community and support
Limitations:
• R has limited support for dynamic or 3D graphics.
• R can be memory-intensive: R objects must generally be stored in physical
memory.
• Its functionality is based on consumer demand and (voluntary) user
contributions. If no one feels like implementing your favorite method, then
it’s your job to implement it.
• Processing big data is slow, since the data usually has to fit in memory.
EVALUATION:
When a complete expression is entered at the prompt, it is evaluated and the result
of the evaluated expression is returned. The result may be auto-printed.
> x <- 5 ## nothing printed
> x ## auto-printing occurs
[1] 5
> print(x) ## explicit printing
[1] 5
The [1] shown in the output indicates that x is a vector and 5 is its first element.
Data frames
List
Vector
Matrix: Matrices are two-dimensional vectors with a dimension attribute.
The dimension attribute is itself an integer vector of length 2 (number of rows,
number of columns).
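A minimal sketch of the two usual ways to build a matrix, to make the dimension attribute concrete. The variable names m and v are illustrative only.

```r
# Create a 2 x 3 matrix; values are filled column-wise by default
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)
print(dim(m))        # the dimension attribute: rows, then columns

# Setting the dimension attribute on a plain vector also gives a matrix
v <- 1:6
dim(v) <- c(2, 3)
print(is.matrix(v))
```

Passing byrow = TRUE to matrix() fills the values row by row instead.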
4. What are vectors? With the help of code, explain how to create a vector in R.
• Vectors are the basic building blocks of R. A vector is a sequence of
elements belonging to the same data type, i.e. a one-dimensional,
homogeneous data structure.
Empty vectors can be created with the vector() function. A vector can only
contain objects of the same class.
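A short sketch of common ways to create vectors, using c(), the : operator and vector(). The variable names are illustrative; the last lines show what happens when classes are mixed.

```r
x <- c(0.5, 0.6)                      # numeric vector
y <- c(TRUE, FALSE)                   # logical vector
z <- c("a", "b", "c")                 # character vector
s <- 9:12                             # integer sequence
e <- vector("numeric", length = 3)    # "empty" numeric vector, filled with 0

# Mixing classes does not fail: R coerces everything to one common class
mixed <- c(1.7, "a")
print(class(mixed))                   # "character"
```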
Matrix Operations
1. Matrix Arithmetic
o Addition: mat + mat2
o Subtraction: mat - mat2
o Multiplication (element-wise): mat * mat2
o Division (element-wise): mat / mat2
o To perform matrix multiplication, use the %*% operator.
o Use the t() function to transpose a matrix.
o Use square brackets [] to access specific elements, rows, or columns.
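The operations above can be sketched on two small matrices. mat and mat2 here are made-up 2 x 2 examples, chosen only to show how element-wise and true matrix multiplication differ.

```r
mat  <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
mat2 <- matrix(5:8, nrow = 2)   # columns are (5, 6) and (7, 8)

elementwise <- mat * mat2       # multiplies matching positions
product     <- mat %*% mat2     # true matrix multiplication
transposed  <- t(mat)           # rows become columns

print(elementwise)              # rows: 5 21 / 12 32
print(product)                  # rows: 23 31 / 34 46
print(mat[1, 2])                # row 1, column 2 -> 3
print(mat[, 1])                 # the whole first column -> 1 2
```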
10. Explain how reading and writing operations of different formats are performed in
R?
Read:
• read.table, read.csv, for reading tabular data
• readLines, for reading lines of a text file
• source, for reading in R code files (inverse of dump)
• dget, for reading in R code files (inverse of dput)
• load, for reading in saved workspaces
Write:
• write.table, for writing tabular data to text files (e.g. CSV) or connections
• writeLines, for writing character data line-by-line to a file or connection
• dump, for dumping a textual representation of multiple R objects
• dput, for outputting a textual representation of an R object
• save, for saving an arbitrary number of R objects in binary format to a file
• serialize, for converting an R object into a binary format for outputting to a
connection (or file)
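A minimal round-trip sketch for three of the read/write pairs listed above. The data frame df and the temporary file paths are illustrative assumptions. write.csv is used here as the CSV convenience wrapper around write.table.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# Tabular text: write.csv / read.csv
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path)

# Textual R representation: dput / dget
txt_path <- tempfile(fileext = ".R")
dput(df, file = txt_path)
df_dget <- dget(txt_path)

# Binary format: save / load (loaded into a separate environment here)
rda_path <- tempfile(fileext = ".RData")
save(df, file = rda_path)
restored <- new.env()
load(rda_path, envir = restored)
print(restored$df)
```

The text formats (CSV, dput) are editable and work well with version control, while save keeps every attribute of the object exactly but is not human-readable.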
11. Explain subsetting a list, matrix and vector in R. How will you subset nested
elements in a list?
Subsetting means extracting subsets from R objects such as lists, vectors and matrices.
There are three operators that can be used to extract subsets of R objects.
• The [ operator always returns an object of the same class as the original. It can be
used to select multiple elements of an object.
• The [[ operator is used to extract elements of a list or a data frame. It can only be
used to extract a single element, and the class of the returned object will not
necessarily be a list or data frame.
• The $ operator is used to extract elements of a list or data frame by literal name. Its
semantics are similar to those of [[.
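A short sketch of the three operators on a vector, a matrix and a nested list. All object names are illustrative. The last block shows two equivalent ways of reaching a nested element.

```r
# Vector: [ with positions or a logical condition
v <- c("a", "b", "c", "d")
print(v[2:3])                  # "b" "c"
print(v[v > "b"])              # "c" "d"

# Matrix: [row, column]; leave an index blank to take a whole row or column
m <- matrix(1:6, nrow = 2)
print(m[1, 2])                 # 3
print(m[1, ])                  # 1 3 5
print(m[1, 2, drop = FALSE])   # keeps the result as a 1 x 1 matrix

# Nested list: chain [[ ]] or pass an integer vector to [[ ]]
x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
print(x[["a"]][[3]])           # 14
print(x[[c(1, 3)]])            # 14, third element of the first element
print(x$b[2])                  # 2.81
```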
12. How will you extract multiple elements of a list? Explain partial matching.
The [ operator can be used to extract multiple elements from a list. For example, to
extract the first and third elements of a list, you would do the following:
> x <- list(foo = 1:4, bar = 0.6, baz = "hello")
> x[c(1, 3)]
$foo
[1] 1 2 3 4
$baz
[1] "hello"
Note that x[c(1, 3)] is NOT the same as x[[c(1, 3)]].
• Partial matching of names is allowed with [[ and $. With $ it happens by
default; with [[ it has to be requested with exact = FALSE. This is often very
useful during interactive work if the object you’re working with has very long
element names. You can just abbreviate those names and R will figure out
what element you’re referring to.
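A minimal sketch of partial matching. The list x and its element name aardvark are illustrative.

```r
x <- list(aardvark = 1:5)

print(x$a)                      # partial match on the name: 1 2 3 4 5
print(x[["a"]])                 # NULL, because [[ matches exactly by default
print(x[["a", exact = FALSE]])  # partial match requested: 1 2 3 4 5
```

Partial matching is convenient at the prompt but is best avoided in scripts, where an abbreviated name can silently match a different element once the list changes.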
13. How will you remove NA values?
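As a minimal sketch of the common base R approaches to this question (the vector x and data frame df are made-up examples): is.na() builds a logical mask for vectors, complete.cases() or na.omit() drop incomplete rows, and many summary functions accept na.rm = TRUE.

```r
# Vector: build a logical mask of missing values and keep the rest
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
clean <- x[!bad]
print(clean)                    # 1 2 4 5

# Data frame: keep only rows with no missing value in any column
df <- data.frame(a = c(1, NA, 3), b = c("u", "v", NA))
df_complete <- df[complete.cases(df), ]
df_omit <- na.omit(df)          # same rows, via a convenience function
print(df_complete)

# Many summary functions can skip NA values directly
print(mean(x, na.rm = TRUE))    # 3
```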
dplyr is designed to abstract over how the data is stored. That means as well as
working with local data frames, you can also work with remote database tables,
using exactly the same R code. Install the dbplyr package then read
vignette("databases", package = "dbplyr").
• For operations like filtering, reordering and collapsing, the dplyr package
provides a highly optimized set of functions for working with data frames.
• The dplyr package was developed by Hadley Wickham of RStudio.
• dplyr provides a grammar for data manipulation and for operating on data
frames. This helps to communicate the operation you are performing on a data
frame, and it provides an abstraction for data manipulation.
• %>%: the “pipe” operator is used to connect multiple verb actions together
into a pipeline.
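A minimal pipeline sketch, assuming the dplyr package is installed. It uses the built-in mtcars data set; the chosen columns and the name result are illustrative.

```r
library(dplyr)

# Read the pipeline top to bottom: each verb receives the previous result
result <- mtcars %>%
  filter(cyl == 4) %>%          # keep only 4-cylinder cars
  select(mpg, hp, wt) %>%       # keep three columns
  arrange(desc(mpg)) %>%        # sort by mpg, highest first
  head(3)                       # first three rows

print(result)
```

Without the pipe, the same steps would be written as nested calls that have to be read from the inside out.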
count <- 1 # initialise the counter before the loop
repeat {
  print(count)
  count <- count + 1
  if (count > 5) {
    break # leave the loop once 1 to 5 have been printed
  }
}
• break: break the execution of a loop
for (i in 1:10) {
  if (i == 5) {
    break # Stop the loop when i is equal to 5
  }
  print(i)
}
• next: skip an iteration of a loop
for (i in 1:5) {
  if (i == 3) {
    next # Skip the iteration when i is equal to 3
  }
  print(i)
}
Explain how to create bar graphs, scatter plot, line graph, histogram and curve in R
with and without ggplot.
Using Base R
# Plotting a curve
curve(sin, from = -pi, to = pi, main = "Sine Curve", xlab = "X", ylab = "sin(X)", col = "green")
Using ggplot2
library(ggplot2)
# Data for curve
x <- seq(-pi, pi, length.out = 100)
data <- data.frame(x = x, y = sin(x))
# Sine curve
ggplot(data, aes(x = x, y = y)) +
  geom_line(color = "green") +
  ggtitle("Sine Curve")
2 What is a bar graph? How will you make a basic bar graph in R? Explain
Bar graphs are used to display numeric values (on the y-axis) for different categories
(on the x-axis). Sometimes the bar heights represent counts of cases in the data set,
and sometimes they represent values in the data set.
Use barplot() and pass it a vector of values for the height of each bar and (optionally) a
vector of labels for each bar. names.arg: this parameter is a vector of names
appearing under each bar in the bar chart.
barplot(BOD$demand, names.arg = BOD$Time)
3 Bar graph – with & without factor, fill colour, outline, grouping, palette, reorder,
dodge, scale_fill_brewer() or scale_fill_manual()
RColorBrewer
ggplot(BOD, aes(x = Time, y = demand)) + geom_col(fill = "blue", colour = "black")
• We’ll map Date to the x position and map Cultivar to the fill color.
The most basic bar graphs have one categorical variable on the x-axis and one
continuous variable on the y-axis. Sometimes you’ll want to use another
categorical variable to divide up the data, in addition to the variable on the x-axis.
You can produce a grouped bar plot by mapping that variable to fill,
which represents the fill color of the bars. You must also use position =
"dodge", which tells the bars to “dodge” each other horizontally; if you don’t,
you’ll end up with a stacked bar plot.
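A minimal sketch of a grouped bar graph, assuming ggplot2 is installed. The data frame harvest and all of its values are hypothetical and only reuse the Date and Cultivar column names mentioned above. The palette name "Pastel1" is one of the RColorBrewer palettes.

```r
library(ggplot2)

# Hypothetical data: one weight per (Date, Cultivar) combination
harvest <- data.frame(
  Date     = rep(c("day1", "day2", "day3"), times = 2),
  Cultivar = rep(c("c1", "c2"), each = 3),
  Weight   = c(3.0, 2.5, 2.8, 2.2, 3.1, 1.6)
)

p <- ggplot(harvest, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_col(position = "dodge", colour = "black") +  # side-by-side bars with an outline
  scale_fill_brewer(palette = "Pastel1")            # fill colours from a Brewer palette
print(p)
```

Replacing scale_fill_brewer() with scale_fill_manual(values = c("grey70", "steelblue")) would set the two fill colours by hand instead.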
• Line graphs are typically used for visualizing how one continuous variable, on
the y-axis, changes in relation to another continuous variable, on the x-axis.
• Often the x variable represents time, but it may also represent some other
continuous quantity
ggplot(BOD, aes(x = Time, y = demand)) +
geom_line()
Explain how you will make a line graph with multiple lines.
In addition to the variables mapped to the x- and y-axes, map another (discrete)
variable to colour or linetype.
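A minimal sketch of a multi-line graph, assuming ggplot2 is installed. The data frame trend and its values are illustrative, with one row per Year and Category.

```r
library(ggplot2)

# Illustrative data: three categories observed over six years
trend <- data.frame(
  Year     = rep(2010:2015, each = 3),
  Value    = c(40, 60, 80, 50, 65, 85, 55, 70, 90,
               58, 75, 95, 60, 80, 100, 62, 85, 105),
  Category = rep(c("A", "B", "C"), times = 6)
)

# Mapping the discrete variable to colour and linetype draws one line per category
p <- ggplot(trend, aes(x = Year, y = Value,
                       colour = Category, linetype = Category)) +
  geom_line() +
  geom_point()
print(p)
```

If the grouping variable is numeric, wrap it in factor() so that ggplot2 treats it as discrete.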
7 How will you change the appearance of points in a line graph and how will you make a
graph with a shaded area? Explain
In geom_point(), set the size, shape, colour, and/or fill outside of aes().
ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line() +
  geom_point(size = 4, shape = 22, colour = "darkred", fill = "pink")
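For the shaded-area part of the question, a minimal sketch using geom_area() on the same built-in BOD data set, assuming ggplot2 is installed. The colour and transparency values are arbitrary choices.

```r
library(ggplot2)

# geom_area() fills the region between the line and the x-axis
p <- ggplot(BOD, aes(x = Time, y = demand)) +
  geom_area(fill = "blue", alpha = 0.2, colour = "black")
print(p)
```

Here alpha = 0.2 makes the fill 80% transparent, so grid lines stay visible through the shaded region.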
8 What is a scatter plot? How will you make a basic scatter plot in R? Explain
A scatter plot is a type of data visualization that displays values for two variables for
a set of data. Each point on the plot corresponds to one observation in the dataset.
Scatter plots are particularly useful for identifying relationships, correlations, and
distributions between the two variables.
ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()
How will you group points together using shapes or colors in a scatter plot? Explain
Map a categorical variable to the colour aesthetic, the shape aesthetic, or both. Each
group is then drawn in a different colour or with a different symbol.
ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl), shape = factor(am))) +
  geom_point(size = 3) +
  labs(title = "Scatter Plot of MPG vs Horsepower Colored by Cylinders and Shaped by Transmission",
       x = "Horsepower", y = "Miles per Gallon",
       color = "Cylinders", shape = "Transmission")
10 Explain how to map a continuous variable to color or size.
In ggplot2, mapping a continuous variable to color or size in a scatter plot allows you
to add an extra dimension of information to the visualization. This is particularly
useful when you want to see how a third variable varies with two primary variables
(on the x and y axes) by changing the color gradient or point size.
ggplot(mtcars, aes(x = hp, y = mpg, color = wt)) + geom_point(size = 3)
Mapping a continuous variable to size scales each point’s size according to the
variable's value. Larger values will result in larger points, while smaller values will
have smaller points
ggplot(mtcars, aes(x = hp, y = mpg, size = wt)) + geom_point(color = "blue")
11 How to deal with over plotting in scatter plot? Explain different methods
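A minimal sketch of three common remedies for overplotting, assuming ggplot2 is installed: semi-transparent points, binning into rectangles, and jittering. The data frame pts is randomly generated purely for illustration.

```r
library(ggplot2)

set.seed(1)
pts <- data.frame(x = rnorm(5000), y = rnorm(5000))  # hypothetical dense data

# 1. Semi-transparent points: dense regions appear darker
p_alpha <- ggplot(pts, aes(x = x, y = y)) +
  geom_point(alpha = 0.1)

# 2. Binning: count the points in each rectangle and map the count to fill
p_bin <- ggplot(pts, aes(x = x, y = y)) +
  geom_bin2d(bins = 40)

# 3. Jittering: add a little random noise when one axis takes few distinct values
p_jitter <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_jitter(width = 0.2, height = 0)

print(p_alpha)
print(p_bin)
print(p_jitter)
```

Making the points smaller (the size argument of geom_point()) is a fourth, simpler option when the overlap is only moderate.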
library(ggplot2)
df <- data.frame(
  Year = rep(2010:2015, each = 3),
  Value = c(40, 60, 80, 50, 65, 85, 55, 70, 90, 58, 75, 95, 60, 80, 100, 62, 85, 105),
  Category = rep(c("A", "B", "C"), times = 6)
)
The reorder() function takes a factor x and reorders its levels according to the
values of another variable X, summarised per level by FUN (the mean by default).
reorder(x, X, FUN = mean, ...)
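A minimal base R sketch of reorder(). The factor f and the values in vals are made up for illustration.

```r
f    <- factor(c("a", "b", "c", "a", "b", "c"))
vals <- c(5, 1, 3, 7, 2, 4)

# Per-level means: a = 6, b = 1.5, c = 3.5, so the new level order is b, c, a
r <- reorder(f, vals, FUN = mean)
print(levels(f))   # "a" "b" "c"
print(levels(r))   # "b" "c" "a"
```

In a bar graph the same call typically goes inside aes(), for example aes(x = reorder(name, value), y = value) with hypothetical columns name and value, so that the bars appear in increasing order of value.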
15 Explain the ggplot2 package and its functions, such as aes().