RStudio Exercises
Arithmetic
R does arithmetic, of course, so you can use it as a calculator. For example:
> 2 + 3
[1] 5
Notice that the > is a prompt indicating that R is ready for input. Note also the [1] in the output:
it says that the next number is the first element of the output (not especially useful at this point).
Try some arithmetic, using the various arithmetic operations, parentheses, and simple functions.
For example, try things like the following. (You might have some guesses as to which functions
should be available!)
> 37 * 43
> 2^4
> 1 + 2 * 3
> sin(10)
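Once you have made your guesses, here is a short sketch for checking a few of them. It shows that R follows the usual order of operations. The operators %/% (integer division) and %% (remainder) are standard R operators that the exercise above does not mention; they are included only as extra examples.

```r
# Order of operations: ^ first, then * and /, then + and -
37 * 43        # 1591
2^4            # 16
1 + 2 * 3      # 7, because 2 * 3 is computed before the addition
(1 + 2) * 3    # 9, because parentheses are evaluated first
17 %/% 5       # 3, integer division
17 %% 5        # 2, the remainder (modulus)
sqrt(16)       # 4
exp(1)         # 2.718282
```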
Variables
Variables are used to save objects for later use. For example, we can save the results of
computations and then use them. Entering the name of a variable is an implicit command to print what is
currently saved in that variable. Notice the peculiar fourth command, y = y + 23. It asks R to add 23 to
whatever is saved in y and then save the result back in y.
> y = 2 + 3
> y
[1] 5
> y + 1
[1] 6
> y = y + 23
> y
[1] 28
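Much of the R code you will find elsewhere uses <- instead of = for assignment. At the prompt, the two behave the same way for statements like these. The short sketch below also shows that assignment copies a value. Changing a variable later does not change a copy that was made earlier.

```r
y = 2 + 3     # store 5 in y
y <- y + 23   # '<-' also assigns; the right side uses the old y (5), so y becomes 28
print(y)      # the explicit version of typing y at the prompt
z = y         # copy the current value of y (28) into z
y = 0         # changing y afterward does not change z
z             # still 28
```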
Vectors
A vector is an ordered list of values, all of the same mode (numeric, character, or logical); a vector
also has a length. We will use vectors to store data. Usually we will not enter the values of a vector by
hand; instead we will read stored data into R. Short vectors, however, can be entered by hand.
> y = c(1, 2, 3, 9, 8, 7)
> y
[1] 1 2 3 9 8 7
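The length and mode of a vector can be checked directly with length() and mode(). In the sketch below, the vectors words and flags are made up for illustration. The last line shows what happens when modes are mixed: R converts every element to a single mode, in this case character.

```r
y = c(1, 2, 3, 9, 8, 7)
length(y)                      # 6
mode(y)                        # "numeric"
words = c("Mass", "Year")      # a character vector
mode(words)                    # "character"
flags = c(TRUE, FALSE, TRUE)   # a logical vector
mode(flags)                    # "logical"
c(1, "a")                      # mixing modes converts everything to character: "1" "a"
```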
The object y defined above is a vector, and R knows that. In most instances, R performs operations
on vectors in a natural way, element by element. Here are some examples.
> y + 10
[1] 11 12 13 19 18 17
> x = y + 1
> x
[1]  2  3  4 10  9  8
> x + y
[1]  3  5  7 19 17 15
> log(y)
[1] 0.000 0.693 1.099 2.197 2.079 1.946
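Here are a few more element-by-element operations on the same y. The last line shows a rule that also explains y + 10 above. When two vectors have different lengths, R recycles the shorter one. This works cleanly here because 6 is a multiple of 2.

```r
y = c(1, 2, 3, 9, 8, 7)
x = y + 1          # 2 3 4 10 9 8
x * y              # element by element: 2 6 12 90 72 56
y > 5              # a logical vector: FALSE FALSE FALSE TRUE TRUE TRUE
sum(y)             # 30
y + c(10, 20)      # c(10, 20) is recycled to length 6: 11 22 13 29 18 27
```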
Sometimes it will be necessary to access the pieces of a vector. The vector y has 6 elements.
> y[3]
[1] 3
> y[6]
[1] 7
> x = 5
> y[x]
[1] 8
> y[7]
[1] NA
The last result, NA, is read as Not Available and denotes a missing value. A value might be
missing for a number of reasons; in this case, the vector y does not have a seventh element.
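The index inside the square brackets does not have to be a single number. This sketch uses the same y as above. It shows three other kinds of index that base R accepts: a vector of positions, a negative position that drops an element, and a logical condition that keeps only the matching elements.

```r
y = c(1, 2, 3, 9, 8, 7)
y[c(1, 3)]    # several elements at once: 1 3
y[2:4]        # a range of positions: 2 3 9
y[-1]         # a negative index drops that element: 2 3 9 8 7
y[y > 5]      # a logical index keeps elements where the condition is TRUE: 9 8 7
length(y)     # 6, so any index above 6 gives NA
```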
R has some convenient ways to make regular vectors. Determine what each of the following does:
> x = 1:10
> y = seq(0, 1, 0.01)
> z = c(rep(1, 5), rep(2, 10))
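If printing a long vector is not enough to work out what happened, a few base R functions let you inspect each result. The sketch below uses them without giving away the answers.

```r
x = 1:10
y = seq(0, 1, 0.01)
z = c(rep(1, 5), rep(2, 10))
length(x); length(y); length(z)   # how many elements does each vector have?
head(y)                            # the first six elements of y
tail(y)                            # the last six elements of y
table(z)                           # how many times each value occurs in z
# ?seq and ?rep open the help pages that describe the arguments
```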
Functions
We have already used a couple of functions above. R knows a gazillion functions. A function in
R is much like the functions that you have met in calculus: a function has inputs and outputs.
In mathematics, we use the notation f(x, y) to denote the result of applying the function f to the
inputs x and y. The notation in R is quite similar. For example, if x is a vector, then mean(x)
computes the mean of the elements of x.
> y = 1:100
> mean(y)
[1] 50.5
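R implements some important extensions of the mathematical notion of a function, such as optional
arguments and arguments that can be given by name. Try the following and explain the results.
> z = 1:10
> y = z^2
> mean(y)
> mean(x = y)
> mean(y, trim = 0.1)
> mean(trim = 0.1, y)     # why does this work?
> mean(trim = 0.1, x = y)
> mean(0.1, y)            # why doesn't this work?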
> y = c(1:10)
> log(y)
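The following worked sketch shows what the optional argument trim does, using the same y = z^2 as above. It also uses na.rm, another optional argument of mean that is not used elsewhere in this handout. Its purpose is to connect mean to the NA values discussed earlier.

```r
z = 1:10
y = z^2                        # 1 4 9 ... 100
mean(y)                        # 38.5
mean(y, trim = 0.1)            # 35.5: drops 10% of the values (here one) from each end
mean(sort(y)[2:9])             # 35.5: the same trimmed mean computed by hand
mean(c(y, NA))                 # NA: a single missing value makes the mean missing
mean(c(y, NA), na.rm = TRUE)   # 38.5: na.rm = TRUE removes missing values first
```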
Packages
One of the reasons that R is so powerful and flexible is that users can easily add capabilities to it.
As a result, users have developed hundreds of packages that can easily be loaded into R. The Packages
tab in the lower-right pane lists the packages that are available to you in the current version of
RStudio on dahl; additional packages can be downloaded from the web. To load a package and make it
available for your use, check the box next to its name. For now, load the packages lattice, mosaic,
and Stob. Each of these adds some capabilities to the base version of R. For example, the Stob package
makes available many of the instructor's favorite data sets. Clicking a package's name in the Packages
pane displays information about its contents. Note the contents of the Stob package.
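Packages can also be loaded by typing commands at the console. The sketch below assumes that lattice is installed, which is normally the case because it ships with standard R distributions. It does not assume that mosaic or Stob are installed on your machine. Stob in particular may not be available from CRAN. The install.packages line is commented out because it needs internet access.

```r
library(lattice)                    # same effect as checking the box next to lattice
"package:lattice" %in% search()     # TRUE: lattice is now attached
needed = c("lattice", "mosaic", "Stob")
needed %in% rownames(installed.packages())   # which of these are installed here?
# install.packages("mosaic")        # downloads a package from CRAN if it is missing
```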
Data Frames
Most of our data sets will come to us in R objects known as data frames. A data frame has rows and
columns. Generally each row corresponds to one observational unit and each column corresponds to a
variable; thus each column is itself a vector. For example, the dimes data is in a data frame called
dimes, which is in the Stob package. Notice below that the rows are numbered for our convenience; the
row numbers are not one of the variables in the data frame. There are two variables in this data frame,
corresponding to the mass of each dime and the year it was minted.
> dimes
   Mass Year
1  2.26 2004
2  2.25 2004
3  2.25 1987
4  2.30 1988
5  2.29 1971
6  2.25 2007
7  2.27 2007
8  2.21 1974
9  2.27 2007
10 2.27 2004
11 2.27 1997
12 2.27 1994
13 2.30 1974
14 2.29 1996
15 2.27 1999
16 2.23 1993
17 2.24 1998
18 2.25 2000
19 2.25 2007
20 2.23 2001
21 2.27 1994
22 2.23 1972
23 2.24 1992
24 2.23 1970
25 2.25 2001
26 2.28 2001
27 2.26 2004
28 2.28 2001
29 2.23 2001
30 2.23 2002
It will be very important for you to become proficient in manipulating data frames. A few functions
are useful for getting information about the data frame:
> str(dimes)
'data.frame': 30 obs. of 2 variables:
$ Mass: num 2.26 2.25 2.25 2.3 2.29 ...
$ Year: int 2004 2004 1987 1988 1971 2007 2007 1974 2007 2004 ...
> dim(dimes)
[1] 30  2
> head(dimes)
  Mass Year
1 2.26 2004
2 2.25 2004
3 2.25 1987
4 2.30 1988
5 2.29 1971
6 2.25 2007
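A few more base R functions describe a data frame. The sketch below does not need the Stob package. Instead it builds a small stand-in called dimes6 (a name made up for this example) from the six rows printed above, using data.frame().

```r
# Stand-in for head(dimes), with values copied from the output above
dimes6 = data.frame(
  Mass = c(2.26, 2.25, 2.25, 2.30, 2.29, 2.25),
  Year = c(2004L, 2004L, 1987L, 1988L, 1971L, 2007L)
)
str(dimes6)      # same layout as str(dimes), but 6 obs.
dim(dimes6)      # rows, then columns: 6 2
nrow(dimes6)     # 6
ncol(dimes6)     # 2
names(dimes6)    # "Mass" "Year"
summary(dimes6)  # minimum, quartiles, mean, and maximum of each column
tail(dimes6, 2)  # the last two rows
```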
It will be important to be able to access individual observational units (rows) or variables (columns)
of the data frame. If df is a data frame, df[i, j] is the entry in the ith row and jth column. This can
be generalized: leaving i or j blank selects all rows or all columns. To access rows:
> dimes[2, ]
Mass Year
2 2.25 2004
> dimes[4:6, ]
Mass Year
4 2.30 1988
5 2.29 1971
6 2.25 2007
To access columns:
> dimes[, 1]
[1] 2.26 2.25 2.25 2.30 2.29 2.25 2.27 2.21 2.27 2.27 2.27 2.27 2.30 2.29
[15] 2.27 2.23 2.24 2.25 2.25 2.23 2.27 2.23 2.24 2.23 2.25 2.28 2.26 2.28
[29] 2.23 2.23
> dimes[, 2]
[1] 2004 2004 1987 1988 1971 2007 2007 1974 2007 2004 1997 1994 1974 1996
[15] 1999 1993 1998 2000 2007 2001 1994 1972 1992 1970 2001 2001 2004 2001
[29] 2001 2002
An alternative way to access columns (variables) of a data frame is the following:
> dimes$Mass
[1] 2.26 2.25 2.25 2.30 2.29 2.25 2.27 2.21 2.27 2.27 2.27 2.27 2.30 2.29
[15] 2.27 2.23 2.24 2.25 2.25 2.23 2.27 2.23 2.24 2.23 2.25 2.28 2.26 2.28
[29] 2.23 2.23
Note that each column of this data frame is a vector and so can be used as an argument just like any
other vector.
> mean(dimes$Mass)
[1] 2.26
> mean(dimes[, 1])
[1] 2.26
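Here are a few more ways of indexing a data frame. The sketch again uses a hypothetical six-row stand-in for dimes, copied from the head(dimes) output above. It selects a single entry, selects a column by name, and selects rows by a condition on one of the variables.

```r
dimes6 = data.frame(
  Mass = c(2.26, 2.25, 2.25, 2.30, 2.29, 2.25),
  Year = c(2004L, 2004L, 1987L, 1988L, 1971L, 2007L)
)
dimes6[4, 1]                     # one entry: row 4, column 1
dimes6[4, "Mass"]                # the same entry, choosing the column by name
dimes6[, "Year"]                 # a whole column by name, same as dimes6$Year
dimes6[dimes6$Year < 1990, ]     # only the dimes minted before 1990 (rows 3, 4, 5)
mean(dimes6$Mass[dimes6$Year < 1990])   # mean mass of just those dimes
```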
The data frame counties in the Stob package has data from the 2000 census on all the counties in the
US. Investigate the populations of these counties.
The data frame bballgames03 has data on all baseball games played in 2003. Investigate this data
set. (Think about which functions you might want, and see whether they exist!)
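One way to check whether a function you have in mind already exists is to search for it by name, as in the sketch below. The name not_a_real_function is hypothetical and is used only to show a FALSE result. The variable names in counties and bballgames03 are not listed here, so the last comment only suggests a first step.

```r
apropos("median")                # names of available objects that contain "median"
exists("median")                 # TRUE: a function called median is available
exists("not_a_real_function")    # FALSE: a hypothetical name used for illustration
# help.search("quantile") searches the installed help pages by topic
# Once Stob is loaded, names(counties) and summary(counties) are good first steps.
```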
Histograms
A histogram is a graphical representation of the distribution of a quantitative variable. The input
to a histogram is a vector. The data frame sr in the Stob package has data on all graduating
Calvin College seniors of 2007; the variable GPA in sr contains the GPA of each graduating senior.
The function histogram is in the lattice package. Try the following.
> histogram(sr$GPA)
[Figure: histogram of sr$GPA; horizontal axis sr$GPA (about 1.5 to 4.0), vertical axis Percent of Total]
The help page for histogram (type ?histogram) shows that it has a boatload of possible options.
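The sketch below tries a few histogram options on the 30 dime masses listed earlier, which are copied by hand so that Stob is not needed. The type, nint, xlab, and main arguments appear in the lattice documentation for histogram. The base R function hist draws a similar plot and is included for comparison.

```r
library(lattice)
# Masses of the 30 dimes in the dimes data frame above (a stand-in for dimes$Mass)
mass = c(2.26, 2.25, 2.25, 2.30, 2.29, 2.25, 2.27, 2.21, 2.27, 2.27,
         2.27, 2.27, 2.30, 2.29, 2.27, 2.23, 2.24, 2.25, 2.25, 2.23,
         2.27, 2.23, 2.24, 2.23, 2.25, 2.28, 2.26, 2.28, 2.23, 2.23)
histogram(mass)                                   # default: vertical axis is Percent of Total
p = histogram(mass, type = "count", nint = 5,     # counts instead of percents, about 5 bins
              xlab = "Mass (g)", main = "Mass of 30 dimes")
print(p)      # a saved lattice plot is drawn when it is printed
hist(mass)    # base R's histogram function, for comparison
```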