R Studio Info For 272
Getting Started
Loading Workspaces and Datasets in RStudio
Packages
Basic Examples
Using Functions
Writing Code in the Script Window
Writing Readable Code: The Importance of Style
R Punctuation
Getting Help
Common Error Messages
R functions used in Stat 272
Getting Started
RStudio is a user interface for the statistical programming software R. While some operations can be
done by pointing and clicking with the mouse, you will need to learn to write program code. This is like
learning a new language: there is specific syntax, grammar, and vocabulary, and it will take time to get
used to. Learning R will ultimately give you complete control, flexibility, and creativity when analyzing
and visualizing data, but fluency in this new language will take time. Be precise, go slowly, copy others'
code, and be patient!
Download R and RStudio
R is a FREE software package for statistics and statistical graphics. It can be downloaded on UNIX,
Windows, and Mac operating systems.
Go to http://cran.us.r-project.org/ and follow the instructions appropriate to your OS. In theory, you
should just be able to run the .exe (Windows) or .pkg (Mac) file once downloaded and it will
install. I recommend using all the default settings when installing R.
Once you have downloaded R, you can install RStudio from http://www.rstudio.com/
Click "Download now", and then click "Download RStudio Desktop". Select and download the version
appropriate for your operating system.
R Basics
R is an object-based language. Objects include matrices, vectors, data frames (data sets arranged in
rows and columns, much like a matrix), and functions. All these objects float around in the workspace.
Many functions exist in base R, the basic set of functions and other objects available when you open
RStudio for the first time. These include functions for plotting data, basic mathematical operations, and
some statistical procedures which we will learn. Additional functions can be added by loading packages:
these are sets of objects compiled by other R users and made publicly available. You can also write your
own functions.
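As a minimal sketch of the object types named above, the following lines create one of each in the workspace. All names and values here (heights, counts, students, add_two) are made up for illustration and are not part of the course materials.

```r
# A vector: an ordered set of values of the same type
heights <- c(61, 65, 70)

# A matrix: values arranged in rows and columns
counts <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: a data set with named columns (variables)
students <- data.frame(name = c("Ann", "Bo", "Cy"), height = heights)

# A function you write yourself (hypothetical example)
add_two <- function(value) {
  value + 2
}

add_two(5)   # returns 7
ls()         # lists the names of the objects currently in the workspace
```

Each of these objects should then be listed in the workspace pane.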
[Figure: the RStudio window, with its four panes labeled as follows.]
Workspace
The script window: You can store a document of commands you used in R to reference later or repeat
analyses.
Plot/Help
Console: Output appears here. The > sign means R is ready to accept commands. Type directly into the
console, or run lines of code from the script window.
Packages
While many useful functions are included in base R, users and developers can create and submit their
own add-on packages with specialized functions and datasets. Accessing these packages requires two
steps: installing the package onto your computer (only needs to be done once) and loading the library
into your workspace (needs to be done every time you open RStudio). For example, many specialized
plotting functions are included in ggplot2.
Packages can be installed by point and click in RStudio. First click on Tools and select Install Packages.
Begin typing the package name; autocomplete options will appear. It is recommended to
leave the Install dependencies box checked, as this will automatically install any packages required to
run your desired package. Click Install.
Some warning messages may appear if the package was built under a different R version or if other
packages are installed because of dependencies. Errors may occur at this step, so be sure to read the
text which appears in red. When the package is successfully installed, you should see a message in the
console which says:
package 'ggplot2' successfully unpacked and MD5 sums checked
To use the package, you must use the function library() to load it every time you
reopen RStudio. Libraries will not load automatically when you start RStudio.
Warning messages may appear. Usually these are safe to ignore, but with less commonly used or very
outdated packages, check CRAN online to see if there are reported problems or if the package is still
supported.
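The same two steps can also be typed as code. The following is a sketch that uses ggplot2, the package named above; the check with requireNamespace() and the name pkg_ready are additions for illustration, so that the script does not stop with an error if the package has not been installed yet.

```r
# One-time step: download and install the package from CRAN.
# (Left commented out so that rerunning the script does not reinstall it.)
# install.packages("ggplot2")

# Every-session step: load the installed package.
# requireNamespace() returns TRUE or FALSE instead of raising an error.
pkg_ready <- requireNamespace("ggplot2", quietly = TRUE)

if (pkg_ready) {
  library(ggplot2)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first.")
}
```

In everyday use, a single library(ggplot2) line at the top of a script is usually all that is needed once the package is installed.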
Basic Examples
R runs code line by line. That is, you tell it one thing, and it does it right away. (Sometimes if our one
line of code is super long, it will actually be written as multiple lines on a page, but R treats it as one
super-long sentence of code).
With numbers, we can use R like a calculator. For example, when we type 3 + 7 in the console
window and hit Enter, R prints [1] 10. Note that the log function computes the NATURAL log.
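As a sketch of this calculator-style use, the following lines can be typed at the console one at a time. Apart from 3 + 7, the particular calculations are illustrative choices rather than examples from the handout.

```r
3 + 7         # addition; prints [1] 10
(3 + 2)^2     # parentheses control the order of operations; 25
sqrt(16)      # square root; 4
log(1)        # natural log; 0
log(exp(2))   # log undoes exp, so this gives 2
log10(100)    # base-10 log, for comparison; 2
```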
We can also store objects using names. We see this most often in this class with named data frames
(aka data sets). We will also store tables, function output, or single values. A simple example is the
following code:
se <- sqrt(.75*.25/200)
This is an example where I might want to store the standard error for a sample proportion of .75 with
200 observations as se in my workspace. This is convenient if I am going to be using it over and over in
equations. You'll notice that if you run this line of code, no output appears in your console, but a new
Value appears in your workspace, called se. You can also use = to assign values rather than <-.
The textbook tends to use =, but many prefer to use the arrow as a convention; as you write more
code, you will tend to develop your own style.
Note that R is case sensitive. The object se is not the same as SE.
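A short sketch of assignment and case sensitivity, using the se example above. The name se_alt is hypothetical and is used only to compare the two assignment styles.

```r
se <- sqrt(.75 * .25 / 200)      # stored in the workspace; nothing is printed
se                               # typing the name prints the value, about 0.0306

se_alt = sqrt(.75 * .25 / 200)   # = assigns in the same way as <-
identical(se, se_alt)            # TRUE

exists("se")                     # TRUE
exists("SE")                     # FALSE unless an object named SE was created separately

.75 + 2 * se                     # the stored value can be reused in later calculations
```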
Using Functions
Two functions we will commonly use in Stat 272 are lm and plot. The function lm is used for simple and
multiple linear regression and takes many possible arguments or inputs, though we commonly only use
two: the formula (model statement) and the dataset.
lm(formula, data)
When using R functions, you can specify which argument is which by name, or based on the order of
entering information. For function inputs, you can also truncate the names. That is, all of the following
will give equivalent results:
lm(formula= y~x, data=example1)
lm(form=y~x, dat=example1)
lm(y~x, data=example1)
Notice that the first argument y~x is assumed to be the formula. See the help menu or autocomplete
hints for the expected order of arguments. If you use the argument names, the order does not matter.
(This can be helpful for preventing errors).
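To check the equivalence described above, the calls can be run on a small simulated data set. This is a sketch: the contents of example1 are invented here, since the handout does not define that data frame, and the fit_* names are hypothetical.

```r
# Simulated stand-in for the data frame example1 (values are made up)
set.seed(272)
example1 <- data.frame(x = 1:10)
example1$y <- 2 + 3 * example1$x + rnorm(10)

fit_named    <- lm(formula = y ~ x, data = example1)  # full argument names
fit_short    <- lm(form = y ~ x, dat = example1)      # truncated argument names
fit_position <- lm(y ~ x, data = example1)            # formula matched by position
fit_swapped  <- lm(data = example1, formula = y ~ x)  # named arguments in another order

# All four fits give the same estimated coefficients
all.equal(coef(fit_named), coef(fit_short))      # TRUE
all.equal(coef(fit_named), coef(fit_position))   # TRUE
all.equal(coef(fit_named), coef(fit_swapped))    # TRUE
```

Writing out the full argument names is the most readable of the four styles, even though it takes more typing.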
Another useful feature of RStudio is auto completion. Try typing just l or lm in the script window
and hit the Tab key.
A list of possible functions that begin with l will appear. Information about the selected function
appears to the right.
You can also use auto completion when you do not remember the names of the function arguments. Hit
tab while your cursor is within the parentheses.
Some function arguments are optional... as you can see, there are many options with the lm function,
but we usually only specify the formula (model choice) and sometimes the dataset.
Once you have typed your complete line of code in the script window, run it by placing your cursor
anywhere in the line and hitting Ctrl+Enter or Ctrl+R.
When you type into the script window, you will notice coloration of your inputs:
This is one big advantage of entering code in the script window rather than directly into the console.
You'll notice several things:
Numbers are blue. Text (denoted by quotes) is green.
Function, object, and argument names are black.
Parentheses are gray. The closing parenthesis is automatically produced with the opening parenthesis.
When the cursor is placed by a parenthesis, the matching one is highlighted in gray.
When you run a line of code, you will see the line and any output in the console window:
The blue line shows the command that was run, and the black is output from this function. Should you
receive error messages, they will be red. Plots, if any, will appear in the plot window.
Notice that another blue > prompt sign has appeared beneath the output, indicating this command is
completed, and R is ready to take another command. If a + appears, that indicates the line or sentence
of code has not yet been completed (most likely a missing parenthesis) and R has not yet run the
command.
Starting another line of code right away without finishing this command will probably result in an error.
Code sentences can break over multiple lines; just don't forget to finish them.
The plot function does not produce any output in the console but should produce plots in the plot
window.
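As a sketch of a command that breaks over multiple lines, the following single "sentence" of code is spread across three lines. The numbers and the name total are arbitrary choices for illustration.

```r
# One command written over three lines. While the parenthesis is still open,
# the console shows + instead of >, and nothing runs until the final ")".
total <- sum(1, 2, 3,
             4, 5,
             6)
total   # prints [1] 21
```

If the closing parenthesis were left off, the console would keep showing + and would treat the next line typed as part of the same command.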
# Example R Script
# Created by Laura Boehm Vock on Jan 24, 2014
# Last Edited by Laura Boehm Vock on Jan 28, 2014
# (Header comments like these can be helpful if you're passing code back and forth
# when working on a project.)
#############################################################################
# Create variables x and y
x <- rnorm(20)
y <- rnorm(20)
# Fit the linear model
model1 <- lm(y ~ x)
# Plot y versus x and include line of best fit
plot(x, y, main = "Example plot")
abline(model1)  # this is a comment too.
Comments can take up a whole line, or the end of a line. Anything after # but before a new line is
commented and will not be run when you hit Ctrl+Enter or Ctrl+R.
What if you type something into the console, and then decide you want to keep it as part of your script?
Rather than retype the whole thing, either copy and paste directly from the console OR use the
History tab, which is in the same pane as the Workspace tab. Click where you want the line inserted
in your script; then highlight the desired line in the History tab and click the button that sends it to
the script (labeled To Source in current versions of RStudio). The line will be
copied and pasted where your cursor last clicked in your script. This is especially useful if you load a
workspace, package, or data by point-and-click: RStudio generates a line of code that you can use to
repeat this action in the future.
R Punctuation
#
Used to create comments in an R Script. Running these lines will create no output.

"" or ''
Double or single quotes indicate chunks of text that should be treated as text rather than
names of functions or other objects.

()
Parentheses can be used to indicate order of operations in complicated computations:
(3+2)^2
They are also used to indicate functions:
function(arg1, arg2, arg3)

[]
Square brackets are used to subset matrices and vectors.
This indicates the first column of the matrix data:
data[, 1]
This indicates the second row of the matrix data:
data[2, ]
This indicates the single item in the first column and second row:
data[2, 1]
This indicates the third item in the vector called variable. Notice we don't need a
comma, since vectors only have one set of indices.
variable[3]

:
Used to create a list of integers. Can be used when creating for loops or for subsetting
data.
This gives the first ten rows of data:
data[1:10, ]

$
Used to subset a dataframe or list object.
Get the variable called var1 from the dataframe called data:
data$var1
When the output from the linear model regression function is saved as model, you can access
the coefficients, etc.:
model <- lm(y~x, data=data)
model$coeff

{}
Used for creating chunks of code, particularly for loops.
for(i in 1:100){
  iscambinomtest(i, 100, .5, "greater")
}

~
Used for creating formula/model statements, e.g. for the t.test or lm function.
t.test(response~explanatory, data=data)
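The punctuation above can be tried out on a small made-up data set. This is a sketch only: the data frame is named dat rather than data to avoid clashing with R's built-in data function, and all values and names are invented for illustration.

```r
# A small invented data frame and vector to practice on
dat <- data.frame(var1 = c(2, 4, 6, 8), var2 = c(1, 3, 4, 8))
variable <- c(10, 20, 30)

dat[, 1]       # first column
dat[2, ]       # second row
dat[2, 1]      # item in the second row, first column; 4
variable[3]    # third item of the vector; 30
dat[1:2, ]     # rows 1 through 2
dat$var1       # the variable var1, same values as dat[, 1]

# ~ builds the model statement; $ pulls pieces out of the saved result
model <- lm(var2 ~ var1, data = dat)
model$coefficients

# {} groups the lines that a for loop repeats
running_total <- 0
for (i in 1:100) {
  running_total <- running_total + i
}
running_total  # 5050
```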
Getting Help
For any R functions (that are part of base R or added through packages) you can access the help menu
by typing ? before the function, or by searching directly in the Help tab in the lower right.
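For example, the following lines are a sketch of asking for help on lm from the console. The args() call is an extra suggestion, not something the handout mentions; it prints a function's argument list without opening the full help page.

```r
?lm            # opens the help page for lm
help("lm")     # the same page, requested with the help function
args(lm)       # prints the names and defaults of lm's arguments
```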
1. Check out the R Code Library on Moodle. This is a student-generated forum for you to share
code with each other.
2. Google it! Seriously. Just Google "R t-test" or "R plotting" and you will be amazed.
3. Ask your professor. This shouldn't be your last resort, but I'm not always available immediately.
R functions used in Stat 272
abline
barplot
boxplot
cbind
chisq.test
cor
exp
for
glm
hist
histogram {lattice}
legend
lm
log
matrix
mean
par
plot
prop.table
qqnorm
rbind
rbinom
round
sample
sd
step
summary
t.test
table
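As a brief sketch of how a few of the functions listed above are called, the following uses small made-up vectors. The names scores, answers, and counts are hypothetical.

```r
scores <- c(2, 4, 6, 8)
mean(scores)                  # 5
round(sd(scores), 2)          # 2.58
summary(scores)               # minimum, quartiles, mean, and maximum

answers <- c("yes", "no", "yes", "yes")
counts <- table(answers)      # counts for each category
prop.table(counts)            # proportions: no 0.25, yes 0.75

cbind(scores, doubled = scores * 2)   # binds two columns into a 4 x 2 matrix
```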