Introduction to R
BY SHIKHA BIHARE
What is R?
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S
language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by
John Chambers and colleagues.
R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach introductory
statistics at the University of Auckland.
The core R language is augmented by a large number of extension packages containing reusable code and
documentation.
R can be considered as a different implementation of S. There are some important differences, but much code written
for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical
statistical tests, time-series analysis, classification, clustering, …) and graphical
techniques, and is highly extensible. The S language is often the vehicle of choice
for research in statistical methodology, and R provides an Open Source route to
participation in that activity.
❖ R is a programming language for statistical computing and graphics supported by the R Core Team and the R
Foundation for Statistical Computing. It was created by statisticians Ross Ihaka and Robert Gentleman.
❖ R is used among data miners, bioinformaticians and statisticians for data analysis and developing statistical
software.
❖ The language took heavy inspiration from the S programming language, with most S programs able to run
unaltered in R, as well as from Scheme's lexical scoping, which allows for local variables. The name of the
language comes from being an S language successor and from the shared first letter of its authors, Ross and Robert.
❖ Ihaka and Gentleman first shared binaries of R on the data archive StatLib and the s-news mailing list in
August 1993. In June 1995, statistician Martin Mächler convinced Ihaka and Gentleman to make R free and
open-source under the GNU General Public License.
❖ The Comprehensive R Archive Network (CRAN) was founded in 1997 by Kurt Hornik and Fritz Leisch to host
R's source code, executable files, documentation, and user-created packages. Its name and scope mimic
the Comprehensive TeX Archive Network and the Comprehensive Perl Archive Network.
❖ CRAN originally had three mirrors and 12 contributed packages. As of December 2022, it has 103 mirrors and
18,976 contributed packages.
Features
Data processing
❖ R's data structures include vectors, arrays, lists, data frames and environments. Vectors are ordered
collections of values and can be mapped to arrays of one or more dimensions in a column major order.
That is, given an ordered collection of dimensions, one fills in values along the first dimension first,
then fills in one-dimensional arrays across the second dimension, and so on.
❖ R supports array arithmetic and in this regard is like languages such as APL and MATLAB. The special
case of an array with two dimensions is called a matrix. Lists serve as collections of objects that do not
necessarily have the same data type. Data frames contain a list of vectors of the same length, plus a
unique set of row names. R has no scalar data type. Instead, a scalar is represented as a length-one
vector.
❖ R and its libraries implement various statistical techniques, including linear, generalized
linear and nonlinear modeling, classical statistical tests, spatial and time-series analysis,
classification, clustering, and others. For computationally intensive tasks, C, C++, and Fortran code
can be linked and called at run time.
❖ Another of R's strengths is static graphics; it can produce publication-quality graphs that include
mathematical symbols.
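The data structures described above can be illustrated with a short sketch. The object names used here (v, m, person, df, x) are hypothetical and chosen only for illustration.

```r
# A vector mapped to a 2 x 3 array; values fill down the first dimension
# (column by column), which is column-major order
v <- 1:6
m <- array(v, dim = c(2, 3))
m   # column 1 holds 1 and 2, column 2 holds 3 and 4, column 3 holds 5 and 6

# Array arithmetic is applied element by element
m2 <- m * 2

# A list can hold objects of different data types
person <- list(name = "Ada", age = 36, member = TRUE)

# A data frame is a list of equal-length vectors plus row names
df <- data.frame(id = c(1, 2, 3), label = c("a", "b", "c"))

# There is no scalar type: a single number is a length-one vector
x <- 5
is.vector(x)
length(x)
```

Because a two-dimensional array is a matrix, is.matrix(m) should also return TRUE here.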
Programming
❖ R is an interpreted language; users can access it through a command-line interpreter. If a user types 2+2 at the R command prompt and
presses Enter, the computer replies with 4.
❖ R supports procedural programming with functions and, for some functions, object-oriented programming with generic functions.
❖ Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages. Extending it is
facilitated by its lexical scoping rules, which are derived from Scheme. R uses S syntax (not to be confused with S-expressions) to represent
both data and code. R's extensible object system includes objects for (among others): regression models, time-series and geo-spatial
coordinates.
❖ Advanced users can write C, C++, Java, .NET or Python code to manipulate R objects directly.
❖ Functions are first-class objects and can be manipulated in the same way as data objects, facilitating meta-programming that allows multiple
dispatch. Function arguments are passed by value, and are lazy—that is to say, they are only evaluated when they are used, not when the
function is called. A generic function acts differently depending on the classes of the arguments passed to it.
In other words, the generic function dispatches the method implementation specific to that object's class. For example, R has
a generic print function that can print almost every class of object in R with print(objectname). R is highly extensible through the
use of packages for specific functions and specific applications.
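The ideas of first-class functions, lazy arguments and generic dispatch can be sketched as follows. All function and class names here (square, apply_twice, first_only, describe, circle) are hypothetical, and the dispatch shown uses the S3 style, which is only one of R's object systems.

```r
# Functions are first-class objects: they can be stored and passed as arguments
square <- function(x) x^2
apply_twice <- function(f, x) f(f(x))
apply_twice(square, 3)   # (3^2)^2 = 81

# Arguments are lazy: 'unused' is never evaluated, so stop() never runs
first_only <- function(used, unused) used
first_only(1, stop("never evaluated"))

# A generic function dispatches on the class of its argument
describe <- function(obj) UseMethod("describe")
describe.default <- function(obj) "something else"
describe.circle <- function(obj) paste("circle with radius", obj$r)

shape <- structure(list(r = 2), class = "circle")
describe(shape)   # uses describe.circle
describe(42)      # falls back to describe.default
```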
Packages
❖ Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are
stored is called the library. R comes with a standard set of packages. Others are available for download and installation. Once
installed, they have to be loaded into the session to be used.
❖ R's capabilities are extended through user-created packages, which offer statistical techniques, graphical devices, import/export,
reporting, etc. These packages and their easy installation and use have been cited as driving the language's widespread adoption
in data science. The packaging system is also used by researchers to organize research data, code, and report files in a systematic
way for sharing and archiving.
❖ The "Task Views" on the CRAN website list packages in fields including Finance, Genetics, High-Performance Computing,
Machine Learning, Medical Imaging, Meta-Analysis, Social Sciences and Spatial Statistics.
❖ R has been identified by the FDA as suitable for interpreting data from clinical research. Microsoft maintains a daily snapshot of
CRAN that dates back to Sept. 17, 2014.
❖ Other R package resources include R-Forge, a platform for the collaborative development of R packages.
❖ A group of packages called the Tidyverse, which can be considered a "dialect" of the R language, is increasingly popular among
developers. It strives to provide a cohesive collection of functions to deal with common data science tasks, including data import,
cleaning, transformation, and visualisation (notably with the ggplot2 package). Dynamic and interactive graphics are available
through additional packages.
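The install-then-load workflow described above might look like the following minimal sketch. The install step is commented out because it needs network access, and ggplot2 is used there only as an example package name. The variable names lib_dirs and has_stats are hypothetical.

```r
# install.packages("ggplot2")   # one-time download from CRAN

# Load an installed package into the current session
library(stats)   # 'stats' ships with R, so no installation is needed

# The library: the directories where installed packages are stored
lib_dirs <- .libPaths()
lib_dirs

# Check that a package is available before relying on it
has_stats <- requireNamespace("stats", quietly = TRUE)
has_stats
```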
Common IDEs
An IDE, or Integrated Development Environment, enables programmers to consolidate the different aspects of
writing a computer program. IDEs are powerful interfaces with integrated capabilities that allow developers to
write code more efficiently.
In Python, the most popular IDEs in data science are Jupyter Notebooks and its modern version, JupyterLab, as
well as Spyder.
As for R, the most commonly used IDE is RStudio. Its interface is organized so that the user can view graphs, data
tables, R code, and output all at the same time.
IMPORTANT POINTS IN R
• R is a case-sensitive language.
• We can use # inside an R script to add a comment. R ignores everything after # on that line when
the script runs.
• Parentheses ( ) are used to call functions.
• Square brackets [ ] are used for subsetting.
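A minimal sketch of the points above, using hypothetical variable names:

```r
# Names are case sensitive: 'value' and 'Value' are two different objects
value <- 1
Value <- 2
value == Value

# Parentheses call a function
total <- sum(c(10, 20, 30))
total

# Square brackets subset an object
scores <- c(10, 20, 30)
second <- scores[2]
first_two <- scores[1:2]
```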
R DATA TYPE
• Basic types
• Example 3: # Boolean
z <- TRUE
class(z)
Output:
## [1] "logical"
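The same class() check works for some of R's other basic types. The following is a minimal sketch with hypothetical variable names, not a reconstruction of the other examples in this section:

```r
# Numeric (double-precision number)
x <- 28.5
class(x)

# Character (string)
y <- "R is fun"
class(y)

# Integer (note the L suffix)
n <- 5L
class(n)
```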
VECTOR IN R
A vector is a one-dimensional array. We can create a vector with any of the basic data types covered earlier.
• {Array: collection of fixed number of components (elements), wherein all of components have same data
type. One-dimensional array: array in which components are arranged in list form. Multi-dimensional array:
array in which components are arranged in tabular form (not covered).}
• The simplest way to build a vector in R is to use the c() function.
Example 1: # Numerical
vec_num <- c(1, 10, 49)
vec_num
Output:
## [1]  1 10 49
• Example 2: # Character
vec_chr <- c("a", "b", "c")
vec_chr
Output:
## [1] "a" "b" "c"
• Example 3:
# Boolean
vec_bool <- c(TRUE, FALSE, TRUE)
vec_bool
Output:
## [1]  TRUE FALSE  TRUE
We can do arithmetic calculations on vectors.
• Example 4:
# Create the vectors
vect_1 <- c(1, 3, 5)
vect_2 <- c(2, 4, 6)
# Take the sum of vect_1 and vect_2
sum_vect <- vect_1 + vect_2
# Print out sum_vect
sum_vect
Output:
## [1]  3  7 11
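Other arithmetic operators work element by element in the same way. A minimal sketch, reusing the two vectors from Example 4 (the result names are hypothetical):

```r
# Create the vectors
vect_1 <- c(1, 3, 5)
vect_2 <- c(2, 4, 6)

# Element-wise difference: 2-1, 4-3, 6-5
diff_vect <- vect_2 - vect_1
diff_vect

# Element-wise product: 1*2, 3*4, 5*6
prod_vect <- vect_1 * vect_2
prod_vect
```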