Starting With R - 3

CHAPTER 1. GETTING STARTED WITH R

Another type of data in R is the factor. You can think of factors as special character vectors with some useful additional
behaviour. A factor takes on a limited number of distinct values; such variables are often referred to as categorical variables.
We use factors to categorise data, and the distinct categories are stored as levels. Factors can be built from both strings (text) and integers. They are
useful for columns that have a limited number of unique values, such as Employed/Unemployed or True/False.
They are widely used in data analysis and statistical modelling.
Factors are created with the factor() function, which takes a vector as input.
Exercise: Let’s create a vector called regions whose observations are "East", "West", "South" and "North".
We then print the vector to see the values it takes and check whether it is stored as a factor.
Solution:

# Create a vector as input and test if it is a factor or not.


regions <- c("East","West","South","North")
print(regions)

## [1] "East" "West" "South" "North"

is.factor(regions)

## [1] FALSE

We see that regions is not a factor. Let’s apply the factor() function to convert it to a factor and
test again whether it is a factor.

regions <- c("East","West","South","North")


factor_regions <- factor(regions)
is.factor(factor_regions)

## [1] TRUE

While it may be necessary to convert a numeric variable to a factor for a particular application, it is often useful
to convert the factor back to its original numeric values, since even simple arithmetic operations fail on
factors. Because as.numeric() simply returns the internal integer codes of the factor, the conversion must
be done through the levels attribute of the factor.
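The conversion described above can be sketched as follows. This is a minimal illustration, not code from the original text: the vector dose and its values are hypothetical, chosen only to show the difference between the integer codes and the original numbers.

```r
# Hypothetical example values, used only for illustration.
dose <- factor(c(10, 20, 20, 50))

# as.numeric() on a factor returns the internal integer codes,
# not the numbers the factor was built from.
codes <- as.numeric(dose)
print(codes)

# Convert the levels to numeric, then index them by the factor
# to recover the original values.
values <- as.numeric(levels(dose))[dose]
print(values)

# A more readable alternative goes through character first.
values_chr <- as.numeric(as.character(dose))
print(values_chr)
```

Both routes give back 10, 20, 20, 50, whereas as.numeric(dose) alone gives 1, 2, 2, 3.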
Once you have converted a numeric or character variable to a factor, you can find its levels with the levels()
function.

regions <- c("East","West","South","North")


factor_regions <- factor(regions)
levels(factor_regions)

## [1] "East" "North" "South" "West"

Factors in R come in two varieties: ordered (e.g., small, medium, large) and unordered (e.g., pen, brush, pencil). For
most analyses, it will not matter whether a factor is ordered or unordered. If the factor is ordered, then the specific
order of the levels matters (small < medium < large). If the factor is unordered, then the levels will still appear in
some order, but that order matters only for convenience (pen, brush, pencil): it determines,
for example, how output is printed or how items are arranged on a graph.
One way to change the level order is to call factor() on the factor and specify the order directly through the levels argument. In the fertilizer
example that follows, the function ordered() could be used instead of factor().
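As a small sketch of reordering levels, the example below uses a hypothetical sizes vector (not part of the original text). It assumes the default behaviour in which factor() sorts character levels alphabetically, and then sets the order explicitly, first as an unordered factor and then with ordered().

```r
# Hypothetical example data, used only for illustration.
sizes <- factor(c("small", "large", "medium", "small"))

# By default the levels are sorted alphabetically.
print(levels(sizes))

# Re-apply factor() and give the desired level order directly.
sizes_reordered <- factor(sizes, levels = c("small", "medium", "large"))
print(levels(sizes_reordered))

# ordered() does the same but also marks the factor as ordered.
sizes_ordered <- ordered(sizes, levels = c("small", "medium", "large"))
print(is.ordered(sizes_reordered))
print(is.ordered(sizes_ordered))
```

Changing the level order does not change the data itself, only the order in which the categories are listed and, for an ordered factor, how they compare.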
Suppose we are studying the effects of several levels of a fertilizer on the growth of a plant. We record the fertilizer
levels in a vector called fert. To order the levels from the smallest to the largest value, we set ordered = TRUE as below:

fert <- c(10, 20, 20, 50, 10, 20, 10, 50, 20)
fert <- factor(fert, levels = c(10, 20, 50), ordered = TRUE)
fert
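To illustrate what the ordering makes possible, the sketch below rebuilds the same fert factor and tries a few operations that are meaningful for ordered factors. The variable names above_lowest, counts and highest are introduced here for illustration only.

```r
# Rebuild the ordered factor from the example above.
fert <- factor(c(10, 20, 20, 50, 10, 20, 10, 50, 20),
               levels = c(10, 20, 50), ordered = TRUE)
print(fert)
## [1] 10 20 20 50 10 20 10 50 20
## Levels: 10 < 20 < 50

# Ordered factors can be compared against one of their levels.
above_lowest <- fert > "10"
print(sum(above_lowest))

# Count the observations at each fertilizer level.
counts <- table(fert)
print(counts)

# max() is defined for ordered factors and returns the highest level present.
highest <- max(fert)
print(highest)
```

The same comparison on an unordered factor would return NA with a warning, which is one practical reason to set ordered = TRUE when the levels have a natural ranking.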
