Importing Data in R
Importing data in R means reading data from external files into the R environment; exporting is
the reverse operation, in which data held in R is written to external files. File formats such as
CSV, XML, xlsx, and JSON, as well as web data, can be imported into the R environment for
analysis, and data present in the R environment can be stored in external files in the same
formats.
Reading CSV Files
CSV (Comma Separated Values) is a text file in which the values in columns are separated by
a comma.
Before importing data, either set the working directory with the setwd() function or pass the
full path of the file to the import function.
To read a CSV file, we use the built-in function read.csv(), which returns the data from the file
as a data frame.
For example:
read.data <- read.csv("file1.csv")
print(read.data)
Output:
  empid empname   empdept empsalary empstart_date
1     1     Sam        IT     25000    03-09-2005
2     2     Rob        HR     30000    03-05-2002
3     3     Max Marketing     29000    05-06-2007
4     4    John       R&D     35000    01-03-1999
5     5    Gary   Finance     32000    05-09-2000
6     6    Alex      Tech     20000    09-05-2005
7     7    Ivar     Sales     36000    04-04-1999
8     8  Robert   Finance     34000    06-08-2008
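To try read.csv() without an existing file, the sketch below first writes a small CSV to a temporary directory and then reads it back. The file name sample.csv, the variable names csv.path and sample.data, and the three-row contents are assumptions made for illustration; they are not the file1.csv used above.

```r
# Build a small CSV in a temporary directory (hypothetical stand-in for file1.csv)
csv.path <- file.path(tempdir(), "sample.csv")
writeLines(c("empid,empname,empdept,empsalary",
             "1,Sam,IT,25000",
             "2,Rob,HR,30000",
             "3,Max,Marketing,29000"),
           csv.path)

# Confirm the file exists before trying to read it
stopifnot(file.exists(csv.path))

# read.csv() treats the first line as the header and returns a data frame
sample.data <- read.csv(csv.path, stringsAsFactors = FALSE)
print(sample.data)

# str() shows the type that read.csv() inferred for each column
str(sample.data)
```

Passing a full path built with file.path() means the example does not depend on the current working directory.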
Analyzing a CSV File
#To print number of columns
print(ncol(read.data))
Output:
[1] 5
#To print number of rows
print(nrow(read.data))
Output:
[1] 8
#To print the range of salary packages
range.sal <- range(read.data$empsalary)
print(range.sal)
Output:
[1] 20000 36000
#To print the details of the person with the highest salary, we use the subset() function
#to extract the matching observations
max.sal <- subset(read.data, empsalary == max(empsalary))
print(max.sal)
Output:
  empid empname empdept empsalary empstart_date
7     7    Ivar   Sales     36000    04-04-1999
#To print the details of all people working in the Finance department
fin.per <- subset(read.data, empdept == "Finance")
print(fin.per)
Output:
  empid empname empdept empsalary empstart_date
5     5    Gary Finance     32000    05-09-2000
8     8  Robert Finance     34000    06-08-2008
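The same questions can also be answered with bracket indexing instead of subset(). The following is a minimal sketch on a small in-memory data frame; the name emp and the three rows (copied from the table above) are assumptions so that the example runs without file1.csv.

```r
# Small in-memory stand-in for read.data (three rows copied from the table above)
emp <- data.frame(empid = c(5, 7, 8),
                  empname = c("Gary", "Ivar", "Robert"),
                  empdept = c("Finance", "Sales", "Finance"),
                  empsalary = c(32000, 36000, 34000),
                  stringsAsFactors = FALSE)

# Row with the highest salary, using which.max() instead of subset()
top.row <- emp[which.max(emp$empsalary), ]
print(top.row)

# All rows for one department, using a logical index
fin.rows <- emp[emp$empdept == "Finance", ]
print(fin.rows)

# Mean salary per department
dept.mean <- aggregate(empsalary ~ empdept, data = emp, FUN = mean)
print(dept.mean)
```

Note that which.max() returns only the first row when several rows share the maximum, whereas subset() with empsalary == max(empsalary) returns all of them.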
Writing to a CSV File
To write data to a CSV file, we use the write.csv() function.
The output file is stored in the working directory of the R session unless a full path is given.
For example:
#To print the details of people with a salary between 30000 and 40000 and store the results
#in a new file
per.sal <- subset(read.data, empsalary >= 30000 & empsalary <= 40000)
print(per.sal)
Output:
  empid empname empdept empsalary empstart_date
2     2     Rob      HR     30000    03-05-2002
4     4    John     R&D     35000    01-03-1999
5     5    Gary Finance     32000    05-09-2000
7     7    Ivar   Sales     36000    04-04-1999
8     8  Robert Finance     34000    06-08-2008
# Writing data into a new CSV file
write.csv(per.sal,"output.csv")
new.data <- read.csv("output.csv")
print(new.data)
Output:
  X empid empname empdept empsalary empstart_date
1 2     2     Rob      HR     30000    03-05-2002
2 4     4    John     R&D     35000    01-03-1999
3 5     5    Gary Finance     32000    05-09-2000
4 7     7    Ivar   Sales     36000    04-04-1999
5 8     8  Robert Finance     34000    06-08-2008
# To exclude the extra column X from the above file
write.csv(per.sal,"file2.csv", row.names = FALSE)
new.data <- read.csv("file2.csv")
print(new.data)
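Output:
  empid empname empdept empsalary empstart_date
1     2     Rob      HR     30000    03-05-2002
2     4    John     R&D     35000    01-03-1999
3     5    Gary Finance     32000    05-09-2000
4     7    Ivar   Sales     36000    04-04-1999
5     8  Robert Finance     34000    06-08-2008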
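The effect of row.names = FALSE can be checked on a small round trip. This is a sketch under stated assumptions: the data frame per and the temporary file path out.path are hypothetical stand-ins for per.sal and the output files used above.

```r
# Hypothetical two-row stand-in for per.sal
per <- data.frame(empid = c(2, 4),
                  empname = c("Rob", "John"),
                  empsalary = c(30000, 35000),
                  stringsAsFactors = FALSE)

# Write to a temporary file so the working directory is left untouched
out.path <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an unnamed first column,
# which read.csv() then labels "X"
write.csv(per, out.path)
with.rn <- read.csv(out.path)
print(names(with.rn))

# With row.names = FALSE the extra column is not written
write.csv(per, out.path, row.names = FALSE)
without.rn <- read.csv(out.path)
print(names(without.rn))
```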
Python    JSON
dict      object
str       string
True      true
False     false
None      null
form of JSON, without needing to convert it into an actual JSON object. It takes 2
parameters:
dictionary: the name of the dictionary that should be converted to a JSON object.
file pointer: pointer to the file opened in write or append mode.
Reading HTML Tables
You can import HTML tables into R with the following commands.
# Activate the `XML` and `RCurl` libraries
library(XML)
library(RCurl)
# Assign your URL to `url`
url <- "Important_datasets_CSV.html"
url
# Read the third HTML table on the page using readHTMLTable()
data_df <- readHTMLTable(url, which = 3)
data_df
class(data_df)
Alternatively, you can fetch the page with the httr package and use the rawToChar() function to
convert the raw response content into text that readHTMLTable() can parse.
# Activate `httr`
library(httr)
# Get the URL data
urldata <- GET(url)
# Read the HTML table
data <- readHTMLTable(rawToChar(urldata$content), stringsAsFactors = FALSE)
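To see what readHTMLTable() returns without depending on a live web page, the sketch below parses a small HTML string held in memory. The HTML snippet and the names html.text, doc, and first.table are assumptions for illustration, and the block only runs its body if the XML package is installed.

```r
# Hypothetical HTML page containing a single table
html.text <- paste0(
  "<html><body><table>",
  "<tr><th>empid</th><th>empname</th></tr>",
  "<tr><td>1</td><td>Sam</td></tr>",
  "<tr><td>2</td><td>Rob</td></tr>",
  "</table></body></html>")

first.table <- NULL
if (requireNamespace("XML", quietly = TRUE)) {
  # Parse the string as HTML rather than treating it as a file name or URL
  doc <- XML::htmlParse(html.text, asText = TRUE)

  # which = 1 returns the first table as a data frame instead of a list of tables
  first.table <- XML::readHTMLTable(doc, header = TRUE, which = 1,
                                    stringsAsFactors = FALSE)
  print(first.table)
  print(class(first.table))
}
```

Without the which argument, readHTMLTable() returns a list with one element per table found in the document.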
To read HTML tables from websites and retrieve data from them, we use
the XML and RCurl packages in R programming.
To install the XML and RCurl packages, run the following commands:
install.packages("XML")
install.packages("RCurl")
To load the packages, run the following commands:
library("XML")
library("RCurl")
1. SPSS FILES
# Activate the `foreign` library
library(foreign)
# Read the SPSS data
mySPSSData <- read.spss("airline_passengers.sav")
This works fine if you are currently using SPSS software.
The following command returns the result as a data frame.
# Activate the `foreign` library
library(foreign)
# Read the SPSS data
mySPSSData <- read.spss("airline_passengers.sav",
                        to.data.frame = TRUE,
                        use.value.labels = FALSE)
Set the use.value.labels argument to FALSE if you do not want variables with value labels to be
converted to R factors. Set the to.data.frame argument to TRUE to receive the output as a data
frame.
2. STATA FILES
You can import Stata files into R via the foreign or haven package through the following
command. The read.dta() function used here comes from the foreign package.
# Activate the `haven` and `foreign` libraries
library(haven)
library(foreign)
# Read Stata data into R
mydata <- read.dta("sample.dta")
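Because the foreign package also provides write.dta(), the import can be tried as a round trip without an existing Stata file. This is a minimal sketch; the data frame emp and the temporary path dta.path are assumptions for illustration and do not correspond to sample.dta.

```r
# Activate the `foreign` library
library(foreign)

# Hypothetical data to export; column names avoid dots, which Stata does not allow
emp <- data.frame(empid = c(1, 2, 3),
                  empname = c("Sam", "Rob", "Max"),
                  empsalary = c(25000, 30000, 29000),
                  stringsAsFactors = FALSE)

# Write a Stata file to a temporary location
dta.path <- tempfile(fileext = ".dta")
write.dta(emp, dta.path)

# Read it back; read.dta() returns a data frame
emp.back <- read.dta(dta.path)
print(emp.back)
```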
3. SYSTAT FILES
You can import Systat files into R via the foreign package through the following command.
# Activate the `foreign` library
library(foreign)
# Read Systat data (Systat files use the .sys or .syd extension)
mydata <- read.systat("sample.syd")
4. SAS FILES
To import SAS files into R, install the sas7bdat package and call the read.sas7bdat()
function.
# Activate the `sas7bdat` library
library(sas7bdat)
# Read in the SAS data
mySASData <- read.sas7bdat("airline.sas7bdat")
Alternatively, the foreign package provides the read.ssd() and read.xport() functions for SAS
permanent datasets and SAS XPORT files, respectively.
5. MINITAB
To import Minitab Portable Worksheet (.mtp) files into R, load the foreign package and use the
read.mtp() function, as in the following command.
# Activate the `foreign` library
library(foreign)
# Read the Minitab data
myMTPData <- read.mtp("sample.mtp")
6. RDA/ RDATA
You can load an .rda or .RData file into R through the following command.
load(".RDA")
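A round trip with save() and load() shows what load() restores. The following is a sketch under stated assumptions: the objects emp and threshold and the temporary path rda.path are hypothetical names chosen for the example.

```r
# Two hypothetical objects to store
emp <- data.frame(empid = c(1, 2), empsalary = c(25000, 30000))
threshold <- 30000

# Save both objects into one file in a temporary location
rda.path <- tempfile(fileext = ".RData")
save(emp, threshold, file = rda.path)

# Remove them from the session, then restore them from the file
rm(emp, threshold)
restored <- load(rda.path)

# load() recreates the objects under their original names and
# returns those names as a character vector
print(restored)
print(emp)
```

Because load() restores objects under the names they were saved with, it can silently overwrite objects of the same name in the current session.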
7. READ RELATIONAL AND NON-RELATIONAL DATABASES INTO R
The following are the steps to import data from relational databases by using MonetDB.
Step 1: Create a database by using the MonetDB daemon monetdbd and a new database called
filter, lag
class(batches)
[1] "XMLNodeSet"
Step 5:
batch_titles <- batches %>%
  html_nodes(".batch_class") %>%
  html_text()
batch_season <- str_extract(batch_titles, "(Fall|Spring|Winter|Summer)")
batch_year <- str_extract(batch_titles, "([[:digit:]]{4})")
# location info is everything in the batch title that is not year info or season info
batch_location <- sub("\\s+$", "",
                      sub("([[:digit:]]{4})", "",
                          sub("(Fall|Spring|Winter|Summer)", "", batch_titles)))
# create data frame with batch info.
batch_info <- data.frame(location = batch_location,
                         year = batch_year,
                         season = batch_season)
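The extraction logic in Step 5 can be tried on its own, without scraping a page. The sketch below applies the same patterns to two made-up titles and swaps str_extract() for a small base R helper built on regexpr() and regmatches(), so that no extra package is needed. The titles and the helper name extract_first are assumptions for illustration.

```r
# Hypothetical batch titles standing in for the scraped text
batch_titles <- c("New York Fall 2015", "Boston Spring 2016 ")

# Base R stand-in for str_extract(): first match per element, NA if none
extract_first <- function(pattern, x) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

batch_season <- extract_first("(Fall|Spring|Winter|Summer)", batch_titles)
batch_year <- extract_first("[[:digit:]]{4}", batch_titles)

# Remove the season, then the year, then any trailing whitespace
batch_location <- sub("\\s+$", "",
                      sub("([[:digit:]]{4})", "",
                          sub("(Fall|Spring|Winter|Summer)", "", batch_titles)))

batch_info <- data.frame(location = batch_location,
                         year = batch_year,
                         season = batch_season,
                         stringsAsFactors = FALSE)
print(batch_info)
```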