Importing Data in R

Importing data in R involves reading external files in formats like CSV, XML, JSON and writing data to external files. The document discusses how to import CSV, XML and JSON files into R and convert them to data frames for analysis. Key functions include read.csv() to import CSV, xmlParse() and xmlToDataFrame() to import XML, and fromJSON() to import JSON files. The data can then be manipulated and analyzed as data frames before being written back to external files using write.csv().

Uploaded by

paseg78960
Importing data in R means reading data from external files into the R environment; the reverse
operation, exporting, writes data from R to external files. File formats such as CSV, XML, xlsx,
and JSON, as well as web data, can be imported into the R environment for data analysis, and the
data present in the R environment can be stored in external files in the same file formats.
Reading CSV Files
CSV (Comma Separated Values) is a text file in which the values in columns are separated by
a comma.
To import a file by its name alone, the file must be in the working directory, which we set
with the setwd() function. Alternatively, we can pass the full path to the file.
To read a CSV file, we use the built-in function read.csv(), which returns the data from the
file as a data frame.
For example:
read.data <- read.csv("file1.csv")
print(read.data)
Output:
Sl. No. empid empname empdept empsalary empstart_date
1 1 Sam IT 25000 03-09-2005
2 2 Rob HR 30000 03-05-2002
3 3 Max Marketing 29000 05-06-2007
4 4 John R&D 35000 01-03-1999
5 5 Gary Finance 32000 05-09-2000
6 6 Alex Tech 20000 09-05-2005
7 7 Ivar Sales 36000 04-04-1999
8 8 Robert Finance 34000 06-08-2008
Analyzing a CSV File
#To print number of columns
print(ncol(read.data))
Output:
[1] 5
#To print number of rows
print(nrow(read.data))
Output:
[1] 8
#To print the range of salary packages
range.sal <- range(read.data$empsalary)
print(range.sal)
Output:
[1] 20000 36000
#To print the details of the person with the highest salary, we use the subset() function to
#extract variables and observations
max.sal <- subset(read.data, empsalary == max(empsalary))
print(max.sal)
Output:
Sl. No. empid empname empdept empsalary empstart_date
7 7 Ivar Sales 36000 04-04-1999
#To print the details of all people working in the Finance department
fin.per <- subset(read.data, empdept == "Finance")
print(fin.per)
Output:
Sl. No. empid empname empdept empsalary empstart_date
5 5 Gary Finance 32000 05-09-2000
8 8 Robert Finance 34000 06-08-2008
Writing to a CSV File
To write data to a CSV file, we use the write.csv() function.
The output file is stored in the working directory of our R programming environment.
For example:
#To print the details of people having a salary between 30000 and 40000 and store the results
#in a new file
per.sal <- subset(read.data, empsalary >= 30000 & empsalary <= 40000)
print(per.sal)
Output:
empid empname empdept empsalary empstart_date
2 2 Rob HR 30000 03-05-2002
4 4 John R&D 35000 01-03-1999
5 5 Gary Finance 32000 05-09-2000
7 7 Ivar Sales 36000 04-04-1999
8 8 Robert Finance 34000 06-08-2008
# Writing data into a new CSV file
write.csv(per.sal,"output.csv")
new.data <- read.csv("output.csv")
print(new.data)
Output:
X empid empname empdept empsalary empstart_date
1 2 2 Rob HR 30000 03-05-2002
2 4 4 John R&D 35000 01-03-1999
3 5 5 Gary Finance 32000 05-09-2000
4 7 7 Ivar Sales 36000 04-04-1999
5 8 8 Robert Finance 34000 06-08-2008
# To exclude the extra column X (the saved row names) from the output file
write.csv(per.sal,"file2.csv", row.names = FALSE)
new.data <- read.csv("file2.csv")
print(new.data)
Output:
empid empname empdept empsalary empstart_date
1 2 Rob HR 30000 03-05-2002
2 4 John R&D 35000 01-03-1999
3 5 Gary Finance 32000 05-09-2000
4 7 Ivar Sales 36000 04-04-1999
5 8 Robert Finance 34000 06-08-2008
Reading XML Files


XML (Extensible Markup Language) is a plain-text format for sharing structured data on the web
and elsewhere. Like an HTML file, it contains markup tags, but the tags in an XML file describe
the meaning of the data contained in the file rather than the structure of the page.
For importing data in R from XML files, we need to install the XML package, which can be
done as follows:
install.packages("XML")
To read XML files, we use the xmlParse() function from the XML package.
For example:
#To load required xml package to read XML files
library("XML")
#To load other required packages
library("methods")
#To give the input file name to the function
newfile <- xmlParse(file = "file.xml")
print(newfile)
Output:
<?xml version="1.0"?>
<RECORDS>
<EMPLOYEE>
<ID>1</ID>
<NAME>Sam</NAME>
<SALARY>32000</SALARY>
<STARTDATE>1/1/2001</STARTDATE>
<DEPT>HR</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>2</ID>
<NAME>Rob</NAME>
<SALARY>36000</SALARY>
<STARTDATE>9/3/2006</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>3</ID>
<NAME>Max</NAME>
<SALARY>42000</SALARY>
<STARTDATE>1/5/2011</STARTDATE>
<DEPT>Sales</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>4</ID>
<NAME>Ivar</NAME>
<SALARY>50000</SALARY>
<STARTDATE>25/1/2001</STARTDATE>
<DEPT>Tech</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>5</ID>
<NAME>Robert</NAME>
<SALARY>25000</SALARY>
<STARTDATE>13/7/2015</STARTDATE>
<DEPT>Sales</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>6</ID>
<NAME>Leon</NAME>
<SALARY>57000</SALARY>
<STARTDATE>5/1/2000</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>7</ID>
<NAME>Samuel</NAME>
<SALARY>45000</SALARY>
<STARTDATE>27/3/2003</STARTDATE>
<DEPT>Operations</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>8</ID>
<NAME>Jack</NAME>
<SALARY>24000</SALARY>
<STARTDATE>6/1/2016</STARTDATE>
<DEPT>Sales</DEPT>
</EMPLOYEE>
</RECORDS>
#To get the root node of the XML file
rootnode <- xmlRoot(newfile)
#To get the number of nodes in the root
rootsize <- xmlSize(rootnode)
print(rootsize)
Output:
[1] 8
#To print a specific node
print(rootnode[1])
Output:
$EMPLOYEE
<EMPLOYEE>
<ID>1</ID>
<NAME>Sam</NAME>
<SALARY>32000</SALARY>
<STARTDATE>1/1/2001</STARTDATE>
<DEPT>HR</DEPT>
</EMPLOYEE>
attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
#To print elements of a particular node
print(rootnode[[1]][[1]])
print(rootnode[[1]][[3]])
print(rootnode[[1]][[5]])
Output:
<ID>1</ID>
<SALARY>32000</SALARY>
<DEPT>HR</DEPT>
Converting an XML to a Data Frame
To analyze the data effectively after importing it into R, we convert the data in the XML file
to a data frame. After converting, we can manipulate the data and perform the same operations
as on any other data frame.
For example:
library("XML")
library("methods")
#To convert the data in xml file to a data frame
xmldataframe <- xmlToDataFrame("file.xml")
print(xmldataframe)
Output:
ID NAME SALARY STARTDATE DEPT
1 1 Sam 32000 1/1/2001 HR
2 2 Rob 36000 9/3/2006 IT
3 3 Max 42000 1/5/2011 Sales
4 4 Ivar 50000 25/1/2001 Tech
5 5 Robert 25000 13/7/2015 Sales
6 6 Leon 57000 5/1/2000 IT
7 7 Samuel 45000 27/3/2003 Operations
8 8 Jack 24000 6/1/2016 Sales
Reading JSON Files
JSON (JavaScript Object Notation) is a text format used to exchange data, for example between
a web application and a server. JSON files are human-readable and can be edited with any text
editor.
Importing data in R from a JSON file requires the rjson package, which can be installed as
follows:
install.packages("rjson")
Now, to read JSON files, we use the fromJSON() function from the rjson package, which stores
the data as a list.
For example:
#To load rjson package
library("rjson")
#To give the file name to the function
newfile <- fromJSON(file = "file1.json")
#To print the file
print(newfile)
Output:
$ID
[1] "1" "2" "3" "4" "5" "6" "7" "8"
$Name
[1] "Sam" "Rob" "Max" "Robert" "Ivar" "Leon" "Samuel" "Ivar"
$Salary
[1] "32000" "27000" "35000" "25000" "37000" "41000" "36000" "51000"
$StartDate
[1] "1/1/2001" "9/3/2003" "1/5/2004" "14/11/2007" "13/7/2015" "4/3/2007"
[7] "27/3/2013" "25/7/2000"
$Dept
[1] "IT" "HR" "Tech" "HR" "Sales" "HR"
[7] "Operations" "IT"
JSON stands for JavaScript Object Notation. It is a text format, based on the object
syntax of JavaScript, that is used to store and transfer data.
Python supports JSON through a built-in package called json. To use this feature, we
import the json package in a Python script. A JSON object is written as quoted-string
keys mapped to values within { }. It is similar to the dictionary in Python.

Writing JSON to a file in Python


Serializing JSON refers to the transformation of data into a series of bytes (hence serial)
to be stored or transmitted across a network. To handle the data flow in a file, the json
library in Python uses the dump() or dumps() function to convert Python objects into
their JSON equivalents, which makes it easy to write data to files. The following table
shows the conversion.
PYTHON OBJECT       JSON OBJECT
dict                object
list, tuple         array
str                 string
int, long, float    number
True                true
False               false
None                null
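The conversions in the table can be checked with json.dumps(). The following is only an illustrative sketch: the sample values are made up for the demonstration, and long is omitted because it exists only in Python 2.

```python
import json

# Hypothetical sample values, one per Python type listed in the table
samples = {
    "dict": {"a": 1},
    "list": [1, 2],
    "tuple": (1, 2),
    "str": "text",
    "int": 7,
    "float": 1.5,
    "True": True,
    "False": False,
    "None": None,
}

# Serialize each value on its own to see the JSON text it becomes
encoded = {name: json.dumps(value) for name, value in samples.items()}

for name, text in encoded.items():
    print(name, "->", text)
```

Note that a list and a tuple both become a JSON array, so the difference between them is lost once the data is serialized.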

Method 1: Writing JSON to a file in Python using json.dumps()


The json package in Python has a function called json.dumps() that converts a dictionary
to a JSON-formatted string. Here it takes two parameters:
dictionary: the name of the dictionary which should be converted to a JSON string.
indent: the number of spaces used for indentation.
After converting the dictionary to a JSON string, simply write it to a file.

Python3
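A minimal sketch of this method could look as follows. The employee dictionary, the file name sample.json and the temporary directory are assumptions chosen for illustration and do not come from the original document.

```python
import json
import os
import tempfile

# Hypothetical sample record used only for this illustration
employee = {"name": "Sam", "dept": "IT", "salary": 25000}

# Serialize the dictionary to a JSON-formatted string, indented by 4 spaces
json_text = json.dumps(employee, indent=4)

# Write the string to a file; a temporary directory is an assumed location
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w") as outfile:
    outfile.write(json_text)

print(json_text)
```

Because json.dumps() returns an ordinary string, the same value can also be logged or sent over a network before, or instead of, being written to disk.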

Method 2: Writing JSON to a file in Python using json.dump()


Another way of writing JSON to a file is the json.dump() method, which writes the
dictionary directly to a file in the form of JSON, without needing to convert it into a
JSON string first. It takes 2 parameters:
dictionary: the name of the dictionary which should be converted to JSON.
file pointer: pointer of the file opened in write or append mode.
Python3
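A minimal sketch of this method, under the same assumptions as before (a made-up employee dictionary and a file named sample.json in a temporary directory, neither taken from the original document):

```python
import json
import os
import tempfile

# Hypothetical sample record used only for this illustration
employee = {"name": "Sam", "dept": "IT", "salary": 25000}

# Open the target file in write mode and let json.dump() write to it directly
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w") as outfile:
    json.dump(employee, outfile)

# Read the file back as plain text to see what was written
with open(path) as infile:
    written = infile.read()

print(written)
```

Compared with json.dumps(), this avoids holding the whole JSON text in an intermediate string, which can matter for large objects.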

Reading JSON from a file using Python


Deserialization is the opposite of serialization, i.e. the conversion of JSON objects into
their respective Python objects. The load() method is used for it when the JSON is stored
in a file. If you have JSON data from another program, or have obtained it as a
JSON-formatted string, it can be deserialized with loads() instead. Depending on the
JSON text, the root object is a list or a dict.
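For JSON text that is already held in a string, the json module provides json.loads(). The sketch below is illustrative only; the JSON text is a made-up example standing in for data received from another program.

```python
import json

# Hypothetical JSON text, e.g. received from another program
json_text = '{"name": "Rob", "skills": ["R", "Python"], "active": true, "manager": null}'

# Parse the string into the corresponding Python objects
record = json.loads(json_text)

print(type(record).__name__)
print(record["skills"], record["active"], record["manager"])
```

The JSON object becomes a dict, the array a list, and true and null become True and None.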
Reading JSON from a file using json.load()
The json package has the json.load() function, which loads the JSON content from a JSON
file into a dictionary. It takes one parameter:
File pointer: A file pointer that points to a JSON file.
Python3
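A minimal sketch of reading JSON from a file follows. To keep the example self-contained it first writes a small file; the file name sample.json, its contents and the temporary directory are assumptions made for illustration.

```python
import json
import os
import tempfile

# Create a small JSON file so the example can run on its own (hypothetical content)
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w") as outfile:
    outfile.write('{"name": "Max", "dept": "Marketing", "salary": 29000}')

# Open the file and pass the file pointer to json.load()
with open(path, "r") as infile:
    data = json.load(infile)

print(data)
print(type(data).__name__)
```

The result is a dict here because the root of the JSON text is an object; a file whose root is a JSON array would be loaded as a list.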

Converting a JSON File to a Data Frame


To convert JSON file to a Data Frame, we use the as.data.frame() function.
For example:
library("rjson")
newfile <- fromJSON(file = "file1.json")
#To convert a JSON file to a data frame
jsondataframe <- as.data.frame(newfile)
print(jsondataframe)
Output:
ID Name Salary StartDate Dept
1 1 Sam 32000 1/1/2001 IT
2 2 Rob 27000 9/3/2003 HR
3 3 Max 35000 1/5/2004 Tech
4 4 Robert 25000 14/11/2007 HR
5 5 Ivar 37000 13/7/2015 Sales
6 6 Leon 41000 4/3/2007 HR
7 7 Samuel 36000 27/3/2013 Operations
8 8 Ivar 51000 25/7/2000 IT
Reading Excel Files
Microsoft Excel is a very popular spreadsheet program that stores data in xls and xlsx format.
We can read data from Excel files using the readxl package in R. The readxl package only reads
files; writing Excel files requires a different package.
To install the readxl package, run the following command:
install.packages("readxl")
For importing data in R from an Excel file, we use the read_excel() function, which returns
the data as a data frame (a tibble).
library(readxl)
newfile <- read_excel("Sheet1.xlsx")
print(newfile)
Output:
ID NAME DEPT SALARY AGE
1 1 SAM SALES 32000 35
2 2 ROB HR 36000 23
3 3 MAC IT 37000 40
4 4 IVAR IT 25000 37
5 5 MAX R&D 30000 22
6 6 ROBERT HR 27000 32
7 7 SAMUEL FINANCE 50000 41
8 8 RAGNAR SALES 45000 29
Reading HTML Tables
You can import HTML tables into R with the readHTMLTable() function from the XML package.
# Assign your URL to `url`
url <- "important_datasets_CSV.html"
# Read the third HTML table on the page
data_df <- readHTMLTable(url, which=3)
If the above command raises an error, use the RCurl and XML packages.
# Activate the libraries
library(XML)
library(RCurl)
# Assign your URL to `url`
url <- "Important_datasets_CSV.html"
url
# Read the HTML table using readHTMLTable
data_df <- readHTMLTable(url, 3)
data_df
class(data_df)
Alternatively, you can fetch the page with the httr package and use the rawToChar() function to
convert the raw response content to text before reading the HTML tables into R.
# Activate `httr`
library(httr)
# Get the URL data
urldata <- GET(url)
# Read the HTML table
data <- readHTMLTable(rawToChar(urldata$content), stringsAsFactors = FALSE)
To read HTML tables from websites and retrieve data from them, we use
the XML and RCurl packages in R.
To install the XML and RCurl packages, run the following commands:
install.packages("XML")
install.packages("RCurl")
To load the packages, run the following commands:
library("XML")
library("RCurl")
To read a table, we use the readHTMLTable() function, which stores it as a data frame.
#To fetch a table from any website paste the url
url <- "https://en.wikipedia.org/wiki/Ease_of_doing_business_index#Ranking"
tabs <- getURL(url)
#To fetch the first table when the webpage has more than one table, we use which = 1
tabs <- readHTMLTable(tabs, which = 1, stringsAsFactors = FALSE)
head(tabs)
Output:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1 Classification Jurisdiction 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009
2 Very Easy New Zealand 1 1 1 2 2 3 3 3 3 2 2
3 Very Easy Singapore 2 2 2 1 1 1 1 1 1 1 1
4 Very Easy Denmark 3 3 3 3 4 5 5 5 6 6 5
5 Very Easy Hong Kong 4 5 4 4 3 2 2 2 2 3 4
6 Very Easy South Korea 5 4 5 5 5 7 8 8 16 19 23
V14 V15 V16
1 2008 2007 2006
2 2 2 1
3 1 1 2
4 5 7 8
5 4 5 7
6 30 23 27
#To analyze the structure of the data frame use the str() function.
For example:
str(tabs)
Output:
'data.frame': 191 obs. of 16 variables:
$ V1 : chr "Classification" "Very Easy" "Very Easy" "Very Easy" ...
$ V2 : chr "Jurisdiction" "New Zealand" "Singapore" "Denmark" ...
$ V3 : chr "2019" "1" "2" "3" ...
$ V4 : chr "2018" "1" "2" "3" ...
$ V5 : chr "2017" "1" "2" "3" ...
$ V6 : chr "2016" "2" "1" "3" ...
$ V7 : chr "2015" "2" "1" "4" ...
$ V8 : chr "2014" "3" "1" "5" ...
$ V9 : chr "2013" "3" "1" "5" ...
$ V10: chr "2012" "3" "1" "5" ...
$ V11: chr "2011" "3" "1" "6" ...
$ V12: chr "2010" "2" "1" "6" ...
$ V13: chr "2009" "2" "1" "5" ...
$ V14: chr "2008" "2" "1" "5" ...
$ V15: chr "2007" "2" "1" "7" ...
$ V16: chr "2006" "1" "2" "8" ...
#To print rows from 5 to 10 and columns from 1 to 8
T1 <- tabs[5:10, 1:8]
head(T1)
Output:
V1 V2 V3 V4 V5 V6 V7 V8
5 Very Easy Hong Kong 4 5 4 5 3 2
6 Very Easy South Korea 5 4 5 4 5 7
7 Very Easy Georgia 6 9 16 24 15 8
8 Very Easy Norway 7 8 6 9 6 9
9 Very Easy United States 8 6 8 7 7 4
10 Very Easy United Kingdom 9 7 7 6 8 10
#To find the position of India in the Table
T1 <- subset(tabs,tabs$V2 == "India")
head(T1)
Output:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
78 Easy India 77 100 130 130 142 134 132 132 134 133 122 120 134 116
1. SPSS FILES INTO R
To import SPSS files into R, install the foreign package and call the read.spss() function.
The following commands complete the import.
# Activate the `foreign` library

library(foreign)
# Read the SPSS data
mySPSSData <- read.spss("airline_passengers.sav")
This works fine for data files saved by SPSS; by default, the result is returned as a list.
The following command returns the result as a data frame instead.
# Activate the `foreign` library
library(foreign)
# Read the SPSS data
mySPSSData <- read.spss("airline_passengers.sav",
to.data.frame=TRUE,
use.value.labels=FALSE)
Set the use.value.labels argument to FALSE if you do not want variables with value labels to
be converted to R factors. Set the to.data.frame argument to TRUE to receive the output as a
data frame.
2. STATA FILES
You can import Stata files into R with the foreign or haven package, for example through the
following commands.
# Activate the `foreign` library
library(foreign)
# Read Stata data into R (the haven package offers read_dta() as an alternative)
mydata <- read.dta("sample.dta")
3. SYSTAT FILES
You can import Systat files into R with the foreign package through the following commands.
# Activate the `foreign` library
library(foreign)
# Read Systat data (Systat files use the .sys or .syd extension)
mydata <- read.systat("sample.syd")
4. SAS FILES
To import SAS files into R, install the sas7bdat package and call the read.sas7bdat()
function.
# Activate the `sas7bdat` library
library(sas7bdat)
# Read in the SAS data
mySASData <- read.sas7bdat("airline.sas7bdat")
Alternatively, if you are using the foreign package, you can import SAS data with the
read.ssd() and read.xport() functions.
5. MINITAB
To import Minitab (.mtp) files into R, install the foreign package and use the read.mtp()
function, as in the following commands.
# Activate the `foreign` library
library(foreign)
# Read the Minitab data
myMTPData <- read.mtp("sample.mtp")
6. RDA/ RDATA
You can import your .rdata file into R through the following command.
load(".RDA")
7. READ RELATIONAL DATABASES INTO R
The following are the steps to import data from relational databases by using MonetDB.
Step 1: Create a database by using the MonetDB daemon monetdbd and a new database called

Step 2: Install MonetDB.R from the R shell
> install.packages("MonetDB.R")
Step 3: Load the MonetDB.R library
> library(MonetDB.R)
Loading required package: DBI
Loading required package: digest
Step 4: Create a connection to the database
> conn <- dbConnect(MonetDB.R(), host="localhost", dbname="demo", user="monetdb",
password="monetdb")
Step 5: Create a database directly from R
> dbGetQuery(conn,"SELECT 'insert data'")
single_value
Step 6: Repeat Step 4 multiple times.
Step 7: Install and load dplyr to manipulate datasets in R
> install.packages("dplyr")
> library(dplyr)

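Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union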


Step 8: Make a connection to database for dplyr
> monetdb_conn <- src_monetdb("demo")
Final step: Create database for future import in R
> craftsmen <- tbl(monetdb_conn, "insert data")
impotenten <- tbl(monetdb_conn, "insert data")
invoices <- tbl(monetdb_conn, "insert data")
passengers <- tbl(monetdb_conn, "insert data")
seafarers <- tbl(monetdb_conn, "insert data")
soldiers <- tbl(monetdb_conn, "insert data")
total <- tbl(monetdb_conn, "insert data")
voyages <- tbl(monetdb_conn, "insert data")
8. IMPORTING DATA FROM NON-RELATIONAL DATABASES
The following are the steps to import data from non-relational databases to R by using
MongoDB.
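Step 1: Install MongoDB and load the data into a collection, for example with the following
Python script.
import pandas
import pymongo

df = pandas.read_table('../data/csdata.txt')
# Convert each row into a dict keyed by column name
lst = [dict(zip(df.columns, row)) for row in df.values]
for i in range(3):
    print(lst[i])
# MongoClient replaces the Connection class removed from current pymongo releases
con = pymongo.MongoClient('localhost', port=27017)
test = con.db.test
test.drop()
# insert_one() replaces the save() method removed from current pymongo releases
for i in lst:
    test.insert_one(i)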
Step 2: Using RMongo, run the following commands.
library(RMongo)
mg1 <- mongoDbConnect('db')
print(dbShowCollections(mg1))
query <- dbGetQuery(mg1, 'test', "{'AGE': {'$lt': 10}, 'LIQ': {'$gte': 0.1}, 'IND5A': {'$ne': 1}}")
data1 <- query[c('AGE', 'LIQ', 'IND5A')]
summary(data1)
Step 3: The output is as follows.
Loading required package: rJava
Loading required package: methods
Loading required package: RUnit
[1] "system.indexes" "test"
AGE LIQ IND5A
Min. :6.000 Min. :0.1000 Min. :0
1st Qu.:7.000 1st Qu.:0.1831 1st Qu.:0
Median :8.000 Median :0.2970 Median :0
Mean :7.963 Mean :0.3745 Mean :0
3rd Qu.:9.000 3rd Qu.:0.4900 3rd Qu.:0
Max. :9.000 Max. :1.0000 Max. :0
9. IMPORTING DATA THROUGH WEB SCRAPING
Step 1: Load the required packages (install any that are missing first).
library(rvest)
library(stringr)
library(plyr)
library(dplyr)
library(ggvis)
library(knitr)
options(digits = 4)
Step 2: Using PhantomJS, write the following script, which saves the rendered page to a file.
// scrape_techstars.js
var webPage = require('webpage');
var page = webPage.create();
var fs = require('fs');
var path = 'techstars.html';
page.open('http://www.techstars.com/companies/stats/', function (status) {
  var content = page.content;
  fs.write(path, content, 'w');
  phantom.exit();
});
Step 3: Run the script from R with the system() function.
# Let phantomJS scrape techstars, output is written to techstars.html
system("./phantomjs scrape_techstars.js")
Step 4: Parse the saved page and select the batch nodes.
batches <- html("techstars.html") %>%
html_nodes(".batch")

class(batches)
[1] "XMLNodeSet"
Step 5: Extract the batch and company details and combine them into one data frame.
batch_titles <- batches %>%
html_nodes(".batch_class") %>%
html_text()
batch_season <- str_extract(batch_titles, "(Fall|Spring|Winter|Summer)")
batch_year <- str_extract(batch_titles, "([[:digit:]]{4})")
# location info is everything in the batch title that is not year info or season info
batch_location <- sub("\\s+$", "",
sub("([[:digit:]]{4})", "",
sub("(Fall|Spring|Winter|Summer)","",batch_titles)))
# create data frame with batch info.
batch_info <- data.frame(location = batch_location,
year = batch_year,
season = batch_season)

breakdown <- lapply(batches, function(x) {
  company_info <- x %>% html_nodes(".parent")
  companies_single_batch <- lapply(company_info, function(y) {
    as.list(gsub("\\[\\+\\]\\[\\-\\]\\s", "", y %>%
                   html_nodes("td") %>%
                   html_text()))
  })
  df <- data.frame(matrix(unlist(companies_single_batch),
                          nrow = length(companies_single_batch),
                          byrow = TRUE,
                          dimnames = list(NULL, c("company", "funding", "status", "hq"))))
  return(df)
})
# Add batch info to breakdown
batch_info_extended <- batch_info[rep(seq_len(nrow(batch_info)),
sapply(breakdown, nrow)),]
breakdown_merged <- rbind.fill(breakdown)
# Merge all information
techstars <- tbl_df(cbind(breakdown_merged, batch_info_extended)) %>%
  mutate(funding = as.numeric(gsub(",","",gsub("\\$","",funding))))
Step 6: Print the resulting data frame.
techstars
## Source: local data frame [535 x 7]
##
## company funding status hq location year season
## 1 Accountable 110000 Active Fort Worth, TX Austin 2013 Fall
## 2 Atlas 1180000 Active Austin, TX Austin 2013 Fall
## 3 Embrace 110000 Failed Austin, TX Austin 2013 Fall
## 4 Filament Labs 1490000 Active Austin, TX Austin 2013 Fall
## 5 Fosbury 300000 Active Austin, TX Austin 2013 Fall
## 6 Gone! 840000 Active San Francisco, CA Austin 2013 Fall
## 7 MarketVibe 110000 Acquired Austin, TX Austin 2013 Fall
## 8 Plum 1630000 Active Austin, TX Austin 2013 Fall
## 9 ProtoExchange 110000 Active Austin, TX Austin 2013 Fall
## 10 Testlio 1020000 Active Austin, TX Austin 2013 Fall
## .. ... ... ... ... ... ... ...
names(techstars)
## [1] "company" "funding" "status" "hq" "location" "year"
## [7] "season"
10. IMPORTING DATA THROUGH TM PACKAGE
You can import text data through the tm package by installing and loading it and then reading
the text, as follows.
install.packages("tm")
library(tm)
text <- readLines("")
And in the final step, create a corpus from the text:
docs <- Corpus(VectorSource(text))
