Commit aa7f9aa

committed
Final version
1 parent 32dea94 commit aa7f9aa

1 file changed: +16 -16 lines changed

docs/scenarios/scrape.rst

Lines changed: 16 additions & 16 deletions
@@ -6,13 +6,13 @@ Web Scraping
 
 Web sites are written using HTML, which means that each web page is a
 structured document. Sometimes it would be great to obtain some data from
-them and preserve the structure while we're at it, but this isn't always easy.
-It's not often that web sites provide their data in comfortable formats
-such as ``.csv``.
+them and preserve the structure while we're at it. Web sites provide
+don't always provide their data in comfortable formats such as ``.csv``.
 
-This is where web scraping comes in. Web scraping is the practice of using
+This is where web scraping comes in. Web scraping is the practice of using a
 computer program to sift through a web page and gather the data that you need
-in a format most useful to you.
+in a format most useful to you while at the same time preserving the structure
+of the data.
 
 lxml and Requests
 -----------------
@@ -43,12 +43,12 @@ we can go over two different ways: XPath and CSSSelect. In this example, I
 will focus on the former.
 
 XPath is a way of locating information in structured documents such as
-HTML or XML pages. A good introduction to XPath is `here <http://www.w3schools.com/xpath/default.asp>`_ .
+HTML or XML documents. A good introduction to XPath is on `W3Schools <http://www.w3schools.com/xpath/default.asp>`_ .
 
-One can also use various tools for obtaining the XPath of elements such as
-FireBug for Firefox or in Chrome you can right click an element, choose
-'Inspect element', highlight the code and the right click again and choose
-'Copy XPath'.
+There are also various tools for obtaining the XPath of elements such as
+FireBug for Firefox or if you're using Chrome you can right click an
+element, choose 'Inspect element', highlight the code and then right
+click again and choose 'Copy XPath'.
 
 After a quick analysis, we see that in our page the data is contained in
 two elements - one is a div with title 'buyer-name' and the other is a
@@ -90,10 +90,10 @@ Lets see what we got exactly:
 '$15.00', '$114.07', '$10.09']
 
 Congratulations! We have successfully scraped all the data we wanted from
-a web page using lxml and we have it stored in memory as two lists. Now we
-can either continue our work on it, analyzing it using python or we can
-export it to a file and share it with friends.
+a web page using lxml and Requests. We have it stored in memory as two
+lists. Now we can do all sorts of cool stuff with it: we can analyze it
+using Python or we can save it a file and share it with the world.
 
-A cool idea to think about is writing a script to iterate through the rest
-of the pages of this example data set or making this application use
-threads to improve its speed.
+A cool idea to think about is modifying this script to iterate through
+the rest of the pages of this example dataset or rewriting this
+application to use threads for improved speed.
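
The closing paragraphs of the changed text suggest two follow-ups: saving the two lists to a file, and using threads to cover the remaining pages. Below is a minimal sketch of both ideas, not the tutorial's own code. It assumes a per-page function that returns a `(buyers, prices)` pair of lists. The names `scrape_pages`, `to_csv`, and `fake_scrape`, and the `example.com` URLs, are hypothetical. `fake_scrape` is a stand-in that returns canned data so the sketch runs without network access; in practice it would wrap the Requests and lxml calls from the document.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def scrape_pages(urls, scrape_one, max_workers=4):
    """Run scrape_one(url) for every URL on a thread pool and merge the results.

    scrape_one is expected to return a (buyers, prices) pair of lists.
    """
    buyers, prices = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as urls, so rows stay aligned.
        for page_buyers, page_prices in pool.map(scrape_one, urls):
            buyers.extend(page_buyers)
            prices.extend(page_prices)
    return buyers, prices


def to_csv(buyers, prices):
    """Return the two lists as CSV text, one buyer/price pair per row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["buyer", "price"])
    writer.writerows(zip(buyers, prices))
    return out.getvalue()


def fake_scrape(url):
    """Hypothetical stand-in for the real Requests + lxml code (no network)."""
    page = url.rsplit("=", 1)[-1]
    return ["buyer-" + page], ["$15.00"]


# Hypothetical page URLs; the real data set's URL scheme may differ.
urls = ["http://example.com/buyers?page=%d" % n for n in (1, 2, 3)]
buyers, prices = scrape_pages(urls, fake_scrape)
csv_text = to_csv(buyers, prices)
print(csv_text)
```

Threads suit this job because each page fetch spends most of its time waiting on the network. Writing `csv_text` to disk is then a single `open(path, "w").write(csv_text)` call.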