WEB SCRAPING IN PYTHON:
WEB SCRAPE ON
INSPIRATIONAL QUOTES USING PYTHON:
{HOW TO SCRAPE INSPIRING QUOTES FROM THE INTERNET WITH PYTHON}
In this article, I will show you how to scrape inspirational quotes from a website using the Python
programming language. I think everyone likes to hear some inspirational quotes from time to time
and hopefully the quotes that we will scrape within this article will brighten your day.
For anyone reading this article who doesn’t know what web scraping is, I will define it now. Web
scraping, or simply scraping, is the process of extracting data from a website. This means you
will learn how to extract information from a website using Python.
Understanding the Concept Before Writing the Program
Before writing any code, we must first understand the concept and method for scraping the website.
We have to find a website that contains inspirational quotes. Once that site is found, we need
to understand how it is structured so that we can find and extract the quote data that we want.
1. Find A Website That Contains Inspirational Quotes
As I said before, we need to find a website that contains the data that we want to extract. Luckily, I
was able to find a great website called goodreads.com. The link is
https://www.goodreads.com/quotes/tag/inspirational?page=0. This website contains inspirational
quotes along with the author of each quote.
2. View The Structure Of The Website
Now, we need to know how this website is structured. This can be done by using the inspection tool
on the Google Chrome Browser.
By using the inspection tool, I can see that the quotes appear to be under the div tag with class =
“quoteText”, and the author of the quote is under the span tag with class = “authorOrTitle”, so I will
use this information to help me gather the data.
Furthermore, I can see that those two tags are nested inside other div tags, which will help
me locate them from the program. The site also has many pages of quotes, so I can gather more
quotes by iterating through the pages, simply changing the page number at the end of the URL.
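To make the tag structure described above concrete, here is a minimal sketch that runs on a small, made-up piece of HTML instead of the live site. The markup in SAMPLE_HTML, the "outer" wrapper class, and the sample quotes and authors are all assumptions for illustration; the real page contains more tags than this. Only the quoteText and authorOrTitle class names come from the inspection step.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that imitates the structure described above.
# The "outer" class and the sample text are made up for this sketch.
SAMPLE_HTML = """
<div class="outer">
  <div class="quoteText">
    "Sample quote one."
    <span class="authorOrTitle">Author One</span>
  </div>
</div>
<div class="outer">
  <div class="quoteText">
    "Sample quote two."
    <span class="authorOrTitle">Author Two</span>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

pairs = []
# Every quote lives in a div with class "quoteText"
for block in soup.find_all('div', attrs={'class': 'quoteText'}):
    # The first line of the block's text is the quote itself
    quote = block.text.strip().split('\n')[0]
    # The author sits in a span with class "authorOrTitle" inside the same div
    author = block.find('span', attrs={'class': 'authorOrTitle'}).text.strip()
    pairs.append((quote, author))

for quote, author in pairs:
    print(quote, author)
```

Trying the lookups on a small string like this is a quick way to check the class names before making any request to the website.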
Example of iterating through the web page:
https://www.goodreads.com/quotes/tag/inspirational?page=0
https://www.goodreads.com/quotes/tag/inspirational?page=1
https://www.goodreads.com/quotes/tag/inspirational?page=2
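The URLs above can be generated rather than typed by hand. The sketch below assumes the page number is the only part of the URL that changes; build_page_urls is a hypothetical helper name used only for this illustration, and no request is made to the site.

```python
# Base URL taken from the examples above; only the page number changes
BASE_URL = "https://www.goodreads.com/quotes/tag/inspirational?page="

def build_page_urls(n):
    """Return the URLs for pages 0 to n-1 (hypothetical helper)."""
    return [BASE_URL + str(page) for page in range(n)]

urls = build_page_urls(3)
for url in urls:
    print(url)
```

Building the list of URLs first makes it easy to see exactly which pages will be visited before any scraping starts.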
If you prefer not to read this article and would like a video version of it, you can check out the
YouTube video. It goes through everything in this article in a little more detail, and will make
it easy for you to start programming even if you don’t have Python
installed on your computer. Or you can use both as supplementary materials for learning.
Programming
First, I will write a description of the program. This way, I can simply read the description and know
what the program is supposed to do.
#Description: Scrape Inspirational Quotes Using Python
Next, I want to import the libraries that are needed throughout the program.
#Import the dependencies
from bs4 import BeautifulSoup
import pandas as pd
import requests
import urllib.request
import time
Now, create empty lists to store the inspirational quote and the author of the quote.
#Create lists to store the scraped data
authors = []
quotes = []
Time for the “meat” of the program. I will create a function to automatically scrape the quote and
the author of the quote and store that data into the empty lists created previously.
#Create a function to scrape the site
def scrape_website(page_number):
    page_num = str(page_number) #Convert the page number to a string
    #Append the page number to complete the URL
    URL = 'https://www.goodreads.com/quotes/tag/inspirational?page='+page_num
    webpage = requests.get(URL) #Make a request to the website
    soup = BeautifulSoup(webpage.text, "html.parser") #Parse the text from the website
    quoteText = soup.find_all('div', attrs={'class':'quoteText'}) #Get the tag and its class
    for i in quoteText:
        #Get the text of the current quote, but only the sentence before a new line
        quote = i.text.strip().split('\n')[0]
        author = i.find('span', attrs={'class':'authorOrTitle'}).text.strip()
        #print(quote)
        quotes.append(quote)
        #print(author)
        authors.append(author)
Loop through ’n’ number of pages to scrape the quotes from.
#Loop through 'n' pages
n = 10
for num in range(0,n):
    scrape_website(num)
Combine the two lists together.
#Combine the lists
combined_list = []
for i in range(len(quotes)):
    combined_list.append(quotes[i]+'-'+authors[i])
Finally, time to show the inspirational quotes and the author of that quote!
#Show the combined list
combined_list
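The combining step can also be sketched with Python's built-in zip, which pairs up the items of two lists. The sample_quotes and sample_authors lists below are made-up stand-ins for the scraped data, and the " - " separator with spaces is just a readability choice for this sketch.

```python
# Made-up stand-ins for the scraped quotes and authors lists
sample_quotes = ['"Sample quote one."', '"Sample quote two."']
sample_authors = ["Author One", "Author Two"]

# zip pairs each quote with the author at the same position
combined = [f"{quote} - {author}" for quote, author in zip(sample_quotes, sample_authors)]

# Print one quote per line instead of showing the raw list
for line in combined:
    print(line)
```

Note that zip stops at the end of the shorter list, so if the two lists ever end up with different lengths, the extra items are silently left out rather than raising an error.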
That’s it, you are done! Hopefully this was useful to you!
If you are interested in reading more about Python, one of the fastest growing programming languages
that many companies and computer science departments use, then I recommend you check out the
book Learning Python by Mark Lutz.
Conclusion
Thanks for reading this article. I hope it’s helpful to you all! If you enjoyed this article and found it
helpful, please leave a comment to show your appreciation. Keep up the learning, and if you like
machine learning, mathematics, computer science, programming or algorithm analysis, please visit
and subscribe to my YouTube channels (randerson112358 & computer science).