Skip to content

Commit ad4eea4

Browse files
authored
Add files via upload
1 parent c9d987b commit ad4eea4

File tree

12 files changed

+222
-0
lines changed

12 files changed

+222
-0
lines changed
Lines changed: 193 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,193 @@
1+
# Web Scraping With Excel VBA
2+
3+
In this tutorial, we are going to learn how to implement a web scraper in Excel using VBA. We will briefly go through the installation and prepare the environment. We will also write a scraper using VBA macro to fetch data into excel. So, let’s get started.
4+
5+
## What is Web Scraping?
6+
7+
Web scraping is a technique to extract useful data from the Internet. Modern Internet heavily relies on web scraping. Every time you use a search engine such as Google or Duck Duck Go, etc. you are seeing the results of mass web scraping. For example, Google scrapes billions of pages every day to refine and index the latest relevant information.
8+
9+
## What is VBA?
10+
11+
VBA stands for “Visual Basic Application”. It is a programming language of Microsoft Corporation. VBA extends the capabilities of Microsoft office tools and allows users to develop advanced functions and complex automation. VBA can also be used to write macros to pull data from websites into Excel.
12+
13+
## Pros and Cons of Using VBA for Scraping
14+
15+
Let’s discuss some of the advantages and disadvantages of VBA web scraping:
16+
17+
### Pros
18+
19+
- **ready to use** - VBA is bundled with Microsoft Office, if you already have Microsoft office installed then you won’t have to install anything. You can use VBA right away in all the Microsoft office tools.
20+
- **reliable** - Both Microsoft Excel & VBA are developed and maintained by Microsoft. Unlike other development environments, these tools can be upgraded together to the latest version without much hassle.
21+
- **Out-of-the-box support for browser** - VBA web scrapers can take advantage of Microsoft’s built-in browser Internet Explorer or the latest Microsoft Edge which makes javascript-based dynamic website scraping pretty convenient.
22+
- **Complete automation** - When running the VBA script, the user doesn’t have to click any buttons or, interact with the browser. All of these can be done from the VBA script including log-in, scrolling, button clicks, etc.
23+
24+
### Cons
25+
26+
- **Only works in Windows** - VBA scrapers are not cross-platform. It only works in a Windows environment. Note that, Microsoft office has support for Mac but it is way harder to write a working VBA scraper on Mac. The library supports are also limited, for example, you can not use Internet Explorer or Microsoft Edge.
27+
- **Tightly Coupled with Microsoft Office** - VBA scrapers are tightly coupled with Microsoft Office tools. Third-party useful scraping/helper tools are hard to integrate with it.
28+
- **Steep learning Curve** - VBA programming language is less beginner friendly and a bit harder than the other modern programming languages such as Python or Javascript.
29+
30+
Overall, if you are looking to develop a web scraper for the windows operating system that automatically pulls data from a website into Excel and, won’t need support for other operating systems, then VBA-based web scraping will be a good choice.
31+
32+
Now, that we learned about VBA Web scraping. Let’s explore VBA scripting and write our first web scraper.
33+
34+
## Guide to Extracting Website Data by Using Excel VBA
35+
36+
Before we begin, let us make sure we’ve installed all the prerequisites and set up our environment properly so that it will be easier to follow along.
37+
38+
### Prerequisites
39+
40+
In this tutorial, I will be using Windows 10 & Microsoft Office 10. However, the steps will be same or similar for other version of windows.
41+
You will only need a computer with Windows Operating System. In addition, you will also have to install Microsoft Office if you don’t have it already. Detailed installation instructions can be found in the Microsoft’s Official documentation
42+
43+
### Preparing the environment
44+
45+
Now, that you’ve installed Microsoft office, complete the below steps to setup the development environment:
46+
47+
### Step 1 - Open Microsoft Excel
48+
49+
From the start menu or Cortana search, find Microsoft Excel and open the application. You will see a similar interface as below:
50+
51+
Click on File
52+
53+
![step 1](images/image1.png)
54+
55+
### Step 2 - Go to Option to enable developer menu
56+
57+
By default, Excel doesn’t show the developer button in the top ribbon. To enable this we will have to go to “Options” from the File menu.
58+
59+
![step 2](images/image6.png)
60+
61+
### Step 3 - Select Customize Ribbon
62+
63+
Once you click the “Options”, a dialog will pop up, from the side menu select “Customize Ribbon”. Click on the check box next to “developer”. Make sure it is ticked and then click on Ok.
64+
65+
![step 3](images/image9.png)
66+
67+
### Step 4 - Open Visual Basic Application Dialog
68+
69+
Now you will see a new developer button on the top ribbon, clicking on it will expand the developer menu. From the menu, select “Visual Basic”
70+
71+
![step 4](images/image4.png)
72+
73+
### Step 5 - Insert a new Module
74+
75+
Once you click on visual basic, it will open a new window like below:
76+
77+
![step 5a](images/image5.png)
78+
79+
Click on “insert” and select “Module” to insert a new module. It will open the module editor
80+
81+
![step 5b](images/image3.png)
82+
83+
### Step 6 - Add new references
84+
85+
86+
From the top menu select `Tools > References...`, it will open a new window like the one below. Make sure to scroll through the available list of references and find Microsoft HTML Client Library & Microsoft Internet Controls in the check box. Click on the check box next to both of them to enable these references. Once you are done click ok.
87+
88+
![step 6](images/image8.png)
89+
90+
That’s it! Our development environment is all set. Let’s write our first Excel VBA scraper
91+
92+
### Step 7 - Automate Microsoft Edge to Open a website
93+
94+
In this step, we will update our newly created module to open the following website: <https://quotes.toscrape.com>. In the module editor let’s write the below code:
95+
96+
```vb
97+
Sub scrape_quotes()
98+
Dim browser As InternetExplorer
99+
Dim page As HTMLDocument
100+
Set browser = New InternetExplorer
101+
browser.Visible = True
102+
browser.navigate ("https://quotes.toscrape.com")
103+
End Sub
104+
```
105+
106+
We are defining a subroutine named `scrape_quotes()`. This function will be executed when we run this script. Inside the subroutine, we are defining two objects `browser` and `page`.
107+
108+
The `browser` object will allow us to interact with Microsoft Edge. Then we also set the browser as visible so that we can see it in action. The browser.`navigate()` function tells the VBA browser object to open the URL. The output will be similar to this:
109+
110+
![step 7](images/image7.png)
111+
112+
>💡 Note: You might be wondering why we are writing `InternetExplorer` to interact with Microsoft Edge. VBA initially only supported Internet Explorer-based automation, but once Microsoft discontinued Internet Explorer. They deployed some updates so that VBA’s InternetExplorer module can run the Microsoft Edge browser in IEMode without any issues. The above code will also work in older Windows that have Internet Explorer still available instead of Edge.
113+
114+
### Step 8 - Scrape Data using VBA Script & Save it to Excel
115+
116+
Now, we will scrape the quotes and authors from the website. For simplicity, we will store it in the first Sheet of the excel spreadsheet and, grab the top 5 quotes for now.
117+
118+
We will begin by defining two new objects one for quotes & other for authors
119+
120+
```vb
121+
Dim quotes As Object
122+
Dim authors As Object
123+
```
124+
125+
After navigating to the website we will also add a little bit of pause so that the website loads properly by using Loop.
126+
127+
```vb
128+
Do While browser.Busy: Loop
129+
```
130+
131+
Next we will grab the quotes and authors from the HTML document:
132+
133+
```vb
134+
Set page = browser.document
135+
Set quotes = page.getElementsByClassName("quote")
136+
Set authors = page.getElementsByClassName("author")
137+
```
138+
139+
Then, we will use a for loop to populate the excel rows with the extracted data by calling the Cells function and passing the row and column position:
140+
141+
```vb
142+
For num = 1 To 5
143+
Cells(num, 1).Value = quotes.Item(num).innerText
144+
Cells(num, 2).Value = authors.Item(num).innerText
145+
Next num
146+
```
147+
148+
Finally, we will close the browser by calling the quit function. This will close the browser Window.
149+
150+
```vb
151+
browser.Quit
152+
```
153+
154+
## Output
155+
156+
Now if we run the script again, it will open Microsoft Edge and browse to the quotes.toscrape.com website, grab the top 5 quotes from the list and save them to the current excel file’s first sheet.
157+
158+
![output](images/image2.png)
159+
160+
## Source Code
161+
162+
The full source code is given below:
163+
164+
```vb
165+
Sub scrape_quotes()
166+
Dim browser As InternetExplorer
167+
Dim page As HTMLDocument
168+
Dim quotes As Object
169+
Dim authors As Object
170+
171+
Set browser = New InternetExplorer
172+
browser.Visible = True
173+
browser.navigate ("https://quotes.toscrape.com")
174+
Do While browser.Busy: Loop
175+
176+
Set page = browser.document
177+
Set quotes = page.getElementsByClassName("quote")
178+
Set authors = page.getElementsByClassName("author")
179+
180+
For num = 1 To 5
181+
Cells(num, 1).Value = quotes.Item(num).innerText
182+
Cells(num, 2).Value = authors.Item(num).innerText
183+
Next num
184+
185+
browser.Quit
186+
End Sub
187+
```
188+
189+
## Conclusion
190+
191+
Let’s do a quick recap before calling it a day. VBA-based web scraping is a great choice for Windows automation and web extraction. It enables us to automate and extract web data. We learned the pros and cons of Excel VBA. We also learned all the steps of how to prepare the development environment & use VBA script with Excel to automate Microsoft Edge and scrape data from the Internet.
192+
193+
The biggest disadvantage that we’ve seen is the lack of cross-platform support. If you want to develop web scrapers that can be used on multiple operating systems such as Linux or mac, etc. then Web Query can also be an option. But, I would recommend exploring Python. It is arguably the best programming language for extracting web data and it can save data in excel format in a few lines of code!
75.7 KB
Loading
69.1 KB
Loading
42.8 KB
Loading
74.7 KB
Loading
39.4 KB
Loading
109 KB
Loading
65.7 KB
Loading
52.4 KB
Loading
91.2 KB
Loading

0 commit comments

Comments
 (0)