|
| 1 | +# Web Scraping With Excel VBA |
| 2 | + |
| 3 | +In this tutorial, we are going to learn how to implement a web scraper in Excel using VBA. We will briefly go through the installation and prepare the environment. We will also write a scraper using VBA macro to fetch data into excel. So, let’s get started. |
| 4 | + |
| 5 | +## What is Web Scraping? |
| 6 | + |
| 7 | +Web scraping is a technique to extract useful data from the Internet. Modern Internet heavily relies on web scraping. Every time you use a search engine such as Google or Duck Duck Go, etc. you are seeing the results of mass web scraping. For example, Google scrapes billions of pages every day to refine and index the latest relevant information. |
| 8 | + |
| 9 | +## What is VBA? |
| 10 | + |
| 11 | +VBA stands for “Visual Basic Application”. It is a programming language of Microsoft Corporation. VBA extends the capabilities of Microsoft office tools and allows users to develop advanced functions and complex automation. VBA can also be used to write macros to pull data from websites into Excel. |
| 12 | + |
| 13 | +## Pros and Cons of Using VBA for Scraping |
| 14 | + |
| 15 | +Let’s discuss some of the advantages and disadvantages of VBA web scraping: |
| 16 | + |
| 17 | +### Pros |
| 18 | + |
| 19 | +- **ready to use** - VBA is bundled with Microsoft Office, if you already have Microsoft office installed then you won’t have to install anything. You can use VBA right away in all the Microsoft office tools. |
| 20 | +- **reliable** - Both Microsoft Excel & VBA are developed and maintained by Microsoft. Unlike other development environments, these tools can be upgraded together to the latest version without much hassle. |
| 21 | +- **Out-of-the-box support for browser** - VBA web scrapers can take advantage of Microsoft’s built-in browser Internet Explorer or the latest Microsoft Edge which makes javascript-based dynamic website scraping pretty convenient. |
| 22 | +- **Complete automation** - When running the VBA script, the user doesn’t have to click any buttons or, interact with the browser. All of these can be done from the VBA script including log-in, scrolling, button clicks, etc. |
| 23 | + |
| 24 | +### Cons |
| 25 | + |
| 26 | +- **Only works in Windows** - VBA scrapers are not cross-platform. It only works in a Windows environment. Note that, Microsoft office has support for Mac but it is way harder to write a working VBA scraper on Mac. The library supports are also limited, for example, you can not use Internet Explorer or Microsoft Edge. |
| 27 | +- **Tightly Coupled with Microsoft Office** - VBA scrapers are tightly coupled with Microsoft Office tools. Third-party useful scraping/helper tools are hard to integrate with it. |
| 28 | +- **Steep learning Curve** - VBA programming language is less beginner friendly and a bit harder than the other modern programming languages such as Python or Javascript. |
| 29 | + |
| 30 | +Overall, if you are looking to develop a web scraper for the windows operating system that automatically pulls data from a website into Excel and, won’t need support for other operating systems, then VBA-based web scraping will be a good choice. |
| 31 | + |
| 32 | +Now, that we learned about VBA Web scraping. Let’s explore VBA scripting and write our first web scraper. |
| 33 | + |
| 34 | +## Guide to Extracting Website Data by Using Excel VBA |
| 35 | + |
| 36 | +Before we begin, let us make sure we’ve installed all the prerequisites and set up our environment properly so that it will be easier to follow along. |
| 37 | + |
| 38 | +### Prerequisites |
| 39 | + |
| 40 | +In this tutorial, I will be using Windows 10 & Microsoft Office 10. However, the steps will be same or similar for other version of windows. |
| 41 | +You will only need a computer with Windows Operating System. In addition, you will also have to install Microsoft Office if you don’t have it already. Detailed installation instructions can be found in the Microsoft’s Official documentation |
| 42 | + |
| 43 | +### Preparing the environment |
| 44 | + |
| 45 | +Now, that you’ve installed Microsoft office, complete the below steps to setup the development environment: |
| 46 | + |
| 47 | +### Step 1 - Open Microsoft Excel |
| 48 | + |
| 49 | +From the start menu or Cortana search, find Microsoft Excel and open the application. You will see a similar interface as below: |
| 50 | + |
| 51 | +Click on File |
| 52 | + |
| 53 | + |
| 54 | + |
| 55 | +### Step 2 - Go to Option to enable developer menu |
| 56 | + |
| 57 | +By default, Excel doesn’t show the developer button in the top ribbon. To enable this we will have to go to “Options” from the File menu. |
| 58 | + |
| 59 | + |
| 60 | + |
| 61 | +### Step 3 - Select Customize Ribbon |
| 62 | + |
| 63 | +Once you click the “Options”, a dialog will pop up, from the side menu select “Customize Ribbon”. Click on the check box next to “developer”. Make sure it is ticked and then click on Ok. |
| 64 | + |
| 65 | + |
| 66 | + |
| 67 | +### Step 4 - Open Visual Basic Application Dialog |
| 68 | + |
| 69 | +Now you will see a new developer button on the top ribbon, clicking on it will expand the developer menu. From the menu, select “Visual Basic” |
| 70 | + |
| 71 | + |
| 72 | + |
| 73 | +### Step 5 - Insert a new Module |
| 74 | + |
| 75 | +Once you click on visual basic, it will open a new window like below: |
| 76 | + |
| 77 | + |
| 78 | + |
| 79 | +Click on “insert” and select “Module” to insert a new module. It will open the module editor |
| 80 | + |
| 81 | + |
| 82 | + |
| 83 | +### Step 6 - Add new references |
| 84 | + |
| 85 | + |
| 86 | +From the top menu select `Tools > References...`, it will open a new window like the one below. Make sure to scroll through the available list of references and find Microsoft HTML Client Library & Microsoft Internet Controls in the check box. Click on the check box next to both of them to enable these references. Once you are done click ok. |
| 87 | + |
| 88 | + |
| 89 | + |
| 90 | +That’s it! Our development environment is all set. Let’s write our first Excel VBA scraper |
| 91 | + |
| 92 | +### Step 7 - Automate Microsoft Edge to Open a website |
| 93 | + |
| 94 | +In this step, we will update our newly created module to open the following website: <https://quotes.toscrape.com>. In the module editor let’s write the below code: |
| 95 | + |
| 96 | +```vb |
| 97 | +Sub scrape_quotes() |
| 98 | + Dim browser As InternetExplorer |
| 99 | + Dim page As HTMLDocument |
| 100 | + Set browser = New InternetExplorer |
| 101 | + browser.Visible = True |
| 102 | + browser.navigate ("https://quotes.toscrape.com") |
| 103 | +End Sub |
| 104 | +``` |
| 105 | + |
| 106 | +We are defining a subroutine named `scrape_quotes()`. This function will be executed when we run this script. Inside the subroutine, we are defining two objects `browser` and `page`. |
| 107 | + |
| 108 | +The `browser` object will allow us to interact with Microsoft Edge. Then we also set the browser as visible so that we can see it in action. The browser.`navigate()` function tells the VBA browser object to open the URL. The output will be similar to this: |
| 109 | + |
| 110 | + |
| 111 | + |
| 112 | +>💡 Note: You might be wondering why we are writing `InternetExplorer` to interact with Microsoft Edge. VBA initially only supported Internet Explorer-based automation, but once Microsoft discontinued Internet Explorer. They deployed some updates so that VBA’s InternetExplorer module can run the Microsoft Edge browser in IEMode without any issues. The above code will also work in older Windows that have Internet Explorer still available instead of Edge. |
| 113 | +
|
| 114 | +### Step 8 - Scrape Data using VBA Script & Save it to Excel |
| 115 | + |
| 116 | +Now, we will scrape the quotes and authors from the website. For simplicity, we will store it in the first Sheet of the excel spreadsheet and, grab the top 5 quotes for now. |
| 117 | + |
| 118 | +We will begin by defining two new objects one for quotes & other for authors |
| 119 | + |
| 120 | +```vb |
| 121 | + Dim quotes As Object |
| 122 | + Dim authors As Object |
| 123 | +``` |
| 124 | + |
| 125 | +After navigating to the website we will also add a little bit of pause so that the website loads properly by using Loop. |
| 126 | + |
| 127 | +```vb |
| 128 | +Do While browser.Busy: Loop |
| 129 | +``` |
| 130 | + |
| 131 | +Next we will grab the quotes and authors from the HTML document: |
| 132 | + |
| 133 | +```vb |
| 134 | + Set page = browser.document |
| 135 | + Set quotes = page.getElementsByClassName("quote") |
| 136 | + Set authors = page.getElementsByClassName("author") |
| 137 | +``` |
| 138 | + |
| 139 | +Then, we will use a for loop to populate the excel rows with the extracted data by calling the Cells function and passing the row and column position: |
| 140 | + |
| 141 | +```vb |
| 142 | + For num = 1 To 5 |
| 143 | + Cells(num, 1).Value = quotes.Item(num).innerText |
| 144 | + Cells(num, 2).Value = authors.Item(num).innerText |
| 145 | + Next num |
| 146 | +``` |
| 147 | + |
| 148 | +Finally, we will close the browser by calling the quit function. This will close the browser Window. |
| 149 | + |
| 150 | +```vb |
| 151 | + browser.Quit |
| 152 | +``` |
| 153 | + |
| 154 | +## Output |
| 155 | + |
| 156 | +Now if we run the script again, it will open Microsoft Edge and browse to the quotes.toscrape.com website, grab the top 5 quotes from the list and save them to the current excel file’s first sheet. |
| 157 | + |
| 158 | + |
| 159 | + |
| 160 | +## Source Code |
| 161 | + |
| 162 | +The full source code is given below: |
| 163 | + |
| 164 | +```vb |
| 165 | +Sub scrape_quotes() |
| 166 | + Dim browser As InternetExplorer |
| 167 | + Dim page As HTMLDocument |
| 168 | + Dim quotes As Object |
| 169 | + Dim authors As Object |
| 170 | + |
| 171 | + Set browser = New InternetExplorer |
| 172 | + browser.Visible = True |
| 173 | + browser.navigate ("https://quotes.toscrape.com") |
| 174 | + Do While browser.Busy: Loop |
| 175 | + |
| 176 | + Set page = browser.document |
| 177 | + Set quotes = page.getElementsByClassName("quote") |
| 178 | + Set authors = page.getElementsByClassName("author") |
| 179 | + |
| 180 | + For num = 1 To 5 |
| 181 | + Cells(num, 1).Value = quotes.Item(num).innerText |
| 182 | + Cells(num, 2).Value = authors.Item(num).innerText |
| 183 | + Next num |
| 184 | + |
| 185 | + browser.Quit |
| 186 | +End Sub |
| 187 | +``` |
| 188 | + |
| 189 | +## Conclusion |
| 190 | + |
| 191 | +Let’s do a quick recap before calling it a day. VBA-based web scraping is a great choice for Windows automation and web extraction. It enables us to automate and extract web data. We learned the pros and cons of Excel VBA. We also learned all the steps of how to prepare the development environment & use VBA script with Excel to automate Microsoft Edge and scrape data from the Internet. |
| 192 | + |
| 193 | +The biggest disadvantage that we’ve seen is the lack of cross-platform support. If you want to develop web scrapers that can be used on multiple operating systems such as Linux or mac, etc. then Web Query can also be an option. But, I would recommend exploring Python. It is arguably the best programming language for extracting web data and it can save data in excel format in a few lines of code! |
0 commit comments