If you need to extract data from a page or website, there are two ways to go about it: you can do it manually or you can use a web scraping tool. There are a lot of good tools out there, some free and some premium, but if you know your way around Microsoft Excel, you don’t need any of them, and you don’t even need to know any code.
There are essentially two ways to use Excel as a scraper: the web queries method and the VBA method. Here is how to use each.
Scraping web data is by no means the most sophisticated thing you can use Microsoft Excel for, but it does take some knowledge that would be considered “above basic.”
Having superior Excel skills is worth it for its own sake, but if you don’t consider yourself at least at the intermediate level, consider investing in an Excel bootcamp to give you the prerequisite knowledge you need to get started.
The Web Queries Method
This is the more straightforward of the two methods and the one most friendly and accessible to non-programmers. It involves two simple procedures in Excel, but it gives you less customizability and still leaves a large manual aspect to the work.
Step 1: Creating the Web Queries
In order for Excel to interface with the website you are trying to scrape, you will need to first select the cell in which you want a particular piece of data to appear. After that, click Data > From Web and a new web query box will appear asking you to enter the website’s address. Enter the address of the site you’re scraping and then press Go.
The page will then load and yellow icons will appear next to data and tables. Excel is capable of detecting any tables embedded in a page’s HTML, so you can take that data, in table form, directly from a given website.
Step 2: Import the Data
Once you have found the data or table you are looking for, click the import button and the table will be imported into your spreadsheet. The data is now neatly arranged in rows without you having had to copy and paste a bunch of individual data points or having to spend a bunch of time formatting.
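As a preview of the VBA method covered next, the same import can also be scripted rather than clicked through. The sketch below records roughly what Excel does behind the scenes when you create a web query; the URL is a hypothetical placeholder, and the table number assumes the second table on the page happens to be the one you want.

```vba
' A minimal sketch of a scripted web query. The URL and table index
' are placeholders - substitute your own target page and table.
Sub ImportWebTable()
    With ActiveSheet.QueryTables.Add( _
            Connection:="URL;https://example.com/data", _
            Destination:=Range("A1"))
        .WebSelectionType = xlSpecifiedTables
        .WebTables = "2"            ' import only the second table on the page
        .Refresh BackgroundQuery:=False
    End With
End Sub
```

Recording the query this way is handy when you need to re-import the same table regularly instead of repeating the manual steps.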
The VBA Method
The VBA method is more sophisticated and requires a deeper understanding of Excel’s programming language (VBA) as well as of HTML. VBA, or Visual Basic for Applications, is commonly referred to as macros, as macros are what you are primarily designing with it.
You first need to enable the Developer tab by going to File > Options > Customize Ribbon (or right-clicking the ribbon and choosing Customize the Ribbon) and checking the Developer box. The VBA method works by writing procedures in Excel’s programming language that tell web pages (or other programs) what to do and how Excel should import the data sets you want.
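Once the Developer tab is enabled, open the editor with Developer > Visual Basic and add a module via Insert > Module. A procedure is just a Sub block; this deliberately trivial, hypothetical example writes text into a cell so you can confirm that macros run before attempting anything web-related.

```vba
' Minimal example procedure: confirms VBA can write to the worksheet.
Sub HelloExcel()
    ' Put text in cell A1 of the active worksheet
    Range("A1").Value = "Macro ran successfully"
End Sub
```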
Building Your Script
In order for Excel to communicate with a web page, a VBA script needs to be written (or copied and pasted) that sends requests to the page. VBA is often paired with XMLHTTP and uses regular expressions to parse and extract data from websites. On Windows, you can also use WinHTTP or the InternetExplorer object alongside VBA to scrape sites.
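A minimal sketch of the XMLHTTP approach is shown below. The URL is a hypothetical placeholder, and the code assumes the page contains at least one HTML table; it uses late binding (CreateObject) so no library references need to be set in the editor.

```vba
' A minimal XMLHTTP scraping sketch - the URL is hypothetical, and the
' page is assumed to contain at least one HTML table.
Sub ScrapeTableCells()
    Dim http As Object, html As Object
    Dim tds As Object, i As Long

    ' Send a synchronous GET request for the page
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://example.com/data", False
    http.send

    If http.Status = 200 Then
        ' Parse the response text into a DOM we can query
        Set html = CreateObject("htmlfile")
        html.body.innerHTML = http.responseText

        ' Copy every <td> cell's text into column A, one per row
        Set tds = html.getElementsByTagName("td")
        For i = 0 To tds.Length - 1
            Range("A" & i + 1).Value = tds(i).innerText
        Next i
    End If
End Sub
```

From here, swapping getElementsByTagName for a regular-expression pass over http.responseText is the usual next step when the data you want is not wrapped in a table.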
If you don’t know any VBA but are interested in learning, these skills are in high demand regardless of the industry, since Excel is such a widely used and highly useful application. There are also plenty of forums and YouTube instructional videos to help you get started. Not only that, but there is a wide range of ready-made VBA macros that have already been created expressly for web scraping purposes that you can copy-paste directly into VBA.
Learning how to automate the extraction of data from the web to Excel sheets is useful in a wide range of applications, from investing to marketing. Whether you need to collect and organize data from the web for your business, for academic purposes, or for your own personal reasons, you don’t need any advanced coding or programming knowledge to perform what are ultimately very straightforward tasks.
So, if you were considering putting money into a web scraping tool, you might want to ask yourself whether this is something that could be solved with a little Excel knowledge instead.
This is not an endorsement of intensive scraping techniques that many websites expressly prohibit in their terms of service. Such techniques typically involve the use of bots which can place heavy requests on pages and servers.
Using Excel’s web queries functionality to collect publicly available information is not comparable with the aforementioned techniques, and macros created with VBA should be used with the same attention to a site’s terms and conditions and to the load a macro places on the site.