VBA is a very useful helper in the Microsoft Office world to automate everyday recurring processes. Such a process can be that you need to read information from a certain web page and transfer it to Excel, Word or another file, for example. Of course, this can also be done by copy and paste. But it would be much better if you could read the source code of a web page with one click and then process it further.
Fetching HTML code from web page
The following image shows an Excel spreadsheet as an example, in which an Internet address/URL is stored in the first cell. In addition, a “Read web page” button has been added to the Excel table. This is done via the “Developer tools” ➥ “Insert” menu.
Now, when you click on the button, the web page/URL that is entered in the first cell is to be read out for further use.
By right-clicking on the button and selecting “Assign Macro…”, you then assign the VBA function to be executed to the button. In this case the function is called “BtnReadWebsite_Click”.
In the dialog that opens, you can select the SUB-method that reads the HTML code of the web page.
VBA code for scraping URL
To read the content of a web page, use the following VBA code in the VBA editor:
Sub BtnReadWebsite_Click()
Dim url As String
url = Cells(1, 1).Value
URL_Load url
End Sub
Private Sub URL_Load(ByVal sURL As String)
Dim appInternetExplorer As Object
Dim htmlTxt As String
Set appInternetExplorer = CreateObject("InternetExplorer.Application")
appInternetExplorer.navigate sURL
Do: Loop Until appInternetExplorer.Busy = False
Do: Loop Until appInternetExplorer.Busy = False
htmlTxt = appInternetExplorer.document.DocumentElement.outerHTML
Debug.Print htmlTxt
Set appInternetExplorer = Nothing
Close
'Do something here with html source code: parse, output, save,...
MsgBox "The text was read out!!"
End Sub
In the first Sub-routine the internet address is read from the Excel cell and passed to the URL_Load function.
The URL_Load function then creates an Internet Explorer instance that loads the web page. The “Loop” functions are used to bridge the time needed for loading the web page.
Via appInternetExplorer.document.DocumentElement.outerHTML the assignment of the read HTML code takes place.
Now you can use the string variable “htmlText” to process the text, e.g on position of the comment. In the example here the text is printed via Debug.Print in the direct window of the VBA editor. But you can also add the text to the clipboard or paste it into a concrete Excel cell.
In most cases it will be necessary to parse the source code of the web page, because you need only a specific information from the web page.