Hacking Excel

This is a more intermediate level tutorial on Excel Visual Basic, but a beginner tutorial for Selenium Basic. If you are an absolute beginner at working with Excel VBA please visit VBA Tutorial Group: From Beginner to Intermediate, which will guide you through setting up Visual Basic in Excel and working with forms, subroutines, and functions. You should know what functions and subroutines are before proceeding. It could also be helpful if you know what DOM is and have some experience with it.

Now that I have the disclaimer out of the way, shall we begin?

What are we creating?

A tool within Excel that will let you search Google.

You can see a video of what we are creating and download an empty copy (VBA Free) of the file I used to create it here: Hacking Excel

Requirements List:

  • A location to type in our search query within Excel
  • A button to tell it to begin to searching
  • The ability to return the first page results from a search
  • A screenshot of the search results
  1. Getting Required References

In the old days, we could just create an object for Internet Explorer and interact with it using DOM. You can still do that, but Internet Explorer isn’t widely used anymore and it has been replaced by the Edge browser. It would be nicer to be able to use a different browser and have Excel VBA interact with it.

Unfortunately, you cannot really do that with core Excel, you probably want to integrate Selenium Basic.

a. Navigate a web browser to https://github.com/florentbr/SeleniumBasic/releasesb. Click to download Selenium Basic, I am using version 2.0.9.0 but most versions close to this should work fine

c. Install Selenium Basic. It should have given you an executable file, follow the screen prompts when you run it.

Accept the license agreement:

You can leave most of the default component settings alone, but make sure the WebDriver for the browser you want to use is selected:

d. Create a new reference to the Selenium Type Library:

That install will create a new reference for you to use in Excel. From the developer panel, go to “Tools” then find “References” and click it.

A Screen will pop-up that looks like the below screen, find “Selenium Type Library” and put a check mark in the box next to it. Hit “OK” and move on to step 2.

*WARNING: Due to the multiple pieces of software from different vendors working together, you may find some things do not work for you in the following code. 9 times out of 10 (assuming you have eliminated all possible typing mistakes) it has to do with the browser drivers. You might need to download a new ChromeDriver or downgrade your web browser. This is sometimes the nature of hacking things together – Google searching can be your friend here.

While creating this tutorial, I upgraded my Selenium Basic. It broke my code. I had to upgrade my ChromeDriver. A new ChromeDriver can be found here: https://sites.google.com/a/chromium.org/chromedriver/downloads. You will have to manually replace it in your Selenium Basic files if you encounter a similar issue. (C:\Users\username\AppData\Local\SeleniumBasic)

  1. Set up your Excel Document

You can either use the blank copy I provided at the link above or you can create your own Excel file. If you create your own, make sure you have at least one sheet, name it “Google Search.” On the “Google Search” sheet, place a single button and a single text box. Feel free to decorate the rest of the page how you see fit. As an example, this is how the one I created looks:

Name the text box “SearchBox” in the properties, and your search button can be named “BeginAutomation.”

  1. Create a getURL Function and a Selenium Object

We need a way to use selenium to interact with our browser. We will do this by adding the following line of code outside of any function or subroutine:

Private seleniumObj As New Selenium.ChromeDriver

You may notice the part “Selenium.ChromeDriver,” you can also choose to use EdgeDriver, WebDriver, FirefoxDriver and there is even one for Internet Explorer.

As far as the URL we will be navigating to, you can go about this anyway you want. I have a bit of a preference for putting things like URLs, File Paths, and Cell References in a function at the top of my document/page/class.

Private Function getURL() As String
   getURL = "https://www.google.com"
End Function

Your total code at this point should look like the image below:

  1. Create the NavigateToWebPage Function

This function will open up the browser and navigate to our URL. Within it, I have placed some basic code to make sure that before it even bothers to open a browser, our search box contains something to search for. If the search box doesn’t contain any text we want to stop the execution of the code and flash a quick message up letting the user know.

Private Function NavigateToWebPage() As Boolean

    If (SearchBox.Value = "") Then

        MsgBox "You need to enter in a search term"

        NavigateToWebPage = False

    Else

        Call seleniumObj.Start("chrome", getURL())

        Call seleniumObj.Get(getURL())

        NavigateToWebPage = True

    End If

End Function

We use return values of “True” and “False” to give permission for the code to continue execution. In this case, if the SearchBox.Value contains no text we pass back a false value. The only way to continue execution after that is to get the “True” value.

The line Call seleniumObj.Start(“chrome”, getURL())opens up the browser.  It should also navigate to our URL – as I noted in the warning in step 1, I updated my Selenium Basic. Among problems that caused, that line no longer navigated to the URL that I gave it. So I added the line below it, Call seleniumObj.Get(getURL()),and my problem was “fixed.”

Within seleniumObj.Start, you can replace “chrome” with “firefox” or “edge” depending upon which web browser you want the automation to use.

Your function should resemble the below image:

  1. Finding Names, IDs, Tags, and Classes of Items

In order to do this, we have to be able to tell the application where on the page we need to enter our search term and where the button is that we need to click. This can sometimes be a bit of a process. We have to examine the code of the web page we want to interact with. We have to go through it and look for things we can use to help us tell the automation where to go.

For our purposes, right now, we just need to find where to enter our search text and we have to click a button to submit that search text. In order to do this we have to go to Google and right-click on the elements we want to examine and select “Inspect”. If I right-click on the google search bar, I will see this:

And this is what the HTML code for the Google search bar input looks like:

I have underlined and color-coded the items you want to look for, these are not exclusively the only attributes of an HTML tag that you can use to interact with a web page, they just tend to be the easiest to work with. Green id is the best, in my opinion anyway, to work with. The reason being, in a properly formatted web page, an id value is supposed to be unique. An id value is also not required and you won’t always get one.

Yellow items, especially class, is rarely unique. There are ways to work with it still, I will show you some ways to handle potentially non-unique items in the code later in the tutorial.

The orange underlined item is the HTML tag itself. You can work with any HTML tag, but you could find yourself going through all the code trying to count all the tags of the same type.

This is the HTML code for the button we need to click:

If you notice, there is no id attribute and it also uses an HTML input tag, but we have something else to work with, a value. We also have a name attribute, and as far as I can tell it is the only occurrence of that name on the web page. As it turns out, using the name attribute only worked about half the time, so the code I will show you below looks for the input tag and matches the value attribute – and it gave me a 100% success rate.

6. Create the BeginSearch Function

We will do a little bit of error checking, and as before we will return Boolean values to help us decide if the code should continue to execute.

Private Function BeginSearch() As Boolean

    Set SearchBarEl = seleniumObj.FindElementById("lst-ib", timeout:=1000)

    If (SearchBarEl Is Nothing) Then

        MsgBox "Unable to enter in the Search Term"

        BeginSearch = False

    Else

        SearchBarEl.SendKeys SearchBox.Text

        seleniumObj.Wait 1000

        Set SearchButton = seleniumObj.FindElementsByTag("input", timeout:=1000)

        If (SearchButton Is Nothing) Then

            MsgBox "Unable to find the submit button"

            BeginSearch = False

        Else

            For Each inp In SearchButton

                If (inp.Value = "Google Search") Then

                    inp.Click

                    BeginSearch = True

                    Exit For

                End If

            Next inp

        End If

    End If

End Function

The second line, Set SearchBarEl = seleniumObj.FindElementById("lst-ib", timeout:=1000), in the code above is setting an object equal to our reference for the google search bar input. If it doesn’t exist or it cannot get the reference for some reason, we want to issue an error alert and pass back a “False” return value. The seleniumObj.FindElementById call is used because we are looking for the id of that google search bar, our id was named “lst-ib.” The timeout option just tries to give it a set amount of time before it will fail. In more complex automation I would create more safety measures to check the browser for lag, ensure that if it encountered too much lag it failed gracefully. The timeout option is a quick way to help in that department.

If everything is okay, we issue the order to populate that input with the text contained in the text box on our Excel sheet (SearchBarEl.SendKeys SearchBox.Text).

Finding our button is set up kind of the same way, only we are looking for the HTML input tags. We use seleniumObj.FindElementsByTag and put in “input” to let it know that we want the input tag. If it successfully finds any input tag, the code will be allowed to proceed executing (Your homework will be to add a check in for the specific button we are looking for).

For Each inp In SearchButton

If (inp.Value = "Google Search") Then

inp.Click

Exit For

End If

Next inp

We will now loop through every single HTML input tag looking for a value of “Google Search” contained in it. When we find it, we issue a click order with inp.Click and then exit the loop.

This creates the equivalent of you going to Google and entering in your search term and hitting the button to search.

7. Create the ReturnSearchData Function

We need to now get all of our search results, shove them back into Excel, and take our screen shot. The following code will take care of that.

Private Sub ReturnSearchData()

    Dim rowCounter As Integer

    rowCounter = 6

   

    Set ResultsList = seleniumObj.FindElementsByClass("g", timeout:=500)

    If (ResultsList Is Nothing) Then

        MsgBox "Unable to retrieve search results"

    Else

        For Each Result In ResultsList

            ThisWorkbook.Sheets("Google Search").Range("B" & CStr(rowCounter)).Value = Result.Text

            rowCounter = rowCounter + 1

            If (rowCounter = 17) Then

                Exit For

            End If

        Next Result




        seleniumObj.TakeScreenshot.SaveAs (ActiveWorkbook.Path + "/Auto_Screenshot_" + SearchBox.Text + ".jpg")

    End If

End Sub

We create an object called ResultsList and set it equal to all elements that have the class attribute of “g.” If you examine the Google search results page, all of the search results have this and we want them all. We check to make sure ResultsList has something in it, we aren’t being picky, just as long as it contains something. If it does contain something, we loop through each element and place it into the Excel file beginning at cell B6 with this line here: ThisWorkbook.Sheets("Google Search").Range("B" & CStr(rowCounter)).Value = Result.Text.

If we wanted to return the web address only of each search result, we could change the line Set ResultsList = seleniumObj.FindElementsByClass("g", timeout:=500) to this Set ResultsList = seleniumObj.FindElementsByTag("cite", timeout:=500). It is a little bit neater with that alteration:

This is the line that takes the actual screen shot: seleniumObj.TakeScreenshot.SaveAs (ActiveWorkbook.Path + "/Auto_Screenshot_" + SearchBox.Text + ".jpg"). We just tell the seleniumObj to take the screen shot and save, sending it to the same location as the Excel file.

8. Program the button to actually do something

All that is left is just using the subroutine for the button to call all of the other functions and subroutine you created.

Private Sub BeginAutomation_Click()

    If (NavigateToWebPage()) Then

         seleniumObj.Wait 500

        If (BeginSearch()) Then

            seleniumObj.Wait 500

            Call ReturnSearchData

        End If

    End If

End Sub

If you find things are not working as expected, it could be an issue related to the wait timers. seleniumObj.Wait 500 can be increased if your web page or search results are not loading fast enough.

Now you can test it out and make sure it actually works. Don’t get frustrated if it doesn’t work the first time, or the second (or the thirtieth). Automation can be a pain sometimes, and automating through the DOM can be a big trial and error process.

9. Homework

The best way to learn something is to play around with it, so I will give you a few ideas of things to play around with

A. Check to make sure we have an actual button in BeginSearch()

The code in BeginSearch() that was created in step 5 only looks to make sure it contains some sort of input. There is no check in place to make sure it actually finds an input tag with a value of “Google Search”

For Each inp In SearchButton

If (inp.Value = "Google Search") Then

inp.Click

Exit For

End If

Next inp

B. Carry out multiple searches

The application can only search one thing at a time. With what is provided above, we could add a couple of text boxes or read from a column of cells and have the automation repeat over and over again searching for multiple terms and taking screenshots of them. All you need is some of the code above, a loop and some search terms.

C. See if you can get it to work on a different search engine

I have actually never tried this one myself. But in theory, the general steps laid out above should work for other search engines. Yahoo, Bing, or Duck Duck Go should work mostly the same way with some alterations in the ids or tags used.

LEAVE A REPLY

Please enter your comment!
Please enter your name here