Python Web Scraping with Beautiful Soup (#1 Secret to Scrape Google SERPs for Free)


In this Python tutorial, we'll do web scraping with Beautiful Soup and learn how to scrape Google SERPs (search results).

So essentially, here's what we'll do:

1. Search a query using Selenium Web Driver in Python


2. Find the div using Beautiful Soup

Python web scraping with Beautiful Soup (finding the DIV tag)

3. Use Beautiful Soup in Python to Scrape Google SERPs (Extract and save titles and links)


Why Scrape Google SERPs?

This can be a small step in a bigger project. For example, you can expand it into a full data study to find out what ranks better on Google, and which meta descriptions, titles, and tags that ranking content uses.

I’m personally working on it, and will definitely share it with you guys once I complete it.


Be sure to subscribe to our email list and YouTube channel if you don’t want to miss it.

Now let’s do some Python Web Scraping with Beautiful Soup.

Before moving on to the Actual Project, let’s understand in simple terms…

What is SERP scraping?

SERP stands for Search Engine Results Page. SERP scraping is the process of scraping (saving) Google search results using web automation. In our case, we are doing it in Python with Beautiful Soup and Selenium. The details are discussed in this blog post.

How do you crawl in Google SERPs?

It depends on the project. In our case, we are only looking for titles and their associated links, so we only crawl for titles and links in the Google SERPs (search results). But more can be done: you can also crawl meta descriptions, tags, etc.
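To make that concrete, here is a minimal offline sketch of the idea, using a made-up HTML snippet shaped like the result containers we'll target later in this post (Google's actual markup and class names can change at any time):

```python
from bs4 import BeautifulSoup

# A tiny SERP-like snippet. The "yuRUbf" class is the result container we
# use later in this post; the URLs and titles here are invented examples.
html = """
<div class="yuRUbf"><a href="https://example.com/python"><h3>Python Tutorial</h3></a></div>
<div class="yuRUbf"><a href="https://example.com/bs4"><h3>Beautiful Soup Guide</h3></a></div>
"""

soup = BeautifulSoup(html, "html.parser")
titles = [div.a.h3.text for div in soup.find_all("div", class_="yuRUbf")]
links = [div.a.get("href") for div in soup.find_all("div", class_="yuRUbf")]
print(titles)
print(links)
```

The real version later in the post works exactly the same way, except the HTML comes from a live page loaded by Selenium instead of a string.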

Now, let's scrape Google SERPs with Python and Beautiful Soup.

Confused?

Don't worry! We'll answer all your Questions in Detail.

Here’s what we’ll be covering in detail Step-by-Step

1. Downloading & Setting up (Python + VS Code)

2. Creating a New Python File, and Setting up the IDE

3. Setting up Chrome Driver

4. Beautiful Soup Python Introduction

5. Python Web Scraping with Beautiful Soup to Scrape Google SERPs

6. Source Code for Python Web Scraping with Beautiful Soup to Scrape Google SERPs

7. Output for Python Web Scraping with Beautiful Soup to Scrape Google SERPs

If you have prior knowledge of Python, or have already set up Python with a relevant code editor, you can move directly to step 2.

All others, follow me to step one.

Setting up Python and Visual Studio Code

To scrape Google SERPs with Python, we need everything set up properly; most importantly, Python and a code editor.

Here is a tutorial on How To Download And Install Python With Visual Studio Code (2022)

Follow it and Come back to this post.

Now I hope you've followed the above tutorial, and have everything set up. Let's move on!

Now, before I move on and show you how to scrape Google SERPs with Python and Selenium, let us confirm that everything is working fine on your side, by creating a new file and printing Hello World on the screen.

Create a New Python File to Print “Hello World”

Now, to check one final time that everything is working properly:

Go to your Visual Studio Homepage.

Click on ‘New File’.

Then go ahead and save it in your desired location. (can be desktop or any Folder)

And you should have a running python file just like this.

Let’s Print Hello World in the output.

#
print("Hello World")
#
Printing Hello World with Python

Just go to the menu bar at the top left and click on File, then select Python as the programming language. Press Ctrl+S or Cmd+S to save the file.

Now select where you want to save the file (choose the directory), give it a suitable name, and it will be saved.

Now I wrote a simple print command where we’re printing “hello world” on the screen using the print statement in Python.

After writing the print statement, you’ll see a play button in the top-right corner. Click the dropdown right next to it and select ‘Run Python File’.

And as you can see, the output has been successfully displayed in the terminal.

Now that all prerequisites are out of the way, let’s move on to the actual coding part to scrape Google SERPs with Python.

Setting up Chrome Driver

Also, you have to download the ChromeDriver executable for this to work.

To download ChromeDriver, go here:

https://www.chromedriver.chromium.org/downloads

From here, select the driver version that matches the version of the Chrome browser you’re currently using.

For example, if my Chrome browser version is 100, then I’ll download ChromeDriver version 100.

Below is my current version of Chrome.

And I’ll download this version of the Driver

After downloading ChromeDriver, save it in a location that is easy to access. Then copy that path and pass it as an argument, like this:

#
driver = webdriver.Chrome(executable_path=r"E:\TechVideos\Python\chromedriver.exe", options=options)
#

ChromeDriver done. Let’s move on to Beautiful Soup.

Beautiful Soup Python Introduction

To successfully complete this project, you need a basic grasp of Beautiful Soup. So let's discuss it in simple words.

What is Beautiful Soup?

It is an HTML parser: it makes it easier to extract the valuable information or data a user needs from HTML code, which is exactly what web scraping calls for.

Is BeautifulSoup good for web scraping?

Yes! It is the go-to web scraping solution of QA engineers and web automation experts. Whenever data needs to be extracted from the web, Python developers often use Beautiful Soup to get the job done!

What is the use of BeautifulSoup in Python?

It is used for web scraping tasks in Python; more specifically, for extracting the valuable information or data a user needs from HTML code. Beautiful Soup is an HTML parser.
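As a minimal, self-contained illustration of what "parsing" means here (the HTML snippet is made up for the example):

```python
from bs4 import BeautifulSoup

# Parse a tiny piece of HTML and pull out the text we care about.
soup = BeautifulSoup("<p class='intro'>Hello, <b>world</b>!</p>", "html.parser")

paragraph = soup.find("p", class_="intro")
print(paragraph.text)    # full text of the <p> tag, nested tags included
print(paragraph.b.text)  # text of the nested <b> tag only
```

The same `find` / `find_all` pattern scales from this toy snippet up to a full Google results page.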


I've also made a video. I highly recommend watching it to better understand the workings and visual concept of the project.


We'll start by importing Selenium. Open up your code editor and type along as we go.

To use the Selenium WebDriver, we need to import it:

#
from selenium import webdriver
#

If you don’t have Selenium Installed…

Go to the menu bar at the top and hover over Terminal. From the dropdown, select New Terminal. Now type pip install selenium in the terminal and press Enter.

Selenium will be installed in a few moments.

Now we are going to include something known as Chrome options, so we can pass arguments and relevant commands to the WebDriver.

#
from selenium.webdriver.chrome.options import Options
#

To use Beautiful Soup (install it first with pip install beautifulsoup4 if you don’t have it), import it like this:

#
from bs4 import BeautifulSoup
#

Import Selenium Stealth

While I was doing my research, I found this project on GitHub known as selenium-stealth. Among other things:

  •  It will even pass all the public bot tests
  •  It can perform Google account logins if required
  •  It can even help with maintaining a normal Recaptcha V3 score

So, essentially it would look like a normal person is browsing the website.

#
from selenium_stealth import stealth
#

Now let's add some options to our Chrome browser.

We create a standard ChromeOptions object, then add the headless flag and a few other arguments.

#
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
#

Above we've created an options object and added several arguments.

One of these is the headless argument. Headless means the browser runs in the background: the website is loaded within the code, but no browser window visibly opens on the screen.

#
driver = webdriver.Chrome(executable_path=r"E:\TechVideos\Python\chromedriver.exe", options=options)
#

Above, we created a new driver using the Chrome method of the webdriver module, passing the path of the ChromeDriver executable along with our options object.

Now we will use the stealth function.

#
stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )
#

Above, most importantly, we've called the stealth function. Arguments like renderer, platform, languages, and vendor make it seem like a real person is browsing the website.

Then we add the query we want to search on Google and scrape the SERPs for.

#
query = 'python tutorial for beginners'
#

We will create two lists: one for storing the titles, and the other for the links associated with those titles.

Also, we create a variable to specify the number of pages we want to Scrape or crawl.

#
links = []
titles = []

n_pages = 5
#

Below, we specify a for loop that runs from the first page of Google SERPs to the fifth (because we want 5 pages).

For each page, it builds the search URL from the base string “http://www.google.com/search?q=”, the query, and a start offset (Google shows 10 results per page).

#
for page in range(1, n_pages + 1):
    url = "http://www.google.com/search?q=" + \
        query + "&start=" + str((page - 1) * 10)
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    search = soup.find_all('div', class_="yuRUbf")
    for h in search:
        links.append(h.a.get('href'))
        titles.append(h.a.h3.text)
#
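The URL construction inside the loop can be isolated into a small helper, which makes the pagination arithmetic easy to check. serp_url is my own name for it, not part of the original code:

```python
# serp_url is a hypothetical helper; the URL pattern is the same one
# used in the scraping loop above (10 results per page).
def serp_url(query, page):
    """Return the Google search URL for a 1-indexed results page."""
    return "http://www.google.com/search?q=" + query + "&start=" + str((page - 1) * 10)

print(serp_url("python tutorial for beginners", 1))
print(serp_url("python tutorial for beginners", 3))
```

Note that a production version should URL-encode the query (for example with urllib.parse.quote_plus), since a raw space in a URL is only handled leniently.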

Then we use Beautiful Soup and open up the Page Source.

From there we find the DIV that contains our Links and Titles.

Python web scraping with Beautiful Soup (finding the DIV tag)

Finally, we print out our lists of titles and their associated URLs:

#
for link in links:
    print(link)

for title in titles:
    print(title)
#
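If you'd rather save the results than just print them, a minimal CSV sketch could look like this (the sample lists and the file name serp_results.csv are my own placeholders):

```python
import csv

# Placeholder data standing in for the scraped lists.
titles = ["Python Tutorial", "Beautiful Soup Guide"]
links = ["https://example.com/python", "https://example.com/bs4"]

# Write title/link pairs to a CSV file, one row per result.
with open("serp_results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "link"])
    writer.writerows(zip(titles, links))
```

A CSV like this is a convenient starting point for the kind of ranking study mentioned earlier in the post.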

And that's it.

Source Code for Python Web Scraping with Beautiful Soup to Scrape Google SERPs

Here is the Source code for the Project.

#start
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
from selenium_stealth import stealth


options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(
    options=options, executable_path=r"E:\TechVideos\Python\chromedriver.exe")

stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )


query = 'python tutorial for beginners'

links = []
titles = []

n_pages = 5

for page in range(1, n_pages + 1):
    url = "http://www.google.com/search?q=" + \
        query + "&start=" + str((page - 1) * 10)
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    search = soup.find_all('div', class_="yuRUbf")
    for h in search:
        links.append(h.a.get('href'))
        titles.append(h.a.h3.text)

for link in links:
    print(link)

for title in titles:
    print(title)

driver.quit()
#end

Output for Python Web Scraping with Beautiful Soup to Scrape Google SERPs

Links

Titles

Conclusion for Python Web Scraping with Beautiful Soup to Scrape Google SERPs

So there you go! Now you have the power to perform web scraping, all thanks to Python.

I hope you learned a lot, and I urge you to explore Selenium and Beautiful Soup further if you’re really into web scraping.

Also, check out other helpful python guides on the blog.

If you have any queries, you can post them in the comments below. And we’ll be happy to help.

Take care and Happy Coding!

Check The Project On GitHub

