How to Scrape Reviews from Trustpilot

In a world where many businesses compete within the same niche, positive customer reviews can be invaluable: they help a company stand out and provide feedback that can be used to improve the business. Trustpilot is a world-leading online review platform with data on over 890,000 businesses, making it a central hub for reputation management and market analysis. With companies receiving more reviews every day, it can be hard to keep up. That’s where web scraping comes in: it automates the collection and storage of review data.

Web scraping is the automated extraction of data from the web using a programming language of your choice. Because large-scale scraping can slow websites down, many sites implement CAPTCHAs, rate limiting, or IP blocking to detect and curb suspicious activity such as frequent requests or bulk data access. That’s where proxies come into play: a proxy shields your IP address and prevents it from being banned or blocked during scraping. In this tutorial, you’ll learn how to use the Selenium web scraping tool with Python to overcome Trustpilot’s anti-scraping defenses, extract review data, and export it to a CSV file for further analysis.

Understanding Trustpilot’s website layout

To find a review for the company of your choice, follow these steps:

1. Visit Trustpilot’s website and enter a company’s name in the search bar.

2. A section titled “Reviews” will appear, showing ratings from 1 to 5 stars.

3. Click “Filter” to filter reviews by star rating, date, popular mentions, location, etc.

4. To filter reviews by verified status or by whether they have replies, select the corresponding review options.

After making your selection, you’ll see each review arranged in white boxes, listed by date. Important review data includes:

  • Review text
  • Reviewer name
  • Reviewer location
  • Date of experience
  • Review post date

To scrape this data, you need to pay attention to HTML tags such as <div>, <span>, and <p>, which carry the class names required to locate these elements. Follow these instructions to find the relevant tags using browser developer tools such as Chrome DevTools (a simplified sketch of the markup follows this list):

  • Right-click on the review element you want to extract (e.g. reviewer name)
  • Click “Inspect”
  • A section showing the necessary HTML tags associated with the data will appear.
  • Tip: You can inspect other review data by pressing Ctrl + Shift + C before selecting another review element.
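
For reference, a review card’s markup looks roughly like the sketch below. The class names are the ones used in the scraping code later in this guide, but the surrounding structure is simplified and illustrative; always verify the current class names in DevTools, since Trustpilot’s auto-generated names change over time.

<div class="styles_reviewCardInner__EwDq2">  <!-- one review card -->
  <div class="styles_reviewHeader__iU9Px">...</div>  <!-- reviewer info and post date -->
  <span class="typography_heading-s__f7029 typography_appearance-default__AAY17">Review title</span>
  <div class="styles_reviewContent__0Q2Tg">Review text and date of experience</div>
</div>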

It’s important to consider that Trustpilot uses dynamic loading, meaning reviews only fully load as the user scrolls down the page. Also, pagination on Trustpilot requires clicking the “Next” button or the next page number to advance. A minimal sketch of how scrolling triggers this lazy loading is shown below.
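
This sketch (assuming driver is an initialized Selenium WebDriver, as set up later in this guide) scrolls to the bottom of the page in steps until no new content loads:

import time

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll to the bottom and give newly loaded reviews time to render
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # No new content appeared; we've reached the end
    last_height = new_height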

A step-by-step guide to scraping Trustpilot reviews using The Social Proxy

Step 1: Set up the environment

To start scraping Trustpilot’s data, you’ll need to set up the following tools:

  • Programming language: Python
    If you haven’t used Python before, download the official Python installer, run it, and click install.
  • Scraping tool: Selenium
    Selenium handles Trustpilot’s dynamic page loading with ease. Unlike many other scraping tools, it drives a real browser through a WebDriver. Download your preferred WebDriver (this tutorial uses ChromeDriver) and make sure its executable is in the same directory as your script. The Python packages used in this tutorial can be installed with pip, as shown below.
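
Selenium Wire, used below for authenticated proxy support, is a separate package from Selenium itself; installing both might look like this:

pip install selenium selenium-wire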

Step 2: Write the script and extract review data

Set up Selenium in a virtual environment with the necessary modules:

# Import the necessary modules for web scraping
from seleniumwire import webdriver  # selenium-wire's webdriver adds proxy support
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import (
    NoSuchElementException,
    TimeoutException,
    ElementClickInterceptedException,
    StaleElementReferenceException,
)  # To handle common Selenium errors that come up
from selenium.webdriver.common.action_chains import ActionChains
import time
import csv

The time and csv modules will be used later in the code. Note that webdriver is imported from seleniumwire rather than selenium so that the proxy configuration in the next step takes effect.

Now you’ll need your mobile proxy credentials. Refer to The Social Proxy’s documentation for more information about how to set up a mobile proxy.

Mobile proxies are effective for scraping sites with strict security. Once you have access to a mobile proxy, store your credentials in variables as shown:

proxy_host = "miami1.thesocialproxy.com"  # Example: "us.socialproxy.com"
proxy_port = "10000"  # Example: "12345"
proxy_username = "YOUR_USERNAME"
proxy_password = "YOUR_PASSWORD"

# Build the proxy URL with embedded credentials
proxy_url = f"http://{proxy_username}:{proxy_password}@{proxy_host}:{proxy_port}"

Configure Selenium to route its traffic through your proxy using Selenium Wire:

# Configure the proxy for Selenium Wire
seleniumwire_options = {
    "proxy": {
        "http": proxy_url,
        "https": proxy_url,
    }
}

Configure Chrome Options for Selenium WebDriver:

# Configure Chrome options
chrome_options = Options()
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--ignore-ssl-errors")

Initialize the Chrome WebDriver, passing both the Selenium Wire proxy options and the Chrome options, then load the target review page:

# Initialize the Chrome WebDriver with the proxy configuration
driver = webdriver.Chrome(
    seleniumwire_options=seleniumwire_options,
    options=chrome_options,
)
driver.get("https://www.trustpilot.com/review/thesocialproxy.com")

Next, scroll to elements and close cookie banners. Since Trustpilot uses dynamic loading, write a function that scrolls to a specific element:

# Function to scroll to a given element
def scroll_to_element(element):
    actions = ActionChains(driver)  # ActionChains performs complex user interactions
    actions.move_to_element(element).perform()

Handle cookie banners with Selenium:

# Locate the cookie consent button by its ID and handle potential errors
def close_cookie_banner():
    try:
        cookie_button = WebDriverWait(driver, 5).until(
            EC.element_to_be_clickable((By.ID, "onetrust-accept-btn-handler"))
        )
        cookie_button.click()
        print("Cookie banner closed")
    except (NoSuchElementException, TimeoutException):
        print("No cookie banner found or unable to close it")

To scrape reviews across multiple pages, use the following function:

def click_next_page():
    try:
        # Locate the "Next" button by its name attribute and scroll to it
        next_button = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.NAME, "pagination-button-next"))
        )
        scroll_to_element(next_button)

        try:
            next_button.click()
        except ElementClickInterceptedException:
            # If the normal click fails, try clicking via JavaScript
            driver.execute_script("arguments[0].click();", next_button)
        return True
    except (NoSuchElementException, TimeoutException):
        print("Next page button not found.")
        return False

To extract review data:

def get_reviews():
    # Wait for the reviews container to load
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located(
            (By.CLASS_NAME, "styles_reviewsContainer__3_GQw")
        )
    )
    # Find every review card on the page
    elements = driver.find_elements(By.CLASS_NAME, "styles_reviewCardInner__EwDq2")

    print(f"Number of reviews found: {len(elements)}")

    for el in elements:
        try:
            head = el.find_element(
                By.CSS_SELECTOR,
                ".typography_heading-s__f7029.typography_appearance-default__AAY17",
            )
            content = el.find_element(By.CLASS_NAME, "styles_reviewContent__0Q2Tg")
            reviewer = el.find_element(By.CLASS_NAME, "link_internal__7XN06")
            date_posted = el.find_element(By.CLASS_NAME, "styles_reviewHeader__iU9Px")

            print(f"Title: {head.text}")
            content_text = content.text
            reviewer_text = reviewer.text
            # Separate the reviewer's location and the date of experience
            reviewer_text_array = reviewer_text.split("\n")
            content_text_array = content_text.split("\n")
            date_of_experience = content_text_array[-1]
            location = reviewer_text_array[-1]
            print(f"Content: {content_text}")
            print(f"Reviewer: {reviewer_text}")
            print(f"Date Posted: {date_posted.text}")
            print("------------------------------------------------------------------")
        except (NoSuchElementException, StaleElementReferenceException) as e:
            print(f"Error extracting review details: {str(e)}")

Write a “for” loop to scrape through multiple pages of reviews:

for i in range(3):  # 3 is the number of pages to scrape; edit as needed
    print(f"PAGE NUMBER {i + 1}")

    get_reviews()

    if not click_next_page():
        print("No more pages to scrape.")
        break

    # Wait for the page to load after clicking next
    time.sleep(5)  # This is why the time module was imported

Close the browser when scraping is complete:

driver.quit()

Congratulations! You have successfully scraped your desired data from Trustpilot!

Step 3: Store and analyze data

Extracting the data allows you to analyze it. To prepare the data for analysis, you’ll need to store it in a CSV file.

Add the following code above the get_reviews() function, and indent the get_reviews() definition (and the page loop that calls it) one level so they sit inside the with block; this keeps the CSV file open while rows are written, as shown below:

# Create the CSV file and write the header row
with open("review_1.csv", mode="w", newline="", encoding="utf-8") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(["Title", "Content", "Reviewer", "Date Posted", "Date of Experience", "Location"])

    def get_reviews():

Then, at the end of the try block in get_reviews(), write each review to the CSV file just before the except clause (shown here with the indentation it has once nested inside the with block):

                print("------------------------------------------------------------------")
                # Write the review data to the CSV file
                csv_writer.writerow([head.text, content.text, reviewer.text, date_posted.text, date_of_experience, location])
            except (NoSuchElementException, StaleElementReferenceException) as e:
                print(f"Error extracting review details: {str(e)}")

This creates a CSV file containing all of the extracted information, ready for whatever analysis you have in mind; a quick example follows below.
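
As a minimal sketch of loading the scraped data for analysis, assuming the pandas library is installed (pip install pandas), you could preview the file and count reviews by location:

import pandas as pd

# Load the scraped reviews from the CSV file
df = pd.read_csv("review_1.csv")

print(df.head())                      # Preview the first few reviews
print(df["Location"].value_counts())  # Count reviews by reviewer location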

Conclusion

Trustpilot is a trusted platform for reviewing companies worldwide. This article provides a step-by-step guide to help you scrape Trustpilot’s reviews using The Social Proxy’s mobile proxy, store the data in a CSV file, and perform further analysis.

FAQs about scraping Trustpilot reviews

How do I extract reviews from Trustpilot?

You can extract reviews from Trustpilot using your preferred scraping tool, such as Puppeteer, BeautifulSoup, or Selenium. Having development experience, or hiring a developer, will make the process simpler, especially when you need proxies.

Why should I scrape Trustpilot reviews?

Trustpilot reviews are valuable for gathering customer insights, evaluating a company’s communication, and identifying necessary product improvements. Someone may have already shared feedback about a service you’re considering—why not check it out?

What data can you scrape from Trustpilot?

There is a plethora of data you can scrape from Trustpilot, but for reviews, the most relevant include review text, ratings, and reviewer details. Other scrapable data is discussed in the section on understanding Trustpilot’s layout.

Is it legal to scrape Trustpilot?

Scraping publicly available data is generally legal, but regulations such as the General Data Protection Regulation (GDPR) restrict how personal data, including reviewer names, can be collected and processed. The businesses you scrape from may also impose restrictions on how you can use their data, especially for commercial purposes. Scraping public Trustpilot pages is generally permissible, but it’s a good idea to check their terms and conditions for any limitations.

Does Trustpilot allow web scraping?

Yes, Trustpilot allows web scraping, but high-frequency scraping can result in your IP getting blocked. We highly recommend using The Social Proxy’s mobile proxy to prevent your IP address from getting blocked.
