How to Scrape Real Estate Listings from Airbnb Using The Social Proxy’s Mobile Proxy

Airbnb started out as a simple way for conference-goers to find accommodations, but it has since redefined short-term rentals. With more than 5 million hosts worldwide, the platform accounted for more than 448 million stays in 2023. That amounts to a wealth of data about listings: how much people are willing to pay per night, and even the quality of the Wi-Fi.

Scraping Airbnb is complicated by anti-bot defenses such as IP bans, CAPTCHAs, and dynamically loaded content. By rotating proxies and masking your IP address, The Social Proxy ensures smooth data collection without interruption or detection. This tutorial will walk you through leveraging mobile proxies to efficiently scrape Airbnb listings and extract critical information such as property prices, locations, and key features. With these tools, you’ll be able to gather real estate data without worrying about getting blocked, making the process seamless and reliable.

Why scrape real estate listings from Airbnb?

Airbnb plays a major role in the real estate sector, with over 7.7 million global listings and more than half a billion guests served. Airbnb hosts have collectively earned more than $65 billion. In 2023, Airbnb doubled its pre-pandemic revenue, a significant share of the global rental market. The trend points to rising demand for short-term rentals, with bookings up by roughly 55 million.

Extracting data from Airbnb provides critical insights for real estate professionals, market analysts, and investors. It also helps them compare rental prices across different regions to identify trends and opportunities, analyze market shifts in property availability and consumer demand, and track key amenities and location preferences to understand what attracts guests.

Airbnb scraping challenges

Airbnb currently employs several measures to prevent automated scraping. These are some of the common scraping obstacles:

  • IP bans: Airbnb monitors IP addresses and can block those exhibiting unusual patterns of activity.
  • CAPTCHAs: These are frequently triggered, making it difficult to scrape.
  • Dynamic content: Airbnb’s data often loads dynamically through JavaScript, making extraction more complicated.

These hurdles typically result in traditional scraping methods getting blocked. Proxy solutions like The Social Proxy’s mobile proxies are essential in order to bypass these issues and offer a reliable way to scrape without triggering alarms.

Introduction to The Social Proxy

The Social Proxy offers two primary products: mobile proxies and the Scraper API. While the Scraper API enables access to rich social network data, this guide focuses on using mobile proxies to scrape Airbnb listings efficiently. Mobile proxies route your traffic through real mobile devices, offering a higher degree of anonymity than conventional proxies: because they use genuine mobile IP addresses, they are less susceptible to being flagged or banned by Airbnb’s systems. The Social Proxy’s mobile proxies rotate automatically, ensuring that you don’t get blocked even while scraping large volumes of data.

Why use a mobile proxy?

Proxies serve as go-betweens for your scraping script and Airbnb, concealing your IP address to prevent detection. But not all proxies are equally effective.

Mobile proxies imitate real users by routing traffic through actual mobile devices rather than data centers or residential connections, making them more challenging to detect. They provide high anonymity and rotating IPs, which means your scraper can make thousands of requests without raising red flags. Mobile proxies from The Social Proxy also support automatic IP rotation, providing a powerful solution for scraping complex websites like Airbnb.

Step-by-step guide to scraping Airbnb listings using The Social Proxy's mobile proxy

Step 1: Set up your environment

In order to start scraping Airbnb data, you’ll need to set up the following tools in your development environment:

  • Programming language: Python (along with BeautifulSoup, Requests). We’ll set up a virtual environment with the necessary libraries to scrape listing data from airbnb.com.
  • The Social Proxy mobile proxy: Sign up and log in to The Social Proxy dashboard to access your mobile proxies.
  • Access your dashboard by clicking on “Buy Proxy” to select a plan.

Choose a plan: On the buy proxies page, select a mobile proxy plan, choose your subscription type, and click “Checkout.”

Once your proxies have been created, you will be redirected to the proxies page.

Once you’ve finished setting up the proxy, you can use a proxy switcher service. For this example, we’ll use the BP Proxy Switcher Chrome extension. Copy the following details from the dashboard:

  • Hostname
  • Username
  • Password
  • Port

Next, open the proxy switching service (BP Proxy Switcher in this case) and enter your new proxy in the relevant field, using the format host:port:user:pass. (You can also copy the details from the dashboard.)
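If you prefer to use the proxy from a script rather than a browser extension, the same four details can be assembled into a proxy URL for Python’s requests library. (The hostname, port, and credentials below are placeholders, not real values.)

```python
# Placeholder credentials -- substitute the values from your dashboard
host = "proxy.example.com"
port = "50000"
user = "myuser"
password = "mypass"

# requests expects proxies as a dict mapping scheme -> proxy URL
proxy_url = f"http://{user}:{password}@{host}:{port}"
proxies = {"http": proxy_url, "https": proxy_url}

print(proxy_url)
```

You would then pass `proxies=proxies` to each `requests.get` call.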

Once added, select the proxy of your choice.

You can verify that the proxy is working and check its speed.
Once the proxy is set up, we can continue to set up the development environment.

Use the following code to set up the Python virtual environment:


Install necessary libraries:
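A minimal setup looks like this (assuming Python 3 on macOS or Linux; on Windows, activate with venv\Scripts\activate):

```shell
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the libraries used in this tutorial
pip install requests beautifulsoup4 selenium pandas
```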

Step 2: Analyze Airbnb’s website structure

Now we’ll focus on the following attributes from the Airbnb listings page:

  • Title
  • Description
  • Beds/cancellation info
  • Average rating
  • Total review count
  • Price per night
  • URL for images

To start scraping, navigate to the Airbnb search page for the location you’re interested in. In this example, we’ll use listings in Townsend, TN in the United States.

You can specify the check-in and check-out dates by marking the “Check in” and “Check out” parameters with your preferred dates in “YYYY-MM-DD” format. You can add other parameters like picking only guest favorite properties and more. For example, you could specify your query URL like this:

https://www.airbnb.com/s/Townsend--Tennessee--United-States/homes?checkin=2024-11-28&checkout=2024-12-05&guest_favorite=true
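Rather than editing the URL by hand, you can assemble it with Python’s standard library. The parameters below mirror the example URL above; guest_favorite is taken from that URL and may not be an officially documented parameter.

```python
from urllib.parse import urlencode

base = "https://www.airbnb.com/s/Townsend--Tennessee--United-States/homes"
params = {
    "checkin": "2024-11-28",
    "checkout": "2024-12-05",
    "guest_favorite": "true",
}

# urlencode preserves the insertion order of the dict
url = f"{base}?{urlencode(params)}"
print(url)
```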

This is the page where all property listings for Townsend are shown. We’re interested in the listing cards, which contain information like pricing, descriptions, and available services. To examine them, we’ll use the developer tools in your browser.

To open the developer tools in your browser, click on the three dots at the rightmost corner. Select “More Tools,” then “Developer Tools.” This will give you access to the web page’s source code.

Hover over one of the property cards, right-click, and select “Inspect.” This will show you exactly where the building blocks of those elements live in the markup.

Identify the HTML elements containing the data we want to scrape. By right-clicking on the page and selecting “Inspect,” we can see which <div> contains the information we need.

Note: Remember that the class names and structure of the Airbnb website may change over time, so it’s important to regularly update your scraping script to ensure it continues functioning correctly.
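One way to reduce breakage is to anchor on data-testid attributes where they exist, since they tend to be more stable than generated class names. Here’s a minimal sketch with BeautifulSoup on a stand-in HTML fragment (the fragment below is illustrative, not Airbnb’s actual markup):

```python
from bs4 import BeautifulSoup

# Stand-in markup -- real Airbnb HTML differs
html = """
<div data-testid="card-container">
  <span data-testid="listing-card-name">Cozy cabin near the park</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS attribute selectors survive class-name churn better than long class lists
name = soup.select_one('[data-testid="listing-card-name"]').get_text(strip=True)
print(name)  # Cozy cabin near the park
```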

Step 3: Build the scraper

Now we need to look at the structure of the HTML to find the correct div and span classes that hold the listing details. We’ll import BeautifulSoup and Selenium, plus pandas for data manipulation later on. Airbnb (like many modern websites) relies heavily on JavaScript to dynamically load content, especially listings, images, and other interactive elements. Scraping without executing JavaScript often leaves you with only the static shell of the page. Hence we use Selenium to open a browser, render the JavaScript, and then scrape the fully loaded page content.

We’ll start by adding the necessary imports.

				
from selenium import webdriver
from bs4 import BeautifulSoup
import time
import pandas as pd
import re

Then we’ll set up Selenium with ChromeDriver (or another browser driver).

				
driver = webdriver.Chrome()  # Make sure you've installed ChromeDriver
url = 'https://www.airbnb.com/s/Townsend--Tennessee--United-States/homes?checkin=2024-11-28&checkout=2024-12-05&guest_favorite=true'

# Open the webpage
driver.get(url)

# Give the page some time to fully load
time.sleep(5)

If you’re not using a proxy switcher, you can also provide the proxy URL while initializing the driver.

				
from selenium.webdriver.chrome.options import Options

# proxy_url should be your full proxy address, e.g. "http://user:pass@host:port"
chrome_options = Options()
chrome_options.add_argument(f'--proxy-server={proxy_url}')
driver = webdriver.Chrome(options=chrome_options)

Next we’ll parse the HTML content of the response using BeautifulSoup. This library will help us navigate and search the HTML structure easily.

				
html = driver.page_source

# Parse the loaded HTML with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

On this Airbnb search page, the individual listings are what we’re interested in. To access them, we have to identify their tag types and class names. The easiest way to do that is to inspect the page with Chrome’s developer tools (press F12).

Let’s start with the listing name which is encapsulated in a div object with a class t1jojoys atm_g3_1kw7nm4 atm_ks_15vqwwr atm_sq_1l2sidv atm_9s_cj1kg8 atm_6w_1e54zos atm_fy_1vgr820 atm_7l_jt7fhx atm_cs_10d11i2 atm_w4_1eetg7c atm_ks_zryt35__1rgatj2 dir dir-ltr.

A single search page contains multiple listings, and we can grab all of them at once using BeautifulSoup’s find_all method. To get the listing names, we have to look into the HTML: right-click a listing name and choose “Inspect” in the browser. This takes you to the part of the HTML that describes the listing. Expand the grid div class.

This grid div holds all the information shown for the listing (cabin name, beds, dates, price). From there, you can see the div tag and the class that contains the cabin names. We’ll use the code below to fetch all the listing titles.

				
# Fetch all the listing titles
cabin_title = []
ab_cabin_title = soup.find_all('div', attrs={'class': 't1jojoys atm_g3_1kw7nm4 atm_ks_15vqwwr atm_sq_1l2sidv atm_9s_cj1kg8 atm_6w_1e54zos atm_fy_1vgr820 atm_7l_jt7fhx atm_cs_10d11i2 atm_w4_1eetg7c atm_ks_zryt35__1rgatj2 dir dir-ltr'})
for i in ab_cabin_title:
    cabin_title.append(i.text)

We can also collect high-level information from the listing cards, such as the description, beds, price, and average rating. Each of these features lives in a different HTML element with a different class, so we write one extraction per feature.

				
# Fetch all descriptions
descriptions = []
ab_descriptions = soup.find_all('span', attrs={'class': 't6mzqp7 atm_g3_1kw7nm4 atm_ks_15vqwwr atm_sq_1l2sidv atm_9s_cj1kg8 atm_6w_1e54zos atm_fy_kb7nvz atm_7l_1he744i atm_am_qk3dho atm_ks_zryt35__1rgatj2 dir dir-ltr', 'data-testid': 'listing-card-name'})
for i in ab_descriptions:
    descriptions.append(i.text)

# Fetch bed information
beds = []
beds_div = soup.find_all('div', class_="fb4nyux atm_da_cbdd7d s1cjsi4j atm_g3_1kw7nm4 atm_ks_15vqwwr atm_sq_1l2sidv atm_9s_cj1kg8 atm_6w_1e54zos atm_fy_kb7nvz atm_7l_1he744i atm_ks_zryt35__1rgatj2 dir dir-ltr")
for div in beds_div:
    spans = div.find_all('span', class_="a8jt5op atm_3f_idpfg4 atm_7h_hxbz6r atm_7i_ysn8ba atm_e2_t94yts atm_ks_zryt35 atm_l8_idpfg4 atm_mk_stnw88 atm_vv_1q9ccgz atm_vy_t94yts dir dir-ltr")
    for span in spans:
        beds.append(span.text.strip())

# Fetch price info
tariff = []
price_div = soup.find_all('span', class_='_11jcbg2')

# Extract the text from each span
for el in price_div:
    tariff.append(el.get_text(strip=True))

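The prices collected above are display strings (for example “$140” or “$1,140”). For later numeric analysis, it helps to convert them to numbers; the parse_price helper below is an illustrative addition, not part of the original script.

```python
import re

def parse_price(text):
    """Extract a numeric price from a display string like '$1,140'."""
    match = re.search(r"[\d,]+(?:\.\d+)?", text)
    if not match:
        return None
    return float(match.group(0).replace(",", ""))

print(parse_price("$1,140"))  # 1140.0
```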
We also need the ratings and the total number of reviews. There are multiple ways in which you can get this information. In this example, we’ll use the span tag within the div to fetch the text.

				
# Fetch ratings data
# Lists to store extracted data
average_ratings = []
total_reviews = []

# Regular expressions for extracting ratings and review counts
rating_pattern = re.compile(r'(\d+(\.\d+)?) out of 5 average rating')
reviews_pattern = re.compile(r',\s*(\d+)\s*reviews')

ratings_span = soup.find_all('span', class_='a8jt5op atm_3f_idpfg4 atm_7h_hxbz6r atm_7i_ysn8ba atm_e2_t94yts atm_ks_zryt35 atm_l8_idpfg4 atm_vv_1q9ccgz atm_vy_t94yts au0q88m atm_mk_stnw88 atm_tk_idpfg4 dir dir-ltr')
for i in ratings_span:
    rating_text = i.get_text(strip=True)
    rating_match = rating_pattern.search(rating_text)
    reviews_match = reviews_pattern.search(rating_text)
    average_ratings.append(float(rating_match.group(1)) if rating_match else 0)
    total_reviews.append(int(reviews_match.group(1)) if reviews_match else 0)

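To sanity-check the two regular expressions, you can run them against a sample label. The sample string below assumes the “… out of 5 average rating, … reviews” format the patterns target.

```python
import re

rating_pattern = re.compile(r'(\d+(\.\d+)?) out of 5 average rating')
reviews_pattern = re.compile(r',\s*(\d+)\s*reviews')

sample = "4.96 out of 5 average rating, 233 reviews"
rating = float(rating_pattern.search(sample).group(1))
reviews = int(reviews_pattern.search(sample).group(1))
print(rating, reviews)  # 4.96 233
```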
Finally, we can extract the links to the listing images. For that, we find each img tag, which holds the URL of the image displayed on the search page in its src attribute.

				
# Fetch listing images
image_links = []
images = soup.find_all('img', class_='itu7ddv atm_e2_idpfg4 atm_vy_idpfg4 atm_mk_stnw88 atm_e2_1osqo2v__1lzdix4 atm_vy_1osqo2v__1lzdix4 i1cqnm0r atm_jp_pyzg9w atm_jr_nyqth1 i1de1kle atm_vh_yfq0k3 dir dir-ltr')
for img in images:
    if img.has_attr('src'):
        image_links.append(img['src'])
    else:
        # Keep the list aligned even when a listing has no image
        image_links.append('No image available')

To explore more, you can visit the link to the entire codebase.

Step 4: Data extraction and storage

Now that we’ve scraped the required data, we can store it in different structured formats:

  • CSV files: Ideal for small datasets.
  • Databases: For large datasets, consider using MySQL or MongoDB for efficient storage and retrieval.

We’ll use pandas to save the data extracted to a CSV.

				
# Create the DataFrame (all lists must have the same length,
# or pandas will raise an error)
data = pd.DataFrame()
data['Title'] = cabin_title
data['Description'] = descriptions
data['Beds/Cancellation status'] = beds
data['Price per night'] = tariff
data['Average Rating'] = average_ratings
data['Total Reviews'] = total_reviews
data['Listing Image Link'] = image_links

# index=False omits the row-index column from the CSV
data.to_csv("airbnb_scraped_data.csv", index=False)

# Close the browser
driver.quit()

We stripped extra characters and trailing spaces as we collected the data, but before using it we still have to validate it and make sure it’s free of the following issues:

  • Duplicates: Some listings might appear multiple times.
  • Missing data: Ensure key fields like price and location are present.
  • Data types: Check for the type of data, especially the price and ratings.
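With pandas, the three checks take only a few lines. The DataFrame below is a small stand-in for the scraped data; the column names match the script above, but the rows are invented for illustration.

```python
import pandas as pd

# Stand-in data: a duplicate row, a missing price, and string-typed prices
df = pd.DataFrame({
    "Title": ["Cabin A", "Cabin A", "Cabin B"],
    "Price per night": ["$140", "$140", None],
})

# 1. Duplicates: drop listings that appear more than once
df = df.drop_duplicates()

# 2. Missing data: keep only rows that have a price
df = df.dropna(subset=["Price per night"])

# 3. Data types: convert price strings to numbers
df["Price per night"] = (
    df["Price per night"].str.replace(r"[$,]", "", regex=True).astype(float)
)

print(len(df), df["Price per night"].iloc[0])  # 1 140.0
```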

Once the raw data has been cleaned, you can build visualizations on top of it to understand the average listing price in an area, see which descriptions perform well, look at guest favorites, and compare the amenities on offer.


Conclusion

In this article, we’ve covered the basics of scraping Airbnb listings using Python, BeautifulSoup, and The Social Proxy. We’ve gone through the steps to set up a Python environment, make requests to Airbnb, extract desired information, and save results to a CSV file, all without having to handle IP blockages and timeouts.

Please note that the code provided is an example; you’ll need to adapt it to your specific use case, as Airbnb’s dynamically generated HTML elements can change. Happy scraping!
