Detecting Fake Real Estate Listings: A Step-by-Step Guide to Building a Detection System

With annual losses of over $1 billion in the United States alone, real estate fraud is a growing concern. As fraud becomes more common on platforms like Airbnb, Craigslist, and Zillow, it threatens to cast a shadow over housing accessibility and the real estate industry as a whole. The rise in fraudulent activity underscores the urgency of developing effective detection methods. These systems are crucial for both the platforms themselves and their users, offering a safeguard against scams.

In previous articles, we explored how to extract data from real estate platforms like Zillow, Craigslist, and Airbnb using The Social Proxy’s mobile proxies. In this article, we’ll go a step further and show how to use that data to build a detection system. Using the Image Geolocation API, we’ll show you how to verify the authenticity of listings by cross-referencing their location data with their images.

What is fake real estate listing detection, and why does it matter?

A fake real estate listing is a fraudulent advertisement designed to deceive buyers or renters by promoting non-existent or misrepresented properties. Scammers often lure victims with unusually low prices, push for urgent decisions, request upfront payments, or use inconsistent or generic photos. These surreptitious tactics aim to extract money or personal information from unsuspecting individuals, making them vulnerable to identity theft and severe financial losses.

Manually checking for fake listings is challenging and time-consuming. The sheer number of properties listed on platforms like Airbnb, Craigslist, and Zillow makes it nearly impossible for individuals, or even entire companies, to verify every detail by hand. Scrutinizing data points such as property descriptions, photos, pricing, and location details to identify fake listings is not only labor-intensive but also highly susceptible to human error. A detection system can automate this process, analyzing large volumes of data to quickly spot potential red flags.

How to build a fake real estate listings detection system

Step 1: Set up your environment

In this tutorial, we’ll use two essential tools from The Social Proxy: the mobile proxy and the Image Geolocator API. The mobile proxy masks our identity while we scrape real estate platforms, which helps us get past anti-scraping mechanisms. Once we have collected the listing images, we’ll use the Geolocate Lookup API to check whether the locations associated with these images match the locations given in the listings.

Follow these steps to gain access to The Social Proxy’s mobile proxy server:

  • Log in to The Social Proxy dashboard:
    • Navigate to The Social Proxy website and log in with your credentials.
    • If you don’t have an account, create one by clicking on the “Sign Up” button and following the registration process.
  • Purchase a mobile proxy:
    • Once logged in, locate the “Buy Proxy” option in the dashboard menu.
    • Choose a mobile proxy plan that suits your needs and complete the purchase process.
  • Access your proxy details:
    • Go to the “Proxies” section in the dashboard. You’ll be able to view the details of your purchased proxy, including the server address, port, username, and password.

Obtain your Consumer Key and Consumer Secret for the Geolocate Lookup API
You’ll need access to the Geolocate Lookup API, which will allow you to verify the locations associated with images from real estate listings. Here’s how to obtain the necessary API credentials:

  • Navigate to the Scraper API section:
    • On The Social Proxy dashboard, find and click the “Scraper API” tab in the main menu.
    • On that page, click “Scraper API.”
  • Generate your API keys:
    • Click the “Generate API KEY” button.
  • Store your credentials securely:
    • Copy the Consumer Key and Consumer Secret to a secure location. You’ll need them to integrate the Geolocate Lookup API into your detection system.
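Rather than hard-coding the proxy and API credentials in your scripts, you can keep them in environment variables and check for them up front. The sketch below is a hypothetical helper, not part of any Social Proxy SDK: the proxy variable names PROXY_HOST, PROXY_PORT, PROXY_USERNAME, and PROXY_PASSWORD are assumptions, while CONSUMER_KEY and CONSUMER_SECRET match the names the geolocation code in this article reads from process.env.

```javascript
// Hypothetical helper: collects proxy and API credentials from an
// environment object (e.g., process.env) and fails fast if any are missing.
const REQUIRED_VARS = [
  'PROXY_HOST', // assumed name
  'PROXY_PORT', // assumed name
  'PROXY_USERNAME', // assumed name
  'PROXY_PASSWORD', // assumed name
  'CONSUMER_KEY',
  'CONSUMER_SECRET',
];

function loadConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    proxy: {
      host: env.PROXY_HOST,
      port: Number(env.PROXY_PORT),
      username: env.PROXY_USERNAME,
      password: env.PROXY_PASSWORD,
    },
    api: {
      consumerKey: env.CONSUMER_KEY,
      consumerSecret: env.CONSUMER_SECRET,
    },
  };
}

// Usage: const config = loadConfig(process.env);
```

Failing at startup with a list of missing names is usually easier to debug than an authentication error halfway through a scraping run.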

Step 2: Collect data from real estate platforms

Earlier articles on this blog cover scraping the major real estate platforms Airbnb, Craigslist, and Zillow in detail. These platforms host vast databases of property listings, making them ideal sources for our detection system.

In this article, we’ll use Zillow data for our detection system. The following Node.js code scrapes Zillow listings through The Social Proxy’s mobile proxy:

```javascript
// Requires node-fetch v2: v3 is ESM-only and can't be loaded with require()
const fetch = require('node-fetch');
const { HttpsProxyAgent } = require('https-proxy-agent');

// Proxy configuration (replace with the details from your dashboard)
const proxyConfig = {
  host: 'your_proxy_host',
  port: 'your_proxy_port',
  username: 'your_proxy_username',
  password: 'your_proxy_password'
};

// Create the proxy agent. Credentials are URL-encoded in case they contain
// characters such as @ or : that would otherwise break the proxy URL.
const proxyUrl = `http://${encodeURIComponent(proxyConfig.username)}:${encodeURIComponent(proxyConfig.password)}@${proxyConfig.host}:${proxyConfig.port}`;
const proxyAgent = new HttpsProxyAgent(proxyUrl);

// Request listings for the Los Angeles map area from Zillow's search endpoint
async function fetchZillowListings() {
  try {
    const response = await fetch(
      'https://www.zillow.com/async-create-search-page-state',
      {
        method: 'PUT',
        headers: {
          "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36",
          accept: '*/*',
          'accept-language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
          'content-type': 'application/json',
          'sec-ch-ua':
            '"Chromium";v="128", "Not;A=Brand";v="24", "Google Chrome";v="128"',
          'sec-ch-ua-mobile': '?0',
          'sec-ch-ua-platform': '"macOS"',
          'sec-fetch-dest': 'empty',
          'sec-fetch-mode': 'cors',
          'sec-fetch-site': 'same-origin',
          Referer: 'https://www.zillow.com/los-angeles-ca/',
          'Referrer-Policy': 'unsafe-url',
        },
        body: JSON.stringify({
          searchQueryState: {
            pagination: {},
            isMapVisible: true,
            mapBounds: {
              west: -118.88551790039062,
              east: -117.93794709960937,
              south: 33.870121297819686,
              north: 34.17175136070247,
            },
            regionSelection: [{ regionId: 12447, regionType: 6 }],
            filterState: {
              sortSelection: { value: 'globalrelevanceex' },
              isAllHomes: { value: true },
            },
            isListVisible: true,
          },
          wants: { cat1: ['listResults', 'mapResults'], cat2: ['total'] },
          requestId: 3,
          isDebugRequest: false,
        }),
        agent: proxyAgent // Route the request through the mobile proxy
      }
    );

    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }

    const data = await response.json();

    // Extract and return the listings shown on the map
    return data.cat1.searchResults.mapResults;
  } catch (error) {
    console.error('Error fetching Zillow listings:', error);
    return null;
  }
}

// Print the key fields of each listing
function displayListings(listings) {
  if (!listings) {
    console.log('No listings found or error occurred.');
    return;
  }

  listings.forEach((listing, index) => {
    console.log(`Listing ${index + 1}:`);
    console.log(`Address: ${listing.address}`);
    console.log(`Price: $${listing.price}`);
    console.log(`Bedrooms: ${listing.beds}`);
    console.log(`Bathrooms: ${listing.baths}`);
    console.log(`Square Feet: ${listing.area}`);
    console.log(`Zillow Link: ${listing.detailUrl}`);
    console.log(`Image URL: ${listing.imgSrc}`);
    console.log('---');
  });
}

// Main function to run the script
async function main() {
  const listings = await fetchZillowListings();
  displayListings(listings);
}

main();
```

Before running the code:

  • Install the node-fetch (version 2) and https-proxy-agent packages, e.g., npm install node-fetch@2 https-proxy-agent. node-fetch v3 is ESM-only, so require('node-fetch') fails with it.
  • Replace ‘your_proxy_host’, ‘your_proxy_port’, ‘your_proxy_username’, and ‘your_proxy_password’ with your actual proxy details.
  • Ensure that the user-agent and sec-ch-ua-platform headers match those of your device. (See the image below to learn how to locate them.)

Output:

The code retrieves data for over 300 listings, which you can use in your detection system.
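The request body above is tied to the Los Angeles map area. To scan a different area, it can help to generate the body from parameters. The sketch below is a hypothetical helper that reuses the field names from the request above; it assumes Zillow accepts the same shape for other regions, and you would still need to look up the regionId, regionType, and map bounds for your target area (for example, from your browser’s network tab). Zillow’s internal endpoint is undocumented and can change without notice.

```javascript
// Hypothetical helper that rebuilds the Zillow search body used above for an
// arbitrary map area. Field names are copied from the Los Angeles request;
// the region values must come from Zillow itself.
function buildSearchRequestBody({ mapBounds, regionId, regionType, requestId = 1 }) {
  const { west, east, south, north } = mapBounds;
  // Simple sanity check (does not handle areas crossing the antimeridian)
  if (west >= east || south >= north) {
    throw new Error('mapBounds must satisfy west < east and south < north');
  }
  return JSON.stringify({
    searchQueryState: {
      pagination: {},
      isMapVisible: true,
      mapBounds: { west, east, south, north },
      regionSelection: [{ regionId, regionType }],
      filterState: {
        sortSelection: { value: 'globalrelevanceex' },
        isAllHomes: { value: true },
      },
      isListVisible: true,
    },
    wants: { cat1: ['listResults', 'mapResults'], cat2: ['total'] },
    requestId,
    isDebugRequest: false,
  });
}

// Usage: pass the result as the `body` option of the fetch call in fetchZillowListings.
```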

Step 3: Analyze the listing data

Several data points can help identify suspicious listings, such as unrealistic prices or inconsistent property details. For our detection system, however, we need only two: the image URL, from which we download the image that is sent to the Image Geolocation API, and the listing’s address on the real estate platform.

  • Location: The property’s address provides vital clues. Fake listings often give inaccurate or vague locations. Comparing the address with the location inferred from the property’s images helps detect discrepancies.
  • Images: Images are one of the most significant data points. Analyzing their quality, consistency, and geolocation can reveal whether they’ve been reused from other listings or locations. Image geolocation tools like The Social Proxy’s Image Geolocation API can help detect whether the images correspond to the claimed property location.
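Since the detection system only needs each listing’s address and image URL, you can drop incomplete entries before spending any API calls on them. A minimal sketch, assuming the address and imgSrc fields printed by displayListings above; filterVerifiableListings is a hypothetical name. It filters but does not trim the listings, so fields like price and beds remain available for the CSV export later.

```javascript
// Keep only listings that have a non-empty address and an http(s) image URL.
function filterVerifiableListings(listings) {
  return (listings || []).filter(
    (listing) =>
      typeof listing.address === 'string' &&
      listing.address.trim() !== '' &&
      typeof listing.imgSrc === 'string' &&
      /^https?:\/\//.test(listing.imgSrc)
  );
}

// Usage: const verifiable = filterVerifiableListings(await fetchZillowListings());
```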

Step 4: Geolocation and image analysis

Next, we’ll use The Social Proxy’s Image Geolocation API to verify whether the location given in a real estate listing, such as those scraped from Zillow, aligns with the location inferred from the listing’s images. By comparing the locations the API predicts for each image with the address on the listing, we can flag listings that are potentially fraudulent.
Below is an example of how to use The Social Proxy’s Geolocation API:

```javascript
// The request package is deprecated but still works (npm install request)
const request = require('request');
const fs = require('fs');

// Read and encode the image file as Base64
const imagePath = 'path/to/real-estate-image.jpg';
const imageBase64 = fs.readFileSync(imagePath, { encoding: 'base64' });

// Replace {CONSUMER_KEY} and {CONSUMER_SECRET} with your own credentials
const options = {
  method: 'POST',
  url: 'https://thesocialproxy.com/wp-json/tsp/geolocate/v1/image?consumer_key={CONSUMER_KEY}&consumer_secret={CONSUMER_SECRET}',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    image: imageBase64,
  }),
};

// Send the image and print the parsed JSON response
request(options, function (error, response) {
  if (error) throw new Error(error);
  const jsonResponse = JSON.parse(response.body);
  console.log(jsonResponse);
});
```

The Geolocation API returns a JSON response with a list of potential locations (geo_predictions), each of which contains:

  • Coordinates (coordinates): Latitude and longitude, as a [latitude, longitude] array.
  • Address (address): The predicted address based on the image.
  • Similarity score (similarity_score_1km): A score indicating how closely the prediction matches the real location.
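Because coordinates arrive as a bare [latitude, longitude] array, it can be convenient to normalize the predictions into named fields and sort them by score before comparing them with a listing. This sketch relies only on the fields described above (data.geo_predictions, coordinates, address, similarity_score_1km); parseGeoPredictions is a hypothetical helper, and it returns an empty array when the response has no predictions.

```javascript
// Normalize geo_predictions from a Geolocation API response into
// { latitude, longitude, address, score } objects, highest score first.
function parseGeoPredictions(apiResponse) {
  const predictions = apiResponse?.data?.geo_predictions;
  if (!Array.isArray(predictions)) return [];
  return predictions
    // Skip entries without a well-formed [latitude, longitude] pair
    .filter((p) => Array.isArray(p.coordinates) && p.coordinates.length === 2)
    .map((p) => ({
      latitude: p.coordinates[0],
      longitude: p.coordinates[1],
      address: p.address ?? '',
      score: typeof p.similarity_score_1km === 'number' ? p.similarity_score_1km : 0,
    }))
    .sort((a, b) => b.score - a.score);
}
```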

Here’s an example response from the API:

```json
{
  "data": {
    "status": 200,
    "message": "Success",
    "geo_predictions": [
      {
        "coordinates": [26.98012924194336, -82.20880126953125],
        "address": "478 Daytona Dr, Port Charlotte, FL 33953, USA",
        "similarity_score_1km": 0.95
      },
      {
        "coordinates": [26.979555130004883, -82.21481323242188],
        "address": "4310 Club Dr, Port Charlotte, FL 33953, USA",
        "similarity_score_1km": 0.433
      }
    ]
  }
}
```

Now that you’ve scraped real estate listings, you’ll need to integrate the Geolocation API into your code for the fake listing detection system:

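Install the packages used in the rest of this step with npm install request dotenv axios csv-writer. Then set up the request to the Geolocation API, using the request library for Node.js (deprecated since 2020, but still functional) to make the API call: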

```javascript
// Load CONSUMER_KEY and CONSUMER_SECRET from the .env file into process.env
require('dotenv').config();
const request = require('request');

// Send a Base64-encoded image to the Geolocation API; resolves with the parsed JSON
function geoLocateImage(base64Image) {
  return new Promise((resolve, reject) => {
    const options = {
      method: 'POST',
      url: 'https://thesocialproxy.com/wp-json/tsp/geolocate/v1/image',
      // Credentials are sent as query-string parameters
      qs: {
        consumer_key: process.env.CONSUMER_KEY,
        consumer_secret: process.env.CONSUMER_SECRET
      },
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        image: base64Image
      })
    };

    request(options, function (error, response) {
      if (error) {
        console.error('Error in geoLocateImage:', error);
        reject(error);
      } else {
        try {
          const parsedBody = JSON.parse(response.body);
          resolve(parsedBody);
        } catch (parseError) {
          console.error('Error parsing API response:', parseError);
          reject(parseError);
        }
      }
    });
  });
}
```

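Note: Create a .env file containing your CONSUMER_KEY and CONSUMER_SECRET. Node.js doesn’t read .env files on its own, so load the file with a package such as dotenv (require('dotenv').config()) before calling geoLocateImage.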
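Because the request package is deprecated, you may prefer the fetch function built into Node.js 18 and later. The sketch below builds the same POST request as geoLocateImage (same endpoint, query parameters, and JSON body); it is an alternative under that assumption, not an officially documented client, and buildGeolocateRequest and geoLocateImageWithFetch are hypothetical names. Unlike geoLocateImage, it also rejects on a non-2xx HTTP status.

```javascript
// Build the URL and fetch options for the Geolocation API call shown above.
function buildGeolocateRequest(base64Image, consumerKey, consumerSecret) {
  const url = new URL('https://thesocialproxy.com/wp-json/tsp/geolocate/v1/image');
  url.searchParams.set('consumer_key', consumerKey);
  url.searchParams.set('consumer_secret', consumerSecret);
  return {
    url: url.toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ image: base64Image }),
    },
  };
}

// Send the request with the global fetch available in Node.js 18+.
async function geoLocateImageWithFetch(base64Image) {
  const { url, options } = buildGeolocateRequest(
    base64Image,
    process.env.CONSUMER_KEY,
    process.env.CONSUMER_SECRET
  );
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Geolocation API returned HTTP ${response.status}`);
  }
  return response.json();
}
```

Splitting request construction from sending makes the URL and body easy to check without calling the API.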

Next, we create a function that processes each listing: it downloads the listing’s image, converts it to Base64, and sends it to the Geolocation API. The API analyzes the image and returns multiple geo_predictions, each representing a potential location where the image was taken, along with a similarity score indicating how close the prediction is to the true location.

A listing is considered likely legitimate if at least one prediction both matches the Zillow listing’s address and has a similarity_score_1km above 0.8. If no prediction meets both conditions, the listing should be treated with suspicion; if the response contains no prediction data at all, the listing can’t be verified.

```javascript
const axios = require('axios');

// Download a listing's image, geolocate it, and attach a verification result
async function processListing(listing) {
  try {
    console.log(`Processing listing: ${listing.address}`);

    // Download the image as raw bytes and encode it as Base64
    const response = await axios.get(listing.imgSrc, { responseType: 'arraybuffer' });
    const base64Image = Buffer.from(response.data).toString('base64');

    // Get candidate locations for the image
    const geoData = await geoLocateImage(base64Image);

    if (geoData.data && geoData.data.geo_predictions) {
      // A prediction counts only if its address matches the listing's address
      // AND its 1 km similarity score is above 0.8
      const matchedPrediction = geoData.data.geo_predictions.find(prediction =>
        compareAddresses(listing.address, prediction.address) && prediction.similarity_score_1km > 0.8
      );

      if (matchedPrediction) {
        return {
          ...listing,
          verificationResult: "This listing is likely not fake. The image location matches the listing address.",
          matchedPrediction
        };
      } else {
        return {
          ...listing,
          verificationResult: "Exercise caution with this listing. The image location doesn't match the listing address. It's recommended to visit the property in person before making any payments."
        };
      }
    } else {
      return {
        ...listing,
        verificationResult: "Unable to verify this listing. The geolocation data is incomplete or missing."
      };
    }
  } catch (error) {
    console.error(`Error processing listing: ${listing.address}`, error);
    return {
      ...listing,
      verificationResult: `Error occurred while processing: ${error.message}`
    };
  }
}
```

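The decision rule inside processListing can also be pulled out into a pure function, which makes it easy to test and to experiment with the 0.8 threshold without calling the API. This is a refactoring sketch: classifyPredictions and its three labels are hypothetical, and the address check is passed in as a function so you can plug in compareAddresses or a stricter comparison. One deliberate difference: an empty predictions array is reported as unverified, whereas processListing would treat it as a mismatch.

```javascript
// Decide a verdict from a list of geo_predictions.
// - 'likely-real': some prediction matches the address AND beats the threshold
// - 'suspicious':  predictions exist but none satisfies both conditions
// - 'unverified':  no predictions were returned
function classifyPredictions(listingAddress, predictions, matchesAddress, threshold = 0.8) {
  if (!Array.isArray(predictions) || predictions.length === 0) {
    return { verdict: 'unverified', matchedPrediction: null };
  }
  const matchedPrediction = predictions.find(
    (p) => matchesAddress(listingAddress, p.address) && p.similarity_score_1km > threshold
  );
  return matchedPrediction
    ? { verdict: 'likely-real', matchedPrediction }
    : { verdict: 'suspicious', matchedPrediction: null };
}

// Usage: classifyPredictions(listing.address, geoData.data.geo_predictions, compareAddresses)
```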
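Next, we need a function that compares the listing’s address with each address predicted by the API. The helper below splits both addresses on commas and reports a match if they share at least one component, such as the city or the state and ZIP code. This is a lenient check: a prediction on a different street in the same city still counts as a match.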

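```javascript
// Returns true if the two addresses share at least one comma-separated
// component (e.g., street line, city, or "state ZIP"), ignoring case
function compareAddresses(zillowAddress, apiAddress) {
  if (!zillowAddress || !apiAddress) return false;

  const zillowParts = zillowAddress.toLowerCase().split(',').map(part => part.trim());
  const apiParts = apiAddress.toLowerCase().split(',').map(part => part.trim());

  return zillowParts.some(part => apiParts.includes(part));
}
```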

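A stricter alternative to the string comparison is to compare coordinates: compute the great-circle (haversine) distance between the listing and each prediction, and require it to fall within some radius. This sketch assumes you can obtain the listing’s latitude and longitude (Zillow search results may include a latLong object, but treat that as an assumption and check your own data). distanceKm and isNearby are hypothetical helpers, and the 1 km default radius simply mirrors the similarity_score_1km field name.

```javascript
const EARTH_RADIUS_KM = 6371;

// Convert degrees to radians.
function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance in kilometres between two [latitude, longitude] points,
// computed with the haversine formula.
function distanceKm([lat1, lon1], [lat2, lon2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// True if a prediction's [latitude, longitude] falls within radiusKm of the listing.
function isNearby(listingCoords, prediction, radiusKm = 1) {
  return distanceKm(listingCoords, prediction.coordinates) <= radiusKm;
}
```

With the two predictions from the example response, the distance works out to roughly 0.6 km, so they would count as the same location under a 1 km radius but not under 0.5 km.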
Next, we’ll create a function that runs processListing over the scraped listings. In this article, we process only the first ten, all at once, using Promise.all.

```javascript
// Process the first 10 listings concurrently
async function analyzeListings(listings) {
  const listingsToProcess = listings.slice(0, 10);
  const analyzedListings = await Promise.all(listingsToProcess.map(processListing));
  return analyzedListings;
}
```

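analyzeListings sends all ten requests at once. If you process more listings, you may want to cap how many requests run at the same time. The API’s actual rate limits aren’t covered here, so any batch size is a guess to tune. processInBatches is a hypothetical helper that runs an async worker over fixed-size batches while preserving the input order.

```javascript
// Run an async worker over items, at most batchSize at a time,
// returning results in the same order as the input.
async function processInBatches(items, batchSize, worker) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error('batchSize must be a positive integer');
  }
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Wait for the whole batch to finish before starting the next one
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}

// Usage: const analyzed = await processInBatches(listings.slice(0, 50), 5, processListing);
```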
To keep track of our analysis, we’ll save the results to a CSV file:

```javascript
const createCsvWriter = require('csv-writer').createObjectCsvWriter;

// Write the analyzed listings, including the verification result, to a CSV file
async function saveToCSV(analyzedListings) {
  const csvWriter = createCsvWriter({
    path: 'analyzed_listings.csv',
    header: [
      {id: 'address', title: 'Address'},
      {id: 'price', title: 'Price'},
      {id: 'beds', title: 'Bedrooms'},
      {id: 'baths', title: 'Bathrooms'},
      {id: 'area', title: 'Square Feet'},
      {id: 'detailUrl', title: 'Zillow Link'},
      {id: 'verificationResult', title: 'Verification Result'},
      {id: 'matchedPrediction', title: 'Matched Prediction'}
    ]
  });

  // Serialize the matched prediction object so it fits in a single CSV cell
  const records = analyzedListings.map(listing => ({
    ...listing,
    matchedPrediction: listing.matchedPrediction ? JSON.stringify(listing.matchedPrediction) : ''
  }));

  await csvWriter.writeRecords(records);
  console.log('The CSV file was written successfully');
}
```

Now we can integrate this analysis into our script. Replace the main function (and the main() call) from Step 2 with the following code:

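```javascript
// Fetch listings, analyze the first 10, print the results, and save them to CSV
async function main() {
  console.log("Fetching Zillow listings...");
  const listings = await fetchZillowListings();

  if (!listings) {
    console.log("Failed to fetch listings. Exiting.");
    return;
  }

  console.log(`Fetched ${listings.length} listings. Processing the first 10...`);

  console.log("\nAnalyzing listings...");
  const analyzedListings = await analyzeListings(listings);

  console.log("\nAnalysis Results:");
  displayListings(analyzedListings);
  // displayListings doesn't print the verdict, so log it separately
  analyzedListings.forEach((listing, index) => {
    console.log(`Listing ${index + 1} verdict: ${listing.verificationResult}`);
  });

  console.log("\nSaving results to CSV...");
  await saveToCSV(analyzedListings);
}

main().catch(error => console.error('Unexpected error:', error));
```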

Running the script writes analyzed_listings.csv, which records a verification result for each analyzed listing, indicating whether it is more likely to be real or fake.

From the output, we can see that only one of the ten listings is likely to be real. The others may be legitimate too, but we strongly recommend visiting any property in person before making financial commitments.
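To get an overview like this without opening the CSV, you can tally the verificationResult values. A small sketch: countVerdicts is a hypothetical helper that groups analyzed listings by their exact verificationResult string, so each distinct error message forms its own group.

```javascript
// Count how many analyzed listings share each verificationResult message.
function countVerdicts(analyzedListings) {
  const counts = new Map();
  for (const listing of analyzedListings) {
    const verdict = listing.verificationResult ?? 'No result';
    counts.set(verdict, (counts.get(verdict) ?? 0) + 1);
  }
  return counts;
}

// Usage: for (const [verdict, n] of countVerdicts(analyzedListings)) console.log(n, verdict);
```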

Conclusion

Detecting fake real estate listings is useful for both buyers and realtors to avoid scams and misinformation. Throughout this article, we’ve explored key steps to identify fraudulent listings, from scraping platforms like Zillow to using The Social Proxy’s mobile proxy and Geolocation API to verify the legitimacy of images and property details. By implementing these techniques, users can enhance their detection systems and minimize risks in real estate transactions. Create an account with The Social Proxy to get a free trial and check out The Social Proxy’s Image Geolocation API, proxies, and more. Feel free to schedule a call with customer support for unique use cases.
