How to Scrape SimplyHired Jobs Data


Web scraping is a highly efficient method for automating data collection from websites, making it an asset for anyone working with large datasets. One popular target is SimplyHired, a well-known job board that offers job listings across many industries and locations. Accessing data from SimplyHired can be valuable for data analysts, web developers, job seekers, and web scraping enthusiasts who want to analyze trends or investigate job opportunities.

In this tutorial, you’ll learn how to scrape job listings from SimplyHired using Puppeteer, with BeautifulSoup and Selenium as options if you prefer Python. We’ll also show you how to use The Social Proxy’s mobile proxy to bypass location restrictions by routing requests through a New York-based IP address, which helps you collect accurate location-specific job data. Along the way, you’ll see how to avoid common scraping challenges, automate job data collection, and store the results so you can analyze job market trends.

Understanding SimplyHired’s website structure

Before you start scraping, make sure you familiarize yourself with the layout of SimplyHired’s job listings page. Most job listings include a job title, the employer’s name, the location, and a short job description. These components are presented in a uniform manner, making them easy to identify and scrape as seen in the image below.

When you inspect SimplyHired’s website using browser developer tools like Chrome DevTools, you can view the HTML structure that underpins these elements. To open the Developer Tools window, right-click on any area of a job listing and select “Inspect.” In this window, you can see how the job title, employer, location, and description are each structured within a particular HTML tag (such as <h2>, <span>, or <div>).

Identifying these tags reveals patterns in the HTML structure that you can use to extract data systematically. For instance, job titles may consistently be nested within a heading tag such as <h2>, while employer names may reside in a <span> or <div>. Recognizing these patterns will help you refine your web scraping scripts and target the relevant HTML elements for data extraction.

SimplyHired scraping obstacles

Like most websites, SimplyHired has built-in anti-scraping mechanisms that can block automated access. A common challenge is CAPTCHAs, which are designed to keep automated bots away from the site’s content. SimplyHired also applies IP rate limiting, which caps the number of requests a single IP address can make over a short period. If too many requests are detected, your IP might get blocked. In addition, its bot detection can recognize and obstruct scraping attempts based on browser behavior, header patterns, or request frequency.

To work around these obstacles, we’ll use The Social Proxy’s mobile proxy to rotate IP addresses. Rotation spreads your requests across several addresses, which reduces the chances of getting detected or triggering rate limits. Using a proxy based in New York also lets you access location-specific job listings the way a local user would, which helps maintain regional accuracy.

This approach makes your scraping activity less likely to be flagged while you collect data from SimplyHired. It does not guarantee that requests will never be blocked, but mobile proxies help you navigate these obstacles and keep more consistent access to SimplyHired job data.
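
Alongside proxies, spacing out retries lowers the chance of hitting the rate limits described above. The following is a minimal sketch of exponential backoff, not something taken from SimplyHired or The Social Proxy. The function name `fetch_with_backoff`, the attempt count, and the delay values are illustrative assumptions, and `fetch` stands for whatever function downloads a page and returns None when the request is blocked.

```python
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[str], Optional[str]],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url) until it returns a page, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        page = fetch(url)
        if page is not None:
            return page
        if attempt < max_attempts - 1:
            # Waits base_delay, then 2x, then 4x, so a throttled IP gets time to recover
            sleep(base_delay * 2 ** attempt)
    return None


# Demo with a stand-in fetcher that fails twice before succeeding
responses = iter([None, None, "<html>...</html>"])
waits = []
result = fetch_with_backoff(
    lambda url: next(responses),
    "https://www.simplyhired.com/search?q=web+developer&l=new+york",
    sleep=waits.append,  # record the delays instead of actually sleeping
)
print(waits)  # [2.0, 4.0]
```

Passing `sleep` as a parameter keeps the demo instant; in a real run the default `time.sleep` applies the delays.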

Tools and setup

To start extracting job data from SimplyHired, you’ll need the following tools and libraries:

  1. Programming language: You can choose either Node.js or Python as the main language for your web scraper. The scraping scripts in this tutorial use Node.js with Puppeteer, and the CSV export example uses Python.
  2. Web scraping libraries: If you are a Python user, BeautifulSoup and Requests are popular options for extracting HTML data and making HTTP requests. If you prefer automated browser scraping, you can use Selenium (Python) or Puppeteer (Node.js) to interact with dynamic web pages.
  3. Mobile proxy: The Social Proxy offers dependable mobile proxies that help prevent blocks and provide access to location-specific job listings on SimplyHired. Using a New York-based proxy helps keep the job data you scrape regionally accurate.
  4. Proxy setup: Follow The Social Proxy’s setup guide for mobile proxies to obtain your credentials and configure the proxy within your scraping environment. This setup will enable you to rotate IPs and reduce the risk of triggering SimplyHired’s anti-scraping defenses.

A step-by-step guide to scraping SimplyHired job data

In this section, you’ll create a web scraper for SimplyHired job listings with Puppeteer (or BeautifulSoup if you are using Python). You will also set up The Social Proxy’s mobile proxies to overcome SimplyHired’s anti-scraping measures. The objective is to gather job listings for web developers located in New York.

Step 1: Define the target URL

The initial task is to identify the URL that you will be scraping. In this scenario, the URL for SimplyHired job listings for “web developer” will appear as follows:

				
https://www.simplyhired.com/search?q=web+developer&l=new+york

This link provides job listings for web developers according to the location you choose. You can change the parameters (q for job title and l for location) to focus on different keywords or locations.
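
For readers working in Python, the same URL can be assembled from its two parameters instead of being edited by hand. This is a small sketch using only the standard library; the helper name `build_search_url` is an illustrative assumption, while the base URL and the `q` and `l` parameters come from the link above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.simplyhired.com/search"


def build_search_url(job_title: str, location: str) -> str:
    """Return a SimplyHired search URL for the given job title and location."""
    # urlencode escapes spaces as '+', matching the URL shown above
    query = urlencode({"q": job_title, "l": location})
    return f"{BASE_URL}?{query}"


print(build_search_url("web developer", "new york"))
# https://www.simplyhired.com/search?q=web+developer&l=new+york
```

Letting `urlencode` handle the escaping also keeps the URL valid for job titles that contain characters such as `&` or `/`.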

Step 2: Configure The Social Proxy’s mobile proxies

To prevent IP blocks, set up The Social Proxy’s mobile proxies. A mobile proxy changes IP addresses and mimics a real user, assisting you in evading bot detection. You will need your proxy credentials. Make sure your script is adjusted to route all requests through these mobile proxies.

This is an example of setting up a proxy using Puppeteer:

				
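const puppeteer = require('puppeteer');

(async () => {
  // Route all browser traffic through the New York mobile proxy
  const browser = await puppeteer.launch({
    args: ['--proxy-server=new-york1.thesocialproxy.com:10000']
  });
  const page = await browser.newPage();

  // Example credentials; substitute the ones from your own account
  await page.authenticate({
    username: 'yil2kajb0gwfzmt9',
    password: 'kv74ungf5yx9lcis'
  });

  // Scraping steps go here

  await browser.close();
})();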
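
If you are following along in Python, a comparable proxy setup might look like the sketch below. It assumes the proxy accepts HTTP-style `user:pass@host:port` URLs, which the Puppeteer example suggests but does not confirm. The environment variable names `TSP_PROXY_USER` and `TSP_PROXY_PASS` and both helper names are hypothetical; the host and port are the ones used above.

```python
import os
import urllib.request
from urllib.parse import quote

# Host and port taken from the Puppeteer example above
PROXY_HOST = "new-york1.thesocialproxy.com"
PROXY_PORT = 10000


def build_proxy_url(username: str, password: str,
                    host: str = PROXY_HOST, port: int = PROXY_PORT) -> str:
    """Return an http://user:pass@host:port proxy URL with escaped credentials."""
    # safe="" also escapes ':' and '@', which would otherwise break the URL
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


def make_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return a urllib opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# TSP_PROXY_USER and TSP_PROXY_PASS are hypothetical variable names;
# the fallbacks are placeholders, not working credentials
proxy_url = build_proxy_url(
    os.environ.get("TSP_PROXY_USER", "your-username"),
    os.environ.get("TSP_PROXY_PASS", "your-password"),
)
opener = make_proxy_opener(proxy_url)
```

Reading credentials from the environment keeps them out of source control, which matters once a script is shared or committed.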
				
			

Step 3: Write the script to navigate SimplyHired search results

Next, use Puppeteer to navigate to the SimplyHired search results page. Here’s a basic script that opens the SimplyHired URL and waits for the page to load.

				
const puppeteer = require('puppeteer');

(async () => {
  // headless: false opens a visible browser window, which helps while debugging
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  // Navigate to SimplyHired search results
  await page.goto('https://www.simplyhired.com/search?q=web+developer&l=new+york', { waitUntil: 'networkidle2' });

  // Continue scraping here

  await browser.close();
})();
				
			

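This script opens a browser, loads the job search results page, and waits until network activity has settled. With networkidle2, navigation counts as finished once there are no more than two open network connections for at least 500 ms, which usually means the listings have loaded.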
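
A plain HTTP fetch is the simplest Python counterpart to this step. The sketch below uses only the standard library, and the `fetch_html` name, the User-Agent string, and the 30-second timeout are illustrative assumptions. It does not execute JavaScript, so if SimplyHired renders listings client-side or rejects non-browser clients, a browser driver such as Selenium would be needed instead.

```python
import urllib.request
from typing import Optional

# A browser-like User-Agent; the exact string is an illustrative assumption
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)


def fetch_html(url: str,
               opener: Optional[urllib.request.OpenerDirector] = None,
               timeout: float = 30.0) -> str:
    """Download a page and return its body decoded as text."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    # Fall back to a default opener when no proxy-aware opener is supplied
    opener = opener or urllib.request.build_opener()
    with opener.open(request, timeout=timeout) as response:
        # Use the charset declared by the server, defaulting to UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (requires network access):
# html = fetch_html("https://www.simplyhired.com/search?q=web+developer&l=new+york")
```

An opener configured with `urllib.request.ProxyHandler` can be passed as the `opener` argument to route the request through a proxy.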

Step 4: Extract key job information

Now that you’ve accessed the job listings, you need to extract the relevant job data, such as job title, employer name, location, and job description. You can use the page.evaluate() function to interact with the page’s DOM and extract the necessary data.

Here’s an example of how you can extract those elements:

				
// Runs inside the async function from Step 3.
// Verify these selectors in DevTools before running; site markup can change.
const jobListings = await page.evaluate(() => {
  // Return the trimmed text of a child element, or '' when it is missing
  const text = (el, selector) => el.querySelector(selector)?.innerText.trim() ?? '';

  const jobs = [];
  document.querySelectorAll('.jobposting').forEach(job => {
    jobs.push({
      title: text(job, '.jobposting-title'),
      employer: text(job, '.jobposting-company'),
      location: text(job, '.jobposting-location'),
      description: text(job, '.jobposting-snippet')
    });
  });
  return jobs;
});
console.log(jobListings);
				
			

This script extracts job titles, employer names, locations, and job descriptions for each job listing on the page.
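
The same extraction can be sketched in Python against downloaded HTML. To stay runnable without extra installs, this version uses the standard library's `html.parser`; BeautifulSoup would express the same logic more compactly. The class names are copied from the Puppeteer example and may not match SimplyHired's live markup, the sample HTML is invented for the demo, and `JobCardParser` and `parse_jobs` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List

# CSS class -> output key; class names copied from the Puppeteer example
FIELD_CLASSES = {
    "jobposting-title": "title",
    "jobposting-company": "employer",
    "jobposting-location": "location",
    "jobposting-snippet": "description",
}
# Tags without a closing tag must not affect the nesting depth
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class JobCardParser(HTMLParser):
    """Collect the text of known fields inside each .jobposting element."""

    def __init__(self) -> None:
        super().__init__()
        self.jobs: List[Dict[str, str]] = []
        self._job = None       # dict for the card being read, or None
        self._depth = 0        # open tags inside the current card
        self._field = None     # field currently collecting text
        self._field_depth = 0  # depth at which that field opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._job is None:
            if "jobposting" in classes:
                self._job = {key: "" for key in FIELD_CLASSES.values()}
                self._depth = 1
            return
        self._depth += 1
        if self._field is None:
            for cls in classes:
                if cls in FIELD_CLASSES:
                    self._field = FIELD_CLASSES[cls]
                    self._field_depth = self._depth
                    break

    def handle_endtag(self, tag):
        if self._job is None or tag in VOID_TAGS:
            return
        if self._field is not None and self._depth == self._field_depth:
            self._field = None  # the field's own tag just closed
        self._depth -= 1
        if self._depth == 0:    # the card's outer tag just closed
            self.jobs.append({k: v.strip() for k, v in self._job.items()})
            self._job = None

    def handle_data(self, data):
        if self._job is not None and self._field is not None:
            self._job[self._field] += data


def parse_jobs(html: str) -> List[Dict[str, str]]:
    """Return one dict per job card; fields missing from a card are ''."""
    parser = JobCardParser()
    parser.feed(html)
    parser.close()
    return parser.jobs


# Invented sample markup for the demo
SAMPLE_HTML = """
<div class="jobposting">
  <h2 class="jobposting-title"><a href="#">Web Developer</a></h2>
  <span class="jobposting-company">TechCorp</span>
  <span class="jobposting-location">New York, NY</span>
  <p class="jobposting-snippet">Develop web applications.</p>
</div>
"""
print(parse_jobs(SAMPLE_HTML)[0]["title"])  # Web Developer
```

Returning an empty string for a missing field keeps one incomplete card from aborting the whole run.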

Step 5: Handle pagination to scrape more results

SimplyHired may paginate its job listings, so scraping a single page won’t capture all available data. To handle pagination, you’ll need to identify the “Next” button and instruct Puppeteer to click it and continue scraping subsequent pages.

Here’s how you can automate pagination:

				
// Runs inside the async function from Step 3
const allJobs = [];
let hasNextPage = true;

while (hasNextPage) {
  // Scrape job data from the current page
  const jobListings = await page.evaluate(() => {
    const text = (el, selector) => el.querySelector(selector)?.innerText.trim() ?? '';
    const jobs = [];
    document.querySelectorAll('.jobposting').forEach(job => {
      jobs.push({
        title: text(job, '.jobposting-title'),
        employer: text(job, '.jobposting-company'),
        location: text(job, '.jobposting-location'),
        description: text(job, '.jobposting-snippet')
      });
    });
    return jobs;
  });
  // Keep results from every page, not just the last one
  allJobs.push(...jobListings);

  // Check if there's a next page
  const nextPageButton = await page.$('a[aria-label="Next"]');
  if (nextPageButton) {
    // Start waiting before clicking so the navigation event is not missed
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'networkidle2' }),
      nextPageButton.click()
    ]);
  } else {
    hasNextPage = false;
  }
}
console.log(allJobs.length);

				
			

This loop continues scraping until no “Next” button is found, ensuring you capture all job listings across multiple pages.
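
The pagination logic can be separated from the browser details, which makes it easier to test. The Python sketch below assumes a hypothetical `load_page` function that returns the jobs on one page together with the URL of the next page, or None on the last page. The `max_pages` cap and the visited-URL check are added safeguards against endless loops, not part of the Puppeteer script.

```python
from typing import Callable, Dict, List, Optional, Tuple

Job = Dict[str, str]
# A page loader returns the jobs on one page plus the next page's URL (None on the last page)
PageLoader = Callable[[str], Tuple[List[Job], Optional[str]]]


def scrape_all_pages(load_page: PageLoader, start_url: str, max_pages: int = 50) -> List[Job]:
    """Follow next-page links from start_url and collect every job found."""
    all_jobs: List[Job] = []
    seen_urls = set()
    url: Optional[str] = start_url
    # Stop on the last page, on a repeated URL, or after max_pages pages
    while url is not None and url not in seen_urls and len(seen_urls) < max_pages:
        seen_urls.add(url)
        jobs, url = load_page(url)
        all_jobs.extend(jobs)
    return all_jobs


# Stand-in pages keyed by URL; a real loader would fetch and parse each page
fake_site = {
    "page-1": ([{"title": "Web Developer"}], "page-2"),
    "page-2": ([{"title": "Frontend Engineer"}], None),
}
jobs = scrape_all_pages(lambda url: fake_site[url], "page-1")
print([job["title"] for job in jobs])  # ['Web Developer', 'Frontend Engineer']
```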

Storing scraped data in CSV format

Once you’ve successfully scraped job data from SimplyHired, you’ll want to store it in a structured format for future analysis. Using Python’s csv module, you can export the fields of your scraped job listings, such as job title, employer, location, and job description, to a CSV file. Below is an example of how to write this data into a CSV file:

				
import csv

# Example scraped data
jobs_data = [
    {"job_title": "Web Developer", "employer": "TechCorp", "location": "New York", "description": "Develop web applications."},
    {"job_title": "Software Engineer", "employer": "Innovate Inc.", "location": "New York", "description": "Design software solutions."},
]

# Define the CSV file headers
csv_columns = ["job_title", "employer", "location", "description"]

# Write the data to CSV; newline='' stops the csv module from adding blank rows on Windows
with open('jobs_data.csv', mode='w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=csv_columns)
    writer.writeheader()
    writer.writerows(jobs_data)

print("Data stored in CSV format successfully!")
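
The Puppeteer scripts produce objects keyed `title`, `employer`, `location`, and `description`, while the CSV example uses `job_title` as its first column. One way to bridge the two, assuming you save the Node.js results as a JSON array (for instance with `JSON.stringify`), is sketched below. The `json_jobs_to_csv` name and the key mapping are illustrative assumptions.

```python
import csv
import io
import json

# Maps keys used by the Puppeteer scripts to the CSV column names used above
KEY_TO_COLUMN = {
    "title": "job_title",
    "employer": "employer",
    "location": "location",
    "description": "description",
}


def json_jobs_to_csv(json_text: str, out) -> int:
    """Write jobs from a JSON array to out as CSV and return the number of rows."""
    jobs = json.loads(json_text)
    writer = csv.DictWriter(out, fieldnames=list(KEY_TO_COLUMN.values()))
    writer.writeheader()
    for job in jobs:
        # Missing keys become empty cells instead of raising KeyError
        writer.writerow({col: job.get(key, "") for key, col in KEY_TO_COLUMN.items()})
    return len(jobs)


sample = json.dumps([
    {"title": "Web Developer", "employer": "TechCorp",
     "location": "New York", "description": "Develop web applications."},
])
buffer = io.StringIO()  # in-memory stand-in for an open CSV file
json_jobs_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `newline=''` in place of the `StringIO` buffer.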

				
			

Why scrape SimplyHired data?

Scraping SimplyHired job data offers significant value for various industries. By extracting job listings from this platform, you can conduct in-depth job market research to understand the demand for specific roles in different regions, such as web developers or data analysts. For example, recruiters can gain insights into hiring trends, enabling them to adapt their strategies based on the number of job openings or the demand for certain skills.

Moreover, competitor analysis can be enhanced by reviewing job descriptions from competitors to understand their hiring needs and growth trajectories. Salary trends can also be identified through job postings, allowing companies to adjust their compensation structures to remain competitive.

For recruiters, data scientists, and market researchers, scraping SimplyHired provides a valuable dataset for identifying patterns, conducting trend analysis, and making data-driven decisions that align with real-time market demands, helping them stay ahead in an ever-evolving job market.

Conclusion

In this tutorial, we’ve walked through building a web scraper to extract valuable job data from SimplyHired. With the right tools and techniques, such as using a New York-based mobile proxy, you can gather critical job listings, including details like job titles, employers, and descriptions. This data is crucial for professionals such as data analysts, web developers, job seekers, and recruiters conducting market research or analyzing hiring trends.

The application of The Social Proxy’s mobile proxies is essential to overcoming common scraping challenges, such as avoiding IP blocks and bypassing bot detection, ensuring accurate, location-specific results.

Now that you’ve seen the potential of this approach, we encourage you to try building your own SimplyHired scraper and use mobile proxies to scale your web scraping projects efficiently and reliably.
