Building an Instagram Profile Location Tracker: A Step-by-Step Guide to Mapping Social Media Movements

Data from social networks like Instagram is a gold mine for understanding user behavior and preferences. Every post, whether an image, text, comment, or reaction, can reveal valuable insights about a user’s location, hashtags, or interests. These insights can enhance business decision-making, improve networking opportunities, and even help monitor a user’s real-life activities.

Imagine you need to gather intelligence before a meeting with a thought leader, business owner, or potential partner. You’ll likely spend a decent amount of time combing through their social media profiles to learn about their interests, frequently visited places, and recent activities. Fortunately, there’s technology that makes it easy to track an Instagram user’s geolocation data.

In this tutorial, we’ll learn how to build an Instagram profile tracker. We’ll start by scraping Instagram profile data, then move on to extracting location information, and finish with mapping the movements of an Instagram user. If you have basic Python knowledge and access to The Social Proxy Scraper API, you should have no trouble following along.

What is an Instagram profile location tracker?

An Instagram profile location tracker is a tool that collects and analyzes the locations associated with an Instagram user’s posts and provides a visual of the user’s whereabouts based on their shared content. Instagram determines a user’s primary location through their device’s IP address and activity on Instagram, regardless of whether or not location services are enabled.

The Instagram profile location tracker follows users’ movements based on the geotags and metadata attached to their public posts (e.g. images, videos, and captions). Instagram users can add or edit locations when making a post, but even when they don’t, location clues can often be found in hashtags, captions, or text linked to an image. The location tracker scrapes posts from public profiles, extracting geolocation data and timestamps to determine where and when a post was made. The data then gets plotted on a map to create a visual representation of the user’s movements.
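Before diving in, it helps to picture the minimal record such a tracker distills from each public post. Here is a hypothetical sketch (the field names and values are illustrative, not Instagram’s actual schema):

# A hypothetical record distilled from one public post
post_record = {
    "taken_at": "2024-05-01T14:32:00",   # when the post was made
    "location_name": "Oslo, Norway",     # geotag the user attached
    "latitude": 59.9139,
    "longitude": 10.7522,
    "caption_text": "Weekend in Oslo!",  # captions can also hint at a location
}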

The importance of tracking Instagram locations

Understanding where a user has been can be crucial for market research, influencer analysis, and competitive intelligence. Brands can use this data to identify trends, target local campaigns, or analyze influencer behavior. There are several benefits to tracking an Instagram user:

  • Improve your networking skills: You can show up at the favorite spots of prospective clients, thought leaders, or business owners and have an easier time connecting with them.
  • Connect with local influencers: Engage with influencers who can help grow your business by focusing on specific local markets.
  • Monitor teenagers: Track teenagers’ activity to help keep them away from potentially dangerous areas.
  • Track employee activity: Monitor the movement of employees to assess whether their activities align with business goals.

Understanding The Social Proxy’s tools

The Social Proxy offers three main solutions: mobile proxy, residential proxy, and an API scraper. With the fastest mobile proxies on the market, it’s a leading provider in the industry. Key features of its proxies (both mobile and residential) include automatic IP rotation, access to region-restricted content, fast internet speeds, seamless integration for developers, and much more.

The Scraper API from The Social Proxy allows you to ethically scrape data from social media, web pages, maps, and more without getting blocked. Unlike the mobile and residential proxies, which focus on anonymity, the Scraper API is designed specifically for data scraping.

The Scraper API also includes the Geolocate Lookup API, which enables you to determine the geolocation of an image by passing it through the API endpoint. The Geolocate Lookup API uses AI to detect an image’s location and its corresponding coordinates. We’ll use this feature in this tutorial.

How to set up The Social Proxy for Instagram

For this project, we will need access to the Scraper API endpoints for Instagram and the Geolocate Lookup API. To access these, you will need to set up an account with The Social Proxy. This will provide you with a username, password, consumer key, and consumer secret, which are required to access the Scraper API. Skip this section if you already have an account.

Follow these steps to get started:

  1. Visit The Social Proxy’s official website.
  2. Click “Login” if you already have an account. To create a new account, click “Get Started” and follow the next steps.
  3. Fill out the required fields in the signup form and click “Sign Up.”

  4. Click on the account verification link sent to your email from The Social Proxy.
  5. Access your dashboard on The Social Proxy and click on “Buy Proxy” to select a plan.
  6. Choose a plan: on the Buy Proxies page, select “Scraper API,” choose your subscription type, and click “Checkout.”

How to build an Instagram profile location tracker

Now we’ll build an Instagram profile location tracker using Python, a handful of open-source libraries, and The Social Proxy’s API endpoints. It’s important to follow this guide step by step, without skipping any details. So, roll up your sleeves and let’s get started!

Step 1: Set up your development environment

Download and install Python for your operating system, run the installer, and ensure you follow all instructions that appear on your screen.

Download and install Visual Studio Code, run the installer, and follow all instructions. Then set up VS Code for Python using this guide.

Set up a virtual environment where you can download all the necessary libraries. It ensures that the libraries required for a specific project won’t interfere with those in your global environment.

Create a virtual environment using pipenv. Open a terminal, create a folder for your project, and run the following commands:

  • Install the pipenv package: run pip install pipenv
  • Set up the virtual environment within the folder: pipenv shell
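To confirm the environment is active, you can run a quick check from within the pipenv shell (a minimal sketch):

# Run inside the pipenv shell: the interpreter should live in a
# pipenv-managed virtualenv directory, not your global Python install
import sys
print(sys.prefix)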

Step 2: Scrape Instagram posts using The Social Proxy Scraper API

In this section, we’ll use the Scraper API to scrape text and images from a specific Instagram profile.

Navigate to your project folder and run the command pipenv shell in your terminal to activate your virtual environment.

Note: Your virtual environment must be active while building this project. Ensure all installations are made within the virtual environment. All scripts should be executed from the terminal or command-line interface (CLI).

You’ll need to install two libraries:

  • The requests library for accessing the API endpoints
  • The pydantic-settings library for creating a config class that provides access to environment variables

Install these libraries by running the following commands in your terminal/command line interface (CLI):

  • pipenv install requests
  • pipenv install pydantic-settings

Set up environment variables:
Open VSCode and navigate to your project folder. Create a .env file and enter your CONSUMER_SECRET and CONSUMER_KEY.

CONSUMER_KEY=""
CONSUMER_SECRET=""

Create a config.py file in the project folder and enter this code:

"""This module extracts information from your `.env` file so that
you can use your Social Proxy keys in other parts of the application.
"""

# The os library allows you to communicate with a computer's
# operating system: https://docs.python.org/3/library/os.html
import os

# pydantic is used for data validation: https://pydantic-docs.helpmanual.io/
from pydantic_settings import BaseSettings


def return_full_path(filename: str = ".env") -> str:
    """Uses os to return the correct path of the `.env` file."""
    absolute_path = os.path.abspath(__file__)
    directory_name = os.path.dirname(absolute_path)
    full_path = os.path.join(directory_name, filename)
    return full_path


class Settings(BaseSettings):
    """Uses pydantic to define settings for the project."""

    CONSUMER_KEY: str
    CONSUMER_SECRET: str

    class Config:
        env_file = return_full_path(".env")


# Create an instance of the `Settings` class that will be imported
# by the other modules of the application.
settings = Settings()

The config.py file contains the module for loading the variables stored in the .env file.
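To confirm your keys load correctly, you can run a quick sanity check (a minimal sketch; the masked printout is just for illustration):

from config import settings

# Print only a prefix of each key so the secrets aren't exposed in your terminal
print(settings.CONSUMER_KEY[:4] + "...")
print(settings.CONSUMER_SECRET[:4] + "...")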
Create a scraper.py file and enter the code below into it.

# Import the necessary libraries
import requests
import json
from config import settings

# Choose an Instagram handle you would like to scrape
username = 'oivindhaug'

# Import the environment variables from the .env file using the
# settings class in config.py
CONSUMER_KEY = settings.CONSUMER_KEY
CONSUMER_SECRET = settings.CONSUMER_SECRET


def scrape_instagram(consumer_key, consumer_secret, username, limit):
    url = f'https://thesocialproxy.com/wp-json/tsp/instagram/v1/profiles/feed?consumer_key={consumer_key}&consumer_secret={consumer_secret}&username={username}'

    payload = {}
    headers = {
        'Content-Type': 'application/json',
    }

    # Initialize an empty list to store all results
    all_results = []

    # Pagination handling: start with the first page and stop once
    # `limit` pages have been fetched (adjust as needed)
    next_page = 1

    while next_page and next_page <= limit:
        # Make the request for the current page
        response = requests.get(url, headers=headers, data=payload)

        # Parse the JSON response
        response_json = response.json()

        # Check if results are in the 'data' key
        if 'data' in response_json and isinstance(response_json['data'], list):
            results = response_json['data']
            all_results.extend(results)  # Add the current page's results to the total list
            next_page += 1  # Increment to the next page
        else:
            # No more data or an unexpected structure
            break

    return all_results


# Scrape the data from the Instagram profile and store it in the variable data
data = scrape_instagram(CONSUMER_KEY, CONSUMER_SECRET, username, 10)

Run python scraper.py in your terminal and wait for the output. If you followed all the steps correctly, you should see the scraped profile data printed to your terminal. The value of the limit argument determines how much data is returned with each API call.

The scrape_instagram function scrapes data from the Instagram profile given in username. The limit argument determines the number of pages to scrape, so this call scrapes ten pages from the Instagram account. Each page contains 12 unique posts.

The scraped data contains information such as the time the post was made (taken_at), the location of the user when the post was made, the latitude and longitude of that location, the caption text, the media URL, and lots more. You will learn how to extract the necessary information in the next step.
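If you want to inspect the raw structure before moving on, here is a quick sketch (the exact schema comes from the API response, so treat these keys as examples drawn from the data above):

# Inspect the shape of the scraped data
print(len(data))                   # number of pages fetched (up to `limit`)
first_post = data[0]['items'][0]   # first post on the first page
print(first_post['taken_at'])      # Unix timestamp of the post
print(first_post.get('location'))  # geotag dict, or None if the post has no geotag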

Note: Each API call reduces your API credit. Open a Jupyter notebook within your project directory and run the code snippets in code blocks in the notebook. A Jupyter notebook keeps each session’s results in memory, so you won’t need to call the API again after scraping the data. Here is a guide on how to set up Jupyter notebooks in VS Code and a guide on how to use your virtual environment as a kernel in Jupyter Notebook.

Follow the steps below if you’re working within a Jupyter notebook. If you’re not, skip to the next step.

  • Run data[0] in a new code block. This will show all the data scraped from one page of the Instagram profile. You can change the index from 0 to 9.
  • Run data[0]['items'] to access the “items” part of the data in the first page of the result.
  • Go to the scraper.py script and enter the two code snippets below.
from datetime import datetime

# Convert a Unix timestamp to a readable ISO-style time string
def convert_timestamp(unix_timestamp):
    return datetime.utcfromtimestamp(unix_timestamp).strftime('%Y-%m-%dT%H:%M:%S')
def extract_location(all_results, limit):
    extracted_posts = []  # List to store all extracted data

    for i in range(limit):
        postx = all_results[i]['items']
        for post in postx:
            taken_at = post['taken_at']
            # Convert taken_at from Unix to a readable time format
            time = convert_timestamp(taken_at) if isinstance(taken_at, int) else 'Unknown Time'

            # Default value for the caption field
            caption_text = 'No caption'

            # Check if 'caption' exists
            caption = post.get('caption')
            if caption:
                caption_text = caption.get('text', 'No caption')  # Extract the caption text

            # Default location values
            location_name = 'Unknown Location'
            latitude = 'Unknown Latitude'
            longitude = 'Unknown Longitude'

            # Extract the location if available
            location = post.get('location')  # Use get() to avoid a KeyError if 'location' doesn't exist
            if location:
                location_name = location.get('name', 'Unknown Location')
                latitude = location.get('lat', 'Unknown Latitude')
                longitude = location.get('lng', 'Unknown Longitude')

            # Default value for the image URL
            image_url = 'No Image URL'

            # Extract the image URL if the post is an image
            image_versions = post.get("image_versions2", {}).get("candidates", [])
            if image_versions:
                image_url = image_versions[0].get("url", "No Image URL")

            # Append the extracted data
            extracted_posts.append({
                'taken_at': time,
                'caption_text': caption_text,
                'location_name': location_name,
                'latitude': latitude,
                'longitude': longitude,
                'image_url': image_url  # Include the image URL
            })

    return extracted_posts


# Extract the locations
posts = extract_location(data, len(data))

This code snippet will extract the time of each post, its caption text, the location of the post, and the latitude and longitude of that location. These are stored in the posts variable.

Step 3: Analyze text for mentioned locations

In the last section, we discovered that some posts were made without explicit locations. However, many captions contain locations mentioned in the text itself. In this section, we’ll go over how to analyze caption text and extract locations using SpaCy. SpaCy is an open-source Python library for natural language processing (NLP) tasks.

  • Run pipenv install spacy in your CLI to install the SpaCy library.
  • Run pipenv run python -m spacy download en_core_web_sm in your terminal to install the en_core_web_sm model. SpaCy has different models for NLP tasks. You can learn more about SpaCy models here.
  • Run pipenv install geopy to install Geopy, which the function below uses to turn location names into coordinates.
  • Add import spacy and from geopy.geocoders import Nominatim to the top of the code in the scraper.py file.
  • Enter the code snippet below in the scraper.py file and run it in your terminal.
def get_location_from_text(text):
    # Load the SpaCy model
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)
    # Extract any location entities (GPE = countries, cities, states)
    locations = [ent.text for ent in doc.ents if ent.label_ == "GPE"]
    if not locations:
        return "Unknown Latitude", "Unknown Longitude"
    # Initialize Nominatim from geopy to look up the coordinates
    geolocator = Nominatim(user_agent="my_app")
    location_coordinate = geolocator.geocode(locations[0])
    if location_coordinate:
        return location_coordinate.latitude, location_coordinate.longitude
    return "Unknown Latitude", "Unknown Longitude"

The get_location_from_text function extracts location names from a caption’s text and retrieves the corresponding coordinates using the Geopy Python library. Note that some posts without explicit locations might not mention locations in their captions either. In the next step, we’ll learn how to use the AI Geo Lookup API to extract locations from such posts.
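To see the named-entity step on its own, here is a minimal sketch, assuming the en_core_web_sm model is installed (the sample caption is made up):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Sunset walk through Lisbon tonight")
# GPE entities cover countries, cities, and states
print([ent.text for ent in doc.ents if ent.label_ == "GPE"])  # typically ['Lisbon']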

Step 4: Use AI Geo Lookup to identify locations from images

The AI Geo Lookup endpoint in the Scraper API uses artificial intelligence to detect the location of an image. In this section, you will learn how to use this API endpoint to determine the location of a post based on the image it contains.

Here is the endpoint:

import requests
import json

url = "https://thesocialproxy.com/wp-json/tsp/geolocate/v1/image?consumer_key={CONSUMER_KEY}&consumer_secret={CONSUMER_SECRET}"

payload = json.dumps({
  "image": "{BASE64_IMAGE}"
})
headers = {
  'Content-Type': 'application/json',
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

This endpoint accepts images in BASE64 format, so we’ll continue by doing the following:

  1. Use requests to download the image from the image URL.
  2. Convert the image to BASE64 format.
  3. Pass the BASE64-encoded image through the AI Geo Lookup API endpoint to retrieve the location.

Enter the code snippet below the other functions in the scraper.py file. This function calls the AI Geo Lookup API.

# Geolocation API call
def get_geolocation_from_image(base64_image, CONSUMER_KEY, CONSUMER_SECRET):
    try:
        url = f"https://thesocialproxy.com/wp-json/tsp/geolocate/v1/image?consumer_key={CONSUMER_KEY}&consumer_secret={CONSUMER_SECRET}"
        payload = json.dumps({
            "image": base64_image
        })
        headers = {
            'Content-Type': 'application/json',
        }
        response = requests.request("POST", url, headers=headers, data=payload)
        data = response.json()

        # Pull the list of predicted coordinates out of the response
        geo_predictions = data['data']['geo_predictions']
        coordinates_list = [prediction['coordinates'] for prediction in geo_predictions]
        if coordinates_list:
            return coordinates_list[0]
        else:
            return ('Unknown Latitude', 'Unknown Longitude')
    except Exception as e:
        print(f"Error during geolocation API call: {e}")
        return ('Unknown Latitude', 'Unknown Longitude')

Add the code snippet below the previous function. This code snippet uses requests to download the image from the image URL, converts the image to BASE64 format, and passes the BASE64-encoded image through the AI Geo Lookup API endpoint to retrieve the location.

# Fetch image-based geolocation if locating from the caption fails
def fetch_geolocation_from_image(image_url, CONSUMER_KEY, CONSUMER_SECRET):
    try:
        # Step 1: Download the image
        response = requests.get(image_url)
        # Step 2: Convert the image to Base64
        image_data = response.content  # Get the image data in bytes
        base64_image = base64.b64encode(image_data).decode('utf-8')  # Encode to Base64 and decode to a string
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the image: {e}")
        return None

    if base64_image:
        return get_geolocation_from_image(base64_image, CONSUMER_KEY, CONSUMER_SECRET)
    return None

Note: You must import base64. Add that to the very top of the scraper.py file.

Add the following code snippet below the other functions in the scraper.py file. This code snippet will extract the latitude, longitude, and time for each post. The posts argument required by this function is the object returned after running the extract_location function.

def extract_location_data(posts, CONSUMER_KEY, CONSUMER_SECRET):
    location_data = []

    for post in posts:
        location = post['location_name']
        caption = post['caption_text']
        image_url = post['image_url']
        latitude = post['latitude']
        longitude = post['longitude']
        time = post['taken_at']
        if location == "Unknown Location":
            # First, try to pull a location out of the caption text
            latitude, longitude = get_location_from_text(caption)
            if latitude == "Unknown Latitude" and longitude == "Unknown Longitude":
                # Fall back to image-based geolocation
                result = fetch_geolocation_from_image(image_url, CONSUMER_KEY, CONSUMER_SECRET)
                if result:
                    latitude, longitude = result
        # Append only the latitude, longitude, and time to the list
        location_data.append([latitude, longitude, time])
    return location_data
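The function returns a list of [latitude, longitude, time] triples, e.g. [[59.9139, 10.7522, '2024-05-01T14:32:00'], ...] (values made up here for illustration), which is exactly the shape the mapping function in the next step expects.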
Step 5: Build a map to visualize movement over time

In this section, we will use the Folium Python library to create an interactive map of the user’s movements. Folium is a Python wrapper around the Leaflet.js library, which is designed for making interactive maps.
To get started:

  1. Run pipenv install folium pandas in your terminal to install Folium and Pandas (the mapping function below uses both).
  2. Add the code below under the other functions in scraper.py and run it.
    The code snippet below takes the list of objects returned by the extract_location_data function, converts it into a Pandas DataFrame, formats the time, and creates an interactive plot using Folium.
# Add these imports to the top of scraper.py
# (datetime is already imported from the earlier snippet):
import folium
from folium.plugins import AntPath
import pandas as pd


# Sample data: [latitude, longitude, timestamp]
def create_maps(locations, filename="map_file_john_obidi.html"):
    # Create a DataFrame from the locations input, sorted by time
    location_df = pd.DataFrame(locations, columns=['latitude', 'longitude', 'timestamp']).sort_values('timestamp')
    locations = location_df.to_dict(orient='records')

    # Create a map centered around the first location
    m = folium.Map(location=[locations[0]['latitude'], locations[0]['longitude']], zoom_start=13)

    # Add an animated path connecting the locations in chronological order
    coordinates = [(loc['latitude'], loc['longitude']) for loc in locations]
    AntPath(locations=coordinates, dash_array=[20, 20], pulse_color='blue').add_to(m)

    # Earliest and latest timestamps
    start_time = locations[0]['timestamp']
    end_time = locations[-1]['timestamp']

    # Add a marker for every location: red for the start and end, blue in between
    for idx, loc in enumerate(locations, start=1):
        if idx == 1:
            date = datetime.fromisoformat(start_time).date()
            time = datetime.fromisoformat(start_time).time()
            text = f"Start\nDate: {date}\nTime: {time}"
            text = folium.Popup(text, show=True)
            icon = folium.Icon(color='red')
        elif idx == len(locations):
            date = datetime.fromisoformat(end_time).date()
            time = datetime.fromisoformat(end_time).time()
            text = f"End\nDate: {date}\nTime: {time}"
            text = folium.Popup(text, show=True)
            icon = folium.Icon(color='red')
        else:
            date = datetime.fromisoformat(loc['timestamp']).date()
            time = datetime.fromisoformat(loc['timestamp']).time()
            text = f"Date: {date}\nTime: {time}"
            text = folium.Popup(text)
            icon = folium.Icon(color='blue')
        folium.Marker(
            location=[loc['latitude'], loc['longitude']],
            popup=text,
            icon=icon
        ).add_to(m)

    # Save the map to an HTML file
    m.save(filename)
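To test the mapping step in isolation, you can call create_maps with hand-written data (a sketch; the coordinates and timestamps are made up):

sample = [
    [59.9139, 10.7522, '2024-05-01T14:32:00'],
    [59.3293, 18.0686, '2024-05-03T09:10:00'],
    [55.6761, 12.5683, '2024-05-05T18:45:00'],
]
create_maps(sample, filename="sample_map.html")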

Create a plot_map.py file in your project folder and add the code below.

from scraper import (
    scrape_instagram, extract_location, convert_timestamp,
    extract_location_data, fetch_geolocation_from_image,
    get_location_from_text, get_geolocation_from_image,
    create_maps
)
from config import settings


def main():
    # Define your parameters
    username = 'johnobidi'
    CONSUMER_KEY = settings.CONSUMER_KEY
    CONSUMER_SECRET = settings.CONSUMER_SECRET
    limit = 10

    # Scrape the data from the Instagram profile and store it in `data`
    data = scrape_instagram(CONSUMER_KEY, CONSUMER_SECRET, username, limit)

    # Get the necessary information using the extract_location function
    posts = extract_location(data, len(data))

    # Extract the location data
    location_data = extract_location_data(posts, CONSUMER_KEY, CONSUMER_SECRET)

    # Create a map from the location data
    create_maps(location_data)


if __name__ == "__main__":
    main()
  • Here is a link to the scraper.py file.
  • Run the plot_map.py file to get the interactive map showing all of @johnobidi’s locations from their Instagram posts.
    Go to your project directory and open “map_file_john_obidi.html” in your browser.

Conclusion

In this tutorial, we built an Instagram profile location tracker using The Social Proxy’s Scraper API and several open-source libraries, including Folium and SpaCy. We started by scraping the data with the Scraper API, then extracted location names and coordinates from captions using SpaCy and Geopy. Next, we identified locations from images using the AI Geo Lookup API and created an interactive map with Folium.

Although we developed an Instagram profile tracker, you can explore building similar location trackers for Facebook, LinkedIn, or other platforms. You can also build new products on The Social Proxy’s API endpoints for football, Street View, and Reddit data. If you know languages other than Python, you can apply similar techniques in those languages as well.
