How to Detect Hate Speech on Instagram Using The Social Proxy and OpenAI

Freedom of speech is one of the pillars of modern society – it allows us to openly express ourselves and our feelings without penalty. Thanks to the internet, this freedom has reached new heights. Unfortunately, it also comes at a cost: hate speech and offensive content are now widespread on social media platforms.

When it comes to harmful speech on social media, platforms like Instagram impose little to no friction to prevent someone from posting something harmful. People can seamlessly share whatever they please without ever stopping in their tracks to consider the consequences. Hate speech poses a threat to individuals and society as a whole, so it’s no surprise that it’s become a major concern.

With millions of posts and comments made daily on Instagram, manual detection combined with Instagram’s algorithm still leaves a large amount of harmful content undetected. Luckily, the right tools can help identify hate speech, protecting users and maintaining community standards.

In this tutorial, we’ll walk through the step-by-step process of detecting hate speech on Instagram using the OpenAI API and The Social Proxy, a web scraping and proxy provider.

What is hate speech?

According to the United Nations, hate speech is an offensive discourse targeting a group or an individual based on inherent characteristics (such as race, religion, or gender) that may threaten social peace. It often includes slurs, derogatory remarks, and statements that incite violence or hatred. Hate speech not only promotes social inequality, but victims of it may suffer psychological harm. It can stir up an environment of fear and marginalization, particularly for historically oppressed communities, undermining their dignity and social standing in society.

Hate speech on Instagram

Hate speech on Instagram is a pressing concern. According to Taylor Lorenz, a reporter for The Atlantic, “[Instagram is] likely where the next great battle against misinformation will be fought, and yet it has largely escaped scrutiny.” With over a billion users and counting, monitoring the platform presents a huge challenge. In an attempt to create a safer environment for all users, Instagram has implemented automated content moderation and user reporting systems to combat hate speech. Even so, more than a handful of posts and comments remain unaddressed for hours or never get removed at all.

Why is detecting hate speech crucial?

Unlike traditional mediums, online platforms have the capacity to reach a large and diverse audience in a matter of minutes. In other words, people can produce and share hate speech with minimal effort, at a low cost, and anonymously. Research indicates that unchecked hate speech can foster a hostile environment, contributing to mental health issues among targeted individuals and perpetuating societal divisions. Studies have shown that exposure to hate speech correlates with increased anxiety and depression among victims, making it critical to implement effective detection mechanisms to mitigate negative consequences and promote a safer online community.

Introduction to The Social Proxy and OpenAI

What is The Social Proxy?

The Social Proxy is a leading mobile proxy provider that offers the fastest mobile proxies in the market. It’s designed to provide users with high-quality, reliable mobile IP addresses for various online activities, particularly social media automation. Key features include an endless stream of mobile IPs, high anonymity, and IP rotation support, which helps users avoid detection and bans from social media platforms. The Social Proxy also has a developer-friendly Scraper API and tools that retrieve data from popular social media platforms in an ethical manner, without the risk of being flagged as a bot. The API and the user dashboard make setup and operation easy, which is ideal for developers who want to automate tasks quickly and securely.

What is OpenAI?

OpenAI is an artificial intelligence research company that develops advanced language models, such as GPT-3.5, GPT-4, and GPT-4o. These models can be fine-tuned for various natural language processing tasks, including hate speech detection. For this purpose, OpenAI offers a moderation endpoint that checks whether text is potentially harmful. Developers can use this endpoint in content moderation systems to identify hate speech from social media platforms and flag it automatically before it causes harm.

How to detect hate speech on Instagram using The Social Proxy and OpenAI

In this section, we’ll review how to detect hate speech on Instagram.

Step 1: Set up The Social Proxy for Instagram

Follow this step-by-step guide to set up The Social Proxy and use it to retrieve data from Instagram:

  • Visit The Social Proxy’s official website.
  • Click “Login” if you already have an account. To create a new account, click “Get Started” and follow the next steps.
  • Fill out the required fields in the signup form and click “Sign Up.”

Click on the account verification link sent to your email from The Social Proxy.

Configure The Social Proxy for Instagram

Access your dashboard on The Social Proxy and click on “Buy Proxy” to select a plan.

Choose a plan: On the buy proxies page, select “Scraper API,” choose your subscription type, and click “Checkout.”

Provide payment details: Fill out your payment information and click “Sign up now.” Once you’ve signed up, you can proceed to use the Scraper API.

Generate your Scraper API keys: You need to generate your keys before you can start making API calls to the Scraper API. In the side menu, click “Scraper API” and select “Scraper API.”

Click on “Generate API KEY”.

Copy your credentials: Copy your Consumer Key and Consumer Secret – you will need them in your code.

Note: If you encounter any issues, you can contact The Social Proxy for customer support 24/7.

Step 2: Integrate OpenAI for hate speech detection

Understanding OpenAI language model for hate speech detection

OpenAI’s moderation endpoint is a tool that automatically detects potentially harmful content, including hate speech, in text data. It leverages sophisticated language models trained on extensive datasets to identify various categories of harmful content. For each input, it provides a real-time analysis, granular feedback showing which categories of harmful content were detected, and a confidence score for each category.

Here is a quick demo to show you how the endpoint works:

				
const axios = require('axios');
async function moderateContent(text) {
  const apiKey = 'YOUR_OPENAI_API_KEY';
  try {
    const response = await axios.post('https://api.openai.com/v1/moderations', {
      input: text
    }, {
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      }
    });
    const result = response.data.results[0];
    console.log(`Text: "${text}"`);
    console.log(`Hate Speech: ${result.category_scores.hate > 0.5}`);
    console.log(`Categories: ${JSON.stringify(result.categories, null, 2)}`);
    console.log(`Scores: ${JSON.stringify(result.category_scores, null, 2)}\n`);
  } catch (error) {
    console.error('Error moderating content:', error);
  }
}
// Example text to analyze
const exampleText = "This is an example of harmful content.";
moderateContent(exampleText);
				
			

Here is the output:

The script detected Hate Speech as true and categorized the text and scores.
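The demo compares only the hate score against 0.5. As a rough sketch of how you might read the rest of the response, the helper below lists every category whose score meets a threshold. The function name categoriesAboveThreshold, the 0.5 default, and the sample scores are illustrative assumptions, not values returned by OpenAI.

```javascript
// Hypothetical helper: list every category whose score meets a threshold.
// `result` is one entry of the `results` array in the moderation response.
function categoriesAboveThreshold(result, threshold = 0.5) {
  return Object.entries(result.category_scores)
    .filter(([, score]) => score >= threshold)
    .sort(([, a], [, b]) => b - a) // highest score first
    .map(([category, score]) => ({ category, score }));
}

// Made-up scores for illustration only; real values come from the API.
const sampleResult = {
  flagged: true,
  categories: { hate: true, harassment: true, violence: false },
  category_scores: { hate: 0.91, harassment: 0.62, violence: 0.03 },
};

console.log(categoriesAboveThreshold(sampleResult));
```

The right threshold depends on how many false positives you can tolerate, so treat 0.5 as a starting point rather than a recommendation.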

Setting up OpenAI integration

You need an API key in order to use the OpenAI moderation endpoint. Follow these steps to get your key:

  • Go to the OpenAI developer platform
  • If you do not have an account, click “Sign Up” and follow the instructions to create one. If you already have an account, click “Log In” and enter your credentials.
  • After logging in, click on “Dashboard” to access the API

Generate an API key: In the Dashboard, click on the side menu item labeled “API keys.” Then, click on “Create new secret key.”

Copy the API key: Once the key is generated, copy and store it securely. This key will be used to authenticate your requests to the OpenAI API.

Note: If you are a first-time user, you may have access to free credits. Otherwise, you may need to buy credits to use the OpenAI API.

Step 3: Detect hate speech on Instagram with The Social Proxy and OpenAI

To detect hate speech on Instagram, we’ll combine data from Instagram provided by The Social Proxy Scraper API with OpenAI’s moderation endpoint, which will scan the data for hate speech.

Installation and configuration

Install Node.js: To check if you already have Node.js installed on your computer, run the command below:

				
node -v
				
			

If you don’t have Node.js installed, you can download it here.

Create a project folder and open the folder with the code editor of your choice.
Initialize a new Node.js project in the folder by running the command:

				
npm init -y
				
			

Inside the folder, install the following dependencies using the command below:

				
npm install openai axios request
				
			

Now, let’s test the various scenarios.

Note: In the code examples, always remember to replace {CONSUMER_KEY} and {CONSUMER_SECRET} with your actual keys from The Social Proxy Scraper API and replace YOUR_OPENAI_API_KEY with your actual OpenAI API key.
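If you would rather not paste keys directly into the source files, one option is to read them from environment variables. The sketch below assumes three variable names of our own choosing (TSP_CONSUMER_KEY, TSP_CONSUMER_SECRET, and OPENAI_API_KEY); neither The Social Proxy nor OpenAI requires these exact names.

```javascript
// Hypothetical helper: read credentials from environment variables.
// The variable names are assumptions made for this sketch.
function loadCredentials(env = process.env) {
  const required = ['TSP_CONSUMER_KEY', 'TSP_CONSUMER_SECRET', 'OPENAI_API_KEY'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    consumerKey: env.TSP_CONSUMER_KEY,
    consumerSecret: env.TSP_CONSUMER_SECRET,
    openaiApiKey: env.OPENAI_API_KEY,
  };
}

// Usage (after exporting the three variables in your shell):
// const { consumerKey, consumerSecret, openaiApiKey } = loadCredentials();
```

Failing early with a list of the missing names tends to be easier to debug than an authentication error from the API later on.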

Example detection scenarios

Scenario 1: Detecting hate speech in Instagram post comments

In this scenario, we will analyze the comments made on a particular Instagram post. The Scraper API will fetch the comments made on the post, and then only the comments that contain at least one keyword from a predefined list will be sent to OpenAI’s moderation endpoint for analysis.

For starters, we have to get the data from the post. To do so, we have to get its mediaID. For this example, we’ll use a post by Marques Brownlee, a tech YouTuber. To get the mediaID, do the following:

  1. Inspect the page: Right-click on any element on the page and select “Inspect” from the context menu. This will open the Developer Tools.
  2. Locate the mediaID: Press Control + F (Windows) or CMD + F (MacOS) to open a search bar, then look for “id”:
    Search through the results to get the mediaID of the post and copy it.

After identifying the mediaID, create a comments.js file in your project folder. Inside comments.js, implement the code to use Scraper API and OpenAI to retrieve and analyze comments for the post.

				
const request = require('request');
const axios = require('axios');

// Keywords to filter comments
const keywords = [
  'kill',
  'murder',
  'sick',
  'shit',
  'cybertruck',
];

var options = {
  method: 'GET',
  url: 'https://thesocialproxy.com/wp-json/tsp/instagram/v1/media/comments?consumer_key={CONSUMER_KEY}&consumer_secret={CONSUMER_SECRET}&mediaId=3423597702078367970_28943446',
  headers: {
    'Content-Type': 'application/json',
  },
};

request(options, async function (error, response) {
  if (error) throw new Error(error);

  // Parse the JSON response
  const responseData = JSON.parse(response.body);

  // Check if the response contains comments
  if (responseData.data && responseData.data.comments) {
    // Extract username and text for each comment
    const comments = responseData.data.comments.map((comment) => ({
      username: comment.user.username,
      text: comment.text,
    }));

    // Filter comments based on keywords
    const filteredComments = comments.filter((comment) => {
      return keywords.some((keyword) =>
        comment.text.toLowerCase().includes(keyword)
      );
    });

    // Set up OpenAI API
    const openaiApiKey = 'YOUR_OPENAI_API_KEY';

    // Analyze each filtered comment for hate speech
    for (let comment of filteredComments) {
      try {
        const response = await axios.post(
          'https://api.openai.com/v1/moderations',
          {
            input: comment.text,
          },
          {
            headers: {
              Authorization: `Bearer ${openaiApiKey}`,
              'Content-Type': 'application/json',
            },
          }
        );

        const result = response.data.results[0];
        console.log(`Comment: "${comment.text}"`);
        console.log(`Hate Speech: ${result.category_scores.hate > 0.5}`);
        console.log(
          `Categories: ${JSON.stringify(result.categories, null, 2)}`
        );
        console.log(
          `Scores: ${JSON.stringify(result.category_scores, null, 2)}\n`
        );
      } catch (error) {
        console.error(`Error analyzing comment: "${comment.text}"`, error);
      }
    }
  } else {
    console.log('No comments found in the response');
  }
});

				
			

Here is a breakdown of what the code does:

  • Defines keywords for filtering: Instead of analyzing all of the comments, it analyzes those that contain keywords in the array.
  • Configures API request options: The options object is set up to make a GET request to The Social Proxy’s API endpoint, which fetches comments for a specific Instagram media post.
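  • Fetches comments: The request function fetches the comments. The response is parsed, and each comment’s username and text are extracted.
  • Filters comments: The comments are filtered against the defined keywords, and the matches are stored in the filteredComments array.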
  • Sets up OpenAI API: An API key for OpenAI is defined before analyzing the filtered comments. This key will authenticate requests to OpenAI’s moderation endpoint.
  • Analyzes comments for hate speech: An API request is made to OpenAI’s moderation endpoint for every filtered comment. The response is evaluated to determine if the comment contains hate speech. The results, including categories and scores, are logged to the console.
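The keyword filter in comments.js uses a substring check, so a keyword such as “kill” also matches “skill.” If that produces too many irrelevant matches, a whole-word check is one alternative. The sketch below is an assumption about how you might do it; buildKeywordMatcher and escapeRegExp are hypothetical helper names, and the \b word boundary only recognizes ASCII letters, digits, and underscores.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: returns a function that tests whether a text
// contains any keyword as a whole word, ignoring case.
function buildKeywordMatcher(keywords) {
  if (keywords.length === 0) {
    return () => false; // nothing to match against
  }
  const pattern = keywords.map(escapeRegExp).join('|');
  const regex = new RegExp(`\\b(?:${pattern})\\b`, 'i');
  return (text) => regex.test(text);
}

const matchesKeyword = buildKeywordMatcher(['kill', 'murder', 'sick']);
console.log(matchesKeyword('That skill is impressive')); // false
console.log(matchesKeyword('I will KILL it')); // true
```

To try it in comments.js, you could replace the keywords.some(...) call inside the filter with matchesKeyword(comment.text).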

Here is the output:

Based on the results, none of the comments with the keywords above contain hate speech.
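The script sends one request per comment. The moderation endpoint’s input field also accepts an array of strings, so several comments can go out in a single request. The sketch below shows only the data handling around such a call, assuming the results come back in the same order as the inputs. The helper names and the sample data are made up for illustration, and you should check OpenAI’s current limits before sending large batches.

```javascript
// Build one request body for many comments. The endpoint's `input` field
// accepts either a single string or an array of strings.
function buildBatchBody(comments) {
  return { input: comments.map((comment) => comment.text) };
}

// Pair each comment with the result at the same index in `results`
// (assumes results are returned in input order).
function pairWithResults(comments, results, threshold = 0.5) {
  if (comments.length !== results.length) {
    throw new Error('Expected one moderation result per comment');
  }
  return comments.map((comment, index) => {
    const hateScore = results[index].category_scores.hate;
    return { ...comment, hateScore, isHate: hateScore > threshold };
  });
}

// Made-up comments and scores, for illustration only.
const sampleComments = [
  { username: 'user_a', text: 'This phone is sick!' },
  { username: 'user_b', text: 'The camera would kill for this sensor.' },
];
const sampleResults = [
  { category_scores: { hate: 0.001 } },
  { category_scores: { hate: 0.002 } },
];

console.log(buildBatchBody(sampleComments));
console.log(pairWithResults(sampleComments, sampleResults));
```

In comments.js, buildBatchBody(filteredComments) would become the body of the axios.post call, and response.data.results would be passed to pairWithResults.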

Scenario 2: Detecting hate speech from a user's posts

In this scenario, we’ll use the Scraper API to fetch the image posts from a specific user’s Instagram feed, using only the user’s username. Next, we’ll filter the posts by caption keywords and send the matching posts to OpenAI’s moderation API to check for hate speech.

Once you have the username, create a userposts.js file in your project folder to implement the code and retrieve and analyze all user posts using the Scraper API and OpenAI.

				
const request = require('request');
const axios = require('axios');

// Keywords to filter captions
const keywords = ['bad', 'pain', 'forbidden', 'sex', 'fuck', 'terrorist', 'kill', 'murder', 'assassinate'];

var options = {
  method: 'GET',
  url: 'https://thesocialproxy.com/wp-json/tsp/instagram/v1/profiles/feed?consumer_key={CONSUMER_KEY}&consumer_secret={CONSUMER_SECRET}&username=samharrisorg',
  headers: {
    'Content-Type': 'application/json',
  },
};

request(options, async function (error, response) {
  if (error) throw new Error(error);
  try {
    const jsonResponse = JSON.parse(response.body);
    const imageDetails = extractImageDetails(jsonResponse);

    // Set up OpenAI API
    const openaiApiKey = 'YOUR_OPENAI_API_KEY';

    // Analyze each filtered caption for hate speech
    for (let detail of imageDetails) {
      const { url, caption } = detail;

      if (
        caption &&
        keywords.some((keyword) => caption.toLowerCase().includes(keyword))
      ) {
        try {
          const response = await axios.post(
            'https://api.openai.com/v1/moderations',
            {
              // The request field must be lowercase `input`; the URL is sent as a text string
              input: url,
            },
            {
              headers: {
                Authorization: `Bearer ${openaiApiKey}`,
                'Content-Type': 'application/json',
              },
            }
          );

          const result = response.data.results[0];
          console.log(`Image URL: "${url}"`);
          console.log(`Caption: "${caption}"`);
          console.log(`Hate Speech: ${result.category_scores.hate > 0.5}`);
          console.log(
            `Categories: ${JSON.stringify(result.categories, null, 2)}`
          );
          console.log(
            `Scores: ${JSON.stringify(result.category_scores, null, 2)}\n`
          );
        } catch (error) {
          console.error(`Error analyzing caption: "${caption}"`, error);
        }
      }
    }
  } catch (parseError) {
    console.error('Error parsing JSON:', parseError);
    console.log('Raw response:', response.body);
  }
});

function extractImageDetails(data) {
  const details = [];
  if (data.data && data.data[0] && data.data[0].items) {
    data.data[0].items.forEach((item) => {
      if (
        item.media_type === 1 &&
        item.image_versions2 &&
        item.image_versions2.candidates
      ) {
        // Get the URL of the first (usually highest quality) image
        const imageUrl = item.image_versions2.candidates[0].url;
        const caption = item.caption ? item.caption.text : '';
        details.push({ url: imageUrl, caption });
      }
    });
  }
  return details;
}


				
			

In The Social Proxy’s API endpoint URL, replace samharrisorg with the username of your choice.
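If you change the username often, building the URL from parameters can be less error-prone than editing the string by hand. The sketch below uses Node’s built-in URL and URLSearchParams classes with the feed endpoint and query parameter names from the code above. The function name buildFeedUrl and the example key values are hypothetical.

```javascript
// Hypothetical helper: build the feed endpoint URL for a given username.
// URLSearchParams takes care of encoding special characters.
function buildFeedUrl({ consumerKey, consumerSecret, username }) {
  const url = new URL(
    'https://thesocialproxy.com/wp-json/tsp/instagram/v1/profiles/feed'
  );
  url.search = new URLSearchParams({
    consumer_key: consumerKey,
    consumer_secret: consumerSecret,
    username,
  }).toString();
  return url.toString();
}

// Placeholder credentials, for illustration only.
console.log(
  buildFeedUrl({
    consumerKey: 'ck_example',
    consumerSecret: 'cs_example',
    username: 'samharrisorg',
  })
);
```

The returned string can be assigned to the url property of the options object in userposts.js.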

Here is a breakdown of what the code does:

  • Defines keywords for filtering: Sets up an array of specific keywords to filter Instagram post captions. Only images with captions containing these keywords will be analyzed.
  • Sets up request options: Configures an options object to make a GET request to The Social Proxy’s API endpoint, fetching Instagram feed data for a specific user.
  • Makes the request: The request function fetches the Instagram feed data. The response is parsed, and relevant image details (URLs and captions) are extracted.
  • Sets up OpenAI API: Defines an API key for OpenAI to authenticate requests to OpenAI’s moderation endpoint.
  • Analyzes posts for hate speech: For each image, the code checks whether the caption contains any of the defined keywords. If it does, a request is made to OpenAI’s moderation endpoint with the image URL as the input. The hate score in the response is compared against a 0.5 threshold, and the results, including categories and scores, are logged to the console.
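When a URL is passed as a plain string, the moderation endpoint treats it as text. To have the caption and the image itself examined, OpenAI documents a multimodal input format for its omni-moderation-latest model. The sketch below only builds such a request body; buildMultimodalBody is a hypothetical helper name, and the example caption and URL are placeholders. OpenAI’s documentation lists which categories support image input, so check it before relying on image scores for the hate category.

```javascript
// Hypothetical helper: build a moderation request body that carries both
// the caption (as text) and the image (as an image URL).
// Assumes the `omni-moderation-latest` model and its multimodal input format.
function buildMultimodalBody(caption, imageUrl) {
  const input = [];
  if (caption) {
    input.push({ type: 'text', text: caption });
  }
  if (imageUrl) {
    input.push({ type: 'image_url', image_url: { url: imageUrl } });
  }
  return { model: 'omni-moderation-latest', input };
}

// Placeholder values, for illustration only.
const exampleBody = buildMultimodalBody(
  'Example caption',
  'https://example.com/image.jpg'
);
console.log(JSON.stringify(exampleBody, null, 2));
```

In userposts.js, this body would replace the object passed as the second argument to axios.post. The image URL has to be reachable by OpenAI’s servers for the image part to be analyzed.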

Scenario 3: Detecting hate speech via user followers’ profile pictures

In this scenario, we will extract a user’s followers and their usernames using the Scraper API and analyze their profile pictures for any sign of hate speech with the OpenAI moderation API.

Create a followers.js file in your project folder to implement the code that retrieves a user’s followers and analyzes their profile pictures using the Scraper API and OpenAI.

				
const request = require('request');
const axios = require('axios');

const options = {
  method: 'GET',
  url: 'https://thesocialproxy.com/wp-json/tsp/instagram/v1/profiles/followers?consumer_key={CONSUMER_KEY}&consumer_secret={CONSUMER_SECRET}&username={USERNAME}&number_of_results=25',
  headers: {
    'Content-Type': 'application/json',
  },
};

request(options, async function (error, response) {
  if (error) throw new Error(error);

  try {
    const jsonResponse = JSON.parse(response.body);
    const users = jsonResponse.data.users;
    const extractedData = users.map((user) => ({
      username: user.username,
      profile_pic_url: user.profile_pic_url,
    }));

    for (const user of extractedData) {
      const result = await analyzeProfilePic(user.profile_pic_url);
      console.log(
        `Username: ${user.username}, Hate Speech Detected: ${result.hate_speech}`
      );
    }
  } catch (parseError) {
    console.error('Error parsing JSON:', parseError);
    console.log('Raw response:', response.body);
  }
});

async function analyzeProfilePic(profilePicUrl) {
  const apiKey = 'YOUR_OPENAI_API_KEY';

  try {
    const response = await axios.post(
      'https://api.openai.com/v1/moderations',
      {
        input: profilePicUrl,
      },
      {
        headers: {
          Authorization: `Bearer ${apiKey}`,
          'Content-Type': 'application/json',
        },
      }
    );

    const result = response.data;
    return {
      // The moderation response names this category `hate`
      hate_speech: result.results.some((r) => r.categories.hate),
    };
  } catch (error) {
    console.error('Error analyzing profile picture:', error);
    return {
      hate_speech: false,
    };
  }
}


				
			

In The Social Proxy’s API endpoint URL, replace {USERNAME} with the username of your choice.

Here is a breakdown of what the code does:

  • Sets up request options: An options object is configured to make a GET request to The Social Proxy’s API endpoint, fetching a list of Instagram followers for a specific user. The number_of_results parameter specifies the number of followers to get.
  • Makes the request: The request function fetches the list of Instagram followers. If an error occurs, it throws an error. Otherwise, the response is parsed, extracting relevant user data (usernames and profile picture URLs).
  • Analyzes profile pictures for hate speech: For each extracted user, the analyzeProfilePic function is called with the profile picture URL, and the result is logged to the console next to the username.
  • Sets up OpenAI API: An API key for OpenAI is defined to authenticate requests to OpenAI’s moderation endpoint.
  • Sends the moderation request: The analyzeProfilePic function sends the profile picture URL to OpenAI’s moderation endpoint and evaluates the response to determine whether hate speech was flagged.
  • Handles JSON parsing errors: Errors raised while parsing the JSON response are caught and logged.
  • Returns hate speech analysis result: The analyzeProfilePic function returns whether or not hate speech was detected. If the request fails, it logs the error and returns false.
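Because analyzeProfilePic returns false when a request fails, a failed check looks the same as a clean one in the console. One way to keep the two apart is to record an explicit status for each follower. The sketch below is an assumption about how that could look; classifyOutcome, tallyOutcomes, the three status labels, and the sample data are all hypothetical.

```javascript
// Hypothetical helper: turn a moderation response (or an error) into
// one of three statuses: 'flagged', 'clean', or 'error'.
function classifyOutcome(responseData, error = null) {
  if (error) {
    return { status: 'error', reason: error.message };
  }
  const flagged = responseData.results.some((r) => r.categories.hate);
  return { status: flagged ? 'flagged' : 'clean' };
}

// Count how many outcomes fall under each status.
function tallyOutcomes(outcomes) {
  const counts = { flagged: 0, clean: 0, error: 0 };
  for (const outcome of outcomes) {
    counts[outcome.status] += 1;
  }
  return counts;
}

// Made-up responses, for illustration only.
const sampleOutcomes = [
  classifyOutcome({ results: [{ categories: { hate: false } }] }),
  classifyOutcome({ results: [{ categories: { hate: true } }] }),
  classifyOutcome(null, new Error('Request failed with status code 429')),
];
console.log(tallyOutcomes(sampleOutcomes));
```

In followers.js, classifyOutcome(response.data) could be returned from the try block of analyzeProfilePic and classifyOutcome(null, error) from the catch block.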

Conclusion

This article demonstrates how to effectively use The Social Proxy and OpenAI moderation endpoint to detect hate speech on Instagram. By following these steps, you can easily implement a hate speech detection system into any project.

The Social Proxy facilitates the process of getting valuable data from social media platforms and is valuable to anyone interested in gaining more insights from public social media data.
