Freedom of speech is one of the pillars of modern society. It allows us to openly express ourselves and our feelings without penalty, regardless of what they are. Thanks to the internet, this freedom has reached new heights. Unfortunately, freedom of speech on the web comes at a high cost: hate speech and offensive content on social media platforms.
When it comes to harmful speech on social media, platforms like Instagram impose little to no friction to prevent someone from posting something harmful. People can seamlessly share whatever they please without ever stopping in their tracks to consider the consequences. Hate speech poses a threat to individuals and society as a whole, so it’s no surprise that it’s become a major concern.
With millions of posts and comments made daily on Instagram, manual detection combined with Instagram’s algorithm still leaves a large amount of harmful content undetected. Luckily, the right tools can help identify hate speech, protecting users and maintaining community standards.
In this tutorial, we’ll walk through the step-by-step process of detecting hate speech on Instagram using the OpenAI API and The Social Proxy, a web scraping and proxy provider.
According to the United Nations, hate speech is an offensive discourse targeting a group or an individual based on inherent characteristics (such as race, religion, or gender) that may threaten social peace. It often includes slurs, derogatory remarks, and statements that incite violence or hatred. Hate speech not only promotes social inequality, but victims of it may suffer psychological harm. It can stir up an environment of fear and marginalization, particularly for historically oppressed communities, undermining their dignity and social standing in society.
Hate speech on Instagram is a pressing concern. According to Taylor Lorenz, a reporter for The Atlantic, “[Instagram is] likely where the next great battle against misinformation will be fought, and yet it has largely escaped scrutiny.” With over a billion users and counting, monitoring the platform presents a huge challenge. In an attempt to create a safer environment for all users, Instagram has implemented automated content moderation and user reporting systems to combat hate speech. Even so, more than a handful of posts and comments remain unaddressed for hours or never get removed at all.
Unlike traditional mediums, online platforms have the capacity to reach a large and diverse audience in a matter of minutes. In other words, people can produce and share hate speech with minimal effort, at a low cost, and anonymously. Research indicates that unchecked hate speech can foster a hostile environment, contributing to mental health issues among targeted individuals and perpetuating societal divisions. Studies have shown that exposure to hate speech correlates with increased anxiety and depression among victims, making it critical to implement effective detection mechanisms to mitigate negative consequences and promote a safer online community.
The Social Proxy is a leading mobile proxy provider that offers the fastest mobile proxies on the market. It’s designed to provide users with high-quality, reliable mobile IP addresses for various online activities, particularly social media automation. Key features include an endless stream of mobile IPs, high anonymity, and IP rotation support, which help users avoid detection and bans from social media platforms. The Social Proxy also has a developer-friendly Scraper API and tools that retrieve data from popular social media platforms in an ethical manner, without the risk of being flagged as a bot. The API and the user dashboard make setup and operation easy, which is ideal for developers who want to automate tasks quickly and securely.
OpenAI is an artificial intelligence research company that develops advanced language models, such as GPT-3.5, GPT-4, and GPT-4o. These models can be applied to a range of natural language processing tasks, including hate speech detection. OpenAI also provides a dedicated moderation endpoint that checks whether text is potentially harmful. Developers can use this endpoint in social media tools and content moderation systems to flag hate speech automatically, before it causes harm.
In this section, we’ll review how to detect hate speech on Instagram.
Follow this step-by-step guide to set up The Social Proxy and use it to retrieve data from Instagram:
Click on the account verification link sent to your email from The Social Proxy.
Access your dashboard on The Social Proxy and click on “Buy Proxy” to select a plan.
Choose a plan: In the buy proxies page, select “Scraper API,” choose your subscription type, and click “Checkout.”
Provide payment details: Fill out your payment information and click “Sign up now.” Once you’ve signed up, you can proceed to use the Scraper API.
Generate your Scraper API keys: You need to generate your keys before you can start making API calls to the Scraper API. In the side menu, click “Scraper API” and select “Scraper API.”
Click on “Generate API KEY”.
Copy your credentials: Copy your Consumer Key and Consumer Secret – you will need them in your code.
Note: If you encounter any issues, you can contact The Social Proxy for customer support 24/7.
OpenAI’s moderation endpoint automatically detects potentially harmful content, including hate speech, in text data. It relies on language models trained on extensive datasets to identify several categories of harmful content. It returns results in real time, and for each input it provides an overall flag, a per-category breakdown of the harmful content detected, and a confidence score for each category.
Here is a quick demo to show you how the endpoint works:
const axios = require('axios');

async function moderateContent(text) {
  const apiKey = 'YOUR_OPENAI_API_KEY';
  try {
    // Send the text to the moderation endpoint
    const response = await axios.post(
      'https://api.openai.com/v1/moderations',
      { input: text },
      {
        headers: {
          Authorization: `Bearer ${apiKey}`,
          'Content-Type': 'application/json',
        },
      }
    );
    // A single string input yields a single entry in results
    const result = response.data.results[0];
    console.log(`Text: "${text}"`);
    console.log(`Hate Speech: ${result.category_scores.hate > 0.5}`);
    console.log(`Categories: ${JSON.stringify(result.categories, null, 2)}`);
    console.log(`Scores: ${JSON.stringify(result.category_scores, null, 2)}\n`);
  } catch (error) {
    console.error('Error moderating content:', error);
  }
}

// Example text to analyze
const exampleText = "This is an example of harmful content.";
moderateContent(exampleText);
Here is the output:
The script detected Hate Speech as true and categorized the text and scores.
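The 0.5 cutoff used in the demo is this tutorial’s own choice rather than something the API requires. As a minimal sketch, the helper below condenses one entry of the results array into a short summary. The function name summarizeModeration, its threshold parameter, and the sample object are illustrative: the sample is hand-written in the shape of a moderation result and is not real API output.

```javascript
// Summarize one entry of the moderation response's `results` array.
// `threshold` is a hypothetical tuning knob, not an OpenAI parameter.
function summarizeModeration(result, threshold = 0.5) {
  const scores = result.category_scores || {};
  // Collect every category whose score exceeds the threshold, highest first.
  const overThreshold = Object.entries(scores)
    .filter(([, score]) => score > threshold)
    .sort((a, b) => b[1] - a[1])
    .map(([category]) => category);
  return {
    flagged: Boolean(result.flagged), // the API's own overall verdict
    hate: (scores.hate || 0) > threshold, // same check the demo prints
    overThreshold,
  };
}

// Hand-written sample in the shape of a moderation result (not real output).
const sampleResult = {
  flagged: true,
  categories: { hate: true, harassment: true, violence: false },
  category_scores: { hate: 0.91, harassment: 0.64, violence: 0.12 },
};

console.log(summarizeModeration(sampleResult));
```

Keeping the threshold as a parameter makes it easy to experiment: a lower value catches more borderline text at the price of more false positives.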
You need an API key in order to use the OpenAI moderation endpoint. Follow these steps to get your key:
Generate an API key: In the Dashboard, click on the side menu item labeled “API keys.” Then, click on “Create new secret key.”
Copy the API key: Once the key is generated, copy and store it securely. This key will be used to authenticate your requests to the OpenAI API.
Note: If you are a first-time user, you may have access to free credits. Otherwise, you may need to buy credits to use the OpenAI API.
To detect hate speech on Instagram, we’ll combine data from Instagram provided by The Social Proxy Scraper API with OpenAI’s moderation endpoint, which will scan the data for hate speech.
Install Node.js: To check if you already have Node.js installed on your computer, run the command below:
node -v
If you don’t have Node.js installed, you can download it from the official Node.js website (nodejs.org).
Create a project folder and open the folder with the code editor of your choice.
Initialize a new Node.js project in the folder by running the command:
npm init -y
Inside the folder, install the following dependencies using the command below:
npm install openai axios request
Now, let’s test the various scenarios.
Note: In the code examples, always remember to replace {CONSUMER_KEY} and {CONSUMER_SECRET} with your actual keys from The Social Proxy Scraper API and replace YOUR_OPENAI_API_KEY with your actual OpenAI API key.
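Pasting keys directly into source files makes it easy to leak them by accident. One common alternative, sketched below, is to read them from environment variables. The helper name loadKeys and the three variable names are hypothetical choices for this sketch; neither The Social Proxy nor OpenAI requires these particular names.

```javascript
// Read the three credentials from environment variables instead of source code.
// The variable names below are hypothetical; pick any names you like.
function loadKeys(env = process.env) {
  const required = ['TSP_CONSUMER_KEY', 'TSP_CONSUMER_SECRET', 'OPENAI_API_KEY'];
  // Fail early with a clear message if anything is missing.
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    consumerKey: env.TSP_CONSUMER_KEY,
    consumerSecret: env.TSP_CONSUMER_SECRET,
    openaiApiKey: env.OPENAI_API_KEY,
  };
}

// Usage inside a script (after exporting the variables in your shell):
// const { consumerKey, consumerSecret, openaiApiKey } = loadKeys();
```

The function accepts the environment as a parameter so it can be exercised with a plain object, without touching the real process environment.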
In this scenario, we will analyze the comments made on a particular Instagram post. The Scraper API will fetch all the comments made on the post and then send only the comments that contain keywords to OpenAI’s moderation endpoint for analysis.
For starters, we have to get the data from the post. To do so, we have to get its mediaID. For this example, we’ll use a post by Marques Brownlee, a tech YouTuber. To get the mediaID, do the following:
Inspect the page: Right-click on any element on the page and select “Inspect” from the context menu. This will open the Developer Tools.
Locate the mediaID: Press Control + F (Windows) or Command + F (macOS) to open a search bar, then look for “id”:
Search through the results to get the mediaID of the post and copy it.
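The copied mediaID ends up in the mediaId query parameter of the Scraper API request. As a small sketch, the URL can be assembled with Node’s built-in URL class so that each value is percent-encoded automatically. The helper name buildCommentsUrl and the ck_example and cs_example placeholder keys are hypothetical; the endpoint path and parameter names are the ones used in this tutorial’s comments example.

```javascript
// Base URL taken from the comments example in this tutorial.
const COMMENTS_ENDPOINT =
  'https://thesocialproxy.com/wp-json/tsp/instagram/v1/media/comments';

// buildCommentsUrl is a hypothetical helper, not part of the Scraper API.
function buildCommentsUrl(consumerKey, consumerSecret, mediaId) {
  const url = new URL(COMMENTS_ENDPOINT);
  // searchParams.set percent-encodes values, so special characters stay safe.
  url.searchParams.set('consumer_key', consumerKey);
  url.searchParams.set('consumer_secret', consumerSecret);
  url.searchParams.set('mediaId', mediaId);
  return url.toString();
}

// Placeholder keys; the mediaId is the one used in this tutorial.
console.log(
  buildCommentsUrl('ck_example', 'cs_example', '3423597702078367970_28943446')
);
```

Building the URL this way avoids broken requests when a key or ID contains characters such as & or spaces.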
After identifying the mediaID, create a comments.js file in your project folder. Inside comments.js, implement the code to use Scraper API and OpenAI to retrieve and analyze comments for the post.
const request = require('request');
const axios = require('axios');

// Keywords to filter comments
const keywords = [
  'kill',
  'murder',
  'sick',
  'shit',
  'cybertruck',
];

const options = {
  method: 'GET',
  url: 'https://thesocialproxy.com/wp-json/tsp/instagram/v1/media/comments?consumer_key={CONSUMER_KEY}&consumer_secret={CONSUMER_SECRET}&mediaId=3423597702078367970_28943446',
  headers: {
    'Content-Type': 'application/json',
  },
};

request(options, async function (error, response) {
  if (error) throw new Error(error);

  // Parse the JSON response
  const responseData = JSON.parse(response.body);

  // Check if the response contains comments
  if (responseData.data && responseData.data.comments) {
    // Extract username and text for each comment
    const comments = responseData.data.comments.map((comment) => ({
      username: comment.user.username,
      text: comment.text,
    }));

    // Filter comments based on keywords
    const filteredComments = comments.filter((comment) => {
      return keywords.some((keyword) =>
        comment.text.toLowerCase().includes(keyword)
      );
    });

    // Set up OpenAI API
    const openaiApiKey = 'YOUR_OPENAI_API_KEY';

    // Analyze each filtered comment for hate speech
    for (const comment of filteredComments) {
      try {
        const response = await axios.post(
          'https://api.openai.com/v1/moderations',
          {
            input: comment.text,
          },
          {
            headers: {
              Authorization: `Bearer ${openaiApiKey}`,
              'Content-Type': 'application/json',
            },
          }
        );
        const result = response.data.results[0];
        console.log(`Comment: "${comment.text}"`);
        console.log(`Hate Speech: ${result.category_scores.hate > 0.5}`);
        console.log(
          `Categories: ${JSON.stringify(result.categories, null, 2)}`
        );
        console.log(
          `Scores: ${JSON.stringify(result.category_scores, null, 2)}\n`
        );
      } catch (error) {
        console.error(`Error analyzing comment: "${comment.text}"`, error);
      }
    }
  } else {
    console.log('No comments found in the response');
  }
});
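Here is a breakdown of what the code does:
Fetch the comments: The script sends a GET request to the Scraper API’s media/comments endpoint for the given mediaId.
Extract the fields: It keeps the username and text of each comment in the response.
Filter by keyword: It keeps only the comments whose lowercase text contains at least one of the keywords.
Moderate the matches: It sends each remaining comment to OpenAI’s moderation endpoint and prints whether the hate score exceeds 0.5, along with the full categories and scores.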
Here is the output:
Based on the results, none of the comments with the keywords above contain hate speech.
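One thing to keep in mind about the keyword filter: includes() matches substrings, so the keyword “kill” also matches “skill.” A possible refinement, sketched below, is to match keywords as whole words with a regular expression. The helper names escapeRegExp and buildKeywordMatcher are hypothetical and are not part of either API.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a function that tests whether any keyword appears as a whole word.
function buildKeywordMatcher(keywords) {
  if (keywords.length === 0) {
    return () => false; // nothing to match against
  }
  const alternatives = keywords.map(escapeRegExp).join('|');
  // \b marks a word boundary; the "i" flag makes the match case-insensitive.
  const pattern = new RegExp(`\\b(?:${alternatives})\\b`, 'i');
  return (text) => pattern.test(text);
}

const matchesKeyword = buildKeywordMatcher(['kill', 'sick']);
console.log(matchesKeyword('That trick takes real skill')); // "kill" only inside "skill"
console.log(matchesKeyword('I will KILL it on stage'));     // whole word, any case
```

The trade-off is that whole-word matching also skips inflected forms such as “kills,” so the keyword list may need those variants added explicitly.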
In this scenario, we’ll use the Scraper API to fetch the image posts of a specific Instagram user, using only their username. We’ll then use OpenAI’s moderation API to analyze each post for hate speech.
Once you have the username, create a userposts.js file in your project folder to implement the code and retrieve and analyze all user posts using the Scraper API and OpenAI.
const request = require('request');
const axios = require('axios');

// Keywords to filter captions
const keywords = ['bad', 'pain', 'forbidden', 'sex', 'fuck', 'terrorist', 'kill', 'murder', 'assassinate'];

const options = {
  method: 'GET',
  url: 'https://thesocialproxy.com/wp-json/tsp/instagram/v1/profiles/feed?consumer_key={CONSUMER_KEY}&consumer_secret={CONSUMER_SECRET}&username=samharrisorg',
  headers: {
    'Content-Type': 'application/json',
  },
};

request(options, async function (error, response) {
  if (error) throw new Error(error);
  try {
    const jsonResponse = JSON.parse(response.body);
    const imageDetails = extractImageDetails(jsonResponse);

    // Set up OpenAI API
    const openaiApiKey = 'YOUR_OPENAI_API_KEY';

    // Analyze each filtered caption for hate speech
    for (const detail of imageDetails) {
      const { url, caption } = detail;
      if (
        caption &&
        keywords.some((keyword) => caption.toLowerCase().includes(keyword))
      ) {
        try {
          const response = await axios.post(
            'https://api.openai.com/v1/moderations',
            {
              // The request field is lowercase "input". Send the caption text:
              // a URL sent as a string would be scored as plain text.
              input: caption,
            },
            {
              headers: {
                Authorization: `Bearer ${openaiApiKey}`,
                'Content-Type': 'application/json',
              },
            }
          );
          const result = response.data.results[0];
          console.log(`Image URL: "${url}"`);
          console.log(`Caption: "${caption}"`);
          console.log(`Hate Speech: ${result.category_scores.hate > 0.5}`);
          console.log(
            `Categories: ${JSON.stringify(result.categories, null, 2)}`
          );
          console.log(
            `Scores: ${JSON.stringify(result.category_scores, null, 2)}\n`
          );
        } catch (error) {
          console.error(`Error analyzing caption: "${caption}"`, error);
        }
      }
    }
  } catch (parseError) {
    console.error('Error parsing JSON:', parseError);
    console.log('Raw response:', response.body);
  }
});

// Collect the image URL and caption of every photo post in the feed
function extractImageDetails(data) {
  const details = [];
  if (data.data && data.data[0] && data.data[0].items) {
    data.data[0].items.forEach((item) => {
      if (
        item.media_type === 1 &&
        item.image_versions2 &&
        item.image_versions2.candidates
      ) {
        // Get the URL of the first (usually highest quality) image
        const imageUrl = item.image_versions2.candidates[0].url;
        const caption = item.caption ? item.caption.text : '';
        details.push({ url: imageUrl, caption });
      }
    });
  }
  return details;
}
In The Social Proxy’s API endpoint URL, replace samharrisorg with the username you want to analyze.
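Here is a breakdown of what the code does:
Fetch the feed: The script sends a GET request to the Scraper API’s profiles/feed endpoint for the given username.
Extract image details: The extractImageDetails function walks through the returned items, keeps the photo posts (media_type 1), and collects each post’s first image URL and caption.
Filter by keyword: It keeps only the posts whose caption contains at least one of the keywords.
Moderate the matches: It sends each remaining post to OpenAI’s moderation endpoint and prints the image URL, the caption, whether the hate score exceeds 0.5, and the full categories and scores.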
Here is the output:
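The script in this scenario makes one moderation request per caption. The moderation endpoint’s input field also accepts an array of strings, so several captions can be checked in a single request. The sketch below shows one way to build that request body and match the results back to the captions. It assumes the results array lines up index-for-index with the input array, which is worth confirming against OpenAI’s API reference. The helper names buildBatchBody and pairWithResults are hypothetical, and the sample results are hand-written rather than real API output.

```javascript
// Build the JSON body for one moderation request covering several captions.
function buildBatchBody(captions) {
  return { input: captions };
}

// Pair each caption with the result at the same index in `results`.
// Assumes results are returned in the same order as the inputs.
function pairWithResults(captions, results, threshold = 0.5) {
  if (captions.length !== results.length) {
    throw new Error('Expected one moderation result per caption');
  }
  return captions.map((caption, i) => ({
    caption,
    hate: results[i].category_scores.hate > threshold,
  }));
}

// Hand-written sample data in the shape of the API response (not real output).
const sampleCaptions = ['first caption', 'second caption'];
const sampleResults = [
  { category_scores: { hate: 0.02 } },
  { category_scores: { hate: 0.87 } },
];

console.log(JSON.stringify(buildBatchBody(sampleCaptions)));
console.log(pairWithResults(sampleCaptions, sampleResults));
```

In a real script, the object returned by buildBatchBody would be the second argument to axios.post, and response.data.results would be passed to pairWithResults.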
In this scenario, we will extract a user’s followers and their usernames using the Scraper API and analyze their profile pictures for any sign of hate speech with the OpenAI moderation API.
Create a followers.js file in your project folder to implement the code that retrieves a user’s followers and analyzes their profile pictures using the Scraper API and OpenAI.
const request = require('request');
const axios = require('axios');

const options = {
  method: 'GET',
  url: 'https://thesocialproxy.com/wp-json/tsp/instagram/v1/profiles/followers?consumer_key={CONSUMER_KEY}&consumer_secret={CONSUMER_SECRET}&username={USERNAME}&number_of_results=25',
  headers: {
    'Content-Type': 'application/json',
  },
};

request(options, async function (error, response) {
  if (error) throw new Error(error);
  try {
    const jsonResponse = JSON.parse(response.body);
    const users = jsonResponse.data.users;

    // Keep only the fields we need for each follower
    const extractedData = users.map((user) => ({
      username: user.username,
      profile_pic_url: user.profile_pic_url,
    }));

    for (const user of extractedData) {
      const result = await analyzeProfilePic(user.profile_pic_url);
      console.log(
        `Username: ${user.username}, Hate Speech Detected: ${result.hate_speech}`
      );
    }
  } catch (parseError) {
    console.error('Error parsing JSON:', parseError);
    console.log('Raw response:', response.body);
  }
});

async function analyzeProfilePic(profilePicUrl) {
  const apiKey = 'YOUR_OPENAI_API_KEY';
  try {
    // Note: sent as a plain string, the URL itself is scored as text;
    // the image behind it is not downloaded or inspected.
    const response = await axios.post(
      'https://api.openai.com/v1/moderations',
      {
        input: profilePicUrl,
      },
      {
        headers: {
          Authorization: `Bearer ${apiKey}`,
          'Content-Type': 'application/json',
        },
      }
    );
    const result = response.data;
    return {
      // The category key in the response is "hate"
      hate_speech: result.results.some((r) => r.categories.hate),
    };
  } catch (error) {
    console.error('Error analyzing profile picture:', error);
    return {
      hate_speech: false,
    };
  }
}
In The Social Proxy’s API endpoint URL, replace {USERNAME} with the username of your choice.
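Here is a breakdown of what the code does:
Fetch the followers: The script sends a GET request to the Scraper API’s profiles/followers endpoint, asking for 25 followers of the given username.
Extract the fields: It keeps each follower’s username and profile picture URL.
Moderate each profile picture URL: The analyzeProfilePic function sends the URL to OpenAI’s moderation endpoint and reports whether hate speech was detected. If the request fails, it logs the error and reports false.
Print the results: For each follower, the script prints the username and the detection result.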
Here is the output:
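Passing a profile picture URL as a plain string means the moderation endpoint scores the characters of the URL, not the picture itself. OpenAI’s multimodal moderation model accepts images as structured input instead. The sketch below builds such a request body. It assumes the model name omni-moderation-latest and the image_url input shape from OpenAI’s moderation guide, and the helper name buildImageModerationBody is hypothetical. Not every category is scored for images, and the hate categories appear to be text-only at the time of writing, so check the current documentation before relying on image results for hate speech.

```javascript
// Build a moderation request body that passes an image by URL.
// Assumes OpenAI's multimodal moderation model, 'omni-moderation-latest'.
// buildImageModerationBody is a hypothetical helper name.
function buildImageModerationBody(imageUrl, text) {
  const input = [{ type: 'image_url', image_url: { url: imageUrl } }];
  // Optionally include accompanying text (a caption or bio) ahead of the image.
  if (text) {
    input.unshift({ type: 'text', text });
  }
  return { model: 'omni-moderation-latest', input };
}

// Placeholder URL for illustration only.
console.log(
  JSON.stringify(buildImageModerationBody('https://example.com/avatar.jpg'), null, 2)
);
```

The returned object would replace the request body passed to axios.post. The image URL must be reachable from OpenAI’s servers for the image to be analyzed.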
This article demonstrated how to use The Social Proxy and OpenAI’s moderation endpoint to detect hate speech on Instagram. By following these steps, you can add a hate speech detection system to any project.
The Social Proxy simplifies the process of collecting data from social media platforms and is useful to anyone who wants to gain more insight from public social media data.