Social media has brought us closer in many ways, but it bears a downside: misinformation spreads faster than the truth. With so many Facebook groups popping up around specific topics like health and business, it’s easy for false information to slip through the cracks and reach large audiences. This growing issue has fueled fear, anxiety, and even instability in communities. Scientists, social media analysts, and misinformation researchers are actively developing tools to combat fake news and detect false information in Facebook groups, from algorithm-based solutions to advanced verification techniques designed to help users identify misleading content.
In this article, we’ll walk through how to build a keyword-specific disinformation detector for Facebook groups: verifying posts in groups, building tools for scraping and content analysis, and harnessing machine learning and AI-powered scraping to spot disinformation. Let’s get started on boosting the health of online communities!
Whether you’re a developer, researcher, or social media analyst, a disinformation detector is an approachable tool to build, especially when you leverage machine learning and Natural Language Processing (NLP).
Start by setting up your development environment.
//open a terminal (or your IDE's built-in terminal)
mkdir disinformation-tool
cd disinformation-tool
npm init -y
npm install dotenv request axios
This installs the libraries used throughout the tutorial: dotenv for loading environment variables, request for the HTTP calls in the scraping scripts, and axios for the detection script. (Note that the request package is deprecated but still works for this use case.)
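One note before writing any code: the scripts in this tutorial use ES module import syntax, so Node.js needs "type": "module" set in package.json (otherwise the import statements will throw a SyntaxError). With npm 7.24 or later, you can set it straight from the terminal:
npm pkg set type=module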
To benefit from The Social Proxy’s Scraper AI, you’ll need API credentials from your dashboard. Now that the environment is set up, create the project files and add those credentials to the .env file:
touch .env index.js //creates the two files from the terminal
SCRAPER_CONSUMER_KEY=your scraper consumer key from the dashboard
SCRAPER_CONSUMER_SECRET=your scraper consumer secret from the dashboard
SCRAPER_API_KEY={your_scraper_consumer_secret}:{your_scraper_consumer_key}
BASE_URL=https://thesocialproxy.com/wp-json/tsp/facebook/v1/
//import the dotenv module
import dotenv from "dotenv";
dotenv.config();
//import the request module
import request from "request";
//import the nodejs built-in file system module
import fs from "fs";
//get the env variables from the env file
const SCRAPER_CONSUMER_KEY = process.env.SCRAPER_CONSUMER_KEY;
const SCRAPER_CONSUMER_SECRET = process.env.SCRAPER_CONSUMER_SECRET;
const url = process.env.BASE_URL;
//create a variable for storing the group data
let groupsData;
//api call to get groups that discuss the keyword "natural cures to cancer"
let options = {
method: "POST",
//use template literals to construct the url
url: `${url}search/groups?consumer_key=${SCRAPER_CONSUMER_KEY}&consumer_secret=${SCRAPER_CONSUMER_SECRET}`,
headers: {
"Content-Type": "application/json",
},
//the body of the request will contain the keyword
body: JSON.stringify({
page_size: 30,
typed_query: "natural cures to cancer",
}),
};
request(options, function (error, response) {
  if (error) throw new Error(error);
  groupsData = response.body;
  try {
    //parse inside the try block so malformed responses are caught
    const parsedGroupData = JSON.parse(groupsData);
    // Map over the results, extracting the relevant fields and cleaning the data
    const cleanedResults = parsedGroupData.data.results.map((result, index) => {
      const cleanedData = {
        id: result.id || `Missing ID: ${index + 1}`, // Fallback if the ID is missing
        name: result.name ? result.name.trim() : `No Name`, // Remove any extra spaces from the name
        description: result.description
          ? result.description.trim()
          : `No Description`, // Remove any extra spaces from the description, if present
        url: result.url || `Missing URL: ${index + 1}`, // Fallback if the URL is missing
        photoUrl: result.photo_url || `Missing Photo URL: ${index + 1}`, // Fallback if the photo URL is missing
        info: result.info || `Missing Info: ${index + 1}`, // Fallback if the info is missing
        members: result.members || `Missing Members: ${index + 1}`, // Fallback if the member count is missing
        privacy: result.privacy || `Missing Privacy: ${index + 1}`, // Fallback if the privacy setting is missing
      };
      // You can perform additional checks or data cleaning here if needed
      return cleanedData;
    });
    // Write the cleaned data to groupData.js as a named export,
    // so the next script can import it with `import { groupData } from "./groupData.js"`
    fs.writeFileSync(
      "groupData.js",
      `export const groupData = ${JSON.stringify(cleanedResults, null, 2)};`
    );
  } catch (error) {
    console.error("Error parsing JSON data:", error.message);
    return;
  }
});
Let’s break down the code: it loads the credentials from the .env file, sends a POST request to the search/groups endpoint with the keyword as the typed_query, parses the response, normalizes each result with fallbacks for any missing fields, and finally writes the cleaned array to groupData.js as a named export so the next script can import it.
Once the scraping is complete, the next step is to preprocess and clean the group data. This helps identify the group with the highest post frequency, membership, and relevant keywords. Afterwards, use the group ID to make an API call to retrieve the group data feed (groupDataFeed).
import dotenv from "dotenv";
dotenv.config();
//import the request module
import request from "request";
//import the nodejs built-in file system module
import fs from "fs";
//import the data set for group data
import { groupData } from "./groupData.js";
//get the env variables from the env file
const SCRAPER_CONSUMER_KEY = process.env.SCRAPER_CONSUMER_KEY;
const SCRAPER_CONSUMER_SECRET = process.env.SCRAPER_CONSUMER_SECRET;
const url = process.env.BASE_URL;
// Data Preprocessing and Keyword-Specific Analysis
//we already have the dataset of groups that discuss the keyword in groupData.js
//we will get the feed of the group with the highest number of posts, members, and posting frequency
// Sample input data from the Facebook Groups API (parsed JSON string)
const parsedGroupData = groupData;
// Keywords to search for in group names or info
const keywords = ["cancer", "cure", "natural"];
// Step 1: Helper function to clean and process member count strings
const parseMembers = (membersString) => {
if (membersString.includes("K")) return parseFloat(membersString) * 1000;
if (membersString.includes("M")) return parseFloat(membersString) * 1000000;
return parseInt(membersString, 10);
};
// Step 2: Helper function to get the number of posts per day from the group info
const getPostsPerDay = (groupInfo) => {
const postsMatch = groupInfo.match(
/(\d+)\s+posts\s+(a\s+day|a\s+month|a\s+year)/
);
if (!postsMatch) return 0;
const [, postCount, period] = postsMatch;
const count = parseInt(postCount, 10);
switch (period) {
case "a day":
return count;
case "a month":
return count / 30; // Approximate posts per day
case "a year":
return count / 365; // Approximate posts per day
default:
return 0;
}
};
// Step 3: Process the group data and filter relevant groups
const processedData = parsedGroupData.map((group) => {
const memberCount = parseMembers(group.members);
// Check if the group name or info contains any of the keywords
const matchesKeywords = keywords.some(
(keyword) =>
group.name.toLowerCase().includes(keyword.toLowerCase()) ||
group.info.toLowerCase().includes(keyword.toLowerCase())
);
  //skip groups with no privacy info; these become null entries and are filtered out later
  if (!group.privacy) return null;
return {
id: group.id,
name: group.name,
url: group.url,
photoUrl: group.photoUrl,
members: memberCount,
postsPerDay: getPostsPerDay(group.info), // Calculate posts per day
privacy: group.privacy,
matchesKeywords,
};
});
// Step 4: Find the group with the highest post frequency per day
const findIdealGroup = (processedData) => {
  // Filter out null entries and keep only public groups
  const publicGroups = processedData.filter(
    (group) => group && group.privacy === "Public"
  );
if (publicGroups.length === 0) {
console.log("No public groups found.");
return null;
}
// Sort groups by the highest postsPerDay in descending order
const sortedGroups = publicGroups.sort(
(a, b) => b.postsPerDay - a.postsPerDay
);
// Return the group with the highest post frequency (first in sorted array)
return sortedGroups[0];
};
// Step 5: Display or log the ideal group with the highest posts per day
const idealGroup = findIdealGroup(processedData);
if (idealGroup) {
console.log("Ideal Group:", idealGroup);
} else {
console.log("No suitable group found.");
}
// Step 6: Save the ideal group data to a file
fs.writeFileSync("idealGroup.json", JSON.stringify(idealGroup, null, 2));
// Step 7: Fetch the posts data from the selected group using its ID and the keyword
let groupFeed = {
method: "POST",
url: `${url}groups/feed?consumer_key=${SCRAPER_CONSUMER_KEY}&consumer_secret=${SCRAPER_CONSUMER_SECRET}&groupId=${idealGroup.id}`,
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify({
page_size: 30,
ranking_setting: "TOP_POSTS",
typed_query: "natural cures to cancer",
}),
};
request.post(groupFeed, (error, response, body) => {
if (error) {
console.error("Error fetching group feed:", error);
return;
}
const postsData = response.body;
console.log("Fetched Posts Data:", postsData);
  //parse the raw response string, then re-stringify it with indentation before saving
  const parsedPostsData = JSON.parse(postsData);
  fs.writeFileSync(
    "idealGroupFeed.json",
    JSON.stringify(parsedPostsData, null, 2)
  );
});
//the extracted data is cleaned and re-saved in the idealGroupFeed.json
//example of cleaned GroupData
{
id: 'UzpfSTEwMDA2NDI3NTMwNTIxNDpWSzoyNjk2NzIwNDUwNjIxMTMyNQ',
message: null,
url: 'https://www.facebook.com/groups/CureOfCancer/permalink/26967204506211325/',
postId: '26967204506211325',
reactionCount: 2,
likeCounts: 2,
reshares: 0
},
Instead of building and training a model from scratch, you can rely on the false information detection endpoint of The Social Proxy’s Scraper API, which analyzes posts and classifies them as true or false based on their content.
The cleaned group feed is run through the Scraper API’s false information detector to determine which posts are true or false. The script maps over the array of cleaned feed items to pull out their post IDs, then attaches each post’s feedback to the corresponding feed object as a new field.
Here’s a look at the script:
import axios from "axios"; // Ensure axios is installed
import dotenv from "dotenv";
import fs from "fs";
dotenv.config();

const fetchFalseInformation = async (postId) => {
  const url = process.env.BASE_URL;
  const SCRAPER_CONSUMER_KEY = process.env.SCRAPER_CONSUMER_KEY;
  const SCRAPER_CONSUMER_SECRET = process.env.SCRAPER_CONSUMER_SECRET;
  try {
    const response = await axios.get(`${url}posts/false-information`, {
      params: {
        consumer_key: SCRAPER_CONSUMER_KEY,
        consumer_secret: SCRAPER_CONSUMER_SECRET,
        postId, // Use the ID passed in rather than a hardcoded value
      },
      headers: {
        "Content-Type": "application/json",
      },
    });
    // Axios returns the response body on `data`, unlike request's response.body
    return response.data;
  } catch (error) {
    console.error("Error fetching false information:", error.message);
    throw error;
  }
};
// Main function to loop through the array of group feeds and update each with false info data
const updateGroupFeedsWithFalseInfo = async (cleanGroupFeeds) => {
const updatedFeeds = await Promise.all(
cleanGroupFeeds.map(async (feed) => {
const falseInfoData = await fetchFalseInformation(feed.postId); // Fetch false info for the feed
return {
...feed, // Keep the original feed data
falseInfoData // Add the false info data to the feed object
};
})
);
return updatedFeeds;
};
// Load the cleaned feed saved earlier (assumed to be the array of post objects shown above),
// then call the function and process the results
const cleanGroupFeed = JSON.parse(fs.readFileSync("idealGroupFeed.json", "utf8"));
updateGroupFeedsWithFalseInfo(cleanGroupFeed)
.then(updatedFeeds => {
console.log("Updated Feeds with False Information Data:", updatedFeeds);
// Further processing of the updated feeds can be done here
})
.catch(error => {
console.error("Error updating feeds with false information:", error);
});
After successful integration, the Scraper API assesses each post by its ID, using user metrics, keywords, and other properties to determine whether it is true or misleading. This model can be applied to real-time datasets from groups and group feeds: a scheduler can scrape group data at regular intervals for continuous detection, and posts can be run through the detector batch by batch, with each result stored in a database for easy retrieval and further NLP or machine learning training. The sketch below shows one way to wire this up.
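Here’s a minimal sketch of that scheduling and batching, assuming the node-cron package is installed (npm install node-cron) and a hypothetical runDetectionPipeline() helper that wraps the scraping and cleaning steps covered above:
//a minimal sketch of scheduled, batched detection
//assumes `npm install node-cron` and a hypothetical runDetectionPipeline()
//helper that wraps the scraping and cleaning steps covered earlier
import cron from "node-cron";
import fs from "fs";

const BATCH_SIZE = 30; //number of posts sent to the detector per batch

//run the pipeline every 30 minutes
cron.schedule("*/30 * * * *", async () => {
  try {
    const newPosts = await runDetectionPipeline(); //scrape and clean the latest feed
    //process the posts batch by batch to stay within API limits
    for (let i = 0; i < newPosts.length; i += BATCH_SIZE) {
      const batch = newPosts.slice(i, i + BATCH_SIZE);
      const results = await updateGroupFeedsWithFalseInfo(batch);
      //persist each batch of results for retrieval or later model training
      fs.writeFileSync(`results-${Date.now()}.json`, JSON.stringify(results, null, 2));
    }
  } catch (error) {
    console.error("Scheduled detection run failed:", error);
  }
});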
A key part of building a detector is addressing the challenge of mismatched posts. Enriching the keyword context and raising the Scraper API’s confidence threshold both help reduce false positives, while false negatives can be curbed by broadening the keyword set, weighing user engagement metrics, and training NLP models dedicated to deception detection. The snippet below sketches what a simple confidence filter could look like.
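This is a rough illustration only; the label and confidence fields are hypothetical, so check the actual shape of the Scraper API’s response before relying on them:
//a minimal sketch of threshold-based filtering to reduce false positives
//`label` and `confidence` are hypothetical field names, not confirmed API output
const CONFIDENCE_THRESHOLD = 0.8; //raise this value to flag fewer, surer posts

const flagMisleadingPosts = (updatedFeeds) =>
  updatedFeeds.filter(
    (feed) =>
      feed.falseInfoData &&
      feed.falseInfoData.label === "false" &&
      feed.falseInfoData.confidence >= CONFIDENCE_THRESHOLD
  );

//example: keep only high-confidence misleading posts for manual review
//const flagged = flagMisleadingPosts(updatedFeeds);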
In this article, we walked through the process of building a disinformation detector for Facebook groups: scraping a group’s post data, cleaning and processing it, and running it through the detection system to predict its falsehood. We also covered strategies for managing false positives and negatives to keep the results as accurate and precise as possible.
With the growing need to combat the spread of false news and rumors, timely and reliable disinformation detection in Facebook groups is crucial. Want to take your efforts to the next level? Explore or sign up for The Social Proxy to supercharge your Facebook group scraping and stay on top of keyword-specific content monitoring.