The Plebeian Algorithm: A Democratic Approach to Censorship and Moderation

doi:10.2196/32427

Original Paper

¹Faculty of Science, University of Ontario Institute of Technology, Oshawa, ON, Canada

²Faculty of Health Sciences, Queen's University, Kingston, ON, Canada

³Faculty of Engineering, Lakehead University, Thunder Bay, ON, Canada

⁴School of Engineering Technology, Trades, and Aviation, Confederation College, Thunder Bay, ON, Canada

*all authors contributed equally

Corresponding Author:

Benjamin Fedoruk

Faculty of Science

University of Ontario

Institute of Technology

2000 Simcoe St N

Oshawa, ON, L1G 0C5

Canada

Phone: 1 905 721 8668

Email: benjamin.fedoruk@ontariotechu.net

Background: The infodemic created by the COVID-19 pandemic has caused several societal issues, including a rise in distrust between the public and health experts, and even a refusal of some to accept vaccination; some sources suggest that 1 in 4 Americans will refuse the vaccine. This social concern can be traced to the level of digitization today, particularly in the form of social media.

Objective: The goal of the research is to determine an optimal social media algorithm, one which is able to reduce the number of cases of misinformation and which also ensures that certain individual freedoms (eg, the freedom of expression) are maintained. After performing the analysis described herein, an algorithm was abstracted. The discovery of a set of abstract aspects of an optimal social media algorithm was the purpose of the study.

Methods: As social media was the most significant contributing factor to the spread of misinformation, the team decided to examine infodemiology across various text-based platforms (Twitter, 4chan, Reddit, Parler, Facebook, and YouTube). This was done by using sentiment analysis to compare general posts with posts containing key terms flagged as misinformation (all of which concern COVID-19) to determine their verity. In gathering the data sets, both application programming interfaces (accessed through client libraries installed using Python’s pip) and pre-existing data compiled by standard scientific third parties were used.

Results: The sentiment can be described using bimodal distributions for each platform, with a positive and negative peak, as well as a skewness. It was found that in some cases, misinforming posts can have up to 92.5% more negative sentiment skew compared to accurate posts.

Conclusions: From this, the novel Plebeian Algorithm is proposed, which uses sentiment analysis and post popularity as metrics to flag a post as misinformation. This algorithm diverges from that of the status quo, as the Plebeian Algorithm uses a democratic process to detect and remove misinformation. A method was constructed in which a randomly selected jury of anonymous users determines whether content deemed to be misinformation is removed from the platform. This not only prevents these types of infodemics but also guarantees a more democratic way of using social media that is beneficial for repairing social trust and encouraging the public’s evidence-informed decision-making.

JMIR Form Res 2021;5(12):e32427

Keywords

infodemiology; misinformation; algorithm; social media; plebeian; natural language processing; sentiment analysis; sentiment; trust; decision-making; COVID-19

The internet is a powerful tool for spreading information; as such, it follows that it is equally powerful for spreading misinformation. In 2019, the number of social media users worldwide was 3.484 billion [1], with that number increasing year-by-year by an average of 9% [1,2]. With this increased use, the “power-user” or microinfluencer phenomenon has arisen, where popular social media accounts are able to reach large numbers of readers. This is increasingly important as more people begin to use social media as a source for news [3]. This news comes from third parties such as popular influencers; it is neither posted nor moderated by the social media companies themselves. Past analyses examining online misinformation often classify posts as misinformation using a “Point-And-Shoot” algorithm; this is the status quo. However, some algorithms will be better at combating misinformation than others. The Plebeian Algorithm creates criteria that social media websites should take into account when designing their algorithms to reduce misinformation. This reduction of misinformation is thought to be achieved by examining the correlation between sentiment and misinformation; it has been found that posts containing misinformation tend to have more negative sentiment when compared categorically to other posts covering the same issue [4]. Due to this correlation, it is hypothesized that an algorithm that encourages positive interactions will also reduce the amount of misinformation present on the platform in a democratic manner.

Misinformation is a key problem, yet many terms are confused in studies. Herein, the authors shall define several key terms that are often used interchangeably but whose definitions are specific and distinct. First, misinformation shall be defined as the spread, intentional or otherwise, of false information [5]. The intentions of the individuals spreading the information are irrelevant. Second, disinformation is the purposeful spread of false information [5]. A similar yet distinct definition is malinformation, which is the malicious spread of false or misleading information [5]. Finally, fake news is defined as any misinformation (with or without intention) that readers interpret as trustworthy news [5]. For this study, misinformation will be studied in-depth; however, it should be noted that future supplemental studies could conduct a similar investigation focused on disinformation, malinformation, or fake news. Infodemics can additionally be applied to the realm of health care; infodemics have the potential to intensify outbreaks when there is uncertainty among the public concerning evidence-informed preventative and protective health measures [6].

Prior to investigating the spread of misinformation, it is pertinent to define the concepts of infodemiology and misinformation. This research paper defines an infodemic according to the World Health Organization (WHO) as “too much information including false or misleading information in digital and physical environments during a disease outbreak,” which “causes confusion and risk-taking behaviours that can harm health [and] also leads to mistrust in health authorities and undermines the public health response” [6]. The study of the spread of infodemics on a large scale, especially pertaining to medical misinformation, is known as infodemiology. The WHO has additionally linked the rapid surge of such infodemics during the COVID-19 pandemic to “growing digitization,” which can support the global reach of information but can also quickly amplify malicious or fabricated messages [6]. The second relevant definition is that of the concept of misinformation, which has been defined similarly to the definition used by the WHO in relation to the infodemic [6], but specifically refers to the distinct lack of verity in information related to a specific field.

At their core, most social media websites aim to maximize the amount of time that users spend on their platforms. This maximization of user page time leads to companies using highly specialized and trained machine learning to advertise content on users’ feeds [7]. At the same time, this can have unintended adverse effects such as maximizing the time a user engages with content that is not verified for accuracy. The proposed solution to this disparity between engagement and integrity is to create democratically moderated spaces. Democratic spaces and recommendations to posts with more positive sentiment are integral concepts in the Plebeian Algorithm, based on the latest evidence that misinformation tends to be more negative [5]. The Plebeian Algorithm is an algorithm, described herein, for the purpose of the control of the spread of misinformation on social media. It is beneficial compared to other existing algorithms, as will become evident. The currently implemented point-and-shoot algorithms are hypertuned to specific sources of misinformation surrounding specific topics. However, they are not adaptable to the fluidity of the definition of true information.

As mentioned earlier, most social media platforms work on a model similar to Twitter, Facebook, or YouTube where content is recommended based on user engagement [7]; however, this is not true to the same extent for all websites. One example of a website that breaks the expectations for social media algorithms is 4chan. 4chan is an excellent epitome of ephemeral social media, where content is completely anonymous and is rapidly discarded regardless of popularity [8]; in addition, there is close to no moderation and the content tends to be more negative in sentiment. This is also exemplified in Parler, an alternative social media platform established in September 2018 that aimed to bring forth a platform with total freedom of speech. Consequently, Parler attracted those who were banned from other social media websites, creating “echo chambers, harbouring dangerous conspiracies and violent extremist groups” [9] such as those who were involved in raiding the US Capitol on January 6, 2021. Reddit also has a forum-based system similar to 4chan. However, individual fora on these platforms have moderators who work to combat negative sentiment throughout the website. Reddit’s issue lies in its incredibly isolated fora, as tailoring one’s feed to be a vast majority of explicitly handpicked fora is a part of the experience; this allows for some fora to have little to no moderation [10].

There have been several related works of research in the field of misinformation detection on social media platforms. These works include studies on the connection between misinformation and cognitive psychology [11], the analysis of geospatial infodemiology [12], the effect of recommendation algorithms on infodemiology [13], the use of distributed consensus algorithms to curb the spread of misinformation [14], and the naming conventions used for viruses [15]. Although these works are in alignment with this study, they do not propose the same solution. The study that offers a solution closest to that proposed by the Plebeian Algorithm discusses the efficacy of curbing the spread of misinformation through layperson judgements [16]. Notably, this work discusses the merits surrounding a layperson algorithm but does not make suggestions for its implementation.

The objective of the study will be to determine the optimal social media algorithm to reduce the spread of misinformation while ensuring personal freedoms. The investigation conducted in this paper will have far-reaching implications that will alter how misinformation in social media platforms is addressed.

The three major implications include:

The creation of a more open and democratic environment on social media platforms
An overall reduction in political divisiveness and extremist sentiment both online and offline
An increase in informed users who can make well-informed opinions on subjects

A detailed step-based methodology was used to analyze data throughout the research process. Python 3.9 (Python Software Foundation) was the language of choice through all aspects of the project. All libraries used can be accessed using pip. The visualization of data was performed using the matplotlib and seaborn libraries in Python. Application programming interfaces (APIs) were used from Twitter, Reddit, and 4chan, gathering data regarding username, date, post, and text. Furthermore, two data sets were gathered from academic sources, containing post data from Twitter [17] and Parler [9]. Various Python libraries were used to interact and connect with the APIs, including twarc, urllib3, and basc_py4chan. The following Python libraries were used to clean the data: beautifulsoup4, demoji, and pyenchant. The pandas library for Python was used to retrieve and store third-party data sets [8,17-20], and the numpy library was used for various array operations. Finally, the nltk library was used to perform sentiment analysis, and sklearn was used to perform regressions.

Python was selected due to its ease of connectivity to the various APIs; it is well supported among a strong community, and as such, connecting to various APIs was done through prewritten libraries. This reduced the programming time while increasing the efficiency and reliability of the code.

For the three social media services for which APIs were used (ie, Twitter, Reddit, and 4chan), four steps were performed: (1) gather data using the API and the associated Python library; (2) clean the data to create a Python set of strings containing no URLs (removed using regular expressions), HTML (removed using beautifulsoup4), usernames (removed using regular expressions), emojis (replaced with text using demoji), or non-English text (removed using pyenchant); (3) perform sentiment analysis using nltk’s SentimentIntensityAnalyzer class; and (4) save the cleaned and sentiment-analyzed data frame as a pickle file. A visualization script was then programmed to display the sentiment data gathered from the social media post data. To ensure the confidentiality of users, only aggregate data was displayed. The graphs produced by the research team were obtained by plotting a histogram with a kernel density estimation (KDE). Data for which sentiment analysis was inconclusive due to textual limitations was removed from the visualization. Limiting the language to English keeps the statistical comparisons between platforms congruent. One notable platform for which an API was not used is Facebook, owing to the restrictions that the Facebook API places on the depth and breadth of research.
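The cleaning step (step 2 of the four steps described above) could be approximated as follows. This is a minimal sketch rather than the study's code: it relies only on the Python standard library, with html.parser standing in for beautifulsoup4, and it omits the emoji (demoji) and non-English (pyenchant) steps. The function names, the regular expressions, and the Twitter-style @handle format are illustrative assumptions.

```python
import re
from html.parser import HTMLParser

# Assumed patterns; the study does not list its regular expressions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USERNAME_RE = re.compile(r"(?<!\w)@\w+")  # Twitter-style handles


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Drop HTML tags, keeping the text between them."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def clean_post(text):
    """Remove HTML, URLs, and usernames, then collapse whitespace."""
    text = strip_html(text)
    text = URL_RE.sub(" ", text)
    text = USERNAME_RE.sub(" ", text)
    return " ".join(text.split())


def clean_posts(raw_posts):
    """Return a set of cleaned, non-empty strings (duplicates collapse)."""
    cleaned = {clean_post(post) for post in raw_posts}
    cleaned.discard("")
    return cleaned
```

Returning a set mirrors the "Python set of strings" named in step 2; a side effect is that posts that become identical after cleaning are counted once.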

The six social media services analyzed (4chan, Twitter, Parler, Reddit, YouTube, and Facebook) had various amounts of associated data. A breakdown of the data analyzed herein is described in Figure 1.

The sentiment analysis dictionary selected for the analysis performed herein is the Valence Aware Dictionary and Sentiment Reasoner (VADER). This dictionary was selected as it is the industry standard for a wide array of general statement analyses and is especially recognized for producing highly accurate results with social media platforms. As such, VADER is the optimal dictionary for the purposes of this research. Although ideal algorithms should implement various checks and balances for the sentiment analysis system implemented, this paper shall focus on strictly the VADER dictionary, which is solely positive and negative sentiment. Other sentiment analysis tools exist to examine specific emotions (including anger, fear, surprise, happiness, etc).
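To illustrate how a dictionary-based polarity score between –1 and 1 can arise, the toy sketch below sums word valences and normalizes the total. The mini-lexicon and its valences are hypothetical, and the normalization x/√(x²+α) with α=15 is assumed here to mirror the form of VADER's compound score. The real tool additionally handles negation, intensifiers, punctuation, and capitalization, none of which are modeled.

```python
import math

# Hypothetical mini-lexicon for illustration only; these are not
# entries copied from the VADER dictionary.
TOY_LEXICON = {
    "good": 1.9,
    "great": 3.1,
    "safe": 1.9,
    "bad": -2.5,
    "hoax": -2.0,
    "terrible": -2.1,
}


def compound_score(text, lexicon=TOY_LEXICON, alpha=15.0):
    """Sum the valences of known words and squash the total into (-1, 1)."""
    tokens = (tok.strip(".,!?").lower() for tok in text.split())
    total = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return total / math.sqrt(total * total + alpha)
```

A text with no lexicon words scores exactly 0, which is one way an "inconclusive" result can appear in practice.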

Key term analysis was used for the data cleaning process to determine which strings were classified as relating to a specific topic. The key terms were gathered using a list of the most commonly held terms gathered from Twitter that were directly associated with misinformation.

To confirm the academic literature [5] regarding the correlation between negative sentiment and verity of information, the analysis was performed using Twitter. Data was filtered such that only tweets containing a set of potentially misinformative keywords were assessed using sentiment analysis. Both the filtered and the unfiltered data were plotted as histograms, and their KDEs were compared (each relative to its respective maximum).

Misinformation is directly correlated to negativity. A misinformative post is often negative in sentiment. However, this is not a certainty. As such, when determining an optimal algorithm, it will be critical to use sentiment analysis to narrow the potential misinformative candidates and then to use further methods—a jury process—to accurately detect misinformation.

The study defines several mathematical terms. Many of the histograms and KDEs described previously form a bimodal distribution. The polarity scores upon which the two peaks are centered are termed μ⁺ and μ⁻, where the sign indicates whether the term refers to the positive or negative peak. The other variable defined is the skewness of the distribution as a whole, which is described using the symbol γ. When the positive peak is the major mode, then γ ∈ (0, ∞). Contrarily, when the positive peak is the minor mode, then γ ∈ (–∞, 0). The frequency function f describes the frequency curve represented by the KDE (such that f(p) represents the frequency of strings with polarity score p). The skewness is calculated using the following equation:

This equation for skewness was derived as follows:

The results of the analysis will be divided by the social media platform. They will be presented in the following order:

Reddit
4chan
Facebook
YouTube
Parler
Twitter

Reddit

When analyzing Reddit’s data, a series of subreddits were selected. The subreddits selected were r/AskReddit, r/AskThe_Donald, r/conspiracy, r/covid, r/kindness, r/movies, r/politics, and r/EnoughTrumpSpam. These subreddits were selected as an array of options, allowing an analysis of probable misinformative, probable truthful, and unknown sources. The data was gathered using the Python library urllib3. The first subreddit to be examined herein is r/AskReddit. This subreddit tends to contain a wide variety of posts from a myriad of conversation topics. As such, it is relatively indicative of Reddit overall. r/AskReddit’s histogram can be found in Figure 2. The bimodal distribution was μ⁻≈–0.54, μ⁺≈0.48, and γ≈–0.03214.

Figure 2. Reddit r/AskReddit frequency of positivity histogram.

It should be noted that the extremal frequencies of the bimodal distribution were approximately equal between the negative and positive peaks. Another notable subreddit examined was r/politics, which provided a sample of posts potentially swayed by the political leaning of Reddit users. The histogram and KDE for this analysis are displayed in Figure 3. The bimodal distribution for r/politics was μ⁻≈–0.56, μ⁺≈0.43, and γ≈–0.37776.

Figure 3. Reddit r/politics frequency of positivity histogram.

r/politics’s content had a stronger negative skew as is apparent by the KDE. The final subreddit to be examined is an avant-garde subreddit: r/conspiracy. In this community, users share various conspiracy theories. When one scrolls through r/conspiracy, plenty of misinformation can easily be noted, including misinformation surrounding Flat Earth Theory and QAnon. The histogram for r/conspiracy is found in Figure 4. The bimodal distribution for r/conspiracy was μ⁻≈–0.56, μ⁺≈0.39, and γ≈–0.33904.

Figure 4. Reddit r/conspiracy frequency of positivity histogram.

As can be noted by r/conspiracy, the conspiratorial posts (which are known to contain a large volume of misinformation) are more often negative. This can be noted due to the difference in the peaks of the bimodal distribution.

4chan

To analyze the 4chan data, five boards were selected: /b/, /a/, /v/, /pol/, and /r9k/. These five boards were selected due to their high post frequency compared to other 4chan boards. The data was gathered using basc_py4chan. For each of these boards, a histogram was plotted (with an overlaid KDE) with 30 bins. A visualization of the histogram for /b/’s sentiment can be found in Figure 5. /b/ is described as the random board, containing a wide mixture of conversation from across 4chan. The bimodal distribution for /b/ was μ⁻≈–0.55, μ⁺≈0.46, and γ≈0.11380.

Another board visualized in this report is /pol/, whose sentiment histogram can be found in Figure 6; /pol/ contains political discussion. The bimodal distribution for /pol/ was μ⁻≈–0.61, μ⁺≈0.38, and γ≈–0.16559.

It should be noted that the levels of extreme negative sentiment (ie, with a polarity score of less than –0.75) are substantially higher in /pol/ compared to /b/. This demonstrates that political topics tend to be more negative on 4chan.

Overall, it should be noted that 4chan consistently contains a large number of negative posts, which is greatly dependent upon the topic of the board. Boards which pertain to specific recreational activities (eg, /v/ for video games or /a/ for anime) have a lesser degree of negative polarity.

Figure 6. The 4chan /pol/ frequency of positivity histogram.

Facebook

Facebook is currently the social platform with the largest user base, at 2.8 billion monthly active users [21]; as such, it is pertinent for this paper to analyze data collected from Facebook. A data set that specifically contains data predating the COVID-19 pandemic was accessed to broaden the scope of the sentiment analysis [22]. The data set includes data gathered from Facebook’s inception until 2017. A random selection of Facebook comments from this temporal range [22] was selected for sentiment analysis using VADER.

In Figure 7, a histogram was plotted with 30 bins, depicting the frequency of Facebook comments at various sentiment analytic levels. A KDE was overlaid onto the plot to show the general trend.

Notable features of the bimodal histogram include the sharp positive peak and wide negative peak. It should be noted that the integral for the KDE is as follows:

Furthermore, the following values were extracted from the KDE: μ⁺≈0.43, μ⁻≈–0.29, and γ≈0.49858.
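One way such peak locations could be read off a KDE is sketched below. It is an illustration under stated assumptions, not the study's procedure: a hand-rolled Gaussian KDE stands in for the seaborn KDE used for the figures, the bandwidth and grid resolution are arbitrary choices, and μ⁻ and μ⁺ are taken as the highest points of f on the negative and positive halves of the polarity axis. The skewness γ is not computed here.

```python
import math


def gaussian_kde(scores, bandwidth):
    """Return f(p), a Gaussian kernel density estimate over the scores."""
    norm = 1.0 / (len(scores) * bandwidth * math.sqrt(2.0 * math.pi))

    def f(p):
        return norm * sum(
            math.exp(-0.5 * ((p - s) / bandwidth) ** 2) for s in scores
        )

    return f


def bimodal_peaks(scores, bandwidth=0.1, steps=200):
    """Locate the negative and positive peaks of the KDE on [-1, 1].

    Assumes the distribution really is bimodal with one mode on each
    side of 0; otherwise a returned value may sit at the edge of its half.
    """
    f = gaussian_kde(scores, bandwidth)
    grid = [-1.0 + 2.0 * i / steps for i in range(steps + 1)]
    mu_minus = max((p for p in grid if p < 0), key=f)
    mu_plus = max((p for p in grid if p > 0), key=f)
    return mu_minus, mu_plus, f
```

Comparing f(mu_minus) with f(mu_plus) then indicates which peak is the major mode.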

Figure 7. Facebook frequency of positivity histogram.

YouTube

YouTube is based entirely on long-form video content and lends itself to more in-depth topics. A preselected data set of YouTube comments [23], after sentiment analysis, has been visualized and presented in Figure 8.

The data set [22] was collected in 2017 and, as such, does not contain misinformation related to COVID-19. This helps to broaden the temporal scope of this analysis and ensure that the present trends hold in data outside of the COVID-19 pandemic (ie, prior to January 2020). It was also limited in geographic scope to the United States, the United Kingdom, and Canada. This limitation was due to the availability of data. It should be noted that these three countries represent the English-speaking members of the Group of Seven, a group of the seven most democratic, affluent, and pluralist nations in the world.

As can be noted, there was a strong positive skewness in the data for these YouTube comments, with μ⁺≈0.67, μ⁻≈–0.52, and γ≈0.66593. Potential explanations for this trend will be discussed in a later section.

Figure 8. YouTube COVID-19 frequency of positivity histogram.

Parler

The analysis of Parler was a transition from the traditional analyses of Reddit and 4chan, due to the fact that Parler is not broken down into communities to which users subscribe but is a single news feed–style system. The analysis of Parler should be contrasted to the analysis of Twitter in the subsequent section, as users migrated from Twitter to Parler due to a perception of limitations placed on their freedom of expression on Twitter.

When analyzing Parler, data was collected into a data set throughout the COVID-19 pandemic and the period surrounding the events of January 6, 2021 [9]. Figure 9 contains a visualization of the COVID-19–related parleys posted between January 2020 and March 2020. The bimodal distribution was μ⁻≈–0.53, μ⁺≈0.45, and γ≈0.22063.

Figure 9. Parler COVID-19 frequency of positivity histogram.

Twitter

The majority of the analysis performed throughout this paper was on the social media service Twitter. This is due to the high amount of data regarding misinformation on the platform, the overall popularity of the platform as a general case study, and the generality of the platform (compared to some other unorthodox data sources such as YouTube comments).

Similar to parleys on Parler, tweets on Twitter are made to the public at large. There are no channels, boards, or subreddits of any sort. However, the Twitter algorithm allows an individual’s feed to become an echo chamber. Evidently, echo chambers should be avoided wherever possible. Echo chambers are a large contributor to the rampant spread of misinformation that is seen surrounding the COVID-19 pandemic [24].

This study used a combination of both data gathered from the Twitter API and a data set of pregathered COVID-19 tweets [17,18,20]. The interface used to connect the Python code (and sentiment analysis) to the Twitter API was twarc. The reason for this duplication of analysis was to ensure that the data used was accurate. Precision must be maintained both in data collected by APIs and in data collected over a long duration.

In both studies (using the API and the data set [17]), the study analyzed the broad sentiment of COVID-19–related tweets and filtered the data by keyword. The keywords used included terms concerning misinformation surrounding COVID-19, including “China Virus,” “Bioweapon,” and “Microchip.” The filtered data then underwent sentiment analysis. Both sentiment-analyzed data sets were plotted on a standard histogram with an overlaid KDE.
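The keyword filter described above might look like the following minimal sketch. Only the three quoted terms are taken from the text; the full keyword list is not reproduced here, and case-insensitive substring matching is an assumption about how the terms were applied. The function names are illustrative.

```python
import re

# The three example terms quoted in the text; the study's full list is longer.
MISINFO_KEYWORDS = ("China Virus", "Bioweapon", "Microchip")


def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword."""
    escaped = (re.escape(keyword) for keyword in keywords)
    return re.compile("|".join(escaped), flags=re.IGNORECASE)


def filter_by_keywords(posts, keywords=MISINFO_KEYWORDS):
    """Keep only the posts that mention at least one keyword."""
    pattern = build_keyword_pattern(keywords)
    return [post for post in posts if pattern.search(post)]
```

Because matching is by substring, a post containing "microchips" is also retained. As the Sources of Error discussion notes, a filter of this kind cannot tell a post spreading a claim from one criticizing it.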

For the discussion, the study focused on the data gathered from the Twitter API, since a similar methodology was used for gathering data for the other social media platforms studied. However, it should be noted that similar results were attained using the data set [17]. Figure 10 is a graph of the Twitter API’s gathered tweets pertaining to COVID-19 (the broad topic), where the sentiments of the tweets are plotted on a histogram with a KDE. A random sample of the data was taken for this analysis, as there were too many tweets to reasonably analyze the population. The bimodal distribution was μ⁻≈–0.36, μ⁺≈0.47, and γ≈0.86500.

Figure 10. Unfiltered COVID-19 Twitter application programming interface frequency of positivity histogram.

As can be noted, the positive peak for the KDE is nearly double the negative peak. This indicates that the number of positive tweets far exceeds the number of negative tweets. Comparatively in Figure 11, the negative peak of the bimodal distribution is on par with the positive peak. This figure is a histogram of the polarity scores for tweets after being filtered. The tweets selected only contain terms that are known to pertain to COVID-19 misinformation. The bimodal distribution was μ⁻≈–0.42, μ⁺≈0.47, and γ≈0.80080 between the two Twitter measurements.

Figure 11. Filtered COVID-19 Twitter API frequency of positivity histogram. API: application programming interface.

The reasoning behind this proportional increase in the negative peak compared to the positive peak will be discussed further in the subsequent section. Again, it must be noted that the same results can be seen when performing an identical analysis on the data gathered from the data set [17].

In the Discussion section, not only will an analysis of the results and errors be explored but also the Plebeian Algorithm and its benefits will be discussed, as well as how it compares to the algorithms of the social media platforms studied.

Results Analysis

Several critical notes must be made with regard to the analysis of the quantitative features produced in the Results section.

First, it is critical to note that 4chan was the only social media platform studied that had an overall positive γ, notably on the all-encompassing board of /b/. From these observations, it was gathered that a system that provided users with the freedom to determine which content got promoted (as opposed to an artificial intelligence algorithm) improved the sentiment of the average post. This is a key point in the Plebeian Algorithm, which is described in a subsequent section. Second, Twitter had a more moderate μ⁻ (ie, closer to 0, or neutral), indicating that users tended to be more positive than users on the other social media platforms analyzed in the study.

It is also critical to recall that there was a strong correlation between polarity scores as determined by a sentiment analysis algorithm and the verity of the information communicated [5]. As such, the analysis provided herein can be applied to both the sentiment and the verity of a social media post.

Echo Chambers

In analyzing the data represented by the KDEs and the skew of the bimodal distribution of the data toward negative sentiment, a confirmation of the echo chamber effect (a theory that states “the Internet has produced sets of isolated homogeneous echo chambers, where similar opinions reinforce each other and lead to attitude polarization” [25]) was clearly shown, as negative sentiment has a clear association with emotions such as anger, which have been shown to “...[reinforce] echo chamber dynamics...in the digital public sphere” [25]. In fact, other studies have also predicted the link to this effect to be the impact of the specific algorithms used in the virtual space [26]. The link gives strong evidence to suggest that the algorithms currently deployed by social media companies are creating the optimal medium through which misinformed opinions and content can grow and go uncontested. These echo chambers ensure that users are unable to get access to arguments that conflict with their beliefs and expand their perspectives.

Sources of Error

Although the study attempted to limit error, there remained several sources of error stemming from the methodology of analysis used. The first source of error is the use of key term searching, which returns posts not only from individuals spreading misinformation but also from those trying to bring attention to the issue of misinformation and to those who spread it. Furthermore, the sentiment analysis would also have been unable to differentiate between a misinformed post and one that tried to bring attention to the problem. This is because some of the true tweets demonstrate overall negative emotions. The final problem with the key term search is that some of the key terms determined to be misinformative may in the future be proven to be accurate information.

A further source of potential error comes in the form of the social media platforms used. One problem is that only four social media platforms were assessed, thus limiting the scope of the study. It could be that the trends found in the research will not show up in other social media platforms (eg, Facebook). Another issue from the sources assessed was that they were all only textually based and thus the method proposed may not be replicable for more graphically based social media platforms (eg, YouTube, Instagram, or Snapchat).

Definition and Implementation of the Plebeian Algorithm

As described previously, the Plebeian Algorithm is a novel algorithm for identifying and removing misinformation through democratic means. It works in two distinct phases: the Flag Phase, to determine which posts are misinformative, and the Jury Phase, to judge the information to determine if removal is appropriate.

Flag Phase

The Flag Phase is tasked with the determination of possible misinformed posts. In doing so, the algorithm selects posts that have a large number of views and then performs sentiment analysis on both the original post and a “without-replacement simple random sample” [27] of comments or replies to the post. If the overall sentiment leans negative, then the post is flagged as being potentially misleading. The posts flagged will then be passed into the Jury Phase.
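The Flag Phase described above can be sketched as follows. This is a minimal illustration under assumptions, not a definitive implementation: the view threshold, the comment sample size, the dictionary field names, and the rule that "leans negative" means a mean polarity below 0 are all hypothetical choices, and polarity scores are taken as precomputed inputs.

```python
import random
from statistics import mean


def flag_phase(posts, view_threshold=1000, sample_size=50, rng=None):
    """Return the ids of widely viewed posts whose sentiment leans negative.

    Each post is a dict with the hypothetical keys 'id', 'views',
    'score' (polarity of the post) and 'comment_scores' (a list of
    polarities of its comments or replies).
    """
    rng = rng or random.Random()
    flagged = []
    for post in posts:
        if post["views"] < view_threshold:
            continue  # only posts with a large number of views are considered
        comments = post["comment_scores"]
        # Simple random sample of comments, drawn without replacement.
        sampled = rng.sample(comments, min(sample_size, len(comments)))
        overall = mean([post["score"], *sampled])
        if overall < 0:  # assumed meaning of "leans negative"
            flagged.append(post["id"])
    return flagged
```

Flagged ids would then be handed to the Jury Phase; nothing is removed at this stage.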

Jury Phase

The Jury Phase is tasked with the trial and removal of truly misinformative posts. These posts are removed from the home page or news feed of the user. During the Jury Phase, flagged posts are sent to a random selection of anonymous users known as jurors. This selection should provide a diverse group, consisting of varying political opinions to give the post a fair trial. The selection of jurors uses a “without-replacement simple random sample” of a population [27]. The number of jurors selected is exactly 10% of the number of viewers of a post, rounded up. It should be noted, however, that jurors are not forced to participate or vote. It is assumed that the number of voting jurors will be far less than the total number of jurors. Thus, selecting 10% of the population allows room for the uncertainty of juror engagement. The jurors are then asked to vote either for or against the removal of the post. Once the deliberation has lasted a set duration (or a threshold of response has been met), the results will be counted and the post will remain on the site or be removed by the algorithm.
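A sketch of the Jury Phase under stated assumptions follows. The 10% jury size, rounded up, and the without-replacement sampling come from the description above. The decision rule (a strict majority of the votes actually cast, with ties keeping the post) and the min_votes response threshold are hypothetical, since the text says only that the results are counted.

```python
import random


def select_jurors(viewer_ids, rng=None):
    """Draw 10% of a post's viewers, rounded up, without replacement."""
    rng = rng or random.Random()
    viewers = list(viewer_ids)
    # Integer ceiling of len/10; avoids floating-point rounding surprises.
    jury_size = (len(viewers) + 9) // 10
    return rng.sample(viewers, jury_size)


def tally_votes(votes, min_votes=1):
    """Decide a flagged post's fate from the votes that were cast.

    votes holds one bool per juror who chose to vote (True = remove);
    jurors who abstain are simply absent from the list.
    """
    votes = list(votes)
    if len(votes) < min_votes:
        return "keep"  # assumed default when too few jurors respond
    remove = sum(votes)
    return "remove" if remove > len(votes) - remove else "keep"
```

For example, a post with 25 viewers yields a jury of 3, and the post is removed only if more of the responding jurors vote for removal than against it.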

For reference, a summative flowchart detailing the Plebeian Algorithm as described can be found in Figure 12. The areas colored in magenta constitute the Flag Phase, while the areas colored in mint green constitute the Jury Phase.

Figure 12. Flowchart of the Plebeian Algorithm.

Existing Algorithms

In the following section, the existing algorithms used by Reddit, 4chan, Parler, and Twitter are detailed. These analyses are based on academic journal articles [10-12,28,29]. It is pertinent to analyze the platforms on a case-by-case basis, as it must be ensured that the user base remains loyal to the brand and platform [30]. Prevalent existing algorithms include the PageRank and HITS algorithms [31].

Before dissecting the individual social media algorithms currently used on the various platforms, it is critical to mention that most of these algorithms share an identical goal: to keep users on the platform, thus ensuring continued revenue for the organization. This objective conflicts with the goal of preventing misinformation from spreading on the platform, as preventing misinformation requires some censorship, resulting in a reduction of revenue. This, however, does not indicate that the Plebeian Algorithm is of little value to site entrepreneurs. It is important to note that social media companies (with the exception of Parler) have already shown themselves interested in curbing and moderating their own social media platforms through their implementation of “point-and-shoot” algorithms as well as censorship of high-profile posts and accounts (eg, Twitter banning @realDonaldTrump). However, as has already been described, these algorithms are not effective at accomplishing their mission of reducing misinformation and, further, have caused users to become disillusioned with the service. This has led many users to join platforms that capitalize on this disillusionment (eg, Parler). Hence, by implementing the Plebeian Algorithm, these social media companies would finally have a method that carefully balances moderation with freedom of expression, one that could reinspire a sense of awe within their user base and bring back the notion of social media as a fun online space where people can collaborate and share freely.

The algorithm used for Reddit is a simple upvote/downvote system, as was described in the Introduction section. Users of Reddit are encouraged to upvote content they like and to downvote content they do not like. Posts with more upvotes are more widely shared, whereas posts with more downvotes are less widely shared. In Reddit, users are allowed to vote on the original post and any comments. “Comment trees” are inherently created by the system as users comment on comments (thereby chaining comments together into a treelike formation).
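The voting and comment-tree structure described above can be pictured with a short Python sketch. It is purely illustrative and is not Reddit's actual ranking code: the `Comment` class, the use of a net score (upvotes minus downvotes) for ordering, and the helper names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    """A post or comment; replies to it form a subtree (hypothetical model)."""
    text: str
    upvotes: int = 0
    downvotes: int = 0
    replies: List["Comment"] = field(default_factory=list)

    @property
    def score(self) -> int:
        # Assumed ranking signal: net votes only.
        return self.upvotes - self.downvotes


def count_nodes(node: Comment) -> int:
    """Count the node itself plus every comment beneath it in the tree."""
    return 1 + sum(count_nodes(reply) for reply in node.replies)


def ranked_replies(node: Comment) -> List[Comment]:
    """Order the direct replies from highest to lowest net score."""
    return sorted(node.replies, key=lambda c: c.score, reverse=True)
```

Each reply to a comment becomes a child node, so a chain of replies produces the treelike formation described in the text, and votes can be cast independently at every level of that tree.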

The Reddit algorithm is tailored to the interests of the Reddit user through a system of subscriptions to various topics of conversation, known as subreddits. Users receive a mixture of content from the subreddits to which they have subscribed, with additional, sporadic advertisements.

The system is essentially tailored to the specific user. This contrasts with the Plebeian Algorithm, which emphasizes the democratic process for the determination of verity by the user base. Currently, Reddit contains no user-controlled means to fight misinformation aside from the “Report” button, which brings the issue to the attention of a staff person at Reddit. This process is considered a manual review by the corporation, and as such, it does not constitute something similar to the Plebeian Algorithm. For Reddit to implement a Plebeian Algorithm, it must ensure that the process of the misinformation determination remains in the hands of the user base.

Although Reddit currently appears to be a democratic system, it is more of a fiefdom [29]. For example, in 2013, at the direction of its moderators, the r/FindBostonBombers subreddit slandered the Brown family by connecting them with the 2013 attack on the Boston Marathon [32]. Similar problems recur across Reddit, in incidents such as “the Fappening,” where nude photographs were released to the public without the victims' knowledge. Incidents like these make apparent the crux of the fundamental issue with Reddit: the moderators. The platform's structure promotes content moderation by a few elite members of communities, instead of by the members of the said community as a whole.

It should be noted that Reddit is a platform built on a sense of anonymity. Users are not required to provide their personal email addresses or their real names. As with any implementation of the Plebeian Algorithm, it is important that the social media company critically analyze the market it serves and the existing qualities that draw users to the platform. An implementation of the Plebeian Algorithm on Reddit should preserve user anonymity and should still not require the use of personal emails or real full names.