Analysis of Moral Judgement on Reddit
Nicholas Botzer, Shawn Gu, and Tim Weninger
Department of Computer Science and Engineering
University of Notre Dame
{nbotzer, sgu3, tweninger}@nd.edu

Abstract
Moral outrage has become synonymous with social media in recent years. However, most academic analysis of social media websites has focused on hate speech and misinformation. This paper analyzes moral judgements rendered on social media by capturing the judgements passed in the subreddit /r/AmITheAsshole on Reddit. Using the labels associated with each judgement, we train a classifier that takes a comment and determines whether it judges the user who made the original post to have positive or negative moral valence. We then use this classifier to investigate an assortment of website traits surrounding moral judgements in ten other subreddits, including where negative moral users like to post and their posting patterns. Our findings also indicate that posts judged in a positive manner score higher.
How do people render moral judgements of others? This question has been pondered for millennia. Aristotle, for example, considered morality in relation to the end or purpose for which a thing exists. Kant insisted that one's duty was paramount in determining what course of action might be good. Consequentialists argue that actions must be evaluated in relation to their effectiveness in bringing about a perceived good. Regardless of the particular ethical frame that one ascribes to, the common practice of evaluating others' behavior in moral terms is widely regarded as important for the well-being of a community. Indeed, ethnographers and sociologists have documented how these kinds of moral judgements actually increase cooperation within a community by punishing those who commit wrongdoings and informing them of what they did wrong [3].

The process of rendering moral judgement has taken an interesting turn in the current era with the adoption of the Internet and social media in particular. Online social systems allow people to encounter and consider the lives of others from around the world. At no other time in history have so many people been able to examine such a variety of cultures and viewpoints so readily. This increased sharing and mixing of viewpoints inevitably leads to online debates about various topics [29]. The content of these debates provides researchers
with the opportunity to ask specific questions about argument, disagreement, moral evaluation, and judgement with the aid of new computational tools.

Figure 1: Example of (a) a post title and (b) a comment in the /r/AmItheAsshole subreddit. The NTA prefix and comment score (not shown) indicate that the commenter judged the poster "Not the Asshole".

To that end, recent work has resulted in the creation of statistical models that can understand moral sentiment in text [25]. However, these models rely heavily on a gazette of words and topics and their alignment on moral axes. The central motivation for these works is grounded in moral foundations theory [16], and studies in this line tend to investigate morality as it relates to current events in the news, such as politics or religion. Despite their usefulness in understanding the moral valence of specific current events, the goal of the current work is to study moral judgements rendered on social media that apply to more common personal situations.

We focus on Reddit in particular, where users can create posts and have discussions in threaded comment sections. Although the details are complicated, users also perform curation of posts and comments through upvotes and downvotes based on their preference [12, 15]. This assigns each post and comment a score reflecting how others feel about the content. Within Reddit there are a large number of subreddits, which are small communities dedicated to various topics. The subreddit of interest for our question regarding moral judgements is called /r/AmItheAsshole. Users of this subreddit post a description of a situation that they were involved in; they are also encouraged to explain details of the people involved and the final outcome of the situation. Posters to /r/AmItheAsshole are typically looking to hear from other Reddit users whether or not they handled their personal situation in an ethically appropriate manner. Other users then respond to the initial post
with a moral judgement as to whether the original user was an asshole or not. Figure 1 shows an example of a typical post and one of its top responses. One important rule of /r/AmItheAsshole is that top-level responses must categorize the behavior described in the original post into one of four categories: Not the Asshole (NTA), You're the Asshole (YTA), No assholes here (NAH), or Everyone sucks here (ESH). In addition to providing a categorical moral judgement, the responding user must also provide an explanation as to why they selected that choice. Reddit's integrated voting system then allows other users to individually rate the judgements with which they most agree (upvote) or disagree (downvote). After some time has passed, the competition among different judgements settles, and one of the judgements will be rated highest. This top comment is then accepted as the judgement of the community. This process of passing and rating moral judgement provides a unique view into our original question about how people make moral judgements.

Compared to other methodologies for computational evaluation of moral sentiment, collecting judgements from /r/AmItheAsshole (AITA) has some important benefits. First, because posters and commenters are anonymous on Reddit, they are more likely to share their sensitive stories and frank judgements without fear of reprisal [21, 18]. Second, the voting mechanism of Reddit allows a large number of users to engage in an aggregated judgement in response to the original post [13]. However, the breadth and variety of this data does pose additional challenges. For instance, judgements are provided without an explicit moral framing, and, similarly, Reddit votes do not explicitly denote moral valence and are susceptible to path-dependency effects [14].

In the present work we use data from AITA to investigate how users provide moral judgements of others.
We then extract representative judgement labels from each comment and use these labels and comments to train a classifier. This classifier is then broadly applied to infer the moral valence of other Reddit comments from ten different subreddits and used to answer the following research questions:

RQ1: What language is most closely associated with positive and negative moral valence?
RQ2: Is moral valence correlated with the score of a post?
RQ3: Do certain subreddit communities attract users whose posts are typically classified by more negative or positive moral judgements?
RQ4: Are self-reported gender and age descriptions associated with positive or negative moral judgements?

In summary, we find that posts judged to have positive moral valence (i.e., the NTA label) typically score higher than posts with negative moral valence. We also find that certain subreddit communities where users confess to something immoral (such as /r/confessions) tend to attract users whose posts are characterized by negative moral valence. Among these immoral users we show that their posting habits tend toward three different types. Finally, we show that self-described male users are more likely to be judged an asshole than female users.

Label  Meaning               Count
NTA    Not the Asshole       717,006
YTA    You're the Asshole    372,850
NAH    No assholes here       91,903
ESH    Everyone sucks here    79,059

Table 1: The four judgements that users can pass on the subreddit /r/AmItheAsshole.
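Since each top-level comment must contain one of the four labels from Table 1, the label-extraction step can be sketched with a minimal regular expression. This is our own illustration, not the authors' exact extraction rule:

```python
import re

# The four categorical judgement labels from Table 1, matched as
# standalone tokens anywhere in the comment text.
LABEL_RE = re.compile(r"\b(NTA|YTA|NAH|ESH)\b")

def extract_label(comment_text):
    """Return the first judgement label found in a comment, or None."""
    match = LABEL_RE.search(comment_text)
    return match.group(1) if match else None
```

Comments with no recognizable label would simply be excluded from the training set.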
We retrieve moral judgements by collecting posts and comments from the subreddit /r/AmItheAsshole, taken from the Pushshift data repository [1].

The questions raised in the present work are considered human-subjects research, and the relevant ethical considerations apply. We sought and received research approval from the Institutional Review Board at the University of Notre Dame under an approved protocol.
Before we introduce our classifier, we first consider RQ1: what linguistic cues are associated with positive and negative moral judgement? To answer this question we split comments into two valence classes: positive and negative. The positive class contains comments labeled NTA or NAH; the negative class contains comments labeled YTA or ESH. Then we use the Allotaxonometer system, which compares two Zipfian distributions using a scoring function called rank-turbulence divergence, to compare how terms are associated with these valence labels [9]. In our case, we constructed 1-gram multinomial distributions from each class; word frequencies in English are well known to exhibit a Zipfian distribution [22].

Negative valence    Positive valence
you                 to
quilt               she
because             my
intern              cornell
suck                they

Table 2: Terms with the largest rank-divergence contributions for the negative and positive moral valence classes. Color bars (not reproduced here) indicate the relative contribution of terms; e.g., "you" contributes about 4 times as much to negative valence as "quilt".

Table 2 shows the terms with the largest divergence contribution for each valence class. Words with the highest negative valence include "you", "because", and "suck" in the top 5, but also "petty", "daughter", and "jesus" within the top 10 (not shown). Simply put, these are the top words used when assigning negative moral judgement. Words associated with positive moral valence consist mainly of functional terms, but also include the names of several Ivy League schools in the top 50.

Given this dataset of textual posts and comments labeled with positive or negative moral judgements, our goal is to predict whether an unlabeled comment assigns a positive (NTA or NAH) or negative (YTA or ESH) moral judgement to the user who made the post. It is important to note that this classifier classifies the judgement of the commenter, not the morality of the poster. We define our problem formally as follows.
Problem Definition
Given a top-level comment C with moral judgement A ∈ {+, −} that responded to post P, we aim to find a predictive function f such that

    f : C → A    (1)

Formally, this takes the form of a text-classification task where the inferred class denotes the valence of a moral judgement. The choice of classification model f is not particularly important, but we aim to train a model that performs well and generalizes to other datasets. We selected four text-classification models for use in the current work:

• Multinomial Naïve Bayes [19]: Uses word counts to learn a text-classification model; it has shown success in a wide variety of text-classification problems.
• Doc2Vec [20]: Creates comment embeddings, which are input to a logistic regression classifier that calculates the class margin.

(Regarding the Ivy League terms noted earlier: a brief survey of these posts reveals that many posters ask whether they are wrong to attend one of these schools even though a family member or partner was not admitted.)
• BERT Embeddings [8]: Uses word embeddings from BERT, which are averaged together and input to a logistic regression classifier that calculates the class margin.
• Judge-BERT: We fine-tune the BERT-base model using the class labels. Specifically, we add a single dropout layer after BERT's final layer, followed by a final output layer consisting of our two classes. The model is trained with the Adam optimizer and the cross-entropy loss function, for 3 epochs as recommended by Devlin et al. [8].

Example post (Figure 2): "TL;DR: Married, slept with another man, and regretted it immediately. Husband found out, I am not sure if he wants to leave me or not, but I am willing to do anything to fix it. Need advice." Top response: "If you were so unsatisfied why not try and fix things before you destroy someone's life. You don't really deserve a second chance. You're actually terrible and I hope you learn your lesson."

Figure 2: A post (in blue) made by a user along with the top response comment (white). The comment is then fed to our Judge-BERT classifier (green) to determine the moral valence (+ or −) of the post.
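As a rough illustration (not the authors' code), the two-class output layer and cross-entropy loss used to fine-tune Judge-BERT amount to the following; the BERT encoder, dropout layer, and optimizer are omitted:

```python
import math

def softmax(logits):
    """Convert the two class logits into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Cross-entropy loss for one example; true_class is 0 (+) or 1 (-)."""
    return -math.log(softmax(logits)[true_class])
```

With equal logits the model is maximally uncertain and the per-example loss is log 2 ≈ 0.693; fine-tuning drives the loss down by pushing the correct class's logit up.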
We evaluate our four classifiers using accuracy, precision, recall, and F1 metrics. In this context a false positive is an instance where the classifier improperly assigns a negative (i.e., asshole) label to a positive judgement; a false negative is an instance where the classifier improperly assigns a positive (i.e., non-asshole) label to a negative judgement. We perform 5-fold cross-validation and, for each metric, report the mean and standard deviation over the 5 folds.

The results in Table 3 indicate that the Doc2Vec, BERT, and Multinomial Naïve Bayes classifiers do not perform particularly well at this task. Fortunately, the fine-tuned Judge-BERT classifier performs relatively well, with an accuracy near 90% and roughly balanced type 1 and type 2 errors. Overall, these results indicate that the Judge-BERT classifier is able to accurately classify moral judgements.
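Under this convention (the negative, "asshole" class is the detection target), the four metrics can be sketched as follows; the function and label names are ours:

```python
def metrics(y_true, y_pred, target="neg"):
    """Accuracy, precision, recall, and F1, treating `target` as the
    detection class (here the negative moral-valence label)."""
    tp = sum(t == target and p == target for t, p in zip(y_true, y_pred))
    fp = sum(t != target and p == target for t, p in zip(y_true, y_pred))
    fn = sum(t == target and p != target for t, p in zip(y_true, y_pred))
    tn = sum(t != target and p != target for t, p in zip(y_true, y_pred))
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1
```

Reporting the mean and standard deviation over 5 folds then amounts to calling this once per held-out fold.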
Using the Judge-BERT classifier, our next tasks are to better understand moral judgement across a variety of online social contexts and to analyze various trends in moral judgement. In order to minimize the transfer-error rate it is important to select subreddit communities that are similar to the training dataset. In total we chose ten subreddits to explore in our initial analysis. These subreddits can be broken into three main stylistic groups and are briefly described in Table 4.

Table 3: Accuracy, precision, recall, and F1 (mean ± standard deviation over 5 folds) for the four classifiers. [Most numeric entries were not recoverable from the source; Doc2Vec accuracy was 65.92, and Judge-BERT accuracy is near 90%.]

Advice: Users pose questions in a scenario like the AITA dataset and receive advice or feedback on their situation. (/r/relationship_advice, /r/relationships, /r/dating_advice, /r/legaladvice, /r/dating)

Confessionals: Users confess to something that they have been keeping to themselves. Typically, confessions are about something immoral the poster has done. (/r/offmychest, /r/TrueOffMyChest, /r/confessions)

Conversational: Users engage in conversations with others to have a simple conversation or to hear other opinions in order to change their worldview. (/r/CasualConversation, /r/changemyview)

Table 4: Subreddits used for analysis of moral judgement.

We applied the Judge-BERT classifier to the comments and posts of these ten subreddits. Specifically, given a post and its comment tree we identified the top-level comment with the highest score. This top-rated comment, which has received the most upvotes from the community, is considered to be the one passing judgement on the original poster. As illustrated in Fig. 2, this top-rated comment is then fed to our classifier and the resulting prediction is used to label the moral valence of the post and poster. To be clear: we are not predicting the moral valence of the comment itself; rather, the top-rated comment is used to pass judgement on the post.
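The labelling step just described can be sketched as follows, assuming hypothetical (text, score) pairs for a post's top-level comments and any classifier function mapping text to '+' or '-':

```python
def community_judgement(top_level_comments, classify):
    """Label a post by classifying its highest-scoring top-level comment.

    top_level_comments: list of (text, score) pairs
    classify: function mapping comment text to '+' or '-'
    """
    text, _score = max(top_level_comments, key=lambda pair: pair[1])
    return classify(text)
```

In practice `classify` would be the Judge-BERT model; here any callable with the same interface works.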
Here we can begin to answer RQ2: Is moral valence correlated with the score of a post? In other words, do posts with positive moral valence score higher or lower than posts with negative moral valence? To answer this question, we extracted all posts from 2018, and their highest-scoring top-level comments, from each subreddit in Table 4.

Popularity scores on Reddit exhibit a power-law distribution, so mean scores and their differences would certainly be misleading. Instead, in Fig. 3 we plot, cumulatively as a function of post score, the ratio of comments judged to be positive against all comments.

Figure 3: Posts judged to have positive valence as a function of post score (log scale), per subreddit. Higher indicates more positive valence. Higher post scores are associated with more positive valence (Mann-Whitney tests, two-tailed, Bonferroni corrected).

Higher values in the plot indicate more positive valence. The results here are clear: post popularity is associated with positive moral valence. Most of the subreddits appear to have similar characteristics except for /r/CasualConversation, which has a much higher positive valence (on average) than the other subreddits. Mann-Whitney tests for statistical significance on individual subreddits, as well as the aggregation of these tests with Bonferroni correction, found that posts with positive valence have significantly higher scores than posts with negative valence (two-tailed).

We take the additional step of arguing that correlation does indeed imply causation in this particular case. Because posts are made before votes are cast, and because the text of a post is (typically) unchanged, and if we assume that scores are causally related to the text of the post, then the causal arrow can only point in one direction, i.e.
, posts with positive moral valence result in higher scores than posts with negative moral valence.

These findings appear to conflict with other studies that have shown how negative posts elicit anger and encourage a negative feedback loop on social media [2, 7]. A further inspection of the posts indicated that posts classified as having positive moral valence often found users expressing that a moral norm had been breached. The difference in our results compared to others may be explained by perceived intent, that is, whether or not the moral violation occurred from an intentional agent towards a vulnerable agent, cf. dyadic morality [27]. Our inspection of comments expressing negative moral judgement confirms that the perceived intent of the poster is critical to the judgement rendered. These negative judgements typically highlight what the poster did wrong and advise the poster to reflect on their actions (or sometimes simply insult the poster). Conversely, we find that many posts judged to be positive clearly show that the poster is the vulnerable agent in the situation relative to some other intentional agent. The responses to these posts often display sympathy towards the poster and also outrage towards the other party in the scenario. These instances are perhaps best classified as examples of empathetic anger [17], which is anger expressed over the harm that has been done to another. We also note that some of the content labelled as having positive moral valence is simply devoid of a moral scenario. Examples of this can primarily be seen in /r/CasualConversation, where the majority of posts are about innocuous topics.

Another possible explanation for our findings is that users on other online social media sites like Facebook and Twitter are more likely to like and share news headlines that elicit moral outrage; these social signals are then used by the sites' algorithms to spread the headlines further throughout the site [4, 15].
Furthermore, the content of the articles triggering these moral responses often covers current news events throughout the world. Our Reddit dataset, on the other hand, typically deals with personal stories and therefore tends not to elicit the same in-group/out-group reactions as those found in viral Facebook or Twitter posts.

Next we investigate
RQ3: Do certain subreddit communities attract users whose posts are typically classified by more negative or positive moral judgements? To answer this question we need to reconsider our unit of analysis. Rather than assigning moral valence to the individual post, in this analysis we consider the moral valence of the user who submitted the post. To do this, we again find all posts and comments of the ten subreddits and find the highest-scoring top-level comment; we classify whether that comment judges the post to have positive or negative moral valence and then tally this for the posting user.

Of course, users are also able to post comments and sub-comments, so we expand this analysis to include judgements of users from throughout the entire comment tree. Each comment can have zero or more replies, each with its own score. So, for each comment we identify the reply with the highest score and classify whether that reply judges the comment to have positive or negative moral valence, and then tally this for the commenting user. We do this for each comment that has at least one reply, at all levels in the comment tree.

Figure 4: Lorenz curve depicting the judgement inequality among users; Gini coefficient = 0.515.

By assigning moral valence scores to users we are able to capture all judgements across the ten subreddits and better understand user behavior. It is important to remember that the classifier classifies the moral valence of text (with some amount of uncertainty), not the user specifically. So we emphasize that we do not label users as "good" or "bad" explicitly; rather, we identify users as having submitted posts and comments that are similar to comments that previously received positive or negative moral judgement.

We include only users that were judged at least 50 times. Each user therefore has an associated count of positive and negative judgements.
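Given such per-user counts, judgement inequality can be quantified with a Gini coefficient; a minimal sketch over hypothetical counts of negative judgements per user:

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfect equality)."""
    xs = sorted(counts)                 # ascending, as in a Lorenz curve
    n, total = len(xs), sum(xs)
    # Standard closed form over the sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n
```

A value of 0.515, as reported in Fig. 4, indicates that negative judgements are heavily concentrated among a small fraction of users.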
This begs an interesting question: are some users judged more positively or negatively than others? What does that distribution look like? To understand this breakdown we first plot a Lorenz curve in Fig. 4. We find that the distribution of moral valence is highly unequal: about 10% of users receive almost 40% of the observed negative judgements (Gini coefficient = 0.515).

This clearly indicates that a handful of users receive the vast majority of negative judgements. To identify those users who receive a statistically significant proportion of negative judgements we perform a one-sided binomial test on each user. Simply put, this test emits a negativity probability, i.e., the probability (p-value) that the negativity of a user is not due to chance.

Finally, we can illustrate the membership of each subreddit as a function of users' negativity probability. As expected, Fig. 5 shows that as we raise the negativity threshold from almost certainly negative towards uncertainty (from left to right) we begin to increase the fraction of comments observed. These curves therefore indicate the density of comments that are made by negative users (for varying levels of negativity); higher lines (especially on the left) indicate a higher concentration of negativity. We find that /r/confessions, /r/changemyview, and /r/TrueOffMyChest contain a higher concentration of comments from more-negative users. On the opposite side of the spectrum, we find that /r/CasualConversation and /r/legaladvice have deep curves, which implies that these communities have fewer negative users than others.

Figure 5: Number of comments (normalized) as the negativity threshold is raised; the fraction of comments revealed tends towards 1. Higher lines indicate a higher concentration of negative users and vice versa.
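The negativity probability described above is an exact one-sided binomial tail. A sketch, assuming a null rate of 0.5 (the paper does not state the null used):

```python
import math

def negativity_pvalue(neg, total, p0=0.5):
    """P(X >= neg) for X ~ Binomial(total, p0): one-sided tail probability
    of observing at least `neg` negative judgements out of `total`."""
    return sum(
        math.comb(total, k) * p0**k * (1 - p0)**(total - k)
        for k in range(neg, total + 1)
    )
```

A user judged negatively 40 times out of 50, for example, would receive a tiny p-value, marking their negativity as unlikely to be due to chance.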
We select our group of statistically significantly negative users as those found to have a p-value less than 0.05 in our one-tailed binomial test. Within this group we investigated their posting habits to determine what types of posts garner such a large number of negative judgements. From our analysis of these users we determined that they fall into three different stylistic groups:

1. Explainer: Users who argue that what they did isn't that wrong.
2. Stubborn Opinion: Users who do not acquiesce to the prevailing opinion of the responders.
3. Returner: Users who repeatedly post the same situation hoping to elicit more-favorable responses.

The first type of user that we observe is the Explainer. The Explainer typically makes a post and receives comments that condemn their immoral actions. In response to this judgement, the Explainer replies to many of the comments in an attempt to convince others that what they did was in fact moral. Often, this only serves to exacerbate the judgements made against them, which leads to further negative judgements. In fact, we found that many of these users have made only a handful of posts, each with a large number of comments in self-defense. The large number of users that respond to these comments with negative judgements is similar to the effect of online firestorms [24], but at a scale contained to an individual post. For these types of posts we also note that some people do come to the defense of the poster, which follows similar findings that people show sympathy after a person has experienced a large amount of outrage [26].

The second type of user we observe is the Stubborn Opinion user. These users are similar to but opposite from the Explainers. Rather than trying to change others' minds, the Stubborn Opinion user refuses to acquiesce to the prevailing opinion of the comment thread. For example, users posting to /r/changemyview who do not express a change of opinion despite the efforts and agreement of the commenting users often incur comments casting negative judgement. This back-and-forth sometimes becomes hostile. Many of these conversations end in personal attacks from one of the participants, which has also been shown in previous work on conversations derailing in /r/changemyview [5].

The third type of user is the Returner. The Returner seeks repeated feedback from Reddit on the same subject. For example, when Returners make posts seeking moral judgement, they will often engage in some of the discussion and may even agree with some of the critical responses. Some time later, the user returns and edits their original post or makes another post providing an update about their situation. An example of a Returner is illustrated in Figure 6.

Figure 6: A diagram showing the posting habits of a Returner. Posts are shown in light blue boxes, with blue arrows representing the order of posts: "How noticeable is her belly bump?", "Why is she so different now?", "Why is she so distant?", "Her friends are asking about it.", "I'm moving on." Example responses (white boxes, with red arrows pointing to the post they respond to) include: "You really are insecure about her losing her hot body.", "You think her being stressed, tired and sick is bullshit?", "Your other posts don't make you seem as eager to be involved.", "Neither of you are mature enough to deal with a baby.", "Quit being a dumbass and deal with your responsibilities." Each post is prefaced with the overarching title "Me and my partner are having a baby." followed by the current update on the situation. The response comments have been condensed from their full length.
In this case, a user continues to request advice after recently impregnating their partner. In these situations responding users often find previous posts on the same topic made by the same user, and then use this information and highlight commentary from the previous posts to build a stronger case against the user, or to highlight how the new post is nothing but a thinly veiled attempt to shine a more favorable light on their original situation. These attempts usually backfire and result in more negative judgements being cast against the user.

Table 5: Contingency tables of gender (male/female) by judgement (positive/negative) for /r/relationship_advice (φ = 0.07) and /r/relationships (φ = 0.09). [Cell counts were not recoverable from the source.]

Our final task investigates
RQ4: Are self-reported gender and age descriptions associated with positive or negative moral judgements? Recent studies on this topic have found that gender and moral judgements have a strong association [23]. Specifically, women are perceived to be victims more often than men, and harsher punishments are sought for men. The rates at which men commit crimes tend to be higher than the rates of female crime, and society generally views crimes as moral violations [6]. If we apply these recent findings to our current research question, we expect to find that male users will be judged negatively more often than female users.

This analysis is not usually possible on public social media services because gender and age are not typically revealed, especially where anonymous posting is allowed. Fortunately, the posting guidelines of /r/relationships and /r/relationship_advice require posters to indicate their age and gender in a structured manner directly in the post title: a poster uses a tag like [M27] to indicate that they identify as male aged 27 years, and [F25] to indicate that their partner identifies as female aged 25 years. Using these conventions we are able to reliably extract users' self-reported age and gender.

We again apply our Judge-BERT model to assign a moral judgement to the post based on the top-scoring comment. In total we extracted judgements from 508,560 posts on /r/relationship_advice.
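Extracting these tags can be sketched with a regular expression; the pattern below is our assumption (it accepts either ordering, e.g. [M27] or (27F)), not the authors' exact rule:

```python
import re

# Match bracketed or parenthesized gender/age tags in either order,
# e.g. "[M27]", "(27F)", "[f 19]".
TAG_RE = re.compile(r"[\[\(](?:([MFmf])\s?(\d{1,2})|(\d{1,2})\s?([MFmf]))[\]\)]")

def parse_tag(title):
    """Return (gender, age) for the first tag in a post title, or None."""
    m = TAG_RE.search(title)
    if not m:
        return None
    if m.group(1):  # gender-first form, e.g. [M27]
        return m.group(1).upper(), int(m.group(2))
    return m.group(4).upper(), int(m.group(3))  # age-first form, e.g. (27F)
```

Here only the first tag is taken as the poster's own description; partner tags later in the title would need separate handling.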
Table 6: Logistic regression of moral judgement on gender and age, with 95% confidence intervals, for /r/relationship_advice (constant = -1.1575) and /r/relationships (constant = -1.0923). [The remaining coefficients, p-values, and confidence intervals were not recoverable from the source.]

We first test the association between gender and moral judgement with a χ² test of independence; contingency tables for this test are reported in Table 5. The χ² test reports a significant association between gender and moral judgement in both /r/relationships and /r/relationship_advice. However, the χ² test on such large sample sizes usually results in statistical significance; in fact, the χ² test tends to find statistical significance for populations greater than 200 [28]. So we verify this association using φ, which measures the strength of association controlled for population size. In this case, φ = 0.09 for /r/relationships and φ = 0.07 for /r/relationship_advice. These low values indicate that there is only a small association between gender and moral judgement.

Our second task is to determine whether gender and age together are associated with moral judgement. In other words, are young females, for instance, judged more positively than, say, old males? To answer this question, we fit a two-variable logistic regression model where the binary variable gender is encoded as 0 for female and 1 for male.

We report the findings from the logistic regression for each subreddit in Table 6. These results indicate that males are judged more negatively than females. Specifically, in /r/relationship_advice being male is associated with a 35% increase in receiving a negative judgement; similarly, in /r/relationships being male is associated with a 46% increase.

We also find that age has a relatively small effect on moral judgement; increased age is slightly correlated with negative judgement. Specifically, in /r/relationship_advice an increase in age of one year is associated with a 0.59% increase in receiving a negative judgement.
In /r/relationships an increase in age of one year is associated with a 0.34% increase in receiving a negative judgement. Simply put, those who are older and those who are male are (independently) statistically more likely to receive negative judgements on Reddit than those who are younger and female, although gender is much more of a contributing factor than age, and neither association is particularly strong.
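The φ coefficient used here for the 2×2 gender-by-judgement tables can be computed directly from the cell counts; a minimal sketch with hypothetical counts:

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 contingency table [[a, b], [c, d]]
    (rows: female/male; columns: positive/negative judgement)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
```

Unlike the raw χ² statistic, φ does not grow with sample size, which is why small values such as 0.07 and 0.09 signal a weak association despite highly significant p-values.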
In this study, we show that it is possible to learn the language of moral judgements from text taken from /r/AmITheAsshole. We demonstrate that by extracting the labels and fine-tuning a BERT language model we can achieve good performance at predicting whether a user is rendering a positive or negative moral judgement. Using our trained classifier we then analyze a group of subreddits that are thematically similar to /r/AmITheAsshole for underlying trends. Our results showed that users prefer posts that have a positive moral valence rather than a negative moral valence. Another analysis revealed that a small portion of users are judged to have substantially more negative moral valence than others, and that they tend towards subreddits such as /r/confessions. We also show that these highly negative moral valence users fall into three different types based on their posting habits. Lastly, we demonstrate that age and gender have a minimal effect on whether a user is judged to have positive or negative moral valence.

Although the Judge-BERT classifier enabled us to perform a variety of analyses, it does have some limitations. We are unable to verify whether the classifier generalizes well to the other subreddits in our study, and the test subreddits do deviate from the types of moral analysis observed in the training data; moral judgement is not the focus of /r/CasualConversation, for example.

In the future we hope to implement argument mining in order to gain a better understanding of the reasons for these judgements by extracting the underlying arguments given by users. Other works have done this by extracting rules of thumb through human annotation [11], but this limits the ability to perform a large-scale analysis. Argument mining has shown success at extracting persuasive arguments from subreddits like /r/changemyview [10] and would enable us to get a better understanding of moral judgements on social media.
Thiswould also allow us to aggregate the underlying themes fromthese judgements for further analysis.
Acknowledgements
We would like to thank Michael Yankoski and Meng Jiang for their help preparing this manuscript. This work is funded by the US Army Research Office (W911NF-17-1-0448) and the US Defense Advanced Research Projects Agency (DARPA W911NF-17-C-0094).