Analysing Meso and Macro conversation structures in an online suicide support forum
Sagar Joglekar, Sumithra Velupillai, Rina Dutta, Nishanth Sastry
King’s College, Department of Informatics, London, UK
King’s College London, IoPPN, London, SE5 8AF, UK
* [email protected]
ABSTRACT
Platforms like Reddit and Twitter offer internet users an opportunity to talk about diverse issues, including those pertaining to physical and mental health. Some of these forums also function as a safe space for severely distressed mental health patients to get social support from peers. The online community platform Reddit’s SuicideWatch is one example of an online forum dedicated specifically to people who suffer from suicidal thoughts, or who are concerned about people who might be at risk. It remains to be seen if these forums can be used to understand and model the nature of online social support, not least because of the noisy and informal nature of conversations. Moreover, understanding how a community of volunteering peers reacts to calls for help in cases of suicidal posts would help to devise better tools for online mitigation of such episodes. In this paper, we propose an approach to characterise conversations in online forums. Using data from the SuicideWatch subreddit as a case study, we propose metrics at a macroscopic level – measuring the structure of the entire conversation as a whole. We also develop a framework to measure structures in supportive conversations at a mesoscopic level – measuring interactions with the immediate neighbours of the person in distress. We statistically show through comparison with baseline conversations from random Reddit threads that certain macro and meso-scale structures in an online conversation exhibit signatures of social support, and are particularly over-expressed in SuicideWatch conversations.
Introduction
Suicide is responsible for 1.5% of global mortality and is one of the most challenging public mental health issues. Suicidality includes any thoughts or actions by an individual that could result in death. Preventing death by suicide is a priority for health care services internationally, but poses a great challenge since accurately predicting an episode of suicidality is almost impossible. Furthermore, many deaths by suicide occur in people who did not have a known diagnosed mental health condition when they died. Our ability to understand suicide has therefore been hampered by our limited ability to obtain data “in situ”.
Platforms such as Reddit and Twitter are starting to offer a new and uniquely transparent window into suicide and other mental health issues. Unlike traditional health records, social media posts are authored by the users themselves. Also, in contrast to formal clinical settings, users on such platforms express themselves freely rather than regulating answers to establish a positive impression or be socially desirable, thus providing a fresh and honest perspective. Social media have therefore become a fertile ground for mental health studies, leading to new results in depression, anxiety, autism, and other problems.
Recent studies have shown promising results in modeling and measuring signals and patterns in Reddit communities related to mental health. For instance, statistical relations of mental health and depression communities with suicide ideation have been studied.
The authors explored linguistic and social characteristics that evaluate users’ propensity to suicidal ideation. Approaches to classify Reddit posts as related to certain mental health conditions have also been successfully developed, showing that there are certain characteristics specific to mental health-related topics in posts that can be automatically captured. Furthermore, in a study focused on Reddit posts related to anxiety, depression and post-traumatic stress disorder, the authors show that these online communities exhibit themes of a supportive nature, e.g. gratitude for receiving emotional support. Positive effects of participation in such fora have also been shown by improvements in members’ written communication. The supportive nature of comments in the SuicideWatch forum has also been studied by automatic identification and classification of helpful comments with promising results. Naturally, several studies that have been based on these types of online communities look at the textual content of these online fora and produce inferences about psychological states. In our work, we conjecture that apart from textual metrics, it is important to quantify the differences in the structure of a supportive conversation.
In the context of suicide, social media occupies an important “clinical whitespace” – long intervals between clinical encounters that are filled with frequent posts on social media. These provide the potential for increased visibility into patient mental states. Studies have started to use social media posts both to understand population level responses to external triggers such as celebrity suicides and as a means to assess suicide risk.
While these are important first steps, such studies could be affected by the very nature of social media: it is unregulated, and problems such as cyberbullying could potentially affect the variables being studied.
For instance, risk of suicide may be exacerbated just by participating on online social platforms. Therefore, we believe it is crucial to understand first the nature of the conversations that happen online around suicide and suicidal behaviour. This paper aims to answer the question: Does social media activity provide a supportive medium for potentially vulnerable patients? We address this by studying the interactions of users on SuicideWatch, an online community (“subreddit”) on the social media platform Reddit. SuicideWatch is a heavily moderated forum, keeping messages and conversations on topic, and is focussed solely on the topic of suicide. The moderators take the message of peer support seriously, and are governed by guidelines that prohibit false promises, abuse, tough love and other clinically concerning methods of conversation. It has therefore been the focus of recent research and shared tasks which aim to advance the state of the art.
In contrast with previous studies which only looked at original posts on SuicideWatch, we look at the entire conversation “thread”. In other words, we start with the content from the Original Poster (OP), but we also include the hierarchically nested thread of replies to the original post, the replies to those replies, and so on. Further, most previous studies have aimed at studying the content of posts and their characteristics in relation to other posts. But an important aspect of online communities is their supportive function, where users can turn to these platforms not only to express their thoughts and concerns, but also to receive support from the community. This support often manifests as an emergent conversation between many users and the one in distress.
Hence here we propose a framework that captures the structure of a conversation thread, and develop metrics that capture the macroscopic properties of a conversation, which involve the entire thread and the users participating in it, as well as the mesoscopic properties of a conversation, which involve the immediate interactions with the one in distress.
To model the conversation structure, we represent conversations in a forum using two graph-based abstractions: user interaction graphs, which model the user-to-user exchange of messages, and reply graphs, which capture the structure of the dialogue on the forum (see the example in Figure 1). The complete processing pipeline can be seen in Figure 2. We describe the pipeline and the metrics in detail in the Methods section. We then propose metrics that quantify the macroscopic structure of the two graphs we construct:
• Responsiveness measures how quickly the responses accumulate in the reply graph;
• Centrality of the OP measures how important the OP is to the conversation thread, by computing the betweenness centrality of the OP in the user graph;
• Reciprocity measures the extent to which users obtain replies to their posts, by computing the fraction of edges in the user graph that are bidirectional;
• Branching factor measures how the reply graph fans out, i.e., the number of replies a post obtains.
To measure local or mesoscopic structure, we turn to network motifs. We propose a new method to count and characterise local structures, called anchored triadic motifs. Triadic motifs traditionally consider three nodes at a time. Given the primacy of the OP, our method distinguishes variants based on where the OP is situated in a triad, to understand how the local patterns of communications support the OP.
Figure 1. Figure 1a shows a sample reply graph constructed from a real thread in SW that contains 8 posts by 5 unique users. Each node represents a post and a directed edge is drawn from one node to another node when the first node is a reply to the second node. Thus, for example, Node 1 is the original post, with four replies (posts 2, 3, 4 and 5). Each node is given a colour based on the author of the post that the node represents, and each distinct colour represents a distinct author. Thus, from the reply graph, we can deduce that the original poster (red node) obtained replies from the blue, green, yellow and purple users. In turn, the red node replied back to the purple and yellow nodes, but not to the blue and green nodes. The entire list of directed interactions is captured in a user interaction graph in Figure 1b, where each coloured node represents the corresponding user who wrote a post on the thread, and the directed edges represent the replies.
In summary, this paper makes four key contributions:
• We develop a framework that abstracts out both the structure and semantics of a threaded conversation on the web.
• Using this abstraction, we develop metrics which quantify the macroscopic (thread-wide) properties of conversations on SuicideWatch.
• We develop a new method, which we term anchored triadic motifs, to understand the mesoscopic or local structure of SuicideWatch conversations using triadic network motifs. Our method adapts triads by anchoring on the position of the Original Poster (OP), which distinguishes the OP from the other posters and helps us understand how the conversation supports the OP’s needs.
• We show that there are significant statistical differences, both in the macroscopic and mesoscopic realm, that differentiate a SuicideWatch conversation from a generic conversation.
Results
Figure 2. A Reddit thread is converted into abstractions (reply graphs and user interaction graphs). Macroscopic and mesoscopic analysis is performed on these graphs, and statistical over- or under-representation of these metrics is evaluated. Pipeline stages shown in the figure: Graph Abstractions (Reply Graphs, User interaction Graphs); Macro Analysis (Responsiveness, Reciprocity, Centrality, Branching); Meso Analysis (Anchored Triadic Motifs).
Reddit is a platform where a user can create a post or reply to a root post (RP) submitted by an original poster (OP) in a subreddit, and other Reddit users can interact by posting at different levels of the thread, or by up or down voting posts. We analyzed RPs in the SuicideWatch subreddit (SW), building on the work of Gkotsis et al. We crawled SW to get entire conversation threads, iteratively pursuing each conversation at progressively deeper levels of replies until the whole thread had been obtained. The code to crawl Reddit for threads can be found at https://github.com/sagarjoglekar/redditTools. This resulted in a dataset of 50,754 SW threads totalling 419,555 individual posts. To provide a baseline against which to compare the nature of conversations on the SW subreddit, we acquired a similar number (49,773) of baseline threads from any other subreddit popular enough to land on the front page (FP). This resulted in a baseline dataset of 3,011,765 posts. Further details on how these were acquired are presented in the Methods section. We compare the two corpora – SW and FP – at two scales: first, a macroscopic analysis that considers features of entire threads; second, a mesoscopic analysis that considers local structural relations between nodes and their neighbours within the user graphs corresponding to each thread. Our analysis finds several factors that distinguish SW conversations from FP conversations.
Macro Analysis of SW and FP Conversations
Responsiveness: Users respond faster on SW than on other subreddits
To understand how responses on SW compare to other subreddit threads on FP, we calculate the differences between the posting times of consecutive messages in a reply graph. The time that elapses between successive messages, i.e., the inter-message time, is taken as an indication of how responsive a thread is. Figure 3a shows a comparison of inter-message response times for SW and FP threads, using the empirical Cumulative Distribution Function (CDF). Given a point (x, y) in a CDF, y should be interpreted as the frequentist probability that a particular variable (the inter-message response time in this case) is less than or equal to the value x. Thus, Figure 3a shows that responses in SW tend to be much faster than in other subreddits, suggesting that the community sees a need for urgency in responses.
Reciprocity: Interactions on SW are more likely to be bidirectional
Next, we look at whether two users talk with each other – i.e., if user A replies to a user B, does B reply back to A? Figure 3b plots the empirical CDF of the fraction of posts which are reciprocated, showing a vast difference – conversations on SW are much more reciprocal than on other subreddits: the median value of $U_{sym}$ is 50% for SW, whereas it is 2.6% for FP.
Centrality: OP is more central in SW conversations
To understand whether and to what extent SW conversations revolve around the OP (who may have posted in distress), we consider the user interaction graph of each thread, and plot the betweenness centrality of the OP. Betweenness centrality of a node measures how often that node is on the shortest path connecting two other nodes, and as such, is a measure of how central the node is in the graph. Figure 3c shows the empirical CDFs of centralities. It can be seen that the OP has much higher betweenness centrality in SW conversations than in FP conversations.
Figure 3. This panel shows the cumulative distributions (CDFs) of macroscopic features for the SuicideWatch subreddit data (SW, in blue). These are compared with the control dataset of generic conversations on Reddit from the FrontPage (FP, in green). 3a depicts results for Urgency; 3b for Reciprocity; 3c for the OP’s centrality in an interaction graph; and 3d for Branching. SW conversations score higher on reciprocity, urgency and semantic alignment than FP. The SW conversations tend to branch less and tend to have higher centrality when compared to FP. Figure 3e represents the median completion times of the three motifs over-expressed in SW, where the OP is at the apex (most central) position. This plot shows that as time goes by, the symmetric nature of interaction between the OP and those who engage with them increases.
Branching: SW conversations branch out considerably less compared to FP
We next measure the number of responses or branching factor of the reply graph, using the formula described in the Branching Factor section. Figure 3d shows that SW threads branch out considerably less than threads on other subreddits, which could be indicative of the first few replies satisfying the need for response embedded in the posts they are replying to (e.g., if the post is a query, the initial replies could be providing all the information asked; if the post is a call for help, the initial replies could be providing the necessary level of support).
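The comparisons in Figure 3 are read off empirical CDFs. As a minimal sketch of that reading (the sample values below are invented for illustration and are not taken from the SW or FP data), the function `ecdf` returns, for any x, the fraction of observations that are less than or equal to x:

```python
from bisect import bisect_right


def ecdf(samples):
    """Return F, where F(x) is the fraction of samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


# Invented per-thread inter-message times in minutes (illustration only).
sw_times = [2, 3, 5, 8, 13]
fp_times = [4, 9, 20, 45, 120]
F_sw, F_fp = ecdf(sw_times), ecdf(fp_times)
print(F_sw(8), F_fp(8))  # 0.8 0.2
```

A curve that lies higher at the same x, as `F_sw` does here, corresponds to a distribution concentrated on smaller values, which is how "faster responses" would appear in a plot such as Figure 3a.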
Mesoscopic analysis: Patterns in local interactions
It is often useful to express large interaction graphs as the sum of local interactions between two or three nodes at a time. This method is prevalent in the Social Sciences, for studying social structures by looking at local interactions between agents. Such analysis is also useful in expressing local structures in large graphs and has been used in several network analysis works. For this reason we conduct a census of the 36 anchored triadic motifs (see Figure 4a), using methods further described in the section Mesoscopic graph metrics: Anchored triadic motifs, across all the selected graphs. Anchored motifs extend the concept of triadic network motifs by distinguishing different variants based on the position of a special node, which we take here to be the Original Poster (OP) who started the thread. By distinguishing the OP’s position, we are able to reason about how a particular motif may help serve the needs of the OP. Commonly, motif analysis compares the occurrence of each triad in a real network against a baseline, for instance a null model created using generative processes (e.g. random graphs). In this case, we compare the motifs seen in SuicideWatch against the set of all graphs that belong to generic conversations from the Frontpage (FP). We perform binning of user graphs as described in the Methods section, and perform over- or under-expression analysis in comparison with the motif census performed on FP as the baseline null model. We use Z-scores of the motif occurrences as a metric to measure statistical significance. We are interested in anchored motifs which are present in significant numbers and which also show strong over- or under-expression. We classify a motif population as significant if the mean motif population goes above 10 for any of the 7 bins. We consider a motif over- or under-expressed if the Z-score is either greater than 1 or less than -1 for at least 1 bin. A motif which has a significant mean population but a Z-score between -1 and 1 is considered equally expressed. Figure 4 shows all the 8 motifs which are statistically significant and over- or under-expressed.
We find that anchored motif variants are significantly over-expressed in SW conversations across all sizes of graphs, as seen from Figures 4b–4g. Similarly, anchored motif variants are significantly over-expressed in the null model (FP) graphs across all sizes.
We look at the median completion times for 3 of the 5 over-expressed motifs (021U-a, 111D-b and 201-b), by plotting the median age of the last established edge in the motif as a fraction of the entire lifetime of the thread (Figure 3e). These three motifs share a peculiar property in that they all have the OP at the apex (most central) position. We observe that as time goes by, the number of symmetric edges between the OP and those who engage with them increases.
From previous studies on triadic structure, it was inferred that transitive triads are naturally more common than expected in social structures of apes and humans. Interestingly, our analysis shows that transitive triads are rarer in SW, as compared with the FP conversations. These patterns in local interactions indicate that conversations in SW tend to be more OP centric, with non-transitive dialogues between the OP and users who respond to their calls for help. As a consequence, the OP tends to be highly central in the conversation as well as part of several mutual interactions. These behaviours are characteristic of SW, i.e., they are observed more in SW than in conversations on other subreddits (FP).
Discussion
The results show that there are several factors which distinguish a SuicideWatch (SW) conversation from a comparative baseline of conversations gathered from the front page (FP) of Reddit. Based purely on the structure of the conversations, we identified four clear differences between the macroscopic features of SW and FP. Responses to the OP are faster in SW than in FP, which is as would be intuitively expected of a subreddit set up as a place of support for “vulnerable OPs”. Just as in group therapy, it is individual clients and their larger relationship within therapy that is the agent of change, and the same is reflected in this peer support forum. Features of relational communication are that SW shows more symmetry and reciprocity than FP, and the OP is central to the communication. Studying the interlocking and reciprocal effects of each interactor on the other has also been key to understanding “therapy as a system” in face-to-face therapeutic encounters. What is radically different from a clinical context is that the posters are not healthcare seeking and may be on SW precisely because they are seeking alternative support, and there is no ‘professional’ facilitating the discussion. The closest analogy in a live group setting is “fishbowls” (used in certain group counselling courses), where there is an inner ring of discussants (the OP and other posters on the thread) who are observed by an outer ring of observers (in SW, a parallel may be drawn with the moderators, who manually examine comments, delete those that threaten or violate the thread’s specified codes, and ban trolls).
In communication accommodation theory (CAT), which was developed for face-to-face conversations between two people (a dyad) but has now been extended to mediated dyadic discussions (e.g. on Twitter) without temporal immediacy, the concept of accommodation has two opposite forms: convergence and divergence. Convergence is mimicking of the conversational partner’s style and divergence is avoidance of the style. This phenomenon may be reflected in SW by less branching or digression in the conversation thread compared to FP.
At a mesoscopic level, the most striking feature of the anchored triadic motifs which occur in statistically significant numbers (shown with a solid, non-hatched background in Fig. 4a) is that none of them involve all three nodes, suggesting that dyadic communications (e.g., providing an answer to a question) are the primary focus. Of course it would be extrapolating to assume this is supportive communication, and there would need to be further qualitative research into the content of the threads that demonstrated these motifs.
Similarly, when considering the anchored triadic motifs which are under-expressed in both the FP and SW datasets (grey hatched in Fig. 4a), it is worth noting that although these are statistically rare in the current work, they could be worth exploring in other datasets.
Both of the anchored triadic motifs that are over-expressed in the baseline FP conversations, and by comparison under-expressed in SW (shown in green in Fig. 4a), show non-conversational, non-reciprocal patterns: serial communications between respondents to an OP (021C-c) and a unidirectional response to an OP from one respondent (012-b). In contrast, those over-expressed in SW (shown in red) have two arrow heads pointing towards the apex nodes, suggesting that communication is directed towards one participant. Except for 021U-a and -b, the other motifs over-expressed in SW all have at least one bidirectional conversation, reflecting the high levels of reciprocity in SW.
Internet health forums have been studied in several instances and their utility has been shown to be of value in cases of chronic illnesses, addictions and mental health issues. However, most of these studies have focused on quantitatively analysing the content discussed and the linguistic signatures of how these communities interact. Here, we have instead focused on developing ways to quantitatively analyse the structure of online communication, and study how and whether this structure reveals patterns of peer and community support. To that end, this is the first attempt at finding topological discriminatory factors between supportive and generic conversations on social media forums. Our focus on structure rather than content means that our methods can potentially be extended to other languages more easily.
The public health implications of this work are that the distinctive supportive network structures and the content of their posts should be studied in more detail to investigate what works well and why. This could help educate peer moderators to have a better overview of the subreddits they moderate and the ongoing conversation. Topological features could be used in addition to the community signals they already use, such as numbers of upvotes or downvotes, or referring to comments flagged by community members. Similarly, studying the less supportive motifs could lead to insights into why certain interactions are unhelpful, and might allow automated detection of such interactions so that moderators are able to moderate such comments in a more timely fashion. The results obtained could also be used as a selection strategy for purposefully sampling more supportive networks. We believe that the novel framework for macro and meso analysis of supportive online communities we present here can provide important directions for future research in this area.
Figure 4. Figure 4a shows the 36 different types of anchored triadic motifs which are statistically compared between FP and SW graphs. The motifs with green boxes are over-expressed in the baseline dataset (FP) by a significant amount. The motifs with red boxes are over-expressed in the SuicideWatch (SW) dataset by a significant amount. The motifs with grey boxes are present in significant numbers in both datasets, but neither over- nor under-expressed in any dataset based on their Z scores. The motifs in grey hatched boxes are very rare in both the baseline and SuicideWatch datasets, with less than 5 mean occurrences per graph per bin. Figures 4b–4i show the side by side comparison of motif occurrences for SW and FP across different bins for motifs that are either over- or under-expressed (i.e., coloured green or red in Fig. 4a). The Z-score from the comparison is plotted as a blue trace, alongside the mean population of the motif in both SW and FP in a selected bin. For completeness, we report the results for the remaining motifs in the supplementary material.
Figure 5. Degree distributions of (a) user graphs and (b) reply graphs in FP and SW, showing that the two datasets are comparable.
Methods
Background on Reddit conversations
Building on the brief descriptions in the previous sections, here we provide a more detailed background of Reddit conversations. In most forum-based platforms such as Reddit, users interact in a nested dialogue fashion, where an Original Poster (OP) posts content called a Root Post (RP) to start a new discussion thread. This thread is then open for comments by all the community users. In the case of Reddit, such a community is called a subreddit. Subreddits like SuicideWatch consist of a moderated collection of posts from users who subscribe to that community. These users may post new threads onto the subreddit as long as the post follows the subreddit rules. Enforcement of these rules is the responsibility of the moderators.
Datasets
The focus of our work is the subreddit r/SuicideWatch. We study this using a seed dataset that consists of root posts from the subreddit r/SuicideWatch. Building on this dataset, we acquire the entire thread structure of all the root posts in the data by recursively obtaining all replies to the root posts, replies to those replies and so on, until we reach posts which do not have any further replies. This results in the acquisition of 50,754 threads from SuicideWatch (SW). To obtain a baseline of similar size for comparison purposes, we crawl the entire conversation threads of posts that appear on the front page of Reddit.com for 2 weeks, accumulating a second corpus (FP) of 49,773 Reddit threads in the process. The two conversation datasets from r/SuicideWatch and Frontpage are very similar in terms of common summary statistics such as degree distributions (see Fig. 5). Owing to the long-tailed nature of the datasets, we perform our analysis on threads which have at least 5 posts in addition to the root post. We further clean the data by removing threads where the root author has deleted their user account, which is a common practice to preserve anonymity in more controversial posts. The resulting dataset has 10,527 threads in SW and 11,070 threads in the baseline (FP).
Abstractions
To understand the dynamics of supportive conversations, we develop two abstractions:
Reply Graphs
To mimic the structure of conversation threads on Reddit we define an abstraction that we term reply graphs, and denote as $R = (P, E)$. The nodes $P$ consist of all the posts in the thread. The root post is labeled as $p \in P$ and the $k$-th post in chronological order as $p_k \in P$. A directed edge $(p_i, p_j) \in E$ is drawn from $p_i$ to $p_j$ if $p_i$ is a reply to $p_j$, so that the in-degree of a post equals the number of replies it receives. Figure 1a presents an example. Note that in platforms such as Reddit, where each response can only reply to one other post, reply graphs end up being reply trees.
User interaction Graphs
The second abstraction, which we term user interaction graphs, represents each thread as a directed graph $G = (V, E)$ where $V$ is the set of all users participating in a particular thread and a directed edge $(v_i, v_j) \in E$ is drawn from user $v_i$ to user $v_j$ if $v_i$ responds to a post by $v_j$. Figure 1b presents an example. Note that unlike reply trees, user interaction graphs of Reddit conversations can be general directed graphs, and may include cycles.
Macroscopic graph metrics
The abstractions are used to extract the following structural metrics from the conversation threads. These metrics are then used to validate structural differences between supportive conversations and generic casual conversations from our baseline set.
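Before turning to the individual metrics, the two abstractions can be sketched in a few lines. This is an illustrative sketch rather than the authors' pipeline: the `Post` record and its field names are hypothetical, the thread is invented (loosely modelled on the description of Figure 1), edges follow the Figure 1 convention of pointing from a reply to the post it answers, and ignoring self-replies in the user graph is an assumption.

```python
from collections import namedtuple

# Hypothetical record type; parent is None for the root post.
Post = namedtuple("Post", ["post_id", "author", "parent", "created"])


def build_reply_graph(posts):
    """Return reply-graph edges (reply_id, parent_id), one per non-root post."""
    return [(p.post_id, p.parent) for p in posts if p.parent is not None]


def build_user_graph(posts):
    """Return the set of user-graph edges (replier, replied_to)."""
    author_of = {p.post_id: p.author for p in posts}
    edges = set()
    for p in posts:
        if p.parent is None:
            continue
        target = author_of[p.parent]
        if target != p.author:  # assumption: self-replies are ignored
            edges.add((p.author, target))
    return edges


# Invented thread: 8 posts by 5 users; "created" is minutes since the root.
thread = [
    Post(1, "red", None, 0),
    Post(2, "blue", 1, 3),
    Post(3, "green", 1, 5),
    Post(4, "yellow", 1, 10),
    Post(5, "purple", 1, 12),
    Post(6, "red", 4, 20),
    Post(7, "red", 5, 21),
    Post(8, "purple", 7, 30),
]
reply_edges = build_reply_graph(thread)
user_edges = build_user_graph(thread)
print(len(reply_edges), len(user_edges))  # 7 6
```

Because every non-root post has exactly one parent, `reply_edges` describes a tree, while `user_edges` can contain cycles such as red → purple → red.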
Responsiveness
To understand the speed with which users in a subreddit (whether in SW or in other subreddits as represented in FP) respond to the OP and each other, we calculate the differences between the posting times of consecutive response messages in a reply graph. We then compute the median response time per thread.
Reciprocity
Reciprocity measures the extent to which users’ posts obtain responses, and is measured as the fraction of edges in the user graph that are bidirectional:
$$U_{sym} = \frac{\text{total number of bidirectional edges in the user graph}}{\text{total number of edges in the user graph}}$$
OP Centrality
Node centrality is a metric that measures how central a node is in a network. We study how central the OP is to a conversation thread by computing how often it lies on the shortest paths between other pairs of nodes in the user interaction graph. Betweenness centrality is formally defined as
$$g(OP) = \sum_{s \neq OP \neq t} \frac{\sigma_{st}(OP)}{\sigma_{st}}$$
where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$, and $\sigma_{st}(OP)$ is the number of those paths that pass through the OP.
Branching Factor
Branching factor is a metric that reflects the fanning out of a conversation as it evolves. To measure this on Reddit reply trees, we compute the average number of replies obtained by each post, i.e., the average in-degree of nodes in each reply graph.
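The four macroscopic metrics above can be sketched as follows. This is a hedged illustration, not the authors' code: all function names and the toy data are invented; "consecutive" messages are assumed to mean chronologically adjacent posts within a thread; each direction of a reply between two users is counted as one edge for $U_{sym}$; and betweenness is computed unnormalised with Brandes' algorithm. Read literally, the average in-degree of a reply tree with n posts is always (n − 1)/n, so `branching_factor` also exposes an assumed variant (`replied_only=True`) that averages only over posts that received at least one reply.

```python
from collections import defaultdict, deque
from statistics import median


def median_response_time(timestamps):
    """Median gap between chronologically consecutive posts of one thread."""
    ordered = sorted(timestamps)
    return median(b - a for a, b in zip(ordered, ordered[1:]))


def reciprocity(user_edges):
    """U_sym: share of directed user edges whose reverse edge also exists."""
    edges = set(user_edges)
    mutual = sum(1 for u, v in edges if (v, u) in edges)
    return mutual / len(edges)


def branching_factor(reply_edges, n_posts, replied_only=False):
    """Average in-degree of the reply graph; edges are (reply, parent)."""
    in_degree = defaultdict(int)
    for _reply, parent in reply_edges:
        in_degree[parent] += 1
    denominator = len(in_degree) if replied_only else n_posts
    return sum(in_degree.values()) / denominator


def betweenness(nodes, edges):
    """Unnormalised betweenness on a directed, unweighted graph (Brandes)."""
    succ = defaultdict(list)
    for u, v in set(edges):
        succ[u].append(v)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        sigma = dict.fromkeys(nodes, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist, preds, order = {s: 0}, defaultdict(list), []
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Invented toy thread: "red" is the OP; timestamps are minutes since the root.
users = ["red", "blue", "green", "yellow", "purple"]
user_edges = {("blue", "red"), ("green", "red"), ("yellow", "red"),
              ("purple", "red"), ("red", "yellow"), ("red", "purple")}
reply_edges = [(2, 1), (3, 1), (4, 1), (5, 1), (6, 4), (7, 5), (8, 7)]
timestamps = [0, 3, 5, 10, 12, 20, 21, 30]

print(median_response_time(timestamps))                      # 3
print(round(reciprocity(user_edges), 3))                     # 0.667
print(betweenness(users, user_edges)["red"])                 # 6.0
print(branching_factor(reply_edges, 8))                      # 0.875
print(branching_factor(reply_edges, 8, replied_only=True))   # 1.75
```

In the toy thread every shortest path between two other users runs through the OP, which is why "red" collects all of the betweenness while the other users score zero.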
Mesoscopic graph metrics: Anchored triadic motifs
Network motifs are local sub-networks, typically of 2 or 3 nodes which are connected together. Such local patterns are highly useful in quantifying local interactions and the resulting macro structure of the network. They have been used in a variety of applications, from economics to cellular protein-protein interaction networks. These local interaction patterns have been fundamental in the study of social structural processes. They help social scientists quantify the type of hierarchies in the social network. Hence, we turn to network motifs to characterize the local structure of the conversation threads. However, SW conversations show clear distinctions between the users who respond to a call for help and the user(s) who are asking for help (the OP). To accommodate this, we extend the concept of triadic motifs to create different variants of the same motif when the OP is in different positions.
In conventional literature, the local interactions are measured through a census of 16 triadic motif patterns, which cover all possible patterns of non-isomorphic triads which cannot be mapped or morphed into each other. In this method, there is no special treatment of any node, and the positions of all nodes are treated equally. To this, we introduce the notion of anchors, or nodes with special importance, which in our case is the OP, the user who makes the initial post in the thread under consideration. By fixing a role for a node in a motif, each of the 16 triadic motifs as seen and developed in the field can be unravelled into sub-variants by varying the anchored node, giving 36 anchored motifs in total, as seen in Fig. 4a. Each sub-variant is different from the others from the perspective of the anchored node. Some motifs yield three variants, one for each of the three positions that the OP can be in. However, other motifs yield fewer variants, since two or more of the variants can be isomorphic to each other even when the position of the OP is distinguished.
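The counts of 16 plain and 36 anchored triads can be checked by brute force. The sketch below is an independent illustration and not the counting method used in the paper: it enumerates all 64 directed graphs on three labelled nodes, treats node 0 as the anchor (the OP), and groups graphs that are identical up to relabelling, either of all three nodes (plain triads) or of the two non-anchor nodes only (anchored triads). Both counts include the empty triad.

```python
from itertools import permutations, product

NODES = (0, 1, 2)  # node 0 plays the role of the anchor (OP)
PAIRS = [(u, v) for u in NODES for v in NODES if u != v]  # 6 directed edges


def canonical(edges, perms):
    """Smallest relabelled edge list over the allowed node permutations."""
    return min(tuple(sorted((p[u], p[v]) for u, v in edges)) for p in perms)


all_perms = list(permutations(NODES))
anchor_perms = [p for p in all_perms if p[0] == 0]  # keep the anchor fixed

# Every subset of the 6 possible directed edges is one labelled triad.
graphs = [
    tuple(edge for edge, keep in zip(PAIRS, mask) if keep)
    for mask in product((0, 1), repeat=len(PAIRS))
]

plain = {canonical(g, all_perms) for g in graphs}
anchored = {canonical(g, anchor_perms) for g in graphs}
print(len(plain), len(anchored))  # 16 36
```

Fixing the anchor removes most of the symmetry, which is why some of the 16 classes split into three anchored variants while others, whose non-anchor positions are interchangeable, split into fewer.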
Batagelj et al.'s work developed a method for counting network motifs. We build on this and develop an efficient method for counting anchored network motifs. Each motif as seen in Figure 4a is named using the naming scheme developed by Holland and Leinhardt. The first three digits follow an M-A-N pattern, which signifies the number of "Mutual", "Asymmetric" or "Null" edges present in that particular triad. For example, the motif 030 has 0 Mutual (bi-directional), 3 Asymmetric (unidirectional) and 0 Null (disconnected) edges. Some motifs carry an added modifier letter (C, U, D or T) to further differentiate between different triad types with the same M-A-N pattern. To this, we additionally attach a variant label (a, b or c) to distinguish the different anchored network motifs that result from the different positions of the OP.
To systematically understand the over- or under-expression of these anchored triadic motifs in the SuicideWatch community (SW), we use the user interaction graphs for the Front page (FP) baseline posts as a null model. We analyse 10,527 user interaction graphs from SW and 11,070 graphs from the FP dataset. We progressively select graphs with different sizes, i.e., graphs with differing numbers of users present in the interaction graphs. We bin both the FP and SW user interaction graphs as follows, based on the number of nodes interacting within a thread: 1–5, 6–10, 11–15, 16–20, 21–25, 26–30, 31–35 and 36–40. The number of conversations that contain more than 40 unique users participating in the same conversation thread is extremely small in both SW and FP; hence we stop binning at this point. Within each bin, we then perform a census, counting the number of occurrences of different anchored network motifs.
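As a rough illustration of the naming scheme and the size bins, the following sketch derives the three M-A-N digits for a single triad and maps a thread's user count to its bin. It is a minimal example under stated assumptions, not the counting code from this work: `man_code` and `size_bin` are hypothetical names, the modifier letter and the anchored variant label are not computed, and threads with more than 40 users are assumed to be skipped.

```python
from itertools import combinations


def man_code(nodes, edges):
    """Return the M-A-N digits for a triad given its directed edges."""
    edge_set = set(edges)
    mutual = asymmetric = null = 0
    for u, v in combinations(nodes, 2):
        forward, backward = (u, v) in edge_set, (v, u) in edge_set
        if forward and backward:
            mutual += 1        # edges in both directions
        elif forward or backward:
            asymmetric += 1    # edge in one direction only
        else:
            null += 1          # no edge between the pair
    return f"{mutual}{asymmetric}{null}"


def size_bin(n_users, width=5, max_users=40):
    """Map a thread's user count to its (low, high) bin, or None."""
    if not 1 <= n_users <= max_users:
        return None  # assumption: threads outside 1-40 users are skipped
    low = ((n_users - 1) // width) * width + 1
    return (low, low + width - 1)


triad = ("op", "a", "b")
print(man_code(triad, [("a", "op"), ("b", "a"), ("op", "b")]))  # 030
print(man_code(triad, [("a", "op"), ("op", "a")]))              # 102
print(size_bin(7))                                              # (6, 10)
```

The first example has three one-way edges and so maps to 030; the second has one reciprocated pair and two unconnected pairs, giving 102.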
Once the census is done, we calculate Z scores for the SuicideWatch conversations, using FP conversations as the null model, to understand whether a given anchored network motif is over- or under-expressed in SW in relation to FP.
We denote the sets of FrontPage and SuicideWatch graphs that belong to bin $b$ as $G^b_{FP}$ and $G^b_{SW}$ respectively. For a selected bin $b$, let $M$ graphs from FP belong to $b$ and $N$ from SW belong to $b$. We conduct the anchored motif census of the 36 motifs for both $G^b_{FP}$ and $G^b_{SW}$. To compute the null model, we require the mean ($\mu_{null}$) and standard deviation ($\sigma_{null}$) of the frequency distributions of all the 36 motifs found in the $G^b_{FP}$ graphs. This means we will have 36 values of $\mu_{null}$ and $\sigma_{null}$, one for each motif. We also compute the mean ($\mu_{SW}$) and standard deviation ($\sigma_{SW}$) for the SW dataset, and plot the means of FP and SW side by side as a comparison. The error bars represent standard errors ($e_{null} = \sigma_{null}/\sqrt{M}$ and $e_{SW} = \sigma_{SW}/\sqrt{N}$). Plots of these mean frequencies for both datasets can be found in Figures 4b – 4i.
Once we have the null model figures for the bin $b$ from the $G^b_{FP}$ graphs, we compare these with the $N$ graphs ($G^b_{SW}$) in order to compute the Z score. For the $i$th motif, the score $Z_i$ is defined as
$$Z_i = \frac{1}{N} \sum_{k=1}^{N} \frac{m^{SW}_k - \mu_{null}}{\sigma_{null}}$$
where $m^{SW}_k$ is the total number of the $i$th motif found in the $k$th graph in $G^b_{SW}$. We compute this Z score for all the 36 motifs across all the 7 bins. The trends in the value of this Z score are also plotted in Figure 4a. We consider a motif population as significant if the mean motif population goes above 10 for any of the 7 bins. We consider a motif over/under expressed if the Z-score is either greater than 1 or less than -1 for at least 1 bin. A motif which has significant mean population but has a Z-score between -1 and 1 is considered equally expressed.
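The Z score and the labelling rule described here can be sketched for a single motif as follows. This is an illustrative sketch rather than the analysis code from this work. The function names are hypothetical, and the toy counts are invented purely for the example. The use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether the population or sample form is used, and so is the choice of which mean count feeds the significance check.

```python
from math import sqrt
from statistics import mean, pstdev


def standard_error(counts):
    """Standard error of the mean motif count (sigma / sqrt(n))."""
    return pstdev(counts) / sqrt(len(counts))


def motif_z_score(sw_counts, fp_counts):
    """Mean standardised deviation of SW counts from the FP null model."""
    mu_null = mean(fp_counts)
    sigma_null = pstdev(fp_counts)  # assumption: population std. dev.
    if sigma_null == 0:
        return None  # Z is undefined when the null model has no spread
    return mean([(m - mu_null) / sigma_null for m in sw_counts])


def classify(z_by_bin, mean_count_by_bin, min_count=10, z_cut=1.0):
    """Label one motif from its per-bin Z scores and mean counts."""
    if not any(c > min_count for c in mean_count_by_bin):
        return "insignificant"
    if any(abs(z) > z_cut for z in z_by_bin):
        return "over/under expressed"
    return "equally expressed"


# Toy counts of one motif in the graphs of one bin (illustrative only).
fp_counts = [2, 4, 4, 4, 5, 5, 7, 9]   # null model: mean 5, std. dev. 2
sw_counts = [7, 9, 11]

print(motif_z_score(sw_counts, fp_counts))  # 2.0
print(classify([0.3, 2.0], [4.0, 12.5]))    # over/under expressed
```

In the toy data the three SW graphs sit 1, 2 and 3 null standard deviations above the FP mean, so the averaged score is 2.0.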
Figure 4a shows all the 8 motifs which are significant and over/under expressed, whereas the measurements of the statistically insignificant motifs are included in the supplementary material.
Data and Code Availability
The datasets generated and analysed during the current study are available from the corresponding author upon reasonable request. The code for crawling the Reddit conversation structure, conducting the census for Anchored Triadic Motifs, and analysing the data can be found at https://github.com/sagarjoglekar/redditTools
References
O’Connor, R. C. & Nock, M. K. The psychology of suicidal behaviour. The Lancet Psychiatry, 73–85 (2014). DOI 10.1016/S2215-0366(14)70222-6.
Turecki, G. & Brent, D. A. Suicide and suicidal behaviour. The Lancet, 1227–1239 (2016). DOI 10.1016/S0140-6736(15)00234-2.
Zalsman, G. et al. Suicide prevention strategies revisited: 10-year systematic review. The Lancet Psychiatry, 646–659 (2016). DOI 10.1016/S2215-0366(16)30030-X.
McHugh, C. M., Corderoy, A., Ryan, C. J., Hickie, I. B. & Large, M. M. Association between suicidal ideation and suicide: meta-analyses of odds ratios, sensitivity, specificity and positive predictive value. BJPsych open (2019).
Velupillai, S. et al. Risk Assessment Tools and Data-Driven Approaches for Predicting and Preventing Suicidal Behavior. Front. Psychiatry (2019). DOI 10.3389/fpsyt.2019.00036.
Stone, D. M. et al. Vital Signs: Trends in State Suicide Rates — United States, 1999–2016 and Circumstances Contributing to Suicide — 27 States, 2015. MMWR. Morb. Mortal. Wkly. Rep. (2018). DOI 10.15585/mmwr.mm6722a1.
Nock, M. K., Ramirez, F. & Rankin, O. Advancing our understanding of the who, when, and why of suicide risk. JAMA psychiatry, 11–12 (2019).
van de Mortel, T. The role of specialist nurses in improving treatment adherence in children with a chronic illness. The Aust. J. Adv. Nursing: A Q. Publ. Royal Aust. Nurs. Fed., 40–48 (2008).
Gkotsis, G. et al. Characterisation of mental health conditions in social media using informed deep learning. Sci. reports, 45141 (2017).
De Choudhury, M. & De, S. Mental health discourse on reddit: Self-disclosure, social support, and anonymity. In Eighth International AAAI Conference on Weblogs and Social Media (2014).
Park, A., Conway, M. & Chen, A. T. Examining thematic similarity, difference, and membership in three online mental health communities from reddit: a text mining and visualization approach. Comput. human behavior, 98–112 (2018).
Shen, J. H. & Rudzicz, F. Detecting anxiety through reddit. In Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology—From Linguistic Signal to Clinical Reality, 58–65 (2017).
Park, A. & Conway, M. Longitudinal changes in psychological states in online health community members: understanding the long-term effects of participating in an online depression community. J. medical Internet research, e71 (2017).
De Choudhury, M., Kiciman, E., Dredze, M., Coppersmith, G. & Kumar, M. Discovering shifts to suicidal ideation from mental health content in social media. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, 2098–2110 (ACM, New York, NY, USA, 2016). URL http://doi.acm.org/10.1145/2858036.2858207. DOI 10.1145/2858036.2858207.
De Choudhury, M. & De, S. Mental Health Discourse on reddit: Self-Disclosure, Social Support, and Anonymity. Proc. Eight Int. AAAI Conf. on Weblogs Soc. Media.
Park, A. & Conway, M. Harnessing reddit to understand the written-communication challenges experienced by individuals with mental health disorders: Analysis of texts from mental health communities. J Med Internet Res, e121 (2018). DOI 10.2196/jmir.8219.
Kavuluru, R. et al. Classification of helpful comments on online suicide watch forums. In Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB ’16, 32–40 (ACM, New York, NY, USA, 2016). URL http://doi.acm.org/10.1145/2975167.2975170. DOI 10.1145/2975167.2975170.
Coppersmith, G., Leary, R., Crutchley, P. & Fine, A. Natural language processing of social media as screening for suicide risk. Biomed. informatics insights, 1178222618792860 (2018).
Kumar, M., Dredze, M., Coppersmith, G. & De Choudhury, M. Detecting changes in suicide content manifested in social media following celebrity suicides. In Proceedings of the 26th ACM conference on Hypertext & Social Media, 85–94 (ACM, 2015).
Karamshuk, D., Shaw, F., Brownlie, J. & Sastry, N. Bridging big data and qualitative methods in the social sciences: A case study of twitter responses to high profile deaths by suicide. Online Soc. Networks Media, 33–43 (2017).
Shing, H.-C. et al. Expert, crowdsourced, and machine assessment of suicide risk via online postings. In Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, 25–36 (2018).
Zirikly, A., Resnik, P., Uzuner, O. & Hollingshead, K. Clpsych 2019 shared task: Predicting the degree of suicide risk in reddit posts. In Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, 24–33 (2019).
Luxton, D. D., June, J. D. & Fairall, J. M. Social media and suicide: a public health perspective. Am. journal public health, S195–S200 (2012).
Patton, D. U. et al. Social media as a vector for youth violence: A review of the literature. Comput. Hum. Behav., 548–553 (2014).
Morse, F. Social media and suicide: What it’s like being a moderator on r/SuicideWatch (2016 (accessed February 5, 2020)).
White, D. R. & Borgatti, S. P. Betweenness centrality measures for directed graphs. Soc. networks, 335–346 (1994).
Milo, R. et al. Network motifs: simple building blocks of complex networks. Sci., 824–827 (2002).
Faust, K. 7. very local structure in social networks. Sociol. Methodol., 209–256 (2007).
Wang, C., Lizardo, O. & Hachen, D. S. Triadic evolution in a large-scale mobile phone network. J. Complex Networks, 264–290 (2014).
Shizuka, D. & McDonald, D. B. The network motif architecture of dominance hierarchies. J. Royal Soc. Interface, 20150080 (2015).
Yalom, I. D. The theory and practice of group psychotherapy, 4th ed (Basic Books, New York, NY, US, 1995).
Rogers, L. E. & Bagarozzi, D. A. An overview of relational communication and implications for therapy. In Marital and Family Therapy (Human Sciences Press, U.S., 1983).
De Shazer, S. Putting Difference To Work (W. W. Norton, New York, 1991).
Keim, J., Goodrich, K. M., Ishii, H. & Olguin, D. Groupwork course experiences. Groupwork, 6–25 (2013). URL https://journals.whitingbirch.net/index.php/GPWK/article/view/765. DOI 10.1921/gpwk.v23i2.765.
Choudhury, M. D. & Kiciman, E. The Language of Social Support in Social Media and its Effect on Suicidal Ideation Risk. In Proceedings of the International Conference on Web and Social Media (ICWSM-17) (AAAI, 2017).
Coupland, N. & Giles, H. Introduction the communicative contexts of accommodation. Lang. & Commun., 175–182 (1988). DOI 10.1016/0271-5309(88)90015-8.
Lipinski-Harten, M. & Tafarodi, R. W. A Comparison of Conversational Quality in Online and Face-to-Face First Encounters. J. Lang. Soc. Psychol., 331–341 (2012). URL https://doi.org/10.1177/0261927X12446601. DOI 10.1177/0261927X12446601.
Joglekar, S. et al. How Online Communities of People With Long-Term Conditions Function and Evolve: Network Analysis of the Structure and Dynamics of the Asthma UK and British Lung Foundation Online Communities. J. Med. Internet Res., e238 (2018). DOI 10.2196/jmir.9952.
Wood, R. T. & Wood, S. A. An evaluation of two united kingdom online support forums designed to help people with gambling issues. J. Gambl. Issues.
De Choudhury, M., Counts, S. & Horvitz, E. Social media as a measurement tool of depression in populations. In Proceedings of the 5th Annual ACM Web Science Conference, 47–56 (ACM, 2013).
Zhang, X., Shao, S., Stanley, H. E. & Havlin, S. Dynamic motifs in socio-economic networks. EPL (Europhysics Lett.), 58001 (2014).
Yeger-Lotem, E. et al. Network motifs in integrated cellular networks of transcription–regulation and protein–protein interaction. Proc. Natl. Acad. Sci., 5934–5939 (2004).
Davis, J. A. Clustering and structural balance in graphs. Hum. relations, 181–187 (1967).
Davis, J. A. & Leinhardt, S. The structure of positive interpersonal relations in small groups. Hum. relations (1967).
Holland, P. W. & Leinhardt, S. A method for detecting structure in sociometric data. In Social Networks, 411–432 (Elsevier, 1977).
Batagelj, V. & Mrvar, A. A subquadratic triad census algorithm for large sparse networks with small maximum degree. Soc. Networks, 237–243 (2001). DOI 10.1016/S0378-8733(01)00035-1.
Holland, P. W. & Leinhardt, S. The statistical analysis of local structure in social networks (1974).
Contributions