How-to Present News on Social Media: A Causal Analysis of Editing News Headlines for Boosting User Engagement
Understanding Effects of Editing Tweets for News Sharing by Media Accounts through a Causal Inference Framework ∗
[Please cite our ICWSM'21 version of this paper]
Kunwoo Park †, Haewoon Kwak ‡, Jisun An ‡, Sanjay Chawla §
† University of California, Los Angeles, ‡ Singapore Management University, § Qatar Computing Research Institute
[email protected], [email protected], [email protected], [email protected]
Abstract
To reach a broader audience and optimize traffic toward news articles, media outlets commonly run social media accounts and share their content with a short text summary. Despite the importance of writing a compelling message when sharing articles, the research community does not yet have a sufficient understanding of which editing strategies are effective in promoting audience engagement. In this study, we aim to fill the gap by analyzing the current practices of media outlets using a data-driven approach. We first build a parallel corpus of original news articles and their corresponding tweets that were shared by eight media outlets. Then, we explore how those media edited tweets against original headlines and what the effects of such edits are. To estimate the effects of editing news headlines for social media sharing on audience engagement, we present a systematic analysis that incorporates a causal inference technique with deep learning; using propensity score matching, it allows for estimating the potential (dis-)advantages of an editing style compared to counterfactual cases where a similar news article is shared with a different style. Based on the analyses of various editing styles, we report common and differing effects of the styles across the outlets. To understand the effects of various editing styles, media outlets could apply our easy-to-use tool by themselves.
People prefer to read their news online rather than in newspapers these days (Mitchell 2018). This paradigm shift has brought both good and bad influences on the news industry. The bad is that the competition among news organizations has become intense. Since the distribution costs of news content are far lower than they used to be in the pre-digital news era, many online news media have newly appeared, and the number of news stories published in a day has been soaring (Atlantic 2016). The good, on the other hand, is that it enables media to get direct feedback from their audience; it further makes it easier to quantitatively measure the level of engagement with each news article. News organizations are increasingly adopting data-driven methods to understand their audience preferences, decide the coverage, predict article shelf-life, or recommend next articles to read (Castillo et al. 2014; Kuiken et al. 2017; Aldous, An, and Jansen 2019b). Data-driven methods have also increased the understanding of effective news headlines that boost traffic (Kuiken et al. 2017; Hagar and Diakopoulos 2019), while some headlines could undermine the credibility of news organizations in return for increased traffic (Chen, Conroy, and Rubin 2015a).

∗ This work was started when KP, HK, and JA worked in QCRI.
Figure 1: An example of a news article shared by a media account on Twitter. The tweet reads: "Pretty people have all the luck. Even Airbnb is a beauty contest, a new paper says"

Sharing news articles on social media is a well-known strategy for boosting traffic to online news outlets. As shown in Figure 1, news outlets run an official account (we call it a media account for the rest of this paper) and share their articles with a short text. There are various ways of writing these social media posts. One could mirror news headlines without any modification or add clickbait-style phrases (e.g., "Even Airbnb is a beauty contest" in the figure). Social media managers at the newsroom face such a challenging task every day: how to write a short message to share a given news article (Aldous, An, and Jansen 2019a). In spite of its importance, they depend on their experience and make educated guesses to maximize audience engagement. As a result, they sometimes fail. Also, the research community is aware of different practices of using social media across media outlets (Russell 2019; Welbers and Opgenhaffen 2019), yet does not have a sufficient understanding of which strategies lead to increased engagement.

In this work, we aim to fill this gap by analyzing news articles that are shared on Twitter by eight news media outlets, which vary in publication channels and political leaning. In particular, we tackle the following research questions to deepen our understanding of the editing practices of media accounts and their effects:

RQ1. How do news media edit text messages when sharing news articles on social media?
RQ2. Which kind of editing style leads to more audience engagement on social media?
We characterize how media accounts edit a tweet message against its original news headline and evaluate its effectiveness on the amount of audience engagement, such as the number of retweets or likes, by using a systematic analysis framework that incorporates propensity score analysis with deep learning. The main contributions of this paper are as follows:

1. We build a parallel text corpus of news articles and social media posts (tweets in this work), written by eight hybrid and online-only media accounts, and make it publicly available to a wider community. From the dataset, we characterize patterns in how media outlets edit tweet messages when sharing news articles on Twitter.

2. To estimate the effects of editing news headlines on audience engagement, we utilize a systematic analysis framework that uses a deep learning-based model for propensity score analysis: it compares the level of engagement for a style with counterfactual cases where similar news articles are shared with a different editing style. This framework can be applied to any paired dataset of news articles and the social media messages shared by a media account, which makes a practical contribution to news media outlets for evaluating whether or not their strategy of publishing social media messages is effective.

3. Using the analysis framework on the dataset of the eight news outlets, we test which kinds of editing strategy are effective for audience engagement. For example, we observed that writing a tweet message in the 'clickbait' style achieved a larger amount of engagement compared to its estimated counterfactual cases for half of the target media outlets.
There has been a line of research on how news organizations use social media in terms of content and interaction. News organizations use Twitter as a promotional tool and write tweets of the headlines of news articles with a corresponding URL (Armstrong and Gao 2010; Holcomb, Gross, and Mitchell 2011). Another study pointed out that news media employ their accounts as a mere news dissemination tool without much interaction with the audience (Malik and Pfeffer 2016). However, the current practices of using social media vary across news media and countries (Russell 2019; Welbers and Opgenhaffen 2019).

The emergence of social media also brings changes in news writing (Dick 2011; Tandoc Jr 2014), particularly in news headlines. In traditional newspapers, news headlines are expected to provide a clear understanding of what the news article is about (Van Dijk 2013), helping those who read a newspaper by scanning headlines. Hence, headlines have functioned as a summary of the key points of the full article (Bell 1991; Nir 1993). As social media became popular (Hermida et al. 2012), headlines are also required to attract readers' attention to increase traffic to news websites (Chen, Conroy, and Rubin 2015b). Accordingly, editors and journalists have adjusted the way they write headlines (Dick 2011). The characteristics of headlines in online news have been studied across platforms, styles, sentiments, and news media (Kuiken et al. 2017; Dos Reis et al. 2015; Scacco and Muddiman 2019; Piotrkowicz et al. 2017).
A significant amount of work has attempted to predict the popularity of news articles on the web by modeling content features of news articles and user reactions on news websites and social media. Various studies have concluded that early user reactions on social media have strong predictive power for the long-term popularity of news articles (Lerman and Hogg 2010; Castillo et al. 2014; Keneshloo et al. 2016). Another study tackled a more challenging problem of forecasting the popularity (mainly view counts) of news articles even before publication, which is known as 'cold start' prediction (Bandari, Asur, and Huberman 2012). Applying popularity prediction models relying only on news content, however, was not successful for the cold-start prediction in practice (Arapakis, Cambazoglu, and Lalmas 2014). A more recent study noted the importance of delivering fresh news earlier than competitors to attract readers (Rajapaksha, Farahbakhsh, and Crespi 2019).

In addition to views, various dimensions of audience engagement have been studied. A study that compared the most-clicked items with the most-commented items found that 40-59% of the items are different (Tenenboim and Cohen 2015). A more recent study observed that topics affect the level of engagement (Aldous, An, and Jansen 2019b), but the effects turned out to vary across engagement types: views, likes, and comments.

A recent study investigated the impact of editing a news headline in clickbait style on view counts (Kuiken et al. 2017); clickbait is a specific type of news headline that is designed to attract users' attention with a catchy text (Chen, Conroy, and Rubin 2015a) or by referring to content that is not exposed in the headline (Blom and Hansen 2015). Using the dataset of one Dutch news aggregator, Blendle, Kuiken et al. (2017) examined 1,828 pairs of original news headlines and the titles rewritten by Blendle editors.
They found that rewriting a headline in clickbait style is likely to increase the number of views.

Another line of research examined the role of posting time in the popularity of news articles. Based on regression analyses predicting view counts of Washington Post articles, Keneshloo et al. (2016) showed that the posting time was not an important factor for audience engagement. Another study investigated social media messages shared by Twitter accounts of 200 Irish journalists (Orellana-Rodriguez, Greene, and Keane 2016) and suggested that there is no best time of day for engagement; they only found a slight increase in audience engagement after 5 pm.

In the subsequent sections, we will first investigate how media accounts edit messages for sharing news articles on social media. Then, to estimate the effects of editing styles (e.g., sharing news with clickbait messages), we will apply a systematic framework that controls for the effects of confounding variables on engagement. Following the literature on news popularity and audience engagement, we decide to control for the effects of news content as a major confounding variable in the following analyses.
To answer our research questions, we first build a parallel text corpus of news articles and social media posts. To cover diverse posting styles, we consider two types of news media in terms of channels for publishing news: hybrid news media and online-only news media. Hybrid news media (e.g., CNN) are news outlets that have both conventional mass media channels, such as newspapers and television, and online channels. By contrast, online-only news media (e.g., HuffPost) are emerging media that publish content through online channels only.
Type         Media           Followers  Tweets
Hybrid       New York Times  43.7M      143,011
             The Economist   23.8M      30,200
             CNN             42.2M      50,841
             Fox News        18.5M      34,245
Online-only  HuffPost        11.4M      23,712
             ClickHole       487K       5,535
             Upworthy        516K       168
             BuzzFeed        6.56M      18,862
Table 1: Descriptive data statistics

For hybrid news media, we collect a list of reliable news media and their political leaning from Media Bias/Fact Check, which is widely used in large-scale news media analysis (Media Bias Fact Check 2015). We also manually compile their social media accounts and their number of followers on social media. We then choose the four most popular news media to cover different political leanings in our dataset: The New York Times (@nytimes, left-center), The Economist (@TheEconomist, least-biased), CNN (@CNN, left), and Fox News (@FoxNews, right). The popularity is measured based on the number of followers. For online-only news media, we choose four news media: HuffPost (@HuffPost), ClickHole (@ClickHole), Upworthy (@Upworthy), and BuzzFeed (@BuzzFeed) based on the previous literature (Chakraborty et al. 2016) and their popularity. For these eight media outlets, our data collection pipeline consists of four steps:

(1) We collect tweets written by each media account. Using twint (https://github.com/twintproject/twint), a third-party library for Twitter data collection, we collect all available tweets but not mentions nor retweets. We also exclude tweets that contain a URL only.

(2) We extract an embedded URL from each tweet. As it is typically shortened (e.g., http://nyti.ms/2hKFRvl) and sometimes shortened multiple times, we expand it until it reaches the final destination. If the expanded URL points to other sites, such as YouTube, we exclude it.

(3) We retrieve the HTML document of expanded URLs pointing to news articles. Our crawler sends requests with generous intervals.

(4) As the last step, we extract a pair of headline and body text from each HTML file we collected.

Table 1 shows the summary statistics of our dataset used in this work. Our dataset consists of the pairs of news articles and their tweets that were published in 2018.
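The filtering logic of steps (1) and (2) can be sketched as follows; `twint` output handling and URL expansion are omitted, and the function name is our own (hypothetical), but the rule matches the description: URL-only tweets are dropped, and one embedded URL is pulled from each remaining tweet.

```python
import re

# Simple URL pattern; real tweets may need a more robust extractor.
URL_RE = re.compile(r"https?://\S+")

def extract_pair(tweet_text):
    """Return (message_without_url, first_embedded_url) or None.

    Tweets that contain only a URL (no accompanying text) are
    excluded, mirroring step (1) of the pipeline.
    """
    urls = URL_RE.findall(tweet_text)
    if not urls:
        return None  # no article link to follow
    message = URL_RE.sub("", tweet_text).strip()
    if not message:
        return None  # URL-only tweet: excluded
    return message, urls[0]
```

For example, `extract_pair("Even Airbnb is a beauty contest https://nyti.ms/abc")` keeps the message text and returns the shortened link for expansion, while a tweet consisting of a bare link is discarded.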
For the New York Times, we utilize a publicly available dataset (Szpakowski 2017) in Step (3) to match news articles with their tweets in 2018. We also note that Upworthy actively tweeted only in the last two months of 2018. Due to copyright issues, we only share news headlines accompanied by their corresponding tweet ids at the following repository: https://github.com/bywords/NTPairs. One can easily retrieve the paired dataset used for the following analyses by downloading the tweets using the official Twitter API or twint with the provided tweet ids.

To understand how media accounts edit tweet messages when sharing news articles on social media (RQ1), we characterize media accounts from the perspective of headline mirroring.

NYTimes  TheEconomist  CNN    FoxNews
0.123    0.124         0.075  0.397
(a) Hybrid media
Huffpost  ClickHole  Upworthy  BuzzFeed
0.0001    0.809      0.714     0.294
(b) Online-only media
Table 2: Fraction of the tweets with the mirroring headlines

Considering that the mainstream news media outlets publish about 150 to 500 news stories per day (Atlantic 2016), it may be challenging for news outlets to write new social media text for all their news stories. We first examine whether the media accounts use the original headline without modifications (mirroring) or edit the headline to better appeal to social media users. Table 2 presents the proportion of the mirrored tweets across the media outlets. Of the 8 news media, Huffpost is the most active in editing headlines for social media; only 0.01% of its tweets contain the original headlines. By contrast, ClickHole mirrors the original headlines in 80.9% of its tweets. Then, when a change happens, how much content of the headline is preserved in the tweet?
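The mirroring fractions of Table 2 can be computed with a simple comparison; exact whitespace-normalized string equality is our assumption here, since the paper does not spell out how "identical" is decided.

```python
def mirroring_fraction(pairs):
    """Fraction of headline-tweet pairs where the tweet mirrors the
    headline exactly (after whitespace normalization).

    `pairs` is an iterable of (headline, tweet) strings.
    """
    norm = lambda s: " ".join(s.split())
    mirrored = sum(1 for h, t in pairs if norm(h) == norm(t))
    return mirrored / len(pairs)
```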
Figure 2: Edit distance and embedding similarity between news headlines and tweets
To examine how media accounts preserve original content when editing tweet messages, we use two measures that quantify the similarity between news headlines and tweet texts: Levenshtein distance (edit distance) and cosine similarity over an embedding space (embedding similarity). First, edit distance is utilized to quantify how many edits (deletion, insertion, and substitution) are required to transform a news headline into a tweet text. We normalize edit distance by the longer length of the two texts, so that it ranges from 0 (identical) to 1 (no character overlap). Second, to measure how much of the semantics is preserved, we compute embedding similarity by utilizing a pre-trained fastText word embedding (fastText 2018). We map a headline and its corresponding tweet into 300d vectors using the embedding and measure the cosine similarity between the two vectors, which ranges from -1.0 (dissimilar) to 1.0 (identical). Contrary to edit distance, a higher score indicates that the two texts are more similar to one another.

Figure 2 shows the degree of content preservation of the eight news outlets, measured by edit distance and embedding similarity. Not surprisingly, most media accounts tend to make a small amount of change when posting tweets against the original news headline, which is represented as a high value of embedding similarity and a low value of edit distance. However, some outlets exhibit distinct patterns; for example, in HuffPost, the median value of embedding similarity is only 0.619, which is significantly lower than the overall median value of 0.835. We further investigate the media-level difference by employing a Mann-Whitney U test between each pair of the 8 outlets on edit distance and embedding similarity, respectively.
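The two measures can be sketched as below; the fastText lookup is replaced by a generic `word_vectors` dict (an assumption made for this sketch, since loading the real 300d embeddings is out of scope).

```python
import numpy as np

def edit_distance(a, b):
    """Normalized Levenshtein distance in [0, 1]:
    0 = identical, 1 = no character overlap."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # single-row DP over the shorter dimension
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                        # deletion
                        dp[j - 1] + 1,                    # insertion
                        prev + (a[i - 1] != b[j - 1]))    # substitution
            prev = cur
    return dp[n] / max(m, n) if max(m, n) else 0.0

def embedding_similarity(headline, tweet, word_vectors):
    """Cosine similarity between the mean word vectors of two texts."""
    def text_vec(text):
        vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
        return np.mean(vecs, axis=0) if vecs else np.zeros(300)
    u, v = text_vec(headline), text_vec(tweet)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0
```

For instance, `edit_distance("kitten", "sitting")` is 3/7, since three edits are needed and the longer string has seven characters.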
All of the pairwise relationships show statistically significant differences (p < .0001) except the pair of ClickHole and Upworthy. A pair with a high edit distance but a high embedding similarity preserves the meaning but is written very differently against the headline, which corresponds to a paraphrase of the headline.

Figure 3: Identified clusters of headline and tweet pairs by edit distance and embedding similarity. (a) New York Times, (b) CNN, (c) HuffPost, (d) BuzzFeed.

To figure out how the two measures interact and to identify common or differing patterns across the eight media outlets, we draw the scatter plots of edit distance and embedding similarity in Figure 3. Each dot indicates a change between a news headline and its corresponding tweet, with embedding similarity along the x-axis and edit distance along the y-axis. To aggregate similar patterns into a handful of groups, we apply the K-means++ clustering algorithm, which improves standard K-means by assigning initial centroids based on the underlying data distribution, to the whole data. We determine the optimal number of clusters (k=3) by the elbow method. In the figure, the color of each dot indicates the cluster index, and the red X marks are the centroids of each cluster. Due to the lack of space, we present the results of the two largest outlets each from hybrid and online-only media by data size. Here, we do not argue that the identified clusters represent generalizable editing styles; instead, this approach enables us to systematically understand how a news outlet preserves the original content in terms of lexicons and semantics by grouping the headline-tweet pairs into a handful of clusters. Future studies could characterize editing patterns through a combination of quantitative analysis and qualitative investigation on multiple datasets. Table 3 demonstrates the fraction of headline-tweet pairs that belong to each cluster.
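The clustering step above can be sketched as follows. This is a minimal K-means++ (distance-weighted seeding plus Lloyd iterations) over the two features, written with numpy for illustration; it is not the exact implementation used in the paper.

```python
import numpy as np

def kmeans_pp(points, k, n_iter=100, seed=0):
    """Minimal K-means++ clustering.

    points: (n, d) array, e.g. [embedding_similarity, edit_distance] pairs.
    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    # K-means++ seeding: pick each next centroid with probability
    # proportional to squared distance from the nearest chosen one.
    centroids = [points[rng.integers(n)]]
    for _ in range(k - 1):
        d2 = np.min([((points - c) ** 2).sum(1) for c in centroids], axis=0)
        centroids.append(points[rng.choice(n, p=d2 / d2.sum())])
    centroids = np.array(centroids)
    # Lloyd iterations: assign each point to its nearest centroid,
    # then recompute centroids as cluster means.
    for _ in range(n_iter):
        labels = np.argmin(((points[:, None] - centroids) ** 2).sum(2), axis=1)
        new = np.array([points[labels == j].mean(0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

The elbow method then amounts to running this for several values of k and plotting the within-cluster sum of squared distances against k.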
Cluster 0 represents the pairs in which the tweet is similar or identical to the news headline, with low edit distances and high embedding similarities. In Cluster 1, a larger amount of lexical change is made, but the semantics of the tweet is still similar to the corresponding news headline, as represented by high edit distances and high embedding similarities. This pattern suggests that Cluster 1 may indicate paraphrasing. Cluster 2 demonstrates the highest edit distance and the lowest embedding similarity, which suggests that a tweet may be rewritten with less similar semantics for sharing news articles on Twitter.

Media         Cluster 0 (Marginal change)  Cluster 1 (Paraphrasing)  Cluster 2 (Semantic change)
NYTimes       0.3173                       0.3246
TheEconomist  0.1332
ClickHole

Table 3: Fraction of the clusters determined by edit distance and embedding similarity between headline and tweet

Here, we make observations on common patterns across the media types. Cluster 0 is the most frequent group for the online-only media except for Huffpost. Combined with the observation in Table 2, the online-only media tend to share news headlines with a marginal amount of change. On the other hand, Cluster 0 is the least frequent group for the hybrid media except for FoxNews; Cluster 1 is the most frequent for TheEconomist and CNN, while NYTimes shows a balanced distribution over the clusters. This suggests that the hybrid media outlets actively rewrite messages specifically designed for sharing news articles on social media, as represented by the high frequencies in Clusters 1 and 2.
As the news industry becomes more competitive, news outlets have published articles with headlines of a specific style that makes the audience click to read more by stimulating them psychologically, which is known as clickbait (Kilgo and Sinta 2016; Molek-Kozakowska 2013; Stroud 2017). While the credibility of media outlets can be undermined when news outlets exploit clickbait too often on their websites, this practice might be acceptable on social media, where people write casual expressions (e.g., Figure 1). To investigate how the usage of clickbait varies across the news outlets, we utilize a deep learning classifier to examine the headline-tweet pairs in our dataset.

Using a public dataset of clickbait and non-clickbait headlines that were manually annotated in a previous study (Chakraborty et al. 2016), we first train an attention-based bidirectional recurrent neural network (RNN) classifier. The gated recurrent unit (Cho et al. 2014) is used as a basic unit, and an attention mechanism is, in turn, applied to the hidden units of the RNN. We train the network to minimize the cross-entropy loss using the Adam optimizer with gradient clipping. On a separate test set of a 90:10 split, the model achieves an F1-score of 0.994, which outperforms the initial performance of 0.934 using an SVM (Chakraborty et al. 2016). To understand to what extent each media outlet uses clickbait in its headlines and tweets, we estimate clickbait scores by using the sigmoid output between 0 and 1 from the RNN classifier.

Figure 4: Clickbait scores of news headlines and tweets

Figure 4 demonstrates the distribution of clickbait scores for news titles and tweets of the eight news media. All of the hybrid news media are less likely to use clickbait in news headlines.
Their clickbait scores tend to be significantly higher in tweets than in headlines. To better understand how each outlet exploits the clickbait style when sharing a news article on Twitter, we compute the probability of shifting the clickbait style of a news article when sharing it on social media: P(Tweet class | Headline class), where a text's class is clickbait when its clickbait score exceeds a threshold. Table 4 reports the conditional probabilities of the tweet being clickbait or non-clickbait given that the headline is clickbait or non-clickbait.

Table 4: P(Tweet class | Headline class) for hybrid and online-only media (C=Clickbait, NC=Non-clickbait)

For example, for NYTimes, when sharing a clickbait news headline on Twitter, the probability of its corresponding tweet being non-clickbait is 0.188. There are common and differing trends across the media types. Given a non-clickbait news headline, the probability of its tweet being clickbait is similar across the hybrid and the online-only media (0.254 and 0.261, respectively). On the other hand, given a clickbait news headline, the hybrid and online-only media accounts shift the style with a huge difference. In the hybrid media, on average, the probability of flipping the clickbait style remains similar to when non-clickbait news headlines are given (0.248); on the contrary, the online-only media become less likely to flip the style when clickbait-style headlines are given (0.068). This observation implies a difference in editing styles, depending on whether a given news headline is clickbait, across the media types: while the online-only outlets prefer to use clickbait-style tweets in any case, the hybrid media tend to keep the original styles of news headlines.

In the previous section, we presented that the eight news media outlets employ various strategies in editing tweets when sharing their news articles on Twitter.
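The conditional style-shift probabilities of Table 4 can be estimated from per-text clickbait scores as sketched below; the 0.5 default cutoff is our assumption, since the exact threshold is not stated here.

```python
def style_shift_probs(score_pairs, threshold=0.5):
    """Estimate P(tweet_class | headline_class) from clickbait scores.

    score_pairs: iterable of (headline_score, tweet_score) in [0, 1].
    Returns a dict keyed by (headline_class, tweet_class) with
    'C' = clickbait, 'NC' = non-clickbait.
    """
    cls = lambda s: "C" if s > threshold else "NC"
    counts = {}
    for h, t in score_pairs:
        key = (cls(h), cls(t))
        counts[key] = counts.get(key, 0) + 1
    probs = {}
    for (h_cls, t_cls), n in counts.items():
        # Normalize by the number of pairs sharing the headline class.
        total = sum(v for (h2, _), v in counts.items() if h2 == h_cls)
        probs[(h_cls, t_cls)] = n / total
    return probs
```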
While the simplest tactic of the media accounts is to mirror the original headline of news articles on social media, the previous section shows that outlets also paraphrase headlines or change their style. In this section, we estimate which editing style leads to more audience engagement (RQ2).

We utilize a systematic framework that incorporates propensity score analysis (Rosenbaum and Rubin 1983) with deep learning. The propensity analysis framework is widely used for estimating a causal effect of having a treatment condition from an observational dataset. To test whether a certain causal relationship exists from a treatment variable to an outcome variable, researchers generally conduct a controlled trial on human or animal subjects, for example, on the effects of taking a pill on reducing headache symptoms. Since the causal relationship can be confounded by certain variables called covariates, such as gender and age, researchers randomly assign subjects into one of a treatment group (taking a real pill) and a control group (taking a placebo).

In observational studies where the data is given, however, researchers cannot control the process of data generation; therefore, observed correlations between a treatment variable and an outcome variable can be confounded by covariates. In this study, for example, we aim at measuring the effects of a certain editing style for news sharing on audience engagement on Twitter; however, merely observing how the two variables are associated can be confounded by other factors such as topics, which might affect the probability that a news medium employs the editing style (Covariates → Treatment) as well as the expected amount of engagement independent of editing styles (Covariates → Outcome).

Propensity score matching (PSM) was proposed to address this issue and is widely applied to observational studies on social media (De Choudhury and Kiciman 2017; Olteanu, Varol, and Kiciman 2017; Park et al. 2020). PSM first models the probability of having a treatment condition given covariates (i.e., P(Treatment | Covariates)).
Next, PSM 'matches' each treatment unit with instances of the corresponding control group that have propensity scores similar to that of the treatment unit. This process approximates randomized controlled trials in which the analysis units are randomly assigned to either the treatment or the control group, and thus the risk of confounding effects due to covariates is minimized. For more details of PSM, please refer to (Guo and Fraser 2014).

Modeling propensity scores
As discussed in related studies (Tenenboim and Cohen 2015; Mummolo 2016), the probability of selecting news items increases when a news article covers topics of a reader's interest, and so does the likelihood of audience engagement on social media. Therefore, we aim to reduce the confounding effects of topics on audience engagement by building a deep-learning-based propensity model that takes the body text of news articles as input. While social media engagement is also subject to who the posters are, we do not include this as one of the covariates because the analysis framework is applied to each news outlet separately; that is, the effects of the posters are naturally controlled.

To model the propensity score, we employ deep learning techniques that have shown state-of-the-art performance in text classification tasks in recent studies. In particular, we first transform a sequence of words in body text into a 300-dimensional vector by averaging word vectors that were pre-trained using fastText (Joulin et al. 2016) on a news dataset (fastText 2018). The sentence vectors are fed into three-layer fully-connected neural networks. For the activation of hidden layers, we use the ReLU non-linearity. The neural network is trained to minimize the cross-entropy between the labels on a treatment condition and the predicted value between 0 and 1, and L2 regularization is applied to the last hidden layer (λ = 0.001).

Propensity score matching
The next step is to match each treatment unit to control units based on the propensity score. To put it differently, we prune instances that are too different from treatment groups in terms of the propensity score. We apply the k-nearest neighbor algorithm (k=5) to each treatment unit. After the matching process is done, the general PSM framework requires checking the balance between a treatment group and its matched controls by the standardized mean difference of each covariate between the treatment group and the matched control group (Guo and Fraser 2014); if the two groups are not balanced, we cannot proceed to the remaining steps since the conditional independence assumption, which is required to estimate a causal effect, cannot be satisfied. In our experiments, in which the text feature is represented by a 300-d latent vector, it is non-trivial to check whether the two groups cover similar content using the same metric. Alternatively, we use the cosine similarity between the embedding vectors of the treatment and control units, which is widely used to measure the similarity between two documents in the NLP community (Manning, Manning, and Schütze 1999). We formalize the condition of successful matching as follows:

\left( \sum_{t \in T} \sum_{m \in M_t} \frac{Similarity(t, m)}{k} \right) / N_T \;\geq\; \max(\mu + \alpha \times \sigma,\; threshold) \quad (1)

where T is the set of treatment units, M_t is the set of control units matched to treatment unit t, and N_T is the number of treatment units. α is a hyperparameter that controls the sensitivity of deciding whether a matching is successful, and k is the hyperparameter of the nearest neighbor algorithm. μ and σ are the mean and standard deviation of the embedding similarity between all pairs of documents from the original dataset before the matching is done. The threshold copes with distributions where similarities are on average low. In the following experiments, we set α to 1.5, which lets μ + α × σ correspond to the 86th percentile of the similarity values, and threshold to 0.8.
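A sketch of the matching step and the balance check of Eq. (1) in plain numpy is shown below. Propensity scores and row-normalized embeddings are assumed to be precomputed arrays, the variable names are ours, and, for brevity, μ and σ are computed here over treatment-control pairs rather than over all document pairs.

```python
import numpy as np

def match_and_check(ps_t, ps_c, emb_t, emb_c, k=5, alpha=1.5, threshold=0.8):
    """Match each treatment unit to its k nearest control units by
    propensity score, then check the semantic balance of Eq. (1).

    ps_t, ps_c: propensity scores of treatment / control units.
    emb_t, emb_c: row-normalized embedding vectors of the same units.
    Returns (matches, balanced), where matches[i] holds the control
    indices matched to treatment unit i.
    """
    matches = []
    for p in ps_t:
        # k-NN on the 1-d propensity score.
        matches.append(np.argsort(np.abs(ps_c - p))[:k])
    # Mean cosine similarity between each treatment unit and its matches
    # (dot products, since the embeddings are row-normalized).
    sims = [emb_t[i] @ emb_c[m].T for i, m in enumerate(matches)]
    mean_sim = float(np.mean([s.mean() for s in sims]))
    # mu, sigma over all treatment-control pairs before matching.
    all_sims = emb_t @ emb_c.T
    mu, sigma = all_sims.mean(), all_sims.std()
    balanced = mean_sim >= max(mu + alpha * sigma, threshold)
    return matches, balanced
```

If `balanced` is False, the scenario fails the check and no treatment effect is estimated for it, mirroring the procedure described above.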
Estimating treatment effects
For the treatment groups with successfully matched instances, we measure the effect of having a treatment condition on a variable of audience engagement. The Estimated Average Treatment Effect (EATE) on an outcome variable is measured as follows:
EATE = \left( \sum_{t \in T} \sum_{m \in M_t} \frac{y_t - y_m}{k} \right) / N_T \quad (2)

where y_t and y_m are the outcomes measured for t and m, respectively. N_T is the number of treatment units, and the meaning of the other symbols is the same as in Equation (1). EATE quantifies the potential (dis-)advantage in audience engagement of sharing a news article with a certain style (treatment) compared to another (control).

Robustness check using cross-validation
As discussed in (Kiciman and Sharma 2019), it is crucial to conduct a sensitivity analysis when performing propensity score analysis because the matching process can lead to a biased result. As a robustness check, we repeat the above process using 10-fold cross-validation. In particular, for every iteration, we make use of 90% of the dataset for training a propensity score model, matching, and measuring the EATE. As the last step, we compute the 95% confidence interval over the 10 EATEs and discard the cases where the interval includes zero. The reported EATE is the average of the 10 EATEs measured on the splits.
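Eq. (2) and the robustness filter can be sketched as follows; the normal-approximation interval is our choice, since the paper does not specify how the 95% CI is computed.

```python
import math

def eate(outcome_pairs, k):
    """Eq. (2): average over treatment units of the mean outcome gap
    to their k matched controls.

    outcome_pairs: list of (y_t, [y_m1, ..., y_mk]) per treatment unit.
    """
    total = sum(sum(y_t - y_m for y_m in y_ms) / k for y_t, y_ms in outcome_pairs)
    return total / len(outcome_pairs)

def robust_eate(fold_eates, z=1.96):
    """Average the per-fold EATEs; return None when the 95% CI
    (normal approximation) includes zero."""
    n = len(fold_eates)
    mean = sum(fold_eates) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in fold_eates) / (n - 1))
    half = z * sd / math.sqrt(n)
    if mean - half <= 0 <= mean + half:
        return None  # effect not robust across folds: discard
    return mean
```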
Using the analysis framework, we investigate what effects editing tweets for sharing news articles has on audience engagement on Twitter. Note that the analysis framework is applied separately to each scenario that tests the effect of style A (e.g., editing) compared to style B (e.g., mirroring) for a given outlet (e.g., NYTimes): a propensity model is trained only on that outlet's dataset, control groups are matched for each treatment unit, and semantic balance and robustness are checked between the two groups. If a scenario does not pass the balance check or the robustness check, we are not able to measure the EATE, which is therefore omitted.

Effects of modifying original content
First, we investigate whether mirroring a news title to social media is a good strategy for each of the eight outlets. We consider the treatment group to be headline-tweet pairs in which the tweet differs from the headline, and the control group to be pairs in which the original headline is identical to the tweet text. Because the distribution of audience engagement varies across those outlets, as shown in Figure 5, we apply propensity score matching to the headline-tweet pairs of each media outlet separately.

Table 5 presents the EATE on the three variables of audience engagement, measured across the eight outlets. According to the results of the balance check and robustness analysis, we exclude the results for CNN and Upworthy. Understanding which factors lead to a failure of matching would be an interesting research direction, but we leave it for future work.

There are three main observations. First, the results show that editing tweet messages is more likely to increase the amount of engagement for the hybrid news media than mirroring headlines. For example, for NYTimes, the tweets edited from news titles are on average more retweeted (+56.34) and liked (+77.90) than the tweets identical to news titles. While the positive effect is similarly observed for TheEconomist, FoxNews exhibits a different pattern: the numbers of retweets and likes increased, but that of replies decreased. Second, for the four online-only news outlets, editing tweets has diverse effects. While BuzzFeed enjoys the positive effects of editing like the hybrid news media, for HuffPost and ClickHole, editing tweet messages does not help but lowers audience engagement. Interestingly, HuffPost and ClickHole are the media that changed the news titles the most and the least (95% and 19%, respectively). Third, there is a common trend across all the media: the magnitude of the effect on the number of likes is always bigger than that on retweet counts with the same direction, indicating that the likes count is more likely to be influenced by editing tweet messages than the retweets count. This is aligned with previous work on different levels of user engagement (Aldous, An, and Jansen 2019b) and further suggests that information diffusion is a collective phenomenon where many factors are involved in its success.

Media                     RT       LK       RP
NYTimes                   56.34    77.90    9.33
TheEconomist              13.38    17.33    0.60
CNN                       -        -        -
FoxNews                   25.57    77.92    -49.26
Huffpost                  -11.88   -24.46   -
ClickHole                 -103.63  -480.52  -5.64
Upworthy                  -        -        -
BuzzFeed                  13.57    49.98    0.47
NYTimes (Politics)        104.74   149.32   23.60
NYTimes (Entertainment)   34.40    65.23    3.79
FoxNews (Politics)        21.97    69.88    -46.21
FoxNews (Entertainment)   -        -        -
NYTimes (00:00-08:59)     43.32    60.75    6.85
NYTimes (09:00-16:59)     64.02    92.91    13.11
NYTimes (17:00-23:59)     58.06    75.38    8.47
FoxNews (00:00-08:59)     20.19    71.03    -37.76
FoxNews (09:00-16:59)     -        -43.23   -43.49
FoxNews (17:00-23:59)     47.69    180.88   -54.23

Table 5: Effects of editing tweets against news headlines on the amount of audience engagement on Twitter. The blue background indicates a positive effect, and the red indicates a negative one. (RT: retweets, LK: likes, RP: replies)

As discussed in Section 2.2, the level of audience engagement could also be affected by other factors, such as the topic of news and the time of day. Thus, we further examine whether the estimated effects of editing tweets generalize against those confounding variables.

Considering the news section as a proxy of a broad topic, we first look into the effects of editing in politics (as hard news) and entertainment (as soft news) separately by repeating the whole analysis process for each scenario. For example, edited headline-tweet pairs in the politics section of NYTimes are only matched to identical headline-tweet pairs in the politics section of NYTimes, according to the newly estimated propensity scores. Since NYTimes and FoxNews explicitly indicate the section information in their URLs, we focus on these two outlets in this experiment. The four rows in the middle of Table 5 show the EATEs measured on each section of those two media. For NYTimes, the direction of EATEs is the same across the politics and entertainment sections; that is, editing the headline of news articles for sharing on social media is likely to increase audience engagement, which is congruent with the observation from the whole NYTimes data.
Similarly, for FoxNews, the directions of the EATEs measured on the politics section are the same as those from the whole data. These observations suggest the generalizability of the effects of editing news headlines while controlling for the effects of topics.

As a second confounding variable, we consider the time of day a tweet is posted. Following the practice of previous studies considering the effects of time on news engagement (Orellana-Rodriguez, Greene, and Keane 2016), we split the posting time into three time blocks: 00:00-08:59, 09:00-16:59, and 17:00-23:59. We align the posting time with Eastern Daylight Time, as most of the global news media target that timezone due to its importance in economy (e.g., the U.S. stock market) and politics (e.g., Washington D.C.). For example, even though the headquarters of TheEconomist is located in London, it publishes news articles following the EDT.

For headline-tweet pairs of NYTimes and FoxNews shared in each time block, we repeat the analysis process and present the results in the six rows at the bottom of Table 5. For NYTimes, the direction of EATEs is the same across the time blocks, which is also identical with that of the EATE measured on the whole headline-tweet pairs of NYTimes. For FoxNews, the direction of EATEs is the same as that of the whole data for 00:00-08:59 and 17:00-23:59; yet, in 09:00-16:59, the effect on the number of likes is the opposite. We hypothesize that Twitter usage patterns during working hours may affect the engagement patterns, and future work could investigate how audiences differ across time.

Then, how much content should be kept in terms of lexicons and semantics for effectively garnering audience engagement on Twitter?
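The EDT alignment and three-block split used in the time-of-day analysis can be sketched as follows. A fixed UTC-4 offset stands in for EDT here, since the paper fixes EDT for all outlets; the function name is our own.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-4 offset standing in for Eastern Daylight Time, to which
# all posting times are aligned regardless of the outlet's location.
EDT = timezone(timedelta(hours=-4))


def time_block(posted_at_utc):
    """Map a UTC posting timestamp to one of the three EDT time blocks
    used in the analysis."""
    hour = posted_at_utc.astimezone(EDT).hour
    if hour < 9:
        return "00:00-08:59"
    if hour < 17:
        return "09:00-16:59"
    return "17:00-23:59"
```

A production pipeline would likely use the full America/New_York timezone rules instead of a fixed offset.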
To investigate whether an optimal style of content change exists, we apply the analysis framework based on the clusters identified in the earlier section (Figure 3), each of which could represent one of the editing styles: marginal change (Cluster 0), paraphrasing (Cluster 1), and semantic change (Cluster 2). For headline-tweet pairs in each outlet, we consider the headline-tweet pairs of a cluster T as a treatment group and those of another cluster C as a control group. Again, a propensity model is trained on the dataset of each cluster pair of each outlet separately, and it is used for matching among the pairs published in the same outlet. Therefore, the varying popularity across media is automatically controlled. We run the experiments for all possible pairs of clusters but only report the cases where T > C due to the lack of space; we observe that the direction of effects is always opposite when we swap the condition of treatment and control.

Table 6 presents the EATE of the matched results. The results show that there exists no single optimal cluster that leads to a positive EATE; rather, the trend varies across the media outlets. First, in NYTimes and TheEconomist, having Cluster 2 (Semantic change) as the treatment group corresponds to a positive EATE; this suggests that, for the two media, changing both the words and the semantics of tweets from the news title is an effective strategy to increase audience engagement on Twitter. In other words, the editors of both news media could understand who their audiences on Twitter are and write tailored tweets that were often quite different from the original headlines. Second, Cluster 1 (Paraphrasing) leads to positive EATEs in Huffpost in comparison to the other clusters. Third, BuzzFeed tends to show a positive EATE for Cluster 2, and FoxNews tends to exhibit a positive EATE for Cluster 1; yet, the two media have the opposite effect for replies, suggesting that replies have different characteristics compared to the other two engagement measures.

Table 6: Effects of the amount of content change in editing tweet messages against news headlines on audience engagement on Twitter. Unsuccessfully matched entries are omitted. (T: cluster index of treatment group, C: cluster index of control group, RT: retweets, LK: likes, RP: replies) (The numeric entries of this table could not be recovered.)

Treatment (HL→TwT)  Control (HL→TwT)  NYTimes (RT / LK / RP)  TheEconomist (RT / LK / RP)  FoxNews  Huffpost  ClickHole
C → NC              C → C             -5.23 / -31.39 / -1.77  9.20 / 8.20 / -              -        -         -
NC → C              NC → NC           (remaining entries could not be recovered)

Table 7: Effects of controlling clickbait styles of news headlines (HL) into sharing tweets (TwT). Unsuccessfully matched entries are omitted. (RT: retweets, LK: likes, RP: replies, C: Clickbait, NC: Non-clickbait)

In combination with the findings on the effects of the mirroring strategy in Table 5, the above results suggest that the optimal editing style varies across the news media outlets.
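Enumerating the reported cluster scenarios is straightforward: every ordered cluster pair is a candidate comparison, but only pairs with T > C are kept, since swapping treatment and control only flips the sign of the effect. A minimal sketch with an assumed function name:

```python
from itertools import permutations


def reported_scenarios(clusters=(0, 1, 2)):
    """Ordered (treatment, control) cluster pairs with T > C, matching
    the cases reported for the cluster-pair experiments."""
    return [(t, c) for t, c in permutations(clusters, 2) if t > c]
```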
Effects of using clickbait style
Next, we estimate the effects of controlling the clickbait style of news titles for sharing on Twitter. Note that we exclude identical headline-tweet pairs from the subsequent analysis to capture a distinct pattern against the effects of editing. Table 7 shows the EATE on audience engagement of sharing non-clickbait headlines with clickbait tweets and of sharing clickbait headlines with non-clickbait tweets. The clickbait label is annotated by the same process described earlier, and we compare the two scenarios: C → NC and NC → C. Given a news article, the editors may know what is a desirable style for being shared on social media. As there is no published report about their internal guideline on how to share news on Twitter, we cannot explain how they work, but the observed data describes the effective approach of TheEconomist toward clickbaits.

We further examine whether the effects hold the same in the politics and entertainment sections for NYTimes and FoxNews for generalizability. In the Entertainment section of NYTimes, the effects remain the same as those measured on the whole data, except for replies. On the contrary, in Politics, the effects become the opposite; sharing non-clickbait tweets for clickbait news in politics turns out to be beneficial for promoting engagement. The distinct direction of effects across the sections suggests there might exist desirable styles for different topics. In FoxNews, the section-level analysis also exhibits a different trend from that of the whole data. In both sections, sharing clickbait tweets with non-clickbait news likely decreases the amount of engagement. This contradicting observation suggests that the audience of FoxNews on Twitter responds to clickbait tweets differently for Politics and Entertainment compared to news in other sections.
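The construction of the treatment and control groups in Table 7 can be sketched as follows. The pair representation and names are our own assumptions; the clickbait labels would come from the annotation process described earlier, and identical headline-tweet pairs are assumed to be filtered out beforehand.

```python
def group_by_style(pairs):
    """Bucket headline-tweet pairs by their (headline, tweet) clickbait
    labels, e.g. ("C", "NC") for a clickbait headline shared with a
    non-clickbait tweet.

    pairs: iterable of (pair_id, headline_label, tweet_label).
    Returns {(headline_label, tweet_label): [pair_id, ...]}.
    """
    groups = {}
    for pair_id, headline_label, tweet_label in pairs:
        groups.setdefault((headline_label, tweet_label), []).append(pair_id)
    return groups


# The two comparisons in Table 7: treatment and control share the same
# headline label but differ in the tweet label.
SCENARIOS = [
    (("C", "NC"), ("C", "C")),    # non-clickbait tweet for clickbait headline
    (("NC", "C"), ("NC", "NC")),  # clickbait tweet for non-clickbait headline
]
```

Each (treatment, control) scenario would then feed the propensity matching and EATE estimation described above.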
Social media serve as places where people read and discuss news today (Mitchell 2018). News organizations run their media accounts to share their own articles on social media. Unlike traditional newspapers, where readers can see headlines and body text at the same time, on social media the main content is not shown to the readers; instead, a short text (e.g., a tweet) should attract readers to click the link to read more. Therefore, it is crucial to write an effective social media post for sharing news articles. The lack of an available dataset and analysis framework, however, makes it challenging to evaluate which editing strategy is more effective in garnering user attention on social media in a systematic manner.

As a first step to overcome such limitations, we built a parallel corpus of news articles and tweets that were shared by the eight news outlets and examined how they edit the tweet messages against the news headlines (RQ1). The findings show that the media outlets employed diverse strategies in writing the social media messages. While mirroring a news headline to Twitter was a common strategy, the outlets also made various levels of change to content; for example, more frequently for the online-only media, the news articles were shared with clickbait tweets.

A natural following question is which editing strategy is more effective in promoting audience engagement for sharing news articles on Twitter (RQ2). To answer the question in a data-driven way, we employed a systematic analysis framework that incorporates deep learning with propensity score analysis; in particular, we utilized a deep learning model for modeling the propensity of having a treatment condition. The propensity score analysis framework allows for estimating the effect of one editing style on audience engagement by matching counterfactual outcomes where the same article is shared with another editing style.
The high performance of deep learning for text classification enables mitigating the effects of covariates such as textual features more effectively in the matching process.

The findings for RQ2 can be summarized in three points. First, editing a news headline was more likely to increase audience engagement on Twitter than mirroring the headline for the four hybrid news media, which publish news articles through both offline and online channels. By contrast, in the news media that only keep online channels, the estimated effects of editing tweets were generally negative, except for BuzzFeed. Second, in terms of lexical and semantic changes, there was no universal best strategy applicable to different media outlets. For example, changing the original semantics of news headlines (Cluster 2) was estimated to be the best tactic for NYTimes, yet paraphrasing original headlines for sharing tweets (Cluster 1) was the best for Huffpost in terms of EATE. Third, sharing tweets with clickbait-style messages was likely to increase the amount of audience engagement in the four outlets. This finding is congruent with a previous study showing that rewriting news headlines with a clickbait style increased the amount of engagement in a Dutch news service (Kuiken et al. 2017). Yet, the opposite direction of EATEs was also observed from the shared tweets of ClickHole. The differing trend across the media outlets might suggest that the level of audience engagement is not just a function of editing styles but also dependent on who their audiences are.
To test the hypothesis, future studies could characterize audience types of news outlets (e.g., socioeconomic status) and investigate how different editing styles are preferred by each audience type.

On top of the above observations on how the eight outlets edited tweets for news sharing and their effects on audience engagement, we believe the overall analysis framework, from how to process the data to how to conduct the propensity score analysis, could benefit any media outlet in practice. In particular, similarly to how we measured the effects of editing in Table 5, media outlets would be able to evaluate how effectively they post social media messages using their own headline-tweet pairs. To that end, we will release our systematic analysis framework as an easy-to-use toolkit.
Although we consider diverse news media, from hybrid to online-only and left to right, in this study, additional studies with a larger number of media across multiple regions are essential for evaluating the generalizability of the observations. As we are sharing the entire analysis pipeline as an easy-to-use toolkit, we hope that it becomes an easy starting point for follow-up studies on different datasets. Another weakness of this study is an inherent limitation of propensity score analysis: the risk of unobserved covariates. Based on the findings of the literature on news engagement, we tried to minimize this risk through various comparisons. Last but not least, while we found that using a clickbait-style message increased audience engagement on tweets, it could simultaneously have an adverse effect on media credibility; the long-term effect of clickbait usage on perceived credibility should be carefully studied in the future.

Beyond the news domain, our analysis framework could be extended to other cross-platform sharing activities (Park et al. 2016). For example, how could researchers share their research papers on social media to effectively draw attention and achieve more citations in the long run? Mirroring the paper title may not be the best strategy, because a scientific paper is usually written in formal language. Our framework can be used to quantify which messages would be effective. Another exciting research direction is to automatically generate a social media post when a news article is given. Training a naive sequence-to-sequence model might not work well, as there exist diverse headline-to-tweet mappings, as shown in this study. Future researchers could develop a controlled generation technique such as (Hu et al. 2017) to handle such diversity in the mappings.
References

[Aldous, An, and Jansen 2019a] Aldous, K. K.; An, J.; and Jansen, B. J. 2019a. The challenges of creating engaging content: Results from a focus group study of a popular news media organization. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, LBW2317. ACM.
[Aldous, An, and Jansen 2019b] Aldous, K. K.; An, J.; and Jansen, B. J. 2019b. View, like, comment, post: Analyzing user engagement by topic at 4 levels across 5 social media platforms for 53 news organizations. In ICWSM, volume 13, 47–57.
[Arapakis, Cambazoglu, and Lalmas 2014] Arapakis, I.; Cambazoglu, B. B.; and Lalmas, M. 2014. On the feasibility of predicting news popularity at cold start. In Proc. of the SocInfo, 290–299. Springer.
[Armstrong and Gao 2010] Armstrong, C. L., and Gao, F. 2010. Now tweet this: How news organizations use Twitter. Electronic News.
[Bell 1991] Bell, A. 1991. The Language of News Media. Blackwell Oxford.
[Blom and Hansen 2015] Blom, J. N., and Hansen, K. R. 2015. Click bait: Forward-reference as lure in online news headlines. Journal of Pragmatics.
[Chakraborty et al. 2016] Chakraborty, A.; Paranjape, B.; Kakarla, S.; and Ganguly, N. 2016. Stop clickbait: Detecting and preventing clickbaits in online news media. In Proc. of the ASONAM, 9–16. IEEE.
[Chen, Conroy, and Rubin 2015a] Chen, Y.; Conroy, N. J.; and Rubin, V. L. 2015a. Misleading online content: Recognizing clickbait as false news. In Proc. of the ACM Workshop on Multimodal Deception Detection, 15–19.
[Chen, Conroy, and Rubin 2015b] Chen, Y.; Conroy, N. J.; and Rubin, V. L. 2015b. News in an online world: The need for an "automatic crap detector". Proc. of the Association for Information Science and Technology.
[De Choudhury and Kiciman 2017] De Choudhury, M., and Kiciman, E. 2017. The language of social support in social media and its effect on suicidal ideation risk. In Proc. of the ICWSM.
[Dick 2011] Dick, M. 2011. Search engine optimisation in UK news production. Journalism Practice.
[fastText 2018] fastText. 2018. English word vectors. https://fasttext.cc/docs/en/english-vectors.html. [Online; accessed 8-Aug-2020].
[Guo and Fraser 2014] Guo, S., and Fraser, M. W. 2014. Propensity Score Analysis: Statistical Methods and Applications, volume 11. SAGE Publications.
[Hagar and Diakopoulos 2019] Hagar, N., and Diakopoulos, N. 2019. Optimizing content with A/B headline testing: Changing newsroom practices. Media and Communication.
[Hu et al. 2017] Hu, Z.; Yang, Z.; Liang, X.; Salakhutdinov, R.; and Xing, E. P. 2017. Toward controlled generation of text. In Proc. of the ICML, 1587–1596.
[Joulin et al. 2016] Joulin, A.; Grave, E.; Bojanowski, P.; and Mikolov, T. 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.
[Keneshloo et al. 2016] Keneshloo, Y.; Wang, S.; Han, E.-H.; and Ramakrishnan, N. 2016. Predicting the popularity of news articles. In Proc. of the SDM, 441–449. SIAM.
[Kiciman and Sharma 2019] Kiciman, E., and Sharma, A. 2019. Causal inference and counterfactual reasoning (3hr tutorial). In Proc. of the WSDM, 828–829.
[Kilgo and Sinta 2016] Kilgo, D. K., and Sinta, V. 2016. Six things you didn't know about headline writing: Sensationalistic form in viral news content from traditional and digitally native news organizations.
[Malik and Pfeffer 2016] Malik, M. M., and Pfeffer, J. 2016. A macroscopic analysis of news content in Twitter. Digital Journalism.
[Media Bias Fact Check 2015] Media Bias Fact Check, L. 2015. Media Bias/Fact Check. https://mediabiasfactcheck.com. [Online; accessed 8-Aug-2020].
[Mitchell 2018] Mitchell, A. 2018. Americans still prefer watching to reading the news—and mostly still through television. Pew Research Center, Washington, DC.
[Molek-Kozakowska 2013] Molek-Kozakowska, K. 2013. Towards a pragma-linguistic framework for the study of sensationalism in news headlines. Discourse & Communication.
[Orellana-Rodriguez, Greene, and Keane 2016] Orellana-Rodriguez, C.; Greene, D.; and Keane, M. T. 2016. Spreading the news: How can journalists gain more engagement for their tweets? In Proc. of the Web Science, 107–116. ACM.
[Park et al. 2016] Park, K.; Weber, I.; Cha, M.; and Lee, C. 2016. Persistent sharing of fitness app status on Twitter. In Proc. of the CSCW, 184–194.
[Park et al. 2020] Park, K.; Kwak, H.; Song, H.; and Cha, M. 2020. Trust me, I have a Ph.D.: A propensity score analysis on the halo effect of disclosing one's offline social status in online communities. In Proc. of the ICWSM, volume 14, 534–544.
[Piotrkowicz et al. 2017] Piotrkowicz, A.; Dimitrova, V.; Otterbacher, J.; and Markert, K. 2017. Headlines matter: Using headlines to predict the popularity of news articles on Twitter and Facebook. In Proc. of the ICWSM.
[Rajapaksha, Farahbakhsh, and Crespi 2019] Rajapaksha, P.; Farahbakhsh, R.; and Crespi, N. 2019. Scrutinizing news media cooperation in Facebook and Twitter. IEEE Access.
[Rosenbaum and Rubin 1983] Rosenbaum, P. R., and Rubin, D. B. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika.
[Stroud 2017] Stroud, N. J. 2017. Attention as a valuable resource. Political Communication.
[Welbers and Opgenhaffen 2019] Welbers, K., and Opgenhaffen, M. 2019. Presenting news on social media: Media logic in the communication style of newspapers on Facebook.