Author Mentions in Science News Reveal Wide-Spread Ethnic Bias

Abstract

Media outlets play a key role in spreading scientific knowledge to the general public and raising the profile of researchers among their peers. Yet, given time and space constraints, not all scholars can receive equal media attention, and journalists' choices of whom to mention are poorly understood. In this study, we use a comprehensive dataset of 232,524 news stories from 288 U.S.-based outlets covering 100,208 research papers across all sciences to investigate the rates at which scientists of different ethnicities are mentioned by name. We find strong evidence of ethnic biases in author mentions, even after controlling for a wide range of possible confounds. Specifically, authors with non-British-origin names are significantly less likely to be mentioned or quoted than comparable British-origin named authors, even within the stories of a particular news outlet covering a particular scientific venue on a particular research topic. Instead, minority scholars are more likely to have their names substituted with their role at their institution. This ethnic bias is consistent across all types of media outlets, with even larger disparities in General-Interest outlets that tend to publish longer stories and have dedicated editorial teams for accurately reporting science. Our findings reveal that the perceived ethnicity can substantially shape scientists' media attention, and, by our estimation, this bias has affected thousands of scholars unfairly.


Hao Peng, Misha Teplitskiy, David Jurgens∗
School of Information, University of Michigan, 105 S State St, Ann Arbor, MI 48109, USA
∗To whom correspondence should be addressed; E-mail: [email protected].


Scientific breakthroughs often attract media attention, which serves as a key mechanism for public dissemination of new knowledge (Scheufele, 2013; Brossard and Scheufele, 2013). Science reporting not only distills research insights but also puts a face on who was responsible for the research. The media coverage can then feed back into researchers’ careers (Cronin and Sugimoto, 2014). Besides well-established gender and ethnic disparities in conventional scientific outcomes including funding allocation (Ley and Hamilton, 2008; Ginther et al., 2011; Oliveira et al., 2019; Hoppe et al., 2019), hiring decisions (Xie et al., 2003; Turner et al., 2008; Moss-Racusin et al., 2012; Way et al., 2016), publishing (Ding et al., 2006; West et al., 2013), citations (Larivière et al., 2013; Huang et al., 2020), and monetary or non-monetary rewards (Holden, 2001; Shen, 2013; Xie, 2014), emerging evidence has pointed to demographic disparities in general media coverage (Behm-Morawitz and Ortiz, 2013; Jia et al., 2016; Merullo et al., 2019; Smith, 1997; Devitt, 2002), raising the possibility that some scientists are not receiving their due attribution (Jia et al., 2015; Amberg and Saunders, 2018).

Going unnamed as an author in science reporting not only removes the reputational benefits associated with the report, signalling a person is not worthy of public mention, but also potentially shifts the public’s perception of who is a scientist (Miller et al., 2018). Under-representing certain demographic groups can perpetuate the stereotype that scientists are white males (Turner et al., 2008; Banchefsky et al., 2016), which in turn weakens the pipeline of recruiting and training diverse students into new scientists, exacerbating the current representation issues (Cole, 1979; Reuben et al., 2014; Hill et al., 2018).

Academic careers are characterized by cumulative advantage, where successes compound, amplifying each other and becoming easier to sustain (Merton, 1968). As a result, the inhibitory biases against minority groups have a cumulative penalty that reduces representation and visibility, and can result in a loss of symbolic capital for advancing one’s career (Leahey, 2007).

Given known institutional and cultural barriers faced by minority scholars during the early stages of research (e.g., gathering resources) and middle stages (e.g., publishing), a sizable gap still remains in our understanding of the later stages as research disseminates to the public. While it is possible that, once published in the academic literature and covered by the news media, similar contributions receive similar attention regardless of the authors’ perceived identities, a number of mechanisms may produce divergence between contribution and attention in science coverage.

Here, we present the first large-scale and science-wide effort to measure demographic biases in science news through a computational analysis of 232,524 news stories mentioning 100,208 published scholarly works (Section S1). Specifically, we investigate whether the first author of a scientific paper is mentioned by name in news stories that reference their paper. In multi-author papers, first authors are commonly junior scholars who are directly responsible for the work and stand the most to gain in recognition from being mentioned.

We use mixed-effects regression models to examine and quantify demographic differences in author mentions, while controlling for a broad range of plausible confounding factors. The complexity of our models and the scale of the data enable unusually strict controls, such as measuring differential mentions within a particular news outlet covering a particular academic journal on a particular research topic. These controls help ensure that we are comparing media mentions of researchers doing comparable work.

Furthermore, the richness of the data enables us to delve into the mechanisms causing the disparities, and to refer to them using the stronger language of “bias.” Ethnic and gender biases in mentions may be plausibly caused by a number of mechanisms, involving different actors. First, journalists may not be the relevant actors at all. Some news coverage originates from press releases created by in-house public relations staff at universities to disseminate their researchers’ work. News outlets often reprint these press releases in part or in full, and any biases therein may thus be passed on to the outlets’ audiences. We test this hypothesis by comparing mentions in journalist-written pieces versus press releases, and by examining whether journalists differentially mention additional information about particular researchers, such as their institutions.

Second, biases may be driven by pragmatic difficulties of interviewing researchers in distant time zones and possibly with limited English proficiency. Journalists (and/or their editors) may use researchers’ names and institutions to “statistically discriminate” and infer from them scheduling or other difficulties. We test this hypothesis by focusing on a subset of the data where journalists and researchers are located in relatively close geographic proximity (within the U.S.), and by comparing simple mentions of names vs. direct quotes.

Lastly, journalists may have personal animus towards particular ethnic or gender groups, or expectations of animus from the audience members to whom they cater. We use “animus” to refer to direct negative attitudes towards particular demographic groups and/or incorrect or unfounded negative inferences about their English proficiency and other factors that can affect article quality. We test for the possible role of audience by comparing mentions across outlet (and presumably audience) types, and statistically control for English proficiency using ease-of-reading measures on the abstracts of the research papers.

Results

Who Gets Named?

We find strong ethnic bias in mentioning first authors by name in science news reporting on scientific papers. This bias is robust to the inclusion of increasingly stringent controls (Model 5 in Table S5). Specifically, compared to British-origin named authors, all minority-ethnicity authors are significantly less likely to receive name attributions in science reporting. Indeed, this bias appears to increase with English-centric assessments of cultural distance, with other European ethnicities penalized the least and Asian and African authors penalized the most.

Surprisingly, we find no gender bias in author mentions. However, when random effects for news outlets and publication venues are not considered, the first author gender variable appears to have a significant effect. As gender representation varies widely across academic disciplines (Xie et al., 2003; Handelsman et al., 2005), this result suggests that gender differences in mention rates are likely to be explained by relative attention rates to publication venues in different fields. This phenomenon is reminiscent of the Simpson’s paradox observed for gender bias in graduate school admissions (Wagner, 1982), which, when academic department was controlled for, revealed no gender bias.

To quantify the exact effect of having a name with a perceived demographic on the probability of being mentioned by name in media coverage, we calculated the average marginal effects for the first author ethnicity and gender variables, respectively, using our finest model.

Figure 1: The marginal effects for first authors’ gender and ethnicity, averaged over all 285,708 observations in the dataset. A negative average marginal effect indicates a decrease in mention probability compared to authors with Male (for gender) or British-origin (for ethnicity) names. The colors are proportional to the absolute probability changes. Female is colored blue to reflect its difference from ethnicity identities. The error bars indicate 95% bootstrapped confidence intervals. (Rows, top to bottom: African, Indian, Middle Eastern, Chinese, non-Chinese East Asian, Eastern European, Scandinavian & Germanic, Romance Language, Female. x-axis: Prob. of being mentioned compared to Male/British-origin named authors.)

As shown in Fig. 1, the estimated probability of being mentioned decreases by an absolute 1.0%–6.4% for authors with minority-ethnicity names, compared to their British-origin named counterparts. As the average mention rate is only 36.6% (Section S1), these absolute drops represent significant disparities: the 6.3% and 6.4% marginal decreases for Chinese and African authors represent a 17.5% relative decrease in media representation. This result reveals that the mainstream U.S. media outlets have a profound bias against authors from all minority ethnicities in mentioning them by name in science news: given the current disparities, we estimate that more than four thousand minority scholars have gone unmentioned in our data alone.
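As an illustration only, the notion of an average marginal effect used above can be sketched for a plain logistic model: for each observation, the predicted probability is computed with a 0/1 indicator switched off and on, and the differences are averaged. This is a simplified stand-in, not the mixed-effects specification of Model 5; the coefficient values, the covariate `log_story_words`, the indicator `minority_name`, and the function names are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(rows, coefs, intercept, indicator):
    """Average change in predicted probability when a 0/1 indicator is
    switched from 0 to 1, holding each observation's other covariates fixed."""
    diffs = []
    for row in rows:
        # Linear predictor from every covariate except the indicator itself.
        base = intercept + sum(
            coefs[name] * value for name, value in row.items() if name != indicator
        )
        diffs.append(sigmoid(base + coefs[indicator]) - sigmoid(base))
    return sum(diffs) / len(diffs)


# Hypothetical coefficients and observations (NOT estimates from the paper).
coefs = {"minority_name": -0.30, "log_story_words": 0.80}
rows = [
    {"minority_name": 1, "log_story_words": 0.2},
    {"minority_name": 0, "log_story_words": -0.5},
    {"minority_name": 1, "log_story_words": 1.1},
]

ame = average_marginal_effect(rows, coefs, intercept=-0.6, indicator="minority_name")
print(f"average marginal effect: {ame:+.3f}")
```

A negative value corresponds to a lower mention probability for the indicated group, which is how the bars in Fig. 1 are read. Because the effect is averaged over the observed covariates, it depends on the sample as well as on the coefficient.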

Does Location Matter?

In reporting on research, journalists often directly seek out the authors by phone or email to contextualize and explain their results. If an author is at a non-U.S. institution, a journalist from a U.S.-based outlet could be less likely to reach out due to perceived challenges in time-zone differences or lower expectations of fluency, potentially resulting in a lower rate of being mentioned or quoted. Since non-U.S. institutions typically have more Asian and African authors due to their locations, this mechanism could potentially explain the disparity in being mentioned.

To examine the effect of geographical factors, we measured the bias separately for (i) the subset of our data where the first author is from a U.S.-based institution, and (ii) the subset with non-U.S. authors. Compared to U.S.-based authors, international scientists have far lower rates of being mentioned, with coefficients that are more negative by a factor of 2-4 for each ethnicity than those of their domestic counterparts (Table S6). This considerable gap reveals that geographic location is one major factor influencing mention biases in science news. However, international location alone does not explain all disparities in who is mentioned: the average marginal effects shown in Fig. 2 indicate that mention biases of similar magnitude still exist among U.S.-based authors. This comparative result indicates that other factors besides location play a substantial role in which authors are named.

How Authors Are Referred To?

Figure 2: U.S.-based authors with minority-ethnicity names are less likely to be mentioned by name (left) or quoted (middle), and are more likely to be substituted by their institution (right). The average marginal effects are estimated based on 169,984 observations where the first author is from a U.S.-based institution. A negative (positive) marginal effect indicates a decrease (increase) in probability compared to authors with Male (for gender) or British-origin (for ethnicity) names. The colors are proportional to the absolute probability changes. Female is colored blue to reflect its difference from ethnicity identities. The error bars indicate 95% bootstrapped confidence intervals. (Panels, left to right: Mention, Quote, Inst. Substitution. x-axis: Probability of being credited compared to Male/British-origin named authors.)

Journalists have multiple options in how they incorporate the scientists performing the research. They may go beyond simply naming the scientist and incorporate quotes from them about the research; alternatively, they may have the scientist play a minimal agentive role by using references like “researchers from University.” These discourse mechanisms serve to further integrate or distance the scientist from their role in the described research, giving them a name and a voice or removing their individuality.

Our prior result demonstrates that, even within the U.S., African and Asian authors experience substantial under-reporting in being named. As U.S.-based authors may still differ in their perceived fluency in oral English, and journalists may simply be less willing to contact authors of certain ethnicities even if they speak fluent English, we hypothesize that authors from privileged demographics will be more likely to receive a quote, whereas those from disadvantaged demographics will be more likely to be indirectly mentioned by a role associated with their institutions, rather than explicitly named.

To test these hypotheses, we further identified (i) authors who are named as part of quotations (a subset of name mentions), and (ii) authors who go unnamed while their institution is named instead (Section S1). Since fluency is correlated with location, we focused on the U.S. subset and applied the same mixed-effects regression framework to model two dependent variables: (1) whether the first author is quoted, and (2) whether the first author is indirectly mentioned by their institution instead of being named or quoted.

The average marginal effects in Fig. 2 reveal that U.S.-based African and Asian authors are less likely to be quoted, and instead are more likely to be substituted by their role within their institutions (see Fig. S3 for results based on our full data). The significant differences in being quoted in the U.S. subset indicate that perceived English fluency may play a major role in name mentions. However, language proficiency is not the only driving mechanism, as a strong bias appears for authors with Indian names, despite English being an official language in India.
This, along with the “positive” effect in being substituted by institutions when the name is not mentioned for Asian and African authors, suggests that journalist animus also plays a role in author mentions. This is the case especially given that journalists can always contact authors perceived to be less fluent via email to get a quote as a way to bypass potential challenges in oral communication, and that overall journalists are dealing with authors of research papers written in English, which would potentially signal some English proficiency for all authors.

Note that the result on institution substitution also demonstrates that the mention bias does not result from a potential mechanism where Asian and African authors work on research that is more likely to be used in news stories where there is no need for agency at all (e.g., survey-like stories summarizing many recent results that briefly mention research papers on their topic without any form of attribution).

Does It Matter Who Is Reporting?

Understanding whether this ethnic bias is related to journalists’ own demographics is another crucial step towards uncovering its mechanisms, as they are the actors who are directly responsible for writing the stories. First, journalists may differ in their overall tendencies to mention first authors when covering science. Second, there might exist interaction effects between authors and journalists. One intuitive hypothesis, which we call “cultural hierarchy,” is that all journalists, regardless of their gender and ethnicity, prefer to mention Male and British-origin named scholars over minority others. At the same time, journalists may also prefer to mention authors from demographic categories that match their own, which we call “cultural homophily” (McPherson et al., 2001).

Our model controls for journalists’ demographics and their interactions with those of first authors (Section S1). Due to insufficient instances of identified journalists (Table S3), we report the result based on our finest model trained with the full data. No meaningful ethnic preferences are seen for author-journalist interactions that would support either the cultural hierarchy or the cultural homophily hypothesis. However, when dropping controls for outlets (Table S5, Models 3-4), journalists’ ethnicities become significant, suggesting that journalists’ behavior might be explained by variations at the outlet level, i.e., certain news outlets mention authors more or less often and certain groups of journalists are under- or over-represented in those outlets.

Differences Across Outlet Types

Outlets vary in the depth and breadth of their reporting, e.g., Science & Technology outlets write about 650 words per story on average, while General News outlets write about 850 words (Section S1; Fig. S2). These differences suggest potentially important variability in the nature of journalists’ day-to-day work and backgrounds. To explore how the bias in author mentions differs across types of outlets, we fitted the specification of Model 5 separately for the three outlet types in our data and quantified the average marginal effects.

Figure 3: The relative decrease in the probability of being mentioned for first authors of minority gender and ethnicity reveals a consistent behavioral bias across three types of outlet, yet with starkly different magnitudes of effect. Note that the average mention rates in Press Releases, Science & Technology, and General News outlets are 44.9%, 51.8%, and 22.1%, respectively. The colors are proportional to the absolute probability changes. Error bars represent 95% bootstrapped confidence intervals. (Panels, left to right: Press Releases, Sci. & Tech., General News. x-axis: Probability of being mentioned compared to Male/British-origin named authors.)

Surprisingly, the ethnic bias remains consistent across all outlet types, as shown in Fig. 3, with authors having non-British-origin names being mentioned less frequently across all three outlet types. Larger disparities are found for ethnic categories that are more distant from British-origin (e.g., Asian and African). However, outlet types vary substantially in the magnitude of their bias: Science & Technology outlets and General News outlets are, on average, three times more biased against non-British-origin named scholars than outlets in Press Releases (6% vs. 2% marginal decrease).

The bias in stories from Press Releases outlets is particularly notable, as stories in these outlets typically reuse content from university press releases, suggesting that universities’ press offices themselves, while less biased than other outlet types, still prefer to mention scholars with British-origin names. This result is surprising because local press offices are expected to have greater direct familiarity with their researchers, reducing the misuse of stereotypes, and to be more responsible for representing minority researchers equitably.

The largest disparities are seen in General News outlets, e.g., The New York Times and The Washington Post, where again African and Chinese scholars have nearly a 10% absolute drop in representation. General News outlets mention first authors with a 22.1% chance on average (Table S4), so this drop in author coverage nearly halves the perceived role of a large community of scientists. As General News outlets have well-trained editorial staff and science journalists dedicated to accurately reporting science, and tend to publish longer stories that have room to mention and engage with authors, this result is alarming. Historically, these ethnic minorities have been underrepresented, stereotyped, or even completely avoided in U.S. media (Behm-Morawitz and Ortiz, 2013), a pattern that has continued in objective science reporting across all outlet types. The mechanisms behind variations by outlet type deserve further investigation.
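The “nearly halves” statement follows from dividing an absolute drop in mention probability by the baseline mention rate. The short sketch below reproduces that arithmetic with figures quoted in the text. The function name `relative_change` is an illustrative choice, and the 10% drop is the rounded “nearly 10%” value, so the second result is only approximate.

```python
def relative_change(absolute_change: float, baseline_rate: float) -> float:
    """Express an absolute change in probability as a fraction of the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return absolute_change / baseline_rate


# All outlets: 6.4-point drop against the 36.6% average mention rate.
overall = relative_change(-0.064, 0.366)

# General News: roughly 10-point drop (rounded in the text) against 22.1%.
general_news = relative_change(-0.10, 0.221)

print(f"overall: {overall:.1%}, General News: {general_news:.1%}")
# prints "overall: -17.5%, General News: -45.2%"
```

The same absolute drop therefore weighs more heavily where the baseline mention rate is low, which is why the General News disparity stands out.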

Is the Situation Getting More Equitable?

The longitudinally-rich nature of our dataset allows us to examine how author mentions in science news have changed over the last decade. Mention rates are on average decreasing over time, as shown by the coefficient for the mention year scalar variable in Model 5 (Table S5). To examine the time trends across demographic categories, separate models (Model 5) were trained to quantify the marginal change per year increase for each gender and ethnicity in our data. Note that demographic attributes not under study are still included in each model, e.g., when examining the temporal changes in mention rates for male and female authors, ethnicity is still included as a factor, and vice versa.

Figure 4: Average marginal effects on mention probability for a one-unit increase in mention year for authors in each gender (blue) and ethnicity (red) group, revealing that the benefits of privileged demographics (Male, British-origin) are decreasing over time. However, only small improvements are seen for Chinese and Indian first authors. African is not shown due to insufficient data for fitting a Model 5. Error bars show 95% bootstrapped confidence intervals. (x-axis: Increase in probability of being mentioned with one year change.)

As shown in Fig. 4, the mention year has a negative association with author mentions for Male and most ethnicity groups, indicating that most authors are less likely to be mentioned in later years. When compared with the average marginal effects of minority ethnicities on the likelihood of being mentioned (Fig. 1), the larger decreases for ethnic groups such as British-origin and Scandinavian & Germanic indicate that their overall advantages are shrinking.

Indeed, Chinese and Indian authors, two of the most disadvantaged groups in this study, have mention rates that are increasing over time, although more data is needed for precise estimation. However, their estimated rates of increase are relatively small, suggesting that ethnic biases for these authors are unlikely to disappear soon without purposeful behavior change. Based on the absolute mention rate disparities between minority and British-origin named authors shown in Fig. 1, and assuming a constant change rate per year for each ethnicity shown in Fig. 4, we estimate that only authors with Romance Language, Chinese, or Indian names will reach parity with their British-origin named colleagues within 5-12 years in their rates of being mentioned; all other ethnicities see their overall mention rates drop similarly to that for British-origin names, indicating the current gap will persist.

Discussion

Our analyses reveal that the attention researchers get in news coverage is strongly associated with their ethnicities. The associations are robust to a variety of plausible confounds, and even appear when controlling for the (1) particular news outlet, (2) particular scientific venue, and (3) particular research topic. Although we cannot claim the reported associations as causal, this unusually strong observational evidence is a “smoking gun” of bias in coverage and deserves attention.

Ethnicity and Gender

Authors with non-British-origin names are mentioned substantially less when their research is discussed. The disparity appears for all non-British-origin names. However, it is especially large for Asian and African names, less pronounced for Indian, Middle Eastern, and Romance Language names, and even less pronounced for Scandinavian & Germanic and Eastern European names. The pattern is suggestive of stronger biases against non-Western ethnicities, but more evidence is needed to explain it. As science becomes more global and is increasingly driven by non-Western ethnicities, the way English-language media responds to non-British-named scholars will only grow in importance.

In contrast to ethnicity, we do not find bias in mentions of female scholars, once research fields are controlled for. One possible reason is that fields vary in their overall level of coverage and in their gender representation (Handelsman et al., 2005). Looking within fields may thus mask or sidestep gender bias that is manifested between them.
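To make the within-field versus between-field distinction concrete, the following sketch uses invented counts (not data from this study) in which mention rates are identical for men and women inside each field, yet differ once the fields are pooled, because the fields differ in both coverage and gender composition. The field labels, counts, and variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (mentioned, total) counts per field and first-author gender.
counts = {
    ("field_A", "male"): (400, 800),    # high-coverage field, mostly male
    ("field_A", "female"): (100, 200),
    ("field_B", "male"): (40, 200),     # low-coverage field, mostly female
    ("field_B", "female"): (160, 800),
}

# Mention rate within each field, by gender.
within_field = {key: mentioned / total for key, (mentioned, total) in counts.items()}

# Mention rate by gender after pooling the two fields.
pooled_counts = defaultdict(lambda: [0, 0])
for (_field, gender), (mentioned, total) in counts.items():
    pooled_counts[gender][0] += mentioned
    pooled_counts[gender][1] += total
pooled = {gender: m / t for gender, (m, t) in pooled_counts.items()}

for key, rate in sorted(within_field.items()):
    print(key, f"{rate:.2f}")
print("pooled:", {gender: f"{rate:.2f}" for gender, rate in pooled.items()})
```

In this toy setting the pooled rates are 0.44 for men and 0.26 for women even though the within-field rates are equal (0.50 in one field, 0.20 in the other). A model that compares authors within a field reports no gender effect, while the gap between fields remains.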

Ruling In and Out Different Mechanisms

Our analyses above point to a multi-causal generation of ethnic biases, in which both pragmatic difficulties of interviewing distant researchers and journalists’ personal biases play key roles. In support of the pragmatic difficulties mechanism, we find that biases are substantially smaller when both the journalists and researchers are U.S.-based. Additionally, the largest biases appear in direct quotations, which may be more difficult to acquire from researchers who are in different time zones and who are likely to have non-British-origin names. In these cases, journalists appear to “substitute” the researcher’s institution for a direct quote.

Nevertheless, biases remain even among geographically proximate actors, and journalists’ choices are key. Supportive evidence comes from outlet types: when journalists’ role in the news articles is minimal, as when the outlet simply republishes a university press release, the biases are also minimal (although the disparities for many groups are still statistically distinguishable from 0); when the news stories were written by journalists themselves, the biases are the largest. The data does not allow us to rule out that journalists’ choices reflect personal animus-based biases or the expected biases of their audiences. For example, the biases remain even when controlling for readability of the research abstract, a potential signal of English proficiency that might influence journalists’ decisions (Table S5). Furthermore, the fact that Science & Technology and General News outlets have biases of similar magnitude yet likely differ in their audiences suggests again the important role played by journalists’ personal biases.

Lastly, we cannot rule out that the biases stem from the academic literature itself, and in particular from which author is designated as “corresponding” (our data did not include this designation). Further disentangling these mechanisms is an important avenue for future work.

Limitations

Although the scale and the breadth of our dataset enable the use of unusually fine-grained controls, the analysis is not without limitations. First, the observational nature of the data precludes strong causal statements. Second, some plausible explanatory covariates are unavailable for inclusion, such as which author is designated as corresponding or the number of citations a paper had received at the time of being mentioned. However, we anticipate the effect of such covariates to be small given current controls. Fig. S1 shows that the majority of papers were mentioned within one year after publication, which limits the citations a paper can accrue in such a short academic time period. Third, the Ethnea classifier is unable to identify African American scholars by name due to its definition of ethnicity at the country level. A manual analysis shows that authors with stereotypical African American names are classified as English (British-origin) if they have common English surnames. However, as a robustness test, we repeated our experiments using an additional ethnicity classification based on coarser-grained U.S. census data (Fig. S3), which is able to identify such authors as Black; the result therein does not show any significant under-representation of Black scholars. Note that African-named authors (based on Ethnea) are not necessarily classified as Black based on the Census data (Tables S7-S8). Finally, we note that our data contains too few examples of some ethnicities (e.g., Polynesian and Caribbean) to accurately estimate biases; such ethnicities are regrettably omitted, though we recognize that these groups likely experience bias from their minority status as well.

Conclusions and Implications

Our work shows that science journalism is rife with biases in who receives favorable coverage, with certain ethnic groups receiving many more name mentions and quotations than their peers conducting comparable research. These ethnic biases likely have direct negative consequences for the careers of unmentioned scientists, and skew the public perception of who a scientist is, a key factor in recruiting and training new scientists.

Our findings have two important implications for science policy and science journalism. First, simply identifying large-scale ethnic disparities in science news, of which journalists may themselves have been unaware, can be an agent of change. Second, decision-makers at U.S. research institutions may take ethnic disparities of media attention into account when making hiring or promotion decisions. More importantly, addressing this problem requires more research to investigate the mechanisms leading to it, which we hope this paper helps stimulate.

References

Anurag Ambekar, Charles Ward, Jahangir Mohammed, Swapna Male, and Steven Skiena. Name-ethnicity classification from open sources. In KDD, pages 49–58, 2009.

Amanda Amberg and Darren N Saunders. Cancer in the news: Bias and quality in media reporting of cancer research. bioRxiv, page 388488, 2018.

Pierre Azoulay, Toby Stuart, and Yanbo Wang. Matthew: Effect or fable? Management Science, 60(1):92–109, 2013.

Sarah Banchefsky, Jacob Westfall, Bernadette Park, and Charles M Judd. But you don't look like a scientist!: Women scientists with feminine appearance are deemed less likely to be scientists. Sex Roles, 2016.

Elizabeth Behm-Morawitz and Michelle Ortiz. Race, ethnicity, and the media. Oxford Handbook of Media Psychology, pages 252–266, 2013.

Deborah Blum et al. A field guide for science writers. Oxford University Press, 2006.

Dominique Brossard and Dietram A Scheufele. Science, new media, and the public. Science, 339(6115), 2013.

Clifford C Clogg, Eva Petkova, and Adamantios Haritou. Statistical methods for comparing regression coefficients between models. American Journal of Sociology, 100(5):1261–1293, 1995.

Jonathan R Cole. Fair science: Women in the scientific community. Free Press, 1979.

Blaise Cronin and Cassidy R Sugimoto. Beyond bibliometrics: Harnessing multidimensional indicators of scholarly impact. MIT Press, 2014.

James Devitt. Framing gender on the campaign trail: Female gubernatorial candidates and the press. Journalism & Mass Communication Quarterly, 79(2):445–463, 2002.

Waverly W Ding, Fiona Murray, and Toby E Stuart. Gender differences in patenting in the academic life sciences. Science, 313(5787):665–667, 2006.

Donna K Ginther, Walter T Schaffer, Joshua Schnell, Beth Masimore, Faye Liu, Laurel L Haak, and Raynard Kington. Race, ethnicity, and NIH research awards. Science, 333(6045), 2011.

Mott Greene. The demise of the lone author. Nature, 450(7173):1165, 2007.

Roger Guimera, Brian Uzzi, Jarrett Spiro, and Luis A Nunes Amaral. Team assembly mechanisms determine collaboration network structure and team performance. Science, 308(5722):697–702, 2005.

Jo Handelsman, Nancy Cantor, Molly Carnes, Denice Denton, Eve Fine, Barbara Grosz, Virginia Hinshaw, Cora Marrett, Sue Rosser, Donna Shalala, et al. More women in science. Science, 309(5738):1190–1191, 2005.

Erin Hengel. Publishing while female. Are women held to higher standards? Evidence from peer review. Cambridge Working Papers in Economics 1753, 2017.

Patricia Wonch Hill, Julia McQuillan, Amy N Spiegel, and Judy Diamond. Discovery orientation, cognitive schemas, and disparities in science identity in early adolescence. Sociological Perspectives, 2018.

Constance Holden. General contentment masks gender gap in first AAAS salary and job survey. Science, 294(5541):396–411, 2001.

Travis A Hoppe, Aviva Litovitz, Kristine A Willis, Rebecca A Meseroll, Matthew J Perkins, B Ian Hutchins, Alison F Davis, Michael S Lauer, Hannah A Valantine, James M Anderson, et al. Topic choice contributes to the lower rate of NIH awards to African-American/Black scientists. Science Advances, 5(10):eaaw7238, 2019.

Junming Huang, Alexander J Gates, Roberta Sinatra, and Albert-László Barabási. Historical comparison of gender inequality in scientific careers across countries and disciplines. PNAS, 117(9):4609–4616, 2020.

Sen Jia, Thomas Lansdall-Welfare, and Nello Cristianini. Measuring gender bias in news images. In Proceedings of the 24th International Conference on World Wide Web, pages 893–898. ACM, 2015.

Sen Jia, Thomas Lansdall-Welfare, Saatviga Sudhahar, Cynthia Carter, and Nello Cristianini. Women are seen more than heard in online newspapers. PLOS ONE, 11(2):e0148434, 2016.

Simon M Laham, Peter Koval, and Adam L Alter. The name-pronunciation effect: Why people like Mr. Smith more than Mr. Colquhoun. Journal of Experimental Social Psychology, 48(3):752–756, 2012.

Vincent Larivière, Chaoqun Ni, Yves Gingras, Blaise Cronin, and Cassidy R Sugimoto. Bibliometrics: Global gender disparities in science. Nature News, 504(7479):211, 2013.

Erin Leahey. Not by productivity alone: How visibility and specialization contribute to academic earnings. American Sociological Review, 72(4):533–561, 2007.

Timothy J Ley and Barton H Hamilton. The gender gap in NIH grant applications. Science, 322(5907), 2008.

Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1):415–444, 2001.

Robert K Merton. The Matthew effect in science: The reward and communication systems of science are considered. Science, 159(3810):56–63, 1968.

Jack Merullo, Luke Yeh, Abram Handler, Alvin Grissom II, Brendan O’Connor, Mohit Iyyer, et al. Investigating sports commentator bias within a large corpus of American football broadcasts. In EMNLP, 2019.

David I Miller, Kyle M Nolla, Alice H Eagly, and David H Uttal. The development of children’s gender-science stereotypes: A meta-analysis of 5 decades of US draw-a-scientist studies. Child Development, 2018.

Staša Milojević. Principles of scientific research team formation and evolution. PNAS, 2014.

Corinne A Moss-Racusin, John F Dovidio, Victoria L Brescoll, Mark J Graham, and Jo Handelsman. Science faculty’s subtle gender biases favor male students. PNAS, 109(41):16474–16479, 2012.

Diego FM Oliveira, Yifang Ma, Teresa K Woodruff, and Brian Uzzi. Comparison of National Institutes of Health grant amounts to first-time male and female principal investigators. JAMA, 321(9):898–900, 2019.

Ernesto Reuben, Paola Sapienza, and Luigi Zingales. How stereotypes impair women’s careers in science. PNAS.

Rieh and Belkin. In Proceedings of the 61st Annual Meeting of the American Society for Information Science, volume 35, pages 279–289, 1998.

Dietram A Scheufele. Communicating science in social settings. PNAS, 110:14040–14047, 2013.

Helen Shen. Inequality quantified: Mind the gender gap. Nature News, 495(7439):22, 2013.

Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-june Paul Hsu, and Kuansan Wang. An overview of Microsoft Academic Service (MAS) and applications. In WWW, 2015.

Kevin B Smith. When all’s fair: Signs of parity in media coverage of female candidates. Political Communication, 14(1):71–82, 1997.

Hyunjin Song and Norbert Schwarz. If it’s difficult to pronounce, it must be risky: Fluency, familiarity, and risk perception. Psychological Science, 20(2):135–138, 2009.

Gaurav Sood and Suriyan Laohaprapanon. Predicting race and ethnicity from the sequence of characters in a name. arXiv:1805.02109, 2018.

S Shyam Sundar. Effect of source attribution on perception of online news stories. Journalism & Mass Communication Quarterly, 75(1):55–68, 1998.

Andrew Tomkins, Min Zhang, and William D Heavlin. Reviewer bias in single- versus double-blind peer review. PNAS, 114(48):12708–12713, 2017.

Pucktada Treeratpituk and C Lee Giles. Name-ethnicity classification and ethnicity-sensitive name matching. In Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.

Caroline Sotello Viernes Turner, Juan Carlos González, and J Luke Wood. Faculty of color in academe: What 20 years of literature tells us. Journal of Diversity in Higher Education, 1(3):139, 2008.

Clifford H Wagner. Simpson’s paradox in real life. The American Statistician, 36(1):46–48, 1982.

Kuansan Wang, Zhihong Shen, Chi-Yuan Huang, Chieh-Han Wu, Darrin Eide, Yuxiao Dong, Junjie Qian, Anshul Kanakia, Alvin Chen, and Richard Rogahn. A review of Microsoft Academic Services for science of science studies. Frontiers in Big Data, 2:45, 2019.

Samuel F Way, Daniel B Larremore, and Aaron Clauset. Gender, productivity, and prestige in computer science faculty hiring networks. In WWW, pages 1169–1179, 2016.

Jevin D West, Jennifer Jacquet, Molly M King, Shelley J Correll, and Carl T Bergstrom. The role of gender in scholarly authorship. PLOS ONE, 8(7):e66212, 2013.

Yu Xie. Undemocracy: Inequalities in science. Science, 344(6186):809–810, 2014.

Yu Xie and Kimberlee A Shauman. Women in science: Career processes and outcomes. Harvard University Press, 2003.

Supplemental Material

S1 Materials and Methods

To test for and quantify gender and ethnic bias across media outlets, we constructed a massive dataset by combining news media reports with metadata for the scientific papers they cover, and then inferring demographics of the papers’ authors.

We focused on mentions of the first authors for two reasons: (i) the first author position is more likely to be occupied by early career researchers, and as a result, media coverage may be more consequential for their careers; (ii) science journalism guidelines highlight the first author as the one who has likely contributed most to the work (Blum et al., 2006) and therefore is a natural person to mention. Papers in the few research fields that commonly order authors alphabetically are also included, since journalists may be unfamiliar with this norm.

S1.1 News Stories Mentioning Research Papers

The dataset of news stories mentioning scientific papers was collected from Altmetric.com (accessed on Oct 8, 2019), which tracks a variety of sources for mentions of research papers, including coverage from over 2,000 news outlets around the world. To control for differences in the frequency of scientific reporting and potential confounds from variations in journalistic practices across different countries, the list of news outlets was curated to 423 U.S.-based news media outlets, each having at least 1,000 mentions in the Altmetric database. Location data for each outlet is provided by Altmetric. This exclusion criterion ensures that the dataset has sufficient volume to estimate outlet-level biases, while still retaining sufficient diversity in outlet types, stories, and the scientific articles they cover. This initial dataset consists of 2.4M mentions of 521K papers by 1.7M news articles before 2019-10-06. Each mention in the Altmetric data has associated metadata that allows us to retrieve the original citing news story as well as the DOI for the paper itself.

Due to access and permission limitations when retrieving news stories, 135 outlets were excluded due to insufficient volume (27 outlets denied our access entirely; 65 outlets had fewer than 100 URLs crawled; 43 outlets had at least 100 URLs crawled, but only with non-news content such as subscription ads). For the remaining 288 outlets, 48.6% of the stories were successfully retrieved. The stories were then cleaned to remove all HTML tags and unrelated content such as advertisements. Stories with fewer than 100 words were removed (0.7%), as a manual inspection showed that the vast majority of these do not contain the complete content of the story. This process results in 568,785 downloaded stories mentioning 290,469 papers from the 288 outlets.

In order to control for the effects of journalists’ ethnicity and gender (cf. Section S1.8), we used the newspaper Python package (https://github.com/codelucas/newspaper) to extract the journalists’ names from the retrieved HTML news content. Since not all stories in each outlet contain the journalist information, and the newspaper package does not work perfectly for every story that does, we focused on the top 100 outlets (ranked by story count). With manual inspection, we verified that this package can consistently and reliably identify journalist names for 41 of the top 100 outlets. We excluded extracted names with words signaling institutions and organizations (such as “University”, “Hospital”, “World”, “Arxiv”, “Team”, “Staff”, and “Editors”). We also cleaned names by removing title words, such as “PhD.”, “M.D.”, and “Dr.”. We eventually obtained the journalist names in 100,163 news stories for 41 outlets (17.5%).
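The name-cleaning step described above can be sketched as follows. This is a minimal illustration and not the authors' code: the two word lists contain only the examples quoted in the text (the full lists are not given), and the function name `clean_journalist_name` is hypothetical.

```python
# Words signalling an institution rather than a person.
# Only the examples quoted in the text; the full list is an unknown.
ORG_WORDS = {"university", "hospital", "world", "arxiv", "team", "staff", "editors"}

# Title words stripped from names, compared after lower-casing and
# dropping trailing periods ("PhD." -> "phd", "M.D." -> "m.d").
TITLE_WORDS = {"phd", "m.d", "dr"}


def clean_journalist_name(raw):
    """Return a cleaned person name, or None if the byline looks like an organization."""
    tokens = raw.replace(",", " ").split()
    if any(tok.strip(".").lower() in ORG_WORDS for tok in tokens):
        return None  # byline names an institution, not a journalist
    kept = [tok for tok in tokens if tok.lower().rstrip(".") not in TITLE_WORDS]
    return " ".join(kept) if kept else None


print(clean_journalist_name("Dr. Jane Doe, PhD."))
print(clean_journalist_name("MIT News Staff"))
```

Matching on whole tokens rather than substrings avoids discarding surnames that merely contain one of the organization words.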

S1.3 Retrieving Paper Metadata

The Altmetric database does not contain author information and therefore an additional dataset is needed to identify the authors of mentioned papers. We used the Microsoft Academic Graph (MAG) snapshot data (accessed on June 01, 2019) to retrieve information for each paper based on its DOI (Sinha et al., 2015). Not all papers with a DOI in the Altmetric database are indexed in the MAG. We were ultimately able to retrieve 269,509 papers from MAG based on DOIs (matching based on lower-cased strings). MAG also provides rich metadata for papers, including author names, author rank, author affiliation rank, publication year, publication venue, the paper abstract, and paper topical keywords. As all of this information will be used in our regression models (cf. Section S1.8), we excluded papers with missing metadata and two papers that list organizations as first authors, leaving us with 100,208 papers.
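A minimal sketch of the DOI join and the missing-metadata filter described above. The record layout (plain dictionaries) and the field names in `REQUIRED_FIELDS` are assumptions made for illustration; they are not the MAG schema.

```python
# Hypothetical field names standing in for the metadata used in the regressions.
REQUIRED_FIELDS = ("authors", "year", "venue", "abstract", "keywords")


def match_papers(altmetric_dois, mag_records):
    """Join Altmetric DOIs to MAG records on lower-cased DOI strings,
    keeping only records with all required metadata present."""
    mag_by_doi = {rec["doi"].lower(): rec for rec in mag_records if rec.get("doi")}
    matched = []
    for doi in altmetric_dois:
        rec = mag_by_doi.get(doi.lower())
        if rec is not None and all(rec.get(field) for field in REQUIRED_FIELDS):
            matched.append(rec)
    return matched


mag = [
    {"doi": "10.1000/ABC", "authors": ["M. Curie"], "year": 2015,
     "venue": "Science", "abstract": "text", "keywords": {"physics": 0.9}},
    {"doi": "10.1000/xyz", "authors": ["A. Author"], "year": 2016,
     "venue": "Nature", "abstract": "", "keywords": {"biology": 0.8}},
]
print(len(match_papers(["10.1000/abc", "10.1000/XYZ", "10.1000/none"], mag)))
```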

S1.4 Inferring Author and Journalist Gender and Ethnicity

We used Ethnea to infer the gender and ethnicity of authors. The library makes its prediction based on nearest-neighbor matches on authors’ first and last names using a ground-truth database of scholars’ country of origin, which offers superior performance over alternative approaches (Ambekar et al., 2009; Treeratpituk and Giles, 2012).

Author names in the MAG have varying amounts of completeness. While most have the first name and surname, special care is taken for three cases: (1) If the name has a single word (e.g., Curie), the ethnicity and the gender are both set to Unknown, as Ethnea requires at least an initial. Single-word name cases occurred for seven authors in total. (2) If the name has an initial and surname (e.g., M. Curie), we directly feed it into the API, which provides an ethnicity inference but returns Unknown for gender due to the inherent ambiguity. (3) If the name has three or more words, we take the first word as the given name and the last word as the surname. However, if the first word is an initial and the second word is not an initial, we take the second word as the given name (e.g., M. Salomea Curie would be Salomea Curie) to improve prediction accuracy and retrieve a gender inference.

While Ethnea is trained with scholar names, we also applied it to predict the gender and ethnicity of journalists (cf. Section S3 for robustness check).
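The three name-handling cases above can be sketched as a small parsing function. This is an illustrative sketch only: the function names are hypothetical, the test for what counts as an initial (a single letter with an optional period) is an assumption, and the call to Ethnea itself is not shown.

```python
def is_initial(token):
    """Assumed definition: a single letter, optionally followed by a period."""
    stripped = token.rstrip(".")
    return len(stripped) == 1 and stripped.isalpha()


def split_author_name(full_name):
    """Return (given_name, surname) to pass to the classifier,
    or None when the name has a single word (case 1: Unknown)."""
    parts = full_name.split()
    if len(parts) < 2:
        return None
    given, surname = parts[0], parts[-1]
    # Case 3 exception: a leading initial followed by a full middle name.
    if len(parts) >= 3 and is_initial(parts[0]) and not is_initial(parts[1]):
        given = parts[1]
    return given, surname


for name in ["Curie", "M. Curie", "M. Salomea Curie", "Marie Salomea Curie"]:
    print(name, "->", split_author_name(name))
```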

Ethnea assigns fine-grained ethnic categories based on nationality. Here, we follow their use of the term ethnicity, recognizing that while ethnicity and nationality are closely related, the two are not synonymous (discussed in the main text). To test for macro-level trends around larger ethnic categories and to ensure sufficient samples to estimate the effects, we group the 24 observed ethnicities into 9 higher-level categories based on linguistic families and cultural distance (Table S1).

Broad Ethnic Category      Individual Ethnicity
African                    African
non-Chinese East Asian     Indonesian, Japanese, Korean, Mongolian, Thai, Vietnamese
Chinese                    Chinese
Eastern European           Baltic, Greek, Hungarian, Slav
British-origin             English
Indian                     Indian
Middle Eastern             Arab, Israeli, Turkish
Scandinavian & Germanic    Dutch, German, Nordic
Romance Language           French, Hispanic, Italian, Romanian
Unknown                    Note: names are unrecognized by Ethnea.

Table S1: 24 individual ethnicities are grouped into the 9 broad ethnic categories.

Note that due to sample size and our hypotheses, African, Chinese, Indian, and English (renamed as “British-origin”) are kept as separate high-level categories. Caribbean and Polynesian are excluded because they have too few mentions in total. Examples of names classified into each ethnicity are provided in Table S9. Ethnea returns binary gender categories, Female and Male, though we recognize that researchers may identify with genders outside of these two categories. For both gender and ethnicity separately, some names are classified as “Unknown” if no discernible signal is found for the respective attribute by Ethnea.

S1.5 Final Dataset and Statistics

The final dataset consists of 232,524 news stories referencing 100,208 research papers. As some stories mentioned more than one paper and some papers were mentioned in more than one story, we have 285,708 total observations to test whether a paper’s first author is mentioned in a story.

Figs. S1a-b show the distribution of papers and news stories over time and attention per paper. News story data is left-censored and primarily includes stories written after 2010. Censoring can be explained by the fact that Altmetric.com was only launched in 2012, limiting the collection of earlier news. As shown in Fig. S1c, news stories can mention papers that were published several decades before, highlighting the potential lasting value of scientific work. However, the majority of papers are mentioned within the same year or just a few years after publication. Table S2 shows the mention counts for authors in each broad ethnicity group, and Table S3 shows the mention counts by journalist ethnicity.

Figure S1: a, The number of news stories and research papers in our mention data over time (x-axis: year; y-axis: count). b, The distribution of the number of news mentions per paper. c, The distribution of the year gap between paper publication date and news story mention date for all 285,708 story-paper mention pairs in the final dataset.

Authors' Broad Ethnic Category   Papers     Mention pairs   Mentions per paper
British-origin                   41,446     121,891         2.94
Scandinavian & Germanic          14,982     41,982          2.80
Romance Language                 14,982     41,156          2.75
Chinese                          9,262      25,968          2.80
Middle Eastern                   5,291      15,267          2.89
Eastern European                 4,313      12,222          2.83
Indian                           4,327      12,576          2.91
non-Chinese East Asian           4,408      11,254          2.55
African                          682        1,902           2.79
Unknown Ethnicity                515        1,490           2.89
Total                            100,208    285,708         2.85

Table S2: The number of mentioned papers (unique ones), the total number of story-paper mention pairs, and the average number of mentions per paper for authors in each of the 9 high-level ethnicity groups.

S1.6 News Outlets Categorization

To estimate differences across outlets, we grouped 288 news outlets into three categories according to their news report publishing mechanisms. The three categories are: (1) Press Releases, (2) Science & Technology, and (3) General News. The categorization is based on manual inspections of three random stories for each outlet (Appendix Table S10 shows the full list).

The Press Releases category is unique since many outlets in this category commonly, if not exclusively, republish university press releases as stories, making them reasonable proxies for estimating bias from a university’s own press office. The Science & Technology category consists of magazines that primarily focus on reporting science, such as “MIT Technology Review” and “Scientific American.” These outlets typically construct a large scientific narrative referencing several papers in their stories. The General News category includes mainstream news media such as “The New York Times” and “CNN.com” that publish stories on a wide variety of topics. They also have well-trained editorial staff and science journalists who are focused on accurately reporting science.

Table S4 shows the paper-story mention pairs for three types of outlets. The average number of words per story for each outlet type is shown in Fig. S2.

Journalists' Broad Ethnic Category   Mention pairs
British-origin                       37,046
Scandinavian & Germanic              5,182
Romance Language                     7,329
Chinese                              1,251
Middle Eastern                       1,788
Eastern European                     1,679
Indian                               1,213
non-Chinese East Asian               451
African                              321
Unknown Ethnicity                    229,448
Total                                285,708

Table S3: The number of story-paper mention pairs by journalists in each of the 9 high-level ethnicity groups.

Outlet Type            Outlets   Example outlet        Mention pairs   First author named
Press Releases         18        EurekAlert!           81,486          44.9%
Science & Technology   79        MIT Technology Rev.   69,966          51.8%
General News           171       The New York Times    125,241         22.1%

Table S4: The number of outlets for three outlet types, their number of story-paper mentions, and the percentage of mentions that have named the first authors. The full list of 288 outlets is available in Appendix Table S10.

Figure S2: The average story length (in words) for three types of outlets: Press Releases, Sci. & Tech., and General News. Error bars show 95% confidence intervals.

S1.7 Check Author Attributions in Science News

S1.7.1 Author Name Mentions

We normalized both the news content and the author names to ensure that this computational approach works for names with diacritics. For each story-paper mention pair, each author’s last name is searched for using a regular expression with word boundaries around the name, requiring that the name’s initial letter be capitalized. While this process may introduce false positives for authors with common words as last names (e.g., “White”), such cases are rare because (i) few authors in our dataset have common English words as their last names, and (ii) these words rarely appear at the beginning of a sentence in the story, where they would be capitalized. A particular exception is the two common Chinese last names “He” and “She,” which can appear as third-person pronouns at the start of sentences. We thus imposed an additional constraint for these two names: they must be immediately preceded by one of the following titles to be considered a name mention: “Professor”, “Prof.”, “Doctor”, “Dr.”, “Mr.”, “Miss”, “Ms.”, “Mrs.”. Ultimately, first authors were found in 104,569 of the 285,708 story-paper mention pairs (36.6%).
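The matching rule above can be sketched with the Python standard library. This is a hedged illustration, not the authors' implementation: the text does not say how normalization was done, so stripping diacritics via Unicode NFKD decomposition is an assumption, and the function names are hypothetical.

```python
import re
import unicodedata

# Titles required before the surnames "He" and "She" (list taken from the text).
TITLES = ("Professor", "Prof.", "Doctor", "Dr.", "Mr.", "Miss", "Ms.", "Mrs.")
PRONOUN_SURNAMES = {"He", "She"}


def strip_diacritics(text):
    """Assumed normalization: decompose characters and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def mentions_last_name(story, last_name):
    """Case-sensitive, word-bounded search for a capitalized last name."""
    story = strip_diacritics(story)
    name = strip_diacritics(last_name)
    name = name[:1].upper() + name[1:]
    if name in PRONOUN_SURNAMES:
        titles = "|".join(re.escape(title) for title in TITLES)
        pattern = r"(?:%s)\s+%s\b" % (titles, re.escape(name))
    else:
        pattern = r"\b%s\b" % re.escape(name)
    return re.search(pattern, story) is not None


print(mentions_last_name("The study was led by José Núñez.", "Núñez"))
print(mentions_last_name("He said the result was clear.", "He"))
print(mentions_last_name("According to Dr. He, the result was clear.", "He"))
```

Because the search is case-sensitive, a lower-case "white" inside a sentence is not counted as a mention of an author named White.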

S1.7.2 Author-Quote Detection

Authors can be mentioned by name in different forms, including quotation (e.g., “‘We are getting close to the truth.’ said Dr. Xu”), paraphrasing (e.g., “Timnit says she is confident, however, that the process will soon be perfected.”), and simple passing mention (e.g., “A recent research conducted by Dr. Jha found that drinking coffee has no harmful effects on mental health.”).

We used a rule-based matching method to detect explicit quotes for each story-paper pair. We first parsed our news corpus using spaCy (https://spacy.io/). From the 50 most frequently used verbs in our news corpus, we identified 18 verbs that were commonly used to integrate quoted material in news stories: “describe”, “explain”, “say”, “tell”, “note”, “add”, “acknowledge”, “offer”, “point”, “caution”, “advise”, “emphasize”, “see”, “suggest”, “comment”, “continue”, “confirm”, “accord”. A sentence is determined to contain a quote from the first author if the following two conditions are met: (i) both a quotation mark and the author’s last name appear in the sentence, and (ii) any of the 18 quote-signaling verbs (in any tense) appears within five tokens before or after the author’s last name. A manual inspection of 100 extracted quotes revealed no false quote attributions. This conservative method yields an underestimate of the quote rate, as it may not detect every quote due to unusual writing styles or article formatting. The advantage of English-named scholars in being quoted (Fig. 2 in the main text) may therefore be even larger.

S1.7.3 Institution Mentions

We checked institution mentions based on exact string matching with the reported institution name for the first author in the MAG, i.e., for each story-paper pair, we examined whether the first author’s full institution name appeared in the news story. Similar to quote detection, this method may not identify every institution mention, due to noise in the MAG or the story using slightly different nomenclature such as an institution’s abbreviation. However, because a full list of alternate names for each institution is not available to us, we used this conservative method. For this reason, the tendency for minority scholars to be substituted by their institutions (Fig. 2 in the main text) is likely an underestimate.
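The two checks in Sections S1.7.2 and S1.7.3 can be sketched together as follows. This is a simplified stand-in under stated assumptions: the paper lemmatizes with spaCy, whereas this sketch generates verb inflections with a small hand-written rule and tokenizes with a regular expression (so punctuation is not counted as tokens), and only double quotation marks are treated as quote marks. All function names are hypothetical.

```python
import re

QUOTE_VERBS = ["describe", "explain", "say", "tell", "note", "add", "acknowledge",
               "offer", "point", "caution", "advise", "emphasize", "see", "suggest",
               "comment", "continue", "confirm", "accord"]
IRREGULAR_PAST = {"say": {"said"}, "tell": {"told"}, "see": {"saw", "seen"}}
QUOTE_MARKS = ('"', "\u201c", "\u201d")  # assumption: double quotes only


def inflections(verb):
    """Crude inflection generator standing in for lemmatization."""
    drop_e = verb.endswith("e") and not verb.endswith("ee")
    forms = {verb, verb + "s", (verb[:-1] if drop_e else verb) + "ing"}
    if verb in IRREGULAR_PAST:
        forms |= IRREGULAR_PAST[verb]
    else:
        forms.add(verb + "d" if verb.endswith("e") else verb + "ed")
    return forms


QUOTE_VERB_FORMS = set().union(*(inflections(v) for v in QUOTE_VERBS))


def has_author_quote(sentence, last_name, window=5):
    """Condition (i): quote mark and last name present; (ii): verb within `window` tokens."""
    if not any(mark in sentence for mark in QUOTE_MARKS):
        return False
    tokens = re.findall(r"\w+", sentence)
    for i, token in enumerate(tokens):
        if token != last_name:
            continue
        nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if any(t.lower() in QUOTE_VERB_FORMS for t in nearby):
            return True
    return False


def mentions_institution(story, institution_name):
    """Exact string match on the full institution name."""
    return bool(institution_name) and institution_name in story


print(has_author_quote('"We are getting close to the truth," said Dr. Xu.', "Xu"))
print(mentions_institution("A team at the University of Michigan found it.",
                           "University of Michigan"))
```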

S1.8 Regression Models

We adopted a logistic regression framework to examine the demographic bias in author mentions in science reporting. Many factors are known to influence name mentions that could confound the analysis of ethnicity and gender, such as author reputation, institutional prestige and location, publication topics and venues, or outlets and journalist demographics.

Here, we provide details of these factors and present a series of five regression models that build upon one another by adding more rigorous control variables at each step. In our regression framework, each story-paper mention pair is an observation, with the dependent variable indicating whether the first author of the paper is mentioned or not in the story. We designed a mixed-effects model with five groups of variables: (1) first author demographics (gender and ethnicity); (2) paper author controls, including prestige factors, last name factors, and other authors; (3) paper and story content, including temporal factors, paper readability, story length, number of papers mentioned per story, and journalist demographics; (4) fixed effects for paper domains and topics; (5) random effects for outlets, publication venues, and popular last authors. The increasing level of model complexity allows us to test the robustness of the effects of ethnicity and gender, and also to examine potential factors at play in science coverage. Table S5 shows the step-wise regression results.
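For orientation, the full specification with all five variable groups can be written schematically as below. The notation is introduced here for illustration and does not appear in the original: $y_{sp}$ indicates whether the first author of paper $p$ is named in story $s$; $\mathbf{d}_p$, $\mathbf{a}_p$, $\mathbf{c}_{sp}$, and $\mathbf{k}_p$ stand for variable groups (1) to (4); and $u$, $v$, $w$ are the random intercepts of group (5). The Gaussian form of the random effects is the usual convention for mixed-effects models and is an assumption, not something the text states.

```latex
\operatorname{logit}\Pr(y_{sp} = 1)
  = \beta_0
  + \boldsymbol{\beta}_d^{\top}\mathbf{d}_p      % (1) first-author gender and ethnicity
  + \boldsymbol{\beta}_a^{\top}\mathbf{a}_p      % (2) paper author controls
  + \boldsymbol{\beta}_c^{\top}\mathbf{c}_{sp}   % (3) paper and story content
  + \boldsymbol{\beta}_k^{\top}\mathbf{k}_p      % (4) domain and topic keywords
  + u_{\text{outlet}(s)} + v_{\text{venue}(p)} + w_{\text{last author}(p)},  % (5)
\qquad
u \sim \mathcal{N}(0, \sigma_u^2),\;
v \sim \mathcal{N}(0, \sigma_v^2),\;
w \sim \mathcal{N}(0, \sigma_w^2).
```

Each successive model in the series keeps the terms of the previous one and adds the next group, so a coefficient in $\boldsymbol{\beta}_d$ that stays stable across the steps is evidence that the added controls do not explain the disparity.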

Model 1: Naive Bias

The first model directly encodes our two variables of focus, gender and ethnicity, as the sole categorical factors of the regression model. Here and throughout the study, we treat the reference coding for ethnicity as British-origin and for gender as Male. While overly simplistic in its modeling assumptions, Model 1 nevertheless tests for systematic differences in whether authors of a particular demographic are mentioned less frequently and serves as a baseline for layering on controls to explain such bias.

Model 2: Paper Author Controls

Many author-level attributes other than demographics could influence journalists’ perceptions of authors and their coverage of them. Model 2 introduces 20 additional factors to control for features of the paper’s authors.

Prestige Factors.

The reputation of the first author may also influence the chance of being named. High-status actors and institutions tend to receive preferential treatment within science (Merton, 1968; Azoulay et al., 2013; Tomkins et al., 2017), and we hypothesize that these prestige-based disparities may carry over to media coverage as well. To account for prestige effects, we include the author rank and institution rank provided by the MAG (Wang et al., 2019). This ranking estimates the relative importance of authors and institutions using paper-level features derived from a heterogeneous citation network; while similar to the h-index, the method has been shown to produce more fine-grained and robust measurements of impact and prestige. Institution and author ranks are not necessarily directly related, as institutions may be home to authors of varying ranks (e.g., early- or late-career faculty) and the same author may appear with different affiliations on separate papers due to a career move. Note that for rank values, negative-valued coefficients in the regression models would indicate that higher-ranked individuals and those from higher-ranked institutions are more likely to be mentioned.

We also add a variable indicating the location of the first author’s institution with three categories: (1) domestic, (2) international, (3) unknown. This variable controls for the geographical factor that may influence journalists’ willingness to contact the author by phone or video chat service and therefore influence whether they mention the author. We infer the country of each institution from its latitude and longitude provided in the MAG.

Last Name Factors.

People are known to have a preference for both familiar and more easily pronounceable names (Song and Schwarz, 2009; Laham et al., 2012), and this preference could potentially bias which author a journalist mentions. Therefore, we introduce two factors as proxies: (1) the number of characters in the last name as a proxy for pronounceability, and (2) the log-normalized count of the last name per 100K Americans from the 2018 census data. As journalists are drawn from U.S.-based news sources, the latter reflects potential familiarity.

Other Authors.

Scientific knowledge is increasingly discovered by large teams, as tackling complex problems often requires collaboration between experts with diverse specializations (Guimera et al., 2005; Greene, 2007; Milojević, 2014). On these multi-author projects, the last author is typically the senior author responsible for directing the project, a convention that science journalism guidelines acknowledge when advising whom to interview (Blum et al., 2006). The last author could be more likely to be mentioned in press coverage, which could potentially reduce the chance for the first author. Therefore, we control for whether the last author is mentioned in the news article using a binary factor. As the demographics of the last author may influence whom a journalist decides to mention, we control for the ethnicity and gender of the last author, using British-origin and Male as the reference category respectively. Note that some papers are monographs with no last author. To control for these cases, we include a binary factor Solo which is set to 1 for monographs, at which point all factors related to the last author (gender, ethnicity, and is-mentioned) are set to 0.

When journalists examine a paper’s author list, the team size may influence their understanding of the distribution of credit among authors, potentially reducing the chance of any author being mentioned for papers with many authors. We thus include a factor for the number of authors.
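The last-name and other-author controls of Model 2 can be sketched as simple feature functions. This is an illustrative sketch: the exact log-normalization is not specified in the text, so `log(1 + count)` is an assumption, and the dictionary keys and function names are hypothetical.

```python
import math


def last_name_features(last_name, count_per_100k):
    """Name length (pronounceability proxy) and log-scaled census frequency
    (familiarity proxy). log(1 + x) is an assumed form of the normalization."""
    return {
        "name_length": len(last_name),
        "log_name_count": math.log(1.0 + count_per_100k),
    }


def other_author_controls(authors, last_author_mentioned):
    """`authors` is an ordered list of dicts with 'ethnicity' and 'gender' keys."""
    solo = len(authors) == 1
    controls = {"solo": int(solo), "num_authors": len(authors)}
    if solo:
        # Monograph: every last-author factor is set to 0, as described in the text.
        controls.update(last_mentioned=0, last_ethnicity=0, last_gender=0)
    else:
        last = authors[-1]
        controls.update(
            last_mentioned=int(last_author_mentioned),
            last_ethnicity=last["ethnicity"],
            last_gender=last["gender"],
        )
    return controls


team = [{"ethnicity": "Chinese", "gender": "Female"},
        {"ethnicity": "British-origin", "gender": "Male"}]
print(last_name_features("Curie", 0.0))
print(other_author_controls(team, last_author_mentioned=True))
```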

Model 3: Paper and Story Content

Besides author-level attributes, the content of the paper and story, as well as journalist demographics, can also affect author mentions. We thus control for the following factors in Model 3.

Year of News Story (Mention Year).

Bias in science coverage may have temporal variations due to unpredictable factors that are directly or indirectly related to research. For instance, the available funding resources can affect the number of research outputs in a year, which would in turn influence the amount of time and space journalists devote to scientists in news articles. We thus control for the year of the news story, i.e., the mention year of the paper. We treat it as a scalar variable (zero-centered).

Year Gap between Story and Paper.

News stories often reference older scientific papers in the narrative, as shown in Fig. S1c. For older papers, at the time of a recent story publication, the original authors may be unable to be reached or the story may be framed differently from recent science that is considered “fresh.” Indeed, citing timely scientific evidence in a news report can increase credibility perceptions of a story (Sundar, 1998; Rieh and Belkin, 1998). Therefore we include a factor that quantifies the year difference between the mention year and the publication year of the mentioned paper.

Number of papers mentioned in a story.

A story can mention several papers to help frame and construct its scientific narrative, and potentially increase its perceived credibility. However, referencing more papers in a story may reduce the amount of space and attention journalists allocate to each paper, and therefore may decrease the chance of its authors being mentioned. We thus control for the number of mentioned papers in a story.

News Story Length.

Longer articles provide more space for depicting stories about the science being covered. We thus control for the length of each story, measured as the total number of words.

Paper Readability.

Given the tight timelines under which journalists work, quickly identifying and understanding insights is likely critical to what is said about a paper. A paper’s readability may thus influence whether a journalist feels the need to reach out to the author, with more readable papers requiring less contact. Readability, in turn, may also be tied to author demographics like gender (Hengel, 2017), making it important to take readability into account. Due to licensing restrictions, the full text of the majority of papers is not freely available; therefore we compute readability over the paper abstract using three factors: (1) the Flesch-Kincaid readability score, which estimates the grade level needed to understand the passage; (2) the number of sentences per paragraph, which is a proxy for information content and density; and (3) the type-token ratio, which is a measure of lexical variety. Another reason we focus particularly on the abstract is that journalists may not read the entire paper but very likely read the abstract.
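The three readability factors can be sketched as follows. The Flesch-Kincaid grade-level formula is the standard one, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59; everything else is an assumption made for illustration: syllables are counted with a rough vowel-group heuristic, sentences are split on terminal punctuation, and the abstract is treated as a single paragraph. The authors' exact tooling is not stated.

```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability_features(abstract):
    """Return Flesch-Kincaid grade, sentence count, and type-token ratio."""
    sentences = [s for s in re.split(r"[.!?]+", abstract) if s.strip()]
    words = re.findall(r"[A-Za-z']+", abstract)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    fk_grade = (0.39 * len(words) / len(sentences)
                + 11.8 * syllables / len(words)
                - 15.59)
    type_token_ratio = len({w.lower() for w in words}) / len(words)
    return {
        "fk_grade": fk_grade,
        "num_sentences": len(sentences),   # abstract treated as one paragraph
        "type_token_ratio": type_token_ratio,
    }


print(readability_features("The cat sat. The cat ran."))
```

Very short and simple passages can produce a negative grade level under this formula, which is expected behavior rather than a bug.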

Journalist Demographics.

It is ultimately the journalist’s decision to mention authors when writing science reports. Motivated by the commonly observed homophily principle in social networks (McPherson et al., 2001), we hypothesize that the mentioning behavior in science reporting is associated with homophilous effects by ethnicity and gender. To model such effects, we include the journalists’ demographics and their interactions with first authors’ gender and ethnicity.

Due to insufficient instances of journalists identified in news stories (cf. Section S1.2; Table S3), we further coarsen the 9 broad ethnicity categories into 4 groups: (1) Asian (Chinese, Indian, and non-Chinese East Asian), (2) British-origin, (3) European (Eastern European, Romance Language, and Scandinavian & Germanic), and (4) Other Unknown (Middle Eastern, African, and Unknown).
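The coarsening into four groups is a fixed lookup, sketched below. The mapping follows the grouping listed above; the `same_group` flag is only a hypothetical illustration of a homophily indicator and is not claimed to be the interaction term used in the regression.

```python
# Mapping from the 9 broad categories (plus Unknown) to the 4 coarse groups.
COARSE_GROUP = {
    "Chinese": "Asian",
    "Indian": "Asian",
    "non-Chinese East Asian": "Asian",
    "British-origin": "British-origin",
    "Eastern European": "European",
    "Romance Language": "European",
    "Scandinavian & Germanic": "European",
    "Middle Eastern": "Other Unknown",
    "African": "Other Unknown",
    "Unknown": "Other Unknown",
}


def coarsen(broad_category):
    """Map a broad ethnic category to its coarse group (unrecognized -> Other Unknown)."""
    return COARSE_GROUP.get(broad_category, "Other Unknown")


def same_group(journalist_category, author_category):
    """Hypothetical homophily flag: 1 if both fall in the same coarse group."""
    return int(coarsen(journalist_category) == coarsen(author_category))


print(coarsen("Romance Language"), same_group("Indian", "Chinese"))
```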

Model 4: Paper Domains and Topics

Some scientific domains and topics may be inherently more newsworthy than others. Furthermore, journalists’ academic backgrounds may be unequally distributed across scientific fields, resulting in different propensities to reach out to authors. Therefore, in Model 4, we include factors to capture the domain of a paper using metadata from the MAG, which includes a large volume of keywords (665K) at different levels of specificity. A paper can have multiple keywords, each with an associated confidence score between 0 and 1. To capture high-level topical and methodological differences, we restrict our focus to the 533 most common keywords, which occur in at least 500 papers in our dataset. Each keyword is used as an independent variable in the regression, whose value is the keyword’s confidence score for the paper.
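A minimal sketch of how such keyword features could be assembled is given below. The input layout (a dictionary from paper ID to keyword confidence scores), the function names, and the choice of 0.0 for keywords a paper does not carry are assumptions for illustration; the toy example uses a threshold of 2 papers instead of 500.

```python
from collections import Counter


def select_keywords(papers, min_papers=500):
    """Keep keywords attached to at least `min_papers` papers."""
    counts = Counter(kw for scores in papers.values() for kw in scores)
    return sorted(kw for kw, n in counts.items() if n >= min_papers)


def keyword_features(scores, vocabulary):
    """One regression column per keyword; value is the confidence score.

    Keywords absent from the paper are set to 0.0 (an assumption).
    """
    return [scores.get(kw, 0.0) for kw in vocabulary]


# Toy data: paper ID -> {keyword: confidence score between 0 and 1}.
papers = {
    "p1": {"Immunology": 0.8, "Allele": 0.4},
    "p2": {"Immunology": 0.6},
    "p3": {"Chromatin": 0.9, "Immunology": 0.3},
}
vocabulary = select_keywords(papers, min_papers=2)
rows = {pid: keyword_features(scores, vocabulary) for pid, scores in papers.items()}
```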

Model 5: Outlets, Venues, and Famous Research Labs

News outlets and publication venues both reflect extra sources of variability in the regression models. Individual news outlets may follow different standards of practice in how they describe science, creating a separate source of variability in who is mentioned. Publication venues each come with different levels of impact and topical focus that potentially affect the depth of journalistic focus on papers published in them. Additionally, famous research labs managed by senior researchers may be more likely to receive media attention and name attribution as a benefit of the visibility gained from previous research outputs. Such popularity can be approximated by famous last authors based on their number of mentioned papers in our data. To accurately model these sources of variation, we treat outlets, venues, and the top 100 last authors as random effects in regression Model 5. This mixed-effect regression model implicitly captures a robust set of factors involved in science reporting, such as the tendency of specific journals to be mentioned more frequently (e.g., Nature, Science, or JAMA), the focus of news outlets on specific topics covered by different journals, and the attention benefits for authors working with famous research labs.
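One compact way to write the Model 5 specification is sketched below, assuming a logistic link for the binary outcome and normally distributed random intercepts. The notation is introduced here for illustration and does not appear in the source: $i$ indexes a story-paper pair, $\mathbf{x}_i$ collects the fixed-effect covariates of Models 1 to 4, and $o(i)$, $j(i)$, and $a(i)$ denote the outlet, the venue, and the top-100 last-author group of observation $i$.

```latex
\[
\operatorname{logit} \Pr(y_i = 1)
  = \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}
  + u_{o(i)} + v_{j(i)} + w_{a(i)},
\qquad
u_o \sim \mathcal{N}(0, \sigma_u^2),\quad
v_j \sim \mathcal{N}(0, \sigma_v^2),\quad
w_a \sim \mathcal{N}(0, \sigma_w^2).
\]
```

Here $y_i = 1$ when the first author is mentioned by name, and the three random intercepts absorb outlet-level, venue-level, and lab-level differences in baseline mention rates.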

S2 Regression Results

S2.1 Coefﬁcients for Five Models in Author Mentions

The coefficients for the five regression models are shown in Table S5. To save space, the full set of variables in Model 5, including the paper keywords and author-journalist interaction terms, is shown in Appendix Table S11.

S2.2 Inﬂuence of Control Variables

Although our focus is on ethnicity and gender, we find that many controls are also strongly associated with author mention rates. Examining the influence of these factors can lead to a better understanding of the mechanisms at play in science reporting. Below we interpret their effects based on Model 5 (Table S5) along three themes: (1) prestige-related inequality, (2) impact of co-authorship, and (3) story content effects.

Scholars who have a high professional rank or are affiliated with prestigious institutions receive outsized attention in science news. This result suggests that the benefits of status, the so-called “Matthew Effect” (Merton, 1968), persist even after publication.

Although having more authors has a weak negative effect on the first author being mentioned, if the last author is mentioned, the first author is substantially more likely to be mentioned as well, suggesting that many stories tend to engage with only a few authors per referenced paper. Surprisingly, the demographics of last authors also play a weak role in first author mentions, with slightly negative effects for last authors with Eastern European, Middle Eastern, and Chinese names.

Solo-authored papers have been decreasing over time and are associated with lower impact on average (Greene, 2007; Milojević, 2014). However, our results highlight an underappreciated benefit: conditional on a paper being referenced in the news, a solo author is significantly more likely to be mentioned compared to the first author of a multi-author paper. Although seemingly counter to previous studies, this result has a natural explanation: there is only one author to mention if need be.

The coefficients for story features point to the multifaceted nature of science reporting. Although the volume of science reporting is increasing over time (Fig. S1a), journalists tend to mention authors less frequently in later years. At the same time, while older papers are still discussed in the media (Fig. S1c), journalists are less likely to mention the authors of these studies. When more papers are referenced in a story, their first authors are less likely to be mentioned. We hypothesize that such stories often cite multiple scientific papers to construct a larger narrative, and thus those papers are only mentioned in passing.

S2.3 U.S. vs. non-U.S. Institutions in Author Mentions

When fitting a model for the U.S. subset (or non-U.S. subset), we omitted the location variable introduced in Section S1.8 (Model 2). The coefficients for gender and ethnicity in the two models are shown in Table S6, which reveals that scholars from non-U.S. institutions are much less likely to be mentioned by U.S. media than their counterparts from U.S.-based institutions, with four categories reaching statistical significance: Romance Language, Scandinavian & Germanic, Chinese, and Middle Eastern.

S2.4 Who is Quoted or Institutionally Substituted?

The three subplots in Fig. S3 show the average marginal effects for minority gender and ethnicity authors in being mentioned by name, quoted, or substituted by institution when the author name is not mentioned, respectively. Note that each model is fitted with our full data.

[Figure S3: three panels (Mention, Quote, Inst. Substitution). Rows: African, Indian, Middle Eastern, Chinese, non-Chinese East Asian, Eastern European, Scandinavian & Germanic, Romance Language, Female. Horizontal axis: probability of being credited compared to Male/British-origin named authors.]

Figure S3: Authors with minority-ethnicity names are less likely to be mentioned by name (left) or quoted (middle), and are more likely to be substituted by their institution (right). The average marginal effects are estimated based on 285,708 observations in our data. A negative (positive) marginal effect indicates a decrease (increase) in probability compared to authors with Male (for gender) or British-origin (for ethnicity) names. The colors are proportional to the absolute probability changes. Female is colored blue to reflect its difference from ethnicity identities. The error bars indicate 95% bootstrapped confidence intervals.
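The notion of an average marginal effect for a single binary indicator can be sketched as follows. This is a simplified illustration that assumes a logistic model with fixed effects only; the coefficient values, variable names, and function names are hypothetical, and random effects and bootstrapped confidence intervals are not handled.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(rows, coefs, intercept, indicator):
    """Mean change in predicted probability when `indicator` goes from 0 to 1.

    `rows` is a list of covariate dictionaries; all other covariates are
    held at their observed values for each row.
    """
    diffs = []
    for row in rows:
        base = intercept + sum(
            coefs[name] * value for name, value in row.items() if name != indicator
        )
        diffs.append(sigmoid(base + coefs[indicator]) - sigmoid(base))
    return sum(diffs) / len(diffs)


# Hypothetical coefficients and two toy observations.
coefs = {"chinese": -0.5, "story_length": 0.2}
rows = [{"chinese": 0, "story_length": 0.0}, {"chinese": 1, "story_length": 1.0}]
ame = average_marginal_effect(rows, coefs, intercept=0.0, indicator="chinese")
```

A negative value means that, averaged over the observed stories, switching the indicator on lowers the predicted probability of a mention relative to the reference category.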

S3 Additional Ethnicity Coding

While Ethnea provides a large set of nationality-based ethnicity codings specifically tailored to scientists, the library could potentially introduce artifacts in its labeling. As a robustness check, we re-coded the ethnicities of all authors and journalists using two separate sources to test whether the observed bias persists. Specifically, we used the ethnicolr library (https://pypi.org/project/ethnicolr/) to code ethnicity using either data derived from (i) the nationalities listed in Wikipedia infoboxes to infer nationality-based ethnicity, or (ii) self-reported ethnicity data associated with last names from the 2010 U.S. census. While these two new sources of data use different definitions and granularities of ethnicity from Ethnea, they nonetheless provide approximately similar categories to Ethnea that enable us to validate our results.

Ethnicity based on Wikipedia Data.

We used the Wikipedia infobox data to code author and journalist ethnicity based on the first name and the last name (Ambekar et al., 2009; Sood and Laohaprapanon, 2018). To make the results comparable to those based on Ethnea (Section S1.4), we placed the 13 individual ethnicities defined in the Wikipedia data into 8 broad categories:

• (1) African (Africans),
• (2) British-origin (British),
• (3) East Asian (EastAsian, Japanese),
• (4) Eastern European (EastEuropean),
• (5) Indian (IndianSubContinent),
• (6) Middle Eastern (Muslim, Jewish),
• (7) Romance Language (French, Hispanic, Italian),
• (8) Scandinavian & Germanic (Germanic, Nordic).

Note that Chinese ethnicity (defined in Ethnea) is by default incorporated into the EastAsian ethnicity in the Wikipedia data. We further placed the 8 categories into 4 groups for journalist ethnicity due to insufficient data size: (1) Asian (East Asian, Indian), (2) British-origin, (3) European (Eastern European, Romance Language, Scandinavian & Germanic), and (4) Other Unknown (African, Middle Eastern, Unknown). We fitted the specification of Model 5 using this coding scheme (British-origin and Male are still used as the reference categories).
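The 13-to-8 grouping listed above can be written as a lookup table. The keys are the labels as they appear in the text; the exact strings returned by ethnicolr may differ, and the handling of comma-separated label paths and of unmapped labels below is an assumption made for illustration.

```python
# Wikipedia-based ethnicity label -> broad category used for comparison with Ethnea.
WIKI_TO_BROAD = {
    "Africans": "African",
    "British": "British-origin",
    "EastAsian": "East Asian",
    "Japanese": "East Asian",
    "EastEuropean": "Eastern European",
    "IndianSubContinent": "Indian",
    "Muslim": "Middle Eastern",
    "Jewish": "Middle Eastern",
    "French": "Romance Language",
    "Hispanic": "Romance Language",
    "Italian": "Romance Language",
    "Germanic": "Scandinavian & Germanic",
    "Nordic": "Scandinavian & Germanic",
}


def to_broad(label):
    """Map a predicted label to a broad category.

    If the label is a comma-separated path, only its last component is used
    (assumption); labels outside the table fall back to "Unknown".
    """
    leaf = label.split(",")[-1].strip()
    return WIKI_TO_BROAD.get(leaf, "Unknown")


broad = to_broad("Japanese")
```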

Race in U.S. Census Data.

Similarly, we coded the race of authors and journalists using the 2010 U.S. Census data based on the last name (Ambekar et al., 2009; Sood and Laohaprapanon, 2018). The four race categories, (1) Asian (api; note that api denotes Asian and Pacific Islander), (2) Black (black), (3) Hispanic (hispanic), and (4) White (white), are directly used to fit the specification of Model 5, with White and Male used as the reference categories.

Fig. S4 shows the average marginal effects in mention rates for scholars of minority ethnicity (or race) compared to British-origin (or White) named authors. As neither tool infers gender, we report the result for gender here using Ethnea’s labels. As in the case of Ethnea, we find strong anti-Asian biases in author mentions in science news, highlighting the robustness of our findings in the main text.

Figure S4: The average marginal effects in mention probability for first authors’ demographic variables, using (Left) Wikipedia data for coding ethnicity or (Right) U.S. Census data for coding race based on author (or journalist) names. Note that gender is still inferred using Ethnea.

Cell notation: sign of the coefficient (+ or −) followed by its significance stars; a numeric value is given where available; empty cells mean the variable is not in that model.

Variable | Model 1 | Model 2 | Model 3 | Model 4 | Model 5
FIRST AUTHOR DEMOG.
African | −*** | −*** | −* | − | −*
non-Chinese East Asian | −*** | −*** | −*** | −*** | −***
Chinese | +*** | −*** | −*** | −*** | −***
Eastern European | −* | −** | − | − | −*
Indian | −*** | −*** | −*** | −*** | −***
Middle Eastern | −*** | −*** | −*** | −*** | −***
Scandinavian & Germanic | −*** | −*** | −*** | −** | −***
Romance Language | −*** | −*** | −*** | −*** | −***
Unknown Ethnicity | −*** | −*** | − | − | −
Female | −*** | −*** | 0.051 | +* | +
Unknown Gender | −*** | −*** | + | − | −
Author rank | | −*** | −*** | −*** | −***
Affiliation rank | | −*** | −* | −*** | −***
Affiliation international (location) | | −*** | −*** | −*** | −***
Affiliation unknown (location) | | 0.179 | 0.192 | +* | +
Last name length | | −*** | −*** | −*** | −***
Last name frequency | | 0.002 | 0.001 | 0.001 | +
Is the paper solo authored? | | +*** | +*** | +*** | +***
LAST AUTHOR DEMOG.
African | | 0.024 | 0.016 | 0.051 | +
non-Chinese East Asian | | +** | +** | +*** | +
Chinese | | −** | −*** | − | −**
Eastern European | | −*** | −*** | −*** | −***
Indian | | −** | −*** | −* | −
Middle Eastern | | −*** | −*** | −*** | −***
Scandinavian & Germanic | | +*** | 0.020 | +* | +
Romance Language | | +* | 0.011 | + | −
Unknown Ethnicity | | −*** | −*** | −*** | −***
Female | | +* | +*** | +*** | +**
Unknown Gender | | +*** | +*** | +* | −
Is last author mentioned? | | +*** | +*** | +*** | +***
Number of authors in the paper | | −*** | −*** | −*** | −***
JRN. DEMOG.
Asian | | | − | −*** | −
European | | | +** | +*** | −
Other Unknown Ethnicity | | | +*** | +*** | −
Female | | | −*** | −*** | −
Unknown Gender | | | 0.018 | + | −
AUT.-JRN.
Scandinavian & Germanic: Asian | | | 0.274 | +* | +*
Chinese: European | | | 0.123 | 0.115 | +*
Romance Language: European | | | 0.078 | 0.077 | +*
Chinese: Other Unknown | | | +** | +*** | +***
Scandinavian & Germanic: Other Unknown | | | 0.053 | 0.044 | +**
Year of news story (mention year) | | | +* | − | −***
Year gap between story and paper | | | −*** | −*** | −***
Num. of papers mentioned in a story | | | −*** | −*** | −***
News story length | | | −*** | −*** | +***
Flesch-Kincaid score | | | −*** | −*** | −***
Sentences per paragraph | | | 0.001 | 0.003 | +**
Type-Token Ratio | | | −*** | 0.005 | +*
Intercept | −*** | +*** | +*** | +*** | +***
Fixed effects for paper keywords | No | No | No | Yes | Yes
Random effects for outlets and venues | No | No | No | No | Yes
Random effects for top 100 last authors | No | No | No | No | Yes
Akaike Information Criteria (AIC) | 374,752.6 | 334,648.9 | 315,167.3 | 307,805.3 | 230,167.5

Table S5: Coefficients of five regression models of increasing complexity in predicting if the first author is mentioned, using 285,708 observations. For author-journalist interactions (AUT.-JRN.), only significant terms are shown. All variables in Model 5, including 533 keywords, are provided in Appendix Table S11. Significance levels are marked by ***, **, and *.

[Table S6: Coefficients for gender and ethnicity in the models fitted on the U.S.-based and non-U.S. subsets. Columns: Gender/Ethnicity, U.S.-based, non-U.S., p-value.]

First Author Name | Ethnea | U.S. Census | Wikipedia
Alana Lelo | African | White | Romance Language
Samuel Lawn | African | White | British-origin
Saka S Ajibola | African | Black | East Asian
Mosi Adesina Ifatunji | African | Black | African
Sebastian Giwa | African | White | African
Olabisi Oduwole | African | White | African
Chidi N. Obasi | African | White | African
Habauka M. Kwaambwa | African | Asian | African
Esther E Omaiye | African | White | African
Aurel T. Tankeu | African | White | British-origin

Table S7: A random sample of 10 African authors predicted by Ethnea (out of 613 in total in our data) and their ethnicity or race categories based on the U.S. census or the Wikipedia data.

First Author Name | U.S. Census | Ethnea | Wikipedia
E. Robinson | Black | British-origin | British-origin
Momar Ndao | Black | Romance Language | African
Angela F Harris | Black | British-origin | British-origin
Daddy Mata-Mbemba | Black | Romance Language | African
A Bolu Ajiboye | Black | African | African
Lasana T. Harris | Black | British-origin | British-origin
John M. Harris | Black | British-origin | British-origin
Edwin S Robinson | Black | British-origin | British-origin
Eric A. Coleman | Black | British-origin | British-origin
Mp Coleman | Black | British-origin | British-origin

Table S8: A random sample of 10 Black authors predicted based on the U.S. census data (out of 560 in total in our data) and their ethnicity categories based on Ethnea or the Wikipedia data.

Tables

Table S9: A random sample of 10 names for each of the 24 individual ethnicities and the “Unknown” category. All 6 MONGOLIAN names in our data are shown here.

Ethnicity | Name Example | Gender
AFRICAN | Dora Wynchank | F
 | Benjamin D. Charlton | M
 | J. Nwando Olayiwola | unknown
 | Ayodeji Olayemi | M
 | Elizabeth Gathoni Kibaru | F
 | Christopher Changwe Nshimbi | M
 | Naganna Chetty | unknown
 | Benjamin Y. Ofori | M
 | Khadijah Essackjee | F
 | Jeanine L. Marnewick | F
 | Habtamu Fekadu Gemede | M
ARAB | Zaid M. Abdelsattar | M
 | Alireza Dirafzoon | M
 | Ahmad Nasiri | M
 | Saleh Aldasouqi | M
 | Ibrahim A. Arif | M
 | Sameer Ahmed | M
 | A Elgalib | unknown
 | Taha Adnan Jan | M
 | Mohsen Taghizadeh | M
 | Behnam Nabet | M
BALTIC | Skirmantas Kriaucionis | M
 | Airidas Korolkovas | M
 | Egle Cekanaviciute | F
 | Arunas L. Radzvilavicius | M
 | Ieva Tolmane | F
 | Alberts B | M
 | Gediminas Gaigalas | M
 | Armandas Balcytis | unknown
 | Ruta Ganceviciene | F
 | Andrius Paukonis | M
CHINESE | Chin Hong Tan | unknown
 | Li Yuan | unknown
 | Yalin Li | unknown
 | Xian Adiconis | unknown
 | Philip Sung-En Wang | M
 | Xiaohui Ni | unknown
 | Minghua Li | unknown
 | Fang Fang Zhang | F
 | Li-Qiang Qin | M
 | Jian Tan | unknown
DUTCH | Pieter A. Cohen | M
 | I. Vandersmissen | unknown
 | Marleen Temmerman | F
 | Gerard ’t Hooft | M
 | A. Yool | unknown
 | G. A W Rook | unknown
 | Fatima Foflonker | F
 | Mirjam Lukasse | F
 | Sander Kooijman | M
 | Izaak D. Neveln | M
ENGLISH | Isabel Hilton | F
 | Gavin J. D. Smith | M
 | Katherine A. Morse | F
 | Andrew S. Bowman | M
 | T. M. L. Wigley | unknown
 | Francis Markham | M
 | Neil T. Roach | M
 | Brooke Catherine Aldrich | F
 | Vaughn I. Rickert | M
 | Kellie Morrissey | F
FRENCH | Lucas V. Joel | M
 | Daniel Clery | M
 | Pierre Jacquemot | M
 | Scott Le Vine | M
 | Nathalie Dereuddre-Bosquet | F
 | Stphane Colliac | unknown
 | Adelaide Haas | F
 | Julie M. D. Paye | F
 | Justine Lebeau | F
 | Arnaud Chiolero | M
GERMAN | Laure Schnabel | F
 | Jeff M. Kretschmar | M
 | E. Homeyer | unknown
 | Maren N. Vitousek | F
 | . Wild | unknown
 | Hany K. M. Dweck | M
 | E. M. Fischer | unknown
 | Paul Marek | M
 | Hans-Jrg Rheinberger | M
 | Daniel James Cziczo | M
GREEK | Mary J. Scourboutakos | F
 | Anita P Courcoulas | F
 | Elgidius B. Ichumbaki | unknown
 | Stavros G. Drakos | M
 | Nikolaos Konstantinides | M
 | Constantine Sedikides | M
 | Maria A. Spyrou | F
 | Panos Athanasopoulos | M
 | Aristeidis Theotokis | M
 | Amy H. Mezulis | F
HISPANIC | Mirela Donato Gianeti | F
 | Julio Cesar de Souza | M
 | Paulina Gomez-Rubio | F
 | Jos A. Pons | M
 | Arnau Domenech | M
 | Nicole Martinez-Martin | F
 | Mauricio Arcos-Burgos | M
 | Raquel Muoz-Miralles | F
 | Annmarie Cano | F
 | Merika Treants Koday | F
HUNGARIAN | Andrea Tabi | F
 | Rbert Erdlyi | M
 | Gabor G. Kovacs | M
 | Xenia Gonda | F
 | Erzsbet Bukodi | unknown
 | Julianna M. Nemeth | F
 | Ian K. Toth | M
 | Zoltan Arany | M
 | Cory A. Toth | M
 | Ashley N. Bucsek | unknown
INDIAN | Sachin M. Shinde | M
 | Govindsamy Vediyappan | M
 | Ashish K. Jha | M
 | Tamir Chandra | M
 | Hariharan K. Iyer | M
 | hanpreet Singh | unknown
 | Ravi Chinta | M
 | Madhukar Pai | M
 | Lalitha Nayak | F
 | Ravi Dhingra | M
INDONESIAN | Dewi Candraningrum | unknown
 | Richard Tjahjono | M
 | T. A. Hartanto | unknown
 | Johny Setiawan | M
 | Truly Santika | unknown
 | Chairul A. Nidom | unknown
 | Christine Tedijanto | F
 | Alberto Purwada | M
 | Ardian S. Wibowo | M
 | Anna I Corwin | F
ISRAELI | Ron Lifshitz | M
 | Martin H. Teicher | M
 | Ruth H Zadik | F
 | Gil Yosipovitch | M
 | Mor N. Lurie-Weinberger | unknown
 | J. Tarchitzky | unknown
 | Ilana N. Ackerman | F
 | B. Trakhtenbrot | unknown
 | Yoram Barak | M
 | Mendel Friedman | M
ITALIAN | Tiziana Moriconi | F
 | Marco Gobbi | M
 | Marco De Cecco | M
 | F. Govoni | unknown
 | Theodore L. Caputi | M
 | Mark A Bellis | M
 | Fernando Migliaccio | M
 | Julien Granata | M
 | Jennifer M. Poti | F
 | Brendan Curti | M
JAPANESE | Takuji Yoshimura | M
 | Maki Inoue-Choi | F
 | Masaaki Sadakiyo | M
 | Moeko Noguchi-Shinohara | F
 | Naoto Muraoka | M
 | Shigeki Kawai | M
 | oji Mikami | M
 | Masayoshi Tokita | M
 | Naohiko Kuno | M
 | Saba W. Masho | F
KOREAN | Jih-Un Kim | M
 | Hanseon Cho | unknown
 | Hyung-Soo Kim | M
 | Yun-Hee Youm | F
 | Yoon-Mi Lee | unknown
 | Soo Bin Park | F
 | Yungi Kim | unknown
 | Woo Jae Myung | unknown
 | Kunwoo Lee | unknown
 | Sandra Soo-Jin Lee | F
MONGOLIAN | C. Jamsranjav | unknown
 | Jigjidsurengiin Batbaatar | unknown
 | Khishigjav Tsogtbaatar | unknown
 | Migeddorj Batchimeg | unknown
 | Tsolmon Baatarzorig | unknown
NORDIC | Steven G. Rogelberg | M
 | Kirsten K. Hanson | F
 | Jan L. Lyche | M
 | Morten Hesse | M
 | Karolina A. Aberg | F
 | Britt Reuter Morthorst | F
 | Kirsten F. Thompson | F
 | Shelly J. Lundberg | F
 | G Marckmann | unknown
 | David Hgg | M
ROMANIAN | Afrodita Marcu | F
 | Iulia T. Simion | F
 | Liviu Giosan | M
 | Alina Sorescu | F
 | Liviu Giosan | M
 | Mircea Ivan | M
 | Dana Dabelea | F
 | Constantin Rezlescu | M
 | Christine A. Conelea | F
 | R. A. Popescu | unknown
SLAV | Nomi Koczka | F
 | Mikhail G Kolonin | M
 | Richard Karban | M
 | Branislav Dragovi | M
 | H Illnerov | unknown
 | Marte Bjrk | F
 | Jacek Niesterowicz | M
 | Justin R. Grubich | M
 | Mikhail Salama Hend | M
 | Snejana Grozeva | F
THAI | Piyamas Kanokwongnuwut | unknown
 | Clifton Makate | M
 | Noppol Kobmoo | unknown
 | Kabkaew L. Sukontason | unknown
 | Aroonsiri Sangarlangkarn | unknown
 | Yossawan Boriboonthana | unknown
 | Ekalak Sitthipornvorakul | unknown
 | Tony Rianprakaisang | M
 | Apiradee Honglawan | F
 | Wonngarm Kittanamongkolchai | unknown
TURKISH | Iris Z. Uras | F
 | Metin Gurcan | unknown
 | Mustafa Sahmaran | M
 | Pinar Akman | F
 | Joshua Aslan | M
 | Selin Kesebir | F
 | Tan Yigitcanlar | unknown
 | Thembela Kepe | unknown
 | Ulrich Rosar | M
 | Selvi C. Ersoy | F
VIETNAMESE | Huong T. T. Ha | unknown
 | Vu Van Dung | M
 | H ChuongKim | unknown
 | Daniel W. Giang | M
 | Nhung Thi Nguyen | unknown
 | V. Phan | unknown
 | Oanh Kieu Nguyen | F
 | Phuc T. Ha | M
 | Bich Tran | unknown
 | Oanh Kieu Nguyen | F
Unknown | Gene Y. Fridman | M
 | Judith Glck | F
 | Noor Edi Widya Sukoco | unknown
 | Charlene Laino | F
 | Benot Brard | unknown
 | David Znd | M
 | Katarzyna Adamala | F
 | K.A. Godfrin | unknown
 | Shadd Maruna | M
 | Mariette DiChristina | F

Table S10: The 288 U.S.-based outlets are grouped into 3 categories based on their topics of reports. Note that another 135 U.S.-based outlets, which are not shown in this table, are excluded from our analyses due to technical limitations in accessing sufficient volumes of their content (e.g., view-limited paywalls or anti-crawling mechanisms).

Outlet | Type
OnMedica | Sci. & Tech.
Huffington Post | General News
KiiiTV 3 | General News
Carbon Brief | Sci. & Tech.
PR Newswire | Press Releases
Nutra Ingredients USA | Sci. & Tech.
The Bellingham Herald | General News
CNN News | General News
Health Medicinet | Press Releases
Herald Sun | General News
EurekAlert! | Press Releases
AJMC | Press Releases
The University Herald | General News
Lincoln Journal Star | General News
Cardiovascular Business | Sci. & Tech.
MinnPost | General News
CNET | Sci. & Tech.
Infection Control Today | Sci. & Tech.
Science 2.0 | Sci. & Tech.
Lexington Herald Leader | General News
Statesman.com | General News
Nanowerk | Press Releases
The San Diego Union-Tribune | General News
The Daily Beast | General News
Lab Manager | Press Releases
SDPB Radio | General News
New Hampshire Public Radio | General News
Health Day | Press Releases
Rocket News | General News
KPBS | General News
Technology.org | Press Releases
UPI.com | General News
WUWM | General News
Central Coast Public Radio | General News
The Hill | General News
The Epoch Times | General News
Biospace | Sci. & Tech.
Minyanville: Finance | General News
Nature World News | Sci. & Tech.
New York Post | General News
Action News Now | General News
WUNC | General News
Futurity | Press Releases
Reason | General News
azfamily.com | General News
Idaho Statements | General News
Google News | General News
Tri States Public Radio | General News
American Physical Society - Physics | Press Releases
KTEP El Paso | General News
LiveScience | Sci. & Tech.
KUNC | General News
The Daily Meal | Sci. & Tech.
AOL | General News
Women’s Health | Sci. & Tech.
Prevention | Sci. & Tech.
ECN | Sci. & Tech.
Iowa Public Radio | General News
Becker’s Hospital Review | Sci. & Tech.
7th Space Family Portal | Press Releases
Springfield News Sun | General News
Environmental News Network | Press Releases
Sky Nightly | Sci. & Tech.
Quartz | Sci. & Tech.
Benzinga | General News
Headlines & Global News | General News
The Denver Post | General News
Science Daily | Press Releases
The Advocate | General News
ABC News | General News
Newswise | Press Releases
hellogiggles.com | General News
WLRN | General News
EarthSky | Sci. & Tech.
Becker’s Spine Review | Sci. & Tech.
MIT News | Press Releases
MarketWatch | General News
Arstechnica | Sci. & Tech.
Journalist’s Resource | Sci. & Tech.
Northern Public Radio | General News
Everyday Health | Sci. & Tech.
Star Tribune | General News
TCTMD | Sci. & Tech.
The Verge | General News
She Knows | General News
SeedQuest | Sci. & Tech.
Tech Times | Sci. & Tech.
Witchita’s Public Radio | General News
Oncology Nurse Advisor | Sci. & Tech.
Delmarva Public Radio | General News
Medical Daily | Sci. & Tech.
Homeland Security News Wire | General News
Discover Magazine | Sci. & Tech.
Washington Post | General News
MSN | General News
Hawaii News Now | General News
The Daily Caller | General News
News Tribune | General News
The Fresno Bee | General News
King 5 | General News
Star-Telegram | General News
CNBC | General News
Salon | General News
WJCT | General News
WVPE | General News
KTEN | General News
Wired.com | General News
Daily Kos | General News
USA Today | General News
Men’s Health | Sci. & Tech.
Boise State Public Radio | General News
Voice of America | General News
PR Web | Press Releases
Georgia Public Radio | General News
FiveThirtyEight | General News
Public Radio International | General News
Harvard Business Review | General News
Inverse | General News
Doctors Lounge | Sci. & Tech.
North East Public Radio | General News
The Charlotte Observer | General News
National Geographic | Sci. & Tech.
Pharmacy Times | Sci. & Tech.
Popular Science | Sci. & Tech.
ABC Action News WFTS Tampa Bay | General News
News Channel | General News
The University of New Orleans Public Radio | General News
Mic | General News
Health Canal | Sci. & Tech.
KOSU | General News
Raleigh News and Observer | General News
The Atlantic | General News
newsmax.com | General News
Yahoo! Finance USA | General News
Government Executive | General News
International Business Times | General News
Emaxhealth.com | Press Releases
Newsweek | General News
FOX News | General News
The New York Observer | General News
Sign of the Times | General News
The Inquisitr | General News
ABC News 15 Arizona | General News
Parent Herald | General News
The ASCO Post | Sci. & Tech.
Clinical Advisor | Sci. & Tech.
Slate Magazine | General News
NPR | General News
Health | Sci. & Tech.
Dayton Daily News | General News
Guardian Liberty Voice | General News
Belleville News-Democrat | General News
Yahoo! News | General News
WCBE | General News
Buzzfeed | General News
Sci-News | Sci. & Tech.
The Seattle Times | General News
Philly.com | General News
Renal & Urology News | Sci. & Tech.
Arizona Public Radio | General News
Interlochen Public Radio | General News
12 News KBMT | General News
New York Magazine | General News
Medium US | General News
KPCC : Southern California Public Radio | General News
2 Minute Medicine | Sci. & Tech.
Pediatric News | Sci. & Tech.
redOrbit | Sci. & Tech.
Insurance News Net | General News
Drug Discovery and Development | Sci. & Tech.
USNews.com | General News
Yahoo! | General News
The Body | Sci. & Tech.
GEN | Sci. & Tech.
Pacific Standard | General News
Northwest Indiana Times | General News
Psychology Today | Sci. & Tech.
Oregon Public Broadcasting | General News
Mother Nature Network | Sci. & Tech.
Pressfrom | General News
Physician’s Weekly | Sci. & Tech.
Pettinga: Stock Market | General News
Winona Daily News | General News
Runner’s World | Sci. & Tech.
Bio-Medicine.org | Press Releases
Alternet | General News
Mother Jones | General News
The Wichita Eagle | General News
Cornell Chronicle | Press Releases
Politico Magazine | General News
Equities.com | General News
WBUR | General News
ABC 7 WKBW Buffalo | General News
Billings Gazette | General News
My Science | Sci. & Tech.
The Week | General News
BioTech Gate | Sci. & Tech.
Kansas City Star | General News
The Deseret News | General News
PBS | General News
Space.com | Sci. & Tech.
Astrobiology Magazine | Sci. & Tech.
Outside | General News
Value Walk | General News
WYPR | General News
Bustle | General News
Science World Report | Sci. & Tech.
Inside Science | Sci. & Tech.
Science Alert | Sci. & Tech.
Breitbart News Network | General News
St. Louis Post-Dispatch | General News
HowStuffWorks | General News
Wyoming Public Radio | General News
UBM Medica | Sci. & Tech.
Fight Aging! | Sci. & Tech.
MIT Technology Review | Sci. & Tech.
WVXU | General News
The Ecologist | Sci. & Tech.
Alaska Despatch News | General News
Health Imaging | Sci. & Tech.
Kansas City University Radio | General News
Christian Science Monitor | General News
Medicinenet | Sci. & Tech.
WTOP | General News
Business Insider | General News
Real Clear Science | Sci. & Tech.
Counsel & Heal | Sci. & Tech.
The Raw Story | General News
Medcity News | Sci. & Tech.
Drugs.com | Sci. & Tech.
Relief Web | Press Releases
SPIE Newsroom | Sci. & Tech.
New York Daily News | General News
Newser | General News
The Sacramento Bee | General News
Vice | General News
R&D | Sci. & Tech.
KCENG12 | Sci. & Tech.
Inc. | General News
Science/AAAS | Sci. & Tech.
The Atlanta Journal Constitution | General News
Brookings | General News
Common Dreams | General News
Physician’s Briefing | Press Releases
KERA News | General News
Space Daily | Sci. & Tech.
Tech Xplore | Sci. & Tech.
US News Health | Sci. & Tech.
KUOW | General News
WRKF | General News
TIME Magazine | General News
Smithsonian Magazine | Sci. & Tech.
Herald Tribune | General News
Lifehacker | General News
Fast Company | General News
Kansas Public Radio | General News
Omaha Public Radio | General News
New York Times | General News
Technology Networks | Sci. & Tech.
Elite Daily | General News
Centre for Disease Research and Policy | Sci. & Tech.
Business Wire | General News
KUNM | General News
CBS News | General News
Scientific American | Sci. & Tech.
NBC News | General News
Sun Herald | General News
KRWG TV/FM | General News
TODAY | General News
Radio Acadie | General News
The Columbian | General News
Houston Chronicle | General News
WABE | General News
The Modesto Bee | General News
American Council on Science and Health | Sci. & Tech.
WKAR | General News
Psych Central | Sci. & Tech.
WebMD News | Sci. & Tech.
Green Car Congress | Sci. & Tech.
ABC News WMUR 9 | General News
Healthline | Sci. & Tech.
Mongabay | Sci. & Tech.
Vox.com | General News
WPTV 5 West Palm Beach | General News
Popular Mechanics | Sci. & Tech.
PM 360 | Sci. & Tech.
SFGate | General News
Seed Daily | Sci. & Tech.

Table S11: The coefficients of all independent variables (including 533 keywords) in Model 5 in predicting whether the first author is mentioned or not by name in a news story referencing their research papers. Random effects for 100 top last authors, 288 outlets, and 8,268 publication venues are also included in the model. Note that “FA” denotes the first author and “J” denotes the journalist.

Dependent variable: First author mentioned

Variable | Estimate (interval) | p-value
FA African | |
FA Indian:J OtherUnknown | 0.093 |
Molecular biology | |
Body mass index | |
Social relation | 0.469 (0.113, 0.824) | p = 0.010
Chromatin | |
Classical mechanics | |
Immunology | |
Allele | 0.071 |
Heart disease | |
Weight gain | |
Focus group | 0.716 (0.140, 1.291) | p = 0.015
Regimen | 0.461 |
Antioxidant | |
Gestational age | |
Log Likelihood | −114,476.700 |
Akaike Inf. Crit. | 230,167.500 |
Bayesian Inf. Crit. | 236,579.100 |
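The fit statistics closing Table S11 can be cross-checked against the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. The sketch below assumes that the value −114,476.700 is the model log-likelihood and that n is the 285,708 observations reported for Table S5; the parameter count k is inferred from these numbers and is not reported in the source.

```python
import math

log_likelihood = -114_476.7   # assumed to be the Model 5 log-likelihood
n_obs = 285_708               # observations reported for Table S5
reported_aic = 230_167.5
reported_bic = 236_579.1

# Infer the number of estimated parameters from the AIC definition.
k = round((reported_aic + 2 * log_likelihood) / 2)

# Recompute both criteria from k, n and the log-likelihood.
aic = 2 * k - 2 * log_likelihood
bic = k * math.log(n_obs) - 2 * log_likelihood
```

Under these assumptions the recomputed criteria agree with the reported values to within rounding, which suggests roughly 607 estimated parameters, a count consistent with 533 keyword terms plus the remaining covariates.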