Analyzing Who and What Appears in a Decade of US Cable TV News
James Hong, Will Crichton, Haotian Zhang, Daniel Y. Fu, Jacob Ritchie, Jeremy Barenholtz, Ben Hannel, Xinwei Yao, Michaela Murray, Geraldine Moriba, Maneesh Agrawala, Kayvon Fatahalian
Stanford University
Figure 1: (a) Our data set contains over 244,000 hours of video aired on CNN, FOX, and MSNBC from January 1, 2010 to July 23, 2019. The screen time of news content (and commercials) in our data set is stable from 2012 onwards, representing near 24/7 coverage. (b) The ratio of time when female-presenting faces are on screen to when male-presenting faces are on screen is 0.48 to 1 on average, but has risen from 0.41 (to 1) to 0.54 (to 1) over the decade. (c) The top 100 people by face screen time in the data set, with names of the top 10 given. Of the top 100 people, 18 are U.S. politicians and 85 are news presenters (3 are both).
ABSTRACT
Cable TV news reaches millions of U.S. households each day, meaning that decisions about who appears on the news and what stories get covered can profoundly influence public opinion and discourse. We analyze a data set of nearly 24/7 video, audio, and text captions from three U.S. cable TV networks (CNN, FOX, and MSNBC) from January 2010 to July 2019. Using machine learning tools, we detect faces in 244,038 hours of video, label each face's presented gender, identify prominent public figures, and align text captions to audio. We use these labels to perform screen time and word frequency analyses. For example, we find that overall, much more screen time is given to male-presenting individuals than to female-presenting individuals (2.4x in 2010 and 1.9x in 2019). We present an interactive web-based tool, accessible at https://tvnews.stanford.edu, that allows the general public to perform their own analyses on the full cable TV news data set.
CCS CONCEPTS
• Social and professional topics; • Applied computing; • Computing methodologies → Artificial intelligence; • Information systems → Information systems applications;

KEYWORDS
Large scale video analysis, cable TV news
1 INTRODUCTION
Cable TV news reaches millions of U.S. households each day, and profoundly influences public opinion and discourse on current events [30]. While cable TV news has been on air for over 40 years, there has been little longitudinal analysis of its visual aspects. As a result, we have little understanding of who appears on cable TV news and what these individuals talk about. Consider questions like,
What is the screen time of men vs. women? Which political candidates and news presenters receive the most screen time? How are victims and perpetrators of violence portrayed? Which foreign countries are discussed the most? Who is on screen when different topics are discussed?
In this paper, we demonstrate that it is possible to answer such questions by analyzing a data set comprised of nearly 24/7 coverage of video, audio, and text captions from three major U.S. cable TV news channels – CNN, FOX (News), and MSNBC – over the last decade (January 1, 2010 to the present). The data set was collected by the Internet Archive's TV News Archive [2]. We focus our analysis (and validation) between January 2010 and July 2019, which includes 244,038 hours (equivalent to about 27.8 years) of footage. Using programmatic and machine learning tools, we label the data set – e.g., we detect faces, label their presented gender, identify prominent public figures, align text captions to audio, and detect commercials. We scope our findings to the news programming portion of the data set (Figure 1a), accounting for 72.1% of the video (175,858 hours) compared to 27.9% for commercials (68,179 hours).

Each of the resulting labels has a temporal extent, and we use these extents to compute the screen time of faces and to identify when individuals are on screen and when words are said. We show that by analyzing the screen time of faces, counting words in captions, and presenting results in the form of time-series plots, we can reveal a variety of insights, patterns, and trends about the data. To this end, we adopt an approach similar to the Google N-gram Viewer [42], which demonstrated the usefulness of word frequency analysis of 5.2 million books and print media from 1800 to 2000 to many disciplines, as well as to the GDELT AI Television Explorer [24], which enables analysis of cable TV news captions and on-screen objects (but not people). The goal of our work is to enable similar analyses of cable TV news video using labels that aid understanding of who is on screen and what is in the captions.

Our work makes two main contributions.

• We demonstrate that analyzing a decade (January 1, 2010 to July 23, 2019) of cable TV news video generates a variety of insights on a range of socially relevant issues, including gender balance (section 2), visual bias (section 3), topic coverage (section 4), and news presentation (section 5). The details of our data processing, labeling pipeline, and validation for these analyses are described in Supplemental 1.

• We present an interactive, web-based data analysis interface (section 6; available at https://tvnews.stanford.edu) that allows users to easily formulate their own analysis queries on our annotated data set of cable TV news (section 7). Our analysis interface updates daily with new cable TV news video and allows the general public and journalists to perform their own exploratory analyses on the full TV news data set. Our data processing code is available as open source.
People are an integral part of the news stories that are covered, how they are told, and who tells them. We analyze the screen time and demographics of faces in U.S. cable TV news.
How much of the time is at least one face on screen?
We detect faces using the MTCNN [66] face detector on frames sampled every three seconds (Supplemental 1.3). Face detections span a wide range of visual contexts, from in-studio presenters and guests, to people in B-roll footage, to static infographics. Overall, we detect 263M total faces, and at least one face appears on screen 75.3% of the time. Over the decade, the percentage of time with at least one face on screen has risen steadily from 72.9% in 2010 to 81.5% in 2019 and is similar across all three channels (Figure 2).

In the same time period, we also observe an increase in the average number of faces on screen. On CNN and FOX, the amount of time when only one face is on screen has declined, while it has remained constant on MSNBC. On all three channels, the amount of time when multiple faces (2 or more) are on screen simultaneously has risen. This accounts for the overall increase in time when at least one face is on screen, though we do not analyze which types of content (with no faces on screen) this footage is replacing. We note that while the average number of faces has increased in news content, the average number of faces on screen in commercials has remained flat since 2013 (Supplemental 2.1.1).
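To make the sampling approach concrete, the sketch below estimates per-video face screen time by decoding one frame every three seconds and running an off-the-shelf MTCNN detector. It is illustrative only: the paper specifies MTCNN but not an implementation, so the facenet-pytorch package is assumed here, and the file path and stride are placeholders.

```python
# A minimal sketch of screen-time estimation via frame sampling,
# not the authors' pipeline code.
import cv2
from facenet_pytorch import MTCNN  # assumed implementation; the paper names MTCNN only

detector = MTCNN(keep_all=True)    # keep_all=True returns every face in the frame

def face_screen_time(path, stride_s=3.0):
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    step = max(1, int(round(fps * stride_s)))
    sampled = with_face = total_faces = 0
    frame_idx = 0
    while True:
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ok, frame = cap.read()
        if not ok:
            break
        boxes, _ = detector.detect(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        n = 0 if boxes is None else len(boxes)
        sampled += 1
        with_face += int(n > 0)
        total_faces += n
        frame_idx += step
    cap.release()
    if sampled == 0:
        return None
    # Each sampled frame stands in for stride_s seconds of video.
    return {"pct_time_with_face": 100.0 * with_face / sampled,
            "avg_faces_on_screen": total_faces / sampled}
```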
How does screen time of male-presenting individuals compare to female-presenting individuals?
We estimate the presented binary gender of each detected face using a nearest neighbor classifier trained on FaceNet [53] descriptors (Supplemental 1.4). Overall, female-presenting faces are on screen 28.7% of the time, while male-presenting faces are on screen 60.2% of the time, a 0.48 to 1 ratio (Figure 3). These percentages are similar across channels and have slowly increased for both groups (similar to how the percentage of time any face is on screen has increased). The ratio of female- to male-presenting screen time has increased from 0.41 (to 1) to 0.54 (to 1) over the decade (Figure 1b). While the upward trend indicates movement towards gender parity, the rate of change is slow, and these results also reinforce prior observations on the under-representation of women in both film [25] and news media [28].

We acknowledge that our simplification of presented gender to a binary quantity fails to represent transgender or gender-nonconforming individuals [32, 37]. Furthermore, an individual's presented gender may differ from their actual gender identification. Despite these simplifications, we believe that automatically estimating binary presented gender labels is useful for improving understanding of trends in gender representation in cable TV news media.
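As an illustration of this classification step, the sketch below fits a k-NN classifier over face embeddings with scikit-learn. The embedding files, label strings, and the choice of k = 7 are assumptions for the example; the paper specifies only a nearest neighbor classifier over FaceNet descriptors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical files holding 512-d FaceNet descriptors and hand labels.
train_emb = np.load("labeled_embeddings.npy")         # shape (N, 512)
train_lbl = np.load("labels.npy", allow_pickle=True)  # "male" / "female" strings

knn = KNeighborsClassifier(n_neighbors=7)  # k is an illustrative choice
knn.fit(train_emb, train_lbl)

corpus_emb = np.load("all_face_embeddings.npy")       # hypothetical file
pred = knn.predict(corpus_emb)                    # presented-gender label per face
conf = knn.predict_proba(corpus_emb).max(axis=1)  # fraction of agreeing neighbors
```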
Which public figures receive the most screen time?
We estimate the identity of faces detected in our data set using the Amazon Rekognition Celebrity Recognition API [1]. For individuals who are not currently included (or not accurately detected) by the API, we train our own classifiers using FaceNet [53] descriptors. See Supplemental 1.5 for details.

We identify 1,260 unique individuals who receive at least 10 hours of screen time in our data set. These individuals account for 47% of the 263M faces that we detect in the news content and are on screen for 45% of screen time. The top individual is Donald Trump, who rises to prominence in the 2016 presidential campaigning season and his presidency (Figure 1c). Barack Obama is second, with 0.63× Trump's screen time, and is prevalent between 2010 (the start of the data set) and 2017 (the end of his second term). Besides U.S. presidents, the list of top individuals is dominated by politicians and news presenters (e.g., anchors, daytime hosts, field reporters, etc.) (Figure 4).
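A hedged sketch of the first identification step appears below, calling the Amazon Rekognition recognize_celebrities API via boto3 on a single frame; the frame path and the 90% confidence cutoff are illustrative choices, not values from the paper.

```python
import boto3

rekognition = boto3.client("rekognition")

def identify_celebrities(frame_path, min_confidence=90.0):
    """Names of public figures Rekognition recognizes in one video frame."""
    with open(frame_path, "rb") as f:
        resp = rekognition.recognize_celebrities(Image={"Bytes": f.read()})
    return [(c["Name"], c["MatchConfidence"])
            for c in resp["CelebrityFaces"]
            if c["MatchConfidence"] >= min_confidence]
```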
How much screen time do political candidates get before an election?
During the 2016 Republican presidential primaries, Donald Trump consistently received more screen time than any other candidate (Figure 5a). In the competitive months of the primary season, from January to May 2016, Trump received 342 hours of screen time, while his closest Republican rival, Ted Cruz, received only 130 hours. In the same timespan, the leading Democratic candidates, Hillary Clinton and Bernie Sanders, received more equal screen time (164 hours compared to 139 hours for Clinton); both received far more screen time than the other Democratic primary candidates (Figure 5b). Comparing the two presidential nominees during the period from January 1, 2016 to election day, Trump received 1.9× more screen time than Clinton.

Unlike Trump in 2016, in the run up to the 2012 presidential election, Mitt Romney (the eventual Republican nominee) did not receive as dominating an amount of screen time (Figure 5c). Other Republican candidates such as Herman Cain, Michelle Bachmann, Newt Gingrich, and Rick Santorum have higher peaks than Romney at varying stages of the primary season, and it is not until April 2012 (when his last rival withdraws) that Romney's screen time decisively overtakes that of his rivals. For reference, Figure 5d shows Barack Obama's screen time during the same period. As the incumbent president up for re-election, Obama had no significant primary challenger. Obama received more screen time throughout 2011 than Romney because, as the president, he is in the news for events and policy actions related to his duties as president (e.g., U.S. missile strikes in Libya, job growth plan, etc.). In 2012, however, Obama and Romney are comparable. The overall trends are similar when viewed by channel, with Trump dominating screen time in 2016 on all three channels (Supplemental 2.1.3).
Figure 2: The percentage of time when at least one face appears on screen has increased on all three channels over the decade (thick lines), with most of the increase occurring between 2015 and 2018. The amount of time when multiple faces are on screen has also increased on all three channels; however, the percentage of time with only one face on screen has declined on CNN and FOX, and stagnated on MSNBC.
Figure 3: The percentages of time when male-presenting and female-presenting faces are on screen are similar on all three channels, and have increased over the decade with the rise in all faces noted in Figure 2. Because male- and female-presenting faces can be on screen simultaneously, the lines can add to over 100%.
Figure 4: Distribution of individuals' screen time, separated by presenters on each channel and non-presenters (stacked). 65% of individuals with 100+ hours of screen time are news presenters. The leading non-presenters are annotated — see Figure 7 for the top news presenters. Note: the three leftmost bars are truncated, and the truncated portion includes presenters from all three channels.

Who presents the news?
Cable TV news programs feature hosts, anchors, and on-air staff (e.g., contributors, meteorologists) to present the news. We manually marked 325 of the public figures who we identified in our data set as news presenters (107 on CNN, 130 on FOX, and 88 on MSNBC). Overall, we find that a news presenter is on screen 28.1% of the time – 27.4% on CNN, 33.5% on FOX, and 23.0% on MSNBC. On CNN, the percentage of time that a news presenter is on screen increases by 13% between 2015 and 2018, while it remains mostly flat over the decade on FOX and MSNBC (Figure 6a).

The news presenters with the most screen time are Anderson Cooper (1,782 hours) on CNN, Bill O'Reilly (1,094 h) on FOX, and Rachel Maddow (1,202 h) on MSNBC. Moreover, while the top presenter on each channel varies a bit over the course of the decade (Figure 7), Cooper and O'Reilly hold the top spot for relatively long stretches on CNN and FOX, respectively. Also, while Maddow appears the most on MSNBC overall, Chris Matthews holds the top spot for the early part of the decade (2010 to 2014). Since 2014, the top presenter on MSNBC has fluctuated on a monthly basis (Figure 7c). The 13% rise in screen time of news presenters on CNN that we saw earlier (in Figure 6a) can largely be attributed to three hosts (Anderson Cooper, Chris Cuomo, and Don Lemon), who see 2.5×, 4.5×, and 5.5× increases in screen time from 2015 onwards (Figure 7a) and account for over a third of news presenter screen time on CNN in 2019.

How does screen time of male- and female-presenting news presenters compare?
The list of top news presenters by screen time is dominated by male-presenting individuals. Of the top five news presenters on each channel (accounting for 31%, 22%, and 34% of news presenter screen time on CNN, FOX, and MSNBC, respectively), only one each on CNN and FOX, and two on MSNBC, are female (Figure 7). Across all three channels, there is a shift towards gender parity in the screen time of news presenters early in the decade, followed by a divergence (Figure 6b-d).
Figure 5: Screen time of U.S. presidential candidates during the campaign and primary season of the 2016 and 2012 elections. (a) In 2016, Donald Trump received significantly more screen time than the other Republican candidates. (b) Hillary Clinton and Bernie Sanders received nearly equal screen time during the competitive primary season (January-May 2016). (c) In 2012, Mitt Romney did not decisively overtake the other Republican candidates in screen time until he became the presumptive Republican nominee.
Figure 6: (a) The percentage of time when a news presenter is on screen has remained mostly flat on FOX and MSNBC, but has risen by 13% on CNN since 2016. (b-d) Within each channel, the screen time of news presenters by presented gender (as a percentage of total news presenter screen time) varies across the decade. CNN reaches parity in January-June 2012 and May-August 2015, but has since diverged. Because male- and female-presenting news presenters can be on screen simultaneously, the lines can add to over 100%.
Figure 7: Screen time of the top five presenters on each channel. Since 2016, several of the top presenters on CNN have dramatically risen in screen time. Following Bill O'Reilly's firing and Megyn Kelly's departure from FOX in 2017, Sean Hannity and Tucker Carlson have risen in screen time. Since 2013, the variation in screen time among the top five hosts on MSNBC has been low compared to CNN and FOX.

CNN exhibits gender parity for news presenters in January-June 2012 and May-August 2015 (Figure 6b). However, from September 2015 onward, CNN diverges as the 10% increase in the screen time of male-presenting news presenters (from 14% to 24%) outpaces the 3% increase for female-presenting news presenters (13% to 16%). The increase in male-presenting news presenter screen time on CNN mirrors the increase in overall news presenter screen time on CNN, due to an increase in the screen time of Anderson Cooper, Don Lemon, and Chris Cuomo (Figure 7a).

Similarly, the gender disparity of news presenters on FOX decreases from 2010 to 2016 but widens in 2017 due to an increase in the screen time of male-presenting news presenters (Figure 6c). This occurs around the time of (former top hosts) Megyn Kelly's and Bill O'Reilly's departures from FOX (6% and 5% of presenter screen time, respectively, on FOX in 2016). Their time is replaced by a rise in Tucker Carlson's and Sean Hannity's screen time (3% and 5% of news presenter screen time, respectively, on FOX in 2016, and up to 11% and 7%, respectively, in 2017 and 2018). The increase in female-presenting news presenter screen time in October 2017 occurs when Laura Ingraham's Ingraham Angle and Shannon Bream's FOX News @ Night debut.

On MSNBC, the disparity as a percentage of news presenter screen time increases from May 2017 to July 2019 (Figure 6d). This is due to a drop in the screen time of both male- and female-presenting news presenters: the percentage of time when male-presenting news presenters are on screen falls from 17% to 13%, while the percentage for female-presenting news presenters falls from 14% to 7%. Unlike on CNN and FOX, the change is more distributed across news presenters; the screen time of the top five presenters from 2017 to 2019 is comparatively flat (Figure 7c).
Which news presenters hog the screen time on their shows?
We compute the percentage of time a news presenter is on screen on their own show (a "screenhog score") and plot the top 25 "screenhogs" (Figure 8). Chris Cuomo (CNN) has the highest fraction of screen time on his own show (visible 70.6% of the time on Cuomo Primetime). Tucker Carlson (FOX) is second at 55.3% on Tucker Carlson Tonight. These results can be attributed to the format of these two shows; Cuomo and Carlson both do interviews and often show their own reactions to guests' comments. Carlson also regularly monologues while on screen. Compared to both CNN and MSNBC, FOX has the most screenhogs (13 of the top 25), many of whom are well-known hosts of FOX's opinion shows. Bill O'Reilly, Anderson Cooper, and Rachel Maddow (the top presenters by channel) also break the top 25, with screenhog scores of 28.5%, 28.3%, and 24.2%, respectively.

Figure 8: The 25 news presenters who receive the largest fraction of screen time on their own shows ("screenhogs") and the total amount of video content for their shows in the data set. The top two shows by this metric, Cuomo Primetime and Tucker Carlson Tonight, are relatively recent shows, starting in June 2018 and November 2016, respectively.
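The screenhog score reduces to a ratio of durations. The sketch below computes it with pandas, assuming a hypothetical DataFrame of host-on-own-show face intervals and a per-show total of airtime hours; the column names and schema are invented for illustration.

```python
import pandas as pd

def screenhog_scores(intervals: pd.DataFrame, show_hours: pd.Series) -> pd.Series:
    """intervals: rows = intervals when a show's host is on screen during
    their own show, columns = [show, start, end] in seconds (assumed schema).
    show_hours: total hours of each show in the data set."""
    host_secs = (intervals["end"] - intervals["start"]).groupby(intervals["show"]).sum()
    pct = 100.0 * host_secs / (show_hours * 3600.0)
    return pct.sort_values(ascending=False).head(25)  # the top 25 "screenhogs"
```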
What is the average age of news presenters?
We obtain birthdates for our list of news presenters from public web sources, and we compute the average age of news presenters on each channel when they are on screen (Supplemental 1.8). From 2010 to 2019, the average age of news presenters rises from 48.2 to 51.0 years (Figure 9). This trend is visible for all three channels, though there are localized reversals that are often marked by the retirements of older, prominent hosts; for example, the average news presenter's age on CNN falls slightly after Larry King's retirement in 2010 at age 76. Across all three channels, female-presenting news presenters are younger on average than their male-presenting counterparts by 6.3 years. However, the gap has narrowed in recent years.
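These per-channel averages are weighted by screen time, so a presenter who is on screen longer contributes proportionally more. A minimal sketch with made-up values:

```python
import numpy as np

# Presenter ages at airing time and their seconds on screen (illustrative values).
ages = np.array([44.2, 51.7, 63.1])
secs = np.array([5200.0, 8100.0, 900.0])

avg_age = np.average(ages, weights=secs)  # = sum(ages * secs) / sum(secs)
```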
Are female-presenting news presenters disproportionately blonde?
We manually annotated the hair color (blonde, brown, black, other) of 145 female-presenting news presenters and computed the screen time of these groups (Supplemental 1.9). We find that blondes account for 64.7% of female-presenting news presenter screen time on FOX (compared to 28.8% for non-blondes). This gives credence to the stereotype that female-presenting news presenters on FOX fit a particular aesthetic that includes blonde hair (advanced, for example, by The Guardian [23]). However, counter to this stereotype, FOX is not alone; the proportion of blondes on CNN (56.6% overall and 58.2% since 2015, compared to 38.6% overall for non-blondes) has risen, and, currently, the chance of seeing a blonde female news presenter is approximately equal on the two networks (Figure 10). The screen time of blonde female news presenters is lower on MSNBC (36.6%), where non-blonde female news presenters account for 55.7%. On MSNBC, brown is the dominant hair color at 40.8%, but 21.4% is due to a single brown-haired host (Rachel Maddow). On all three channels, the percentage of blonde female news presenters far exceeds the natural rate of blondness in the U.S. (≈11% according to the Bureau of Labor Statistics [10]).
Editorial decisions about the images and graphics to include with stories can subtly influence the way viewers understand a story. We examine such editorial choices in the context of the Trayvon Martin shooting.
Which photos of Trayvon Martin and George Zimmerman appeared most often on each channel?
On February 26, 2012, Trayvon Martin, a 17-year-old high-school student, was fatally shot by neighborhood watchman George Zimmerman [13]. Media depictions of both Martin and Zimmerman were scrutinized heavily as the story captured national interest [45, 56]. We identified unique photographs of Martin and Zimmerman in our data set using a K-NN classifier on FaceNet [53] descriptors and tabulated the screen time of these photos (see Supplemental 1.10).

Figure 11 shows the four photos of Martin (top row) and Zimmerman (bottom row) that received the most screen time in the aftermath of the shooting and during Zimmerman's 2013 trial. In the initial week of coverage, all three channels used the same image of Martin (purple). This image generated significant discussion about the "baby-faced" depiction of Martin, although it was dated to a few months before the shooting. In the ensuing weeks (and later during Zimmerman's trial), differences in how the three channels depict Martin emerge. CNN most commonly used a photograph of Martin smiling in a blue hat (blue box). In contrast, the most commonly shown photo on FOX depicts an unsmiling Martin (orange). MSNBC most frequently used the black-and-white image of Martin in a hoodie (pink) that was the symbol for protests in support of Martin and his family. The three different images reflect significant differences in editorial decisions made by the three channels.

Depictions of Zimmerman also evolved with coverage of the shooting and reflect both efforts by channels to use the most up-to-date photos for the story at hand and also the presence of editorial choices. All three channels initially aired the same image of Zimmerman (purple). The photo, depicting Zimmerman in an orange polo shirt, was both out of date and taken from a prior police incident unrelated to the Martin shooting. A more recent photograph of Zimmerman (pink) was made available to news outlets in late March 2012. While CNN and FOX transitioned to using this new photo, which depicts a smiling Zimmerman, a majority of the time, MSNBC continued to give more screen time to the original photo. After mid-April 2012, depictions of Zimmerman on all three channels primarily show him in courtroom appearances as the legal process unfolded.
The amount of coverage that topics receive in the news can influence viewer perceptions of world events and newsworthy stories. As a measure of the frequency with which key topics are discussed, we count the number of times selected words appear in video captions.
Figure 9: The average age of news presenters, weighted by screen time, has increased on all three channels (bold lines). FOX has the highest average age for both male- and female-presenting news presenters.
Figure 10: Blonde female news presenters consistently receive more screen time on FOX than non-blonde female news presenters. CNN catches up to FOX from 2014 onward, while the screen time of blonde female news presenters has risen on MSNBC since 2015. On MSNBC, blonde female news presenters do not receive more screen time than non-blonde female news presenters. Because blonde and non-blonde female news presenters can be on screen at the same time, the lines in (a) and (b) can add to over 100%.
How often are foreign countries mentioned?
Foreign country names, defined in Supplemental 1.11, appear in the captions a total of 4.5M times. Most countries receive little coverage (Figure 12), and the eight countries with the highest number of mentions (Russia, Iran, Syria, Iraq, China, North Korea, Israel, and Afghanistan) account for 51% of all country mentions. Russia alone accounts for 11.2%. (If treated as a country, ISIS would rank 2nd, after Russia, at 8.4%.) Of these eight, five have been in a state of armed conflict in the last decade, while the other three have had major diplomatic rifts with the U.S. These data suggest that military conflict and tense U.S. relations beget coverage. No countries from Africa, South America, or Southeast Asia appear in the top eight; the top countries from these regions are Libya/Egypt (11th/12th), Venezuela (32nd), and Vietnam (25th). Bordering the U.S., Mexico is 9th, frequently appearing due to disputes over immigration and trade, while Canada is 21st.

Mentions of individual countries often peak due to important events. Figure 13 annotates such events for the 15 most often mentioned countries. For example, the Libyan Civil War in 2011, the escalation of the Syrian Civil War in 2012-2013, and the rise of ISIS (in Syria and Iraq) in 2014 correspond to peaks. The countries below the top 10 are otherwise rarely in the news, but the 2011 tsunami and Fukushima Daiichi nuclear disaster; the 2014 annexation of Crimea by Russia; and the Charlie Hebdo shooting and November Paris attacks (both in 2015) elevated Japan, Ukraine, and France to brief prominence. Since the election of Donald Trump in 2016, however, there has been a marked shift in the top countries, corresponding to topics such as Russian election interference, North Korean disarmament talks, the Iran nuclear deal, and the trade war with China.
For how long do channels cover acts of terrorism, mass shootings, and plane crashes?
We enumerated 18 major terrorist attacks (7 in the U.S. and 11 in Europe), 18 mass shootings, and 25 commercial airline crashes in the last decade, and we counted related N-grams such as terror(ism,ist), shoot(ing,er), and plane crash in the weeks following these events (Supplemental 1.12 gives the full lists of terms). Counts for terrorism and shootings return to the pre-event average after about two weeks (Figure 14a-c). Likewise, coverage of plane crashes also declines to pre-crash levels within two weeks (Figure 14d), though there are some notable outliers. Malaysia Airlines Flight 370, which disappeared over the Indian Ocean in 2014, remained in the news for nine weeks, and Malaysia Airlines Flight 17, shot down over Ukraine, also received coverage for four weeks as more details emerged.
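One way to measure how long coverage persists is to align a daily term-count time series to each event date and compare it against a pre-event baseline. A sketch with pandas (the Series layout and window sizes are assumptions, not the paper's implementation):

```python
import pandas as pd

def counts_relative_to_event(daily_counts: pd.Series, event_date: str,
                             days_before: int = 14, days_after: int = 28):
    """daily_counts: date-indexed daily counts of the related terms."""
    t0 = pd.Timestamp(event_date)
    window = daily_counts.loc[t0 - pd.Timedelta(days=days_before):
                              t0 + pd.Timedelta(days=days_after)]
    baseline = daily_counts.loc[:t0 - pd.Timedelta(days=1)].tail(days_before).mean()
    # Re-index by days since the event; compare against the pre-event average.
    return window.set_axis((window.index - t0).days), baseline
```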
Is it illegal or undocumented immigration?

"Illegal immigrant" and "undocumented immigrant" are competing terms that describe individuals who are in the U.S. illegally, with the latter term seen as more politically correct [33]. Figure 15 shows the counts of when variants of these terms are said (Supplemental 1.13 gives the full list of variants). Illegal is used on FOX the most (59K times); FOX also has more mentions of immigration overall. From 2012 onward, undocumented has increased in use on CNN and MSNBC, though illegal still appears equally or more often on these channels than undocumented.

How often are honorifics used to refer to President Trump and Obama?
Honorifics convey respect for a person or office. We compared the number of times that President (Donald) Trump is used to the number of other mentions of Trump's person (e.g., Donald Trump, just Trump). When computing the number of mentions of just Trump, we exclude references to nouns such as the Trump administration and Melania Trump that contain the word Trump but are not referring to Donald Trump (Supplemental 1.14 gives the full list of exclusions).

Our data suggests that although coverage of the incumbent president has increased since the start of Trump's presidency in 2017, the level of formality when referring to the president has fallen. Trump, in general, is mentioned approximately 3× more than Obama on a monthly basis during the periods of their respective presidencies in our data set. The term President Trump only emerges on all three channels following his inauguration to the office in January 2017 (Figure 16a-c). President is used nearly half of the time on CNN and FOX after his inauguration. By contrast, MSNBC continues to most commonly refer to him without President. We plot similar charts of President Obama over the course of his presidency from 2010 to January 2017 (Figure 16d-e) and find that, on all three channels, President is used more often than not.

Figure 11: In early coverage of the shooting of Trayvon Martin (by George Zimmerman), all three channels used the same photos of Martin and Zimmerman. However, as the story progressed, depictions of Martin (top) differed significantly across channels. Depictions of Zimmerman (bottom) also evolved over time, but largely reflect efforts by channels to use the most up-to-date photo of Zimmerman during criminal proceedings.

Figure 12: Some countries receive more attention in U.S. cable TV news than others. Russia is the largest outlier, followed by Iran.
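The exclusion-based counting described above can be approximated with regular expressions over caption text, removing excluded phrases before counting honorific and bare mentions. The exclusion list below shows only a few example entries (the paper's full list is in its Supplemental 1.14), and the helper itself is illustrative:

```python
import re

# Example exclusions only; the paper's full list of non-person "Trump"
# references is given in its Supplemental 1.14.
EXCLUDE = [r"trump administration", r"melania trump", r"trump tower"]

def honorific_counts(caption: str):
    """Return (mentions of 'President (Donald) Trump', bare 'Trump' mentions)."""
    text = caption.lower()
    for pattern in EXCLUDE:               # drop non-person references first
        text = re.sub(pattern, " ", text)
    honorific = len(re.findall(r"\bpresident (?:donald )?trump\b", text))
    total = len(re.findall(r"\btrump\b", text))
    return honorific, total - honorific
```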
People are often associated with specific topics discussed in cable TV news. We analyze the visual association of faces to specific topics by computing how often faces are on screen at the same time that specific words are mentioned. We obtain millisecond-scale time alignments of caption words with the video's audio track using the Gentle word aligner [47] (Supplemental 1.1).
Which words are most likely to be said when women are on screen?
By treating both face detections and words as time intervals, we compute the conditional probability of observing at least one female-presenting (or one male-presenting) face on screen given each word in the caption text (Supplemental 1.15). Because of the gender imbalance in screen time, the conditional probability of a female-presenting face being on screen when any arbitrary word is said is 29.6%, compared to 61.4% for a male-presenting face. We are interested in words where the difference between the female and male conditional probabilities deviates from the baseline 31.9% difference.

Figure 17 shows the top 35 words most associated with male- and female-presenting faces on screen. For female-presenting faces, the top words are about women's health (e.g., breast, pregnant); family (e.g., boyfriend, husband, mom(s), mothers, parenthood, etc.); and female job titles (e.g., actress, congresswoman). Weather-related terms (e.g., temperatures, meteorologist, blizzard, tornadoes) and business news terms (e.g., futures, Nasdaq, stocks, earnings) are also at or near gender parity, and we attribute this to a number of prominent female weatherpersons (Indra Petersons/CNN, Janice Dean/FOX, Maria Molina/FOX) and female business correspondents (Christine Romans/CNN, Alison Kosik/CNN, JJ Ramberg/MSNBC, Stephanie Ruhle/MSNBC, Maria Bartiromo/FOX) across much of the last decade. The top words associated with male-presenting faces on screen are about foreign affairs, terrorism, and conflict (e.g., ISIL, Israelis, Iranians, Saudis, Russians, destroy, treaty); and about fiscal policy (e.g., deficits, trillion, entitlement(s)). The stark difference in the words associated with female-presenting screen time suggests that, over the last decade, the subjects discussed on-air by presenters and guests varied strongly depending on their gender.
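Computing such a conditional probability reduces to interval-stabbing queries: for each utterance time of a word, test whether it falls inside any face interval. A minimal sketch using binary search, with assumed input structures:

```python
import bisect

def prob_face_given_word(word_times, face_intervals):
    """word_times: timestamps (seconds) at which the word is said.
    face_intervals: sorted, non-overlapping (start, end) intervals during
    which at least one female-presenting face is on screen (assumed inputs)."""
    if not word_times:
        return 0.0
    starts = [s for s, _ in face_intervals]
    def on_screen(t):
        i = bisect.bisect_right(starts, t) - 1   # last interval starting <= t
        return i >= 0 and face_intervals[i][1] >= t
    return sum(on_screen(t) for t in word_times) / len(word_times)
```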
Who uses unique words?

We define vocabulary to be "unique" to a person if the probability of that individual being on screen, conditioned on the word being said (at the same time), is high. Table 1 lists all words for which an individual has a greater than 50% chance of being on screen when the word is said. (We limit analysis to words mentioned at least 100 times.) Political opinion show hosts (on FOX and MSNBC) take the most creative liberty in their words, accounting for all but three names in the list.
Which presenters are on screen when the President honorific is said?
A news presenter's use of the President honorific preceding Trump or Obama might set a show's tone for how these leaders are portrayed. When a presenter is on screen, we find that the honorific term President is used a greater percentage of the time for Obama than for Trump, during the periods of their presidencies. On all three channels, most presenters lie below the parity line in Figure 18. However, the average FOX presenter is closer to parity than the average presenter on CNN or MSNBC in uses of President in reference to Trump and Obama (a few FOX presenters lie above the line). Furthermore, Figure 19 shows how the top hosts (by screen time) on each channel are associated with uses of President to refer to Trump over time.
Figure 13: Major peaks in mentions of foreign countries occur around disasters and crises. Since the start of Trump's presidency, there has been an increase in coverage of Russia, China, and North Korea due to increased tensions and a marked shift in U.S. foreign policy (shaded).
Figure 14: Following a major terrorist attack, mass shooting, or plane crash, usage of related terms increases and remains elevated for 2-3 weeks before returning to pre-event levels. A few plane crashes continued to be covered after this period as new details about the crash (or disappearance, in the case of MH370) emerged. In the figure above, lines for individual events are terminated early if another unrelated event of the same category occurs; for example, the San Bernardino shooting (a terrorist attack) in December 2015 occurred three weeks after the November 2015 Paris attacks.
Figure 15: Counts of "illegal immigrant" and "undocumented immigrant" terminology in video captions, by month. Illegal is more common than undocumented on all three channels, but FOX uses it the most. Undocumented only comes into significant use from 2012 onward.
How much was Hillary Clinton's face associated with the word email?

Hillary Clinton's emails were a frequent news topic in 2015 and during the 2016 presidential election due to investigations of the 2012 Benghazi attack and her controversial use of a private email server while serving as U.S. Secretary of State. During this period, Clinton's face was often on screen when these controversies were discussed, visually linking her to the controversy. We compute that, during the period spanning 2015 to 2016, Clinton's face is on screen during 11% of mentions of the word email(s) (Figure 20), a significantly higher percentage than the 1.9% of the time that she is on screen overall. This degree of association is similar across all three channels (Supplemental 2.3.1).
Figure 16: Counts of Trump and Obama peaked in election years (2016 and 2012). After his inauguration, Trump is referred to more often without President than with (MSNBC has the largest gap). In contrast, Obama is referred to with President more often than not. The channel color-coded lines represent the total counts of Trump and Obama, without exclusions such as the Trump administration. Note: most of these counts are captured by the N-grams that we identified as references to Trump's and Obama's persons.
We have developed an interactive, web-based visualization tool (available at https://tvnews.stanford.edu) that enables the general public to perform analyses of the cable TV news data set (Figure 21).

Figure 17: The distribution of words by difference in conditional probability of a female- versus a male-presenting face being on screen (Supplemental 1.15). The 35 words that are most associated with male- and female-presenting screen time are annotated. Note the stark differences in topic representation in the top male- and female-associated words: foreign policy, conflict, and fiscal terms (male); and female health, family, weather, and business news terms (female).
Person | Unique words (Pr[person | word], %)
Bill O'Reilly (FOX) | opine (60.6), reportage (59.0), spout (58.6), urchins (57.9), pinhead[ed,s] (49.0, 51.5, 50.2)
Ed Schultz (MSNBC) | classers (71.2), beckster (61.6), drugster (59.9), righties (55.2), trenders (60.8), psychotalk (54.2)
Tucker Carlson (FOX) | pomposity (76.2), smugness (71.5), groupthink (70.5)
Sean Hannity (FOX) | abusively (76.1), Obamamania (53.3)
Glenn Beck (FOX) | Bernays (82.3), Weimar (62.2)
Rachel Maddow (MSNBC) | [bull]pucky (47.9, 50.7), debunktion (51.4)
Chris Matthews (MSNBC) | rushbo (50.5)
Kevin McCarthy (politician) | untrustable (75.9)
Chris Coons (politician) | Delawareans (63.8)
Hillary Clinton (politician) | generalistic (56.5)

Table 1: Unique words are often euphemisms or insults (urchins ≡ children, beckster ≡ Glenn Beck, drugster/rushbo ≡ Rush Limbaugh, righties ≡ conservatives, etc.). Others are the names of show segments or slogans. For example, Psychotalk is a segment of the Ed Show; Sean Hannity refers to the liberal media as Obamamania media; and Tucker Carlson brands his own show as the "sworn enemy" of lying, pomposity, smugness, and groupthink. Some rare words become unique due to being replayed often on the news; for example, Kevin McCarthy (U.S. representative) calls Hillary Clinton untrustable, and Hillary Clinton uses generalistic in the same sentence as her infamous statement characterizing Trump's supporters as a "basket of deplorables".

Although this paper has focused on a static slice of data from January 2010 to July 2019, our public tool ingests new video daily and can be used to investigate coverage of contemporary topics (Figure 22). Our design, inspired by the Google N-gram Viewer [42], generates time-series line charts of the amount of cable TV news video (aggregate time) matching user-specified queries.
Figure 18: Percentage of mentions that use the president honorific for Trump (after his inauguration on January 20, 2017) and Obama (before January 20, 2017) by each news presenter (dots). A majority of presenters on all three channels use president a higher fraction of the time when mentioning Obama than they do with Trump. The presenters with the highest screen time on each channel are annotated.

Queries may consist of one or more filters that select intervals of time when a specific individual appears on screen (name="..."), an on-screen face has a specific presented gender (tag="male"), a keyword or phrase appears in the video captions (text="..."), or the videos come from a particular channel (channel="CNN"), program, or time of day.

To construct more complex analyses, the tool supports queries containing conjunctions and disjunctions of filters, which serve to intersect or union the video time intervals matched by individual filters (name="Hillary Clinton" AND text="email" AND channel="FOX"). We implemented a custom in-memory query processing system to execute screen time aggregation queries over the entire cable TV news data set while maintaining interactive response times for the user. In addition to generating time-series plots of video time, the tool enables users to directly view the underlying video clips (and their associated captions) that match queries by clicking on the chart.

A major challenge when developing this tool was making an easy-to-use, broadly accessible data analysis interface, while still exposing sufficient functionality to support a wide range of analyses of who and what appears on cable TV news. We call out three design decisions made during tool development.

(1) Limit visualization to time-series plots.
Time-series analysis is a powerful way to discover and observe patterns over the decade spanned by the cable TV news data set. While time-series analysis does not encompass the full breadth of analyses presented in this paper, we chose to focus the visualization tool's design on the creation of time-series plots to encourage and simplify this important form of analysis.

(2) Use screen time as a metric.
We constrain all queries, regardless of whether visual filters or caption text filters are used, to generate counts of a single metric: the amount of screen time matching the query. While alternative metrics, such as using word counts to analyze caption text (section 4) or counts of distinct individuals to understand who appears on a show, may be preferred for certain analyses, we chose screen time because it is well suited to many analyses focused on understanding representation in the news. For example, a count of a face's screen time directly reflects the chance a viewer will see a face when turning on cable TV news.
Figure 19: The percentage of time when the president honorific is said for Trump while a news presenter is on screen increases after Trump's inauguration (the top five presenters for each channel are shown). Chris Cuomo (CNN) drops from over 40% to under 20% in June 2018 with his transition from hosting New Day to Cuomo Primetime. Sean Hannity's (FOX) decline is more gradual over the course of Trump's presidency. From 2017 onward, Wolf Blitzer (CNN) is consistently above the other top hosts on any of the three channels (averaging 72%).
Figure 20: Hillary Clinton is on screen up to 33% of the time when email(s) is mentioned (11% on average from 2015 to 2016). This is significantly higher than the percentage of time that Clinton is on screen when any arbitrary word is said (1.9% on average in the same time period).
Also, word counts can be converted into screen time intervals by attributing each instance of a word, regardless of its actual temporal extent, to a fixed interval of time (textwindow="..."). As a result, our tool can be used to perform comparisons of word counts as well.

Our decision to make all filters select temporal extents simplified the query system interface. All filters result in a selection of time intervals, allowing filters to be arbitrarily composed in queries that combine information from face identity labels and captions. A system where some filters yielded word counts and others yielded time intervals would complicate the user experience, as it introduces the notion of different data types into queries.
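The conjunction of filters then reduces to a standard sorted-interval intersection, sketched below (an illustration of the interval algebra, not the tool's actual query engine):

```python
def intersect(a, b):
    """Intersect two sorted, non-overlapping lists of (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:            # the overlapping portion survives
            out.append((start, end))
        if a[i][1] < b[j][1]:      # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return out

# Aggregate screen time (seconds) for a conjunction such as
# name="Hillary Clinton" AND text="email":
# sum(e - s for s, e in intersect(clinton_intervals, email_intervals))
```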
(3) Facilitate inspection of source video clips.

We found it important for the visualization tool to support user inspection of the source video clips that match a query (Figure 21). Video clip inspection allows a user to observe the context in which a face or word appears in a video. This context in turn is helpful for understanding why a clip was included in a query result, which facilitates deeper understanding of the trends being investigated, aids the process of debugging and refining queries, and helps a user assess the accuracy of the automatically generated video labels relied on by a query.
Analysis of user-generated charts
We released the tool on August 19, 2020 and began analyzing user behavior from August 27, 2020 onward. As of September 10, 2020, we have logged 2.6K unique users (based on IP addresses, excluding the authors), who have, on average, created 1.6 new charts containing one or more queries. We provide a FAQ and example queries, and these account for 12% of user-generated charts, while a further 36% are modifications to our examples. Of user-generated charts, 43% plot the screen time of public figures, 3.7% plot screen time by gender, and 59% plot caption text searches (6.7% are multimodal, with both faces and text). Excluding names featured in our examples (e.g., Joe Biden, Donald Trump, Hillary Clinton, Kamala Harris, Elizabeth Warren), the most-queried individuals are Bernie Sanders, Nancy Pelosi, Barack Obama, and Mike Pence, along with several other 2020 Democratic presidential candidates. Underscoring the value of timely data, users show interest in current events; many common words are related to political polarization (e.g., QAnon, antifa, postal service), COVID-19 (e.g., mask(s)), civil unrest (e.g., George Floyd, protest(s), looting), the economy (e.g., economy, (un)employment), and technology (e.g., AI, computer science). We hope that allowing the public to analyze such content will improve media understanding.

Annotating video using machine learning techniques enables analysis at scale, but it also presents challenges due to the limitations of automated methods. Most importantly, the labels generated by computational models have errors, and understanding the prevalence and nature of labeling errors (including forms of bias [49]) is important to building trust in analysis results. Labeling errors also have the potential to harm individuals that appear in cable TV news, in particular when related to gender or race [9, 16, 26]. As a step toward understanding the accuracy of labels, we validated the output of our face and commercial detection, presented gender estimation, and person identification models (for a small subset of individuals) against human-provided labels on a small collection of frames. The details of this validation process and the measured accuracies of the models are provided in the supplemental material.

Despite errors in our computational labeling methods at the individual level, aggregate data about gender representation over time on cable TV news is useful for understanding gender disparities. Many questions about representation in cable TV news media similarly concern the subject of race, but we are unaware of any computational model that can accurately estimate an individual's race from their appearance (models we have seen have much lower accuracy than models for estimating presented gender). However, it may be possible to automatically determine the race of individuals for whom we have an identity label by using external data sources to obtain the individual's self-reported race. A similar procedure could also be used to obtain the self-identified gender of an individual, reducing our reliance on estimating presented gender from appearance. Such approaches could further improve our understanding of race and gender in cable TV news.

Figure 21: Our interactive visualization tool supports time-series analysis of the cable TV news data set. (Left) Users define queries using a combination of face, caption text, and video metadata filters. The tool generates time-series plots of the total amount of video (aggregate screen time) matching these queries. (Right) To provide more context for the segments of video included in the chart, users can click on the chart to bring up the videos matching the query. We have found that providing direct access to the videos is often essential for debugging queries and better understanding the relevant video clips.
Figure 22: Our tool updates daily with new data and can be used to study contemporary issues. (Left) The amounts of time since December 1, 2019 when the words COVID-19 (and its synonyms) and variants of the root word PROTEST are said on each channel, treating each utterance as a 1s interval. The virus first comes to attention on national cable TV on January 17, 2020 and peaks on March 12. There is a sharp dip in COVID-19 (concurrent with a spike in PROTEST) on May 29, when nationwide Black Lives Matter protests following George Floyd's killing took over the headlines. From mid-June onward, COVID-19 coverage rose again; however, the time on FOX is only half that of CNN. (Right) New York governor Andrew Cuomo (blue) rose to prominence in March-May, but has since disappeared from cable TV, while Dr. Anthony S. Fauci (purple) has seen a resurgence in screen time since June.

Our system lacks mechanisms for automatically differentiating different types of face appearances. For example, an individual's face may be on screen because they are included in infographics, directly appearing on the program (as a contributor or guest), or shown in B-roll footage. The ability to differentiate these cases would enable new analyses of how the news covers individuals. Likewise, while our query system can determine when a specific individual's face is on screen when a word is spoken, it does not perform automatic speaker identification. As a result, the on-screen face may not be speaking – e.g., when a news presenter delivers narration over silent B-roll footage. Extending our system to perform automatic speaker identification [20] would allow it to directly support questions about the speaking time of individuals in news programs or about which individuals spoke about what stories.

We believe that adding the ability to identify duplicate clips in the data set would prove to be useful in future analyses. For example, duplicate clips can signal the re-airing of programs or the replaying of popular sound bites. We would also like to connect analyses with additional data sources such as political candidate polling statistics [22] as well as the number and demographics of viewers [46]. Joining in this data would enable analysis of how cable TV news impacts politics and viewers more generally. Finally, we are working with several TV news organizations to deploy private versions of our tool on their internal video archives.

Prior work in HCI and CSCW has investigated the "information environments" [36] created by technologies such as search engines [34], social media feeds [12], and online news [18]. By determining what information is easily accessible to users, these systems affect people's beliefs about the world. For example, Kay et al. [36] showed that gender imbalance in image search results can reinforce gendered stereotypes about occupations. Common methods used include algorithmic audits [60], mixed-method studies of online disinformation campaigns [59], and user studies that gauge how algorithms and UI design choices affect user perceptions [21, 58]. While topics such as misinformation spread via social media and online news have become a popular area of research in this space, television remains the dominant news format in the U.S. [4]. Therefore, analysis of traditional media such as cable TV is necessary to characterize the information environments that users navigate.
Manual analysis of news and media.
There have been many efforts to study trends in media presentation, ranging from analysis of video editing choices [6, 8, 17, 31], coverage of political candidates [38], prevalence of segment formats (e.g., interviews [14]), and representation by race and gender [7, 28, 52, 61]. These efforts rely on manual annotation of media, which limits analysis to small amounts of video (e.g., a few hundred hours or sound bites [8, 31], five Sunday morning news shows in 2015 [52]) or even to anecdotal observations of a single journalist [15, 45]. The high cost of manual annotation makes studies at scale rare. The BBC 50:50 Project [7], which audits gender representation in news media, depends on self-reporting from newsrooms across the world. GMMP [28] relies on a global network of hundreds of volunteers to compile a report on gender representation every five years. While automated techniques cannot generate the same variety of labels as human annotators (GMMP requires a volunteer to fill out a three-page form for each story they annotate [28]), annotation at scale using computational techniques stands to complement these manual efforts.
Automated analysis of media.
Our work was heavily inspired by the Google N-gram viewer [42] and Google Trends [29], which demonstrate that automated computational analysis of word frequency, when performed at scale (on centuries of digitized books or the world's internet search queries), can serve as a valuable tool for studying trends in culture. These projects allow the general public to conduct analyses by creating simple time-series visualizations of word frequencies. We view our work as bringing these ideas to cable TV news video.
Our system is similar to the GDELT AI Television Explorer [24], which provides a web-based query interface for caption text and on-screen chyron text in the Internet Archive's cable TV news data set, and recently added support for queries for objects appearing on screen. Our work analyzes nearly the same corpus of source video, but, unlike GDELT, we label the video with information about the faces on screen. We believe that information about who is on screen is particularly important in many analyses of cable TV news media, such as those in this paper.
There is growing interest in using automated computational analysis of text, images, and videos to facilitate understanding of trends in media and the world. This includes mining print news and social media to predict civil unrest [44, 51] and forced population migration [57]; using facial recognition on TV video streams to build connectivity graphs between politicians [50]; using gender classification to quantify the lack of female representation in Hollywood films [25]; understanding presentation style and motion in "TED talk" videos [62, 64]; identifying trends in fashion [27, 40] from internet images; and highlighting visual attributes of cities [5, 19]. These prior works address socially meaningful questions in other domains but put forward techniques that may also be of interest in our cable TV data set.
Finally, time-series visualizations of word and document frequencies are commonly used to show changes in patterns of cultural production [48]. We draw inspiration from advocates of "distant reading," who make use of these visual representations to allow for insights that are impossible from manual inspection of document collections [43].
Alternative approaches for video analysis queries.
A wide variety of systems exist for interactive video analysis, and existing work in interaction design has presented other potential approaches to formulating queries over video data sets. Video Lens [39] demonstrates interactive filtering using brushing and linking to filter complex spatio-temporal events in baseball video. The query-by-example approach [67] has been used in the image [3, 11, 63, 65] and sports [54, 55] domains. These example-based techniques are less applicable for our visualization tool, which focuses on letting users analyze who and what is in cable TV news; typing a person's name or keywords in the caption is often easier for users than specifying these attributes by example. Other works from Höferlin et al. [35] and Meghdadi et al. [41] propose interactive methods to cluster and visualize object trajectories to identify rare events of interest in surveillance video. Analyzing motion-based events (e.g., hand gestures) in TV news is an area of future work.
CONCLUSION
We have conducted a quantitative analysis of nearly a decade of U.S. cable TV news video. Our results demonstrate that automatically-generated video annotations, such as annotations for when faces are on screen and when words appear in captions, can facilitate analyses at scale that provide unique insight into trends in who and what appears in cable TV news. To make analysis of our data set accessible to the general public, we have created an interactive screen time visualization tool that allows users to describe video selection queries and generate time-series plots of screen time. We hope that by making this tool publicly available, we will encourage further analysis and research into the presentation of this important form of news media.

ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation (NSF) under IIS-1539069 and III-1908727. This work was also supported by financial and computing gifts from the Brown Institute for Media Innovation, Intel Corporation, Google Cloud, and Amazon Web Services. We thank the Internet Archive for providing their data set for academic use. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.
REFERENCES
[1] 2020. Amazon Rekognition - Automate your image and video analysis with machine learning. https://aws.amazon.com/rekognition.
[2] 2020. Internet Archive: TV News Archive. https://archive.org/details/tv.
[3] Adobe. 2020. Concept Canvas. https://research.adobe.com/news/concept-canvas/
[4] Jennifer Allen, Baird Howland, Markus Mobius, David Rothschild, and Duncan J. Watts. 2020. Evaluating the fake news problem at the scale of the information ecosystem. Science Advances 6, 14 (2020). https://doi.org/10.1126/sciadv.aay3539
[5] Sean M Arietta, Alexei A Efros, Ravi Ramamoorthi, and Maneesh Agrawala. 2014. City forensics: Using visual elements to predict non-visual city attributes. IEEE Transactions on Visualization and Computer Graphics 20, 12 (2014), 2624–2633.
[6] Kevin G. Barnhurst and Catherine A. Steele. 1997. Image-Bite News: The Visual Coverage of Elections on U.S. Television, 1968-1992. Harvard International Journal of Press/Politics.
Journal of Communication 57, 4 (2007), 652–675. https://doi.org/10.1111/j.1460-2466.2007.00362.x
[9] Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency. 77–91.
[10] Bureau of Labor Statistics, U.S. Department of Labor. 2019. National Longitudinal Survey of Youth 1979 cohort, 1979-2016 (rounds 1-27).
[11] Yang Cao, Hai Wang, Changhu Wang, Liqing Zhang, Lei Zhang, and Zhiwei Li. 2010. MindFinder: Interactive Sketch-based Image Search on Millions of Images. In ACM Multimedia 2010 International Conference.
Proc. ACM Hum.-Comput. Interact.
European Journal of Communication 4, 4 (1989), 435–451. https://doi.org/10.1177/0267323189004004005
[15] Nicholas Confessore and Karen Yourish. 2016. $2 Billion Worth of Free Media for Donald Trump. The New York Times.
i-Perception 2, 6 (2011), 569–576. https://doi.org/10.1068/i0441aap PMID: 23145246.
[18] Nick Diakopoulos. 2019. Automating The News - How Algorithms Are Rewriting the Media (1 ed.). Harvard University Press, Cambridge, MA, USA.
[19] Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. 2012. What Makes Paris Look Like Paris? ACM Trans. Graph. 31, 4, Article 101 (July 2012), 9 pages.
[20] Ariel Ephrat, Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, William T. Freeman, and Michael Rubinstein. 2018. Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation. ACM Trans. Graph. 37, 4, Article 112 (July 2018), 11 pages. https://doi.org/10.1145/3197517.3201357
[21] Motahhare Eslami, Aimee Rickman, Kristen Vaccaro, Amirhossein Aleyasen, Andy Vuong, Karrie Karahalios, Kevin Hamilton, and Christian Sandvig. 2015. "I Always Assumed That I Wasn't Really That Close to [Her]": Reasoning about Invisible Algorithms in News Feeds. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). ACM, New York, NY, USA, 153–162. https://doi.org/10.1145/2702123.2702556
[22] FiveThirtyEight. 2020. Latest Polls. https://projects.fivethirtyeight.com/polls/.
[23] Hadley Freeman. 2017. Why do all the women on Fox News look and dress alike? Republicans prefer blondes. The Guardian.
International Conference on Learning Representations. https://openreview.net/forum?id=Bygh9j09KX
[27] S. Ginosar, K. Rakelly, S. M. Sachs, B. Yin, C. Lee, P. Krähenbühl, and A. A. Efros. 2017. A Century of Portraits: A Visual Historical Record of American High School Yearbooks. IEEE Transactions on Computational Imaging.
Journal of Communication 42, 2 (1992), 5–24. https://doi.org/10.1111/j.1460-2466.1992.tb00775.x
[32] Foad Hamidi, Morgan Klaus Scheuerman, and Stacy M. Branham. 2018. Gender Recognition or Gender Reductionism? The Social Implications of Embedded Gender Recognition Systems. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). ACM, New York, NY, USA, Article 8, 13 pages. https://doi.org/10.1145/3173574.3173582
[33] Stephen Hiltner. 2017. Illegal, Undocumented, Unauthorized: The Terms of Immigration Reporting. The New York Times.
Proc. ACM Hum.-Comput. Interact. 4, CSCW1, Article 048 (May 2020), 27 pages. https://doi.org/10.1145/3392854
[35] M. Höferlin, B. Höferlin, G. Heidemann, and D. Weiskopf. 2013. Interactive Schematic Summaries for Faceted Exploration of Surveillance Video. IEEE Transactions on Multimedia 15, 4 (June 2013), 908–920. https://doi.org/10.1109/TMM.2013.2238521
[36] Matthew Kay, Cynthia Matuszek, and Sean A. Munson. 2015. Unequal Representation and Gender Stereotypes in Image Search Results for Occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). ACM, New York, NY, USA, 3819–3828. https://doi.org/10.1145/2702123.2702520
[37] Os Keyes. 2018. The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition. Proc. ACM Hum.-Comput. Interact. 2, CSCW, Article 88 (Nov. 2018), 22 pages. https://doi.org/10.1145/3274357
[38] Ven-hwei Lo, Pu-tsung King, Ching-ho Chen, and Hwei-lin Huang. 1996. Political bias in the news coverage of Taiwan's first presidential election: A comparative analysis of broadcast TV and cable TV news. Asian Journal of Communication 6, 2 (1996), 43–64. https://doi.org/10.1080/01292989609364743
[39] Justin Matejka, Tovi Grossman, and George Fitzmaurice. 2014. Video Lens: Rapid Playback and Exploration of Large Video Collections and Associated Metadata. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology (UIST '14). ACM, New York, NY, USA, 541–550. https://doi.org/10.1145/2642918.2647366
[40] Kevin Matzen, Kavita Bala, and Noah Snavely. 2017. StreetStyle: Exploring world-wide clothing styles from millions of photos. arXiv preprint arXiv:1706.01869.
IEEE Transactions on Visualization and Computer Graphics 19, 12 (Dec 2013), 2119–2128. https://doi.org/10.1109/TVCG.2013.168
[42] Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. 2011. Quantitative Analysis of Culture Using Millions of Digitized Books. Science.
Graphs, Maps, Trees: Abstract Models for a Literary History. Verso, New York, NY, USA.
[44] Sathappan Muthiah, Patrick Butler, Rupinder Paul Khandpur, Parang Saraf, Nathan Self, Alla Rozovskaya, Liang Zhao, Jose Cadena, Chang-Tien Lu, Anil Vullikanti, et al. 2016. EMBERS at 4 Years: Experiences Operating an Open Source Indicators Forecasting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16).
Proceedings of the International Joint Conference on Natural Language Processing (Nagoya, Japan). Association for Computational Linguistics, 347–355.
[49] Inioluwa Deborah Raji, Timnit Gebru, Margaret Mitchell, Joy Buolamwini, Joonseok Lee, and Emily Denton. 2020. Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES '20). ACM, New York, NY, USA, 145–151. https://doi.org/10.1145/3375627.3375820
[50] B. Renoust, D. Le, and S. Satoh. 2016. Visual Analytics of Political Networks From Face-Tracking of News Video. IEEE Transactions on Multimedia 18, 11 (Nov 2016), 2184–2195. https://doi.org/10.1109/TMM.2016.2614224
[51] Parang Saraf and Naren Ramakrishnan. 2016. EMBERS AutoGSR: Automated Coding of Civil Unrest Events. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16).
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 815–823.
[54] Long Sha, Patrick Lucey, Yisong Yue, Peter Carr, Charlie Rohlf, and Iain Matthews. 2016. Chalkboarding: A new spatiotemporal query paradigm for sports play retrieval. In Proceedings of the 21st International Conference on Intelligent User Interfaces. ACM, 336–347.
[55] Long Sha, Patrick Lucey, Stephan Zheng, Taehwan Kim, Yisong Yue, and Sridha Sridharan. 2017. Fine-grained retrieval of sports plays using tree-based alignment of trajectories. arXiv preprint arXiv:1710.02255.
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '19). ACM, New York, NY, USA, 1975–1983. https://doi.org/10.1145/3292500.3330774
[58] Brendan Spillane, Isla Hoe, Mike Brady, Vincent Wade, and Séamus Lawless. 2020. Tabloidization versus Credibility: Short Term Gain for Long Term Pain. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20). ACM, New York, NY, USA, 1–15. https://doi.org/10.1145/3313831.3376388
[59] Kate Starbird, Ahmer Arif, and Tom Wilson. 2019. Disinformation as Collaborative Work: Surfacing the Participatory Nature of Strategic Information Operations. Proc. ACM Hum.-Comput. Interact. 3, CSCW, Article 127 (Nov. 2019), 26 pages. https://doi.org/10.1145/3359229
[60] Daniel Trielli and Nicholas Diakopoulos. 2019. Search as News Curator: The Role of Google in Shaping Attention to News Information. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). ACM, New York, NY, USA, 1–15. https://doi.org/10.1145/3290605.3300683
[61] Women's Media Center. 2017. The Status of Women in the U.S. Media.
[62] Aoyu Wu and Huamin Qu. 2018. Multimodal Analysis of Video Collections: Visual Exploration of Presentation Techniques in TED Talks. IEEE Transactions on Visualization and Computer Graphics (2018), 1–1. https://doi.org/10.1109/TVCG.2018.2889081
[63] Fan Yang, Ajinkya Kale, Yury Bubnov, Leon Stein, Qiaosong Wang, Hadi Kiapour, and Robinson Piramuthu. 2017. Visual Search at EBay. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '17). ACM, New York, NY, USA, 2101–2110. https://doi.org/10.1145/3097983.3098162
[64] H. Zeng, X. Wang, A. Wu, Y. Wang, Q. Li, A. Endert, and H. Qu. 2020. EmoCo: Visual Analysis of Emotion Coherence in Presentation Videos. IEEE Transactions on Visualization and Computer Graphics 26, 1 (Jan 2020), 927–937. https://doi.org/10.1109/TVCG.2019.2934656
[65] Andrew Zhai, Hao-Yu Wu, Eric Tzeng, Dong Huk Park, and Charles Rosenberg. 2019. Learning a Unified Embedding for Visual Search at Pinterest. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '19). ACM, New York, NY, USA, 2412–2420. https://doi.org/10.1145/3292500.3330739
[66] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. 2016. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters 23, 10 (2016), 1499–1503.
[67] Moshé M. Zloof. 1975. Query by Example. In Proceedings of the 1975 National Computer Conference and Exposition (AFIPS '75). ACM, New York, NY, USA, 431–438. https://doi.org/10.1145/1499949.1500034

Supplemental Material: Detailed Methodology and Additional Analyses
James Hong, Will Crichton, Haotian Zhang, Daniel Y. Fu, Jacob Ritchie, Jeremy Barenholtz, Ben Hannel, Xinwei Yao, Michaela Murray, Geraldine Moriba, Maneesh Agrawala, Kayvon Fatahalian
Stanford University
S1 THE DATA SET AND PROCESSING
Our static data set consists of 244,038 hours of video, audio, and captions recorded by the Internet Archive's TV News Archive [2] from January 1, 2010 to July 23, 2019. It is segmented into 215,771 videos, organized by the date/time of airing and the name of the show. The data set requires 114 terabytes to store, encoded in standard definition (640×360 to 858×480).

S1.1 Captions and time alignment
Closed captions are available from the Internet Archive. The captions are all upper case for the majority of news programming and contain 2.49 billion text tokens, of which 1.94 million are unique (average token length is 3.82 characters). Not all tokens are words (they include punctuation, numbers, misspellings, etc.), however. By randomly sampling the set of unique tokens, we estimate that there are 141K unique English words in the data set (±31K at 95% confidence).
We use the Gentle word aligner [9] to perform sub-second alignment of words in a video's captions to the video's audio track, assigning each token a starting and ending time. (The source captions are only coarsely aligned to the video.) Alignment is considered successful if alignments are found for 80% of the words in the captions. By this metric, we are able to align captions for 92.4% of the videos. The primary causes of failure for caption alignment are truncated captions or instances where the captions do not match the audio content (e.g., due to being attributed to the wrong video). The average speaking time for a single word after alignment is 219 ms.
While the captions are generally faithful to the words being spoken, we observe occasional differences between the captions and the audio track. For example, captions are missing when multiple individuals are speaking (interrupting or talking over each other). The spelling in the captions also sometimes does not reflect the standard English spelling of a word; email appears as e mail and Obamacare appears as Obama Care. When analyzing these topics in the paper, we account for these spelling/segmentation variants.
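To make the 80% criterion concrete, here is a minimal sketch (not our production pipeline) that assumes a hypothetical per-video list of Gentle word alignments, where words Gentle could not align carry no timestamps:

    # Sketch: decide whether a video's caption alignment "succeeded" under
    # the 80% criterion described above. `aligned_words` is a hypothetical
    # list of (token, start_sec, end_sec) tuples, with (token, None, None)
    # for words that could not be aligned.
    from typing import List, Optional, Tuple

    Word = Tuple[str, Optional[float], Optional[float]]

    def alignment_succeeded(aligned_words: List[Word], threshold: float = 0.8) -> bool:
        if not aligned_words:
            return False
        n_aligned = sum(1 for _, start, end in aligned_words
                        if start is not None and end is not None)
        return n_aligned / len(aligned_words) >= threshold

    # Example: 4 of 5 words aligned -> 0.8 >= 0.8, so the video counts as aligned.
    example = [("THE", 0.0, 0.1), ("NEWS", 0.1, 0.5), ("AT", None, None),
               ("NINE", 0.8, 1.1), ("TONIGHT", 1.1, 1.6)]
    assert alignment_succeeded(example)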
S1.2 Commercial detection
We observe that commercial segments in the data set are often bracketed by black frames, have captions in mixed/lower case (as opposed to all uppercase for news content), or are missing caption text entirely. Commercials also do not contain » delimiters (for speaker changes). Using these features, we developed a heuristic algorithm that scans videos for sequences of black frames (which typically indicate the start and end of commercials) and for video segments where caption text is either missing or mixed/lower case. The algorithm is written using Rekall [6], an API for complex event detection in video, and is shown in Figure 1. To validate our commercial detection algorithm, we hand annotated 225 hours of videos with 61.8 hours of commercials. The overall precision and recall of our detector on this annotated data set are 93.0% and 96.8%, respectively.
Note: we are unable to detect commercials in 9,796 hours of video (2,713 CNN, 4,614 FOX, and 2,469 MSNBC) because the captions from those videos are unavailable due to failed alignment or missing from the Internet Archive [2].

Table 1: Face detector precision and recall for all faces (in 250 randomly sampled frames per year). (Columns: Year, Precision, Recall, Error (frame level).)
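The validation numbers above can be reproduced, given interval annotations, with a simple temporal-overlap computation. A sketch, assuming detections and ground truth are hypothetical lists of (start, end) times in seconds and that intervals within each list do not overlap one another:

    # Sketch: temporal precision/recall for commercial detection.
    # Precision = overlap time / detected time; recall = overlap time /
    # annotated time.
    def total_seconds(intervals):
        return sum(end - start for start, end in intervals)

    def overlap_seconds(detected, annotated):
        total = 0.0
        for d0, d1 in detected:
            for a0, a1 in annotated:
                total += max(0.0, min(d1, a1) - max(d0, a0))
        return total

    def precision_recall(detected, annotated):
        ov = overlap_seconds(detected, annotated)
        return ov / total_seconds(detected), ov / total_seconds(annotated)

    # Example: one detection slightly overshooting a true commercial break.
    p, r = precision_recall([(100.0, 220.0)], [(110.0, 220.0)])  # p ~0.92, r = 1.0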
S1.3 Face detection
We use MTCNN [13] to detect faces in a subset of frames uniformly spaced by three seconds in a video. (Performing face detection on all frames is cost prohibitive.) Three seconds is on the order of 2x the average shot length.

    caption_words = rekall.ingest(captions, 1D)
    histograms = rekall.ingest(database.table("hists"), 1D)
    entire_video = rekall.ingest(database.table("video"), 3D)

    captions_with_arrows = caption_words
        .filter(word: '>>' in word)

    black_frame_segs = histograms
        .filter(i: i.histogram.avg() < 0.01)
        .coalesce(predicate = time_gap < 0.1s, merge = time_span)
        .filter(i: i["t2"] - i["t1"] > 0.5s)

    candidate_segs = entire_video.minus(black_frame_segs)

    non_commercial_segs = candidate_segs
        .filter_against(captions_with_arrows, predicate = time_overlaps)

    commercial_segs = entire_video
        .minus(non_commercial_segs.union(black_frame_segs))

    commercials = commercial_segs
        .coalesce(predicate = time_overlaps, merge = time_span)
        .filter(i: i["t2"] - i["t1"] > 10s)

    lower_case_word_segs = caption_words
        .filter(word: word.is_lowercase())
        .coalesce(predicate = time_gap < 5s, merge = time_span)

    no_captions_segs = entire_video
        .minus(caption_words)
        .filter(i: 30 < i["t2"] - i["t1"] < 270)

    commercials = commercials
        .union(lower_case_word_segs)
        .union(no_captions_segs)
        .coalesce(predicate = time_gap < 45s, merge = time_span)
        .filter(comm: comm["t2"] - comm["t1"] < 300s)

Figure 1: The Rekall [6] query for detecting commercials in a video.
To estimate the accuracy of face detection, we manually counted the actual number of faces and the number of errors (false positives/negatives) made by the MTCNN [13] face detector in 250 randomly sampled frames from each year of the data (Table 1). Overall precision is high (see Table 1).

S1.4 Gender classification
We trained a binary K-NN classifier using the FaceNet [11] descriptors. For training data, we manually annotated the presented binary gender of 12,669 faces selected at random from the data set. On 6,000 independently sampled validation examples, the classifier has 97.2% agreement with human annotators. Table 2 shows the confusion matrix and class imbalance between male-presenting faces and female-presenting faces.
Imbalances in the error behavior of the K-NN model can influence the results of an analysis (e.g., recall for females, 93.8%, is lower than for males, 98.8%). At present, we do not adjust for these imbalances in the paper. One extension to our analyses would be to incorporate these error predictions into the reported findings. For example, we detected 72.5M female-presenting and 178.4M male-presenting faces in all of the news content (28.9% of faces are female-presenting). Adjusting based on the error rates in Table 2, we would expect 5.0M females to be mislabeled as males and 2.0M males to be mislabeled as females, resulting in an expected 76.5M female faces and 175.4M male faces. This shifts the percentage of female faces to 30.4%. Similar adjustments to other analyses where data is analyzed across time or slices (e.g., channel, show, video segments when "Obama is on screen") can be devised, subject to assumptions about the uniformity and independence of model error rates with respect to slices of the data set, or the availability of additional validation data to compute fine-grained error estimates. We do not, however, know of closed-form solutions that are consistently applicable to all of our analyses. These extensions are considered future work, and we focus on salient differences in the paper, the accuracy statistics reported here in S1, and on careful spot-checking of the results (e.g., using the interactive tool) when model accuracies are concerned.
In randomly sampling 6,000 faces for validation (4,109 labeled male by human annotators and 1,891 female), we can estimate that female-presenting individuals comprise 31.5% (±...).
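A minimal sketch of the adjustment arithmetic above; the counts come from the text and Table 2, while the per-label disagreement rates are derived here for illustration:

    # Sketch of the error-rate adjustment described above. From the Table 2
    # confusion matrix we estimate, for each model-assigned label, the
    # fraction of faces whose human label disagrees, then move that mass
    # between classes.
    male_as_male, male_as_female = 4_058, 51        # human-labeled male faces
    female_as_male, female_as_female = 118, 1_773   # human-labeled female faces

    detected_male_M = 178.4    # millions of model-labeled male-presenting faces
    detected_female_M = 72.5   # millions of model-labeled female-presenting faces

    # Of faces the model labels "male", what fraction are actually female
    # (and vice versa)?
    p_female_given_male_label = female_as_male / (male_as_male + female_as_male)
    p_male_given_female_label = male_as_female / (male_as_female + female_as_female)

    mislabeled_females_M = detected_male_M * p_female_given_male_label   # ~5.0M
    mislabeled_males_M = detected_female_M * p_male_given_female_label   # ~2.0M

    adjusted_female_M = detected_female_M - mislabeled_males_M + mislabeled_females_M
    adjusted_male_M = detected_male_M - mislabeled_females_M + mislabeled_males_M
    print(adjusted_female_M / (adjusted_female_M + adjusted_male_M))  # ~0.30, up from 0.289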
S1.5 Identifying public figures
To identify individuals, we use the Amazon Rekognition Celebrity Recognition API [1]. This API identifies 46.2% of the faces in the data set. To reduce flickering (where a portion of instances of an individual in a video are missed by Amazon), we propagate these face detections to an additional 10.7% of faces using a conservative L2 distance metric threshold between the FaceNet [11] descriptors of identified and unidentified faces within the same video.

                      K-NN model labels
    Human labels      Male       Female
    Male              4,058      51
    Female            118        1,773
Table 2: Presented gender confusion matrix between K-NN model generated labels and human labels. The estimated precision and recall for the male-presenting class are 97.2% and 98.8%, and for the female-presenting class 97.2% and 93.8%, respectively.
Table 3: Amazon Rekognition Celebrity Recognition [1] returns facial identity predictions for 162,307 distinct names in our data set. We noticed that the majority of uncommon names (individuals with less than 10 hours of screen time) predicted by Amazon are "doppelgangers" of the people who are actually in the news content (false positives). These doppelgangers include a large number of foreign musicians, sports players, and actors/actresses. To evaluate the effect of these errors, we randomly sampled 25 individuals (by name) from each screen time range and visually validated whether the individual is present only as a doppelganger of other individuals. Our results suggest that a threshold of 10 hours is needed to eliminate most of the doppelgangers. We manually verified that the individuals (e.g., politicians, news presenters, shooting perpetrators/victims) referenced in the paper do not fall under the "doppelganger" category.
As mentioned in the paper, 1,260 unique individuals receive at least 10 hours of screen time in our data set, accounting in total for 47% of faces in the data set. We validated a stratified sample of these individuals and estimate that 97.3% of the individuals in this category correspond to people who are in the data set (not just visually similar "doppelgangers" of individuals in the news). See Table 3 for the full statistics and methodology of the doppelganger estimation.
For important individuals who are not recognized by the Amazon Rekognition Celebrity Recognition API [1] or whose labels are known to be inaccurate, we train our own person identification models using the FaceNet [11] descriptors. In the latter case, we determined a person's labels to be inaccurate if they were consistently being missed or mis-detected on visual inspection of the videos. To obtain our own labels, we followed two human-in-the-loop labeling methodologies optimized for people who are common (e.g., a President or news presenter who appears for hundreds of hours) and for people who are uncommon (e.g., a shooting victim or less-known public official). The methodologies are described in S1.5.1 and S1.5.2, respectively. We determined which approach to use experimentally; if we could not find enough training examples for the common person approach, we switched to the uncommon person approach. The individuals for which we use our own labels are listed in Table 4.
Table 5 estimates the precision and recall of the labels for the individuals referenced in our paper analyses (e.g., important political figures and candidates). Precision is influenced by many factors, including the presence of individuals of similar appearance being prominent in the news. Because each individual represents only a small portion of overall face screen time, unbiased recall is difficult to compute without finding all instances of an individual. We perform a best-effort attempt to estimate recall by manually counting false negatives in randomly sampled frames from videos known to contain the individual (25 videos, 100 frames per video). We note that the number of samples per individual found in these frames varies due to the quantity and nature of an individual's coverage (e.g., appearances in interviews, and the size and quality of their images).
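A sketch of the within-video propagation step, assuming hypothetical FaceNet descriptor arrays; the distance threshold here is illustrative, not the value used in our pipeline:

    import numpy as np

    def propagate_labels(identified, unidentified, threshold=0.7):
        """Copy an identity label to unlabeled faces in the same video whose
        FaceNet descriptor lies within a conservative L2 distance of a face
        that was already identified. `identified` is a list of
        (name, descriptor) pairs; `unidentified` is a list of descriptors.
        Returns one Optional[str] label per unidentified face."""
        labels = []
        for desc in unidentified:
            best_name, best_dist = None, float("inf")
            for name, ref in identified:
                dist = float(np.linalg.norm(desc - ref))  # L2 distance
                if dist < best_dist:
                    best_name, best_dist = name, dist
            labels.append(best_name if best_dist < threshold else None)
        return labels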
S1.5.1 Methodology for detecting uncommon individuals.
To detect uncommon individuals (with less than ≈50 hours of screen time or 60,000 face detections), we use Google Image Search [8] to obtain initial images of the person. Next, we use FaceNet [11] to compute descriptors on these examples. We compute the L2 distances from these descriptors to the descriptors for all other faces in the data set and display the faces visually by ascending L2 distance. We select instances of the faces that visually match the person, add them to the example set, and repeat the process of computing L2 distances and displaying images until it becomes difficult to find additional examples (the top candidates are all images of other people). To make the selection process more time-efficient, we implemented range navigation and selection to label faces between L2 distance ranges at once if all or nearly all of the faces in the range are the correct person. Even so, the primary limitation of this approach is that the labeling time scales linearly with the frequency of the individual in the data set.
S1.5.2 Methodology for detecting common individuals.
To detect common individuals, for whom it is impossible to browse all of their detections, we trained a simple logistic classifier on the FaceNet [11] features. We used Google Image Search [8] to find initial examples and augment those by sampling faces from the data set that are similar to the examples in FaceNet descriptor space. For negative examples, we sample faces randomly and manually inspect the random samples that are most likely (based on L2 distance) to be positive examples. (This step is necessary because common individuals such as Donald Trump are likely to appear in the negative samples due to their high frequency in the data set.) We then use these positive and negative examples to train a model. To improve the model, we sampled faces for which the model produces low confidence scores (≈0.5) and labeled these as new examples, repeating the training and labeling process until finding new positive examples becomes challenging and model precision is sufficient (evaluated by visually validating the faces that the model labels positive).

Politicians               Notes
Donald Trump              Low recall from Amazon
Hillary Clinton           Used for consistency with Trump
Barack Obama              Used for consistency with Trump
Bernie Sanders            Used for consistency with Trump
Mitt Romney               Used for consistency with Trump
Dick Durbin               Not identified by Amazon
News presenters
Ana Cabrera               Not identified by Amazon
Brian Shactman            Not identified by Amazon
Bryan Llenas              Not identified by Amazon
Dave Briggs               Not identified by Amazon
David Gura                Not identified by Amazon
Dorothy Rabinowitz        Not identified by Amazon
Doug McKelway             Not identified by Amazon
Ed Lavandera              Not identified by Amazon
Griff Jenkins             Not identified by Amazon
Jason Riley               Not identified by Amazon
Jillian Mele              Not identified by Amazon
Jim Pinkerton             Not identified by Amazon
JJ Ramberg                Not identified by Amazon
Lauren Ashburn            Not identified by Amazon
Leland Vittert            Not identified by Amazon
Louis Burgdorf            Not identified by Amazon
Maria Molina              Not identified by Amazon
Natalie Allen             Not identified by Amazon
Nicole Wallace            Not identified by Amazon
Pete Hegseth              Not identified by Amazon
Richard Lui               Not identified by Amazon
Rick Folbaum              Not identified by Amazon
Rick Reichmuth            Not identified by Amazon
Rob Schmitt               Not identified by Amazon
Toure Neblett             Not identified by Amazon
Trace Gallagher           Not identified by Amazon
Yasmin Vossoughian        Not identified by Amazon
Miscellaneous
George Zimmerman          Used for consistency with Martin
Trayvon Martin            Not identified by Amazon
Table 4: Individuals for whom we use our own labels. We use our own labels when no labels from Amazon [1] are available, when the Amazon labels are known to have low precision or recall, or to be consistent on major comparisons between individuals labeled with our models and with Amazon.
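The S1.5.2 loop can be sketched as follows, with scikit-learn's LogisticRegression standing in for the simple logistic classifier and a hypothetical `oracle` callable representing the human labeler:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_common_person_model(X_pos, X_neg, X_pool, oracle, rounds=5, batch=100):
        """Iteratively train a logistic classifier on FaceNet descriptors.
        Each round, pick the pool faces with scores closest to 0.5 (least
        confident), ask the human `oracle` for their labels, and retrain."""
        X = np.vstack([X_pos, X_neg])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
        for _ in range(rounds):
            model = LogisticRegression(max_iter=1000).fit(X, y)
            scores = model.predict_proba(X_pool)[:, 1]
            uncertain = np.argsort(np.abs(scores - 0.5))[:batch]
            new_labels = np.array([oracle(X_pool[i]) for i in uncertain])  # human-in-the-loop
            X = np.vstack([X, X_pool[uncertain]])
            y = np.concatenate([y, new_labels])
            X_pool = np.delete(X_pool, uncertain, axis=0)
        return model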
S1.6 Enumerating news presenters
TV news networks refer to their hosts and staff members using a number of terms (e.g., hosts, anchors, correspondents, personalities, journalists); these terms vary by role and by network. We use the term "news presenter" to refer broadly to anchors, hosts, and on-air staff (contributors, meteorologists, etc.) of a news network, and we manually enumerated 325 news presenters from the three networks (Table 6). Our list of names consists of the staff listings on the public web pages of CNN, FOX, and MSNBC, accessed in January 2020, and information manually scraped from Wikipedia for the top 150 shows by screen time (accounting for 96% of news content). Because content is shared between different channels at a network, the list for CNN also includes presenters from HLN, owned by CNN. NBC and CNBC presenters are also included in the MSNBC list. We were unable to identify faces for 18 presenters and these individuals are excluded from the 325 presenters listed. These omitted individuals are either not recognized by Amazon Rekognition [1], not in the video data set (e.g., presenters only on HLN or CNBC), or too rare to detect reliably in our data set (e.g., left before January 1, 2010; joined after July 23, 2019; or very specific domain experts).
Most presenters are enumerated at the granularity of a channel; Anderson Cooper (who is a host on CNN) is considered to be a presenter in any CNN video, but would not be considered a presenter on FOX or MSNBC. We do not differentiate between presenter roles, and a presenter's role may change over the decade as they are promoted or move from show to show. We also do not track the exact length of employment for each presenter on a network; however, the screen time of presenters on a channel becomes negligible (near zero) after they have left the network (due to changing employer, retiring, or being fired). Some presenters in our data set have moved between channels; for example, Ali Velshi left CNN in 2013 and joined MSNBC in 2016. For individuals who were prominent political figures before becoming news presenters, we track presenter status at the show granularity (e.g., Mike Huckabee, Newt Gingrich, and David Axelrod). Table 6 lists all of the news presenters whom we identify.
S1.7 Computing "screenhog score" for presenters
"Screenhog score" is defined in the paper as the percentage of time that a news presenter is on screen in the content portion of their own show. We considered shows with at least 100 hours of news content when listing the top 25 news presenters by their screenhog score.
S1.8 Age for news presenters
We successfully obtained birthdates for 98% of news presenters using DBpedia [3] and manual Google and Wikipedia [12] search. For the birthdates queried from DBpedia, we manually verified the results to eliminate common errors such as the wrong birthdate due to the existence of another person of the same name. In a small number of cases (1%), only the birth year was available; for these individuals, we compute their age from January 1 of their birth year.
We calculate the age of news presenters, weighted by screen time, by assigning each face identified as a news presenter the age (at day granularity) of the individual on the day that the video aired. The average age, weighted by screen time, corresponds to the expected age of a news presenter sampled at random. Note that our methodology assumes that the video was aired the same day that it was recorded and does not account for old clips or still images (Figure 2).

Name                   Samples   Est. precision   Samples   Est. recall
U.S. political figures and candidates
Amy Klobuchar          100       1.00             69        0.87
Barack Obama †         100       1.00             85        0.86
Ben Carson             100       0.99             132       0.85
Bernie Sanders †       100       0.99             42        0.83
Beto O'Rourke          100       1.00             50        0.58
Bill Clinton           100       0.89             59        0.90
Bill De Blasio         100       1.00             55        0.89
Bobby Jindal           100       0.99             133       1.00
Carly Fiorina          100       0.92             99        0.74
Chris Christie         100       0.98             118       0.87
Dick Durbin †          100       0.96             50        0.80
Donald Trump †         100       0.91             65        0.83
Elizabeth Warren       100       0.97             42        0.81
Gary Johnson           100       0.99             124       0.84
George W. Bush         100       0.72             71        0.80
Harry Reid             100       0.97             137       0.83
Herman Cain            100       1.00             100       0.90
Hillary Clinton †      100       0.89             136       0.84
Jeb Bush               100       0.96             79        0.92
Jim Gilmore            100       0.98             157       0.94
Jim Webb               99        0.99             158       0.89
Joe Biden              100       1.00             66        0.91
John Boehner           100       1.00             84        0.95
John McCain            99        0.99             196       0.91
Jon Huntsman Jr.       100       1.00             117       0.87
Kamala Harris          99        0.97             55        0.93
Kellyanne Conway       100       1.00             151       0.72
Kevin McCarthy         100       1.00             70        0.97
Lincoln Chafee         100       0.88             103       0.87
Lindsey Graham         100       1.00             107       0.88
Marco Rubio            100       1.00             93        0.85
Martin O'Malley        100       0.92             129       0.86
Michele Bachmann       100       0.91             104       0.92
Michelle Obama         100       1.00             107       0.76
Mike Huckabee          100       1.00             299       0.96
Mitch McConnell        99        1.00             81        0.83
Mitt Romney †          100       0.98             107       0.72
Nancy Pelosi           100       1.00             37        0.87
Newt Gingrich          100       0.98             226       0.94
Orrin Hatch            100       0.99             115       0.94
Paul Ryan              100       0.99             104       0.84
Pete Buttigieg         100       0.99             25        0.96
Rand Paul              100       1.00             140       0.94
Rick Santorum          100       1.00             168       0.92
Rick Perry             100       0.99             154       0.77
Ron Paul               100       1.00             185       0.96
Sarah Palin            100       1.00             126       0.85
Steve Scalise          100       0.97             109       0.94
Ted Cruz               100       1.00             102       0.85
Tim Kaine              100       0.99             185       0.92
Tulsi Gabbard          100       0.97             88        0.78
Miscellaneous
George Zimmerman †     100       0.98             131       0.79
Trayvon Martin †       100       0.95             48        0.63
Table 5: Estimated precision is computed on ≈100 randomly sampled faces identified as each individual. Estimated recall is computed on actual instances of each individual's face found in a random sample of 2,500 faces, from 25 videos, known to contain each individual. († indicates our models.)

CNN
Ali Velshi (225.9 hours) Alison Kosik (104.3) Alisyn Camerota (271.1) Amanda Davies (3.4) Amara Walker (9.5) Ana Cabrera (305.7) Anderson Cooper (1782.3) Andrew Levy (0.0) Anthony Bourdain (110.8) Arwa Damon (50.1) Ashleigh Banfield (193.2) Barbara Starr (156.5) Becky Anderson (12.2) Ben Wedeman (61.9) Bianna Golodryga (16.0) Bill Hemmer (0.2) Bill Weir (16.0) Brian Stelter (188.6) Brianna Keilar (267.3) Brooke Baldwin (898.6) Campbell Brown (28.8) Candy Crowley (140.7) Carol Costello (311.4) Chris Cuomo (678.0) Christi Paul (84.1) Christiane Amanpour (72.6) Christine Romans (315.0) Clarissa Ward (33.0) Dana Bash (350.4) Dave Briggs (91.7) Deborah Feyerick (80.2) Don Lemon (1098.8) Drew Griffin (86.4) Ed Lavandera (57.0) Elizabeth Cohen (35.2) Erica Hill (57.4) Erin Burnett (539.6) Errol Barnett (63.5) Fareed Zakaria (230.3) Frederik Pleitgen (71.4) Fredricka Whitfield (477.8) Gary Tuchman (37.4) Gloria Borger (255.6) Hala Gorani (28.6) Howard Kurtz (39.0) Jake Tapper (376.3) Jamie Gangel (17.7) Jean Casarez (35.5) Jeff Zeleny (115.2) Jeffrey Toobin (270.6) Jessica Yellin (73.1) Jim Acosta (220.9) Jim Sciutto (282.3) Joe Johns (118.3) John Berman (584.2) John King (377.0) John Roberts (46.6) John Vause (62.2) John Walsh (20.6) Kate Bolduan (322.6) Kathleen Parker (21.7) Kiran Chetry (54.5) Kristie Lu Stout (4.2) Kyra Phillips (105.1) Kyung Lah (47.9) Larry King (78.9) Lisa Ling (25.2) Lou Dobbs (0.3) Lynda Kinkade (5.4) Lynn Smith (0.2) Martin Savidge (91.7) Max Foster (34.4) Michael Smerconish (177.6) Michelle Kosinski (49.2) Miguel Marquez (0.2) Mike Galanos (2.4) Mike Rogers (50.0) Mike Rowe (4.8) Morgan Spurlock (13.3) Natalie Allen (75.5) Nic Robertson (135.5) Nick Paton Walsh (65.3) Pamela Brown (110.2) Paula Newton (17.3) Piers Morgan (404.2) Poppy Harlow (209.5) Rachel Nichols (31.6) Randi Kaye (148.0) Richard Quest (90.5) Richard Roth (7.4) Robin Meade (2.0) Rosemary Church (81.6) S. E. Cupp (45.7) Sanjay Gupta (200.1) Sara Sidner (21.0) Soledad O'Brien (91.6) Stephanie Cutter (14.8) Susan Hendricks (19.3) Suzanne Malveaux (130.8) T. J. Holmes (114.4) Tom Foreman (44.0) Van Jones (156.2) Victor Blackwell (113.8) W. Kamau Bell (43.9) Wolf Blitzer (800.1) Zain Asher (23.4) Zain Verjee (24.2)
FOX
Abby Huntsman (51.3) Ainsley Earhardt (211.9) Alan Colmes (65.3) Alisyn Camerota (141.3) Andrea Tantaros (177.5) Andrew Levy (160.1) Andrew Napolitano (122.6) Angela McGlowan (23.7) Anna Kooiman (78.6) Ari Fleischer (31.9) Arthel Neville (108.9) Bill Hemmer (383.0) Bill O'Reilly (1093.8) Bob Beckel (268.1) Brenda Buttner (34.8) Bret Baier (536.7) Brian Kilmeade (638.4) Brit Hume (171.7) Bryan Llenas (33.6) Byron York (77.8) Cal Thomas (13.9) Carol Alt (8.9) Casey Stegall (26.2) Charles Krauthammer (283.2) Charles Payne (98.1) Charlie Gasparino (45.9) Cheryl Casone (33.1) Chris Wallace (374.3) Clayton Morris (217.4) Dagen McDowell (44.4) Dana Perino (437.2) Daniel Henninger (53.1) Dave Briggs (70.1) David Asman (50.1) David Hunt (1.0) Dorothy Rabinowitz (7.2) Doug McKelway (68.9) Ed Henry (313.4) Ed Rollins (32.1) Elisabeth Hasselbeck (85.4) Elizabeth Prann (25.0) Ellis Henican (6.2) Eric Bolling (394.9) Eric Shawn (128.7) Fred Barnes (10.8) Geraldo Rivera (232.3) Gerri Willis (27.8) Glenn Beck (288.1) Greg Gutfeld (782.5) Greta van Susteren (487.5) Gretchen Carlson (268.1) Griff Jenkins (31.6) Guy Benson (52.6) Harris Faulkner (291.7) Heather Childers (201.4) Howard Kurtz (227.0) James Taranto (4.6) Jane Hall (0.1) Janice Dean (41.6) Jason Riley (25.3) Jeanine Pirro (514.4) Jedediah Bila (71.3) Jehmu Greene (21.3) Jennifer Griffin (57.9) Jesse Watters (290.7) Jillian Mele (118.9) Jim Pinkerton (24.6) John Fund (20.1) John Roberts (65.5) John Stossel (119.8) Jon Scott (300.3) Juan Williams (367.0) Judith Miller (51.3) Julie Banderas (98.2) Karl Rove (252.4) Katherine Timpf (60.2) Katie Pavlich (83.4) Kelly Wright (71.9) Kevin Corke (40.1) Kimberley Strassel (56.0) Kimberly Guilfoyle (258.7) Kristen Soltis Anderson (10.1) Laura Ingle (31.0) Laura Ingraham (498.0) Lauren Ashburn (9.5) Lauren Green (8.6) Leland Vittert (136.6) Leslie Marshall (73.3) Manny Alvarez (12.2) Mara Liasson (25.5) Maria Bartiromo (81.0) Maria Molina (67.1) Mark Fuhrman (29.1) Mark Levin (55.4) Martha MacCallum (562.5) Megyn Kelly (790.9) Melissa Francis (84.6) Michael Baden (19.5) Mike Emanuel (99.7) Molly Henneberg (28.4) Molly Line (30.7) Monica Crowley (89.3) Neil Cavuto (737.6) Paul Gigot (98.7) Pete Hegseth (246.6) Peter Doocy (87.9) Phil Keating (35.9) Rachel Campos-Duffy (11.2) Raymond Arroyo (21.9) Rich Lowry (37.0) Rick Folbaum (42.8) Rick Reichmuth (80.7) Rob Schmitt (51.0) Robert Jeffress (19.4) Sandra Smith (71.7) Sean Hannity (1071.8) Shannon Bream (416.0) Shepard Smith (360.2) Steve Doocy (450.4) Steve Hilton (81.2) Stuart Varney (126.2) Tammy Bruce (60.5) Tom Shillue (145.5) Tomi Lahren (15.8) Trace Gallagher (131.7) Trish Regan (44.5) Tucker Carlson (865.3) Uma Pemmaraju (48.3) Walid Phares (28.2) William Bennett (16.9)
MSNBC
Abby Huntsman (29.0) Al Sharpton (286.7) Alec Baldwin (2.5) Alex Wagner (174.8) Alex Witt (261.3) Ali Velshi (242.4) Andrea Canning (4.9) Andrea Mitchell (392.2) Andrew Ross Sorkin (11.0) Angie Goff (1.3) Anne Thompson (10.1) Ari Melber (395.0) Ayman Mohyeldin (150.4) Betty Nguyen (29.5) Bill Neely (20.6) Brian Shactman (213.3) Brian Sullivan (13.3) Brian Williams (282.5) Carl Quintanilla (0.8) Chris Hayes (839.5) Chris Jansing (254.8) Chris Matthews (1103.8) Chuck Todd (550.3) Contessa Brewer (49.8) Craig Melvin (173.4) David Faber (1.2) David Gura (54.9) Donny Deutsch (53.3) Dylan Ratigan (109.7) Ed Schultz (493.0) Frances Rivera (44.2) Greta van Susteren (21.7) Hallie Jackson (105.0) Jim Cramer (8.0) JJ Ramberg (30.8) Joe Scarborough (940.4) John Heilemann (147.1) Jose Diaz-Balart (88.3) Josh Mankiewicz (13.1) Joy-Ann Reid (337.1) Kasie Hunt (112.6) Kate Snow (51.6) Katy Tur (187.1) Kayla Tausche (2.1) Keith Olbermann (109.7) Kelly Evans (0.7) Kelly O'Donnell (57.2) Kerry Sanders (25.2) Kristen Welker (212.2) Krystal Ball (91.0) Lawrence O'Donnell (688.0) Lester Holt (13.1) Louis Burgdorf (29.8) Lynn Smith (28.0) Mara Schiavocampo (18.5) Mark Halperin (158.9) Martin Bashir (114.7) Matt Lauer (8.4) Melissa Harris-Perry (197.9) Meredith Vieira (1.2) Miguel Almaguer (9.3) Mika Brzezinski (696.7) Mike Viqueira (46.9) Natalie Morales (4.7) Nicole Wallace (175.9) Pete Williams (105.4) Peter Alexander (97.5) Rachel Maddow (1201.7) Rehema Ellis (7.2) Richard Engel (114.2) Richard Lui (146.4) Rick Santelli (1.3) Ron Mott (16.7) Ronan Farrow (31.4) Savannah Guthrie (43.9) Seema Mody (1.5) Stephanie Gosk (14.0) Stephanie Ruhle (111.5) Steve Kornacki (358.6) Steve Liesman (4.7) Sue Herera (1.8) Tamron Hall (200.5) Thomas Roberts (198.8) Tom Brokaw (29.2) Tom Costello (24.5) Toure Neblett (65.4) Willie Geist (319.3) Yasmin Vossoughian (66.6)
Table 6: Compiled list of news presenters and their screen time in hours. Note that the percentage of female presenters in the news presenter list is 52%, 42%, and 44% on CNN, FOX, and MSNBC, respectively.

Figure 2: Example frames where news presenters (Anderson Cooper; Megyn Kelly, Bret Baier; Rachel Maddow) appear in still images and non-live video.

S1.9 Hair color for news presenters
Two of the authors independently labeled the visible hair color for each male and female news presenter in 25 frames sampled from the data set. There were six possible labels (blond, brown, black, red, white/gray, and bald). For each news presenter, we calculated the majority label according to each rater. The inter-rater agreement for the majority label for female news presenters was 92.4%. In these cases, the majority label was used in the analysis as the hair color label. The two raters reviewed and agreed upon a hair color label for the 11 female news presenters where their majority labels did not match. Figure 3 shows example faces from each hair color group for the female news presenters that we analyzed.
For male presenters, the data was not analyzed because there was much lower inter-rater agreement (75%). One major cause of inter-rater disagreement was confusion over when to apply the bald and white/gray hair labels. There was only one white-haired female presenter in the data set, and no bald female presenters, contributing to lower disagreement.

Figure 3: Random image of each female-presenting news presenter, grouped by hair color label: (a) blonde, (b) brown, (c) black, (d) other.

Figure 4: Examples of the top four images of Trayvon Martin. Images can have different backgrounds, color tone, sharpness, and contrast as a result of editing while the source image remains the same.
S1.10 Images/video of Trayvon Martin and George Zimmerman
We use our own identity labels for Trayvon Martin and George Zimmerman because both individuals are rare overall in the data set and they are not reliably identified by Amazon's Celebrity Recognition API [1].
First, we separate out faces by their source image (before any editing). In the case of George Zimmerman, who is alive, we make a best effort to group faces from the same source event or setting (e.g., court appearances, interviews). Note that the same image can be edited differently, have text overlays, and differ in aspects such as tonality and background (see Figure 4 for examples).
For each individual, we use the FaceNet [11] descriptors (described in S1.3) and perform a clustering (in the embedding space) of the faces that we previously identified as the individual. We cluster with a human in the loop, by constructing a 1-NN classifier (i.e., exact nearest neighbor). We select faces which correspond to unique source images, partition the faces, and then visually examine the resulting clusters. Examining the clusters can reveal new source images or misclassified images; the human can create new labels, fix existing labels, and repeat the process. We repeat the process until the clusters are clean (e.g., over 90% precise). We find that using a 1-NN classifier is sufficient and that only a small number of manual labels are needed (fewer than 200) to obtain good precision and recall in the clusters (Table 7). Figure 4 and Figure 5 show examples from the top four clusters for Trayvon Martin and George Zimmerman, respectively.
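A minimal sketch of the 1-NN assignment step, assuming a hypothetical set of human-labeled exemplar descriptors (one or more per source-image cluster):

    import numpy as np

    def assign_to_clusters(faces, exemplars):
        """Assign each face descriptor to the cluster of its exact nearest
        labeled exemplar. `faces` is an (N, D) array of FaceNet descriptors;
        `exemplars` is a list of (cluster_id, descriptor) pairs produced by
        the human labeler. Returns a list of cluster ids."""
        ids = [cid for cid, _ in exemplars]
        refs = np.stack([d for _, d in exemplars])            # (M, D)
        # Pairwise L2 distances between every face and every exemplar.
        dists = np.linalg.norm(faces[:, None, :] - refs[None, :, :], axis=2)
        return [ids[j] for j in np.argmin(dists, axis=1)]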
S1.11 Counting foreign country names
To identify the set of most frequently mentioned countries, we constructed a list of country and territory names from [5], which includes all countries and territories with ISO 3166-1 country codes. We manually augment the list with country name aliases; for example, the Holy See and Vatican are aliases of one another and either term is counted as Vatican City. A few countries such as Mexico and Georgia are substrings of U.S. state names, leading to over-counting in the results. To address this issue, we exclude occurrences of Mexico that are preceded by New, and we omit Georgia entirely. (Mentions of Georgia in U.S. cable TV news overwhelmingly refer to the U.S. state and not the country.)
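A sketch of this counting scheme on the upper-case caption token stream; the alias table below is an illustrative subset, not the full ISO 3166-1 list:

    # Sketch: count country mentions, resolving aliases (checking bigrams
    # before unigrams) and applying the two exclusion rules described above.
    from collections import Counter

    ALIASES = {  # surface form -> canonical country (illustrative subset)
        "VATICAN": "Vatican City",
        "HOLY SEE": "Vatican City",
        "MEXICO": "Mexico",
        "RUSSIA": "Russia",
        # GEORGIA is omitted entirely, so it simply never appears here.
    }

    def count_countries(tokens):
        counts = Counter()
        i = 0
        while i < len(tokens):
            bigram = " ".join(tokens[i:i + 2])
            if bigram in ALIASES:                       # e.g., HOLY SEE
                counts[ALIASES[bigram]] += 1
                i += 2
                continue
            tok = tokens[i]
            if tok == "MEXICO" and i > 0 and tokens[i - 1] == "NEW":
                i += 1                                  # skip New Mexico
                continue
            if tok in ALIASES:
                counts[ALIASES[tok]] += 1
            i += 1
        return counts

    print(count_countries("THE POPE LEFT THE HOLY SEE FOR NEW MEXICO".split()))
    # Counter({'Vatican City': 1}) -- 'NEW MEXICO' is excluded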
Figure 5: Examples of the top four image and video clusters for George Zimmerman.

Trayvon Martin
Precision (500 samples)    0.996    0.978    0.988    0.986
Recall (500 samples)       1.000    1.000    1.000    0.994
George Zimmerman
Contains video?            yes      no       yes      no
Precision (500 samples)    0.970    0.996    0.948    0.990
Recall (500 samples)       0.941    1.000    1.000    1.000
Table 7: Estimated precision and recall for the top clusters for Trayvon Martin and George Zimmerman. (Columns correspond to the top four clusters.) For each cluster (say X), we estimate precision by sampling randomly in X and counting false positives. To estimate the number of false negatives (for recall) we sample faces randomly from all other clusters and count the number of faces that belong in cluster X, but were wrongly assigned. The precision estimate is used to estimate the number of true positives.
S1.12 Counting terrorism, mass shooting, and plane crash N-grams
To measure how long the media continues to cover events after they take place, we counted the number of times words related to terrorism, mass shootings, and plane crashes appear following an event. Table 8 and Table 9 show the events that were included in the analysis. For terrorism, we count instances of terror(ism,ist), attack, shooting, stabbing, and bombing, which refer to the attack itself; for mass shootings, the list is shoot(ing,er), which refers to the shooting or the mass shooter (searching more restrictively for instances of mass shoot(er,ing) yields a similar result, but sometimes mass is omitted in the news coverage); and for plane crashes the list is (air)plane or airliner followed by crash or missing. Because the keywords to measure news coverage are different between each category of event, the raw counts are not directly comparable across categories.

Date    Event    Victims
Terrorist attacks (U.S.)
Terrorist attacks (Europe)
Mass shootings
Table 8: Major events included in the list of terrorist attacks and mass shootings.

Date    Plane crashes    Deaths
Table 9: Plane crashes included in the analysis. This list includes all of the commercial airline crashes from 2010 to 2019 involving at least 50 fatalities.
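A sketch of the S1.12 measurement, assuming a hypothetical list of date-stamped caption tokens and counting keyword utterances in daily windows following an event:

    from datetime import date
    from collections import Counter

    TERROR_KEYWORDS = {"TERROR", "TERRORISM", "TERRORIST", "ATTACK",
                       "SHOOTING", "STABBING", "BOMBING"}

    def coverage_by_day(utterances, event_day, keywords, days=30):
        """Count keyword utterances on each of the `days` days after an
        event. `utterances` is a hypothetical list of (date, token) pairs
        from the time-aligned captions."""
        counts = Counter()
        for day, token in utterances:
            offset = (day - event_day).days
            if 0 <= offset < days and token in keywords:
                counts[offset] += 1
        return [counts[d] for d in range(days)]

    # Example: coverage decay after a hypothetical event on 2015-11-13.
    series = coverage_by_day([(date(2015, 11, 14), "TERROR")],
                             date(2015, 11, 13), TERROR_KEYWORDS)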
S1.13 Counting illegal and undocumented immigration N-grams
We count the number of times that N-grams related to "illegal" and "undocumented" immigration appear in the captions to measure the prevalence of both terms in discussion around immigration. The N-grams used to measure uses of "illegal" are illegal immigrant(s), illegal immigration, illegals, and illegal alien(s). For "undocumented", the N-grams are undocumented immigrant(s), undocumented immigration, and undocumented alien(s).
S1.14 Counting usage of the president honorific in reference to Trump and Obama
We measure the number of times the "president" honorific is used when addressing each president. This requires classifying occurrences of the word Trump (and also Obama) in captions as having the "president" honorific, not having the honorific (e.g., Donald Trump or just Trump), or not referring to his person (e.g., Trump University).
For Donald Trump, we only count exact matches of President Trump or President Donald Trump as uses of "president". To count occurrences without the honorific, we exclude occurrences preceded by president and instances followed by administration, campaign, university, and care, which are used in compound nouns with Trump. We also exclude occurrences preceded by the (e.g., to filter out other compound nouns of the form the Trump ...); note that this also removes the Trump presidency, which is not referring to his person, but his presidency. Finally, we exclude Donald Trump's immediate family: Melania, Ivanka, Eric, Barron, and [Donald Trump] Jr. These exclusions of nouns related to Trump (but not directed at his person) were selected by visual examination of the top 100 bigrams containing Trump.
The methodology for counting references to Barack Obama is identical, except that the excluded family members are Michelle, Malia, and Sasha.
S1.15 Measuring visual association between words and male/female-presenting screen time
We compute the conditional probabilities of any male- or any female-presenting face being on screen when a word appears in the text.
The majority of the words in the data set (including rare words, but also misspellings) occur very infrequently; 95.6% of unique tokens appear fewer than 100 times in the data set. Because there are few face detection events corresponding to these words, their conditional probability has high variance, often taking on extreme values. In order to remove these words and to make the computation practical, we considered only words that appear at least 100 times in the captions.
From the remaining tokens, we filter out NLTK English stop words [4] and restrict our analysis to the most common words in the data set, considering only the top 10% of remaining words (words that occur over 13,462 times).
We then rank the words according to the difference in conditional probability of female-presenting and male-presenting faces given the word appearing in the caption. The top and bottom words in this list are the most strongly associated with the two presented genders. We report the top 35 words for each presented gender, manually filtering out words in these lists that are human names (e.g., Alisyn is associated with female-presenting screen time because Alisyn Camerota is a presenter on CNN) or news program names (which associate to the genders of hosts).
The top female-associated word, futures, is similar to other highly-ranked words in the list (NASDAQ, stocks), but is also part of the name of a female-hosted TV program (Sunday Morning Futures); 14.6% of futures mentions are part of the 3-gram sunday morning futures. The word with the 14th-highest conditional probability, newsroom, is also both a common news-related word and part of a news program name (CNN Newsroom).
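A sketch of the ranking, assuming hypothetical per-occurrence records that join each word utterance with whether a female- or male-presenting face was on screen at that moment:

    from collections import defaultdict

    def gender_association(occurrences, min_count=100):
        """Rank words by P(female face on screen | word) minus
        P(male face on screen | word). `occurrences` is a hypothetical
        iterable of (word, female_on_screen, male_on_screen) records
        derived from joining captions with face labels."""
        totals = defaultdict(int)
        female = defaultdict(int)
        male = defaultdict(int)
        for word, f, m in occurrences:
            totals[word] += 1
            female[word] += f
            male[word] += m
        scores = {
            w: female[w] / n - male[w] / n
            for w, n in totals.items() if n >= min_count
        }
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)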
S1.16 Computing unique words for individuals
To determine which individuals and words have strong visual/textual associations, we compute the amount of time each individual was on screen while each word is said. This is used to calculate the conditional probability that a person is on screen given the word being said. To filter out rare words, we only consider words with at least 100 occurrences across the decade. The words with conditional probabilities exceeding 50% for any individual are given in Table 1 in the paper.
S1.17 Measuring visual association between news presenters and the president honorific
We extended the president honorific analysis (methodology in S1.14) to when various news presenters are on screen. The N-grams that are counted remain the same as in S1.14. We start with the list of news presenters described in S1.6, but we only show news presenters with at least 100 total references to Trump and 100 total references to Obama to ensure that there is sufficient data for a comparison. This is to account for news presenters who retired before Trump became president or started after Obama stepped down.
S1.18 Measuring visual association between Clinton and the word email
The Hillary Clinton email scandal and subsequent FBI investigation was a highly polarizing issue in the 2016 presidential election. To measure the degree to which Clinton is visually associated with the issue, represented by the word "email", we counted the number of times "email(s)" was said, and the number of times it was said while Clinton is on screen.
We count occurrences of e mail(s), email(s), and electronic mail as instances of email being said in the captions. There are 122K utterances of email in the captions between 2015 and 2017, while Hillary Clinton has 738 hours of screen time in the same time period. Clinton's face is on screen during 14,019 of those utterances.
S1.19 Detecting interviews
Our algorithm for interviews in TV news searches for interviews between a news presenter (the host) and a named guest X. We search for segments where the guest and the host appear together, surrounded by the guest appearing alone or the host appearing alone. Combining these segments captures an alternating pattern, where a host appears, the guest appears, and so on, that is indicative of an interview. Pseudocode for this algorithm, written in Rekall [6], is shown in Figure 7.

We applied this interview detection algorithm to 44 people across our whole data set. These individuals are listed in Table 10. We exclude Barack Obama, Donald Trump, and Hillary Clinton due to those individuals appearing too often in video clips and still images. Their appearances along with hosts are often misclassified as interviews. For example, Donald Trump may be shown in a still image or giving a speech while the news content cuts back and forth to a host providing commentary (Figure 6). Events such as town-hall gatherings are sometimes also confused with interviews. As the leading candidates and presidents, Trump, Clinton, and Obama appear the most often in these contexts.

We validated our interview detection algorithm by annotating 100 cable TV news videos which contain interviews with three interviewees: Bernie Sanders, Kellyanne Conway, and John McCain. Table 11 shows the estimated precision and recall numbers for the three interviewees, as well as the total amount of interview screen time in ground truth for each interviewee.
Interviewee           Hours
John McCain           124.4
Bernie Sanders        107.8
Rand Paul             98.0
Lindsey Graham        93.3
Rick Santorum         91.9
Marco Rubio           87.9
Kellyanne Conway      77.7
Sarah Palin           72.0
Paul Ryan             67.5
John Kasich           63.5
Ted Cruz              61.5
Chris Christie        61.5
Mitt Romney           58.9
Ben Carson            49.1
Elizabeth Warren      35.4
Mitch McConnell       34.7
Carly Fiorina         33.7
Cory Booker           31.3
Kevin McCarthy        31.0
Tim Kaine             29.4
Chuck Schumer         28.9
Nancy Pelosi          28.9
Amy Klobuchar         28.5
Jeb Bush              26.8
Dick Durbin           25.8
John Boehner          24.6
Joe Biden             24.2
Bill Clinton          22.0
Bill De Blasio        19.6
George W. Bush        19.2
Steve Scalise         18.2
Bobby Jindal          17.3
Orrin Hatch           15.1
Martin O'Malley       14.6
Kamala Harris         12.9
John Cornyn           10.3
Tulsi Gabbard         9.6
Harry Reid            7.6
Pete Buttigieg        7.5
Jim Webb              6.1
Beto O'Rourke         5.3
Lincoln Chafee        4.4
Michelle Obama        2.3
Jim Gilmore           1.6

Newt Gingrich         185.3
Mike Huckabee         95.8

Table 10: Detected interview time for prominent U.S. political figures. Newt Gingrich and Mike Huckabee are listed separately because they are both hosts (news presenters) and politicians.
Figure 6: Example frames from (a) a real interview and (b) an incorrectly detected interview. Note that both follow a pattern of a host and guest being on screen, together and alone. The incorrectly detected interview contains videos and graphics of Donald Trump instead of a live appearance. As the presidents and leading candidates, Trump, Clinton, and Obama are discussed at length by hosts in visual contexts that appear similar to interviews.
Interviewee         Hours   Precision   Recall
Bernie Sanders      3.5     91.7%       97.5%
Kellyanne Conway    2.2     91.8%       89.1%
John McCain         0.9     86.0%       99.5%
Table 11: Precision and recall numbers for the interview detector across 100 hand-annotated videos, as well as the total amount of interview screen time in ground truth for each interviewee.

# Ingest face detections as intervals in time and space.
faces = rekall.ingest(database.table("faces"), 3D)
# Select detections of the named guest and of any news presenter (host).
guest_faces = faces.filter(face: face.name = guest_name)
host_faces = faces.filter(face: face.is_host)
# Merge detections separated by gaps of less than 30s into segments.
guest_segs = guest_faces.coalesce(
    predicate = time_gap < 30s, merge = time_span)
host_segs = host_faces.coalesce(
    predicate = time_gap < 30s, merge = time_span)
# Segments where the guest and a host are on screen together.
guest_and_host_segs = guest_segs.join(
    host_segs, predicate = time_overlaps, merge = time_intersection)
# Segments where the guest appears without a host.
guest_alone_segs = guest_segs.minus(guest_and_host_segs)
# An interview alternates guest-and-host segments with guest-alone segments.
interview_segs = guest_and_host_segs.join(
    guest_alone_segs, predicate = before or after, merge = time_span)
# Merge adjacent segments and keep those at least 240s long.
interviews = interview_segs
    .coalesce()
    .filter(interval: interval["t2"] - interval["t1"] >= 240s)

Figure 7: Rekall [6] query to retrieve interviews between a host and a named guest (e.g., Bernie Sanders).
S2 ADDITIONAL ANALYSES
S2.1 Who is in the news?
S2.1.1 How much time is there when at least one face is on screen in commercials?
Recall from the paper that the percentage of screen time when a face is on screen in news content has risen by 8.6 percentage points, from 72.9% in 2010 to 81.5% in 2019. This same percentage has risen only slightly in commercials in the same timespan (38% to 41%), suggesting that the increase is not solely due to improvements in video quality. The average number of detected faces visible on screen is 1.38 in news content and 0.49 in commercials, and these figures vary little between channels. There is a rise in the number of detections over the decade, across all three channels, from 1.2 in 2010 to 1.6 in 2019, with much of the increase occurring since 2015 (Figure 10). By contrast, the average number of faces on screen in commercials rises from 0.42 to 0.52, with much of the increase occurring before 2012.
S2.1.2 What is the average size of faces?
The average size of detected faces in news content, as a proportion of the frame height, has also risen slightly, from 33% to 35% on CNN and from 33% to 36% on MSNBC, but has fallen from 33% to 31% on FOX (Figure 11a). Within commercials, the change is less than 1% on CNN and MSNBC, but average face size has fallen from 38% to 34% on FOX (Figure 11b). Note that some videos have black horizontal bars at the top and bottom because the recorded video's resolution does not match the content's aspect ratio (16:9 content inside a 4:3 frame), an artifact of the recording. We excluded these black bars from the frame height calculation.
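A minimal sketch of the bar-exclusion step; the near-black-row heuristic and threshold below are illustrative assumptions, not necessarily the exact detection procedure.

import numpy as np

def face_height_fraction(face_height_px, frame, dark_thresh=16):
    # frame: H x W x 3 uint8 image. Rows whose mean intensity falls below
    # dark_thresh at the top/bottom are treated as letterbox bars
    # (illustrative heuristic).
    row_means = frame.mean(axis=(1, 2))
    top, bottom = 0, len(row_means)
    while top < bottom and row_means[top] < dark_thresh:
        top += 1
    while bottom > top and row_means[bottom - 1] < dark_thresh:
        bottom -= 1
    # Face height as a fraction of the non-bar content height.
    return face_height_px / max(bottom - top, 1)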
S2.1.3 Did the screen time given to presidential candidates vary by channel?
There is some variation in the screen time given to candidates across channels, but the overall patterns are similar to the aggregate patterns described in the paper (Figure 8).
S2.1.4 Do shows presented by female-presenting news presenters give more screen time to women overall?
An individual show's overall gender balance is skewed by the gender of its host. For example, the show with the greatest female-presenting screen time is Melissa Harris-Perry on MSNBC and the show with the greatest male-presenting screen time is Glenn Beck on FOX.

We use the percentage of female-presenting news presenter screen time out of total news presenter screen time to measure the extent to which a show is female- or male-presented. As a measure of the gender balance for female-presenting individuals who are not presenters (non-presenters), we compute the percentage of female-presenting screen time for faces not identified as a news presenter out of the time for all faces that are not identified as a presenter. We measured the linear correlation between these two percentages to evaluate whether shows that lean toward more female-presenting news presenter screen time also have more screen time for female-presenting non-presenters in general. To exclude short-lived shows and special programming, we limited the analysis to shows with at least 100 hours of news content.

We find no strong correlation on any of the three channels (R = .02 on CNN and R = .19 on MSNBC) (Figure 13). This suggests that shows hosted by female-presenting news presenters do not give proportionally more screen time to female-presenting subjects and guests. Our result contrasts with findings by the GMMP [7] that female journalists write disproportionately more articles about female subjects.
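The per-show correlation can be computed with an ordinary least-squares fit; the show tuple format below is a hypothetical stand-in for our screen-time aggregates.

from scipy.stats import linregress

def show_level_correlation(shows, min_hours=100):
    # shows: iterable of (hours_of_news_content, pct_female_presenter,
    # pct_female_nonpresenter) tuples for one channel (hypothetical format).
    xs, ys = [], []
    for hours, pct_presenter, pct_nonpresenter in shows:
        if hours >= min_hours:  # exclude short-lived shows and specials
            xs.append(pct_presenter)
            ys.append(pct_nonpresenter)
    fit = linregress(xs, ys)
    return fit.slope, fit.rvalue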
Figure 8: Donald Trump received more screen time than any other Republican candidate in the 2016 election season. The difference is most pronounced on MSNBC. Hillary Clinton and Bernie Sanders received similar amounts of screen time during the competitive period of the presidential primary season (January to May, 2016). Compared to CNN and MSNBC, FOX gave less screen time to the Democratic candidates in 2016. In the 2012 election season, Mitt Romney did not dominate the screen time of the Republican candidates until much later in the primary season. Michele Bachmann received a much larger peak on CNN in January 2012 (compared to FOX and MSNBC) before the Iowa caucuses and after she dropped out of the race. Finally, both Barack Obama, the incumbent Democratic president, and Mitt Romney received more screen time on MSNBC than on CNN and FOX.
Figure 9: The percentage of time when faces are on screen has increased for news content, but has remained static in commercials since 2013.

Figure 10: The average number of faces on screen has increased on all three channels.

Figure 11: The average height of faces on screen has remained mostly constant in both news content and commercials, but there is some variation within the decade. The average heights of faces in news content and commercials are similar.
S2.1.5 Which politicians get interviewed? Which presenters do interviews?
Interviews are one of the ways that cable TV news channels bring on experts and provide politicians with a platform to express their views. We find interviews by looking for continuous segments of video when a presenter (interviewer) and an interviewee are on screen together and/or alternating back and forth (details in subsection S1.19). Empirically, we found that this identifies interview segments for the 44 prominent American political figures that we tested (including 17 candidates in the 2016 US presidential election). (Note: we exclude Barack Obama, Mitt Romney, Donald Trump, and Hillary Clinton because they appear too frequently in non-interview contexts, leading to low precision in detecting interviews. Newt Gingrich and Mike Huckabee, who are both hosts and political figures, are also excluded.)

In the interviews that we detected, John McCain is featured the most. Many of the top interviewees among the individuals that we tested are Republicans. This is due to our biased sampling toward 2016 presidential candidates and the relatively competitive and crowded Republican primary (compared to the Democratic primary that year). The top three interviewers are all hosts on FOX; Greta Van Susteren (former host of On the Record on FOX) is the most prolific.

Figure 12: The visual association between Hillary Clinton's face and the word "emails" follows a similar trend on all three channels, far exceeding the baseline association between Clinton being on screen and any arbitrary word being said. From July to October, 2015, Clinton is shown the most on MSNBC (peaking at 40%) when email is said.

Figure 13: There is little correlation between shows that are predominantly presented by female-presenting news presenters and shows with the most screen time for female-presenting faces who are not news presenters.

Figure 14: Interview time of the 44 politicians (interviewees) tested and hosts (interviewers). Note: Bernie Sanders is labeled Democratic due to his affiliation in the 2016 primary.
S2.1.6 What is the visual layout of interviews?
In the majority of interviews, the host appears on the left (split-screen) or in the middle, while the interviewee typically appears on the right (split-screen) or in the middle (Figure 15). This is in contrast to late night talk shows, which place the host on the right.

Figure 15: In interviews, the host appears overwhelmingly on the left or in the middle; interviewees appear in the middle or on the right.
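The position distributions in Figure 15 amount to histograms of face-centroid x coordinates; a minimal sketch follows, where the (role, x) input is a hypothetical stand-in for our interview and face labels.

import numpy as np

def layout_histograms(face_positions, bins=20):
    # face_positions: iterable of (role, x) pairs, where role is "host" or
    # "interviewee" and x is the face centroid's horizontal position
    # normalized to [0, 1] (hypothetical format).
    xs = {"host": [], "interviewee": []}
    for role, x in face_positions:
        xs[role].append(x)
    edges = np.linspace(0.0, 1.0, bins + 1)
    # Fraction of detections per horizontal bin, for each role.
    return {role: np.histogram(v, bins=edges)[0] / max(len(v), 1)
            for role, v in xs.items()}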
S2.2 What is discussed?
S2.2.1 Does any channel cover foreign countries more than the others?
The number of times that foreign countries appear in the text captions oscillates over time, likely due to major events occurring abroad (Figure 16). However, all three channels follow a similar trajectory.
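A sketch of the per-month counting, using a country-name list such as the ISO-3166 lists [5]; the caption tuple format is hypothetical, and single-token matching is a simplification (multi-word country names would need n-gram matching).

from collections import Counter

def country_mentions_per_month(captions, country_names):
    # captions: iterable of (channel, date, text) segments (hypothetical
    # format); country_names: set of lowercase single-token country names,
    # e.g., derived from the ISO-3166 lists [5].
    counts = Counter()
    for channel, date, text in captions:
        month = (date.year, date.month)
        n = sum(tok in country_names for tok in text.lower().split())
        counts[(channel, month)] += n
    return counts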
S2.3 Who is on screen when a word is said?
S2.3.1 Did different channels visually associate Hillary Clinton more with the word email than others?
Figure 12 shows the percentage of times when email is said and when Hillary Clinton is on screen.

Figure 16: The number of times when foreign country names appear in the news oscillates. The peaks on all three channels are concurrent, but until 2017, the count of foreign country names was higher on CNN than on FOX and MSNBC.
REFERENCES
[1] 2020. Amazon Rekognition: Automate your image and video analysis with machine learning. https://aws.amazon.com/rekognition.
[2] 2020. Internet Archive: TV News Archive. https://archive.org/details/tv.
[3] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. In Proceedings of the 6th International The Semantic Web and 2nd Asian Conference on Asian Semantic Web Conference (Busan, Korea) (ISWC'07/ASWC'07). Springer-Verlag, Berlin, Heidelberg, 722–735.
[4] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media Inc., Sebastopol, CA, USA.
[5] Luke Duncalfe. 2020. ISO-3166 Country and Dependent Territories Lists with UN Regional Codes. https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes.
[6] Daniel Y. Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, and Kayvon Fatahalian. 2019. Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels. arXiv preprint arXiv:1910.02993.
[7] Global Media Monitoring Project. 2015. Who Makes the News? World Association for Christian Communication.
[10] Alex Poms, Will Crichton, Pat Hanrahan, and Kayvon Fatahalian. 2018. Scanner: Efficient Video Analysis at Scale. ACM Trans. Graph. 37, 4, Article 138 (July 2018), 13 pages. https://doi.org/10.1145/3197517.3201394
[11] Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[12] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. 2016. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks. IEEE Signal Processing Letters 23, 10 (2016), 1499–1503.