Pardon the Interruption: An Analysis of Gender and Turn-Taking in U.S. Supreme Court Oral Arguments
Haley Lepp, Gina-Anne Levow
Educational Testing Service, University of Washington
[email protected], [email protected]

(This work was completed while the first author was a graduate student at the University of Washington.)

Abstract
This study presents a corpus of turn changes between speakers in U.S. Supreme Court oral arguments. Each turn change is labeled on a spectrum of "cooperative" to "competitive" by a human annotator with legal experience in the United States. We analyze the relationship between speech features, the nature of exchanges, and the gender and legal role of the speakers. Finally, we demonstrate that the models can be used to predict the label of an exchange with moderate success. The automatic classification of the nature of exchanges indicates that future studies of turn-taking in oral arguments can rely on larger, unlabeled corpora.

Index Terms: gender, speech, emotion recognition, computational linguistics
1. Introduction
The Supreme Court plays a key role in defining, identifying, and rooting out gender discrimination by hearing the cases that will determine the way gender rights are evaluated across the United States. However, there are few checks on the presence of gender bias within the court itself. This study offers a novel corpus of annotated speech from Supreme Court oral arguments and proposes a framework to analyze gender biases in turn changes.

For decades, scientists have argued that women are interrupted more than men in professional settings, indicating that this speech act could be an indicator of gender bias. The New York Times has described "being interrupted, talked over, shut down or penalized for speaking out" as "nearly a universal experience for women when they are outnumbered by men" [1]. Interruptions correlating with gender within Supreme Court oral arguments have occurred consistently over time, and are not necessarily due to political polarization or the personalities of justices [2]. However, in conversational turn-taking, an interruption is not inherently a negative act. As demonstrated by Tannen [3] in her research on gender and language, interruptions cannot be defined categorically as acts of rudeness or dominance. Interruptions can be part of regular discourse depending on the context of a conversation, and are especially common among speakers of certain social groups in the United States.

Furthermore, the term "interruption" is not a clear-cut linguistic term. Interruptions have variously been described as an overlap in speech between two speakers [4], possibly including backchannels [5], a "power type event" to wrest the discourse from the speaker [6], a "topic change attempt" [6], an event to "bolster the interruptee's positive face" [6], or a syntactically incomplete turn [7].

To address this, we annotate a corpus of turn changes as audio segments on a spectrum of cooperative to competitive. To demonstrate the utility of the corpus, we extract speech features and show that classifiers can automatically predict the human labels of the turn changes with relative success.
2. Audio and transcription retrieval
The transcripts and audio recordings of all U.S. Supreme Court oral arguments since October 2006 are publicly available online. The transcripts, written by court stenographers, are formatted like the script of a play, with the name of the speaker followed by the transcribed speech. The transcripts include disfluencies and speech that ends mid-sentence or mid-word [8]. The transcriptions do not include the time at which each statement is said, so we retrieve time stamps of turns or sentences (whichever are shorter) from The Oyez Project [9].
We define a turn change as an event in which one speaker stops speaking and a second speaker starts speaking according to the transcript. For example:

    Hannah S. Jurss: And so we're certainly asking for this Court's --
    John G. Roberts, Jr.: But I'm not faulting them for that. [10]

Using the start and end time-stamp of each speaker's turn, we segment each argument into short audio clips around turn changes. Multiple studies have demonstrated that listeners can perceive significant social and emotional information from a short slice of audio, despite not knowing the greater context of a conversation [11, 12]. The brevity of clips also ensures that the annotators, who are busy professionals, do not lose interest in the task. Also, because this study aims to find patterns in speech without regard to the subject matter of the case, limiting the content which an annotator can listen to can help avoid annotator bias.

The default length of a segment is six seconds: two seconds before the end label of the first speaker and four seconds after the start label of the next speaker. If the turn of the first speaker is less than two seconds long, then we use the start of the first speaker's turn as the start of the turn change, instead of a full two seconds of audio. If the turn of the second speaker is less than four seconds long, then we use the end of that speaker's turn as the end of the turn change.
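To make the windowing rule above concrete, here is a minimal sketch in Python; the function and argument names are hypothetical, and all times are assumed to be in seconds:

    def turn_change_window(first_start, first_end, second_start, second_end):
        """Audio window around a turn change (all times in seconds).

        Default: 2 s before the first speaker's end label and 4 s after
        the second speaker's start label, clipped so the window never
        reaches outside either speaker's own turn.
        """
        clip_start = max(first_end - 2.0, first_start)   # short first turn
        clip_end = min(second_start + 4.0, second_end)   # short second turn
        return clip_start, clip_end

    # A 1.2 s first turn followed by a long second turn:
    print(turn_change_window(10.0, 11.2, 11.3, 20.0))    # (10.0, 15.3)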
We manually check each segment and remove those in which at least one speaker is inaudible (probably due to the stenographer hearing something not picked up by the microphone), turns that are listed as separate in the transcripts but are the same person with a pause, and turns that are scripted, such as "Mr. Chief Justice, and may it please the court." We trim recordings by no more than one second if another adjacent turn change occurs that makes it unclear what an annotator might be labeling. We extend recordings by no more than one second if the change becomes more clear with extension; this is usually due to timestamp rounding cutting off a very short turn.

We update speaker names if the ordering is wrong or names were incorrect. For example, if a number of turns occur in quick succession or there are two or more speakers talking at the same time, we change the label so that the first and second speakers heard are the first and second speakers listed. Exchanges with fully overlapped speech are checked to ensure the order of speakers aligns with human perception. The corpus of turn changes and annotations is available for public use at https://github.com/hlepp/pardontheinterruption.

The corpus includes 711 turn changes from four oral arguments: Kahler v. Kansas [13], Mitchell v. Wisconsin [10], Virginia House of Delegates v. Bethune-Hill [14], and Washington State Dept. of Licensing v. Cougar Den Inc. [15]. A typical case is heard by nine justices (three female, six male) and two or three attorneys (the corpus includes seven female and five male). Each of the selected trials occurred in 2018 or 2019, covers a unique topic, and includes at least one female arguing before the court. (The latter qualification narrows the selection considerably: in 2018, only 15% of the people who argued before the court were women [16, 17]. Gender information was gathered from public profiles of speakers; an expansion of the corpus should ensure such characteristics align with speakers' self-identities.)

The number of turns per attorney in the corpus ranges from 27 to 128. For justices, each of whom appears in every oral argument, the number of turns per individual per trial ranges from 10 to 42, with one exception: Justice Clarence Thomas does not speak in any of the four arguments. Among justices, Justice Sonia Sotomayor and Justice Stephen Breyer are most represented, with over 130 turns each.

Table 1: Information about Annotated Segments

Corpus Component                  Number
Male Participants                 11 (with Justice Thomas)
Female Participants               9
Justice to Attorney exchanges     338
Attorney to Justice exchanges     351
Justice to Justice exchanges      22
Attorney to Attorney exchanges    0
Female to Female exchanges        127
Male to Male exchanges            269
Female to Male exchanges          165
Male to Female exchanges          150
3. Corpus annotation
The rules of conversational speech within a courtroom setting are not the same as those in informal conversational speech; power relationships, formal rules and procedures, and field-specific argument strategies are among the many factors that influence the ways that speakers interact within an oral argument. In the annotation process, all annotators are required to be U.S.-based and to identify as an attorney, judge, legal scholar, or law student in their second year or above.
We design a brief, easy-to-complete, anonymous online survey for legal professionals. The survey is built on the JavaScript library jsPsych [18]. We instruct participants to categorize the short clips on a spectrum of cooperative to competitive. Before beginning annotation, the participants are given descriptions of each category:

    By cooperative, we mean that to your ears, the first speaker expects a turn change and gives the floor to the second speaker. The second speaker might leave space for the first speaker to finish their turn. Or, the second speaker might talk at the same time as the first speaker, providing short spurts of feedback, for example saying "mhmm" or "yes".

    By competitive, we mean that to your ears, the second speaker competes with the first speaker for the chance to speak. You, the first speaker, or listeners might perceive this as a disruption to the previous speaker's speech. The second speaker may cause the first speaker to stop speaking, or talk over the first speaker to compete to be heard.

as well as trial exercises and example audio clips that could be classified into each category. The descriptions demonstrate a division between the two categories of turn changes, but make clear that the participant should use their training and experience to evaluate turn changes with nuance.

After this short training, the annotators proceed to the tasks. Each task includes the prompt "How competitive or cooperative do you perceive this exchange to be?", which emphasizes to the participant that the annotation should be based solely on their perception. Below this prompt is an audio element which the user can control and a slider showing a spectrum from Competitive to Cooperative with Likert-style category labels. The participants are instructed to leave the slider in the middle if the category of the clip is unclear. Each participant is given up to 26 segments to listen to (a number selected to keep the entire annotation exercise under five minutes). If a participant leaves the survey before completion, all results are still saved.
Each segment is annotated twice, by 2 of 77 unique annotators. Each annotator labeled between 1 and 26 segments, with a mean of 18 per person and a standard deviation of 6.2.

As the way listeners perceive speech differs depending on many factors, we ask participants to optionally share demographic data to demonstrate that the age, gender, ethnic, political, and linguistic diversity of listeners is relatively representative of the diversity of the United States. This information is included in the corpus.
Annotators score each audio segment on a visual spectrum, as seen in Figure 1. The location at which an annotator places the slider is codified as a score between 0 and 100, in which 0 represents the most competitive turn change and 100 the most cooperative turn change.

Figure 1: Slider given to annotators.
The distribution of labels in Figure 2 reflects the layout of the web interface. The highest peaks are at either end of the spectrum and directly in the middle. This phenomenon indicates that annotators move the slider all the way to one end when an audio clip clearly sounds competitive or cooperative. The annotators leave the slider in place if the audio does not clearly fall into a category. There are also middling peaks around where the survey interface has labels of "slightly competitive" and "slightly cooperative". These peaks show that annotators make use of the Likert-style guidelines, despite having the ability to drop the button anywhere on the slider.

Figure 2: The distribution of labels in the annotated corpus.
We evaluate the annotations under the assumption that if the audio files receive similar annotations from different people, then the annotation process can be considered reproducible [19]. Inter-annotator agreement on the raw labels, as well as on the labels categorized into five equally spaced bins, is shown in Table 2 and indicates moderate agreement on this task.

Table 2: Annotator agreement.

Labels       Metric
Raw          Spearman's ρ
Five bins    κ
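For illustration, agreement statistics of this kind can be computed as follows; this sketch assumes Cohen's κ for the binned labels (the paper does not name the specific κ statistic) and uses invented scores for two annotators:

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical raw 0-100 slider scores from two annotators per segment.
    annotator_a = np.array([0, 12, 50, 88, 100, 47, 75, 22])
    annotator_b = np.array([5, 20, 55, 90, 95, 50, 60, 30])

    # Agreement on the raw labels.
    rho, _ = spearmanr(annotator_a, annotator_b)

    # Agreement after categorizing into five equally spaced bins (0-4).
    edges = [20, 40, 60, 80]
    kappa = cohen_kappa_score(np.digitize(annotator_a, edges),
                              np.digitize(annotator_b, edges))

    print(f"Spearman's rho = {rho:.2f}; kappa on five bins = {kappa:.2f}")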
4. Data analysis
To investigate the relationship between gender and competitive turn-taking, we explore the distribution of turn change scores with respect to the gender of the first speaker in the turn. For comparison, we also consider the distribution of these turn change scores with respect to the role of the speaker (i.e., justice or attorney).

The mean score of an exchange when a woman is the first speaker is more competitive (45.0, with a standard deviation of 28.1) than when a man is the first speaker (49.7, with a standard deviation of 26.7). Alternatively, the mean label for a turn in which a woman is the second speaker is slightly more cooperative (51.5) than when a man is (45.0). More extreme is the difference between roles: the mean label for an attorney first speaker is 36.3, while a justice first speaker has a much more cooperative mean of 59.0. This is predictable considering the power differential and a culture of deference; attorneys would avoid speaking competitively to a justice, while justices would be much more likely to speak to an attorney competitively. There are no instances of attorney-to-attorney speech.

Figure 3 shows the distribution of labels for each speaker in the corpus who is the first speaker in a turn. The distribution illuminates the severity with which role aligns with turn change type. The eight speakers who have the highest means, i.e., who are spoken to most cooperatively, are all justices; those with the lowest means are all attorneys.

Figure 3: Distribution of labels for the first speaker in a turn.
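These per-group summaries are straightforward to reproduce with pandas; a sketch with hypothetical column names and invented scores:

    import pandas as pd

    # Hypothetical per-exchange records: mean annotator score
    # (0 = most competitive, 100 = most cooperative) plus metadata.
    df = pd.DataFrame({
        "score":        [12.0, 88.5, 45.0, 60.0, 30.5, 75.0],
        "first_gender": ["F", "M", "F", "M", "F", "M"],
        "first_role":   ["attorney", "justice", "attorney",
                         "justice", "attorney", "justice"],
    })

    # Mean and spread of scores by the first speaker's gender, then by role.
    print(df.groupby("first_gender")["score"].agg(["mean", "std"]))
    print(df.groupby("first_role")["score"].agg(["mean", "std"]))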
Based on the non-parametric Kruskal-Wallis test, we confirm a significant effect of both speaker gender and speaker role on competitiveness score. A Wilcoxon rank-sum test also shows this difference, with comparably low p-values.
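A sketch of both tests with SciPy, using invented score arrays grouped by the gender of the first speaker:

    import numpy as np
    from scipy.stats import kruskal, ranksums

    # Hypothetical competitiveness scores grouped by first-speaker gender.
    female_first = np.array([31, 45, 20, 55, 40, 38, 62, 27])
    male_first   = np.array([52, 60, 44, 71, 49, 58, 65, 47])

    h_stat, p_kw = kruskal(female_first, male_first)    # Kruskal-Wallis
    w_stat, p_rs = ranksums(female_first, male_first)   # Wilcoxon rank-sum

    print(f"Kruskal-Wallis p = {p_kw:.4f}; rank-sum p = {p_rs:.4f}")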
5. Turn Classification Experiments

We also investigate the automatic classification of turn changes based on acoustic and speaker cues. Effective classification could enable analysis of the relationship between gender and turn change at scale.

Using openSMILE, we extract time-aggregated features for each speaker in each audio segment in the corpus. We use two feature sets: the eGeMaPS collection of 88 psychologically-informed features, and the Speech Prosody collection of 36 pitch- and loudness-related features [20, 21]. We select eGeMaPS because of its demonstrated success in emotion recognition studies, and we select the Speech Prosody set because patterns in pitch and amplitude have been shown linguistically to differentiate between cooperative and competitive turn-taking [22, 23, 4, 7]. In each feature set, we also include the gender and role of the first and second speakers. We do not normalize pitch features for speaker gender. (We find that normalization of features by speaker gender harms predictive results, and leave it to future research to explore this phenomenon further.)

We divide the labeled corpus into two subsets: 80% of the corpus into a training set and 20% into an evaluation set. Each set has a comparable gender and role distribution across turn type; for example, 21% of the turns in the full corpus, training set, and evaluation set are male-to-female, and 49% of all turns in each set are attorney-to-justice.

To group the labels into classes, we take the mean of a segment's two raw labels given by annotators on a 0 to 100 scale, then categorize each segment into one of three quantile-based classes based on that mean: the most competitive, the most cooperative, and the middling exchanges.

We measure the effectiveness of Random Forest (RF) and Support Vector Machine classifiers (SVC with RBF kernel) in predicting whether an audio segment falls into the competitive and cooperative classes. We use the SciKit Learn Laboratory toolkit version 2.0 (SKLL; https://skll.readthedocs.io/en/latest/index.html), with features scaled by standard deviation and centered around a mean, and a micro-averaged F1 score as a grid search objective [24]. The training data is divided randomly into subsets for grid search for hyperparameter optimization.

We report two baselines. The first predicts a competitive turn in every instance in which the transcription of the first speaker's speech ends in a dash ("-"), indicating syntactic incompleteness, and a cooperative turn when there is no dash. Second, we report the micro-F1 score obtained by predicting the target class for all instances.

The classifiers are most successful at predicting the competitive turns, with the highest micro-F1 resulting from the Speech Prosody feature set. This may be due to the fact that high pitch and loudness are defining elements of a competitive turn change, while there may be more variation in what could define a cooperative turn change. The results without adding gender and role as features, as well as with prosodic features normalized by gender, are generally lower, but within 0.1 of the listed respective scores. The added performance due to these features could be due to the fact that gender and role do correlate with the class of segments.
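As an illustration of the extraction and binning steps, the sketch below assumes the opensmile Python package (the paper does not specify which openSMILE interface was used, and the Speech Prosody set would require its own openSMILE configuration, so only eGeMaPS is shown), together with pandas' quantile binning for the three classes:

    import opensmile
    import pandas as pd

    # Time-aggregated (functional) eGeMaPS features: one row per audio clip.
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.eGeMAPSv02,
        feature_level=opensmile.FeatureLevel.Functionals,
    )
    features = smile.process_file("clips/turn_change_0001.wav")  # hypothetical path

    # Three quantile-based classes from the mean of each segment's two labels.
    mean_labels = pd.Series([12.5, 48.0, 90.5, 33.0, 71.0, 55.5])  # illustrative
    classes = pd.qcut(mean_labels, q=3,
                      labels=["competitive", "middle", "cooperative"])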
Table 3: Micro-F1 score for baseline predictions of classes

Class          Dash    Target Class
Competitive
Cooperative
Table 4: Micro-F1 score for SVC and RF predictions of classes

Class          Model    eGeMaPS    Prosody
Competitive    SVC      0.636
               RF       0.617      0.640
Cooperative    SVC      0.551      0.561
               RF       0.593
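The paper runs its classifiers through SKLL; the sketch below reproduces a comparable protocol directly in scikit-learn, with a synthetic stand-in feature matrix and illustrative hyperparameter grids:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Synthetic stand-in for the 711 x (acoustic + gender/role) feature
    # matrix and the three quantile-based classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(711, 90))
    y = np.repeat(["competitive", "middle", "cooperative"], 237)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    # Features centered and scaled; micro-F1 as the grid-search objective.
    svc = GridSearchCV(
        make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
        scoring="f1_micro", cv=5,
    ).fit(X_train, y_train)

    rf = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [100, 500], "max_depth": [None, 10]},
        scoring="f1_micro", cv=5,
    ).fit(X_train, y_train)

    print("SVC micro-F1:", svc.score(X_test, y_test))
    print("RF micro-F1:", rf.score(X_test, y_test))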
6. Conclusion
This study introduces a corpus of segments of speech from U.S. Supreme Court oral arguments that include a turn change between speakers. The segments, annotated by legal practitioners for competitiveness and cooperativeness, provide insight into the ways that justices and attorneys speak with one another in this unique speech setting. We find that as the first person in an exchange, female speakers and attorneys are spoken to more competitively than are male speakers and justices. We also find that female speakers and attorneys speak more cooperatively as the second person in an exchange than do male speakers and justices. We demonstrate that classifiers trained only on phonetic and acoustic features extracted from the audio segments can achieve a level of predictive accuracy above multiple baselines.

In-depth studies of gender bias and inequality are critical to the oversight of an institution as influential as the Supreme Court. While the models presented in this study analyze linguistic trends in relation to gender, the labeled corpus could be integrated with other demographic or content-related information to provide a fine-grained analysis of intersectional fields. There is demand in the social sciences for even broader analysis; within the first few months of 2020, several cross-cutting studies have criticized increasing bias in the Supreme Court and federal appeals courts, especially in regard to poverty and race [25, 26]. With improved predictive models, a larger set of turn changes across all Supreme Court oral argument recordings, and possibly other court recordings, could provide fodder for future statistical social science studies of speech trends in the U.S. judicial system.
7. Acknowledgments
We are grateful to the scholars who supported this cross-disciplinary study: Richard Wright, Keelan Evanini, Vikram Ramanarayanan, Victoria Zayats, and the board, reviewers, and participants in Widening NLP at ACL 2019. We also thank the legal professionals who annotated data and helped test the annotation survey. Finally, we express our appreciation for the public servants and activists who spend their daily lives fighting for equality and fairness in the U.S. Judicial System.

8. References

[1] S. Chira, "The Universal Phenomenon of Men Interrupting Women," The New York Times, Jun. 2017.
[2] T. Jacobi and D. Schweers, "Justice, Interrupted: The Effect of Gender, Ideology and Seniority at Supreme Court Oral Arguments," Virginia Law Review, vol. 103, no. 7, pp. 1379–1496, Nov. 2017.
[3] D. Tannen, Gender and Discourse. New York: Oxford University Press, 1994.
[4] L. Yang, "Visualizing Spoken Discourse," in Current and New Directions in Discourse and Dialogue, ser. Text, Speech and Language Technology, vol. 22. Springer, 2003.
[5] K. Laskowski, "Modeling norms of turn-taking in multi-party conversation," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2010, pp. 999–1008.
[6] J. A. Goldberg, "Interrupting the discourse on interruptions: An analysis in terms of relationally neutral, power- and rapport-oriented acts," Journal of Pragmatics, vol. 14, pp. 883–903, Dec. 1990.
[7] A. Wichmann and J. Caspers, "Melodic cues to turn-taking in English: Evidence from perception," in Proceedings of the Second SIGdial Workshop on Discourse and Dialogue, 2001.
[8] Supreme Court of the United States, "Argument Transcripts." [Online]. Available: https://www.supremecourt.gov/oral_arguments/
[9] The Oyez Project. [Online]. Available: https://www.oyez.org
[10] Mitchell v. Wisconsin, No. 18-6210. United States Supreme Court, Jan. 21, 2019.
[11] N. Ambady, M. A. Krabbenhoft, and D. Hogan, "The 30-Sec Sale: Using Thin-Slice Judgments to Evaluate Sales Effectiveness," Journal of Consumer Psychology, vol. 16, pp. 4–13, 2006.
[12] N. Ambady and R. Rosenthal, "Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness," Journal of Personality and Social Psychology, vol. 64, pp. 431–441, 1993.
[13] Kahler v. Kansas, No. 18-6135. United States Supreme Court, Oct. 7, 2019.
[14] Virginia House of Delegates v. Bethune-Hill, No. 18-281. United States Supreme Court, Mar. 18, 2019.
[15] Washington State Dept. of Licensing v. Cougar Den Inc., No. 16-1498. United States Supreme Court, Oct. 30, 2018.
[16] K. S. Robinson and J. S. Rubin, "Women Argue Only a Fraction of Supreme Court Cases," Jan. 30, 2019.
[17] M. Walsh, "Number of Women Arguing Before the Supreme Court Has Fallen Off Steeply," American Bar Association Journal, Aug. 1, 2018.
[18] J. R. de Leeuw, "jsPsych: A JavaScript library for creating behavioral experiments in a web browser," Behavior Research Methods, vol. 47, pp. 1–12, 2015.
[19] R. Artstein, "Inter-annotator Agreement," in Handbook of Linguistic Annotation, N. Ide and J. Pustejovsky, Eds. Springer, Dordrecht, 2017, ch. 11, pp. 297–313.
[20] F. Eyben and B. Schuller, "openSMILE:) The Munich Open-Source Large-Scale Multimedia Feature Extractor," SIGMultimedia Records, vol. 6, no. 4, pp. 4–13, Jan. 2015. [Online]. Available: https://doi.org/10.1145/2729095.2729097
[21] F. Eyben, K. R. Scherer, B. W. Schuller, J. Sundberg, E. André, C. Busso, L. Y. Devillers, J. Epps, P. Laukka, S. S. Narayanan, and K. P. Truong, "The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for voice research and affective computing," IEEE Transactions on Affective Computing, vol. 7, no. 2, pp. 190–202, 2016.
[22] J. Gorisch, B. Wells, and G. Brown, "Pitch Contour Matching and Interactional Alignment Across Turns: An Acoustic Investigation," Language and Speech, vol. 55, pp. 57–76, Mar. 2012.
[23] K. Truong, "Classification of Cooperative and Competitive Overlaps in Speech Using Cues from the Context, Overlapper, and Overlappee," in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2013, pp. 1404–1408.
[24] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[25] R. R. Ruiz, R. Gebeloff, S. Eder, and B. Protess, "A Conservative Agenda Unleashed on the Federal Courts," The New York Times, Mar. 2020.
[26] A. Cohen, Supreme Inequality: The Supreme Court's Fifty-Year Battle for a More Unjust America. New York: Penguin Press, 2020.