Analyzing Team Performance with Embeddings from Multiparty Dialogues

Ayesha Enayet
Department of Computer Science, University of Central Florida, Orlando, [email protected]

Gita Sukthankar
Department of Computer Science, University of Central Florida, Orlando, [email protected]
Abstract—Good communication is indubitably the foundation of effective teamwork. Over time teams develop their own communication styles and often exhibit entrainment, a conversational phenomenon in which humans synchronize their linguistic choices. This paper examines the problem of predicting team performance from embeddings learned from multiparty dialogues, such that teams with similar conflict scores lie close to one another in vector space. Embeddings were extracted from three types of features: 1) dialogue acts, 2) sentiment polarity, and 3) syntactic entrainment. Although all of these features can be used to effectively predict team performance, their utility varies by teamwork phase. We separate the dialogues of players playing a cooperative game into three stages: 1) early (knowledge building), 2) middle (problem-solving), and 3) late (culmination). Unlike syntactic entrainment, both dialogue act and sentiment embeddings are effective for classifying team performance, even during the initial phase. This finding has potential ramifications for the development of conversational agents that facilitate teaming.
Keywords—teamwork, multiparty dialogues, entrainment, sentiment analysis, dialogue acts, embeddings
I. INTRODUCTION
The aim of our research is to create agents that can assist human teams by intervening when teamwork goes awry. To do this, it is important to be able to rapidly assess the status of team performance through "thin-slicing": making accurate classifications from short behavior samples. Jung suggests that developing this capability would remove the need for developing continuous team monitoring systems [1]. Ambady and Rosenthal demonstrate that many types of social interactions remain sufficiently stable that even a small sample is meaningful for predicting long-term outcomes, the most famous application of this theory being the thin-slicing of marital interactions to predict divorce outcomes [2], [3]. Rather than developing specific measures for predicting future team conflict, we demonstrate that an embedding grouping teams with similar conflict levels can be learned directly from multiparty dialogue. An advantage of this approach is that it avoids the necessity of collecting advance data on team members, such as personality traits or training records.
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. W911NF-20-1-0008.
This paper compares the performance of three types of embeddings extracted from: 1) dialogue acts, 2) sentiment polarity, and 3) syntactic entrainment; these features were selected based on previous work on team communications and group problem-solving. Dialogue acts capture the interactive pattern between speakers in multiparty communication [4]. During dialogue act classification, utterances are grouped according to their communicative purpose. Sentiment polarity measures the attitude or emotion of the speaker during conversation; it can be used to detect disagreement. Entrainment is the natural tendency of speakers to adopt a similar style during a conversation, causing them to achieve linguistic alignment. There are several types of entrainment, including lexical choice [5], style [6], pronunciation [7], and many others [8]. Reitter and Moore demonstrated that syntactic entrainment, based on alignment of lexical categories, can be used to predict success in task-oriented dialogues [5].

Good team communication exhibits all of these characteristics: greater emphasis on problem solving than arguing, positive sentiment, and communication synchronization [9]. Our research was conducted on the Teams corpus [10], which consists of player dialogue during a cooperative game. One advantage of studying a clearly defined, time-bounded team task is that the dialogues can be divided into teamwork phases: 1) early (knowledge building), 2) middle (problem solving), and 3) late (culmination). For thin-slicing, we seek to predict team performance from the initial teamwork stages. The Teams corpus includes team conflict scores, which measure the amount of disagreement that occurred during gameplay. Our hypotheses are:

• H1: an embedding leveraging dialogue acts will be useful for classifying team performance at all phases since it directly detects utterances related to conflict (eristic dialogues).
• H2: sentiment analysis will consistently reveal team conflict and thus be a good predictor of performance.
• H3: the entrainment embedding will be predictive when the entire dialogue is considered, but will be less useful at analyzing early phases before entrainment has been established.

Embeddings are mechanisms for mapping high-dimensional spaces to low dimensions while retaining only the most effective structural representations, making it possible to apply machine learning to large inputs by representing them in compact vector form. This paper presents our approach for extracting embeddings from multiparty dialogues that encode team conflict. The next section describes the rich literature on analyzing team communication and multiparty dialogues.

II. RELATED WORK
Team communication, whether spoken or written, is a critical element of collaborative tasks and can be studied in a variety of ways. Semantic analysis centers on the meaning of utterances, while pragmatics involves identifying speech acts [11]; both analytic approaches are important and often occur in parallel. In many studies of team communication, this analysis is arduously done through hand coding the utterances.

Parsons et al. [12] contrast two different schemes to code utterances in team dialogues as part of their long-term research goal of developing a virtual assistant for human teams. Their comparison illustrates the benefits and problems of the Walton and Krabbe typology [13], which includes categories for information-seeking, inquiry, negotiation, persuasion, deliberation, and eristic dialogue, but does not consider the context in which the utterance occurs. The McGrath theory of group behavior [14] focuses on modes of operation: inception, problem-solving, conflict resolution, and execution. When applying the McGrath theory, utterance classification is modified by conversational context.

Sukthankar et al. also used an explicit team utterance coding scheme for the problem of agent aiding of ad hoc, decentralized human teams to improve team performance on time-stressed group tasks [15]. Unlike these teamwork studies, we do not specifically map individual utterances to team communication categories, but leverage dialogue act classification models to identify features that are indicative of team conflict. Shibani et al. [16] discussed some of the practical challenges in designing an automated assessment system to provide students feedback on their teamwork competency: 1) dialogue pre-processing, 2) assessing teamwork chat text, and 3) classifying teamwork dimensions. They evaluated the performance of rule-based systems vs. supervised machine learning (SVM) at classifying coordination, mutual performance monitoring, team decision making, constructive conflict, team emotional support, and team commitment. Even with dataset imbalance, the SVM model generally outperformed the hand-coded rules. Our proposed method can also be used to assist human teams by proactively warning them of deficiencies during the early phases of team tasks, without the onerous data labeling requirements.

Other analytic techniques focus on linguistic coordination between speakers in groups. For instance, Danescu-Niculescu-Mizil et al. studied the effect of power differences on lexical category choices during goal-oriented discussion [17]. This is one form of entrainment in which speakers preferentially select function-word classes used by other group members. Our paper uses a dataset (the Teams corpus) that was created to study entrainment in teams [10]. Rahimi and Litman demonstrated a method for learning an entrainment embedding to predict team performance [18]; we use a modified version of their technique to express syntactic entrainment. However, since entrainment develops over time, we compare the performance of entrainment at early vs. late task phases. Furthermore, they only focused on syntactic/lexical features of utterances, not semantic ones.

Sentiment analysis has been applied to the study of group dynamics; for instance, researchers have leveraged sentiment features to detect communities in social networks [19], [20]. Our work demonstrates the utility of sentiment features for predicting team conflict and shows that the sentiment-based embedding is useful during all teamwork phases. We rely exclusively on the multiparty team dialogues; however, there have been many attempts to predict team performance using other types of multimodal features. TCdata, a team cooperation dataset, includes both audio and video recordings of teams performing cooperative tasks [21]. Liu et al. explicitly extracted 159 features from team speaking cues, individual speaking time statistics, and face-to-face interaction cues to predict team performance on this dataset.

Several studies [22], [23] have shown team member personality traits to be useful predictors of conflict and team performance. Yang et al. used individual personality traits to predict the performance of final-year student project teams using neural networks [22]. Omar et al. developed a student performance prediction model that included both personality types and team personality diversity [23]. Even though these additional data sources can be highly predictive, they are rarely available in real-world team scenarios, unlike multiparty dialogue, which is often self-archived to preserve organizational memory.

III. METHOD
This section describes our procedure for computing embeddings using doc2vec [24], an unsupervised method used to create a vector representation of the team dialogue. We compare the performance of three possible inputs to doc2vec: 1) dialogue acts, 2) sentiment analysis, and 3) syntactic entrainment.
A. Dialogue Acts
Dialogue acts can be derived from the semantic classification of dialogue at the utterance level to identify the intent of the speaker. A transfer learning approach was used to tag utterances of the Teams corpus with the DAMSL (Discourse Annotation and Markup System of Labeling) tagset.

Figure 1. Dialogue Act Classifier Architecture.

Table I
DATASET STATISTICS

Figure 1 shows the architecture of our dialogue act classifier, which was constructed using the Universal Sentence Encoder (USE); we selected USE for its ability to achieve consistently good performance across multiple NLP tasks [25]. There are two variants of the model: 1) a transformer architecture, which exhibits high accuracy at the cost of increased resource consumption, and 2) a deep averaging network, which requires few resources and makes small compromises in accuracy for efficiency. The former uses the attention-based, context-aware encoding subgraphs of the transformer architecture and outputs a 512-dimensional vector. The deep averaging network works by averaging word and bigram embeddings to use as input to a deep neural network. The models are trained on web news, Wikipedia, web question-answer pages, discussion forums, and the Stanford Natural Language Inference (SNLI) corpus, and are freely available on TF Hub. We selected the USE transformer-based architecture model with three dense layers and a softmax activation function; this DA classification model achieves a validation accuracy of 70%.

The model was fine-tuned using the Switchboard Dialogue Act Corpus (SwDA), one of the most popular public datasets for DA classification. It consists of 1155 human-to-human telephone speech conversations, tagged using 42 tags from the DAMSL tagset. Table I shows the statistics of both SwDA and the Teams corpus. Table II shows examples from the SwDA training dataset, and Table III shows examples from the Teams corpus. Each team dialogue generates a unique sequence in which each element represents the dialogue act of the corresponding utterance. This sequence of dialogue acts is then used as input to the doc2vec algorithm to create the embedding.
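The reduction of a dialogue to its dialogue-act sequence can be sketched in a few lines. The utterances and tags below are illustrative stand-ins for the classifier's output, not data from the corpus:

```python
# Sketch: turning a tagged dialogue into a doc2vec-ready token sequence.
# In the pipeline the tags come from the USE-based DA classifier;
# here they are pre-assigned for illustration.

def da_sequence(tagged_utterances):
    """Map a dialogue (list of (speaker, utterance, da_tag) triples)
    to the sequence of dialogue-act tags, one token per utterance."""
    return [tag for _speaker, _text, tag in tagged_utterances]

dialogue = [
    ("A", "Ok I'm going to shore up these two.", "sd"),
    ("B", "Good move.", "ba"),
    ("C", "Mm", "b"),
    ("B", "yes", "ny"),
]
print(da_sequence(dialogue))  # ['sd', 'ba', 'b', 'ny']
```

The resulting tag sequence plays the role of a "document" whose "words" are DA tags when it is handed to doc2vec.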
B. Sentiment Analysis
Another option is to represent the team dialogue as a series of changes in the emotional state of the team, by applying sentiment analysis to the individual utterances. Sentiment analysis is the task of predicting the emotion or attitude of the speaker; we use the TextBlob Python library [26] to determine the sentiment polarity of each utterance in the dialogue. The polarities are float values between -1 (most negative) and 1 (most positive), with values near 0 indicating neutral sentiment. For each team, the unique sequence of these polarities is used as input to doc2vec, where each element of the sequence represents the polarity of the corresponding utterance. This representation encodes transitions in the emotional state of the team across the duration of the task.
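Constructing the polarity sequence can be sketched as below. In the pipeline TextBlob supplies the polarity score; to keep this example self-contained, a tiny word-list scorer stands in for `TextBlob(utterance).sentiment.polarity` (the word lists and rounding are illustrative assumptions):

```python
# Sketch of building the per-dialogue polarity sequence.

POSITIVE = {"good", "great", "yes"}
NEGATIVE = {"bad", "no", "wrong"}

def polarity(utterance):
    """Stand-in for TextBlob(utterance).sentiment.polarity, in [-1, 1]."""
    words = utterance.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(-1.0, min(1.0, score / max(len(words), 1)))

def polarity_sequence(utterances):
    # One token per utterance; each rounded polarity acts as a "word" for doc2vec.
    return [str(round(polarity(u), 1)) for u in utterances]

print(polarity_sequence(["Good move.", "That was wrong", "Mm"]))
# ['0.5', '-0.3', '0.0']
```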
C. Entrainment
Entrainment is one form of linguistic coordination in which team members adopt similar speaking styles during conversation. Here we evaluate the performance of a syntactic entrainment embedding, based on Rahimi and Litman's work [18], that encodes the propensity of subsequent speakers to make similar lexical choices. Eight lexical categories were used: noun (NN), adjective (JJ), verb (VB), adverb (RB), coordinating conjunction (CC), cardinal digit (CD), preposition/subordinating conjunction (IN), and personal pronoun (PRP). To calculate the entrainment between two speakers we follow the method proposed by Danescu-Niculescu-Mizil et al. [17], shown in Equation 1.
Ent_c(x, y) denotes the entrainment of speaker y to speaker x, where c is the lexical category, e_c^{yx} is the event that an utterance of speaker y immediately follows an utterance of speaker x and contains c, and e_c^x is the event that an utterance of speaker x (spoken to y) contains c:

    Ent_c(x, y) = p(e_c^{yx} | e_c^x) − p(e_c^{yx})    (1)

The NLTK part-of-speech (POS) tagger was used to tag all utterances with their respective lexical categories. A directed weighted graph was generated for each dialogue, linking speakers with positive entrainment; the structure of this graph encodes the entrainment relationships between team members. To translate the graph into a feature representation, six graph centrality kernel functions were applied to each node of the team graph: (1) PageRank, (2) betweenness centrality, (3) closeness centrality, (4) degree centrality, (5) in-degree centrality, and (6) Katz centrality. To create the final team representation, the vectors of the individual nodes were averaged, and doc2vec was applied to create the embedding. This method corresponds to the kernel version of Entrainment2Vec [18] and achieves comparable performance when applied to the whole dialogue.

Our implementation differs from that of [18] and [17] in two aspects. First, we use the NLTK POS tagger to assign lexical categories to the utterances instead of using LIWC-derived categories. Second, we use six graph kernel algorithms instead of ten; we observed that using more graph kernel functions on graphs of three to four team members does not improve performance. POS tagging reflects the sentence's syntactic structure, and we have carefully selected POS categories consistent with the conventional English part-of-speech categories used by [18] and [17]. While calculating entrainment, we do not consider the actual word or its context; therefore, this embedding only captures syntactic features, not semantics.

Table II
SWDA DATASET SAMPLE

Speaker | Utterance | DA | Description
A | I don't, I don't have any kids. | sd | Statement-non-Opinion
A | I, uh, my sister has a, she just had a baby, | sd | Statement-non-Opinion
A | he's about five months old | sd | Statement-non-Opinion
A | and she was worrying about going back to work and what she was going to do with him and -- | sd | Statement-non-Opinion
A | Uh-huh. | b | Acknowledge
A | do you have kids? | qy | Yes-No-Question
B | I have three. | na | Affirmative non-yes Answer
A | Oh, really? | bh | Backchannel in question form

Table III
TEAMS CORPUS EXAMPLE

Speaker | Utterance | DA | Description
A | Ok I'm going to | sd | Statement-non-Opinion
A | shore up these two. | sd | Statement-non-Opinion
B | Good move. | ba | Appreciation
A | Then we got one and then I guess I can also | sd | Statement-non-Opinion
A | Can I use my powers twice in one play | sd | Statement-non-Opinion
C | Mm | b | Acknowledge (Backchannel)
B | yes | ny | Yes answer
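Equation 1 can be estimated from counts over adjacent utterance pairs. The sketch below works on a dialogue that has already been POS-tagged (in the pipeline the category sets come from the NLTK tagger); the four-utterance dialogue is an illustrative toy, not corpus data:

```python
# Pure-Python sketch of Equation 1 for one lexical category.
# Input: an ordered list of (speaker, set_of_POS_categories) pairs.

def entrainment(dialogue, x, y, c):
    """Ent_c(x, y): how much speaker y entrains to speaker x on category c."""
    replies = 0          # y-utterances immediately following an x-utterance
    replies_with_c = 0   # ... where the y-utterance contains c
    triggers_with_c = 0  # ... where the preceding x-utterance contains c
    both = 0             # ... where both utterances contain c
    for (spk_prev, cats_prev), (spk_next, cats_next) in zip(dialogue, dialogue[1:]):
        if spk_prev == x and spk_next == y:
            replies += 1
            if c in cats_next:
                replies_with_c += 1
            if c in cats_prev:
                triggers_with_c += 1
                if c in cats_next:
                    both += 1
    if replies == 0 or triggers_with_c == 0:
        return 0.0
    # p(e_yx_c | e_x_c) - p(e_yx_c)
    return both / triggers_with_c - replies_with_c / replies

dialogue = [
    ("A", {"NN", "VB"}),
    ("B", {"NN"}),        # B echoes A's noun use
    ("A", {"VB"}),
    ("B", {"JJ"}),        # no noun after a verb-only utterance
]
print(entrainment(dialogue, "A", "B", "NN"))  # 0.5
```

A positive value means y uses category c more often right after x has used it than in general, which is exactly the edge weight fed into the per-dialogue entrainment graph.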
D. Doc2vec
Le and Mikolov [24] introduced doc2vec as an unsupervised learning algorithm that generates distributed vector representations of text of arbitrary length; it is inspired by the word2vec model [27]. They proposed two models for learning numerical representations of text: 1) the Distributed Memory Model of Paragraph Vectors (PV-DM) and 2) the paragraph vector with a distributed bag of words (PV-DBOW).
The Distributed Memory Model of Paragraph Vectors (PV-DM) uses both word vectors and a paragraph vector to predict the next word: it attempts to learn paragraph vectors that can predict a word given different contexts sampled from the text. The context size is a tunable parameter, and a sliding window of that size generates multiple context samples. Doc2vec averages the word vectors and paragraph vector to predict the next word, and employs stochastic gradient descent to learn both. The resultant paragraph vector serves as a feature vector for the corresponding paragraph and can be used as input to machine learning models such as SVM and logistic regression.
The paragraph vector with a distributed bag of words (PV-DBOW) ignores the context words and attempts to predict randomly selected words from the paragraph: at each iteration of stochastic gradient descent, it classifies a randomly selected word from the sampled text window using the paragraph vector.

Instead of applying doc2vec to the raw team dialogues, we applied it to the output of the dialogue act classifier, the sentiment analysis, and the syntactic entrainment computation. This procedure enables us to disentangle the contributions of the different elements of team communication to predicting conflict.

IV. DATASET
Our evaluation was conducted on the Teams corpus collected by Litman et al. [10]. It contains 124 team dialogues from 62 different teams playing two different collaborative board games; the length of the dialogues varies from 291 to 2124 utterances. In addition to collecting dialogue data, the researchers administered surveys of team-level social outcomes, which include task conflict, relation conflict, and process conflict scores. All these scores are highly correlated, and we use the process conflict z-scores to represent team performance; Jehn et al. have identified that low process conflict scores indicate good team performance and vice versa [28]. To study the problem of early prediction of team conflict, we divide each dialogue into three equal sections that correspond to the knowledge-building, problem-solving, and culmination teamwork phases. Our final classification dataset consists of 12 patterns per dialogue, generated by applying the three methods (semantic, sentiment, syntactic) to the whole time period as well as to the initial, middle, and final segments.

Teams were divided into high-performing and low-performing teams based on their process conflict z-scores, and classification accuracy was measured. Doc2vec was used to generate the vector representation of all the patterns. Doc2vec comes in two flavors: 1) the Distributed Memory Model of Paragraph Vectors (PV-DM) and 2) the Distributed Bag of Words version of Paragraph Vectors (PV-DBOW). Through extensive experiments, we identified that PV-DM with an epoch size of 5, negative sampling of 5, and a window size of 10 works best for our setting; by default, we only report results for PV-DM. Table IV compares PV-DM and PV-DBOW when applied to the complete dialogue. We evaluated the performance of both logistic regression and the support vector machine (SVM) classifier on the full dialogue (shown in Table V); for the other experiments, the better performer, SVM, was used.

Table IV
DOC2VEC COMPARISON

             | PV-DBOW | PV-DM
Dialogue Act | 57.89   | 68.42
Sentiment    | 55.26   | 78.94
Entrainment  | 55.26   | 60.52

Table V
COMPARISON OF SUPERVISED CLASSIFIERS

             | Logistic Regression | SVM
Dialogue Act | 63.15               | 68.42
Sentiment    | 71.05               | 78.94
Entrainment  | 63.15               | 60.52

V. RESULTS
Table VI presents the classification accuracy of the three embeddings on the whole dialogue. SVM exhibits the best classification accuracy of 78.94% on the sentiment-based vectors, followed by the dialogue act-based vectors. Figure 2 visually illustrates the effects of the different embeddings: plotting the vectors in 2D using t-Distributed Stochastic Neighbor Embedding (t-SNE), we can observe the formation of two clusters, representing teams with high social outcomes and low social outcomes, for the dialogue act and sentiment vectors, whereas the entrainment vectors are intermixed.

Table VI
ACCURACY BY TEAM PHASE

Phase   | DA    | Sentiment | Entrainment
Whole   | 68.42 | 78.94     | 60.52
Initial | 71.05 | 65.78     | 42.10
Middle  | 73.68 | 65.78     | 47.36
End     | 68.42 | 71.05     | 60.52

Table VI also shows the accuracy of the conflict classifier across the duration of the games. The sentiment classifier achieved the best accuracy when the whole dialogue was used and exhibited consistent performance across all team phases. The dialogue act embedding was the best at the initial phase, making it a good choice for the "thin-slice" problem of rapidly diagnosing teamwork health from a small sample of utterances. Syntactic entrainment lagged behind the sentiment and semantic analysis, but its performance improved during the final phase.

For statistical testing, we generated 30 results for each phase using each embedding. Since some of the result distributions (Figure 3) failed the D'Agostino-Pearson normality test, the Kolmogorov-Smirnov test was used for significance testing. The performance differences between each pair of embeddings were statistically significant (p < . ). However, the differences between the initial and end phase results for the sentiment and entrainment embeddings were not significant (Table VII). Semantic and sentiment-based vectors outperformed the syntactic entrainment vectors at the classification task across all phases.

VI. CONCLUSION
This study presents an evaluation of different embeddings for predicting team conflict from multiparty dialogue. Embeddings were extracted from three types of features: 1) dialogue acts, 2) sentiment polarity, and 3) syntactic entrainment. The results confirm the effectiveness of both sentiment (H2) and dialogue acts (H1). However, the experiments failed to confirm that classification based on syntactic entrainment significantly improves over time (H3); although there are many other ways to measure linguistic synchronization, it seems less promising for integration into an agent assistance system. The dialogue act embedding is strong during the initial phase, making it a good candidate for diagnosing the health of team formation activity, while a continuous team monitoring agent assistant system might do better with sentiment analysis.

In future work, we plan to explore embeddings based on macrocognitive teamwork states, such as those in the Macrocognition in Teams Model (MITM) [29]. Drawing from research on externalized cognition, team cognition, group communication and problem solving, and collaborative learning and adaptation, MITM provides a coherent, theoretically based conceptualization for understanding complex team processes and how these emerge and change over time. It captures the parallel and iterative processes engaged by teams as they synthesize these components in service of team cognitive processes such as problem solving, decision making, and planning.

VII. ACKNOWLEDGEMENT
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. W911NF-20-1-0008. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA or the University of Central Florida.

Figure 2. t-SNE representation of vectors in 2D, where 'S' represents the teams with low process conflict scores and 'U' represents the teams with high process conflict scores. Both sentiment (left) and dialogue act embedding (right) show a better class separation than entrainment (center). Note that the axes have no explicit meaning.

Figure 3. Distribution of embedding results for initial and final teamwork phases for dialogue acts (left), sentiment (middle), and entrainment (right).

Table VII
COMPARISON OF PERFORMANCE OF ALL THREE APPROACHES AT THE KNOWLEDGE DISCOVERY & CULMINATION PHASES

             | Knowledge Discovery  | Culmination          |
             | min      | max       | min      | max       | p-value
Dialogue Act | 0.552632 | 0.710526  | 0.473684 | 0.684211  |
Sentiment    | 0.526316 | 0.657895  | 0.500000 | 0.710526  | 0.455695
Entrainment  | 0.4210   | 0.4210    | 0.394737 | 0.605263  | 0.594071

REFERENCES

[1] M. F. Jung, "Coupling interactions and performance: Predicting team performance from thin slices of conflict," ACM Transactions on Computer-Human Interaction (TOCHI), vol. 23, no. 3, pp. 1–32, 2016.
[2] N. Ambady and R. Rosenthal, "Thin slices of expressive behavior as predictors of interpersonal consequences: a meta-analysis," Psychological Bulletin, vol. 111, no. 2, pp. 256–274, 1992.
[3] ——, "Half a minute: predicting teacher evaluations from thin slices of non-verbal behavior and physical attractiveness," J. Pers. Soc. Psychol., vol. 64, no. 3, pp. 431–441, 1993.
[4] C.-W. Goo and Y.-N. Chen, "Abstractive dialogue summarization with sentence-gated modeling optimized by dialogue acts," in IEEE Spoken Language Technology Workshop (SLT), 2018, pp. 735–742.
[5] D. Reitter and J. D. Moore, "Predicting success in dialogue," in Proceedings of the ACL, 2007.
[6] C. Danescu-Niculescu-Mizil, M. Gamon, and S. Dumais, "Mark my words!: linguistic style accommodation in social media," in Proceedings of the International Conference on World Wide Web, 2011, pp. 745–754.
[7] J. S. Pardo, "On phonetic convergence during conversational interaction," The Journal of the Acoustical Society of America, vol. 119, no. 4, pp. 2382–2393, 2006.
[8] M. Mizukami, K. Yoshino, G. Neubig, D. Traum, and S. Nakamura, "Analyzing the effect of entrainment on dialogue acts," in Proceedings of the Annual Meeting of the Special Interest Group on Discourse and Dialogue. Los Angeles: Association for Computational Linguistics, Sep. 2016, pp. 310–318.
[9] Y. Yang, G. N. Kuria, and D.-X. Gu, "Mediating role of trust between leader communication style and subordinate's work outcomes in project teams," Engineering Management Journal, vol. 32, no. 3, pp. 152–165, 2020.
[10] D. Litman, S. Paletz, Z. Rahimi, S. Allegretti, and C. Rice, "The Teams corpus and entrainment in multi-party spoken dialogues," in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2016, pp. 1421–1431.
[11] S. Bird, B. Boguraev, M. Kay, D. McDonald, D. Hindle, and Y. Wilks, Survey of the State of the Art in Human Language Technology. Cambridge University Press, 1997, vol. 12.
[12] S. Parsons, S. Poltrock, H. Bowyer, and Y. Tang, "Analysis of a recorded team coordination dialogue," in Proceedings of the Second Annual Conference of the ITA, 2008.
[13] D. N. Walton and E. C. W. Krabbe, Commitment in Dialogue: Basic Concepts of Interpersonal Reasoning. State University of New York Press, 1995.
[14] J. E. McGrath, "Time, interaction, and performance," Small Group Research, 1991.
[15] G. Sukthankar, K. Sycara, J. A. Giampapa, C. Burnett, and A. Preece, "An analysis of salient communications for agent support of human teams," in Multi-agent Systems: Semantics and Dynamics of Organizational Models, V. Dignum, Ed. IGI Global, 2009, pp. 284–312.
[16] A. Shibani, E. Koh, V. Lai, and K. J. Shim, "Assessing the language of chat for teamwork dialogue," Journal of Educational Technology & Society, vol. 20, no. 2, pp. 224–237, 2017.
[17] C. Danescu-Niculescu-Mizil, L. Lee, B. Pang, and J. Kleinberg, "Echoes of power: Language effects and power differences in social interaction," in Proceedings of the International Conference on World Wide Web, 2012, pp. 699–708.
[18] Z. Rahimi and D. Litman, "Entrainment2vec: Embedding entrainment for multi-party dialogues," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, 2020, pp. 8681–8688.
[19] K. Sawhney, M. C. Prasetio, and S. Paul, "Community detection using graph structure and semantic understanding of text," SNAP Stanford University, 2017.
[20] K. Xu, J. Li, and S. S. Liao, "Sentiment community detection in social networks," in Proceedings of the iConference, 2011, pp. 804–805.
[21] S. Liu, L. Wang, S. Lin, Z. Yang, and X. Wang, "Analysis and prediction of team performance based on interaction networks," in Chinese Control Conference (CCC). IEEE, 2017, pp. 11250–11255.
[22] F.-S. Yang and C.-H. Chou, "Prediction of team performance and members' interaction: A study using neural network," in International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Springer, 2014, pp. 290–300.
[23] M. Omar, S.-L. Syed-Abdullah, and N. M. Hussin, "Developing a team performance prediction model: A rough sets approach," in International Conference on Informatics Engineering and Information Science. Springer, 2011, pp. 691–705.
[24] Q. Le and T. Mikolov, "Distributed representations of sentences and documents," in International Conference on Machine Learning, 2014, pp. 1188–1196.
[25] D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar et al., "Universal sentence encoder," arXiv preprint arXiv:1803.11175, 2018.
[26] "TextBlob," https://textblob.readthedocs.io/en/dev/.
[27] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[28] K. A. Jehn and E. A. Mannix, "The dynamic nature of conflict: A longitudinal study of intragroup conflict and group performance," Academy of Management Journal, vol. 44, no. 2, pp. 238–251, 2001.
[29] S. M. Fiore, S.-J. K. A., E. Salas, N. Warner, and L. M., "Toward an understanding of macrocognition in teams: Developing and defining complex collaborative processes and products,"