Defining and Quantifying Conversation Quality in Spontaneous Interactions

Abstract

Social interactions are multifaceted, and a wide set of factors and events influence them. In this paper, we quantify social interactions with a holistic viewpoint on individual experiences, focusing particularly on non-task-directed spontaneous interactions. To achieve this, we design a novel perceived measure, the perceived Conversation Quality, which is intended to quantify spontaneous interactions by accounting for several socio-dimensional aspects of individual experiences. To study spontaneous interactions quantitatively, we devise a questionnaire that measures perceived Conversation Quality at both the individual and the group level. Using the questionnaire, we collected perceived annotations of conversation quality from naive annotators on a publicly available dataset. Our analysis of the distribution of the annotations and of inter-annotator agreement shows that naive annotators tend to agree less on low conversation quality samples, especially when annotating group-level conversation quality.


Navin Raj Prabhu

Delft University of Technology, Delft, The Netherlands

Chirag Raman

Delft University of Technology, Delft, The Netherlands

Hayley Hung

Delft University of Technology, Delft, The Netherlands

KEYWORDS

Conversation Quality, Spontaneous Interactions, Individual Experiences, Social Constructs, Questionnaires, Perceived Annotations and Inter-annotator Agreement.

ACM Reference Format:

Navin Raj Prabhu, Chirag Raman, and Hayley Hung. 2020. Defining and Quantifying Conversation Quality in Spontaneous Interactions. In Companion Publication of the 2020 International Conference on Multimodal Interaction (ICMI ’20 Companion), October 25–29, 2020, Virtual event, Netherlands. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3395035.3425966

Spontaneous interactions such as unplanned social conversations are typically non-task-directed, unconstrained, and occur in natural situations [28, 30, 33]. In such interactions, the quality of the experience is a social construct that exists in the perception of individual participants. Such a subjective construct is generally quantified by relying on self-reported measures by the participants. However, such measures can suffer from biases from multiple sources: recall [24], social desirability [25], or egoistic biases [11, 25]. Furthermore, obtaining self-reports might be precluded by privacy concerns. In this work we argue that the external perception of the quality of an interaction is an important construct towards the development of socially intelligent systems (e.g. social robots), as illustrated in Figure 1. In contrast to self-reported measures, a measure of perceived experience quantifies the individual or collective experiences of participants as perceived by a third-party observer [12, 21, 26]. Such measures are also resource-efficient, since they can be collected for existing datasets of spontaneous interactions whose participants are no longer available to provide self-reports.

Figure 1: Illustration of individual experiences existing in the perception of interacting partners, and of how a perceived measure of them is relevant for social robots.
As such, externally perceived measures are more relevant to the development of artificial agents aimed at supporting and modulating human-human or human-robot interaction.

Another challenge in the study of spontaneous interactions is that the experience of the social dynamics of an interaction is a multi-faceted construct. To quantify such an experience, it is important to consider different overlapping aspects of the interaction, such as interest-levels [12], involvement [26], cohesion [2], bonding [17], and rapport [23]. Existing literature in social psychology tends to consider such aspects in isolation. Consequently, the overall quality of individual experiences has not yet been studied, which remains a knowledge gap.

In this work we make a two-fold contribution. First, we introduce a novel measure of spontaneous interactions: perceived conversation quality. We formally define this construct and present its constituents by jointly considering overlapping aspects of the interaction that have so far been considered only in isolation in the social psychology literature. Second, we present an instrument, in the form of questionnaires, for collecting perceived annotations of Conversation Quality. We use the instrument to collect annotations of perceived conversation quality on a publicly available dataset of free-standing social conversations in-the-wild, and provide an analysis of the annotations. To the best of our knowledge, no existing work has attempted to define and quantify the overall perceived quality of spontaneous interactions with respect to individual and group experiences.

The rest of the paper is organized as follows. In Section 2, we review existing literature to identify the knowledge gap and to draw inspiration for the design of the perceived Conversation Quality measure. In Section 3, we formally define the perceived Conversation Quality measure. In Section 4, we explain how Conversation Quality was quantified, with respect to its definition, using a publicly available dataset, and we present and discuss the results of the analysis of the collected annotations. In Section 5, we discuss the key findings and potential future work. Finally, Section 6 concludes the paper.

In this section, we present a literature review of research works that have attempted to study social interactions. First, we discuss how different researchers have operationalised the study of social interaction. Second, we discuss how different social constructs are quantified, with their respective considerations and viewpoints. Finally, we present concluding remarks on the existing literature and discuss its knowledge gap.

Fundamental research on social interactions was pioneered by Goffman [14], whose symbolic interaction perspective explains society via the everyday behavior of people and their interactions.
Similarly, several other researchers have operationalised the study of social interaction using different spatial and temporal aspects of the interactions.

Kendon (1990) [19, p.210], while studying the spatial aspect of face-to-face interactions, defined the f-formation system as "the system of behavioural organisation by which certain spatial-orientational patterns are established and sustained in free-standing conversations". Similarly, Edelsky (1981), while studying the temporal aspect, examined a series of social interactions and observed two contrasting styles of conversation: exclusive floors and cooperative floors. According to Edelsky, the exclusive floor is characterised by a sense of orderliness, with only one person owning the floor at a time and turns rarely overlapping. In contrast, the cooperative floor is typified by a feeling of participants being "on the same wavelength" in a conversation that is a "free-for-all" ([10, p.384]), where there is a sense that no one owns the floor. The cooperative floor seems to capture the sense of cohesiveness and engagement that is associated with positive experiences in conversational scenarios.

Cooperative floors have been studied extensively in existing literature. For example, in the social sciences literature, measures of conversational equality and freedom [5, 21], measures of conversational fluency through frequent turns, turn overlap and turn duration [9], and measures of turn synchronisation [32] seem to resonate particularly strongly with Edelsky's views on cooperative floors. Spontaneous interactions are forms of such cooperative floors, in which there is a sense of spontaneity amongst interacting partners and the interaction is non-task-directed. Reitter et al. (2010) [30] reveal contrasting behaviour patterns between task-directed and non-task-directed interactions. This motivates us to study such interactions separately, with their own considerations.
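Edelsky's contrast between exclusive and cooperative floors can be made concrete with the turn-overlap idea mentioned above. The following Python sketch is illustrative only: it is not the measure of [9] or a measure used in this paper, and it assumes that speaking turns are available per participant as (start, end) intervals in seconds, which is a hypothetical input format. It reports the fraction of speaking time during which two or more participants speak at once, which should be near zero for an exclusive floor and higher for a cooperative one.

```python
from typing import Dict, List, Tuple

# Hypothetical input: participant id -> list of (start, end) speaking turns in seconds.
# Assumes that a single participant's own turns do not overlap each other.
Turns = Dict[str, List[Tuple[float, float]]]


def overlap_fraction(turns: Turns) -> float:
    """Fraction of speaking time during which at least two participants speak."""
    events = []
    for intervals in turns.values():
        for start, end in intervals:
            events.append((start, +1))  # a turn begins
            events.append((end, -1))    # a turn ends
    # Sorting tuples puts -1 before +1 at equal times, so a turn that starts
    # exactly when another ends is not counted as overlap.
    events.sort()

    active = 0        # number of participants currently speaking
    prev_time = None
    speech = 0.0      # time with at least one speaker
    overlap = 0.0     # time with two or more speakers
    for t, delta in events:
        if prev_time is not None and active > 0:
            speech += t - prev_time
            if active > 1:
                overlap += t - prev_time
        active += delta
        prev_time = t
    return overlap / speech if speech else 0.0


# One speaker at a time: exclusive-floor-like (hypothetical data).
exclusive = {"A": [(0, 5), (10, 15)], "B": [(5, 10)]}
# Turns run into each other: cooperative-floor-like (hypothetical data).
cooperative = {"A": [(0, 6)], "B": [(4, 10)], "C": [(8, 12)]}

print(overlap_fraction(exclusive))    # 0.0
print(overlap_fraction(cooperative))  # 4 of 12 seconds of speech overlap
```

The sweep over start and end events handles any number of simultaneous speakers; a frame-based speaking-status matrix would be an equally reasonable alternative representation.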
In this research, we specifically concentrate on spontaneous non-task-directed interactions.

Spontaneous interactions are a dynamic social conversation setting from which a wide range of inter-personal relationships and social constructs emerge. Such relationships and constructs, as different aspects of individual experiences, have been studied extensively in existing literature. For example, social constructs that measure inter-personal relationships (e.g. Rapport and Bonding) and social constructs that measure individual- and group-level behaviour (e.g. Involvement and Interest-levels) have been studied by researchers with their respective considerations.

Rapport and Bonding, as social constructs, have been widely treated as dyadic constructs and measured through self-reports [15, 17, 23]. Müller et al. (2018) [23] define Rapport as "the close and harmonious relationship in which interaction partners are “in sync” with each other". Treating Rapport as a dyadic-level phenomenon, the authors quantified Rapport between interacting pairs using self-reported questionnaire measures adopted from Bernieri et al. (1996) [3]. A similar construct is Bonding, which measures the positive personal attachment, including “mutual trust, acceptance, and confidence”, amongst interacting pairs [16]. Based on this definition, Jaques et al. (2016) [17] studied Bonding in human-agent interactions, using the Bonding subscale of the Working Alliance Inventory (B-WAI) [16] to quantify it.

While Rapport and Bonding tap into inter-personal relationships, several other social constructs that quantify individual- and group-level behaviour have also been studied in the literature, e.g. Involvement, Engagement and Interest-levels. John H. Antil (1984) [1] defines involvement as "the level of perceived personal importance and/or interest evoked by a stimulus (or stimuli) within a specific situation". Following this definition, Oertel et al. (2011) [26] studied participants’ degree of involvement in social conversations. The authors developed a perceived annotation scheme based on hearer-independent, intuitive impressions and annotated ten levels of involvement, each level describing its respective degree of involvement.

Similar to Involvement [26] and Engagement [27], as a perceived measure, Gatica-Perez et al.
(2013) [12] define group interest-levels as "the perceived degree of interest or involvement of the majority of the group". The authors relied on naive external annotators to annotate interest-levels from audio-visual recordings of interactions on a discrete 5-point scale. As instructions, annotators were given the formal definition of group interest-level and examples of interest-indicating activities (e.g. note-taking, focused gaze, and avid participation in discussion).

From the research works discussed above, we see that social constructs tend to be quantified either by self-reported measures or by externally perceived annotation measures. Self-report measures have many advantages, but they also suffer from specific disadvantages due to the way that subjects generally behave [11]. Self-reported answers may be exaggerated, respondents may be too embarrassed to reveal private details, and various biases, such as social desirability bias [25], may affect the results. In the case of a series of short bursts of spontaneous interaction, subjects may also forget longitudinal details, which calls for data collection based on the Experience Sampling Method (ESM). At the same time, perceived measures are only an approximation of the individuals' actual perceptions. However, perceived measures are free from several key issues faced by self-reported measures, mainly egoistic biases [11, 25], recall biases [24] and cognitive errors [24].
The characteristics of perceived measures discussed above make them more suitable for the development of social robots in dynamic spontaneous interactions.

On the other hand, the research works discussed above share a common trend: researchers tend to focus on one particular aspect of social interactions and individual experience, be it inter-personal relationships or individual- and group-level engagement. In contrast to these studies, there have been some attempts to quantify individual experiences and social interactions from a holistic viewpoint by considering several distinct aspects, for example Cuperman and Ickes [8] and Lindley and Monk [21]. However, these works considered different social settings and served different purposes. While Cuperman and Ickes relied on self-reported measures in a dyadic clinical setting, Lindley and Monk quantified perceived enjoyment in a task-directed social interaction. Moreover, the sets of aspects studied by the two works do not overlap, and each captures only a limited set of individual experiences in spontaneous conversations.

Cuperman and Ickes (2009) [8] used the unstructured dyadic interaction paradigm to examine the effects of gender and the Big Five personality traits on members' behaviors and perceptions of the interaction. For this purpose, the authors introduced the Perception of Interaction (POI) questionnaire to collect self-reported measures of a participant's perception of the interaction quality. This questionnaire contained 27 items that required participants to rate their interaction experience with respect to several distinct aspects of the conversation, such as the Quality of the Interaction, the Degree of Rapport they felt they had with the other person, and the Degree to which they Liked their interacting partner. This holistic measure of interactions has been successfully adopted in other research to study social constructs such as bonding (Jaques et al. (2016) [17]) and interaction experience (Cerekovic et al. (2014) [6]) in human-agent interactions. Similar to Cuperman and Ickes [8], Lindley and Monk (2013) [21], with a holistic viewpoint on social interactions, studied several behavioral process measures to develop the Thin-Slice Enjoyment Scale as a measure of experience and empathised enjoyment in social conversations. The thin-slice enjoyment scale specifically captures four distinct aspects of social interactions, namely Conversation Equality, Conversation Freedom, Conversation Fluency and Conversation Enjoyment.

In this section, we formally define the measure of perceived Conversation Quality. This measure, introduced in this research, is inspired by Edelsky's definition of cooperative floors [10]. The cooperative floor, in contrast to exclusive floors, is a self-organising system typified by a feeling of participants being "on the same wavelength" in a conversation that is a "free-for-all" [10, p.31]. This idea of the cooperative floor captures the sense of cohesiveness and engagement amongst interacting partners that is associated with positive individual experiences in social interactions. Considering spontaneous interactions as forms of cooperative floors, Edelsky's definition of cooperative floors [10, p.384] is a suitable starting point to quantify the overall quality of spontaneous interactions.

With respect to Edelsky's definition of cooperative floors [10, p.384], we define the measure of perceived Conversation Quality in a spontaneous interaction as

the degree to which participants in the spontaneous interaction are on the same wavelength and maintain a free-for-all floor, as perceived by external observers.

The two keywords here, same wavelength and free-for-all, are the two high-level aspects of Conversation Quality and are vital in defining the measure. In a cooperative spontaneous interaction setting, the aspect of same wavelength is multi-faceted in nature and tends to capture the sense of cohesiveness, rapport and engagement that is associated with positive experiences in conversational scenarios. Similarly, the aspect of free-for-all is intended to capture the equal opportunity shared amongst interacting partners in conversational scenarios. With these two high-level aspects of Conversation Quality, we believe that individual experiences in spontaneous interactions can be quantified from a holistic viewpoint.
In the following sections, we further discuss how these two high-level aspects of perceived Conversation Quality can be captured using different constituents. From the literature review presented in Section 2, we see that previously studied social constructs, such as cohesion [2], rapport [23], bonding [17], enjoyment [22] and interest-levels [12], each capture a particular aspect of social interactions. Such social constructs are not intended to quantify the overall quality of spontaneous interactions by capturing different aspects of individual experiences. For example, the measure of Rapport captures the inter-personal relationship in a social interaction by measuring the degree to which interacting partners are "in-sync" with each other [23], but does not capture other key aspects such as degree of involvement [26], free-for-all [21] and interpersonal liking [8]. Similarly, measures such as interest-levels and engagement capture the degree of involvement displayed by individuals in the interaction, but do not capture aspects such as interpersonal relationship [23], quality of interaction [8] and free-for-all [21]. In contrast to all these social constructs, perceived Conversation Quality quantifies spontaneous interactions from a holistic viewpoint.

In this section, we present the constituents of the measure of conversation quality. Each constituent is intended to capture a specific aspect of individual experiences in a spontaneous interaction, thereby measuring the two high-level aspects of conversation quality: same wavelength and free-for-all.
Inspired by several research works that study different aspects of individual experiences in spontaneous interactions, we present the four constituents of the perceived Conversation Quality measure: (1) Interpersonal Relationship, (2) Interpersonal Liking, (3) Nature of Interaction, and (4) Equal Opportunity.

Interpersonal Relationship. The constituent of Interpersonal Relationship was designed by drawing inspiration from Jaques et al.'s Bonding [17] and Müller et al.'s Rapport [23]. This constituent of Conversation Quality captures the degree of association or acquaintance between interacting partners in a spontaneous interaction. It directly measures interactions with respect to social constructs related to interpersonal relationships, e.g. rapport [23], cohesion [2] and bonding [17]. For example, the constituent measures the degree to which an individual was accepted and respected by the other individuals in the group, or the degree to which the other individuals were paying attention to the individual. The Interpersonal Relationship amongst interacting partners is widely acknowledged to result in improved collaboration and improved interpersonal outcomes, thereby having a key influence on Conversation Quality and individual experiences.

Interpersonal Liking. The constituent of Interpersonal Liking was designed by drawing inspiration from Cuperman and Ickes' POI [8] and Cerekovic et al.'s Interaction Experience [6]. This constituent captures the degree to which an individual personally likes their interacting partners and the ongoing conversation with them. It directly measures interactions with respect to social constructs related to interpersonal liking, such as Cuperman and Ickes' degree of liking [8] and Attraction [18]; for example, the extent to which an individual would like to interact more with their interaction partners, or the extent to which an individual liked the other individuals in the interaction. While the previous constituent measured the interpersonal-relationship dimension of spontaneous interactions, this constituent measures another key aspect of such interactions: Interpersonal Liking. Although this constituent is key in quantifying spontaneous interactions, it is an intimate measure of an individual's experience, and similar constructs have therefore been widely quantified using self-reported measures [8, 18]. For this reason, this constituent cannot be extended to perceived measures.

Nature of Interaction. The constituent of Nature of Interaction was designed by drawing inspiration from Cuperman and Ickes' POI [8]. This constituent of Conversation Quality directly captures the nature of the interaction and the positive experiences amongst interacting partners. It measures interactions with respect to social constructs related to positive experiences, such as Cuperman and Ickes' Quality of Interaction [8] and Lindley and Monk's Empathised Enjoyment [21]. While the previously discussed constituents measured the interpersonal relationship and liking dimensions of spontaneous interactions, this constituent directly captures the nature of the interaction amongst interacting partners and the positive experiences involved; for example, the degree to which the individual's interaction was smooth and relaxed, or the degree to which it was forced and awkward.

Equal Opportunity. The constituent of Equal Opportunity was designed by drawing inspiration from Edelsky's work on cooperative floors [10]. This constituent directly captures the free-for-all concept in a spontaneous cooperative interaction, that is, the equal opportunity shared amongst interacting partners. For example, factors such as conversation freedom [20], conversation equality [21] and an individual's opportunity to take the lead in the conversation [8, 17] resonate well with the concept of free-for-all and equal opportunity. Free-for-all is an essential aspect of cooperative floors and spontaneous conversations, and hence is an important constituent in measuring Conversation Quality.

As discussed earlier, in this research we quantify conversation quality in spontaneous interactions using externally perceived measures. In this section, we present the forms in which Conversation Quality as a social construct can be perceived in spontaneous interactions.

Figure 2: Illustration of the two forms of perceived Conversation Quality. The red and green boundaries illustrate the scope of observation to measure group-level and individual-level perceived Conversation Quality, respectively.

Social interactions are multi-level systems that involve social constructs emerging from different levels of interaction [13]. For example, social constructs, in our case the perception of Conversation Quality, can emerge at different levels of interaction, e.g. the individual, dyadic, group or even subgroup level. A construct at a given level is perceived with a focus on that level; for example, an individual-level construct is perceived with a prime focus on the individual and their interactions. When studying groups and teams, researchers can include individual-level and/or group-level phenomena in their research design. A socially intelligent system that can perceive and understand both individual-level and group-level Conversation Quality can also understand the influence of individual-level phenomena on group-level phenomena, which is key to the development of several social robot applications.

In this research, we consider that the social construct of Conversation Quality exists in the perception of external observers in two forms: the Perceived Individual's Experience of Conversation Quality (the individual-level phenomenon) and the Perceived Group's Conversation Quality (the group-level phenomenon). The two perceived measures of Conversation Quality and their scope of perception are illustrated in Fig. 2, and the two forms are defined below.

Perceived Group's Conversation Quality. For this research, we define the perceived group-level conversation quality as an external observer's perception of the conversation quality of the group as a collection of all its individuals. This perceived measure directly taps into what an external observer perceives or feels about the conversation going on in the whole group. At a high level, this measure answers the question: "How do you rate the overall quality of the conversation involving the whole group, with respect to the group's interpersonal relationship, its interpersonal liking, its nature of interaction and the equal opportunity maintained in it?" The perceived annotation of this measure results in one rating per group: the group's externally perceived conversation quality.

Perceived Individual's Experience of Conversation Quality. For this research, we define the perceived individual's experience of conversation quality as an external observer's perception of the quality of the conversation as experienced by the individual. This perceived measure directly taps into an external observer's perception of an individual's experience in their conversation with the group.
At a high level, this measure answers the question: "How do you rate the particular individual's experience in the group, with respect to their relationship, their liking, their nature of interaction and the equal opportunity shared with their interacting partners?" The perceived annotation of this measure results in each individual in the group receiving a Conversation Quality rating from the external observer. Hence, n perceived individual-level ratings are obtained per group, where n denotes the group size.

In this section, we present the two questionnaires devised to measure the respective forms of Conversation Quality as perceived by external naive annotators. The questionnaires can be used by naive external annotators to annotate perceived Conversation Quality in non-task-directed spontaneous small-group interactions, relying solely on video clips of the interactions.

The two Perceived Conversation Quality questionnaires were devised by drawing inspiration from research works such as the Perception of Interaction (POI) questionnaire by Cuperman and Ickes (2009) [8] and the Thin-Slice Enjoyment Scale (TES) [21]. The POI and TES questionnaires have been widely used by researchers to study social interactions in different scenarios [6, 17, 21]. Different from these studies, in this research we specifically focus on perceived social constructs in non-task-directed spontaneous group interactions. Hence, while drawing inspiration for the questionnaire items, we also modified the items to suit our social setting. The following steps were taken to modify the respective questionnaire items:

(1) All items were adapted for external annotators, i.e. for perceived social constructs: the items were rephrased to be answered by the annotator as an observer. For example, the item "I did not want to get along with the character" was modified to "The individual seemed to have gotten along with the group pretty well".
(2) All items were adapted to a small-group social setting rather than a dyadic interaction. For example, the item "I felt accepted and respected by the character" was modified to "The group members accepted and respected each other in the interaction".
(3) Questionnaire items that relied on the content of the conversation, or on modalities other than video clips, were not considered. For example, the item "The character often said things completely out of place", which involves the content of the conversation, was excluded.
(4) Since our research focuses on a perceived measure of Conversation Quality, the intimate constituent of Interpersonal Liking was excluded while building the questionnaire. For example, the item "Did you desire to interact more with partner in the future?" was excluded, as an external annotator cannot perceive an individual's personal liking.

The two questionnaires, devised to quantify perceived Conversation Quality at the individual and group level, can be found in Appendix 7.1 and 7.2.
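Questionnaire answers still have to be turned into ratings. The actual items and response format are given in the appendix and are not reproduced here, so the following Python sketch rests on explicitly assumed details: a hypothetical 1-7 Likert scale, hypothetical item IDs, reverse-coding of negatively worded items (such as a "forced and awkward" item), and an overall score taken as the mean of per-constituent means over the three perceivable constituents. This is one plausible aggregation, not necessarily the one used in this work. The sketch also shows the difference in output shape between the two forms: the group-level questionnaire yields one rating per group, and the individual-level questionnaire yields n ratings for a group of n.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

# Hypothetical scale bounds; the real response format is in the appendix.
SCALE_MIN, SCALE_MAX = 1, 7

# The three externally perceivable constituents (Interpersonal Liking is excluded).
CONSTITUENTS = ("interpersonal_relationship", "nature_of_interaction", "equal_opportunity")


@dataclass(frozen=True)
class Item:
    item_id: str            # hypothetical identifier, e.g. "NI2"
    constituent: str        # one of CONSTITUENTS
    reverse: bool = False   # True for negatively worded items


def score(items: List[Item], answers: Dict[str, int]) -> Dict[str, float]:
    """Reverse-code answers, average them per constituent, then average the constituents."""
    per_constituent: Dict[str, List[int]] = {c: [] for c in CONSTITUENTS}
    for item in items:
        value = answers[item.item_id]
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{item.item_id}: {value} is outside the scale")
        if item.reverse:
            value = SCALE_MIN + SCALE_MAX - value  # e.g. 2 becomes 6 on a 1-7 scale
        per_constituent[item.constituent].append(value)
    scores: Dict[str, float] = {c: mean(v) for c, v in per_constituent.items() if v}
    scores["conversation_quality"] = mean(list(scores.values()))
    return scores


# Hypothetical item list; for brevity it is reused for both forms,
# whereas the real group- and individual-level questionnaires differ.
ITEMS = [
    Item("IR1", "interpersonal_relationship"),
    Item("NI1", "nature_of_interaction"),
    Item("NI2", "nature_of_interaction", reverse=True),  # e.g. a "forced and awkward" item
    Item("EO1", "equal_opportunity"),
]

# Group-level form: one answer sheet, hence one rating, per group.
group_rating = score(ITEMS, {"IR1": 6, "NI1": 5, "NI2": 2, "EO1": 4})

# Individual-level form: one answer sheet per member, hence n ratings for a group of n.
members = {
    "p1": {"IR1": 6, "NI1": 6, "NI2": 1, "EO1": 5},
    "p2": {"IR1": 4, "NI1": 5, "NI2": 3, "EO1": 2},
    "p3": {"IR1": 5, "NI1": 4, "NI2": 4, "EO1": 4},
}
individual_ratings = {pid: score(ITEMS, a) for pid, a in members.items()}

print(round(group_rating["conversation_quality"], 2))  # 5.17
print(len(individual_ratings))                         # 3
```

Averaging per constituent first keeps a constituent with many items from dominating the overall score; whether that weighting matches the paper's intent is an assumption.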

In this section, we explain and discuss the strategy used to collect annotations for perceived conversation quality, using the Perceived Conversation Quality questionnaire presented earlier. The section is organised as follows. First, we discuss the dataset used; second, we discuss the procedure followed to collect perceived annotations for Conversation Quality; and finally, we present the results of the analysis performed on the collected annotations.

In this research, to quantify perceived conversation quality in spontaneous interactions, we used the publicly available MatchNMingle dataset [4]. MatchNMingle is a multimodal dataset for the analysis of spontaneous free-standing conversational groups and speed-dates in-the-wild. Because the dataset was recorded in a real-life setting, it lends ecological validity to our study of conversation quality. The dataset uses wearable devices and overhead cameras to record a large number of in-the-wild social interactions during a real-life speed-date event and a cocktail party. For this research, we use only the data from the cocktail party.

The dataset consists of two hours of dynamic spontaneous interaction involving 92 participants, making it one of the largest datasets in terms of the number of participants and their ever-evolving interactions. This property was the prime motivation for using the dataset in our study of spontaneous interactions. The interactions were filmed using overhead GoPro cameras; in total, 5 cameras were used to film the cocktail party (1080p, 30 fps, ultra-wide field of view).

The MatchNMingle dataset also includes f-formation annotations for 30 minutes of the cocktail party event. The annotations were performed using the visually perceivable spatial positions of participants during their interactions. The sub-sampled 30 minutes of annotated video were chosen randomly, with the aim of eliminating possible effects of acclimatization and maximizing the density of participants and the number of social actions that could occur in the whole scene. In these 30 minutes of annotated f-formations in the mingling session, there were 174 f-formations in total. For this research, we consider a group to be an f-formation, and the group members to be all the participants present in that f-formation. The durations of the f-formation interactions (in minutes) have a mean of 1.91, a standard deviation of 2.13, a median of 1.10 and a mode of 0.42. The distribution of the f-formation samples with respect to group size and duration is shown in Figure 3.

ICMI ’20 Companion, October 25–29, 2020, Virtual event, Netherlands. Raj Prabhu et al.

Figure 3: Distribution of f-formations in MatchNMingle: (a) group cardinality; (b) duration of interactions.

In this subsection, we explain the annotation procedure used to collect annotations for perceived Conversation Quality. While explaining the annotation procedure, we also discuss several key considerations behind the strategy. The video clips of the spontaneous interactions, filmed using overhead cameras, were the only modality used for the manual annotation of Conversation Quality; no audio data was used in the annotation process. Several other works in the literature have successfully collected rich annotation data by relying entirely on video clips [12, 21]. Audio recordings are often unavailable in conversation scenarios for privacy reasons. Moreover, annotation that uses audio as one of its modalities is more time consuming, as it is prone to problems such as language constraints, audio noise and lack of clarity, and sometimes requires speaker diarisation. In contrast, manual annotation using only video recordings is easier and less time consuming, while video recordings still capture the rich non-verbal behaviour of participants in the social interaction.

Before using the f-formation groups for annotation, we cropped each f-formation out of the overhead video recordings, to prevent annotators from being distracted away from the f-formation in focus. After cropping, longer f-formation interactions were split into multiple shorter segments, which were then presented to annotators as independent clips of social interaction. This was done to collect more reliable and granular annotations for longer group interactions. From Figure 3b, we see that the durations of f-formation interactions vary widely, from a few seconds to more than 3–4 minutes. Under these circumstances, a single label cannot reliably define the conversation quality of f-formation interactions of such different durations. With this distribution in mind, we decided to split f-formations lasting longer than 3 minutes into independent interactions of 1–2 minutes each. For the same reason, we omitted f-formations lasting less than 30 seconds. After the omission and splitting steps, the total number of resulting f-formation groups was 115. The distribution of these groups with respect to group size and interaction duration can be seen in Figures 4a and 4b, respectively.

Figure 4: Distribution of final f-formation samples: (a) group cardinality; (b) duration of interactions.

With the processed video clips of spontaneous interactions, we asked naive annotators to help us annotate perceived Conversation Quality. For this study, we were able to recruit three naive annotators, aged between 22 and 30 years; two were female and one was male. The annotators were provided with video clips of the independent f-formations of spontaneous interactions and were asked to fill out both Perceived Conversation Quality questionnaires (presented in Section 3.3). The f-formation clips were presented to each annotator in a randomised order, to prevent any annotator bias that might arise if a fixed clip order were followed.
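The omission and splitting rules above, together with the per-annotator randomised ordering, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure actually used: the `Clip` type and the function names are hypothetical, and so is the equal-length splitting rule (divide a long clip into the smallest number of equal segments of at most 2 minutes), since the text only states that clips longer than 3 minutes were split into 1–2 minute segments.

```python
import math
import random
from dataclasses import dataclass
from typing import List

# Thresholds taken from the text; the splitting rule itself is an assumption.
MIN_DURATION_S = 30.0      # f-formations shorter than this are omitted
SPLIT_THRESHOLD_S = 180.0  # f-formations longer than this are split
MAX_SEGMENT_S = 120.0      # assumed upper bound per segment (1-2 minutes)


@dataclass(frozen=True)
class Clip:
    """Hypothetical time span of one f-formation in the video, in seconds."""
    group_id: str
    start_s: float
    end_s: float

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def prepare_clips(fformations: List[Clip]) -> List[Clip]:
    """Drop very short f-formations and split long ones into equal segments."""
    clips: List[Clip] = []
    for ff in fformations:
        d = ff.duration_s
        if d < MIN_DURATION_S:
            continue  # too short for a reliable single label
        if d <= SPLIT_THRESHOLD_S:
            clips.append(ff)  # kept whole as one independent clip
            continue
        # Smallest number of equal segments that each last at most 2 minutes;
        # for d > 3 minutes every segment then lasts between 1 and 2 minutes.
        n = math.ceil(d / MAX_SEGMENT_S)
        bounds = [ff.start_s + d * i / n for i in range(n)] + [ff.end_s]
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            clips.append(Clip(ff.group_id, lo, hi))
    return clips


def annotation_orders(clips: List[Clip], n_annotators: int, seed: int = 0) -> List[List[Clip]]:
    """Give each annotator an independently shuffled copy of the clip list."""
    orders = []
    for a in range(n_annotators):
        order = list(clips)
        random.Random(seed + a).shuffle(order)  # separate, reproducible order per annotator
        orders.append(order)
    return orders
```

For instance, a 250-second f-formation yields three segments of about 83 seconds each, while a 20-second one is dropped. Seeding one `random.Random` instance per annotator keeps each order reproducible without the annotators sharing the same sequence.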

In this subsection, we present the results of the data analysis performed on the annotation responses collected through the procedure explained above.

We first carried out principal component analysis (PCA) on the annotations. This analysis showed that 71% and 65.2% of the variance in the group-level and individual-level annotations, respectively, could be explained by the first principal component, while the first four principal components together explain over 80% of the variance. The eigenvalue bar chart can be seen in Figure 5. For further analysis of the annotation data, we plotted the data samples with respect to the first two principal components. The plot, along with the factor loadings, can be seen in Figure 6. Each line shown in the plots represents the loading of one question in the principal component space: a longer line indicates that more of that question's variability is captured by the two components, and vice versa. The number labelling each loading line corresponds to the respective questionnaire item.

Figure 5: Eigenvalue distribution (bar chart) and the cumulative percentage of the explained variability (line plot): (a) group-level annotations; (b) individual-level annotations.

Figure 6: Plot of the factor loadings (black lines) and the samples (blue dots) in the first two principal components: (a) group-level annotations; (b) individual-level annotations.

From the annotation plots at both the individual level (Figure 6b) and the group level (Figure 6a), we see that the questions fall into two clusters: one cluster of questions loads towards the negative end of the first principal component, and the other towards the positive end. On further analysis, we found that the two clusters correspond to the scale orientation of each question. For example, in Figure 6b, questions 5, 3 and 10 are reversed in scale orientation relative to the rest of the questions. Similarly, in Figure 6a, question 3 is reversed in scale orientation relative to the rest. This observation suggests that the three naive annotators treated the respective questionnaire items in a similar fashion. At the same time, a few question items load more strongly than the others, for example question 6 in both sets of annotations (Figures 6a and 6b) and question 5 in Figure 6a. Interestingly, all of these highly loaded questions belong to the Free-for-All part of the questionnaire. This suggests that the annotations for the free-for-all items had the highest variance (between groups) compared with the other segments of the questionnaire.
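The PCA and the sign pattern of the first-component loadings described above can be illustrated with the following NumPy sketch. It is an illustration under assumptions, not the authors' analysis code: the paper does not say whether item scores were standardised, so the sketch applies covariance-based PCA to raw 1–5 Likert scores, and the synthetic data and the `reverse_code` helper are hypothetical. It shows that a reverse-worded item loads with the opposite sign on the first component, and that recoding it as (low + high) minus the score, i.e. 6 minus the score on a 1–5 scale, aligns the signs.

```python
import numpy as np


def pca(X: np.ndarray):
    """Covariance-based PCA of an (n_samples, n_items) score matrix.

    Returns the explained-variance ratio per component and the loadings
    (eigenvectors scaled by the square root of their eigenvalues).
    """
    cov = np.cov(X, rowvar=False)             # items are the variables (columns)
    eigvals, eigvecs = np.linalg.eigh(cov)    # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]         # sort components by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return explained, loadings


def reverse_code(X: np.ndarray, items, low: int = 1, high: int = 5) -> np.ndarray:
    """Flip reverse-worded Likert items so that all items share one orientation."""
    Y = X.astype(float).copy()
    Y[:, items] = (low + high) - Y[:, items]
    return Y


# Hypothetical data: 200 clips rated on 4 items driven by one latent factor,
# where item 3 is worded in the reverse direction.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = np.column_stack([3 + 1.2 * latent + 0.3 * rng.normal(size=200) for _ in range(4)])
X = np.clip(np.rint(X), 1, 5)
X[:, 3] = 6 - X[:, 3]

explained, loadings = pca(X)
print(np.round(explained, 2))     # the first component dominates
print(np.sign(loadings[:, 0]))    # item 3 has the opposite sign to items 0-2
_, aligned = pca(reverse_code(X, [3]))
print(np.sign(aligned[:, 0]))     # after reverse-coding, all signs agree
```

The overall sign of a principal component is arbitrary, so only the relative signs of the loadings are meaningful; this is why reverse-worded items show up as a separate cluster rather than at a fixed end of the axis.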

After the analysis of the annotation distributions, we analysed inter-annotator agreement. For this, we used the quadratic weighted kappa measure [7], a variant of Cohen's kappa. The measure weights disagreements according to their magnitude and is especially useful when the annotation data are ordinal. To further analyse the final mean kappa agreements for both the group- and individual-level annotations, we plotted the mean kappa score against the respective mean conversation quality score in a scatter plot, shown in Figure 7. A similar plot was used by Hung et al. [2] to analyse inter-annotator agreement for small-group meetings with different levels of cohesion.

Figure 7: Scatter plot between the mean kappa score (κ) and the respective mean Conversation Quality score: (a) group-level annotations; (b) individual-level annotations.

From Figure 7, we see that there is a linear relationship between mean kappa scores and mean conversation quality scores. That is, inter-annotator agreement decreases as conversation quality scores decrease, suggesting that annotators agree better on conversations of higher quality than on conversations of lower quality. At the same time, a closer look reveals that in the individual-level annotations (Figure 7b), there is a small cluster of samples where annotators agreed more strongly on lower conversation quality samples as well. In contrast, for the group-level annotations (Figure 7a), annotators never agree well on low conversation quality samples. This was not what we expected. We expected results similar to those of Hung et al. [2], where inter-annotator agreement on the cohesion levels of meetings was higher at the two extremes of the scale. Such behaviour is seen only marginally, and only for the individual-level annotations.
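A minimal, dependency-free sketch of the quadratic weighted kappa for two annotators' ordinal ratings is given below, together with one possible way to obtain the per-sample points of the scatter plot in Figure 7. The aggregation is an assumption, since the text does not specify it: for each clip, the hypothetical helper `sample_point` computes the kappa between every pair of annotators over that clip's questionnaire items, averages it, and pairs it with the mean of all item scores (assumed to be reverse-coded already).

```python
from itertools import combinations
from statistics import mean
from typing import Sequence, Tuple


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], categories=range(1, 6)) -> float:
    """Cohen's kappa with quadratic disagreement weights (i - j)^2.

    The usual normalisation of the weights by (k - 1)^2 cancels in the ratio,
    so integer weights are used. Returns NaN when the chance disagreement is
    zero (e.g. both annotators gave the same single category to every item).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    cats = list(categories)
    index = {c: i for i, c in enumerate(cats)}
    k, n = len(cats), len(a)
    observed = [[0] * k for _ in range(k)]  # confusion matrix of counts
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed disagreement (counts) and weighted chance disagreement (times n).
    obs = sum((i - j) ** 2 * observed[i][j] for i in range(k) for j in range(k))
    exp = sum((i - j) ** 2 * hist_a[i] * hist_b[j] for i in range(k) for j in range(k))
    if exp == 0:
        return float("nan")
    return 1.0 - n * obs / exp


def sample_point(ratings: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """(mean pairwise kappa, mean score) for one clip; ratings[annotator][item]."""
    kappas = [quadratic_weighted_kappa(x, y) for x, y in combinations(ratings, 2)]
    scores = [s for annotator in ratings for s in annotator]
    return mean(kappas), mean(scores)
```

For example, two annotators whose ratings are identical obtain κ = 1, whereas perfectly reversed ratings (1 to 5 against 5 to 1) obtain κ = −1. Because the quadratic weights penalise a 1-versus-5 disagreement sixteen times more than a 1-versus-2 one, the measure suits the ordinal Likert responses collected here.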

Figure 8: Scatter plot between the mean kappa score (κ) and the mean Conversation Quality score after ZM adjustment: (a) group-level annotations; (b) individual-level annotations.

One widely used technique to handle low inter-annotator agreement is to correct for mean shifts, as done by Ringeval et al. (2013) [31]. The authors used a zero-mean (ZM) local normalization technique to remove a possible bias in an annotator's annotations, e.g. a shift toward positive or negative values. We performed similar ZM adjustments on our annotations and repeated the analysis. The resulting plots, seen in Figure 8, show no major changes after the ZM adjustment. This suggests that there
exists no mean shift between annotators; rather, there is a basic difference in annotator judgements.

Building on Edelsky's work on cooperative floors [10], we formally defined the measure of Conversation Quality and presented its constituents. To further quantitatively study perceived Conversation Quality, we devised a questionnaire that measures perceived Conversation Quality in spontaneous interactions at the individual and group levels.

The questionnaires were used as an instrument to quantify Conversation Quality in a publicly available in-the-wild dataset of spontaneous interactions. By considering a stable f-formation to be a conversation group sample, we annotated each sample for perceived conversation quality at both the group and individual levels. At the same time, works such as Raman et al. (2019) [29] demonstrate the existence of multiple conversation floors within an f-formation. With that in mind, future work could quantify conversation quality for a particular conversation floor rather than for the whole f-formation.

Moreover, in this study, we quantified a spontaneous interaction using a single perceived Conversation Quality measure, assuming that one stable perceived conversation quality score holds throughout the interaction. However, social interactions and individual experiences are dynamic in nature and require a more fine-grained approach. Several researchers have handled this using thin-slice based annotations [17, 21]. Such a thin-slice based approach could help us further study the dynamics of Conversation Quality in spontaneous interactions.
Nevertheless, the devised questionnaire is flexible enough to be used to collect thin-slice annotations, which is an interesting direction for future work.

From the analysis of the distribution of the annotations, we saw that the naive annotators handled all the questionnaire items in a similar fashion. However, a deeper analysis of inter-annotator agreement revealed that annotators tended to disagree with each other on lower conversation quality samples. This behaviour was more strongly prevalent in the group-level annotations than in the individual-level annotations. A probable explanation for the contrast between the two levels is that different naive annotators employ different aggregation strategies to compile the overall group's conversation quality from individual-level measures, especially for low conversation quality samples. With that in mind, future work could use trained annotators in place of naive annotators. This could result in a richer dataset for further analysis and predictive modeling of perceived Conversation Quality.

In this paper, we designed a novel measure, the perceived Conversation Quality, which measures the overall quality of a spontaneous interaction with a holistic view of individual experiences. To achieve this, we defined the measure to capture four unique aspects of social interactions, namely Interpersonal Relationship, Interpersonal Liking, Nature of Interaction and Equal Opportunity. Since social interactions are multi-level systems, we defined Conversation Quality to be perceivable at two levels: the individual level (Perceived Individual's Experience of Conversation Quality) and the group level (Perceived Group's Conversation Quality).

To quantitatively study the novel measure, we devised two literature-backed questionnaires that quantify Conversation Quality at its respective levels of perception. We further used these questionnaires to collect perceived annotations of Conversation Quality in a publicly available dataset, relying on video clips of spontaneous interactions and naive external annotators. The analysis of the collected annotations revealed that, although the naive annotators treat the respective questionnaire items in a similar fashion, they tend to agree less, with low inter-annotator agreement scores, on low conversation quality samples. This behaviour is more prominent in the group-level Conversation Quality annotations than in the individual-level annotations, suggesting the use of trained annotators in place of naive annotators. Nevertheless, this research is a pioneering step in studying individual experiences in spontaneous interaction from a holistic viewpoint. The devised questionnaire and the collected annotations can further facilitate the quantitative modeling of perceived Conversation Quality.

ACKNOWLEDGMENTS

This research was partially funded by the Netherlands Organization for Scientific Research (NWO) under the MINGLE project number 639.022.606. We also thank Swathi Yogesh, Divya Suresh Babu, and Nakul Ramachandran for their time and patience in helping with annotating the dataset.

REFERENCES

[1] John H Antil. 1984. Conceptualization and operationalization of involvement. ACR North American Advances (1984).
[2] Hayley Hung and Daniel Gatica-Perez. 2010. Estimating Cohesion in Small Groups Using Audio-Visual Nonverbal Behaviour. IEEE Transactions on Multimedia 12, 6 (2010), 563–575. https://doi.org/10.1109/TMM.2010.2055233
[3] Frank J Bernieri, John S Gillis, Janet M Davis, and Jon E Grahe. 1996. Dyad rapport and the accuracy of its judgment across situations: A lens model analysis. Journal of Personality and Social Psychology 71, 1 (1996), 110.
[4] Laura Cabrera-Quiros, Andrew Demetriou, Ekin Gedik, Leander van der Meij, and Hayley Hung. 2018. The MatchNMingle dataset: a novel multi-sensor resource for the analysis of social interactions and group dynamics in-the-wild during free-standing conversations and speed dates. IEEE Transactions on Affective Computing (2018).
[5] Jean Carletta, Simon Garrod, and Heidi Fraser-Krauss. 1998. Placement of authority and communication patterns in workplace groups: The consequences for innovation. Small Group Research 29, 5 (1998), 531–559.
[6] Aleksandra Cerekovic, Oya Aran, and Daniel Gatica-Perez. 2014. How do you like your virtual agent?: Human-agent interaction experience through nonverbal features and personality traits. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
[7] … Bulletin 70 (1968), 213–220.
[8] Ronen Cuperman and William Ickes. 2009. Big Five Predictors of Behavior and Perceptions in Initial Dyadic Interactions: Personality Similarity Helps Extraverts and Introverts, but Hurts "Disagreeables". Journal of Personality and Social Psychology 97, 4 (2009), 667–684. https://doi.org/10.1037/a0015741
[9] Owen Daly-Jones, Andrew Monk, and Leon Watts. 1998. Some advantages of video conferencing over high-quality audio conferencing: fluency and awareness of attentional focus. International Journal of Human-Computer Studies 49, 1 (1998), 21–58.
[10] Carole Edelsky. 1981. Who’s Got the Floor? Language in Society
[11] … APS Observer 10, 1 (1997).
[12] Daniel Gatica-Perez, Iain McCowan, Dong Zhang, and Samy Bengio. 2005. Detecting Group Interest-Level in Meetings. In … on Acoustics, Speech, and Signal Processing, ICASSP ’05, Philadelphia, Pennsylvania, USA, March 18-23, 2005. 489–492. https://doi.org/10.1109/ICASSP.2005.1415157
[13] Riemannian Geometry and Geometric Analysis. [n.d.]. Advancing Multilevel Research Design: Capturing the Dynamics of Emergence - Steve. Number Cdm. 581–615 pages.
[14] Erving Goffman. 1961. Encounters: Two studies in the sociology of interaction. Ravenio Books.
[15] Juan Lorenzo Hagad, Roberto Legaspi, Masayuki Numao, and Merlin Suarez. 2011. Predicting levels of rapport in dyadic interactions through automatic detection of posture and posture congruence. In … IEEE, 613–616.
[16] Adam O Horvath and Leslie S Greenberg. 1989. Development and validation of the Working Alliance Inventory. Journal of Counseling Psychology 36, 2 (1989), 223.
[17] Natasha Jaques, Daniel McDuff, Yoo Lim Kim, and Rosalind Picard. 2016. Understanding and predicting bonding in conversations using thin slices of facial expressions and body language. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
[18] … Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW). IEEE, 154–160.
[19] Adam Kendon. 1990. Conducting interaction: Patterns of behavior in focused encounters. Vol. 7. CUP Archive.
[20] Catherine Lai and Gabriel Murray. 2018. Predicting group satisfaction in meeting discussions. Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data, MCPMD 2018 (2018). https://doi.org/10.1145/3279810.3279840
[21] Siân E. Lindley and Andrew F. Monk. 2013. Measuring social behaviour as an indicator of experience. Behaviour & Information Technology 32, 10 (Oct 2013), 968–985. https://doi.org/10.1080/0144929X.2011.582148
[22] Florian Lingenfelser, Johannes Wagner, Elisabeth André, Gary McKeown, and Will Curran. 2014. An event driven fusion approach for enjoyment recognition in real-time. In Proceedings of the 22nd ACM International Conference on Multimedia. 377–386.
[23] Philipp Müller, Michael Xuelin Huang, and Andreas Bulling. 2018. Detecting Low Rapport During Natural Interactions in Small Groups from Non-Verbal Behaviour. CoRR abs/1801.06055 (2018). arXiv:1801.06055 http://arxiv.org/abs/1801.06055
[24] Lance J. Rips, Norman M. Bradburn, and Steven K. Shevell. 1987. Answering Autobiographical Questions: The Impact of Memory and Inference on Surveys. In New Series 1987. 236(4798):157–167.
[25] David A Northrup. 1997. The problem of the self-report in survey research. Institute for Social Research, York University.
[26] Catharine Oertel, Céline De Looze, Stefan Scherer, Andreas Windmann, Petra Wagner, and Nick Campbell. 2011. Towards the Automatic Detection of Involvement in Conversation. In Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues, Anna Esposito, Alessandro Vinciarelli, Klára Vicsi, Catherine Pelachaud, and Anton Nijholt (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 163–170.
[27] Catharine Oertel and Giampiero Salvi. 2013. A gaze-based method for relating group involvement to individual engagement in multimodal multiparty dialogue. In Proceedings of the 15th ACM International Conference on Multimodal Interaction. 99–106.
[28] Catharine Oertel, Stefan Scherer, and Nick Campbell. 2011. On the use of multimodal cues for the prediction of degrees of involvement in spontaneous conversation. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH August (2011), 1541–1544.
[29] Chirag Raman and Hayley Hung. 2019. Towards automatic estimation of conversation floors within F-formations. In … Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW). IEEE, 175–181.
[30] David Reitter, Johanna D Moore, and Frank Keller. 2010. Priming of syntactic rules in task-oriented dialogue and spontaneous conversation. (2010).
[31] Fabien Ringeval, Andreas Sonderegger, Juergen Sauer, and Denis Lalanne. 2013. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In … IEEE, 1–8.
[32] Leon Watts, Andrew Monk, and Owen Daly-Jones. 1996. Inter-personal awareness and synchronization: assessing the value of communication technologies. International Journal of Human-Computer Studies 44, 6 (1996), 849–873.
[33] D. Wyatt, T. Choudhury, and H. Kautz. 2007. Capturing Spontaneous Conversation and Social Dynamics: A Privacy-Sensitive Data Collection Effort. In … Vol. 4. IV–213–IV–216.

The questionnaire items below are organized in terms of the different constituents of Conversation Quality (Section 3.1). The number before each questionnaire item indicates the ordering of the items in the original questionnaire. The source for each item is provided at the end of each question.

Perceived Group’s Conversation Quality

Instruction for the annotators: Use the set of questions below to annotate your perception of the group’s conversation quality, as seen in the video. Each interaction aspect in the questionnaire below should be rated on a five-point Likert scale (Disagree strongly (1) to Agree strongly (5)). Read the questions carefully and observe the whole group carefully before annotating the video. You are allowed to re-watch the video if required.

Interpersonal Relationship

Nature of Interaction

Equal Opportunity

Perceived Individual’s Experience of Conversation Quality

Instruction for the annotators: Use the set of questions below to annotate your perception of the individual’s experience in the conversation, as seen in the video. Each individual present in the conversation has to be annotated separately with the questions below. Each interaction aspect in the questionnaire below should be rated on a five-point Likert scale (Disagree strongly (1) to Agree strongly (5)). Read the questions carefully and observe the individual carefully before annotating the video. You are allowed to re-watch the video if required.

Interpersonal Relationship


Nature of Interaction