Zoomorphic Gestures for Communicating Cobot States
Vanessa Sauer, Axel Sauer, and Alexander Mertens

Abstract—Communicating the robot state is vital to creating an efficient and trustworthy collaboration between humans and collaborative robots (cobots). Standard approaches for robot-to-human communication face difficulties in industry settings, e.g., because of high noise levels or certain visibility requirements. Therefore, this paper presents zoomorphic gestures based on dog body language as a possible alternative for communicating the state of appearance-constrained cobots. For this purpose, we conduct a visual communication benchmark comparing zoomorphic gestures, abstract gestures, and light displays. We investigate the modalities regarding intuitive understanding, user experience, and user preference. In a first user study (n = 93), we evaluate our proposed design guidelines for all visual modalities. A second user study (n = 214) constituting the benchmark indicates that intuitive understanding and user experience are highest for both gesture-based modalities. Furthermore, zoomorphic gestures are considerably preferred over the other modalities. These findings indicate that zoomorphic gestures, with their playful nature, are especially suitable for novel users and may decrease initial inhibitions.

Index Terms—Gesture, Posture and Facial Expressions, Human-Robot Collaboration, Industrial Robots
I. INTRODUCTION

Interaction and, more specifically, communication between humans and robots are central challenges in collaborative robotics [1]. While human-to-robot communication is an intensively researched field, the inverse direction, communication from robot to human (RtH), has received less attention in comparison [2]. Yet, the communication of the robot's state and the acknowledgment of user commands are required for effective and safe collaboration [3]. Well-designed communication can also increase trust levels and lead to a more engaging interaction with the collaborative robot (cobot) [3], [4]. The system status has to be informative and easy to understand to meet those demands.

According to Onnasch et al. [5], cobots can communicate with human users via acoustic, mechanical, or visual modalities. In an industry setting, acoustic communication may be impeded due to the level and spectral characteristics of noise
Manuscript received: October 14, 2020; Revised: January 2, 2021; Accepted: February 8, 2021. This paper was recommended for publication by Editor Gentiane Venture upon evaluation of the Associate Editor and Reviewers' comments.

Vanessa Sauer and Alexander Mertens are with the Institute of Industrial Engineering and Ergonomics at RWTH Aachen University, Germany. [email protected], [email protected]

Axel Sauer is with the Autonomous Vision Group at the Max Planck Institute for Intelligent Systems and the University of Tübingen, Germany. This work was done while Axel Sauer was with the Technical University of Munich. [email protected]
prevalent in these settings. Alternatively, acoustic RtH communication would have to be designed very intrusively, e.g., via loud warning tones [3]. Mechanical RtH communication may limit the range in which a user can perceive feedback, such as vibrations. Haptic interfaces can expand this space but require additional hardware and the willingness of human users to wear the interface while interacting with the cobot. Several modalities are available in visual communication, such as text displays, lights, or gestures. Compared to the other visual modalities, gestures have the advantage of being visible from many different positions and distances and do not require any additional hardware [3].

For these reasons, we specifically focus on gestures for RtH communication in an industry setting. Gestures for cobots are commonly based on human body language [2] due to an easier understanding by the user. In this work, we focus on functionally designed, appearance-constrained cobots lacking expressive faces [6], i.e., robot arms without specific humanoid design characteristics. For these types of cobots, we argue that gestures based on human body language may raise excessive expectations regarding the cobot's capabilities [7]. Furthermore, human-inspired gestures for a non-humanoid cobot may create uneasiness due to a perceptual mismatch of sensory cues (cobot appearance vs. gesture design) [8]. Instead, zoomorphic gestures based on animals' body language may be an alternative approach for gesture design.

Our proposed approach of zoomorphic gestures for industry cobots may offer the advantages of gesture communication (intuitive, visibility from different positions, no additional hardware) while avoiding exaggerated expectations and perceptual mismatch. In this work, we investigate if zoomorphic gestures provide an intuitive understanding, i.e., whether the modality's meaning can be unambiguously understood [9]. Furthermore, we evaluate the user experience (attractiveness, joy of use, intuitive use, and intent to use) and the user preferences compared to other visual communication modalities, such as light displays and abstract gestures.

In this work, we offer several contributions to the research domain of RtH communication:
• With zoomorphic gestures, we offer a novel approach of RtH communication for appearance-constrained cobots.
• We propose design guidelines for the development of zoomorphic gestures and evaluate them in an online user study (n = 93).
• We conduct a benchmark of three visual modalities for communicating robot states. We perform the benchmark with a large-scale online user study (n = 214) by recruiting participants with diverse demographics.
II. RELATED WORK
RtH communication in industry settings is required to help human users predict the cobot's behavior and actions more accurately [4]. Communication of the robot state is possible via different channels [5]. Due to the industry context, we focus on non-facial, non-humanoid communication of the robot state and feedback to the user while omitting the display of robot emotions. Based on the limitations of acoustic (e.g., high noise levels in industry settings) and haptic (e.g., required hardware, user acceptance) RtH communication in industry settings, visual communication modalities are preferred for industry cobots.
Visual RtH communication.
Light displays have a long history of being used to express the state of an electronic device in general or of a robot state in particular. Yet, the design space and the light codes are often kept simple. This simplification reduces the learning effort by utilizing typical signals of widespread electronic devices [10], [11]. Nonetheless, lights to communicate the robot intent and state have been explored in industry settings [12], [13] and other applications such as mobile or flying robots [14], [15]. Light displays have the advantage of being visible from different distances and angles without interfering with the task execution. They also provide persistent information compared to the transient nature of acoustic feedback [3]. However, the understandability and usefulness of light displays largely depend on the specific design and placement on the robot, requiring time and effort in the design process [11].

Previous work on RtH communication in industry settings also includes approaches where the robot response and state are displayed on a screen. The robot state can be visualized using pictograms [4] or text-based information [16]. Moreover, communication of robot faults with augmented reality has also been explored [17]. While explicit screen-based communication tends to be straightforward and easy to interpret, user studies investigating the use of screens show that this communication modality may lead to information overload [4]. Moreover, this modality may be less visible from a distance, especially for small screens or texts.

Visual RtH communication can also utilize motion-based communication, which primarily entails communication via gestures [4]. Human-inspired gestures are dominant in industry and related applications as they are easy to understand [2], especially when viewed within the task context [18]. Human-inspired gestures have been used for assembly tasks [16], [18] but also in less related fields such as underwater robotics [2]. Further, a combination of human-like movements with a humanoid design of the physical robot can reduce users' stress [19]. However, appearance-constrained cobots do not necessarily have the required physical design to rely on human cues or gestures to communicate the robot states like humanoid cobots [11]. Moreover, perceptual mismatch [8] and exaggerated expectations of the cobot's capabilities [7] can arise with human-inspired gestures.

Alternatively, gestures for motion-based communication may also be abstract. In this approach, novel movements, i.e., movements not based on any body language, are developed and connected with semantics to express certain robot states or feedback. In the following, we refer to those movements as abstract gestures. We assume that the trajectory of these gestures is optimal with respect to a particular efficiency metric, creating a robot-typical impression. This metric may be any of the ones commonly used in the trajectory planning literature, e.g., minimum execution time, minimum energy, or minimum jerk [20]. [4] and [21] explored the use of robot motion to express robot intent abstractly. Instead of developing dedicated gestures, they investigate how they can alter goal-oriented movements to include legible information on the robot's objective. In a similar vein, [22] utilize methods from drama and dance, i.e., Laban movement theory, to derive expressive motions for appearance-constrained, mobile robots. [23] apply the same approach to a social robot to communicate affect. [24] use passive demonstrations to communicate navigational intentions of a mobile robot. Moreover, [25] provide general recommendations for realizing expressive motions based on the robot's morphology and the desired movement. Abstract gestures have the advantage that they can be tailored to the physical capabilities of an appearance-constrained cobot. However, they may come at the expense of increased effort to learn the new gestures and their respective semantics, which can increase the cognitive load [4].
Zoomorphic gestures.
To address the issues of the communication modalities mentioned above, zoomorphic gestures, based on animal body language, pose one possible approach. To our knowledge, zoomorphic gestures have not yet been considered for RtH communication of cobots in industrial applications such as manufacturing. However, they have found applications in social robotics, including appearance-constrained robots. In this field of application, the focus lies on affective displays instead of solely communicating information on the functional state – for example, adding a dog tail to a utility robot [26] or leveraging dog-based gestures [7], [27]. Dog-based gestures can achieve high classification accuracy, especially when viewed within the task context [27]. Zoomorphic interaction is not limited to RtH communication but also extends to human-to-robot communication, e.g., by using a dog-leash interface to direct a robot [28]. The dominant use of dog body language for zoomorphic gestures can be traced back to the close relationship between dog and human, which leads to an intuitive understanding of dog gestures even by people who do not own a dog [7].

III. METHODS
We conducted a benchmark of zoomorphic gestures, abstract gestures, and light displays. We use the benchmark to explore if zoomorphic gestures provide a better intuitive understanding and user experience for appearance-constrained cobots than other visual modalities. We use the following approaches and methods to implement the three visual communication modalities and conduct the benchmark.
Robot system.
We use the collaborative robot system Panda by Franka Emika (see Fig. 1) to implement the gestures and light displays and to perform the subsequent user studies.
Fig. 1: Collaborative Assembly Task. We focus on the collaborative assembly of fuses as the scenario for gesture development and user evaluation.

Panda is a functional robot targeted at lightweight collaborative assembly tasks.
Application task and use cases.
For a concrete application focus, we defined a collaborative task from which we derived five use cases. As cobots are likely to be used in collaborative assembly, we chose an assembly task in which the cobot is responsible for the part acquisition. Concretely, the cobot acquires fuses from a storage container and delivers them to the users (see Fig. 1). The human performs part manipulation (inserting the fuse into a fuse holder) and part operation (screwing the nut to secure the fuse) [18].

Based on this specific assembly task and related studies on similar tasks, e.g., [16], we derived the following five use cases: (i) greet user, (ii) prompt user to take the part, (iii) wait for a new command, (iv) error: storage container is empty, (v) shutdown.
Modalities.
We compare three different modalities (see Fig. 2). Both gesture-based modalities aim to develop emblems [29], based either on dog body language (zoomorphic gestures) or on new, self-contained movements that convey a specific meaning or robot state (abstract gestures).
Zoomorphic gestures.
For each use case, we mirrored the intention of the robot (e.g., prompting the user to take a part) to an intention a dog may have (e.g., encouraging the owner to play). In a second step, we collected gestures that dogs use to express the respective intention by leveraging real-life interaction with dogs, online videos, and literature [26]. We then translated the dog gestures into three distinct zoomorphic gestures by jointly applying the following guidelines inspired by [2]:

• Mimicry. We mimic specific dog behavior and body language to communicate robot states.
• Exploiting structural similarities. Although the cobot is functionally designed, we exploit certain components to make the gestures more "dog-like," e.g., the camera corresponds to the dog's eyes, or the end-effector corresponds to the dog's snout.
• Natural flow. We use kinesthetic teaching and record a full trajectory to allow natural and flowing movements with increased animacy (a record-and-replay sketch follows below).

Fig. 2: Benchmark Modalities. We show the modalities evaluated in the communication benchmark for the use case "error: storage container empty". From left to right: zoomorphic gesture, abstract gesture, light display. We illustrate the movements with faded previous positions.
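Full-trajectory recording during kinesthetic teaching can be sketched as sampling the guided arm at a fixed rate and replaying the samples verbatim. The `robot` object and all of its methods below are hypothetical placeholders (this is not the Franka Emika API); the sketch only illustrates the record-and-replay idea that preserves the natural, flowing movement.

```python
# Sketch of full-trajectory recording during kinesthetic teaching. The
# `robot` object and its methods are hypothetical placeholders, not a
# real robot API; only the record-and-replay idea is illustrated.
import time

def record_trajectory(robot, duration_s, rate_hz=100.0):
    """Sample joint positions while a person guides the compliant arm."""
    trajectory, dt = [], 1.0 / rate_hz
    robot.enter_free_guiding_mode()          # hypothetical: arm freely movable
    t_end = time.time() + duration_s
    while time.time() < t_end:
        trajectory.append(robot.read_joint_positions())  # hypothetical getter
        time.sleep(dt)
    robot.exit_free_guiding_mode()           # hypothetical
    return trajectory

def replay(robot, trajectory, rate_hz=100.0):
    """Replay the demonstration verbatim, preserving its natural flow."""
    for q in trajectory:
        robot.command_joint_positions(q)     # hypothetical setter
        time.sleep(1.0 / rate_hz)
```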
Abstract gestures.
To omit the semantics of human or animal body language in designing the abstract gestures, we created new, cobot-specific gestures inspired by stereotypical robots and consumer electronics. Towards this goal, we adopted the following guidelines:

• Simplicity. The gestures should generally be simple and goal-oriented.
• Efficiency. Again, we leverage kinesthetic teaching but only record several keyframes [30]. We utilize a minimum-jerk trajectory planner to generate efficient trajectories between the keyframes (see the sketch below). At each keyframe, the robot remains for a moment before moving on, leading to a less flowing and more robot-typical movement.

As before, we create three different abstract gesture options per use case.
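The minimum-jerk profile between two configurations has a well-known closed-form quintic solution, which makes such a planner compact. The following Python sketch is our illustration, not the paper's implementation; joint values, segment durations, and dwell times are hypothetical.

```python
# Minimal sketch of a minimum-jerk keyframe interpolator (illustrative
# only; not the paper's planner). Values and timings are hypothetical.
import numpy as np

def min_jerk_segment(q0, q1, duration=1.0, dt=0.01):
    """Quintic minimum-jerk profile between two joint configurations:
    q(t) = q0 + (q1 - q0)(10 s^3 - 15 s^4 + 6 s^5) with s = t/duration,
    giving zero velocity and acceleration at both endpoints."""
    s = np.linspace(0.0, 1.0, int(duration / dt) + 1)
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    return q0 + np.outer(blend, q1 - q0)   # shape: (timesteps, n_joints)

def gesture_from_keyframes(keyframes, seg_duration=1.0, dwell=0.3, dt=0.01):
    """Chain minimum-jerk segments; the short hold at each keyframe creates
    the less flowing, robot-typical impression described above."""
    parts = []
    for q0, q1 in zip(keyframes[:-1], keyframes[1:]):
        parts.append(min_jerk_segment(q0, q1, seg_duration, dt))
        parts.append(np.tile(q1, (int(dwell / dt), 1)))  # dwell at keyframe
    return np.vstack(parts)

# Hypothetical 7-DoF keyframes, e.g., recorded via kinesthetic teaching:
keyframes = [np.zeros(7), np.full(7, 0.4), np.full(7, -0.2)]
trajectory = gesture_from_keyframes(keyframes)
```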
Light display.
The light display serves as a reference modality for the two gesture-based modalities in the benchmark. We chose light displays as a reference because they are a widespread and accepted form to communicate the state of electronic devices in general and robots in particular [10]. We applied the guidelines by [2] to the design space of Franka Emika Panda's built-in light display to make the light display as robust and intuitive as possible:

• Light color. Color corresponds to the general status: white (neutral), green (prompt for interaction), red (error), light off (robot off). We selected additional light colors (white and light off) compared to [2] (one possible encoding is sketched below).
• Blinking frequency. A higher frequency indicates a more urgently required reaction by the user.
• Similarity. Similar robot states are communicated with similar light displays, i.e., color and blinking frequency.

We recorded the different states of the cobot's built-in lights. We combined the recordings using video editing software to create the desired light codes, resulting in three options per use case.
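To make the design space concrete, the sketch below shows one possible state-to-light-code table that follows the guidelines above. Only the coarse color semantics (white = neutral, green = prompt for interaction, red = error, off = robot off) come from the guidelines; the per-use-case assignments and the blink frequencies are our illustrative assumptions, not the codes used in the study.

```python
# Illustrative state-to-light-code table following the guidelines above.
# Per-use-case assignments and blink frequencies are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class LightCode:
    color: str       # general status color
    blink_hz: float  # 0.0 = steady; higher frequency = more urgent reaction

LIGHT_CODES = {
    "greet_user":   LightCode("white", 0.5),  # assumption: gentle pulse
    "wait":         LightCode("white", 0.0),  # neutral, steady
    "prompt_user":  LightCode("green", 1.0),  # prompt for interaction
    "supply_empty": LightCode("red",   2.0),  # error, most urgent
    "shutdown":     LightCode("off",   0.0),  # robot off
}
```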
Experiments.
We conduct two user studies to reduce degrees of freedom in the design process and to examine whether zoomorphic gestures are more intuitive and attractive than other visual modalities. The first user study aims at identifying the best option to communicate the robot state out of the three designs per use case and modality. Further, we also evaluate the suitability of the design guidelines outlined above. The second user study utilizes the best option per use case and modality identified in the first study to assess the three modalities regarding intuitive understanding, user experience, and stated preference.

IV. STUDY I: IDENTIFYING BEST DESIGN OPTIONS
To identify which of the three design options per use case is the best way to communicate the robot state with a given modality, we utilized an online user study. As the study design does not require direct interaction with the cobot, an online study is a considerably more efficient tool to collect as many responses as possible compared to in-person studies.
Population.
We recruited a convenience sample of 93 participants (39 male, 52 female, 1 other, 1 N/A; average age M = 27.… years, SD = 8.…). The participants mainly had a non-technical background (63 non-technical, 26 technical, 4 other) and, on average, a positive attitude towards robots in general (M = 3.…, SD = 0.…, rated on a scale from 1 (very negative) to 5 (very positive)).

Experimental procedure. After giving informed consent, we provided information on the assembly task (description and video, see Fig. 1 for a screenshot) to the participants. For each modality and use case, the participants viewed three videos showing the three design options. We then asked the participants which option fits best to communicate the given robot state. After completing all five use cases within one modality in a randomized order, the participants evaluated the robot communicating with the given modality. We used the Godspeed questionnaire [31] and asked for free-text associations the participants had with the shown modality. The Godspeed questionnaire was developed as a standardized measure for the impression a robot has on the user and considers five dimensions (see Fig. 4) [31]. Although other measures exist, the Godspeed questionnaire is widely utilized and provides sufficient statistical reliability and validity [31], [32]. The blocks of each modality were randomized as well to avoid anchoring and sequence effects [33]. The online study concluded with a short demographic questionnaire.
Results.
The results in Fig. 3 indicate that for zoomorphic gestures, a clear best option generally exists. The ambiguity is higher in the case of abstract gestures and light displays. For these modalities, several use cases have two or even all three design options selected similarly often, e.g., use case "wait".

Fig. 3: Choice Distributions. For each use case and modality we designed and evaluated three distinct options.

We analyze the evaluation of the modalities with the Godspeed questionnaire with one-way analyses of variance (ANOVA), see Fig. 4. We report the F-statistic (F), the p-value (p), and the effect size (η²G). The results indicate that zoomorphic gestures score significantly higher than the other modalities in the dimensions anthropomorphism (F(2, …) = …, p < …, η²G = 0.…), animacy (F(2, …) = …, p < …, η²G = 0.…), and likeability (F(2, …) = …, p < …, η²G = 0.…). In the case of perceived intelligence, both gesture-based modalities are rated significantly higher than the light displays (F(2, …) = …, p = 0.…, η²G = 0.…). Zoomorphic gestures are rated significantly lower in regards to perceived safety than the alternatives (F(2, …) = …, p < …, η²G = 0.…).

Fig. 4: Godspeed Ratings. Boxplot of the ratings of the three modalities over the five dimensions of the Godspeed questionnaire. We also report the Bonferroni-corrected pairwise comparisons: ns: p > .05, *: p < .05, **: p < .01, ***: p < .001, ****: p < .0001.

The participants' free-text answers reveal strong associations of the zoomorphic gestures with dogs (named by 28 participants) and with other animals (20) or humans (15). For the abstract gestures, strong associations with robots (21) or machines (10) but only rarely with dogs (5) were expressed. Cobots communicating via light displays were associated with traffic/control lights (36) or electronics in general (20), but hardly with dogs (0) or any other living beings.
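As an illustration of the reported analysis style (not the authors' code), the following Python sketch runs a one-way ANOVA across the three modalities with Bonferroni-corrected pairwise t-tests and an eta-squared effect size. The ratings are simulated placeholders, and the sketch treats the groups as independent, whereas the study's modality blocks were within-subject, so a repeated-measures ANOVA may be the closer match.

```python
# Sketch of the analysis style: one-way ANOVA plus Bonferroni-corrected
# pairwise t-tests on simulated, hypothetical Godspeed ratings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ratings = {
    "zoomorphic": rng.normal(3.8, 0.6, 93),  # hypothetical scores
    "abstract":   rng.normal(3.2, 0.6, 93),
    "light":      rng.normal(3.0, 0.6, 93),
}
groups = list(ratings.values())

f_stat, p_val = stats.f_oneway(*groups)  # omnibus F and p

# Effect size: in a one-way between-subjects design, generalized eta
# squared reduces to SS_between / SS_total.
allv = np.concatenate(groups)
ss_between = sum(len(g) * (g.mean() - allv.mean()) ** 2 for g in groups)
eta2_g = ss_between / ((allv - allv.mean()) ** 2).sum()
print(f"F(2, {len(allv) - 3}) = {f_stat:.2f}, p = {p_val:.2g}, eta2_G = {eta2_g:.2f}")

# Bonferroni correction: multiply each pairwise p by the number of tests.
names = list(ratings)
for i in range(3):
    for j in range(i + 1, 3):
        t, p = stats.ttest_ind(groups[i], groups[j])
        print(f"{names[i]} vs {names[j]}: p_bonf = {min(p * 3, 1.0):.3g}")
```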
Discussion.
The different levels of embedded semantics may explain the varying degree of ambiguousness over the best design option for a particular modality. In the case of zoomorphic gestures, semantics are transferred from dog body language, which tends to be easy to understand both by dog owners and people with little to no experience with dogs [7].
Given that abstract gestures and light displays do not have an apparent semantic connection, participants may have chosen the best option per use case and modality more along personal preference ("What looks nice or appealing?").

The Godspeed ratings, along with the voiced associations, confirm the functionality of our proposed design guidelines. For example, zoomorphic gestures received high ratings for anthropomorphism and animacy along with frequent dog associations. Abstract gestures were rated less anthropomorphic and animate, and were frequently associated with robots and machines. Thus, the guidelines are suitable for integrating typical dog characteristics into the zoomorphic gestures and help create robot-typical abstract gestures. The participants' comments indicate that the light blinking frequency was not always understood as an indication of urgency. Instead, some participants interpreted blinking as a processing state of the cobot during which the user needed to wait. Our recommendation for future studies is to review the use of blinking and its frequency. The modalities were rated similarly regarding perceived intelligence. This finding may be explained by the fact that the cobot's general reaction was independent of the modality. For example, the cobot always reported an error over an empty storage bin. In terms of perceived safety, zoomorphic gestures are perceived as less safe than the alternatives, although the effect size is small (η²G = 0.…). Nonetheless, it is not surprising that zoomorphic gestures cause more anxious, agitated, and surprised reactions than the alternatives, as zoomorphic gestures are less robot-typical and more extensive. When viewed globally, zoomorphic gestures are still rated above the scale middle (M = 3.…, SD = 1.…, see Fig. 4) and, hence, achieve sufficient subjective safety ratings.

Overall, the results of the Godspeed questionnaire indicate that the modality used to communicate the robot state shapes the user's perception of the robot, but to different extents. The modality has substantial effects on anthropomorphism and animacy. Likability, perceived intelligence, and perceived safety are less affected. Additionally, this study identifies the best option of the design alternatives per use case and modality, a prerequisite for the following benchmark study.

V. STUDY II: VISUAL COMMUNICATION BENCHMARK
After identifying the best design option per use case for each modality, we performed a second user study – the visual communication benchmark.
Population.
In total, 214 participants (116 male, 96 female, 2 other; average age M = 40.… years, SD = 12.…) were recruited for the online study via an online panel. The participants mainly had a non-technical background (110 non-technical, 79 technical, 25 other) and, on average, a positive attitude towards robots in general (M = 4.…, SD = 0.…, rated on a scale from 1 (very negative) to 5 (very positive)). The sample included participants with low prior experience (81 participants with no practical experience) and high prior experience (60 participants with practical experience with at least three robot types) with robots from five different domains (industry, household, entertainment, service, healthcare).

Experimental procedure. After giving informed consent and providing relevant demographic information, information on the assembly task was provided in the same way as in the first study (description and video). For each modality and use case, the participants viewed the video showing the best option identified in Study I. The participants were asked which option from a predefined list best describes what the cobot wanted to express in each video. The predefined list remained the same for each use case and all modalities. It comprised the five use cases plus two additional ones (cobot wants to check the assembled product, cobot is malfunctioning). The additional use cases prevent an obvious one-to-one mapping. After completing all five use cases within one modality in a randomized order, the participants were asked to rate the robot communicating with the given modality regarding user experience (intuitive use, joy of use, attractiveness [34]) and intent to use [35]. The blocks of each modality were randomized as well to avoid anchoring and sequence effects [33]. The online study concluded by asking the participants to state their preference for one of the three modalities.
Results.
To evaluate the understandability of the different modalities, we consult the classification accuracy [9] (see Fig. 5). On average, the classification accuracy reaches 58.0% for zoomorphic gestures, 58.5% for abstract gestures, and 38.8% for light displays. A one-way ANOVA with Bonferroni-adjusted post-hoc tests (F(2, …) = …, p < …, η²G = 0.…) shows that, on average, significantly more use cases are correctly classified for zoomorphic and abstract gestures (M = 2.… for both modalities) than for light displays (M = 1.…).

In regards to the subjective evaluation, zoomorphic gestures are rated better (closer to the scale maximum) across the considered dimensions than the alternatives (see Fig. 6). Zoomorphic gestures are significantly more attractive (F(2, …) = …, p < …, η²G = 0.…), provide significantly more joy when using (F(2, …) = …, p < …, η²G = 0.…), and are significantly more intuitive to use (F(2, …) = …, p < …, η²G = 0.…). The intent to use is significantly higher for both gesture-based modalities than for the light display (F(2, …) = …, p < …, η²G = 0.…).

Asked explicitly which modality they prefer for RtH communication in the given scenario, 56.1% of the participants chose zoomorphic gestures for reasons such as "it is the most logical," "it is the most human," "appears likable". 21.1% preferred abstract gestures ("easiest to understand," "clear and simple, zoomorphic gestures are too much," "modern"), while the remaining 16.8% preferred light displays ("personal preference," "colors are more logical," "easy to understand once you know the light codes").
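For concreteness, a toy sketch of the classification-accuracy metric used above follows; all responses in it are hypothetical illustrations, not study data. The 7-item option list mirrors the procedure described earlier (five use cases plus two distractors), which puts the chance level at 1/7 ≈ 14.3%.

```python
# Toy sketch of the classification-accuracy metric (hypothetical
# responses, not study data). Participants pick one option from the
# 7-item list, so the chance level is 1/7 ≈ 14.3%.
USE_CASES = ["greet", "supply_empty", "wait", "prompt", "shutdown"]
DISTRACTORS = ["check_product", "malfunctioning"]  # never the correct answer
OPTIONS = USE_CASES + DISTRACTORS  # 7 answer options

def classification_accuracy(answers, intended):
    """Share of participants whose chosen option matches the intended state."""
    return sum(a == intended for a in answers) / len(answers)

# Five hypothetical participants judging the "wait" video:
print(classification_accuracy(
    ["wait", "wait", "malfunctioning", "wait", "greet"], "wait"))  # 0.6
```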
Discussion.
Our results indicate that our zoomorphic and abstract gestures are more intuitively understood than our light displays. Other work investigating zoomorphic gestures generally reports higher accuracies (up to 75%, e.g., [27]) than our results. However, these studies investigate the communication of affective states, not functional states. As zoomorphic gestures are emblems, they are naturally more closely related to affective information [29]; hence, achieving high classification accuracy may be easier. When considering the use cases individually, space for improvement exists, especially for the use cases "greet user" and "wait for new command" (see Fig. 5). Nonetheless, all robot states are identified above the chance level for each modality.

Fig. 5: Classification Accuracy per Modality and Use Case. The classification accuracy is the number of correct answers divided by the total number of participants. The chance level is at 1/7 ≈ 14.3%, as indicated by the black lines.

Fig. 6: User Experience Ratings. Boxplot of the ratings of the three modalities over the four user experience dimensions. We also report the Bonferroni-corrected pairwise comparisons: ns: p > .05, *: p < .05, **: p < .01, ***: p < .001, ****: p < .0001.

The subjective evaluation regarding user experience and intent to use is favorable toward zoomorphic gestures. Zoomorphic gestures are significantly more attractive and intuitive and provide more joy when using. Compared to zoomorphic gestures, the abstract gestures achieved similarly high accuracies. Given that the participants had no prior training on any of the modalities, this suggests that our abstract gestures may not entail a higher learning effort than the other modalities, contrary to our expectations. Yet, the majority of participants stated a preference for zoomorphic gestures (56.1%) over abstract ones (21.1%). The clear user preference for movement-based communication mirrors findings from previous studies [25].

Overall, the sample used for this study included participants with different levels of experience with robots. However, the majority had limited experience with industrial cobots, as very experienced users are hard to find. Therefore, our findings indicate that zoomorphic gestures may be both an intuitive and attractive modality, especially for inexperienced users. Similar to results by [4], comments made by some participants indicate that zoomorphic gestures may be perceived as annoying by more experienced users. The reason for this may be that they are more time-consuming and elaborate than the alternative modalities. The longer execution times of the gestures may impede the workflow of assembly and, thus, may be less suitable for work processes with strict cycle times.

VI. LIMITATIONS
Our approach and methods also face some limitations. To our knowledge, this work is the first to investigate zoomorphic gestures for appearance-constrained cobots. Therefore, we have followed an exploratory approach and consciously omitted safety aspects from consideration. However, especially for movement-based communication, safety is critical and requires extensive consideration. These concerns need to be addressed in future studies.

Concerning our evaluation methods, our exploratory approach has some drawbacks. We worked with opportunity samples to recruit a large and diverse sample for both user studies to increase the generalizability of the results. The same studies conducted only with highly experienced users may yield different results.

Further, the videos we produced of the different design options included less contextual information for the light displays than the videos for the gesture-based modalities. In future studies, we advise maintaining an equal level of contextual information to reduce the possible influence of confounders. Additionally, we suggest controlling for possible covariates, such as the total duration of the gestures. We also did not fully exhaust the design space of light displays in general. Instead, we built design options from existing guidelines under consideration of the built-in hardware of the utilized cobot. On a similar note, our design guidelines to develop different gesture and light options offer some leeway. Therefore, we designed Study I to reduce arbitrariness within our design choices. Nonetheless, further studies with different design options and use cases are required to support the generalizability of our results.

Finally, we recommend revising the benchmark study questionnaire to include the participants' certainty about their classification. This additional information on certainty may provide helpful insights.

VII. CONCLUSIONS
This paper's main objective is to explore the suitability of zoomorphic gestures for appearance-constrained cobots in industry applications. The two user studies indicate that the proposed guidelines for developing zoomorphic and abstract gestures are suitable. Participants understood both gesture-based modalities intuitively, with an average classification accuracy of 58%. Additionally, the subjective evaluation and stated preference are in favor of zoomorphic gestures.

Future research avenues include gesture development for additional use cases, the consideration of safety as outlined in [36], and evaluation with experienced users. Further, zoomorphic gestures, which mimic the social dynamic between dogs and humans, may present a compelling option to reduce robot abuse by providing relevant social mechanisms [37], [38], [39]. In a deployed system, zoomorphic gestures could also be combined with other visual and non-visual modalities, which were not considered in this paper as they represent orthogonal research directions. When considering unimodal communication, zoomorphic gestures may be especially suitable for novice users by lowering inhibitions. Further, zoomorphic gestures may also be made available in libraries provided in robot programming platforms to personalize the human-robot interaction.

ACKNOWLEDGEMENTS
We would like to thank Elie Aljalbout and Konstantin Ritt for their technical support and Stefan Groß for supporting the preparation of the video materials. We would also like to thank r/aww for providing insights on dog body language.
REFERENCES

[1] P. Barattini, C. Morand, and N. M. Robertson, "A proposed gesture set for the control of industrial collaborative robots," in RO-MAN, 2012.
[2] M. Fulton, C. Edge, and J. Sattar, "Robot communication via motion: Closing the underwater human-robot interaction loop," in ICRA, 2019.
[3] K. Baraka and M. M. Veloso, "Mobile service robot state revealing through expressive lights: formalism, design, and evaluation," International Journal of Social Robotics, 2018.
[4] M. C. Aubert, H. Bader, and K. Hauser, "Designing multimodal intent communication strategies for conflict avoidance in industrial human-robot teams," in RO-MAN, 2018.
[5] L. Onnasch, X. Maier, and T. Jürgensohn, "Mensch-Roboter-Interaktion – eine Taxonomie für alle Anwendungsfälle," baua: Fokus, Bundesanstalt für Arbeitsschutz und Arbeitsmedizin, 2016.
[6] C. L. Bethel and R. R. Murphy, "Survey of non-facial/non-verbal affective expressions for appearance-constrained robots," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2007.
[7] D. S. Syrdal, K. L. Koay, M. Gácsi, M. L. Walters, and K. Dautenhahn, "Video prototyping of dog-inspired non-verbal affective communication for an appearance constrained robot," in RO-MAN, 2010.
[8] J. Kätsyri, K. Förger, M. Mäkäräinen, and T. Takala, "A review of empirical evidence on different uncanny valley hypotheses: support for perceptual mismatch as one road to the valley of eeriness," Frontiers in Psychology, 2015.
[9] A. Deshmukh, B. Craenen, M. E. Foster, and A. Vinciarelli, "The more I understand it, the less I like it: The relationship between understandability and Godspeed scores for robotic gestures," in RO-MAN, 2018.
[10] C. Harrison, J. Horstman, G. Hsieh, and S. Hudson, "Unlocking the expressivity of point lights," in CHI, 2012.
[11] E. Cha, T. Trehon, L. Wathieu, C. Wagner, A. Shukla, and M. J. Matarić, "ModLight: designing a modular light signaling tool for human-robot interaction," in ICRA, 2017.
[12] R. T. Chadalavada, H. Andreasson, R. Krug, and A. J. Lilienthal, "That's on my mind! robot to human intention communication through on-board projection on shared floor space," in ECMR, 2015.
[13] V. V. Unhelkar, H. C. Siu, and J. A. Shah, "Comparative performance of human and mobile robotic assistants in collaborative fetch-and-deliver tasks," in HRI, 2014.
[14] K. Baraka, A. Paiva, and M. Veloso, "Expressive lights for revealing mobile service robot state," in Robot 2015: Second Iberian Robotics Conference, 2016.
[15] D. Szafir, B. Mutlu, and T. Fong, "Communicating directionality in flying robots," in HRI, 2015.
[16] I. El Makrini, K. Merckaert, D. Lefeber, and B. Vanderborght, "Design of a collaborative architecture for human-robot assembly tasks," in IROS, 2017.
[17] F. De Pace, F. Manuri, A. Sanna, and D. Zappia, "An augmented interface to display industrial robot faults," in AVR, 2018.
[18] B. Gleeson, K. MacLean, A. Haddadi, E. Croft, and J. Alcazar, "Gestures for industry: intuitive human-robot communication from human observation," in HRI, 2013.
[19] A. M. Zanchettin, L. Bascetta, and P. Rocco, "Acceptability of robotic manipulators in shared working environments through human-like redundancy resolution," Applied Ergonomics, 2013.
[20] A. Gasparetto, P. Boscariol, A. Lanzutti, and R. Vidoni, "Trajectory planning in robotics," Mathematics in Computer Science, 2012.
[21] A. D. Dragan, S. Bauman, J. Forlizzi, and S. S. Srinivasa, "Effects of robot motion on human-robot collaboration," in HRI, 2015.
[22] H. Knight, "Expressive motion for low degree-of-freedom robots," 2016.
[23] K. Takahashi, M. Hosokawa, and M. Hashimoto, "Remarks on designing of emotional movement for simple communication robot," 2010.
[24] R. Fernandez, N. John, S. Kirmani, J. Hart, J. Sinapov, and P. Stone, "Passive demonstrations of light-based robot signals for improved human interpretability," in RO-MAN, 2018.
[25] G. Venture and D. Kulić, "Robot expressive motions: a survey of generation and evaluation methods," ACM Transactions on Human-Robot Interaction (THRI), 2019.
[26] A. Singh and J. E. Young, "A dog tail for utility robots: exploring affective properties of tail movement," in IFIP Conference on Human-Computer Interaction, 2013, pp. 403–419.
[27] M. Gácsi, A. Kis, T. Faragó, M. Janiak, R. Muszyński, and Á. Miklósi, "Humans attribute emotions to a robot that shows simple behavioural patterns borrowed from dog behaviour," Computers in Human Behavior, 2016.
[28] J. E. Young, Y. Kamiyama, J. Reichenbach, T. Igarashi, and E. Sharlin, "How to walk a robot: A dog-leash human-robot interface," in RO-MAN, 2011.
[29] M. L. Knapp, Nonverbal Communication in Human Interaction. Holt, Rinehart and Winston, 1972.
[30] B. Akgun, M. Cakmak, J. W. Yoo, and A. L. Thomaz, "Trajectories and keyframes for kinesthetic teaching: A human-robot interaction perspective," in HRI, 2012.
[31] C. Bartneck, D. Kulić, E. Croft, and S. Zoghbi, "Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots," International Journal of Social Robotics, 2009.
[32] A. Weiss and C. Bartneck, "Meta analysis of the usage of the Godspeed questionnaire series," in RO-MAN, 2015.
[33] E. Fanning, "Formatting a paper-based survey questionnaire: Best practices," Practical Assessment, Research, and Evaluation, 2005.
[34] M. Schrepp and J. Thomaschewski, "Handbook for the modular extension of the User Experience Questionnaire," in Mensch & Computer, 2019.
[35] D. Harborth and S. Pape, "German translation of the Unified Theory of Acceptance and Use of Technology 2 (UTAUT2) questionnaire," SSRN Journal, 2018.
[36] ISO, "TS 15066:2016: Robots and robotic devices – collaborative robots," International Organization for Standardization, 2016.
[37] C. Bartneck and J. Hu, "Exploring the abuse of robots," Interaction Studies, 2008.
[38] H. Lucas, J. Poston, N. Yocum, Z. Carlson, and D. Feil-Seifer, "Too big to be mistreated? Examining the role of robot size on perceptions of mistreatment," in RO-MAN, 2016.
[39] X. Z. Tan, M. Vázquez, E. J. Carter, C. G. Morales, and A. Steinfeld, "Inducing bystander interventions during robot abuse with social mechanisms," in