"Alexa, Can I Program You?": Student Perceptions of Conversational Artificial Intelligence Before and After Programming Alexa
JESSICA VAN BRUMMELEN, VIKTORIYA TABUNSHCHYK, and TOMMY HENG, Massachusetts Institute of Technology, USA
Fig. 1. Example conversations from students' Alexa skill designs, including a "Meme Maker" and "Language Game".
Growing up in an artificial intelligence-filled world, with Siri and Amazon Alexa often within arm's—or speech's—reach, could have significant impact on children. Conversational agents could influence how students anthropomorphize computer systems or develop a theory of mind. Previous research has explored how conversational agents are used and perceived by children within and outside of learning contexts. This study investigates how middle and high school students' perceptions of Alexa change through programming their own conversational agents in week-long AI education workshops. Specifically, we investigate the workshops' influence on student perceptions of Alexa's intelligence, friendliness, aliveness, safeness, trustworthiness, human-likeness, and feelings of closeness. We found that students felt Alexa was more intelligent and felt closer to Alexa after the workshops. We also found strong correlations between students' perceptions of Alexa's friendliness and trustworthiness, and safeness and trustworthiness. Finally, we explored how students tended to more frequently use computer science-related diction and ideas after the workshops. Based on our findings, we recommend designers carefully consider personification, transparency, playfulness and utility when designing CAs for learning contexts.

CCS Concepts: • Applied computing → Interactive learning environments; • Social and professional topics → K-12 education; Children; • Human-centered computing → Natural language interfaces; User interface programming; • Computing methodologies → Intelligent agents.

Additional Key Words and Phrases: child-agent interaction, conversational agents, voice user interfaces, digital assistants, smart speakers, AI education, theory of artificial mind, constructionism
ACM Reference Format:
Jessica Van Brummelen, Viktoriya Tabunshchyk, and Tommy Heng. 2021. "Alexa, Can I Program You?": Student Perceptions of Conversational Artificial Intelligence Before and After Programming Alexa. 1, 1 (February 2021), 16 pages.
Authors' address: Jessica Van Brummelen, [email protected]; Viktoriya Tabunshchyk, [email protected]; Tommy Heng, [email protected], Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, USA, 02139.

With children asking Google to buy them more toys [6], cheating on homework with Alexa [53], and playing voice-based pranks on parents [6], conversational agents (CAs) have potential to not only influence children's play—but also how they grow and develop [55]. For instance, researchers theorize that interacting with an agent can change people's understanding of agency concepts and their Theory of Mind (ToM) [21, 29, 54]. Other research has shown engaging with CAs can change people's behavior [1, 9, 20, 49] and have positive effects on information retention [4]. Considering the impact agents have on human understanding and behavior, how prevalent these systems are becoming [43, 55], and how opaque their operations can be to humans [13, 34, 42], a growing body of research suggests it is important for people of all ages to understand AI [32, 42, 56]. Furthermore, researchers are investigating how to best teach AI literacy concepts to students, including those as young as preschoolers [61]. For instance, one study leverages 3rd-5th grade students' familiarity with CAs to teach AI literacy concepts [31]. Other works utilize AI ethics discussions [12], interactive, collaborative learning environments [60], and gesture recognition tools [66] to engage students in learning AI. In this work, we use a constructionist approach, in which students program their own CAs, to teach AI concepts to 6-12th grade students [40, 59]. Another aspect of AI education research includes students' perception of AI systems themselves, including personification of such systems, emotions the systems evoke, and students' conceptions of how the systems work. For example, one study examines preschool- and kindergarten-aged students' perceptions of "thinking machines" during an AI learning activity, emphasizing the importance of early childhood AI literacy and ToM development [61].
Other studies investigate children's and families' perceptions of CAs [34], how interaction modalities influence children's perceptions of CAs [13], children's perceptions of maze-solving agents' intelligence [14], and whether children categorize CAs as animate objects or artifacts [65]. Yet other studies emphasize the importance of adults' perceptions and conceptions of AI, especially in decision-making and policy [22, 27, 50]. To our knowledge, few studies investigate middle and high school students' perceptions of AI [36, 45], despite teenage years being critical in ethical perspective development [10], a key component of AI literacy [56]. Furthermore, to our knowledge, no studies investigate how middle and high school students' perceptions of CAs change through programming CAs. We posit that understanding students' perceptions of and feelings towards such agents can help researchers better facilitate student learning. For instance, feelings of closeness with teachers have been shown to affect students' academic performance [2, 5, 63], which may also be the case when agents take on the teacher role. Another study indicates that the avatar used for pedagogical feedback-giving agents affects students' emotional attachment and satisfaction with the learning process [48], alluding to the potential for students' perception of agents to affect learning. Furthermore, research suggests understanding students' preconceptions and mental models can improve teaching [15, 51]. By understanding students' feelings and conceptions about agents, we expect we can create better digital learning environments. This study investigates 6-12th grade students' perceptions and conceptions of Amazon Alexa in a learning environment described in [59]. In contrast to [59], which investigates students' AI literacy, this study investigates how a programming and learning intervention, in which students develop their own CAs, affects student perspectives of AI. Our main research question is as follows:
RQ: How does building Alexa skills and learning about conversational AI in a remote workshop affect students' perceptions and conceptions of AI, conversational AI, and Alexa?
By better understanding students' perspectives on agents and how these perspectives can be changed, we contribute to ongoing research to develop more human-centered, socially useful agents—especially for K-12 education. To this end, we present four design considerations for K-12 education agents and development tools based on our findings. Specifically, we look at students' conceptions of how AI and conversational AI work, and perceptions of Alexa in terms of friendliness, human-likeness, aliveness, safeness, trustworthiness, intelligence (generally and relative to themselves), and how close they feel to Alexa.
The working definition of AI in research has changed over the years—from having a sharp focus on logical, symbolic representations of concepts and actions to a marked concentration on modelling extensive interconnected computation machines called "neural networks" [64]. In the media, AI has been depicted in many different ways—as killer robots, android caretakers, and superintelligent, disembodied voices [19]. Despite the somewhat frivolous portrayals, people's understanding of AI and how it works has serious implications—from policy-making to day-to-day assessments of whether a self-driving vehicle is safe to trust one's life with [22, 35]. ToM research in AI investigates how to develop AI systems with human-like cognition, as well as how people understand AI as agents with mental states [17]. In this research, we focus on the latter, or "Theory of Artificial Mind" (ToAM) [54]. Understanding people's conceptions of AI, including anthropomorphization of AI technology, conceptions of specific technologies, like CAs, and emotional reactions to AI systems, is important for teaching AI literacy. Through better understanding students' perceptions and ToAM, we can likely better teach students about AI [21] and therefore better reach our research community's goal of equipping people to live in an AI-filled world [22, 32, 56]. Children have been observed to anthropomorphize AI systems [13, 14, 65]; however, their understanding of the actual "aliveness" of such systems is inconsistent across populations and seems to vary with age [24, 47, 61, 65]. Other anthropomorphic aspects of AI systems have also been investigated for different purposes. For instance, a number of studies examine how children (3-10 years old) perceive agents' intelligence—generally and relative to their own intelligence—with the purpose of inspiring critical thinking [13, 14].
Another study investigates 5-6 year old children's perception of CAs' friendliness, aliveness, trustworthiness, safeness, and funniness, in addition to intelligence, to develop CA design recommendations [34]. Researchers investigated similar anthropomorphic aspects, including how sociable, mutual-liking, attractive, human, close, and intelligent children (10-12 years old) perceive agents to be, in order to improve learning interventions [36]. We investigate related anthropomorphic aspects in middle to high school students' ToAM. Research also shows that interaction with AI artifacts can influence people's ToM and perceptions of AI. For instance, observing and constructing robot behavior influenced students' ToAM, enabling them to better explain the AI systems' behavior [54]. Another study showed that interacting with a pedagogical agent influenced students' understanding of the key ToM concept of agency, allowing them to better predict behavior. The same study linked students' prior understanding of agency to better learning [21]. In this paper, we investigate how AI literacy workshops involving programming a CA influence students' ToAM, including perceptions of anthropomorphic qualities and understanding of AI behavior.
Many studies investigate how CAs can best embody the teaching role [11, 28, 38, 41]. Some such studies show that interacting with agents can positively affect learning and students' ToM [11, 21]. In this study, however, we take a constructionist approach, and instead of placing agents in the teaching role, we empower students to learn about AI through developing their own CAs [40, 59]. Constructionism has been shown to be effective in teaching K-12 students AI concepts. For example, researchers have taught students AI ethics through constructing paper prototypes [3], machine learning (ML) concepts through developing gesture-based models [66], and AI programming concepts through creating projects with AI cloud services [23]. Our study teaches students Long and Magerko's AI literacy competencies through developing CAs [32, 59].
Certain studies specifically investigate whether constructionist activities change student conceptions and perceptions of AI agents. For example, a series of studies showed constructing a robot's behavior enabled kindergarten students to conceptualize an agent's rule-based behavior [37], shifted students' perspectives from technological to psychological [30], and shifted students' language from anthropomorphic to technological [26]. Through an activity with the same constructionist programming environment, it was shown that 5- and 7-year-old students' conceptions of ToAM developed, and the students were able to better understand robots' behavior [54]. A study with programming and ML training activities showed 4-6-year-old students' understanding of ToM and perceptions of robots changed throughout the experiment [61]. In this work, we investigate whether students' ToAM and perceptions of AI in middle and high school change through a constructionist CA programming activity and workshop.
We conducted our workshops with 47 students separated into two groups of 12 and 35. For each group, the students' teachers observed the workshops and provided feedback to the three teaching researchers [59]. The teachers were recruited through an Amazon Future Engineers call to Title I schools. Each teacher chosen for the workshops was asked to recruit 5 or 6 of their students. We targeted Title I schools because they have high concentrations of children from low-income families [52], and we wanted to provide opportunities for enrichment that these students may not normally receive. We developed middle and high school level AI curriculum and thus targeted middle and high school students. The students' mean age was 14.78 (range 11-18, SD=1.91), with 19 self-identifying as male, 27 self-identifying as female, and 1 student who did not complete the questionnaire.
To accomplish our goal of studying student perceptions of conversational AI before and after programming Alexa, we developed an interface within MIT App Inventor for creating Alexa skills [57]. This lowered the barrier to entry for programming CAs, as the students could use visual block-based coding to develop skills. As described in [59], once a student creates a skill in the interface, the backend translates their blocks into JSON and JavaScript code to be sent to Alexa's API to build and enable the skill on the student's Amazon Developer account. This allows the students to interact and have a conversation with the Alexa skill either on an Alexa-enabled device (e.g., iOS Alexa App or Amazon Echo) or an online simulated Alexa device (e.g., MIT App Inventor Alexa Testing simulator or Amazon Developer Console).
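To give a sense of the kind of artifact such a backend emits (this is an illustrative sketch, not the actual MIT App Inventor implementation), an Alexa skill's conversational behavior is declared in an interaction-model JSON document listing an invocation name and sample utterances per intent; the skill content below is invented:

```python
import json

# Hypothetical sketch: a minimal Alexa interaction model like the one a
# block-to-JSON backend might generate for a simple greeting skill. The
# overall shape (interactionModel/languageModel/intents) follows Amazon's
# public skill-package format; the invocation name and samples are made up.
def make_greeting_model(invocation_name: str) -> dict:
    return {
        "interactionModel": {
            "languageModel": {
                "invocationName": invocation_name,
                "intents": [
                    # Built-in intents every skill is expected to handle.
                    {"name": "AMAZON.CancelIntent", "samples": []},
                    {"name": "AMAZON.StopIntent", "samples": []},
                    # A custom intent triggered by greeting phrases.
                    {
                        "name": "GreetingIntent",
                        "samples": ["hello there", "hi alexa", "good morning"],
                    },
                ],
            }
        }
    }

model = make_greeting_model("greeting demo")
print(json.dumps(model, indent=2))
```

A document like this, together with backend handler code, is what the skill-building API consumes to enable a skill on a developer account.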
This section provides a brief overview of the learning intervention, which is described in depth in [59]. The intervention occurred over two sessions, which both involved five consecutive days of 2.5-hour-long Zoom sessions. The first day began with an introduction to the MIT App Inventor interface [62] to accustom students to block-based coding. Then the students were given a chance to interact freely with Alexa, writing down the questions they asked during the interaction. In the first week, students were each provided with a complimentary Echo Dot. This was not feasible for the second week of workshops due to an increased number of students, so students either used the Alexa app on their mobile devices, an online Alexa simulator (within MIT App Inventor or otherwise), or Alexa devices they previously owned. Overall, 19 students used an Alexa device, 17 used the Alexa app, 10 used an online simulator, and one did not specify.

The second day involved introducing students to key AI and conversational AI concepts, discussing AI ethics, and completing a tutorial walk-through to create an Alexa skill that would respond to basic greetings. On the third day, students completed a tutorial to develop a calculator skill, in which Alexa could be asked, "What's number A multiplied by number B?", or something similar. Next, we taught students about ML in more depth, including discussing the difference between a rule-based CA developed on the first day and the ML-based CAs developed on the second and third days. Finally, students engaged in an AI text generation activity. On the fourth day, students developed a skill that enabled Alexa to read out text entered into MIT-App-Inventor-developed mobile apps. Students then brainstormed ideas for skills for their personal projects. Students spent the final day developing their projects and presenting them to the rest of the class.
Various questionnaires inspired by the perception-of-AI questions in [14] and [34] were given to students during the learning intervention. On the first day, students recorded their interactions with Alexa, impressions of the CA, and demographic information. At the start of the second day, students completed a questionnaire assessing their initial feelings towards and understanding of Alexa, AI, and conversational AI. The questions were divided into two sets, which we refer to as the Persona and Conception questions.

The Persona questions assessed students' sentiments about Alexa on a 7-point Likert scale. The questions stated, "Alexa is..." followed by "intelligent", "friendly", "alive", "safe", "trustworthy", "human-like", and "smarter than me". The final Persona question asked how close students felt to Alexa using the Inclusion of the Other in the Self scale [18]. The Conception questions assessed students' understanding of AI and conversational AI by asking, "Describe in your own words what AI is" and "Describe in your own words what conversational AI is (e.g., chatbots, like Alexa or Google Home, use conversational AI)". At the end of the final day, students completed the Persona and Conception questions again. Additional questionnaires were given at the end of the second, third, and fourth days to assess specific AI literacy competencies, as discussed and analyzed in [59].
This study builds on the study presented in [59]. Thus, certain data analyzed in this study (e.g., demographics) is necessarily the same; however, this study focuses on data not analyzed in [59], including the questionnaire responses to the Persona questions and students' reported interactions with Alexa. The responses to the Conception questions were analyzed in both studies, however using different methods and through different lenses. This study investigates students' conceptions of AI through a word frequency analysis as well as analyses of changes in the number of tags (as described below). The study in [59] assessed students' AI literacy before and after the learning intervention.

To investigate the responses to qualitative questions, a reflexive, open-coding approach to thematic analysis [7] was performed by three researchers. The three researchers independently completed familiarization and code-generation stages. After several discussions, the three researchers came to a consensus on codes for the questionnaire responses. Codes and respective representative quotations can be found in [58]. Researchers generally constructed codes inductively or with respect to ideas from literature, including the Big AI Ideas [56]. It is important to note that responses often involved multiple ideas and were thus tagged with more than one code.

For the quantitative questions (e.g., Likert-scale Persona questions) asked on both pre- and post-questionnaires, the Wilcoxon Signed-Rank Test was employed to measure changes. Additionally, we used the Kendall Tau method to create pairwise correlation matrices. We analyzed the correlation coefficients using Cohen (2013)'s definition of correlation effect strength for behavioral and educational psychology [8]. To test the validity of the strength of the coefficients, we compared Kendall Tau p-values to an alpha of 0.05.
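As a minimal sketch of the correlation step, Kendall's tau counts concordant versus discordant pairs of ratings. The simplified tau-a version below ignores tie corrections, which matter for Likert data and which a tie-corrected implementation such as `scipy.stats.kendalltau` handles properly; the example ratings are invented:

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Simplified Kendall tau-a: (concordant - discordant) / total pairs.

    Illustrative only: tied pairs (common in 7-point Likert data) are
    counted as neither concordant nor discordant here; a real analysis
    should use a tie-corrected tau-b, e.g. scipy.stats.kendalltau.
    """
    assert len(x) == len(y)
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Invented example: "friendly" vs. "trustworthy" ratings on a 7-point scale.
friendly = [7, 5, 6, 3, 4]
trustworthy = [6, 5, 7, 2, 4]
print(kendall_tau_a(friendly, trustworthy))  # prints 0.8
```

A tau near 1 means students who rated Alexa as friendlier also tended to rate it as more trustworthy, which is the pattern the correlation matrices in this study summarize.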
Table 1. Types of questions asked by students to Alexa prior to the conversational AI programming intervention.

Type | Example utterances | Instances
Information updates | What time is it?, How is the weather for Wednesday?, How is the traffic? | 31 (26%)
Action commands | Set a 15-minute timer, Play my Custom Spotify Playlist, Remind me that I have a meeting at 1:00 pm, What's 0 times 0? | 30 (25%)
Other | Hello, Learn my voice, Are dragons real?, What are all the numbers of pi? | 24 (20%)
Jokes | Tell me a joke, Can you tell me a joke? | 17 (14%)
Personal questions | What's your favorite color?, When were you made?, What's your favorite video game?, How was your day? | 16 (14%)
For the word frequency analysis, we used the NLTK library [33] to remove stop-words, tokenize, and lemmatize qualitative responses. Additionally, to better visualize non-obvious concepts, we filtered out words directly from the questions, including 'AI', 'artificial', 'intelligence', and 'conversational'. Word clouds were generated using [39].
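The preprocessing pipeline can be sketched as follows. This is a simplified stand-in: in the actual analysis NLTK supplies the stop-word list, tokenizer, and WordNet lemmatizer, whereas the toy versions below only illustrate the stages; the example responses are paraphrased from the paper:

```python
import re
from collections import Counter

# Toy stand-ins for NLTK's English stop-word list and question-word filter.
STOP_WORDS = {"a", "an", "the", "is", "that", "it", "to", "and", "of", "for"}
QUESTION_WORDS = {"ai", "artificial", "intelligence", "conversational"}

def preprocess(response: str) -> list[str]:
    tokens = re.findall(r"[a-z]+", response.lower())  # crude tokenizer
    kept = [t for t in tokens
            if t not in STOP_WORDS and t not in QUESTION_WORDS]
    # Toy "lemmatizer": strip a trailing 's' (WordNet does far more).
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in kept]

responses = [
    "AI is a program that learns and uses the learning for other problems",
    "Conversational AI is a program humans talk to",
]
counts = Counter(t for r in responses for t in preprocess(r))
print(counts.most_common(3))
```

The resulting token counts are what feed the word clouds: frequent surviving words such as program and learn surface as the dominant terms.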
To understand the types of interactions students had with Alexa prior to the intervention, we coded the phrases they reported saying to Alexa during the interaction activity. We found most of the phrases fell into one of the five categories listed in Tab. 1. The Information Updates category involved real-time events; the Action Commands category involved built-in Alexa applications; the Personal Questions category involved questions about Alexa; the Jokes category involved asking Alexa to say a joke; and the Other category involved questions and phrases that were often humorous (e.g., "Are dragons real?") or impossible to fully answer (e.g., "What are all the numbers of pi?"), or generally fell outside of the other categories (e.g., "Hello"). Note that prior to the activity, we asked Alexa to tell us a joke, which may have contributed to a large number of students also asking Alexa for jokes.
By comparing pre- and post-survey answers to the Persona questions (see Fig. 2), we found significant differences in how students felt about Alexa's intelligence and how close they felt they were to Alexa. After the intervention, students felt Alexa was more intelligent and felt closer to it (Wilcoxon Signed-Rank Test, p < 0.05); the other Persona ratings did not change significantly.

Fig. 2. Students' perceptions of Alexa prior to the workshops (in blue) and after (in green).

We found strong correlations between students' perceptions of Alexa's friendliness and trustworthiness, and between safeness and trustworthiness, on the post-test. There was also a strong correlation between trustworthiness reported on the pre-test and safeness reported on the post-test. Student reports of Alexa's friendliness and trustworthiness on the pre-test and between the pre- and post-tests were moderately correlated.

To visualize students' understanding of AI and conversational AI, we analyzed word frequency and created word clouds based on answers to two questions. Fig. 4 shows the word frequency analyses of students' answers to, "Describe in your own words what AI is", prior to and after the intervention. Fig. 5 shows the analyses of answers to, "Describe in your own words what conversational AI is (e.g., chatbots, like Alexa or Google Home, use conversational AI)".
To better understand students' qualitative answers when conceptualizing AI and conversational AI, we performed a graphical exploration of tag frequency from our thematic analysis. The graph in Fig. 6 shows the change in tag frequency from pre- to post-test. Since the number of participants who completed the pre-test differed from the post-test, the numbers of tags (t_pre and t_post) were normalized over the respective numbers of responses (n_pre = 45 and n_post) and expressed as a percent change, C_% (Eq. 1). This is presented as an exploratory, graphical analysis for high-level insights rather than a statistical analysis.

Fig. 3. Correlation matrices before (top) and after (bottom) the intervention. Lighter colors correspond to higher coefficients.

Fig. 4. Word frequency analyses from "Describe in your own words what AI is" prior to (left) and after the intervention (right).

Fig. 5. Word frequency analyses from "Describe in your own words what conversational AI is" prior to (left) and after the intervention (right). Notice the relative increase in the words conversation, back, and able, likely having to do with agents' abilities to have back-and-forth conversation.

C_% = 100 * (t_post / n_post) - 100 * (t_pre / n_pre)    (1)

Fig. 6. The change in instances of tags from pre- to post-test for Conception questions. The change was calculated according to Eq. 1.
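The normalization in Eq. 1 is a straightforward difference of per-response percentages; a quick sketch (n_pre = 45 is from the study, while the tag counts and the n_post value below are invented placeholders):

```python
def tag_percent_change(t_pre: int, n_pre: int, t_post: int, n_post: int) -> float:
    """Eq. 1: percent of responses carrying a tag, post-test minus pre-test."""
    return 100 * t_post / n_post - 100 * t_pre / n_pre

# Example: a tag appearing in 9 of 45 pre-test responses (20%) and in
# 5 of 40 post-test responses (12.5%) yields a -7.5 point change.
print(tag_percent_change(t_pre=9, n_pre=45, t_post=5, n_post=40))  # prints -7.5
```

Negative values thus indicate a tag that appeared relatively less often after the intervention, which is how the decreases in Fig. 6 should be read.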
Prior to the study, we hypothesized students would feel Alexa was less intelligent after learning how to program it, as they would better understand how it works; however, students felt Alexa was more intelligent after the intervention. This could be for multiple reasons. Perhaps by successfully learning fundamental AI literacy concepts [59], students realized Alexa was more complex than they initially thought and thus perceived it to be more "intelligent" (as in the Dunning-Kruger effect [16]). This is supported by the relative increase in AI literacy concepts (which are comparatively complex) in the post-test responses to the Conception questions (Fig. 6), and the relative decrease in pre-programming concepts (which are comparatively simplistic). Students also generally felt Alexa was smarter than themselves (both before and after the intervention). This is consistent with previous studies of students aged 3-10 [13, 34].

The Dunning-Kruger concept may also explain why there were relatively fewer tags identified—likely indicating fewer ideas presented by students—in the post-test than in the pre-test for many of the qualitative answers to the Conception questions. For example, as shown in Fig. 6, there were relatively fewer tags for the majority of the tag categories in the post-test responses about conversational AI. Perhaps students became "less ignorant of their ignorance" [16] about Alexa through the intervention, and therefore felt less qualified to answer the qualitative questions and thus presented fewer ideas in the post-test. Nevertheless, one limitation of this study was that students responded to the post-test at the end of the workshops, so they may have had less energy than when they responded to the pre-test, alternatively explaining the relatively fewer ideas presented.

We also hypothesized that students would personify Alexa less after understanding the logic behind how it works, and therefore rate its "aliveness", "human-likeness", "friendliness", and their feelings of closeness to it lower than prior to the intervention. However, there was no significant evidence for any change, except that they felt closer to Alexa after the intervention. We also found that prior programming experience was moderately correlated with closeness. Furthermore, prior programming experience and human-likeness, as well as closeness and human-likeness, were moderately correlated. One explanation could be that as students learned to program, they felt Alexa had human-like, logical reasoning, and thus felt closer to it (because of its human-like traits).

Students' perceptions of Alexa's friendliness and trustworthiness were strongly correlated, as well as trustworthiness and safeness, and to a lesser extent, intelligence and trustworthiness, friendliness and safeness, and closeness and trustworthiness. Although these correlations do not necessitate causation, it is important to consider the implications of potential causation when designing CAs.
For instance, if a CA was purposefully designed to seem friendly and intelligent, users may associate this with trustworthiness and safeness, despite the potential for the CA to provide incorrect information (intentionally or not). Nevertheless, this could also provide positive opportunities, including how students may learn better if they feel a pedagogical agent is friendly and intelligent, and thus also trustworthy and safe. This is discussed in more depth below.

From the pre-/post-test comparison of word frequency in responses describing AI (Fig. 4) and conversational AI (Fig. 5), as well as the change in tag frequency analysis (Fig. 6), students' conceptions seemed to shift towards more accurate understandings. For instance, the diction for describing AI seemed to shift towards computer-science-related terminology, including program, learn, and information. This trend is consistent with other literature, in which students describe AI with more computer science vocabulary after developing AI projects [45]. Furthermore, the emergence of the word learn in post-test responses suggests a better understanding of AI systems' ability to adapt and update with training. For instance, one student's response described AI as "a program that learns and uses the learning for other problems". Furthermore, as shown in Fig. 6, there was a relative increase in references to concepts from the Big AI Ideas [56], including Learning and Representation and reasoning, and a relative decrease in simplistic explanations of the AI acronym (e.g., "AI is artificial intelligence") and of how AI is "like a human", indicating better understanding.

In the conversational AI responses (Fig. 5), human remains the most frequent word in the pre- and post-test, suggesting student understanding of how human interaction is central to conversational AI's purpose. For the conversational AI descriptions shown in Fig. 6, Learning and the concept of natural language (NL) responses and understanding increased, indicating better understanding. Furthermore, there was a relative decrease in simplistic explanations of conversational AI being "like a robot", mimicking humans, having pre-programmed responses, and being something that "help[s] humans". Despite these indications of better understanding, there was a slight relative increase in vague or shallow answers for both the descriptions of AI and conversational AI, and a relative decrease in the Big AI idea of representation and reasoning for conversational AI. Overall, however, it seems students' conceptions improved through the workshops, especially considering the evidence for increased understanding of AI literacy concepts presented in [59].
Based on the results, we present design considerations for engaging students in learning experiences with CAs.
As shown in Tab. 1, students asked Alexa many personal questions (e.g., "Alexa, do you like Siri?" and "What's your favorite color?"), which would typically be asked of humans rather than computer systems. Alexa's often humorous responses (e.g., "I like ultraviolet. It glows with everything") could have contributed to students' perception of personified traits, like friendliness, intelligence and trustworthiness, which were all rated highly. As discussed, personified traits in CAs could play a role in effective teaching interventions [48], especially since feelings of closeness and trust can enhance human teaching and learning experiences [2, 5, 63].

We recommend pedagogical CA developers cautiously consider personification in their designs. Although personification could engage students in effective learning experiences, it could also increase their feelings of trust disproportionately with the actual trustworthiness of the device. For example, students could perceive the device as always providing unbiased, correct answers, despite AI systems often being biased [46]. Thus, we further recommend considering transparency in CA design.
Students also seemed to test the limits of Alexa, asking impossible or difficult questions, as encapsulated by the Other category in Tab. 1. For example, students asked Alexa to turn itself off, to tell them all the (infinite) digits of π, and to answer other impossible questions. These behaviors could be linked to trying to understand the system's inner workings. Thus, we recommend developing CAs with the ability to explain themselves, and furthermore, to provide transparency in terms of their abilities (e.g., being able to explain AI bias). This is especially important when considering the correlations between CAs' friendliness and perceived trustworthiness, and students' potential increase in awareness of ignorance in how CAs work, as discussed above. This recommendation also aligns with other child-CA interaction research, which suggests designing transparent AI systems with respect to children's level of understanding [61].

Similar to the behavior of "testing" Alexa described above, students asked Alexa playful questions like, "How much wood would a wood chuck chuck if a wood chuck would chuck wood?" and "Are dragons real?". These questions illustrate students'—even middle and high school students'—innate desire to play. Play can be hugely beneficial in learning environments, especially from a constructionist perspective [40, 44]; thus, we recommend considering playful learning experiences when developing CAs. For example, in our study students had the opportunity to develop their own CA projects. Students came up with many different playful (as well as serious) ideas [59]. One very playful idea included a CA "Meme Maker", which according to the developer, "help[ed] everyone get a quick laugh because as the old saying goes laughter is the best medicine". This same student cited their favorite part of the workshop as "improving [their] coding ability and learning more about [CAs]".
Many student projects’ purposes were to provide utility, with 34% being mental and physical health-related, 29% education-related, 21% productivity-related, and 8% accessibility-related CAs [59]. Utility was also reflected in students’ interactions with Alexa, as Information updates and Action commands were the most common interactions reported. With students evidently being interested in CAs’ utility, we recommend designing CAs with useful features to provide entry points to CA engagement and potential learning moments. For example, students might naturally engage with a CA in figuring out what the weather is like tomorrow, which would provide an opportunity to teach students about APIs and databases, and how CAs provide such answers.
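The route from a spoken request to an answer can be sketched in plain Python. This is a simplified, hypothetical illustration (the intent names, sample utterances, and hard-coded `lookup_forecast` stub are ours), not the actual Alexa or MIT App Inventor implementation:

```python
# Simplified sketch: route an utterance to an intent, then build a
# spoken response. Intent names and sample phrases are hypothetical.

INTENTS = {
    "GetWeatherIntent": ["what's the weather", "weather like tomorrow"],
    "TellJokeIntent": ["tell me a joke", "make me laugh"],
}


def match_intent(utterance: str) -> str:
    """Return the first intent whose sample phrase appears in the utterance."""
    text = utterance.lower()
    for intent, samples in INTENTS.items():
        if any(sample in text for sample in samples):
            return intent
    return "FallbackIntent"


def lookup_forecast(city: str) -> str:
    # A real skill would call a weather API or database; hard-coded here.
    return f"Sunny and 72 degrees in {city}"


def handle(utterance: str) -> str:
    intent = match_intent(utterance)
    if intent == "GetWeatherIntent":
        return lookup_forecast("Boston")
    if intent == "TellJokeIntent":
        return "Why did the computer show up late? It had a hard drive!"
    return "Sorry, I don't know that one yet."


print(handle("Alexa, what's the weather like tomorrow?"))
```

Walking students through even this toy pipeline surfaces the teachable moments mentioned above: where the API call would go, where the database lives, and why the agent sometimes falls back to “I don’t know.”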
One limitation of this study is its generalizability. We engaged middle and high school students in remote workshops in which they used MIT App Inventor to program Amazon Alexa; however, the results may not generalize to other environments or grade bands. Furthermore, since we held workshops on two different weeks with slight differences, this could have affected the results. Thus, future work may include larger follow-up studies with students in different grade bands and environments.

There are also limitations associated with thematic analyses. For instance, we may have missed certain themes within the data, despite following the approach to analysis described in [7]. Furthermore, the number of ideas presented by students in the pre- versus post-tests could have been influenced by the time of day each test was presented. Nevertheless, we believe the thematic data (as well as the word frequency data) are useful for exploratory, graphical analysis. Further research should statistically analyze students’ conceptions of CAs and investigate how these conceptions affect the effectiveness of learning interventions.
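To give a sense of the exploratory word-frequency comparison referenced above, the sketch below contrasts token counts in hypothetical pre- and post-workshop responses. The two response lists and the stopword set are invented examples, not actual study data; the study’s analysis used NLTK [33] and WordCloud [39]:

```python
# Minimal sketch of a pre/post word-frequency comparison for exploratory
# analysis. The response lists below are invented, not actual study data.
from collections import Counter
import re

STOPWORDS = {"a", "the", "it", "is", "and", "to", "i", "you"}


def word_frequencies(responses):
    """Count non-stopword tokens across a list of free-text responses."""
    counts = Counter()
    for response in responses:
        for token in re.findall(r"[a-z']+", response.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts


pre = ["Alexa is a robot that talks", "it is magic"]
post = ["Alexa is a program that uses code", "an endpoint sends the response"]

pre_counts = word_frequencies(pre)
post_counts = word_frequencies(post)

# Words used more often after the workshops than before
shift = {w: c - pre_counts[w] for w, c in post_counts.items() if c > pre_counts[w]}
print(sorted(shift, key=shift.get, reverse=True))
```

With real transcripts, the same counts can feed a word cloud or a frequency plot to visualize shifts toward computer science diction, as reported in the results.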
Through the programming and learning intervention, students’ perceptions of Alexa changed in how they viewed its intelligence and how close they felt to it, and students’ conceptions tended towards describing AI systems using more computer science terminology and AI literacy concepts. Based on these results, we presented four design recommendations: considering personification, transparency, playfulness and utility when designing CAs for engaging students in learning experiences. This study contributes to AI literacy research aiming to develop students’ understanding of AI to be more accurate and healthy, ToAM research aiming to understand students’ perception of AI, and CA research aiming to develop more useful, effective interactions.
Children aged 11–18 (Mean = 14.78, SD = 1.91) were selected by their teachers to participate in the middle/high school workshops. Teachers were selected from those who responded to an Amazon Future Engineer call to Title I schools and signed a consent form. Selected students aged 18 were given similar student consent forms to sign, and those under the age of 18 were given assent forms and consent forms to be signed by their legal guardians before participating. The university’s IRB approved the study protocol and consent/assent forms, which communicated how the data would be aggregated and anonymized. Given the wide age range, teachers assigned some of their older students to be mentors to younger students in case they fell behind.
ACKNOWLEDGMENTS
We thank the teachers and students, volunteer facilitators, MIT App Inventor team, Personal Robots Group, and Amazon Future Engineer (AFE) members who made the workshops possible. Special thanks to Hal Abelson and Hilah Barbot. This work was funded by the AFE program and the Hong Kong Jockey Club Charities Trust.
REFERENCES
[1] Rachel F. Adler, Francisco Iacobelli, and Yehuda Gutstein. 2016. Are you convinced? A Wizard of Oz study to test emotional vs. rational persuasion strategies in dialogues. Computers in Human Behavior 57 (2016), 75–81. https://doi.org/10.1016/j.chb.2015.12.011
[2] Michal Al-Yagon and Mario Mikulincer. 2004. Socioemotional and Academic Adjustment Among Children with Learning Disorders: The Mediational Role of Attachment-Based Factors. The Journal of Special Education 38, 2 (2004), 111–123. https://doi.org/10.1177/00224669040380020501
[3] Safinah Ali, Blakeley H Payne, Randi Williams, Hae Won Park, and Cynthia Breazeal. 2019. Constructionism, ethics, and creativity: Developing primary and middle school artificial intelligence education. In International Workshop on Education in Artificial Intelligence K-12 (EDUAI’19).
[4] Robbert-Jan Beun, Eveliene de Vos, and Cilia Witteman. 2003. Embodied Conversational Agents: Effects on Memory Performance and Anthropomorphisation. In Intelligent Virtual Agents, Thomas Rist, Ruth S. Aylett, Daniel Ballin, and Jeff Rickel (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 315–319.
[5] Sondra H Birch and Gary W Ladd. 1997. The teacher-child relationship and children’s early school adjustment. Journal of School Psychology.
[7] In Handbook of Research Methods in Health Social Sciences, Pranee Liamputtong (Ed.). Springer, Singapore.
[8] J. Cohen. 2013. Statistical Power Analysis for the Behavioral Sciences. Elsevier Science. https://books.google.ca/books?id=rEe0BQAAQBAJ
[9] Kevin Corti and Alex Gillespie. 2016. Co-constructing intersubjectivity with artificial conversational agents: People are more likely to initiate repairs of misunderstandings with agents represented as human. Computers in Human Behavior 58 (2016), 431–442. https://doi.org/10.1016/j.chb.2015.12.039
[10] William Damon. 2004. What is Positive Youth Development? The ANNALS of the American Academy of Political and Social Science.
[11] Computers & Education 111 (2017), 74–100. https://doi.org/10.1016/j.compedu.2017.04.005
[12] Daniella DiPaola, Blakeley H. Payne, and Cynthia Breazeal. 2020. Decoding Design Agendas: An Ethical Design Activity for Middle School Students. In Proceedings of the Interaction Design and Children Conference (London, United Kingdom) (IDC ’20). Association for Computing Machinery, New York, NY, USA, 1–10. https://doi.org/10.1145/3392063.3394396
[13] Stefania Druga, Randi Williams, Cynthia Breazeal, and Mitchel Resnick. 2017. "Hey Google is It OK If I Eat You?": Initial Explorations in Child-Agent Interaction. In Proceedings of the 2017 Conference on Interaction Design and Children (Stanford, California, USA) (IDC ’17). Association for Computing Machinery, New York, NY, USA, 595–600. https://doi.org/10.1145/3078072.3084330
[14] Stefania Druga, Randi Williams, Hae Won Park, and Cynthia Breazeal. 2018. How Smart Are the Smart Toys? Children and Parents’ Agent Interaction and Intelligence Attribution. In Proceedings of the 17th ACM Conference on Interaction Design and Children (Trondheim, Norway) (IDC ’18). Association for Computing Machinery, New York, NY, USA, 231–240. https://doi.org/10.1145/3202185.3202741
[15] Reinders Duit. 2009. Bibliography: Students’ and teachers’ conceptions and science education.
Ulm University (2016), 1–11.
[18] Simon Gächter, Chris Starmer, and Fabio Tufano. 2015. Measuring the closeness of relationships: a comprehensive evaluation of the ’Inclusion of the Other in the Self’ scale. PloS one 10, 6 (2015), e0129478.
[19] Andres Guadamuz. 2017. Do androids dream of electric copyright? Comparative analysis of originality in artificial intelligence generated works. Intellectual Property Quarterly.
[20] International Journal of Information Management 56 (2021), 102250. https://doi.org/10.1016/j.ijinfomgt.2020.102250
[21] Christopher Brett Jaeger, Alicia M Hymel, Daniel T Levin, Gautam Biswas, Natalie Paul, and John Kinnebrew. 2019. The interrelationship between concepts about agency and students’ use of teachable-agent learning technology. Cognitive Research: Principles and Implications 4, 1 (2019), 1–20.
[22] Christopher Brett Jaeger and Daniel Levin. 2016. If Asimo thinks, does Roomba feel? The legal implications of attributing agency to technology. Journal of Human-Robot Interaction (Symposium on Robotics Law and Policy) 5, 3 (Dec. 2016), 23. https://ssrn.com/abstract=3097129
[23] KM Kahn, R Megasari, E Piantari, and E Junaeti. 2018. AI programming by children using Snap! block programming in a developing country. 11082.
[24] Peter H Kahn Jr, Takayuki Kanda, Hiroshi Ishiguro, Nathan G Freier, Rachel L Severson, Brian T Gill, Jolina H Ruckert, and Solace Shen. 2012. “Robovie, you’ll have to go into the closet now”: Children’s social and moral relationships with a humanoid robot. Developmental Psychology 48, 2 (2012), 303.
[25] Thomas Kreilkamp. 1984. Psychological Closeness. The American Behavioral Scientist (pre-1986).
[26] Interdisciplinary Journal of E-Learning and Learning Objects.
[27] In Proceedings of the 20th Congress of the International Ergonomics Association (IEA 2018), Sebastiano Bagnara, Riccardo Tartaglia, Sara Albolino, Thomas Alexander, and Yushi Fujita (Eds.). Springer International Publishing, Cham, 3–12.
[28] Krittaya Leelawong and Gautam Biswas. 2008. Designing learning by teaching agents: The Betty’s Brain system. International Journal of Artificial Intelligence in Education 18, 3 (2008), 181–208.
[29] D. T. Levin, J. A. Adams, M. M. Saylor, and G. Biswas. 2013. A transition model for cognitions about agency. In 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI). 373–380. https://doi.org/10.1109/HRI.2013.6483612
[30] Sharona T Levy and David Mioduser. 2008. Does it “want” or “was it programmed to...”? Kindergarten children’s explanations of an autonomous robot’s adaptive functioning. International Journal of Technology and Design Education 18, 4 (2008), 337–359.
[31] Phoebe Lin, Jessica Van Brummelen, Galit Lukin, Randi Williams, and Cynthia Breazeal. 2020. Zhorai: Designing a Conversational Agent for Children to Explore Machine Learning Concepts. Proceedings of the AAAI Conference on Artificial Intelligence 34, 09 (Apr. 2020), 13381–13388. https://doi.org/10.1609/aaai.v34i09.7061
[32] Duri Long and Brian Magerko. 2020. What is AI Literacy? Competencies and Design Considerations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–16. https://doi.org/10.1145/3313831.3376727
[33] Edward Loper and Steven Bird. 2002. NLTK: The Natural Language Toolkit. arXiv preprint cs/0205028 (2002).
[34] Silvia B. Lovato, Anne Marie Piper, and Ellen A. Wartella. 2019. Hey Google, Do Unicorns Exist? Conversational Agents as a Path to Answers to Children’s Questions. In Proceedings of the 18th ACM International Conference on
Interaction Design and Children (Boise, ID, USA) (IDC ’19). Association for Computing Machinery, New York, NY, USA, 301–313. https://doi.org/10.1145/3311927.3323150
[35] Alexander Meschtscherjakov, Manfred Tscheligi, Bastian Pfleging, Shadan Sadeghian Borojeni, Wendy Ju, Philippe Palanque, Andreas Riener, Bilge Mutlu, and Andrew L. Kun. 2018. Interacting with Autonomous Vehicles: Learning from Other Domains. In Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI EA ’18). Association for Computing Machinery, New York, NY, USA, 1–8. https://doi.org/10.1145/3170427.3170614
[36] Joseph E. Michaelis and Bilge Mutlu. 2019. Supporting Interest in Science Learning with a Social Robot. In Proceedings of the 18th ACM International Conference on Interaction Design and Children (Boise, ID, USA) (IDC ’19). Association for Computing Machinery, New York, NY, USA, 71–82. https://doi.org/10.1145/3311927.3323154
[37] David Mioduser and Sharona T Levy. 2010. Making sense by building sense: Kindergarten children’s construction and understanding of adaptive robot behaviors. International Journal of Computers for Mathematical Learning 15, 2 (2010), 99–127.
[38] Elizabeth Katalina Morales-Urrutia, Jose Miguel Ocaña, and Diana Pérez-Marín. 2020. How to Integrate Emotions in Dialogues With Pedagogic Conversational Agents to Teach Programming to Children. Innovative Perspectives on Interactive Communication Systems and Technologies (2020), 66.
[39] Andreas Mueller. 2020. WordCloud for Python documentation. http://amueller.github.io/word_cloud/. Accessed: 2021-01-30.
[40] Seymour Papert and Idit Harel. 1991. Situating constructionism. Constructionism 36, 2 (1991), 1–11.
[41] Diana Pérez-Marín and Ismael Pascual-Nieto. 2013. An exploratory study on how children interact with pedagogic conversational agents. Behaviour & Information Technology 32, 9 (2013), 955–964. https://doi.org/10.1080/0144929X.2012.687774
[42] Yim Register and Amy J. Ko. 2020. Learning Machine Learning with Personal Data Helps Stakeholders Ground Advocacy Arguments in Model Mechanics. In Proceedings of the 2020 ACM Conference on International Computing Education Research (Virtual Event, New Zealand) (ICER ’20).
[44] Journal for Education in the Built Environment 4, 2 (2009), 94–108. https://doi.org/10.11120/jebe.2009.04020094
[45] Juan David Rodríguez-García, Jesús Moreno-León, Marcos Román-González, and Gregorio Robles. 2021. Evaluation of an Online Intervention to Teach Artificial Intelligence With LearningML to 10-16-Year-Old Students. (2021).
[46] Drew Roselli, Jeanna Matthews, and Nisha Talagala. 2019. Managing Bias in AI. In Companion Proceedings of The 2019 World Wide Web Conference (San Francisco, USA) (WWW ’19). Association for Computing Machinery, New York, NY, USA, 539–544. https://doi.org/10.1145/3308560.3317590
[47] Michael Scaife and Mike van Duuren. 1995. Do computers have brains? What children believe about intelligent artifacts. British Journal of Developmental Psychology 13, 4 (1995), 367–377.
[48] Sofia Schöbel, Andreas Janson, and Abhay Mishra. 2019. A Configurational View on Avatar Design–The Role of Emotional Attachment, Satisfaction, and Cognitive Load in Digital Learning. In Fortieth International Conference on Information Systems, Munich.
[49] Ryan M Schuetzler, G Mark Grimes, Justin Scott Giboney, and Jay F Nunamaker Jr. 2018. The influence of conversational agents on socially desirable responding. In Proceedings of the 51st Hawaii International Conference on System Sciences. 283.
[50] Daniel B. Shank. 2014. Impressions of computer and human agents after interaction: Computer identity weakens power but not goodness impressions. International Journal of Human-Computer Studies 72, 10 (2014), 747–756. https://doi.org/10.1016/j.ijhcs.2014.05.002
[51] Bruce Sherin. 2013. A Computational Study of Commonsense Science: An Exploration in the Automated Analysis of Clinical Interview Data. Journal of the Learning Sciences 22, 4 (2013), 600–638. https://doi.org/10.1080/10508406.2013.836654
[52] Rebecca R Skinner. 2019. The Elementary and Secondary Education Act (ESEA), as Amended by the Every Student Succeeds Act (ESSA): A Primer. CRS Report R45977, Version 2. Congressional Research Service (2019).
[53] Hannah Sparks. 2019. Mom busts 9-year-old son using Alexa to cheat on homework. https://nypost.com/2019/10/15/mom-busts-9-year-old-son-using-alexa-to-cheat-on-homework/. Accessed: 2021-01-27.
[54] Karen Spektor-Precel and David Mioduser. 2015. The Influence of Constructing Robot’s Behavior on the Development of Theory of Mind (ToM) and Theory of Artificial Mind (ToAM) in Young Children. In Proceedings of the 14th International Conference on Interaction Design and Children (Boston, Massachusetts) (IDC ’15). Association for Computing Machinery, New York, NY, USA, 311–314. https://doi.org/10.1145/2771839.2771904
[55] Micol Spitale, Silvia Silleresi, Giulia Cosentino, Francesca Panzeri, and Franca Garzotto. 2020. "Whom Would You like to Talk with?": Exploring Conversational Agents for Children’s Linguistic Assessment. In Proceedings of the Interaction Design and Children Conference (London, United Kingdom) (IDC ’20). Association for Computing Machinery, New York, NY, USA, 262–272. https://doi.org/10.1145/3392063.3394421
[56] David Touretzky, Christina Gardner-McCune, Fred Martin, and Deborah Seehorn. 2019. Envisioning AI for K-12: What Should Every Child Know about AI? Proceedings of the AAAI Conference on Artificial Intelligence 33, 01 (Jul. 2019), 9795–9799. https://doi.org/10.1609/aaai.v33i01.33019795
[57] Jessica Van Brummelen. 2019. Tools to Create and Democratize Conversational Artificial Intelligence. Master’s thesis. Massachusetts Institute of Technology, Cambridge, MA.
[58] Jessica Van Brummelen, Tommy Heng, and Viktoriya Tabunshchyk. 2020. Appendix. https://gist.github.com/jessvb/1cd959e32415a6ad4389761c49b54bbf. Accessed: 2020-09-09.
[59] Jessica Van Brummelen, Tommy Heng, and Viktoriya Tabunshchyk. 2021. Teaching Tech to Talk: K-12 Conversational Artificial Intelligence Literacy Curriculum and Development Tools. AAAI, Online.
[60] Xiaoyu Wan, Xiaofei Zhou, Zaiqiao Ye, Chase K. Mortensen, and Zhen Bai. 2020. SmileyCluster: Supporting Accessible Machine Learning in K-12 Scientific Discovery. In Proceedings of the Interaction Design and Children Conference (London, United Kingdom) (IDC ’20). Association for Computing Machinery, New York, NY, USA, 23–35. https://doi.org/10.1145/3392063.3394440
[61] Randi Williams, Hae Won Park, and Cynthia Breazeal. 2019. A is for Artificial Intelligence: The Impact of Artificial Intelligence Activities on Young Children’s Perceptions of Robots. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–11. https://doi.org/10.1145/3290605.3300677
[62] David Wolber, Harold Abelson, and Mark Friedman. 2015. Democratizing computing with App Inventor. GetMobile: Mobile Computing and Communications 18, 4 (2015), 53–58.
[63] Ilka Wolter, Michael Glüer, and Bettina Hannover. 2014. Gender-typicality of activity offerings and child–teacher relationship closeness in German “Kindergarten”. Influences on the development of spelling competence as an indicator of early basic literacy in boys and girls. Learning and Individual Differences 31 (2014), 59–65. https://doi.org/10.1016/j.lindif.2013.12.008
[64] Michael Wooldridge. 2021. Artificial Intelligence Is a House Divided: A decades-old rivalry has riven the field. It’s time to move on. The Chronicle (20 Jan. 2021).
[65] Ying Xu and Mark Warschauer. 2020. What Are You Talking To?: Understanding Children’s Perceptions of Conversational Agents. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376416
[66] Abigail Zimmermann-Niefield, Shawn Polson, Celeste Moreno, and R. Benjamin Shapiro. 2020. Youth Making Machine Learning Models for Gesture-Controlled Interactive Media. In Proceedings of the Interaction Design and Children Conference (London, United Kingdom) (IDC ’20).