Extended Reality (XR) Remote Research: a Survey of Drawbacks and Opportunities
Jack Ratcliffe∗, Queen Mary University of London, London, UK
Francesco Soave∗, Queen Mary University of London, London, UK
Nick Bryan-Kinns, Queen Mary University of London, London, UK
Laurissa Tokarchuk, Queen Mary University of London, London, UK
Ildar Farkhatdinov, Queen Mary University of London, London, UK
ABSTRACT
Extended Reality (XR) technology - such as virtual and augmented reality - is now widely used in Human Computer Interaction (HCI), social science and psychology experimentation. However, these experiments are predominantly deployed in-lab with a co-present researcher. Remote experiments, without co-present researchers, have not flourished, despite the success of remote approaches for non-XR investigations. This paper summarises findings from a 30-item survey of 46 XR researchers to understand perceived limitations and benefits of remote XR experimentation. Our thematic analysis identifies concerns common with non-XR remote research, such as participant recruitment, as well as XR-specific issues, including safety and hardware variability. We identify potential positive affordances of XR technology, including leveraging data collection functionalities built into HMDs (e.g. hand and gaze tracking) and the portability and reproducibility of an experimental setting. We suggest that XR technology could be conceptualised as an interactive technology and a capable data-collection device suited for remote experimentation.
CCS CONCEPTS
• Human-centered computing → Mixed / augmented reality; Virtual reality.

KEYWORDS
Extended Reality, Virtual Reality, Augmented Reality, literature review, expert interviews
ACM Reference Format:
Jack Ratcliffe, Francesco Soave, Nick Bryan-Kinns, Laurissa Tokarchuk, and Ildar Farkhatdinov. 2021. Extended Reality (XR) Remote Research: a Survey of Drawbacks and Opportunities. In
CHI Conference on Human Factors in Computing Systems (CHI ’21), May 8–13, 2021, Yokohama, Japan.
ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3411764.3445170
∗Both authors contributed equally to this research.
CHI ’21, May 8–13, 2021, Yokohama, Japan
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-8096-6/21/05...$15.00
https://doi.org/10.1145/3411764.3445170
Extended reality (XR) technology - such as virtual, augmented, and mixed reality - is increasingly being examined and utilised by researchers in the HCI and other research communities due to its potential for creative, social and psychological experiments [7]. Many of these studies take place in laboratories with the co-presence of the researcher and the participant [36]. The XR research community has been slow to embrace recruiting remote participants to take part in studies running outside of laboratories - a technique which has proven useful for non-XR HCI, social and psychological research [53][49]. However, the current Covid-19 pandemic has highlighted the importance and perhaps necessity of understanding and deploying remote recruitment methods within XR research.
There is also limited literature about remote XR research, although the reports that exist suggest that the approach shows promise: data collection is viable [67], results are similar to those found in-lab [46] even when participants are unsupervised [29], and recruiting is possible [40]. Researchers have also suggested using existing communities for these technologies, such as customisable social VR experiences, as combined platforms for recruitment and experimentation [60]. With the increasing availability of consumer XR devices (estimates show five million high-end XR HMDs sold in 2020, rising to 43.5 million by 2025 [73]), and health and safety concerns around in-lab experimentation, particularly for research involving head-mounted displays (HMDs), it seems an important time to understand the conceptions around remote research from researchers who use XR technologies.
This paper outlines the methodology and results from the first (that we are aware of) survey of XR researchers regarding remote XR research. The results have been derived from 46 respondents answering 30 questions regarding existing research practice.
It offers three core contributions: (1) we summarise existing research on conducting remote XR experiments; (2) we provide an overview of the status quo, showing that many of the concerns regarding remote XR also apply to other remote studies, and that the unique aspects of remote XR research could offer more benefits than drawbacks; (3) we set out recommendations for advancing remote XR research, and outline important questions that should be answered to create an evidence-backed experimentation process.
We present a literature review of relevant publications on XR research, remote research and remote XR research. We use "XR" as the umbrella term for virtual reality (VR), augmented reality (AR) and mixed reality (MR) [39]. This space is also sometimes referred to as spatial or immersive computing.
This section is organised in three parts. First, we explore conventional XR experiments under ‘normal’ conditions (e.g. in a laboratory and/or directly supervised by the researcher). We then summarise existing literature on remote experiments in XR research. Finally, we report the main findings from previous publications on remote data collection and experimentation.
According to Suh and Prophet’s 2018 systematic literature review [71], XR experiments involving human participants can broadly be categorised into two groups: (1) studies about XR, and (2) studies about using XR. The first group focuses on the effects of XR system features on the user experience (e.g. whether enhancing embodiment could affect presence outcomes [56]), whereas the second category examines how the use of an XR technology modifies a measurable user attribute (e.g. whether leveraging XR embodiment could affect learning outcomes [55]). Across these categories there have been a variety of explorations on different subjects and from different academic fields. These include social psychology [7], including social facilitation–inhibition [28], conformity and social comparison [6], and social identity [34]; neuroscience and neuropsychology [36]; visual perception [78]; multisensory integration [14]; proxemics [61]; spatial cognition [75]; education and training [54]; therapeutic applications [20]; pain remediation [23]; motor control [16]; terror management [30]; and media effects such as presence [4].
The theoretical approaches behind these studies are also disparate, including theories such as conceptual blending, cognitive load, constructive learning, experiential learning, flow, media richness, motivation, presence, situated cognition, the stimulus-organism-response framework and the technology acceptance model [71].
According to Suh and Prophet’s meta-analysis, the majority of XR research explorations have been experiments (69%) [71]. Other types of explorations include surveys (24%), interviews (15%) and case studies (9%). These approaches have been used both alone and in combination with each other. Data collection methods are predominantly quantitative (78%), although qualitative and mixed approaches are also used. Another systematic review of XR research (focused on higher education) [54] adds focus group discussion and observation as research methods, and presents two potential subcategories for experiments: mobile sensing and "interaction log in VR app", in which the XR application logs the user’s activities and the researcher uses the resulting log for analysis.
The types of data logging found in XR experiments are much the same as those listed in Weibel’s exploration of physiological measures in non-immersive virtual reality [77], with studies using skin conductance [79], heart rate [18] and blood pressure [27], as well as electroencephalogram (EEG) [1]. Built-in inertial sensors that are integral to providing an XR experience, such as head and hand position for VR HMDs, have also been widely used for investigations, including posture assessment [9], head interaction tracking [80], gaze and loci of attention [52] and gesture recognition [31], while velocity change [76] has also been used in both VR and AR interventions.
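The "interaction log" approach above can be sketched in a few lines: the application samples built-in sensor state (head pose here) each frame, timestamps it, and writes an append-friendly log that the researcher analyses after the session. This is a minimal illustrative sketch, not code from any of the surveyed studies; the names (`LogRecord`, `InteractionLogger`) and the one-JSON-object-per-line format are our assumptions.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class LogRecord:
    t: float          # seconds since session start
    head_pos: tuple   # (x, y, z) position, e.g. in metres
    head_rot: tuple   # (yaw, pitch, roll) orientation, e.g. in degrees
    event: str = ""   # optional discrete event marker, e.g. "trial_start"

class InteractionLogger:
    """Accumulates timestamped sensor samples for post-hoc analysis."""

    def __init__(self):
        self._start = time.monotonic()
        self.records = []

    def log(self, head_pos, head_rot, event=""):
        # Called once per frame (or on events) by the XR application.
        self.records.append(LogRecord(time.monotonic() - self._start,
                                      tuple(head_pos), tuple(head_rot), event))

    def dump(self, path):
        # One JSON object per line keeps partial logs recoverable
        # if a remote, unsupervised session ends unexpectedly.
        with open(path, "w") as f:
            for r in self.records:
                f.write(json.dumps(asdict(r)) + "\n")

logger = InteractionLogger()
logger.log((0.0, 1.6, 0.0), (0.0, 0.0, 0.0), event="trial_start")
logger.log((0.1, 1.6, 0.2), (15.0, -5.0, 0.0))
```

In a real deployment the engine's own pose API (e.g. a Unity or OpenXR binding) would supply `head_pos` and `head_rot`; the logging and serialisation pattern is the same.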
There are many suggested benefits to using XR technology as a research tool: it allows researchers to control the mundane-realism trade-off [2] and thus increase the extent to which an experiment is similar to situations encountered in everyday life without sacrificing experimental control [7]; to create powerful sensory illusions within a controlled environment (particularly in VR), such as illusions of self-motion that influence the proprioceptive sense [65]; to improve replication [7] by making it easier to recreate entire experimental environments; and to allow representative samples [7] to experience otherwise inaccessible environments, when paired with useful distribution and recruitment networks.
Pan [47] explored some of the challenges facing experiments in virtual worlds, which continue to be relevant in immersive XR explorations. These include the challenge of ensuring the experimental design is relevant for each technology and subject area; ensuring a consistent feeling of self-embodiment to ensure engaged performance [35]; avoiding the uncanny valley, in which characters that look nearly-but-not-quite human are judged as uncanny and are aversive for participants [44]; simulation sickness and nausea during VR experiences [45]; cognitive load [72], which may harm participant results through over-stimulation, particularly in VR [69][41]; novelty effects of new technology interfering with results [15][19]; and ethics, especially where experiences in VR could lead to changes in participants’ behaviour and attitude in their real life [5] and create false memories [63].
There has been little research into remote XR experimentation, particularly for VR and AR HMDs. By remote, we mean any experiment that takes place outside of a researcher-controlled setting. This is distinct from field or in-the-wild research, which is research "that seeks to understand new technology interventions in everyday living" [58], and so is dependent on user context. These definitions are somewhat challenged in the context of remote VR research, as for VR, remote and field/in-the-wild are often the same setting: the location where VR is most used outside the lab is also where it is typically experienced (e.g. home users, playing at home [40]). For AR, there is a greater distinction between remote, which refers to any AR outside of the controlled setting of the lab, and field/in-the-wild, which requires a contextual deployment.
In terms of remote XR research outcomes, Mottelson and Hornbæk [46] directly compared in-lab and remote VR experiment results. They found that while the differences in performance between the in-lab and remote study were substantial, there were no significant differences between the effects of experimental conditions. Similarly, Huber and Gajos explored uncompensated and unsupervised remote VR samples and were able to replicate key results from the original in-lab studies, although with smaller effect sizes [29]. Finally, Steed et al. showed that collecting data in the wild is feasible for virtual reality systems [67].
Ma et al. [40] is perhaps the first published research on recruiting remote participants for VR research. The study, published in 2018, used the Amazon Mechanical Turk (AMT) crowdsourcing platform, and received 439 submissions over a 13-day period, of which 242 were eligible. The participant demographics did not differ significantly from previously reported demographics of AMT populations in terms of age, gender, and household income. The notable difference was that the VR research had a higher percentage of U.S.-based workers compared to others. The study also provides insight into how remote XR studies take place: 98% of participants took part at home, in living rooms (24%), bedrooms (18%), and home offices (18%). Participants were typically alone (84%) or in the presence of one (14%) or two other people (2%). Participants reported having “enough space to walk around” (81%) or “run around” (10%). Only 6% reported that their physical space would limit their movement.
While Ma et al.’s work is promising in terms of reaching a representative sample and the environment in which participants take part in experiments, it suggests a difficulty in recruiting participants with high-end VR systems, which allow six degrees of freedom (the ability to track user movement in real space) and leverage embodied controllers (e.g. Oculus Rift, HTC Vive). Only 18 (7%) of eligible responses had a high-end VR system. A similar paucity of high-end VR equipment was found by Mottelson and Hornbæk [46], in which 1.4% of crowdworkers had access to these devices (compared to 4.5% for low-end devices, and 83.4% for Android smartphones). This problem is compounded if we consider Steed et al.’s finding that only 15% of participants provide completed sets of data [67].
An alternative approach to recruiting participants is to create experiments inside existing communities of XR users, such as inside the widely-used VR Chat software [60].
This allows researchers to enter existing communities of active users, rather than attempting to establish their own. However, there are significant limitations to building experiments on platforms not designed for experimentation, such as programming restrictions, limited ability to communicate with outside services for data storage, and the absence of bespoke hardware interfaces.
Using networks for remote data collection from human participants has been proven valid in some case studies [22, 37]. In Gosling et al.’s comprehensive and well-cited study [22], internet-submitted samples were found to be diverse, to generalise across presentation formats, to not be adversely affected by non-serious or repeat respondents, and to present results consistent with findings from in-lab methods. There is similar evidence for usability experiments, in which both the lab and remote tests captured similar information about the usability of websites [74].
That said, differences in results for lab and remote experiments are common [10, 64, 70]. The above website usability study also found that in-lab and remote experiments offered their own advantages and disadvantages in terms of the usability issues uncovered [74]. The factors that influence differences between in-lab and remote research are still being understood, but even beyond experiment design, there is evidence that aspects such as the participant-perceived geographical distance between the participant and the data collection system influence outcomes [43].
Reips’ [57] well-cited study outlined 18 advantages of remote experiments, including (1) easy access to a demographically and culturally diverse participant population, including participants from unique and previously inaccessible target populations; (2) bringing the experiment to the participant instead of the opposite; (3) high statistical power by enabling access to large samples; (4) the direct assessment of motivational confounding; and (5) cost savings of lab space, person-hours, equipment, and administration. He found seven disadvantages: (1) potential for multiple submissions, (2) lack of experimental control, (3) participant self-selection, (4) dropout, (5) technical variances, (6) limited interaction with participants and (7) technical limitations.
With the increasing availability of teleconferencing, it has become possible for researchers to be co-"tele"present and supervise remote experiments through scheduled webcam experiment sessions. This presents a distinction from the unsupervised internet studies discussed above, and brings its own opportunities and limitations.
The literature broadly suggests that unsupervised experiments provide suitable-quality data collection [26, 32, 59]. A direct comparison between a supervised in-lab experiment and a large, unsupervised web-based experiment found that the benefits outweighed the potential costs [59], while another found that a higher percentage of high-relevance responses came from unsupervised participants than supervised ones in a qualitative feedback setting [26]. There is also evidence that unsupervised participants react faster to tasks over the internet than those observed in the laboratory [32].
For longitudinal studies, research in healthcare has found no significant difference in task adherence rates between unsupervised and supervised groups [17]. However, one study noted that supervised studies had more effective outcomes [38].
Remote data collection was theorised to bring easy access to participants, including diverse participants and large samples [57]. Researchers have found that recruiting crowdworkers, people who work on tasks distributed to them over the internet, allowed them access to a large participant pool [49], with enough diversity to facilitate cross-cultural and international research [11]. Research has found that crowdworkers were significantly more diverse than typical American college samples and more diverse than other internet recruitment methods [11], at an affordable rate [49][11]. This has allowed researchers a faster theory-to-experiment cycle [42].
Results from crowdworker-informed studies have been shown to reproduce existing results from historical in-lab studies [49][11][66], while a direct comparison between experiment groups of crowdworkers, social media-recruited participants and on-campus recruitment found almost indistinguishable results [13].
Some distinctions between crowdworkers and in-lab participants have been discovered, however. Comparative experiments between crowdworkers and in-person studies have suggested slightly higher participant rejection rates [66], while crowdworker participants have been shown to report shorter narratives than other groups of college students (both online and in-person) and use proportionally more negative
emotion terms than college students reporting verbally to an experimenter [24].
Distinctions also exist within crowdworker recruitment sources. A study of AMT, CrowdFlower (CF) and Prolific Academic (ProA) found differences in response rate, attention-check question results, data quality, honesty, diversity and how successfully effects were reproduced [50].
Data quality is a common concern regarding crowdworkers [21]. However, attention-check questions used to screen out inattentive respondents or to increase the attention of respondents have been shown to be effective in increasing the quality of data collected [3], as have participant reputation scores [51].
A growing concern regarding crowdworkers is non-naivete, in which participants have some previous knowledge of the study, or of similar studies, that might bias them in the experiment. Many workers report having taken part in common research paradigms [48], and there are concerns that if researchers continue to depend on this resource, the problem may expand. As such, further efforts are needed by researchers to identify and prevent non-naive participants from participating in their studies [12].
It is clear that remote methods have been usefully deployed for non-XR research, and seemingly bring benefits such as easier participant recruitment, reduced recruitment cost and broadened diversity, without introducing major biases. However, there is still a paucity of research regarding the extent to which remote XR research can be and has been used to leverage the unique benefits of both XR (environmental control, sensory illusions, data collection, replication) and remote (participation, practicality, cost-savings) methods, as well as the potential impact of their combined limitations. A survey of XR researcher experiences and beliefs regarding remote XR research could therefore help us understand how these apply in current practice, and identify the key areas for future developments in this field.
We surveyed current practice to outline the researcher-perceived benefits and drawbacks of lab-based and remote XR research. We used a 30-item qualitative questionnaire that enquired about participants’ existing lab-based and remote research practices; thoughts on future lab-based and remote research; and potential benefits and drawbacks for each area. The survey was circulated through relevant mailing lists ([email protected], [email protected], [email protected]), to members of groups considering or currently running remote studies, and to members of universities’ virtual and augmented reality groups found via search engines.
Responses were thematically analysed using an inductive approach based upon Braun and Clarke’s six phases of analysis [8]. The coding and theme generation process was conducted twice by independent researchers; themes were then reviewed collaboratively to create the final categorisations.
We received 46 responses to our survey from 36 different (predominantly academic) institutions. Most responses came from researchers based in Europe and North America, but responses also came from Asia. The majority of participants were either PhD students (18) or lecturers, readers or professors (11) at universities. Other roles were academic/scientific researcher (5), masters student (5), corporate researcher (4) and undergraduate student (2). A diverse set of ages responded to the survey: 18-24 (5), 25-34 (22), 35-44 (11), 45+ (6), and gender skewed male (29) over female (16) or other (1).
Participants were more likely to have previously run in-lab studies (37) than remote studies (14). Twenty-seven participants noted that, because of the Covid-19 pandemic, they have considered conducting remote XR experiments. In the next six months, more researchers were planning to run remote studies (24) than lab-based (22).
Participants predominantly categorised their research as VR-only (28) over AR-only (5). Ten participants considered their research as both VR and AR (and three did not provide an answer). This result is illustrated in Fig. 1.

[Figure 1: Type of XR medium explored by survey respondents.]

[Figure 2: Features used by respondents in their user studies. (A) Embodied Interactivity: using embodied controller/camera-based movement. (B) Embodied Movement: using your body to move/"roomscale". (C) Abstract Movement: using a gamepad or keyboard and mouse to move. (D) Sound 3D: binaural acoustics. (E) Spoken Input. (F) Abstract Interactivity: using a gamepad or keyboard and mouse to interact. (G) Sound non-3D: mono/stereo audio. (H) Unique features: e.g. haptics, hand tracking, scent.]

In terms of research hardware, the majority of VR research leveraged virtual reality HMD-based systems with six degrees of freedom (32), which track participants’ movements inside the room, over three degrees of freedom (15) or CAVE systems (1). Nineteen researchers made use of embodied or gesture controllers, where the position of handheld controllers is tracked in the real world and virtualised. For AR, HMDs were the predominant medium (13) over smartphones (9), with some researchers (5) using both.
An array of supplementary technologies and sensors was also reported by 13 respondents, including gaming joysticks, haptic actuators, a custom haptic glove, motion capture systems, e-textiles, eye-trackers, microphones, computer screens, Vive body trackers, brain-computer interfaces, EEG and electrocardiogram (ECG) devices, galvanic skin response sensors and hand-tracking cameras, as well as other spatial audio and hardware rigs.
The use of a variety of different off-the-shelf systems was also reported: Vive, Vive Pro, Vive Eye, Valve Index, Vive Pucks, Quest, Go, Rift, Rift S, DK2, Cardboard, Magic Leap One, Valve Knuckles, Hololens.
The most-used devices were from the HTC Vive (25) and Oculus (23) families.
Respondents outlined numerous features of immersive hardware that they used in their research, visible in Fig. 2. The most prominent were embodiment aspects, including embodied interactivity, in which a user’s hand or body movements are reflected by a digital avatar (37), and embodied movement (35), where participants can move in real space and that movement is recognised by the environment. Abstract movement (13), where a user controls an avatar via an abstracted interface (like a joystick), and abstract interactivity (8) were less popular. Spoken input was also used (10), as well as 3D sound (13) and non-3D sound (6). Scent was also noted (1) along with other unique features.
In this section, we present and discuss the themes found in our survey study. The key points of each theme are summarised in a table at the start of each subsection. Some of these points were found across multiple themes, as they touch on various aspects of user-based XR research.
Our analysis suggests that in-lab and remote studies can be additionally distinguished by whether the setting type is vital or preferred (summarised in Table 1). Broadly, in-lab (vital) studies require experimental aspects only feasible in-lab, such as bespoke hardware or unique data collection processes; in-lab (preferred) studies could take place outside of labs, but prefer the lab setting based upon heightened concerns regarding the integrity of data collected, and place a high value on a controlled setting.
Remote (vital) studies are required when a user’s natural environment is prioritised, such as explorations into behaviour in Social VR software; and remote (preferred) studies are used when cross-cultural feedback or a large number of participants are needed, or if the benefits offered by an in-lab setting are not required.
Beyond these, another sub-type emerged as an important consideration for user studies: supervised or unsupervised. While less of an important distinction for in-lab studies (which are almost entirely supervised), participant responses considered both unsupervised "encapsulated" studies, in which explanations, data collection and the study itself exist within the software or download process, and supervised studies, in which researchers schedule time with the remote participant to organise, run and/or monitor the study. These distinctions will be discussed in more detail throughout the analysis below, as the sub-types have a distinct impact on many of the feasibility issues relating to remote studies.

Table 1: Summary of XR Study Sub-types

In-lab (vital): Experiment requires features only feasible in-lab, e.g. bespoke hardware, unique data collection.
In-lab (preferred): Concerns about integrity of data collected remotely; high value placed on a controlled setting.
Remote (vital): User’s natural (in-the-wild) environment is important (e.g. Social VR, naturally experienced at home and online).
Remote (preferred): Priority is cross-cultural feedback or reaching a large number of participants; the lab provides limited benefits.
Twenty-nine respondents cited the well-known challenge of recruiting a satisfactory number of participants for lab-based studies. Issues were reported both with the scale of available participants, and the problem of convenience sampling and WEIRD - Western, educated, industrialized, rich and democratic societies - participants [25].
Participant recruitment was mentioned by 27 respondents as the area in which remote user studies could prove advantageous over labs. Remote studies could potentially provide easier recruitment (in terms of user friction: getting to the lab, arriving at the correct time), as well as removing geographic restrictions to the participant pool.
Removing geographic restrictions also simplifies researchers’ access to cross-cultural investigations (R23, R43). While cross-cultural lab-based research would require well-developed local recruitment networks, or partnerships with labs in target locations, remote user studies, and more specifically, systems built deliberately for remote studies, introduce cross-cultural scope at no additional overhead.
There are, however, common concerns over the limitations to these benefits due to the relatively small market size of XR technologies. For AR, this is not a strong limitation for smartphone-based explorations, but the penetration of HMD AR and VR technology is currently limited, and it is possible that those who currently have access to these technologies will not be representative of the wider population. Questions remain over who the AR/VR HMD owners are, whether they exhibit notable differences from the general population, and whether those differences are more impactful than those presented by existing convenience sampling.
Table 2: Study Participants Key Points

Recruitment Scope / Sample size - Lab: usually smaller numbers. Remote: potential for larger numbers.
Recruitment Scope / Sample balance - Lab: might be easier to ensure balance. Remote: how to ensure balance? (e.g. who mostly owns XR equipment?)
Efficiency / Time - Lab: requires setup time and organising participants. Remote: potentially less time, especially if encapsulated and unsupervised.
Precursor Requirements / Requisites - Lab: pre-test and linguistic/cultural comprehension conditions are ensured. Remote: not clear how to verify conditions in remote studies.

Despite the belief that designing for remote participants will increase participant numbers, and therefore the power of studies, it seems unclear how researchers will reach HMD-owning audiences. Thirty respondents who have run, or plan to run, remote XR studies have concerns about the infrastructure for recruiting participants remotely. Unlike other remote studies, the requirement for participants to own or have access to XR hardware greatly reduces the pool (around 5 million XR HMDs were sold in 2020 [73]). A major outstanding question is how researchers can access these potential participants, although some platforms for recruiting XR participants have emerged in the past few months, such as XRDRN.org.
Nine respondents noted that remote XR experiments may encourage participation from previously under-represented groups, including introverts and those who cannot or do not wish to travel to labs to take part (e.g. people who struggle to leave their homes due to physical or mental health issues).
However, respondents with research-specific requirements also raised concerns that recruitment of specific subsets of participants could be more difficult remotely. For example, when recruiting for a medical study of those with age-related mobility issues, it is unlikely that there will be a large cohort with their own XR hardware.
Twenty-five respondents noted the potential for remote studies to take up less time, particularly if remote studies are encapsulated and unsupervised. They stated that this removes scheduling concerns for both the researcher and the participant, and allows experiments to occur concurrently, reducing the total researcher time needed or increasing the scale of the experiment. However, there are concerns that this benefit could be offset by increased dropouts for longitudinal studies, due to a less "close" relationship between researcher and participant (R17, R25).
One respondent noted that they needed to run physiological precursor tests (i.e. visual acuity and stereo vision) that have no remote equivalent. Transitioning to remote research has meant these criteria must now be self-reported. Similarly, experiments have general expectations of linguistic and cultural comprehension, and opening research to a global scale might introduce distinctions from the typically explored populations. One respondent cautioned that further steps should be taken to ensure participants are able to engage at the intended level, as in-lab such issues could be filtered out by researcher intuition.
The overwhelming drawback of remote XR research, as reported by the majority of respondents, was that of data collection. Excluding changes to participant recruitment, as mentioned above, the issues can broadly be categorised as: (1) bespoke hardware challenges, (2) monitoring/sensing challenges, and (3) data transmission and storage.

The use of bespoke hardware in any type of remote user study is a well-known issue, predominantly regarding the difficulty of managing and shipping bespoke technology to participants and ensuring it works in their test environments. In the context of XR technologies, 13 respondents voiced concerns about the complicated and temperamental system issues that could arise, particularly surrounding the already strenuous demands of PC-based VR on consumer-level XR hardware, without additional overheads (e.g. recording multiple cameras).

Four respondents felt it was unreasonable to ask remote participants to prepare multiple data-collection methods that may be typical in lab studies, such as video recording and motion tracking. There were also concerns regarding the loss of informal, ad-hoc data collection (e.g. facial expressions, body language, casual conversations).

Finally, concerns were also raised regarding the effort required to encapsulate all data capture into the XR experience, the effects this might have on data collection (for example, a recent study highlighted a difference in the variability of presence when participants recorded it from inside the VR experience versus outside [62]), the reliability of transferring large amounts of data from participants, and how sensitive information (especially in the context of medical XR interventions) can be securely transferred and stored.
This perhaps presents the biggest area for innovation in remote XR research, as it is reasonable to assume the academic community could create efficient, easy-to-use toolkits for remote data collection in XR environments which integrate with ethics-compliant data archives.

Many data collection methods were deemed infeasible for remote experimentation: EEG, ECG, eye/hand tracking, GSR, as well as body language and facial expressions. Five researchers noted adaptations they had been working on to overcome these, including using HMD orientation to replace eye tracking, using built-in HMD microphones to record breaths instead of ECG monitoring to determine exertion, and using the HMD controllers to perform hand tracking.

Extended Reality (XR) Remote Research: a Survey of Drawbacks and Opportunities. CHI '21, May 8–13, 2021, Yokohama, Japan.
Table 3: Data Collection Key Points
| Key Point | Lab | Remote |
|---|---|---|
| Hardware | Access to custom and/or reliable hardware | Limited access to devices (e.g. EEG, ECG, computational power, etc.) |
| Data | Collection can be supervised, more detailed, real-time, more space for qualitative feedback | Mostly unsupervised (less control), human expressions (e.g. facial) are generally lost, qualitative feedback is harder to collect |
| Behaviour | Likely more serious, richer (qualitative) data | Lack of detailed feedback, potentially less honest |

Respondents also noted some behavioural concerns and changes for remote, unsupervised participants. These included a lack of participation in qualitative feedback (6 respondents); for one researcher (R20), participants were "encouraged to provide feedback but few took the initiative." Another researcher (R31) stated: "Debriefing is such a good space to collect unstructured interview data. Users relax after the questionnaire/debriefing ... produc[ing] a ... meta-narrative where participants consider your questions and their experiences together". The lack of supervision raised concerns regarding whether participants were being "truthful" in their responses, with one researcher (R41) stating that participants attempted to "game" their study in order to claim the participation compensation. However, others stated that unsupervised studies could reduce researcher bias arising from their perception of the participants' appearance and mannerisms.
Many respondents were concerned that unsupervised participants may conduct the experiments incorrectly, hold incorrect assumptions, or misunderstand processes or target actions. Twenty-four respondents felt that guidance (introductions, explanations, etc.) would be better provided in a lab setting, which also allows ad-hoc guidance and real-time corrections.

There were also concerns over the mental state of participants: remote participants "may not take it seriously", may not focus (lack of motivation and engagement), or may approach the study with a specific mood unknown to the researcher (R19, R30). Contrasting opinions suggested that participants may feel that the in-lab experience is "overly formal and uncomfortable" (R32).

Some respondents stated that remote experiments risk losing the "rapport" between researcher and participant, which might negatively influence the way a participant performs a remote study. However, one respondent stated that the transition to remote experimentation allowed them a different, deeper, ongoing connection with their participants. Their research was for a VR machine learning tool, and they found that moving away from in-person experimentation to a remote workshop process encouraged the uptake of longitudinal community-building tools. The chosen communication method between researcher and user - Discord servers - became a place for unsupervised interaction between participants, and led to an ongoing engagement with the research (R33). However, it should be considered that any "rapport" between participant and researcher might introduce bias.
Concern was raised around participants' environments and their potentially varying unsuitability for remote experimentation, compared with controlled laboratory settings. For example, one respondent (R20) stated: "one user reported walking into their whiteboard multiple times, causing low presence scores." The concern is particularly strong for unsupervised remote experiments, as distractions could enter the experiment and affect data without the researcher being aware.

This concern was not universal, however. Four respondents noted that their laboratory spaces were far from distraction-free, and even suggested that a remote space could prove freer of interruptions than the space available to them in their research setting; others stated that researchers should be mindful that the laboratory itself is an artificial space, far more so than where people will typically use their VR setups: their homes. Five respondents highlighted how XR research could benefit from being deployed in "the participants' own environment".

The immediate environment of the user was also raised as a concern for VR experiment design: the choice between being able to move freely in an open laboratory space and a more adaptive solution for the unknown variables of participants' home environments.

Respondents noted that supporting the different VR and AR setups needed to access a larger remote audience would also prove more labour-intensive, and would introduce more variables compared with the continuity of the tech stack available in-lab. With remote experiments, and more so for encapsulated unsupervised ones, 10 respondents believe more time will be spent developing the system.
Table 4: Experiment Process Key Points

| Key Point | Issue | Lab | Remote |
|---|---|---|---|
| Process & Guidance | Control | Full control over setup and participants | No control over or guidance of participants |
| Process & Guidance | Participants | Rapport with researcher; welcoming, more serious, attentive | Different attitude, potential cheating |
| Environment | Setting | Can be distracting (e.g. outside noise) but generally more controlled | Might be distracting or overwhelming but likely more realistic/natural for participants |
| Hardware & Software | Hardware | Access to custom devices, normal calibration process | No calibration (by researcher), potential for unknown errors, no custom tools |
| Hardware & Software | Software | Allows for Wizard of Oz, adjusting settings in real time | Issues harder to spot and influence results, longer development time |
| Research Questions | Topics | Unchanged, if we go back to normal research conditions | Remote setup might influence research questions and topics |
| Cost | Expenditures | More time consuming, more expensive to run | Potentially cheaper but potentially more work for implementation |

A concern regarding remote experiments, particularly unsupervised ones, is that calibration processes are harder to verify (R30). This could either cause participants to unknowingly have faulty experiences, and therefore report faulty data, or increase the time taken to verify that user experiences are correct. Unknown errors can affect data integrity or participant behaviour. Respondents noted that these types of remote errors are often much more difficult and labour-intensive to fix compared with in-lab. This issue is compounded by individual computer systems introducing other confounding factors (for both bug-fixing and data collection): frame rate, graphical fidelity, tracking quality and even resolution can vary dramatically.

Five respondents reflected that overcoming these issues could lead to more robust research plans, as well as better development and end-product software to overcome the problems listed. This encapsulation could also lead to easier opportunities for reproducibility, as well as the ability for researchers to share working versions of the experiment with other researchers, instead of just the results. It could also help with the versioning of experiments, allowing researchers to build new research on top of previous experiment software.

Four respondents were aware that these advantages are coupled with longer development times. The increased remote development requirements could also be limiting for researchers who face constrained development resources, particularly those outside of computer science departments. This is compounded by the fact that the infrastructure for recruiting remote XR participants, data capture, data storage and bug fixing is not particularly developed.
Once these are established, however, respondents felt they might make for higher overall data quality compared with the current laboratory-based status quo, due to more time spent creating automated recording processes and less reliance on researcher judgement. There are also arguments that the additional development time is offset by the potential increase in participants and, if unsupervised, the reduction in experiment supervision requirements.

Six respondents who use specific hardware in their research noted that it was currently difficult to measure physiological information in a reliable way, and included hand tracking in this. However, we are aware that some consumer VR hardware (Oculus Quest) allows hand tracking, and so there is an additional question of whether researchers are being fully supported in knowing what technologies are available to them.

To alleviate issues with reaching participants, two respondents wrote about potentially sending equipment to participants. The noted limitations of this were hardware going missing (which had happened, R35) and participants being unable to use the equipment on their own (which had not happened yet).
Five respondents noted that their research questions changed or could change depending on whether they were aiming for a laboratory or remote setting. For example, one respondent (R31) suggested that "instead of the relationship of the physical body to virtual space, I'd just assess the actions in virtual space". Others explored the potential of having access to many different system setups, for example, now being able to easily ask questions like "are there any systematic differences in cybersickness incidence across different HMDs?" (R39).

Nine respondents speculated that remote research has potential for increasing longitudinal engagement, due to lower barriers to entry for the researcher (room booking, time) and the participant (no commute), and that rare or geographically based phenomena could be cheaply studied using remote research, as providing those communities access to VR may be cheaper than relocating a researcher to them.

Table 5: Health and Safety Key Points

| Key Point | Summary |
|---|---|
| Protocols | Missing standard protocols (to work safely with participants in-lab) |
| Equipment | Sanitizing of in-lab equipment and spaces |
| Remote | Concerns for remote participants (e.g. accidents during a user study) |
| Real-Time Aid | Not available for remote participants (e.g. motion sickness) |
Eight respondents noted the potential of remote experimentation for reducing some of the cost overheads of running experiments. Laboratories carry costs that remote studies do not: lab maintenance, hardware maintenance and staffing. Without these, costs per participant are lower (and for unsupervised studies, almost nil). As experiment space availability was also noted as a concern for laboratory-based experiments, this seems a potentially under-explored area of benefit, provided remote participant recruitment is adequate.
The leading benefit given for remote user studies was health and safety, citing shared HMDs and controllers as a potential vector for Covid-19 transmission, as well as more general issues such as air quality in enclosed lab spaces. Concerns were raised about viral transmission both between participants, and between participant and researcher. This concern has also increased administration overheads, with 6 respondents stating it could be more time consuming to prepare the lab and organise studies, or to use new contact-tracing methods for lab users.

However, respondents also raised concerns about additional safety implications for remote participants. The controlled lab environment is set up to run the study, whereas remote participants are using a general-purpose space. One AR researcher, whose research requires participants to move quickly outside in fields, noted their study could be considered "incredibly unsafe" if unsupervised or run in an inappropriate location. Additionally, for health and mental health studies, being in-lab allows the researcher to provide support, especially with distressing materials. Finally, VR environment design has a direct impact on the level of simulator sickness invoked in participants. There were questions about the responsibility of researchers to be present to aid participants who could be made to feel unwell by a system they built.
Three ethics concerns were reported by respondents: encouraging risky behaviours, responsibility for actions in XR, and data privacy. An example of the first is the ethical implication of paying participants, and therefore incentivising them, to take part in what could be considered high-risk behaviour: entering an enclosed space with a stranger and wearing a VR HMD.

One respondent (R30) raised the question of liability for participants who are injured in their homes while taking part in an XR research project. The embodied nature of XR interventions - and most respondents used this embodiment in their studies - could put participants at greater risk of harming themselves than with other mediums.

Finally, while cross-cultural recruitment was seen as a potential boon for remote research, questions were raised about ethics and data storage and protection rules when participants are distributed across different countries, each with different data storage laws and guidelines. Although not limited to XR, due to the limited number of VR users and the disproportionate distribution of their sales, it seems the majority of remote VR participants will originate from North America, and ethics clarification for non-US-based universities is needed.
Table 6: Covid-19 Implications

| Key Point | Summary |
|---|---|
| Suspensions | No user studies at the moment |
| Facilities | Sanitizing of equipment and spaces |
| Recruitment | Harder/impossible to recruit in-lab participants |
| Exclusion | Bias and high-risk participants |

While Covid-19 has impacted most studies around the world, the dependence on shared hardware for XR research, especially HMDs, has led to many implications reported by our respondents. These concerns are particularly related to Covid-19, and should therefore reduce as the pandemic is resolved. However, as it is currently unclear when the pandemic will end, we felt it was useful to discuss them in a dedicated section.

Most respondents noted that Covid-19 had caused a suspension of studies, and that they were unclear how long the suspension would last, resulting in an overall drop in the number of studies being conducted, with 30 respondents stating it will change the research they conduct (e.g. moving to online surveys). The continuation of lab studies was eventually expected, but with added sanitizing steps. However, for many it was unclear what steps they should take to make XR equipment sharing safe. These concerns extended beyond the XR hardware to general facility suitability, including room airflow and official protocols, which may vary for each country and/or institution.

Five respondents also had concerns about participants. There were worries that lab-based recruitment would be slow to recover, as participants may be put off taking part in experiments because of potential viral transfer vectors. Similarly, respondents were concerned about being responsible for participants, and putting them in a position in which there is a chance they could be exposed to the virus.

There were also concerns around Covid-19 and exclusion, as participants who are at high risk of Covid-19, or those in close contact with high-risk populations, would now have to self-exclude from lab-based studies. This might introduce a participant selection bias towards those willing to attend a small room and share equipment.

It should be noted that not all labs are facing the same problems: some of our respondents had continued lab-based experimentation during this period, with Covid-19 measures ensuring that participants wore face masks during studies. This was considered a drawback, as combined with an HMD it covered the participant's entire face and was cumbersome. These measures are also known not to be 100% protective.
In the previous section we presented the results as themes we foundin our analysis. Some of these presented common characteristicsand some issues were reported in multiple themes. We now sum-marise the results, highlight the key points and suggest importantquestions for future research.
As with non-XR experiments, researchers are interested in thepotential benefits of remote research for increasing the amount,diversity and segmentation of participants compared with in-labstudies. However, with many respondents reporting that it hasbeen difficult to recruit XR participants, it seems there is a gapbetween potential and practice. The unanswered question is howto build a pool of participants that is large and diverse enoughto accommodate various XR research questions and topics, given
that there are few high-end HMDs circulating in the crowdworker community [40][46]. So far, we have found three potential solutions for participant recruitment, although each requires further study:

(1) Establish a dedicated XR crowdworker community. However, concerns of non-naivety [48], which are already levied at the much larger non-XR crowdworker participant pools, would surely be increased. We would also have to understand whether the early version of this community would be WEIRD [25] and non-representative, especially given the cost barrier to entry for HMDs.

(2) Leverage existing consumer XR communities on the internet, such as the large discussion forums on Reddit. These should increase in size further as they shift from early-adopter to general consumer communities. However, these communities may also have issues with representation.

(3) Establish hardware-lending schemes to enable access to a broader base of participants [68]. However, the cost of entry and risk of these schemes may make them untenable for smaller XR research communities.

It is also not clear, beyond HMD penetration, what additional obstacles XR poses for online recruitment. Technical challenges (e.g. XR applications needing to run on various devices, on different computers, requiring additional setup beyond simple software installation) and unintuitive experiment procedures for participants (e.g. download X, do an online survey at Y, run X, record Z) are notably distinct issues for remote XR research. It is also unclear whether the use of XR technology has an impact on what motivates participants to take part in remote studies, an area of study that has many theoretical approaches even in the non-XR space [33].
Respondents feel that many types of physiological data collection are not feasible with either XR or non-XR remote research. For remote XR research, there are unique concerns over video and qualitative data collection, as using XR technologies can make it (technically) difficult to reliably video-record the activity, as well as moving participants' loci of attention away from the camera or obscuring their faces behind an HMD. However, the hardware involved in creating XR experiences provides a variety of methods to gather data, such as body position, head nodding, breath monitoring, hand tracking, and HMD angle instead of eye tracking. These can be used to explore research topics that are often monitored via other types of physiological, video or qualitative data, such as attention, motivation, engagement, enjoyment, exertion or focus of attention. It would be useful for XR researchers to build an understanding of what the technologies built into XR hardware can tell us about participant experiences, so as to know the data collection affordances and opportunities of XR hardware.

That said, the infrastructure for collecting and storing this mass of XR data remotely is currently not fully implemented, and we are not aware of any end-to-end standardised framework. However, work is being done to simplify the data collection step for XR experiments built in Unity [9]. There are also opportunities to further develop web-based XR technologies that could easily send and store data on remote servers. There are also ethical concerns, as respondents were unclear on guidance regarding data collection from participants located in other nations, particularly when they are paid. This includes how the data is collected, where it should be stored, and how it can be manipulated.
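To make the "HMD angle instead of eye tracking" affordance concrete, the sketch below illustrates one way a logged head-orientation quaternion could be converted into a coarse gaze proxy. It is a minimal, hypothetical example (the function names, target names and the 10-degree dwell cone are our own illustrative choices, not part of any framework discussed here), assuming the HMD reports orientation as an (x, y, z, w) unit quaternion in a right-handed, Y-up coordinate system where -Z is "forward":

```python
import math

def head_forward_vector(qx, qy, qz, qw):
    """Rotate the -Z 'forward' axis by the HMD orientation quaternion.

    Returns the unit vector the headset is pointing along, usable as a
    coarse gaze proxy when no eye tracker is available. The expressions
    are the relevant terms of q * v * q^-1 expanded for v = (0, 0, -1).
    """
    x = -(2 * (qx * qz + qw * qy))
    y = -(2 * (qy * qz - qw * qx))
    z = -(1 - 2 * (qx * qx + qy * qy))
    return (x, y, z)

def dwell_target(forward, targets, cone_deg=10.0):
    """Return the first target whose direction lies within `cone_deg`
    of the head-forward vector, else None.

    `targets` maps a name to a unit direction vector from the head
    position to that object, e.g. {"sphere": (0.0, 0.0, -1.0)}.
    """
    cos_limit = math.cos(math.radians(cone_deg))
    for name, direction in targets.items():
        dot = sum(f * d for f, d in zip(forward, direction))
        if dot >= cos_limit:
            return name
    return None
```

Logging the output of `dwell_target` per frame would give a simple attention trace, which is the kind of head-orientation proxy respondents described using in place of eye tracking.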
At the time of writing, many laboratories are considered unsafe for running user studies. Although some respondents reported being able to work in-lab, the limitations mean it is not currently feasible to run user studies under normal conditions. The main concerns for the near future are the lack of standardised protocols to ensure the safety of researchers and participants while running user studies, and issues with the ethics protocols of research institutions. For XR research, it is unclear how to adequately sanitize equipment and tools, as well as how to maintain physical distancing. There are also concerns about the comfort of participants if they are required to wear masks alongside HMDs. Finally, respondents reported concerns about a potential long-term fall in user motivation to take part in such experiments while HMDs remain a notable infection vector.

There are distinctly different safety and ethics concerns around remote XR experiments, including the researcher's responsibility for not harming participants (e.g. ensuring environments are safe for movement, and not inducing simulator sickness), which, while also true of in-lab experiments, is considered a greater challenge when a participant is not co-located with the researcher.
Respondents reported framing their research questions and experiments differently depending on the target experiment setting. The starkest transition was that of an in-lab study of participants using an AR HMD (HoloLens), which changed to a remote study that had participants watch a pre-recorded video of someone using the AR HMD. It seems these kinds of transitions will continue to be necessary depending on how esoteric the hardware is, with fewer concerns for AR smartphone investigations.

A concern for respondents was that remote settings introduce additional uncontrolled variables that need to be considered by researchers, such as potential unknown distractions, trust in participants and their motivation, and issues with remote environmental spaces. However, previous research shows that most HMD-wearing remote participants engage in a space well known to them (the home), and predominantly when they are alone [40], which could alleviate some of the environmental space and distraction concerns. Further research into how a home environment could impact XR studies is needed, as is the creation of well-defined protocols to alleviate uncontrolled influences on remote XR results. Beyond this, we also need to understand any impact that remote experiments may have on results compared with in-lab experiences, especially if we are to reliably contrast lab and remote research. Previous research on non-XR experiments suggests that distinctions between lab and remote settings exist [10][64][70], but it has been theorised that the impact might be less for XR experiments, as you "take the experimental environment with you" [7].
Respondents stated that creating remote XR experiments might encourage better software development and experimental processes.
If experiments can be deployed as all-in-one experience and data collection bundles that run unsupervised, the time-saving implications for researchers (and participants) are huge, especially when paired with the potential increase in participants. This type of "encapsulated experiment" can also improve replication and transparency, as theorised by Blascovich [7], and allow for versioning of experiments, in which researchers can build on perfect replicas of others' experimental environments and processes. Finally, due to the similar nature of XR hardware, data logging techniques could easily be shared between system designers or standardised; something we have seen with the creation of the Unity Experiment Framework [9].

However, there are some limitations to this approach. It is likely to require additional development time from researchers, especially until a comprehensive experiment framework is established. In addition, there are data collection limitations for remote XR studies, as discussed in previous sections. It is also interesting to consider how encapsulation might work for AR investigations, as the environment will only partially be controlled by the designer.

We believe that the potential of remote XR experiments lies in understanding the data collection affordances of the hardware; collectively building frameworks to ease the collection of this data; and designing research questions that maximise their use; all inside encapsulated experiences. This might be a mindset shift for researchers, who, according to our survey, are predominantly lab-orientated.
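As a minimal sketch of what the data-logging half of an encapsulated experiment could look like, the class below timestamps and buffers every in-experience event, then serialises the whole session into a single payload for later upload. The names (`TrialLogger`, `export`, the session fields) are hypothetical illustrations of the idea, not the API of the Unity Experiment Framework or any other tool; a real implementation would also need transport, consent and encryption layers:

```python
import json
import time
import uuid

class TrialLogger:
    """Buffer timestamped experiment events and serialise them as one
    payload, so an unsupervised session can be uploaded in a single
    request when the participant's network allows it."""

    def __init__(self, experiment_id, app_version):
        self.session = {
            "session_id": str(uuid.uuid4()),      # unique per participant run
            "experiment_id": experiment_id,
            "app_version": app_version,           # supports versioned replication
            "events": [],
        }

    def log(self, event_type, **payload):
        """Record one event with a monotonic timestamp (robust to clock changes)."""
        self.session["events"].append({
            "t": time.monotonic(),
            "type": event_type,
            **payload,
        })

    def export(self):
        """Serialise the session; the bytes would then be POSTed to an
        ethics-compliant data archive."""
        return json.dumps(self.session).encode("utf-8")
```

For example, an experiment might call `logger.log("trial_start", condition="A")` when a trial begins and `logger.log("target_selected", target="sphere")` on each interaction, then `logger.export()` at the end of the session. Bundling the schema with the experiment build is what would let other researchers re-run an exact replica and obtain comparably structured data.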
Our goal with this research was to provide an overall insight intothe XR researcher community. However, this approach means thatinsights from sub-communities may not have been found. For exam-ple, we had no responses from researchers involved in topics such asvulnerable populations. Further investigation into sub-communitiesis needed to uncover potential insights for those areas.
It is clear from our survey that respondents believe remote XR research has the potential to be a useful research approach. However, it currently suffers from numerous limitations regarding data collection and system development, and a lack of clarity around participant recruitment. Analysis of our survey results and the literature around remote and remote XR research suggests that, to better understand the boundaries of remote XR experimentation, researchers need answers to the following questions:

(1) Who are the potential remote XR participants, and are they representative?
(2) How can we access a large pool of remote XR participants?
(3) To what extent do remote XR studies affect results compared with in-lab studies?
(4) What are the built-in data collection affordances of XR hardware, and what can they help us study?
(5) How can we lower the barriers to creating encapsulated experiment software, to maximise the potential of remote XR research?

We believe there is an opportunity to reconceptualise approaches to XR and remote research. XR experiments, as they stand, predominantly study a participant's experience with an XR system in an artificial but controlled setting (the laboratory), using external data collection methods (surveys, cameras, etc.). However, if we consider XR devices primarily as data-collection hardware with set properties, we can work backwards to understand which research questions suit the data collection afforded by existing XR hardware. Additionally, we believe there is potential to reconceptualise, for suitable applications, the home as a natural research location, and to move away from the laboratory as the default location for user studies. This is a potentially unique opportunity for XR compared with non-XR studies as, for many investigations, the XR experiment takes its environment with it.
ACKNOWLEDGMENTS
This work is supported by the EPSRC and AHRC Centre for Doctoral Training in Media and Arts Technology (EP/L01632X/1).
REFERENCES

[1] Judith Amores, Robert Richer, Nan Zhao, Pattie Maes, and Bjoern M Eskofier. 2018. Promoting relaxation using virtual reality, olfactory interfaces and wearable EEG. IEEE, Las Vegas, 98–101.
[2] Elliot Aronson. 1969. The theory of cognitive dissonance: A current perspective. Advances in Experimental Social Psychology 4, 1 (1969), 34.
[3] Frederik Aust, Birk Diedenhofen, Sebastian Ullrich, and Jochen Musch. 2013. Seriousness checks are useful to improve data validity in online research. Behavior Research Methods 45, 2 (2013), 527–535.
[4] Jakki Bailey, Jeremy N Bailenson, Andrea Stevenson Won, June Flora, and K Carrie Armel. 2012. Presence and memory: immersive virtual reality effects on cued recall. In Proceedings of the International Society for Presence Research Annual Conference. Philadelphia, Pennsylvania, USA, 24–26.
[5] Domna Banakou, Parasuram D Hanumanthu, and Mel Slater. 2016. Virtual embodiment of white people in a black virtual body leads to a sustained reduction in their implicit racial bias. Frontiers in Human Neuroscience 10 (2016), 601.
[6] Jim Blascovich. 2002. Social influence within immersive virtual environments. In The Social Life of Avatars. Springer, 127–145.
[7] Jim Blascovich, Jack Loomis, Andrew C Beall, Kimberly R Swinth, Crystal L Hoyt, and Jeremy N Bailenson. 2002. Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry 13, 2 (2002), 103–124.
[8] Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative Research in Psychology 3, 2 (2006), 77–101.
[9] Jack Brookes, Matthew Warburton, Mshari Alghadier, Mark Mon-Williams, and Faisal Mushtaq. 2019. Studying human behavior with virtual reality: The Unity Experiment Framework. Behavior Research Methods 1, 52 (2019), 1–9.
[10] Tom Buchanan. 2000. Potential of the Internet for personality research. In Psychological Experiments on the Internet. Elsevier, 121–140.
[11] Michael Buhrmester, Tracy Kwang, and Samuel D Gosling. 2016. Amazon's Mechanical Turk: A new source of inexpensive, yet high-quality data? Perspectives on Psychological Science 1, 6 (2016), 3–5.
[12] Michael D Buhrmester, Sanaz Talaifar, and Samuel D Gosling. 2018. An evaluation of Amazon's Mechanical Turk, its rapid rise, and its effective use. Perspectives on Psychological Science 13, 2 (2018), 149–154.
[13] Krista Casler, Lydia Bickel, and Elizabeth Hackett. 2013. Separate but equal? A comparison of participants and data gathered via Amazon's MTurk, social media, and face-to-face behavioral testing. Computers in Human Behavior 29, 6 (2013), 2156–2160.
[14] Woong Choi, Liang Li, Satoru Satoh, and Kozaburo Hachimura. 2016. Multi-sensory integration in the virtual hand illusion with active movement. BioMed Research International (2016).
[15] Interactive Multimedia Learning Environments. Springer, 19–30.
[16] Lauri Connelly, Yicheng Jia, Maria L Toro, Mary Ellen Stoykov, Robert V Kenyon, and Derek G Kamper. 2010. A pneumatic glove and immersive virtual reality environment for hand rehabilitative training after stroke. IEEE Transactions on Neural Systems and Rehabilitation Engineering 18, 5 (2010), 551–559.
[17] S. A. Creasy, R. J. Rogers, K. K. Davis, B. B. Gibbs, E. E. Kershaw, and J. M. Jakicic. 2017. Effects of supervised and unsupervised physical activity programmes for weight loss. Obesity Science & Practice
3, 2 (2017), 143–152. https://doi.org/10.1002/osp4.107 arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/osp4.107[18] Darragh Egan, Sean Brennan, John Barrett, Yuansong Qiao, Christian Timmerer,and Niall Murray. 2016. An evaluation of Heart Rate and ElectroDermal Activityas an objective QoE evaluation method for immersive virtual reality environments.In . IEEE, IEEE, Lisbon, Portugal, 1–6.[19] Donald P Ely and Barbara B Minor. 1994.
Educational Media and TechnologyYearbook, 1994. Volume 20.
ERIC, 6633, Englewood, CO.[20] Daniel Freeman, Polly Haselton, Jason Freeman, Bernhard Spanlang, SameerKishore, Emily Albery, Megan Denne, Poppy Brown, Mel Slater, and Alecia Nick-less. 2018. Automated psychological therapy using immersive virtual reality fortreatment of fear of heights: a single-blind, parallel-group, randomised controlledtrial.
The Lancet Psychiatry
5, 8 (2018), 625–632.[21] Joseph K Goodman, Cynthia E Cryder, and Amar Cheema. 2013. Data collection ina flat world: The strengths and weaknesses of Mechanical Turk samples.
Journalof Behavioral Decision Making
26, 3 (2013), 213–224.[22] Samuel D Gosling, Simine Vazire, Sanjay Srivastava, and Oliver P John. 2004.Should we trust web-based studies? A comparative analysis of six preconceptionsabout internet questionnaires.
American psychologist
59, 2 (2004), 93.[23] Diane Gromala, Xin Tong, Amber Choo, Mehdi Karamnejad, and Chris D Shaw.2015. The virtual meditative walk: virtual reality therapy for chronic pain man-agement. In
Proceedings of the 33rd Annual ACM Conference on Human Factors inComputing Systems . Association for Computing Machinery, New York, NY, USA,521–524.[24] Azriel Grysman. 2015. Collecting narrative data on Amazon’s Mechanical Turk.
Applied Cognitive Psychology
29, 4 (2015), 573–583.[25] Joseph Henrich, Steven J Heine, and Ara Norenzayan. 2010. Most people are notWEIRD.
Nature
International Journal of Human-Computer Interaction
31 (2015), 557 –570.[27] Hunter G Hoffman, Azucena Garcia-Palacios, Veronica Kapa, Jennifer Beecher,and Sam R Sharar. 2003. Immersive virtual reality for reducing experimentalischemic pain.
International Journal of Human-Computer Interaction
15, 3 (2003),469–486.[28] Crystal L Hoyt, Jim Blascovich, and Kimberly R Swinth. 2003. Social inhibition inimmersive virtual environments.
Presence: Teleoperators & Virtual Environments
12, 2 (2003), 183–195.[29] Bernd Huber and Krzysztof Z Gajos. 2020. Conducting online virtual environmentexperiments with uncompensated, unsupervised samples.
Plos one
15, 1 (2020),e0227629.[30] Naomi Josman, Eli Somer, Ayelet Reisberg, Patrice L Weiss, Azucena Garcia-Palacios, and Hunter Hoffman. 2006. BusWorld: designing a virtual environmentfor post-traumatic stress disorder in Israel: a protocol.
Cyberpsychology & Behav-ior
9, 2 (2006), 241–244.[31] Roland Kehl and Luc Van Gool. 2004. Real-time pointing gesture recognition foran immersive environment. In
Sixth IEEE International Conference on AutomaticFace and Gesture Recognition, 2004. Proceedings.
IEEE, IEEE, Seoul, South Korea,577–582.[32] Pyry Kettunen and J. Oksanen. 2018. Effects of Unsupervised Participation overthe Internet on a Usability Study about Map Animation. In
New Directions inGeovisual Analytics: Visualization, Computation, and Evaluation . Lipics, 02430Finland, 7:1–7:7.[33] Florian Keusch. 2015. Why do people participate in Web surveys? Applyingsurvey participation theory to Internet survey data collection.
Managementreview quarterly
65, 3 (2015), 183–216.[34] Konstantina Kilteni, Ilias Bergstrom, and Mel Slater. 2013. Drumming in im-mersive virtual reality: the body shapes the way we play.
IEEE transactions onvisualization and computer graphics
19, 4 (2013), 597–605.[35] Konstantina Kilteni, Raphaela Groten, and Mel Slater. 2012. The sense of embod-iment in virtual reality.
Presence: Teleoperators and Virtual Environments
21, 4(2012), 373–387.[36] Panagiotis Kourtesis, Danai Korre, Simona Collina, Leonidas AA Doumas, andSarah E MacPherson. 2020. Guidelines for the development of immersive virtualreality software for cognitive neuroscience and neuropsychology: the develop-ment of virtual reality everyday assessment lab (VR-EAL), a neuropsychologicaltest battery in immersive virtual reality.
Frontiers in Computer Science
Psychological experiments on the Internet . Elsevier, Indiana 47243,35–60. [38] A. Lacroix, R. Kressig, T. Muehlbauer, Y. Gschwind, B. Pfenninger, O. Bruegger,and U. Granacher. 2015. Effects of a Supervised versus an Unsupervised CombinedBalance and Strength Training Program on Balance and Muscle Power in HealthyOlder Adults: A Randomized Controlled Trial.
Gerontology
62 (2015), 275 – 288.[39] Barbara L. Ludlow. 2015. Virtual Reality: Emerging Applications and FutureDirections.
Rural Special Education Quarterly
34, 3 (2015), 3–10. https://doi.org/10.1177/875687051503400302[40] Xiao Ma, Megan Cackett, Leslie Park, Eric Chien, and Mor Naaman. 2018. Web-based VR experiments powered by the crowd. In
Proceedings of the 2018 WorldWide Web Conference . International World Wide Web Conferences Steering Com-mittee, Republic and Canton of Geneva, CHE, 33–43.[41] Guido Makransky, Thomas S Terkildsen, and Richard E Mayer. 2019. Addingimmersive virtual reality to a science lab simulation causes more presence butless learning.
Learning and Instruction
60 (2019), 225–236.[42] Winter Mason and Siddharth Suri. 2012. Conducting behavioral research onAmazon’s Mechanical Turk.
Behavior research methods
44, 1 (2012), 1–23.[43] Youngme Moon. 1998. The effects of distance in local versus remote human-computer interaction. In
Proceedings of the SIGCHI conference on Human factors incomputing systems . ACM Press/Addison-Wesley Publishing Co., USA, 103–108.[44] Masahiro Mori, Karl F MacDorman, and Norri Kageki. 2012. The uncanny valley[from the field].
IEEE Robotics & Automation Magazine
19, 2 (2012), 98–100.[45] Jason D Moss and Eric R Muth. 2011. Characteristics of head-mounted displaysand their effects on simulator sickness.
Human factors
53, 3 (2011), 308–319.[46] Aske Mottelson and Kasper Hornbæk. 2017. Virtual reality studies outside thelaboratory. In
Proceedings of the 23rd acm symposium on virtual reality softwareand technology . Association for Computing Machinery, New York, NY, USA,1–10.[47] Xueni Pan and Antonia F de C Hamilton. 2018. Why and how to use virtualreality to study human social interaction: The challenges of exploring a newresearch landscape.
British Journal of Psychology
Current Directions in Psychological Science
23, 3 (2014), 184–188.[49] Gabriele Paolacci, Jesse Chandler, and Panagiotis G Ipeirotis. 2010. Runningexperiments on amazon mechanical turk.
Judgment and Decision making
5, 5(2010), 411–419.[50] Eyal Peer, Laura Brandimarte, Sonam Samat, and Alessandro Acquisti. 2017.Beyond the Turk: Alternative platforms for crowdsourcing behavioral research.
Journal of Experimental Social Psychology
70 (2017), 153–163.[51] Eyal Peer, Joachim Vosgerau, and Alessandro Acquisti. 2014. Reputation as asufficient condition for data quality on Amazon Mechanical Turk.
Behaviorresearch methods
46, 4 (2014), 1023–1031.[52] Thammathip Piumsomboon, Gun Lee, Robert W Lindeman, and Mark Billinghurst.2017. Exploring natural eye-gaze-based interaction for immersive virtual reality.In . IEEE, IEEE, Los Angeles,CA, USA, 36–39.[53] Jennifer Preece. 2016. Citizen science: New research challenges for human–computer interaction.
International Journal of Human-Computer Interaction
32, 8(2016), 585–612.[54] Jaziar Radianti, Tim A Majchrzak, Jennifer Fromm, and Isabell Wohlgenannt.2020. A systematic review of immersive virtual reality applications for highereducation: Design elements, lessons learned, and research agenda.
Computers &Education
147 (2020), 103778.[55] Jack Ratcliffe and Laurissa Tokarchuk. 2020. Evidence for embodied cognition inimmersive virtual environments using a second language learning environment.In . IEEE, IEEE, London, UK, 1–8.[56] Jack Ratcliffe and Laurissa Tokarchuk. 2020. Presence, Embodied Interaction andMotivation: Distinct Learning Phenomena in an Immersive Virtual Environment.In
Proceedings of the 28th ACM International Conference on Multimedia . IEEE,London, UK, 1–8.[57] Ulf-Dietrich Reips. 2000. The Web experiment method: Advantages, disadvan-tages, and solutions. In
Psychological experiments on the Internet . Elsevier, CH-8032Zurich Switzerland, 89–117.[58] Yvonne Rogers and Paul Marshall. 2017. Research in the Wild.
Synthesis Lectureson Human-Centered Informatics
10, 3 (2017), i–97.[59] Robert S. Ryan, Mara Wilde, and Samantha Crist. 2013. Compared to a small,supervised lab experiment, a large, unsupervised web-based experiment on apreviously unknown effect has benefits that outweigh its potential costs.
Com-puters in Human Behavior
29, 4 (2013), 1295 – 1301. https://doi.org/10.1016/j.chb.2013.01.024[60] David Saffo, Caglar Yildirim, Sara Di Bartolomeo, and Cody Dunne. 2020. Crowd-sourcing Virtual Reality Experiments using VRChat. In
Extended Abstracts of the2020 CHI Conference on Human Factors in Computing Systems . Association forComputing Machinery, New York, NY, USA, 1–8.[61] Ferran Argelaguet Sanz, Anne-Hélène Olivier, Gerd Bruder, Julien Pettré, andAnatole Lécuyer. 2015. Virtual proxemics: Locomotion in the presence of obstaclesin large immersive projection environments. In .IEEE, IEEE, Arles, France, 75–80. xtended Reality (XR) Remote Research: a Survey of Drawbacks and Opportunities CHI ’21, May 8–13, 2021, Yokohama, Japan [62] Valentin Schwind, Pascal Knierim, Nico Haas, and Niels Henze. 2019. Usingpresence questionnaires in virtual reality. In
Proceedings of the 2019 CHI Conferenceon Human Factors in Computing Systems . Association for Computing Machinery,New York, NY, USA, 1–12.[63] Kathryn Y Segovia and Jeremy N Bailenson. 2009. Virtually true: Children’sacquisition of false memories in virtual reality.
Media Psychology
12, 4 (2009),371–393.[64] C Senior, Mary L Phillips, J Barnes, and AS David. 1999. An investigation into theperception of dominance from schematic faces: A study using the World-WideWeb.
Behavior Research Methods, Instruments, & Computers
31, 2 (1999), 341–346.[65] Francesco Soave, Nick Bryan-Kinns, and Ildar Farkhatdinov. 2020. A PreliminaryStudy on Full-Body Haptic Stimulation on Modulating Self-motion Perceptionin Virtual Reality. In
Augmented Reality, Virtual Reality, and Computer Graphics ,Lucio Tommaso De Paolis and Patrick Bourdot (Eds.). Springer InternationalPublishing, Cham, 461–469.[66] Jon Sprouse. 2011. A validation of Amazon Mechanical Turk for the collectionof acceptability judgments in linguistic theory.
Behavior research methods
43, 1(2011), 155–167.[67] A. Steed, S. Frlston, M. M. Lopez, J. Drummond, Y. Pan, and D. Swapp. 2016. An‘In the Wild’ Experiment on Presence and Embodiment using Consumer VirtualReality Equipment.
IEEE Transactions on Visualization and Computer Graphics
22, 4 (2016), 1406–1414.[68] Anthony Steed, Francisco Ortega, Adam Williams, Ernst Kruijff, Wolf-gang Stuerzlinger, Anil Ufuk Batmaz, Andrea Won, Evan Suma Rosen-berg, Adalberto Simeone, Aleshia Hayes, and et al. 2020. Interac-tions. https://interactions.acm.org/blog/view/evaluating-immersive-experiences-during-covid-19-and-beyond . IEEE, IEEE, Greenville, SC, 67–76.[70] Steven E Stern and Jon E Faber. 1997. The lost e-mail method: Milgram’s lost-letter technique in the age of the Internet.
Behavior Research Methods, Instruments, & Computers
29, 2 (1997), 260–263.[71] Ayoung Suh and Jane Prophet. 2018. The state of immersive technology research:A literature analysis.
Computers in Human Behavior
86 (2018), 77–90.[72] John Sweller. 2010. Cognitive load theory: Recent theoretical advances.
CognitiveLoad Theory: Recent Theoretical Advances
Usability Professionals Association Conference . Usability ProfessionalsAssociation Conference, Orlando, FL, 8.[75] David Waller, Eric Bachmann, Eric Hodgson, and Andrew C Beall. 2007. TheHIVE: A huge immersive virtual environment for research in spatial cognition.
Behavior Research Methods
39, 4 (2007), 835–843.[76] Vivek R Warriar, John R Woodward, and Laurissa Tokarchuk. 2019. ModellingPlayer Preferences in AR Mobile Games. In .IEEE, IEEE, London UK, 1–8.[77] Raphael P Weibel, Jascha Grübel, Hantao Zhao, Tyler Thrash, Dario Meloni,Christoph Hölscher, and Victor R Schinazi. 2018. Virtual reality experimentswith physiological measures.
JoVE (Journal of Visualized Experiments)
1, 138(2018), e58318.[78] Christopher J Wilson and Alessandro Soranzo. 2015. The use of virtual reality inpsychology: a case study in visual perception.
Computational and mathematicalmethods in medicine . IEEE, IEEE,Waltham, MA, USA, 95–102.[80] Leshao Zhang and Patrick GT Healey. 2018. Human, Chameleon or NoddingDog?. In