Ethical Issues in Empirical Studies using Student Subjects: Re-visiting Practices and Perceptions
Grischa Liebel · Shalini Chakraborty
Self-archiving note:
This is a post-peer-review, pre-copyedit version of an article published in Empirical Software Engineering. The final authenticated version will be made available online soon.
Abstract
Context:
Using student subjects in empirical studies has been discussed extensively from a methodological perspective in Software Engineering (SE), but there is a lack of similar discussion surrounding ethical aspects of doing so. As students are in a subordinate relationship to their instructors, such a discussion is needed.
Objective:
We aim to increase the understanding of practices and perceptions SE researchers have of ethical issues with student participation in empirical studies.
Method:
We conducted a systematic mapping study of 372 empirical SE studies involving students, following up with a survey answered by 100 SE researchers regarding their current practices and opinions regarding student participation.
Results:
The mapping study shows that the majority of studies do not report conditions regarding recruitment, voluntariness, compensation, and ethics approval. In contrast, the majority of survey participants supports reporting these conditions. The survey further reveals that less than half of the participants require ethics approval. Additionally, the majority of participants recruit their own students on a voluntary basis, and use informed consent with withdrawal options. There is disagreement among the participants whether course instructors should be involved in research studies and if they should know who participates in a study.

Conclusions:
It is a positive sign that mandatory participation is rare, and that informed consent and withdrawal options are standard. However, we see immediate need for action, as study conditions are under-reported, and as opinions on ethical practices differ widely. In particular, there is little regard in SE for the power relationship between instructors and students.

Keywords
Student Subjects · Ethics · Research Methodology · Mapping Study · Survey

G. Liebel
Reykjavik University
Menntavegur 1, 102 Reykjavík, Iceland
ORCID: 0000-0002-3884-815X
E-mail: [email protected]

S. Chakraborty
Reykjavik University
Menntavegur 1, 102 Reykjavík, Iceland
E-mail: [email protected]
1 Introduction

Using student subjects in empirical studies is common in Software Engineering (SE). In comparison to professionals, students are typically easier to recruit, more homogeneous in terms of skills and experience, and available in larger numbers [16]. Furthermore, students may serve as a representative sample in many situations, e.g., if the target population is junior developers [16]. While multiple studies exist in SE that discuss the validity of using student subjects in empirical research, e.g., [16,22,35,38,45], there is a lack of research investigating ethical considerations of student use. Furthermore, early work in empirical SE shows little awareness and even disregard for ethical issues [21]. As students are under the control of their instructor, using student subjects raises several ethical considerations in addition to the issues already present in studies involving human subjects [40,39]. Among others, it is necessary to ensure that students achieve their educational goals, and that empirical studies do not negatively affect learning outcomes by competing for limited time and resources [11]. Therefore, we investigate in this paper how student subjects are treated in SE research, and how this picture compares to several studies in the early 2000s. We aim to answer the following research questions.

RQ1 What are the conditions under which student subjects participate in empirical studies (in SE)?
RQ2 As a community, what do we believe are the ideal conditions under which student subjects participate in empirical studies (in SE)?

For the conditions, we focus on the research method applied, the voluntariness of participation, whether ethics approval is present, and the compensation of study participation. To answer RQ1, we conducted a systematic mapping study, extracting and screening publications that use student subjects in top SE venues.
Based on the results of this study, we additionally conducted a survey with authors of the primary studies regarding their opinions on student subject use in SE to answer RQ2. Note that we target SE-focused research; therefore, we do not specifically target SE education or computer science education research. The complete list of venues is found in Appendix A. To avoid confusion, throughout the paper we refer to the authors participating in our survey as “participants”, while referring to students participating in a study as “subjects”.
Our findings are that student subjects are typically used in controlled experiments. The majority of studies recruit students on a voluntary basis, often as a part of a course. Ethics approval and compensation are only rarely mentioned in the primary studies.
The survey shows that only 43% of participants need ethics approval. The majority uses their own students, and on a voluntary basis. Students are compensated by 50% of the participants. Informed consent and withdrawal options are used by more than 90% of participants. Participants disagree on how acceptable different practices are, in particular with respect to the researcher-student relationship.
Finally, we find a substantial gap between the wish of our survey participants that study conditions should be reported, and the actual picture drawn by our systematic mapping study, in which study conditions are only rarely reported.
With this paper, we hope to contribute to the discussion surrounding ethical use of student subjects in SE, and to increase the awareness of ethical standards and guidelines.
2 Related Work

Several publications are of direct interest to this study, either as they describe general ethical principles in relation to SE, e.g., [40,39,1,21], or as they describe ethical principles or general guidelines in relation to student subjects, e.g., [10,11,18,9,26].
Singer and Vinson [40] list ethical considerations compiled from existing codes of ethics in other disciplines, e.g., other engineering disciplines. The authors highlight informed consent, stating that, while ethicists “do not fully agree on the necessary components”, it should contain elements such as voluntariness, the consent decision, and the right to withdraw from the study at any time. Furthermore, they highlight that the power of a course instructor is problematic. Even if informed consent is in place, students might fear reprisal if they do not participate. This is the case regardless of the instructor’s intent: “the ethical difficulty arises not from the professor’s intent but from her power.” [40]. The authors suggest that students can remain anonymous to the instructor by using anonymous surveys/data collection, or the help of a graduate student to administer the data collection. Finally, the authors clarify that consent is typically not required when there is no private data identifying the students in the raw data, i.e., if the students can expect to remain private.
As a more detailed follow-up publication, Vinson and Singer [47] provide guidelines for ethical research involving humans in SE. Specifically, the authors discuss four pillars that should be required in all such studies: use of informed consent, beneficence to subjects or reduction of harm, confidentiality of subjects and the information they share, and scientific value of the study. The authors further provide guidelines for ethics reviews. Finally, they conclude
stating that ethics in empirical studies needs to be considered specifically, and that researchers need to be trained to do so.
Andrews and Pradhan [1] discuss ethical considerations in empirical SE. Interestingly, they essentially do not mention student subjects as a specific area of concern. As an exception, the authors mention that informed consent must be used, with “full disclosure of all potentially adverse effects” and the option to withdraw at any time.
Davison et al. [14] summarise a panel discussion in the information systems community on research ethics. The authors conclude that voluntary participation in empirical studies should be the norm, and that students should be informed how their data is handled. Furthermore, the authors state that a code of practice is likely more successful than strict regulation, since it encourages a constructive dialogue in the research community.
Hall and Flynn [21] report on a survey conducted among SE researchers in the UK regarding attitudes towards ethical issues in empirical SE. The authors state that there is only little attention towards ethical issues with human participation in SE, compared to traditional disciplines such as psychology, medicine, or law. Their results show a worrying picture, with a lack of awareness and concern among participants for ethical issues.
Sieber [39] discusses specifically how to protect study subjects in empirical SE. The author explains that risks (e.g., inconvenience, economic risk), context (e.g., students as subjects), and vulnerabilities (e.g., subordination of subjects) need to be considered when designing studies. As in [40], the author discusses that pressure on students arises “even if they are assured that participation [in a study] is voluntary.” [39].
Similarly, to lower this risk, the author suggests using graduate students to preserve the anonymity of the subjects towards the instructor.
Following the different ethical concerns listed by Singer and Vinson [40], Storey et al. [42] discuss ethical issues in relation to an empirical study conducted in their own course, after ethics approval was obtained. The authors state that, while informed consent was obtained, they were unsure whether they had inadvertently coerced students into participating, e.g., by offering course credits and unfavourable alternative assignments. Similarly, they note that there might have been an overly tight connection between instructors and researchers, thus putting additional stress on the participating students. The authors conclude that many ethical issues arise from exposing students to tools that are untested, such as research prototypes.
Carver et al. [10] discuss ethical issues arising through the use of student subjects in empirical SE studies. The authors contrast the students’ “rights to reach their educational goals” with the costs caused by study participation. Several ethical questions are raised, most of which serve as a direct source for the hypotheses used in our study (see Section 3). For instance, the authors ask whether it is ethically correct “to base some of the final evaluation of a student on his or her performance in the empirical study”.
In a more recent publication by Carver et al. [11], the ethical questions are complemented with requirements for empirical studies using student subjects.
As a general problem, the authors state that an empirical study needs to compete for “scarce time, effort, and resources” in university courses. Especially relevant to our study are three requirements raised by the authors, namely that ethical issues must be addressed by the study design, that students should understand the value of empirical studies and how to conduct them, and that group projects should be included in empirical studies with student participation. At the same time, several issues raised by [40,39,1] are not addressed by the requirements. For instance, the authors do not discuss that it can be problematic if the instructor is aware of which students participated in a study, as this might put stress on students or coerce them into participating.
Experiences with the guidelines by [11] are discussed by Galster et al. [18]. Especially relevant to our study is the observation that it is helpful for the course instructor and the researcher conducting the study to be the same person, since it “reduces problems with communicating the pedagogical value to students” and since students “felt less intimidated compared to a situation in which an external person conducts the study”. This contrasts with statements from related work on ethics, e.g., [40,39], that students feel pressure to participate when researcher and course instructor are identical.
Buse et al. [9] study publications at the CHI conference series with respect to ethics approval, following a similar method as we do in this study. The authors find that only 14 out of 211 studies mention ethics approval. Additionally, the authors find that ethics approval procedures are a perceived barrier in user studies.
Ko et al. [26] formulate guidelines for controlled experiments with human subjects in SE.
The authors state that informed consent should be used, but that “it is likely that they [the authors] are required to obtain informed consent and that the study design itself must first be approved by an ethics committee”. Similar to [9], the authors state that this approval process can be time-consuming.
Finally, recent work investigates ethics in SE in relation to emerging research areas, such as bots or autonomous systems, e.g., [6,7,3]. While this work is outside of our scope, the named publications re-iterate the basic principles named in [47], i.e., informed consent, beneficence, confidentiality, and scientific value.
In summary, work from the early 2000s reports that there is little awareness and concern for ethical issues in SE research [21]. Consequently, existing work summarises general ethical guidelines and procedures related to studies with human subjects [40,39,1,21] and, in particular, with student subjects [10,11,18,9,26]. This body of work extensively references existing work in other disciplines. In addition, there is existing work that formulates SE-specific guidelines [10,11]. While these are often concerned with the validity of the studies, they also contain ethical advice related to student subject use. Interestingly, we find a disconnect between these two areas: certain concerns mentioned in the more general work on ethics are seemingly ignored in SE-specific guides. Furthermore, ethical issues remain even in cases where ethics approval is obtained [42]. Finally, we find that some advice is in direct
conflict with general ethical advice, e.g., the supposed advantage of the course instructor being the same person as the researcher conducting the study [18]. This raises the question of which practices, if any, are currently followed in the SE field, and what beliefs exist among SE researchers regarding how ethical those practices are.
3 Research Method

The aim of this paper is to improve the understanding of ethical issues pertaining to the conditions under which students participate in empirical studies in SE. We focus on the research method applied, the voluntariness of participation, whether ethics approval is present, and the compensation of study participation. To address this aim, we formulate the following two RQs:

RQ1 What are the conditions under which student subjects participate in empirical studies (in SE)?
RQ2 As a community, what do we believe are the ideal conditions under which student subjects participate in empirical studies (in SE)?

To answer RQ1, we conducted a literature study of empirical studies published in top SE venues. While Systematic Literature Reviews (SLRs) have become common in SE research [31], their purpose is to provide an in-depth review of the results and methodology of a number of papers [31]. To answer our questions, we instead need to provide an overview of a large number of publications in the field of SE, without the need to review the results and methodology in depth. Therefore, we chose to conduct a systematic mapping study, as this is an appropriate way to provide such an overview [31], and as it allows handling a larger number of primary studies [31].
To answer RQ2, we conducted a survey among SE researchers publishing empirical studies with student participation.
We chose a survey design since surveys can allow for generalisation over a population of actors [41], in this case SE researchers experienced in conducting studies with students, and with an interest in ethics.
The designs of the two studies are explained in the following sub-sections.

3.1 Systematic Mapping Study

We break down RQ1 into a number of sub-research questions.

RQ1.1 In which top SE venues are empirical studies with student subjects published?
RQ1.2 What are the most frequently applied research strategies in which student subjects are involved?
RQ1.3 How many students are participating in the found studies?
RQ1.4 To what extent is ethics approval reported in empirical studies using student subjects?
RQ1.5 To what extent is participation of students in empirical studies voluntary?
RQ1.6 How is the participation of students in empirical studies compensated?

Based on our research questions, we formulated the search string in accordance with [24], breaking down our research questions into individual facets. In our case, we consider all empirical studies (study design) from software engineering (population) that include students (context). We did not find any synonyms to these keywords and, therefore, decided to use the broad search string “software engineering” AND “empirical” AND “student”. We searched Scopus, IEEE Xplore and ACM Digital Library using the search string adapted to the format used in the respective database. For IEEE Xplore and ACM Digital Library, we searched the full-text version of the paper. For Scopus, full-text search is not available, so we searched in title, abstract and keywords instead.
We limited the search to papers published since 2010, since Carver et al. [11] published their updated checklist on student empirical studies in SE in that year. Clearly, including earlier years would add valuable information and allow for a better analysis of trends. However, we decided that the Carver et al. [11] study marks a logical point in time to restrict the search, and this restriction at the same time limits the effort, since the number of papers was already rather large in comparison to many existing studies. Similarly, we chose to only include papers from high-quality SE venues in our search, since we wanted to obtain a picture of SE research that would be considered to be of high quality by scholars in the field. As high-quality venues, we decided to use all SE conferences ranked A or A* in the most recent CORE conference ranking (http://portal.core.edu.au/conf-ranks/), and the 14 SE journals included in the ranking by Robert Feldt. While this selection of venues is, to some extent, an arbitrary choice, we believe that slight differences in the in/exclusion of venues would not introduce a significant change in our results. The entire list of venues is depicted in Appendix A. Note that the three databases we searched index all of these venues to some extent. However, certain titles are not completely included, or not full-text searchable. For instance, JSEP, STVR, SPE, and IJSEKE were only searched on title, abstract and keywords, and for HICSS not all conference years are indexed in the used databases.
The search yielded 1284 papers, 1140 after duplicate removal. For the remaining papers, we applied the following exclusion criteria on the title and abstract:
– Paper is less than 8 pages in length (short papers).
– Title/abstract does not clearly indicate that the paper contains a primary empirical study.
– From the title/abstract, it is evident that data was not collected from student subjects. This criterion also excludes secondary/tertiary studies such as systematic literature reviews.
Regarding the last criterion, we excluded studies that, e.g., made statements such as “we conducted a controlled experiment with 50 professional software developers”, with no indication of further data collection. If unclear, we would include the paper in the next step. It can be argued that papers with less than 8 pages in length should also be included, as they might provide valuable information. However, we decided against that, as many venues consider those short papers or other special-track papers, such as new ideas and visions, with a specific focus. Due to the length and their specific focus, it is less likely that such papers report study conditions.
For instance, a new ideas and visions paper will likely spend more time elaborating on the novelty of the idea, rather than on a performed or planned study design.
Based on these exclusion criteria, we excluded 652 papers, leaving us with 488 papers. For these 488 papers, we obtained the full-text version of each paper. Applying the exclusion criteria to the full-text versions, we excluded another 112 papers. Finally, 4 papers were behind a paywall and not available to us. This left us with 372 papers for analysis.
Based on targeted reading of the introduction and method sections, as well as keyword search, we extracted the information relevant to our review, i.e., study type, venue, number of student subjects, voluntariness of participation, subject compensation, and status of ethics approval. The detailed extraction procedures are listed in Appendix B, including the keywords we searched for. In contrast to the recommendations by Petersen et al. [31], we extracted this information from the full-text publication. We visualised the resulting information using summary statistics and bubble plots.
Initially, we did not perform any reliability checks, as we deemed the exclusion and extraction process to be comparably objective. However, in a first revision of this paper we added inter-rater and intra-rater reliability checks in a post-study fashion. We aimed for a Cohen’s Kappa of κ ≥ .61 for all inter-rater reliability checks, and a κ ≥ .81 for all intra-rater reliability checks, meaning substantial and almost perfect agreement according to Landis and Koch [27]. To conduct the checks, the first and second author independently processed a random set of 10% of the initial 1140 papers, applying the exclusion criteria on the titles and abstracts. Additionally, the first author re-did his exclusion approximately 20 months after the original process. We obtained an inter-rater agreement of κ ≈ .722 and an intra-rater agreement of κ ≈ . Additional checks on the extracted data had lower reliability on ethics approval (κ ≈ .).

3.2 Survey

To design the survey, we followed the guidelines by Kitchenham and Pfleeger [25,32]. The objective of this descriptive follow-up survey was to understand whether or not the perceptions of SE researchers regarding student participation in empirical studies are in line with published accounts. Based on the mapping study results, on the updated guidelines released by Carver et al. [11], and our experiences, we formulated a number of hypotheses that guided us in the design of our survey.
(For this survey, we did not need to obtain ethics approval at our institution. Participation was voluntary and no incentive was provided. The first survey page as well as the invitation email stated which data was collected and how the obtained data would be used.)
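For reference, the Cohen's κ statistic used in the reliability checks of the mapping study above can be computed as in the following minimal sketch. The function and the include/exclude screening decisions shown are illustrative only, not our actual analysis code or data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the two raters labelled independently.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n)
              for c in set(count_a) | set(count_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical include/exclude decisions for ten screened titles/abstracts.
rater_1 = ["inc", "exc", "exc", "inc", "exc", "exc", "inc", "exc", "exc", "exc"]
rater_2 = ["inc", "exc", "exc", "exc", "exc", "exc", "inc", "exc", "inc", "exc"]
print(round(cohens_kappa(rater_1, rater_2), 3))  # → 0.524
```

Note that κ is lower than the raw 80% agreement in this toy example, because both raters exclude most papers and would therefore agree often by chance alone.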
We assume that the mapping study results reflect researchers’ beliefs about study participation, that the guidelines by Carver et al. [11] are being applied, and that our experiences are representative for SE research. The hypotheses, together with their sources, are depicted below.

Table 1: Hypotheses and their Sources
H1 The majority of participants are not required to obtain ethics approval. (Mapping study, [9])
H2 The majority of participants recruit their own students for empirical studies. (Experience, no mapping study data)
H3 The majority of participants perform voluntary studies, as a part of courses. (Mapping study)
H4 The majority of participants do not offer compensation to their subjects. (Based on inconclusive mapping study data)
H5 The majority of participants use informed consent, including the option to withdraw voluntarily. ([11], no mapping study data)
H6 The majority of participants agree that studies should relate to course learning outcomes or project work. (Sounds “reasonable”, [11])
H7 The majority of participants agree that informed consent and withdrawal should be offered. ([11], no mapping study data)
H8 The majority of participants disagree that information on ethics approval, voluntariness, compensation and informed consent should be included in publications. (Mapping study)

We designed the survey as a cross-sectional, self-administered questionnaire, i.e., participants fill in the questionnaire through an online tool at a single point in time. The survey questions were formulated based on the scope and the findings of the mapping study, related work on ethical issues in student subject studies, e.g., [10,11,18], and the hypotheses listed in Table 1. The questionnaire followed an hourglass format, with general demographic questions at the beginning, increasingly detailed questions regarding the participants’ opinions and past studies, and open questions/comment fields to end the survey.
Demographic questions were mainly open, to allow for different academic systems, e.g., in position descriptions. For questions regarding past study conditions and opinions on student recruiting and reporting, we used closed questions with free-text clarification options. We tried to design the questions following best practice, e.g., avoiding jargon, leading questions, or Yes/No questions (unless appropriate). We did not try to measure concepts that are hard to map to single questions and would require summated rating scales [25] or similar. The survey questionnaire is listed in Appendix C, and can be found in the dataset [29].
The survey was instrumented using the online service
SoSciSurvey (https://soscisurvey.de). We included an introduction page with a purpose statement, a declaration that participation is voluntary and anonymous, and a statement regarding the use of the obtained data. The length of the survey was designed in a way that it should not take more than 15 minutes, unless extensive free-text answers were given. From the metadata, we know that only 3 participants exceeded this time, and one participant confirmed in an email that the length had been estimated well.
To pilot the survey, we sent it to two researchers, one in SE and one in higher education. Furthermore, the latter researcher is a native English speaker. The researchers answered the survey, and reviewed the questions in terms of content and form. We did not perform a test-retest reliability assessment.
We followed a purposeful sampling strategy, selecting the first two authors from each publication included in the systematic mapping study. Thus, we characterise the population as SE researchers experienced in conducting studies with student subjects. This does not allow us to answer RQ2 for SE researchers in general, but is likely to yield a higher response rate, as we can expect the sample to be more interested in our research questions.
We selected the first two authors, since those are in many cases the ones most familiar with the published studies (due to the dominant practice in SE to order authors by contribution). We then merged duplicate email addresses (in cases where one author appeared on more than one publication) and multiple email addresses for the same person (where we could identify them). This led to a list of 504 recipients, whom we contacted with an invitation to our survey. For the 97 of those recipients who were not reachable, we found alternative email addresses in 48 cases. For the remainder, we proceeded with the next author in the publication.
We repeated this step until we either obtained two valid email addresses per paper, or there were no further authors. This led to a total of 474 delivered mail invitations.
We sent one reminder 10 days before the end of the survey administration period. In total, we received 100 replies, corresponding to a 21.1% response rate.
To analyse the survey data, we compared the visual representation of the data using bar graphs with the hypotheses. Additionally, we used paradigmatic corroboration [37]. That is, we coded the qualitative data obtained from free-text answers using descriptive coding [37], i.e., assigning codes describing the discussed topic(s) similar to “hash tags”. We then checked whether or not they corroborate the quantitative survey data. This coding process was not validated.
The survey dataset together with the complete instrument is available at [29]. Note that free-text answers have been anonymised where necessary, and are only shown in aggregate form to avoid individuals being identifiable based on their answers. In particular, we anonymised mentions of countries with only one participant, and removed metadata (such as time spent on the survey).

Construct validity reflects to what extent the measures represent the construct investigated. In our study, we are investigating several aspects of ethics that could be misinterpreted by study participants. For instance, the notion of a student subject could have been misunderstood by participants to be limited only to controlled experiments, whereas students that are involved in a case study would not be considered subjects.
This is something we clarified in our survey after a participant notified us of the potential misunderstanding.
Furthermore, to increase validity, we piloted the survey with one colleague within SE, and another colleague outside of SE (with a role in university education) to review our survey instrument for clarity.
In the invitation for our survey, we presented invitees with preliminary results of our mapping study analysis. Several of the invitees contacted us with feedback and/or additional impressions. To these invitees, we sent an initial draft of this paper as a form of member checking. We decided not to ask for additional input from the other invitees, to limit the number of emails to those who might not be interested in the study.
Internal validity reflects to what extent causal relationships are closely examined and other, unknown factors might impact the findings.
As a means to avoid oversimplification, we applied paradigmatic corroboration [37] during the survey analysis. That is, we examined whether the qualitative, free-text answers corroborated the quantitative scores assigned to different answers. This avoided, in some cases, a misinterpretation of the answers. For instance, participants would assign a neutral score to a question, but then explain in the free-text answers that they did not fully understand the question, or that it did not apply.
As a part of the results of the mapping study, we display the number of included papers per venue, normalised by the total number of papers published at each venue during that time. To do so, we manually extracted the number of published papers for each venue, relying on database searches (for journals) and information in conference proceeding preambles. There is a minor threat that these numbers are not entirely correct, e.g., as database indices might be incomplete, or as conference proceedings might list numbers for multiple tracks instead of only the tracks we searched.
Since we screened for published papers, there is the threat that the results would differ when including grey literature, or rejected papers. Intuitively, those papers could be less systematic in their practices and how they report them. However, we might exclude papers which are under-represented due to community trends or bias. For instance, qualitative studies are under-represented in SE [43]. This raises the potential threat of a systematic error. Since we do not see a feasible way of systematically sampling those types of papers as well, we have to accept this threat to validity.
External validity reflects to what extent findings can be generalised beyond the concrete sample. In our mapping study, we screened publications from top SE venues. We expect that the data obtained from this study is representative of high-quality SE publications. Since quality criteria change over time, and vary between different venues, we do not expect to find similar results in earlier publications or in those published in venues that have more relaxed acceptance criteria. In particular, we expect studies with less rigour (in terms of recruitment and sampling) and lower numbers of subjects per study. In terms of study type, we would expect a larger variance in other venues, since many of the top SE venues have been subject to criticism for potentially favouring quantitative studies.
Some of the venues we included in the mapping study are not accessible through full-text search. The extent of this bias can be approximated. Looking at ACM DL and IEEE Xplore, the databases that allowed for full-text search, we observe that approximately 17.5% of papers (17.13% in ACM DL, 18.22% in IEEE Xplore) contain 'student' in the abstract/title/keywords. The remainder of papers only include the term in the full text. Of the papers that did not include the keyword in title/abs/keywords, 19.40% were ultimately included at the full-text stage. This is in contrast to the much larger 62.20% of papers included where the keyword was found in title/abs/keywords. Directly translating these numbers to the remaining database (Scopus) leads to an upper bound of papers we might have missed. In Scopus, we found 702 papers where 'student' was in title/abs/keywords. Assuming the same ratio as in the IEEE and ACM databases would lead to another 3309 papers if Scopus allowed full-text search. Of these, we would then include 19.40%, meaning 642 additional full-text papers.
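As a rough sanity check, this upper-bound arithmetic can be reproduced in a few lines of Python (a sketch using the percentages reported above; the variable names are ours):

```python
# Upper-bound estimate of papers missed because Scopus lacked full-text search.
# Percentages and counts are those reported in the text; names are ours.
frac_metadata_hits = 0.175       # ~17.5% of matches have 'student' in title/abs/keywords
inclusion_rate_fulltext = 0.194  # 19.40% of full-text-only matches were included
scopus_metadata_hits = 702       # Scopus papers with 'student' in title/abs/keywords

# If metadata hits make up 17.5% of all matches, estimate the total number of
# matches a full-text search would have returned, then the full-text-only share.
estimated_total = scopus_metadata_hits / frac_metadata_hits
fulltext_only = estimated_total - scopus_metadata_hits       # ~3309 papers

# Applying the observed inclusion rate yields the upper bound of missed papers.
missed_upper_bound = inclusion_rate_fulltext * fulltext_only  # ~642 papers
print(round(fulltext_only), round(missed_upper_bound))        # 3309 642
```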
However, this is a very conservative estimate, since Scopus overlaps to a large extent with the venues indexed by ACM DL and IEEE Xplore. Excluding from the Scopus search results those papers published in venues that are at least partially indexed by ACM DL or IEEE Xplore, and re-doing the calculation, we end up with a lower bound of around 100-300 papers that might have been missed through the lack of full-text search. Specifically, venues that we are aware are indexed in neither ACM DL nor IEEE Xplore are the Wiley journals (Journal of Software: Evolution and Process, Software: Practice & Experience, Software Testing, Verification and Reliability), World Scientific's Int. Journal of Software Engineering and Knowledge Engineering, and, to a large extent, the HICSS conference. While this bias is rather large, i.e., between 25% and 172.58%, we would expect the missing/incomplete venues not to report studies that differ substantially from the extracted ones.
Our survey sample was drawn from the population of SE researchers familiar with conducting and publishing empirical studies with student subjects. We made this choice in order to allow for a higher response rate and an informed answer to our survey questions, as the sample can be considered more experienced with and interested in the study topic compared to the population of SE researchers in general. Ideally, we would like to generalise our answer to RQ2 to SE researchers in general, but our sampling strategy does not allow this. To allow for generalisation to SE research in general, a replication of the survey with a different sampling strategy is needed.
Our survey response rate is 21.1%, which is substantially higher than the typical 5% stated by Lethbridge et al. [28]. Considering that we have two contact emails for most papers, this means that we get on average approximately one answer for every second paper.
We deem this a reasonable response, but there might be a bias towards researchers most interested in ethical aspects. We chose to only send a single reminder so as not to cause any annoyance, as survey invitations are becoming a nuisance in SE [4].
A final threat to external validity is the exclusion of a number of email addresses in the survey. In the original manuscript of this paper, we conducted the search in the 2019 version of the ACM Digital Library, which did not allow for full-text search. In the first revision, we updated this search to also include full-text search, leading to the addition of 26 papers in the final full-text set. Since we sampled the authors on the original data, this means we did not include the email addresses of these 26 papers, leading to 32 unique emails being omitted from the survey. Assuming the same response rate, this means we could have expected another 7 survey answers. We decided not to re-run the survey for those 32 individuals, as it would add another threat to validity, namely opinions having changed compared to the original cross-section.
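The response-rate arithmetic above can be sketched in the same way (again, the numbers are those reported in the text; the variable names are ours):

```python
# Expected effect of the 32 omitted email addresses, given the observed
# response rate. Numbers are those reported in the text; names are ours.
response_rate = 0.211   # 21.1% of invited addresses answered
contacts_per_paper = 2  # two contact emails for most papers
omitted_emails = 32     # unique addresses omitted after the updated search

# Roughly one answer per two papers: 2 contacts * 21.1% ~ 0.42 answers/paper.
answers_per_paper = contacts_per_paper * response_rate

# Expected additional answers had the omitted addresses been invited.
expected_extra = omitted_emails * response_rate
print(round(answers_per_paper, 2), round(expected_extra))  # 0.42 7
```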
Reliability describes the degree to which similar results would be obtained if the same study were repeated, by the same or by other researchers.
While we described the study designs, data collection, and data analysis procedures in detail, there is a subjective element to several parts of our studies. Specifically, the hypotheses definition process is based on our experiences, understanding of contemporary SE research methods, and belief systems about ethics in recruiting student subjects. Similarly, the qualitative analysis in several parts of our studies is subject to the researchers' interpretation. Specifically, we categorised the study type in the mapping study and applied open coding to the survey free-text answers. Neither was validated. For the study type, we tried to be conservative in our categorisation and rely mainly on verbatim statements from the papers. For instance, we did not try to judge whether a study is indeed a "controlled experiment", or would be better categorised as a "quasi-experiment" or even a "field experiment". Given the maturity and the lack of common terminology with respect to research methodology in SE, we believe that a certain lack of precision is acceptable here. For the open coding of survey answers, we take an interpretivist viewpoint that subjectivity and different interpretations among multiple coders are in fact desirable. Hence, reliability is indeed not given, but also not desirable.
4 Results

In the following, we will discuss the results of our systematic mapping study and of the survey. Plots for mapping study data are displayed using grey colours, while light blue is used for survey data.

4.1 Systematic Mapping Study

The final selection of papers included 372 primary studies published in top SE venues.
The paper count per venue is depicted in Figure 1 relative to the total number of papers published at each venue. On the lower end of the scale, we see venues that focus on automated tasks and therefore have a naturally low percentage of studies with students, e.g., MSR and ASE, and venues with incomplete indexing in the used databases (as discussed in the threats to validity), e.g., HICSS and IJSEKE. The list is topped by the two venues focusing on empirical SE research, EMSE and the ESEM conference. Similarly, it can be seen that the flagship conferences and journals, i.e., EMSE, TOSEM, ICSE, ESEC/FSE, and TSE, all have a percentage of student studies over the average. Finally, while a number of topic-specific SE venues have a high percentage of publications with student studies, e.g., REJ and RE, others have a much lower percentage, e.g., MODELS and ISSRE.
In Figure 2, the research strategies applied in the primary studies are depicted (excluding strategies that appeared in only one primary study). The dominant strategies are controlled experiments (166 studies) and families of controlled experiments (63), followed by case studies (29 publications), surveys/questionnaires (13 publications), replications of controlled experiments (12), and individual quasi-experiments (11). A number of different strategies exist with 10 or fewer publications, e.g., observational studies and mixed-method studies.
On the left-hand side of Figure 3, the sample size of the primary studies is depicted. Approximately half of the studies (190) have between 10 and 50 subjects. This is followed by larger studies, i.e., 80 studies with between 51 and 100 students and 63 studies with between 101 and 500 students.
Only 4 publications have a sample size of over 500. 16 publications do not clearly name their subject count. Finally, 19 publications have fewer than 10 subjects.
Sample size is directly related to the research strategy. Depending on the type of data collected and on how the actual study is conducted, sample sizes
Fig. 1: Venue of Mapping Study Papers (Normalised).
may vary considerably. For instance, running and analysing a survey with a large number of subjects is typically considerably less work than conducting an observational study with the same number of subjects. Furthermore, we only recorded the number of student subjects. A study might therefore have a sample size that is larger than the one we name here, in case non-student subjects participated as well.

Fig. 2: Study Types of Mapping Study Papers.
Fig. 3: Sample Size and Compensation in Mapping Study Papers.

Of all primary studies, only 22 clearly stated that they obtained ethics approval. In stark contrast, 347 studies did not mention ethics approval at all. Finally, 3 studies stated that they did not have to obtain ethics approval, but clarified that there are central guidelines without a formal approval process, e.g., mandated by the university or the country/state.
The majority of studies recruited students on a voluntary basis (154 publications). 77 publications did not state how subjects were recruited. Similarly, 107 did not state whether or not participation was voluntary, but stated that the study was part of a course. We can therefore not tell whether students participated on a voluntary basis or not. Finally, 34 studies had mandatory participation by students enrolled in a course. We included in this category studies that were performed on the basis of graded assignments or exams, since it can be argued that participation is mandatory if the student wants to attain a good or excellent grade.
The right-hand side of Figure 3 depicts how students were compensated in the primary studies. Approximately 75% of the studies (275) did not state how or if students were compensated for their participation. Of the remaining 25%, 40 studies compensated students in the form of bonus points, 25 with a financial reward, 1 study with snacks, and 7 studies in another form. Finally, 24 studies explicitly stated that there was no compensation in addition to the practice and training acquired through the study participation.
Looking at trends over time, the overall publication count per year is depicted in Fig. 4. Note that 2019 is incomplete, as the data was extracted in early April 2019. While there are slightly higher numbers, on average, from 2014 on, there is too little data to consider this a trend in the field. In particular, 2014 and 2016 stand out.
(Studies per year, 2010-2019: 28, 32, 28, 37, 58, 42, 56, 38, 47, 6.)
Fig. 4: Papers per Publication Year.

Going into more depth, we plotted the voluntariness and the compensation on a yearly basis in Fig. 5. The y-axis depicts the years in increasing order, and the x-axis depicts the voluntariness (grey-shaded, left-hand side) and the compensation (no shade, right-hand side). Each bubble shows the number of studies for a given year with the reported condition. Additionally, next to each bubble, the percentage of studies reporting the given condition is depicted for each given year. For example, the bubble at the intersection of 2011 and "Mandatory" shows that 4 studies, or 15.38% of studies in that year, reported mandatory participation. Similar to the overall numbers, voluntary participation is the most common study option in all years, followed by studies that were part of a course (with no further details on voluntariness). The same pattern is visible in the compensation plot, with most studies in all years not specifying the compensation. Looking at the progress over time, there is no clearly visible trend in either of the plots. For instance, the percentage of studies with voluntary participation is almost identical in 2018 and 2010, while dropping in 2011 and 2012. Similarly, while the percentage of studies not reporting compensation is highest in 2010, there is no decrease over time.
In Fig. 6, we depict an additional bubble plot plotting reported study conditions against the publication year. The x-axis depicts how many of the three study conditions (voluntariness, compensation, ethics approval) were reported by the different publications (grey-shaded, left-hand side) and the ethics approval (no shade, right-hand side). There is a visible decrease over time of studies not reporting any of the three conditions. While between 46.55% and 53.57% lacked these conditions between 2010 and 2014, the numbers are substantially lower afterwards, ranging between 30.29% and 30.95%.
While 2019 again has 50% of studies not reporting any condition, this data is incomplete, as discussed. Overall, only 11 papers report all three conditions: 1 in 2011, 3 in 2014, 1 in 2016, and 6 in 2017. Due to this low number, we decided
to reference these papers in full in Table 2. For ethics approval, the percentage of non-NA answers is so low that any analysis of trends would likely be arbitrary and due to random error.

Fig. 5: Voluntariness and Compensation in Studies Per Year.

Table 2: Primary Studies Reporting all of Voluntariness, Compensation, and Status of Ethics Approval.

Key  | Study Type            | Sample Size | Voluntariness | Compensation   | Ethics Approval
[2]  | Mixed-methods         | 75          | voluntary     | Other          | Yes
[46] | Controlled experiment | 48          | voluntary     | Movie vouchers | Yes
[13] | Controlled experiment | 53          | voluntary     | Bonus points   | Yes
[17] | Controlled experiment | 35          | voluntary     | Paid           | Yes
[5]  | Eye-tracking study    | 56          | voluntary     | Bonus points   | Yes
[44] | Controlled experiment | 60          | voluntary     | Bonus points   | Yes
[36] | Controlled experiment | 45          | voluntary     | Paid           | Yes
[33] | Family of contr. exp. | 155         | mandatory     | Bonus points   | Yes
[20] | Controlled experiment | 15          | voluntary     | Paid           | Yes
[8]  | Quasi-experiment      | 20          | mandatory     | None           | Other
[34] | Controlled experiment | 50          | voluntary     | None           | Yes
Fig. 6: Number of Study Conditions and Ethical Approval Status in Studies Per Year.

4.2 Survey

We received 100 answers to our survey, corresponding to a response rate of 21.1%. (Due to this convenient number, we use the number of answers and percentages interchangeably in the following.) One participant did not answer any questions beyond the demographic questions. Furthermore, two participants left the last block of questions regarding their opinions on study circumstances unanswered.
Our participants are mainly located in the United States of America (13 participants), Germany (11 participants), Spain (10 participants), and Sweden (9 participants). The remaining countries follow at some distance, with Israel having 5 participants, Australia, Brazil, Canada, Italy, the Netherlands, and Portugal having 4 participants each, Finland, New Zealand, and Switzerland having 3 participants each, and Austria, Poland, and the United Kingdom having 2 participants each. Finally, 9 further countries had one participant each, 3 participants did not state their location, and one answer was not identifiable (a free-text answer that could not be mapped to a country).
74% of participants have a PhD degree as their highest degree, 12% a habilitation degree, and 10% of participants hold a Master degree. 3 participants hold a high school degree and 1 participant another degree.
Regarding the currently or last held academic position, 14 participants stated professor, 9 full professor, 24 associate professor, 22 assistant professor, 15 PhD student/candidate, 5 postdoc, and the remaining 10 various other positions (4 senior lecturer, 2 senior researcher, 1 research scientist, 1 master student, and 2 adjunct professor).
Of our participants, 43% stated that they require ethics approval. 17% do not require ethics approval, but have to follow mandatory steps that regulate how studies with student participants are to be conducted.
Finally, 39% are neither required to obtain ethics approval, nor to follow mandatory steps.
In Figure 7 (left-hand side), we depict the number of student studies participants conducted during the last 5 years. The majority conducted 1 to 5 studies (62%), followed by 6 to 10 studies (18%) and more than 10 studies (12%). Finally, 5 participants did not conduct any studies with student participation, and one participant did not answer.
During those studies (Figure 7 (right-hand side)), 63% of participants recruited their own students, and 34% did not do so.
Fig. 7: Number of Studies in the Last 5 Years and Recruitment of Own Students.
(We do not list countries with only one participant, to ensure participant anonymity.)
65% of our participants used entirely voluntary participation. 30% offered students a choice to participate, but participation influenced grading in some way. Finally, 2 participants used mandatory participation.
In Figure 8, we depict how the participants compensated participation in their studies. 50% did not offer any compensation to their study participants. 22% offered bonus points/credits in their course. 10% offered a financial reward, including lotteries with cash prizes. Finally, 4% offered snacks and 11% other forms of compensation. Regarding the bonus point compensation, many studies did not provide further details on how exactly these points or credits were offered, while some papers stated that the bonus points would count towards the final grade. From personal experience, we also know of cases where bonus points can lead to more than the "full points" in the course. For instance, a course could have a total of 100 points in all assignments and exams, but the bonus points would make it possible to reach 110.
Fig. 8: Subject Compensation in the Studies.

The vast majority of participants, or 88%, use informed consent in their studies. 9% did not use informed consent and 1 participant did not answer. Similarly, the majority (92%) offered their study participants the option to withdraw at any time, while 5% did not offer this option.
In addition to asking participants how they conducted studies in the last 5 years, we also asked them to state their opinions on a number of statements. These relate to how studies should be connected to a course, what the participation conditions should be, and how acceptable different practices are. Finally, we asked them which characteristics of a study should be reported in a publication.
In all following figures, the green bars to the right of the centre line depict agreement (light green) and strong agreement (dark green), the grey bars in the middle depict neutrality towards the statement, and the brown bars to the left of the centre line depict disagreement (light brown) and strong disagreement (dark brown). The statements are ordered by overall agreement. In addition to the overall response, we also report numbers for the sub-groups of students (PhD candidates/students, master students, and research scientists, n = 17), early-stage researchers (postdocs, assistant professors, senior/university lecturers, and adjuncts, n = 35), and senior faculty (associate professors, professors, full professors, n = 48).
Figure 9 depicts the participants' opinions regarding the connections between study and course. Overall, there is (strong) agreement that reviewers of a manuscript should specifically check whether or not the study had educational value to the participants, that the curriculum should not be changed to fit a study into the course, and that studies should be connected to course projects. The latter point relates to projects in particular, i.e., practical course moments. Disagreeing with this statement does not necessarily mean that the study is completely disconnected from the course content, it might simply not be connected to a practical course moment.
The sub-groups do not exhibit large differences for this block of questions.
Students have less disagreement on the statement that studies should be connected to projects (7% compared to 21% in early-stage researchers and 18% in senior faculty). Similarly, the student group has a more neutral opinion (38% compared to 18% and 22%) and less disagreement (6% compared to 15% and 17%) on the statement that the curriculum should not be changed to fit in a study.

Fig. 9: Opinions on the Study-Course Connection.
Figure 10 depicts the participants' opinions regarding the circumstances of the study. There is strong agreement that informed consent, including the possibility to withdraw at any time, should be used. Similarly, the majority of participants agree that studies should always be voluntary, and disagree that mandatory attendance may be warranted.
Interestingly, 27 participants agreed or strongly agreed that mandatory participation may be warranted. Among those, 16 are senior faculty (associate professor, professor, full professor), 5 are early-stage researchers (assistant professor, senior lecturer), and the remaining 6 are students (PhD student/candidate, research scientist). In terms of geographic distribution, 14 European participants believed mandatory participation might be warranted, followed by 5 from North America, 4 from Oceania, and 3 from Asia. Several participants clarified their answers in free text. For instance, one participant noted that, in a course that is elective, mandatory participation might be warranted within limits. That is, the students should still have an alternative, e.g., completing x out of y assignments, where one option is participation in the empirical study. Six people noted that participation is different from data usage. That is, the instructor may require participation in a study, e.g., to expose the students to the process of empirical studies, but opting out of their data being used for research purposes should always be possible.
In terms of sub-groups, there is very little difference. The only notable exception concerns the views on mandatory participation. Here, only 14% of early-stage researchers agree, with 71% disagreeing. In comparison, senior faculty have 35% agreement and 48% disagreement, and students have 35% and 59%.
Fig. 10: Opinions on Study Circumstances.

Figure 11 depicts the participants' opinions regarding a number of practices that are discussed in related work on ethics in student studies. The highest agreement (91%) was registered for the practice of encouraging students to participate in a study. Only 2% disagreed with this practice. 84% agreed that it is acceptable to use their own students (registered in a course given by or supervised by the researcher) as subjects in an empirical study. Only 4% disagreed, while 12% remained neutral. 49% agree that it is acceptable for the researcher conducting the study to be the same person as the course instructor, while 17% disagree and 34% remain neutral. The next two statements caused a considerable polarisation in answers: 41% agree and 38% disagree that it is acceptable to withhold study goals prior to a study. Part of this polarisation is explained by free-text answers of participants stating that it is sometimes necessary to withhold the detailed hypotheses in order to not risk the validity of the studies. Similarly, 40% agree and 26% disagree that it is acceptable for the course instructor to know who participated in a study, while 34% stated a neutral opinion. The remaining three statements all have considerable disagreement from the survey participants: 56% disagree that it is acceptable to base part of the assessment in a course on participation in a study, with 30% agreeing and 14% neutral. 71% disagree, most of them strongly, that it is acceptable to base part of the assessment in a course on performance in a study, with 22% agreeing and 7% neutral. Finally, 85% disagree that it is acceptable to withhold information regarding the use of data collected in a study from the students. Only 6% agree with this statement and 9% remain neutral.
As before, we do not see large disagreements between the sub-groups.
One difference is that students disagree substantially more with the statement that it is acceptable to withhold goals (12% agreement, 53% disagreement) compared to senior faculty (44% agreement, 29% disagreement) and early-stage researchers (52% agreement, 42% disagreement).
The final set of statements relates to the reporting of study characteristics in a publication manuscript. All statements have very high agreement, as depicted in Figure 12, generally favouring that study characteristics should be stated transparently. 98% agree that voluntariness of participation should be reported, while the use of informed consent (91% agreement), student compensation (84% agreement) and status of ethics approval (83% agreement) received slightly less agreement.
As the overall picture is very clear, the sub-groups do not exhibit any large differences on any of the points.
Overall, these results allow us to evaluate the hypotheses formulated prior to the survey. An overview of the evaluation is given in Table 3.

Fig. 11: Opinions on Acceptability of Practices.

Fig. 12: Opinions on Reporting of Study Information.

Table 3: Hypotheses Evaluation

ID | Hypothesis | Yes/No | Key Metric
H1 | The majority of participants are not required to obtain ethics approval. | No | 43% require ethics approval
H2 | The majority of participants recruit their own students for empirical studies. | Yes | 63% recruit own students
H3 | The majority of participants perform voluntary studies, as a part of courses. | Yes | 95% have voluntary participation
H4 | The majority of participants do not offer compensation to their subjects. | ? | 50% do not offer compensation, 50% offer some form of compensation
H5 | The majority of participants use informed consent, including the option to withdraw voluntarily. | Yes | 88% use informed consent, 92% the option to withdraw
H6 | The majority of participants agree that studies should relate to course learning outcomes or project work. | Yes | Over 50% agreement on all related points
H7 | The majority of participants agree that informed consent and withdrawal should be offered. | Yes | 92% agree
H8 | The majority of participants disagree that information on ethics approval, voluntariness, compensation, and informed consent should be included in publications. | No | Between 83% and 98% agreement

H1, that the majority of participants are not required to obtain ethics approval, is not supported by the survey data. 43% of all participants require ethics approval, and 17% have at least mandatory steps to follow. Looking at individual countries, we get a mixed picture. For several countries, all participants answered that ethics approval is required, indicating national legislation. Among these are, e.g., the Netherlands, New Zealand, Canada, and Israel. In other countries, e.g., Brazil and Spain, the answers were mixed, with no option clearly dominating. In the country with the largest participation, the USA, 11 participants stated that they needed ethics approval, one stated that they do not need ethics approval but have mandatory regulations to follow, and one participant stated that they neither require ethics approval nor have to follow mandatory guidelines. Finally, for some countries, the majority answered that no approval is needed, e.g., nine out of eleven answers by German participants.
H2, that the majority of participants recruit their own students for empirical studies, is supported by our data. 63% of participants answered that they did so in the past five years. Similarly, 84% agree or strongly agree later in the survey that it is acceptable to do so.
H3, that the majority of participants perform voluntary studies, as a part of courses, is also corroborated by our data. Overall, 95% of participants answered that their studies were voluntary. Of the 65 participants that recruited their own students, 61 answered that the recruitment was voluntary.
For H4, that the majority of participants do not offer compensation to their subjects in addition to the learning benefit, the answer is mixed. While no compensation is the most common answer (with 50 participants), there is a variety of other compensations being offered, also summing up to 50% of the participants.
H5, that the majority of participants use informed consent, including the option to withdraw voluntarily, receives by far the strongest support from our survey data. 88% of participants used informed consent, and 92% offered students the option to withdraw from the study.
H6 targets the different opinions participants have regarding the relation of a study to the course curriculum and learning outcomes.
All statements have an agreement of over 50%, thus supporting the hypothesis. However, it is interesting to note that they also have reasonably high disagreement values. For instance, 19 participants disagree that reviewers of an empirical study involving students should check for the educational value of a study. No free-text answers were left explaining these points.
For H7, we expected that the majority would agree that informed consent and the option to withdraw need to be present in a study. This was clearly supported, with 92% agreement to both statements. The second part of the hypothesis aimed towards mandatory participation, stating that the majority would find mandatory participation acceptable given the circumstances. However, 58% disagreed that mandatory participation may be OK.
Finally, hypothesis H8 considered different characteristics of a study to be included in publications. Given the lack of information on ethics approval, voluntariness, compensation, and informed consent in our mapping study, we expected most participants to answer that this information is not necessary. On the contrary, we found large support that these aspects should in fact be described in a study, with 98% agreeing that voluntariness should be reported, followed by informed consent (91%), compensation (84%), and ethics approval (83%). This is clearly in contrast to the findings of our mapping study.

5 Discussion

Several of our findings lead to interesting discussions regarding the ethics of empirical studies with student participation in SE. We will discuss these in the following according to the four pillars of ethical research by [40,47]: scientific value, confidentiality, informed consent, and beneficence.

5.1 Scientific Value

Primary studies with student participation are published substantially in the top SE venues, clearly indicating that there is scientific value in many of these studies. In our mapping study, there are no publication trends that clearly stand out.
However, it is worth noting that prime venues such as ICSE or ESEC/FSE do not exhibit any patterns that suggest an inherent bias against studies with student subjects. Instead, these venues have among the highest percentage of such studies.

The primary method used in empirical studies with students is the controlled experiment, along with related methods, e.g., families of controlled experiments, (families of) quasi-experiments, and replications of controlled experiments. This is sensible given that student populations are often homogeneous in terms of knowledge and experience, there is control as to which techniques they have learned, and recruitment is easy compared to professional subjects. Furthermore, in qualitative methods such as case studies, observational studies, and ethnography, the case context is a deciding factor and can typically not be separated from the studied phenomenon [41]. Therefore, such studies are only feasible with student subjects if their context is of direct interest to the study, e.g., studies on curriculum design or on student behaviour. Similarly, it is not unexpected that qualitative studies in general are under-represented, given that the number of qualitative studies is generally lower in SE [43]. Potential factors playing a role here are biases against students as subjects, biases against qualitative studies in general, and a comparative lack of maturity of qualitative methods compared to quantitative methods in SE.

While, ideally, sample size should be appropriate for the chosen research strategy and context, reviewers might tend to reject papers with lower sample sizes, even if justified.
However, we see no such indication in our data.

5.2 Confidentiality

Regarding the issue of confidentiality, we do not see a need for action either. The majority of survey participants find it unacceptable to withhold information regarding data usage from student subjects, and find it acceptable to withhold goals and hypotheses only if needed for validity reasons. Similarly, recent years have seen increasing awareness of privacy laws such as the EU General Data Protection Regulation (GDPR) [48], leading in turn to increased attention to confidentiality issues. This is also witnessed in our survey by a number of free-text comments stating that it would be illegal to withhold information on how data is used from the student subjects.

5.3 Informed Consent
Informed consent, including the right to withdraw from a study, is used by the large majority of our survey participants (88% and 92%, respectively). We did not extract this data from the mapping study, but consent was one of the keywords we searched for when extracting data. Our impression is that, while more prominent than ethics approval, the use of informed consent was not typically reported in primary studies. That is, while it is an encouraging sign that the majority of survey participants use informed consent, we see a need for reporting this in the final papers. As discussed in Singer and Vinson [40], the elements that should be part of informed consent are disputed in ethics research. We would like to reiterate the authors' statement that important parts of informed consent are information on voluntariness, the right to withdraw from the study at any time, and the act of consenting to the study. Guidance can be offered by official regulations that list required elements of informed consent, such as [15], or by the guide for ethics approvals given in [47].

Related to giving consent, it matters whether or not participation in a study is actually voluntary. The majority of survey participants used voluntary participation in their studies. Similarly, the majority of primary studies in our mapping study used voluntary participation, at least if we exclude those that did not report on voluntariness. However, the majority of survey participants recruited their own students, and found it acceptable that the researcher is identical to the course instructor. Finally, 40% found it acceptable that the instructor knows who participated, and 34% remained neutral on this statement. Galster et al. [18] even found it helpful that the instructor is the researcher, since students trust them more than a researcher unknown to them. There seems to be little concern that students might feel pressured to participate if the researcher is identical to the instructor [40,39,42], even if participation is voluntary.
Suggested measures, such as letting a graduate student handle recruitment instead of the instructor, would not add substantial overhead to a study and should therefore be easy to implement. This is a clear lack of ethical practice in the field, and we therefore see a strong need for increased discussion (and corresponding action) regarding our recruitment practices and study conditions. In addition to the ethical aspect, pressuring students into participating might affect the validity of the findings as well, since students who feel pressured will behave differently than voluntary participants. We would like to believe that only very few researchers knowingly coerce their students into participation, but this pressure can also arise unintentionally.

5.4 Beneficence

We believe that achieving a positive beneficence, i.e., "a weighted combination of risks, harms, and benefits to the subjects and society" [30], is the most difficult of the four pillars to address, and the one where the community is lacking the most.

As Carver et al. [10,11] state, "students have the right to reach their educational goals, and empirical studies can compete with other instruction forms for the scarce resources in courses, e.g., lecture time". On the one hand, students can have a direct benefit from participating in empirical studies, e.g., they can be trained in skills valued in industry, receive training in empirical SE, or receive feedback on their current abilities. For instance, there is a substantial effort in SE education to expose students to SE trends [12]. Hence, students can familiarise themselves with SE trends in practical course projects that can be connected to empirical studies. This will increase their chances on the job market.
On the other hand, there are issues that potentially lower the beneficence, e.g., spending considerable time on an empirical study instead of covering other important course topics, spending time on irrelevant or outdated techniques or methods (as a part of a control group), and higher stress levels due to direct or indirect coercion to participate and perform well in an empirical study. Connecting to the previous example, instructors might decide to investigate whether a new practice or technique increases productivity or quality. To do so, they necessarily have to compare this new treatment to a baseline, which means students might end up in the control group being exposed to an outdated method, or at least having to spend considerable time on this method in the case of a paired experiment design. The central issue here is whether or not an instructor is able to objectively assess whether their study is justified and contributes to the students' learning goals. In particular, confirmation bias might lead instructors to believe that their study contributes more towards the students' learning than it actually does, or that a technique or method developed by themselves is more important or relevant than it actually is. We also interpret in this light the survey participants' strong agreement with the statement that a curriculum should not be changed to fit in a study. If the researcher is identical to the instructor, there is a risk that the instructor/researcher misjudges the relevance of a study topic, especially if studying their own techniques or methods. Storey et al.'s [42] report is a good example of issues that might arise as a result, e.g., exposing students to incomplete prototype tools, or providing students with more support in one area (the area of study) than in another (an alternative assignment).
While this is clearly a difficult topic to address, we believe more discussion is needed surrounding the topic of beneficence to student subjects.

5.5 Cross-Cutting Concerns

Finally, as a topic touching on all four pillars, ethics approval is an important issue for discussion. In the mapping study, we see similar statistics as reported for the CHI conference series in [9]. That is, only a small fraction of all studies report whether they obtained, or had to obtain, ethics approval. In contrast, 83% of our survey participants believe that ethics approval should be reported. Hence, there is currently a strong discrepancy between what the community believes should be reported and what is actually being reported.

We find that only 39% of our survey participants are required to obtain ethics approval. This is in conflict with the statement made in Ko et al. [26] that it is likely that authors need ethics approval. In fact, one participant even pointed out that they would not be able to request approval, since their university does not have processes in place that support this. From personal experience, we know that reviewers sometimes request ethics approval, or judge it as a negative point if ethics approval was not obtained in a study. The mixed picture on a per-country basis also clearly shows that regulations in many cases differ between states or universities in the same country, or, as some participants pointed out, depend on the agency funding the research. Therefore, our findings clearly show that reviewers should abstain from requesting ethics approval, unless they know that the authors are indeed required to obtain approval. However, this does not free the authors from the responsibility to know their local regulations and clarify those; reviewers might very well request clarification of the regulations surrounding ethics approval.

As to the usefulness of a central board that grants ethics approvals, we see two conflicting opinions.
On the one hand, several participants stated that only the researchers are familiar enough with the method, study context, and the field of study to be able to judge ethical consequences. Similarly, the experiences by Storey et al. [42] demonstrate that even with ethics approval, ethical issues remain. On the other hand, one participant pointed out that the value of an IRB is in having an independent unit that is not subject to the same biases that researchers might have towards their own study. Indeed, we believe that cognitive bias is a central issue in case a researcher has to judge ethical issues surrounding their own studies. In particular, the body of work on confirmation bias impressively demonstrates that human beings are strongly biased when judging and/or justifying their own actions [23].

Without a need for ethics approval, there is a chance that a lack of awareness or even disregard for ethical issues can spread, similar to the worrisome picture drawn in 2001 by Hall and Flynn [21]. Our mapping study results clearly show that, currently, reporting ethics approval or related regulations is extremely uncommon, and other study conditions are similarly under-reported. Specifically, the number of studies reporting none of the extracted study conditions (voluntariness, compensation, ethics approval) is alarming. The bubble plots in Section 4.1 show no discernible improvement over time, indicating that there is no progress. This is in direct contrast with the opinions of the survey participants, where a clear majority indicated that study conditions should be reported. If we consider the survey sample to be representative of the population, we have to conclude that these opinions are not currently reflected in the community's practices. Therefore, we believe that there is a need for action, e.g., by establishing a code of practice, as suggested by [14].
Existing guidelines such as [11,47] and the code of ethics for SE professionals [19] could serve as a basis for such a code of practice for SE research. We outline the elements of such a code of practice in the following.

5.6 A Code of Practice for Ethical Studies with Student Subjects

We believe that a large part of the current shortcomings in SE are due to a lack of reporting practice, something that could be fixed easily. Specifically, we believe that many important aspects are under-reported because of a lack of attention. For instance, conference chairs and journals could specifically ask for meta information on study conditions. Similar disclosures are already common practice in some SE journals for other important aspects, such as financial disclosure or conflict of interest. For instance, Springer Empirical Software Engineering has a declaration section, where authors are requested to disclose information on funding, conflicts of interest, as well as availability of code and data. (Ironically, we forgot to add information on ethics approval and voluntariness in our survey in the first version of this paper.) The survey also shows that SE researchers familiar with student-subject studies strongly support complete reporting of study conditions.

Community agreement is needed on what should be reported, and in what form. Here, a code of practice could help that outlines which elements need to be reported in a study. As a starting point, we suggest the following elements.

1. Voluntariness
   - How were the subjects recruited?
   - Was participation in the study mandatory or voluntary?
   - If participation was mandatory, was it possible to opt out of the data being used for research purposes?
   - Were the researchers involved in teaching the course? If yes, how was it ensured that subjects did not feel pressured to participate?
2. Compensation
   - How were subjects compensated?
   - If any compensation was given in relation to a university course, e.g., bonus points, which potential effect did non-participation have on the grade?
3. Ethics approval and informed consent
   - Was ethics approval obtained? If not, why so (e.g., not possible or necessary)?
   - How was informed consent ensured?
   - Was withdrawal possible at any time? What effect did withdrawal have on compensation?

Apart from reporting, there are a number of issues that we believe need to be addressed, but require ongoing discussions and more maturity in the field. The widely differing opinions in our survey on factors such as voluntariness or compensation, which also showed up to some extent in the review process of this paper, demonstrate clearly that the field has not agreed on such a standard practice. For instance, while bonus points as a form of compensation might be considered acceptable by faculty and students in some places, students could feel pressured to participate if bonus points are offered. Similarly, as pointed out by several survey participants, mandatory participation in itself does not need to be unethical, e.g., when it is required to reach certain course learning outcomes. However, in such a case, students should have the option to decline the use of their data in a research study. Finally, the experiences of Storey et al. [42] additionally show that obtaining ethics approval alone does not ensure an ethical study; issues might remain. Therefore, continued discussions are needed in order to reach a set of accepted practices.

6 Conclusion

While there is ongoing debate on the ethical impact of software products, and on the ethics of new research trends such as increasingly capable learning algorithms, there has been a gap in the discussion regarding the daily research practices in SE. Many SE researchers regularly conduct studies with students, as they are easy to recruit and allow for relatively large sample sizes.
However, students are also vulnerable, since they are typically under the control of their instructors, and unethical practices can therefore limit their learning and potentially cause other harm. Therefore, we aimed to re-visit the current practices in student-subject studies in SE from an ethics perspective.

To do so, we conducted a systematic mapping study on ethical aspects of student use in empirical SE studies. Based on 372 primary studies in top SE venues, we find that it is uncommon to disclose information on ethics approval, voluntariness of participation, and compensation. Out of the 372 primary studies, 160 did not report any of these conditions, while 125 reported one, 76 reported two, and only 11 studies reported all three.

We followed up the systematic map with a survey conducted among the authors of the primary studies, to which we received 100 answers. The findings show that the majority of participants are not required to obtain ethics approval, that they typically recruit their own students on a voluntary basis, and that student subjects are compensated by approximately half of the participants. Agreement on the acceptability of different practices differs. The statements that informed consent and the option to withdraw at any time should be given receive the highest approval. Mandatory participation is met with disapproval by the large majority of participants, but 27 participants do agree that it might be warranted under certain conditions, e.g., when students can withdraw the approval for their data to be used for research purposes. Whether or not the instructor should know who participates receives a mixed response, even though related work on ethics stresses the point that students might feel pressure to participate if this is the case.
Finally, the majority strongly agrees that details on voluntariness, informed consent, compensation, and ethics approval should be disclosed in publications, and that reviewers should check that this information is present and complete.

Compared to the early 2000s, we see an increased awareness of ethical issues, e.g., reflected in the widespread use of informed consent. However, the actual figures on reporting these study conditions are alarming and require attention from the community. Therefore, we believe that a code of practice is required that states which conditions need to be reported when publishing empirical studies with student participation. Specifically, details on voluntariness of participation, compensation, ethics approval, and the use of informed consent and withdrawal possibilities need to be reported, e.g., as a part of a mandatory appendix in journals, similar to the existing disclosure of funding and conflicts of interest. Journal editors and PC chairs then need to make sure that this code of practice is indeed enforced. In addition, we see the need for discussions on what is ethically acceptable when it comes to studies with student involvement. From the results of the mapping study, the survey, as well as from the review process of this paper, we can clearly see that opinions on what is ethical practice currently differ substantially. Finally, beyond committees, regulations, and organisations, we believe that individual researchers in SE need to pay more attention to the ethical aspects of their studies. Or, as Vinson and Singer put it, "Ethical research does not happen by chance. Individual researchers must be committed to making their research ethical." [47]
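The disclosure appendix proposed for such a code of practice could even be checked mechanically at submission time. Below is a minimal, hypothetical sketch: the field names are our own illustration, since the paper prescribes the topics (voluntariness, compensation, ethics approval, informed consent, withdrawal) rather than any concrete structure.

```python
# Hypothetical machine-checkable disclosure template for empirical
# studies with student subjects. Field names are illustrative only.
DISCLOSURE_FIELDS = [
    "recruitment",          # how subjects were recruited
    "voluntariness",        # mandatory or voluntary participation
    "instructor_involved",  # were the researchers teaching the course?
    "compensation",         # e.g., none, bonus points, voucher
    "ethics_approval",      # obtained / not required / not possible
    "informed_consent",     # how informed consent was ensured
    "withdrawal",           # could subjects withdraw at any time?
]

def missing_disclosures(disclosure):
    """Return the fields a submission's disclosure section leaves unanswered."""
    return [field for field in DISCLOSURE_FIELDS
            if not str(disclosure.get(field, "")).strip()]
```

A journal system could reject a submission whose `missing_disclosures` result is non-empty, analogous to existing mandatory funding and conflict-of-interest declarations.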
Acknowledgements
We would like to thank all survey participants, as well as the individuals who gave further input on the survey and feedback on the draft manuscript.
References
1. Andrews, A.A., Pradhan, A.S.: Ethical issues in empirical software engineering: the limits of policy. Empirical Software Engineering (2), 105–110 (2001)
2. Anvari, F., Richards, D., Hitchens, M., Babar, M.A., Tran, H.M.T., Busch, P.: An empirical investigation of the influence of persona with personality traits on conceptual design. Journal of Systems and Software, 324–339 (2017)
3. Aycock, J., Buchanan, E., Dexter, S., Dittrich, D.: Human subjects, agents, or bots: Current issues in ethics and computer security research. In: International Conference on Financial Cryptography and Data Security, pp. 138–145. Springer (2011)
4. Baltes, S., Diehl, S.: Worse than spam: Issues in sampling software developers. In: Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 1–6 (2016)
5. Barik, T., Smith, J., Lubick, K., Holmes, E., Feng, J., Murphy-Hill, E., Parnin, C.: Do developers read compiler error messages? In: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pp. 575–585. IEEE (2017)
6. Bowser, A., Tsai, J.Y.: Supporting ethical web research: A new research ethics review. In: Proceedings of the 24th International Conference on World Wide Web, pp. 151–161 (2015)
7. Buchanan, E., Aycock, J., Dexter, S., Dittrich, D., Hvizdak, E.: Computer science security research and human subjects: Emerging considerations for research ethics boards. Journal of Empirical Research on Human Research Ethics (2), 71–83 (2011)
8. Budgen, D., Burn, A.J., Kitchenham, B.: Reporting computing projects through structured abstracts: a quasi-experiment. Empirical Software Engineering (2), 244–277 (2011)
9. Buse, R.P., Sadowski, C., Weimer, W.: Benefits and barriers of user evaluation in software engineering research. In: Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '11, pp. 643–656 (2011). DOI 10.1145/2048066.2048117
10. Carver, J., Jaccheri, L., Morasca, S., Shull, F.: Issues in using students in empirical studies in software engineering education. In: Proceedings of the 9th International Symposium on Software Metrics, p. 239. IEEE Computer Society (2003)
11. Carver, J.C., Jaccheri, L., Morasca, S., Shull, F.: A checklist for integrating student empirical studies with research and teaching goals. Empirical Software Engineering (1), 35–59 (2010)
12. Cico, O., Jaccheri, L., Nguyen-Duc, A., Zhang, H.: Exploring the intersection between software industry and software engineering education: a systematic mapping of software engineering trends. Journal of Systems and Software, 110736 (2020)
13. Dahan, M., Shoval, P., Sturm, A.: Comparing the impact of the OO-DFD and the use case methods for modeling functional requirements on comprehension and quality of models: a controlled experiment. Requirements Engineering (1), 27–43 (2014)
14. Davison, R.M., Kock, N., Loch, K.D., Clarke, R.: Research ethics in information systems: would a code of practice help? Communications of the Association for Information Systems (1), 4 (2001)
15. Electronic Code of Federal Regulations: Title 45, subtitle a, subchapter a, part 46: Protection of human subjects, §
16. Falessi, D., Juristo, N., Wohlin, C., Turhan, B., Münch, J., Jedlitschka, A., Oivo, M.: Empirical software engineering experts on the use of students and professionals in experiments. Empirical Software Engineering, 452–489 (2018). DOI 10.1007/s10664-017-9523-3
17. Floyd, B., Santander, T., Weimer, W.: Decoding the representation of code in the brain: An fMRI study of code review and expertise. In: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pp. 175–186. IEEE (2017)
18. Galster, M., Tofan, D., Avgeriou, P.: On integrating student empirical software engineering studies with research and teaching goals. In: 16th International Conference on Evaluation & Assessment in Software Engineering, EASE '12, pp. 146–155 (2012)
19. Gotterbarn, D., Miller, K., Rogerson, S.: Software engineering code of ethics. Communications of the ACM (11), 110–118 (1997)
20. Grubb, A.M., Chechik, M.: Modeling and reasoning with changing intentions: an experiment. In: 2017 IEEE 25th International Requirements Engineering Conference (RE), pp. 164–173. IEEE (2017)
21. Hall, T., Flynn, V.: Ethical issues in software engineering research: a survey of current practice. Empirical Software Engineering (4), 305–317 (2001)
22. Höst, M., Regnell, B., Wohlin, C.: Using students as subjects—a comparative study of students and professionals in lead-time impact assessment. Empirical Software Engineering (3), 201–214 (2000)
23. Jørgensen, M., Papatheocharous, E.: Believing is seeing: Confirmation bias studies in software engineering. In: 2015 41st Euromicro Conference on Software Engineering and Advanced Applications, pp. 92–95. IEEE (2015)
24. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering (2007)
25. Kitchenham, B.A., Pfleeger, S.L.: Personal opinion surveys. In: Guide to Advanced Empirical Software Engineering, pp. 63–92. Springer (2008)
26. Ko, A.J., LaToza, T.D., Burnett, M.M.: A practical guide to controlled experiments of software engineering tools with human participants. Empirical Software Engineering (1), 110–141 (2015). DOI 10.1007/s10664-013-9279-3
27. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics, pp. 159–174 (1977)
28. Lethbridge, T.C., Sim, S.E., Singer, J.: Studying software engineers: Data collection techniques for software field studies. Empirical Software Engineering (3), 311–341 (2005)
29. Liebel, G.: Dataset: Ethical issues in empirical studies using student subjects: Re-visiting practices and perceptions (2021). DOI 10.5281/zenodo.4412263. URL http://dx.doi.org/10.5281/zenodo.4412263
30. McNeill, P.M.: The ethics and politics of human experimentation. CUP Archive (1993)
31. Petersen, K., Feldt, R., Mujtaba, S., Mattsson, M.: Systematic mapping studies in software engineering. In: 12th International Conference on Evaluation and Assessment in Software Engineering (EASE) 12, pp. 1–10 (2008)
32. Pfleeger, S.L., Kitchenham, B.A.: Principles of survey research: part 1: turning lemons into lemonade. ACM SIGSOFT Software Engineering Notes (6), 16–18 (2001)
33. Riaz, M., King, J., Slankas, J., Williams, L., Massacci, F., Quesada-López, C., Jenkins, M.: Identifying the implied: Findings from three differentiated replications on the use of security requirements templates. Empirical Software Engineering (4), 2127–2178 (2017)
34. Riaz, M., Slankas, J., King, J., Williams, L.: Using templates to elicit implied security requirements from functional requirements: a controlled experiment. In: Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 1–10 (2014)
35. Runeson, P.: Using students as experiment subjects: an analysis on graduate and freshmen student data. In: Proceedings of the 7th International Conference on Empirical Assessment in Software Engineering, pp. 95–102. Citeseer (2003)
36. Sakhnini, V., Mich, L., Berry, D.M.: Group versus individual use of power-only EPMcreate as a creativity enhancement technique for requirements elicitation. Empirical Software Engineering (4), 2001–2049 (2017)
37. Saldaña, J.: The coding manual for qualitative researchers. Sage (2015)
38. Salman, I., Misirli, A.T., Juristo, N.: Are students representatives of professionals in software engineering experiments? In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 1, pp. 666–676. IEEE (2015)
39. Sieber, J.E.: Protecting research subjects, employees and researchers: Implications for software engineering. Empirical Software Engineering (4), 329–341 (2001)
40. Singer, J., Vinson, N.G.: Ethical issues in empirical studies of software engineering. IEEE Transactions on Software Engineering (12), 1171–1180 (2002)
41. Stol, K.J., Fitzgerald, B.: The ABC of software engineering research. ACM Transactions on Software Engineering and Methodology (TOSEM) (3), 1–51 (2018)
42. Storey, M.A., Phillips, B., Maczewski, M.: Is it ethical to evaluate web-based learning tools using students? Empirical Software Engineering (4), 343–348 (2001)
43. Storey, M.A., Williams, C., Ernst, N.A., Zagalsky, A., Kalliamvakou, E.: Methodology matters: How we study socio-technical aspects in software engineering. arXiv preprint arXiv:1905 (2019)
44. Sturm, A., Kramer, O.: Evaluating the productivity of a reference-based programming approach: A controlled experiment. Information and Software Technology (10), 1390–1402 (2014)
45. Svahnberg, M., Aurum, A., Wohlin, C.: Using students as subjects: an empirical evaluation. In: Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 288–290 (2008)
46. Tu, Y.C., Tempero, E., Thomborson, C.: An experiment on the impact of transparency on the effectiveness of requirements documents. Empirical Software Engineering (3), 1035–1066 (2016)
47. Vinson, N.G., Singer, J.: A practical guide to ethical research involving humans. In: Guide to Advanced Empirical Software Engineering, pp. 229–256. Springer (2008)
48. Voigt, P., Von dem Bussche, A.: The EU General Data Protection Regulation (GDPR). A Practical Guide, 1st Ed., Cham: Springer International Publishing (2017)

A List of Included Venues
The list of publication venues is as follows:
Table 4: Publication Venues with Acronyms
Acronym    Venue
RE         IEEE International Requirements Engineering Conference
SSR        ACM Symposium on Software Reuse
ER         International Conference on Conceptual Modelling
HICSS      Hawaii International Conference on System Sciences
AOSD       Aspect-Oriented Software Development
ICST       International Conference on Software Testing, Verification and Validation
MODELS     International Conference on Model Driven Engineering Languages and Systems
MSR        IEEE International Working Conference on Mining Software Repositories
ISSTA      International Symposium on Software Testing and Analysis
ICSME      IEEE International Conference on Software Maintenance and Evolution
ECSA       European Conference on Software Architecture
ASE        Automated Software Engineering Conference
ISSRE      International Symposium on Software Reliability Engineering
ESEM       International Symposium on Empirical Software Engineering and Measurement
EASE       International Conference on Evaluation and Assessment in Software Engineering
ESEC/FSE   European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering
ICSE       International Conference on Software Engineering
OOPSLA     ACM Conference on Object Oriented Programming Systems Languages and Applications
TSE        IEEE Transactions on Software Engineering
EMSE       Springer Empirical Software Engineering
TOSEM      ACM Transactions on Software Engineering and Methodology
ASEJ       Springer Automated Software Engineering
IST        Elsevier Information & Software Technology
REJ        Springer Requirements Engineering
SOSYM      Springer Software & Systems Modeling
SQJ        Springer Software Quality Journal
JSS        Elsevier Journal of Systems and Software
JSEP       Wiley Journal of Software: Evolution and Process
STVR       Wiley Software Testing, Verification & Reliability
SPE        Wiley Software: Practice & Experience
IETS       IET Software
IJSEKE     World Scientific Int. Journal of Software Engineering and Knowledge Engineering
B Mapping Study: Data Extraction
To extract the study conditions and key metrics from the primary studies, we followed this process.

1. Check whether the paper should actually be included. By default, papers are included in this step. Only exclude a paper if one of the following points applies. Whenever a paper is excluded, note down the reason.
   – Is it at least 8 pages long? If not, exclude.
   – Is it published at one of the included venues (see Appendix A)? If not, exclude. Note that only the technical tracks at the conferences are included, no workshops or 'special' tracks (like SEET at ICSE). Only exclude based on the Publication Title field, or on the venue name printed in the PDF – no extra online search.
   – If there are no student subjects or no empirical study, exclude (this is also caught by the search later on).
2. Extract the study information. First, do a keyword search. If necessary (i.e., if any of the five categories below remain unanswered), read the introduction, the method section (usually needed for the study type), and the threats to validity.
   – Which study type is it? Use the same term as used in the paper (e.g., "Controlled experiment", "Family of quasi-experiments", "qualitative study"). If no empirical study is conducted, exclude the paper.
   – How many students participate in the study? If students and professional subjects participate, count only the students. If it cannot clearly be determined how many students participated, use 'NA' (as long as it is clear that students indeed participated; otherwise, exclude).
     Keyword search (one keyword after the other, until you find the information): student, subject, graduate, master, bachelor, recruit, invit, participa
   – Was it voluntary to participate? Possible values: voluntary (also if it is mentioned that subjects were recruited openly, e.g., "via flyers", "university-wide", "via mailing lists"), part of a course (if it is only mentioned that participation was in the scope of a course, not whether it was voluntary or mandatory), mandatory, and NA.
     Keyword search: same as for the previous question. Additionally: withdraw, option, volunt, mandatory, compulsory
   – How were (student) subjects compensated? Use the same term as used in the paper. If it cannot clearly be determined, use 'NA'.
     Keyword search: financ, money, monetary, reward, extra, bonus, compensat, voucher, receiv, gift, points
   – Was ethical approval obtained? Yes, No, Other, or NA. 'Other' if it is described that mandatory procedures (e.g., from the university, state, or country) were followed, but it is clear that this is not an ethical approval.
     Keyword search: ethic, irb, board, approv, consent
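The keyword-search step above can be pre-screened automatically. The following is a minimal sketch, not part of the original protocol: it assumes papers are available as plain text, and all function and variable names are illustrative. For each extraction category it returns the sentences containing one of the protocol's keyword stems, which a reviewer could then read instead of scanning the full paper.

```python
# Keyword stems per extraction category, taken from the protocol above.
STUDENT_STEMS = ["student", "subject", "graduate", "master", "bachelor",
                 "recruit", "invit", "participa"]

KEYWORDS = {
    "students": STUDENT_STEMS,
    # Voluntariness reuses the student stems plus additional ones.
    "voluntariness": STUDENT_STEMS + ["withdraw", "option", "volunt",
                                      "mandatory", "compulsory"],
    "compensation": ["financ", "money", "monetary", "reward", "extra", "bonus",
                     "compensat", "voucher", "receiv", "gift", "points"],
    "ethics": ["ethic", "irb", "board", "approv", "consent"],
}

def screen(paper_text: str) -> dict:
    """Return, per category, the sentences matching any keyword stem."""
    # Crude sentence split on full stops; a real pipeline would use a
    # proper sentence tokenizer.
    sentences = [s.strip() for s in paper_text.split(".") if s.strip()]
    return {
        category: [s for s in sentences
                   if any(stem in s.lower() for stem in stems)]
        for category, stems in KEYWORDS.items()
    }

if __name__ == "__main__":
    example = ("Participation was voluntary. Subjects received a voucher. "
               "The study was approved by the IRB.")
    for category, matched in screen(example).items():
        print(category, "->", matched)
```

Matching on stems rather than full words (e.g., "participa", "volunt") mirrors the protocol's truncated keywords, so inflected forms such as "participation" or "voluntarily" are caught without extra patterns.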
C Questionnaire
The online questionnaire consisted of the following questions. The entire questionnaire, with introduction and final page, and the precise layout can be found in the online dataset [29].

1. In which country are you employed?
   If several, state the country in which you spend most of your work time.
   (Free text with suggestions based on a standard country list)
2. What is your highest degree?
   (Higher education entrance qualification, Bachelor degree (or equivalent), Master degree (or equivalent), PhD degree (or equivalent), Habilitation degree (or equivalent), Other (free text))
3. What is your current, or last held, academic rank?
   (For example, doctoral student, associate professor) (Free text)
4. Have you previously conducted an empirical study which involved student subjects?
   Note: We understand "subject" broadly in this survey. That is, studies such as a case study involving students would also be included. (Yes/No)
New Page
5. Which of the following statements regarding ethics approval processes applies to your work? (Single selection with free-text option)
   – For studies with student subjects, I am required to obtain ethical approval
   – I am not required to obtain ethical approval, but there are mandatory steps for human-subject studies from my employer/the state (clarify details, if any)
   – Neither of the above (clarify details, if any)
6. How many studies with student subjects have you performed in the last 5 years?
   (>10 studies, 6-10 studies, 1-5 studies, None)
7. In your most recent research study with students, did you recruit your own students?
   We refer to "own" students as students that are either supervised (thesis project, PhD supervision) by the respondent, or take a course given by the respondent. (Yes/No)
8. In your most recent research study with students, was their participation mandatory or voluntary?
   (Voluntary; Voluntary, but part of a graded course component (e.g., assignment); Mandatory)
9. In your most recent research study with students, what forms of compensation did you offer to student subjects?
   Please mark all options that apply.
   (None (besides potential learning experience), Bonus points in course, Monetary reward (e.g., fixed rate, a raffle/lottery), Snacks/Food, Other (please specify))
10. In your most recent research study with students, did the subjects in your study/studies give consent to participate? (Yes/No)
11. In your most recent research study with students, did the subjects in your study/studies have the option to withdraw from the study at any time? (Yes/No)
12. Based on your answers above, would you like to clarify any details? (Free text)

New Page
13. Please state your agreement to the following statements regarding the educational value of research studies. (Per item: 5-point Likert agreement scale and "don't know" option)
   – If course content is prescribed by a curriculum, it should not be changed just to make a research study fit in the course.
   – Research studies should be connected to course projects.
   – Reviewers (of a study design/protocol) should pay close attention to students receiving adequate educational value from the research study.
14. Please state your agreement to the following statements regarding consent. (Per item: 5-point Likert agreement scale and "don't know" option)
   – Every research study involving student subjects should be based on informed consent.
   – Student subjects should be permitted to withdraw from a research study at any time.
   – Participation in a research study may be mandatory if it fits into the course context.
   – Participation in a research study conducted in a course should always be voluntary for students.
15. Please state your agreement to the following statements regarding the course-study relationship.
   The following statements all apply to a situation in which a course instructor conducts a research study within his/her course. (Per item: 5-point Likert agreement scale and "don't know" option)
   From an ethical standpoint, it is acceptable...
   – to use the enrolled students as subjects in a research study.
   – to base a part of the assessment on the participation in a research study.
   – to base a part of the assessment on the performance in a research study.
   – to encourage students to participate in a research study.
   – to withhold information from the students with respect to the study goals.
   – to withhold information from the students as to how the data they provide will be used.
   – for the instructor to know who participated in the research study.
   – that the researcher conducting the research study is the same person as the course instructor.
16. According to your opinion, how relevant is it to include the following information in a publication that uses student subjects? (Per item: 5-point Likert scale from "very irrelevant" to "very relevant", and "don't know" option)
   – Status of ethical approval or similar measures.
   – Voluntariness of student participation.
   – Compensation of student subjects.
   – Use of informed consent.
17. Do you have additional comments? (Free text)
D Declarations
D.1 Funding
University faculty funding with no external funding involved.
D.2 Conflicts of interest/Competing interests
Not applicable.
D.3 Availability of data and material
The data used in this manuscript is published on Zenodo [29].