Characterizing Student Engagement Moods for Dropout Prediction in Question Pool Websites
REZA HADI MOGAVI,
Hong Kong University of Science and Technology, Hong Kong SAR
XIAOJUAN MA,
Hong Kong University of Science and Technology, Hong Kong SAR
PAN HUI,
Hong Kong University of Science and Technology & University of Helsinki, Hong Kong SAR & Finland

Problem-Based Learning (PBL) is a popular approach to instruction that supports students to get hands-on training by solving problems. Question Pool websites (QPs) such as LeetCode, Code Chef, and Math Playground help PBL by supplying authentic, diverse, and contextualized questions to students. Nonetheless, empirical findings suggest that 40% to 80% of students registered in QPs drop out in less than two months. This research is the first attempt to understand and predict student dropouts from QPs via exploiting students' engagement moods. Adopting a data-driven approach, we identify five different engagement moods for QP students, namely challenge-seeker, subject-seeker, interest-seeker, joy-seeker, and non-seeker. We find that students have collective preferences for answering questions in each engagement mood, and deviation from those preferences increases their probability of dropping out significantly. Last but not least, this paper contributes by introducing a new hybrid machine learning model (we call Dropout-Plus) for predicting student dropouts in QPs. The test results on a popular QP in China, with nearly 10K students, show that Dropout-Plus can exceed the rival algorithms' dropout prediction performance in terms of accuracy, F1-measure, and AUC. We wrap up our work by giving some design suggestions to QP managers and online learning professionals to reduce their student dropouts.

CCS Concepts: • Human-centered computing → HCI theory, concepts and models; • Applied computing → Interactive learning environments; • Computing methodologies → Machine learning approaches.

Additional Key Words and Phrases: Question Pool website (QP), online judge, Problem-Based Learning (PBL), online learning, engagement mood, dropout prediction.
ACM Reference Format:
Reza Hadi Mogavi, Xiaojuan Ma, and Pan Hui. 2021. Characterizing Student Engagement Moods for Dropout Prediction in Question Pool Websites. J. ACM 37, 4, Article 111 (August 2021), 22 pages. https://doi.org/10.1145/1122445.1122456
Problem-Based Learning (PBL) is a student-centered approach to instruction where students learn through solving problems [82]. Question Pool websites such as LeetCode, Code Chef, Timus, Jutge, and Math Playground support PBL by supplying students with a variety of questions, quizzes, and competitions in different subjects [85, 118, 120]. However, empirical statistics from these websites
Authors' addresses: Reza Hadi Mogavi, [email protected], Hong Kong University of Science and Technology, Hong Kong SAR; Xiaojuan Ma, [email protected], Hong Kong University of Science and Technology, Hong Kong SAR; Pan Hui, [email protected], Hong Kong University of Science and Technology & University of Helsinki, Hong Kong SAR & Finland.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2021 Association for Computing Machinery.
0004-5411/2021/8-ART111 $15.00
https://doi.org/10.1145/1122445.1122456

J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2021.

show that 40% to 80% of the registered students in QPs tend to drop out before completing their second month of membership [22, 54, 64, 87, 106]. Having said this, practical insights into this phenomenon can help educators and online learning professionals to improve their QP designs and reduce dropouts.

Nevertheless, the majority of empirical studies to date in the area of computer-mediated education focus only on dropouts from Massive Open Online Courses (MOOCs) and Community Question Answering (CQA) websites [25, 72, 76, 102]. Therefore, there is a research gap in the literature for studying student dropouts in comparatively new platforms like QPs. Our research aims to fill this gap and inspect the problem of student dropouts in QPs through the lens of student engagement moods.
By doing so, we draw Human-Computer Interaction (HCI) and Computer Supported Cooperative Work (CSCW) researchers' attention to the importance of personalization in QPs. More formally, this work answers three research questions as follows:
• RQ1: What are student engagement moods in QPs?
• RQ2: How are student engagement moods and dropout rates correlated?
• RQ3: Can student engagement moods help to predict student dropouts more precisely?

We utilize a probabilistic graphical model, known as Hidden Markov Model (HMM), to extract and visually distinguish different student engagement moods in QPs. We identify five dominant student engagement moods, which are (E1) challenge-seeker, (E2) subject-seeker, (E3) interest-seeker, (E4) joy-seeker, and (E5) non-seeker. We distinguish each mood according to students' data-driven behavioral patterns that emerge in the process of interacting with QPs. We are inspired by the Hexad user types of Tondello et al. (see [108]) for naming the extracted engagement moods, but the context and concepts we introduce are genuine and specialized for QPs.

To the best of our knowledge, this work is the first research that casts a typology for student behaviors in QPs. By adopting a data-driven approach, we identify some distinctive behavioral patterns for different QP students. For example, when students are in the challenge-seeker mood, they search for challenging types of questions that are commensurate with their high-level skills. Students who are in the subject-seeker mood are best described as mission- or task-oriented individuals. Interestingly, they do not search much to find their questions and often restrict themselves to a predefined study plan around specific subject matter and contexts. Students in this mood are more in need of a mentor, a guide, or a study plan to keep them focused and help them find their questions easily. When students are in an interest-seeker mood, they rummage around a variety of topics to find their questions of interest. However, they do not chase challenging questions as challenge-seekers do.

Furthermore, we notice that the students in the joy-seeking and non-seeking moods are not as committed as students in the other moods to study and exercise their knowledge.
Joy-seeker students have a high tendency to game the platform or misuse it for purposes other than education [7, 9, 31]. Technically speaking, students who exploit the platform's properties rather than their knowledge or skills to become successful in an educational platform are considered to be gaming the platform [8]. Finally, students in a non-seeker mood tend to leave the platform earlier than students in other moods. These students are seldom determined to answer any questions. They only check the platform to see what is new and whether any questions can attract their attention by chance. These findings underline the familiar point that a one-size design QP does not fit all students [28, 65, 104].

We also find that students have collective preferences for answering questions in each engagement mood, and deviation from those preferences increases their probability of dropping out significantly.

Depending on the context of research, the terms attrition, churning, and dropping out can be used interchangeably to imply similar concepts.

Finally, inspired by the HMM findings and insights about student dropouts, we feed HMM results to a Long Short-Term Memory (LSTM) recurrent neural network to find out if it can improve the accuracy of dropout predictions compared with a plain LSTM model and five other baselines, including an XGBoost model, Random Forest, Decision Tree (DT), Logistic Regression, and Support Vector Machine (SVM) [76, 110]. We notice improvements in the accuracy of dropout predictions when HMM and LSTM recurrent neural networks are combined. More precisely, we reach an accuracy of 78.22%, an F1-measure of 81.28%, and an AUC of 89.10%, which until now set the bar for dropout prediction models in QPs.
Contributions. This work is important to HCI and CSCW because it presents the first typology and dropout prediction model for QP students. Furthermore, it reinforces the need for personalizing QP websites by revealing that different students have different preferences for selecting and answering QP questions. Such knowledge could help QPs make more informed design decisions. We also provide some design suggestions for QP managers and online learning professionals to reduce student dropouts in QPs.
Question Pool websites (QPs) provide students with a collection of questions to learn and practice their knowledge online [120]. QPs such as Timus, Jutge.org, Optil.io, Code Chef, and the HDU virtual judge are among the most popular web-based platforms for code education [86, 117, 118, 120]. These platforms usually include a large repository of programming questions from which students choose to answer. The students submit their solutions to the QP and wait for feedback to find out if their code is correct. Dropout prediction is a challenging but necessary study for ensuring the sustainability and service continuation of these platforms.

In this paper, we concentrate on a popular and publicly accessible QP in China known as the HDU virtual judge platform (henceforth HDU). The website originally belongs to Hangzhou Dianzi University's ACM team and is designed to provide students with hands-on exercises to hone their programming and coding skills [120]. HDU, on average, hosts more than one hundred students every day and receives more than 300K programming code submissions every month. It is also a familiar platform for researchers who work in the field of HCI [120]. Figure 1 shows snapshots of the HDU website.

Similar to the literature works [34, 93, 124], we formulate the dropout prediction problem as a binary classification task. Our definition of dropout is similar to [72], which temporally splits a dataset into observation and inspection periods. The students whose number of solution submissions in the inspection period drops to less than 20% of the observation period are considered to be dropped students. Since QP platforms usually do not have fixed start and end time points as in MOOCs [76], we regulate the observation and inspection periods similarly to CQA platforms [34, 72, 88]. We use equally long observation and inspection time windows.

The HCI and CSCW literature is abundant with various conceptualizations of engagement [4, 20, 32, 36, 55, 105].
However, there is a lack of consensus on the definition of engagement [10]. The three most widely used conceptualizations of engagement are behavioral, emotional, and cognitive [2, 39]. In the context of students' learning research, behavioral engagement refers to the attention, participation, and effort a student puts into his or her academic activities [2, 19, 78]. For example, students can be in off-task or on-task moods when doing their learning tasks [2].

http://code.hdu.edu.cn/

Fig. 1. Snapshots from HDU, the QP we study in this work. Figure (a) shows the pool of questions and each question's acceptance rate. Figure (b) shows the QP's realtime status that helps students become aware of the evaluation of their own and their friends' performances after each submission.

The emotional engagement refers to students' affective responses to learning activities and the individuals involved in those activities [78, 83]. For instance, students might be concerned about how their instructors perceive their performances. Finally, cognitive engagement is about how intrinsically invested and motivated students are in their learning process [78]. For example, students might make mental efforts to debate in an online forum [49].

The complexity of discovering student engagement moods has resulted in the appearance of a diversity of data mining techniques [4, 17, 74, 112]. K-means, the clustering algorithm, is one of the most popular techniques many researchers use to extract naturally occurring typologies of students' engagement moods [44, 74, 94]. Saenz et al. use K-means and exploratory cluster analysis to extract different engagement moods for college students [94]. By comparing similarities and dissimilarities between different features, they characterize 15 different clusters. Furtado et al.
use a combination of hierarchical and non-hierarchical clustering algorithms to identify contributors' profiles in the context of CQA platforms [40]. They categorize CQA contributors' behavior into 10 types based on how much and how well they contribute to the platform over time. However, K-means is more useful when features show Euclidean distance properties [47].

Latent variable models like Hidden Markov Models (HMMs) are also quite popular. Faucon et al. use a semi-Markov model for simulating and capturing the behavioral engagement of MOOC students [35]. They provide a graphical representation of the dynamics of the transitions between different states, such as forum participation or video watching. Mogavi et al. combine an HMM and a Reinforcement Learning Model (RLM) to capture users' flow experience on a famous CQA platform [72]. Flow experience is a positive mental state in psychology that occurs when the challenge level of an activity (e.g., answering a question, doing a task, or solving a homework problem) is commensurate with the user's skill level [30, 72, 81]. Our paper is the first to use an HMM to characterize students' engagement moods in QPs. Understanding students' engagement moods can help educators manage student behaviors better and set more customized curricula [97].

Educational Data Mining (EDM) is a relatively new discipline that has recently caught the attention of the HCI and CSCW communities [29, 53, 68]. One of EDM's primary interests is to know whether students will drop out soon or continue their studies until the end of their courses or at least
Table 1. The summary of the benchmark dataset

Measure                      Number   Min    Max      Median   Mean     SD
Answer Submission            1.2 M    1      62       5.23     7.14     18.53
Accepted Answers             227 K    0      39       4.16     5.01     12.89
Endurance (in minutes)       N/A      0      209.65   15.91    21.38    67.43
Attendance Gap (in hours)    N/A      1.18   673.38   134.22   192.48   125.68

Number of Students = 9,941; Number of Dropped Students = 5,261

for a long time [12, 57, 92]. Qiu et al. introduce a latent variable model called LadFG, based on students' demographics, forum activities, and learning behaviors, to predict students' success in completing the courses they start on XuetangX, one of the largest MOOCs in China [89]. They show that having friends on MOOCs can dramatically increase students' chances of receiving the final certificate of a course by three-fold, but surprisingly, being more active on the program does not guarantee that the student will obtain the final certificate. Wang et al. propose a hybrid deep learning-based dropout prediction model that combines two architectures, a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN), to advance the accuracy of dropout prediction models in MOOCs [115]. They show that their model can achieve a high accuracy comparable to feature-engineered data mining methods. As another example, Xing et al. propose a simple deep learning-based model with features such as students' access times to the platform and their history of active days to predict dropouts [121]. They suggest computing students' dropout probability on a weekly basis to take better measures in preventing student dropouts. Student engagement is one key factor among all of these studies. In fact, engagement can be considered a basis for students' retention, and a lack of it confronts and cancels any positive learning outcomes [125]. Therefore, we use this rationale to utilize students' engagement moods to predict dropouts in QPs.
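As a concrete illustration of the dropout formulation used in this paper (a student whose inspection-period submissions fall below 20% of their observation-period submissions is labeled as dropped), the labeling rule can be sketched as follows. The function name and input format are our own assumptions, not part of the original pipeline:

```python
def label_dropout(obs_submissions: int, insp_submissions: int,
                  threshold: float = 0.2) -> int:
    """Return 1 (dropped) if inspection-period submissions fall below
    `threshold` times the observation-period submissions, else 0."""
    return 1 if insp_submissions < threshold * obs_submissions else 0

# Example: 50 submissions observed, only 8 during inspection.
# 8 < 0.2 * 50 = 10, so the student is labeled as dropped.
dropped = label_dropout(50, 8)
```

Note that students with zero observation-period submissions are excluded before labeling, as described in the dataset section.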
After receiving the approval of our local university's Institutional Review Board (IRB), we follow the ethical guidelines of AoIR for the study of student behavior on Hangzhou Dianzi University's QP platform (known as HDU). The dataset we study includes nearly 10K student records in the range from January 25th to July 15th, 2019 (172 days). We utilize the student data before April 21st as the observation-period feed of the prediction models, and the data after that for the inspection of the dropouts (similar to [77, 88]). More than half of the students drop out of the platform in the inspection period. We exclude students with no submissions in the observation period from our study to avoid the inclusion of students who have already dropped out and to alleviate the problem of imbalanced class labels between dropped and continuing students (see [61, 72]). Table 1 summarizes the main statistics of our dataset.

https://aoir.org/ethics/

We use an unsupervised HMM for decoding student engagement moods and a supervised LSTM network for predicting dropouts. Both components are trained with the student data features during the observation period. The HMM inputs include simple observable features that imply student performance, challenge, endurance, and attendance gap states after each submission to the QP. The parameters of the HMM are estimated by iterations of a standard Expectation Maximization (EM) algorithm known as Baum-Welch [13]. Five dominant student engagement moods are identified, which are (E1) challenge-seeker, (E2) subject-seeker, (E3) interest-seeker, (E4) joy-seeker, and (E5) non-seeker. We run a user study with 26 local students to evaluate our HMM findings.

After decoding the student engagement moods with the HMM, we associate the platform questions with the engagement moods that are most likely to submit solutions for those questions.
We notice that students have collective preferences for answering questions in each engagement mood, and deviation from those preferences increases their probability of dropping out significantly. Finally, we feed the generated engagement moods and questions' associativity features, along with other common features, to an LSTM network to predict student dropout in the inspection period. All the dropout predictions are reported based on 10-fold cross-validation.

Hidden Markov Models (HMMs) are statistical tools that help to make inferences about latent (unobservable) variables through analyzing manifest (observable) features [26, 58, 98]. In the context of education systems, HMMs are used for the detection of various phenomena such as student social loafing [127], flow zone engagement [72], and academic pathway choices [42]. The widespread use of HMMs in the literature and their capability of capturing complex data structures [42] inspire our work to apply HMMs to help distinguish between different student engagement moods in QP platforms. We use the hmmlearn module in Python to train our model.

• Inputs.
In order to build an HMM, we utilize manifest features for student performance, challenge, endurance, and attendance gap. We pick the manifest features by performing an extensive thematic analysis [84, 116] across the literature about student engagement [43, 50, 73]. The features we use are as follows.

- Performance: The feedback QP platforms provide for each submission, such as whether the answer is Wrong or Accepted.
- Challenge: The past acceptance rate of a question, which is often shown along with a guide next to each question, can resemble the challenge.
- Endurance: The time students spend on the platform to answer questions and compile code in one session is student endurance. Similar to [46], we define a "session" in QP platforms as a period of time in which the interval between two consecutive submissions does not exceed one hour. We measure student endurance in minutes.
- Attendance gap: The time interval between two consecutive sessions is a student's attendance gap. We measure the attendance gap in hours.

We should mention here that these features serve only as cues to infer students' cognitive (performance and challenge) and behavioral (endurance and attendance gap) engagement moods [45, 59, 62, 80].

• Parameters.
Hereafter, we use a four-element vector 𝑂_𝑡 to refer to the corresponding student observations after each answer submission to the QP at time 𝑡. The HMM assumes the observations 𝑂_𝑡 are generated by an underlying state space of hidden variables 𝑍 = {𝑧_𝑖}, where 𝑖 ≥ 1. For convenience, we use a triplet 𝜆_HMM = (𝐴, 𝐵, 𝜋) to denote the HMM we train for extraction of the student engagement moods. The transition matrix 𝐴 shows the probabilities of moving between different engagement moods over time. The emission matrix 𝐵 shows the conditional probability for an observation 𝑂_𝑡 to be emitted (generated) from a certain engagement mood 𝑧. The vector 𝜋 denotes the initial probabilities of being in each of the engagement moods of 𝑍. The initial probabilities are often assumed to be 1/||𝑍||, with ||𝑍|| showing the cardinality of the hidden state space (i.e., the number of engagement moods) [72].
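The session rule defined under Inputs (two consecutive submissions belong to the same session if and only if their interval does not exceed one hour) directly yields the endurance and attendance-gap features. A minimal sketch of that segmentation, with function names of our own choosing:

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(hours=1)  # a new session starts after a >1-hour silence

def split_sessions(timestamps):
    """Group a student's chronologically sorted submission timestamps into
    sessions: consecutive submissions stay in one session iff their interval
    does not exceed one hour."""
    sessions, current = [], [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev <= SESSION_GAP:
            current.append(cur)
        else:
            sessions.append(current)
            current = [cur]
    sessions.append(current)
    return sessions

def endurance_minutes(session):
    """Endurance: time spent within one session, in minutes."""
    return (session[-1] - session[0]).total_seconds() / 60.0

def attendance_gaps_hours(sessions):
    """Attendance gap: interval between consecutive sessions, in hours."""
    return [(b[0] - a[-1]).total_seconds() / 3600.0
            for a, b in zip(sessions, sessions[1:])]
```

For instance, submissions at minute 0, 10, and 30, followed by two more about five hours later, form two sessions: the first has an endurance of 30 minutes, and the attendance gap between the sessions is 4.5 hours.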
Fig. 2. Estimating the best number of hidden states, where the AIC and BIC measures both take the least values. [The plot shows AIC and BIC against the number of hidden states.]

• Model training.
We optimize the 𝜆_HMM parameters according to the student behavior in the observation period with a standard EM algorithm known as Baum-Welch [13]. The aim is to optimize the 𝜆_HMM parameters such that 𝑃𝑟(𝑂 | 𝜆_HMM) is maximized, with 𝑂 = {𝑂_𝑡}. To avoid the local maximum problem with the EM algorithm, we train the HMM with ten random seeds until they converge at the global maximum.

The HMM representation is completed by choosing the best number of hidden states [72]. While this task appears conceptually simple, finding the best number of hidden states in a meaningful way is quite challenging [72, 111]. The main reason is that an HMM with a small number of hidden states cannot capture the underlying behavioral kinetics adequately, and an HMM with too many hidden states is difficult to interpret [79]. However, we need a criterion to compromise, and thus we use the conventional Akaike (AIC) [96] and Bayes (BIC) [23] measures in our model training (also see [72]). Figure 2 plots the AIC and BIC measures against the number of hidden states in our trained HMMs (i.e., from 2 to 10 hidden states are tested). We choose an HMM with ||𝑍|| = 5 hidden states since the global values of the AIC and BIC measures are the lowest in this representation. Smaller AIC and BIC values indicate a more descriptive and less complicated 𝜆_HMM [72].
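The AIC/BIC selection step above can be sketched as follows, assuming the per-candidate log-likelihoods come from Baum-Welch fits (e.g., hmmlearn's `model.score`). The parameter count below assumes Gaussian emissions with diagonal covariance, which is one common convention; the paper does not specify its exact emission model:

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2p - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: p ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

def n_hmm_params(n_states, n_features):
    """Free parameters of a Gaussian-emission HMM (diagonal covariance):
    transition rows, initial distribution, per-state means and variances."""
    return (n_states * (n_states - 1)      # transition matrix A (rows sum to 1)
            + (n_states - 1)               # initial distribution pi
            + 2 * n_states * n_features)   # means and variances

def pick_n_states(loglik_by_k, n_features, n_obs):
    """Given {k: log-likelihood} from fits with k hidden states,
    return the k minimizing BIC (AIC can be inspected the same way)."""
    return min(loglik_by_k,
               key=lambda k: bic(loglik_by_k[k],
                                 n_hmm_params(k, n_features), n_obs))
```

With the paper's four manifest features, this procedure would be run for k = 2 through 10 and the minimum taken, as in Figure 2.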
Similar to the literature, we use data distributions within each hidden state to characterize and visualize the distinction between different engagement moods [5, 40, 72]. The features we inspect here are the number of incorrect and accepted answers, the average ease of questions, the average time spent on the platform, the average time gap in attendance, and the number of repeated submissions. They are inspired by the manifest features we had before, but instead of an individual student, they aim to examine all students' collective behaviors in a specific hidden state. The cumulative distribution functions (CDFs) of these features are plotted in Figure 3. We also utilize the frequency plots of student submissions over different question IDs to demonstrate question-answering patterns in each hidden state (Figure 4). The number of submissions in each hidden state is normalized between 0 and 100. This representation is to facilitate the visualization and comparison of the patterns in different hidden states. The marked characteristics of each hidden state are as follows:

• (E1) Challenge-seeker (Hidden State 1): As shown in Figure 3c, the students in this mood are best described by their tendency to look for more challenging questions, i.e., the questions with the least acceptance rate. From Figures 3d and 3e, we find they also spend the longest average
Fig. 3. CDF plots of the student engagement moods resolved. [Panels: (a) Incorrect Answers, (b) Accepted Answers, (c) Question Ease, (d) Spent Time, (e) Gap in Attendance, (f) Repeated Submissions.]

time on the platform and attend the platform more frequently in comparison with the other engagement moods. Furthermore, Figure 4a shows that challenge-seekers show more interest in the last questions of the platform.

• (E2) Subject-seeker (Hidden State 2): As shown in Figure 4b, students in this mood tend to answer specific sets of questions. They usually answer specific-context questions (e.g., greedy algorithms) sequentially. Furthermore, from Figure 3d we can notice that subject-seekers come second after the students in the challenge-seeker mood in spending the longest time on the platform. Figures 3a and 3b show that the students in the subject-seeker mood have the largest average number of incorrect answers on the platform, whereas their average number of accepted answers is quite similar to that of the challenge-seeker students (the distributions are also similar).

• (E3) Interest-seeker (Hidden State 3): From Figure 4c, we realize that the students in this mood do not answer specific-context questions and regularly search for their questions of interest. The question types they answer have the largest variance. Furthermore, the distribution of the easiness of the questions shown in Figure 3c makes no tangible difference in comparison with the other moods except the challenge-seeker mood. As shown in Figure 3b, the students in this mood hold the highest average number of accepted answers after the students in the joy-seeker mood. Moreover, according to Figure 3a, interest-seekers' distribution of producing incorrect answers is close to a uniform distribution, which sharply distinguishes them from the other student moods.
• (E4) Joy-seeker (Hidden State 4): As shown in Figures 3c and 3f, the students who are in this mood tend to answer the easiest questions on the platform in a highly repetitive manner. Interestingly, these students choose their questions from a small and selective number of QP questions (probably those with compilation loopholes) (see Figure 4d). Also, their number of accepted answers has the highest value among all the other moods (see Figure 3b). Based on these
Fig. 4. Frequency plots of student submissions over different question IDs. [Panels: (a) Challenge-seeker, (b) Subject-seeker, (c) Interest-seeker, (d) Joy-seeker, (e) Non-seeker; each plots the normalized number of submissions against question IDs.]

signs, we come to the conclusion that these students are at the highest risk of gaming the platform [7, 9] in comparison with the other mood groups. Finally, we notice that the distribution of incorrect answers for this mood type resembles a mixture of two Gaussian distributions, which stands in sharp contrast to the other mood groups (see Figure 3a).

• (E5) Non-seeker (Hidden State 5): According to Figure 3e, we notice that the students in this mood are well distinguished by holding the largest average attendance gap among all moods, which means that they seldom visit the platform. Furthermore, the least average time spent on the platform is another characteristic of this mood (see Figure 3d). Finally, as expected, the lowest numbers of incorrect answers, accepted answers, and repeated submissions are the outcomes of these short visits (see Figures 3a, 3b, and 3f).
With the approval of our university's Institutional Review Board (IRB), we conduct a preliminary user study with 26 local students to evaluate the accuracy of the hidden states resolved. A combination of snowball and convenience sampling methods is used to recruit participants from our local universities. The participants include 9 females and 17 males, all undergraduate computer-major students in the age range of 18 to 23.
Table 2. Engagement mood options participants could choose from.

Engagement Mood Options                                                            Responses
(1) I am looking for a challenging question. (Challenge-seeker)                    46.03%
(2) I am looking for a specific question. (Subject-seeker)                         19.04%
(3) I am looking for an interesting (non-challenging) question. (Interest-seeker)  12.69%
(4) I want to use the platform for purposes other than learning. (Joy-seeker)      11.11%
(5) I want to solve a random question. (Non-seeker)                                11.11%
(6) None of the above (explain)                                                    N/A

The participants take part in our study from the safety and comfort of their homes. As a token of our appreciation, we compensate each participant's time with a small cash reward (50 HK$) at the end of our study.

We use the event-focused version of the Experience Sampling Method (ESM) [128] to collect the participants' self-reported engagement moods after each answer submission to HDU. The participants report their engagement moods through a Google form. After training each participant on the definition of each engagement mood, we ask them to identify their moods through one of the options shown in Table 2. The study lasts for two weeks, from September 18th to October 2nd, 2020, and 63 responses are recorded. In parallel to each self-reported engagement mood we receive, an HMM-based label is also generated from the student's observed data on HDU. We notice that there is a 76.19% agreement between the engagement mood labels extracted from the HMM and ESM. This is more than 50% better than a random labeler with an accuracy of 20%.

Interestingly, none of the participants mention any engagement moods other than the five engagement moods we have extracted. However, we predict that there would be more personalized and detailed engagement moods with an increased number of participants [94].
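The agreement figure above can be checked arithmetically: with 63 responses, the reported 76.19% corresponds to 48 matching labels (our inference; the match count itself is not published), against a 20% chance baseline for a uniform guesser over five moods. A minimal sketch:

```python
def agreement_rate(hmm_labels, esm_labels):
    """Fraction of submissions where the HMM-decoded mood matches the
    participant's self-reported (ESM) mood."""
    assert len(hmm_labels) == len(esm_labels)
    matches = sum(h == e for h, e in zip(hmm_labels, esm_labels))
    return matches / len(hmm_labels)

# 63 ESM responses; 48 matches is inferred from the reported 76.19%.
n_total, n_match = 63, 48
rate_percent = round(100 * n_match / n_total, 2)
chance_percent = 100 / 5  # a random labeler over five moods
```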
Next, we use the optimal λ_HMM to find the most probable engagement mood sequence X_z = {x ∈ Z} for each student with respect to their observed sequence O, so as to maximize Pr(X_z | O, λ_HMM) (inference as in [37]). Furthermore, we associate with every question q_j on the platform a distribution Q_j based on its probabilities of receiving answers while students are in different hidden states. Here, the index j refers to the identity number of the question on the platform. We define the average question mismatch as the probability that a question does not match the current engagement mood of a student. Illustratively, the question mismatch is the complement of the question's associativity measure, as represented in Figure 5a. In this subsection, we test the null hypothesis that the average question mismatch has no correlation with the percentage of student dropouts. The regression analysis shown in Figure 5b reveals a positive coefficient of 92.45 and a Pearson's r of 0.927 (p < .01) between the average question mismatch and the student dropouts. Therefore, the null hypothesis is rejected, and the alternative hypothesis is accepted. That is to say, as the average question mismatch increases, the percentage of students who drop out also increases. Furthermore, it can be inferred that the students in each engagement mood have a collective preference for answering questions, and deviating from that preference increases their risk of dropping out significantly.
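Decoding the most probable hidden-state sequence is the Viterbi algorithm [37]; a minimal NumPy sketch, with hypothetical toy parameters standing in for the fitted λ_HMM:

```python
import numpy as np

# Illustrative Viterbi decoding: recover the most probable hidden engagement-mood
# sequence X_z given an observation sequence O and HMM parameters (pi, A, B).
# All parameter values below are hypothetical, not the paper's fitted lambda_HMM.
def viterbi(obs, pi, A, B):
    """obs: observation indices; pi: initial probs (S,); A: transitions (S, S); B: emissions (S, V)."""
    S, T = len(pi), len(obs)
    delta = np.zeros((T, S))           # log-prob of the best path ending in each state
    psi = np.zeros((T, S), dtype=int)  # back-pointers to the best previous state
    logA, logB = np.log(A), np.log(B)
    delta[0] = np.log(pi) + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA      # (S, S): previous state -> state
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(S)] + logB[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy 2-state, 2-symbol example (hypothetical parameters):
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1], pi, A, B))  # [0, 0, 1]
```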
We suggest using the engagement moods extracted from the HMM together with an LSTM network to predict student dropouts in QPs more precisely. We refer to this hybrid machine learning architecture as Dropout-Plus (DP).
Fig. 5. (a) Finding the question associativity for sample questions (probability of each engagement mood E1–E5 per question ID); (b) the linear correlation between the average question mismatch and the percentage of student dropouts from the platform (per-student points with the regression line).
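The statistics behind Figure 5b, Pearson's r and the least-squares regression coefficient, can be computed directly from the paired measurements. The sketch below uses synthetic data points, not the paper's measurements, purely to illustrate the computation:

```python
import numpy as np

# Pearson's r and the least-squares slope between average question mismatch (x)
# and dropout percentage (y). The data points here are synthetic, NOT the paper's.
def pearson_and_slope(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    r = (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))   # Pearson correlation
    slope = (xc @ yc) / (xc @ xc)                    # regression coefficient
    return r, slope

x = [0.10, 0.15, 0.20, 0.25, 0.30]   # synthetic average mismatch values
y = [12.0, 17.5, 21.0, 27.0, 30.5]   # synthetic dropout percentages
r, slope = pearson_and_slope(x, y)
print(round(r, 3), round(slope, 2))
```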
We use Keras in Python to build an attention-based Long Short-Term Memory (LSTM) network to detect student dropouts [95]. The features we input into the network at every time t a student submits an answer to the QP are of two types: 1) the set of non-handcrafted features, which we call the common features, that are directly acquired from student behavior. The common features are often shared among different QP platforms and include the student's membership period, rank, nationality, acceptance rate, error type distributions, and the average time gap between submissions; and 2) the set of preprocessed features the HMM renders. In order to feed data into the network more effectively and reduce the effect of correlated features, we use a fully connected feed-forward neural network to combine the features and obtain a distributional feature set to train our model [123].

We compare the performance of DP for predicting student dropouts with five competitive baselines picked from the literature. Since there are no preceding dropout prediction models for QPs, we pick our baselines from previous MOOC and CQA studies [76, 88]. To keep the comparisons relevant and unbiased, we avoid models for which it is not clear how to match MOOC or CQA features to QPs. The baselines include:
• XGBoost: The Extreme Gradient Boosting (XGBoost) algorithm is one of the most dominant machine learning tools for classification and regression [60, 71, 75, 113]. It comprises a collection of base decision tree models that are built sequentially, and their final results are summed together to reduce the bias [24]. Each decision tree boosts the attributes that led to misclassification by the previous decision tree.
• Random Forest: Random Forest is another decision-tree-based ensemble algorithm with a rich literature in the HCI and CSCW communities [56, 67, 90, 99]. However, different from XGBoost, Random Forest combines the decision trees uniformly, using an ensemble technique known as bootstrap aggregation [16, 18]. In other words, every decision tree is independent of the others, and the final classification result is resolved by majority voting [16].
• Decision Tree (DT): As the base model of the Random Forest and XGBoost algorithms, DT's performance sometimes exceeds the two [76]. However, even a small change in the data can dramatically reshape the model, which is an adverse point. Nevertheless, we add DTs to our analysis to have a more comprehensive set of baselines [34, 76, 88].
• Logistic Regression: As a popular statistical tool in HCI and CSCW quantitative analysis [21, 38, 56, 69], Logistic Regression in its basic form uses a logistic (sigmoid) function to model a binary dependent variable [69]. In our work, the dependent variable is the dropout outcome [69].
• SVM (RBF kernel): Support Vector Machine (SVM) is another supervised algorithm that can be applied for both classification and regression purposes [16]. SVM tries to identify hyperplanes (boundaries) that separate the data points into groups with high margins [14]. The Gaussian Radial Basis Function (RBF) is one of the most common kernel functions researchers apply to train their models [66, 114, 122].

We also run an ablation study (see [41, 52]) with each baseline by including or excluding the Engagement Features (EF) produced by the HMM. More precisely, EF includes students' engagement moods and their questions' associativity features after each submission during the observation period. All of the models also use the common features we introduced before for model training; we remind the reader that these include the student's membership period, rank, nationality, acceptance rate, error type distributions, and the average time gap between submissions. All of the baselines are implemented with the sklearn module in Python.

Models with HMM Engagement Features (EF) | Accuracy | F1-measure | AUC
Dropout-Plus (DP)* | | |
XGBoost with EF | 75.12 | 76.60 | 78.41
Random Forest with EF | 71.33 | 78.98 | 79.27
Decision Tree (DT) with EF | 70.15 | 72.90 | 74.43
Logistic Regression with EF | 73.56 | 77.36 | 78.85
SVM (RBF kernel) with EF | 74.39 | 70.30 | 75.06

Models without HMM Engagement Features (EF) | Accuracy | F1-measure | AUC
Plain LSTM | 71.40 | 74.21 | 77.21
XGBoost | 67.35 | 68.83 | 70.56
Random Forest | 64.12 | 69.18 | 73.90
Decision Tree (DT) | 62.60 | 63.85 | 67.24
Logistic Regression | 66.93 | 67.90 | 70.59
SVM (RBF kernel) | 73.20 | 67.25 | 70.13

* Best performance

Table 3. Performances of different methods under different evaluation metrics (all reported in %).

Table 3 summarizes the results of our analyses based on 10-fold cross-validation [16, 91]. The results show that DP outperforms all of the baselines. Moreover, the models with HMM Engagement Features (EF) all perform better than the models without EF. In fact, adding EF raises the performance of every model close to DP's, which attests that information about engagement moods can improve dropout prediction.
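DP's attention-based LSTM is built in Keras; as a library-free illustration of the attention step alone, the sketch below pools a sequence of per-submission hidden states (as an LSTM would emit) into one context vector via softmax-weighted averaging. The dimensions and weight values are hypothetical, and this is an illustration of the mechanism, not the paper's trained model:

```python
import numpy as np

# Sketch of attention pooling over per-submission hidden states h_1..h_T.
# Weights are random/hypothetical; illustrates the mechanism only.
rng = np.random.default_rng(0)

def attention_pool(H, w):
    """H: (T, d) hidden states; w: (d,) scoring vector -> (d,) context vector."""
    scores = H @ w                       # one relevance score per time step
    a = np.exp(scores - scores.max())    # numerically stable softmax
    a /= a.sum()
    return a @ H                         # attention-weighted average of states

H = rng.normal(size=(6, 8))   # 6 submissions, 8-dimensional hidden states
w = rng.normal(size=8)
context = attention_pool(H, w)
print(context.shape)  # (8,)
```

The resulting fixed-size context vector is what a final dense sigmoid layer would consume to emit a dropout probability.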
To the best of our knowledge, this research is the first work that characterizes students' engagement moods and sets the dropout prediction baseline for QP platforms. Rendering and exposing student engagement moods is the sweet spot of our work, and the implications can provide practical insights for online learning professionals to better manage students' behaviors and tailor their services accordingly.
One of the difficulties of studying human behavior in HCI and CSCW studies is that people change their behaviors dynamically over time [1, 33, 63, 100]. Therefore, studying the dynamics between student engagement moods is an essential part of understanding students' behavior.
Table 4. The transition matrix of moving between different engagement moods. Rows are the starting moods and columns the ending moods (Challenge-seeker, Subject-seeker, Interest-seeker, Joy-seeker, Non-seeker); the values in cells show the probabilities of transitions.
Based on the observations of students' engagement mood changes in our dataset, we have calculated the transition probabilities between different engagement moods, as shown in Table 4. These probabilities give a frequentist perspective of engagement mood transitions over a restricted time period (i.e., 172 days) [101]. As shown, the engagement moods are well-distinguished with respect to their probabilistic distributions. The general dynamics from Table 4 show that, holistically, students have the highest probability of getting into the interest-seeker mood and the lowest probability of getting into the challenge-seeker mood. Interestingly, the students in the challenge-seeker mood do not become joy-seekers or non-seekers. We also find that joy-seeker students do not become challenge-seekers or subject-seekers; they look for easy questions, and the subjects seem not to interest them. Furthermore, we observe that challenge-seeker students have the highest probability of becoming interest-seekers. On the other hand, interest-seeker students do not become challenge-seekers at any time. However, this observation may be the result of inappropriate question assignments to the students. According to our findings, subject-seeker students are also not terribly motivated to change their moods, which sounds reasonable given the directed and context-specific nature of the questions they choose to answer. Interest-seekers are more likely to become non-seekers and vice versa. Since finding the programming topics of interest is often the main intention behind most student explorations in QPs, a fit question recommender system can be advantageous for students' satisfaction and quality of experience [72, 120]. Finally, among all the resolved engagement moods, students in the joy-seeker mood have the least tendency to change their moods.
This implies that the platform's resources are largely at risk of being wasted unless we find and guide the joy-seeker students in time [118].
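A transition matrix like Table 4 can be estimated by counting consecutive mood pairs in each student's trajectory and normalizing every row. A sketch with hypothetical trajectories (the sequences below are illustrative, not our dataset):

```python
from collections import defaultdict

# Estimate mood-to-mood transition probabilities by frequency counting.
# The demo trajectories are hypothetical, not the paper's data.
MOODS = ["challenge", "subject", "interest", "joy", "non"]

def transition_matrix(sequences):
    counts = {m: defaultdict(int) for m in MOODS}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):   # consecutive mood pairs
            counts[prev][nxt] += 1
    matrix = {}
    for m in MOODS:
        total = sum(counts[m].values())
        # Row-normalize; rows with no observed transitions stay all-zero.
        matrix[m] = {n: (counts[m][n] / total if total else 0.0) for n in MOODS}
    return matrix

demo = [["challenge", "interest", "non", "interest"], ["challenge", "subject"]]
tm = transition_matrix(demo)
print(tm["challenge"]["interest"])  # 0.5
```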
We define a dominant engagement mood as the mood that has occurred most frequently in a student's engagement mood trajectory. As is shown in Table 5, the majority of the students (30 …).
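This definition amounts to a most-frequent-element query per trajectory; for illustration (with a hypothetical trajectory):

```python
from collections import Counter

# A student's dominant engagement mood is the most frequent mood in their
# trajectory. The trajectory below is hypothetical.
def dominant_mood(trajectory):
    return Counter(trajectory).most_common(1)[0][0]

print(dominant_mood(["interest", "challenge", "interest", "non", "interest"]))  # interest
```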
Table 5. Dropout analysis according to the dominant engagement moods of the students (Challenge-seeker, Subject-seeker, Interest-seeker, Joy-seeker, Non-seeker). AoS = Abundance of Students; AQM = Average Question Mismatch; DR = Dropout Rate.
Based on our findings, we provide some design suggestions for online learning professionals to better control student dropouts in QPs.
• Recommender systems. Current recommender systems do not consider when is the right time for a student to learn something; they only consider what students want to learn [120]. The same problem holds for many educators, who do not know when it is the right time to engage students with a specific topic. Our resolved engagement moods can serve as an assistant to address this issue. For example, if a topic is difficult and challenging, we would recommend asking students to solve it when they are in a challenge-seeker mood. If students are in a joy-seeker mood, we would suggest educators skip asking questions in that session, because it might result in adverse and unwanted educational outcomes, such as feeling frustrated or disliking the subject matter of the course [6]. Combining learning path recommenders or personal teaching (coaching) styles with engagement mood exposers would probably enhance the quality of education and deter dropouts, as students would feel more satisfied with what they learn and do [103].
• Strategic dropout management.
Before our work, researchers had not paid much attention to who is dropping out [85, 119], but we think this is an important question to answer for managing student dropouts more strategically. In contrast with the existing practice in dropout studies, we believe not all dropouts are equally bad. For example, imagine an educator who is undecided about which types of interventions she should apply to retain students from dropping out. She is torn between customizing the system according to challenge-seekers' or joy-seekers' tastes. We would recommend prioritizing challenge-seekers, as joy-seekers would most probably game the system without learning anything. Joy-seekers might also mischievously have adverse side effects on other students' experiences and make them feel disappointed [6, 27]. For more complex dropout scenarios, educators can also heed the abundance of students in each group and the transition probabilities between different engagement moods to make better decisions.
• Gamification and reinforcement tools.
According to Skinner's refined Law of Effect, reinforced behaviors tend to repeat more in the future, and behaviors that are not reinforced tend to dissipate over time [15]. Therefore, educators can use gamification incentives such as badges, points, and leaderboards to steer students' behaviors [3]. We suggest QP designers consider student engagement moods when designing their gamification mechanics [107]. For example, if educators are preparing students for difficult exams like ACM programming contests, they can reinforce challenge-seekers by providing bigger gamification prizes [126]. Likewise, negative reinforcement can be applied to diminish non-productive behaviors like being in a joy-seeker mood. Besides that, QP designers can provision special gamification mechanics or side games to satisfy students' playful intentions elsewhere.
• Putting students into groups.
At this point, collaboration and social communication are hugely missing in the context of QPs; instead, all the website functions promote only competition among students. Although competition can provide an initial motive to attract students [11], many educators believe that collaboration and social communication among students are also needed to better achieve the intended educational outcomes [51, 109]. We suggest QPs add affordances to support student communication and social collaboration as well. Meanwhile, student engagement moods can help find initial common attitudes among students and form groups. For example, if a student is a subject-seeker, there is a good chance that this student would fit successfully into a group with other subject-seeker students. This rule is known as homophily in social networks [70]. Nevertheless, more in-depth studies are required to figure out which combinations of student engagement moods match better together.
As with any study, there are several limitations and challenges in this study. First, the platform data we analyzed is cross-sectional and restricted in its size and time span, but not in type. We find this to be a commonplace issue in many related studies as well [72, 76]. We also want to emphasize the difficulty of working with student data and the scarcity of datasets about QP platforms, which are relatively new research subjects in the context of educational data mining. Regarding the model limitations, although the HMM can successfully profile student behavior in a few hidden states, it is often a time-consuming task to characterize the resolved hidden states. For example, in this research, we spent more than twelve hours carefully visualizing and comparing the probabilistic distributions behind the hidden states, considering different aspects and features, to finally announce our engagement mood typology. We should also explain that since the hidden states are found by an estimation procedure, different platforms might yield different numbers of hidden states. Generally, we expect these states to be semantically close to what we have introduced in this work. We emphasize that this should be viewed as a positive point for future work, which leaves more complex engagement moods to be mined and compared with our extracted typology. Hence, it remains future work to cross-validate our work on other QP platforms such as Timus, quera.ir, CodeChef, and the like. Moreover, it would be a particularly interesting direction to examine the effect of the HMM's state space granularity on the performance of the dropout predictions.
10 CONCLUSION
We used the powerful tool of Hidden Markov Models (HMMs) to expose the underlying student engagement moods in QP platforms and showed that a mismatch between students' engagement moods and the question types they answer over time can significantly increase the dropout risk. Furthermore, we developed a novel and more accurate computational framework called Dropout-Plus (DP) to predict student dropouts and explain the possible reasons why dropouts happen in QP platforms. However, we believe there is still a long path ahead for HCI and CSCW researchers to fully understand dropouts on different educational platforms. Our future work includes developing a more exact time prediction for student dropouts and enriching the explanations to the question of "why do dropouts happen?" Finally, this study can benefit researchers and practitioners of online
education platforms to promote their work by understanding student dropouts more profoundly, building better prediction models, and providing more customized services.
ACKNOWLEDGMENTS
This research has been supported in part by project 16214817 from the Research Grants Council of Hong Kong, and the 5GEAR and FIT projects from the Academy of Finland. It is also partially supported by the Research Grants Council of the Hong Kong Special Administrative Region, China, under General Research Fund (GRF) Grant No. 16204420.
REFERENCES [1] Najwa Alghamdi, Nora Alrajebah, and Shiroq Al-Megren. 2019. Crowd Behavior Analysis Using Snap Map: APreliminary Study on the Grand Holy Mosque in Mecca. In
Conference Companion Publication of the 2019 on Com-puter Supported Cooperative Work and Social Computing (Austin, TX, USA) (CSCW ’19) . Association for ComputingMachinery, New York, NY, USA, 137–141. https://doi.org/10.1145/3311957.3359473[2] Nese Alyuz, Eda Okur, Utku Genc, Sinem Aslan, Cagri Tanriover, and Asli Arslan Esme. 2017. An Unobtrusiveand Multimodal Approach for Behavioral Engagement Detection of Students. In
Proceedings of the 1st ACM SIGCHIInternational Workshop on Multimodal Interaction for Education (Glasgow, UK) (MIE 2017) . Association for ComputingMachinery, New York, NY, USA, 26–32. https://doi.org/10.1145/3139513.3139521[3] Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. 2013. Steering User Behavior with Badges.In
Proceedings of the 22nd International Conference on World Wide Web (Rio de Janeiro, Brazil) (WWW ’13) . Associationfor Computing Machinery, New York, NY, USA, 95–106. https://doi.org/10.1145/2488388.2488398[4] Sinem Aslan, Nese Alyuz, Cagri Tanriover, Sinem E. Mete, Eda Okur, Sidney K. D’Mello, and Asli Arslan Esme.2019. Investigating the Impact of a Real-Time, Multimodal Student Engagement Analytics Technology in AuthenticClassrooms. In
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19) . Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300534[5] R. Babbie. 2016.
The Basics of Social Research . Cengage Learning. https://books.google.com/books?id=0CJTCwAAQBAJ[6] Ryan Baker, Jason Walonoski, Neil Heffernan, Ido Roll, Albert Corbett, and Kenneth Koedinger. 2008. Why studentsengage in “gaming the system” behavior in interactive learning environments.
Journal of Interactive Learning Research
19, 2 (2008), 185–224.[7] Ryan Shaun Baker. 2005.
Designing intelligent tutors that adapt to when students game the system . Ph.D. Dissertation.Carnegie Mellon University Pittsburgh.[8] Ryan S.J.d. Baker. 2007. Modeling and Understanding Students’ off-Task Behavior in Intelligent Tutoring Systems. In
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’07) .Association for Computing Machinery, New York, NY, USA, 1059–1068. https://doi.org/10.1145/1240624.1240785[9] Ryan Shaun Baker, Albert T. Corbett, Kenneth R. Koedinger, and Angela Z. Wagner. 2004. Off-Task Behavior in theCognitive Tutor Classroom: When Students "Game the System". In
Proceedings of the SIGCHI Conference on HumanFactors in Computing Systems (Vienna, Austria) (CHI ’04) . Association for Computing Machinery, New York, NY, USA,383–390. https://doi.org/10.1145/985692.985741[10] E.F. Barkley. 2009.
Student Engagement Techniques: A Handbook for College Faculty . Wiley. https://books.google.com/books?id=muAStyrwyZgC[11] Alexander Bartel, Paula Figas, and Georg Hagel. 2015. Towards a Competency-Based Education with GamificationDesign Elements. In
Proceedings of the 2015 Annual Symposium on Computer-Human Interaction in Play (London,United Kingdom) (CHI PLAY ’15) . Association for Computing Machinery, New York, NY, USA, 457–462. https://doi.org/10.1145/2793107.2810325[12] Jonathan Bassen, Bharathan Balaji, Michael Schaarschmidt, Candace Thille, Jay Painter, Dawn Zimmaro, Alex Games,Ethan Fast, and John C. Mitchell. 2020. Reinforcement Learning for the Adaptive Scheduling of Educational Activities.In
Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20) .Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3313831.3376518[13] Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss. 1970. A Maximization Technique Occurring in theStatistical Analysis of Probabilistic Functions of Markov Chains.
The Annals of Mathematical Statistics
Proceedings of theHalfway to the Future Symposium 2019 (Nottingham, United Kingdom) (HTTF 2019) . Association for ComputingMachinery, New York, NY, USA, Article 8, 11 pages. https://doi.org/10.1145/3363384.3363392J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2021. haracterizing Student Engagement Moods for Dropout Prediction in Question Pool Websites 111:17 [15] Muhammad Ilyas Bhutto. 2011. Effects of Social Reinforcers on Students’Learning Outcomes at Secondary SchoolLevel.
International Journal of Academic Research in Business and Social Sciences
1, 2 (2011), 71.[16] Christopher M Bishop. 2006. Machine learning and pattern recognition.
Information science and statistics. Springer,Heidelberg (2006).[17] Nigel Bosch. 2016. Detecting Student Engagement: Human Versus Machine. In
Proceedings of the 2016 Conference onUser Modeling Adaptation and Personalization (Halifax, Nova Scotia, Canada) (UMAP ’16) . Association for ComputingMachinery, New York, NY, USA, 317–320. https://doi.org/10.1145/2930238.2930371[18] Leo Breiman. 2001. Random forests.
Machine Learning
45, 1 (2001), 5–32. https://doi.org/10.1023/a:1010933404324[19] Elise Cappella, Ha Yeon Kim, Jennifer W. Neal, and Daisy R. Jackson. 2013. Classroom Peer Relationships andBehavioral Engagement in Elementary School: The Role of Social Network Equity.
American Journal of CommunityPsychology
52, 3-4 (Oct. 2013), 367–379. https://doi.org/10.1007/s10464-013-9603-5[20] Jonathan Carlton, Andy Brown, Caroline Jay, and John Keane. 2019. Inferring User Engagement from Interaction Data.In
Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHIEA ’19) . Association for Computing Machinery, New York, NY, USA, 1–6. https://doi.org/10.1145/3290607.3313009[21] Xunru Che, Danaë Metaxa-Kakavouli, and Jeffrey T. Hancock. 2018. Fake News in the News: An Analysis ofPartisan Coverage of the Fake News Phenomenon. In
Companion of the 2018 ACM Conference on Computer SupportedCooperative Work and Social Computing (Jersey City, NJ, USA) (CSCW ’18) . Association for Computing Machinery,New York, NY, USA, 289–292. https://doi.org/10.1145/3272973.3274079[22] Code Chef. 2019.
Code Chef
Proc. DARPA broadcast news transcription and understanding workshop ,Vol. 8. Virginia, USA, 127–132.[24] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In
Proceedings of the 22nd ACMSIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD’16) . Association for Computing Machinery, New York, NY, USA, 785–794. https://doi.org/10.1145/2939672.2939785[25] Yuanzhe Chen, Qing Chen, Mingqian Zhao, Sebastien Boyer, Kalyan Veeramachaneni, and Huamin Qu. 2016. Dropout-Seer: Visualizing learning patterns in Massive Open Online Courses for dropout reasoning and prediction. In . IEEE, 111–120.[26] Yinghan Chen, Steven Andrew Culpepper, Shiyu Wang, and Jeffrey Douglas. 2017. A Hidden Markov Model forLearning Trajectories in Cognitive Diagnosis With Application to Spatial Rotation Skills.
Applied PsychologicalMeasurement
42, 1 (Sept. 2017), 5–23. https://doi.org/10.1177/0146621617721250[27] Justin Cheng, Michael Bernstein, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2017. Anyone Can Become aTroll: Causes of Trolling Behavior in Online Discussions. In
Proceedings of the 2017 ACM Conference on ComputerSupported Cooperative Work and Social Computing (Portland, Oregon, USA) (CSCW ’17) . Association for ComputingMachinery, New York, NY, USA, 1217–1230. https://doi.org/10.1145/2998181.2998213[28] Chia-Fang Chung, Nanna Gorm, Irina A. Shklovski, and Sean Munson. 2017. Finding the Right Fit: UnderstandingHealth Tracking in Workplace Wellness Programs. In
Proceedings of the 2017 CHI Conference on Human Factors inComputing Systems (Denver, Colorado, USA) (CHI ’17) . Association for Computing Machinery, New York, NY, USA,4875–4886. https://doi.org/10.1145/3025453.3025510[29] Evandro B. Costa, Baldoino Fonseca, Marcelo Almeida Santana, Fabrísia Ferreira de Araújo, and Joilson Rego. 2017.Evaluating the effectiveness of educational data mining techniques for early prediction of students’ academic failurein introductory programming courses.
Computers in Human Behavior
73 (2017), 247 – 256. https://doi.org/10.1016/j.chb.2017.01.047[30] M Csikszentmihalyi. 1975. Beyond boredom and anxiety. San Francisco: JosseyBass.
Well-being: Thefoundations ofhedonic psychology (1975), 134–154.[31] Ryan S. J. d. Baker, Albert T. Corbett, Ido Roll, and Kenneth R. Koedinger. 2008. Developing a generalizable detectorof when students game the system.
User Modeling and User-Adapted Interaction
18, 3 (Jan. 2008), 287–314. https://doi.org/10.1007/s11257-007-9045-6[32] Matt Dixon, Nalin Asanka Gamagedara Arachchilage, and James Nicholson. 2019. Engaging Users with EducationalGames: The Case of Phishing. In
Extended Abstracts of the 2019 CHI Conference on Human Factors in ComputingSystems (Glasgow, Scotland Uk) (CHI EA ’19) . Association for Computing Machinery, New York, NY, USA, 1–6.https://doi.org/10.1145/3290607.3313026[33] Edson B. dos Santos Junior, Carlos Simões, Ana Cristina Bicharra Garcia, and Adriana S. Vivacqua. 2018. WhatDoes a Crowd Routing Behavior Change Reveal?. In
Companion of the 2018 ACM Conference on Computer SupportedCooperative Work and Social Computing (Jersey City, NJ, USA) (CSCW ’18) . Association for Computing Machinery,New York, NY, USA, 297–300. https://doi.org/10.1145/3272973.3274081J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2021. [34] Gideon Dror, Dan Pelleg, Oleg Rokhlenko, and Idan Szpektor. 2012. Churn Prediction in New Users of Yahoo! Answers.In
Proceedings of the 21st International Conference on World Wide Web (Lyon, France) (WWW ’12 Companion) . ACM,New York, NY, USA, 829–834. https://doi.org/10.1145/2187980.2188207[35] Louis Faucon, Lukasz Kidzinski, and Pierre Dillenbourg. 2016. Semi-Markov Model for Simulating MOOC Students.
International Educational Data Mining Society (2016).[36] Sarah Foley, Nadia Pantidi, and John McCarthy. 2020. Student Engagement in Sensitive Design Contexts: A CaseStudy in Dementia Care. In
Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu,HI, USA) (CHI ’20) . Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376161[37] G David Forney Jr. 2005. The viterbi algorithm: A personal history. arXiv preprint cs/0504020 (2005). http://arxiv.org/abs/cs/0504020[38] Pascal E. Fortin, Elisabeth Sulmont, and Jeremy Cooperstock. 2019. Detecting Perception of Smartphone NotificationsUsing Skin Conductance Responses. In
Proceedings of the 2019 CHI Conference on Human Factors in ComputingSystems (Glasgow, Scotland Uk) (CHI ’19) . Association for Computing Machinery, New York, NY, USA, 1–9. https://doi.org/10.1145/3290605.3300420[39] Jennifer A Fredricks, Phyllis C Blumenfeld, and Alison H Paris. 2004. School Engagement: Potential of the Con-cept, State of the Evidence.
Review of Educational Research
74, 1 (March 2004), 59–109. https://doi.org/10.3102/00346543074001059[40] Adabriand Furtado, Nazareno Andrade, Nigini Oliveira, and Francisco Brasileiro. 2013. Contributor Profiles, TheirDynamics, and Their Importance in Five Q&a Sites. In
Proceedings of the 2013 Conference on Computer SupportedCooperative Work (San Antonio, Texas, USA) (CSCW ’13) . Association for Computing Machinery, New York, NY, USA,1237–1252. https://doi.org/10.1145/2441776.2441916[41] W.L. Gardiner. 1974.
Psychology: a story of a search . Brooks/Cole Pub. Co. https://books.google.fr/books?id=q50bAQAAMAAJ[42] Nate Gruver, Ali Malik, Brahm Capoor, Chris Piech, Mitchell L Stevens, and Andreas Paepcke. 2019. Using LatentVariable Models to Observe Academic Pathways. In
Proceedings of The 12th International Conference on EducationalData Mining . EDM, 294–299. http://educationaldatamining.org/edm2019/proceedings/[43] Juho Hamari, David J. Shernoff, Elizabeth Rowe, Brianno Coller, Jodi Asbell-Clarke, and Teon Edwards. 2016. Chal-lenging games help students learn: An empirical study on engagement, flow and immersion in game-based learning.
Computers in Human Behavior
54 (2016), 170 – 179. https://doi.org/10.1016/j.chb.2015.07.045[44] Siti Suhaila Abdul Hamid, Novia Admodisastro, Noridayu Manshor, Azrina Kamaruddin, and Abdul Azim Abd Ghani.2018. Dyslexia adaptive learning model: student engagement prediction using machine learning approach. In
International Conference on Soft Computing and Data Mining . Springer, 372–384.[45] Mitchell M Handelsman, William L Briggs, Nora Sullivan, and Annette Towler. 2005. A measure of college studentcourse engagement.
The Journal of Educational Research
98, 3 (2005), 184–192.[46] Kotaro Hara, Abigail Adams, Kristy Milland, Saiph Savage, Chris Callison-Burch, and Jeffrey P. Bigham. 2018. AData-Driven Analysis of Workers’ Earnings on Amazon Mechanical Turk. In
Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI ’18). ACM, New York, NY, USA, Article 449, 14 pages. https://doi.org/10.1145/3173574.3174023
[47] Xiaofei He, Xinbo Gao, Yanning Zhang, Zhi-Hua Zhou, Zhi-Yong Liu, Baochuan Fu, Fuyuan Hu, and Zhancheng Zhang. 2015. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques: 5th International Conference, IScIDE 2015, Suzhou, China, June 14-16, 2015, Revised Selected Papers. Vol. 9243. Springer.
[48] Matthew Hertz. 2010. What Do "CS1" and "CS2" Mean? Investigating Differences in the Early Courses. In Proceedings of the 41st ACM Technical Symposium on Computer Science Education (Milwaukee, Wisconsin, USA) (SIGCSE ’10). Association for Computing Machinery, New York, NY, USA, 199–203. https://doi.org/10.1145/1734263.1734335
[49] Hayati Hind, Mohammed Khalidi Idrissi, and Samir Bennani. 2017. Applying Text Mining to Predict Learners’ Cognitive Engagement. In Proceedings of the Mediterranean Symposium on Smart City Application (Tangier, Morocco) (SCAMS ’17). Association for Computing Machinery, New York, NY, USA, Article 2, 6 pages. https://doi.org/10.1145/3175628.3175655
[50] Beryl Hoffman, Ralph Morelli, and Jennifer Rosato. 2019. Student Engagement is Key to Broadening Participation in CS. In Proceedings of the 50th ACM Technical Symposium on Computer Science Education (Minneapolis, MN, USA) (SIGCSE ’19). ACM, New York, NY, USA, 1123–1129. https://doi.org/10.1145/3287324.3287438
[51] Jeroen Janssen and Paul A. Kirschner. 2020. Applying collaborative cognitive load theory to computer-supported collaborative learning: towards a research agenda. Educational Technology Research and Development (2020), 1–23.
[52] Kenneth Joseph, Wei Wei, and Kathleen M. Carley. 2017. Girls Rule, Boys Drool: Extracting Semantic and Affective Stereotypes from Twitter. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (Portland, Oregon, USA) (CSCW ’17). Association for Computing Machinery, New York, NY, USA, 1362–1374. https://doi.org/10.1145/2998181.2998187
J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2021.
[53] Libor Juhaňák, Jiří Zounek, and Lucie Rohlíková. 2019. Using process mining to analyze students’ quiz-taking behavior patterns in a learning management system.
Computers in Human Behavior 92 (2019), 496–506. https://doi.org/10.1016/j.chb.2017.12.015
[54] Jutge. 2019. jutge. Retrieved January 24, 2019 from https://jutge.org/
[55] Saskia M. Kelders and Hanneke Kip. 2019. Development and Initial Validation of a Scale to Measure Engagement with EHealth Technologies. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland UK) (CHI EA ’19). Association for Computing Machinery, New York, NY, USA, 1–6. https://doi.org/10.1145/3290607.3312917
[56] Jina Kim, Kunwoo Bae, Eunil Park, and Angel P. del Pobil. 2019. Who Will Subscribe to My Streaming Channel? The Case of Twitch. In Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing (Austin, TX, USA) (CSCW ’19). Association for Computing Machinery, New York, NY, USA, 247–251. https://doi.org/10.1145/3311957.3359470
[57] Juho Kim, Philip J. Guo, Daniel T. Seaton, Piotr Mitros, Krzysztof Z. Gajos, and Robert C. Miller. 2014. Understanding In-Video Dropouts and Interaction Peaks in Online Lecture Videos. In Proceedings of the First ACM Conference on Learning @ Scale Conference (Atlanta, Georgia, USA) (L@S ’14). Association for Computing Machinery, New York, NY, USA, 31–40. https://doi.org/10.1145/2556325.2566237
[58] Marios Kokkodis. 2019. Reputation Deflation Through Dynamic Expertise Assessment in Online Labor Markets. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). ACM, New York, NY, USA, 896–905. https://doi.org/10.1145/3308558.3313479
[59] George D. Kuh. 2001. Assessing what really matters to student learning inside the national survey of student engagement. Change: The Magazine of Higher Learning 33, 3 (2001), 10–17.
[60] Il-Youp Kwak, Jun Ho Huh, Seung Taek Han, Iljoo Kim, and Jiwon Yoon. 2019. Voice Presentation Attack Detection through Text-Converted Voice Command Analysis. In
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland UK) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300828
[61] Young D. Kwon, Dimitris Chatzopoulos, Ehsan ul Haq, Raymond Chi-Wing Wong, and Pan Hui. 2019. GeoLifecycle: User Engagement of Geographical Exploration and Churn Prediction in LBSNs. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 3, 3, Article 92 (Sept. 2019), 29 pages. https://doi.org/10.1145/3351250
[62] D. Langley. 2006. The student engagement index: A proposed student rating system based on the national benchmarks of effective educational practice. University of Minnesota: Center for Teaching and Learning Services (2006).
[63] Hyunsoo Lee, Uichin Lee, and Hwajung Hong. 2019. Commitment Devices in Online Behavior Change Support Systems. In Proceedings of Asian CHI Symposium 2019: Emerging HCI Research Collection (Glasgow, Scotland, United Kingdom) (AsianHCI ’19). Association for Computing Machinery, New York, NY, USA, 105–113. https://doi.org/10.1145/3309700.3338446
[64] LeetCode. 2019. LeetCode. Retrieved January 24, 2019 from https://leetcode.com/
[65] Pascal Lessel, Maximilian Altmeyer, Lea Verena Schmeer, and Antonio Krüger. 2019. "Enable or Disable Gamification?": Analyzing the Impact of Choice in a Gamified Image Tagging Task. In
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland UK) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300380
[66] Zipeng Liu, Zhicheng Liu, and Tamara Munzner. 2020. Data-Driven Multi-Level Segmentation of Image Editing Logs. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3313831.3376152
[67] Suman Kalyan Maity, Aishik Chakraborty, Pawan Goyal, and Animesh Mukherjee. 2017. Detection of Sockpuppets in Social Media. In Companion of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (Portland, Oregon, USA) (CSCW ’17). Association for Computing Machinery, New York, NY, USA, 243–246. https://doi.org/10.1145/3022198.3026360
[68] Jorge Maldonado-Mahauad, Mar Pérez-Sanagustín, René F. Kizilcec, Nicolás Morales, and Jorge Munoz-Gama. 2018. Mining theory-based patterns from Big data: Identifying self-regulated learning strategies in Massive Open Online Courses. Computers in Human Behavior 80 (2018), 179–196. https://doi.org/10.1016/j.chb.2017.11.011
[69] Hiroaki Masaki, Kengo Shibata, Shui Hoshino, Takahiro Ishihama, Nagayuki Saito, and Koji Yatani. 2020. Exploring Nudge Designs to Help Adolescent SNS Users Avoid Privacy and Safety Threats. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–11. https://doi.org/10.1145/3313831.3376666
[70] Miller McPherson, Lynn Smith-Lovin, and James M. Cook. 2001. Birds of a feather: Homophily in social networks. Annual review of sociology 27, 1 (2001), 415–444.
[71] Pardis Miri, Emily Jusuf, Andero Uusberg, Horia Margarit, Robert Flory, Katherine Isbister, Keith Marzullo, and James J. Gross. 2020. Evaluating a Personalizable, Inconspicuous Vibrotactile (PIV) Breathing Pacer for In-the-Moment Affect Regulation. In
Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3313831.3376757
[72] Reza Hadi Mogavi, Sujit Gujar, Xiaojuan Ma, and Pan Hui. 2019. HRCR: Hidden Markov-Based Reinforcement to Reduce Churn in Question Answering Forums. In PRICAI 2019: Trends in Artificial Intelligence. Springer International Publishing, 364–376. https://doi.org/10.1007/978-3-030-29908-8_29
[73] Michael Morgan, Matthew Butler, Neena Thota, and Jane Sinclair. 2018. How CS Academics View Student Engagement. In Proceedings of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education (Larnaca, Cyprus) (ITiCSE 2018). ACM, New York, NY, USA, 284–289. https://doi.org/10.1145/3197091.3197092
[74] Abdallah Moubayed, Mohammadnoor Injadat, Abdallah Shami, and Hanan Lutfiyya. 2020. Student Engagement Level in an e-Learning Environment: Clustering Using K-means. American Journal of Distance Education 34, 2 (March 2020), 137–156. https://doi.org/10.1080/08923647.2020.1696140
[75] Benjamin Murauer and Günther Specht. 2018. Detecting Music Genre Using Extreme Gradient Boosting. In Companion Proceedings of the The Web Conference 2018 (Lyon, France) (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1923–1927. https://doi.org/10.1145/3184558.3191822
[76] Saurabh Nagrecha, John Z. Dillon, and Nitesh V. Chawla. 2017. MOOC Dropout Prediction: Lessons Learned from Making Pipelines Interpretable. In Proceedings of the 26th International Conference on World Wide Web Companion (Perth, Australia) (WWW ’17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 351–359. https://doi.org/10.1145/3041021.3054162
[77] Junpei Naito, Yukino Baba, Hisashi Kashima, Takenori Takaki, and Takuya Funo. 2018. Predictive modeling of learning continuation in preschool education using temporal patterns of development tests. In
Thirty-Second AAAI Conference on Artificial Intelligence.
[78] Tuan Dinh Nguyen, Marisa Cannata, and Jason Miller. 2016. Understanding student behavioral engagement: Importance of student interaction with peers and teachers. The Journal of Educational Research
[79] Hidden Markov Models. Springer New York, 103–113. https://doi.org/10.1007/978-1-4939-6753-7_7
[80] Judith A. Ouimet and Robert A. Smallwood. 2005. Assessment Measures: CLASSE–The Class-Level Survey of Student Engagement. Assessment Update 17, 6 (2005), 13–15.
[81] Steven Pace. 2004. A grounded theory of the flow experiences of Web users. International Journal of Human-Computer Studies 60, 3 (2004), 327–363. https://doi.org/10.1016/j.ijhcs.2003.08.005
[82] Zilong Pan, Chenglu Li, and Min Liu. 2020. Learning Analytics Dashboard for Problem-Based Learning. In Proceedings of the Seventh ACM Conference on Learning @ Scale (Virtual Event, USA) (L@S ’20). Association for Computing Machinery, New York, NY, USA, 393–396. https://doi.org/10.1145/3386527.3406751
[83] Sira Park, Susan D. Holloway, Amanda Arendtsz, Janine Bempechat, and Jin Li. 2011. What Makes Students Engaged in Learning? A Time-Use Study of Within- and Between-Individual Predictors of Emotional Engagement in Low-Performing High Schools. Journal of Youth and Adolescence 41, 3 (Dec. 2011), 390–401. https://doi.org/10.1007/s10964-011-9738-3
[84] Sophie Parsons, Peter M. Atkinson, Elena Simperl, and Mark Weal. 2015. Thematically Analysing Social Network Content During Disasters Through the Lens of the Disaster Management Lifecycle. In
Proceedings of the 24th International Conference on World Wide Web (Florence, Italy) (WWW ’15 Companion). ACM, New York, NY, USA, 1221–1226. https://doi.org/10.1145/2740908.2741721
[85] Filipe D. Pereira, Elaine Oliveira, Alexandra Cristea, David Fernandes, Luciano Silva, Gene Aguiar, Ahmed Alamri, and Mohammad Alshehri. 2019. Early Dropout Prediction for Programming Courses Supported by Online Judges. In Lecture Notes in Computer Science. Springer International Publishing, 67–72. https://doi.org/10.1007/978-3-030-23207-8_13
[86] Jordi Petit, Omer Giménez, and Salvador Roura. 2012. Jutge.Org: An Educational Programming Judge. In Proceedings of the 43rd ACM Technical Symposium on Computer Science Education (Raleigh, North Carolina, USA) (SIGCSE ’12). ACM, New York, NY, USA, 445–450. https://doi.org/10.1145/2157136.2157267
[87] Math Playground. 2019. Math Playground.
[88] Proceedings of the 23rd International Conference on World Wide Web (Seoul, Korea) (WWW ’14 Companion). ACM, New York, NY, USA, 469–474. https://doi.org/10.1145/2567948.2576965
[89] Jiezhong Qiu, Jie Tang, Tracy Xiao Liu, Jie Gong, Chenhui Zhang, Qian Zhang, and Yufei Xue. 2016. Modeling and Predicting Learning Behavior in MOOCs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (San Francisco, California, USA) (WSDM ’16). Association for Computing Machinery, New York, NY, USA, 93–102. https://doi.org/10.1145/2835776.2835842
[90] Rezvaneh Rezapour and Jana Diesner. 2017. Classification and Detection of Micro-Level Impact of Issue-Focused Documentary Films Based on Reviews. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (Portland, Oregon, USA) (CSCW ’17). Association for Computing Machinery, New York, NY, USA, 1419–1431. https://doi.org/10.1145/2998181.2998201
[91] Juan D. Rodriguez, Aritz Perez, and Jose A. Lozano. 2009. Sensitivity analysis of k-fold cross validation in prediction error estimation.
IEEE transactions on pattern analysis and machine intelligence 32, 3 (2009), 569–575.
[92] Cristóbal Romero and Sebastián Ventura. 2016. Educational data science in massive open online courses. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 7, 1 (Sept. 2016), e1187. https://doi.org/10.1002/widm.1187
[93] Matthew Rowe. 2013. Mining user lifecycles from online community platforms and their application to churn prediction. In . IEEE, 637–646.
[94] Victor B. Saenz, Deryl Hatch, Beth E. Bukoski, Suyun Kim, Kye hyoung Lee, and Patrick Valdez. 2011. Community College Student Engagement Patterns. Community College Review 39, 3 (July 2011), 235–267. https://doi.org/10.1177/0091552111416643
[95] Koya Sato, Mizuki Oka, and Kazuhiko Kato. 2019. Early Churn User Classification in Social Networking Service Using Attention-Based Long Short-Term Memory. In Lecture Notes in Computer Science. Springer International Publishing, 45–56. https://doi.org/10.1007/978-3-030-26142-9_5
[96] Junfeng Shang and Joseph E. Cavanaugh. 2008. Bootstrap variants of the Akaike information criterion for mixed model selection. Computational Statistics & Data Analysis 52, 4 (2008), 2004–2021. https://doi.org/10.1016/j.csda.2007.06.019
[97] David J. Shernoff and Mihaly Csikszentmihalyi. 2009. Cultivating engaged learners and optimal learning environments. Handbook of positive psychology in schools (2009), 131–145.
[98] Hongzhi Shi, Chao Zhang, Quanming Yao, Yong Li, Funing Sun, and Depeng Jin. 2019. State-Sharing Sparse Hidden Markov Models for Personalized Sequences. In
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD ’19). Association for Computing Machinery, New York, NY, USA, 1549–1559. https://doi.org/10.1145/3292500.3330828
[99] Hyunjin Shin, Bugeun Kim, and Gahgene Gweon. 2020. Guessing or Solving? Exploring the Use of Motion Features from Educational Game Logs. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI EA ’20). Association for Computing Machinery, New York, NY, USA, 1–8. https://doi.org/10.1145/3334480.3383005
[100] Manya Sleeper, Alessandro Acquisti, Lorrie Faith Cranor, Patrick Gage Kelley, Sean A. Munson, and Norman Sadeh. 2015. I Would Like To..., I Shouldn’t..., I Wish I...: Exploring Behavior-Change Goals for Social Networking Sites. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (Vancouver, BC, Canada) (CSCW ’15). Association for Computing Machinery, New York, NY, USA, 1058–1069. https://doi.org/10.1145/2675133.2675193
[101] A. Spanos. 2019. Probability Theory and Statistical Inference: Empirical Modeling with Observational Data. Cambridge University Press. https://books.google.com/books?id=nm_IDwAAQBAJ
[102] Ivan Srba and Maria Bielikova. 2016. A Comprehensive Survey and Classification of Approaches for Community Question Answering. ACM Trans. Web 10, 3, Article 18 (Aug. 2016), 63 pages. https://doi.org/10.1145/2934687
[103] Cor J. M. Suhre, Ellen P. W. A. Jansen, and Egbert G. Harskamp. 2007. Impact of degree program satisfaction on the persistence of college students. Higher Education 54, 2 (2007), 207–226.
[104] Stephanie D. Teasley. 2017. Student Facing Dashboards: One Size Fits All?
Technology, Knowledge and Learning 22, 3 (01 Oct 2017), 377–384. https://doi.org/10.1007/s10758-017-9314-3
[105] Jacob Thebault-Spieker, Anbang Xu, Jilin Chen, Jalal Mahmud, and Jeffrey Nichols. 2016. Exploring Engagement in a ’Social Crowd’ on Twitter. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion (San Francisco, California, USA) (CSCW ’16). Association for Computing Machinery, New York, NY, USA, 417–420. https://doi.org/10.1145/2818052.2869112
[106] Timus. 2019. Timus Online Judge. Retrieved January 24, 2019 from https://acm.timus.ru/
[107] Gustavo F. Tondello, Dennis L. Kappen, Marim Ganaba, and Lennart E. Nacke. 2019. Gameful Design Heuristics: A Gamification Inspection Tool. In Human-Computer Interaction. Perspectives on Design. Springer International Publishing, 224–244. https://doi.org/10.1007/978-3-030-22646-6_16
[108] Gustavo F. Tondello, Rina R. Wehbe, Lisa Diamond, Marc Busch, Andrzej Marczewski, and Lennart E. Nacke. 2016. The Gamification User Types Hexad Scale. In Proceedings of the 2016 Annual Symposium on Computer-Human Interaction in Play (Austin, Texas, USA) (CHI PLAY ’16). Association for Computing Machinery, New York, NY, USA, 229–243. https://doi.org/10.1145/2967934.2968082
[109] Richard Tucker. 2016. Collaboration and Student Engagement in Design Education. IGI Global.
[110] T. Vafeiadis, K.I. Diamantaras, G. Sarigiannidis, and K.Ch. Chatzisavvas. 2015. A comparison of machine learning techniques for customer churn prediction. Simulation Modelling Practice and Theory 55 (2015), 1–9. https://doi.org/10.1016/j.simpat.2015.03.003
[111] Wil Van Der Aalst. 2011. Process mining: discovery, conformance and enhancement of business processes. Vol. 2. Springer.
[112] Devin Waldrop, Amy L. Reschly, Kathleen Fraysier, and James J. Appleton. 2018. Measuring the Engagement of College Students: Administration Format, Structure, and Validity of the Student Engagement Instrument–College.
Measurement and Evaluation in Counseling and Development 52, 2 (Nov. 2018), 90–107. https://doi.org/10.1080/07481756.2018.1497429
[113] Danding Wang, Qian Yang, Ashraf Abdul, and Brian Y. Lim. 2019. Designing Theory-Driven User-Centric Explainable AI. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland UK) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–15. https://doi.org/10.1145/3290605.3300831
[114] Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J. Smith, Kalyan Veeramachaneni, and Huamin Qu. 2019. ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland UK) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300911
[115] Wei Wang, Han Yu, and Chunyan Miao. 2017. Deep Model for Dropout Prediction in MOOCs. In Proceedings of the 2nd International Conference on Crowd Science and Engineering (Beijing, China) (ICCSE ’17). Association for Computing Machinery, New York, NY, USA, 26–32. https://doi.org/10.1145/3126973.3126990
[116] Vicky Ward, Allan House, and Susan Hamer. 2009. Developing a framework for transferring knowledge into action: a thematic analysis of the literature. Journal of health services research & policy 14, 3 (2009), 156–164.
[117] Szymon Wasik, Maciej Antczak, Jan Badura, Artur Laskowski, and Tomasz Sternal. 2016. Optil.Io: Cloud Based Platform For Solving Optimization Problems Using Crowdsourcing Approach. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion (San Francisco, California, USA) (CSCW ’16). Association for Computing Machinery, New York, NY, USA, 433–436. https://doi.org/10.1145/2818052.2869098
[118] Szymon Wasik, Maciej Antczak, Jan Badura, Artur Laskowski, and Tomasz Sternal. 2018. A Survey on Online Judge Systems and Their Applications. ACM Comput. Surv. 51, 1, Article 3 (Jan. 2018), 34 pages. https://doi.org/10.1145/3143560
[119] Jacob Whitehill, Kiran Mohan, Daniel Seaton, Yigal Rosen, and Dustin Tingley. 2017. MOOC Dropout Prediction: How to Measure Accuracy?. In
Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale (Cambridge, Massachusetts, USA) (L@S ’17). Association for Computing Machinery, New York, NY, USA, 161–164. https://doi.org/10.1145/3051457.3053974
[120] Meng Xia, Mingfei Sun, Huan Wei, Qing Chen, Yong Wang, Lei Shi, Huamin Qu, and Xiaojuan Ma. 2019. PeerLens: Peer-inspired Interactive Learning Path Planning in Online Question Pool. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland UK) (CHI ’19). ACM, New York, NY, USA, Article 634, 12 pages. https://doi.org/10.1145/3290605.3300864
[121] Wanli Xing and Dongping Du. 2018. Dropout Prediction in MOOCs: Using Deep Learning for Personalized Intervention. Journal of Educational Computing Research 57, 3 (March 2018), 547–570. https://doi.org/10.1177/0735633118757015
[122] Yuan Xuan, Yang Chen, Huiying Li, Pan Hui, and Lei Shi. 2016. LBSNShield: Malicious Account Detection in Location-Based Social Networks. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion (San Francisco, California, USA) (CSCW ’16). Association for Computing Machinery, New York, NY, USA, 437–440. https://doi.org/10.1145/2818052.2869094
[123] Carl Yang, Xiaolin Shi, Luo Jie, and Jiawei Han. 2018. I Know You’ll Be Back: Interpretable New User Clustering and Churn Prediction on a Mobile Social Application. In Proceedings of the 24th ACM SIGKDD (London, United Kingdom) (KDD ’18). ACM, New York, NY, USA, 914–922. https://doi.org/10.1145/3219819.3219821
[124] Yang Yang, Zongtao Liu, Chenhao Tan, Fei Wu, Yueting Zhuang, and Yafeng Li. 2018. To Stay or to Leave: Churn Prediction for Urban Migrants in the Initial Period. In Proceedings of the 2018 World Wide Web Conference (Lyon, France) (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 967–976. https://doi.org/10.1145/3178876.3186144
[125] Zamzami Zainuddin, Samuel Kai Wah Chu, Muhammad Shujahat, and Corinne Jacqueline Perera. 2020. The impact of gamification on learning and instruction: A systematic review of empirical evidence. Educational Research Review (2020), 100326.
[126] Jiawei Zhang, Xiangnan Kong, and S. Yu Philip. 2016. Social badge system analysis. In . IEEE, 453–460.
[127] Xi Zhang, Shan Jiang, and Yihang Cheng. 2017. Inferring the Student Social Loafing State in Collaborative Learning with a Hidden Markov Model: A Case on Slack. In Proceedings of the 26th International Conference on World Wide Web Companion (Perth, Australia) (WWW ’17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 149–152. https://doi.org/10.1145/3041021.3054145
[128] Sabrina Zirkel, Julie A. Garcia, and Mary C. Murphy. 2015. Experience-sampling research methods and their potential for education research.