Effects of Early Warning Emails on Student Performance
Till Massing*,a, Natalie Reckmann a, Jens Klenke a, Benjamin Otto†,b, Christoph Hanck a, and Michael Goedicke b

a Faculty of Business Administration and Economics, University of Duisburg-Essen, Universitätsstr. 12, 45117 Essen, Germany
b paluno – The Ruhr Institute for Software Technology, University of Duisburg-Essen, Gerlingstr. 16, 45127 Essen, Germany

February 18, 2021
Abstract
We use learning data from an e-assessment platform for an introductory mathematical statistics course to predict each student's probability of passing the final exam. Based on these estimated probabilities, we sent warning emails to students in the next cohort with a low predicted probability of passing. We analyze the effect of this treatment and propose statistical models to quantify the effect of the email notification. We detect a small but imprecisely estimated effect, suggesting that such interventions are effective only when administered more intensively.
1 Introduction

Modern university courses often use formative and summative e-assessment systems. Especially when courses have a large number of participants, such tools are useful for giving students individual feedback. Courses with quantitative content such as statistics and introductory mathematics are particularly suitable for e-assessment. This is because fill-in exercises – which require students to submit a numeric answer – conveniently allow assessing whether students can solve a task. The e-assessment system JACK is a framework for delivering and grading complex exercises of various kinds. It was originally created to check programming exercises in Java [37], but has been extended to other exercise types such as multiple-choice and fill-in exercises [38, 36, 34]. JACK offers parameterizable content, meaning that exercises can contain different values each time an exercise is practiced. This not only means that different students get different parameterizations but also that each student gets different numbers each time he or she tackles an exercise. Hence, the task remains challenging until he or she understands the exercise's underlying concept.

In addition to fill-in exercises, JACK allows designing exercises with dynamic programming content. For instance, JACK offers exercises in R, the standard statistical programming language, see [31].

* Corresponding author. Email: [email protected]
† For more information regarding JACK contact this author.
Programming exercises not only prepare students for modern statistical work, but have also been shown to be highly beneficial for fostering their understanding of statistics, see [30, 25]. In recent studies, JACK data was analyzed to understand students' learning behavior more deeply in an introductory mathematical statistics course. The high correlation between learning effort during the semester and the final grades is well documented, see [26] and Section 2 for more examples.

We predict probabilities of passing the final exam for each student ahead of the final exam of the module, based on exercise points gained from the JACK exercises of the course in the previous cohort. Subsequently, we send warning emails to students of the current cohort with a low probability of passing the exam to motivate them to study more intensively and thus allow a higher share of students to pass the exam. We note that these emails are administered next to several other interventions implemented to motivate students to learn. Summative assessment in JACK and quizzes during the lecture on the game-based learning platform Kahoot! give students the chance to get bonus points for the final exam.

In this paper, we aim to investigate the effectiveness of warning emails we sent to the students during the semester. It turns out that sending warning emails has a positive but insignificant effect on performance in the final exam.

The remainder of this paper is organized as follows: Section 2 provides a brief overview of related work. Section 3 introduces the statistics course analyzed here. Section 4 presents the available data and the models used. Section 5 discusses the empirical results. Section 6 concludes.

2 Related Work

The overall engagement of students is one of the main covariates of academic success.
In the case of mathematical statistics, [35] show in a meta-study that the simultaneous use of traditional classroom lectures and e-assessment has a positive effect on students' success. [26] substantiate this result by analyzing the learning activity on the e-assessment platform JACK. The study reveals that learning effort and success, measured by the total number of (correct) submissions on JACK over the course, positively affect the final grade in the exam. [30] add additional R-programming exercises to the JACK framework and show that the newly introduced exercise type helps to improve the general understanding of fundamental statistical concepts and thus ultimately yields better results in the final exam.

Due to the empirically observed positive effect of a multitude of variables on academic performance, prediction of the latter has become possible. Here, various statistical learning methods are applied to educational data in order to predict student learning outcomes. Often, this outcome is measured with a binary response of pass/fail in order to be able to provide an early warning to students. [21, 17, 28] give a comprehensive overview of popular statistical learning methods used in the literature. For an overview of how to implement an early-warning system see, e.g., [9].

The literature has identified a number of important predictors. [16] find evidence for the importance of socioeconomic and psychometric variables as well as pre-university grades, although [29] show that, especially among the socioeconomic variables, the predictive capability can vary across countries. [7] additionally identify post-admission variables like obtained credits, degree of exam participation and exam success rate in previous courses to have an influence on students' success.

¹ Kahoot! is an application where a lecturer can conduct multiple-choice quizzes, https://kahoot.it.
[24, 39, 14] analyze the learning activity on learning management systems and are able to accurately predict students' performance with appropriate variables. In a more assessment-based fashion, [18, 10, 25] use activity in e-learning frameworks as well as the results of mid-term exams to predict students' success in the final exam. [6] identify the performance in selected courses as a predictor for academic achievement at the end of the program. For a literature review on educational data mining, see [32].

Several studies show the possibility of predicting academic performance early after the start of the course. [2, 1, 23, 8, 12] investigate developing early prediction systems in different contexts of e-learning systems. They all show a high accuracy of the prediction of students' success early in the semester. [13] analyze the Moodle Learning Management System (LMS), in which they predict students' performance. They show that for the purpose of early intervention or conditional on in-between assessment grades, LMS data are of little value.

E-learning data allow using early warning systems to motivate students with a low probability of passing the course. There are several e-learning approaches which use early warning systems. Purdue University (West Lafayette), Indiana, developed a similar early intervention system called "Course Signals", see [4]. Students receive an email notification as well as signal lights (red, yellow, and green) on a traffic signal to inform them about their learning status. [5] analyze the retention and performance outcomes realized since the implementation of Course Signals. The quantitative data indicate a strong impact on students' grades and retention behavior. However, they did not measure the effect of sending warning emails only. Instructors and students have provided information via surveys and interviews, which emphasize the usage of the system.
[33] designed an intervention engine, the Intelligent Intervention System (In2S), based on learning analytics. Students see signal lights for each assessment task as an instructional intervention. The system uses elements of gamification such as a leader board, badges, and notifications as motivational intervention. Learners using In2S indicate the usefulness of the system and want to use it in the context of other courses. The evaluations of these online courses via interviews and surveys stress the possibility to detect at-risk students early in the semester.

A suitable tool to quantify the effect of early interventions on students' performance is the regression discontinuity design (RDD). In this design there are two groups of individuals, of which one group gets a specific treatment. The value of a covariate lying on either side of a fixed threshold determines the assignment to the two groups. Comparing individuals with values of the covariate just below the threshold to those just above can be used to estimate the effect of the treatment on a specific outcome.

[27] use regression discontinuity approaches to estimate the effect of delayed school enrollment on student outcomes. [3] use the regression discontinuity approach to estimate the effect of class size on test scores. [20] studied the effect of remedial education on student achievement using a regression discontinuity design. [15] examine the impact of the Reading First program, a federal educational program in the United States to ensure that all children learn to read well by the end of third grade.
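The comparison-at-the-cutoff logic can be illustrated with a minimal simulation (all numbers are invented for illustration, not the course data): units just below the threshold receive the treatment, and the jump in mean outcomes across the cutoff approximates the treatment effect.

```python
import random

random.seed(1)
CUTOFF = 0.4
EFFECT = 5.0  # true treatment effect, chosen for this simulation

# Synthetic data: units with running variable W below the cutoff are treated,
# and the treatment shifts the outcome Y by EFFECT.
data = []
for _ in range(20000):
    w = random.random()
    treated = w <= CUTOFF
    y = 20.0 + 10.0 * w + (EFFECT if treated else 0.0) + random.gauss(0, 1)
    data.append((w, y))

# Compare mean outcomes in a narrow window on each side of the cutoff;
# the jump approximates the treatment effect.
h = 0.02
left = [y for w, y in data if CUTOFF - h <= w <= CUTOFF]
right = [y for w, y in data if CUTOFF < w <= CUTOFF + h]
jump = sum(left) / len(left) - sum(right) / len(right)
```

In practice one would fit local regressions on each side of the cutoff rather than compare raw window means, which removes the small bias coming from the slope in W inside the window.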
3 The Statistics Course

This section outlines the structure of the analyzed course. The e-assessment system JACK was used for a lecture and exercise course in mathematical statistics at the University of Duisburg-Essen, Germany. It started with 802 undergraduate first-year students. It is obligatory for several business and economics programs as well as in teachers' training. Out of these 802 students, only 337 took an exam at the end of the course, while the others dropped the course in this term (see Table 1). In addition to classical fill-in and multiple-choice exercises, the course also introduces the statistics software R by offering programming exercises in the e-assessment system JACK, where the correctness of students' code is assessed.

The course consisted of a weekly 2-hour lecture, which introduced concepts, and a 2-hour exercise class, which presented explanatory exercises and problems. Both classes were held classically in front of the auditorium. Due to the large number of students, these classes are limited in addressing students' different speeds of learning and individual questions. To overcome this, to encourage self-reliant learning, and to support students who had difficulties attending classes, all homework was offered on JACK.

There were 175 different exercises on JACK, of which 43 were designed as R-programming exercises and the remainder as fill-in or multiple-choice exercises. Individual learning success is supported by offering specific automated feedback and optional hints. In case of additional questions which were neither covered by hints nor feedback, the students were able to ask questions in a Moodle help forum.

In order to further encourage students to study continuously during the semester, and not only in the weeks prior to the exams, we offered five online tests using JACK. These tests lasted 40 minutes at fixed time slots. All of the online tests contained fill-in or multiple-choice exercises as well as one R exercise.
Participation only required a device with internet access, with no compulsory attendance on campus. These summative assessments allowed students to assess their individual state of knowledge during the lecture period. It was not compulsory for students to participate in the online tests in order to take the final exam at the end of the course. Instead, we offered up to 10 bonus points to encourage participation. The bonus points were only added to the final exam points if […]

Table 1: Students, who . . . (counts)

The maximum points a student achieved in an exam (over both exams per semester) imply the final grade. The corresponding exam will be denoted as the final exam.
4 Data and Models

This section presents the data and the model used for the analysis. The raw data is collected from three different sources. First, we collected each student's homework data on JACK and computed the following variables for each student i from the raw data:

• the number of submissions over the whole course,
• the number of submissions in a given period (e.g., between the first and the second online test),
• the score, defined as

  score_it := Σ_{j=1}^{n} ζ_ijt,

  where t is a day during the semester and ζ_ijt is the number of points of the latest submission up to time t of student i in exercise j, j = 1, …, n. Put differently, the score is the sum of points of the last submissions to every exercise until day t and may be interpreted as the learning progress of student i at time t,
• a ranking of the students based on the score on the 26th of June 2019.

grade  counts
1      3
2      48
3      67
4      9
5      210
Σ      337

Table 2: Distribution of final grades

² Students obtain 6 "malus points" for each failed exam, of which they may collect at most 180 during their whole bachelor program.

To determine who should receive a treatment (warning mail) we considered two parameters. First, the JACK ranking was taken into account. From this ranking a categorical variable was created, where the bottom third was associated with a strong warning (2), the middle third with a mild warning (1), and the top third with no warning (0).

Second, the results from the first four online tests, which had been conducted until then, were used. We used a logit model to predict the probability that a student with these online test results would pass the exam. The model was trained with the data obtained from the same course given two years earlier, see [26, 25]. The predicted probability will serve as our running variable W in the RDD. These predictions were transformed to an ordinal variable: if the predicted probability of passing the exam was larger than 0.4, the student was indicated with no message (0), between 0.4 and 0.15 with a mild warning (1), and with less than 0.15 with a strong warning (2).
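A minimal sketch of this second criterion, with invented logit coefficients (the paper's actual model was fitted to the earlier cohort's online-test results, so the numbers below are hypothetical): the fitted pass probability is discretized with the 0.4 and 0.15 thresholds.

```python
import math

# Hypothetical logit coefficients; in the paper the model was trained on
# the previous cohort's data, so these values are purely illustrative.
INTERCEPT = -2.0
COEF_TEST_POINTS = 0.15

def pass_probability(test_points: float) -> float:
    """Predicted probability of passing, given points from online tests 1-4."""
    z = INTERCEPT + COEF_TEST_POINTS * test_points
    return 1.0 / (1.0 + math.exp(-z))

def warning_level(p: float) -> int:
    """Discretize the prediction: 0 = no message, 1 = mild, 2 = strong warning."""
    if p > 0.4:
        return 0
    if p >= 0.15:
        return 1
    return 2

# A student with few online-test points gets a strong warning,
# a middling student a mild one, and a strong student none.
levels = [warning_level(pass_probability(pts)) for pts in (0.0, 10.0, 30.0)]
```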
Lastly, these two indicators were combined into a single indicator and a mail was sent to the students to be treated.

         attendance
warning    1     0
   1     183   425
   0     151    40

Table 3: Distribution between warnings and attendance in at least one exam

Given our data and treatment design, which was not randomly distributed but rather based on the probability to pass the exam and the JACK ranking, we use the RDD to analyze the effectiveness of our intervention. The method allows us to compare students around the cutoff point and hence derive a possible treatment effect. Our identifying assumption is that the participants around the cutoff are similar with respect to other (important) properties – often referred to as quasi-random. Additionally, more advanced approaches allow controlling for other influences. To distinguish between the different RD designs, first consider

Y_i = β_0 + α treatment_i + β_1 W_i + u_i    (1)

and let

treatment_i = 1 if W_i ≤ c,  0 if W_i > c,

where treatment_i indicates if a student received an email, which is determined by the threshold c, in our case 0.4 of the predicted probability to pass the exam, W_i. Y_i is the sum of points of student i in his/her (latest) final exam and u_i is the error term. For the analysis, only students who attended at least one final exam were included (n = 334). This design deterministically assigns the treatment, which means that a student receives the treatment only if W_i ≤ c. The treatment effect is represented by α.

The approach above is a sharp RDD since the two groups (treatment, no treatment) are perfectly separated by the cutoff. As our treatment group is determined by two different variables – the probability to pass the exam and the JACK ranking – we use a fuzzy RDD.³

³ We also investigated alternative modeling approaches like propensity score matching. However, the results were similar and RDD seems to be most suitable given the problem at hand.
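Before turning to the fuzzy case, the sharp design in equation (1) can be sketched as a single OLS regression of Y on the treatment indicator and the running variable. The sample and all coefficients below are simulated for illustration (the paper estimates this on the n = 334 exam participants).

```python
import random

random.seed(2)
CUTOFF = 0.4
ALPHA = 3.0  # true treatment effect, chosen for this simulation

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y via Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for i in range(k):
        piv = A[i][i]
        A[i] = [v / piv for v in A[i]]
        c[i] /= piv
        for m in range(k):
            if m != i:
                f = A[m][i]
                A[m] = [v - f * w for v, w in zip(A[m], A[i])]
                c[m] -= f * c[i]
    return c

# Simulate Y_i = beta0 + ALPHA * treatment_i + beta1 * W_i + u_i
X, y = [], []
for _ in range(5000):
    w = random.random()
    t = 1.0 if w <= CUTOFF else 0.0      # deterministic (sharp) assignment
    X.append([1.0, t, w])                # intercept, treatment, running variable
    y.append(10.0 + ALPHA * t + 8.0 * w + random.gauss(0, 2))

beta0_hat, alpha_hat, beta1_hat = ols(X, y)
```

In the paper the estimation is additionally restricted to a bandwidth of 0.255 around the cutoff, so that only observations close to the threshold inform the estimate of α.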
In this case, only the probability of receiving the treatment needs to increase considerably at the cutoff, not from 0 to 1 as in the sharp design. This non-parametric approach estimates a local average treatment effect (LATE) through an instrumental variable (IV) setting [?]. Consider the following model:

Y_i = β_0 + α T_i + δ W_i + X_i' β + u_i,    (2)
T_i = γ_0 + γ_1 Z_i + γ_2 W_i + ν_i,    (3)

where (3) represents the first stage of the IV estimation, with T_i denoting whether a student received the treatment, the instrument Z_i = 1[W_i ≤ c] indicating whether a student is below or above the cutoff of 0.4 (treatment_i in the sharp RDD), W_i remaining the predicted probability to pass the exam, and ν_i the error term. The fitted values of T_i are inserted into (2), where Y_i again represents the sum of points of student i in his/her (latest) final exam, u_i the error term, X_i a covariate – here the sum of points of the online tests – and α the (possible) treatment effect.

To determine a possible treatment effect, the following model assumptions must be met: (i) the running variable needs to be continuous around the cutoff, see [?]; otherwise, participants might be able to manipulate the treatment; (ii) the instrument Z only appears in equation (3) for T and not in equation (2) for Y, and the general assumptions for IV estimation must hold [11, pp. 883–885]. For an IV estimation, the two main requirements are that the instrument covaries with the variable T and that it does not covary with the error term u_i (exogeneity). In an RDD these assumptions are met by construction, since the instrument is a nonlinear (step) transformation of the running variable [19, 22].

5 Empirical Results

Table 4 shows that the treatment (warning = 1) and control (warning = 0) groups, as expected, differ substantially.

Figure 1: McCrary sorting test
Both the performances in the online tests and the JACK score (Testate 1-4 and Score Jun25) as well as the – observable – effort (Begin 2506) are much lower in the treatment group.⁴ These numbers suggest that a warning mail could have a possible impact.

We first check that the assumption of a continuous running variable with no jump at the cutoff is met. For this, we perform the [?] sorting test, which tests continuity of the density of our running variable – the predicted probability to pass the exam – around the cutoff. In order to estimate the effect α correctly, there must not be a jump in the density at the cutoff. Otherwise, some participants could have manipulated the treatment and the results would no longer be reliable.

Figure 1 displays the McCrary sorting test. Inspection suggests no major changes around the cutoff. The test confirms this with a p-value of 0.509. Since there is no jump around the cutoff and the students were not informed beforehand about the warning email, we can be relatively confident that the students were not able to manipulate the treatment. Apart from that, the incentive to worsen one's own […] p-value of 0.976 and 0.968 (respectively without covariates). The benefit of comparing the mean outcomes on the left and right side of the cutoff, rather than using polynomial regression, is that this approach is more efficient since we need to estimate fewer parameters [22].

We use a bandwidth of 0.255, which was determined with the data-driven approach of [?]. At the bottom of Tables 5 and 6, the F-tests for the different bandwidth choices are displayed. The F-test checks whether the bandwidth is too wide, since this would lead to a biased estimate of the effect.

⁴ Note that students may also or entirely learn outside of the JACK framework. However, since the final exam was taken via JACK, students have a strong incentive to also learn on the platform to get used to the framework.
This would violate the assumption that the participants around the cutoff only differ due to receiving the treatment. Since the F-test is not rejected for LATE and half bandwidth (Half-BW), but is rejected for double bandwidth (Double-BW), we conclude that the bandwidth of 0.255 yields the most efficient estimation without bias through the bandwidth choice. Therefore, only students with a probability inside the limits of 0.4 (cutoff) ± 0.255 were included.

6 Conclusion

In this paper we analyzed whether students who perform rather poorly in a current course can be positively influenced by a warning mail. The results of our RDD do not provide any evidence that the warning mail has a significant effect on the results (or behavior) of the students. This might have several reasons. For instance, many of the participants who received a warning did not take part in any final exam (see Table 3). This likely compromises the detection of an effect. There are several possible explanations for that. On the one hand, the warning might lead students to postpone participation to a later semester. The email could give students the impression that the chances of getting a good grade are already relatively low. Therefore, students might be more likely to repeat the course a year later. On the other hand, the rate of students who did not take the exam is in our experience similar to previous cohorts. In a sense, this is also a positive outcome, as we then at least prevent students from collecting malus points, cf. footnote 2.

Another important aspect in this analysis is that students can earn extra points through the online tests and the Kahoot! game. From the perspective of the students this is probably an even bigger incentive than the warning mail.

To conclude, we were not able to detect a significant effect of the warning mails in our design. This is still noteworthy because successful motivation of weak and modest students remains challenging for instructors.
We will keep track of the warning mail design in future editions of our course.
Acknowledgments
We thank all colleagues who contributed to the course "Induktive Statistik" in the summer term 2019. Part of the work on this project was funded by the German Federal Ministry of Education and Research under grant numbers 01PL11075 and 01 JA 1610.

variable      warning  count  min     Q0.25    median   mean     Q0.75    max       sd
exam points   0        151    3.40    17.50    25.30    23.80    30.00    47.00     8.20
              1        183    0.00    8.55     15.60    16.10    23.10    39.20     9.52
Testate 1-4   0        191    341.00  579.00   755.00   761.00   900.00   1425.00   221.00
              1        607    0.00    0.00     0.00     98.50    167.00   700.00    133.00
Score Jun25   0        191    0.00    2216.00  3520.00  3799.00  5174.00  11580.00  2206.00
              1        425    0.00    200.00   983.00   1347.00  2033.00  8385.00   1363.00
Begin 2506    0        191    0.00    74.50    125.00   128.00   204.00   677.00    125.00
              1        425    0.00    5.00     25.00    47.80    60.00    499.00    67.40

Table 4: Overview of empirical quartiles, mean and standard deviation for the response variable and considered covariates

            bandwidth  Observations  Estimate  Std. Error  z-value  p-value
LATE        0.255      126           0.146     4.852       0.030    0.976
Half-BW     0.127      54            -3.151    10.759      -0.293   0.770
Double-BW   0.507      306           5.295     3.436       1.541    0.123

F-statistics  F      Num. DoF  Denom. DoF  p-value
LATE          0.257  4         121         0.905
Half-BW       0.076  4         49          0.989
Double-BW     8.902  4         301         8.313e-07

Table 5: Summary of the regression discontinuity model with covariates. At the top: the estimate of the treatment effect of the warning. At the bottom: an F-test for the different bandwidths.

            bandwidth  Observations  Estimate  Std. Error  z-value  p-value
LATE        0.255      126           0.193     4.889       0.040    0.968
Half-BW     0.127      54            -2.867    10.075      -0.285   0.776
Double-BW   0.510      306           6.662     3.274       2.035    0.041

F-statistics  F       Num. DoF  Denom. DoF  p-value
LATE          0.324   3         122         0.808
Half-BW       0.088   3         50          0.966
Double-BW     11.013  3         302         6.977e-07

Table 6: Summary of the regression discontinuity model without covariates. At the top: the estimate of the treatment effect of the warning. At the bottom: an F-test for the different bandwidths.

References

[1] G. Akçapınar, A. Altun, and P. Aşkar. Using learning analytics to develop early-warning system for at-risk students. International Journal of Educational Technology in Higher Education, 16(1):40, 2019.

[2] G. Akçapınar, M. N. Hasnine, R. Majumdar, B. Flanagan, and H. Ogata. Developing an early-warning system for spotting at-risk students by using ebook interaction logs. Smart Learning Environments, 6(1):4, 2019.

[3] J. D. Angrist and V. Lavy. Using Maimonides' rule to estimate the effect of class size on scholastic achievement. The Quarterly Journal of Economics, 114(2):533–575, 1999.

[4] K. E. Arnold. Signals: Applying academic analytics. Educause Quarterly, 33(1), 2010.

[5] K. E. Arnold and M. D. Pistilli. Course Signals at Purdue: Using learning analytics to increase student success. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, LAK '12, pages 267–270. ACM, 2012.

[6] R. Asif, A. Merceron, S. A. Ali, and N. G. Haider. Analyzing undergraduate students' performance using educational data mining. Computers & Education, 113:177–194, 2017.

[7] G. J. A. Baars, T. Stijnen, and T. A. W. Splinter. A model to predict student failure in the first year of the undergraduate medical curriculum. Health Professions Education, 3(1):5–14, 2017.

[8] D. Bañeres, M. E. Rodríguez, A. E. Guerrero-Roldán, and A. Karadeniz. An early warning system to detect at-risk students in online higher education. Applied Sciences, 10(13):4427, 2020.

[9] U. bin Mat, N. Buniyamin, P. M. Arsad, and R. Kassim. An overview of using academic analytics to predict and improve students' achievement: A proposed proactive intelligent intervention. In …, pages 126–130. IEEE, 2013.

[10] C. Burgos, C. L., D. de la Peña, J. A. Lara, D. Lizcano, and M. A. Martínez. Data mining for modeling students' performance: A tutoring action plan to prevent academic dropout. Computers & Electrical Engineering, 66:541–556, 2018.

[11] A. C. Cameron and P. K. Trivedi. Microeconometrics: Methods and Applications. Cambridge University Press, Cambridge, 2005.

[12] Y. Chen, Q. Zheng, S. Ji, F. Tian, H. Zhu, and M. Liu. Identifying at-risk students based on the phased prediction model. Knowledge and Information Systems, 62(3):987–1003, 2020.

[13] R. Conijn, C. Snijders, A. Kleingeld, and U. Matzat. Predicting student performance from LMS data: A comparison of 17 blended courses using Moodle LMS. IEEE Transactions on Learning Technologies, 10(1):17–29, 2016.

[14] A. Elbadrawy, R. S. Studham, and G. Karypis. Collaborative multi-regression models for predicting students' performance in course activities. In Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, LAK '15, pages 103–107. ACM, 2015.

[15] B. C. Gamse, R. T. Jacob, M. Horst, B. Boulay, and F. Unlu. Reading First impact study: Final report. NCEE 2009-4038. National Center for Education Evaluation and Regional Assistance, 2008.

[16] G. Gray, C. McGuinness, and P. Owende. An application of classification models to predict learner progression in tertiary education. In International Advance Computing Conference (IACC), pages 549–554. IEEE, 2014.

[17] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, New York, 2009.

[18] S. Huang and N. Fang. Predicting student academic performance in an engineering dynamics course: A comparison of four types of predictive mathematical models. Computers & Education, 61:133–145, 2013.

[19] G. W. Imbens and T. Lemieux. Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2):615–635, 2008.

[20] B. A. Jacob and L. Lefgren. Remedial education and student achievement: A regression-discontinuity analysis. Review of Economics and Statistics, 86(1):226–244, 2004.

[21] G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning: with Applications in R. Springer, New York, 2013.

[22] D. S. Lee and T. Lemieux. Regression discontinuity designs in economics. Journal of Economic Literature, 48(2):281–355, 2010.

[23] O. H. Lu, A. Y. Huang, J. C. Huang, A. J. Lin, H. Ogata, and S. J. Yang. Applying learning analytics for the early prediction of students' academic performance in blended learning. Journal of Educational Technology & Society, 21(2):220–232, 2018.

[24] L. P. Macfadyen and S. Dawson. Mining LMS data to develop an "early warning system" for educators: A proof of concept. Computers & Education, 54(2):588–599, 2010.

[25] T. Massing, N. Reckmann, B. Otto, K. J. Hermann, C. Hanck, and M. Goedicke. Klausurprognose mit Hilfe von E-Assessment-Nutzerdaten. In DeLFI 2018 – Die 16. E-Learning Fachtagung Informatik, pages 171–176, 2018.

[26] T. Massing, N. Schwinning, M. Striewe, C. Hanck, and M. Goedicke. E-assessment using variable-content exercises in mathematical statistics. Journal of Statistics Education, 26(3):174–189, 2018.

[27] P. J. McEwan and J. S. Shapiro. The benefits of delayed primary school enrollment: Discontinuity estimates using exact birth dates. The Journal of Human Resources, 43(1):1–29, 2008.

[28] Y. Meier, J. Xu, O. Atan, and M. van der Schaar. Predicting grades. IEEE Transactions on Signal Processing, 64(4):959–972, 2016.

[29] R. J. Oskouei and M. Askari. Predicting academic performance with applying data mining techniques (generalizing the results of two different case studies). Computer Engineering and Applications Journal, 3(2):79–88, 2014.

[30] B. Otto, T. Massing, N. Schwinning, N. Reckmann, A. Blasberg, S. Schumann, C. Hanck, and M. Goedicke. Evaluation einer Statistiklehrveranstaltung mit dem JACK R-Modul. In DeLFI 2017 – Die 15. E-Learning Fachtagung Informatik, Lecture Notes in Informatics, Gesellschaft für Informatik, pages 75–86, 2017.

[31] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2019.

[32] C. Romero and S. Ventura. Educational data mining: A review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40(6):601–618, 2010.

[33] M. Şahin and H. Yurdugül. An intervention engine design and development based on learning analytics: The Intelligent Intervention System (In2S). Smart Learning Environments, 6(1):18, 2019.

[34] N. Schwinning, M. Striewe, T. Massing, C. Hanck, and M. Goedicke. Towards digitalisation of summative and formative assessments in academic teaching of statistics. In Proceedings of the Fifth International Conference on Learning and Teaching in Computing and Engineering, 2017.

[35] G. W. Sosa, D. E. Berger, A. T. Saw, and J. C. Mary. Effectiveness of computer-assisted instruction in statistics: A meta-analysis. Review of Educational Research, 81(1):97–128, 2011.

[36] M. Striewe. An architecture for modular grading and feedback generation for complex exercises. Science of Computer Programming, 129:35–47, 2016.

[37] M. Striewe, M. Balz, and M. Goedicke. A flexible and modular software architecture for computer aided assessments and automated marking. In Proceedings of the First International Conference on Computer Supported Education (CSEDU), 23–26 March 2009, Lisboa, Portugal, volume 2, pages 54–61. INSTICC, 2009.

[38] M. Striewe, B. Zurmaar, and M. Goedicke. Evolution of the e-assessment framework JACK. In Gemeinsamer Tagungsband der Workshops der Tagung Software Engineering 2015, pages 118–120, 2015.

[39] A. Wolff, Z. Zdrahal, A. Nikolov, and M. Pantucek. Improving retention: Predicting at-risk students by analysing clicking behaviour in a virtual learning environment. In …