Key Factor Not to Drop Out Is to Attend the Lecture
Hideo Hirose
Hiroshima Institute of Technology, Hiroshima, Japan
Abstract
In addition to the results of the learning check testing performed at each lecture, we have extended the set of factors examined in order to find the key dropping-out factors. Among them are the number of successes in the learning check testing, the number of attendances at the follow-up program classes, and so on. We have found key factors strongly related to the students at risk. They are the following. 1) Badly failed students (score range 0-39 in the final examination) tend to be absent from the regular classes, fail the learning check testing even when they attend, and are very reluctant to attend the follow-up program classes. 2) Successful students (score range 60-100 in the final examination) attend classes and get good scores in every learning check testing. 3) Students who failed, but not badly (score range 40-59 in the final examination), show both sides of the features that appear in score ranges 0-39 and 60-100. Therefore, it is crucial to attend the lectures in order not to drop out. Students who failed the learning check testing more than half of all the testing times almost certainly failed the final examination, which could cause dropping out. Also, students who succeeded in the learning check testing more than two-thirds of all the testing times obtained better scores in the final examination.
Keywords: learning check testing, placement test, follow-up program, item response theory, multiple linear regression, final examination.
Introduction

It is crucial to identify students at risk of failing courses and/or dropping out as early as possible, because a wide variety of students are now enrolled in universities and we teachers have to educate them all together. This circumstance prevents us from using conventional methods such as mass education. However, the numbers of staff and classes are limited, so new assisting systems using ICT should be introduced to overcome this difficulty. To this end, we established online testing systems aimed at helping students who need further learning skills in mathematics subjects. These systems include 1) the learning check testing (LCT), given in every class to check whether students comprehend the contents of the lectures, 2) the collaborative working testing (CWT), for training skills with supporters and teachers, and 3) the follow-up program testing (FPT), to check whether the follow-up program class members understand the standard level of the lectures. The system has been operating successfully (see [5], [6]), and some computational results have been reported [8]. In addition, other relevant cases have been well investigated (see [7], [9], [10], [11], [13], [15]).

Using the data accumulated in the database, we may find key factors strongly related to the students at risk, as indicated in [2], [3], [14], and [16], if we pay attention to learning analytics. Then, we may be able to actively make appropriate decisions for better learning methods. As indicated in [17], it is also important to analyze the data theoretically.

This paper is aimed at obtaining effective learning strategies for students at risk of failing courses and/or dropping out, using large-scale learning data accumulated by the follow-up program systems. The data consist of the placement scores, every LCT score, FPT success/failure counts, follow-up program class (FPC) attendances, etc.
In this paper, we use the ability values for students' learning skills obtained from item response theory (IRT; e.g., see [1], [4], [12]). Although the subjects we deal with are analysis basic (similar to calculus) and linear algebra, we show the linear algebra case as a typical one.

The LCT is a kind of short-time testing using five questions in each LCT in the first semester of 2017. All the students in the regular classes take the LCT for ten minutes via the online testing system. The questions are the same for every student, but the order in which the questions are presented to one student differs from that of the next student. We have fourteen lectures with one midterm and one final examination in the semester; thus, the number of LCTs is fourteen. We can estimate the student abilities for each LCT using the item response theory (IRT) evaluation method.

Each of the five items consists of multiple small questions, and students select appropriate answers to each small question from many choices. We adopt the two-parameter logistic function P(θ_i; a_j, b_j) shown below instead of the three-parameter logistic function that includes a pseudo-guessing parameter:

    P_{i,j} = P(θ_i; a_j, b_j) = 1 / (1 + exp{-1.7 a_j (θ_i - b_j)}) = 1 - Q_{i,j},    (1)

where θ_i expresses the ability of student i, and a_j and b_j are constants in the logistic function for item j, called the discrimination parameter and the difficulty parameter, respectively. Then, the likelihood for all the examinees, i = 1, 2, ..., N, and all the items, j = 1, 2, ..., n, becomes

    L = ∏_{i=1}^{N} ∏_{j=1}^{n} ( P_{i,j}^{δ_{i,j}} × Q_{i,j}^{1-δ_{i,j}} ),    (2)

where δ_{i,j} denotes the indicator such that δ_{i,j} = 1 if student i answered item j correctly and δ_{i,j} = 0 otherwise. P_{i,j} in Equation (1) is a logistic probability distribution function with unknown parameters a_j and b_j, and the random variable is θ_i. However, a_j, b_j, and θ_i are all unknown here.
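Equations (1) and (2) can be sketched numerically as follows. This is a minimal illustration, not the paper's estimation code: the function names are ours, the scaling constant 1.7 is the conventional one, and the paper actually maximizes L over all of a_j, b_j, and θ_i simultaneously, whereas here we only evaluate the model and its log-likelihood.

```python
import numpy as np

def p2pl(theta, a, b):
    """Two-parameter logistic item response function of Equation (1),
    with the conventional scaling constant 1.7."""
    return 1.0 / (1.0 + np.exp(-1.7 * a * (theta - b)))

def log_likelihood(theta, a, b, delta):
    """Log of the likelihood in Equation (2).
    theta: abilities, shape (N,); a, b: item parameters, shape (n,);
    delta[i, j] = 1 if student i answered item j correctly, else 0."""
    P = p2pl(theta[:, None], a[None, :], b[None, :])  # shape (N, n)
    Q = 1.0 - P
    return np.sum(delta * np.log(P) + (1.0 - delta) * np.log(Q))

# Toy data: two students, one item.
theta = np.array([0.0, 1.0])
a = np.array([1.2])
b = np.array([0.0])
delta = np.array([[1.0], [0.0]])
ll = log_likelihood(theta, a, b, delta)
```

In practice the maximum likelihood estimates are found by numerically maximizing this log-likelihood jointly over all unknown parameters.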
We have to obtain the maximum likelihood estimates of a_j, b_j, and θ_i simultaneously by maximizing L in Equation (2).

However, as is easily imagined with such a small number of questions, the estimated ability values tend to have biases and large variances (see [13], [15]). It would be difficult to classify the students into a successful group and a failed group in the final examination using each LCT result alone. Thus, we first use all the LCT results together in classifying.

Figure 1 shows the histogram of the estimated LCT abilities of successful students overlaid with the histogram of those of failed students, in the case of linear algebra in the first semester of 2017. We can see that it would be difficult to find an optimal threshold discriminating successful from failed students. The number of successful students is 898 and that of failed students is 145; the ratio of failed students to all the students is about 0.14 (the mean ability estimates are 0.63 for the successful students and -0.17 for the failed students). We therefore applied a decision tree to the full response matrix; the resulting confusion matrix is shown in Table 1. Limited to the failed students, the decision tree predicted that 107 students may fail, and 70 of them actually failed; the hitting ratio is 65%.

Table 1: Confusion matrix determined by the decision tree using the full response matrix.

                         predicted
                         successful   failed   total
observed   successful           861       37     898
           failed                75       70     145
total                           936      107    1043

Attendance to the Lectures and the Follow-up Program Classes
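The 65% hitting ratio is the precision of the decision tree on the predicted-failed column of Table 1, as a quick check in Python shows (the variable names are ours):

```python
# Counts from Table 1 (observed group -> predicted group).
observed_successful = {"pred_successful": 861, "pred_failed": 37}
observed_failed     = {"pred_successful": 75,  "pred_failed": 70}

# Students the tree flagged as likely to fail.
predicted_failed = observed_successful["pred_failed"] + observed_failed["pred_failed"]

# Hitting ratio: of the flagged students, the fraction that actually failed.
hitting_ratio = observed_failed["pred_failed"] / predicted_failed
print(round(100 * hitting_ratio))  # 70 out of 107
```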
Attendance/absence in classes is another type of discrete information. Intuitively, we feel that the more frequently students attend the classes, the higher their scores in the final examination. Recently, attendance/absence information is often stored in the database automatically using an electronic card attendance-check system. However, the system does not work perfectly; some students may disappear after presenting their cards.

The LCT compensates for this defect, because the attendance information cannot be guaranteed unless the testing is completed. Figure 3 shows that the attendance/absence information is classified into three groups: the first is for score range 60-100, seen on the right in the figure; the second is for score range 40-59, seen in the middle; and the third is for score range 0-39, seen on the left. In these matrices, a row means the student id and a column means the question id. Using the two kinds of information, attendance/absence recorded by the electronic cards (expressed by y below) and the LCT results (expressed by x below), the value of each element, s, is determined and colored by the formula s = x + y, where the meanings of s, x, and y are indicated in Figure 4. The figure shows the scheme of the attendance/absence information and the LCT successful/failed information. For example, s = 55 means that a student was absolutely absent from the class, and s = 11 means that a student absolutely attended the class; these are also indicated in Figure 3.

Since each element is colored from green to red according to its s value from lower to higher, red and orange colors indicate absence or failure in the LCT, and green colors indicate success in the LCT. The three groups can obviously be classified by these colors just by looking at the figure. This indicates that the attendance/absence information may play a key role in determining the risk of a student, in addition to the LCT results.

We first show the relationships among the factors we are concerned with in Figures 5 and 6. In these figures, for example, we see that there is a strong relationship between the LCT successes and no requirement for the FPT (see the first column and the sixth row in the figures), but it is unclear which factors are key in classifying the successful/failed groups. In this paper, however, we will not discuss the dimension-reduction problem in depth. We are only interested in finding the key factors related to the risky students in the final examination, so a much simpler method is taken in the following.

Since we have seen that the attendance/absence information may be effective for classifying the students into successful/failed groups in the final examination, we apply the multiple regression analysis Y = Xβ to find the key factors; the candidate factors are shown in Figure 7.

Figure 3: Three groups classified by the attendance/absence information and the LCT successful/failed information.
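The coloring scheme s = x + y can be sketched as follows. The actual numeric codes for x and y are defined in Figure 4 and are not reproduced here; for illustration we assume the hypothetical codes x ∈ {10, 50} for LCT success/failure and y ∈ {1, 5} for card attendance/absence, chosen only because they reproduce the two extreme values s = 11 and s = 55 mentioned in the text.

```python
import numpy as np

def s_matrix(card_attended, lct_success):
    """Combine two Boolean (students x lectures) matrices into the
    color code s = x + y of Figures 3 and 4.
    Assumed (hypothetical) codes: y = 1 if the card system recorded
    attendance, else 5; x = 10 if the LCT was passed, else 50."""
    y = np.where(card_attended, 1, 5)
    x = np.where(lct_success, 10, 50)
    return x + y

# Two students, two lectures: the first student attended lecture 1 and
# passed its LCT (s = 11), and skipped lecture 2 entirely (s = 55).
card = np.array([[True, False], [True, True]])
lct  = np.array([[True, False], [False, True]])
print(s_matrix(card, lct))
```

Coloring each element of this matrix from green (low s) to red (high s) yields the pattern described in the text.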
Figure 4: Scheme of the attendance/absence information and the LCT successful/failed information.

Figure 5: Relationships among the factors when the score range is 0-59.

Figure 6: Relationships among the factors when the score range is 60-100.

Figure 7: Factors in the multiple regression analysis.

Applying the multiple linear regression to the accumulated learning data, e.g., the estimated LCT ability values, placement scores, class attendance/absence, follow-up class attendance/absence, etc., we obtained the result shown in Figure 8. The symbols marked with asterisks indicate that these factors are significant at the given p-values, computed using R [18], the statistical computing and graphics language and environment. The symbol FPTnotrequired means that a student took the LCT and was successful, resulting in no requirement for follow-up class attendance. That is, attendance/absence for the FPT is the most significant information in deciding successful/failed students.

Figure 8: Multiple linear regression analysis result.

Therefore, we next focus on this factor. Figure 9 shows the two-dimensional relationship between the number of successes in the LCT and the number of absences from the follow-up classes for the three groups, score ranges 60-100, 40-59, and 0-39 in the final examination. At first glance, we can see a clear linear relationship between the number of LCT successes and the number of FPC absences when the score range is 0-39. We also see some similarity between the score range 40-59 and score range 60-100 cases, but it is less clear. Since each dot represents a student in the figure, overlaid dots at the same position hide the accumulated numbers of students.

Figure 10 shows the three-dimensional bar charts representing the relationship between the number of successes in the LCT and the number of absences from the follow-up classes for the three groups, score ranges 60-100, 40-59, and 0-39 in the final examination.
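The multiple linear regression step, which the paper runs in R, can be re-implemented in a few lines; the following sketch uses synthetic data and hypothetical factor names purely for illustration, with a normal approximation to the t distribution for the p-values (adequate for large samples):

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Ordinary least squares fit of y = X beta, with approximate
    two-sided p-values for each coefficient (normal approximation)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of beta
    z = beta / np.sqrt(np.diag(cov))          # standardized coefficients
    return beta, np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

# Synthetic example: one strongly related factor, one unrelated one.
rng = np.random.default_rng(0)
n = 300
lct_successes = rng.normal(size=n)            # hypothetical factor
unrelated = rng.normal(size=n)                # hypothetical factor
score = 50 + 10 * lct_successes + rng.normal(scale=5, size=n)
X = np.column_stack([np.ones(n), lct_successes, unrelated])
beta, p = ols_pvalues(X, score)
# The p-value of the truly related factor is essentially zero.
```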
Figure 9: Two-dimensional relationship between the number of successes in the LCT and the number of absences from the FPC.

Looking at Figure 10, we find the following. 1) When the score range is 60-100, almost all the students show successful LCT results and very few absences from the FPC (almost none are required to attend the FPC). 2) When the score range is 0-39, there is a clear linear relationship between the number of LCT successes and the number of FPC absences, which means that almost all the students who failed the LCT or were absent from the classes also ignored the FPC. 3) When the score range is 40-59, the students show both sides of the features that appear in score ranges 0-39 and 60-100: some students made an effort to be successful, and some succeeded but unfortunately some did not. Therefore, we have found that the students who failed the final examination were reluctant to attend the classes, showed failed LCT results, and, in addition, were unwilling to attend the FPC. As intuition suggests, the most crucial factor for success in the final examination is attendance at the classes.

We have been looking at factors that separate successes and failures in the final examination. To investigate such factors more precisely, more detailed information may be required. Thus, we have divided the successful group into four groups, A+, A, B, and C, whose scores are distributed over 90-100, 80-89, 70-79, and 60-69, respectively. A possible factor discriminating these groups is the number of successful LCTs.

Figure 11 shows the frequency bar charts of the number of successful LCTs for each group. Looking at the figure, we can see that students who failed the LCT more than seven times almost certainly failed the final examination, which could cause dropping out. Also, students who succeeded in the LCT more than ten times obtained better scores in the final examination.
Since the total number of testing times was 13 in this case, this means that students who failed the learning check testing more than half of all the testing times almost certainly failed the final examination, and students who succeeded in the learning check testing more than two-thirds of all the testing times obtained better scores in the final examination.

Figure 10: Three-dimensional bar charts representing the relationship between the number of successes in the LCT and the number of absences from the FPC.
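The empirical thresholds above can be written as a simple rule of thumb; this is only a sketch, with the cutoffs 7 and 10 taken from the text and the function name our own:

```python
def lct_risk_flag(n_success, n_tests=13):
    """Empirical rule from the paper (13 LCTs in the semester):
    failing more than seven times (roughly more than half) signals
    near-certain failure in the final examination, while succeeding
    more than ten times (roughly more than two-thirds) is associated
    with a better final score."""
    n_fail = n_tests - n_success
    if n_fail > 7:
        return "at risk"
    if n_success > 10:
        return "likely successful"
    return "intermediate"
```

Such a rule could be evaluated after each LCT during the semester, flagging at-risk students well before the final examination.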
Conclusion

It is crucial to identify students at risk of failing courses and/or dropping out as early as possible, because a wide variety of students are now enrolled in universities and we teachers have to educate them all together. To overcome this, we established online testing systems aimed at helping students who need further learning skills in mathematics subjects, including the learning check testing, the collaborative working testing, and the follow-up program testing. Using the data accumulated from these testings in the database, we aimed at obtaining effective learning strategies for students at risk of failing courses and/or dropping out. Although the subjects we deal with are analysis basic (similar to calculus) and linear algebra, we focused on the linear algebra case as a typical one.

In this paper, we have found some key factors strongly related to the students at risk. The findings are the following. 1) Badly failed students (score range 0-39 in the final examination) tend to be absent from the regular classes, fail the learning check testing even when they attend, and are very reluctant to attend the follow-up program classes. 2) Successful students (score range 60-100 in the final examination) attend classes and get good scores in every learning check testing. 3) Students who failed, but not badly (score range 40-59 in the final examination), show both sides of the features that appear in score ranges 0-39 and 60-100. Therefore, it is crucial to attend the lectures in order not to drop out. Students who failed the learning check testing more than half of all the testing times almost certainly failed the final examination, which could cause dropping out. Also, students who succeeded in the learning check testing more than two-thirds of all the testing times obtained better scores in the final examination.

Figure 11: Histograms of estimated abilities of LCT for successful students and for failed students (linear algebra in the first semester of 2017).
Acknowledgment
The author would like to thank the mathematics staff at Hiroshima Institute of Technology.
References

[1] R. de Ayala, The Theory and Practice of Item Response Theory, Guilford Press, 2009.
[2] N. Elouazizi, Critical Factors in Data Governance for Learning Analytics, Journal of Learning Analytics, 1, 2014, pp. 211-222.
[3] D. Gasevic, S. Dawson, and G. Siemens, Let's not forget: Learning analytics are about learning, TechTrends, 59, 2015, pp. 64-71.
[4] R. Hambleton, H. Swaminathan, and H. J. Rogers, Fundamentals of Item Response Theory, Sage Publications, 1991.
[5] H. Hirose, Meticulous Learning Follow-up Systems for Undergraduate Students Using the Online Item Response Theory, 5th International Conference on Learning Technologies and Learning Environments, 2016, pp. 427-432.
[6] H. Hirose, M. Takatou, Y. Yamauchi, T. Taniguchi, T. Honda, F. Kubo, M. Imaoka, T. Koyama, Questions and Answers Database Construction for Adaptive Online IRT Testing Systems: Analysis Course and Linear Algebra Course, 5th International Conference on Learning Technologies and Learning Environments, 2016, pp. 433-438.
[7] H. Hirose, Learning Analytics to Adaptive Online IRT Testing Systems "Ai Arutte" Harmonized with University Textbooks, 5th International Conference on Learning Technologies and Learning Environments, 2016, pp. 439-444.
[8] H. Hirose, M. Takatou, Y. Yamauchi, T. Taniguchi, F. Kubo, M. Imaoka, T. Koyama, Rediscovery of Initial Habituation Importance Learned from Analytics of Learning Check Testing in Mathematics for Undergraduate Students, 6th International Conference on Learning Technologies and Learning Environments, 2017, pp. 482-486.
[9] H. Hirose, Dually Adaptive Online IRT Testing System, Bulletin of Informatics and Cybernetics, Research Association of Statistical Sciences, 48, 2016, pp. 1-17.
[10] H. Hirose, Difference Between Successful and Failed Students Learned from Analytics of Weekly Learning Check Testing, Information Engineering Express, Vol. 4, No. 1, 2018, pp. 11-21.
[11] H. Hirose, A Large Scale Testing System for Learning Assistance and Its Learning Analytics, Proceedings of the Institute of Statistical Mathematics, Vol. 66, No. 1, 2018, pp. 79-96.
[12] W. J. van der Linden and R. K. Hambleton, Handbook of Modern Item Response Theory, Springer, 1996.
[13] T. Sakumura, H. Hirose, Bias Reduction of Abilities for Adaptive Online IRT Testing Systems, International Journal of Smart Computing and Artificial Intelligence (IJSCAI), 1, 2017, pp. 57-70.
[14] G. Siemens and D. Gasevic, Guest Editorial - Learning and Knowledge Analytics, Educational Technology & Society, 15, 2012, pp. 1-2.
[15] Y. Tokusada, H. Hirose, Evaluation of Abilities by Grouping for Small IRT Testing Systems, 5th International Conference on Learning Technologies and Learning Environments, 2016, pp. 445-449.
[16] R. J. Waddington, S. Nam, S. Lonn, S. D. Teasley, Improving Early Warning Systems with Categorized Course Resource Usage, Journal of Learning Analytics, 3, 2016, pp. 263-290.
[17] A. F. Wise and D. W. Shaffer, Why Theory Matters More than Ever in the Age of Big Data, Journal of Learning Analytics, 2, 2015, pp. 5-13.
[18] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.