Modelling the Dropout Patterns of MOOC Learners

Abstract

We applied survival analysis to the viewing durations of massive open online courses. The hazard function of the empirical duration data is dominated by a bathtub curve and shows the Lindy effect in its tail. To understand the evolutionary mechanisms underlying these features, we categorized learners into two classes according to the distributions of their viewing durations, namely a lognormal distribution and a power law with an exponential cutoff. Two random differential equations are provided to describe the growth patterns of viewing durations for the two classes respectively. The expected change rate of duration for the learners characterized by the lognormal distribution is assumed to depend on their past duration, and that of the remaining learners is assumed to be inversely proportional to time. Solutions to the equations predict the features of the viewing duration distributions and those of the hazard function. The equations also reveal memory and memorylessness in the viewing behaviors of the two classes respectively.


Zheng Xie

Keywords:

Data science applications in education; Distance education and online learning; Evaluation methodologies.

Introduction

Massive open online courses (MOOCs) arise from the integration of education and information technologies, and are characterized by unlimited participation and open access via the Internet [1, 2]. These courses break the spatiotemporal boundary of traditional education, and contribute to balancing the resources of education [3, 4]. Differences between these courses and traditional courses lie in several dimensions, involving admission conditions, learning motivation, teaching methods, the management of learners, and the interactions between teachers and learners [5, 6]. Analyzing learning behaviors has become a hot topic in the MOOC community, which includes the achievements of learners [7, 8], the interactions among

arXiv preprint [physics.ed-ph]

The data

We analyzed the log data of viewing eight courses from 01/01/2017 to 10/11/2017, which are provided by the platform iCourse. The courses are selected from natural sciences, social sciences, humanities and engineering technologies respectively. Specific statistical indexes of these courses are listed in Table 1, which have been used to analyze course attraction in our previous work [29]. The data include the time length of each video. For each learner, the data include the start time of viewing each video he or she opened, and the corresponding viewing durations.

Videos can be downloaded through the iCourse app. The log data of viewing the downloaded videos are also collected, unless the app is disconnected from the Internet. Accordingly, our study only involves the online viewing behavior recorded as log data. However, learners may be off-task during video playing, which cannot be measured through log data. This is a limitation of our study. In addition, some typical operations on videos are not analyzed here, such as pausing, skipping, moving backward or forward, and changing speed.

We concentrated on learners' viewing duration, which is defined as the duration of video playing, where the duration of pauses is not counted. We introduced the following symbols to express the duration. Suppose that learners $\{L_1, \ldots, L_m\}$ have viewed a course with $n$ videos $\{V_1, \ldots, V_n\}$. Denote the time length of video $V_i$ as $l_i$, and the duration of learner $L_s$ viewing $V_i$ as $t_{si}$. Then the viewing duration of learner $L_s$ on the course is $\sum_{i=1}^{n} t_{si}$. Hereafter, the viewing duration on a course is called duration for short.

The survival analysis of viewing durations

A learner's viewing duration on a course can be regarded as the “lifetime” of his or her viewing behavior. The number of learners with duration $t$ expresses the number of dropouts at “age” $t$, where $t \in [0, T]$, and $T$ is the maximum viewing duration. Therefore, the density function of viewing durations, denoted as $f(t)$, expresses the rate of dropouts at any possible $t$. The rate of dropouts at a given $t$ for the learners with viewing duration no less than $t$ is calculated as $h(t) = f(t)/S(t)$, where $S(t) = \int_t^T f(\tau)\,d\tau$ is the probability of a learner's duration being no less than $t$. In survival analysis [30], $S(t)$ and $h(t)$ are called the survival function and the hazard function respectively. The function $h(t)$ is the negative derivative of $\log S(t)$, and is therefore more informative about dropouts.

Table 1. Specific statistical indexes of the data provided by iCourse.
Columns: Course, Course Id, m, n, a, b, c, d.
Courses: Calculus; Game theory; Finance; Psychology; Spoken English; Etiquette; C Language; Python.
m: the number of learners, n: the number of videos, a: the number of all-rounders who viewed all of the videos, b: the average number of viewed videos per learner, c: the average viewing duration of learners (unit: hour), and d: the average time length of videos (unit: hour).

The hazard function of empirical data is dominated by a U-shaped function and has a decreasing tail (Fig. 1). The U-shaped part is known as the bathtub curve, a concept that comes from product quality assessment. The curve is often used to describe the failures of products over time, which comprise the decreasing rate of early failures as defective products are discarded, the random failures with a constant rate during the useful life of products, and the increasing rate of wear-out failures as the products exceed their designed lifetime. In this study, we called the dropouts of the increasing part wear-out dropouts. The decreasing tail is known as the Lindy effect, namely the future life expectancy is proportional to the current age. It means that every additional period of duration implies a longer expected remaining duration.

To understand why the Lindy effect and the bathtub curve emerge simultaneously, we should analyze the features of the density function $f(t)$, and mine the dynamic mechanisms under those features, because the hazard function $h(t)$ is based on $f(t)$. When a learner has viewed a certain number of videos, his or her viewing duration follows a lognormal distribution (the results of the KS test are shown in Table 2).
For the remaining learners, most of their durations follow a power law with an exponential cutoff (the goodness-of-fit results are listed in Table 2). To illustrate these features, we fitted the parameters of these distributions for each course, and generated synthetic durations following each distribution (Fig. 2), the number of which is the same as that of the corresponding empirical durations. The comparison between the empirical duration distributions and the synthetic ones is shown in Fig. 3. What is the relationship between these features of the density function and the shape of the hazard function? To answer this question, we explored the evolutionary mechanisms underlying these features in the following sections.

Figure 1. The hazard functions of empirical viewing durations. Panels (a)-(h): Calculus, C Language, Etiquette, Finance, Game theory, Psychology, Python, Spoken English; horizontal axis: viewing time length; vertical axis: conditional dropout rate. Panels show the hazard functions (orange circles) and their trend lines (red dotted lines) of viewing durations. Panels also show the trend lines of the hazard functions for lognormal-learners (blue lines) and segment-learners (black lines) respectively.
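The definition $h(t) = f(t)/S(t)$ can be evaluated directly on a list of integer durations. The following Python sketch is illustrative only: the function name `empirical_hazard` and the toy durations are assumptions and do not come from the iCourse data, and the binning and trend lines used in Fig. 1 are not reproduced.

```python
from collections import Counter


def empirical_hazard(durations):
    """Return sorted (t, h(t)) pairs with h(t) = f(t) / S(t).

    f(t) is the fraction of learners whose duration equals t, and S(t) is the
    fraction whose duration is no less than t, so h(t) = count(t) / at_risk(t).
    """
    counts = Counter(durations)
    at_risk = len(durations)  # learners whose duration is >= the current t
    hazard = []
    for t in sorted(counts):
        hazard.append((t, counts[t] / at_risk))
        at_risk -= counts[t]  # those who dropped out at t leave the risk set
    return hazard


# Toy durations in arbitrary integer units (hypothetical, not empirical data).
toy = [1, 1, 1, 2, 3, 3, 5, 8]
for t, h in empirical_hazard(toy):
    print(t, round(h, 3))
```

With $S(t)$ taken as the share of durations no less than $t$, the largest observed duration always receives hazard 1, so the far tail of such a curve should be read with care on small samples.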

Table 2. Fitting parameters and goodness-of-fit.
Columns: Course, α, β, µ, σ, τ, p-value, ψ.
Courses: Calculus; C Language; Etiquette; Finance; Game theory; Psychology; Python; Spoken English.
The parameters of $x^{\alpha} e^{\beta x}$ are fitted through multiple linear regression. The parameters of Lognormal($\mu$, $\sigma$) are calculated based on empirical data. At significance level 5%, the KS test cannot reject that the viewing durations of the learners, who have viewed no less than $\tau$ videos, follow a lognormal distribution ($p$-values > 0.05). $\psi$ is half of the cumulative difference between the duration distribution of segment-learners and the corresponding synthetic one.

Figure 2. Synthetic viewing duration distributions. Panels (a)-(h): Calculus, C Language, Etiquette, Finance, Game theory, Psychology, Python, Spoken English; horizontal axis: viewing time length; vertical axis: learner proportion. Panels show the density functions of lognormal distributions (blue lines, orange circles) and those of the power-law distributions that have an exponential cutoff (black lines, blue circles) respectively, as well as the mixture density functions of them (red dotted lines). The parameters of these distributions are listed in Table 2.

Figure 3. Comparisons between the empirical distributions of viewing durations and synthetic ones. Panels (a)-(h): Calculus, C Language, Etiquette, Finance, Game theory, Psychology, Python, Spoken English; horizontal axis: viewing time length; vertical axis: learner proportion. Panels show that each empirical distribution can be approximated by a mixture of a lognormal distribution and a power law with an exponential cutoff.

The Lindy effect, wear-out dropouts and lognormal distribution

The empirical data show that the viewing durations of the learners who have viewed no less than $\tau$ videos (Table 2) follow a lognormal distribution. We called them lognormal-learners, and used an algorithm in Reference [29] to find them (Table 4 in the appendix). Following an identical lognormal distribution means those learners are samples drawn from the same population in the sense of viewing duration; thus they can be categorized as one class.

The density function of the lognormal distribution is $f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\log x-\mu)^2}{2\sigma^2}}$, where $x \in [1, \infty)$, $\sigma > 0$, and $\mu \in \mathbb{R}$. The corresponding hazard function is

$$h(x) = \frac{1}{x\sigma}\sqrt{\frac{2}{\pi}}\, e^{-\frac{(\log x-\mu)^2}{2\sigma^2}} \left(1 - \operatorname{erf}\left(\frac{\log x-\mu}{\sqrt{2}\,\sigma}\right)\right)^{-1}. \quad (1)$$

The hazard function (1) is unimodal, first increasing and then decreasing, when its corresponding density function is unimodal [31]. Figs. 1 and 2 show that, for each empirical course, the density function of lognormal-learners and the corresponding hazard function have this shape; the increasing part corresponds to the wear-out dropouts and the decreasing part to the Lindy effect. Fig. 4 shows that the hazard functions of synthetic durations give reasonable fits to those of empirical data, which supports the above arguments.

To find the evolutionary mechanisms underlying the wear-out dropouts and the Lindy effect, we should go back to where lognormal distributions come from. They often emerge in the lifetime distributions of mechanical units [32], where the lifetime of a unit is affected by the multiplication of many small factors. Approximate these factors as a range of independent and identically distributed random variables. The central limit theorem says that the summation of these variables in log scale approximately follows a normal distribution. Transforming back to the original scale, the multiplication of these factors follows a lognormal distribution. This is known as the multiplicative version of the central limit theorem, called Gibrat's law [33].

Lognormal-learners contain the all-rounders who viewed all of the videos.
It means that the endurance of the remaining lognormal-learners and that of all-rounders are homogeneous; thus the former could be regarded as potential all-rounders. For a potential all-rounder who is willing to complete a course, his or her endurance in viewing videos (measured by viewing duration) could be likened to a mechanical unit whose failure mode is of a fatigue-stress nature. The life of such a unit follows a lognormal distribution. The analogy enlightens us to provide

$$dk_i(t) = \mu k_i(t)\,dt + \sigma k_i(t)\,dw, \quad (2)$$

where $k_i(t)$ is the duration at time $t$ of learner $i$, $w$ is the Wiener process, and $\mu$ and $\sigma > 0$. Supposing $k_i(t_0) = 1$ gives rise to the solution $k_i(t) = e^{(\mu - \sigma^2/2)(t - t_0) + \sigma \int_{t_0}^{t} dw}$, which is a lognormal random variable.

Eq. (2) means the viewing behavior of a lognormal-learner has memory, because the change rate of duration correlates to his or her past duration. Moreover, the expected change rate correlates to the past duration positively. It means that when $t$ is large enough, the duration increases exponentially, namely the more you learn, the more you want to learn. This is the status of the learners who are deeply impressed by a course.

Figure 4. The hazard functions of synthetic viewing durations. Panels (a)-(h): Calculus, C Language, Etiquette, Finance, Game theory, Psychology, Python, Spoken English; horizontal axis: viewing time length; vertical axis: conditional dropout rate. Panels show the hazard functions (blue squares) and their trend lines (red dotted lines) of synthetic durations. Panels also show the hazard functions of lognormal distributions (blue lines) and those of the power-law distributions that have an exponential cutoff (black lines).

Figure 5. The exponential cutoffs of viewing duration distributions. Panels (a)-(h): Calculus, C Language, Etiquette, Finance, Game theory, Psychology, Python, Spoken English; horizontal axis: viewing time length; vertical axis: learner proportion. Panels show the duration distributions of the segment-learners that viewed no less than one minute (red circles), compared with the predictions of Eq. (3) (blue squares).
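The link between Eq. (2) and the lognormal distribution can be checked numerically. The sketch below is a hedged illustration, not the procedure used in this study: it integrates Eq. (2) with the Euler-Maruyama scheme (a technique swapped in here for illustration) from $k_i(t_0) = 1$, and compares the sample mean and standard deviation of $\log k_i$ with the values $(\mu - \sigma^2/2)(t - t_0)$ and $\sigma\sqrt{t - t_0}$ implied by the solution. The function name `simulate_duration` and all parameter values are assumptions.

```python
import math
import random
import statistics


def simulate_duration(mu, sigma, horizon, steps, rng):
    """Euler-Maruyama path of dk = mu*k*dt + sigma*k*dw, started at k = 1."""
    dt = horizon / steps
    k = 1.0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment over dt
        k += mu * k * dt + sigma * k * dw
    return k


rng = random.Random(0)
mu, sigma, horizon = 0.3, 0.5, 1.0  # illustrative values, not fitted ones
logs = [math.log(simulate_duration(mu, sigma, horizon, 100, rng))
        for _ in range(2000)]

print("sample mean of log k:", statistics.fmean(logs))
print("predicted mean      :", (mu - sigma ** 2 / 2) * horizon)
print("sample std of log k :", statistics.stdev(logs))
print("predicted std       :", sigma * math.sqrt(horizon))
```

If $\log k_i$ is close to normal with these moments, $k_i$ itself is close to lognormal, which is the property used for lognormal-learners.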

Dropouts with a constant rate and exponential cutoff

Fig. 3 shows that the viewing durations of the learners viewing less than $\tau$ videos (Table 2) approximately follow a power law with an exponential cutoff. The cutoff emerges mainly because the durations of segment-learners that are no less than one minute approximately follow an exponential distribution (Fig. 5). The density function of the exponential distribution is $f(x) = e^{-x/\lambda}/\lambda$, where $x \in [1, \infty)$ and $\lambda > 0$. The corresponding survivor function is $S(x) = e^{-x/\lambda}$, and then the hazard function is a constant, namely $h(x) = f(x)/S(x) = 1/\lambda$.

To find the evolutionary mechanism underlying the dropouts with a constant rate, we went back to the mechanism underlying the exponential distribution. The distribution is characterized by memorylessness, because it satisfies $p(T > s + t \mid T > s) = e^{-(s+t)/\lambda}/e^{-s/\lambda} = e^{-t/\lambda} = p(T > t)$ for any possible $T$, $s$, and $t$. In our study, the memorylessness means the future viewing duration is independent of the past duration. For example, the probability that a learner who has viewed ten minutes will view one more minute is equal to the probability that a learner who has not yet viewed any video will view one minute.

Due to the memorylessness and the fatigue of learning, it is reasonable to suppose that the change rate of viewing duration decreases with time. For simplicity, we supposed the change rate of learner $i$'s viewing duration $k_i(t)$ to be

$$\frac{d}{dt}k_i(t) = \frac{\lambda}{t}. \quad (3)$$

Solving the equation in the time interval $[y, T]$ gives rise to

$$k_i(T) = \lambda \log\frac{T}{y}, \quad (4)$$

where $y$ is the start time of viewing. Supposing $y$ is a random variable of the uniform distribution over $[T_0, T]$ gives rise to

$$p(k_i \le x) = p\left(\lambda \log\frac{T}{y} \le x\right) = p\left(T e^{-x/\lambda} < y\right) = 1 - \frac{T}{T - T_0}\, e^{-x/\lambda}. \quad (5)$$

It leads to an exponential distribution

$$p(k_i = x) = \frac{d}{dx}\, p(k_i \le x) = \frac{T}{\lambda (T - T_0)}\, e^{-x/\lambda}. \quad (6)$$

When $T_0 = 0$, Eq. (6) is the standard exponential distribution.

To make synthetic duration distributions fit the empirical ones, we valued the parameters of the solution (4) based on the information of empirical data. Let the domain of the durations of segment-learners (which are no less than one minute) be $[T_1, T_2]$, and calculate the exponent $-1/\lambda$ of the formula (6) by fitting empirical data (Table 3). Letting the simulated duration $\lambda \log(T/y)$ belong to $[T_1, T_2]$ gives the sampling interval $[e^{-T_2/\lambda}, e^{-T_1/\lambda}]$ for $y/T$.
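The sampling step just described can be sketched in a few lines of Python. This is a hedged illustration rather than the original code: the function names are hypothetical, the values $\lambda = 116.9$ and $T_2 = 460$ are taken from the Calculus row of Table 3, the lower bound $T_1 = 30$ is an assumption (one minute expressed in the 2-second unit of Table 5), and the rounding to integers is omitted. Drawing $y/T$ uniformly over $[e^{-T_2/\lambda}, e^{-T_1/\lambda}]$ and applying Eq. (4) yields an exponential law restricted to $[T_1, T_2]$, whose mean is known in closed form and serves as a check.

```python
import math
import random
import statistics


def sample_cutoff_durations(lam, t1, t2, size, rng):
    """Durations k = lam * log(T / y), y/T uniform on [exp(-t2/lam), exp(-t1/lam)]."""
    lo, hi = math.exp(-t2 / lam), math.exp(-t1 / lam)
    return [-lam * math.log(rng.uniform(lo, hi)) for _ in range(size)]


def truncated_exponential_mean(lam, t1, t2):
    """Mean of an exponential law with scale lam restricted to [t1, t2]."""
    span = t2 - t1
    tail = math.exp(-span / lam)
    return t1 + lam - span * tail / (1.0 - tail)


rng = random.Random(1)
lam, t2 = 116.9, 460.0  # Calculus row of Table 3 (unit: 2 seconds)
t1 = 30.0               # assumption: one minute in 2-second units
durations = sample_cutoff_durations(lam, t1, t2, 20000, rng)

print("min, max     :", min(durations), max(durations))
print("sample mean  :", statistics.fmean(durations))
print("analytic mean:", truncated_exponential_mean(lam, t1, t2))
```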
Table 5 in the appendix shows the detail of this simulation process.

Analytical arguments allow for the prediction of the exponential cutoff. The simulations based on the solution (4) also provide a reasonable fit to the empirical data (Fig. 5, the index $\psi$ in Table 3). Therefore, Eq. (3) can be regarded as an expression of the evolutionary mechanism for the exponential cutoff and for the random dropouts with a constant rate.

Table 3. Parameters of synthetic power law and exponential cutoff.

Course | $T_2$ | $\lambda$ | $N_1$ | $\psi_1$ | $a$ | $b$ | $c$ | $N_2$ | $\psi_2$
Calculus | 460 | 116.9 | 1,217 | 26.38% | 1.30e03 | 3.682 | 7.25e-2 | 1,169 | 11.89%
C Language | | | | | | | | |
Etiquette | 206 | 42.62 | 683 | 22.36% | 1.89e01 | 2.122 | 7.71e-2 | 1,079 | 10.29%
Finance | | | | | | | | |
Game theory | 678 | 181.4 | 1,834 | 25.08% | 1.47e02 | 2.904 | 7.51e-2 | 1,408 | 7.74%
Psychology | 495 | 93.70 | 1,718 | 31.49% | 1.04e01 | 2.170 | 5.54e-2 | 1,188 | 17.51%
Python | 616 | 104.2 | 6,607 | 14.59% | 2.70e00 | 1.351 | 6.94e-2 | 4,261 | 13.49%
Spoken English | 916 | 93.70 | 4,062 | 25.58% | 1.29e36 | 22.78 | 5.92e-2 | 7,074 | 18.52%

Index $T_2$: the maximum duration of segment-learners, $\lambda$: the parameter of Eq. (3), $N_1$ and $\psi_1$ ($N_2$ and $\psi_2$): the number of the segment-learners with duration no less than (less than) one minute and half of the cumulative difference between the duration distribution of those learners and the corresponding synthetic distribution, $a$ and $b$: the parameters of Eq. (8), $c$: the normalization coefficient of Formula (7).

Early decreasing dropouts and power law

The early decreasing trend of dropout rates appears in the hazard functions of empirical data, which describes a phenomenon that the dropout rate of viewing a course decreases within the first minute. It describes the period of “infant mortality” where the learners who only tour a course drop viewing. Meanwhile, the density functions of empirical data show that the viewing durations of segment-learners who viewed less than one minute can be fitted approximately by a power-law function (Fig. 6). Denote the density function of a power-law distribution by $f(x) = cx^{-\alpha}$, where $c$ is the normalization coefficient, $\alpha \in (0, 1)$, and $x$ is valued in a finite interval, denoted as $[R_1, R_2]$. Note that the value of the exponent $\alpha$ is different from that of degree distributions in network sciences, which is larger than one. The corresponding survivor function is $S(x) = 1 - c(x^{1-\alpha} - 1)/(1 - \alpha)$, and the hazard function is

$$h(x) = \frac{c x^{-\alpha}}{1 - \frac{c}{1-\alpha}\left(x^{1-\alpha} - 1\right)}. \quad (7)$$

It decreases with the growth of $x$ as long as $x$ stays below a threshold that is proportional to $R_2$ and depends on $\alpha$.

To find the evolutionary mechanism underlying the early decreasing dropouts, we also went back to the mechanism underlying such a power law. The durations of learners approximately following a power law are less than those approximately following an exponential distribution. Hence it is more reasonable to regard their learning behavior as memoryless. Therefore, we supposed the durations of those learners are also governed by Eq. (3). Meanwhile, a power law reflects the heterogeneity of samples [34, 35]. It means the parameter $\lambda$ of Eq. (3) should be heterogeneous over learners.

Figure 6. The power-law parts of viewing duration distributions. Panels (a)-(h): Calculus, C Language, Etiquette, Finance, Game theory, Psychology, Python, Spoken English; horizontal axis: viewing time length; vertical axis: learner proportion. Panels show the duration distributions of the segment-learners who viewed less than one minute (red circles), compared with the predictions of Eq. (9) (blue squares).

We expressed this heterogeneity by $\lambda(\nu) = \nu^b/a$, and then

$$\frac{d}{dt}k_i(t) = \frac{\nu^b}{a t}, \quad (8)$$

where $\nu$ is a random integer of the uniform distribution over $[S_1, S_2]$, $a > 0$, and $b > 1$. Solving it on the interval $[y, T]$ gives rise to

$$k_i(T) = \frac{\nu^b}{a} \log\frac{T}{y}, \quad (9)$$

where $y$ is the start time of viewing, sampled from the uniform distribution over $[T_0, T]$. Hence the expected value of $k_i(T)$ is $\nu^b \log 2/a$, which yields

$$p(k_i(T) \le x) = p\left(\nu \le \left(\frac{a x}{\log 2}\right)^{1/b}\right) = \frac{1}{S_2 - S_1 + 1}\left(\frac{a x}{\log 2}\right)^{1/b}. \quad (10)$$

Then the density function of viewing duration is

$$p(k_i(T) = x) = \frac{d}{dx}\, p(k_i(T) \le x) \propto x^{1/b - 1}. \quad (11)$$

The strict deduction of the density function needs averaging over all possible $\nu$, which yields

$$p(k_i(T) = x) = \frac{1}{S_2 - S_1}\int_{S_1}^{S_2} \frac{1}{\lambda(\nu)}\, e^{-x/\lambda(\nu)}\, d\nu = \frac{1}{S_2 - S_1}\int_{S_1}^{S_2} a\nu^{-b} e^{-a\nu^{-b}x}\, d\nu = \frac{a^{1/b}}{b\,(S_2 - S_1)}\, x^{1/b - 1} \int_{aS_2^{-b}x}^{aS_1^{-b}x} \tau^{-1/b} e^{-\tau}\, d\tau \propto x^{1/b - 1} I(x), \quad (12)$$

where $I(x) = \int_{aS_2^{-b}x}^{aS_1^{-b}x} \tau^{-1/b} e^{-\tau}\, d\tau$. Differentiate the integration part to obtain

$$\frac{d}{dx} I(x) = a^{1 - 1/b}\, x^{-1/b} \left(S_1^{1-b} e^{-aS_1^{-b}x} - S_2^{1-b} e^{-aS_2^{-b}x}\right). \quad (13)$$

This derivative is approximately equal to 0 if $a$ is large enough, which is guaranteed by the empirical values of $a$ in Table 3. Hence the integration part is free of $x$ and $p(k_i(T) = x) \propto x^{1/b - 1}$.

To make the simulated distributions fit the empirical ones, we valued the parameters of Eq. (9) based on empirical data as follows. Calculate the domain $[R_1, R_2]$ of the durations of segment-learners (which are less than one minute), and fit their distribution by the power law $cx^{-\alpha}$. The fitted values of $\alpha$ and $c$ are listed in Table 2 and Table 3 respectively. Comparing the coefficients of Eq. (11) to $\alpha$ and $c$ gives rise to $\alpha = 1 - 1/b$ and $c = (a/\log 2)^{1/b}/((S_2 - S_1 + 1)\, b)$. Solving them gives the values of $a$ and $b$. Requiring the expected duration $\nu^b \log 2/a$ to belong to $[R_1, R_2]$ gives rise to the sampling interval for $\nu$ and then for $y/T$.
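The parameter relations and the sampling intervals above can be illustrated with a short Python sketch. It is a hedged reading of the procedure, not the original code: the function names are hypothetical, $\nu$ is drawn as a continuous uniform variable rather than an integer, the rounding of durations is omitted, and $[R_1, R_2] = [1, 29]$ is an assumption (durations below one minute in 2-second units). The values $b = 3.682$ and $c = 7.25 \times 10^{-2}$ come from the Calculus row of Table 3, with $\alpha$ back-computed from $b$, and $S_1 = 1$, $S_2 = 29$ as stated with Table 6.

```python
import math
import random


def model_parameters(alpha, c, s1, s2):
    """Invert alpha = 1 - 1/b and c = (a/log 2)^(1/b) / ((s2 - s1 + 1) * b)."""
    b = 1.0 / (1.0 - alpha)
    a = (c * (s2 - s1 + 1) * b) ** b * math.log(2)
    return a, b


def sample_head_durations(a, b, r1, r2, size, rng):
    """Durations k = (nu^b / a) * log(T / y) restricted to [r1, r2], as in Eq. (9)."""
    nu_lo = (a * r1 / math.log(2)) ** (1.0 / b)
    nu_hi = (a * r2 / math.log(2)) ** (1.0 / b)
    out = []
    for _ in range(size):
        nu = rng.uniform(nu_lo, nu_hi)
        lam = nu ** b / a  # lambda(nu) of Eq. (8)
        lo, hi = math.exp(-r2 / lam), math.exp(-r1 / lam)
        out.append(-lam * math.log(rng.uniform(lo, hi)))
    return out


rng = random.Random(2)
alpha = 1.0 - 1.0 / 3.682  # back-computed from b = 3.682 (Calculus, Table 3)
c = 7.25e-2                # Calculus, Table 3
a, b = model_parameters(alpha, c, s1=1, s2=29)
print("a, b:", a, b)       # Table 3 lists a = 1.30e03 for this course

r1, r2 = 1.0, 29.0         # assumption: durations below one minute, unit 2 seconds
heads = sample_head_durations(a, b, r1, r2, 10000, rng)
print("min, max:", min(heads), max(heads))
```

Recovering a value of $a$ close to the one listed in Table 3 from $b$ and $c$ alone is a consistency check on the relation $a = (c\,(S_2 - S_1 + 1)\, b)^b \log 2$.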
Table 6 in the appendix shows the detail of this simulation process.

The above analysis realizes a process of deriving a power law from a range of exponential distributions. Moreover, it provides an explanation for the early decreasing trend of the hazard functions. That is, the dropout rate $1/\lambda(\nu)$ decreases with the growth of the expected value $\lambda(\nu)$. The simulations based on the solution (9) also provide a reasonable fit to the heads of the empirical duration distributions (Fig. 6). Therefore, Eq. (8) can be regarded as an expression of the evolutionary mechanism for the power law and for the early decreasing trend. In addition, the memorylessness of Eq. (8) together with that of Eq. (3) can be regarded as the intrinsic meaning of the class name segment-learners.

Figure 7. An illustration of the presented results. The illustrated data are from the course C Language. Panel (a) shows the density function and the evolutionary equations for viewing durations, $\partial k_i/\partial t \propto 1/t$ (left) and $dk_i(t) = \mu k_i(t)\,dt + \sigma k_i(t)\,dw$ (right). Panels (b, c) show the features of the density function, and those of the corresponding hazard function. The equation on the left describes the evolutionary mechanism for the power law with exponential cutoff, the early decreasing dropouts, and the random dropouts with a constant rate. The equation on the right describes the mechanism for the lognormal distribution, the wear-out dropouts, and the Lindy effect.

Discussion and conclusions

The survival analysis on the viewing behavior of learning MOOCs shows that the hazard functions of empirical viewing durations are characterized by the Lindy effect and the bathtub curve simultaneously. Two random differential equations are provided to describe the growth processes of viewing durations. The solutions to these equations predict the features of the hazard functions. Therefore, these equations can be regarded as mathematical expressions of the evolutionary mechanisms underlying these features. We summarized the presented results in Fig. 7.

The presented results have the potential to illuminate specific views and implications in a broader study of learning behaviors. For example, the features of viewing duration distributions can be used to profile the type of learners, such as lognormal-learners, the learners with a duration approximately following an exponential distribution, and those with a duration approximately following a power law. The fractions of these types vary over courses (Fig. 8). Over half of the learners studying the course Calculus are lognormal-learners. Almost half of the learners taking C Language or Python viewed less than one minute, where their duration approximately follows a power law. Weighting each type with a different value helps to measure the attractions of MOOCs in a reasonable way.

Figure 8. The fractions of three learner-types. Courses shown: Calculus, C Language, Etiquette, Finance, Game theory, Psychology, Python, Spoken English. The three types of learners are lognormal-learners (the durations of them follow a lognormal distribution), the segment-learners with duration no less than one minute (those durations follow an exponential distribution approximately), and the rest (the durations of them follow a power law approximately).

Comparing the duration distributions before and after adopting a teaching method helps to know whether the method significantly increases or decreases learning durations. For example, if the KS test cannot distinguish the duration distributions of lognormal-learners before and after the change, the improvement brought by the adopted method cannot be called significant. This can also be used to compare the attractions of different courses. It removes the heterogeneity of the number of learners, and hence is a fair way for the courses with high quality but few learners.

In the presented equations, the viewing durations are based on random factors, and on memory or memorylessness. At the beginning of studying a course, a learner does not need the knowledge of the course. When studying deeply, the learner would need the knowledge learned from the course. This process could be regarded as the transition from memorylessness to memory. Meanwhile, the viewing duration distribution of each empirical course has a fat tail, known as a feature of complexity. We found that each tail is dominated by the tail of a lognormal distribution, and that viewing with memory can generate a lognormal distribution. Therefore, exploring the mechanisms underlying the transition would contribute to understanding the role of memory in the complexity of learning behaviors.
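The before/after comparison of duration distributions mentioned in this discussion can be sketched with a two-sample KS test. The code below is an illustration under stated assumptions, not part of the study: the function `ks_two_sample` is a hypothetical stdlib-only helper, its p-value uses the standard asymptotic Kolmogorov series with a small-sample correction and is therefore approximate, and the "before" and "after" samples are synthetic lognormal draws rather than empirical durations.

```python
import math
import random


def ks_two_sample(x, y):
    """Two-sample KS statistic D and an approximate asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:  # advance the first empirical CDF past v
            i += 1
        while j < m and ys[j] <= v:  # advance the second empirical CDF past v
            j += 1
        d = max(d, abs(i / n - j / m))
    root = math.sqrt(n * m / (n + m))
    z = (root + 0.12 + 0.11 / root) * d
    if z < 0.2:  # the series below is numerically 1 in this range
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


rng = random.Random(3)
before = [rng.lognormvariate(5.0, 1.0) for _ in range(500)]   # synthetic
after = [rng.lognormvariate(5.5, 1.0) for _ in range(500)]    # synthetic, shifted

d, p = ks_two_sample(before, after)
print("D =", d, "p =", p)
print("reject identical distributions at 5%:", p < 0.05)
```

Because the statistic depends only on the two empirical distribution functions, the comparison is not driven by how many learners each course or period has, which is the property the discussion relies on.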

Appendixes

Categorization of learners

The following algorithm comes from Reference [29].

Table 4. An algorithm of categorizing learners.

Input: the viewing duration $t_s$ and the number of viewed videos $n_s$ of learners $L_s$ ($s = 1, \ldots, m$).
For $k$ from 0 to $\max(n_1, \ldots, n_m)$:
    do the KS test for $t_s$ of the learners $L_s$ satisfying $n_s > k$ with the null hypothesis that they follow a lognormal distribution;
    break if the test cannot reject the null hypothesis at significance level 5%.
Output: the current $k$ (denoted as $\kappa$).

The unit of durations is millisecond. Categorize learner $L_s$ as a lognormal-learner if $n_s > \kappa$, and as a segment-learner otherwise.

The process of generating synthetic durations

Table 5. Modelling exponential cutoff.

Input: the durations (no less than one minute) of segment-learners.
Regress the coefficients of $\delta e^{-x/\lambda}$ based on the input.
Calculate the input's domain $[T_1, T_2]$.
For $i$ in range 1 to the number of empirical durations:
    sample a $y/T$ from the uniform distribution over $[e^{-T_2/\lambda}, e^{-T_1/\lambda}]$;
    substitute it into Eq. (4) to obtain a random integer;
    append the integer to the list of synthetic durations.
Output: the list of synthetic durations.

The unit of durations is 2 seconds.

Acknowledgments

The author thanks Researcher Xiao Xiao at the Higher Education Press, Professor Ming Zhang at Peking University, and Professors Jinying Su and Jianping Li at the National University of Defense Technology for their helpful comments and feedback. The author is grateful to the MOOC platform iCourse for its empirical data. This work is supported by National Science Foundation of China (Grant No. 61773020).

Table 6. Modelling power-law part.

Input: the durations (less than one minute) of segment-learners, and the domain $[S_1, S_2]$ of $\nu$.
Regress the coefficients of $cx^{-\alpha}$ based on the input.
Calculate the parameters of Eq. (9): $b = 1/(1 - \alpha)$, $a = (c\,(S_2 - S_1 + 1)\, b)^b \log 2$.
Calculate the input's domain $[R_1, R_2]$.
For $i$ in range 1 to the number of empirical durations:
    sample a $\nu$ from the uniform distribution over $[(aR_1/\log 2)^{1/b}, (aR_2/\log 2)^{1/b}]$;
    sample a $y/T$ from the uniform distribution over $[e^{-R_2/\lambda(\nu)}, e^{-R_1/\lambda(\nu)}]$;
    substitute them into Eq. (9) to obtain a random integer;
    append the integer to the list of synthetic durations.
Output: the list of synthetic durations.

The unit of durations is 2 seconds, $S_1 = 1$, and $S_2 = 29$.

References

1. L. Breslow, D. E. Pritchard, J. DeBoer, G. S. Stump, A. D. Ho, and D. T. Seaton, Studying learning in the worldwide classroom: research into edX's first MOOC, Res. Pract. Assess., vol. 8, pp. 13-25, 2013.
2. M. Zhu, A. Sari, and M. M. Lee, A systematic review of research methods and topics of the empirical MOOC literature (2014-2016), Int. High. Educ., vol. 37, pp. 31-39, 2018.
3. E. J. Emanuel, Online education: MOOCs taken by educated few, Nature, vol. 503, no. 7476, pp. 342-342, 2013.
4. J. Reich, Rebooting MOOC research, Science, vol. 347, no. 6217, pp. 34-35, 2015.
5. A. Anderson, D. Huttenlocher, J. Kleinberg, and J. Leskovec, Engaging with massive online courses, in Proc. 23rd Int. Conf. on World Wide Web, ACM, 2014, pp. 687-698.
6. K. Jona and S. Naidu, MOOCs: emerging research, Dist. Educ., vol. 35, no. 2, pp. 141-144, 2014.
7. J. DeBoer, A. D. Ho, G. S. Stump, and L. Breslow, Changing “course”: reconceptualizing educational variables for massive open online courses, Educ. Res., vol. 43, no. 2, pp. 74-84, 2014.
8. J. P. Meyer and S. Zhu, Fair and equitable measurement of student learning in MOOCs: An introduction to item response theory, scale linking, and score equating, Res. Pract. Assess., vol. 8, pp. 26-39, 2013.
9. A. S. Sunar, W. Su, N. A. Abdullah, and H. C. Davis, How learners' interactions sustain engagement: a MOOC case study, IEEE. T. Learn. Technol., vol. 10, no. 4, pp. 475-487, 2017.
10. S. R. Emmons, R. P. Light, and K. Börner, MOOC visual analytics: empowering students, teachers, researchers, and platform developers of massively open online courses, J. Assoc. Inf. Sci. Technol., vol. 68, pp. 2350-2363, 2017.
11. C. Sandeen, Assessment's Place in the New MOOC World, Res. Pract. Assess., vol. 8, pp. 5-12, 2013.
12. S. I. de Freitas, J. Morgan, and D. Gibson, Will MOOCs transform learning and teaching in higher education? Engagement and course retention in online learning provision, Br. J. Educ. Technol., vol. 46, no. 3, pp. 455-471, 2015.
13. J. A. Greene, C. A. Oswald, and J. Pomerantz, Predictors of retention and achievement in a massive open online course, Am. Educ. Res. J., vol. 52, no. 5, pp. 925-955, 2015.
14. K. S. Hone and G. R. E. Said, Exploring the factors affecting MOOC retention: A survey study, Comput. Educ., vol. 98, pp. 157-168, 2016.
15. A. Littlejohn, N. Hood, C. Milligan, and P. Mustain, Learning in MOOCs: motivations and self-regulated learning in MOOCs, Int. High. Educ., vol. 29, pp. 40-48, 2016.
16. M. Barak, A. Watted, and H. Haick, Motivation to learn in massive open online courses: Examining aspects of language and social engagement, Comput. Educ., vol. 94, pp. 49-60, 2016.
17. P. G. de Barba, G. E. Kennedy, and M. D. Ainley, The role of students' motivation and participation in predicting performance in a MOOC, J. Comput. Assist. Learn., vol. 32, no. 3, pp. 218-231, 2016.
18. A. Watted and M. Barak, Motivating factors of MOOC completers: comparing between university-affiliated students and general participants,

Int. High. Educ., vol. 37, pp.11-20, 2018.19. S. Zheng, M. B. Rosson, P. C. Shih, and J. M. Carroll, Understanding student motivation, behaviorsand perceptions in MOOCs, in

Proc. 18th ACM Conf. on Computer Supported Cooperative Work &Social Computing,

Comput. Educ., vol. 80, pp.28-38, 2015.21. K. Jordan, Initial trends in enrolment and completion of massive open online courses,

Int. Rev.Res. Open. Dist. Learn., vol. 15, pp.133-160, 2014.22. R. F. Kizilcec, C. Piech, E. Schneider, Deconstructing disengagement: analyzing learner subpopu-lations in massive open online courses, in

Proc. 3rd Int. Conf. on Learning Analytics and Knowledge,

ACM, 2013, pp.170-179.23. K. F. Hew, Promoting engagement in online courses: What strategies can we learn from threehighly rated MOOCs,

Br. J. Educ. Technol., vol. 47, no. 2, pp.320-341, 2016.24. L. Y. Li and C. C. Tsai, Accessing online learning material: Quantitative behavior patterns andtheir eﬀects on motivation and learning performance,

Comput. Educ., vol. 114, pp.286-297, 2017.25. G. Cheng and J. Chau, Exploring the relationships between learning styles, online participation,learning achievement and course satisfaction: An empirical study of a blended learning course,

Br.J. Educ. Technol., vol. 47, no. 2, pp.257-278, 2016.26. C. R. Henrie, L. R. Halverson, and C. R. Graham, Measuring student engagement in technology-mediated learning: A review,

Comput. Educ., vol. 90, pp.36-53, 2015.27. G. A. Klutke, P. C. Kiessler, M. A. Wortman, A critical look at the bathtub curve,

IEEE. T.Reliab., vol. 52, no. 1, pp.125-129, 2003.28. J. Holman, Antifragile: things that gain from disorder,

Quant. Financ., vol. 13, no. 11, pp.1691-1692, 2013.29. Z. Xie, Bridging MOOC education and information sciences: empirical studies.

IEEE Access , vol.7, pp. 74206-74216, 2019.30. D. G. Kleinbaum and M. Klein,

Survival Analysis . New York, Springer, 2012.31. N. M. Kiefer, Economic duration data and hazard functions,

J. Econ. Lit., vol. 26, no. 2, pp.646-679, 1988.32. J. H. Gaddum, Lognormal distributions,

Nature, vol. 463, no. 3964, 1945.033. J. Sutton, Gibrat’s legacy,

J. Econ. Lit., vol. 35, no. 1, pp.40-59, 1997.34. Z. Xie, Z. Z. Ouyang, J. P. Li, E. M. Dong, and D. Y. Yi, Modelling transition phenomena ofscientiﬁc coauthorship networks,

J. Assoc. Inf. Sci. Technol., vol. 69, pp.305-317, 2018.35. Z. Xie, J. P. Li, and M. Li, Exploring cooperative game mechanisms of scientiﬁc coauthorshipnetworks,