Predicting Online Video Engagement Using Clickstreams
Everaldo Aguiar
University of Notre Dame, Notre Dame, Indiana
[email protected]
Saurabh Nagrecha
University of Notre Dame, Notre Dame, Indiana
[email protected]
Nitesh V. Chawla ∗
University of Notre Dame, Notre Dame, Indiana
[email protected]
ABSTRACT
In the nascent days of e-content delivery, having a superior product was enough to give companies an edge against the competition. With today's fiercely competitive market, one needs to be multiple steps ahead, especially when it comes to understanding consumers. Focusing on a large set of web portals owned and managed by a private communications company, we propose methods by which these sites' clickstream data can be used to provide a deep understanding of their visitors, as well as their interests and preferences. We further expand the use of this data to show that it can be effectively used to predict user engagement with video streams.
Author Keywords
Clickstream; predictive analysis; online video; user engagement.
ACM Classification Keywords
I.5.2. Pattern Recognition: Design Methodology
INTRODUCTION
The constant growth in volume, speed, availability, and functionality of the Web brings with it not only a variety of challenges and risks, but also a number of opportunities. While there have been a series of major advances in the field over time, one that has been given a considerable amount of attention in more recent years is that of personalization.

Data about users' online activity is continuously captured and analyzed. Advanced recommendation systems are now able to tell us what products we might be interested in buying, the books we will enjoy reading, what movies we should watch next, and even which diseases we are at risk of contracting. From a business perspective, the benefits of being able to understand customers at this level of detail are unquestionable.

Methods for capturing user data on the Web are also becoming increasingly efficient. As described in [20], the browsing behavior of individual users can be recorded at the granularity of mouse clicks with little to no work needing to be done. A number of services, both free and proprietary, offer user tracking solutions that can be implemented and deployed within minutes. However, the feedback that one usually gets from these tools often comes in the form of simplistic aggregate statistics that do not offer a deeper understanding of user behavior.

With that in mind, we set out to analyze the application of some of these ideas to a specific context, having as our major goal the understanding of each user as an individual unit.

∗ Corresponding Author
For this study, we were provided a large dataset that describes user clicks generated within a two-month span and across a number of websites managed by a large communications company. This paper describes the process through which we parsed, analyzed, and drew knowledge from that user-generated clickstream dataset. We begin by showing, from a more general perspective, how this type of data can be used to identify particularly interesting trends in user interest. To further illustrate the usefulness of this information, we then describe how we applied methods to predict user engagement with video streams and discuss their accuracy.
Figure 1. Video viewership drop-off by category of content. Viewer retention is plotted at each progress milestone (video start, 25%, 50%, 75%, and complete) for the Community, Entertainment, Food, Health, Home, News, Politics, Sports, Technology, and Weather categories.
Distributing content that entices user engagement and captures large audiences is the ultimate goal of all web media providers. Measuring and forecasting these variables, however, is not an easy task. As Figure 1 illustrates, as time goes by, the number of users that remain tuned to video streams dramatically decreases. For certain categories, the percentage of users that actually watch videos to completion can be as low as 20%.

To address this undesired outcome, we propose the development of clickstream-based models that can learn the individual preferences and characteristics of each user, and utilize this information to predict how "engaged" they will be with a particular video stream. Being able to know, in advance, if a user is likely to exit a video prematurely allows content providers some leeway to implement personalized intervention strategies aimed at maximizing viewership retention.

The remainder of the paper is organized as follows. The next section gives an overview of the most recent related literature. That is followed by a detailed coverage of clickstream data representation and a description of our particular dataset. We then elaborate on the methods applied in this study, the results obtained, and their importance. Finally, the last section draws conclusions about this exercise and argues for the latent potential that resides in user-generated clickstream data.

RELATED WORK
Interest in analyzing the online activities of users is as old as providing consumable content itself. This problem has piqued the interest of multiple fields, namely marketing, psychology, and computer science.

Since user activity provides an immense amount of measurable secondary data, various models to predict multiple aspects of user behavior have been proposed. User interaction has been studied at various levels, from gaze tracking [8] to broader patterns of path traversal within a website [1, 22]. Simple duration and dwell-time [4] can be used to predict when a user exits the site. User classification [21] can be used to identify what the user is specifically looking for and even morph the website [16] according to the custom tastes of that particular user profile. Personalized content based on click history has been implemented and widely adopted by commercial content providers [6, 19].

With the distribution of video content online becoming mainstream, the way we study user engagement has been greatly enriched. Studies like [7] have measured the role of video content quality in influencing user engagement, but did not utilize clickstreams to contextualize the video views. Work on online video engagement for Massive Open Online Courses (MOOCs) [13] has shown that the lessons learned from analyzing video views can be used to improve video authoring, editing, and interface design. It also emphasizes the value of video dropout as a metric for engagement. Though the MOOC work lacks the contextual history of the users, in this paper we leverage similar and many other clickstream features to predict video engagement.
CLICKSTREAM DATA REPRESENTATION
Clickstream data consists of a "virtual trail" that users leave behind while they interact with a given system, website, or application. More specifically, data that describes the state of a user's current session is recorded each time a click is performed, and the aggregation of that produces a clickstream, which can be used to reconstruct all actions taken by the user while he or she utilized that given product.

While applicable to a variety of scenarios, the collection and analysis of clickstreams has become most notably popular in the context of Web-based tools and websites. As highlighted by Srivastava et al. [25], the analysis of such information has potential applications in a number of areas such as website personalization and modification, system improvement, business intelligence, and usage characterization. Our contributions fall mainly within the first and last domains.

Figure 2. A simple illustration of the clickstream of a typical user
Our Dataset
The data we utilized for this study was provided to us by a large U.S.-based communications company that operates in the radio, TV, newspaper, and online media domains. They manage a few dozen websites, all of which are embedded with clickstream-capturing functionality. Next, we give a detailed description of the most important features this dataset contains.

User activity is continuously captured by numerous servers across the country and is then concatenated at the end of the day in the form of daily "dumps". We utilized 59 of these files, covering the period from December 4, 2012 to January 31, 2013. Altogether, these files contain upwards of 65 million click instances.

Each click instance recorded is characterized by a large number of features (161 in this case). Table 1 lists a small subset of the most relevant features and a brief description of each. With that information we are able to determine (1) how users reached the website, (2) what attracted them there, (3) what actions they performed while on the site, and (4) how they eventually exited.

Feature Type | Feature Name        | Description
Nominal      | Browser             | The browser that was used
Nominal      | Channel             | The site that the page view belongs to
Nominal      | City                | The city the user accessed the page from
Nominal      | Cookies             | Whether the user had cookies turned on or not
Nominal      | Country             | The country the user accessed the page from
Nominal      | Domain              | Domain of the user's ISP
Nominal      | Exclude hit         | Identifies web crawlers
Nominal      | First hit page      | URL where the user first landed on the website
Nominal      | Frequency of visits | Denotes hourly, daily, weekly, monthly, or yearly visits
Nominal      | IP                  | The IP address of the user
Nominal      | New visit           | Whether the user is new to the site, based on cookies
Nominal      | Referrer            | The URL of the website that referred this user
Nominal      | Region              | The state or region the user was in
Nominal      | Search keywords     | The search string which led to the particular page
Nominal      | Section             | The section of the website where the click took place
Nominal      | Subsection          | The subsection of the website where the click took place
Numeric      | First hit time      | Timestamp of when the user first landed on the website
Numeric      | Last click          | Timestamp of when the last click was made by the user
Numeric      | Last visit          | When the user last visited the site
Numeric      | Time & Date         | Timestamp of when the click instance happened
Numeric      | Visit number        | The number of times the user has visited the site

Table 1. Dataset features.

Note that while there is no feature that captures the event of a user leaving the website, as is common practice, we work under the assumption that when a user is inactive for a period longer than 30 minutes (i.e., no click events originate from this person during that time), the user has exited the site.

This assumption allows us to group these click events from the original datasets into user sessions, which illustrate the path a user takes while browsing the website and can be used to identify areas that attract more (or less) traffic.

Figure 2 illustrates one individual session chosen at random from our dataset. We can see that the user in this case was referred to our domain through a link that he or she found on a social network website, and that their visit consisted of several hops, most of which happened in the news section.

Furthermore, these sessions can be aggregated, producing a high-level view of the entire website structure by popularity of section, which allows us to visualize which areas of the website are more popular, as well as which links connecting different sections are traversed the most. Take for instance the example illustrated in Figure 3. To generate this particular graph, we isolated the sessions corresponding to a certain newspaper's website, its 12 most popular sections, and the traffic between them. Among other observations, we noticed that the readers of this particular newspaper were often prone to navigating to the sports section and reading multiple articles there.

Lastly, we note that based on information retrieved from specific features of our dataset, it is possible to determine if a user is simply browsing text articles, displaying image galleries, or streaming online video.
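The 30-minute inactivity rule described above can be sketched in a few lines. This is an illustrative reconstruction rather than the pipeline actually used in the study; the `(timestamp, page)` tuple representation and the function name are our own assumptions.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(clicks):
    """Group one user's clicks into sessions.

    `clicks` is a list of (timestamp, page) tuples for a single user.
    A gap of more than 30 minutes between consecutive clicks starts a
    new session. Returns a list of sessions, each a list of clicks.
    """
    if not clicks:
        return []
    clicks = sorted(clicks, key=lambda c: c[0])
    sessions = [[clicks[0]]]
    for click in clicks[1:]:
        # Inactivity longer than the timeout means the user "exited".
        if click[0] - sessions[-1][-1][0] > SESSION_TIMEOUT:
            sessions.append([click])
        else:
            sessions[-1].append(click)
    return sessions
```

Aggregating the per-user output of such a routine over all click instances is what produces session graphs like the one in Figure 3.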
The following sections of this paper will describe how we used this fact to aid in the development of predictive models for video viewership engagement.
METHODS

Identification of Video Exit Instances
When a user watches a video, a separate log entry is made corresponding to when he or she completes watching a certain percentage of the video, while a player ID remains constant. This makes the clickstream log reflect a cumulative history of the viewer's progress within that video.

By filtering the data to get only clicks corresponding to video instances, and then by IP address, we obtain the entire video-viewing activity of each IP. From this modified dataset, we isolate an individual "video view" table by specifying the player ID. This table is then sorted chronologically and filtered by session timeout. The last entry corresponds to the viewer's exit point. This gives us a unique session for a visit. In combination with the current session data, and data from cookies, we retrieve the user's unique historical browsing patterns. It should be noted that in the absence of cookies, we treat the user as a fresh incoming visitor. For our analysis, we isolated only the instances where the user exited the video.

Due to the inherently discretized nature of the data collection, we get a coarse-grained estimate of when the user reached a certain percentage of the video. If the last entry shows that the user watched 50% of a video, it can be inferred that the user exited at p%, such that p ∈ [50, 75).

Figure 3. Clickstream network for a news-media website. The various nodes displayed here represent different sections. The direction of the arrows represents user traffic flowing between these sections and the thickness is indicative of the volume of said traffic.
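The coarse exit-point inference described above can be sketched as follows. The list-of-markers input is a hypothetical simplification of the cumulative per-player-ID log; the actual pipeline first filters by video clicks, IP address, player ID, and session timeout.

```python
# Cumulative progress markers logged by the video player.
MILESTONES = (0, 25, 50, 75, 100)

def exit_interval(progress_entries):
    """Infer the coarse exit interval for one video view.

    `progress_entries` holds the percent-complete markers logged for a
    single player ID, e.g. [0, 25, 50]. The highest marker is the last
    point the viewer is known to have reached.
    """
    last = max(progress_entries)
    if last == 100:
        return (100, 100)                        # watched to completion
    nxt = MILESTONES[MILESTONES.index(last) + 1]
    return (last, nxt)                           # exited at p%, p in [last, nxt)
```

For example, a log ending at the 50% marker yields the interval [50, 75), matching the inference in the text.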
Feature Selection

Using various feature selection methods, we reduced the size of our dataset from the original 161 features to the 12 best descriptors. Among these were features like IP, location, content annotations, and referrer information. Out of the 161 features in a typical video exit instance, 40 are mutually redundant, and 32 are constant in value. This motivates the need to find a set of features that best describes the target class (in this case, the percent of the video the user watches before exiting) [14]. We investigated various feature selection methods which support mixed data types and ranked the top features. One would expect these features to encompass measurable user traits which influence their interest in the video. The various feature selection methods aim to remove redundant and irrelevant features using different statistical means, each with its respective strengths. Though a popular choice in machine learning, correlation-based feature selection (CFS) was not considered due to the sparse nature of the data [15]. A more detailed study of these methods can be found in [28, 12]. The feature selection methods employed in this problem are described below:
Chi Squared
The chi-squared (χ²) method measures how much the observed data deviates from the case where the class value and the feature are independent of each other. It evaluates whether feature and class occurrences are randomly related or exhibit some dependence.
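As an illustration of how the χ² score can be computed for one nominal feature, a self-contained sketch follows (the study presumably used an off-the-shelf implementation such as Weka's; this version is our own reconstruction of the statistic):

```python
from collections import Counter

def chi_squared(pairs):
    """Chi-squared statistic for one nominal feature against the class.

    `pairs` is a list of (feature_value, class_label) observations.
    Compares observed co-occurrence counts with the counts expected
    if feature and class were independent.
    """
    n = len(pairs)
    joint = Counter(pairs)
    feat = Counter(f for f, _ in pairs)
    cls = Counter(c for _, c in pairs)
    stat = 0.0
    for f in feat:
        for c in cls:
            # Expected count under independence of feature and class.
            expected = feat[f] * cls[c] / n
            stat += (joint[(f, c)] - expected) ** 2 / expected
    return stat
```

A feature that is independent of the class scores near zero; larger values indicate stronger association and a higher ranking.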
Features           | Chi | IG | GR | oneR | Symm
Time               |  1  |  1 |  7 |  -   |  2
IP                 |  2  |  2 |  9 |  -   |  3
First hit referrer |  3  |  3 |  5 |  2   |  5
First hit page     |  4  |  5 | 10 |  -   |  7
Story title        |  5  |  4 |  2 |  1   |  1
Search engine      |  6  |  7 |  3 |  3   |  8
City               |  7  |  6 |  - |  -   |  9
ISP                |  8  |  8 |  - |  -   | 10
Referrer type      |  9  | 10 |  1 |  -   |  4

Table 2. Feature Selection Rankings (Chi: chi-squared, IG: information gain, GR: gain ratio, oneR: One R, Symm: symmetric uncertainty).
Information Gain
Information gain [23] measures how much of the entropy of the class is removed once the value of the feature is known.
Gain Ratio
Information gain favors attributes with many values over those with fewer values; the gain ratio [24] compensates for this by factoring in the amount of splitting caused by the feature.
One R
One R formulates a set of simple rules relating each feature to the class, and ranks the features based on how accurate these rules are.
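A minimal sketch of the One R scoring idea follows (our own illustration; full One R implementations also discretize numeric features, which is omitted here):

```python
from collections import Counter, defaultdict

def one_r_accuracy(feature, labels):
    """Score one nominal feature with the One R idea.

    For each feature value, the rule predicts the majority class seen
    with that value; the score is the accuracy of that rule on the
    same data. Higher scores mean a better-ranked feature.
    """
    by_value = defaultdict(Counter)
    for v, y in zip(feature, labels):
        by_value[v][y] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(labels)
```

A perfectly predictive feature scores 1.0, while a constant feature scores no better than the majority-class rate.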
Symmetric Uncertainty
Symmetric uncertainty [26, 10] targets attributes which correlate well with the class but have little intercorrelation.

The results of these feature selection methods are summarized in Table 2. The attributes in the table are the ones which consistently appear in the top 10%. These are the attributes which influence video exit points the most.

The time of viewing influences at what point people are prone to exit the video. IP address, in conjunction with location and ISP, indicates who is watching the video and thus offers a personalized facet to the prediction. The number of pages viewed by a person and their frequency of visits can be perceived as reflective of the person's interest in the site. The referrer which brought the viewer to the site can influence the engagement of the viewer; a viewer coming from a social network link interacts differently than one who had the site bookmarked in their browser. The entry point is the first page the viewer saw in their current viewing session; this determines their interest in consuming further content. The actual title of the story includes the section which the video is under. As we had observed in Figure 1, users viewing "Technology"-related videos were less likely to exit than those viewing "Entertainment"-related videos.
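The three entropy-based criteria above (information gain, gain ratio, and symmetric uncertainty) all derive from the same entropy computations, as this illustrative sketch shows (function names and the list-based data representation are our own):

```python
import math
from collections import Counter

def entropy(values):
    # Shannon entropy H(X) in bits.
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(feature, labels):
    # H(class | feature): class entropy within each feature value,
    # weighted by how often that value occurs.
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def info_gain(feature, labels):
    # Class entropy removed once the feature is known.
    return entropy(labels) - conditional_entropy(feature, labels)

def gain_ratio(feature, labels):
    # Penalize many-valued features by the entropy of the split itself.
    split_info = entropy(feature)
    return info_gain(feature, labels) / split_info if split_info else 0.0

def symmetric_uncertainty(feature, labels):
    # High when feature and class share information; normalized to [0, 1].
    h_sum = entropy(feature) + entropy(labels)
    return 2 * info_gain(feature, labels) / h_sum if h_sum else 0.0
```

Ranking features by each score independently is what produces the per-column orderings in Table 2.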
Classification
Our aim is to predict how much of the video a user watches before exiting. In our dataset, we find that this is represented by 5 distinct markers, which correspond to the percentage of the video the user watched before exiting. We formulate two classification tasks: to predict what percent of the video is watched, and to predict whether the user exits the video "early" (before reaching 50% of the video).

The first is a prediction task involving 5 classes. Since it is particularly relevant to identify users who exit early in the video, we consider users who exit at the very beginning, or after having viewed only 25% of the video, to have exited "early". As described above, this corresponds to users who viewed 0 to 49% of the video.
Figure 4. Converting the Percentages Classification to Early Exit Classification: The 5-class problem (top) is reduced to a binary classification problem by merging classes (bottom).
We can thus refine the problem as the binary prediction of these "early exits". The classes are then a merger of the previously mentioned 5 classes, with the first two combined to form the "early exits" and the latter 3 representing those who chose not to exit early. This simplification is depicted in the representative expected confusion matrices for both classification tasks, as displayed in Figure 4. We performed both prediction analyses on our data.
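The class merger above amounts to a simple relabeling of the five progress markers (a minimal sketch; the constant and function names are our own):

```python
# The five cumulative progress markers recorded in the clickstream.
MARKERS = (0, 25, 50, 75, 100)

def to_early_exit(exit_marker):
    """Map the 5-class exit marker to the binary 'early exit' label.

    Exits logged at the 0% or 25% marker mean the user left before
    reaching half of the video, i.e., watched 0-49% of it.
    """
    return exit_marker < 50
```

Applying this mapping to the 5-class labels yields the binary dataset used for the "early exit" experiments.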
Naive Bayes
Among the simplest classification algorithms, this probabilistic method is based on Bayes' theorem [2] and strong underlying independence assumptions: each feature is assumed to contribute independently to the class outcome.
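A minimal Naive Bayes for nominal features might look as follows. This is an illustrative sketch with Laplace smoothing; the study's experiments presumably used an existing implementation such as Weka's, and the class interface here is our own assumption.

```python
import math
from collections import Counter, defaultdict

class CategoricalNB:
    """Naive Bayes for nominal features with Laplace smoothing.

    Each feature contributes an independent likelihood term, per the
    independence assumption described above.
    """

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.n = len(labels)
        self.priors = Counter(labels)
        # counts[class][feature_index][value] = occurrence count
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)
        for row, y in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[y][i][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, row):
        def log_posterior(y):
            lp = math.log(self.priors[y] / self.n)
            for i, v in enumerate(row):
                num = self.counts[y][i][v] + 1          # Laplace smoothing
                den = self.priors[y] + len(self.values[i])
                lp += math.log(num / den)
            return lp
        return max(self.classes, key=log_posterior)
```

Working in log space avoids underflow when many per-feature likelihoods are multiplied together.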
C4.5 Decision trees
C4.5 decision trees [24] work by building a tree structure in which a split operation is performed at each node based on the information gain of each feature of the dataset with respect to the class. At each level, the attribute with the highest information gain is chosen as the basis for the split criterion.
Repeated Incremental Pruning to Produce Error Reduction
RIPPER [5] is a rule-based classification learner. It is algorithmically faster than C4.5, with a complexity of O(n(log n)²) as opposed to C4.5's rule learning, which is of the order O(n³). RIPPER constructs an initial set of rules and then iteratively optimizes it according to a tunable parameter. It is implemented in Weka as the "JRip" class.

Random forests
Random forests [3] combine multiple tree predictors in an ensemble. New instances being classified are pushed down the trees, and each tree reports a classification. The "forest" then decides which label to assign to this new instance based on the aggregate number of votes given by the set of trees.
Decision Tables
Decision Table classifiers [18] are built by concatenating a series of rules derived from the feature set to corresponding class outcomes. The major advantages of this method are that it is easy to interpret and notably efficient.
Random Subspaces
The random subspace method [17] is an ensemble classifier whose individual classifiers operate on random subsets of the feature set. The predictions made by the individual classifiers are combined using the posterior probabilities of each class in the constituent classifiers. This method looks at the classification problem from various perspectives by randomizing the selection of features.
Stacking
Stacking [27] is a meta-classification scheme which employs an ensemble of classifiers and performs the learning task on two levels. First, the classifiers in the ensemble are trained on the data; then the meta-classifier learns from their predictions and the training labels of the data.
Key Performance Indices / Metrics Utilized
Our key performance index is the accuracy of predicting when the user will drop off in the video. To obtain these predictions, we perform 10-fold cross-validation on the available data using various classification methods. In 10-fold cross-validation, the data is randomly partitioned into 10 subsets and predictions are made on each of these. These predictions are then aggregated to provide the overall performance of the classifier, which we measure by accuracy and the area under the Receiver Operating Characteristic curve (AUROC), both of which are described in further detail below. Each of these measures depicts a different aspect of the prediction results.

Dataset    | Classifier | Acc   | AUROC
Multiclass | NB         | 0.416 | -
Multiclass | C4.5       | 0.547 | 0.699
Multiclass | RIPPER     | 0.547 | 0.629
Multiclass | DT         | 0.543 | 0.717
Multiclass | ST         | 0.569 | -

Table 3. Summary of results obtained for each classifier and dataset. The classifiers used are NB: Naive Bayes; C4.5: C4.5 decision tree; RIPPER: Repeated Incremental Pruning to Produce Error Reduction; DT: Decision Table; ST: Stacking using random subspaces of decision trees.
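The 10-fold cross-validation procedure described above can be sketched generically; the classifier-wrapper interface used here is our own assumption, not part of the study's pipeline.

```python
import random

def k_fold_cv(rows, labels, train_and_predict, k=10, seed=0):
    """Generic k-fold cross-validation returning overall accuracy.

    `train_and_predict(train_rows, train_labels, test_rows)` wraps any
    classifier and returns one prediction per test row.
    """
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k roughly equal partitions
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = train_and_predict([rows[i] for i in train],
                                  [labels[i] for i in train],
                                  [rows[i] for i in fold])
        correct += sum(p == labels[i] for p, i in zip(preds, fold))
    return correct / len(rows)
```

Because every instance is held out exactly once, the aggregated accuracy reflects performance on unseen data rather than training fit.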
Accuracy
The accuracy of a classifier is perhaps the simplest measurement of its performance. It represents the percentage of total instances that were correctly classified, and we would like it to be as high as possible. The baseline for accuracy is that of a perfectly random prediction: for a binary classification problem this is 50%, and for a 5-class problem the baseline accuracy is 20%. Any classifier which delivers statistically greater accuracy than the respective baseline is considered better than a random predictor.
Receiver Operating Characteristics (ROC) curves
A system tuned to increase accuracy is not necessarily a good predictor, as relying on accuracy alone does not provide insight into the nature of the misclassified instances. ROC curves [11] are a way to quickly compare multiple classifiers. The goal of a classifier in ROC space is to be as close to the upper-left corner as possible; if the curve for one classifier is closer to the upper-left corner than that of another, it is considered to have superior performance.
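The area under the ROC curve can be computed directly from classifier scores via the rank-statistic formulation: AUROC equals the probability that a randomly chosen positive instance is scored above a randomly chosen negative one. This is our own sketch; the paper does not specify how its AUROC values were computed.

```python
def auroc(scores, labels):
    """AUROC for binary labels (True = positive class).

    Counts the fraction of (positive, negative) pairs in which the
    positive instance receives the higher score, with ties worth half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranking yields 1.0, random scoring averages 0.5, and a fully inverted ranking yields 0.0, mirroring the upper-left-corner intuition above.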
EXPERIMENTAL RESULTS
We evaluated the performance of each of the classifiers used, with 10-fold cross-validation for both the multiclass and the binary classification predictions. Table 3 summarizes the results of all experiments.
Multiclass Prediction
We see that in terms of sheer accuracy, the stacked classifiers performed slightly better than the other methods, achieving an accuracy of 56.9%. In terms of AUROC, however, Naive Bayes performs much better, closely followed by Decision Tables. These simple classifiers may not have the best accuracy, but they outperform the others on AUROC.
Binary Class Prediction
In this second scenario, we associate a semantic meaning with the drop-off percentage point and predict whether the user will exit early or not. This refinement of the problem statement gives us much better performance across the board. The stacked classifiers, for instance, achieve a remarkable accuracy of 84.6% when predicting which users exited their video streams prematurely. As was the case with the multiclass problem, we again saw that Decision Tables and Naive Bayes surpassed the other classifiers in terms of AUROC values. Though the stacked classifiers give greater accuracy, they are not as good as Decision Tables or Naive Bayes at predicting early drop-off. This is still reflective of the general trends observed in the multiclass problem: as we have merely merged classes, the underlying data remains the same.

In both the multiclass and the binary class prediction, we observe that simpler rule-based learners outperform complicated meta-classifiers. This is documented in [9], which shows that stacking does not always outperform the best individual classifier.
Figure 5. ROC curves for the binary class problem. A comparison of various classifiers to predict early exit behavior.
We see that simple classification algorithms can be used to achieve comparable, or even better, performance than complicated meta-classifiers. Besides this performance superiority, it is desirable to use simpler classifiers on grounds of computational complexity, as they are algorithmically more scalable and thus offer faster runtimes.
CONCLUSIONS
We demonstrated how clickstream data can be used to predict "early exits" in online videos. By constructing models to this effect, we were able to identify with high accuracy which video streaming sessions are likely to terminate prematurely. Additionally, we compared and contrasted the performance of a number of classifiers, highlighting those that we found to be particularly fit for this problem. Having knowledge of such information would allow content providers to personalize how their media is distributed so as to increase user retention and, as a result, business value.
REFERENCES
1. Banerjee, A., and Ghosh, J. Clickstream clustering using weighted longest common subsequences. In Proceedings of the Web Mining Workshop at the 1st SIAM Conference on Data Mining, vol. 143 (2001), 144.
2. Bayes, M., and Price, M. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions (1683-1775) (1763), 370-418.
3. Breiman, L. Random forests. Machine Learning 45, 1 (2001), 5-32.
4. Bucklin, R. E., and Sismeiro, C. A model of web site browsing behavior estimated on clickstream data. Journal of Marketing Research (2003), 249-267.
5. Cohen, W. W. Fast effective rule induction. In ICML, vol. 95 (1995), 115-123.
6. Das, A. S., Datar, M., Garg, A., and Rajaram, S. Google news personalization: scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web, ACM (2007), 271-280.
7. Dobrian, F., Sekar, V., Awan, A., Stoica, I., Joseph, D. A., Ganjam, A., Zhan, J., and Zhang, H. Understanding the impact of video quality on user engagement. SIGCOMM Computer Communication Review 41, 4 (2011), 362.
8. Dreze, X., and Hussherr, F.-X. Internet advertising: Is anybody watching? Journal of Interactive Marketing 17, 4 (2003), 8-23.
9. Džeroski, S., and Ženko, B. Is combining classifiers with stacking better than selecting the best one? Machine Learning 54, 3 (2004), 255-273.
10. Eom, J.-H., and Zhang, B.-T. Machine learning-based text mining for biomedical information analysis. Genomics & Informatics 2, 2 (2004), 99-106.
11. Fawcett, T. An introduction to ROC analysis. Pattern Recognition Letters 27, 8 (2006), 861-874.
12. Forman, G. An extensive empirical study of feature selection metrics for text classification. The Journal of Machine Learning Research 3 (2003), 1289-1305.
13. Kim, J., Guo, P. J., Seaton, D. T., Mitros, P., Gajos, K. Z., and Miller, R. C. Understanding in-video dropouts and interaction peaks in online lecture videos.
14. Guyon, I., and Elisseeff, A. An introduction to variable and feature selection. The Journal of Machine Learning Research 3 (2003), 1157-1182.
15. Hall, M. A. Correlation-based Feature Selection for Machine Learning. PhD thesis, The University of Waikato, 1999.
16. Hauser, J. R., Urban, G. L., Liberali, G., and Braun, M. Website morphing. Marketing Science 28, 2 (2009), 202-223.
17. Ho, T. K. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 8 (1998), 832-844.
18. Kohavi, R. The power of decision tables. In Machine Learning: ECML-95. Springer, 1995, 174-189.
19. Liu, J., Dolan, P., and Pedersen, E. R. Personalized news recommendation based on click behavior. In Proceedings of the 15th International Conference on Intelligent User Interfaces, ACM (2010), 31-40.
20. Mobasher, B., Cooley, R., and Srivastava, J. Automatic personalization based on web usage mining. Communications of the ACM 43, 8 (2000), 142-151.
21. Moe, W. W. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. Journal of Consumer Psychology 13, 1 (2003), 29-39.
22. Montgomery, A. L., Li, S., Srinivasan, K., and Liechty, J. C. Modeling online browsing and path analysis using clickstream data. Marketing Science 23, 4 (2004), 579-595.
23. Quinlan, J. R. Induction of decision trees. Machine Learning 1, 1 (1986), 81-106.
24. Quinlan, J. R. C4.5: Programs for Machine Learning, vol. 1. Morgan Kaufmann, 1993.
25. Srivastava, J., Cooley, R., Deshpande, M., and Tan, P.-N. Web usage mining: Discovery and applications of usage patterns from web data. ACM SIGKDD Explorations Newsletter 1, 2 (2000), 12-23.
26. Witten, I. H., and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2005.
27. Wolpert, D. H. Stacked generalization. Neural Networks 5, 2 (1992), 241-259.
28. Yang, Y., and Pedersen, J. O. A comparative study on feature selection in text categorization. In ICML (1997), 412-420.