Learning Skill Equivalencies Across Platform Taxonomies
Zhi Li
University of California, [email protected]
Cheng Ren
University of California, [email protected]
Xianyou Li
University of California, [email protected]
Zachary A. Pardos
University of California, [email protected]
ABSTRACT
Assessment and reporting of skills is a central feature of many digital learning platforms. With students often using multiple platforms, cross-platform assessment has emerged as a new challenge. While technologies such as Learning Tools Interoperability (LTI) have enabled communication between platforms, reconciling the different skill taxonomies they employ has not been solved at scale. In this paper, we introduce and evaluate a methodology for finding and linking equivalent skills between platforms by utilizing problem content as well as the platform's clickstream data. We propose six models to represent skills as continuous real-valued vectors, and leverage machine translation to map between skill spaces. The methods are tested on three digital learning platforms: ASSISTments, Khan Academy, and Cognitive Tutor. Our results demonstrate reasonable accuracy in skill equivalency prediction from a fine-grained taxonomy to a coarse-grained one, achieving an average recall@5 of 0.8 between the three platforms. Our skill translation approach has implications for aiding in the tedious, manual process of taxonomy-to-taxonomy mapping work, also called crosswalks, within the tutoring as well as standardized testing worlds.
CCS CONCEPTS
• Applied computing → Education; • Computing methodologies → Learning latent representations; Machine translation.
KEYWORDS
Skill equivalencies, transfer models, crosswalks, taxonomies, digital learning platforms, representation learning, machine translation, interoperability, acknowledging prior knowledge, app hand-offs.
ACM Reference Format:
Zhi Li, Cheng Ren, Xianyou Li, and Zachary A. Pardos. 2021. Learning Skill Equivalencies Across Platform Taxonomies. In LAK21: 11th International Learning Analytics and Knowledge Conference (LAK21), April 12–16, 2021, Irvine, CA, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3448139.3448173
1 INTRODUCTION
Digital learning platforms commonly tag skills to assessment items in order to measure students' mastery and guide their learning trajectories [3, 14, 29]. Due to the formative nature of digital learning platforms [12], their skill taxonomies are generally finer-grained as compared to those found in large-scale summative tests, where
broader constructs and abilities are measured [33]. The presence of an accurate skill or knowledge component model [14] in a digital learning platform can have a significant positive impact on its efficacy [4]. Conversely, the absence of an accurate expert skill model can impede effective assessment of mastery and subsequently prohibit adaptive learning approaches [22].

Customarily, digital learning platforms have developed their own taxonomies, but the demand for greater interoperability across platforms is rapidly growing [28]. Teachers and students today often use a mix of devices and many learning platforms per device in class [9]. To effectively assess students and acknowledge prior learning when they switch platforms, smooth "hand-offs" [10] between apps are needed, including a shared taxonomy or translation between taxonomies enabling learning progress on one platform to be continued on the next. Many platforms in the United States have already embarked on migrating to a common taxonomy such as the Common Core State Standards [1] and re-tagging their content or re-mapping their in-house taxonomy to this standard. This task is highly time consuming and likely to be repeated every time a new set of common standards is introduced or modified.

In general, the task of validating the equivalence of skills across taxonomies is non-trivial. For example, the skill of "area of irregular figure" in system A and "area of quadrilaterals and polygons" in system B may or may not be strongly related, and it is almost impossible to determine without looking at the set of problems associated with each.

To date, mapping taxonomies across digital learning platforms using machine learning has largely remained unexplored. They are, however, a prime candidate for this type of approach as they offer data affordances distinct from the standardized testing context, where most taxonomy mapping has occurred. In particular, platforms store data about students' learning behaviors such as logged clickstream and response sequence data [11]. Some research has successfully harnessed response data to infer the underlying skill of problems [24]. Here, we evaluate the utility of response sequence data, in addition to problem text, for cross-platform taxonomy mapping.

We show that a taxonomy mapping, or transfer model, can be learned between the skill vector spaces of different platforms, similar to how machine language translation learns mappings between word embedding spaces [17]. Our main contributions are:

(1) Six models proposed to represent skills as continuous real-valued vectors, with each model able to represent different input data.
(2) Empirical validation of the feasibility of employing machine translation to map between the taxonomies of three digital learning platforms.
(3) Further extension of the methodology to situations where two platforms have asymmetric data types available for representation (e.g., one platform with problem text, the other with only response sequences).
(4) Inspection of important factors that have an influence on skill equivalency prediction performance, including model hyperparameters and differences between source and destination taxonomy granularity.
2 RELATED WORK
Taxonomy mapping has relied on manual work by subject matter experts [7, 29, 32], though there has been past research on automating the process using legacy Natural Language Processing (NLP) to find similar skills across taxonomies using text descriptions of each skill [6, 35]. Choi et al. [6] converted each skill statement to a verb phrase graph and a noun phrase graph, then calculated similarity between skills by comparing graphs. Yilmazel et al. [35] used rule-based methods to extract features from skill descriptions in one standard and trained a machine learning classifier to map them to another standard.

Several terms have been used in the literature to refer to the mappings between skill taxonomies. The term "crosswalk" is one, derived from the idea of creating a path to cross a street, used to describe the connection between two taxonomies or sets of educational standards [7, 32]. Other terms like "transfer" [29] and "alignment" [6, 35] have also been used in related work. In the context of digital learning environments, several terms have been used to refer to the elements of their taxonomy. Intelligent Tutoring Systems refer to each element as a knowledge component (KC) and a taxonomy of elements as a knowledge component model. A KC is defined as "an acquired unit of cognitive function or structure that can be inferred from performance on a set of related tasks" [14]. These KCs, tagged to problem steps, allow for adaptive tutoring and cognitive mastery estimation in systems like the Cognitive Tutor (now MATHia) [4]. The generic term of "skill" [29] or "tag" [28] has been more commonly used in digital learning environments to describe a semi-granular subject area associated with a learning resource. The term "skill" can also be used to generalize the concepts of tags and knowledge components, which is how we will use it throughout the paper. Differences in granularity and epistemology of taxonomies add to the challenge of cross-platform skill equivalency learning.

Methodologically, no prior work has utilized problem text or response sequences, nor have modern neural approaches from computational linguistics been brought to bear on taxonomy mapping. In the broader field of learning analytics research, neural word embeddings [18] have been utilized in a number of areas where text descriptions of educational resources are available. Examples include detecting student misconceptions [16], extracting course concepts [20], and coding non-cognitive traits related to student success [31]. Non-text, sequence data can also carry useful semantic information, such as the sequences in which an educational resource or skill appears (i.e., its clickstream logs), which we call "context" information. Previous research has shown that item embeddings learned from sequence contexts with a skip-gram model [18] can encode underlying attributes of items helpful to downstream institutional prediction tasks [13], skill label inference [24], and course recommendation [19, 25]. After item embeddings are learned, a translation model can be trained to map items from one embedding to another. This is called machine translation and was introduced in the context of language translation using word embeddings [2, 17]; however, the idea can be extended to translation between any embedding spaces, such as translating course embeddings between institutions to identify candidate courses for credit articulation [23].
3 METHODS
Each platform consists of problems students are expected to solve that are associated with one or more skills labeled by domain experts. We define the problem of skill equivalency prediction across platforms as follows: given a skill $s$ in a source platform $src$, find the $k$ most similar skills, ordered by similarity, in a destination platform $dst$. The input includes the content and/or context information of skills and some ground truth equivalent skill pairs. More precisely, our method is as follows:

(1) Represent each skill as a continuous real-valued vector using problem content and/or sequence context.
(2) Learn a translation model from the source skill space to the destination skill space.
(3) Calculate cosine similarities between skills across platforms:

$$\mathit{cosine\_similarity}(s_{src}, s_{dst}) = \frac{s_{src} \cdot s_{dst}}{\lVert s_{src} \rVert \, \lVert s_{dst} \rVert} = \frac{\sum_{i=1}^{n} s^{i}_{src} \, s^{i}_{dst}}{\sqrt{\sum_{i=1}^{n} (s^{i}_{src})^{2}} \, \sqrt{\sum_{i=1}^{n} (s^{i}_{dst})^{2}}}$$

(4) For each source skill, rank the destination skills by similarity and take the top $k$ as predictions (sketched in code below).

In this section, we describe the models for skill representation and translation. Representation models can be categorized into three types: content-based models (Section 3.1), a context-based model (Section 3.2), and models combining content and context (Section 3.3). The way in which we translate one representation to another is borrowed from machine translation (Section 3.4). Code to replicate our methodology can be found online at https://github.com/CAHLR/skill-equivalency.
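Steps (3) and (4) reduce to a nearest-neighbor search under cosine similarity. The following is a minimal sketch of that ranking step; the function name and inputs are ours for illustration, not identifiers from our released code:

```python
import numpy as np

def top_k_equivalents(src_vecs, dst_vecs, dst_names, k=5):
    """Rank destination skills by cosine similarity for each source skill.

    src_vecs: (n_src, d) source skill vectors, already translated into the
    destination space where a translation is required (Section 3.4).
    dst_vecs: (n_dst, d) destination skill vectors.
    """
    # Normalizing rows makes a dot product equal to cosine similarity.
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    dst = dst_vecs / np.linalg.norm(dst_vecs, axis=1, keepdims=True)
    sims = src @ dst.T                        # (n_src, n_dst) similarities
    top_k = np.argsort(-sims, axis=1)[:, :k]  # indices of the k best matches
    return [[dst_names[j] for j in row] for row in top_k]
```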
3.1 Content-Based Models
We represent skills as functions of the problem content they are associated with. The content of a problem in a digital learning platform can include text, graphical figures, and video. For this study, our data only contain the text portion of the content. We utilize the following three content-based representations: Bag-of-words, TF-IDF, and Content2vec.

Bag-of-words is a standard text processing technique where a document is represented as a vector whose length is equal to the vocabulary size and whose values are the frequencies of the words occurring in the document. In our experiments, we take all unique words from both platforms as the vocabulary and represent each problem as a Bag-of-words vector. Then, for every skill, we find all problems associated with the skill and average their representations together, arriving at a skill vector.

A weakness of Bag-of-words is that the vector might be dominated by frequent but non-distinguishing words. TF-IDF (Term Frequency-Inverse Document Frequency), a method adapted from Bag-of-words, can address this issue; its values reflect how distinct a word is to a problem relative to the collection of all problems. The TF-IDF score for word $w$ in problem $p$ from problem set $P$ is calculated by:

$$\mathit{TF\text{-}IDF}(w, p, P) = TF(w, p) \cdot IDF(w, P) \quad (1)$$

where

$$TF(w, p) = \log(1 + \mathit{freq}(w, p)) \quad (2)$$

$$IDF(w, P) = \log\left(\frac{|P|}{\mathit{count}(x \in P : w \in x)}\right) \quad (3)$$

Content2vec is a method built on word embeddings. While previous works have mainly utilized word vectors pretrained on large-scale datasets like the Google News corpus [16, 20, 31], our problem space is different from theirs in that mathematical jargon, symbols, and formulas that are not frequently seen in other corpora make up the majority of our problem texts. Therefore, we test two versions of Content2vec, one with word vectors pretrained on Google News and the other with our own word vectors trained on all the problem texts. We represent each problem as the average of its word vectors, and each skill as the average of its problem vectors.

Note that in content-based methods, representations of problems and skills across platforms share the same space, and thus there is no need to translate between spaces for cross-platform skill comparison.
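As one concrete illustration of a content-based representation, the sketch below builds TF-IDF skill vectors with scikit-learn. The toy problems and the problem-to-skill map are invented, and scikit-learn's weighting (here with sublinear_tf=True) only approximates Equations (2)-(3):

```python
from collections import defaultdict
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical inputs: problem texts from both platforms, problem -> skill map.
problems = {"p1": "solve 3x + 4 = 10 for x", "p2": "area of a triangle with base 6"}
skill_of = {"p1": "linear-equations", "p2": "area-of-triangle"}

# Fit one vocabulary over problems from BOTH platforms so vectors share a space.
vectorizer = TfidfVectorizer(sublinear_tf=True)  # log-scaled term frequency
ids = list(problems)
X = vectorizer.fit_transform([problems[i] for i in ids]).toarray()

# A skill vector is the average of the vectors of its associated problems.
grouped = defaultdict(list)
for row, pid in zip(X, ids):
    grouped[skill_of[pid]].append(row)
skill_vecs = {s: np.mean(rows, axis=0) for s, rows in grouped.items()}
```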
3.2 Context-Based Model: Skill2vec
Context information comes from the response log data of a digital learning platform, where each row denotes an interaction between a student and the platform, and pertinent columns include anonymized student ID, attempted problem ID, tagged skills, and start time. To make use of context information we propose Skill2vec. In this method, data are preprocessed into skill sequences per student ordered by start time, and a skip-gram model is then applied to these sequences to learn a continuous vector embedding for each skill. The model is similar to previous work in which problems were embedded based on problem sequence [24], but has not until now been applied to skills.

Unlike content-based methods, Skill2vec is trained separately on each platform, hence the generated skill vector spaces are not aligned. Therefore, before measuring cross-platform skill similarity, a translation is necessary to align the two vector spaces. Details are discussed in Section 3.4.
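A minimal Skill2vec sketch using gensim's skip-gram implementation follows; the toy sequences and hyperparameter values are illustrative only (vector dimension and window size are tuned, as discussed in Section 5.4):

```python
from gensim.models import Word2Vec  # gensim 4.x API

# Each "sentence" is one student's skill sequence, ordered by start time.
sequences = [
    ["fractions", "fractions", "mixed-numbers", "decimals"],
    ["decimals", "percents", "percents", "proportions"],
]

model = Word2Vec(
    sentences=sequences,
    vector_size=64,  # embedding dimension (tuned per platform in practice)
    window=10,       # context window over the skill sequence
    sg=1,            # skip-gram architecture
    min_count=1,     # keep every skill, even rarely practiced ones
    epochs=10,
)
vec = model.wv["fractions"]  # the learned embedding for one skill
```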
3.3 Combining Content and Context
Since both content and context data may contain useful information about skills, we also evaluate the combination of the two, which may form a more descriptive representation than either alone. In this section, we introduce two models that combine content and context information: a simple concatenation of our previously mentioned content and context vectors, and a separate model from the literature, called Text-Associated Matrix Factorization (TAMF), that integrates both during training.
The first model is simply the concatenation of Content2vec and Skill2vec. Since Content2vec does not need translation but Skill2vec does, we design the process of concatenation as follows (Figure 1): we first learn a translation model for context vectors, and then combine the translated source context vector with the source content vector, while the destination skill vector is obtained by appending the destination content vector to the unchanged destination context vector. By doing so, we make sure the two components reside in the same space, thus allowing the concatenated vectors to be compared.
Figure 1: Process of combining Skill2vec and Content2vec representations
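The concatenation in Figure 1 can be written in a few lines. This sketch assumes a context-space translation matrix T has already been learned (Section 3.4); the function name is ours:

```python
import numpy as np

def concatenated_pair(src_content, src_context, dst_content, dst_context, T):
    """Combine content and context vectors as in Figure 1.

    Content vectors already share a space across platforms, so only the
    source context vector is translated (via T) before concatenation; the
    destination side is left unchanged.
    """
    src = np.concatenate([src_content, T @ src_context])
    dst = np.concatenate([dst_content, dst_context])
    return src, dst
```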
While concatenation of representations is commonplace, we also sought out a method that learns a single representation from both sources. Text-Associated Matrix Factorization (TAMF) [34] is such a method; it may be able to learn features related to the interaction between content and context, and could in principle be more expressive than simple concatenation, where no interaction between sources can be utilized. The method learns embeddings based on nodes in a graph and incorporates content information in a matrix factorization stage. Our adaptation differs from the original model in that they begin with a graph and use Deepwalk [26] to generate a context matrix from which item embeddings are learned, while our response log data are already sequences; therefore, we factorize the Positive Pointwise Mutual Information (PPMI) matrix derived from our skill sequences instead of a Deepwalk matrix.

Specifically, let $S$ be the set of all skills and $P$ be the set of all skill and context-skill pairs observed in the input data. Then the PPMI matrix $M \in \mathbb{R}^{|S| \times |S|}$ can be calculated by

$$M_{s,c} = \max\left(0,\ \log \frac{\#(s,c) \cdot |P|}{\#(s) \cdot \#(c)}\right)$$

where $s$ is a skill, $c$ is a context skill, and $\#(s)$, $\#(c)$, and $\#(s,c)$ are the counts of occurrences of $s$, $c$, and the pair $(s,c)$ within a window size in all sequences. According to Levy and Goldberg [15], a skip-gram model (Skill2vec in our case) implicitly factorizes the PPMI matrix $M = W^{\top}H$ where $W, H \in \mathbb{R}^{k \times |S|}$, with $k$ being the dimension of the learned embeddings. The matrix $W$ is equivalent to the output embeddings of the skip-gram model.

With TAMF, the problem is formulated as $M = W^{\top}HT$, where $W \in \mathbb{R}^{k \times |S|}$, $H \in \mathbb{R}^{k \times d}$, and $T \in \mathbb{R}^{d \times |S|}$ (Figure 2). The new matrix $T$ is the Content2vec matrix of embedding size $d$. The output of the method is the concatenation $[W^{\top}, (HT)^{\top}] \in \mathbb{R}^{|S| \times 2k}$ with embedding size $2k$. In this way, content and context are deeply merged in the skill representations.

Figure 2: Text-Associated Matrix Factorization (TAMF)
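A sketch of constructing the PPMI matrix from skill sequences is shown below. Following Levy and Goldberg [15], the marginal counts #(s) and #(c) are taken over the observed pairs; this is one reasonable reading of the counts above, and the helper name is ours:

```python
import numpy as np
from collections import Counter

def ppmi_matrix(sequences, skills, window=5):
    """Build M[s, c] = max(0, log(#(s,c)*|P| / (#(s)*#(c)))) from sequences."""
    idx = {s: i for i, s in enumerate(skills)}
    pairs = Counter()
    for seq in sequences:
        for i, s in enumerate(seq):
            # Context skills within the window on either side of position i.
            for c in seq[max(0, i - window):i] + seq[i + 1:i + 1 + window]:
                pairs[(s, c)] += 1
    center, context = Counter(), Counter()
    for (s, c), n in pairs.items():
        center[s] += n
        context[c] += n
    total = sum(pairs.values())  # |P|, the number of observed pairs
    M = np.zeros((len(skills), len(skills)))
    for (s, c), n in pairs.items():
        M[idx[s], idx[c]] = max(0.0, np.log(n * total / (center[s] * context[c])))
    return M
```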
The optimization objective of TAMF is to search for $W$ and $H$ that minimize the loss:

$$L = \lVert M - W^{\top}HT \rVert_{F}^{2} + \lambda \left( \lVert W \rVert_{F}^{2} + \lVert H \rVert_{F}^{2} \right)$$

where the first term $\lVert M - W^{\top}HT \rVert_{F}^{2}$ is the distance between the original PPMI matrix and the matrix reconstructed from the learned factorization, and the second term $\lambda (\lVert W \rVert_{F}^{2} + \lVert H \rVert_{F}^{2})$ is a regularization term restricting the magnitude of the learned matrices. Minimizing this loss should give us a reasonable factorization while not overfitting to the data.

This loss $L$ is convex with respect to $W$ when $H$ is fixed, and vice versa. Hence we can iteratively optimize in closed form by taking partial derivatives, reorganizing, and solving linear systems for $W$ and $H$, given the other one fixed, until the rate of decrease of $L$ falls below a certain threshold. While this process is not guaranteed to hit the global minimum, in practice we find it works well.

3.4 Translation
For Skill2vec and TAMF, representations of skills in different platforms are learned independently and do not share the same space. They may not even have the same dimensionality. To align vector spaces, we use machine translation to learn a transformation from the source skill vector space to the destination space. Previous research has found that a linear transformation outperforms more complex neural translation models on word-to-word translation tasks [17]. Therefore, we choose to learn a linear translation matrix $T \in \mathbb{R}^{m \times n}$ that maps a source skill vector $v_s \in \mathbb{R}^n$ to a vector $v_d \in \mathbb{R}^m$ in the destination vector space by maximizing the cosine similarities of translated source vectors and ground truth destination vectors based on a set of known equivalent skill pairs.
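For intuition, a lightweight alternative to our gradient-based training (described in Section 4) fits T by ordinary least squares over the known equivalent pairs. This sketch minimizes squared error rather than maximizing cosine similarity, so it is an approximation of the objective we actually use:

```python
import numpy as np

def fit_translation_lstsq(src_pairs, dst_pairs):
    """Fit a linear map T with T @ v_src ≈ v_dst for known equivalent pairs.

    src_pairs: (p, n) source vectors; dst_pairs: (p, m) matching destination
    vectors. Solves min_T ||src_pairs @ T.T - dst_pairs||_F^2 in closed form.
    """
    T_t, *_ = np.linalg.lstsq(src_pairs, dst_pairs, rcond=None)  # (n, m)
    return T_t.T  # (m, n), maps R^n -> R^m
```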
4 DATASETS AND EXPERIMENTAL DESIGN
The datasets for this study are from three digital learning platforms: ASSISTments, Khan Academy, and Cognitive Tutor (now named MATHia). All three platforms offer content primarily for middle school and high school students.

ASSISTments.
We use the public ASSISTments 2012-2013 dataset including problem texts (https://sites.google.com/site/assistmentsdata/home/2012-13-school-data-with-affect), which contains problems for 7th and 8th grade math [12]. Most problems in ASSISTments are tagged with a single skill; we keep only these problems, and keep only skills with at least 1,000 response logs associated with them. We apply this filter since ASSISTments contains much content that is teacher-produced for a single class. ASSISTments has the most coarse-grained taxonomy among the three platforms.

Khan Academy.
We use anonymized student responses collected from Khan Academy 7th and 8th grade math exercises between 2013-2014, and collect problem texts through web scraping. Because there are no explicit skills assigned in our data, and each exercise in Khan Academy serves as a template generating multiple problems, we decide to regard each exercise as a skill, as was done in Piech et al. [27]. This results in a very minor difference between some skills (e.g., combining_like_terms_1 and combining_like_terms_2 are considered two skills). We find that keeping these skills separate gives better performance than grouping them together. Khan Academy's taxonomy is finer than ASSISTments', but coarser than Cognitive Tutor's.

Cognitive Tutor.
We use the publicly available Cognitive Tutor dataset from the 2010 KDD Cup [30]. We choose the Algebra I 2008-2009 challenge dataset as it is the largest, and its content covers 8th-10th grade math [21] with good overlap with the other two platforms in our study. Cognitive Tutor skills are assigned per step, rather than per problem, and it is not uncommon for a step to have more than one skill. To allow for multiple skills, we generate the skill sequences for Skill2vec by randomly ordering skills within a single step, as sketched below. We select the column "KC(SubSkills)" to serve as the skills column since it has the fewest missing values compared to alternative KC columns. Note that problem texts (i.e., content information) are not present in Cognitive Tutor, raising the issue of asymmetric data addressed in Section 5.2. The taxonomy of Cognitive Tutor is the finest-grained of the three.
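The sketch below illustrates this sequence-generation step; the row format is hypothetical and simplified from the actual KDD Cup schema:

```python
import random

def skill_sequences(rows):
    """rows: (student_id, start_time, skills_on_step) tuples.

    Skills tagged to the same step have no inherent order, so they are
    shuffled within the step before being appended to the student's sequence.
    """
    by_student = {}
    for student, _start, step_skills in sorted(rows, key=lambda r: (r[0], r[1])):
        step_skills = list(step_skills)
        random.shuffle(step_skills)  # random order within a single step
        by_student.setdefault(student, []).extend(step_skills)
    return list(by_student.values())
```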
Preprocessing.
For all three datasets, we drop the rows without skill assignment. For problem texts, we tokenize words and clean the texts by removing stop words and converting to lower case. Basic descriptive statistics of the datasets after preprocessing are shown in Table 1.
Skill Equivalence Labeling.
There are currently no skill equivalency labels across these three platforms, so we create the labels ourselves. For each pair of platforms, the three authors separately annotated equivalent skills across platforms; agreement was then measured by Fleiss' Kappa, and any conflict was resolved by discussion and majority voting.

Table 1: Descriptive statistics of datasets (after preprocessing)
| | ASSISTments | Khan Academy | Cognitive Tutor |
| Subject | Math | Math | Math |
| Grade Level | 7th and 8th grades | 7th and 8th grades | 8th to 10th grades |
| Number of Skills | 130 | 194 | 536 |
| Number of Unique Problems (Steps for Cognitive Tutor) | 38,490 | 20,797 | 495,068 |
| Number of Unique Users | 27,760 | 875,492 | 3,292 |
| Number of Responses | 2,602,777 | 47,794,008 | 6,263,006 |
| Average Number of Words per Problem | 32 | 29 | N/A |
| Information Types | Content+Context | Content+Context | Context |
| Granularity | Coarse-grained | Medium | Fine-grained |
Table 2: Annotation results
| Platforms | Fleiss' Kappa | Number of Skill Equivalencies |
| Khan Academy, ASSISTments | 0.69 | 148 |
| Khan Academy, Cognitive Tutor | 0.72 | 85 |
| ASSISTments, Cognitive Tutor | 0.83 | 222 |

To give an example, the authors initially disagreed on whether the skill "Angles 2" in Khan Academy should be mapped to the skill "Angles - Obtuse, Acute, and Right" in ASSISTments. After discussion and inspecting the associated problems, a consensus was reached that they should not be linked, as the Khan Academy skill involves angle calculation, which is not needed in the ASSISTments skill. Since the granularity of taxonomies varies, there are likely to be one-to-many and many-to-one relationships between taxonomies. To account for such asymmetric relationships, we calculate Fleiss' Kappa by considering every possible pair of skills as a "subject", with a binary value indicating equivalent or not as the label. The Fleiss' Kappas and numbers of equivalent skill pairs are displayed in Table 2. The high Kappas suggest good agreement among the authors.
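The agreement computation can be reproduced with statsmodels: each candidate skill pair is one subject, and the two columns count annotator votes for not-equivalent and equivalent. The vote counts below are toy values, not our data:

```python
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

# One row per candidate cross-platform skill pair ("subject"); columns count
# how many of the three annotators voted [not equivalent, equivalent].
ratings = np.array([
    [3, 0],  # unanimous: not equivalent
    [0, 3],  # unanimous: equivalent
    [1, 2],  # one annotator disagreed
])
print(fleiss_kappa(ratings, method="fleiss"))
```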
Evaluation Metrics.
We evaluate our models using two metrics, recall@k and mean reciprocal rank. The skill taxonomies in our platform datasets differ in granularity, resulting in some skills having a one-to-many relationship across taxonomies. Thus, in addition to using the somewhat standard metric of recall@k [23], we also report mean reciprocal rank to indicate the rank of the first true positive result.

Recall@k, where $k$ is a user-definable integer, measures the percentage of relevant destination skills that are contained within the model's top $k$ predictions for each source skill, as calculated by

$$\frac{\textit{true positives}@k}{\textit{true positives}@k + \textit{false negatives}@k}$$

In our experiments, we choose $k$ to be 5, which returns the 5 most similar predicted skills.

Mean reciprocal rank (MRR) calculates the rank of the first relevant destination skill for each source skill and takes the mean of the reciprocals of these ranks as output. It is given by

$$\frac{1}{|S|} \sum_{s \in S} \frac{1}{\mathit{rank}(s)}$$

where $S$ is the set of all source skills and $\mathit{rank}(s)$ denotes the rank of the first relevant destination skill for source skill $s$. Both metrics are sketched in code below.

Validation and Testing Strategies.
We take different validation and testing strategies for different representation models, depending on whether they require hyperparameter tuning and whether they need ground truth labels to train. We apply hyperparameter search on validation data separate from the test set.

For Bag-of-words, TF-IDF, and Content2vec with pretrained word vectors, no hyperparameters or ground truth labels are involved in training. Test set results are produced by comparing the predictions against all the ground truth pairs.

For Content2vec with our own word vectors trained on problem texts, no skill pair labels are needed, but the word vector training process involves hyperparameters. Using 10-fold cross-validation, we search for the best hyperparameters on the training set of each phase of the cross-validation and evaluate on the phase's test set with those hyperparameters; hence we get 10 best hyperparameter sets for the 10 folds instead of a single best hyperparameter set.

For Skill2vec and TAMF, training requires both hyperparameter tuning and ground truth skill pairs. We similarly conduct a nested cross-validation on the training data within each phase of the 10-fold cross-validation to tune hyperparameters. The outer cross-validation tier has the same splits as Content2vec in order to compare models.

For Content2vec+Skill2vec, we simply concatenate the tuned content vector and context vector with the best hyperparameters for each fold.

At the dataset level, we treat Khan Academy and ASSISTments as training platforms and run experiments on them to compare models. We then pick the best model and conduct a final test on the mappings to and from Cognitive Tutor. We choose Cognitive Tutor as the test platform because it has only context information, whereas ASSISTments and Khan Academy have both content and context, with which we can conduct holistic experiments on various skill representations, including symmetric and asymmetric source and destination data sources. A test on Cognitive Tutor without any model tuning will validate (or invalidate) our conclusions drawn from the other two platforms.
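Minimal implementations of the two metrics follow. Recall@k is micro-averaged over all ground-truth pairs here, which is one straightforward reading of the formula above, and a source skill with no retrieved equivalent contributes zero to MRR:

```python
def recall_at_k(preds, truths, k=5):
    """preds: ranked destination-skill lists, one per source skill;
    truths: matching sets of ground-truth equivalent destination skills."""
    hits = sum(len(set(p[:k]) & t) for p, t in zip(preds, truths))
    return hits / sum(len(t) for t in truths)

def mean_reciprocal_rank(preds, truths):
    """Mean over source skills of 1 / rank of the first true equivalent."""
    recip = []
    for p, t in zip(preds, truths):
        rank = next((i + 1 for i, s in enumerate(p) if s in t), None)
        recip.append(1.0 / rank if rank else 0.0)
    return sum(recip) / len(recip)
```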
Table 3: Skill equivalency prediction results using symmetric data sources and skill representations

| Representation | K2A Recall@5 | K2A MRR | A2K Recall@5 | A2K MRR |
| Bag-of-words | 0.55 | 0.43 | 0.38 | 0.46 |
| TF-IDF | 0.51 | 0.44 | 0.43 | 0.51 |
| Content2vec (pretrained) | 0.33 | 0.21 | 0.14 | 0.25 |
| Content2vec | 0.64 | 0.49 | 0.44 | 0.55 |
| Skill2vec | 0.61 | 0.51 | 0.11 | 0.14 |
| Content2vec+Skill2vec | 0.80 | 0.63 | 0.28 | 0.35 |
| TAMF | 0.74 | 0.67 | 0.22 | 0.21 |
Translation Model Training.
We use the Adam optimizer with learning rate 0.001 to train the translation model. The loss to minimize is the mean of the cosine distances of the training ground truth skill pairs. The maximum number of epochs is 1,000. We also randomly take 20% of the training data as validation. If the validation loss has not decreased in 100 epochs, we stop the training and keep the model with the smallest validation loss, as sketched below. The models are trained on one NVIDIA Maxwell architecture GPU, and each training job takes around 10 seconds to finish, with a whole nested cross-validation with 24 hyperparameter sets to tune totaling around 5 hours to run.
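A PyTorch sketch of this training loop is given below; it mirrors the setup just described (Adam, learning rate 0.001, mean cosine distance, a random 20% validation split, early stopping after 100 stagnant epochs), though the names and structure are ours:

```python
import torch

def train_translation(src, dst, epochs=1000, patience=100):
    """src: (p, n) source vectors; dst: (p, m) ground-truth equivalents."""
    perm = torch.randperm(len(src))
    n_val = max(1, int(0.2 * len(src)))       # 20% held out for validation
    val, tr = perm[:n_val], perm[n_val:]
    T = torch.nn.Linear(src.shape[1], dst.shape[1], bias=False)
    opt = torch.optim.Adam(T.parameters(), lr=1e-3)
    cos = torch.nn.CosineSimilarity(dim=1)
    best, best_weight, stale = float("inf"), None, 0
    for _ in range(epochs):
        opt.zero_grad()
        loss = (1 - cos(T(src[tr]), dst[tr])).mean()  # mean cosine distance
        loss.backward()
        opt.step()
        with torch.no_grad():
            val_loss = (1 - cos(T(src[val]), dst[val])).mean().item()
        if val_loss < best:                    # keep the best model seen so far
            best, best_weight, stale = val_loss, T.weight.detach().clone(), 0
        else:
            stale += 1
            if stale >= patience:              # early stopping
                break
    return best_weight  # the learned translation matrix (m, n)
```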
5 RESULTS
This section presents the results of several experiments. Section 5.1 compares equivalent skill prediction performance of the six skill representations in Khan Academy and ASSISTments, assuming the same source and destination information types. Section 5.2 examines situations where source and destination platforms have different types of skill information (e.g., content in one and context in the other). Section 5.3 evaluates skill prediction performance on Cognitive Tutor as a test set, using the best models as observed from the Khan Academy and ASSISTments results. Section 5.4 presents the ablation study results for each method and the performance variations with different hyperparameters.

5.1 Symmetric Data Sources
In this section, we assume the source and destination platforms have the same type of data available (content, context, or both), and compare the six skill representations detailed in Section 3 on the skill equivalency prediction tasks between Khan Academy and ASSISTments. The results are shown in Table 3, where "K2A" refers to mapping from the Khan Academy to the ASSISTments taxonomy, and "A2K" is the other direction. The main observations are as follows:

• The highest MRRs for both directions are above 0.5, meaning the true predictions are in second place or higher on average.
• Among all content-based methods, Content2vec with word vectors trained on problem texts is consistently the best. Content2vec with pretrained word vectors performs poorly, perhaps due to the insufficiency of the pretrained corpus in capturing information on math problem texts.
• In the mapping direction from Khan Academy to ASSISTments, incorporating both content and context information in the representation is better than either alone. The best method depends on the evaluation metric: Content2vec+Skill2vec is preferred with recall@5, and TAMF is favorable with MRR.
• In the mapping direction from ASSISTments to Khan Academy, the best method is Content2vec. Moreover, sequence information is counterproductive, since Content2vec+Skill2vec is worse than Content2vec.
• The direction from Khan Academy to ASSISTments has better results than the other direction. This is likely because Khan Academy has a finer-grained taxonomy than ASSISTments, which will be discussed further in Section 6.3.
5.2 Asymmetric Data Sources
In certain scenarios, the combination of content and context information for a digital learning platform will not be available. For example, Cognitive Tutor provides only context information publicly, and other platforms like Junyi Academy have only content information made available (https://github.com/junyiacademy/junyiexercise). While the platforms themselves likely store sequence data, a new platform with little to no sequence data may still wish to map to other taxonomies. Our framework can accommodate such circumstances: it allows different types of source and destination skill representations to be presented to the model. The machine translation will learn transformations between the two vector spaces provided that there are ground truth skill pairs to train on. Experiments are conducted to test how skill equivalency prediction performance differs when data type availability is not symmetric between source and destination. There are three types of information input for a given platform: content, context, and both, resulting in 6 different combinations of asymmetric source and destination representation types. Table 4 gives the results of the different representation scenarios. The best models in terms of MRR for each input type (Content2vec, Skill2vec, and TAMF) are used.

Table 4: Skill equivalency prediction results using asymmetric data sources and skill representations

| Source Content | Source Context | Source Model | Destination Content | Destination Context | Destination Model | K2A Recall@5 | K2A MRR | A2K Recall@5 | A2K MRR |
| ✓ | | Content2vec | | ✓ | Skill2vec | 0.71 | 0.57 | 0.07 | 0.13 |
| ✓ | | Content2vec | ✓ | ✓ | TAMF | 0.71 | 0.55 | 0.19 | 0.18 |
| | ✓ | Skill2vec | ✓ | | Content2vec | 0.47 | 0.35 | 0.11 | 0.17 |
| | ✓ | Skill2vec | ✓ | ✓ | TAMF | 0.69 | 0.54 | 0.15 | 0.14 |
| ✓ | ✓ | TAMF | ✓ | | Content2vec | 0.69 | 0.60 | 0.24 | 0.22 |
| ✓ | ✓ | TAMF | | ✓ | Skill2vec | 0.70 | 0.63 | 0.12 | 0.16 |

Comparing these results with Table 3, we see that the highest recall@5 and MRR with asymmetric data sources are lower than in the symmetric cases, so it is better to first consider symmetric sources if all information is available. However, asymmetric sources are still a viable option. On one hand, asymmetric sources cannot be avoided if one platform has only content and the other has only context. Moreover, even if one platform has both types of information and the other has only content or context, where training with symmetric sources is an option, it is still worth trying to incorporate both kinds of information in the first platform. For example, as shown in Tables 3 and 4, TAMF to Skill2vec (asymmetric) is better than Skill2vec to Skill2vec (symmetric).

5.3 Testing on Cognitive Tutor
We choose the models for mapping between Cognitive Tutor and the other two platforms anticipated to be the best based on the results shown in Tables 3 and 4. For each mapping direction, given that Cognitive Tutor has only context data, we look for the best representation for the other platform. For example, when selecting for Cognitive Tutor to Khan Academy, we look at the results where Khan Academy is the destination and the source is context only, and find the content-only destination has the highest MRR. We therefore choose Content2vec for Khan Academy.

The skill equivalency prediction results are displayed in Table 5. Mappings from Cognitive Tutor to the other two platforms reach relatively high recall@5 and MRR, supporting our methodology and the conclusions from the experiments between Khan Academy and ASSISTments; however, the opposite directions are worse. We believe this is because Cognitive Tutor has the finest-grained skill taxonomy, and nuanced information is lost when forcing a mapping from coarse-grained to fine-grained (i.e., one-to-many).
5.4 Hyperparameters
Hyperparameters are tuned in the experiments with symmetric data sources. As detailed in the validation and testing strategies in Section 4, three methods need hyperparameter tuning: Content2vec, Skill2vec, and TAMF. Although we do not find a single best hyperparameter set as a consequence of cross-validation, we can compare the average performance of each hyperparameter set on the validation set. The metric used in hyperparameter tuning is MRR, thanks to its insensitivity to the one-to-many relationship.

For Content2vec, the hyperparameters include vector dimension, window size, and minimum count. Surprisingly, the best hyperparameters for each mapping direction are the same across all 10 folds: vector dimension 100, window size 20, and minimum count 50 from Khan Academy to ASSISTments, and vector dimension 100, window size 20, and minimum count 30 from ASSISTments to Khan Academy. Generally speaking, larger vector dimensions and larger window sizes are better, while minimum token count does not have much impact on performance.

For Skill2vec, the source and destination hyperparameters can be different, but there is no minimum count, as we need to keep all skills; the hyperparameters are therefore source vector dimension, source window size, destination vector dimension, and destination window size. Figure 3 shows how they affect validation MRR across all folds. We can see that vector dimension influences the results more significantly than window size, but there is no single rule for choosing vector dimension, since sometimes a large vector dimension is favorable while sometimes a small dimension is better.

TAMF has two hyperparameters for each platform: $k$ for half the vector dimension and $\lambda$ for the regularization coefficient. Shown in Figure 4 is the performance with different hyperparameter sets. A larger vector dimension is preferred, and the regularization coefficient does not greatly affect the overall skill equivalency prediction results.

Figure 3: Skill2vec performance with different hyperparameters

Figure 4: TAMF performance with different hyperparameters
6 DISCUSSION
This section delves into inspecting the taxonomy translation and edge cases. Section 6.1 visualizes skills from all taxonomies mapped to a single shared space; Section 6.2 considers skills with no matching skills in the destination platform; and Section 6.3 summarizes the effect of taxonomy granularity on skill equivalency prediction.
6.1 Visualizing a Shared Skill Space
We map skills from all platform taxonomies to a common space and visualize them to help understand skill equivalency in our taxonomy mapping models. Specifically, we project Khan Academy TAMF vectors and Cognitive Tutor Skill2vec vectors onto the ASSISTments TAMF vector space and keep the ASSISTments TAMF vectors unchanged, since skill mapping in this direction gives the best equivalency prediction performance. In this space, we run k-means clustering to explore relationships among closely mapped skills. The number of clusters is set to 20, as determined by the "elbow method" heuristic. Finally, we apply t-SNE to reduce dimensionality and display the results in Figure 5. Skills are colored based on k-means applied to the original, high-dimensional skill vectors. We rank these clusters by a heuristic score: the percent of skills whose true matching skills from another taxonomy are also in the same cluster.
Table 5: Skill equivalency prediction results on Cognitive Tutor

| Source | Source Representation | Destination | Destination Representation | Recall@5 | MRR |
| Cognitive Tutor | Skill2vec | Khan Academy | Content2vec | 0.72 | 0.72 |
| Khan Academy | TAMF | Cognitive Tutor | Skill2vec | 0.23 | 0.39 |
| Cognitive Tutor | Skill2vec | ASSISTments | TAMF | 0.88 | 0.79 |
| ASSISTments | TAMF | Cognitive Tutor | Skill2vec | 0.03 | 0.12 |
Table 6: Skill equivalency prediction results including "None" skills
Representation "None" Skill K2A A2KStrategy Recall@5 Recall@5Content2vec Ignore 0.45 0.30Content2vec Threshold 0.53 0.42Skill2vec Ignore 0.43 0.08Skill2vec Threshold 0.47 0.13Content2vec+Skill2vec Ignore 0.56 0.24Content2vec+Skill2vec Threshold 0.62 0.26same cluster. The best cluster (score 1.0), the worst cluster (0.36),and a middle cluster (0.71) are featured in Figure 5. The best clusteris tightly grouped, containing only unit conversion skills. The worstcluster is the most dispersed, yet it still groups similar skills togetherlike inequality and number line. The middle cluster is mostly aboutPythagorean Theorem, with a few outliers. In this cluster, we canobserve the fined-grained step-level skills of Pythagorean Theoremlike calculating lengths and squares of hypotenuse and legs fromCognitive Tutor are clustered together with the coarse-grainedproblem-level skills from the other two platforms. The clusteringresults indicate that our models do map similar skills close to eachother as desired. The complete visualization with cluster assign-ments and plot inspection tool can be found online . Owing to the uniqueness of each learning platform, some skills inone platform may not have a matching skill in another. In our case,30% of skills in Khan Academy have no corresponding skills in AS-SISTments and 31% of skills in ASSISTments have no counterpartsin Khan Academy. We call these skills the "None" skills and theywere ignored in the previous evaluations. While an algorithm canalways return the most similar but not sufficiently equivalent skills,it is desirable to find a way for our method to report that there isno reasonable match. Therefore, we also conduct experiments todistinguish these untranslatable skills. Our method is to simply adda similarity threshold, whereby any prediction below that thresholdwill be considered as a "None" skill prediction, and the ground truthfor those untranslatable skills is a mapping to the "None" skill. Theevaluation metric used is again recall@5. We also compare thismethod against a baseline "Ignore", which is to train and predict asif all skills are translatable but consider the predictions for those"None" skills as wrongly predicted in evaluation. https://cahlr.github.io/skill-equivalency-visualization Table 6 shows the results for Content2vec, Skill2vec, and Con-tent2vec+Skill2vec, which are the best method with recall@5 foreach input type. Considering "None" skills leads to a decrease inperformance compared to previously shown evaluations, underscor-ing the difficulty of addressing this issue. The threshold approach,however, performs slightly better than the baseline of not modelingNone skills and counting them wrong. Given the simplicity of thethreshold method, we hope future research can improve on thisapproach to classifying untranslatable skills between platforms.
6.3 Effect of Taxonomy Granularity
We observed that there is a large performance discrepancy between the directions of taxonomy mapping. For example, the best MRR from Khan Academy to ASSISTments is 0.67, achieved with TAMF, but the best MRR in the other direction is only 0.55. This phenomenon is also observed in the other platform pairs (Table 7). For example, results from Cognitive Tutor to Khan Academy or ASSISTments are better than from Khan Academy or ASSISTments to Cognitive Tutor.

A common characteristic of the poorer-performing pairs is mapping in the direction of a coarse-grained taxonomy to a fine-grained one. Cognitive Tutor is the finest, with skills assigned per step; Khan Academy is in the middle; and ASSISTments uses the coarsest-grained taxonomy of the three. We can also observe that the magnitude of the difference in performance between directions varies, as shown in Table 7. Cognitive Tutor and ASSISTments have a larger difference than Cognitive Tutor and Khan Academy, or Khan Academy and ASSISTments. This result suggests that the more distant two platforms are in granularity, the larger the performance discrepancy between the two mapping directions.
7 LIMITATIONS
There were a few limitations to this research that we believe can be improved upon in future studies. First, the ground truth cross-platform skills were labeled by the authors, not domain experts. Second, the taxonomies used are restricted to the mathematics domain. This was in part because the majority of publicly available digital learning platform clickstream and content datasets are from mathematics platforms. Third, we only utilized text content in our content-based approaches. Utilizing images, as was done in [5], would capture a more complete representation of problems. Finally, pretrained contextual word embedding models like BERT [8] might be effective in improving content-based representations of problems, thus boosting skill equivalency prediction performance.
Figure 5: t-SNE visualization of skills from all three platform taxonomies projected onto the ASSISTments TAMF vector spaceTable 7: Discrepancy of skill equivalency prediction results (MRR) between opposite directions
Fine-grained Platform Coarse-grained Platform Fine to Coarse Coarse to Fine DifferenceKhan Academy ASSISTments 0.67 0.55 0.12Cognitive Tutor Khan Academy 0.72 0.39 0.33Cognitive Tutor ASSISTments 0.79 0.12 0.67
8 CONCLUSION
In this research, we demonstrated the viability of learning skill equivalencies across several taxonomies using data from the content of the problems skills are associated with and the clickstream sequences (i.e., contexts) in which those problems appear on a digital learning platform. We represented skills as vectors, employed machine translation to map between skill vector spaces, and validated the methodology on three digital learning platforms: ASSISTments, Khan Academy, and Cognitive Tutor.
We found that attempting to map from a coarser-grained taxonomy to a finer-grained taxonomy was considerably more difficult, with the best ASSISTments to Khan Academy recall@5 markedly lower than in the reverse direction (0.44 vs. 0.80) and, similarly, when mapping to the finer-grained taxonomy of Cognitive Tutor (0.23 and 0.03 with Khan Academy and ASSISTments as sources, respectively). We also found that skill equivalence prediction was more accurate in experiments where symmetric data were used to represent skills in both the source and destination taxonomies; however, skill prediction with asymmetric data performed comparably when mapping from fine-grained to coarse-grained taxonomies, particularly in the case of Cognitive Tutor (using Skill2vec) to ASSISTments (using TAMF), which scored 0.88 in recall@5.

These results are promising, but do not yet reach the level required for completely unattended automatic taxonomy mapping. The approach can, however, effectively triage a skill mapping or taxonomy crosswalk process and likely reduce the manual labor needed by a considerable amount. Use of the methods introduced in this paper, amplified by improvements identified in future work, could facilitate a more connected digital learning ecosystem, providing greater acknowledgement of students' prior learning and, subsequently, more effective pedagogy.
REFERENCES
[1] National Governors Association et al. 2010. Common Core State Standards. Washington, DC (2010).
[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In International Conference on Learning Representations.
[3] Norman Bier, Sean Lip, Ross Strader, Candace Thille, and Dawn Zimmaro. 2014. An Approach to Knowledge Component/Skill Modeling in Online Courses. Open Learning (2014), 1–14.
[4] Hao Cen, Kenneth R Koedinger, and Brian Junker. 2007. Is Over Practice Necessary?–Improving Learning Efficiency with the Cognitive Tutor through Educational Data Mining. In Proceedings of the 2007 Conference on Artificial Intelligence in Education. 511–518.
[5] Devendra Singh Chaplot, Christopher MacLellan, Ruslan Salakhutdinov, and Kenneth Koedinger. 2018. Learning Cognitive Models Using Neural Networks. In International Conference on Artificial Intelligence in Education. Springer, 43–56.
[6] Namyoun Choi, Il-Yeol Song, and Yongjun Zhu. 2016. A Model-Based Method for Information Alignment: A Case Study on Educational Standards. Journal of Computing Science and Engineering 10, 3 (2016), 85–94.
[7] David T Conley. 2011. Crosswalk Analysis of Deeper Learning Skills to Common Core State Standards. Educational Policy Improvement Center (NJ1) (2011).
[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT, Volume 1 (Long and Short Papers). 4171–4186.
[9] Cambridge Assessment International Education. 2018. The 2018 Cambridge International Global Education Census. (2018).
[10] Stephen E Fancsali, Michael V Yudelson, Susan R Berman, and Steven Ritter. 2018. Intelligent Instructional Hand Offs. International Educational Data Mining Society (2018).
[11] Christian Fischer, Zachary A Pardos, Ryan Shaun Baker, Joseph Jay Williams, Padhraic Smyth, Renzhe Yu, Stefan Slater, Rachel Baker, and Mark Warschauer. 2020. Mining Big Data in Education: Affordances and Challenges. Review of Research in Education 44, 1 (2020), 130–160.
[12] Neil T Heffernan and Cristina Lindquist Heffernan. 2014. The ASSISTments Ecosystem: Building a Platform that Brings Scientists and Teachers together for Minimally Invasive Research on Human Learning and Teaching. International Journal of Artificial Intelligence in Education 24, 4 (2014), 470–497.
[13] Weijie Jiang and Zachary A Pardos. 2020. Evaluating Sources of Course Information and Models of Representation on a Variety of Institutional Prediction Tasks. International Educational Data Mining Society (2020).
[14] Kenneth R Koedinger, Albert T Corbett, and Charles Perfetti. 2012. The Knowledge-Learning-Instruction Framework: Bridging the Science-Practice Chasm to Enhance Robust Student Learning. Cognitive Science 36, 5 (2012), 757–798.
[15] Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In Advances in Neural Information Processing Systems 27. Curran Associates, Inc., 2177–2185.
[16] Joshua J Michalenko, Andrew S Lan, and Richard G Baraniuk. 2017. Data-Mining Textual Responses to Uncover Misconception Patterns. In Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale. 245–248.
[17] Tomas Mikolov, Quoc V Le, and Ilya Sutskever. 2013. Exploiting Similarities among Languages for Machine Translation. arXiv preprint arXiv:1309.4168 (2013).
[18] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
[19] Sara Morsy and George Karypis. 2019. Will This Course Increase or Decrease Your GPA? Towards Grade-Aware Course Recommendation. Journal of Educational Data Mining 11, 2 (2019), 20–46.
[20] Liangming Pan, Xiaochen Wang, Chengjiang Li, Juanzi Li, and Jie Tang. 2017. Course Concept Extraction in MOOCs via Embedding-based Graph Propagation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 875–884.
[21] John F Pane, Beth Ann Griffin, Daniel F McCaffrey, and Rita Karam. 2014. Effectiveness of Cognitive Tutor Algebra I at Scale. Educational Evaluation and Policy Analysis 36, 2 (2014), 127–144.
[22] Zachary A Pardos, Yoav Bergner, Daniel T Seaton, and David E Pritchard. 2013. Adapting Bayesian Knowledge Tracing to a Massive Open Online Course in edX. International Educational Data Mining Society (2013).
[23] Zachary A Pardos, Hung Chau, and Haocheng Zhao. 2019. Data-Assistive Course-to-Course Articulation Using Machine Translation. In Proceedings of the Sixth (2019) ACM Conference on Learning @ Scale. 1–10.
[24] Zachary A Pardos and Anant Dadu. 2017. Imputing KCs with Representations of Problem Content and Context. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. 148–155.
[25] Zachary A Pardos, Zihao Fan, and Weijie Jiang. 2019. Connectionist Recommendation in the Wild: On the Utility and Scrutability of Neural Networks for Personalized Course Guidance. User Modeling and User-Adapted Interaction 29, 2 (2019), 487–525.
[26] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 701–710.
[27] Chris Piech, Jonathan Bassen, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas J Guibas, and Jascha Sohl-Dickstein. 2015. Deep Knowledge Tracing. In Advances in Neural Information Processing Systems. 505–513.
[28] Darrell Porcello and Sherry Hsi. 2013. Crowdsourcing and Curating Online Education Resources. Science (2013).
[29] Technology, Instruction, Cognition & Learning 5, 3 (2007).
[30] John Stamper and Zachary A Pardos. 2016. The 2010 KDD Cup Competition Dataset: Engaging the Machine Learning Community in Predictive Learning Analytics. Journal of Learning Analytics 3, 2 (2016), 312–316.
[31] Cathlyn Stone, Abigail Quirk, Margo Gardener, Stephen Hutt, Angela L Duckworth, and Sidney K D'Mello. 2019. Language as Thought: Using Natural Language Processing to Model Noncognitive Traits that Predict College Success. In Proceedings of the 9th International Conference on Learning Analytics & Knowledge. 320–329.
[32] Mega Subramaniam, June Ahn, Amanda Waugh, Natalie Greene Taylor, Allison Druin, Kenneth R Fleischmann, and Greg Walsh. 2013. Crosswalk between the "Framework for K-12 Science Education" and "Standards for the 21st-Century Learner": School Librarians as the Crucial Link. School Library Research 16 (2013).
[33] Mark Wilson. 2004. Constructing Measures: An Item Response Modeling Approach. Routledge.
[34] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Y Chang. 2015. Network Representation Learning with Rich Text Information. In Proceedings of the 24th International Conference on Artificial Intelligence. 2111–2117.
[35] Ozgur Yilmazel, Niranjan Balasubramanian, Sarah C Harwell, Jennifer Bailey, Anne R Diekema, and Elizabeth D Liddy. 2007. Text Categorization for Aligning Educational Standards. In 2007 40th Annual Hawaii International Conference on System Sciences (HICSS'07).