Andrew S. Lan
Princeton University
Publications
Featured research published by Andrew S. Lan.
Knowledge Discovery and Data Mining | 2014
Andrew S. Lan; Christoph Studer; Richard G. Baraniuk
We propose SPARFA-Trace, a new machine learning-based framework for time-varying learning and content analytics for educational applications. We develop a novel message passing-based, blind, approximate Kalman filter for sparse factor analysis (SPARFA) that jointly traces learner concept knowledge over time, analyzes learner concept knowledge state transitions (induced by interacting with learning resources, such as textbook sections, lecture videos, etc., or the forgetting effect), and estimates the content organization and difficulty of the questions in assessments. These quantities are estimated solely from binary-valued (correct/incorrect) graded learner response data and the specific actions each learner performs (e.g., answering a question or studying a learning resource) at each time instant. Experimental results on two online course datasets demonstrate that SPARFA-Trace is capable of tracing each learner's concept knowledge evolution over time, analyzing the quality and content organization of learning resources, and estimating the question–concept associations and question difficulties. Moreover, we show that SPARFA-Trace achieves comparable or better performance than existing collaborative filtering and knowledge tracing methods in predicting unobserved learner responses.
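As a rough illustration of the state-space view behind SPARFA-Trace, the sketch below implements a plain Kalman-style predict/update recursion over a learner's concept-knowledge vector. It is a toy stand-in, not the paper's message passing-based approximate filter: the transition matrix A, action noise Gamma, and the linearized surrogate for the binary response likelihood are all illustrative assumptions.

    import numpy as np

    def predict(c, P, A, Gamma):
        # Propagate the concept-knowledge state through one learning action:
        # A models retention/forgetting, Gamma the uncertainty the action adds.
        return A @ c, A @ P @ A.T + Gamma

    def update(c, P, w, mu, y, r=1.0):
        # Linearized update for one graded binary response y in {0, 1}: treat
        # y - 0.5 as a noisy linear observation of w.c - mu (a crude surrogate
        # for the probit likelihood handled properly in the paper).
        s = w @ P @ w + r                  # innovation variance
        k = (P @ w) / s                    # Kalman gain
        c = c + k * ((y - 0.5) - (w @ c - mu))
        P = P - np.outer(k, w @ P)
        return c, P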
Learning at Scale | 2015
Andrew S. Lan; Divyanshu Vats; Andrew E. Waters; Richard G. Baraniuk
While computer and communication technologies have provided effective means to scale up many aspects of education, the submission and grading of assessments such as homework assignments and tests remains a weak link. In this paper, we study the problem of automatically grading the kinds of open response mathematical questions that figure prominently in STEM (science, technology, engineering, and mathematics) courses. Our data-driven framework for mathematical language processing (MLP) leverages solution data from a large number of learners to evaluate the correctness of their solutions, assign partial-credit scores, and provide feedback to each learner on the likely locations of any errors. MLP takes inspiration from the success of natural language processing for text data and comprises three main steps. First, we convert each solution to an open response mathematical question into a series of numerical features. Second, we cluster the features from several solutions to uncover the structures of correct, partially correct, and incorrect solutions. We develop two different clustering approaches, one that leverages generic clustering algorithms and one based on Bayesian nonparametrics. Third, we automatically grade the remaining (potentially large number of) solutions based on their assigned cluster and one instructor-provided grade per cluster. As a bonus, we can track the cluster assignment of each step of a multistep solution and determine when it departs from a cluster of correct solutions, which enables us to indicate the likely locations of errors to learners. We test and validate MLP on real-world MOOC data to demonstrate how it can substantially reduce the human effort required in large-scale educational platforms.
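The three-step pipeline lends itself to a compact sketch. The version below uses a generic k-means clusterer for step two; featurize and instructor_grade are hypothetical stand-ins for the paper's numerical features and the one instructor-provided grade per cluster.

    import numpy as np
    from sklearn.cluster import KMeans

    def auto_grade(solutions, featurize, instructor_grade, n_clusters=10):
        # Step 1: convert each open response solution to a feature vector.
        X = np.array([featurize(s) for s in solutions])
        # Step 2: cluster solutions into groups of similar (in)correctness.
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
        # Step 3: the instructor grades one representative per cluster, and
        # that grade is propagated to every other solution in the cluster.
        grades = {}
        for c in range(n_clusters):
            members = np.where(km.labels_ == c)[0]
            dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
            grades[c] = instructor_grade(solutions[members[np.argmin(dists)]])
        return [grades[c] for c in km.labels_]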
International Conference on Acoustics, Speech, and Signal Processing | 2014
Andrew S. Lan; Christoph Studer; Richard G. Baraniuk
This paper deals with the recovery of an unknown, low-rank matrix from quantized and (possibly) corrupted measurements of a subset of its entries. We develop statistical models and corresponding (multi-)convex optimization algorithms for quantized matrix completion (Q-MC) and quantized robust principal component analysis (Q-RPCA). In order to take into account the quantized nature of the available data, we jointly learn the underlying quantization bin boundaries and recover the low-rank matrix, while removing potential (sparse) corruptions. Experimental results on synthetic and two real-world collaborative filtering datasets demonstrate that directly operating with the quantized measurements - rather than treating them as real values - results in (often significantly) lower recovery error if the number of quantization bins is less than about 10.
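At the core of Q-MC is the likelihood of a quantized observation: a rating in bin k means the corresponding low-rank entry plus noise fell between boundaries b[k] and b[k+1]. Below is a minimal sketch of that objective, assuming Gaussian noise and a factored L = U V^T; the paper additionally learns the bin boundaries and a sparse corruption term.

    import numpy as np
    from scipy.stats import norm

    def neg_log_lik(U, V, b, obs, sigma=1.0):
        # obs: (i, j, k) triples; rating k means L[i, j] + noise landed in
        # the bin [b[k], b[k+1]]. b has one more entry than there are bins.
        L = U @ V.T
        nll = 0.0
        for i, j, k in obs:
            p = (norm.cdf((b[k + 1] - L[i, j]) / sigma)
                 - norm.cdf((b[k] - L[i, j]) / sigma))
            nll -= np.log(max(p, 1e-12))  # guard against log(0)
        return nll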
International Conference on Acoustics, Speech, and Signal Processing | 2013
Andrew E. Waters; Andrew S. Lan; Christoph Studer
We develop a new model and algorithm for machine learning-based learning analytics, which estimate a learner's knowledge of the concepts underlying a domain. Our model represents the probability that a learner provides the correct response to a question in terms of three factors: their understanding of a set of underlying concepts, the concepts involved in each question, and each question's intrinsic difficulty. We estimate these factors given the graded responses to a set of questions. We develop a bi-convex algorithm to solve the resulting SPARse Factor Analysis (SPARFA) problem. We also incorporate user-defined tags on questions to facilitate the interpretability of the estimated factors. Experiments with synthetic and real-world data demonstrate the efficacy of our approach.
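The three factors combine into a simple generative model. A minimal sketch, assuming a probit link for illustration:

    import numpy as np
    from scipy.stats import norm

    def p_correct(W, C, mu):
        # W:  questions x concepts, sparse and nonnegative (question-concept
        #     associations); C: concepts x learners (concept knowledge);
        # mu: per-question intrinsic difficulty.
        # Returns a questions x learners matrix of correct-answer probabilities.
        return norm.cdf(W @ C - mu[:, None])

The bi-convex structure comes from the fact that the objective is convex in (W, mu) with C fixed and convex in C with (W, mu) fixed, so estimation alternates between the two subproblems.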
Learning at Scale | 2018
Zichao Wang; Andrew S. Lan; Weili Nie; Andrew E. Waters; Phillip Grimaldi; Richard G. Baraniuk
The ever-growing amount of educational content renders it increasingly difficult to manually generate sufficient practice or quiz questions to accompany it. This paper introduces QG-Net, a recurrent neural network-based model specifically designed for automatically generating quiz questions from educational content such as textbooks. QG-Net, when trained on a publicly available, general-purpose question/answer dataset and without further fine-tuning, is capable of generating high-quality questions from textbooks, where the content is significantly different from the training data. Indeed, QG-Net outperforms state-of-the-art neural network-based and rule-based systems for question generation, both when evaluated using standard benchmark datasets and when using human evaluators. QG-Net also scales favorably to applications with large amounts of educational content, since its performance improves with the amount of training data.
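Architecturally, QG-Net belongs to the encoder-decoder family of recurrent models. The PyTorch skeleton below shows that general shape only; it omits the context-word features and other specifics of the published model, and all dimensions are illustrative.

    import torch
    import torch.nn as nn

    class Seq2SeqQG(nn.Module):
        # Generic encoder-decoder: the encoder reads a context passage, and
        # the decoder emits question tokens one step at a time.
        def __init__(self, vocab_size, dim=256):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, dim)
            self.enc = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
            self.dec = nn.LSTM(dim, 2 * dim, batch_first=True)
            self.out = nn.Linear(2 * dim, vocab_size)

        def forward(self, context, question):
            _, (h, c) = self.enc(self.emb(context))
            # Concatenate the encoder's two directions to seed the decoder.
            h0 = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)
            c0 = torch.cat([c[0], c[1]], dim=-1).unsqueeze(0)
            dec_out, _ = self.dec(self.emb(question), (h0, c0))
            return self.out(dec_out)  # per-step vocabulary logits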
Artificial Intelligence in Education | 2018
Da Cao; Andrew S. Lan; Weiyu Chen; Christopher G. Brinton; Mung Chiang
Learner behavioral data (e.g., clickstream activity logs) collected by online education platforms contains rich information about learners and content, but is often highly redundant. In this paper, we study the problem of learning low-dimensional, interpretable features from this type of raw, high-dimensional behavioral data. Based on the premise of generative adversarial networks (GANs), our method refines a small set of human-crafted features while also generating a set of additional, complementary features that better summarize the raw data. Through experimental validation on a real-world dataset that we collected from an online course, we demonstrate that our method leads to features that are both predictive of learner quiz scores and closely related to human-crafted features.
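A minimal sketch of the adversarial setup, assuming a simple reconstruct-and-discriminate arrangement; the shapes and the exact generator/discriminator roles here are illustrative assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    raw_dim, feat_dim = 1000, 16  # illustrative clickstream / feature sizes
    # G compresses raw behavior to a few features, then expands it back.
    G = nn.Sequential(nn.Linear(raw_dim, feat_dim), nn.ReLU(),
                      nn.Linear(feat_dim, raw_dim))
    # D tries to tell real behavioral vectors from G's reconstructions.
    D = nn.Sequential(nn.Linear(raw_dim, 64), nn.ReLU(),
                      nn.Linear(64, 1), nn.Sigmoid())
    bce = nn.BCELoss()
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

    def train_step(x):
        fake = G(x)
        # Discriminator: label raw data 1, generated summaries 0.
        d_loss = (bce(D(x), torch.ones(len(x), 1))
                  + bce(D(fake.detach()), torch.zeros(len(x), 1)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # Generator: produce features whose expansion fools the discriminator.
        g_loss = bce(D(fake), torch.ones(len(x), 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()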
Signal Processing | 2018
Amirali Aghazadeh; Mohammad Golbabaee; Andrew S. Lan; Richard G. Baraniuk
Sensor selection refers to the problem of intelligently selecting a small subset of a collection of available sensors to reduce the sensing cost while preserving signal acquisition performance. The majority of sensor selection algorithms find the subset of sensors that best recovers an arbitrary signal from a number of linear measurements that is larger than the dimension of the signal. In this paper, we develop a new sensor selection algorithm for sparse (or near sparse) signals that finds a subset of sensors that best recovers such signals from a number of measurements that is much smaller than the dimension of the signal. Existing sensor selection algorithms cannot be applied in such situations. Our proposed Incoherent Sensor Selection (Insense) algorithm minimizes a coherence-based cost function that is adapted from recent results in sparse recovery theory. Using three datasets, including a real-world dataset on microbial diagnostics, we demonstrate the superior performance of Insense for sparse-signal sensor selection.
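For intuition, mutual coherence is the largest normalized inner product between two distinct columns of the selected sensing submatrix, and sparse recovery degrades as it grows. The greedy routine below is only an illustrative stand-in for Insense, which minimizes a coherence-based cost directly rather than greedily.

    import numpy as np

    def coherence(A):
        # Largest absolute normalized inner product between distinct columns.
        An = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)
        G = np.abs(An.T @ An)
        np.fill_diagonal(G, 0.0)
        return G.max()

    def greedy_select(Phi, m):
        # Pick m rows (sensors) of Phi whose submatrix keeps coherence low;
        # ties in the first step are broken arbitrarily.
        chosen, remaining = [], list(range(Phi.shape[0]))
        for _ in range(m):
            best = min(remaining, key=lambda r: coherence(Phi[chosen + [r], :]))
            chosen.append(best)
            remaining.remove(best)
        return chosen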
IEEE Journal of Selected Topics in Signal Processing | 2017
Andrew S. Lan; Andrew E. Waters; Christoph Studer; Richard G. Baraniuk
Machine learning (ML) models and algorithms can enable a personalized learning experience for students in an inexpensive and scalable manner. At the heart of ML-driven personalized learning is the automated analysis of student responses to assessment items. Existing statistical models for this task enable the estimation of student knowledge and question difficulty solely from graded response data with only minimal effort from instructors. However, most existing student–response models are generalized linear models, meaning that they characterize the probability that a student answers a question correctly through a linear combination of their knowledge and the question's difficulty with respect to each concept that is being assessed. Such models cannot characterize complicated, nonlinear student–response associations and, hence, lack human interpretability in practice. In this paper, we propose a nonlinear student–response model called Boolean logic analysis (BLAh) that models a student's binary-valued graded response to a question as the output of a Boolean logic function. We develop a Markov chain Monte Carlo inference algorithm that learns the Boolean logic functions for each question solely from graded response data. A refined BLAh model improves identifiability, tractability, and interpretability by considering a restricted set of ordered Boolean logic functions. Experimental results on a variety of real-world educational datasets demonstrate that BLAh not only achieves best-in-class prediction performance on unobserved student responses on some datasets but also provides easily interpretable parameters when questions are tagged with metadata by domain experts, which can provide useful feedback to instructors and content designers to improve the quality of assessment items.
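As a toy version of the response model: under BLAh, whether a student tends to answer correctly is a Boolean function of which of the question's tagged concepts they have mastered, observed through slip and guess noise. The AND/OR choice and the noise parameters below are illustrative; the paper learns the logic functions by MCMC.

    import numpy as np

    def blah_response(mastery, fn="AND", slip=0.1, guess=0.2, rng=np.random):
        # mastery: binary vector over the concepts the question is tagged with.
        truth = bool(mastery.all()) if fn == "AND" else bool(mastery.any())
        p_correct = (1.0 - slip) if truth else guess
        return rng.random() < p_correct  # graded binary response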
Journal of Machine Learning Research | 2014
Andrew S. Lan; Andrew E. Waters; Christoph Studer; Richard G. Baraniuk
Archive | 2014
Richard G. Baraniuk; Andrew S. Lan; Christoph Studer; Andrew E. Waters