Michael Heilman
Princeton University
Publications
Featured research published by Michael Heilman.
Workshop on Innovative Use of NLP for Building Educational Applications | 2008
Michael Heilman; Kevyn Collins-Thompson; Maxine Eskenazi
A reading difficulty measure can be described as a function or model that maps a text to a numerical value corresponding to a difficulty or grade level. We describe a measure of readability that uses a combination of lexical features and grammatical features that are derived from subtrees of syntactic parses. We also tested statistical models for nominal, ordinal, and interval scales of measurement. The results indicate that a model for ordinal regression, such as the proportional odds model, using a combination of grammatical and lexical features is most effective at predicting reading difficulty.
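A minimal sketch of the modeling step described above, assuming the lexical and parse-subtree features have already been extracted; this is not the authors' implementation, and the feature names and synthetic data are placeholders. The proportional odds model is expressed here via statsmodels' OrderedModel with a logit link.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 200

# Hypothetical per-text features: lexical statistics plus counts derived
# from syntactic parse subtrees.
X = pd.DataFrame({
    "mean_word_log_freq": rng.normal(size=n),
    "rare_word_ratio": rng.random(n),
    "subordinate_clause_rate": rng.random(n),
    "avg_parse_tree_depth": rng.normal(5.0, 1.0, size=n),
})

# Synthetic grade levels 1-12, treated as an ordered outcome.
grades = np.concatenate([np.arange(1, 13), rng.integers(1, 13, size=n - 12)])
y = pd.Series(pd.Categorical(grades, categories=range(1, 13), ordered=True))

model = OrderedModel(y, X, distr="logit")      # proportional odds model
result = model.fit(method="bfgs", disp=False)

# Predicted probability distribution over grade levels for each text;
# the argmax gives the predicted grade.
probs = np.asarray(result.predict(X))
predicted_grade = probs.argmax(axis=1) + 1
print(predicted_grade[:10])
```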
Workshop on Innovative Use of NLP for Building Educational Applications | 2008
Michael Heilman; Le Zhao; Juan Pino; Maxine Eskenazi
Finding appropriate, authentic reading materials is a challenge for language instructors. The Web is a vast resource of texts, but most pages are not suitable for reading practice, and commercial search engines are not well suited to finding texts that satisfy pedagogical constraints such as reading level, length, text quality, and presence of target vocabulary. We present a system that uses various language technologies to facilitate the retrieval and presentation of authentic reading materials gathered from the Web. It is currently deployed in two English as a Second Language courses at the University of Pittsburgh.
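An illustrative sketch of the post-retrieval filtering step such a system implies: keep a candidate web page only if it meets the pedagogical constraints (reading level, length, text quality, target vocabulary). The Candidate fields, thresholds, and helper names below are hypothetical, not the deployed system's.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    url: str
    text: str
    predicted_grade_level: float   # output of a readability model
    quality_score: float           # e.g., a boilerplate/spam classifier score

def acceptable(page, target_words,
               grade_range=(6.0, 8.0),
               length_range=(200, 1000),
               min_quality=0.5,
               min_target_hits=3):
    tokens = page.text.lower().split()
    lo, hi = grade_range
    return (
        lo <= page.predicted_grade_level <= hi
        and length_range[0] <= len(tokens) <= length_range[1]
        and page.quality_score >= min_quality
        and sum(w in target_words for w in set(tokens)) >= min_target_hits
    )

# Usage: readings = [p for p in candidates if acceptable(p, target_vocabulary)]
```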
Artificial Intelligence in Education | 2010
Michael Heilman; Kevyn Collins-Thompson; Jamie Callan; Maxine Eskenazi; Alan Juffs; Lois Wilson
The REAP tutoring system provides individualized and adaptive English as a Second Language vocabulary practice. REAP can automatically personalize instruction by providing practice readings about topics that match students' interests as well as domain-based, cognitive objectives. While most previous research on motivation in intelligent tutoring environments has focused on increasing extrinsic motivation, this study focused on increasing personal interest. Students were randomly split into control and treatment groups. The control-condition tutor chose texts to maximize domain-based goals such as the density of practice opportunities for target words. The treatment-condition tutor also preferred texts that matched personal interests. The results show positive effects of personalization, and also demonstrate the importance of negotiating between motivational and domain-based goals.
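A toy sketch of negotiating domain-based and motivational goals when picking the next reading: score each text by target-word density (the domain goal) plus a weighted interest-match term (personalization). The scoring formula and the interest_weight parameter are assumptions for illustration, not the tutor's actual selection policy.

```python
def text_score(tokens, target_words, text_topics, student_interests,
               interest_weight=0.5):
    # Domain goal: density of practice opportunities for target words.
    density = sum(t in target_words for t in tokens) / max(len(tokens), 1)
    # Motivational goal: overlap between the text's topics and the student's interests.
    interest_match = (len(set(text_topics) & set(student_interests))
                      / max(len(text_topics), 1))
    return density + interest_weight * interest_match

def choose_reading(texts, target_words, student_interests):
    # texts: list of (tokens, topic_labels) pairs
    return max(texts, key=lambda t: text_score(t[0], target_words,
                                               t[1], student_interests))
```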
Meeting of the Association for Computational Linguistics | 2014
Yi Song; Michael Heilman; Beata Beigman Klebanov; Paul Deane
Under the framework of the argumentation scheme theory (Walton, 1996), we developed annotation protocols for an argumentative writing task to support identification and classification of the arguments being made in essays. Each annotation protocol defined argumentation schemes (i.e., reasoning patterns) in a given writing prompt and listed questions to help evaluate an argument based on these schemes, to make the argument structure in a text explicit and classifiable. We report findings based on an annotation of 600 essays. Most annotation categories were applied reliably by human annotators, and some categories significantly contributed to essay score. An NLP system to identify sentences containing scheme-relevant critical questions was developed based on the human annotations.
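The paper reports an NLP system built on the human annotations but the abstract does not spell out its pipeline; the sketch below shows one plausible baseline (TF-IDF n-grams plus logistic regression) for flagging sentences that contain scheme-relevant critical-question material. The example sentences and labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# sentences: essay sentences; labels: 1 if annotated as containing a
# scheme-relevant critical question, else 0 (toy placeholder data).
sentences = ["Advertising restrictions would hurt small businesses.",
             "Everyone I know agrees with this policy."]
labels = [1, 0]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
clf.fit(sentences, labels)
print(clf.predict(["This claim assumes the expert is unbiased."]))
```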
Meeting of the Association for Computational Linguistics | 2014
Michael Heilman; Aoife Cahill; Nitin Madnani; Melissa Lopez; Matthew Mulholland; Joel R. Tetreault
Automated methods for identifying whether sentences are grammatical have various potential applications (e.g., machine translation, automated essay scoring, computer-assisted language learning). In this work, we construct a statistical model of grammaticality using various linguistic features (e.g., misspelling counts, parser outputs, n-gram language model scores). We also present a new publicly available dataset of learner sentences judged for grammaticality on an ordinal scale. In evaluations, we compare our system to the one from Post (2011) and find that our approach yields state-of-the-art performance.
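As a rough illustration of the kind of model described, the sketch below feeds a few heterogeneous sentence-level features to a regularized linear learner that predicts the ordinal judgment. The feature set, toy data, and choice of ridge regression are assumptions, not the paper's exact system.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Each row: [num_misspellings, lm_log_prob_per_token, parser_confidence,
#            num_out_of_vocab_ngrams]; y: human grammaticality rating 1-4.
X = np.array([
    [0, -3.1, 0.92, 1],
    [3, -4.8, 0.40, 7],
    [1, -3.9, 0.75, 3],
    [5, -5.6, 0.22, 9],
])
y = np.array([4.0, 1.5, 3.0, 1.0])

model = make_pipeline(StandardScaler(), RidgeCV(alphas=[0.1, 1.0, 10.0]))
model.fit(X, y)
print(model.predict([[2, -4.2, 0.6, 4]]))   # predicted grammaticality score
```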
Workshop on Innovative Use of NLP for Building Educational Applications | 2015
Anastassia Loukina; Klaus Zechner; Lei Chen; Michael Heilman
Automated scoring systems used for the evaluation of spoken or written responses in language assessments need to balance good empirical performance with the interpretability of the scoring models. We compare several methods of feature selection for such scoring systems and show that the use of shrinkage methods such as Lasso regression makes it possible to rapidly build models that both satisfy the requirements of validity and interpretability, which are crucial in assessment contexts, and achieve good empirical performance.
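A minimal sketch of shrinkage-based feature selection in this spirit: fit Lasso with a cross-validated regularization strength and keep only the features with non-zero coefficients. The feature names and synthetic data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

feature_names = ["speaking_rate", "pause_frequency", "vocab_diversity",
                 "pronunciation_score", "grammar_errors", "response_length"]
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(300, len(feature_names))))
# Synthetic scores driven by only two of the six features.
scores = 2.0 * X[:, 0] - 1.5 * X[:, 4] + rng.normal(scale=0.5, size=300)

lasso = LassoCV(cv=5).fit(X, scores)
selected = [name for name, coef in zip(feature_names, lasso.coef_)
            if abs(coef) > 1e-6]
print("Retained features:", selected)   # a small, interpretable subset
```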
Intelligent Tutoring Systems | 2008
Anagha Kulkarni; Michael Heilman; Maxine Eskenazi; Jamie Callan
Words with multiple meanings are a phenomenon inherent to any natural language. In this work, we study the effects of such lexical ambiguities on second language vocabulary learning. We demonstrate that machine learning algorithms for word sense disambiguation can induce classifiers that exhibit high accuracy at the task of disambiguating homonyms (words with multiple distinct meanings). Results from a user study that compared two versions of a vocabulary tutoring system, one that applied word sense disambiguation to support learning and another that did not, support rejection of the null hypothesis that learning outcomes with and without word sense disambiguation are equivalent, with a p-value of 0.001. To our knowledge this is the first work that investigates the efficacy of word sense disambiguation for facilitating second language vocabulary learning.
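A minimal sketch of the kind of supervised word sense disambiguation involved: classify which sense of a homonym a context uses from its surrounding words. The senses, sentences, and choice of a bag-of-words Naive Bayes classifier are illustrative, not the system the study evaluated.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Training contexts for the homonym "bank" with two distinct senses.
contexts = [
    "deposited the check at the bank downtown",
    "the bank approved the loan application",
    "fished from the muddy bank of the river",
    "sat on the grassy bank watching the water",
]
senses = ["finance", "finance", "river", "river"]

wsd = make_pipeline(CountVectorizer(), MultinomialNB())
wsd.fit(contexts, senses)
print(wsd.predict(["the bank raised interest rates again"]))  # -> ['finance']
```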
Proceedings of the Second Workshop on Metaphor in NLP | 2014
Beata Beigman Klebanov; Ben Leong; Michael Heilman; Michael Flor
Current approaches to supervised learning of metaphor tend to use sophisticated features and restrict their attention to constructions and contexts where these features apply. In this paper, we describe the development of a supervised learning system to classify all content words in a running text as either being used metaphorically or not. We start by examining the performance of a simple unigram baseline that achieves surprisingly good results for some of the datasets. We then show how the recall of the system can be improved over this strong baseline.
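The unigram baseline can be made concrete with a small sketch: label a content word as metaphorical if, in the training data, that word was annotated as metaphorical more often than not. The toy lemma counts below are invented.

```python
from collections import Counter, defaultdict

# Training data: (lemma, is_metaphor) pairs over content words in running text.
train = [("attack", 1), ("attack", 1), ("attack", 0),
         ("table", 0), ("bright", 1), ("bright", 0), ("bright", 1)]

counts = defaultdict(Counter)
for lemma, label in train:
    counts[lemma][label] += 1

def unigram_baseline(lemma):
    c = counts[lemma]
    # Unseen words default to the majority class (non-metaphorical).
    return 1 if c[1] > c[0] else 0

print([unigram_baseline(w) for w in ["attack", "table", "bright", "river"]])
# -> [1, 0, 1, 0]
```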
North American Chapter of the Association for Computational Linguistics | 2015
Keisuke Sakaguchi; Michael Heilman; Nitin Madnani
A major opportunity for NLP to have a real-world impact is in helping educators score student writing, particularly content-based writing (i.e., the task of automated short answer scoring). A major challenge in this enterprise is that scored responses to a particular question (i.e., labeled data) are valuable for modeling but limited in quantity. Additional information from the scoring guidelines for humans, such as exemplars for each score level and descriptions of key concepts, can also be used. Here, we explore methods for integrating scoring guidelines and labeled responses, and we find that stacked generalization (Wolpert, 1992) improves performance, especially for small training sets.
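A rough sketch of stacked generalization in this setting, under assumed data and features: one base signal is an out-of-fold prediction from a classifier trained on the scored responses, another is similarity to score-level exemplars from the scoring guidelines, and a meta-classifier combines them. The responses, scores, and exemplars are toy placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics.pairwise import cosine_similarity

responses = ["evaporation moves water into the air",
             "heat turns liquid water into vapor",
             "it rains because clouds get heavy",
             "clouds drop the water as rain",
             "the water just disappears",
             "i do not know"]
scores = np.array([2, 2, 1, 1, 0, 0])            # short-answer score levels
exemplars = {0: "irrelevant or incorrect answer",
             1: "mentions rain or clouds without a mechanism",
             2: "explains evaporation or condensation"}

vec = TfidfVectorizer().fit(responses + list(exemplars.values()))
X = vec.transform(responses)
E = vec.transform([exemplars[k] for k in sorted(exemplars)])

# Base signal 1: out-of-fold predictions from a response-trained classifier.
base_clf = LogisticRegression(max_iter=1000)
oof_pred = cross_val_predict(base_clf, X, scores, cv=2)

# Base signal 2: cosine similarity of each response to each score exemplar.
sim = cosine_similarity(X, E)

# Meta-learner stacks both signals (Wolpert-style stacking).
meta_X = np.column_stack([oof_pred, sim])
meta = LogisticRegression(max_iter=1000).fit(meta_X, scores)
print(meta.predict(meta_X))
```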
Workshop on Innovative Use of NLP for Building Educational Applications | 2015
Torsten Zesch; Michael Heilman; Aoife Cahill
Automated short answer scoring is increasingly used to give students timely feedback about their learning progress. Building scoring models comes with high costs, as state-of-the-art methods using supervised learning require large amounts of hand-annotated data. We analyze the potential of recently proposed methods for semi-supervised learning based on clustering. We find that all examined methods (centroids, all clusters, selected pure clusters) are mainly effective for very short answers and do not generalize well to several-sentence responses.
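One way to picture the clustering-based approach, under simplifying assumptions: cluster all answers, obtain scores only for each cluster's representative (centroid-nearest) answer, and propagate those scores to the cluster members. This is an illustration with toy answers and hypothetical representative scores, not the exact procedures compared in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import euclidean_distances

answers = ["evaporation", "the sun heats the water", "water turns to vapor",
           "rain falls from clouds", "clouds make the rain", "i do not know"]

X = TfidfVectorizer().fit_transform(answers)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# For each cluster, pick the answer closest to the centroid; in practice only
# these representatives would need human scores.
dists = euclidean_distances(X, km.cluster_centers_)
reps = {}
for c in range(km.n_clusters):
    members = np.where(km.labels_ == c)[0]
    reps[c] = int(members[np.argmin(dists[members, c])])

# Hypothetical human scores for the representatives, propagated to every
# member of the corresponding cluster.
rep_scores = {c: s for c, s in zip(sorted(reps), [2, 1, 0])}
propagated = [rep_scores[c] for c in km.labels_]
print(list(zip(answers, propagated)))
```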