Casey Stuart Whitelaw
Publication
Featured research published by Casey Stuart Whitelaw.
empirical methods in natural language processing | 2009
Casey Stuart Whitelaw; Ben Hutchinson; Grace Chung; Gerard Ellis
We have designed, implemented and evaluated an end-to-end spellchecking and autocorrection system that does not require any manually annotated training data. The World Wide Web is used as a large noisy corpus from which we infer knowledge about misspellings and word usage. This is used to build an error model and an n-gram language model. A small secondary set of news texts with artificially inserted misspellings is used to tune confidence classifiers. Because no manual annotation is required, our system can easily be instantiated for new languages. When evaluated on human-typed data with real misspellings in English and German, our web-based systems outperform baselines that use candidate corrections based on hand-curated dictionaries. Our system achieves a 3.8% total error rate in English. We show similar improvements in preliminary results on artificial data for Russian and Arabic.
conference on information and knowledge management | 2008
Casey Stuart Whitelaw; Alexander Kehlenbeck; Nemanja Petrovic; Lyle H. Ungar
Automatic recognition of named entities such as people, places, organizations, books, and movies across the entire web presents a number of challenges, both of scale and scope. Data for training general named entity recognizers is difficult to come by, and efficient machine learning methods are required once we have found hundreds of millions of labeled observations. We present an implemented system that addresses these issues, including a method for automatically generating training data and a multi-class online classification training method that learns to recognize not only high-level categories such as place and person, but also more fine-grained categories such as soccer players, birds, and universities. The resulting system gives precision and recall comparable to those obtained for more limited entity types in much more structured domains, such as company recognition in newswire, even though web documents often lack consistent capitalization and grammatical sentence construction.
Archive | 2007
Kushal B. Dave; Casey Stuart Whitelaw; Alexis Battle
Archive | 2012
Paul Nordstrom; Casey Stuart Whitelaw
Archive | 2012
Paul Nordstrom; Casey Stuart Whitelaw
Archive | 2015
Paul Nordstrom; Casey Stuart Whitelaw
Archive | 2012
Casey Stuart Whitelaw; Arnaud Claude Weber; Paul Nordstrom
Archive | 2012
Paul Nordstrom; Casey Stuart Whitelaw
Archive | 2012
Paul Nordstrom; Casey Stuart Whitelaw
Archive | 2012
Paul Nordstrom; Casey Stuart Whitelaw