Joshua T. Goodman
Microsoft
Publication
Featured research published by Joshua T. Goodman.
Computer Speech & Language | 1999
Stanley F. Chen; Joshua T. Goodman
We survey the most widely used algorithms for smoothing n-gram language models. We then present an extensive empirical comparison of several of these smoothing techniques, including those described by Jelinek and Mercer (1980), Katz (1987), Bell, Cleary and Witten (1990), Ney, Essen and Kneser (1994), and Kneser and Ney (1995). We investigate how factors such as training data size, training corpus (e.g., Brown vs. Wall Street Journal), count cutoffs, and n-gram order (bigram vs. trigram) affect the relative performance of these methods, which is measured through the cross-entropy of test data. We find that these factors can significantly affect the relative performance of models, with the most significant factor being training data size. Since no previous comparisons have examined these factors systematically, this is the first thorough characterization of the relative performance of various algorithms. In addition, we introduce methodologies for analyzing smoothing algorithm efficacy in detail, and using these techniques we motivate a novel variation of Kneser-Ney smoothing that consistently outperforms all other algorithms evaluated. Finally, we present results showing that improved language model smoothing leads to improved speech recognition performance.
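The modified smoothing method itself is not spelled out in the abstract. For background, a commonly cited textbook form of interpolated Kneser-Ney for bigrams (not necessarily the exact variant the paper introduces) discounts each bigram count by D and backs off to a continuation probability:

\[
P_{\mathrm{KN}}(w_i \mid w_{i-1}) = \frac{\max\bigl(c(w_{i-1} w_i) - D,\, 0\bigr)}{c(w_{i-1})} + \lambda(w_{i-1})\, P_{\mathrm{cont}}(w_i),
\]
\[
\lambda(w_{i-1}) = \frac{D}{c(w_{i-1})} \bigl|\{ w : c(w_{i-1} w) > 0 \}\bigr|, \qquad
P_{\mathrm{cont}}(w_i) = \frac{\bigl|\{ w' : c(w' w_i) > 0 \}\bigr|}{\bigl|\{ (w', w) : c(w' w) > 0 \}\bigr|},
\]
where c(·) is a training-set count and D is the discount.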
Meeting of the Association for Computational Linguistics | 1996
Stanley F. Chen; Joshua T. Goodman
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
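For reference, the Jelinek-Mercer baseline that the first new technique varies recursively interpolates the maximum-likelihood n-gram estimate with the smoothed lower-order estimate; a standard statement (not the paper's new variant) is:

\[
P_{\mathrm{interp}}(w_i \mid w_{i-n+1}^{i-1}) = \lambda_{w_{i-n+1}^{i-1}}\, P_{\mathrm{ML}}(w_i \mid w_{i-n+1}^{i-1}) + \bigl(1 - \lambda_{w_{i-n+1}^{i-1}}\bigr)\, P_{\mathrm{interp}}(w_i \mid w_{i-n+2}^{i-1}),
\]
with the interpolation weights \(\lambda\) typically estimated on held-out data.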
Computer Speech & Language | 2001
Joshua T. Goodman
In the past several years, a number of different language modeling improvements over simple trigram models have been found, including caching, higher-order n-grams, skipping, interpolated Kneser-Ney smoothing, and clustering. We present explorations of variations on, or of the limits of, each of these techniques, including showing that sentence mixture models may have more potential. While all of these techniques have been studied separately, they have rarely been studied in combination. We compare a combination of all techniques together to a Katz smoothed trigram model with no count cutoffs. We achieve perplexity reductions between 38% and 50% (1 bit of entropy), depending on training data size, as well as a word error rate reduction of 8.9%. Our perplexity reductions are perhaps the highest reported compared to a fair baseline.
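To make the caching component concrete, here is a minimal Python sketch that interpolates a base trigram model with a unigram cache of recently seen words. The class name, the `prob`/`observe` interface, the cache size, and the fixed cache weight are illustrative assumptions, not the paper's implementation (which tunes such parameters and combines many more components).

```python
from collections import Counter, deque

class CachedTrigramLM:
    """Toy interpolation of a base trigram model with a unigram cache
    of recently seen words (illustrative sketch only)."""

    def __init__(self, base_lm, cache_size=200, cache_weight=0.1):
        self.base_lm = base_lm            # assumed to expose prob(word, context)
        self.cache = deque(maxlen=cache_size)
        self.cache_counts = Counter()
        self.cache_weight = cache_weight  # weight given to the cache component

    def prob(self, word, context):
        p_base = self.base_lm.prob(word, context)
        total = len(self.cache)
        p_cache = self.cache_counts[word] / total if total else 0.0
        return (1 - self.cache_weight) * p_base + self.cache_weight * p_cache

    def observe(self, word):
        # Keep unigram counts in sync with the sliding window of recent words.
        if len(self.cache) == self.cache.maxlen:
            self.cache_counts[self.cache[0]] -= 1   # about to be evicted
        self.cache.append(word)
        self.cache_counts[word] += 1
```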
ACM Transactions on Asian Language Information Processing | 2002
Jianfeng Gao; Joshua T. Goodman; Mingjing Li; Kai-Fu Lee
This article presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigram language models to Chinese is challenging because (1) there is no standard definition of words in Chinese; (2) word boundaries are not marked by spaces; and (3) there is a dearth of training data. Our unified approach automatically and consistently gathers a high-quality training data set from the Web, creates a high-quality lexicon, segments the training data using this lexicon, and compresses the language model, all by using the maximum likelihood principle, which is consistent with trigram model training. We show that each of the methods leads to improvements over standard SLM, and that the combined method yields the best pinyin conversion result reported.
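To illustrate maximum-likelihood segmentation in isolation, the sketch below chooses the word sequence with the highest total unigram log-probability by dynamic programming. The `segment` function, the toy lexicon, and its probabilities are hypothetical and far simpler than the paper's lexicon construction and trigram-consistent training.

```python
import math

def segment(sentence, word_logprob, max_word_len=4):
    """Pick the segmentation maximizing the sum of word log-probabilities
    (Viterbi-style dynamic programming; illustrative sketch)."""
    n = len(sentence)
    best = [float("-inf")] * (n + 1)   # best[i] = best score for the first i characters
    back = [0] * (n + 1)               # back[i] = start index of the last word
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            word = sentence[j:i]
            if word in word_logprob and best[j] + word_logprob[word] > best[i]:
                best[i] = best[j] + word_logprob[word]
                back[i] = j
    words, i = [], n
    while i > 0:                       # recover the segmentation from back-pointers
        words.append(sentence[back[i]:i])
        i = back[i]
    return list(reversed(words))

# Hypothetical toy lexicon with log-probabilities.
lexicon = {"北京": math.log(0.01), "大学": math.log(0.02),
           "北": math.log(0.001), "京": math.log(0.001),
           "大": math.log(0.002), "学": math.log(0.002)}
print(segment("北京大学", lexicon))   # -> ['北京', '大学']
```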
International Conference on Acoustics, Speech, and Signal Processing | 2001
Joshua T. Goodman
Maximum entropy models are considered by many to be one of the most promising avenues of language modeling research. Unfortunately, long training times make maximum entropy research difficult. We present a speedup technique: we change the form of the model to use classes. Our speedup works by creating two maximum entropy models, the first of which predicts the class of each word, and the second of which predicts the word itself. This factoring of the model leads to fewer nonzero indicator functions and faster normalization, achieving speedups of up to a factor of 35 over one of the best previous techniques. It also typically results in slightly lower perplexities. The same trick can be used to speed up training of other machine learning techniques, e.g., neural networks, applied to any problem with a large number of outputs, such as language modeling.
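The factoring the abstract describes is the standard class-based decomposition: with each word assigned to a single class, the conditional probability splits as

\[
P(w \mid h) = P\bigl(\mathrm{class}(w) \mid h\bigr) \times P\bigl(w \mid \mathrm{class}(w),\, h\bigr),
\]
so each of the two maximum entropy models normalizes over a much smaller set (the classes, or the words within one class) than the full vocabulary.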
Intelligent User Interfaces | 2002
Joshua T. Goodman; Gina Venolia; Keith Steury; Chauncey R. Parker
Language models predict the probability of letter sequences. Soft keyboards are images of keyboards on a touch screen, used for input on Personal Digital Assistants. When a soft keyboard user hits a point near the boundary between keys, the language model and a key press model are combined to select the most probable key sequence. This leads to an overall error rate reduction by a factor of 1.67 to 1.87. An extended version of this paper [4] is available.
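The combination described here follows the usual noisy-channel form: choose the key sequence that maximizes the language model probability times the key press (touch location) model probability. In symbols (a standard formulation; the paper's exact models and notation may differ):

\[
\hat{k}_1^n = \arg\max_{k_1^n}\; P(k_1^n)\; \prod_{i=1}^{n} P(x_i \mid k_i),
\]
where \(x_i\) is the observed touch point for the \(i\)-th key press and \(k_i\) the intended key.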
Meeting of the Association for Computational Linguistics | 2002
Joshua T. Goodman
We describe a speedup for training conditional maximum entropy models. The algorithm is a simple variation on Generalized Iterative Scaling, but converges roughly an order of magnitude faster, depending on the number of constraints, and the way speed is measured. Rather than attempting to train all model parameters simultaneously, the algorithm trains them sequentially. The algorithm is easy to implement, typically uses only slightly more memory, and will lead to improvements for most maximum entropy problems.
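The abstract gives the idea but not the update rule. The sketch below illustrates the sequential, one-feature-at-a-time style of training for a conditional maximum entropy model with binary features; the data layout, function name, lack of smoothing or regularization, and the skipping of zero-count features are simplifying assumptions of this sketch, not a faithful reproduction of the paper's algorithm.

```python
import math
from collections import defaultdict

def train_sequential_maxent(instances, num_features, labels, iterations=20):
    """Sketch of sequential (one-feature-at-a-time) training of a conditional
    maximum entropy model with binary features.  `instances` is a list of
    (active, true_label) pairs, where active[y] is the set of feature ids
    firing for (x, y) and every y comes from `labels`."""
    lam = [0.0] * num_features

    # Cache the unnormalized scores s[n][y] = exp(sum of firing lambdas) and
    # the per-instance normalizers z[n], so changing one lambda only touches
    # the (instance, label) pairs where that feature fires.
    s = [{y: 1.0 for y in labels} for _ in instances]
    z = [float(len(labels))] * len(instances)

    observed = [0.0] * num_features
    fires = defaultdict(list)          # feature id -> list of (instance idx, label)
    for n, (active, true_y) in enumerate(instances):
        for y, feats in active.items():
            for j in feats:
                fires[j].append((n, y))
        for j in active.get(true_y, ()):
            observed[j] += 1.0

    for _ in range(iterations):
        for j in range(num_features):
            if observed[j] == 0.0:
                continue               # crude handling of unseen features
            expected = sum(s[n][y] / z[n] for n, y in fires[j])
            if expected == 0.0:
                continue
            delta = math.log(observed[j] / expected)   # step for a binary feature
            lam[j] += delta
            for n, y in fires[j]:      # incremental update of the cached scores
                z[n] -= s[n][y]
                s[n][y] *= math.exp(delta)
                z[n] += s[n][y]
    return lam
```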
Electronic Commerce | 2004
Joshua T. Goodman; Robert L. Rounthwaite
We analyze the problem of preventing outgoing spam. We show that some conventional techniques for limiting outgoing spam are likely to be ineffective. We show that while imposing per-message costs would work, less annoying techniques also work. In particular, it is only necessary that the average cost to the spammer over the lifetime of an account exceed his profits, meaning that not every message need be challenged. We develop three techniques, one based on additional HIP (human interactive proof) challenges, one based on computational challenges, and one based on paid subscriptions. Each system is designed to impose minimal costs on legitimate users, while being too costly for spammers. We also show that maximizing complaint rates is a key factor, and suggest new standards to encourage high complaint rates.
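To make the break-even argument concrete with illustrative numbers (not taken from the paper): if a challenge costs the spammer c to answer and each outgoing message is challenged independently with probability p, the expected cost per message is p·c, so spamming from the account becomes unprofitable as soon as

\[
p \cdot c > r \quad\Longleftrightarrow\quad p > \frac{r}{c},
\]
where r is the spammer's profit per message. For example, with hypothetical values r = \$0.01 and c = \$0.05, challenging only one message in five (p > 0.2) already wipes out the profit, while a legitimate low-volume sender rarely sees a challenge.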
International Conference on Acoustics, Speech, and Signal Processing | 2001
Xuedong Huang; Alex Acero; Ciprian Chelba; Li Deng; Jasha Droppo; Doug Duchene; Joshua T. Goodman; Hsiao-Wuen Hon; Derek Jacoby; Li Jiang; Ricky Loynd; Milind Mahajan; Peter Mau; Scott Meredith; Salman Mughal; Salvado Neto; Mike Plumpe; Keith Steury; Gina Venolia; Kuansan Wang; Ye-Yi Wang
Dr. Who is a Microsoft research project aiming at creating a speech-centric multimodal interaction framework, which serves as the foundation for the .NET natural user interface. MiPad is the application prototype that demonstrates compelling user advantages for wireless personal digital assistant (PDA) devices. MiPad fully integrates continuous speech recognition (CSR) and spoken language understanding (SLU) to enable users to accomplish many common tasks using a multimodal interface and wireless technologies. It tries to solve the problem of pecking with tiny styluses or typing on minuscule keyboards in today's PDAs. Unlike a cellular phone, MiPad avoids speech-only interaction. It incorporates a built-in microphone that activates whenever a field is selected. As a user taps the screen or uses a built-in roller to navigate, the tapping action narrows the number of possible instructions for spoken language understanding. MiPad currently runs on a Windows CE Pocket PC together with a Windows 2000 machine where speech recognition is performed. The Dr. Who CSR engine uses a unified CFG and n-gram language model. The Dr. Who SLU engine is based on a robust chart parser and a plan-based dialog manager. The paper discusses MiPad's design, implementation work in progress, and a preliminary user study in comparison to the existing pen-based PDA interface.
International World Wide Web Conferences | 2004
Geoff Hulten; Joshua T. Goodman; Robert L. Rounthwaite
In this paper we analyze a very large junk e-mail corpus generated by a hundred thousand volunteer users of the Hotmail e-mail service. We describe how the corpus is being collected, and analyze the geographic origins of the e-mail, who the e-mail is targeting, and what the e-mail is selling.