
Publication


Featured research published by Kenneth Ward Church.


International Conference on Acoustics, Speech, and Signal Processing | 1989

A stochastic parts program and noun phrase parser for unrestricted text

Kenneth Ward Church

A program that tags each word in an input sentence with the most likely part of speech has been written. The program uses a linear-time dynamic programming algorithm to find an assignment of parts of speech to words that optimizes the product of (a) lexical probabilities (probability of observing part of speech i given word i) and (b) contextual probabilities (probability of observing part of speech i given n following parts of speech). Program performance is encouraging; a 400-word sample is presented and is judged to be 99.5% correct.
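To make the optimization concrete, here is a minimal Viterbi-style sketch of this kind of dynamic programming tagger. For simplicity it conditions each tag on the preceding tag (the paper conditions on following parts of speech), and the probability tables are invented toy values, not the program's trained estimates.

```python
# Toy Viterbi sketch: choose the tag sequence maximizing the product of
# lexical probabilities Pr(word | tag) and contextual probabilities
# Pr(tag | previous tag). All numbers below are illustrative only.

import math

LEX = {  # Pr(word | tag), toy values
    ("the", "DET"): 0.9,
    ("dog", "NOUN"): 0.4, ("dog", "VERB"): 0.01,
    ("barks", "VERB"): 0.3, ("barks", "NOUN"): 0.02,
}
CTX = {  # Pr(tag | previous tag); "^" marks the sentence start
    ("^", "DET"): 0.5, ("^", "NOUN"): 0.2, ("^", "VERB"): 0.2,
    ("DET", "NOUN"): 0.7, ("DET", "VERB"): 0.01,
    ("NOUN", "VERB"): 0.4, ("NOUN", "NOUN"): 0.2,
    ("VERB", "NOUN"): 0.2, ("VERB", "VERB"): 0.05,
}
TAGS = ["DET", "NOUN", "VERB"]

def viterbi(words):
    # best[tag] = (log-prob of the best path ending in tag, that path)
    best = {"^": (0.0, [])}
    for w in words:
        nxt = {}
        for tag in TAGS:
            lex = LEX.get((w, tag), 1e-6)   # unseen pairs get a tiny floor
            score, path = max(
                ((s + math.log(CTX.get((prev, tag), 1e-6) * lex), p + [tag])
                 for prev, (s, p) in best.items()),
                key=lambda t: t[0])
            nxt[tag] = (score, path)
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["the", "dog", "barks"]))   # -> ['DET', 'NOUN', 'VERB']
```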


Computers and The Humanities | 1992

A method for disambiguating word senses in a large corpus

William A. Gale; Kenneth Ward Church; David Yarowsky

Word sense disambiguation has been recognized as a major problem in natural language processing research for over forty years. Both quantitative and qualitative methods have been tried, but much of this work has been stymied by difficulties in acquiring appropriate lexical resources. The availability of suitable testing and training material has enabled us to develop quantitative disambiguation methods that achieve 92% accuracy in discriminating between two very distinct senses of a noun. In the training phase, we collect a number of instances of each sense of the polysemous noun. Then, in the testing phase, we are given a new instance of the noun and are asked to assign it to one of the senses. We attempt to answer this question by comparing the context of the unknown instance with the contexts of known instances, using a Bayesian argument that has been applied successfully in related tasks such as author identification and information retrieval. The proposed method is probably most appropriate for those aspects of sense disambiguation that are closest to the information retrieval task. In particular, it was designed to disambiguate senses that are usually associated with different topics.
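A minimal sketch of this style of Bayesian sense discrimination, assuming a naive independence model over context words; the training contexts and senses below are invented for illustration, not the paper's data.

```python
# Toy sketch: score each sense by the (log) probability of the context
# words given the sense, estimated from counts over known training
# instances, and pick the best-scoring sense. Data is invented.

import math
from collections import Counter

train = {  # sense -> contexts of known instances (hypothetical)
    "financial": ["the bank raised interest rates on deposits",
                  "loan officers at the bank approved the mortgage"],
    "river":     ["fishermen lined the bank of the muddy river",
                  "the boat drifted toward the grassy bank"],
}

counts = {s: Counter(w for ctx in texts for w in ctx.split())
          for s, texts in train.items()}
totals = {s: sum(c.values()) for s, c in counts.items()}
vocab = len(set(w for c in counts.values() for w in c))

def score(sense, context):
    # uniform prior over senses; add-one smoothing for unseen words
    return sum(math.log((counts[sense][w] + 1) / (totals[sense] + vocab))
               for w in context.split())

test = "the river overflowed its bank after heavy rain"
print(max(train, key=lambda s: score(s, test)))   # -> 'river'
```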


Meeting of the Association for Computational Linguistics | 1989

Word Association Norms, Mutual Information, and Lexicography

Kenneth Ward Church; Patrick Hanks

The term word association is used in a very particular sense in the psycholinguistic literature. (Generally speaking, subjects respond more quickly than normal to the word nurse if it follows a highly associated word such as doctor.) We will extend the term to provide the basis for a statistical description of a variety of interesting linguistic phenomena, ranging from semantic relations of the doctor/nurse type (content word/content word) to lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word). This paper proposes an objective measure, based on the information-theoretic notion of mutual information, for estimating word association norms from computer-readable corpora. (The standard method of obtaining word association norms, testing a few thousand subjects on a few hundred words, is both costly and unreliable.) The proposed measure, the association ratio, estimates word association norms directly from computer-readable corpora, making it possible to estimate norms for tens of thousands of words.
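As a concrete reading of the proposal: the association ratio compares the observed joint probability of two words against what independence would predict, I(x, y) = log2(P(x, y) / (P(x) P(y))). A small sketch with invented counts:

```python
# Sketch of the association ratio,
#   I(x, y) = log2( P(x, y) / (P(x) * P(y)) ),
# with probabilities estimated from corpus counts. Counts are invented.

import math

N = 1_000_000          # corpus size in words (hypothetical)
count_x = {"doctor": 120, "strong": 900}
count_y = {"nurse": 80, "tea": 700}
count_xy = {("doctor", "nurse"): 25, ("strong", "tea"): 30}

def association_ratio(x, y):
    p_x = count_x[x] / N
    p_y = count_y[y] / N
    p_xy = count_xy[(x, y)] / N
    return math.log2(p_xy / (p_x * p_y))

for pair in count_xy:
    print(pair, round(association_ratio(*pair), 2))
# doctor/nurse co-occur far more often than chance predicts,
# so the ratio is strongly positive.
```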


Knowledge Discovery and Data Mining | 2006

Very sparse random projections

Ping Li; Trevor Hastie; Kenneth Ward Church

There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ R^{n×D} be our n points in D dimensions. The method multiplies A by a random matrix R ∈ R^{D×k}, reducing the D dimensions down to just k to speed up the computation. R typically consists of entries drawn from the standard normal N(0, 1). It is well known that random projections preserve pairwise distances (in expectation). Achlioptas proposed sparse random projections, replacing the N(0, 1) entries in R with entries in {−1, 0, 1} with probabilities {1/6, 2/3, 1/6}, achieving a threefold speedup in processing time. We recommend using R with entries in {−1, 0, 1} with probabilities {1/(2√D), 1 − 1/√D, 1/(2√D)}, achieving a significant √D-fold speedup with little loss in accuracy.
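A minimal NumPy sketch of the recommended construction, under the stated probabilities; the shapes, seed, and data are arbitrary examples, not the paper's experiments.

```python
# Very sparse random projection sketch: entries of R are drawn from
# {-1, 0, +1} with probabilities {1/(2*sqrt(D)), 1 - 1/sqrt(D),
# 1/(2*sqrt(D))}, then rescaled so each entry has unit variance.
# Pairwise distances of B = A @ R / sqrt(k) approximate those of A.

import numpy as np

rng = np.random.default_rng(0)
n, D, k = 100, 10_000, 50                # arbitrary example sizes

A = rng.standard_normal((n, D))          # n points in D dimensions
s = np.sqrt(D)                           # sparsity parameter sqrt(D)
R = rng.choice([-1.0, 0.0, 1.0], size=(D, k),
               p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])
R *= np.sqrt(s)                          # rescale to unit variance
B = A @ R / np.sqrt(k)                   # projected points, n x k

# Compare one pairwise distance before and after projection.
orig = np.linalg.norm(A[0] - A[1])
proj = np.linalg.norm(B[0] - B[1])
print(orig, proj)                        # should be close
```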


Conference on Information and Knowledge Management | 2008

Query suggestion using hitting time

Qiaozhu Mei; Dengyong Zhou; Kenneth Ward Church

Generating alternative queries, also known as query suggestion, has long proved useful in helping users explore and express their information needs. In many scenarios, such suggestions can be generated from a large-scale graph of queries and other auxiliary information, such as clickthrough data. However, how to generate suggestions while ensuring their semantic consistency with the original query remains a challenging problem. In this work, we propose a novel query suggestion algorithm based on ranking queries by their hitting time on a large-scale bipartite graph. Without resorting to convoluted heuristics or heavy parameter tuning, this method cleanly captures the semantic consistency between the suggested query and the original query. Empirical experiments on a large-scale query log from a commercial search engine and on a scientific literature collection show that hitting time is effective for generating semantically consistent query suggestions. The proposed algorithm and its variations can successfully boost long-tail queries, accommodate personalized query suggestion, and find related authors in research.
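A toy sketch of ranking by hitting time. The paper computes (truncated) hitting times on a large bipartite query-URL click graph; for brevity, this sketch runs the same style of fixed-point iteration on a tiny invented query-to-query transition matrix.

```python
# Rank candidate queries by the expected number of random-walk steps
# needed to first reach the original query: smaller hitting time means
# a more closely related suggestion. Graph and weights are invented.

P = {  # random-walk transition probabilities between queries
    "cheap flights":   {"airline tickets": 0.6, "hotel deals": 0.4},
    "airline tickets": {"cheap flights": 0.7, "hotel deals": 0.3},
    "hotel deals":     {"cheap flights": 0.3, "airline tickets": 0.2,
                        "car rental": 0.5},
    "car rental":      {"hotel deals": 1.0},
}

def hitting_times(target, iters=100):
    # Solve h(i) = 1 + sum_j P(i, j) * h(j) with h(target) = 0,
    # by simple fixed-point iteration.
    h = {q: 0.0 for q in P}
    for _ in range(iters):
        for q in P:
            if q != target:
                h[q] = 1.0 + sum(p * h[j] for j, p in P[q].items())
    return h

h = hitting_times("cheap flights")
for q in sorted(h, key=h.get):     # best suggestions first
    print(q, round(h[q], 2))
```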


Human Language Technology | 1991

Identifying word correspondence in parallel texts

William A. Gale; Kenneth Ward Church

Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying parallel texts (also known as bilingual corpora), bodies of text such as the Canadian Hansards (parliamentary debates) which are available in multiple languages (such as French and English). Much of the current excitement surrounding parallel texts was initiated by Brown et al. (1990), who outline a self-organizing method for using these parallel texts to build a machine translation system.


Computer Speech & Language | 1991

A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams

Kenneth Ward Church; William A. Gale

In principle, n-gram probabilities can be estimated from a large sample of text by counting the number of occurrences of each n-gram of interest and dividing by the size of the training sample. This method, known as the maximum likelihood estimator (MLE), is very simple. However, it is unsuitable because n-grams which do not occur in the training sample are assigned zero probability. This is qualitatively wrong for use as a prior model, because it would rule out such n-grams entirely, while clearly some of the unseen n-grams will occur in other texts. For non-zero frequencies, the MLE is quantitatively wrong. Moreover, at all frequencies, the MLE does not separate bigrams with the same frequency. We study two alternative methods. The first is an enhanced version of the method due to Good and Turing (I. J. Good [1953]. Biometrika, 40, 237–264). Under the modest assumption that the distribution of each bigram is binomial, Good provided a theoretical result that increases estimation accuracy. The second is an enhanced version of the deleted estimation method (F. Jelinek & R. Mercer [1985]. IBM Technical Disclosure Bulletin, 28, 2591–2594). It assumes even less, merely that the training and test corpora are generated by the same process. We emphasize three points about these methods. First, by using a second predictor of the probability in addition to the observed frequency, it is possible to estimate different probabilities for bigrams with the same frequency. We refer to this use of a second predictor as "enhancement." With enhancement, we find 1,200 significantly different probabilities (spanning five orders of magnitude) for the group of bigrams not observed in the training text; the MLE would not be able to distinguish any one of these bigrams from any other. The probabilities found by the enhanced methods agree quite closely, in qualitative comparisons, with the standard calculated from the test corpus. Second, the enhanced Good-Turing method provides accurate predictions of the variances of the standard probabilities estimated from the test corpus. Third, we introduce a refined testing method that enables us to measure prediction errors directly and accurately, and thus to study small differences between methods. We find that while the errors of both methods are small, owing to the large amount of data we use, the enhanced Good-Turing method is three to four times as efficient in its use of data as the enhanced deleted estimation method, and is therefore preferable. Both methods are much better than the MLE.
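For orientation, here is a minimal sketch of the basic (unenhanced) Good-Turing adjustment that underlies the first method; the counts are invented, and the paper's enhancement via a second predictor is not shown.

```python
# Basic Good-Turing sketch: an n-gram seen r times gets the adjusted
# frequency r* = (r + 1) * N[r+1] / N[r], where N[r] is the number of
# distinct n-grams seen exactly r times. Unseen n-grams collectively
# receive probability mass N[1] / total. Toy data only.

from collections import Counter

def good_turing(counts):
    # counts: mapping from n-gram -> observed frequency r
    N = Counter(counts.values())        # counts of counts: N[r]
    total = sum(counts.values())
    def r_star(r):
        # fall back to the raw count when N[r+1] is zero (sparse tail)
        return (r + 1) * N[r + 1] / N[r] if N[r + 1] else r
    probs = {g: r_star(r) / total for g, r in counts.items()}
    return probs, N[1] / total

bigrams = Counter(["of the", "of the", "in the", "on a", "at a"])
probs, p0 = good_turing(bigrams)
print(probs)                  # adjusted probabilities for seen bigrams
print("unseen mass:", p0)     # probability reserved for unseen bigrams
```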


Meeting of the Association for Computational Linguistics | 1993

Char_align: A Program for Aligning Parallel Texts at the Character Level

Kenneth Ward Church

There have been a number of recent papers on aligning parallel texts at the sentence level, e.g., Brown et al. (1991), Gale and Church (to appear), Isabelle (1992), Kay and Rosenschein (to appear), Simard et al. (1992), and Warwick-Armstrong and Russell (1990). On clean inputs, such as the Canadian Hansards, these methods have been very successful (at least 96% correct by sentence). Unfortunately, if the input is noisy (due to OCR and/or unknown markup conventions), then these methods tend to break down, because the noise can make it difficult to find paragraph boundaries, let alone sentences. This paper describes a new program, char_align, that aligns texts at the character level rather than at the sentence/paragraph level, based on the cognate approach proposed by Simard et al.


International Conference on Computational Linguistics | 1990

A spelling correction program based on a noisy channel model

Mark D. Kernighan; Kenneth Ward Church; William A. Gale

This paper describes a new program, correct, which takes words rejected by the Unix® spell program, proposes a list of candidate corrections, and sorts them by probability. The probability scores are the novel contribution of this work. Probabilities are based on a noisy channel model. It is assumed that the typist knows what words he or she wants to type, but some noise is added on the way to the keyboard (in the form of typos and spelling errors). Using a classic Bayesian argument of the kind that is popular in the speech recognition literature (Jelinek, 1985), one can often recover the intended correction, c, from a typo, t, by finding the correction c that maximizes Pr(c) Pr(t|c). The first factor, Pr(c), is a prior model of word probabilities; the second factor, Pr(t|c), is a model of the noisy channel that accounts for spelling transformations on letter sequences (e.g., insertions, deletions, substitutions, and reversals). Both sets of probabilities were trained on data collected from the Associated Press (AP) newswire. This text is ideally suited for the purpose, since it contains a large number of typos (about two thousand per month).
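A small sketch of the noisy-channel ranking described here; the candidate list, word frequencies, and channel probabilities are invented placeholders, not the AP-trained estimates.

```python
# Rank candidate corrections c for a typo t by Pr(c) * Pr(t | c).
# All numbers below are invented for illustration.

word_freq = {"across": 950, "acres": 120, "actress": 80}   # counts for Pr(c)
N = 1_000_000                                              # corpus size

# Pr(t | c): probability the channel turns intended word c into typo t,
# e.g. via a single insertion, deletion, substitution, or reversal.
channel = {
    ("acress", "across"):  0.0001,
    ("acress", "acres"):   0.0003,
    ("acress", "actress"): 0.0005,
}

def rank(typo, candidates):
    # score = prior * channel probability; sort best-first
    scored = [(word_freq[c] / N * channel[(typo, c)], c) for c in candidates]
    return sorted(scored, reverse=True)

for score, c in rank("acress", ["across", "acres", "actress"]):
    print(c, score)
```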


Meeting of the Association for Computational Linguistics | 1991

A Program for Aligning Sentences in Bilingual Corpora

William A. Gale; Kenneth Ward Church

Researchers in both machine translation (e.g., Brown et al. 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann 1990) have recently become interested in studying bilingual corpora, bodies of text such as the Canadian Hansards (parliamentary proceedings), which are available in multiple languages (such as French and English). One useful step is to align the sentences, that is, to identify correspondences between sentences in one language and sentences in the other language. This paper describes a method and a program (align) for aligning sentences based on a simple statistical model of character lengths. The program uses the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and that shorter sentences tend to be translated into shorter sentences. A probabilistic score is assigned to each proposed correspondence of sentences, based on the scaled difference of lengths of the two sentences (in characters) and the variance of this difference. This probabilistic score is used in a dynamic programming framework to find the maximum likelihood alignment of sentences. It is remarkable that such a simple approach works as well as it does. An evaluation was performed based on a trilingual corpus of economic reports issued by the Union Bank of Switzerland (UBS) in English, French, and German. The method correctly aligned all but 4% of the sentences. Moreover, it is possible to extract a large subcorpus with a much smaller error rate: by selecting the best-scoring 80% of the alignments, the error rate is reduced from 4% to 0.7%. There were more errors on the English-French subcorpus than on the English-German subcorpus, showing that error rates will depend on the corpus considered; however, both were small enough to suggest that the method will be useful for many language pairs. To further research on bilingual corpora, a much larger sample of the Canadian Hansards (approximately 90 million words, half in English and half in French) has been aligned with the align program and will be available through the Data Collection Initiative of the Association for Computational Linguistics (ACL/DCI). In addition, to facilitate replication, an appendix provides detailed C code for the more difficult core of the align program.
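A simplified sketch of the length-based dynamic programming idea, assuming a squared normalized length difference as the match cost and handling only 1-1 matches plus insertions/deletions; the paper's align program uses a proper probabilistic score and also handles 2-1, 1-2, and 2-2 matches.

```python
# Score a proposed 1-1 correspondence by how far the target length
# strays from what the source length predicts, then find the cheapest
# global alignment by dynamic programming. C, S2, and SKIP are rough
# illustrative constants, not the paper's fitted parameters.

import math

C, S2 = 1.0, 6.8          # expected length ratio and variance (assumed)
SKIP = 8.0                # flat penalty for leaving a sentence unmatched

def match_cost(l1, l2):
    # squared normalized length difference, a stand-in for -log Pr(match)
    delta = (l2 - l1 * C) / math.sqrt(max(l1, 1) * S2)
    return delta * delta

def align(src_lens, tgt_lens):
    n, m = len(src_lens), len(tgt_lens)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:   # 1-1 match
                cost[i][j] = min(cost[i][j],
                                 cost[i-1][j-1] + match_cost(src_lens[i-1],
                                                             tgt_lens[j-1]))
            if i:         # source sentence left unmatched (1-0)
                cost[i][j] = min(cost[i][j], cost[i-1][j] + SKIP)
            if j:         # target sentence left unmatched (0-1)
                cost[i][j] = min(cost[i][j], cost[i][j-1] + SKIP)
    return cost[n][m]

# sentence lengths in characters for two toy "parallel" texts
print(align([52, 71, 30], [55, 68, 33]))   # low cost: lengths line up
```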

Collaboration

Top co-authors of Kenneth Ward Church include:

David Yarowsky (Johns Hopkins University)
Kyoji Umemura (Toyohashi University of Technology)