Wei-Jing Zhu
IBM
Publication
Featured research published by Wei-Jing Zhu.
Meeting of the Association for Computational Linguistics | 2002
Kishore Papineni; Salim Roukos; Todd Ward; Wei-Jing Zhu
Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that cannot be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.
IEEE Transactions on Speech and Audio Processing | 2004
William Byrne; David S. Doermann; Martin Franz; Samuel Gustman; Jan Hajic; Douglas W. Oard; Michael Picheny; Josef Psutka; Bhuvana Ramabhadran; Dagobert Soergel; Todd Ward; Wei-Jing Zhu
Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories. The work leverages a massive manual annotation effort on 10 000 h of spontaneous speech to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators. ASR word error rates near 40% were achieved for both English and Czech for heavily accented, emotional and elderly spontaneous speech based on 65-84 h of transcribed speech. Topical segmentation based on shifts in the recognized English vocabulary resulted in 80% agreement with manually annotated boundary positions at a 0.35 false alarm rate. Categorization was considerably more challenging, with a nearest-neighbor technique yielding F=0.3. This is less than half the value obtained by the same technique on a standard newswire categorization benchmark, but replication on human-transcribed interviews showed that ASR errors explain little of that difference. The paper concludes with a description of how these capabilities could be used together to search large collections of recorded oral histories.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001
Martin Franz; Todd Ward; J. Scott McCarley; Wei-Jing Zhu
We investigate important differences between two styles of document clustering in the context of Topic Detection and Tracking. Converting a Topic Detection system into a Topic Tracking system exposes fundamental differences between these two tasks that are important to consider in both the design and the evaluation of TDT systems. We also identify features that can be used in systems for both tasks.
North American Chapter of the Association for Computational Linguistics | 2001
Abraham Ittycheriah; Martin Franz; Wei-Jing Zhu; Adwait Ratnaparkhi; Richard J. Mammone
We describe in detail a statistical question answering system developed for TREC-9. The system applies maximum entropy classification to question/answer type prediction and named entity marking. Our information retrieval component first retrieves documents from a local encyclopedia, then expands the query words, and finally retrieves passages from the TREC collection. We also discuss the answer selection algorithm, which determines the best sentence given both the question and the occurrence of a phrase belonging to the answer class desired by the question. Finally, we present a new method of analyzing system performance via a transition matrix.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001
Martin Franz; J. Scott McCarley; Todd Ward; Wei-Jing Zhu
Our English-Chinese cross-language IR system is trained from parallel corpora; we investigate its performance as a function of training corpus size for three different training corpora. We find that the performance of the system as trained on the three parallel corpora can be related by a simple measure, namely the out-of-vocabulary rate of query words.
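The out-of-vocabulary rate mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition used, so the code below assumes the simplest reading: the fraction of query word tokens that never appear in the vocabulary of the training corpus. The function name `oov_rate` and the toy data are hypothetical and are not taken from the paper.

```python
def oov_rate(query_words, training_vocabulary):
    """Fraction of query word tokens absent from the training vocabulary.

    Assumption: token-level rate; the paper may define it differently.
    """
    if not query_words:
        return 0.0
    missing = sum(1 for word in query_words if word not in training_vocabulary)
    return missing / len(query_words)


# Hypothetical toy data for illustration only.
vocabulary = {"trade", "talks", "market", "bank"}
queries = ["trade", "talks", "tariff", "bank"]

print(oov_rate(queries, vocabulary))
```

Under this assumption, a larger parallel corpus would typically cover more query words and so yield a lower rate, which is what makes it a candidate for comparing corpora of different sizes.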
Archive | 2002
Satya Dharanipragada; Martin Franz; Jeffrey Scott McCarley; Todd Ward; Wei-Jing Zhu
IBM’s story segmentation uses a combination of decision tree and maximum entropy models. They take a variety of lexical, prosodic, semantic, and structural features as their inputs. Both types of models are source-specific, and we substantially lower C_seg by combining them. IBM’s topic detection system introduces a minimal hierarchy into the clustering: each cluster is comprised of one or more microclusters. We investigate the importance of merging microclusters together, and propose a merging strategy which improves our performance.
Text Retrieval Conference | 2000
Abraham Ittycheriah; Martin Franz; Wei-Jing Zhu; Adwait Ratnaparkhi; Richard J. Mammone
Topic Detection and Tracking | 2002
Satya Dharanipragada; Martin Franz; Jeffrey Scott McCarley; Todd Ward; Wei-Jing Zhu
Conference of the International Speech Communication Association | 2000
Satya Dharanipragada; Martin Franz; J. Scott McCarley; Kishore Papineni; Salim Roukos; Todd Ward; Wei-Jing Zhu
Conference of the International Speech Communication Association | 2001
Martin Franz; J. Scott McCarley; Todd Ward; Wei-Jing Zhu