Kai-Fu Lee
Microsoft
Publications
Featured research published by Kai-Fu Lee.
ACM Transactions on Asian Language Information Processing | 2002
Jianfeng Gao; Joshua T. Goodman; Mingjing Li; Kai-Fu Lee
This article presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigram language models to Chinese is challenging because (1) there is no standard definition of words in Chinese; (2) word boundaries are not marked by spaces; and (3) there is a dearth of training data. Our unified approach automatically and consistently gathers a high-quality training data set from the Web, creates a high-quality lexicon, segments the training data using this lexicon, and compresses the language model, all by using the maximum likelihood principle, which is consistent with trigram model training. We show that each of the methods leads to improvements over standard SLM, and that the combined method yields the best pinyin conversion result reported.
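The maximum-likelihood segmentation step can be illustrated with a small dynamic program that picks the split of a character string whose words have the highest joint probability under the lexicon. This is a minimal sketch, not the authors' implementation. It assumes a unigram word model rather than the trigram model the article uses, and the lexicon `LEXICON`, its probabilities, and the function `segment` are all hypothetical.

```python
import math

# Hypothetical toy lexicon: word -> unigram probability (illustrative values only).
LEXICON = {
    "研究": 0.02,
    "研究生": 0.005,
    "生命": 0.01,
    "命": 0.001,
    "研": 0.0005,
    "究": 0.0002,
    "生": 0.003,
    "起源": 0.008,
    "的": 0.05,
}


def segment(text, lexicon=LEXICON, max_len=4):
    """Return the segmentation of `text` that maximizes the product of
    unigram word probabilities (a simplification of trigram-based ML segmentation)."""
    n = len(text)
    # best[i] = (log-probability, backpointer) of the best segmentation of text[:i]
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in lexicon and best[start][0] > -math.inf:
                score = best[start][0] + math.log(lexicon[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        raise ValueError("text cannot be segmented with this lexicon")
    # Follow backpointers from the end of the string to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return list(reversed(words))
```

With these toy probabilities, `segment("研究生命的起源")` prefers 研究/生命/的/起源 over 研究生/命/的/起源, because the former has the higher product of word probabilities. Working in log space avoids underflow on long sentences.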
Meeting of the Association for Computational Linguistics | 2000
Zheng Chen; Kai-Fu Lee
Chinese input is one of the key challenges for Chinese PC users. This paper proposes a statistical approach to Pinyin-based Chinese input. This approach uses a trigram-based language model and statistically based segmentation. To handle real input, it also includes a typing model, which enables spelling correction in sentence-based Pinyin input, and a spelling model for English, which enables modeless Pinyin input.
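Sentence-based Pinyin conversion with an n-gram language model amounts to searching for the most probable character sequence given the typed syllables. The sketch below is illustrative only: it assumes a character bigram model (the paper uses a trigram model), a hypothetical candidate table `CANDIDATES`, made-up probabilities in `BIGRAM`, and a fixed `FLOOR` in place of real smoothing. The typing and English spelling models described above are omitted.

```python
import math

# Hypothetical syllable -> candidate characters table (tiny illustrative subset).
CANDIDATES = {
    "zhong": ["中", "种", "重"],
    "guo": ["国", "过", "果"],
    "ren": ["人", "认"],
}

# Hypothetical character bigram probabilities P(current | previous); "<s>" marks sentence start.
BIGRAM = {
    ("<s>", "中"): 0.05, ("<s>", "种"): 0.01, ("<s>", "重"): 0.02,
    ("中", "国"): 0.30, ("种", "果"): 0.05, ("重", "过"): 0.01,
    ("国", "人"): 0.20, ("过", "人"): 0.02, ("果", "认"): 0.01,
}
FLOOR = 1e-6  # stand-in for a smoothed/backoff probability of unseen pairs


def convert(syllables):
    """Viterbi search for the most probable character sequence for a list of Pinyin syllables."""
    # Each state maps the last character to (log-probability, best path ending in it).
    states = {"<s>": (0.0, [])}
    for syl in syllables:
        new_states = {}
        for char in CANDIDATES[syl]:
            # Keep only the best-scoring predecessor for this candidate character.
            new_states[char] = max(
                (score + math.log(BIGRAM.get((prev, char), FLOOR)), path + [char])
                for prev, (score, path) in states.items()
            )
        states = new_states
    _, path = max(states.values())
    return "".join(path)
```

For example, `convert(["zhong", "guo", "ren"])` returns "中国人" under these toy numbers. Keeping one best path per candidate character is what makes the search linear in sentence length rather than exponential.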
Meeting of the Association for Computational Linguistics | 2000
Jianfeng Gao; Kai-Fu Lee
We propose distribution-based pruning of n-gram backoff language models. Instead of the conventional approach of pruning n-grams that are infrequent in the training data, we prune n-grams that are likely to be infrequent in a new document. Our method is based on the n-gram distribution, i.e., the probability that an n-gram occurs in a new document. Experimental results show that our method performed 7–9% better than conventional cutoff methods in terms of word perplexity reduction.
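The contrast between a count cutoff and distribution-based selection can be sketched as follows. As an assumption, the probability that an n-gram occurs in a new document is estimated here simply as the fraction of training documents that contain it; the paper's actual estimator, and how it removes entries from a backoff model, may differ. The helper names `ngrams`, `count_cutoff`, and `distribution_prune` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_cutoff(docs, n, min_count):
    """Conventional pruning: keep n-grams whose total training count is at least min_count."""
    counts = Counter(g for doc in docs for g in ngrams(doc, n))
    return {g for g, c in counts.items() if c >= min_count}


def distribution_prune(docs, n, min_doc_prob):
    """Keep n-grams whose estimated probability of occurring in a new document
    (assumed here: fraction of training documents containing it) is at least min_doc_prob."""
    # set() so each document contributes at most once per n-gram.
    doc_freq = Counter(g for doc in docs for g in set(ngrams(doc, n)))
    return {g for g, df in doc_freq.items() if df / len(docs) >= min_doc_prob}
```

With bursty data, such as an n-gram repeated three times within a single document, `count_cutoff` with `min_count=3` keeps it, while `distribution_prune` with `min_doc_prob=0.5` drops it and keeps only n-grams that recur across documents.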
International Conference on Acoustics, Speech, and Signal Processing | 2000
Jianfeng Gao; Haifeng Wang; Mingjing Li; Kai-Fu Lee
The paper presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigrams to Chinese is challenging because: (1) there is no standard definition of words in Chinese, (2) word boundaries are not marked by spaces, and (3) there is a dearth of training data. Our unified approach automatically and consistently gathers a high-quality training data set from the Web, creates a high-quality lexicon, and segments the training data using this lexicon, all using a maximum likelihood principle, which is consistent with the trigram training. We show that each of the methods leads to improvements over standard SLM, and that the combined method yields the best pinyin conversion result reported.
Pacific Rim International Conference on Artificial Intelligence | 2000
Qiang Yang; Haifeng Wang; Ji-Rong Wen; Gao Zhang; Ye Lu; Kai-Fu Lee; HongJiang Zhang
As more information becomes available on the World Wide Web, providing effective search tools for information access has become an acute problem. Previous generations of search engines are mainly keyword-based and cannot satisfy many of their users' informational needs: search based on simple keywords returns many irrelevant documents that can easily swamp the user. In this paper, we describe the system architecture of a next-generation search engine that we have built with the goal of providing accurate search results on frequently asked concepts. Its key differentiating factors from other search engines are a natural language user interface, accurate search results, an interactive user interface, and multimedia content retrieval. We describe the architecture, the design goals, and our experience in developing the search engine.
US Patent | 2000
Haifeng Wang; Kai-Fu Lee; Qiang Yang
Archive | 2000
Haifeng Wang; Chang-Ning Huang; Kai-Fu Lee; Shuo Di; Jianfeng Gao; Dong-Feng Cai; Lee-Feng Chien
Conference of the International Speech Communication Association | 2000
Eric Chang; Jian-Lai Zhou; Shuo Di; Chao Huang; Kai-Fu Lee
Conference of the International Speech Communication Association | 2000
Chao Huang; Eric Chang; Jian-Lai Zhou; Kai-Fu Lee
Conference of the International Speech Communication Association | 2000
Zheng Chen; Kai-Fu Lee; Mingjing Li