Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Kai-Fu Lee is active.

Publication


Featured research published by Kai-Fu Lee.


ACM Transactions on Asian Language Information Processing | 2002

Toward a unified approach to statistical language modeling for Chinese

Jianfeng Gao; Joshua T. Goodman; Mingjing Li; Kai-Fu Lee

This article presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigram language models to Chinese is challenging because (1) there is no standard definition of words in Chinese; (2) word boundaries are not marked by spaces; and (3) there is a dearth of training data. Our unified approach automatically and consistently gathers a high-quality training data set from the Web, creates a high-quality lexicon, segments the training data using this lexicon, and compresses the language model, all by using the maximum likelihood principle, which is consistent with trigram model training. We show that each of the methods leads to improvements over standard SLM, and that the combined method yields the best pinyin conversion result reported.
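The backbone of the approach described above is a maximum-likelihood trigram model. As a rough illustrative sketch only (the paper's actual system additionally gathers Web training data, learns a lexicon, segments text, and compresses the model), a word-level trigram estimator might look like this:

```python
from collections import defaultdict

class TrigramLM:
    """Minimal maximum-likelihood trigram model (illustrative sketch,
    not the paper's implementation)."""

    def __init__(self):
        self.tri = defaultdict(int)  # counts of (w1, w2, w3)
        self.bi = defaultdict(int)   # counts of the history (w1, w2)

    def train(self, sentences):
        for words in sentences:
            padded = ["<s>", "<s>"] + list(words) + ["</s>"]
            for i in range(len(padded) - 2):
                w1, w2, w3 = padded[i], padded[i + 1], padded[i + 2]
                self.tri[(w1, w2, w3)] += 1
                self.bi[(w1, w2)] += 1

    def prob(self, w1, w2, w3):
        # Maximum-likelihood estimate P(w3 | w1, w2); 0 for unseen histories.
        if self.bi[(w1, w2)] == 0:
            return 0.0
        return self.tri[(w1, w2, w3)] / self.bi[(w1, w2)]
```

For Chinese, the `sentences` fed to `train` would first have to be segmented into words, which is exactly why the paper treats lexicon construction and segmentation as part of the same maximum-likelihood pipeline.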


Meeting of the Association for Computational Linguistics | 2000

A new statistical approach to Chinese Pinyin input

Zheng Chen; Kai-Fu Lee

Chinese input is one of the key challenges for Chinese PC users. This paper proposes a statistical approach to Pinyin-based Chinese input. The approach uses a trigram-based language model and statistically based segmentation. To handle real input, it also includes a typing model that enables spelling correction in sentence-based Pinyin input, and a spelling model for English that enables modeless Pinyin input.
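The statistical formulation above amounts to a noisy-channel decomposition: choose the character string H that maximizes P(H) x P(pinyin | H), where the language model supplies P(H) and the typing model supplies P(pinyin | H). A minimal sketch follows; the greedy left-to-right bigram search and the `lm_prob`/`typing_prob` callback interfaces are illustrative assumptions, not the paper's actual models:

```python
import math

def convert_pinyin(syllables, candidates, lm_prob, typing_prob):
    """Greedy noisy-channel Pinyin conversion (sketch).

    candidates:  dict mapping a Pinyin syllable to candidate characters
    lm_prob:     P(char | previous char), a bigram language model stand-in
    typing_prob: P(syllable | char), a typing/spelling model stand-in
    """
    prev = "<s>"
    out = []
    for syl in syllables:
        best_ch, best_score = None, float("-inf")
        for ch in candidates[syl]:
            # Log-space score; tiny epsilon guards against log(0).
            score = (math.log(lm_prob(prev, ch) + 1e-12)
                     + math.log(typing_prob(syl, ch) + 1e-12))
            if score > best_score:
                best_ch, best_score = ch, score
        out.append(best_ch)
        prev = best_ch
    return "".join(out)
```

A full decoder would search all hypotheses jointly (e.g. Viterbi over the sentence) rather than committing greedily at each syllable; the typing model is what lets the same machinery score mistyped syllables and so correct spelling during conversion.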


Meeting of the Association for Computational Linguistics | 2000

Distribution-based pruning of backoff language models

Jianfeng Gao; Kai-Fu Lee

We propose a distribution-based pruning of n-gram backoff language models. Instead of the conventional approach of pruning n-grams that are infrequent in training data, we prune n-grams that are likely to be infrequent in a new document. Our method is based on the n-gram distribution, i.e., the probability that an n-gram occurs in a new document. Experimental results show that our method performed 7--9% better (in word perplexity reduction) than conventional cutoff methods.
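A minimal sketch of this pruning criterion, under the simplifying assumption that the probability of an n-gram occurring in a new document is approximated by its document frequency in the training collection (the paper models this distribution more carefully):

```python
def document_probability(ngram, docs):
    """Fraction of training documents containing the n-gram -- a simple
    stand-in for the modeled probability that the n-gram occurs in a
    new document."""
    n = len(ngram)
    target = tuple(ngram)
    hits = 0
    for doc in docs:
        grams = {tuple(doc[i:i + n]) for i in range(len(doc) - n + 1)}
        if target in grams:
            hits += 1
    return hits / len(docs)

def prune(ngrams, docs, threshold):
    # Keep only n-grams likely enough to appear in a new document,
    # regardless of their raw training-set frequency.
    return [g for g in ngrams if document_probability(g, docs) >= threshold]
```

The contrast with a conventional count cutoff is that an n-gram concentrated in a single document can have a high raw count yet a low document probability, and so is pruned here where a cutoff would keep it.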


International Conference on Acoustics, Speech, and Signal Processing | 2000

A unified approach to statistical language modeling for Chinese

Jianfeng Gao; Haifeng Wang; Mingjing Li; Kai-Fu Lee

The paper presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigrams to Chinese is challenging because: (1) there is no standard definition of words in Chinese, (2) word boundaries are not marked by spaces, and (3) there is a dearth of training data. Our unified approach automatically and consistently gathers a high-quality training data set from the Web, creates a high-quality lexicon, and segments the training data using this lexicon, all using a maximum likelihood principle, which is consistent with the trigram training. We show that each of the methods leads to improvements over standard SLM, and that the combined method yields the best pinyin conversion result reported.


Pacific Rim International Conference on Artificial Intelligence | 2000

Towards a next-generation search engine

Qiang Yang; Haifeng Wang; Ji-Rong Wen; Gao Zhang; Ye Lu; Kai-Fu Lee; HongJiang Zhang

As more information becomes available on the World Wide Web, providing effective search tools for information access has become an acute problem. Previous generations of search engines are mainly keyword-based and cannot satisfy many informational needs of their users: search based on simple keywords returns many irrelevant documents that can easily swamp the user. In this paper, we describe the system architecture of a next-generation search engine that we have built with the goal of providing accurate search results for frequently asked concepts. Our key differentiating factors from other search engines are a natural language user interface, accurate search results, an interactive user interface, and multimedia content retrieval. We describe the architecture, design goals, and experience in developing the search engine.


US Patent | 2000

Search engine with natural language-based robust parsing for user query and relevance feedback learning

Haifeng Wang; Kai-Fu Lee; Qiang Yang


Archive | 2000

System and iterative method for lexicon, segmentation and language model joint optimization

Haifeng Wang; Chang-Ning Huang; Kai-Fu Lee; Shuo Di; Jianfeng Gao; Dong-Feng Cai; Lee-Feng Chien


Conference of the International Speech Communication Association | 2000

Large vocabulary Mandarin speech recognition with different approaches in modeling tones.

Eric Chang; Jian-Lai Zhou; Shuo Di; Chao Huang; Kai-Fu Lee


Conference of the International Speech Communication Association | 2000

Accent modeling based on pronunciation dictionary adaptation for large vocabulary Mandarin speech recognition.

Chao Huang; Eric Chang; Jian-Lai Zhou; Kai-Fu Lee


Conference of the International Speech Communication Association | 2000

Discriminative training on language model.

Zheng Chen; Kai-Fu Lee; Mingjing Li

Collaboration


Dive into Kai-Fu Lee's collaborations.

Top Co-Authors
