Publication


Featured research published by Aitao Chen.


Cross Language Evaluation Forum | 2002

Cross-Language Retrieval Experiments at CLEF 2002

Aitao Chen

This paper describes monolingual, bilingual, and multilingual retrieval experiments using the CLEF 2002 test collection. The paper presents a technique for incorporating blind relevance feedback into a document ranking formula based on logistic regression analysis. Both blind relevance feedback and decompounding in German or Dutch are shown to be effective in monolingual and bilingual retrieval. The amount of improvement of performance by decompounding varies from one set of topics to another. The simple raw-score merging strategy in multilingual retrieval can be effective if the individual ranked lists of documents, one for each document language, are produced using the same retrieval system under similar conditions. The performance of English to French bilingual retrieval using a large parallel corpus as the translation resource is comparable to that using machine translation systems.
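
The raw-score merging strategy mentioned in this abstract can be illustrated with a minimal sketch. It assumes each per-language run is a list of (document id, score) pairs and that scores are comparable because they come from the same retrieval system; the function name, run layout, and sample scores are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple


def raw_score_merge(
    ranked_lists: Dict[str, List[Tuple[str, float]]], k: int = 1000
) -> List[Tuple[str, float]]:
    """Merge per-language ranked lists by sorting on the unnormalized scores."""
    # Pool every (doc_id, score) pair from every language-specific run.
    pooled = [pair for results in ranked_lists.values() for pair in results]
    # Sort by raw score, highest first; no per-list normalization is applied.
    pooled.sort(key=lambda pair: pair[1], reverse=True)
    return pooled[:k]


# Hypothetical runs, one per document language.
runs = {
    "de": [("de-17", 0.82), ("de-03", 0.40)],
    "fr": [("fr-44", 0.77), ("fr-09", 0.12)],
    "en": [("en-05", 0.91)],
}
merged = raw_score_merge(runs, k=3)
# merged == [("en-05", 0.91), ("de-17", 0.82), ("fr-44", 0.77)]
```

Because no normalization is applied, the merge only makes sense when the scores in the individual lists are on the same scale, which matches the condition stated in the abstract.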


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1997

Chinese text retrieval without using a dictionary

Aitao Chen; Jianzhang He; Liangjie Xu; Fredric C. Gey; Jason Meggs

It is generally believed that words, rather than characters, should be the smallest indexing unit for Chinese text retrieval systems, and that it is essential to have a comprehensive Chinese dictionary or lexicon for Chinese text retrieval systems to do well. Chinese text has no delimiters to mark word boundaries. As a result, any text retrieval system that builds word-based indexes needs to segment text into words. We implemented several statistical and dictionary-based word segmentation methods to study the effect on retrieval effectiveness of different segmentation methods using the TREC-5 Chinese test collection and topics. The results show that, for all three sets of queries, the simple bigram indexing and the purely statistical word segmentation perform better than the popular dictionary-based maximum matching method with a dictionary of 138,955 entries.
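
As a rough illustration of two of the indexing approaches compared in this abstract, the sketch below contrasts overlapping character bigrams with forward maximum matching against a lexicon. The function names, the four-entry lexicon, and the `max_len` parameter are assumptions made for the example; the paper's own implementations and its 138,955-entry dictionary are not reproduced here.

```python
from typing import List, Set


def char_bigrams(text: str) -> List[str]:
    """Index units as overlapping character bigrams; no dictionary is needed."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest lexicon entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


text = "信息检索系统"  # "information retrieval system"
lexicon = {"信息", "检索", "系统", "信息检索"}  # hypothetical toy lexicon

bigram_units = char_bigrams(text)      # ["信息", "息检", "检索", "索系", "系统"]
word_units = max_match(text, lexicon)  # ["信息检索", "系统"]
```

The bigram index contains units that cross word boundaries, such as "息检", but it never misses a two-character word, whereas maximum matching depends entirely on what the lexicon covers.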


Information Retrieval | 2004

Multilingual Information Retrieval Using Machine Translation, Relevance Feedback and Decompounding

Aitao Chen; Fredric C. Gey

Multilingual retrieval (querying of multiple document collections each in a different language) can be achieved by combining several individual techniques which enhance retrieval: machine translation to cross the language barrier, relevance feedback to add words to the initial query, decompounding for languages with complex term structure, and data fusion to combine monolingual retrieval results from different languages. Using the CLEF 2001 and CLEF 2002 topics and document collections, this paper evaluates these techniques within the context of a monolingual document ranking formula based upon logistic regression. Each individual technique yields improved performance over runs which do not utilize that technique. Moreover, the techniques are complementary, in that combining the best techniques outperforms individual technique performance. An approximate but fast document translation using bilingual wordlists created from machine translation systems is presented and evaluated. The fast document translation is as effective as query translation in multilingual retrieval. Furthermore, when fast document translation is combined with query translation in multilingual retrieval, the performance is significantly better than that of query translation or fast document translation.
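
The idea of using relevance feedback to add words to the initial query can be sketched as follows. This is a simplified stand-in, assuming terms are selected by how many of the top-ranked documents contain them; the term-selection formula used in the paper is not given in the abstract, and the function name and parameters are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def expand_query(
    query_terms: Sequence[str],
    ranked_docs: Sequence[Sequence[str]],
    top_docs: int = 2,
    new_terms: int = 2,
) -> List[str]:
    """Blind feedback: treat the top-ranked documents as relevant and add their terms."""
    seen = set(query_terms)
    doc_freq = Counter()
    for tokens in ranked_docs[:top_docs]:
        # Count each candidate term once per feedback document.
        doc_freq.update(set(tokens) - seen)
    # Highest document frequency first; ties broken alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in ranked[:new_terms]]


# Hypothetical initial retrieval result, as token lists in rank order.
initial_results = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "speed"],
    ["jaguar", "cat"],
]
expanded = expand_query(["jaguar"], initial_results)
# expanded == ["jaguar", "car", "engine"]
```

The expanded query would then be run again against the collection; no relevance judgments from a user are involved, which is what makes the feedback "blind".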


Cross Language Evaluation Forum | 2001

Multilingual Information Retrieval Using English and Chinese Queries

Aitao Chen

The University of California at Berkeley group two participated in the CLEF 2001 monolingual, bilingual, and multilingual retrieval tasks. In this paper, we present a German decompounding procedure and a method of combining multiple translation resources for translating Chinese into English. We also report on our experiments with three different approaches to multilingual retrieval.
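
The abstract does not spell out the German decompounding procedure, so the following is only a generic sketch of dictionary-based decompounding. It assumes a base lexicon of component words, allows a linking "s" between components, and prefers the split with the fewest components; all names and the toy lexicon are hypothetical.

```python
from functools import lru_cache
from typing import FrozenSet, Optional, Tuple


def decompound(word: str, lexicon: FrozenSet[str], min_len: int = 3) -> Tuple[str, ...]:
    """Split a compound into lexicon words, preferring the fewest components."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def best_split(start: int) -> Optional[Tuple[str, ...]]:
        if start == len(word):
            return ()
        best = None
        for end in range(start + min_len, len(word) + 1):
            part = word[start:end]
            if part not in lexicon:
                continue
            # Continue right after the part, or skip a linking "s" if more text follows.
            next_starts = [end]
            if end + 1 < len(word) and word[end] == "s":
                next_starts.append(end + 1)
            for nxt in next_starts:
                rest = best_split(nxt)
                if rest is not None:
                    candidate = (part,) + rest
                    if best is None or len(candidate) < len(best):
                        best = candidate
        return best

    result = best_split(0)
    # Words that cannot be fully covered by the lexicon are left unsplit.
    return result if result else (word,)


lexicon = frozenset({"arbeit", "markt", "geber", "verband", "arbeitgeber"})
print(decompound("Arbeitsmarkt", lexicon))        # ('arbeit', 'markt')
print(decompound("Arbeitgeberverband", lexicon))  # ('arbeitgeber', 'verband')
```

In a retrieval setting the components would typically be indexed alongside or instead of the full compound, so that a query for "Markt" can match a document containing "Arbeitsmarkt".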


Cross Language Evaluation Forum | 2003

Combining query translation and document translation in cross-language retrieval

Aitao Chen; Fredric C. Gey

This paper describes monolingual, bilingual, and multilingual retrieval experiments using the CLEF 2003 test collection. The paper compares query translation-based multilingual retrieval with document translation-based multilingual retrieval where the documents are translated into the query language by translating the document words individually using machine translation systems or statistical translation lexicons derived from parallel texts. The multilingual retrieval results show that document translation-based retrieval is slightly better than the query translation-based retrieval on the CLEF 2003 test collection. Furthermore, combining query translation and document translation in multilingual retrieval achieves even better performance.
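
A minimal sketch of the two ideas in this abstract: translating a document word by word with a bilingual wordlist, and combining a query-translation run with a document-translation run. The abstract does not say how the two runs are combined, so the linear interpolation below is an assumption, as are the function names, the wordlist, and the scores.

```python
from typing import Dict, List, Tuple


def translate_document(tokens: List[str], wordlist: Dict[str, str]) -> List[str]:
    """Translate each word independently; unknown words are kept unchanged."""
    return [wordlist.get(token, token) for token in tokens]


def combine_runs(
    qt_scores: Dict[str, float], dt_scores: Dict[str, float], weight: float = 0.5
) -> List[Tuple[str, float]]:
    """Linearly interpolate query-translation and document-translation scores."""
    docs = set(qt_scores) | set(dt_scores)
    combined = {
        doc: weight * qt_scores.get(doc, 0.0) + (1 - weight) * dt_scores.get(doc, 0.0)
        for doc in docs
    }
    # Highest combined score first; ties broken by document id.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical German-to-English wordlist and retrieval scores.
wordlist = {"haus": "house", "und": "and"}
translated = translate_document(["haus", "und", "garten"], wordlist)
# translated == ["house", "and", "garten"]

qt_run = {"d1": 0.8, "d2": 0.2}
dt_run = {"d2": 0.9, "d3": 0.4}
combined = combine_runs(qt_run, dt_run)
# Document order after combination: d2, d1, d3
```

Word-by-word translation ignores context and word order, which is acceptable for bag-of-words retrieval but would not produce readable text.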


Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages | 2000

Combining multiple sources for short query translation in Chinese-English cross-language information retrieval

Aitao Chen; Hailing Jiang; Fredric C. Gey

In this paper, we examine various factors that affect the retrieval performance of Chinese-English cross-language retrieval. The factors include segmentation dictionary coverage, segmentation algorithm, transfer dictionary coverage, transfer dictionary quality, and translation disambiguation. The paper introduces the idea of recovering the original English names for transliterated Chinese words, mainly proper names, using a search engine. We used two transfer dictionaries and a Chinese search engine to translate short Chinese queries into English. The majority of the Chinese words were translated into English, but the overall precision of the Chinese to English cross-language retrieval is only about 56% of the overall precision for the monolingual retrieval.


Cross Language Evaluation Forum | 2000

Cross-Language Retrieval for the CLEF Collections - Comparing Multiple Methods of Retrieval

Fredric C. Gey; Hailing Jiang; Vivien Petras; Aitao Chen

For our participation in CLEF, the Berkeley group participated in the monolingual, multilingual and GIRT tasks. To help enrich the CLEF relevance set for future training, we prepared a manual reformulation of the original German queries which achieved excellent performance, more than 110% better than average of median precision. The GIRT task performed English-German Cross-Language IR by comparing commercial machine translation with thesaurus lookup techniques and query expansion techniques. Combining all techniques using simple data fusion produced the best results.


ACM/IEEE Joint Conference on Digital Libraries | 2002

Harvesting translingual vocabulary mappings for multilingual digital libraries

Ray R. Larson; Fredric C. Gey; Aitao Chen

This paper presents a method of information harvesting and consolidation to support the multilingual information requirements for cross-language information retrieval within digital library systems. We describe a way to create both customized bilingual dictionaries and multilingual query mappings from a source language to many target languages. We will describe a multilingual conceptual mapping resource with broad coverage (over 100 written languages can be supported) that is truly multilingual as opposed to bilingual pairings usually derived from machine translation. This resource is derived from the 10+ million title online library catalog of the University of California. It is created statistically via maximum likelihood associations from words and phrases in book titles of many languages to human assigned subject headings in English. The 150,000 subject headings can form interlingua mappings between pairs of languages or from one language to several languages. While our current demonstration prototype maps between ten languages (English, Arabic, Chinese, French, German, Italian, Japanese, Portuguese, Russian, Spanish), extensions to additional languages are straightforward. We also describe how this resource is being expanded for languages where linguistic coverage is limited in our initial database, by automatically harvesting new information from international online library catalogs using the Z39.50 networked library search protocol.
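
The association between title words and subject headings can be approximated with a small sketch. It uses plain relative frequencies of co-occurrence as a simplified stand-in for the maximum likelihood associations described in the abstract; the record layout, function names, and sample catalog records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List, Sequence, Tuple

Record = Tuple[Sequence[str], Sequence[str]]  # (title tokens, subject headings)


def build_associations(records: Iterable[Record]) -> DefaultDict[str, Counter]:
    """Count how often each title word co-occurs with each subject heading."""
    pair_counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for tokens, headings in records:
        for token in set(tokens):
            pair_counts[token].update(set(headings))
    return pair_counts


def top_headings(word: str, pair_counts, n: int = 1) -> List[Tuple[str, float]]:
    """Rank headings for a word by their share of the word's co-occurrences."""
    counts = pair_counts.get(word)
    if not counts:
        return []
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(heading, count / total) for heading, count in ranked[:n]]


# Hypothetical catalog records in German and French with English headings.
records = [
    (["wirtschaft", "geschichte"], ["Economics", "History"]),
    (["wirtschaft", "politik"], ["Economics"]),
    (["économie"], ["Economics"]),
]
associations = build_associations(records)
german = top_headings("wirtschaft", associations)  # top heading: "Economics"
french = top_headings("économie", associations)    # top heading: "Economics"
```

Because both the German and the French word map to the same English heading, the heading can act as the pivot between the two languages, which is the interlingua role the abstract describes.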


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2002

Translingual vocabulary mappings for multilingual information access

Fredric C. Gey; Aitao Chen; Michael K. Buckland; Ray R. Larson

This demonstration presents a completely new resource to support the multilingual information requirements of cross-language information retrieval and statistical machine translation. The resource can create both customized bilingual dictionaries and multilingual query mappings from a source language to many target languages. The resource coverage is broad (over 100 written languages can potentially be supported) and is truly multilingual as opposed to bilingual pairings. Our current prototype software maps between ten languages (English, Arabic, Chinese, French, German, Italian, Japanese, Portuguese, Russian, Spanish) but extensions to additional languages are straightforward. Our search for this new source of multilingual resources derives from a background of library research. The world's great research university libraries have book content which spans the world's languages. Through the offices of the University of California Digital Library, we obtained a private copy of the University of California Book Catalog with 10,091,737 records, of which 4,626,793 were non-English. We utilized our entry vocabulary index technology (described in [1]) to create maximum likelihood mappings from Library of Congress Subject Headings (LCSH) to words found in


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

Geotemporal querying of multilingual documents

Fredric C. Gey; Aitao Chen; Ray R. Larson; Kim Carl

This demonstration utilizes a geographic information system interface to display multilingual news documents in time and space by extracting place names from text and matching them to a multilingual multi-script gazetteer which identifies the latitude and longitude of the location.
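
The matching step described here, from place names in text to coordinates in a multilingual gazetteer, might look roughly like the sketch below. The gazetteer is a hypothetical four-entry dictionary with approximate coordinates, and the lookup is an exact token match; the demonstration's actual name extraction and gazetteer are not described in the abstract.

```python
from typing import Dict, List, Sequence, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)

# Hypothetical multi-script gazetteer; coordinates are approximate.
GAZETTEER: Dict[str, Coordinates] = {
    "berlin": (52.52, 13.40),
    "柏林": (52.52, 13.40),
    "paris": (48.86, 2.35),
    "париж": (48.86, 2.35),
}


def locate_places(
    tokens: Sequence[str], gazetteer: Dict[str, Coordinates] = GAZETTEER
) -> List[Tuple[str, Coordinates]]:
    """Return (place name, coordinates) for every token found in the gazetteer."""
    hits = []
    for token in tokens:
        # Lowercasing makes the lookup case-insensitive for cased scripts.
        coords = gazetteer.get(token.lower())
        if coords is not None:
            hits.append((token, coords))
    return hits


places = locate_places(["Talks", "in", "Berlin", "and", "Париж"])
# places == [("Berlin", (52.52, 13.40)), ("Париж", (48.86, 2.35))]
```

Names written in different scripts resolve to the same coordinates, so documents in different languages can be placed on one map.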

Collaboration


Dive into Aitao Chen's collaborations.

Top Co-Authors

Fredric C. Gey
University of California

Ray R. Larson
University of California

Hailing Jiang
University of California

Jason Meggs
University of California

Jianzhang He
University of California

Youngin Kim
University of California

Byron Lam
University of California