Sun Yu-fang
Chinese Academy of Sciences
Publication
Featured research published by Sun Yu-fang.
Journal of Computer Science and Technology | 2000
Du Lin; Sun Yu-fang
This paper proposes a novel text representation and matching scheme for Chinese text retrieval. At present, the indexing methods of Chinese retrieval systems are either character-based or word-based. Character-based indexing methods, such as bi-gram or tri-gram indexing, suffer from a high rate of false drops caused by mismatches between queries and documents. Word-based indexing systems, on the other hand, have difficulty efficiently identifying all the proper nouns, domain-specific terminology, and phrases. The new indexing method uses both the proximity and the mutual information of word pairs to represent the text content, so as to overcome the high false-drop, new-word, and phrase problems of character-based and word-based systems. The evaluation results indicate that the average query precision of proximity-based indexing is 5.2% higher than the best results of TREC-5.
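As a rough illustration of the idea of scoring word pairs by proximity and mutual information, the sketch below computes pointwise mutual information (PMI) for pairs of words that co-occur within a fixed token window. It is a generic textbook formulation and not the paper's actual indexing scheme: the function name `proximity_pmi`, the `window` parameter, and the toy English corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def proximity_pmi(docs, window=2):
    """Return {(w1, w2): PMI} for word pairs co-occurring within `window` tokens.

    `docs` is a list of already-segmented documents (lists of tokens).
    Pairs are stored in sorted order so (a, b) and (b, a) count as the same pair.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in docs:
        word_counts.update(tokens)
        for i, w1 in enumerate(tokens):
            # Pair w1 with each of the next `window` tokens (the proximity constraint).
            for w2 in tokens[i + 1 : i + 1 + window]:
                pair_counts[tuple(sorted((w1, w2)))] += 1

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (w1, w2), n in pair_counts.items():
        p_pair = n / total_pairs
        p1 = word_counts[w1] / total_words
        p2 = word_counts[w2] / total_words
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores


# Toy corpus (hypothetical, for illustration only).
docs = [
    ["information", "retrieval", "system"],
    ["information", "retrieval", "model"],
    ["text", "system", "model"],
]
scores = proximity_pmi(docs, window=2)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(value, 3))
```

Pairs that appear together more often than their individual frequencies would predict receive a higher score, which is the intuition behind using such pairs as index terms instead of relying on a segmentation dictionary alone.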
Proceedings of the fifth international workshop on Information retrieval with Asian languages | 2000
Du Lin; Zhang Yibo; Sun Le; Sun Yu-fang; Han Jie
This paper introduces a novel PM indexing scheme for Chinese text retrieval. Unlike Western languages, Chinese text has no delimiters between words, so indexing is based either on characters or on segmented words. In word-based indexing, out-of-vocabulary words such as proper nouns or domain terminology are usually mis-segmented because of the limited vocabulary coverage of the segmentation dictionaries, which impairs query precision. In this paper, several indexing and ranking methods, including the novel PM-based ranking, are tested to compare how well they handle new words in Chinese text retrieval. The experiment shows that the query precision of the PM + word method is 10% higher than that of word indexing.
Journal of Computer Science and Technology | 1999
Ye Yimin; Sun Yu-fang
It is of great significance to provide Chinese TrueType font support in X Window. This paper describes the work of adding a Chinese TrueType font renderer under X's font support mechanism. First, the origin of the idea is introduced, followed by a brief study of TrueType fonts and their rasterization algorithm; then the font support mechanism in X Window is discussed. Finally, an overall illustration of the Chinese TrueType font renderer is given.
international conference on computational linguistics | 2004
Zhang Jun-lin; Sun Le; Qu Wei-min; Sun Yu-fang
Language-model-based IR systems, proposed in the last five years, have brought the language model approach from speech recognition into the IR community and have effectively improved the performance of IR systems. However, the assumption behind the method, that all indexed words are independent of one another, does not hold. Although the statistical MT approach alleviates the situation by taking synonymy into account, it does not help to distinguish the different meanings of the same word in varied contexts. In this paper we propose a trigger-language-model-based IR system to resolve the problem. First, we compute the mutual information of words from a training corpus and then design an algorithm to obtain the triggered words of the query, in order to pin down the topic of the query more clearly. We introduce the related parameters into the document language model to form the trigger-language-model-based IR system. Experiments show that the performance of the trigger-language-model-based IR system is greatly improved: compared with the Ponte language model method, the precision of the trigger language model increased by 12% and recall increased by nearly 10.8%.
Journal of Computer Science and Technology | 2001
Du Lin; Zhang Yibo; Sun Le; Sun Yu-fang
This paper proposes a novel Chinese-English Cross-Lingual Information Retrieval (CECLIR) model, PME, in which a bilingual dictionary and comparable corpora are used to translate the query terms. The proximity and mutual information of the term pairs in the Chinese and English comparable corpora are employed not only to resolve translation ambiguities but also to perform query expansion, so as to deal with the out-of-vocabulary issues in CECLIR. The evaluation results show that the query precision of the PME algorithm is about 84.4% of that of monolingual information retrieval.
Journal of Computer Science and Technology | 1997
Sun Yu-fang
International use of the UNIX system in recent years has created a need to expand its functionality. Extensions are needed to process data in various languages as market requirements dictate [1,2]. With the advent of open systems and interfaces, the method of internationalization (I18N) has become standardized. The Hanzix Association was founded by the Institute of Software, The Chinese Academy of Sciences (ISAS, Beijing), the Institute of Information Industry (III, Taipei) and the Chinese University of Hong Kong (CUHK, Hong Kong), and its aim is to promote an open system standard for Chinese character (Hanzi) processing. This paper presents Hanzix, an open system environment to support Hanzi processing, including the enhancements recommended for the Hanzi API, the input method mechanism, and codeset conversion and announcement, and reviews the current work.
Archive | 2003
Liang Hong-liang; Sun Yu-fang; Zhao Qing-Song; Zhang Xiang-Feng; Sun Bo; Sun B. Design
Acta Electronica Sinica | 2001
Sun Yu-fang
NTCIR | 2004
Zhang Jun-lin; Sun Le; Zhang Yongchen; Sun Yu-fang
Computer Engineering | 2003
Sun Yu-fang