Publication


Featured research published by Muyun Yang.


Meeting of the Association for Computational Linguistics | 2000

Statistics Based Hybrid Approach to Chinese Base Phrase Identification

Tiejun Zhao; Muyun Yang; Fang Liu; Jianmin Yao; Hao Yu

This paper extends base noun phrase (BNP) identification to the identification of Chinese base phrases in general. After briefly introducing the basic concepts of Chinese base phrases, it presents a statistics-based hybrid model for identifying seven types of Chinese base phrases. Experiments show the efficiency of the proposed method in simplifying sentence structure. The significance of the research is that it provides a solid foundation for a Chinese parser.


International Symposium on Chinese Spoken Language Processing | 2006

Construct trilingual parallel corpus on demand

Muyun Yang; Hongfei Jiang; Tiejun Zhao; Sheng Li

This paper describes the effort of constructing the Olympic Oriented Trilingual Corpus for the development of NLP applications for Beijing 2008. Designed to support real NLP applications rather than pure research, the corpus has to meet multilingual, multi-domain and multi-system requirements in its construction. The key issue, however, is determining the proper corpus scale given the time and cost allowed. To solve this problem, the paper proposes treating better system performance in a sub-domain than on the whole corpus as the signal that the minimum necessary corpus size has been reached. The hypothesis is that a multi-domain corpus should at least be large enough to reveal the domain features. So far, a Chinese-English-Japanese trilingual corpus totaling 2.4 million words has been completed as the first-stage result, with the domains, locations and topics of the language materials annotated in XML.


International Conference on Computational Linguistics | 2002

Learning Chinese bracketing knowledge based on a bilingual language model

Yajuan Lü; Sheng Li; Tiejun Zhao; Muyun Yang

This paper proposes a new method for automatically acquiring Chinese bracketing knowledge from English-Chinese sentence-aligned bilingual corpora. Bilingual sentence pairs are first aligned in syntactic structure by combining English parse trees with a statistical bilingual language model. Chinese bracketing knowledge is then extracted automatically. Preliminary experiments show that the automatically learned knowledge accords well with manually annotated brackets. The proposed method is particularly useful for acquiring bracketing knowledge for a less-studied language that lacks the tools and resources available for a better-studied second language. Although this paper discusses experiments with Chinese and English, the method is also applicable to other language pairs.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Predicting query potential for personalization, classification or regression?

Chen Chen; Muyun Yang; Sheng Li; Tiejun Zhao; Haoliang Qi

The goal of predicting query potential for personalization is to determine which queries can benefit from personalization. In this paper, we investigate which strategy is better suited to this task: classification or regression. We quantify the potential benefit of personalizing search results using two implicit click-based measures: click entropy and Potential@N. Queries are characterized by query features and history features. We then build a C-SVM classification model and an epsilon-SVM regression model, respectively, according to these two measures. The experimental results show that the classification model is the better choice for predicting query potential for personalization.
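To make the first of the two measures concrete, here is a minimal sketch of click entropy as it is commonly defined in the personalization literature: the Shannon entropy of the distribution of clicked URLs for a query. The abstract does not give the formula, so this definition, the function name `click_entropy` and the sample click logs are all assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the clicked-URL distribution for one query.

    Assumed definition: H(q) = -sum_u p(u|q) * log2 p(u|q), where p(u|q) is
    the fraction of clicks for query q that landed on URL u.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no clicks observed, nothing to measure
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical click logs: one query where every user clicks the same URL,
# and one where clicks are spread evenly over four URLs.
navigational = ["a.example"] * 8
ambiguous = ["a.example", "b.example", "c.example", "d.example"] * 2

print(click_entropy(navigational))  # low entropy: users agree on one result
print(click_entropy(ambiguous))     # higher entropy: users want different results
```

Under this definition, a higher value means users disagree more about which result is relevant, which is why the measure is used as a proxy for how much a query could gain from personalization.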


International Conference on Innovative Computing, Information and Control | 2006

Bilingual Phrase Extraction from N-Best Alignments

Yong-Zeng Xue; Sheng Li; Tiejun Zhao; Muyun Yang; Jun Li

An improved approach to phrase extraction is proposed for phrase-based statistical machine translation, and the effectiveness of using n-best alignments instead of the one-best alignment for phrase extraction is investigated. In the presented approach, bilingual phrase pairs are extracted by combining word-to-word links from the n-best alignments between source and target sentences. First, the n-best alignments are divided into hierarchical layers by the frequencies of word co-occurrence. Second, candidate phrase pairs are extracted from each layer. Experimental results show that the presented approach outperforms the baseline system Pharaoh in both NIST and BLEU scores. It is therefore effective to use n-best alignments as an extension to the one-best alignment for phrase extraction.


International Joint Conference on Natural Language Processing | 2004

FML-Based SCF predefinition learning for Chinese verbs

Xiwu Han; Tiejun Zhao; Muyun Yang

This paper describes the first attempt to acquire Chinese subcategorization frames (SCFs) automatically. It applies Flexible Maximum Likelihood (FML), a variant of filtering by the simple maximum likelihood (ML) estimate from observed relative frequencies, to the task of predefining a basic SCF set for Chinese verb subcategorization acquisition. By setting a flexible threshold for SCF probability distributions over 1774 Chinese verbs, we obtained 141 basic SCFs with a reasonably practical coverage of 98.64% over 43,000 Chinese sentences. After adding 11 manually observed SCFs, a basic SCF set that is both linguistically and intuitively acceptable was predefined for future SCF acquisition work.


NLPCC | 2013

Feature Analysis in Microblog Retrieval Based on Learning to Rank

Zhongyuan Han; Xuwei Li; Muyun Yang; Haoliang Qi; Sheng Li

Learning to rank, which can fuse various features, performs well in microblog retrieval. However, it is still unclear how individual features function in microblog ranking. To address this issue, this paper examines the contribution of each single feature, together with the contribution of feature combinations, using a ranking SVM for microblog retrieval modeling. The experimental results on the TREC microblog collection show that textual features, i.e. content relevance between a query and a microblog, contribute most to retrieval performance. Combining certain non-textual features with textual features can further enhance retrieval performance, though non-textual features alone produce rather weak results.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Re-examination on lam% in spam filtering

Haoliang Qi; Muyun Yang; Xiaoning He; Sheng Li

Logistic average misclassification percentage (lam%) is a key measure of spam filtering performance. This paper demonstrates that a spam filter can achieve a perfect 0.00% in lam%, the theoretical minimum, simply by setting a biased threshold during classifier modeling, while its overall classification accuracy remains low. The result suggests that the role of lam% in spam filtering evaluation should be re-examined.
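The effect described in this abstract can be illustrated with a small sketch. It assumes the definition of lam% used in the TREC Spam Track, lam% = logit⁻¹((logit(hm%) + logit(sm%)) / 2), where hm% and sm% are the ham and spam misclassification rates. The function names and the counts below are hypothetical, and the sketch does not reproduce the paper's actual filter or threshold setting.

```python
import math


def logit(p):
    """Log-odds of a rate p in [0, 1]; the endpoints map to -inf / +inf."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Numerically stable inverse of logit; also handles -inf and +inf."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def lam_percent(ham_errors, ham_total, spam_errors, spam_total):
    """lam% under the assumed TREC-style definition (no smoothing applied)."""
    hm = ham_errors / ham_total
    sm = spam_errors / spam_total
    mean = (logit(hm) + logit(sm)) / 2.0
    if math.isnan(mean):  # one rate is 0 and the other is 1
        raise ValueError("lam% is undefined when one rate is 0 and the other is 1")
    return 100.0 * inv_logit(mean)


# Hypothetical counts: a threshold so biased that almost everything is called
# ham. No ham is misclassified, but 499 of 500 spam messages get through.
biased_lam = lam_percent(ham_errors=0, ham_total=500, spam_errors=499, spam_total=500)
biased_accuracy = (500 + 1) / 1000

print(biased_lam, biased_accuracy)
```

Because logit(0) is negative infinity, a single error rate of exactly zero drags the logit average to negative infinity and lam% to 0, regardless of how poor the other rate is. In this hypothetical case the filter is barely better than chance on accuracy.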


Meeting of the Association for Computational Linguistics | 2009

A Statistical Machine Translation Model Based on a Synthetic Synchronous Grammar

Hongfei Jiang; Muyun Yang; Tiejun Zhao; Sheng Li; Bo Wang

Recently, various synchronous grammars have been proposed for syntax-based machine translation, e.g. synchronous context-free grammar and synchronous tree (sequence) substitution grammar, either purely formal or linguistically motivated. Aiming to combine the strengths of different grammars, we describe a synthetic synchronous grammar (SSG) which, tentatively in this paper, integrates a synchronous context-free grammar (SCFG) and a synchronous tree sequence substitution grammar (STSSG) for statistical machine translation. Experimental results on the NIST MT05 Chinese-to-English test set show that the SSG-based translation system achieves significant improvement over three baseline systems.


International Conference on Machine Learning and Cybernetics | 2004

Parsing Chinese with head-driven model

Hailong Cao; Tiejun Zhao; Muyun Yang; Sheng Li

Great progress has been made in parsing the Wall Street Journal portion of the Penn Treebank, and parsing languages other than English is now an intensive research area. The head-driven model is one of the best English parsing models. It has been successfully applied to Czech but failed to outperform a baseline model in parsing German. This paper attempts to parse Chinese with the head-driven model. Promising experimental results demonstrate that the head-driven model works well for Chinese. We propose a hybrid parsing strategy that combines the head-driven model with a Chinese base phrase parsing model. The combined model not only improves performance but also makes the parser more space- and time-efficient. We evaluate our method with PARSEVAL measures; the combined model achieves 79.88% precision and 81.97% recall.
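For readers unfamiliar with PARSEVAL, the sketch below shows how labeled bracket precision and recall are typically computed for one sentence: each constituent is a (label, start, end) span, and a predicted span counts as correct only if an identical span occurs in the gold tree. The function name `parseval` and the toy brackets are assumptions for illustration, and the sketch ignores the punctuation and label-equivalence conventions that real evaluation tools apply.

```python
from collections import Counter


def parseval(gold_brackets, predicted_brackets):
    """Labeled bracket precision, recall and F1 for one sentence.

    Each bracket is a (label, start, end) tuple. Counters are used so that
    duplicate spans (e.g. unary chains with the same label) are matched at
    most as many times as they occur in the gold tree.
    """
    gold = Counter(gold_brackets)
    pred = Counter(predicted_brackets)
    matched = sum((gold & pred).values())  # multiset intersection
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical five-word sentence: the parser gets three of four gold
# constituents right and proposes two spans that are not in the gold tree.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 4, 5), ("PP", 3, 5)]

precision, recall, f1 = parseval(gold, pred)
print(precision, recall, f1)
```

In practice the matched, predicted and gold counts are summed over the whole test set before dividing, rather than averaging per-sentence scores.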

Collaboration


Dive into Muyun Yang's collaborations.

Top Co-Authors

Sheng Li
Harbin Institute of Technology

Tiejun Zhao
Harbin Institute of Technology

Haoliang Qi
Harbin Institute of Technology

Zhongyuan Han
Harbin Institute of Technology

Junguo Zhu
Harbin Institute of Technology

Xiaoning He
Harbin University of Science and Technology

Hongfei Jiang
Harbin Institute of Technology

Yong Han
Harbin Institute of Technology

Shuqi Sun
Harbin Institute of Technology