Jianmin Yao | Researchain

Publication

Featured research published by Jianmin Yao.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

What reviews are satisfactory: novel features for automatic helpfulness voting

Yu Hong; Jun Lu; Jianmin Yao; Qiaoming Zhu; Guodong Zhou

This paper explores the features of product reviews that satisfy users, with the aim of improving automatic helpfulness voting for reviews on commercial websites. Previous work relies solely on textual features to assess review helpfulness; we propose instead that user preferences are more explicit clues for inferring users' opinions on review helpfulness. Using user-preference-based features, we first implement a binary review classification system that separates helpful reviews from useless ones, and on that basis we build a Ranking SVM based automatic helpfulness voting (AHV) system, which ranks reviews by their helpfulness. Experiments on a large-scale dataset containing over 34,266 reviews of 1,289 products show promising performance, with accuracy of up to 0.72 and NDCG@10 of 0.25, and at least a 9% accuracy improvement over textual-feature-based helpfulness assessment.
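The NDCG@10 figure reported above can be made concrete with a minimal sketch of the metric. It assumes linear gains and a log2 rank discount; the abstract does not say which NDCG variant the authors used, and the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list."""
    # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` holds the helpfulness grade of each review in the order the system ranked them, so a ranking that already lists reviews from most to least helpful scores 1.0.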

Meeting of the Association for Computational Linguistics | 2000

Statistics Based Hybrid Approach to Chinese Base Phrase Identification

Tiejun Zhao; Muyun Yang; Fang Liu; Jianmin Yao; Hao Yu

This paper extends base noun phrase (BNP) identification to Chinese base phrase identification. After briefly introducing some basic concepts of Chinese base phrases, the paper presents a statistics-based hybrid model for identifying seven types of Chinese base phrases. Experiments show the efficiency of the proposed method in simplifying sentence structure. The significance of the research is that it provides a solid foundation for a Chinese parser.

Meeting of the Association for Computational Linguistics | 2006

An Equivalent Pseudoword Solution to Chinese Word Sense Disambiguation

Zhimao Lu; Haifeng Wang; Jianmin Yao; Ting Liu; Sheng Li

This paper presents a new approach based on Equivalent Pseudowords (EPs) to tackle Word Sense Disambiguation (WSD) in Chinese. EPs are a particular kind of artificial ambiguous word that can be used to realize unsupervised WSD. A Bayesian classifier is implemented to test the efficacy of the EP solution on the Senseval-3 Chinese test set. The performance is better than state-of-the-art results, with an average F-measure of 0.80. The experiment verifies the value of EPs for unsupervised WSD.

Conference on Information and Knowledge Management | 2012

Cross-argument inference for implicit discourse relation recognition

Yu Hong; Xiaopei Zhou; Tingting Che; Jianmin Yao; Qiaoming Zhu; Guodong Zhou

Motivated by the critical importance of connectives in recognizing discourse relations, we present an unsupervised cross-argument inference mechanism for implicit discourse relation recognition. The basic idea is to infer the implicit discourse relation of an argument pair from a large number of comparable argument pairs, which are automatically retrieved from the web in an unsupervised way. The inference thus proceeds from explicit relations to implicit ones, using connectives as a bridge. This kind of pair-to-pair inference is based on the assumption that two argument pairs with high content similarity (i.e., comparable argument pairs) should have similar discourse relations. Evaluation on the PDTB demonstrates the effectiveness of our inference mechanism in recognizing the four level-1 implicit relations. It also shows that our mechanism significantly outperforms other alternatives.
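The pair-to-pair inference described in this abstract can be illustrated with a small sketch. It is not the authors' system: the Jaccard word-overlap similarity, the similarity-weighted vote, and the tiny connective table are all assumptions made for illustration, and the explicit pairs are passed in directly rather than retrieved from the web.

```python
from collections import Counter

# Illustrative mapping from explicit connectives to PDTB level-1 relations.
# A real system would need a far larger, carefully curated table.
CONNECTIVE_TO_RELATION = {
    "because": "Contingency",
    "but": "Comparison",
    "then": "Temporal",
    "also": "Expansion",
}


def jaccard(a, b):
    """Word-overlap similarity between two text spans (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def infer_relation(arg1, arg2, explicit_pairs):
    """Vote for a relation using comparable pairs that carry a connective.

    explicit_pairs: iterable of (first_argument, connective, second_argument).
    Returns the relation with the highest similarity-weighted vote, or None.
    """
    votes = Counter()
    for e1, connective, e2 in explicit_pairs:
        relation = CONNECTIVE_TO_RELATION.get(connective)
        if relation is None:
            continue
        # A pair counts only to the extent that both arguments are similar.
        votes[relation] += jaccard(arg1, e1) * jaccard(arg2, e2)
    if not votes or max(votes.values()) == 0:
        return None
    return votes.most_common(1)[0][0]
```

For instance, an implicit pair such as "the road was wet" / "it rained all night" would inherit the relation of a highly similar explicit pair joined by "because".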

Meeting of the Association for Computational Linguistics | 2014

Effective Selection of Translation Model Training Data

Le Liu; Yu Hong; Hao Liu; Xing Wang; Jianmin Yao

Data selection has been demonstrated to be an effective approach to addressing the lack of high-quality bitext for statistical machine translation in the domain of interest. Most current data selection methods rely solely on language models trained on a small amount of in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus. By contrast, we argue that the relevance between a sentence pair and the target domain can be better evaluated by combining a language model with a translation model. In this paper, we study and experiment with novel methods that apply translation models to domain-relevant data selection. The results show that our methods outperform previous methods. When the selected sentence pairs are evaluated on an end-to-end MT task, our methods increase translation performance by 3 BLEU points.
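One simple way to picture the combination of language-model and translation-model evidence is a weighted sum of two per-pair scores, followed by top-n selection. The sketch below assumes such scores are already computed; the `ScoredPair` type, the score fields, and the linear interpolation are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ScoredPair:
    source: str
    target: str
    lm_score: float  # assumed: higher means more in-domain per the language model
    tm_score: float  # assumed: higher means more in-domain per the translation model


def select_pairs(pairs, top_n, lm_weight=0.5):
    """Rank sentence pairs by an interpolated score and keep the best top_n."""

    def combined(pair):
        return lm_weight * pair.lm_score + (1.0 - lm_weight) * pair.tm_score

    return sorted(pairs, key=combined, reverse=True)[:top_n]
```

Setting `lm_weight=1.0` reduces the sketch to language-model-only selection, which is the kind of baseline the abstract argues against.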

Empirical Methods in Natural Language Processing | 2014

An Iterative Link-based Method for Parallel Web Page Mining

Le Liu; Yu Hong; Jun Lu; Jun Lang; Heng Ji; Jianmin Yao

Identifying parallel web pages from bilingual web sites is a crucial step of bilingual resource construction for crosslingual information processing. In this paper, we propose a link-based approach to distinguish parallel web pages from bilingual web sites. Compared with the existing methods, which only employ the internal translation similarity (such as content-based similarity and page structural similarity), we hypothesize that the external translation similarity is an effective feature to identify parallel web pages. Within a bilingual web site, web pages are interconnected by hyperlinks. The basic idea of our method is that the translation similarity of two pages can be inferred from their neighbor pages, which can be adopted as an important source of external similarity. Thus, the translation similarity of page pairs will influence each other. An iterative algorithm is developed to estimate the external translation similarity and the final translation similarity. Both internal and external similarity measures are combined in the iterative algorithm. Experiments on six bilingual websites demonstrate that our method is effective and obtains significant improvement (6.2% F-Score) over the baseline which only utilizes internal translation similarity.
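The iterative idea in this abstract, where the similarity of a page pair is reinforced by the similarity of its linked neighbour pairs, can be sketched as follows. The update rule (averaging over neighbour pairs, then interpolating with weight `alpha`) and all names are assumptions; the abstract does not give the exact formulation.

```python
def iterate_similarity(internal, links_a, links_b, alpha=0.5, rounds=10):
    """Blend internal similarity with similarity propagated over hyperlinks.

    internal: dict mapping (page_a, page_b) to an internal similarity score.
    links_a, links_b: dicts mapping a page to the pages it links to,
        one dict per language side of the site.
    """
    final = dict(internal)
    for _ in range(rounds):
        updated = {}
        for (a, b), base in internal.items():
            neighbour_pairs = [
                (na, nb) for na in links_a.get(a, []) for nb in links_b.get(b, [])
            ]
            if neighbour_pairs:
                # External similarity: mean current score of neighbour pairs.
                external = sum(final.get(p, 0.0) for p in neighbour_pairs) / len(
                    neighbour_pairs
                )
                updated[(a, b)] = alpha * base + (1 - alpha) * external
            else:
                # No link evidence: fall back to the internal score alone.
                updated[(a, b)] = base
        final = updated
    return final
```

In this sketch a page pair with weak content similarity can still score well when the pages it links to are themselves confidently parallel.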

International Conference on Computer Science and Service System | 2012

Simultaneous Product Attribute Name and Value Extraction with Adaptively Learnt Templates

Wei Tang; Yu Hong; Yanhui Feng; Jianmin Yao; Qiaoming Zhu

Representing products as attribute name and value pairs improves the effectiveness of many applications. In this paper, we propose an adaptive template-based method to simultaneously extract product attribute name and value pairs from Web pages. The titles of Web pages are used to assist unsupervised template construction, and a template ranking strategy ensures that the correct templates are selected for every Web page. Our approach contains four key steps: 1) construct a domain attribute word bag from the titles of Web pages; 2) segment text nodes based on some default delimiters; 3) collect candidate attribute and value pairs; 4) learn high-quality templates with a template ranking algorithm. The experimental corpus is collected from two domains: digital cameras and mobile phones. Experiments show that our method achieves a precision of 94.68% and a recall of 90.57%.

International Conference on Asian Language Processing | 2011

Using HTML Tags to Improve Parallel Resources Extraction

Yanhui Feng; Yu Hong; Wei Tang; Jianmin Yao; Qiaoming Zhu

This paper proposes a new approach to extracting parallel resources (including bilingual sentences and bilingual terms) from bilingual web pages, which have a primary language and a secondary language (the secondary-language text is often a translation of the primary-language text). Our method is composed of four tasks: 1) parsing the web page into a DOM tree and segmenting the inner text of each node into a series of monolingual snippets; 2) selecting adjacent snippet pairs in different languages with higher translation scores as seeds for the next task; 3) constructing comprehensive wrappers from the selected seeds, which preserve both HTML and surface formatting styles; 4) mining candidate instances and selecting good instances by their similarity to the seeds. In this paper, we propose segmenting text by HTML tags and selecting potential parallel resources by ranking all extracted candidates. According to the experimental results, our method can be applied to bilingual pages written in any other pair of languages. Experimental results also show that our approach is effective in improving parallel resource extraction.
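The first of the four tasks, splitting page text into monolingual snippets along HTML tag boundaries, can be sketched with Python's standard `html.parser`. The sketch is an assumption-laden simplification: it walks text nodes instead of building a full DOM tree, and it treats the CJK Unified Ideographs block as "Chinese" and everything else as the other language, so Chinese punctuation would be grouped with the non-Chinese runs.

```python
import re
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collects the non-empty text found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(text)


# Alternating runs of CJK ideographs and of everything else (assumed split rule).
SCRIPT_RUN = re.compile(r"[\u4e00-\u9fff]+|[^\u4e00-\u9fff]+")


def monolingual_snippets(html):
    """Return text snippets split first by tags, then by script."""
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    snippets = []
    for node in collector.nodes:
        for run in SCRIPT_RUN.findall(node):
            run = run.strip()
            if run:
                snippets.append(run)
    return snippets
```

Adjacent snippets in different scripts, such as an English term directly followed by its Chinese rendering inside one list item, are the kind of candidate pairs the second task would then score.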

Chinese National Conference on Social Media Processing | 2016

A Novel Approach for Relation Extraction with Few Labeled Data

Xiaobin Wang; Yu Hong; Jianmin Yao; Qiaoming Zhu; Guodong Zhou

The lack of large-scale training data is a challenge for conventional supervised relation extraction approaches. Although distant supervision has been proposed to address this issue, it suffers from massive noise, and the trained model cannot be applied to unseen relations. We present a novel approach to relation extraction that uses the relation definition as a guide and needs only about a hundred high-quality mention examples to train the model. In detail, we classify a candidate mention of a specific relation by judging whether the mention conforms to the relation's definition, which we do by measuring the semantic relevance between the definition and the mention. Our approach is insensitive to the class-imbalance problem, and the trained model can be directly applied to classify mentions of a newly defined relation without labeling new training data. Experimental results demonstrate that our approach achieves competitive performance and can be combined with existing approaches to boost performance.

Archive | 2012

Divided Pretreatment to Targets and Intentions for Query Recommendation

Yangyang Kang; Yu Hong; Li Yu; Jianmin Yao; Qiaoming Zhu

We propose a query recommendation method called “Divided Pretreatment to Targets and Intentions for Query Recommendation”, which concentrates on the structure, elements and composition of a query. Query targets and query intentions are first recognized by a classification method, and clusters of query intentions are then built by following consistent and similar query targets. After that, query recommendations are generated by simple substitution of peer intentions. The method aims to provide a simple and efficient mechanism that analyzes and processes only the query itself and its internal attributes. Experiments show that the accuracy of “query target” and “query intention” recognition is 73.11%, while that of intention clustering reaches 55.67%. The p@1 value of query recommendation reaches 57.83%.
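The substitution step can be sketched under strong simplifying assumptions: each query is already split into a (target, intention) pair, and two intentions count as peers whenever they were observed with the same target. The paper's classifier and clustering method are not reproduced here, and the function names are hypothetical.

```python
from collections import defaultdict


def peer_intentions(log):
    """Map each intention to the intentions that share a target with it.

    log: iterable of (target, intention) pairs from past queries.
    """
    by_target = defaultdict(set)
    for target, intention in log:
        by_target[target].add(intention)
    peers = defaultdict(set)
    for intentions in by_target.values():
        for intention in intentions:
            peers[intention] |= intentions - {intention}
    return peers


def recommend(target, intention, log):
    """Recommend queries by swapping the intention for each of its peers."""
    peers = peer_intentions(log)
    return [f"{target} {peer}" for peer in sorted(peers.get(intention, set()))]
```

With a log containing "iphone price", "iphone review", "galaxy price" and "galaxy specs", a new query "pixel price" would yield "pixel review" and "pixel specs" as recommendations.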
