Publication


Featured research published by Haoliang Qi.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Predicting query potential for personalization, classification or regression?

Chen Chen; Muyun Yang; Sheng Li; Tiejun Zhao; Haoliang Qi

The goal of predicting query potential for personalization is to determine which queries can benefit from personalization. In this paper, we investigate which strategy is better suited to this task: classification or regression. We quantify the potential benefit of personalizing search results using two implicit click-based measures: click entropy and Potential@N. Queries are characterized by query features and history features. We then build a C-SVM classification model and an epsilon-SVM regression model according to these two measures. The experimental results show that the classification model is the better choice for predicting query potential for personalization.
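
Click entropy, one of the two measures named above, can be illustrated with a minimal sketch. It assumes the commonly used definition, the Shannon entropy of the distribution of clicked URLs for a query; the abstract does not give the paper's exact formulation, and the function name and input format here are hypothetical.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the clicked-URL distribution for one query.

    `clicked_urls` is a list with one entry per click, e.g. gathered from a
    query log. This input format is an assumption made for illustration.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no clicks observed: treat the entropy as zero
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# All users click the same result: no disagreement between users.
same_url = click_entropy(["a.com", "a.com", "a.com", "a.com"])

# Clicks split evenly over two results: one bit of entropy.
two_urls = click_entropy(["a.com", "b.com"])
```

Under this definition, a query with low click entropy is one where users agree on the result, so it has little to gain from personalization; a high value suggests that different users want different results.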


International Conference for Young Computer Scientists | 2008

Extending BLEU Evaluation Method with Linguistic Weight

Muyun Yang; Junguo Zhu; Jufeng Li; Lixin Wang; Haoliang Qi; Sheng Li; Liu Da-xin

BLEU is one of the most popular metrics for automatic evaluation of machine translation quality. Because BLEU ignores the different effects that various translation units have on translation quality, this paper assigns proper weights to different words and n-grams within the BLEU framework. Linear regression is adopted to capture human perception of translation quality via word types and n-gram length. Compared with other linguistically rich metrics based on machine learning, the proposed approach is simple and largely preserves BLEU's advantage of language independence. Experimental results indicate that this method yields much better evaluation performance than the original BLEU for both human translation and machine translation.


NLPCC | 2013

Feature Analysis in Microblog Retrieval Based on Learning to Rank

Zhongyuan Han; Xuwei Li; Muyun Yang; Haoliang Qi; Sheng Li

Learning to rank, which can fuse various features, performs well in microblog retrieval. However, it is still unclear how the features function in microblog ranking. To address this issue, this paper examines the contribution of each single feature, together with the contribution of feature combinations, via ranking SVM for microblog retrieval modeling. The experimental results on the TREC microblog collection show that textual features, i.e. content relevance between a query and a microblog, contribute most to retrieval performance. Combining certain non-textual features with textual features can further enhance retrieval performance, though non-textual features alone produce rather weak results.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Re-examination on lam% in spam filtering

Haoliang Qi; Muyun Yang; Xiaoning He; Sheng Li

Logistic average misclassification percentage (lam%) is a key measure of spam filtering performance. This paper demonstrates that a spam filter can achieve a perfect 0.00% in lam%, the theoretical minimum, simply by setting a biased threshold during classifier modeling. At the same time, the overall classification performance reaches only low accuracy. The result suggests that the role of lam% in spam filtering evaluation should be re-examined.
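
The effect described above can be reproduced numerically. The sketch below assumes the usual definition lam% = logit^-1((logit(hm%) + logit(sm%)) / 2), where hm% and sm% are the ham and spam misclassification rates, and applies no smoothing to zero counts (an evaluation toolkit may handle zeros differently). The function names and the example counts are illustrative, not taken from the paper.

```python
import math


def logit(p):
    """Log-odds of p, extended to -inf at p = 0 and +inf at p = 1."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit, mapping -inf to 0 and +inf to 1."""
    if x == -math.inf:
        return 0.0
    if x == math.inf:
        return 1.0
    return 1.0 / (1.0 + math.exp(-x))


def lam_percent(ham_errors, ham_total, spam_errors, spam_total):
    """Logistic average misclassification percentage, without smoothing."""
    hm = ham_errors / ham_total    # fraction of ham labeled as spam
    sm = spam_errors / spam_total  # fraction of spam labeled as ham
    total = logit(hm) + logit(sm)
    if math.isnan(total):
        # One rate is exactly 0 and the other exactly 1: -inf + inf is undefined.
        raise ValueError("lam% is undefined when one rate is 0 and the other is 1")
    return 100.0 * inv_logit(total / 2.0)


# A threshold biased so that almost everything is labeled ham:
# no ham is misclassified, but 999 of 1000 spam messages slip through.
biased_lam = lam_percent(0, 1000, 999, 1000)   # 0.0
biased_accuracy = (1000 + 1) / 2000            # 0.5005
```

Because logit(0) is negative infinity, a single error rate of exactly zero drives the average to negative infinity and lam% to 0, regardless of how poor the other rate is.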


International Conference on Asian Language Processing | 2009

Sogou Query Log Analysis: A Case Study for Collaborative Recommendation or Personalized IR

Zhitao Zhang; Muyun Yang; Sheng Li; Haoliang Qi; Chao Song

By analyzing search engine logs, we can better understand the patterns of users’ search behavior and mine users’ personal characteristics, and thereby improve the performance of web information retrieval. This paper analyzes the user, query, and clickthrough data of Sogou, a large-scale Chinese search engine. We focus on the relations among users, queries, and URLs, revealing some new characteristics of Web users. The results show that portal websites are visited most frequently. The average Sogou user clicks 4.82 URLs, of which 1.72 are distinct. This paper demonstrates the necessity of personalized information retrieval, which is instructive for improving the performance of Chinese search engines.


International Conference on Asian Language Processing | 2009

The Improved Logistic Regression Models for Spam Filtering

Yong Han; Muyun Yang; Haoliang Qi; Xiaoning He; Sheng Li

The logistic regression model has achieved success in spam filtering. However, it is disadvantaged by adjusting equally, during training, the weights of features that appear in both spam and ham messages. This paper presents an improved logistic regression model that reduces the impact of features appearing in both spam and ham messages. Byte-level n-grams are employed to extract features from messages, and TONE (Train On or Near Error) is adopted; both have proved effective in state-of-the-art spam filtering systems. The official runs of the CEAS (Conference on Email and Anti-Spam) Spam-filter Challenge 2008 show that the proposed model is one of the best methods. Our system achieved competitive results in all tasks and won the active learning task on the live stream as measured by 1-ROCA.
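
The two building blocks named above, byte-level n-gram features and TONE, can be sketched with a plain online logistic regression filter. This is not the paper's improved model: the modified weight adjustment for features shared by spam and ham is not described in the abstract and is not reproduced here. The class name, learning rate, margin, and n-gram length are assumptions chosen for illustration.

```python
import math


def byte_ngrams(message: bytes, n: int = 4):
    """Set of overlapping byte n-grams, used as binary features."""
    return {message[i:i + n] for i in range(len(message) - n + 1)}


class ToneLogisticFilter:
    """Online logistic regression trained with Train On or Near Error."""

    def __init__(self, learning_rate=0.1, margin=0.1, n=4):
        self.weights = {}          # n-gram -> weight
        self.learning_rate = learning_rate
        self.margin = margin       # "near error" band around p = 0.5
        self.n = n

    def score(self, message: bytes) -> float:
        """Estimated probability that the message is spam."""
        z = sum(self.weights.get(g, 0.0) for g in byte_ngrams(message, self.n))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, message: bytes, is_spam: bool) -> bool:
        """Update weights only on or near an error; return True if updated."""
        p = self.score(message)
        is_error = (p > 0.5) != is_spam
        is_near = abs(p - 0.5) < self.margin
        if not (is_error or is_near):
            return False  # confident and correct: skip the update
        step = self.learning_rate * ((1.0 if is_spam else 0.0) - p)
        for g in byte_ngrams(message, self.n):
            self.weights[g] = self.weights.get(g, 0.0) + step
        return True


spam_filter = ToneLogisticFilter()
spam_filter.train(b"buy cheap pills now", True)
spam_filter.train(b"meeting at noon tomorrow", False)
```

Skipping updates for confidently correct messages is what keeps TONE cheap in an online setting: most of the stream triggers no weight changes once the filter has settled.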


Asia Information Retrieval Symposium | 2012

Adaptive Weighting Approach to Context-Sensitive Retrieval Model

Xiaochun Wang; Muyun Yang; Haoliang Qi; Sheng Li; Tiejun Zhao

To best exploit context information for meaningful hints about the user’s intent, this paper proposes an adaptive weighting approach to improve the current context-sensitive retrieval model. The potential for adaptability is first investigated as the performance gap between current context-sensitive models with a fixed-form weight and those with adaptive weights for contextual information. The proper context weight is then predicted according to the strength of the relation between the query and its context. The experimental results on a publicly available dataset indicate that the proposed approach outperforms three baseline methods.


International Conference on Asian Language Processing | 2010

Chinese Spam Filter Based on Relaxed Online Support Vector Machine

Yong Han; Xiaoning He; Muyun Yang; Haoliang Qi; Chao Song

Spam filtering is a classical online learning problem. As the training sample set grows, Online SVM becomes increasingly slow. We therefore relax the constraints of Online SVM to obtain the Relaxed Online SVM (ROSVM) model, which improves speed while maintaining performance. In this paper, we apply this model to Chinese spam filtering. Our model outperforms the best system of the TREC 2006 Chinese spam filter track. Our filter also participated in the SEWM 2010 spam filter track and achieved the best 1-ROCA% in the delayed feedback task and the active learning task.


International Conference on Asian Language Processing | 2010

Information Theory Based Feature Valuing for Logistic Regression for Spam Filtering

Haoliang Qi; Xiaoning He; Yong Han; Muyun Yang; Sheng Li

Discriminative learning models such as Logistic Regression (LR) have shown good performance in spam filtering tasks. Most previous research on LR has used binary features, which discards much useful information. To overcome this problem, an information-theory-based feature valuing method for LR is presented in place of traditional binary features. The effectiveness of our approach has been evaluated on the TREC, CEAS, and SEWM test sets. Results show that the proposed method outperforms traditional binary features on most test sets.


International Conference on Asian Language Processing | 2009

Ranking vs. Classification: A Case Study in Mining Organization Name Translation from Snippets

Muyun Yang; Zhenyong Shi; Sheng Li; Tiejun Zhao; Haoliang Qi

Both classification and ranking strategies have been reported to work well in mining named entity (NE) translations from the snippets returned by web search engines. Taking the most challenging case, organization names and their translations, as an example, this paper conducts a contrastive study of the two strategies under the SVM framework. We empirically show that translation ranking achieves the best performance in various data settings, with Top-1 precision of up to 65.75%. We conclude that, compared with the classification strategy, the ranking strategy is more suitable for such snippet-based translation mining, in which the unbalanced data issue prevails.

Collaboration


Researchers who collaborate with Haoliang Qi.

Top Co-Authors

Muyun Yang | Harbin Institute of Technology
Sheng Li | Harbin Institute of Technology
Zhongyuan Han | Harbin Institute of Technology
Leilei Kong | Harbin Engineering University
Tiejun Zhao | Harbin Institute of Technology
Xiaoning He | Harbin University of Science and Technology
Zhimao Lu | Harbin Engineering University
Yong Han | Harbin Institute of Technology
Chao Song | Harbin Institute of Technology
Song Li | Harbin Engineering University