Fuchun Peng
Yahoo!
Publication
Featured research published by Fuchun Peng.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007
Fuchun Peng; Nawaaz Ahmed; Xin Li; Yumao Lu
Traditionally, stemming has been applied to Information Retrieval tasks by transforming words in documents to their root form before indexing, and applying a similar transformation to query terms. Although it increases recall, this naive strategy does not work well for Web Search since it lowers precision and requires a significant amount of additional computation. In this paper, we propose a context sensitive stemming method that addresses these two issues. Two unique properties make our approach feasible for Web Search. First, based on statistical language modeling, we perform context sensitive analysis on the query side. For each query term, we accurately predict which of its morphological variants are useful for expansion before submitting the query to the search engine. This dramatically reduces the number of bad expansions, which in turn reduces the cost of additional computation and improves precision at the same time. Second, our approach performs context sensitive document matching for those expanded variants. This conservative strategy serves as a safeguard against spurious stemming, and it turns out to be very important for improving precision. Using word pluralization handling as an example of our stemming approach, our experiments on a major Web search engine show that by stemming only 29% of the query traffic, we can improve relevance as measured by average Discounted Cumulative Gain (DCG5) by 6.1% on these queries and 1.8% over all query traffic.
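The relevance metric named in the abstract, average DCG over the top five results, can be illustrated with a minimal sketch. The abstract does not state which DCG formulation was used, so the gain (2^rel - 1) and the log2(rank + 1) discount below are assumptions (one common variant), and the function names `dcg_at_k`, `average_dcg`, and `relative_gain` are hypothetical.

```python
import math


def dcg_at_k(relevances, k=5):
    """DCG over the top k results; `relevances` is ordered by rank (best first).

    Assumed formulation: gain 2^rel - 1, discount log2(rank + 1).
    The paper may use a different gain or discount.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def average_dcg(per_query_relevances, k=5):
    """Mean DCG@k across queries; returns 0.0 for an empty query set."""
    scores = [dcg_at_k(rels, k) for rels in per_query_relevances]
    return sum(scores) / len(scores) if scores else 0.0


def relative_gain(baseline, treatment):
    """Relative improvement of `treatment` over `baseline` (0.061 means 6.1%)."""
    return (treatment - baseline) / baseline
```

Under these assumptions, a figure such as the reported 6.1% would correspond to `relative_gain(average_dcg(baseline_judgments), average_dcg(stemmed_judgments))` computed over the affected queries, where both judgment lists are placeholders for editorial relevance labels.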
Conference on Information and Knowledge Management | 2006
Yumao Lu; Fuchun Peng; Xin Li; Nawaaz Ahmed
It is important yet hard to identify navigational queries in Web search due to a lack of sufficient information in Web queries, which are typically very short. In this paper we study several machine learning methods, including the naive Bayes model, maximum entropy model, support vector machine (SVM), and stochastic gradient boosting tree (SGBT), for navigational query identification in Web search. To boost the performance of these machine learning techniques, we exploit several feature selection methods and propose coupling feature selection with classification approaches to achieve the best performance. Unlike most prior work, which uses a small number of features, we study the problem of identifying navigational queries with thousands of available features, extracted from major commercial search engine results, Web search user click data, query logs, and the whole Web's relational content. A multi-level feature extraction system is constructed. Our results on real search data show that: 1) Among all the features we tested, user click distribution features are the most important set of features for identifying navigational queries. 2) In order to achieve good performance, machine learning approaches have to be coupled with good feature selection methods. We find that gradient boosting tree, coupled with linear SVM feature selection, is most effective. 3) With carefully coupled feature selection and classification approaches, navigational queries can be accurately identified with an 88.1% F1 score, which is a 33% error rate reduction compared to the best uncoupled system, and a 40% error rate reduction compared to a well tuned system without feature selection.
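The error rate reductions quoted above can be unpacked with a short arithmetic sketch. It assumes that "error rate" means 1 - F1, which the abstract does not state explicitly; the helper name `implied_baseline_f1` is hypothetical, and the derived baselines are implied values under that assumption, not figures reported in the abstract.

```python
def implied_baseline_f1(f1, error_reduction):
    """Back out the baseline F1 implied by a relative error rate reduction.

    Assumption: error rate = 1 - F1, so
    (1 - f1) = (1 - error_reduction) * (1 - baseline_f1).
    """
    error = 1.0 - f1
    baseline_error = error / (1.0 - error_reduction)
    return 1.0 - baseline_error


# Implied F1 of the best uncoupled system (33% reduction): roughly 0.82
uncoupled = implied_baseline_f1(0.881, 0.33)

# Implied F1 of the tuned system without feature selection (40% reduction): roughly 0.80
no_selection = implied_baseline_f1(0.881, 0.40)
```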
Archive | 2007
Fuchun Peng; Nawaaz Ahmed; Yumao Lu; Marco Zagha
Archive | 2006
Yumao Lu; Fuchun Peng; Xin Li; Nawaaz Ahmed
Archive | 2008
Yumao Lu; Nawaaz Ahmed; Fuchun Peng; Marco Zagha
Archive | 2007
Yumao Lu; Fuchun Peng; Xin Li; Nawaaz Ahmed
Archive | 2006
Xin Li; Nawaaz Ahmed; Fuchun Peng; Yumao Lu
Archive | 2010
Yumao Lu; Nawaaz Ahmed; Fuchun Peng; Marco Zagha
Archive | 2007
Fuchun Peng; Nawaaz Ahmed; Xin Li; Yumao Lu
Archive | 2011
Yumao Lu; Nawaaz Ahmed; Fuchun Peng; Marco Zagha