Publication


Featured research published by Gordon Sun.


international acm sigir conference on research and development in information retrieval | 2007

A regression framework for learning ranking functions using relative relevance judgments

Zhaohui Zheng; Keke Chen; Gordon Sun; Hongyuan Zha

Effective ranking functions are an essential part of commercial search engines. We focus on developing a regression framework for learning ranking functions that improve the relevance of search engines serving diverse streams of user queries. We explore supervised learning methodology from machine learning, and we distinguish two types of relevance judgments used as training data: 1) absolute relevance judgments arising from explicit labeling of search results; and 2) relative relevance judgments extracted from user click-throughs of search results or converted from the absolute relevance judgments. We propose a novel optimization framework emphasizing the use of relative relevance judgments. The main contribution is the development of an algorithm based on regression that can be applied to objective functions involving preference data, i.e., data indicating that one document is more relevant than another with respect to a query. Experiments are carried out using data sets obtained from a commercial search engine. Our results show significant improvements of our proposed methods over some existing methods.
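As an illustration of what an objective over preference data can look like, here is a minimal sketch. It is not the algorithm from the paper: it assumes a linear scoring function, a squared hinge penalty on each preference pair, and plain gradient descent. The names `score`, `fit_pairwise`, `margin`, and the toy feature vectors are all hypothetical.

```python
def score(w, x):
    """Linear ranking score (an assumption; the paper does not prescribe this form)."""
    return sum(wj * xj for wj, xj in zip(w, x))


def fit_pairwise(pairs, dim, margin=1.0, lr=0.1, epochs=200):
    """Fit weights so that each preferred document outscores the other one.

    pairs: list of (preferred_features, other_features) tuples.
    Loss per pair: max(0, margin - (score(preferred) - score(other))) ** 2.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for better, worse in pairs:
            gap = score(w, better) - score(w, worse)
            if gap < margin:
                # Gradient of the squared hinge with respect to w.
                coeff = -2.0 * (margin - gap)
                for j in range(dim):
                    grad[j] += coeff * (better[j] - worse[j])
        for j in range(dim):
            w[j] -= lr * grad[j] / len(pairs)
    return w


# Toy preference data: the document with the larger first feature is preferred.
pairs = [
    ([1.0, 0.0], [0.0, 0.0]),
    ([2.0, 1.0], [1.0, 1.0]),
    ([0.5, 0.3], [0.0, 0.3]),
]
w = fit_pairwise(pairs, dim=2)
ordered = all(score(w, b) > score(w, c) for b, c in pairs)
```

On this toy data the learned weights rank every preferred document above its counterpart, so `ordered` is true.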


international acm sigir conference on research and development in information retrieval | 2009

Global ranking by exploiting user clicks

Shihao Ji; Ke Zhou; Ciya Liao; Zhaohui Zheng; Gui-Rong Xue; Olivier Chapelle; Gordon Sun; Hongyuan Zha

It is now widely recognized that user interactions with search results can provide substantial relevance information on the documents displayed in the search results. In this paper, we focus on extracting relevance information from one source of user interactions, i.e., user click data, which records the sequence of documents being clicked and not clicked in the result set during a user search session. We formulate the problem as a global ranking problem, emphasizing the importance of the sequential nature of user clicks, with the goal of predicting the relevance labels of all the documents in a search session. This is distinct from conventional learning-to-rank methods, which usually design a ranking model defined on a single document; in contrast, our model exploits the relational information among the documents, as manifested by an aggregation of user clicks, to rank all the documents jointly. In particular, we adapt several sequential supervised learning algorithms, including the conditional random field (CRF), the sliding window method and the recurrent sliding window method, to the global ranking problem. Experiments on the click data collected from a commercial search engine demonstrate that our methods can outperform the baseline models for search results re-ranking.
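The sliding window method mentioned in the abstract turns a sequence labeling problem into ordinary per-item prediction by giving each item the features of its neighbors. The sketch below shows only that feature construction step, under assumptions that are not from the paper: zero padding at the ends of the session, a symmetric window, and a hypothetical helper name `sliding_window_features`.

```python
def sliding_window_features(session, half_width=1):
    """Concatenate each document's features with those of its ranked neighbors.

    session: list of per-document feature lists, in ranked order.
    Positions outside the session are padded with zeros (an assumption).
    """
    dim = len(session[0])
    pad = [0.0] * dim
    windows = []
    for i in range(len(session)):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(session[j] if 0 <= j < len(session) else pad)
        windows.append(row)
    return windows


# Toy session with a single click feature per document (1.0 = clicked).
session = [[1.0], [0.0], [1.0]]
windows = sliding_window_features(session)
# windows == [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

Each window row can then be passed to any per-document classifier or regressor, which lets the prediction for one document depend on the clicks around it.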


international world wide web conferences | 2008

Investigation of partial query proximity in web search

Jing Bai; Yi Chang; Hang Cui; Zhaohui Zheng; Gordon Sun; Xin Li

Proximity of query terms in a document is an important criterion in IR. However, no investigation has been made to determine the most useful term sequences for which proximity should be considered. In this study, we test the effectiveness of using proximity of partial term sequences (n-grams) for Web search. We observe that the proximity of sequences of 3 to 5 terms is most effective for long queries, while shorter or longer sequences appear less useful. This suggests that combinations of 3 to 5 terms can best capture the intention in user queries. In addition, we experiment with weighting the importance of query sub-sequences using query log frequencies. Our preliminary tests show promising empirical results.
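To make the idea of partial term sequence proximity concrete, the following sketch scores a document by how tightly each contiguous query n-gram appears in it. The scoring formula (number of distinct terms divided by the shortest covering span) is an assumption chosen for illustration, not the measure used in the paper, and the function names are hypothetical.

```python
def query_ngrams(terms, n):
    """All contiguous sub-sequences of n query terms."""
    return [tuple(terms[i:i + n]) for i in range(len(terms) - n + 1)]


def min_window(doc_tokens, ngram):
    """Length of the shortest document span containing every term of ngram, or None."""
    needed = set(ngram)
    best = None
    for start, tok in enumerate(doc_tokens):
        if tok not in needed:
            continue
        seen = set()
        for end in range(start, len(doc_tokens)):
            if doc_tokens[end] in needed:
                seen.add(doc_tokens[end])
                if seen == needed:
                    length = end - start + 1
                    if best is None or length < best:
                        best = length
                    break
    return best


def proximity_score(query_terms, doc_tokens, n):
    """Sum over query n-grams of (distinct terms / shortest covering span)."""
    total = 0.0
    for ngram in query_ngrams(query_terms, n):
        span = min_window(doc_tokens, ngram)
        if span is not None:
            total += len(set(ngram)) / span
    return total


query = "cheap flights new york city".split()
doc = "find cheap direct flights to new york city today".split()
s3 = proximity_score(query, doc, n=3)
```

An n-gram whose terms appear adjacent in the document contributes 1.0, and the contribution shrinks as the terms spread apart. Weighting each n-gram by its query log frequency, as the abstract suggests, would multiply each term of the sum by a per-n-gram weight.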


conference on information and knowledge management | 2009

Multi-task learning for learning to rank in web search

Jing Bai; Ke Zhou; Gui-Rong Xue; Hongyuan Zha; Gordon Sun; Belle L. Tseng; Zhaohui Zheng; Yi Chang

Both the quality and quantity of training data have a significant impact on the performance of ranking functions in the context of learning to rank for web search. Due to resource constraints, training data for smaller search engine markets are scarce, and we need to leverage existing training data from large markets to enhance the learning of ranking functions for smaller markets. In this paper, we present a boosting framework for learning to rank in the multi-task learning context for this purpose. In particular, we propose to learn non-parametric common structures adaptively from multiple tasks in a stage-wise way. An algorithm is developed to iteratively discover super-features that are effective for all the tasks. The function for each task is then estimated as a linear combination of those super-features. We evaluate the performance of this multi-task learning method for web search ranking using data from a search engine. Our results demonstrate that multi-task learning methods bring significant relevance improvements over existing baseline methods.


conference on information and knowledge management | 2006

Incorporating query difference for learning retrieval functions in world wide web search

Hongyuan Zha; Zhaohui Zheng; Haoying Fu; Gordon Sun

We discuss information retrieval methods that aim at serving a diverse stream of user queries such as those submitted to commercial search engines. We propose methods that emphasize the importance of taking query difference into consideration in learning effective retrieval functions. We formulate the problem as a multi-task learning problem using a risk minimization framework. In particular, we show how to calibrate the empirical risk to incorporate query difference by introducing nuisance parameters in the statistical models, and we also propose an alternating optimization method to simultaneously learn the retrieval function and the nuisance parameters. We work out the details for both L1 and L2 regularization cases, and provide convergence analysis for the alternating optimization method for the special case when the retrieval functions belong to a reproducing kernel Hilbert space. We illustrate the effectiveness of the proposed methods using modeling data extracted from a commercial search engine. We also point out how the current framework can be extended in future research.
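The alternating optimization described in the abstract can be illustrated with a deliberately small case. The sketch below is an assumption-laden toy, not the paper's model: it uses a single scalar feature, one additive offset per query as the nuisance parameter, squared loss, and an optional L2 penalty `lam` on the weight. The name `alternating_fit` and the toy data are hypothetical.

```python
def alternating_fit(data, lam=0.0, iters=200):
    """Alternate between per-query offsets and a shared weight.

    data: dict mapping query id -> list of (x, y) pairs with scalar feature x.
    Model (an assumption): y is approximately w * x + offset[query].
    """
    w = 0.0
    offsets = {q: 0.0 for q in data}
    for _ in range(iters):
        # Step 1: with w fixed, the best offset per query is its mean residual.
        for q, rows in data.items():
            offsets[q] = sum(y - w * x for x, y in rows) / len(rows)
        # Step 2: with offsets fixed, w has a closed-form ridge solution.
        num = sum(x * (y - offsets[q]) for q, rows in data.items() for x, y in rows)
        den = sum(x * x for rows in data.values() for x, _ in rows) + lam
        w = num / den
    return w, offsets


# Toy data generated with w = 2, offset 1 for "q1" and offset -3 for "q2".
data = {
    "q1": [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    "q2": [(1.0, -1.0), (2.0, 1.0), (3.0, 3.0)],
}
w, offsets = alternating_fit(data)
```

With `lam=0.0` the procedure recovers the shared slope and the two query offsets on this toy data. The per-query offsets absorb differences in label scale between queries, so the shared weight is fit only to within-query variation.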


international conference on data engineering | 2008

Adapting ranking functions to user preference

Keke Chen; Ya Zhang; Zhaohui Zheng; Hongyuan Zha; Gordon Sun

Learning to rank has become a popular method for web search ranking. Traditionally, expert-judged examples are the major training resource for machine-learned web ranking, and they are expensive to obtain in the quantity needed to train a satisfactory ranking function. The demand for web search ranking functions tailored to different domains, such as ranking functions for different regions, has aggravated this problem. Recently, a few methods have been proposed to extract training examples from user click-through logs. Because user preference data is cheap to obtain, it is attractive to include these examples when training ranking functions. However, because of the different natures of the two types of data, they may influence the ranking function differently. Therefore, it is challenging to develop methods for effectively combining them in training ranking functions. In this paper, we address the problem of adapting an existing ranking function to user preference data, and develop a framework for conveniently tuning the contribution of the user preference data in the tuned ranking function. Experimental results show that with our framework it is convenient to generate a batch of adapted ranking functions and to select functions with different tradeoffs between the base function and the user preference data.


international acm sigir conference on research and development in information retrieval | 2006

Incorporating query difference for learning retrieval functions in information retrieval

Hongyuan Zha; Zhaohui Zheng; Haoying Fu; Gordon Sun

We discuss information retrieval methods that aim at serving a diverse stream of user queries. We propose methods that emphasize the importance of taking query difference into consideration in learning effective retrieval functions. We formulate the problem as a multi-task learning problem using a risk minimization framework. In particular, we show how to calibrate the empirical risk to incorporate query difference by introducing nuisance parameters in the statistical models, and we also propose an alternating optimization method to simultaneously learn the retrieval function and the nuisance parameters. We illustrate the effectiveness of the proposed methods using modeling data extracted from a commercial search engine.


Archive | 1999

Method and apparatus for measuring similarity among electronic documents

Michael E. Palmer; Gordon Sun; Hongyuan Zha


neural information processing systems | 2007

A General Boosting Method and its Application to Learning Ranking Functions for Web Search

Zhaohui Zheng; Hongyuan Zha; Tong Zhang; Olivier Chapelle; Keke Chen; Gordon Sun


Archive | 2006

System and method for indexing web content using click-through features

Gordon Sun; Zhaohui Zheng

Collaboration


Dive into Gordon Sun's collaborations.

Top Co-Authors

Keke Chen
Wright State University

Ya Zhang
Shanghai Jiao Tong University

Haoying Fu
Pennsylvania State University

Ke Zhou
Georgia Institute of Technology