Daniel Lee Massey
Microsoft
Publications
Featured research published by Daniel Lee Massey.
International Conference on Big Data | 2015
Xugang Ye; Zijie Qi; Daniel Lee Massey
We introduce a new neural-network-based similarity model for learning document relevance under a query. The main idea is to use the binomial distribution to model the proportion of users who clicked document d under query q among those who viewed d under q. Our model generalizes existing neural-network-based latent semantic models in that both its objective function and its parametrization of the user click probability generalize the existing ones. Compared with the existing models, our new objective function distinguishes clicked (query, document)-pairs that carry different relevance information, and our new parametrization of the click probability considers both semantic similarity and term-level (lexical) match as reasons for a click. We tested our model on the media search logs of a commercial search engine and obtained superior performance under several relevance-ranking metrics.
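The binomial idea in the abstract can be sketched as a per-(query, document) loss: observe `clicks` out of `views`, and let the click probability be a squashed similarity score. This is a minimal illustrative sketch, not the paper's actual parametrization (which also mixes in lexical-match features); all names here are hypothetical.

```python
import math

def binomial_nll(score, clicks, views):
    """Negative log-likelihood of observing `clicks` clicks out of
    `views` views of a document under a query, with click probability
    p = sigmoid(score), where `score` is a model similarity score.
    The constant term log C(views, clicks) is dropped since it does
    not depend on the model parameters."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(clicks * math.log(p) + (views - clicks) * math.log(1.0 - p))
```

With a neutral score of 0 (p = 0.5), one click out of two views gives the loss 2·log 2; pairs with many views and a high click proportion pull the score upward more strongly than single-click pairs, which is the sense in which the binomial objective weights pairs by the evidence behind them.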
Conference on Information and Knowledge Management | 2014
Xugang Ye; Jingjing Li; Zijie Qi; Bingyue Peng; Daniel Lee Massey
A lack of high-quality relevance labels is a common challenge in the early stages of search engine development. In media search, because of the high cost of recruiting and training judges, labeling is usually conducted by a small number of human judges; consequently, the generated labels are often limited and biased. In contrast, click data extracted from a large population of real users is massive and less biased, but it also contains considerable noise. Therefore, more and more researchers have begun to combine these two resources to generate a better ground-truth approximation. In this paper, we present a novel method of generating relevance labels for media search. The method is based on a generative model that treats the human judgment, the position, and the click status as observations generated from a hidden relevance variable with a multinomial prior. The model accounts for position bias by requiring that the click status depend on both the hidden relevance and the position. We infer the model parameters using a Gibbs sampling procedure with hyper-parameter optimization. In experiments on Xbox data, the newly inferred relevance labels significantly increase the data volume for ranker training and demonstrate superior performance compared with using the limited human labels only, the click-through rates only, or a heuristic combination of the two.
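The core of such a generative model is the posterior over the hidden relevance given one judgment and one position-dependent click observation — the quantity a Gibbs sweep would sample from. The sketch below uses toy distributions; the table values and function names are illustrative assumptions, not the paper's learned parameters.

```python
def relevance_posterior(prior, judge_lik, click_lik_pos, judgment, clicked, position):
    """Posterior over a hidden relevance level r, given a human judgment
    and a click observation at a given position (one Gibbs-sampling step).

    prior[r]               -- multinomial prior over relevance levels
    judge_lik[r][j]        -- P(judgment = j | relevance = r)
    click_lik_pos[pos][r]  -- P(click | relevance = r, position = pos),
                              which encodes the position bias
    """
    unnorm = []
    for r in range(len(prior)):
        p_click = click_lik_pos[position][r]
        lik_click = p_click if clicked else 1.0 - p_click
        unnorm.append(prior[r] * judge_lik[r][judgment] * lik_click)
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

A full Gibbs procedure would alternate such draws with updates to the likelihood tables; the point of the joint model is that a click at a low position is stronger evidence of relevance than the same click at the top.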
International Conference on Data Mining | 2015
Xugang Ye; Zijie Qi; Xinying Song; Xiaodong He; Daniel Lee Massey
Modeling text semantic similarity with neural network approaches has significantly improved performance on a range of information retrieval tasks in recent studies. However, these neural-network-based latent semantic models are mostly trained on simple user behavior logs such as clicked (query, document)-pairs, and all clicked pairs are assumed to be uniformly positive examples. The existing method for learning the model parameters therefore does not differentiate training samples that may reflect different degrees of relevance. In this paper, we relax this assumption and propose a new learning method based on a generalized loss function that captures the subtle relevance differences among training samples when a more granular label structure is available. We applied it to the Xbox One movie search task, where session-based user behavior information is available and the granular relevance differences of training samples are derived from the session logs. Compared with the existing method, our generalized loss function demonstrates superior test performance measured by several user-engagement metrics. It also yields a significant performance lift when the score computed from our model is used as a semantic-similarity feature in a gradient boosted decision tree model, which is widely used in modern search engines.
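One simple way to realize a "generalized" loss of this kind is to scale each clicked pair's contribution to the standard softmax ranking loss by a graded relevance weight derived from session behavior. The sketch below is an assumed illustration of that idea, not the paper's exact formulation; all names and weights are hypothetical.

```python
import math

def weighted_softmax_loss(scores, pos_index, relevance_weight):
    """Softmax cross-entropy over one clicked document and several
    unclicked ones, with the clicked pair's loss scaled by a graded
    relevance weight (e.g. larger for a click followed by a long
    session, smaller for a quick bounce).

    scores           -- model similarity scores, one per candidate
    pos_index        -- index of the clicked (positive) candidate
    relevance_weight -- graded weight for this training sample
    """
    m = max(scores)  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores)
    log_p = (scores[pos_index] - m) - math.log(z)
    return -relevance_weight * log_p
```

Setting every weight to 1 recovers the usual uniform-positive training, which is exactly the assumption the abstract says is being relaxed.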
Archive | 2009
Mark Groves; Daniel Lee Massey; Ian M. Bavey; David M. Sauntry
Archive | 2009
Daniel Lee Massey; Mark Groves
Archive | 2015
Douglas C. Burger; Daniel Lee Massey; Bart De Smet; Blaise Aguera y Arcas
Archive | 2014
Saar Yahalom; Bart De Smet; Daniel Lee Massey; Douglas C. Burger; Blaise Hillary Aguera y Arcas
Archive | 2009
Daniel Lee Massey; Mark Groves
Archive | 2015
Michael F. Cohen; Douglas C. Burger; Asta Roseway; Andrew D. Wilson; Daniel Lee Massey; Blaise Hilary Aguera y Arcas
Archive | 2015
Douglas C. Burger; Daniel Lee Massey; Bart J.F. De Smet; Blaise Hilary Aguera y Arcas