Scott Gaffney
Yahoo!
Publications
Featured research published by Scott Gaffney.
Conference on Information and Knowledge Management | 2009
Soo-Min Kim; Patrick Pantel; Lei Duan; Scott Gaffney
In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled similar documents. Current state-of-the-art classifiers are supervised and require large amounts of manually labeled data. We hypothesize that unlabeled documents similar to our positive and negative labeled documents tend to be clicked through by the same user queries. Our proposed method leverages this hypothesis and augments our training set by modeling the similarity between documents in a click graph. We experiment with three different web page classifiers and show empirical evidence that our proposed approach outperforms state-of-the-art methods and reduces the amount of human effort to label training data.
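The hypothesis in this abstract (similar documents tend to be clicked through by the same queries) can be illustrated with a small sketch. This is not the authors' method: the cosine similarity over query-click vectors, the nearest-seed rule, the `threshold` parameter, and all function names and sample data are assumptions chosen for illustration only.

```python
from collections import defaultdict
from math import sqrt


def query_vectors(clicks):
    """Map each document to a {query: click_count} vector built from the click log."""
    vectors = defaultdict(lambda: defaultdict(int))
    for query, doc, count in clicks:
        vectors[doc][query] += count
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse query-click vectors."""
    dot = sum(u[q] * v[q] for q in u if q in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def propagate_labels(clicks, labeled, threshold=0.5):
    """Give each unlabeled document the label of its most similar labeled
    document, provided the similarity reaches the (assumed) threshold."""
    vectors = query_vectors(clicks)
    new_labels = {}
    for doc, vec in vectors.items():
        if doc in labeled:
            continue
        best_sim, best_label = 0.0, None
        for seed, label in labeled.items():
            if seed not in vectors:
                continue
            sim = cosine(vec, vectors[seed])
            if sim > best_sim:
                best_sim, best_label = sim, label
        if best_label is not None and best_sim >= threshold:
            new_labels[doc] = best_label
    return new_labels


# Hypothetical click log: (query, document, click count).
clicks = [
    ("cheap flights", "a", 10), ("cheap flights", "b", 8),
    ("flight deals", "a", 5), ("flight deals", "b", 4),
    ("pasta recipe", "c", 7), ("pasta recipe", "d", 6),
    ("weather", "e", 3),
]
labeled = {"a": "travel", "c": "food"}
augmented = propagate_labels(clicks, labeled)
```

In this toy log, documents `b` and `d` share queries with a labeled document and inherit its label, while `e` shares no query with any seed and stays unlabeled. The augmented labels would then be added to the training set of a supervised classifier.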
International World Wide Web Conferences | 2010
Suju Rajan; Dragomir Yankov; Scott Gaffney; Adwait Ratnaparkhi
Many web applications such as ad matching systems, vertical search engines, and page categorization systems require the identification of a particular type or class of pages on the Web. The sheer number and diversity of the pages on the Web, however, make it hard to obtain a good sample of the class of interest. In this paper, we describe a successfully deployed end-to-end system that starts from a biased training sample and uses several state-of-the-art machine learning algorithms working in tandem, including a powerful active learning component, to achieve a good classification system. The system is evaluated on traffic from a real-world ad-matching platform and is shown to achieve high categorization effectiveness with a significant reduction in editorial effort and labeling time.
International Conference on Data Mining | 2010
Lan Nie; Zhigang Hua; Xiaofeng He; Scott Gaffney
Document classification plays an increasingly important role in extracting and organizing knowledge. However, Web document classification is hindered by the huge number of Web documents and the limited human judgment available for labeling training data. To obtain sufficient training data cost-efficiently, we propose a semi-supervised learning approach that predicts a document’s class label by mining the click graph. To overcome the sparseness of the click graph, we enrich it with hyperlinks between Web documents. Content-based constraints are further added to regularize the graph. The resulting graph unifies three data sources: click-through data, hyperlinks, and content relevance. Starting from a very small seed set of manually labeled documents, we automatically discover a large number of relevant documents by applying a Markov random walk model to the enriched click graph. The top pages with high confidence scores are added to the current training data for classifier training. We investigate various combinations of the three sources and conduct extensive experiments on six typical web classification tasks. The experimental results show that the click graph enriched with hyperlink and content information can significantly improve classification quality across multiple tasks with only minimal human labeling cost.
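A rough sketch of the random-walk idea described in this abstract follows. The abstract does not specify the exact model, so this uses a random walk with restart as a stand-in. The restart probability, the fixed hyperlink weight, the `q:`/`d:` node prefixes, and all names and data are assumptions, and the content-based constraints are omitted.

```python
from collections import defaultdict


def build_graph(click_edges, link_edges, link_weight=0.5):
    """Build an undirected weighted graph from query-document click edges,
    enriched with document-document hyperlink edges (assumed fixed weight)."""
    graph = defaultdict(lambda: defaultdict(float))
    for u, v, w in click_edges:
        graph[u][v] += w
        graph[v][u] += w
    for u, v in link_edges:
        graph[u][v] += link_weight
        graph[v][u] += link_weight
    return graph


def random_walk_scores(graph, seeds, restart=0.15, iterations=50):
    """Random walk with restart: at each step the walker follows an edge in
    proportion to its weight, or jumps back to a seed with probability `restart`."""
    seed_mass = {s: 1.0 / len(seeds) for s in seeds}
    scores = dict(seed_mass)
    for _ in range(iterations):
        nxt = defaultdict(float)
        for node, mass in scores.items():
            neighbors = graph.get(node)
            if not neighbors:
                nxt[node] += (1 - restart) * mass  # isolated node keeps its mass
                continue
            total = sum(neighbors.values())
            for nb, w in neighbors.items():
                nxt[nb] += (1 - restart) * mass * w / total
        for s, m in seed_mass.items():
            nxt[s] += restart * m
        scores = dict(nxt)
    return scores


def top_candidates(scores, seeds, k):
    """Return the k highest-scoring unlabeled documents (nodes prefixed 'd:')."""
    docs = [(n, s) for n, s in scores.items() if n.startswith("d:") and n not in seeds]
    docs.sort(key=lambda item: item[1], reverse=True)
    return [n for n, _ in docs[:k]]


# Hypothetical data: clicks link queries to documents, hyperlinks link documents.
click_edges = [("q:cheap flights", "d:a", 10), ("q:cheap flights", "d:b", 8),
               ("q:pasta", "d:c", 5)]
link_edges = [("d:b", "d:e")]
seeds = ["d:a"]  # manually labeled positive seed

graph = build_graph(click_edges, link_edges)
scores = random_walk_scores(graph, seeds)
ranked = top_candidates(scores, seeds, k=2)
```

Here `d:e` receives no clicks at all and is reachable from the seed only through the hyperlink to `d:b`, which is the sparseness problem that hyperlink enrichment is meant to address. Document `d:c` is not connected to the seed and never receives any score.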
Climate Dynamics | 2007
Scott Gaffney; Andrew W. Robertson; Padhraic Smyth; Suzana J. Camargo; Michael Ghil
Archive | 2007
John Canny; Shi Zhong; Scott Gaffney; Chad Brower; Pavel Berkhin; George H. John
International Conference on Computational Linguistics | 2010
Yiping Zhou; Lan Nie; Omid Rouhani-kalleh; Flavian Vasile; Scott Gaffney
Archive | 2011
John Canny; Shi Zhong; Scott Gaffney; Chad Brower; Pavel Berkhin; George H. John
Archive | 2013
Nathan Liu; Scott Gaffney; Jean-Marc Langlois
Archive | 2008
Suju Rajan; Scott Gaffney