Krishna Prasad Chitrapura

Publication

Featured research published by Krishna Prasad Chitrapura.

knowledge discovery and data mining | 2011

Response prediction using collaborative filtering with hierarchies and side-information

Aditya Krishna Menon; Krishna Prasad Chitrapura; Sachin Garg; Deepak Agarwal; Nagaraj Kota

In online advertising, response prediction is the problem of estimating the probability that an advertisement is clicked when displayed on a content publisher's webpage. In this paper, we show how response prediction can be viewed as a problem of matrix completion, and propose to solve it using matrix factorization techniques from collaborative filtering (CF). We point out the two crucial differences between standard CF problems and response prediction, namely the requirement of predicting probabilities rather than scores, and the issue of confidence in matrix entries. We address these issues using a matrix factorization analogue of logistic regression, and by applying a principled confidence-weighting scheme to its objective. We show how this factorization can be seamlessly combined with explicit features or side-information for pages and ads, which lets us combine the benefits of both approaches. Finally, we combat the extreme sparsity of response prediction data by incorporating hierarchical information about the pages and ads into our factorization model. Experiments on three very large real-world datasets show that our model outperforms current state-of-the-art methods for response prediction.
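The abstract does not give the model's objective, so the following is only a minimal sketch of the general idea: a logistic matrix factorization in which each (page, ad) cell is fit to its observed click-through rate and weighted by how many views back it. The function names, the weighting by `views / max_views`, and the hyperparameters are all assumptions made for illustration; side-information and hierarchies are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_logistic_mf(observations, n_pages, n_ads, rank=2,
                    lr=0.1, reg=0.001, epochs=1000, seed=0):
    """Fit page and ad factors by SGD on a confidence-weighted log loss.

    observations: list of (page, ad, clicks, views) with views > 0.
    The per-cell weight views / max_views is an assumed, simplified
    stand-in for a confidence-weighting scheme.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_pages)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_ads)]
    max_views = max(views for _, _, _, views in observations)
    for _ in range(epochs):
        for page, ad, clicks, views in observations:
            u, v = U[page], V[ad]
            p = sigmoid(dot(u, v))       # predicted click probability
            weight = views / max_views   # more views, more confidence
            # Gradient of the weighted log loss with respect to the score.
            g = weight * (p - clicks / views)
            for k in range(rank):
                uk, vk = u[k], v[k]
                u[k] -= lr * (g * vk + reg * uk)
                v[k] -= lr * (g * uk + reg * vk)
    return U, V


def predict(U, V, page, ad):
    """Predicted probability that `ad` is clicked when shown on `page`."""
    return sigmoid(dot(U[page], V[ad]))


# Toy data: (page, ad, clicks, views). The last cell has few views,
# so it contributes less to the objective.
toy = [(0, 0, 40, 100), (0, 1, 5, 100), (1, 0, 5, 100), (1, 1, 8, 20)]
U, V = fit_logistic_mf(toy, n_pages=2, n_ads=2)
print(predict(U, V, 0, 0), predict(U, V, 0, 1))
```

Because the output passes through a sigmoid, predictions are probabilities rather than unbounded scores, which is the first of the two differences from standard CF that the abstract highlights.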

web search and data mining | 2010

Learning URL patterns for webpage de-duplication

Hema Swetha Koppula; Krishna P. Leela; Amit Agarwal; Krishna Prasad Chitrapura; Sachin Garg; Amit Sasturkar

The presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we present a set of techniques to mine rules from URLs and utilize these rules for de-duplication using just URL strings without fetching the content explicitly. Our technique is composed of mining the crawl logs and utilizing clusters of similar pages to extract transformation rules, which are used to normalize URLs belonging to each cluster. Preserving each mined rule for de-duplication is not efficient due to the large number of such rules. We present a machine learning technique to generalize the set of rules, which reduces the resource footprint to be usable at web scale. The rule extraction techniques are robust against website-specific URL conventions. We compare the precision and scalability of our approach with recent efforts in using URLs for de-duplication. Experimental results demonstrate that our approach achieves twice the reduction in duplicates with only half the rules compared to the most recent previous approach. Scalability of the framework is demonstrated by performing a large-scale evaluation on a set of 3 billion URLs, implemented using the MapReduce framework.
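To illustrate what de-duplication "using just URL strings" can look like, here is a minimal sketch that applies a few hand-written normalization rules and groups URLs by their normalized form. It is not the paper's method: the paper mines and generalizes its rules from crawl logs, whereas the rules below (the `IRRELEVANT_PARAMS` set, lowercasing the host, dropping a trailing slash, sorting query parameters) are hypothetical examples chosen for illustration.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters assumed not to affect page content.
IRRELEVANT_PARAMS = {"sid", "sessionid", "ref"}


def normalize(url):
    """Rewrite a URL into a canonical form using a few fixed rules."""
    parts = urlsplit(url)
    # Rule 1: drop parameters assumed to be content-irrelevant.
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in IRRELEVANT_PARAMS]
    # Rule 2: parameter order does not matter, so sort it.
    query.sort()
    # Rule 3: ignore a trailing slash on the path.
    path = parts.path.rstrip("/") or "/"
    # Rule 4: scheme and host are case-insensitive; fragments are dropped.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, urlencode(query), ""))


def group_duplicates(urls):
    """Group URLs that share a normalized form, without fetching any page."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)


urls = [
    "http://Example.com/a/?sid=123&id=7",
    "http://example.com/a?id=7&ref=home",
    "http://example.com/b?id=7",
]
for canonical, members in group_duplicates(urls).items():
    print(canonical, members)
```

Hand-written rules like these break down across sites with different URL conventions, which is the gap that mined, per-cluster rules are meant to address.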

conference on information and knowledge management | 2009

URL normalization for de-duplication of web pages

Amit Agarwal; Hema Swetha Koppula; Krishna P. Leela; Krishna Prasad Chitrapura; Sachin Garg; Pavan Kumar Gm; Chittaranjan Haty; Anirban Roy; Amit Sasturkar

The presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we present a set of techniques to mine rules from URLs and utilize these learnt rules for de-duplication using just URL strings without fetching the content explicitly. Our technique is composed of mining the crawl logs and utilizing clusters of similar pages to extract specific rules from URLs belonging to each cluster. Preserving each mined rule for de-duplication is not efficient due to the large number of specific rules. We present a machine learning technique to generalize the set of rules, which reduces the resource footprint to be usable at web scale. The rule extraction techniques are robust against website-specific URL conventions. We demonstrate the effectiveness of our techniques through experimental evaluation.

conference on information and knowledge management | 2006

Search result summarization and disambiguation via contextual dimensions

Krishna Prasad Chitrapura; Sachindra Joshi; Raghu Krishnapuram

Topic hierarchies are a popular method of summarizing the results obtained in response to a query in various search applications. However, topic hierarchies are rigid when they are pre-defined and somewhat unintuitive when they are dynamically generated by statistical techniques. In this paper, we propose an alternative approach to query disambiguation and result summarization by placing the results in a set of contextual dimensions which can be viewed as facets. For the generic search scenario, we illustrate our approach by using three types of contextual dimensions, namely, concepts, features, and specializations. We use NLP techniques and a data mining algorithm to select distinct contexts.

Archive | 2007