Krishna Bharat
Publications
Featured research published by Krishna Bharat.
Journal of the Association for Information Science and Technology | 2000
Krishna Bharat; Andrei Z. Broder; Jeffrey Dean; Monika Rauch Henzinger
We compare several algorithms for identifying mirrored hosts on the World Wide Web. The algorithms operate on the basis of URL strings and linkage data: the type of information about Web pages easily available from Web proxies and crawlers. Identification of mirrored hosts can improve Web-based information retrieval in several ways: first, by identifying mirrored hosts, search engines can avoid storing and returning duplicate documents. Second, several new information retrieval techniques for the Web make inferences based on the explicit links among hypertext documents—mirroring perturbs their graph model and degrades performance. Third, mirroring information can be used to redirect users to alternate mirror sites to compensate for various failures, and can thus improve the performance of Web browsers and proxies. We evaluated four classes of “top-down” algorithms for detecting mirrored host pairs (that is, algorithms that are based on page attributes such as URL, IP address, and hyperlinks between pages, and not on the page content) on a collection of 140 million URLs (on 230,000 hosts) and their associated connectivity information. Our best approach is one which combines five algorithms and achieved a precision of 0.57 for a recall of 0.86 considering 100,000 ranked host pairs.
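As a rough illustration of what a URL-string-based approach can look like, the sketch below scores host pairs by the overlap of their URL paths and ranks them. This is an assumption-laden toy and not one of the algorithms evaluated in the paper: the function names (`paths_by_host`, `path_overlap`, `rank_host_pairs`) and the choice of Jaccard similarity are hypothetical, and the all-pairs comparison would not scale to hundreds of thousands of hosts.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def paths_by_host(urls):
    """Group URL paths by host name (page content is never fetched)."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.hostname:
            groups[parts.hostname].add(parts.path or "/")
    return groups


def path_overlap(paths_a, paths_b):
    """Jaccard similarity of two path sets (0.0 when both are empty)."""
    union = paths_a | paths_b
    return len(paths_a & paths_b) / len(union) if union else 0.0


def rank_host_pairs(urls):
    """Return (score, host_a, host_b) tuples, highest overlap first."""
    groups = paths_by_host(urls)
    hosts = sorted(groups)
    # Quadratic in the number of hosts: acceptable for a toy, not for a crawl.
    pairs = [
        (path_overlap(groups[a], groups[b]), a, b)
        for i, a in enumerate(hosts)
        for b in hosts[i + 1:]
    ]
    return sorted(pairs, key=lambda p: (-p[0], p[1], p[2]))
```

Given a ranked list like this and a set of known mirror pairs, precision and recall of the kind reported in the abstract could be measured over the top-ranked pairs.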
International World Wide Web Conferences | 2000
Raymie Stata; Krishna Bharat; Farzin Maghoul
We have built a database that provides term vector information for large numbers of pages (hundreds of millions). The basic operation of the database is to take URLs and return term vectors. Compared to computing vectors by downloading pages via HTTP, the Term Vector Database is several orders of magnitude faster, enabling a large class of applications that would be impractical without such a database. This paper describes the Term Vector Database in detail. It also reports on two applications built on top of the database. The first application is an optimization of connectivity-based topic distillation. The second application is a Web page classifier used to annotate results returned by a Web search engine.
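The basic operation described above (URLs in, term vectors out) can be sketched with a small in-memory stand-in. Everything here is an assumption for illustration: the `TermVectorStore` class, its whitespace tokenizer, raw term counts as weights, and the `cosine` helper are hypothetical and say nothing about how the actual Term Vector Database is stored or weighted.

```python
import math
from collections import Counter


class TermVectorStore:
    """Hypothetical in-memory stand-in: maps a URL to a term-count vector."""

    def __init__(self):
        self._vectors = {}

    def add_page(self, url, text):
        # Precompute the vector once so later lookups need no page download.
        self._vectors[url] = Counter(text.lower().split())

    def lookup(self, urls):
        """Take URLs and return their term vectors (None for unknown URLs)."""
        return [self._vectors.get(url) for url in urls]


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

An application such as a page classifier would call `lookup` on a batch of result URLs and compare the returned vectors against per-topic vectors, for example with `cosine`.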
String Processing and Information Retrieval | 2003
Krishna Bharat
The web is the product of a planet-wide, implicit collaboration between content creators on an unprecedented scale. Although authors on the web come from a diverse set of backgrounds and often operate independently, their collective work embodies surprising regularities at various levels. In this paper we describe patterns in both structural and temporal properties of the web.
Archive | 2001
Krishna Bharat
Archive | 2001
Jeffrey A. Dean; Benedict A. Gomes; Krishna Bharat; Georges R. Harik; Monika H. Henzinger
Archive | 2003
Krishna Bharat; Stephen R. Lawrence; Mehran Sahami
Archive | 1998
Krishna Bharat; Monika R. Henzinger
Archive | 2004
Jason Liebman; Krishna Bharat
International Conference on Data Mining | 2001
Krishna Bharat; Bay-Wei Chang; Monika Rauch Henzinger; Matthias Ruhl
Archive | 2012
Michael Curtiss; Krishna Bharat; Michael Schmitt