Krishna Bharat | Researchain

Publication

Featured research published by Krishna Bharat.

Journal of the Association for Information Science and Technology | 2000

A comparison of techniques to find mirrored hosts on the WWW

Krishna Bharat; Andrei Z. Broder; Jeffrey Dean; Monika Rauch Henzinger

We compare several algorithms for identifying mirrored hosts on the World Wide Web. The algorithms operate on the basis of URL strings and linkage data: the type of information about Web pages easily available from Web proxies and crawlers. Identification of mirrored hosts can improve Web-based information retrieval in several ways. First, by identifying mirrored hosts, search engines can avoid storing and returning duplicate documents. Second, several new information retrieval techniques for the Web make inferences based on the explicit links among hypertext documents; mirroring perturbs their graph model and degrades performance. Third, mirroring information can be used to redirect users to alternate mirror sites to compensate for various failures, and can thus improve the performance of Web browsers and proxies. We evaluated four classes of “top-down” algorithms for detecting mirrored host pairs (that is, algorithms that are based on page attributes such as URL, IP address, and hyperlinks between pages, and not on the page content) on a collection of 140 million URLs (on 230,000 hosts) and their associated connectivity information. Our best approach combines five algorithms and achieved a precision of 0.57 for a recall of 0.86 considering 100,000 ranked host pairs.
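The abstract does not spell out the individual algorithms. As a rough illustration of one URL-string signal and of how a ranked list of host pairs is scored by precision and recall, here is a minimal Python sketch. It assumes each host is represented by the set of URL paths crawled on it and that pairs are ranked by the Jaccard overlap of those sets. The function names (`host_paths`, `jaccard`, `rank_host_pairs`, `precision_recall_at_k`) and the Jaccard heuristic are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from urllib.parse import urlsplit


def host_paths(urls):
    """Group URL paths by lowercased host name."""
    paths = {}
    for url in urls:
        parts = urlsplit(url)
        host = parts.hostname  # None for URLs without a host; those are skipped
        if host is None:
            continue
        paths.setdefault(host, set()).add(parts.path or "/")
    return paths


def jaccard(a, b):
    """Overlap of two path sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a or b else 0.0


def rank_host_pairs(urls):
    """Rank every host pair by path-set overlap (hypothetical heuristic)."""
    paths = host_paths(urls)
    scored = [
        (jaccard(paths[h1], paths[h2]), (h1, h2))
        for h1, h2 in combinations(sorted(paths), 2)
    ]
    # Highest overlap first; the sort is stable, so ties keep host-name order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(pair, score) for score, pair in scored]


def precision_recall_at_k(ranked_pairs, true_mirrors, k):
    """Precision and recall of the top-k ranked pairs against known mirrors."""
    top = {frozenset(pair) for pair, _ in ranked_pairs[:k]}
    truth = {frozenset(pair) for pair in true_mirrors}  # pairs are unordered
    hits = len(top & truth)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    urls = [
        "http://a.example.com/docs/index.html",
        "http://a.example.com/docs/faq.html",
        "http://mirror.example.org/docs/index.html",
        "http://mirror.example.org/docs/faq.html",
        "http://other.example.net/home.html",
    ]
    ranked = rank_host_pairs(urls)
    print(ranked[0])  # (('a.example.com', 'mirror.example.org'), 1.0)
    print(precision_recall_at_k(ranked, [("mirror.example.org", "a.example.com")], 1))  # (1.0, 1.0)
```

Scoring every pair is quadratic in the number of hosts; at 230,000 hosts that is roughly 2.6 × 10^10 pairs, so a practical system would first need some way to generate candidate pairs rather than enumerate them all.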

international world wide web conferences | 2000

The term vector database: fast access to indexing terms for Web pages

Raymie Stata; Krishna Bharat; Farzin Maghoul

We have built a database that provides term vector information for large numbers of pages (hundreds of millions). The basic operation of the database is to take URLs and return term vectors. Compared to computing vectors by downloading pages via HTTP, the Term Vector Database is several orders of magnitude faster, enabling a large class of applications that would be impractical without such a database. This paper describes the Term Vector Database in detail. It also reports on two applications built on top of the database. The first application is an optimization of connectivity-based topic distillation. The second application is a Web page classifier used to annotate results returned by a Web search engine.
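To make the basic operation (URL in, term vector out) concrete, here is a toy Python sketch. It is not the paper's design: the abstract does not describe the storage layout or term weighting, so this assumes an in-memory dictionary, a simple lowercase alphanumeric tokenizer, and unit-length term-frequency weights, and the names `TermVectorStore`, `add_page`, `lookup`, and `cosine` are hypothetical. A structure like this would not scale to hundreds of millions of pages; it only shows the interface and how a cosine comparison, of the kind a classifier might use, operates on stored vectors.

```python
import math
import re
from collections import Counter


class TermVectorStore:
    """Hypothetical in-memory store mapping a URL to a sparse term vector."""

    def __init__(self):
        self._vectors = {}

    def add_page(self, url, text):
        """Tokenize page text and store its unit-length term-frequency vector."""
        terms = re.findall(r"[a-z0-9]+", text.lower())
        counts = Counter(terms)
        norm = math.sqrt(sum(c * c for c in counts.values()))
        # Normalizing here makes cosine similarity a plain dot product later.
        self._vectors[url] = {t: c / norm for t, c in counts.items()} if norm else {}

    def lookup(self, url):
        """The basic operation: take a URL, return its term vector (None if unknown)."""
        return self._vectors.get(url)


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (dot product)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


if __name__ == "__main__":
    store = TermVectorStore()
    store.add_page("http://example.com/a", "web search engine index")
    store.add_page("http://example.com/b", "search engine ranking")
    sim = cosine(store.lookup("http://example.com/a"), store.lookup("http://example.com/b"))
    print(round(sim, 3))  # 0.577
```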

string processing and information retrieval | 2003

Patterns on the Web

Krishna Bharat

The web is the product of a planet-wide, implicit collaboration between content creators on an unprecedented scale. Although authors on the web come from a diverse set of backgrounds and often operate independently, their collective work embodies surprising regularities at various levels. In this paper we describe patterns in both structural and temporal properties of the web.

Archive | 2001