Publication


Featured research published by Hector Garcia-Molina.


international world wide web conferences | 2003

The Eigentrust algorithm for reputation management in P2P networks

Sepandar D. Kamvar; Mario T. Schlosser; Hector Garcia-Molina

Peer-to-peer file-sharing networks are currently receiving much attention as a means of sharing and distributing information. However, as recent experience shows, the anonymous, open nature of these networks offers an almost ideal environment for the spread of self-replicating inauthentic files. We describe an algorithm to decrease the number of downloads of inauthentic files in a peer-to-peer file-sharing network that assigns each peer a unique global trust value, based on the peer's history of uploads. We present a distributed and secure method to compute global trust values, based on power iteration. By having peers use these global trust values to choose the peers from whom they download, the network effectively identifies malicious peers and isolates them from the network. In simulations, this reputation system, called EigenTrust, has been shown to significantly decrease the number of inauthentic files on the network, even under a variety of conditions where malicious peers cooperate in an attempt to deliberately subvert the system.
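The core computation described in the abstract can be sketched centrally (the paper's actual protocol is distributed and secure; this standalone version, with assumed local trust scores, only illustrates the power iteration over the normalized trust matrix):

```python
import numpy as np

def eigentrust(local_trust, iters=50):
    """Centralized sketch of EigenTrust: repeatedly apply t <- C^T t,
    where C is the row-normalized matrix of local trust scores.
    This is a simplification of the paper's distributed method."""
    C = np.asarray(local_trust, dtype=float)
    n = C.shape[0]
    # Normalize each peer's local scores so every row sums to 1.
    row_sums = C.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    C = C / row_sums
    t = np.full(n, 1.0 / n)        # start from the uniform distribution
    for _ in range(iters):
        t = C.T @ t                # one power-iteration step
    return t

# Hypothetical example: peer 2 uploads inauthentic files,
# so peers 0 and 1 give it low local trust.
scores = eigentrust([[0, 5, 0],
                     [5, 0, 1],
                     [3, 3, 0]])
```

Peers that receive little trust from well-behaved peers end up with a small global trust value, which is what lets the network isolate them.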


very large data bases | 2004

Combating web spam with trustrank

Zoltán Gyöngyi; Hector Garcia-Molina; Jan O. Pedersen

Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine's results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of fewer than 200 sites.
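The propagation step can be sketched as a PageRank-style iteration whose teleport vector is concentrated on the hand-verified seeds (an assumed simplification of the paper's method, on a toy three-page web):

```python
def trustrank(links, seeds, alpha=0.85, iters=50):
    """Sketch of TrustRank-style trust propagation: biased PageRank
    where the teleport vector is nonzero only on trusted seed pages.
    `links[p]` lists the pages p links to; `seeds` are trusted pages."""
    pages = list(links)
    d = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    t = dict(d)                           # start from the seed distribution
    for _ in range(iters):
        nxt = {p: (1 - alpha) * d[p] for p in pages}
        for p, outs in links.items():
            if not outs:
                continue
            share = alpha * t[p] / len(outs)
            for q in outs:
                nxt[q] += share           # trust flows along hyperlinks
        t = nxt
    return t

# Tiny hypothetical web: seed "a" links to "b"; spam page "s"
# links only to itself and is never linked from a good page.
trust = trustrank({"a": ["b"], "b": ["a"], "s": ["s"]}, seeds={"a"})
```

Because trust originates only at the seeds and flows along links, pages unreachable from any seed (like "s") accumulate no trust, which is the intuition behind filtering spam from a good seed set.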


international conference on data engineering | 2003

Designing a super-peer network

Beverly Yang; Hector Garcia-Molina

A super-peer is a node in a peer-to-peer network that operates both as a server to a set of clients, and as an equal in a network of super-peers. Super-peer networks strike a balance between the efficiency of centralized search, and the autonomy, load balancing and robustness to attacks provided by distributed search. Furthermore, they take advantage of the heterogeneity of capabilities (e.g., bandwidth, processing power) across peers, which recent studies have shown to be enormous. Hence, new and old P2P systems like KaZaA and Gnutella are adopting super-peers in their design. Despite their growing popularity, the behavior of super-peer networks is not well understood. For example, what are the potential drawbacks of super-peer networks? How can super-peers be made more reliable? How many clients should a super-peer take on to maximize efficiency? We examine super-peer networks in detail, gaining an understanding of their fundamental characteristics and performance tradeoffs. We also present practical guidelines and a general procedure for the design of an efficient super-peer network.


next generation information technologies and systems | 1997

The TSIMMIS Approach to Mediation: Data Models and Languages

Hector Garcia-Molina; Yannis Papakonstantinou; Dallan Quass; Anand Rajaraman; Yehoshua Sagiv; Jeffrey D. Ullman; Vasilis Vassalos; Jennifer Widom

TSIMMIS—The Stanford-IBM Manager of Multiple Information Sources—is a system for integrating information. It offers a data model and a common query language that are designed to support the combining of information from many different sources. It also offers tools for generating automatically the components that are needed to build systems for integrating information. In this paper we shall discuss the principal architectural features and their rationale.


international conference on distributed computing systems | 2002

Routing indices for peer-to-peer systems

Arturo Crespo; Hector Garcia-Molina

Finding information in a peer-to-peer system currently requires either a costly and vulnerable central index, or flooding the network with queries. We introduce the concept of routing indices (RIs), which allow nodes to forward queries to neighbors that are more likely to have answers. If a node cannot answer a query, it forwards the query to a subset of its neighbors, based on its local RI, rather than by selecting neighbors at random or by flooding the network by forwarding the query to all neighbors. We present three RI schemes: the compound, the hop-count, and the exponential routing indices. We evaluate their performance via simulations, and find that RIs can improve performance by one or two orders of magnitude vs. a flooding-based system, and by up to 100% vs. a random forwarding system. We also discuss the tradeoffs between the different RI schemes and highlight the effects of key design variables on system performance.
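The forwarding decision at each node can be sketched for the compound routing index (the data layout below is an assumed simplification: each neighbor is mapped to estimated counts of documents per topic reachable through it):

```python
def best_neighbor(routing_index, topic):
    """Sketch of a compound routing index lookup: forward the query to
    the neighbor with the highest estimated document count for the
    query's topic, instead of flooding all neighbors."""
    return max(routing_index, key=lambda n: routing_index[n].get(topic, 0))

# Hypothetical RI at one node: counts of documents per topic
# reachable through neighbors A and B.
ri = {"A": {"databases": 120, "networks": 3},
      "B": {"databases": 10, "networks": 80}}
```

A query about databases would be forwarded to A and one about networks to B; each hop repeats the same local decision, which is how RIs avoid both a central index and network-wide flooding.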


international conference on data engineering | 1995

Object exchange across heterogeneous information sources

Yannis Papakonstantinou; Hector Garcia-Molina; Jennifer Widom

We address the problem of providing integrated access to diverse and dynamic information sources. We explain how this problem differs from the traditional database integration problem and we focus on one aspect of the information integration problem, namely information exchange. We define an object-based information exchange model and a corresponding query language that we believe are well suited for integration of diverse information sources. We describe how the model and language have been used to integrate heterogeneous bibliographic information sources. We also describe two general-purpose libraries we have implemented for object exchange between clients and servers.
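The object-based exchange model can be sketched as self-describing labeled objects (the structure below is assumed from the abstract's description, not the paper's exact syntax): each object carries a label, a type, and a value that may nest further objects.

```python
def oem(label, value):
    """Build a minimal OEM-like self-describing object as a dict.
    Lists of sub-objects become "set" objects; everything else is
    treated as an atomic value. A hypothetical helper, for illustration."""
    kind = "set" if isinstance(value, list) else "atom"
    return {"label": label, "type": kind, "value": value}

# A bibliographic record exchanged between a client and a source.
record = oem("bib-entry", [
    oem("title", "Object exchange across heterogeneous information sources"),
    oem("year", 1995),
])
```

Because every object describes itself, a client can consume records from sources with different schemas without agreeing on a fixed schema in advance.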


international conference on distributed computing systems | 2002

Improving search in peer-to-peer networks

Beverly Yang; Hector Garcia-Molina

Peer-to-peer systems have emerged as a popular way to share huge volumes of data. The usability of these systems depends on effective techniques to find and retrieve data; however, current techniques used in existing P2P systems are often very inefficient. We present three techniques for efficient search in P2P systems. We present the design of these techniques, and then evaluate them using a combination of analysis and experiments over Gnutella, the largest open P2P system in operation. We show that while our techniques maintain the same quality of results as currently used techniques, they use up to 5 times fewer resources. In addition, we designed our techniques to be simple, so that they can be easily incorporated into existing systems for immediate impact.


international conference on management of data | 2003

Extracting structured data from Web pages

Arvind Arasu; Hector Garcia-Molina

Many web sites contain large sets of pages generated using a common template or layout. For example, Amazon lays out the author, title, comments, etc. in the same way in all its book pages. The values used to generate the pages (e.g., the author, title,...) typically come from a database. In this paper, we study the problem of automatically extracting the database values from such template-generated web pages without any learning examples or other similar human input. We formally define a template, and propose a model that describes how values are encoded into pages using a template. We present an algorithm that takes, as input, a set of template-generated pages, deduces the unknown template used to generate the pages, and extracts, as output, the values encoded in the pages. Experimental evaluation on a large number of real input page collections indicates that our algorithm correctly extracts data in most cases.
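The extraction step can be illustrated in miniature. Note the inversion: the paper's algorithm *deduces* the template from a set of pages, whereas the toy sketch below assumes the template is already known, treating its pieces as constant strings and the values as whatever text lies between them.

```python
import re

def extract(template, page):
    """Toy template-based extraction (a simplification of the paper's
    setting, where the template must be deduced): `template` is a list
    of constant string pieces; the returned values are the substrings
    of `page` between consecutive pieces."""
    pattern = "(.*?)".join(re.escape(piece) for piece in template)
    m = re.fullmatch(pattern, page, flags=re.DOTALL)
    return list(m.groups()) if m else None

# A hypothetical template-generated page and its template pieces.
page = "<b>Title:</b> Databases <b>Author:</b> H. Garcia-Molina"
values = extract(["<b>Title:</b> ", " <b>Author:</b> ", ""], page)
```

Given many pages generated from the same template, the constant pieces are exactly the text shared across all pages, which is the intuition the deduction algorithm builds on.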


ACM Transactions on Database Systems | 1992

Scheduling real-time transactions: a performance evaluation

Robert K. Abbott; Hector Garcia-Molina

This thesis has six chapters. Chapter 1 motivates the thesis by describing the characteristics of real-time database systems and the problems of scheduling transactions with deadlines. We also present a short survey of related work and discuss how this thesis has contributed to the state of the art. In Chapter 2 we develop a new family of algorithms for scheduling real-time transactions. Our algorithms have four components: a policy to manage overloads, a policy for scheduling the CPU, a policy for scheduling access to data, i.e., concurrency control, and a policy for scheduling I/O requests on a disk device. In Chapter 3, our scheduling algorithms are evaluated via simulation. Our chief result is that real-time scheduling algorithms can perform significantly better than a conventional non-real-time algorithm. In particular, the Least Slack (static evaluation) policy for scheduling the CPU, combined with the Wait Promote policy for concurrency control, produces the best overall performance. In Chapter 4 we develop a new set of algorithms for scheduling disk I/O requests with deadlines. Our model assumes the existence of a real-time database system which assigns deadlines to individual read and write requests. We also propose new techniques for handling requests without deadlines and requests with deadlines simultaneously. This approach greatly improves the performance of the algorithms and their ability to minimize missed deadlines. In Chapter 5 we evaluate the I/O scheduling algorithms using detailed simulation. Our chief result is that real-time disk scheduling algorithms can perform better than conventional algorithms. In particular, our algorithm FD-SCAN was found to be very effective across a wide range of experiments. Finally, in Chapter 6 we summarize our conclusions and discuss how this work has contributed to the state of the art. Also, we briefly explore some interesting new directions for continuing this research.
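The Least Slack CPU policy highlighted above can be sketched in a few lines (a simplification with assumed transaction tuples, ignoring overload management and concurrency control): slack is how long a transaction can wait and still meet its deadline, and the transaction with the smallest slack runs next.

```python
def least_slack(transactions, now):
    """Sketch of Least Slack scheduling (static evaluation): pick the
    runnable transaction whose slack, deadline minus current time minus
    remaining work, is smallest. Transactions are hypothetical
    (name, deadline, remaining_work) tuples."""
    def slack(txn):
        _, deadline, remaining = txn
        return deadline - now - remaining
    return min(transactions, key=slack)

txns = [("T1", 100, 30),   # slack 70 at now=0
        ("T2", 50, 40),    # slack 10: closest to missing its deadline
        ("T3", 80, 20)]    # slack 60
```

Dispatching T2 first here reflects the policy's bias toward transactions in the most danger of missing their deadlines, the behavior the simulations in Chapter 3 reward.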


ACM Transactions on Internet Technology | 2001

Searching the Web

Arvind Arasu; Junghoo Cho; Hector Garcia-Molina; Andreas Paepcke; Sriram Raghavan

We offer an overview of current Web search engine design. After introducing a generic search engine architecture, we examine each engine component in turn. We cover crawling, local Web page storage, indexing, and the use of link analysis for boosting search performance. The most common design and implementation techniques for each of these components are presented. For this presentation we draw from the literature and from our own experimental search engine testbed. Emphasis is on introducing the fundamental concepts and the results of several performance analyses we conducted to compare different designs.
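The crawling-and-indexing pipeline the survey walks through can be sketched with in-memory stand-ins for the real components (the graph, page texts, and whitespace tokenizer below are assumptions for illustration, not the survey's testbed):

```python
from collections import deque

def crawl(graph, pages, start):
    """Illustrative crawl loop: pop URLs from a frontier, "fetch" the
    page text, add its terms to an inverted index, and enqueue newly
    discovered links. `graph` maps each URL to its outlinks and
    `pages` maps each URL to its text."""
    frontier, seen = deque([start]), {start}
    index = {}                                       # term -> set of URLs
    while frontier:
        url = frontier.popleft()
        for term in pages[url].lower().split():
            index.setdefault(term, set()).add(url)   # indexing
        for link in graph.get(url, []):              # link extraction
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index

idx = crawl({"a": ["b"], "b": []},
            {"a": "web search engines", "b": "search crawling"},
            "a")
```

Real engines layer politeness, refresh policies, persistent page storage, and link analysis on top of this loop, which is exactly the component-by-component tour the survey gives.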

Collaboration


Dive into Hector Garcia-Molina's collaboration.

Top Co-Authors

Junghoo Cho

University of California
