Ronald Fagin
IBM
Publications
Featured research published by Ronald Fagin.
Symposium on Principles of Database Systems | 2001
Ronald Fagin; Amnon Lotem; Moni Naor
Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade that tells how red it is, and a shape grade that tells how round it is. For each attribute, there is a sorted list, which lists each object and its grade under that attribute, sorted by grade (highest grade first). There is some monotone aggregation function, or combining rule, such as min or average, that combines the individual grades to obtain an overall grade. To determine the objects that have the best overall grades, the naive algorithm must access every object in the database, to find its grade under each attribute. Fagin has given an algorithm (“Fagin's Algorithm”, or FA) that is much more efficient. For some distributions on grades, and for some monotone aggregation functions, FA is optimal in a high-probability sense. We analyze an elegant and remarkably simple algorithm (“the threshold algorithm”, or TA) that is optimal in a much stronger sense than FA. We show that TA is essentially optimal, not just for some monotone aggregation functions, but for all of them, and not just in a high-probability sense, but over every database. Unlike FA, which requires large buffers (whose size may grow unboundedly as the database size grows), TA requires only a small, constant-size buffer. We distinguish two types of access: sorted access (where the middleware system obtains the grade of an object in some sorted list by proceeding through the list sequentially from the top), and random access (where the middleware system requests the grade of an object in a list, and obtains it in one step). We consider the scenarios where random access is either impossible, or expensive relative to sorted access, and provide algorithms that are essentially optimal for these cases as well.
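As a concrete illustration, here is a minimal Python sketch of TA under simplifying assumptions: each sorted list is an in-memory list of (object, grade) pairs in descending grade order, every object appears in every list, and all lists have equal length. The names are illustrative, not the paper's pseudocode.

```python
import heapq

def threshold_algorithm(sorted_lists, aggregate, k):
    """Top-k objects under a monotone combining rule such as min."""
    m = len(sorted_lists)
    by_id = [dict(lst) for lst in sorted_lists]   # supports random access
    seen = set()
    top_k = []                                    # min-heap, never more than k
    for depth in range(len(sorted_lists[0])):
        for lst in sorted_lists:
            obj, _ = lst[depth]                   # sorted access
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: fetch this object's grade in every list.
            overall = aggregate([by_id[i][obj] for i in range(m)])
            if len(top_k) < k:
                heapq.heappush(top_k, (overall, obj))
            elif overall > top_k[0][0]:
                heapq.heapreplace(top_k, (overall, obj))
        # The threshold is the best overall grade any unseen object could
        # still achieve, by monotonicity of the aggregation function.
        threshold = aggregate([lst[depth][1] for lst in sorted_lists])
        if len(top_k) == k and top_k[0][0] >= threshold:
            break                                 # stop early; answer is exact
    return sorted(top_k, reverse=True)

color = [("a", 0.9), ("b", 0.8), ("c", 0.1)]
shape = [("b", 0.9), ("a", 0.7), ("c", 0.3)]
print(threshold_algorithm([color, shape], min, 1))   # [(0.8, 'b')]
```

Note that the only state carried across iterations is the size-k heap and the current depth, which is the constant-size buffer the abstract contrasts with FA.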
Journal of Computer and System Sciences | 1999
Ronald Fagin
In a traditional database system, the result of a query is a set of values (those values that satisfy the query). In other data servers, such as a system with queries based on image content, or many text retrieval systems, the result of a query is a sorted list. For example, in the case of a system with queries based on image content, the query might ask for objects that are a particular shade of red, and the result of the query would be a sorted list of objects in the database, sorted by how well the color of the object matches that given in the query. A multimedia system must somehow synthesize both types of queries (those whose result is a set and those whose result is a sorted list) in a consistent manner. In this paper we discuss the solution adopted by Garlic, a multimedia information system being developed at the IBM Almaden Research Center. This solution is based on “graded” (or “fuzzy”) sets. Issues of efficient query evaluation in a multimedia system are very different from those in a traditional database system. This is because the multimedia system receives answers to subqueries from various subsystems, which can be accessed only in limited ways. For the important class of queries that are conjunctions of atomic queries (where each atomic query might be evaluated by a different subsystem), the naive algorithm must retrieve a number of elements that is linear in the database size. In contrast, in this paper an algorithm is given, which has been implemented in Garlic, such that if the conjuncts are independent, then with arbitrarily high probability, the total number of elements retrieved in evaluating the query is sublinear in the database size (in the case of two conjuncts, it is of the order of the square root of the database size). It is also shown that for such queries, the algorithm is optimal. The matching upper and lower bounds are robust, in the sense that they hold under almost any reasonable rule (including the standard min rule of fuzzy logic) for evaluating the conjunction. Finally, we find a query that is provably hard, in the sense that the naive linear algorithm is essentially optimal.
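For contrast with the threshold algorithm above, here is a similarly hedged sketch of the two-phase idea behind the algorithm, using the same illustrative (object, grade)-list representation: sorted access in parallel until k objects have been seen in every list, then random access to complete the grades of everything seen.

```python
def fagins_algorithm(sorted_lists, aggregate, k):
    """Two-phase evaluation of a top-k query under a monotone rule."""
    m = len(sorted_lists)
    by_id = [dict(lst) for lst in sorted_lists]
    seen_in = {}            # object -> indices of lists it has appeared in;
    depth = 0               # this buffer can grow with the database size
    # Phase 1: sorted access in parallel until at least k objects have been
    # seen in every one of the m lists.
    while sum(1 for s in seen_in.values() if len(s) == m) < k:
        for i, lst in enumerate(sorted_lists):
            obj, _ = lst[depth]
            seen_in.setdefault(obj, set()).add(i)
        depth += 1
    # Phase 2: random access to complete the grades of everything seen, then
    # report the k best overall grades; monotonicity makes this exact.
    scored = [(aggregate([by_id[i][obj] for i in range(m)]), obj)
              for obj in seen_in]
    return sorted(scored, reverse=True)[:k]
```

For independent conjuncts, phase 1 stops after roughly the square root of the database size in the two-list case, which is the sublinear bound the abstract cites.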
Journal of the ACM | 1983
Catriel Beeri; Ronald Fagin; David Maier; Mihalis Yannakakis
A class of database schemes, called acyclic, was recently introduced. It is shown that this class has a number of desirable properties. In particular, several desirable properties that have been studied by other researchers in very different terms are all shown to be equivalent to acyclicity. In addition, several equivalent characterizations of the class in terms of graphs and hypergraphs are given, and a simple algorithm for determining acyclicity is presented. Also given are several equivalent characterizations of those sets M of multivalued dependencies such that M is the set of multivalued dependencies that are the consequences of a given join dependency. Several characterizations for a conflict-free (in the sense of Lien) set of multivalued dependencies are provided.
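One standard form of such a simple test is the GYO ear-removal procedure: repeatedly delete attributes that occur in only one relation scheme, and schemes contained in another scheme; the hypergraph is acyclic exactly when everything can be deleted. A short sketch, with each scheme encoded as a Python set of attributes (an illustrative encoding, not the paper's notation):

```python
def is_acyclic(schemes):
    """GYO reduction: a hypergraph is acyclic iff repeatedly removing
    attributes unique to one scheme, and schemes contained in another,
    empties it."""
    edges = [set(s) for s in schemes]
    changed = True
    while changed:
        changed = False
        for e in edges:
            for v in list(e):
                if sum(1 for f in edges if v in f) == 1:   # unique attribute
                    e.discard(v)
                    changed = True
        for i, e in enumerate(edges):
            if any(i != j and e <= f for j, f in enumerate(edges)):
                edges.pop(i)                               # contained scheme
                changed = True
                break
    return all(not e for e in edges)

print(is_acyclic([{"A", "B", "C"}, {"B", "C", "D"}, {"C", "D", "E"}]))  # True
print(is_acyclic([{"A", "B"}, {"B", "C"}, {"C", "A"}]))                 # False
```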
ACM Transactions on Database Systems | 1977
Ronald Fagin
A new type of dependency, which includes the well-known functional dependencies as a special case, is defined for relational databases. By using this concept, a new (“fourth”) normal form for relation schemata is defined. This fourth normal form is strictly stronger than Codd's “improved third normal form” (or “Boyce-Codd normal form”). It is shown that every relation schema can be decomposed into a family of relation schemata in fourth normal form without loss of information (that is, the original relation can be obtained from the new relations by taking joins).
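A small worked example of the lossless decomposition, with invented data: in a relation over (emp, child, skill), children and skills vary independently per employee (the multivalued dependency emp ->> child), which violates fourth normal form, and splitting on it loses no information.

```python
# Invented sample data: employee "ann" has children bob and eve and skills
# sql and c, independently of one another (the MVD emp ->> child holds).
rows = {("ann", "bob", "sql"), ("ann", "bob", "c"),
        ("ann", "eve", "sql"), ("ann", "eve", "c")}

# Decompose on the MVD into (emp, child) and (emp, skill): both are in 4NF.
emp_child = {(e, c) for (e, c, s) in rows}
emp_skill = {(e, s) for (e, c, s) in rows}

# The natural join on emp recovers the original relation exactly.
rejoined = {(e, c, s) for (e, c) in emp_child
                      for (e2, s) in emp_skill if e == e2}
assert rejoined == rows
```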
ACM Transactions on Database Systems | 1979
Ronald Fagin; Jürg Nievergelt; Nicholas Pippenger; H. Raymond Strong
Extendible hashing is a new access technique, in which the user is guaranteed no more than two page faults to locate the data associated with a given unique identifier, or key. Unlike conventional hashing, extendible hashing has a dynamic structure that grows and shrinks gracefully as the database grows and shrinks. This approach simultaneously solves the problem of making hash tables that are extendible and of making radix search trees that are balanced. We study, by analysis and simulation, the performance of extendible hashing. The results indicate that extendible hashing provides an attractive alternative to other access methods, such as balanced trees.
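The following compact sketch conveys the structure: a directory of 2^d pointers into buckets, where the global depth d grows by doubling the directory, and an overflowing bucket splits on one more hash bit. The bucket capacity, class names, and use of Python's built-in hash are simplifying assumptions; a lookup touches at most the directory and one bucket page, matching the two-page-fault guarantee.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth        # local depth: hash bits this bucket agrees on
        self.items = {}

class ExtendibleHash:
    CAPACITY = 4                  # keys per bucket, standing in for one page

    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _bucket(self, key):
        return self.directory[hash(key) & ((1 << self.global_depth) - 1)]

    def get(self, key):
        # One directory probe plus one bucket page: at most two page faults.
        return self._bucket(key).items[key]

    def put(self, key, value):
        b = self._bucket(key)
        if key in b.items or len(b.items) < self.CAPACITY:
            b.items[key] = value
            return
        if b.depth == self.global_depth:   # no spare bit: double the directory
            self.directory += self.directory
            self.global_depth += 1
        # Split the overflowing bucket on one more hash bit.
        b.depth += 1
        bit = 1 << (b.depth - 1)
        new = Bucket(b.depth)
        for i, slot in enumerate(self.directory):
            if slot is b and (i & bit):
                self.directory[i] = new
        for k in [k for k in b.items if hash(k) & bit]:
            new.items[k] = b.items.pop(k)
        self.put(key, value)               # retry; may trigger another split
```

Deletion with bucket merging and directory halving, which gives the graceful shrinking the abstract describes, is omitted for brevity.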
Very Large Data Bases | 2002
Lucian Popa; Yannis Velegrakis; Mauricio A. Hernández; Renée J. Miller; Ronald Fagin
We present a novel framework for mapping between any combination of XML and relational schemas, in which a high-level, user-specified mapping is translated into semantically meaningful queries that transform source data into the target representation. Our approach works in two phases. In the first phase, the high-level mapping, expressed as a set of inter-schema correspondences, is converted into a set of mappings that capture the design choices made in the source and target schemas (including their hierarchical organization as well as their nested referential constraints). The second phase translates these mappings into queries over the source schemas that produce data satisfying the constraints and structure of the target schema, and preserving the semantic relationships of the source. Nonnull target values may need to be invented in this process. The mapping algorithm is complete in that it produces all mappings that are consistent with the schema constraints. We have implemented the translation algorithm in Clio, a schema mapping tool, and present our experience using Clio on several real schemas.
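The flavor of the second phase can be suggested with a toy example (not Clio's actual code or API): the target schema below requires a department key the source does not supply, so a nonnull value is invented as a Skolem-style function of the source values that determine it, keeping the invented key consistent across target relations. All schema and function names here are hypothetical.

```python
source = [{"emp": "ann", "dept": "db"}, {"emp": "bob", "dept": "db"}]

def skolem(*args):
    # Deterministic: the same source values always yield the same invented id,
    # so references across target relations stay consistent.
    return "ID_" + "_".join(args)

# Hypothetical target schema: Dept(deptid, name) and Emp(name, deptid),
# where deptid exists only in the target and must be invented.
target_dept = {(skolem("dept", r["dept"]), r["dept"]) for r in source}
target_emp = {(r["emp"], skolem("dept", r["dept"])) for r in source}

print(target_dept)   # one invented key for the one department
print(target_emp)    # both employees reference that same key
```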
Symposium on Principles of Database Systems | 2003
Ronald Fagin; Phokion G. Kolaitis; Lucian Popa
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Given a source instance, there may be many solutions to the data exchange problem, that is, many target instances that satisfy the constraints of the data exchange problem. In an earlier article, we identified a special class of solutions that we call universal. A universal solution has homomorphisms into every possible solution, and hence is a “most general possible” solution. Nonetheless, given a source instance, there may be many universal solutions. This naturally raises the question of whether there is a “best” universal solution, and hence a best solution for data exchange. We answer this question by considering the well-known notion of the core of a structure, a notion that was first studied in graph theory, and has also played a role in conjunctive-query processing. The core of a structure is the smallest substructure that is also a homomorphic image of the structure. All universal solutions have the same core (up to isomorphism); we show that this core is also a universal solution, and hence the smallest universal solution. The uniqueness of the core of a universal solution together with its minimality make the core an ideal solution for data exchange. We investigate the computational complexity of producing the core. Well-known results by Chandra and Merlin imply that, unless P = NP, there is no polynomial-time algorithm that, given a structure as input, returns the core of that structure as output. In contrast, in the context of data exchange, we identify natural and fairly broad conditions under which there are polynomial-time algorithms for computing the core of a universal solution. We also analyze the computational complexity of the following decision problem that underlies the computation of cores: given two graphs G and H, is H the core of G? Earlier results imply that this problem is both NP-hard and coNP-hard. Here, we pinpoint its exact complexity by establishing that it is a DP-complete problem. Finally, we show that the core is the best among all universal solutions for answering existential queries, and we propose an alternative semantics for answering queries in data exchange settings.
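For very small graphs, the core can be found by brute force, which also illustrates why the general problem is intractable: the sketch below enumerates all vertex maps and keeps the smallest homomorphic image. The representation (directed edges as pairs, with an undirected edge stored in both directions) is an illustrative choice.

```python
from itertools import product

def core(nodes, edges):
    """Smallest homomorphic image of the graph among its subgraphs,
    found by brute force; feasible only for very small graphs."""
    nodes, edges = list(nodes), set(edges)
    best_nodes, best_edges = set(nodes), set(edges)
    for image in product(nodes, repeat=len(nodes)):
        f = dict(zip(nodes, image))
        # f is a homomorphism if it sends every edge of G to an edge of G.
        if all((f[u], f[v]) in edges for (u, v) in edges):
            if len(set(image)) < len(best_nodes):
                best_nodes = set(image)
                best_edges = {(f[u], f[v]) for (u, v) in edges}
    return best_nodes, best_edges

# An undirected path a-b-c (each edge stored in both directions) retracts
# onto a single edge, which is its core.
print(core(["a", "b", "c"],
           [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]))
```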
International Conference on Management of Data | 2001
Renée J. Miller; Mauricio A. Hernández; Laura M. Haas; Lingling Yan; C. T. Howard Ho; Ronald Fagin; Lucian Popa
Clio is a system for managing and facilitating the complex tasks of heterogeneous data transformation and integration. In Clio, we have collected together a powerful set of data management techniques that have proven invaluable in tackling these difficult problems. In this paper, we present the underlying themes of our approach and present a brief case study.
Symposium on Principles of Database Systems | 1983
Ronald Fagin; Jeffrey D. Ullman; Moshe Y. Vardi
We suggest here a methodology for updating databases with integrity constraints and rules for deriving inexplicit information. First we consider the problem of updating arbitrary theories by inserting into them or deleting from them arbitrary sentences. The solution involves two key ideas: when replacing an old theory by a new one, we wish to minimize the change in the theory; and when there are several theories that involve minimal changes, we look for a new theory that reflects that ambiguity. The methodology is also adapted to updating databases, where different facts can carry different priorities, and to updating user views.
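In the toy special case where the database is a finite set of atomic facts over a known universe and the update is a constraint given as a predicate, the minimal-change idea can be rendered directly; when several results come back, that is exactly the ambiguity the abstract mentions. The fact names and predicate encoding are illustrative.

```python
from itertools import combinations

def minimal_updates(db, universe, constraint):
    """All fact sets over `universe` satisfying `constraint` that differ
    minimally (by symmetric difference) from `db`."""
    subsets = [set(c) for r in range(len(universe) + 1)
                      for c in combinations(universe, r)]
    satisfying = [s for s in subsets if constraint(s)]
    best = min(len(db ^ s) for s in satisfying)
    return [s for s in satisfying if len(db ^ s) == best]

# Deleting p while the constraint "p or r" must keep holding forces adding r.
print(minimal_updates({"p", "q"}, ["p", "q", "r"],
                      lambda s: ("p" in s or "r" in s) and "p" not in s))
# -> [{'q', 'r'}]

# When exactly one of q, r must hold, two updates are equally minimal, and
# both are reported, reflecting the ambiguity.
print(minimal_updates({"p"}, ["p", "q", "r"],
                      lambda s: ("q" in s) != ("r" in s)))
# -> [{'p', 'q'}, {'p', 'r'}]
```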
International Conference on Management of Data | 1977
Catriel Beeri; Ronald Fagin; John H. Howard
We investigate the inference rules that can be applied to functional and multivalued dependencies that exist in a database relation. Three types of rules are discussed. First, we list the well known rules for functional dependencies. Then we investigate the rules for multivalued dependencies. It is shown that for each rule for functional dependencies the same rule or a similar rule holds for multivalued dependencies. There is, however, one additional rule for multivalued dependencies that has no parallel among the rules for functional dependencies. Finally, we present rules that involve functional and multivalued dependencies together. The main result of the paper is that the rules presented are complete for the family of functional and multivalued dependencies.
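A standard way to put the functional-dependency rules to work is the attribute-closure computation, which decides whether a dependency X -> Y follows from a given set of FDs; a short sketch follows (the multivalued-dependency rules need more machinery and are not shown). The encoding of FDs as pairs of attribute sets is an illustrative choice.

```python
def closure(attrs, fds):
    """Close `attrs` under a list of FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """Does lhs -> rhs follow from fds? Equivalent to derivability by the
    complete rule set for functional dependencies."""
    return set(rhs) <= closure(lhs, fds)

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(implies(fds, {"A"}, {"C"}))   # True: transitivity is derivable
print(implies(fds, {"B"}, {"A"}))   # False: no rule yields B -> A
```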