Edward L. Robertson
Indiana University Bloomington
Publication
Featured research published by Edward L. Robertson.
data warehousing and knowledge discovery | 2001
Catharine M. Wyss; Chris Giannella; Edward L. Robertson
The problem of discovering functional dependencies (FDs) from an existing relation instance has received considerable attention in the database research community. To date, even the most efficient solutions have exponential complexity in the number of attributes of the instance. We develop an algorithm, FastFDs, for solving this problem based on a depth-first, heuristic-driven (DFHD) search for finding minimal covers of hypergraphs. The technique of reducing the FD discovery problem to the problem of finding minimal covers of hypergraphs was applied previously by Lopes et al. in the algorithm Dep-Miner. Dep-Miner employs a levelwise search for minimal covers, whereas FastFDs uses DFHD search. We report several tests on distinct benchmark relation instances involving Dep-Miner, FastFDs, and TANE. Our experimental results indicate that DFHD search is more efficient than Dep-Miner's levelwise search or TANE's partitioning approach for many of these benchmark instances.
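The reduction mentioned in the abstract above (FD discovery as a search for minimal covers of hypergraphs) can be illustrated with a small brute-force sketch. This is not FastFDs or Dep-Miner: it enumerates candidate covers exhaustively rather than using a depth-first heuristic or levelwise search, and the function names `difference_sets`, `minimal_covers`, and `discover_fds` are hypothetical, chosen only for this example.

```python
from itertools import combinations


def difference_sets(rows, attrs):
    """Collect the attribute sets on which some pair of rows disagrees."""
    diffs = set()
    for r, s in combinations(rows, 2):
        d = frozenset(a for a in attrs if r[a] != s[a])
        if d:
            diffs.add(d)
    return diffs


def minimal_covers(family, universe):
    """Brute-force minimal hitting sets of `family`, drawn from `universe`.

    Candidates are tried in order of increasing size, so any candidate that
    contains an already-found cover is not minimal and is skipped.
    """
    covers = []
    for size in range(len(universe) + 1):
        for cand in combinations(universe, size):
            c = set(cand)
            if any(set(m) <= c for m in covers):
                continue
            if all(c & d for d in family):
                covers.append(cand)
    return covers


def discover_fds(rows, attrs):
    """Return minimal FDs as (lhs_tuple, rhs_attribute) pairs."""
    diffs = difference_sets(rows, attrs)
    fds = []
    for rhs in attrs:
        # Keep only the difference sets that involve rhs, with rhs removed.
        # An empty member means two rows differ on rhs alone, so nothing
        # can determine rhs and no cover exists.
        family = [d - {rhs} for d in diffs if rhs in d]
        universe = [a for a in attrs if a != rhs]
        for lhs in minimal_covers(family, universe):
            fds.append((lhs, rhs))
    return fds
```

The exhaustive enumeration is exponential in the number of attributes, which is the cost the search strategies compared in the abstract aim to reduce in practice.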
Theoretical Computer Science | 1981
Lawrence H. Landweber; Richard J. Lipton; Edward L. Robertson
The relationship between resource bounded deterministic and nondeterministic complexity classes has been extensively studied. For polynomial time, the associated question P = NP? is particularly important because of the large number of problems of practical interest that can be solved by nondeterministic polynomial time bounded devices. If P = NP, then these problems all have deterministic polynomial time solutions whereas otherwise only exponential time solutions exist. Furthermore, the class NP contains complete problems to which all other members of NP can be reduced [S, ?,16] and P = NP if and only if one of these complete problems is in P. With few exceptions, most intuitively appealing members of NP have been shown to be complete. In this paper, we study the structure of sets in NP. We develop a number of simple tools which facilitate the study of the relative complexity of sets with respect to polynomial time reducibility. The method yields the existence of a minimal pair (with respect to P) of sets A, B ∈ NP which are not complete. This strengthens earlier results of Ladner [10] (minimal pair, no upper bound) and Machtey [15] (minimal pair, subexponential but not necessarily in NP). In addition, the method can be used to construct partial orders of degrees with respect to polynomial time reducibility.
ACM Transactions on Database Systems | 2005
Catharine M. Wyss; Edward L. Robertson
In this article, we develop a relational algebra for metadata integration, Federated Interoperable Relational Algebra (FIRA). FIRA has many desirable properties such as compositionality, closure, a deterministic semantics, a modest complexity, support for nested queries, a subalgebra equivalent to canonical Relational Algebra (RA), and robustness under certain classes of schema evolution. Beyond this, FIRA queries are capable of producing fully dynamic output schemas, where the number of relations and/or the number of columns in relations of the output varies dynamically with the input instance. Among existing query languages for relational metadata integration, only FIRA provides generalized dynamic output schemas, where the values in any (fixed) number of input columns can determine output schemas. Further contributions of this article include development of an extended relational model for metadata integration, the Federated Relational Data Model, which is strictly downward compatible with the relational model. Additionally, we define the notion of Transformational Completeness for relational query languages and postulate FIRA as a canonical transformationally complete language. We also give a declarative, SQL-like query language that is equivalent to FIRA, called Federated Interoperable Structured Query Language (FISQL). While our main contributions are conceptual, the federated model, FISQL/FIRA, and the notion of transformational completeness nevertheless have important applications to data integration and OLAP. In addition to summarizing these applications, we illustrate the use of FIRA to optimize FISQL queries using rule-based transformations that directly parallel their canonical relational counterparts. We conclude the article with an extended discussion of related work as well as an indication of current and future work on FISQL/FIRA.
advances in databases and information systems | 2004
Chris Giannella; Edward L. Robertson
We examine the issue of how to measure the degree to which a functional dependency (FD) is approximate. The primary motivation lies in the fact that approximate FDs represent potentially interesting patterns existent in a table. Their discovery is a valuable data mining problem. However, before algorithms can be developed, a measure must be defined quantifying their approximation degree. First we develop an approximation measure by axiomatizing the following intuition: the degree to which X → Y is approximate in a table T is the degree to which T determines a function from Π_X(T) to Π_Y(T). We prove that a unique unnormalized measure satisfies these axioms up to a multiplicative constant. Next we compare the measure developed with two other measures from the literature. In all but one case, we show that the measures can be made to differ as much as possible within normalization. We examine these measures on several real datasets and observe that many of the theoretically possible extreme differences do not bear themselves out. We offer some conclusions as to particular situations where certain measures are more appropriate than others.
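To make the idea of an approximation measure concrete, here is a minimal sketch of one widely used error measure, usually called g3: the smallest fraction of rows that must be removed from T so that X → Y holds exactly. This is offered only as an illustration. The abstract above does not name the measures it studies, so g3 should not be read as the axiomatized measure or as one of the two it is compared against. The function name `g3_error` is hypothetical.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Fraction of rows to delete so that lhs -> rhs holds exactly.

    rows: list of dicts; lhs, rhs: lists of attribute names.
    Within each group of rows that agree on lhs, keep the rows carrying
    the most frequent rhs value and count the rest as violations.
    """
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        groups[x][y] += 1
    kept = sum(max(counts.values()) for counts in groups.values())
    return (len(rows) - kept) / len(rows)
```

A value of 0 means the dependency holds exactly. Larger values mean T is further from determining a function from the X-projection to the Y-projection.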
Acta Informatica | 1981
Paul Walton Purdom; Cynthia A. Brown; Edward L. Robertson
Summary: The order in which the variables are tested in a backtrack program can have a major effect on its running time. The best search order usually varies among the branches of the backtrack tree, so the number of possible search orders can be astronomical. We present an algorithm that chooses a search order dynamically by investigating all possibilities for k levels below the current level, extending beyond k levels wherever possible by setting the variables that have unique forced values. The algorithm takes time O(n^(k+1)) to process a node. For k=2 and binary variables the analysis for selecting the next variable to introduce into the backtrack tree makes complete use of the information contained in the two-level investigations. For larger k or variables of higher degree there is no polynomial-time algorithm that makes complete use of the k-level investigations to limit searching (unless P=NP). The search rearrangement algorithm is closely related to constraint propagation. Experimental studies on conjunctive normal form predicates confirm that 1-level search rearrangement saves a great deal of time compared to 0-level (ordinary backtracking), and show that 2-level saves time over 1-level on large problems. For such problems with 256 variables 2-level is better than 1-level by a factor of two.
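A minimal sketch in the spirit of 1-level dynamic search rearrangement on CNF predicates follows. At each node it tries every value of every unset variable, then branches on the variable with the fewest values that do not immediately falsify a clause. This is an illustration under stated assumptions, not the authors' algorithm: it does not implement the k-level investigation or the extension through forced values, and the names `falsified`, `viable_values`, and `solve` are hypothetical. Clauses are assumed to be lists of nonzero integers, where `-v` denotes the negation of variable `v`.

```python
def falsified(clauses, assignment):
    """True if some clause has all of its literals assigned and false."""
    return any(
        all(abs(lit) in assignment and assignment[abs(lit)] != (lit > 0)
            for lit in clause)
        for clause in clauses
    )


def viable_values(clauses, assignment, var):
    """Values of var that do not immediately falsify a clause."""
    return [value for value in (False, True)
            if not falsified(clauses, {**assignment, var: value})]


def solve(clauses, variables, assignment=None):
    """Backtracking search; returns a satisfying assignment or None."""
    assignment = dict(assignment or {})
    if falsified(clauses, assignment):
        return None
    unset = [v for v in variables if v not in assignment]
    if not unset:
        return assignment
    # One-level lookahead: branch on the variable with the fewest viable
    # values. Zero viable values prunes the node; one is a forced value.
    options = {v: viable_values(clauses, assignment, v) for v in unset}
    var = min(unset, key=lambda v: len(options[v]))
    for value in options[var]:
        result = solve(clauses, variables, {**assignment, var: value})
        if result is not None:
            return result
    return None
```

The lookahead at each node costs one falsification check per unset variable and value, which is the extra per-node work that the dynamic ordering trades for a smaller search tree.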
symposium on principles of database systems | 1994
Latha S. Colby; Edward L. Robertson; Lawrence V. Saxton; Dirk Van Gucht
We present a language for querying list-based complex objects. The language is shown to express precisely the polynomial-time generic list-object functions. The iteration mechanism of the language is based on a new approach wherein, in addition to the list over which the iteration is performed, a second list is used to control the number of iteration steps. During the iteration, the intermediate results can be moved to the output list as well as reinserted into the list being iterated over. A simple syntactic constraint allows the growth rate of the intermediate results to be tightly controlled which, in turn, restricts the expressiveness of the language to PTIME.
conference on information and knowledge management | 2005
Catharine M. Wyss; Edward L. Robertson
PIVOT is an important relational operation that allows data in rows to be exchanged for columns. Although most current relational database management systems support PIVOT-type operations, to date a purely formal, algebraic characterization of PIVOT has been lacking. In this paper, we present a characterization in terms of extended relational algebra operators τ (transpose), Π (drop projection), and μ (unique optimal tuple merge). This enables us to (1) draw parallels with PIVOT and existing operators employed in Dynamic Data Mapping Systems (DDMS), (2) formally characterize invertible PIVOT instances, and (3) provide complexity results for PIVOT-type operations. These contributions are an important part of ongoing work on formal models for relational OLAP.
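For readers unfamiliar with the operation itself, the following plain-Python sketch shows what a PIVOT-type transformation does to a small table: values found in one column become column names of the output. It is only an illustration of the row-to-column exchange and does not implement the τ, Π, and μ operators of the paper. The function name `pivot` and its parameters are hypothetical.

```python
def pivot(rows, key, column, value):
    """Turn (key, column, value) rows into one wide row per key.

    rows: list of dicts. The values found under `column` become the
    column names of the output, so the output schema depends on the data.
    If a (key, column) pair repeats, the later value overwrites the earlier.
    """
    wide = {}
    for row in rows:
        out_row = wide.setdefault(row[key], {key: row[key]})
        out_row[row[column]] = row[value]
    return list(wide.values())
```

For example, rows such as `{"region": "east", "quarter": "Q1", "sales": 10}` are gathered into one row per region with a column for each quarter that occurs in the input.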
conference on software engineering education and training | 2001
Dennis P. Groth; Edward L. Robertson
Process considerations are a central part of the material for a software engineering course; they are also central to accomplishing full-lifecycle, team-based systems development projects in such a course. This paper discusses the ways in which we have achieved an effective process structure within an academic context of full-year project courses. The key features are a kernel project plan and a process management mechanism. The project plan is a schedule including eight milestones with fixed due dates and quite explicit deliverables. The management is accomplished through an advanced full-year course, whose participants guide the project teams through the process.
Proceedings of the 1998 workshop on New paradigms in information visualization and manipulation | 1998
Dennis P. Groth; Edward L. Robertson
The rapid proliferation and growth of database management systems has resulted in the retention of massive amounts of information for data processing and analysis needs. Many data processing requirements can be satisfied through the use of traditional database languages, such as SQL. These languages retrieve and present query results in record-oriented tables. The table of records format is best for presenting every record, but it cannot give a feel for the overall character of the data set.
international semantic web conference | 2004
Edward L. Robertson
This paper introduces and develops an algebra over triadic relations (relations whose contents are only triples). In essence, the algebra is a severely restricted variation of relational algebra (RA) that is defined over relations with exactly three attributes and is closed for the same set of relations. In particular, arbitrary joins and Cartesian products are replaced by a single three-way join. Ternary relations are important because they provide the minimal, and thus most uniform, way to encode semantics wherein metadata may be treated uniformly with regular data; this fact has been recognized in the choice of triples to formalize the Semantic Web via RDF. Indeed, algebraic definitions corresponding to certain of these formalisms will be shown as examples. An important aspect of this algebra is an encoding of triples, implementing a kind of reification. The algebra is shown to be equivalent, over non-reified values, to a restriction of Datalog and hence to a fragment of first order logic. Furthermore, the algebra requires only two operators if certain fixed infinitary constants (similar to Tarski's identity) are present. In this case, all structure is represented only in the data, that is, in the encodings that these infinitary constants represent.