Frank Wm. Tompa
University of Waterloo
Publication
Featured research published by Frank Wm. Tompa.
ACM Transactions on Information Systems | 1989
Frank Wm. Tompa
Hypertext and other page-oriented databases cannot be schematized in the same manner as record-oriented databases. As a result, most hypertext databases implicitly employ a data model based on a simple, unrestricted graph. This paper presents a hypergraph model for maintaining page-oriented databases in such a way that some of the functionality traditionally provided by database schemes can be available to hypertext databases. In particular, the model formalizes identification of commonality in the structure, set-at-a-time database access, and definition of user-specific views. An efficient implementation of the model is also discussed.
Document Engineering | 2001
Airi Salminen; Frank Wm. Tompa
The shift from SGML to XML has created new demands for managing structured documents. Many XML documents will be transient representations for the purpose of data exchange between different types of applications, but there will also be a need for effective means to manage persistent XML data as a database. In this paper we explore requirements for an XML database management system. The purpose of the paper is not to suggest a single type of system covering all necessary features. Instead the purpose is to initiate discussion of the requirements arising from document collections, to offer a context in which to evaluate current and future solutions, and to encourage the development of proper models and systems for XML database management. Our discussion addresses issues arising from data modelling, data definition, and data manipulation.
International Conference on Applications of Databases | 1994
G. E. Blake; Mariano P. Consens; P. Kilpeläinen; P. Å. Larson; T. Snider; Frank Wm. Tompa
Combined text and relational database support is increasingly recognized as an emerging need of industry, spanning applications requiring text fields as parts of their data (e.g., for customer support) to those augmenting primary text resources by conventional relational data (e.g., for publication control). In this paper, we propose extensions to SQL that provide flexible and efficient access to structured text described by SGML. We also propose an architecture to support a text/relational database management system as a federated database environment, where component databases are accessed via “agents”: SQL agents that translate standard or extended SQL queries into vendor-specific dialects, and text agents that process text sub-queries on full-text search engines.
ACM Transactions on Database Systems | 1981
Tok Wang Ling; Frank Wm. Tompa; Tiko Kameda
In this paper, we show that some Codd third normal form relations may contain “superfluous” attributes because the definitions of transitive dependency and prime attribute are inadequate when applied to sets of relations. To correct this, an improved third normal form is defined and an algorithm is given to construct a set of relations from a given set of functional dependencies in such a way that the superfluous attributes are guaranteed to be removed. This new normal form is compared with other existing definitions of third normal form, and the deletion normalization method proposed is shown to subsume the decomposition method of normalization.
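As a rough illustration of the reasoning over functional dependencies that normalization relies on, the sketch below computes the standard attribute closure of a set of attributes. It is not the deletion normalization algorithm from the paper; the function name and the sample dependencies are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Return every attribute functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire a dependency once its whole left side is in the closure.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# Hypothetical dependencies: A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
closure_of_a = attribute_closure({"A"}, fds)
closure_of_b = attribute_closure({"B"}, fds)
```

An attribute on the right-hand side of a dependency is a candidate for being redundant when it can still be derived from the remaining dependencies, which is the kind of check a closure computation supports.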
International Conference on Management of Data | 2007
David DeHaan; Frank Wm. Tompa
Most contemporary database systems perform cost-based join enumeration using some variant of System R's bottom-up dynamic programming method. The notable exceptions are systems based on the top-down transformational search of Volcano/Cascades. As recent work has demonstrated, bottom-up dynamic programming can attain optimality with respect to the shape of the join graph; no comparable results have been published for transformational search. However, transformational systems leverage benefits of top-down search not available to bottom-up methods. In this paper we describe a top-down join enumeration algorithm that is optimal with respect to the join graph. We present performance results demonstrating that a combination of optimal enumeration with search strategies such as branch-and-bound yields an algorithm significantly faster than those previously described in the literature. Although our algorithm enumerates the search space top-down, it does not rely on transformations and thus retains much of the architecture of traditional dynamic programming. As such, this work provides a migration path for existing bottom-up optimizers to exploit top-down search without drastically changing to the transformational paradigm.
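To make the idea of top-down join enumeration without transformations concrete, here is a minimal sketch that recursively splits a set of relations into two sides and memoizes the best plan for each subset. It uses naive generate-and-test partitioning, so it is not the graph-optimal enumeration the paper describes, and it omits branch-and-bound. The cardinalities, selectivities, and cost model are all hypothetical.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical inputs: base cardinalities and join-edge selectivities.
CARD = {"A": 100, "B": 10, "C": 1000}
SEL = {frozenset("AB"): 0.1, frozenset("BC"): 0.01}


def size(rels):
    """Estimated row count of joining all relations in rels."""
    rows = 1.0
    for r in rels:
        rows *= CARD[r]
    for edge, selectivity in SEL.items():
        if edge <= rels:
            rows *= selectivity
    return rows


def connected(left, right):
    """True if some join edge links the two sides (no cross products)."""
    return any(len(e & left) == 1 and len(e & right) == 1 for e in SEL)


@lru_cache(maxsize=None)
def best_plan(rels):
    """Return (cost, plan) for the cheapest join of rels, found top-down."""
    if len(rels) == 1:
        (only,) = rels
        return 0.0, only
    members = sorted(rels)
    first, rest = members[0], members[1:]
    best = (float("inf"), None)
    # Keep `first` on the left so each unordered split is tried once.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = frozenset((first,) + extra)
            right = rels - left
            if not connected(left, right):
                continue
            left_cost, left_plan = best_plan(left)
            right_cost, right_plan = best_plan(right)
            # Cost model: sum of estimated intermediate result sizes.
            cost = left_cost + right_cost + size(rels)
            if cost < best[0]:
                best = (cost, (left_plan, right_plan))
    return best


cost, plan = best_plan(frozenset(CARD))
```

Memoizing on the subset is what lets the top-down recursion reuse work the way a bottom-up dynamic programming table does.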
Information Systems | 1988
Frank Wm. Tompa; José A. Blakeley
Access to a database through a user view can be serviced quickly when the view is materialized, i.e. the transformed data is explicitly stored. In the presence of database updates, however, the materialized view can become costly to maintain; often it must be completely rederived from the base data using the view definition. Under some conditions the view can be updated directly given only the view definition, the current contents of the materialized view, and the update operation (still expressed against the base data), without accessing the base data itself. In this paper, we consider relational views defined by projection, selection, and join. We present necessary and sufficient conditions on the view definition, contents, and update operations for insertions and deletions to be reflected in the view without reference to base data. Because the possibility of such view-based updating is dependent on the current contents of the view, we call the update conditionally autonomously computable.
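A simplified case may help: for a view defined only by selection and projection over one relation, an insertion can be reflected using nothing but the view definition and the inserted row. The sketch below shows that case under set semantics. It does not implement the paper's necessary and sufficient conditions, and it does not cover joins or deletions. The helper name, relation, and columns are hypothetical.

```python
def apply_insert(view, new_row, predicate, projected_columns):
    """Reflect a base-table insertion in a select-project view.

    view is a set of tuples; new_row is a dict for the inserted base row.
    Only the view definition and the update are consulted, never the
    base table itself.
    """
    if predicate(new_row):
        view.add(tuple(new_row[c] for c in projected_columns))
    return view


def is_cs(row):
    """Selection condition of the hypothetical view: dept = 'CS'."""
    return row["dept"] == "CS"


# Hypothetical materialized view: names and departments of CS staff.
view = {("Ada", "CS")}
columns = ("name", "dept")
apply_insert(view, {"id": 2, "name": "Bo", "dept": "CS"}, is_cs, columns)
apply_insert(view, {"id": 3, "name": "Cy", "dept": "Math"}, is_cs, columns)
```

Deletions are harder in general: once a projection drops the key, several base rows can map to the same view tuple, so whether the tuple should disappear cannot always be decided from the view alone.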
Machine Learning | 2000
Matthew Young-Lai; Frank Wm. Tompa
For a document collection in which structural elements are identified with markup, it is often necessary to construct a grammar retrospectively that constrains element nesting and ordering. This has been addressed by others as an application of grammatical inference. We describe an approach based on stochastic grammatical inference which scales more naturally to large data sets and produces models with richer semantics. We adopt an algorithm that produces stochastic finite automata and describe modifications that enable better interactive control of results. Our experimental evaluation uses four document collections with varying structure.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013
Shahab Kamali; Frank Wm. Tompa
Many documents with mathematical content are published on the Web, but conventional search engines that rely on keyword search only cannot fully exploit their mathematical information. In particular, keyword search is insufficient when expressions in a document are not annotated with natural keywords or the user cannot describe her query with keywords. Retrieving documents by querying their mathematical content directly is very appealing in various domains such as education, digital libraries, engineering, patent documents, medical sciences, etc. Capturing the relevance of mathematical expressions also greatly enhances document classification in such domains. Unlike text retrieval, where keywords carry enough semantics to distinguish text documents and rank them, math symbols do not contain much semantic information on their own. In fact, mathematical expressions typically consist of few alphabetical symbols organized in rather complex structures. Hence, the structure of an expression, which describes the way such symbols are combined, should also be considered. Unfortunately, there is no standard testbed with which to evaluate the effectiveness of a mathematics retrieval algorithm. In this paper we study the fundamental and challenging problems in mathematics retrieval, that is how to capture the relevance of mathematical expressions, how to query them, and how to evaluate the results. We describe various search paradigms and propose retrieval systems accordingly. We discuss the benefits and drawbacks of each approach, and further compare them through an extensive empirical study.
Computer Standards & Interfaces | 1996
Darrell R. Raymond; Frank Wm. Tompa; Derick Wood
SGML provides standard representations for documents, but as documents become more fluid, we will need standard semantics for them as well. The ability to manage change is a fundamental capability of any system that supports document semantics. We look at three areas important in change management: equivalence, redundancy, and operators. We show how these areas are implicitly addressed in SGML and SGML-based standards, and argue that more explicit consideration would be useful both for evaluating current standards, and for developing new systems for document semantics.
Very Large Data Bases | 1985
Claudia Bauzer Medeiros; Frank Wm. Tompa
Database views are traditionally described as unmaterialized queries, which may be coincidentally updatable according to some fixed criteria. One of the problems in updating through views lies in determining whether a given view modification can be correctly translated by the system. To define an updatable view, a view designer must be aware of how an update request in the view will be mapped into updates of the underlying relations. Furthermore, because of side effects, the view designer must also be made aware of the effects of isolated updates back into the view. To address this problem, we present a general algorithm that predicts the effects of arbitrary mapping policies. Given an update policy, this algorithm indicates whether a desired update will, in fact, occur in the view and describes all possible side effects it may have, documenting the conditions under which they occur. The algorithm subsumes the results obtained by other view design tools, and generalizes their use to encompass a larger class of views.