Maurice A. W. Houtsma
University of Twente
Publication
Featured research published by Maurice A. W. Houtsma.
international conference on data engineering | 1995
Maurice A. W. Houtsma; Arun N. Swami
We describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss the optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. SETM uses only simple database primitives, viz. sorting and merge-scan join. SETM is simple, fast and stable over the range of parameter values. The major contribution of this paper is that it shows that at least some aspects of data mining can be carried out by using general query languages such as SQL, rather than by developing specialized black-box algorithms. The set-oriented nature of SETM facilitates the development of extensions.
Archive | 1997
Peter M. G. Apers; Henk M. Blanken; Maurice A. W. Houtsma
This volume provides much-needed coverage of the technical background to the development of multimedia applications. Based on an advanced summer school run by the University of Twente, each chapter is written by an expert in a particular topic, including enabling technologies, operating systems, index structures for multimedia, and communication issues. There is also a comprehensive discussion of the factors which determine the success or failure of database applications in the real-world, based on new multimedia projects in Dutch industries and service companies. Multimedia Databases in Perspective is an advanced textbook aimed at final year undergraduate students, and MSc and PhD students studying databases, database management, information systems, and multimedia applications. It will also be of interest to researchers in the above areas, and DBMS developers working in the software industry.
data and knowledge engineering | 1995
Maurice A. W. Houtsma; Arun N. Swami
Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing. In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases.
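The set-oriented idea in this abstract can be illustrated with a small sketch. It is not the published SETM algorithm, only a simplified reading of the abstract: candidate patterns are kept as rows of a relation, support is counted with a group-by, and surviving rows are joined back to the base relation on the transaction id. The function name `setm_sketch`, the `(tid, item)` row layout, and the use of a dictionary lookup in place of a true merge-scan join are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby


def setm_sketch(sales, min_support):
    """Return {itemset_tuple: support_count} for a relation of (tid, item) rows.

    Hypothetical, simplified illustration of set-oriented mining.
    """
    # Base relation, deduplicated and sorted on (tid, item).
    base = sorted(set(sales))
    # Items per transaction; this lookup stands in for the merge-scan join on tid.
    items_by_tid = {
        tid: [item for _, item in rows]
        for tid, rows in groupby(base, key=lambda row: row[0])
    }
    # R_1: one row per (tid, 1-item pattern).
    current = [(tid, (item,)) for tid, item in base]
    frequent = {}
    while current:
        # Equivalent of: SELECT pattern, COUNT(*) ... GROUP BY pattern HAVING COUNT(*) >= min_support
        counts = Counter(pattern for _, pattern in current)
        supported = {p: c for p, c in counts.items() if c >= min_support}
        if not supported:
            break
        frequent.update(supported)
        # Join surviving rows with the base relation on tid, extending each
        # pattern only with lexicographically larger items to avoid duplicates.
        current = [
            (tid, pattern + (item,))
            for tid, pattern in current
            if pattern in supported
            for item in items_by_tid[tid]
            if item > pattern[-1]
        ]
    return frequent
```

Each loop iteration corresponds to one pass that could be written as a SQL join followed by a grouped count, which is the property the abstract emphasizes.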
Distributed and Parallel Databases | 1995
Stefano Ceri; Maurice A. W. Houtsma; Arthur M. Keller; Pierangela Samarati
Update propagation and transaction atomicity are major obstacles to the development of replicated databases. Many practical applications, such as automated teller machine networks, flight reservation, and part inventory control, do not require these properties. In this paper we present an approach for incrementally updating a distributed, replicated database without requiring multi-site atomic commit protocols. We prove that the mechanism is correct, as it asymptotically performs all the updates on all the copies. Our approach has two important characteristics: it is progressive and non-blocking. Progressive means that the transaction's coordinator always commits, possibly together with a group of other sites. The update is later propagated asynchronously to the remaining sites. Non-blocking means that each site can take unilateral decisions at each step of the algorithm. Sites which cannot commit updates are brought to the same final state by means of a reconciliation mechanism. This mechanism uses the history logs, which are stored locally at each site, to bring sites to agreement. It requires a small auxiliary data structure, called a reception vector, to keep track of the time up to which the other sites are guaranteed to be up-to-date. Several optimizations to the basic mechanism are also discussed.
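As a rough illustration of how history logs and a reception vector could drive reconciliation, consider the sketch below. It is one plausible reading of the abstract, not the paper's protocol. The `Site` class, its method names, the per-origin sequence numbers, and the restriction to commutative balance increments (which sidesteps conflict resolution) are all assumptions.

```python
class Site:
    """Hypothetical replica holding a local history log and a reception vector."""

    def __init__(self, name):
        self.name = name
        self.clock = 0        # local sequence number for updates originating here
        self.log = []         # history log of (origin, seq, account, delta)
        self.reception = {}   # origin -> highest seq received from that origin
        self.balances = {}

    def local_update(self, account, delta):
        # The originating site commits unilaterally (the "progressive" property).
        self.clock += 1
        self._apply((self.name, self.clock, account, delta))

    def _apply(self, entry):
        origin, seq, account, delta = entry
        self.log.append(entry)
        self.reception[origin] = seq
        # Increments commute, so replicas converge regardless of arrival order.
        self.balances[account] = self.balances.get(account, 0) + delta

    def missing_for(self, other_reception):
        # Only the log suffix the other site has not yet seen is transferred.
        return [e for e in self.log if e[1] > other_reception.get(e[0], 0)]


def reconcile(a, b):
    """Exchange the log entries each site lacks, in both directions."""
    for entry in a.missing_for(b.reception):
        b._apply(entry)
    for entry in b.missing_for(a.reception):
        a._apply(entry)
```

In this sketch a site that was unavailable during an update simply calls `reconcile` with any up-to-date peer later; repeating the call transfers nothing because the reception vectors already cover the logs.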
Distributed and Parallel Databases | 1993
Filippo Cacace; Stefano Ceri; Maurice A. W. Houtsma
An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. 
Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods.
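To make the iterative evaluation and hash-based fragmentation mentioned in this abstract concrete, here is a minimal sketch. It assumes a binary edge relation stored as a set of `(src, dst)` pairs and uses semi-naive iteration, a standard iterative method; it is not claimed to be any specific algorithm from the survey. The function names are hypothetical, and the "parallel" version runs the fragments sequentially, with each call standing in for work that a separate processor could do.

```python
def closure_from(start_edges, edges):
    """Semi-naive iteration: extend only the paths found in the previous round."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)
    closure = set(start_edges)
    delta = set(start_edges)
    while delta:
        derived = {
            (src, nxt)
            for src, dst in delta
            for nxt in successors.get(dst, ())
        }
        delta = derived - closure   # keep only genuinely new tuples
        closure |= delta
    return closure


def fragment_by_hash(edges, n_fragments):
    """Hash-based fragmentation on the source attribute."""
    fragments = [set() for _ in range(n_fragments)]
    for src, dst in edges:
        fragments[hash(src) % n_fragments].add((src, dst))
    return fragments


def fragmented_closure(edges, n_fragments):
    """Each fragment computes the paths starting at its own source nodes.

    Assumes the full edge relation is readable by every worker.
    """
    result = set()
    for fragment in fragment_by_hash(edges, n_fragments):
        result |= closure_from(fragment, edges)
    return result
```

Because every path is owned by the fragment that holds its first edge, the union of the per-fragment results equals the closure computed in one piece.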
IWDM | 1988
Martin L. Kersten; Peter M. G. Apers; Maurice A. W. Houtsma; Erik J. A. van Kuyk; Rob L. W. van de Weg
The PRISMA project is a large-scale research effort in the design and implementation of a highly parallel machine for data and knowledge processing. The PRISMA database machine is a distributed, main-memory database management system implemented in an object-oriented language that runs on top of a large message-passing multi-computer system. A knowledge-based approach is used to exploit parallelism and query processing. Moreover, it has both an SQL and a logic programming language interface. To improve the overall performance a generative approach is used to customize the relation managers.
international conference on data engineering | 1993
Maurice A. W. Houtsma; Peter M. G. Apers; Gideon L. V. Schipper
This paper addresses the problem of fragmenting a relation to make the parallel computation of the transitive closure efficient, based on the disconnection set approach. To better understand this design problem, the authors focus on transportation networks. These are characterized by loosely interconnected clusters of nodes with a high internal connectivity rate. Three requirements that have to be fulfilled by a fragmentation are formulated, and three different fragmentation strategies are presented, each emphasizing one of these requirements. Some test results are presented to show the performance of the various fragmentation strategies.
data and knowledge engineering | 1992
Maurice A. W. Houtsma; Peter M. G. Apers
Over the past few years, much attention has been paid to deductive databases. They offer a logic-based interface, and allow formulation of complex recursive queries. However, they do not offer appropriate update facilities, and do not support existing applications. To overcome these problems an SQL-like interface is required besides a logic-based interface. In the PRISMA project we have developed a tightly-coupled distributed database, on a multiprocessor machine, with two user interfaces: SQL and PRISMAlog. Query optimization is localized in one component: the relational query optimizer. Therefore, we have defined an eXtended Relational Algebra that allows recursive query formulation and can also be used for expressing executable schedules, and we have developed algebraic optimization strategies for recursive queries. In this paper we describe an optimization strategy that rewrites regular (in the context of formal grammars) mutually recursive queries into standard Relational Algebra and transitive closure operations. We also describe how to push selections into the resulting transitive closure operations. The reason we focus on algebraic optimization is that, in our opinion, the new generation of advanced database systems will be built starting from existing state-of-the-art relational technology, instead of building a completely new class of systems.
workshop on management of replicated data | 1992
Stefano Ceri; Maurice A. W. Houtsma; Arthur M. Keller; Pierangela Samarati
The authors present the case for allowing independent updates on replicated databases. In autonomous, heterogeneous, or large scale systems, using two-phase commit for updates may be infeasible. Instead, the authors propose that a site may perform updates independently. Sites that are available can receive these updates immediately. Sites that are unavailable, or otherwise do not participate in the update transaction, receive these updates later through propagation, rather than preventing the execution of the update transaction until sufficient sites can participate. Two or more sites come to agreement using a reconciliation procedure that uses reception vectors to determine how much of the history log should be transferred from one site to another. They also consider what events can initiate a reconciliation procedure.
british national conference on databases | 1995
Hein M. Veenhof; Peter M. G. Apers; Maurice A. W. Houtsma
When viewing present-day technical applications that rely on the use of database systems, one notices that new techniques must be integrated in database management systems to be able to support these applications efficiently. This paper discusses one of these techniques in the context of supporting a Geographic Information System. It is known that the use of filters on geometric objects has a significant impact on the processing of 2-way spatial join queries. For this purpose, filters require approximations of objects. Queries can be optimized by filtering data not with just one but with several filters. Existing join methods are based on a combination of filters and a spatial index. The index is used to reduce the cost of the filter step and to minimize the cost of retrieving geometric objects from disk. In this paper we examine n-way spatial joins. Complex n-way spatial join queries require solving several 2-way joins of intermediate results. In this case, not only the profit gained from using both filters and spatial indices but also the additional cost due to using these techniques are examined. For 2-way joins of base relations these costs are considered part of physical database design. We focus on the criteria for mutually comparing filters and not on those for spatial indices. Important aspects of a multi-step filter-based n-way spatial join method are described together with performance experiments. The winning join method uses several filters with approximations that are constructed by rotating two parallel lines around the object.
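The role of a filter step in a 2-way spatial join can be shown with a small sketch. It assumes objects are circles given as `(x, y, radius)` so that the exact geometry test stays short, and it uses the axis-aligned bounding box as the single approximation. The rotated-parallel-lines approximations and the spatial index described in the abstract are not modeled, and all function names are hypothetical.

```python
from math import hypot


def bounding_box(circle):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a circle."""
    x, y, r = circle
    return (x - r, y - r, x + r, y + r)


def boxes_overlap(a, b):
    """Cheap filter test on approximations; may report false hits, never misses."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def circles_intersect(c1, c2):
    """Exact (refinement) test on the real geometry."""
    return hypot(c1[0] - c2[0], c1[1] - c2[1]) <= c1[2] + c2[2]


def filter_refine_join(rel_a, rel_b):
    """Return (candidates, results) of a nested-loop filter-and-refine join."""
    boxes_b = [(bounding_box(b), b) for b in rel_b]
    # Filter step: keep pairs whose approximations overlap.
    candidates = [
        (a, b)
        for a in rel_a
        for box_b, b in boxes_b
        if boxes_overlap(bounding_box(a), box_b)
    ]
    # Refinement step: run the exact test only on the surviving candidates.
    results = [(a, b) for a, b in candidates if circles_intersect(a, b)]
    return candidates, results
```

The gap between the number of candidates and the number of results is the false-hit rate of the filter, which is the kind of quantity one would compare when choosing between approximations.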