
Publication


Featured research published by Witold Litwin.


ACM Computing Surveys | 1990

Interoperability of multiple autonomous databases

Witold Litwin; Leo Mark; Nick Roussopoulos

Database systems were a solution to the problem of shared access to heterogeneous files created by multiple autonomous applications in a centralized environment. To make data usage easier, the files were replaced by a globally integrated database. To a large extent, the idea was successful, and many databases are now accessible through local and long-haul networks. Unavoidably, users now need shared access to multiple autonomous databases. The question is what the corresponding methodology should be. Should one reapply the database approach to create globally integrated distributed database systems or should a new approach be introduced? We argue for a new approach to solving such data management system problems, called multidatabase or federated systems. These systems make databases interoperable, that is, usable without a globally integrated schema. They preserve the autonomy of each database yet support shared access. Systems of this type will be of major importance in the future. This paper first discusses why this is the case. Then, it presents methodologies for their design. It further shows that major commercial relational database systems are evolving toward multidatabase systems. The paper discusses their capabilities and limitations, presents and discusses a set of prototypes, and, finally, presents some current research issues.


Very Large Data Bases | 1996

Mariposa: a wide-area distributed database system

Michael Stonebraker; Paul M. Aoki; Witold Litwin; Avi Pfeffer; Adam Sah; Jeff Sidell; Carl Staelin; Andrew Yu

Abstract. The requirements of wide-area distributed database systems differ dramatically from those of local-area network systems. In a wide-area network (WAN) configuration, individual sites usually report to different system administrators, have different access and charging algorithms, install site-specific data type extensions, and have different constraints on servicing remote requests. Typical of the last point are production transaction environments, which are fully engaged during normal business hours, and cannot take on additional load. Finally, there may be many sites participating in a WAN distributed DBMS. In this world, a single program performing global query optimization using a cost-based optimizer will not work well. Cost-based optimization does not respond well to site-specific type extension, access constraints, charging algorithms, and time-of-day constraints. Furthermore, traditional cost-based distributed optimizers do not scale well to a large number of possible processing sites. Since traditional distributed DBMSs have all used cost-based optimizers, they are not appropriate in a WAN environment, and a new architecture is required. We have proposed and implemented an economic paradigm as the solution to these issues in a new distributed DBMS called Mariposa. In this paper, we present the architecture and implementation of Mariposa and discuss early feedback on its operating characteristics.


ACM Transactions on Database Systems | 1996

LH*—a scalable, distributed data structure

Witold Litwin; Marie-Anna Neimat; Donovan A. Schneider

We present a scalable distributed data structure called LH*. LH* generalizes Linear Hashing (LH) to distributed RAM and disk files. An LH* file can be created from records with primary keys, or objects with OIDs, provided by any number of distributed and autonomous clients. It does not require a central directory, and grows gracefully, through splits of one bucket at a time, to virtually any number of servers. The number of messages per random insertion is one in general, and three in the worst case, regardless of the file size. The number of messages per key search is two in general, and four in the worst case. The file supports parallel operations, e.g., hash joins and scans. Performing a parallel operation on a file of M buckets costs at most 2M + 1 messages, and between 1 and O(log2 M) rounds of messages. We first describe the basic LH* scheme, where a coordinator site manages bucket splits, and splits a bucket every time a collision occurs. We show that the average load factor of an LH* file is 65%–70%, regardless of file size and bucket capacity. We then enhance the scheme with load control, performed at no additional message cost. The average load factor then increases to 80%–95%. These values are about those of LH, but the load factor for LH* varies more. We next define LH* schemes without a coordinator. We show that insert and search costs are the same as for the basic scheme. The splitting cost decreases on average, but becomes more variable, as cascading splits are needed to prevent file overload. Next, we briefly describe two variants of the splitting policy, using parallel splits and presplitting, that should enhance performance for high-performance applications. Altogether, we show that LH* files can efficiently scale to sizes that are orders of magnitude larger than single-site files. LH* files that reside in main memory may also be much faster than single-site disk files.
Finally, LH* files can be more efficient than any distributed file with a centralized directory, or a static parallel or distributed hash file.
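The addressing and splitting rules of Linear Hashing, which LH* generalizes, can be sketched as follows. This is a toy, single-process illustration under our own assumptions, not the authors' implementation; it omits the client "image" adjustment and the forwarding messages that make LH* distributed.

```python
def lh_address(key: int, i: int, n: int) -> int:
    """Map a key to a bucket, given file level i and split pointer n."""
    a = key % (2 ** i)            # h_i(key)
    if a < n:                     # bucket a already split this round,
        a = key % (2 ** (i + 1))  # so use h_{i+1}(key) instead
    return a

class LHFile:
    """In-memory LH file that splits one bucket at a time as it grows."""

    def __init__(self, capacity: int = 4):
        self.i, self.n = 0, 0      # file level and split pointer
        self.capacity = capacity   # records per bucket before a split
        self.buckets = [[]]        # start with a single bucket 0

    def insert(self, key: int) -> None:
        a = lh_address(key, self.i, self.n)
        self.buckets[a].append(key)
        if len(self.buckets[a]) > self.capacity:
            self._split()

    def _split(self) -> None:
        # Split bucket n into n and n + 2^i, rehashing with h_{i+1}.
        self.buckets.append([])    # new bucket n + 2^i
        old, self.buckets[self.n] = self.buckets[self.n], []
        for k in old:
            self.buckets[k % (2 ** (self.i + 1))].append(k)
        self.n += 1
        if self.n == 2 ** self.i:  # round complete: raise the level
            self.i, self.n = self.i + 1, 0

f = LHFile(capacity=4)
for k in range(100):
    f.insert(k)
# Every key is still found in the bucket the address function names.
assert all(k in f.buckets[lh_address(k, f.i, f.n)] for k in range(100))
```

Note that, as in LH, the bucket that splits is always the one at the split pointer, not necessarily the one that overflowed; the file still converges to an even load as the pointer sweeps each round.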


IEEE Computer | 1991

The Pegasus heterogeneous multidatabase system

Rafi Ahmed; P. DeSmedt; Weimin Du; William Kent; Mohammad A. Ketabchi; Witold Litwin; Abbas Rafii; Ming-Chien Shan

Pegasus, a heterogeneous multidatabase management system that responds to the need for effective access and management of shared data across a wide range of applications, is described. Pegasus provides facilities for multidatabase applications to access and manipulate multiple autonomous, heterogeneous, distributed, object-oriented, relational, and other information systems through a uniform interface. It is a complete data management system that integrates various native and local databases. Pegasus takes advantage of object-oriented data modeling and programming capabilities. It uses both type and function abstractions to deal with mapping and integration problems. Function implementations can be defined in an underlying database language or a programming language. Data abstraction and encapsulation facilities in the Pegasus object model provide an extensible framework for dealing with various kinds of heterogeneities in traditional database systems and nontraditional data sources.


International Conference on Parallel and Distributed Information Systems | 1994

An economic paradigm for query processing and data migration in Mariposa

Michael Stonebraker; Robert Devine; Marcel Kornacker; Witold Litwin; Avi Pfeffer; Adam Sah; Carl Staelin

Many new database applications require very large volumes of data. Mariposa is a database system under construction at Berkeley that responds to this need. The system combines the best features of traditional distributed database systems, object-oriented DBMSs, tertiary-memory file systems, and distributed file systems. Mariposa objects can be stored over thousands of autonomous sites and on memory hierarchies with very large capacity. This scale leads to complex query execution and storage management issues, unsolvable in practice with traditional techniques. We propose an economic paradigm as the solution. A query receives a budget, which it spends to obtain its answers. Each site attempts to maximize income by buying and selling storage objects and processing queries for locally stored objects. We present the protocols that underlie the Mariposa economy.
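The budget-and-bidding idea can be sketched in a few lines. This is our own toy model, not Mariposa's actual protocol; the site names, prices, and the lowest-price award rule are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Bid:
    site: str      # bidding site (hypothetical names below)
    price: float   # currency units the site charges to run the query
    delay: float   # promised completion time, in seconds

def award(bids: list, budget: float) -> Optional[Bid]:
    """Return the lowest-price affordable bid, or None if none fits."""
    affordable = [b for b in bids if b.price <= budget]
    return min(affordable, key=lambda b: b.price) if affordable else None

bids = [Bid("lan-site", 5.0, 2.0),
        Bid("wan-site", 2.5, 9.0),
        Bid("busy-site", 20.0, 1.0)]  # a fully loaded site prices itself out
winner = award(bids, budget=10.0)     # wan-site wins at price 2.5
```

The appeal of the model is visible even in this sketch: the overloaded site never has to refuse work outright; it simply quotes a price the budget will not cover.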


International Conference on Management of Data | 1991

Language features for interoperability of databases with schematic discrepancies

Ravi Krishnamurthy; Witold Litwin; William Kent

Present relational language capabilities are insufficient to provide interoperability of databases, even if they are all relational. In particular, unified multidatabase view definitions cannot reconcile schematic discrepancies, where data in one database correspond to metadata of another. We claim that the following new features are necessary: 1. Higher-order expressions, where variables can range over data and metadata, including database names. 2. Higher-order (multidatabase) view definitions, where the number of relations or attributes defined depends on the state of the database(s). 3. Complete view updatability for the users of multidatabase views. We propose these features in the context of a Horn-clause-based language, called the Interoperable Database Language (IDL).
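A schematic discrepancy can be mimicked procedurally to show what a higher-order variable buys. In this hedged sketch (our own example, not IDL), one schema encodes a stock name as metadata, one relation per stock, and a loop variable ranges over relation names to flatten it into a schema where the stock is ordinary data; the table names and values are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Schema A: one relation per stock -- the stock name lives in the metadata.
con.executescript("""
    CREATE TABLE ibm(day TEXT, price REAL);
    CREATE TABLE hp(day TEXT, price REAL);
    INSERT INTO ibm VALUES ('mon', 101.0);
    INSERT INTO hp VALUES ('mon', 40.0);
""")

# The "higher-order" step: a variable ranges over relation names, turning
# metadata into data -- what a single first-order SQL query cannot express.
rows = []
for (table,) in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"):
    for day, price in con.execute(f"SELECT day, price FROM {table}"):
        rows.append((table, day, price))
# rows now follows schema B: stock(name, day, price), with the stock
# name as an ordinary attribute value.
```

In IDL this loop collapses into one rule, because the language lets the same variable appear in a predicate-name position and an argument position.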


IEEE Conference on Mass Storage Systems and Technologies | 2003

Reliability mechanisms for very large storage systems

Qin Xin; Ethan L. Miller; Thomas J. E. Schwarz; Darrell D. E. Long; Scott A. Brandt; Witold Litwin

Reliability and availability are increasingly important in large-scale storage systems built from thousands of individual storage devices. Large systems must survive the failure of individual components; in systems with thousands of disks, even infrequent failures are likely in some device. We focus on two types of errors: nonrecoverable read errors and drive failures. We discuss mechanisms for detecting and recovering from such errors, introducing improved techniques for detecting errors in disk reads and fast recovery from disk failure. We show that simple RAID cannot guarantee sufficient reliability; our analysis examines the tradeoffs among other schemes between system availability and storage efficiency. Based on our data, we believe that two-way mirroring should be sufficient for most large storage systems. For those that need very high reliability, we recommend either three-way mirroring or mirroring combined with RAID.
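The scale of the trade-off can be seen with a back-of-the-envelope model. This sketch uses our own simplifying assumptions (independent failures, constant rates, combinatorial constants ignored) rather than the paper's analysis, and the disk counts and failure rates below are illustrative, not the paper's data.

```python
def mttdl_hours(n_disks: int, mttf: float, mttr: float, copies: int) -> float:
    """First-order mean time to data loss, in hours: losing data requires
    `copies` overlapping failures in one mirror group before the rebuild
    (taking MTTR hours) completes."""
    return mttf ** copies / (n_disks * mttr ** (copies - 1))

def years(hours: float) -> float:
    return hours / (24 * 365)

# Hypothetical farm: 10,000 disks, 100,000-hour MTTF, 24-hour rebuild.
two_way = years(mttdl_hours(10_000, 1e5, 24, copies=2))
three_way = years(mttdl_hours(10_000, 1e5, 24, copies=3))
# In this model each extra copy multiplies MTTDL by MTTF/MTTR (~4,000x here),
# which is why two-way mirroring suffices for most systems and three-way
# only for those needing very high reliability.
```

The model ignores nonrecoverable read errors during rebuild, which is precisely the failure mode the paper's detection mechanisms target, so treat it as a lower bound on the problem rather than a design rule.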


IEEE Transactions on Knowledge and Data Engineering | 1992

Main memory oriented optimization of OO queries using typed Datalog with foreign predicates

Witold Litwin; Tore Risch

Object-oriented database systems (OODBs) have created a demand for relationally complete, extensible, and declarative object-oriented query languages. Until now, the runtime performance of such languages was far behind that of procedural OO interfaces. One reason is the internal use of a relational engine with magnetic disk resident databases. The authors address the processing of the declarative OO language WS-OSQL, provided by the fully operational prototype OODB called WS-IRIS. A WS-IRIS database is main memory (MM) resident. The system architecture, data structures, and optimization techniques are designed accordingly. WS-OSQL queries are compiled into an OO extension of Datalog called ObjectLog, providing for objects, typing, overloading, and foreign predicates for extensibility. Cost-based optimizations in WS-IRIS using ObjectLog are presented. Performance tests show that WS-IRIS is about as fast as current OODBs with procedural interfaces only and is much faster than known relationally complete systems. These results would not be possible for a traditional disk-based implementation. However, MM residency of a database appears to be only a necessary condition for better performance. An efficient optimization is of crucial importance as well.
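One classic cost-based rule such an optimizer applies, to foreign predicates in particular, can be shown in miniature. This is our illustration, not WS-IRIS code: independent filter predicates, each with a per-tuple cost and a selectivity, minimize expected work when run in ascending order of rank = (selectivity − 1) / cost; the predicate names and numbers are invented.

```python
def order_filters(filters):
    """filters: (name, cost_per_tuple, selectivity) triples.
    Sort by rank = (selectivity - 1) / cost, ascending."""
    return sorted(filters, key=lambda f: (f[2] - 1.0) / f[1])

filters = [("expensive_selective", 10.0, 0.01),
           ("cheap_loose",          1.0, 0.90),
           ("cheap_selective",      1.0, 0.10)]
plan = [name for name, _, _ in order_filters(filters)]
# The expensive predicate runs last even though it is the most selective:
# its high per-tuple cost dominates its rank, so cheaper filters shrink
# the input first.
```

The same rank calculation is what makes foreign predicates (arbitrary user code with unknown cost) tractable to optimize: the engine only needs cost and selectivity estimates, not the predicate's semantics.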


International Conference on Data Engineering | 1994

Mariposa: a new architecture for distributed data

Michael Stonebraker; Paul M. Aoki; Robert Devine; Witold Litwin; Michael A. Olson

We describe the design of Mariposa, an experimental distributed data management system that provides high performance in an environment of high data mobility and heterogeneous host capabilities. The Mariposa design unifies the approaches taken by distributed file systems and distributed databases. In addition, Mariposa provides a general, flexible platform for the development of new algorithms for distributed query optimization, storage management, and scalable data storage structures. This flexibility is primarily due to a unique rule-based design that permits autonomous, local-knowledge decisions to be made regarding data placement, query execution location, and storage management.


Proceedings of the IEEE | 1987

An overview of the multi-database manipulation language MDSL

Witold Litwin; Abdelaziz Abdellatif

With the increase in availability of databases, data needed by a user are frequently in separate autonomous databases. The logical properties of such data differ from the classical ones within a single database. In particular, they call for new functions for data manipulation. MDSL is a new data manipulation language providing such functions. Most of the MDSL functions are not available in other languages.

Collaboration


Dive into Witold Litwin's collaborations.

Top Co-Authors


Philippe Rigaux

Conservatoire national des arts et métiers


Cédric du Mouza

Conservatoire national des arts et métiers


Riad Mokadem

Paris Dauphine University
