Publications


Featured research published by George P. Copeland.


Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA) | 1986

Object identity

Setrag N. Khoshafian; George P. Copeland

Identity is that property of an object which distinguishes each object from all others. Identity has been investigated almost independently in general-purpose programming languages and database languages. Its importance is growing as these two environments evolve and merge. We describe a continuum between weak and strong support of identity, and argue for the incorporation of the strong notion of identity at the conceptual level in languages for general purpose programming, database systems and their hybrids. We define a data model that can directly describe complex objects, and show that identity can easily be incorporated in it. Finally, we compare different implementation schemes for identity and argue that a surrogate-based implementation scheme is needed to support the strong notion of identity.
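
The surrogate-based scheme the paper argues for can be illustrated with a short sketch. This is a hypothetical Python example, not code from the paper: each object receives a system-generated surrogate that never changes, so identity survives changes to the object's value and is distinct from value equality. The class and method names are invented.

```python
import itertools

class SurrogateStore:
    """Assigns each new object a system-generated surrogate (hypothetical
    class; illustrates the strong notion of identity argued for above)."""

    def __init__(self):
        self._next = itertools.count(1)
        self._objects = {}

    def create(self, value):
        surrogate = next(self._next)      # identity is system-generated...
        self._objects[surrogate] = value  # ...and independent of the value
        return surrogate

    def lookup(self, surrogate):
        return self._objects[surrogate]

store = SurrogateStore()
a = store.create({"name": "widget", "qty": 10})
b = store.create({"name": "widget", "qty": 10})

# Equal values, distinct identities: value equality is not identity.
assert store.lookup(a) == store.lookup(b) and a != b
```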


ACM SIGMOD International Conference on Management of Data | 1988

Data placement in Bubba

George P. Copeland; William Alexander; Ellen E. Boughter; Tom W. Keller

This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. “Highly-parallel” implies that load balancing is a critical performance issue. “Data-intensive” means data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue. In general, determining the optimal placement of data across processing nodes for performance is a difficult problem. We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued the benefits of declustering (i.e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communication, startup, and termination overhead. We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.
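
As a rough illustration of a placement heuristic in this spirit, the hypothetical sketch below greedily assigns the hottest unplaced fragment to the currently coolest node. The fragment names, "heat" weights, node count, and the greedy rule itself are assumptions for illustration, not Bubba's actual algorithm.

```python
def place_fragments(fragments, num_nodes):
    """fragments: (name, heat) pairs, where heat is the expected access
    frequency. Greedy heuristic: the hottest unplaced fragment goes to
    the currently coolest node."""
    nodes = [{"load": 0.0, "fragments": []} for _ in range(num_nodes)]
    for name, heat in sorted(fragments, key=lambda f: -f[1]):
        coolest = min(nodes, key=lambda n: n["load"])
        coolest["fragments"].append(name)
        coolest["load"] += heat
    return nodes

layout = place_fragments(
    [("orders.0", 9.0), ("orders.1", 9.0), ("parts.0", 4.0),
     ("parts.1", 4.0), ("customers", 2.0)],
    num_nodes=3)
for i, node in enumerate(layout):
    print(f"node {i}: load={node['load']:.1f} fragments={node['fragments']}")
```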


IEEE Transactions on Knowledge and Data Engineering | 1990

Prototyping Bubba, a highly parallel database system

Haran Boral; William Alexander; Larry Clay; George P. Copeland; Scott Danforth; Michael J. Franklin; Brian E. Hart; Marc G. Smith; Patrick Valduriez

Bubba is a highly parallel computer system for data-intensive applications. The basis of the Bubba design is a scalable shared-nothing architecture which can scale up to thousands of nodes. Data are declustered across the nodes (i.e., horizontally partitioned via hashing or range partitioning) and operations are executed at those nodes containing relevant data. In this way, parallelism can be exploited within individual transactions as well as among multiple concurrent transactions to improve throughput and response times for data-intensive applications. The current Bubba prototype runs on a commercial 40-node multicomputer and includes a parallelizing compiler, distributed transaction management, object management, and a customized version of Unix. The current prototype is described and the major design decisions that went into its construction are discussed. The lessons learned from this prototype and its predecessors are presented.
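
The two declustering schemes mentioned above, hashing and range partitioning, can be sketched as routing functions from a tuple's key to a node. The keys, node count, and range boundaries below are invented; this is a simplified illustration, not Bubba's code.

```python
import zlib

def hash_partition(key, num_nodes):
    # Any key's owning node is computable from the key alone.
    return zlib.crc32(key.encode()) % num_nodes

def range_partition(key, boundaries):
    # boundaries[i] is the exclusive upper bound of node i's key range.
    for node, upper in enumerate(boundaries):
        if key < upper:
            return node
    return len(boundaries)  # the last node owns the tail of the key space

for key in ("alpha", "kilo", "zulu"):
    print(key, "-> hash node", hash_partition(key, 4),
          "| range node", range_partition(key, ["f", "p"]))
```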


ACM SIGMOD International Conference on Management of Data | 1989

A comparison of high-availability media recovery techniques

George P. Copeland; Tom W. Keller

We compare two high-availability techniques for recovery from media failures in database systems. Both techniques achieve high availability by having two copies of all data and indexes, so that recovery is immediate. “Mirrored declustering” spreads two copies of each relation across two identical sets of disks. “Interleaved declustering” spreads two copies of each relation across one set of disks while keeping both copies of each tuple on separate disks. Both techniques pay the same costs of doubling storage requirements and requiring updates to be applied to both copies. Mirroring offers greater simplicity and universality. Recovery can be implemented at lower levels of the system software (e.g., the disk controller). For architectures that do not share disks globally, it allows global and local cluster indexes to be independent. Also, mirroring does not require data to be declustered (i.e., spread over multiple disks). Interleaved declustering offers significant improvements in recovery time, mean time to loss of both copies of some data, throughput during normal operation, and response time during recovery. For all architectures, interleaved declustering enables data to be spread over twice as many disks for improved load balancing. We show how tuning for interleaved declustering is simplified because it is dependent only on a few parameters that are usually well known for a specific workload and system configuration.
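
The difference between the two layouts can be made concrete with a small sketch. In the hypothetical example below (disk counts and fragment names are invented), mirroring pairs each disk with an identical twin, while interleaving scatters each fragment's backup in pieces across the other disks of its cluster, so a failed disk's read load spreads over all survivors.

```python
def mirrored(fragments):
    # Disk i holds fragment i; its twin holds an identical copy.
    primary = {i: frag for i, frag in enumerate(fragments)}
    mirror = dict(primary)
    return primary, mirror

def interleaved(fragments):
    # Disk i holds fragment i as primary, plus one piece of every other
    # fragment as backup, so a failed disk's work spreads over survivors.
    n = len(fragments)
    disks = {i: {"primary": frag, "backup": []}
             for i, frag in enumerate(fragments)}
    for i, frag in enumerate(fragments):
        others = [d for d in range(n) if d != i]
        for j, disk in enumerate(others):
            disks[disk]["backup"].append(f"{frag}.part{j}")
    return disks

frags = ["R0", "R1", "R2", "R3"]
print("mirrored:", mirrored(frags))
print("interleaved:", interleaved(frags))
```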


International Conference on Data Engineering (ICDE) | 1987

A query processing strategy for the decomposed storage model

Setrag Khoshafian; George P. Copeland; Thomas Jagodits; Haran Boral; Patrick Valduriez

Handling parallelism in database systems involves the specification of a storage model, a placement strategy, and a query processing strategy. An important goal is to determine the appropriate combination of these three strategies in order to obtain the best performance advantages. In this paper we present a novel and promising query processing strategy for a decomposed storage model. We discuss some of the qualitative advantages of the scheme. We also compare the performance of the proposed “pivot” strategy with conventional query processing for the n-ary storage model. The comparison is performed using the Wisconsin Benchmarks.
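
A toy example makes the storage model and the role of the pivot concrete. In the decomposed storage model, each attribute is stored as its own surrogate-to-value column. The sketch below, with invented data and a deliberately simplified pivot, scans only the column named in the predicate and then pivots on the qualifying surrogates to materialize the projected attribute.

```python
# Each attribute is its own surrogate-to-value column (invented data).
name_col = {1: "bolt", 2: "nut", 3: "washer"}  # surrogate -> name
qty_col = {1: 100, 2: 250, 3: 75}              # surrogate -> qty

# SELECT name FROM parts WHERE qty > 90:
# 1. scan only the qty column to find qualifying surrogates,
qualifying = {s for s, qty in qty_col.items() if qty > 90}
# 2. "pivot" on those surrogates to materialize the projected attribute.
result = [name_col[s] for s in sorted(qualifying)]
print(result)  # ['bolt', 'nut'] -- the name column is only probed, not scanned
```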


ACM SIGMOD International Conference on Management of Data | 1988

Process and dataflow control in distributed data-intensive systems

William Alexander; George P. Copeland

In dataflow architectures, each dataflow operation is typically executed on a single physical node. We are concerned with distributed data-intensive systems, in which each base (i.e., persistent) set of data has been declustered over many physical nodes to achieve load balancing. Because of large base set size, each operation is executed where the base set resides, and intermediate results are transferred between physical nodes. In such systems, each dataflow operation is typically executed on many physical nodes. Furthermore, because computations are data-dependent, we cannot know until run time which subset of the physical nodes containing a particular base set will be involved in a given dataflow operation. This uncertainty creates several problems. We examine the problems of efficient program loading, dataflow-operation activation and termination, control of data transfer among dataflow operations, and transaction commit and abort in a distributed data-intensive system. We show how these problems are interrelated, and we present a unified set of mechanisms for efficiently solving them. For some of the problems, we present several solutions and compare them quantitatively.
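
The run-time uncertainty at the heart of the problem can be sketched directly: for a declustered base set, the subset of nodes worth activating depends on the predicate, which arrives only at run time. The partition map, key ranges, and function names below are illustrative assumptions, not the paper's mechanisms.

```python
# node -> (lo, hi) key range of its fragment of the base set (an assumption)
partition_map = {0: (0, 100), 1: (100, 200), 2: (200, 300), 3: (300, 400)}

def nodes_to_activate(pred_lo, pred_hi):
    """Only fragments overlapping the predicate's key range can contribute;
    which subset that is cannot be known before the predicate arrives."""
    return [node for node, (lo, hi) in partition_map.items()
            if lo < pred_hi and pred_lo < hi]

# Activate (and later terminate) just the nodes that matter for this query.
print(nodes_to_activate(150, 250))  # [1, 2]
```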


International Conference on Extending Database Technology (EDBT) | 1990

Uniform object management

George P. Copeland; Michael J. Franklin; Gerhard Weikum

Most real-world applications require a capability for both general-purpose programming and database transactions on persistent data. Unfortunately, the implementation techniques for these capabilities are notoriously incompatible. Programming languages stress memory-resident transient data with a rich collection of data types, while database systems stress disk-resident persistent data with a limited collection of data types. Even in object-oriented database systems, combining these capabilities is traditionally done using a two-level storage model in which storage formats are quite different. This approach suffers from the performance overhead required to translate data between these two levels.
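
A minimal caricature of the two-level overhead described above, assuming invented disk and memory formats: with distinct representations, every read and write pays a translation step that a uniform (single-level) format would avoid.

```python
def disk_to_memory(record):
    # Disk format: a flat "name|qty" byte string (invented for illustration).
    name, qty = record.decode().split("|")
    return {"name": name, "qty": int(qty)}  # memory format: a rich object

def memory_to_disk(obj):
    return f"{obj['name']}|{obj['qty']}".encode()

record = b"widget|10"
obj = disk_to_memory(record)      # translation paid on every read...
obj["qty"] += 1
new_record = memory_to_disk(obj)  # ...and again on every write
assert new_record == b"widget|11"
```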


International Conference on Data Engineering (ICDE) | 1988

Parallel query processing for complex objects

Setrag Khoshafian; Patrick Valduriez; George P. Copeland

The authors investigate a direct storage scheme for complex objects called FIHSM (Fully Inverted Hierarchical Storage Model) and propose a novel parallel-query-processing strategy (QPS) for it. The QPS has four phases (select, pivot, value materialize and compose). With a declustered placement strategy, each of these phases provides for both inter- and intra-operation parallelism. Furthermore, partial results of one phase can be pipelined to the subsequent phase. The proposed four-phase structured algorithm is based on heuristics and thus avoids the prohibitive exhaustive searches which are needed for optimizing query executions in parallel environments.
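
The four phases can be sketched as a pipeline of Python generators, so partial results of each phase stream into the next, as the abstract describes. The data, column layout, and phase bodies below are invented stand-ins for FIHSM's actual structures, not the paper's algorithm.

```python
dept_name = {10: "tools", 20: "toys"}  # object -> attribute (invented data)
emp_dept = {1: 10, 2: 10, 3: 20}       # nested reference column
emp_name = {1: "ann", 2: "bob", 3: "cal"}

def select(pred):          # phase 1: find qualifying root objects
    return (d for d, name in dept_name.items() if pred(name))

def pivot(depts):          # phase 2: follow references to sub-objects
    for d in depts:
        yield from (e for e, owner in emp_dept.items() if owner == d)

def materialize(emps):     # phase 3: fetch only the values actually needed
    return ((e, emp_name[e]) for e in emps)

def compose(rows):         # phase 4: assemble the final result objects
    return [{"emp": e, "name": n} for e, n in rows]

# Generators let each phase stream partial results into the next.
print(compose(materialize(pivot(select(lambda name: name == "tools")))))
```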


International Conference on Data Engineering (ICDE) | 1986

Buffering schemes for permanent data

George P. Copeland; Setrag N. Khoshafian; Marc G. Smith; Patrick Valduriez

The availability of larger RAM spaces for DBMSs provides interesting opportunities for performance enhancements, especially in buffer management. In this paper we propose and compare two alternative strategies for the buffer management of permanent data (i.e., the data committed by transactions) called block buffering and attribute buffering. These strategies use statistics to capture the changing locality of a reference string. We model and demonstrate the impact of locality on the performance of buffering. We also analyze and compare the effect of both the attribute and predicate dimensions of locality on buffering, varying a number of parameters including the degree of locality, RAM size, and RAM utilization.
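
A small simulation illustrates why buffering granularity matters. The sketch below compares buffering whole blocks against buffering individual attribute columns under a plain LRU policy; the reference string, capacities, and the LRU policy itself are assumptions, not the paper's model. When a workload repeatedly touches one attribute of many blocks, the same RAM holds four times as many attribute-sized units, so locality in the attribute dimension turns misses into hits.

```python
from collections import OrderedDict

class LRUBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, unit):
        if unit in self.cache:
            self.cache.move_to_end(unit)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.cache[unit] = True
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used

# Reference string: repeatedly touch one attribute of eight blocks.
refs = [(blk, "price") for blk in range(8)] * 3

block_buf = LRUBuffer(capacity=4)     # buffers whole blocks (4 attrs each)
attr_buf = LRUBuffer(capacity=4 * 4)  # same RAM, attribute-sized units
for blk, attr in refs:
    block_buf.access(blk)
    attr_buf.access((blk, attr))
print("block buffering hits:", block_buf.hits)     # 0 of 24
print("attribute buffering hits:", attr_buf.hits)  # 16 of 24
```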


IWDM '89: Proceedings of the Sixth International Workshop on Database Machines | 1989

An Experiment on Response Time Scalability in Bubba

Marc G. Smith; Bill Alexander; Haran Boral; George P. Copeland; Tom Keller; Herb Schwetman; Chii-Ren Young

We describe results from an experiment that investigates the scalability of response time performance in shared-nothing systems, such as the Bubba parallel database machine. In particular, we show how—and how much—potential response time improvements for certain transaction types can be impaired in shared-nothing architectures by the increased cost of transaction startup, communication, and synchronization as the degree of execution parallelism is increased. We further show how these effects change under increased levels of concurrency and heterogeneity in the transaction workload. From the results, we conclude that although parallelism must be limited in some circumstances, in general the benefits of increased parallelism in shared-nothing systems exceed the costs.
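
The trade-off the experiment measures can be caricatured with a one-line cost model: useful work shrinks as 1/n while startup, communication, and synchronization overhead grow with n, so response time falls, bottoms out, and then rises. The coefficients below are invented for illustration, not Bubba's measured values.

```python
def response_time(n, work=1000.0, startup=2.0, comm=0.5):
    # work/n: perfectly divided useful work; (startup + comm) * n: overhead
    # that grows with the degree of execution parallelism.
    return work / n + (startup + comm) * n

for n in (1, 5, 10, 20, 40, 80):
    print(f"{n:3d} nodes -> {response_time(n):7.1f} time units")
# Response time falls, bottoms out (here near 20 nodes), then rises as
# startup/communication/synchronization overhead dominates.
```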

Collaboration


Dive into George P. Copeland's collaborations.

Top Co-Authors

Marc G. Smith (Monroe Community College)
William Alexander (Los Alamos National Laboratory)
Haran Boral (University of Wisconsin-Madison)
Bill Alexander (Monroe Community College)
Chii-Ren Young (Monroe Community College)
Herb Schwetman (Monroe Community College)
Tom Keller (Monroe Community College)