
Doron Rotem

Lawrence Berkeley National Laboratory


Publication

Featured research published by Doron Rotem.

international conference on data engineering | 1991

Spatial join indices

Doron Rotem

Algorithms based on grid files as the underlying spatial index are presented for spatial joins in databases which store images, pictures, maps and drawings. For typical data distributions, it is shown that the size of the index and its maintenance cost are relatively small. The effect of diagonal distributions and different densities of the two grid files on the size of the index is also studied. It is expected that similar algorithms can be employed with other types of multidimensional data structures.

Statistics and Computing | 1995

Random sampling from databases: a survey

Frank Olken; Doron Rotem

This paper reviews recent literature on techniques for obtaining random samples from databases. We begin with a discussion of why one would want to include sampling facilities in database management systems. We then review basic sampling techniques used in constructing DBMS sampling algorithms, e.g. acceptance/rejection and reservoir sampling. A discussion of sampling from various data structures follows: B+ trees, hash files, spatial data structures (including R-trees and quadtrees). Algorithms for sampling from simple relational queries, e.g. single relational operators such as selection, intersection, union, set difference, projection, and join are then described. We then describe sampling for estimation of aggregates (e.g. the size of query results). Here we discuss both clustered sampling, and sequential sampling approaches. Decision-theoretic approaches to sampling for query optimization are reviewed.
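As an illustration of one of the basic techniques the abstract names, here is a minimal sketch of reservoir sampling in its textbook form (often called Algorithm R). It is not taken from the survey itself; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable
    whose length is not known in advance (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # One sequential pass over 1000 "records", keeping 10 of them.
    print(reservoir_sample(range(1000), 10, random.Random(0)))
```

The single sequential pass is what makes this approach attractive for files whose record count is unknown or expensive to obtain.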

statistical and scientific database management | 1990

Random Sampling from Database Files: A Survey

Frank Olken; Doron Rotem

In this paper we survey known results on algorithms, data structures, and some applications of random sampling from databases. We first discuss various reasons for sampling from databases, and for inclusion of sampling as a DBMS operator. We consider basic sampling algorithms, sampling from trees, sampling from hash tables, and auxiliary memory resident index information to facilitate sampling.

statistical and scientific database management | 1999

Multidimensional indexing and query coordination for tertiary storage management

Arie Shoshani; Luis M. Bernardo; Henrik Nordberg; Doron Rotem; Alex Sim

In many scientific domains, experimental devices or simulation programs generate large volumes of data. The volumes of data may reach hundreds of terabytes and therefore it is impractical to store them on disk systems. Rather they are stored on robotic tape systems that are managed by some mass storage system (MSS). A major bottleneck in analyzing the simulated/collected data is the retrieval of subsets from the tertiary storage system. We describe the architecture and implementation of a Storage Access Coordination System (STACS) designed to optimize the use of a disk cache, and thus minimize the number of files read from tape. We achieve this by using a specialized index to locate the relevant data on tapes, and by coordinating file caching over multiple queries. We focus on a specific application area, a high energy physics data management and analysis environment. STACS was implemented and is being incorporated in an operational system, scheduled to go online at the end of 1999. We also include the results of various tests that demonstrate the benefits and efficiency gained by using STACS.

Journal of the ACM | 1978

Generation of Binary Trees from Ballot Sequences

Doron Rotem; Yaakov L. Varol

An efficient algorithm for generating and indexing all shapes of n-noded binary trees is described. The algorithm is based on a correspondence between binary trees and the class of stack-sortable permutations, together with a representation of such permutations as ballot sequences. Justification for the related procedures is given, and their efficiency established by comparison to other approaches.
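The link between ballot sequences and binary tree shapes can be sketched as follows. This is not the paper's algorithm (which goes through stack-sortable permutations); it uses the standard decomposition in which a nonempty sequence reads as `1 + left + 0 + right`. The names `ballot_sequences` and `to_tree`, and the tuple encoding of trees, are assumptions for illustration.

```python
def ballot_sequences(n):
    """Yield every string of n '1's and n '0's in which each prefix
    contains at least as many '1's as '0's."""
    def extend(prefix, ones, zeros):
        if ones == n and zeros == n:
            yield "".join(prefix)
            return
        if ones < n:
            prefix.append("1")
            yield from extend(prefix, ones + 1, zeros)
            prefix.pop()
        if zeros < ones:
            prefix.append("0")
            yield from extend(prefix, ones, zeros + 1)
            prefix.pop()
    yield from extend([], 0, 0)


def to_tree(seq):
    """Decode a ballot sequence into a binary tree shape.
    A tree is None (empty) or a pair (left_subtree, right_subtree)."""
    def parse(pos):
        if pos < len(seq) and seq[pos] == "1":
            left, pos = parse(pos + 1)
            pos += 1  # skip the '0' that closes the left subtree
            right, pos = parse(pos)
            return (left, right), pos
        return None, pos
    tree, _ = parse(0)
    return tree


if __name__ == "__main__":
    for s in ballot_sequences(3):
        print(s, to_tree(s))
```

For n nodes, the number of sequences produced equals the n-th Catalan number, and each sequence decodes to a distinct tree shape.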

international conference on data engineering | 1995

Buffer management for video database systems

Doron Rotem; J.L. Zhao

Future multimedia information systems are likely to manage thousands of videos with various lengths and display requirements. Mismatch of playback and delivery rates of compressed video data requires sophisticated buffer management algorithms to guarantee smooth playback of video data. In this paper, we address some of the many design and operational issues including buffer size requirements, refreshing policies, and support of multiple access points to the same video object. Three different buffer management strategies are proposed and analyzed to minimize the average waiting time while ensuring display without jerkiness. We also evaluate the effectiveness of these buffer management strategies with a simulation study.

IEEE Transactions on Computers | 1985

Distributed Sorting

Doron Rotem; Nicola Santoro; Jeffrey B. Sidney

The problem of sorting a file distributed over a number of sites of a communication network is examined. Two versions of this problem are investigated; distributed solution algorithms are presented; and their communication complexity analyzed both in the worst and in the average case. The worst case bounds are shown to be sharp, with respect to order of magnitude, for large files.

Information Systems | 1995

Efficient organization and access of multi-dimensional datasets on tertiary storage systems

Ling Tony Chen; Robert S. Drach; M. Keating; Steven S. Louis; Doron Rotem; Arie Shoshani

This paper addresses the problem of urgently needed data management techniques for efficiently retrieving requested subsets of large datasets from mass storage devices. This problem is especially critical for scientific investigators who need ready access to the large volume of data generated by large-scale supercomputer simulations and physical experiments as well as the automated collection of observations by monitoring devices and satellites. This problem also negates the benefits of fast networks, because the time to access a subset from a large dataset stored on a mass storage system is much greater than the time to transmit that subset over a fast network. This paper focuses on very large spatial and temporal datasets generated by simulation of climate models, but the techniques described here are applicable to any large multidimensional grid data. The main requirement is to efficiently access relevant information contained within much larger datasets for analysis and interactive visualization. Although these problems are now becoming more widely recognized, the problem persists because the access speed of robotic storage devices continues to be the bottleneck. To address this problem, we have developed algorithms for partitioning the original datasets into “clusters” based on analysis of data access patterns and storage device characteristics. Further, we have designed enhancements to current storage server protocols to permit control over physical placement of data on storage devices. We describe in this paper the approach we have taken, the partitioning algorithms, and simulation and experimental results that show 1 to 2 orders of magnitude in access improvements for predicted query types. We further describe the design and implementation of improvements to a specific storage management system, UniTree, which are necessary to support the enhanced protocols. In addition, we describe the development of a partitioning workbench to help scientists select the preferred solutions.

conference on information and knowledge management | 2005

Optimizing candidate check costs for bitmap indices

Doron Rotem; Kurt Stockinger; Kesheng Wu

In this paper, we propose a new strategy for optimizing the placement of bin boundaries to minimize the cost of query evaluation using bitmap indices with binning. For attributes with a large number of distinct values, often the most efficient index scheme is a bitmap index with binning. However, this type of index may not be able to fully resolve some user queries. To fully resolve these queries, one has to access parts of the original data to check whether certain candidate records actually satisfy the specified conditions. We call this procedure the candidate check, which usually dominates the total query processing time. Given a set of user queries, we seek to minimize the total time required to answer the queries by optimally placing the bin boundaries. We show that our dynamic programming based algorithm can efficiently determine the bin boundaries. We verify our analysis with some real user queries from the Sloan Digital Sky Survey. For queries that require a significant amount of time to perform the candidate check, using our optimal bin boundaries reduces the candidate check time by a factor of 2 and the total query processing time by 40%.
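To make the candidate check concrete, the following is a toy sketch of a binned index answering a one-sided range query. It is not the authors' implementation and does not include their dynamic programming step; Python sets stand in for bitmaps, and the names `build_binned_index` and `query_less_than` are assumptions for this example.

```python
from bisect import bisect_right


def build_binned_index(values, boundaries):
    """Bin i holds the row ids whose value v satisfies
    boundaries[i-1] <= v < boundaries[i] (open-ended at both extremes)."""
    bins = [set() for _ in range(len(boundaries) + 1)]
    for row, v in enumerate(values):
        bins[bisect_right(boundaries, v)].add(row)
    return bins


def query_less_than(values, boundaries, bins, c):
    """Answer `value < c`. Returns (matching row ids, rows candidate-checked)."""
    e = bisect_right(boundaries, c)
    # Bins entirely below c are resolved from the index alone.
    hits = set().union(*bins[:e])
    # The edge bin straddles c unless c coincides with a bin boundary.
    needs_check = e == 0 or boundaries[e - 1] < c
    candidates = bins[e] if needs_check else set()
    # Candidate check: consult the raw data for rows in the edge bin.
    hits |= {row for row in candidates if values[row] < c}
    return hits, len(candidates)


if __name__ == "__main__":
    values = [1, 5, 12, 18, 25, 33, 7, 29]
    boundaries = [10, 20, 30]
    bins = build_binned_index(values, boundaries)
    print(query_less_than(values, boundaries, bins, 15))
    print(query_less_than(values, boundaries, bins, 20))
```

In this toy data, the query with constant 15 must inspect the raw values of two rows, while the query with constant 20 falls exactly on a bin boundary and needs no candidate check, which is the kind of saving that boundary placement aims for.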

symposium on principles of database systems | 1988

Analytical modeling of materialized view maintenance

Jaideep Srivastava; Doron Rotem

A materialized view is a stored copy of the result of retrieving the view from the database. We consider here views that can be constructed from the relational algebra operations select, project and join. Also, aggregates such as sum or count over views are considered. Conventional systems use query modification, where the query on a view is modified to operate on one or more of the base relations [STON 75]. Materializing a view before a query is made on it has been a recent proposal
