Frank Olken

Lawrence Berkeley National Laboratory

Publication

Featured research published by Frank Olken.

international conference on management of data | 1984

Implementation techniques for main memory database systems

David J. DeWitt; Randy H. Katz; Frank Olken; Leonard D. Shapiro; Michael Stonebraker; David A. Wood

With the availability of very large, relatively inexpensive main memories, it is becoming possible to keep large databases resident in main memory. In this paper we consider the changes necessary to permit a relational database system to take advantage of large amounts of main memory. We evaluate AVL vs B+-tree access methods for main memory databases, hash-based query processing strategies vs sort-merge, and study recovery issues when most or all of the database fits in main memory. As expected, B+-trees are the preferred storage mechanism unless more than 80-90% of the database fits in main memory. A somewhat surprising result is that hash-based query processing strategies are advantageous for large memory situations.
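As a rough illustration of the hash-based query processing the abstract refers to, here is a minimal in-memory hash join sketch. It is not taken from the paper; the function name `hash_join` and the dictionary row format are assumptions made for this example.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dict rows (hypothetical row format).

    Build phase: hash the (ideally smaller) relation on its join key.
    Probe phase: look up each row of the other relation in that table.
    """
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    joined = []
    for row in probe_rows:
        # .get avoids creating empty entries for keys with no match
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined

depts = [{"dept_id": 1, "dept": "db"}, {"dept_id": 2, "dept": "os"}]
emps = [
    {"name": "a", "dept_id": 1},
    {"name": "b", "dept_id": 3},
    {"name": "c", "dept_id": 1},
]
result = hash_join(depts, emps, "dept_id", "dept_id")
```

When both relations fit in memory, each row is touched once and no sort is needed, which is one intuition for why hashing can compete well with sort-merge in large-memory settings.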

Statistics and Computing | 1995

Random sampling from databases: a survey

Frank Olken; Doron Rotem

This paper reviews recent literature on techniques for obtaining random samples from databases. We begin with a discussion of why one would want to include sampling facilities in database management systems. We then review basic sampling techniques used in constructing DBMS sampling algorithms, e.g. acceptance/rejection and reservoir sampling. A discussion of sampling from various data structures follows: B+ trees, hash files, spatial data structures (including R-trees and quadtrees). Algorithms for sampling from simple relational queries, e.g. single relational operators such as selection, intersection, union, set difference, projection, and join, are then described. We then describe sampling for estimation of aggregates (e.g. the size of query results). Here we discuss both clustered sampling and sequential sampling approaches. Decision-theoretic approaches to sampling for query optimization are reviewed.
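Reservoir sampling, one of the basic techniques named above, can be sketched as follows. This is the textbook single-pass variant (often called Algorithm R), shown only for illustration; the function name and the optional `rng` parameter are assumptions of this example, not something specified by the survey.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random, without
    replacement, from an iterable whose length is not known in advance."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1000), 10, random.Random(0))
```

The appeal for a DBMS is that the sample is produced in one sequential scan, so the relation size does not have to be known before sampling starts.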

statistical and scientific database management | 1990

Random Sampling from Database Files: A Survey

Frank Olken; Doron Rotem

In this paper we survey known results on algorithms, data structures, and some applications of random sampling from databases. We first discuss various reasons for sampling from databases, and for inclusion of sampling as a DBMS operator. We consider basic sampling algorithms, sampling from trees, sampling from hash tables, and auxiliary memory resident index information to facilitate sampling.

international conference on management of data | 2004

Database management for life sciences research

H. V. Jagadish; Frank Olken

The life sciences provide a rich application domain for data management research, with a broad diversity of problems that can make a significant difference to progress in life sciences research. This article is an extract from the Report of the NSF Workshop on Data Management for Molecular and Cell Biology, edited by H. V. Jagadish and Frank Olken. The workshop was held at the National Library of Medicine, Bethesda, MD, Feb. 2-3, 2003.

conference on high performance computing (supercomputing) | 2002

Disk Cache Replacement Algorithm for Storage Resource Managers in Data Grids

Ekow J. Otoo; Frank Olken; Arie Shoshani

We address the problem of cache replacement policies for Storage Resource Managers (SRMs) that are used in Data Grids. An SRM has a disk storage of bounded capacity that retains some N objects. A replacement policy is applied to determine which object in the cache needs to be evicted when space is needed. We define a utility function for ranking the candidate objects for eviction and then describe an efficient algorithm for computing the replacement policy based on this function. This computation takes time O(log N). We compare our policy with traditional replacement policies such as Least Frequently Used (LFU), Least Recently Used (LRU), LRU-K, Greedy Dual Size (GDS), etc., using simulations of both synthetic and real workloads of file accesses to tertiary storage. Our simulations of replacement policies account for delays in cache space reservation, data transfer and processing. The results obtained show that our proposed method is the most cost effective cache replacement policy for Storage Resource Managers (SRMs).
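The general idea of ranking cached objects by a utility value and evicting the lowest-ranked one in O(log N) time can be sketched with a binary heap. The abstract does not give the paper's utility function, so the utility here is simply a number supplied by the caller; the class name `UtilityCache` and its methods are hypothetical.

```python
import heapq

class UtilityCache:
    """Bounded-capacity cache that evicts the lowest-utility object first.

    The utility is passed in by the caller (an assumption of this sketch);
    it is NOT the utility function defined in the paper.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.sizes = {}   # object name -> size
        self.heap = []    # min-heap of (utility, name)

    def insert(self, name, size, utility):
        if size > self.capacity:
            raise ValueError("object larger than cache")
        if name in self.sizes:
            raise ValueError("object already cached")
        evicted = []
        # Each eviction is one heap pop, i.e. O(log N).
        while self.used + size > self.capacity:
            _, victim = heapq.heappop(self.heap)
            self.used -= self.sizes.pop(victim)
            evicted.append(victim)
        heapq.heappush(self.heap, (utility, name))
        self.sizes[name] = size
        self.used += size
        return evicted

cache = UtilityCache(capacity=100)
first = cache.insert("a", 40, utility=0.9)
second = cache.insert("b", 40, utility=0.2)
third = cache.insert("c", 40, utility=0.5)
```

A real policy would also need to update utilities as access patterns change, which this sketch leaves out.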

international conference on data engineering | 1992

Maintenance of materialized views of sampling queries

Frank Olken; Doron Rotem

The authors discuss materialized views of random sampling queries of a relational database. They show how to maintain such views in the presence of insertions, deletions, and updates of the base relations. The basic idea is to reuse the maximal portion of the original sample when constructing the updated sample. The results are based on a synthesis of view update techniques and sampling algorithms. It is demonstrated that maintenance of materialized sample views may be substantially cheaper than resampling.

international conference on management of data | 1990

Random sampling from hash files

Frank Olken; Doron Rotem; Ping Xu

In this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms from both static and dynamic hashing methods. The static methods considered are open addressing hash files and hash files with separate overflow chains. The dynamic hashing methods considered are Linear Hash files [Lit80] and Extendible Hash files [FNPS79]. We give the cost of sampling in terms of the cost of successfully searching a hash file and show how to exploit features of the dynamic hashing methods to improve sampling efficiency.
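One common way to draw a uniform record from a bucketed hash structure is acceptance/rejection over (bucket, slot) pairs. The sketch below illustrates that general idea on in-memory lists; it is an assumption of this example and not necessarily any of the algorithms analyzed in the paper. The names `sample_from_buckets` and `max_bucket_size` are hypothetical.

```python
import random

def sample_from_buckets(buckets, max_bucket_size, rng=None):
    """Draw one record uniformly at random from a list of buckets.

    Pick a random bucket and a random slot in [0, max_bucket_size);
    accept only if the slot is occupied. Every record is then hit with
    the same probability per trial, regardless of bucket occupancy.
    """
    if not any(buckets):
        raise ValueError("no records to sample")
    if max_bucket_size < max(len(b) for b in buckets):
        raise ValueError("max_bucket_size is smaller than a bucket")
    rng = rng or random.Random()
    while True:
        bucket = buckets[rng.randrange(len(buckets))]
        slot = rng.randrange(max_bucket_size)
        if slot < len(bucket):
            return bucket[slot]   # accept
        # otherwise reject and retry

buckets = [[1, 2, 3], [], [4]]
rng = random.Random(1)
draws = [sample_from_buckets(buckets, 3, rng) for _ in range(1000)]
```

Choosing a bucket uniformly and then a record within it would over-represent records in sparsely filled buckets; the rejection step is what corrects for that.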

international conference on data engineering | 1993

Sampling from spatial databases

Frank Olken; Doron Rotem

Techniques for obtaining random point samples from spatial databases are described. Random points are sought from a continuous domain that satisfy a spatial predicate which is represented in the database as a collection of polygons. Several applications of spatial sampling are described. Sampling problems are characterized in terms of two key parameters: coverage (selectivity), and expected stabbing number (overlap). Two fundamental approaches to sampling with spatial predicates, depending on whether one samples first or evaluates the predicate first, are discussed. The approaches are described in the context of both quadtrees and R-trees, detailing the sample-first, A/R-tree, and partial area tree algorithms. A sequential algorithm, the one-pass spatial reservoir algorithm, is also described.
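The sample-first approach can be pictured as: generate a random point in the enclosing domain, then keep it only if it satisfies the spatial predicate. The sketch below is a simplified illustration under stated assumptions: the predicate is a set of axis-aligned rectangles rather than general polygons, no quadtree or R-tree index is used, and the names `sample_first`, `rects`, and `bbox` are hypothetical.

```python
import random

def sample_first(rects, bbox, rng=None, max_tries=100_000):
    """Return one point uniform over the union of rectangles.

    rects: list of (x0, y0, x1, y1) rectangles (simplified predicate).
    bbox:  (x0, y0, x1, y1) domain that encloses all rectangles.
    """
    rng = rng or random.Random()
    bx0, by0, bx1, by1 = bbox
    for _ in range(max_tries):
        x = rng.uniform(bx0, bx1)
        y = rng.uniform(by0, by1)
        # Accept if the point falls in any region; overlapping regions
        # do not increase the chance of a point being chosen.
        if any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in rects):
            return (x, y)
    raise RuntimeError("coverage too low; no point accepted")

rects = [(0, 0, 1, 1), (0.5, 0.5, 2, 2)]
rng = random.Random(7)
points = [sample_first(rects, (0, 0, 4, 4), rng) for _ in range(100)]
```

The expected number of tries per accepted point is the inverse of the coverage, which is why low-coverage predicates favor evaluating the predicate first.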

Uncertainty Management in Information Systems | 1997

Uncertain, Incomplete, and Inconsistent Data in Scientific and Statistical Databases

Stephen K. Kwan; Frank Olken; Doron Rotem

This chapter is a survey of several issues and applications in uncertain, inconsistent, and incomplete data in scientific and statistical databases (SSDBs).

Omics A Journal of Integrative Biology | 2003

Graph Data Management for Molecular Biology

Frank Olken

It is our view that labeled directed graph data models (simple, nested, or hypergraphs) can naturally and usefully capture a wide variety of biological data and queries. We believe that general purpose graph data management systems (GDMSs) could become major platforms for development of a wide variety of bioinformatics database systems, spanning applications from DNA sequences and chemical structure graphs to contact graphs and biopathways. The use of a common data model and query language across numerous biological applications is very attractive from both a development and user standpoint. Considerable research and development effort is still required to extend current systems to meet modern requirements.