Is this you? Create Your Porfile

Arie Shoshani

Lawrence Berkeley National Laboratory

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Arie Shoshani is active.

Explore More

Publication

Featured researches published by Arie Shoshani.

statistical and scientific database management | 1997

Summarizability in OLAP and statistical data bases

Hans-Joachim Lenz; Arie Shoshani

The summarizability of OLAP (online analytical processing) and statistical databases is an a extremely important property, because violating this condition can lead to erroneous conclusions and decisions. In this paper, we explore the conditions for summarizability. We introduce a framework for precisely specifying the context in which statistical objects are defined. We use a three-step process to define normalized statistical objects. Using this framework, we identify three necessary conditions for summarizability. We provide specific tests for each of the conditions that can be verified either from semantic knowledge or by checking the statistical database itself. We also provide the reasoning for our belief that these three summarizability conditions are sufficient as well.

ACM Transactions on Database Systems | 2006

Optimizing bitmap indices with efficient compression

Kesheng Wu; Ekow J. Otoo; Arie Shoshani

Bitmap indices are efficient for answering queries on low-cardinality attributes. In this article, we present a new compression scheme called Word-Aligned Hybrid (WAH) code that makes compressed bitmap indices efficient even for high-cardinality attributes. We further prove that the new compressed bitmap index, like the best variants of the B-tree index, is optimal for one-dimensional range queries. More specifically, the time required to answer a one-dimensional range query is a linear function of the number of hits. This strongly supports the well-known observation that compressed bitmap indices are efficient for multidimensional range queries because results of one-dimensional range queries computed with bitmap indices can be easily combined to answer multidimensional range queries. Our timing measurements on range queries not only confirm the linear relationship between the query response time and the number of hits, but also demonstrate that WAH compressed indices answer queries faster than the commonly used indices including projection indices, B-tree indices, and other compressed bitmap indices.

symposium on principles of database systems | 1997

OLAP and statistical databases: similarities and differences

Arie Shoshani

During the 1980’s there was a lot of activity in the area of Statistical Databases, focusing mostly on socioeconomic type applications, such as census data, national production and consumption patterns, etc. Tn the 1990’s the area of On-LineAnalytic Processing (OLAP) was introduced for the analysis of transaction based business data, such as retail stores transactions. Both areas deal with the representation and support of data in a multi-dimensional space. Much of the OLAP literature does not refer to the Statistical Database literature, perhaps because the connection between analyzing business data and socioeconomic data is not obvious. Furthermore, there are papers published in one area or the other whose results can be applied in both application areas. In this paper, we compare the work done in these two areas. We discuss concepts used in the conceptual modeling of the data and operations over them, efficient physical organization and access methods, as well as privacy issues. We point out the terminology used and the correspondence between terms. We identify which research aspects are emphasized in each of these areas and the reasons for that We conclude by arguing for the support of a Statistical Object data type as one of the fundamental structures that object-oriented data models and systems should support

arXiv: Computational Engineering, Finance, and Science | 2005

The Earth System Grid: Supporting the Next Generation of Climate Modeling Research

David E. Bernholdt; Shishir Bharathi; David Brown; Kasidit Chanchio; Meili Chen; Ann L. Chervenak; Luca Cinquini; Bob Drach; Ian T. Foster; Peter Fox; José I. García; Carl Kesselman; Rob S. Markel; Don Middleton; Veronika Nefedova; Line C. Pouchard; Arie Shoshani; Alex Sim; Gary Strand; Dean N. Williams

Understanding the Earths climate system and how it might be changing is a preeminent scientific challenge. Global climate models are used to simulate past, present, and future climates, and experiments are executed continuously on an array of distributed supercomputers. The resulting data archive, spread over several sites, currently contains upwards of 100 TB of simulation data and is growing rapidly. Looking toward mid-decade and beyond, we must anticipate and prepare for distributed climate research data holdings of many petabytes. The Earth System Grid (ESG) is a collaborative interdisciplinary project aimed at addressing the challenge of enabling management, discovery, access, and analysis of these critically important datasets in a distributed and heterogeneous computational environment. The problem is fundamentally a Grid problem. Building upon the Globus toolkit and a variety of other technologies, ESG is developing an environment that addresses authentication, authorization for data access, large-scale data transport and management, services and abstractions for high-performance remote data access, mechanisms for scalable data replication, cataloging with rich semantic and syntactic information, data discovery, distributed monitoring, and Web-based portals for using the system.

very large data bases | 2004

On the performance of bitmap indices for high cardinality attributes

Kesheng Wu; Ekow J. Otoo; Arie Shoshani

It is well established that bitmap indices are efficient for read-only attributes with low attribute cardinalities. For an attribute with a high cardinality, the size of the bitmap index can be very large. To overcome this size problem, specialized compression schemes are used. Even though there are empirical evidences that some of these compression schemes work well, there has not been any systematic analysis of their effectiveness. In this paper, we systematically analyze the two most efficient bitmap compression techniques, the Byte-aligned Bitmap Code (BBC) and the Word-Aligned Hybrid (WAH) code. Our analyses show that both compression schemes can be optimal. We propose a novel strategy to select the appropriate algorithms so that this optimality is achieved in practice. In addition, our analyses and tests show that the compressed indices are relatively small compared with commonly used indices such as B-trees. Given these facts, we conclude that bitmap index is efficient on attributes of low cardinalities as well as on those of high cardinalities.

statistical and scientific database management | 2002

Compressing bitmap indexes for faster search operations

Kesheng Wu; Ekow J. Otoo; Arie Shoshani

We study the effects of compression on bitmap indexes. The main operations on the bitmaps during query processing are bitwise logical operations. Using the general purpose compression schemes the logical operations on the compressed bitmaps are much slower than on the uncompressed bitmaps. Specialized compression schemes, like the byte-aligned bitmap code (BBC), are usually faster in performing logical operations than the general purpose schemes, but in many cases they are still orders of magnitude slower than the uncompressed scheme. To make the compressed bitmap indexes operate more efficiently, we designed a CPU-friendly scheme which we refer to as the word-aligned hybrid code (WAH). Tests on both synthetic and real application data show that the new scheme significantly outperforms well-known compression schemes at a modest increase in storage space. Compared to BBC, WAH performs logical operations about 12 times faster and uses only 60% more space. Compared to the uncompressed scheme, in most test cases WAH is faster while still using less space. We further verified with additional tests that the improvement in logical operation speed translates to similar improvement in query processing speed.

Lawrence Berkeley National Laboratory | 2009

FastBit: interactively searching massive data

Kesheng Wu; Sean Ahern; Edward W Bethel; Jacqueline H. Chen; Hank Childs; E. Cormier-Michel; Cameron Geddes; Junmin Gu; Hans Hagen; Bernd Hamann; Wendy S. Koegler; Jerome Lauret; Jeremy S. Meredith; Peter Messmer; Ekow J. Otoo; V Perevoztchikov; A. M. Poskanzer; Prabhat; Oliver Rübel; Arie Shoshani; Alexander Sim; Kurt Stockinger; Gunther H. Weber; W. M. Zhang

As scientific instruments and computer simulations produce more and more data, the task of locating the essential information to gain insight becomes increasingly difficult. FastBit is an efficient software tool to address this challenge. In this article, we present a summary of the key underlying technologies, namely bitmap compression, encoding, and binning. Together these techniques enable FastBit to answer structured (SQL) queries orders of magnitude faster than popular database systems. To illustrate how FastBit is used in applications, we present three examples involving a high-energy physics experiment, a combustion simulation, and an accelerator simulation. In each case, FastBit significantly reduces the response time and enables interactive exploration on terabytes of data.

ACM Transactions on Database Systems | 1992

Representing extended entity-relationship structures in relational databases: a modular approach

Victor Markowitz; Arie Shoshani

A common approach to database design is to describe the structures and constraints of the database application in terms of a semantic data model, and then represent the resulting schema using the data model of a commercial database management system. Often, in practice, Extended Entity-Relationship (EER) schemas are translated into equivalent relational schemas. This translation involves different aspects: representing the EER schema using relational constructs, assigning names to relational attributes, normalization, and merging relations. Considering these aspects together, as is usually done in the design methodologies proposed in the literature, is confusing and leads to inaccurate results. We propose to treat separately these aspects and split the translation into four stages (modules) corresponding to the four aspects mentioned above. We define criteria for both evaluating the correctness of and characterizing the relationship between alternative relational representations of EER schemas.

IEEE Transactions on Software Engineering | 1985

Statistical and Scientific Database Issues

Arie Shoshani; Harry K. T. Wong

The purpose of this paper is to summarize the research issues of statistical and scientific databases (SSDBs). It organizes the issues into four major groups: physical organization and access methods, operators, logical organization and user interfaces, and miscellaneous issues. It emphasizes the differences between SSDBs and traditional database applications, and motivates the need for new and innovative techniques for the support of SSDBs. In addition to describing current work in this field, it discusses open research areas and proposes possible approaches to their solution.

statistical and scientific database management | 2003

Using bitmap index for interactive exploration of large datasets

Kesheng Wu; Wendy S. Koegler; Jacqueline H. Chen; Arie Shoshani

Many scientific applications generate large spatio-temporal datasets. A common way of exploring these datasets is to identify and track regions of interest. Usually these regions are defined as contiguous sets of points whose attributes satisfy some user defined conditions, e.g. high temperature regions in a combustion simulation. At each time step, the regions of interest may be identified by first searching for all points that satisfy the conditions and then grouping the points into connected regions. To speed up this process, the searching step may use a tree-based indexing scheme, such as a KD-tree or an Octree. However, these indices are efficient only if the searches are limited to one or a small number of selected attributes. Scientific datasets often contain hundreds of attributes and scientists frequently study these attributes in complex combinations, e.g. finding regions of high temperature and low pressure. Bitmap indexing is an efficient method for searching on multiple criteria simultaneously. We apply a bitmap compression scheme to reduce the size of the indices. In addition, we show that the compressed bitmaps can be used efficiently to perform the region growing and the region tracking operations. Analyses show that our approach scales well and our tests on two datasets from simulation of the autoignition process show impressive performance.

Explore More