Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Rimma V. Nehme is active.

Publication


Featured research published by Rimma V. Nehme.


Very Large Data Bases | 2008

Transaction time indexing with version compression

David B. Lomet; Mingsheng Hong; Rimma V. Nehme; Rui Zhang

Immortal DB is a transaction-time database system designed to enable high performance for temporal applications. It is built into a commercial database engine, Microsoft SQL Server. This paper describes how we integrated a temporal indexing technique, the TSB-tree, into Immortal DB to serve as the core access method. The TSB-tree provides high-performance access and updates for both current and historical data. A main challenge was integrating TSB-tree functionality while preserving original B+tree functionality, including concurrency control and recovery. We discuss the overall architecture, including our unique treatment of index terms, and practical issues such as uncommitted data and log management. Performance is a primary concern: versions are locally delta-compressed, exploiting the commonality between adjacent versions of the same record, and the same technique is applied to index terms in index pages. There is a tradeoff between query performance and storage space, and we discuss optimizing performance with respect to this tradeoff throughout the paper. The result of our efforts is a high-performance transaction-time database system built into an RDBMS engine, which has not been achieved before. We include a thorough experimental study and analysis confirming the strong performance the system achieves.
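
The delta-compression idea the abstract describes lends itself to a short illustration. The sketch below is a minimal, hypothetical take on storing an older record version as a delta against its newer neighbor, exploiting the shared prefix and suffix between adjacent versions; the record layout and the helper names make_delta/apply_delta are invented here and are not Immortal DB's actual page format.

    # Store an older version as (prefix_len, suffix_len, differing middle)
    # relative to the newer version; a hypothetical sketch, not the real
    # Immortal DB format.

    def make_delta(newer: bytes, older: bytes):
        # Longest common prefix.
        p = 0
        while p < min(len(newer), len(older)) and newer[p] == older[p]:
            p += 1
        # Longest common suffix, not overlapping the prefix.
        s = 0
        while (s < min(len(newer), len(older)) - p
               and newer[len(newer) - 1 - s] == older[len(older) - 1 - s]):
            s += 1
        return (p, s, older[p:len(older) - s])

    def apply_delta(newer: bytes, delta):
        p, s, middle = delta
        return newer[:p] + middle + (newer[len(newer) - s:] if s else b"")

    v2 = b"id=42;name=Alice;city=Madison;ver=2"   # current version, stored in full
    v1 = b"id=42;name=Alice;city=Boston;ver=1"    # historical version
    d = make_delta(v2, v1)
    assert apply_delta(v2, d) == v1               # older version recovered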


International Conference on Management of Data | 2011

Automated partitioning design in parallel database systems

Rimma V. Nehme; Nicolas Bruno

In recent years, Massively Parallel Processors (MPPs) have gained ground, enabling the processing of vast amounts of data. In such environments, data is partitioned across multiple compute nodes, which results in dramatic performance improvements during parallel query execution. To evaluate certain relational operators in a query correctly, data sometimes needs to be re-partitioned (i.e., moved) across compute nodes. Since data movement operations are much more expensive than relational operations, it is crucial to design a data partitioning strategy that minimizes the cost of such expensive transfers. A good partitioning strategy depends strongly on how the parallel system will be used. In this paper we present a partitioning advisor that recommends the best partitioning design for an expected workload. Our tool recommends which tables should be replicated (i.e., copied to every compute node) and which should be distributed according to specific column(s) so that the cost of evaluating similar workloads is minimized. In contrast to previous work, our techniques are deeply integrated with the underlying parallel query optimizer, which results in more accurate recommendations in a shorter amount of time. Our experimental evaluation using a real MPP system, Microsoft SQL Server 2008 Parallel Data Warehouse, with both real and synthetic workloads shows the effectiveness of the proposed techniques and the importance of deep integration of the partitioning advisor with the underlying query optimizer.
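
To make the replicate-versus-distribute tradeoff concrete, here is a deliberately simplified cost comparison in the spirit of what such an advisor weighs. The node count, row counts, workload, and cost formulas are invented for illustration and do not reflect the actual PDW cost model.

    # Replication avoids join-time movement but multiplies storage and
    # maintenance; distributing on a column only helps queries that join
    # on that column. All numbers are illustrative.

    NODES = 8
    DIM_ROWS = 100_000

    # Workload: (join column, query frequency) for queries on this table.
    workload = [("cust_key", 50), ("region_key", 10)]

    def movement_cost(design, join_col):
        if design == "replicate":
            return 0                    # every node already has a copy
        return 0 if design == join_col else DIM_ROWS   # shuffle otherwise

    def total_cost(design):
        # Replication multiplies storage/maintenance; distribution does not.
        upkeep = DIM_ROWS * NODES if design == "replicate" else DIM_ROWS
        return upkeep + sum(f * movement_cost(design, c) for c, f in workload)

    candidates = ["replicate", "cust_key", "region_key"]
    print(min(candidates, key=total_cost))   # with this workload, replication wins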


International Conference on Management of Data | 2013

Split query processing in Polybase

David J. DeWitt; Alan Halverson; Rimma V. Nehme; Srinath Shankar; Josep Aguilar-Saborit; Artin Avanes; Miro Flasza; Jim Gramling

This paper presents Polybase, a feature of SQL Server PDW V2 that allows users to manage and query data stored in a Hadoop cluster using the standard SQL query language. Unlike other database systems that provide only a relational view over HDFS-resident data through the use of an external table mechanism, Polybase employs a split query processing paradigm in which SQL operators on HDFS-resident data are translated into MapReduce jobs by the PDW query optimizer and then executed on the Hadoop cluster. The paper describes the design and implementation of Polybase along with a thorough performance evaluation that explores the benefits of employing a split query processing paradigm for executing queries that involve both structured data in a relational DBMS and unstructured data in Hadoop. Our results demonstrate that while the use of a split-based query execution paradigm can improve the performance of some queries by as much as 10X, one must employ a cost-based query optimizer that considers a broad set of factors when deciding whether or not it is advantageous to push a SQL operator to Hadoop. These factors include the selectivity factor of the predicate, the relative sizes of the two clusters, and whether or not their nodes are co-located. In addition, differences in the semantics of the Java and SQL languages must be carefully considered in order to avoid altering the expected results of a query.
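
A rough sketch of the push-versus-pull decision the abstract describes follows, using the factors it names (predicate selectivity, cluster size). The cost constants and formulas are invented for illustration; the actual PDW optimizer's cost model is far richer.

    # Pushing a selective predicate to Hadoop as a MapReduce job shrinks
    # the data imported into PDW, but a job has fixed startup overhead.
    # Hypothetical cost model, illustrative constants only.

    def cost_pull(hdfs_rows, transfer_cost=1.0):
        # Import everything, filter inside PDW.
        return hdfs_rows * transfer_cost

    def cost_push(hdfs_rows, selectivity, hadoop_nodes,
                  scan_cost=0.2, job_overhead=5e6, transfer_cost=1.0):
        # Run a MapReduce filter, then import only qualifying rows.
        mr = job_overhead + hdfs_rows * scan_cost / hadoop_nodes
        return mr + hdfs_rows * selectivity * transfer_cost

    def should_push(hdfs_rows, selectivity, hadoop_nodes):
        return cost_push(hdfs_rows, selectivity, hadoop_nodes) < cost_pull(hdfs_rows)

    print(should_push(1e9, 0.01, 16))   # highly selective, big data: True
    print(should_push(1e6, 0.9, 16))    # unselective, small data: False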


Extending Database Technology | 2006

SCUBA: scalable cluster-based algorithm for evaluating continuous spatio-temporal queries on moving objects

Rimma V. Nehme; Elke A. Rundensteiner

In this paper we propose SCUBA, a Scalable Cluster-Based Algorithm for evaluating a large set of continuous queries over spatio-temporal data streams. The key idea of SCUBA is to group moving objects and queries, based on common spatio-temporal properties at run-time, into moving clusters to optimize query execution and thus facilitate scalability. SCUBA exploits shared cluster-based execution by abstracting the evaluation of a set of spatio-temporal queries first as a spatial join between moving clusters. This cluster-based filtering prunes true negatives. Execution then proceeds with a fine-grained within-moving-cluster join for all pairs of moving clusters identified as potentially joinable by a positive cluster-join match. A moving cluster can also serve as an approximation of the location of its members; we show how moving clusters can serve as a means for intelligent load shedding of spatio-temporal data to avoid performance degradation with minimal harm to result quality. Our experiments on real datasets demonstrate that SCUBA achieves a substantial improvement when executing continuous queries on spatio-temporal data streams.
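
The two-phase filter-then-refine idea can be sketched compactly. The data layout below (clusters as circles with member lists) is a hypothetical simplification for illustration, not SCUBA's actual structures.

    # Phase 1: join moving clusters (coarse circles), pruning pairs that
    # cannot possibly match. Phase 2: join individual objects only within
    # surviving cluster pairs. Hypothetical sketch.

    from math import hypot
    from itertools import product

    class MovingCluster:
        def __init__(self, cx, cy, radius, members):
            self.cx, self.cy, self.radius = cx, cy, radius
            self.members = members          # list of (id, x, y)

    def clusters_may_join(a, b, dist):
        # Circles farther apart than `dist` cannot contain any object
        # pair within `dist` of each other.
        return hypot(a.cx - b.cx, a.cy - b.cy) <= a.radius + b.radius + dist

    def cluster_join(clusters_q, clusters_o, dist):
        results = []
        for cq, co in product(clusters_q, clusters_o):
            if not clusters_may_join(cq, co, dist):
                continue                    # true negative pruned cheaply
            for (qi, qx, qy), (oi, ox, oy) in product(cq.members, co.members):
                if hypot(qx - ox, qy - oy) <= dist:
                    results.append((qi, oi))
        return results

    queries = [MovingCluster(0, 0, 2, [("q1", 0.5, 0.5)])]
    objects = [MovingCluster(3, 0, 1, [("o1", 2.6, 0.2)]),
               MovingCluster(40, 40, 1, [("o2", 40, 40)])]   # pruned coarsely
    print(cluster_join(queries, objects, dist=2.5))          # [('q1', 'o1')]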


International Conference on Management of Data | 2008

Configuration-parametric query optimization for physical design tuning

Nicolas Bruno; Rimma V. Nehme

Automated physical design tuning for database systems has recently become an active area of research and development. Existing tuning tools explore the space of feasible solutions by repeatedly optimizing queries in the input workload for several candidate configurations. This general approach, while scalable, often results in tuning sessions waiting for results from the query optimizer over 90% of the time. In this paper we introduce a novel approach, called Configuration-Parametric Query Optimization, that drastically improves the performance of current tuning tools. By issuing a single optimization call per query, we are able to generate a compact representation of the optimization space that can then very efficiently produce execution plans for the input query under arbitrary configurations. Our experiments show that our technique speeds up query optimization by 30x to over 450x with virtually no loss in quality, effectively eliminating the main optimization bottleneck of current tuning tools and opening the door to new, more sophisticated optimization strategies.
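
The interface idea behind the approach can be sketched as follows: optimize once into a compact structure, then instantiate the best plan for any candidate configuration without calling the optimizer again. The memo structure, class names, and costs below are invented for illustration and greatly simplify the paper's actual representation.

    # One optimizer call builds a set of alternative plans whose costs
    # depend on which indexes a candidate configuration provides; picking
    # a plan for a configuration then needs no further optimization.

    from dataclasses import dataclass

    @dataclass
    class PlanAlternative:
        name: str
        required_index: str | None    # index it needs, or None
        cost_with: float              # cost if that index exists
        cost_without: float           # fallback cost otherwise

    class CPQOMemo:
        def __init__(self, alternatives):
            self.alternatives = alternatives    # built by ONE optimizer call

        def best_plan(self, configuration: set[str]):
            def cost(p):
                if p.required_index is None or p.required_index in configuration:
                    return p.cost_with
                return p.cost_without
            return min(self.alternatives, key=cost)

    memo = CPQOMemo([
        PlanAlternative("index seek", "idx_orders_date", 10.0, 500.0),
        PlanAlternative("table scan", None, 120.0, 120.0),
    ])
    print(memo.best_plan({"idx_orders_date"}).name)   # index seek
    print(memo.best_plan(set()).name)                 # table scan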


Conference on Data and Application Security and Privacy | 2013

FENCE: continuous access control enforcement in dynamic data stream environments

Rimma V. Nehme; Hyo-Sang Lim; Elisa Bertino

In this paper, we present the FENCE framework, which addresses the problem of continuous access control enforcement in dynamic data stream environments. The distinguishing characteristics of FENCE include: (1) a stream-centric approach to security, (2) symmetric modeling of security for both continuous queries and streaming data, and (3) security-aware query processing that considers both regular and security-related selectivities. In FENCE, both data and query security restrictions are modeled in the form of streaming security metadata, called “security punctuations”, embedded inside data streams. We have implemented FENCE in a prototype DSMS and briefly summarize our performance observations.
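
The "security punctuation" idea is easy to illustrate: metadata items interleaved with data tuples update, mid-stream, which tuples a continuous query may see. The punctuation format and role model below are invented for illustration; FENCE's actual encoding and semantics are richer.

    # A punctuation ("sp") carries the roles allowed from that point on;
    # the enforcement operator drops tuples the query's roles cannot see.

    def enforce(stream, query_roles):
        allowed_roles = set()                 # updated by punctuations
        for kind, payload in stream:
            if kind == "sp":                  # security punctuation
                allowed_roles = set(payload)
            elif kind == "tuple":
                if allowed_roles & query_roles:
                    yield payload

    stream = [
        ("sp", ["doctor", "nurse"]),
        ("tuple", {"patient": 7, "bp": 130}),
        ("sp", ["doctor"]),                   # access tightened mid-stream
        ("tuple", {"patient": 7, "bp": 150}),
    ]
    print(list(enforce(stream, {"nurse"})))   # only the first tuple passes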


International Conference on Management of Data | 2012

Query optimization in Microsoft SQL Server PDW

Srinath Shankar; Rimma V. Nehme; Josep Aguilar-Saborit; Andrew Chung; Mostafa Elhemali; Alan Halverson; Eric R. Robinson; Mahadevan Sankara Subramanian; David J. DeWitt; Cesar A. Galindo-Legaria

In recent years, Massively Parallel Processors have increasingly been used to manage and query vast amounts of data. Dramatic performance improvements are achieved through distributed execution of queries across many nodes. Query optimization for such systems is a challenging and important problem. In this paper we describe the query optimizer inside the SQL Server Parallel Data Warehouse product (PDW QO). We leverage existing QO technology in Microsoft SQL Server to implement a cost-based optimizer for distributed query execution. By properly abstracting metadata we can readily reuse existing logic for query simplification, space exploration, and cardinality estimation. Unlike earlier approaches that simply parallelize the best serial plan, our optimizer considers a rich space of execution alternatives and picks one based on a cost model for the distributed execution environment. The result is a high-quality, effective query optimizer for distributed query processing in an MPP.
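
Why costing a space of distributed alternatives beats parallelizing the best serial plan can be seen in even a toy example: for a single join, different movement strategies win at different input sizes. The formulas and numbers below are illustrative only, not the PDW QO cost model.

    # Hypothetical cost comparison between two distributed join strategies.

    def shuffle_join_cost(left_rows, right_rows):
        # Re-partition both inputs on the join key.
        return left_rows + right_rows

    def broadcast_join_cost(left_rows, right_rows, nodes):
        # Replicate the smaller input to every node; the larger stays put.
        return min(left_rows, right_rows) * nodes

    def pick_strategy(left_rows, right_rows, nodes):
        options = {
            "shuffle both": shuffle_join_cost(left_rows, right_rows),
            "broadcast smaller": broadcast_join_cost(left_rows, right_rows, nodes),
        }
        return min(options, key=options.get)

    print(pick_strategy(1_000_000_000, 10_000, 32))        # broadcast smaller
    print(pick_strategy(1_000_000_000, 500_000_000, 32))   # shuffle both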


Visual Analytics Science and Technology | 2007

LAHVA: Linked Animal-Human Health Visual Analytics

Ross Maciejewski; Benjamin Tyner; Yun Jang; Cheng Zheng; Rimma V. Nehme; David S. Ebert; William S. Cleveland; Mourad Ouzzani; Shaun J. Grannis; Lawrence T. Glickman

Coordinated animal-human health monitoring can provide an early warning system with fewer false alarms for naturally occurring disease outbreaks, as well as biological, chemical and environmental incidents. This monitoring requires the integration and analysis of multi-field, multi-scale and multi-source data sets. In order to better understand these data sets, models and measurements at different resolutions must be analyzed. To facilitate these investigations, we have created an application to provide a visual analytics framework for analyzing both human emergency room data and veterinary hospital data. Our integrated visual analytic tool links temporally varying geospatial visualization of animal and human patient health information with advanced statistical analysis of these multi-source data. Various statistical analysis techniques have been applied in conjunction with a spatio-temporal viewing window. Such an application provides researchers with the ability to visually search the data for clusters in both a statistical model view and a spatio-temporal view. Our interface provides a factor specification/filtering component to allow exploration of causal factors and spread patterns. In this paper, we discuss the application of our linked animal-human visual analytics (LAHVA) tool to two specific case studies. The first case study examines seasonal influenza and its correlation with syndromes in different companion animals (e.g., cats, dogs). Here we use data from the Indiana Network for Patient Care (INPC) and Banfield Pet Hospitals in an attempt to determine whether there are correlations between respiratory syndromes representing the onset of seasonal influenza in humans and general respiratory syndromes in cats and dogs. Our second case study examines the effect of the release of industrial wastewater in a community through companion animal surveillance.
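
The statistical question in the first case study, whether the animal signal leads the human one, can be illustrated with a simple lagged correlation. The weekly counts and lag range below are invented for illustration; LAHVA's actual analyses are considerably more sophisticated.

    # Pearson correlation of human syndrome counts against animal counts
    # shifted by 0..max_lag weeks; hypothetical data.

    import numpy as np

    def lagged_correlation(human, animal, max_lag=4):
        best = (0, 0.0)
        for lag in range(max_lag + 1):
            h = human[lag:]
            a = animal[:len(animal) - lag] if lag else animal
            r = np.corrcoef(h, a)[0, 1]
            if abs(r) > abs(best[1]):
                best = (lag, r)
        return best   # (lag in weeks, correlation)

    human  = np.array([3, 4, 8, 15, 22, 18, 9, 5], dtype=float)
    animal = np.array([5, 9, 16, 24, 19, 10, 6, 4], dtype=float)
    print(lagged_correlation(human, animal))  # best lag is 1: animal leads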


Very Large Data Bases | 2008

Responding to Anomalous Database Requests

Ashish Kamra; Elisa Bertino; Rimma V. Nehme

Organizations have recently shown increased interest in database activity monitoring and anomaly detection techniques to safeguard their internal databases. Once an anomaly is detected, a response from the database is needed to contain the effects of the anomaly. However, the problem of issuing an appropriate response to a detected database anomaly has received little attention so far. In this paper, we propose a framework and policy language for issuing a response to a database anomaly based on the characteristics of the anomaly. We also propose a novel approach to dynamically change the state of the access control system in order to contain the damage that may be caused by the anomalous request. We have implemented our mechanisms in PostgreSQL and in the paper we discuss relevant implementation issues. We have also carried out an experimental evaluation to assess the performance overhead introduced by our response mechanism. The experimental results show that the techniques are very efficient.
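
The shape of such a response policy can be sketched as a list of condition/action rules evaluated against a detected anomaly's characteristics. The policy syntax, severity ordering, and actions below are invented for illustration and are not the paper's actual policy language.

    # Match the anomaly against all policies; apply the most severe
    # matching response. Hypothetical sketch.

    SEVERITY = {"log": 0, "re-authenticate": 1, "suspend privilege": 2,
                "drop connection": 3}

    policies = [
        # (condition on the anomaly, response action)
        (lambda a: a["role"] == "clerk" and a["object"] == "payroll",
         "suspend privilege"),
        (lambda a: a["rows_accessed"] > 10_000, "re-authenticate"),
        (lambda a: True, "log"),              # default policy
    ]

    def respond(anomaly):
        matches = [action for cond, action in policies if cond(anomaly)]
        return max(matches, key=SEVERITY.get)

    print(respond({"role": "clerk", "object": "payroll",
                   "rows_accessed": 50}))     # -> suspend privilege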


Very Large Data Bases | 2014

Resource bricolage for parallel database systems

Jiexing Li; Jeffrey F. Naughton; Rimma V. Nehme

Running parallel database systems in an environment with heterogeneous resources has become increasingly common, due to cluster evolution and increasing interest in moving applications into public clouds. For database systems running in a heterogeneous cluster, the default uniform data partitioning strategy may overload some of the slow machines while at the same time it may under-utilize the more powerful machines. Since the processing time of a parallel query is determined by the slowest machine, such an allocation strategy may result in a significant query performance degradation. We take a first step to address this problem by introducing a technique we call resource bricolage that improves database performance in heterogeneous environments. Our approach quantifies the performance differences among machines with various resources as they process workloads with diverse resource requirements. We formalize the problem of minimizing workload execution time and view it as an optimization problem, and then we employ linear programming to obtain a recommended data partitioning scheme. We verify the effectiveness of our technique with an extensive experimental study on a commercial database system.
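
The linear program the abstract mentions can be written down directly for a simplified setting: one workload, with each machine's per-unit processing time known. Choose the fraction of data each machine receives so that the slowest machine's time, the makespan, is minimized. The speeds below are invented, and this sketch assumes SciPy is available; the paper's formulation handles richer resource and workload models.

    # Variables: x_0..x_{n-1} (data fractions) and T (makespan).
    # Minimize T subject to t_i * x_i <= T for all i and sum(x_i) = 1.

    from scipy.optimize import linprog

    sec_per_unit = [1.0, 1.0, 2.5, 4.0]      # heterogeneous machine speeds
    n = len(sec_per_unit)

    c = [0.0] * n + [1.0]                    # objective: minimize T

    # t_i * x_i - T <= 0 : no machine runs longer than T.
    A_ub = [[sec_per_unit[i] if j == i else (-1.0 if j == n else 0.0)
             for j in range(n + 1)] for i in range(n)]
    b_ub = [0.0] * n

    A_eq = [[1.0] * n + [0.0]]               # fractions sum to 1
    b_eq = [1.0]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1))
    print(res.x[:n])   # faster machines get proportionally more data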

Collaboration


Dive into Rimma V. Nehme's collaborations.

Top Co-Authors

Jeffrey F. Naughton, University of Wisconsin-Madison
Jiexing Li, University of Wisconsin-Madison
Karen Works, Worcester Polytechnic Institute