
Publication


Featured research published by Reza Sherkat.


Very Large Data Bases | 2009

Efficient index compression in DB2 LUW

Bishwaranjan Bhattacharjee; Lipyeow Lim; Timothy R. Malkemus; George A. Mihaila; Kenneth A. Ross; Sherman Lau; Cathy McArthur; Zoltan Toth; Reza Sherkat

In database systems, the costs of data storage and retrieval are important components of the total cost and response time of the system. A popular mechanism to reduce the storage footprint is to compress the data residing in tables and indexes. Compressing indexes efficiently, while maintaining response time requirements, is known to be challenging. This is especially true when designing for a workload spectrum covering both data warehousing and transaction processing environments. DB2 Linux, UNIX, and Windows (LUW) recently introduced index compression for use in both environments. It uses techniques that compress index data efficiently while incurring virtually no performance penalty for query processing; on the contrary, for certain operations, performance is actually better. In this paper, we detail the design of index compression in DB2 LUW and discuss the challenges encountered in meeting the design goals. We also demonstrate its effectiveness by showing performance results on typical customer scenarios.
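The abstract does not spell out the encoding, but one standard index-compression technique for sorted keys (and the subject of a related patent by several of the same authors, "Compression of sorted value indexes using common prefixes") is prefix compression: each key stores only the length of the prefix it shares with its predecessor, plus its distinct suffix. A minimal illustrative sketch, not the actual DB2 implementation:

```python
def compress_sorted_keys(keys):
    """Encode sorted keys as (shared_prefix_len, suffix) pairs."""
    out = []
    prev = ""
    for key in keys:
        # Length of the prefix shared with the previous key.
        p = 0
        while p < min(len(prev), len(key)) and prev[p] == key[p]:
            p += 1
        out.append((p, key[p:]))
        prev = key
    return out

def decompress_keys(pairs):
    """Rebuild the original key list from (prefix_len, suffix) pairs."""
    keys, prev = [], ""
    for p, suffix in pairs:
        key = prev[:p] + suffix
        keys.append(key)
        prev = key
    return keys
```

Because index leaf pages hold keys in sorted order, adjacent keys tend to share long prefixes, so this encoding shrinks pages while still allowing sequential decoding during a scan.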


Very Large Data Bases | 2008

On efficiently searching trajectories and archival data for historical similarities

Reza Sherkat; Davood Rafiei

We study the problem of efficiently evaluating similarity queries on histories, where a history is a d-dimensional time series for d ≥ 1. While there are some solutions for time series and spatio-temporal trajectories, where typically d ≤ 3, we are not aware of any work that examines the problem for larger values of d. In this paper, we address the problem in its general case and propose a class of summaries for histories with a few interesting properties. First, for commonly used distance functions such as the Lp-norm, LCSS, and DTW, the summaries can be used to efficiently prune some of the histories that cannot be in the answer set of the queries. Second, histories can be indexed based on their summaries, hence the qualifying candidates can be efficiently retrieved. To further reduce the number of unnecessary distance computations for false positives, we propose a finer-level approximation of histories, and an algorithm to find an approximation with the least maximum distance estimation error. Experimental results confirm that the combination of our feature extraction approaches and the indexability of our summaries can improve upon existing methods and scales up to larger values of d and database sizes, based on our experiments on real and synthetic datasets of 17-dimensional histories.
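The paper's exact summary construction is not reproduced here, but the pruning idea can be illustrated with a generic envelope-style summary (an assumption for illustration, not the paper's method): split a history into segments, keep a per-dimension bounding box per segment, and use point-to-box distances as a lower bound on the Euclidean distance. Any candidate whose lower bound already exceeds the query threshold can be pruned without computing the full distance:

```python
import math

def summarize(history, n_seg):
    """Split a history (list of d-dim points) into n_seg contiguous
    segments, keeping a per-dimension [min, max] box per segment."""
    n, d = len(history), len(history[0])
    boxes = []
    for s in range(n_seg):
        seg = history[s * n // n_seg:(s + 1) * n // n_seg]
        lo = [min(p[j] for p in seg) for j in range(d)]
        hi = [max(p[j] for p in seg) for j in range(d)]
        boxes.append((lo, hi))
    return boxes

def lower_bound(query, boxes):
    """Lower bound on the Euclidean distance between `query` and any
    history whose points lie inside the candidate's segment boxes."""
    n, n_seg = len(query), len(boxes)
    acc = 0.0
    for i, point in enumerate(query):
        lo, hi = boxes[min(i * n_seg // n, n_seg - 1)]
        for j, x in enumerate(point):
            if x < lo[j]:
                acc += (lo[j] - x) ** 2
            elif x > hi[j]:
                acc += (x - hi[j]) ** 2
    return math.sqrt(acc)
```

The bound is safe by construction: a history summarized by its own boxes gets a lower bound of zero, and any other history's true distance can never fall below the bound, which is what makes filter-and-refine correct.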


International Conference on Data Engineering | 2006

Efficiently Evaluating Order Preserving Similarity Queries over Historical Market-Basket Data

Reza Sherkat; Davood Rafiei

We introduce a new domain-independent framework for formulating and efficiently evaluating similarity queries over historical data, where given a history as a sequence of timestamped observations and the pairwise similarity of observations, we want to find similar histories. For instance, given a database of customer transactions and a time period, we can find customers with similar purchasing behaviors over this period. Our work is different from the work on retrieving similar time series; it addresses the general problem in which a history cannot be modeled as a time series, hence the relevant conventional approaches are not applicable. We derive a similarity measure for histories, based on an aggregation of the similarities between the observations of the two histories, and propose efficient algorithms for finding an optimal alignment between two histories. Given the non-metric nature of our measure, we develop upper bounds and an algorithm that uses those bounds to prune histories that are guaranteed not to be in the answer set. Our experimental results on real and synthetic data confirm the effectiveness and efficiency of our approach. For instance, when the minimum length of a match is provided, our algorithm achieves up to an order of magnitude speed-up over alternative methods.
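The core ingredients, an observation-level similarity and an order-preserving alignment that aggregates it, can be sketched with a standard dynamic program (illustrative only; the paper's actual measure and algorithms may differ). Here the hypothetical `jaccard` helper plays the role of the pairwise observation similarity for market-basket histories:

```python
def align_histories(h1, h2, sim):
    """Order-preserving alignment of two histories maximizing the sum
    of pairwise observation similarities, via dynamic programming."""
    n, m = len(h1), len(h2)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j],          # skip observation in h1
                             best[i][j - 1],          # skip observation in h2
                             best[i - 1][j - 1]       # match the two observations
                             + sim(h1[i - 1], h2[j - 1]))
    return best[n][m]

def jaccard(a, b):
    """Hypothetical observation similarity: Jaccard overlap of baskets."""
    return len(a & b) / len(a | b)
```

For example, two customers whose baskets are `[{"milk", "bread"}, {"beer"}]` and `[{"milk"}, {"beer", "chips"}]` align basket-to-basket, each matched pair contributing its Jaccard similarity to the aggregate score.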


Very Large Data Bases | 2017

Statisticum: data statistics management in SAP HANA

Anisoara Nica; Reza Sherkat; Mihnea Andrei; Xun Cheng; Martin Heidel; Christian Bensberg; Heiko Gerwens

We introduce a new concept of leveraging traditional data statistics as dynamic data integrity constraints. These data statistics produce transient database constraints, which are valid as long as they can be proven to be consistent with the current data. We denote this type of data statistics by constraint data statistics, their properties needed for consistency checking by consistency metadata, and their implied integrity constraints by implied data statistics constraints (implied constraints for short). Implied constraints are valid integrity constraints which are powerful query optimization tools employed, just like traditional database constraints, in semantic query transformation (aka query reformulation), partition pruning, runtime optimization, and semi-join reduction, to name a few. To our knowledge, this is the first work introducing this novel and powerful concept of deriving implied integrity constraints from data statistics. We discuss theoretical aspects of the constraint data statistics concept and their integration into query processing. We present the current architecture of data statistics management in SAP HANA and detail how constraint data statistics are designed and integrated into this architecture. As an instantiation of this framework, we consider dynamic partition pruning for data aging scenarios. We discuss our current implementation of constraint data statistics objects in SAP HANA which can be used for dynamic partition pruning. We enumerate their properties and show how consistency checking for implied integrity constraints is supported in the data statistics architecture.
Our experimental evaluations on the TPC-H benchmark and a real customer application confirm the effectiveness of the implied integrity constraints: (1) for 59% of TPC-H queries, constraint data statistics utilization results in pruning cold partitions and reducing memory consumption, and (2) we observe up to 3 orders of magnitude speed-up in query processing time, for a real customer running an S/4HANA application.
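The partition-pruning instantiation can be illustrated with a toy model (the table, column, and partition names below are hypothetical, and SAP HANA's actual mechanism is far richer): each partition carries a min/max statistic on an aging column, which acts as an implied range constraint, so partitions whose range cannot intersect the query predicate are skipped without being loaded:

```python
from datetime import date

# Hypothetical partitions of an orders table for a data aging scenario;
# min_date/max_date are the constraint data statistics per partition.
partitions = [
    {"name": "cold_2015", "min_date": date(2015, 1, 1), "max_date": date(2015, 12, 31)},
    {"name": "cold_2016", "min_date": date(2016, 1, 1), "max_date": date(2016, 12, 31)},
    {"name": "hot_2017",  "min_date": date(2017, 1, 1), "max_date": date(2017, 12, 31)},
]

def prune(partitions, lo, hi):
    """Keep only partitions whose [min, max] range can intersect the
    query predicate range [lo, hi]; the rest are pruned unread."""
    return [p["name"] for p in partitions
            if p["max_date"] >= lo and p["min_date"] <= hi]
```

The crucial point from the paper is that such statistics are only usable as constraints while they are provably consistent with the current data, which is what the consistency metadata tracks.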


International Conference on Data Engineering | 2007

On MBR Approximation of Histories for Historical Queries: Expectations and Limitations

Reza Sherkat; Davood Rafiei

Traditional approaches for efficiently processing historical queries, where a history is a multidimensional time series, employ a two-step filter-and-refine scheme. In the filter step, an approximation of each history, often as a set of minimum bounding hyper-rectangles (MBRs), is organized using a spatial index structure such as the R-tree. The index is used to prune redundant disk accesses and to reduce the number of pairwise comparisons required in the refine step. To improve the efficiency of the filter step, a heuristic is used to decrease the expected number of MBRs that overlap with a query by reducing the volume of empty space indexed by the index. The heuristic selects, among all possible splitting schemes of a history, the one which results in a set of MBRs with minimum total volume. Although this heuristic is expected to improve the performance of spatial and history-based queries with small temporal and spatial extents, in many real settings the performance of historical queries depends on the extent of the query. Moreover, the optimal approximation of a history is not always the one with minimum total volume. In this paper, we present the limitations of using volume as a criterion for approximating histories, especially in high-dimensional cases, where it is not feasible to index MBRs with traditional spatial index structures.
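The minimum-total-volume splitting heuristic that the paper critiques can be made concrete with a brute-force sketch (illustrative; real systems use far cheaper split-selection algorithms): enumerate all ways to cut a history into k contiguous pieces and keep the cut whose MBRs have the smallest total volume:

```python
import itertools

def mbr(points):
    """Minimum bounding hyper-rectangle of a set of d-dim points."""
    d = len(points[0])
    lo = [min(p[j] for p in points) for j in range(d)]
    hi = [max(p[j] for p in points) for j in range(d)]
    return lo, hi

def volume(box):
    lo, hi = box
    v = 1.0
    for a, b in zip(lo, hi):
        v *= (b - a)
    return v

def best_split(history, k):
    """Among all ways to cut the history into k contiguous pieces,
    pick the split whose MBRs have minimum total volume."""
    n = len(history)
    best = None
    for cuts in itertools.combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        boxes = [mbr(history[bounds[i]:bounds[i + 1]]) for i in range(k)]
        total = sum(volume(b) for b in boxes)
        if best is None or total < best[0]:
            best = (total, boxes)
    return best[1]
```

On a history that jumps between two clusters, the heuristic correctly cuts at the jump; the paper's point is that minimizing volume this way is nevertheless not always optimal for query performance, particularly in high dimensions.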


Archive | 2008

Compression of sorted value indexes using common prefixes

Bishwaranjan Bhattacharjee; Sherman Lau; Lipyeow Lim; Timothy R. Malkemus; Cathy McArthur; George A. Mihaila; Reza Sherkat; Zoltan Toth


Archive | 2016

N-bit Compressed Versioned Column Data Array for In-Memory Columnar Stores

Rolando Blanco; Ivan Schreter; Chaitanya Gottipati; Mihnea Andrei; Reza Sherkat


Extending Database Technology | 2018

Global Range Encoding for Efficient Partition Elimination

Jeremy Chen; Reza Sherkat; Mihnea Andrei; Heiko Gerwens


Archive | 2017

Storing Mid-Sized Large Objects for Use with an In-Memory Database System

Thorsten Glebe; Martin Heidel; Michael Muehle; Felix Knittel; Reza Sherkat


Archive | 2017

Cracking Page-Loadable Columns for In-Memory Data Management

Anisoara Nica; Peter Bumbulis; Reza Sherkat; Mihnea Andrei; Anil Kumar Goel
