
Publication


Featured research published by Mohamed Zait.


Future Generation Computer Systems | 1997

A comparative study of clustering methods

Mohamed Zait; Hammou Messatfa

In this paper we propose a methodology for comparing clustering methods based on the quality of the result and the performance of the execution. We applied it to several known clustering methods: FastClust, Autoclass, Relational data analysis, and Kohonen nets. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. An important feature of our methodology is a synthetic data generation program that allows us to produce data sets with specific (or desired) patterns using a combination of parameters, such as the number and type of the attributes, the number of records, etc. We define a metric to measure the quality of a clustering method, i.e., its ability to discover some or all of the hidden patterns. The performance study is based on resource consumption, i.e., CPU time and memory space.
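As a rough illustration of the kind of comparison the methodology enables, the sketch below generates synthetic data with hidden clusters from a few parameters (number of records, attributes, and clusters), runs a clustering method, and reports a purity-style quality score together with CPU time. It uses scikit-learn's KMeans as a stand-in for the methods studied in the paper; the generator and the quality metric are illustrative assumptions, not the paper's exact definitions.

```python
# Illustrative sketch (not the paper's code): generate synthetic clustered data
# with controllable parameters, run a clustering method, and score the result
# by how well it recovers the hidden (generated) clusters. KMeans stands in
# for the methods studied in the paper (FastClust, Autoclass, etc.).
import time
import numpy as np
from sklearn.cluster import KMeans

def generate(n_records=10_000, n_attributes=8, n_clusters=5, spread=0.5, seed=0):
    rng = np.random.default_rng(seed)
    centers = rng.uniform(-10, 10, size=(n_clusters, n_attributes))
    labels = rng.integers(0, n_clusters, size=n_records)
    data = centers[labels] + rng.normal(0, spread, size=(n_records, n_attributes))
    return data, labels                      # labels = the hidden pattern

def purity(true_labels, found_labels):
    # Quality: fraction of records falling in their cluster's majority class.
    score = 0
    for c in np.unique(found_labels):
        members = true_labels[found_labels == c]
        score += np.bincount(members).max()
    return score / len(true_labels)

data, hidden = generate()
start = time.process_time()                  # resource consumption: CPU time
found = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(data)
cpu = time.process_time() - start
print(f"quality={purity(hidden, found):.3f}  cpu={cpu:.2f}s")
```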


Very Large Data Bases | 2002

SQL memory management in Oracle9i

Benoit Dageville; Mohamed Zait

Complex database queries require the use of memory-intensive operators like sort and hash-join. Those operators need memory, also referred to as SQL memory, to process their input data. For example, a sort operator uses a work area to perform the in-memory sort of a set of rows. The amount of memory allocated by these operators greatly affects their performance. However, there is only a finite amount of memory available in the system, shared by all concurrent operators. The challenge for database systems is to design a fair and efficient strategy to manage this memory. Commercial database systems rely on database administrators (DBAs) to supply an optimal setting for the configuration parameters that are internally used to decide how much memory to allocate to a given database operator. However, database systems continue to be deployed in new areas, e.g., e-commerce, and database applications are increasingly complex, e.g., to provide more functionality and support more users. One important consequence is that the application workload is very hard, if not impossible, to predict. So, expecting a DBA to find an optimal value for memory configuration parameters is not realistic. The values can only be optimal for a limited period of time, while the workload stays within the assumed range. Ideally, the optimal value should adapt in response to variations in the application workload. Several research projects addressed this problem in the past, but very few commercial systems proposed a comprehensive solution to managing memory used by SQL operators in a database application with a variable workload. This paper presents a new model used in Oracle9i to manage memory for database operators. This approach is automatic, adaptive and robust. We will present the architecture of the memory manager, the internal algorithms, and a performance study showing its superiority.
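The sketch below illustrates the core idea in miniature: a global SQL memory target is divided among the operators that are currently active, so each work-area bound adapts as concurrency changes. Class and method names are invented for illustration; Oracle9i's actual policy described in the paper is considerably more sophisticated (per-operator profiles, one-pass/multi-pass estimates, and so on).

```python
# Minimal sketch, not Oracle's implementation: a global memory target divided
# among the currently active memory-intensive operators, with each operator's
# work-area bound recomputed as operators come and go.

class MemoryManager:
    def __init__(self, global_target_bytes):
        self.global_target = global_target_bytes
        self.active = {}                      # operator id -> current bound

    def register(self, op_id):
        self.active[op_id] = 0
        self._rebalance()

    def unregister(self, op_id):
        self.active.pop(op_id, None)
        self._rebalance()

    def _rebalance(self):
        if not self.active:
            return
        bound = self.global_target // len(self.active)
        for op_id in self.active:
            self.active[op_id] = bound        # operators react at their next check

    def work_area_bound(self, op_id):
        return self.active[op_id]

mgr = MemoryManager(global_target_bytes=1 << 30)       # 1 GiB SQL memory target
mgr.register("sort#1"); mgr.register("hash-join#2")
print(mgr.work_area_bound("sort#1"))                   # ~512 MiB each
mgr.register("sort#3")
print(mgr.work_area_bound("sort#1"))                   # bound adapts downward
```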


Very Large Data Bases | 2009

Enhanced subquery optimizations in Oracle

Srikanth Bellamkonda; Rafi Ahmed; Andrew Witkowski; Angela Amor; Mohamed Zait; Chun Chieh Lin

This paper describes enhanced subquery optimizations in the Oracle relational database system. It discusses several techniques -- subquery coalescing, subquery removal using window functions, and view elimination for group-by queries. These techniques recognize and remove redundancies in query structures and convert queries into potentially more optimal forms. The paper also discusses novel parallel execution techniques, which have general applicability and are used to improve the scalability of queries that have undergone some of these transformations. It describes a new variant of antijoin for optimizing subqueries involving the universal quantifier over columns that may contain nulls. It then presents performance results of these optimizations, which show significant execution time improvements.
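To make the null handling concrete, the sketch below spells out the NOT IN semantics that a null-aware antijoin has to preserve: a NULL on either side makes the predicate UNKNOWN, so the outer row is filtered out unless the subquery returns no rows. This is plain Python illustrating the SQL semantics, not Oracle's execution operator, and the table and column names are made up.

```python
# Hedged sketch of null-aware anti-join semantics: only rows whose value is
# definitely absent from the subquery result (and not NULL, with no NULLs on
# the inner side) survive a NOT IN filter.

def null_aware_anti_join(outer_rows, outer_key, inner_keys):
    inner = list(inner_keys)
    inner_has_null = any(k is None for k in inner)
    inner_set = {k for k in inner if k is not None}
    result = []
    for row in outer_rows:
        k = row[outer_key]
        if not inner:                         # empty subquery: NOT IN is TRUE
            result.append(row)
        elif k is None or inner_has_null:     # predicate is UNKNOWN or FALSE...
            continue                          # ...either way the row is rejected
        elif k not in inner_set:              # definite TRUE: no match, no NULLs
            result.append(row)
    return result

orders = [{"id": 1, "cust": 10}, {"id": 2, "cust": None}, {"id": 3, "cust": 30}]
# SELECT * FROM orders WHERE cust NOT IN (SELECT cust_id FROM blacklist)
print(null_aware_anti_join(orders, "cust", [10, 20]))        # only id=3 qualifies
print(null_aware_anti_join(orders, "cust", [10, None]))      # NULL in subquery: no rows
```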


International Conference on Data Engineering | 2015

Oracle Database In-Memory: A dual format in-memory database

Tirthankar Lahiri; Shasank Chavan; Maria Colgan; Dinesh Das; Amit Ganesh; Michael J. Gleeson; Sanket Hase; Allison L. Holloway; Jesse Kamp; Teck-Hua Lee; Juan R. Loaiza; Neil Macnaughton; Vineet Marwah; Niloy Mukherjee; Atrayee Mullick; Sujatha Muthulingam; Vivekanandhan Raja; Marty Roth; Ekrem Soylemez; Mohamed Zait

The Oracle Database In-Memory Option allows Oracle to function as the industry-first dual-format in-memory database. Row formats are ideal for OLTP workloads, which typically use indexes to limit their data access to a small set of rows, while column formats are better suited for analytic operations, which typically examine a small number of columns from a large number of rows. Since no single data format is ideal for all types of workloads, our approach was to allow data to be simultaneously maintained in both formats with strict transactional consistency between them.
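A toy example of the dual-format idea, assuming nothing about Oracle's internal layouts: the same records held in a row-major structure that favors fetching one whole record, and in a column-major structure that favors aggregating one attribute across all records.

```python
# Illustrative sketch (not Oracle's implementation) of why two formats coexist.

rows = [                                     # row format: one record per entry
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 75},
    {"id": 3, "region": "EU", "amount": 200},
]

columns = {                                  # column format: one array per attribute
    "id":     [r["id"] for r in rows],
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
]  if False else {
    "id":     [r["id"] for r in rows],
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# OLTP-style access: fetch one whole record (row format touches one entry).
order = next(r for r in rows if r["id"] == 2)

# Analytic access: aggregate one column over all records (column format scans
# contiguous arrays and never materializes the unused attributes).
eu_total = sum(a for a, reg in zip(columns["amount"], columns["region"]) if reg == "EU")
print(order, eu_total)                       # {'id': 2, ...} 320
```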


International Conference on Management of Data | 2008

Efficient and scalable statistics gathering for large databases in Oracle 11g

Sunil Chakkappen; Thierry Cruanes; Benoit Dageville; Linan Jiang; Uri Shaft; Hong Su; Mohamed Zait

Large tables are often decomposed into smaller pieces called partitions in order to improve query performance and ease data management. Query optimizers rely on both the statistics of the entire table and the statistics of the individual partitions to select a good execution plan for a SQL statement. In Oracle 10g, we scan the entire table twice: one pass for gathering the table-level statistics and the other for gathering the partition-level statistics. A consequence of this gathering method is that, when the data in some partitions change, not only do we need to scan the changed partitions to gather the partition-level statistics, but we also have to scan the entire table again to gather the table-level statistics. Oracle 11g adopts a one-pass distinct-sampling-based method that can accurately derive the table-level statistics from the partition-level statistics. When data change, Oracle only re-gathers the statistics for the changed partitions and then derives the table-level statistics without touching the unchanged partitions. To the best of our knowledge, although one-pass distinct sampling has been researched in academia for some years, Oracle is the first commercial database to implement the technique. We have performed extensive experiments on both benchmark data and real customer data. Our experiments illustrate that this new method is highly accurate and has significantly better performance than the old method used in Oracle 10g.
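The sketch below shows a simplified one-pass, mergeable distinct-value synopsis in the spirit of the technique: each partition retains only the hashed values below a shrinking threshold, and partition synopses can be combined into a table-level number-of-distinct-values estimate without rescanning the table. The capacity, hash function, and class names are illustrative assumptions, not Oracle 11g internals.

```python
# Hedged sketch of a mergeable distinct-value synopsis built in one pass per
# partition; merging synopses yields a table-level NDV estimate.
import hashlib

MAX_HASH = 2 ** 64

def h(value):
    return int.from_bytes(hashlib.blake2b(str(value).encode(), digest_size=8).digest(), "big")

class DistinctSynopsis:
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.threshold = MAX_HASH             # accept every hash at first
        self.hashes = set()

    def add(self, value):
        hv = h(value)
        if hv < self.threshold:
            self.hashes.add(hv)
            while len(self.hashes) > self.capacity:
                self.threshold //= 2          # keep only the smaller half
                self.hashes = {x for x in self.hashes if x < self.threshold}

    def estimate(self):
        return len(self.hashes) * MAX_HASH // max(self.threshold, 1)

    def merge(self, other):                   # derive table level from partitions
        merged = DistinctSynopsis(self.capacity)
        merged.threshold = min(self.threshold, other.threshold)
        merged.hashes = {x for x in self.hashes | other.hashes if x < merged.threshold}
        while len(merged.hashes) > merged.capacity:
            merged.threshold //= 2
            merged.hashes = {x for x in merged.hashes if x < merged.threshold}
        return merged

p1, p2 = DistinctSynopsis(), DistinctSynopsis()
for v in range(50_000): p1.add(v)             # partition 1: values 0..49999
for v in range(25_000, 75_000): p2.add(v)     # partition 2 overlaps partition 1
print(p1.merge(p2).estimate())                # table-level NDV, close to 75000
```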


Very Large Data Bases | 2015

Join size estimation subject to filter conditions

David Vengerov; Andre Cavalheiro Menck; Mohamed Zait; Sunil Chakkappen

In this paper, we present a new algorithm for estimating the size of an equality join of multiple database tables. The proposed algorithm, Correlated Sampling, constructs a small-space synopsis for each table, which can then be used to provide a quick estimate of the join size of this table with other tables subject to dynamically specified predicate filter conditions, possibly specified over multiple columns (attributes) of each table. This algorithm makes a single pass over the data and is thus suitable for streaming scenarios. We compare this algorithm analytically to two other previously known sampling approaches (independent Bernoulli Sampling and End-Biased Sampling) and to a novel sketch-based approach. We also compare these four algorithms experimentally and show that the results fully correspond to our analytical predictions based on derived expressions for the estimator variances, with Correlated Sampling giving the best estimates in a large range of situations.
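A minimal sketch of the Correlated Sampling idea under simplifying assumptions (a single join column and no filter predicates): both tables keep a row whenever a shared hash of its join key falls below probability p, so rows that would join are sampled together, and the sample join size scaled by 1/p estimates the true join size. The code is illustrative and omits the paper's treatment of filter conditions and variance analysis.

```python
# Hedged sketch of correlated sampling for join size estimation.
import hashlib
from collections import Counter

def keep(key, p):
    hv = int.from_bytes(hashlib.blake2b(str(key).encode(), digest_size=8).digest(), "big")
    return hv < p * 2 ** 64                   # same hash on both tables: correlated

def build_synopsis(rows, key_col, p):
    return Counter(r[key_col] for r in rows if keep(r[key_col], p))

def estimate_join_size(syn_a, syn_b, p):
    matched = sum(cnt * syn_b[k] for k, cnt in syn_a.items())
    return matched / p                        # scale the sampled join back up

p = 0.05
orders    = [{"cust": i % 1000} for i in range(200_000)]
customers = [{"cust": i} for i in range(1000)]
syn_o = build_synopsis(orders, "cust", p)
syn_c = build_synopsis(customers, "cust", p)
print(estimate_join_size(syn_o, syn_c, p))    # true join size is 200,000
```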


Very Large Data Bases | 2015

Distributed architecture of Oracle database in-memory

Niloy Mukherjee; Shasank Chavan; Maria Colgan; Dinesh Das; Michael J. Gleeson; Sanket Hase; Allison L. Holloway; Hui Jin; Jesse Kamp; Kartik Kulkarni; Tirthankar Lahiri; Juan R. Loaiza; Neil Macnaughton; Vineet Marwah; Atrayee Mullick; Andy Witkowski; Jiaqi Yan; Mohamed Zait

Over the last few years, the information technology industry has witnessed revolutions in multiple dimensions. Increasingly ubiquitous sources of data have posed two connected challenges to data management solutions -- processing unprecedented volumes of data, and providing ad-hoc real-time analysis in mainstream production data stores without compromising regular transactional workload performance. In parallel, computer hardware systems are scaling out elastically, scaling up in the number of processors and cores, and increasing main memory capacity extensively. The data processing challenges combined with the rapid advancement of hardware systems have necessitated the evolution of a new breed of main-memory databases optimized for mixed OLTAP environments and designed to scale. The Oracle RDBMS In-Memory Option (DBIM) is an industry-first distributed dual-format architecture that allows a database object to be stored in columnar format in main memory, highly optimized to break performance barriers in analytic query workloads, while simultaneously maintaining transactional consistency with the corresponding OLTP-optimized row-major format persisted in storage and accessed through the database buffer cache. In this paper, we present the distributed, highly available, and fault-tolerant architecture of the Oracle DBIM that enables the RDBMS to transparently scale out in a database cluster, both in terms of memory capacity and query processing throughput. We believe that the architecture is unique among all mainstream in-memory databases. It allows completely application-transparent, extremely scalable and automated distribution of Oracle RDBMS objects in memory across a cluster, as well as across multiple NUMA nodes within a single server. It seamlessly provides distribution awareness to the Oracle SQL execution framework through affinitized fault-tolerant parallel execution within and across servers, without explicit optimizer plan changes or query rewrites.
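As a very rough illustration of the distribution idea, the sketch below assigns a table's in-memory column chunks to home nodes across a cluster and splits a scan by chunk ownership, so memory capacity and scan work both scale out. The rendezvous-hash placement and every name here are illustrative stand-ins, not Oracle's actual distribution algorithm.

```python
# Very simplified sketch: place in-memory column chunks on home nodes and
# split a scan by chunk ownership; results are combined afterwards.
import hashlib

NODES = ["node1", "node2", "node3", "node4"]

def home_node(table, chunk_id, nodes=NODES):
    # Rendezvous hashing: each chunk goes to the node with the highest score.
    def score(node):
        key = f"{table}:{chunk_id}:{node}".encode()
        return hashlib.blake2b(key, digest_size=8).digest()
    return max(nodes, key=score)

def plan_scan(table, n_chunks):
    # Each node scans only the chunks it hosts.
    plan = {n: [] for n in NODES}
    for chunk in range(n_chunks):
        plan[home_node(table, chunk)].append(chunk)
    return plan

for node, chunks in plan_scan("SALES", n_chunks=16).items():
    print(node, chunks)                       # roughly even spread of chunks
```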


Very Large Data Bases | 2015

Query optimization in Oracle 12c database in-memory

Dinesh Das; Jiaqi Yan; Mohamed Zait; Satyanarayana R. Valluri; Nirav Vyas; Ramarajan Krishnamachari; Prashant Gaharwar; Jesse Kamp; Niloy Mukherjee

Traditional on-disk row-major tables have been the dominant storage mechanism in relational databases for decades. Over the last decade, however, with explosive growth in data volume and demand for faster analytics, has come the recognition that a different data representation is needed. There is widespread agreement that in-memory column-oriented databases are best suited to meet the realities of this new world. Oracle 12c Database In-Memory, the industry's first dual-format database, allows existing row-major on-disk tables to have complementary in-memory columnar representations. The new storage format brings new data processing techniques and query execution algorithms, and thus new challenges for the query optimizer. Execution plans that are optimal for one format may be sub-optimal for the other. In this paper, we describe the changes made in the query optimizer to generate execution plans optimized for the specific format -- row-major or columnar -- that will be scanned during query execution. With enhancements in several areas -- statistics, cost model, query transformation, access path and join optimization, parallelism, and cluster-awareness -- the query optimizer plays a significant role in unlocking the full promise and performance of Oracle Database In-Memory.
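The toy cost comparison below illustrates why plans must be costed per format: an index-driven row access wins when few rows qualify, while an in-memory columnar scan wins when a query touches a few columns of many rows. The cost formulas and constants are invented for illustration and bear no relation to Oracle's actual cost model.

```python
# Toy, made-up cost model contrasting row-format and column-format access.

def cost_index_row_access(qualifying_rows):
    # Roughly one buffer get per qualifying row plus index traversal.
    return 3 + 1.0 * qualifying_rows

def cost_inmemory_column_scan(total_rows, columns_referenced):
    # Scan the referenced column vectors for every row, at a small per-value cost.
    return 0.001 * total_rows * columns_referenced

def choose_plan(total_rows, selectivity, columns_referenced):
    row_cost = cost_index_row_access(total_rows * selectivity)
    col_cost = cost_inmemory_column_scan(total_rows, columns_referenced)
    if row_cost < col_cost:
        return "INDEX ROW ACCESS", row_cost
    return "IN-MEMORY COLUMN SCAN", col_cost

print(choose_plan(10_000_000, selectivity=0.000001, columns_referenced=2))  # point query -> index
print(choose_plan(10_000_000, selectivity=0.2,      columns_referenced=2))  # analytic -> columnar
```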


Conference on Information and Knowledge Management | 2016

Approximate Aggregates in Oracle 12C

Hong Su; Mohamed Zait; Vladimir Barrière; Joseph Torres; Andre Cavalheiro Menck

A new generation of analytic applications has emerged to process data generated from non-conventional sources. The challenge for traditional database systems is that the data sets are very large and keep increasing at a very high rate, while application users have ever higher performance expectations. The most straightforward response to this challenge is to deploy larger hardware configurations, making the solution very expensive and not acceptable in most cases. Alternative solutions fall into two categories: reduce the data set using sampling techniques, or reduce the computational complexity of expensive database operations by using alternative algorithms. The alternative algorithms considered in this paper are approximate aggregates that perform much better at the cost of reduced but tolerable accuracy. In Oracle 12C we introduced approximate versions of expensive aggregate functions that are very common in analytic applications, namely approximate count distinct and approximate percentile. Performance is improved in two ways. First, the approximate aggregates use bounded memory, often eliminating the need to use temporary storage, which results in a significant performance improvement over the exact aggregates. Second, we provide materialized view support that allows users to store pre-computed results of approximate aggregates. These results can be rolled up to answer queries on different dimensions (such rollup is not possible for exact aggregates).
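The sketch below shows a bounded-memory approximate count distinct in the HyperLogLog family, the style of sketch commonly used for this aggregate. The abstract does not spell out Oracle's internal algorithm, so treat this purely as an illustration of the two properties emphasized above: bounded memory and mergeability, which is what makes rollup of pre-computed results possible.

```python
# Hedged sketch of a mergeable, bounded-memory approximate count distinct.
import hashlib

class ApproxCountDistinct:
    def __init__(self, p=12):                 # 2**p registers, a few KB total
        self.p = p
        self.m = 1 << p
        self.reg = [0] * self.m

    def add(self, value):
        hv = int.from_bytes(hashlib.blake2b(str(value).encode(), digest_size=8).digest(), "big")
        idx = hv & (self.m - 1)               # low p bits pick a register
        rest = hv >> self.p                   # remaining 64-p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        self.reg[idx] = max(self.reg[idx], rank)

    def merge(self, other):                   # rollup: register-wise maximum
        out = ApproxCountDistinct(self.p)
        out.reg = [max(a, b) for a, b in zip(self.reg, other.reg)]
        return out

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        return alpha * self.m * self.m / sum(2.0 ** -r for r in self.reg)

day1, day2 = ApproxCountDistinct(), ApproxCountDistinct()
for u in range(80_000): day1.add(f"user{u}")
for u in range(60_000, 140_000): day2.add(f"user{u}")
print(round(day1.merge(day2).estimate()))     # ~140,000 distinct users overall
```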


Very Large Data Bases | 2008

Closing the query processing loop in Oracle 11g

Allison W. Lee; Mohamed Zait

The role of a query optimizer in a database system is to find the best execution plan for a given SQL statement based on statistics about the objects referenced in the statement: the tables themselves, their indexes, and other derived objects. These statistics include the number of rows, space utilization on disk, distribution of column values, etc. Optimization also relies on system statistics, such as the I/O bandwidth of the storage sub-system. All of this information is fed into a cost model. The cost model is used to compute the cost whenever the query optimizer needs to make a decision about an access path, join method, join order, or query transformation. The optimizer picks the alternative that yields the lowest cost. The quality of the final execution plan depends primarily on the quality of the information fed into the cost model, as well as on the cost model itself. In this paper, we discuss two of the problems that affect the quality of execution plans generated by the query optimizer: the cardinality of intermediate results and host variable values. We give details of the solutions we introduced in Oracle 11g. Our approach establishes a bridge from the SQL execution engine to the SQL compiler. The bridge brings back valuable information that helps the query optimizer assess the impact of its decisions and make better decisions for future executions of the SQL statement. We illustrate the merits of our solutions with experiments using the Oracle E-Business Suite workload.
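A simplified sketch of the feedback-loop idea: after a statement executes, the actual cardinalities observed by the execution engine are recorded and fed back so that the next compilation of the same statement can correct badly wrong estimates. The class, threshold, and keys below are illustrative assumptions, not Oracle 11g internals.

```python
# Hedged sketch of execution-to-compiler cardinality feedback.

class FeedbackCache:
    def __init__(self, mismatch_factor=8.0):
        self.actuals = {}                     # (sql_id, operator) -> observed rows
        self.mismatch_factor = mismatch_factor

    def record_execution(self, sql_id, operator, estimated, actual):
        ratio = max(estimated, 1) / max(actual, 1)
        if ratio > self.mismatch_factor or ratio < 1 / self.mismatch_factor:
            self.actuals[(sql_id, operator)] = actual   # estimate was badly off

    def corrected_cardinality(self, sql_id, operator, default_estimate):
        # The next compilation of the same statement uses the observed value.
        return self.actuals.get((sql_id, operator), default_estimate)

cache = FeedbackCache()
# The optimizer guessed 100 rows for a filter; execution actually produced 25,000.
cache.record_execution("sql#42", "filter(status='OPEN')", estimated=100, actual=25_000)
print(cache.corrected_cardinality("sql#42", "filter(status='OPEN')", default_estimate=100))
```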
