Berni Schiefer
IBM
Publications
Featured research published by Berni Schiefer.
Very Large Data Bases | 2013
Vijayshankar Raman; Gopi K. Attaluri; Ronald J. Barber; Naresh K. Chainani; David Kalmuk; Vincent Kulandaisamy; Jens Leenstra; Sam Lightstone; Shaorong Liu; Guy M. Lohman; Tim R Malkemus; Rene Mueller; Ippokratis Pandis; Berni Schiefer; David C. Sharpe; Richard S. Sidle; Adam J. Storm; Liping Zhang
DB2 with BLU Acceleration deeply integrates innovative new techniques for defining and processing column-organized tables that speed read-mostly Business Intelligence queries by 10 to 50 times and improve compression by 3 to 10 times, compared to traditional row-organized tables, without the complexity of defining indexes or materialized views on those tables. But DB2 BLU is much more than just a column store. Exploiting frequency-based dictionary compression and main-memory query processing technology from the Blink project at IBM Research - Almaden, DB2 BLU performs most SQL operations - predicate application (even range predicates and IN-lists), joins, and grouping - on the compressed values, which can be packed bit-aligned so densely that multiple values fit in a register and can be processed simultaneously via SIMD (single-instruction, multiple-data) instructions. Designed and built from the ground up to exploit modern multi-core processors, DB2 BLU's hardware-conscious algorithms are carefully engineered to maximize parallelism by using novel data structures that need little latching, and to minimize data-cache and instruction-cache misses. Though DB2 BLU is optimized for in-memory processing, database size is not limited by the size of main memory. Fine-grained synopses, late materialization, and a new probabilistic buffer pool protocol for scans minimize disk I/Os, while aggressive prefetching reduces I/O stalls. Full integration with DB2 ensures that DB2 with BLU Acceleration benefits from the full functionality and robust utilities of a mature product, while still enjoying order-of-magnitude performance gains from revolutionary technology without even having to change the SQL, and can mix column-organized and row-organized tables in the same tablespace and even within the same query.
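As a rough illustration of the evaluation-on-compressed-data idea described above, the sketch below applies a range predicate directly to dictionary codes that have been bit-packed several to a machine word. It is a simplified Python model, not DB2 code: real BLU processing packs codes into registers and applies SIMD instructions, whereas this loop only mimics the packed layout, and the code width and names are assumptions.

# Minimal sketch (not DB2 source) of a range predicate on bit-packed codes.
CODE_BITS = 8          # width of each packed dictionary code (assumed)
CODES_PER_WORD = 8     # a 64-bit word holds eight 8-bit codes
MASK = (1 << CODE_BITS) - 1

def pack(codes):
    """Pack small integer dictionary codes into 64-bit words."""
    words = []
    for i in range(0, len(codes), CODES_PER_WORD):
        word = 0
        for j, code in enumerate(codes[i:i + CODES_PER_WORD]):
            word |= (code & MASK) << (j * CODE_BITS)
        words.append(word)
    return words

def range_predicate(words, lo, hi):
    """Evaluate lo <= code <= hi on the packed words, one word at a time,
    without first decompressing the column back into a row format."""
    result = []
    for word in words:
        for j in range(CODES_PER_WORD):
            code = (word >> (j * CODE_BITS)) & MASK
            result.append(lo <= code <= hi)
    return result

codes = [3, 17, 250, 42, 5, 99, 7, 128]
print(range_predicate(pack(codes), 10, 100))
# [False, True, False, True, False, True, False, False]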
Measurement and Modeling of Computer Systems | 2005
Zhifeng Chen; Yan Zhang; Yuanyuan Zhou; Heidi Scott; Berni Schiefer
To bridge the increasing processor-disk performance gap, buffer caches are used in both storage clients (e.g., database systems) and storage servers to reduce the number of slow disk accesses. These buffer caches need to be managed effectively to deliver performance commensurate with the aggregate buffer cache size. To address this problem, two paradigms have recently been proposed to collaboratively manage these buffer caches together: hierarchy-aware caching maintains the same I/O interface and is fully transparent to the storage client software, while aggressively-collaborative caching trades off transparency for performance and requires changes to both the interface and the storage client software. Before the storage industry starts to implement collaborative caching in real systems, it is crucial to find out whether sacrificing transparency is really worthwhile, i.e., how much can be gained by using aggressively-collaborative caching instead of hierarchy-aware caching? To answer this question accurately, one must consider all possible combinations of recently proposed local replacement algorithms and optimization techniques in both collaboration paradigms. Our study provides an empirical evaluation that addresses these questions. In particular, we compare three aggressively-collaborative approaches with two hierarchy-aware approaches for four different types of database/file I/O workloads, using traces collected from real commercial systems such as IBM DB2. More importantly, we separate the effects of collaborative caching from local replacement algorithms and optimizations, and uniformly apply several recently proposed local replacement algorithms and optimizations to all five collaboration approaches. When appropriate local optimizations and replacement algorithms are uniformly applied to both hierarchy-aware and aggressively-collaborative caching, the results indicate that hierarchy-aware caching can deliver performance similar to that of aggressively-collaborative caching. The results show that aggressively-collaborative caching provides less than a 2.5% performance improvement on average in simulation, and 1.0% in real-system experiments, over hierarchy-aware caching for most workloads and cache configurations. Our sensitivity study indicates that the performance gain of aggressively-collaborative caching remains very small across various storage networks and cache configurations. Therefore, considering its simplicity and generality, hierarchy-aware caching is more feasible than aggressively-collaborative caching.
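To make the two collaboration paradigms concrete, here is a toy two-level cache simulation in Python. It is only a sketch under assumed policies, not the configurations studied in the paper: the "hierarchy-aware" server keeps the standard I/O interface but, assuming the client now caches any block it just returned, places such blocks at the cold end of its own LRU list instead of promoting them.

# Toy client/server buffer cache simulation; all policies are assumptions.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()              # ordered cold -> hot

    def access(self, block, promote=True):
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block, last=promote)
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the coldest block
            self.blocks[block] = None
            if not promote:
                self.blocks.move_to_end(block, last=False)
        return hit

def run(trace, client_size, server_size, hierarchy_aware):
    client, server = LRUCache(client_size), LRUCache(server_size)
    disk_reads = 0
    for block in trace:
        if client.access(block):
            continue                             # hit in the client buffer pool
        if not server.access(block, promote=not hierarchy_aware):
            disk_reads += 1                      # miss at both levels
    return disk_reads

trace = [1, 2, 3, 1, 2, 3, 4, 5, 1, 2] * 50
print(run(trace, 3, 3, hierarchy_aware=False),   # disk reads, plain LRU server
      run(trace, 3, 3, hierarchy_aware=True))    # disk reads, hierarchy-aware server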
International Conference on Data Engineering | 2015
Ronald J. Barber; Guy M. Lohman; Vijayshankar Raman; Richard S. Sidle; Sam Lightstone; Berni Schiefer
Although the DRAM for main memories of systems continues to grow exponentially according to Moore's Law and to become less expensive, we argue that memory hierarchies will always exist for many reasons, both economic and practical, and in particular due to concurrent users competing for working memory to perform joins and grouping. We present the in-memory BLU Acceleration used in IBM's DB2 for Linux, UNIX, and Windows, and now also the dashDB cloud offering, which was designed and implemented from the ground up to exploit main memory but is not limited to what fits in memory and does not require manual management of what to retain in memory, as its competitors do. In fact, BLU Acceleration views memory as too slow, and is carefully engineered to work in higher levels of the system cache by keeping the data encoded and packed densely into bit-aligned vectors that can exploit SIMD instructions in processing queries. To achieve scalable multi-core parallelism, BLU assigns to each thread independent data structures, or partitions thereof, designed to have low synchronization costs, and doles out batches of values to threads. On customer workloads, BLU has improved performance on complex analytics queries by 10 to 50 times, compared to the legacy row-organized run-time, while also significantly simplifying database administration, shortening time to value, and improving data compression. UPDATE and DELETE performance was improved by up to 112 times with the new Cancun release of DB2 with BLU Acceleration, which also added Shadow Tables for high performance on mixed OLTP and BI analytics workloads, and extended DB2's High Availability Disaster Recovery (HADR) and SQL compatibility features to BLU's column-organized tables.
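The "independent data structures per thread" pattern mentioned above can be sketched in a few lines of Python. This is an assumed illustration of the general shape, not BLU's implementation: each worker aggregates its own batch into a private hash table, so no latching is needed during the scan, and the partial results are merged once at the end.

# Sketch of per-worker partial aggregation with a single merge step (assumed).
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def aggregate_batch(batch):
    """GROUP BY key, SUM(value) over one batch, into a private table."""
    partial = Counter()
    for key, value in batch:
        partial[key] += value
    return partial

def parallel_group_by(rows, num_workers=4):
    batches = [rows[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(aggregate_batch, batches))
    result = Counter()
    for partial in partials:          # single-threaded merge of the partials
        result.update(partial)        # Counter.update adds the counts
    return dict(result)

rows = [("de", 10), ("us", 5), ("de", 7), ("ca", 1), ("us", 2)] * 1000
print(parallel_group_by(rows))        # {'de': 17000, 'us': 7000, 'ca': 1000}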
Intelligent Information Systems | 2008
Said Elnaffar; Patrick Martin; Berni Schiefer; Sam Lightstone
The type of the workload on a database management system (DBMS) is a key consideration in tuning the system. Allocations for resources such as main memory can be very different depending on whether the workload type is Online Transaction Processing (OLTP) or Decision Support System (DSS). A DBMS also typically experiences changes in the type of workload it handles during its normal processing cycle. Database administrators must therefore recognize the significant shifts of workload type that demand reconfiguring the system in order to maintain acceptable levels of performance. We envision intelligent, autonomic DBMSs that have the capability to manage their own performance by automatically recognizing the workload type and then reconfiguring their resources accordingly. In this paper, we present an approach to automatically identifying a DBMS workload as either OLTP or DSS. Using data mining techniques, we build a classification model based on the most significant workload characteristics that differentiate OLTP from DSS and then use the model to identify any change in the workload type. We construct and compare classifiers built from two different sets of workloads, namely the TPC-C and TPC-H benchmarks and the Browsing and Ordering profiles from the TPC-W benchmark. We demonstrate the feasibility and success of these classifiers with TPC-generated workloads and with industry-supplied workloads.
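A minimal, hypothetical sketch of the classification step follows. The feature set, training values, and prediction are invented for illustration; the paper derives its features from DBMS snapshots of TPC-generated workloads, but the shape of the approach is the same: train a classifier on workload-level characteristics, then apply it to new snapshots to detect a shift between OLTP and DSS.

# Hypothetical feature set; not the paper's actual workload characteristics.
from sklearn.tree import DecisionTreeClassifier

# Features per workload sample: [avg rows read per statement,
#                                ratio of writes to reads,
#                                avg sort/join operators per statement]
X_train = [
    [12, 0.60, 0.1],       # OLTP-like: short statements, frequent writes
    [8, 0.55, 0.0],
    [25, 0.40, 0.2],
    [50000, 0.02, 3.5],    # DSS-like: large scans, read-mostly, heavy joins
    [120000, 0.01, 4.0],
    [80000, 0.05, 2.8],
]
y_train = ["OLTP", "OLTP", "OLTP", "DSS", "DSS", "DSS"]

model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

observed = [[900, 0.30, 0.5]]          # a snapshot of the current workload
print(model.predict(observed)[0])      # e.g. "OLTP"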
IBM Journal of Research and Development | 2015
Balaram Sinharoy; Randal C. Swanberg; Naresh Nayar; Bruce Mealey; Jeffrey A. Stuecheli; Berni Schiefer; Jens Leenstra; J. Jann; Philipp Oehler; David Stephen Levitan; Susan E. Eisen; D. Sanner; Thomas Pflueger; Cedric Lichtenau; William E. Hall; T. Block
The IBM POWER8™ processor includes many innovative features that enable efficient and flexible computing, along with enhancements in virtualization, security, and serviceability. These features benefit application performance, big data and analytics computing, and cloud environments. Notable features include the capability to dynamically and efficiently change the number of threads active on a processor, and enhancements to application performance via integer vector operations, encryption acceleration, and reference history arrays. Also notable are improved virtual machine density (supporting multiple simultaneous partitions per core and providing fine-grain power management), continuous monitoring of system performance, and significantly enhanced system RAS (reliability, availability, and serviceability) and security. Each of these features is technologically complex and advanced. This paper provides an in-depth description of some of these features and their exploitation through systems software and middleware. These features will continue to bring value to the system-of-record workloads in the enterprise. They also make POWER8 systems well-suited for serving the needs of newer workloads such as big data and analytics, while efficiently supporting deployment in cloud environments.
Technology Conference on Performance Evaluation and Benchmarking | 2015
Dakshi Agrawal; Ali Raza Butt; Josep-lluis Larriba-pey; Min Li; Frederick R. Reiss; Francois Raab; Berni Schiefer; Toyotaro Suzumura; Yinglong Xia
Spark has emerged as an easy-to-use, scalable, robust, and fast system for analytics, with a rapidly growing and vibrant community of users and contributors. It is multipurpose, with extensive and modular infrastructure for machine learning, graph processing, SQL, streaming, statistical processing, and more. Its rapid adoption therefore calls for a performance assessment suite that supports agile development, measurement, validation, optimization, configuration, and deployment decisions across a broad range of platform environments and test cases.
The Computer Journal | 2004
Qiang Zhu; Brian Dunkel; Wing Lau; Suyun Chen; Berni Schiefer
A database management system (DBMS) performs query optimization based on statistical information about data in the underlying database. Out-of-date statistics may lead to inefficient query processing in the system. The existing utility method, which collects statistics in batch mode, suffers from drawbacks such as heavy administrative burden, high system load and tardy updates. In this paper, we study approaches to performing statistical analysis on the fly during query execution, taking advantage of data already resident in main memory. We propose a framework for on-the-fly statistics collection, which we term piggybacking, and analyze the tradeoffs of piggybacking various statistics collection techniques on top of query execution plans. We present a multiple-granularity interleaving algorithm to integrate a set of piggyback operations with an execution plan, and show how the algorithm can be incorporated into an existing query optimizer. Our experiments demonstrate that useful statistics can be obtained via the piggyback method with a small overhead.
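As a rough sketch of the piggybacking idea (names and structure assumed, not the paper's implementation), the wrapper below refreshes column statistics from rows that are already flowing through a scan during query execution, so no separate batch utility pass over the data is needed.

# Illustrative scan wrapper that maintains statistics as a side effect.
class PiggybackScan:
    def __init__(self, rows, column):
        self.rows = rows
        self.column = column
        self.min = None
        self.max = None
        self.distinct = set()
        self.count = 0

    def __iter__(self):
        for row in self.rows:            # normal scan work
            v = row[self.column]
            self.count += 1              # piggybacked statistics maintenance
            self.min = v if self.min is None else min(self.min, v)
            self.max = v if self.max is None else max(self.max, v)
            self.distinct.add(v)
            yield row                    # the row continues up the plan unchanged

table = [{"price": p} for p in (10, 25, 10, 99, 42)]
scan = PiggybackScan(table, "price")
result = [row for row in scan if row["price"] > 20]         # the "real" query
print(scan.count, scan.min, scan.max, len(scan.distinct))   # 5 10 99 4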
Proceedings of the 1st Workshop on Architectures and Systems for Big Data | 2011
Alexander Alexandrov; Berni Schiefer; John Poelman; Stephan Ewen; Thomas O. Bodner; Volker Markl
The need for efficient data generation for the purposes of testing and benchmarking newly developed massively-parallel data processing systems has increased with the emergence of Big Data problems. As synthetic data model specifications evolve over time, the data generator programs implementing these models have to be adapted continuously -- a task that often becomes more tedious as the set of model constraints grows. In this paper we present Myriad, a new parallel data generation toolkit. Data generators created with the toolkit can quickly produce very large datasets in a shared-nothing parallel execution environment, while at the same time preserving cross-partition dependencies, correlations, and distributions in the generated data. In addition, we report on our efforts towards a benchmark suite for large-scale parallel analysis systems that uses Myriad for the generation of OLAP-style relational datasets.
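A core trick in parallel generators of this kind is to make every generated value a pure function of a global seed and the record's position, so that any partition can be produced on any node, in any order, while foreign keys and correlations still resolve consistently. The sketch below illustrates that idea in Python; the function names and schema are assumptions for illustration, not Myriad's actual design.

# Deterministic, position-addressable generation (assumed illustration).
import hashlib

GLOBAL_SEED = 42
NUM_CUSTOMERS = 1_000_000

def prf(seed, *parts):
    """Deterministic pseudorandom 64-bit value for a record/field key."""
    h = hashlib.sha256(repr((seed, parts)).encode()).digest()
    return int.from_bytes(h[:8], "big")

def gen_customer(i):
    r = prf(GLOBAL_SEED, "customer", i)
    return {"id": i, "segment": r % 5}

def gen_order(i):
    # Correlated foreign key, computable without having generated customers.
    cust = prf(GLOBAL_SEED, "order_cust", i) % NUM_CUSTOMERS
    return {"id": i, "customer_id": cust,
            "amount": prf(GLOBAL_SEED, "order_amt", i) % 10_000}

def generate_partition(gen, start, end):
    return [gen(i) for i in range(start, end)]

# Two nodes generating disjoint order partitions produce consistent data.
print(generate_partition(gen_order, 0, 2))
print(generate_partition(gen_order, 500_000, 500_002))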
International Conference on Management of Data | 2016
Sina Meraji; Berni Schiefer; Lan Pham; Lee Chu; Peter Kokosielis; Adam J. Storm; Wayne Young; Geoffrey Ng; Kajan Kanagaratnam
In this paper, we show how we use Nvidia GPUs and host CPU cores for faster query processing in a DB2 database using BLU Acceleration (DB2's column-store technology). Moreover, we show the benefits and problems of using hardware accelerators (more specifically, GPUs) in a real commercial Relational Database Management System (RDBMS). We investigate the effect of off-loading specific database operations to a GPU, and show how doing so results in a significant performance improvement. We then demonstrate that for some queries, using just the CPU to perform the entire operation is more beneficial. While we use some of Nvidia's fast kernels for operations like sort, we have also developed our own high-performance kernels for operations such as group by and aggregation. Finally, we show how we use a dynamic design that can make use of optimizer metadata to intelligently choose a GPU kernel to run. For the first time in the literature, we use benchmarks representative of customer environments to gauge the performance of our prototype, the results of which show that we can get a speed increase of upwards of 2x using a realistic set of queries.
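The dynamic, metadata-driven device choice described above can be sketched as a simple cost comparison. The thresholds, cost constants, and function names below are hypothetical and invented purely for illustration; the abstract does not describe the prototype's actual heuristics or kernels in reproducible detail.

# Hypothetical GPU/CPU dispatch based on optimizer cardinality estimates.
def run_on_cpu(op):
    return f"CPU path for {op['name']}"

def run_on_gpu(op):
    return f"GPU kernel for {op['name']}"      # would launch a real kernel

def choose_device(op, pcie_cost_per_row=1e-7, gpu_speedup=4.0,
                  cpu_cost_per_row=2e-7, min_rows=1_000_000):
    """Pick the GPU only when the estimated saving outweighs transfer cost."""
    rows = op["estimated_rows"]
    if rows < min_rows:
        return run_on_cpu(op)                   # too small to amortize transfer
    cpu_time = rows * cpu_cost_per_row
    gpu_time = rows * cpu_cost_per_row / gpu_speedup + rows * pcie_cost_per_row
    return run_on_gpu(op) if gpu_time < cpu_time else run_on_cpu(op)

print(choose_device({"name": "GROUP BY region", "estimated_rows": 50_000_000}))
print(choose_device({"name": "GROUP BY user_id", "estimated_rows": 10_000}))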
Workshop on Big Data Benchmarks | 2014
Avrilia Floratou; Fatma Ozcan; Berni Schiefer
Benchmarks are important tools to evaluate systems, as long as their results are transparent and reproducible and the benchmarks are conducted with due diligence. Today, many SQL-on-Hadoop vendors use the data generators and the queries of existing TPC benchmarks, but fail to adhere to the rules, producing results that are not transparent. As the SQL-on-Hadoop movement continues to gain traction, it is important to bring some order to this “wild west” of benchmarking. First, new rules and policies should be defined to satisfy the demands of the new generation of SQL systems. The new benchmark evaluation schemes should be inexpensive, effective, and open enough to embrace the variety of SQL-on-Hadoop systems and their corresponding vendors. Second, adhering to the new standards requires industry commitment and collaboration. In this paper, we discuss the problems we observe in current benchmarking practices, and present our proposal for bringing standardization to the SQL-on-Hadoop space.