Sam Lightstone | Researchain

Publication

Featured research published by Sam Lightstone.

very large data bases | 2013

DB2 with BLU acceleration: so much more than just a column store

Vijayshankar Raman; Gopi K. Attaluri; Ronald J. Barber; Naresh K. Chainani; David Kalmuk; Vincent Kulandaisamy; Jens Leenstra; Sam Lightstone; Shaorong Liu; Guy M. Lohman; Tim R Malkemus; Rene Mueller; Ippokratis Pandis; Berni Schiefer; David C. Sharpe; Richard S. Sidle; Adam J. Storm; Liping Zhang

DB2 with BLU Acceleration deeply integrates innovative new techniques for defining and processing column-organized tables that speed read-mostly Business Intelligence queries by 10 to 50 times and improve compression by 3 to 10 times, compared to traditional row-organized tables, without the complexity of defining indexes or materialized views on those tables. But DB2 BLU is much more than just a column store. Exploiting frequency-based dictionary compression and main-memory query processing technology from the Blink project at IBM Research - Almaden, DB2 BLU performs most SQL operations (predicate application, even for range predicates and IN-lists; joins; and grouping) on the compressed values, which can be packed bit-aligned so densely that multiple values fit in a register and can be processed simultaneously via SIMD (single-instruction, multiple-data) instructions. Designed and built from the ground up to exploit modern multi-core processors, DB2 BLU's hardware-conscious algorithms are carefully engineered to maximize parallelism by using novel data structures that need little latching, and to minimize data-cache and instruction-cache misses. Though DB2 BLU is optimized for in-memory processing, database size is not limited by the size of main memory. Fine-grained synopses, late materialization, and a new probabilistic buffer pool protocol for scans minimize disk I/Os, while aggressive prefetching reduces I/O stalls. Full integration with DB2 ensures that DB2 with BLU Acceleration benefits from the full functionality and robust utilities of a mature product while still enjoying order-of-magnitude performance gains from revolutionary technology, without even having to change the SQL. Column-organized and row-organized tables can also be mixed in the same tablespace and even within the same query.
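To make "processing several compressed values in one register" concrete, here is a minimal Python sketch of the general SWAR (SIMD-within-a-register) idea, not DB2 BLU's actual code. Assumptions: values get dictionary codes ranked by frequency, but all codes share one fixed width (a real frequency-based scheme would give frequent values shorter codes); each packed field carries one extra zero "delimiter" bit; and the helper names (`build_dictionary`, `pack`, `equal_mask`, `scan_equal`) are hypothetical. The equality test in `equal_mask` uses only word-wide XOR, ADD, and AND, so in C it would be a handful of 64-bit instructions per word.

```python
from collections import Counter

WORD_BITS = 64  # assumed register width


def build_dictionary(values):
    """Rank distinct values by frequency: the most frequent value gets code 0."""
    ranked = [v for v, _ in Counter(values).most_common()]
    return {v: code for code, v in enumerate(ranked)}


def code_width(dictionary):
    """Bits needed to represent the largest code (at least 1)."""
    return max(1, (len(dictionary) - 1).bit_length())


def replicate(value, field, per_word):
    """Copy `value` into every field of a word."""
    return sum(value << (i * field) for i in range(per_word))


def pack(codes, width):
    """Pack codes into 64-bit words: `width` code bits plus one zero delimiter bit per field."""
    field = width + 1
    per_word = WORD_BITS // field
    words = []
    for start in range(0, len(codes), per_word):
        word = 0
        for i, code in enumerate(codes[start:start + per_word]):
            word |= code << (i * field)
        words.append(word)
    return words, per_word


def equal_mask(word, const, width, per_word):
    """Set the delimiter bit of every field whose code equals `const`, for all fields at once."""
    field = width + 1
    low = replicate((1 << width) - 1, field, per_word)  # all code bits set
    high = replicate(1 << width, field, per_word)        # delimiter bits only
    diff = word ^ replicate(const, field, per_word)      # a field is 0 exactly when it matches
    nonzero = (diff + low) & high                        # nonzero fields carry into their delimiter
    return ~nonzero & high


def scan_equal(words, per_word, width, n_rows, const):
    """Row ids whose code equals `const`, evaluating one whole word per step."""
    field = width + 1
    hits = []
    for w_idx, word in enumerate(words):
        mask = equal_mask(word, const, width, per_word)
        for i in range(per_word):
            row = w_idx * per_word + i
            # ignore padding fields in the last word (they hold code 0)
            if row < n_rows and (mask >> (i * field + width)) & 1:
                hits.append(row)
    return hits
```

For example, the column `["CA", "NY", "CA", "TX", "CA", "NY"]` gets the codes `{"CA": 0, "NY": 1, "TX": 2}`, which need 2 bits, so 21 three-bit fields fit in one 64-bit word, and scanning for `"NY"` returns rows `[1, 5]`. The delimiter bit is what keeps the addition in one field from carrying into its neighbor.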

international conference on autonomic computing | 2004

Recommending materialized views and indexes with the IBM DB2 design advisor

D.C. Zilio; C. Zuzarte; Sam Lightstone; Wenbin Ma; G.M. Lohman; R.J. Cochrane; H. Pirahesh; L. Colby; J. Gryz; E. Alton; G. Valentin

Materialized views (MVs) and indexes both significantly speed query processing in database systems, but consume disk space and need to be maintained when updates occur. Choosing the best set of MVs and indexes to create depends upon the workload, the database, and many other factors, which makes the decision intractable for humans and computationally challenging for computer algorithms. Even heuristic-based algorithms can be impractical in real systems. In this paper, we present an advanced tool that uses the query optimizer itself to both suggest and evaluate candidate MVs and indexes, and a simple, practical, and effective algorithm for rapidly finding good solutions even for large workloads. The algorithm trades off the cost for updates and storing each MV or index against its benefit to queries in the workload. The tool autonomically captures the workload, database, and system information, optionally permits sampling of candidate MVs to better estimate their size, and exploits multi-query optimization to construct candidate MVs that will benefit many queries, over which their maintenance cost can then be amortized. We describe the design of the system and present initial experiments that confirm the quality of its results on a database and workload drawn from a real customer.
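The trade-off described above (benefit to queries versus update and storage cost) can be illustrated with a generic greedy knapsack heuristic. This is a minimal sketch under stated assumptions, not the Design Advisor's published algorithm: each candidate is assumed to come with an optimizer-estimated benefit, an update (maintenance) cost, and a positive size; candidates are treated as independent, ignoring interactions between MVs and indexes; and `Candidate` and `recommend` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    benefit: float      # estimated workload cost saved (e.g., optimizer cost units)
    maintenance: float  # estimated extra cost caused by updates in the workload
    size_mb: float      # estimated disk footprint (assumed > 0)


def recommend(candidates, disk_budget_mb):
    """Greedy knapsack: take candidates with the best net benefit per MB that still fit."""
    # Drop candidates whose maintenance cost outweighs their query benefit.
    useful = [c for c in candidates if c.benefit - c.maintenance > 0]
    useful.sort(key=lambda c: (c.benefit - c.maintenance) / c.size_mb, reverse=True)
    chosen, used = [], 0.0
    for c in useful:
        if used + c.size_mb <= disk_budget_mb:
            chosen.append(c)
            used += c.size_mb
    return chosen
```

With hypothetical candidates `idx_customer_name` (net 280, 50 MB), `idx_orders_date` (net 450, 100 MB), and `mv_sales_by_region` (net 800, 400 MB), a 200 MB budget selects the two indexes, while a 600 MB budget also admits the MV. Ranking by net benefit per MB is what lets small, cheap structures win when disk space is tight.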

international conference on management of data | 2002

Toward autonomic computing with DB2 universal database

Sam Lightstone; Guy M. Lohman; Daniel C. Zilio

As the cost of both hardware and software falls due to technological advancements and economies of scale, the cost of ownership for database applications is increasingly dominated by the cost of people to manage them. Databases are growing rapidly in scale and complexity, while skilled database administrators (DBAs) are becoming rarer and more expensive. This paper describes the self-managing or autonomic technology in IBM's DB2 Universal Database® for UNIX and Windows to illustrate how self-managing technology can reduce complexity, helping to reduce the total cost of ownership (TCO) of DBMSs and improve system performance.

very large data bases | 2004

Automated statistics collection in DB2 UDB

Ashraf Aboulnaga; Peter J. Haas; Mokhtar Kandil; Sam Lightstone; Guy M. Lohman; Volker Markl; Ivan Popivanov; Vijayshankar Raman

The use of inaccurate or outdated database statistics by the query optimizer in a relational DBMS often results in a poor choice of query execution plans and hence unacceptably long query processing times. Configuration and maintenance of these statistics has traditionally been a time-consuming manual operation, requiring that the database administrator (DBA) continually monitor query performance and data changes in order to determine when to refresh the statistics values and when and how to adjust the set of statistics that the DBMS maintains. In this paper we describe the new Automated Statistics Collection (ASC) component of IBM® DB2® Universal Database™ (DB2 UDB). This autonomic technology frees the DBA from the tedious task of manually supervising the collection and maintenance of database statistics. ASC monitors both the update-delete-insert (UDI) activities on the data and query feedback (QF), i.e., the results of the queries that are executed on the data. ASC uses these two sources of information to automatically decide which statistics to collect and when to collect them. This combination of UDI-driven and QF-driven autonomic processes ensures that the system can handle unforeseen queries while also ensuring good performance for frequent and important queries. We present the basic concepts, architecture, and key implementation details of ASC in DB2 UDB, and present a case study showing how the use of ASC can speed up a query workload by orders of magnitude without requiring any DBA intervention.
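The two information sources ASC combines can be illustrated with a toy decision rule. This minimal Python sketch is an assumption-laden illustration, not ASC's implementation: the UDI trigger is a hypothetical "more than 10% of rows changed" threshold, the QF trigger flags a table when any query's cardinality estimate is off by more than a hypothetical factor of 4 (measured with the common "q-error" ratio, a swapped-in metric), and `TableActivity` and `tables_needing_stats` are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class TableActivity:
    name: str
    row_count: int                 # rows when statistics were last collected
    udi_since_last_stats: int = 0  # updates + deletes + inserts since then
    feedback: list = field(default_factory=list)  # (estimated_rows, actual_rows) per query


def q_error(estimated, actual):
    """Factor by which an estimate is off, in either direction (floor of 1 row)."""
    est, act = max(estimated, 1), max(actual, 1)
    return max(est / act, act / est)


def tables_needing_stats(tables, udi_fraction=0.10, max_q_error=4.0):
    """Map table name -> reason ("UDI" or "QF") for tables whose statistics look stale."""
    decisions = {}
    for t in tables:
        if t.udi_since_last_stats / max(t.row_count, 1) > udi_fraction:
            decisions[t.name] = "UDI"  # data changed a lot: refresh existing statistics
        elif any(q_error(e, a) > max_q_error for e, a in t.feedback):
            decisions[t.name] = "QF"   # optimizer misestimated: statistics may be missing or stale
    return decisions
```

In this toy model, a table with 250 changes out of 1,000 rows is flagged by the UDI rule, while a barely-updated table is still flagged by the QF rule if a query estimated 20 rows but produced 400. The QF path is what catches problems that data-change counts alone would miss.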

real time technology and applications symposium | 2004

Incorporating cost of control into the design of a load balancing controller

Yixin Diao; Joseph L. Hellerstein; Adam J. Storm; Maheswaran Surendra; Sam Lightstone; Sujay Parekh; C. Garcia Arellano

Load balancing is widely used in computing systems as a way to optimize performance by reducing bottleneck utilizations; for example, a database management system can adjust the sizes of its buffer pools to balance resource demands. Load balancing is generally approached as a constrained optimization problem in which only the benefits of load balancing are considered. However, the costs of control are important as well. Herein, we study the value of including in controller design the trade-off between the cost of transient imbalances in resource utilizations and the cost of changing resource allocations. An example of the latter is resizing buffer pools, which can reduce throughput: requests for data in pools whose memory is reduced immediately have longer access times, whereas requests for data in pools whose memory is increased see shorter access times only after that memory has been filled with data from disk. We frame our study of control costs in terms of the widely used linear quadratic regulator (LQR). We develop a cost model that allows us to specify the LQR Q and R matrices based on the impact on system performance of changing resource allocations and transient load imbalances. Our studies of a DB2 Universal Database server using benchmarks for online transaction processing and decision support workloads show that incorporating our cost model into the MIMO LQR controller results in a 14% improvement in performance beyond that achieved by dynamically allocating the size of buffers without properly considering the cost of control.
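The role of the R matrix (cost of changing allocations) can be seen in a scalar simplification of LQR. The paper's controller is MIMO; this minimal Python sketch assumes a single state x (a load imbalance), a single input u (a change in allocation), linear dynamics x[k+1] = a*x[k] + b*u[k], and the cost sum of q*x^2 + r*u^2. It solves the scalar discrete-time Riccati equation P = q + a^2*P - (a*b*P)^2 / (r + b^2*P) by fixed-point iteration and returns the gain K = a*b*P / (r + b^2*P) for the control law u = -K*x. The values of a, b, q, r and the names `dlqr_scalar` and `simulate` are hypothetical.

```python
def dlqr_scalar(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Scalar discrete-time LQR: iterate the Riccati equation, return (gain K, cost P)."""
    p = q
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)
    return k, p


def simulate(k, a, b, x0, steps):
    """Apply u = -k*x to x[k+1] = a*x[k] + b*u[k]; return the state and input trajectories."""
    xs, us = [x0], []
    x = x0
    for _ in range(steps):
        u = -k * x
        x = a * x + b * u
        us.append(u)
        xs.append(x)
    return xs, us
```

With a = b = q = 1, raising r from 1 to 10 lowers the gain from about 0.618 to about 0.270: when reallocation is expensive, the controller moves memory more gently and tolerates a transient imbalance for longer, which is the trade-off the cost model above is meant to quantify.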

distributed systems operations and management | 2003

Managing the Performance Impact of Administrative Utilities

Sujay Parekh; Kevin R. Rose; Joseph L. Hellerstein; Sam Lightstone; Matthew A. Huras; Victor Chang

Administrative utilities (e.g., filesystem and database backups, garbage collection in Java Virtual Machines) are an essential part of the operation of production systems. Since production work can be severely degraded by the execution of such utilities, it is desirable to have policies of the form “There should be no more than an x% degradation of production work due to utility execution.” Two challenges arise in providing such policies: (1) providing an effective mechanism for throttling the resource consumption of utilities and (2) continuously translating from policy expressions of “degradation units” into the appropriate settings for the throttling mechanism. We address (1) by using self-imposed sleep, a technique that forces utilities to slow down their processing by a configurable amount. We address (2) by employing an online estimation scheme in combination with a feedback loop. This throttling system is autonomous and adaptive and allows the system to self-manage its utilities to limit their performance impact, with only high-level policy input from the administrator. We demonstrate the effectiveness of these approaches in a prototype system that incorporates these capabilities into IBM’s DB2 Universal Database server.
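Self-imposed sleep plus a feedback loop can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the paper's prototype: a plain integral controller stands in for the paper's online estimation scheme, the gain of 2.0 and the 0.99 cap on sleep are hypothetical tuning choices, and `SleepThrottle` is an invented name. The controller raises the fraction of wall time the utility spends asleep when measured degradation exceeds the policy target, lowers it otherwise, and converts that fraction into a sleep duration after each chunk of work (if a fraction s of wall time is sleep, then sleep = work * s / (1 - s)).

```python
class SleepThrottle:
    """Integral controller that turns a degradation target into a self-imposed sleep fraction."""

    def __init__(self, target_degradation, gain=2.0, max_sleep=0.99):
        self.target = target_degradation  # e.g. 0.10 for "at most 10% degradation"
        self.gain = gain                  # hypothetical tuning constant
        self.max_sleep = max_sleep        # never stall the utility completely
        self.sleep_fraction = 0.0         # share of wall time the utility spends asleep

    def update(self, measured_degradation):
        """Raise sleep when degradation exceeds the target, lower it when below."""
        error = measured_degradation - self.target
        self.sleep_fraction = min(self.max_sleep, max(0.0, self.sleep_fraction + self.gain * error))
        return self.sleep_fraction

    def sleep_after(self, work_seconds):
        """Sleep time after a chunk of work so that sleep is `sleep_fraction` of wall time."""
        return work_seconds * self.sleep_fraction / (1.0 - self.sleep_fraction)
```

In a utility's main loop one would time each chunk of work and then call `time.sleep(throttle.sleep_after(elapsed))`. Against a hypothetical system where an unthrottled utility causes 30% degradation, proportional to its duty cycle, a 10% target drives the sleep fraction toward 2/3, i.e. two seconds of sleep per second of work. Whether a given gain converges depends on how sensitive degradation is to the sleep fraction, which is one reason the paper estimates that relationship online.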

very large data bases | 2002

SMART: making DB2 (more) autonomic

Guy M. Lohman; Sam Lightstone

IBM's SMART (Self-Managing And Resource Tuning) project aims to make DB2 self-managing, i.e., autonomic, to decrease the total cost of ownership and penetrate new markets. Over several releases, increasingly sophisticated SMART features will ease administrative tasks such as initial deployment, database design, system maintenance, problem determination, and ensuring system availability and recovery.

international conference on autonomic computing | 2004

Benchmarking autonomic capabilities: promises and pitfalls

Aaron B. Brown; Joseph L. Hellerstein; Matt R. Hogstrom; Tony Lau; Sam Lightstone; Peter K. Shum; Mary Peterson Yost

Benchmarks provide a way to quantify progress in a field. Our goal is to produce a suite of benchmarks covering the four categories of autonomic capabilities: self-configuring, self-healing, self-optimizing, and self-protecting (IBM, 2003). This is not an easy task, however, and in this paper we identify several of the challenges and pitfalls that must be confronted to extend benchmarking technology beyond its traditional basis in performance evaluation.

very large data bases | 2014

Memory-efficient hash joins

Ronald J. Barber; Guy M. Lohman; Ippokratis Pandis; Vijayshankar Raman; Richard S. Sidle; Gopi K. Attaluri; Naresh K. Chainani; Sam Lightstone; David C. Sharpe

We present new hash tables for joins, and a hash join based on them, that consumes far less memory and is usually faster than recently published in-memory joins. Our hash join is not restricted to outer tables that fit wholly in memory. Key to this hash join is a new concise hash table (CHT), a linear probing hash table that has 100% fill factor, and uses a sparse bitmap with embedded population counts to almost entirely avoid collisions. This bitmap also serves as a Bloom filter for use in multi-table joins. We study the random access characteristics of hash joins, and renew the case for non-partitioned hash joins. We introduce a variant of partitioned joins in which only the build is partitioned, but the probe is not, as this is more efficient for large outer tables than traditional partitioned joins. This also avoids partitioning costs during the probe, while at the same time allowing parallel build without latching overheads. Additionally, we present a variant of CHT, called a concise array table (CAT), that can be used when the key domain is moderately dense. CAT is collision-free and avoids storing join keys in the hash table. We perform a detailed comparison of CHT and CAT against leading in-memory hash joins. Our experiments show that we can reduce the memory usage by one to three orders of magnitude, while also being competitive in performance.
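The core CHT idea (a sparse bitmap records occupied slots, and keys live in a dense, 100%-full array addressed by counting set bits) can be sketched in Python. This is a simplified illustration under stated assumptions, not the paper's data structure: it uses Python's built-in `hash()`, 8 bitmap bits per key, unbounded linear probing within the bitmap, and a separate prefix-count list standing in for the population counts the paper embeds in the bitmap. `ConciseHashTable`, `get`, and `may_contain` are hypothetical names.

```python
WORD = 64


def popcount(x):
    return bin(x).count("1")


class ConciseHashTable:
    """Sketch of a CHT-style table: sparse bitmap plus dense, fully occupied key/value arrays."""

    def __init__(self, pairs, bits_per_key=8):
        m = max(WORD, len(pairs) * bits_per_key)
        self.m = -(-m // WORD) * WORD              # round up to whole 64-bit words
        self.words = [0] * (self.m // WORD)
        placed = []
        for key, value in pairs:
            pos = self._hash(key)
            while self._test(pos):                 # linear probing in the bitmap
                pos = (pos + 1) % self.m
            self._set(pos)
            placed.append((pos, key, value))
        placed.sort(key=lambda t: t[0])            # dense arrays follow bit order
        self.keys = [k for _, k, _ in placed]
        self.values = [v for _, _, v in placed]
        self.prefix, total = [], 0                 # set bits before each word
        for w in self.words:
            self.prefix.append(total)
            total += popcount(w)

    def _hash(self, key):
        return hash(key) % self.m

    def _test(self, pos):
        return (self.words[pos // WORD] >> (pos % WORD)) & 1

    def _set(self, pos):
        self.words[pos // WORD] |= 1 << (pos % WORD)

    def _rank(self, pos):
        """Dense-array index of the bit at `pos`: number of set bits before it."""
        w, b = divmod(pos, WORD)
        return self.prefix[w] + popcount(self.words[w] & ((1 << b) - 1))

    def get(self, key, default=None):
        pos = self._hash(key)
        while self._test(pos):                     # an unset bit ends the probe chain
            i = self._rank(pos)
            if self.keys[i] == key:
                return self.values[i]
            pos = (pos + 1) % self.m
        return default

    def may_contain(self, key):
        """Bloom-filter-style pre-check: an unset home bit proves the key is absent."""
        return bool(self._test(self._hash(key)))
```

Only the bitmap is sparse; the key and value arrays have no empty slots, which is where the memory saving over a conventional open-addressing table comes from. `may_contain` can give false positives (a home bit set by another key's probe) but never false negatives, which is what makes the bitmap usable as a filter during a multi-table join.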

Database Modeling and Design (Fourth Edition): Logical Design | 2006

The Unified Modeling Language (UML)

Toby J. Teorey; Sam Lightstone; Tom Nadeau

This chapter is an overview of the syntax and semantics of the UML class and activity diagram constructs. The Unified Modeling Language (UML) is a graphical language for communicating design specifications for software. The object-oriented software development community created UML to meet the special needs of describing object-oriented software design. UML has grown into a standard for the design of digital systems in general. There are a number of different types of UML diagrams serving various purposes. The class and activity diagram types are particularly useful for discussing database design issues. UML class diagrams capture the structural aspects found in database schemas. UML activity diagrams facilitate discussion on the dynamic processes involved in database design. UML class diagrams and entity-relationship (ER) models are similar in both form and semantics. The original creators of UML point out the influence of ER models on the origins of class diagrams. The influence of UML has in turn affected the database community. Class diagrams appear frequently in the database literature to describe database schemas. A class is a descriptor for a set of objects that share some attributes and/or operations.
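Because class diagrams and ER models are so close, a class's attributes map naturally onto the columns of a relational table. The following Python sketch shows one common convention, assuming each attribute becomes a column with an explicitly chosen SQL type and one attribute is marked as the primary key; operations have no column counterpart and are kept only as metadata. `UmlClass`, `to_create_table`, and the `Customer` example are hypothetical, and the chapter's own mapping rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class UmlClass:
    name: str
    attributes: dict  # attribute name -> SQL type (assumed mapping, insertion-ordered)
    key: str          # attribute used as the primary key
    operations: list = field(default_factory=list)  # behavior; produces no columns


def to_create_table(cls):
    """Emit a CREATE TABLE statement with one column per attribute."""
    cols = [f"  {attr} {sqltype}" for attr, sqltype in cls.attributes.items()]
    cols.append(f"  PRIMARY KEY ({cls.key})")
    return f"CREATE TABLE {cls.name} (\n" + ",\n".join(cols) + "\n);"
```

For a `Customer` class with attributes `cust_id: INTEGER` and `name: VARCHAR(100)` and an operation `placeOrder`, this yields a two-column table keyed on `cust_id`; the operation stays in application code, which reflects the structural (not behavioral) part of a class diagram that a schema captures.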
