Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Tim Kaldewey is active.

Publication


Featured research published by Tim Kaldewey.


International Conference on Management of Data | 2010

FAST: fast architecture sensitive tree search on modern CPUs and GPUs

Changkyu Kim; Jatin Chhugani; Nadathur Satish; Eric Sedlar; Anthony D. Nguyen; Tim Kaldewey; Victor W. Lee; Scott A. Brandt; Pradeep Dubey

In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join, and aggregation. However, unlike other primitives, tree search presents significant challenges due to irregular and unpredictable data accesses in tree traversal. In this paper, we present FAST, an extremely fast architecture-sensitive layout of the index tree. FAST is a binary tree logically organized to optimize for architecture features like page size, cache line size, and SIMD width of the underlying hardware. FAST eliminates the impact of memory latency, and exploits thread-level and data-level parallelism on both CPUs and GPUs to achieve 50 million (CPU) and 85 million (GPU) queries per second, 5X (CPU) and 1.7X (GPU) faster than the best previously reported performance on the same architectures. FAST supports efficient bulk updates by rebuilding index trees in less than 0.1 seconds for datasets as large as 64M keys, and naturally integrates compression techniques, overcoming the memory bandwidth bottleneck and achieving a 6X performance improvement over uncompressed index search for large keys on CPUs.
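FAST's actual layout blocks the tree for page size, cache-line size, and SIMD width simultaneously; as a minimal illustration of one underlying idea only (this is a hypothetical Python sketch, not the paper's code), the snippet below stores a binary search tree in an implicit breadth-first array, so traversal becomes pure index arithmetic with no pointers and the hot top levels of the tree sit contiguously in memory:

```python
def eytzinger(sorted_keys):
    """Lay out sorted keys in breadth-first (implicit binary tree) order."""
    out = [None] * len(sorted_keys)
    it = iter(sorted_keys)

    def fill(i):
        if i < len(out):
            fill(2 * i + 1)      # in-order walk: left subtree first,
            out[i] = next(it)    # then this node,
            fill(2 * i + 2)      # then right subtree
    fill(0)
    return out

def search(tree, key):
    """Return True if key is present; each step is index arithmetic only."""
    i = 0
    while i < len(tree):
        if tree[i] == key:
            return True
        i = 2 * i + 1 + (key > tree[i])  # branch-free child selection
    return False
```

The paper's layout goes much further (grouping subtrees so each fits a cache line and a memory page, and comparing several keys per step with SIMD), but the array-walk structure is the same.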


Very Large Data Bases | 2009

Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs

Changkyu Kim; Tim Kaldewey; Victor W. Lee; Eric Sedlar; Anthony D. Nguyen; Nadathur Satish; Jatin Chhugani; Andrea Di Blas; Pradeep Dubey

Join is an important database operation. As computer architectures evolve, the best join algorithm may change hands. This paper re-examines two popular join algorithms -- hash join and sort-merge join -- to determine if the latest computer architecture trends shift the tide that has favored hash join for many years. For a fair comparison, we implemented the most optimized parallel version of both algorithms on the latest Intel Core i7 platform. Both implementations scale well with the number of cores in the system and take advantage of the latest processor features for performance. Our hash-based implementation achieves more than 100M tuples per second, which is 17X faster than the best reported performance on CPUs and 8X faster than that reported for GPUs. Moreover, the performance of our hash join implementation is consistent over a wide range of input data sizes from 64K to 128M tuples and is not affected by data skew. We compare this implementation to our highly optimized sort-based implementation that achieves 47M to 80M tuples per second. We developed analytical models to study how both algorithms would scale with upcoming processor architecture trends. Our analysis projects that current architectural trends of wider SIMD, more cores, and smaller memory bandwidth per core imply better scalability potential for sort-merge join. Consequently, sort-merge join is likely to outperform hash join on upcoming chip multiprocessors. In summary, we offer multicore implementations of hash join and sort-merge join which consistently outperform all previously reported results. We further conclude that the tide that favors the hash join algorithm has not changed yet, but the change is just around the corner.
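As a toy illustration of the two algorithms being compared (not the paper's SIMD- and cache-conscious implementations), a hash join builds a table on one input and probes it with the other, while a sort-merge join sorts both inputs and merges them with two cursors. A minimal Python sketch, assuming each relation is a list of (key, payload) tuples:

```python
from collections import defaultdict

def hash_join(r, s):
    """Build a hash table on r, probe with s; emit (key, r_payload, s_payload)."""
    table = defaultdict(list)
    for key, payload in r:
        table[key].append(payload)
    return [(key, rp, sp) for key, sp in s for rp in table[key]]

def sort_merge_join(r, s):
    """Sort both inputs on the key, then merge runs of equal keys."""
    r, s = sorted(r), sorted(s)
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            # find the run of equal keys on each side, emit the cross product
            i2, j2 = i, j
            while i2 < len(r) and r[i2][0] == key:
                i2 += 1
            while j2 < len(s) and s[j2][0] == key:
                j2 += 1
            for a in range(i, i2):
                for b in range(j, j2):
                    out.append((key, r[a][1], s[b][1]))
            i, j = i2, j2
    return out
```

Both produce the same multiset of results; the paper's question is how the two cost profiles (hashing's random memory accesses vs. sorting's sequential, SIMD-friendly passes) scale as SIMD widens and per-core memory bandwidth shrinks.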


European Conference on Computer Systems | 2008

Efficient guaranteed disk request scheduling with fahrrad

Anna Povzner; Tim Kaldewey; Scott A. Brandt; Richard A. Golding; Theodore M. Wong; Carlos Maltzahn

Guaranteed I/O performance is needed for a variety of applications ranging from real-time data collection to desktop multimedia to large-scale scientific simulations. Reservations on throughput, the standard measure of disk performance, fail to effectively manage disk performance due to the orders of magnitude difference between best-, average-, and worst-case response times, allowing reservation of less than 0.01% of the achievable bandwidth. We show that by reserving disk resources in terms of utilization it is possible to create a disk scheduler that supports reservation of nearly 100% of the disk resources, provides arbitrarily hard or soft guarantees depending upon application needs, and yields efficiency as good or better than best-effort disk schedulers tuned for performance. We present the architecture of our scheduler, prove the correctness of its algorithms, and provide results demonstrating its effectiveness.
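The core observation can be caricatured in a few lines: a throughput guarantee must be sized for worst-case service times, while a utilization-based reservation hands out fractions of disk *time* directly. The sketch below is an illustrative Python toy under assumed numbers; the names and interfaces are hypothetical, not Fahrrad's actual API:

```python
def guaranteed_fraction(best_case_ms, worst_case_ms):
    """Fraction of the best-case (achievable) bandwidth that a throughput
    reservation can safely promise when it must assume worst-case service
    time for every request."""
    return best_case_ms / worst_case_ms

class UtilizationReserver:
    """Admission control in disk time: streams reserve utilization
    fractions, and admission succeeds while the total stays <= 1.0."""
    def __init__(self):
        self.reserved = {}

    def admit(self, stream, fraction):
        if sum(self.reserved.values()) + fraction <= 1.0:
            self.reserved[stream] = fraction
            return True
        return False

# With an assumed 0.01 ms best case (cached sequential I/O) and a 100 ms
# worst case, a throughput guarantee can promise only 0.01% of the
# achievable bandwidth, while utilization reservations can hand out
# nearly the whole disk.
```

This mirrors the abstract's point: when best- and worst-case response times differ by orders of magnitude, reserving throughput strands almost all of the disk, whereas reserving utilization does not.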


Real-Time Systems Symposium | 2006

Diverse Soft Real-Time Processing in an Integrated System

Caixue Lin; Tim Kaldewey; Anna Povzner; Scott A. Brandt

The simple notion of soft real-time processing has fractured into a spectrum of diverse soft real-time types with a variety of different resource and time constraints. Schedulers have been developed for each of these types, but these are essentially point solutions in the space of soft real-time, and no detailed unified definition of soft real-time has previously been provided that includes all types of soft real-time processing. We present a complete real-time taxonomy covering the spectrum of processes from best-effort to hard real-time. The taxonomy divides processes into nine classes based on their resource and timeliness requirements and includes four soft real-time classes, each of which captures a group of soft real-time applications with similar characteristics. We exploit the different features of each of the soft real-time classes to integrate all of them into a single scheduler together with hard real-time and best-effort processes, and present results demonstrating their performance.


IEEE Spectrum | 2009

Data monster

A. Di Blas; Tim Kaldewey

Why graphics processors will transform database processing. The graphics coprocessor, invented in the 1970s to churn through voluminous and repetitive calculations and render smooth and realistic-looking images on computer screens, can now chew on large-scale databases. Database processing is a cornerstone of computing, and it is a market that last year generated approximately US $27 billion, according to technology analysis firm Forrester Research, in Cambridge, Mass. The firm projects that this number -- which includes new database licenses, technical support, and consulting -- will grow to US $32 billion by 2013. Every time you bid on an eBay auction, search for a movie on Netflix, look for a Kindle title on Amazon, or do a Google search, massive database applications spring into action, delving into huge quantities of data spread across tens of thousands of machines. This radical new task for graphics chips evolved from their role as the engine of computer games. So what does sifting enterprise-class databases have in common with rendering virtual monsters in a game? Both require handling huge amounts of data: realistic-looking virtual monsters require generating millions of pixels every second, while searching large databases involves accessing millions of records per second. So why not take the same hardware that accelerates virtual monsters and put it to work on real-world applications, like the databases that are a large part of our daily lives -- more so than pixel monsters?


ACM Transactions on Database Systems | 2011

Designing fast architecture-sensitive tree search on modern multicore/many-core processors

Changkyu Kim; Jatin Chhugani; Nadathur Satish; Eric Sedlar; Anthony D. Nguyen; Tim Kaldewey; Victor W. Lee; Scott A. Brandt; Pradeep Dubey


Petascale Data Storage Workshop | 2007

End-to-end performance management for scalable distributed storage

David O. Bigelow; Suresh Iyer; Tim Kaldewey; Roberto C. Pineiro; Anna Povzner; Scott A. Brandt; Richard A. Golding; Theodore M. Wong; Carlos Maltzahn


GPU Computing Gems Jade Edition | 2012

Large-Scale GPU Search

Tim Kaldewey; Andrea Di Blas

In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join, and aggregation. However, unlike other primitives, tree search presents significant challenges due to irregular and unpredictable data accesses in tree traversal. In this article, we present FAST, an extremely fast architecture-sensitive layout of the index tree. FAST is a binary tree logically organized to optimize for architecture features like page size, cache line size, and Single Instruction Multiple Data (SIMD) width of the underlying hardware. FAST eliminates the impact of memory latency, and exploits thread-level and data-level parallelism on both CPUs and GPUs to achieve 50 million (CPU) and 85 million (GPU) queries per second for large trees of 64M elements, with even better results on smaller trees. These are 5X (CPU) and 1.7X (GPU) faster than the best previously reported performance on the same architectures. We also evaluated FAST on the Intel® Many Integrated Core architecture (Intel ...


USENIX Conference on Hot Topics in Parallelism | 2009

Parallel search on video cards

Tim Kaldewey; Jeff Hagen; Andrea Di Blas; Eric Sedlar


Real-Time Technology and Applications Symposium | 2008

Virtualizing Disk Performance

Tim Kaldewey; Theodore M. Wong; Richard A. Golding; Anna Povzner; Scott A. Brandt; Carlos Maltzahn

Collaboration


Dive into Tim Kaldewey's collaborations.

Top Co-Authors

Anna Povzner

University of California