Richard A. Hankins
Intel
Publications
Featured research published by Richard A. Hankins.
international conference on management of data | 2008
Yuanyuan Tian; Richard A. Hankins; Jignesh M. Patel
Graphs are widely used to model real-world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying characteristics of large graphs, graph summarization techniques are critical. However, existing graph summarization methods are mostly statistical (studying statistics such as degree distributions, hop-plots, and clustering coefficients). These statistical methods are very useful, but the resolutions of the summaries are hard to control.

In this paper, we introduce two database-style operations to summarize graphs. Like the OLAP-style aggregation methods that allow users to drill down or roll up to control the resolution of summarization, our methods provide analogous functionality for large graph datasets. The first operation, called SNAP, produces a summary graph by grouping nodes based on user-selected node attributes and relationships. The second operation, called k-SNAP, further allows users to control the resolution of summaries and provides drill-down and roll-up abilities to navigate through summaries of different resolutions. We propose an efficient algorithm to evaluate the SNAP operation. In addition, we prove that the k-SNAP computation is NP-complete. We propose two heuristic methods to approximate the k-SNAP results. Through extensive experiments on a variety of real and synthetic datasets, we demonstrate the effectiveness and efficiency of the proposed methods.
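As a rough illustration of the first operation, the sketch below groups nodes by an attribute and then refines the groups until every member of a group has neighbors in the same set of groups, which is one minimal, bisimulation-style reading of grouping "based on user-selected node attributes and relationships". It is not the paper's algorithm, and the function name and input encoding are assumptions for illustration.

```python
# Minimal sketch of SNAP-style summarization: group nodes by attribute, then
# refine groups until all members of a group connect to the same set of
# neighbor groups. Illustrative only; not the algorithm from the paper.

from collections import defaultdict

def snap_summary(nodes, edges):
    """nodes: {node_id: attribute}; edges: set of undirected (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Initial grouping: one group per distinct attribute value.
    group_of = {n: attr for n, attr in nodes.items()}

    changed = True
    while changed:
        changed = False
        # Signature of a node: its own group plus the set of neighbor groups.
        sig = {n: (group_of[n], frozenset(group_of[m] for m in adj[n]))
               for n in nodes}
        # Nodes sharing a group but differing in neighbor groups get split.
        ids = {}
        for n in nodes:
            ids.setdefault(sig[n], len(ids))
        refined = {n: ids[sig[n]] for n in nodes}
        if len(set(refined.values())) != len(set(group_of.values())):
            group_of, changed = refined, True

    # Summary graph: one super-node per group, one super-edge per linked pair.
    super_edges = {tuple(sorted((group_of[u], group_of[v]))) for u, v in edges}
    return group_of, super_edges
```

For example, snap_summary({1: "prof", 2: "student", 3: "student"}, {(1, 2), (1, 3)}) collapses the two structurally identical students into a single super-node linked to the professor's group.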
international symposium on computer architecture | 2006
Richard A. Hankins; Gautham N. Chinya; Jamison D. Collins; Perry H. Wang; Ryan N. Rakvic; Hong Wang; John Paul Shen
Microprocessor design is undergoing a major paradigm shift towards multi-core designs, in anticipation that future performance gains will come from exploiting thread-level parallelism in the software. To support this trend, we present a novel processor architecture called the Multiple Instruction Stream Processing (MISP) architecture. MISP introduces the sequencer as a new category of architectural resource, and defines a canonical set of instructions to support user-level inter-sequencer signaling and asynchronous control transfer. MISP allows an application program to directly manage user-level threads without OS intervention. By supporting the classic cache-coherent shared-memory programming model, MISP does not require a radical shift in the multithreaded programming paradigm. This paper describes the design and evaluation of the MISP architecture for the IA-32 family of microprocessors. Using a research prototype MISP processor built on an IA-32-based multiprocessor system equipped with special firmware, we demonstrate the feasibility of implementing the MISP architecture. We then examine the utility of MISP by (1) assessing the key architectural tradeoffs of the MISP architecture design and (2) showing how legacy multithreaded applications can be migrated to MISP with relative ease.
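MISP itself is a hardware mechanism, but the programming model it enables, threads created, suspended, and resumed entirely in user space with no OS call per switch, can be loosely pictured in software. The Python sketch below is only that analogy, using generators as cooperative tasks; the names are invented and nothing here corresponds to actual MISP instructions.

```python
# Loose software analogy for user-level threading: a cooperative scheduler
# that runs, suspends, and resumes "threads" (generators) entirely in user
# space, with no OS thread per task. Hypothetical illustration only.

from collections import deque

def scheduler(tasks):
    """Round-robin over generator-based tasks until all of them finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run until the task yields (a cooperative switch)
            ready.append(task)  # still live: requeue it
        except StopIteration:
            pass                # task finished

def worker(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield                   # voluntarily hand back control, in user space

scheduler([worker("A", 2), worker("B", 3)])
```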
very large data bases | 2003
Richard A. Hankins; Jignesh M. Patel
The number of processor cache misses has a critical impact on the performance of DBMSs running on servers with large main-memory configurations. In turn, the cache utilization of database systems is highly dependent on the physical organization of the records in main memory. A recently proposed storage model, called PAX, was shown to greatly improve the performance of sequential file-scan operations when compared to the commonly implemented N-ary storage model. However, the PAX storage model can also demonstrate poor cache utilization for other common operations, such as index scans. Under a workload of heterogeneous database operations, neither the PAX storage model nor the N-ary storage model is optimal.

In this paper, we propose a flexible data storage technique called Data Morphing. Using Data Morphing, a cache-efficient attribute layout, called a partition, is first determined through an analysis of the query workload. This partition is then used as a template for storing data in a cache-efficient way. We present two algorithms for computing partitions, and also present a versatile storage model that accommodates the dynamic reorganization of the attributes in a file. Finally, we experimentally demonstrate that the Data Morphing technique provides a significant performance improvement over both the traditional N-ary storage model and the PAX model.
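To make the workload-analysis step concrete, here is a naive greedy sketch, not either of the paper's two algorithms: attributes that queries touch together accrue affinity, and the partitions with the strongest affinity are merged until a size cap is hit. The function name, input encoding, and cap are assumptions for illustration.

```python
# Illustrative sketch of workload-driven attribute partitioning in the spirit
# of Data Morphing: attributes that are frequently co-accessed end up in the
# same partition, so a query touches fewer cache lines. Naive greedy heuristic.

from collections import Counter
from itertools import combinations

def partition_attributes(queries, attributes, max_size=4):
    """queries: list of sets of attribute names accessed together."""
    co_access = Counter()
    for q in queries:
        for a, b in combinations(sorted(q), 2):
            co_access[(a, b)] += 1

    # Start with one singleton partition per attribute, then greedily merge
    # the pair of partitions with the highest co-access affinity.
    parts = [{a} for a in attributes]

    def affinity(p1, p2):
        return sum(co_access[tuple(sorted((a, b)))] for a in p1 for b in p2)

    while len(parts) > 1:
        candidates = [((i, j), affinity(parts[i], parts[j]))
                      for i in range(len(parts))
                      for j in range(i + 1, len(parts))
                      if len(parts[i]) + len(parts[j]) <= max_size]
        if not candidates:
            break
        (i, j), best = max(candidates, key=lambda c: c[1])
        if best == 0:
            break   # no remaining pair is ever co-accessed
        parts[i] |= parts.pop(j)
    return parts
```

For instance, partition_attributes([{"id", "name"}, {"id", "name"}, {"id", "salary"}], ["id", "name", "salary"], max_size=2) keeps the frequently co-accessed id and name together and leaves salary in its own partition.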
very large data bases | 2005
Yuanyuan Tian; Sandeep Tata; Richard A. Hankins; Jignesh M. Patel
Sequence datasets are ubiquitous in modern life-science applications, and querying sequences is a common and critical operation in many of these applications. The suffix tree is a versatile data structure that can be used to evaluate a wide variety of queries on sequence datasets, including evaluating exact and approximate string matches, and finding repeat patterns. However, methods for constructing suffix trees are often very time-consuming, especially for suffix trees that are large and do not fit in the available main memory. Even when the suffix tree fits in memory, it turns out that the processor cache behavior of theoretically optimal suffix tree construction methods is poor, which degrades their performance. Currently, there are a large number of algorithms for constructing suffix trees, but the practical tradeoffs in using these algorithms for different scenarios are not well characterized.

In this paper, we explore suffix tree construction algorithms over a wide spectrum of data sources and sizes. First, we show that on modern processors, a cache-efficient algorithm with O(n^2) worst-case complexity outperforms popular linear-time algorithms like Ukkonen's and McCreight's, even for in-memory construction. For larger datasets, the disk I/O requirement quickly becomes the bottleneck in each algorithm's performance. To address this problem, we describe two approaches. First, we present a buffer management strategy for the O(n^2) algorithm. The resulting new algorithm, which we call “Top Down Disk-based” (TDD), scales to sizes much larger than have been previously described in the literature. This approach far outperforms the best known disk-based construction methods. Second, we present a new disk-based suffix tree construction algorithm that is based on a sort-merge paradigm, and show that for constructing very large suffix trees with very limited resources, this algorithm is more efficient than TDD.
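For reference, a naive O(n^2) construction of the kind this line of work shows can be cache-friendly looks roughly like the sketch below: each suffix is inserted by walking down from the root and splitting an edge at the first mismatch. It is an in-memory toy under a unique-terminator assumption, not TDD or the sort-merge algorithm.

```python
# Naive O(n^2) suffix tree construction: insert each suffix from the root,
# splitting an edge where it first diverges. Edge labels are stored as
# (start, end) index pairs into the text. Sketch only; all in memory.

class Node:
    __slots__ = ("children",)   # children: first_char -> (start, end, Node)
    def __init__(self):
        self.children = {}

def build_suffix_tree(text):
    text += "$"                          # unique terminator
    root, n = Node(), len(text) + 0
    n = len(text)
    for i in range(n):                   # insert suffix text[i:]
        node, pos = root, i
        while True:
            c = text[pos]
            if c not in node.children:   # no edge: hang the suffix remainder here
                node.children[c] = (pos, n, Node())
                break
            start, end, child = node.children[c]
            k = 0
            while k < end - start and text[start + k] == text[pos + k]:
                k += 1
            if k == end - start:         # consumed the whole edge: descend
                node, pos = child, pos + k
            else:                        # mismatch inside the edge: split it
                mid = Node()
                mid.children[text[start + k]] = (start + k, end, child)
                mid.children[text[pos + k]] = (pos + k, n, Node())
                node.children[c] = (start, start + k, mid)
                break
    return root
```

The unique "$" terminator guarantees no suffix is a prefix of another, so every insertion terminates at a new leaf edge.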
very large data bases | 2004
Sandeep Tata; Richard A. Hankins; Jignesh M. Patel
Large string datasets are common in a number of emerging text and biological database applications. Common queries over such datasets include both exact and approximate string matches. These queries can be evaluated very efficiently by using a suffix tree index on the string dataset. Although suffix trees can be constructed quickly in memory for small input datasets, constructing persistent trees for large datasets has been challenging. In this paper, we explore suffix tree construction algorithms over a wide spectrum of data sources and sizes. First, we show that on modern processors, a cache-efficient algorithm with O(n^2) complexity outperforms the popular O(n) Ukkonen algorithm, even for in-memory construction. For larger datasets, the disk I/O requirement quickly becomes the bottleneck in each algorithm's performance. To address this problem, we present a buffer management strategy for the O(n^2) algorithm, creating a new disk-based construction algorithm that scales to sizes much larger than have been previously described in the literature. Our approach far outperforms the best known disk-based construction algorithms.
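One idea that makes disk-based construction tractable is partitioning: bucket the suffixes by a short prefix so each bucket's subtree can be built independently within a memory budget. The sketch below shows only that bucketing step; the paper's actual buffer management strategy is far more involved, and the function name is an assumption.

```python
# Sketch of prefix-partitioning for disk-based suffix tree construction:
# suffixes sharing a short prefix belong to the same subtree, so each
# bucket can be built independently and written out. Illustrative only.

from collections import defaultdict

def partition_suffixes(text, prefix_len=2):
    """Bucket suffix start positions by their first prefix_len characters."""
    text += "$"
    buckets = defaultdict(list)
    for i in range(len(text)):
        buckets[text[i:i + prefix_len]].append(i)
    return buckets

# Each bucket can then be handed to an in-memory builder (e.g. the naive
# O(n^2) one sketched earlier), one bucket at a time.
```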
international symposium on microarchitecture | 2003
Richard A. Hankins; Trung A. Diep; Murali Annavaram; Brian Hirano; Harald Eri; Hubert Nueckel; John Paul Shen
On-line transaction processing (OLTP) workloads are crucial benchmarks for the design and analysis of server processors. Typical cached configurations used by researchers to simulate OLTP workloads are orders of magnitude smaller than the fully scaled configurations used by OEM vendors to achieve world-record transaction processing throughput. The objective of this study is to discover the underlying relationships that characterize OLTP performance over a wide range of configurations. To this end, we have derived the iron law of database performance. Using our iron law, we show that both the average instructions executed per transaction (IPX) and the average cycles per instruction (CPI) are critical to the transaction-throughput performance. We use an extensive, empirical examination of an Oracle-based commercial OLTP workload on an Intel Xeon multiprocessor system to characterize the scaling behavior of both the IPX and the CPI. We demonstrate that across a wide range of configurations the IPX and CPI behavior follows predictable trends, which can be accurately characterized by simple linear or piecewise-linear approximations. Based on our data, we propose a method for selecting a minimal, representative workload configuration from which the behavior of much larger OLTP configurations can be accurately extrapolated.
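The abstract does not spell the law out, but by analogy with the classic iron law of processor performance, one plausible rendering (an assumption, not a quotation from the paper) is

\[
\text{throughput} \;=\; \frac{\text{transactions}}{\text{second}}
\;=\; \frac{f_{\text{clock}}\ \text{(cycles/second)}}{\mathrm{IPX}\ \text{(instructions/transaction)} \times \mathrm{CPI}\ \text{(cycles/instruction)}}
\]

so, for a fixed clock frequency, a configuration change improves throughput only insofar as it shrinks the IPX times CPI product per transaction, which is why the paper characterizes how each factor scales.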
international symposium on microarchitecture | 2004
Murali Annavaram; Ryan N. Rakvic; Marzia Polito; Jean-Yves Bouguet; Richard A. Hankins; Bob Davies
Recent studies have shown that most SPEC CPU2K benchmarks exhibit strong phase behavior, and that the Cycles per Instruction (CPI) performance metric can be accurately predicted from a program's control-flow behavior, by simply observing the sequencing of the program counters, or extended instruction pointers (EIPs). One motivation of this paper is to see if server workloads also exhibit such phase behavior. In particular, can EIPs effectively predict CPI in server workloads? We propose using regression trees to measure the theoretical upper bound on the accuracy of predicting CPI from EIPs, where accuracy is measured by the explained variance of CPI with EIPs. Our results show that for most server workloads and, surprisingly, even for CPU2K benchmarks, the accuracy of predicting CPI from EIPs varies widely. We classify the benchmarks into four quadrants based on their CPI variance and the predictability of CPI using EIPs. Our results indicate that no single sampling technique can be broadly applied to a large class of applications. We propose a new methodology that selects the best-suited sampling technique to accurately capture the program behavior.
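The measurement idea can be sketched in a few lines: fit a regression tree that predicts an interval's CPI from which code regions (EIPs) dominated that interval, then read the explained variance as an upper bound on predictability. The sketch below uses scikit-learn and synthetic data standing in for real hardware-counter traces; it is not the paper's experimental setup.

```python
# Hedged sketch: bound how well EIPs can explain CPI by fitting a regression
# tree and scoring its explained variance (R^2). Synthetic data only.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Each row: fraction of samples landing in each of 8 hot code regions (EIPs).
eip_histograms = rng.dirichlet(np.ones(8), size=500)
# Synthetic ground truth: CPI driven by the code mix, plus phase-independent noise.
region_cpi = rng.uniform(0.5, 4.0, size=8)
cpi = eip_histograms @ region_cpi + rng.normal(0, 0.1, size=500)

tree = DecisionTreeRegressor(max_depth=6).fit(eip_histograms, cpi)
print("explained variance:", r2_score(cpi, tree.predict(eip_histograms)))
```

Scoring on the same data the tree was fit on is deliberate here: it approximates an upper bound on predictability rather than generalization accuracy.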
measurement and modeling of computer systems | 2003
Richard A. Hankins; Jignesh M. Patel
In main-memory databases, the number of processor cache misses has a critical impact on the performance of the system. Cache-conscious indices are designed to improve performance by reducing the number of processor cache misses that are incurred during a search operation. Conventional wisdom suggests that the index's node size should be equal to the cache line size in order to minimize the number of cache misses and improve performance. As we show in this paper, this design choice ignores additional effects, such as the number of instructions executed and the number of TLB misses, which play a significant role in determining the overall performance. To capture the impact of node size on the performance of a cache-conscious B+-tree (CSB+-tree), we first develop an analytical model based on the fundamental components of the search process. This model is then validated with an actual implementation, demonstrating that the model is accurate. Both the analytical model and experiments confirm that using node sizes much larger than the cache line size can result in better search performance for the CSB+-tree.
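A toy cost model makes the intuition visible: larger nodes mean more cache lines per node but a shallower tree, hence fewer TLB misses and fewer per-node overheads. The sketch below is not the paper's model; every constant is made up for illustration.

```python
# Toy per-search cost model for a cache-conscious B+-tree: cache misses,
# comparison instructions, and TLB misses all scale with tree depth, so the
# best node size need not equal one cache line. All constants are invented.

import math

def search_cycles(node_bytes, n_keys=10_000_000, key_bytes=4, line_bytes=64,
                  miss_cycles=150, prefetch_cycles=20, cmp_cycles=4,
                  tlb_cycles=30):
    fanout = node_bytes // key_bytes                    # keys per node
    depth = math.ceil(math.log(n_keys, fanout))         # nodes per root-to-leaf path
    lines = node_bytes // line_bytes                    # cache lines per node
    touched = min(lines, math.ceil(math.log2(fanout)))  # lines a binary search hits
    cache = depth * (miss_cycles + (touched - 1) * prefetch_cycles)
    instr = depth * math.ceil(math.log2(fanout)) * cmp_cycles
    tlb = depth * tlb_cycles                            # ~one TLB miss per node
    return cache + instr + tlb

for size in (64, 128, 256, 512, 1024, 2048):
    print(f"{size:5d} B node -> ~{search_cycles(size)} cycles per search")
```

With these invented constants the minimum lands at several cache lines per node rather than one, echoing the paper's conclusion.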
conference on information and knowledge management | 2009
Yiming Ma; Richard A. Hankins; David Racz
Much research focuses on predicting a person's geo-spatial traversal patterns using a history of recorded geo-coordinates. In this paper, we focus on the problem of predicting location-state transitions. Location-states for a user refer to a set of anchoring points/regions in space, and the prediction task produces a sequence of predicted location-states for a given query time window. If this problem can be solved accurately and efficiently, it may lead to new location-based services (LBS) that can smartly recommend information to a user based on his current and future location-states. The proposed iLoc (Incremental Location-State Acquisition and Prediction) framework solves the prediction problem by utilizing the sensor information provided by a user's mobile device. It incrementally learns the location-states by constantly monitoring the signal environment of the mobile device. Further, the framework tightly integrates the learning and prediction modules, allowing iLoc to update location-states continuously and predict future location-states at the same time. Our extensive experiments show that the quality of the location-states learned by iLoc is better than that of the state-of-the-art. We also show that where other learners fail to produce reasonable predictions, iLoc provides good forecasts. As for efficiency, iLoc processes the data in a single pass, which fits well with many data-stream processing models.
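The two pieces the framework couples, single-pass state discovery and transition prediction, can be sketched together as below. The clustering rule, threshold, and class name are invented for illustration; this is not the iLoc algorithm.

```python
# Hedged sketch of incremental location-state learning from signal
# fingerprints, coupled with a first-order Markov predictor over the
# discovered states. Single pass over the observations; illustrative only.

import numpy as np
from collections import defaultdict

class IncrementalStates:
    def __init__(self, radius=1.0):
        self.radius = radius          # max distance to join an existing state
        self.centroids, self.counts = [], []
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.last = None

    def observe(self, fingerprint):
        """Consume one signal-environment vector; return its state id."""
        x = np.asarray(fingerprint, float)
        if self.centroids:
            dists = [np.linalg.norm(x - c) for c in self.centroids]
            s = int(np.argmin(dists))
            if dists[s] <= self.radius:   # matched: update the running centroid
                self.counts[s] += 1
                self.centroids[s] += (x - self.centroids[s]) / self.counts[s]
            else:                         # no match: open a new state
                s = len(self.centroids)
                self.centroids.append(x)
                self.counts.append(1)
        else:
            s = 0
            self.centroids.append(x)
            self.counts.append(1)
        if self.last is not None:         # learn the transition on the fly
            self.transitions[self.last][s] += 1
        self.last = s
        return s

    def predict_next(self, state):
        """Most frequently observed successor of `state`, if any."""
        succ = self.transitions[state]
        return max(succ, key=succ.get) if succ else None
```

Because learning and prediction share the same continuously updated structures, the model can forecast while it is still acquiring states, which mirrors the tight integration the abstract describes.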
Archive | 2005
Hong Wang; John Paul Shen; Ed Grochowski; James Paul Held; Bryant Bigbee; Shivnandan D. Kaushik; Gautham N. Chinya; Xiang Zou; Per Hammarlund; Xinmin Tian; Anil Aggarwal; Scott Dion Rodgers; Prashant Sethi; Baiju V. Patel; Richard A. Hankins