Calvin Lin
University of Texas at Austin
                                 Network
                            
                            Latest external collaboration on country level. Dive into details by clicking on the dots.
                                 Publication
                            
                            Featured researches published by Calvin Lin.
IEEE Computer | 2004
Doug Burger; Stephen W. Keckler; Kathryn S. McKinley; Michael Dahlin; Lizy Kurian John; Calvin Lin; Charles R. Moore; James H. Burrill; Robert McDonald; William Yoder
Microprocessor designs are on the verge of a post-RISC era in which companies must introduce new ISAs to address the challenges that modern CMOS technologies pose while also exploiting the massive levels of integration now possible. To meet these challenges, we have developed a new class of ISAs, called explicit data graph execution (EDGE), that will match the characteristics of semiconductor technology over the next decade. The TRIPS architecture is the first instantiation of an EDGE instruction set, a new, post-RISC class of instruction set architectures intended to match semiconductor technology evolution over the next decade, scaling to new levels of power efficiency and high performance.
high performance computer architecture | 2001
Daniel A. Jiménez; Calvin Lin
This paper presents a new method for branch prediction. The key idea is to use one of the simplest possible neural networks, the perceptron, as an alternative to the commonly used two-bit counters. Our predictor achieves increased accuracy by making use of long branch histories, which are possible becasue the hardware resources for our method scale linearly with the history length. By contrast, other purely dynamic schemes require exponential resources. We describe our design and evaluate it with respect to two well known predictors. We show that for a 4K byte hardware budget our method improves misprediction rates for the SPEC 2000 benchmarks by 10.1% over the gshare predictor. Our experiments also provide a better understanding of the situations in which traditional predictors do and do not perform well. Finally, we describe techniques that allow our complex predictor to operate in one cycle.
IEEE Transactions on Knowledge and Data Engineering | 1999
Jian Yin; Lorenzo Alvisi; Michael Dahlin; Calvin Lin
This article introduces volume leases as a mechanism for providing server-driven cache consistency for large-scale, geographically distributed networks. Volume leases retain the good performance, fault tolerance, and server scalability of the semantically weaker client-driven protocols that are now used on the Web. Volume leases are a variation of object leases, which were originally designed for distributed file systems. However, whereas traditional object leases amortize overheads over long lease periods, volume leases exploit spatial locality to amortize overheads across multiple objects in a volume. This approach allows systems to maintain good write performance even in the presence of failures. Using trace-driven simulation, we compare three volume lease algorithms against four existing cache consistency algorithms and show that our new algorithms provide strong consistency while maintaining scalability and fault-tolerance. For a trace-based workload of Web accesses, we find that volumes can reduce message traffic at servers by 40 percent compared to a standard lease algorithm, and that volumes can considerably reduce the peak load at servers when popular objects are modified.
international symposium on microarchitecture | 2000
Daniel A. Jiménez; Stephen W. Keckler; Calvin Lin
Modern microprocessors employ increasingly complicated branch predictors to achieve instruction fetch bandwidth that is sufficient for wide out-of-order execution cores. While existing predictors can still be accessed in a single clock cycle, recent studies show that slower wires and faster clock rates will require multi-cycle access times to large on-chip structures, such as branch prediction tables. Thus, future branch predictors must consider not only area and accuracy, but also delay. The paper explores these tradeoffs in designing branch predictors and shows that increased accuracy alone cannot overcome the penalties in delay that arise with larger predictor structures. We evaluate three schemes for accommodating delay: a caching approach, an overriding approach, and a cascading lookahead approach. While we use a common branch predictor, gshare, as the prediction component, these schemes can be constructed using most types of predictors.
programming language design and implementation | 2007
Ben Hardekopf; Calvin Lin
Pointer information is a prerequisite for most program analyses, and the quality of this information can greatly affect their precision and performance. Inclusion-based (i.e. Andersen-style) pointer analysis is an important point in the space of pointer analyses, offering a potential sweet-spot in the trade-off between precision and performance. However, current techniques for inclusion-based pointer analysis can have difficulties delivering on this potential. We introduce and evaluate two novel techniques for inclusion-based pointer analysis---one lazy, one eager1---that significantly improve upon the current state-of-the-art without impacting precision. These techniques focus on the problem of online cycle detection, a critical optimization for scaling such analyses. Using a suite of six open-source C programs, which range in size from 169K to 2.17M LOC, we compare our techniques against the three best inclusion-based analyses--described by Heintze and Tardieu [11], by Pearce et al. [21], and by Berndl et al. [4]. The combination of our two techniques results in an algorithm which is on average 3.2 xfaster than Heintze and Tardieus algorithm, 6.4 xfaster than Pearce et al.s algorithm, and 20.6 faster than Berndl et al.s algorithm. We also investigate the use of different data structures to represent points-to sets, examining the impact on both performance and memory consumption. We compare a sparse-bitmap implementation used in the GCC compiler with a BDD-based implementation, and we find that the BDD implementation is on average 2x slower than using sparse bitmaps but uses 5.5x less memory.
conference on domain specific languages | 1999
Samuel Z. Guyer; Calvin Lin
This paper introduces an annotation language and a compiler that together can customize a library implementation for specific application needs. Our approach is distinguished by its ability to exploit high level, domain-specific information in the customization process. In particular, the annotations provide semantic information that enables our compiler to analyze and optimize library operations as if they were primitives of a domain-specific language. Thus, our approach yields many of the performance benefits of domain-specific languages, without the effort of developing a new compiler for each domain. This paper presents the annotation language, describes its role in optimization, and illustrates the benefits of the overall approach. Using a partially implemented compiler, we show how our system can significantly improve the performance of two applications written using the PLAPACK parallel linear algebra library.
international conference on distributed computing systems | 1998
Jian Yin; Lorenzo Alvisi; Michael Dahlin; Calvin Lin
The paper introduces volume leases as a mechanism for providing cache consistency for large scale, geographically distributed networks. Volume leases are a variation of leases, which were originally designed for distributed file systems. Using trace driven simulation, we compare two new algorithms against four existing cache consistency algorithms and show that our new algorithms provide strong consistency while maintaining scalability and fault tolerance. For a trace based workload of Web accesses, we find that volumes can reduce message traffic at servers by 40% compared to a standard lease algorithm, and that volumes can considerably reduce the peak load at servers when popular objects are modified.
international symposium on microarchitecture | 2004
Ibrahim Hur; Calvin Lin
As memory performance becomes increasingly important to overall system performance, the need to carefully schedule memory operations also increases. This paper presents a new approach to memory scheduling that considers the history of recently scheduled operations. This history-based approach provides two conceptual advantages: (1) it allows the scheduler to better reason about the delays associated with its scheduling decisions, and (2) it allows the scheduler to select operations so that they match the programs mixture of Reads and Writes, thereby avoiding certain bottlenecks within the memory controller. We evaluate our solution using a cycle-accurate simulator for the recently announced IBM Power5. When compared with an in-order scheduler, our solution achieves IPC improvements of 10.9% on the NAS benchmarks and 63% on the data-intensive Stream benchmarks. Using microbenchmarks, we illustrate the growing importance of memory scheduling in the context of CMPs, hardware controlled prefetching, and faster CPU speeds.
high-performance computer architecture | 2008
Ibrahim Hur; Calvin Lin
This paper describes a comprehensive approach for using the memory controller to improve DRAM energy efficiency and manage DRAM power. We make three contributions: (1) we describe a simple power-down policy for exploiting low power modes of modern DRAMs; (2) we show how the idea of adaptive history-based memory schedulers can be naturally extended to manage power and energy; and (3) for situations in which additional DRAM power reduction is needed, we present a throttling approach that arbitrarily reduces DRAM activity by delaying the issuance of memory commands. Using detailed microarchitectural simulators of the IBM Power5+ and a DDR2-533 SDRAM, we show that our first two techniques combine to increase DRAM energy efficiency by an average of 18.2%, 21.7%, 46.1%, and 37.1% for the Stream, NAS, SPEC2006fp, and commercial benchmarks, respectively. We also show that our throttling approach provides performance that is within 4.4% of an idealized oracular approach.
computer and communications security | 2008
Walter Chang; Brandon Streiff; Calvin Lin
Current taint tracking systems suffer from high overhead and a lack of generality. In this paper, we solve both of these issues with an extensible system that is an order of magnitude more efficient than previous software taint tracking systems and is fully general to dynamic data flow tracking problems. Our system uses a compiler to transform untrusted programs into policy-enforcing programs, and our system can be easily reconfigured to support new analyses and policies without modifying the compiler or runtime system. Our system uses a sound and sophisticated static analysis that can dramatically reduce the amount of data that must be dynamically tracked. For server programs, our systems average overhead is 0.65% for taint tracking, which is comparable to the best hardware-based solutions. For a set of compute-bound benchmarks, our system produces no runtime overhead because our compiler can prove the absence of vulnerabilities, eliminating the need to dynamically track taint. After modifying these benchmarks to contain format string vulnerabilities, our systems overhead is less than 13%, which is over 6X lower than the previous best solutions. We demonstrate the flexibility and power of our system by applying it to file disclosure vulnerabilities, a problem that taint tracking cannot handle. To prevent such vulnerabilities, our system introduces an average runtime overhead of 0.25% for three open source server programs.
