Kurt Keutzer
University of California, Berkeley
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Kurt Keutzer.
design automation conference | 1992
Abhijit Ghosh; Srinivas Devadas; Kurt Keutzer; Jacob White
The authors address the problem of estimating the average power dissipated in VLSI combinational and sequential circuits, under random input sequences. Switching activity is strongly affected by gate delays and for this reason a general delay model is used in estimating switching activity. The method takes into account correlation caused at internal gates in the circuit due to reconvergence of input signals. In sequential circuits, the input sequence applied to the combinational portion of the circuit is highly correlated because some of the inputs to the combinational logic are flip-flop outputs representing the state of the circuit. Methods are presented to probabilistically estimate switching activity in sequential circuits. These methods automatically compute the switching rates and correlations between flip-flop outputs.<<ETX>>
design automation conference | 1987
Kurt Keutzer
Technology binding is the process of mapping a technology independent description of a circuit into a particular technology. This paper outlines a formalism of this problem and offers a solution to the problem in terms of matching patterns, describing technology specific cells and optimizations, against a technology independent circuit represented as a directed acyclic graph. This solution is implemented in DAGON. DAGON rests on a firm algorithmic foundation, and is able to guarantee locally optimal matches against a set of over three thousand patterns. DAGON is an integral part of a synthesis system that has been found to provide industrial quality solutions to real circuit design problems.
parallel computing | 2009
Krste Asanovic; Rastislav Bodik; James Demmel; Tony M. Keaveny; Kurt Keutzer; John Kubiatowicz; Nelson Morgan; David A. Patterson; Koushik Sen; John Wawrzynek; David Wessel; Katherine A. Yelick
Writing programs that scale with increasing numbers of cores should be as easy as writing programs for sequential computers.
international conference on machine learning | 2008
Bryan Catanzaro; Narayanan Sundaram; Kurt Keutzer
Recent developments in programmable, highly parallel Graphics Processing Units (GPUs) have enabled high performance implementations of machine learning algorithms. We describe a solver for Support Vector Machine training running on a GPU, using the Sequential Minimal Optimization algorithm and an adaptive first and second order working set selection heuristic, which achieves speedups of 9-35x over LIBSVM running on a traditional processor. We also present a GPU-based system for SVM classification which achieves speedups of 81-138x over LIBSVM (5-24x over our own CPU based SVM classifier).
international conference on computer aided design | 1998
Dennis Sylvester; Kurt Keutzer
We take a fresh look at the problems posed by deep submicron (DSM) geometries and re-open the investigation into how DSM effects are most likely to affect future design methodologies. We describe a comprehensive approach to accurately characterize the device and interconnect characteristics of present and future process generations. This approach results in the generation of a representative strawman technology that is used in conjunction with analytical model simulation tools and empirical design data to obtain a realistic picture of the future of circuit design. We then proceed to quantify the precise impact of interconnect, including delay degradation due to noise, on high performance ASIC designs. Having determined the role of interconnect in performance, we then reconsider the impact of future processes on ASIC design methodology.
design automation conference | 2002
Michael Orshansky; Kurt Keutzer
The traditional approach to worst-case static-timing analysis is becoming unacceptably conservative due to an ever-increasing number of circuit and process effects. We propose a fundamentally different framework that aims to significantly improve the accuracy of timing predictions through fully probabilistic analysis of gate and path delays. We describe a bottom-up approach for the construction of joint probability density function of path delays, and present novel analytical and algorithmic methods for finding the full distribution of the maximum of a random path delay space with arbitrary path correlations.
european conference on computer vision | 2010
Narayanan Sundaram; Thomas Brox; Kurt Keutzer
Dense and accurate motion tracking is an important requirement for many video feature extraction algorithms. In this paper we provide a method for computing point trajectories based on a fast parallel implementation of a recent optical flow algorithm that tolerates fast motion. The parallel implementation of large displacement optical flow runs about 78× faster than the serial C++ version. This makes it practical to use in a variety of applications, among them point tracking. In the course of obtaining the fast implementation, we also proved that the fixed point matrix obtained in the optical flow technique is positive semi-definite. We compare the point tracking to the most commonly used motion tracker - the KLT tracker - on a number of sequences with ground truth motion. Our resulting technique tracks up to three orders of magnitude more points and is 46% more accurate than the KLT tracker. It also provides a tracking density of 48% and has an occlusion error of 3% compared to a density of 0.1% and occlusion error of 8% for the KLT tracker. Compared to the Particle Video tracker, we achieve 66% better accuracy while retaining the ability to handle large displacements while running an order of magnitude faster.
international conference on computer aided design | 1992
Amelia Shen; Abhijit Ghosh; Srinivas Devadas; Kurt Keutzer
The implications of the observation that the probability of the occurrence of a transition on a wire of a circuit affects both the average power dissipation and the random pattern testability of a circuit are investigated. It is shown that restructuring a logic circuit can significantly affect its average power dissipation. Various methods for the synthesis of combinational logic networks are presented and the effect of different algorithms on the power dissipation of the circuit is demonstrated. The dual problem of improving the random pattern testability of logic circuits is emphasized. It is shown that modifying the signal probabilities can significantly affect the random pattern testability of a circuit.<<ETX>>
IEEE Design & Test of Computers | 2001
Serdar Tasiran; Kurt Keutzer
Software simulation remains the primary means of functional validation for hardware designs. Coverage metrics ensure optimal use of simulation resources, measure the completeness of validation, and direct simulations toward unexplored areas of the design. This article surveys the literature, and discusses the experiences of verification practitioners, regarding coverage metrics.
acm sigplan symposium on principles and practice of parallel programming | 2011
Bryan Catanzaro; Michael Garland; Kurt Keutzer
Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level languages is often an unnecessarily difficult task. The characteristics of parallel microprocessors and the limitations of current programming methodologies motivate our design of Copperhead, a high-level data parallel language embedded in Python. The Copperhead programmer describes parallel computations via composition of familiar data parallel primitives supporting both flat and nested data parallel computation on arrays of data. Copperhead programs are expressed in a subset of the widely used Python programming language and interoperate with standard Python modules, including libraries for numeric computation, data visualization, and analysis. In this paper, we discuss the language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code. We define the restricted subset of Python which Copperhead supports and introduce the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations. We also outline the runtime support by which Copperhead programs interoperate with standard Python modules. We demonstrate the effectiveness of our techniques with several examples targeting the CUDA platform for parallel programming on GPUs. Copperhead code is concise, on average requiring 3.6 times fewer lines of code than CUDA, and the compiler generates efficient code, yielding 45-100% of the performance of hand-crafted, well optimized CUDA code.