Is this you? Create Your Porfile

Kurt Keutzer

University of California, Berkeley

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Kurt Keutzer is active.

Explore More

Publication

Featured researches published by Kurt Keutzer.

design automation conference | 1992

Estimation of average switching activity in combinational and sequential circuits

Abhijit Ghosh; Srinivas Devadas; Kurt Keutzer; Jacob White

The authors address the problem of estimating the average power dissipated in VLSI combinational and sequential circuits, under random input sequences. Switching activity is strongly affected by gate delays and for this reason a general delay model is used in estimating switching activity. The method takes into account correlation caused at internal gates in the circuit due to reconvergence of input signals. In sequential circuits, the input sequence applied to the combinational portion of the circuit is highly correlated because some of the inputs to the combinational logic are flip-flop outputs representing the state of the circuit. Methods are presented to probabilistically estimate switching activity in sequential circuits. These methods automatically compute the switching rates and correlations between flip-flop outputs.<<ETX>>

design automation conference | 1987

DAGON: Technology Binding and Local Optimization by DAG Matching

Kurt Keutzer

Technology binding is the process of mapping a technology independent description of a circuit into a particular technology. This paper outlines a formalism of this problem and offers a solution to the problem in terms of matching patterns, describing technology specific cells and optimizations, against a technology independent circuit represented as a directed acyclic graph. This solution is implemented in DAGON. DAGON rests on a firm algorithmic foundation, and is able to guarantee locally optimal matches against a set of over three thousand patterns. DAGON is an integral part of a synthesis system that has been found to provide industrial quality solutions to real circuit design problems.

parallel computing | 2009

A view of the parallel computing landscape

Krste Asanovic; Rastislav Bodik; James Demmel; Tony M. Keaveny; Kurt Keutzer; John Kubiatowicz; Nelson Morgan; David A. Patterson; Koushik Sen; John Wawrzynek; David Wessel; Katherine A. Yelick

Writing programs that scale with increasing numbers of cores should be as easy as writing programs for sequential computers.

international conference on machine learning | 2008

Fast support vector machine training and classification on graphics processors

Bryan Catanzaro; Narayanan Sundaram; Kurt Keutzer

Recent developments in programmable, highly parallel Graphics Processing Units (GPUs) have enabled high performance implementations of machine learning algorithms. We describe a solver for Support Vector Machine training running on a GPU, using the Sequential Minimal Optimization algorithm and an adaptive first and second order working set selection heuristic, which achieves speedups of 9-35x over LIBSVM running on a traditional processor. We also present a GPU-based system for SVM classification which achieves speedups of 81-138x over LIBSVM (5-24x over our own CPU based SVM classifier).

international conference on computer aided design | 1998

Getting to the bottom of deep submicron

Dennis Sylvester; Kurt Keutzer

We take a fresh look at the problems posed by deep submicron (DSM) geometries and re-open the investigation into how DSM effects are most likely to affect future design methodologies. We describe a comprehensive approach to accurately characterize the device and interconnect characteristics of present and future process generations. This approach results in the generation of a representative strawman technology that is used in conjunction with analytical model simulation tools and empirical design data to obtain a realistic picture of the future of circuit design. We then proceed to quantify the precise impact of interconnect, including delay degradation due to noise, on high performance ASIC designs. Having determined the role of interconnect in performance, we then reconsider the impact of future processes on ASIC design methodology.

design automation conference | 2002

A general probabilistic framework for worst case timing analysis

Michael Orshansky; Kurt Keutzer

The traditional approach to worst-case static-timing analysis is becoming unacceptably conservative due to an ever-increasing number of circuit and process effects. We propose a fundamentally different framework that aims to significantly improve the accuracy of timing predictions through fully probabilistic analysis of gate and path delays. We describe a bottom-up approach for the construction of joint probability density function of path delays, and present novel analytical and algorithmic methods for finding the full distribution of the maximum of a random path delay space with arbitrary path correlations.

european conference on computer vision | 2010

Dense point trajectories by GPU-accelerated large displacement optical flow

Narayanan Sundaram; Thomas Brox; Kurt Keutzer

Dense and accurate motion tracking is an important requirement for many video feature extraction algorithms. In this paper we provide a method for computing point trajectories based on a fast parallel implementation of a recent optical flow algorithm that tolerates fast motion. The parallel implementation of large displacement optical flow runs about 78× faster than the serial C++ version. This makes it practical to use in a variety of applications, among them point tracking. In the course of obtaining the fast implementation, we also proved that the fixed point matrix obtained in the optical flow technique is positive semi-definite. We compare the point tracking to the most commonly used motion tracker - the KLT tracker - on a number of sequences with ground truth motion. Our resulting technique tracks up to three orders of magnitude more points and is 46% more accurate than the KLT tracker. It also provides a tracking density of 48% and has an occlusion error of 3% compared to a density of 0.1% and occlusion error of 8% for the KLT tracker. Compared to the Particle Video tracker, we achieve 66% better accuracy while retaining the ability to handle large displacements while running an order of magnitude faster.

international conference on computer aided design | 1992

On average power dissipation and random pattern testability of CMOS combinational logic networks

Amelia Shen; Abhijit Ghosh; Srinivas Devadas; Kurt Keutzer

The implications of the observation that the probability of the occurrence of a transition on a wire of a circuit affects both the average power dissipation and the random pattern testability of a circuit are investigated. It is shown that restructuring a logic circuit can significantly affect its average power dissipation. Various methods for the synthesis of combinational logic networks are presented and the effect of different algorithms on the power dissipation of the circuit is demonstrated. The dual problem of improving the random pattern testability of logic circuits is emphasized. It is shown that modifying the signal probabilities can significantly affect the random pattern testability of a circuit.<<ETX>>

IEEE Design & Test of Computers | 2001

Coverage metrics for functional validation of hardware designs

Serdar Tasiran; Kurt Keutzer

Software simulation remains the primary means of functional validation for hardware designs. Coverage metrics ensure optimal use of simulation resources, measure the completeness of validation, and direct simulations toward unexplored areas of the design. This article surveys the literature, and discusses the experiences of verification practitioners, regarding coverage metrics.

acm sigplan symposium on principles and practice of parallel programming | 2011

Copperhead: compiling an embedded data parallel language

Bryan Catanzaro; Michael Garland; Kurt Keutzer

Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level languages is often an unnecessarily difficult task. The characteristics of parallel microprocessors and the limitations of current programming methodologies motivate our design of Copperhead, a high-level data parallel language embedded in Python. The Copperhead programmer describes parallel computations via composition of familiar data parallel primitives supporting both flat and nested data parallel computation on arrays of data. Copperhead programs are expressed in a subset of the widely used Python programming language and interoperate with standard Python modules, including libraries for numeric computation, data visualization, and analysis. In this paper, we discuss the language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code. We define the restricted subset of Python which Copperhead supports and introduce the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations. We also outline the runtime support by which Copperhead programs interoperate with standard Python modules. We demonstrate the effectiveness of our techniques with several examples targeting the CUDA platform for parallel programming on GPUs. Copperhead code is concise, on average requiring 3.6 times fewer lines of code than CUDA, and the compiler generates efficient code, yielding 45-100% of the performance of hand-crafted, well optimized CUDA code.

Explore More