Publication


Featured research published by Markus Püschel.


ACM Transactions on Algorithms | 2007

Multiplierless multiple constant multiplication

Yevgen Voronenko; Markus Püschel

A variable can be multiplied by a given set of fixed-point constants using a multiplier block that consists exclusively of additions, subtractions, and shifts. The generation of a multiplier block from the set of constants is known as the multiple constant multiplication (MCM) problem. Finding the optimal solution, namely, the one with the fewest additions and subtractions, is known to be NP-complete. We propose a new algorithm for the MCM problem, which produces solutions that require up to 20% fewer additions and subtractions than the best previously known algorithm. At the same time, our algorithm, in contrast to the closest competing algorithm, is not limited by the constant bitwidths. We present our algorithm using a unifying formal framework for the best graph-based MCM algorithms and provide a detailed runtime analysis and experimental evaluation. We show that our algorithm can handle problem sizes as large as 100 32-bit constants in a time acceptable for most applications. The implementation of the new algorithm is available at www.spiral.net.
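To make the idea of a multiplier block concrete, here is a minimal hand-written sketch in Python. It is not output of the algorithm described in the abstract; the constant set {5, 7, 45} and the function name `multiplier_block` are assumptions chosen only for illustration. It shows how an intermediate result (5x) can be shared so that three constants cost three additions/subtractions in total.

```python
def multiplier_block(x: int) -> dict:
    """Multiply x by the constants 5, 7 and 45 using only shifts, adds and subtracts.

    Hypothetical, hand-derived block for illustration; an MCM algorithm
    would search for such a decomposition automatically.
    """
    t5 = (x << 2) + x      # 5x  = 4x + x          (1 addition)
    t7 = (x << 3) - x      # 7x  = 8x - x          (1 subtraction)
    t45 = (t5 << 3) + t5   # 45x = 8*(5x) + 5x     (1 addition, reuses 5x)
    return {5: t5, 7: t7, 45: t45}


if __name__ == "__main__":
    print(multiplier_block(3))
```

Computing 45x directly from x as a sum of shifted copies (45 = 32 + 8 + 4 + 1) would take three additions on its own; reusing the 5x intermediate brings that down to one extra addition.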


IEEE International Conference on High Performance Computing, Data, and Analytics | 2004

Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms

Markus Püschel; José M. F. Moura; Bryan Singer; Jianxin Xiong; Jeremy R. Johnson; David A. Padua; Manuela M. Veloso; Robert W. Johnson

SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be re-optimized as the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical framework that concisely describes signal transforms and their fast algorithms; the formula generator that captures at the algorithmic level the degrees of freedom in expressing a particular signal processing transform; the formula translator that encapsulates the compilation degrees of freedom when translating a specific algorithm into an actual code implementation; and, finally, an intelligent search engine that finds within the large space of alternative formulas and implementations the “best” match to the given computing platform. We present empirical data that demonstrate the high performance of SPIRAL-generated code.


IEEE Transactions on Signal Processing | 2013

D-ADMM: A Communication-Efficient Distributed Algorithm for Separable Optimization

João F. C. Mota; João M. F. Xavier; Pedro M. Q. Aguiar; Markus Püschel

We propose a distributed algorithm, named Distributed Alternating Direction Method of Multipliers (D-ADMM), for solving separable optimization problems in networks of interconnected nodes or agents. In a separable optimization problem there is a private cost function and a private constraint set at each node. The goal is to minimize the sum of all the cost functions, constraining the solution to be in the intersection of all the constraint sets. D-ADMM is proven to converge when the network is bipartite or when all the functions are strongly convex, although in practice, convergence is observed even when these conditions are not met. We use D-ADMM to solve the following problems from signal processing and control: average consensus, compressed sensing, and support vector machines. Our simulations show that D-ADMM requires fewer communications than state-of-the-art algorithms to achieve a given accuracy level. Algorithms with low communication requirements are important, for example, in sensor networks, where sensors are typically battery-operated and communicating is the most energy-consuming operation.
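In symbols, the separable problem described in this abstract can be written as below. The notation is an assumption introduced here for illustration and does not appear in the listing: P is the number of nodes, f_p is the private cost function of node p, and X_p is its private constraint set.

```latex
\begin{aligned}
\underset{x}{\text{minimize}} \quad & f_1(x) + f_2(x) + \cdots + f_P(x) \\
\text{subject to} \quad & x \in X_1 \cap X_2 \cap \cdots \cap X_P
\end{aligned}
```

Node p knows only f_p and X_p, so the nodes have to exchange information over the network to agree on a common minimizer x.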


IEEE Transactions on Signal Processing | 2012

Distributed Basis Pursuit

João F. C. Mota; João M. F. Xavier; Pedro M. Q. Aguiar; Markus Püschel

We propose a distributed algorithm for solving the optimization problem Basis Pursuit (BP). BP finds the least ℓ1-norm solution of the underdetermined linear system Ax = b and is used, for example, in compressed sensing for reconstruction. Our algorithm solves BP on a distributed platform such as a sensor network, and is designed to minimize the communication between nodes. The algorithm only requires the network to be connected, has no notion of a central processing node, and no node has access to the entire matrix A at any time. We consider two scenarios in which either the columns or the rows of A are distributed among the compute nodes. Our algorithm, named D-ADMM, is a decentralized implementation of the alternating direction method of multipliers. We show through numerical simulation that our algorithm requires considerably fewer communications between the nodes than the state-of-the-art algorithms.
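As a small illustration of what "least ℓ1-norm solution" means, the Python sketch below handles only the special case where A has a single row, so Ax = b is one equation. This is not the distributed algorithm of the paper; the function names and the example numbers are assumptions made for illustration. For one equation, the least ℓ1-norm solution puts all the weight on the coefficient of largest magnitude (a consequence of Hölder's inequality), which gives a sparse answer, whereas the least ℓ2-norm solution spreads the weight over all coordinates.

```python
def min_l1_single_equation(a, b):
    """Least l1-norm x with sum(a[i] * x[i]) == b (a must have a nonzero entry)."""
    k = max(range(len(a)), key=lambda i: abs(a[i]))  # index of largest |a[i]|
    x = [0.0] * len(a)
    x[k] = b / a[k]  # all weight on one coordinate: a sparse solution
    return x


def min_l2_single_equation(a, b):
    """Least l2-norm x with sum(a[i] * x[i]) == b, i.e. x = a * b / ||a||^2."""
    scale = b / sum(ai * ai for ai in a)
    return [ai * scale for ai in a]


def l1_norm(x):
    return sum(abs(xi) for xi in x)


if __name__ == "__main__":
    a, b = [1.0, 2.0], 4.0  # the single equation x1 + 2*x2 = 4
    for name, x in (("l1", min_l1_single_equation(a, b)),
                    ("l2", min_l2_single_equation(a, b))):
        print(name, x, "l1 norm =", l1_norm(x))
```

Both vectors satisfy the equation, but the ℓ1 solution has a single nonzero entry and the smaller ℓ1 norm. This preference for sparse solutions is why BP is used for reconstruction in compressed sensing.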


SIAM Journal on Computing | 2003

The Algebraic Approach to the Discrete Cosine and Sine Transforms and Their Fast Algorithms

Markus Püschel; José M. F. Moura

It is known that the discrete Fourier transform (DFT) used in digital signal processing can be characterized in the framework of the representation theory of algebras, namely, as the decomposition matrix for the regular module $\mathbb{C}[Z_n] = \mathbb{C}[x]/(x^n - 1)$. This characterization provides deep insight into the DFT and can be used to derive and understand the structure of its fast algorithms. In this paper we present an algebraic characterization of the important class of discrete cosine and sine transforms as decomposition matrices of certain regular modules associated with four series of Chebyshev polynomials. Then we derive most of their known algorithms by pure algebraic means. We identify the mathematical principle behind each algorithm and give insight into its structure. Our results show that the connection between algebra and digital signal processing is stronger than previously understood.


IEEE Signal Processing Magazine | 2009

Discrete Fourier transform on multicore

Franz Franchetti; Markus Püschel; Yevgen Voronenko; Srinivas Chellappa; José M. F. Moura

This article gives an overview of the techniques needed to implement the discrete Fourier transform (DFT) efficiently on current multicore systems. The focus is on Intel-compatible multicores, but we also discuss the IBM Cell and, briefly, graphics processing units (GPUs). The performance optimization is broken down into three key challenges: parallelization, vectorization, and memory hierarchy optimization. In each case, we use the Kronecker product formalism to formally derive the necessary algorithmic transformations based on a few hardware parameters. Further code-level optimizations are discussed. The rigorous nature of this framework enables the complete automation of the implementation task as shown by the program generator Spiral. Finally, we show and analyze DFT benchmarks of the fastest libraries available for the considered platforms.


Conference on High Performance Computing (Supercomputing) | 2006

FFT program generation for shared memory: SMP and multicore

Franz Franchetti; Yevgen Voronenko; Markus Püschel

The chip makers' response to the approaching end of CPU frequency scaling is multicore systems, which offer the same programming paradigm as traditional shared memory platforms but have different performance characteristics. This situation considerably increases the burden on library developers and strengthens the case for automatic performance tuning frameworks like Spiral, a program generator and optimizer for linear transforms such as the discrete Fourier transform (DFT). We present a shared memory extension of Spiral. The extension within Spiral consists of a rewriting system that manipulates the structure of transform algorithms to achieve load balancing and avoid false sharing, and of a backend to generate multithreaded code. Application to the DFT produces a novel class of algorithms suitable for multicore systems as validated by experimental results: we demonstrate a parallelization speed-up already for sizes that fit into L1 cache and compare favorably to other DFT libraries across all small and midsize DFTs and considered platforms.


IEEE Transactions on Signal Processing | 2008

Algebraic Signal Processing Theory: Cooley–Tukey Type Algorithms for DCTs and DSTs

Yevgen Voronenko; Markus Püschel

In this paper, we systematically derive a large class of fast general-radix algorithms for various types of real discrete Fourier transforms (real DFTs) including the discrete Hartley transform (DHT) based on the algebraic signal processing theory. This means that instead of manipulating the transform definition, we derive algorithms by manipulating the polynomial algebras underlying the transforms using one general method. The same method yields the well-known Cooley-Tukey fast Fourier transform (FFT) as well as general radix discrete cosine and sine transform algorithms. The algebraic approach makes the derivation concise, unifies and classifies many existing algorithms, yields new variants, enables structural optimization, and naturally produces a human-readable structural algorithm representation based on the Kronecker product formalism. We show, for the first time, that the general-radix Cooley-Tukey and the lesser known Bruun algorithms are instances of the same generic algorithm. Further, we show that this generic algorithm can be instantiated for all four types of the real DFT and the DHT.


IEEE Transactions on Signal Processing | 2008

Algebraic Signal Processing Theory: Foundation and 1-D Time

Markus Püschel; José M. F. Moura


ACM Transactions on Design Automation of Electronic Systems | 2012

Computer Generation of Hardware for Linear Digital Signal Processing Transforms

Peter A. Milder; Franz Franchetti; James C. Hoe; Markus Püschel

Collaboration


Explore Markus Püschel's collaborations.

Top Co-Authors

Franz Franchetti | Carnegie Mellon University
James C. Hoe | Carnegie Mellon University
José M. F. Moura | Carnegie Mellon University
Yevgen Voronenko | Carnegie Mellon University
Jelena Kovacevic | Carnegie Mellon University
João M. F. Xavier | Instituto Superior Técnico