Peter A. Milder
Stony Brook University
Publications
Featured research published by Peter A. Milder.
ACM Transactions on Design Automation of Electronic Systems | 2012
Peter A. Milder; Franz Franchetti; James C. Hoe; Markus Püschel
Linear signal transforms such as the discrete Fourier transform (DFT) are very widely used in digital signal processing and other domains. Due to high performance or efficiency requirements, these transforms are often implemented in hardware. This implementation is challenging due to the large number of algorithmic options (e.g., fast Fourier transform algorithms or FFTs), the variety of ways that a fixed algorithm can be mapped to a sequential datapath, and the design of the components of this datapath. The best choices depend heavily on the resource budget and the performance goals of the target application. Thus, it is difficult for a designer to determine which set of options will best meet a given set of requirements. In this article we introduce the Spiral hardware generation framework and system for linear transforms. The system takes a problem specification as input as well as directives that define characteristics of the desired datapath. Using a mathematical language to represent and explore transform algorithms and datapath characteristics, the system automatically generates an algorithm, maps it to a datapath, and outputs a synthesizable register transfer level Verilog description suitable for FPGA or ASIC implementation. The quality of the generated designs rivals the best available handwritten IP cores.
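As a software illustration of one algorithmic option mentioned above, the following minimal sketch compares a DFT computed directly from its definition with a recursive radix-2 FFT. It is a generic textbook Cooley-Tukey formulation, not the Spiral generator's own representation or output; the function names `dft_direct` and `fft_radix2` and the test signal are assumptions made for this example.

```python
import cmath

def dft_direct(x):
    """O(n^2) DFT evaluated straight from the definition."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])   # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * n
    for j in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * j / n) * odd[j]  # twiddle factor times odd part
        out[j] = even[j] + t
        out[j + n // 2] = even[j] - t
    return out

# Hypothetical 8-point test signal; both routines should agree up to rounding.
signal = [complex(v) for v in (1, 2, 3, 4, 0, -1, -2, -3)]
max_err = max(abs(a - b) for a, b in zip(dft_direct(signal), fft_radix2(signal)))
print(max_err < 1e-9)
```

Both routines compute the same transform; they differ only in operation count and structure, which is the kind of choice a hardware generator has to make when trading resources against performance.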
Optics Express | 2009
Yannis Benlachtar; Philip M. Watts; Rachid Bouziane; Peter A. Milder; Deepak Rangaraj; Anthony Cartolano; Robert Koutsoyannis; James C. Hoe; Markus Püschel; Madeleine Glick; Robert I. Killey
We demonstrate a field programmable gate array (FPGA) based optical orthogonal frequency division multiplexing (OFDM) transmitter implementing real-time digital signal processing at a sample rate of 21.4 GS/s. The QPSK-OFDM signal is generated using an 8-bit, 128-point inverse fast Fourier transform (IFFT) core that performs one transform per clock cycle at a clock speed of 167.2 MHz, and the transmitter can be deployed with either a direct-detection or a coherent receiver. The hardware design and the main digital signal processing functions are described, and we show that the main performance limitation is due to the low (4-bit) resolution of the digital-to-analog converter (DAC) and the 8-bit resolution of the IFFT core used. We analyze the back-to-back performance of the transmitter generating an 8.36 Gb/s optical single sideband (SSB) OFDM signal using digital up-conversion, suitable for direct detection. Additionally, we use the device to transmit 8.36 Gb/s SSB OFDM signals over 200 km of uncompensated standard single mode fiber, achieving an overall BER < 10^-3.
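The quoted sample rate follows from the IFFT size and clock speed given in the abstract: one 128-point transform per cycle at 167.2 MHz yields 128 output samples per cycle. A short check of that arithmetic, with variable names chosen only for this example:

```python
ifft_points = 128        # samples produced per transform (from the abstract)
clock_hz = 167.2e6       # one transform per clock cycle (from the abstract)

sample_rate = ifft_points * clock_hz
print(f"{sample_rate / 1e9:.2f} GS/s")  # prints "21.40 GS/s"
```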
Design Automation Conference | 2005
Grace Nordin; Peter A. Milder; James C. Hoe; Markus Püschel
This paper presents a parameterized soft core generator for the discrete Fourier transform (DFT). Reusable IPs of digital signal processing (DSP) kernels are important time-saving resources in DSP hardware development. Unfortunately, reusable IPs, however optimized, could introduce inefficiencies because they cannot fit the exact requirements of every application context. Given the well-understood and regular computation in DSP kernels, an automatic tool could generate high-quality ready-to-use IPs customized to user-specified cost/performance tradeoffs (beyond basic parameters such as input size and data format). The paper shows that the generated DFT cores could match closely the performance and cost of DFT cores from the Xilinx LogiCore library. Furthermore, the generator could yield DFT cores over a range of different performance/cost tradeoff points that are not available from the library.
Journal of the ACM | 2009
Markus Püschel; Peter A. Milder; James C. Hoe
This article presents a method for constructing hardware structures that perform a fixed permutation on streaming data. The method applies to permutations that can be represented as linear mappings on the bit-level representation of the data locations. This subclass includes many important permutations such as stride permutations (corner turn, perfect shuffle, etc.), the bit reversal, the Hadamard reordering, and the Gray code reordering. The datapath for performing the streaming permutation consists of several independent banks of memory and two interconnection networks. These structures are built for a given streaming width (i.e., number of inputs and outputs per cycle) and operate at full throughput for this streaming width. We provide an algorithm that completely specifies the datapath and control logic given the desired permutation and streaming width. Further, we provide lower bounds on the achievable cost of a solution and show that for an important subclass of permutations our solution is optimal. We apply our algorithm to derive datapaths for several important permutations, including a detailed example that carefully illustrates each aspect of the design process. Lastly, we compare our permutation structures to those of Järvinen et al. [2004], which are specialized for stride permutations.
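To make the class of permutations concrete, here is a minimal software sketch of a permutation defined as a linear mapping on address bits: each output address bit is an XOR (a sum over GF(2)) of selected input address bits. It only models which element moves where; it does not model the streaming datapath, memory banks, or interconnection networks described above. All function names are hypothetical.

```python
def apply_bit_matrix(matrix, addr, nbits):
    """Return the address whose bits are matrix * addr_bits over GF(2).
    matrix[r][c] is 0 or 1; bit 0 is the least significant address bit."""
    out = 0
    for r in range(nbits):
        bit = 0
        for c in range(nbits):
            bit ^= matrix[r][c] & ((addr >> c) & 1)
        out |= bit << r
    return out

def bit_reversal_matrix(nbits):
    """Anti-diagonal matrix: output bit r takes input bit nbits-1-r."""
    return [[1 if c == nbits - 1 - r else 0 for c in range(nbits)]
            for r in range(nbits)]

def rotation_matrix(nbits, shift):
    """Cyclic rotation of the address bits: output bit r takes input bit (r - shift) mod nbits."""
    return [[1 if c == (r - shift) % nbits else 0 for c in range(nbits)]
            for r in range(nbits)]

def permute(data, matrix, nbits):
    """Move the element at address i to address matrix * i."""
    out = [None] * len(data)
    for i, value in enumerate(data):
        out[apply_bit_matrix(matrix, i, nbits)] = value
    return out

data = list(range(8))
print(permute(data, bit_reversal_matrix(3), 3))  # [0, 4, 2, 6, 1, 5, 3, 7]
print(permute(data, rotation_matrix(3, 1), 3))   # [0, 4, 1, 5, 2, 6, 3, 7]
```

The second result interleaves the two halves of the input, which is the perfect shuffle on eight elements; rotating the address bits by a different amount gives other stride permutations of the same size.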
International Conference on Transparent Optical Networks | 2010
Yannis Benlachtar; Rachid Bouziane; Robert I. Killey; Christian R. Berger; Peter A. Milder; Robert Koutsoyannis; James C. Hoe; Markus Püschel; Madeleine Glick
We investigate the use of orthogonal frequency division multiplexing (OFDM) to increase the capacity of multimode fiber (MMF)-based optical interconnects for data center applications. This approach provides a solution to modulation bandwidth limitations of the lasers, and to the intermodal dispersion of the MMF which leads to frequency-dependent attenuation. Recent studies on adaptively modulated OFDM are reviewed, and new simulation results assessing the capacity of such links for lengths of up to 300 m are presented, assuming the use of 50/125 µm graded-index MMF at a wavelength of 850 nm. The use of coded OFDM as an approach to deal with intermodal dispersion is also discussed.
Field-Programmable Custom Computing Machines | 2007
Paolo D'Alberto; Peter A. Milder; Aliaksei Sandryhaila; Franz Franchetti; James C. Hoe; José M. F. Moura; Markus Püschel; Jeremy R. Johnson
We present a domain-specific approach to generate high-performance hardware-software partitioned implementations of the discrete Fourier transform (DFT) in fixed-point precision. The partitioning strategy is a heuristic based on the DFT's divide-and-conquer algorithmic structure and is fine-tuned by feedback-driven exploration of candidate designs. We have integrated this approach in the Spiral linear-transform code-generation framework to support push-button automatic implementation. We present evaluations of hardware-software DFT implementations running on the embedded PowerPC processor and the reconfigurable fabric of the Xilinx Virtex-II Pro FPGA. In our experiments, the FPGA-accelerated 1D and 2D DFT libraries exhibit between 2 and 7.5 times higher performance (operations per second) and up to 2.5 times better energy efficiency (operations per Joule) than the software-only version.
International Symposium on Microarchitecture | 2016
Manoj Alwani; Han Chen; Michael Ferdman; Peter A. Milder
Deep convolutional neural networks (CNNs) are rapidly becoming the dominant approach to computer vision and a major component of many other pervasive machine learning tasks, such as speech recognition, natural language processing, and fraud detection. As a result, accelerators for efficiently evaluating CNNs are rapidly growing in popularity. The conventional approach to designing such CNN accelerators is to focus on creating accelerators that iteratively process the CNN layers. However, by processing each layer to completion, the accelerator designs must use off-chip memory to store intermediate data between layers, because the intermediate data are too large to fit on chip. In this work, we observe that a previously unexplored dimension exists in the design space of CNN accelerators that focuses on the dataflow across convolutional layers. We find that we are able to fuse the processing of multiple CNN layers by modifying the order in which the input data are brought on chip, enabling caching of intermediate data between the evaluation of adjacent CNN layers. We demonstrate the effectiveness of our approach by constructing a fused-layer CNN accelerator for the first five convolutional layers of the VGGNet-E network and comparing it to the state-of-the-art accelerator implemented on a Xilinx Virtex-7 FPGA. We find that, by using 362 KB of on-chip storage, our fused-layer accelerator minimizes off-chip feature map data transfer, reducing the total transfer by 95%, from 77 MB down to 3.6 MB per image.
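The idea of fusing layers can be illustrated by asking how large an input tile must be brought on chip to produce one output tile several layers later without writing intermediate results off chip. The sketch below walks backwards through a stack of layers to answer that. The layer list is hypothetical (two 3x3, stride-1 convolutions) and is not the VGGNet-E configuration evaluated in the paper; the function name is also an assumption. The last lines simply check the transfer reduction quoted in the abstract.

```python
def input_tile_size(output_tile, layers):
    """Edge length of the input tile needed to compute an output tile of the
    given edge length, for a list of (kernel, stride) layers applied in order."""
    size = output_tile
    for kernel, stride in reversed(layers):
        size = (size - 1) * stride + kernel
    return size

# Hypothetical stack: two 3x3 convolutions with stride 1.
layers = [(3, 1), (3, 1)]
print(input_tile_size(1, layers))  # prints 5

# Transfer reduction from the figures quoted in the abstract (77 MB to 3.6 MB).
reduction = (77 - 3.6) / 77
print(f"{reduction:.1%}")  # prints 95.3%
```

Each additional fused layer enlarges the required input tile, so on-chip storage grows as more layers are fused; that is the tradeoff against the off-chip transfer saved.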
Field Programmable Gate Arrays | 2006
Peter A. Milder; Mohammad Ahmad; James C. Hoe; Markus Püschel
This paper presents an equation-based resource utilization model for automatically generated discrete Fourier transform (DFT) soft core IPs. The parameterized DFT IP generator allows a user to make customized tradeoffs between cost and performance and between utilization of different resource classes. The equation-based resource model permits immediate and accurate estimation of resource requirements as the user considers the different generator options. Furthermore, the fast turnaround of the model allows it to be combined with a search algorithm such that the user could query automatically for an optimal design within the stated performance and resource constraints. Following a brief review of the DFT IP generator, this paper presents the development of the equation-based models for estimating slice and hard macro utilizations in the Xilinx Virtex-II Pro FPGA family. The evaluation section shows that an average error of 6.1% is achievable by a model of linear equations that can be evaluated in under a microsecond. The paper further offers a demonstration of the automatic design exploration capability.
Optics Express | 2012
Rene Schmogrow; Rachid Bouziane; Matthias Meyer; Peter A. Milder; Philipp Schindler; Robert I. Killey; Polina Bayvel; Christian Koos; Wolfgang Freude; Juerg Leuthold
We investigate the performance and DSP resource requirements of digitally generated OFDM and sinc-shaped Nyquist pulses. The two multiplexing techniques are of interest as they offer the highest spectral efficiency. The comparison aims at determining which technology performs better with the limited processing capacities of state-of-the-art FPGAs. It is shown that a novel Nyquist pulse shaping technique based on look-up tables requires a lower resource count than equivalent IFFT-based OFDM signal generation while achieving similar performance with low inter-channel guard-bands in ultra-dense WDM. Our findings are based on a resource assessment of selected DSP implementations in terms of both simulations and experimental validations. The experiments were performed with real-time software-defined transmitters using a single or three optical carriers.
Optics Express | 2011
Christian R. Berger; Yannis Benlachtar; Robert I. Killey; Peter A. Milder
Orthogonal frequency division multiplexing (OFDM) has recently gained substantial interest in high capacity optical fiber communications. Unlike wireless systems, optical OFDM systems are constrained by the limited resolution of the ultra high-speed digital-to-analog converters (DAC) and analog-to-digital converters (ADC). Additionally, the situation is exacerbated by the large peak-to-average power ratio (PAPR) inherent in OFDM signals. In this paper, we study the effects of clipping and quantization noise on the system performance. We analytically quantify the introduced distortion as a function of bit resolution and clipping ratio, both at the DAC and ADC. With this we provide a back-to-back signal-to-noise ratio analysis to predict the bit error rate of the system, assuming a fixed received optical power and ideal electrical-optical-electrical conversion. Simulation and experimental results are used to confirm the validity of the expressions.
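The interaction of clipping and quantization can be explored numerically with a small Monte Carlo sketch. It assumes the OFDM waveform is modeled as zero-mean, unit-variance Gaussian samples and that the converter is an ideal uniform quantizer; it does not reproduce the paper's analytical expressions. The function names, the clipping ratio of 3 standard deviations, and the sample count are assumptions chosen for this example.

```python
import math
import random

def clip_and_quantize(x, clip_level, bits):
    """Clip x to [-clip_level, clip_level], then apply a uniform mid-rise quantizer."""
    levels = 2 ** bits
    step = 2 * clip_level / levels
    clipped = max(-clip_level, min(clip_level, x))
    index = min(int((clipped + clip_level) / step), levels - 1)
    return -clip_level + (index + 0.5) * step

def measured_snr_db(bits, clip_ratio, n=20000, seed=0):
    """Estimate the signal-to-distortion ratio in dB for unit-variance Gaussian
    samples; clip_ratio is the clipping level in standard deviations."""
    rng = random.Random(seed)
    signal_power = 0.0
    noise_power = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = clip_and_quantize(x, clip_ratio, bits)
        signal_power += x * x
        noise_power += (y - x) ** 2
    return 10 * math.log10(signal_power / noise_power)

# Compare a low-resolution and a higher-resolution converter at the same clipping ratio.
for bits in (4, 8):
    print(bits, round(measured_snr_db(bits, clip_ratio=3.0), 1))
```

Sweeping `clip_ratio` for a fixed bit resolution shows the tradeoff the abstract describes: a low clipping level adds clipping distortion, while a high one spreads the available quantization levels over a wider range and increases quantization noise.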