Doris Chen
Altera
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Doris Chen.
field programmable logic and applications | 2012
Doris Chen; Deshanand P. Singh
The FPGA can be a tremendously efficient computational fabric for many applications. In particular, the performance to power ratios of FPGA make them attractive solutions to solve the problem of data centers that are constrained largely by power and cooling costs. However, the complexity of the FPGA design flow requires the programmer to understand cycle-accurate details of how data is moved and transformed through the fabric. In this paper, we explore techniques that allow programmers to efficiently use FPGAs at a level of abstraction that is closer to traditional software-centric approaches by using the emerging parallel language, OpenCL. Although the field of high level synthesis has evolved greatly in the last few decades, several fundamental parts were missing from the complete software abstraction of the FPGA. These include standard and portable methods of describing HW/SW codesign, memory hierarchy, data movement and control of parallelism. We believe that OpenCL addresses all of these issues and allows for highly efficient description of FPGA designs with a higher level of abstraction. We demonstrate this premise by examining the performance of a document filtering algorithm, implemented in OpenCL and automatically compiled to a Stratix IV 530 FPGA. We show that our implementation achieves 5.5× and 5.25× better performance per watt ratios than GPU and CPU implementations, respectively.
asia and south pacific design automation conference | 2013
Doris Chen; Deshanand P. Singh
Fractal compression is an efficient technique for image and video encoding that uses the concept of self-referential codes. Although offering compression quality that matches or exceeds traditional techniques with a simpler and faster decoding process, fractal techniques have not gained widespread acceptance due to the computationally intensive nature of its encoding algorithm. In this paper, we present a real-time implementation of a fractal compression algorithm in OpenCL [1]. We show how the algorithm can be efficiently implemented in OpenCL and optimized for multi-CPUs, GPUs, and FPGAs. We demonstrate that the core computation implemented on the FPGA through OpenCL is 3× faster than a high-end GPU and 114× faster than a multi-core CPU, with significant power advantages. We also compare to a hand coded FPGA implementation to showcase the effectiveness of an OpenCL-to-FPGA compilation tool.
field programmable gate arrays | 2010
Doris Chen; Deshanand P. Singh; Jeffrey Christopher Chromczak; David Lewis; Ryan Fung; David Neto; Vaughn Betz
Metastability is a phenomenon that can cause system failures in digital circuits. It may occur whenever signals are being transmitted across asynchronous or unrelated clock domains. The impact of metastability is increasing as process geometries shrink and supply voltages drop faster than transistor Vts. FPGA technologies are significantly affected since leading edge FPGAs are amongst the first devices to adopt the most recent process nodes. In this paper, we present a comprehensive suite of techniques for modeling, characterizing and optimizing metastability effects in FPGAs. We first discuss a theoretical model of metastability, and verify the predictions using both circuit level simulations and board measurements. Next we show how designers have traditionally dealt with metastability problems and contrast that with the automatic CAD algorithms described in this paper that both analyze and optimize metastability-related issues. Through our detailed experimental results, we show that we can improve the metastability characteristics of a large suite of industrial benchmarks by an average of 268,000 times with our optimization techniques.
high performance embedded architectures and compilers | 2013
Doris Chen; Deshanand P. Singh
The key to enabling widespread use of FPGAs for algorithm acceleration is to allow programmers to create efficient designs without the time-consuming hardware design process. Programmers are used to developing scientific and mathematical algorithms in high-level languages (C/C++) using floating point data types. Although easy to implement, the dynamic range provided by floating point is not necessary in many applications; more efficient implementations can be realized using fixed point arithmetic. While this topic has been studied previously [Han et al. 2006; Olson et al. 1999; Gaffar et al. 2004; Aamodt and Chow 1999], the degree of full automation has always been lacking. We present a novel design flow for cases where FPGAs are used to offload computations from a microprocessor. Our LLVM-based algorithm inserts value profiling code into an unmodified C/C++ application to guide its automatic conversion to fixed point. This allows for fast and accurate design space exploration on a host microprocessor before any accelerators are mapped to the FPGA. Through experimental results, we demonstrate that fixed-point conversion can yield resource savings of up to 2x--3x reductions. Embedded RAM usage is minimized, and 13%--22% higher Fmax than the original floating-point implementation is observed. In a case study, we show that 17% reduction in logic and 24% reduction in register usage can be realized by using our algorithm in conjunction with a High-Level Synthesis (HLS) tool.
field programmable gate arrays | 2011
Doris Chen; Deshanand P. Singh
FPGA logic density is roughly doubling at every process generation. Consequently, it is becoming increasingly challenging for FPGA CAD tools to keep up with the growing complexities of high-speed designs while keeping CAD run-times reasonable. In this paper, we present a novel incremental resynthesis tool called Line-Level Incremental reSynthesis (LLIS), integrated within an industrial tool suite, that addresses the problems of timing closure as well as CAD runtime (patent pending). We describe a general framework that can incrementally reuse results from a previous compile based on automatic differencing of HDL changes. We show that it is possible to reduce synthesis runtime by 6.5x for common HDL changes. As compared with complete resynthesis, we preserve known good timing solutions more than 82% of the time. This represents a 3X improvement vs. non-incremental techniques.
Archive | 2012
Doris Chen; Deshanand P. Singh
Archive | 2013
Doris Chen; Deshanand P. Singh
Archive | 2017
Doris Chen; Deshanand P. Singh
Archive | 2013
Doris Chen; Deshanand P. Singh
Archive | 2013
Doris Chen; Deshanand P. Singh