Joonseok Park
University of Southern California
Publications
Featured research published by Joonseok Park.
field programmable custom computing machines | 2000
Pedro C. Diniz; Joonseok Park
Mapping computations written in high-level programming languages to FPGA-based computing engines requires programmers to create the datapath responsible for the core of the computation as well as the control structures to generate the appropriate signals to orchestrate its execution. This paper addresses the issue of automatic generation of data storage and control structures for FPGA-based reconfigurable computing engines using existing compiler data dependence analysis techniques. We describe a set of parameterizable data storage and control structures used as the target of our prototype compiler. We present a compiler analysis algorithm to derive the parameters of the data storage structures to minimize the required memory bandwidth of the implementation. We also describe a complete compilation scheme for mapping loops that manipulate multi-dimensional array variables to hardware. We present preliminary simulation results for complete designs generated manually using the results of the compiler analysis. These preliminary results show that it is possible to successfully integrate compiler data dependence analysis with existing commercial synthesis tools.
international symposium on systems synthesis | 2001
Joonseok Park; Pedro C. Diniz
Commercially available behavioral synthesis tools do not adequately support FPGA vendor-specific external memory interfaces, making it extremely difficult to exploit pipelined memory access modes as well as the application-specific scheduling of memory operations that is critical for high-performance solutions. This lack of support substantially increases the complexity and the burden on designers in mapping applications to FPGA-based computing engines. We address the problem of external memory interfacing and aggressive scheduling of memory operations by proposing a decoupled architecture with two components: one component captures the timing of the specific target architecture, while the other uses application-specific memory access pattern information. Our results support the claim that it is possible to exploit application-specific information and integrate that knowledge into custom schedulers that mix pipelined and non-pipelined access modes aimed at reducing the overhead associated with external memory accesses. The results also reveal that the additional design complexity of the scheduler, and its impact on the overall design, is minimal.
languages and compilers for parallel computing | 2001
Pedro C. Diniz; Mary W. Hall; Joonseok Park; Byoungro So; Heidi E. Ziegler
The DEFACTO project - a Design Environment For Adaptive Computing TechnOlogy - is a system that maps computations, expressed in high-level languages such as C, directly onto FPGA-based computing platforms. Major challenges are the inherent flexibility of FPGA hardware, capacity and timing constraints of the target FPGA devices, and the accompanying speed-area trade-offs. To address these, DEFACTO combines parallelizing compiler technology with behavioral VHDL synthesis tools, obtaining the complementary advantages of the compiler's high-level analyses and transformations and of synthesis-level binding, allocation and scheduling of low-level hardware resources. To guide the compiler in the search for a good solution, we introduce the notion of balance between the rates at which data is fetched from memory and accessed by the computation, combined with estimation from behavioral synthesis. Since FPGA-based designs offer the potential for optimizing memory-related operations, we have also incorporated into the compiler analysis the ability to exploit parallel memory accesses and customize memory access protocols.
Microprocessors and Microsystems | 2005
Pedro C. Diniz; Mary W. Hall; Joonseok Park; Byoungro So; Heidi E. Ziegler
The DEFACTO compilation and synthesis system is capable of automatically mapping computations expressed in high-level imperative programming languages such as C to FPGA-based systems. DEFACTO combines parallelizing compiler technology with behavioral VHDL synthesis tools to guide the application of high-level compiler transformations in the search for high-quality hardware designs. In this article we illustrate the effectiveness of this approach in automatically mapping several kernel codes to an FPGA quickly and correctly. We also present a detailed example comparing the performance of an automatically generated design against a manually generated implementation of the same computation. The design-space-exploration component of DEFACTO is able to explore a large number of designs for a particular computation that would otherwise be impractical for any designer to evaluate by hand.
field-programmable technology | 2004
Nastaran Baradaran; Joonseok Park; Pedro C. Diniz
Contemporary configurable architectures have dedicated internal functional units such as multipliers, high-capacity storage RAM, and even CAM blocks. These RAM blocks allow implementations to cache data to be reused in the near future, thereby avoiding the latency of external memory accesses. We present a data allocation algorithm that utilizes the RAM blocks in the presence of a limited number of hardware registers. This algorithm, based on a compiler data reuse analysis, determines which data should be cached in the internal RAM blocks and when. The preliminary results, for a set of image/signal processing kernels targeting a Xilinx Virtex™ FPGA device, reveal that despite the increased latency of accessing data in RAM blocks, designs that use them require fewer configurable resources than designs that exclusively use registers, while attaining comparable, and in some cases even better, performance.
field-programmable custom computing machines | 2003
Joonseok Park; Pedro C. Diniz
As the densities of current FPGAs continue to grow, it is now possible to generate System-on-a-Chip (SoC) designs where multiple computing cores are connected to various memory modules in customized topologies with application-specific memory access patterns. For example, Xilinx has recently introduced devices to which a pared-down version of a PowerPC core can be mapped and connected to a set of internal memories. In this paper we address the problem of synthesizing, and estimating the area and speed of, memory interfaces for Static RAM (SRAM) and Synchronous Dynamic RAM (SDRAM) with various latency parameters and access modes. We describe a set of synthesizable and programmable memory interfaces a compiler can use to automatically generate the appropriate designs for mapping computations to FPGA-based architectures. Our preliminary results reveal that it is possible to accurately model the area and timing requirements using a linear estimation function. We have successfully integrated the proposed memory interface designs with simple image processing kernels generated using commercially available behavioral synthesis tools.
field-programmable custom computing machines | 2003
Pedro C. Diniz; Joonseok Park
FPGAs (field programmable gate arrays) have appealing features such as customizable internal and external bandwidth and the ability to exploit vast amounts of fine-grain parallelism. In this paper, we explore the applicability of these features in using FPGAs as smart memory engines for search and reorganization computations over spatial pointer-based data structures. The experimental results in this paper suggest that reconfigurable logic, when combined with data reorganization, can lead to dramatic performance improvements of up to 20x over traditional computer architectures for pointer-based computations, which are traditionally not viewed as a good match for reconfigurable technologies.
field-programmable custom computing machines | 2003
K.R.S. Shayee; Joonseok Park; Pedro C. Diniz
Digital image processing algorithms are a good match for direct implementation on FPGAs, as current FPGA architectures can naturally match the fine-grain parallelism in these applications. Typically, these algorithms are structured as a sequence of operations, expressed in high-level programming languages as tight loop nests. The loops usually define a shifting-window region over which the algorithm applies a simple localized operator (e.g., a differential gradient, or a min/max). In this research we focus on the development of fast, yet accurate, performance and area modeling of complete FPGA designs that combines analytical, empirical and behavioral estimation techniques. We model the application of a set of important program transformations for image processing algorithms, namely loop unrolling, tiling, loop interchange, loop fission and array privatization, and explore pipelined and non-pipelined execution modes. We take into consideration the impact of the various transformations, in the presence of limited I/O resources such as address generators and external memory data channels, on the performance of a complete design implemented on an FPGA-based architecture.
field programmable gate arrays | 2002
Pedro C. Diniz; Joonseok Park
Field-Programmable Core Arrays (FPCAs) will include various computing cores for a wide variety of applications, ranging from DSP to general-purpose computing. With the increasing gap between core computing speeds and memory access latency, managing and orchestrating the movement of data across multiple cores will become increasingly important. In this paper we propose data reorganization engines that support a wide variety of data reorganizations, both within and across memory modules, for future FPCAs. We have experimented with a suite of data reorganizations pervasive in DSP applications. Our limited set of experiments reveals that the proposed engine designs are flexible and use little design area in current FPGA fabrics, making them easy to integrate into future FPCAs as either soft or hard macros.
ieee international conference on high performance computing data and analytics | 2004
Nastaran Baradaran; Pedro C. Diniz; Joonseok Park
Scalar replacement, or register promotion, uses scalar variables to save data that can be reused across loop iterations, leading to a reduction in the number of memory operations at the expense of a possibly large number of registers. In this paper we present a compiler data reuse analysis capable of uncovering and exploiting reuse opportunities for array references that exhibit Multiple-Induction-Variable (MIV) subscripts, beyond the reach of current data reuse analysis techniques. We present experimental results of the application of scalar replacement to a sample set of kernel codes targeting a programmable hardware computing device, a Field-Programmable Gate Array (FPGA). The results show that, for memory-bound designs, scalar replacement alone leads to speedups ranging from 2x to 6x, at the expense of an increase in FPGA design area in the range of 6x to 20x.