Sek M. Chai | Researchain

Publication

Featured research published by Sek M. Chai.

international parallel and distributed processing symposium | 2006

FPGA implementation of a license plate recognition SoC using automatically generated streaming accelerators

Nikolaos Bellas; Sek M. Chai; Malcolm Dwyer; Dan Linzmeier

Modern FPGA platforms provide the hardware and software infrastructure for building a bus-based system on chip (SoC) that meets the application's requirements. The designer can customize the hardware by selecting from a large number of pre-defined peripherals and fixed IP functions and by providing new hardware, typically expressed using RTL. Hardware accelerators that provide application-specific extensions to the computational capabilities of a system are an efficient mechanism to enhance performance and reduce power dissipation. What is missing is an integrated approach to identify the computationally critical parts of the application and to create accelerators starting from a high-level representation with minimal design effort. In this paper, we present an automation methodology and a tool that generates accelerators. We apply the methodology to an FPGA-based license plate recognition (LPR) system used in law enforcement. The accelerators process streaming data and support a programming model that can naturally express a large number of embedded applications, resulting in efficient hardware implementations. We show that we can achieve an overall LPR application speedup of 1.2× to 2.6×, thus enabling real-time functionality under realistic road scenes.

field-programmable custom computing machines | 2009

Real-Time Fisheye Lens Distortion Correction Using Automatically Generated Streaming Accelerators

Nikolaos Bellas; Sek M. Chai; Malcolm Dwyer; Dan Linzmeier

Fisheye lenses are often used in scientific or virtual reality applications to enlarge the field of view of a conventional camera. Fisheye lens distortion correction is an image processing application that transforms the distorted fisheye images back to the natural-looking perspective space. This application is characterized by non-linear streaming memory access patterns that make main memory bandwidth a key performance limiter. We have developed a fisheye lens distortion correction system on a custom board that includes a Xilinx Virtex-4 FPGA. We express the application in a high-level streaming language, and we utilize Proteus, an architectural synthesis tool, to quickly explore the design space and generate the streaming accelerator best suited for our cost and performance constraints. This paper shows that appropriate ESL tools enable rapid prototyping and design of real-life, performance-critical, and cost-sensitive systems with complex memory access patterns and hardware-software interaction mechanisms.
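
To illustrate why the memory access pattern is non-linear, here is a minimal sketch of the inverse mapping used in fisheye correction. The abstract does not name a lens model, so the equidistant model (distorted radius = focal length × angle) is an assumption, and the function name `fisheye_source_coords` and its parameters are hypothetical. For every pixel of the corrected output image, the function returns the location to read in the distorted input image.

```python
import math

def fisheye_source_coords(x_out, y_out, cx, cy, focal):
    """Return the (x, y) location in the fisheye image that feeds one
    pixel of the perspective output image.

    Assumes an equidistant fisheye model: r_distorted = focal * theta,
    while the perspective image obeys r_undistorted = focal * tan(theta).
    """
    dx, dy = x_out - cx, y_out - cy
    r_u = math.hypot(dx, dy)          # radius in the corrected image
    if r_u == 0.0:
        return (cx, cy)               # the optical center maps to itself
    theta = math.atan(r_u / focal)    # angle of the incoming ray
    r_d = focal * theta               # radius in the fisheye image
    scale = r_d / r_u
    return (cx + dx * scale, cy + dy * scale)

# Example: output pixel one focal length to the right of the center.
src_x, src_y = fisheye_source_coords(150.0, 100.0, cx=100.0, cy=100.0, focal=50.0)
```

Because the scale factor depends on the radius, a straight scan line in the output reads from a curved path in the input, which is the kind of irregular access that stresses memory bandwidth.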

international parallel and distributed processing symposium | 2010

Fisheye lens distortion correction on multicore and hardware accelerator platforms

Konstantis Daloukas; Christos D. Antonopoulos; Nikolaos Bellas; Sek M. Chai

Wide-angle (fisheye) lenses are often used in virtual reality and computer vision applications to widen the field of view of conventional cameras. Those lenses, however, distort images. For most real-world applications the video stream needs to be transformed in real time (20 frames/sec or better) back to the natural-looking, central perspective space. This paper presents the implementation, optimization, and characterization of a fisheye lens distortion correction application on three platforms: a conventional, homogeneous multicore processor by Intel, a heterogeneous multicore (Cell BE), and an FPGA implementing an automatically generated streaming accelerator. We evaluate the interaction of the application with those architectures using both high- and low-level performance metrics. In macroscopic terms, we find that today's mainstream conventional multicores are not effective in supporting real-time distortion correction, at least not with the currently commercially available core counts. Architectures such as the Cell BE and FPGAs offer the necessary computational power and scalability, at the expense of significantly higher development effort. Among these three platforms, only the FPGA and a fully optimized version of the code running on the Cell processor can provide real-time processing speed. In general, FPGAs meet the expectations of performance, flexibility, and low overhead. General-purpose multicores are, on the other hand, much easier to program.

field-programmable custom computing machines | 2006

Template-Based Generation of Streaming Accelerators from a High Level Presentation

Nikolaos Bellas; Sek M. Chai; Malcolm Dwyer; Dan Linzmeier

A design methodology and prototype tool to automate the design and architectural exploration of hardware accelerators are described in this paper. In comparison to other approaches, we utilize a well-engineered template to enable fast convergence to an area- and speed-efficient design. We show how this methodology is used for an application set with various architectural configurations.

memory performance dealing with applications systems and architecture | 2006

Memory bandwidth optimization through stream descriptors

Abelardo López-Lagunas; Sek M. Chai

The memory subsystem for computer vision and image processing applications must sustain high memory bandwidth to keep processors busy. This paper advocates the use of stream descriptors, a mechanism that allows programmers to indicate data movement explicitly. Stream descriptors enable the compiler to organize memory transfers more efficiently by matching data movement to the capabilities of the underlying hardware. Stream descriptors are used in this paper on an image sensor interface to describe the deterministic movements of objects in segmented image regions. The paper shows how stream descriptors reduce the bandwidth requirements for a set of computer vision applications.
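
As a rough illustration of the idea, a stream descriptor can be modeled as a small record from which hardware or a compiler can derive every address of a transfer ahead of time. The field set below (start, stride, span, skip, count) and the class name `StreamDescriptor` are assumptions for this sketch, not the exact format defined in the paper.

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass(frozen=True)
class StreamDescriptor:
    """Hypothetical descriptor for a regular, deterministic access pattern."""
    start: int   # address of the first element
    stride: int  # step between consecutive elements inside a span
    span: int    # number of elements read before a skip is applied
    skip: int    # step from the last element of a span to the next span
    count: int   # total number of elements in the stream

    def addresses(self) -> Iterator[int]:
        addr = self.start
        for i in range(self.count):
            yield addr
            # Apply `skip` at the end of each span, `stride` otherwise.
            addr += self.skip if (i + 1) % self.span == 0 else self.stride

# A 4-wide, 3-high region of interest in an 8-pixel-wide, row-major image,
# starting at row 1, column 2.
WIDTH = 8
roi = StreamDescriptor(start=1 * WIDTH + 2, stride=1, span=4, skip=WIDTH - 3, count=12)
roi_addresses = list(roi.addresses())
```

Since the whole address sequence follows from five integers, a prefetch engine could fetch only the region of interest instead of the full frame, which is the bandwidth saving the abstract refers to.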

international workshop on computer architecture for machine perception | 2005

Streaming I/O for imaging applications

Sek M. Chai; Abelardo López-Lagunas

The streaming computation model is appropriate for imaging applications because of the compute-intensive characteristics and memory access patterns. This paper advocates the streaming computation model and describes a streaming I/O peripheral used in a system-on-chip architecture. The impact on applications is described with details on performance gains from efficient memory transfers. Discussions on algorithm enhancements using streaming I/O peripherals are also provided.

Archive | 2009

Mobile Challenges for Embedded Computer Vision

Sek M. Chai

The mobile environment poses uniquely challenging constraints for designers of embedded computer vision systems. There are traditional issues such as size, weight, and power, which are readily evident. However, there are also other less tangible obstacles related to technology acceptance and business models that stand in the way of a successful product deployment. In this chapter, I describe these issues as well as other qualities desired in a mobile smart camera using computer vision algorithms to “see and understand” the scene. The target platform of discussion is the mobile handset, as this platform is poised to be the ubiquitous consumer device all around the world.

signal processing systems | 2011

Streaming Data Movement for Real-Time Image Analysis

Abelardo López-Lagunas; Sek M. Chai

High-performance portable systems for real-time video/image analysis continue to demand high processing power and memory bandwidth. In embedded systems such as digital still cameras, camcorders, and camera phones, the expected performance must be delivered while meeting size, weight, and power constraints. A well-designed system should include analyses of its memory subsystem as well as the computation platform. This paper focuses on a streaming memory subsystem that leverages deterministic memory access patterns. We formalize the notion of stream descriptors as a means to define these stream access patterns and to improve memory access efficiencies by discovering locality between different data streams. Data movement for real-time image analysis applications is demonstrated, showing favorable bandwidth savings using stream descriptors.

international conference on parallel processing | 2006

Compiler manipulation of stream descriptors for data access optimization

Abelardo López-Lagunas; Sek M. Chai

Efficient data movement is one of the key attributes for high-performance computing. This paper advocates the use of stream descriptors to convey memory access patterns from the programmer to the compiler. This explicit separation of computation and data movement enables the compiler to manipulate the stream descriptors to match the system's interconnect capabilities. Data movement is optimized by manipulating stream descriptors to target specific optimizations such as bandwidth management and buffer allocation. In this paper, bandwidth improvements are shown for an example system performing video analysis using computer vision methods. The system includes key hardware mechanisms that use stream descriptors to prefetch and align data for stream processors.

international parallel and distributed processing symposium | 2007

An Architectural Framework for Automated Streaming Kernel Selection

Nikolaos Bellas; Sek M. Chai; Malcolm Dwyer; Dan Linzmeier

Hardware accelerators are increasingly used to extend the computational capabilities of baseline scalar processors to meet the growing performance and power requirements of embedded applications. The challenge to the designer is the extensive human effort required to identify the appropriate kernels to be mapped to gates and to implement a network of accelerators to execute the kernels. In this paper, we present a methodology to automate the selection of streaming kernels in a reconfigurable platform based on the characteristics of the application. The methodology is based on a flow graph that describes the streaming computations and communications. The flow graph is used to efficiently identify the most profitable subset of streaming kernels that optimize performance without exceeding the available area of the reconfigurable fabric.
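
Choosing the most profitable subset of kernels under an area limit resembles a 0/1 knapsack problem. The sketch below uses a standard knapsack dynamic program as a stand-in; it is not the flow-graph method of the paper, and the kernel names, area units, and gain values are made up for illustration.

```python
def select_kernels(kernels, area_budget):
    """Pick the subset of kernels with the highest total gain whose
    total area fits in `area_budget`.

    kernels: list of (name, area, gain) tuples with integer areas.
    Returns (best_gain, list_of_selected_names).
    """
    # best[a] holds the best (gain, names) achievable with area <= a.
    best = [(0, [])] * (area_budget + 1)
    for name, area, gain in kernels:
        # Iterate downward so each kernel is used at most once.
        for a in range(area_budget, area - 1, -1):
            candidate = best[a - area][0] + gain
            if candidate > best[a][0]:
                best[a] = (candidate, best[a - area][1] + [name])
    return best[area_budget]

# Hypothetical candidates: (name, area in arbitrary units, estimated gain).
candidates = [("edge", 4, 10), ("blur", 3, 7), ("hist", 2, 4), ("warp", 5, 12)]
total_gain, chosen = select_kernels(candidates, area_budget=9)
```

A real selection would also account for communication between kernels, which is why the paper works on a flow graph rather than on independent items.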
