Jinwoo Suh
University of Southern California
Publication
Featured research published by Jinwoo Suh.
international conference on cluster computing | 2011
Stephen P. Crago; Kyle Dunn; Patrick Eads; Lorin Hochstein; Dong-In Kang; Mikyung Kang; Devendra Modium; Karandeep Singh; Jinwoo Suh; John Paul Walters
Current cloud computing infrastructure typically assumes a homogeneous collection of commodity hardware, with details about hardware variation intentionally hidden from users. In this paper, we present our approach for extending the traditional notions of cloud computing to provide a cloud-based access model to clusters that contain heterogeneous architectures and accelerators. We describe our ongoing work extending the OpenStack cloud computing stack to support heterogeneous architectures and accelerators, and our experiences running OpenStack on our local heterogeneous cluster testbed.
international symposium on computer architecture | 2003
Jinwoo Suh; Eun-Gyu Kim; Stephen P. Crago; Lakshmi Srinivasan; Matthew French
Trends in microprocessors of increasing die size and clock speed and decreasing feature sizes have fueled rapidly increasing performance. However, the limited improvements in DRAM latency and bandwidth and diminishing returns of increasing superscalar ILP and cache sizes have led to the proposal of new microprocessor architectures that implement processor-in-memory, stream processing, and tiled processing. Each architecture is typically evaluated separately and compared to a baseline architecture. In this paper, we evaluate the performance of processors that implement these architectures on a common set of signal processing kernels. The implementation results are compared with the measured performance of a conventional system based on the PowerPC with AltiVec. The results show that these new processors show significant improvements over conventional systems and that each architecture has its own strengths and weaknesses.
real time systems symposium | 2002
Dong In Kang; Stephen P. Crago; Jinwoo Suh
We consider a resource synthesis technique for real-time systems where the energy budget is limited and the performance of the system depends on how resources and energy are used. We consider two performance models for a task: (1) a task has variable execution time and performance of a task depends on the amount of execution time received, and (2) the execution time of a task is constant and the performance of a task depends on its frequency. We first propose an optimal resource synthesis technique which maximizes system performance without energy constraints. We prove its optimality with the earliest deadline first (EDF) scheduling policy when the performance function of a task is non-decreasing and concave. We propose an energy-aware resource allocation technique for systems with energy constraints using the same analytical framework. The energy-aware resource synthesis technique considers both resource usage and energy consumption to find a near optimal solution that maximizes system performance within the energy budget.
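As a rough illustration of the EDF setting the abstract builds on, the sketch below checks whether a candidate allocation is schedulable. It is not the authors' synthesis technique. It assumes independent periodic tasks on a single processor with deadlines equal to periods, where EDF is feasible exactly when total utilization does not exceed 1. The name `edf_feasible` and the `(execution_time, period)` task format are hypothetical.

```python
from fractions import Fraction


def edf_feasible(tasks):
    """Return True if the task set passes the EDF utilization test.

    tasks: iterable of (execution_time, period) pairs given as integers.
    Assumption (not from the abstract): independent periodic tasks on one
    processor with deadlines equal to periods.
    """
    # Exact rational arithmetic avoids floating-point error at the U == 1 boundary.
    utilization = sum(Fraction(c, p) for c, p in tasks)
    return utilization <= 1


if __name__ == "__main__":
    print(edf_feasible([(1, 4), (2, 8), (1, 2)]))  # utilization is exactly 1
    print(edf_feasible([(1, 2), (2, 3)]))          # utilization is 7/6
```

Under the abstract's second performance model, raising a task's frequency shortens its period and raises its utilization, so a check of this kind bounds how far frequencies can be increased.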
IEEE Transactions on Computers | 2002
Jinwoo Suh; Viktor K. Prasanna
Efficient transposition of out-of-core matrices has been widely studied. These efforts have focused on reducing the number of I/O operations. However, in state-of-the-art architectures, the memory-memory data transfer time and the index computation time are also significant components of the overall time. In this paper, we propose an algorithm that considers the index computation time and the I/O time and reduces the overall execution time. Our algorithm reduces the total execution time by reducing the number of I/O operations and eliminating the index computation. In doing so, two techniques are employed: writing the data on to disk in pre-defined patterns and balancing the number of disk read and write operations. The index computation time, which is an expensive operation involving two divisions and a multiplication, is eliminated by partitioning the memory into read and write buffers. The expensive in-processor permutation is replaced by data collection from the read buffer to the write buffer. Even though this partitioning may increase the number of I/O operations for some cases, it results in an overall reduction in the execution time due to the elimination of the expensive index computation. Our algorithm is analyzed using the well-known linear model and the parallel disk model. The experimental results on a Sun Enterprise, an SGI R12000 and a Pentium III show that our algorithm reduces the overall execution time by up to 50% compared with the best known algorithms in the literature.
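To make the cost of index computation concrete, here is a minimal sketch of a naive per-element transposition of a row-major matrix. It is not the algorithm proposed in the paper; it only shows the kind of per-element work (a division, a remainder, and a multiplication) that the abstract says is eliminated. The function names are hypothetical.

```python
def transposed_index(i, rows, cols):
    """Map linear index i of a row-major rows x cols matrix to its
    linear index in the row-major cols x rows transpose."""
    r = i // cols        # division: source row
    c = i % cols         # remainder (a second division): source column
    return c * rows + r  # multiplication: destination position


def transpose_flat(data, rows, cols):
    """Transpose a flat row-major matrix by computing every index explicitly."""
    out = [None] * (rows * cols)
    for i, value in enumerate(data):
        out[transposed_index(i, rows, cols)] = value
    return out


if __name__ == "__main__":
    # The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored row by row.
    print(transpose_flat([1, 2, 3, 4, 5, 6], 2, 3))
```

Performing this arithmetic for every element is the overhead that the buffer-partitioning approach described above is designed to avoid.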
international parallel and distributed processing symposium | 2002
Jinwoo Suh; Dong-In Kang; Stephen P. Crago
Power management is critical to power-constrained real-time systems. We present a dynamic power management algorithm. Unlike other approaches that focus on the tradeoff between power and performance, our algorithm maximizes power utilization and performance. Our algorithm considers a dynamic environment, allowing for changes in the available energy and adapting system parameters such as operating voltage, frequency, and the number of processors. In our algorithm, we divide the power management problem into three subproblems: i) initial power allocation to minimize wasted energy and avoid the undersupplied power situation, ii) system parameter computation based on the allocated power that maximizes the performance for a given power budget, and iii) dynamic update of the power and system parameters at run time. The simulation results of the algorithm for a satellite system using eight Processor-In-Memory (PIM) processors are presented.
ieee aerospace conference | 2011
John Paul Walters; Robert Kost; Karandeep Singh; Jinwoo Suh; Stephen P. Crago
The current generation of radiation-hardened general-purpose processors, such as the RAD750, lags far behind its commercial counterparts in terms of performance. To combat this, a new many-core processor was designed that would allow space applications to leverage up to 49 general-purpose processing cores for high-performance computing. The Maestro processor, based on Tilera's TILE64 chip, is the result of this effort. Maestro is rad-hard by design, but there is still the possibility of both hardware and software errors.
Information Sciences | 2008
Mikyung Kang; Dong-In Kang; Jinwoo Suh; Junghoon Lee
The recent evolution of wireless sensor networks has yielded a demand to improve energy-efficient scheduling algorithms and energy-efficient medium access protocols. This paper proposes an energy-efficient real-time scheduling scheme that reduces power consumption and network errors on dual channel networks. The proposed scheme is based on a dynamic modulation scaling scheme which can scale the number of bits per symbol and a switching scheme which can swap the polling schedule between channels. Built on top of the EDF scheduling policy, the proposed scheme enhances the power performance without violating the constraints of real-time streams. The simulation results show that the proposed scheme enhances fault-tolerance and reduces power consumption.
languages, compilers, and tools for embedded systems | 2001
Dong-In Kang; Stephen P. Crago; Jinwoo Suh
This paper presents an end-to-end synthesis technique for low-power distributed real-time system design. This technique synthesizes supply voltages of resources to optimize system-level power consumption while satisfying end-to-end hard real-time latency bounds. A system is modeled as a set of distributed task chains (or pipelines). Each task chain is given its own end-to-end constraints. Task chains may share resources. Our approach searches the space of the trade-off between end-to-end latency and supply voltages of resources to minimize system-level power consumption. A power optimization algorithm is proposed for simple distributed real-time systems that do not have any resource sharing between task chains, and its optimality is shown. For more general systems, a heuristic based on the same techniques is proposed.
Power aware computing | 2002
Patrick Shriver; Maya Gokhale; Scott D. Briles; Dong-In Kang; Michael Cai; Kevin McCabe; Stephen P. Crago; Jinwoo Suh
Satellite subsystem power budgets typically have strict margin allocations that limit the on-board processing capability of the spacecraft. Subsystems are assigned a fixed, maximum power allocation and are managed in an on/off manner according to available power and operations schedule. For a remote-sensing satellite, this limitation can result in poorer detection performance of interesting signal events as well as static instrument or data collection settings. Power-aware computation techniques can be utilized to increase the capability of on-board processing of science data and give the remote-sensing system a greater degree of flexibility. We investigate a power-aware signal processing scheme used to study signals from lightning events in the Earth's atmosphere. Detection and analysis of these lightning signals is complicated by the frequency dispersion experienced by the signal in the ionosphere as well as the interfering anthropogenic signals. We outline a method using a multiprocessor architecture to run processing algorithms which have varying rates of power consumption. A spectrum of energy usage spanning six orders of magnitude is obtained for these algorithms from experimental results.
international conference on algorithms and architectures for parallel processing | 1997
Jinwoo Suh; Monte Ung; Viktor K. Prasanna
We show a high throughput implementation of SAR on high performance computing (HPC) platforms. In our implementation, the processors are divided into two groups of size M and N. The first group consisting of M processors computes the FDC (frequency domain convolution) in range dimension, and the second group of N processors computes the FDC in azimuth dimension. M and N are determined by the computational requirements of FDC in range and azimuth dimensions respectively. The key contribution of this paper is the development of a general high-throughput M-to-N communication algorithm. The M-to-N communication algorithm is a basic communication primitive used in many signal processing applications when a software task pipeline is employed to obtain high throughput performance. Our algorithm reduces the number of communication steps to lg(N/M + 1) + n(k - 1), where k ≥ 2 and n = [lg_k M]. Implementation results on the IBM SP2 and the Cray T3D based on the MITRE real-time benchmarks are presented. The results show that, given an image of size 1K × 1K, the minimum number of processors required for processing the SAR benchmarks can be reduced by 50% by using the proposed communication algorithm.