Taeweon Suh | Researchain

Publication

Featured research published by Taeweon Suh.

design, automation, and test in europe | 2004

Supporting cache coherence in heterogeneous multiprocessor systems

Taeweon Suh; Douglas M. Blough; Hsien-Hsin Sean Lee

In embedded system-on-a-chip (SoC) applications, the demand for integrating heterogeneous processors onto a single chip is increasing. An important issue in integrating multiple heterogeneous processors on the same chip is to maintain the coherence of their data caches. In this paper, we propose a hardware/software methodology to make caches coherent in heterogeneous multiprocessor platforms with shared memory. Our approach works with any combination of processors that support invalidation-based protocols. As shown in our experiments, up to 58% performance improvement can be achieved with low miss penalty at the expense of adding simple hardware, compared to a pure software solution. Speedup can be improved even further as the miss penalty increases. In addition, our approach provides embedded system programmers a transparent view of shared data, removing the burden of software synchronization.

field programmable gate arrays | 2007

An FPGA-based Pentium® in a complete desktop system

Shih-Lien L. Lu; Peter Yiannacouras; Rolf Kassa; Michael Konow; Taeweon Suh

Software simulation has been the predominant method for architects to evaluate microprocessor research proposals. There are three tenets in modeling new designs with software models: simulation speed, model accuracy and model completeness. The increasing complexity of the processor and accelerated trend to have multiple processors on a chip are putting burden on simulators to achieve all tenets mentioned, including accurately capturing OS effects. In this work we perform preliminary experimentation/prototyping with an emulation system which overcomes the tension to satisfy all three requirements. The system is an original Socket-7 based desktop processor system with typical hardware peripherals running modern operating systems such as Fedora Core 4 and Windows XP; however we have inserted a Xilinx Virtex-4 in place of the processor that should sit in the motherboard and have used the Virtex-4 to host a complete version of the Pentium® microprocessor (which consumes less than half its resources). We can therefore apply architectural changes to the processor and evaluate their effects on the complete desktop system. We use this FPGA-based emulation system to conduct preliminary architectural experiments including growing the branch target buffer and the level 1 caches. In addition, we experimented with interfacing hardware accelerators such as DES and AES engines which resulted in 27x speedups.

IEEE Transactions on Education | 2012

Pipelined CPU Design With FPGA in Teaching Computer Architecture

Jong-hyuk Lee; Seung Eun Lee; Heon Chang Yu; Taeweon Suh

This paper presents a pipelined CPU design project with a field programmable gate array (FPGA) system in a computer architecture course. The class project is a five-stage pipelined 32-bit MIPS design with experiments on the Altera DE2 board. For proper scheduling, milestones were set every one or two weeks to help students complete the project on time. The goal of the project is to educate students effectively via hands-on learning, rather than having them achieve a complete and flawless CPU design. This study reveals that 21 MIPS instructions are enough to achieve the purpose. With the addition in 2010 of the properly enforced scheduling and the FPGA system, many more students successfully completed the class project than was the case in 2009. A student survey and the independent samples t-test reveal the effectiveness of the methodology with the FPGA system. This work differs from previous work in that the devised project requires the implementation of a real CPU instead of utilizing simulators or just experimenting with ready-made complete CPU models.

Computers & Electrical Engineering | 2013

Accelerating Histograms of Oriented Gradients descriptor extraction for pedestrian recognition

Seung Eun Lee; Kyungwon Min; Taeweon Suh

Pedestrian recognition is an emerging visual computing application for embedded systems. In one usage model, a vehicle-mounted camera acquires images of the road, and a pedestrian recognition system automatically recognizes pedestrians and raises an alarm to help prevent traffic accidents. Achieving this in software on embedded systems requires significant compute processing for object recognition. In this paper, we identify the hotspot function of the workload on an embedded system that motivates acceleration and present the detailed design of a hardware accelerator for Histograms of Oriented Gradients descriptor extraction. We also quantify the performance and area efficiency of the hardware accelerator. Our analysis shows that hardware acceleration has the potential to improve the hotspot function. As a result, user response time can be reduced significantly.
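To make the descriptor concrete, here is a minimal sketch of the per-cell orientation histogram at the heart of the textbook HOG formulation. It is not the accelerator design from the paper: the function name `cell_histogram`, the 9-bin unsigned (0 to 180 degree) layout, and nearest-bin voting without interpolation are simplifying assumptions made for illustration.

```python
import math
from typing import List


def cell_histogram(cell: List[List[float]], bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Assumption: `cell` is a small grayscale patch; border pixels are skipped
    so that central differences are always defined.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central-difference gradients in x and y.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the angle into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle // bin_width), bins - 1)
            hist[index] += magnitude
    return hist


# A patch whose intensity grows left to right has purely horizontal gradients.
horizontal_ramp = [[float(x) for x in range(4)] for _ in range(4)]
ramp_hist = cell_histogram(horizontal_ramp)
```

The inner loop is the kind of regular, per-pixel arithmetic that tends to dominate runtime in software, which is consistent with the abstract's motivation for moving it into hardware.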

Ksii Transactions on Internet and Information Systems | 2010

A Personalized English vocabulary learning system based on cognitive abilities related to foreign language proficiency

Dai Young Kwon; Heui Seok Lim; Won Gyu Lee; Hyeoncheol Kim; Soonyoung Jung; Taeweon Suh; Kichun Nam

This paper proposes a novel personalized Computer Assisted Language Learning (CALL) system based on the learner's cognitive abilities related to foreign language proficiency. In this CALL system, a strategy of retrieval learning, a method of learning memory cycle, and a method of repeated learning are applied for effective vocabulary memorization. The system is designed to offer personalized learning based on cognitive abilities related to the human language process. For this, the proposed CALL system has a cognitive diagnosis module which can measure five types of cognitive abilities. The results of this diagnosis are used to create dynamic learning scenarios for personalized learning and to evaluate user performance in the learning. The system is also designed so that users can create learning word lists and share them easily through various functions based on open APIs. Additionally, experiments have shown that this system helps students to learn English vocabulary effectively and enhances their foreign language skills.

international symposium on microarchitecture | 2004

Integrating cache coherence protocols for heterogeneous multiprocessor system. Part 2

Taeweon Suh; Hsien Hsin S Lee; Douglas M. Blough

This systematic methodology maintains cache coherency in a heterogeneous shared-memory multiprocessor system on a chip. It works with any combination of processors that support any invalidation-based protocol, and experiments have demonstrated up to a 51 percent performance improvement, compared to a pure software solution.

high performance computing and communications | 2009

A Potential Based Routing Protocol for Mobile Ad Hoc Networks

Dai Yong Kwon; Jae-Hwa Chung; Taeweon Suh; Won Gyu Lee; Kyeong Hury

In this paper, we propose a novel proactive routing protocol, referred to as potential management based proactive routing (PMPR), for mobile ad hoc networks. Unlike other proactive routing protocols, PMPR performs request based routing recovery for proactive route maintenance. When a node has lost the routing information, it attempts a local route recovery by broadcasting a request message to neighbor nodes within a limited hop range. If the local recovery succeeds, the routing information is reconstructed by the interaction between the requesting node and the neighbor nodes. In this paper, we introduce a concept of potential and propose an efficient management method of potential. Potential is a value assigned to each node for each destination. Routes are determined based on the potential of each node. When a node is requested to perform a route recovery and the recovery is feasible, the node modifies its potential to a lower value to provide the requestor a new route. A potential management method determines the success rate of the local route recovery and consequent route optimality. In our simulation with a moderate node density and high node mobility, over 95% of broken routes are recovered with a 1-hop request. PMPR outperforms DVDS for all the simulated parameters. PMPR also outperforms AODV and DSR under high node mobility and high traffic load conditions. Under low node mobility or low data rate conditions, PMPR provides comparable performance to AODV and DSR.
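The idea of routing on per-destination potentials can be sketched as follows. This is an illustrative reading of the abstract, not the PMPR algorithm itself: the names `next_hop` and `route`, the dictionary layout, and the rule "forward to the neighbor with the lowest potential that is below your own" are assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical layout: potential[node][destination] -> numeric potential.
Potential = Dict[str, Dict[str, float]]
Neighbors = Dict[str, List[str]]


def next_hop(node: str, dest: str, neighbors: Neighbors,
             potential: Potential) -> Optional[str]:
    """Pick the neighbor with the lowest potential toward `dest`.

    A node that has lost its own entry (None) accepts any neighbor that
    still holds one, loosely mimicking a 1-hop recovery request.
    """
    own = potential.get(node, {}).get(dest)
    candidates = [
        n for n in neighbors.get(node, [])
        if dest in potential.get(n, {})
        and (own is None or potential[n][dest] < own)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda n: potential[n][dest])


def route(src: str, dest: str, neighbors: Neighbors,
          potential: Potential, max_hops: int = 16) -> Optional[List[str]]:
    """Follow strictly decreasing potentials from `src` until `dest`."""
    path = [src]
    while path[-1] != dest:
        if len(path) > max_hops:
            return None
        hop = next_hop(path[-1], dest, neighbors, potential)
        if hop is None:
            return None
        path.append(hop)
    return path


neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
potential = {"A": {"D": 2.0}, "B": {"D": 1.0}, "C": {"D": 1.5}, "D": {"D": 0.0}}
path_to_d = route("A", "D", neighbors, potential)
```

Because each hop must have a strictly lower potential than the previous one, a path built this way cannot loop; the `max_hops` guard only bounds the first hop from a node that has lost its entry.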

The Journal of Supercomputing | 2010

Adaptive service scheduling for workflow applications in Service-Oriented Grid

Sung Ho Chin; Taeweon Suh; Heon Chang Yu

When a workflow application is executed in a Service-Oriented Grid (SOG), performance issues such as service scheduling should be considered to achieve high and stable performance in execution. However, most prior work on workflow management neither studies these performance issues nor provides evaluation methodologies for the performance of Grid Services. Therefore, it is infeasible to apply that work to the service scheduling problem in SOG. In this paper, we propose and model evaluation metrics for the Grid Service performance. The metrics are extracted based on common properties of Grid Services and are used to quantify and evaluate the performance of an individual Grid Service. With these metrics, we develop a service scheduling scheme with a list scheduling heuristic, to choose proper and optimal Grid Services for tasks in workflow applications. It ensures high performance in the execution of the workflow applications. In addition, we propose a low-overhead rescheduling method, referred to as Adaptive List Scheduling for Service (ALSS), to adapt to the dynamic nature of a grid environment. ALSS provides stable performance for workflow applications, even in abnormal circumstances. Finally, we design an experimental environment with actual traces and perform simulations to quantify the benefits of our approach. Throughout the experiments, we demonstrate that ALSS outperforms conventional scheduling methods. Our scheme produces a scheduling performance that is superior to AHEFT by 50.2%, SLACK by 50.8%, HEFT by 68.3%, MaxMin by 72.0%, MinMin by 71.0%, and Myopic by 69.8%.
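For readers unfamiliar with list scheduling, a generic greedy version is sketched below. It is not ALSS and does not use the paper's metrics: the function `list_schedule`, the "speed" model of a service, and the assumption that tasks arrive already sorted by priority with no dependencies are all simplifications made here.

```python
from typing import Dict, List, Tuple


def list_schedule(tasks: List[Tuple[str, float]],
                  services: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """Assign each task, in list order, to the service that finishes it first.

    tasks:    (task name, amount of work), assumed pre-sorted by priority.
    services: service name -> speed in work units per time unit (hypothetical).
    Returns (task, chosen service, finish time) triples.
    """
    ready_at = {name: 0.0 for name in services}
    plan = []
    for task, work in tasks:
        # Earliest-finish-time choice across all candidate services.
        best = min(services, key=lambda s: ready_at[s] + work / services[s])
        finish = ready_at[best] + work / services[best]
        ready_at[best] = finish
        plan.append((task, best, finish))
    return plan


plan = list_schedule(
    [("t1", 4.0), ("t2", 6.0), ("t3", 2.0)],
    {"fast": 2.0, "slow": 1.0},
)
```

A rescheduling method in the spirit of the abstract would rerun a step like this when observed service performance drifts from the estimate; that adaptive part is not modeled here.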

international conference on cloud computing | 2014

PFC: Privacy Preserving FPGA Cloud - A Case Study of MapReduce

Lei Xu; Weidong Shi; Taeweon Suh

Privacy is one of the critical concerns that hinder the adoption of public cloud. For storage, encryption can be used to protect users' data. But for outsourced data processing, for example MapReduce, there is no satisfying solution. Users have to trust the cloud service providers totally. In this work, we propose PFC, an FPGA cloud for privacy preserving computation in the public cloud environment. PFC leverages the security feature of the existing FPGAs originally designed for bitstream IP protection and proxy re-encryption for preserving user data privacy. In PFC, cloud service providers are not necessarily trusted, and during outsourced computation, users' data is protected by a data encryption key only accessible by trusted FPGA devices. As an important application of cloud computing, we apply PFC to the popular MapReduce programming model and extend the FPGA based MapReduce pipeline with privacy protection capabilities. Proxy re-encryption is employed to support dynamic allocations of trusted FPGA devices as mappers and reducers. Finally, we conduct evaluation to demonstrate the effectiveness of PFC.

Information Systems Frontiers | 2014

Scalable and leaderless Byzantine consensus in cloud computing environments

JongBeom Lim; Taeweon Suh; Joon-Min Gil; HeonChang Yu

Traditional Byzantine consensus in distributed systems requires n ≥ 3f + 1, where n is the number of nodes and f is the number of Byzantine (faulty) nodes. In this paper, we present a scalable and leaderless Byzantine consensus implementation based on gossip, requiring only n ≥ 2f + 1 nodes. Unlike conventional distributed systems, the network topology of cloud computing systems is often not fully connected, but loosely coupled and layered. Hence, we revisit the Byzantine consensus problem in cloud computing environments, in which each node maintains some number of neighbors, called local view. The message complexity of our Byzantine consensus scheme is O(n), instead of O(n²). Experimental results and correctness proof show that our Byzantine consensus scheme can solve the Byzantine consensus problem safely in a scalable way without a bottleneck and a leader in cloud computing environments.
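The two bounds and the two message complexities quoted in this abstract can be compared with a little arithmetic. The sketch below is only an illustration: the function names are invented, f is taken to be the number of Byzantine nodes, all-to-all exchange is modeled as n(n-1) messages per round, and gossip is modeled as each node contacting a fixed number of neighbors (`fanout`), which is an assumption rather than the paper's exact scheme.

```python
def min_nodes_traditional(f: int) -> int:
    """Smallest n satisfying n >= 3f + 1."""
    return 3 * f + 1


def min_nodes_gossip(f: int) -> int:
    """Smallest n satisfying n >= 2f + 1, the bound quoted for the gossip scheme."""
    return 2 * f + 1


def messages_all_to_all(n: int) -> int:
    """Every node sends to every other node: O(n^2) messages per round."""
    return n * (n - 1)


def messages_gossip(n: int, fanout: int = 3) -> int:
    """Every node sends to a fixed-size local view: O(n) messages per round."""
    return n * fanout


for f in (1, 2, 5):
    n_old, n_new = min_nodes_traditional(f), min_nodes_gossip(f)
    print(f, n_old, n_new, messages_all_to_all(n_old), messages_gossip(n_new))
```

The gap widens quickly: the quadratic term grows with the square of the cluster size, while the gossip count grows only linearly for a fixed local view.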
