Won-Young Chung | Researchain

Publication

Featured research published by Won-Young Chung.

The Journal of Korean Institute of Communications and Information Sciences | 2011

Study of Parallel Network Processor using Global Cache

Jaewon Park; Won-Young Chung; Hyun-Pil Kim; Jung-Hee Lee; Yong-Surk Lee

The amount of Internet traffic is increasing because of the use of Broadband Convergence Networks (BcN). Traffic is also growing with the development of applications, especially multimedia traffic from IPTV, VOD, and online games. This multimedia traffic not only carries a huge payload but must also be handled in real time. For this reason, this study examines how routers can distribute bandwidth according to traffic properties. Classifying traffic by its properties requires analysis of the application layer. However, the general network processor architecture processes layers L2-4 and L7 serially. We propose a novel parallel network processor architecture with a global cache that processes L2-4 and L7 in parallel. To verify the proposed architecture, we simulated both architectures with SystemC. EEMBC and SNORT were used to measure L2-4 and L7 processing time. When multimedia traffic entered the network processor in the same flow, the proposed architecture showed about 85% higher performance than the general architecture.

asia-pacific conference on communications | 2011

Implementing and optimizing ROM table for broadcast message used in MPI unit

Sang-Su Park; Heejun Yun; Won-Young Chung; Yong-Surk Lee

In this paper, we propose implementing and optimizing a ROM table that contains a set of point-to-point communications for broadcast communication in message passing interface (MPI) systems on distributed-memory multiprocessors. MPI broadcast is one of the most frequently used collective functions. The contents of the ROM table are parts of the message packets for a set of point-to-point communications that implement a broadcast, determined by considering the states of every processing node. Thus, the table can prevent the sending node from transmitting data to a node that is in a busy state, which minimizes the performance degradation caused by conflicts. The broadcast communication is based on a binary tree algorithm. Since each processing node owns a copy of the same ROM table, the size of the table needs to be optimized. The states of all the processing nodes and the node's own identification number are required to index the ROM table correctly. By optimizing the ROM table, the bit size is reduced by about 25% with four nodes, about 75% with eight nodes, and about 81% with 16 nodes.
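As an illustration of how a tree-based broadcast decomposes into point-to-point sends, here is a minimal Python sketch. It assumes a binomial-tree schedule, a common way to implement MPI broadcast; the abstract only says "binary tree algorithm", so the exact tree used in the paper may differ. The function name `binomial_broadcast_schedule` is hypothetical, and the sketch ignores node busy states and the ROM table encoding entirely.

```python
def binomial_broadcast_schedule(num_nodes, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Assumption: binomial-tree broadcast. In each round, every node that
    already holds the data forwards it to one node that does not.
    """
    rounds = []
    step = 1
    while step < num_nodes:
        pairs = []
        # Relative ranks 0..step-1 already hold the data at this point.
        for rel in range(step):
            dst = rel + step
            if dst < num_nodes:
                sender = (rel + root) % num_nodes
                receiver = (dst + root) % num_nodes
                pairs.append((sender, receiver))
        rounds.append(pairs)
        step *= 2
    return rounds


if __name__ == "__main__":
    for i, pairs in enumerate(binomial_broadcast_schedule(8)):
        print(f"round {i}: {pairs}")
```

With eight nodes this schedule finishes in three rounds, and every non-root node receives the message exactly once. A table such as the one described in the abstract would store precomputed schedules of this kind, selected by node states and node ID.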

The Journal of Korean Institute of Communications and Information Sciences | 2011

The Design of MPI Hardware Unit for Enhanced Broadcast Communication

Heejun Yun; Won-Young Chung; Yong-Surk Lee

This paper proposes an algorithm and hardware architecture for broadcast communication, which is the worst bottleneck in multiprocessors that use distributed memory architectures. In conventional systems, collective communication is converted into point-to-point communications by the MPI library cell without considering the state of each processing node's communication port, which indicates whether the node is busy or free. If conflicting point-to-point communications occur during a broadcast, the transmission speed of the broadcast decreases. This paper therefore proposes an algorithm that determines the order of the point-to-point communications for a broadcast according to the state of each processing node. The proposed algorithm decreases total broadcast communication time by transmitting the message preferentially to processing nodes whose communication port is free. The proposed MPI unit for broadcast communication was evaluated by modeling it in SystemC. It improved broadcast communication performance by up to 78% with 16 nodes. This result shows that the proposed algorithm is useful for improving the overall performance of MPSoC.
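The core idea, sending first to nodes whose port is free, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `order_sends` and the `port_busy` mapping are hypothetical names, and the paper's hardware unit may use a more elaborate policy than a simple free-before-busy ordering.

```python
def order_sends(destinations, port_busy):
    """Order destinations so that nodes with a free port come first.

    destinations: list of node IDs that must receive the message.
    port_busy:    dict mapping node ID -> True if its port is busy.
    Python's sort is stable, so the original order is kept within
    the free group and within the busy group.
    """
    return sorted(destinations, key=lambda node: port_busy[node])


if __name__ == "__main__":
    busy = {1: True, 2: False, 3: True, 4: False}
    print(order_sends([1, 2, 3, 4], busy))
```

Deferring the busy nodes gives their ports time to become free before the sender reaches them, which is where the reduction in conflicts would come from.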

The Journal of Korean Institute of Communications and Information Sciences | 2011

The Design of Hardware MPI Units for MPSoC

Ha-young Jeong; Won-Young Chung; Yong-Surk Lee

In this paper, we propose a novel hardware MPI (Message Passing Interface) unit that supports message passing in multiprocessor systems that use a distributed memory architecture. The MPI hardware unit handles data synchronization, transmission, and completion, and it supports non-blocking processor operation, which reduces synchronization overhead. Additionally, the MPI hardware unit combines a ready entry, a request entry, and a reserve entry, which save and manage the synchronized messages, and it performs multiple outstanding issue and out-of-order completion. According to BFM (Bus Functional Model) simulation results, performance increases by 25% for many-to-many communication. We designed the MPI unit in HDL and synthesized it with Synopsys Design Compiler using a MagnaChip synthesis library, and then made a prototype chip. The proposed message transmission interface hardware delivers high performance relative to its increase in size. Thus, considering low-cost design and scalability, the MPI hardware unit is useful for increasing the overall performance of embedded MPSoC (Multi-Processor System-on-Chip).

international conference on hybrid information technology | 2009

An implementation of the CQS supporting multimedia traffic

Jinsil Kim; Won-Young Chung; Jung-Hee Lee; Yong-Surk Lee

In this paper, we propose a CQS (Calendar Queue Scheduler) architecture designed for processing multimedia and timing traffic in home networks. With the varied characteristics of the growing traffic flowing into the home, such as VoIP, VOD, IPTV, and best-effort traffic, the need to manage QoS (Quality of Service) is being discussed. Grouping traffic by application or service is an effective way to guarantee QoS under restricted conditions. The proposed design targets a home gateway, which corresponds to the receiver endpoint of end-to-end QoS, and is suited to supporting multimedia traffic within restricted network resources while optimizing queue sizes. We present the CQS architecture implemented in synthesizable Verilog. We estimated the area of each module and each memory. The area of each module is referenced to a NAND (2x1) gate (11.09) when synthesized with MagnaChip 0.18 CMOS libraries through Synopsys Design Compiler. We verified that memory accounts for 85.38% of the entire CQS. Each memory size was extracted with CACTI 5.3 (in mm²). As the day size increases, the total area grows at an increasing rate. As the number of memory entries increases, the memory area grows progressively faster, and the day size defined for one year clearly affects the total CQS area. Although the CQS is suited to rate control and delay control, its biggest problem is the increased memory size. In this paper, we discussed the design methodology and operation of each module when implementing the CQS in hardware. It is therefore important to define the number of priorities and the day size for one year carefully.
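For readers unfamiliar with calendar queues, the following Python sketch shows the general data structure in software: a circular array of "days" (buckets) that together cover one "year". It is a simplified illustration, not the paper's Verilog design; the class `CalendarQueue` and its parameters `num_days` and `day_width` are assumptions made for this example, and priorities are left out.

```python
from collections import deque


class CalendarQueue:
    """Minimal calendar queue: packets are bucketed by scheduled send time."""

    def __init__(self, num_days, day_width):
        self.num_days = num_days          # number of buckets in one "year"
        self.day_width = day_width        # time span covered by one bucket
        self.days = [deque() for _ in range(num_days)]
        self.current_time = 0             # always a multiple of day_width

    def enqueue(self, packet, send_time):
        # Late packets are scheduled for the current day.
        send_time = max(send_time, self.current_time)
        # Anything a full year or more ahead would wrap onto an earlier day.
        if send_time - self.current_time >= self.num_days * self.day_width:
            raise ValueError("send_time is beyond the calendar year")
        slot = (send_time // self.day_width) % self.num_days
        self.days[slot].append(packet)

    def advance(self):
        """Return the packets due in the current day, then move to the next day."""
        slot = (self.current_time // self.day_width) % self.num_days
        due = list(self.days[slot])
        self.days[slot].clear()
        self.current_time += self.day_width
        return due


if __name__ == "__main__":
    cq = CalendarQueue(num_days=4, day_width=10)
    cq.enqueue("a", 25)
    cq.enqueue("b", 3)
    cq.enqueue("c", 27)
    for _ in range(3):
        print(cq.advance())
```

In this sketch the storage scales with `num_days` and with the per-day queue depth, which mirrors the abstract's point that the day size chosen for one year drives the memory area.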

IEICE Transactions on Information and Systems | 2011