I-Wei Wu

National Chiao Tung University

Publication

Featured research published by I-Wei Wu.

design automation conference | 2008

ETAHM: an energy-aware task allocation algorithm for heterogeneous multiprocessor

Po-Chun Chang; I-Wei Wu; Jean Jyh-Jiun Shann; Chung-Ping Chung

Driven by the demand for more computing power and lower energy use, multiprocessors with power management facilities are emerging in embedded system design. Dynamic voltage scaling (DVS) is one such facility: it varies clock speed and supply voltage to save energy. In this paper, we propose ETAHM to allocate tasks on a target multiprocessor system. In pursuit of a globally optimal solution, it combines task scheduling, mapping and DVS utilization in one phase and couples them with an ant colony optimization algorithm. Extensive experiments show that ETAHM can save 22.71% more energy than CASPER (V. Kianzad et al., 2005), a state-of-the-art integrated framework that tackles the same problem with a genetic algorithm instead.

IEEE Transactions on Consumer Electronics | 2002

A seamless handoff approach of Mobile IP protocol for mobile wireless data networks

I-Wei Wu; Wen-Shiung Chen; Ho-En Liao; Fongray Frank Young

With recent advances in wireless communication technology, mobile computing is an increasingly important area of research. Enabling mobility in IP networks is a significant issue for making use of many portable devices appearing on the Internet. The IP mobility support being standardized in the IETF utilizes tunneling of IP packets from a home agent to a foreign agent to make the mobility transparent to the higher layer. We propose an approach to optimize the routing path and avoid the triangular routing problem in IP mobility, which is an extension of the Mobile IP architecture. We propose an efficient handoff scheme in which a routing table, called the mobile routing table (MRT), is designed in each edge router such as home agent, foreign agent and general router. A packet retransmission scheme is also proposed to reduce the packet loss during handoff. We analyze and compare both the standard Mobile IP and the proposed seamless handoff approach. Finally, the simulation results are presented.

design, automation, and test in europe | 2008

Instruction set extension exploration in multiple-issue architecture

I-Wei Wu; Zhi-Yuan Chen; Jean Jyh-Jiun Shann; Chung-Ping Chung

To satisfy the demand for high-performance computing in modern embedded devices, current embedded processor architectures allow designers either to define a customized instruction set extension (ISE) or to increase the instruction issue width. Previous studies have shown that deploying ISE in a multiple-issue architecture can significantly improve performance. However, identifying ISEs for a multiple-issue architecture with current ISE exploration algorithms wastes silicon area and limits the performance improvement. This is because most algorithms overlook two important considerations: (1) only packing the operations lying on the critical path into an ISE can improve performance; (2) the critical path usually changes after operations are packed into an ISE. With these considerations, this paper presents an algorithm for ISE exploration based on list scheduling and Ant Colony Optimization (ACO), which combines ISE exploration with critical path identification (i.e., instruction scheduling). Results indicate that our approach outperforms previous work in both performance improvement and area efficiency.
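The two considerations in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm (there is no list scheduling or ACO here); it only computes the critical path length of a small dependence graph before and after packing two operations into one ISE. The operation names, unit latencies, and the helper `critical_path_length` are all hypothetical.

```python
from functools import lru_cache


def critical_path_length(latency, succs):
    """Length of the longest latency-weighted path in a dependence DAG.

    latency: dict mapping operation name -> latency in cycles
    succs:   dict mapping operation name -> list of dependent operations
    """
    @lru_cache(maxsize=None)
    def longest_from(op):
        # Latency of this op plus the longest chain among its successors.
        tails = (longest_from(s) for s in succs.get(op, ()))
        return latency[op] + max(tails, default=0)

    return max(longest_from(op) for op in latency)


# Hypothetical basic block: a -> b -> d and c -> d, one cycle per operation.
BASE_LATENCY = {"a": 1, "b": 1, "c": 1, "d": 1}
BASE_SUCCS = {"a": ["b"], "b": ["d"], "c": ["d"]}

# Same block after packing a and b (both on the critical path) into a
# single-cycle ISE named "ab" (an assumed latency, for illustration only).
ISE_LATENCY = {"ab": 1, "c": 1, "d": 1}
ISE_SUCCS = {"ab": ["d"], "c": ["d"]}

before = critical_path_length(BASE_LATENCY, BASE_SUCCS)
after = critical_path_length(ISE_LATENCY, ISE_SUCCS)
```

In this toy graph the critical path shrinks from `before` to `after`, and the path through `c` becomes just as long as the path through the new ISE, which is the kind of shift in the critical path that the abstract's second consideration describes.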

annual acis international conference on computer and information science | 2013

Improving performance of JNA by using LLVM JIT compiler

Yu-Hsin Tsai; I-Wei Wu; I-Chun Liu; Jean Jyh-Jiun Shann

Java Native Access (JNA) has been proposed to alleviate the burden of programming in Java Native Interface (JNI). JNA allows programmers to call native functions without writing any JNI code. However, JNA suffers from some performance degradation. To overcome this problem, we modify the JNA source code and integrate the LLVM JIT compiler into JNA to improve performance. Our experiments achieve about 8% to 16% performance improvement for calling a native function with different types and numbers of arguments. Furthermore, our design is a non-traditional way of using a runtime compiler, and the challenges we encountered may help other researchers facing similar situations.

high performance embedded architectures and compilers | 2007

Instruction set extension generation with considering physical constraints

I-Wei Wu; Shih-Chia Huang; Chung-Ping Chung; Jean Jyh-Jiun Shann

In this paper, we propose new algorithms for both ISE exploration and selection that consider important physical constraints such as pipestage timing, instruction set architecture (ISA) format, silicon area and register file. To handle these considerations, an ISE exploration algorithm is proposed. It explores not only ISE candidates but also their implementation options, minimizing execution time while using less silicon area. In ISE selection, many studies take only silicon area into account, which is not comprehensive. In this paper, we formulate ISE selection as a multiconstrained 0-1 knapsack problem so that it can consider multiple constraints. Results with MiBench indicate that, under the same number of ISEs, our approach achieves 69.43%, 1.26% and 33.8% (max., min. and avg., respectively) further reduction in silicon area and also has at most 1.6% performance improvement compared with the previous one.
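To make the multiconstrained 0-1 knapsack formulation concrete, here is a minimal sketch that selects ISE candidates by exhaustive search under two budgets. It is an illustration only, not the solver used in the paper: the candidate names, gains, costs, the two budgets (area units and free opcode slots), and the function `select_ises` are all assumptions, and brute force is practical only for a handful of candidates.

```python
from itertools import combinations


def select_ises(candidates, budgets):
    """Pick the subset of candidates with maximum total gain such that,
    for every constraint i, the summed cost does not exceed budgets[i].

    candidates: list of (name, gain, costs), costs aligned with budgets
    budgets:    tuple of per-constraint limits
    Returns (best_gain, tuple_of_selected_names).
    """
    best_gain, best_names = 0, ()
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            # Sum each cost dimension separately (one sum per constraint).
            totals = [sum(c[2][i] for c in subset) for i in range(len(budgets))]
            if all(t <= b for t, b in zip(totals, budgets)):
                gain = sum(c[1] for c in subset)
                if gain > best_gain:
                    best_gain = gain
                    best_names = tuple(c[0] for c in subset)
    return best_gain, best_names


# Hypothetical candidates: (name, cycles saved, (area units, opcode slots)).
CANDIDATES = [
    ("ise_a", 12, (4, 1)),
    ("ise_b", 7, (3, 1)),
    ("ise_c", 5, (2, 1)),
    ("ise_d", 3, (1, 0)),
]
BUDGETS = (6, 2)  # assumed limits: 6 area units, 2 free opcode slots

best_gain, best_names = select_ises(CANDIDATES, BUDGETS)
```

Each candidate is either taken or not (the 0-1 part), and a subset is feasible only if it fits every budget at once (the multiconstrained part). A real flow would replace the exhaustive loop with dynamic programming or an ILP solver.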

ACM Transactions on Embedded Computing Systems | 2014

Extended Instruction Exploration for Multiple-Issue Architectures

I-Wei Wu; Jean Jyh-Jiun Shann; Wei-Chung Hsu; Chung-Ping Chung

In order to satisfy the growing demand for high-performance computing in modern embedded devices, several architectural and microarchitectural enhancements have been implemented in processor architectures. Extended instruction (EI) is often used for architectural enhancement, while issuing multiple instructions is a common approach for microarchitectural enhancement. The impact of combining both of these approaches in the same design is not well understood. While previous studies have shown that EI can potentially improve performance in some applications on certain multiple-issue architectures, the algorithms used to identify EI for multiple-issue architectures yield only limited performance improvement. This is because not all arithmetic operations are suited for EI for multiple-issue architectures. To explore the full potential of EI for multiple-issue architectures, two important factors need to be considered: (1) the execution performance of an application is dominated by critical (located on the critical path) and highly resource-contentious (i.e., having a high probability of being delayed during execution due to hardware resource limitations) operations, and (2) an operation may become critical and/or highly resource contentious after some operations are added to the EI. This article presents an EI exploration algorithm for multiple-issue architectures that focuses on these two factors. Simulation results show that the proposed algorithm outperforms previously published algorithms.

ubiquitous intelligence and computing | 2015

Instruction Emulation and OS Supports of a Hybrid Binary Translator for x86 Instruction Set Architecture

I-Chun Liu; I-Wei Wu; Jean Jyh-Jiun Shann

Binary translation is one of the most important techniques of virtualization. The main purpose of a binary translator (BT) is to translate an executable from a source instruction set architecture (ISA) to a target ISA. Traditionally, there are two types of binary translators: static binary translators (SBT) and dynamic binary translators (DBT). In recent years, a new type of BT called the hybrid binary translator (HBT) was proposed. It first translates the source executable statically; then, at run time, if execution of the target executable raises an exception because it reaches statically untranslated code, it switches to the attached dynamic translator to translate that code. Therefore, an HBT may combine the good performance of an SBT with the easy handling of code discovery and code location problems of a DBT. A large number of application programs have been developed for x86 platforms, and thus many binary translators have been proposed for the x86 ISA. However, due to the CISC characteristics of x86, for example its variable-length instruction format, previous BTs designed for it usually apply a dynamic translation strategy or make use of profiling data to resolve the code discovery and code location problems. In this paper, we present an HBT which supports the x86 ISA and emulates the execution behavior of an x86 executable under the Linux operating system. In our x86-32 to x86-64 translation experiments, the target executables translated by our HBT outperform those of QEMU on most programs of the EEMBC benchmark suite.

symposium on application specific processors | 2010

Reconfigurable custom functional unit generation and exploitation in multiple-issue processors

Hui-Shan Wang; I-Wei Wu; Jean Jyh-Jiun Shann; Chung-Ping Chung

Next-generation digital entertainment and mobile communication devices are driving the demand for high-performance processing solutions. To meet this demand, multiple-issue processors such as very long instruction word (VLIW) architectures augmented with a reconfigurable hardware accelerator have been proposed in many papers. The reconfigurable hardware accelerator is usually realized by multiple functional units (FUs) organized in matrix fashion, called a reconfigurable customized functional unit (RCFU). Since a multiple-issue processor can execute several data-independent operations simultaneously, executing operations on both the RCFU and the FUs of the base processor concurrently is reasonable and also beneficial for improving hardware resource utilization and execution performance. Based on this observation, we propose an RCFU generation algorithm and an RCFU exploitation algorithm in this paper. In our experiments, an average of 43% further improvement in execution performance is achieved compared with previous works.

computational science and engineering | 2009

Reducing Code Size by Graph Coloring Register Allocation and Assignment Algorithm for Mixed-Width ISA Processor

Jyh-Shian Wang; I-Wei Wu; Yu-Sheng Chen; Jean Jyh-Jiun Shann; Wei-Chung Hsu

Reducing program size is a critical issue in many embedded systems which require more functionalities without increasing the memory size. One of the approaches is to adopt a mixed-width instruction set architecture (ISA), which usually has an instruction set in a general format (usually 32 bits long) as the normal instruction set and an instruction set in a shorter format (usually 16 bits long) with limited opcodes and a limited set of registers. Traditionally, a code segment can be encoded in only one format, with no multiple formats interleaved. However, more and more processors use instruction encoding to indicate the length of each individual instruction and take mixed-width ISA to instruction-level granularity. For this kind of ISA, the number of instructions that can be encoded in the shorter format is highly dependent on the limited set of registers that can be accessed by shorter-format instructions. In this paper, we present a register allocation and assignment algorithm based on graph coloring, which uses a heuristic model to find out which virtual variables in a program should be assigned to the set of registers accessible by shorter instructions. The simulation results show that 63.34% of the instructions can be translated into the shorter format on average.
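The idea of steering variables toward the registers reachable by short instructions can be sketched with a simple greedy coloring. This is not the paper's heuristic model: the priority rule (most-used variables first), the register counts, the interference graph, and the function `assign_registers` are all assumptions made for illustration.

```python
def assign_registers(interference, use_count, num_regs=4, num_short=2):
    """Greedy graph coloring that favors short-format registers.

    Registers 0 .. num_short-1 are assumed to be the only ones that the
    short instruction format can address. Variables are colored in order
    of decreasing use count, each taking the lowest-numbered register not
    used by an already-colored neighbor, so heavily used variables tend
    to land in the short-accessible set.
    """
    order = sorted(interference, key=lambda v: (-use_count[v], v))
    assignment = {}
    for var in order:
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_regs) if r not in taken]
        if not free:
            # A real allocator would spill here; the sketch just stops.
            raise ValueError(f"no register available for {var}")
        assignment[var] = free[0]
    short_vars = sorted(v for v, r in assignment.items() if r < num_short)
    return assignment, short_vars


# Hypothetical interference graph (symmetric) and static use counts.
INTERFERENCE = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
USE_COUNT = {"a": 9, "b": 5, "c": 2, "d": 7}

assignment, short_vars = assign_registers(INTERFERENCE, USE_COUNT)
```

Here the three most-used variables end up in registers 0 and 1, so instructions that touch only those variables could use the short encoding, while the least-used variable falls back to a register outside the short-accessible set.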

Journal of Information Science and Engineering | 2015