Ren-Song Tsay | Researchain

Publication

Featured researches published by Ren-Song Tsay.

asia and south pacific design automation conference | 2010

Source-level timing annotation for fast and accurate TLM computation model generation

Kai-Li Lin; Chen Kang Lo; Ren-Song Tsay

This paper proposes a source-level timing annotation method for generating accurate transaction-level models of software computation modules. While the Transaction Level Modeling (TLM) approach is now widely adopted for system modeling and for improving simulation speed, timing estimation accuracy is often compromised. To obtain reliable and accurate estimates at the system level, we propose a timing annotation method for accurate TLM computation model generation that accounts for processor architectures with pipeline and cache structures, which are challenging to model but critical to accurate timing estimation. The experiments show that our results are within 2% of cycle-accurate results and that the approach is three orders of magnitude faster than conventional ISS approaches.
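As a rough illustration of source-level timing annotation in general (not the paper's tool), the sketch below instruments each basic block of a small function with a cycle cost and accumulates the total as the code runs natively. The block names, the cycle values in `BLOCK_CYCLES`, and the `CycleCounter` helper are all hypothetical; in a real flow the costs would presumably be derived from the compiled target code, including pipeline and cache effects.

```python
# Hypothetical per-basic-block cycle costs. These numbers are made up for
# illustration; they are not taken from the paper.
BLOCK_CYCLES = {"entry": 4, "loop_body": 7, "exit": 2}


class CycleCounter:
    """Accumulates the estimated cycles of the annotated basic blocks."""

    def __init__(self):
        self.cycles = 0

    def annotate(self, block):
        # Add the pre-computed cost of one executed basic block.
        self.cycles += BLOCK_CYCLES[block]


def annotated_sum(values, counter):
    """Functional code with one timing annotation per basic block."""
    counter.annotate("entry")
    total = 0
    for v in values:
        counter.annotate("loop_body")
        total += v
    counter.annotate("exit")
    return total


counter = CycleCounter()
result = annotated_sum([1, 2, 3], counter)
print(result, counter.cycles)
```

The functional result is computed at native speed, while the timing estimate falls out of the annotations: one entry block, three loop iterations, and one exit block.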

embedded software | 2009

An effective synchronization approach for fast and accurate multi-core instruction-set simulation

Meng-Huan Wu; Cheng-Yang Fu; Peng-Chih Wang; Ren-Song Tsay

This paper proposes a synchronization approach for fast and accurate Multi-Core Instruction-Set Simulation (MCISS). An ideal MCISS should run accurately in a real-time fashion. To achieve accurate simulation results, a lock-step approach, which synchronizes every cycle, is commonly used. However, this approach introduces immense overhead and lowers the simulation speed. Instead of synchronizing every cycle, our approach synchronizes the MCISS based on the data dependency among the simulated programs. The synchronization overhead is therefore greatly reduced while accurate simulation results are still ensured. With the proposed approach, the simulation speed of MCISS generally ranges from 40 to 1,000 million instructions per second (MIPS).
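The contrast between lock-step and dependency-based synchronization can be sketched with a toy model. This is a simplified, sequential illustration under stated assumptions, not the authors' simulator: every instruction is assumed to cost one cycle, each trace entry is either `"local"` or `"shared"`, and the function names are hypothetical.

```python
def lock_step(traces):
    """Reference model: synchronize all cores at every cycle."""
    length = max(len(tr) for tr in traces)
    order = []
    for t in range(length):
        for core, tr in enumerate(traces):
            if t < len(tr) and tr[t] == "shared":
                order.append((t, core))
    return order, length  # one synchronization per simulated cycle


def dependency_sync(traces):
    """Synchronize only at shared-memory accesses."""
    n = len(traces)
    pc = [0] * n  # next instruction index, which equals the local clock
    order, syncs = [], 0
    while any(pc[c] < len(traces[c]) for c in range(n)):
        # Each core runs ahead freely until its next shared access.
        for c in range(n):
            while pc[c] < len(traces[c]) and traces[c][pc[c]] == "local":
                pc[c] += 1
        # Among cores stalled at a shared access, the one with the smallest
        # local clock proceeds first (ties broken by core id), so shared
        # accesses keep the same order as in lock-step simulation.
        waiting = [(pc[c], c) for c in range(n) if pc[c] < len(traces[c])]
        if waiting:
            t, c = min(waiting)
            order.append((t, c))
            pc[c] += 1
            syncs += 1
    return order, syncs


traces = [
    ["local", "local", "shared", "local", "local", "local"],
    ["local", "shared", "local", "local", "shared", "local"],
]
ref_order, ref_syncs = lock_step(traces)
dep_order, dep_syncs = dependency_sync(traces)
print(ref_order == dep_order, ref_syncs, dep_syncs)
```

In this example both models produce the same ordering of shared accesses, but the dependency-based version synchronizes three times instead of six, once per shared access rather than once per cycle.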

design, automation, and test in europe | 2010

Automatic generation of software TLM in multiple abstraction layers for efficient HW/SW co-simulation

Meng-Huan Wu; Wen-Chuan Lee; Chen-Yu Chuang; Ren-Song Tsay

This paper proposes a novel software Transaction-Level Modeling (TLM) approach for efficient HW/SW co-simulation. In HW/SW co-simulation, timing synchronization between the hardware and software simulations is required to preserve their concurrency. However, handling timing synchronization improperly either slows down the simulation or sacrifices simulation accuracy. Our approach performs timing synchronization only at the points of HW/SW interaction, so accurate simulation results can be achieved efficiently. Furthermore, we define three abstraction levels of software TLM models based on the type of interactions captured. Given the target software, the software TLM models can be automatically generated in multiple abstraction layers. The experimental results show that our software TLM models attain 3 million instructions per second (MIPS) for low-level abstraction and go as high as 248 MIPS for higher-level abstraction. Therefore, designers can achieve efficient co-simulation by selecting a proper layer according to the abstraction of the corresponding hardware components.

international conference on hardware/software codesign and system synthesis | 2009

Cycle count accurate memory modeling in system level design

Yi-Len Lo; Mao Lin Li; Ren-Song Tsay

In this paper, we propose an effective automatic generation approach for a Cycle-Count Accurate Memory Model (CCAMM) from the Clocked Finite State Machine (CFSM) of the Cycle Accurate Memory Model (CAMM). Since memory accesses are gradually dominating system activities, a correct and efficient memory timing model is essential to system-level simulation. In general, a CCAMM provides sufficient timing accuracy with low simulation overhead, and hence is preferred over the Simple Fixed Delay Model (SFDM), which has low accuracy, or the CAMM, which has low performance. Our proposed approach can systematically generate the CCAMM and guarantee correctness. The experimental results show that the generated model is as accurate as the Register Transfer Level (RTL) model while running 100X faster.
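To make the difference between a cycle-accurate and a cycle-count-accurate model concrete, here is a minimal sketch under strong simplifying assumptions. The clocked FSM below, its state names, and its commands are hypothetical and far simpler than a real memory controller (for instance, latency here does not depend on earlier accesses); the sketch only shows the idea of deriving per-access cycle counts from the FSM once instead of stepping it on every clock.

```python
# Hypothetical clocked FSM of a memory: for each command, the next state
# reached on every clock edge, starting from and returning to IDLE.
CFSM = {
    "read": {"IDLE": "ACT", "ACT": "CAS", "CAS": "DATA", "DATA": "IDLE"},
    "write": {"IDLE": "ACT", "ACT": "WR", "WR": "IDLE"},
}


def cycle_accurate_latency(command):
    """Cycle-accurate style: step the FSM one clock at a time."""
    state, cycles = "IDLE", 0
    while True:
        state = CFSM[command][state]
        cycles += 1
        if state == "IDLE":
            return cycles


# Cycle-count-accurate style: walk the FSM once per command ahead of time,
# then answer each access with a table lookup.
CCA_TABLE = {command: cycle_accurate_latency(command) for command in CFSM}


def cycle_count_accurate_latency(command):
    return CCA_TABLE[command]


accesses = ["read", "write", "read"]
total_ca = sum(cycle_accurate_latency(a) for a in accesses)
total_cca = sum(cycle_count_accurate_latency(a) for a in accesses)
print(total_ca, total_cca)
```

Both models report the same total cycle count for the access sequence, but the table-based model does constant work per access instead of work proportional to the access latency.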

design, automation, and test in europe | 2011

Cycle-count-accurate processor modeling for fast and accurate system-level simulation

Chen Kang Lo; Li-Chun Chen; Meng-Huan Wu; Ren-Song Tsay

Ideally, system-level simulation should provide a high simulation speed with sufficient timing details for both functional verification and performance evaluation. However, existing cycle-accurate (CA) and cycle-approximate (CX) processor models suffer either from low simulation speeds due to excessive timing details or from low accuracy due to simplified timing models. To achieve high simulation speeds while maintaining the timing accuracy of the system simulation, we propose the first cycle-count-accurate (CCA) processor modeling approach, which pre-abstracts the internal pipeline and cache into models with accurate cycle count information and guarantees accurate timing and functional behaviors at the processor interface. The experimental results show that the CCA model performs 50 times faster than the corresponding CA model while providing the same execution cycle count information as the target RTL model.

design automation conference | 2011

A high-parallelism distributed scheduling mechanism for multi-core instruction-set simulation

Meng-Huan Wu; Peng-Chih Wang; Cheng-Yang Fu; Ren-Song Tsay

Ideally, multi-core instruction-set simulation should run in parallel to improve simulation performance. However, the conventional low-parallelism centralized scheduler greatly constrains simulation performance. To resolve this issue, we propose a high-parallelism distributed scheduling mechanism. The experimental results show that our proposed approach accelerates simulation by 6 to 20 times, depending on the number of cores.

international conference on computer aided design | 2009

How to consider shorts and guarantee yield rate improvement for redundant wire insertion

Fong-Yuan Chang; Ren-Song Tsay; Wai-Kei Mak

This paper accurately considers wire short defects and proposes an algorithm to guarantee IC chip yield rate improvement for redundant wire insertion. Without considering yield rate degradation caused by shorts, traditional methods may even lead to yield rate loss. However, shorts are more complicated to analyze than opens. Moreover, since any two points of a routed net can be connected by a redundant wire, the number of possible insertion patterns for a chip is intractable. To maximize yield rate improvement and to make the problem tractable, we identify a key insight, the tolerance ratio, as an effective guide for choosing insertion patterns and insertion order. Finally, to guarantee yield rate improvement, only positive-gain redundant wires are committed. Experimental results show that, compared with unprocessed cases, all yield rate improvements from the proposed algorithm are positive, and the defect rates are reduced by up to 65% and by 24% on average. In contrast, without considering shorts, the defect rate can increase by as much as 7%.

design, automation, and test in europe | 2011

DOM: A Data-dependency-Oriented Modeling approach for efficient simulation of OS preemptive scheduling

Peng-Chih Wang; Meng-Huan Wu; Ren-Song Tsay

Operating system (OS) models are widely used to alleviate the overwhelming complexity of running system-level simulation of software applications on a specific OS implementation. Nevertheless, current OS modeling approaches are unable to maintain both simulation speed and accuracy when dealing with preemptive scheduling. This paper proposes a Data-dependency-Oriented Modeling (DOM) approach. By guaranteeing the order of shared variable accesses, accurate simulation results are obtained. Meanwhile, the simulation effort of our approach is considerably less than that of the conventional Cycle-Accurate (CA) modeling approach. Our experimental results show simulation speeds of 42 to 223 million instructions per second (MIPS), or 114 times faster than CA modeling.

ACM Transactions on Embedded Computing Systems | 2013

A distributed timing synchronization technique for parallel multi-core instruction-set simulation

Meng-Huan Wu; Cheng-Yang Fu; Peng-Chih Wang; Ren-Song Tsay

As multi-core architecture has become mainstream, corresponding multi-core instruction-set simulation (MCISS) is also needed to aid system development. Ideally, an MCISS should run in parallel to enhance the simulation speed. However, the conventional centralized timing synchronization mechanism greatly constrains the parallelism of an MCISS, so the simulation speed is bounded. To resolve this issue, we propose a new distributed timing synchronization technique which allows higher parallelism for an MCISS. Compared with the centralized synchronization approach, it accelerates simulation by 9 to 20 times as the number of cores increases.

asia and south pacific design automation conference | 2011

Cut-demand based routing resource allocation and consolidation for routability enhancement

Fong-Yuan Chang; Sheng-Hsiung Chen; Ren-Song Tsay; Wai-Kei Mak

To successfully route a design, one essential requirement is to allocate sufficient routing resources. In this paper, we show that allocating routing resources based on horizontal and vertical (H/V) cut-demands can greatly improve routability, especially for designs with thin areas. We then derive methods to predict the maximum H/V cut-demands and propose two cut-demand based approaches: one allocates routing resources considering the maximum H/V cut-demands, and the other consolidates fragmented metal-1 routing resources for effective resource utilization. Experimental results demonstrate that the resource allocation method can precisely determine design areas and that the resource consolidation method can significantly improve routability. With better routability, routing is about 5 times faster on average and the design area can be further reduced by 2–15%.
