Jianwen Zhu | Researchain

Publication

Featured research published by Jianwen Zhu.

Design, Automation, and Test in Europe | 1999

A retargetable, ultra-fast instruction set simulator

Jianwen Zhu; Daniel D. Gajski

In this paper, we present new techniques that further improve static compiled instruction set architecture (ISA) simulation through aggressive utilization of host machine resources. Such utilization is achieved by defining a low-level code generation interface specialized for ISA simulation, in contrast to traditional approaches that use C as the code generation interface. We are able to perform the simulation at a speed of up to 10^2 million simulated instructions per second (MIPS). This result is only 1.1 to 2.5 times slower than native execution on the host machine, the fastest to the best of our knowledge. Furthermore, the code generation interface is organized to implement a RISC-like virtual machine, which makes our tool easily retargetable to many host platforms.

International Conference on Computer Aided Design | 2002

Symbolic pointer analysis

Jianwen Zhu

One of the bottlenecks in the recent movement towards hardware synthesis from behavioral C programs is the difficulty of reasoning about runtime pointer values at compile time. The pointer analysis problem has been investigated in the compiler community for two decades, which has yielded efficient, polynomial-time algorithms for context-insensitive analysis. However, at the accuracy level desired for hardware synthesis, namely context- and flow-sensitive analysis, the time and space complexity of the best algorithms reported grows exponentially with program size. In this paper, we present our first step towards a new analysis technology that potentially leads to algorithms with almost-linear time complexity and sub-linear space complexity, even for the most accurate analysis. The key idea that contributes to this efficiency is to implicitly encode the points-to relation in the Boolean domain, thereby capturing the procedure transfer function completely, compactly, and canonically. This represents a wide departure from traditional techniques, all of which explicitly capture the points-to relation using variations of the points-to graph, which has to be re-evaluated for different calling contexts. Experiments with our first flow-insensitive algorithm on common benchmarks show promising results.

Field Programmable Gate Arrays | 2010

Towards scalable placement for FPGAs

Huimin Bian; Andrew C. Ling; Alexander Choong; Jianwen Zhu

Placement based on simulated annealing is in dominant use in the FPGA community due to its superior quality of results (QoR). However, given the progression of FPGA device capacity to the order of 100K LUTs, the long runtime associated with simulated annealing warrants a revisit of other placement paradigms in the context of FPGAs. In this paper, we attempt to make a rigorous comparison of a recent crop of academic ASIC placers and VPR when applied to modern FPGA device features and design sizes. We also report a new detailed placer, MDP, based on a new problem formulation of maximum bipartite matching. We show that MDP is 3X to 7X faster than the detailed placer in FastPlace, which until now has been the fastest detailed placer publicly available. Furthermore, this speedup occurs while producing comparable or superior QoR. With these results, we speculate on promising research directions towards scalable, high-quality FPGA placement flows that can change the user experience from an overnight wait to a coffee-break wait, even on large benchmarks.
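To illustrate the kind of problem a maximum bipartite matching formulation solves, the following is a minimal sketch that assigns cells to candidate slots using the classic augmenting-path method. It is not MDP's actual formulation, which the abstract does not describe; the function name `max_bipartite_matching` and the cell and slot labels are hypothetical.

```python
def max_bipartite_matching(candidates):
    """Assign each cell to at most one slot, maximizing the number of assigned cells.

    candidates: dict mapping a cell name to the list of slots it may occupy
    (hypothetical input format). Returns a dict mapping cell -> slot.
    """
    slot_owner = {}  # slot -> cell currently holding it

    def try_assign(cell, visited):
        # Look for a free slot, or evict an owner that can move elsewhere.
        for slot in candidates[cell]:
            if slot in visited:
                continue
            visited.add(slot)
            if slot not in slot_owner or try_assign(slot_owner[slot], visited):
                slot_owner[slot] = cell
                return True
        return False

    for cell in candidates:
        try_assign(cell, set())
    return {cell: slot for slot, cell in slot_owner.items()}


if __name__ == "__main__":
    demo = {"a": ["s1", "s2"], "b": ["s1"], "c": ["s2", "s3"]}
    print(max_bipartite_matching(demo))
```

In this toy instance, cell `b` can only use slot `s1`, so the search moves `a` from `s1` to `s2` to make room, which is the augmenting-path step.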

Archive | 1997

Essential Issues in Codesign

Daniel D. Gajski; Jianwen Zhu; Rainer Dömer

In the last ten years, VLSI design technology, and the CAD industry in particular, have been very successful, enjoying exceptional growth that has been paralleled only by the advances in IC fabrication. Since the design problems at the lower levels of abstraction became humanly intractable earlier than those at higher abstraction levels, researchers and the industry alike were forced to devote their attention first to lower-level problems such as physical and logic design. As these problems became more manageable, CAD tools for logic simulation and synthesis were developed successfully and introduced into the design process. As design complexities have grown and time-to-market requirements have shrunk drastically, both industry and academia have begun to focus on system levels of design, since they reduce the number of objects that a designer needs to consider by an order of magnitude and thus allow complex application-specific integrated circuits (ASICs) to be designed and manufactured quickly.

IEEE Transactions on Very Large Scale Integration Systems | 2002

An ultra-fast instruction set simulator

Jianwen Zhu; Daniel D. Gajski

In this paper, we present new techniques that further improve static compilation-based instruction set architecture (ISA) simulation through aggressive utilization of host machine resources. Such utilization is achieved by defining a low-level code-generation interface specialized for ISA simulation, in contrast to traditional approaches that use C as the code-generation interface. We are able to perform the simulation at a speed of up to 10^2 million simulated instructions per second (MIPS) on a 270 MHz Ultra-5 workstation. This result is on average only 1.6 times slower than native execution on the host machine, the fastest to the best of our knowledge.

International Conference on Computer Aided Design | 2004

DynamoSim: a trace-based dynamically compiled instruction set simulator

Wai Sum Mong; Jianwen Zhu

Instruction set simulators are indispensable tools for the architectural exploration and verification of embedded systems. Different techniques have recently been proposed to speed up simulation over classical interpretation-based simulators while maintaining their flexibility. We introduce a suite of techniques inspired by recent advances in dynamic compilers to construct a hybrid simulation framework. Compared with compiled simulators reported earlier, our framework is more flexible, since any instruction can be interpreted. It is also faster, since only frequently executed instructions are translated on the fly into native code for direct execution, the scope of our translation is extended from basic blocks to traces, and sophisticated register allocation is performed. Comprehensive results on SPEC2000 benchmarks are reported for the standard SimpleScalar processor to demonstrate the efficiency of the proposed techniques.

Field-Programmable Logic and Applications | 2010

Parallelizing Simulated Annealing-Based Placement Using GPGPU

Alexander Choong; Rami Beidas; Jianwen Zhu

Simulated annealing has become the de facto standard for FPGA placement engines since it provides high-quality solutions and is robust under a wide range of objective functions. However, this method will soon become prohibitive due to its sequential nature and because the performance of single-core processors has stagnated. General-purpose computing on graphics processing units (GPGPU) offers a promising way to improve runtime with only commodity hardware. In this work, we develop a highly parallel approach to simulated annealing-based placement using GPGPU. We identify the challenges posed by the GPU architecture and describe effective solutions. An average speedup of about 10x was achieved over conventional placement, with wirelength within 3% of the conventional result.
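For readers unfamiliar with the baseline, the following is a minimal sequential sketch of simulated annealing placement, not the GPU approach of the paper. It assumes a swap-based move set, a half-perimeter wirelength cost, and a geometric cooling schedule; all names and parameter values (`wirelength`, `anneal`, `temp`, `cooling`, `moves_per_temp`) are hypothetical. It shows why the method is inherently sequential: each move is evaluated against the state left by the previously accepted move.

```python
import math
import random


def wirelength(pos, nets):
    """Half-perimeter wirelength: bounding-box width plus height, summed over nets."""
    total = 0
    for net in nets:
        xs = [pos[cell][0] for cell in net]
        ys = [pos[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(pos, nets, temp=5.0, cooling=0.95, moves_per_temp=50, seed=0):
    """Swap-based annealing; returns the best placement seen and its cost."""
    rng = random.Random(seed)
    pos = dict(pos)  # work on a copy so the caller's placement is untouched
    cells = sorted(pos)
    cost = wirelength(pos, nets)
    best, best_cost = dict(pos), cost
    while temp > 0.01:
        for _ in range(moves_per_temp):
            a, b = rng.sample(cells, 2)
            pos[a], pos[b] = pos[b], pos[a]  # propose swapping two cells
            new_cost = wirelength(pos, nets)
            delta = new_cost - cost
            # Always accept improvements; accept worsening moves with
            # probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]  # reject: undo the swap
        temp *= cooling
    return best, best_cost


if __name__ == "__main__":
    start = {"a": (0, 0), "b": (3, 3), "c": (0, 1), "d": (3, 0)}
    nets = [("a", "b"), ("c", "d")]
    print(wirelength(start, nets), anneal(start, nets))
```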

Design Automation Conference | 2005

Towards scalable flow and context sensitive pointer analysis

Jianwen Zhu

Pointer analysis, a classic problem in software program analysis, has emerged as an important problem to solve in design automation, at a time when complex designs, specified in the form of C code, need to be synthesized or verified. However, precise pointer analysis algorithms that are both flow and context sensitive (FSCS) have not been shown to scale. In this paper, we report a new solution for FSCS analysis, which can evaluate the program states of all program points under billions of different calling paths. Our solution extends the recently proposed symbolic pointer analysis (SPA) technology, which exploits the efficiency of binary decision diagrams (BDDs). With our new problem-solving strategy, called superposed symbolic computation, and its application to our generic pointer analysis framework, we are able to report the first result on all SPEC2000 benchmarks that completes context-sensitive, flow-insensitive analysis in seconds, and context-sensitive, flow-sensitive analysis in minutes.

Design Automation Conference | 1999

Soft scheduling in high level synthesis

Jianwen Zhu; Daniel D. Gajski

In this paper, we establish a theoretical framework for a new concept of scheduling called soft scheduling. In contrast to traditional schedulers, referred to as hard schedulers, soft schedulers make soft decisions, that is, decisions that can be adjusted later. Soft scheduling has the potential to alleviate the phase coupling problem that has plagued traditional high-level synthesis (HLS), HLS for deep submicron design, and VLIW code generation. We then develop a specific soft scheduling formulation, called threaded schedule, under which a linear, optimal (in the sense of online optimality) algorithm is guaranteed.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2006

Dynamic-range estimation

Bin Wu; Jianwen Zhu; Farid N. Najm

It has been widely recognized that the dynamic-range information of an application can be exploited to reduce the datapath bitwidth of either processors or application-specific integrated circuits and, therefore, the overall circuit area, delay, and power consumption. While recent proposals of analytical dynamic-range-estimation methods have shown significant advantages over the traditional profiling-based method in terms of runtime, it is argued here that the rather simplistic treatment of input correlation and system nonlinearity may lead to significant error. In this paper, three mathematical tools, namely Karhunen-Loève expansion, polynomial chaos expansion, and independent component analysis, are introduced, which enable not only the orthogonal decomposition of input random processes, but also the propagation of random processes through both linear and nonlinear systems with difficult constructs such as multiplications, divisions, and conditionals. It is shown that when applied to interesting nonlinear applications such as adaptive filters, polynomial filters, and rational filters, this method can produce complete, accurate statistics of each internal variable, thereby allowing the synthesis of bitwidth with the desired tradeoff between circuit performance and signal-to-noise ratio.
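As a small illustration of how dynamic-range information translates into datapath bitwidth, the sketch below computes the smallest two's-complement integer width that covers a given value range. It assumes the range bounds are already known; how the paper estimates them is not reproduced here, and the function name `integer_bits` is hypothetical.

```python
import math


def integer_bits(lo, hi):
    """Smallest two's-complement width whose range contains [lo, hi]."""
    lo_i, hi_i = math.floor(lo), math.ceil(hi)
    bits = 1
    # An n-bit two's-complement integer covers -2^(n-1) .. 2^(n-1) - 1.
    while lo_i < -(1 << (bits - 1)) or hi_i > (1 << (bits - 1)) - 1:
        bits += 1
    return bits


if __name__ == "__main__":
    print(integer_bits(-128, 127))  # 8
    print(integer_bits(0, 255))     # 9
    print(integer_bits(-3, 3))      # 3
```

A variable known to stay within [-3, 3] needs only 3 integer bits rather than a default 16 or 32, which is where the area, delay, and power savings come from.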
