Wenrui Gong
University of California, Santa Barbara
Publications
Featured research published by Wenrui Gong.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2007
Gang Wang; Wenrui Gong; Brian DeRenzi; Ryan Kastner
Operation scheduling (OS) is a fundamental problem in mapping an application to a computational device. It takes a behavioral application specification and produces a schedule that minimizes either the completion time or the computing resources required to meet a given deadline. The OS problem is NP-hard; thus, effective heuristic methods are necessary to provide high-quality solutions. We present novel OS algorithms using the ant colony optimization approach for both timing-constrained scheduling (TCS) and resource-constrained scheduling (RCS) problems. The algorithms use a unique hybrid approach that combines the MAX-MIN ant system metaheuristic with traditional scheduling heuristics. We compiled a comprehensive benchmark set from real-world applications to verify the effectiveness and efficiency of the proposed algorithms. For TCS, our algorithm achieves better results than force-directed scheduling on almost all test cases, with up to a 19.5% reduction in the number of resources. For RCS, our algorithm outperforms a number of list-scheduling heuristics with better stability and improves results by up to 14.7%. Our algorithms also outperform simulated annealing on both scheduling problems in terms of solution quality, computing time, and stability.
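The MAX-MIN ant system mentioned above differs from the basic ant system mainly in two places: each ant picks a decision with probability proportional to pheromone times a local heuristic, and after an iteration only the best solution deposits pheromone, with every trail clamped to a fixed range [τ_min, τ_max]. The sketch below shows both steps for operation scheduling. It assumes a pheromone table keyed by hypothetical (operation, time step) pairs and illustrative parameter values (`rho`, `alpha`, `beta`, `tau_min`, `tau_max`); the local heuristic `eta` stands in for whatever traditional scheduling heuristic is plugged in. It is not the authors' implementation.

```python
import random


def choose_step(op, candidate_steps, tau, eta, alpha=1.0, beta=1.0, rng=random):
    """Pick a time step for `op` with probability proportional to tau^alpha * eta^beta.

    tau and eta are dicts keyed by (operation, time_step); eta is the local
    heuristic (hypothetical here, e.g. derived from a traditional scheduler).
    """
    weights = [tau[(op, s)] ** alpha * eta[(op, s)] ** beta for s in candidate_steps]
    return rng.choices(candidate_steps, weights=weights, k=1)[0]


def mmas_update(tau, best_assignment, best_quality,
                rho=0.1, tau_min=0.01, tau_max=1.0):
    """One MAX-MIN Ant System pheromone update, applied in place.

    best_assignment maps each operation to the time step chosen by the best ant;
    best_quality is a positive score for that schedule (higher is better).
    """
    # Evaporation on every trail.
    for key in tau:
        tau[key] *= 1.0 - rho
    # Only the best ant reinforces the decisions it made.
    for op, step in best_assignment.items():
        tau[(op, step)] = tau.get((op, step), tau_min) + best_quality
    # Clamp every trail to [tau_min, tau_max]; this bound is what makes it MAX-MIN.
    for key in tau:
        tau[key] = min(tau_max, max(tau_min, tau[key]))
    return tau


tau = {("a", 0): 0.5, ("a", 1): 0.5, ("b", 1): 0.5, ("b", 2): 0.5}
mmas_update(tau, {"a": 0, "b": 2}, best_quality=0.2)
# ("a", 0) and ("b", 2) rise to about 0.65; ("a", 1) and ("b", 1) decay to about 0.45.
```

The clamping keeps any single trail from dominating, so ants keep exploring alternative schedules instead of converging prematurely.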
Journal of Low Power Electronics | 2005
Yan Meng; Wenrui Gong; Ryan Kastner; Timothy Sherwood
Wireless networks are making the vision of ubiquitous computing a reality: users will be able to connect anytime and anywhere from anything. To achieve this vision, the next generation of wireless devices must learn about, and adapt to, the transmission environment through a process called channel estimation. In this paper, we describe a cross-cutting approach to exploring the design space of the channel estimation problem on reconfigurable devices. In particular, we focus on the matching pursuit (MP) algorithm, a fast and accurate iterative algorithm for multipath channel estimation. Our methodology models modern reconfigurable devices as an array of Block RAM-level operation blocks (“BLOBs”), which act as flexible data paths. Using this model, we describe design techniques and tradeoffs that yield novel optimizations at every level of building an energy-efficient MP core, from theory and algorithms down to the bit level. We present results from our design space exploration over a number of parameters, including high-level characteristics of the application, data and computation partitioning schemes, and module- and bit-level low-power techniques. The results demonstrate the effectiveness and efficiency of our approach to building a high-speed, low-power channel estimator, with a total power saving of 25.4%. We further show that local, distributed computation is, on average, 145% faster than global, centralized computation, at minimal cost in power dissipation.
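Matching pursuit is a greedy iterative algorithm: at each step it correlates the current residual with every delayed copy of the known training sequence, keeps the strongest path, and subtracts that path's contribution. The sketch below is a plain floating-point software model under simplifying assumptions (real-valued samples, no noise, a fixed number of paths); the function and parameter names are hypothetical, and it says nothing about the BLOB-level hardware mapping or bit widths studied in the paper.

```python
def matching_pursuit(received, training, num_taps, num_paths):
    """Greedy matching pursuit estimate of a sparse multipath channel.

    received: real-valued received samples (noise ignored in this sketch)
    training: the known training sequence
    num_taps: maximum delay spread to search, in samples (hypothetical)
    num_paths: how many paths (nonzero taps) to extract
    Returns a list of num_taps estimated tap gains.
    """
    n = len(received)
    # Dictionary: column j is the training sequence delayed by j samples.
    columns = []
    for j in range(num_taps):
        col = [0.0] * n
        for i, t in enumerate(training):
            if i + j < n:
                col[i + j] = float(t)
        columns.append(col)
    energy = [sum(x * x for x in col) for col in columns]

    residual = [float(r) for r in received]
    taps = [0.0] * num_taps
    for _ in range(num_paths):
        # Correlate the residual with every delayed copy of the training sequence.
        corr = [sum(c * r for c, r in zip(col, residual)) for col in columns]
        # Choose the delay whose projection removes the most residual energy.
        best = max(range(num_taps),
                   key=lambda j: corr[j] ** 2 / energy[j] if energy[j] else 0.0)
        gain = corr[best] / energy[best]
        taps[best] += gain
        # Remove that path's contribution before searching for the next one.
        residual = [r - gain * c for r, c in zip(residual, columns[best])]
    return taps


training = [1, -1, 1, 1, -1, 1, -1, -1]
channel = [1.0, 0.0, 0.0, 0.5]  # two paths: delay 0 and delay 3
received = [sum(channel[j] * training[i - j]
                for j in range(len(channel)) if 0 <= i - j < len(training))
            for i in range(len(training) + len(channel) - 1)]
print(matching_pursuit(received, training, num_taps=4, num_paths=2))
# → [1.1875, 0.0, 0.0, 0.4296875]
```

Because delayed copies of the training sequence are not orthogonal, the greedy gains are slightly biased here (1.1875 and about 0.43 instead of 1.0 and 0.5), but both path delays are found correctly.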
Design Automation Conference | 2006
Gang Wang; Wenrui Gong; Brian DeRenzi; Ryan Kastner
Design space exploration during high-level synthesis is often conducted through ad hoc probing of the solution space using some scheduling algorithm. This is not only time-consuming but also heavily dependent on the designer's experience. We propose a novel design exploration method that exploits the duality between the time- and resource-constrained scheduling problems. Our exploration automatically constructs a high-quality time/area tradeoff curve in a fast, effective manner. It uses MAX-MIN ant colony optimization to solve both the time- and resource-constrained scheduling problems, switching between the two algorithms to traverse the design space quickly. Compared with running force-directed scheduling exhaustively at every time step, our algorithm significantly improves solution quality (an average 17.3% reduction in resource counts) with similar run time on a comprehensive benchmark suite built from classic and real-life samples. Our algorithms scale well across different applications and problem sizes.
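One way to read the time/resource duality described above: a TCS solve at deadline T returns an allocation, an RCS solve with that allocation often finishes earlier than T, and no deadline in between needs its own TCS run. The sketch below assumes hypothetical `tcs` and `rcs` solver callbacks (in the paper both are MAX-MIN ant colony schedulers) and uses a toy workload with closed-form solvers so that it runs on its own. It illustrates the idea and is not necessarily the authors' exact procedure.

```python
NUM_OPS = 12  # toy workload: 12 independent unit-latency ops on identical units


def explore_tradeoff(tcs, rcs, min_latency, max_latency):
    """Trace a time/area tradeoff curve by alternating TCS and RCS solves.

    tcs(deadline) -> fewest resources that meet the deadline (hypothetical solver)
    rcs(resources) -> shortest latency those resources allow (hypothetical solver)
    Returns (latency, resources) points from the slowest design to the fastest.
    """
    curve = []
    deadline = max_latency
    while deadline >= min_latency:
        resources = tcs(deadline)
        # Dual step: see how fast this allocation can actually go.
        latency = min(rcs(resources), deadline)
        curve.append((latency, resources))
        # Every deadline in [latency, deadline] is served by the same allocation,
        # so skip them instead of re-solving TCS at each time step.
        deadline = latency - 1
    return curve


def toy_tcs(deadline):
    # Fewest units finishing NUM_OPS ops within `deadline` steps: ceil division.
    return (NUM_OPS + deadline - 1) // deadline


def toy_rcs(resources):
    # Fewest steps for NUM_OPS ops on `resources` units: ceil division.
    return (NUM_OPS + resources - 1) // resources


print(explore_tradeoff(toy_tcs, toy_rcs, min_latency=1, max_latency=NUM_OPS))
# → [(12, 1), (6, 2), (4, 3), (3, 4), (2, 6), (1, 12)]
```

Here six solver pairs cover all twelve deadlines, whereas probing every time step would take twelve TCS runs.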
International Conference on Computer-Aided Design | 2006
Gang Wang; Wenrui Gong; Ryan Kastner
While the exact manufacturing process for nanoscale computing devices is uncertain, it is abundantly clear that future technology nodes will see increased defect rates. It is therefore of paramount importance to construct new architectures and design methodologies that can tolerate large numbers of defects. Defect maps are a necessity in future design flows, and research on their practical construction is essential. In this work, we study the use of Bloom filters as a data structure for defect maps. We show that Bloom filters provide the right tradeoff between accuracy and space efficiency. In particular, they can help simplify the nanosystem design flow by embedding defect information within the nanosystem delivered by the manufacturer. We develop a novel nanoscale memory design that uses this concept. It does not rely on a voting strategy and utilizes device redundancy more effectively than existing approaches.
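To make the accuracy/space tradeoff concrete: a Bloom filter stores a set in a fixed array of m bits using k hash functions. A query never misses a member (no false negatives) but may occasionally report a non-member (a false positive). Used as a defect map holding the addresses of defective devices, this means every defective device is flagged, while a few good devices may also be flagged and simply left unused. The sketch below assumes integer device addresses and derives the k bit positions from SHA-256 by double hashing; the class and method names are hypothetical, not taken from the paper.

```python
import hashlib


class BloomDefectMap:
    """Defect map storing the addresses of defective devices in a Bloom filter."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # m bits, all initially clear

    def _positions(self, address):
        # k bit positions from one SHA-256 digest via double hashing (an assumption;
        # any k roughly independent hash functions would do).
        digest = hashlib.sha256(str(address).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def mark_defective(self, address):
        for p in self._positions(address):
            self.bits[p // 8] |= 1 << (p % 8)

    def maybe_defective(self, address):
        # Always True for a marked (defective) device: no false negatives.
        # Occasionally True for a good device: a false positive, which only
        # costs one spare device.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(address))


defect_map = BloomDefectMap(num_bits=1024, num_hashes=4)
for address in (3, 17, 42):
    defect_map.mark_defective(address)
print(all(defect_map.maybe_defective(a) for a in (3, 17, 42)))  # → True
```

The map occupies m bits no matter how wide the device addresses are, which is where the space saving over an explicit list of defective addresses comes from.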
Great Lakes Symposium on VLSI | 2005
Gang Wang; Wenrui Gong; Ryan Kastner
Instruction scheduling is a fundamental step in mapping an application to a computational device. It takes a behavioral application specification and produces a schedule for the instructions onto a collection of processing units. The objective is to minimize the completion time of the given application while effectively utilizing the computational resources. The instruction scheduling problem is NP-hard; thus, effective heuristic methods are necessary to provide high-quality scheduling solutions. In this paper, we present a novel instruction scheduling algorithm using the MAX-MIN Ant System optimization approach. The algorithm uses a unique hybrid approach that combines the ant system metaheuristic with list scheduling, where the local and global heuristics are dynamically adjusted to the input application in an iterative manner. Compared with force-directed scheduling and a number of list scheduling heuristics, our algorithm generates better results on all the tested benchmarks, with better stability. Furthermore, by solving the test samples optimally using an ILP formulation, we show that our algorithm consistently achieves near-optimal solutions.
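List scheduling, which the ant system is combined with above, walks through control steps and, in each step, places ready operations in priority order until the functional units of each type run out. The sketch below assumes unit-latency operations and takes the priority as an input; in the hybrid approach an ant's choices would shape that ordering, which is not modeled here. All names and the small example graph are hypothetical.

```python
def list_schedule(preds, op_type, priority, resources):
    """Resource-constrained list scheduling of unit-latency operations.

    preds: dict op -> list of predecessor ops (the data-flow DAG)
    op_type: dict op -> resource type, e.g. "mul" or "alu"
    priority: dict op -> number; higher is scheduled first
    resources: dict type -> number of units available in each cycle
    Returns dict op -> start cycle.
    """
    assert all(resources.get(op_type[op], 0) > 0 for op in preds), "every op type needs a unit"
    start = {}
    cycle = 0
    while len(start) < len(preds):
        used = {t: 0 for t in resources}
        # Ready: unscheduled ops whose predecessors all finished before this cycle.
        ready = [op for op in preds
                 if op not in start and all(p in start and start[p] < cycle for p in preds[op])]
        for op in sorted(ready, key=lambda o: -priority[o]):
            t = op_type[op]
            if used[t] < resources[t]:
                start[op] = cycle
                used[t] += 1
        cycle += 1
    return start


dag = {"a": [], "b": [], "c": [], "d": ["a", "b"], "e": ["c", "d"]}
types = {"a": "mul", "b": "mul", "c": "mul", "d": "alu", "e": "alu"}
units = {"mul": 2, "alu": 1}
good = {"a": 3, "b": 2, "c": 1, "d": 2, "e": 1}
poor = {"a": 1, "b": 2, "c": 3, "d": 2, "e": 1}
print(max(list_schedule(dag, types, good, units).values()) + 1)  # → 3
print(max(list_schedule(dag, types, poor, units).values()) + 1)  # → 4
```

The same graph takes 3 cycles under one priority ordering and 4 under another, which is why searching over priorities (with ants or otherwise) can pay off.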
Design, Automation, and Test in Europe | 2006
Ryan Kastner; Wenrui Gong; Xin Hao; Forrest Brewer; Adam Kaplan; P. Brisbane; Majid Sarrafzadeh
High-level synthesis transformations play a major part in shaping the properties of the final circuit. However, most optimizations are performed without much knowledge of the final circuit layout. In this paper, we present a physically aware design flow for mapping high-level application specifications to a synthesizable register-transfer-level hardware description. We study the problem of optimizing the data communication of the variables in the application specification. Our algorithm uses floorplan information to guide the optimization. We develop a simple yet effective incremental floorplanner to handle the perturbations caused by the data communication optimization. We show that the proposed techniques can reduce the wirelength of the final design while maintaining a legal floorplan with the same area as the initial floorplan.
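The abstract does not say which wirelength metric is used; a common estimate in floorplanning is half-perimeter wirelength (HPWL), the half-perimeter of the bounding box around the pins of each net. The sketch below assumes each net connects block centers and that the floorplan is given as (x, y, width, height) rectangles; the block names and coordinates are hypothetical. It shows how the placement of a variable's storage changes the estimated wirelength.

```python
def block_center(rect):
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)


def hpwl(floorplan, nets):
    """Total half-perimeter wirelength over all nets.

    floorplan: dict block -> (x, y, width, height)
    nets: list of nets, each a list of the block names it connects
    """
    total = 0.0
    for net in nets:
        pts = [block_center(floorplan[b]) for b in net]
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        # Half the perimeter of the bounding box enclosing the net's pins.
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


floorplan = {
    "alu": (0, 0, 2, 2),
    "mul": (0, 2, 2, 2),
    "ram_near": (2, 0, 2, 2),
    "ram_far": (8, 0, 2, 2),
}
# A variable read by both the ALU and the multiplier, stored in either RAM.
print(hpwl(floorplan, [["alu", "mul", "ram_far"]]))   # → 10.0
print(hpwl(floorplan, [["alu", "mul", "ram_near"]]))  # → 4.0
```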
International Conference on Computer-Aided Design | 2005
Wenrui Gong; Gang Wang; Ryan Kastner
Modern, high-performance configurable architectures integrate on-chip, distributed block RAM modules to provide ample data storage. Synthesizing applications to these complex systems requires an effective and efficient approach to data partitioning and storage assignment. In this paper, we present a data and iteration space partitioning solution that focuses on minimizing remote memory accesses or, equivalently, maximizing local computation. Using the same code but different data partitionings, we can achieve faster clock frequencies without increasing the number of cycles, simply by minimizing global memory accesses. Other optimization techniques, such as scalar replacement, prefetching, and buffer insertion, can further reduce remote accesses and lead to an average 4.8× speedup in overall runtime.
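The link between data partitioning and remote accesses can be shown on a toy case. The sketch assumes a one-dimensional array split across several block RAMs, a three-point stencil b[i] = a[i-1] + a[i] + a[i+1], and an owner-computes rule in which iteration i runs next to the bank holding element i; it counts reads that cross banks under block and cyclic partitioning. The function names are hypothetical and the paper's actual partitioning algorithm is not reproduced here.

```python
def owner_block(i, n, banks):
    """Block partitioning: contiguous chunks of ceil(n / banks) elements per bank."""
    chunk = (n + banks - 1) // banks
    return i // chunk


def owner_cyclic(i, n, banks):
    """Cyclic partitioning: element i lives in bank i mod banks (n unused)."""
    return i % banks


def remote_accesses(n, banks, owner):
    """Count stencil reads a[i-1], a[i], a[i+1] that hit a different bank than
    the one executing iteration i (owner-computes: iteration i runs at a[i]'s bank)."""
    remote = 0
    for i in range(1, n - 1):
        home = owner(i, n, banks)
        for j in (i - 1, i, i + 1):
            if owner(j, n, banks) != home:
                remote += 1
    return remote


print(remote_accesses(16, 4, owner_block))   # → 6
print(remote_accesses(16, 4, owner_cyclic))  # → 28
```

For 16 elements over 4 banks, block partitioning incurs 6 remote reads against 28 for cyclic partitioning, because neighboring elements share a bank everywhere except at chunk boundaries.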
Field-Programmable Custom Computing Machines | 2006
Gang Wang; Wenrui Gong; Ryan Kastner
The authors propose a novel defect-tolerant design methodology that uses Bloom filters for defect mapping in nanoscale computing devices. It is a general approach that can handle any permanent defects incurred during the manufacturing process. The redundant design methodology does not rely on a voting strategy, so it utilizes device redundancy more effectively than existing approaches. Additionally, our method never misidentifies a defective device as functional. Moreover, it is highly space-efficient and can be configured to fit the scale and characteristics of the specific nanoscale devices used in the system.
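How a Bloom-filter defect map can be fitted to different scales follows from the standard sizing formulas: for n stored items and a target false-positive rate p, m = ⌈−n ln p / (ln 2)²⌉ bits and k ≈ (m/n) ln 2 hash functions give a false-positive rate of about (1 − e^(−kn/m))^k. In Bloom filter terms, a false positive here is a good device flagged as defective, never the reverse. The sketch assumes a hypothetical chip of 1,000,000 devices with a 1% defect rate; these numbers are illustrative and do not come from the paper.

```python
import math


def bloom_size(num_defects, fp_rate):
    """Bits m and hash count k for a Bloom filter holding num_defects entries
    at roughly the target false-positive rate (standard sizing formulas)."""
    m = math.ceil(-num_defects * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / num_defects * math.log(2)))
    return m, k


def expected_fp_rate(m, k, n):
    """Approximate false-positive probability (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Hypothetical chip: 1,000,000 devices, 1% defective -> 10,000 entries.
m, k = bloom_size(num_defects=10_000, fp_rate=0.01)
print(m, k)  # about 95,850 bits and 7 hash functions
print(round(expected_fp_rate(m, k, 10_000), 3))  # about 0.01
```

Under those assumptions the map needs fewer than 100,000 bits, roughly a tenth of a one-bit-per-device bitmap, at the cost of setting aside about 1% of the good devices.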
Journal of Embedded Computing | 2006
Gang Wang; Wenrui Gong; Ryan Kastner
Archive | 2005
Wenrui Gong; Gang Wang; Ryan Kastner