
Publication


Featured research published by Bwolen Yang.


Formal Methods in Computer-Aided Design | 1998

A Performance Study of BDD-Based Model Checking

Bwolen Yang; Randal E. Bryant; David R. O'Hallaron; Armin Biere; Olivier Coudert; Geert Janssen; Rajeev K. Ranjan; Fabio Somenzi

We present a study of the computational aspects of model checking based on binary decision diagrams (BDDs). By using a trace-based evaluation framework, we are able to generate realistic benchmarks and perform this evaluation collaboratively across several different BDD packages. This collaboration has resulted in significant performance improvements and in the discovery of several interesting characteristics of model checking computations. One of the main conclusions of this work is that the BDD computations in model checking and in building BDDs for the outputs of combinational circuits have fundamentally different performance characteristics. The systematic evaluation has also uncovered several open issues that suggest new research directions. We hope that the evaluation methodology used in this study will help lay the foundation for future evaluation of BDD-based algorithms.
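For readers unfamiliar with the data structure under study, a minimal sketch of a reduced ordered BDD may help: nodes are hash-consed through a unique table so that equal subfunctions share one node, and Boolean operations are computed by the classic depth-first recursive apply. This is an illustrative Python sketch with assumed names and node layout, not the API of any package surveyed in the paper:

```python
class BDD:
    """Reduced ordered BDD with a hash-consed unique table (illustrative sketch)."""
    def __init__(self):
        self.unique = {}                     # (var, lo, hi) -> node id
        self.nodes = {0: None, 1: None}      # ids 0 and 1 are the terminals
        self._next = 2

    def mk(self, var, lo, hi):
        if lo == hi:                         # reduction rule: drop redundant test
            return lo
        key = (var, lo, hi)
        if key not in self.unique:           # hash-consing: share isomorphic nodes
            self.unique[key] = self._next
            self.nodes[self._next] = key
            self._next += 1
        return self.unique[key]

    def var(self, v):
        return self.mk(v, 0, 1)              # the BDD for the single variable v

    def apply(self, op, u, v, memo=None):
        """Depth-first apply with memoization on operand pairs."""
        if memo is None:
            memo = {}
        if u in (0, 1) and v in (0, 1):
            return op(u, v)                  # both terminal: evaluate the operator
        if (u, v) in memo:
            return memo[(u, v)]
        uvar = self.nodes[u][0] if u > 1 else float('inf')
        vvar = self.nodes[v][0] if v > 1 else float('inf')
        var = min(uvar, vvar)                # split on the topmost variable
        u0, u1 = (self.nodes[u][1], self.nodes[u][2]) if uvar == var else (u, u)
        v0, v1 = (self.nodes[v][1], self.nodes[v][2]) if vvar == var else (v, v)
        r = self.mk(var, self.apply(op, u0, v0, memo),
                         self.apply(op, u1, v1, memo))
        memo[(u, v)] = r
        return r
```

For example, `bdd.apply(lambda x, y: x & y, bdd.var(0), bdd.var(1))` builds the BDD for the conjunction of two variables; because of the unique table, rebuilding the same function yields the same node id.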


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1997

A new model for integrated nested task and data parallel programming

Jaspal Subhlok; Bwolen Yang

High Performance Fortran (HPF) has emerged as a standard language for data parallel computing. However, a wide variety of scientific applications are best programmed by a combination of task and data parallelism. Therefore, a good model of task parallelism is important for continued success of HPF for parallel programming. This paper presents a task parallelism model that is simple, elegant, and relatively easy to implement in an HPF environment. Task parallelism is exploited by mechanisms for dividing processors into subgroups and mapping computations and data onto processor subgroups. This model of task parallelism has been implemented in the Fx compiler at Carnegie Mellon University. The paper addresses the main issues in compiling integrated task and data parallel programs and reports on the use of this model for programming various flat and nested task structures. Performance results are presented for a set of programs spanning signal processing, image processing, computer vision and environment modeling. A variant of this task model is a new approved extension of HPF and this paper offers insight into the power of expression and ease of implementation of this extension.
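The subgroup mechanism the abstract describes can be illustrated with a small two-level sketch: an outer level runs tasks concurrently on disjoint processor subgroups (task parallelism), and an inner level partitions each task's data across its subgroup (data parallelism). Python threads stand in for HPF processors here; the function names are hypothetical, not Fx constructs:

```python
from concurrent.futures import ThreadPoolExecutor

def data_parallel(task, data, subgroup_size):
    """Within one subgroup: shard the data across its members."""
    shards = [data[i::subgroup_size] for i in range(subgroup_size)]
    with ThreadPoolExecutor(subgroup_size) as inner:
        return list(inner.map(task, shards))

def task_parallel(tasks, nprocs):
    """Across subgroups: divide the processors evenly, one subgroup per
    (task, data) pair, and run the data-parallel tasks concurrently."""
    size = nprocs // len(tasks)
    with ThreadPoolExecutor(len(tasks)) as outer:
        futures = [outer.submit(data_parallel, t, d, size) for (t, d) in tasks]
        return [f.result() for f in futures]
```

With 4 "processors" and 2 tasks, each task gets a subgroup of 2 and sees its data split into 2 shards, mirroring the flat task structures discussed in the paper; nesting `task_parallel` inside a task would mirror the nested ones.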


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1997

Parallel breadth-first BDD construction

Bwolen Yang; David R. O'Hallaron

With the increasing complexity of protocol and circuit designs, formal verification has become an important research area and binary decision diagrams (BDDs) have been shown to be a powerful tool in formal verification. This paper presents a parallel algorithm for BDD construction targeted at shared memory multiprocessors and distributed shared memory systems. This algorithm focuses on improving memory access locality through specialized memory managers and partial breadth-first expansion, and on improving processor utilization through dynamic load balancing. The results on a shared memory system show speedups of over two on four processors and speedups of up to four on eight processors. The measured results clearly identify the main source of bottlenecks and point out some interesting directions for further improvements.
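The breadth-first strategy underlying this line of work can be sketched as a two-phase, level-by-level apply: a top-down expansion phase queues operand-pair requests per variable level, so all nodes of one level are touched together (good locality), and a bottom-up reduction phase then resolves the requests. The following is a self-contained, sequential Python sketch under assumed node-table conventions; the paper's implementation is parallel and uses specialized memory managers:

```python
def bf_apply(nodes, op, root_u, root_v, nvars):
    """Two-phase breadth-first apply. `nodes` maps id -> (var, lo, hi);
    ids 0 and 1 are the terminals. Returns the id of op(root_u, root_v)."""
    def level(x):
        return nodes[x][0] if x > 1 else nvars   # terminals sit below all vars

    def cofactors(x, var):
        if x > 1 and nodes[x][0] == var:
            return nodes[x][1], nodes[x][2]
        return x, x                              # var absent: both branches are x

    # Phase 1: top-down expansion, one request queue per variable level.
    requests = [set() for _ in range(nvars)]
    top = min(level(root_u), level(root_v))
    if top < nvars:
        requests[top].add((root_u, root_v))
    for lv in range(nvars):
        for (u, v) in requests[lv]:
            for cu, cv in zip(cofactors(u, lv), cofactors(v, lv)):
                clv = min(level(cu), level(cv))
                if clv < nvars:                  # non-terminal pair: queue deeper
                    requests[clv].add((cu, cv))

    # Phase 2: bottom-up reduction, unique table seeded from existing nodes.
    unique = {trip: i for i, trip in nodes.items() if i > 1}
    next_id = [max(nodes) + 1]
    result = {}

    def mk(var, lo, hi):
        if lo == hi:
            return lo                            # redundant-test reduction rule
        key = (var, lo, hi)
        if key not in unique:
            unique[key] = next_id[0]
            nodes[next_id[0]] = key
            next_id[0] += 1
        return unique[key]

    def resolve(u, v):
        return op(u, v) if u <= 1 and v <= 1 else result[(u, v)]

    for lv in reversed(range(nvars)):
        for (u, v) in requests[lv]:
            (u0, u1), (v0, v1) = cofactors(u, lv), cofactors(v, lv)
            result[(u, v)] = mk(lv, resolve(u0, v0), resolve(u1, v1))
    return resolve(root_u, root_v)
```

Because requests at one level are processed as a batch, this ordering visits the node table level by level instead of following pointer chains depth-first, which is the locality property the parallel algorithm exploits.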


Asia and South Pacific Design Automation Conference | 1998

Space- and time-efficient BDD construction via working set control

Bwolen Yang; Yirng-An Chen; Randal E. Bryant; David R. O'Hallaron

Binary decision diagrams (BDDs) have been shown to be a powerful tool in formal verification. Efficient BDD construction techniques become more important as the complexity of protocol and circuit designs increases. This paper addresses this issue by introducing three techniques based on working set control. First, we introduce a novel BDD construction algorithm based on partial breadth-first expansion. This approach has the good memory locality of the breadth-first BDD construction while maintaining the low memory overhead of the depth-first approach. Second, we describe how memory management on a per-variable basis can improve spatial locality of BDD construction at all levels, including expansion, reduction, and rehashing. Finally, we introduce a memory compacting garbage collection algorithm to remove unreachable BDD nodes and minimize memory fragmentation. Experimental results show that when the applications fit in physical memory, our approach has speedups of up to 1.6 in comparison to both depth-first (CUDD) and breadth-first (CAL) packages. When the applications do not fit into physical memory, our approach outperforms both CUDD and CAL by up to an order of magnitude. Furthermore, the good memory locality and low memory overhead of this approach has enabled us to be the first to have successfully constructed the entire C6288 multiplication circuit from the ISCAS85 benchmark set using only conventional BDD representations.


Computer Aided Verification | 1999

Optimizing Symbolic Model Checking for Constraint-Rich Models

Bwolen Yang; Reid G. Simmons; Randal E. Bryant; David R. O'Hallaron

This paper presents optimizations for verifying systems with complex time-invariant constraints. These constraints arise naturally from modeling physical systems, e.g., in establishing the relationship between different components in a system. To verify constraint-rich systems, we propose two new optimizations. The first optimization is a simple, yet powerful, extension of the conjunctive-partitioning algorithm. The second is a collection of BDD-based macro-extraction and macro-expansion algorithms to remove state variables. We show that these two optimizations are essential in verifying constraint-rich problems; in particular, this work has enabled the verification of fault diagnosis models of the Nomad robot (an Antarctic meteorite explorer) and of the NASA Deep Space One spacecraft.


Languages and Compilers for Parallel Computing | 1993

Do&Merge: Integrating Parallel Loops and Reductions

Bwolen Yang; Jon A. Webb; James M. Stichnoth; David R. O'Hallaron; Thomas R. Gross

Many computations perform operations that match this pattern: first, a loop iterates over an input array, producing an array of (partial) results. The loop iterations are independent of each other and can be done in parallel. Second, a reduction operation combines the elements of the partial result array to produce the single final result. We call these two steps a Do&Merge computation. The most common way to effectively parallelize such a computation is for the programmer to apply a DOALL operation across the input array, and then to apply a reduction operator to the partial results. We show that combining the Do phase and the Merge phase into a single Do&Merge computation can lead to improved execution time and memory usage. In this paper we describe a simple and efficient construct (called the Pdo loop) that is included in an experimental HPF-like compiler for private-memory parallel systems.
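The Do&Merge pattern the abstract describes can be sketched as follows: each worker folds per-element results into a running value as it goes, so the array of partial results is never materialized, and only one small per-worker partial is merged at the end. Python threads stand in for the private-memory processors here; `pdo` and `do_and_merge` are hypothetical names, not the compiler's actual construct:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def do_and_merge(chunk):
    """Do: per-element work; Merge: fold each result into a running value
    immediately, so no partial-result array is ever built."""
    acc = 0
    for x in chunk:
        acc += x * x                  # hypothetical per-element "Do" work
    return acc

def pdo(data, nworkers=4):
    """Sketch of a Pdo-style loop: each worker runs Do&Merge over its own
    chunk in parallel, then the per-worker partials are merged once."""
    chunks = [data[i::nworkers] for i in range(nworkers)]
    with ThreadPoolExecutor(nworkers) as pool:
        partials = list(pool.map(do_and_merge, chunks))
    return reduce(lambda a, b: a + b, partials)
```

Contrast this with the DOALL-then-reduce formulation, which would first produce a full array of per-element squares and only then reduce it; fusing the two phases is exactly the memory and time saving the paper measures.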


Languages and Compilers for Parallel Computing | 1995

Language and Run-Time Support for Network Parallel Computing

Peter A. Dinda; David R. O'Hallaron; Jaspal Subhlok; Jon A. Webb; Bwolen Yang

Network parallel computing is the use of diverse computing resources interconnected by general purpose networks to run parallel applications. This paper describes NetFx, an extension of the Fx compiler system which uses the Fx model of task parallelism to distribute and manage computations across the sequential and parallel machines of a network. A central problem in network parallel computing is that the compiler is presented with a heterogeneous and dynamic target. Our approach is based on a novel run-time system that presents a simple communication interface to the compiler, yet uses compiler knowledge to customize communication between tasks executing over the network. The run-time system is designed to support complete applications developed with different compilers and parallel program generators. It presents a standard communication interface for point-to-point transfer of distributed data sets between tasks. This allows the compiler to be portable, and enables communication generation without knowledge of exactly how the tasks will be mapped at run-time and what low level communication primitives will be used. The compiler also generates a set of custom routines, called address computation functions, to translate between different data distributions. The run-time system performs the actual communication using a mix of generic and custom address computation functions depending on run-time parameters like the type and number of nodes assigned to the communicating tasks and the data distributions of the variables being communicated. This mechanism enables the run-time system to exploit compile-time optimizations, and enables the compiler to manage foreign tasks that use non-standard data distributions. We outline several important applications of network parallel computing and describe the NetFx programming model and run-time system.


Software Product Lines | 1994

Procedure call models for distributed parameters in data parallel programs

Bwolen Yang; David R. O'Hallaron

When a computer program invokes a procedure, both the caller and the callee must agree on how to pass the parameters into and out of the procedure. In this paper, this agreement is referred to as the procedure call model. In data parallel languages like High Performance Fortran (HPF), the procedure call model for distributed parameters can have an impact on procedure call overhead. This paper introduces a taxonomy of procedure call models, and examines how different models can reduce the procedure call overhead by avoiding unnecessary redistribution and by providing compile-time distribution information. A key result is that the procedure call model bounds the availability of compile-time information on the distribution of parameters, and this information can have an impact on the quality of the redistribution code.


Archive | 1999

Optimizing model checking based on BDD characterization

Bwolen Yang; David R. O'Hallaron


Archive | 1997

Breadth-First with Depth-First BDD Construction: A Hybrid Approach

Yirng-An Chen; Bwolen Yang; Randal E. Bryant

Collaboration


Dive into Bwolen Yang's collaborations.

Top Co-Authors

Randal E. Bryant, Carnegie Mellon University
Yirng-An Chen, Carnegie Mellon University
Jaspal Subhlok, Carnegie Mellon University
Jon A. Webb, Carnegie Mellon University
Fabio Somenzi, University of Colorado Boulder
Reid G. Simmons, Carnegie Mellon University
Thomas R. Gross, Carnegie Mellon University