Cho-Li Wang
University of Southern California
Publication
Featured research published by Cho-Li Wang.
Journal of Parallel and Distributed Computing | 1994
Cho-Li Wang; Viktor K. Prasanna; Hyoung Joong Kim; Ashfaq A. Khokhar
Object recognition involves identifying known objects in a given scene. It plays a key role in image understanding. Geometric hashing has been proposed as a technique for model-based object recognition in occluded scenes. However, parallel techniques are needed to realize real-time vision systems employing geometric hashing. In this paper, we present scalable parallel algorithms for object recognition using geometric hashing. We define a realistic abstract model of CM-5 in which explicit cost is associated with data routing and synchronization. We develop a load-balancing technique that results in scalable processor-time optimal algorithms for performing a probe on this model. Given a model of CM-5 with P processing nodes (PNs) and a set S of feature points in a scene, a probe of the recognition phase can be performed in O(|V(S)|/P) time, where V(S) is the set of votes cast by feature points in S. This algorithm is scalable in the range 1 ≤ P ≤ |V(S)|^(1/3). On a mesh processor array of any size [formula] × [formula] which models MP-1, we show that a probe can be performed in O(|V(S)|/[formula]) time, log2|V(S)| ≤ P ≤ |V(S)|. These results do not assume any distributions of hash bin lengths or scene points. In earlier parallel implementations, the number of processors employed was independent of the size of the scene but depended on the size of the model database (which is usually very large). Our implementations on CM-5 and MP-1 significantly reduce the number of processors employed and also result in superior time performance. The implementations developed in this paper require a number of processors that is independent of the size of the model database and are scalable with the machine size. Results of concurrent processing of multiple probes are also reported.
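To illustrate why a probe's cost is naturally expressed in terms of the number of votes rather than the number of scene points, the following is a minimal single-process sketch. It is not the authors' CM-5 algorithm: the hash table layout, the names `collect_votes` and `balanced_probe`, and the toy data are all assumptions made for illustration. The sketch flattens the votes first and then deals them out in equal chunks, so each simulated worker handles at most ceil(|V(S)|/P) votes regardless of how uneven the hash bins are.

```python
from collections import Counter
from math import ceil

def collect_votes(hash_table, scene_keys):
    """Flatten a probe into one list of votes: every (model, basis) entry
    stored in the bin addressed by each quantized scene key."""
    votes = []
    for key in scene_keys:
        votes.extend(hash_table.get(key, []))
    return votes

def balanced_probe(hash_table, scene_keys, num_workers):
    """Tally votes with the work split evenly over simulated workers."""
    votes = collect_votes(hash_table, scene_keys)
    chunk = max(1, ceil(len(votes) / num_workers))
    # Each simulated worker tallies one contiguous chunk of the vote list.
    partial = [Counter(votes[i:i + chunk]) for i in range(0, len(votes), chunk)]
    total = Counter()
    for tally in partial:
        total.update(tally)  # merge step (a reduction on a real machine)
    loads = [sum(tally.values()) for tally in partial]
    return total, loads

# Hypothetical toy data: bins of very different lengths.
table = {
    (1, 2): [("A", 0), ("B", 3), ("A", 1)],
    (3, 1): [("A", 0)],
    (0, 4): [("A", 0), ("C", 2), ("B", 3), ("C", 0), ("A", 2)],
}
scene_keys = [(1, 2), (3, 1), (0, 4), (9, 9)]  # (9, 9) hits an empty bin

tally, loads = balanced_probe(table, scene_keys, num_workers=3)
best, best_votes = tally.most_common(1)[0]
print(best, best_votes, loads)
```

Partitioning by scene point instead would give one worker five votes and another only one in this toy case, which is the kind of imbalance that vote-level partitioning avoids.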
international conference on distributed computing systems | 1996
Wenheng Liu; Cho-Li Wang; Viktor K. Prasanna
In this paper we develop portable and scalable algorithms for performing irregular all-to-all communication in High Performance Computing (HPC) systems. To minimize the communication latency, the algorithm reduces the total number of messages transmitted, reduces the variance of the lengths of these messages, and overlaps the communication with computation. The performance of the algorithm is characterized using a simple model of HPC systems. Our implementations are performed using the Message Passing Interface (MPI) standard and they can be ported to various HPC platforms. The performance of our algorithms is evaluated on CM5, T3D and SP2. The results show the effectiveness of the techniques as well as the interplay between the architectural features, the machine size, and the variance of message lengths. The experiences of our study can be applied in other HPC systems to optimize the performance of collective communication operations.
conference on computer architectures for machine perception | 1995
Yongwha Chung; Viktor K. Prasanna; Cho-Li Wang
We present a fast parallel implementation of linear feature extraction on IBM SP-2. We first analyze the machine features and the problem characteristics to understand the overheads in parallel solutions to the problem. Based on these, we propose an asynchronous algorithm which enhances processor utilization and overlaps communication with computation by maintaining algorithmic threads in each processing node. Our implementation shows that, given a 512 × 512 image, the linear feature extraction task can be performed in 0.065 seconds on an SP-2 having 64 processing nodes. A serial implementation takes 3.45 seconds on a single processing node of SP-2. A previous implementation on CM-5 takes 0.1 second on a partition of 512 processing nodes. Experimental results on various sizes of images using 4, 8, 16, 32, and 64 processing nodes are also reported.
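The SP-2 timings quoted above (3.45 seconds serial, 0.065 seconds on 64 nodes) can be turned into a speedup and a parallel efficiency with the usual definitions. The short sketch below only does that arithmetic; the function name `speedup_and_efficiency` is introduced here for illustration.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, nodes):
    """Speedup = T_serial / T_parallel; efficiency = speedup / nodes."""
    speedup = serial_seconds / parallel_seconds
    return speedup, speedup / nodes

# Timings taken from the abstract: one SP-2 node vs. 64 SP-2 nodes.
speedup, efficiency = speedup_and_efficiency(3.45, 0.065, 64)
print(f"speedup ~ {speedup:.1f}x, efficiency ~ {efficiency:.2f}")
```

With these figures the speedup is roughly 53 on 64 nodes, an efficiency of about 0.83.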
1993 Computer Architectures for Machine Perception | 1993
Viktor K. Prasanna; Cho-Li Wang; Ashfaq A. Khokhar
The authors study low-level vision processing on the Connection Machine CM-5. A parallel computing model to capture the architectural features of CM-5 is identified. In this model, given an n by n image, it is shown that a low-level vision system, which includes edge detection, thinning, linking, and linear approximation, can be performed in O(n^2/P) time using P processors. These algorithms are scalable in the range 1 ≤ P ≤ n. Various experiments were conducted to fine-tune the implementation to suit the communication and the computation capabilities of the machine. Based on these experiments, implementations were performed to efficiently utilize the architectural and programming features of the machine. The implementations show that, given a 2048 × 2048 grey-level image as input, linear features can be extracted in less than 1.1 seconds on a CM-5 partition having 512 processing nodes. A serial implementation on a Sun Sparc 400 takes more than eight minutes. Experimental results on various sizes of images using various partitions of CM-5 are also reported. The software has been developed in a modular fashion to permit various techniques to be employed for the individual steps of the processing.
Journal of Parallel and Distributed Computing | 2002
Wenheng Liu; Cho-Li Wang; Viktor K. Prasanna
In irregular all-to-all communication, messages are exchanged between every pair of processors. The message sizes vary from processor to processor and are known only at run time. This is a fundamental communication primitive in parallelizing irregularly structured scientific computations. Our algorithm reduces the total number of message start-ups. It also reduces node contention by smoothing out the lengths of the messages communicated. As compared to the earlier approaches, our algorithm provides deterministic performance and also reduces the buffer space at the nodes during message passing. The performance of the algorithm is characterised using a simple communication model of high-performance computing (HPC) platforms. We show the implementation on T3D and SP2 using C and the message passing interface standard. These can be easily ported to other HPC platforms. The results show the effectiveness of the proposed technique as well as the interplay among the machine size, the variance in message length, and the network interface.
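To see why the variance in message lengths matters, consider the following small simulation. It is not the algorithm from the paper: it assumes a simple step-synchronous pairwise schedule (in step k, processor i sends to processor (i + k) mod P) and a linear cost of one start-up `t_s` plus `t_b` per byte, with each step lasting as long as its longest message. The function name and all numbers are hypothetical.

```python
def pairwise_exchange_time(lengths, t_s, t_b):
    """lengths[i][j] is the number of bytes processor i sends to processor j.
    Returns the total time of a P-1 step pairwise exchange in which every
    step waits for its longest message."""
    P = len(lengths)
    total = 0.0
    for k in range(1, P):
        longest = max(lengths[i][(i + k) % P] for i in range(P))
        total += t_s + t_b * longest
    return total

P = 4
# Uniform case: every off-diagonal message is 100 bytes.
uniform = [[0 if i == j else 100 for j in range(P)] for i in range(P)]

# Skewed case: same total volume, but 270 bytes are moved onto one message.
skewed = [row[:] for row in uniform]
for i in (1, 2, 3):
    skewed[i][(i + 1) % P] -= 90
    skewed[0][1] += 90

uniform_time = pairwise_exchange_time(uniform, t_s=50.0, t_b=1.0)
skewed_time = pairwise_exchange_time(skewed, t_s=50.0, t_b=1.0)
print(uniform_time, skewed_time)
```

Both matrices carry the same total number of bytes, yet under this model the skewed one takes longer because a single long message stalls its whole step. Smoothing message lengths, as the abstract describes, targets exactly this effect.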
international conference on pattern recognition | 1994
Viktor K. Prasanna; Cho-Li Wang
Perceptual grouping is a key step in vision to organize image data into structural hypotheses to be used for high level analysis. We propose data allocation and load balancing strategies which reduce the communication cost and evenly distribute the grouping operations among the processors. These techniques result in scalable algorithms for performing perceptual grouping on CM-5. The performance of our algorithms depends only on the total grouping operations generated by the image data and is independent of the distribution of the data among the processors. Our implementations show that given a 1K × 1K input image, extraction of line segments and several perceptual grouping steps can be performed in 5.0 seconds using a partition of CM-5 having 32 processing nodes. A serial implementation of these steps on a Sun Sparc 400 takes more than 2 minutes.
conference on computer architectures for machine perception | 1995
Cho-Li Wang; Viktor K. Prasanna; Young Won Lim
We propose architecture-independent parallel algorithms for solving perceptual grouping tasks on distributed memory machines. Given an n × n image, using P processors, we show that these tasks can be performed in O(n^2/P) computation time and 20√P·T_d + 8(log P)·T_d + (40n/√P + 20P)·τ_d communication time, where T_d is the communication startup time and τ_d is the transmission rate. Our implementations show that, given 7K line segments extracted from a 1K × 1K image, the line grouping task can be performed in 1.115 seconds using a partition of CM-5 having 256 processing nodes and in 0.382 seconds using a 16-node Cray T3D. Our code is written in C using the MPI message passing standard and can be easily ported to other high performance computing platforms.
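The communication bound in this abstract, 20√P·T_d + 8(log P)·T_d + (40n/√P + 20P)·τ_d, is easy to evaluate numerically. The sketch below does so under two assumptions that the abstract does not state: the logarithm is taken base 2, and τ_d is treated as time per unit of data. The function name `grouping_comm_time` and the machine parameters in the example are hypothetical, not measured values.

```python
from math import log2, sqrt

def grouping_comm_time(n, P, T_d, tau_d):
    """Evaluate 20*sqrt(P)*T_d + 8*log2(P)*T_d + (40*n/sqrt(P) + 20*P)*tau_d.
    Assumes log base 2 and tau_d expressed as time per data unit."""
    startup_term = (20 * sqrt(P) + 8 * log2(P)) * T_d
    transfer_term = (40 * n / sqrt(P) + 20 * P) * tau_d
    return startup_term + transfer_term

# Hypothetical machine parameters, chosen only to show the trend.
T_d = 100e-6     # start-up cost per message, in seconds
tau_d = 0.1e-6   # cost per data unit, in seconds

for P in (16, 64, 256):
    t = grouping_comm_time(1024, P, T_d, tau_d)
    print(f"P = {P:3d}: estimated communication time {t:.4f} s")
```

The start-up term grows with √P while the 40n/√P part of the transfer term shrinks, so the bound reflects a trade-off between the number of messages and the amount of data each processor moves.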
Journal of Parallel and Distributed Computing | 1998
Yongwha Chung; Cho-Li Wang; Viktor K. Prasanna
Perceptual grouping is a key intermediate-level vision problem. Parallel solutions to this problem are characterized by uneven distribution of symbolic features among the processors, unbalanced workload, and irregular interprocessor data dependency caused by the input image. In this paper, we propose two load-balancing techniques for parallelizing perceptual grouping on distributed-memory machines. By using an initial workload estimate, we first partition the computations to distribute the workload across the processors. In addition, we asynchronously perform ongoing task migrations to adapt to the unbalanced workload which may evolve differently from the initial estimate. We also discuss two strategies to manage the irregular interprocessor data dependency. To illustrate our ideas, perceptual grouping steps used in an integrated vision system for building detection are used as examples. Our experimental results show that, given 8K extracted line segments from a 1K × 1K image, both the line and junction grouping steps can be completed in 0.644 s on a 32-node SP2 and in 0.585 s on a 32-node T3D. For the same grouping steps, a serial implementation requires 10.550 s and 10.023 s on a single node of SP2 and T3D, respectively. The implementations were performed using the message passing interface standard and are portable to other high performance computing platforms.
Archive | 1995
Viktor K. Prasanna; Cho-Li Wang
This chapter summarizes our work in using Connection Machine CM-5 for vision. We define a realistic model of CM-5 in which explicit cost is associated with data routing and cooperative operations. Using this model, we develop scalable parallel algorithms for representative problems in vision computations at all three levels: low-level, intermediate-level and high-level.
ieee international conference on high performance computing data and analytics | 1994
Viktor K. Prasanna; Cho-Li Wang
Presents scalable parallel algorithms for object recognition using geometric hashing. We define an abstract model of the CM-5 computer. We develop a load balancing technique that results in scalable processor-time optimal algorithms for performing a probe on the CM-5 model. Given a model of a CM-5 with P processor nodes and a set S of feature points in a scene, a probe of the recognition phase can be performed in O(|V(S)|/P) time, where V(S) is the set of votes cast by feature points in S. This algorithm is scalable in the range 1 ≤ P ≤ √(|V(S)|/log|V(S)|). These results do not assume any distributions of hash bin lengths or scene points. The implementations developed in this paper require a number of processors which is independent of the size of the model database and which is scalable with the machine size.