Cho-Li Wang

University of Southern California


Publication

Featured research published by Cho-Li Wang.

Journal of Parallel and Distributed Computing | 1994

Scalable data parallel implementations of object recognition using geometric hashing

Cho-Li Wang; Viktor K. Prasanna; Hyoung Joong Kim; Ashfaq A. Khokhar

Object recognition involves identifying known objects in a given scene. It plays a key role in image understanding. Geometric hashing has been proposed as a technique for model-based object recognition in occluded scenes. However, parallel techniques are needed to realize real-time vision systems employing geometric hashing. In this paper, we present scalable parallel algorithms for object recognition using geometric hashing. We define a realistic abstract model of CM-5 in which explicit cost is associated with data routing and synchronization. We develop a load-balancing technique that results in scalable processor-time optimal algorithms for performing a probe on this model. Given a model of CM-5 with P processing nodes (PNs) and a set S of feature points in a scene, a probe of the recognition phase can be performed in O(|V(S)|/P) time, where V(S) is the set of votes cast by feature points in S. This algorithm is scalable in the range 1 ≤ P ≤ |V(S)|^(1/3). On a mesh processor array of any size [formula] × [formula] which models MP-1, we show that a probe can be performed in O(|V(S)|/[formula]) time, for log2|V(S)| ≤ P ≤ |V(S)|. These results do not assume any distributions of hash bin lengths or scene points. In earlier parallel implementations, the number of processors employed was independent of the size of the scene but depended on the size of the model database (which is usually very large). Our implementations on CM-5 and MP-1 significantly improve upon the number of processors employed and also result in superior time performance. The implementations developed in this paper require a number of processors that is independent of the size of the model database and are scalable with the machine size. Results of concurrent processing of multiple probes are also reported.
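For readers unfamiliar with the technique, the following is a minimal serial sketch of geometric hashing in Python. It is not the authors' CM-5 or MP-1 implementation. The function names, the 2D similarity-invariant basis, the quantization cell size, and the example model are all illustrative assumptions. A probe fixes one basis pair in the scene, re-expresses the remaining scene points in that basis, and casts one vote for every hash-table entry found in the matching bin; the total number of entries visited is the quantity the abstract calls the set of votes.

```python
from collections import defaultdict
from itertools import permutations


def to_basis_frame(p, b0, b1):
    """Express p in the frame where b0 maps to (0, 0) and b1 maps to (1, 0)."""
    ux, uy = b1[0] - b0[0], b1[1] - b0[1]
    px, py = p[0] - b0[0], p[1] - b0[1]
    norm2 = ux * ux + uy * uy
    # Component along the basis vector, then along its perpendicular.
    return ((px * ux + py * uy) / norm2, (py * ux - px * uy) / norm2)


def quantize(coord, cell=0.25):
    """Map continuous frame coordinates to a hash bin (cell size is assumed)."""
    return (round(coord[0] / cell), round(coord[1] / cell))


def build_table(models, cell=0.25):
    """Preprocessing: store (model, basis) entries for every ordered basis pair."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k in (i, j):
                    continue
                key = quantize(to_basis_frame(p, pts[i], pts[j]), cell)
                table[key].append((name, i, j))
    return table


def probe(table, scene, b0_idx, b1_idx, cell=0.25):
    """Recognition: one probe with a fixed scene basis; returns the vote counts."""
    votes = defaultdict(int)
    for k, p in enumerate(scene):
        if k in (b0_idx, b1_idx):
            continue
        key = quantize(to_basis_frame(p, scene[b0_idx], scene[b1_idx]), cell)
        for entry in table.get(key, []):
            votes[entry] += 1
    return votes


# Hypothetical model, and a scene that is the model scaled by 2 and shifted by (5, 7).
models = {"L": [(0, 0), (2, 0), (2, 1), (0, 3)]}
scene = [(5, 7), (9, 7), (9, 9), (5, 13)]
table = build_table(models)
votes = probe(table, scene, 0, 1)
```

In this example both non-basis scene points fall in the bins recorded for model "L" under basis (0, 1), so that entry collects two votes. Because the bin lengths scanned by each scene point differ, the per-point work is uneven, which is the load imbalance the parallel algorithms in the paper address.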

International Conference on Distributed Computing Systems | 1996

Portable and scalable algorithms for irregular all-to-all communication

Wenheng Liu; Cho-Li Wang; Viktor K. Prasanna

In this paper we develop portable and scalable algorithms for performing irregular all-to-all communication in High Performance Computing (HPC) systems. To minimize the communication latency, the algorithm reduces the total number of messages transmitted, reduces the variance of the lengths of these messages, and overlaps the communication with computation. The performance of the algorithm is characterized using a simple model of HPC systems. Our implementations are performed using the Message Passing Interface (MPI) standard and they can be ported to various HPC platforms. The performance of our algorithms is evaluated on CM5, T3D and SP2. The results show the effectiveness of the techniques as well as the interplay between the architectural features, the machine size, and the variance of message lengths. The experiences of our study can be applied in other HPC systems to optimize the performance of collective communication operations.

Conference on Computer Architectures for Machine Perception | 1995

A fast asynchronous algorithm for linear feature extraction on IBM SP-2

Yongwha Chung; Viktor K. Prasanna; Cho-Li Wang

We present a fast parallel implementation of linear feature extraction on IBM SP-2. We first analyze the machine features and the problem characteristics to understand the overheads in parallel solutions to the problem. Based on these, we propose an asynchronous algorithm which enhances processor utilization and overlaps communication with computation by maintaining algorithmic threads in each processing node. Our implementation shows that, given a 512 × 512 image, the linear feature extraction task can be performed in 0.065 seconds on an SP-2 having 64 processing nodes. A serial implementation takes 3.45 seconds on a single processing node of SP-2. A previous implementation on CM-5 takes 0.1 second on a partition of 512 processing nodes. Experimental results on various sizes of images using 4, 8, 16, 32, and 64 processing nodes are also reported.
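The timings quoted above imply a speedup and a parallel efficiency, which the short Python calculation below works out. The helper name is an assumption, and the derived figures are simple arithmetic on the reported times rather than values stated by the authors.

```python
def speedup_and_efficiency(serial_s, parallel_s, nodes):
    """Return (speedup, efficiency) from a serial and a parallel running time."""
    speedup = serial_s / parallel_s
    return speedup, speedup / nodes


# Times reported in the abstract: 3.45 s serial, 0.065 s on 64 SP-2 nodes.
sp2_speedup, sp2_efficiency = speedup_and_efficiency(3.45, 0.065, 64)
print(f"speedup = {sp2_speedup:.1f}, efficiency = {sp2_efficiency:.2f}")
```

This gives a speedup of roughly 53 on 64 nodes, or about 83% efficiency.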

1993 Computer Architectures for Machine Perception | 1993

Low level vision processing on connection machine CM-5

Viktor K. Prasanna; Cho-Li Wang; Ashfaq A. Khokhar

The authors study low level vision processing on Connection Machine CM-5. A parallel computing model to capture the architectural features of CM-5 is identified. In this model, given an n by n image, it is shown that a low level vision system, which includes edge detection, thinning, linking, and linear approximation, can be performed in O(n^2/P) time using P processors. These algorithms are scalable in the range 1 ≤ P ≤ n. Various experiments were conducted to fine tune the implementation to suit the communication and the computation capabilities of the machine. Based on these experiments, implementations were performed to efficiently utilize the architectural and programming features of the machine. The implementations show that, given a 2048 × 2048 grey level image as input, linear features can be extracted in less than 1.1 seconds on a CM-5 partition having 512 processing nodes. A serial implementation on a Sun Sparc 400 takes more than eight minutes. Experimental results on various sizes of images using various partitions of CM-5 are also reported. The software has been developed in a modular fashion to permit various techniques to be employed for the individual steps of the processing.

Journal of Parallel and Distributed Computing | 2002

Portable and scalable algorithm for irregular all-to-all communication

Wenheng Liu; Cho-Li Wang; Viktor K. Prasanna

In irregular all-to-all communication, messages are exchanged between every pair of processors. The message sizes vary from processor to processor and are known only at run time. This is a fundamental communication primitive in parallelizing irregularly structured scientific computations. Our algorithm reduces the total number of message start-ups. It also reduces node contention by smoothing out the lengths of the messages communicated. As compared to the earlier approaches, our algorithm provides deterministic performance and also reduces the buffer space at the nodes during message passing. The performance of the algorithm is characterised using a simple communication model of high-performance computing (HPC) platforms. We show the implementation on T3D and SP2 using C and the message passing interface standard. These can be easily ported to other HPC platforms. The results show the effectiveness of the proposed technique as well as the interplay among the machine size, the variance in message length, and the network interface.

International Conference on Pattern Recognition | 1994

Scalable parallel implementations of perceptual grouping on connection machine CM-5

Viktor K. Prasanna; Cho-Li Wang

Perceptual grouping is a key step in vision to organize image data into structural hypotheses to be used for high level analysis. We propose data allocation and load balancing strategies which reduce the communication cost and evenly distribute the grouping operations among the processors. These techniques result in scalable algorithms for performing perceptual grouping on CM-5. The performance of our algorithms depends only on the total grouping operations generated by the image data and is independent of the distribution of the data among the processors. Our implementations show that given a 1K × 1K input image, extraction of line segments and several perceptual grouping steps can be performed in 5.0 seconds using a partition of CM-5 having 32 processing nodes. A serial implementation of these steps on a Sun Sparc 400 takes more than 2 minutes.

Conference on Computer Architectures for Machine Perception | 1995

Parallelization of perceptual grouping on distributed memory machines

Cho-Li Wang; Viktor K. Prasanna; Young Won Lim

We propose architecture independent parallel algorithms for solving perceptual grouping tasks on distributed memory machines. Given an n × n image, using P processors, we show that these tasks can be performed in O(n^2/P) computation time and 20√P T_d + 8(log P) T_d + (40n/√P + 20P) τ_d communication time, where T_d is the communication startup time and τ_d is the transmission rate. Our implementations show that, given 7K line segments extracted from a 1K × 1K image, the line grouping task can be performed in 1.115 seconds using a partition of CM-5 having 256 processing nodes and in 0.382 seconds using a 16 node Cray T3D. Our code is written in C using the MPI message passing standard and can be easily ported to other high performance computing platforms.
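The communication bound stated above, 20√P T_d + 8(log P) T_d + (40n/√P + 20P) τ_d, can be evaluated directly. The Python sketch below does so under two labeled assumptions: the logarithm is taken base 2 (the abstract does not state the base), and τ_d is treated as a time per unit of data. The function name and the parameter values in the example are hypothetical and are not measurements from the paper.

```python
import math


def comm_time(n, P, T_d, tau_d):
    """Evaluate 20*sqrt(P)*T_d + 8*log(P)*T_d + (40*n/sqrt(P) + 20*P)*tau_d.

    Assumptions: log is base 2; T_d and tau_d share one time unit.
    """
    sqrt_p = math.sqrt(P)
    startup_terms = (20 * sqrt_p + 8 * math.log2(P)) * T_d
    transfer_terms = (40 * n / sqrt_p + 20 * P) * tau_d
    return startup_terms + transfer_terms


# Hypothetical machine parameters (in microseconds), chosen only for illustration.
for procs in (16, 64, 256):
    print(procs, comm_time(n=1024, P=procs, T_d=100.0, tau_d=0.1))
```

The start-up terms grow with P while the 40n/√P transfer term shrinks, so the expression makes the trade-off between machine size and message volume explicit.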

Journal of Parallel and Distributed Computing | 1998

Parallel Algorithms for Perceptual Grouping on Distributed Memory Machines

Yongwha Chung; Cho-Li Wang; Viktor K. Prasanna

Perceptual grouping is a key intermediate-level vision problem. Parallel solutions to this problem are characterized by uneven distribution of symbolic features among the processors, unbalanced workload, and irregular interprocessor data dependency caused by the input image. In this paper, we propose two load-balancing techniques for parallelizing perceptual grouping on distributed-memory machines. By using an initial workload estimate, we first partition the computations to distribute the workload across the processors. In addition, we asynchronously perform ongoing task migrations to adapt to the unbalanced workload which may evolve differently from the initial estimate. We also discuss two strategies to manage the irregular interprocessor data dependency. To illustrate our ideas, perceptual grouping steps used in an integrated vision system for building detection are used as examples. Our experimental results show that, given 8K extracted line segments from a 1K × 1K image, both the line and junction grouping steps can be completed in 0.644 s on a 32-node SP2 and in 0.585 s on a 32-node T3D. For the same grouping steps, a serial implementation requires 10.550 s and 10.023 s on a single node of SP2 and T3D, respectively. The implementations were performed using the message passing interface standard and are portable to other high performance computing platforms.
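As an illustration of the first idea only (partitioning from an initial workload estimate), the Python sketch below uses a greedy longest-processing-time heuristic. This is a stand-in technique chosen for brevity, not necessarily the partitioning scheme used in the paper, and it does not model the asynchronous task migration described above. The function name and the estimate values are hypothetical.

```python
import heapq


def partition_by_estimate(estimates, num_procs):
    """Greedy LPT: assign the largest remaining task to the least-loaded processor."""
    heap = [(0.0, p) for p in range(num_procs)]  # (estimated load, processor id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_procs)]
    # Visit tasks from largest to smallest estimated cost.
    for task in sorted(range(len(estimates)), key=lambda t: -estimates[t]):
        load, p = heapq.heappop(heap)
        assignment[p].append(task)
        heapq.heappush(heap, (load + estimates[task], p))
    loads = [sum(estimates[t] for t in tasks) for tasks in assignment]
    return assignment, loads


# Hypothetical per-task workload estimates, e.g. grouping operations per image block.
estimates = [7, 5, 4, 3, 3, 2]
assignment, loads = partition_by_estimate(estimates, num_procs=2)
```

A static split of this kind is only as good as the estimate, which is why the abstract pairs it with run-time migration for workloads that evolve differently from the initial prediction.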

Archive | 1995

Parallelizing Vision Computations on CM-5: Algorithms and Experiences

Viktor K. Prasanna; Cho-Li Wang

This chapter summarizes our work in using Connection Machine CM-5 for vision. We define a realistic model of CM-5 in which explicit cost is associated with data routing and cooperative operations. Using this model, we develop scalable parallel algorithms for representative problems in vision computations at all three levels: low-level, intermediate-level and high-level.

IEEE International Conference on High Performance Computing Data and Analytics | 1994

Scalable data parallel object recognition using geometric hashing on CM-5

Viktor K. Prasanna; Cho-Li Wang

Presents scalable parallel algorithms for object recognition using geometric hashing. We define an abstract model of the CM-5 computer. We develop a load balancing technique that results in scalable processor-time optimal algorithms for performing a probe on the CM-5 model. Given a model of a CM-5 with P processor nodes and a set S of feature points in a scene, a probe of the recognition phase can be performed in O(|V(S)|/P) time, where V(S) is the set of votes cast by feature points in S. This algorithm is scalable in the range 1 ≤ P ≤ √(|V(S)|/log|V(S)|). These results do not assume any distributions of hash bin lengths or scene points. The implementations developed in this paper require a number of processors which is independent of the size of the model database and which is scalable with the machine size.
