Siang W. Song | Researchain

Publication

Featured research published by Siang W. Song.

International Colloquium on Automata, Languages and Programming | 1997

Efficient Parallel Graph Algorithms For Coarse Grained Multicomputers and BSP

Edson Norberto Cáceres; Frank K. H. A. Dehne; Afonso Ferreira; Paola Flocchini; Ingo Rieping; Alessandro Roncato; Nicola Santoro; Siang W. Song

In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulk-synchronous parallel computer (BSP) models which solve the following well-known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, (7) 2-edge connectivity and biconnectivity (testing and component computation), and (8) chordal graph recognition (finding a perfect elimination ordering). The algorithms for Problems 1–7 require O(log p) communication rounds and linear sequential work per round. Our results for Problems 1 and 2 hold for arbitrary ratios \(\frac{n}{p}\), i.e. they are fully scalable, and for Problems 3–8 it is assumed that \(\frac{n}{p} \geqslant p^{\epsilon}\), \(\epsilon > 0\), which is true for all commercially available multiprocessors. We view the algorithms presented as an important step towards the final goal of O(1) communication rounds. Note that the number of communication rounds obtained in this paper is independent of n and grows only very slowly with respect to p. Hence, for most practical purposes, the number of communication rounds can be considered as constant. The result for Problem 1 is a considerable improvement over those previously reported. The algorithms for Problems 2–7 are the first practically relevant deterministic parallel algorithms for these problems to be used for commercially available coarse grained parallel machines.

Foundations of Computer Science | 1977

An efficient parallel garbage collection system and its correctness proof

H. T. Kung; Siang W. Song

An efficient system to perform garbage collection in parallel with list operations is proposed and its correctness is proven. The system consists of two independent processes sharing a common memory. One process is performed by the list processor (LP) for list processing and the other by the garbage collector (GC) for marking active nodes and collecting garbage nodes. The system is derived by using both correctness and efficiency arguments. Assuming that memory references are indivisible, the system satisfies the following properties: (1) no critical sections are needed in the entire system; (2) the time to perform the marking phase by the GC is independent of the size of memory and depends only on the number of active nodes; (3) nodes on the free list need not be marked during the marking phase by the GC; (4) minimum overheads are introduced to the LP; and (5) only two extra bits for encoding four colors are needed for each node. Efficiency results show that the parallel system is usually significantly more efficient in terms of storage and time than the sequential stack algorithm.

International Conference on Computational Science and Its Applications | 2003

A parallel wavefront algorithm for efficient biological sequence comparison

Carlos Eduardo Rodrigues Alves; Edson Norberto Cáceres; Frank K. H. A. Dehne; Siang W. Song

In this paper we present a parallel wavefront algorithm for computing an alignment between two strings A and C, with |A| = m and |C| = n. On a distributed memory parallel computer of p processors each with O((m + n)/p) memory, the proposed algorithm requires O(p) communication rounds and O(mn/p) local computing time. The novelty of this algorithm is based on a compromise between the workload of each processor and the number of communication rounds required, expressed by a parameter called α. The proposed algorithm is expressed in terms of this parameter that can be tuned to obtain the best overall parallel time in a given implementation. We show very promising experimental results obtained on a 64-node Beowulf machine. A characteristic of the wavefront communication requirement is that each processor communicates with few other processors. This makes it very suitable as a potential application for grid computing.
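The wavefront idea behind this abstract can be illustrated with a small sequential sketch. It fills a global alignment score table one anti-diagonal at a time; every cell on the same anti-diagonal depends only on earlier anti-diagonals, so those cells could in principle be computed by different processors. The scoring values (match, mismatch, gap) and the function name are assumptions made for illustration, and the sketch does not model the paper's α parameter or its distribution of work among processors.

```python
def wavefront_alignment_score(a, c, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and c, filled by anti-diagonals.

    The scoring scheme is an illustrative assumption, not taken from the paper.
    """
    m, n = len(a), len(c)
    # score[i][j] = best score for aligning a[:i] with c[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap

    # All cells with i + j == d form one wavefront (anti-diagonal).
    # They only read cells from wavefronts d - 1 and d - 2, so the inner
    # loop has no dependencies between its iterations.
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            diag = score[i - 1][j - 1] + (match if a[i - 1] == c[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[m][n]


if __name__ == "__main__":
    print(wavefront_alignment_score("ACGT", "AGT"))
```

In a distributed setting the table would be split into blocks rather than single cells, which is where a trade-off between per-processor workload and the number of communication rounds arises.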

International Journal of Parallel Programming | 1997

Randomized parallel list ranking for distributed memory multiprocessors

Frank K. H. A. Dehne; Siang W. Song

We present a randomized parallel list ranking algorithm for distributed memory multiprocessors, using a BSP type model. We first describe a simple version which requires, with high probability, log(3p) + log ln(n) = Õ(log p + log log n) communication rounds (h-relations with h = Õ(n/p)) and Õ(n/p) local computation. We then outline an improved version that requires, with high probability, only \(r \leqslant (4k+6)\log(\frac{2}{3}p) + 8 = \tilde{O}(k \log p)\) communication rounds, where \(k = \min\{i \geqslant 0 \mid \ln^{(i+1)} n \leqslant (\frac{2}{3}p)^{2i+1}\}\). Note that k < ln*(n) is an extremely small number. For \(n \leqslant 10^{10^{100}}\) and p ≥ 4, the value of k is at most 2. Hence, for a given number of processors, p, the number of communication rounds required is, for all practical purposes, independent of n. For n ≤ 1,500,000 and 4 ≤ p ≤ 2048, the number of communication rounds in our algorithm is bounded, with high probability, by 78, but the actual number of communication rounds observed so far is 25 in the worst case. For \(n \leqslant 10^{10^{100}}\) and 4 ≤ p ≤ 2048, the number of communication rounds in our algorithm is bounded, with high probability, by 118; and we conjecture that the actual number of communication rounds required will not exceed 50. Our algorithm has a considerably smaller number of communication rounds than the list ranking algorithm used in Reid-Miller’s empirical study of parallel list ranking on the Cray C-90.(1) To our knowledge, Reid-Miller’s algorithm(1) was the fastest list ranking implementation so far. Therefore, we expect that our result will have considerable practical relevance.
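For readers unfamiliar with the list ranking problem, the following sketch simulates the classic deterministic pointer-jumping approach on a single machine. It is not the randomized algorithm of this paper: pointer jumping needs roughly log2(n) synchronous rounds, which is exactly the dependence on n that the abstract says the randomized algorithm avoids for practical purposes. The function name and the convention that the tail node points to itself are assumptions made for illustration.

```python
def list_rank(succ):
    """Rank every node of a linked list by its distance to the tail.

    succ[i] is the successor of node i; the tail points to itself
    (an illustrative convention). Returns (ranks, number_of_rounds).
    """
    n = len(succ)
    nxt = list(succ)
    # rank[i] always holds the distance from node i to nxt[i].
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    # Keep jumping while some pointer has not yet reached the tail.
    while any(nxt[nxt[i]] != nxt[i] for i in range(n)):
        # Both updates read the old arrays, mimicking one synchronous round
        # in which every node doubles the distance its pointer spans.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
        rounds += 1
    return rank, rounds


if __name__ == "__main__":
    # The list 0 -> 2 -> 1 -> 3, stored in arbitrary order.
    print(list_rank([2, 3, 1, 3]))
```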

IEEE International Symposium on Fault Tolerant Computing | 1989

Comprehensive evaluation of a two-dimensional configurable array

Onat Menzilcioglu; H. T. Kung; Siang W. Song

An evaluation is presented of a highly configurable architecture for two-dimensional arrays of powerful processors. The evaluation is based on an array of Warp cells and uses real application programs. The evaluation covers the areas of configurability, array survivability, and performance degradation. The software and algorithms developed for the evaluation are also discussed. The results based on simulations of small and medium size arrays (up to 16×16) show that a high degree of configurability and array survivability can be achieved with little impact on program performance.

ASIAN '96 Proceedings of the Second Asian Computing Science Conference on Concurrency and Parallelism, Programming, Networking, and Security | 1996

Randomized Parallel List Ranking for Distributed Memory Multiprocessors

Frank K. H. A. Dehne; Siang W. Song

We present a randomized parallel list ranking algorithm for distributed memory multiprocessors. A simple version requires, with high probability, log(3p) + log ln(n) = O(log p + log log n) communication rounds (h-relations with h = O(n/p)) and O(n/p) local computation. An improved version requires, with high probability, only \(r \leqslant (4k+6)\log(\frac{2}{3}p) + 8 = O(k \log p)\) communication rounds, where \(k = \min\{i \geqslant 0 \mid \ln^{(i+1)} n \leqslant (\frac{2}{3}p)^{2i+1}\}\). Note that k < ln*(n) is an extremely small number. For \(n \leqslant 10^{10^{100}}\) and p ≥ 4, the value of k is at most 2. For a given number of processors, p, the number of communication rounds required is, for all practical purposes, independent of n. For \(n \leqslant 10^{10^{100}}\) and 4 ≤ p ≤ 2048, the number of communication rounds in our algorithm is bounded, with high probability, by 118. We conjecture that the actual number of communication rounds will not exceed 50.

Discrete Applied Mathematics | 2008

An all-substrings common subsequence algorithm

Carlos Eduardo Rodrigues Alves; Edson Norberto Cáceres; Siang W. Song

Given two strings A and B of lengths \(n_a\) and \(n_b\), \(n_a \leqslant n_b\), respectively, the all-substrings longest common subsequence (ALCS) problem obtains, for every substring B′ of B, the length of the longest string that is a subsequence of both A and B′. The ALCS problem has many applications, such as finding approximate tandem repeats in strings, solving the circular alignment of two strings and finding the alignment of one string with several others that have a common substring. We present an algorithm to prepare the basic data structure for ALCS queries that takes \(O(n_a n_b)\) time and \(O(n_a + n_b)\) space. After this preparation, it is possible to build a matrix of size \(O(n_b^2)\) that allows any LCS length to be retrieved in constant time. Some trade-offs between the space required and the querying time are discussed. To our knowledge, this is the first algorithm in the literature for the ALCS problem.
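To make the problem statement concrete, here is a naive baseline that builds the full query matrix directly from the definition. It runs one standard LCS dynamic program per starting position in B, so it takes time proportional to n_a · n_b², far more than the preparation time reported in the abstract; it is not the authors' algorithm. The function name and the half-open indexing convention are assumptions made for illustration.

```python
def alcs_table(a, b):
    """Naive ALCS: table[i][j] = LCS length of a and b[i:j], for 0 <= i <= j <= len(b).

    A brute-force baseline for illustration, not the algorithm from the paper.
    """
    nb = len(b)
    table = [[0] * (nb + 1) for _ in range(nb + 1)]
    for i in range(nb + 1):
        width = nb - i
        # prev[t] = LCS length of the processed prefix of a and b[i:i+t].
        prev = [0] * (width + 1)
        for x in a:
            cur = [0]
            for t in range(1, width + 1):
                if x == b[i + t - 1]:
                    cur.append(prev[t - 1] + 1)
                else:
                    cur.append(max(prev[t], cur[t - 1]))
            prev = cur
        # After all of a is processed, prev covers every substring starting at i.
        for t in range(width + 1):
            table[i][i + t] = prev[t]
    return table


if __name__ == "__main__":
    t = alcs_table("abc", "xaybzc")
    print(t[0][6], t[1][4], t[3][6])
```

Once such a matrix exists, any query for a substring of B is a single lookup, which matches the constant-time retrieval described above.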

Symposium on Computer Architecture and High Performance Computing | 2002

A parallel approximation hitting set algorithm for gene expression analysis

D.P. Ruchkys; Siang W. Song

With the recent DNA-microarray technology, it is possible to measure the expression levels of thousands of genes simultaneously in the same experiment. A genetic network is a model that describes how the expression level of each gene is affected by the expression levels of other genes in the network. Given the results of an experiment with n genes and m measures over time (m << n), we consider the problem of finding a subset of genes (k genes, where k << n) that explain the expression level of a given target gene under study. We consider the coarse-grained multicomputer (CGM) model, with p processors. In this paper we first present a sequential approximation algorithm of \(O(m^4 n)\) time and \(O(m^2 n)\) space. The main result is a new parallel approximation algorithm that determines the k genes in \(O(m^4 n/p)\) local computing time plus O(k) communication rounds, and with space requirement of \(O(m^2 n/p)\). The p factor in the parallel time and space complexities indicates a good parallelization. We also show preliminary promising experimental results on a Beowulf machine. To our knowledge there are no CGM algorithms for the problem considered in this paper.

Algorithmica | 1999

A Note on Parallel Selection on Coarse-Grained Multicomputers

E. L. G. Saukas; Siang W. Song

Consider the selection problem of determining the k-th smallest element of a set of n elements. Under the CGM (coarse-grained multicomputer) model with p processors and O(n/p) local memory, we present a deterministic parallel algorithm for the selection problem that requires O(log p) communication rounds. Besides requiring a low number of communication rounds, the algorithm also attempts to minimize the total amount of data transmitted in each round (only O(p) except in the last round). In addition to showing theoretical complexities, we present very promising experimental results obtained on a parallel machine that show almost linear speedup, indicating the efficiency and scalability of the proposed algorithm.
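A round-based selection of this kind can be sketched as a single-machine simulation. In the sketch below, each simulated processor reports only its local median and element count per round, and the weighted median of those medians is used as a pivot to discard elements. Choosing the pivot this way is an assumption made for illustration, in the spirit of the small per-round communication the abstract describes; it is not claimed to be the authors' exact algorithm, and the function name is hypothetical.

```python
def cgm_select(parts, k):
    """Return (k-th smallest element, rounds used); k is 1-based.

    parts is a list of lists, one per simulated processor.
    An illustrative simulation, not the algorithm from the paper.
    """
    parts = [list(p) for p in parts]
    n = sum(len(p) for p in parts)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= total number of elements")
    rounds = 0
    while True:
        rounds += 1
        # Each non-empty processor reports (local median, local count).
        reports = sorted((sorted(p)[(len(p) - 1) // 2], len(p)) for p in parts if p)
        total = sum(w for _, w in reports)
        # Weighted median of the reported medians becomes the pivot.
        acc = 0
        for pivot, w in reports:
            acc += w
            if 2 * acc >= total:
                break
        # Each processor counts its elements below and equal to the pivot.
        less = sum(1 for p in parts for x in p if x < pivot)
        equal = sum(1 for p in parts for x in p if x == pivot)
        if k <= less:
            parts = [[x for x in p if x < pivot] for p in parts]
        elif k <= less + equal:
            return pivot, rounds
        else:
            parts = [[x for x in p if x > pivot] for p in parts]
            k -= less + equal


if __name__ == "__main__":
    print(cgm_select([[9, 1, 5], [3, 7], [8, 2, 6, 4]], 5)[0])
```

Every round removes at least the pivot itself, so the loop always terminates; the number of rounds it reports is a property of this simulation only.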

Conference on High Performance Computing (Supercomputing) | 1998

Efficient Selection Algorithms on Distributed Memory Computers

E. L. G. Saukas; Siang W. Song

Consider the selection problem of determining the k-th smallest element of a sequence of n elements. Under the CGM (Coarse Grained Multicomputer) model with p processors and O(n/p) local memory, we present a deterministic parallel algorithm for the selection problem that requires O(log p) communication rounds. Besides requiring a low number of communication rounds, the algorithm also attempts to minimize the total amount of data transmitted in each round (only O(p) except in the last round). The basic algorithm is then extended to solve the problem of q simultaneous selections using the same input sequence, also in O(log p) communication rounds and asymptotically the same local computing time (if q = O(p)). The simultaneous selection algorithm gives rise to a communication efficient sorting algorithm, with O(log p) communication rounds and a total of \(O(p^2)\) data transmitted in each round except in the last one. In addition to showing theoretical complexities, we present very promising experimental results obtained on two parallel machines that show almost linear speedup, indicating the efficiency and scalability of the proposed algorithms. To our knowledge, this is the best deterministic CGM algorithm in the literature for the selection problem.
