Nelson L. Passos | Researchain

Publication

Featured research published by Nelson L. Passos.

IEEE Transactions on Parallel and Distributed Systems | 1996

Achieving full parallelism using multidimensional retiming

Nelson L. Passos; Edwin Hsing-Mean Sha

Most scientific and digital signal processing (DSP) applications are recursive or iterative. Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. The retiming technique is a common and valuable transformation tool in one-dimensional problems, when loops are represented by data flow graphs (DFGs). In this paper, uniform nested loops are modeled as multidimensional data flow graphs (MDFGs). Full parallelism of the loop body, i.e., all nodes in the MDFG executed in parallel, substantially decreases the overall computation time. It is well known that, for one-dimensional DFGs, retiming cannot always achieve full parallelism. Other existing optimization techniques for nested loops also cannot always achieve full parallelism. This paper shows an important and counter-intuitive result, which proves that we can always obtain full parallelism for MDFGs with more than one dimension. This result is obtained by transforming the MDFG into a new structure. The restructuring process is based on a multidimensional retiming technique. The theory and two algorithms to obtain full parallelism are presented in this paper. Examples of optimization of nested loops and digital signal processing designs are shown to demonstrate the effectiveness of the algorithms.
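The idea of full parallelism in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: the graph, the retiming function `r`, and the candidate schedule vector are made-up values, and the sign convention d_r(e) = d(e) + r(u) - r(v) for an edge u -> v is an assumption stated here for illustration. The sketch only checks two properties: that every retimed edge carries a non-zero delay vector (so no node waits on another node of the same iteration), and that some schedule vector s gives s·d > 0 on every edge (so the iterations can still be ordered legally).

```python
from typing import Dict, List, Tuple

Vec = Tuple[int, int]
Edge = Tuple[str, str, Vec]  # (source node, target node, 2-D delay vector)


def retime(edges: List[Edge], r: Dict[str, Vec]) -> List[Edge]:
    """Apply the assumed convention d_r(e) = d(e) + r(u) - r(v) to each edge u -> v."""
    retimed = []
    for u, v, d in edges:
        new_d = (d[0] + r[u][0] - r[v][0], d[1] + r[u][1] - r[v][1])
        retimed.append((u, v, new_d))
    return retimed


def fully_parallel(edges: List[Edge]) -> bool:
    """True when no edge has a zero delay, i.e. no same-iteration dependence."""
    return all(d != (0, 0) for _, _, d in edges)


def admits_schedule(edges: List[Edge], s: Vec) -> bool:
    """True when s . d > 0 for every edge, so s orders the iterations legally."""
    return all(s[0] * d[0] + s[1] * d[1] > 0 for _, _, d in edges)


# Hypothetical three-node MDFG: A -> B -> C inside one iteration,
# with C feeding A across iterations in both loop dimensions.
edges: List[Edge] = [
    ("A", "B", (0, 0)),
    ("B", "C", (0, 0)),
    ("C", "A", (1, 0)),
    ("C", "A", (0, 1)),
]

# Hypothetical retiming: multiples of (1, -1) assigned along the chain.
r: Dict[str, Vec] = {"A": (2, -2), "B": (1, -1), "C": (0, 0)}

retimed = retime(edges, r)
for u, v, d in retimed:
    print(f"{u} -> {v}: delay {d}")
print("fully parallel before:", fully_parallel(edges))
print("fully parallel after: ", fully_parallel(retimed))
print("schedule vector (5, 4) legal:", admits_schedule(retimed, (5, 4)))
```

In this toy graph the zero-delay chain A -> B -> C disappears after retiming, while the vector (5, 4) still has a positive dot product with every retimed delay.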

International Conference on Parallel Processing | 1994

Full Parallelism in Uniform Nested Loops Using Multi-Dimensional Retiming

Nelson L. Passos; Edwin Hsing-Mean Sha

Most scientific and DSP applications are recursive or iterative. Uniform nested loops can be modeled as multi-dimensional data flow graphs (DFGs). Achieving full parallelism of the loop body, i.e., having all the computational nodes executed in parallel, substantially decreases the overall computation time. It is well known that for one-dimensional DFGs retiming cannot always achieve full parallelism. This paper shows an important and counter-intuitive result, which proves that we can always obtain full parallelism for DFGs with more than one dimension. It also presents two novel multi-dimensional retiming techniques to obtain full parallelism.

IEEE Transactions on Computers | 2000

Probabilistic loop scheduling for applications with uncertain execution time

Sissades Tongsima; Edwin Hsing-Mean Sha; Chantana Chantrapornchai; David R. Surma; Nelson L. Passos

One of the difficulties in high-level synthesis and compiler optimization is obtaining a good schedule without knowing the exact computation time of the tasks involved. The uncertain computation times of these tasks normally occur when conditional instructions are employed and/or inputs of the tasks influence the computation time. The relationship between these tasks can be represented as a data-flow graph where each node models the task associated with a probabilistic computation time. A set of edges represents the dependencies between tasks. In this research, we study scheduling and optimization algorithms taking into account the probabilistic execution times. Two novel algorithms, called probabilistic retiming and probabilistic rotation scheduling, are developed for solving the underlying non-resource-constrained and resource-constrained scheduling problems, respectively. Experimental results show that probabilistic retiming consistently produces a graph with a smaller longest path computation time for a given confidence level, as compared with the traditional retiming algorithm that assumes fixed worst-case or average-case computation times. Furthermore, when considering the resource constraints and probabilistic environments, probabilistic rotation scheduling gives a schedule whose length is guaranteed to satisfy a given probability requirement. This schedule is better than schedules produced by other algorithms that consider worst-case and average-case scenarios.
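To make "path computation time for a given confidence level" concrete, here is a minimal sketch under stated assumptions. It is not the probabilistic retiming algorithm from the paper: it handles a single path only, the node distributions are invented, and the node times are assumed to be independent so that the path distribution is a plain convolution. The helper names `convolve` and `time_at_confidence` are hypothetical.

```python
from typing import Dict, List

Dist = Dict[int, float]  # computation time -> probability


def convolve(a: Dist, b: Dist) -> Dist:
    """Distribution of the sum of two independent discrete computation times."""
    out: Dist = {}
    for ta, pa in a.items():
        for tb, pb in b.items():
            out[ta + tb] = out.get(ta + tb, 0.0) + pa * pb
    return out


def time_at_confidence(dist: Dist, confidence: float) -> int:
    """Smallest time t such that P(total <= t) >= confidence."""
    cumulative = 0.0
    for t in sorted(dist):
        cumulative += dist[t]
        if cumulative >= confidence - 1e-12:  # tolerate floating-point rounding
            return t
    return max(dist)


# Hypothetical path of three tasks, each usually fast and occasionally slow.
path: List[Dist] = [
    {1: 0.8, 4: 0.2},
    {2: 0.5, 3: 0.5},
    {1: 0.9, 5: 0.1},
]

total: Dist = {0: 1.0}
for node in path:
    total = convolve(total, node)

worst_case = max(total)
at_90 = time_at_confidence(total, 0.90)
print("worst-case path time:", worst_case)
print("path time at 90% confidence:", at_90)
```

With these made-up numbers the worst-case bound is noticeably larger than the time that holds with 90% probability, which is the gap a confidence-aware optimizer can exploit. A full longest-path analysis over a graph would also need the maximum of several path distributions, which this sketch leaves out.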

Design Automation Conference | 1994

Loop Pipelining for Scheduling Multi-Dimensional Systems via Rotation

Nelson L. Passos; Edwin Hsing-Mean Sha; Steven C. Bass

Multi-dimensional (MD) systems are widely used in scientific applications such as image processing, geophysical signal processing and fluid dynamics. Earlier scheduling methods in synthesizing MD systems do not explore loop pipelining across different dimensions. This paper explores the basic properties of MD loop pipelining and presents an algorithm, called multi-dimensional rotation scheduling, to find an efficient schedule based on the multi-dimensional retiming technique we developed. The description and the correctness of our algorithm are presented in the paper. The experiments show that our algorithm can achieve optimal results efficiently.

International Conference on Computer Aided Design | 1995

Push-up scheduling: Optimal polynomial-time resource constrained scheduling for multi-dimensional applications

Nelson L. Passos; Edwin Hsing-Mean Sha

Multi-dimensional computing applications, such as image processing and fluid dynamics, usually contain repetitive groups of operations represented by nested loops. The optimization of such loops, considering processing resource constraints, is required to improve their computational time. This study presents a new technique, called push-up scheduling, able to achieve the shortest possible schedule length in polynomial time. This technique transforms a multi-dimensional dataflow graph representing the problem, while assigning the loop operations to a schedule table in such a way as to occupy, legally, any empty spot. The algorithm runs in O(n|E|) time, where n is the number of dimensions of the problem and |E| is the number of edges in the graph.

International Conference on Parallel Processing | 1996

Polynomial-time nested loop fusion with full parallelism

Edwin Hsing-Mean Sha; Chenhua Lang; Nelson L. Passos

Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way of reducing synchronization and improving data locality. Traditional fusion techniques, however, either cannot address the case when fusion-preventing dependence exists in nested loops, or cannot achieve good parallelism after fusion. This paper gives a significant improvement by presenting several efficient polynomial-time algorithms to solve these problems. These algorithms, combined with the retiming technique, allow nested loop fusion in the presence of outermost loop-carried dependence as well as in the presence of fusion-preventing dependence. Furthermore, the technique is proved to achieve fully parallel execution of the fused loops.
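A fusion-preventing dependence, and how shifting one statement by an iteration removes it, can be shown with a small sketch. This is a deliberately simplified one-dimensional illustration, not the paper's nested-loop algorithms, and it demonstrates only that the fused loop stays legal, not the full-parallelism result. The arrays and function names are hypothetical, and the input is assumed to be non-empty.

```python
from typing import List, Tuple


def unfused(b: List[int]) -> Tuple[List[int], List[int]]:
    """Two separate loops; the second reads a[i + 1] produced by the first."""
    n = len(b)
    a = [0] * n
    c = [0] * (n - 1)
    for i in range(n):
        a[i] = b[i] + 1
    for i in range(n - 1):
        c[i] = a[i + 1] * 2
    return a, c


def fused_shifted(b: List[int]) -> Tuple[List[int], List[int]]:
    """One fused loop, assuming len(b) >= 1.

    Fusing the two bodies directly would read a[i + 1] before it is written.
    Delaying the second statement by one iteration (a retiming-style shift)
    makes iteration i compute a[i] and then c[i - 1], which only needs a[i].
    """
    n = len(b)
    a = [0] * n
    c = [0] * (n - 1)
    a[0] = b[0] + 1  # prologue: first iteration of the first statement
    for i in range(1, n):
        a[i] = b[i] + 1
        c[i - 1] = a[i] * 2  # second statement, shifted by one iteration
    return a, c


sample = [3, 1, 4, 1, 5]
print(unfused(sample))
print(fused_shifted(sample))
```

Both functions produce the same arrays, while the fused version traverses the data once, which is where the locality and synchronization benefits mentioned in the abstract would come from.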

IEEE Transactions on Signal Processing | 1996

Optimizing DSP flow graphs via schedule-based multidimensional retiming

Nelson L. Passos; Edwin Hsing-Mean Sha; Steven C. Bass

Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. The retiming technique is a common and valuable tool in optimizing 1-D signal processing applications, represented by flow graphs. Such transformation can maximize the parallelism of a loop body. Few results on retiming have been obtained for multidimensional (MD) systems. The article develops a novel framework, which consists of an MD retiming technique that considers the final schedule as part of the optimization process. To the authors' knowledge, this is the first retiming algorithm on general MD flow graphs.

IEEE Transactions on Signal Processing | 1997

Communication-sensitive loop scheduling for DSP applications

Sissades Tongsima; Edwin Hsing-Mean Sha; Nelson L. Passos

The performance of computation-intensive digital signal processing applications running on parallel systems is highly dependent on communication delays imposed by the parallel architecture. In order to obtain a more compact task/processor assignment, a scheduling algorithm considering the communication time between processors needs to be investigated. Such applications usually contain iterative or recursive segments that are modeled as communication sensitive data flow graphs (CS-DFGs), where nodes represent computational tasks and edges represent dependencies between them. Based on the theorems derived, this paper presents a novel efficient technique called cyclo-compaction scheduling, which is applied to a CS-DFG to obtain a better schedule. This new method takes into account the data transmission time, loop carried dependencies, and the target architecture. It implicitly uses the retiming technique (loop pipelining) and a task remapping procedure to allocate processors and to iteratively improve the parallelism while handling the underlying communication and resource constraints. Experimental results on different architectures demonstrate that this algorithm yields significant improvement over existing methods. For some applications, the final schedule length is less than one half of its initial length.

Midwest Symposium on Circuits and Systems | 1999

ASIC design for conditional nested loops with predicate registers

B. Sinclair; R.P. Light; Nelson L. Passos

Time-critical sections of multi-dimensional applications, such as image processing and computational fluid dynamics, are, in general, iterative or recursive. Most of these applications require each iteration to be executed under a specific time constraint associated with the data input rate. The design of circuits dedicated to perform such repetitive tasks is highly dependent on optimization techniques to achieve the desired execution time. The existence of branch instructions within the recursive code (loop) may degrade the performance of the optimized code. Branch predication techniques utilize predicate registers to centralise the validity of speculatively computed results and prevent those branch hazards. These registers are significant obstacles to the performance gain achievable by the overlap of successive iterations of nested loops. This paper presents a process of designing and dimensioning those registers while optimizing the computational time of the loop.

International Symposium on Circuits and Systems | 1994

Partitioning and retiming of multi-dimensional systems

Nelson L. Passos; E. Hsing-Mean Sha; Steven C. Bass

The use of massive parallelism in solving Partial Differential Equations (PDEs) has been studied for a long time. Fettweis and Nitsche (1991) introduced a new method of transforming a PDE problem into a set of computational nodes represented by wave digital filters working in a multidimensional environment. Those computational nodes may not be mapped one-to-one to processor elements. After the nodes are partitioned into blocks, this paper introduces the concept of transforming such blocks to multidimensional data flow graphs, and an algorithm to obtain a final execution schedule with an optimal performance by using multidimensional retiming. The method is applicable to any uniformly represented data dependence graph, and the Fettweis and Nitsche method was chosen as an interesting example of its application.
