Publication


Featured research published by Manuel Saldaña.


Field-Programmable Logic and Applications | 2006

TMD-MPI: An MPI Implementation for Multiple Processors Across Multiple FPGAs

Manuel Saldaña; Paul Chow

With current FPGAs, designers can now instantiate several embedded processors, memory units, and a wide variety of IP blocks to build a single-chip, high-performance multiprocessor embedded system. Furthermore, multi-FPGA systems can be built to provide massive parallelism, given an efficient programming model. In this paper, we present a lightweight subset implementation of the standard message-passing interface, MPI, that is suitable for embedded processors. It does not require an operating system and uses a small memory footprint. With our MPI implementation (TMD-MPI), we provide a programming model capable of using multiple FPGAs that hides hardware complexities from the programmer, facilitates the development of parallel code, and promotes code portability. To enable intra-FPGA and inter-FPGA communications, a simple network-on-chip is also developed using a low-overhead network packet protocol. Together, TMD-MPI and the network provide a homogeneous view of a cluster of embedded processors to the programmer. Performance parameters such as link latency, link bandwidth, and synchronization cost are measured by executing a set of microbenchmarks.
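The paper does not reproduce the TMD-MPI API itself, but the programming model it describes is the standard MPI rank/send/recv subset. As a minimal behavioral sketch (a toy single-process model with invented class and method names, not TMD-MPI), a tagged message exchange between two ranks looks like this:

```python
from queue import Queue

# Hypothetical behavioral model of a message-passing subset: each rank owns
# an inbox queue, send() deposits a tagged payload into the destination's
# inbox, and recv() pulls from the caller's own inbox. This illustrates the
# programming model only; real TMD-MPI runs across embedded processors and
# a network-on-chip.
class Rank:
    def __init__(self, rank, inboxes):
        self.rank = rank
        self.inboxes = inboxes  # shared list of per-rank queues

    def send(self, dest, tag, payload):
        self.inboxes[dest].put((self.rank, tag, payload))

    def recv(self, source, tag):
        src, t, payload = self.inboxes[self.rank].get()
        assert src == source and t == tag  # matching by source and tag
        return payload

inboxes = [Queue(), Queue()]
rank0, rank1 = Rank(0, inboxes), Rank(1, inboxes)

rank0.send(dest=1, tag=7, payload=[1.0, 2.0, 3.0])
data = rank1.recv(source=0, tag=7)
print(data)  # prints [1.0, 2.0, 3.0]
```

The point of the abstraction is that the sender does not know or care how the destination rank is implemented, which is what lets the same code run across multiple FPGAs.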


Field-Programmable Custom Computing Machines | 2006

A Scalable FPGA-based Multiprocessor

Arun Patel; Christopher A. Madill; Manuel Saldaña; Christopher Comis; Régis Pomès; Paul Chow

It has been shown that a small number of FPGAs can significantly accelerate certain computing tasks by up to two or three orders of magnitude. However, particularly intensive large-scale computing applications, such as molecular dynamics simulations of biological systems, underscore the need for even greater speedups to address relevant length and time scales. In this work, we propose an architecture for a scalable computing machine built entirely using FPGA computing nodes. The machine enables designers to implement large-scale computing applications using a heterogeneous combination of hardware accelerators and embedded microprocessors spread across many FPGAs, all interconnected by a flexible communication network. Parallelism at multiple levels of granularity within an application can be exploited to obtain the maximum computational throughput. By focusing on applications that exhibit a high computation-to-communication ratio, we narrow the extent of this investigation to the development of a suitable communication infrastructure for our machine, as well as an appropriate programming model and design flow for implementing applications. By providing a simple, abstracted communication interface with the objective of being able to scale to thousands of FPGA nodes, the proposed architecture appears to the programmer as a unified, extensible FPGA fabric. A programming model based on the MPI message-passing standard is also presented as a means for partitioning an application into independent computing tasks that can be implemented on our architecture. Finally, we demonstrate the first use of our design flow by developing a simple molecular dynamics simulation application for the proposed machine, which runs on a small platform of development boards.


ACM Transactions on Reconfigurable Technology and Systems | 2010

MPI as a Programming Model for High-Performance Reconfigurable Computers

Manuel Saldaña; Arun Patel; Christopher A. Madill; Daniel Nunes; Danyao Wang; Paul Chow; Ralph D. Wittig; Henry E. Styles; Andrew Putnam

High-Performance Reconfigurable Computers (HPRCs) consist of one or more standard microprocessors tightly coupled with one or more reconfigurable FPGAs. HPRCs have been shown to provide good speedups and good cost/performance ratios, but not necessarily ease of use, leading to slow acceptance of this technology. HPRCs introduce new design challenges, such as the lack of portability across platforms, incompatibilities with legacy code, user reluctance to change existing code bases, a prolonged learning curve, and the need for a system-level hardware/software co-design development flow. This article presents the evolution of and current work on TMD-MPI, which started as an MPI-based programming model for multiprocessor systems-on-chip implemented in FPGAs and has now evolved to include multiple x86 processors. TMD-MPI is shown to address current design challenges in HPRC usage, suggesting that the MPI standard has enough syntax and semantics to program these new types of parallel architectures. Also presented is the TMD-MPI Ecosystem, which consists of research projects and tools developed around TMD-MPI to further improve HPRC usability. Finally, we present preliminary communication performance measurements.
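The central claim here is that one MPI program can address x86 processes and FPGA hardware engines uniformly: the application only ever names ranks, and a separate configuration layer binds each rank to a physical resource. A hypothetical sketch of such a binding (all names and the mapping itself invented for illustration; the papers do not give this format):

```python
# Hypothetical rank-to-resource binding for a heterogeneous HPRC, in the
# spirit of the TMD-MPI papers: the application code sends to rank numbers
# and never branches on what a rank is made of. Entries are invented.
RANK_MAP = {
    0: ("x86", "host0"),
    1: ("x86", "host0"),
    2: ("fpga-engine", "fpga0/accel0"),
    3: ("fpga-engine", "fpga0/accel1"),
}

def peer_kind(rank):
    """Expose what a rank is bound to. The application itself never needs
    this call, because send(dest, ...) behaves identically for software
    and hardware peers; it is shown only to make the heterogeneity
    explicit."""
    return RANK_MAP[rank][0]

print(peer_kind(1), peer_kind(2))  # prints: x86 fpga-engine
```

Swapping a software rank for a hardware engine then means editing the binding, not the application, which is the portability argument the article makes.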


International Workshop on High-Performance Reconfigurable Computing Technology and Applications | 2008

MPI as an abstraction for software-hardware interaction for HPRCs

Manuel Saldaña; Arun Patel; Christopher A. Madill; Daniel Nunes; Danyao Wang; Henry E. Styles; Andrew Putnam; Ralph D. Wittig; Paul Chow

High-performance reconfigurable computers (HPRCs) consist of one or more standard microprocessors tightly coupled with one or more reconfigurable FPGAs. HPRCs have been shown to provide good speedups and good cost/performance ratios, but not necessarily ease of use, leading to slow acceptance of this technology. HPRCs introduce new design challenges, such as the lack of portability across platforms, incompatibilities with legacy code, user reluctance to change existing code bases, a prolonged learning curve, and the need for a system-level hardware/software co-design development flow. This paper presents the evolution of and current work on TMD-MPI, which started as an MPI-based programming model for multiprocessor systems-on-chip implemented in FPGAs and has now evolved to include multiple x86 processors. TMD-MPI is shown to address current design challenges in HPRC usage, suggesting that the MPI standard has enough syntax and semantics to program these new types of parallel architectures. Also presented is the TMD-MPI ecosystem, which consists of research projects and tools developed around TMD-MPI to further improve HPRC usability.


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2007

Routability of Network Topologies in FPGAs

Manuel Saldaña; Lesley Shannon; Jia Shuo Yue; Sikang Bian; John Craig; Paul Chow

A fundamental difference between application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs) is that the wires in ASICs are designed to match the requirements of a particular design. Conversely, in an FPGA, the area is fixed and the routing resources exist whether or not they are used. In this paper, we investigate how well several common network topologies map onto a modern FPGA routing fabric. Different multiprocessor network topologies with between 8 and 64 nodes are mapped to a single large FPGA. Except for the fully-connected networks, it is observed that the difference in logic resources used and routing overhead among these topologies is insignificant for the systems tested. Fully-connected networks up to about 22 nodes are also feasible on the same FPGA although the logic and routing utilization clearly grows much faster. The conclusion is that a modern FPGA fabric is very rich in resources and capable of supporting highly interconnected topologies. For systems with a modest number of nodes implemented on current large FPGAs, it is not necessary to use the connectivity-limited topologies typically used for networks-on-chip. Rather, direct point-to-point connections between all communicating nodes can be considered.


System-Level Interconnect Prediction | 2006

The routability of multiprocessor network topologies in FPGAs

Manuel Saldaña; Lesley Shannon; Paul Chow

A fundamental difference between ASICs and FPGAs is that wires in ASICs are designed to match the requirements of a particular design. Wire parameters such as length, width, layout, and the number of wires can be varied to implement a desired circuit. Conversely, in an FPGA, the area is fixed and routing resources exist whether or not they are used, so the goal becomes implementing a circuit within the limits of the available resources. The architecture of existing routing structures in FPGAs has evolved over time to suit the requirements of large, localized digital circuits. However, FPGAs now have the capacity to implement networks of such circuits, and system-level interconnection becomes a key element of the design process. Following a standard design flow and using commercial tools, we investigate how this fundamental difference in resource usage affects the mapping of various network topologies to a modern FPGA routing structure. By exploring the routability of different multiprocessor network topologies with 8, 16, and 32 nodes on a single FPGA, we show that the difference in resource utilization among ring, star, hypercube, and mesh topologies is not significant up to 32 nodes. We also show that a fully-connected network can be implemented with 16 nodes, but with 32 nodes it exceeds the routing resources available on the FPGA. We also derive a cost metric that helps estimate the impact of topology selection based on the number of nodes.
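The contrast between the topologies can be made concrete with their standard link-count formulas (textbook formulas as a function of node count n, not the papers' measured resource figures): every topology except the fully-connected one grows roughly linearly in n, while the fully-connected network grows quadratically, which matches the observation that it alone exhausts routing resources between 16 and 32 nodes.

```python
import math

# Standard link counts for the topologies compared in the paper, as a
# function of node count n. These are analytical formulas, not the paper's
# cost metric or resource measurements.
def links(topology, n):
    if topology == "ring":
        return n
    if topology == "star":
        return n - 1
    if topology == "hypercube":          # n must be a power of two
        return (n // 2) * int(math.log2(n))
    if topology == "mesh":               # 2-D mesh; n must be a perfect square
        side = math.isqrt(n)
        return 2 * side * (side - 1)
    if topology == "fully-connected":
        return n * (n - 1) // 2
    raise ValueError(topology)

for topo in ("ring", "star", "hypercube", "mesh", "fully-connected"):
    print(f"{topo:16s} n=16: {links(topo, 16):4d}")
# fully-connected needs 120 links at n=16 but 496 at n=32,
# while the others stay between 31 and 80 links at n=32.
```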


Reconfigurable Computing and FPGAs | 2006

Configuration and Programming of Heterogeneous Multiprocessors on a Multi-FPGA System Using TMD-MPI

Manuel Saldaña; Daniel Nunes; Emanuel Ramalho; Paul Chow

Recent research has shown that FPGAs have true potential to speed up demanding applications even further than state-of-the-art superscalar processors can. The penalty is a loss of generality in the architecture, but the reconfigurability of FPGAs allows them to be reprogrammed for other applications. Therefore, an efficient programming model and a flexible design flow are paramount for this technology to be more widely accepted. Furthermore, in the history of computers, standards have been a positive experience because they provide a common ground for research and development. A programming model for multiprocessor systems-on-FPGAs should be standard and application independent, but optimized for a particular architecture. In this paper, we use TMD-MPI, a subset implementation of the message-passing standard MPI, and a flexible system-level design flow to implement heterogeneous multiprocessor systems-on-chip on FPGAs. Hardware engines are also supported by using a message-passing engine, which encapsulates the TMD-MPI functionality in hardware, to enable communication between hardware engines and embedded processors. We test the functionality and scalability of the system by implementing a 45-processor system across five FPGAs. As a test example, we solve the heat equation using the Jacobi iteration method. Some performance metrics are measured to demonstrate the impact of different computing cores on the overall computation.
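The Jacobi iteration used as the test application here is simple to state: each interior grid point is repeatedly replaced by the average of its neighbors until the temperature field settles. A minimal single-process sketch for a 1-D rod (in the 45-processor system the grid would be partitioned across ranks with halo exchange through TMD-MPI; that communication is omitted here):

```python
# Jacobi iteration for the 1-D heat equation at steady state: each interior
# point becomes the average of its two neighbors; the boundary values are
# held fixed. Single-process illustration only.
def jacobi_1d(u, iterations):
    u = list(u)
    for _ in range(iterations):
        new = u[:]                      # boundaries u[0], u[-1] stay fixed
        for i in range(1, len(u) - 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = new
    return u

# One end held at 100 degrees, the other at 0; the interior converges to
# the linear steady-state profile 100, 75, 50, 25, 0.
u = jacobi_1d([100.0, 0.0, 0.0, 0.0, 0.0], iterations=200)
print([round(x, 3) for x in u])  # prints [100.0, 75.0, 50.0, 25.0, 0.0]
```

In the parallel version, the only data a rank needs from its neighbors each iteration is the boundary point of the adjacent partition, which is why the computation maps naturally onto message passing.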


Field-Programmable Technology | 2009

The challenges of using an embedded MPI for hardware-based processing nodes

Daniel Le Ly; Manuel Saldaña; Paul Chow

This paper presents several challenges and solutions in designing an efficient Message Passing Interface (MPI) implementation for embedded FPGA applications. Popular MPI implementations are designed for general-purpose computers which have significantly different properties and trade-offs than embedded platforms. Our work focuses on two types of interactions that are not present in typical MPI implementations. First, a number of improvements designed to accelerate software-hardware interactions are introduced, including a Direct Memory Access (DMA) engine with MPI functionality; the use of non-interrupting, non-blocking messages; and a proposed function, called MPI_Coalesce, to reduce the function call overhead from a series of sequential messages. These improvements resulted in a speed-up of 5-fold compared to an embedded software-only MPI implementation. Next, a novel dataflow message passing model is presented for hardware-hardware interactions to overcome the limitations of atomic messages, allowing hardware engines to communicate and compute simultaneously. This dataflow model provides a natural method for hardware designers to build high performance, MPI systems. Finally, two hardware cores, Tee cores and message watchdog timers, are introduced to provide a transparent method of debugging hardware MPI designs.
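MPI_Coalesce is a function proposed in this paper; its actual signature and semantics are not reproduced here. The motivation behind it, however, follows a simple cost argument: a series of k small messages each pays a fixed per-call overhead, while a coalesced transfer pays it once. A purely illustrative cost model (all constants invented, not measurements from the paper):

```python
# Hypothetical cost model for message coalescing. A fixed per-call overhead
# dominates small messages; packing k payloads into one transfer amortizes
# it. The constants below are invented for illustration only.
CALL_OVERHEAD_US = 10.0   # assumed fixed cost per message-passing call
PER_BYTE_US = 0.01        # assumed transfer cost per byte

def cost_separate(sizes):
    """k individual sends: k overheads plus the byte cost."""
    return sum(CALL_OVERHEAD_US + PER_BYTE_US * s for s in sizes)

def cost_coalesced(sizes):
    """One packed send: a single overhead plus the same byte cost."""
    return CALL_OVERHEAD_US + PER_BYTE_US * sum(sizes)

sizes = [64] * 8          # eight 64-byte messages
print(cost_separate(sizes), cost_coalesced(sizes))
```

Under these assumed constants the eight separate sends cost 85.12 us against 15.12 us coalesced; the real 5-fold speedup reported in the paper comes from the combination of the DMA engine, non-interrupting non-blocking messages, and coalescing together.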


Field-Programmable Technology | 2008

A profiler for a heterogeneous multi-core multi-FPGA system

Daniel Nunes; Manuel Saldaña; Paul Chow

Understanding the behavior of an application is rarely a trivial task, due to the complexity of the system on which the application executes and the complexity of the application itself. The task becomes even more troublesome if the application runs in a parallel environment, where the relationships among the individual execution streams must be captured to understand the application's behavior. FPGA flexibility increases the complexity of such tasks by allowing changes not only to the application, to adapt to the hardware, but also to the hardware, to tailor it for a specific application. To take full advantage of these systems, a tool that helps the user understand an application is paramount. In this paper, we present a profiler for the TMD, a heterogeneous multi-core multi-FPGA system designed at the University of Toronto. The profiler can be configured for a specific application running on a specific hardware configuration. It allows retrieval of all communication calls and of any user state defined by instrumentation of the source code. We test the profiler with two simple case studies: MPI_Barrier, where we compare a sequential algorithm with a binary-tree algorithm, and a heat equation solver using the Jacobi iteration method, where we compare blocking with non-blocking MPI calls.
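The sequential-versus-tree barrier comparison in the first case study follows a standard complexity argument (the round counts below are that textbook argument, not the paper's measured timings): a linear barrier where every rank reports to rank 0 needs a number of communication rounds proportional to the rank count p, while a binary-tree barrier needs only a logarithmic number.

```python
import math

# Communication-round counts for the two MPI_Barrier algorithms profiled in
# the paper, under a simple model where each point-to-point message is one
# round. Standard analysis, not the paper's measurements.
def linear_barrier_rounds(p):
    # p-1 arrival messages gathered one after another at rank 0,
    # then p-1 release messages sent back out
    return 2 * (p - 1)

def tree_barrier_rounds(p):
    # ceil(log2 p) combine rounds up the tree, the same number of
    # release rounds back down
    return 2 * math.ceil(math.log2(p))

for p in (8, 64):
    print(f"p={p:3d}  linear={linear_barrier_rounds(p):4d}  "
          f"tree={tree_barrier_rounds(p):3d}")
# prints: p=  8  linear=  14  tree=  6
#         p= 64  linear= 126  tree= 12
```

This gap is exactly the kind of behavior a communication profiler makes visible without the user having to reason it out from source code.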


Reconfigurable Computing and FPGAs | 2008

A Message-Passing Hardware/Software Co-simulation Environment to Aid in Reconfigurable Computing Design Using TMD-MPI

Manuel Saldaña; Emanuel Ramalho; Paul Chow

High-performance reconfigurable computers (HPRCs) provide a mix of standard processors and FPGAs to collectively accelerate applications. This introduces new design challenges, such as the need for portable programming models across HPRCs and for system-level verification tools. In this paper, we extend previous work on TMD-MPI to include an MPI-based approach to exchange data between x86 processors and hardware engines inside FPGAs that improves design portability by hiding vendor-specific communication details. Also, we have created a tool called the message-passing simulation framework (MSF) that we use to develop TMD-MPI itself, as well as an application development tool that enables an FPGA in simulation to exchange messages with other x86 processors. As an example, we simulate a LINPACK benchmark hardware core using an Intel-FSB-FPGA platform to quickly prototype the hardware, test the communications, and verify the benchmark results.

Collaboration


Dive into Manuel Saldaña's collaboration.

Top Co-Authors


Paul Chow

University of Toronto
