Thomas Marconi
Delft University of Technology
Publication
Featured research published by Thomas Marconi.
applied reconfigurable computing | 2008
Thomas Marconi; Yi Lu; Koen Bertels; Georgi Gaydadjiev
In this paper, we propose an online hardware task scheduling and placement algorithm and evaluate its performance. Experimental results on a large random task set show that our algorithm outperforms existing algorithms: it reduces total wasted area by up to 89.7%, and has a 1.5% shorter schedule time and a 31.3% faster response time.
design, automation, and test in europe | 2008
Thomas Marconi; Yi Lu; Koen Bertels; Georgi Gaydadjiev
Speed and placement quality are two very important attributes of a good online placement algorithm, because the time taken by the algorithm is an overhead on the application's overall execution time. To address this problem, we propose three techniques: Merging Only if Needed (MON), Partial Merging (PM), and Direct Combine (DC). Our IM (intelligent merging) algorithm dynamically uses these three techniques to exploit their specific advantages. IM outperforms Bazargan's algorithm: its placement quality is within 0.89%, but it is 1.72 times faster.
applied reconfigurable computing | 2010
Thomas Marconi; Yi Lu; Koen Bertels; Georgi Gaydadjiev
The benefits of exploiting partially reconfigurable devices include reduced power consumption, reduced cost, and customized performance improvement. To obtain these benefits, one main problem that needs to be solved is task scheduling and placement. Existing algorithms tend to allocate tasks at positions where they can block future tasks from being scheduled earlier, denoted as the "blocking-effect". To tackle this effect, a novel 3D total contiguous surface (3DTCS) heuristic is proposed for equipping our scheduling and placement algorithm with blocking-awareness. The proposed algorithm is evaluated with both synthetic and real workloads (e.g. MDCT, matrix multiplication, Hamming code, sorting, FIR, ADPCM, etc.). The proposed algorithm not only has better scheduling and placement quality but also has a shorter algorithm execution time compared to existing algorithms.
field-programmable custom computing machines | 2010
Yi Lu; Thomas Marconi; Koen Bertels; Georgi Gaydadjiev
In this paper, we propose an efficient online task scheduling algorithm which targets the 2D FPGA area partitioning model and takes into account the data dependency and the data communications 1) among hardware tasks and 2) between hardware tasks and external devices, which have not been explicitly investigated in previous work. In an experiment with 10000 workloads, the evaluation result shows that our proposed scheduling algorithm is about 20x faster than the comparable approach.
design, automation, and test in europe | 2008
Yi Lu; Thomas Marconi; Georgi Gaydadjiev; Koen Bertels
Finding the available empty space for arriving tasks on FPGAs with runtime partial reconfiguration capabilities is the most time-consuming phase in on-line placement algorithms. Naturally, this phase has the highest impact on the overall system performance. In this paper, we present a new algorithm which is used to find the complete set of maximum free rectangles on the FPGA at runtime. During scanning, our algorithm relies on dynamic information about the edges of all already placed tasks. Simulation results show that our algorithm achieves a 1.5× to 5× speedup compared to state-of-the-art algorithms aiming at maximum free rectangles. In addition, our proposal requires at least 4.4× less scanning load.
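To make the notion of a "complete set of maximum free rectangles" concrete, here is a minimal brute-force sketch in Python. It is not the edge-based scanning algorithm from the paper; it only illustrates what the output of such an algorithm looks like. The grid model, the `(x, y, w, h)` tuple layout, and the names `cells_of` and `maximal_free_rectangles` are assumptions made for this illustration.

```python
def cells_of(x, y, w, h):
    """Return the set of grid cells covered by a placed task (hypothetical model)."""
    return {(cx, cy) for cx in range(x, x + w) for cy in range(y, y + h)}


def maximal_free_rectangles(occupied, width, height):
    """Enumerate every free rectangle that cannot grow in any direction.

    Brute force for illustration only; far too slow for runtime use.
    """
    def is_free(x, y, w, h):
        # A rectangle is free if it lies on the device and covers no occupied cell.
        if x < 0 or y < 0 or x + w > width or y + h > height:
            return False
        return all((cx, cy) not in occupied
                   for cx in range(x, x + w) for cy in range(y, y + h))

    maximal = []
    for x in range(width):
        for y in range(height):
            for w in range(1, width - x + 1):
                for h in range(1, height - y + 1):
                    if not is_free(x, y, w, h):
                        continue
                    # Maximal means: growing by one cell left, down,
                    # right, or up always hits a task or the border.
                    can_grow = (is_free(x - 1, y, w + 1, h)
                                or is_free(x, y - 1, w, h + 1)
                                or is_free(x, y, w + 1, h)
                                or is_free(x, y, w, h + 1))
                    if not can_grow:
                        maximal.append((x, y, w, h))
    return maximal


# A 4x4 device with one 2x2 task placed in the centre.
occupied = cells_of(1, 1, 2, 2)
free_rects = maximal_free_rectangles(occupied, 4, 4)
```

With the task in the centre, the free space is a ring, and the maximal rectangles are the four border strips. A runtime algorithm has to maintain this set incrementally as tasks arrive and leave, which is where the scanning cost discussed in the abstract comes from.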
international parallel and distributed processing symposium | 2008
Yi Lu; Thomas Marconi; Georgi Gaydadjiev; Koen Bertels; Roel Meeuws
With the arrival of partial reconfiguration technology, modern FPGAs support swapping tasks in or out individually at run-time without interrupting other tasks running on the same FPGA. Although implementing this feature achieves much better flexibility and device utilization, the challenge remains to quickly and efficiently place tasks arriving at run-time on such partially reconfigurable systems. In this paper, we propose an algorithm to handle this on-line, run-time task placement problem. By adding logical constraints on the FPGA and introducing our resource management solution, the simulation results show that our algorithm has better overall performance compared with previously reported methods in terms of task rejection number, placement quality, and execution time.
field-programmable technology | 2009
Thomas Marconi; Yi Lu; Koen Bertels; Georgi Gaydadjiev
In this paper, we propose a new strategy for online placement on 2D partially reconfigurable devices, termed Quad-Corner (QC). The main differences between our algorithm and related art are its quad-corner spreading capability and dynamic searching sequences. Moreover, existing work does not evaluate its algorithms with real hardware tasks; we experiment with real hardware tasks on a real FPGA. Our proposal achieves better placement quality and faster online placement compared to existing approaches. Experiments with real workloads (e.g. MDCT, matrix multiplication, Hamming code, sorting, FIR, ADPCM, etc.) on Virtex-4 show that QC not only has 78% less penalty and 93% less wasted area than the existing algorithms on average but also has lower runtime overhead.
field-programmable technology | 2007
Zubair Nawaz; Ozana Silvia Dragomir; Thomas Marconi; Elena Moscu Panainte; Koen Bertels; Stamatis Vassiliadis
Loops are an important source of performance improvement, for which there exist a large number of compiler-based optimizations. Few optimizations assume that the loop will be fully mapped onto hardware. In this paper, we discuss a loop transformation called recursive variable expansion, which can be efficiently implemented in hardware. It removes all the data dependencies from the program, so that the parallelism is bounded only by the amount of available resources. To show the performance improvement and the utilization of resources, we have chosen four kernels from widely used applications (FIR, DCT, Sobel edge detection, and matrix multiplication). The hardware implementations of these kernels proved to be 1.5 to 77 times faster (depending on the application) than the code compiled and run on a PowerPC.
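As a rough illustration of the idea behind removing loop-carried dependencies, the Python sketch below takes a simple accumulation loop and rewrites each result purely in terms of the inputs, so that every result could in principle be computed independently. This is one reading of the abstract, not the paper's hardware implementation; the example loop and the function names `loop_with_dependency`, `tree_sum`, and `expanded` are assumptions chosen for this sketch.

```python
def loop_with_dependency(a0, b):
    """Original form: a[i] = a[i-1] + b[i], so iterations must run in order."""
    results = []
    previous = a0
    for value in b:
        previous = previous + value
        results.append(previous)
    return results


def tree_sum(values):
    """Add values pairwise, level by level, as a balanced adder tree would."""
    values = list(values)
    while len(values) > 1:
        paired = [values[i] + values[i + 1]
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])  # odd element passes to the next level
        values = paired
    return values[0]


def expanded(a0, b):
    """Expanded form: a[i] = a0 + b[0] + ... + b[i], with no dependency
    between different outputs, so each could map to its own hardware tree."""
    return [tree_sum([a0] + list(b[:i + 1])) for i in range(len(b))]


sequential = loop_with_dependency(1, [2, 3, 4])
parallel_form = expanded(1, [2, 3, 4])
```

Both forms produce the same values, but the expanded form trades extra arithmetic for independence between outputs, which matches the abstract's remark that parallelism then becomes limited by available resources rather than by dependencies.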
design, automation, and test in europe | 2012
Liang Chen; Thomas Marconi; Tulika Mitra
Processor customization in the form of application-specific instructions has become a popular choice to meet the increasing performance demands of embedded applications under short time-to-market constraints. Implementing the custom instructions in reconfigurable logic provides greater flexibility. Recently, a number of architectures have been proposed where multiple cores on chip share a single reconfigurable fabric that implements the custom instructions. Effective exploitation of this reconfigurable fabric requires runtime scheduling of the tasks on the cores and allocation of reconfigurable logic for custom instructions. In this paper, we propose an efficient online scheduling algorithm for multi-core shared reconfigurable fabric and show its effectiveness through experimental evaluation.
Computers & Electrical Engineering | 2014
Thomas Marconi
Highlights: A fast, efficient online task scheduling and placement algorithm is proposed. The algorithm orchestrates multiple hardware versions of tasks to optimize quality. The algorithm shortens runtime overhead by reducing search space on-the-fly. Experimental studies conclusively reveal the superiority of the proposed algorithm. The algorithm is not only better in scheduling and placement quality but also faster.
Hardware task scheduling and placement at runtime plays a crucial role in achieving better system performance by exploring dynamically reconfigurable Field-Programmable Gate Arrays (FPGAs). Although a number of online algorithms have been proposed in the literature, no strategy has addressed the efficient usage of reconfigurable resources by orchestrating multiple hardware versions of tasks. By exploring this flexibility, the algorithms can potentially achieve stronger performance; however, they can also suffer much more runtime overhead in dynamically selecting the most suitable variant on-the-fly based on the runtime conditions imposed by its runtime constraints. In this work, we propose a fast, efficient online task scheduling and placement algorithm that incorporates multiple selectable hardware implementations for each hardware request; the selections reflect trade-offs between the required reconfigurable resources and the task runtime performance. Experimental studies conclusively reveal the superiority of the proposed algorithm over rigid approaches in terms of not only scheduling and placement quality but also faster runtime decisions.