Alcides Fonseca
University of Coimbra
Publications
Featured research published by Alcides Fonseca.
Facing the Multicore-Challenge | 2013
Alcides Fonseca; Bruno Cabral
As a consequence of the immense computational power available in GPUs, the usage of these platforms for running data-intensive, general-purpose programs has been increasing. Since the memory and processor architectures of CPUs and GPUs are substantially different, programs designed for each platform are also very different and often resort to very distinct sets of algorithms and data structures. Selecting between the CPU and the GPU for a given program is not easy, as there are variations in the hardware of the GPU, in the amount of data, and in several other performance factors.
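The selection trade-off described above can be sketched as a simple cost model. The class name and all constants below are our own illustration, not values or an API from the paper; real costs depend on the specific GPU, driver, and data layout:

```java
// Hypothetical heuristic for choosing between CPU and GPU execution based on
// input size. Small inputs cannot amortize the fixed GPU launch/transfer cost.
public class DeviceSelector {
    public enum Device { CPU, GPU }

    // Illustrative constants (assumed, not measured).
    static final double GPU_LAUNCH_OVERHEAD = 1_000.0; // fixed transfer/launch cost
    static final double GPU_COST_PER_ELEM   = 5.0;     // transfer + compute, per element
    static final double CPU_COST_PER_ELEM   = 8.0;     // compute only, per element

    /** Pick the device with the lower estimated total cost for n elements. */
    public static Device choose(long n) {
        double gpu = GPU_LAUNCH_OVERHEAD + n * GPU_COST_PER_ELEM;
        double cpu = n * CPU_COST_PER_ELEM;
        return gpu < cpu ? Device.GPU : Device.CPU;
    }
}
```

With these assumed constants, the GPU only wins once the input is large enough to amortize the launch overhead.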
Big Data Research | 2017
Alcides Fonseca; Bruno Cabral
Abstract Big Data concerns large-volume, complex, growing data. Given the fast development of data storage and networks, organizations are collecting large, ever-growing datasets that can hold useful information. In order to extract information from these datasets within useful time, it is important to use distributed and parallel algorithms. One common usage of big data is machine learning, in which collected data is used to predict future behavior. Deep Learning using Artificial Neural Networks is one of the popular methods for extracting information from complex datasets. Deep Learning is capable of creating more complex models than traditional probabilistic machine-learning techniques. This work presents a step-by-step guide on how to prototype a Deep-Learning application that executes on both GPU and CPU clusters. Python and Redis are the core supporting tools of this guide. This tutorial will allow the reader to understand the basics of building a distributed, high-performance GPU application in a few hours. Since we do not depend on any deep-learning application or framework (we use low-level building blocks), this tutorial can be adjusted for any other parallel algorithm the reader might want to prototype on Big Data. Finally, we discuss how to move from a prototype to a full-blown production application.
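The master/worker pattern behind such a Redis-backed cluster can be sketched in-process. The sketch below is our analogue of that pattern using standard Java queues in place of Redis lists; all names are illustrative and not from the tutorial:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-process analogue of a Redis-list work queue: a master pushes data chunks
// onto a job queue, workers pop and process them, and results are collected
// on a second queue for the master to aggregate.
public class MiniQueueCluster {
    public static int parallelSumOfSquares(int[] data, int workers) throws InterruptedException {
        BlockingQueue<Integer> jobs = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> results = new LinkedBlockingQueue<>();
        for (int x : data) jobs.add(x); // master enqueues all work up front

        Runnable worker = () -> {
            Integer x;
            // poll() returns null when the queue is drained, ending the worker
            while ((x = jobs.poll()) != null) results.add(x * x);
        };
        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) { pool[i] = new Thread(worker); pool[i].start(); }
        for (Thread t : pool) t.join();

        int sum = 0;
        for (Integer r : results) sum += r; // master aggregates partial results
        return sum;
    }
}
```

In the actual tutorial the two queues would be Redis lists shared across machines, so workers can live on different cluster nodes.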
International Journal of Parallel Programming | 2016
Alcides Fonseca; Bruno Cabral; João Rafael; Ivo Correia
There are billions of lines of sequential code inside today's software which do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote efficient use of the available parallelism, has been a research goal for some time now. This work proposes a new approach for achieving that goal. We created a new parallelizing compiler that analyses read and write instructions, as well as control-flow modifications, to identify a set of dependencies between the instructions in a program. Afterwards, based on the generated dependency graph, the compiler rewrites and organizes the program in a task-oriented structure. Each parallel task is composed of instructions that cannot be executed in parallel with one another. A work-stealing-based parallel runtime is responsible for scheduling and managing the granularity of the generated tasks. Furthermore, a compile-time granularity-control mechanism avoids creating unnecessary data structures. This work focuses on the Java language, but the techniques are general enough to be applied to other programming languages. We have evaluated our approach on 8 benchmark programs against OoOJava, achieving higher speedups. In some cases, values were close to those of a manual parallelization. The resulting parallel code also has the advantage of being readable and easily configurable to further improve its performance manually.
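Conceptually, the compiler's output looks like a dependency graph of tasks. The sketch below is our illustration of that idea using standard `CompletableFuture` chaining, not the compiler's actual generated code or the AEminium API:

```java
import java.util.concurrent.CompletableFuture;

// Statements with read/write dependencies become tasks; a task only starts
// after every task it reads from has completed.
public class TaskGraphSketch {
    public static int run() {
        // a = 2; b = 3;   (independent writes -> may run in parallel)
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 2);
        CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> 3);
        // c = a + b;      (reads a and b -> depends on both tasks)
        CompletableFuture<Integer> c = a.thenCombine(b, Integer::sum);
        // d = c * c;      (reads c -> depends only on c)
        CompletableFuture<Integer> d = c.thenApply(x -> x * x);
        return d.join();
    }
}
```

The read/write analysis described in the abstract is what decides, automatically, which of these edges must exist.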
High Performance Computing for Computational Science (Vector and Parallel Processing) | 2016
Alcides Fonseca; Bruno Cabral
Parallel programs have the potential of executing several times faster than sequential programs. However, in order to achieve this potential, several aspects of the execution have to be parameterized, such as the number of threads, task granularity, etc. This work studies the task granularity of regular and irregular parallel programs on symmetric multicore machines. Task granularity determines how many parallel tasks are created to perform a certain computation. If the granularity is too coarse, there might not be enough parallelism to occupy all processors. But if the granularity is too fine, a large percentage of the execution time may be spent context-switching between tasks rather than performing useful work.
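The granularity trade-off shows up directly in fork/join code. In the sketch below (our illustration, with an arbitrary cutoff value), the `CUTOFF` constant is exactly the knob the abstract discusses: larger values make tasks coarser, smaller values make them finer:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Parallel array sum with an explicit granularity cutoff: below CUTOFF the
// task sums its range sequentially instead of spawning two subtasks.
public class SumTask extends RecursiveTask<Long> {
    static final int CUTOFF = 1_000; // illustrative; the paper studies how to choose this
    final long[] a; final int lo, hi;
    SumTask(long[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }

    @Override protected Long compute() {
        if (hi - lo <= CUTOFF) {          // fine enough: stop splitting
            long s = 0;
            for (int i = lo; i < hi; i++) s += a[i];
            return s;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(a, lo, mid);
        left.fork();                       // spawn one half as a new task
        long right = new SumTask(a, mid, hi).compute();
        return right + left.join();
    }

    public static long sum(long[] a) {
        return new ForkJoinPool().invoke(new SumTask(a, 0, a.length));
    }
}
```

With `CUTOFF` near the array length almost no tasks are created; with `CUTOFF = 1` nearly every element becomes a task and scheduling overhead dominates.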
Journal of Computational Science | 2016
Alcides Fonseca; Bruno Cabral
Abstract Programming for concurrent platforms, such as multicore CPUs, is very time-consuming and requires fine-tuning of the final program in order to adapt its parallel layout to the hardware architecture. Parallelization of programs is done by identifying parts of code (tasks) that can be executed concurrently and executing them in different threads. Current approaches for automatic parallelization cannot achieve the same performance as manually parallelized programs. Current tools are limited: they either parallelize everything possible, or only the outer loops, which may miss potential parallelism that could improve the program. Some approaches have controlled granularity during execution only, but without any relevant speedups. Automatic parallelizing compilers have shown little overall speedup without the manual guidance of programmers in terms of granularity. This work addresses the issue of achieving performant programs from a fully automated parallelization. We propose a cost model to decide between different parallelization alternatives. By performing static analysis, we are able to estimate the execution time of tasks and parallelize them only if that time is larger than the overhead of task spawning. Because the information available during compilation might not be enough to make that decision, we delay some of the decisions to runtime, when all variables are available. Thus, we use a hybrid approach that performs optimizations at compile time and at runtime. Although we apply our model to the Java language on top of the AEminium runtime, our approach is modular and can be applied to any programming language with any task-based runtime for shared memory. We have evaluated our approach on existing benchmark programs, in cases where a wrong granularity value would result in slowing down the programs. We were able to achieve speedups greater than versions without granularity control, or with runtime-only granularity control. We were also able to generate programs with better performance than the state-of-the-art Java automatic parallelizing compiler. Finally, in some cases we were able to outperform the human programmer.
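The hybrid decision the abstract describes can be sketched as a guard the compiler emits around each potential task. The constants and names below are ours, for illustration; the paper's cost model is more elaborate:

```java
// Hybrid compile-time/runtime decision: the compiler statically derives a cost
// expression for the task body, and at runtime the task is spawned only when
// that estimate exceeds the (assumed) overhead of task creation.
public class CostModelSketch {
    static final long SPAWN_OVERHEAD_NS = 2_000;   // assumed cost of spawning a task
    static final long COST_PER_ITERATION_NS = 10;  // assumed, emitted by the compiler

    /** Runtime check: parallelize a loop of `iterations` steps only if worthwhile.
     *  The iteration count is often unknown at compile time, which is why the
     *  comparison is delayed to runtime. */
    public static boolean shouldSpawn(long iterations) {
        return iterations * COST_PER_ITERATION_NS > SPAWN_OVERHEAD_NS;
    }
}
```

When the cost expression is a compile-time constant, the same comparison can be resolved statically and the guard disappears entirely.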
European Conference on Parallel Processing | 2014
João Rafael; Ivo Correia; Alcides Fonseca; Bruno Cabral
There are billions of lines of sequential code inside today's software which do not benefit from the parallelism available in modern multicore architectures. Transforming legacy sequential code into a parallel version of the same program is a complex and cumbersome task. Trying to perform such a transformation automatically, without the intervention of a developer, has been a striking research objective for a long time. This work proposes an elegant way of achieving that goal. By targeting a task-based runtime which manages execution using a task dependency graph, we developed a translator for sequential Java code which generates a highly parallel version of the same program. The translation process inspects AST nodes for features such as read/write accesses and execution-flow modifications, among others, and generates a set of dependencies between executable tasks. This process has been applied to well-known problems, such as the recursive Fibonacci and FFT algorithms, resulting in versions capable of maximizing resource usage. For two CPU-bound applications we were able to obtain 10.97x and 9.0x speedups on a 12-core machine.
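For the Fibonacci case named in the abstract, a hand-written version of the kind of task-parallel code the translator generates might look like the sketch below. This is our illustration of the target shape, not the translator's actual output:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Task-parallel Fibonacci: each recursive call whose result is not needed
// immediately becomes an independent task; the join() re-creates the
// dependency between the two subcomputations.
public class FibTask extends RecursiveTask<Long> {
    final int n;
    FibTask(int n) { this.n = n; }

    @Override protected Long compute() {
        if (n < 2) return (long) n;           // base case stays sequential
        FibTask f1 = new FibTask(n - 1);
        f1.fork();                             // fib(n-1) runs as a separate task
        long f2 = new FibTask(n - 2).compute();
        return f2 + f1.join();
    }

    public static long fib(int n) {
        return new ForkJoinPool().invoke(new FibTask(n));
    }
}
```

A realistic translation would additionally apply granularity control (as in the authors' other work), since spawning a task per call is far too fine-grained for small `n`.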
European Conference on Parallel Processing | 2014
Alcides Fonseca; João Rafael; Bruno Cabral
We propose a model for event-oriented programming under shared memory, based on access permissions with explicit parallelism. In order to obtain safe parallelism, programmers need to specify the variable permissions of functions. Blocking operations are nonexistent; callback-based APIs are used instead, and callbacks can be executed in parallel for different events as long as the access permissions are guaranteed. This model scales for both IO-bound and CPU-bound programs.
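The permission discipline can be approximated with standard Java locks: callbacks holding only a read permission on a variable may run concurrently, while a callback holding a write permission runs exclusively. This sketch is our analogy; it does not reproduce the model's actual annotation syntax or scheduler:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Event handlers guarded by the permission they declare on the shared state.
public class PermissionedEvents {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int counter = 0;

    /** Handler declared with a read permission on `counter`: may run in
     *  parallel with other read-permission handlers. */
    public int onQuery() {
        lock.readLock().lock();
        try { return counter; } finally { lock.readLock().unlock(); }
    }

    /** Handler declared with a write permission on `counter`: runs exclusively. */
    public void onUpdate(int delta) {
        lock.writeLock().lock();
        try { counter += delta; } finally { lock.writeLock().unlock(); }
    }
}
```

In the proposed model these permissions are declared statically, so the runtime can schedule compatible callbacks in parallel without the programmer writing any locking code.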
Proceedings of the 5th International Workshop on Exception Handling | 2012
Alcides Fonseca; Bruno Cabral
Multi-core processors are present in everyone's daily life. Consequently, concurrent programming has reemerged as a pressing concern for everyone interested in exploiting all the potential computational power of these machines. But the emergence of new concurrency models and programming languages also brings new challenges in terms of how one can deal with abnormal occurrences, largely due to the heterogeneous parallel control flow. Unexpectedly, sequential Exception Handling models remain the most used tool for robustness, even in the most recent concurrent programming languages. However, the appearance of more complex models, such as programming languages with implicit concurrency, might pose a challenge too big for these sequential mechanisms. In this article we provide evidence of why such models are not generally suited to dealing with faults in programs with implicit concurrency and, in the light of more recent advances in concurrent Exception Handling, we discuss the attributes of a model for addressing this problem.
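The mismatch between sequential handlers and concurrent execution is easy to demonstrate. In this sketch (our own, using the standard executor API), a `try`/`catch` around task submission never sees the exception, because the task runs on another thread and the exception is stored in the discarded `Future`:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A sequential try/catch does not see an exception thrown inside a
// concurrently executed task: submit() returns immediately, and the
// exception is captured by the (ignored) Future, silently lost.
public class LostException {
    public static boolean caughtBySubmitter() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        boolean caught = false;
        try {
            pool.submit((Runnable) () -> {
                throw new IllegalStateException("fault in task");
            });
        } catch (RuntimeException e) {
            caught = true; // never reached: submit() itself does not throw
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return caught;
    }
}
```

With implicit concurrency the programmer may not even know a task boundary exists, which is precisely why the article argues sequential handlers are insufficient.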
European Conference on Applications of Evolutionary Computation | 2017
Alcides Fonseca; Nuno Lourenço; Bruno Cabral
Optimizing parallel programs is a complex task because of the interference among many different parameters. Work-stealing runtimes, used to dynamically balance load among different processor cores, are no exception. This work explores the automatic configuration of the following runtime parameters: dynamic granularity-control algorithms, the granularity-control cache, the work-stealing algorithm, the lazy binary-splitting parameter, the maximum queue size, and the unparking interval. The performance of the program is highly sensitive to the granularity-control algorithm, which can be a combination of other granularity algorithms. In this work, we address two search-based problems: finding a globally efficient work-stealing configuration, and finding the best configuration for an individual program. For both problems, we propose the use of a Genetic Algorithm (GA). The genotype of the GA is able to represent combinations of up to three cut-off algorithms, as well as the other work-stealing parameters.
Pacific Rim International Symposium on Dependable Computing | 2015
Bruno Cabral; Alcides Fonseca; Paulo Marques; Jonathan Aldrich
The advent of multi-core systems set off a race to get concurrent programming to the masses. One of the challenging aspects of this type of system is how to deal with exceptional situations, since it is very difficult to assert the precise state of a concurrent program when an exception arises. In this paper we propose an exception-handling model for concurrent systems. Its main quality attributes are simplicity and expressiveness, allowing programmers to deal with exceptional situations in a concurrent setting in a familiar way. The proposal is centered on a new kind of exception type that defines new paths for exception propagation among concurrent threads of execution. In our model, beyond being able to control where exceptions are raised, the developer can define in which thread, and when during its execution, a particular exception will be handled. The proposed model has been implemented in Scala, and we show its application to the construction of concurrent software.
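The core idea of routing an exception to a chosen thread can be approximated with the standard `UncaughtExceptionHandler` hook. This is our Java analogy of the propagation path, not the paper's Scala implementation or its exception types:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A failing thread forwards its exception to an inbox that a designated
// handler thread drains, instead of dying silently.
public class CrossThreadHandler {
    public static String handleElsewhere() throws InterruptedException {
        BlockingQueue<Throwable> inbox = new LinkedBlockingQueue<>();

        Thread worker = new Thread(() -> {
            throw new IllegalStateException("worker failed");
        });
        // Forward the exception instead of letting the default handler print it.
        worker.setUncaughtExceptionHandler((t, e) -> inbox.add(e));
        worker.start();

        // The designated handler (here, the caller) waits for propagated exceptions.
        Throwable received = inbox.take();
        worker.join();
        return received.getMessage();
    }
}
```

In the paper's model this routing is expressed through the exception type itself, which declares in which thread, and when, the exception should be handled.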