Pedro Trancoso
Chalmers University of Technology
Publications
Featured research published by Pedro Trancoso.
computing frontiers | 2017
Mwaffaq Otoom; Aamer Jaleel; Pedro Trancoso
The trend of increasing the number of cores in a processor leads to certain challenges, among them the fact that more cores issue more memory requests, which in turn increases the competition, or interference, for shared resources such as the Last-Level Cache (LLC). In this work we focus on cache interference while executing Decision Support System queries, a common case in a Data Center scenario. We study the co-execution of different queries from the TPC-H benchmark using the PostgreSQL DBMS on a multicore with up to 16 cores and different LLC configurations. In addition to the working-set metric, to better understand the effects of co-execution, we develop two new personality metrics that classify the behavior of queries in co-execution: the social and sensitive metrics. These metrics can be used to manage cache interference and thus improve the co-execution performance of the queries.
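The two personality metrics can be illustrated with a minimal sketch. The formulas and threshold below are assumptions for illustration, not the paper's exact definitions: a query is "sensitive" if it slows down noticeably when co-run, and "social" if it inflicts little slowdown on its co-runner.

```python
def sensitivity(solo_time, corun_time):
    """Slowdown the query itself suffers under co-execution."""
    return corun_time / solo_time

def inflicted_slowdown(partner_solo, partner_corun):
    """Slowdown the query inflicts on its co-runner."""
    return partner_corun / partner_solo

def classify(sens, infl, threshold=1.1):
    """Tag a query's co-execution personality (threshold is an assumption)."""
    tags = []
    tags.append("sensitive" if sens > threshold else "insensitive")
    tags.append("social" if infl <= threshold else "antisocial")
    return tags

# Example: query Q1 runs in 10 s alone, 14 s co-run with Q5;
# Q5 alone takes 8 s, 8.4 s alongside Q1.
print(classify(sensitivity(10, 14), inflicted_slowdown(8, 8.4)))
# → ['sensitive', 'social']
```

A cache manager could then, for instance, avoid co-scheduling two sensitive queries on cores sharing the same LLC.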
Proceedings of the 1st Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems | 2017
Andreas Diavastos; Pedro Trancoso
Scheduling task-based parallel applications on many-core processors is becoming more challenging and has received much attention recently. The main challenge is to efficiently map the tasks to the underlying hardware topology using application characteristics, such as the dependences between tasks, in order to satisfy the requirements. To achieve this, each application must be studied exhaustively to determine how the different tasks use the data, which provides the knowledge needed to map tasks that share the same data close to each other. In addition, different hardware topologies require different mappings of the same application to produce the best performance. In this work we use the synchronization graph of a task-based parallel application, produced during compilation, to automatically tune the scheduling policy for any underlying hardware using heuristic-based Genetic Algorithm techniques. This tool is integrated into an actual task-based parallel programming platform called SWITCHES and is evaluated using real applications from the SWITCHES benchmark suite. We compare our results with the execution time of predefined schedules within SWITCHES and observe that the tool converges close to an optimal solution with no effort from the user while using fewer resources.
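The core idea, evolving a task-to-core mapping against a cost derived from the synchronization graph, can be sketched in a few lines. The graph, cost model, and GA parameters below are illustrative assumptions, not SWITCHES internals: fitness simply counts dependences that cross cores.

```python
import random

def comm_cost(mapping, edges):
    # Each dependence between tasks placed on different cores pays a unit cost.
    return sum(1 for a, b in edges if mapping[a] != mapping[b])

def evolve(n_tasks, n_cores, edges, pop=30, gens=200, seed=0):
    """Evolve a task->core mapping minimizing cross-core dependences."""
    rnd = random.Random(seed)
    popn = [[rnd.randrange(n_cores) for _ in range(n_tasks)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=lambda m: comm_cost(m, edges))
        survivors = popn[: pop // 2]          # elitist selection
        children = []
        for _ in range(pop - len(survivors)):
            a, b = rnd.sample(survivors, 2)
            cut = rnd.randrange(1, n_tasks)
            child = a[:cut] + b[cut:]         # one-point crossover
            if rnd.random() < 0.2:            # mutation: move one task
                child[rnd.randrange(n_tasks)] = rnd.randrange(n_cores)
            children.append(child)
        popn = survivors + children
    return min(popn, key=lambda m: comm_cost(m, edges))

# Chain of 6 dependent tasks mapped onto 2 cores.
edges = [(i, i + 1) for i in range(5)]
best = evolve(6, 2, edges)
print(best, comm_cost(best, edges))
```

A real tuner would add load-balance and locality terms to the fitness function; this sketch only shows the search mechanism.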
ACM Transactions on Architecture and Code Optimization | 2017
Andreas Diavastos; Pedro Trancoso
SWITCHES is a task-based dataflow runtime that implements a lightweight distributed triggering system for runtime dependence resolution and uses static scheduling and compile-time assignment policies to reduce runtime overheads. Unlike other systems, the granularity of loop-tasks can be increased to favor data locality, even in the presence of dependences across different loops. SWITCHES introduces explicit task resource allocation mechanisms for efficient allocation of resources and adopts the latest OpenMP Application Programming Interface (API) so as to maintain high levels of programming productivity. It provides a source-to-source tool that automatically produces thread-based code. Performance on an Intel Xeon Phi shows good scalability and surpasses OpenMP by an average of 32%.
computing frontiers | 2018
Adrian Cristal; Osman S. Unsal; Xavier Martorell; Paul M. Carpenter; Raúl de la Cruz; Leonardo Bautista; Daniel A. Jiménez; Carlos Álvarez; Behzad Salami; Sergi Madonar; Miquel Pericàs; Pedro Trancoso; Micha vor dem Berge; Gunnar Billung-Meyer; Stefan Krupop; Wolfgang Christmann; Frank Klawonn; Amani Mihklafi; Tobias Becker; Georgi Gaydadjiev; Hans Salomonsson; Devdatt Dubhashi; Oron Port; Yoav Etsion; Vesna Nowack; Christof Fetzer; Jens Hagemeyer; Thorsten Jungeblut; Nils Kucza; Martin Kaiser
LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.
international conference on parallel processing | 2017
Stavros Tzilis; Miquel Pericàs; Pedro Trancoso; Ioannis Sourdis
This paper explores the potential of utilizing approximate system load information to enhance work stealing for dynamic load balancing in hierarchical multicore systems. Maintaining information about the load of a system has not been extensively researched since it is assumed to introduce performance overheads. We propose SWAS, a lightweight approximate scheme for retrieving and using such information, based on compact bit vector structures and lightweight update operations. This approximate information is used to enhance the effectiveness of work stealing decisions. Evaluating SWAS for a number of representative scenarios on a multi-socket multi-core platform showed that work stealing guided by approximate system load information achieves considerable performance improvements: up to 18.5% for dynamic, severely imbalanced workloads; and up to 34.4% for workloads with complex task dependencies, when compared with random work stealing.
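The compact bit-vector idea can be sketched concretely. The encoding below is a hedged illustration in the spirit of SWAS, not the paper's exact scheme: each core's load is quantized to 2 bits (0..3) and packed into a single integer, so updates and lookups are cheap shifts and masks, and a thief steals from the apparently most loaded core instead of a random one.

```python
BITS = 2                    # quantized load levels 0..3 (an assumption)
MASK = (1 << BITS) - 1

def set_load(vec, core, level):
    """Record a core's quantized load level in the packed vector."""
    shift = core * BITS
    return (vec & ~(MASK << shift)) | ((level & MASK) << shift)

def get_load(vec, core):
    return (vec >> (core * BITS)) & MASK

def pick_victim(vec, cores):
    """Guide work stealing: target the apparently most loaded core."""
    return max(cores, key=lambda c: get_load(vec, c))

vec = 0
for core, level in [(0, 1), (1, 3), (2, 0), (3, 2)]:
    vec = set_load(vec, core, level)
print(pick_victim(vec, [0, 1, 2, 3]))   # core 1 holds the most work
```

Because the information is approximate and updated lazily, stale reads are tolerated; the payoff is that maintaining it stays cheap enough not to erase the stealing gains.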
acm international conference on systems and storage | 2017
Panayiotis Petrides; Pedro Trancoso
As the number of cores in a single-chip processor increases, several challenges arise: wire delays, contention for off-chip accesses, and core heterogeneity. To address these issues and the applications' demands, future large-scale many-core processors are expected to be organized as a collection of NUMA clusters of heterogeneous cores. In this work we propose a scheduler that takes into account the non-uniform memory latency, the heterogeneity of the cores, and the contention at the memory controller to find the core that best matches an application's memory and compute requirements. Scheduling decisions are based on an on-line classification process that determines an application's requirements as either memory- or compute-bound. We evaluate our proposed scheduler on the 48-core Intel SCC using applications from the SPEC CPU2006 benchmark suite. Our results show that even when all cores are busy, migrating processes to cores that better match the applications' requirements results in an overall performance improvement. In particular, we observed a reduction in execution time of 15% to 36% compared to a random static scheduling policy.
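The on-line classification step can be sketched as follows. The counter ratio, threshold, and topology here are invented for the example, not the paper's calibration: a process with a high last-level-cache miss rate is tagged memory-bound and steered toward cores close to a memory controller.

```python
MPKI_THRESHOLD = 5.0   # LLC misses per kilo-instruction (assumed cutoff)

def classify(llc_misses, instructions):
    """Tag a process as memory- or compute-bound from counter samples."""
    mpki = llc_misses / (instructions / 1000.0)
    return "memory-bound" if mpki > MPKI_THRESHOLD else "compute-bound"

def pick_core(kind, near_mc_cores, far_cores):
    # Memory-bound work goes to cores with low latency to the controller;
    # compute-bound work can run anywhere, freeing the near cores.
    return (near_mc_cores if kind == "memory-bound" else far_cores)[0]

kind = classify(llc_misses=80_000, instructions=10_000_000)
print(kind, pick_core(kind, near_mc_cores=[0, 1], far_cores=[4, 5]))
# → memory-bound 0
```

Re-sampling the counters periodically lets the scheduler react when an application changes phase.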
Proceedings of the International Symposium on Memory Systems | 2017
Mats Rimborg; Pedro Trancoso; Gunnar Carlstedt
Parallelism is inherent in most problems, but because current programming models and architectures have evolved from a sequential paradigm, the parallelism exploited is restricted. We believe that the most efficient parallel execution is achieved when applications are represented as graphs of operations and data, which can then be mapped for execution on a modular and scalable processing-in-memory architecture. In this paper, we present PHOENIX, a general-purpose architecture composed of many Processing Elements (PEs) with memory storage and efficient computational logic units, interconnected by a mesh network-on-chip. A preliminary design of PHOENIX shows it is possible to include 10,000 PEs with a storage capacity of 0.6 GByte on a 1.5 cm² chip using 14 nm technology. PHOENIX may achieve 6 TFLOPS with a power consumption of up to 42 W, which results in a peak energy efficiency of at least 143 GFLOPS/W. A simple estimate shows that for a 4K FFT, PHOENIX achieves 117 GFLOPS/W, more than double that achieved by state-of-the-art systems.
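The quoted peak efficiency follows directly from the peak throughput and power figures; a quick check:

```python
# Sanity check of the peak efficiency arithmetic quoted above.
peak_flops = 6e12          # 6 TFLOPS peak
power_w = 42               # maximum power consumption in watts
gflops_per_w = peak_flops / power_w / 1e9
print(round(gflops_per_w))  # ≈ 143 GFLOPS/W
```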
Proceedings of the International Symposium on Memory Systems | 2017
Alirad Malek; Evangelos Vasilakis; Vasileios Papaefstathiou; Pedro Trancoso; Ioannis Sourdis
An application may have different sensitivity to faults in different subsets of the data it uses, so some data regions may be more critical than others. Capitalizing on this observation, Odd-ECC provides a mechanism to dynamically select, on demand, the memory fault tolerance of each allocated page of a program depending on the criticality of the respective data. Odd-ECC error correcting codes (ECCs) are stored in separate physical pages and hidden by the OS as pages unavailable to the user. Still, these ECCs are physically aligned with the data they protect, so the memory controller can access them efficiently. Thereby, the capacity, performance, and energy overheads of memory fault tolerance are proportional to the criticality of the data stored. Odd-ECC is applied to memory systems that use conventional 2D DRAM DIMMs as well as to 3D-stacked DRAMs and is evaluated using various applications. Compared to flat memory protection schemes, Odd-ECC substantially reduces ECC capacity overheads while achieving the same Mean Time to Failure (MTTF), and in addition it slightly improves performance and energy costs. Under the same capacity constraints, Odd-ECC achieves substantially higher MTTF than a flat memory protection scheme. This comes at a performance and energy cost, which is however still a fraction of the cost introduced by a flat, equally strong scheme.
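The per-page selection idea can be sketched abstractly. The criticality levels, code choices, and per-page overheads below are illustrative assumptions, not the paper's exact scheme; the point is only that total ECC capacity scales with how much of the data is critical.

```python
ECC_SCHEMES = {
    # criticality: (scheme name, assumed ECC bytes per 4 KiB page)
    "none":   ("no protection", 0),
    "low":    ("parity",        8),
    "medium": ("SEC-DED",       512),
    "high":   ("chipkill-like", 1024),
}

class OddEccAllocator:
    """Toy allocator recording a per-page protection choice."""
    def __init__(self):
        self.pages = {}                  # page id -> criticality

    def alloc(self, page_id, criticality):
        self.pages[page_id] = criticality
        return ECC_SCHEMES[criticality][0]

    def ecc_overhead_bytes(self):
        return sum(ECC_SCHEMES[c][1] for c in self.pages.values())

a = OddEccAllocator()
a.alloc(0, "high")      # e.g. metadata that must survive faults
a.alloc(1, "none")      # e.g. easily recomputed scratch buffer
a.alloc(2, "medium")
print(a.ecc_overhead_bytes())   # 1024 + 0 + 512 = 1536
```

A flat scheme would instead pay the strongest overhead for every page; here only the pages that need it do.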
MARC Symposium | 2011
Panayiotis Petrides; Andreas Diavastos; Pedro Trancoso
design, automation, and test in europe | 2018
Evangelos Vasilakis; Vassilis Papaefstathiou; Pedro Trancoso; Ioannis Sourdis