Jeronimo Castrillon | Researchain

Publication

Featured researches published by Jeronimo Castrillon.

Design, Automation, and Test in Europe | 2015

Multi/many-core programming: where are we standing?

Jeronimo Castrillon; Lothar Thiele; Lars Schorr; Weihua Sheng; Ben H. H. Juurlink; Mauricio Alvarez-Mesa; Angela Pohl; Ralph Jessenberger; Victor Reyes; Rainer Leupers

This paper presents different views exposed in a special session on the current standing of programming and design tools for multi- and manycores in the embedded domain. Approximately ten years after the advent of multicore architectures, we take a look at the state of the art and trends in model-based programming methodologies from an academic point of view. This view is contrasted with early experiences in transferring multicore compiler research to industry, and complemented with a critical view on the performance gap introduced by compilers for complex architectures. Today, multicores permeate new application domains, creating new requirements and forcing researchers to rethink some underlying assumptions. This paper exposes the requirements of one such new domain, namely automotive. Applications in this domain require not only programming tools that comply with standards (e.g., ISO 26262) but also tools for high-level simulation, performance analysis and debugging. In this context, we discuss the role of virtual platforms in managing the complexity of hardware-software interactions and accelerating the design of multicore systems for automotive applications.

International Embedded Systems Symposium | 2015

Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs

Andrés Goens; Jeronimo Castrillon

Current approaches for mapping Kahn Process Networks (KPN) and Dynamic Data Flow (DDF) applications rely on assumptions about the program behavior that are specific to a single execution. Thus, a near-optimal mapping, computed for a given input data set, may become sub-optimal at run-time. This happens when a different data set induces a significantly different behavior. We address this problem by leveraging inherent mathematical structures of the dataflow models and the hardware architectures. On the side of the dataflow models, we rely on the monoid structure of histories and traces. This structure helps us formalize the behavior of multiple executions of a given dynamic application. By defining metrics, we obtain a formal framework for comparing the executions. On the side of the hardware, we take advantage of symmetries in the architecture to reduce the search space for the mapping problem. We evaluate our implementation on execution variations of a randomly-generated KPN application and on a low-variation JPEG encoder benchmark. Using the described methods, we show that trace differences are not sufficient for characterizing performance losses. Additionally, using platform symmetries, we manage to reduce the design space in the experiments by two orders of magnitude.
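The idea of exploiting platform symmetries can be illustrated with a small sketch. It is not the paper's implementation: it assumes the simplest possible symmetry, in which all cores of the same type are fully interchangeable and interconnect topology is ignored. The names `canonical` and `count_mappings` and the core-type labels are hypothetical. Two mappings that differ only by swapping same-type cores are treated as equivalent, and only one representative per equivalence class needs to be evaluated.

```python
from itertools import product


def canonical(mapping, core_type):
    """Return a canonical representative of a process-to-core mapping.

    Assumption: cores of the same type are interchangeable, so each core
    is relabeled as (type, k), where k is the order in which cores of
    that type are first used by the mapping.
    """
    relabel = {}      # original core id -> (type, canonical index)
    next_index = {}   # type -> next free canonical index
    result = []
    for core in mapping:
        if core not in relabel:
            t = core_type[core]
            idx = next_index.get(t, 0)
            relabel[core] = (t, idx)
            next_index[t] = idx + 1
        result.append(relabel[core])
    return tuple(result)


def count_mappings(num_processes, core_type):
    """Count all mappings and the number of symmetry classes among them."""
    cores = range(len(core_type))
    all_mappings = list(product(cores, repeat=num_processes))
    classes = {canonical(m, core_type) for m in all_mappings}
    return len(all_mappings), len(classes)


if __name__ == "__main__":
    # Hypothetical platform: four identical cores, three processes.
    total, reduced = count_mappings(3, ["risc"] * 4)
    print(total, reduced)
```

Under this assumption the reduction grows quickly with the number of identical cores. Real platforms have less symmetry once the interconnect and memory hierarchy are taken into account, so the achievable reduction depends on the architecture.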

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSOC) | 2016

Why Comparing System-Level MPSoC Mapping Approaches is Difficult: A Case Study

Andrés Goens; Robert Khasanov; Jeronimo Castrillon; Simon Polstra; Andy D. Pimentel

Software abstractions are crucial to effectively program heterogeneous Multi-Processor Systems on Chip (MPSoCs). Prime examples of such abstractions are Kahn Process Networks (KPNs) and execution traces. When modeling computation as a KPN, one of the key challenges is to obtain a good mapping, i.e., an assignment of logical computation and communication to physical resources. In this paper we compare two system-level frameworks for solving the mapping problem: Sesame and MAPS. These frameworks, while superficially similar, embody different approaches. Sesame, motivated by modeling and design-space exploration, uses evolutionary algorithms for mapping. MAPS, being a compiler framework, uses simple and fast heuristics instead. In this work we highlight the value of common abstractions, such as KPNs and traces, as a vehicle to enable comparisons between large independent frameworks. These types of comparisons are fundamental for advancing research in the area. At the same time, we illustrate how the lack of formalized models at the hardware level is an obstacle to achieving fair comparisons. Additionally, using a set of applications from the embedded systems domain, we observe that genetic algorithms tend to outperform heuristics by a factor between 1× and 5×, with notable exceptions. This performance comes at the cost of a longer computation time, between 0 and 2 orders of magnitude in our experiments.

Signal-Image Technology and Internet-Based Systems | 2015

Efficient Data Structures for Dynamic Graph Analysis

Benjamin Schiller; Jeronimo Castrillon; Thorsten Strufe

In the era of social networks, gene sequencing, and big data, a new class of applications that analyze the properties of large graphs as they dynamically change over time has emerged. The performance of these applications is highly influenced by the data structures used to store and access the graph in memory. Depending on the graph's size and structure, its update frequency, and the read accesses of the analysis, the use of different data structures can yield great performance variations. Even for expert programmers, it is not always obvious which data structure is the best choice for a given scenario. In this paper, we present a framework for handling this issue automatically. It provides compile-time support for automatically selecting the most efficient data structures for a given graph analysis application, assuming a consistent workload on the graph. We perform a measurement study to better understand the performance of five data structures and evaluate a prototype Java implementation of our framework. It achieves a speedup of up to 4.7× compared to basic data structure configurations for the analysis of real-world dynamic graphs.

Proceedings of the Real World Domain Specific Languages Workshop 2018 | 2018

CFDlang: High-level code generation for high-order methods in fluid dynamics

Norman A. Rink; Immo Huismann; Adilla Susungi; Jeronimo Castrillon; Jörg Stiller; Jochen Fröhlich; Claude Tadonki

Numerical simulations continue to enable fast and enormous progress in science and engineering. Writing efficient numerical codes is a difficult challenge that encompasses a variety of tasks, from designing the right algorithms to exploiting the full potential of a platform's architecture. Domain-specific languages (DSLs) can ease these tasks by offering the right abstractions for expressing numerical problems. With the aid of domain knowledge, efficient code can then be generated automatically from abstract expressions. In this work, we present the CFDlang DSL for expressing tensor operations that constitute the performance-critical code sections in a class of real numerical applications from fluid dynamics. We demonstrate that CFDlang can be used to automatically generate code that performs as well as, if not better than, carefully hand-optimized code.

Design, Automation, and Test in Europe | 2017

Rethinking on-chip DRAM cache for simultaneous performance and energy optimization

Fazal Hameed; Jeronimo Castrillon

State-of-the-art DRAM cache employs a small Tag-Cache, and its performance depends on two important parameters, namely bank-level parallelism and Tag-Cache hit rate. These parameters depend on the row buffer organization. Recently, it has been shown that a small row buffer organization delivers better performance, via improved bank-level parallelism, than the traditional large row buffer organization, along with energy benefits. However, small row buffers do not fully exploit the temporal locality of tag accesses, leading to reduced Tag-Cache hit rates. As a result, the DRAM cache needs to be re-designed for small row buffer organization to achieve additional performance benefits. In this paper, we propose a novel tag-store mechanism that improves the Tag-Cache hit rate by 70% compared to existing DRAM tag-store mechanisms employing small row buffer organization. In addition, we enhance the DRAM cache controller with novel policies that take into account the locality characteristics of cache accesses. We evaluate our novel tag-store mechanism and controller policies in an 8-core system running the SPEC2006 benchmark and compare their performance and energy consumption against recent proposals. Our architecture improves the average performance by 21.2% and 11.4% compared to large and small row buffer organizations, respectively, by simultaneously improving both parameters. Compared to DRAM cache with large row buffer organization, we report an energy improvement of 62%.

Companion to the first International Conference on the Art, Science and Engineering of Programming | 2017

Analyzing State-of-the-Art Role-based Programming Languages

Lars Schütze; Jeronimo Castrillon

With ubiquitous computing, autonomous cars, and cyber-physical systems (CPS), adaptive software becomes more and more important as computing is increasingly context-dependent. Role-based programming has been proposed to enable adaptive software design without the problem of scattering the context-dependent code. Adaptation is achieved by having objects play roles at runtime. With every role, the object's behavior is modified to adapt to the given context. In recent years, many role-based programming languages have been developed. While they differ greatly in the set of supported features, they all incur large runtime overheads, resulting in inferior performance. The increased variability and expressiveness of the programming languages have a direct impact on run-time and memory consumption. In this paper we provide a detailed analysis of state-of-the-art role-based programming languages, with emphasis on performance bottlenecks. We also provide insight into how to overcome these problems.

Proceedings of the 15th International Conference on Modularity | 2016

Fault tolerance with aspects: a feasibility study

Sven Karol; Norman A. Rink; Bálint Gyapjas; Jeronimo Castrillon

To enable correct program execution on unreliable hardware, software can be made fault-tolerant by adding program statements or machine instructions for fault detection and recovery. Manually modifying programs does not scale, and extending compilers to emit additional machine instructions lacks flexibility. However, since software-implemented hardware fault tolerance (SIHFT) can be understood as a cross-cutting concern, we propose aspect-oriented programming as a suitable implementation technique. We prove this proposition by implementing an AN encoder based on AspectC++. In terms of performance and fault coverage, we achieve comparable results to existing compiler-based solutions.
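For readers unfamiliar with AN encoding, the following minimal sketch shows the arithmetic idea only. It does not reproduce the AspectC++ encoder from the paper. The constant `A` and the helper names are illustrative assumptions. A value `n` is stored as the code word `A * n`, so any valid code word is divisible by `A`, and a hardware fault that corrupts the word is detected when that divisibility check fails.

```python
# Assumption: A is an illustrative encoding constant, not taken from the paper.
# An odd A > 1 guarantees that every single bit flip is detected, because a
# flip changes the word by +/- 2**k, which is never a multiple of an odd A > 1.
A = 58659


def encode(n: int) -> int:
    """Encode a plain integer as an AN code word."""
    return n * A


def is_valid(code_word: int) -> bool:
    """A code word is valid only if it is a multiple of A."""
    return code_word % A == 0


def decode(code_word: int) -> int:
    """Check the code word and recover the plain integer."""
    if not is_valid(code_word):
        raise ValueError("fault detected: code word is not a multiple of A")
    return code_word // A


def encoded_add(x: int, y: int) -> int:
    """Addition works directly on code words: A*a + A*b == A*(a + b)."""
    return x + y


if __name__ == "__main__":
    total = encoded_add(encode(7), encode(35))
    print(decode(total))             # decodes the fault-free sum
    corrupted = total ^ (1 << 3)     # simulate a single bit flip
    print(is_valid(corrupted))       # the check fails for the corrupted word
```

Operations other than addition and subtraction, such as multiplication, need a correction step because the product of two code words carries a factor of `A` squared. That detail is omitted from this sketch.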

IEEE Transactions on Multi-Scale Computing Systems | 2018

A Hardware/Software Stack for Heterogeneous Systems

Jeronimo Castrillon; Matthias Lieber; Sascha Klüppelholz; Marcus Völp; Nils Asmussen; Uwe Aßmann; Franz Baader; Christel Baier; Gerhard P. Fettweis; Jochen Fröhlich; Andrés Goens; Sebastian Haas; Dirk Habich; Hermann Härtig; Mattis Hasler; Immo Huismann; Tomas Karnagel; Sven Karol; Akash Kumar; Wolfgang Lehner; Linda Leuschner; Siqi Ling; Steffen Märcker; Christian Menard; Johannes Mey; Wolfgang E. Nagel; Benedikt Nöthen; Rafael Peñaloza; Michael Raitza; Jörg Stiller

Plenty of novel emerging technologies are being proposed and evaluated today, mostly at the device and circuit levels. It is unclear what the impact of different new technologies at the system level will be. What is clear, however, is that new technologies will make their way into systems and will increase the already high complexity of heterogeneous parallel computing platforms, making it ever so difficult to program them. This paper discusses a programming stack for heterogeneous systems that combines and adapts well-understood principles from different areas, including capability-based operating systems, adaptive application runtimes, dataflow programming models, and model checking. We argue why we think that these principles built into the stack and the interfaces among the layers will also be applicable to future systems that integrate heterogeneous technologies. The programming stack is evaluated on a tiled heterogeneous multicore.

Software and Compilers for Embedded Systems | 2017

Robust Mapping of Process Networks to Many-Core Systems using Bio-Inspired Design Centering

Gerald Hempel; Andrés Goens; Jeronimo Castrillon; Josefine Asmus; Ivo F. Sbalzarini

Embedded systems are often designed as complex architectures with numerous processing elements. Effectively programming such systems requires parallel programming models, e.g., task-based or dataflow-based models. With these types of models, the mapping of the abstract application model to the existing hardware architecture plays a decisive role and is usually optimized to achieve an ideal resource footprint or a near-minimal execution time. However, when mapping several independent programs to the same platform, resource conflicts can arise. These can be circumvented by remapping some of the tasks of an application, which in turn affects its timing behavior, possibly leading to constraint violations. In this work we present a novel method to compute mappings that are robust against local task remapping. The underlying method is based on the bio-inspired design centering algorithm of Lp-Adaptation. We evaluate this with several benchmarks on different platforms and show that mappings obtained with our algorithm are indeed robust. In all experiments, our robust mappings tolerated significantly more run-time perturbations without violating constraints than mappings devised with optimization heuristics.
