Johan Ersfolk | Researchain

Publication

Featured research published by Johan Ersfolk.

international conference on acoustics, speech, and signal processing | 2012

Scheduling of dynamic dataflow programs based on state space analysis

Johan Ersfolk; Ghislain Roquier; Johan Lilius; Marco Mattavelli

Compile-time scheduling of dynamic dataflow programs is still an open problem. This paper shows how the scheduling of dynamic portions of asynchronous dataflow networks described in the CAL language can be determined before execution by analyzing the state space of network partitions. Experiments show that the number of run-time operations performed by dynamic schedulers is greatly reduced when the schedules extracted by the state analysis are used.

signal processing systems | 2011

Scheduling of dynamic dataflow programs with model checking

Johan Ersfolk; Ghislain Roquier; Fareed Jokhio; Johan Lilius; Marco Mattavelli

Compile-time scheduling of dynamic dataflow programs is still an open problem. This paper presents initial results showing that the scheduling of dynamic portions of CAL dataflow networks can be reduced to static scheduling by analyzing the state space of network partitions. The CAL sub-network is converted to an equivalent Promela program and analyzed with a state analysis tool (SPIN), which identifies deterministic schedules that link recurring network execution states. As a result, the only dynamic operations left to the scheduler are the guard evaluations between states linked by the obtained deterministic schedules. Experiments show that the number of operations performed by dynamic schedulers is greatly reduced when the schedules extracted by the state analysis are used.
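The idea of leaving only guard evaluations to the run-time scheduler can be illustrated with a minimal Python sketch. It is not the paper's tool flow (no CAL, Promela, or SPIN is involved); the state names, guards, and firing names in `SCHEDULE` are hypothetical and stand in for schedules that a state space analysis would produce.

```python
def run_quasi_static(schedule, state, tokens):
    """Run precomputed firing sequences, testing guards only between them.

    schedule maps a state name to a list of (guard, firings, next_state).
    Returns the list of fired actions and the number of guard evaluations.
    """
    fired, guard_evals = [], 0
    for token in tokens:
        for guard, firings, next_state in schedule[state]:
            guard_evals += 1
            if guard(token):
                fired.extend(firings)  # static sequence, no further tests
                state = next_state
                break
        else:
            raise ValueError(f"no guard accepts {token!r} in state {state}")
    return fired, guard_evals


# Hypothetical two-state network partition (assumption for illustration).
SCHEDULE = {
    "idle": [
        (lambda t: t == "header", ["parse", "configure"], "busy"),
        (lambda t: True, ["skip"], "idle"),
    ],
    "busy": [
        (lambda t: t == "end", ["flush"], "idle"),
        (lambda t: True, ["decode", "filter", "emit"], "busy"),
    ],
}

fired, guard_evals = run_quasi_static(
    SCHEDULE, "idle", ["header", "data", "data", "end"]
)
```

In this sketch the number of guard evaluations grows with the number of state transitions rather than with the number of individual firings, which is the kind of saving the abstract describes.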

software and compilers for embedded systems | 2009

The canals language and its compiler

Andreas Dahlin; Johan Ersfolk; Guyfu Yang; Haitham Habli; Johan Lilius

Stream-based computing, as embodied in stream-programming environments and streaming languages, has attracted considerable interest as a potential solution to programming many-cores. Modern embedded multimedia devices embody many characteristics of stream-based computing, with an additional constraint on energy consumption. In this paper we present a new streaming language, Canals, together with its compiler. Canals proposes the following novel features:
1. The ability to describe the scheduling of the computation kernels: Canals has a sub-language for describing schedulers and run-time system support.
2. The ability to detect the type of data on the inputs of a network (the scheduling often depends on the data at run-time): Canals provides bit-stream parsing through automatic deserialization of data on network inputs.
3. A choice of synchronization mechanism between computational kernels and the scheduler to avoid overheads. This is implemented in the run-time system through the Hardware Abstraction Layer (HAL).
We describe the language and the code generators for the Cell processor and the Altera FPGA board.

IEEE Transactions on Signal Processing | 2015

Actor Merging for Dataflow Process Networks

Jani Boutellier; Johan Ersfolk; Johan Lilius; Marco Mattavelli; Ghislain Roquier; Olli Silvén

Dataflow process networks provide a versatile model of computation for specifying signal processing applications in a platform-independent fashion. This attractive feature of dataflow has lately been realized in dataflow programming tools that allow the same application specification to be synthesized both as fixed hardware circuits and as software for programmable processors. In practice, however, the specification granularity of the dataflow program remains an arbitrary choice of the designer. Dataflow specifications of the same application with equivalent I/O behaviour can range from a single dataflow actor to a very fine-grained network composed of elementary processing operations. A very fine-grained dataflow specification might result in a high-performance implementation when synthesized as hardware, but might perform poorly when executed on a programmable processor. This article presents actor merging as one solution to this performance portability problem of dataflow programs. In contrast to previous work on actor merging, this article presents a methodology that can also merge dynamic dataflow actors. To support these claims, results of experiments on several processing platforms and application examples ranging from telecommunications to video compression are reported.

parallel, distributed and network-based processing | 2015

Execution of Dataflow Process Networks on OpenCL Platforms

Wictor Lund; Sudeep Kanur; Johan Ersfolk; Leonidas Tsiopoulos; Johan Lilius; Joakim Haldin; Ulf Falk

The trend in computing systems is to combine various kinds of processing elements (PEs) to build more parallel architectures. This trend leads to more heterogeneous computing systems, for which abstractions are needed to program the systems efficiently without increasing the programming cost. This has led to new programming languages and application programming interfaces (APIs). Parallel programming has always been a holy grail in computer science, and dataflow programming promises a way to automatically provide parallel constructs for the programmer. This paper provides an approach to translating dataflow process networks (DPNs) into programs that run some of their computations on the Open Computing Language (OpenCL) platform, which supports running programs on massively parallel hardware such as graphics processing units (GPUs). We show that certain DPN programs could run very efficiently on data-parallel architectures, but also that certain patterns in DPN programs prove problematic.

international conference on embedded software and systems | 2009

Memory Analysis of Low Power MPEG-4 Decoder Architecture

Andreas Dahlin; Johan Ersfolk; Haitham Habli; Johan Lilius

Recent research has shown that in mobile devices, the energy efficiency of the total system does not scale at the same pace as the energy efficiency of the silicon. This has been attributed to overheads in software, and in the context of multimedia codecs a new approach has been proposed. In this approach, hardware accelerators are scheduled quasi-statically, which decreases the interfacing overhead substantially. The approach has been validated by restructuring the open-source Xvid codec software implementation. In this paper we analyze the memory requirements of the approach and propose some optimizations that will substantially decrease its memory bandwidth.

signal processing systems | 2013

High-performance programs by source-level merging of RVC-CAL dataflow actors

Jani Boutellier; Amanullah Ghazi; Olli Silvén; Johan Ersfolk

RVC-CAL is a dataflow language that has acquired an ecosystem of sophisticated design tools. Previous works have shown that RVC-CAL-based applications can automatically be deployed to multiprocessor platforms, as well as hardware descriptions with high efficiency. However, as RVC-CAL is a concurrent language, code generation for a single processor core requires careful application analysis and scheduling. Although much work has been done in this area, to this date no publication has reported that programs generated from RVC-CAL could rival handwritten programs on single-core processors. This paper proposes performance optimization of RVC-CAL applications by actor merging at source code level. The proposed methodology is demonstrated with an IEEE 802.15.4 (ZigBee) transmitter case study. The transmitter baseband software, previously written in C, is rewritten in RVC-CAL and optimized with the proposed methodology. Experiments show that on a VLIW-flavored processor the RVC-CAL-based program achieves the performance of manually written software.

international conference on information and communication security | 2012

Optimizing off-chip memory access costs in low power MPEG-4 decoder

Haitham Habli; Johan Ersfolk; Johan Lilius; Tomi Westerlund; Jari Nurmi

One of the main sources of energy consumption in modern video coding algorithms is reading reference frames from main memory. Once a reference frame block is in local memory, we need to predict in advance when the same reference frame block would be reread from main memory. The prediction relies on calculating motion vectors according to their absolute addresses. The number of read accesses can be reduced by calculating which motion vectors point to a part of the reference frame that can be copied to local memory. This increases data locality and thus reduces energy consumption. The optimized memory access algorithm was integrated into a low-power MPEG-4 decoder architecture [1, 2] and modeled in SystemC. Simulation of a cycle-accurate model with the optimized algorithm showed an average 10% increase in run-time performance compared to a typical unoptimized video decoding algorithm.
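The containment test behind this idea can be sketched in a few lines of Python. This is only an illustration, not the algorithm from the paper: the function names, the rectangular `window` model of local memory, and the sample coordinates are assumptions, and the 16x16 default matches the MPEG-4 macroblock size.

```python
def block_in_window(block_x, block_y, mv_x, mv_y, window, block_size=16):
    """Return True if the referenced block lies fully inside the window.

    window is (left, top, right, bottom) in absolute pixel coordinates,
    with right and bottom exclusive. Hypothetical model of local memory.
    """
    ref_x, ref_y = block_x + mv_x, block_y + mv_y  # absolute address
    left, top, right, bottom = window
    return (left <= ref_x and top <= ref_y
            and ref_x + block_size <= right
            and ref_y + block_size <= bottom)


def count_main_memory_reads(blocks, window):
    """Count blocks whose reference data is not already held locally."""
    return sum(
        not block_in_window(x, y, mv_x, mv_y, window)
        for x, y, mv_x, mv_y in blocks
    )


# Made-up blocks as (x, y, mv_x, mv_y) against a 64x64 cached region.
BLOCKS = [(16, 16, 4, -3), (48, 48, 8, 0), (0, 0, -1, 0)]
misses = count_main_memory_reads(BLOCKS, (0, 0, 64, 64))
```

Only the first block resolves to a region held locally; the other two cross the window boundary and would still need a main-memory read under this model.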

conference on design and architectures for signal and image processing | 2017

Detecting data-parallel synchronous dataflow graphs

Sudeep Kanur; Johan Lilius; Johan Ersfolk

Synchronous Dataflow (SDF), a popular subset of the dataflow programming paradigm, gives a well-structured formalism for capturing signal and stream processing applications. With data-parallel architectures becoming ubiquitous, several frameworks leverage the SDF formalism to map applications to parallel architectures. However, these frameworks assume that the Synchronous Dataflow graphs (SDFGs) under consideration are already data-parallel. In this paper, we address the lack of mechanisms for detecting whether an SDFG can be executed in a data-parallel fashion. We develop necessary and sufficient conditions that an SDFG must satisfy for data-parallel execution. In addition, we develop methods that detect and transform SDFGs that cannot be determined to be data-parallel through visual graph inspection alone. We report on a prototype implementation of the developed conditions as a compiler pass in the PREESM framework and test them against some useful applications expressed as SDFGs.
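As background for the SDF formalism mentioned above, the following Python sketch solves the standard SDF balance equations to obtain a repetition vector. It is the textbook consistency check, not the data-parallelism conditions developed in the paper; the function name and the edge tuple layout are assumptions, and the graph is assumed to be connected.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def repetition_vector(actors, edges):
    """Solve the balance equations q[src] * prod == q[dst] * cons.

    edges is a list of (src, dst, produced_per_firing, consumed_per_firing).
    Returns the minimal positive integer firing counts as a dict, or None
    if the rates are inconsistent. Assumes a connected graph.
    """
    rate = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        a = pending.pop()
        for src, dst, prod, cons in edges:
            if src == a:
                other, value = dst, rate[a] * prod / cons
            elif dst == a:
                other, value = src, rate[a] * cons / prod
            else:
                continue
            if other not in rate:
                rate[other] = value
                pending.append(other)
            elif rate[other] != value:
                return None  # two paths demand different firing ratios
    # Scale the rational rates to the smallest integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}


# A produces 2 tokens consumed 3 at a time by B; B produces 1, C consumes 2.
q = repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])
```

Here `q` is `{"A": 3, "B": 2, "C": 1}`: three firings of A produce six tokens, which two firings of B consume. A graph whose edges demand conflicting ratios returns `None`.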

international conference on acoustics, speech, and signal processing | 2013

Static and quasi-static compositions of stream processing applications from dynamic dataflow programs

Johan Ersfolk; Ghislain Roquier; Wictor Lund; Marco Mattavelli; Johan Lilius

Owing to their expressiveness, dynamic dataflow models have been shown to be more adequate and attractive solutions for describing state-of-the-art signal processing applications. However, they are known to incur potential run-time penalties when implementations are obtained by mapping and scheduling a dataflow network partition on a processing unit. In general terms, completely static compile-time scheduling of dynamic dataflow programs remains an unsolved problem. Composing actors is a promising approach that can significantly reduce the potential penalty of run-time scheduling, thus increasing the overall performance of the system. This paper presents static and quasi-static composition techniques that reduce the dynamic portion of dataflow networks by applying appropriate transformations to network partitions that, after a specific analysis, are shown to possess predictable behaviour. Experiments based on a video processing application ported to several systems-on-chip show the achievable speedup corresponding to the reduction in the number of run-time scheduling decisions.
