
Emil Matus

Dresden University of Technology

Publication

Featured research published by Emil Matus.

international conference / workshop on embedded computer systems: architectures, modeling and simulation | 2004

Synchronous Transfer Architecture (STA)

Gordon Cichon; Pablo Robelly; Hendrik Seidel; Emil Matus; Marcus Bronzel; Gerhard P. Fettweis

This paper presents a novel micro-architecture for high-performance and low-power DSPs. The underlying Synchronous Transfer Architecture (STA) fills the gap between SIMD-DSPs and coarse-grain reconfigurable hardware. STA processors are modeled using a common machine description suitable for both compiler and core generator. The core generator is able to generate models in Lisa, System-C, and VHDL. A special emphasis is placed on the good synthesis of the generated VHDL model.

international solid-state circuits conference | 2014

10.7 A 105GOPS 36mm² heterogeneous SDR MPSoC with energy-aware dynamic scheduling and iterative detection-decoding for 4G in 65nm CMOS

Benedikt Noethen; Oliver Arnold; Esther P. Adeva; Tobias Seifert; Erik Fischer; Steffen Kunze; Emil Matus; Gerhard P. Fettweis; Holger Eisenreich; Georg Ellguth; Stephan Hartmann; Sebastian Höppner; Stefan Schiefer; Jens-Uwe Schlüßler; Stefan Scholze; Dennis Walter; René Schüffny

Modern mobile communication systems face conflicting design constraints. On the one hand, the expanding variety of transmission modes calls for highly flexible solutions supporting the ever-growing number and diversity of application requirements. On the other hand, stringent power restrictions (e.g., at femto base stations and terminals) must be considered, while satisfying the demanding performance requirements. In order to cope with these issues, existing SDR platforms, e.g. [1-2], propose an MPSoC with a heterogeneous array of processing elements (PEs). MPSoC solutions provide programmability and parallelism yielding flexibility, processing performance and power efficiency. To schedule the resources and to apply power gating, a static approach is employed. In contrast, we present a heterogeneous MPSoC platform (Tomahawk2) with runtime scheduling and fine-grained hierarchical power management. This solution can fully adapt to the dynamically varying workload and semi-deterministic behavior in modern concurrent wireless applications. The proposed dynamic scheduler (CoreManager, CM) can be implemented either in software on a general-purpose processor or on a dedicated application-specific hardware unit. It is evident that the software approach offers the highest degree of flexibility; however, it may become a performance-bottleneck for complex applications. A high-throughput ASIC was presented in [3], but this solution does not permit scheduling algorithms to be adjusted. In this work, these limitations are overcome by implementing the CM on an ASIP.

ACM Transactions on Embedded Computing Systems | 2014

Tomahawk: Parallelism and heterogeneity in communications signal processing MPSoCs

Oliver Arnold; Emil Matus; Benedikt Noethen; Markus Winter; Torsten Limberg; Gerhard P. Fettweis

Heterogeneity and parallelism in MPSoCs for 4G (and beyond) communications signal processing are inevitable in order to meet stringent power constraints and performance requirements. The question arises on how to cope with the problem of system programmability and runtime management incurred by the statically or even dynamically varying number and type of processing elements. This work addresses this challenge by proposing the concept of a heterogeneous many-core platform called Tomahawk. Apart from the definition of the system architecture, in this approach a unified framework including a model of computation, a programming interface and a dedicated runtime management unit called CoreManager is proposed. The increase of system complexity in terms of application parallelism and number of resources may lead to a dramatic increase of the management costs, hence causing performance degradation. For this reason, the efficient implementation of the CoreManager becomes a major issue in system design. This work compares the performance and capabilities of various CoreManager HW/SW solutions, based on ASIC, RISC and ASIP paradigms. The results demonstrate that the proposed ASIP-based solution approaches the performance of the ASIC realization, while preserving the full flexibility of the software (RISC-based) implementation.

international symposium on circuits and systems | 2009

Vectorization of the Sphere Detection algorithm

Björn Mennenga; Emil Matus; Gerhard P. Fettweis

In this paper we present concepts for vectorization of sphere detection algorithms based on regularization of depth-first tree search algorithms. Due to data-dependent control flow, these tree search algorithms exhibit a highly irregular structure that does not allow efficient collaborative detection of multiple received symbols in parallel. In order to enable parallel symbol processing, a transformation of the irregular tree search algorithm is proposed, resulting in a novel regular algorithm structure. Based on this, a concept for a vectorized List Sphere Detector is introduced, employing a SIMD computational model. In addition, limiting effects of vector processing are studied, leading to concepts which ease these effects and enable the utilization of vectorization's benefits.
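For orientation, the kind of depth-first tree search that the paper sets out to regularize can be sketched as follows. This is a minimal scalar illustration, not the authors' vectorized List Sphere Detector. It assumes a real-valued system with an upper-triangular matrix R (for example from a QR decomposition) and a BPSK alphabet, and the function name `sphere_detect` is hypothetical. The early return on the radius test is an example of the data-dependent control flow the abstract refers to.

```python
from typing import List, Optional, Sequence, Tuple


def sphere_detect(
    R: Sequence[Sequence[float]],
    y: Sequence[float],
    symbols: Sequence[float] = (-1.0, 1.0),
) -> Tuple[Optional[List[float]], float]:
    """Depth-first search for the s minimizing ||y - R s||^2.

    Assumes R is upper triangular, so the residual of row `level`
    depends only on symbols s[level], ..., s[n-1].
    """
    n = len(y)
    best_dist = float("inf")
    best_s: Optional[List[float]] = None
    s = [0.0] * n

    def search(level: int, dist: float) -> None:
        nonlocal best_dist, best_s
        # Radius test: prune any branch that is already no better
        # than the best complete candidate found so far.
        if dist >= best_dist:
            return
        if level < 0:
            # Reached a leaf: a full symbol vector inside the sphere.
            best_dist, best_s = dist, list(s)
            return
        for sym in symbols:
            s[level] = sym
            resid = y[level] - sum(R[level][j] * s[j] for j in range(level, n))
            search(level - 1, dist + resid * resid)

    # Start at the last row, which involves a single unknown symbol.
    search(n - 1, 0.0)
    return best_s, best_dist


if __name__ == "__main__":
    # Illustrative values only; not taken from the paper.
    R_demo = [[2.0, 1.0, 0.5], [0.0, 1.5, -1.0], [0.0, 0.0, 1.0]]
    y_demo = [1.4, -2.6, 1.1]
    print(sphere_detect(R_demo, y_demo))
```

Because the number of visited nodes depends on the received vector, two symbols processed side by side would take different paths through this recursion, which is why a regular structure is needed before a SIMD model can be applied.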

international solid-state circuits conference | 2012

A 335Mb/s 3.9mm² 65nm CMOS flexible MIMO detection-decoding engine achieving 4G wireless data rates

Markus Winter; Steffen Kunze; Esther P. Adeva; Björn Mennenga; Emil Matus; Gerhard P. Fettweis; Holger Eisenreich; Georg Ellguth; Sebastian Höppner; Stefan Scholze; René Schüffny; Tomoyoshi Kobori

In current and future wireless standards, such as WiMAX, 3GPP-LTE or LTE-Advanced, receiver terminals have to support numerous operating modes for each protocol [1], as well as sophisticated transmission techniques, especially enhanced MIMO detection and iterative forward error correction (FEC). MIMO detection and FEC belong to the most computationally complex parts of the receiver-side baseband signal processing chain. Implementations thereof must have low power consumption, but also be able to interact in a flexible and efficient way in the detection-decoding engine, while at the same time not compromising on the challenging throughput and flexibility requirements associated with 4G standards. In this paper, we present a chip implementation of a MIMO sphere detector combined with a flexible FEC engine, realizing a detection-decoding engine in silicon capable of satisfying 4G requirements with a data rate of 335Mb/s.

international conference on parallel processing | 2006

Code generation for STA architecture

Jie Guo; Torsten Limberg; Emil Matus; Björn Mennenga; Reimund Klemm; Gerhard P. Fettweis

This paper presents a novel compiler backend which generates assembly code for the Synchronous Transfer Architecture (STA). STA is a Very Long Instruction Word (VLIW) architecture that additionally uses a non-orthogonal Instruction Set Architecture (ISA). Generating efficient code for this architecture requires highly optimizing techniques. The compiler backend presented in this paper is based on Integer Linear Programming (ILP). Experimental results show that the generated assembly code executes in much less time than code generated by traditional methods, and that code generation can be accomplished in acceptable time.

signal processing systems | 2010

A "multi-user" approach towards a channel decoder for convolutional, turbo and LDPC codes

Steffen Kunze; Emil Matus; Gerhard P. Fettweis; Tomoyoshi Kobori

In this paper we present the concept of a high-throughput multi-mode channel decoder architecture that consists of a tightly coupled array of independently programmable processing cores. Every core is capable of decoding low-density parity-check (LDPC), convolutional turbo (CTC) and convolutional codes (CC) either independently or jointly with other cores. This approach allows parallel handling of several separate decoding processes in one decoder engine as well as performing a single high-throughput decoding, opening up a new level of flexibility for channel decoding. The multi-mode decoder core as well as the multi-core approach are explained and simulation results presented. A case study of the proposed architecture was implemented in a 65nm process using an area of 0.44 mm². At 200 MHz, throughputs of up to 86 Mbps could be reached.

design, automation, and test in europe | 2016

EUROSERVER: Share-anything scale-out micro-server design

Manolis Marazakis; John Goodacre; Didier Fuin; Paul M. Carpenter; John Thomson; Emil Matus; Antimo Bruno; Per Stenström; Jérôme Martin; Yves Durand; Isabelle Dor

This paper provides a snapshot summary of the trends in the area of micro-server development and their application in the broader enterprise and cloud markets. Focusing on the technology aspects, we provide an understanding of these trends and specifically the differentiation and uniqueness of the approach being adopted by the EUROSERVER FP7 project. The unique technical contributions of EUROSERVER range from the fundamental system compute unit design architecture, through to the implementation approach both at the chiplet nanotechnological integration, and the everything-close physical form factor. Furthermore, we offer optimizations at the virtualisation layer to exploit the unique hardware features, and other framework optimizations, including exploiting the hardware capabilities at the run-time system and application layers.

international conference on multimedia and expo | 2009

ICT-Emuco. An innovative solution for future smart phones

Maria Elizabeth Gonzalez; Attila Bilgic; Adam Lackorzynski; Dacian Tudor; Emil Matus; Irv Badr

Mobile communication has become the dominant branch of the communication business over the last decade and its market is still growing rapidly. With the recent advances in wireless networks and the exponential growth in the usage of multimedia applications, multi-core platforms appear to be the way for feature-rich phones, such as the iPhone or the BlackBerry Storm, to deliver performance comparable to today's computer systems. On the other hand, system scalability and flexibility are vital to enable fast time-to-market and allow manufacturers and service providers to be competitive. The use of virtualization techniques and software development for scalable parallel hardware architectures is an inevitable consequence of the migration to multi-core platforms on mobile devices.

great lakes symposium on vlsi | 2016

Trellis-search based Dynamic Multi-Path Connection Allocation for TDM-NoCs

Yong Chen; Emil Matus; Gerhard P. Fettweis

This paper proposes a centralized approach to connection allocation for TDM-based NoCs that uses a dedicated hardware unit called NoCManager, which employs a trellis-based search algorithm enabling dynamic parallel multi-path, multi-slot allocation. In contrast to the previous unrolled trellis search algorithm, this paper employs a folded architecture to achieve efficiency. In comparison with previous TDM connection allocation methods, the proposed design has the following advantages: (1) a hardware-supported low-latency, high-throughput allocation mechanism, (2) an improved success rate due to parallel multi-path search and (3) an efficient NoCManager architecture. Compared to centralized software solutions, the proposed design demonstrates a two-orders-of-magnitude improvement in allocation speed and a ten times higher success rate. It provides up to several times higher allocation speed and up to an 18% higher success rate than a recently proposed distributed solution.
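The slot-allocation problem behind this abstract can be illustrated with a small sketch. It is not the NoCManager or its trellis search: a sequential loop over candidate paths stands in for the parallel multi-path search. The sketch assumes the common TDM convention that a connection using slot s on its first link uses slot (s + 1) mod S on the next link, and so on. The names `free_start_slots`, `allocate` and the `busy` table are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Link = Tuple[str, str]  # (source router, destination router)


def free_start_slots(
    path: Sequence[Link], busy: Dict[Link, Set[int]], num_slots: int
) -> List[int]:
    """Return every start slot for which the whole path is free.

    Assumption: the slot index advances by one (modulo num_slots)
    at each hop along the path.
    """
    return [
        start
        for start in range(num_slots)
        if all(
            (start + hop) % num_slots not in busy.get(link, set())
            for hop, link in enumerate(path)
        )
    ]


def allocate(
    paths: Sequence[Sequence[Link]], busy: Dict[Link, Set[int]], num_slots: int
) -> Optional[Tuple[List[Link], int]]:
    """Try candidate paths in order and reserve the first feasible one.

    Returns (path, start_slot) on success, or None if every path is blocked.
    The slot tables in `busy` are updated in place on success.
    """
    for path in paths:
        slots = free_start_slots(path, busy, num_slots)
        if slots:
            start = slots[0]
            for hop, link in enumerate(path):
                busy.setdefault(link, set()).add((start + hop) % num_slots)
            return list(path), start
    return None


if __name__ == "__main__":
    # Illustrative occupancy; not taken from the paper.
    table: Dict[Link, Set[int]] = {("A", "B"): {0, 1}, ("B", "C"): {3}}
    via_b = [("A", "B"), ("B", "C")]
    via_d = [("A", "D"), ("D", "C")]
    print(allocate([via_b, via_d], table, num_slots=4))
```

Offering more than one candidate path is what raises the success rate in this toy model: when the slots on one route are exhausted, a request can still succeed on another.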
