Is this you? Create Your Porfile

Dimitrios Soudris

National Technical University of Athens

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Dimitrios Soudris is active.

Explore More

Publication

Featured researches published by Dimitrios Soudris.

ACM Transactions on Design Automation of Electronic Systems | 2006

Systematic dynamic memory management design methodology for reduced memory footprint

David Atienza; José M. Mendías; Dimitrios Soudris; Francky Catthoor

New portable consumer embedded devices must execute multimedia and wireless network applications that demand extensive memory footprint. Moreover, they must heavily rely on Dynamic Memory (DM) due to the unpredictability of the input data (e.g., 3D streams features) and system behavior (e.g., number of applications running concurrently defined by the user). Within this context, consistent design methodologies that can tackle efficiently the complex DM behavior of these multimedia and network applications are in great need. In this article, we present a new methodology that allows to design custom DM management mechanisms with a reduced memory footprint for such kind of dynamic applications. First, our methodology describes the large design space of DM management decisions for multimedia and wireless network applications. Then, we propose a suitable way to traverse the aforementioned design space and construct custom DM managers that minimize the DM used by these highly dynamic applications. As a result, our methodology achieves improvements of memory footprint by 60% on average in real case studies over the current state-of-the-art DM managers used for these types of dynamic applications.

power and timing modeling optimization and simulation | 2000

Data-Reuse and Parallel Embedded Architectures for Low-Power, Real-Time Multimedia Applications

Dimitrios Soudris; Nikolaos D. Zervas; Antonios Argyriou; Minas Dasygenis; Konstantinos Tatas; Constantinos E. Goutis; Adonios Thanailakis

Exploitation of data re-use in combination with the use of custom memory hierarchy that exploits the temporal locality of data accesses may introduce significant power savings, especially for data-intensive applications. The effect of the data-reuse decisions on the power dissipation but also on area and performance of multimedia applications realized on multiple embedded cores is explored. The interaction between the data-reuse decisions and the selection of a certain data-memory architecture model is also studied. As demonstrator a widely-used video processing algorithmic kernel, namely the full search motion estimation kernel, is used. Experimental results prove that improvements in both power and performance can be acquired, when the right combination of data memory architecture model and data-reuse transformation is selected.

ieee international symposium on parallel & distributed processing, workshops and phd forum | 2011

A Heterogeneous Multicore System on Chip with Run-Time Reconfigurable Virtual FPGA Architecture

Michael Hübner; Peter Figuli; Dimitrios Soudris; Kostas Siozios; Jürgen Becker

System design, especially for low power embedded applications often profit from a heterogeneous target hardware platform. The application can be partitioned into modules with specific requirements e.g. parallelism or performance in relation to the provided hardware blocks on the multicore hardware. The result is an optimized application mapping and a parallel processing with lower power consumption on the different cores on the hardware. This paper presents a heterogeneous platform consisting of a microprocessor and a field programmable gate array (FPGA) connected via a standard AMBA bus. The novelty of this approach is that the FPGA is realized as virtual reconfigurable hardware upon a traditional off the shelf FPGA device. The advantage with this approach is that the specification of the virtual FPGA stays unchanged, independent to the underlying hardware and provides therefore features, which the exploited physical host FPGA cannot provide. A special feature of the presented virtual FPGA amongst others is the dynamic reconfigurability which is for example not available with all off the shelf FPGAs. Furthermore the concept of FPGA virtualization enables the re-use of hardware blocks on other physical FPGA devices. This paper presents the hardware platform and describes the tool chain for the heterogeneous system on chip.

international symposium on circuits and systems | 2001

The circuit design of multiple-valued logic voltage-mode adders

I. Thoidis; Dimitrios Soudris; J. M. Fernandez; Adonios Thanailakis

Novel quaternary half adder, full adder, and a carry-lookahead adder are introduced. The proposed circuits are static and operate in voltage-mode. Moreover, there is no current flow in steady states, and thus no static power dissipation. Although the comparison in transistor count shows that the proposed quaternary circuits are larger than two respective binary ones, benefits in parallel addition arise from the use of multiple-valued logic. Firstly, the ripple-carry additions are faster because the number of carries are half compared to binary ones and the delay from the input carry through the output carry is relatively small. Secondly, the carry-lookahead scheme exhibits less complexity, which leads to overall reduction in transistor count for addition with a large number of bits.

IEEE Transactions on Very Large Scale Integration Systems | 2006

A combined DMA and application-specific prefetching approach for tackling the memory latency bottleneck

Minas Dasygenis; Erik Brockmeyer; Bart Durinck; Francky Catthoor; Dimitrios Soudris; A. Thanailakis

Memory latency has always been a major issue in embedded systems that execute memory-intensive applications. This is even more true as the gap between processor and memory speed continues to grow. Hardware and software prefetching have been shown to be effective in tolerating the large memory latencies inherit in large off-chip memories; however, both types of prefetching have their shortcomings. Hardware schemes are more complex and require extra circuitry to compute data access strides, while software schemes generate prefetch instructions, which if not computed carefully may hamper performance. On the other hand, some applications domains (such as multimedia) have a uniform and known a priori memory access pattern, that if exploited, could yield significant application performance improvement. With this characteristic in mind, we present our findings on hiding memory latency using the direct memory access (DMA) mode, which is present in all modern systems, combined with a software prefetch mechanism, and a customized on-chip memory hierarchy mapping. Compared to previous approaches, we are able to estimate the performance and power metrics, without actually implementing the embedded system. Experimental results on nine well known multimedia and imaging applications prove the efficiency of our technique. Finally, we verify the performance estimations by implementing and simulating the algorithms on the TI C6201 processor.

International Journal of Reconfigurable Computing | 2008

Architecture-Level Exploration of Alternative Interconnection Schemes Targeting 3D FPGAs: A Software-Supported Methodology

Kostas Siozios; Alexandros Bartzas; Dimitrios Soudris

In current reconfigurable architectures, the interconnection structures increasingly contribute more to the delay and power consumption. The demand for increased clock frequencies and logic density (smaller area footprint) makes the problem even more important. Three-dimensional (3D) architectures are able to alleviate this problem by accommodating a number of functional layers, each of which might be fabricated in different technology. However, the benefits of such integration technology have not been sufficiently explored yet. In this paper, we propose a software-supported methodology for exploring and evaluating alternative interconnection schemes for 3D FPGAs. In order to support the proposed methodology, three new CAD tools were developed (part of the 3D MEANDER Design Framework). During our exploration, we study the impact of vertical interconnection between functional layers in a number of design parameters. More specifically, the average gains in operation frequency, power consumption, and wirelength are 35%, 32%, and 13%, respectively, compared to existing 2D FPGAs with identical logic resources. Also, we achieve higher utilization ratio for the vertical interconnections compared to existing approaches by 8% for designing 3D FPGAs, leading to cheaper and more reliable devices.

Integration | 2006

Efficient system-level prototyping of power-aware dynamic memory managers for embedded systems

David Atienza; Francesco Poletti; José M. Mendías; Francky Catthoor; Luca Benini; Dimitrios Soudris

In the near future, portable embedded devices must run multimedia and wireless network applications with enormous computational performance (1-40GOPS) requirements at a low energy consumption (0.1-2W). In these applications, the dynamic memory subsystem is currently one of the main sources of power consumption and its inappropriate management can severely affect the performance of the whole system. Within this context, the construction and power evaluation of custom memory managers is one of the most difficult parts for an efficient mapping of such dynamic applications on low-power embedded systems. In this paper, we present a new system-level approach to model complex dynamic memory managers integrating detailed power profiling information. This approach allows to obtain power consumption estimates, memory footprint and memory access values to refine the dynamic memory management of the system in an early stage of the design flow and to easily explore the large search space of memory manager implementations.

IEEE Transactions on Circuits and Systems Ii: Analog and Digital Signal Processing | 1997

A VLSI design methodology for RNS full adder-based inner product architectures

Dimitrios Soudris; Vassilis Paliouras; Thanos Stouraitis; Costas E. Goutis

In this paper, a systematic graph-based methodology for synthesizing VLSI RNS architectures using full adders as the basic building block is introduced. The design methodology derives array architectures starting from the algorithm level and ending up with the bit-level design. Using as target architectural style the regular array processor, the proposed procedure constructs the two-dimensional (2-D) dependence graph of the bit-level algorithm, which is formally described by sets of uniform recurrent equations. The main characteristic of the proposed architectures is that they can operate at very high-throughput rates. The proposed architectures exhibit significantly reduced complexity over ROM-based ones.

design, automation, and test in europe | 2004

Dynamic memory management design methodology for reduced memory footprint in multimedia and wireless network applications

David Atienza; Francky Catthoor; José M. Mendías; Dimitrios Soudris

New portable consumer embedded devices must execute multimedia and wireless network applications that demand extensive memory footprint. Moreover, they must heavily rely on dynamic memory (DM) due to the unpredictability of the input data (e.g. 3D streams features) and system behaviour (e.g. number of applications running concurrently defined by the user). Within this context, consistent design methodologies that can tackle efficiently the complex DM behaviour of these multimedia and network applications are in great need. In this paper, we present a new methodology that allows to design custom DM management mechanisms with a reduced memory footprint for such kind of dynamic applications. The experimental results in real case studies show that our methodology improves memory footprint 60% on average over current state-of-the-art DM managers.

design automation conference | 2013

Distributed run-time resource management for malleable applications on many-core platforms

Iraklis Anagnostopoulos; Vasileios Tsoutsouras; Alexandros Bartzas; Dimitrios Soudris

Todays prevalent solutions for modern embedded systems and general computing employ many processing units connected by an on-chip network leaving behind complex superscalar architectures In this paper, we couple the concept of distributed computing with parallel applications and present a workload-aware distributed run-time framework for malleable applications on many-core platforms. The presented framework is responsible for serving in a distributed way and at run-time, the needs of malleable applications, maximizing resource utilization avoiding dominating effects and taking into account the type of processors supporting platform heterogeneity, while having a small overhead in overall inter-core communication. Our framework has been implemented as part of a C simulator and additionally as a runtime service on the Single-Chip Cloud Computer (SCC), an experimental processor created by Intel Labs, and we compared it against a state-of-art run-time resource manager. Experimental results showed that our framework has on average 70% less messages, 64% smaller message size and 20% application speed-up gain.

Explore More