Menno Lindwer
Intel
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Menno Lindwer.
Microprocessors and Microsystems | 2013
Lech Józwiak; Menno Lindwer; Rosilde Corvino; Paolo Meloni; Laura Micconi; Jan Madsen; Erkan Diken; Deepak Gangadharan; R Roel Jordans; Sebastiano Pomata; Paul Pop; Giuseppe Tuveri; Luigi Raffo; Giuseppe Notarangelo
This paper focuses on mastering the automatic architecture synthesis and application mapping for heterogeneous massively-parallel MPSoCs based on customizable application-specific instruction-set processors (ASIPs). It presents an overview of the research being currently performed in the scope of the European project ASAM of the ARTEMIS program. The paper briefly presents the results of our analysis of the main challenges to be faced in the design of such heterogeneous MPSoCs. It explains which system, design, and electronic design automation (EDA) concepts seem to be adequate to address the challenges and solve the problems. Finally, it discusses the ASAM design-flow, its main stages and tools and their application to a real-life case study.
digital systems design | 2012
Lech Józwiak; Menno Lindwer; Rosilde Corvino; Paolo Meloni; Laura Micconi; Jan Madsen; Erkan Diken; Deepak Gangadharan; R Roel Jordans; Sebastiano Pomata; Paul Pop; Giuseppe Tuveri; Luigi Raffo
This paper focuses on mastering the automatic architecture synthesis and application mapping for heterogeneous massively-parallel MPSoCs based on customizable application-specific instruction-set processors (ASIPs). It presents an over-view of the research being currently performed in the scope of the European project ASAM of the ARTEMIS program. The paper briefly presents the results of our analysis of the main problems to be solved and challenges to be faced in the design of such heterogeneous MPSoCs. It explains which system, design, and electronic design automation (EDA) concepts seem to be adequate to resolve the problems and address the challenges. Finally, it introduces and briefly discusses the ASAM design-flow and its main stages.
Vlsi Design | 2012
Paolo Meloni; Sebastiano Pomata; Giuseppe Tuveri; Simone Secchi; Luigi Raffo; Menno Lindwer
Application Specific Instruction-set Processors (ASIPs) expose to the designer a large number of degrees of freedom. Accurate and rapid simulation tools are needed to explore the design space. To this aim, FPGA-based emulators have recently been proposed as an alternative to pure software cycle-accurate simulator. However, the advantages of on-hardware emulation are reduced by the overhead of the RTL synthesis process that needs to be run for each configuration to be emulated. The work presented in this paper aims at mitigating this overhead, exploiting a form of software-driven platform runtime reconfiguration. We present a complete emulation toolchain that, given a set of candidate ASIP configurations, identifies and builds an overdimensioned architecture capable of being reconfigured via software at runtime, emulating all the design space points under evaluation. The approach has been validated against two different case studies, a filtering kernel and an M-JPEG encoding kernel. Moreover, the presented emulation toolchain couples FPGA emulation with activity-based physical modeling to extract area and power/energy consumption figures. We show how the adoption of the presented toolchain reduces significantly the design space exploration time, while introducing an overhead lower than 10% for the FPGA resources and lower than 0.5% in terms of operating frequency.
parallel, distributed and network-based processing | 2011
Lech Józwiak; Menno Lindwer
The recent spectacular progress in modern Nan electronic technology enabled implementation of very complex multiprocessor systems on single chips (MPSoCs) and created a big stimulus towards development of MPSoCs for embedded applications. The increasingly complex MPSoCs are required to perform real-time computations to extremely tight schedules and to satisfy high demands regarding adaptability, as well as energy, area and cost efficiency. This results in serious design and development challenges. The opportunities created can effectively be exploited only through use of more adequate system architectures and more integrated system IP modules, supported by new effective design methods and electronic design automation tools. This paper focuses on mastering the automatic architecture synthesis and application mapping for heterogeneous massively-parallel MPSoCs based on customizable application-specific instruction-set processors (ASIPs). It is related to a European project ASAM being currently executed in the framework of the ARTEMIS program. It presents the results of our analysis of the main problems that have to be solved and challenges to be faced in design of such heterogeneous customizable MPSoCs for modern demanding applications.
design, automation, and test in europe | 2012
Sebastiano Pomata; Paolo Meloni; Giuseppe Tuveri; Luigi Raffo; Menno Lindwer
Complex Application Specific Instruction-set Processors (ASIPs) expose to the designer a large number of degrees of freedom, posing the need for highly accurate and rapid simulation environments. FPGA-based emulators represent an alternative to software cycle-accurate simulators, preserving maximum accuracy and reasonable simulation times. The work presented in this paper aims at exploiting FPGA emulation within technology aware design space exploration of ASIPs. The potential speedup provided by reconfigurable logic is reduced by the overhead of RTL synthesis/implementation. This overhead can be mitigated by reducing the number of FPGA implementation processes, through the adoption of binary-level translation. Hereby we present a prototyping method that, given a set of candidate ASIP configurations, defines an overdimensioned ASIP architecture, capable of emulating all the design space points under evaluation. This approach is then evaluated with a design space exploration case study. Along with execution time, by coupling FPGA emulation with activity-based physical modeling, we can extract area/power/energy figures.
digital systems design | 2011
Alexandre Solon Nery; Lech Józwiak; Menno Lindwer; Mauro Cocco; Nadia Nedjah; Felipe M. G. França
Effective exploitation of the application-specific parallel patterns and computation operations through their direct implementation in hardware is the base for construction of high-quality application-specific (re-)configurable application specific instruction set processors (ASIPs) and hardware accelerators for modern highly-demanding applications. Although it receives a lot of attention from the researchers and practitioners, a very important problem of hardware reuse in ASIP and accelerator synthesis is clearly underestimated and does not get enough attention in the published research. This paper is an effect of an industry and academic collaborative research. It analyses the problem of hardware sharing, shows its high practical relevance, as well as a big influence of hardware sharing on the major circuit and system parameters, and its importance for the multi-objective optimization and tradeoff exploitation. It also demonstrates that the state-of-the-art synthesis tools do not sufficiently address this problem and gives several guidelines related to enhancement of the hardware reuse.
design, automation, and test in europe | 2013
Menno Lindwer; Mark Ruvald Pedersen
Within todays SoCs, functionality such as video, audio, graphics, and imaging is increasingly integrated through IP blocks, which are subsystems in their own right. Integration of IP blocks within SoCs always brought software integration aspects with it. However, since these subsystems increasingly consist of programmable processors, many more layers of firmware and software need to be integrated. In the imaging domain, this is particularly true. Imaging subsystems typically are highly heterogeneous, with high levels of parallelism. The construction of their firmware requires target-specific optimization, yet needs to take interoperability with sensor input systems and graphics/display subsystems into account. Hard real-time scheduling within the subsystem needs to cooperate with less stringent image analytics and SoC-level (OS) scheduling. In many of todays systems, the latter often only supports soft scheduling deadlines. At HW level, IP subsystems need to be integrated such that they can efficiently exchange both short-latency control signals and high-bandwidth data-plane blocks. Solutions exist, but need to be properly configured. However, at the SW level, currently no support exists that provides (i) efficient programmability, (ii) SW abstraction of all the different HW features of these blocks, and (iii) interoperability of these blocks. Starting points could be languages such as OpenCL and OpenCV, which do provide some abstractions, but are not yet sufficiently versatile.
Microprocessors and Microsystems | 2013
Alexandre Solon Nery; Lech Jówiak; Menno Lindwer; Mauro Cocco; Nadia Nedjah; Felipe M. G. França
Effective exploitation of the application-specific parallel patterns and computation operations through their direct implementation in hardware is the base for construction of high-quality application-specific (re-) configurable application specific instruction set processors (ASIPs) and hardware accelerators for modern highly-demanding applications. Although it receives a lot of attention from the researchers and practitioners, a very important problem of hardware reuse in ASIP and accelerator synthesis is clearly underestimated and does not get enough attention in the published research. This paper is an effect of an industry and academic collaborative research. It analyses the problem of hardware sharing, shows its high practical relevance, as well as a big influence of hardware sharing on the major circuit and system parameters, and its importance for the multi-objective optimization and tradeoff exploitation. It also demonstrates that the state-of-the-art synthesis tools do not sufficiently address this problem and gives several guidelines related to enhancement of the hardware reuse.
digital systems design | 2011
Menno Lindwer
Programmable data-parallel embedded systems are typically associated with tasks such as image processing, video decoding, and software-defined radio. This talk is particularly focused on designs for resource-constrained mobile and consumer devices. Today, heterogeneous multi-core designs are hailed as the solution, and many research teams claim to work on this topic. However, the heterogeneous processing often stays at the level of combining many RISCs with many DSPs or similarly adapted processors, which should actually still be classified as a homogeneous. In order to really compete with hardwired designs, extremely high efficiency is required. In this talk, we will show how the required levels of efficiency are obtained by building systems which consist of limited sets of highly parallel purpose-built processors, and by ensuring that these systems are programmed to efficiently utilize the available compute resources.
embedded software | 2016
Merten Popp; Orlando Moreira; Wim Yedema; Menno Lindwer
Automated hardware design flows considerably speed up the development of embedded systems and are a useful asset during architecture exploration phase. However, any existing software has to be adapted for every new system. In this work we will demonstrate, how a Hardware Abstraction Layer (HAL) for device addresses and properties can be automatically generated from a formal system description while providing sufficient abstraction from hardware details. Comparison to earlier projects show that this saves between 40-50 person-weeks of work per IP. Necessary device specific information is stored using a novel approach that allows the compiler to remove unused data. We will show with a real imaging application that this can reduce the amount of memory used by the HAL from ≈ 9.5% to ≈ 5.1% of the available scratchpad memory. It will also be shown that the overhead of this HAL only depends on the level of abstraction that is used by the application and that performance and memory usage will equal a hard-coded solution in the case that an application uses compile-time constant device identifiers.