Semeen Rehman

Dresden University of Technology

Publication

Featured research published by Semeen Rehman.

design automation conference | 2016

Invited - Cross-layer approximate computing: from logic to architectures

Muhammad Shafique; Rehan Hafiz; Semeen Rehman; Walaa El-Harouni; Jörg Henkel

We present a survey of approximate techniques and discuss concepts for building power-/energy-efficient computing components, ranging from approximate accelerators to arithmetic blocks (like adders and multipliers). We provide a systematic understanding of how to generate and explore the design space of approximate components, which enables a wide range of power/energy, performance, area and output quality tradeoffs, and a high degree of design flexibility to facilitate their design. To enable cross-layer approximate computing, bridging the gap between the logic layer (i.e. arithmetic blocks) and the architecture layer (and even considering the software layers) is crucial. Towards this end, this paper introduces open-source libraries of low-power and high-performance approximate components. The elementary approximate arithmetic blocks (adder and multiplier) are used to develop multi-bit approximate arithmetic blocks and accelerators. An analysis of data-driven resilience and error propagation is discussed. The approximate computing components are a first step towards a systematic approach to introduce approximate computing paradigms at all levels of abstraction.
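To make the idea of an elementary approximate arithmetic block concrete, the following is a minimal sketch of a lower-part-OR style adder, a generic textbook approximation and not necessarily one of the components in the library described above. The function name `approx_add` and the parameters `k` and `width` are assumptions introduced for illustration only.

```python
def approx_add(a: int, b: int, k: int, width: int = 8) -> int:
    """Add two unsigned `width`-bit values, approximating the lowest k bits.

    The low k bits are computed with a bitwise OR (no carry chain), and the
    upper bits use an exact adder. The carry into the upper part is taken
    from the AND of the most significant bits of the two lower parts.
    With k = 0 the adder is exact.
    """
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask
    carry = ((a >> (k - 1)) & (b >> (k - 1)) & 1) if k > 0 else 0
    high = ((a >> k) + (b >> k) + carry) << k
    # Keep width + 1 bits so the carry-out of the exact part is preserved.
    return (high | low) & ((1 << (width + 1)) - 1)


if __name__ == "__main__":
    for k in (0, 2, 4):
        print(k, approx_add(21, 38, k), "exact:", 21 + 38)
```

Raising `k` shortens the carry chain at the cost of a larger possible error, which is the kind of quality versus cost tradeoff the abstract refers to.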

international conference on computer aided design | 2016

Architectural-space exploration of approximate multipliers

Semeen Rehman; Walaa El-Harouni; Muhammad Shafique; Akash Kumar; Jörg Henkel

This paper presents an architectural-space exploration methodology for designing approximate multipliers. Unlike the state of the art, our methodology generates various design points by adapting three key parameters: (1) different types of elementary approximate multiply modules, (2) different types of elementary adder modules for summing the partial products, and (3) selection of bits for approximation in a wide-bit multiplier design. Generation and exploration of such a design space enables a wide range of multipliers with varying approximation levels, each exhibiting distinct area, power, and output quality, and thereby facilitates approximate computing at higher abstraction levels. We synthesized our designs using Synopsys Design Compiler with a TSMC 45nm technology library and verified using ModelSim gate-level simulations. Power and quality evaluations for various designs are performed using PrimeTime and behavioral models, respectively. The selected designs are then deployed in a JPEG application. For reproducibility and to facilitate further research and development at higher abstraction layers, we have released the RTL and behavioral models of these approximate multipliers and adders as an open-source library at https://sourceforge.net/projects/lpaclib/.
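As a rough behavioral illustration of composing a wider multiplier from elementary 2x2 modules, the sketch below builds a 4x4 multiplier and optionally swaps an approximate module into the least significant partial product. The approximate module used here (returning 7 instead of 9 for 3 x 3) is an illustrative choice, and all function names are hypothetical; this is not the released library's code.

```python
def mul2x2_exact(a: int, b: int) -> int:
    return a * b


def mul2x2_approx(a: int, b: int) -> int:
    # Illustrative approximate module: 3 * 3 yields 7 instead of 9,
    # so the result always fits in 3 bits.
    return 7 if (a == 3 and b == 3) else a * b


def mul4x4(a: int, b: int, approx_low: bool = True) -> int:
    """Multiply two 4-bit values from four 2x2 partial products.

    Only the least significant partial product may be approximated,
    which mirrors the idea of selecting which bits to approximate.
    """
    a_lo, a_hi = a & 0b11, a >> 2
    b_lo, b_hi = b & 0b11, b >> 2
    low_module = mul2x2_approx if approx_low else mul2x2_exact
    ll = low_module(a_lo, b_lo)
    lh = mul2x2_exact(a_lo, b_hi)
    hl = mul2x2_exact(a_hi, b_lo)
    hh = mul2x2_exact(a_hi, b_hi)
    # Exact adders sum the shifted partial products here.
    return ll + ((lh + hl) << 2) + (hh << 4)


def mean_abs_error(approx_low: bool) -> float:
    """Average absolute error over all 4-bit operand pairs."""
    total = sum(abs(a * b - mul4x4(a, b, approx_low))
                for a in range(16) for b in range(16))
    return total / 256


if __name__ == "__main__":
    print("mean absolute error:", mean_abs_error(True))
```

Exhaustively evaluating each variant in this way gives one quality number per design point; area and power would come from synthesis, which this sketch does not model.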

design, automation, and test in europe | 2017

Embracing approximate computing for energy-efficient motion estimation in high efficiency video coding

Walaa El-Harouni; Semeen Rehman; Bharath Srinivas Prabakaran; Akash Kumar; Rehan Hafiz; Muhammad Shafique

Approximate Computing is an emerging paradigm for developing highly energy-efficient computing systems. It leverages the inherent resilience of applications to trade output quality for energy efficiency. In this paper, we present a novel approximate architecture for energy-efficient motion estimation (ME) in high efficiency video coding (HEVC). We synthesized our designs for both ASIC and FPGA design flows. ModelSim gate-level simulations are used for functional and timing verification. We comprehensively analyze the impact of heterogeneous approximation modes on the power/energy-quality tradeoffs for various video sequences. To facilitate reproducible results for comparisons and further research and development, the RTL and behavioral models of approximate SAD architectures and constituting approximate modules are made available at https://sourceforge.net/projects/lpaclib/.
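Motion estimation typically ranks candidate blocks by the sum of absolute differences (SAD). The sketch below shows one simple way an approximation mode could be modeled behaviorally: dropping the least significant bits of each pixel before differencing. This truncation scheme, the function names, and the `drop_bits` parameter are assumptions for illustration and do not reproduce the paper's approximate SAD architectures.

```python
def sad(cur, ref, drop_bits: int = 0) -> int:
    """Sum of absolute differences between two equally sized pixel blocks.

    drop_bits > 0 discards that many least significant bits of every pixel
    before subtracting, then rescales the result; drop_bits = 0 is exact.
    """
    total = 0
    for c, r in zip(cur, ref):
        total += abs((c >> drop_bits) - (r >> drop_bits)) << drop_bits
    return total


def best_match(cur, candidates, drop_bits: int = 0) -> int:
    """Index of the candidate block with the smallest (approximate) SAD."""
    return min(range(len(candidates)),
               key=lambda i: sad(cur, candidates[i], drop_bits))


if __name__ == "__main__":
    cur = [10, 20, 30, 40]
    candidates = [[0, 0, 0, 0], [12, 18, 33, 40], [10, 20, 30, 41]]
    for bits in (0, 2):
        print(bits, [sad(cur, c, bits) for c in candidates],
              best_match(cur, candidates, bits))
```

The SAD values change under approximation, but the search only needs the ranking of candidates to be preserved, which is why motion estimation tends to tolerate this kind of error.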

ACM Transactions on Embedded Computing Systems | 2016

Reliability-Aware Adaptations for Shared Last-Level Caches in Multi-Cores

Florian Kriebel; Semeen Rehman; Arun Subramaniyan; Segnon Jean Bruno Ahandagbe; Muhammad Shafique; Jörg Henkel

On account of their large footprint, on-chip last-level caches in multi-core systems are one of the most vulnerable components to soft errors. However, vulnerability to soft errors highly depends on the configuration and parameters of the last-level cache, especially when executing different applications concurrently. In this article we propose a novel reliability-aware reconfigurable last-level cache architecture (R2Cache) and cache vulnerability model for multi-cores. R2Cache supports various reliability-wise efficient cache configurations (i.e., cache parameter selection and cache partitioning) for different concurrently executing applications. The proposed vulnerability model takes into account the vulnerability of both the data and tag arrays as well as the active cache area for applications in different execution phases. To enable runtime adaptations, we introduce a lightweight online vulnerability predictor that exploits the knowledge of performance metrics like number of L2 misses to accurately estimate the cache vulnerability to soft errors. Based on the predicted vulnerabilities of different concurrently executing applications in the current execution epoch, our runtime reliability manager reconfigures the cache such that, for the next execution epoch, the total vulnerability for all concurrently executing applications is minimized under user-provided tolerable performance/energy overheads. In scenarios where single-bit error correction for cache lines may be afforded, vulnerability-aware reconfigurations can be leveraged to increase the reliability of the last-level cache against multi-bit errors. Compared to state-of-the-art vulnerability-minimizing and reconfigurable caches, the proposed architecture provides 35.27% and 23.42% vulnerability savings, respectively, when averaged across numerous experiments, while reducing the vulnerability by more than 65% and 60%, respectively, for selected applications and application phases.
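The epoch-based selection step described in this abstract can be pictured as a small constrained minimization: among the candidate cache configurations, pick the one with the lowest predicted total vulnerability whose overhead stays within the user-provided tolerance. The sketch below assumes the predictions are already available; `CacheConfig`, `select_config`, and all numeric values are hypothetical placeholders, and the vulnerability model and online predictor themselves are not reproduced.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CacheConfig:
    name: str
    predicted_vulnerability: float  # summed over concurrently running apps
    performance_overhead: float     # fraction relative to a baseline, e.g. 0.05


def select_config(configs: Iterable[CacheConfig],
                  max_overhead: float) -> Optional[CacheConfig]:
    """Return the feasible configuration with minimal predicted vulnerability.

    Returns None when no configuration satisfies the overhead bound.
    """
    feasible = [c for c in configs if c.performance_overhead <= max_overhead]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.predicted_vulnerability)


if __name__ == "__main__":
    # Placeholder numbers, not measurements from the article.
    candidates = [
        CacheConfig("baseline", 1.00, 0.00),
        CacheConfig("smaller-partitions", 0.70, 0.04),
        CacheConfig("aggressive", 0.55, 0.12),
    ]
    chosen = select_config(candidates, max_overhead=0.05)
    print(chosen.name if chosen else "no feasible configuration")
```

In a runtime manager this selection would be repeated once per execution epoch with freshly predicted vulnerabilities.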

design automation conference | 2018

Area-optimized low-latency approximate multipliers for FPGA-based hardware accelerators

Salim Ullah; Semeen Rehman; Bharath Srinivas Prabakaran; Florian Kriebel; Muhammad Abdullah Hanif; Muhammad Shafique; Akash Kumar

The architectural differences between ASICs and FPGAs limit the effective performance gains achievable by the application of ASIC-based approximation principles for FPGA-based reconfigurable computing systems. This paper presents a novel approximate multiplier architecture customized for FPGA-based fabrics, an efficient design methodology, and an open-source library. Our designs provide higher area, latency and energy gains along with better output accuracy than those offered by the state-of-the-art ASIC-based approximate multipliers. Moreover, compared to the multiplier IP offered by Xilinx Vivado, our proposed design achieves up to 30%, 53%, and 67% gains in terms of area, latency, and energy, respectively, while incurring an insignificant accuracy loss (below 1% average relative error). Our library of approximate multipliers is open-source and available online at https://cfaed.tudresden.de/pd-downloads to fuel further research and development in this area, and thereby enable a new research direction for the FPGA community.

design, automation, and test in europe | 2017

Soft error-aware architectural exploration for designing reliability adaptive cache hierarchies in multi-cores

Arun Subramaniyan; Semeen Rehman; Muhammad Shafique; Akash Kumar; Jörg Henkel

Mainstream multi-core processors employ large multilevel on-chip caches, making them highly susceptible to soft errors. We demonstrate that designing a reliable cache hierarchy requires understanding the vulnerability interdependencies across different cache levels. This involves vulnerability analyses depending upon the parameters of different cache levels (partition size, line size, etc.) and the corresponding cache access patterns for different applications. This paper presents a novel soft error-aware cache architectural space exploration methodology and vulnerability analysis of multi-level caches considering their vulnerability interdependencies. Our technique significantly reduces exploration time while providing reliability-efficient cache configurations. We also show applicability/benefits for ECC-protected caches under multi-bit fault scenarios.

IEEE Transactions on Computers | 2017

Application-Guided Power-Efficient Fault Tolerance for H.264 Context Adaptive Variable Length Coding

Muhammad Shafique; Semeen Rehman; Florian Kriebel; Muhammad Usman Karim Khan; Bruno Zatt; Arun Subramaniyan; Bruno Boessio Vizzotto; Jörg Henkel

This paper presents a fault-tolerance technique for H.264's Context-Adaptive Variable Length Coding (CAVLC) on unreliable computing hardware. The application-specific knowledge is leveraged at both algorithm and architecture levels to protect the CAVLC process (especially context adaptation and coding tables) in a reliable yet power-efficient manner. Specifically, statistical analyses of coding syntax and video content properties are exploited for: (1) selective redundancy of coefficient/header data of video bitstreams; (2) partitioning the coding tables into various sub-tables to reduce the power overhead of fault tolerance; and (3) run-time power management of memory parts storing the sub-tables and their parity computations. Experimental results demonstrate that leveraging application-specific knowledge reduces area and performance overhead by 2x compared to a double-parity table protection technique. For functional verification and area comparison, the complete H.264 CAVLC architecture is prototyped on a Xilinx Virtex-5 FPGA (though not limited to it).

software and compilers for embedded systems | 2016

Cross-Layer Reliability Modeling and Optimization: Compiler and Run-Time System Interactions

Muhammad Shafique; Semeen Rehman; Florian Kriebel; Jörg Henkel

This paper presents a cross-layer reliability modeling and optimization approach that leverages multiple software layers, such as the compiler and run-time system, to improve the overall reliability on unreliable or partially-reliable hardware. In order to bridge the gap between hardware and software to achieve high efficiency, our technique incorporates the knowledge from hardware layers during reliability modeling and design of optimization techniques. We demonstrate how different software layers operate synergistically to achieve a high degree of reliability.

design automation conference | 2016