Hamed Tabkhi | Researchain

Publication

Featured research published by Hamed Tabkhi.

IEEE Transactions on Very Large Scale Integration Systems | 2014

Application-Guided Power Gating Reducing Register File Static Power

Hamed Tabkhi; Gunar Schirner

Power and energy efficiency are top priorities in embedded computing. In embedded processors fabricated in deep-submicron technology, static power accounts for a large share of overall power consumption. At the same time, current embedded processors often include a large register file (RF) to increase performance. However, a larger RF aggravates the static power issues associated with technology scaling. Approaches that reduce the static power consumption of large RFs are therefore in high demand. In this paper, we introduce an application-guided function-level register file power-gating (AFReP) approach to efficiently manage and reduce the RF's static power consumption. AFReP combines automatic binary analysis and instrumentation at function-level granularity with instruction-set architecture and microarchitecture extensions. AFReP enables runtime power-gating of registers during periods when they are unused, while applications still benefit fully from a large RF during periods when they are used. To demonstrate AFReP's potential for reducing static power consumption, we enhanced a Blackfin processor with AFReP technology. Using AFReP, RF static power is reduced on average by 64% and 39% for control and DSP applications, respectively. At the same time, AFReP incurs only a minimal overhead of 0.4% and 0.6%, respectively.
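
The static-power savings reported above come from keeping fewer registers powered for part of the execution. A minimal sketch of that accounting, assuming static energy is simply proportional to (powered registers × cycles) and ignoring gating-transition energy and the instrumentation overhead; the 32-entry RF and the interval trace are hypothetical numbers, not data from the paper:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    """A stretch of execution, e.g. one function invocation (hypothetical trace)."""
    cycles: int         # duration of the interval in cycles
    powered_regs: int   # registers left powered on during the interval


def rf_static_energy_reduction(intervals: List[Interval], total_regs: int) -> float:
    """Fraction of RF static energy saved versus keeping every register on.

    Assumption: static energy ~ powered registers x cycles; transition
    energy and extra instructions for gating are ignored.
    """
    baseline = sum(iv.cycles * total_regs for iv in intervals)
    gated = sum(iv.cycles * iv.powered_regs for iv in intervals)
    return 1.0 - gated / baseline


trace = [Interval(cycles=1000, powered_regs=8),
         Interval(cycles=3000, powered_regs=24),
         Interval(cycles=1000, powered_regs=4)]
print(f"{rf_static_energy_reduction(trace, total_regs=32):.1%}")  # 47.5%
```

The model makes clear why function-level granularity matters: long-running functions that touch few registers dominate the savings.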

2010 Third International Conference on Dependability | 2010

RMAP: A Reliability-Aware Application Mapping for Network-on-Chips

Ahmad Patooghy; Hamed Tabkhi; Seyed Ghassem Miremadi

This paper proposes a reliability-aware application mapping for mesh-based NoCs. The proposed mapping, called RMAP, adds redundant communications to the application graph in order to improve the reliability of packet delivery in NoCs. RMAP divides the application graph into two sub-graphs with the least possible communication between them. One sub-graph is mapped onto the upper triangular nodes of the NoC and the other onto the lower triangular nodes. As a result, some channels carry a lower traffic load, and these channels are used to route the packets of the redundant communications. This minimizes the overhead that redundant communications impose on the NoC. A cycle-accurate NoC simulator is used to evaluate the reliability and performance of the proposed mapping. RMAP is also compared with previously proposed reliability-improvement methods, e.g., flow-control and flood-based methods. Simulation results show that RMAP improves the reliability of an unprotected NoC by about 20%, with lower performance overhead than the other methods.
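
The partition-then-place idea behind RMAP can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the bisection is an exhaustive minimum-cut search (only feasible for small graphs), diagonal mesh nodes are assigned to the upper half, tasks are placed in sorted order, and the routing of the redundant communications themselves is not modeled. The task names and traffic volumes are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edges = Dict[Tuple[str, str], int]  # (src task, dst task) -> traffic volume
Node = Tuple[int, int]              # (row, col) in the mesh


def min_cut_bisection(tasks: List[str], edges: Edges) -> Tuple[int, Set[str], Set[str]]:
    """Split tasks into two halves with minimum cross-half traffic (exhaustive)."""
    best = None
    for part in combinations(tasks, len(tasks) // 2):
        a = set(part)
        cut = sum(v for (s, d), v in edges.items() if (s in a) != (d in a))
        if best is None or cut < best[0]:
            best = (cut, a, set(tasks) - a)
    return best


def triangular_nodes(n: int) -> Tuple[List[Node], List[Node]]:
    """Upper (row <= col) and lower (row > col) nodes of an n x n mesh.

    Putting the diagonal in the upper half is an assumption of this sketch.
    """
    upper = [(r, c) for r in range(n) for c in range(n) if r <= c]
    lower = [(r, c) for r in range(n) for c in range(n) if r > c]
    return upper, lower


def rmap_sketch(tasks: List[str], edges: Edges, n: int) -> Tuple[int, Dict[str, Node]]:
    """Map one half of the task graph to each triangle of the mesh."""
    cut, part_a, part_b = min_cut_bisection(tasks, edges)
    upper, lower = triangular_nodes(n)
    if len(part_a) > len(upper) or len(part_b) > len(lower):
        raise ValueError("mesh too small for this application graph")
    mapping = dict(zip(sorted(part_a), upper))
    mapping.update(zip(sorted(part_b), lower))
    return cut, mapping


tasks = ["t0", "t1", "t2", "t3"]
edges = {("t0", "t1"): 10, ("t2", "t3"): 10, ("t1", "t2"): 1, ("t3", "t0"): 1}
print(rmap_sketch(tasks, edges, n=3))
```

Keeping heavily communicating tasks in the same triangle leaves the channels between the two triangles lightly loaded, which is where the redundant packets can travel cheaply.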

Asilomar Conference on Signals, Systems and Computers | 2013

Flexible function-level acceleration of embedded vision applications using the Pipelined Vision Processor

Robert Bushey; Hamed Tabkhi; Gunar Schirner

The rapidly growing embedded vision market is driving ever more demanding MPSoC requirements for high computational performance at low power. Meeting these requirements calls for innovative solutions that deliver high-performance pixel processing at low energy per pixel. Such solutions must combine the power efficiency of ASIC-style IP with the flexibility and software ecosystem of Instruction-Level Processors. This paper introduces the Pipelined Vision Processor (PVP) of Analog Devices' BF609 as a state-of-the-art industrial solution that achieves both efficiency and flexibility. The PVP incorporates more than 10 function-level blocks, enabling dozens of programmable functions that can be allocated to implement many algorithms and applications. In addition, the pipelined connectivity is programmable, enabling many temporal permutations of functions. Overall, the PVP delivers more than 25 billion operations per second (GOPs) while requiring very little memory bandwidth. These capabilities enable the PVP to execute multiple concurrent ADAS, industrial, or general vision applications. This paper focuses on the key architectural concepts of the PVP, from the construction of individual function blocks to the allocation and chaining of function blocks to build function-based application implementations. The paper also addresses the benefits and challenges of architecting and programming at function-level granularity and abstraction.
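
Allocating and chaining function blocks can be pictured as composing configurable stages. The sketch below is a software analogy only, assuming a hypothetical three-entry block library (`scale`, `clip`, `threshold`) operating on one row of grayscale pixels; it does not reflect the actual PVP blocks or their programming interface.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Block = Callable[[List[int]], List[int]]

# Hypothetical function-block library: each factory returns a configured block.
BLOCK_LIBRARY: Dict[str, Callable[[int], Block]] = {
    "scale": lambda k: (lambda px: [p * k for p in px]),
    "clip": lambda hi: (lambda px: [min(p, hi) for p in px]),
    "threshold": lambda t: (lambda px: [1 if p >= t else 0 for p in px]),
}


def build_pipeline(config: Sequence[Tuple[str, int]]) -> Block:
    """Allocate the blocks named in config and chain them in that order."""
    stages = [BLOCK_LIBRARY[name](param) for name, param in config]

    def run(pixels: List[int]) -> List[int]:
        for stage in stages:
            pixels = stage(pixels)  # output of one block feeds the next
        return pixels

    return run


row = [10, 50, 200, 90]
detect = build_pipeline([("scale", 2), ("clip", 255), ("threshold", 128)])
print(detect(row))  # [0, 0, 1, 1]
```

Reprogramming means changing only the configuration list; the same blocks can be re-chained for a different application without new hardware.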

International Conference on Computer-Aided Design | 2012

AFReP: application-guided function-level registerfile power-gating for embedded processors

Hamed Tabkhi; Gunar Schirner

With shrinking CMOS feature sizes, static power is growing significantly and power density has become an increasing concern. At the same time, embedded processors are trending toward larger Register Files (RFs), further increasing static power dissipation and aggravating the issue. This paper introduces Application-guided Function-level Register file Power-gating (AFReP), which reduces the static power of RFs in embedded processors. Our AFReP approach is based on an automatic analysis of register lifetimes in the application binary, followed by automatic binary instrumentation for runtime RF power-gating. The instrumented code executes on a processor with ISA and micro-architecture extensions for power-gating control over individual registers. Our application binary analysis and instrumentation operate at function-level granularity, automatically gating the registers that do not contribute to the program outcome. Our experimental results using an AFReP-enhanced Blackfin processor demonstrate an average RF static power reduction of 60% and 52% for control and DSP applications from the MiBench and DSPstone suites, respectively. The added instructions for run-time power-gating increase execution time by only 1% on average.
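
A greatly simplified version of the per-function decision can be sketched as below. It assumes a register may be gated while a function runs if the function never reads or writes it and it holds no caller value still needed after the call; a real binary analysis must also handle indirect calls, calling conventions, and interrupts. The `Instr` type and register names R0 to R7 are hypothetical.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str
    dst: Tuple[str, ...]  # registers written
    src: Tuple[str, ...]  # registers read


def registers_used(body: Iterable[Instr]) -> Set[str]:
    """All registers a function reads or writes."""
    used: Set[str] = set()
    for ins in body:
        used.update(ins.dst)
        used.update(ins.src)
    return used


def gateable_registers(all_regs: Iterable[str], body: Iterable[Instr],
                       live_across_call: Iterable[str]) -> Set[str]:
    """Registers that can be powered off while this function runs (simplified)."""
    return set(all_regs) - registers_used(body) - set(live_across_call)


REGS = [f"R{i}" for i in range(8)]
func = [Instr("add", ("R0",), ("R1", "R2")),
        Instr("mul", ("R3",), ("R0", "R0"))]
print(sorted(gateable_registers(REGS, func, live_across_call={"R5"})))
# ['R4', 'R6', 'R7']
```

The instrumentation step would then insert gate-off instructions for this set at function entry and restore power before the registers are needed again.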

IEEE Embedded Systems Letters | 2014

Function-Level Processor (FLP): A High Performance, Minimal Bandwidth, Low Power Architecture for Market-Oriented MPSoCs

Hamed Tabkhi; Robert Bushey; Gunar Schirner

This letter introduces function-level processors (FLPs) to fill the flexibility/efficiency gap between instruction-level processors (ILPs) and hardware accelerators (HWACCs). Compared to an ILP, an FLP offers coarser-grained programmability at the function level; it is constructed from configurable function blocks (FBs) that implement market-oriented functions. FBs are connected via a MUX-based programmable interconnect, tuned for envisioned application flows, to realize flexible macro pipelines. We demonstrate the benefits of FLPs with an industry example, the pipeline-vision processor (PVP). Mapping six embedded vision applications, the PVP offers up to 22.4 GOPs/s at an average power of 120 mW, consuming 17x and 6x less power than the compared ILP and ILP+HWACC approaches, respectively.
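
Because the interconnect is MUX-based and tuned for envisioned flows, not every chain of FBs is realizable. A minimal sketch of that constraint check, assuming a hypothetical MUX table listing which sources each FB's input MUX can select and assuming each FB instance can occupy only one pipeline position:

```python
from typing import Dict, List, Set

# Hypothetical MUX table: for each FB, the sources its input MUX can select.
MUX_INPUTS: Dict[str, Set[str]] = {
    "convolution": {"input"},
    "threshold": {"input", "convolution"},
    "histogram": {"convolution", "threshold"},
    "output": {"threshold", "histogram"},
}


def chain_is_realizable(chain: List[str], mux_inputs: Dict[str, Set[str]]) -> bool:
    """True if every hop is a selectable MUX input and no FB is used twice."""
    fbs = chain[1:]  # chain[0] is the external input stream
    if len(set(fbs)) != len(fbs):
        return False  # one FB instance cannot sit in two pipeline positions
    return all(src in mux_inputs.get(dst, set())
               for src, dst in zip(chain, chain[1:]))


print(chain_is_realizable(["input", "convolution", "threshold", "output"], MUX_INPUTS))  # True
print(chain_is_realizable(["input", "histogram", "output"], MUX_INPUTS))  # False
```

Restricting each MUX to a few plausible sources keeps the interconnect cheap while still covering the targeted application flows.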

Field-Programmable Custom Computing Machines | 2013

A Power-Efficient FPGA-Based Mixture-of-Gaussian (MoG) Background Subtraction for Full-HD Resolution

Hamed Tabkhi; Majid Sabbagh; Gunar Schirner

International Conference on Microelectronics | 2008

A cost-effective error detection and roll-back recovery technique for embedded microprocessor control logic

Hassan Ghasemzadeh-Mohammadi; Hamed Tabkhi; Seyed Ghassem Miremadi; Alireza Ejlali

The increasing rate of transient faults necessitates the use of on-chip fault-tolerant techniques in embedded microprocessors. Performance overhead is a challenging problem for on-chip fault-tolerant techniques applied to the random logic of embedded microprocessors. This paper presents a signature-based error detection and roll-back recovery technique for the control logic with much lower performance overhead than many previous techniques. The low performance overhead is achieved by eliminating the fault-masking overhead cycles of previous techniques. The performance overhead is studied analytically, and the analysis indicates the fault rates at which the technique is preferable. To measure the cycle time of the pipeline critical path and the area overhead, the technique was implemented and synthesized using a behavioral VHDL model of the Leon2 processor. The synthesis results show that the area and cycle-time overheads of the technique are only 17.7% and 3.4%, respectively. In addition, injecting about 74,000 transient single bit-flip faults into the control logic of the Leon2 processor shows that the technique detects about 99% of the injected faults.
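
The general detect-then-retry pattern can be illustrated with a toy model. The signature scheme here (an XOR fold of the control word into 8 bits), the control words, and the `execute`/`faulty_decode` helpers are all hypothetical and are not the technique from the paper; rollback is modeled as discarding an uncommitted result and re-executing the step.

```python
from typing import Callable, Dict, List, Tuple


def xor_signature(word: int, width: int = 8) -> int:
    """Fold a control word into a width-bit signature by XOR (assumed scheme)."""
    mask = (1 << width) - 1
    sig = 0
    while word:
        sig ^= word & mask
        word >>= width
    return sig


def execute(ops: List[str], decode: Callable[[str, int], int],
            golden: Dict[str, int], max_retries: int = 3) -> Tuple[List[int], int]:
    """Check each step's control signature; roll back and retry on mismatch."""
    committed: List[int] = []
    rollbacks = 0
    for op in ops:
        for attempt in range(max_retries + 1):
            word = decode(op, attempt)
            if xor_signature(word) == golden[op]:
                committed.append(word)  # signature matches: commit the result
                break
            rollbacks += 1  # mismatch: discard the uncommitted result, re-execute
        else:
            raise RuntimeError(f"persistent fault while executing {op!r}")
    return committed, rollbacks


CONTROL = {"add": 0x1203, "load": 0x0410}  # hypothetical fault-free control words
GOLDEN = {op: xor_signature(w) for op, w in CONTROL.items()}


def faulty_decode(op: str, attempt: int) -> int:
    word = CONTROL[op]
    if op == "load" and attempt == 0:
        word ^= 0b100  # inject a transient single-bit flip on the first try
    return word


print(execute(["add", "load"], faulty_decode, GOLDEN))  # ([4611, 1040], 1)
```

An XOR fold maps every control bit to exactly one signature bit, so any single bit flip changes the signature; some multi-bit faults can alias, which is one reason real schemes report coverage below 100%.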

Brazilian Conference on Intelligent Systems | 2014

A GPU-Based Algorithm-Specific Optimization for High-Performance Background Subtraction

Chulian Zhang; Hamed Tabkhi; Gunar Schirner

Background subtraction is an essential first stage in many vision applications, differentiating foreground pixels from the background scene, and Mixture of Gaussians (MoG) is a widely used implementation choice. MoG's high computational demand makes a real-time single-threaded realization infeasible. Given its pixel-level parallelism, deploying MoG on parallel architectures such as a Graphics Processing Unit (GPU) is promising. However, MoG poses several challenges: significant control flow (potentially reducing GPU efficiency) and a high memory bandwidth demand. In this paper, we propose a GPU implementation of MoG that exceeds real-time processing for full HD (1080p at 60 Hz). This paper describes step-wise optimizations, starting from general GPU optimizations (such as memory coalescing and overlapping computation with communication), via algorithm-specific optimizations including control-flow reduction and register-usage optimization, to windowed optimization utilizing shared memory. For each optimization, this paper evaluates the performance potential and identifies architectural bottlenecks. Our CUDA-based implementation improves performance over a sequential implementation by 57×, 97×, and 101× through general, algorithm-specific, and windowed optimizations, respectively, without affecting output quality.
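
For reference, the per-pixel work that a GPU thread would perform can be sketched with the classic Stauffer–Grimson MoG update for one grayscale pixel. This sequential Python version is an assumption-laden illustration, not the paper's CUDA kernel: the learning rate, 2.5σ match test, background threshold, and initial variance/weight are common textbook defaults, ρ is simplified to α, and classification uses the background set computed before the update.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian:
    weight: float
    mean: float
    var: float


def mog_update(model: List[Gaussian], x: float, alpha: float = 0.01,
               match_sigmas: float = 2.5, bg_threshold: float = 0.7,
               init_var: float = 900.0, init_weight: float = 0.05) -> bool:
    """Update one pixel's mixture with value x; return True if x is foreground."""
    # Rank components by weight / std-dev, most background-like first.
    model.sort(key=lambda g: g.weight / math.sqrt(g.var), reverse=True)
    # Background set: smallest prefix whose cumulative weight exceeds bg_threshold.
    cum, n_bg = 0.0, 0
    for g in model:
        cum += g.weight
        n_bg += 1
        if cum > bg_threshold:
            break
    match = next((i for i, g in enumerate(model)
                  if abs(x - g.mean) <= match_sigmas * math.sqrt(g.var)), None)
    if match is None:
        # No match: replace the least probable component with a new, wide one.
        model[-1] = Gaussian(init_weight, float(x), init_var)
        foreground = True
    else:
        for i, g in enumerate(model):
            g.weight = (1 - alpha) * g.weight + (alpha if i == match else 0.0)
        g = model[match]
        rho = alpha  # common simplification of alpha * N(x | mean, var)
        g.mean = (1 - rho) * g.mean + rho * x
        g.var = (1 - rho) * g.var + rho * (x - g.mean) ** 2
        foreground = match >= n_bg
    total = sum(g.weight for g in model)
    for g in model:
        g.weight /= total  # keep the mixture weights normalized
    return foreground


model = [Gaussian(0.9, 100.0, 25.0), Gaussian(0.1, 200.0, 400.0)]
print(mog_update(model, 102))  # False: matches the dominant background component
print(mog_update(model, 30))   # True: no component matches
```

The data-dependent branches (match search, replace vs. update) are exactly the kind of control flow the abstract identifies as a source of GPU divergence, and the per-pixel mixture parameters are what drive the memory bandwidth demand.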

Application-Specific Systems, Architectures, and Processors | 2014

Function-Level Processor (FLP): Raising efficiency by operating at function granularity for market-oriented MPSoC

Hamed Tabkhi; Robert Bushey; Gunar Schirner

The exponential growth in computational demand drives chip vendors toward heterogeneous architectures that combine Instruction-Level Processors (ILPs) and custom HW Accelerators (HWACCs) in an attempt to provide the needed processing capabilities while meeting power and energy requirements. ILPs, on one hand, are highly flexible but power-inefficient. Custom HWACCs, on the other hand, are inflexible (focusing on dedicated kernels) but highly power-efficient. Since designing HWACCs for every application is cost-prohibitive, large portions of applications still run inefficiently on ILPs. New processing architectures are needed that combine the power efficiency of HWACCs with sufficient flexibility to realize applications across targeted market segments. This paper introduces Function-Level Processors (FLPs) to fill the gap between ILPs and dedicated HWACCs. FLPs consist of configurable Function Blocks (FBs) implementing selected functions, which are interconnected via programmable point-to-point connections to construct an extensible, configurable macro data-path. An FLP raises the programming abstraction to a Function-Set Architecture (FSA) that controls FB allocation, configuration, and scheduling. We demonstrate the benefits of FLPs with an industry example, the Pipeline-Vision Processor (PVP). We highlight the gained flexibility by mapping 10 embedded vision applications entirely to the FLP-PVP, which offers up to 22.4 GOPs/s at an average power of 120 mW. The results also demonstrate that our FLP-PVP solution consumes 14× to 18× less power than an ILP and 5× less power than a hybrid ILP+HWACC solution.
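
One responsibility of the Function-Set Architecture is allocating FB instances to the applications running concurrently. A minimal greedy sketch, assuming a hypothetical FLP with a fixed count of each FB type and applications described as ordered chains of FB types; configuration and scheduling are not modeled.

```python
from collections import Counter
from typing import Dict, List, Tuple


def allocate(available: Dict[str, int],
             apps: Dict[str, List[str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Greedily assign FB instances to application flows.

    available: FB type -> number of instances (hypothetical FLP)
    apps: application name -> ordered chain of FB types it needs
    Returns app -> [(fb_type, instance_index), ...]; raises if over-subscribed.
    """
    next_free: Counter = Counter()
    plan: Dict[str, List[Tuple[str, int]]] = {}
    for app, chain in apps.items():
        assigned = []
        for fb in chain:
            idx = next_free[fb]
            if idx >= available.get(fb, 0):
                raise ValueError(f"{app}: no free {fb} block")
            next_free[fb] += 1
            assigned.append((fb, idx))
        plan[app] = assigned
    return plan


fbs = {"conv": 4, "thresh": 2, "hist": 1}
apps = {"lane_detect": ["conv", "conv", "thresh"],
        "motion": ["conv", "thresh", "hist"]}
print(allocate(fbs, apps))
```

Because FB instances are a fixed, shared resource, an allocation failure signals that the requested application mix exceeds what the FLP can run concurrently.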

Design, Automation, and Test in Europe | 2012

Application-specific power-efficient approach for reducing register file vulnerability

Hamed Tabkhi; Gunar Schirner

This paper introduces a power-efficient approach for improving the reliability of heterogeneous register files in embedded processors. The approach is based on the observation that control applications have high reliability demands, while many special-purpose registers are unused for a considerable portion of execution. The paper proposes a static application binary analysis that is applied at function-level granularity and offers a systematic way to manage RF protection by mirroring the content of used registers into unused ones. Simulation results on an enhanced Blackfin processor demonstrate that the Register File Vulnerability Factor (RFVF) is reduced from 35% to 6.9% at a cost of 1% performance loss on average for control applications from the MiBench suite.
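
The mirroring idea can be sketched as pairing each register a function uses with one it leaves unused and comparing the two copies on every read. This is a simplified illustration under assumptions: pairing is in sorted order, registers without a free partner stay unprotected, a mismatch is only detected (not corrected), and the register names and `MirroredRegisterFile` class are hypothetical.

```python
from typing import Dict, Iterable, Optional


def mirror_plan(all_regs: Iterable[str], used_regs: Iterable[str]) -> Dict[str, str]:
    """Pair each used register with an unused one that holds its shadow copy."""
    used = sorted(set(used_regs))
    free = sorted(set(all_regs) - set(used))
    return dict(zip(used, free))  # extra used registers stay unprotected


class MirroredRegisterFile:
    def __init__(self, plan: Dict[str, str]) -> None:
        self.plan = plan
        self.values: Dict[str, int] = {}

    def write(self, reg: str, value: int) -> None:
        self.values[reg] = value
        mirror: Optional[str] = self.plan.get(reg)
        if mirror is not None:
            self.values[mirror] = value  # keep the shadow copy in sync

    def read(self, reg: str) -> int:
        value = self.values[reg]
        mirror = self.plan.get(reg)
        if mirror is not None and self.values[mirror] != value:
            raise RuntimeError(f"corruption detected in {reg}")
        return value


REGS = [f"R{i}" for i in range(8)]
rf = MirroredRegisterFile(mirror_plan(REGS, {"R0", "R1"}))
rf.write("R0", 5)
print(rf.read("R0"))  # 5
```

Since the analysis runs per function, the set of free registers, and therefore the degree of protection, can change at every function boundary.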
