Ahmed Shalaby

Egypt-Japan University of Science and Technology

Publications

Featured research published by Ahmed Shalaby.

Computers & Mathematics with Applications | 2012

Flexible router architecture for network-on-chip

Mostafa S. Sayed; Ahmed Shalaby; Mohamed El-Sayed; Victor Goulart

The growing complexity of systems-on-chip (SoCs) has led researchers to propose replacing the bus architecture with Networks-on-Chip (NoCs), whose key advantages are performance and scalability. NoCs are now a well-established research topic, and several implementations have been proposed. Some techniques aim to improve NoC performance in terms of latency and throughput, while others aim to improve area utilization and power consumption. An important research question in NoC design is the tradeoff between area/power and performance. To improve performance, some techniques increase the number of buffers; however, this increases area and power consumption. This paper introduces a new router architecture, called the Flexible Router, which improves overall network performance by using the same amount of available buffering more efficiently. There is therefore no need to enlarge the buffers or to add virtual channels (VCs), which incur high power consumption, area overhead, and complex logic. The Flexible Router handles requests to a busy buffer by using other buffers in the router. It achieves a higher saturation rate under Hotspot, Uniform, and Nearest-Neighbor traffic patterns, with the largest gain, an 11.4% increase, for Hotspot traffic. The area overhead relative to the Base router and the side effect of packets arriving out of order are also analyzed.
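The buffer-redirection idea can be illustrated with a minimal Python sketch. It is a toy software model, not the paper's hardware: the class names InputBuffer and FlexibleRouterModel, the treatment of "busy" as "full", and the first-fit choice of an alternative buffer are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InputBuffer:
    """Hypothetical fixed-capacity FIFO standing in for one router input buffer."""
    capacity: int
    packets: deque = field(default_factory=deque)

    def is_full(self) -> bool:
        return len(self.packets) >= self.capacity


class FlexibleRouterModel:
    """Toy model of the Flexible Router idea (assumed policy, not the paper's RTL):
    a packet whose requested buffer is busy (modelled here as full) is stored in
    the first other buffer that still has room."""

    def __init__(self, num_buffers: int, capacity: int) -> None:
        self.buffers = [InputBuffer(capacity) for _ in range(num_buffers)]

    def accept(self, packet: object, requested: int) -> Optional[int]:
        """Store the packet; return the index of the buffer used, or None if all are full."""
        if not self.buffers[requested].is_full():
            self.buffers[requested].packets.append(packet)
            return requested
        # Requested buffer is busy: fall back to any other buffer with free space (first fit).
        for i, buf in enumerate(self.buffers):
            if i != requested and not buf.is_full():
                buf.packets.append(packet)
                return i
        return None  # no free buffer anywhere: the packet must wait, as in a base router
```

With three single-slot buffers, repeated requests for buffer 0 spill into buffers 1 and 2 instead of being refused, and only the fourth request is rejected. Because packets of one flow can end up in different buffers, they may later be forwarded in a different order, which is one plausible way the out-of-order arrivals mentioned above arise.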

Asia Pacific Conference on Circuits and Systems | 2014

A highly parallel SAD architecture for motion estimation in HEVC encoder

Ahmed Medhat; Ahmed Shalaby; Mohammed S. Sayed; Maha Elsabrouty; Farhad Mehdipour

The high computational cost of the motion estimation module in the new HEVC standard raises the need for efficient hardware architectures that can meet real-time processing constraints. In addition, targeting HD and UHD resolutions pushes the motion estimation processing cost beyond the capabilities of existing architectures. This paper presents a highly parallel sum of absolute differences (SAD) architecture for motion estimation in the HEVC encoder. The proposed architecture has 64 PUs operating in parallel to calculate the SAD values of the prediction blocks, and it processes block sizes from 4×4 up to 64×64. The architecture has been prototyped, simulated, and synthesized on a Xilinx Virtex-7 XC7VX550T FPGA. At a 458 MHz clock frequency, it processes 2K-resolution video at 30 fps with a ±20-pixel search range, using 7% of the LUTs and 5% of the slice registers of the XC7VX550T.
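For reference, the SAD metric that each parallel unit evaluates can be written as a short Python function. This is a plain sequential sketch; the function name sad and the row-major list-of-lists frame layout are assumptions, and the hardware computes many such sums in parallel rather than in a loop.

```python
def sad(cur, ref, x, y, dx, dy, size):
    """Sum of absolute differences between the size x size block of cur whose
    top-left corner is (x, y) and the block of ref displaced by (dx, dy).
    Frames are lists of rows (row-major); the caller keeps indices in range."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[y + r][x + c] - ref[y + dy + r][x + dx + c])
    return total
```

For example, comparing the 2×2 block [[1, 2], [3, 4]] against an all-ones reference at zero displacement gives 0 + 1 + 2 + 3 = 6.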

IET Image Processing | 2016

Adaptive low-complexity motion estimation algorithm for high efficiency video coding encoder

Ahmed Medhat; Ahmed Shalaby; Mohammed S. Sayed; Maha Elsabrouty; Farhad Mehdipour

High-quality video has become an essential requirement in recent applications. The High Efficiency Video Coding (HEVC) standard provides an efficient solution for high-quality video at lower bit rates, but at a much higher computational cost. In particular, motion estimation (ME) consumes the largest share of computation in HEVC. Therefore, fast ME algorithms and hardware accelerators have been proposed to speed up integer ME in HEVC. This study presents a fast centre search algorithm (FCSA) and an adaptive search window algorithm (ASWA) for integer-pixel ME in HEVC. In addition, a centre adaptive search algorithm, which combines FCSA and ASWA, is proposed to achieve the best performance. Experimental results show a notable speed-up in encoding time and bit rate savings with tolerable peak signal-to-noise ratio (PSNR) degradation. The proposed fast search algorithms reduce the computational complexity of the HEVC encoder by 57%. This improvement is accompanied by a modest average PSNR loss of 0.014 dB and a 0.6385% increase in bit rate when compared with related works.

International Conference on Electronics, Circuits, and Systems | 2014

Fast center search algorithm with hardware implementation for motion estimation in HEVC encoder

Ahmed Medhat; Ahmed Shalaby; Mohammed S. Sayed; Maha Elsabrouty; Farhad Mehdipour

This paper presents a Fast Center Search Algorithm (FCSA) for integer motion estimation in High Efficiency Video Coding (HEVC), together with its hardware implementation. FCSA achieves an average time saving of up to 40% for HD video sequences relative to full search, with insignificant loss in PSNR and bit rate. The proposed hardware implementation meets the requirement of 4K video at 30 frames per second with a ±16 search window at 550 MHz. The prototyped architecture uses 8% of the LUTs and 4% of the slice registers of a Xilinx Virtex-6 XC6VLX550T FPGA.

Visual Communications and Image Processing | 2015

Fast parameter estimation algorithm for sample adaptive offset in HEVC encoder

Sayed El Gendy; Ahmed Shalaby; Mohammed S. Sayed

HEVC has adopted Sample Adaptive Offset (SAO) as a new in-loop filtering block. SAO can significantly improve coding efficiency; however, it requires exhaustive operations to calculate the best SAO parameters for each Coding Tree Unit (CTU). SAO parameter estimation takes around 93% of the overall SAO execution time, and its cost depends on the number of candidate SAO modes. Real-time and low-power video encoders therefore need more efficient SAO encoding algorithms. In this paper, we propose an algorithm that reduces SAO parameter estimation complexity by adaptively reusing the dominant mode of a corresponding set of CTUs. The proposed algorithm saves up to 75% of SAO parameter estimation time with only a 0.8% BD-rate penalty.
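To show what SAO parameter estimation has to compute for a single candidate mode, the sketch below classifies reconstructed samples into the HEVC edge-offset (EO) categories for the horizontal (0°) class and accumulates, per category, the sum of original-minus-reconstructed differences and the sample count. The category rule follows the HEVC edge-offset definition; the function names and the one-row interface are assumptions, and the paper's mode-reuse strategy itself is not modelled.

```python
def sign(v: int) -> int:
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


# HEVC computes edgeIdx = 2 + sign(c - a) + sign(c - b) and remaps 0, 1, 2 -> 1, 2, 0.
# Resulting categories: 1 = local minimum, 2 = below one neighbour and equal to the other,
# 0 = none, 3 = above one neighbour and equal to the other, 4 = local maximum.
_EDGE_IDX_TO_CATEGORY = {0: 1, 1: 2, 2: 0, 3: 3, 4: 4}


def eo_category(a: int, c: int, b: int) -> int:
    """Edge-offset category of sample c given its two neighbours a and b along the EO direction."""
    return _EDGE_IDX_TO_CATEGORY[2 + sign(c - a) + sign(c - b)]


def eo_stats_horizontal(rec_row, org_row):
    """Accumulate per-category (sum of org - rec, sample count) for the horizontal
    (0 degree) EO class over one row. The first and last samples lack a neighbour
    and are skipped in this simplified sketch."""
    sums = [0] * 5
    counts = [0] * 5
    for i in range(1, len(rec_row) - 1):
        cat = eo_category(rec_row[i - 1], rec_row[i], rec_row[i + 1])
        sums[cat] += org_row[i] - rec_row[i]
        counts[cat] += 1
    return sums, counts
```

An encoder would derive a candidate offset per category from these statistics (for example a rounded mean, respecting HEVC's sign convention of non-negative offsets for categories 1 and 2 and non-positive offsets for categories 3 and 4), then repeat the process for the other three EO classes and for band offset. That repetition over candidate modes is presumably the kind of cost that reusing a dominant mode is meant to cut.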

International Midwest Symposium on Circuits and Systems | 2015

High-throughput hardware implementation for motion estimation in HEVC encoder

Ahmed Medhat; Ahmed Shalaby; Mohammed S. Sayed

This paper presents a highly parallel motion estimation architecture for the High Efficiency Video Coding (HEVC) encoder. The proposed architecture has 16 processing units operating in parallel to calculate the sum of absolute differences (SAD) values of all possible variable prediction block sizes. It also calculates the bit cost of every partition in order to find the best-matching candidate in terms of bit cost. The proposed unit processes block sizes from 4×4 up to 64×64. The architecture was prototyped, simulated, and synthesized using TSMC 65 nm CMOS technology. At a 720 MHz clock frequency, it processes 2K (1920×1080) video at 30 fps with a ±27-pixel (55×55) search range using the full search algorithm. Moreover, the architecture is flexible and can be used with different search algorithms to process higher resolutions, such as 4K (3840×2160) at 30 fps. To the best of our knowledge, it is one of the first ASIC motion estimation architectures for HEVC in the literature.
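The full-search strategy with variable block sizes can be sketched as follows: for each of the (2R + 1)² candidate displacements (55 × 55 = 3025 for R = 27), four 4×4 SADs are computed and summed into the 8×8 SAD, so the smaller partitions' SADs come at no extra cost. This is a simplified Python illustration, not the paper's architecture: it handles only one 8×8 block, omits the motion-vector bit-cost term the paper includes, and the function names are hypothetical.

```python
def block_sad(cur, ref, cx, cy, rx, ry, size):
    """SAD between the size x size block at (cx, cy) in cur and the one at (rx, ry) in ref."""
    return sum(abs(cur[cy + r][cx + c] - ref[ry + r][rx + c])
               for r in range(size) for c in range(size))


def full_search_8x8(cur, ref, x, y, search_range):
    """Exhaustive integer-pel search for the 8x8 block at (x, y) over +/-search_range.

    For each candidate the four 4x4 sub-block SADs are computed once and summed
    into the 8x8 SAD, so one pass serves two partition sizes. The motion-vector
    bit-cost term is deliberately omitted in this sketch.
    Returns (best_sad, (dx, dy), [four 4x4 SADs at the best candidate]).
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + 8 > width or ry + 8 > height:
                continue  # candidate block would fall outside the reference frame
            subs = [block_sad(cur, ref, x + ox, y + oy, rx + ox, ry + oy, 4)
                    for oy in (0, 4) for ox in (0, 4)]
            total = sum(subs)
            if best is None or total < best[0]:
                best = (total, (dx, dy), subs)  # ties keep the first candidate in scan order
    return best
```

Extending the same summation upward (four 8×8 SADs into a 16×16 SAD, and so on up to 64×64) is the usual way a single set of 4×4 SADs can feed every partition size.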

2012 Japan-Egypt Conference on Electronics, Communications and Computers | 2012

Congestion mitigation using flexible router architecture for Network-on-Chip

Mostafa S. Sayed; Ahmed Shalaby; M. El-Sayed Ragab; Victor Goulart

An important topic in Network-on-Chip (NoC) design is the tradeoff between area and performance. Some techniques increase the number of buffers to improve performance; however, this increases chip area and therefore power consumption. In this paper we introduce a new flexible router architecture that improves overall network performance by using the same amount of available buffering more efficiently. There is therefore no need to enlarge the buffers or to use extra virtual channels (VCs), which incur high power and area overheads or complex logic. If a packet requests a busy buffer, the router stores the incoming packet in any other suitable free buffer in the router. The Flexible Router increases the saturation rate under Hotspot, Uniform, and Nearest-Neighbor traffic, with the largest gain, 11.4%, for Hotspot traffic. The area overhead relative to a standard Base router and the side effect of packets arriving out of order are also analyzed.

Intelligent Decision Technologies | 2016

A narrative of UVM testbench environment for interconnection routers: A practical approach

Ahmed El-Naggar; Essraa Massoud; Ahmed Medhat; Hala Ibrahim; Bassma Al-Abassy; Sameh El-Ashry; Mostafa Khamis; Ahmed Shalaby

Compared with conventional bus-based interconnections, Network on Chip (NoC) has become a more promising interconnection platform for solving complex on-chip communication problems, owing to its scalability, reusability, and efficiency. Moreover, providing a suitable test environment to inspect and verify the functionality of any IP core is a mandatory stage. Universal Verification Methodology (UVM) was introduced as a standardized, reusable methodology for verifying integrated circuit designs. In this paper, we present the architecture of a complete UVM environment for testing generic routers through various test cases that apply different scenarios. We also aim to establish a base on which other researchers can build in pursuit of better solutions.

IEEE Computer Society Annual Symposium on VLSI | 2016

Low Cost VLSI Architecture for Sample Adaptive Offset Encoder in HEVC

Sayed El Gendy; Ahmed Shalaby; Mohammed S. Sayed

Sample Adaptive Offset (SAO) has been adopted as a new in-loop filtering block in High Efficiency Video Coding (HEVC). It can significantly increase compression efficiency, by up to 23% for sequences that contain computer-graphics content. Obtaining the optimum SAO parameters requires exhaustive operations because of the large number of samples the encoder has to examine. In this work, a low-cost, high-throughput VLSI implementation of the parameter estimation (encoding) phase is proposed. The proposed architecture reduces gate count by 47% compared with prior work. The design is prototyped in 65 nm CMOS technology. It has 89.3 Kgates, 8832 bits of SRAM, and a maximum clock frequency of 426 MHz, and it can support real-time 8K×4K@120fps video at 378 MHz.
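A back-of-envelope check of the 8K×4K@120fps figure divides the required sample rate by the clock frequency. The sketch assumes that 8K×4K means 8192×4096 and that video uses 4:2:0 chroma sampling (1.5 samples per pixel); both are assumptions, since the abstract does not state them, and the helper name samples_per_cycle is hypothetical.

```python
def samples_per_cycle(width, height, fps, clock_hz, samples_per_pixel=1.5):
    """Samples that must be processed per clock cycle for real-time operation.
    samples_per_pixel = 1.5 assumes 4:2:0 chroma subsampling
    (one full-size luma plane plus two quarter-size chroma planes)."""
    return width * height * fps * samples_per_pixel / clock_hz
```

Under these assumptions, samples_per_cycle(8192, 4096, 120, 378e6) is about 15.98, so the design would need to handle roughly 16 samples per clock cycle at 378 MHz; counting luma only, the figure drops to about 10.65.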

Microprocessors and Microsystems | 2016

A design methodology and various performance and fabrication metrics evaluation of 3D Network-on-Chip with multiplexed Through-Silicon Vias

Mostafa Said; Ahmed Shalaby; Farhad Mehdipour; Morteza Biglari-Abhari; Mohamed El-Sayed

The use of short Through-Silicon Vias (TSVs) in 3D integration technology significantly reduces routing area, power consumption, and delay. However, 3D integration technology still faces several challenges, mainly low yield, which is a direct result of the extra fabrication steps required for TSVs. Therefore, reducing the TSV count has a considerable effect on improving yield and hence reducing cost. A TSV multiplexing technique called TSVBOX was introduced in Said et al. (2013) to reduce the TSV count without affecting the direct benefits of TSVs. Although TSVBOX introduces some delay to the multiplexed signals, this delay had not yet been addressed. In this paper, we analyze the TSVBOX timing requirements and propose a design methodology for TSVBOX-based 3D Network-on-Chip (NoC). Performance and power comparisons are then conducted to investigate the direct effects of TSV multiplexing on these two metrics. After that, the basic fabrication metrics are compared to investigate the effect of the proposed design methodology on yield and cost. We show that TSVBOX greatly enhances the fabrication metrics with minimal degradation in performance and power consumption, especially for Hotspot-like traffic patterns.
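Why fewer TSVs help yield can be illustrated with the common simplified model in which each TSV fails independently, so a die stack with N TSVs of individual yield p has yield p^N. This model, the k-to-1 multiplexing factor, and the numbers in the example are illustrative assumptions; they are not TSVBOX's actual design parameters or the paper's yield model, and the helper names are hypothetical.

```python
import math


def tsv_count(num_signals: int, mux_factor: int) -> int:
    """TSVs needed when each TSV is shared by up to mux_factor signals (hypothetical k-to-1 multiplexing)."""
    return math.ceil(num_signals / mux_factor)


def stack_yield(tsv_yield: float, num_tsvs: int) -> float:
    """Simplified yield model: every TSV must work and failures are independent, so Y = p ** N."""
    return tsv_yield ** num_tsvs
```

For example, carrying 64 signals with no multiplexing needs 64 TSVs, giving a stack yield of about 0.9936 at p = 0.9999, whereas 4-to-1 multiplexing needs 16 TSVs and gives about 0.9984. The price, which the paper analyzes, is the extra delay introduced by multiplexing.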
