Hassan Youness | Researchain

Publication

Featured research published by Hassan Youness.

IEEE Transactions on Industrial Informatics | 2014

MPSoCs and Multicore Microcontrollers for Embedded PID Control: A Detailed Study

Hassan Youness; Mohammed Moness; Mahmoud Khaled

This paper presents different multiprocessor implementations of the proportional-integral-derivative (PID) controller using two technologies: 1) field-programmable gate array (FPGA)-based multiprocessor systems-on-chip (MPSoCs); and 2) multicore microcontrollers (MCUs). Techniques to implement a parallelized PID controller, a multi-PID controller, and a self-tuning PID controller are proposed and verified using hardware (HW)-in-the-loop (HIL) simulations. The paper then presents a detailed case study of an embedded real-time (RT) self-tuning PID controller for a 1-degree-of-freedom (1-DOF) aerodynamical system, covering controller design, parameter tuning, and implementation on a multiprocessor system. The results demonstrate that the proposed techniques improve performance and functionality. It is shown that customizing HW and software (SW) within MPSoCs provides higher RT performance, whereas multicore MCUs can reduce design time, implementation time, and cost while keeping adequate performance. Therefore, complex RT embedded controllers that employ advanced control algorithms can be realized and implemented in a rapid, effective, and cost-efficient fashion.
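As a reference point for the controllers discussed above, the following is a minimal single-core sketch of a discrete PID loop. The first-order plant, the gains, and the names `PID` and `simulate` are illustrative assumptions, not the authors' MPSoC or multicore MCU implementation; a multi-PID setup could run several independent instances of this loop, for example one per core.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete parallel-form PID controller (illustrative sketch, not the paper's code)."""
    kp: float
    ki: float
    kd: float
    dt: float               # sample period in seconds
    integral: float = 0.0   # running sum of error * dt
    prev_error: float = 0.0

    def update(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt                    # rectangular integration
        derivative = (error - self.prev_error) / self.dt    # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid: PID, setpoint: float, steps: int, tau: float = 0.5) -> float:
    """Close the loop around an assumed first-order plant dy/dt = (u - y) / tau."""
    y = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, y)
        y += pid.dt * (u - y) / tau   # forward-Euler plant update
    return y


if __name__ == "__main__":
    controller = PID(kp=2.0, ki=1.0, kd=0.0, dt=0.01)
    print(simulate(controller, setpoint=1.0, steps=2000))  # settles close to 1.0
```

The integral term is what drives the steady-state error to zero here; a self-tuning variant would additionally adjust `kp`, `ki`, and `kd` online, which this sketch does not attempt.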

International Symposium on Circuits and Systems | 2010

Efficient partitioning technique on multiple cores based on optimal scheduling and mapping algorithm

Hassan Youness; Abdel-Moniem Wahdan; Mohammed Hassan; Ashraf Salem; Mohammed Moness; Keishi Sakanushi; Yoshinori Takeuchi; Masaharu Imai

In this paper, an efficient hardware-software (HW-SW) partitioning technique based on high-performance scheduling and mapping algorithms for multiple cores is presented. The scheduling and mapping algorithms produce optimal mappings of tasks onto cores, and the partitioning technique reduces both the overall execution time and the number of buses among the cores. Extensive experimental results demonstrate the viability and potential of the proposed algorithms and show that they are an efficient scheme for obtaining optimal scheduling, mapping, and partitioning for hard, large task-graph problems.
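To make the two objectives concrete (overall execution time and number of buses), here is a small sketch that evaluates one candidate mapping of a task graph onto cores. It assumes that an edge's communication cost is paid only when producer and consumer sit on different cores and that one bus is needed per pair of communicating cores; both are modelling assumptions, and `evaluate_mapping` is a hypothetical helper, not the paper's algorithm.

```python
from typing import Dict, List, Tuple


def evaluate_mapping(
    exec_time: Dict[str, int],
    edges: Dict[Tuple[str, str], int],
    order: List[str],
    mapping: Dict[str, int],
) -> Tuple[int, int]:
    """Return (makespan, buses) for running tasks in `order` on their mapped cores.

    Assumptions: a task starts once its core is idle and all inputs have arrived;
    communication cost is paid only across cores; one bus per communicating core pair.
    """
    preds: Dict[str, List[str]] = {t: [] for t in exec_time}
    for src, dst in edges:
        preds[dst].append(src)
    finish: Dict[str, int] = {}
    core_ready: Dict[int, int] = {}
    for task in order:  # `order` must be a topological order of the task graph
        core = mapping[task]
        ready = core_ready.get(core, 0)
        for p in preds[task]:
            comm = edges[(p, task)] if mapping[p] != core else 0
            ready = max(ready, finish[p] + comm)
        finish[task] = ready + exec_time[task]
        core_ready[core] = finish[task]
    buses = {frozenset((mapping[s], mapping[d])) for s, d in edges if mapping[s] != mapping[d]}
    return max(finish.values()), len(buses)


if __name__ == "__main__":
    exec_time = {"A": 2, "B": 3, "C": 3, "D": 2}
    edges = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
    order = ["A", "B", "C", "D"]
    print(evaluate_mapping(exec_time, edges, order, {t: 0 for t in order}))             # (10, 0)
    print(evaluate_mapping(exec_time, edges, order, {"A": 0, "B": 0, "C": 1, "D": 0}))  # (9, 1)
```

With all four tasks on one core the schedule is serial; moving C to a second core overlaps it with B at the price of one bus and two transfer delays, which is the kind of trade-off a partitioning search has to weigh.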

International Conference on Microelectronics | 2010

A Design Space Exploration methodology for allocating Task Precedence graphs to multi-core system architectures

Hassan Youness; Mohamed Hassan; Ashraf Salem

In this paper, we propose a Design Space Exploration (DSE) methodology that produces multi-core system architectures with optimal scheduling, number of cores, number of buses, and hardware-software partitioning from Task Precedence Graphs (TPGs). Extensive experimental results demonstrate the viability and potential of the proposed methodology and show that it is an efficient scheme for obtaining optimal solutions to hard, large task-graph problems.
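One ingredient of such an exploration, choosing the number of cores, can be sketched as a sweep: schedule the task precedence graph on 1, 2, ..., N cores and keep the smallest core count that reaches the best schedule length. The code below uses a plain critical-path (bottom-level) list scheduler and ignores communication and bus costs; the scheduler, the tie-breaking rule, and the names `list_schedule` and `explore_core_counts` are assumptions for illustration rather than the paper's methodology.

```python
from typing import Dict, List, Tuple


def list_schedule(exec_time: Dict[str, int], succs: Dict[str, List[str]], num_cores: int) -> int:
    """Greedy list scheduling by decreasing bottom level (assumed heuristic); returns the makespan."""
    preds: Dict[str, List[str]] = {t: [] for t in exec_time}
    for t, children in succs.items():
        for c in children:
            preds[c].append(t)

    level: Dict[str, int] = {}

    def bottom_level(t: str) -> int:
        # Length of the longest path from t to an exit task, including t itself.
        if t not in level:
            level[t] = exec_time[t] + max((bottom_level(c) for c in succs.get(t, [])), default=0)
        return level[t]

    # With positive execution times, decreasing bottom level is a valid topological order.
    order = sorted(exec_time, key=lambda t: -bottom_level(t))
    core_free = [0] * num_cores
    finish: Dict[str, int] = {}
    for t in order:
        est = max((finish[p] for p in preds[t]), default=0)
        core = min(range(num_cores), key=lambda c: max(core_free[c], est))
        finish[t] = max(core_free[core], est) + exec_time[t]
        core_free[core] = finish[t]
    return max(finish.values())


def explore_core_counts(exec_time, succs, max_cores: int) -> Tuple[int, int]:
    """Return (fewest cores, makespan) among core counts that reach the best makespan."""
    results = {p: list_schedule(exec_time, succs, p) for p in range(1, max_cores + 1)}
    best = min(results.values())
    return min(p for p, m in results.items() if m == best), best


if __name__ == "__main__":
    exec_time = {"S": 1, "A": 4, "B": 4, "C": 4, "E": 1}
    succs = {"S": ["A", "B", "C"], "A": ["E"], "B": ["E"], "C": ["E"]}
    print(explore_core_counts(exec_time, succs, max_cores=4))  # (3, 6)
```

For this fork-join graph a fourth core adds nothing, so the sweep settles on three cores; a full DSE would also vary bus topology and the HW-SW split.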

National Radio Science Conference | 2016

High performance reconfigurable Viterbi Decoder design for multi-standard receiver

Khloud Mostafa; Hassan Youness; Mohammed Moness

A Viterbi decoder (VD) is used to decode convolutional codes, which are commonly used to encode digital data before transmission. Because modern wireless communication standards vary widely, a flexible hardware platform that can be configured to support different standards is still needed. In this paper, a reconfigurable Viterbi decoder is designed. Its architecture supports constraint lengths 3, 5, and 7 and code rates 1/2 and 1/3, which makes it compatible with many common standards, such as WiMAX, WLAN, 3GPP2, GSM, and LTE. The proposed decoder was described in VHDL, simulated with the Xilinx ISE 14.5 simulator, and implemented on a Xilinx ZedBoard (Zynq-7000 FPGA) using the Xilinx iMPACT device configuration tool. In addition, the architecture employs a modified add-compare-select (ACS) unit that reduces power consumption by 26% and area by 21%.
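For readers unfamiliar with the decoder, the following sketch shows hard-decision Viterbi decoding for one of the configurations listed above: constraint length 3 and code rate 1/2, using the common (7, 5) octal generator pair. The generator choice, the zero-tail termination, and the function names are assumptions; this is a software reference model, not the paper's reconfigurable VHDL architecture or its modified ACS unit.

```python
from typing import List, Sequence, Tuple


def parity(x: int) -> int:
    return bin(x).count("1") & 1


# Default generators (7, 5) octal for K = 3, rate 1/2 are an assumed example configuration.
def conv_encode(bits: Sequence[int], k: int = 3, gens: Tuple[int, ...] = (0b111, 0b101)) -> List[int]:
    """Rate 1/len(gens) convolutional encoder; appends k-1 zero tail bits to return to state 0."""
    state, out = 0, []
    for u in list(bits) + [0] * (k - 1):
        reg = (u << (k - 1)) | state          # newest bit in the MSB position
        out.extend(parity(reg & g) for g in gens)
        state = reg >> 1                      # shift register drops the oldest bit
    return out


def viterbi_decode(received: Sequence[int], k: int = 3, gens: Tuple[int, ...] = (0b111, 0b101)) -> List[int]:
    """Hard-decision Viterbi decoder (Hamming branch metric) for a zero-terminated trellis."""
    n, num_states = len(gens), 1 << (k - 1)
    inf = float("inf")
    metric = [0] + [inf] * (num_states - 1)   # encoder starts in state 0
    paths: List[List[int]] = [[] for _ in range(num_states)]
    for i in range(0, len(received), n):
        symbol = received[i:i + n]
        new_metric = [inf] * num_states
        new_paths: List[List[int]] = [[] for _ in range(num_states)]
        for state in range(num_states):
            if metric[state] == inf:
                continue
            for u in (0, 1):
                reg = (u << (k - 1)) | state
                expected = [parity(reg & g) for g in gens]
                branch = sum(r != e for r, e in zip(symbol, expected))
                nxt = reg >> 1
                # Add-compare-select: keep the survivor with the smaller path metric.
                if metric[state] + branch < new_metric[nxt]:
                    new_metric[nxt] = metric[state] + branch
                    new_paths[nxt] = paths[state] + [u]
        metric, paths = new_metric, new_paths
    return paths[0][: len(paths[0]) - (k - 1)]   # drop the tail bits
```

Because the (7, 5) code has free distance 5, flipping any single bit of `conv_encode(msg)` still decodes back to `msg`. Supporting K = 5 or 7, or rate 1/3, in this model only means passing a different `k` and generator tuple, which loosely mirrors the reconfigurability the paper targets in hardware.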

Parallel, Distributed and Network-Based Processing | 2015

An Efficient Implementation of Ant Colony Optimization on GPU for the Satisfiability Problem

Hassan Youness; Aziza Ibraheim; Mohammed Moness; Muhammad Osama

This paper focuses on solving the Boolean Satisfiability (SAT) problem using a parallel implementation of the Ant Colony Optimization (ACO) algorithm executed on the Graphics Processing Unit (GPU) with NVIDIA CUDA (Compute Unified Device Architecture). We propose a new, efficient parallel strategy for the ACO algorithm that runs entirely on the CUDA architecture, and we perform experiments comparing it with the best existing sequential CPU implementation among incomplete approaches. We show how the SAT problem can benefit from GPU solutions, achieving significant speed-ups while preserving solution quality. Our results show that the new parallel implementation executes up to 21x faster than its sequential counterpart.
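The ant-colony idea behind this work can be illustrated with a small sequential sketch: each ant samples a truth assignment guided by per-variable pheromone values, and the iteration's best ant reinforces the values it chose. The pheromone model, the evaporation rate `rho`, the iteration-best update rule, and all names are assumptions; the paper's contribution is a CUDA implementation, where ants and clause evaluations would be spread across GPU threads, which this CPU sketch does not attempt.

```python
import random
from typing import List, Optional, Sequence

Clause = Sequence[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3


def num_satisfied(clauses: Sequence[Clause], assignment: Sequence[bool]) -> int:
    """Count clauses with a true literal; assignment[v] is x_v (index 0 unused)."""
    return sum(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses)


def aco_sat(clauses, num_vars, ants=20, iterations=100, rho=0.1, seed=0):
    """Illustrative sequential ACO (assumed pheromone model); returns (assignment, satisfied count)."""
    rng = random.Random(seed)
    # tau[v] = [pheromone for x_v = False, pheromone for x_v = True]
    tau = [[1.0, 1.0] for _ in range(num_vars + 1)]
    best: Optional[List[bool]] = None
    best_score = -1
    for _ in range(iterations):
        iter_best, iter_score = None, -1
        for _ in range(ants):
            # Each ant sets x_v = True with probability tau_true / (tau_false + tau_true).
            assignment = [False] + [rng.random() < tau[v][1] / (tau[v][0] + tau[v][1])
                                    for v in range(1, num_vars + 1)]
            score = num_satisfied(clauses, assignment)
            if score > iter_score:
                iter_best, iter_score = assignment, score
        if iter_score > best_score:
            best, best_score = iter_best, iter_score
        if best_score == len(clauses):
            break  # every clause is satisfied
        for v in range(1, num_vars + 1):
            tau[v][0] *= 1.0 - rho          # evaporation
            tau[v][1] *= 1.0 - rho
            tau[v][int(iter_best[v])] += iter_score / len(clauses)  # deposit by iteration-best ant
    return best, best_score


if __name__ == "__main__":
    clauses = [[1, 2, -3], [-1, 3], [2, 3], [-2, -3]]
    print(aco_sat(clauses, num_vars=3))
```

The inner loop over ants is embarrassingly parallel, which is the part a GPU version would map to threads; the pheromone update is a reduction over the ants' results.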

International Conference on Computer Engineering and Systems | 2009

Optimization method for scheduling length and the number of processors on multiprocessor systems

Hassan Youness; Mohammed Hassan; Keishi Sakanushi; Yoshinori Takeuchi; Masaharu Imai; Ashraf Salem; Abdel-Moniem Wahdan; Mohammed Moness

A high-performance task-scheduling algorithm aims to minimize the overall execution time of a program by properly allocating tasks to the processors of a multiprocessor system and ordering their execution so that the precedence constraints among the tasks are preserved. In this paper, we propose an algorithm that obtains optimal schedules for large problem sizes and optimizes the target system. The algorithm uses geometrical analysis based on an Artificial Intelligence (AI) technique to produce the optimal solution to the allocation/scheduling problem, and it uses pruning techniques to reduce the size of the search space and to minimize the number of processors used. Extensive experimental results on more than 180 random task graphs demonstrate the viability and potential of the proposed algorithm and show that it is an efficient scheme for obtaining optimal solutions to hard, large task-graph problems.
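The abstract does not spell out the geometrical analysis, so the sketch below substitutes a generic depth-first branch-and-bound search to illustrate the two ideas it names: pruning the search space and minimizing the number of processors. It tries every ready task on every used processor plus at most one new one (a symmetry-breaking prune), cuts branches whose critical-path lower bound cannot beat the best schedule found, and among equally short schedules prefers fewer processors. Communication costs are ignored and the search is exponential, so it only suits small graphs; `optimal_schedule` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def optimal_schedule(exec_time: Dict[str, int], succs: Dict[str, List[str]],
                     max_procs: int) -> Tuple[int, int]:
    """Generic branch and bound (not the paper's method): (min makespan, fewest processors)."""
    preds: Dict[str, List[str]] = {t: [] for t in exec_time}
    for t, children in succs.items():
        for c in children:
            preds[c].append(t)

    blevel: Dict[str, int] = {}

    def bottom_level(t: str) -> int:
        if t not in blevel:
            blevel[t] = exec_time[t] + max((bottom_level(c) for c in succs.get(t, [])), default=0)
        return blevel[t]

    for t in exec_time:
        bottom_level(t)

    best = [float("inf"), max_procs + 1]  # [makespan, processors] of the best complete schedule

    def search(finish: Dict[str, int], proc_free: List[int]) -> None:
        if len(finish) == len(exec_time):
            candidate = (max(finish.values()), len(proc_free))
            if candidate < (best[0], best[1]):
                best[0], best[1] = candidate
            return
        ready = [t for t in exec_time
                 if t not in finish and all(p in finish for p in preds[t])]
        est = {t: max((finish[p] for p in preds[t]), default=0) for t in ready}
        # Lower bound: every ready task still has its whole critical path ahead of it.
        bound = max([max(finish.values(), default=0)] + [est[t] + blevel[t] for t in ready])
        if bound > best[0] or (bound == best[0] and len(proc_free) >= best[1]):
            return  # prune: this branch cannot improve on the best schedule
        # Used processors plus one fresh processor (all fresh ones are interchangeable).
        procs = range(min(len(proc_free) + 1, max_procs))
        for t in ready:
            for p in procs:
                free = proc_free + [0] if p == len(proc_free) else proc_free[:]
                finish[t] = max(free[p], est[t]) + exec_time[t]
                free[p] = finish[t]
                search(finish, free)
                del finish[t]

    search({}, [])
    return best[0], best[1]


if __name__ == "__main__":
    exec_time = {"S": 1, "A": 4, "B": 4, "C": 4, "E": 1}
    succs = {"S": ["A", "B", "C"], "A": ["E"], "B": ["E"], "C": ["E"]}
    print(optimal_schedule(exec_time, succs, max_procs=4))  # (6, 3)
```

Even with four processors available, the search reports that three suffice for the optimal makespan, which is the processor-minimization side of the objective.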

International Conference on Computer Engineering and Systems | 2015

A new hardware/software partitioning technique

Hassan Youness; Amal Mahfoz

Hardware/software (HW/SW) partitioning, which decides which components of a system are implemented in hardware and which in software, is one of the most important issues in co-design systems and plays a crucial role in improving system performance. The HW/SW partitioning problem is also NP-hard. In this paper, a new HW/SW partitioning technique is presented to reduce the overall execution time of the system. The technique divides the task graph into levels. In each level, the task with high computing cost and high communication cost is assigned to hardware. If no task meets these criteria, the technique computes the granularity of each task and assigns the coarse-grained task to hardware. Experimental results show that the proposed algorithm efficiently reduces the overall execution time and reduces hardware resources by about 45% compared with the existing technique.
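The level-based rule described above can be sketched as follows. The abstract leaves several details open, so the sketch assumes that a task's communication cost is the total weight of its incoming and outgoing edges, that "high" means above the mean of its level, that granularity is computation cost divided by communication cost, and that a task is coarse-grained when that ratio exceeds 1. The function name `partition_by_levels` is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple


def partition_by_levels(exec_time: Dict[str, float],
                        edges: Dict[Tuple[str, str], float]) -> Set[str]:
    """Return the set of tasks assigned to hardware; all other tasks stay in software."""
    preds: Dict[str, List[str]] = {t: [] for t in exec_time}
    comm = {t: 0.0 for t in exec_time}        # assumed: total weight of in- and out-edges
    for (src, dst), weight in edges.items():
        preds[dst].append(src)
        comm[src] += weight
        comm[dst] += weight

    level: Dict[str, int] = {}

    def depth(t: str) -> int:
        # Level 0 holds entry tasks; other tasks sit one level below their deepest predecessor.
        if t not in level:
            level[t] = 1 + max((depth(p) for p in preds[t]), default=-1)
        return level[t]

    levels: Dict[int, List[str]] = {}
    for t in exec_time:
        levels.setdefault(depth(t), []).append(t)

    hardware: Set[str] = set()
    for tasks in levels.values():
        avg_comp = mean(exec_time[t] for t in tasks)
        avg_comm = mean(comm[t] for t in tasks)
        heavy = [t for t in tasks if exec_time[t] > avg_comp and comm[t] > avg_comm]
        if heavy:
            # High computation and high communication cost (assumed: above level mean).
            hardware.add(max(heavy, key=lambda t: exec_time[t]))
            continue
        # Otherwise fall back to granularity (assumed: computation / communication).
        grain = {t: exec_time[t] / comm[t] if comm[t] else float("inf") for t in tasks}
        coarsest = max(tasks, key=lambda t: grain[t])
        if grain[coarsest] > 1.0:
            hardware.add(coarsest)
    return hardware


if __name__ == "__main__":
    exec_time = {"A": 5, "B": 20, "C": 4, "D": 6, "E": 3}
    edges = {("A", "B"): 4, ("A", "C"): 2, ("A", "D"): 1,
             ("B", "E"): 6, ("C", "E"): 1, ("D", "E"): 1}
    print(partition_by_levels(exec_time, edges))  # {'B'}
```

In the example, B dominates its level in both computation and communication and goes to hardware, while A and E are fine-grained under the assumed ratio and stay in software; raising E's computation cost to 20 would make it coarse-grained and move it to hardware as well.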

International Conference on Computer Engineering and Systems | 2013

Fault tolerant heterogeneous scheduling for precedence constrained task graphs using simulated annealing

Hassan Youness; Aly Omar; Mohamed Moness

Scheduling is known to be an NP-complete problem in most cases, so no polynomial-time algorithm is known that finds an optimal solution. Scheduling task graphs on heterogeneous architectures makes the problem even harder. Like any other platform, heterogeneous architectures are prone to faults, so fault-tolerance techniques are needed to ensure that the job is completed; task replication is therefore used to achieve fault tolerance. However, replication increases scheduling complexity, affects the schedule length dramatically, and introduces large communication-delay overhead. Here we propose using the simulated annealing optimization method to find optimal solutions according to platform reliability: the algorithm can minimize the lower-bound makespan on high-reliability platforms and optimize the upper-bound makespan on platforms that are prone to failures.
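A bare-bones simulated annealing loop for mapping a precedence-constrained task graph onto heterogeneous processors might look like the sketch below. It minimizes makespan only: task replication, communication delays, and platform reliability, which the paper folds into its objective, are deliberately left out, and the move set (reassign one task), the geometric cooling schedule, the parameter values, and the names are assumptions.

```python
import math
import random
from typing import Dict, List


def makespan(order: List[str], preds: Dict[str, List[str]],
             cost: Dict[str, List[float]], mapping: Dict[str, int]) -> float:
    """Schedule tasks in topological `order` on their mapped processors; cost[t][p] is t's time on p."""
    free: Dict[int, float] = {}
    finish: Dict[str, float] = {}
    for t in order:
        p = mapping[t]
        start = max([free.get(p, 0.0)] + [finish[q] for q in preds[t]])
        finish[t] = start + cost[t][p]
        free[p] = finish[t]
    return max(finish.values())


def anneal(order, preds, cost, num_procs, temp=10.0, alpha=0.995, steps=2000, seed=1):
    """Sketch: makespan-only simulated annealing (no replication or reliability modelled)."""
    rng = random.Random(seed)
    current = {t: rng.randrange(num_procs) for t in order}
    current_len = makespan(order, preds, cost, current)
    best, best_len = dict(current), current_len
    for _ in range(steps):
        candidate = dict(current)
        candidate[rng.choice(order)] = rng.randrange(num_procs)  # move one task
        candidate_len = makespan(order, preds, cost, candidate)
        delta = candidate_len - current_len
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_len = candidate, candidate_len
            if current_len < best_len:
                best, best_len = dict(current), current_len
        temp = max(temp * alpha, 1e-3)  # geometric cooling with a floor
    return best, best_len
```

Accepting occasional uphill moves early on lets the search escape poor mappings; as the temperature falls the loop behaves like greedy local search, and `best` keeps the shortest schedule seen at any point.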

International Conference on Computer Engineering and Systems | 2016

Accelerated Processing Unit (APU) potential: N-body simulation case study

Hassan Youness; Mohamed Moness; Omar Shaaban

This paper investigates the acceleration of irregular and regular algorithms on an integrated graphics processing unit (integrated GPU), known as an Accelerated Processing Unit (APU), which is fused on the same die as the CPU, and on a discrete graphics processing unit (GPU). It addresses the question of how much potential the APU has for applications with irregular data structures such as trees, given that the APU shares power and bandwidth resources with the CPU. Moreover, the paper determines the cases in which the APU can be considered a cheaper solution than the GPU. Cosmological N-body simulation, with two different implementations, was used as a case study of regular and irregular algorithms. The results indicated that the GPU is more powerful than the APU in all of the conducted tests.
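The regular variant of an N-body code is typically the direct-sum force calculation, sketched below in plain Python. Each body's acceleration is computed independently of the others, which is why the outer loop maps naturally onto one GPU or APU thread per body. The gravitational constant `g = 1`, the softening length `eps`, and the function name are assumptions, and the irregular, tree-based variant (for example Barnes-Hut) is not shown.

```python
import math
from typing import List, Sequence


def accelerations(pos: Sequence[Sequence[float]], mass: Sequence[float],
                  g: float = 1.0, eps: float = 0.0) -> List[List[float]]:
    """Direct-sum O(N^2) gravitational acceleration on every body (units with G = g assumed)."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):            # independent per body: one GPU/APU thread per i
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps * eps  # softened distance squared
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += g * mass[j] * d[k] * inv_r3
    return acc
```

For two unit masses at x = -1 and x = +1, each is pulled toward the other with magnitude 0.25 (g * m / r^2 with r = 2). The uniform memory access pattern of this loop is what makes it "regular"; a tree walk visits different nodes per body, which stresses the shared bandwidth the abstract mentions.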

2013 Second International Japan-Egypt Conference on Electronics, Communications and Computers (JEC-ECC) | 2013

Fault tolerant heterogeneous MPSOC schedule length minimization based on platform reliability

Hassan Youness; Aly Omar; Mohamed Moness

Fault-tolerant scheduling has recently become a subject of great concern, especially for heterogeneous multiprocessor systems that may contain low-reliability processors. Task replication is an established technique for achieving fault tolerance; however, it has a negative influence on schedule length. Moreover, increasing system reliability always has a negative impact on schedule length. In this paper, we devise a new method that uses simulated annealing to minimize schedule length and maximize system reliability simultaneously. Schedule length is investigated both for fault-free operation and in the presence of a processor fault; the two schedule lengths are then averaged according to the probabilities of schedule success and failure, and this average schedule length is minimized to give the lowest possible makespan in both cases. The results show that our algorithm maximizes system reliability without degrading schedule length; in fact, increasing system reliability decreases the averaged schedule length and hence improves the system's overall performance in all cases.
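One natural reading of the averaging step is L_avg = R * L_ff + (1 - R) * L_fault, where R is the probability that the schedule completes without a processor fault, L_ff is the fault-free schedule length, and L_fault is the schedule length when a fault occurs. The sketch below computes this, assuming independent processors with constant failure rates so that R = exp(-sum_p lambda_p * busy_p); that reliability model and the function names are assumptions rather than details taken from the paper.

```python
import math
from typing import Dict


def schedule_reliability(busy_time: Dict[str, float], failure_rate: Dict[str, float]) -> float:
    """P(no processor fails while busy), assuming independent exponential failures."""
    return math.exp(-sum(failure_rate[p] * busy_time[p] for p in busy_time))


def averaged_schedule_length(fault_free_len: float, faulty_len: float, reliability: float) -> float:
    """Expected schedule length: weight each case by its (assumed) probability."""
    return reliability * fault_free_len + (1.0 - reliability) * faulty_len


if __name__ == "__main__":
    r = schedule_reliability({"p0": 10.0, "p1": 8.0}, {"p0": 1e-3, "p1": 5e-3})
    print(r)  # exp(-0.05), about 0.951
    print(averaged_schedule_length(10.0, 15.0, r))
```

Whenever the faulty-case schedule is longer than the fault-free one, pushing R toward 1 pulls the average toward the fault-free length, which is consistent with the trend reported in the abstract.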
