Hassan Salamy
Texas State University
Publications
Featured research published by Hassan Salamy.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2012
Hassan Salamy; J. Ramanujam
The growing trend in current complex embedded systems is to deploy a multiprocessor system-on-chip (MPSoC). An MPSoC consists of multiple heterogeneous processing elements, a memory hierarchy, and input/output components, which are linked together by an on-chip interconnect structure. Such an architecture provides the flexibility to meet the performance requirements of multimedia applications while respecting the constraints on memory, cost, size, time, and power. Many embedded systems employ software-managed memories known as scratch-pad memories (SPMs). Unlike caches, SPMs are software-controlled, and hence the execution time of applications on such systems can be accurately predicted. Scheduling the tasks of an embedded application on the processors and partitioning the available SPM budget among these processors are two critical issues in such systems. Often, these are considered separately; such a decoupled approach may miss better-quality schedules. In this paper, we present an integrated approach to task scheduling and SPM partitioning to further reduce the execution time of embedded applications. Results on several real-life benchmarks show a significant improvement from the proposed technique.
High Performance Embedded Architectures and Compilers | 2008
Hassan Salamy; J. Ramanujam
The growing trend in current complex embedded systems is the use of a multiprocessor system-on-chip (MPSoC). An MPSoC consists of multiple heterogeneous processing elements, a memory hierarchy, and input/output components, which are linked together by an on-chip interconnect structure. Such an architecture provides the flexibility to meet the performance requirements of multimedia applications while respecting the constraints on memory, cost, size, time, and power. Such embedded systems employ software-managed memories known as scratch-pad memories (SPMs). Unlike caches, SPMs are software-controlled, and hence the execution time of applications on such systems can be accurately predicted and controlled. Scheduling the tasks of an application on the processors and partitioning the available SPM budget among those processors are two critical issues in reducing the overall computation time as well as the communication overhead. Traditionally, task scheduling is applied separately from memory partitioning. Such a decoupled approach may miss better-quality schedules. In this paper, we present an effective heuristic that integrates task allocation and SPM partitioning to further reduce the execution time of embedded applications. Results on several real-life benchmarks show a significant improvement of the proposed technique over decoupled techniques as well as over an integer linear programming approach.
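The gap between decoupled and integrated approaches can be illustrated with a toy exhaustive search. This is only a sketch under strong assumptions and not the heuristic from the paper: tasks are independent (no task graph), there are exactly two identical processors, and the `TIME` table of run times per SPM size is invented for illustration.

```python
from itertools import product

# Hypothetical profile: TIME[task][s] is the run time of the task when the
# processor it runs on owns s units of SPM (s = 0..SPM_BUDGET).
TIME = {
    "T1": [10, 10, 10, 10, 10],   # insensitive to SPM size
    "T2": [10, 10, 10, 10, 10],   # insensitive to SPM size
    "T3": [20, 18, 16, 6, 4],     # benefits strongly from a large SPM
}
TASKS = list(TIME)
NUM_PROCS = 2      # assumption: the partition enumeration below handles two processors only
SPM_BUDGET = 4


def makespan(assignment, partition):
    """Finish time of the busiest processor for one assignment and one SPM split."""
    loads = [0] * NUM_PROCS
    for task, proc in zip(TASKS, assignment):
        loads[proc] += TIME[task][partition[proc]]
    return max(loads)


def all_assignments():
    """Every mapping of tasks to processors, as tuples of processor indices."""
    return product(range(NUM_PROCS), repeat=len(TASKS))


def all_partitions():
    """Every split of the SPM budget between the two processors."""
    return [(s, SPM_BUDGET - s) for s in range(SPM_BUDGET + 1)]


def decoupled():
    """Schedule first (assuming an equal SPM split), then partition the SPM."""
    equal = (SPM_BUDGET // 2, SPM_BUDGET - SPM_BUDGET // 2)
    best_assignment = min(all_assignments(), key=lambda a: makespan(a, equal))
    return min(makespan(best_assignment, p) for p in all_partitions())


def integrated():
    """Search over assignments and SPM splits together."""
    return min(makespan(a, p) for a in all_assignments() for p in all_partitions())


print(decoupled(), integrated())  # prints: 20 14
```

In this made-up instance the decoupled flow commits to running T3 alone before it knows that a processor with all four SPM units could run T3 and one other task in 14 time units. Exhaustive search only works at this scale, which is why heuristics or ILP formulations are used in practice.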
Languages and Compilers for Parallel Computing | 2006
Hassan Salamy; J. Ramanujam
In many digital signal processors (DSPs) with limited memory, programs are loaded in the ROM, and thus it is very important to optimize the size of the code to reduce the memory requirement. Many DSPs include address generation units (AGUs) that can perform address arithmetic (auto-increment and auto-decrement) in parallel with instruction execution, without the need for extra instructions. Much research has been conducted on optimizing the layout of the variables in memory to get the most benefit from auto-increment and auto-decrement. The simple offset assignment (SOA) problem concerns the layout of variables for machines with one address register, and the general offset assignment (GOA) problem deals with multiple address registers. Both of these problems assume that each variable needs to be allocated for the entire duration of a program. Both SOA and GOA are NP-complete. In this paper, we present a heuristic for SOA that considers coalescing two or more non-interfering variables into the same memory location. SOA with variable coalescing is intended to decrease the cost of address arithmetic instructions as well as to decrease the memory requirement for variables by maximizing the number of variables mapped to the same memory slot. Results on several benchmarks show a significant improvement of our solution over other heuristics. In addition, we have adapted simulated annealing to further improve the solution from our heuristic.
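The SOA cost model described above can be sketched in a few lines. The following is a minimal illustration, not the heuristic from the paper: it only evaluates a given layout, the access sequence is invented, and the claim that `a` and `d` do not interfere is an assumption made for the example.

```python
def soa_cost(access_sequence, slot_of):
    """Count transitions that need an explicit address-arithmetic instruction.

    With one address register and auto-increment/decrement by 1, moving
    between two consecutive accesses is free when their memory slots
    differ by at most 1; otherwise one extra instruction is needed.
    """
    cost = 0
    for prev, cur in zip(access_sequence, access_sequence[1:]):
        if abs(slot_of[cur] - slot_of[prev]) > 1:
            cost += 1
    return cost


# Hypothetical access sequence over four variables.
sequence = ["a", "b", "c", "a", "d", "b", "c"]

# Plain layout: every variable has its own slot.
plain = {"a": 0, "b": 1, "c": 2, "d": 3}

# Coalesced layout: assume a and d are never live at the same time,
# so they may share slot 0. This also saves one memory word.
coalesced = {"a": 0, "d": 0, "b": 1, "c": 2}

print(soa_cost(sequence, plain))      # prints: 3
print(soa_cost(sequence, coalesced))  # prints: 1
```

Coalescing helps twice here: the transition from `a` to `d` becomes free because both live in the same slot, and `d` ends up adjacent to `b`.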
Canadian Conference on Electrical and Computer Engineering | 2011
Hassan Salamy; Haidar M. Harmanani
With the growing number of cores on a single chip, bus-based communication is suffering from bandwidth and scalability issues. As a result, the new approach is to use a network-on-chip (NoC) as the main communication system on an SoC. An NoC provides the flexibility and scalability much needed in the era of multi-cores. NoC-based systems also support multiple clock rates, which are widely used in many SoCs today. In this paper, an optimal integer linear programming (ILP) solution for test scheduling of cores in an NoC-based SoC using multiple clock rates is presented. Results on different benchmarks show the effectiveness of our techniques.
International Journal of Electronics | 2013
Hassan Salamy; Haidar M. Harmanani
The increasing number of cores on a single chip has led to scalability and bandwidth issues in bus-based communication. Network-on-chip (NoC) techniques have emerged as a solution that provides much-needed flexibility and scalability in the era of multi-cores. This article presents an optimal integer linear programming (ILP) formulation and a simulated annealing (SA) solution to thermal- and power-aware test scheduling of cores in an NoC-based SoC using multiple clock rates. The methods have been implemented, and results on various benchmarks are presented.
Computers & Electrical Engineering | 2012
Hassan Salamy
Address arithmetic instructions constitute a large part of the generated code for digital signal processors (DSPs). Most modern DSPs provide multiple address registers and a dedicated address generation unit (AGU), which performs address generation in parallel with instruction execution. There is no address computation overhead if the next address is within the auto-modify range. A careful placement of variables in memory is used to reduce the number of address arithmetic instructions and thus generate compact and efficient code. The simple offset assignment (SOA) problem concerns the layout of variables for machines with one address register, and the general offset assignment (GOA) problem deals with multiple address registers. Both of these problems assume that each variable needs to be allocated for the entire duration of a program. Both SOA and GOA are NP-complete. In this article, we present effective solutions using simulated annealing (SA) for the simple and the general offset assignment problems with variable coalescing, where two or more non-interfering variables can be mapped into the same memory location. Results on several benchmarks show a significant improvement of our proposed techniques over other heuristics in the literature.
ACM Transactions on Design Automation of Electronic Systems | 2012
Hassan Salamy; J. Ramanujam
Digital Signal Processors (DSPs) are a family of embedded processors designed under tight memory, area, and cost constraints. Many DSPs use irregular addressing modes where base-plus-offset mode is not supported. However, they often have Address Generation Units (AGUs) that can perform auto-increment/decrement address arithmetic in parallel with Load/Store instructions. This feature can be used to reduce the number of explicit address arithmetic instructions and thus reduce the embedded application code size. This code size reduction is essential for this family of DSPs, as the code usually resides in the ROM and hence the code size directly translates into silicon area. An effective technique for optimized code generation is offset assignment. This technique is widely used in the literature to decrease the code size by finding an offset assignment that can effectively utilize auto-increment/decrement. The problem is known as Simple Offset Assignment (SOA) when there is only one address register and as General Offset Assignment (GOA) when multiple address registers are available. In this article, we present an optimal Integer Linear Programming (ILP) solution to the offset assignment problem with variable coalescing, where more than one variable can share the same memory location. Variable permutation is also formulated to find the access sequence that yields the offset assignment with the largest code size reduction. Experimental results on several benchmarks show the effectiveness of our variable permutation technique as well as the large improvement of the ILP-based solutions over heuristics.
International Journal of Electronics | 2015
Hassan Salamy
The trend nowadays is to utilise multiple processors to overcome the limited additional performance that can be extracted from a single core. This adds to the challenge of task scheduling on such architectures. Task scheduling should consider the power consumption of concurrently running tasks to avoid exceeding the maximum power limit. However, power-aware scheduling often does not guarantee thermal safety. Thermal safety means keeping the temperatures of all system components under the maximum allowable temperature at all times. High temperatures can reduce the reliability and the overall functionality of the system. This implies that thermal-aware task scheduling is essential to reduce the system hotspots. In this article, we propose effective solutions to power- and thermal-aware scheduling based on an integer linear formulation and genetic algorithms. Results on benchmarks demonstrate the effectiveness and usefulness of the proposed techniques.
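The power constraint on concurrently running tasks can be checked with a simple sweep over task start and end times. The sketch below is an illustration only: the schedule, the power values, and the `POWER_CAP` limit are hypothetical, and it covers power only. As the abstract notes, staying under a power limit does not by itself guarantee thermal safety, which would need a temperature model.

```python
def peak_power(schedule):
    """Return the highest total power drawn at any instant.

    schedule is a list of (start, duration, power) tuples, one per task.
    A task that ends at time t is treated as finished before a task
    that starts at time t.
    """
    events = []
    for start, duration, power in schedule:
        events.append((start, power))               # task begins drawing power
        events.append((start + duration, -power))   # task stops drawing power
    # Sorting by (time, delta) places releases before starts at equal times.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


POWER_CAP = 5  # hypothetical maximum power limit

# Hypothetical schedules: (start time, duration, power).
overlapping = [(0, 4, 3), (2, 3, 2), (4, 2, 4)]
shifted = [(0, 4, 3), (2, 3, 2), (5, 2, 4)]   # third task delayed by one time unit

print(peak_power(overlapping) <= POWER_CAP)  # prints: False
print(peak_power(shifted) <= POWER_CAP)      # prints: True
```

A scheduler, whether ILP-based or a genetic algorithm, could use a check of this kind as a feasibility test while searching for the shortest schedule.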
Canadian Conference on Electrical and Computer Engineering | 2013
Hassan Salamy; Semih Aslan; Divya Methukumalli
Task scheduling on multicore systems for optimized power and energy consumption is essential as the use of multiple processors keeps increasing. Reducing power and energy consumption indirectly reduces core temperatures and helps eliminate hot spots to ensure overall thermal safety. In this paper, we propose an optimal ILP solution to task scheduling of different applications on a multicore system with power and energy constraints. Results on different benchmarks show the effectiveness of our techniques.
Embedded Systems for Real-Time Multimedia | 2008
Hassan Salamy; J. Ramanujam
Most modern digital signal processors (DSPs) provide multiple address registers and a dedicated address generation unit (AGU), which performs address generation in parallel with instruction execution. There is no address computation overhead if the next address is within the auto-modify range. A careful placement of variables in memory is used to decrease the number of address arithmetic instructions and thus to generate compact and efficient code. The simple offset assignment (SOA) problem concerns the layout of variables for machines with one address register, and the general offset assignment (GOA) problem deals with multiple address registers. Both of these problems assume that each variable needs to be allocated for the entire duration of a program. Both SOA and GOA are NP-complete. In this paper, we present an effective heuristic for the general offset assignment problem with variable coalescing (CGOA), where two or more non-interfering variables can be mapped into the same memory location. Results on several benchmarks show a significant improvement of our solution over other heuristics. Results were further improved using simulated annealing (SA).
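The effect of multiple address registers and of the auto-modify range can be sketched as a cost function. This is a minimal illustration under assumptions, not the CGOA heuristic itself: each variable is bound to one address register, the initial load of each register is not counted, and the access sequence and the names `register_of`, `slot_of`, and `modify_range` are invented for the example.

```python
def goa_cost(access_sequence, register_of, slot_of, modify_range=1):
    """Count explicit address-arithmetic instructions for a GOA solution.

    Each address register serves only the variables bound to it. Moving a
    register to its next variable is free when the slot distance is within
    the auto-modify range; otherwise one extra instruction is counted.
    """
    last_slot = {}   # most recent slot pointed to by each address register
    cost = 0
    for var in access_sequence:
        reg, slot = register_of[var], slot_of[var]
        if reg in last_slot and abs(slot - last_slot[reg]) > modify_range:
            cost += 1
        last_slot[reg] = slot
    return cost


# Hypothetical access sequence and memory layout.
sequence = ["a", "c", "b", "d", "a", "c", "b", "d"]
layout = {"a": 0, "b": 1, "c": 2, "d": 3}

# One address register for everything (the SOA case).
one_register = {"a": 0, "b": 0, "c": 0, "d": 0}

# Two address registers: AR0 serves a and b, AR1 serves c and d.
two_registers = {"a": 0, "b": 0, "c": 1, "d": 1}

print(goa_cost(sequence, one_register, layout))   # prints: 5
print(goa_cost(sequence, two_registers, layout))  # prints: 0
```

With two registers each one only alternates between adjacent slots, so every update falls within the auto-modify range. Choosing the binding and the layout together is the hard part, which is where heuristics and simulated annealing come in.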