Publication


Featured research published by Mohammad Fattah.


design automation conference | 2013

Smart hill climbing for agile dynamic mapping in many-core systems

Mohammad Fattah; Masoud Daneshtalab; Pasi Liljeberg; Juha Plosila

A stochastic hill climbing algorithm is adapted to rapidly find an appropriate start node for application mapping in network-based many-core systems. Because the workload of such systems is highly dynamic and unpredictable, an agile run-time task allocation scheme is required. The scheme should map the tasks of an incoming application at run-time onto an optimum contiguous area of the available nodes. Contiguous, unfragmented mapping places communicating tasks in close proximity, which significantly reduces the power dissipation, the congestion between different applications, and the latency of the system. To find an optimum region, we first propose an approximate model that quickly estimates the available area around a given node. Then the stochastic hill climbing algorithm is used as a search heuristic to find a node that has the required number of available nodes around it. The presented agile climber takes its steps using an adapted version of the hill climbing algorithm named Smart Hill Climbing (SHiC), which takes the run-time status of the system into account. Finally, the application mapping is performed starting from the selected first node. Experiments show a significant gain in mapping contiguousness, which results in better network latency and power dissipation compared to state-of-the-art works.
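The search idea in this abstract can be illustrated with a minimal Python sketch. It is not the SHiC algorithm itself: the occupancy grid, the square-window estimate in `free_area`, and the `climb` function are assumptions made for illustration, and the run-time status information that SHiC uses is not modelled.

```python
import random

def free_area(grid, r, c, radius):
    """Count free nodes (False = free) in the square window around (r, c)."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for i in range(max(0, r - radius), min(rows, r + radius + 1)):
        for j in range(max(0, c - radius), min(cols, c + radius + 1)):
            if not grid[i][j]:
                count += 1
    return count

def climb(grid, start, radius, needed, max_steps=100, rng=random):
    """Stochastic hill climbing: step to a random neighbour with a better score."""
    rows, cols = len(grid), len(grid[0])
    r, c = start
    for _ in range(max_steps):
        score = free_area(grid, r, c, radius)
        if score >= needed and not grid[r][c]:
            break  # enough free nodes around a free start node
        neighbours = [(r + dr, c + dc)
                      for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= r + dr < rows and 0 <= c + dc < cols]
        better = [n for n in neighbours
                  if free_area(grid, n[0], n[1], radius) > score]
        if not better:
            break  # local optimum: no neighbour improves the estimate
        r, c = rng.choice(better)
    return (r, c), free_area(grid, r, c, radius)

# Hypothetical 6x6 mesh whose two leftmost columns are occupied.
grid = [[col < 2 for col in range(6)] for _ in range(6)]
start_node, estimate = climb(grid, (0, 0), radius=1, needed=9)
print(start_node, estimate)
```

A plain hill climber like this one can stall on plateaus where no neighbour scores strictly higher, which is one reason a practical scheme needs more than the basic heuristic.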


asia and south pacific design automation conference | 2014

Adjustable contiguity of run-time task allocation in networked many-core systems

Mohammad Fattah; Pasi Liljeberg; Juha Plosila; Hannu Tenhunen

In this paper, we propose a run-time mapping algorithm, CASqA, for networked many-core systems. In this algorithm, the level of contiguousness of the allocated processors (α) can be adjusted in a fine-grained fashion. A strictly contiguous allocation (α = 0) decreases the latency and power dissipation of the network and improves the execution time of the applications. However, it limits the achievable throughput and increases the turnaround time of the applications. As a result, recent works consider non-contiguous allocation (α = 1) to improve the throughput at the expense of application execution time and network metrics. In contrast, our experiments show that a higher throughput (by 3%) with improved network performance can be achieved when using intermediate α values. More precisely, up to a 35% drop in network costs can be gained by adjusting the level of contiguity compared to non-contiguous cases, while the achieved throughput is kept constant. Moreover, CASqA provides at least 32% energy saving in the network compared to other works.
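To make the notion of contiguity concrete, the following Python sketch compares a compact and a dispersed allocation using the average pairwise Manhattan distance between allocated nodes. This metric and the function name `avg_pairwise_distance` are assumptions chosen for illustration; the abstract does not state how CASqA measures contiguity or how α enters the algorithm.

```python
from itertools import combinations

def avg_pairwise_distance(nodes):
    """Average Manhattan distance over all pairs of allocated (row, col) nodes."""
    pairs = list(combinations(nodes, 2))
    total = sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in pairs)
    return total / len(pairs)

# Two hypothetical four-node allocations on an 8x8 mesh.
contiguous = [(0, 0), (0, 1), (1, 0), (1, 1)]   # compact 2x2 block
dispersed = [(0, 0), (0, 7), (7, 0), (7, 7)]    # one node per corner

print(avg_pairwise_distance(contiguous))
print(avg_pairwise_distance(dispersed))
```

A lower value means communicating tasks sit closer together, which is the property that strictly contiguous allocation favours.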


parallel, distributed and network-based processing | 2014

Mixed-Criticality Run-Time Task Mapping for NoC-Based Many-Core Systems

Mohammad Fattah; Amir-Mohammad Rahmani; Thomas Canhao Xu; Anil Kanduri; Pasi Liljeberg; Juha Plosila; Hannu Tenhunen

Contiguous processor allocation improves both network and application performance by decreasing the probability of congestion among the communications of different applications. Consequently, the average, standard deviation, and worst-case latency of the network are decreased significantly. This makes contiguous allocation a good solution for time-critical applications with bounded deadlines. On the other hand, non-contiguous allocation increases the system throughput significantly: isolated nodes are utilized and more applications can finish their jobs per unit of time. However, this leads to poor network metrics, unsuitable for real-time applications. In this work, we combine these two approaches in order to manage workloads with mixed-criticality characteristics. Real-time applications are mapped contiguously, while non-critical applications are allowed to disperse over the available system nodes. Results show over 50% improvement in worst-case latency and a 100-fold improvement in deadline misses.
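The two-policy idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's mapping algorithm: contiguous allocation is approximated by a first-fit search for a free rectangle, dispersed allocation by taking the first free nodes in row-major order, and the names `find_rectangle`, `scattered`, and `map_app` are hypothetical.

```python
def find_rectangle(grid, height, width):
    """First-fit search for a fully free height x width block (False = free)."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows - height + 1):
        for c in range(cols - width + 1):
            cells = [(i, j) for i in range(r, r + height)
                     for j in range(c, c + width)]
            if all(not grid[i][j] for i, j in cells):
                return cells
    return None

def scattered(grid, count):
    """Take the first `count` free nodes in row-major order, wherever they are."""
    free = [(i, j) for i, row in enumerate(grid)
            for j, busy in enumerate(row) if not busy]
    return free[:count] if len(free) >= count else None

def map_app(grid, size, critical):
    """Map critical apps contiguously and non-critical apps onto any free nodes."""
    height, width = size
    if critical:
        cells = find_rectangle(grid, height, width)
    else:
        cells = scattered(grid, height * width)
    if cells is not None:
        for i, j in cells:
            grid[i][j] = True  # mark the nodes as occupied
    return cells

# Hypothetical 4x4 mesh with two nodes already occupied.
grid = [[False] * 4 for _ in range(4)]
grid[0][1] = grid[1][0] = True
print(map_app(grid, (2, 2), critical=True))
print(map_app(grid, (1, 3), critical=False))
```

In this sketch the non-critical application fills leftover isolated nodes, while the real-time application is refused (returns `None`) if no contiguous block is available.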


reconfigurable communication centric systems on chip | 2011

Exploration of MPSoC monitoring and management systems

Mohammad Fattah; Masoud Daneshtalab; Pasi Liljeberg; Juha Plosila

A well-suited monitoring and management system is becoming a necessity as the number of cores on single-chip systems increases. Some works have proposed monitoring systems to enable off-chip system debugging, while others have introduced monitoring approaches aimed at system self-awareness. The latter try to facilitate self-management of NoC-based MPSoCs in different aspects, such as power, performance, fault tolerance, and reconfigurability. In this paper, we discuss different solutions and present a qualitative comparison between them. Hierarchical agent-based management systems are also surveyed as a promising solution to cope with the fine- and coarse-grained demands of real-time network-based many-core architectures.


ieee computer society annual symposium on vlsi | 2010

A High Throughput Low Power FIFO Used for GALS NoC Buffers

Mohammad Fattah; Abdurrahman Manian; Abbas Rahimi; Siamak Mohammadi

In networks-on-chip, increasing the depth of routers’ buffers even by a few stages can have a significant effect on the average latency and saturation threshold of the network. However, the price to pay could be high in terms of power and silicon area. In this paper, we propose a low-power, high-throughput asynchronous FIFO suitable for the buffers of GALS NoC routers. We consistently compare the power, area, and throughput of our FIFO with several other FIFO structures, exploring their design trade-offs for various numbers of stages and different data lengths. These structures are simulated in 90 nm CMOS technology with accurate SPICE simulations, whose results show low power consumption and latency with higher throughput. Finally, a back-annotated HDL model of a 4x4 mesh network, in which a fully asynchronous router is implemented, shows better average latency, saturation threshold, and power trade-offs when using the proposed FIFO.


networks on chips | 2015

A Low-Overhead, Fully-Distributed, Guaranteed-Delivery Routing Algorithm for Faulty Network-on-Chips

Mohammad Fattah; Antti Airola; Rachata Ausavarungnirun; Nima Mirzaei; Pasi Liljeberg; Juha Plosila; Siamak Mohammadi; Tapio Pahikkala; Onur Mutlu; Hannu Tenhunen

This paper introduces a new, practical routing algorithm, Maze-routing, to tolerate faults in networks-on-chip. The algorithm is the first to provide all of the following properties at the same time: 1) fully distributed operation with no centralized component, 2) guaranteed delivery (it guarantees delivery of packets when a path exists between nodes and otherwise indicates that the destination is unreachable, while being deadlock- and livelock-free), 3) low area cost, and 4) low reconfiguration overhead upon a fault. To achieve all these properties, we propose Maze-routing, a new variant of face routing in on-chip networks, and make use of deflections in routing. Our evaluations show that Maze-routing has 16X less area overhead than other algorithms that provide guaranteed delivery. Our Maze-routing algorithm also delivers high performance: for example, when up to 5 links are broken, it provides 50% higher saturation throughput compared to the state-of-the-art.
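The guaranteed-delivery property (deliver if any path exists, otherwise report the destination as unreachable) can be stated precisely with a small reference check in Python. This is emphatically not Maze-routing: it is a centralized breadth-first search with global knowledge of the broken links, written only as an oracle against which a distributed router's outcome could be compared. The function name `reachable_path` and the link representation are assumptions.

```python
from collections import deque

def reachable_path(rows, cols, broken, src, dst):
    """Return a shortest path from src to dst on a mesh, or None if unreachable.

    `broken` is a set of frozenset({node_a, node_b}) for each faulty link.
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in prev
                    and frozenset((node, nxt)) not in broken):
                prev[nxt] = node
                queue.append(nxt)
    return None  # destination is cut off by faults

# Hypothetical 2x2 mesh in which both links into node (1, 1) are broken.
broken = {frozenset(((0, 1), (1, 1))), frozenset(((1, 0), (1, 1)))}
print(reachable_path(2, 2, broken, (0, 0), (1, 1)))
```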


design automation conference | 2014

SHiFA: System-Level Hierarchy in Run-Time Fault-Aware Management of Many-Core Systems

Mohammad Fattah; Maurizio Palesi; Pasi Liljeberg; Juha Plosila; Hannu Tenhunen

A system-level approach to fault-aware resource management of many-core systems is proposed. The proposed approach, called SHiFA, is able to tolerate run-time faults at the system level without any hardware overhead. In contrast to existing system-level methods, network resources are also considered to be potentially faulty. Accordingly, applications are mapped onto healthy nodes of the system at run-time such that their interaction will not require the use of faulty elements. By utilizing the simple routing approach, results show 100% utilizability of PEs and 99.41% successful mapping when up to 8 links are broken. The SHiFA design is based on distributed operating systems, so that it remains scalable for future many-core systems. A significant improvement in scalability properties is observed compared to state-of-the-art distributed approaches.
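The constraint that task interaction must not use faulty elements can be sketched as a mapping check in Python. The sketch assumes dimension-ordered XY routing, which the abstract does not name, and the helpers `xy_route` and `mapping_is_safe` are hypothetical; it only verifies a given placement and does not reproduce SHiFA's hierarchical management.

```python
def xy_route(src, dst):
    """Links used when routing along the column (X) first, then the row (Y)."""
    (r, c), (dst_r, dst_c) = src, dst
    links = []
    while c != dst_c:
        next_c = c + (1 if dst_c > c else -1)
        links.append(((r, c), (r, next_c)))
        c = next_c
    while r != dst_r:
        next_r = r + (1 if dst_r > r else -1)
        links.append(((r, c), (next_r, c)))
        r = next_r
    return links

def mapping_is_safe(placement, edges, faulty_links):
    """True if no sender-to-receiver route crosses a faulty link."""
    faulty = {frozenset(link) for link in faulty_links}
    for sender, receiver in edges:
        for link in xy_route(placement[sender], placement[receiver]):
            if frozenset(link) in faulty:
                return False
    return True

# Hypothetical two-task application placed on diagonal nodes.
placement = {"t0": (0, 0), "t1": (1, 1)}
edges = [("t0", "t1")]
print(mapping_is_safe(placement, edges, [((0, 1), (1, 1))]))
print(mapping_is_safe(placement, edges, [((0, 0), (1, 0))]))
```

A run-time mapper could use such a check to reject candidate placements and try another region instead of adding fault-tolerant routing hardware.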


international conference on future information technology | 2010

History-Based Dynamic Voltage Scaling with Few Number of Voltage Modes for GALS NoC

Abbas Rahimi; Mostafa E. Salehi; Mohammad Fattah; Siamak Mohammadi

In this paper, we propose a history-based dynamic voltage scaling (DVS) scheme for a GALS NoC which is suitable for MPSoC architectures. The DVS scheme exploits the link utilization and adjusts the router voltages among a small number of voltage modes. The introduced architecture is simulated in 90 nm CMOS technology with accurate SPICE simulations. Experimental results show that history-based DVS successfully adjusts router voltages to track actual link utilization over time. In comparison to a system in which the voltage is fixed at 1.0 V, in a 91% saturated network, the proposed DVS scheme reduces dynamic and leakage power by factors of about 3.3 and 2.3, respectively. In addition, a 17% energy-delay saving is achieved under the same traffic load.
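A history-based mode selection of this kind can be sketched in Python as follows. The exponentially weighted average, the `weight` parameter, the utilization thresholds, and the voltage values are all assumptions made for illustration; the abstract does not specify how the history is combined or which voltage modes are used.

```python
def select_mode(history, modes, weight=0.5):
    """Predict utilization from past samples and pick a voltage mode.

    history: link-utilization samples in [0, 1], oldest first.
    modes:   (min_utilization, voltage) pairs sorted by ascending threshold.
    """
    predicted = history[0]
    for sample in history[1:]:
        # Exponentially weighted average: recent samples count more.
        predicted = weight * sample + (1 - weight) * predicted
    voltage = modes[0][1]
    for threshold, mode_voltage in modes:
        if predicted >= threshold:
            voltage = mode_voltage  # highest mode whose threshold is met
    return predicted, voltage

# Hypothetical three-mode table: (utilization threshold, supply voltage in V).
modes = [(0.0, 0.6), (0.5, 0.8), (0.8, 1.0)]
print(select_mode([0.2, 0.4, 0.8], modes))
```

Keeping the mode table small mirrors the abstract's point that only a few voltage modes are needed, which limits the cost of level shifting between routers.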


reconfigurable communication centric systems on chip | 2012

Transport layer aware design of network interface in many-core systems

Mohammad Fattah; Masoud Daneshtalab; Pasi Liljeberg; Juha Plosila

In this paper, we investigate the design of network interfaces (NIs) which have knowledge about the transport layer and networking protocols of many-core systems. Workload dynamicity and multitasking are two main features of many-core systems, which are handled by relatively small kernels on each core. In the message-passing paradigm, the kernel also acts as the transport layer interface through which tasks exchange packets. However, the networking overhead of the kernel cripples the real network performance. The proposed NI eases the networking job of kernels and reduces their performance bottleneck. This is done on the receiver side of the NI by depacketizing, storing, and retrieving transport packets at the hardware level. Simulation results show up to a 4-fold reduction in network packet latency as well as up to a 4.7-fold enhancement in the achievable bandwidth in the transport layer. Furthermore, the worst-case latency of the network becomes significantly more balanced, which makes the system more reliable and predictable for real-time and stream applications.


asia symposium on quality electronic design | 2010

A high-throughput, metastability-free GALS channel based on pausible clock method

Mohammad Ali Rahimian; Siamak Mohammadi; Mohammad Fattah

Synchronization issues such as metastability in multi-clock-domain systems have become a major problem, reducing data transmission throughput between domains. In this paper, a high-throughput, metastability-free data transmission channel based on the pausible clock method in Globally-Asynchronous Locally-Synchronous (GALS) systems is proposed. This channel can be used as the interconnection of mixed-clock synchronous IP cores without concerns about their synchronization. We show that the probability of metastability in our design is practically zero, and that this is achieved without a penalty in throughput or latency, allowing the transmitter and receiver to operate at their own maximum clock frequencies. The proposed channel is simulated in a 90 nm CMOS process using the Predictive Technology Model (PTM) library. Gate delays and power parameters are extracted from SPICE simulations and back-annotated into our channel HDL code. The throughput, latency, and power are analyzed and compared with existing designs.

Collaboration


Dive into Mohammad Fattah's collaboration.

Top Co-Authors

Juha Plosila

Information Technology University

Hannu Tenhunen

Royal Institute of Technology

Amir-Mohammad Rahmani

Information Technology University

Masoud Daneshtalab

Mälardalen University College

Mohammad Hashem Haghbayan

Information Technology University

Abbas Rahimi

University of California

Anil Kanduri

Information Technology University

Igor Tcarenko

Information Technology University

Kameswar Rao Vaddina

Information Technology University
