Masoud Daneshtalab
Mälardalen University College
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Masoud Daneshtalab.
Journal of Systems Architecture | 2010
Pejman Lotfi-Kamran; Amir-Mohammad Rahmani; Masoud Daneshtalab; Ali Afzali-Kusha; Zainalabedin Navabi
In this paper, an adaptive routing algorithm for two-dimensional mesh network-on-chips (NoCs) is presented. The algorithm, which is based on Dynamic XY (DyXY), is called Enhanced Dynamic XY (EDXY). It is congestion-aware and more link failure tolerant compared to the DyXY algorithm. On contrary to the DyXY algorithm, it can avoid the congestion when routing from the current switch to the destination whose X position (Y position) is exactly one unit apart from the switch X position (Y position). This is achieved by adding two congestion wires (one in each direction) between each two cores which indicate the existence of congestion in a row (column). The same wires may be used to alarm a link failure in a row (column). These signals enable the routing algorithm to avoid these paths when there are other paths between the source and destination pair. To assess the latency of the proposed algorithm, uniform, transpose, hotspot, and realistic traffic profiles for packet injection are used. The simulation results reveal that EDXY can achieve lower latency compared to those of other adaptive routing algorithms across all workloads examined, with a 20% average and 30% maximum latency reduction on SPLASH-2 benchmarks running on a 49-core CMP. The area of the technique is about the same as those of the other routing algorithms.
design automation conference | 2013
Mohammad Fattah; Masoud Daneshtalab; Pasi Liljeberg; Juha Plosila
Stochastic hill climbing algorithm is adapted to rapidly find the appropriate start node in the application mapping of network-based many-core systems. Due to highly dynamic and unpredictable workload of such systems, an agile run-time task allocation scheme is required. The scheme is desired to map the tasks of an incoming application at run-time onto an optimum contiguous area of the available nodes. Contiguous and unfragmented area mapping is to settle the communicating tasks in close proximity. Hence, the power dissipation, the congestion between different applications, and the latency of the system will be significantly reduced. To find an optimum region, we first propose an approximate model that quickly estimates the available area around a given node. Then the stochastic hill climbing algorithm is used as a search heuristic to find a node that has the required number of available nodes around it. Presented agile climber takes the steps using an adapted version of hill climbing algorithm named Smart Hill Climbing, SHiC, which takes the runtime status of the system into account. Finally, the application mapping is performed starting from the selected first node. Experiments show significant gain in the mapping contiguousness which results in better network latency and power dissipation, compared to state-of-the-art works.
networks on chips | 2012
Masoumeh Ebrahimi; Masoud Daneshtalab; Fahimeh Farahnakian; Juha Plosila; Pasi Liljeberg; Maurizio Palesi; Hannu Tenhunen
The occurrence of congestion in on-chip networks can severely degrade the performance due to increased message latency. In mesh topology, minimal methods can propagate messages over two directions at each switch. When shortest paths are congested, sending more messages through them can deteriorate the congestion condition considerably. In this paper, we present an adaptive routing algorithm for on-chip networks that provide a wide range of alternative paths between each pair of source and destination switches. Initially, the algorithm determines all permitted turns in the network including 180-degree turns on a single channel without creating cycles. The implementation of the algorithm provides the best usage of all allowable turns to route messages more adaptively in the network. On top of that, for selecting a less congested path, an optimized and scalable learning method is utilized. The learning method is based on local and global congestion information and can estimate the latency from each output channel to the destination region.
design, automation, and test in europe | 2012
Masoumeh Ebrahimi; Masoud Daneshtalab; Pasi Liljeberg; Juha Plosila; Hannu Tenhunen
Congestion occurs frequently in Networks-on-Chip when the packets demands exceed the capacity of network resources. Congestion-aware routing algorithms can greatly improve the network performance by balancing the traffic load in adaptive routing. Commonly, these algorithms either rely on purely local congestion information or take into account the congestion conditions of several nodes even though their statuses might be out-dated for the source node, because of dynamically changing congestion conditions. In this paper, we propose a method to utilize both local and non-local network information to determine the optimal path to forward a packet. The non-local information is gathered from the nodes that not only are more likely to be chosen as intermediate nodes in the routing path but also provide up-to-date information to a given node. Moreover, to collect and deliver the non-local information, a distributed propagation system is presented.
IEEE Transactions on Computers | 2014
Masoumeh Ebrahimi; Masoud Daneshtalab; Pasi Liljeberg; Juha Plosila; Jose Flich; Hannu Tenhunen
Combining the benefits of 3D ICs and Networks-on-Chip (NoCs) schemes provides a significant performance gain in Chip Multiprocessors (CMPs) architectures. As multicast communication is commonly used in cache coherence protocols for CMPs and in various parallel applications, the performance of these systems can be significantly improved if multicast operations are supported at the hardware level. In this paper, we present several partitioning methods for the path-based multicast approach in 3D mesh-based NoCs, each with different levels of efficiency. In addition, we develop novel analytical models for unicast and multicast traffic to explore the efficiency of each approach. In order to distribute the unicast and multicast traffic more efficiently over the network, we propose the Minimal and Adaptive Routing (MAR) algorithm for the presented partitioning methods. The analytical and experimental results show that an advantageous method named Recursive Partitioning (RP) outperforms the other approaches. RP recursively partitions the network until all partitions contain a comparable number of switches and thus the multicast traffic is equally distributed among several subsets and the network latency is considerably decreased. The simulation results reveal that the RP method can achieve performance improvement across all workloads while performance can be further improved by utilizing the MAR algorithm. Nineteen percent average and 42 percent maximum latency reduction are obtained on SPLASH-2 and PARSEC benchmarks running on a 64-core CMP.
international conference on computer design | 2012
Mohamamd Fattah; Marco Ramirez; Masoud Daneshtalab; Pasi Liljeberg; Juha Plosila
Increasing the number of processors in a single chip toward network-based many-core systems requires a run-time task allocation algorithm. We propose an efficient mapping algorithm that assigns communicating tasks of incoming applications onto resources of a many-core system utilizing Network-on-Chip paradigm. In our contiguous neighborhood allocation (CoNA) algorithm, we target at the reduction of both internal and external congestion due to detrimental impact of congestion on the network performance. We approach the goal by keeping the mapped region contiguous and placing the communicating tasks in a close neighborhood. A completely synthesizable simulation environment where none of the system objects are assumed to be ideal is provided. Experiments show at least 40% gain in different mapping cost functions, as well as 16% reduction in average network latency compared to existing algorithms.
asia and south pacific design automation conference | 2013
Masoumeh Ebrahimi; Masoud Daneshtalab; Juha Plosila; Farhad Mehdipour
The communication requirements of many-core embedded systems are convened by the emerging Network-on-Chip (NoC) paradigm. As on-chip communication reliability is a crucial factor in many-core systems, the NoC paradigm should address the reliability issues. Using fault-tolerant routing algorithms to reroute packets around faulty regions will increase the packet latency and create congestion around the faulty region. On the other hand, the performance of NoC is highly affected by the network congestion. Congestion in the network can increase the delay of packets to route from a source to a destination, so it should be avoided. In this paper, a minimal and defect-resilient (MD) routing algorithm is proposed in order to route packets adaptively through the shortest paths in the presence of a faulty link, as long as a path exists. To avoid congestion, output channels can be adaptively chosen whenever the distance from the current to destination node is greater than one hop along both directions. In addition, an analytical model is presented to evaluate MD for two-faulty cases.
Archive | 2013
Maurizio Palesi; Masoud Daneshtalab
This book provides a single-source reference to routing algorithms for Networks-on-Chip (NoCs), as well as in-depth discussions of advanced solutions applied to current and next generation, many core NoC-based Systems-on-Chip (SoCs). After a basic introduction to the NoC design paradigm and architectures, routing algorithms for NoC architectures are presented and discussed at all abstraction levels, from the algorithmic level to actual implementation. Coverage emphasizes the role played by the routing algorithm and is organized around key problems affecting current and next generation, many-core SoCs. A selection of routing algorithms is included, specifically designed to address key issues faced by designers in the ultra-deep sub-micron (UDSM) era, including performance improvement, power, energy, and thermal issues, fault tolerance and reliability.
design, automation, and test in europe | 2008
Pejman Lotfi-Kamran; Masoud Daneshtalab; Caro Lucas; Zainalabedin Navabi
A novel routing algorithm, named balanced adaptive routing protocol (BARP), is proposed for NoCs to provide adaptive routing and ensure deadlock-free and livelock-free routing at the same time. By evenly distributing input packets of a router among all its shortest path output ports, a novel adaptive routing protocol for avoiding congestion condition emerges. It is observed that BARP can achieve better performance compared to static XY routing, odd- even routing and dynamic XY routing.
digital systems design | 2012
Masoumeh Ebrahimi; Masoud Daneshtalab; Juha Plosila; Hannu Tenhunen
While Networks-on-Chip have been increasing in popularity with industry and academia, it is threatened by the decreasing reliability of aggressively scaled transistors. This level of failure has architectural level ramifications, as it may cause an entire on-chip network to fail. Traditional fault-tolerant routing algorithms can overcome the faulty links or routers by rerouting packets around faulty regions. These approaches increase the packet latency and create congestion around the faulty region. In this paper, we present a novel fault-tolerant method that is able to route packets through shortest paths in the presence of faulty links, as long as a path exists. Although the same idea can be applied to a network with any number of virtual channels, we utilize two virtual channels to tolerate all one and two faulty links. Finally, the method is extended to support multiple faulty links by fully utilizing all allowable turns in the network.