Frank Olaf Sem-Jacobsen
Simula Research Laboratory
Publication
Featured research published by Frank Olaf Sem-Jacobsen.
international symposium on system-on-chip | 2009
Tor Skeie; Frank Olaf Sem-Jacobsen; Samuel Rodrigo; Jose Flich; Davide Bertozzi; Simone Medardoni
The expected increase in the number of cores on a single chip makes high-performance on-chip interconnects (networks-on-chip, NoC) a necessity. Furthermore, in order to fully utilize the abundance of cores, the chip is expected to support a number of applications running simultaneously. It is therefore necessary to partition the chip to support numerous applications without any risk of interference between them. The success of this depends on the flexibility of the underlying routing algorithm. This paper presents a flexible routing algorithm based on dimension-ordered routing, which supports a large variety of irregular (2-D and 3-D) mesh topologies. The algorithm provides high efficiency at very low additional complexity, as is confirmed by experimental results.
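For readers unfamiliar with the baseline, plain dimension-ordered (XY) routing on a regular 2-D mesh can be sketched as follows. This is only an illustration of the general technique the abstract builds on, not the flexible algorithm the paper presents; the function name `xy_route` and the coordinate convention are assumptions made for this example.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst on a regular 2-D mesh.

    Plain dimension-ordered routing: correct the X coordinate fully,
    then the Y coordinate. Hypothetical helper for illustration only;
    it does not handle irregular meshes or faults.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # First dimension: move along X until the column matches.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Second dimension: move along Y until the row matches.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path
```

Calling `xy_route((0, 0), (2, 1))` visits `(0, 0)`, `(1, 0)`, `(2, 0)`, `(2, 1)`. Because every packet finishes its X hops before taking any Y hop, no packet ever turns from Y back to X, which is what makes the scheme deadlock free on a regular mesh.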
international parallel and distributed processing symposium | 2005
Frank Olaf Sem-Jacobsen; Tor Skeie; Olav Lysne; O. Toerudbakken; E. Rongved; Bjørn Dag Johnsen
Fat-trees are a special case of multistage interconnection networks with quite good static fault tolerance capabilities. However, they cannot straightforwardly provide local dynamic fault tolerance. In this paper we propose a network topology based on the fat-tree that uses two parallel networks with crossover links between them in an effort to enable dynamic fault tolerance. We evaluate and compare this topology with two other similar fat-tree topologies and show through simulations that the new topology slightly improves the ability to tolerate faults statically. More importantly, we show that the new network topology is the only one of the evaluated topologies able to tolerate one fault dynamically, with superior network performance in the face of dynamically handled faults.
cluster computing and the grid | 2012
Frank Olaf Sem-Jacobsen; Olav Lysne
Tolerating faults in the interconnection network is of vital importance in today's huge computer installations. Still, the existing solutions fall short of being satisfactory. They require that the system default to a routing algorithm that is inferior to the original, either in terms of performance, or in terms of the need for virtual channels, or both. Furthermore, since dynamic reconfiguration is not supported in current hardware, existing methods require the system to be halted while reconfiguration takes place in order to avoid deadlocks. In this paper we present a method that efficiently generates a new routing function in the presence of faults. The new routing function only reroutes the traffic that is affected by the fault, so that the performance of the original routing function is preserved to the extent possible. No specific functionality in the switches is required; we require exactly the same number of virtual channels in the presence of faults as the original routing algorithm did. Finally, the new routing function is compatible with the old one, so that deadlock-free dynamic transition between the old and the new routing function is immediately available. This means that our solution can easily be implemented on current InfiniBand platforms, e.g. through the OFED software stack. We demonstrate that the method is workable for meshes, tori and fat-trees, and that it is able to guarantee one-fault tolerance for all of these topologies.
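The idea of rerouting only the traffic affected by a fault can be sketched roughly as below. This is a simplified illustration, not the paper's method: it replaces affected paths with breadth-first shortest paths and makes no attempt to preserve deadlock freedom or virtual-channel requirements, which are the substance of the actual contribution. All names (`bfs_path`, `uses_link`, `reroute_affected`) and the data layout are assumptions for this example.

```python
from collections import deque


def bfs_path(adj, src, dst, failed):
    """Shortest path from src to dst that avoids the failed link, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in prev and frozenset((node, nxt)) != failed:
                prev[nxt] = node
                queue.append(nxt)
    return None


def uses_link(path, failed):
    """True if any hop of the path crosses the failed (undirected) link."""
    return any(frozenset(hop) == failed for hop in zip(path, path[1:]))


def reroute_affected(adj, routes, failed_link):
    """Recompute only the routes that cross failed_link; keep the rest as-is."""
    failed = frozenset(failed_link)
    new_routes = {}
    for (src, dst), path in routes.items():
        if uses_link(path, failed):
            new_routes[(src, dst)] = bfs_path(adj, src, dst, failed)
        else:
            new_routes[(src, dst)] = path  # unaffected traffic is untouched
    return new_routes
```

On a four-node ring with `adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}`, failing the link between nodes 1 and 2 changes the route from 0 to 2 from `[0, 1, 2]` to `[0, 3, 2]`, while a route such as `[3, 2]` is returned unchanged.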
IEEE Transactions on Computers | 2011
Frank Olaf Sem-Jacobsen; Tor Skeie; Olav Lysne; José Duato
Fat trees are a very common communication architecture in current large-scale parallel computers. The probability of failure in these systems increases with the number of components. We present a routing method for deterministically and adaptively routed fat trees, applicable to both distributed and source routing, that is able to handle several concurrent faults and that transparently returns to the original routing strategy once the faulty components have recovered. The method is local and dynamic, completely masking the fault from the rest of the system. It requires only a small amount of extra functionality in the switches to reroute packets around a fault. The method guarantees connectedness, deadlock freedom, and livelock freedom for up to k - 1 benign simultaneous switch and/or link faults, where k is half the number of ports in the switches. Our simulation experiments show a graceful degradation of performance as more faults occur. Furthermore, we demonstrate that for most fault combinations, our method can, with high probability, handle significantly more faults than the k - 1 limit.
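As a worked instance of the bound stated above, the guaranteed number of tolerated faults follows directly from the switch port count: k is half the number of ports, and the guarantee covers up to k - 1 faults. The function name below is hypothetical, and the even-port-count check is an assumption added so that "half the number of ports" is well defined.

```python
def guaranteed_fault_limit(num_switch_ports: int) -> int:
    """Return k - 1, where k is half the number of switch ports.

    Hypothetical helper for illustration; it only evaluates the bound
    quoted in the abstract and models nothing about the routing itself.
    """
    if num_switch_ports < 2 or num_switch_ports % 2 != 0:
        raise ValueError("expected an even port count of at least 2")
    k = num_switch_ports // 2
    return k - 1
```

For example, a 36-port switch gives k = 18, so the guarantee covers up to 17 simultaneous faults.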
Proceedings of the Fifth International Workshop on Interconnection Network Architecture | 2011
Frank Olaf Sem-Jacobsen; Samuel Rodrigo; Tor Skeie
Many-core chip design requires flexible routing solutions for the interconnect to handle faults, provide performance partitions, and react to dynamic changes in processing requirements and power/heat distribution. We have developed a logic-based rerouting mechanism suitable for tolerating dynamic powering down of regions within the application partition on the chip. This mechanism is combined with the logic-based FDOR routing algorithm to create a powerful routing algorithm with low implementation cost. This allows for higher system utilisation by enabling more efficient power management and by supporting many irregular mesh topologies through flexible virtualisation. Results show that powering down a single switch results in an 8% throughput reduction in the worst case for the evaluated topology.
international conference on parallel processing | 2006
Frank Olaf Sem-Jacobsen; Tor Skeie; Olav Lysne; José Duato
Fault tolerance is critical for efficient utilisation of large computer systems. Dynamic fault tolerance allows the network to remain available through the occurrence of faults, as opposed to static fault tolerance, which requires the network to be halted to reconfigure it. Although dynamic fault tolerance may lead to less efficient solutions than static fault tolerance, it allows for a much higher availability of the system. In this paper we devise a dynamic fault-tolerant adaptive routing algorithm for the fat tree, a much-used interconnect topology, which relies on misrouting around link faults. We show that we are guaranteed to tolerate any combination of fewer than (num_switch_ports)/2 link faults without the need for additional network resources for deadlock freedom. There is also a high probability of tolerating an even larger number of link faults. Simulation results show that network performance degrades very little when faults are dynamically tolerated.
international conference on parallel and distributed systems | 2010
Bartosz Bogdanski; Frank Olaf Sem-Jacobsen; Sven-Arne Reinemo; Tor Skeie; Line Holen; Lars Paul Huse
The fat-tree topology has become a popular choice for InfiniBand fabrics due to its inherent deadlock freedom, fault tolerance and full bisection bandwidth. InfiniBand is used by more than 40% of the systems on the latest Top 500 list, and many of these systems are based on a fat-tree topology. However, the current InfiniBand fat-tree routing algorithm suffers from flaws that reduce its scalability and flexibility. Counter-intuitively, the achievable throughput per node deteriorates both when the number of nodes in a tree decreases and when the node distribution among leaves is nonuniform. In this paper, we identify the weaknesses of the current enhanced fat-tree routing algorithm in the Open Fabrics Enterprise Distribution and propose extensions to it that alleviate all performance problems related to node distribution. The new algorithm is implemented in OpenSM for real-world evaluation and for future contribution to the Open Fabrics community. We demonstrate that our solution achieves a predictably high throughput regardless of the number of nodes and their distribution. Furthermore, the simulations show that our extensions improve throughput by up to 30%, depending on topology size and node distribution.
high performance embedded architectures and compilers | 2012
Bartosz Bogdanski; Sven-Arne Reinemo; Frank Olaf Sem-Jacobsen; Ernst Gunnar Gran
Existing fat-tree routing algorithms fully exploit the path diversity of a fat-tree topology in the context of compute node traffic, but they lack support for deadlock-free and fully connected switch-to-switch communication. Such support is crucial for efficient system management, for example, in InfiniBand (IB) systems. With the general increase in system management capabilities found in modern InfiniBand switches, the lack of deadlock-free switch-to-switch communication is a problem for fat-tree-based IB installations because management traffic might cause routing deadlocks that bring the whole system down. This lack of deadlock-free communication affects all system management and diagnostic tools using LID routing. In this paper, we propose the sFtree routing algorithm that guarantees deadlock-free and fully connected switch-to-switch communication in fat-trees while maintaining the properties of the current fat-tree algorithm. We prove that the algorithm is deadlock free and we implement it in OpenSM for evaluation. We evaluate the performance of the sFtree algorithm experimentally on a small cluster and we do a large-scale evaluation through simulations. The results confirm that the sFtree routing algorithm is deadlock-free and show that the impact of switch-to-switch management traffic on the end-node traffic is negligible.
symposium on computer architecture and high performance computing | 2006
Frank Olaf Sem-Jacobsen; Olav Lysne; Tor Skeie
An increasing number of interconnect technologies rely on source routing to forward packets through the network. It is therefore important to develop methods for fault tolerance that are well suited for source-routed networks. Dynamic fault tolerance allows the network to remain available through the occurrence of faults, as opposed to static fault tolerance, which requires the network to be halted to reconfigure it. Source routing readily supports the source node choosing a different path when a fault occurs, but with this approach, packets already in the network will be lost. Local dynamic fault tolerance, where the packet is routed around the fault locally, would prevent much of the traffic from being lost during failures, but this is cumbersome to achieve in source-routed networks since packets encountering a fault will need to follow a path different from that encoded in the packet header. In this paper we present a mechanism to achieve local dynamic fault tolerance in source-routed fat trees, a topology that has widespread use in supercomputer systems, and compare it with endpoint dynamic fault tolerance. We also show that by combining the two approaches we achieve performance superior to either of the two individually.
international parallel and distributed processing symposium | 2008
Frank Olaf Sem-Jacobsen; Olav Lysne
Fault tolerance has become an important part of current supercomputers. Local dynamic fault tolerance is the most expedient way of tolerating faults; it works by preconfiguring the network with multiple paths from every node/switch to every destination. In this paper we present a local shortest-path dynamic fault-tolerance mechanism, inspired by a solution developed for the Internet, that can be applied to any shortest-path routing algorithm such as dimension-ordered routing, fat-tree routing, layered shortest path, etc., and we provide a solution for achieving deadlock freedom in the presence of faults. Simulation results show that 1) for fat trees this yields the highest throughput and the lowest requirements on virtual layers reported to date for dynamic one-fault tolerance, 2) in general we require few layers to achieve deadlock freedom, and 3) for irregular topologies it gives up to a tenfold performance increase compared to FRoots.