Ikki Fujiwara

National Institute of Informatics

Publication

Featured research published by Ikki Fujiwara.

symposium on applications and the internet | 2010

Applying Double-Sided Combinational Auctions to Resource Allocation in Cloud Computing

Ikki Fujiwara; Kento Aida; Isao Ono

We believe that market-based resource allocation will be effective in a cloud computing environment where resources are virtualized and delivered to users as services. We propose such a market mechanism to allocate services to participants efficiently. The mechanism enables users (1) to order a combination of services for workflows and co-allocations and (2) to reserve future/current services in a forward/spot market. The evaluation shows that the mechanism works well in probable settings.

design, automation, and test in europe | 2014

Low-latency wireless 3D NoCs via randomized shortcut chips

Hiroki Matsutani; Michihiro Koibuchi; Ikki Fujiwara; Takahiro Kagami; Yasuhiro Take; Tadahiro Kuroda; Paul Bogdan; Radu Marculescu; Hideharu Amano

In this paper, we demonstrate that we can reduce the communication latency significantly by inserting a fraction of randomness into a wireless 3D NoC (where CMOS wireless links are used for vertical inter-chip communication) when considering the physical constraints of the 3D design space. Towards this end, we consider two cases, namely 1) replacing existing horizontal 2D links in a wireless 3D NoC with randomized shortcut NoC links and 2) enabling full connectivity by adding a randomized NoC layer to a wireless 3D platform with partial or no horizontal connectivity. Consequently, the packet routing is optimized by exploiting both the existing and the newly added random NoC. At the same time, by adding randomly wired shortcut NoCs to a wireless 3D platform, a good balance can be established between the modularity of the design and the minimum randomness needed to achieve low latency. Experimental results show that by adding a random NoC chip to wireless 3D CMPs without built-in horizontal connectivity, the communication latency can be reduced by as much as 26.2% when compared to adding a 2D mesh NoC. Also, the application execution time and average flit transfer energy can be improved accordingly.

high-performance computer architecture | 2013

Layout-conscious random topologies for HPC off-chip interconnects

Michihiro Koibuchi; Ikki Fujiwara; Hiroki Matsutani; Henri Casanova

As the scales of parallel applications and platforms increase, the negative impact of communication latencies on performance becomes large. Random network topologies can be used to achieve low hop counts between nodes and thus low latency. However, random topologies lead to increased aggregate cable length and cable packaging complexity on a machine room floor. In this work we propose two new methods for generating random topologies and their physical layout on a floorplan: randomize links after optimizing the physical layout, or optimize the layout after randomizing links. The first method randomly swaps link endpoints for a given non-random topology for which a good physical layout is known. The resulting topology has the same cable length and cable packaging as the original topology, but achieves lower communication latency. The second method creates a random topology with random links picked so that they will not lead to a long physical cable length, and then solves a constrained optimization problem to compute a physical layout that minimizes aggregate cable length. We quantitatively compare these two methods using both graph analysis and cycle-accurate network simulation, including comparisons with previously proposed random topologies and non-random topologies.
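The claim that random links lower hop counts can be illustrated with a small sketch. It is not the paper's generator: the ring baseline, the function names (`ring`, `add_random_shortcuts`, `hop_stats`), and the network size are all assumptions chosen for illustration. It adds uniformly random shortcut links to a ring and measures diameter and average shortest path length (in hops) by breadth-first search.

```python
import random
from collections import deque

def ring(n):
    """Adjacency sets for an n-node ring (an illustrative non-random baseline)."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}

def add_random_shortcuts(adj, count, seed=0):
    """Add `count` links between uniformly random, not-yet-adjacent node pairs (mutates adj)."""
    rng = random.Random(seed)
    nodes = list(adj)
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)  # two distinct nodes
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj

def hop_stats(adj):
    """Return (diameter, average shortest path length) in hops, via BFS from every node."""
    total, pairs, diameter = 0, 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
        diameter = max(diameter, max(dist.values()))
    return diameter, total / pairs

if __name__ == "__main__":
    print("ring:           ", hop_stats(ring(64)))
    print("ring + 16 links:", hop_stats(add_random_shortcuts(ring(64), 16, seed=1)))
```

The sketch ignores cable length entirely, which is exactly the cost the abstract says purely random topologies incur.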

parallel and distributed computing: applications and technologies | 2012

Cabinet Layout Optimization of Supercomputer Topologies for Shorter Cable Length

Ikki Fujiwara; Michihiro Koibuchi; Henri Casanova

As the scales of supercomputers increase, total cable length becomes enormous, e.g., up to thousands of kilometers. Recent high-radix switches with dozens of ports make switch layout and system packaging more complex. In this study, we investigate the optimization of the physical layout of topologies of switches on a machine room floor with the goal of reducing cable length. For a given topology, using graph clustering algorithms, we group switches logically into cabinets so that the number of inter-cabinet cables is small. Then, we map the cabinets onto a physical floor space so as to minimize total cable length. This is done by modeling and optimizing the mapping problem as a facility location problem. Our evaluation results show that, when compared to standard clustering/mapping approaches and for popular network topologies, our clustering approach can reduce the number of inter-cabinet cables by up to 40.3% and our mapping approach can reduce the inter-rack cable length by up to 39.6%.
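The mapping step can be made concrete with a toy objective. The sketch below is not the facility location formulation used in the paper: it assumes Manhattan distance between cabinet slots as the cable length and substitutes a plain pairwise-swap local search. The names `total_cable_length` and `improve_by_swaps` and the example data are hypothetical.

```python
from itertools import combinations

def total_cable_length(cables, position):
    """Sum of Manhattan distances over all inter-cabinet cables.

    cables:   dict mapping (cabinet_a, cabinet_b) -> number of cables
    position: dict mapping cabinet -> (x, y) floor slot
    """
    total = 0
    for (a, b), count in cables.items():
        (xa, ya), (xb, yb) = position[a], position[b]
        total += count * (abs(xa - xb) + abs(ya - yb))
    return total

def improve_by_swaps(cables, position):
    """Swap the slots of two cabinets whenever that shortens the total; stop when no swap helps."""
    position = dict(position)  # work on a copy
    best = total_cable_length(cables, position)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(list(position), 2):
            position[a], position[b] = position[b], position[a]
            cost = total_cable_length(cables, position)
            if cost < best:
                best = cost
                improved = True
            else:
                # Undo a swap that did not help.
                position[a], position[b] = position[b], position[a]
    return position, best

if __name__ == "__main__":
    cables = {("A", "B"): 4, ("C", "D"): 4, ("A", "C"): 1}
    start = {"A": (0, 0), "C": (1, 0), "B": (2, 0), "D": (3, 0)}
    print("before:", total_cable_length(cables, start))
    print("after: ", improve_by_swaps(cables, start))
```

A local search like this can stall in a local minimum, which is one reason a proper optimization formulation is attractive for real floor plans.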

high-performance computer architecture | 2015

Augmenting low-latency HPC network with free-space optical links

Ikki Fujiwara; Michihiro Koibuchi; Tomoya Ozaki; Hiroki Matsutani; Henri Casanova

Various network topologies can be used for deploying High Performance Computing (HPC) clusters. The network topology, which connects switches in cabinets on a machine room floor, is typically defined once and for all at system deployment time. For a diverse application workload, there are downsides to having a single wired topology. In this work, we propose using free-space optics (FSO) in large-scale systems so that a diverse application workload can be better supported. A high-density layout of FSO terminals on top of the cabinets is determined that allows line-of-sight communication between arbitrary cabinet pairs. We first show that our proposal reduces both end-to-end network latency and total cable length when compared to a wired topology. We then demonstrate that the use of FSO links improves the embedding/partitioning capabilities of a wired topology. More specifically, we show that a recently proposed random low-latency topology can be augmented with a reasonable number of FSO links to support multiple k-ary n-cube and fat tree embedded topologies. Finally, we investigate power-aware on/off link regulation techniques and show how adding/reconfiguring FSO links leads to both performance and power efficiency improvements.

IEEE Transactions on Parallel and Distributed Systems | 2015

Swap-And-Randomize: A Method for Building Low-Latency HPC Interconnects

Ikki Fujiwara; Michihiro Koibuchi; Hiroki Matsutani; Henri Casanova

Random network topologies have been proposed to create low-diameter, low-latency interconnection networks in large-scale computing systems. However, these topologies are difficult to deploy in practice, especially when re-designing existing systems, because they lead to increased total cable length and cable packaging complexity. In this work we propose a new method for creating random topologies without increasing cable length: randomly swap link endpoints in a non-random topology that is already deployed across several cabinets in a machine room. We quantitatively evaluate topologies created in this manner using both graph analysis and cycle-accurate network simulation, including comparisons with non-random topologies and previously-proposed random topologies.
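A minimal sketch of the endpoint-swapping idea follows, under stated assumptions. The abstract does not spell out the swap rule, so this version assumes that two links may exchange endpoints only when they join the same pair of cabinets, which keeps every cable on its original cabinet-to-cabinet run. The function name `swap_and_randomize`, its parameters, and the example topology are illustrative, not the authors' implementation.

```python
import random

def swap_and_randomize(links, cabinet_of, swaps, seed=0):
    """Randomly exchange link endpoints without changing which cabinets each cable joins.

    links:      list of (u, v) switch pairs forming a simple graph
    cabinet_of: dict mapping switch -> cabinet id
    swaps:      number of swap attempts (failed attempts are skipped)
    """
    rng = random.Random(seed)
    links = [tuple(link) for link in links]
    present = {frozenset(link) for link in links}
    for _ in range(swaps):
        i, j = rng.sample(range(len(links)), 2)
        (a, b), (c, d) = links[i], links[j]
        # Orient the second link so that a and c sit in the same cabinet, if possible.
        if cabinet_of[a] != cabinet_of[c]:
            c, d = d, c
        # Assumed rule: both links must join the same pair of cabinets.
        if cabinet_of[a] != cabinet_of[c] or cabinet_of[b] != cabinet_of[d]:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Reject self-loops and links that already exist.
        if len(new1) < 2 or len(new2) < 2 or new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        links[i], links[j] = (a, d), (c, b)
    return links

if __name__ == "__main__":
    # 16 switches in 4 cabinets: a ring inside each cabinet plus links to the next cabinet.
    cabinet_of = {s: s // 4 for s in range(16)}
    links = [(s, (s + 4) % 16) for s in range(16)]
    links += [(4 * k + i, 4 * k + (i + 1) % 4) for k in range(4) for i in range(4)]
    print(swap_and_randomize(links, cabinet_of, swaps=200, seed=1))
```

Each accepted swap preserves every switch's degree and the number of cables between each cabinet pair, so under the assumption that cable length depends only on the cabinets joined, the total cable length is unchanged.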

international parallel and distributed processing symposium | 2014

Skywalk: A Topology for HPC Networks with Low-Delay Switches

Ikki Fujiwara; Michihiro Koibuchi; Hiroki Matsutani; Henri Casanova

With low-delay switches on the horizon, end-to-end latency in large-scale High Performance Computing (HPC) interconnects will be dominated by cable delays. In this context we define a new network topology, Skywalk, for deploying low-latency interconnects in upcoming HPC systems. Skywalk uses randomness to achieve low latency, but does so in a way that accounts for the physical layout of the topology so as to lead to further cable length and thus latency reductions. Via graph analysis and discrete-event simulation we show that Skywalk compares favorably (in terms of latency, cable length, and throughput) to traditional low-degree torus and moderate-degree hypercube topologies, to high-degree fully-connected Dragonfly topologies, to the HyperX topology, and to recently proposed fully random topologies.

international conference on parallel processing | 2013

Distributed Shortcut Networks: Layout-Aware Low-Degree Topologies Exploiting Small-World Effect

Van K. Nguyen; Nhat T. X. Le; Ikki Fujiwara; Michihiro Koibuchi

Low communication latency has become a main concern in highly parallel computers and supercomputers. Random network topologies are best suited to achieving low average shortest path length and low diameter in hop counts between nodes and thus low communication latency. However, random topologies lead to a problem of increased aggregate cable length on a machine room floor. In this context we propose low-degree non-random topologies that exploit the small-world effect, which has been typically well modeled by some random network models. Our main idea is to carefully design a set of various-length shortcuts that keep the diameter small while maintaining an economical cable length. Our experimental graph analysis showed that our proposed topology has low diameter and low average shortest path length, which are considerably better than those of a counterpart 2-D torus and are close to those of a counterpart random topology with the same average degree. Meanwhile, the proposed topology has average cable length drastically shorter than that of the counterpart random topology. Our cycle-accurate network simulation results show that the proposed topology has lower latency by 15% and almost the same throughput when compared to a torus with the same degree.

parallel, distributed and network-based processing | 2016

Suitability of the Random Topology for HPC Applications

Fabien Chaix; Ikki Fujiwara; Michihiro Koibuchi

With each technology improvement, parallel systems get larger, and the impact of interconnection networks becomes more prominent. Random topologies and their variants have received increasing attention lately due to their low diameter, low average shortest path length and high scalability. However, existing supercomputers still prefer torus and fat-tree topologies, because a number of existing parallel algorithms are optimized for them and the interconnect implementation is more straightforward in terms of floor layout. In this paper, we investigate the performance of traditional and emerging parallel workloads on these network topologies, using a discrete-event simulator called SimGrid. We observe that the random topology is better for the Fourier Transform (FT), Graph500, and Himeno benchmarks, and its improvement over the counterpart torus is 18 percent on average. Through this study, our recommendation is to use the random topology in current and future supercomputers for these scientific and big-data analysis parallel applications.

international conference on parallel processing | 2016

Randomly Optimized Grid Graph for Low-Latency Interconnection Networks

Koji Nakano; Daisuke Takafuji; Satoshi Fujita; Hiroki Matsutani; Ikki Fujiwara; Michihiro Koibuchi

In this work we present randomly optimized grid graphs that optimize performance measures, such as diameter and average shortest path length (ASPL), subject to a limited edge length on a grid surface. We also provide theoretical lower bounds of the diameter and the ASPL, which prove optimality of our randomly optimized grid graphs. We further present a diagonal grid layout that significantly reduces the diameter compared to the conventional one under the edge-length limitation. We finally show their applications to three case studies of off- and on-chip interconnection networks. Our design efficiently improves their performance measures, such as end-to-end communication latency, network power consumption, cost, and execution time of parallel benchmarks.